Drawing Causal Inference from Big Data

This meeting was held March 26-27, 2015, at the National Academy of Sciences, 2101 Constitution Ave. NW, Washington, D.C.
Organized by Richard M. Shiffrin (Indiana University), Susan Dumais (Microsoft Corporation), Mike Hawrylycz (Allen Institute), Jennifer Hill (New York University), Michael Jordan (University of California, Berkeley), Bernhard Schölkopf (Max Planck Institute) and Jasjeet Sekhon (University of California, Berkeley)
Graduate Student / Postdoctoral Researcher travel awards sponsored by the National Science Foundation and the Ford Foundation.

Overview

This colloquium was motivated by the exponentially growing amount of information collected about complex systems, colloquially referred to as “Big Data”. It focused on methods for drawing causal inferences from these large data sets, most of which are not derived from carefully controlled experiments. Although correlations among observations are vast in number and often easy to obtain, causality is much harder to assess and establish, partly because causality is a vague and poorly specified construct for complex systems. Speakers discussed both the conceptual framework required to establish causal inference and the designs and computational methods that can allow causality to be inferred. The program illustrated state-of-the-art methods with approaches derived from such fields as statistics, graph theory, machine learning, philosophy, and computer science, and the talks covered such domains as social networks, medicine, health, economics, business, internet data and usage, search engines, and genetics. The presentations also addressed the possibility of testing causality in large data settings and raised certain basic questions: Will access to massive data be a key to understanding the fundamental questions of basic and applied science? Or does the vast increase in data confound analysis, produce computational bottlenecks, and decrease the ability to draw valid causal inferences?

Videos of the talks are available on the Sackler YouTube Channel. More videos will be added as they are approved by the speakers.

Speakers' Biosketches

Agenda

Thursday, March 26

Introduction, Richard Shiffrin, Indiana University, The Big Data Sea Change

Michael Jordan, University of California, Berkeley, On Computational Thinking, Inferential Thinking and Big Data

Judea Pearl, University of California, Los Angeles, Taming the Challenge of Extrapolation: From Multiple Experiments and Observations to Valid Causal Conclusions

Thomas Richardson, University of Washington, Non-parametric Causal Inference

David Heckerman, Microsoft Corporation, Causal Inference in the Presence of Hidden Confounders in Genomics

Jasjeet Sekhon, University of California, Berkeley, Combining Experiments with Big Data to Estimate Treatment Effects

Bin Yu, University of California, Berkeley, Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments

Bernhard Schölkopf, Max Planck Institute, Toward Causal Machine Learning

John Stamatoyannopoulos, University of Washington, Decoding the Human Genome: From Sequence to Knowledge

Edo Airoldi, Harvard University, Optimal Design of Causal Experiments in the Presence of Social Interference

Peter Bühlmann, ETH Zurich, Causal Inference Based on Invariance: Exploiting the Power of Heterogeneous Data

Annual Sackler Lecture

Introduction by Ralph J. Cicerone, President, National Academy of Sciences

Sackler Lecture presented by Steven Levitt, The University of Chicago, Thinking Differently About Big Data

Friday, March 27

Michael Hawrylycz, Allen Institute, Project MindScope: From Big Data to Behavior in the Functioning Cortex

David Madigan, Columbia University, Honest Inference From Observational Database Studies

Susan Athey, Stanford University, Estimating Heterogeneous Treatment Effects Using Machine Learning in Observational Studies

Léon Bottou, Facebook AI Research, Causal Reasoning and Learning Systems

Dean Eckles, Facebook, Identifying Peer Effects in Social Networks

Hal Varian, Google, Inc., Causal Inference, Econometrics, and Big Data

James Robins, Harvard University, Personalized Medicine, Optimal Treatment Strategies, and First Do No Harm: Time Varying Treatments and Big Data

James Fowler, University of California, San Diego, An 85 Million Person Follow-up to a 61 Million Person Experiment in Social Influence and Political Mobilization

General Discussion, Jennifer Hill, New York University