Drawing Causal Inference from Big Data
This meeting was held March 26-27, 2015, at the National Academy of Sciences, 2101 Constitution Ave. NW, Washington, D.C.
Organized by Richard M. Shiffrin (Indiana University), Susan Dumais (Microsoft Corporation), Mike Hawrylycz (Allen Institute), Jennifer Hill (New York University), Michael Jordan (University of California, Berkeley), Bernhard Schölkopf (Max Planck Institute) and Jasjeet Sekhon (University of California, Berkeley)
Graduate student and postdoctoral researcher travel awards were sponsored by the National Science Foundation and the Ford Foundation.
Overview
This colloquium was motivated by the exponentially growing amount of information collected about complex systems, colloquially referred to as “Big Data”. It focused on methods for drawing causal inferences from these large data sets, most of which are not derived from carefully controlled experiments. Although correlations among observations are vast in number and often easy to obtain, causality is much harder to assess and establish, partly because causality is a vague and poorly specified construct for complex systems. Speakers discussed both the conceptual framework required to establish causal inference and the designs and computational methods that can allow causality to be inferred. The program illustrated state-of-the-art methods with approaches derived from such fields as statistics, graph theory, machine learning, philosophy, and computer science, and the talks covered such domains as social networks, medicine, health, economics, business, internet data and usage, search engines, and genetics. The presentations also addressed the possibility of testing causality in large data settings and raised certain basic questions: Will access to massive data be a key to understanding the fundamental questions of basic and applied science? Or does the vast increase in data confound analysis, produce computational bottlenecks, and decrease the ability to draw valid causal inferences?
Videos of the talks are available on the Sackler YouTube Channel. More videos will be added as they are approved by the speakers.
Speakers' Bio sketches
Agenda
Thursday, March 26
Introduction, Richard Shiffrin, Indiana University, The Big Data Sea Change
Michael Jordan, University of California, Berkeley, On Computational Thinking, Inferential Thinking and Big Data
Judea Pearl, University of California, Los Angeles, Taming the Challenge of Extrapolation: From Multiple Experiments and Observations to Valid Causal Conclusions
Thomas Richardson, University of Washington, Non-parametric Causal Inference
David Heckerman, Microsoft Corporation, Causal Inference in the Presence of Hidden Confounders in Genomics
Jasjeet Sekhon, University of California, Berkeley, Combining Experiments with Big Data to Estimate Treatment Effects
Bin Yu, University of California, Berkeley, Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments
Bernhard Schölkopf, Max Planck Institute, Toward Causal Machine Learning
John Stamatoyannopoulos, University of Washington, Decoding the Human Genome: From Sequence to Knowledge
Edo Airoldi, Harvard University, Optimal Design of Causal Experiments in the Presence of Social Interference
Peter Bühlmann, ETH Zurich, Causal Inference Based on Invariance: Exploiting the Power of Heterogeneous Data
Annual Sackler Lecture
Introduction by Ralph J. Cicerone, President, National Academy of Sciences
Sackler Lecture presented by Steven Levitt, The University of Chicago, Thinking Differently About Big Data
Friday, March 27
Michael Hawrylycz, Allen Institute, Project MindScope: From Big Data to Behavior in the Functioning Cortex
David Madigan, Columbia University, Honest Inference From Observational Database Studies
Susan Athey, Stanford University, Estimating Heterogeneous Treatment Effects Using Machine Learning in Observational Studies
Léon Bottou, Facebook AI Research, Causal Reasoning and Learning Systems
Dean Eckles, Facebook, Identifying Peer Effects in Social Networks
Hal Varian, Google, Inc., Causal Inference, Econometrics, and Big Data
James Robins, Harvard University, Personalized Medicine, Optimal Treatment Strategies, and First Do No Harm: Time Varying Treatments and Big Data
James Fowler, University of California, San Diego, An 85 Million Person Follow-up to a 61 Million Person Experiment in Social Influence and Political Mobilization
General Discussion, Jennifer Hill, New York University