AISTATS 2014 Keynote Speakers

High-dimensional Causal Inference

Understanding cause-effect relationships between variables is of great interest in many fields of science. An ambitious but highly desirable goal is to infer causal effects from observational data, obtained by observing a system of interest without subjecting it to interventions. This would make it possible to circumvent severe experimental constraints or to substantially lower experimental costs. Our main motivation to study this goal comes from applications in biology.

We present recent progress in estimating causal graphs (networks) and predicting causal effects. Various restrictions on structural equation models lead to improved or full identifiability, and statistical estimation and inference can be done using order search and penalized (sparse) regression. The methods and algorithms are computationally feasible and statistically consistent, even in high-dimensional settings with thousands of variables but only a few observations. We highlight exciting possibilities and fundamental limitations. In view of the latter, statistical modeling should be complemented with experimental validation: we discuss this in the context of molecular biology for yeast (Saccharomyces cerevisiae) and the model plant Arabidopsis thaliana.
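The abstract pairs order search with penalized (sparse) regression. The following is a minimal sketch of the second ingredient only, assuming a linear structural equation model and a causal order that has already been found (the order search itself is not shown): each variable is regressed, with a lasso penalty, on the variables that precede it, and the predecessors with nonzero coefficients are taken as its estimated parents. The helper names (`lasso_cd`, `parents_given_order`), the simulated chain, and the penalty `lam=0.5` are hypothetical choices for illustration, not the speaker's implementation.

```python
import numpy as np

# Illustrative sketch with hypothetical helper names; not the method from the talk.


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_sweeps=500, tol=1e-10):
    """Lasso via cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1, assuming each column
    of X is standardized (mean 0, mean square 1) and y is centered.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()  # residual y - X b for the current b
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Inner product of column j with the partial residual (b_j added back).
            rho = X[:, j] @ r / n + b[j]
            new_bj = soft_threshold(rho, lam)
            if new_bj != b[j]:
                r -= X[:, j] * (new_bj - b[j])
                max_change = max(max_change, abs(new_bj - b[j]))
                b[j] = new_bj
        if max_change < tol:
            break
    return b


def parents_given_order(data, order, lam):
    """Select each variable's parents by a lasso regression on the variables
    that precede it in `order` (a causal order assumed to be already known)."""
    parents = {}
    for k, target in enumerate(order):
        preds = list(order[:k])
        if not preds:
            parents[target] = []  # the first variable in the order has no parents
            continue
        X = data[:, preds]
        X = (X - X.mean(axis=0)) / X.std(axis=0)
        y = data[:, target] - data[:, target].mean()
        coefs = lasso_cd(X, y, lam)
        parents[target] = [v for v, c in zip(preds, coefs) if abs(c) > 1e-8]
    return parents


# Usage: a simulated linear SEM with the chain X0 -> X1 -> X2.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.standard_normal(n)
x1 = 1.0 * x0 + rng.standard_normal(n)
x2 = 2.0 * x1 + rng.standard_normal(n)
data = np.column_stack([x0, x1, x2])
result = parents_given_order(data, [0, 1, 2], lam=0.5)
print(result)
```

Although X0 is correlated with X2, the lasso can drop it from X2's regression because X1 already explains that dependence; the choice of penalty controls how aggressively such indirect links are pruned.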

Andrew Gelman, Columbia University

Weakly Informative Priors: When a little information can do a lot of regularizing

A challenge in statistics is to construct models that are structured enough to learn from data but not so strong that they overwhelm the data. We introduce the concept of "weakly informative priors," which contain important information, but less than may be available for the problem at hand. We discuss weakly informative priors for logistic regression coefficients, hierarchical variance parameters, covariance matrices, and other models, in various applications in social science and public health. I think this is an extremely important idea that should change how we think about Bayesian models.
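As a minimal sketch of how a weakly informative prior regularizes logistic regression: with completely separated data the maximum-likelihood slope diverges, while a Cauchy prior with center 0 and scale 2.5 (assumed here for illustration) keeps the posterior mode finite. The model has no intercept for brevity, and the data, step size, iteration count, and function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; the Cauchy(0, 2.5) prior and all names are assumptions.


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_slope(x, y, prior_scale=None, step=0.2, n_iter=2000):
    """Gradient ascent on the log posterior of a no-intercept logistic regression.

    prior_scale=None means a flat prior, i.e. plain maximum likelihood.
    """
    beta = 0.0
    for _ in range(n_iter):
        grad = np.sum((y - sigmoid(beta * x)) * x)  # derivative of the log-likelihood
        if prior_scale is not None:
            # Derivative of the log Cauchy(0, s) density, -log(1 + (beta/s)^2) + const.
            grad -= 2.0 * beta / (prior_scale**2 + beta**2)
        beta += step * grad
    return beta


x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])  # completely separated at x = 0
beta_flat = fit_slope(x, y)  # keeps growing with more iterations
beta_weak = fit_slope(x, y, prior_scale=2.5)  # converges to a finite mode
print(beta_flat, beta_weak)
```

Because the log-likelihood keeps increasing as the slope grows, the flat-prior run stops only when the iteration budget runs out; the prior adds just enough curvature to give a finite, moderate estimate.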

Michael I. Jordan, University of California, Berkeley

On the Computational and Statistical Interface and "Big Data"

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the statistical and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. I present three research vignettes on topics at the computation/statistics interface, the first involving the deployment of resampling methods such as the bootstrap on parallel and distributed computing platforms, the second involving large-scale matrix completion, and the third introducing a methodology of "algorithmic weakening," whereby hierarchies of convex relaxations are used to control statistical risk as data accrue. [Joint work with Venkat Chandrasekaran, Ariel Kleiner, Lester Mackey, Purna Sarkar, and Ameet Talwalkar].
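The first vignette concerns running the bootstrap on parallel and distributed platforms. One published scheme from this line of work is the Bag of Little Bootstraps (BLB); the sketch below assumes that scheme and uses the standard error of a sample mean as the quantity of interest. The function name and the defaults (10 subsets, subset size n^0.7, 50 resamples per subset) are illustrative assumptions rather than settings from the talk.

```python
import numpy as np

# Illustrative sketch of a Bag-of-Little-Bootstraps-style estimator (assumed scheme).


def blb_stderr_of_mean(x, n_subsets=10, gamma=0.7, n_resamples=50, seed=0):
    """Estimate the standard error of the mean from small subsets of the data.

    Each subset holds only b = n**gamma points; in a distributed setting each
    iteration of the outer loop could run on a separate worker.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    b = int(n ** gamma)
    per_subset = []
    for _ in range(n_subsets):
        subset = rng.choice(x, size=b, replace=False)
        # A size-n resample of the subset is stored as multinomial counts over
        # its b points, so each resample costs O(b) memory instead of O(n).
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_resamples)
        means = counts @ subset / n  # weighted mean of each resample
        per_subset.append(means.std(ddof=1))
    # Average the per-subset quality assessments.
    return float(np.mean(per_subset))


# Usage on simulated data; the true standard error here is 1 / sqrt(n) = 0.01.
rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)
se = blb_stderr_of_mean(x)
print(se)
```

Only the outer loop over subsets would need to be distributed: each worker touches b rather than n data points, yet every resample still mimics a full-size bootstrap sample.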