
Peter Bühlmann,
ETH Zürich
High-dimensional Causal Inference
Understanding cause-effect relationships between variables is of great
interest in many fields of science. An ambitious but highly desirable
goal is to infer causal effects from observational data obtained by
observing a system of interest without subjecting it to interventions. This
would make it possible to circumvent severe experimental constraints or to substantially
lower experimental costs. Our main motivation to study this goal comes from
applications in biology.
We present recent progress in estimating causal graphs (networks) and
predicting causal effects. Various restrictions in structural
equation models lead to better or full identifiability, and statistical
estimation and inference can be done using order search and penalized
(sparse) regression. The methods and algorithms are computationally feasible and
statistically consistent, even in high-dimensional settings with
thousands of variables but only a few observations. We highlight exciting
possibilities and fundamental limitations. In view of the latter,
statistical modeling should be complemented with experimental validation:
we discuss this in the context of molecular biology for yeast
(Saccharomyces cerevisiae) and the model plant Arabidopsis thaliana.


Andrew Gelman,
Columbia University
Weakly Informative Priors: When a little information can do a lot of regularizing
A challenge in statistics is to construct models that are structured enough to learn
from data but not so strong as to overwhelm the data. We introduce the concept of "weakly
informative priors," which contain important information but less than may be available for the
given problem. We discuss weakly informative priors for logistic regression coefficients,
hierarchical variance parameters, covariance matrices, and other models, in various applications in
social science and public health. I think this is an extremely important idea that should change
how we think about Bayesian models.


Michael I. Jordan,
University of California, Berkeley
On the Computational and Statistical Interface and "Big Data"
The rapid growth in the size and scope of datasets in science and
technology has created a need for novel foundational perspectives on
data analysis that blend the statistical and computational sciences.
That classical perspectives from these fields are not adequate to
address emerging problems in "Big Data" is apparent from their sharply
divergent nature at an elementary level: in computer science, the
growth of the number of data points is a source of "complexity" that
must be tamed via algorithms or hardware, whereas in statistics, the
growth of the number of data points is a source of "simplicity" in
that inferences are generally stronger and asymptotic results can be
invoked. I present three research vignettes on topics at the
computation/statistics interface, the first involving the deployment
of resampling methods such as the bootstrap on parallel and
distributed computing platforms, the second involving large-scale
matrix completion, and the third introducing a methodology of
"algorithmic weakening," whereby hierarchies of convex relaxations are
used to control statistical risk as data accrue. [Joint work with
Venkat Chandrasekaran, Ariel Kleiner, Lester Mackey, Purna Sarkar, and
Ameet Talwalkar].
