“Astronomy and Statistics”

Daniel Mortlock (Imperial)

   Why "the statistical frontiers of astrophysics"?  Why not "the statistical frontiers of particle physics", or "the statistical frontiers of neuroscience" or "the statistical frontiers of palmistry?"  Any measurement that yields quantitative data inevitably requires statistical analysis, but lack of repeatable experiments in astronomy makes probabilistic methods unusually important, while huge modern astronomical data-sets provide an exciting real-world testing ground for innovative statistical techniques.  This introductory talk will explore astronomical statistics through a number concrete case-studies that range from completely straight-forward applications of Bayes's theorem to parameter estimation, through more ambiguous model-selection problems, to the far more open-ended world of data-mining.  There will also be particular emphasis on cases in which manifestly erroneous conclusions have been reached, and how this could often have been avoided simply by following Laplace's principle that "probability is nothing but common sense reduced to calculus".

“Statistics of TAOS:  the good, the bad, and the ugly”

John Rice (Berkeley)

   The Taiwanese-American Occultation Survey (TAOS) operates four 50 cm telescopes at Lu-Lin Observatory in central Taiwan to search for stellar occultations by small Kuiper belt objects. The robotic telescopes simultaneously monitor several hundred stars at 5 Hz, searching for extremely rare occultations that would typically result in flux drops of less than 30% lasting only one or two consecutive time points.  (Larger objects could produce longer occultations with more complex shapes, due to Fresnel diffraction.) To date, more than 10^5 star-hours of data have been recorded.
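   In code, the most naive version of such a search on a single light curve might look like the Python sketch below (a schematic illustration with assumed thresholds, not the TAOS pipeline); the shallowness and brevity of the expected dips are exactly why something this simple is unreliable on its own, which motivates the four-telescope coincidence requirement described next.

    import numpy as np

    def find_dips(flux, depth=0.1, max_width=2):
        """Flag candidate occultations: runs of at most `max_width`
        consecutive points at least `depth` (fractional) below the
        median baseline.  Schematic only; thresholds are assumptions."""
        low = flux < (1.0 - depth) * np.median(flux)
        candidates, i, n = [], 0, len(low)
        while i < n:
            if low[i]:
                j = i
                while j < n and low[j]:
                    j += 1
                if j - i <= max_width:          # short, occultation-like dip
                    candidates.append((i, j - 1))
                i = j
            else:
                i += 1
        return candidates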

   Requiring simultaneous detection by all four telescopes guards against false positive events, including events of terrestrial origin such as bats, airplanes, and satellites, and events due to “noise.”  The noise distribution of each lightcurve is not of a known form, and in particular is not Gaussian or Poisson.  Typical signal-to-noise ratios are around 10, and changes in atmospheric transparency introduce further uncertainty into the flux measurements.
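   One distribution-free way to exploit the four-telescope requirement, sketched below in Python, is to rank each flux measurement within its own light curve and combine the ranks across telescopes at each time point; under the no-event null hypothesis the ranks are roughly uniform whatever the (unknown, non-Gaussian) noise distribution. This is a simplified illustration of the idea, not the actual TAOS test.

    import numpy as np
    from scipy.stats import gamma, rankdata

    def rank_coincidence(lightcurves):
        """lightcurves: array of shape (n_telescopes, n_points).
        Rank each light curve separately; at each time point return
        -sum(log(rank / n)), which is large when all telescopes are
        simultaneously faint.  Treating the ranks as independent
        uniforms (an approximation), each term is ~ Exponential(1),
        so the statistic is ~ Gamma(n_telescopes, 1) under the null."""
        n_tel, n_pts = lightcurves.shape
        ranks = np.vstack([rankdata(lc) for lc in lightcurves])
        stat = -np.sum(np.log(ranks / n_pts), axis=0)
        pvals = gamma.sf(stat, a=n_tel)
        return stat, pvals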

   After briefly reviewing how the data are collected and the novel photometry, I will focus on a variety of statistical problems that arise in TAOS.  Although anchored in the context of TAOS, I think some of the issues are of more widespread interest.  As examples, I will discuss the problem of detecting rare events among a large number of measurements and the importance of robust methods.  The talk will also illustrate the tensions inherent at the nexus of scientific data and statistical idealizations of the stochastic nature of those data, reflected in the often-quoted phrase of George Box, “all models are wrong, but some are useful.”
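   As a small, self-contained illustration of why robustness matters with heavy-tailed noise of unknown form (using an assumed toy distribution, not TAOS data): the sample standard deviation is inflated by the very tails one must detect events against, while a scale estimate built from the median absolute deviation (MAD) tracks the width of the noise core.

    import numpy as np

    def robust_sigma(x):
        """Scale estimate from the MAD, rescaled so that it matches
        the standard deviation for Gaussian data."""
        return 1.4826 * np.median(np.abs(x - np.median(x)))

    rng = np.random.default_rng(1)
    noise = rng.standard_t(df=2, size=10_000)    # heavy-tailed toy noise
    print("sample std:  ", noise.std())          # blown up by the tails
    print("robust sigma:", robust_sigma(noise))  # tracks the core width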

“Issues and Directions in Parameter Estimation with Complex Models”

Chad Schafer (CMU)

   Complex simulation models are increasingly important tools in the research of cosmologists and astronomers. Yet the use of these models as part of formal statistical analysis presents challenging issues: how does one perform parametric inference when only provided with a simulation code for mapping from parameter space to data? This presentation combines elements of recent publications of Schafer and Stark (2009) and Richards et al. (2009), among others, to provide a basis for approaching parameter estimation in such situations. Specifically, I will explore the use of dimension-reduction techniques for approximating the requisite likelihood functions, and describe how this can be naturally incorporated into a Monte Carlo approach to constructing confidence regions of optimal (minimax) expected size.
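   The basic ingredients of such simulation-based inference can be sketched generically in Python: a simulator mapping parameters to synthetic data, a (possibly dimension-reducing) summary, and a Monte Carlo comparison against the observed summary. The sketch below is a plain likelihood-free rejection sampler with assumed toy choices throughout, not the minimax confidence-region construction of the talk.

    import numpy as np

    def lf_rejection(simulator, summarize, obs, prior_draw,
                     n_sims=20_000, keep_frac=0.005, seed=2):
        """Generic likelihood-free rejection sketch: draw parameters
        from the prior, push them through the simulator, and keep the
        fraction whose summaries fall closest to the observed summary."""
        rng = np.random.default_rng(seed)
        s_obs = summarize(obs)
        thetas = np.array([prior_draw(rng) for _ in range(n_sims)])
        dists = np.array([np.linalg.norm(summarize(simulator(t, rng)) - s_obs)
                          for t in thetas])
        return thetas[dists <= np.quantile(dists, keep_frac)]

    # Toy usage: the "simulation code" is just a Gaussian sampler here.
    rng0 = np.random.default_rng(3)
    obs = rng0.normal(1.5, 1.0, size=100)
    kept = lf_rejection(
        simulator=lambda t, rng: rng.normal(t, 1.0, size=100),
        summarize=lambda d: np.atleast_1d(d.mean()),
        obs=obs,
        prior_draw=lambda rng: rng.uniform(-5.0, 5.0),
    )
    print("approximate posterior mean:", kept.mean())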

  This is joint work with Ann Lee, Joseph Richards, Philip Stark, and Peter Freeman.

“The Full Monte Carlo:  A Live Performance with Stars”

Xiao-Li Meng (Harvard)

   Markov chain Monte Carlo (MCMC) is an incredibly powerful general methodology for statistical computation and has been used with increasing intensity in modern Astrostatistics.  The popularity of MCMC has been driven by both its generality (from simple to highly complex problems) and its simplicity (the availability of out-of-the-box recipes). Unfortunately, as with everything else, there is no free lunch. In many applications, standard “off-the-shelf” implementations may produce results that are far from what the theory predicts.  The first part of this talk is a tutorial on the two most popular MCMC algorithms, namely the Gibbs sampler and the Metropolis-Hastings algorithm, and illustrates their good, bad, and ugly implementations via live demonstration. Audience participation is required, though no prior experience is needed.
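   For reference, the core out-of-the-box recipe is only a few lines long; the Python sketch below is a random-walk Metropolis-Hastings sampler for an illustrative one-dimensional target (a generic textbook version, not the talk's demonstration code). The proposal step size is exactly the kind of tuning choice that separates the good implementations from the bad and the ugly.

    import numpy as np

    def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
        """Random-walk Metropolis-Hastings: propose x' = x + step*N(0,1)
        and accept with probability min(1, target(x')/target(x))."""
        rng = np.random.default_rng(seed)
        x, lp = x0, log_target(x0)
        chain = np.empty(n_steps)
        for i in range(n_steps):
            x_prop = x + step * rng.standard_normal()
            lp_prop = log_target(x_prop)
            if np.log(rng.random()) < lp_prop - lp:   # acceptance test
                x, lp = x_prop, lp_prop
            chain[i] = x
        return chain

    # Illustrative target: a standard Gaussian (log density up to a constant).
    chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_steps=10_000)
    print("chain mean, sd:", chain.mean(), chain.std())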

  The second part of the talk presents the Ancillarity-Sufficiency Interweaving Strategy (ASIS), a surprisingly simple and effective boosting method for combating some of the serious problems revealed in the first part.  The ASIS method was discovered almost by accident during the struggle of a Ph.D. student (Yaming Yu) to fit a Cox process model for detecting changes in the source intensity of photon counts observed by the Chandra X-ray Observatory from a (candidate) neutron/quark star. Yu’s method for solving that particular problem turned out to be of considerable generality, which ultimately led to the full formulation of ASIS (Yu and Meng, 2009). Among its many applications, ASIS is helping another Ph.D. student (Paul Baines) to combat another exceedingly challenging MCMC implementation, for hierarchical modeling to estimate the ages of stellar populations from color-magnitude diagrams (Baines will present this project in his talk).
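   To make the interweaving idea concrete, the Python sketch below applies ASIS to a deliberately simple Gaussian hierarchical model (an assumed toy model chosen for transparency, not the Cox-process or stellar-ages applications): each sweep updates the hyperparameter twice, once under the sufficient (centered) parameterization and once under the ancillary (non-centered) one, interweaving the two.

    import numpy as np

    def asis_sweeps(y, A=0.1, n_iter=5_000, seed=0):
        """ASIS for the toy model y_i ~ N(theta_i, 1), theta_i ~ N(mu, A),
        with A known and a flat prior on mu.  Here theta is a sufficient
        augmentation for mu, while eta = theta - mu is ancillary; each
        iteration interweaves the two conditional updates of mu."""
        rng = np.random.default_rng(seed)
        n, mu = len(y), y.mean()
        mus = np.empty(n_iter)
        for t in range(n_iter):
            # Draw the missing data theta | mu, y.
            v = 1.0 / (1.0 + 1.0 / A)
            theta = rng.normal(v * (y + mu / A), np.sqrt(v))
            # Sufficient (centered) update: mu | theta ~ N(mean(theta), A/n).
            mu = rng.normal(theta.mean(), np.sqrt(A / n))
            # Re-express in the ancillary parameterization ...
            eta = theta - mu
            # ... and update again: mu | eta, y ~ N(mean(y - eta), 1/n).
            mu = rng.normal((y - eta).mean(), np.sqrt(1.0 / n))
            mus[t] = mu
        return mus

    # Toy usage: either conditional update alone can mix very slowly
    # (depending on A); the interweaved chain mixes well in both regimes.
    mus = asis_sweeps(np.random.default_rng(4).normal(2.0, 1.0, size=50))
    print("posterior mean of mu:", mus.mean())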