Invited speaker presentation abstracts
Allan Donner
Title: Methods for the Meta-Analysis of Cluster Randomization Trials
Abstract: This talk begins by reviewing some of the methodological challenges that arise when it is of interest to synthesize the results from several cluster randomization trials. Focussing on the case of a binary outcome, we also report on a simulation study comparing several approaches for conducting inferences on the overall effect of an intervention.
Donner, A. and Klar, N. (2002). Issues in the meta-analysis of cluster randomization trials. Statistics in Medicine, 21, 2971-2980.
Nicholas Horton
Title: Much ado about nothing: methods and software implementations to estimate incomplete data regression models
Abstract: Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The development of statistical methods to address missingness has been actively pursued in recent years, including imputation, likelihood and weighting approaches (Ibrahim et al., JASA 2005; Horton and Kleinman, TAS 2007). Each approach is considerably more complicated when there are many patterns of missing values and both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available, though not widely implemented (Burton and Altman, BJC 2004).
We review these methods in the context of a motivating example from a large health services research dataset. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible and scientifically desirable to incorporate partially observed values as well as undertake sensitivity analyses to modelling and missingness assumptions.
Russell Millar
Title: Zero-inflated and hierarchical negative binomial modelling of count data
Abstract: The hierarchical negative binomial model corresponds to a lognormal-gamma-Poisson model with the gamma variance component applied at the observation level of the hierarchy. This model introduces the additional challenge of choosing an appropriate prior for the gamma variance component. One possibility is to match the prior dispersion of the gamma with that of the higher level lognormal variance components. Comparison with the more conventional lognormal-Poisson hierarchical model (with lognormal variance components at all levels of the hierarchy) must be done with care. The DIC was seen to be a very dangerous model assessment criterion if used naively. Conditional posterior predictive checks on the zero and non-zero counts were found to be much more effective than omnibus checks for detecting lack of fit or the need to zero-inflate the model.
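The gamma-Poisson construction mentioned in this abstract can be checked numerically at the observation level. The sketch below is illustrative only and is not taken from the talk: it assumes a mean-one gamma multiplier with shape and rate both equal to a hypothetical parameter `k`, and a fixed Poisson mean `mu` standing in for the linear predictor (the lognormal effects at higher levels are omitted). The function names are hypothetical.

```python
import math


def neg_binomial_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and shape k (variance mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def gamma_poisson_pmf(y, mu, k, n_grid=20000, upper=40.0):
    """Marginal pmf of y when y | g ~ Poisson(mu * g) and g ~ Gamma(shape=k, rate=k).

    The gamma multiplier g has mean 1 and variance 1 / k. The integral over g
    is approximated with a simple midpoint rule on (0, upper).
    """
    step = upper / n_grid
    total = 0.0
    for i in range(n_grid):
        g = (i + 0.5) * step
        log_gamma_density = (k * math.log(k) - math.lgamma(k)
                             + (k - 1) * math.log(g) - k * g)
        lam = mu * g
        log_poisson = y * math.log(lam) - lam - math.lgamma(y + 1)
        total += math.exp(log_gamma_density + log_poisson) * step
    return total


# Compare the two pmfs for a few counts (mu and k chosen arbitrarily).
for y in range(6):
    print(y, neg_binomial_pmf(y, 3.0, 2.0), gamma_poisson_pmf(y, 3.0, 2.0))
```

Under these assumptions the two columns agree to within the integration error, which is the sense in which a gamma variance component at the observation level yields a negative binomial count.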
Kerrie Mengersen
Title: Making decisions based on diverse sources of information: a Bayesian perspective
Abstract: Almost every decision that we make is based on multiple sources of information. In many fields, such as medicine, natural resources and finance, a critical component of 'evidence-based' decision-making is data, presented through analysis, models, design and so on. However, in practice even these decisions are made by modifying the statistical results in light of the decision-maker's prior beliefs and other available information. For example, an expert interested in the location of a rare and threatened species might consider a map of predicted presences based on available data, then 'white-out' those areas which in the opinion of the expert the animal would not inhabit. A community group might decide about water management or allocation based on a combination of careful water flow models, statistical water quality analyses and local knowledge. In most cases this is absolutely what we would want these experts to do, but it is of interest to explore how it might be achieved in a more formal manner; that is, how we might combine this expertise, related results and current data in a single analysis taking into account the relative strengths of the individual sources of evidence.
In this presentation, we explore the use of a Bayesian framework for combining data with informative priors in models such as GLMs and CART. In doing so, we will consider options for elicitation of expert information and representation of this information as priors. This will involve a showcase of software that we have developed for the elicitation of geographic information. We will also canvass the use of Bayesian networks as a mechanism for such integration. The ideas will be discussed through a series of case studies in environmental decision-making, in which we have been involved with our government collaborators in the past few years.
Ross Sparks
Title: Early disease outbreak detection
Abstract: Emergency departments in New South Wales (Australia) are collecting data using the doctors' major diagnosis, triage nurse notes and laboratory assessments for notifiable diseases. Integrating this information into a real-time surveillance system that makes full use of all dimensions of the data is a challenge. At the simplest level, monitoring daily counts of diseases is used to identify unusually high counts and thus flag epidemics or public health issues for a single disease. However, it is known that (infectious) diseases start in small clusters, often determined by spatial location, age group, gender, etc. Technology for detecting spatial disease clusters can be found in Raubertas (1989), Kulldorff (2001) and Diggle et al. (2005). However, spatial information could be place of residence, work/school location or treatment centre.
Surveillance methodology will be discussed that allows for several levels of geographical location to be used in describing geographical clustering properties of the disease. In other words, a surveillance system which exploits the clustering nature of diseases by tracking the sources of variation will be demonstrated as an efficient way of monitoring diseases. This strategy will be demonstrated to lead to earlier detection than monitoring disease counts aggregated over all these dimensions (e.g., see Rossi et al. 1999). This strategy also has the advantage of describing how epidemics move over time in the population.
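To illustrate why stratified monitoring can signal earlier than monitoring an aggregate, here is a minimal sketch using a generic one-sided CUSUM on standardised counts. It is not the methodology of the talk, nor the approximate CUSUM of Rossi et al.; the Poisson standardisation, the reference value `k`, the threshold `h`, the region baselines and the toy counts are all assumptions made for illustration.

```python
import math


def cusum_alarms(counts, baseline_mean, k=0.5, h=4.0):
    """Return the day indices at which a one-sided CUSUM signals.

    Counts are standardised assuming a Poisson baseline (sd = sqrt(mean)).
    The statistic is reset to zero after each alarm.
    """
    sd = math.sqrt(baseline_mean)
    statistic = 0.0
    alarms = []
    for day, count in enumerate(counts):
        z = (count - baseline_mean) / sd
        statistic = max(0.0, statistic + z - k)
        if statistic > h:
            alarms.append(day)
            statistic = 0.0
    return alarms


# Hypothetical data: an outbreak adds 4 cases per day in region A from day 3.
region_a = [4, 4, 4, 8, 8, 8, 8, 8]   # assumed baseline mean 4
region_b = [16] * 8                    # assumed baseline mean 16
aggregate = [a + b for a, b in zip(region_a, region_b)]

stratified_alarms = cusum_alarms(region_a, baseline_mean=4.0)
aggregate_alarms = cusum_alarms(aggregate, baseline_mean=20.0)
print(stratified_alarms, aggregate_alarms)
```

In this toy setting the region-level chart signals within the eight days while the aggregate chart does not, because the same excess of four cases is a much smaller standardised shift against the larger aggregate baseline.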
Diggle, P.J., Rowlingson, B. and Su, T-L. (2005). Point process methodology for on-line spatio-temporal disease surveillance. Environmetrics, 16, 423-34.
Kulldorff, M. (2001). Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society, Series A, 164, 61-72.
Raubertas, G. (1989). An analysis of disease surveillance data that uses the geographical locations of the reporting units. Statistics in Medicine, 8, 267-271.
Rossi, G., Lampugnani, L., and Marchi, M. (1999). An approximate CUSUM procedure for surveillance of health events. Statistics in Medicine, 18, 2111-2122.
Ari Verbyla
Title: Whole genome analysis of QTL
Abstract: Associating genomic regions with the performance of plant crops is aimed at improving plant breeding and ultimately benefiting farmers. The association of quantitative traits with genetic information using molecular marker data is called QTL analysis or mapping, and the area has an extensive literature; see Collard et al. (2005) for a review. In the main, methods of analysis are two-stage. At stage one, a summary measure of the trait for each line of interest is obtained, and at stage two, these summary measures are used to establish association with the genetic information. The approaches for analysis at the second stage have been very piecemeal and involve repeated model fitting. A more comprehensive approach is to combine the trait, design, management and genetic information into a unified data set and to reduce the level of refitting to a minimum.
Along these lines, a whole genome average interval mapping (WGAIM) approach has been proposed by Verbyla et al. (2007). The approach is an extension of interval mapping, and using a linkage map of the molecular marker data, incorporates all intervals on that linkage map simultaneously in the analysis. A simple working model is proposed in which the sizes of putative QTL for all intervals across the genome are random effects. This allows a full mixed model specification that incorporates both genetic and non-genetic sources of variation in the analysis. An outlier detection method is used to screen for possible QTL, and selected QTL are subsequently fitted as fixed effects. The selection process requires a sequence of models to be fitted. However, the number of such models is greatly reduced in comparison to standard methods of analysis. A stopping rule is available.
In a comprehensive simulation study, the proposed method has been shown to be superior to composite interval mapping in terms of power of detection of QTL. There is an increase in the rate of false positive QTL detected when using the new approach, but this rate decreases as the population size increases. The new approach is much simpler computationally. The method has natural extensions in the multivariate context, be it multi-environment trials, multi-treatment situations or the case of multiple traits. Both additive and dominance effects can be examined, as can epistatic interactions. Analyses of various experiments will be presented using an implementation of the approach in the ASReml software (Gilmour et al., 2007; Butler et al., 2007).
Butler, D. G., Cullis, B. R., Gilmour, A. R. and Gogel, B. J. (2007). ASReml-R Reference Manual. Release 2. Queensland Department of Primary Industries.
Collard, B. C. Y., Jahufer, M. Z. Z., Brouwer, J. B. and Pang, E. C. K. (2005). An introduction to markers, quantitative trait loci (QTL) and marker-assisted selection for crop improvement: The basic concepts. Euphytica 142, 169-196.
Gilmour, A. R., Gogel, B. J., Cullis, B. R. and Thompson, R. (2007). ASReml Users Guide. Release 2.0. VSN International Ltd: Hemel Hempstead, UK.
Verbyla, A. P., Cullis, B. R. and Thompson, R. (2007). The analysis of QTL by simultaneous use of the full linkage map. Theoretical and Applied Genetics, submitted.
Robert L. Wolpert
Title: Bayesian Semiparametric Space-Time Models
Abstract: A new class of semi-parametric Bayesian models is introduced for spatial, temporal, and spatio-temporal data, generalizing the kernel convolution of Lévy random fields. The method is useful for building flexible spatio-temporal models that can accommodate non-Gaussian non-stationary spatio-temporal data while keeping the computation feasible even for large data sets. The methods are illustrated in an application to sulfur dioxide monitoring in mid-Atlantic states.
Jim Zidek
Title: Reconciling Physical & Statistical Approaches to Modelling
Abstract: The cultures of physical and statistical modellers differ greatly. However, a search for reconciliation has begun, driven by the practical requirements of handling processes over very large space-time domains, and the risks attached to them. My talk derives from my own experience and that of my UBC co-researchers, Nhu Le, Yiping Dou, and Zhong Liu, with much input from Douw Steyn, an atmospheric scientist. We have been examining hourly ground level ozone concentrations over a very large part of the eastern USA. In particular, we have been seeing how to reconcile simulated data from MAQSIP, a very large deterministic model for that field, and data from about 300 sites. The data were produced over about 120 days in a single summer. I will describe our approaches and some of the results. However, much of the discussion will be devoted to more fundamental issues arising from the differences between these two cultures.
Last updated 9 Dec 2007.