[Washington Statistical Society]

Washington Statistical Society Seminars


January, 2008
9
Wed.
Reliability Growth Projection of One-Shot Systems
9
Wed.
U. S. Census Bureau
Demographic Statistical Methods Division Seminar
Alternative Survey Sample Designs, Seminar #2: Sampling with Multiple Overlapping Frames
16
Wed.
Medicaid Underreporting in the CPS: Results from a Record Check Study
17
Thur.
Questionnaire Design Guidelines for Establishment Surveys
23
Wed.
Coverage Measurement for the 2010 Census
30
Wed.
U.S. Bureau of the Census
Statistical Research Division Seminar
Extracting Intrinsic Modes in Stationary and Nonstationary Time Series Using Reproducing Kernels and Quadratic Programming
31
Thur.
U.S. Bureau of the Census
Statistical Research Division Seminar
Experiments on the Optimal Design of Complex Survey Questions
February, 2008
5
Tues.
U. S. Census Bureau
The Wise Elders Program Seminar
A View From The Field
6
Wed.
Statistical Analysis of Bullet Lead Compositions as Forensic Evidence
7
Thur.
Reducing Disclosure Risk in Microdata and Tabular Data
8
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences and Department of Statistics Seminar
Functional Shape Analysis to Forecast Box-Office Revenue using Virtual Stock Exchanges
13
Wed.
Conducting a 360 Degree Feedback Survey for Managers: Implementing Organization Culture Change
14
Thur.
National Household Travel Survey - Demographic Indicators of Travel Demand
15
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences
Pre-Modeling via Bayesian Additive Regression Trees (BART)
20
Wed.
University of Maryland
Statistics Program Seminar
Rationalizing Momentum Interactions
22
Fri.
Office of Biostatistics Research
National Heart, Lung and Blood Institute
Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies
22
Fri.
Georgetown University Seminar
Resampling from the past to improve on MCMC algorithms
22
Fri.
George Mason University
CDS/CCDS/Statistics Colloquium Series
Knowledge Mining in Health Care
March, 2008
5
Wed.
Generalized Confidence Intervals: Methodology and Applications
6
Thur.
Bringing Statistical Principles to US Elections
6
Thur.
Statistics Can Lie But Can Also Correct for Lies: Reducing Response Bias in NLAAS via Bayesian Imputation
6
Thur.
University of Maryland
Statistics Program Seminar
One-Sided Coverage Intervals for a Proportion Estimated from a Stratified Simple Random Sample
7
Fri.
George Washington University
Department of Statistics Seminar
Bayesian Variable Selection Methods For Class Discovery And Gene Selection
7
Fri.
George Washington University
Department of Decision Sciences and The Institute for Integrating Statistics in Decision Sciences Seminar
Baby at Risk: The Uncertain Legacies of Medical Miracles for Babies, Families and Society
13
Thur.
A Semiparametric Generalization of One-Way ANOVA
13
Thur.
University of Maryland
Statistics Program Seminar
What happens to the location estimator if we minimize with a power other than 2?
28
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences and Department of Statistics Seminar
Non-parametric Continuous Bayesian Belief Nets
28
Fri.
U. S. Census Bureau
Statistical Research Division Seminar
Using Cognitive Predictors for Evaluation
April, 2008
2
Wed.
Studies in Military Medicine from the Center for Data Analysis and Statistics (CDAS) at West Point
8
Tues.
Using the Peters-Belson Method in EEO Personnel Evaluations
11
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences and Department of Statistics Seminar
Computation with Imprecise Probabilities
11
Fri.
Joint Program in Survey Methodology Distinguished Lecture
Survey Design a la carte: Survey Research in the 21st Century
15
Tues.
Assessing Disclosure Risk, and Preventing Disclosure, in Microdata
15
Tues.
U. S. Census Bureau
Statistical Research Division Seminar
Statistical Meta-Analysis - a Review
25
Fri.
George Mason University
CDS/CCDS/Statistics Colloquium Series
Text Mining, Social Networks, and High Dimensional Analysis
25
Fri.
George Washington University
Department of Statistics
Network Sampling with Sampled Networks
25
Fri.
Georgetown University Seminar
Preprocessing in High Throughput Biological Experiments
May, 2008
2
Fri.
Statistical issues in disease surveillance: A case study from ESSENCE
2
Fri.
George Mason University
CDS/CCDS/Statistics Colloquium Series
Some Issues Raised by High Dimensions in Statistics
5
Mon.
Statistical Issues Arising in the Interpretation of a Measure of Relative Disparity Used in Educational Funding: The Zuni School District 89 Case
8
Thur.
U. S. Census Bureau
9th Elders Program Seminar
Different Directorates, Not So Different Approach
13
Tues.
Multivariate Event Detection and Characterization
15
Thur.
President's Invited Seminar
What's Up at the ASA?
16
Fri.
Bayesian Dose-finding Trial Designs for Drug Combinations
June, 2008
10
Tues.
Nonresponse Adjustments in Survey Applications
17
Tues.
Recent Developments in Address-based Sampling
26
Thur.
Multiple Frame Surveys: Lessons from CBECS Experience





Title: Reliability Growth Projection of One-Shot Systems

Abstract:

This paper offers several contributions to the area of discrete reliability growth projection. We present a new, logically derived model for estimating the reliability growth of complex, one-shot systems (i.e., the reliability following implementation of corrective actions to known failure modes). Multiple statistical estimation procedures are used to approximate the model's exact expression. A new estimation method is derived to approximate the vector of failure probabilities associated with a complex, one-shot system. A mathematically convenient functional form for the s-expected initial system reliability of a one-shot system is derived. Monte Carlo simulation results are presented to highlight model accuracy with respect to the resulting estimates of reliability growth. This model is useful to program managers and reliability practitioners who wish to assess one-shot system reliability growth.

Index Terms: One-shot system, projection, reliability growth.

U. S. CENSUS BUREAU
DEMOGRAPHIC STATISTICAL METHODS DIVISION SEMINAR

Title: Alternative Survey Sample Designs, Seminar #2: Sampling with Multiple Overlapping Frames

Abstract:

The Census Bureau's Demographic Survey Sample Redesign Program, among other things, is responsible for research into improving the designs of demographic surveys, with a particular focus on sample design. Historically, research into improving sample design has been restricted to "mainstream" methods such as basic stratification, multistage designs, systematic sampling, probability-proportional-to-size sampling, clustering, and simple random sampling. Over the past thirty years or more, we have increasingly faced lower response rates and higher costs, coupled with increasing demand for more data on all types of populations. More recently, dramatic increases in computing power and the availability of auxiliary data from administrative records suggest that we may have more options than we did when we established our current methodology.

This seminar series is the beginning of an exploration into alternative methods of sampling. In this second seminar of the three-seminar series, from 9:30 to 10:30, Professor Lohr will present her work on the use of multiple overlapping frames for sampling, discussing various alternative approaches and their statistical properties. Following Professor Lohr's presentation, there will be a 10-minute break, and then from 10:40 to 11:30, Professor Jean Opsomer will discuss the methods and their potential in demographic surveys, focusing particularly on their impact on estimation. The seminar will conclude with an open discussion session from 11:30 to 11:45, with 15 additional minutes available if necessary.
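For readers new to multiple-frame designs, the classic dual-frame estimator of Hartley (1962) gives a flavor of the problem; it is one standard approach, not necessarily one the speakers will cover. A unit listed on both frames can be sampled from either, so the overlap-domain totals estimated from each frame's sample are blended with a mixing weight theta. The sketch below assumes design weights (inverse inclusion probabilities) are already attached to each sampled unit; the Unit class, the function name, and the default theta = 0.5 are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    y: float       # survey variable
    weight: float  # design weight in the frame the unit was sampled from
    in_a: bool     # unit is listed on frame A
    in_b: bool     # unit is listed on frame B


def hartley_estimate(sample_a, sample_b, theta=0.5):
    """Hartley-style dual-frame estimate of a population total (illustrative)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    # Horvitz-Thompson domain totals from the frame A sample
    t_a_only = sum(u.weight * u.y for u in sample_a if not u.in_b)
    t_ab_from_a = sum(u.weight * u.y for u in sample_a if u.in_b)
    # Horvitz-Thompson domain totals from the frame B sample
    t_ab_from_b = sum(u.weight * u.y for u in sample_b if u.in_a)
    t_b_only = sum(u.weight * u.y for u in sample_b if not u.in_a)
    # Blend the two estimates of the overlap domain with weight theta
    return t_a_only + theta * t_ab_from_a + (1.0 - theta) * t_ab_from_b + t_b_only
```

Choosing theta is where the designs differ: a fixed value is simplest, while variance-minimizing choices depend on the estimated variances of the two overlap estimates.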

Seminar #3 is currently slated for June 2, 2008, and will feature Professor Yves Tillé of the University of Neuchâtel in Switzerland discussing balanced sampling.

This event is accessible to persons with disabilities. Please direct all requests for sign language interpreting services, Computer Aided Real-time Translation (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Title: Medicaid Underreporting in the CPS: Results from a Record Check Study

Abstract:

The Medicaid program covers roughly 38 million people in the U.S., and the research community regularly studies the effectiveness of the program. Though administrative records provide information on enrollment status and history, the data are 3 years old before they can be used for analysis, and they do not offer information on certain characteristics of Medicaid enrollees, such as their employment status, health status and use of health services. Researchers generally turn to surveys for this type of rich data, and the Current Population Survey (CPS) is one of the most common sources used for analysis. However, there is a fairly substantial literature that indicates Medicaid is underreported in surveys when compared to counts from records. Recently an inter-agency team of researchers was assembled to address the Medicaid undercount issue in the CPS. Records on enrollment in 2000-2001 were compiled from the Medicaid Statistical Information System (MSIS) and matched to the CPS survey data covering the same years. This matched dataset allows researchers to compare data on known Medicaid enrollees to survey data in which those same enrollees were (or were not) reported to have been covered by Medicaid. This kind of "truth source" enables a rich analysis of the respondent and household member characteristics associated with Medicaid misreporting. In the CPS a single household respondent is asked questions about coverage status for all other household members, and one possible source of misreporting is the relationship between the household respondent and the other household members for whom he or she is reporting. Recent research from cognitive testing of the CPS suggests that the household respondent may be more likely to report accurately about another household member if they both share the same coverage. This paper explores whether the hypothesis suggested by cognitive testing is evident in the records data. 
Other variables are also considered, such as recency and duration of coverage and demographics of both respondents and people for whom they are reporting.

Title: Questionnaire Design Guidelines for Establishment Surveys

Abstract:

Previous literature has shown the effects of question wording or visual design on the data provided by respondents. However, few articles have been published that link the effects of question wording and visual design to the development of questionnaire design guidelines. This article proposes specific guidelines for the design of establishment surveys within statistical agencies based on theories regarding communication and visual perception, experimental research on question wording and visual design, and findings from cognitive interviews with establishment survey respondents. The guidelines are applicable to both paper and electronic instruments, and cover such topics as the phrasing of questions, the use of space, the placement and wording of instructions, the design of answer spaces, and matrices.

This talk is an expanded version of a paper given at ICES-3 in Montreal, Quebec, Canada, in June 2007. It represents a collaborative effort with Don A. Dillman (Washington State University) and Leah M. Christian (University of Georgia).

Title: Coverage Measurement for the 2010 Census

Abstract:

For the 2010 Census Coverage Measurement (CCM), we plan to use logistic regression modeling instead of post-stratification cells in the dual system estimation. We believe that logistic regression will let us use more variables than we have in the past, in an effort to minimize the impact of correlation bias and high variances. Logistic regression gives us the option of entering variables in the model as main effects without introducing unnecessary interactions. In addition to potentially accommodating more variables, logistic regression can also use variables in the model as continuous variables. This presentation shows some initial results of using continuous variables in the modeling and dual system estimation.
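As a point of reference, the post-stratified dual system estimate that the logistic approach replaces works cell by cell: multiply the census count by the post-enumeration survey (PES) count, divide by the number of matched people (the Lincoln-Petersen form), and sum over cells. The sketch below is a deliberate simplification under that assumption: it ignores erroneous enumerations, imputations, and survey weights, and the function names are hypothetical. Replacing the cells with modeled, person-level probabilities is what lets continuous variables enter the estimation.

```python
def dual_system_estimate(census_count, pes_count, matched_count):
    """Lincoln-Petersen dual system estimate for a single cell (simplified)."""
    if matched_count <= 0:
        raise ValueError("need at least one match in the cell")
    return census_count * pes_count / matched_count


def post_stratified_dse(cells):
    """Sum of cell-level estimates over (census, pes, matched) post-strata."""
    return sum(dual_system_estimate(c, p, m) for c, p, m in cells)
```

For example, a cell with 100 census enumerations, 80 PES persons, and 40 matches yields an estimate of 200.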

U.S. BUREAU OF CENSUS
STATISTICAL RESEARCH DIVISION SEMINAR

Topic: Extracting Intrinsic Modes in Stationary and Nonstationary Time Series Using Reproducing Kernels and Quadratic Programming

Abstract:

The Empirical Mode Decomposition (EMD) method is a nonlinear adaptive process for stationary and nonstationary time series that produces a finite family of Intrinsic Mode Functions (IMFs), from which many time-frequency properties of the data can be analyzed. The main difficulty in extracting IMFs lies in choosing the convergence criteria for the iterative extraction algorithm, along with the basis in which the IMFs are represented. In this paper, we introduce a new method for extracting intrinsic modes using certain linear combinations of reproducing kernel functions obtained by solving a quadratic programming problem with both equality and inequality constraints. We discuss the advantages of this method of signal extraction compared with the classical EMD approach and present applications to nonstationary time series.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your request via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

U.S. BUREAU OF CENSUS
STATISTICAL RESEARCH DIVISION SEMINAR

Topic: Experiments on the Optimal Design of Complex Survey Questions

Abstract:

Many survey questions are complex in order to convey very specific information to respondents. Often, questions could be written in several different ways to meet specific measurement goals. For example, the information could be compressed into a single complex item or spread out over multiple questions; when single questions are used, the information within them can be structured in various ways; and potentially ambiguous concepts can be illustrated through examples or definitions. Questionnaire designers must also decide whether it is necessary to include certain details in questions, or whether general statements will suffice to explain the topic of interest and adequately stimulate recall. Although questionnaire design principles provide some advice on constructing complex questions, little empirical evidence demonstrates the superiority of certain decisions over others.

This seminar presents results from several rounds of split-ballot experiments that were designed to provide more systematic guidance on these issues. Alternative versions of questions were embedded in RDD telephone surveys (n=450 in one set of experiments and n=425 in another). In some experiments, alternative questions used the same words but were structured differently. Other experiments compared the use of examples and definitions to explain complex concepts, compared the use of one vs. two questions to measure the same phenomenon, and compared questions before and after cognitive interviews had been used to clarify key concepts. With respondent permission, interviews were tape recorded and behavior-coded, making it possible to compare various interviewer and respondent difficulties across question versions, in addition to comparing differences in response distributions.

Taken together, the results begin to suggest some general design principles for complex questions. For example, the disadvantages of presenting information that "dangles" after the question mark are becoming clear, as are the advantages of using multiple questions to disentangle certain complex concepts. The paper will report results of these and other experimental comparisons, with an eye toward providing more systematic questionnaire design guidance.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your request via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

U. S. CENSUS BUREAU
THE WISE ELDERS PROGRAM SEMINAR

Title: A View From The Field

Abstract:

I will talk about some of my experiences and career with the Census Bureau. I will talk a bit about the Field data collection environment and some of the changes that I have observed over the last 40 years. I will also talk a bit about the potential impact of those changes on the Census Bureau and the larger statistical community. I will conclude with some thoughts on the challenges I see ahead for the Census Bureau.

Biography:

In June 2005, James Holmes retired after a distinguished 37-year career with the United States Census Bureau. His position just prior to retirement was Director for the Atlanta Regional Office. During his career, Jim also served as the Director for the Philadelphia Regional Office, Assistant Director for the Los Angeles Regional Office, Assistant Census Manager for the Kansas City Regional Office, as well as other management positions in the Kansas City and Detroit Regions.

In January 1998, Jim was appointed by then-Secretary of Commerce William Daley to serve as Acting Director of the Census Bureau while the search was conducted for a new permanent Director. He is the only African American to have served as Census Bureau Director (acting or otherwise). Jim served in that capacity through October 1998, when Dr. Kenneth Prewitt was sworn in as Director. In the press release announcing Dr. Prewitt's confirmation, Secretary Daley made the following statement:

"I would also like to take this opportunity to thank James Holmes, who has served with distinction as Acting Director of the Census Bureau for most of this year. Jim has done a superb job, successfully guiding the Census Bureau at a very critical time. I salute Jim for his hard work and success. Both Dr. Prewitt and I will continue to turn to him for his trusted insights as the important work of the Census Bureau moves forward."

Important Information:

Please e-mail or call LaVonne Lewis by COB, Friday, February 1, to be placed on the visitors' list: lavonne.m.lewis@census.gov; (301) 763-2118. A photo ID is required for security purposes.

Please direct all requests for Sign Language Interpreting Services, Computer Aided Real-time Translation (CART), or other accommodation needs to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Title: Statistical Analysis of Bullet Lead Compositions as Forensic Evidence

Abstract:

Since the 1960s, the FBI has performed Compositional Analysis of Bullet Lead (CABL), a forensic technique that compares the elemental composition of bullets found at a crime scene with that of bullets found in a suspect's possession. CABL has been used when no gun is recovered, or when bullets are too small or fragmented to compare striations on the casings with those on the gun barrel.

The National Academy of Sciences formed a Committee charged with assessing CABL's scientific validity. The report, "Forensic Analysis: Weighing Bullet Lead Evidence" (National Research Council, 2004), included discussions of the effects of the manufacturing process on the validity of the comparisons, the precision and accuracy of the chemical measurement technique, and the statistical methodology used to compare two bullets and test for a "match". The report has been cited in recent appeals brought by defendants whose trials involved bullet lead evidence (60 Minutes, 11/18/2007; Washington Post, 11/18-19/2007). This talk will focus on the statistical analysis: the FBI's methods of testing for a "match", the apparent false positive and false negative rates, the FBI's clustering algorithm ("chaining"), and the Committee's recommendations. Additional analyses of data made available later, as well as the use of forensic evidence in general, will also be discussed.


Title: Reducing Disclosure Risk in Microdata and Tabular Data

Abstracts:

Analytically Valid Discrete Data Files and Re-identification (Winkler)

With the exception of synthetic data (e.g., Reiter 2002, 2005) and a few other methods (Kim 1986, Dandekar, Cohen, and Kirkendal 2002), masking methods and resultant public-use files are seldom justified in terms of valid analytic properties. If a file has valid analytic properties, then the analytic characteristics can be used as a starting point for re-identification using analytic methods only (Lambert 1993, Fienberg 1997). In this paper, we describe a general method for building a synthetic data file having valid analytic properties. If we use general modeling/edit/imputation methods (Winkler 2007a, 2007b, 2008) that allow additional convex constraints, then we can create synthetic data with nearly identical analytic properties and with significantly reduced re-identification risk.

Comparative Evaluation of Seven Different Sensitive Tabular Data Protection Methods Using a Real Life Table Structure of Complex Hierarchies and Links (Dandekar)

Practitioners of tabular data protection methods in federal statistical agencies have some familiarity with commonly used table structures. However, they need guidance on how to evaluate the appropriateness of various sensitive tabular data protection methods when applied to their own table structures. With that in mind, we use a real-life "typical" table structure of moderate hierarchical and linked complexity and populate it with synthetic microdata to evaluate the relative performance of seven different tabular data protection methods. The methods selected for the evaluation are: 1) LP-based classical cell suppression; 2) LP-based CTA (Dandekar 2001); 3) network flow-based cell suppression as implemented in DiAna, a software product made available to other federal statistical agencies by the US Census Bureau; 4) a microdata-level noise addition method documented in a US Census Bureau research paper; 5) a hybrid EM/IPF-based CTA method; 6) a simplified CTA method; 7) a conventional rounding-based method.

GEORGE WASHINGTON UNIVERSITY
THE INSTITUTE FOR INTEGRATING STATISTICS IN DECISION SCIENCES AND DEPARTMENT OF STATISTICS SEMINAR

Title: Functional Shape Analysis to Forecast Box-Office Revenue using Virtual Stock Exchanges

Abstract:

In this paper we propose a novel model for forecasting innovation success based on online virtual stock markets. In recent years, online virtual stock markets have been increasingly used as an economical and efficient information-gathering tool for the online community. They have been used to forecast events ranging from presidential elections to sporting events and have been applied by major corporations such as HP and Google for internal forecasting. In this study, we demonstrate the predictive power of online virtual stock markets, compared with several conventional methods, in forecasting demand for innovations in the context of the motion picture industry. In particular, we forecast the release-weekend box office performance of movies, which serves as an important planning tool for allocating marketing resources, determining optimal release timing and advertising strategies, and coordinating production and distribution for different movies. We accomplish this forecasting task using novel statistical methodology from the area of functional data analysis. Specifically, we develop a forecasting model that uses the entire trading path rather than only its final value. We also employ trading dynamics and tease out differences between trading paths using functional shape analysis. Our results show that the model has strong predictive power and improves tremendously over competing approaches.

Title: Conducting a 360 Degree Feedback Survey for Managers: Implementing Organization Culture Change

Abstract:

A nationwide 360 Degree Feedback (Multi-Source) Pilot for Managers was conducted in a federal agency. The multi-source work group conducted an online survey asking managers to rate themselves on fifty-two questions addressing important managerial behaviors (e.g., strategic planning skills). About 500 managers participated in the study. In addition to the managers' self-ratings, the managers' supervisors, peers, and employees also rated them. While the pilot work group did not have access to the survey data, a number of lessons were learned about administering a multi-source feedback pilot for managers. The discussion will focus on the survey methodology and lessons learned.

e-mail: esrodela@cox.net

Title: National Household Travel Survey - Demographic Indicators of Travel Demand

Abstract:

Since 1969, the National Household Travel Survey (NHTS) has collected information about the U.S. population's daily travel behavior. The NHTS connects detailed trip characteristics to vehicle information, geography, and household and person demographic data. The study is designed primarily to obtain the behavioral data on travel demand needed for performance measurement, policy analysis, and program development and prioritization.

This presentation will provide an overview of the information included in the NHTS and of how demographic information plays an important role in assessing programs and policies and in forecasting future demand. The discussion will include an overview of methods, emerging challenges, and example applications of the data, including integration with other data sources such as the ACS.

GEORGE WASHINGTON UNIVERSITY
THE INSTITUTE FOR INTEGRATING STATISTICS IN DECISION SCIENCES

Title: Pre-Modeling via Bayesian Additive Regression Trees (BART)

Abstract:

Consider the canonical regression setup where one wants to learn about the relationship between y, a variable of interest, and x1...xp, potential predictor variables. Although one may ultimately want to build a parametric model to describe and summarize this relationship, preliminary analysis via flexible nonparametric models may provide useful guidance. For this purpose, we propose BART (Bayesian Additive Regression Trees), a flexible nonparametric ensemble Bayes approach for estimating f(x1...xp) ≡ E(Y|x1...xp), for obtaining predictive regions for future y, for describing the marginal effects of subsets of x1...xp, and for model-free variable selection. Essentially, BART approximates f by a Bayesian "sum-of-trees" model where fitting and inference are accomplished via an iterative backfitting MCMC algorithm. By using a large number of trees, which yields a redundant basis for f, BART is seen to be remarkably effective at finding highly nonlinear relationships hidden within a large number of irrelevant potential predictors. BART also provides an omnibus test: the absence of any relationship between y and any subset of x1...xp is indicated when BART posterior intervals for f reveal no signal.

This is joint work with Hugh Chipman and Robert McCulloch.

THE UNIVERSITY OF MARYLAND
STATISTICS PROGRAM SEMINAR

Title: Rationalizing Momentum Interactions

Abstract:

Momentum profitability concentrates in firms with high information uncertainty and high credit risk and is virtually nonexistent otherwise. This paper rationalizes such momentum interactions in equilibrium asset pricing. In our paradigm, dividend growth is mean reverting, expected dividend growth is persistent, the representative agent is endowed with the stochastic differential utility of Duffie and Epstein (1992), and leverage, which proxies for credit risk, is modeled following Abel's (1999) formulation. Using reasonable risk aversion levels, we are able to reproduce the observed momentum effects. In particular, momentum profitability is especially large among firms that are both highly levered and have risky cash flows. It rapidly deteriorates and ultimately disappears as leverage or cash flow risk diminishes.

Please check for seminar updates at: http://www.math.umd.edu/statistics/seminar.shtml

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml

BIOSTATISTICS SEMINAR
OFFICE OF BIOSTATISTICS RESEARCH
NATIONAL HEART, LUNG AND BLOOD INSTITUTE

Title: Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies

Abstract:

Some case-control genome-wide association studies (GWASs) select promising single nucleotide polymorphisms (SNPs) by ranking corresponding p-values, rather than by applying the same p-value threshold to each SNP. We define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected", namely have one of the top T largest chi-square values (or smallest p-values) for trend tests of association among the N (~ 500,000) SNPs studied. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, both for fixed and for random effects models of genetic risk. For a genetic odds ratio per disease allele of 1.2 or less, even a GWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. These results for one-stage designs have implications for two- and multi-stage designs. In particular, a large fraction of the available cases and controls usually must be studied in the first stage if the study is to have adequate DP.
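The detection probability defined above can be approximated by simulation. The sketch below assumes the disease-associated SNP's trend statistic is a 1-df noncentral chi-square with a user-supplied noncentrality parameter ncp (in practice ncp would be derived from the odds ratio, allele frequency, and numbers of cases and controls; that mapping is not shown), and that the N - 1 null statistics are independent central chi-squares. Instead of simulating all N statistics, it conditions on the associated SNP's value x and uses a Poisson approximation to the probability that at most T - 1 null SNPs exceed x. This is an illustrative approach, not the authors' method, and the function name and defaults are assumptions.

```python
import math

import numpy as np


def detection_probability(ncp, n_snps=500_000, top_t=100, n_draws=20_000, seed=0):
    """Monte Carlo estimate of DP = P(the associated SNP ranks in the top T)."""
    rng = np.random.default_rng(seed)
    # 1-df noncentral chi-square for the associated SNP: (Z + sqrt(ncp))^2
    x = (rng.standard_normal(n_draws) + math.sqrt(ncp)) ** 2
    # Tail of a central chi-square(1): P(chi2_1 > x) = erfc(sqrt(x / 2))
    p_exceed = np.array([math.erfc(math.sqrt(v / 2.0)) for v in x])
    # Expected number of the N - 1 null SNPs that beat the associated SNP
    mu = (n_snps - 1) * p_exceed
    # Poisson approximation to P(Binomial(N - 1, p) <= T - 1)
    term = np.exp(-mu)
    cdf = term.copy()
    for k in range(1, top_t):
        term = term * mu / k
        cdf += term
    # Average over draws of the associated SNP's statistic
    return float(cdf.mean())
```

Comparing, say, detection_probability(5.0) with detection_probability(30.0) shows how sharply DP depends on effect size for a fixed T, which is the tradeoff the abstract describes.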

This is the joint work of Mitchell H. Gail, Ruth M. Pfeiffer, William Wheeler and David Pee.

GEORGETOWN UNIVERSITY SEMINAR

Title: Resampling from the past to improve on MCMC algorithms

Abstract:

Markov chain Monte Carlo (MCMC) methods provide a very general and flexible approach to sampling from arbitrary probability distributions. MCMC has considerably expanded the capability of statistics to deal with more realistic models. But designing MCMC samplers with good mixing properties is often tedious and involves much trial and error. This talk will explore various ideas in which sample paths are reused to build more adaptive and automatic MCMC samplers. I will discuss the mixing of these new samplers theoretically and through examples.

Technical Report available at:
http://www.stat.lsa.umich.edu/~yvesa/eprop.pdf

GEORGE MASON UNIVERSITY
CDS/CCDS/STATISTICS COLLOQUIUM SERIES

Title: Knowledge Mining in Health Care

Abstract:

Knowledge mining concerns discovering knowledge that is useful and understandable to people. Unlike traditional data mining, it is concerned not only with discovering useful patterns in large volumes of data but also with discovering them in small datasets that can be deficient, and it makes extensive use of background knowledge. This talk presents an approach to knowledge mining developed at the GMU Machine Learning and Inference Laboratory, its relation to health care, and some applications in this area.

Title: Generalized Confidence Intervals: Methodology and Applications

Abstract:

The concept of generalized confidence intervals is fairly recent and is useful for obtaining confidence intervals for certain complicated parametric functions. The usual confidence intervals are derived using the percentiles of a pivotal quantity. Generalized confidence intervals are derived from a generalized pivotal quantity, which is a function of a random variable, its observed value, and the parameters. In the talk, I will explain the construction of a generalized pivotal quantity and describe the conditions it must satisfy. I will then discuss a series of applications of the generalized confidence interval methodology to somewhat complicated problems: confidence intervals for (i) the lognormal mean, (ii) the lognormal variance, (iii) the mean and variance of limited and truncated normal as well as lognormal distributions, and (iv) some problems involving random effects models. In each case, I will motivate the problem with specific applications and illustrate the results with an analysis of the relevant data. Attractive features of generalized confidence intervals are that they are easy to compute and exhibit excellent performance even for small sample sizes. We will also comment on situations in which the normality assumption does not hold.
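For application (i), a commonly cited generalized pivotal quantity for the lognormal mean exp(mu + sigma^2/2) combines a pivot for mu with one for sigma^2 built from the same chi-square draw. The Monte Carlo sketch below assumes the data are positive and lognormal; the number of draws and the function name are illustrative choices, not the speaker's code.

```python
import math
import random
import statistics

def gci_lognormal_mean(data, level=0.95, n_draws=20000, seed=0):
    """Generalized confidence interval for exp(mu + sigma^2 / 2)."""
    rng = random.Random(seed)
    logs = [math.log(v) for v in data]
    n = len(logs)
    ybar = statistics.fmean(logs)
    s2 = statistics.variance(logs)      # sample variance, n - 1 divisor
    draws = []
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        u2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square, n - 1 df
        # GPQ for mu: ybar - Z / sqrt(U^2 / (n - 1)) * s / sqrt(n)
        r_mu = ybar - z * math.sqrt(s2 / n) / math.sqrt(u2 / (n - 1))
        # GPQ for sigma^2: (n - 1) s^2 / U^2, sharing the same U^2
        r_sig2 = (n - 1) * s2 / u2
        draws.append(math.exp(r_mu + r_sig2 / 2.0))
    draws.sort()
    alpha = 1.0 - level
    lo = draws[int(alpha / 2 * n_draws)]
    hi = draws[int((1 - alpha / 2) * n_draws) - 1]
    return lo, hi
```

The interval endpoints are simply percentiles of the simulated pivot, which is why such intervals are easy to compute.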

Title: Bringing Statistical Principles to US Elections

Abstract:

Members of the ASA Special Interest Group on Volunteerism and the ASA Scientific and Public Affairs Advisory Committee have been actively working on election-related issues. In almost every election, vote counts appear to be off in some measurable way in some precinct. The most recent example is the November 2006 result in the 13th district of Florida, where the undervote, apparently due to poor ballot design, appears to have changed the election outcome. These incidents provide interesting discussions for statisticians and survey methodologists, but the more important result is that they undermine confidence in the electoral process. Electronic vote tally miscounts arise for many reasons, including hardware malfunctions, unintentional programming errors, malicious tampering, or stray ballot marks that interfere with correct counting. Thus, Congress and several states are considering requiring audits to compare machine tabulations with hand counts of paper ballots in randomly chosen precincts. This session will describe some of the analyses that have been used to indicate potential problems. It will also describe work that ASA members have been doing in conjunction with election activists to bring statistical principles to the procedures for sampling precincts for post-election audits of election results.
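A basic calculation behind such audits, sketched here under the assumption of simple random sampling of precincts and a hypothesized number of miscounted precincts, is the chance that the audit sample contains at least one bad precinct. This is a hypergeometric probability; the function names and the 95% target are illustrative, not taken from any specific proposal.

```python
from math import comb

def detection_prob(n_precincts, n_bad, n_audit):
    """P(a simple random sample of n_audit precincts contains at least
    one of n_bad miscounted precincts); a hypergeometric calculation."""
    return 1.0 - comb(n_precincts - n_bad, n_audit) / comb(n_precincts, n_audit)

def audit_size(n_precincts, n_bad, target=0.95):
    """Smallest audit size whose detection probability reaches target."""
    for n in range(1, n_precincts + 1):
        if detection_prob(n_precincts, n_bad, n) >= target:
            return n
    return n_precincts
```

The required audit size depends strongly on how many precincts one assumes could be corrupted, which in practice is tied to the reported margin.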

Title: Statistics Can Lie But Can Also Correct for Lies: Reducing Response Bias in NLAAS via Bayesian Imputation

Abstract:

This talk is based on joint work with Liu, Chen and Alegria, which has the same title. The National Latino and Asian American Study (NLAAS) is a multi-million-dollar survey of psychiatric epidemiology, the most comprehensive survey of its kind. Data from the NLAAS were made public in July 2007. A unique feature of the NLAAS is its embedded experiments for estimating the effect of alternative orderings of interview questions. Although the findings from the experiments were not completely unexpected, the magnitudes of the effects were nevertheless astonishing. Compared with survey results from the widely used traditional ordering, the self-reported psychiatric service-use rates often doubled or even tripled under the new, more sensible ordering introduced by the NLAAS. These findings partially answer some perplexing questions in the literature, e.g., why the self-reported rates of using religious services were typically much lower than results from other sources of empirical evidence. At the same time, however, these new insights come at a price. For example, how can one assess racial disparities when different races were surveyed with different survey instruments (e.g., the existing data on white populations were collected using the traditional questionnaire ordering), now that these instruments are known to induce substantial differences? The project documented in this paper is part of the effort to address these questions. We do this by creating models to impute the responses that respondents under the traditional survey would have given had they not been able to take advantage of skip patterns to reduce interview time. The ability to skip large numbers of questions resulted in increased rates of untruthful negative responses over the course of the interview.
The task of modeling the imputation is particularly challenging because of the complexity of the questionnaire, the small sample sizes for subgroups of interest, the existence of high-order interactions among variables, and above all, the need to provide sensible imputations for whatever subpopulation a future user might be interested in studying. This paper is intended to serve three purposes: (1) to provide a published record of the key steps and strategies adopted in creating the released multiple imputation for NLAAS, (2) to alert potential users to the limitations of the imputed data, and (3) to provide a vivid demonstration of the type of challenges and opportunities typically encountered in modern applied statistics.
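Users of a multiply imputed file typically analyze each completed dataset separately and pool the results with Rubin's combining rules. A minimal sketch for a scalar estimate follows; it assumes m completed-data estimates and their variances are already in hand, and it omits the degrees of freedom for the reference t distribution.

```python
import statistics

def rubin_combine(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules."""
    m = len(estimates)
    qbar = statistics.fmean(estimates)     # combined point estimate
    ubar = statistics.fmean(variances)     # average within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    total = ubar + (1 + 1 / m) * b         # total variance of qbar
    return qbar, total
```

The between-imputation term is what carries the extra uncertainty due to imputation, including the model-based uncertainty the abstract warns users about.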

THE UNIVERSITY OF MARYLAND
STATISTICS PROGRAM SEMINAR

Title: One-Sided Coverage Intervals for a Proportion Estimated from a Stratified Simple Random Sample

Abstract:

It is well known that Wald confidence intervals for a proportion calculated from a simple random sample often perform poorly. Hall (1982) showed how translating such an interval towards 1/2 could markedly improve its one-sided coverage properties. While extending Hall's methodology to a proportion estimated from a sample drawn independently within a number of strata having (perhaps) differing means, we discovered some surprising things. First, a simple modification of Hall's method is more effective than the more complicated 'second-order' correction proposed by Cai (2004). Second, the heart of the method lies less in using an Edgeworth expansion to account for the skewness of the sample proportion, as Hall (and Cai) argued, and more in replacing the standard estimated variance in the denominator of the Wald pivotal with a more efficient, but not directly calculable, expression. We investigated two choices for this expression under stratified sampling. One allows the strata to have differing means; as a consequence, the expression itself has a variance. This suggested replacing the normal z-score with a t-score when constructing the interval. Our last surprise was the realization that the methodology extends not only to more complicated sampling designs but also to more complicated estimands.
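For reference, the baseline being improved on is the plain one-sided Wald bound for a stratified proportion. The sketch below computes only that baseline, assuming simple random sampling within strata, known stratum weights, and no finite population correction; the speakers' Hall-type modification is not reproduced.

```python
import math
from statistics import NormalDist

def stratified_wald_upper(counts, sizes, weights, level=0.95):
    """One-sided upper Wald bound for a stratified proportion.

    counts[h]  : successes in stratum h
    sizes[h]   : sample size n_h drawn by SRS within stratum h
    weights[h] : population share W_h of stratum h (sums to 1)
    """
    p_hat = sum(w * c / n for c, n, w in zip(counts, sizes, weights))
    # Standard plug-in variance: sum of W_h^2 p_h (1 - p_h) / n_h
    var = sum(w ** 2 * (c / n) * (1 - c / n) / n
              for c, n, w in zip(counts, sizes, weights))
    z = NormalDist().inv_cdf(level)
    return p_hat, p_hat + z * math.sqrt(var)
```

When every stratum has zero or all successes, the plug-in variance is zero and the bound collapses to the point estimate, one symptom of the poor Wald behavior noted above.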

Please check for seminar updates at: http://www.math.umd.edu/statistics/seminar.shtml

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml

GEORGE WASHINGTON UNIVERSITY
THE INSTITUTE FOR INTEGRATING STATISTICS IN DECISION SCIENCES AND DEPARTMENT OF STATISTICS SEMINAR

Title: Bayesian Variable Selection Methods For Class Discovery And Gene Selection

Abstract:

For various malignancies, currently used diagnostic approaches tend to be too broad in their classification. Patients who receive the same diagnosis often follow significantly different clinical courses and respond differently to therapy. It is believed that gene expression profiles may better capture disease heterogeneities. This calls for methods that uncover cluster structure among tissue samples and identify genes with distinctive expression patterns. I will present some Bayesian methods we have proposed that provide a unified approach to address these problems simultaneously. Model-based clustering is used to uncover the cluster structure and a stochastic search variable selection method is built into the model to identify discriminating genes. We let the number of clusters be unknown and adopt two different approaches. One consists of formulating the clustering problem in terms of finite mixture models with an unknown number of components and uses a reversible jump MCMC technique. The second approach uses infinite mixture models via Dirichlet process mixture priors. We illustrate the methods with applications to gene expression microarray data.
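To see how a Dirichlet process prior lets the number of clusters remain unknown, it helps to look at the partition it induces, the Chinese restaurant process. The sketch below draws cluster labels from that prior only; the concentration value and function name are illustrative, and the full mixture model with gene selection is not shown.

```python
import random

def crp_partition(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process,
    the partition implied by a Dirichlet process with concentration alpha."""
    rng = random.Random(seed)
    labels, sizes = [], []
    for i in range(n_items):
        # Join cluster k w.p. sizes[k] / (i + alpha); new w.p. alpha / (i + alpha).
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if r < acc:
                labels.append(k)
                sizes[k] += 1
                break
        else:
            labels.append(len(sizes))  # open a new cluster
            sizes.append(1)
    return labels
```

Larger alpha favors more clusters; the number actually used is learned from the data when this prior is combined with a likelihood.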

For a complete listing of our current seminars, visit http://www.gwu.edu/~stat/seminar.htm.

NATIONAL CANCER INSTITUTE
BIOSTATISTICS BRANCH SEMINAR

Title: Disparities in Defining Disparities: Statistical Conceptual Frameworks

Abstract:

This talk is based on joint work with Naihua Duan, Julia Y. Lin, Chih-nan Chen, and Margarita Alegria (Statistics in Medicine, to appear) with the same title and the following abstract. "Motivated by the need to meaningfully implement the Institute of Medicine's (IOM's) definition of health care disparity, this paper proposes statistical frameworks that lay out explicitly the needed causal assumptions for defining disparity measures. Our key emphasis is that a scientifically defensible disparity measure must take into account the direction of the causal relationship between allowable covariates that are not considered to be contributors to disparity and non-allowable covariates that are considered to be contributors to disparity, to avoid flawed disparity measures based on implausible populations that are not relevant for clinical or policy decisions. However, these causal relationships are usually unknown and undetectable from observed data. Consequently, we must make strong causal assumptions in order to proceed. Two frameworks are proposed in this paper: one is the conditional disparity framework under the assumption that allowable covariates impact non-allowable covariates but not vice versa. The other is the marginal disparity framework under the assumption that non-allowable covariates impact allowable ones but not vice versa. We establish theoretical conditions under which the two disparity measures are the same, and present a theoretical example showing that the difference between the two disparity measures can be arbitrarily large. Using data from the Collaborative Psychiatric Epidemiology Survey, we also provide an example where the conditional disparity is misled by Simpson's paradox, while the marginal disparity approach handles it correctly."

GEORGE WASHINGTON UNIVERSITY
DEPARTMENT OF DECISION SCIENCES AND
THE INSTITUTE FOR INTEGRATING STATISTICS IN DECISION SCIENCES SEMINAR

Title: Baby at Risk: The Uncertain Legacies of Medical Miracles for Babies, Families and Society

Abstract:

Seven years ago, I became interested in how decisions are made for babies who are born at risk. These babies are sick at birth, or are born with genetic anomalies, or are born too early. The latter group (premature babies, or preemies) has been growing in number each year for the past 25 years; currently, 500,000 preemies are born each year in the United States alone. The problems for these babies and their families can be medical, social, financial, and legal. The consequences of their premature births and illnesses can be short-lived or lifelong. The children and their families may have financial, social, medical, educational, psychological, and legal needs. The lives of these children affect everyone, not just the babies and their families and those who care for them in the hospital and afterward. They live in the contexts of their families and their communities, and few communities (either local or state or federal) have adequately prepared for their complex and resource-demanding lives.

In 2006, I wrote the book whose themes I will be discussing: Baby at Risk: The Uncertain Legacies of Medical Miracles for Babies, Families and Society. I interviewed staff members of neonatal intensive care units, families whose babies had done well or had not, and many others. The parents are always young (that is, young enough to have babies) and typically have had little or no experience facing a medical ethics dilemma. They have no sense of the long-term outcomes for their newborn babies, and they are making decisions in a highly emotionally charged climate.

I will describe the roles of the therapeutic imperative and the technological imperative in decision making, the moral distress of nursing and medical staff members who care for these babies, and various other themes that I address in the book. I will talk about how medical and nursing staff members, women and their partners, community members, and policy makers might become better educated about what is medically appropriate and what is not. I will also discuss the role of the media (who have caused huge problems by hyping stories of "miracle babies") in raising expectations about what medicine and science can do. Many medical decisions today are also ethics decisions, and it is time for American society to grasp this concept and then more proactively help families whose babies are born at risk.

Title: A Semiparametric Generalization of One-Way ANOVA

Abstract:

Under the classical one-way ANOVA, with normal data and equal variances, the problem is to test the equality of means. Then, under the hypothesis of normality, the problem reduces to testing equality of distributions. By relaxing the normal assumption, we show how to test for equi-distribution directly and obtain tests that rival the usual t and F tests. The key idea is to "tilt" a reference distribution. This provides estimates for all the distributions from which we have data, using a modified kernel density estimate which is superior to the traditional kernel estimate. The attractive feature of the semiparametric generalization is that it provides BOTH powerful tests and graphical displays of all the estimated distributions. This will be demonstrated using gene expression data. The "tilting" idea has numerous other statistical applications. We shall briefly outline several recent applications.
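A two-sample version of the tilting idea is the density ratio model g1(x) = exp(alpha + beta x) g0(x), and one standard way to fit it is logistic regression of the sample label on x. The sketch below assumes a linear tilt h(x) = x and uses plain Newton iterations; it is an illustration of the general technique, not the speaker's software or full k-sample procedure.

```python
import math

def fit_tilt(x0, x1, iters=25):
    """Fit g1(x)/g0(x) = exp(alpha + beta * x) by logistic regression of
    the label (1 for sample x1, 0 for sample x0) on x via Newton's method."""
    xs = x0 + x1
    ys = [0] * len(x0) + [1] * len(x1)
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r, w = y - p, p * (1.0 - p)
            g0 += r            # score for the intercept
            g1 += r * x        # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # Newton step: H^{-1} g
        b += (-h01 * g0 + h00 * g1) / det
    # The logistic intercept absorbs log(n1 / n0); remove it to get alpha.
    alpha = a - math.log(len(x1) / len(x0))
    return alpha, b
```

Testing beta = 0 is then a test of equal distributions; for N(0, 1) versus N(1, 1) samples the true tilt is alpha = -0.5, beta = 1.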

THE UNIVERSITY OF MARYLAND
STATISTICS PROGRAM SEMINAR

Title: What happens to the location estimator if we minimize with a power other than 2?

Abstract:

The location estimator, the value m that minimizes the sum of |x_i - m|^p, forms a path as the power p varies from 1 to infinity. This path indicates how critical the choice of exponent is. An alternative proof of Descartes' rule of signs, applied to exponential sums, limits the number of repeated exponents for the same minimum point with usual data sets. One of several bounds on this path is that it stays among the averages of pairs of data points.

Reference: Robert J. Blodgett, The Path of the Minimum Lp-Norm estimator for p Between 1 and infinity, Commun. in Statistics-Theory and Methods, 36, 2007, pp. 2829-2839.
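The path can be traced numerically. Since the sum of |x_i - m|^p is convex in m for p >= 1, a ternary search over [min, max] finds the minimizer; the sketch below is an illustration, not the method used in the reference. It recovers the median at p = 1, the mean at p = 2, and approaches the midrange as p grows.

```python
def lp_location(data, p, iters=200):
    """Minimize sum |x_i - m|**p over m (p >= 1) by ternary search; the
    objective is convex in m, so the minimizer lies in [min, max]."""
    lo, hi = min(data), max(data)
    f = lambda m: sum(abs(x - m) ** p for x in data)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2   # minimum lies left of m2
        else:
            lo = m1   # minimum lies right of m1
    return (lo + hi) / 2
```

For very large p the powers can overflow floating point, so rescaling the data first is advisable.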

Please check for seminar updates at: http://www.math.umd.edu/statistics/seminar.shtml

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml

GEORGE WASHINGTON UNIVERSITY
THE INSTITUTE FOR INTEGRATING STATISTICS IN DECISION SCIENCES AND DEPARTMENT OF STATISTICS SEMINAR

Title: Non-parametric Continuous Bayesian Belief Nets

Abstract:

Bayesian Belief Nets (BBNs) enjoy wide popularity in Europe as a decision support tool. The main attraction is that the directed acyclic graph provides a graphical model in which the problem owner recognizes his problem and which at the same time is a user interface for running and updating the model. Discrete BBNs support rapid updating algorithms, but they involve exponential complexity that limits their use to toy problems. Continuous BBNs hold more promise. To date, only 'discrete normal' BBNs have been available. The user specifies a mean and conditional variance for each node, and the child nodes are regressed on their parents. Continuous nodes can have discrete parents but not discrete children, and all continuous nodes are normal. Overcoming the restriction to normality has opened new areas for applications. A large risk model for Schiphol airport involving some 300 probabilistic nodes and 300 functional nodes will be demonstrated. Updating is facilitated by the use of the 'normal copula'. This type of BBN can be used either in a probabilistic modeling mode (the user supplies distributions) or in a data mining mode (a BBN is built to model multivariate data). The latter application will be demonstrated using fine particulate emission and collector data.

U.S. BUREAU OF CENSUS
STATISTICAL RESEARCH DIVISION SEMINAR

Topic: Using Cognitive Predictors for Evaluation

Abstract:

This research examines job/task knowledge as a mechanism through which cognitive ability affects performance. Further, two types of job knowledge tests are compared to a cognitive ability test for their efficacy in performance prediction. The knowledge tests differ in the methods used for their development and in the resulting types of information that are assessed. More specifically, one test, developed with 'traditional' methods used in Industrial/Organizational Psychology, assesses Basic knowledge about how to complete the task. The second, more 'Cognitively Oriented' test focuses on assessing understanding of how task knowledge is applied for successful task completion. Results demonstrate that the Cognitively-Oriented test, i.e., the test of 'Understanding', accounts for significantly more variance in performance than the cognitive ability test, completely mediates cognitive ability effects on performance, and predicts performance more fairly than the test of Basic knowledge. These results have clear implications for selection, for training development and assessment, and for display design and evaluation of the effectiveness of the tools (displays) being used. For example, the Cognitively-Oriented test provides a method for evaluating the amount of knowledge an individual has acquired after receiving training on the use of a particular device, and it allows us to evaluate the level of task 'understanding' provided by the use of the device itself.

Please direct all requests for Sign Language Interpreting Services, Computer Aided Real-time (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Title: Studies in Military Medicine from the Center for Data Analysis and Statistics (CDAS) at West Point

Abstract:

The importance of maintaining and improving the health and fitness of soldiers in the Army has always been high. The stresses of combat for an Army at war have made concerns in this area even greater and highlighted new areas where improvements are necessary. The military medical community has responded with new treatment ideas that have resulted in studies that will both contribute to efforts on behalf of our soldiers and impact medical practices more generally. The Center for Data Analysis and Statistics (CDAS) has been involved in several of these studies in support of Walter Reed, Beaumont Army Medical Center and Keller Army Community Hospital. We will discuss several of these studies and their results, including Leishmania detection, ACL repair, air casts, LASEK surgery, incidence rates for injuries among different demographics, lumbar support for air crews, and medical leadership.

Title: Using the Peters-Belson Method in EEO Personnel Evaluations

Abstract:

The Peters-Belson method was developed to examine wage discrimination using linear regression analysis. In application, one fits a regression to the favored class and applies it to the non-favored class to identify a disparity between the actual and predicted values. Recently, the method was extended via logistic regression to examine health care disparities and other forms of discrimination with binary outcomes. In this paper, we will examine the general properties of the method in personnel hiring discrimination evaluations, as compared with a standard regression analysis, in relation to the size of the applicant pool, the differences in traits between favored and non-favored class members, and the employer's uniform consideration of factors by class. We will also discuss some of the philosophical and legal issues from selected court cases surrounding the use of this approach relative to a standard regression analysis, and the methodology for applying a jackknife variance estimator to measure the statistical precision of the disparities.
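The basic Peters-Belson calculation described above can be sketched with one covariate and ordinary least squares. The (x, y) pairs (e.g., years of experience and wage) are hypothetical inputs, and the jackknife variance step mentioned in the abstract is not shown.

```python
import statistics

def peters_belson(favored, nonfavored):
    """Peters-Belson disparity with one covariate.

    favored, nonfavored: lists of (x, y) pairs. Fits y = a + b x by least
    squares on the favored class only, predicts the non-favored class,
    and returns the mean of (actual - predicted).
    """
    xs = [x for x, _ in favored]
    ys = [y for _, y in favored]
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    residuals = [y - (a + b * x) for x, y in nonfavored]
    return statistics.fmean(residuals)
```

A negative value means non-favored class members receive less, on average, than the favored-class model predicts for people with their characteristics.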

Title: Two-Sample Rank Tests for Treatment Effectiveness When Death and Censoring Depend on Covariates

Abstract:

Popular two-sample rank tests for treatment effectiveness in clinical trials rely on the independence of death and censoring, yet there are often baseline covariates on which survival and potentially also censoring may depend. Suppose that survival T and censoring C are conditionally independent given covariates V, and that treatment-allocation Z is independent of V. In a paper of Slud and Kong (Biometrika 1997), an assumption was introduced [essentially, that the conditional survival function for censoring is the sum of a function of (Z,t) and another function of (V,t)] under which the usual logrank test was shown to be consistent. But this assumption is not fully general, and DiRienzo and Lagakos (2001, papers in Biometrika and JRSSB) proposed a bias-correcting weighting for the logrank and studied its performance. In this talk, some theoretical results are presented on asymptotic validity of tests based on an estimated form of their weighting function. Simulations will show the good performance of these methods, and theoretical calculations show that these weight-adjustments cannot simply be ignored.
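For orientation, the unweighted logrank statistic that the talk starts from can be computed as below. This sketch does not implement the DiRienzo-Lagakos covariate-based weighting; it is the standard statistic whose validity relies on independent censoring.

```python
import math

def logrank(times, events, groups):
    """Standardized two-sample logrank statistic (O1 - E1) / sqrt(V).

    times  : observed times (death or censoring)
    events : 1 if death observed, 0 if censored
    groups : 0 or 1 (treatment arm Z)
    """
    data = sorted(zip(times, events, groups))
    o_minus_e, var = 0.0, 0.0
    at_risk = len(data)
    at_risk1 = sum(g for _, _, g in data)
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = n_leave = n_leave1 = 0
        j = i
        while j < len(data) and data[j][0] == t:  # everyone at time t
            _, e, g = data[j]
            d += e
            d1 += e * g
            n_leave += 1
            n_leave1 += g
            j += 1
        if d > 0:
            n, n1 = at_risk, at_risk1
            o_minus_e += d1 - d * n1 / n          # observed minus expected
            if n > 1:                             # hypergeometric variance
                var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        at_risk -= n_leave
        at_risk1 -= n_leave1
        i = j
    return o_minus_e / math.sqrt(var)
```

When censoring depends on covariates that also affect survival, this statistic can be biased, which is the motivation for the weighting studied in the talk.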

For current and future OBR seminar series, please contact: Gang Zheng (zhengg@nhlbi.nih.gov) or Jungnam Joo (jooj@nhlbi.nih.gov).

GEORGE WASHINGTON UNIVERSITY
THE INSTITUTE FOR INTEGRATING STATISTICS IN DECISION SCIENCES AND DEPARTMENT OF STATISTICS SEMINAR

Title: Computation with Imprecise Probabilities

Abstract:

Computation with imprecise probabilities is not an academic exercise; it is a bridge to reality. In the real world, imprecision of probabilities is the norm rather than the exception. In large measure, real-world probabilities are perceptions of likelihood. Perceptions are intrinsically imprecise. Imprecision of perceptions entails imprecision of probabilities. In applications of probability theory it is common practice to ignore imprecision of probabilities and treat imprecise probabilities as if they were precise. A problem with this practice is that it leads to results whose validity is open to question. Publication of Peter Walley's seminal work "Statistical Reasoning with Imprecise Probabilities," in 1991, sparked a rapid growth of interest in imprecise probabilities. Today, there is a substantive literature. The approach described in this lecture is a radical departure from the mainstream. First, imprecise probabilities are dealt with not in isolation, as in the mainstream literature, but in an environment of imprecise events, imprecise relations and imprecise constraints. Second, imprecise probability distributions are assumed to be described in a natural language. The approach is based on the formalism of Computing with Words (CW) (Zadeh 1999, 2006). In the CW-based approach, the first step involves precisiation of information described in natural language. Precisiation is achieved through representation of the meaning of a proposition, p, as a generalized constraint. A generalized constraint is an expression of the form X isr R, where X is the constrained variable, R is a constraining relation and r is an indexical variable which defines the modality of the constraint, that is, its semantics. The primary constraints are possibilistic, probabilistic and veristic. Computation follows precisiation. In the CW-based approach the objects of computation are generalized constraints.
The CW-based approach to computation with imprecise probabilities enhances the ability of probability theory to deal with problems in fields such as economics, operations research, decision sciences, theory of evidence, analysis of causality and diagnostics.

Title: Assessing Disclosure Risk, and Preventing Disclosure, in Microdata

Abstracts:

Matching NCES Data to External Databases to Assess Disclosure Risk
J. Neil Russell, National Center for Education Statistics

The National Center for Education Statistics (NCES) is the Federal statistical agency responsible for collecting information on the condition of education in the United States. The agency's Disclosure Review Board (DRB) reviews and approves all microdata products prior to release. Since the early 1990s, the DRB has required that survey programs that release public-use microdata files (PUMFs) match to external databases as part of a disclosure risk analysis. Most NCES PUMFs have been matched to external databases to model an intruder's behavior for trying to disclose a respondent's identity. This presentation will focus on two features of this process. First, we will chronicle the history of matching at NCES as a disclosure risk assessment method. Second, we will present general findings of the disclosure risks discovered by matching to external databases.

Measuring Disclosure Risk and an Examination of the Possibilities of Using Synthetic Data in the Individual Income Tax Return Public Use File (PUF)
Michael Weber, Internal Revenue Service and Sonya Vartivarian, Mathematica

The Statistics of Income Division (SOI) currently measures disclosure risk through a distance based technique that compares the Public Use File against the population of all tax returns and uses top-coding, subsampling and multivariate microaggregation as disclosure avoidance techniques. SOI is interested in exploring the use of other techniques that prevent disclosure while providing less data distortion. Synthetic or simulated data may be such a technique. But while synthetic data may be the ultimate in disclosure protection, creating a synthetic dataset that preserves the key characteristics of the source data presents a significant challenge. An additional constraint in creating synthetic data for the SOI PUF is found in maintaining the accounting relationships among numerous income, deduction, and tax items that appear on a tax return.
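To make microaggregation concrete, the sketch below shows the simplest univariate, fixed-group-size variant: sort the values, form consecutive groups of at least k records, and replace each value with its group mean. SOI's multivariate procedure is more involved; this simplification, and the choice k = 3, are assumptions for illustration. Applying it item by item would not preserve accounting identities across tax items, which is the constraint the abstract highlights.

```python
import statistics

def microaggregate(values, k=3):
    """Univariate fixed-size microaggregation. Returns the protected values
    in the original record order; each group has at least k records."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = max(1, n // k)
    out = [0.0] * n
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so no group is smaller than k.
        end = n if g == n_groups - 1 else start + k
        idx = order[start:end]
        m = statistics.fmean(values[i] for i in idx)
        for i in idx:
            out[i] = m
    return out
```

Group means preserve the overall total, while individual extreme values are masked.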

Data Synthesis via Expert Knowledge, Modeling, and Hot Deck
Sam Hawala, U.S. Census Bureau

The presentation will focus on a method to produce synthetic data through the combined use of expert knowledge, model fitting to the data, and matching using the model predicted values. All three elements play an important role in the successful reproduction of the aggregate behavior and the main features of a data set.

U. S. CENSUS BUREAU
STATISTICAL RESEARCH DIVISION SEMINAR

Topic: Statistical Meta-Analysis - a Review

Abstract:

Statistical meta-analysis deals with statistical methods to efficiently combine information or evidence from several studies in order to produce a meaningful inference about a common phenomenon. Applications of meta-analysis abound in the literature. In this talk a review of some salient features of statistical meta-analysis will be presented.

This talk is based on the 2008 John Wiley book by the speaker.
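One of the most basic ways to combine evidence across studies is fixed-effect, inverse-variance pooling, with Cochran's Q as a check on heterogeneity. The sketch below assumes each study supplies an estimate of a common effect and its variance; random-effects and other methods covered in such reviews are not shown.

```python
import math

def fixed_effect_meta(estimates, variances):
    """Inverse-variance weighted (fixed-effect) pooling of study estimates.
    Returns the pooled estimate, its standard error, and Cochran's Q."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_w
    se = math.sqrt(1.0 / total_w)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return pooled, se, q
```

Under homogeneity, Q is approximately chi-square with k - 1 degrees of freedom for k studies.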

This seminar is physically accessible to persons with disabilities. Please direct all requests for Sign Language Interpreting Services, Computer Aided Real-time (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

GEORGE MASON UNIVERSITY
CDS/CCDS/STATISTICS COLLOQUIUM SERIES

Title: Text Mining, Social Networks, and High Dimensional Analysis

Abstract:

A traditional approach to text mining has been to represent a document by a vector. In the bag-of-words representation, binary vectors are used, and two documents are regarded as similar if the angle between their corresponding vectors is small (i.e., the correlation between the vectors is high). The document vectors may be assembled into a term-document matrix (TDM). A more satisfying representation of a document can be formulated in terms of bigrams or trigrams, because these have a better chance of capturing semantic content. Bigram vectors can be assembled into bigram-document matrices (BDM). The TDM and BDM resemble the two-mode adjacency matrices associated with social network analysis (SNA). Taking cues from SNA, we form one-mode adjacency matrices: document-document matrices (DD) and bigram-bigram matrices (BB). In this talk I outline the basics, discuss the connection between text mining and social networks, and, by example, illustrate the dimensionality issues raised by such vector space methods.
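The construction can be sketched in a few lines: build a binary feature-by-document matrix from words (a TDM) or bigrams (a BDM), then form the one-mode matrix DD = M^T M, whose entries count shared features and give the cosine between documents. Whitespace tokenization and lowercasing are simplifying assumptions; the BB matrix would be M M^T in the same way.

```python
import math

def words(doc):
    return set(doc.lower().split())

def bigrams(doc):
    w = doc.lower().split()
    return {(a, b) for a, b in zip(w, w[1:])}

def incidence_matrix(docs, featurize):
    """Binary feature-by-document matrix (TDM for words, BDM for bigrams)."""
    feats = [featurize(d) for d in docs]
    vocab = sorted(set().union(*feats))
    return vocab, [[1 if f in fs else 0 for fs in feats] for f in vocab]

def one_mode(matrix):
    """Document-document matrix DD = M^T M: entry (i, j) counts shared features."""
    n_docs = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n_docs)]
            for i in range(n_docs)]

def cosine(dd, i, j):
    """Cosine of the angle between binary document vectors i and j."""
    return dd[i][j] / math.sqrt(dd[i][i] * dd[j][j])
```

Even this toy setup hints at the dimensionality issue: the number of distinct bigrams grows much faster than the number of distinct words.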

GEORGE WASHINGTON UNIVERSITY
DEPARTMENT OF STATISTICS

Title: Text Mining, Social Networks, and High Dimensional Analysis

Abstract:

Data naturally represented in the form of a network, such as social and information networks, are being encountered increasingly often and have led to the development of new generative models (such as exponential random graphs and power law mechanisms) to attempt to explain the observed structure. However, it is usually prohibitively expensive to observe the entire network, so sampling in the network is needed. There has been comparatively little attention given to the question of what network properties are stable under what sampling schemes. We will discuss some examples where valid inferences about the structure of the network can and cannot be drawn from the sample, depending on the generative model, the sampling method, and the quantity of interest.

GEORGETOWN UNIVERSITY SEMINAR

Title: Preprocessing in High Throughput Biological Experiments

Abstract:

High throughput experiments, including gene expression arrays, array comparative genomic hybridization (aCGH), and other spot imaging bioassays, require so-called "low level" processing to remove background signal and systematic variation. Afterwards, a normalization step is required to compare assays within a given experiment. Although each high throughput platform has a distinct set of preprocessing steps, there are common preprocessing concepts and principles that should drive any scheme for preprocessing spot bioassays. In this talk, we will compare and contrast several preprocessing schemes that we developed for high throughput experiments, including xerogel assays, gene expression arrays, aCGH arrays, and gel electrophoresis images.
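As one example of a normalization step, quantile normalization is widely used for expression arrays; it is not necessarily one of the schemes the speakers developed. The sketch assumes background correction has already been done and that all arrays have the same number of spots.

```python
import statistics

def quantile_normalize(columns):
    """Give every array the same empirical distribution: the mean of the
    sorted intensities across arrays. Ties are broken by position."""
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    reference = [statistics.fmean(col[r] for col in sorted_cols) for r in range(n)]
    out = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        normed = [0.0] * n
        for rank, i in enumerate(order):
            normed[i] = reference[rank]   # value of same rank in the reference
        out.append(normed)
    return out
```

After normalization, arrays differ only in which spots receive which values, so within-array ranks are preserved.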

Title: Statistical issues in disease surveillance: A case study from ESSENCE

Abstract:

Syndromic surveillance systems attempt to monitor the burden of disease in communities in real time, using health-related data and tools from statistics, epidemiology, informatics, and other disciplines. A potential benefit of such surveillance is early detection and tracking of infectious disease outbreaks.

The Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) is a syndromic surveillance system that monitors outpatient visits to military medical treatment facilities. This study examines whether ESSENCE can detect more infectious disease outbreaks, and detect them earlier, using joint monitoring of laboratory test orders and outpatient visit data rather than outpatient visit data alone. Statistical issues that arise from this question include which aberration detection algorithm is best suited to these data sources, how to quantify the tradeoffs among sensitivity, specificity and timeliness for detecting outbreaks, and how to monitor information from multiple data sources simultaneously.
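Many aberration detection rules compare today's count with a short moving baseline. The sketch below follows a C1-style rule in the spirit of the CDC EARS methods: flag a day whose count exceeds the mean of the previous 7 days by more than 3 baseline standard deviations. The `min_sd` floor is an assumption of this sketch, added so that a perfectly flat baseline does not trigger on trivial increases; it is not taken from ESSENCE.

```python
import statistics

def c1_alarms(counts, baseline=7, threshold=3.0, min_sd=1.0):
    """Return indices of days flagged by a C1-style moving-baseline rule."""
    alarms = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]          # the previous `baseline` days
        mu = statistics.fmean(window)
        sd = max(statistics.stdev(window), min_sd)
        if (counts[t] - mu) / sd > threshold:
            alarms.append(t)
    return alarms
```

Running such a rule separately on visit and laboratory-order series, and then combining the alarms, is one simple way to approach the multiple-source question raised above.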

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

GEORGE MASON UNIVERSITY
CDS/CCDS/STATISTICS COLLOQUIUM SERIES

Title: Some Issues Raised by High Dimensions in Statistics

Abstract:

This talk is an overview presentation made by D.M. Titterington as a summary of the activities at Cambridge during the Spring of 2008. Most of twentieth-century statistical theory was restricted to problems in which the number p of 'unknowns', such as parameters, is much less than n, the number of experimental units. However, the practical environment has changed dramatically over the last twenty years or so, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is comparatively small but the underlying dimension is massive, leading to the desire to fit complex models for which the effective p is very large. Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science. Some methodological advances have been made, but there is a need to provide firm consolidation in the form of a systematic and critical assessment of the new approaches as well as appropriate theoretical underpinning in this 'large p, small n' context. The existence of key applications strongly motivates the programme, but the fundamental aim is to promote core theoretical and methodological research. Both frequentist and Bayesian paradigms will be featured. The programme is directed at a broad research community, including both mainstream statisticians and the growing population of researchers in machine learning.

Title: Statistical Issues Arising in the Interpretation of a Measure of Relative Disparity Used in Educational Funding: The Zuni School District 89 Case

Abstract:

This seminar will discuss statistical issues that arose in recent legal cases. The first case concerns the interpretation of a formula Congress wrote when it revised a law that provides funds for educating children in areas with a large federal presence (e.g., a major research laboratory). Because federal land is not subject to local real estate tax, the primary source of funding for education, the law is intended to assist the affected school districts. We will discuss the statute, the various interpretations that arose during the proceedings, and the justifications provided for them. A counterexample to one of the assertions made by the lawyers at the Supreme Court hearing, which appears to have been accepted by the Court's majority, will also be presented.

U. S. CENSUS BUREAU
9TH ELDERS PROGRAM SEMINAR

Topic: Different Directorates, Not So Different Approach

Abstract:

I hope to provide insight into the early development of Jeffersonville: how it originated and expanded, and how it interfaced with the Bureau's subject matter divisions in the 1970s and 1980s. I will also address the changes in the Bureau's collection and publication of foreign trade statistics in the late 1980s and early 1990s.

Biography: Don joined the Bureau in 1963 as an Industry Division analyst, moved to Demographic Surveys Division, and in 1969 relocated to Jeffersonville to take charge of processing the 1969 Census of Agriculture. This "temporary" assignment lasted for 16 years. He was Chief, Data Preparation Division (now NPC), from 1976 until late 1985, when he returned to Suitland as Chief, Data User Services Division. In less than a year, he became Chief, Foreign Trade Division, a position he held until the end of 1993. For much of 1993, a year of reorganization in the Economic Directorate, Don was Assistant Director for Economic Programs; Acting Chief, Foreign Trade Division; Acting Chief, Construction Division; and Acting Chief, Industry Division, all at the same time. A recipient of the Department's Silver and Gold Medals, Don retired as Assistant Director of Economic Programs on December 31, 1993.

Important Information:

This seminar is physically accessible to persons with disabilities. Please direct all requests for Sign Language Interpreting Services, Computer Aided Real-time Translation (CART), or other accommodation needs to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Title: Multivariate Event Detection and Characterization

Abstract:

We present the multivariate Bayesian scan statistic (MBSS), a general framework for event detection and characterization in multivariate spatial time series data. MBSS integrates prior information and observations from multiple data streams in a principled Bayesian framework, computing the posterior probability of each type of event in each space-time region. MBSS learns a multivariate Gamma-Poisson model from historical data, and models the effects of each event type on each stream using expert knowledge or labeled training examples. We evaluated MBSS on various disease surveillance tasks, detecting and characterizing disease outbreaks injected into three streams of Pennsylvania medication sales data. We demonstrated that MBSS can be used both as a "general" event detector, with high detection power across a variety of event types, and as a "specific" detector that incorporates prior knowledge of an event's effects to achieve much higher detection power. MBSS has many other advantages over previous event detection approaches: it is computationally efficient, its results are easy to interpret and visualize, and by integrating information from multiple streams it allows faster and more accurate detection. Most importantly, MBSS can model and differentiate between multiple event types, thus distinguishing events that require urgent responses from other, less relevant patterns in the data. This talk will present an overview of the MBSS framework and compare MBSS to other recently proposed multivariate detection approaches. Time permitting, I will also discuss how incremental learning (both passive and active) can be incorporated into the MBSS framework to improve detection performance, and consider extensions of MBSS to more general pattern detection problems.
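Here is a minimal sketch of the Gamma-Poisson computation for a single space-time region. It assumes each stream's count is Poisson with mean q * effect * baseline, and that the relative risk q has a Gamma(alpha, beta) prior. Integrating out q then gives a negative binomial likelihood. Streams are treated as conditionally independent given the event type. The event types, effects, priors, and alpha = beta = 50 are hypothetical. The full MBSS also searches over many regions and learns its parameters from historical data; this sketch omits both.

```python
import math

def log_marginal(count, baseline, effect, alpha, beta):
    """log P(count) when count ~ Poisson(q * effect * baseline) and the
    relative risk q ~ Gamma(shape=alpha, rate=beta), integrated over q."""
    lam = effect * baseline
    return (math.lgamma(alpha + count) - math.lgamma(alpha) - math.lgamma(count + 1)
            + alpha * math.log(beta / (beta + lam))
            + count * math.log(lam / (beta + lam)))

def event_posteriors(counts, baselines, effects, priors, alpha=50.0, beta=50.0):
    """Posterior probability of each hypothesis for ONE space-time region.

    counts, baselines: one entry per data stream.
    effects: {hypothesis: per-stream multiplicative effects}; 'null' uses 1.0.
    priors:  {hypothesis: prior probability}.
    """
    log_post = {}
    for h, xs in effects.items():
        loglik = sum(log_marginal(c, b, x, alpha, beta)
                     for c, b, x in zip(counts, baselines, xs))
        log_post[h] = math.log(priors[h]) + loglik
    top = max(log_post.values())                     # log-sum-exp for stability
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {h: math.exp(v - top) / norm for h, v in log_post.items()}

# Hypothetical event types and their per-stream effects.
effects = {"null": [1.0, 1.0, 1.0],
           "A":    [1.5, 1.0, 1.0],   # raises stream 1 only
           "B":    [1.0, 1.5, 1.5]}   # raises streams 2 and 3
priors = {"null": 0.9, "A": 0.05, "B": 0.05}
posterior = event_posteriors([180, 100, 100], [100, 100, 100], effects, priors)
```

In these toy inputs, stream 1 alone shows a count of 180 against a baseline of 100. That makes event type "A" the most probable hypothesis despite its small prior. This illustrates how a "specific" detector uses knowledge of an event's effects.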

PRESIDENT'S INVITED SEMINAR

Title: What's Up at the ASA?

Abstract:

ASA Executive Director Ron Wasserstein will provide a brief update on activities and directions of the association. However, most of the session will be devoted to questions and comments from the participants. Among the many things we could discuss:

Title: Bayesian Dose-finding Trial Designs for Drug Combinations

Abstract:

Treating patients with a combination of agents is becoming commonplace in cancer clinical trials, with biochemical synergism often the primary focus. In a typical drug combination trial, the toxicity profile of each individual drug has already been thoroughly studied in the single-agent trials, which naturally offers rich prior information. We propose Bayesian adaptive designs to search for the maximum tolerated dose combination. We continuously update the posterior estimates for the toxicity probabilities of the combined doses. By reordering the dose toxicities in the two-dimensional probability space, we adaptively assign each new cohort of patients to the most appropriate dose. Whether to escalate, de-escalate, or stay at the current dose is determined by comparing the posterior estimates of the toxicity probabilities of the combined doses with the prespecified toxicity target. We conduct extensive simulation studies to examine the operating characteristics of the design and illustrate the proposed method under various practical scenarios.
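As a rough illustration of comparing the posterior toxicity estimate with the target, here is a sketch with two assumptions. First, each dose combination gets an independent Beta prior, for example one elicited from the single-agent trials. Second, each move changes at most one drug by one level. This simplification is swapped in for illustration and is not the authors' design. It does not reorder toxicities or borrow strength across combinations. A real trial would also add overdose control and stopping rules. The grid and prior values are hypothetical.

```python
def next_dose(current, data, prior, target=0.30):
    """Choose the next dose combination for a new cohort.

    current: (i, j) dose-level indices for drug A and drug B.
    data:    {(i, j): (toxicities, patients)} observed so far.
    prior:   {(i, j): (a, b)} Beta prior for each combination's toxicity.
    """
    i, j = current

    def posterior_mean(dose):
        a, b = prior[dose]
        tox, n = data.get(dose, (0, 0))
        return (a + tox) / (a + b + n)   # Beta-binomial posterior mean

    # Stay, or move one level in exactly one drug (no diagonal jumps).
    candidates = [(i + di, j + dj)
                  for di in (-1, 0, 1) for dj in (-1, 0, 1)
                  if abs(di) + abs(dj) <= 1 and (i + di, j + dj) in prior]
    return min(candidates, key=lambda d: abs(posterior_mean(d) - target))

# Hypothetical 2 x 2 grid; prior mean toxicities 0.1, 0.2, 0.2, 0.3.
prior = {(0, 0): (1, 9), (1, 0): (2, 8), (0, 1): (2, 8), (1, 1): (3, 7)}
```

Suppose the trial starts at (0, 0) and sees 0 toxicities in 3 patients; the sketch then escalates one drug. After 4 toxicities in 4 patients at (1, 1), it de-escalates.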

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Title: Nonresponse Adjustments in Survey Applications

Abstract (Kreuter):

Using Proxy Measures and Other Correlates of Survey Outcomes to Adjust for Nonresponse: Examples from Multiple Surveys

Nonresponse weighting is a commonly used method to adjust for bias due to unit nonresponse in surveys. Theory and simulations show that, in order to effectively reduce bias without increasing variance, a covariate used for nonresponse weighting adjustment needs to be highly associated with both response and the survey outcome. In practice, these requirements pose a challenge that is often overlooked. Recently some surveys have begun collecting supplementary data, such as interviewer observations and other proxy measures of key survey outcomes. These variables are promising candidates for nonresponse adjustment because they should be highly correlated with the actual outcomes. In the present study, we examine the extent to which traditional covariates and new proxy measures satisfy the weighting requirements for the National Survey of Family Growth, the Medical Expenditure Survey, the U.S. National Election Survey, the European Social Surveys and the University of Michigan Transportation Research Institute Survey. We provide empirical estimates of the association between proxy measures and the likelihood of response as well as the actual survey responses. We also compare unweighted and weighted estimates under various nonresponse models. Results show the difficulty of finding suitable covariates and the need to improve the quality of proxy measures.
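A toy weighting-class adjustment shows why an adjustment covariate must relate to both response and the outcome. The sketch assumes a simple cell-based adjustment: each respondent's base weight is inflated by the inverse of the weighted response rate in its cell. The data and the proxy-based cells are invented for illustration and do not come from any of the surveys named above.

```python
from collections import defaultdict

def weighting_class_adjust(base_weights, responded, cells):
    """Inflate respondents' base weights by (total base weight in cell) /
    (respondent base weight in cell); nonrespondents get weight 0.
    Every cell must contain at least one respondent (else collapse cells)."""
    total = defaultdict(float)
    resp = defaultdict(float)
    for w, r, c in zip(base_weights, responded, cells):
        total[c] += w
        if r:
            resp[c] += w
    return [w * total[c] / resp[c] if r else 0.0
            for w, r, c in zip(base_weights, responded, cells)]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy sample of 8 units; in a real survey y is unknown for nonrespondents.
y         = [10, 10, 10, 10, 0, 0, 0, 0]
responded = [True, False, False, False, True, True, True, True]
cells     = ["proxy_high"] * 4 + ["proxy_low"] * 4   # hypothetical proxy cells
base      = [1.0] * 8

adjusted = weighting_class_adjust(base, responded, cells)
unadjusted_mean = weighted_mean(y, [w if r else 0.0 for w, r in zip(base, responded)])
adjusted_mean = weighted_mean(y, adjusted)
full_sample_mean = weighted_mean(y, base)
```

Here the proxy cells separate high from low outcome values and also differ in response rate. The adjusted mean therefore recovers the full-sample mean of 5.0, while the unadjusted respondent mean is 2.0. Cells unrelated to the outcome would leave the bias in place while still adding weight variation.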

Abstract (Ezzati-Rice):

Assessment of the Impact of Health Variables on Nonresponse Adjustment in the Medical Expenditure Panel Survey

The Medical Expenditure Panel Survey (MEPS) is a large, complex sample survey designed to provide nationally representative annual estimates of health care use, expenditures, sources of payment, and insurance coverage for the U.S. civilian non-institutionalized population. A new panel of households is selected each year for MEPS from households that responded to the previous year's National Health Interview Survey (NHIS). Nonresponse is a common problem in household sample surveys. To compensate for nonresponse and reduce the potential bias of survey estimates, two separate nonresponse adjustments are performed in the development of analytic weights in MEPS. The first, the focus of this presentation, is an adjustment for dwelling unit (DU) level nonresponse, which accounts for nonresponse among the households subsampled from NHIS for MEPS. The adjustment is carried out using socio-economic, demographic, and health variables that are available for both respondents and nonrespondents. In this study, we examine the impact of health variables on the MEPS DU-level nonresponse weight adjustment. Response propensity scores are calculated from logistic regression models, and quintiles of the propensity scores are used to adjust the MEPS base weights. Comparisons of the nonresponse-adjusted weights and selected survey variables, with and without health variables as nonresponse adjustment covariates, are discussed.
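The propensity-quintile step described above can be sketched as follows. The sketch assumes an unweighted logistic regression of response on covariates, followed by a weighting-class adjustment within the five quintile cells. MEPS's actual model specification, its use of base weights in the model, and any cell-collapsing rules are not described here, so those details are assumptions. The data are simulated.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Unweighted logistic regression of response (0/1) on covariates X,
    fitted by Newton-Raphson. Returns coefficients and fitted propensities."""
    X1 = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    coef = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ coef))
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        gradient = X1.T @ (y - p)
        coef = coef + np.linalg.solve(hessian, gradient)
    return coef, 1.0 / (1.0 + np.exp(-X1 @ coef))

def quintile_adjust(base_w, responded, propensity):
    """Form five cells from propensity-score quintiles; within each cell,
    inflate respondents' base weights to carry the cell's total base weight."""
    edges = np.quantile(propensity, [0.2, 0.4, 0.6, 0.8])
    cell = np.searchsorted(edges, propensity, side="right")   # cell index 0..4
    adjusted = np.zeros(len(base_w))
    for k in range(5):
        in_cell = cell == k
        resp = in_cell & (responded == 1)
        if not in_cell.any():
            continue
        if not resp.any():
            raise ValueError(f"no respondents in cell {k}; collapse cells")
        adjusted[resp] = base_w[resp] * base_w[in_cell].sum() / base_w[resp].sum()
    return adjusted

# Simulated sample: response depends on a demographic and a health covariate.
rng = np.random.default_rng(0)
n = 500
x_demo = rng.normal(size=n)
x_health = rng.normal(size=n)
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 0.8 * x_demo + 0.6 * x_health)))
responded = (rng.random(n) < true_p).astype(int)
base_w = np.full(n, 2.0)

coef_without, p_without = fit_logistic(x_demo, responded)
coef_with, p_with = fit_logistic(np.column_stack([x_demo, x_health]), responded)
w_without = quintile_adjust(base_w, responded, p_without)
w_with = quintile_adjust(base_w, responded, p_with)
```

Comparing survey estimates computed with `w_without` and `w_with` mirrors the abstract's with-and-without-health-variables comparison. The simulated data say nothing about the MEPS results.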

Title: Recent Developments in Address-based Sampling

Abstract:

Increasingly, survey researchers are returning to address-based methodologies to reach the general public for survey administration and related commercial applications. Three main factors drive this change: evolving coverage problems associated with telephone-based methods, declining response rates to telephone contacts, and recent improvements in the databases of household addresses available to researchers. This presentation assesses these three factors and gives an overview of the structure of the USPS Delivery Sequence File (DSF), which is often used to construct address-based sampling frames. Key enhancements available for the DSF will also be discussed. Besides reducing undercoverage bias, particularly in rural areas where more households rely on P.O. boxes and address formats are inconsistent, such enhancements enable researchers to develop more efficient sample designs and to broaden their analytical possibilities through an expanded set of covariates for hypothesis testing and statistical modeling tasks.

Title: Multiple Frame Surveys: Lessons from CBECS Experience

Abstract:

Because there is no single frame of buildings in the country, the Commercial Building Energy Consumption Survey relies on federal databases, commercially available databases, and field listing to construct a sampling frame. The 2007 round of CBECS, which NORC is currently fielding on behalf of the Energy Information Administration of the DOE, sampled from seven frames. The difficulty in sampling simultaneously from many frames is in adjusting for the overlap among them. To calculate the correct weights, we must know the true probability of selection for all selected cases: for each selected building, we must either know its probability of selection from each of the frames to which it belongs or remove it from all of the frames but one. We will present the steps we took to remove these overlaps before the sample was fielded. Despite our best efforts, duplicates were discovered in the field; we will discuss the ways in which we modified the probabilities of selection once data collection was underway. We must also confront the fact that undetected duplicates remain, and that some of the probabilities of selection will not be correct. The CBECS experience can help other surveys decide whether incorporating additional frames is worth the difficulty. Our findings also apply to dual-frame phone surveys, where it may not be possible to fully deduplicate the frames.
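The abstract's two options can be sketched as below. The first is to know a selected building's selection probability in every frame that contains it. The second is to keep the building in only one frame. The overall inclusion probability assumes independent selection across frames: pi = 1 - prod(1 - pi_f). The frame names, priority order, and probabilities are hypothetical.

```python
from math import prod

def overall_inclusion_prob(frame_probs):
    """Probability a unit is selected from at least one of the frames that
    contain it, assuming independent selection across frames."""
    return 1.0 - prod(1.0 - p for p in frame_probs)

def base_weight(frame_probs):
    """Design weight as the inverse of the overall inclusion probability."""
    return 1.0 / overall_inclusion_prob(frame_probs)

def assign_unique_frame(unit_frames, priority):
    """Deduplicate by keeping each unit only in its highest-priority frame."""
    rank = {frame: i for i, frame in enumerate(priority)}
    return {unit: min(frames, key=rank.__getitem__)
            for unit, frames in unit_frames.items()}

# Hypothetical building belonging to two frames, each with p = 0.1.
w_true = base_weight([0.1, 0.1])   # duplicate detected
w_missed = base_weight([0.1])      # second membership went undetected

frames = {"bldg_1": ["field_listing", "commercial_db"],
          "bldg_2": ["federal_db"]}
assignment = assign_unique_frame(frames, ["federal_db", "commercial_db", "field_listing"])
```

The example shows why undetected duplicates matter. A building in two frames, each with selection probability 0.1, has overall inclusion probability 0.19 and weight about 5.26. If one membership goes undetected, the building receives weight 10 instead.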
