Washington Statistical Society Seminars


January 2005
19
Wed.
Bureau of Labor Statistics Seminar
Machine Learning Methods for Text Classification
26
Wed.
Bureau of Labor Statistics Seminar
27
Thur.
Bureau of Labor Statistics Seminar
February 2005
15
Tues.
Documenting Atrocities in Darfur
16
Wed.
News, Noise, and Estimates of the "True" Unobserved State of the Economy
17
Thur.
Bureau of Labor Statistics Seminar
Accumulated Respondent Burden on NASS Surveys
24
Thur.
Bureau of Labor Statistics Seminar
An Efficient Method of Estimating the True Value of a Population Characteristic from its Discrepant Estimates
25
Fri.
The George Washington University
Department of Statistics Seminar
Database Integration in Biological Studies
March 2005
8
Tues.
Pre-election Polls and the 2004 General Election
10
Thur.
Analysis of Nonresponse in Telephone Surveys
16
Wed.
The Conservation Effects Assessment Project (CEAP) - An Overview
17
Thur.
U.S. Bureau of Census
Statistical Research Division Seminar
Data Quality of the American Community Survey Across Individuals Living in Linguistically Isolated and Non-Linguistically Isolated Households: A Latent Variable Model Assessment
18
Fri.
A Statistician's View of the FBI Compositional Analysis of Bullet Lead (CABL)
April 2005
6
Wed.
2004 Roger Herriot Award
Major Challenges for the Future: Administrative Records, American Community Survey, and the American Fact Finder
6
Wed.
U.S. Bureau of Census
Statistical Research Division Seminar
Using Census Data to Define Estimation Areas for the American Community Survey: A Case Study
8
Fri.
Mortality Before and After the 2003 Invasion Of Iraq: Cluster Sample Survey
15
Fri.
Why Small Changes in Question Wording Can Produce Big Changes in Survey Measurement: Unraveling Some Mysteries of Questionnaire Design with the Theory of Satisficing
19
Tues.
Analyzing Call History Records to Achieve Optimal Outcomes: Survey Response and Efficient Use of Resources
22
Fri.
The George Washington University
Department of Statistics Seminar
Modeling and Spatial Prediction of Pre-Settlement Patterns of Forest Distribution using Witness Tree Data
27
Wed.
Evaluating Alternative Calibration Schemes for an Economic Survey With Large Nonresponse
May 2005
5
Thur.
The National Academies
Committee on National Statistics
Discrepant Estimates of Social and Economic Phenomena: Love 'Em or Leave 'Em?
5
Thur.
The George Washington University
Department of Statistics
Runs in Bernoulli trials: how much success should we expect?
6
Fri.
George Mason University
Statistics Colloquium Series
On the Borders of Statistics and Computer Science
11
Wed.
Evolving Information Access/Security Issues
24
Tues.
Questionnaire Design Methodology for a Study of Human Rights Abuses during the Armed Internal Conflict of Sierra Leone
25
Wed.
Questionnaire Design Methodology for a Study of Human Rights Abuses during the Armed Internal Conflict of Sierra Leone
26
Thur.
An Introduction to the American Time Use Survey
June 2005
9
Thurs.
An Assessment of the Comparative Accuracy of Time Series Forecasts of Patent Filings: The Benefits of Disaggregation in Space or Time
13
Mon.
The Fourth Funding Opportunity in Survey and Statistical Research Seminar
16
Thurs.
Household Telephone Service and Usage Patterns in the United States in 2004
July 2005
19
Tues.
Using MASSC to Minimize Information Loss and Control Disclosure Risk
September 2005
7
Wed.
User and Network Profiling
15
Thur.
University of Maryland
Statistics Program Seminar
Statistical Analysis of Ultrasound Images of Tongue Contours during Speech
16
Fri.
The George Washington University
Department of Statistics Seminar
Estimation Following a Group Sequential Test for Distributions in the One-Parameter Exponential Family
22
Thurs.
The Dynamics of Nonmetro Wage Income Distribution: A Model-Based Approach
23
Fri.
The George Washington University
Department of Statistics Seminar
Testable and Model-Preserving Parametric Constraints in a Parametric Probability Model
28
Wed.
Panel on Privacy and Public Perception of Risks
28
Wed.
University of Maryland
Statistics Program Seminar
Topic related to modeling of age at first marriage in the US population
October 2005
6
Thur.
University of Maryland
Statistics Program Seminar
Balanced Sampling with Applications to Accounting Populations
14
Fri.
The George Washington University
Department of Statistics Seminar
Testing Degenerate Tensors in Diffusion Tensor Images
18
Tues.
Should the RCT Model Be a "Gold Standard" for Social Policy Research?
20
Thurs.
Julius Shiskin Award Presentation
Methodological Problems with the Consumer Price Index
20
Thurs.
Methodological Problems with the Consumer Price Index
20
Thur.
University of Maryland
Statistics Program Seminar
Stochastic Variants of EM: Monte Carlo, Quasi-Monte Carlo, and More
26
Wed.
A Lifetime in Official Statistics — A Question & Answer Period with Dr. Ivan Fellegi, Chief Statistician - Statistics Canada
27
Thurs.
The National Academies
Committee on National Statistics
How Can We Conduct Telephone Surveys in a Cell Phone Age?
27
Thurs.
U.S. Bureau of Census
Statistical Research Division Seminar
Methods for Re-identifying and Analyzing Masked Microdata
28
Fri.
The George Washington University
Department of Statistics Seminar
Augmented designs to assess immune response in vaccine trials
November 2005
2
Wed.
Fifteenth Annual Morris Hansen Lecture
Causal Inference Through Potential Outcomes: Application to Quality of Life Studies with 'Censoring' Due to Death and to Studies of the Effect of Job-Training Programs on Wages
11
Fri.
The George Washington University
Department of Statistics Seminar
Composite Likelihood Inference in Spatial Generalized Linear Mixed Models
December 2005
1
Thurs.
Empirical Bayes Analysis of Bivariate Binary Data: An Application to Small Area Estimation
2
Fri.
The George Washington University
Department of Statistics
Bayesian Social Network Models with Acute Outcomes
5
Mon.
U.S. Bureau of Census
Statistical Research Division Seminar
A Comparative Study of Complex Households in Six Race/Ethnic Groups, with Implications for Censuses and Surveys
8
Thurs.
The Effects of Cell Collapsing in Poststratification
8
Thur.
University of Maryland
Statistics Program Seminar
Analysis of Genotype-Phenotype Relationships: Machine Learning/Statistical Methods
12
Mon.
Comparing Homeowner and Lender Estimates of Housing Wealth and Mortgage Terms
15
Thurs.
Data Collection and Statistical Issues in Surveying Cell and Landline Telephone Samples




Title: Machine Learning Methods for Text Classification

Abstract:

Textual information consisting of words can be used in areas such as classification of documents into categories (e.g., industry and occupation coding), queries in web and library searches, and the record linkage of name and address lists. To use text effectively, it may need to be cleaned to remove typographical errors, and documents (records) given a mathematical representation in a probabilistic model. This talk describes an application of Bayesian networks to classify a collection of Reuters newspaper articles (Lewis 1992) into categories (Nigam, McCallum, Thrun, and Mitchell 2000; Winkler 2000). The results are indirectly compared with the current best-performing methods such as Support Vector Machines (Vapnik 1995, 2000) and Boosting (Freund and Schapire 1996; Friedman, Hastie, and Tibshirani 2000). For text classification, until seven years ago, the best methods in computational linguistics outperformed the best machine learning methods. Without the need to build complicated semantic or syntactic representations, the best machine learning methods now outperform the best methods in computational linguistics on a large number of widely used test decks. This makes the methods much more language independent and can make them more application dependent.
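To make the probabilistic setup concrete, here is a minimal sketch of a multinomial naive Bayes text classifier (the simplest Bayesian network for this task) with add-one smoothing. The two categories and four toy documents are invented for illustration; they are not the Reuters collection discussed in the talk.

```python
from collections import Counter, defaultdict
import math

def train_nb(docs):
    # docs: list of (word_list, label) pairs
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)   # label -> word frequencies
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab, len(docs)

def classify(model, words):
    class_counts, word_counts, vocab, n = model
    best, best_lp = None, float("-inf")
    for label, cc in class_counts.items():
        lp = math.log(cc / n)                      # log prior
        total = sum(word_counts[label].values())
        for w in words:
            if w in vocab:                         # ignore unseen words
                # add-one (Laplace) smoothed log likelihood
                lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy training corpus with invented category labels.
docs = [("wheat corn harvest".split(), "grain"),
        ("oil barrel crude price".split(), "crude"),
        ("corn wheat export".split(), "grain"),
        ("crude oil supply".split(), "crude")]
model = train_nb(docs)
```

A real application would add tokenization, cleaning of typographical errors, and evaluation against a held-out test deck.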


Title: Credit Report Accuracy and Access to Credit

Abstract:

Data that credit-reporting agencies maintain on consumers' credit-related experiences play a central role in U.S. credit markets. Analysts widely agree that the data enable these markets to function more efficiently and at lower cost than would otherwise be possible. Despite the great benefits of the current system, however, some analysts have raised concerns about the accuracy, timeliness, completeness, and consistency of consumer credit records and about the effects of data problems on the availability and cost of credit.

The analysis expands on available research by quantifying the effects of credit record limitations on access to credit. Using the credit records of a nationally representative sample of individuals, we examine the possible effects of data problems on consumers by estimating the changes in consumers' credit history scores that would result from "correcting" the problems in their credit records. Results for consumer groups are segmented by strength of credit history (credit history score range), depth of credit history (number of credit accounts in a credit record), and selected demographic characteristics.


Title: Discussion of Nonresponse on the 2001 National Household Travel Survey

Abstract:

This presentation explores several aspects of nonresponse associated with the 2001 National Household Travel Survey, including the methods that were implemented to improve response rates and reduce nonresponse bias and the results of a wide range of analyses to help understand the characteristics of nonrespondents.

It is well documented that response rates for traditional household travel surveys using random digit dial (RDD) telephone survey methods have been declining for some years. The public has become wary of unfamiliar callers conducting surveys, polls, or telemarketing and is generally less willing to devote time even to important official surveys. This is especially true for multi-stage, contact/re-contact travel surveys in which each household must be contacted at least two or three times in order to complete the survey. Such procedures provide many opportunities for household, person, and item nonresponse. It was clear from the earliest planning stages that special measures were needed to maximize the likelihood of response at the household and person levels for the 2001 NHTS. The first presentation describes a number of procedures that were used in the NHTS. NHTS response rates are also presented and compared to a similar regional household travel survey conducted during the same field period as the NHTS.

The second presentation will examine correlates of nonresponse for the 2001 NHTS by comparing nonrespondents to respondents to the survey. The analysis makes comparisons for the overall nonresponse rate, as well as for the screener interviewing stage and for the extended interviewing stage. Most characteristics that are examined show wide ranges in response rates, implying that the potential exists for high bias due to nonresponse.

The third presentation discusses the effects of two design features on nonresponse and nonresponse error. The first analysis addresses whether the scheduling algorithm used to initially contact households should target households with certain characteristics. The second analysis assesses whether the short time window required to complete the travel interview results in nonresponse for certain types of respondents. Both analyses find that there are characteristics that are correlated with each design feature. The implications for scheduling screening and travel interviews are discussed in light of these results.


Topic: Documenting Atrocities in Darfur

Abstract:

This past summer, the State Department and USAID's Office of Transition Initiatives partnered with the Coalition for International Justice (CIJ) to conduct a survey of refugees fleeing the Darfur crisis. CIJ assembled and led a team of close to thirty interviewers from around the world who, along with staff from the State Department and OTI, conducted over 1,200 interviews in Chad with refugees from Darfur over a six-week period in July and August. The project was ground-breaking in its application of random-sample survey methodology in the midst of a humanitarian crisis. The data and report generated by the Darfur Atrocities Documentation Team were cited by Secretary of State Colin Powell in his historic September 9, 2004 determination that genocide is occurring in the Darfur region of Sudan.

During this talk, Jonathan P. Howard, a research analyst with the State Department who served as the team's survey methodologist, will discuss the methodology applied by the team as well as the challenges faced by methodologists in humanitarian crisis zones. Stefanie Frease, Director of Programs for CIJ, is a former investigator at the International Criminal Tribunal for the former Yugoslavia and led the data collection team in Chad. She will talk about the various elements required in planning, deploying and conducting a targeted data collection mission in a challenging environment.


Title: News, Noise, and Estimates of the "True" Unobserved State of the Economy

Abstract:

Which is a better indicator of the "true" state of the economy, gross domestic product (GDP) or gross domestic income (GDI)? Taking a weighted average of the two measures may produce the best estimate, but in choosing weights this question must be confronted. In prior attempts to do so, analysts have often assumed that the difference between the "true" state of the economy and each estimate is pure noise, down-weighting estimates with greater variance as they are assumed to be noisier. In contrast, we show that each difference could be viewed as a piece of pure news, in which case a greater variance warrants a larger weight as it reflects more information about the "true" state of the economy. Additional considerations must be brought to bear to produce sensible weights; using information from revisions to GDP and GDI, we show that the news assumption is probably the closer approximation to reality, in which case GDP would receive the larger weight.
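The contrast between the two assumptions reduces to two opposite weighting rules, sketched below. The variances are invented for illustration; they are not estimates for actual GDP or GDI data.

```python
def noise_weights(variances):
    # Noise assumption: each estimate equals the truth plus independent
    # measurement error, so precision (inverse variance) drives the weight;
    # a noisier series gets LESS weight.
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [w / total for w in inv]

def news_weights(variances):
    # News assumption: each estimate is an efficient forecast of the truth
    # given its own information set; a higher variance reflects MORE
    # information incorporated, so it earns a LARGER weight.
    total = sum(variances)
    return [v / total for v in variances]

# Hypothetical variances for two competing measures (illustrative only).
var_a, var_b = 4.0, 6.0
w_noise = noise_weights([var_a, var_b])  # less-variable measure favored
w_news = news_weights([var_a, var_b])    # more-variable measure favored
```

With these numbers the noise rule gives weights of roughly (0.6, 0.4) and the news rule reverses them, which is the crux of the abstract's argument about which measure should dominate the average.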


Title: Accumulated Respondent Burden on NASS Surveys

Abstract:

USDA's National Agricultural Statistics Service (NASS) provides statistical information and services to farmers, ranchers, agribusinesses, and public officials. Most of the published data are derived from surveys of farmers across the country, a relatively small population. Many producers are contacted repeatedly, and the survey respondent burden placed upon farmers is an important issue for NASS. Respondent burden consists of many factors, including the interview length and question difficulty. Burden is accumulated over time when a respondent is in a panel study or selected for multiple surveys. Accumulated respondent burden, therefore, also includes the number of surveys a participant is selected for and the length of time between interviews. Respondent burden is often discussed in connection with response rates, and one common theory is that greater respondent burden is correlated with lower response rates.

This discussion presents results summarizing responses from many of NASS' major agricultural surveys conducted from January 2000 through December 2003. Survey response data were combined from 184 surveys and included over 2.2 million survey records with 579,531 farming operations. Response rates were evaluated by the number of surveys operations were in during the four-year study period. In addition, prior respondent burden was compared between participants and non-participants for several individual surveys.
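The basic tabulation behind such a study, response rate by accumulated-burden level, can be sketched in a few lines. The records below are invented and stand in for linked survey contact histories.

```python
from collections import defaultdict

def response_rate_by_burden(records):
    # records: (number_of_surveys_selected_for, responded) per survey contact.
    # Tabulates the response rate at each accumulated-burden level.
    tallies = defaultdict(lambda: [0, 0])  # burden -> [responses, contacts]
    for n_surveys, responded in records:
        tallies[n_surveys][0] += responded
        tallies[n_surveys][1] += 1
    return {n: r / c for n, (r, c) in sorted(tallies.items())}

# Invented contact records: burden level and whether the contact responded.
records = [(1, True), (1, True), (1, False),
           (3, True), (3, False), (3, False)]
rates = response_rate_by_burden(records)
```

A falling sequence of rates across burden levels would be consistent with the burden-depresses-response theory, though a real analysis would control for operation size and survey type.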


Title: An Efficient Method of Estimating the True Value of a Population Characteristic from its Discrepant Estimates

Talk to be video-conferenced.

Abstract:

In the context of sample surveys, an estimator for a domain (or a subpopulation) such as a geographical area, a socio-demographic group, or an industry group is referred to as "direct" if it uses values of a variable only from the sample units within the domain. A domain is considered small if the domain sample size is not large enough to yield direct estimates of adequate precision, and large otherwise. The standard design-based mode of inference used in survey analysis may produce reliable domain estimates if the domains are large. Strengths and weaknesses of design-based inference for surveys are discussed in the statistics literature. Reliable estimates for small domains can only be produced by moving away from the design-based estimation of conventional direct estimates to indirect model-dependent estimates. Naturally, concerns are raised about the reliance on models for the production of small domain estimates. The major weakness of model-based inference is that if the model is seriously misspecified, then it can yield inferences that are much worse than design-based inferences. In this paper, we consider a situation where a generalized errors-in-the-variables model for small domains is appropriate to the available data. This model has the virtues of (i) avoiding certain model misspecifications and (ii) yielding good approximations to preferred design-consistent estimators.
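The paper's generalized errors-in-variables model is beyond a short sketch, but the classical special case, precision-weighted (inverse-variance) combination of discrepant unbiased estimates, shows the basic idea of borrowing strength across discrepant sources. The estimates and variances below are hypothetical.

```python
def combine(estimates, variances):
    # Minimum-variance unbiased linear combination of independent,
    # unbiased estimates: weight each by its precision (1 / variance).
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, estimates)) / total
    var = 1.0 / total  # variance of the combined estimator
    return est, var

# Three hypothetical discrepant estimates of the same population total,
# with their (assumed known) sampling variances.
est, var = combine([100.0, 110.0, 95.0], [25.0, 100.0, 50.0])
```

The combined variance (about 14.3 here) is smaller than any single input variance, which is the payoff from pooling; the errors-in-variables model in the talk generalizes this by not taking the inputs to be unbiased.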


Title: Database Integration in Biological Studies

Abstract:

The work in our laboratory involves integration of different databases to solve biological problems of interest. Our philosophy is that each database will give us some but not all of the information about a biological problem. By combining different databases intelligently, we are able to obtain a more complete picture of the problems of interest. We will use the following two examples to illustrate these points: a) Protein function prediction combining different data sources, and b) Understanding lethality.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


Title: Pre-election Polls and the 2004 General Election

Abstract:

Many arguments over results from pre-election polls took place in the months leading up to the November election. Speculation about the effects of the conventions and the debates, not to mention the Swift Boat accusations and the CBS News story on Bush's military service, ran rampant. All of the polls drew criticism from one quarter or another. This seminar will feature three of the most prominent national pollsters in last year's election. They will discuss the performance of the polls and the controversies surrounding them, including the important role that state polls played in the presidential election.


Topic: Analysis of Nonresponse in Telephone Surveys

Abstract:

The measurement effects of survey nonresponse are especially difficult to address when little is known about nonrespondents. Nonresponse is often evaluated via comparisons to known demographic, socioeconomic, and geographic distributions, or through costly re-interviews with nonrespondents. Surveys with longitudinal components offer the opportunity to evaluate nonresponse at the individual level without the need for re-contacts. Previous survey responses exist for nonrespondents in the current survey cycle, which allows for analyzing the impact of attrition on the survey measures of interest. Often for cross-sectional surveys, geographic location is the only information that is known for nonrespondents. Nonresponse analysis is dependent on the availability of survey information for respondents and from independent sources. We examine two large-scale telephone surveys different in design and intent conducted for the Department of Housing and Urban Development (HUD) Fair Market Rent program. The analyses are driven by the different sample designs and challenges such as the scarcity of response data beyond the subject of rent. The Regions surveys, which focus on yearly rent change, have a longitudinal component for analyzing differential attrition with respect to rent. Without the benefit of longitudinal data, nonresponse analysis for the Areas surveys, which focus on rent levels, requires a different approach based on census data extracted at the tract level.
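With a longitudinal component, differential attrition with respect to rent can be checked directly by comparing prior-wave rents of current-wave respondents and nonrespondents. A minimal sketch, using invented prior-wave rents and response flags:

```python
import statistics

def attrition_check(prior_rent, responded):
    # Compare the prior-wave rent of current-wave respondents vs
    # nonrespondents; a large standardized gap signals that attrition may
    # bias the rent-change estimates.
    resp = [x for x, ok in zip(prior_rent, responded) if ok]
    nonresp = [x for x, ok in zip(prior_rent, responded) if not ok]
    gap = statistics.mean(resp) - statistics.mean(nonresp)
    return gap, gap / statistics.pstdev(prior_rent)

# Invented prior-wave monthly rents and current-wave response indicators.
rents = [800, 950, 700, 1200, 650, 900, 1100, 750]
resp_flags = [True, True, False, True, False, True, False, True]
gap, std_gap = attrition_check(rents, resp_flags)
```

For the cross-sectional Areas surveys, where no prior wave exists, the analogous comparison would have to use tract-level census characteristics in place of prior responses, as the abstract notes.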


Title: The Conservation Effects Assessment Project (CEAP) - An Overview

Abstract:

The Natural Resources Conservation Service (NRCS) and the Agricultural Research Service (ARS) have joined together in collaboration with other Federal agencies and universities to initiate studies that will quantify the environmental benefits of conservation practices. A national assessment is being implemented to track environmental benefits over time at the national scale. In selected regions, watershed studies are being initiated to provide more in-depth assessments at a finer scale of resolution. This presentation will provide a brief overview of the CEAP effort and summarize the analytical approach being used to assess benefits of conservation practices on cropland. The approach is based on microsimulation modeling using a subset of the Natural Resources Inventory (NRI) sample. The CEAP sample consists of 30,000 NRI sample points where additional information on farming practices and conservation practices has been collected from farmers by the National Agricultural Statistics Service.


Topic: Data Quality of the American Community Survey Across Individuals Living in Linguistically Isolated and Non-Linguistically Isolated Households: A Latent Variable Model Assessment

Abstract:

The possibility exists that survey instruments provide differential internal validity and/or reliability across multiple populations. In the psychometric tradition, this potential is described as measurement bias, and is also labeled measurement noninvariance, differential item functioning (DIF), and measurement heterogeneity. As defined in the psychometric discipline, measurement bias is present on survey instruments when individuals equivalent on a measured construct, but from different groups, do not have identical probabilities of producing observed scores. A source of measurement error and non-sampling error, bias has the potential to negatively affect the quality of survey data. To the extent that the surveys used to gather data are comprised of biased items across subpopulations, differences or similarities in the observed responses across subpopulations may reflect differential item functioning rather than true findings, and the quality of the data will be compromised. Given the potential problems that individuals whose primary language is not English may experience in responding to the American Community Survey (ACS), the Census Bureau is concerned with ascertaining the quality of data collected across individuals living in linguistically isolated (LI) and non-LI households. The goal of the current study was to assess the internal validity of the ACS across LI status using a latent variable measurement model, confirmatory factor analysis for ordered-categorical measures (CFA-OCM), thereby addressing one dimension of data quality on the survey. Analyses examined whether the full set of measurement parameters for the six items measuring disability demonstrated differential item functioning. Given the set of adopted statistical criteria, results generally demonstrated measurement equivalence. 
Findings support uniformity in the internal validity of ACS data collected across LI individuals for these items, and dispute concerns that LI substantially affects the ability of LI individuals to answer these items on the ACS in a meaningful way as compared to non-LI individuals. When making comparisons regarding disability, investigators using ACS data can be less concerned that LI status impacts the validity of comparisons and can place greater faith in the quality of their comparisons.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.


Title: A Statistician's View of the FBI Compositional Analysis of Bullet Lead (CABL)

Abstract:

The FBI has used the chemical analysis of bullet lead as a forensic tool for about 40 years. When bullets are found at a crime scene and in a suspect's possession, the trace elements in the bullets can be compared. If the bullets are similar enough, a match is declared. The use of this technique in court testimony is the subject of a recent NAS report. The NAS report has been covered by national media; see, for example, an L. A. Times article http://www.latimes.com/news/nationworld/nation/la-sci-bullet11feb11,1,4488937.story?coll=la-home-nation. Other major newspapers and broadcast media have also had coverage. I will discuss the report from the point of view of one of the two statisticians on the NAS panel that wrote the report (Professor Karen Kafadar was the other statistician on this effort). I will also discuss some of the scientific method issues concerning the crime lab in this instance. Karen and I are giving a "Chance" invited talk at the JSM on this topic.


2004 ROGER HERRIOT AWARD

WSS Seminar and Reception to Recognize Paula Schneider as the Recipient of the 2004 Roger Herriot Award (rescheduled due to snow in March)

Title: Major Challenges for the Future: Administrative Records, American Community Survey, and the American Fact Finder

Roger Herriot was the Associate Commissioner for Statistical Standards and Methodology at the National Center for Education Statistics (NCES). After his sudden death in May 1994, the Washington Statistical Society and the Social Statistics and Government Statistics Sections of the American Statistical Association established an award in his memory to recognize individuals who develop unique approaches to the solution of statistical problems in Federal data collection programs.

Paula J. Schneider, formerly of the U.S. Census Bureau, was the recipient of the 2004 Roger Herriot Award for Innovation in Federal Statistics. While the participating ASA sections honored the Herriot Award recipient at the Joint Statistical Meetings in Toronto last summer, the Washington Statistical Society, as a co-sponsor of the award, will hold its own ceremony, centered around one of its lunch-time seminars at the Bureau of Labor Statistics (BLS) Conference Center.


Topic: Using Census Data to Define Estimation Areas for the American Community Survey: A Case Study

Abstract:

The full implementation of the American Community Survey (ACS) rolled out in January 2005. The ACS selected samples in each of the 3,141 counties in the U.S. and each of the 78 municipios in Puerto Rico. Estimates for geographic areas with a population of 65,000 or more will be published in 2006. The ACS controls weighting and estimation at the county level and uses a minimum sample size requirement of 400 sample persons. The ACS weighting and estimation methodology defines estimation areas to meet a minimum population size such that, when accounting for nonresponse and subsampling, the minimum sample size requirement is met. The minimum sample size is required to support adequate reliability for aggregation purposes. Counties below the population threshold must be grouped or clustered prior to estimation.

A simple method - referred to as the naive method - simply groups the counties based on "geographic adjacency" and then assesses the similarity of counties within each cluster based on some predefined socio-demographic criteria. This paper describes the naive clustering process and statistical assessment. A more sophisticated algorithm - referred to as the "automatic method" - was also developed. The automatic method is an iterative method that uses a set of demographic and economic characteristics to define a similarity index based on the Euclidean distance metric. Geographic proximity is one of the variables included in the algorithm. This paper describes the algorithm and the development of the similarity index. The results of the two clustering schemes are compared based on an application developed for the Puerto Rico municipios and counties in the state of Texas.
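A toy version of such a clustering scheme: standardize the characteristics, then greedily merge each under-threshold county with its nearest cluster by Euclidean distance (geographic proximity could enter as just another feature column, as in the automatic method). The county names, populations, and features below are invented, and the greedy merge rule with a population-weighted centroid is a simplification of the iterative algorithm described.

```python
import math

def standardize(rows):
    # z-score each characteristic so no single measurement scale dominates
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def cluster_counties(counties, min_pop):
    # counties: list of (name, population, feature_vector) tuples.
    # Greedy rule: take the smallest cluster below the population threshold
    # and merge it with the cluster nearest in Euclidean distance on the
    # standardized features; repeat until every cluster meets the threshold.
    feats = standardize([f for _, _, f in counties])
    clusters = [([n], p, f) for (n, p, _), f in zip(counties, feats)]
    while True:
        small = [c for c in clusters if c[1] < min_pop]
        if not small:
            return [sorted(names) for names, _, _ in clusters]
        names, pop, f = min(small, key=lambda c: c[1])
        clusters.remove((names, pop, f))
        nearest = min(clusters, key=lambda c: math.dist(f, c[2]))
        clusters.remove(nearest)
        w = pop + nearest[1]
        # population-weighted centroid for the merged cluster's features
        centroid = [(pop * a + nearest[1] * b) / w
                    for a, b in zip(f, nearest[2])]
        clusters.append((names + nearest[0], w, centroid))

# Three invented counties: a small county "B" whose characteristics
# resemble "A" more than "C", with a 65,000 population threshold.
data = [("A", 100000, [1.0, 1.0]),
        ("B", 30000, [1.1, 0.9]),
        ("C", 80000, [5.0, 5.0])]
```

Here "B" falls below the threshold and is merged with "A", its nearest neighbor in characteristic space, leaving "C" alone.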



Topic: Mortality Before and After the 2003 Invasion Of Iraq: Cluster Sample Survey

Abstract:

The number of Iraqis dying because of conflict or sanctions since the 1991 Gulf war is uncertain. Claims ranging from a denial of increased mortality to millions of excess deaths have been made. No surveys or census-based estimates of crude mortality have been undertaken in Iraq in more than a decade. In a setting of insecurity and limited availability of health information, a nationwide household survey was conducted using a cluster sample to estimate mortality during the 14.6 months before the invasion (Jan 1, 2002, to March 18, 2003) and to compare it with the period from March 19, 2003, to the date of the interview, between Sept 8 and 20, 2004.

It is estimated that about 100,000 excess deaths or more have occurred since the 2003 invasion of Iraq. This is the basis of the number widely cited in the media and originally reported in the Lancet. Violence accounted for most of the excess deaths and air strikes from coalition forces accounted for most violent deaths. The study demonstrates that collection of public-health information is possible even during periods of extreme violence. The results need further verification and should lead to changes to reduce noncombatant deaths from air strikes.
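The core estimator in such a design can be sketched as a pooled ratio estimate of the crude mortality rate in each period, with the excess obtained by scaling the rate difference to the population. The cluster tallies, population, and follow-up length below are purely illustrative and do not reproduce the study's figures; a real analysis must also account for the cluster design when estimating variance.

```python
def excess_deaths(pre, post, population, years):
    # pre/post: per-cluster (deaths, person_years) tallies for each period.
    # Pooling deaths and exposure across clusters gives a ratio estimate of
    # the crude mortality rate; the excess is the rate difference scaled to
    # the population over the post-invasion follow-up.
    def rate(clusters):
        return sum(d for d, _ in clusters) / sum(py for _, py in clusters)
    return (rate(post) - rate(pre)) * population * years

# Purely illustrative cluster tallies (deaths, person-years), not the
# study's data, for a hypothetical population of one million.
pre = [(2, 400.0), (1, 350.0), (3, 450.0)]
post = [(5, 400.0), (4, 380.0), (6, 420.0)]
ex = excess_deaths(pre, post, 1_000_000, 1.5)
```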


Title: Why Small Changes in Question Wording Can Produce Big Changes in Survey Measurement: Unraveling Some Mysteries of Questionnaire Design with the Theory of Satisficing

Abstract:

Since the 1940s, thousands of experiments have been published showing that small changes in the wording of questions or the ordering of questions or response choices can substantially affect the answers survey respondents provide. But over the years, much more research has documented such effects than has explained the psychological mechanisms responsible for them, describing when and why these effects occur and what to do about them in the pursuit of accurate measurement. This talk will present the theory of survey satisficing, which offers a parsimonious explanation for a range of question wording, structuring, and ordering effects and ties them all to a single psychological mechanism and a single set of variables that are thought to turn these effects on and off. A review of the accumulated social science literature documents wide-ranging empirical support for satisficing theory, which has clear implications for good measurement practice in surveys.

Contact: Rupa Jethwa Eapen, 301-314-7911, rjeapen@survey.umd.edu


Title: Analyzing Call History Records to Achieve Optimal Outcomes: Survey Response and Efficient Use of Resources

Abstract:

Acceptable response rates are only maintained through increased efforts. However, these efforts are often directed by anecdotal evidence and a small core of literature. This research attempts to analyze attempt history information in order to evaluate operational guidelines and better utilize resources. Although data used in the analysis are from a telephone survey, these techniques can be readily adapted to other survey modes, provided the appropriate metadata are collected. The data used in this analysis are from the call history information of the Telephone Point-of-Purchase Survey (TPOPS), a nationally representative, list-assisted, RDD survey conducted by the Census Bureau for the Bureau of Labor Statistics. The data used in this analysis were collected quarterly in the years 2003 and 2004.

Attempts soliciting response to surveys can be modified in a number of ways to improve the probability of a positive outcome: the number of attempts, the time of day (or interview shift) of an attempt, the time lag between attempts, and the priority of attempts to sampling units. As a starting point, various probability distributions are examined that describe contact and interview completion in relation to these attempt variables. It is clear, however, that each attempt is dependent on not only the previous attempt, but also combinations of prior attempts. Conditional distributions are examined leading to logistic and proportional hazard models. Logistic regression and multinomial logistic regression are used to address the likelihood of a positive outcome (contact and interview completion) given call history information. Because the TPOPS is a rotating panel survey, information from previous waves is also utilized in these models. Proportional hazard models are used to examine not only the likelihood of contact or interview completion, but also the time until that event occurs, giving insight into the most efficient avenue to pursue with attempt strategies. Finally, we propose a survival model that estimates optimal time lags by modeling the probability of contact and completion using the complete attempt history. The estimated calling lags can be used to inform call scheduling rules for similar telephone surveys.
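The logistic-regression step described above can be sketched in a few lines. The features (attempt number, an evening-shift flag, hours since the previous attempt) and the coefficients of the simulated data-generating process below are invented for illustration; they are not the TPOPS variables or the study's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical call-history features (all invented for illustration).
n = 2000
attempt = rng.integers(1, 8, n)          # attempt number 1..7
evening = rng.integers(0, 2, n)          # evening-shift indicator
lag_hours = rng.exponential(24.0, n)     # hours since previous attempt

# Assumed data-generating process: contact more likely in the evening,
# less likely at later attempts.
logit = -0.5 + 0.8 * evening - 0.25 * attempt + 0.005 * lag_hours
contact = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

# Fit a logistic regression by Newton-Raphson (IRLS).
X = np.column_stack([np.ones(n), evening, attempt, lag_hours])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (contact - p))

print(beta)  # estimates should be near the assumed coefficients
```

In practice the model would also condition on outcomes of prior attempts and prior waves, as the abstract describes; this sketch shows only the basic likelihood machinery.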


Title: Modeling and Spatial Prediction of Pre-Settlement Patterns of Forest Distribution using Witness Tree Data

Abstract:

Prior to European settlement, land surveys were conducted throughout the United States. These surveys include records of witness trees at grid intersections, providing quantitative information on pre-settlement forest composition and species-site relationships. Such information can provide insight into environmental factors influencing the distributions of each tree species, free from European influences. Assuming that the locations of trees of each species are realized from independent inhomogeneous Poisson processes whose respective log intensities are linear functions of environmental covariates (i.e., elevation, land form, and province), the species observed at the survey-grid intersections are independently sampled from a generalized logistic regression model. A model for all 68 species found in the survey would be highly over-parameterized, so only the distribution of the most common species, longleaf pine, will be considered at this time. To assess the impact of environmental factors not included in the model, a hidden Gaussian Markov random field is added as a random effect. A Markov chain Monte Carlo algorithm is developed for Bayesian inference on model parameters, and Bayes posterior prediction of the distribution of longleaf pine in southeastern Alabama.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


Title: Evaluating Alternative Calibration Schemes for an Economic Survey With Large Nonresponse

Abstract:

The U.S. Department of Agriculture (USDA) conducts an annual economic survey of farm operations called the Cost and Returns Report (CRR). This survey is fairly long and is based on a multiphase sample with several opportunities for nonresponse. As a result, CRR response rates are low by government standards. The USDA has historically "trued up" the estimated number of farms within a region and economic size class using more reliable benchmark aggregates determined from other sources.

The USDA used truncated linear calibration to reweight the 2002 CRR. This allowed the Department to employ a greater number of benchmark aggregates than in the past. Delete-a-group (d-a-g) jackknives were used to measure the accuracy of 2002 CRR estimates. Furthermore, d-a-g jackknife methodology was employed to investigate:

  1. the effects of replacing the (last phase of the) nonresponse adjustment with linear calibration,
  2. the effects of accounting for the area frame component of the CRR with linear calibration of the list frame sample only, and
  3. the effects of using a calibration routine based on a log function in place of a linear one.

Both analyses find that there are characteristics that are correlated with each design feature.

THE NATIONAL ACADEMIES
COMMITTEE ON NATIONAL STATISTICS

Title: Discrepant Estimates of Social and Economic Phenomena: Love 'Em or Leave 'Em?

Panel Session I - How Data Producers Handle Discrepant Estimates

Panel Session II - Views from Downstream Users

Abstract:

Discrepant estimates from two or more sources exist for many key social and economic statistics that are widely used for policy analysis and in policy debates. Just think of the two estimates of monthly job creation that featured prominently in political discourse before the November 2004 elections - one estimate from the CPS household survey, the other estimate from employer payroll records. Discrepancies can arise from differences (major or minor) in concepts, question wording, data collection mode, and other aspects of data collection and processing. In some instances, estimates would not be expected to agree; in other instances, estimates should be close but diverge for such reasons as sampling variability and measurement error differences. Panelists in Session I will briefly describe some discrepant statistics, what is known (and not known) about the reasons for them, what problems they present for statistical agencies, and what agencies have done to educate the media and data users about them. Panelists in Session II will discuss the benefits and problems of discrepant estimates for users of the data for policy analysis, research, and program evaluation. A book of source documents on different estimates will be available at the seminar.


Title: Runs in Bernoulli trials: how much success should we expect?

Abstract:

Consider a (long) sequence of independent Bernoulli trials. We consider first the length of the longest success run. The modern history of this problem goes back to the celebrated Erdos-Renyi Theorem in 1970; we will review some applications of this result, and some generalizations, to DNA sequences. We also look at a related problem which seems to have attracted much less attention: the length of the longest run of successes OR failures in a sequence of Bernoulli trials, or more generally, the longest run of any type in a sequence of multinomial trials.
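A quick simulation illustrates both quantities. The Erdos-Renyi result says the longest success run in n Bernoulli(p) trials grows like log base 1/p of n, about 16.6 for n = 100,000 and p = 1/2; the sequence length and success probability here are chosen only for illustration.

```python
import math
import random

def longest_success_run(bits):
    """Length of the longest run of successes (1s)."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b else 0
        best = max(best, cur)
    return best

def longest_any_run(bits):
    """Length of the longest run of either symbol."""
    best = cur = 1
    for prev, b in zip(bits, bits[1:]):
        cur = cur + 1 if b == prev else 1
        best = max(best, cur)
    return best

random.seed(1)
n, p = 100_000, 0.5
bits = [random.random() < p for _ in range(n)]

# The Erdos-Renyi prediction for the longest success run: log_{1/p}(n).
print(longest_success_run(bits), longest_any_run(bits), math.log(n, 1 / p))
```

The longest run of either symbol is always at least as long as the longest success run, typically longer by about one.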

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


Title: On the Borders of Statistics and Computer Science

Abstract:

Machine learning in computer science and prediction and classification in statistics are essentially equivalent fields. I will try to illustrate the relation between theory and practice in this huge area by a few examples and results. In particular I will try to address an apparent puzzle: Worst case analyses, using empirical process theory, seem to suggest that even for moderate data dimension and reasonable sample sizes good prediction (supervised learning) should be very difficult. On the other hand, practice seems to indicate that even when the number of dimensions is very much higher than the number of observations, we can often do very well. We also discuss a new method of dimension estimation and some features of cross validation.

The Statistics Colloquium Series is open to all and is sponsored by the Department of Applied and Engineering Statistics, the Center for Computational Statistics, the School of Computational Sciences and the Data Sciences Program at George Mason University. Use these links for directions and a campus map. If driving, visitors should use the visitor's parking area in the Parking Deck (near the middle of the map). Signs on campus point the way to the Parking Deck. Visitors using Metro can take a bus from the Vienna Metro Station.


Title: Evolving Information Access/Security Issues

Abstract:

The events of September 11 compelled federal agencies to carefully review the information made available to the public over the Internet in a new light. Prior to 9-11, EPA had some experience with this issue, removing information on facility Risk Management Plans from public access based on security concerns articulated by DOJ. In Fall 2001, EPA initiated a broad review, screening the large array of its databases, tools, and models publicly accessible via the Internet, to assess their potential for misuse. In Spring 2002, the White House initiated a second, broader round of reviews by federal agencies. An ongoing issue has been how to maintain consistency in access decisions across the Agency. In addition to these security issues, the Agency has encountered more complex privacy issues as it seeks greater access to public health data collected by other agencies. Information access and security must be addressed in the context of the federal guidelines on identity management, and NIST's new FIPS guidelines. These physical security systems and decisions may provide solutions for "sensitive" information, but they do not resolve the primary policy issues of what needs protecting.

Heightened information security raises a number of public policy issues, including impacts on public access and potential privacy infringements, as well as increased costs. What has happened as a result of heightened information concerns over the past few years, and what lessons are discernible? e.g., Has any solid federal policy emerged? How important is it that agencies have a unified policy across their various offices/subagencies, and what pressures push toward fractured approaches? To what extent might an identity management framework "solve" these problems? What are the likely effects on access and privacy?


Title: Jackknife and Bootstrap Resampling Methods in Two-Stage Designs

Abstracts:

Empirical Study on the Second Stage Sample Size - Yan Liu, Mary Batcher, Ryan Petska, and Amy Luo

Two-stage stratified sampling is typically done in situations where both the populations and the samples are large. In an audit setting, however, where business records are sampled and reviewed, sampling is typically done on relatively small populations and samples. For this setting, there are two common methods of variance estimation: the classical design-based approach and a resampling approach. The classical design-based approach directly incorporates the second-stage sample size into the variance formula, while the typical resampling approach does not express the second-stage sample size explicitly, although it is implied in the variance formula.

It is known that as the second-stage sample size increases, the overall variance decreases; but how large of a second-stage sample size is "large enough"? In this paper, we will investigate the impact the second-stage sample size has on the overall estimation in different estimation approaches in the two-stage stratified, audit sampling setting.

The Bootstrap Variance Estimator in a Nested Two-stage Sample Design with High Sampling Rates - Steven Kaufman.

When the sampling rates are high, it is important to reflect the finite population correction (FPC) in the variance estimator. With replication methodologies, this can be accomplished by multiplying the replicate weights by an appropriate factor. This is equivalent to multiplying the replication variance by the first-stage FPC. Since this is a simple multiplication, this factor is applied to all variance components. In a single stage sample design, this works quite well because there is only one variance component, the first-stage component, which needs to be multiplied by the first-stage FPC. In multiple stage designs, the second- and subsequent-stage variance components are correct without adjustment. So this adjustment, when applied, will necessarily introduce a bias in the overall variance estimate. With the bootstrap, it is easy to adjust the variance estimator to correct for this bias. However, in this process, it is usually assumed that there are at least two units selected within each stratum for all selection stages. This paper describes how to modify the bootstrap procedure to handle the situation where only one unit is selected within a second and/or subsequent stage.
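The FPC multiplication described above can be illustrated in the single-stage case, where scaling the bootstrap variance by the first-stage FPC approximately recovers the design variance of the mean under simple random sampling without replacement. The population, sample size, and number of replicates below are arbitrary choices for the sketch, not values from the paper.

```python
import random
import statistics

random.seed(2)
N, n = 200, 50                       # small population, high sampling rate
pop = [random.gauss(10, 3) for _ in range(N)]
sample = random.sample(pop, n)

# Design variance of the sample mean under SRSWOR: (1 - f) * s^2 / n
f = n / N
s2 = statistics.variance(sample)
design_var = (1 - f) * s2 / n

# A naive with-replacement bootstrap ignores the FPC ...
B = 5000
boot_means = []
for _ in range(B):
    resample = [random.choice(sample) for _ in range(n)]
    boot_means.append(statistics.fmean(resample))
naive_var = statistics.variance(boot_means)

# ... but multiplying the bootstrap variance by the first-stage FPC
# recovers (approximately) the design variance, as the abstract notes.
adjusted_var = (1 - f) * naive_var
print(design_var, naive_var, adjusted_var)
```

In a multi-stage design this blanket multiplication would also shrink the later-stage components, which is exactly the bias the paper's modified bootstrap corrects.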


Title: Questionnaire Design Methodology for a Study of Human Rights Abuses during the Armed Internal Conflict of Sierra Leone

Abstract:

In order to estimate a count of human rights abuses of various types in Sierra Leone during the 1991-2001 armed internal conflict, a national random sample survey was administered between January and July of 2004 by the American Bar Association. The author served as the technical and administrative coordinator for that project, and used the opportunity to test several questionnaire design methodologies, including scripted probes to elicit time information, pairing of interviewer and respondent by gender, and cognitive interviewing across multiple languages followed by extensive standardization across languages. In this talk, the author will discuss the conflict, the survey project in general, and the questionnaire design methodology in particular. She will then discuss the results of the survey in the context of this methodology, and finally make suggestions as to future uses for the data collected in Sierra Leone.


Title: An Introduction to the American Time Use Survey

Abstract:

The American Time Use Survey (ATUS) collects information on how people living in the United States spend their time. While BLS has long produced statistics about the labor market, such as employment, hours, and earnings, the ATUS marks the first time that a Federal statistical agency has produced estimates on how Americans spend another critical resource: their time. Estimates show the kinds of activities people do and the time spent doing them by sex, age, educational attainment, labor force status, and other characteristics, as well as by weekday and weekend day. The possibilities for using ATUS data are extremely broad, and this seminar is designed to introduce researchers and policymakers to the ATUS and to illustrate some of the questions that can be answered using ATUS data. This seminar will provide a brief overview of the ATUS, describe what data are collected, and present results from some of the early research. A question and answer period will follow.


Title: An Assessment of the Comparative Accuracy of Time Series Forecasts of Patent Filings: The Benefits of Disaggregation in Space or Time

Abstract:

This work with the European Patent Office studies methods for forecasting the filing of patents. The filings are subdivided by regional blocs and industries. Issues addressed are: benefits of multivariate models versus univariate in exploiting correlations between filings in different blocs or industries; effects of aggregation over time and effects of aggregation by bloc or industry on forecast accuracy of total EPO filings. Two approaches are used: the ARIMA framework and the dynamic linear model (DLM) in both univariate and multivariate modes.

We find that monthly data tend to provide greater accuracy in annual forecasts, and that no significant benefits are gained from multivariate modelling or from aggregating over blocs or industries. When monthly data are used, the best modelling approach is the univariate DLM; for annual data, either the univariate ARIMA or the DLM could be used.

The recommended forecasting approach provides a benchmark against which other forecasts drawing on different data sources can be compared.


Title: The Fourth Funding Opportunity in Survey and Statistical Research Seminar

Registration:

There is no registration fee. If you plan to attend, please e-mail SDockery@CDC.gov by May 31 if possible, to guarantee seating, help with planning refreshments, and to be put on the BLS seminar attendance list.

If you are not registered by May 31, please e-mail your name, affiliation, and the name of this seminar to wss_seminar@BLS.gov (underscore after wss) by noon June 9 or call 202-691-7524 and leave message.

Abstract:

Since 1998, 12 Federal statistical agencies, in collaboration with the National Science Foundation and with the support of the Federal Committee on Statistical Methodology, have been funding and administering the Funding Opportunity in Survey and Statistical Research, a research grants program oriented to the needs of the Federal Statistical System. The Fourth Funding Opportunity Seminar features the reports of the principal investigators of 3 research projects that were funded in 2003, and invited speakers and discussants.

Agenda:

8:45 a.m. - Continental Breakfast
9:00 a.m. - Welcoming Remarks, Brian Harris-Kojetin, OMB
9:10 a.m. - Session 1. Future Directions of Total Error Research
Invited Speaker: Paul Biemer, RTI
10:00 a.m. - Session 2. Topics in Small Area Estimation
Investigators: Malay Ghosh, University of Florida & Tapabrata Maiti, Iowa State University
Discussant: Jerry J. Maples, USCB
11:00 a.m. - Refreshment Break
11:15 a.m. - Session 3. Improved Methods of Estimating Production and Income Across Nations
Investigators: Alan Heston, University of Pennsylvania & Robert Feenstra, University of California-Davis
Discussant: Raymond Mataloni Jr., BEA
12:15 p.m. - Lunch on your own
1:30 p.m. Session 4. Regression and Deconvolution with Heteroscedastic Measurement Error
Investigator: Leonard Stefanski, North Carolina State University
Discussant: Stephen M. Miller, BLS
2.30 p.m. - Refreshment Break
2:45 p.m. - Session 5. Remarks on the Funding Opportunity, Past and Future
Speaker: Robert E. Fay, USCB

Title: Household Telephone Service and Usage Patterns in the United States in 2004

Abstract:

Recent changes in the U.S. telephone system (especially the growing reliance on cell phones) have led to concern about coverage error and productivity in telephone surveys. In 2004, a supplement to the Current Population Survey on telephone service was conducted. This talk will present some of the more interesting results from this supplement with respect to cell phone usage. There also will be a discussion of problems associated with asking questions about household telephone service. In addition, a recent study comparing survey results from a sample of cell phones and a sample of landlines will be discussed. This discussion will include a description of procedures used to improve response rates among the cell phone respondents as well as weighting and estimation issues. Finally, an explanation will be given for why it is unlikely that the major Federal surveys will ever be conducted by telephone.


Topic: Using MASSC to Minimize Information Loss and Control Disclosure Risk

Abstract:

This presentation describes a statistical disclosure limitation methodology, applicable to microdata, that is designed to control the information loss that occurs as a result of the disclosure treatment. This methodology, developed by RTI and known as MASSC, provides a statistical process that relies partly on random perturbation and partly on random suppression, thus limiting the introduction of bias and variance. With this stochastic framework, MASSC introduces sufficient uncertainty about the presence and identity of a target. As a result, sensitive databases previously unavailable because of confidentiality concerns can be treated with MASSC and made available to researchers. MASSC enables the control of disclosure risk and information loss without modeling assumptions for both tabular and micro data. In addition, standard software for the analysis of survey data may be used to analyze MASSC-treated data sets.

In order to examine the real-world performance of MASSC, we will discuss the application of MASSC to data derived from the National Health Interview Survey (NHIS). The NHIS, conducted by the National Center for Health Statistics, provides information on health-related outcomes in sampled families and individuals. Protecting the confidentiality of families and individuals in the family is of great concern when releasing public use files because they contain confidential information on health-related characteristics. The impact of MASSC treatment will be examined by comparing estimates derived from the 2000 NHIS public use data with estimates derived from the data set produced by applying MASSC to the 2000 NHIS public use data. In particular, the impact of MASSC treatment on a variety of totals, standard errors, proportions, and regression coefficients will be discussed.

Title: User and Network Profiling

Abstract:

We address two types of problems which come up in many information assurance / cyber security contexts. One is to determine identity or intent by indirect methods. For example, we want to know who a computer user is, or whether a program is good or ill, by observing not what they assert about themselves but how they behave. The other is to determine whether aggregate behavior is normal or abnormal. For example, we want to know whether the traffic coming into and going out of our network represents normal activity or not.

In each of these problems the underlying data has both numerical and categorical components and varies with time. Numerical and categorical data collected over intervals of time can be interpreted as collections of text documents. The success of many text document techniques depends on the representations of the data having sparseness, information localization, and near-orthogonality. This suggests that techniques used in text classification and analysis may be useful in other problems whose representations as text documents have these properties.

We describe two experiments. In the first the goal is to identify computer users by their behavior, as evidenced by sequences of window titles from login sessions. In the second the goal is to create a process for identifying abnormal or anomalous network behavior using data from network packet headers.


Title: Statistical Analysis of Ultrasound Images of Tongue Contours during Speech

Abstract:

The shape and movement of the tongue are critical in the formation of human speech. Modern imaging techniques allow scientists to study tongue shape and movement without interfering with speech. This presentation describes statistical issues arising from ultrasound imaging of tongue contour data.

There are many sources of variability in tongue image data, including speaker to speaker differences, intraspeaker differences, noise in the images, and other measurement problems. To make matters worse, the tongue is supported entirely by soft tissue, so no fixed co-ordinate system is available. Statistical methods to deal with these problems are presented.

The goal of the research is to associate tongue shapes and sound production. Principal component analysis is used to reduce the dimensionality of the contour data. Combinations of two basic shapes accurately represent tongue contours. The results are physiologically meaningful and correspond well to actual speech activity. The methods are applied to a sample of 16 subjects, each producing four vowel sounds. It was found that principal components clearly distinguish vowels based on tongue contours.
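A minimal sketch of the dimension-reduction step: synthetic contours are built from two assumed basic shapes (a bowl and a tilt, invented here, not the shapes found in the study) plus measurement noise, and PCA via the SVD confirms that two components capture almost all of the variation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: each contour is sampled at 30 points and built from
# two basic shapes, echoing the abstract's two-shape finding.
x = np.linspace(0, 1, 30)
shape_bowl = (x - 0.5) ** 2
shape_tilt = x - 0.5

n_contours = 200
coefs = rng.normal(size=(n_contours, 2))
contours = coefs @ np.vstack([shape_bowl, shape_tilt])
contours += rng.normal(scale=0.01, size=contours.shape)   # measurement noise

# PCA via SVD of the centered data matrix.
centered = contours - contours.mean(axis=0)
_, svals, _ = np.linalg.svd(centered, full_matrices=False)
explained = svals**2 / (svals**2).sum()
print(explained[:3])  # first two components dominate
```

With real tongue data the component scores, rather than the raw contours, would then be related to the vowels being produced.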

We also investigate whether speakers fall into distinct groups on the basis of their tongue contours. Cluster analysis is used to identify possible groupings, but many variants of this technique are possible and the results are sometimes conflicting. Methods to compare multiple cluster analyses are suggested and applied to tongue contour data to assess the meaning of apparent speaker clusters.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.


Title: Estimation Following a Group Sequential Test for Distributions in the One-Parameter Exponential Family

Abstract:

We consider unbiased estimation following a group sequential test for distributions in a one-parameter exponential family. We show that, for an estimable parameter function, there exists uniquely an unbiased estimator depending on the sufficient statistic and based on the truncation-adaptation criterion (Liu and Hall (1999)); moreover, this estimator is identical to one based on the Rao-Blackwell method. When completeness fails, we show that the uniformly minimum-variance unbiased estimator may not exist or might possess undesirable performance. A Phase-II clinical trial application with exponentially distributed responses is included.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


Title: The Dynamics of Nonmetro Wage Income Distribution: A Model-Based Approach

Abstract:

The dispersion of nonmetro wage incomes has increased in the last four decades. The conventional wisdom in the labor economics literature is that this fact, together with the fact that the relative frequency of large nonmetro wage incomes has increased, indicates a trend toward greater wage income inequality caused by the "hollowing out" of the wage income distribution into groups of wage rich and wage poor workers with few in between. In fact, all percentiles of wage incomes increase by roughly the same proportion when mean nonmetro wage income increases, which it did in the last four decades. The main exception is small wage income percentiles, which increase by a somewhat larger proportion. This transformation necessarily increases the dispersion of nonmetro wage incomes and greatly increases the relative frequency of very large incomes. This transformation is a stretching of the distribution over larger wage incomes, not the emergence of a hollowed out distribution. This transformation has happened without a substantial increase in the Gini concentration ratio of wage incomes. From the perspective of the conventional wisdom of the labor economics literature, a tax subsidy strategy for rural economic development will only exacerbate inequality and do very little for low wage workers. The new model-based findings reported in this talk imply that while direct subsidy to low wage workers may certainly help those particular nonmetro low wage workers, the way to raise small nonmetro wage income percentiles is to raise the nonmetro mean of wage income, and a tax subsidy strategy might be optimal for that goal.


Title: Testable and Model-Preserving Parametric Constraints in a Parametric Probability Model

Abstract:

We shall first introduce two basic concepts, namely "testable" parametric constraints and "model-preserving" parametric constraints. Some general results on these concepts are presented along with some examples to illustrate them. It will be noted that even when a hypothesis is stated in terms of a non-testable parametric constraint, some statistical inference can still be drawn on this by considering the "equivalent" testable version of this hypothesis.

Next, some of the results are specialized for the linear model. Lastly, a necessary and sufficient condition is obtained for a linear parametric constraint to be model-preserving.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


Title: Panel on Privacy and Public Perception of Risks

Abstract:

This panel will discuss a variety of views on key privacy issues facing the research community today. David Banks will talk about data confidentiality in transportation applications, and the cost-benefit issues that arise. He will also touch on data mining issues, especially in the context of counterterrorism. Gerald Gates will highlight work done by the American Statistical Association to describe the nature of data mining as a statistical tool and ethical issues that may arise. This work was prompted by media characterization of statisticians, who assist data mining, as unconcerned with privacy threats. Through a series of Frequently Asked Questions, the association responds to concerns about the appropriate use of data mining, the rights of persons whose data are mined, the role of the statistician in the data mining process, and methods to ensure privacy when mining data. Ari Schwartz will discuss his work on increasing privacy protection and increasing individuals' access to government information via the internet. Brian Dautch will speak about privacy legislation that has been introduced or passed by the 109th Congress. Many of these bills impact people's individual privacy rights, and some of them were even driven by people's increasing privacy demands and expectations. Herb Lin will pull together some common themes from the panelist discussion and touch on some of his research on privacy and technology for the Computer Science and Telecommunications Board, National Research Council of the National Academies.


Title: Topic related to modeling of age at first marriage in the US population

Abstract:

Do recent decreases in marriage rates mean that more women are forgoing marriage, or that women are simply marrying at later ages? Recently published demographic projections from standard nuptiality models suggest that changes in marriage rates have different implications for women of different social classes, producing an "education crossover" in which four-year college graduate women have become more likely to marry than other women in the US, instead of less likely, as has been the case for at least a century. To test these findings, I develop a new projection technique that predicts the proportion of women marrying by age 45 under flexible assumptions about trends in age-specific marriage rates and effects of unmeasured heterogeneity. Results from the 1996 and 2001 Surveys of Income and Program Participation suggest that the "crossover" in marriage by educational attainment is either not happening or is taking much longer than predicted. Also, recent trends are broadly consistent with an ongoing slow decline in proportions of women ever marrying, although that decline is less pronounced in the last decade than in previous decades.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.


TITLE: Balanced Sampling with Applications to Accounting Populations

Abstract:

Weighted balanced sampling is a way of restricting the configuration of sample units that can be selected from a finite population. This method can be extremely efficient under certain types of structural models that are reasonable in some accounting problems. We review theoretical results that support weighted balancing, compare different methods of selecting weighted balanced samples, and give some practical examples. Where appropriate, balancing can meet precision goals with small samples and can be robust to some types of model misspecification. The variance that can be achieved is closely related to the Godambe-Joshi lower bound from design-based theory.

One of the methods of selecting these samples is restricted randomization in which "off-balance" samples are rejected if selected. Another is deep stratification in which strata are formed based on a function of a single auxiliary and one or two units are selected with equal probability from each stratum. For both methods, inclusion probabilities can be computed and design-based inference done if desired.

Simulation results will be presented to compare results from balanced samples with ones selected in more traditional ways.
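As a toy sketch of the restricted-randomization idea described above (the population, sample size, and tolerance rule are all hypothetical), one can draw simple random samples repeatedly and reject any "off-balance" draw whose auxiliary mean strays from the population mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical accounting population: skewed book values x as the auxiliary.
N, n = 1000, 50
x = rng.lognormal(mean=5.0, sigma=1.0, size=N)

def balanced_sample(x, n, tol=0.02, rng=rng, max_tries=10_000):
    """Restricted randomization: draw simple random samples and reject any
    sample whose auxiliary mean differs from the population mean by more
    than tol (relative). tol is an arbitrary illustrative threshold."""
    for _ in range(max_tries):
        s = rng.choice(x.size, size=n, replace=False)
        if abs(x[s].mean() / x.mean() - 1) <= tol:
            return s
    raise RuntimeError("no balanced sample found; loosen tol")

s = balanced_sample(x, n)
assert abs(x[s].mean() / x.mean() - 1) <= 0.02
```

Because each accepted sample is one of the equally likely simple random samples passing the balance test, inclusion probabilities can in principle be computed for design-based inference, as the abstract notes.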

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.


Title: Testing Degenerate Tensors in Diffusion Tensor Images

Abstract:

Diffusion tensor (DT) images are used to map accurately the structure and orientation of fiber tracts in the white matter of the human brain in vivo. The directional dependence of diffusion is characterized by a matrix of the effective diffusion of water, denoted by D. Tractography algorithms have been developed to connect consecutive directions of maximal diffusion in order to reconstruct white matter tracts in the human brain in vivo. However, the performance of these algorithms is strongly influenced by the amount of noise in the images and by the number and prevalence of regions of degeneracy in the brain where maximal diffusion is poorly defined. We propose a simple procedure for searching for degeneracy that uses rigorous test statistics based on invariant measures of diffusion tensors, such as fractional anisotropy. Our procedure effectively identifies singularities while accounting for the effects of noise. Examining DT images in human subjects, we demonstrate that this new procedure readily classifies diffusion tensors at each voxel into standard types (nondegenerate, oblate, prolate, and isotropic) without resorting to tensor characteristics at neighboring voxels. We also study the effects of singularities on the reconstruction of fiber tracts in specific anatomical regions.
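A minimal sketch of the classification idea, using simple eigenvalue rules with an arbitrary threshold rather than the rigorous noise-aware test statistics of the talk:

```python
import numpy as np

def classify_tensor(D, tol=0.05):
    """Toy classifier (not the authors' test procedure): label a diffusion
    tensor by its sorted eigenvalues l1 >= l2 >= l3, using fractional
    anisotropy (FA) and an arbitrary relative threshold tol."""
    lam = np.sort(np.linalg.eigvalsh(D))[::-1]
    mean = lam.mean()
    # fractional anisotropy: normalized dispersion of the eigenvalues
    fa = np.sqrt(1.5 * np.sum((lam - mean) ** 2) / np.sum(lam ** 2))
    if fa < tol:
        return "isotropic"
    if np.isclose(lam[0], lam[1], rtol=tol):   # two large, one small
        return "oblate"
    if np.isclose(lam[1], lam[2], rtol=tol):   # one large, two small
        return "prolate"
    return "nondegenerate"

print(classify_tensor(np.diag([1.0, 1.0, 1.0])))   # isotropic
print(classify_tensor(np.diag([2.0, 0.5, 0.5])))   # prolate
```

In the oblate and prolate (degenerate) cases the direction of maximal diffusion is poorly defined, which is what makes these voxels problematic for tractography.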

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


Title: Should the RCT Model Be a "Gold Standard" for Social Policy Research?

Abstract:

Randomized clinical trials (RCTs) are deemed the gold standard for evaluating the impact of medical treatments. But can the RCT model be transferred to test the impact of social policy interventions? Issues to be discussed include blinding/lack of blinding, reliance on subject self reports, and the role of social interaction.


WSS SEMINAR AND JULIUS SHISKIN AWARD PRESENTATION

Title: Approaches to Constructing a Consumer Price Index and the New International CPI Manual

Abstract:

The paper outlines the approach statistical agencies used in constructing a Consumer Price Index prior to the appearance of the new international CPI Manual and explains why new approaches were needed. The paper explains the main theoretical approaches to index number theory and how they converge on just a few index number formulae that are "best" for each approach. Fortunately, these "best" formulae all closely approximate each other, so national statistical agencies need not choose between the competing theoretical approaches in order to define a "target" index concept for their CPI. The paper concludes with a list of six main problems associated with constructing a CPI.
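As a toy numerical illustration of how the classic formulae relate (hypothetical prices and quantities for three items), the Fisher index is the geometric mean of the Laspeyres and Paasche indexes and always lies between them:

```python
import math

# Hypothetical two-period price/quantity data for three items.
p0, q0 = [1.00, 2.00, 3.00], [10, 5, 2]
p1, q1 = [1.10, 2.20, 2.70], [9, 5, 3]

def laspeyres(p0, p1, q0):
    # prices updated, base-period quantities held fixed
    return sum(a * q for a, q in zip(p1, q0)) / sum(a * q for a, q in zip(p0, q0))

def paasche(p0, p1, q1):
    # prices updated, current-period quantities held fixed
    return sum(a * q for a, q in zip(p1, q1)) / sum(a * q for a, q in zip(p0, q1))

def fisher(p0, p1, q0, q1):
    # geometric mean of the two, a "superlative" index
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

L, P = laspeyres(p0, p1, q0), paasche(p0, p1, q1)
F = fisher(p0, p1, q0, q1)
assert min(L, P) <= F <= max(L, P)
```

The close agreement of such formulae on typical data is what allows agencies to sidestep choosing among the competing theoretical approaches.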

Shiskin Award: Following the seminar presentation, the Julius Shiskin Award of 2005 will be presented to Professor Diewert, for path-breaking economic theoretical innovations, notably in index number theory, adapted to improve national economic statistics around the world. He has contributed original theoretical work in a wide range of fields, from measurement of capital services to analyses of productivity to applications of duality theory and flexible functional forms. Professor Diewert is a leading economic and statistical theorist who is dedicated to improving economic statistics throughout the world. The Julius Shiskin Award was intended to honor original and important contributions in the development of economic statistics and in their use in interpreting economic events, and is jointly sponsored by the Washington Statistical Society and the National Association of Business Economists.

Please join the Washington Statistical Society on October 20, 2005, at 3:30 p.m. to honor W. Erwin Diewert as we present the award to him and celebrate at a reception following the presentation.


Title: Best Practices In Estimating And Reporting Nonresponse Bias

Abstract:

Declining response rates have prompted increasing concern among statistical agencies about potential nonresponse bias. Since most surveys have little information about those who do not respond, estimating the effect of nonresponse on key estimates can be difficult. This presentation illustrates some methods that may be useful in studying the impact of nonresponse in household surveys, using the CPS as an example. Every survey is different, so the appropriate methods depend on the nature of the nonresponse and on the estimates of interest. The focus of the talk is on using multiple methods together with a sensitivity analysis to show what a plausible effect of nonresponse might be. Data from the CPS Census match study will serve as the benchmark to evaluate and compare the different methods for nonresponse bias analysis.


TITLE: Stochastic Variants of EM: Monte Carlo, Quasi-Monte Carlo, and More

Abstract:

We review recent advances in stochastic implementations of the EM algorithm, beginning with the Ascent-based Monte Carlo EM algorithm, a new automated version of Monte Carlo EM based on EM's likelihood ascent property. We discuss more efficient implementations via quasi-Monte Carlo sampling, and we revisit the older stochastic approximation version of EM through a new implementation. We illustrate some of the methods on a geostatistical model of online purchases.
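A minimal sketch of a Monte Carlo E-step, using a simple censored-normal example rather than the ascent-based algorithm or geostatistical model of the talk: the intractable conditional expectation in the E-step is replaced by an average over simulated draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: estimate the mean of N(mu, 1) data right-censored at c.
true_mu, c = 2.0, 3.0
y = rng.normal(true_mu, 1.0, 500)
y_obs, n_cens = y[y <= c], int((y > c).sum())

def sample_truncated(mu, c, size, rng):
    # rejection sampler for N(mu, 1) restricted to (c, inf)
    out = np.empty(0)
    while out.size < size:
        cand = rng.normal(mu, 1.0, max(size * 20, 1000))
        out = np.concatenate([out, cand[cand > c]])
    return out[:size]

mu = y_obs.mean()                      # crude starting value, biased low
for _ in range(30):
    # Monte Carlo E-step: estimate E[z | z > c] under the current mu
    draws = sample_truncated(mu, c, n_cens * 500, rng)
    e_cens = draws.mean() * n_cens     # sum of expected censored values
    # M-step: complete-data MLE of mu
    mu = (y_obs.sum() + e_cens) / (y_obs.size + n_cens)

print(round(mu, 2))
```

The ascent-based variant discussed in the talk additionally tunes the Monte Carlo sample size so that each iteration increases the likelihood with high probability.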

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.


Title: A Lifetime in Official Statistics — A Question & Answer Period with Dr. Ivan Fellegi, Chief Statistician, Statistics Canada

Abstract:

Dr. Ivan Fellegi was appointed Chief Statistician of Canada in 1985. He has served Statistics Canada since 1957 in positions of increasing responsibility. He chaired the Conference of European Statisticians of the United Nations Economic Commission for Europe (ECE), 1993-97. He has been President of a number of statistical bodies including the International Statistical Institute, the International Association of Survey Statisticians, and the Statistical Society of Canada. In 1978 he was appointed to the Commission on the Reorganization of the US Statistical System, established by President Carter. He has served on panels of the National Academy of Sciences. In 1997, he was awarded the Gold Medal by the Statistical Society of Canada and awarded the Robert Schuman medal by the European Community. Dr. Fellegi has published extensively on statistical methods, on the social and economic applications of statistics and on the successful management of statistical agencies. Dr. Fellegi will discuss the importance of mission and leadership in statistical organizations and will answer related questions.

For more information, and to gain access to the Census Bureau for this event, contact Yves Thibaudeau at (301)763-1906 or at yves.thibaudeau@census.gov by Tuesday, October 25.


THE NATIONAL ACADEMIES
COMMITTEE ON NATIONAL STATISTICS

Title: How Can We Conduct Telephone Surveys in a Cell Phone Age?

Abstract:

The increasing use of cell phones is challenging traditional telephone survey data collection methods in many respects. Seminar speakers will address important elements of this emerging set of issues. Clyde Tucker will discuss what has been learned from such sources as the Current Population Survey cell phone supplement, the National Health Interview Survey, and the Consumer Expenditure Survey about cell phone prevalence and usage patterns (for example, as an add-on or replacement for land-line phones). Mike Brick will discuss the implications of growing cell phone usage (survey nonresponse rates and bias from differential nonresponse) as well as experience with methods that attempt to compensate for bias. Bob Groves will address the broad range of policy issues that cell phones present for survey researchers (for example, incentives and confidentiality protection) and outline changes in survey designs that may be needed in the future, specifically mixed-mode designs that use telephone and personal interviews. He will also contrast the U.S. situation with Finland's 10 years of experience in conducting cell phone surveys. A book of background materials will be available at the seminar.


Topic: Methods for Re-identifying and Analyzing Masked Microdata

Abstract:

Publicly available microdata are a valuable resource for data mining and statistical analysis. Microdata are masked to protect the privacy of individuals, and the resultant file should remain suitable for one or two analytic purposes. This overview begins by describing a number of methods for masking microdata. It then describes re-identification methods, such as record linkage and nearest-neighbor matching, that are based on analytic understanding of the populations and variables. To support even one or two analyses, masked microdata need to approximately reproduce distributional properties or certain aggregates, and the producer should describe the limitations of the masked microdata. Original, non-public microdata may themselves be of poor quality; any resultant masked microdata will likely be of lesser quality and not suitable for most analyses. Background papers are available at http://www.census.gov/srd/papers/pdf/rrs2004-06.pdf and http://www.census.gov/srd/papers/pdf/rrs2004-03.pdf. A general list of references for microdata confidentiality is available at http://www.niss.org/affiliates/totalsurveyerrorworkshop200503/tse_presentations.html
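A toy nearest-neighbor re-identification experiment (synthetic data, with additive-noise masking assumed purely for illustration) shows why naive masking can leave records linkable to the originals:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic microdata: n records on p numeric variables.
n, p = 200, 3
original = rng.normal(size=(n, p))
masked = original + rng.normal(scale=0.1, size=(n, p))  # additive-noise masking

# Nearest-neighbor linkage: match each masked record to its closest original.
dists = np.linalg.norm(masked[:, None, :] - original[None, :, :], axis=2)
match = dists.argmin(axis=1)
reid_rate = (match == np.arange(n)).mean()
print(f"re-identified {reid_rate:.0%} of records")
```

With noise small relative to the spread between records, most masked records link back correctly; stronger masking lowers the re-identification rate but also degrades the analytic usefulness the abstract discusses.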

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.


Title: Augmented designs to assess immune response in vaccine trials

Abstract:

This paper introduces methods for use in vaccine clinical trials to help determine if the immune response to a vaccine is actually causing a reduction in the infection rate. This is not easy because immune response to the (say, HIV) vaccine is only observed in the HIV vaccine arm. If we knew what the HIV-specific immune response in placebo recipients would have been, had they been vaccinated, this immune response could be treated essentially like a baseline covariate and an interaction with treatment could be evaluated. Relatedly, the rate of infection by this baseline covariate could be compared between the two groups, and a causative role of immune response would be supported if infection risk decreased with increasing HIV immune response only in the vaccine group. We introduce two methods for inferring this HIV-specific immune response. The first involves vaccinating everyone before baseline with an irrelevant vaccine, e.g., rabies. Randomization ensures that the relationship between the immune responses to the rabies and HIV vaccines observed in the vaccine group is the same as what would have been seen in the placebo group. We infer a placebo volunteer's response to the HIV vaccine using their rabies response and a prediction model from the vaccine group. The second method entails vaccinating all uninfected placebo patients at the closeout of the trial with the HIV vaccine and recording immune response. We pretend this immune response at closeout is what they would have had at baseline. We can then infer what the distribution of immune response among infected placebo recipients would have been. Such designs may help elucidate the role of immune response in preventing infections. More pointedly, they could be helpful in the decision to improve or abandon an HIV vaccine with mediocre performance in a phase III trial.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


FIFTEENTH ANNUAL MORRIS HANSEN LECTURE

Title: Causal Inference Through Potential Outcomes: Application to Quality of Life Studies with 'Censoring' Due to Death and to Studies of the Effect of Job-Training Programs on Wages

Abstract:

Causal inference is best understood using potential outcomes, which include all post-treatment quantities. The use of potential outcomes to define causal effects is particularly important in more complex settings, such as observational studies or randomized experiments with complications like noncompliance. This lecture deals with the issue of estimating the causal effect of a treatment on a primary outcome that is "censored" by an intermediate outcome, for example, the effect of a drug treatment on Quality of Life (QOL) in a randomized experiment where some of the patients die before their QOL can be assessed. Because both QOL and death are post-randomization quantities, they both should be considered potential outcomes, and the effect of treatment versus control on QOL is only well-defined for the subset of patients who would live under either treatment or control. Another application is to an educational program designed to increase final test scores, which are not defined for those who drop out of school before taking the test. A further application is to studies of the effect of job-training programs on wages, where wages are only defined for those who are employed, and thus the effect of the job-training program on wages is only well-defined for the subset of individuals who would be employed whether or not they were trained. Some empirical results are presented from Zhang, Rubin, and Mealli (2004), which indicate that this framework can lead to new insights because the analysis is not predicated on traditional econometric assumptions.

About the lecturer:

Donald B. Rubin is the John L. Loeb Professor of Statistics and former Chairman of the Department of Statistics at Harvard University, where he has taught for over 20 years. Professor Rubin has over 300 publications, including several books, on a variety of topics, including causal inference, missing data, sample surveys, computational methods, Bayesian statistics, and applications in many areas of social and biomedical science; and he is among the most highly cited mathematical scientists in the world. Among his many honors and awards, he is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science, a past John Simon Guggenheim Fellow, a member of the International Statistical Institute and the American Academy of Arts and Sciences, a past Fisher Lecturer at the Joint Statistical Meetings, and a recipient of two of the most prestigious awards available to statisticians: the Samuel S. Wilks Medal of the American Statistical Association and the Emanuel and Carol Parzen Prize for Statistical Innovation. Professor Rubin holds an A.B. degree (psychology) from Princeton University, and M.S. (computer science) and Ph.D. (statistics) degrees from Harvard.


Title: Composite Likelihood Inference in Spatial Generalized Linear Mixed Models

Abstract:

Spatial GLMMs (Diggle et al., 1998) are flexible models for applications with observations of spatially dependent, non-Gaussian random variables. As in a standard GLMM (Breslow and Clayton, 1993), the observations are conditionally independent given the random effects, here modeled by a Gaussian random field, and follow a generalized linear model. In a number of applications, neither Bayesian nor maximum likelihood approaches appear practical for large sets of correlated data. To gain computational efficiency, one may approximate the objective function: instead of the likelihood, we consider a composite likelihood (Lindsay, 1988), which is the product of likelihoods for subsets of data, and estimate parameters by maximizing this product. The asymptotic properties of such estimators will be outlined, and the methods will be applied to data from an experiment on aberrant crypt foci (ACF), which are precursors of colon cancer.
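A minimal sketch of a pairwise composite likelihood, using a Gaussian AR(1) series rather than a spatial GLMM: the objective is the sum of bivariate normal log-likelihoods over adjacent pairs only, never the full joint density.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a stationary AR(1) series with unit variance and lag-1 correlation rho.
rho_true, n = 0.6, 5000
z = np.empty(n)
z[0] = rng.standard_normal()
for t in range(1, n):
    z[t] = rho_true * z[t - 1] + np.sqrt(1 - rho_true**2) * rng.standard_normal()

x, y = z[:-1], z[1:]   # all adjacent pairs

def pairwise_loglik(r):
    # log density of a standard bivariate normal with correlation r,
    # summed over neighboring pairs (additive constants dropped)
    return np.sum(-0.5 * np.log(1 - r**2)
                  - (x**2 - 2 * r * x * y + y**2) / (2 * (1 - r**2)))

# Maximize the composite likelihood over a grid of candidate correlations.
grid = np.linspace(-0.95, 0.95, 381)
rho_hat = grid[np.argmax([pairwise_loglik(r) for r in grid])]
print(round(rho_hat, 2))
```

Each pairwise term is cheap to evaluate, which is the computational appeal for large correlated data sets; the cost is some loss of efficiency relative to the full likelihood.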

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


Title: Empirical Bayes Analysis of Bivariate Binary Data: An Application to Small Area Estimation

Abstract:

The paper provides an empirical Bayes (EB) analysis of bivariate binary data with application to small area estimation. Small area estimation is gaining increasing prominence in survey methodology, and the need for such statistics is felt in both the public and private sectors. The reason behind its success is that the same survey data, originally targeted toward a higher level of geography (e.g., states), must also be used to produce estimates at a lower level of aggregation (e.g., counties, subcounties, or census tracts). The direct estimates in such cases may be unavailable (e.g., due to zero sample size) and are almost always unreliable because of the large standard errors and coefficients of variation arising from the paucity of samples in individual areas.

The motivating example in this study is to estimate jointly the proportion of newborns with low birthweight and the infant mortality rate at low levels of geography, such as districts within a state. The data come from an infant mortality study conducted by NCHS. The original survey was designed to obtain reliable estimates at the state level; the same data need to be used to produce estimates at the district level. We have used an EB approach for the analysis of such data. We have found second order correct approximations of the mean squared errors (MSEs) of these estimators, and have derived estimators of these MSEs which are also correct up to the second order. The methodology is illustrated with real data on low birthweight and infant mortality.
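A toy univariate sketch of EB shrinkage for area-level proportions (a beta-binomial model with synthetic data, far simpler than the paper's bivariate analysis): areas with tiny samples borrow strength from the distribution across all areas.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic small areas (e.g., districts) with very unequal sample sizes.
k = 40
p_true = rng.beta(4, 36, size=k)       # true area proportions, around 10%
n = rng.integers(5, 60, size=k)        # area sample sizes
y = rng.binomial(n, p_true)

# Method-of-moments fit of a Beta(a, b) prior from the direct estimates.
phat = y / n
m, v = phat.mean(), phat.var()
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common

# EB estimate: posterior mean, shrinking each direct estimate toward the
# prior mean in proportion to how little data the area has.
eb = (y + a) / (n + a + b)

print(f"direct MAE {np.mean(np.abs(phat - p_true)):.3f}, "
      f"EB MAE {np.mean(np.abs(eb - p_true)):.3f}")
```

Each EB estimate is a weighted average of the direct estimate and the prior mean with weight n/(n + a + b), so areas with small n are pulled furthest toward the overall level; the paper's contribution is the bivariate extension and second-order-correct MSE estimation.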


Title: Bayesian Social Network Models with Acute Outcomes

Abstract:

I begin by introducing the concept of DALYs, Disability Adjusted Life Years, and indicate that alcohol abuse is a major risk factor in public health. I give an illustration of exploratory analysis of the consequences of alcohol abuse using some contemporary visualization software. Alcohol abuse also leads to violence-related acute outcomes for both society and individuals. Among these, I identify DWI crashes with fatalities, assault and battery, suicide, murder, sexual assault, domestic violence, and child abuse. Alcohol abusers are embedded in a social network that involves the user, family and friends, producers and distributors of alcohol products, law enforcement, the judiciary, remediation, education, and detox and treatment facilities, which are coupled to insurance and managed-care programs. This complex network is reminiscent of more traditional biological ecology systems, hence the name. The basic idea is to formulate a model of this network with the goal of exploring short- and long-term interventions that reduce the overall probability of acute outcomes. The framework being pursued is a dynamic agent-based simulation. The basic model is a stochastic directed-graph model that follows agents (sometimes referred to as actors or individuals) through a 24-hour period. The stochastic directed graph has two major features that are being developed. First, I engage in what I call scenario development: developing scenarios of typical behaviors throughout a day for nonusers, casual drinkers, alcohol abusers, and alcoholics. Associated with these scenarios, I am developing methods for estimating the transition probabilities from state to state during the day, reflecting different behaviors and specific to both ethnic group and geographic location. It is clear that models of this type can be used to investigate negative effects of drugs after FDA approval.
Also, it is clear that a similar model structure of social networks can be applied to terrorist networks with the same ability to examine interventions in order to assess their effectiveness.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.


Topic: A Comparative Study of Complex Households in Six Race/Ethnic Groups, with Implications for Censuses and Surveys

Abstract:

What commonalities link Navajos in their vast Arizona reservation with urban African Americans in Virginia and rural whites in upstate New York? More than you'd suspect when they live in complex households that include people other than nuclear kin. This session presents the final results of a unique collaborative, cross-disciplinary study of complex households in six major race/ethnic groups, summarized from the book, Complex Ethnic Households in America, edited by Laurel Schwede, Rae Lesser Blumberg and Anna Y. Chan, forthcoming from Rowman & Littlefield Publishers, Inc., in January, 2006. It presents the results of both qualitative ethnographic fieldwork and unique applications of Census 2000 data available in the Census Bureau website tables to give a fuller picture of the interaction of complex households, race/ethnicity, and gender than either type of data can provide alone.

In this study, ethnographic researchers teamed with Census Bureau social science researchers to examine increasing diversity in both race/ethnic and household structure in fieldwork in 2000 and in 2001. They conducted integrated small-scale qualitative studies of complex households, using the same objectives, methods, and core protocol while conducting simultaneous studies of six U.S. race/ethnic communities: Navajo in Arizona (Tongue, with Blumberg presenting results), Inupiat in Alaska (Craver), Korean migrants in Queens, NY (Kang), Latino immigrants in Virginia (Goerman), urban African Americans in Virginia (Holmes) and rural whites in NY (Childs). Comparative studies are rare in qualitative research. We begin the session with an overview of this study and why complex households may be growing in numbers and importance. Schwede sets the context, using Census 2000 data to compare household structure patterns of each group with the overall national population. The ethnographers then present the results of their individual qualitative ethnic studies exploring interactions of household structure and ethnicity. Each presents fascinating descriptions of kinship, residence patterns, and living situations of actual households, often through respondents' own words. The ethnographers compare conceptions of "household" and "family" in their groups with those of the overall culture, finding substantial variations in meaning, raising issues of cross-ethnic validity of these bedrock units of social science analysis.

The final presenters synthesize the results. Blumberg analyzes the intersection of ethnicity, household structure, and gender, viewing each group through the lens of her theory of gender stratification. Schwede discusses factors influencing the formation, maintenance and dissolution of complex households and identifies implications of the study for censuses and surveys.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.


Title: The Effects of Cell Collapsing in Poststratification

Abstract:

Poststratification is a common method of estimation in household surveys. Cells are formed based on characteristics that are known for all sample respondents and for which external control counts are available from a census or another source. The inverses of the poststratification adjustments are usually referred to as coverage ratios. Coverage of some demographic groups may be substantially below 100 percent, and poststratifying serves to correct for biases due to poor coverage. A standard procedure in poststratification is to collapse or combine cells when the sample sizes fall below some minimum or the weight adjustments are above some maximum. Collapsing may decrease the variance of an estimate but may simultaneously increase its bias. We study the effects on bias and variance of this type of dynamic cell collapsing through simulation using a population based on the 2003 National Health Interview Survey.
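A toy sketch of dynamic cell collapsing (the thresholds, the merge-into-neighbor rule, and all counts are hypothetical, not the survey's actual procedure): a cell is merged into an adjacent cell when its respondent count is too small or its weight adjustment too large.

```python
# Hypothetical poststrata: external population controls and respondent counts.
controls = [1200, 300, 800, 950]
samples  = [110,  12,  70,  90]
MIN_N, MAX_ADJ = 30, 15.0   # arbitrary illustrative collapsing thresholds

cells = list(zip(controls, samples))
i = 0
while i < len(cells):
    ctrl, n = cells[i]
    if n < MIN_N or ctrl / n > MAX_ADJ:
        # collapse into the adjacent cell (previous if any, else next);
        # a production rule would also recheck the merged cell
        j = i - 1 if i > 0 else i + 1
        cells[j] = (cells[j][0] + ctrl, cells[j][1] + n)
        cells.pop(i)
    else:
        i += 1

adjustments = [ctrl / n for ctrl, n in cells]
print(adjustments)
```

Here the second cell (12 respondents) fails the minimum-size rule and is folded into the first, after which every remaining cell satisfies both thresholds; the bias-variance consequences of such rules are exactly what the simulation study examines.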


Title: Analysis of Genotype-Phenotype Relationships: Machine Learning/Statistical Methods

Abstract:

Understanding the relationship of genotype to phenotype is a fundamental problem in modern genetics research. However, significant analytical challenges exist in the study of genotype-phenotype relationships. These challenges include genotype data in the form of unordered categorical values (e.g., nucleotides, amino acids, SNPs), numerous levels of variables, mixture of variable types (categorical and numerical), and potential for non-additive interactions between variables (epistasis). These challenges can be dealt with through use of machine learning/statistical approaches such as tree-based statistical models and random forests. These methods recursively partition a data set in two (binary split) based on values of a single predictor variable to best achieve homogeneous subsets of a categorical response variable (classification) or to best separate low and high values of a continuous response variable (regression). These methods are very well suited for the analysis of genotype-phenotype relationships and have been shown to provide outstanding results. Examples to be presented include identifying amino acids important in spectral tuning in color vision and nucleotide sequence changes important in some growth characteristics in maize.
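A minimal sketch of the core recursive-partitioning step described above: the single best binary split on a predictor, scored by the reduction in Gini impurity (synthetic data and a numeric predictor are assumed here; real genotype predictors are often categorical).

```python
import numpy as np

def gini(labels):
    # Gini impurity of a 0/1 label vector
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    return 2 * p * (1 - p)

def best_split(x, y):
    """Exhaustive search for the binary split on numeric predictor x that
    most reduces the Gini impurity of binary response y - the step that
    tree-based methods apply recursively to each resulting subset."""
    best = (None, -np.inf)
    for cut in np.unique(x)[:-1]:
        left, right = y[x <= cut], y[x > cut]
        w = left.size / y.size
        gain = gini(y) - (w * gini(left) + (1 - w) * gini(right))
        if gain > best[1]:
            best = (cut, gain)
    return best

# Hypothetical data: the phenotype y flips once a genotype score x exceeds 5.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
cut, gain = best_split(x, y)
print(cut, gain)
```

A random forest repeats this search over many bootstrap samples and random subsets of predictors, which is what lets the method handle large numbers of mixed-type variables and interactions.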

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.


Title: Comparing Homeowner and Lender Estimates of Housing Wealth and Mortgage Terms

Abstract:

Much research on housing wealth relies on the assumption that households are able to report these data accurately. In this paper, we test the validity of this assumption by comparing homeowner-reported data on house values and mortgage terms from the Survey of Consumer Finances (SCF) to lender-reported data from the Office of Federal Housing Enterprise Oversight (OFHEO), the Loan Performance Corporation, and the Residential Finance Survey. We test the accuracy of the data in two ways. First, we compare the distributions of key variables in the homeowner- and lender-reported data. Second, we examine the internal edit codes in the SCF to assess respondent confidence in their answers.

We find that homeowners are able to report the broad features of their housing wealth rather well. An index of house value appreciation based on SCF data matches the aggregate OFHEO index fairly closely. This finding is consistent with other studies that suggest that owner assessments of house value are reasonably accurate. Homeowners are also able to report the maturity and type of their mortgage with a fair amount of accuracy. However, homeowners with adjustable-rate mortgages are less certain about many aspects of their mortgages.

These findings imply that homeowner-reported data are more useful for investigating some housing wealth questions than others. Studies of the effects of housing wealth on consumption, for example, can reasonably be based on homeowner-reported data. However, lender-reported data may be preferred for studies of the vulnerability of households to interest rate shocks.


Title: Data Collection and Statistical Issues in Surveying Cell and Landline Telephone Samples

Abstract:

As an increasing proportion of the US population uses cell phones for most or all of their personal telephone activity, research into conducting surveys that include cell phones is important. This talk reviews a dual frame survey of landline and cell phone numbers conducted in the summer of 2004 for the Joint Program in Survey Methodology. The goal of the survey was to evaluate the feasibility of including cellular phone numbers in a random digit dial telephone survey. As an introduction, a brief background on the status of coverage and usage by telephone service will be given, followed by some of the key operational and statistical issues identified as a result of conducting the survey. Special attention is devoted to the statistical biases associated with the dual frame approach.
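A toy Hartley-style dual-frame composite estimate with purely hypothetical numbers: the overlap domain (people reachable by both landline and cell) is estimated from both frames and blended with a mixing weight before the three domains are combined.

```python
# Hypothetical domain means of some survey variable, by frame.
landline_only_mean, cell_only_mean = 52.0, 38.0
overlap_mean_landline, overlap_mean_cell = 45.0, 47.0

# Assumed population shares of the three domains.
P_landline_only, P_cell_only, P_overlap = 0.55, 0.10, 0.35

# Mixing weight for the overlap domain; in practice it can be chosen to
# minimize the variance of the composite estimator.
theta = 0.5
overlap_mean = theta * overlap_mean_landline + (1 - theta) * overlap_mean_cell
y_hat = (P_landline_only * landline_only_mean
         + P_cell_only * cell_only_mean
         + P_overlap * overlap_mean)
print(round(y_hat, 2))   # prints 48.5
```

Bias enters when the frames under- or over-cover the domains or when nonresponse differs by frame, which is where the talk's attention to the statistical biases of the dual-frame approach comes in.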