Washington Statistical Society Seminars


January 2000
6
Thur.
Power-Tail Distributions for Modeling Internet Traffic
18
Tues.
The National Center for Health Statistics Research Data Center: New Research Opportunities
20
Thur.
Issues in Combining Survey Data: Estimates from the Medical Expenditure Panel Survey and the Medicare Current Beneficiary Survey
21
Fri.
Robust Mixture Modeling and Applications
25
Tues.
Latent Class Analysis of Embedded Repeated Measurements: An Application to the National Household Survey on Drug Abuse
February 2000
4
Fri.
Mining Superlarge Datasets
8
Tues.
Analyzing Recurrent Event Data with Informative Censoring
14
Mon.
Addressing Multiple Goals In Evaluating Region-Specific Risk Using Bayesian Methods
16
Wed.
Experimental Poverty Measures: Research Issues
23
Wed.
Measuring Job Flows and the Life Cycle of Establishments With BLS Longitudinal Establishment Microdata
March 2000
8
Wed.
Recent Developments in Legal Frameworks Governing Individually Identifiable Data
9
Thur.
Issues in Combining Survey Data: Estimates from the Medical Expenditure Panel Survey and the Medicare Current Beneficiary Survey
13
Mon.
Tutorial -- Multiple Imputation: Fabricate Your Data Well
23
Thur.
Common Influences Across Household Surveys on Noncontact Nonresponse: Theory and Data
28
Tues.
An Innovative Technique for Estimating Sensitive Survey Items ("Three Card Method")
29
Wed.
Productivity in the Services Sector
April 2000
12
Wed.
Downweighting Influential PSUs in Surveys, with Application to the 1990 Post-Enumeration Survey
12
Wed.
The Pros and Cons of Using Design-Based Methods for Estimating Model Parameters: A General Theory
18
Tues.
An Analysis of the Relationship Between Survey Burden and Non-response: If we bother them more, are they less cooperative?
18
Tues.
Exploring the Relationship between Survey Participation and Survey Sponsorship: What do respondents and non-respondents think of us?
19
Wed.
Estimation of Capital and Technology with a Dynamic Economic Model
25
Tues.
Latent Class Analysis of Embedded Repeated Measurements: An Application to the National Household Survey on Drug Abuse
25
Tues.
Producing an Annual Superlative Index Using Monthly Price Data
25
Tues.
AudioCASI: Design and Data Collection Issues
(4/4/00 - New Location)
May 2000
3
Wed.
Further Examination of the Distribution of Individual Income and Taxes Using a Consistent and Comprehensive Measure of Income
8
Mon.
The Foundation of AIC and Its Use in Multi-Model Inference
9
Tues.
Time-Use Surveys: What Are They? Why Are We Interested?
10
Wed.
TUTORIAL -- To Bayes or Not to Bayes
10
Wed.
Comparing IVR, CATI, and Mail to Collect Demographic Information
11
Thur.
Bayesian Statistics in Action
11
Thur.
WSS PRESIDENT'S INVITED ADDRESS
Structural Modeling in Time Series and an Analysis of Fatal Road Accidents
23
Tues.
Hierarchical Bayesian Nonresponse Models for Binary Data with Uncertainty about Ignorability
24
Wed.
Stable Distributions: Models for Heavy Tailed Data
25
Thur.
A Comparison of the Household Sector from the Flow of Funds Accounts and the Survey of Consumer Finances
31
Wed.
Bank Failures, Household Income Distribution, and Robust Mixture Modeling
June 2000
5
Mon.
Response Variance in the Current Population Survey Income Supplement
12
Mon.
Spatial and temporal trends in cancer incidence
13
Tues.
WSS PRESIDENT'S INVITED ADDRESS
A (Latin) Square Deal for Voters: Fixing a Flaw in the Australian Preferential Voting System

14
Wed.
Automated Multivariable Time Series Analysis of Industrial and Econometric Data
26
Mon.
The Influence of Environmental Characteristics on Survey Cooperation: A Comparison of Metropolitan Areas
28
Wed.
New Sources of Demographic Data on the Web
July 2000
10
Mon.
Recent developments in CATI and RDD issues
11
Tues.
Construction of efficient one-level rotation sampling designs
18
Tues.
Some Practical Aspects of Disclosure Analysis
September 2000
12
Tues.
Generalized Linear Models for Sample Surveys
12
Tues.
Predicting Nonresponse from Household and Regional Characteristics
14
Thur.
An Algorithm for the Distribution of the Number of Successes in Fourth- or Lower-Order Markovian Trials
22
Fri.
Use of Hierarchical Modeling in Substance Abuse Surveys
October 2000
3
Tues.
Control Charts as a Tool in Data Quality Improvement
3
Tues.
Interdisciplinary Survey Research Involving the Computer and Cognitive Sciences: Cognitive Issues in the Design of Web Surveys
5
Thur.
The Reluctant Embrace of Law and Statistics
11
Wed.
TUTORIAL: Data Presentation -- A Guide to Good Graphics and Tables
19
Thur.
ASA Session Reprise: Research on Government Survey Nonresponse (Part 1)
24
Tues.
Hansen Lecture:  Models in the Practice of Survey Sampling (Revisited)
25
Wed.
Financial Service Usage Patterns of the Poor: Cost Considerations
November 2000
2
Thur.
Interdisciplinary Survey Research Involving the Computer and Cognitive Sciences: "Unraveling the Seam Effect," and "The Gold Standard of Question Quality on Surveys: Experts, Computer Tools, Versus Statistical Indices"
14
Tues.
Delivering Interactive Graphics on the Web
14
Tues.
Poverty estimates from the CPS and CE: Why they differ
15
Wed.
The 2000 Roger Herriot Award For Innovation In Federal Statistics
16
Thur.
ASA Session Reprise: Research on Government Survey Nonresponse (Part 2)
16
Thur.
Advances in Telephone Sample Designs
28
Tues.
The Effects of Person-level vs. Household-level Questionnaire Design on Survey Estimates and Data Quality
28
Tues.
Measuring Sexual Orientation in Health Surveys: Lesbian Health Research Issues
December 2000
7
Thur.
Education Certifications and Other Non-Traditional Educational Outcomes: Perspectives of Data Users and Researchers
11
Mon.
Classifying Open Ended Reports: Coding Occupation in the Current Population Survey



Topic: Power-Tail Distributions for Modeling Internet Traffic

Abstract:

Internet traffic data indicate that long-tailed (power-tail, fat-tail) distributions typically serve as better models for packet interarrival times and/or service lengths. These distributions have properties (e.g., possibly lacking some or all of their moments) that make it impossible to derive manageable queueing theory formulas for measuring congestion. We discuss a plan to apply a new method for fitting probability distributions through their Laplace transforms to generate complete probabilistic analyses of queues with power-tail interarrival or service times.
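The missing-moment property behind these difficulties can be sketched with the simplest power-tail law, a Pareto distribution with minimum 1 (an illustrative choice, not necessarily the family fitted in the talk):

```python
import math
import random

def pareto_mean(alpha, xm=1.0):
    """Theoretical mean of a Pareto(alpha) power-tail distribution with
    minimum xm: infinite once alpha <= 1, the missing-moment property
    that defeats standard queueing-theory formulas."""
    return math.inf if alpha <= 1 else alpha * xm / (alpha - 1)

def empirical_mean(alpha, n, seed=0):
    """Empirical mean of n Pareto draws (xm = 1); erratic and
    non-convergent when the theoretical moments are missing."""
    rng = random.Random(seed)
    return sum(rng.paretovariate(alpha) for _ in range(n)) / n
```

For tail index alpha <= 2 the variance is already infinite, and for alpha <= 1 even the mean is, so congestion measures built on first and second moments simply do not exist, motivating the transform-based fitting approach described above.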

Topic: The National Center for Health Statistics Research Data Center: New Research Opportunities

Abstract:

The National Center for Health Statistics (NCHS) has developed a Research Data Center (RDC) that allows researchers and data users to access internal data files from its numerous surveys, including data items that have not previously been available to the research community. Internal NCHS files contain lower levels of geography, such as state, county, census tract, block group, or block, depending on the survey. Data systems available through the RDC include the National Health Interview Survey, the National Health and Nutrition Examination Survey, the National Hospital Discharge Survey, the National Survey of Family Growth Contextual Data Files (which consist of the survey data plus about 1,300 contextual variables and are available only through the RDC), and the National Ambulatory Medical Care Survey, among others. Researchers may merge internal NCHS data files with data from the Census Bureau, the Area Resource File, or other data collected or provided by the researcher (air pollution data; state, county, or local laws or ordinances; reimbursement policies; medical facilities; etc.) to perform contextual analyses while maintaining respondent confidentiality. Because of confidentiality constraints, NCHS has not been able to release survey data with lower levels of geography to its data users, which has limited the amount and types of research, policy, and programmatic projects that could be undertaken with its data systems. The development of the RDC begins an exciting new era for NCHS and its data users.

Topic: Issues in Combining Survey Data: Estimates from the Medical Expenditure Panel Survey and the Medicare Current Beneficiary Survey


Topic: Robust Mixture Modeling and Applications

Abstract:

We investigate the use of the popular nonparametric integrated squared error criterion in parametric estimation. Of particular interest are the problems of fitting normal mixture densities and linear regression. The algorithm is in the class of minimum distance estimators. We discuss some of its theoretical properties and compare it to maximum likelihood. The robustness of the procedure is demonstrated by example. The criterion may be applied in a wide range of models. Two case studies are given: an application to a series of yearly household income samples, and a more complex application that estimates an economic frontier function for U.S. banks, where the data are assumed to be noisy. Extensions to clustering and discrimination problems follow.
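The integrated squared error criterion can be sketched for the simplest case, a single normal rather than a mixture (the "L2E" idea). The grid search below is an illustrative stand-in for the speakers' algorithm, intended only to show why the criterion is robust to outliers:

```python
import math
from statistics import mean, stdev

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def l2e_objective(mu, sigma, data):
    # Integrated squared error criterion for a single normal fit:
    # integral of f^2 minus twice the mean fitted density at the data.
    int_f2 = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    return int_f2 - 2.0 * sum(normal_pdf(x, mu, sigma) for x in data) / len(data)

def fit_l2e(data, steps=60):
    # Crude grid search around the sample moments -- a sketch, not a
    # production optimizer, and one normal component only.
    m0, s0 = mean(data), stdev(data)
    best = None
    for i in range(steps):
        mu = m0 - 2 * s0 + 4 * s0 * i / (steps - 1)
        for j in range(steps):
            sigma = 0.1 + 2 * s0 * j / (steps - 1)
            val = l2e_objective(mu, sigma, data)
            if best is None or val < best[0]:
                best = (val, mu, sigma)
    return best[1], best[2]
```

With a tight cluster of observations plus one gross outlier, the minimum-ISE location stays with the cluster even though the sample mean is dragged far away, which is the robustness property the abstract demonstrates by example.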

Topic: Latent Class Analysis of Embedded Repeated Measurements:
An Application to the National Household Survey on Drug Abuse

Abstract:

Latent class analysis (LCA) is a statistical methodology that can be used to evaluate the error in categorical data when repeated measurements of the same constructs are available. Special problems arise in the analysis when the measurements are embedded within a single survey instrument. For example, the assumption of independent classification errors (ICE) may not hold due to respondent memory or other conditioning effects. In this talk, we consider the application of LCA for evaluating classification error using repeated measurements embedded in a single survey questionnaire. To illustrate the techniques, we apply LCA to data from the 1994, 1995, and 1996 implementations of the National Household Survey on Drug Abuse. This application demonstrates the usefulness of LCA of embedded repeated measurements for identifying questionnaire problems and, potentially, as a means of adjusting estimates, such as drug use prevalence, for classification error bias.
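A minimal sketch of the estimation machinery behind LCA is a two-class latent class model for binary items fit by plain EM. This is an illustration under the standard independence assumptions discussed above, not the authors' software:

```python
import random

def lca_em(data, n_iter=300, seed=0):
    """Plain EM for a two-class latent class model with K binary items.
    data: list of 0/1 tuples. Returns (pi, p), where pi = P(class 0) and
    p[c][k] = P(item k = 1 | class c). A minimal sketch assuming
    independent classification errors."""
    K = len(data[0])
    rng = random.Random(seed)
    pi = 0.5
    p = [[rng.uniform(0.25, 0.75) for _ in range(K)] for _ in range(2)]
    for _ in range(n_iter):
        # E-step: posterior probability of membership in class 0
        post = []
        for y in data:
            lik = []
            for c in (0, 1):
                l = pi if c == 0 else 1.0 - pi
                for k in range(K):
                    l *= p[c][k] if y[k] else 1.0 - p[c][k]
                lik.append(l)
            post.append(lik[0] / (lik[0] + lik[1]))
        # M-step: update prevalence and item-response probabilities
        pi = sum(post) / len(data)
        for c in (0, 1):
            w = post if c == 0 else [1.0 - g for g in post]
            tot = sum(w)
            for k in range(K):
                p[c][k] = sum(wi * y[k] for wi, y in zip(w, data)) / tot
    return pi, p
```

The fitted item-response probabilities p[c][k] are the classification-error parameters: how often each latent class answers "yes" to each repeated measurement.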

Topic: Analyzing Recurrent Event Data with Informative Censoring

Abstract

Recurrent event data are frequently encountered in longitudinal follow-up studies. The non-informative censoring assumption is usually required for the validity of statistical methods for analyzing recurrent event data. In many applications, however, censoring could be caused by informative drop-out or death, and it is unrealistic to assume independence between the recurrent event process and the censoring time. In this talk, we consider recurrent events of the same type and allow the censoring mechanism to be either informative or non-informative. A multiplicative intensity model that possesses desirable interpretations is used as the underlying model. Statistical methods are developed for (i) nonparametric estimation of the cumulative occurrence rate function, (ii) kernel estimation of the occurrence rate function, and (iii) semiparametric estimation of regression parameters.

An analysis of the inpatient care data from the AIDS Link to Intravenous Experiences cohort (ALIVE) is presented.

For a complete list of upcoming seminars, check the department's seminar web site: http://www.gwu.edu/~stat/seminars/spring2000.html

Topic: Mining Superlarge Datasets

Abstract:

Data mining has sprung up at the nexus of computer science, statistics, and management information systems. It attempts to find structure in large, high-dimensional datasets. To that end, the different disciplines have each developed their own repertoire of tools, which are now beginning to cross-pollinate and produce an improved understanding of structure discovery. This talk has two goals: (1) to outline to statisticians some of the contributions that have been developed by others, especially computer scientists, and (2) to lay out practical issues that have arisen in recent experience with superlarge datasets. More specifically, this talk will discuss the preanalysis and indexing of superlarge datasets, and present results of a designed experiment to compare the performances of such new-wave techniques as MARS, neural nets, projection pursuit regression, and other methods for structure discovery.

Topic: Addressing Multiple Goals In Evaluating Region-Specific Risk Using Bayesian Methods


Topic: Experimental Poverty Measures: Research Issues

Abstract:

In the summer of 1999 the Census Bureau released a report on experimental poverty measures. That report uses alternative measures of poverty based on recommendations of the National Academy of Sciences to illustrate our different understanding of who is poor depending on the measure of poverty that is used. We show the differential incidence of poverty among various demographic and socioeconomic subgroups using these alternative poverty measures compared with the current official measure of poverty. Particular attention will be paid to the important effect that changing our poverty measure has on our understanding of the economic situation of the elderly and the role of health care in poverty measurement.

Topic: Measuring Job Flows and the Life Cycle of Establishments With BLS Longitudinal Establishment Microdata

Abstract:

The Bureau of Labor Statistics (BLS) is constructing a longitudinal database with monthly employment and quarterly wage data for virtually all business establishments in the United States. This longitudinal database will enable us to track changes in employment and wages not only at the macro level, but also at the micro level of the establishment. This paper describes this new database, demonstrates its potential for researchers and policy-makers, and presents initial research results.

We begin with a description of the longitudinal database. The source of the establishment microdata is the quarterly contribution reports that all employers subject to state unemployment insurance laws are required to submit. These data are a comprehensive and accurate source of employment and wages, and they provide a virtual census (98 percent) of employees on nonfarm payrolls. A section of the paper will be devoted to explaining how we link establishments across quarters, with particular attention given to the accurate identification of continuous, new, and closing establishments. The longitudinal database is being constructed from ten years of quarterly microdata. At inception, this database will be a high quality, high frequency, timely, and historically consistent source for empirical research.

One of the purposes of the longitudinal database is to encourage microdata research into topics such as job creation, job destruction, and the life cycle of establishments. We will discuss how researchers can obtain access to this longitudinal database. We will present initial research results from this database, highlighting differences across industries, geography, and size classes.

Title: Recent Developments in Legal Frameworks Governing Individually Identifiable Data

Abstract

This session will provide a brief overview of existing statutory and regulatory frameworks governing the use of individually identifiable data by federal agencies, including the Freedom of Information Act, the Privacy Act, and some of the agency specific requirements. It will then address the potential effects of recent Congressional requirements concerning OMB Circular A-110 and publication of the Notices of Proposed Rulemaking for the privacy and security of health data required under the Health Insurance Portability and Accountability Act of 1996.

Topic: Issues in Combining Survey Data: Estimates from the
Medical Expenditure Panel Survey and the Medicare Current Beneficiary Survey

Abstract:

The household surveys (HS) of both the Medical Expenditure Panel Survey (MEPS) and the Medicare Current Beneficiary Survey (MCBS) were designed to produce annual estimates for a variety of measures related to health care use, expenses, sources of payment, health status, and insurance coverage. Both are nationally representative, population-based samples; longitudinal, with rotation; and require multiple rounds of in-person CAPI data collection to produce an annual estimate. The surveys differ with respect to their target populations: in the MEPS HS, the U.S. civilian noninstitutionalized population, and in the MCBS, the current U.S. Medicare beneficiary population. The purpose of this evaluation was to assess the compatibility of the estimates derived from these surveys. The objectives were to: (1) enhance the analytic utility of each survey, and (2) further advance the goals of the DHHS's Survey Integration Plan, which called for "the analytic linkage of the MCBS and the MEPS samples." For this presentation, we compare and contrast the design of each survey, explore issues for combining data, and compare and contrast estimates from the surveys. The paper also includes a discussion of key analytic measures considered incompatible for pooling, given survey differences, and provides some recommendations for future efforts.

Topic: Spatial and temporal trends in cancer incidence

Abstract:

Empirical Bayes and Markov chain Monte Carlo (MCMC) methods for fitting the conditional autoregressive model are known to offer a useful way of smoothing spatial patterns in disease rates. These methods are extended to incorporate time trends, thus offering additional insight into the spread of disease over space and time. Alternative ways of graphically displaying trends in disease maps will be presented. The model can also be used to identify geographic areas that are experiencing rapid change in disease incidence. These statistical techniques will be demonstrated using breast cancer incidence data from the 169 towns of Connecticut during the years 1984-1994.
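The empirical Bayes smoothing idea can be sketched with a global (non-spatial) method-of-moments shrinkage estimator in the spirit of Marshall (1991); the conditional autoregressive model in the talk additionally borrows strength from neighboring areas, which this simpler stand-in does not attempt:

```python
def eb_smooth(cases, pops):
    """Method-of-moments empirical Bayes smoothing of area disease rates:
    shrink each area's raw rate toward the global rate, with more
    shrinkage for small populations. A simplified, non-spatial sketch."""
    rates = [c / n for c, n in zip(cases, pops)]
    total = sum(pops)
    m = sum(cases) / total                                   # global rate
    s2 = sum(n * (r - m) ** 2 for r, n in zip(rates, pops)) / total
    nbar = total / len(pops)
    a = max(0.0, s2 - m / nbar)                              # between-area variance
    smoothed = []
    for r, n in zip(rates, pops):
        c_i = a / (a + m / n) if a + m / n > 0 else 0.0      # shrinkage factor
        smoothed.append(m + c_i * (r - m))                   # pull toward global rate
    return smoothed
```

Small areas (large sampling noise) are pulled strongly toward the global rate, while large areas keep rates close to their raw values, which is the basic map-smoothing behavior the abstract refers to.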


Tutorial -- Multiple Imputation: Fabricate Your Data Well

NOTE: This seminar will be shown simultaneously at the National Center for Health Statistics and at Westat via video.

Abstract:

Multiple imputation (MI) (Rubin, 1987) is a general-purpose method for handling missing data. Each missing observation is replaced by M > 1 simulated values, producing M completed datasets. The datasets are analyzed separately and the results are combined to yield inferences that account for missing-data uncertainty. This tutorial presentation will provide an overview of MI, including its advantages over other commonly used missing-data methods. Computational techniques for generating MI's in multivariate databases will be presented, with a live software demonstration. Finally, some issues surrounding the use of MI in complex surveys will be discussed, including its performance when used in conjunction with traditional randomization-based point and variance estimators.

Outline of talk:

  1. The missing-data problem
  2. Methods for missing data
  3. Single imputation
  4. Multiple imputation, with key references
  5. Creating MI's
  6. Multiple imputation FAQ's
  7. Data example
  8. References
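The combining step of multiple imputation can be sketched with Rubin's (1987) rules; the function below is an illustrative implementation, not code from the tutorial:

```python
def rubin_combine(estimates, variances):
    """Rubin's (1987) rules for combining M completed-data analyses:
    estimates holds the M point estimates, variances their completed-data
    variance estimates. Returns the MI point estimate and total variance."""
    m = len(estimates)
    qbar = sum(estimates) / m                               # MI point estimate
    ubar = sum(variances) / m                               # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = ubar + (1.0 + 1.0 / m) * b                          # total variance
    return qbar, t
```

The between-imputation term b is what carries the missing-data uncertainty into the final inference; with no variation across the M completed datasets, the total variance reduces to the ordinary complete-data variance.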

Three-way Video Conference - Michigan, Maryland, BLS

Topic: Common Influences Across Household Surveys on Noncontact Nonresponse: Theory and Data

Abstract:

There is growing convergence of findings that household survey nonresponse rates are increasing in the United States and other developed countries. At the same time there are recent experimental findings that high nonresponse rates do not necessarily produce high nonresponse error in large classes of survey statistics and some designs. This combination has prompted the development of theories to explain when nonresponse matters to survey inference and when it can be ignored.

By dissecting the nonresponse phenomenon into its two major components, noncontact and refusals, we argue that there is some hope of separating a set of influences that are pervasive from a set that act in a more limited range of situations. We present a theoretical rationale asserting that influences on noncontact nonresponse are more consistent over survey designs than those affecting refusal nonresponse.

We then combine data from several surveys differing in mode, agency of collection, and response rates. We show consistent patterns of ease of contact across the surveys and across groups varying in household composition, access impediments, and calling patterns. Given these empirical results, we end by speculating on classes of measures that will be more or less affected by high noncontact rates, consistently, over broad classes of household survey designs.

Washington Statistical Society
Office of Research and Methodology Seminar

Topic: An Innovative Technique for Estimating Sensitive Survey Items ("Three Card Method")

Abstract:

The "three card method" is an innovative questionnaire technique that is designed to protect respondent privacy and possibly encourage more truthful answers in large-scale surveys. The technique involves three random subsamples, each consisting of completely different respondents. All subsamples are asked the same question, but each subsample answers using a slightly different answer card, so that a different piece of nonsensitive information is gathered from each subsample. When results from all three subsamples are combined, the sensitive answer category can be indirectly estimated for the relevant population or key subgroups. Initial development and testing (conducted or sponsored by GAO) focused on asking Hispanic immigrants/farmworkers about their immigration status. However, the three card method may prove applicable to a range of sensitive areas, including violent behaviors (road rage, child abuse, spouse abuse, police brutality), sensitive personal choices (abortion, marijuana use), and potentially many others.
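As a hypothetical illustration of the combining arithmetic (the actual GAO card designs may differ), suppose four answer categories A (sensitive), B, C, and D, where each subsample's card lets exactly one nonsensitive category be reported alone:

```python
def three_card_estimate(counts1, counts2, counts3):
    """Hypothetical four-category illustration of the three card method:
    card 1 isolates category B, card 2 isolates C, card 3 isolates D;
    the sensitive category A is never asked directly, so its share is
    recovered by subtraction."""
    p_b = counts1["B"] / sum(counts1.values())   # share reporting B on card 1
    p_c = counts2["C"] / sum(counts2.values())   # share reporting C on card 2
    p_d = counts3["D"] / sum(counts3.values())   # share reporting D on card 3
    return 1.0 - p_b - p_c - p_d                 # indirect estimate of A
```

No single respondent ever reveals membership in the sensitive category; only the combination of the three independent subsamples identifies its share.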

Topic: Productivity in the Services Sector

Abstract:

In some services industries, the concept of real output is unclear. What is the output of an insurance company? Of an economics or statistics consulting firm? In what units would those outputs be measured? When the economic concepts that statistical agencies measure are unclear, it is hardly surprising that their output measures and their price indexes are problematic. And if it is difficult to measure the output of an industry, it must also be difficult to measure its productivity.

The importance of this topic is indicated by two facts. First, as Griliches has pointed out, the post-1973 slowdown in U.S. productivity growth is concentrated in precisely those industries in which output measurement problems exist. For example, measured productivity growth in finance and insurance has been negative (declining by more than 2 percent per year from 1977 to 1993). Do measurement errors in output and price deflators contribute to the negative productivity trend? Second, those hard-to-measure services sectors account for a growing proportion of the economy, so their measurement problems have an increasing impact on the nation's overall measures of economic performance.

The Brookings Program on Output and Productivity Measurement in the Service Sector was designed to address concepts and measurement problems in the difficult-to-measure services industries. This paper presents a progress report. It reviews measurement issues and recent research on a group of industries, including banking, insurance and finance, communications and transportation, retail and wholesale trade, and business and professional services. It will also make a preliminary assessment of the degree to which measurement errors in these sectors account for the post-1973 slowdown in U.S. productivity growth.

Washington Statistical Society
Office of Research and Methodology Seminar

Title: Downweighting Influential PSUs in Surveys, with Application to the 1990 Post-Enumeration Survey

Abstract

Certain primary sampling units (PSUs) may be extremely influential on survey estimates and consequently contribute disproportionately to their variance. This talk will propose a general approach to downweighting influential PSUs, with downweighting factors derived by applying robust M-estimation to the empirical influence of the PSUs. The method is motivated by a problem in census coverage estimation. In this context, both extreme sampling weights and large coverage errors can lead to high influence, and influence can be estimated empirically by Taylor linearization of the survey estimator. As predicted by theory, the robust procedure greatly reduces the variance of estimated coverage rates, more so than truncation of weights. On the other hand, the procedure may introduce bias into survey estimates when the distributions of the influence statistics are asymmetric. Properties of the procedure in the presence of asymmetry will be considered, and techniques for assessing the bias-variance tradeoff will be demonstrated.
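The downweighting idea can be sketched with Huber-type M-estimation weights applied to the PSUs' empirical influence values. This is an illustrative weighting function under an assumed cutoff c; the talk's exact robust procedure may differ:

```python
def huber_weights(influence, c):
    """Huber-type downweighting: PSUs whose empirical influence lies
    within the cutoff c keep full weight; more extreme PSUs are
    downweighted in proportion to 1/|influence|. Illustrative only."""
    return [1.0 if abs(u) <= c else c / abs(u) for u in influence]
```

A PSU with influence four times the cutoff thus contributes at a quarter of its original weight, which bounds each PSU's contribution to the estimate (reducing variance) at the possible cost of bias when the influence distribution is asymmetric, the tradeoff the abstract discusses.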

For further information contact: Joe Fred Gonzalez, ORM, NCHS, at 301-458-4239 or jfg2@cdc.gov

Topic: The Pros and Cons of Using Design-Based Methods for Estimating Model Parameters: A General Theory

Abstract:

One of the first questions an analyst asks when fitting a model to data that has been collected from a complex survey is whether or not to account for the survey design in the analysis. In fact, there are two questions that should be addressed. Not only must the analyst decide on whether or not to use the sampling weights for the point estimates of the unknown parameters, he must also consider how to estimate the variance of the estimators for hypothesis testing and deriving confidence intervals. There are a number of schools of thought on these questions. The pure model-based approach would demand that if the model being fitted is true, then one should use an optimal model-based estimator, and normally this would result in ignoring the sample design.

The variance of the estimator would be with respect to the underlying stochastic model in which the sample design is irrelevant. In the design-based approach, on the other hand, we assume that the observations are a random sample from a finite population. There is no reference to a superpopulation. The randomization mechanism is dictated by the chosen sampling design, which may include unequal probabilities of selection, clustering, stratification, and so on. We show that the design-based variance of the weighted estimator will be asymptotically equal to its model-based variance under a wide range of assumptions, at least for large samples where the sampling fraction is small, and that the pure model-based approach can lead to misleading conclusions when the model assumptions are violated. Some interesting new results on variance estimation based on estimating functions will be discussed.
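For concreteness, the design-based point estimator at issue is typically a weighted (Hajek-type) estimator in which each unit carries the inverse of its selection probability; the sketch below is an illustration, not the speakers' code:

```python
def hajek_mean(y, weights):
    """Design-based (Hajek) estimator of a population mean: each unit's
    sampling weight is the inverse of its selection probability. Whether
    to use these weights when fitting a model is the question the talk
    addresses."""
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)
```

With equal weights this reduces to the ordinary sample mean; unequal weights change both the point estimate and, as the abstract emphasizes, the appropriate variance estimator.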

Title: An Analysis of the Relationship Between Survey Burden and Non-response: If we bother them more, are they less cooperative?

Abstract: In surveys of certain populations, individuals may be contacted on numerous occasions over time. This is particularly true in surveys of establishments, where large or unique operations may be selected with near certainty for recurring surveys and may be included in samples for multiple surveys. Cooperation in any particular survey may be affected by the number and frequency of times an establishment has been selected for surveys by that organization in the past.

This paper examines the relationship between response to the 1998 June Crops Survey in South Dakota and the reporting burden placed on operations by NASS in the past. The number of other NASS surveys operations have been contacted for, the length of time since they were last contacted for a NASS survey, and the type of information they have been contacted for in the two years prior to the June Survey will be considered. Comparisons of these burden measures will be made between respondents and non-respondents for the June Survey. Implications of the relationship between survey burden and response will be discussed.

Title: Exploring the Relationship between Survey Participation and Survey Sponsorship: What do respondents and non-respondents think of us?

NOTE: This seminar immediately follows the one above.

Abstract:

A series of questions was asked of QAS respondents in South Dakota in order to examine the relationship between their knowledge and attitudes toward NASS surveys and their survey participation. The questions were about the respondents' identification of NASS (at the local and national level), their perceptions of NASS and its data, the effect of data on the respondents, and their opinions regarding responding to NASS surveys. These questions were asked of both respondents and non-respondents to the QAS in contacts throughout 1998 and 1999.

Distinct differences were found in attitudes of respondents and non-respondents for most of these measures. Some differences were also found between different types and sizes of operations.

Findings are intended to guide promotional and public relations activities that will be targeted toward potential respondents and suggest data collection procedures that will increase survey participation. The opinion questions will continue to be asked of survey respondents to gauge changes in attitudes toward NASS as these activities continue.

Topic: Estimation of Capital and Technology with a Dynamic Economic Model

Abstract:

Two fundamental sources of growth of output are accumulation of production capital and technological knowledge (henceforth, more simply called 'capital' and 'technology'). The problem is that capital and technology are unobserved except at the most disaggregated levels of production activity. Therefore, in order to use capital and technology series in quantitative analysis, economists first have had to construct or estimate these series. The premise of the paper is that conventional estimates of capital and technology series have been based on unnecessarily limited theoretical and sample information. The paper describes a method for obtaining estimates of capital and technology from prices and quantities of related input and output variables. The method involves specifying and estimating a detailed structural dynamic economic model of a representative production firm in an industry and applying the Kalman smoother to the estimated model to compute estimates of unobserved capital and technology over the sample period. The specified model is estimated using annual U.S. total manufacturing data from 1947 to 1998. Because the resulting estimates of capital and technology are based on the detailed structural model and on sample observations of ten related prices and quantities of inputs and output, they are based on much wider theoretical and sample information than conventional estimates.

Topic: Latent Class Analysis of Embedded Repeated Measurements: An Application to the National Household Survey on Drug Abuse

Abstract:

Latent class analysis (LCA) is a statistical methodology that can be used to evaluate the error in categorical data when repeated measurements of the same constructs are available. Special problems arise in the analysis when the measurements are embedded within a single survey instrument. For example, the assumption of independent classification errors (ICE) may not hold due to respondent memory or other conditioning effects. In this talk, we consider the application of LCA for evaluating classification error using repeated measurements embedded in a single survey questionnaire. To illustrate the techniques, we apply LCA to data from the 1994, 1995, and 1996 implementations of the National Household Survey on Drug Abuse. This application demonstrates the usefulness of LCA of embedded repeated measurements for identifying questionnaire problems and, potentially, as a means of adjusting estimates, such as drug use prevalence, for classification error bias.

Topic: Producing an Annual Superlative Index Using Monthly Price Data

Abstract:

The main purpose of the presentation is to outline some alternative approaches to how a superlative annual consumer price index could be constructed using the monthly price information that is presently collected by statistical agencies. The first issue that must be addressed is: how should the monthly price information at the lowest level of aggregation be aggregated up (over months) to form an annual price level or price relative at this lowest level of commodity aggregation? Having constructed appropriate annual elementary indexes, the presentation discusses how to complete the process to construct an annual (calendar year) superlative index.

Considering the problem of seasonal commodities, it is noted that the construction of year over year (superlative) indexes for each month of the year should be free of seasonal influences. Moreover, the business community is typically quite interested in this class of index numbers so the presentation recommends that statistical agencies produce them. This approach leads to another two stage aggregation of the micro price information into an overall annual index: first aggregate across commodities holding the month constant and then aggregate across months. It is noted that the alternative two stage annual indexes can be quite close to each other.

The two annual indexes are on a calendar year basis; i.e., the price and quantity data pertaining to all 12 months in 1999 are compared to the corresponding prices and quantities for the base year, say 1995. However, we can also construct a moving year or rolling year annual index, using price and quantity information collected each month. Thus the statistical agency could produce a new rolling year index every month, which of course would lag behind its present very timely CPI index due to the lags involved in collecting the relevant quantity (or expenditure) information. The real advantage of these moving year superlative indexes is that they are both timely and do not require any seasonal adjustment.

A problem with the indexes discussed so far is that they do not give us any information on short term price movements; i.e., all of these indexes compare prices in a month in the current year with the same month in the base year. Thus, the presentation briefly discusses superlative month to month price indexes.
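As a concrete illustration of the index-number machinery discussed above, a superlative index such as the Törnqvist (one common superlative formula; the presentation may work with others, such as the Fisher ideal) can be computed from prices and expenditure shares in two periods. The function name and the example data are ours, not the speaker's:

```python
import math

def tornqvist_index(p0, p1, s0, s1):
    """Törnqvist price index comparing period 1 to period 0.

    p0, p1: item prices in the base and comparison periods.
    s0, s1: expenditure shares (each summing to 1) in the two periods.
    log P = sum_i 0.5 * (s0_i + s1_i) * log(p1_i / p0_i)
    """
    log_p = sum(0.5 * (a + b) * math.log(q1 / q0)
                for a, b, q0, q1 in zip(s0, s1, p0, p1))
    return math.exp(log_p)
```

If every price doubles, the index is 2 regardless of the shares; if prices are unchanged, it is 1.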

Topic: AudioCASI: Design and Data Collection Issues

Abstract:

Although audioCASI has been around for quite a while, until recently actual large-scale implementations of the technology have been few and far between. During the past few years, however, the software and hardware required for A-CASI have improved significantly, and its advantages for reducing the underreporting of sensitive behavior have become increasingly accepted. Consequently, more A-CASI projects have moved first into development, and then into full-scale production. This session will focus on practical issues and "lessons learned" in designing and fielding ACASI surveys, with special attention to sensitive items and populations with low literacy levels. What is different about the design process, compared to CAPI and CATI studies? What are the impacts on the development process and schedule? How about interviewer training? Are there additional logistical and support issues in the field?

Topic: Further Examination of the Distribution of Individual Income and Taxes Using a Consistent and Comprehensive Measure of Income

Abstract

Different approaches have been used to measure the distribution of individual income over time. Survey data, such as those of the Census Bureau's CPS and SIPP, have been compiled with innovative enumeration methods, but underreporting, inter-temporal inconsistency, and inadequate coverage at the highest income levels can still jeopardize results. Administrative records, such as tax returns, may be less susceptible to underreporting of income but can be limited in scope and coverage. Record linkage studies have capitalized on the advantages of both approaches, but are severely restricted by the laws governing data sharing.

This paper is the third in a series examining trends in the distribution of individual income and taxes based on a consistent and comprehensive measure of income derived exclusively from individual income tax returns. Statistics from 1979 through 1997 on the distribution of individual income, the shares of taxes paid, and average tax burdens, all by size of income, are presented and analyzed. In addition, Lorenz curves and Gini coefficients have been estimated to assess trends in income inequality, both before and after taxes, and conclusions are drawn on these trends and on the overall redistributive effects of the Federal income tax system.

Topic: The Foundation of AIC and Its Use in Multi-Model Inference

Abstract:

Today, model selection is most often the search for the "best" model (often using hypothesis testing), followed by inference conditional only on the selected model. This process ignores the fact that multiple models often provide competitive explanations of the data. Using Akaike's Information Criterion (AIC), model selection could account for the uncertainty in selecting a model, providing more realistic inferences. This talk outlines the philosophical and mathematical basis for AIC as a model selection criterion, and shows that AIC can be used to make model-averaged inferences over the set of models considered. A strong case can be made for basing model selection on likelihood and Kullback-Leibler (K-L) information theory. This approach leads to AIC and its important variations and does so without assuming the "true model" is even considered. Because of this, only the relative AIC differences between each model and the best model in the set are what matter. These differences measure the relative support for competing fitted models. Normalized weights derived from these relative differences allow unconditional inferences based either on the selected model or on averages over the model set. In both cases, the inferences account for the model selection uncertainty, i.e., they are conditional on more than the one best model.
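The AIC differences and normalized weights described above are straightforward to compute; a minimal sketch (the function name is ours, not the speaker's):

```python
import math

def akaike_weights(aic_values):
    """Convert a list of AIC values into Akaike model weights.

    Only the differences d_i = AIC_i - min(AIC) matter; the weights
    w_i = exp(-d_i / 2) / sum_j exp(-d_j / 2) sum to 1 and measure
    the relative support for each fitted model in the set.
    """
    best = min(aic_values)
    rel = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]
```

The resulting weights can then be used for model-averaged inference over the candidate set rather than conditioning on a single "best" model.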

Topic: Time-Use Surveys: What Are They? Why Are We Interested?

Abstract: The last half of the twentieth century has been witness to enormous social change and shifts in patterns of time use, both in the workplace and in the home. A variety of methodological approaches have been used to systematically collect information about the activity patterns of modern life. But it is the detailed chronological reporting procedure of the time-use survey that is valued by many social researchers because it provides a way to measure changes in behavior while avoiding many of the pitfalls associated with other survey collection procedures. This talk will introduce the "time-use survey" method of data collection. It will also include an overview of the essential features of a time-use interview and present the types of customers who are (or might be) interested in the data.

Title: TUTORIAL -- To Bayes or Not to Bayes

Abstract:

Developments in statistical computing over the last 10-15 years have led to a substantial increase in interest in the Bayesian approach to statistical inference. Many, however, remain concerned about the Bayesian approach's reliance on prior information. In this introductory-level talk we review the Bayesian approach and discuss the various issues (advantages and disadvantages) associated with its use.

Topic: Comparing IVR, CATI, and Mail to Collect Demographic Information

Abstract:

This talk describes a study that compared three methods of collecting demographic information. (The questionnaire used the same items that will be administered on the Census 2000 Long Form.) One version of the questionnaire was administered by a computer program, which played digitized recordings of the questions over the telephone. This methodology--known variously as interactive voice response (IVR) or telephone audio-CASI (TACASI)--was compared with mail data collection and CATI interviews. With the long questionnaire we used, many respondents broke off the IVR interviews before they finished the questionnaire. Even ignoring these breakoffs, item nonresponse rates were higher in IVR than with CATI or mail. Still, IVR may produce more accurate reports about sensitive topics, such as the receipt of welfare.

Title: Bayesian Statistics in Action

Abstract:

By structuring complicated models and providing a formalism for bringing in objective and subjective information, Bayes and empirical Bayes methods have the potential to produce more efficient and informative designs and analyses than those based on traditional approaches. However, realization of this potential requires that the methods be robust to departures from assumptions and be credible to the community that will use findings. To set a framework for discussing the role of Bayesian statistics, I outline the approach, including its potentials and prerequisites. Then, I discuss frequentist performance for Bayesian procedures, addressing non-standard goals, accommodating multiplicity, clinical trial monitoring and Bayesian design for frequentist analysis. Each topic includes both the necessary statistical formality and applied examples.

WSS PRESIDENT'S INVITED ADDRESS

Title: Structural Modeling in Time Series and an Analysis of Fatal Road Accidents

Abstract:

The statistical analysis of time series tends to be dominated by ARIMA (Autoregressive Integrated Moving Average) models. In the first part of this talk we consider the structural approach to time series modeling and identify some of the advantages and disadvantages of this framework relative to ARIMA. We also consider briefly some recent results that enable us to broaden the structural class in various ways. In particular, we find that exponential smoothing procedures are model-based and have a much wider range of applicability than previously thought.

In the second part of the talk we describe a study on numbers of fatal accidents on the interstate system using monthly data for individual States over the period 1975-98. The goal is to examine the impact of changes in speed limits upon the incidence of fatal accidents. These series were analyzed using the STAMP package for structural modeling. The analysis, summarized in Consumer Reports (April 2000), reveals some interesting differences when compared to previous studies.

About the speakers:

J. Keith Ord is a professor at the McDonough School of Business at Georgetown University. His research interests include business forecasting, inventory planning, and the statistical modeling of business processes. Dr. Ord is a co-author of Kendall's Advanced Theory of Statistics, a two-volume reference work now in its sixth edition; he is also an editor of the International Journal of Forecasting. Dr. Ord is a fellow of the American Statistical Association and an elected member of the International Statistical Institute.

Sandy Balkin is a Senior Consultant in the Statistics Group of the Policy Economics and Quantitative Methods Group of Ernst & Young LLP. He recently received his Ph.D. in Business Administration from Penn State University. Dr. Balkin specializes in business statistics, marketing, finance, and operations research. He has published articles on statistical design of experiments, neural networks, and time series methodology.

Title: Hierarchical Bayesian Nonresponse Models for Binary Data with Uncertainty about Ignorability

Abstract:

We consider three Bayesian hierarchical models for binary nonresponse data which are clustered within a number of areas. The first model assumes the missing-data mechanism is ignorable, and the second assumes it to be nonignorable. We argue that discrete model expansion is inappropriate for modeling uncertainty about ignorability. Then we use a single model through continuous model expansion on an odds ratio (the odds of success among respondents versus the odds of success among all individuals) for each area. When this odds ratio equals 1 we have the ignorable model; otherwise there is nonignorability. By constructing a Bayesian credible interval, we can decide which areas have nonignorable nonresponse. We use data from two different household surveys, the National Health Interview Survey (NHIS) and the National Crime Survey (NCS), to illustrate our methodology, which is implemented using Markov chain Monte Carlo methods. There are differences among the three models for estimating the proportion of households with a characteristic (a doctor visit in NHIS and victimization in NCS), and the missing-data mechanism for some of the areas can be considered ignorable.

Key words: Model uncertainty, Model expansion, Nonignorability, Proportion, Selection model

Topic: Stable Distributions: Models for Heavy Tailed Data

Abstract:

Stable random variables are the r.v.s that retain their shape when added together. These distributions generalize the Gaussian distribution and allow skewness and heavy tails - features found in many large data sets from finance, telecommunication and hydrology. We give an overview of univariate and multivariate stable laws, focusing on statistical applications. Examples of financial and other data sets will be given. These distributions are now computationally accessible and should be added to the toolbox of the working statistician.
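Simulating stable variates is one of the computations that has become routine; below is a sketch of the standard Chambers-Mallows-Stuck method for the symmetric case (beta = 0), which the software the speaker refers to presumably generalizes to skewed laws:

```python
import math
import random

def stable_symmetric(alpha, rng=random):
    """One standard symmetric alpha-stable draw (beta = 0) via the
    Chambers-Mallows-Stuck method.  alpha = 2 gives a Gaussian
    (with variance 2); alpha = 1 gives a Cauchy."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)   # uniform angle
    w = rng.expovariate(1.0)                     # unit exponential
    if alpha == 1.0:
        return math.tan(v)                       # Cauchy case
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos(v * (1.0 - alpha)) / w) ** ((1.0 - alpha) / alpha))
```

For alpha < 2 the draws exhibit the heavy tails that motivate these models for finance and telecommunication data.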

Title: A Comparison of the Household Sector from the Flow of Funds Accounts and the Survey of Consumer Finances

Abstract:

This paper compares figures on selected assets and liabilities from the flow of funds accounts (FFA) household sector with survey-based estimates from the 1989, 1992, and 1995 Survey of Consumer Finances (SCF). Previous studies compared definitionally inconsistent FFA and SCF measures and, thus, arrived at incorrect conclusions about the validity of the estimates. This analysis addresses common misperceptions about the definitions of the FFA household sector's assets and liabilities and reconciles more fully the FFA and SCF wealth components. The results show that for aggregate assets, aggregate liabilities, and specific wealth components, such as owner-occupied real estate, consumer credit, and home mortgage debt, the FFA and SCF estimates are quite close in 1989 and 1992 but move apart in 1995. Also, when placed on a comparable basis, differences between the FFA and SCF measures of savings deposits and publicly traded corporate shares shrank from those documented in previous studies but, nevertheless, still remain substantial.

Topic: Bank Failures, Household Income Distribution, and Robust Mixture Modeling

Abstract:

We investigate the use of the popular nonparametric integrated squared error criterion in parametric estimation. Of particular interest are the problems of fitting normal mixture densities and linear regression. We discuss some theoretical properties and comparisons to maximum likelihood. The robustness of the procedure is demonstrated by example. The criterion may be applied in a wide range of models. Two case studies are given: an application to a series of yearly household income samples, and a more complex application involving the estimation of an economic frontier function for U.S. banks where the data are assumed to be noisy. Extensions to clustering and discrimination problems follow.
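For a single normal density, the integrated squared error (L2E) criterion mentioned above has a closed-form first term, so it can be written down directly; a minimal sketch (fitting would minimize this over mu and sigma, and the mixture and regression cases in the talk are more elaborate; the names are ours):

```python
import math

def l2e_normal(mu, sigma, data):
    """Integrated squared error criterion for fitting one normal density:
    integral of f^2 minus (2/n) * sum_i f(x_i), where the integral of
    phi(.; mu, sigma)^2 is 1 / (2 * sigma * sqrt(pi)).  Minimizing over
    (mu, sigma) gives a robust alternative to maximum likelihood."""
    n = len(data)
    def phi(x):
        return (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                / (sigma * math.sqrt(2.0 * math.pi)))
    return (1.0 / (2.0 * sigma * math.sqrt(math.pi))
            - (2.0 / n) * sum(phi(x) for x in data))
```

A gross outlier barely moves the minimizing mu, which is the robustness property the abstract demonstrates by example.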

Title: Response Variance in the Current Population Survey Income Supplement

Abstract:

The Annual Demographic Supplement to the Current Population Survey (CPS) is the source of the annual estimate of the national poverty rate. In 1998, for the first time ever, the Census Bureau used reinterview to evaluate response error in the Supplement. Response error results from respondent errors in reporting or interviewer error in recording information in an interview. In categorical data, response error virtually guarantees bias.

The goal of the reinterview was to assess the reliability of the data from the Supplement. We describe the reinterview methodology and discuss overall results for five general sets of questions: income, public assistance, work experience, health insurance, and migration. We will highlight some specific questions from those sets and we will compare the response error for poverty and non-poverty households.

Title: Spatial and Temporal Trends in Cancer Incidence

Abstract:

Empirical Bayes and Markov Chain Monte Carlo (MCMC) methods for fitting the conditional autoregressive model are known to offer a useful way of smoothing spatial patterns in disease rates. These methods are extended to incorporate time trends, thus offering additional insight into the spread of disease over space and time. Alternative ways of graphically displaying trends in disease maps will be presented. This model can also be used to identify geographic areas that are experiencing rapid change in disease incidence. These statistical techniques will be demonstrated using breast cancer incidence data from the 169 towns of Connecticut during the years 1984-1994.

WSS PRESIDENT'S INVITED ADDRESS

Title: A (Latin) Square Deal for Voters: Fixing a Flaw in the Australian Preferential Voting System

Abstract:

Australian parliamentary elections use preferential voting. A candidate who has insufficient votes for election on first preferences may receive lower preference votes from candidates with still fewer votes, who have been "excluded," and eventually be elected "on preferences." If, as is often the case, more than one vacancy is being filled at the same time, candidates may also receive lower preference votes from the "surpluses" of candidates who have more than fulfilled the requirement that they must have at least the required "quota" to secure election.

Complications arise where voters are essentially indifferent between rival candidates from the same party and vote "1, 2, 3, ..." down their chosen party's list ("party linear" voting). In 1979 a scheme called "Robson Rotation" was introduced in the State of Tasmania in which the party's list would be presented in c column orderings, where c is the number of candidates in the party's list, the columns being headed by each of the party's candidates in rotation, and the remaining names in each column being determined by a Latin square design.

When the Australian Capital Territory also adopted Robson Rotation in 1995, the extent of party linear voting was much greater than in Tasmania, and evidence soon appeared that a single Latin Square was inadequate to put candidates on an equal footing. In 1998 it was obvious that two and probably three out of the 17 successful candidates for the ACT's Legislative Assembly had been elected "by the luck of the draw" over candidates with greater popular support. The Canberra Branch of the Statistical Society of Australia took the initiative to find optimal experimental designs for the purpose, and it seems probable that these (or schemes closely based on the same idea) will be used in the 2001 elections.

For more information, see Professor Brewer's report of Robson rotation on the Statistical Society of Australia, Canberra Branch, webpage: http://www.ozemail.com.au/~ssacanb.
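The basic single-Latin-square rotation can be sketched with a cyclic construction (the abstract notes that a single square proved inadequate in the ACT, so this illustrates only the starting point, not the optimal designs the Canberra Branch developed):

```python
def robson_columns(candidates):
    """Cyclic Latin-square rotation of a party's candidate list.

    Returns c column orderings (c = number of candidates): column j is
    headed by candidate j, and each candidate occupies every row
    position exactly once across the c columns."""
    c = len(candidates)
    return [[candidates[(i + j) % c] for i in range(c)] for j in range(c)]
```

Under heavy "party linear" voting, printing equal numbers of ballots with each column ordering spreads first preferences evenly over the list.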

Topic: Automated Multivariable Time Series Analysis of Industrial and Econometric Data

Abstract:

Automatic statistical methods have recently been developed for modeling multivariable time series from observational input/output data. This has been applied to a number of difficult industrial and econometric problems resulting in major improvements. In this presentation, a tutorial is given of the primary elements of this new technology, followed by a discussion of some significant applications. The statistical modeling involves linear, time invariant dynamical processes with noise disturbances, possibly inputs and feedback, and includes determination of the system state order.

The basic method involves a canonical variate analysis that, for each potential state order, gives an optimal statistical selection of the system states. The computation involves primarily a singular value decomposition that is always computationally stable and accurate. For model state order selection, an optimal statistical procedure is used, namely a small sample version of the Akaike information criterion. The accuracy of the method is close to the optimal lower bound achieved by maximum likelihood for large samples. The resulting procedure is completely automatic and suitable for online time series analysis of high-order dynamic processes.

This technology has been widely applied in both academic and industrial settings to a variety of problems involving high-order multivariable processes that are possibly unstable, non-minimum phase, and/or involve nonstationary noise, stiff dynamics, unknown feedback and delays. This presentation describes a number of applications to the analysis of causality and feedback in monetary data, detection of abrupt system changes, industrial process monitoring, and adaptive modeling and online adaptive control. Automated multivariable time series analysis is a critical technology that is necessary to enable wide scale industrial automation and data mining of time series.

Title: The Influence of Environmental Characteristics on Survey Cooperation: A Comparison of Metropolitan Areas

Abstract:

A request for survey participation takes place within a broad context - a social and economic environment that can vary over time, across societies, or even across different geographic areas within a society. There are many examples of differences in nonresponse across different areas within a country, particularly distinctions observed between urban and rural areas; however, there is much less documentation of the varying social and economic conditions that may underlie these environmental differences in response rates. Recent research in social psychology has focused on specific characteristics of communities to try to understand some of the aspects of the environment that underlie differences in people's helping behavior. For example, Levine and his colleagues (1994) examined six different types of helping behavior in 36 cities and identified demographic, social, and economic characteristics of these communities that were related to the level of helping behaviors observed.

In this paper, we examine a number of indicators of the demographic, social, and economic environment from a number of sources, including Census data, to construct composite indicators that reflect social psychological attributes of metropolitan areas in the United States. We will then examine how well these indicators are related to differing levels of survey cooperation rates across metropolitan areas in the United States in two major national surveys. Finally, we discuss the implications of these findings for theories of survey cooperation and for improving data collection procedures.

Topic: New Sources of Demographic Data on the Web

Abstract:

The explosive growth of information on the World Wide Web has revolutionized personal, professional, and business practices. Five years ago, demographic data were only accessible to a limited audience of issue specialists. Today, anyone with access to the Internet can view and download demographic data from a wide variety of government and private Web sites (e.g., American Factfinder, FedStats, Ferret, PDQ-Explore, Ameristat). With increased access to data, there are also greater risks of misinterpreting important economic, social, and demographic trends, especially among non-technical users. The Population Reference Bureau is working in collaboration with Bill Frey of SUNY, Albany to develop a new Web site, Ameristat.org, a summary of the latest U.S. demographic trends and their consequences. The goal is to use Internet technologies to increase public awareness and understanding of demographic data.

Topic: Recent developments in CATI and RDD issues

Abstract:

Four papers will be presented by the panel members on recent CATI and RDD developments:

  1. Evaluation of the Use of Data on Interruption in Telephone Service (Ismael Flores-Cervantes, J. Michael Brick, Kevin Wang and Tom Hankins)

  2. Bias From Excluding Households without Telephones in Random Digit Dialing Surveys - Results of Two Surveys (Gary Shapiro, Ismael Flores-Cervantes, John Hall, and Genevieve Kenney)

  3. A Comparison and Evaluation of Two Survey Data Collection Methodologies: CATI vs. Mail (Paula Weir, Sherry Beri and Benita O'Colmain)

  4. The Effects of Telephone Introductions on Cooperation: An Experimental Comparison (Nileeni Meegama and Johnny Blair)

Title: Construction of efficient one-level rotation sampling designs

Abstract:

We introduce formal rules for constructing 2-way balanced one-level rotation designs, in which balancing is done on the interview scheme within the monthly sample and within rotation groups. We provide a necessary and sufficient condition for 2-way balancing and an algorithm to construct such a design. From this design, we obtain a generalized composite estimator (GCE) and the minimum variance linear unbiased estimator (MVLUE). The variances of the GCE and MVLUE are presented for two types of correlations among the subunits of a group, where the variables depend on the number of times interviewed. Minimizing this variance, we derive the optimal coefficients of the GCE. The efficiency of the GCE with the optimal coefficients is compared to that of the MVLUE and of GCEs with fixed coefficients. We generate a family of 2-way balanced one-level rotation designs, and show the efficiency of these and of 4-8-4 designs.

Title: Some Practical Aspects of Disclosure Analysis

Abstract:

Disclosure analysis consists of a set of procedures applied to a data set to (1) ascertain the risk that an individual or organization whose information appears in the data can be identified, and (2) lower that risk to an acceptable level through sampling cases, eliminating variables, or manipulating data. Disclosure analysis is usually conducted when data collected under the promise of confidentiality are to be released to the public. While a substantial amount of statistical theory has been developed around disclosure techniques (REFS), our concern is with practical solutions to disclosure problems. We base our discussion on our experiences with disclosure analyses we have conducted with files of different types and levels. In this paper we discuss the purpose of disclosure analysis and types of disclosure problems, focusing on the distinction between direct and inferential disclosure. We present a technique for determining which information is potentially disclosive that may help ease the tension between analysts, who want as much information as possible, and data collectors, who want to preserve the confidentiality of their respondents. We also examine the tension between statistical bias and confidentiality. In the course of our work we have come across techniques to apply when conducting disclosure analysis. We discuss each of these techniques and show how they are applicable to large or small data sets and to single- or multi-level data.

Topic: Generalized Linear Models for Sample Surveys

Abstract:

Generalized linear models provide an important extension to linear models. The general class includes as special cases: linear models with uncorrelated or correlated error structure, multilevel models, loglinear models, and Poisson and logistic regression. A number of papers over the last twenty years have considered fitting regression models or small area estimates. This previous research has looked at using the inverse selection probabilities as weights, but the joint selection probabilities have been ignored. How best to incorporate both the unequal selection probabilities and the joint selection probabilities (such as occur in clustered or stratified sample designs) when fitting generalized linear models with fixed, random or mixed model parameters has remained a substantially unanswered question. The role of the joint selection probabilities even in the linear model case with fixed model parameters has not been clarified previously, except for the simplest case of a pure design error model, for which it will be shown that estimation of a mean or total leads to the Horvitz-Thompson estimator. This paper considers the role of both the selection and joint selection probabilities in unbiased estimation for generalized linear models with random, fixed or mixed parameters. The general problem and its solution are discussed in a joint design / superpopulation context, and a class of generalized linear models is developed that allows for incorporation of both superpopulation structure and the first and second order properties of the randomisation distribution induced by the survey design. For optimal design, the relationship between the superpopulation structure and the selection and joint selection probabilities will be shown to be of central importance. Some results will be given on the choice of selection and joint selection probabilities for a complex sample that ensure good estimates for parameters in generalized linear models.
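The Horvitz-Thompson estimator mentioned above, which weights each sampled value by its inverse inclusion probability, is simple to state in code (a textbook sketch, not the paper's general GLM machinery):

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimator of a population total: each sampled
    value y_i is weighted by 1 / pi_i, the inverse of its first-order
    inclusion probability.  Unbiased for any design with pi_i > 0."""
    return sum(y / p for y, p in zip(values, inclusion_probs))
```

For example, a sample of two units each drawn with inclusion probability 0.5 weights each observation by 2.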

American Association for Public Opinion Research
Washington/Baltimore Chapter
and the WSS Data Collection Methods Section

Title: Predicting Nonresponse from Household and Regional Characteristics

Abstract:

This paper investigates predictors of nonresponse rates for a panel survey (i.e., the Current Population Survey) using logistic models. The types of predictors include interviewer work characteristics (e.g., workload, number of attempted contacts) and household characteristics (e.g., age, gender of respondent). Much previous research has examined simple effects to predict interviewer or household nonresponse rates; a recent review can be found in Groves and Couper (1998). In contrast, the present study examines confounding and interaction effects between the predictors. Confounding effects occur when two predictors share the same relationship with the interviewer nonresponse rate. Interaction effects occur when the relationship between a predictor and the interviewer nonresponse rate depends on another variable.

Note: If you did not get an e-mail notice of this meeting but want one for future meetings, please contact dc-aapor.admin@erols.com.

Title: An Algorithm for the Distribution of the Number of Successes in Fourth- or Lower-Order Markovian Trials

Abstract:

Many statistical applications may be modeled as a sequence of n dependent trials, with outcomes that may be classified as either a "success" or a "failure". In this talk we present an algorithm that may be used to compute the distribution of the number of successes in such sequences. We assume that the probability of success on the nth trial depends on the outcome of trials n - v, n - v + 1, ..., n - 1, for some value of v, but is independent of trials before n - v. The algorithm extends algorithms given by Kedem for v = 1 and v = 2 to the case where v = 4. The importance of selecting an appropriate value for v when modeling dependent data is discussed. We also discuss the application of the algorithm to the selection of v, and to the computation of waiting time distributions.
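The kind of algorithm described can be sketched as dynamic programming over the last v outcomes (our own illustrative implementation, not Kedem's or the speaker's; the zero-padded start-up for the first v trials is an assumption made purely for simplicity):

```python
def success_distribution(n, v, p_success):
    """Distribution of the number of successes in n dependent trials,
    where P(success) at each trial depends only on the previous v
    outcomes.  p_success maps a length-v tuple of 0/1 outcomes to a
    success probability."""
    # dist maps (history, successes_so_far) -> probability
    dist = {((0,) * v, 0): 1.0}
    for _ in range(n):
        new = {}
        for (hist, k), pr in dist.items():
            p = p_success(hist)
            for outcome, q in ((1, p), (0, 1.0 - p)):
                key = (hist[1:] + (outcome,), k + outcome)
                new[key] = new.get(key, 0.0) + pr * q
        dist = new
    out = [0.0] * (n + 1)
    for (_, k), pr in dist.items():
        out[k] += pr
    return out
```

With a constant success probability this reduces to the binomial distribution, which makes a convenient check.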

Title: Use of Hierarchical Modeling in Substance Abuse Surveys

Abstract:

Hierarchical Modeling (HM) is a methodology that recognizes the role that the hierarchical structure can play in analysis and adjusts for this in an appropriate manner. This talk will explore the application of HM to the National Household Survey on Drug Abuse (NHSDA). The initial goal of that research was to determine the impact of ignoring levels of the hierarchy above the person level. Much of the analysis of relationships of other variables to drug use in this field has been limited to simple person-level logistic regressions (using sample weights). The 1997 NHSDA was a nested sample of Primary Sampling Units (typically counties or groups of counties), segments (specially designed combinations of blocks and block groups), households, and persons (one or two per selected household).

We explore some practical considerations, such as having sufficient numbers of observations at each level, and briefly discuss the use of weights (we do not use them). The discussion will include both continuous drug-related scales and dichotomous variables (use or non-use of marijuana in the past year). The focus is on variance decomposition, especially for the dichotomous case. The discussion includes a methodology for taking the reported variances from a two-level hierarchy, which are on the log-odds scale, and converting them to variances on the original scale (reported at the 1999 JSM in Baltimore, MD). We believe we have a way to extend this to three or more levels. The relative sizes of variance components are important for drug prevention programs in that they have implications for the relative importance of programs aimed at the person, family, and neighborhood levels. We will conclude with some of our current research on variance components and the use of sample weights.
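One standard way to carry a variance from the log-odds scale back to the probability scale is the delta method. Whether this matches the authors' JSM methodology is an assumption on our part, but it illustrates the kind of conversion involved:

```python
import math

def logit_to_prob_variance(mu_logit, var_logit):
    """Delta-method approximation (a standard technique, assumed here,
    not necessarily the speakers' exact method): if var_logit is a
    variance on the log-odds scale around mu_logit, then on the
    probability scale Var(p) ~= [p(1 - p)]**2 * var_logit,
    where p = 1 / (1 + exp(-mu_logit))."""
    p = 1.0 / (1.0 + math.exp(-mu_logit))
    return (p * (1.0 - p)) ** 2 * var_logit
```

Note that the multiplier p(1 - p) shrinks toward zero in the tails, so the same log-odds variance translates to a much smaller probability-scale variance for rare outcomes than for outcomes near 50%.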

Title: Control Charts as a Tool in Data Quality Improvement

Abstract:

A novel method of using control charting has been successfully applied to two National Highway Traffic Safety Administration data systems to help improve and assure the quality of their data. The approach, requiring only the existing data, differs from the data control and data tracking methods previously described in the literature. Using this method, problems in data quality may be detected and dealt with far in advance of the release of a database to the user community. This talk describes the methods used, illustrates the approach through various examples, and discusses various technical issues in applying control charts to these traffic safety data. The talk also explains the rationale of the methods in terms of statistical process control logic. Finally, an example of nonrandomly missing data is given.
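As a rough illustration of the statistical process control logic mentioned, a generic Shewhart-style chart (not necessarily the charts used for the NHTSA systems) computes limits from a baseline period and flags later points that fall outside them:

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Center line and k-sigma control limits estimated from baseline
    observations (a simple sketch; production charts often estimate
    sigma from moving ranges instead of the sample standard deviation)."""
    m = mean(baseline)
    s = stdev(baseline)
    return m - k * s, m, m + k * s

def out_of_control(values, lcl, ucl):
    """Indices of points outside the control limits: candidate data
    quality problems worth investigating before a database is released."""
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]
```

Applied to, say, a monthly count of records with a missing field, a point beyond the limits signals a change worth investigating well before the data reach users.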

Important Note:

This seminar is to be shown at BLS Conference Center, Room 1; Census 4, 3225; NCHS, Auditorium 11th Floor; and Westat, IRC Reading Room on RE 40F. The October 5 seminar titled "The Reluctant Embrace of Law and Science" is to be shown at BLS Conference Center, Room 9 & 10; Census 4, 3225; and Westat, IRC Reading Room on RE 40F. The October 10 seminar titled "Data Presentation -- A Guide to Good Graphics and Tables" is to be shown at BLS Conference Center, Room 9 & 10; Census 4, 3225; USDA ERS Waugh Auditorium B; Westat, IRC Reading Room on RE 40F; NSF, Room 350; and NCHS, Auditorium 11th Floor.

The site facilitators are Stuart Scott/BLS at (202) 691-7383 and Glenn White/EY at (202) 327-6414 for BLS, Maribel Aponte at (301) 457-3480 for Census, Linda Atkinson at (202) 694-5046 for USDA ERS, Hongsheng Hao at (301) 738-3540 for Westat, Ron Fecso at (703) 292-7769 for NSF, and Joe Fred Gonzales at (301) 458-4239 or Iris Shimazu at (301) 458-4497 for NCHS.

The technical contacts are Mark Wisnieski/BLS at (202) 691-7535 for BLS, Barbara Palumbo at (301) 457-4974 for Census, Bob Donegan at 692-5063 for USDA ERS, Jane McGrath at (301) 251-4375 at Westat, Edward Yu at (703) 292-8024 for NSF, and Chandra Singleton at (301) 458-4628 for NCHS.

WSS Data Collection Methods Section
and the
American Association for Public Opinion Research
Washington/Baltimore Chapter

Topic: Interdisciplinary Survey Research Involving the Computer and Cognitive Sciences: Cognitive Issues in the Design of Web Surveys

Abstract:

We describe the results of an experiment on Web surveys. Many studies have demonstrated the advantages of self-administration, which include increased reporting of sensitive information and decreased interviewer effects. Computer administration of survey questions appears to combine these advantages of self-administration with the added advantages of computer assistance. Still, a growing body of evidence suggests that features of the computer interface can elicit reactions similar to those triggered by human interviewers. Our experiment examined features of the interface thought to create a virtual social presence. We varied whether or not the electronic questioner is identified by name ("Hi! I'm John") and whether or not it offers explicit reminders of prior answers. The main hypothesis to be tested in the study is that the more the interface creates a sense of social presence, the more respondents will act as if they are interacting with another human being. The expected major effect of social presence is lower levels of reporting of sensitive information; at the same time, rates of missing data may be reduced. Thus, the analyses examine both unit and item nonresponse and levels of reporting. The study is designed to begin to fill an important gap in knowledge about the impact of Web data collection on data quality and to address important theoretical concerns about socially desirable reporting and interacting with computers.

Title: The Reluctant Embrace of Law and Statistics

Abstract:

In three recent decisions the Supreme Court of the United States has established new standards for considering expert evidence, including evidence offered by statisticians. This presentation will review the emerging legal standards for expert testimony, the problems that arise with such evidence, and opportunities for improving the quality of scientific testimony offered in litigation. Particular attention will be paid to the use of court-appointed experts and the Federal Judicial Center's Reference Guide on Statistics (www.fjc.gov/EVIDENCE/science/sc_ev_sec.html).

Title: TUTORIAL: Data Presentation -- A Guide to Good Graphics and Tables

Abstract:

Quality data presentations ensure user understanding by taking advantage of how users already process information, reduce the number of thought processes required to understand the data, and break down fundamental obstacles to understanding. This workshop will cover when to use graphics and tables, using your data to determine the type of graphic or table, the elements of good graphics and tables, and achieving clarity in presentation. Based on the principles set forth by Edward Tufte and William Cleveland, this is a practical workshop to show participants how to improve their presentations of quantitative data. The tutorial is presented by Marianne W. Zawitz of the Bureau of Justice Statistics (BJS), the statistical agency of the U.S. Department of Justice. She is the creator and content manager of the BJS Web site (http://www.ojp.usdoj.gov/bjs/).

Handouts and slides from this presentation are available.

ASA Seminar Reprise:
Research on Government Survey Nonresponse
Part I

Title: The Last Five Percent: What Can We Learn From Difficult/Late Interviews?

Abstract:

A few studies have examined nonresponse and the impact it has on survey estimates (e.g., Tucker and Harris-Kojetin 1998; Harris-Kojetin and Robison 1998). Less research has focused specifically on the characteristics of late or "difficult" cases that comprise the last few percentage points of survey response rates -- particularly for personal surveys. To address this topic, we examine several characteristics of late/difficult cases from the Current Population Survey (CPS) and the National Crime Victimization Survey (NCVS). First, we explore whether the household and demographic characteristics of late cases differ from other interviews and to what degree they resemble nonrespondents. Second, we check to see if critical survey estimates would be different without the late/difficult cases. The paper concludes with a discussion of how these results can help survey managers and analysts better understand the relative contribution that late cases make to the two surveys examined.


Title: The Relationship Between Household Moving, Nonresponse, and Unemployment Rate in the Current Population Survey

Abstract:

In the Current Population Survey, a household survey from which labor force estimates are produced, selected housing units remain in sample during a 16-month period. The households are interviewed during the first 4 and last 4 months of this period. During this time, the household occupying a sample housing unit may change. Matching households between months allows an analysis of the relationship between whether a household moves and estimates of the employment rate. Many households move during the 16 months they are in sample. Since a change in employment may be related to the household's decision to move, the estimates of employment status may be affected. "Inmovers" do not completely make up for the number of "outmovers," so their effects may not offset one another. Differences in response rates can also affect estimates. The current study examines the nature of this relationship through an analysis of the characteristics of movers and the resulting effect on labor force estimates.

The 2000 Morris Hansen Lecture

The Washington Statistical Society is pleased to announce the tenth in the annual series of lectures to honor the memory of Morris Hansen. This lecture series is made possible by a grant from WESTAT, where Morris Hansen was senior statistician for 20 years, and chairman of the Board of Directors at the time of his death.

Title: Models in the Practice of Survey Sampling (Revisited)

Abstract:

In 1983, Hansen, Madow, and Tepping published an important paper in JASA entitled "An evaluation of model-dependent and probability-sampling inferences in sample surveys." The authors' position was that, at least for large samples, models can and should be used within the framework of probability-sampling (or design-based) inference and that inferences about population characteristics should also be design-based. The present paper revisits this issue. It basically supports the position of Hansen, Madow, and Tepping. However, it notes that many of the developments in survey sampling in recent years have been concerned with situations where some reliance on model-dependent procedures is needed. In this context, it reviews various uses of model-dependent methods in surveys, including handling missing data, small area estimation, statistical matching, analyses concerned with determining causal mechanisms, and generalized variance functions.

Title: Financial Service Usage Patterns of the Poor: Cost Considerations

Abstract:

Low-income individuals use a variety of financial services to meet such common financial needs as receiving income, cashing checks, paying bills, sending funds, and accumulating savings. Some obtain these services from banks, but many others use check cashing outlets, post office branches, supermarkets, and non-institutional sources. This paper describes the extent to which financial costs explain the financial service usage patterns of low-income individuals, and the extent to which these financial costs may vary in importance among demographic subgroups.

The paper is based on data obtained from the 1998-99 "Survey of Financial Activities and Attitudes," which was sponsored by the Office of the Comptroller of the Currency in order to better understand why millions of Americans have no bank account. The survey contains detailed information on the financial activities and attitudes of 2,000 randomly-selected individuals living in low- and moderate-income neighborhoods of Los Angeles County and New York City.

Topic: Interdisciplinary Survey Research Involving the Computer and Cognitive Sciences: "Unraveling the Seam Effect," and "The Gold Standard of Question Quality on Surveys: Experts, Computer Tools, Versus Statistical Indices"



Abstract:

(Rips) Panel surveys sometimes ask respondents for data from several different intervals within a longer reference period. Findings from such surveys often show larger changes from one month to the next when the data come from two different interviews than from the same interview. We have studied this seam effect experimentally in a setting that allows us to control the information that respondents should report. The results of the experiments are consistent with a theory in which the seam difference is due to two factors: (a) respondents' forgetting information within the response interval, and (b) their bias in reporting when they can no longer remember correct answers.

(Graesser) We have developed a computer tool (called QUAID) that assists survey methodologists in improving the wording, syntax, and semantics of survey questions. We have performed analyses that assess these problems on a corpus of surveys. There has been a persistent challenge in our assessments of the validity of QUAID's critiques: What is the gold standard for determining whether a question has a particular problem? A computer tool can perform complex computations, but the question remains whether the output is valid. This presentation addresses the challenges of performance evaluation when there is no defensible gold standard for question quality.

Title: Delivering Interactive Graphics on the Web

Abstract:

The Internet, Intranets, and Extranets have all proven to be very effective for allowing large audiences access to vast amounts of information. But, just like other communications media before them, these networks solve important problems while creating others. In particular, flexible content design and customized interaction with statistical graphics present unique problems for web-based networks. nViZn (pronounced "envision"; http://www.spss.com/nvizn) addresses these issues.

nViZn is a Java Application Programming Interface (API) designed specifically for developers working on web-based data dissemination. nViZn focuses on three aspects of statistical graphics: powerful analytical content creation, customizable interactivity with rich meta-data, and flexible, high-quality presentation. The content creation engine is based on a new ground-breaking theory presented by Leland Wilkinson in his book, "The Grammar of Graphics." This new theory is an "algebra" of graphics that provides a simple way to model both basic and complex data structures, as well as novel ways for mapping data dimensions to the aesthetic attributes of graph elements. However, content creation does not end with building a graph. Content providers retain complete control over end-user interactivity with highly customizable "controller" components. Further, a rich graph meta-data model allows content providers to embed important contextual information, which can be revealed to the end-user at key locations within the graph. We feel that these three aspects provide a complete solution for web-based data dissemination.

Topic: Poverty estimates from the CPS and CE: Why they differ

Abstract:

Poverty measures are one indicator of the well-being of Americans. A family's poverty status, using the official measure of poverty, is determined by comparing its annual before-tax money income to the appropriate poverty threshold. When this family resource definition is used in measuring economic well-being, poverty is the lack of the economic resources needed to support consumption of economic goods or services.

Currently, the official measure of poverty is obtained from the Current Population Survey (CPS). Poverty measures can also be constructed using other surveys, such as the Consumer Expenditure Survey (CE). Poverty rates obtained from the CE are significantly higher than those obtained from the CPS. Thus, when the CE is used to count the number of poor, the overall well-being of the U.S. population as measured by those in poverty appears lower.

Little attention has been paid to examining why these differences in poverty estimates occur. Martini and Dowhan, in "Documenting and Explaining SIPP-CPS Differences in Poverty Measures Among the Elderly," pointed out that since poverty plays a significant role in the formulation and evaluation of social policies, it is important that discrepancies between poverty estimates produced by different surveys be understood. To do so requires examining why income estimates differ across surveys.

Title: The 2000 Roger Herriot Award For Innovation In Federal Statistics

Abstract:

On November 15, 2000 the Washington Statistical Society will present the Roger Herriot Award to Don Dillman. Don is Professor of Rural Sociology at Washington State University and the first Roger Herriot Award recipient to have had a career outside the federal government. Don's innovations have had a major impact on federal statistics, notably Census 2000, as well as in other data collection settings. Don's contributions reflect two of the hallmarks of Roger Herriot's career: dedication to issues of measurement and improving the efficiency of data collection programs. Don's contributions to questionnaire design, to the way telephone surveys are conducted, to emerging Internet research surveys, and his mentoring of students and journeyman practitioners exemplify the accomplishments the Herriot Award represents.

Several distinguished speakers will discuss Don's various contributions to federal statistics. Invited speakers include: Cleo Redline, Census Bureau, Bob Tortora, The Gallup Organization, and Robbie Sangster, Bureau of Labor Statistics.

The Roger Herriot Award is sponsored by the Washington Statistical Society, the ASA's Social Statistics Section, and the ASA's Government Statistics Section. Roger Herriot was the Associate Commissioner for Statistical Standards and Methodology at the National Center for Education Statistics (NCES) before he died in 1994. Throughout his career at NCES and the Census Bureau, Roger developed unique approaches to the solution of statistical problems in federal data collection programs. Don Dillman truly exemplifies this tradition.

Please join the Washington Statistical Society on Wednesday, November 15, 2000 at 12:30 p.m. to honor Don as we present the award to him and celebrate at a reception following the presentation.

ASA Seminar Reprise:
Research on Government Survey Nonresponse
Part II

Title: The Last Five Percent: What Can We Learn From Difficult/Late Interviews?

Abstract:

A few studies have examined nonresponse and the impact it has on survey estimates (e.g., Tucker and Harris-Kojetin 1998; Harris-Kojetin and Robison 1998). Less research has focused specifically on the characteristics of late or "difficult" cases that comprise the last few percentage points of survey response rates -- particularly for personal surveys. To address this topic, we examine several characteristics of late/difficult cases from the Current Population Survey (CPS) and the National Crime Victimization Survey (NCVS). First, we explore whether the household and demographic characteristics of late cases differ from other interviews and to what degree they resemble nonrespondents. Second, we check to see if critical survey estimates would be different without the late/difficult cases. The paper concludes with a discussion of how these results can help survey managers and analysts better understand the relative contribution that late cases make to the two surveys examined.


Title: A Comparison of Bias Estimates in RDD and Address-Based Samples

Abstract:

Declining response rates and the resulting nonresponse bias in sample surveys are increasingly becoming a barrier to the utilization of Random Digit Dialing (RDD) samples and the pure reliance on the telephone as a mode of data collection. As a result, pressure is mounting in the federal government to rely exclusively on address-based sample designs and in-person visits as a solution. However, address-based sample designs are expensive to employ at the national level, and high response rates are becoming more difficult to obtain with this methodology as well. Given the current difficulties in the field, which methodology is best? To answer this question, researchers need to know how much bias is incurred by both methods of selecting the sample.

Using the results from the joint Nationwide Personal Transportation Survey/American Travel Survey pretest (NPTS/ATS), this paper will present a quantitative analysis of nonresponse bias in both address-based samples and list-assisted RDD samples. Through the inclusion of both address-based and RDD sample frames in the NPTS/ATS pretest, differences in demographics and questionnaire items between the responding RDD and address-based samples can be evaluated. These two samples can each be validated against the Census data taken from the samples' own geographic base, providing an accurate estimate of total bias (i.e., coverage and non-response bias). This research will enhance our understanding of alleged biases in RDD samples and provide an assessment of improvements, if any, in bias that the address-based sample design offers.

American Association for Public Opinion Research
Washington/Baltimore Chapter
and the
WSS Data Collection Methods Section

Topic: Advances in Telephone Sample Designs

Abstract:

List-assisted RDD designs became popular in the late 1980s and early 90s. Work done by BLS and the University of Michigan resulted in the development of the underlying theory for these designs as well as the evaluation of various alternative sampling plans to optimize the method. This work was documented in an article by Robert Casady and James Lepkowski in the June 1993 issue of Survey Methodology. Recent research by Jim Lepkowski, Clyde Tucker, and Linda Piekarski to re-evaluate these designs in light of the significant changes in the telephone system over the last decade will be presented.

Title: The Effects of Person-level vs. Household-level Questionnaire Design on Survey Estimates and Data Quality

Abstract:

Demographic household surveys frequently seek the same set of information from all adult household members. An issue for questionnaire designers is how best to collect data about each person without compromising data quality or unduly lengthening the survey. One possible design strategy is the person-level approach, in which all questions are asked about each eligible household member, person by person. An alternative approach uses household-level screening questions to identify first whether anyone in the household has the characteristic of interest, and then identifies specific individuals. Common wisdom holds that the person-level approach's more thorough, person-by-person enumeration yields higher quality data. This approach, however, is not without its drawbacks: especially in larger households, the person-level design can be quite time-consuming and tedious.

Household-level screening questions offer important efficiencies, since they often present a question only once per household, but may be suspect with regard to data quality. Little empirical research exists comparing these two design strategies with regard to their impact on response completeness or data quality, interview length, or other outcome measures of interest. This paper presents results from the Census Bureau's 1999 Questionnaire Design Experimental Research Survey (QDERS), which included a split-ballot test comparing person-level questions to household-level questions on selected demographic, disability, health insurance, and income questions. We find some evidence that the use of a household screener entails an increased risk of under-reporting relative to a person-level design, in particular for "summary" measures of functional limitations, the identification of persons covered by employer/union-based health plans, and asset ownership. We also find evidence, however, that the household-level approach produces more reliable data than the person-level approach for most topic areas (health insurance coverage is a notable exception). Item nonresponse is generally trivial in both treatments, except for asset ownership, where there is lower item nonresponse for the household-level approach. Behavior coding results showed no inherent superiority of one or the other design. We do find the expected increase in interview efficiency with the household-level design, and some evidence that interviewers preferred it. We conclude with a brief discussion of the implications of these findings, and suggestions for further research.

Topic: Measuring Sexual Orientation in Health Surveys: Lesbian Health Research Issues

Abstracts:

(Bradford) National interest in lesbian health has accelerated as a result of the 1999 Institute of Medicine report "Lesbian Health: Current Assessment and Future Directions." A methodology working group at the DHHS-sponsored March 2000 Scientific Workshop on Lesbian Health recommended a series of activities to develop effective measures and include them on national surveys. These recommendations and current efforts to expand inclusion of LGBT populations in Healthy People 2010 will be reviewed, and opportunities for participation will be presented and discussed.

(Fisher) A Needs Assessment Study of LBT Women: The Relationship Between Sexual Orientation/Gender Identity, Health-Seeking Behaviors, and Perceived Quality of Care Issues. This study reports the results of an extensive needs assessment survey conducted with approximately 700 lesbian, bisexual, and transgendered (LBT) women in the greater metropolitan Washington DC area under the auspices of the Lesbian Services Program (LSP) at Whitman-Walker Clinic.

(Brody and Miller) Sexual behavior and orientation questions in a national survey: challenges and opportunities. The inclusion of questions on sexual behavior and orientation is new for the National Health and Nutrition Examination Survey (NHANES). The current NHANES (2000+) contains questions on same-sex sexual behavior and sexual orientation for men and women 18-59 years of age. Questions are administered, in both English and Spanish, using the audio computer-assisted self-interview (ACASI) technique. In addition to presenting preliminary data from the current NHANES, findings from exploratory work conducted by the NCHS Questionnaire Design Laboratory will be discussed. The independent study was designed to help evaluate existing questions on this sensitive topic in relation to cognitive processes (i.e., the methods used to construct answers and the meanings attributed to key words) and will provide an opportunity to refine and develop additional questions for NHANES and other national surveys.

Title: Education Certifications and Other Non-Traditional Educational Outcomes: Perspectives of Data Users and Researchers



Abstract:

Credentials that are outside the traditional postsecondary educational attainment hierarchy (i.e., associate's degrees, bachelor's degrees, master's degrees, etc.) are an important and growing area of education. Yet, federal agencies do not know what such credentials (e.g., licenses, certificates awarded after the completion of a computer course) represent in terms of skill achievement and potential for increased earnings. In its first report, the Committee on Measures of Educational Attainment, which was chartered by the Interagency Council on Statistical Policy (ICSP), concluded that the awarding of such credentials is growing and that the federal agencies need to learn more about certification programs. The Committee's specific focus is credentials that require some coursework and that are related to labor force participation.

The Committee received funding from several ICSP agencies to begin research in the following areas: how to ask questions about credentials, specifically terminology and definitions; collecting data on which institutions offer credentials, what credentials are offered, how many hours of coursework are required, etc.; the characteristics of students who enroll in these programs; and the benefits, both anticipated and actual.

This seminar will inform the Committee's activities and broaden the discussion. It brings together five experts from outside the statistical agencies to aid the Committee in its research plan as well as to highlight some of the issues. The speakers, who represent a range of different perspectives, and their topics follow:

Copies of the Committee's first report and research plan will be available. A reception will follow the panel's presentation.

Topic: Classifying Open Ended Reports: Coding Occupation in the Current Population Survey

Abstract:

An overlooked source of survey measurement error is the misclassification of open-ended responses. This seminar reports on efforts to understand the misclassification of occupation descriptions in the Current Population Survey (CPS). Actual CPS descriptions were analyzed to identify which features vary with intercoder reliability. An experiment was conducted to test how these features interact with each other to affect reliability. Finally the presenters observed and interacted with coders at work to help explain the experimental results.

First, a set of occupation descriptions (n=32,362) reported by CPS respondents and entered by interviewers was analyzed; each description was classified by two independent coders. One factor that was strongly related to reliability was the length of the occupation description: contrary to our intuitions, longer descriptions were less reliably coded than shorter ones. This was followed by an experimental study of occupation descriptions (n=800) constructed to vary on key features (e.g., specific terms that led to low or high reliability in study one); these descriptions were again "double-coded." The effect of description length depended on the difficulty of primary occupation terms: difficult occupation terms led to a strong length effect; easy occupation terms led to virtually no length effect. Finally, coders classifying 50 experimental descriptions were observed and asked about their reasoning. This qualitative study produced a possible explanation for the lower reliability of longer descriptions: inconsistent application of informal coding rules.

Topic: Meeting the Sampling Needs of Business

Abstract:

Sampling for business uses often requires efforts to keep the sample small. There are situations where there are major cost implications associated with an increase in sample size. Generally, in these circumstances, the increased precision of ratio or regression estimation is useful. For small samples, the design-based ratio estimate under simple random sampling could be seriously biased unless the sample is balanced with respect to the covariate. Stratification on the covariate can achieve the effect of balancing, but the sample size needs to be reasonably large. We propose a deep stratification method at the design stage such that only one unit is drawn from each of the equal sized strata. We then use the regular design-based ratio estimate and its variance estimate. This method makes a remarkable contribution towards the bias reduction and also gives good variance estimates and coverage rates. Simulation results are presented.