Estimation of a Proportion with Survey Data


Authors: Duchesne, P.
Volume: 11(3)
Pages: Online
Year: 2003
Publisher: Journal of Statistics Education
URL: http://www.amstat.org/publications/jse/v11n3/duchesne.pdf
Abstract: 

Estimating proportions is an unavoidable topic in a first survey sampling course. Estimating the proportion of voters in favour of a political party from a political opinion survey is just one concrete example. Another important issue in survey sampling is the proper use of auxiliary information, which typically comes from external sources such as administrative records or past surveys. Very often, using the available auxiliary information efficiently through a regression estimator improves the precision of estimates of a mean or a total. Conceptually, however, it is difficult to justify using a regression estimator for estimating proportions. A student might want to know how the estimation of proportions can be improved when auxiliary information is available. In this article, I present estimators of a proportion which use the logistic regression estimator. Being based on logistic models, this estimator supports good modelling of survey data. The paper's second objective is to estimate a proportion under various sampling plans (such as Bernoulli sampling and stratified designs). In survey sampling, each possible sample has its own selection probability, and the inclusion probability of a given unit is the probability that the selected sample contains that unit. Bernoulli sampling may have important pedagogical value, because students often have trouble with the concept of the inclusion probability. Stratified sampling plans may provide more insight and more precision. Empirical results from applying four sampling plans to a real database show that estimators of proportions can be made more efficient by the proper use of auxiliary information, and that choosing a more satisfactory model may give additional precision. The paper also contains computer code written in S-Plus and a number of exercises.
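The idea of an inclusion probability under Bernoulli sampling can be illustrated with a short sketch. The code below is not the paper's S-Plus code; it is a minimal Python illustration under stated assumptions: a hypothetical 0/1 population of size N = 10,000 with a true proportion of 0.30, a common inclusion probability pi = 0.1, and the illustrative function names `bernoulli_sample`, `ht_proportion` and `hajek_proportion`. It does not use auxiliary information or the logistic regression estimator discussed in the abstract.

```python
import random


def bernoulli_sample(population, pi, rng):
    """Bernoulli sampling: each unit enters the sample independently
    with the same inclusion probability pi, so the sample size is random."""
    return [y for y in population if rng.random() < pi]


def ht_proportion(sample, pi, N):
    """Horvitz-Thompson estimate of a proportion: each sampled value is
    weighted by 1/pi, and the weighted total is divided by the known N."""
    return sum(y / pi for y in sample) / N


def hajek_proportion(sample):
    """Ratio-type (Hajek) estimate; with equal inclusion probabilities
    it reduces to the ordinary sample proportion."""
    return sum(sample) / len(sample) if sample else float("nan")


# Hypothetical population (an assumption for illustration only):
# 3,000 units with y = 1 and 7,000 with y = 0, so the true proportion is 0.30.
N = 10_000
population = [1] * 3_000 + [0] * 7_000
pi = 0.1  # assumed common inclusion probability

rng = random.Random(2003)  # fixed seed so the run is reproducible
sample = bernoulli_sample(population, pi, rng)

p_ht = ht_proportion(sample, pi, N)
p_hajek = hajek_proportion(sample)
print(len(sample), round(p_ht, 4), round(p_hajek, 4))
```

Because the sample size is random under Bernoulli sampling, the two estimates generally differ: the Horvitz-Thompson version divides by the known population size N, while the ratio-type version divides by the realized sample size.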

