Model-Based Statistical Sampling

A Statistical Methodology for Program Impact Evaluation and Other Studies

Copyright Notice

Copyright © 2012 2013 2014 2015 2016 2017 2018   Roger L. Wright, Ph.D.   roger.l.wright@gmail.com   All rights reserved

Introduction

Model-based statistical sampling (MBSS) is a statistical methodology for developing information about a large population by collecting the desired information in a smaller statistical sample. MBSS guides:

MBSS is especially effective when you have supporting information for all elements in the population, and a positively skewed distribution of the size of the elements in the population. By taking full advantage of the available supporting information, MBSS can provide information that is reliable, cost-effective, and timely. MBSS methodology is consistent with generally accepted statistical methods of sampling and yields studies that are statistically defensible.

The MBSS strategy is to assume a model for the relationship between the key variables of interest and the supporting information. The model is used to help choose the sample size and to construct the sample design. The statistical analysis is assisted by the model but is primarily based on the sample design. Hence the MBSS methodology is most accurately characterized as model-based sample design combined with design-based data analysis. Elsewhere this strategy has been called model-assisted survey sampling.

If you have a strong background in mathematical statistics and survey sampling and want a concise overview of the MBSS methodology, click here.

The model-assisted approach is important because it contributes to the defensibility, transparency, and objectivity of a study. Using information from the sample design, standard statistical methods of finite population sampling are used to develop the findings. If the sample design has been accurately followed, the findings are very resistant to challenge on statistical grounds.

Conversely, the findings are vulnerable to selection bias if the sample design has not been followed. Often the most crucial weakness is non-response. If the non-response rate is high and the non-response might be statistically correlated with the phenomena of interest in the study, then non-response is likely to have a deleterious effect on the study.

A second vulnerability is measurement error. If the measurement error is random and unbiased, then it is generally reflected in the model and the findings. To the extent that data collection is systematically inaccurate or biased, the findings will generally be biased and the confidence intervals will be misleading.

The upshot is that the validity of the study depends on close adherence to the sample design and unbiased data collection.

Program Assessment and Impact Evaluation

MBSS has been especially useful for program assessment and impact evaluation. An assessment and impact evaluation study measures the quantitative impact of a program and generates recommendations for its improvement.

For example, suppose a government agency is responsible for the construction and maintenance of highway bridges in a particular jurisdiction. If the agency completes more than a handful of projects annually, an independent program assessment study may be undertaken periodically. By providing an outside technical audit, the study can provide an objective, quantitative assessment of the quality of the program, increase confidence in the program's cost effectiveness, and stimulate continuous program improvement.

In a typical program impact evaluation, the population consists of a large number of projects undertaken by the program during a specific year. Often, there are many smaller projects and a smaller number of larger projects, so that the distribution of size is positively skewed.

The program manager maintains a program-tracking database that provides administrative and technical information about each project. The tracking database includes one or more fields of data about the estimated impact of each project. By tabulating these fields the program manager can estimate the success of the program in reaching its objectives.

An impact evaluation study is usually conducted by a third-party organization to provide an independent assessment of the program's quantitative impact and to provide information for program improvement. The evaluator selects a suitable statistical sample of the projects and carries out an independent assessment of the impact of each sample project using a suitable measurement methodology. The results are considered the true impact of each sample project.

The evaluator compares the true impact of each sample project to the estimated impact recorded in the program tracking database. An examination of these discrepancies usually leads to specific recommendations for program improvement.

The Realization Rate

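The key quantitative parameter of interest in the evaluation study is called the realization rate of a particular measure of impact. The realization rate is the ratio Y/X, where Y is the total true impact of all projects in the population and X is the total estimated impact tabulated from the tracking database.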

For example, if the realization rate were 0.80, then the true impact would be 80% of the total estimated impact tabulated from the tracking database. The true impact of the program is assessed by using the sample data to statistically estimate the true realization rate.
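The arithmetic behind a sample-based realization rate can be sketched in a few lines of Python. This is a minimal illustration, not the site's own implementation: it assumes the usual weighted ratio estimator, in which each sample project carries a case weight equal to the number of population projects it represents. The function name, the data, and the tracking total are all hypothetical.

```python
def realization_rate(sample, weights):
    """Weighted ratio estimate of the realization rate Y/X.

    sample  -- list of (tracking_estimate, true_impact) pairs, one per sample project
    weights -- case weights: the number of population projects each sample project
               is assumed to represent
    """
    total_true = sum(w * y for (x, y), w in zip(sample, weights))
    total_tracking = sum(w * x for (x, y), w in zip(sample, weights))
    return total_true / total_tracking


# Hypothetical data: (tracking estimate, measured true impact) for four sample projects.
sample = [(100.0, 85.0), (250.0, 190.0), (40.0, 36.0), (900.0, 710.0)]
weights = [30.0, 12.0, 50.0, 2.0]

rate = realization_rate(sample, weights)

# Applying the estimated rate to the known tracking total gives the estimated true impact.
population_tracking_total = 10_000.0  # hypothetical total X from the tracking database
estimated_true_total = rate * population_tracking_total
print(round(rate, 3), round(estimated_true_total, 1))
```

Because the tracking total is known for the whole population, only the ratio has to be estimated from the sample.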

The statistical precision of the estimated realization rate depends on the strength of the association between the true impact and the tracking estimate of impact. If the true impact is highly correlated with the tracking estimate of impact, then the realization rate can be estimated reliably from a relatively small sample. Conversely, if the correlation is weaker, then a larger sample is required.

The Error Ratio

A key issue in designing an evaluation study is the choice of the sample size, i.e., the number of projects to be evaluated. The sample must be large enough to provide a reliable estimate of the realization rate. Of course, the sample size and the measurement methodology are the most important determinants of the resources and time required for the evaluation study, so the sample should not be needlessly large.

Choosing the appropriate sample size is a major focus of the MBSS methodology. The appropriate choice depends on a second population parameter, called the error ratio. The error ratio is the most relevant measure of the strength of association between the true impact and the supporting information, especially the tracking estimate of impact. The error ratio can be estimated from the characteristics of the program and prior evaluation studies. The estimated error ratio is an important product of the data analysis of each study and is used to plan subsequent studies.
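To make the link between the error ratio and the sample size concrete, here is a hedged Python sketch. This page does not state a formula, so the sketch assumes a widely used planning approximation for efficiently stratified ratio estimation: n0 = (z * er / rp)^2, followed by a finite population correction. The function name, parameter names, and the example values are illustrative assumptions.

```python
import math


def planned_sample_size(error_ratio, relative_precision, population_size, z=1.645):
    """Approximate sample size for a target relative precision (assumed formula).

    error_ratio        -- anticipated error ratio, e.g. taken from a prior study
    relative_precision -- target half-width as a fraction of the estimate, e.g. 0.10
    population_size    -- number of projects in the population
    z                  -- normal quantile; 1.645 corresponds to 90% two-sided confidence
    """
    # Sample size that would be needed if the population were very large.
    n0 = (z * error_ratio / relative_precision) ** 2
    # Finite population correction shrinks the requirement for small populations.
    n = n0 / (1.0 + n0 / population_size)
    return math.ceil(n)


# Hypothetical planning scenarios for a population of 1,000 projects,
# targeting +/-10% relative precision at 90% confidence.
n_good_tracking = planned_sample_size(0.5, 0.10, 1000)
n_poor_tracking = planned_sample_size(1.0, 0.10, 1000)
print(n_good_tracking, n_poor_tracking)
```

Under this approximation, doubling the error ratio roughly quadruples the required sample before the finite population correction, which is why accurate tracking data can make an evaluation much cheaper.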

The error ratio is also an important measure of program effectiveness. Along with the realization rate, the error ratio measures the accuracy of the tracking estimates of impact. While the realization rate measures the overall accuracy of the total impact recorded in the tracking system, the error ratio measures the accuracy of the tracking estimates of impact for the individual projects in the population. As such, the error ratio measures the quality of program delivery. A program that provides relatively accurate estimates of impact within the tracking system tends to have projects that are well chosen and carefully implemented. Such a program can also be evaluated reliably at a substantially lower cost than one with poor tracking data.

The Sample Design

A second focus of MBSS methodology is the development of a suitable sample design. The sample design guides the selection of the sample projects. A sample selected by following the sample design will provide statistical estimates of the population characteristics -- totals, realization rates, and error ratios for the entire population and domains of interest -- with little or no sampling bias and measurable statistical precision. An effectively stratified sample design will provide near-optimal statistical precision by taking full advantage of the supporting information. In particular, when the population is positively skewed, an MBSS sample design will provide an appropriate allocation of the sample among size categories, e.g., very small, small, typical, large, and very large.
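One way to build a size-stratified design of this kind is sketched below in Python. The page does not spell out its construction rule, so the sketch makes assumptions: projects are sorted by the tracking estimate of size, stratum boundaries are placed so that each stratum holds roughly an equal share of the cumulative size**gamma, and the sample is divided equally among strata. The value gamma = 0.8, the function names, and the population are all hypothetical.

```python
import random


def build_strata(sizes, n_strata, gamma=0.8):
    """Group project indices into size strata (assumed rule, positive sizes only).

    Each stratum receives roughly an equal share of the total of size**gamma,
    so strata of large projects contain far fewer projects than strata of small ones.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    scores = [sizes[i] ** gamma for i in order]
    total = sum(scores)
    strata = [[] for _ in range(n_strata)]
    running = 0.0
    for i, score in zip(order, scores):
        # Place the project by the midpoint of its slice of the cumulative score.
        h = min(int((running + score / 2.0) / total * n_strata), n_strata - 1)
        strata[h].append(i)
        running += score
    return strata


def draw_sample(strata, n_per_stratum, seed=0):
    """Simple random sample within each stratum; returns (index, case_weight) pairs."""
    rng = random.Random(seed)
    selected = []
    for members in strata:
        if not members:
            continue  # a stratum can be empty if one project dominates the population
        k = min(n_per_stratum, len(members))
        weight = len(members) / k  # population projects represented by each pick
        selected.extend((i, weight) for i in rng.sample(members, k))
    return selected


# Hypothetical positively skewed population: many small projects, a few very large ones.
sizes = [10.0] * 60 + [50.0] * 25 + [200.0] * 10 + [1000.0] * 5
strata = build_strata(sizes, n_strata=4)
sample = draw_sample(strata, n_per_stratum=3)
print([len(s) for s in strata], len(sample))
```

With equal allocation, the few large projects are sampled at a much higher rate than the many small ones, which is the intended effect when size is positively skewed.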

Data Analysis

A third focus of MBSS methodology is the analysis of the sample data. The principal findings are developed using standard methods of ratio estimation with stratified sampling. The MBSS analysis methodology also supports ad hoc exploratory analysis of the data -- the estimation of population characteristics for any domains that can be identified from the population or sample data. Finally, MBSS provides a method for estimating error ratios and other parameters relevant to future sample designs for the entire population and domains of interest.
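The following Python sketch shows what ratio estimation with stratified sampling can look like when it is written in terms of case weights. It is an illustration under stated assumptions, not the site's code: the standard error uses a common approximation, sqrt(sum of w(w-1)e^2) divided by the weighted tracking total, where e is the residual of true impact from the fitted ratio line. The function name and the data are hypothetical.

```python
import math


def ratio_estimate(x, y, w, z=1.645):
    """Ratio estimate with an approximate standard error and confidence interval.

    x -- tracking estimates for the sample projects
    y -- measured true impacts for the same projects
    w -- case weights (stratum population count / stratum sample count)
    z -- normal quantile; 1.645 corresponds to 90% two-sided confidence
    """
    weighted_x = sum(wi * xi for wi, xi in zip(w, x))
    weighted_y = sum(wi * yi for wi, yi in zip(w, y))
    b = weighted_y / weighted_x

    # Residual of each sample project from the fitted ratio line y = b * x.
    residuals = [yi - b * xi for xi, yi in zip(x, y)]

    # Assumed approximation; w * (w - 1) is zero for a stratum that is fully sampled.
    variance_term = sum(wi * (wi - 1.0) * e * e for wi, e in zip(w, residuals))
    se = math.sqrt(variance_term) / weighted_x
    return b, se, (b - z * se, b + z * se)


# Hypothetical sample of four projects.
x = [100.0, 250.0, 40.0, 900.0]
y = [85.0, 190.0, 36.0, 710.0]
w = [30.0, 12.0, 50.0, 2.0]

b, se, interval = ratio_estimate(x, y, w)
print(round(b, 3), round(se, 4), tuple(round(v, 3) for v in interval))
```

The same function can be applied to any domain by restricting the inputs to the sample projects that fall in that domain.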

Summary

Program impact evaluation makes three important contributions to effective program delivery:

The MBSS methodology provides:

Other Applications of MBSS

MBSS was developed by the electric utility industry for load research and has been extensively used for energy-efficiency research and energy-conservation program evaluation. MBSS has also been used in financial auditing and market research. With the Paris climate accord for addressing global warming, MBSS should be considered for measuring the impact of carbon-reduction programs.

The notes to this page give examples of MBSS applications and information to help you decide whether MBSS will be useful for your study. To see these notes, click the link at the top or bottom of this page.

The Fifteen Steps

This site will give you a guided, interactive tour of the concepts and methods of MBSS. The site will explain and demonstrate the main ideas of MBSS and give you an easy way to experiment with the methodology. You will use an illustrative example included in the site to get a feel for MBSS and then, if you like, you can use the same procedures with your own data.

This site is made up of fifteen web pages called analysis steps. You will move through the steps in sequence.

The first seven analysis steps will deal with known population data. You will enter the data of the illustrative example or your own data and then use various statistical tools to understand the data and some key concepts of the MBSS methodology.

The next three analysis steps will help you select a sample from the population. You will learn how to choose the sample size, how to develop an efficiently stratified sample design and how to draw the sample.

Four more analysis steps will guide you through the analysis of your sample data. You will enter your sample data, summarize the information in tables, estimate means, totals and ratios for quantities of interest, assess the statistical precision of your results, and develop the information needed to plan future studies -- thereby closing the circle.

The final step is a tool for testing and demonstrating the effectiveness of the methodology. Use this step to address any doubts you have about the performance and validity of MBSS.

The first fourteen steps will give you a statistical methodology that is applicable to many fields, taking you from project planning and sample design through data analysis. Beyond learning these methods, your unique role will be to:

The Notes

Each of the fifteen analysis steps has a corresponding set of notes. These notes are very important. They provide several types of information:

The notes to this page say more about the purpose of MBSS and when it is likely to be especially helpful. These notes also describe several examples where MBSS has been used. This first set of notes is a good place to start learning about MBSS.

How to Use this Site

Use the links at the top and bottom of each page to navigate through the steps and notes. The first time through, you should complete each step before going on to the next one.

Many of your inputs will be used in later steps. If you want to change an input that was set in a prior step, just go back to that step.

You do not have to go through all of the steps in one work session. The system will store your inputs in local storage on your computer. When you are ready to continue your work, simply open the MBSS home page. You will find a link to continue where you left off.

If you do want to start over from scratch, simply open the home page, go to the first step, and click Submit. The system will erase all of your prior inputs and allow you to re-load the sample data or your own data, so that you can start afresh.

Alternatively, you can review your prior work without re-entering your data and inputs. Just open the home page, go to the first step, and click on Next Step immediately so that the data entry step is skipped. Then move from one step to the next, repeating or changing your work whenever you wish. Unless you change something, the system will remember and display your prior inputs.

To see the notes for the current analysis step, simply click on the link. If you need to review the notes for a prior step, simply open the current notes page and then follow the link back to the prior note.

You may find it convenient to have an analysis step open in one window and the corresponding notes open in a second window. If you click Notes from the Home Page, the notes will open in a second window. Then, in the first window you can click on First Step or Continue.

If you want to review a prior analysis step, you can step back to it. But if you click the Submit button on an earlier page, the system will treat that page as the last step you have completed, so you will need to click through each of the following steps again.

Suggestions:

Limitations

The present version is still under testing and refinement. Please let us know about any corrections or suggestions. Please contact us by email at roger.l.wright@gmail.com.

If you are familiar with MBSS from the DNV-GL Load Research System, you will see that this site implements the main features of MBSS analysis used in static applications such as evaluation studies.

Please remember that the primary goal of this web site is education. The site is not intended for serious applications. This system is entirely interactive. It provides no audit trail documenting the options selected or the steps undertaken and no batch processing. Moreover it is not intended for large applications involving large population files or sample data with a large number of variables. It is certainly not intended for load research applications.

Despite these limitations, we hope that you find this site to be of some help in understanding MBSS concepts and methods and using them effectively in your own work.

Acknowledgements

The MBSS methodology presented here relies heavily on Model Assisted Survey Sampling by Carl-Erik Särndal, Bengt Swensson, and Jan Wretman (Springer Series in Statistics, 2003).

MBSS methods for program evaluation were described in The California Evaluation Framework, prepared for Southern California Edison Company and the California Public Utilities Commission by TecMarket Works, June 2004, Chapters 12-13.

MBSS methodology for utility load research has been taught for many years in the Advanced Methodology Course sponsored by the Load Research Committee of the Association of Edison Illuminating Companies.

Invaluable technical assistance with JavaScript programming was generously provided by Mr. Kai Stinchcombe.

The Author

Roger Wright has a Ph.D. in mathematical statistics from The University of Michigan (1968). Dr. Wright was Professor of Statistics at the Ross School of Business, University of Michigan from 1968 through 1988. From 1988 through 2009, Wright was President and CEO of RLW Analytics, Inc., a firm providing independent program evaluation research and services primarily to the electric utility industry. Dr. Wright is the author of numerous scholarly articles and professional reports and received lifetime achievement awards from the Association of Edison Illuminating Companies and the International Program Evaluation Conference.