
SAS Survey Procedures: PROC SURVEYLOGISTIC vs. PROC LOGISTIC Output

In a previous post, I talked about complex survey designs and why analysis of such survey data requires the use of SAS survey procedures. PROC SURVEYREG and PROC SURVEYLOGISTIC offer some of the same output and diagnostic options as their non-survey counterparts, PROC REG and PROC LOGISTIC. Default output includes fit statistics (R-square for the linear regression procedures; AIC and the Schwarz criterion for the logistic procedures), tests of the global null hypothesis, degrees of freedom, and, for each parameter, the coefficient estimate, its standard error, and its p-value. Like PROC LOGISTIC, PROC SURVEYLOGISTIC also reports odds ratio point estimates with 95% Wald confidence intervals for each predictor.
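Here is a minimal sketch of a PROC SURVEYLOGISTIC call that produces this default output. The data set MYLIB.H147, the binary outcome UNINSURED, and the predictors AGE and SEX are hypothetical names. VARSTR, VARPSU, and PERWT11F are assumed to be the MEPS 2011 stratum, PSU, and person-weight variables, so check them against your own codebook.

```sas
/* Sketch only: UNINSURED, AGE, SEX are hypothetical variable names; */
/* VARSTR, VARPSU, PERWT11F are assumed MEPS 2011 design variables.  */
PROC SURVEYLOGISTIC DATA=MYLIB.H147;
   STRATA VARSTR;                 /* sampling strata                   */
   CLUSTER VARPSU;                /* primary sampling units (clusters) */
   WEIGHT PERWT11F;               /* person-level analysis weight      */
   CLASS SEX / PARAM=REF;         /* reference-cell coding for SEX     */
   MODEL UNINSURED(EVENT='1') = AGE SEX;  /* model P(UNINSURED = 1)   */
RUN;
```

The output tables (model fit statistics, global null hypothesis tests, parameter estimates, odds ratio estimates) have the same layout as PROC LOGISTIC's. The difference is that the standard errors reflect the STRATA and CLUSTER statements.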

The survey procedures are more limited in some ways, though. For example, PROC LOGISTIC supports model selection methods such as stepwise selection, which keep only predictors whose p-values fall below specified entry and stay thresholds (the SLENTRY= and SLSTAY= options); there is also an option to rank those predictors. These options do not work with PROC SURVEYLOGISTIC, which makes the output more unwieldy when there are many predictors. The most notable difference is that PROC LOGISTIC can output a residual chi-squared test, along with chi-squared tests for individual input variables. Analysis of residuals is not meaningful for the survey procedures, however, because the assumptions of normality and equal variance do not apply under a complex survey design. Tabled residuals are not output at all by the survey procedures, although covariance matrices are available from both as a non-default option. Similarly, influential observations and outliers are not analyzed because of the use of person weights. As long as the same person weights are used (for example, through a WEIGHT statement in PROC REG), a regular PROC REG produces the same coefficients as PROC SURVEYREG. The standard error estimates differ, however, and so may the significance of individual predictors.
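A minimal sketch of that PROC REG versus PROC SURVEYREG comparison follows, under the same assumed names (MYLIB.H147, VARSTR, VARPSU, PERWT11F). It adds a hypothetical outcome TOTEXP11 and hypothetical predictors AGE and FEMALE (a 0/1 indicator you would create yourself). If the point about weights holds, the two parameter estimate tables should agree while the standard errors differ.

```sas
/* Weighted least squares that ignores the survey design:     */
/* same coefficients as PROC SURVEYREG, model-based SEs only.  */
PROC REG DATA=MYLIB.H147;
   WEIGHT PERWT11F;                 /* assumed MEPS person weight      */
   MODEL TOTEXP11 = AGE FEMALE;     /* hypothetical model              */
RUN;
QUIT;

/* Design-based regression: same weights plus strata and       */
/* clusters, so the standard errors account for the design.    */
PROC SURVEYREG DATA=MYLIB.H147;
   STRATA VARSTR;                   /* assumed MEPS stratum variable   */
   CLUSTER VARPSU;                  /* assumed MEPS PSU variable       */
   WEIGHT PERWT11F;
   MODEL TOTEXP11 = AGE FEMALE;
RUN;
```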

Survey Design: Stratification & Clustering

In a previous post, I talked about importing Medical Expenditure Panel Survey (MEPS) data into SAS. The MEPS survey design is complex, with person weights, stratification, and multi-stage clustering; it is not a simple random sample of the population. Stratification divides the population into groups (strata), typically defined by demographic variables such as age, race, sex, or income, and then samples within each stratum. The goal is to maximize homogeneity within strata and heterogeneity between strata. Stratification is sometimes used to oversample groups that are under-represented in the general population or that have characteristics relevant to the study (for example, blacks, Hispanics, and low-income households).
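The textbook formulas for stratified simple random sampling show why homogeneity within strata matters. This is a simplification of the MEPS design, which also samples clusters within strata. Here N_h and n_h are the population and sample sizes in stratum h, W_h is the stratum's population share, f_h is its sampling fraction, and S_h^2 is the variance within the stratum:

```latex
\begin{aligned}
\bar{y}_{st} &= \sum_{h=1}^{H} W_h \,\bar{y}_h, & W_h &= \frac{N_h}{N},\\
\operatorname{Var}(\bar{y}_{st}) &= \sum_{h=1}^{H} W_h^{2}\,(1 - f_h)\,\frac{S_h^{2}}{n_h}, & f_h &= \frac{n_h}{N_h}.
\end{aligned}
```

Only the within-stratum variances S_h^2 enter the variance of the estimate. Strata that are internally homogeneous therefore keep the variance small, while differences between strata do not add to it.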

Clustering is typically done by geography to reduce survey costs, since it is not feasible or cost-effective to draw a simple random sample of, for example, the entire U.S. population. Within-cluster correlation inflates the variance of estimates relative to a simple random sample of the same size. Two families in the same neighborhood are more likely to be demographically similar (in income, for instance), so an analysis that ignores clustering underestimates variance. We therefore want clusters to be spatially close for cost effectiveness but as heterogeneous internally as possible to keep variance reasonable. Sometimes a multi-stage clustering approach is used, as in MEPS. For example, a sample of counties is drawn, then a sample of blocks within those counties, and finally individuals or households within the sampled blocks. Information about how the survey was designed is stored in survey design variables included in the dataset. These variables are used to obtain population means and other estimates, and they can also be used in regression analysis with procedures such as PROC SURVEYREG and PROC SURVEYLOGISTIC.
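Kish's approximate design effect quantifies the variance inflation caused by within-cluster correlation. Here m-bar is the average number of sampled units per cluster and rho is the intraclass correlation. The numbers in the second line are purely illustrative, not MEPS values:

```latex
\begin{aligned}
\mathrm{deff} = \frac{\operatorname{Var}_{\text{cluster}}(\bar{y})}{\operatorname{Var}_{\text{SRS}}(\bar{y})} &\approx 1 + (\bar{m} - 1)\,\rho,\\
\bar{m} = 20,\ \rho = 0.05 \;\Rightarrow\; \mathrm{deff} &\approx 1 + 19(0.05) = 1.95,
\qquad \sqrt{1.95} \approx 1.40.
\end{aligned}
```

In this example, a standard error computed as if the data were a simple random sample would be roughly 40% too small.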

If person weights are ignored and sample findings are generalized to the entire population, totals, percentages, and means are overstated for the oversampled groups and understated for the others. It is therefore highly undesirable to estimate population frequencies or means without using person weights or SAS procedures such as PROC SURVEYMEANS and PROC SURVEYFREQ. In regression analysis, ignoring person weights can bias coefficient estimates. If the sampling strata and cluster variables are ignored, means and coefficient estimates are unaffected, but standard errors (the variance of the estimates) may be underestimated. In other words, the precision of an estimate may be overstated. For example, when comparing two estimated population means, the difference may appear statistically significant when it is not.
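The three steps below sketch these cases side by side. They assume the same hypothetical names as above: MYLIB.H147, an outcome TOTEXP11, and the assumed MEPS design variables VARSTR, VARPSU, and PERWT11F. The second and third steps use the same weights, so they should report the same mean; only the standard error should change.

```sas
/* 1. Unweighted: every sampled person counts equally, */
/*    as if the data were a simple random sample.      */
PROC MEANS DATA=MYLIB.H147 MEAN STDERR;
   VAR TOTEXP11;
RUN;

/* 2. Weighted, strata and clusters ignored:           */
/*    population-weighted mean, but a naive SE.        */
PROC SURVEYMEANS DATA=MYLIB.H147 MEAN STDERR;
   WEIGHT PERWT11F;
   VAR TOTEXP11;
RUN;

/* 3. Full design: same weighted mean as step 2,       */
/*    with a design-based SE.                          */
PROC SURVEYMEANS DATA=MYLIB.H147 MEAN STDERR;
   STRATA VARSTR;
   CLUSTER VARPSU;
   WEIGHT PERWT11F;
   VAR TOTEXP11;
RUN;
```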

Importing Medical Expenditure Panel Survey Data Into SAS

I did an in-house SAS user group presentation last week on using SAS survey procedures to analyze Medical Expenditure Panel Survey (MEPS) data with regard to insurance coverage in the context of healthcare reform (ACA and the New Individual Segment: Profiling the Uninsured and Non-Group Insured Populations with MEPS 2010 and SAS Survey Procedures). The MEPS 2011 consolidated data file is available as of September 2013 for download. It contains detailed information (over 1900 variables) on demographics, household income, employment, diagnosed health conditions, additional health status issues, medical expenditures and utilization, satisfaction with and access to care, and insurance coverage of those surveyed.

There are several government data sets made available to the public each year that are designed for easy analysis with SAS and other statistical software (including Stata and SPSS). I attended a BASUG training by Paul Gorrell back in 2012 that introduced me to some of these data sets. The MEPS website includes programming statements to help you import the data into SAS. If you have a SAS/STAT license with Base SAS version 9 or later, you have access to four SAS survey procedures (PROC SURVEYFREQ, PROC SURVEYMEANS, PROC SURVEYREG, and PROC SURVEYLOGISTIC) that you can use to analyze data from complex survey designs such as MEPS.

You can get started in just a few easy steps. There are a couple of ways to do this, but this is the method I used:
1. Download and run the h147ssp.exe file to extract the data to your chosen folder.
2. After you run the file above, you should find the SAS transport file (h147.ssp) in that folder. Now tell SAS where to find it with a FILENAME statement.
3. Assign the LIBNAME where you want your SAS data set to be created.
4. Import the data using PROC XCOPY.

Here’s an example:
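/* Library where the SAS data set will be created */
LIBNAME MYLIB 'C:\Users\C31497\Desktop';
/* Location of the extracted SAS transport file */
FILENAME H147 'C:\Users\C31497\Desktop\h147.ssp';
/* Convert the transport file into a SAS data set in MYLIB */
PROC XCOPY IN=H147 OUT=MYLIB IMPORT;
RUN;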
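PROC XCOPY is what the MEPS programming statements use. If you prefer a route described in the Base SAS documentation, the XPORT engine with PROC COPY should give the same result. This sketch assumes that h147.ssp is a version 5 transport file of the kind the XPORT engine reads, and it reuses the paths from the example above.

```sas
/* Target library for the converted SAS data set */
LIBNAME MYLIB 'C:\Users\C31497\Desktop';
/* Read the transport file through the XPORT engine (assumed v5 format) */
LIBNAME XPTIN XPORT 'C:\Users\C31497\Desktop\h147.ssp';
/* Copy every member of the transport file into MYLIB */
PROC COPY IN=XPTIN OUT=MYLIB;
RUN;
```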

That’s it! Next you can run PROC CONTENTS to get a full variable listing, or you can view the online codebook on the MEPS site. You can find out more about SAS survey procedures in my NESUG 2013 paper, Proc SurveyCorr.
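For example, the steps below list the contents of the imported data. The member name comes from the transport file, so the first step confirms it. The second step assumes the data set is named H147 in the MYLIB library.

```sas
/* List the members of MYLIB to confirm the imported data set name */
PROC CONTENTS DATA=MYLIB._ALL_ NODS;
RUN;

/* List the variables of the (assumed) H147 data set in creation order */
PROC CONTENTS DATA=MYLIB.H147 VARNUM;
RUN;
```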