Category Archives: SAS Tips

ORA-00972 Error with SQL Pass-Through in SAS

If you are using explicit pass-through syntax in SAS, you can run into problems when naming conventions in the native database are different from those in SAS. For example, this can happen when SAS allows more characters for an identifier than Oracle does. I ran the following statements in SAS to pull some pharmacy claims and got an error in my log:

proc sql;
connect to oracle (path=”pathname” user=xxxxxxx password=xxxxxxxx);
create table sasuser.thrx_rx_claim_2012 as
select * from connection to oracle
    (
    select a.indiv_enterprise_id
    ,b.clm_pd_dt
    ,b.svc_beg_dt
    ,b.svc_end_dt
    ,b.elgbl_chrg_amt
    ,b.mail_retl_cd
    ,b.new_prscrptn_ind
      from c31497.thrx_id_elig_rsk_cens_dedup_2012 a
      left join ccdm_mart.clm_svc b on a.indiv_enterprise_id=b.indiv_enterprise_id
        where svc_beg_dt >= to_date(’01/01/2012′,’mm/dd/yyyy’)
        and svc_beg_dt <= to_date(’12/31/2013′,’mm/dd/yyyy’)
        and CVRD_AMT<>NOT_COVRD_AMT
        and clm_frmt_ty=’RX’
    )
;
quit;

Here is the error I received in the log:
ORA-00972: identifier is too long

An identifier can refer to a table name or a column (variable) name. In this case the table name thrx_id_elig_rsk_cens_dedup_2012 in the FROM statement is exactly 32 characters, which is the maximum allowable in SAS. However, Oracle gives you a maximum of 30 characters, so the query will not run.

Case-Sensitive Schema in a SAS Libname Statement

This is a minor annoying issue that had been bothering me for a few years (!) and was finally resolved with a ticket to SAS Tech Support. I have several libname statements that I have been using to connect to DB2 databases. In the past, the libname statements all ran successfully, and I was able to query the tables in the databases. However, when I tried to view the tables under my assigned libraries in the SAS Explorer window by clicking on the libname, the libraries appeared empty. This was irritating since I either had to use another tool if I wanted to browse the tables and look at the actual data, or rely on Proc Contents.

The answer to the problem turns out to be that some schemas are case-sensitive. SAS Tech Support suggested trying the following libname statements:

libname CRDM1 db2 dsn=crdm schema=appdm uid=xxxxx pwd=xxxxxx sql_functions=all dbindex=yes;

libname CRDM2 db2 dsn=crdm schema=”appdm” uid=xxxxx pwd=xxxxxx sql_functions=all dbindex=yes;

libname CRDM3 db2 dsn=crdm schema=APPDM uid=xxxxx pwd=xxxxxx sql_functions=all dbindex=yes;

libname CRDM4 db2 dsn=crdm schema=”APPDM” uid=xxxxx pwd=xxxxxx sql_functions=all dbindex=yes;

The first and second libname statements didn’t display the tables, but the last two with the schema in all CAPS both worked.

Checking the Match in a Matched Case-Control Study

Sometimes we wish to conduct a study in which we take a population of interest (the treatment group) and match each case to a similar individual sampled from the population which is not undergoing the treatment (the control group). The goal is to find out whether the outcome we wish to measure after treatment is significantly different for each population. This is known as an individually matched case-control study. This post will focus on checking that the matched case population is similar to the control population.

For example, say that we want to find out if providing an incentive to individuals in the treatment group will influence their behavior. We might match on the following variables: medical condition, gender, geographic area, age, risk score, number of office visits in the past 12 months, and total medical cost (TMC) in the past 12 months. Some of these we would want an exact match for (the first 3 variables in this example), whereas the last 4 variables we would match within a given range.

After we run the match, we want to check that the characteristics for the treatment group are similar to those of the control group — that the population means are not significantly different for the continuous variables such as TMC. In order to check this, we first want to test for normality. Whether or not the variable is normally distributed will determine which kind of test we run to see if there is a significant difference between the two groups.

To test normality for the variable TMC, we can use PROC UNIVARIATE as follows:

proc univariate data=work.testdata normal plot;
var TMC;
qqplot TMC /normal(mu=est sigma=est color=red L=1);
run;

This will give you a boxplot and a q-q plot, and the NORMAL option will also give you some normality tests, including Shapiro-Wilk (good for sample sizes < 2000). For normality, you want the mean and the median to be close together on the box plot, a reasonably symmetric distribution (skew close to zero), a relatively straight line for the q-q plot, and the p-values on the normality test results should be > 0.05. If the Shapiro-Wilk W value is close to 1, this also indicates that the data is normal. If p-values are less than 0.05 (alpha), that means you reject the null hypothesis that the distribution is normal and proceed with non-parametric testing.

Assuming your distribution is normal, you would run a paired t-test to see if TMC is similar for both populations (in this example, “study_group” is a flag variable that indicates whether the observation is in the treatment/study group or the control group):

proc ttest data=work.testdata;
paired TMC*study_group;
run;

The null hypothesis is that there is no statistically significant difference between the two groups, so you would want to see a large p-value here (>0.05). This would indicate that our two populations are well-matched on this variable.

However, it is unlikely that TMC is normally distributed, so you would probably end up using the NPAR1WAY procedure instead, with the Wilcoxon test (again, the null hypothesis says that the means are equal, so we want large p-values since we want to accept the null hypothesis):

proc npar1way data=work.testdata wilcoxon;
class study_group;
var TMC;
run;

The tests shown above have all been for continuous variables. Alternatively, if you wanted to test a discrete variable such as gender, you could use PROC FREQ with a chi square test to ensure that gender is independent of the study_group variable:

proc freq data=work.testdata;
tables study_group*male / chisq fisher;
run;

SAS Survey Procedures: PROC SURVEYLOGISTIC vs. PROC LOGISTIC Output

In a previous post, I talked about complex survey designs and why analysis of such survey data requires the use of SAS survey procedures. PROC SURVEYREG and PROC SURVEYLOGISTIC have some of the same options available for output/diagnostics as do their non-survey counterparts, PROC REG and PROC LOGISTIC. Default output includes fit statistics (R squared, AIC, and Schwartz’s criterion), chi-squared tests of the global null hypothesis, degrees of freedom, and coefficient estimates for each parameter along with standard error of coefficient estimates and p-values. PROC SURVEYLOGISTIC also includes odds ratio point estimates and 95% Wald confidence intervals for each input parameter, as does PROC LOGISTIC.

The survey procedures are more limited in some ways, though. For example, PROC LOGISTIC can use an option such as stepwise selection to restrict the output to only predictors with significance above a certain level; there is also an option to rank those predictors. These options do not work with PROC SURVEYLOGISTIC, which makes the output more unwieldy with a large number of predictors. Most notably in terms of differences, PROC LOGISTIC automatically outputs a chi-squared test of the residuals for each input variable; however, any analysis of residuals is irrelevant for the survey procedures since assumptions of normality and equal variance are not applicable due to survey design. Tabled residuals are not output at all for the survey procedures, although covariance matrices are available for both as a non-default option. Similarly, influential observations/outliers are also not analyzed due to the use of person weights. As long as we use person weights, we would get the same coefficients with a regular PROC REG as we would with PROC SURVEYREG, but standard error estimates would be different and predictor significance could also vary.

Reflexive Joins in SAS with Proc SQL

Sometimes, you may want to use a table twice in the same query. When you join a table to itself*, it is called a reflexive join. In the example below, a common task when calculating admissions-related healthcare quality measures is to exclude transfers from admissions rates. When a patient is transferred from one facility to another within 24 hours of being discharged from the first facility, we do not want to count this as a separate admission. If we did, this would inflate our rates.

The parameterized macro below uses a reflexive join to identify transfers and delete them from the original file (designated by the parameter filename). The table is referenced twice in the FROM clause, first with alias A and then with alias B. We use member identifying information (fields rhmo and bkey) to join the table to itself. We also use the intnx function to calculate when an admission for a single member occurs within 24 hours of the discharge date. Finally, in the WHERE clause we specify that the readmission must have occurred at a different facility (src_sys_prov_id). The unique identifier for the admission (conf_id) is used in the DELETE operation to exclude these transfers from the original file.

%macro transfer(filename=)/STORE SOURCE;
CREATE TABLE &filename._TRANS AS
SELECT B.conf_id, B.admit_dt, A.disch_dt
     FROM PQI.&filename A
       JOIN PQI.&filename B ON A.rhmo=B.rhmo AND A.bkey=B.bkey
           AND A.disch_dt BETWEEN B.admit_dt AND intnx(‘DAY’,B.admit_dt,-1)
               WHERE A.conf_id <> B.conf_id AND A.src_sys_prov_id <> B.src_sys_prov_id
;
DELETE
FROM PQI.&filename
    WHERE conf_ID IN
      (SELECT conf_id FROM &filename._TRANS)
;
%mend transfer;

*Fun grammar fact! “Itself” is a reflexive pronoun, which we use in English when the subject of a verb is also the object. Such a verb is a reflexive verb. In Spanish, these verbs take on an entirely different form than other verbs, while in English, a reflexive verb can usually also be used in a non-reflexive way (example: “I dressed myself” vs. “I dressed my cat”).

Jessica Hampton

Analytics Professional