Category Archives: SAS Tips

Simple Random Sampling with Proc SurveySelect

A common way to create a random sample of n=1000 in SAS is to generate a random number field for each observation using RANUNI or a similar function. The data set is then sorted on that field and the top 1000 selected as the sample.

PROC SURVEYSELECT offers a simple alternative with just a few lines of code:

proc surveyselect data=Customers
method=srs n=1000 out=SampleSRS;
run;

The METHOD statement set equal to “srs” indicates that simple random sampling will be used. DATA= specifies the input data set, while OUT= specifies the output data set. N= is used to set the sample size, and an optional SEED= statement can be used if a particular seed is desired for generating the random number; otherwise the seed will default to the time of day from the computer clock. Default output will include the seed, selection probability, and sampling weight for each observation.

Alternatively, if you want to get a little fancier and play around with various sample sizes for different markets, I like to use macro variables when setting some of the parameters:

proc surveyselect data=work.total_elig_pop_&mkt
method=srs n=&size out=ci_share.sample_&mkt._new;
run;

There are many other options available with PROC SURVEYSELECT which you can use for more complex sampling. Additional SAS survey procedures for analyzing data created using complex sampling methods are discussed in one of my conference papers. For more about macro variables and how they make your code easier to maintain, see 10 Steps to Easier SAS Code Maintenance.

Granting SELECT Access to Your Tables

If you have created a table in your own personal schema in an Oracle database, others do not have permissions to access that object. You may at some point wish to allow SELECT access to another user. To do this in SAS, you can use PROC DBLOAD as follows:

proc dbload dbms=odbc;
user=’yourusername‘;
password=’password‘;
dsn=database; /* this is the name of your ODBC Connection to the database. */
sql grant select on yourschema.yourtablename to anotherusername;
run;

Or you can use explicit pass-through syntax to grant access:

proc sql;
connect to oracle (user=yourusername password=password);
execute (grant select on yourusername.yourtablename
to anotherusername) by oracle;
disconnect from oracle;
quit;

Finally, you can do this directly in the Oracle database itself just by executing the following statement which is passed through in the syntax above:

grant select on yourusername.yourtablename to anotherusername;

How to View Remote Server Files on Your PC

Many of us create and store files on remote servers. One way of viewing the files you create is to use Proc Download to export them to your local libraries, but downloading large files can be time-consuming. Fortunately, there is an easier way.

First, I connect to the server and rsubmit my libname statement with the path to its location on the server; then I submit a second libname statement locally using the same libref. After submitting the following code, I can open Explorer, navigate to Libraries, and the library I specified on the remote server is now available for me to browse.

rsubmit;
libname WHP “/projects/hedis/hedis/whp”;
endrsubmit;
/*submit locally*/
libname WHP server=remoteservername; /*view remote server files on local machine*/

In-Database Processing with SAS 9.2

There are some new SAS In-Database processing features available with version 9.2. In addition to the explicit pass-through syntax with Proc SQL option, SAS has expanded its capacity to generate DBMS-specific or “native” SQL for Proc SQL and even non-SQL procedures to do more processing within the native databases and minimize I/O. Seven non-SQL procedures are now supported in Oracle and DB2, including Proc Freq, Proc Means, Proc Summary, Proc Tabulate, Proc Report, Proc Rank, and Proc Sort (additional procs are supported in Teradata). System options to enable additional output to the log are SASTRACE (see previous post) and SASTRACELOC. Use system options DIRECT_SQL and SQLGENERATION to enable and disable in-database processing and try comparing log output with and without in-database processing.

Creating Row Numbers with Proc SQL

Some of you may be familiar with the Proc SQL NUMBER option. This option displays row/observation numbers with a report; however, it does not store these row numbers in a dataset. In order to create and store these row numbers, use the MONOTONIC() function. Here’s an admissions-related example of using MONOTONIC() with PROC RANK to create a list of unique procedure codes ranked by procedure type for each case id:

/*create listing of unique procedure codes*/
proc sql;
create table unique_procs as
select monotonic() as row,
sq.* from
(select
case_id,
mbr_num,
svc_dt,
proc_cd,
midsu_proc_mdfr_cd,
proc_ty,
case when proc_ty IN(‘4′,’5′,’RV’)then proc_cd end as rev_proc,
case when proc_ty in (‘7′,’HC’) then proc_cd end as hcpcs_proc,
case when proc_ty IN(‘1′,’CP’) then proc_cd end as cpt_proc,
case when proc_ty IN(‘9′,’IC’) then proc_cd end as icd9_proc
from work.ppo_ip_claim_all_&mkt
group by
case_id,
mbr_num,
svc_dt,
proc_cd,
midsu_proc_mdfr_cd,
proc_ty
)sq
order by
case_id,
proc_ty
;
quit;

/*number fields*/
proc rank data=work.unique_procs out=work.unique_procs_ranked ties=low;
by case_id proc_ty;
var row;
ranks ln_num;
run;