Tag Archives: proc sql

In-Database Processing with SAS 9.2

There are some new SAS In-Database processing features available with version 9.2. In addition to the explicit pass-through syntax with Proc SQL option, SAS has expanded its capacity to generate DBMS-specific or “native” SQL for Proc SQL and even non-SQL procedures to do more processing within the native databases and minimize I/O. Seven non-SQL procedures are now supported in Oracle and DB2, including Proc Freq, Proc Means, Proc Summary, Proc Tabulate, Proc Report, Proc Rank, and Proc Sort (additional procs are supported in Teradata). System options to enable additional output to the log are SASTRACE (see previous post) and SASTRACELOC. Use system options DIRECT_SQL and SQLGENERATION to enable and disable in-database processing and try comparing log output with and without in-database processing.

Creating Row Numbers with Proc SQL

Some of you may be familiar with the Proc SQL NUMBER option. This option displays row/observation numbers with a report; however, it does not store these row numbers in a dataset. In order to create and store these row numbers, use the MONOTONIC() function. Here’s an admissions-related example of using MONOTONIC() with PROC RANK to create a list of unique procedure codes ranked by procedure type for each case id:

/*create listing of unique procedure codes*/
proc sql;
create table unique_procs as
select monotonic() as row,
sq.* from
(select
case_id,
mbr_num,
svc_dt,
proc_cd,
midsu_proc_mdfr_cd,
proc_ty,
case when proc_ty IN(‘4′,’5′,’RV’)then proc_cd end as rev_proc,
case when proc_ty in (‘7′,’HC’) then proc_cd end as hcpcs_proc,
case when proc_ty IN(‘1′,’CP’) then proc_cd end as cpt_proc,
case when proc_ty IN(‘9′,’IC’) then proc_cd end as icd9_proc
from work.ppo_ip_claim_all_&mkt
group by
case_id,
mbr_num,
svc_dt,
proc_cd,
midsu_proc_mdfr_cd,
proc_ty
)sq
order by
case_id,
proc_ty
;
quit;

/*number fields*/
proc rank data=work.unique_procs out=work.unique_procs_ranked ties=low;
by case_id proc_ty;
var row;
ranks ln_num;
run;

SAS System Options: SASTRACE

You can use the SAS system option SASTRACE to see where your SAS code is being processed when using implicit pass-through (with the libname statement vs. explicit pass-through syntax) to query a relational database. PC users must also use SASTRACELOC with SASTRACE to show trace results in the log. Even if your SAS code is not written using Proc SQL, SAS will still try to translate some of it and pass it through to the native database for processing.

The following statement with SASTRACE set to ā€˜,,,dā€™ ensures that SQL SELECT, INSERT, UPDATE, CREATE, DROP, and DELETE statements sent to the Microsoft Jet engine are identified and printed to the log:

OPTIONS SASTRACE=’,,,d’ SASTRACELOC=SASLOG NOSTSUFFIX;

Testing the (keep=) Option with Proc SQL

I frequently have to pull data from tables with a large number of variables (columns). For example, the table in the example below has well over 100 variables, but I am referencing only seven variables total, including the four variables I want to select and those mentioned in the where clause.

In order to conserve system resources, it is good programming practice to specify only the fields you need instead of routinely using SELECT * to pull all fields. Those who use Data Step programming are probably familiar with drop= and keep= options to limit variable selection and increase programming efficiency; however, did you know that you can also use them with Proc SQL? I wanted to find out if using keep= within Proc SQL made my query run faster even if I had already specified the variables I wanted in my SELECT statement.

I tested this below, running the same query with and without the keep = option and using OPTION FULLSTIMER to show real time and cpu time. Here are the log results:

887 PROC SQL /*inobs = 100*/;
888 CREATE TABLE WORK.F1_Prv AS
889 SELECT DISTINCT
890 DATA_SRC_CD,
891 Prv_Calc_ID,
892 ORIG_DATA_SRC_CD,
893 Calc_Submitter_Cd
894 FROM PRODJ.INTG_SERVICE_DATE
895 WHERE Process_YM = 201001
896 AND PROD_CTG_CD = ’13’
897 AND Calc_Submitter_CD = ‘F’
898 ;
NOTE: Table WORK.F1_PRV created, with 105896 rows and 4 columns.
899 QUIT;

NOTE: PROCEDURE SQL used (Total process time):
real time 6:15.53
user cpu time 0.34 seconds
system cpu time 0.20 seconds
Memory 390k

900 PROC SQL /*inobs = 100*/;
901 CREATE TABLE WORK.F1_Prv AS
902 SELECT DISTINCT
903 DATA_SRC_CD,
904 Prv_Calc_ID,
905 ORIG_DATA_SRC_CD,
906 Calc_Submitter_Cd
907 FROM PRODJ.INTG_SERVICE_DATE (keep = DATA_SRC_CD ORIG_DATA_SRC_CD CALC_SUBMITTER_CD PROD_CTG_CD Prv_Calc_ID Process_YM)
908 WHERE Process_YM = 201001
909 AND PROD_CTG_CD = ’13’
910 AND Calc_Submitter_CD = ‘F’
911 ;
NOTE: Table WORK.F1_PRV created, with 105896 rows and 4 columns.
912 QUIT;

NOTE: PROCEDURE SQL used (Total process time):
real time 1:09.46
user cpu time 12.68 seconds
system cpu time 1.36 seconds
Memory 54200k

We can see a huge difference; Process 1 takes over 6 minutes in real time, while Process 2 takes just over 1 minute. But not so fast! Real time can vary greatly (as a matter of fact, the next two times I ran Process 2, real time was 4 minutes and 7 minutes, while user cpu time stayed between 12 and 13 seconds, and system cpu time was about 1.7 seconds) so the number we really want to look at is the cpu time, which is a better gauge of how much system resources are being used. Cpu time and memory use are both much lower for Process 1 than Process 2. My conclusion? In this case we are better off not using the (keep = ) option with Proc SQL.

Automatic SAS Macro Variables: &sqlxmsg, &sqlxsrc

When I gave my presentation on Explicit SQL Pass-Through syntax, one of the questions asked by the audience was how to retrieve error messages from the DBMS being queried. So, for example, if I query an Oracle database, how can I get Oracle-generated error messages if my query fails? SAS will give you error messages in the log, but sometimes they are not very helpful (ex: ERROR: PROC SQL requires any created table to have at least one column).

Using two automatic SAS macro variables with a %put statement can print more specific information to the log:

&sqlxmsg: contains DBMS-specific error messages
&sqlxrc: contains DBMS-specific error codes

Here’s what it looks like when you use these macro variables in a pass-through query:

proc sql;
connect to oracle (path= prodj.cigna.com user=uid password=pwd);
%put &sqlxmsg &sqlxrc; /*prints error messages to log*/
create table work.test as
select * from connection to oracle
(select * from onesource_o.s1_claim);
%put &sqlxmsg &sqlxrc; /*prints error messages to log*/
disconnect from oracle;
quit;