Category Archives: SAS Tips

SAS System Options: SASTRACE

You can use the SAS system option SASTRACE to see where your SAS code is being processed when using implicit pass-through (with the libname statement vs. explicit pass-through syntax) to query a relational database. PC users must also use SASTRACELOC with SASTRACE to show trace results in the log. Even if your SAS code is not written using Proc SQL, SAS will still try to translate some of it and pass it through to the native database for processing.

The following statement with SASTRACE set to ā€˜,,,dā€™ ensures that SQL SELECT, INSERT, UPDATE, CREATE, DROP, and DELETE statements sent to the Microsoft Jet engine are identified and printed to the log:

OPTIONS SASTRACE=’,,,d’ SASTRACELOC=SASLOG NOSTSUFFIX;

Testing the (keep=) Option with Proc SQL

I frequently have to pull data from tables with a large number of variables (columns). For example, the table in the example below has well over 100 variables, but I am referencing only seven variables total, including the four variables I want to select and those mentioned in the where clause.

In order to conserve system resources, it is good programming practice to specify only the fields you need instead of routinely using SELECT * to pull all fields. Those who use Data Step programming are probably familiar with drop= and keep= options to limit variable selection and increase programming efficiency; however, did you know that you can also use them with Proc SQL? I wanted to find out if using keep= within Proc SQL made my query run faster even if I had already specified the variables I wanted in my SELECT statement.

I tested this below, running the same query with and without the keep = option and using OPTION FULLSTIMER to show real time and cpu time. Here are the log results:

887 PROC SQL /*inobs = 100*/;
888 CREATE TABLE WORK.F1_Prv AS
889 SELECT DISTINCT
890 DATA_SRC_CD,
891 Prv_Calc_ID,
892 ORIG_DATA_SRC_CD,
893 Calc_Submitter_Cd
894 FROM PRODJ.INTG_SERVICE_DATE
895 WHERE Process_YM = 201001
896 AND PROD_CTG_CD = ’13’
897 AND Calc_Submitter_CD = ‘F’
898 ;
NOTE: Table WORK.F1_PRV created, with 105896 rows and 4 columns.
899 QUIT;

NOTE: PROCEDURE SQL used (Total process time):
real time 6:15.53
user cpu time 0.34 seconds
system cpu time 0.20 seconds
Memory 390k

900 PROC SQL /*inobs = 100*/;
901 CREATE TABLE WORK.F1_Prv AS
902 SELECT DISTINCT
903 DATA_SRC_CD,
904 Prv_Calc_ID,
905 ORIG_DATA_SRC_CD,
906 Calc_Submitter_Cd
907 FROM PRODJ.INTG_SERVICE_DATE (keep = DATA_SRC_CD ORIG_DATA_SRC_CD CALC_SUBMITTER_CD PROD_CTG_CD Prv_Calc_ID Process_YM)
908 WHERE Process_YM = 201001
909 AND PROD_CTG_CD = ’13’
910 AND Calc_Submitter_CD = ‘F’
911 ;
NOTE: Table WORK.F1_PRV created, with 105896 rows and 4 columns.
912 QUIT;

NOTE: PROCEDURE SQL used (Total process time):
real time 1:09.46
user cpu time 12.68 seconds
system cpu time 1.36 seconds
Memory 54200k

We can see a huge difference; Process 1 takes over 6 minutes in real time, while Process 2 takes just over 1 minute. But not so fast! Real time can vary greatly (as a matter of fact, the next two times I ran Process 2, real time was 4 minutes and 7 minutes, while user cpu time stayed between 12 and 13 seconds, and system cpu time was about 1.7 seconds) so the number we really want to look at is the cpu time, which is a better gauge of how much system resources are being used. Cpu time and memory use are both much lower for Process 1 than Process 2. My conclusion? In this case we are better off not using the (keep = ) option with Proc SQL.

Automatic SAS Macro Variables: &sqlxmsg, &sqlxsrc

When I gave my presentation on Explicit SQL Pass-Through syntax, one of the questions asked by the audience was how to retrieve error messages from the DBMS being queried. So, for example, if I query an Oracle database, how can I get Oracle-generated error messages if my query fails? SAS will give you error messages in the log, but sometimes they are not very helpful (ex: ERROR: PROC SQL requires any created table to have at least one column).

Using two automatic SAS macro variables with a %put statement can print more specific information to the log:

&sqlxmsg: contains DBMS-specific error messages
&sqlxrc: contains DBMS-specific error codes

Here’s what it looks like when you use these macro variables in a pass-through query:

proc sql;
connect to oracle (path= prodj.cigna.com user=uid password=pwd);
%put &sqlxmsg &sqlxrc; /*prints error messages to log*/
create table work.test as
select * from connection to oracle
(select * from onesource_o.s1_claim);
%put &sqlxmsg &sqlxrc; /*prints error messages to log*/
disconnect from oracle;
quit;

Create a Data Dictionary Part II: SQL Dictionary Tables

I have already posted on how to export a data dictionary to Excel using a modified Proc Contents. However, you can also practice using the Excel libname statement with SQL dictionary tables to do the same thing. The advantage of using SQL dictionary tables over Proc Contents is that SQL dictionary tables store much more information and give you access to many more variables. The two tables I use in the following code are dictionary.tables and dictionary.columns.

First, you can use Proc SQL with the noexec and feedback options to get a full list of available variables for each table to print to the log without actually having to execute the sql statement (for more on the feedback option, see my previous post):

89 proc sql feedback noexec;
90 select *
91 from dictionary.tables;

NOTE: Statement transforms to:
select libname, memname, memtype, dbms_memtype, memlabel, typemem, crdate, modate, nobs, obslen, nvar, protect, compress, encrypt, npage, filesize,pcompress, reuse, bufsize, delobs, nlobs, maxvar, maxlabel, maxgen, gen, attr, indxtype, datarep, sortname, sorttype, sortchar, reqvector, datarepname,encoding, audit, audit_before, audit_admin, audit_error, audit_data
from DICTIONARY.TABLES;

Now, suppose I want to create my data dictionary based on all the tables I have in my library “HRA”. First, I assign my libname statement to point to the Excel workbook I want to create (note that the libname statement does not support the replace option, so if the file already exists, you’ll have to delete it before trying to recreate it):

libname myxls \\WLDAPP043\Hedisapp\Personal\Jess\HRA\2010\Data_Dictionary.xls;

Then I select from the dictionary tables using Proc SQL to create my two worksheets:

proc sql;
create table myxls.directory as
select memname,
crdate,
modate,
nobs,
obslen,
nvar,
filesize
from dictionary.tables
where libname = ‘HRA’
;
create table myxls.table_listing as
select memname,
name,
type,
length,
format
from dictionary.columns
where libname = ‘HRA’
;
QUIT;

One final note to add is that you can’t open the Excel file while the libname is still assigned, so you’ll either have to close out of SAS before admiring your data dictionary, or run the following line of code to deassign your libname statement.

libname myxls clear;

Proc SQL Feedback Option

Most people who work with large data sets understand the importance of only selecting the fields you need from any given table or tables rather than using Select * to select all fields. This leads to more efficient use of system resources, but can be annoying for the programmer who has to type in a long list of fields.

In order to avoid this, one easy tip is to use Enterprise Guide’s interface to select desired fields and then copying and pasting the generated code into your SAS program. An alternative to doing this, however, is to use the Proc SQL Feedback option.

For example, if I want to get a list of fields available in the SQL Dictionary table dictionary.tables for a given libname, I can use the feedback option to get the following log output:

21 proc sql feedback;
22 select *
23 from dictionary.tables
24 where libname = ‘yourlibname’
25 ;
NOTE: Statement transforms to:
select libname, memname, memtype, dbms_memtype, memlabel, typemem, crdate, modate, nobs, obslen, nvar, protect, compress, encrypt, npage, filesize, pcompress, reuse, bufsize, delobs, nlobs, maxvar, maxlabel, maxgen, gen, attr, indxtype, datarep, sortname, sorttype, sortchar, reqvector, datarepname, encoding, audit, audit_before, audit_admin, audit_error, audit_data

from DICTIONARY.TABLES

where libname=’yourlibname’;

I can then copy and paste this listing into my original program and select only those fields I want without having to type them in manually.