Author Archives: jessica.hampton@gmail.com

SAS System Options: SASTRACE

You can use the SAS system option SASTRACE to see where your SAS code is being processed when using implicit pass-through (with the libname statement vs. explicit pass-through syntax) to query a relational database. PC users must also use SASTRACELOC with SASTRACE to show trace results in the log. Even if your SAS code is not written using Proc SQL, SAS will still try to translate some of it and pass it through to the native database for processing.

The following statement with SASTRACE set to ā€˜,,,dā€™ ensures that SQL SELECT, INSERT, UPDATE, CREATE, DROP, and DELETE statements sent to the Microsoft Jet engine are identified and printed to the log:

OPTIONS SASTRACE=’,,,d’ SASTRACELOC=SASLOG NOSTSUFFIX;

Testing the (keep=) Option with Proc SQL

I frequently have to pull data from tables with a large number of variables (columns). For example, the table in the example below has well over 100 variables, but I am referencing only seven variables total, including the four variables I want to select and those mentioned in the where clause.

In order to conserve system resources, it is good programming practice to specify only the fields you need instead of routinely using SELECT * to pull all fields. Those who use Data Step programming are probably familiar with drop= and keep= options to limit variable selection and increase programming efficiency; however, did you know that you can also use them with Proc SQL? I wanted to find out if using keep= within Proc SQL made my query run faster even if I had already specified the variables I wanted in my SELECT statement.

I tested this below, running the same query with and without the keep = option and using OPTION FULLSTIMER to show real time and cpu time. Here are the log results:

887 PROC SQL /*inobs = 100*/;
888 CREATE TABLE WORK.F1_Prv AS
889 SELECT DISTINCT
890 DATA_SRC_CD,
891 Prv_Calc_ID,
892 ORIG_DATA_SRC_CD,
893 Calc_Submitter_Cd
894 FROM PRODJ.INTG_SERVICE_DATE
895 WHERE Process_YM = 201001
896 AND PROD_CTG_CD = ’13’
897 AND Calc_Submitter_CD = ‘F’
898 ;
NOTE: Table WORK.F1_PRV created, with 105896 rows and 4 columns.
899 QUIT;

NOTE: PROCEDURE SQL used (Total process time):
real time 6:15.53
user cpu time 0.34 seconds
system cpu time 0.20 seconds
Memory 390k

900 PROC SQL /*inobs = 100*/;
901 CREATE TABLE WORK.F1_Prv AS
902 SELECT DISTINCT
903 DATA_SRC_CD,
904 Prv_Calc_ID,
905 ORIG_DATA_SRC_CD,
906 Calc_Submitter_Cd
907 FROM PRODJ.INTG_SERVICE_DATE (keep = DATA_SRC_CD ORIG_DATA_SRC_CD CALC_SUBMITTER_CD PROD_CTG_CD Prv_Calc_ID Process_YM)
908 WHERE Process_YM = 201001
909 AND PROD_CTG_CD = ’13’
910 AND Calc_Submitter_CD = ‘F’
911 ;
NOTE: Table WORK.F1_PRV created, with 105896 rows and 4 columns.
912 QUIT;

NOTE: PROCEDURE SQL used (Total process time):
real time 1:09.46
user cpu time 12.68 seconds
system cpu time 1.36 seconds
Memory 54200k

We can see a huge difference; Process 1 takes over 6 minutes in real time, while Process 2 takes just over 1 minute. But not so fast! Real time can vary greatly (as a matter of fact, the next two times I ran Process 2, real time was 4 minutes and 7 minutes, while user cpu time stayed between 12 and 13 seconds, and system cpu time was about 1.7 seconds) so the number we really want to look at is the cpu time, which is a better gauge of how much system resources are being used. Cpu time and memory use are both much lower for Process 1 than Process 2. My conclusion? In this case we are better off not using the (keep = ) option with Proc SQL.

Automatic SAS Macro Variables: &sqlxmsg, &sqlxsrc

When I gave my presentation on Explicit SQL Pass-Through syntax, one of the questions asked by the audience was how to retrieve error messages from the DBMS being queried. So, for example, if I query an Oracle database, how can I get Oracle-generated error messages if my query fails? SAS will give you error messages in the log, but sometimes they are not very helpful (ex: ERROR: PROC SQL requires any created table to have at least one column).

Using two automatic SAS macro variables with a %put statement can print more specific information to the log:

&sqlxmsg: contains DBMS-specific error messages
&sqlxrc: contains DBMS-specific error codes

Here’s what it looks like when you use these macro variables in a pass-through query:

proc sql;
connect to oracle (path= prodj.cigna.com user=uid password=pwd);
%put &sqlxmsg &sqlxrc; /*prints error messages to log*/
create table work.test as
select * from connection to oracle
(select * from onesource_o.s1_claim);
%put &sqlxmsg &sqlxrc; /*prints error messages to log*/
disconnect from oracle;
quit;

Create a Data Dictionary Part II: SQL Dictionary Tables

I have already posted on how to export a data dictionary to Excel using a modified Proc Contents. However, you can also practice using the Excel libname statement with SQL dictionary tables to do the same thing. The advantage of using SQL dictionary tables over Proc Contents is that SQL dictionary tables store much more information and give you access to many more variables. The two tables I use in the following code are dictionary.tables and dictionary.columns.

First, you can use Proc SQL with the noexec and feedback options to get a full list of available variables for each table to print to the log without actually having to execute the sql statement (for more on the feedback option, see my previous post):

89 proc sql feedback noexec;
90 select *
91 from dictionary.tables;

NOTE: Statement transforms to:
select libname, memname, memtype, dbms_memtype, memlabel, typemem, crdate, modate, nobs, obslen, nvar, protect, compress, encrypt, npage, filesize,pcompress, reuse, bufsize, delobs, nlobs, maxvar, maxlabel, maxgen, gen, attr, indxtype, datarep, sortname, sorttype, sortchar, reqvector, datarepname,encoding, audit, audit_before, audit_admin, audit_error, audit_data
from DICTIONARY.TABLES;

Now, suppose I want to create my data dictionary based on all the tables I have in my library “HRA”. First, I assign my libname statement to point to the Excel workbook I want to create (note that the libname statement does not support the replace option, so if the file already exists, you’ll have to delete it before trying to recreate it):

libname myxls \\WLDAPP043\Hedisapp\Personal\Jess\HRA\2010\Data_Dictionary.xls;

Then I select from the dictionary tables using Proc SQL to create my two worksheets:

proc sql;
create table myxls.directory as
select memname,
crdate,
modate,
nobs,
obslen,
nvar,
filesize
from dictionary.tables
where libname = ‘HRA’
;
create table myxls.table_listing as
select memname,
name,
type,
length,
format
from dictionary.columns
where libname = ‘HRA’
;
QUIT;

One final note to add is that you can’t open the Excel file while the libname is still assigned, so you’ll either have to close out of SAS before admiring your data dictionary, or run the following line of code to deassign your libname statement.

libname myxls clear;

HASUG Meeting Notes: November 2011 (Social Media)

The 4th quarter 2011 HASUG meeting took place at Bristol-Myers Squibb in Wallingford, CT, on November 10th. Speakers included John Adams of Boehringer Ingelheim and David Kelly from Customer Intelligence at SAS Institute.

David Kelly presented SAS Institute’s Social Media Analytics software platform designed to enable companies to process large volumes of unstructured data from internal and external social media and to base business decisions on that data. His presentation, “The Power of Social Media Listening,” introduced the SAS Customer Intelligence organization at SAS, provided an engaging narrative of social media statistics (about 70% of YouTube and FaceBook activity come from outside the US, for example), portrayed the potential and the landscape of social media and outlined the data challenges (punctuation, spelling, segmentation, acronyms, industry and social media jargon) associated with analyzing the unstructured text data which comprises over 70% of social media data.

Kelly cited the infamous viral YouTube video posted by a previously obscure singer-songwriter who watched United Airlines cargo handlers break his expensive Taylor guitar as a social media example of negative PR which led to a loss of $180 million for United Airlines. Clearly, how corporations react to social media in real-time can have serious financial implications. Kelly also discussed the “4 Cs” of social media: Content, Context, Connections, and Conversations, and noted the importance of being able to identify key “influencers” (a concept which will be of interest to those acquainted with Malcolm Gladwell’s “Tipping Point”) and the origins of negative PR stories.

SAS’s solution for businesses looking to monitor and respond quickly to information about their brand floating around on social media sites is the SAS Social Media Analytics software platform. This platform crawls the web for industry or company-specific information (largely in the form of unstructured text data), capturing, cleaning, organizing, and analyzing that data as part of a customizable self-service application that allows your organization to generate real-time reports, including comparison reports (vs. competition), analysis of historical data and trend identification, “sentiment analysis” currently supported in 13 languages, and much more. See my conference paper for an example of the kind of text analysis you can do using Base SAS.