3 Ways to Export Your Data to Excel

SAS offers many different ways to export data to Excel, including Excel ODS output. The following alternatives to ODS are also available:

The File Export Wizard

This is an easy way to export your data without having to write any code. To use this wizard:

  • Select File>>Export Data
  • Choose the library and SAS data set name
  • Select a data source from the dropdown list (in this case, the version of Excel you’d like to export to – other types of data sources are also available here)
  • Browse to the folder where you’d like to create your Excel Workbook and type a name for the file
  • You then have the option to save the Proc Export statements generated by the Wizard to a SAS program file

Proc Export

If you chose to use the File Export Wizard to save your Proc Export statements, your code may look something like this:

PROC EXPORT DATA=sasdatasetname
    OUTFILE="\\path\filename.xls"
    DBMS=EXCEL REPLACE;
    SHEET="sheetname";
RUN;

  • The data set you’d like to export goes after DATA=
  • The file path goes after OUTFILE=
  • You are exporting to Excel, so DBMS=EXCEL
  • If you wish to replace any existing workbook or worksheet with the same name, use the REPLACE option.
  • The SHEET= statement is optional, but it is useful if you wish to create a multi-sheet workbook. For your next PROC EXPORT step, just specify the same file and a new SHEET name, as shown below.
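
For example, this sketch (the data set and sheet names are hypothetical) writes two data sets to separate sheets of the same workbook:

PROC EXPORT DATA=sasdataset1
    OUTFILE="\\path\filename.xls"
    DBMS=EXCEL REPLACE;
    SHEET="sheetname1";
RUN;

PROC EXPORT DATA=sasdataset2
    OUTFILE="\\path\filename.xls"  /* same file */
    DBMS=EXCEL REPLACE;
    SHEET="sheetname2";            /* new sheet */
RUN;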

The Libname Statement

This is a newer option for exporting to Excel. It does not support the REPLACE option, so if a file with the name you are trying to create already exists, you will have to delete or rename the older version first. Also, while the libname is assigned, you will not be able to open the Excel document you have created; you will first have to exit SAS or clear the libname. It is easy to create multiple sheets using this method, though, as shown below.

libname myxls excel "\\path\filename.xls";

DATA myxls.sheetname1;
    SET sasdataset1;
RUN;

DATA myxls.sheetname2;
    SET sasdataset2;
RUN;

PROC SQL;
    CREATE TABLE myxls.sheetname3 AS
    SELECT *
    FROM sasdataset3;
QUIT;
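
Once you are done writing to the workbook, clear the libref to release the file so it can be opened in Excel:

libname myxls clear;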

HASUG Meeting Notes: May 2011

Northeast Utilities hosted May’s HASUG meeting in Berlin, CT. Both speakers focused on detecting fraud, first from store credit card issuer GE’s perspective, and then from a property and casualty insurance claims perspective.

“Usage of SAS in Credit Card Fraud Detection,” presented by GE employee Clint Rickards, began by introducing the most common types of credit card fraud and contrasting the challenges faced by PLCC (store card) vs. bank card issuers. He presented an interesting statistic: half of all credit card fraud, as measured in dollars, occurs in only six states (CA, TX, FL, NJ, NY, and MI). He then discussed the general architecture of GE’s Risk Assessment Platform (RAP), designed to detect both real-time and post-transaction fraud, which uses the full SAS Business Intelligence suite of products: SAS/IntrNet, Enterprise Guide, Data Integration Studio, Management Console, SAS Scalable Performance Data Server, and Platform Flow Manager/Calendar Editor. Finally, he stressed the importance of automated processes, reusable code, large jobs broken into smaller pieces to allow for easier debugging, and separation between the testing and production environments.

Next, Janine Johnson of ISO Innovative Analytics presented “Mining Text for Suspicious P&C Claims,” describing how her consulting firm developed an automated process in Base SAS (labor-intensive to build, but cost-effective) for “mining” insurance claim adjusters’ notes in an unstructured text field to produce data for use in a predictive model. She introduced the process of text mining as follows: information retrieval, natural language processing, creating structured data from unstructured text, and evaluating structured outputs (classification, clustering, association, etc.). Before beginning this process, she emphasized the necessity of consulting a domain expert (in this case, someone in the P&C industry familiar with industry jargon and non-standard abbreviations). She then organized her own project into five steps of an iterative process: cleaning the text (using the upcase, compress, translate, and compbl functions); standardizing terms (using the regular expression functions prxparse, prxposn, and prxchange, as well as scan and tranwrd); identifying words associated with suspicious claims and grouping them into concepts (“concept generation”); flagging records containing those suspicious phrases; and finally using proc freq with the chi-square option to evaluate lift.
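
A minimal sketch of what the cleaning and standardizing steps might look like in Base SAS (the data set, variable, and abbreviations below are hypothetical, not from the presentation):

data notes_clean;
    set claims.adjuster_notes;   /* hypothetical input data set */
    length note_std $1000;
    /* Clean: uppercase, strip punctuation, collapse extra blanks */
    note_std = upcase(note_text);
    note_std = compress(note_std, '.,;:()');
    note_std = compbl(note_std);
    /* Standardize non-standard abbreviations */
    note_std = prxchange('s/\bVEH\b/VEHICLE/', -1, note_std);
    note_std = tranwrd(note_std, ' MVA ', ' MOTOR VEHICLE ACCIDENT ');
run;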

Book Review: Web Development with SAS by Example

Title: Web Development with SAS by Example, 3rd edition
Author: Frederick Pratter
Publisher: SAS Publishing
Pages: 354
Available: September 2011

This ambitious volume covers how to deliver your SAS output online, from start to finish, in a mere 354 pages. Pratter assumes his audience has no prior knowledge of web programming, giving a thorough introduction in his first four chapters to the basics of HTML and XML, static vs. dynamic web pages, and how the internet works, along with some background history on TCP/IP, different types of web servers, and a whole host of acronyms. Chapters 5 and 6 in Part II outline different ways to access your data, focusing on SAS/SHARE and SAS/ACCESS, with examples of how to use SQL pass-through for both and information to help the reader select an appropriate access method. I found the section on OLEDB/ODBC here interesting as well. Part III goes on to introduce SAS/IntrNet, Part IV devotes five chapters to SAS BI Server, and the book concludes with some Java.

One of the strengths of this book is that Pratter shows multiple ways of displaying and accessing the same data throughout, for example contrasting various “old school” programming methods with ODS HTML statements, and Proc Access with the newer SAS/Access interface. Such examples demonstrate how SAS has evolved since its earlier versions and may interest both experienced and newer programmers. One challenge is that many SAS users are unfamiliar with administrative aspects such as server configuration, including TCP, and may find some of this material harder to follow.

HASUG Meeting Notes: February 2011

The first quarter HASUG meeting took place on February 24, 2011, at Case Memorial Library in Orange, CT, from 10 am to 1 pm.

Santosh Bari, a SAS-certified professional currently with eClinical Solutions (a division of Eliassen Group in New London, CT), opened the meeting with his presentation “Proc Report: A Step-by-Step Introduction to Proc Report and Advanced Techniques.” Proc Report is a powerful report-generating procedure that combines many of the features of Proc Print, Proc Sort, Proc Means, Proc Freq, and Proc Tabulate. Mr. Bari’s in-depth discussion of proc report options and attributes included code samples alongside corresponding sample output, and he did a thorough job of presenting the procedure’s wide array of functionality, including more advanced, lesser-known topics such as BREAK BEFORE/AFTER statements, COMPUTE blocks, and the PANELS and FLOW options.
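
To give a flavor of those features, here is a minimal sketch (using the sashelp.class sample data set, not code from the presentation) that combines an ORDER variable, a summary BREAK, and a COMPUTE block:

proc report data=sashelp.class nowd;
    column sex name height weight;
    define sex    / order 'Sex';
    define name   / display;
    define height / analysis mean format=5.1;
    define weight / analysis mean format=5.1;
    /* Add a summary row with group means after each value of sex */
    break after sex / summarize skip;
    compute after sex;
        name = 'Mean';   /* label the summary line */
    endcomp;
run;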

Following Mr. Bari, Charles Patridge of ISO Innovative Analytics presented “Best Practices: Using SAS Effectively/Efficiently.” His presentation, a compilation of a number of popular past topics, covered an effective naming convention for programs and files (along with compelling reasons for adopting such a system), the creation of data dictionaries in Excel using a proc contents-based macro, and central macro autocall libraries. Drawing on his many years of consulting experience, Mr. Patridge argued for spending a little time up front to organize and name one’s programs and data sets in a way that makes transparent the order in which the programs execute and the origin of the data sets. When data set names correspond to the names of the programs that created them, a project becomes self-documenting and easier to hand off to others.
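
A minimal sketch of the kind of proc contents-based data dictionary macro described (the macro name and parameters are hypothetical, not Mr. Patridge’s actual code):

%macro datadict(lib=, outfile=);
    /* Gather metadata for every data set in the library */
    proc contents data=&lib.._all_ noprint
        out=_dict(keep=memname name type length label);
    run;

    /* Write the dictionary to an Excel workbook */
    proc export data=_dict outfile="&outfile" dbms=excel replace;
        sheet="&lib";
    run;
%mend datadict;

%datadict(lib=work, outfile=\\path\datadictionary.xls);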

SEMMA and CRISP-DM: Data Mining Methodologies

Data mining is the process of examining large sets of data for previously unsuspected patterns which can give us useful information. Data mining has a great variety of applications: it can be used to try to predict future events (such as stock prices or football scores), cluster populations into groups of people having similar characteristics, or estimate the likelihood of certain health conditions being present given other known variables.

Cross Industry Standard Process for Data Mining (CRISP-DM) is a 6-phase model of the entire data mining process, from start to finish, that is broadly applicable across industries for a wide array of data mining projects. To see a visual representation of this model, visit www.crisp-dm.org.

CRISP-DM is not the only standard process for data mining. SEMMA, from SAS Institute, is an alternative methodology:
  • Sample – the subset of data should be large enough to be representative, but small enough to process easily
  • Explore – look for patterns in the data
  • Modify – create and transform variables, or eliminate unnecessary ones
  • Model – select and apply the model that best fits your situation and data
  • Assess – determine whether your results are useful and reliable; test them against known data or another sample

According to the SAS website: “SEMMA is not a data mining methodology but rather a logical organisation of the functional tool set of SAS Enterprise Miner for carrying out the core tasks of data mining. Enterprise Miner can be used as part of any iterative data mining methodology adopted by the client. Naturally steps such as formulating a well defined business or research problem and assembling quality representative data sources are critical to the overall success of any data mining project. SEMMA is focused on the model development aspects of data mining.”

This quote highlights some of the differences between CRISP-DM and SEMMA. First, SEMMA was developed with a specific data mining software package in mind (Enterprise Miner), rather than being designed for a broader range of data mining tools and the general business environment. Because it focuses on SAS Enterprise Miner and on model development specifically, it places less emphasis on the initial planning phases covered in CRISP-DM (the Business Understanding and Data Understanding phases) and omits the Deployment phase entirely.

That said, there are some similarities as well. The Sample and Explore stages of SEMMA roughly correspond with the Data Understanding phase of CRISP-DM; Modify translates to the Data Preparation phase; Model is obviously the Modeling phase, and Assess parallels the Evaluation phase of CRISP-DM. Additionally, both models are intended to be somewhat cyclical rather than linear in nature. The SEMMA model recommends returning to the Explore stage in response to new information that comes to light in later stages which may necessitate changes to the data. The CRISP-DM model also emphasizes data mining as a non-linear, adaptive process.