Book Review: Web Development with SAS by Example

Title: Web Development with SAS by Example, 3rd edition
Author: Frederick Pratter
Publisher: SAS Publishing
Pages: 354
Available: September 2011

This ambitious volume covers how to deliver your SAS output online, from start to finish, in just over 350 pages. Pratter assumes his audience has no prior knowledge of web programming. His first four chapters give a thorough introduction to the basics of HTML and XML, static vs. dynamic web pages, and how the Internet works, along with some background history on TCP/IP, the different types of web servers, and a whole host of acronyms. Chapters 5 and 6 in Part II outline different ways to access your data, focusing on SAS/SHARE and SAS/ACCESS. They include examples of SQL pass-through for both and guidance on selecting an appropriate method of access. I also found the section on OLE DB/ODBC here interesting. Part III introduces SAS/IntrNet, Part IV devotes five chapters to SAS BI Server, and the book concludes with a look at Java.

One of the strengths of this book is that, throughout, Pratter shows multiple ways of displaying and accessing the same data. For example, he contrasts various “old school” programming methods with ODS HTML statements, and Proc Access with the newer SAS/ACCESS interface. Such examples show how SAS has evolved since its earlier versions and may interest both experienced and newer programmers. One challenge is that many SAS users are not familiar with administrative topics such as server configuration and TCP/IP networking, and they may find some of this material harder to follow.

HASUG Meeting Notes: February 2011

The first-quarter HASUG meeting took place on February 24, 2011, from 10 a.m. to 1 p.m. at Case Memorial Library in Orange, CT.

Santosh Bari, a SAS-certified professional currently with eClinical Solutions (a division of Eliassen Group in New London, CT), opened the meeting with his presentation on Proc Report: A Step-by-Step Introduction to Proc Report and Advanced Techniques. Proc Report is a powerful report-generating procedure that combines many of the features of Proc Print, Proc Sort, Proc Means, Proc Freq, and Proc Tabulate. Mr. Bari’s presentation was an in-depth discussion of Proc Report options and attributes, with code samples shown alongside the corresponding output. He thoroughly covered the wide range of functionality in Proc Report. This included advanced, lesser-known topics such as BREAK BEFORE/AFTER statements, COMPUTE blocks, and the PANELS and FLOW options.

Following Mr. Bari, Charles Patridge of ISO Innovative Analytics presented Best Practices: Using SAS Effectively/Efficiently. His presentation was a compilation of several popular past topics and covered three areas:
- an effective naming convention for programs and files, along with compelling reasons for adopting one;
- creating data dictionaries in Excel with a Proc Contents-based macro;
- centralized autocall macro libraries.

Drawing on his many years of consulting experience, Mr. Patridge argued for spending a little time up front to organize and name programs and data sets. Done well, this makes the order in which the programs run and the origin of each data set transparent. When data set names correspond to the names of the programs that created them, the project becomes self-documenting and easier to hand off to others.

SEMMA and CRISP-DM: Data Mining Methodologies

Data mining is the process of examining large sets of data for previously unsuspected patterns which can give us useful information. Data mining has a great variety of applications: it can be used to try to predict future events (such as stock prices or football scores), cluster populations into groups of people having similar characteristics, or estimate the likelihood of certain health conditions being present given other known variables.

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a six-phase model of the entire data mining process, from start to finish. It is broadly applicable across industries and a wide array of data mining projects. The six phases are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. To see a visual representation of this model, visit www.crisp-dm.org.

CRISP-DM is not the only standard process for data mining. SEMMA, from SAS Institute, offers an alternative approach:
Sample – draw a subset of the data that is large enough to be representative but small enough to process easily
Explore – look for patterns in the data
Modify – create and transform variables, or eliminate unnecessary ones
Model – select and apply a model that best fits your situation and data
Assess – determine whether your results are useful and reliable by testing them against known data or another sample
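To make the five stages concrete, here is a minimal Python sketch of one SEMMA pass. It is not Enterprise Miner and not SAS code, and it makes several assumptions:
- The data are synthetic: y is roughly 2x + 1 plus noise, with about 5% of y values missing.
- Ordinary least squares stands in for whatever model you would actually choose.
- The helper names make_population, fit_line, and semma are hypothetical.

```python
import random
import statistics


def make_population(n=1000, seed=42):
    """Hypothetical data: y is roughly 2*x + 1 plus noise; ~5% of y values are missing (None)."""
    rng = random.Random(seed)
    rows = []
    for x in range(n):
        y = 2 * x + 1 + rng.gauss(0, 1)
        rows.append((x, None if rng.random() < 0.05 else y))
    return rows


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x (a stand-in model)."""
    mx = statistics.fmean(x for x, _ in points)
    my = statistics.fmean(y for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def semma(population, sample_size=200, seed=0):
    rng = random.Random(seed)

    # Sample: a random subset big enough to be representative, small enough to handle
    sample = rng.sample(population, sample_size)

    # Explore: count missing values and summarize the values that are present
    present = [y for _, y in sample if y is not None]
    print(f"{len(sample)} rows, {len(sample) - len(present)} missing y, "
          f"mean y = {statistics.fmean(present):.1f}")

    # Modify: eliminate unusable records (here, rows with a missing y)
    clean = [(x, y) for x, y in sample if y is not None]

    # Model: fit on half of the cleaned sample, holding the other half back
    half = len(clean) // 2
    train, holdout = clean[:half], clean[half:]
    intercept, slope = fit_line(train)

    # Assess: mean absolute error on data the model has not seen
    mae = statistics.fmean(abs(y - (intercept + slope * x)) for x, y in holdout)
    return intercept, slope, mae


intercept, slope, mae = semma(make_population())
print(f"y = {intercept:.2f} + {slope:.3f} * x, holdout MAE = {mae:.2f}")
```

In practice, a poor Assess result would send you back to Explore or Modify rather than straight to a new model. The single pass here is only the simplest case.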

According to the SAS website: “SEMMA is not a data mining methodology but rather a logical organisation of the functional tool set of SAS Enterprise Miner for carrying out the core tasks of data mining. Enterprise Miner can be used as part of any iterative data mining methodology adopted by the client. Naturally steps such as formulating a well defined business or research problem and assembling quality representative data sources are critical to the overall success of any data mining project. SEMMA is focused on the model development aspects of data mining.”

This is a good summary of some of the differences between CRISP-DM and SEMMA. First, SEMMA was developed with a specific data mining package (Enterprise Miner) in mind, rather than for a broad range of data mining tools and the general business environment. Because it focuses on Enterprise Miner and on model development specifically, it places less emphasis on the initial planning phases of CRISP-DM (Business Understanding and Data Understanding). It also omits the Deployment phase entirely.

That said, there are some similarities as well. The Sample and Explore stages of SEMMA roughly correspond to the Data Understanding phase of CRISP-DM, and Modify translates to the Data Preparation phase. Model maps to the Modeling phase, and Assess parallels the Evaluation phase. Both models are also intended to be somewhat cyclical rather than linear. SEMMA recommends returning to the Explore stage when information that emerges in later stages calls for changes to the data. CRISP-DM likewise treats data mining as a non-linear, adaptive process.
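The rough correspondence described in the paragraph above can be written as a small lookup table. Checking which CRISP-DM phases no SEMMA stage maps to makes the gaps explicit. This is an illustrative sketch; the variable names are arbitrary, and the mapping is only the approximate one given in this post.

```python
# The six CRISP-DM phases, in order
CRISP_DM_PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Approximate SEMMA -> CRISP-DM correspondence described in this post
SEMMA_TO_CRISP = {
    "Sample": "Data Understanding",
    "Explore": "Data Understanding",
    "Modify": "Data Preparation",
    "Model": "Modeling",
    "Assess": "Evaluation",
}

# CRISP-DM phases with no SEMMA counterpart
uncovered = [p for p in CRISP_DM_PHASES if p not in SEMMA_TO_CRISP.values()]
print(uncovered)  # ['Business Understanding', 'Deployment']
```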

Spotlight on Support.SAS.com: SAS TALKS

The SAS website has so much information that it can be overwhelming at times, so I want to occasionally highlight features that I have found helpful in the past. In this case, the feature I want to highlight is the SAS TALKS free webinar series.

These free, 50-minute live webinars are offered monthly and cover a variety of topics. This month’s featured webinar, on January 27th, is a presentation on Enterprise Guide 4.3. Next month, on February 24th, a SAS Online Customer Support Specialist presents Tips and Tricks for Finding SAS Information.

That’s not all, though – the site also hosts “On-Demand TALKS,” which are recorded webinars on past topics ranging from specific procedures such as Proc Report and Proc Format to ODS graphics and data models. You can listen to these On-Demand webinars any time you want.

On this page, you can also find a link to Vince DelGobbo’s presentation: An Introduction to Creating Multi-Sheet Microsoft Excel Workbooks the Easy Way with SAS. I’ve seen variations on this talk at HASUG and NESUG, and I was pleased to find a link to it here.

The Cheapskate’s Guide to SAS Certification

We’re all setting our annual performance objectives (APOs) right now and thinking about our goals for the coming year. For many of us, our professional goals include increasing our SAS knowledge and possibly even getting SAS certified. If you’re new to SAS and want the full training package from scratch, you may have googled the nearest SAS training facility and created a wish list that looks something like this:

Training for Base Programming Exam:
SAS Programming I: Essentials: $1500
SAS Programming II: Data Manipulation Techniques: $1800
OR
SAS Certification Review: Base Programming for SAS 9 (if you’re more advanced and would just like a review): $1000
Base practice exam: $52
Base certification exam: $180

Training for Advanced Programming Exam:
SAS Programming III: Advanced Techniques and Efficiencies: $1800
SAS Macro Language I: Essentials: $1200
SAS SQL I: Essentials: $1200
Advanced practice exam: $52
Advanced certification exam: $180

SAS also offers certification packages that bundle classroom training with practice exams and exam vouchers, which can reduce your total cost. If you’re willing to substitute online courses for live classroom training, you’ll save even more.

But let’s say you want the deluxe classroom experience, and you’re doing the whole thing à la carte. For Base training plus certification (taking Programming I and II rather than the review course), your total cost could be as much as $3532. For Advanced training plus certification, you could spend $4432. For both, that’s $7964, which is close to the yearly tuition reimbursement cap that my employer offers for grad school.
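The totals above follow directly from the price list. A quick Python tally makes the arithmetic explicit and shows how much the review-course route trims from the Base total. The prices are the ones listed in this post at the time of writing, and the dictionary names are just for illustration.

```python
# Prices as listed in this post (classroom courses plus exams, bought a la carte)
BASE = {
    "SAS Programming I: Essentials": 1500,
    "SAS Programming II: Data Manipulation Techniques": 1800,
    "Base practice exam": 52,
    "Base certification exam": 180,
}
BASE_REVIEW_ROUTE = {
    "SAS Certification Review: Base Programming for SAS 9": 1000,
    "Base practice exam": 52,
    "Base certification exam": 180,
}
ADVANCED = {
    "SAS Programming III: Advanced Techniques and Efficiencies": 1800,
    "SAS Macro Language I: Essentials": 1200,
    "SAS SQL I: Essentials": 1200,
    "Advanced practice exam": 52,
    "Advanced certification exam": 180,
}

base_total = sum(BASE.values())
base_review_total = sum(BASE_REVIEW_ROUTE.values())
advanced_total = sum(ADVANCED.values())

print(f"Base (Programming I + II): ${base_total}")             # $3532
print(f"Base (review course instead): ${base_review_total}")  # $1232
print(f"Advanced: ${advanced_total}")                          # $4432
print(f"Both, full classroom route: ${base_total + advanced_total}")  # $7964
```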

So what happens if you take your list to your manager, and she says it’s not in the budget for this year? You have two options:

1. Throw up your hands and use this as an excuse not to learn SAS.
2. Decide to find another way to learn SAS.

If you go with option 2, you’ll have to cover the costs yourself, so you’ll want the most cost-effective route possible. It takes more work on your part, but with a little creativity and the willpower to self-study, you can get the training itself for free.

Step 1: Get Free Training

My employer’s intranet site links to Books24x7, which gives employees free access to all kinds of training materials; check whether your employer offers a similar resource. Books24x7 includes both SAS Certification Prep Guides:

SAS Certification Prep Guide: Base Programming for SAS 9
SAS Certification Prep Guide: Advanced Programming for SAS 9

This is a great value, since each prep guide has a list price of $129. If you insist on having a paper copy, look for deals on eBay, Half.com, or Amazon. Work through the practice quizzes at the end of each chapter, and open the code editor window in SAS to practice different functions and procedures on your own.

You can supplement these guides with other free SAS reference materials on support.sas.com (a huge library of PDF files) or Books24x7. If manuals aren’t enough for you, take advantage of your local SAS user group meetings and explore the online SAS community. Either way, it is entirely possible to use these techniques and pass your certification exam without paying for any classroom instruction.

Step 2: Take the Exam

Each certification exam costs $180 at the time of this posting. If you’ve learned SAS through self-study, you’ve already saved your department a lot of money and made yourself a more valuable employee. You might not be able to get $4000 out of your department’s training budget, but you might be able to get the $180 exam fee reimbursed if you ask nicely.

If you can’t get your department to fork out the $180 for each exam, you can either choose to forgo certification (although I find having a concrete goal at the end of the learning process helps me to stay motivated), or you can pay it yourself and still get certified for a fraction of the original $7964.