
HASUG Meeting Notes: December 2012

HASUG’s 4th quarter meeting, featuring speakers Kevin Viel and Vinodh Paida, took place at Boehringer Ingelheim in Danbury, CT. PharmaSUG speaker Kevin Viel led with “Using the SAS System as a Bioinformatics Tool: A Macro That Calls the Standalone BLAST Setup”. Before sharing the macro, Viel gave some background on genomics and BLAST. A genome is all the genetic information of an organism; the human genome is the complete DNA of a person. DNA is a nucleic acid formed by a chain of nucleotides, each carrying one of four bases (adenine, cytosine, guanine, and thymine), represented by A, C, G, and T, with N used for an unknown base. We are interested in the nucleotide sequences of DNA fragments (for example, AAAGTCTGAC), which can be used to identify genetic diseases in an individual or to find evolutionary relationships. Viel discussed four types of simple variations that can occur within a given nucleotide sequence: substitution (AAAGTCTGAC vs. AAACTCCGAC), insertion (AAACTCCGAC vs. AAACTGCCGAC), deletion (AAAGTCTGAC vs. AAGTCTGAC), and inversion (AAAGTCTGAC vs. AAATGCTGAC).

Looking for similar sequences manually is a tedious, time-intensive process that is prone to transcription errors. As an alternative, Viel discussed using regular expressions in SAS to look for matching sequences while allowing for one mismatching character, such as a single nucleotide substitution in a strand. He then introduced a SAS macro that calls BLAST, a sequence similarity tool from NCBI (the National Center for Biotechnology Information) that can be downloaded or used interactively on the web. NCBI’s website defines the tool as follows: “The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.” Viel also described how to set up BLAST on a Windows PC and configure the environment variables the program needs to run successfully.
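The regular-expression idea can be sketched in SAS as follows. This is a minimal illustration under stated assumptions, not Viel's code: the dataset `fragments`, the motif `GTCTGAC`, and the macro variable `mm_pattern` are all hypothetical. The pattern contains one alternative per position of the motif with that position replaced by `.`, so it matches the motif with at most one substituted base; insertions and deletions are not covered.

```sas
/* Hypothetical input: a few DNA fragments to search */
data fragments;
  length id $8 seq $40;
  input id $ seq $;
  datalines;
frag1 AAAGTCTGAC
frag2 AAACTCTGAC
frag3 AAACTCCGAC
frag4 TTTTTTTTTT
;
run;

/* Hypothetical motif to look for */
%let motif = GTCTGAC;

/* Build /.TCTGAC|G.CTGAC|...|GTCTGA./ : one alternative per position,
   with that position replaced by the wildcard "." */
data _null_;
  length pattern $500 alt $20;
  do i = 1 to length("&motif");
    alt = "&motif";
    substr(alt, i, 1) = '.';
    pattern = catx('|', pattern, alt);  /* CATX skips the initially blank PATTERN */
  end;
  call symputx('mm_pattern', cats('/', pattern, '/'));
run;

/* Flag fragments containing the motif with at most one mismatch.
   TRIM keeps "." from matching the trailing blanks of SEQ. */
data one_mismatch;
  set fragments;
  pos = prxmatch("&mm_pattern", trim(seq));  /* 0 = no match, else start position */
  hit = (pos > 0);
run;

proc print data=one_mismatch;
  var id seq pos hit;
run;
```

With this sample data, frag1 (exact match) and frag2 (one substitution) are flagged at position 4, while frag3, which differs from the motif in two positions, is not.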

Following Viel’s presentation, Vinodh Paida of Accenture/Octagon shared “Data Edit Checks Integration Using ODS Tagset”, applicable to SAS versions 9.1.3 or higher. Although the paper was written specifically for clinical trials data and reporting, it generalizes easily to other types of data and domains. First, Paida summarized five types of commonly encountered data issues in clinical trials, centering on invalid dates and missing data: partial dosing start and stop dates (checked for with the length function), future dates, subjects with final summary data but a missing stop date, adverse events with missing terms, and lab data with available results but missing units.
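A few of these checks might look like the DATA steps below. This is a sketch under assumptions, not Paida's code: the SDTM-style dataset and variable names (`ex`, `exstdtc`, `exendtc`, `ae`, `aeterm`, `lb`, `lborres`, `lborresu`) and the ISO 8601 `YYYY-MM-DD` date layout are assumed. The sketch uses LENGTHN rather than LENGTH, because LENGTH returns 1 for a blank string and would flag missing dates as partial.

```sas
/* Hypothetical dosing data with ISO 8601 character dates */
data ex;
  length usubjid $6 exstdtc exendtc $10;
  input usubjid $ exstdtc $ exendtc $;
  datalines;
S001 2012-01-15 2012-03-01
S002 2012-02 2012-04-10
S003 2012-02-01 2099-01-01
;
run;

/* Check 1: partial dates (present but shorter than YYYY-MM-DD) */
data chk_partial;
  set ex;
  if 0 < lengthn(exstdtc) < 10 or 0 < lengthn(exendtc) < 10;
run;

/* Check 2: future dates; ?? suppresses notes for partial or invalid values,
   which become missing and so never compare greater than TODAY() */
data chk_future;
  set ex;
  if input(exstdtc, ?? yymmdd10.) > today()
     or input(exendtc, ?? yymmdd10.) > today();
run;

/* Hypothetical adverse event and lab data */
data ae;
  length usubjid $6 aeterm $40;
  usubjid = 'S001'; aeterm = 'Headache'; output;
  usubjid = 'S002'; aeterm = ' ';        output;
run;

data lb;
  length usubjid $6 lborres $20 lborresu $10;
  usubjid = 'S001'; lborres = '5.4'; lborresu = 'mmol/L'; output;
  usubjid = 'S002'; lborres = '140'; lborresu = ' ';      output;
run;

/* Check 3: adverse events with missing terms */
data chk_ae_term;
  set ae;
  if missing(aeterm);
run;

/* Check 4: lab results present but units missing */
data chk_lab_unit;
  set lb;
  if not missing(lborres) and missing(lborresu);
run;
```

Each check writes its problem records to its own dataset, which keeps the later export step simple: one dataset per sheet.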

His SAS code contained a block of edit checks for each scenario, followed by a macro that creates a multi-sheet Excel workbook with a TOC sheet listing the selected edit checks along with their descriptions and sheet names. The problem records for each edit check are then written to separate sheets of the workbook. The code is flexible enough to let the user select which edit checks to output to Excel. This presentation reminded me of an earlier HASUG presentation that inspired my post on how to create a data dictionary in Excel.
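A stripped-down version of the workbook step could use the ExcelXP tagset with one sheet per procedure. This is a hypothetical sketch, not the macro from the paper: the macro `%export_checks`, the `toc` dataset, and the two problem-record datasets are invented stand-ins for the output of the edit checks. The tagset writes SpreadsheetML (`.xml`), which Excel opens as a multi-sheet workbook.

```sas
/* Hypothetical problem-record datasets from earlier edit checks */
data chk_ae_term;
  length usubjid $6 aeterm $40;
  usubjid = 'S002'; aeterm = ' '; output;
run;

data chk_lab_unit;
  length usubjid $6 lborres $20 lborresu $10;
  usubjid = 'S002'; lborres = '140'; lborresu = ' '; output;
run;

/* Hypothetical TOC: one row per edit check, with description and sheet name */
data toc;
  length check $32 description $60 sheet $31;
  check = 'CHK_AE_TERM';  description = 'Adverse events with missing terms';
  sheet = 'AE missing term';  output;
  check = 'CHK_LAB_UNIT'; description = 'Lab results with missing units';
  sheet = 'Lab missing unit'; output;
run;

%macro export_checks(file=, checks=);
  %local i chk qlist sheet;

  /* Build a quoted, comma-separated list of the selected checks */
  %do i = 1 %to %sysfunc(countw(&checks));
    %let chk = %upcase(%scan(&checks, &i));
    %if &i = 1 %then %let qlist = "&chk";
    %else %let qlist = &qlist, "&chk";
  %end;

  ods listing close;
  ods tagsets.excelxp file="&file"
      options(sheet_interval='proc' sheet_name='TOC');

  /* First sheet: TOC restricted to the selected checks */
  proc print data=toc noobs;
    where upcase(check) in (&qlist);
  run;

  /* One sheet per selected check, named from the TOC */
  %do i = 1 %to %sysfunc(countw(&checks));
    %let chk = %upcase(%scan(&checks, &i));
    data _null_;
      set toc;
      where upcase(check) = "&chk";
      call symputx('sheet', sheet);
    run;
    ods tagsets.excelxp options(sheet_name="&sheet");
    proc print data=&chk noobs;
    run;
  %end;

  ods tagsets.excelxp close;
  ods listing;
%mend export_checks;

%export_checks(file=edit_checks.xml, checks=chk_ae_term chk_lab_unit)
```

Passing a shorter list in `checks=` produces a workbook containing only those sheets plus a matching TOC.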

What is SAS Visual Analytics?

SAS Visual Analytics was highlighted at this year’s Global Forum opening session as one of the biggest developments for SAS in recent memory. In essence, it is a powerful data visualization tool that uses a high-performance SAS LASR Analytic Server and a distributed computing environment to speed up and automate the data exploration and model development process, and it adds a web-based, interactive user interface. Most users do not have access to this product yet since it is so new, but it can be worthwhile to know about the analytic products currently available in the industry.

According to SAS, Visual Analytics allows users to:

  • Visually explore huge amounts of data extremely quickly
  • Execute analytic correlations in seconds
  • Deliver results quickly wherever needed (Visual Analytics supports web reports and mobile devices such as the iPad).

Users wishing to learn more about this new product offering from SAS can read about key features, system requirements, and access both screenshots and demos through the SAS Visual Analytics site: www.sas.com/technologies/bi/visual-analytics.html.

How to View Remote Server Files on Your PC

Many of us create and store files on remote servers. One way to view the files you create is to use Proc Download to copy them to your local libraries, but downloading large files can be time-consuming. Fortunately, there is an easier way.

First, I connect to the server and rsubmit a libname statement with the path to the library’s location on the server; then I submit a second libname statement locally using the same libref (this uses SAS/CONNECT Remote Library Services). After submitting the following code, I can open the SAS Explorer window, navigate to Libraries, and browse the library on the remote server.

rsubmit;
libname WHP "/projects/hedis/hedis/whp"; /*assign the library on the remote server*/
endrsubmit;
/*submit locally*/
libname WHP server=remoteservername; /*view remote server files on local machine*/

In-Database Processing with SAS 9.2

There are some new SAS In-Database processing features available with version 9.2. In addition to the explicit SQL pass-through facility in Proc SQL, SAS has expanded its ability to generate DBMS-specific, or “native”, SQL for Proc SQL and even for non-SQL procedures, so that more processing happens inside the native database and I/O is minimized. Seven non-SQL procedures are now supported in Oracle and DB2: Proc Freq, Proc Means, Proc Summary, Proc Tabulate, Proc Report, Proc Rank, and Proc Sort (additional procs are supported in Teradata). The SASTRACE (see previous post) and SASTRACELOC system options write additional output to the log. Use the DIRECT_SQL= LIBNAME option and the SQLGENERATION= system option to enable and disable in-database processing, and try comparing the log output with and without in-database processing.
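To try that comparison, something like the following could be run against an Oracle library. It is a sketch that assumes SAS/ACCESS Interface to Oracle is licensed; the libref `ora`, the connection values, the table `members`, and the column `plan_type` are all placeholders. SASTRACE=',,,d' with SASTRACELOC=SASLOG sends the SQL that SAS/ACCESS passes to the DBMS to the SAS log.

```sas
/* Write the SQL sent to the DBMS into the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Placeholder Oracle connection: replace user, password, path, and schema */
libname ora oracle user=myuser password=mypass path=mydb schema=claims;

/* Run 1: let SAS generate native SQL so PROC FREQ runs in the database */
options sqlgeneration=dbms;
proc freq data=ora.members;
  tables plan_type;
run;

/* Run 2: turn in-database processing off; the rows are read into SAS instead */
options sqlgeneration=none;
proc freq data=ora.members;
  tables plan_type;
run;
```

In the first run the log should show the generated SQL doing the counting in Oracle; in the second, a plain SELECT that pulls the rows back to SAS.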

BASUG Meeting Notes: September 2012

I attended the third quarter BASUG (Boston Area SAS User Group) meeting in Cambridge, MA at the Microsoft NERD Center on September 20th. Morning speakers included Craig Dickstein of Tamarack Professional Services and Paul Gorrell of IMPAQ International. Craig Dickstein is one of the authors of Health Care Data and SAS and has worked with Cigna as a HEDIS code reviewer. Paul Gorrell also led an afternoon training on using SAS to analyze publicly available healthcare data sets.

The entire day focused on healthcare data, with the following presentations: “Data Hygiene Routines for Administrative Healthcare Data”, “Calculating the Hospital Readmission Interval”, and “Using SAS to Generate Estimates of U.S. Prescription Drug Cost and Use”. Dickstein’s presentation on data hygiene routines included a useful overview of the structure of ICD-9 diagnosis codes, CPT codes (categories I-III), and HCPCS procedure codes. His code samples demonstrated how to create procedure code lookup tables with Proc Format and use them to identify bad values. His second presentation described the challenges of calculating readmission intervals and presented some alternatives to the LAG function, including a detailed discussion of how the Program Data Vector (PDV) works in SAS. Finally, Paul Gorrell showed how to replicate commonly cited statistics such as “5% of Americans make up 50% of U.S. health care spending” by using SAS/STAT survey procedures such as Proc Surveyfreq and Proc Surveymeans with the AHRQ data files from the Medical Expenditure Panel Survey (MEPS). The next quarterly meeting is scheduled for December 11th, 2012.
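Two of these ideas can be sketched briefly. These are illustrations, not Dickstein's code: the format `$validpx`, the three codes in it, and the datasets `claims` and `stays` are hypothetical. The first part uses a character format as a lookup table to flag invalid procedure codes; in practice the format would more likely be built from a reference table with PROC FORMAT's CNTLIN= option. The second part computes days between a discharge and the next admission with RETAIN and BY-group processing, one alternative to LAG, whose queue is only updated when the LAG call actually executes (a common trap when it is called conditionally).

```sas
/* Part 1: lookup table as a format; anything not listed maps to INVALID */
proc format;
  value $validpx
    '99213', '99214', '99215' = 'VALID'
    other = 'INVALID';
run;

data claims;
  length member_id $6 proc_cd $5;
  input member_id $ proc_cd $;
  datalines;
M001 99213
M002 9921X
M003 99215
;
run;

data bad_codes;
  set claims;
  if put(proc_cd, $validpx.) = 'INVALID';
run;

/* Part 2: readmission interval without LAG */
data stays;
  length member_id $6;
  input member_id $ admit_dt :yymmdd10. disch_dt :yymmdd10.;
  format admit_dt disch_dt yymmdd10.;
  datalines;
M001 2012-01-05 2012-01-10
M001 2012-01-25 2012-01-28
M001 2012-06-01 2012-06-03
M002 2012-03-01 2012-03-04
;
run;

proc sort data=stays;
  by member_id admit_dt;
run;

data readmit;
  set stays;
  by member_id;
  retain prev_disch;                  /* keeps its value across PDV iterations */
  format prev_disch yymmdd10.;
  if first.member_id then prev_disch = .;   /* no prior stay for a new member */
  days_since_disch = admit_dt - prev_disch; /* missing on a member's first stay */
  readmit_30 = (0 <= days_since_disch <= 30);
  prev_disch = disch_dt;              /* carry this discharge to the next stay */
run;
```

Here `bad_codes` holds only M002, and M001's second stay comes 15 days after the first discharge, so it is flagged as a 30-day readmission.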