
How to Use ODS to Zip Files in SAS 9.2

Starting with SAS version 9.2, you can use ODS to create zip files in SAS without having to use platform-specific commands (such as Unix commands if your files are on a Unix server). I used ODS to zip a file on my C drive in four steps:

  1. First, I told SAS that I wanted to create a new package. You don’t need to name your package unless you are going to create multiple packages:
    ods package open nopf;
  2. Next, I added my file to the package. You can add multiple files, but I recommend zipping only a single file at a time. Also note that the path must include the file extension: if you leave it off, your SAS program will run with no errors, but the resulting zip file will be empty:
    ods package add file='C:\Users\C31497\Desktop\projects\h138.sas7bdat';
  3. Then I published the zip file and gave it a name. The zip file appears by default in the same path as the original file, but you can specify a different path using the archive_path argument when you assign the properties. If there is an existing zip file with the same name, it will be overwritten:
    ods package publish archive properties(archive_name='h138.zip');
  4. Finally, don’t forget to close the ODS package after you’re finished. You may also wish to delete your original (unzipped) file, since this is not done automatically after the zip file is created:
    ods package close;

Here’s what it looks like when you put it together:

ods package open nopf;
ods package add file='C:\Users\C31497\Desktop\projects\h138.sas7bdat';
ods package publish archive properties(archive_name='h138.zip');
ods package close;
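
As a variation on the same four steps, here is a rough sketch that names the package, writes the zip to a different folder with archive_path, and then deletes the original file. The package name mypkg and the folder C:\temp\zips\ are made up for illustration (the folder is assumed to exist already), and the cleanup step is just one way to do it, so treat this as a starting point rather than a tested recipe:

```sas
/* Named package (mypkg is a hypothetical name), useful if several are open at once */
ods package(mypkg) open nopf;
ods package(mypkg) add file='C:\Users\C31497\Desktop\projects\h138.sas7bdat';

/* archive_path sends the zip to another folder (hypothetical, assumed to exist) */
ods package(mypkg) publish archive
    properties(archive_name='h138.zip' archive_path='C:\temp\zips\');
ods package(mypkg) close;

/* Optional cleanup: delete the original only if the zip file was actually written */
data _null_;
    length fref $8;
    if fileexist('C:\temp\zips\h138.zip') then do;
        /* fref is blank, so SAS generates a temporary fileref for the original file */
        rc = filename(fref, 'C:\Users\C31497\Desktop\projects\h138.sas7bdat');
        if rc = 0 and fexist(fref) then rc = fdelete(fref);
        rc = filename(fref);   /* release the temporary fileref */
    end;
run;
```

Because an empty zip file is created without any error, it is worth opening the archive once before relying on the delete step.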

There is still no way to unzip your files without resorting to platform-specific syntax, so hopefully SAS is working on that for a future version.

HASUG Meeting Notes: December 2012

HASUG’s 4th quarter meeting, featuring speakers Kevin Viel and Vinodh Paida, took place at Boehringer Ingelheim in Danbury, CT. PharmaSUG speaker Kevin Viel led with “Using the SAS System as a Bioinformatics Tool: A Macro That Calls the Standalone BLAST Setup”. Before sharing the macro, Viel gave some background on genomics and BLAST. A genome is all the genetic information of an organism; the human genome is the complete DNA of an individual person. DNA is a nucleic acid formed by a chain of nucleotides, each carrying one of four possible bases (adenine, cytosine, guanine, and thymine), represented by A, C, G, and T, with N standing for an unknown base. We are interested in the nucleotide sequences of DNA fragments (for example, AAAGTCTGAC), which can be used to identify genetic diseases in an individual or to find evolutionary relationships. Viel discussed four types of simple variations that can occur within a given nucleotide sequence: single-nucleotide substitution (AAAGTCTGAC vs. AAACTCCGAC, which differ at the fourth and seventh positions), insertion (AAACTCCGAC vs. AAACTGCCGAC), deletion (AAAGTCTGAC vs. AAGTCTGAC), and inversion (AAAGTCTGAC vs. AAATGCTGAC).

Looking for similar sequences manually is a tedious, time-intensive process that is prone to transcription errors. As an alternative, Viel discussed using regular expressions in SAS to look for matching sequences while allowing for one mismatching character, such as a single nucleotide substitution in a strand. He then introduced a SAS macro to call BLAST, a sequence similarity tool from NCBI which can be downloaded or used interactively on the web. NCBI’s website defines the tool as follows: “The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.” Viel also described how to set up BLAST on a Windows PC and configure the environment variables the program needs to run successfully.
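
To give a feel for the regular expression idea, here is a minimal sketch of my own (not Viel’s code) that builds a pattern tolerating one mismatched character anywhere in a 10-base target. The target and the test fragments are taken from the examples above, and the variable names are just placeholders:

```sas
data _null_;
    length target $10 branch $10 pattern $200 seq $20;
    target = 'AAAGTCTGAC';
    pattern = '';

    /* Build one alternation branch per position, with that position
       replaced by "." so that any single character is accepted there */
    do i = 1 to length(target);
        branch = target;
        substr(branch, i, 1) = '.';
        pattern = catx('|', pattern, branch);
    end;
    re = prxparse(cats('/', pattern, '/'));

    /* Test fragments: exact copy, one substitution, two substitutions */
    do seq = 'AAAGTCTGAC', 'AAACTCTGAC', 'AAACTCCGAC';
        hit = prxmatch(re, seq) > 0;
        put seq= hit=;
    end;
run;
```

The first two fragments should be reported as hits and the third should not, since it differs from the target at two positions. Insertions and deletions would need a different pattern (or a tool like BLAST), which is part of the motivation for the macro.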

Following Viel’s presentation, Vinodh Paida of Accenture/Octagon shared “Data Edit Checks Integration Using ODS Tagset”, applicable to SAS versions 9.1.3 or higher. Although the paper was written specifically with regard to clinical trials data and reporting, it generalizes easily to other types of data and domains. First, Paida summarized five types of commonly encountered data issues centering around invalid dates and missing data in clinical trials: partial dosing start and stop dates (checked for with the length function), future dates, subjects with final summary data but a missing stop date, adverse events with missing terms, and lab data with missing units but available results.
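
The date checks are easy to picture with a small example. The sketch below is my own illustration, not Paida’s code: it assumes dosing dates stored as ISO 8601 text (YYYY-MM-DD) in hypothetical variables startdtc and stopdtc, and the four records are invented:

```sas
/* Invented dosing records; a period stands for a missing value */
data dosing;
    length subjid $4 startdtc stopdtc $10;
    input subjid $ startdtc $ stopdtc $;
    datalines;
S001 2012-03-15 2012-04-15
S002 2012-03 2012-04-20
S003 2012-05-01 2999-01-01
S004 2012-06-10 .
;
run;

/* One output row per problem date */
data date_issues;
    set dosing;
    length variable $8 issue $20;
    array dates{2} startdtc stopdtc;
    do i = 1 to dim(dates);
        variable = vname(dates{i});
        if missing(dates{i}) then issue = 'Missing date';
        /* A complete YYYY-MM-DD value is 10 characters long */
        else if length(dates{i}) < 10 then issue = 'Partial date';
        /* ?? suppresses log errors for values that cannot be read as dates */
        else if input(dates{i}, ?? yymmdd10.) > today() then issue = 'Future date';
        else issue = '';
        if not missing(issue) then output;
    end;
    drop i;
run;

proc print data=date_issues noobs;
run;
```

The missing-value test comes first because the length function returns 1 for a blank character value, which would otherwise be flagged as a partial date.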

His SAS code contained blocks of edit checks for each scenario, followed by a macro to create a multi-sheet Excel workbook including a table of contents (TOC) listing the selected edit checks, along with corresponding descriptions and sheet names. Problem records for each edit check are then output to separate sheets of the workbook. The code is flexible enough to let the user select which edit checks to output to Excel. This presentation reminded me of an earlier HASUG presentation which inspired my post on how to create a data dictionary in Excel.
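
Here is a rough sketch of how a workbook like that could be put together. I am assuming the ExcelXP tagset (the notes above only say “ODS Tagset”), and the macro name add_sheet, the data sets, and the output file name are all hypothetical:

```sas
/* Hypothetical table of contents: one row per selected edit check */
data toc;
    length sheet $20 description $60;
    infile datalines dlm='|';
    input sheet $ description $;
    datalines;
Partial Dates|Dosing dates shorter than a full YYYY-MM-DD value
Future Dates|Dates later than the run date
;
run;

/* Invented problem records for each check */
data partial_dates;
    length subjid $4 startdtc $10;
    input subjid $ startdtc $;
    datalines;
S002 2012-03
;
run;

data future_dates;
    length subjid $4 stopdtc $10;
    input subjid $ stopdtc $;
    datalines;
S003 2999-01-01
;
run;

/* Name the next worksheet, then print one data set to it */
%macro add_sheet(ds=, sheet=);
    ods tagsets.excelxp options(sheet_name="&sheet");
    proc print data=&ds noobs;
    run;
%mend add_sheet;

/* The tagset writes XML that Excel opens as a multi-sheet workbook */
ods tagsets.excelxp file='edit_checks.xml';
%add_sheet(ds=toc, sheet=TOC)
%add_sheet(ds=partial_dates, sheet=Partial Dates)
%add_sheet(ds=future_dates, sheet=Future Dates)
ods tagsets.excelxp close;
```

Letting the user pick which checks to include then comes down to which add_sheet calls are issued and which rows go into the TOC data set.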

Book Review: Web Development with SAS by Example

Title: Web Development with SAS by Example, 3rd edition
Author: Frederick Pratter
Publisher: SAS Publishing
Pages: 354
Available: September 2011

This ambitious volume covers how to deliver your SAS output online from start to finish in a mere 360 pages. Pratter assumes his audience has no prior knowledge of web programming, giving a thorough introduction in his first four chapters to the basics of HTML and XML, static vs. dynamic web pages, and how the internet works, along with some background history on TCP/IP, different types of web servers, and a whole host of acronyms. Chapters 5 and 6 in Part II outline different ways to access your data, focusing on SAS/SHARE and SAS/ACCESS, with examples of how to use SQL pass-through for both and information to help the reader select an appropriate method of access. I found the section on OLE DB/ODBC here interesting as well. Part III goes on to introduce SAS/IntrNet, Part IV devotes five chapters to SAS BI Server, and the book concludes with some Java.

One of the strengths of this book is that Pratter shows multiple ways of displaying and accessing the same data throughout, for example contrasting various “old school” programming methods with ODS HTML statements, and PROC ACCESS with the newer SAS/ACCESS interface. Such examples demonstrate how SAS has evolved since its earlier versions and may be of interest to both experienced and newer programmers. A challenge of this book is that many SAS users are not familiar with administrative aspects such as server configurations, including TCP, and may find some of this material harder to understand.