Simple Random Sampling with Proc SurveySelect

A common way to create a random sample of n=1000 in SAS is to generate a random number field for each observation using RANUNI or a similar function. The data set is then sorted on that field and the top 1000 selected as the sample.

PROC SURVEYSELECT offers a simple alternative with just a few lines of code:

proc surveyselect data=Customers
method=srs n=1000 out=SampleSRS;
run;

The METHOD statement set equal to “srs” indicates that simple random sampling will be used. DATA= specifies the input data set, while OUT= specifies the output data set. N= is used to set the sample size, and an optional SEED= statement can be used if a particular seed is desired for generating the random number; otherwise the seed will default to the time of day from the computer clock. Default output will include the seed, selection probability, and sampling weight for each observation.

Alternatively, if you want to get a little fancier and play around with various sample sizes for different markets, I like to use macro variables when setting some of the parameters:

proc surveyselect data=work.total_elig_pop_&mkt
method=srs n=&size out=ci_share.sample_&mkt._new;
run;

There are many other options available with PROC SURVEYSELECT which you can use for more complex sampling. Additional SAS survey procedures for analyzing data created using complex sampling methods are discussed in one of my conference papers. For more about macro variables and how they make your code easier to maintain, see 10 Steps to Easier SAS Code Maintenance.

Granting SELECT Access to Your Tables

If you have created a table in your own personal schema in an Oracle database, others do not have permissions to access that object. You may at some point wish to allow SELECT access to another user. To do this in SAS, you can use PROC DBLOAD as follows:

proc dbload dbms=odbc;
user=’yourusername‘;
password=’password‘;
dsn=database; /* this is the name of your ODBC Connection to the database. */
sql grant select on yourschema.yourtablename to anotherusername;
run;

Or you can use explicit pass-through syntax to grant access:

proc sql;
connect to oracle (user=yourusername password=password);
execute (grant select on yourusername.yourtablename
to anotherusername) by oracle;
disconnect from oracle;
quit;

Finally, you can do this directly in the Oracle database itself just by executing the following statement which is passed through in the syntax above:

grant select on yourusername.yourtablename to anotherusername;

HASUG Meeting Notes: December 2012

HASUG’s 4th quarter meeting, featuring speakers Kevin Viel and Vinodh Paida, took place at Boehringer Ingelheim in Danbury, CT. PharmaSUG speaker Kevin Viel led with “Using the SAS System as a Bioinformatics Tool: A Macro That Calls the Standalone BLAST Setup”. Before sharing the macro, Viel began with some background on genomics and BLAST. A genome is all the genetic information about an organism; the human genome is the complete DNA of an individual person. DNA is a nucleic acid formed by a chain of nucleotides. These nucleotides are four possible bases (adenine, cytosine, guanine, and thymine) represented by A,C,T,G, or N for unknown. We are interested in the nucleotide sequences of DNA fragments (for example, AAAGTCTGAC), which can be used to identify genetic diseases in an individual or to find evolutionary relationships. Viel discussed four types of simple variations that can occur within a given nucleotide sequence: single substitution (AAAGTCTGAC vs. AAACTCCGAC), insertion (AAACTGCCGAC), deletion (AAAGTCTGAC vs. AAGTCTGAC), or inversion (AAAGTCTGAC vs. AAATGCTGAC).

Looking for similar sequences manually is a tedious, time-intensive process which can involve transcription errors. As an alternative, Viel discussed using regular expressions in SAS to look for matching sequences, allowing for one mismatching character such as a single nucleotide substitution in a strand. He then introduced a SAS macro to call BLAST, a sequence similarity tool from NCBI which can be downloaded or used interactively on the web. NCBI’s website defines the tool as follows: “The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.” Viel also described how to set up BLAST for Windows PC and configure the necessary environmental variables for the program to run successfully.

Following Viel’s presentation, Vinodh Paida of Accenture/Octagon shared “Data Edit Checks Integration Using ODS Tagset”, applicable to SAS versions 9.1.3 or higher. Although the paper was written specifically with regard to clinical trials data and reporting, it can generalize easily to other types of data and domains. First Paida summarized five types of commonly encountered data issues centering around invalid dates and missing data in clinical trials: partial dosing start and stop dates (checked for with the length function), future dates, subject with final summary data but missing stop date, adverse events with missing terms, and lab data with missing units but available results.

His SAS code contained blocks of edit checks for each scenario, followed by a macro to create a multi-sheet Excel workbook including a TOC listing with the selected edit checks, along with corresponding descriptions and sheet names. Problem records for each edit check are then output in different sheets of the workbook. The code is flexible to allow the user to select which edit checks to output to Excel. This presentation reminded me of an earlier HASUG presentation which inspired my post on how to create a data dictionary in Excel.

What is SAS Visual Analytics?

SAS Visual Analytics was highlighted at this year’s Global Forum opening session as one of the biggest developments for SAS in recent memory. In essence it is a powerful data visualization tool that uses a high-performance SAS LASR Analytic Server and a distributed computing environment to improve the data exploration and model development process by making it faster, more automatic, and adding a web-based, interactive user interface. Most users do not have access to this product since it is so new, but it can be advantageous to develop a knowledge of analytic products currently available in the industry.

According to SAS, Visual Analytics allows users to:

  • Visually explore huge amounts of data extremely quickly
  • Execute analytic correlations in seconds
  • Deliver results quickly wherever needed (V.A. supports web reports and mobile devices such as the iPad).

Users wishing to learn more about this new product offering from SAS can read about key features, system requirements, and access both screenshots and demos through the SAS Visual Analytics site: www.sas.com/technologies/bi/visual-analytics.html.