
Two Interview Questions for SAS Programmers

I have a colleague who asks the same two questions every time he interviews a SAS programming consultant for an opening on his team. His first question is about the program data vector (PDV). His logic is that understanding the PDV is central to understanding how SAS processes data. Since he is a DATA step programmer, I also suspect that he uses this question to weed out programmers who rely heavily on PROC SQL but lack a basic understanding of how the DATA step works. I am certainly guilty of preferring PROC SQL over DATA step programming, but I do know about the PDV, and I agree that it is important to understand how SAS processes data so that your programs run efficiently. When you query a relational database with SQL, you want a set-based, non-sequential approach to maximize efficiency; the DATA step, by contrast, processes data sequentially, one observation at a time. Understanding this procedural, iterative model, as opposed to the familiar set-based approach, is important for anyone coming to SAS from SQL.

When SAS compiles a DATA step, it creates the PDV, which includes the automatic variables _N_ and _ERROR_. These variables are not written to the output data set, but you can assign their values to another variable or write them to the log. _N_ is set to 1 on the first iteration and increases by 1 each time the DATA step loops, so it holds the number of the current iteration. _ERROR_ is a flag that is 0 unless an error occurs during execution (for example, invalid data read by an INPUT statement), in which case it is set to 1; it is reset to 0 at the start of each iteration. As the DATA step executes, _N_ increments at the top of the step, the current observation is read into the PDV, and at the end of the step the contents of the PDV are written to the output data set as a new observation. The program then loops back to the top of the DATA step, and the process repeats until an INPUT or SET statement tries to read past the end of the input. Note that a data error does not by itself stop the step: SAS writes a note to the log, sets _ERROR_ to 1, and processing continues. You can use this knowledge of the DATA step and its automatic variables to debug your programs. For example, when reading a raw data file with INFILE and INPUT, the following statement writes the input record that caused an error to the log:

if _error_=1 then put _infile_;
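Here is a small, self-contained sketch of that idea. The data set name SCORES, the variables NAME, SCORE, and ITER, and the in-stream records are all made up for illustration. The second record has a non-numeric score, so the INPUT statement sets _ERROR_ to 1 for that iteration; the PUT statement echoes the offending line, and the step still runs to the end.

```sas
/* Illustrative only: SCORES, ITER, and the raw records are hypothetical. */
data scores;
  infile datalines;              /* read in-stream records through an explicit INFILE */
  input name $ score;            /* 'abc' is not numeric, so _ERROR_ becomes 1        */
  iter = _n_;                    /* keep a copy of _N_, which is never written out    */
  if _error_ = 1 then
    put 'Bad record at iteration ' _n_= ': ' _infile_;
  datalines;
Ann 90
Bob abc
Cid 75
;
run;
```

In the log you should see SAS's own invalid-data note alongside the custom message. SCORES keeps all three observations, with a missing SCORE for Bob, and ITER preserves the value of _N_ because _N_ itself is dropped from the output.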

You can also use _N_ as an observation counter if the DATA step reads exactly one observation per iteration. Keep in mind that _N_ counts iterations of the DATA step, not observations; the two are often equal, but not always (for example, when a DO loop reads several observations within a single iteration).
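The sketch below makes the distinction concrete. It builds a six-observation data set and then reads it two observations at a time inside a DO loop, so each DATA step iteration consumes two observations. The names NUMS, PAIRS, X, TOTAL, and ITERATION are hypothetical, chosen just for this example.

```sas
/* Hypothetical input: six observations, X = 1 to 6. */
data nums;
  do x = 1 to 6;
    output;
  end;
run;

/* Each iteration reads TWO observations, so _N_ no longer */
/* matches the number of observations read.                */
data pairs;
  do i = 1 to 2;
    set nums;                  /* reads the next observation from NUMS      */
    total = sum(total, x);     /* TOTAL is reset to missing each iteration  */
  end;
  iteration = _n_;             /* iteration number, not observation number  */
  drop i;
run;
```

PAIRS ends up with three observations: ITERATION runs 1, 2, 3 even though six observations were read, and TOTAL holds 3, 7, and 11 (the sum of each pair). The step stops on the fourth iteration, when SET tries to read past the last observation of NUMS, so that iteration never writes an observation.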

The second question asks the prospective SAS programmer to name three system options that are useful for debugging code that contains macros. The answer is MPRINT, MLOGIC, and SYMBOLGEN. SYMBOLGEN writes to the log the value that each macro variable resolves to, which lets you see at a glance whether your macro variables are resolving to the values you expect. MPRINT writes to the log the SAS statements that a macro generates when you call it, so you can see the actual code being executed. MLOGIC traces the execution of a macro by writing messages to the log at key points, such as when the macro starts and stops, what values its parameters receive, and whether each %IF condition is true or false, so you can see more easily where a macro fails.
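To try the three options together, you can turn them on with an OPTIONS statement and then call a macro. The macro PRINT_IF and the macro variable DSN below are hypothetical; SASHELP.CLASS is one of the sample data sets that ships with SAS. With the options on, the log should include lines prefixed with SYMBOLGEN: (for example, showing what &DATA and &MAXOBS resolve to), MLOGIC(PRINT_IF): (tracing the start of execution, the parameter values, and the result of the %IF test), and MPRINT(PRINT_IF): (showing the generated PROC PRINT code).

```sas
/* Turn on all three macro debugging options. */
options mprint mlogic symbolgen;

%let dsn = sashelp.class;          /* hypothetical macro variable */

/* Hypothetical macro, used only to exercise the options. */
%macro print_if(data=, maxobs=5);
  %if &maxobs > 0 %then %do;
    proc print data=&data(obs=&maxobs);
    run;
  %end;
%mend print_if;

%print_if(data=&dsn)

/* Turn the options off again; each one has a NO- form. */
options nomprint nomlogic nosymbolgen;
```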