IF DO / ELSE DO Programming in SAS

Most SAS programmers default to the tried-and-true IF/ELSE syntax to recode variables. Some programming tasks, however, demand a long run of repetitive IF statements that can clutter up your code. For those cases, SAS offers a more efficient way to structure your recodes: IF DO / ELSE DO syntax. Using “DO” syntax, we can execute entire blocks of code conditionally instead of writing a series of single IF statements.

As always, an example will clarify. Let’s say we want to recode two categorical variables into a series of dummy variables (variables that have a value of 1 or 0). For this example, we will use a dataset that contains categorical variables for Hispanic origin and race:

/root/mydata/demog.sas7bdat

id hispanic race
1 1 1
2 1 2
3 2 3
4 2 4
5 1 5
6 1 2
7 1 1
8 2 4
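
If you do not have this file, the listing above can be typed in directly. What follows is only a sketch for following along, and it assumes that the /root/mydata directory exists and is writable on your system; if it is not, any other library (including WORK) would serve just as well.

```sas
/* Assumption: /root/mydata exists and is writable */
libname dsn "/root/mydata";

/* Build the eight-row demog dataset from the listing above */
data dsn.demog;
    input id hispanic race;
    datalines;
1 1 1
2 1 2
3 2 3
4 2 4
5 1 5
6 1 2
7 1 1
8 2 4
;
run;
```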

Here is a mini record layout and data dictionary:

id: Unique identifier for each person
hispanic: Hispanic origin of the respondent
race: Race of the respondent

Hispanic:
1 = Hispanic
2 = Non-Hispanic

Race:
1 = White
2 = Black
3 = Native American
4 = Asian
5 = Native Hawaiian

Our task here will be to recode Hispanic origin into a dummy variable, and also recode the race variable into a series of non-Hispanic race dummy variables (non-Hispanic white, non-Hispanic black, etc.). This will ensure that all of the new dummy variables are mutually exclusive.

Here is how we would accomplish this task using normal IF/ELSE logic.


/* Location of input data */
libname dsn "/root/mydata";

/* Start Datastep, begin recodes */
data recodes;
set dsn.demog;

/* Hispanic Origin */
if hispanic = 1 then hisp = 1 ;
else hisp = 0 ;

/* Non Hispanic white */
if hispanic = 2 and race = 1 then nhwht = 1 ;
else nhwht = 0 ;

/* Non Hispanic black */
if hispanic = 2 and race = 2 then nhblk = 1 ;
else nhblk = 0 ;

/* Non Hispanic Native American */
if hispanic = 2 and race = 3 then nhnat = 1 ;
else nhnat = 0 ;

/* Non Hispanic Asian */
if hispanic = 2 and race = 4 then nhasn = 1 ;
else nhasn = 0 ;

/* Non Hispanic Native Hawaiian */
if hispanic = 2 and race = 5 then nhhaw = 1 ;
else nhhaw = 0 ;

run;


While there is nothing wrong with this code, it is repetitive in one key respect: each of the five race recodes opens its IF statement with the exact same test, hispanic = 2.


/* Location of input data */
libname dsn "/root/mydata";

/* Start Datastep, begin recodes */
data recodes;
set dsn.demog;

/* Hispanic Origin */
if hispanic = 1 then hisp = 1 ;
else hisp = 0 ;

/* Non Hispanic white */
if hispanic = 2 and race = 1 then nhwht = 1 ;
else nhwht = 0 ;

/* Non Hispanic black */
if hispanic = 2 and race = 2 then nhblk = 1 ;
else nhblk = 0 ;

/* Non Hispanic Native American */
if hispanic = 2 and race = 3 then nhnat = 1 ;
else nhnat = 0 ;

/* Non Hispanic Asian */
if hispanic = 2 and race = 4 then nhasn = 1 ;
else nhasn = 0 ;

/* Non Hispanic Native Hawaiian */
if hispanic = 2 and race = 5 then nhhaw = 1 ;
else nhhaw = 0 ;

run;


We can remove this code repetition by using IF DO / ELSE DO logic instead of standard IF statements. Here is how we would translate our sample code into this style:


/* Location of input data */
libname dsn "/root/mydata";        

/* Start Datastep, begin recodes */
data recodes;
set dsn.demog;

/* Hispanic Origin. */
if hispanic = 1 then do;

hisp = 1 ;
nhwht = 0 ;
nhblk = 0 ;
nhnat = 0 ;
nhasn = 0 ;
nhhaw = 0 ;               

end;                     

else if hispanic = 2 then do;

hisp = 0 ;
if race = 1 then nhwht = 1 ;
else nhwht = 0 ;

if race = 2 then nhblk = 1 ;
else nhblk = 0 ;

if race = 3 then nhnat = 1 ;
else nhnat = 0 ;

if race = 4 then nhasn = 1 ;
else nhasn = 0 ;

if race = 5 then nhhaw = 1 ;
else nhhaw = 0 ;

end;
run;                         
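
One way to confirm that the recodes came out as intended is to cross-tabulate the source variables against the new dummies. This check is not part of the original program; it is a sketch of a common habit, and it assumes the recodes dataset created above is still available in the WORK library.

```sas
/* List every combination of source and recoded values that occurs.
   The MISSING option keeps combinations where a variable is missing,
   so any dummy that was never assigned is easy to spot. */
proc freq data=recodes;
    tables hispanic*race*hisp*nhwht*nhblk*nhnat*nhasn*nhhaw / list missing;
run;
```

For the sample data, every row of the list should show exactly one of the six dummy variables equal to 1.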


Code Review

This style may seem more intricate than what we previously saw, but if we study it closely, it is actually a much more straightforward approach. The first block opens with the word IF, and its condition ends with the word DO. Everything between that DO and the matching END statement executes only when the hispanic variable is equal to 1. Because this block executes only when hispanic is equal to 1, we can set the hisp variable to 1 and all of the non-Hispanic race variables to 0.

/* Location of input data */
libname dsn "/root/mydata";        

/* Start Datastep, begin recodes */
data recodes;
set dsn.demog;

/* Hispanic Origin. */
if hispanic = 1 then do;

hisp = 1 ;
nhwht = 0 ;
nhblk = 0 ;
nhnat = 0 ;
nhasn = 0 ;
nhhaw = 0 ;               

end;  

The next coding block executes when hispanic is equal to 2. To keep the efficiency of ELSE logic, the statement begins with ELSE and ends with DO. This means that SAS will not even test this condition when the prior IF condition was true. As before, the block closes with the word END. Because hispanic is equal to 2 throughout this DO block, we set hisp to 0 and code each race variable with a simple IF/ELSE pair. And because these IF statements run only when hispanic is equal to 2, we do not need to repeat the hispanic test in each one.

/* Location of input data */
libname dsn "/root/mydata";        

/* Start Datastep, begin recodes */
data recodes;
set dsn.demog;

/* Hispanic Origin. */
if hispanic = 1 then do;

hisp = 1 ;
nhwht = 0 ;
nhblk = 0 ;
nhnat = 0 ;
nhasn = 0 ;
nhhaw = 0 ;               

end;

else if hispanic = 2 then do;

hisp = 0 ;
if race = 1 then nhwht = 1 ;
else nhwht = 0 ;

if race = 2 then nhblk = 1 ;
else nhblk = 0 ;

if race = 3 then nhnat = 1 ;
else nhnat = 0 ;

if race = 4 then nhasn = 1 ;
else nhasn = 0 ;

if race = 5 then nhhaw = 1 ;
else nhhaw = 0 ;

end;
run;
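
One behavior is worth knowing. If hispanic is missing, or holds a value other than 1 or 2, neither DO block runs and all six dummy variables stay missing, whereas the plain IF/ELSE version shown earlier would set them all to 0. If you would rather flag such records than leave them unnoticed, a final ELSE DO can catch them. The sketch below shows the idea on the hisp variable alone; the output dataset name check and the wording of the log message are made up for illustration, and writing to the log is just one possible way to flag the record.

```sas
libname dsn "/root/mydata";

/* "check" is a hypothetical dataset name used only for this sketch */
data check;
    set dsn.demog;

    if hispanic = 1 then do;
        hisp = 1;
    end;

    else if hispanic = 2 then do;
        hisp = 0;
    end;

    /* Reached only when hispanic is neither 1 nor 2 (including missing).
       hisp is left missing and the record is written to the log. */
    else do;
        put "Unexpected hispanic value: " id= hispanic=;
    end;
run;
```

With the sample data, every record has hispanic equal to 1 or 2, so the final block never fires and nothing extra appears in the log.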

Summary

IF DO / ELSE DO syntax is a helpful alternative when your programming task requires that you write a long series of IF conditions. With “DO blocks” (as I call them), SAS will execute whole chunks of code when a logical condition evaluates as true. If SAS evaluates the condition as false, it skips over all of the code embedded within the DO block. This means that we retain all of the efficiency of standard IF/ELSE tests while also removing repetitive code.

A DO block always begins the same way, with either an IF or an ELSE, and the statement ends with the word DO. This tells SAS to execute everything beneath the word DO if the logical test is true, and to skip over all of that code if the test is false. How does SAS know where the DO block ends? All DO blocks terminate with the word END.

 
