A Gentle Introduction to the Datastep, Pt. 2

In a previous post, I covered the basics of data stepping in SAS: how to start a datastep; how to create new variables in a datastep; how to create multiple datasets in a SAS program; and how the SAS datastep operates in the background. Here is a link to that post if you would like to follow up:

A Gentle Introduction to the Data Step

In this post, I hope to cover some additional concepts that I consider basic to data stepping in SAS: 1) Creating permanent datasets with a datastep; 2) The output statement; 3) Creating multiple datasets with selective outputs.

Creating a Permanent Dataset with a Data Step

When we create a dataset using a datastep and give it a one-level name, SAS stores it in the temporary WORK library so that other data steps and SAS procedures in the same session can reference it. However, once the SAS session ends, SAS deletes all temporary datasets in WORK. What if we want the output of a data step saved permanently? We simply combine a libref with the dataset name in the “data” statement of a data step. Here is an example:


libname inpt "/root/input_data/income";
libname outp "/root/output_data";


data outp.income ;
set inpt.inc;

total_inc = salary + bonus ;
label total_inc = "Sum of Salary and Bonus";
run;


Code Review

We begin by specifying two libname statements. The first libname has a libref called “inpt” and points to the directory containing the input data for the data step. The second libname has a libref called “outp” and points to the directory where we would like to save the dataset we are creating.

libname inpt "/root/input_data/income";
libname outp "/root/output_data";

We next start a datastep by writing the word “data.” In order to save the dataset as permanent, we write in the name of a libref, followed by a period, then followed by the name of the dataset we would like to create. Writing a libref in front of the dataset name instructs SAS to write the output dataset to the directory to which the libname points.

data outp.income ;
set inpt.inc;

total_inc = salary + bonus ;
label total_inc = "Sum of Salary and Bonus";

run;
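
One way to confirm that the dataset landed in the “outp” directory is to inspect it in a later step. The following is a minimal sketch, assuming the two libname statements above are still assigned in the current session. It also shows that a dataset name without a libref is, by default (that is, unless a USER library has been assigned), shorthand for the temporary WORK library.

```sas
/* Describe the permanent dataset created above:
   lists its variables, labels, and physical location */
proc contents data=outp.income;
run;

/* A one-level name refers to the WORK library by default,
   so the next two data steps create the same temporary dataset */
data income;
    set outp.income;
run;

data work.income;
    set outp.income;
run;
```

The copy in WORK disappears when the session ends, while outp.income stays in the directory that the libref points to.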

The Output Statement

When SAS reaches the end of a datastep and encounters no more data step code, it outputs the observation to the output dataset. In reality, though, it is the “output” statement that instructs SAS to output the observation. In fact, if there are no output statements in data step code, then SAS places an “invisible” output statement before the run statement. So when we execute the program in the example above, SAS will actually read it as:

data outp.income ;
set inpt.inc;

total_inc = salary + bonus ;
label total_inc = "Sum of Salary and Bonus";

output;

run;

Because this program is simple enough, the code will execute the same way whether or not we place an output statement in the program. The convention among most SAS programmers is to leave out output statements unless the program necessitates it. That being said, let’s take a look at a practical application of the output statement.
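
Before moving on, here is a small sketch of cases where an explicit output statement does change the result. It is only an illustration: the dataset names inc, early, and twice are made up for this example, and the two input records are typed inline so the code can run on its own.

```sas
/* Toy input typed inline so the sketch is self-contained */
data inc;
    input id salary bonus;
    datalines;
1 10 1
2 20 1.5
;
run;

/* Explicit output BEFORE the assignment: each record is written
   while total_inc is still missing */
data early;
    set inc;
    output;
    total_inc = salary + bonus;
run;

/* Two output statements: each input record is written twice */
data twice;
    set inc;
    total_inc = salary + bonus;
    output;
    output;
run;
```

Because both data steps contain an explicit output statement, SAS does not add the implicit one at the end of the step. Records are written only where the code says so.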

Creating Multiple Datasets with the Output Statement

One of the data step’s many functionalities is creating multiple datasets. As an example, let’s create separate datasets for men and women using the income data:

id  sex  salary  bonus
1   M    10      1
2   M    20      1.5
3   M    30      2
4   F    15      2
5   F    30      2
6   F    20      1

In each of these datasets, we will create a new variable called income, which is the sum of salary and bonus.



libname inpt "/root/input_data"; /* Location of input data*/

/* Create separate datasets for men and women */
data male female;
set inpt.inc;

income = salary + bonus;

/* Selectively output records */
if sex= "M" then output male;
else if sex= "F" then output female;

run;


Code Review

We begin this program like many others: with a libname pointing to the location of the input data:

libname inpt "/root/input_data"; /* Location of input data*/

If you read the code carefully, you probably noticed something curious: the “data” statement had two datasets listed. In this situation, SAS will create two datasets, one called “male” the other named “female.”

data male female;
set inpt.inc;

We next create a variable called income by adding together the bonus and salary variables. What is important to note here is that if we were to just end the data step at this point, the male and female datasets would be carbon copies of one another. To separate out the male and female records to their respective datasets, we need a way to instruct SAS where to output the records. In the example, I use “if” statements in combination with the output statement:

if sex= "M" then output male;
else if sex= "F" then output female;

We will go over the workings of the if/else syntax in another post. For now, just know that the if statement performs a logical test, and then executes an action if the test resolves to true. In this example, if the sex variable has a value of “M”, SAS will output the record to the “male” dataset. If the variable has a value of “F”, SAS will ignore the first if statement, proceed to the “else if” statement, and then output the record to the female dataset.
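
To try this out without the permanent input library, the six income records shown earlier can be typed inline. The sketch below is a self-contained variation, not the original program: it reads the records with datalines into a temporary dataset named inc, and it adds a third dataset called other (a hypothetical catch-all that is not part of the example above) for records whose sex value is neither “M” nor “F”.

```sas
/* The six income records from the table above, typed inline */
data inc;
    input id sex $ salary bonus;
    datalines;
1 M 10 1
2 M 20 1.5
3 M 30 2
4 F 15 2
5 F 30 2
6 F 20 1
;
run;

data male female other;
    set inc;

    income = salary + bonus;

    /* Selectively output records */
    if sex = "M" then output male;
    else if sex = "F" then output female;
    else output other;   /* hypothetical catch-all, empty for this data */
run;

/* Inspect the results */
proc print data=male;
run;

proc print data=female;
run;
```

Without the final else, a record that matches neither test would not be written to any dataset, because the explicit output statements replace the implicit one at the end of the step.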

 

Summary

Just as SAS can create temporary datasets in the WORK library, it can also save the datasets it creates permanently in a directory of your choosing. You can create a permanent dataset in SAS by placing a libref in front of a data set name in a datastep.

The output statement instructs SAS to output a record in a datastep to an output data set. If you do not specify an output statement in a data step, SAS will place an output statement for you before the “run” statement when your code executes.

A datastep can create multiple datasets when you list multiple data set names in the “data” statement. You can selectively output records to these datasets by combining logical tests, such as the “if” statement, with output statements.
