The Libname Statement

In many data processing languages, opening and editing data is a fairly simple task: you navigate to a directory and double-click a dataset. This loads the data into memory, and every command you execute thereafter manipulates that data. Even when you load data with a command, such as the “use” command in Stata, all you need to do is specify the full file path and the name of the data file, and you are basically done.

In SAS the process is not quite so simple, and it requires some understanding of how SAS works. Assuming you have a SAS-formatted dataset ready to use, the first step in reading it into SAS is the LIBNAME statement. The LIBNAME statement lets you:

  1. Specify the location (or directory path) of an input dataset
  2. Save a dataset to a directory of your choosing

After you tell SAS where the input data file is located with a LIBNAME statement, you can read the file into your program with a DATA step (see a later post for a more detailed explanation of data steps). As an example, let’s pretend that we want to read a dataset called “income” into a SAS program. Here are a few sample records:

income.sas7bdat:

id     name        hr_wages
01     sam         35
02     sam II      21
03     sam III     29

Let’s say that the income dataset is stored in this directory on your computer:

C:\mydocs\sasdata

Sample Program

Here is an example of how to read the data into your program using a libname and datastep.


/* Comment: Specify location of the input dataset */
LIBNAME samg "C:\mydocs\sasdata";

/* Check the number of observations, variable names, and  */
/* variable labels using the CONTENTS procedure    */
PROC CONTENTS DATA = samg.income; RUN;

/* Comment: Read data into a SAS data step */
data inc;
set samg.income;

/* you can now create more SAS variables  */ 
/* in the body of the datastep.    */    

run ;

/* Comment: Print all of the rows of data using PRINT procedure */
PROC PRINT DATA = inc; run;

Code Review

We begin the program by specifying a LIBNAME statement:


LIBNAME samg "C:\mydocs\sasdata";

In its basic form, a LIBNAME statement has three components, in the following order:

  1. The literal text LIBNAME
  2. A libref, or “nickname”, that you are assigning to the path. In my example, I refer to my file path as “samg”. A libref can be at most eight characters long and must begin with a letter or an underscore.
  3. A directory path enclosed in quotation marks.
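
One way to confirm that a libref was assigned as intended is to ask SAS to describe it. The sketch below assumes the directory from the example exists on your machine. The LIST and CLEAR forms of the LIBNAME statement are shown only as an illustration and are not part of the sample program.

```sas
/* Assign the libref (assumes C:\mydocs\sasdata exists) */
LIBNAME samg "C:\mydocs\sasdata";

/* Write the libref's attributes (engine, physical path) to the log */
LIBNAME samg LIST;

/* Optional: release the libref. Run this only when you are   */
/* finished, because samg.income cannot be found afterwards.  */
LIBNAME samg CLEAR;
```

If the path is misspelled or the directory does not exist, the log is the first place to look for the problem.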

The libref part of the libname statement is crucial. This is the shorthand name we will use when we tell SAS where to go to fetch data and read it into a SAS datastep or procedure. This means that instead of typing out a full file path, we can just type the libref instead. In order to use a libref, you first type the libref, followed by a period, followed by the name of the dataset you would like SAS to process. Here is an example of how I use a libref with the contents procedure:


PROC CONTENTS DATA = samg.income; RUN;

Because there is a libref in the CONTENTS procedure call, SAS knows where to look for the “income” dataset: C:\mydocs\sasdata. As an aside, the CONTENTS procedure is a helpful SAS tool that reports the number of observations, along with all of the variable names and labels for a dataset. It is a handy way to list everything you need to get started in a program.
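
If you are not sure which datasets a directory holds, the CONTENTS procedure can also list every member of a library. This is a minimal sketch that assumes the samg libref has already been assigned as shown earlier. _ALL_ and NODS are standard PROC CONTENTS features, though the sample program does not use them.

```sas
/* List all datasets in the samg library.              */
/* NODS suppresses the per-dataset variable listings,  */
/* so only the directory of members is printed.        */
PROC CONTENTS DATA = samg._ALL_ NODS; RUN;
```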

Next, I read the income dataset into a data step. In this data step, I create a new dataset called “inc” and use the income dataset as my input. The comments in the data step mark where you would write code to create additional variables. Do not worry if some of this is over your head; we will talk more about data steps in a later post. The libref “samg” in the SET statement is how SAS knows where to go on your computer to find the income dataset.


data inc;
set samg.income;

/* you can now create more SAS variables  */ 
/* in the body of the datastep.  */    

run ;
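
As a sketch of what could go in the body of the data step, the example below adds one derived variable. The variable name annual_wages and the 40-hour week over 52 weeks are hypothetical choices made for illustration, and the code assumes that hr_wages is stored as a numeric variable and that the samg libref is already assigned.

```sas
data inc;
set samg.income;

/* Hypothetical new variable: hourly wage scaled to a year */
/* (assumes 40 hours per week, 52 weeks per year)          */
annual_wages = hr_wages * 40 * 52;

run ;
```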

We can print the new dataset in its entirety with the PRINT procedure. In this case, the data step did not create any new variables, so the output dataset (inc) is a copy of the original. Because we did not save it as a permanent dataset, we do not need to place a libref in front of the dataset name “inc”; SAS stores it in its temporary WORK library.

PROC PRINT DATA = inc; run;

id     name        hr_wages
01     sam         35
02     sam II      21
03     sam III     29
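
By default, a dataset name without a libref is shorthand for a dataset in the WORK library, so the two-level name can also be written out explicitly. The sketch below assumes the data step that creates inc has already run in the current session.

```sas
/* Equivalent to PROC PRINT DATA = inc; under default settings */
PROC PRINT DATA = work.inc; RUN;
```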

Further Notes: Outputting datasets and using more than one LIBNAME

One key feature of SAS is that you are not bound to one dataset. SAS can handle multiple datasets in one session, which is one of the main advantages of using it. As such, you can have multiple LIBNAME statements, too, and you can use one of them to tell SAS to create a permanent dataset. As a rule, if you do not specify a libref before a dataset name in a data step, the dataset is temporary: SAS stores it in the WORK library and deletes it when the SAS session ends. If you place a libref in front of a dataset name, SAS will write the dataset to that libref’s directory and save it permanently. Here is a short program to demonstrate this style of inputting and outputting data. The librefs in the DATA and SET statements should make the input and output pattern clear.


LIBNAME samg "C:\mydocs\sasdata";
LIBNAME outp  "C:\mydocs\outdata";

data outp.inc;
set samg.income;

/* you can now create more SAS variables  */ 
/* in the body of the datastep.   */   

run ;

To review, I create two librefs with two LIBNAME statements and name them samg and outp, respectively. I specify the input dataset with the “set” statement, and the name of the new dataset with the “data” statement. Because the libref “outp” is placed in front of the dataset name “inc”, SAS will create a dataset called “inc” and save it permanently in the directory “C:\mydocs\outdata”.
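
Because outp.inc is permanent, it can be read back in a later SAS session. The following sketch assumes the program above has already been run and that C:\mydocs\outdata still contains the saved file. The libref must be assigned again because librefs do not persist between sessions.

```sas
/* In a new session: re-assign the libref to the output directory */
LIBNAME outp "C:\mydocs\outdata";

/* The dataset saved earlier is available under its two-level name */
PROC PRINT DATA = outp.inc; RUN;
```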

Summary

LIBNAME statements are the primary way through which SAS reads and writes permanent datasets. A basic LIBNAME statement contains three components:

  1. The literal text “LIBNAME”
  2. A libref, or a short name that you are assigning to a directory.
  3. A directory path that will point SAS to the location of an input file, or the location where you would like a permanent dataset to be saved.

Just as you can have more than one dataset in a SAS session, you can have more than one LIBNAME statement, too. Librefs point SAS to the location of your data. You use a libref by placing it (followed by a period) before the dataset name in either a SAS procedure or a data step.
