Last week I analyzed 12 million records of taxi cap transactions in New York City. As part of that analysis, I used a DATA step view to create a new variable, which was the ratio of the tip amount to the fare amount.
A novice SAS programmer told me that he has never heard of a "DATA step view." He asked, "What is a DATA step view?"
Simply put, a "view" is a SAS DATA step program that is stored for later execution. When the SAS language processor encounters the RUN statement, the program is compiled and saved, but not executed. When a procedure uses a data view, the program runs and serves data to the procedure as if the procedure were reading a regular SAS data set. Thus you can use a view to manipulate data "on the fly."
I like to create views when I want to construct a new variable in a huge data set, but I don't want to physically copy the data. When I analyze the data view by using another procedure, the constructed variable is computed on the fly.
Here's an example. Suppose that you have a large data set that includes heights and weights for millions of patients. Some (but perhaps not all) analysts at your company need to analyze the body-mass index (BMI) for these patients. You have two options. The first option is to create a new data set that has the new column in it. This requires that you duplicate the original data and add a new column. The second option is to keep the original data unchanged, but create a view that computes the BMI. Any analyst that needs the BMI can access the data by using the view.
Let's see how this works by using a small data set. The Sashelp.Class data set contains height and weight (in English measurements) for 19 children. The BMI formula for children is slightly more complicated than for adults, but for simplicity the following SAS program simply uses the adult formula. The following DATA step creates a data view by using the VIEW= option on the DATA statement. The MEANS procedure then analyzes the newly calculated BMI variable:
data BMI / view=BMI; /* define DATA step view */ set Sashelp.Class; BMI = weight / height**2 * 703; /* BMI formula for adults (pounds and inches) */ run; proc means data=BMI; var BMI; run;
As you can see, the syntax for the MEANS procedure does not change. In fact, the only syntax that changes is the VIEW= option in the DATA step.
When SAS encounters the RUN statement in the DATA step, it saves the program. The program is not executed until it is used in the DATA= option in PROC MEANS. At that point, SAS executes the program and computes the BMI variable, which the procedure consumes and analyzes.
There are three main advantage of a data view: reduced storage space, not cluttering up a data set with extra variables, and if the view uses data that is updated regularly (for example, nightly sales data) then the view always reads the current data.
Three advantages to DATA step views #SAStip
Click To Tweet
The main disadvantage to DATA step views is that the computed columns must be recomputed every time that a procedure uses the view. If you specify DATA=BMI for additional procedures, the BMI variable is recomputed each time, which is slower than reading pre-computed data.
For more information about SAS data views, see
- The SAS documentation about DATA step views
- For an overview of other kinds of views, including SQL views, see Palmer (2003) "Using SAS Views and SQL Views."
Do you use DATA step views in your company? Leave a comment.