SAS tips

7月 282012
 
In honor of the 2012 Olympics, this week’s SAS tip is from award-winning UK author Phil Mason and his book In the Know...SAS Tips and Techniques from Around the Globe, Second Edition. SAS user Ron Fehd proclaimed Phil Mason "one of the SAS user community's famous creative tinkerers--the SAS user version [...]
11月 182011
 

Rick posted a tip today about using abbreviations in the SAS program editor window (often referred to as the "enhanced editor"). Defining abbreviations is a great way to save keystrokes and re-use "templates" of code that you've squirreled away. (One of Rick's readers also picked up on the tip, and added it to his blog.)

If you use SAS Enterprise Guide 4.3, you can define those abbreviations even easier, and you can call them up just like any other part of the SAS syntax, using the autocomplete features of the program editor. Here's how.

1. Open a SAS program window (by opening an existing program or select File->New->Program)

2. Select Program->Add Abbreviation Macro

3. In the Add Abbreviation Macro window, type a short name for your code snippet, and then add the code you want to substitute when the abbreviation is triggered. Bonus: the code field in this window also features the SAS program editor that helps you to complete the valid syntax.

4. To use the abbreviation, simply type the abbreviation within your program. The editor will automatically "suggest" the abbreviation as a possibility as you type, and you can press the spacebar to commit the selection, just as with any other suggested keyword. If you hover the mouse cursor over the suggestion, you can see a preview of the text that will be substituted in. Note that the abbreviation entry shows as a special item (green diamond instead of blue square in this case), as another hint that this element is different than the built-in syntax. (Were the autocomplete icons inspired by our bowl of Lucky Charms one morning? I'll never tell.)

You don't have to rely on the autocomplete feature of the editor to get to your abbreviation. You can do as Rick suggests and assign a shortcut key by selecting Program->Enhanced Editor Keys. However, I'd stay away from Ctrl+I, as that currently maps to what I call the "indenter servant" -- the SAS code formatter. (But hey, you can change that key assignment too!)

tags: abbreviations, SAS Enterprise Guide, SAS program editor, SAS programming, SAS tips
11月 162011
 

I've been working with date-time data on a recent project, and I've come across a few SAS programs that have "opportunity for improvement" when it comes time to create reports.

(Or maybe I haven't, and I contrived this entire blog post so that I could reference one of my favorite quotes from the movie Animal House.)

Suppose we have a data set that contains a collection of measurements over a one-year period, each with a time stamp that shows when the measure was captured. Here is a DATA step program that creates a sample data set, generating just over 300,000 rows of random measurements (with a bit of variation built in for visual interest later):

/* Create a year's worth of timestamp data */
data measurements (keep=timestamp measure);
 now = datetime();
 /* go back one year - minus number of seconds in a year */
 /* remember, SAS datetime value is number of seconds */
 /* since 01JAN1960 */
 yearago = now-31556926;
 length timestamp 8;
 format timestamp datetime20.;
 do val = yearago to now;
   if (ranuni(0) < 0.01) then
    do;
     /* add a few more records on certain days of week */
     if (weekday(datepart(val)) in (1,2) and ranuni(0)<.5) then do;
      timestamp = val;
      measure = 5 + log(ranuni(0));
      output;
     end;
     timestamp = val;
     measure = 5 - log(ranuni(0));
     output;
   end;
 end;
run;

The "classic" method for reporting on this data is to use a line plot to show the time series, as shown by this program and plot:

ods graphics / width=800 height=300;
proc sgplot data=measurements;
 series x=timestamp y=measure ;
 xaxis minor label="Time of measure";
 yaxis min=-8 max=20 label="Measurement";
run;


But sometimes we want to report further on the characteristics of the data, to answer questions such as "how many measurements were captured for each month? for each quarter? or for each day of the week?"

Creating categories the hard way

Sometimes while creating these reports, programmers decide to prepare the data further by creating new categorical variables (such as a variable for the day of the week, or for the month and year). For example, I've seen snippets such as this:

  /* numeric day of week */
  day = weekday(datepart(timestamp)); 
  /* value such as "2011_3" for March 2011 */
  year_month = cats(year(datepart(timestamp)),'_',month(datepart(timestamp)));

But there are problems with this approach, especially with the "year_month" example. First, it requires another pass through the data and additional storage space for the output. And in the "year_month" case, the result is a character variable, which no longer retains the ability for sorting the output in chronological order. (Yes, with a more clever construction we could make sure that there is a leading zero for single-digit months, and that would help...but there is an easier way.)

Use SAS formats instead of creating new variables

The person who wrote the above code forgot about the power of SAS formats. Using SAS formats, you can "recast" your variable for analysis to create a category from values that are otherwise continuous. You don't have to modify the original data at all, and it doesn't require you to make an additional copy.

For example, to create summary statistics for the measurements classified into month "buckets", you can apply the DTMONYY5. format within a PROC MEANS step. The DTMONYYw. format takes a date-time value and converts it to a month-year appearance. SAS procedures such as MEANS, FREQ, SGPLOT, and most others will use the formatted value for classification. You can see how this program produces a report of statistics classified by month-year:

title "Measures by month";
proc means data=measurements min max mean n;
 /* tells the procedure to bucket values into monthly bins */
 format timestamp dtmonyy5.;
 label timestamp = "Month/Year";
 class timestamp;
 var measure;
run;


With a minor change to use DTYYQC4. instead, we can produce a report of stats by quarter:

title "Measures by quarter";
proc means data=measurements min max mean n;
 /* tells the procedure to bucket values into quarterly bins */
 format timestamp dtyyqc4.;
 label timestamp = "Quarter";
 class timestamp;
 var measure;
run;


This isn't limited to tabular output. Applying a format can also help to simplify plots as well. Here is an example of an SGPLOT step that creates a frequency report of the number of measures for each day of the week, thanks to the DTWKDATX8. format:

title "Number of Measures by Day of Week";
ods graphics / width=400 height=300;
proc sgplot data=measurements;
 /* tells the procedure to bucket values into day-of-week bins */
 format timestamp dtwkdatx8.;
 label timestamp = "Day";
 vbar timestamp;
run;

The next time that you find yourself creating a new variable to serve as a category, check whether there might be a SAS format that you can apply instead. Even if there isn't, it might still be more efficient to create a new format rather than calculate a new variable, especially if it's a category that you need to use for multiple data sources.

tags: formats, SAS programming, SAS tips, SGPLOT
9月 152011
 

As I mentioned in my introductory post about Windows PowerShell, you can use PowerShell commands as a simple and concise method to collect data from your Windows systems -- information that is ripe for analysis within SAS.

In this example, I'll show a technique for using a SAS program to call PowerShell and funnel the results directly into your SAS session for analysis. This example uses the Get-Process cmdlet (pronounced /command-let/) to collect information about running processes on a given Windows machine.

I built this example in pieces, making sure that each part worked before putting it all together:

  • I ran the powershell command from a command prompt console, and redirected the output to a text file. For example:
    powershell -Command "Get-Process -ComputerName L73391" 
      > c:\temp\outProcesses.txt
  • I worked with the text file in SAS and SAS Enterprise Guide to develop a DATA step that could read/interpret it
  • I combined the command and DATA step into a program using FILENAME PIPE syntax
  • I tested the program on different machines, just to make sure that the behavior of the command and the text file didn't change across machines.

In order to run this example, you need the following conditions to be true:

 /* name of a machine on my network where I have admin rights */
%let machine = L73391; 
filename process pipe
  "powershell -Command ""Get-Process -ComputerName &machine.""";
data processes;
	infile process firstobs=4;
	length
		Handles 8
		WorkingSet 8
		CPU 8
		ID $ 8
		ProcessName $ 255;
	format
		WorkingSet comma12.
		Handles comma6.;
	input @1  Handles
		@25 WorkingSet
		@42 CPU 42-50
		@51 ID
		@59 ProcessName;
run;
 
title "Processes on &machine";
/* with an ODS destination active, this will create a report */
proc sql;
	select * from processes order by WorkingSet desc;
quit;
 
proc means data=processes;
	var Handles WorkingSet;
run;

This produces a simple report of the active processes on the given machine. Here is a partial example of the output, in descending order of "Working Set" (which is one measurement of the amount of system memory in use by a process).

tags: filename pipe, PowerShell, sas administration, SAS tips
9月 072011
 

I use Google Reader to keep up with SAS-related conversations on the blogosphere.   I thought it would be nice to share the lists of blogs that I follow as "shared bundles".  If you also use Google Reader, it will be very easy for you to add these bundles to items that you follow.  (Note: in order to use Google Reader you need a Google account, but my guess is that most people have at least one of those these days.)

If you have or know of other SAS-related blogs, post back in the comments so I can add them to my list of items to watch.

The following links will lead you to my two shared bundles:

tags: Google reader, SAS blog, SAS tips
8月 302011
 

While talking to SAS users in Australia earlier this month, I often demonstrated the capabilities of the new Map Chart task in SAS Enterprise Guide 4.3. Creating map charts has never been easier: select your map data source, then select your response data source, and click Run. Voila! You've got a map chart.  (See this SAS Global Forum paper from Stephanie Thompson for proof that it's "easier than you think".)

Of course, all of my map chart examples used data from the United States.   Australians recognize a map of the USA when they see it (I cannot say the same of many Americans who view a map of Australia), but they wanted to see examples of maps that featured their country. And although SAS provides map data for Australia with detail to the state level, these users wanted more detail than that.   Here is an example of a "coarse-grained" map chart of Australia with just the state boundaries, created using the MAPS.AUSTRAL data set that is supplied with SAS:
Not much detail in this map of Australia

On their web site, the Australia Bureau of Statistics provides map data with more details, down to the "statistical regions", "local government regions" and more. These data files are provided as shape files in ESRI or MapInfo format.

Fortunately, it's easy to import ESRI shape files into SAS map data sets by using PROC MAPIMPORT.  Assuming you've downloaded the ESRI shape files to your local system, this simple code can convert them to a SAS map data set:

/* Shape file for statistical local areas */
proc mapimport out=aus_sla
  datafile="c:\projects\AusMaps\SLA11aAust.shp";
run;

With the map data now available in WORK.AUS_SLA, all you need is some response data that contains a shared identifier field.  The following step creates a data source with random values:

/* make up some response data */
data response_sla;
  set aus_sla (keep=SLA_CODE11 SLA_NAME11);
	/* keep just one value per region */
	by SLA_CODE11;
	if first.SLA_CODE11;
	/* random value between 0 and 100 */
	RespValue = ranuni(0) * 100;
run;

Now use the GMAP procedure to combine the two data sources into a colorful, if somewhat meaningless, map of Australia:

/* create a simple map chart by combining the two */
legend1 across=3 label=("Values");
goptions colors=(green blue purple red orange yellow );
proc gmap data=work.response_sla map=work.aus_sla all;
	id sla_code11;
	choro RespValue / levels=6
		woutline=1
		legend=legend1
		;
run;
quit;

Here is a sample of the output (Americans, take note):
A sample map chart with more detail for Australia

Just remember: Map Charts = Map Data + Your Response Data.  With a third data set for annotations, you can create even fancier results.  But that's a topic for another blog post.  For those readers who are curious in the meantime, a simple SAS support search will open the door for you.

 

8月 262011
 
Contributed by Rex Pruitt, IFSUG Chairman ~ During SAS Global Forum 2011 in Las Vegas, a small group of users gathered to organize the first IFSUG (Insurance and Finance SAS Users Group). Today, there are more than 370 enrollees in IFSUG Constant Contact, 60 members in the IFSUG LinkedIn group [...]
7月 122011
 
It seems like such a simple problem: how can you reliably compute the age of someone or something? Susan lamented the subtle issues using the YRDIF function exactly 1.0356164384 years ago.

Sure, you could write your own function for calculating such things, as I suggested 0.1753424658 years ago.

Or you could ask your colleagues in the discussion forums, as I noticed some people doing just about 3.7890410959 years ago.

Or, now that SAS 9.3 is available, you can take advantage of the new AGE basis that the YRDIF function supports. For example:


data _null_;
  x = yrdif('29JUN2010'd,today(),'AGE');
  y = yrdif('09MAY2011'd,today(),'AGE');
  z = yrdif('27SEP2007'd,today(),'AGE');
  put x= y= z=;
run;
 
Yields:

x=1.0356164384 y=0.1753424658 z=3.7890410959
 
This new feature and many more were added in direct response to customer feedback from Susan and others. And that's a practice that never gets old.
7月 122011
 

It seems like such a simple problem: how can you reliably compute the age of someone or something? Susan lamented the subtle issues using the YRDIF function exactly 1.0356164384 years ago.

Sure, you could write your own function for calculating such things, as I suggested 0.1753424658 years ago.

Or you could ask your colleagues in the discussion forums, as I noticed some people doing just about 3.7890410959 years ago.

Or, now that SAS 9.3 is available, you can take advantage of the new AGE basis that the YRDIF function supports. For example:

data _null_;
  x = yrdif('29JUN2010'd,today(),'AGE');
  y = yrdif('09MAY2011'd,today(),'AGE');
  z = yrdif('27SEP2007'd,today(),'AGE');
  put x= y= z=;
run;

Yields:

x=1.0356164384 y=0.1753424658 z=3.7890410959

This new feature and many more were added in direct response to customer feedback from Susan and others. And that's a practice that never gets old.

7月 112011
 
SAS Enterprise Guide sets values for several useful SAS macro variables when it connects to a SAS session, including one macro variable, &_CLIENTPROJECTPATH, that contains the name and path of the current SAS Enterprise Guide project file.

(To learn about this and other macro variables that SAS Enterprise Guide assigns, look in Help->SAS Enterprise Guide help, and search the Index for "macro variables".)

If your SAS program knows the path of your project file, then you can use that information to make your project more "portable", and reference data resources using paths that are relative to the project location, instead of having to hard-code absolute paths into your program.

For example, I've been working with some data from DonorsChoose.org, and I've captured that work in a project file. After saving the project, I can easily get the name of the project file by checking the value of &_CLIENTPROJECTPATH:


15    %put &_clientprojectpath;
'C:\DataSources\DonorsChoose\DonorsChoose.egp'
 
If you can get the full path of the project file, then you can build that information into a SAS program so that as you move the project file and its "assets" (such as data), you don't need to make changes to the project to accommodate a new project folder "home". Here is the bit of code that takes the project file location and distills a path from it, and then uses it to assign a path-based SAS library.

/* a bit of code to detect the local project path                          */
/* NOTE: &_CLIENTPROJECTPATH is set only if you saved the project already! */
%let localProjectPath =
   %sysfunc(substr(%sysfunc(dequote(&_CLIENTPROJECTPATH)), 1,
   %sysfunc(findc(%sysfunc(dequote(&_CLIENTPROJECTPATH)), %str(/\), -255 ))));

libname DC "&localProjectPath";
 
Here's the output:

21  libname DC "&localProjectPath";
NOTE: Libref DC was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\DataSources\DonorsChoose
 
This allows me to keep the project together with the data that it consumes and creates. And I can easily share the project and data with a colleague, who can store it in a different folder on his/her machine, and it should work just the same without any changes.