sas programming

11月 112016
 

If you obtain data from web sites, social media, or other unstandardized data sources, you might not know the form of dates in the data. For example, the US Independence Day might be represented as "04JUL1776", "07/04/1776", "Jul 4, 1776", or "July 4, 1776." Fortunately, the ANYDTDTE informat makes it easy read dates like these into SAS.

The ANYDTDTEw. informat is a flexible alternative to older informats such as DATEw., MMDDYYw., and YYMMDDw. If your dates are in a specific form, the older informats work great and serve to document that all dates must be in that standard form. If the dates are not standardized or you need to read a string like "July 4, 1776", the ANYDTDTE informat is a godsend.

The ANYDTDTE informat for reading dates

The following SAS DATA step shows that the ANYDTDTEw. format combines several older formats into a "super format" that attempts to convert a character string into a date. The ANYDTDTE format can not only replace many of the older formats, but it can be used to convert a string like "Jul 4, 1776" into a date, as follows:

data Dates;
input @1 Style $8.
      @9 Value anydtdte12.;
format Value DATE10.;
datalines;
DATE    04JUL1776
MMDDYY  07041776
MMDDYY  07/04/1776
YYMMDD  17760704 
N/A     Jul 4, 1776
N/A     July 4, 1776
;
 
proc print noobs; run;
Result of using the ANYDTDTE informat to read strings that represent dates

As you can see, the ANYDTDTE informat reads six different strings, but converts all of them to the SAS date value that corresponds to 04JUL1776.

MMDD or DDMM? How does ANYDTDTE interpret ambiguous dates?

The string 07/04/1776 can be interpreted as "April 7, 1776" or "July 4, 1776," depending upon the local convention. Europeans tend to interpret the string as DD/MM/YYYY whereas the US convention is to use MM/DD/YYYY. How does the ANYDTDTEw. informat guess which interpretation might be correct?

The answer is that the informat looks at the DATESTYLE SAS option. By default, the DATESTYLE option uses the LOCALE system option to guess which style to use. You can use PROC OPTIONS to see the value of these options, which are printed to the SAS log:

proc options option=(DATESTYLE LOCALE) value; run;
Option Value Information For SAS Option DATESTYLE
    Value: MDY
    Scope: Default
    How option value set: Locale
 
Option Value Information For SAS Option LOCALE
    Value: EN_US
...

For my system, the DATESTYLE option is set to MDY, which means that the string "07/04/1776" will be interpreted MM/DD/YYYY. If you need to read dates that obey a different convention, you can use the global OPTIONS statement to set the DATESTYLE option:

options DATESTYLE=DMY;    /* change default style convention */
/* Restore default convention: options DATESTYLE=Locale; */

Other "ANY" informats in SAS

There are two other SAS infomats that are similar to the ANYDTDTE informat:

Here's a tip to help you remember these seemingly cryptic names. The first part of the name is "ANYDT", which means that the input string can be ANY datetime (DT) value. The end of the name refers to the numerical value that is produced by the informat. The resulting value can be a date (DTE), a datetime (DTM), or a time (TME) value. Thus the three informats all have the mnemonic form ANYDTXXX where the XXX suffix refers to the value that is produced.

tags: Reading and Writing Data, SAS Programming

The post One informat to rule them all: Read any date into SAS appeared first on The DO Loop.

11月 042016
 

My river walk last week turned into a spectacular fall show. But if it rains this week in San Antonio, like the weatherman predicts, what will I do? In the coming days, I’ll be presenting at two user groups,  one in eastern Canada in Halifax, and the other all the […]

The post The difference between the Subsetting IF and the IF—THEN—ELSE—IF statement appeared first on SAS Learning Post.

10月 172016
 

SAS programmers often resort to using the X command to list the contents of file directories and to process the contents of ZIP files (or gz files on UNIX). In centralized SAS environments, the X command is unavailable to most programmers. NOXCMD is the default setting for these environments (disallowing shell commands), and SAS admins are reluctant to change it.

In this article, I'll share a SAS program that can retrieve the contents of a file directory (all of the file names), and then also report on the contents of every ZIP file within that directory -- without using any shell commands. The program uses two lesser-known tricks to retrieve the information:

  1. The FILENAME statement can be applied to a directory, and then the DOPEN, DNUM, DREAD, and DCLOSE functions can be used to retrieve information about that directory. (Check SAS Note 45805 for a better example of just this - click the Full Code tab.)
  2. The FILENAME ZIP method (added in SAS 9.4) can retrieve the names of the files within a compressed archive (ZIP or gz files). For more information, see all of my previous articles about the FILENAME ZIP access method.

I wrote the program as a SAS macro so that it should be easy to reuse. And I tried to be liberal with the comments, providing a view into my thinking and maybe some opportunities for improvement.

%macro listzipcontents (targdir=, outlist=);
  filename targdir "&targdir";
 
  /* Gather all ZIP files in a given folder                */
  /* Searches just one folder, not subfolders              */
  /* for a fancier example see                             */
  /* http://support.sas.com/kb/45/805.html (Full Code tab) */
  data _zipfiles;
    length fid 8;
    fid=dopen('targdir');
 
    if fid=0 then
      stop;
    memcount=dnum(fid);
 
    /* Save just the names ending in ZIP*/
    do i=1 to memcount;
      memname=dread(fid,i);
      /* combo of reverse and =: to match ending string */
      /* Looking for *.zip and *.gz files */
      if (reverse(lowcase(trim(memname))) =: 'piz.') OR
         (reverse(lowcase(trim(memname))) =: 'zg.') then
        output;
    end;
 
    rc=dclose(fid);
  run;
 
  filename targdir clear;
 
  /* get the memnames into macro vars */ 
  proc sql noprint;
    select memname into: zname1- from _zipfiles;
    %let zipcount=&sqlobs;
  quit;
 
  /* for all ZIP files, gather the members */
  %do i = 1 %to &zipcount;
    %put &targdir/&&zname&i;
    filename targzip ZIP "&targdir/&&zname&i";
 
    data _contents&i.(keep=zip memname);
      length zip $200 memname $200;
      zip="&targdir/&&zname&i";
      fid=dopen("targzip");
 
      if fid=0 then
        stop;
      memcount=dnum(fid);
 
      do i=1 to memcount;
        memname=dread(fid,i);
 
        /* save only full file names, not directory names */
        if (first(reverse(trim(memname))) ^='/') then
          output;
      end;
 
      rc=dclose(fid);
    run;
 
    filename targzip clear;
  %end;
 
  /* Combine the member names into a single data set        */
  /* the colon notation matches all files with "_contents" prefix */
  data &outlist.;
    set _contents:;
  run;
 
  /* cleanup temp files */
  proc datasets lib=work nodetails nolist;
    delete _contents:;
    delete _zipfiles;
  run;
 
%mend;

Use the macro like this:

%listzipcontents(targdir=c:temp, 
 outlist=work.allfiles);

Here's an example of the output.
zip file contents within the target directory

Experience has taught me that savvy SAS programmers will scrutinize my example code and offer improvements. For example, they might notice my creative use of the REVERSE function and "=:" operator to simulate and "ends with" comparison function -- and then suggest something better. If I don't receive at least a few suggestions for improvements, I'll know that no one has read the post. I hope I'm not disappointed!

tags: FILENAME ZIP, SAS programming, xcmd, ZIP files

The post List the contents of your ZIP and gz files using SAS appeared first on The SAS Dummy.

10月 102016
 

The WHERE clause in SAS is a powerful mechanism for selecting observations as you read or write a data set. The WHERE clause supports many operators, including the IN operator, which enables you to compactly specify multiple conditions for a categorical variable.

A common use of the IN operator is to specify a list of US states and territories that should be included or excluded in an analysis. For example, the following DATA step reads the Sashelp.Zipcode data, but excludes zip codes for the states of Alaska ("AK"), Hawaii ("HI"), and US territories such as Puerto Rico ("PR"), Guam ("GU"), and so on:

data Lower48;
set Sashelp.Zipcode(where=(   /* exclude multiple states and territories */
    Statecode not in ("AK" "HI" "VI" "GU" "FM" "MP" "MH" "PW"))
    );
run;

WHERE operators in SAS/IML are vectorized

In my previous article about how to use the WHERE clause in SAS/IML, my examples used scalar comparisons such as where(sex="F") to select only females in the data. The SAS/IML language does not support the IN operator, but there is another compact way to include or exclude multiple values. Because SAS/IML is a matrix-vector language, many operations support vector arguments. In particular, the WHERE clause in the SAS/IML language enables you to use the ordinary equal operator (=) and specify a vector of values on the right hand side!

For example, the following statement reads in all US zip codes in the contiguous US and creates a scatter plot of their locations:

proc iml;
excludeList = {"AK" "HI" "PR" "VI" "GU" "FM" "MP" "MH" "PW"};
use Sashelp.Zipcode where(Statecode ^= excludeList);   /* vector equiv of NOT IN */
read all var {X Y "City" "Statecode"};
close;
 
title "Centers of US ZIP Codes";
call scatter(X, Y) group=Statecode option="markerattrs=(size=2)" 
     label={"Longitude" "Latitude"} procopt="noautolegend";
ZIP code locations filtered by a WHERE clause in SAS/IML

The WHERE clause skips observations for which the Statecode variable matches any of the values in the excludeList vector. The scatter plot reveals the basic shape of the contiguous US. You can see that the plot does not display locations from Alaska, Hawaii, or US territories.

String matching operators in the SAS WHERE clause

As long as we're talking about the WHERE clause in SAS, let's discuss some string-matching operators that might not be familiar to some SAS programmers. I'll use SAS/IML for the examples, but these operators are generally supported (in scalar form) in all SAS WHERE clauses. The operators are

  • The "contains" operator (?)
  • The "not contains" operator (^?)
  • The "begins with" operator (=:)
  • The "sounds like" operator (=*)

All these operators are documented in the list of WHERE clause operators in SAS/IML.


WHERE operators in #SAS: string matching and fuzzy matching
Click To Tweet


The "contains" operator (?) and the "not contains" operator (^?) match a substring that appears anywhere in the target character variable.

The "begins with" operator (=:) matches substrings that appear at the beginning of a target variable. For example, the following statements select observations for which the state begins with the letter "B", "C", or "D". (There are no US states that begin with "B.") Notice that the "begin with" operator is also vectorized in SAS/IML:

use Sashelp.Zipcode where(Statecode =: {"B" "C" "D"});  /* =:  "begins with" */
read all var {X Y "City" "Statecode"};
close;
 
u = unique(Statecode);
print u;
t_whereiml1

Fuzzy matching of English words

Perhaps the most unusual operator in the WHERE clause in SAS is the "sounds like" operator (=*), which does "fuzzy matching" of English words. The operator finds English words that are similar to the specified target words by using the SOUNDEX function in SAS. The SOUNDEX function is often used to select different names that sound alike but have different spelling, such as "John" and "Jon" or "Lynn" and "Lynne." In the following WHERE clause, the "sounds like" operator is used to select observations for which the city sounds similar to either "Cary" or "Asheville." The selected observations are plotted in a scatter plot after eliminating duplicate rows.

use Sashelp.Zipcode where(City =* {"Cary" "Asheville"}); /* =*  "sounds like" */
read all var {X Y "City" "Statecode"};
close;
 
start UniqueRows(x);            
   cols = 1:ncol(x);               /* sort by all columns */
   call sortndx(ndx, x, cols);     /* ndx = permutation of rows that sorts matrix */
   uRows = uniqueby(x, cols, ndx); /* locate unique rows of sorted matrix */
   return ( ndx[uRows] );          /* rows in original matrix */
finish;
 
r = UniqueRows(City||Statecode);   /* get row numbers in x for unique rows */
call scatter(X[r], Y[r]) group=Statecode[r] datalabel=City[r]
   option="markerattrs=(symbol=CircleFilled)" procopt="noautolegend";
Citites that sound like Cary or Asheville, filtered by a WHERE clause in SAS/IML

According to the "sounds like" operator, the name "Cary" sounds like "Carey," "Cory," and "Cherry." The name "Asheville" sounds like "Ashville," "Ashfield," and "Ash Flat."

In summary, the WHERE clause in SAS/IML works a little differently than the more-familiar version in the SAS DATA step. Both versions enable you to selectively include or exclude observations that satisfy one or more conditions. However, the SAS/IML WHERE clause is vectorized. You can specify a vector of conditions for operators, thus reproducing the functionality of the IN operator.

This article also demonstrates a few lesser-known string operators, such as "contains" (?), "not contains" (^?), "begins with" (=:), and "sounds like" (=*).

tags: SAS Programming

The post WHERE operators in SAS: Multiple comparisons and fuzzy matching appeared first on The DO Loop.

9月 112016
 

wuss2016logoThe Western Users of SAS Software 2016 conference is over.  I have been to a lot of SAS conferences, but WUSS is always my favorite because it is big enough for me to learn a lot, but small enough to be really friendly.  I’m already excited about next year’s conference in Long Beach, September 20-22.

If you missed my presentations this year or if you just want a written version, click the links below to download them.

How SAS Thinks: SAS Basics I

Introduction to DATA Step Programming: SAS Basics II

Introduction to SAS Procedures: SAS Basics III

SAS Studio: A New Way to Program in SAS presented by Lora Delwiche.

Errors, Warnings and Notes (Oh My): A Practical Guide to Debugging SAS Programs presented by Lora Delwiche.

 

 

 


8月 252016
 

First, if you landed on this topic because you encountered this SAS message:

ERROR 180-322: Statement is not valid or it is used out of proper order.

...then I'll tell you right now: you've probably left off a semicolon in one of your SAS statements. If you're lucky, the SAS log will show you where it happened by highlighting the offending line with the "180" error number:

create table f as select * from sashelp.class;
    ______
    180

Punctuation is important in any language. (Recommended reading: Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation.) It's especially important in a programming language like SAS, where semicolons are the only way that SAS can determine where one instruction ends and the next one begins.

So why doesn't SAS just say, "Hey buddy -- you're missing a semicolon!" ? The reason is that there are other potential causes for this message. For example, if you drop in a statement that SAS just doesn't understand -- or doesn't understand in the current context -- then the error message "Statement is not valid or it is used out of proper order" pretty much says it all.

But I was curious about the error number: 180-322. Where did that come from? To find out, I had to step into the SAS Wayback Machine.

Error numbers are historical

At SAS, the "Wayback Machine" is personified in Rick Langston, a 35+ year SAS employee who is the main steward of the SAS programming language. I asked Rick about the origin of the error number 180-322, and he immediately logged in to SAS 82.4 to check the behavior.

That's right: He ran SAS 82.4 -- as in, the version of SAS that was available in 1982. He didn't even have to climb into his DeLorean and drive 88MPH. We actually have SAS 82.4 running on a mainframe at SAS. Here's the log from Rick's syntax error test:

1       S A S   L O G    OS SAS 82.4         MVS/XA JOB SAS824   STEP SAS      
NOTE: THE JOB SAS824 HAS BEEN RUN UNDER RELEASE 82.4 OF SAS AT SAS INSTITUTE DEV
NOTE: CPUID   VERSION = 00  SERIAL = 035EA6  MODEL = 2964 .                     
NOTE: SAS OPTIONS SPECIFIED ARE:                                                
       SORT=4                                                                   
1          DATA TEMP; X=1; BLAH;                                                
                           ____                                                 
ERROR:                     180                                                  
180:  STATEMENT IS NOT VALID OR IT IS USED OUT OF PROPER ORDER.                 

While Rick couldn't tell me why the number was set to 180 originally, it's clear why it's there today: legacy. Automated processes depend on error codes, and you can't go changing those error codes after a world of SAS users begin to rely on them. Maybe the "180" is a reference to "180 degrees," as in "turn around and look behind you: you forgot a semicolon!"

The second part of the error code, "322", indicates a grouping of related syntax error messages. Here is a sample of other messages that you'll encounter with the -322 suffix:

75-322 Syntax error, expecting one of the following: <expected keywords>
76-322 Syntax error, statement will be ignored.
77-322 The statement is being ignored.
181-322 Procedure name misspelled.
216-322 No simple repair for the syntax error. The statement is being ignored.

That last one is my favorite but I've never seen it in action. I wonder what sequence of statements would coax the "No simple repair" message into your SAS log? If you can make it happen, let me know in the comments. (It sounds like my approach when my family asks me to fix something around the house. "No simple repair -- so IGNORE." Not recommended, by the way.)

Where to learn more about ERROR and WARNING messages

If you're just getting started with SAS programming, it's a good idea to learn how to interpret the SAS log messages. Here are some papers that help:

tags: ERROR 180-322, SAS programming, SAS wayback machine

The post ERROR 180-322: The story of an error message appeared first on The SAS Dummy.

8月 122016
 

A colleague approached me with this very important business problem:

Every Friday at SAS HQ, SAS cafe staff provides a breakfast goodie in our breakrooms. Often the supplied goodie is delicious, but sometimes it's more...well...healthy. I want to know whether I should eat my breakfast before I leave home on Friday morning, or if it's better to save my calories for the breakfast goodie at SAS. Can I write a SAS program to send me a text message on Friday morning with the "goodie news" so I can make an informed decision?

Find the data source

bgoodieSAS HQ publishes the cafeteria menus on our intranet each day, and on Thursday the published menu includes an entry for the planned Friday goodie. Thus, we know that this "data" is available somewhere. My colleague had the idea of using FILENAME URL or PROC HTTP to have SAS read the cafe's menu web page and parse the details about the breakfast goodie. That's one way to do it, but here's something that I know about web pages: they all have source files somewhere that are used to generate the content you see in the browser. If you can gain access to that source file instead of going through the web server, you'll have a much simpler process.

With a little spelunking on the SAS network, I found the source files that feed the cafe menus. ("Feed" -- see what I did there?) The file with the goodie news is an HTML file that is named with a predictable date stamp. Here's the name for today: "../menus/DailyMenus/BreakfastWeb20160812.html". The source file looks like this:

<!-- Published on Thursday, August 11, 2016 01:02 PM-->
<h2 class="menuHeader">Friday Breakfast Goodie</h2>
<div class="entry fridayGoodie"><div class="menuLeft"><div class="menuLeftContent">  
<span class="entryName">Blueberry Almond Crunch Teacake</span>
</div></div></div>

We want just the goodie name. With a file reference to the menu file, we can easily parse this out with SAS. After this step, the goodie text is stored in the GOODIE macro variable.

/* Assuming this is run on Friday */
%let day = %sysfunc(compress(%sysfunc(today(),yymmdd10.),'-'));
filename bg "../menus/DailyMenus/BreakfastWeb&day..html";
data _null_;
  infile bg dsd;
  length line $ 80 goodie $ 40;
  input line;
  /* Line with the goods */
  if (find(line,'span class="entryName"') >0) then do;
    startpos = find(line, '>');
	endpos=find(line, '</');
	goodie = substr(line,startpos+1,endpos-startpos-1);
	call symput('GOODIE',goodie);
  end;
run;

Sending the SMS message with SAS

Finally, we're getting to the part of this article that you probably came to read: how to send the text message with SAS. This feels a bit like cheating, but there isn't any magic function in SAS that sends SMS messages. Instead, we're going to rely on a mobile phone service trick. Most phone service providers allow you to send text messages via e-mail by using a special address scheme for the text message recipient. Each carrier is a little different, but you can find the details with a simple internet search: "sms via email".

My current carrier is AT&T, and so to receive a text message as e-mail I would send it to the address my_number@txt.att.net. With the FILENAME EMAIL method, it's easy to send an e-mail using SAS. The trickiest part might be to find your SMTP server host name and other details. Using a service like Gmail? I've written a blog post about how to send e-mail with that method. Note: as with all phone text messages, text-message charges from your carrier may apply.

options emailhost='mailserver.company.com' emailsys=smtp;
 
/* NOT my real phone number */
filename msg email to="9198675309@txt.att.net" 
  FROM = "Cafe Bot <youremail@company.com>"
  subject="Breakfast goodie: &Goodie.";
 
data _null_;
 file msg;
 put 'Bon appetit!';
run;

Here's an example of the text message from my phone:

bgoodiesms
"Blueberry Almond Crunch Teacake" - yum! I'm skipping the Cheerios this morning and taking breakfast at work instead.

Scheduling the SAS job to run automatically

This entire process is valuable only if we can run it unattended, without having to remember to trigger it every Friday morning. After all, if you can log in to run a SAS job that sends yourself a text message, you can (probably more easily) just check the breakfast menu for the day. So my colleague scheduled this program to run early every Friday morning at SAS using a cron job, the ubiquitous scheduler on UNIX. I imagine that his crontab -l output looks something like:

00 05 * * 5 /usr/local/bin/sas -nodms -sysin '/u/userid/food/breakfastgoodie.sas'

That sets up the job to run at 05:00 on day 5 (Friday), running SAS with this program as input. In the immortal words of Ron Popeil: Set it, and forget it!

tags: FILENAME EMAIL, SAS programming, SMS

The post How to send a text message with SAS appeared first on The SAS Dummy.

8月 082016
 

SAS formats are flexible, dynamic, and have many uses. For example, you can use formats to count missing values and to change the order of a categorical variable in a table or plot. Did you know that you can also use SAS formats to bin a numerical variable into categories? This can be very convenient because you do not need to create a new variable in the data set; you merely apply a format to an existing variable.

Income categories: Are you middle class?

Many people use several IF-THEN/ELSE statements in the DATA step (or in a DATA step view) to create a new discrete variable that represents binned values of a continuous variable. That is a fine technique, but an alternative technique is to create a user-defined format that bins a continuous variable. One advantage of a custom format is that you can apply the format to multiple variables in multiple data sets.

For example, suppose that you want to define income categories such as "working class," "middle class," and the ultra-rich "1 percenters." According to a 2012 Money magazine article, the following cut points divide US household income into seven categories:

/* 2012 income categories for US according to Money magazine */
proc format;
value IncomeFmt  
      low   -<  23000 = "Poverty"        /* < 23 thousand         */
      23000 -<  32500 = "Working"        /* [ 23,  32.5) thousand */
      32500 -<  60000 = "Lower Middle"   /* [ 32.5, 60) thousand  */
      60000 -< 100000 = "Middle"         /* [ 60, 100) thousand   */
     100000 -< 150000 = "Upper Middle"   /* [100, 150) thousand   */
     150000 -< 250000 = "5 Percenter"    /* [150, 250) thousand   */
     250000 -   high  = "1 Percenter";   /* > 250 thousand        */
run;

The call to PROC FORMAT creates a custom format called IncomeFmt. When you assign the IncomeFmt format to a numerical variable, SAS will look at the value of each observation and determine the formatted value from the raw value. For example, a value of 18,000 is less than 23,000, so that value is formatted as "Poverty." A value of 85,000 is in the half-open interval [60000, 100000), so that value is formatted as "Middle."

The following DATA step defines the incomes for 26 fictitious individuals. The IncomeFmt format is assigned to the Income variable:

data incomes;
length Name $10.;
input Name Income @@;
format Income IncomeFmt.;     /* assign IncomeFmt format to Income variable */
datalines;
Amy        65100 Brad      146500 
Carlos    113300 Dimtri     28800 
Eduardo   233300 Felicity   14600 
Guo        43400 Hector    141700 
Irene      53400 Jacob     170300 
Katerina   43100 Liu        66800 
Michael    15800 Nancy      30900 
Oscar      31800 Pablo      65800 
Quentin    40000 Rick       62200 
Stephan    32900 Tracy      64000 
Umberto   124000 Victoria  220700 
Wanda     263800 Xie         9300 
Yolanda    23400 Zachary    93800 
;

The Income variable is a continuous variable, but the format bins each value into one of seven discrete values. Consequently, SAS procedures can analyze the Income variable as if it were a discrete variable. For example, you can count the number of individuals in each income category by calling PROC FREQ:

proc freq data=incomes; 
   tables Income / nocum norow nocol;
run;
t_formatbin1

Assigning or unassigning formats at run time

The underlying data values are not lost. You can use a FORMAT statement in a SAS procedure to temporarily assign or unassign a format. If you remove the format, you can analyze the underlying raw data. For example, the following call to PROC UNIVARIATE analyzes the raw incomes:

proc univariate data=incomes;
   format Income;     /* remove the format for this analysis */
   var Income;
run;

In a similar way, if you specify the Income variable on a CLASS statement in a regression procedures, the formatted values are used for the analysis. However, if you do NOT include it on the CLASS statement, then the variable is treated as a continuous variable and the unformatted values are used.

Subset data by using formatted values

If you run PROC PRINT on the income data, it LOOKS like the Income variable is a character variable. Furthermore, it is analyzed like a character variable when used in some SAS procedures such as PROC FREQ. Consequently, you might forget that the Income variable is actually numeric. However, if you treat Income as a character variable in the DATA set or a WHERE clause, then SAS will report an error. For example, the following WHERE clause is incorrect:

proc print data=incomes; 
where Income in ("5 Percenter", "1 Percenter"); /* WRONG: Income is numeric */
run;
ERROR: WHERE clause operator requires compatible variables.

SAS reports an error because the WHERE clause cannot compare the raw (numeric) values of the Income variable with elements of a set that contains two strings. When you see an error message like this, use PROC CONTENTS to investigate the attributes of the variable:

ods select Position;
proc contents data=incomes order=varnum; run;
t_formatbin2

The output from PROC CONTENTS informs you that the Income variable is numeric and displays the name of the format that is attached to it.

If you know the cutoff values that are used for the format, you could create a valid WHERE clause that uses numeric values: where Income GE 150000. However, usually it makes more sense to create a valid WHERE clause by using the PUT statement to apply the format to the raw data and compare formatted values:

/* use formatted values to subset data */
proc print data=incomes; 
where put(Income, IncomeFmt.) in ("5 Percenter", "1 Percenter");
run;
t_formatbin3

You can use other DATA step functions when constructing WHERE clauses. A typical example is when a variable is a SAS date. For example, the Sashelp.Air data set contains a variable named Date. You can use the following WHERE clause to analyze the subset of data that corresponds to the month of January in years prior to 1955:

proc print data=Sashelp.Air;
where month(date)=1 and year(date)<1955;  /* all January dates prior to 1955 */
run;

Summary

As shown in this article, SAS formats are very useful and flexible:

  • You can use a custom format to bin a continuous variable into categories.
  • Within a SAS procedure, you can temporarily assign or unassign a format to change the way that the data are analyzed.
  • The WHERE clause looks at raw data values, so use the PUT function in a WHERE clause if you want to subset the data according to the formatted values.
tags: Getting Started, SAS Programming

The post Use SAS formats to bin numerical variables appeared first on The DO Loop.

7月 202016
 

I'm addicted to you.
You're a hard habit to break.
Such a hard habit to break.
—  Chicago, "Hard Habit To Break"

Habits are hard to break. For more than 20 years I've been putting semicolons at the end of programming statements in SAS, C/C++, and Java/Javascript. But lately I've been working in a computer language that does not require semicolons. Nevertheless, my fingers have a mind of their own, and I catch myself typing unnecessary semicolons out of habit.

I started thinking about superfluous statements in the SAS language. Some programmers might argue that if the program still runs correctly, then unnecessary statements are inconsequential. However, as a general rule I think it is a good programming practice to avoid writing unnecessary statements.

Here are a few example of unnecessary SAS statement. Can you think of more?

A RUN statement after a DATALINES statement

The doc for the DATALINES statement in the SAS DATA step states: "The DATALINES statement is the last statement in the DATA step. Use a null statement (a single semicolon) to indicate the end of the input data." In other words, you do not need a RUN statement after the semicolon to run the DATA step. The following example runs correctly and creates a data set:

data A;
input x @@;
datalines;
1 2 3 4 5 6
;                              /* <== no RUN statement required */

How many times have you seen a RUN statement after a DATALINES statement? Countless! I've even seen examples in the SAS documentation that use this unnecessary statement.

A semicolon after a macro call

If you define a macro that contains a complete set of valid SAS statements, you do not need another semicolon when you call the macro. For example, the following example is valid:

%macro TOP(dsname);
   proc print data=&dsname(obs=5); run;
%mend;
 
%TOP(sashelp.class)            /* <== no semicolon required */

It's not a big deal if you type the semicolon because a semicolon is the null statement. It has no performance implications. But for some reason it bothers me when I catch myself doing it.

A RUN statement in a fully interactive procedure

In a fully interactive procedure, statements are executed immediately. The RUN statement has no effect. Examples include PROC IML, PROC SQL, and PROC OPTMODEL. You use the QUIT statement to exit these procedures, which means that the RUN statement is never needed. The following program is correct and runs three statements. In interactive mode, each statement gets run when the SAS parser reaches the semicolon that ends the statement.

proc sql;
create table Example (x num, y num);          /* statement 1 */
insert into Example
   values(1, 2)  values(3, 4)  values(5, 6);  /* statement 2 */
select *
   from Example;                              /* statement 3 */
/* no RUN statement required! */

A RUN statement in (some) procedures that support RUN-group processing

Some SAS procedures are partly interactive. Procedures such as PROC DATASETS, PROC REG, and PROC GLM support RUN-group processing. For these procedures, the RUN statement defines blocks of statements that get executed, but the procedure remains running until it encounters a QUIT statement.

Many SAS/STAT procedures interpret QUIT to mean "run the most recent statements and then quit." For these procedures, you do not need a RUN statement before you call QUIT. The following statements run a regression analysis and then quit the procedure:

proc glm data=sashelp.class;
model weight = height;
quit;    /* <== No RUN statement; Runs previous statements, then quits */

Unfortunately, SAS procedures are not completely consistent in implementing the QUIT statement. In some SAS procedures the QUIT statement means "ignore the most recent statements and quit." The canonical examples are the traditional SAS/GRAPH procedures such as PROC GPLOT. In the following program, the first PLOT statement creates a scatter plot because it is followed by a RUN statement. However, the second plot statement is not followed by a RUN statement, so it is ignored and the second plot is not produced.

proc gplot data=sashelp.class;
   plot weight*height;
run;     /* <== executes previous PLOT statement; does not quit */
   plot weight*age;
quit;    /* <== ignores previous PLOT statement, then quits */
/* use RUN; QUIT; to produce the second plot */

If you aren't sure how a procedure behaves with regards to RUN-group processing, it is always safe to use the RUN and QUIT statements in tandem.

When to include optional statements?

The previous sections describe unnecessary statements that I like to skip. However, sometimes I include optional statements in my programs for clarity, readability, or to practice defensive programming.

SAS supports many optional statements. When you omit an optional statement, the procedure does some default behavior. For example, if you omit the VAR statement, most procedures runs on all numerical variables (for example, PROC MEANS) or on all variables (for example, PROC PRINT). When I want the default behavior, I skip the VAR statement.

Another statement that is technically optional is the RUN statement for a sequence of procedures. Because the next call to a procedure or DATA step will always end the previous procedure, you can technically omit the RUN statement for all but the last procedure. This means that the following program is valid, although I do not recommend this style of programming:

data class;
   set sashelp.class(where=(sex='M'));  /* 'class' is the _LAST_ data set */
proc means;                             /* DATA= _LAST_ */
proc print;                             /* DATA= _LAST_ */
run;

If I'm feeling lazy, I might write these statement during the early exploratory phase of a data analysis. However for serious work I terminate every procedure by using a RUN or QUIT statement. Skipping a RUN statement can lead to undesirable interactions with global statements such as the TITLE statement and ODS statements.

Your thoughts?

There is much more that can be said about these topics. What are your thoughts?

  • Are there unnecessary statements that you write out of habit?
  • Are there optional statements that you always include because it makes the program clearer?
tags: Getting Started, SAS Programming

The post Do you write unnecessary SAS statements? appeared first on The DO Loop.

7月 122016
 

I've been working on a SAS program that can add content to the SAS Support Communities (more on that in a future post). Despite my 20+ years of SAS experience, there are a lot of SAS programming tricks that I don't know. Or that I use so infrequently that I always need to remind myself how to accomplish them.

Here's one. I needed to read the contents of an external text file into a SAS macro variable, so that I could then use that value (a very long text string) as part of an API call. In searching for a technique that would work for me, I came across a similar question on SAS Support Communities -- one that had been solved by our resident SASJedi, Mark Jordan. Perfect!

Here's the solution that worked for me:

FILENAME msghtml "path-to-text-file" ;
data _null_;
   length text $32767;
   retain text '';
   infile msghtml flowover dlmstr='//' end=last;
   input;
   text=cats(text,_infile_);
   if last then call symput('MSGBODY',text);
run;

The RETAIN statement allows me to build up the "text" variable as the DATA step processes multiple lines. The END=last on the INFILE statement sets a flag when we hit end-of-file, so I know that we're done and I can CALL SYMPUT the macro value. The FLOWOVER option tells the INPUT statement to keep reading even if no input values are found in the current record. (FLOWOVER is the default behavior, so the option probably isn't needed here.) DLMSTR allows you to specify a multichar delimiter string that's different than the default delimiter (a space character). We're using the CATS function to concatenate a trimmed version of the input buffer (_INFILE_) to the RETAINed "text" variable.

For my project I needed to URL-encode the text value for use in an HTTP-based REST API. So for me, the last line is really:

if last then call symput('MSGBODY',urlencode(trim(text)));

The SAS Support Communities has been a big help to me during this project -- a project that is designed to improve the communities even more. It's a virtuous cycle! I hope that this helps some of you out there, too.

tags: macro programming, SAS Communities, SAS programming

The post How to read the contents of a file into a SAS macro variable appeared first on The SAS Dummy.