sas programming

5月 152017
 

The little SAS program’s official name was Extract_Transform_Load_ 0314.sas.  But, that name was much too formal, way too long, and did not roll off of the tongue very easily at all.  So, everybody simply called her:  ETL Pi. ETL Pi was conceived in a 2-hour project strategy meeting in conference [...]

The post The Little SAS Program’s Big Night Out appeared first on SAS Learning Post.

5月 152017
 

Last week I showed a timeline of living US presidents. The number of living presidents is constant during the time interval between inaugurations and deaths of presidents. The data was taken from a Wikipedia table (shown below) that shows the number of years and days between events. This article shows how you can use the INTCK and INTNX functions in SAS to compute the time between events in this format. In particular, I use two little-known options to these functions that make this task easy.

Intervals between dates

If you are computing the interval between two dates (a start date and an end date) there are two SAS functions that you absolutely must know about.

  • The INTCK function returns the number of time units between two dates. For the time unit, you can choose years, months, weeks, days, and more. For example, in my previous article I used the INTCK function to determine the number of days between two dates.
  • The INTNX function returns a SAS date that is a specified number of time units away from a specified date. For example, you can use the INTNX function to compute the date that is 308 days in the future from a given date.

These two functions complement each other: one computes the difference between two dates, the other enables you to add time units to a date value.

By default, these functions use the number of "calendar boundaries" between the dates, such as the first day of a year, month, or week. For example, if you choose to measure year intervals, the INTCK function counts how many times 01JAN occurred between the dates, and the INTNX function returns a future 01JAN date. Similarly, if you measure month intervals, the INTCK function counts how many first-of-the-months occur between two dates, and the INTNX function returns a future first-of-the-month date.

Options to compute anniversary dates

Both functions support many options to modify the default behavior. If you want to count full year intervals, instead of the number of times people celebrated New Year's Eve, these function support options (as of SAS 9.2) to count the number of "anniversaries" between two dates and to compute the date of a future anniversary. You can use the 'CONTINUOUS' option for the INTCK function and the 'SAME' option for the INTNX function, as follows:

  • The 'CONTINUOUS' option in the INTCK function enables you to count the number of anniversaries of one date that occur prior to a second date. For example, the statement
    Years = intck('year', '30APR1789'd, '04MAR1797'd, 'continuous');
    returns the value 7 because there are 7 full years (anniversaries of 30APR) between those two dates. Without the 'CONTINUOUS' option, the function returns 8 because 01JAN occurs 8 times between those dates.
  • The statement
    Anniv = intnx('year', '30APR1789'd, 7, 'same');
    returns the 7th anniversary of the date 30APR1789. In other words, it returns the date value for 30APR1796.

The beauty of these functions is that they automatically handle leap years! If you request the number of days between two dates, the INTCK function includes leap days in the result. If an event occurs on a leap day, and you ask the INTNX function for the next anniversary of that event, you will get 28FEB of the next year, which is the most common convention for handling anniversaries of a leap day.

An algorithm to compute years and days between events

The following algorithm computes the number of years and days between dates in SAS:

  • Use the INTCK function with the 'CONTINUOUS' option to compute the number of complete years between the two dates.
  • Use the INTNX function to find a third date (the anniversary date) which is the same month and day as the start date, but occurs less than one year prior to the end date. (The anniversary of a leap days is either 28FEB or 29FEB, depending on whether the anniversary occurs in a leap year.)
  • Use the INTCK function to compute the number of days between the anniversary date and the end date.

The following DATA step computes the time interval in years and days between the first few US presidential inaugurations and deaths. The resulting Year and Day variables contain the same information as is displayed in the Wikipedia table.

data YearDays;
format Date prevDate anniv Date9.;
input @1  Date anydtdte12.
      @13 Event $26.;
prevDate = lag(Date);
if _N_=1 then do;                               /* when _N_=1, lag(Date)=. */
   Years=.; Days=.; return;            /* set years & days, go to next obs */
end;
Years = intck('year', prevDate, Date, 'continuous'); /* num complete years */
Anniv = intnx('year', prevDate, Years, 'same');      /* most recent anniv  */
Days = intck('day', anniv, Date);                    /* days since anniv   */
datalines;
Apr 30, 1789 Washington Inaug
Mar 4, 1797  J Adams Inaug
Dec 14, 1799 Washington Death
Mar 4, 1801  Jefferson Inaug
Mar 4, 1809  Madison Inaug
Mar 4, 1817  Monroe Inaug
Mar 4, 1825  JQ Adams Inaug
Jul 4, 1826  Jefferson Death
Jul 4, 1826  J Adams Death
run;
 
proc print data=YearDays;
var Event prevDate Date Anniv Years Days;
run;

Summary and references

In summary, the INTCK and INTNX functions are essential for computing intervals between dates. In this article, I emphasized two little-known options: the 'CONTINUOUS' option in INTCK and the 'SAME' option in INTNX. By using these options, you can to compute the number of anniversaries between dates and the most recent anniversary. Thus you can compute the years and days between two dates.

There have been countless articles and papers written about SAS dates and finding intervals between dates. I recommend the following articles:

Lastly, do you know what the acronyms INTCK and INTNX stand for? Obviously the 'INT' part refers to INTervals. The general consensus is that 'INTCK' stands for 'Interval Check' and 'INTNX' stands for "Interval Next."

The post INTCK and INTNX: Two essential functions for computing intervals between dates in SAS appeared first on The DO Loop.

4月 252017
 

When developing SAS® data sets, program code and/or applications, efficiency is not always given the attention it deserves, particularly in the early phases of development. Since data sizes and system performance can affect a program and/or an application’s behavior, SAS users may want to access information about a data set’s [...]

The post Determining the size of a SAS data set appeared first on SAS Learning Post.

4月 142017
 

Image of The Little SAS Enterprise Guide BookThere is a new member of The Little SAS Book family: The Little SAS Enterprise Guide Book.

If you are familiar with our other EG books, you may be wondering why this one isn’t called the “Fourth Edition.”  That is because we changed the title slightly.  Our previous EG books were each written for a specific version of EG, and consequently had the version number right in the title.  This book was written using EG 7.1, but it also applies to some earlier versions (5.1 and 6.1).  With a little luck, this book will also apply to future versions.  So it’s a keeper.

I’m very pleased with how this book has turned out.  We updated it so that all the windows and icons match the current EG, and we also added some great new sections.  Even with the new topics, this book is 60 pages shorter than our previous EG book!  It is shorter because we replaced some chapters on specific types of tasks, with a new chapter that explains how tasks work in general.  The result is a book that is easier to read and more useful.

For more information about this book including the table of contents, an excerpt, and reviews, click here.


4月 142017
 

Image of The Little SAS Enterprise Guide BookThere is a new member of The Little SAS Book family: The Little SAS Enterprise Guide Book.

If you are familiar with our other EG books, you may be wondering why this one isn’t called the “Fourth Edition.”  That is because we changed the title slightly.  Our previous EG books were each written for a specific version of EG, and consequently had the version number right in the title.  This book was written using EG 7.1, but it also applies to some earlier versions (5.1 and 6.1).  With a little luck, this book will also apply to future versions.  So it’s a keeper.

I’m very pleased with how this book has turned out.  We updated it so that all the windows and icons match the current EG, and we also added some great new sections.  Even with the new topics, this book is 60 pages shorter than our previous EG book!  It is shorter because we replaced some chapters on specific types of tasks, with a new chapter that explains how tasks work in general.  The result is a book that is easier to read and more useful.

For more information about this book including the table of contents, an excerpt, and reviews, click here.


4月 142017
 

Image of The Little SAS Enterprise Guide BookThere is a new member of The Little SAS Book family: The Little SAS Enterprise Guide Book.

If you are familiar with our other EG books, you may be wondering why this one isn’t called the “Fourth Edition.”  That is because we changed the title slightly.  Our previous EG books were each written for a specific version of EG, and consequently had the version number right in the title.  This book was written using EG 7.1, but it also applies to some earlier versions (5.1 and 6.1).  With a little luck, this book will also apply to future versions.  So it’s a keeper.

I’m very pleased with how this book has turned out.  We updated it so that all the windows and icons match the current EG, and we also added some great new sections.  Even with the new topics, this book is 60 pages shorter than our previous EG book!  It is shorter because we replaced some chapters on specific types of tasks, with a new chapter that explains how tasks work in general.  The result is a book that is easier to read and more useful.

For more information about this book including the table of contents, an excerpt, and reviews, click here.


4月 132017
 
The Penitent Magdalene

Titian (Tiziano Vecellio) (Italian, about 1487 - 1576) The Penitent Magdalene, 1555 - 1565, Oil on canvas 108.3 × 94.3 cm (42 5/8 × 37 1/8 in.) The J. Paul Getty Museum, Los Angeles; Digital image courtesy of the Getty's Open Content Program.

Even if you are a traditional SAS programmer and have nothing to do with cybersecurity, you still probably have to deal with this issue in your day-to-day work.

The world has changed, and what you do as a SAS programmer is not just between you and your computer anymore. However, I have found that many of us are still careless, negligent or reckless enough to be dangerous.

Would you scotch-tape your house key to the front door next to the lock and go on vacation? Does anybody do that? Still, some of us have no problem explicitly embedding passwords in our code.

That single deadly sin, the thing that SAS programmers (or any programmers) must not do under any circumstances, is placing unmasked passwords into their code. I must confess, I too have sinned, but I have seen the light and hope you will too.

Password usage examples

Even if SAS syntax calls for a password, never type it or paste it into your SAS programs. Ever.

If you connect to a database using a SAS/ACCESS LIBNAME statement, your libname statement might look like:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password=mypwd1;

If you specify the LIBNAME statement options for the metadata engine to connect to the metadata server, it may look like:

libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw=abcdefg;

If you use LIBNAME statement options for the metadata engine to connect to a database, it may look like:

libname oralib meta library=oralib dbuser=orauser dbpassword=orapw;

In all of the above examples, some password is “required” to be embedded in the SAS code. But it does not mean you should put it there “as is,” unmasked. SAS program code is usually saved as a text file, which is stored on your laptop or somewhere on a server. Anyone who can get access to such a file would immediately get access to those passwords, which are key to accessing databases that might contain sensitive information. It is your obligation and duty to protect this sensitive data.

Hiding passwords in a macro variable or a macro?

I’ve seen some shrewd SAS programmers who do not openly put passwords in their SAS code. Instead of placing the passwords directly where the SAS syntax calls for, they assign them to a macro variable in AUTOEXEC.SAS, an external SAS macro file, a compiled macro or some other SAS program file included in their code, and then use a macro reference, e.g.

/* in autoexec.sas or external macro */
%let opw = mypwd1;
 
/* or */
 
%macro opw;
	mypwd1
%mend opw;
/* in SAS program */
libname mydblib oracle user=myusr1 password=&opw
        path=airdb_remote schema=hrdept;
 
/* or */
 
libname mydblib oracle user=myusr1 password=%opw
        path=airdb_remote schema=hrdept;

Clever! But it’s no more secure than leaving your house key under the door mat. In fact it is much less secure. One who wants to look up your password does not even need to look under the door mat, oh, I mean look at your program file where the password is assigned to a macro variable or a macro. For a macro variable, a simple %put &opw; statement will print the password’s actual value in the SAS log. In case of a macro, one can use %put %opw; with the same result.

In other words, hiding the passwords do not actually protect them.

What to do

What should you do instead of openly placing or concealing those required passwords into your SAS code? The answer is short: encrypt passwords.

SAS software provides a powerful procedure for encrypting passwords that renders them practically unusable outside of the SAS system.

This is PROC PWENCODE, and it is very easy to use. In its simplest form, in order to encrypt (encode) your password abc123 you would need to submit just the following two lines of code:

proc pwencode in="abc123";
run;

The encrypted password is printed in the SAS log:

1 proc pwencode in=XXXXXXXX;
2 run;

{SAS002}3CD4EA1E5C9B75D91A73A37F

Now, you can use this password {SAS002}3CD4EA1E5C9B75D91A73A37F in your SAS programs. The SAS System will seamlessly take care of decrypting the password during compilation.

The above code examples can be re-written as follows:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password="{SAS002}9746E819255A1D2F154A26B7";
 
libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw="{SAS002}9FFC53315A1596D92F13B4CA";
 
libname oralib meta library=oralib dbuser=orauser
dbpassword="{SAS002}9FFC53315A1596D92F13B4CA";

Encryption methods

The {SAS002} prefix indicates encoding method. This SAS-proprietary encryption method which uses 32-bit key encryption is the default, so you don’t have to specify it in the PROC PWENCODE.

There are other, stronger encryption methods supported in SAS/SECURE:

{SAS003} – uses a 256-bit key plus 16-bit salt to encode passwords,

{SAS004} – uses a 256-bit key plus 64-bit salt to encode passwords.

If you want to encode your password with one of these stronger encryption methods you must specify it in PROC PWENCODE:

proc pwencode in="abc123" method=SAS003;
run;

SAS Log: {SAS003}50374C8380F6CDB3C91281FF2EF57DED10E6

proc pwencode in="abc123" method=SAS004;
run;

SAS Log: {SAS004}1630D14353923B5940F3B0C91F83430E27DA19164FC003A1

Beyond encryption

There are other methods of obscuring passwords to protect access to sensitive information that are available in the SAS Business Intelligence Platform. These are the AUTHDOMAIN= SAS/ACCESS option supported in LIBNAME statements, as well as PROC SQL CONNECT statements, SAS Token Authentication, and Integrated Windows Authentication. For more details, see Five strategies to eliminate passwords from your SAS programs.

Conclusion

Never place unencrypted password into your SAS program. Encrypt it!

Place this sticker in front of you until you memorize it by heart.

PROC PWENCODE sticker

 

One deadly sin SAS programmers should stop committing was published on SAS Users.

4月 132017
 

I recently asked a SAS user, “Which interface do you use for SAS?” She replied, “Interface? I just install SAS and use it.” “You’re using the SAS windowing environment,” I explained, but she had no idea what I was talking about. This person is an extremely sophisticated SAS user who [...]

The post What’s your SAS interface? appeared first on SAS Learning Post.

3月 272017
 

Did you know that you can check a SAS macro variable to see if ODS graphics is enabled? The other day I wanted to write a SAS program that creates a graph only if ODS graphics is enabled. The solution is to check the SYSODSGRAPHICS macro variable, which is automatically updated by the SAS system whenever ODS graphics is enabled or disabled. (Thanks to Warren Kuhfeld for this tip!) The SYSODSGRAPHICS variable has the value 0 for "off" and 1 for "on."

For example, the following SAS/IML program checks to see if ODS graphics is on. If so, it creates a bar chart of a categorical variable. If not, it computes the count for each category and prints a frequency table:

ods graphics on;         /* -OR-  ods graphics off */
proc iml;
use sashelp.cars;  read all var "Origin";  close;
 
if &SYSODSGRAPHICS then  /* is ODS graphics enabled? */
   call bar(Origin);     /* optionally display a graphical summary */
/* always display a tabular summary */
call tabulate(Categories, Counts, Origin);
print Counts[colname=Categories];

The SYSODSGRAPHICS automatic macro variable is a recent addition to the automatic macro variables in SAS. You can read Rick Langston's 2015 paper to discover other (relatively) new macro features in SAS 9.3 and SAS 9.4.

SAS has many other automatic macro variables. In general, you can use automatic macro variables in SAS to check the status of the SAS system at run time. Some of my other favorite automatic macro variables (which some people call system macro variables) are the following:

  • SYSERR, SYSERRORTEXT, and SYSWARNINGTEXT: Provide information about errors and warnings from SAS procedures. In my book Statistical Programming with SAS/IML I show how to use these macro variables in conjunction with the SUBMIT/ENDSUBMIT statements in SAS/IML.
  • SYSVER and SYSVLONG: Provide information about your version of SAS. You can use these macro variables to execute newer, more efficient, code if the user has a recent version of SAS.
  • SYSSCP and SYSSCPL: Provide information about the operating system. For example, is SAS running on Windows or Linux?

Do you have a favorite automatic variable? Leave a comment and tell me how you use automatic variables in your SAS programs.

The post Is ODS graphics enabled? Use automatic macro variables to determine the state of SAS appeared first on The DO Loop.

3月 202017
 

SAS formats are very useful and can be used in a myriad of creative ways. For example, you can use formats to display decimal values as a fraction. However, SAS supports so many formats that it is difficult to remember details about the format syntax, such as the default field width. I often use the "Formats by Category" page in the SAS documentation to look up the range of valid values of the field width (w) and decimal places (d) that are associated with a format such as PERCENTw.d or DATETIMEw.d. (Recall that the field width specifies the width for the formatted output.)

The documentation provides the minimum, maximum, and default values of the field width, but did you know that you can discover these value programmatically? In SAS 9.4m3 you can call the FMTINFO function, which provide information about SAS formats and informats. The FMTINFO function takes two arguments (the name of a format or informat, and a keyword) and returns a character value. For brevity, I will refer to the first argument as the "format," even though the function also supports informats.

You can use the FMTINFO function to create a personalized "cheat sheet" of the formats/informats that you use most often. The following SAS DATA step uses the FMTINFO function to retrieve information about SAS formats, including a short description, default parameter values, and the minimum and maximum values of the width and decimal parameters. You can modify the DATALINES statement to produce a table for your favorite formats.


Create a "cheat sheet" of your favorite #SAS formats.
Click To Tweet


data FormatInfo;
length Name $9. Type $8. Category $4. Desc $40. 
       DefW $5. MinW $5. MaxW $5. DefD $2. MinD $2. MaxD $2.; 
input Name @@;
Category = fmtinfo(Name, "Cat");  /* numeric, character, date, ... */
Type = fmtinfo(Name, "Type");     /* format, informat, or both */
Desc = fmtinfo(Name, "Desc");     /* short description of the format */
DefW = fmtinfo(Name, "DefW");     /* default width if you omit w. Example: BEST. */
MinW = fmtinfo(Name, "MinW");     /* minimum width */
MaxW = fmtinfo(Name, "MaxW");     /* maximum width */
DefD = fmtinfo(Name, "DefD");     /* default decimal digits */
MinD = fmtinfo(Name, "MinD");     /* minimum decimal digits */
MaxD = fmtinfo(Name, "MaxD");     /* maximum decimal digits */
datalines;
ANYDTDTE BEST   
DATETIME DOLLAR 
FRACT    PERCENT
PVALUE   $UPCASE
;
 
proc print data=FormatInfo noobs;
   var Name Type Category Desc;
run;
 
proc print data=FormatInfo noobs;
   var Name DefW MinW MaxW DefD MinD MaxD;
run;
Attributes of SAS formats as output by the FMTINFO function
Field width and decimal places of SAS formats as output by the FMTINFO function

The first table shows the name of a few SAS formats and informats. The TYPE column shows whether the name is a format, an informat, or both. The CATEGORY column shows the general category of data to which the format applies. The DESC column gives a brief description of the format.

The second table shows default, minimum, and maximum values of the field width (w) and the decimal places (d) that are displayed. The columns that display field width information are the most valuable for me. You can see that the default and minimum field widths vary quite a bit among the formats. In contrast, most of the formats in the table display zero decimal places by default. (The exception is the PVALUE. format, which displays numbers between 0 and 1.) If the maximum number of decimal places is zero, it means that the format does not support a decimal value. For example, character formats do not support decimal places.

Other than creating a cheat sheet, I don't think that the casual SAS programmer will need this function very often. It seems most useful for advanced applications such as validating input from a GUI, but maybe I'm wrong. What do you think? Do you anticipate using the FMTINFO function in your work? Leave a comment.

The post Discover information about SAS formats... programatically appeared first on The DO Loop.