sas programming

4月 142017
 

Image of The Little SAS Enterprise Guide BookThere is a new member of The Little SAS Book family: The Little SAS Enterprise Guide Book.

If you are familiar with our other EG books, you may be wondering why this one isn’t called the “Fourth Edition.”  That is because we changed the title slightly.  Our previous EG books were each written for a specific version of EG, and consequently had the version number right in the title.  This book was written using EG 7.1, but it also applies to some earlier versions (5.1 and 6.1).  With a little luck, this book will also apply to future versions.  So it’s a keeper.

I’m very pleased with how this book has turned out.  We updated it so that all the windows and icons match the current EG, and we also added some great new sections.  Even with the new topics, this book is 60 pages shorter than our previous EG book!  It is shorter because we replaced some chapters on specific types of tasks, with a new chapter that explains how tasks work in general.  The result is a book that is easier to read and more useful.

For more information about this book including the table of contents, an excerpt, and reviews, click here.


4月 142017
 

Image of The Little SAS Enterprise Guide BookThere is a new member of The Little SAS Book family: The Little SAS Enterprise Guide Book.

If you are familiar with our other EG books, you may be wondering why this one isn’t called the “Fourth Edition.”  That is because we changed the title slightly.  Our previous EG books were each written for a specific version of EG, and consequently had the version number right in the title.  This book was written using EG 7.1, but it also applies to some earlier versions (5.1 and 6.1).  With a little luck, this book will also apply to future versions.  So it’s a keeper.

I’m very pleased with how this book has turned out.  We updated it so that all the windows and icons match the current EG, and we also added some great new sections.  Even with the new topics, this book is 60 pages shorter than our previous EG book!  It is shorter because we replaced some chapters on specific types of tasks, with a new chapter that explains how tasks work in general.  The result is a book that is easier to read and more useful.

For more information about this book including the table of contents, an excerpt, and reviews, click here.


4月 142017
 

Image of The Little SAS Enterprise Guide BookThere is a new member of The Little SAS Book family: The Little SAS Enterprise Guide Book.

If you are familiar with our other EG books, you may be wondering why this one isn’t called the “Fourth Edition.”  That is because we changed the title slightly.  Our previous EG books were each written for a specific version of EG, and consequently had the version number right in the title.  This book was written using EG 7.1, but it also applies to some earlier versions (5.1 and 6.1).  With a little luck, this book will also apply to future versions.  So it’s a keeper.

I’m very pleased with how this book has turned out.  We updated it so that all the windows and icons match the current EG, and we also added some great new sections.  Even with the new topics, this book is 60 pages shorter than our previous EG book!  It is shorter because we replaced some chapters on specific types of tasks, with a new chapter that explains how tasks work in general.  The result is a book that is easier to read and more useful.

For more information about this book including the table of contents, an excerpt, and reviews, click here.


4月 132017
 
The Penitent Magdalene

Titian (Tiziano Vecellio) (Italian, about 1487 - 1576) The Penitent Magdalene, 1555 - 1565, Oil on canvas 108.3 × 94.3 cm (42 5/8 × 37 1/8 in.) The J. Paul Getty Museum, Los Angeles; Digital image courtesy of the Getty's Open Content Program.

Even if you are a traditional SAS programmer and have nothing to do with cybersecurity, you still probably have to deal with this issue in your day-to-day work.

The world has changed, and what you do as a SAS programmer is not just between you and your computer anymore. However, I have found that many of us are still careless, negligent or reckless enough to be dangerous.

Would you scotch-tape your house key to the front door next to the lock and go on vacation? Does anybody do that? Still, some of us have no problem explicitly embedding passwords in our code.

That single deadly sin, the thing that SAS programmers (or any programmers) must not do under any circumstances, is placing unmasked passwords into their code. I must confess, I too have sinned, but I have seen the light and hope you will too.

Password usage examples

Even if SAS syntax calls for a password, never type it or paste it into your SAS programs. Ever.

If you connect to a database using a SAS/ACCESS LIBNAME statement, your libname statement might look like:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password=mypwd1;

If you specify the LIBNAME statement options for the metadata engine to connect to the metadata server, it may look like:

libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw=abcdefg;

If you use LIBNAME statement options for the metadata engine to connect to a database, it may look like:

libname oralib meta library=oralib dbuser=orauser dbpassword=orapw;

In all of the above examples, some password is “required” to be embedded in the SAS code. But it does not mean you should put it there “as is,” unmasked. SAS program code is usually saved as a text file, which is stored on your laptop or somewhere on a server. Anyone who can get access to such a file would immediately get access to those passwords, which are key to accessing databases that might contain sensitive information. It is your obligation and duty to protect this sensitive data.

Hiding passwords in a macro variable or a macro?

I’ve seen some shrewd SAS programmers who do not openly put passwords in their SAS code. Instead of placing the passwords directly where the SAS syntax calls for, they assign them to a macro variable in AUTOEXEC.SAS, an external SAS macro file, a compiled macro or some other SAS program file included in their code, and then use a macro reference, e.g.

/* in autoexec.sas or external macro */
%let opw = mypwd1;
 
/* or */
 
%macro opw;
	mypwd1
%mend opw;
/* in SAS program */
libname mydblib oracle user=myusr1 password=&opw
        path=airdb_remote schema=hrdept;
 
/* or */
 
libname mydblib oracle user=myusr1 password=%opw
        path=airdb_remote schema=hrdept;

Clever! But it’s no more secure than leaving your house key under the door mat. In fact it is much less secure. One who wants to look up your password does not even need to look under the door mat, oh, I mean look at your program file where the password is assigned to a macro variable or a macro. For a macro variable, a simple %put &opw; statement will print the password’s actual value in the SAS log. In case of a macro, one can use %put %opw; with the same result.

In other words, hiding the passwords do not actually protect them.

What to do

What should you do instead of openly placing or concealing those required passwords into your SAS code? The answer is short: encrypt passwords.

SAS software provides a powerful procedure for encrypting passwords that renders them practically unusable outside of the SAS system.

This is PROC PWENCODE, and it is very easy to use. In its simplest form, in order to encrypt (encode) your password abc123 you would need to submit just the following two lines of code:

proc pwencode in="abc123";
run;

The encrypted password is printed in the SAS log:

1 proc pwencode in=XXXXXXXX;
2 run;

{SAS002}3CD4EA1E5C9B75D91A73A37F

Now, you can use this password {SAS002}3CD4EA1E5C9B75D91A73A37F in your SAS programs. The SAS System will seamlessly take care of decrypting the password during compilation.

The above code examples can be re-written as follows:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password="{SAS002}9746E819255A1D2F154A26B7";
 
libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw="{SAS002}9FFC53315A1596D92F13B4CA";
 
libname oralib meta library=oralib dbuser=orauser
dbpassword="{SAS002}9FFC53315A1596D92F13B4CA";

Encryption methods

The {SAS002} prefix indicates encoding method. This SAS-proprietary encryption method which uses 32-bit key encryption is the default, so you don’t have to specify it in the PROC PWENCODE.

There are other, stronger encryption methods supported in SAS/SECURE:

{SAS003} – uses a 256-bit key plus 16-bit salt to encode passwords,

{SAS004} – uses a 256-bit key plus 64-bit salt to encode passwords.

If you want to encode your password with one of these stronger encryption methods you must specify it in PROC PWENCODE:

proc pwencode in="abc123" method=SAS003;
run;

SAS Log: {SAS003}50374C8380F6CDB3C91281FF2EF57DED10E6

proc pwencode in="abc123" method=SAS004;
run;

SAS Log: {SAS004}1630D14353923B5940F3B0C91F83430E27DA19164FC003A1

Beyond encryption

There are other methods of obscuring passwords to protect access to sensitive information that are available in the SAS Business Intelligence Platform. These are the AUTHDOMAIN= SAS/ACCESS option supported in LIBNAME statements, as well as PROC SQL CONNECT statements, SAS Token Authentication, and Integrated Windows Authentication. For more details, see Five strategies to eliminate passwords from your SAS programs.

Conclusion

Never place unencrypted password into your SAS program. Encrypt it!

Place this sticker in front of you until you memorize it by heart.

PROC PWENCODE sticker

 

One deadly sin SAS programmers should stop committing was published on SAS Users.

4月 132017
 

I recently asked a SAS user, “Which interface do you use for SAS?” She replied, “Interface? I just install SAS and use it.” “You’re using the SAS windowing environment,” I explained, but she had no idea what I was talking about. This person is an extremely sophisticated SAS user who [...]

The post What’s your SAS interface? appeared first on SAS Learning Post.

3月 272017
 

Did you know that you can check a SAS macro variable to see if ODS graphics is enabled? The other day I wanted to write a SAS program that creates a graph only if ODS graphics is enabled. The solution is to check the SYSODSGRAPHICS macro variable, which is automatically updated by the SAS system whenever ODS graphics is enabled or disabled. (Thanks to Warren Kuhfeld for this tip!) The SYSODSGRAPHICS variable has the value 0 for "off" and 1 for "on."

For example, the following SAS/IML program checks to see if ODS graphics is on. If so, it creates a bar chart of a categorical variable. If not, it computes the count for each category and prints a frequency table:

ods graphics on;         /* -OR-  ods graphics off */
proc iml;
use sashelp.cars;  read all var "Origin";  close;
 
if &SYSODSGRAPHICS then  /* is ODS graphics enabled? */
   call bar(Origin);     /* optionally display a graphical summary */
/* always display a tabular summary */
call tabulate(Categories, Counts, Origin);
print Counts[colname=Categories];

The SYSODSGRAPHICS automatic macro variable is a recent addition to the automatic macro variables in SAS. You can read Rick Langston's 2015 paper to discover other (relatively) new macro features in SAS 9.3 and SAS 9.4.

SAS has many other automatic macro variables. In general, you can use automatic macro variables in SAS to check the status of the SAS system at run time. Some of my other favorite automatic macro variables (which some people call system macro variables) are the following:

  • SYSERR, SYSERRORTEXT, and SYSWARNINGTEXT: Provide information about errors and warnings from SAS procedures. In my book Statistical Programming with SAS/IML I show how to use these macro variables in conjunction with the SUBMIT/ENDSUBMIT statements in SAS/IML.
  • SYSVER and SYSVLONG: Provide information about your version of SAS. You can use these macro variables to execute newer, more efficient, code if the user has a recent version of SAS.
  • SYSSCP and SYSSCPL: Provide information about the operating system. For example, is SAS running on Windows or Linux?

Do you have a favorite automatic variable? Leave a comment and tell me how you use automatic variables in your SAS programs.

The post Is ODS graphics enabled? Use automatic macro variables to determine the state of SAS appeared first on The DO Loop.

3月 202017
 

SAS formats are very useful and can be used in a myriad of creative ways. For example, you can use formats to display decimal values as a fraction. However, SAS supports so many formats that it is difficult to remember details about the format syntax, such as the default field width. I often use the "Formats by Category" page in the SAS documentation to look up the range of valid values of the field width (w) and decimal places (d) that are associated with a format such as PERCENTw.d or DATETIMEw.d. (Recall that the field width specifies the width for the formatted output.)

The documentation provides the minimum, maximum, and default values of the field width, but did you know that you can discover these value programmatically? In SAS 9.4m3 you can call the FMTINFO function, which provide information about SAS formats and informats. The FMTINFO function takes two arguments (the name of a format or informat, and a keyword) and returns a character value. For brevity, I will refer to the first argument as the "format," even though the function also supports informats.

You can use the FMTINFO function to create a personalized "cheat sheet" of the formats/informats that you use most often. The following SAS DATA step uses the FMTINFO function to retrieve information about SAS formats, including a short description, default parameter values, and the minimum and maximum values of the width and decimal parameters. You can modify the DATALINES statement to produce a table for your favorite formats.


Create a "cheat sheet" of your favorite #SAS formats.
Click To Tweet


data FormatInfo;
length Name $9. Type $8. Category $4. Desc $40. 
       DefW $5. MinW $5. MaxW $5. DefD $2. MinD $2. MaxD $2.; 
input Name @@;
Category = fmtinfo(Name, "Cat");  /* numeric, character, date, ... */
Type = fmtinfo(Name, "Type");     /* format, informat, or both */
Desc = fmtinfo(Name, "Desc");     /* short description of the format */
DefW = fmtinfo(Name, "DefW");     /* default width if you omit w. Example: BEST. */
MinW = fmtinfo(Name, "MinW");     /* minimum width */
MaxW = fmtinfo(Name, "MaxW");     /* maximum width */
DefD = fmtinfo(Name, "DefD");     /* default decimal digits */
MinD = fmtinfo(Name, "MinD");     /* minimum decimal digits */
MaxD = fmtinfo(Name, "MaxD");     /* maximum decimal digits */
datalines;
ANYDTDTE BEST   
DATETIME DOLLAR 
FRACT    PERCENT
PVALUE   $UPCASE
;
 
proc print data=FormatInfo noobs;
   var Name Type Category Desc;
run;
 
proc print data=FormatInfo noobs;
   var Name DefW MinW MaxW DefD MinD MaxD;
run;
Attributes of SAS formats as output by the FMTINFO function
Field width and decimal places of SAS formats as output by the FMTINFO function

The first table shows the name of a few SAS formats and informats. The TYPE column shows whether the name is a format, an informat, or both. The CATEGORY column shows the general category of data to which the format applies. The DESC column gives a brief description of the format.

The second table shows default, minimum, and maximum values of the field width (w) and the decimal places (d) that are displayed. The columns that display field width information are the most valuable for me. You can see that the default and minimum field widths vary quite a bit among the formats. In contrast, most of the formats in the table display zero decimal places by default. (The exception is the PVALUE. format, which displays numbers between 0 and 1.) If the maximum number of decimal places is zero, it means that the format does not support a decimal value. For example, character formats do not support decimal places.

Other than creating a cheat sheet, I don't think that the casual SAS programmer will need this function very often. It seems most useful for advanced applications such as validating input from a GUI, but maybe I'm wrong. What do you think? Do you anticipate using the FMTINFO function in your work? Leave a comment.

The post Discover information about SAS formats... programatically appeared first on The DO Loop.

3月 152017
 

SAS programmers who have experience with other programming languages sometimes wonder whether the SAS language supports statements that are equivalent to the "break" and "continue" statements in other languages. The answer is yes. The LEAVE statement in the SAS DATA step is equivalent to the "break" statement. It provides a way to immediately exit from an iterative loop. The CONTINUE statements in the SAS DATA step skips over any remaining statements in the body of a loop and starts the next iteration.

Not all languages in SAS support those statements. For example, the SAS/IML language does not. However, you can use an alternative syntax to implement the same logical behavior, as shown in this article.

To review the syntax of various DO, DO-WHILE, and DO-UNTIL loops in SAS, see "Loops in SAS."

The LEAVE statement

The LEAVE statement exits a DO loop, usually as part of an IF-THEN statement to test whether a certain condition is met.

To illustrate the LEAVE statement consider a DATA step that simulates tossing a coin until "heads" appears. (Represent "tails" by a 0 and "heads" by a 1, chosen at random.) In the following SAS DATA step, the DO WHILE statement is always true, so the program potentially contains an infinite loop. However, the LEAVE statement is used to break out of the loop when "heads" appears. This coin-tossing experiment is repeated 100 times, which is equivalent to 100 draws from the geometric distribution:

/* toss a coin until "heads" (1) */
data Toss;
call streaminit(321);
do trial = 1 to 100;               /* simulate an experiment 100 times */
   count = 0;                      /* how many tosses until heads? */
   do while (1);                   /* loop forever */
      coin = rand("Bernoulli", 0.5);  /* random 0 or 1 */
      if coin = 1 then LEAVE;      /* exit loop when "heads" */
      count + 1;                   /* otherwise increment count */
   end;
   output;
end;
keep trial count;
run;

Some people like this programming paradigm (set up an infinite loop, break out when a condition is satisfied), but I personally prefer a DO UNTIL loop because the exit condition is easier to see when I read the program. For example, the code for each trial could be rewritten as

   count  = 0;                     /* how many tosses until heads? */
   done = 0;                       /* initialize flag variable */
   do until (done);                /* exit loop when "heads" */
      coin = rand("Bernoulli", 0.5);  /* random 0 or 1 */
      done = (coin = 1);           /* update flag variable */
      if ^done then                /* otherwise increment count */
         count + 1;
   end;
   output;

Notice that the LEAVE statement exits only the inner loop. In this example, the LEAVE statement does n ot affect the iteration of the DO TRIAL loop.

The CONTINUE statement

The CONTINUE statement tells the DATA step to skip the remaining body of the loop and go to the next iteration. It is used to skip processing when a condition is true. To illustrate the CONTINUE statement, let's simulate a coin-tossing experiment in which you toss a coin until "heads" appears OR until you have tossed the coin five times. In the following SAS DATA step, if tails (0) appears the CONTINUE statement executes, which skips the remaining statements and begins the next iteration of the loop. Consequently, the DONE=1 assignment is not executed when coin=0. Only if heads (1) appears does the DONE variable get assigned to a nonzero value, thereby ending the DO-UNTIL loop:

data Toss2;
call streaminit(321);
do trial = 1 to 100;               /* simulate an experiment 100 times */
   done = 0;                       /* initialize flag variable */
   do count = 0 to 4 until (done); /* iterate at most 5 times */
      coin = rand("Bernoulli", 0.5);  /* random 0 or 1 */
      if coin = 0 then CONTINUE;   /* tails: go to next iteration */
      done = 1;                    /* exit loop when "heads" */
   end;
   output;
end;
keep trial count;
run

The CONTINUE statement is not strictly necessary, although it can be convenient. You can always use an IF-THEN statement to bypass the remainder of the loop, as follows:

   coin = rand("Bernoulli", 0.5);  /* random 0 or 1 */
   /* wrap the remainder of the body in an IF-THEN statement */
   if coin ^= 0 then do;           /* heads: abort the loop */
      done = 1;                    /* exit loop when "heads" */
      /* other computations, as needed */
   end;

The CONTINUE and the LEAVE statements are examples of "jump statements" that tell the program the location of the next statement to execute. Both statements jump to a statement that might be far away. Consequently, programs that contain these statements are less structured than programs that avoid them. I try to avoid these statements in my programs, although sometimes the LEAVE statement is the simplest way to abort a loop when the program has to check for multiple exit conditions.

While we are on the topic, another jump statement in SAS is the GOTO statement, which can be used to emulate the behavior of LEAVE and CONTINUE. I avoid the GOTO statement because I think programs are easier to read and maintain when the logical flow of the program is controlled by using the DO-WHILE, DO-UNTIL, and IF-THEN statements. I also use those control statements in the SAS/IML language, which does not support the CONTINUE or LEAVE statements (although it does support the GOTO statement).

What are your views? Do you use the CONTINUE or LEAVE statement to simplify the error-handling logic of your programs? Or do you avoid them in favor of a more structured programming style? Why?

The post LEAVE and CONTINUE: Two ways to control the flow in a SAS DO loop appeared first on The DO Loop.

3月 102017
 

I recently needed to solve a fun programming problem. I challenge other SAS programmers to solve it, too! The problem is easy to state: Given a long sequence of digits, can you write a program to count how many times a particular subsequence occurs? For example, if I give you a sequence of 1,000 digits, can you determine whether the five-digit pattern {1 2 3 4 5} appears somewhere as a subsequence? How many times does it appear?

If the sequence is stored in a data set with one digit in each row, then SAS DATA step programmers might suspect that the LAG function will be useful for solving this problem. The LAG function enables a DATA step to examine the values of several digits simultaneously.

The SAS/IML language also has a LAG function which enables you to form a matrix of lagged values. This leads to an interesting way use vectorized computations to solve this problem. The following SAS/IML program defines a small eight-digit set of numbers in which the pattern {1 2 3} appears twice. The LAG function in SAS/IML accepts a vector of lags and creates a matrix where each column is a lagged version of the input sequence:

/* Find a certain pattern in sequence of digits */
proc iml;
Digit = {1,1,2,3,3,1,2,3};      /* digits to search */
target = {1 2 3};               /* the pattern to find */
p = ncol(target);               /* length of target sequence */
D = lag(Digit, (p-1):0);        /* columns shift the digits */
print D;

The output shows a three-column matrix (D) that contains the second, first, and zeroth lag (in that order) for the input sequence. Notice that if I am searching for a particular three-digit pattern, this matrix is very useful. The rows of this matrix are all three-digit patterns that appear in the original sequence. Consequently, to search for a three-digit pattern, I can use the rows of the matrix D.

To make the task easier, you can delete the first two rows, which contain missing values. You can also form a binary matrix X that has the value X[i,j]=1 when the j_th element of the pattern equals the j_th element of the i_th row, and 0 otherwise, as shown in the following:

D = D[p:nrow(Digit),];          /* delete first p rows */
X = (D=target);                 /* binary matrix */
print X;

Notice that in SAS/IML, the comparison operator (=) can perform a vector comparison. The binary comparison operator detects that the matrix on the left (D) and the vector on the right (target) both contain three columns. Therefore the operator creates the three-column logical matrix X, as shown. The X matrix has a wonderful property: a row of X contains all 1s if and only if the corresponding row of D matches the target pattern. So to find matches, you just need to sum the values in the rows of X. If the row sum equals the number of digits in the pattern, then that row indicates a place where the target pattern appears in the original sequence.

You can program this test in PROC IML as follows. The subscript reduction operator [,+] is shorthand for "compute the sum of each row over all columns".

/* sum across columns. Which rows contain all 1s? */
b = (X[,+] = p);                /* b[i]=1 if i_th row matches target */
NumRepl = sum(b);               /* how many times does target appear? */
if NumRepl=0 then 
   print "The target does not appear in the digits";
else
   print "The target appears at location " (loc(b)[1]),  /* print 1st location */
         "The target appears" (NumRepl) "times.";

The program discovered that the target pattern appears in the sequence twice. The first appearance begins with the second digit in the sequence. The pattern also appears in the sequence at the sixth position, although that information is not printed.

Notice that you can solve this problem in SAS/IML without writing any loops. Instead, you can use the LAG function to convert the N-digit sequence into a matrix with N-p rows and p columns. You can then test whether the target pattern matches one of the rows.

Your turn! Can you solve this problem?

Now that I have shown one way to solve the problem, I invite SAS programmers to write their own program to determine whether a specified pattern appears in a numeric sequence. You can use the DATA step, DS2, SQL, or any other SAS language.

Post a comment to submit your best SAS program. Extra points for programs that count all occurrences of the pattern and display the location of the first occurrence.

To help you debug your program, here is test data that we can all use. It contains 10 million random digits:

data Have(keep=Digit);
call streaminit(31488);
do i = 1 to 1e7;
   Digit = floor(10*rand("uniform"));
   output;
end;
run;

To help you determine if your program is correct, you can use the following results. In this sequence of digits:

  • The five-digit pattern {1 2 3 4 5} occurs 101 times and the first appearance begins at row 34417
  • The six-digit patter {6 5 4 3 2 1} occurs 15 times and the first appearance begins at row 120920

You can verify these facts by using PROC PRINT as follows:

proc print data=Have(firstobs=34417 obs=34421); run;
proc print data=Have(firstobs=120920 obs=120925); run;

Happy programming!

The post Find a pattern in a sequence of digits appeared first on The DO Loop.

3月 092017
 

Editor’s note: This is the first in a series of posts to help current SAS programmers add SAS Viya to their analytics skillset. In this post, SAS instructors Stacey Syphus and Marc Huber introduce you to the new Transitioning from Programming in SAS 9 to SAS Viya video library, designed to show SAS programmers [...]

The post Transitioning from programming in SAS 9 to SAS Viya appeared first on SAS Learning Post.