Chris Hemedinger

7月 062018
 

SAS programmers have long wanted the ability to control the flow of their SAS programs without having to resort to complex SAS macro programming. With SAS 9.4 Maintenance 5, it's now supported! You can now recently came to light on SAS Support Communities. (Thanks to Super User Tom for asking about it.)

Prior to this change, if you wanted to check a condition -- say, whether a data set exists -- before running a PROC, you had to code it within a macro routine. It would look something like this:

/* capture conditional logic in macro */
%macro PrintIfExists();
 %if %sysfunc(exist(work.result)) %then
  %do;
    proc means data=work.result;
    run;
  %end; 
%else
  %do;
    %PUT WARNING: Missing WORK.RESULT - report process skipped.;
  %end;
%mend;
 
/* call the macro */
%PrintIfExists();

Now you can simplify this code to remove the %MACRO/%MEND wrapper and the macro call:

/* If a file exists, take an action */
/* else fail gracefully */
%if %sysfunc(exist(work.result)) %then
  %do;
    proc means data=work.result;
    run;
  %end;
%else
  %do;
    %PUT WARNING: Missing WORK.RESULT - report process skipped.;
  %end;

Here are some additional ideas for how to use this feature. I'm sure you'll be able to think of many more!

Run "debug-level" code only when in debug mode

When developing your code, it's now easier to leave debugging statements in and turn them on with a simple flag.

/* Conditionally produce debugging information */
%let _DEBUG = 0; /* set to 1 for debugging */
%if &_DEBUG. %then
  %do;
    proc print data=sashelp.class(obs=10);
    run;
  %end;

If you have code that's under construction and should never be run while you work on other parts of your program, you can now "IF 0" out the entire block. As a longtime C and C++ programmer, this reminds me of the "#if 0 / #endif" preprocessor directives as an alternative for commenting out blocks of code. Glad to see this in SAS!

/* skip processing of blocks of code */
/* like #if 0 / #endif in C/C++      */
%if 0 %then
  %do;
    proc ToBeDetermined;
      READMYMIND = Yes;
    run;
  %end;

Run code only on a certain day of the week

I have batch jobs that run daily, but that send e-mail to people only one day per week. Now this is easier to express inline with conditional logic.

/*If it's Monday, send a weekly report by email */
%if %sysfunc(today(),weekday1.)=2 %then
  %do;
    options emailsys=smtp emailhost=myhost.company.com;
    filename output email
      subject = "Weekly report for &SYSDATE."
      from = "SAS Dummy <sasdummy@sas.com>"
      to = "knowledgethirster@curious.net"
      ct ='text/html';
 
  ods tagsets.msoffice2k(id=email) 
    file=OUTPUT(title="Important Report!")
    style=seaside;
   title "The Weekly Buzz";
   proc print data=amazing.data;
   run;
  ods tagsets.msoffice2k(id=email) close;
  %end;

Check a system environment variable before running code

For batch jobs especially, system environment variables can be a rich source of information about the conditions under which your code is running. You can glean user ID information, path settings, network settings, and so much more. If your SAS program needs to pick up cues from the running environment, this is a useful method to accomplish that.

/* Check for system environment vars before running code */
%if %sysfunc(sysexist(ORACLE_HOME)) %then
  %do;
    %put NOTE: ORACLE client is installed.;
    /* assign an Oracle library */
    libname ora oracle path=corp schema=alldata authdomain=oracle;
  %end;

Limitations of %IF/%THEN in open code

As awesome as this feature is, there are a few rules that apply to the use of the construct in open code. These are different from what's allowed within a %MACRO wrapper.

First rule: your %IF/%THEN must be followed by a %DO/%END block for the statements that you want to conditionally execute. The same is true for any statements that follow the optional %ELSE branch of the condition.

And second: no nesting of multiple %IF/%THEN constructs in open code. If you need that flexibility, you can do that within a %MACRO wrapper instead.

And remember, this works only in SAS 9.4 Maintenance 5 and later. That includes the most recent release of SAS University Edition, so if you don't have the latest SAS release in your workplace, this gives you a way to kick the tires on this feature if you can't wait to try it.

The post Using %IF-%THEN-%ELSE in SAS programs appeared first on The SAS Dummy.

7月 032018
 

At SAS, we love data. Data is central to our corporate vision: to transform a world of data into a world of intelligence. We're also famous for enjoying M&Ms, but to us they are more than a sweet snack. They're also another source of data.

My colleague Pete Privitera, with a team of like-minded "makers," built a device that they named SnackBot. SnackBot is an internet-connected sensor that measures the flow of M&Ms in a particular SAS break room. There's a lot to love about this project. You can learn more by watching its origin story in this video:

As the number of M&Ms changes, SnackBot takes a reading and records the M&M count in a database. Most readings reflect a decrease in candy pieces, as my colleagues help themselves to a treat. But once per week, the reading shows a drastic increase -- as our facilities staff restocks the canister. SnackBot has its own website. It also has its own API, and you know what that means (right?). It means that we can use SAS to read and analyze the sensor data.

Reading sensor data into SAS with the SnackBot API

The SnackBot system offers a REST API with a JSON data response. Like any REST API, we can use PROC HTTP to fetch the data, and the JSON library engine to parse the response into a SAS data set.

%let start = '20MAY2018:0:0:0'dt;
 
/* format the start/end per the API needs */
%let start_time= %sysfunc(putn(&start.,is8601dt26.));
%let end_time=   %sysfunc(datetime(),is8601dt26.);
 
/* Call the SnackBot API from snackbot.net */
filename resp temp;
proc http
  method="GET"
  url="http://snackbot.net/snackdata?start_time=&start_time.%str(&)end_time=&end_time.%str(&)utc_offset_minutes=-240"
  out=resp;
run;
 
/* JSON libname engine to read the result       */
/* Simple record layout, all in the ROOT member */
libname mms json fileref=resp;
data mmlevels;
  set mms.root;
run;

I've written about how to use SAS with REST APIs in several other blog posts, so I won't dwell on this part of the process here. This short program retrieves the raw data from SnackBot, which represents a series of M&M "levels" (count of remaining pieces) and a timestamp for each measurement. It's a good start. Though there are only two fields to work with here, there's quite a bit we can do with these data.

raw SnackBot data

Add features to the raw sensor data

With a few additional DATA step statements and some built-in SAS formats, we can derive several interesting characteristics of these data for use in further analysis.

First, we need to convert the character-formatted datetime field to a proper SAS datetime value. That's easily achieved with the INPUT function and the ANYDTDTM informat. (Rick Wicklin wrote a helpful article about how how the ANYDT* informats work.)

data mmlevels;
  set mms.root;
  drop ordinal_root timestamp;
  /* Convert the TIMESTAMP field to native value -- it's a character */
  datetime = input(timestamp, anydtdtm.);
  date = datepart(datetime);
  time = timepart(datetime);
  dow = date;
  qhour = round(datetime,'0:15:0'T);
  format  datetime datetime20. 
          qhour datetime20.
          date date9.
          time timeampm10.
          dow downame.;
run;

For convenience, I duplicated the datetime value a few times and applied different formats so we can get different views of the same value: datetime, just the date, just the time-of-day, and the day-of-week. I also used the ROUND function to "round" the raw datetime value to the nearest quarter hour. I'll explain why I've done that in a later step, but the ROUNDing of a time value is one of the documented unusual uses of the ROUND function.

SnackBot data with features

Even with this small amount of data preparation, we can begin to analyze the characteristics of these data. For example, let's look at the descriptive stats for the data classified by day-of-week:

title "SnackBot readings per day-of-week";
proc means data=mmlevels mean stddev max min;
 var pieces;
 class dow;
run;

SnackBot by day of week

The "N Obs" column shows the number of measurements taken over the entire "study period" broken down by day-of-week. If a measurement is a proxy for a "number-of-pieces-changed" event, then we can see that most events happen on Wednesday, Thursday, and Friday. From this, can you guess which day the M&M canister is refilled?

Let's take another slice through these data, but this time looking at time-of-day. For this, I used PROC FREQ to count the measurements by hour. I applied the HOUR2. format, which allows the SAS procedure to group these data into hour-long intervals with no need for additional data prep. ( I've written previously about how to use SAS formats to derive new categories without expensive data rewriting.) Then I used PROC SGPLOT to produce a step plot for the 24-hour cycle.

/* Count of readings per hour of the day */ 
title "SnackBot readings per hour";
proc freq data=mmlevels ;
 table time / out=perhour;
 format time hour2.;
run;
 
ods graphics / height=400 width=800;
 
title "SnackBot readings per hour";
proc sgplot data=perhour des="Readings per hour of day";
 step x=time y=count;
 xaxis min='0:0:0't max='24:0:0't label="Time of day" grid;
 yaxis label="Servings";
run;

SnackBot hour step

From the chart, we can see that most M&M "events" happen around 11am, and then again between 2pm and 4pm. From personal experience, I can confirm that those are the times when I hear the M&Ms calling to me.

Expand the time series to regular intervals

The SnackBot website can tell you how many M&Ms are remaining right now. But what if you want to know how many were remaining last Friday? Or on any typical Monday morning?

The sensor data that we've analyzed so far is sparse -- that is, there are data entries for each "change" event, but not for every discrete time interval in the study period. I don't know how the SnackBot sensor records its readings -- it might sample the M&M levels every minute, or every second. Regardless, the API reports (and probably stores) only the records that represent a change. If SnackBot records that the final 24 pieces were depleted at 25JUN2018:07:45:00 (a Monday morning) bringing the count to 0, how many M&Ms remain at 1pm later that day? The data don't tell us explicitly with a recorded reading. But we can assume at that point that the count was still 0. The next recorded reading occurs at 27JUN2018:10:30:00 (on a Wednesday, bringing the count to 1332 -- oh joy!).

If we want to create a useful time series visualization of the M&M candy counts over time, we need to expand the time series from these sparse recordings to regular intervals. SAS offers a few sophisticated time series procedures to accomplish this: PROC EXPAND, PROC TIMESERIES, and PROC TIMEDATA. Each of these offer powerful econometrics methods for interpolation and forecasting -- and that's more than we need for this situation. For my example, I took a more low-tech approach.

First, I created an empty data set with datetime entries at quarter-hour intervals, covering the study period of the data we're looking at.

/* Empty data set with 15 minute interval slots    */
/* Regular intervals for the entire "study" period */
data timeslots;
  last = datetime();
  length qhour 8;
  format qhour datetime20;
  drop last i;
  do i = &start. to last by '0:15:00't;
    qhour = i;
    output;
  end;
run;

Then I used a DATA step to merge these empty slots with the actual event data that I had rounded to the nearest quarter hour (remember that?):

/* Merge the sample data with the timeslots */
data expand;
  merge mmlevels(keep=pieces qhour) timeslots;
  by qhour;
run;

Finally, I used a variation of a last-observation-carried-forward (LOCF) approach to fill in the remaining empty slots. If a reading at 20MAY2018:11:15:00 reports 132 pieces remaining, then that value should be RETAINed for each 15-minute slot until the next reading at 20MAY2018:17:30:00. (That reading is 82 pieces -- meaning somebody helped themselves to 50 pieces. Recommended serving size for plain M&Ms is 20 pieces, but I'm not passing judgement.) I also recorded a text value for the day-of-week to help with the final visualization.

/* for empty timeslots, carry the sample data   */
/* forward, so we always have a count of pieces */
/* Variation on a LOCF technique                */
data final;
  set expand;
  length day $ 3;
  /* 3-char value for day of week */
  day=put(datepart(qhour),weekdate3.);
  retain hold;
  if not missing(pieces) then
    hold=pieces;
  else pieces=hold;
  drop hold;
  if not missing(pieces);
run;

Now I have data that represents the regular intervals that we need.

SnackBot regular intervals

Putting it all together

For my final visualization, I created a series plot for the study period. It shows the rise and fall of M&Ms levels in one SAS break room over several weeks. For additional "color", I annotated the plot with a block chart to delineate the days of the week.

title 'Plain M&M pieces on S1 tracked by SnackBot';
ods graphics / height=300 width=1600;
 
proc sgplot data=final des='M&M pieces tracked by SnackBot';
 
  /* plot the data as a series */ 
  series x=qhour y=pieces / lineattrs=(color=navy thickness=3px);
 
  /* Yes, these are the "official" M&M colors               */
  /* Will be applied in data-order, so works best when data */
  /* begins on a Sunday                                     */
  styleattrs datacolors=(red orange yellow green blue CX593B18 red);
  /* block areas to indicate days-of-week                   */
  block x=qhour block=day / transparency=0.65
    valueattrs=(weight=bold size=10pt color=navy);
 
  xaxis minor display=(nolabel);
  yaxis display=(nolabel) grid max=1600 minor;
run;

You can see the pattern. M&Ms are typically filled on Wednesday to the canister capacity of about 1400 pieces. We usually enter into the weekend with 0 remaining, but there are exceptions. The week of May 27 was our Memorial Day holiday, which explains the lack of activity on Monday (and even Tuesday) during that week as SAS folks took advantage of a slow week with their vacation plans.

SnackBot visualization

More about SAS and M&Ms data

You can download the complete code for this example from my public Gist on GitHub. The example code should work with SAS University Edition and SAS OnDemand for Academics, as well as with any SAS environment that can reach the internet with PROC HTTP.

For more M&M data fun, check out Rick Wicklin's article about the distribution of colors in plain M&Ms. SnackBot does not (yet) report on how many and which color of M&Ms are taken per serving, but using statistics, we can predict that!

The post The Internet of Snacks: SnackBot data and what it reveals about SAS life appeared first on The SAS Dummy.

6月 212018
 

I've said it before: spreadsheets are not databases. However, many of us use spreadsheets as if they were databases, and then we struggle when the spreadsheet layout does not support database-style rigor of predictable rows, columns, and variable types -- the basic elements we need for analytics and reporting. If you're using SAS to read data from Microsoft Excel, what can you do when the data you need doesn't begin at cell A1?

By design, SAS can read data from any range of cells in your spreadsheet. In this article, I'll describe how to use the RANGE statement in PROC IMPORT to get the data you need.

With SAS 9.4 and later, SAS recommends using DBMS=XLSX for the most flexibility. It works on all operating systems without the need for additional components like the PC Files Server. Your Excel file must be in the Excel 2007-or-later format (XLSX). You do need a licence for SAS/ACCESS to PC Files. (Just learning? These DBMS=XLSX techniques also work in SAS University Edition.)

If your Excel data does not begin in cell A1 (the default start point for an import process), then you can add a RANGE= value that includes the specific cells. The easiest method is to use a Named Range in Excel to define the exact boundaries of the data.

How to add a Named Range

To define a named range in Excel, highlight the range of cells to include and simply type the new name of the range in the Name Box:
Excel named range

Then save the Excel file.

Then to import into SAS, specify that range name in the RANGE= option:

proc import datafile="/myprojects/myfile.xlsx"
 out=mydata 
 replace;
range="myspecialrange";
run;

Using Excel notation for a cell range

What if you don't know the range ahead of time? You can use PROC IMPORT to read the entire sheet, but the result will not have the column headers and types you want. Consider a sheet like this:

Excel with floating data

This code will read it:

proc import datafile="/myprojects/middle.xlsx"
 out=mid dbms=xlsx
 replace;
run;

But the result will contain many empty cells, and the values will be read as all character types:

Excel naive import

With additional coding, you can "fix" this result in another pass using DATA step. Or, if you're willing to add the RANGE option with the Excel notation for the specific cell ranges, you can read it properly in the first pass:

proc import datafile="/myprojects/middle.xlsx"
 out=mid dbms=xlsx
 replace;
 range="Sheet1$E7:K17" ;
run;

How to "discover" the structure of your Excel file

You can also use LIBNAME XLSX to read entire sheets from Excel, or simply as a discovery step to see what sheets the Excel file contains before you run PROC IMPORT. However, LIBNAME XLSX does not show the Excel named ranges.

On SAS for Windows systems, you can use LIBNAME EXCEL (32-bit) or LIBNAME PCFILES (64-bit) to reveal a little more information about the Excel file.

libname d pcfiles path="c:\myprojects\middle.xlsx";
proc datasets lib=d; quit;
 
/* always clear the libname, as it locks the file */
libname d clear;

Libname XLSX proc datasets

Note that DBMS=XLSX does not support some of the options we see in the legacy DBMS=XLS (which supports only old-format XLS files), such as STARTROW and NAMEROW. DBMS=XLSX does support GETNAMES (treats the first record of the sheet or range as the variable names). See the full reference for Excel file import/export in the SAS documentation.

The post How to use SAS to read a range of cells from Excel appeared first on The SAS Dummy.

5月 082018
 

WARNING: This blog post references Avengers: Infinity War and contains story spoilers. But it also contains useful information about random number generators (RNGs) -- tempting! If you haven't yet seen the movie, you should make peace with this inner conflict before reading on.

Throughout the movie, Thanos makes it clear that his goal is to eliminate half of the population of every civilization in the universe. With the power of all six infinity stones imbued into his gauntlet, he'll be able to accomplish this with a "snap of his fingers." By the end of the film, Thanos has all of the stones, and then he literally snaps his fingers. (Really? I kept thinking that this was just a figure of speech he used to indicate how simple this will be -- but I guess it works more like the ruby slippers in The Wizard of Oz. Some clicking was required.)

So, Thanos snaps his huge fingers and -- POOF -- there goes half of us. Apparently the universe already had some sort of population-reduction subroutine just waiting for a hacker like Thanos to access it. Who put that there? Not a good plan, universe designer. (Check here to see if you survived the snap.)

But how did Thanos (or the universe) determine which of us was wiped from existence and which of us was spared? I have to assume that it was a seriously high-performing, massively parallel random number generator. And if Thanos had access to 9.4 Maintenance 5 or later (part of the Power [to Know] stone?), then he would have his choice of algorithms.

(Tony Stark has been to SAS headquarters, but we haven't seen Thanos around here. Still, he's welcome to download SAS University Edition.)

Your own RNG gauntlet, built into SAS

I know a little bit about this topic because I talked with Rick Wicklin about RNGs. As Rick discusses in his blog post, a recent release of SAS added support for several new/updated RNG algorithms, including Mersenne twister, PCG, Threefry, and one that introduces hardware-based entropy for "extra randomness." If you want to save yourself some reading, watch our 10-minute discussion here.

Implementing my own random Avengers terminator

I was going to write a SAS program to simulate Thanos' "snap," but I don't have a list of every single person in the universe (thanks GDPR!). However, courtesy of IMDB.com, I do have a list of the approximately 100 credited characters in the Infinity War movie. I wrote a DATA step to pull each name into a data set and "randomly decide" each fate by using the new PCG algorithm and the RAND function with a Bernoulli (binomial) distribution. I learned that trick from Rick's post about simulating coin flips. (I hope I did this correctly; Rick will tell me if I didn't.)

%let algorithm = PCG;
data characters;
  call streaminit(2018,"&algorithm.");
  infile datalines dsd;
  retain x 0 y 1;
  length Name $ 60 spared 8 x 8 y 8;
  input Name;
  Spared = rand("Bernoulli", 0.5);
  x+1;
  if x > 10 then
    do; y+1; x = 1;end;
datalines;
Tony Stark / Iron Man
Thor
Bruce Banner / Hulk
Steve Rogers / Captain America
/* about 96 more */
;
run;

After all of the outcomes were generated, I used PROC FREQ to check the distribution. In this run, only 48% were spared, not an even 50%. Well, that's randomness for you. On the universal scale, I'm not sure that anyone is keeping track.

How many spared

Using a trick I learned from Sample 54315: Customize your symbols with the SYMBOLCHAR statement in PROC SGPLOT, I created a scatter plot of the outcomes. I included special Unicode characters to signify the result in terms that even Hulk can understand. Hearts represent survivors; frowny faces represent the vanished heroes. Here's the code:

data thanosmap;
  input id $ value $ markercolor $ markersymbol $;
  datalines;
status 0 black frowny
status 1 red heart
;
run;
 
title;
ods graphics / height=400 width=400 imagemap=on;
proc sgplot data=Characters noautolegend dattrmap=thanosmap;
  styleattrs wallcolor=white;
  scatter x=x y=y / markerattrs=(size=40) 
    group=spared tip=(Name Spared) attrid=status;
  symbolchar name=heart char='2665'x;
  symbolchar name=frowny char='2639'x;
  xaxis integer display=(novalues) label="Did Thanos Kill You? Red=Dead" 
    labelattrs=(family="Comic Sans MS" size=14pt);
    /* Comic Sans -- get it ???? */
  yaxis integer display=none;
run;

Scatter plot of spared

For those of you who can read, you might appreciate a table with the rundown. For this one, I used a trick that I saw on SAS Support Communities to add strike-through text to a report. It's a simple COMPUTE column with a style directive, in a PROC REPORT step.

proc report data=Characters nowd;
  column Name spared;
  define spared / 'Spared' display;
  compute Spared;
    if spared=1 then
      call define(_row_,"style",
        "style={color=green}");
    if spared=0 then
      call define(_row_,"style",
        "style={color=red textdecoration=line_through}");
  endcomp;
run;

Table of results

Remember, my results here were generated with SAS and don't match the results from the film. (I feel like I need to say that to preempt a few comments.) The complete code for this blog post is available on my public Gist.

Learn more about RNGs

Just as the end of Avengers: Infinity War has sent throngs of viewers to the Internet to find out What's Next, I expect that readers of this blog are eager to learn more about these modern random number generators. Here are the go-to articles from Rick that are worth your review:

Unanswered questions

Before Thanos completed his gauntlet, his main hobby was traveling around the cosmos reducing the population of each civilization "the hard way." With the gauntlet in hand when he snapped his fingers, did he eliminate one-half of the remaining population? Or did the universe's algorithm spare those civilizations that had already been culled? Was this a random sample with replacement or not? In the film, Thanos did not express concern about these details (typical upper management attitude), but the grunt-workers of the universe need to know the parameters for this project. Coders need exact specifications, or else you can expect less-than-heroic results from your infinity gauntlet. I'm pretty sure it says so in the owner's manual.

The post Which random number generator did Thanos use? appeared first on The SAS Dummy.

5月 012018
 

The title of this blog says what you really need to know: SAS Enterprise Guide does have a future, and it's a bright one. Ever since SAS Studio debuted in 2014, onlookers have speculated about its impact on the development of SAS Enterprise Guide.

I think that we have been consistent with our message that SAS Enterprise Guide serves an important purpose -- a power-user interface for SAS on the desktop -- and that the product will continue to get support and new features. But that doesn't stop folks from wondering whether it might meet sudden demise like a favorite Star Wars or Game of Thrones character.

I recently recorded a session with Amy Peters, the SAS product manager for SAS Enterprise Guide and SAS Studio. Amy loves to meet with SAS users and hear their successes, their concerns, and their ideas. Her enthusiasm for SAS Enterprise Guide comes through in this video, even as I bumble my way through the prototype of the Next Big Release.

Coming soon: the features of a modern IDE

In addition to a much-needed makeover and modern appearance, the new version of SAS Enterprise Guide (scheduled for sometime in 2019) addresses many of the key requests that we hear from SAS users. First, the new version blows open the window management capabilities. You can open and view many items -- programs, data, log, results -- at the same time, and arrange those views exactly as you want. You can spread your workspace over multiple displays. And you can tear away or dock each item to suit your working style.

Screenshot of Future EG

(in development) screenshot of SAS Enterprise Guide

Second, you can decide whether you want to work with a SAS Enterprise Guide project -- or just simply write and run code. Currently you must start with a project before you can create or open anything else. The new version allows you leverage a project to organize your work...or not, depending on your need at the moment.

And finally, you can expect more alignment and collaboration features between SAS Studio and SAS Enterprise Guide. We see that more users find themselves using both interfaces for related tasks, and presenting a common experience is important. SAS Studio runs in your browser while SAS Enterprise Guide works on your desktop. Each application has different capabilities related to that, but there's no reason that they need to be so different, right?

For more information about what the future will bring, check out the communities article that recaps the SAS Global Forum 2018 presentation. It includes an attached presentation slide deck with many exciting screenshots and roadmap details. All of this is subject to change, of course (including release dates!), but I think it's safe to say the future is bright for SAS users who love their tools.

The post A productive future for SAS Enterprise Guide appeared first on The SAS Dummy.

4月 052018
 

Sir Tim Berners-Lee is famous for inventing the World Wide Web and for the construction of URLs -- a piece of syntax that every 8-year-old is now familiar with. According to the lore, when Sir Tim invented URLs he did not imagine that Internet surfers of all ages and backgrounds would be expected to type these cryptic schemes into their browser windows...but here we are. Today's young people can navigate the Web with URLs in the same way that migratory birds can find their way South for the winter -- by pure instinct.

URLs are syntax, the language of Web navigation. And HTML is syntax too, the language of the Web page. Each of these represent instructions to our web browser, telling it to how to go somewhere or display something. As we navigate the web programmatically, it is sometimes necessary to encode information in a URL or in HTML in a way that it won't be mistakenly interpreted as syntax. And of course, SAS has some functions for that.

Here's a table with links to the SAS documentation for these functions. You'll find they are intuitively named. The names are the same or similar to corresponding functions in other programming languages -- you can get only so creative with basic functions like these.

HTMLDECODE Function   Decodes a string that contains HTML numeric character references or HTML character entity references, and returns the decoded string.
HTMLENCODE Function   Encodes characters using HTML character entity references, and returns the encoded string.
URLDECODE Function   Returns a string that was decoded using the URL escape syntax.
URLENCODE Function   Returns a string that was encoded using the URL escape syntax.

Of these four functions, I use URLENCODE the most often. I need it when I need to pass syntax for instructions to a REST API, in which the API call itself is a URL. Here's an example from my paper about accessing Google Analytics with SAS:

%let workdate='01Oct2017'd;
%let urldate=%sysfunc(putn(&workdate.,yymmdd10.));
%let metrics=%sysfunc(urlencode(%str(ga:pageviews,ga:sessions)));
%let id=%sysfunc(urlencode(%str(ga:XXXXXX)));
filename garesp temp;
proc http
  url="https://www.googleapis.com/analytics/v3/data/ga?ids=&id.%str(&)start-date=&urldate.%str(&)end-date=&urldate.%str(&)metrics=&metrics.%str(&)max-results=20000"
  method="GET" out=garesp;
  headers 
    "Authorization"="Bearer &access_token."
    "client-id:"="&client_id.";
 run;

In the previous example, I need to include "ga:pageviews,ga:sessions" as an instruction on the Google Analytics API, but the colon character has special meaning within the URL syntax. I need to "escape" the colon character so that the URL parser ignores it and simply passes the value to the API. The URLENCODE function converts this segment to "ga%3Apageviews,ga%3Asessions". The colon has been replaced by its hexadecimal code, set off by a percent sign: %3A.

An aside: the percent (%) and ampersand (&) characters have special meaning in URLs. They also have special meaning in the SAS macro language. We can claim that the SAS macro language preceded URL syntax by decades, but there are only so many characters on the keyboard that syntax designers can use to set off code instructions. I use the

HTMLENCODE is useful when you need to represent HTML syntax in your output, but prevent the web browser from interpreting the HTML as code. Here's a simple example.

filename out temp;
ods html5 file=out;
data link;
  site = "sas.com";
  link = "<a href='https://www.sas.com'>sas.com</a>";
  code_as = htmlencode(link);
run;
 
proc print data=link;
run;
ods html5 close;

When produced using the HTML5 destination, the link variable is formatted as a live link, while the code_as variable shows the syntax that went into it.

As you might expect, the URLDECODE function "unescapes" the URL hex characters and restores the original URL syntax. HTMLDECODE does the same for HTML content. If you are writing code that implements an API endpoint in SAS (as you might do with a SAS stored process on the back end of a web service), you'll find these functions useful to unpack the information that was encoded on an API call.

HTMLENCODE and URLENCODE are not interchangeable. More than once, I have written programs that mistakenly use HTMLENCODE when it was URLENCODE that was needed. Those mistakes can be tricky to debug, so pay attention!

Cover image by Fabio Lanari, Internet2, CC BY-SA 4.0

The post SAS functions to encode and decode data for the Web appeared first on The SAS Dummy.

2月 062018
 

Good news learners! SAS University Edition has gone back to school and learned some new tricks.

With the December 2017 update, SAS University Edition now includes the SASPy package, available in its Jupyter Notebook interface. If you're keeping track, you know that SAS University Edition has long had support for Jupyter Notebook. With that, you can write and run SAS programs in a notebook-style environment. But until now, you could not use that Jupyter Notebook to run Python programs. With the latest update, you can -- and you can use the SASPy library to drive SAS features like a Python coder.

Oh, and there's another new trick that you'll find in this version: you can now use SAS (and Python) to access data from HTTPS websites -- that is, sites that use SSL encryption. Previous releases of SAS University Edition did not include the components that are needed to support these encrypted connections. That's going to make downloading web data much easier, not to mention using REST APIs. I'll show one HTTPS-enabled example in this post.

How to create a Python notebook in SAS University Edition

When you first access SAS University Edition in your web browser, you'll see a colorful "Welcome" window. From here, you can (A) start SAS Studio or (B) start Jupyter Notebook. For this article, I'll assume that you select choice (B). However, if you want to learn to use SAS and all of its capabilities, SAS Studio remains the best method for doing that in SAS University Edition.

When you start the notebook interface, you're brought into the Jupyter Home page. To get started with Python, select New->Python 3 from the menu on the right. You'll get a new empty Untitled notebook. I'm going to assume that you know how to work with the notebook interface and that you want to use those skills in a new way...with SAS. That is why you're reading this, right?

Move data from a pandas data frame to SAS

pandas is the standard for Python programmers who work with data. The pandas module is included in SAS University Edition -- you can use it to read and manipulate data frames (which you can think of like a table). Here's an example of retrieving a data file from GitHub and loading it into a data frame. (Read more about this particular file in this article. Note that GitHub uses HTTPS -- now possible to access in SAS University Edition!)

import saspy
import pandas as pd
 
df = pd.read_csv('https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv')
df.describe()

Here's the result. This is all straight Python stuff; we haven't started using any SAS yet.

Before we can use SAS features with this data, we need to move the data into a SAS data set. SASPy provides a dataframe2sasdata() method (shorter alias: df2sd) that can import your Python pandas data frame into a SAS library and data set. The method returns a SASdata object. This example copies the data into WORK.PROBLY in the SAS session:

sas = saspy.SASsession()
probly = sas.df2sd(df,'PROBLY')
probly.describe()

The SASdata object also includes a describe() method that yields a result that's similar to what you get from pandas:

Drive SAS procedures with Python

SASPy includes a collection of built-in objects and methods that provide APIs to the most commonly used SAS procedures. The APIs present a simple "Python-ic" style approach to the work you're trying to accomplish. For example, to create a SAS-based histogram for a variable in a data set, simply use the hist() method.

SASPy offers dozens of simple API methods that represent statistics, machine learning, time series, and more. You can find them documented on the GitHub project page. Note that since SAS University Edition does not include all SAS products, some of these API methods might not work for you. For example, the SASml.forest() method (representing

In SASPy, all methods generate SAS program code behind the scenes. If you like the results you see and want to learn the SAS code that was used, you can flip on the "teach me SAS" mode in SASPy.

sas.teach_me_sas('true')

Here's what SASPy reveals about the describe() and hist() methods we've already seen:

Interesting code, right? Does it make you want to learn more about SCALE= option on PROC SGPLOT?

If you want to experiment with SAS statements that you've learned, you don't need to leave the current notebook and start over. There's also a built-in %%SAS "magic command" that you can use to try out a few of these SAS statements.

%%SAS
proc means data=sashelp.cars stackodsoutput n nmiss median mean std min p25 p50 p75 max;run;

Python limitations in SAS University Edition

SAS University Edition includes over 300 Python modules to support your work in Jupyter Notebook. To see a complete list, run the help('modules') command from within a Python notebook. This list includes the common Python packages required to work with data, such as pandas and NumPy. However, it does not include any of the popular Python-based machine learning modules, nor any modules to support data visualization. Of course, SASPy has support for most of this within its APIs, so why would you need anything else...right?

Because SAS University Edition is packaged in a virtual machine that you cannot alter, you don't have the option of installing additional Python modules. You also don't have access to the Jupyter terminal, which would allow you to control the system from a shell-like interface. All of this is possible (and encouraged) when you have your own SAS installation with your own instance of SASPy. It's all waiting for you when you've outgrown the learning environment of SAS University Edition and you're ready to apply your SAS skills and tech to your official work!

Learn more

The post Coding in Python with SAS University Edition appeared first on The SAS Dummy.

1月 242018
 

Using SAS with REST APIs is fun and rewarding, but it's also complicated. When you're dealing with web services, credentials, data parsing and security, there are a lot of things that can go wrong. It's useful to have a simple program that verifies that the "basic plumbing" is working before you try to push a lot of complex coding through it.

I'm gratified that many of my readers are able to adapt my API examples and achieve similar results. When I receive a message from a frustrated user who can't get things to work, the cause is almost always one of the following:

  • Not using a recent enough version of SAS. PROC HTTP was revised and improved in SAS 9.4 Maint 3. The JSON library engine was added in SAS 9.4 Maint 4 (released in 2016). It's time to upgrade!
  • Cannot access the web service through a corporate firewall. Use the SSL certificates with the SSLCALISTLOC= option. If you're using SAS University Edition, you'll need the update from December 2017 or later -- that's where SSL was added. The editions prior to that did not include the SSL support.

The following SAS program is a simple plumbing test. It uses a free HTTP test service (httpbin.org) to verify your Internet connectivity from SAS and your ability to use SSL. The endpoint returns a JSON-formatted collection of timestamps in various formats, which the program parses using the JSON library engine. I have successfully run this program from my local SAS on Windows, from SAS University Edition (Dec 2017 release), and from SAS OnDemand for Academics (using SAS Studio).

If you can run this program successfully from your SAS session, then you're ready to attempt the more complex REST API calls. If you encounter any errors while running this simple test, then you will need to resolve these before moving on to the really useful APIs. (Like maybe checking on who is in space right now...)

/* PROC HTTP and JSON libname test          */
/* Requires SAS 9.4m4 or later to run       */
/* SAS University Edition Dec 2017 or later */
 
/* utility macro to echo the JSON out */
%macro echoFile(fn=);
  data _null_;
   infile &fn end=_eof;
   input;
   put _infile_;
  run;
%mend;
 
filename resp "%sysfunc(getoption(WORK))/now.json";
proc http
 url="https://now.httpbin.org/"
 method="GET"
 out=resp;
run;
 
/* Supported with SAS 9.4 Maint 5 */
/*
 %put HTTP Status code = &SYS_PROCHTTP_STATUS_CODE. : &SYS_PROCHTTP_STATUS_PHRASE.; 
*/
 
%echoFile(fn=resp);
 
/* Tell SAS to parse the JSON response */
libname time JSON fileref=resp;
 
title "JSON library structure";
proc datasets lib=time;
quit;
 
/* interpret the various datetime vals and convert to SAS */
data values (keep=from:);
 length from_epoch 8
        from_iso8601 8
        from_rfc2822 8
        from_rfc3339 8;
 
 /* Apply the DT format to a range of vars */
 format from: datetime20.;
 set time.now;
 
 /* epoch = # seconds since 01Jan1970        */
 /* SAS datetime is based on 01Jan1960, so   */
 /* add the offset here for a SAS value      */
 from_epoch = epoch + '01Jan1970:0:0:0'dt;
 
 /* straight conversion to ISO8061           */
 from_iso8601 = input(iso8601,is8601dt.);
 
 /* trim the leading "day of week"      */
 from_rfc2822 = input(substr(rfc2822,5),anydtdtm21.);
 
 /* allow SAS to figure it out */
 from_rfc3339 = input(rfc3339,anydtdtm.);
 
 run;
 
title "Raw values from the JSON response";
proc print data=time.now (drop=ord:);
run;
 
title "Formatted values as SAS datetime";
proc print data=values;
run;
 
libname time clear;
filename resp clear;

The post How to test PROC HTTP and the JSON library engine appeared first on The SAS Dummy.

1月 232018
 

Is it time to add SAS to the list of "romance" languages?*.

It's no secret that there are enthusiastic SAS programmers who love the SAS language. So it only makes sense that sometimes, these programmers will "nerd out" and express their adoration for fellow humans by using the code that they love. For evidence, check out these contributions from some nerdy would-be Cupids:

Next-level Nerd love: popping the question with SAS

Recently a SAS community member took this to the next level and crafted a marriage proposal using SAS. With help from the SAS community, this troubadour wrote a SAS program that popped the question in a completely new way. Our hero wanted to present a SAS program that was cryptic enough so that a casual glance would not reveal its purpose. But it had to be easy to open and run within SAS University Edition. His girlfriend is a biostatistics student; she's not an expert in SAS, but she does use it for her coursework and research. With over 70 messages exchanged among nearly 20 people, the community came through.

When Brandon's beloved ran his program in SAS University Edition, she was presented with an animated graphic that metaphorically got down on one knee and asked for her hand in marriage. And she said "Yes!" (Or maybe in SAS terms her answer was "RESPONSE=1".)

Simulated version of the proposal created with SAS

A marriage proposal that was helped along by the SAS Support Communities? That's definitely a remarkable event, but it's not completely surprising to us at SAS. SAS users regularly impress us by combining their SAS skills with their creativity. And, with their willingness to help one another.

The SAS Valentine Challenge for YOU

With Valentine's Day approaching, we thought that it would be fun to prepare some Valentine's greetings using SAS. We've started a topic on the SAS Support Communities and we invite you to contribute your programs and ideas. You can use any SAS code or tool that you want: ODS Graphics, SAS/GRAPH, SAS Visual Analytics, even text-based output. Our only request is that it's something that another SAS user could pick up and use. In other words, show your work!

It's not exactly a contest so there are no rules for "winning." Still, we hope that you'll try to one-up one another with your creative contributions. The best entries will get bragging rights and some social media love from SAS and others. And who knows? Maybe we'll have some other surprises along the way. We'll come back and update this post with some of our favorite replies.

Check out the community topic and add your idea. We can't wait to see what you come up with!

* Yes, I know that "romance" languages are named such because they stem from the language spoken by the Romans -- not because they are inherently romantic in the lovey-dovey sense. Still, it's difficult to resist the play on words.

A Valentine challenge: express your <3 with SAS was published on SAS Users.

1月 172018
 

I've used SAS with a bunch of different REST APIs: GitHub, Brightcove, Google Analytics, Lithium, LinkedIn, and more. For most of these I have to send user/password or "secret" application tokens to the web service so that it knows who I am and what data I can retrieve. I do not want to keep this secret information in my SAS program files -- that would be a bad idea. If my credentials were part of the program -- even if they were obfuscated and not stored in clear text -- then anyone who managed to get a copy of my program could run it. And they could gain access to my data, as if they were me.

I've written about this topic for SAS-related passwords. In this article, I'll share the approach that I use for API credentials and tokens.

REST APIs: Each service requires different types of secrets

My REST API services don't require just simple user ID and password combos. It depends on the API, but usually the information is in the form of one or more tokens that I've generated using the vendor's developer console, or perhaps that have been granted by an administrator.

For example, to access the Google Analytics API, I need three things: a client ID, a client secret token, and a valid "refresh" token. I can send these three items to the Google OAuth2 API, and in return I'll receive a live "access" token that I can use to request my data. I think of this like checking into a hotel. I show my ID and a credit card at the front desk, and in exchange I receive a room key. Just like my hotel room key, the access token doesn't last forever and cannot be reused on my next visit.

Other APIs are simpler and require just a single token that never expires. That's more like a house key -- it's mine to use forever, or until someone decides to change the locks.

Whether a static token or a token-for-token exchange, I don't want to just leave these keys lying around for just anyone to find and use.

Hide your tokens in a file that only you can read

My simple approach is to store the token values in a text file within my home directory. Then, I change the permissions on the file such that only my account can read it. Whether I submit my program interactively (in SAS Enterprise Guide or SAS Studio) or as a scheduled batch job, it's running under my account. I'm showing the instructions here for UNIX/Linux, but Windows users can accomplish something similar with Windows permissions.

On Linux, I've used the chmod command to specify the octal value that says "only the owner can read/write." That's "chmod 600 filename". The "ls -l" command shows that this permissions mask has been applied.

chmod 600 ./.google_creds.csv
ls -l ./.google_creds.csv
> -rw------- 1 myid mygroup 184 Jan 15 12:41 ./.google_creds.csv

I stored my tokens in a standard CSV format because it's easy for SAS to read and it's easy for me to read if I ever need to change it.

Use INFILE to read the tokens dynamically

With this critical data now stored externally, and the file permissions in place, I can use SAS to read the credentials/tokens within my program and store the values in SAS macro variables. In the following SAS program, I assigned a macro variable to my user root folder. Since I might run this program on Linux or Windows, I used this trick to determine the proper path notation. I also used the &SYSUSERID macro variable to make my program more portable. If I want to supply this program to any colleagues (or to you!), the only thing that's needed is to create and store the token CSV files in the proper location.

/* My path is different for UNIX vs Windows */
%let authpath = %sysfunc(ifc(&SYSSCP. = WIN,
	 \\netshare\root\u\&sysuserid.,
	 /u/&sysuserid.));
 
/* This should be a file that only YOU or trusted group members can read */
/* Use "chmod 0600 filename" in UNIX environment */
/* "dotfile" notation is convention for on UNIX for "hidden" */
filename auth "&authpath./.google_creds.csv";
 
/* Read in the secret account keys from another file */
data _null_;
 infile auth firstobs=2 dsd delimiter=',' termstr=crlf;
 length client_id $ 100 client_secret $ 30 refresh_token $ 60;
 input client_id client_secret refresh_token;
 call symputx('client_id',client_id);
 call symputx('client_secret',client_secret);
 call symputx('refresh_token',refresh_token);
run;

When I run this code in my production job, I can see the result:

NOTE: The infile AUTH is:
      Filename=/u/myid/.google_creds.csv,
      Owner Name=myid,Group Name=mygroup,
      Access Permission=rw-------,
      Last Modified=Mon Jan 15 12:41:58 2018,
      File Size (bytes)=184

NOTE: 1 record was read from the infile AUTH.
      The minimum record length was 145.
      The maximum record length was 145.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds

For this example, my next step is to call the Google API to get my access token. I'll use the macro variables that my program created with CALL SYMPUT to build the proper API call.

/* Call Google API to exchange the refresh token for an active access token */
%let oauth2=https://www.googleapis.com/oauth2/v4/token;
filename rtoken temp;
proc http
 method="POST"
 url="&oauth2.?client_id=&client_id.%str(&)client_secret=&client_secret.%str(&)grant_type=refresh_token%str(&)refresh_token=&refresh_token."
 out=rtoken;
run;

See the full explanation of this Google Analytics example in this article.

The post How to secure your REST API credentials in SAS programs appeared first on The SAS Dummy.