JSON

November 15, 2018
 

JSON is a popular format for data exchange between APIs and some modern databases. It's also used as a way to archive audit logs in many systems. Because JSON can represent hierarchical relationships of data fields, many people consider it to be superior to the CSV format -- although it's certainly not yet universal.

I learned recently that newline-delimited JSON, also called JSONL or JSON Lines, is growing in popularity. In a JSONL file, each line of text represents a valid JSON object -- building up to a series of records. But there is no hierarchical relationship among these lines, so when taken as a whole the JSONL file is not valid JSON. That is, a JSON parser can process each line individually, but it cannot process the file all at once.

In SAS, you can use PROC JSON to create valid JSON files, and the JSON libname engine to parse them. But neither of these can create or parse JSONL files directly. Here's a simple example of a JSONL file. Each line, enclosed in braces, represents valid JSON. But if you paste the entire body into a validation tool like JSONLint, the parsing fails.

JSONL sample, not valid in JSON parsers
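For instance, a JSONL file built from the first few records of SASHELP.CLASS might look like this sketch:

{"Name":"Alfred","Sex":"M","Age":14,"Height":69,"Weight":112.5}
{"Name":"Alice","Sex":"F","Age":13,"Height":56.5,"Weight":84}
{"Name":"Barbara","Sex":"F","Age":13,"Height":65.3,"Weight":98}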

If we need these records to be true JSON, we need a hierarchy. This requires us to set off the rows with more braces and brackets and separate them with commas, like this:

Valid JSON with hierarchies
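Using the same sample records, the valid JSON version might look like this sketch -- note the enclosing brackets and the commas between objects:

[
  {"Name":"Alfred","Sex":"M","Age":14,"Height":69,"Weight":112.5},
  {"Name":"Alice","Sex":"F","Age":13,"Height":56.5,"Weight":84},
  {"Name":"Barbara","Sex":"F","Age":13,"Height":65.3,"Weight":98}
]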

Creating JSONL with PROC JSON and DATA step

In a recent SAS Support Communities thread, a SAS user was struggling to use PROC JSON and a SAS data set to create a JSONL file for use with the Amazon Redshift database. PROC JSON can't create the finished file directly, but we can use PROC JSON to create the individual JSON object records. Our solution looks like this:

  • Use PROC JSON to read each record of the source data set and create a new JSON file. DATA step and CALL EXECUTE can generate these steps for us.
  • Using DATA step, post-process the collection of JSON files and append them into a final JSONL file.

Here's the code we used. You need to change only the output file name and the source SAS data set.

/* Build a JSONL (newline-delimited JSON) file */
/* from the records in a SAS data set          */
filename final "c:\temp\final.jsonl" ;
%let datasource = sashelp.class;
 
/* Create a new subfolder in WORK to hold */
/* temp JSON files, avoiding conflicts    */
options dlcreatedir;
%let workpath = %sysfunc(getoption(WORK))/json;
libname json "&workpath.";
libname json clear;
 
/* Will create and run a separate PROC JSON step */
/* for each record.  This might take a while   */
/* for very large data.                        */
/* Each iteration will create a new JSON file  */
data _null_;
 set &datasource.;
 call execute(catt('filename out "',"&workpath./out",_n_,'.json";'));
 call execute('proc json out=out nosastags ;');
 call execute("export &datasource.(obs="||_n_||" firstobs="||_n_||");");
 call execute('run;');
run;
 
/* This will concatenate the collection of JSON files */
/* into a single JSONL file                           */
data _null_;
 file final encoding='utf-8' termstr=lf;
 infile "&workpath./out*.json";
 input;
 /* trim the start and end [ ] characters */
 final = substr(_infile_,2,length(_infile_)-2);
 put final;
run;

From what I've read, it's a common practice to compress JSONL files with gzip for storage or faster transfers. That's a simple step to apply in our example, because SAS supports a GZIP method in SAS 9.4 Maintenance 5. To create a gzipped final result, change the first FILENAME statement to something like:

filename final ZIP "c:\temp\final.jsonl.gz" GZIP;

The JSONL format is new to me and I haven't needed to use it in any of my applications. If you use JSONL in your work, I'd love to hear your feedback about whether this approach would create the types of files you need.

The post Create newline-delimited JSON (or JSONL) with SAS appeared first on The SAS Dummy.

July 3, 2018
 

At SAS, we love data. Data is central to our corporate vision: to transform a world of data into a world of intelligence. We're also famous for enjoying M&Ms, but to us they are more than a sweet snack. They're also another source of data.

My colleague Pete Privitera, with a team of like-minded "makers," built a device that they named SnackBot. SnackBot is an internet-connected sensor that measures the flow of M&Ms in a particular SAS break room. There's a lot to love about this project. You can learn more by watching its origin story in this video:

As the number of M&Ms changes, SnackBot takes a reading and records the M&M count in a database. Most readings reflect a decrease in candy pieces, as my colleagues help themselves to a treat. But once per week, the reading shows a drastic increase -- as our facilities staff restocks the canister. SnackBot has its own website. It also has its own API, and you know what that means (right?). It means that we can use SAS to read and analyze the sensor data.

Reading sensor data into SAS with the SnackBot API

The SnackBot system offers a REST API with a JSON data response. Like any REST API, we can use PROC HTTP to fetch the data, and the JSON library engine to parse the response into a SAS data set.

%let start = '20MAY2018:0:0:0'dt;
 
/* format the start/end per the API needs */
%let start_time= %sysfunc(putn(&start.,is8601dt26.));
%let end_time=   %sysfunc(datetime(),is8601dt26.);
 
/* Call the SnackBot API from snackbot.net */
filename resp temp;
proc http
  method="GET"
  url="http://snackbot.net/snackdata?start_time=&start_time.%str(&)end_time=&end_time.%str(&)utc_offset_minutes=-240"
  out=resp;
run;
 
/* JSON libname engine to read the result       */
/* Simple record layout, all in the ROOT member */
libname mms json fileref=resp;
data mmlevels;
  set mms.root;
run;

I've written about how to use SAS with REST APIs in several other blog posts, so I won't dwell on this part of the process here. This short program retrieves the raw data from SnackBot, which represents a series of M&M "levels" (count of remaining pieces) and a timestamp for each measurement. It's a good start. Though there are only two fields to work with here, there's quite a bit we can do with these data.

raw SnackBot data

Add features to the raw sensor data

With a few additional DATA step statements and some built-in SAS formats, we can derive several interesting characteristics of these data for use in further analysis.

First, we need to convert the character-formatted datetime field to a proper SAS datetime value. That's easily achieved with the INPUT function and the ANYDTDTM informat. (Rick Wicklin wrote a helpful article about how the ANYDT* informats work.)

data mmlevels;
  set mms.root;
  drop ordinal_root timestamp;
  /* Convert the TIMESTAMP field to native value -- it's a character */
  datetime = input(timestamp, anydtdtm.);
  date = datepart(datetime);
  time = timepart(datetime);
  dow = date;
  qhour = round(datetime,'0:15:0'T);
  format  datetime datetime20. 
          qhour datetime20.
          date date9.
          time timeampm10.
          dow downame.;
run;

For convenience, I duplicated the datetime value a few times and applied different formats so we can get different views of the same value: datetime, just the date, just the time-of-day, and the day-of-week. I also used the ROUND function to "round" the raw datetime value to the nearest quarter hour. I'll explain why I've done that in a later step, but the ROUNDing of a time value is one of the documented unusual uses of the ROUND function.
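To see that trick in isolation, here's a tiny illustration of my own (the values are invented):

data _null_;
  dt = '20MAY2018:11:07:42'dt;
  /* round to the nearest 900 seconds (15 minutes) */
  qhour = round(dt,'0:15:0'T);
  put dt= datetime20. qhour= datetime20.;
run;

/* Log: dt=20MAY2018:11:07:42 qhour=20MAY2018:11:15:00 */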

SnackBot data with features

Even with this small amount of data preparation, we can begin to analyze the characteristics of these data. For example, let's look at the descriptive stats for the data classified by day-of-week:

title "SnackBot readings per day-of-week";
proc means data=mmlevels mean stddev max min;
 var pieces;
 class dow;
run;

SnackBot by day of week

The "N Obs" column shows the number of measurements taken over the entire "study period" broken down by day-of-week. If a measurement is a proxy for a "number-of-pieces-changed" event, then we can see that most events happen on Wednesday, Thursday, and Friday. From this, can you guess which day the M&M canister is refilled?

Let's take another slice through these data, but this time looking at time-of-day. For this, I used PROC FREQ to count the measurements by hour. I applied the HOUR2. format, which allows the SAS procedure to group these data into hour-long intervals with no need for additional data prep. (I've written previously about how to use SAS formats to derive new categories without expensive data rewriting.) Then I used PROC SGPLOT to produce a step plot for the 24-hour cycle.

/* Count of readings per hour of the day */ 
title "SnackBot readings per hour";
proc freq data=mmlevels ;
 table time / out=perhour;
 format time hour2.;
run;
 
ods graphics / height=400 width=800;
 
title "SnackBot readings per hour";
proc sgplot data=perhour des="Readings per hour of day";
 step x=time y=count;
 xaxis min='0:0:0't max='24:0:0't label="Time of day" grid;
 yaxis label="Servings";
run;

SnackBot hour step

From the chart, we can see that most M&M "events" happen around 11am, and then again between 2pm and 4pm. From personal experience, I can confirm that those are the times when I hear the M&Ms calling to me.

Expand the time series to regular intervals

The SnackBot website can tell you how many M&Ms are remaining right now. But what if you want to know how many were remaining last Friday? Or on any typical Monday morning?

The sensor data that we've analyzed so far is sparse -- that is, there are data entries for each "change" event, but not for every discrete time interval in the study period. I don't know how the SnackBot sensor records its readings -- it might sample the M&M levels every minute, or every second. Regardless, the API reports (and probably stores) only the records that represent a change. If SnackBot records that the final 24 pieces were depleted at 25JUN2018:07:45:00 (a Monday morning) bringing the count to 0, how many M&Ms remain at 1pm later that day? The data don't tell us explicitly with a recorded reading. But we can assume at that point that the count was still 0. The next recorded reading occurs at 27JUN2018:10:30:00 (on a Wednesday, bringing the count to 1332 -- oh joy!).

If we want to create a useful time series visualization of the M&M candy counts over time, we need to expand the time series from these sparse recordings to regular intervals. SAS offers a few sophisticated time series procedures to accomplish this: PROC EXPAND, PROC TIMESERIES, and PROC TIMEDATA. Each of these offer powerful econometrics methods for interpolation and forecasting -- and that's more than we need for this situation. For my example, I took a more low-tech approach.
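For reference, a PROC EXPAND version might look something like this sketch -- it requires SAS/ETS, so treat it as a starting point rather than a tested program:

proc expand data=mmlevels out=expanded to=minute15;
  id datetime;
  /* step interpolation carries the last reading forward */
  convert pieces / method=step;
run;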

First, I created an empty data set with datetime entries at quarter-hour intervals, covering the study period of the data we're looking at.

/* Empty data set with 15 minute interval slots    */
/* Regular intervals for the entire "study" period */
data timeslots;
  last = datetime();
  length qhour 8;
  format qhour datetime20.;
  drop last i;
  do i = &start. to last by '0:15:00't;
    qhour = i;
    output;
  end;
run;

Then I used a DATA step to merge these empty slots with the actual event data that I had rounded to the nearest quarter hour (remember that?):

/* Merge the sample data with the timeslots */
data expand;
  merge mmlevels(keep=pieces qhour) timeslots;
  by qhour;
run;

Finally, I used a variation of a last-observation-carried-forward (LOCF) approach to fill in the remaining empty slots. If a reading at 20MAY2018:11:15:00 reports 132 pieces remaining, then that value should be RETAINed for each 15-minute slot until the next reading at 20MAY2018:17:30:00. (That reading is 82 pieces -- meaning somebody helped themselves to 50 pieces. Recommended serving size for plain M&Ms is 20 pieces, but I'm not passing judgement.) I also recorded a text value for the day-of-week to help with the final visualization.

/* for empty timeslots, carry the sample data   */
/* forward, so we always have a count of pieces */
/* Variation on a LOCF technique                */
data final;
  set expand;
  length day $ 3;
  /* 3-char value for day of week */
  day=put(datepart(qhour),weekdate3.);
  retain hold;
  if not missing(pieces) then
    hold=pieces;
  else pieces=hold;
  drop hold;
  if not missing(pieces);
run;

Now I have data that represents the regular intervals that we need.

SnackBot regular intervals

Putting it all together

For my final visualization, I created a series plot for the study period. It shows the rise and fall of M&Ms levels in one SAS break room over several weeks. For additional "color", I annotated the plot with a block chart to delineate the days of the week.

title 'Plain M&M pieces on S1 tracked by SnackBot';
ods graphics / height=300 width=1600;
 
proc sgplot data=final des='M&M pieces tracked by SnackBot';
 
  /* plot the data as a series */ 
  series x=qhour y=pieces / lineattrs=(color=navy thickness=3px);
 
  /* Yes, these are the "official" M&M colors               */
  /* Will be applied in data-order, so works best when data */
  /* begins on a Sunday                                     */
  styleattrs datacolors=(red orange yellow green blue CX593B18 red);
  /* block areas to indicate days-of-week                   */
  block x=qhour block=day / transparency=0.65
    valueattrs=(weight=bold size=10pt color=navy);
 
  xaxis minor display=(nolabel);
  yaxis display=(nolabel) grid max=1600 minor;
run;

You can see the pattern. M&Ms are typically filled on Wednesday to the canister capacity of about 1400 pieces. We usually enter the weekend with 0 remaining, but there are exceptions. The week of May 27 was our Memorial Day holiday, which explains the lack of activity on Monday (and even Tuesday) during that week as SAS folks took advantage of a slow week with their vacation plans.

SnackBot visualization

More about SAS and M&Ms data

You can download the complete code for this example from my public Gist on GitHub. The example code should work with SAS University Edition and SAS OnDemand for Academics, as well as with any SAS environment that can reach the internet with PROC HTTP.

For more M&M data fun, check out Rick Wicklin's article about the distribution of colors in plain M&Ms. SnackBot does not (yet) report on how many and which color of M&Ms are taken per serving, but using statistics, we can predict that!

The post The Internet of Snacks: SnackBot data and what it reveals about SAS life appeared first on The SAS Dummy.

January 24, 2018
 

Using SAS with REST APIs is fun and rewarding, but it's also complicated. When you're dealing with web services, credentials, data parsing and security, there are a lot of things that can go wrong. It's useful to have a simple program that verifies that the "basic plumbing" is working before you try to push a lot of complex coding through it.

I'm gratified that many of my readers are able to adapt my API examples and achieve similar results. When I receive a message from a frustrated user who can't get things to work, the cause is almost always one of the following:

  • Not using a recent enough version of SAS. PROC HTTP was revised and improved in SAS 9.4 Maint 3. The JSON library engine was added in SAS 9.4 Maint 4 (released in 2016). It's time to upgrade!
  • Cannot access the web service through a corporate firewall, or cannot verify its SSL certificate. Use trusted SSL certificates with the SSLCALISTLOC= option. If you're using SAS University Edition, you'll need the update from December 2017 or later -- that's where SSL support was added; the editions prior to that did not include it.

The following SAS program is a simple plumbing test. It uses a free HTTP test service (httpbin.org) to verify your Internet connectivity from SAS and your ability to use SSL. The endpoint returns a JSON-formatted collection of timestamps in various formats, which the program parses using the JSON library engine. I have successfully run this program from my local SAS on Windows, from SAS University Edition (Dec 2017 release), and from SAS OnDemand for Academics (using SAS Studio).

If you can run this program successfully from your SAS session, then you're ready to attempt the more complex REST API calls. If you encounter any errors while running this simple test, then you will need to resolve these before moving on to the really useful APIs. (Like maybe checking on who is in space right now...)

/* PROC HTTP and JSON libname test          */
/* Requires SAS 9.4m4 or later to run       */
/* SAS University Edition Dec 2017 or later */
 
/* utility macro to echo the JSON out */
%macro echoFile(fn=);
  data _null_;
   infile &fn end=_eof;
   input;
   put _infile_;
  run;
%mend;
 
filename resp "%sysfunc(getoption(WORK))/now.json";
proc http
 url="https://now.httpbin.org/"
 method="GET"
 out=resp;
run;
 
/* Supported with SAS 9.4 Maint 5 */
/*
 %put HTTP Status code = &SYS_PROCHTTP_STATUS_CODE. : &SYS_PROCHTTP_STATUS_PHRASE.; 
*/
 
%echoFile(fn=resp);
 
/* Tell SAS to parse the JSON response */
libname time JSON fileref=resp;
 
title "JSON library structure";
proc datasets lib=time;
quit;
 
/* interpret the various datetime vals and convert to SAS */
data values (keep=from:);
 length from_epoch 8
        from_iso8601 8
        from_rfc2822 8
        from_rfc3339 8;
 
 /* Apply the DT format to a range of vars */
 format from: datetime20.;
 set time.now;
 
 /* epoch = # seconds since 01Jan1970        */
 /* SAS datetime is based on 01Jan1960, so   */
 /* add the offset here for a SAS value      */
 from_epoch = epoch + '01Jan1970:0:0:0'dt;
 
 /* straight conversion from ISO8601          */
 from_iso8601 = input(iso8601,is8601dt.);
 
 /* trim the leading "day of week"      */
 from_rfc2822 = input(substr(rfc2822,5),anydtdtm21.);
 
 /* allow SAS to figure it out */
 from_rfc3339 = input(rfc3339,anydtdtm.);
 
 run;
 
title "Raw values from the JSON response";
proc print data=time.now (drop=ord:);
run;
 
title "Formatted values as SAS datetime";
proc print data=values;
run;
 
libname time clear;
filename resp clear;

The post How to test PROC HTTP and the JSON library engine appeared first on The SAS Dummy.

April 15, 2017
 

Every day before I even wake up, I have little "SAS robots" that do work for me. These are SAS batch jobs that gather data from external services and build data marts, generate reports, and send e-mail. One of those SAS jobs gathers Google Analytics data about our SAS blogs at blogs.sas.com.

With SAS 9.4 Maintenance 4, it's finally easy (well, relatively speaking) to connect to complicated APIs like those supported by Google, and to gather information with an unattended batch process. It's made possible by recent enhancements in PROC HTTP and the new JSON library engine in SAS. The PROC HTTP enhancements make it easier to negotiate multi-step authentication schemes like OAuth2. And of course, the JSON engine makes it easier to parse JSON results into SAS data sets. If you scour the Internet and SAS conference papers, you might find examples that use PROC GROOVY or other tricks to call outside of SAS to drive these OAuth2 APIs and parse JSON. The recent enhancements make such tricks unnecessary, and thus provide a cleaner approach with fewer moving parts.

Joseph Henry, a senior software developer at SAS (and primary developer for PROC HTTP), wrote a useful SAS Global Forum paper about PROC HTTP and OAuth2. Joseph definitely understands this better than I do, so I recommend that you read his paper first. Joseph's example uses Google Drive. Come back to this post when you want to see the instructions I've hacked together -- it's what I've got working for Google Analytics.

How to use the Google APIs

There are four main steps to use the Google APIs. Two of these steps need to be performed just once, and don't need to be automated. The other two steps are necessary each time you access a Google API from a program, and they can be scripted. Here are the steps:

  1. Grant permission and obtain an authorization token for your "app" (a SAS program, in our case). This must be performed in a browser while logged into your Google account.
  2. Obtain an access token and refresh token. You can accomplish this with a SAS program that you run just once. You then save the refresh token (in a safe place!) for subsequent runs.
  3. Convert your saved refresh token into an access token. Do this at the start of every SAS job that needs to get Google Analytics data.
  4. And finally, use the Google Analytics API to get actual data!

There is a lot to learn about how Google APIs work and how to provision your Google account with access to resources like Google Analytics. That's more than I can go into here, so I'm going to assume a few things and skip ahead. I'm going to assume that you already have an API project created, and that you have a "client ID" and "client secret". The Google API documentation is very good on this topic, and there are many Internet resources to help you learn this part of the process.

Also, some Google APIs are free to use, while others cost money. And some are rate-limited -- that is, you can call the APIs only so many times within a given period. Your specific limits might differ depending on the type of accounts you have with Google. If you have a corporate relationship with Google, you might have an admin who has to provision your specific Google account to get access to the APIs. I'm glossing over all of these nuances -- this post is simply about the code.

Step 1. Grant permission and retrieve auth code

Do this step only once per account, per API permission. You perform this step in the browser while logged into your Google account. Enter the following URL, substituting your client-id as indicated. The URL needs to be all on one line, but I've broken it up here for readability.

 https://accounts.google.com/o/oauth2/v2/auth?
         scope=https://www.googleapis.com/auth/analytics.readonly
         &redirect_uri=urn:ietf:wg:oauth:2.0:oob
         &response_type=code
         &client_id=<your-client-id>.apps.googleusercontent.com 

You will be prompted to allow the "app" access to your Google Analytics data (read only). This is similar to allowing another app to access your Facebook or Twitter profile -- you've probably seen similar prompts during your own Internet citizenship. Permissions are an important component of allowing apps to act on your behalf via REST APIs.

Then, you'll be redirected to a web page with an auth code that you should copy and save. We'll use it in the next step. Setting the redirect_uri properly is very important: redirect_uri=urn:ietf:wg:oauth:2.0:oob. Otherwise the API won't generate a code that you can use in a tool-based app like SAS.

Step 2. Exchange the auth code for an access token

Next, run this PROC HTTP step in SAS with the POST method to exchange that auth code from Step 1 for an access token. It will return a JSON response with a valid Bearer access token. That token expires in 3600 seconds (1 hour). It also returns a refresh_token, which you can exchange again for a new access token after the first one expires. The refresh_token never expires (though it can be revoked via the developer console or API). Thus, you usually need to perform this step just once, unless your token is revoked for some reason.

/* file to store your result */
filename token "c:\temp\token.json";
%let code_given =<code-returned-from-step-1> ;
%let oauth2=https://www.googleapis.com/oauth2/v4/token;
%let client_id=<your-client-id>.apps.googleusercontent.com;
%let client_secret=<your-client-secret>;
proc http
/* put this all on one line! */
 url="&oauth2.?client_id=&client_id.%str(&)code=&code_given.
      %str(&)client_secret=&client_secret.%str(&)redirect_uri=urn:ietf:wg:oauth:2.0:oob
      %str(&)grant_type=authorization_code%str(&)response_type=token"
 method="POST"
 out=token
;
run;

Note: the refresh_token and client-id/secret values should be protected! Anyone who has access to these can get to your Google API data as if they were you. Consider storing them in a file that only you have read access to, and programmatically pull them in when running your SAS program under your host account.
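One simple approach (a sketch -- the file name and path here are placeholders) is to keep the sensitive %LET statements in a separate file that only your account can read, then bring them in with %INCLUDE:

/* secrets.sas -- protect with OS file permissions. It contains:
   %let client_id=<your-client-id>.apps.googleusercontent.com;
   %let client_secret=<your-client-secret>;
   %let refresh_token=<your-refresh-token>;
*/
%include "/u/myaccount/private/secrets.sas";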

Step 3. Exchange the refresh_token for a valid access token

Typically, you'll include this step just once at the beginning of your SAS job. This takes your saved refresh_token value and asks Google to grant an access token for use in the rest of your program. The Google APIs run very fast -- you should not need to renew the token again within the same job.

/* STEP 3. Do this every time you want to use the GA API */
/* Turn in a refresh-token for a valid access-token      */
/* Should be good for 60 minutes                         */
/* So typically run once at beginning of the job.        */
%let oauth2=https://www.googleapis.com/oauth2/v4/token;
%let client_id=<your-client-id>.apps.googleusercontent.com;
%let client_secret=<your-client-secret>;
%let refresh_token=<refresh-token-from-step-2>;
 
filename rtoken temp;
proc http
 method="POST"
 /* Again, put this all on one line */
 /* broken here for readability */
 url="&oauth2.?client_id=&client_id.
     %str(&)client_secret=&client_secret.
     %str(&)grant_type=refresh_token%str(&)refresh_token=&refresh_token."
 out=rtoken;
run;
 
/* Read the access token out of the refresh response  */
/* Relies on the JSON libname engine (9.4m4 or later) */
libname rtok json fileref=rtoken;
data _null_;
 set rtok.root;
 call symputx('access_token',access_token);
run;

Step 4. Use the Google Analytics API to gather data

Finally, we're to the point where we can retrieve data from this service! And this is where those new features in PROC HTTP come into play. In SAS 9.4m3, Joseph added support for an inline HEADERS statement, which is perfect for injecting OAuth2 information like "Bearer" token into the HTTP header.

proc http
  url="<REST-api-url-call>"
  method="GET" out=ga_resp;
  /* Headers statement makes this easy */
  headers 
    "Authorization"="Bearer &access_token."
    "client-id"="&client_id.";
 run;

I've noticed that a lot of REST API docs use cUrl (command-line URL) in their examples. In my experience, PROC HTTP can do just about anything that cUrl can do, and these new options make it even easier to make cUrl-based examples work in SAS.
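For example, a hypothetical cUrl command from an API doc:

curl -H "Authorization: Bearer <token>" "https://www.googleapis.com/analytics/v3/data/ga?ids=<view-id>"

maps almost line-for-line to PROC HTTP:

proc http
  url="https://www.googleapis.com/analytics/v3/data/ga?ids=<view-id>"
  method="GET" out=resp;
  headers "Authorization"="Bearer <token>";
run;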

The Google Analytics API is complex on its own, so I suggest that you spend some time in the Google API Explorer to see what types of requests yield what sort of results. Google Analytics allows you to fetch metrics across different dimensions and dates, so the results can come back summarized across the dimensions that you request. Since we're bringing this data into SAS and I probably want to further summarize in my reports, I try to get as much "unsummarized" data as possible, and I avoid aggregations such as "Average views" or "Average time on page" -- as I don't want to risk calculating new summaries based on these.

I've included my program for retrieving the page views, total time on page, entrances and exits at the daily level. This is granular enough for me to create meaningful reports for all of our blog authors. The program is a SAS macro that issues one API call per day in the range specified.

/* Metrics and dimensions are defined in the Google Analytics doc */
/* Experiment in the developer console for the right mix          */
/* Your scenario might be different and would require a different */
/* type of query                                                  */
/* The GA API will "number" the return elements as                */
/* element1, element2, element3 and so on                         */
/* In my example, path and title will be 1 and 2 */
%let dimensions=  %sysfunc(urlencode(%str(ga:pagePath,ga:pageTitle)));
/* then pageviews, uniquepageviews, timeonpage will be 3, 4, 5, etc. */
%let metrics=     %sysfunc(urlencode(%str(ga:pageviews,ga:uniquePageviews,ga:timeOnPage,ga:entrances,ga:exits)));
/* this ID is the "View ID" for the GA data you want to access   */
%let id=          %sysfunc(urlencode(%str(ga:<your-view-ID>)));
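/* Assumes &startdate. and &enddate. are SAS date values, and GA */
/* is a permanent library -- both are defined earlier in the    */
/* full program (not shown in this excerpt).                    */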
 
%macro getGAdata;
%do workdate = &enddate %to &startdate %by -1;
	%let urldate=%sysfunc(putn(&workdate.,yymmdd10.));
	filename ga_resp temp;
	proc http
         /* again, put this url= value ALL on a single line! */
	 url="https://www.googleapis.com/analytics/v3/data/ga
            ?ids=&id.%str(&)start-date=&urldate.
            %str(&)end-date=&urldate.
            %str(&)metrics=&metrics.
            %str(&)dimensions=&dimensions.%str(&)max-results=20000"
	 method="GET" out=ga_resp;
	 headers 
	   "Authorization"="Bearer &access_token."
	   "client-id"="&client_id.";
	run;
 
        /* Love this JSON libname engine! */
	libname garesp json fileref=ga_resp;
 
	data ga.ga_daily%sysfunc(compress(&urldate.,'-')) (drop=element:);
		set garesp.rows;
		drop ordinal_root ordinal_rows;
		length date 8 url $ 300 title $ 250 
	          views 8 unique_views 8 time_on_page 8 entrances 8 exits 8
	          ;
		format date yymmdd10.;
		date=&workdate.;
		/* Coerce the elements into data variables */
		/* Based on expected sequence               */
		url = element1;
		title = element2;
		views = input(element3, 5.);
		unique_views = input(element4, 6.);
		time_on_page=input(element5, 7.2);
		entrances = input(element6, 6.);
		exits = input(element7, 6.);
	run;
%end;
%mend;
 
%getGAdata;
 
/* Assemble the daily files into one data set */
data alldays_gadata;
  set ga.ga_daily:;
run;

When it's all finally working, SAS pulls in data that looks like this:
Google Analytics data via API

Re-entry and disclaimers

This has been a pretty deep post about Google APIs and how to make them work from SAS programs. If you have experience with the Google APIs already, then I'm optimistic that this article is enough for you to get things working in SAS. If you've never used OAuth2 or any of the Google APIs, then this post might not be quite enough. Don't forget to read Joseph's paper about OAuth2 and PROC HTTP. You might also like his paper about using RESTful APIs.

Please leave a comment if you have questions or if you think that I left something out. I'll do my best to correct and clarify.

The post Using SAS to access Google Analytics APIs appeared first on The SAS Dummy.

January 6, 2017
 

At SAS, we've published more repositories on GitHub as a way to share our open source projects and examples. These "repos" (that's Git lingo) are created and maintained by experts in R&D, professional services (consulting), and SAS training.

With dozens of repositories under the sassoftware account, it becomes a challenge to keep track of them all. So, I've built a process that uses SAS and the GitHub APIs to create reports for my colleagues.

Using the GitHub API

GitHub APIs are robust and well-documented. Like most APIs these days, you access them using HTTP and REST. Most of the API output is returned as JSON. With PROC HTTP and the JSON libname engine (new in SAS 9.4 Maint 4), using these APIs from SAS is a cinch.

The two API calls that we'll use for this basic report are:

  • https://api.github.com/orgs/sassoftware -- the account metadata, including the count of public repositories
  • https://api.github.com/orgs/sassoftware/repos -- the details and stats for each public repository

Fetching the GitHub account metadata

The following SAS program calls the first API to gather some account metadata. Then, it stores a selection of those values in macro variables for later use.

/* Establish temp file for HTTP response */
filename resp temp;
 
/* Get Org metadata, including repo count */
proc http
 url="https://api.github.com/orgs/sassoftware"  
 method="GET"
 out=resp
;
run;
 
/* Read response as JSON data, extract select fields */
/* It's in the ROOT data set, found via experiment   */
libname ss json fileref=resp;
 
data meta; 
  set ss.root; 
  call symputx('repocount',public_repos);
  call symputx('acctname',name);
  call symputx('accturl',html_url);
run;
 
/* log results */
%put &=repocount;
%put &=acctname;
%put &=accturl;

Here is the output of this program (as of today):

REPOCOUNT=66
ACCTNAME=SAS Software
ACCTURL=https://github.com/sassoftware

The important piece of this output is the count of repositories. We'll need that number in order to complete the next step.

Fetching the repositories and stats

It turns out that the /repos API call returns the details for 30 repositories at a time. For accounts with more than 30 repos, we need to call the API multiple times with a &page= index value to iterate through each batch. I've wrapped this process in a short macro function that repeats the calls as many times as needed to gather all of the data. This snippet calculates the upper bound of my loop index:

/* Number of repos / 30, rounded up to next integer     */
%let pages=%sysfunc(ceil(%sysevalf(&repocount / 30)));

Given the 66 repositories on the SAS Software account right now, that results in 3 API calls.

Each API call creates verbose JSON output with dozens of fields, only a few of which we care about for this report. To simplify things, I've created a JSON map that defines just the fields that I want to capture. I came up with this map by first allowing the JSON libname engine to "autocreate" a map file with the full response. I edited that file and whittled the result to just 12 fields. (Read my previous blog post about the JSON engine to learn more about JSON maps.)

The multiple API calls create multiple data sets, which I must then concatenate into a single output data set for reporting. Then to clean up, I used PROC DATASETS to delete the intermediate data sets.

First, here's the output data:

ssgit
Here's the code segment, which is rather long because I included the JSON map inline.

/* This trimmed JSON map defines just the fields we want */
/* Created by using AUTOMAP=CREATE on JSON libname       */
/* then editing the generated map file to reduce to      */
/* minimum number of fields of interest                  */
filename repomap temp;
data _null_;
 infile datalines;
 file repomap;
 input;
 put _infile_;
 datalines;
{
  "DATASETS": [
 {
   "DSNAME": "root",
   "TABLEPATH": "/root",
   "VARIABLES": [
  {
    "NAME": "id",
    "TYPE": "NUMERIC",
    "PATH": "/root/id"
  },
  {
    "NAME": "name",
    "TYPE": "CHARACTER",
    "PATH": "/root/name",
    "CURRENT_LENGTH": 50,
    "LENGTH": 50
  },
  {
    "NAME": "html_url",
    "TYPE": "CHARACTER",
    "PATH": "/root/html_url",
    "CURRENT_LENGTH": 100,
    "LENGTH": 100
  },
  {
    "NAME": "language",
    "TYPE": "CHARACTER",
    "PATH": "/root/language",
    "CURRENT_LENGTH": 20,
    "LENGTH": 20
  },
  {
    "NAME": "description",
    "TYPE": "CHARACTER",
    "PATH": "/root/description",
    "CURRENT_LENGTH": 300,
    "LENGTH": 500
  },
  {
    "NAME": "created_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/created_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "updated_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/updated_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "pushed_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/pushed_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "size",
    "TYPE": "NUMERIC",
    "PATH": "/root/size"
  },
  {
    "NAME": "stars",
    "TYPE": "NUMERIC",
    "PATH": "/root/stargazers_count"
  },
  {
    "NAME": "forks",
    "TYPE": "NUMERIC",
    "PATH": "/root/forks"
  },
  {
    "NAME": "open_issues",
    "TYPE": "NUMERIC",
    "PATH": "/root/open_issues"
  }
   ]
 }
  ]
}
;
run;
 
/* GETREPOS: iterate through each "page" of repositories */
/* and collect the GitHub data                           */
/* Output: <account>_REPOS, a data set with all basic data  */
/*  about an account's public repositories          */
%macro getrepos;
 %do i = 1 %to &pages;
  proc http
   url="https://api.github.com/orgs/sassoftware/repos?page=&i."  
   method="GET"
   out=resp
  ;
  run;
 
  /* Use JSON engine with defined map to capture data */
  libname repos json map=repomap fileref=resp;
  data _repos&i.;
   set repos.root;
  run;
 %end;
 
 /* Concatenate all pages of data */
 data sassoftware_allrepos;
  set _repos:;
 run;
 
 /* delete intermediate repository data */
 proc datasets nolist nodetails;
  delete _repos:;
 quit;
%mend;
 
/* Run the macro */
%getrepos;

Creating a simple report

Finally, I want to create simple report listing of all of the repositories and their top-level stats. I'm using PROC SQL without a CREATE TABLE statement, which will create a simple ODS listing report for me. I use this approach instead of PROC PRINT because I transformed a couple of the columns in the same step. For example, I created a new variable with a fully formed HTML link, which ODS HTML will render as an active link in the browser. Here's a snapshot of the output, followed by the code.

samplereport

/* Best with ODS HTML output */
title "github.com/sassoftware (&acctname.): Repositories and stats";
title2 "ALL &repocount. repos, Data pulled with GitHub API as of &SYSDATE.";
title3 height=1 link="&accturl." "See &acctname. on GitHub";
proc sql;
 select 
  catt('<a href="',t1.html_url,'">',t1.name,"</a>") as Repository, 
 case 
   when length(t1.description)>50 then cat(substr(t1.description,1,49),'...')
   else t1.description
 end as Description,
 t1.language as Language,
 t1.created_at format=dtdate9. as Created, 
 t1.pushed_at format=dtdate9. as Last_Update, 
 t1.stars as Stars, 
 t1.forks as Forks, 
 t1.open_issues as Open_Issues
from sassoftware_allrepos t1
 order by t1.pushed_at desc;
quit;

Get the entire example

Not wanting to get too meta on you here, but I've placed the entire program on my own GitHub account. The program I've shared has a few modifications that make it easier to adapt for any organization or user on GitHub. As you play with this, keep in mind that the GitHub API is "rate limited" -- they allow only so many API calls from a single IP address in a certain period of time. That's to ensure that the APIs perform well for all users. You can use authenticated API calls to increase the rate-limit threshold for yourself, and I do that for my own production reporting process. But...that's a blog post for a different day.
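For the curious, the change is small -- a sketch, assuming you've stored a GitHub personal access token in a macro variable (MY_PAT here is a placeholder, not part of the program above):

proc http
 url="https://api.github.com/orgs/sassoftware/repos?page=1"
 method="GET"
 out=resp;
 /* authenticated calls get a higher rate limit */
 headers "Authorization"="token &my_pat.";
run;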

tags: github, JSON, open source, PROC HTTP

The post Reporting on GitHub accounts with SAS appeared first on The SAS Dummy.

December 3, 2016
 

JSON is the new XML. The number of SAS users who need to access JSON data has skyrocketed, thanks mainly to the proliferation of REST-based APIs and web services. Because JSON is structured data in text format, we've been able to offer simple parsing techniques that use DATA step and most recently PROC DS2. But finally, with SAS 9.4 Maintenance 4, we have a built-in LIBNAME engine for JSON.

Simple JSON example: Who is in space right now?

Speaking of skyrocketing, I discovered a cool web service that reports who is in space right now (at least on the International Space Station). It's actually a perfect example of a REST API, because it does just that one thing and it's easily integrated into any process, including SAS. It returns a simple stream of data that can be easily mapped into a tabular structure. Here's my example code and results, which I produced with SAS 9.4 Maintenance 4.

filename resp temp;
 
/* Neat service from Open Notify project */
proc http 
 url="http://api.open-notify.org/astros.json"
 method= "GET"
 out=resp;
run;
 
/* Assign a JSON library to the HTTP response */
libname space JSON fileref=resp;
 
/* Print result, dropping automatic ordinal metadata */
title "Who is in space right now? (as of &sysdate)";
proc print data=space.people (drop=ordinal:);
run;

JSON who is in space
But what if your JSON data isn't so simple? JSON can represent information in nested structures that can be many layers deep. These cases require some additional mapping to transform the JSON representation to a rectangular data table that we can use for reporting and analytics.

JSON map example: Most recent topics from SAS Support Communities

In a previous post I shared a PROC DS2 program that uses the DS2 JSON package to call and parse our SAS Support Communities API. The parsing process is robust, but it requires quite a bit of foreknowledge about the structure and fields within the JSON payload. It also requires many lines of code to extract each field that I want.

Here's a revised pass that uses the JSON engine:

/* split URL for readability */
%let url1=http://communities.sas.com/kntur85557/restapi/vc/categories/id/bi/topics/recent;
%let url2=?restapi.response_format=json%str(&)restapi.response_style=-types,-null,view;
%let url3=%str(&)page_size=100;
%let fullurl=&url1.&url2.&url3;
 
filename topics temp;
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
/* Let the JSON engine do its thing */
libname posts JSON fileref=topics;
title "Automap of JSON data";
 
/* examine resulting tables/structure */
proc datasets lib=posts; quit;
proc print data=posts.alldata(obs=20); run;

Thanks to the many layers of data in the JSON response, here are the tables that SAS creates automatically.

json Auto tables
There are 12 tables that contain various components of the message data that I want, plus the ALLDATA member that contains everything in one linear table. ALLDATA is good for examining structure, but not for analysis. You can see that it's basically name-value pairs with no data types/formats assigned.

json ALLDATA
I could use DATA steps or PROC SQL to merge the various tables into a single denormalized table for my reporting purposes, but there is a better way: define and apply a JSON map for the libname engine to use.

To get started, I need to rerun my JSON libname assignment with the AUTOMAP option. This creates an external file with the JSON-formatted mapping that SAS generates automatically. In my example here, the file lands in the WORK directory with the name "top.map".

filename jmap "%sysfunc(GETOPTION(WORK))/top.map";
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts JSON fileref=topics map=jmap automap=create;

This generated map is quite long -- over 400 lines of JSON metadata. Here's a snippet of the file that describes a few fields in just one of the generated tables.

"DSNAME": "messages_message",
"TABLEPATH": "/root/response/messages/message",
"VARIABLES": [
{
  "NAME": "ordinal_messages",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages"
},
{
  "NAME": "ordinal_message",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages/message"
},
{
  "NAME": "href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/href",
  "CURRENT_LENGTH": 19
},
{
  "NAME": "view_href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/view_href",
  "CURRENT_LENGTH": 134
},

By using this map as a starting point, I can create a new map file -- one that is simpler, much smaller, and defines just the fields that I want. I can reference each field by its "path" in the JSON nested structure, and I can also specify the types and formats that I want in the final data.

In my new map, I eliminated many of the tables and fields and ended up with a file that was just about 60 lines long. I also applied sensible variable names, and I even specified SAS formats and informats to transform some columns during the import process. For example, instead of reading the message "datetime" field as a character string, I coerced the value into a numeric variable with a DATETIME format:

{
  "NAME": "datetime",
   "TYPE": "NUMERIC",
  "INFORMAT": [ "IS8601DT", 19, 0 ],
  "FORMAT": ["DATETIME", 20],
  "PATH": "/root/response/messages/message/post_time/_",
  "CURRENT_LENGTH": 8
},

I called my new map file 'minimap.map' and then re-issued the libname without the AUTOMAP option:

filename minmap 'c:\temp\minimap.map';
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts json fileref=topics map=minmap;
proc datasets lib=posts; quit;
 
data messages;
 set posts.messages;
run;

Here's a snapshot of the single data set as a result.

JSON final data
I think you'll agree that this result is much more usable than what my first pass produced. And the amount of code is much smaller and easier to maintain than any previous SAS-based process for reading JSON.

Here's the complete program in public GitHub gist, including my custom JSON map.


tags: JSON, REST API, SAS programming

The post Reading data with the SAS JSON libname engine appeared first on The SAS Dummy.

September 29, 2015
 

Thanks to the proliferation of cloud services and REST-based APIs, SAS users have been making use of PROC HTTP calls (to query these web services) and some creative DATA step or PROC GROOVY code to process the JSON results. Such methods get the job done (JSON is simply text, after all), but they aren't as robust as an official JSON parser. JSON is simple: it's a series of name-value pairs that represent an object in JavaScript. But these pairs can be nested within one another, so in order to parse the result you need to know about the object structure. A parser helps with the process, but you still need to know the semantics of any JSON response.

SAS 9.4 introduced PROC JSON, which allows you to create JSON output from a data set. But it wasn't until SAS 9.4 Maintenance 3 that we have a built-in method to parse JSON content. This method was added as a DS2 package: the JSON package.

I created an example of the method working -- using an API that powers our SAS Support Communities! The example queries communities.sas.com for the most recent posts to the SAS Programming category. Here's a small excerpt of the JSON response.

 "post_time": "2015-09-28T16:29:05+00:00",
  "views": {
  "count": 1
  },
  "subject": "Re: How to code for the consecutive values",
  "author": {
  "href": "/users/id/13884",
  "login": "ballardw"

Notice that some items, such as post_time, are simple one-level values. But other items, such as views or author, require a deeper dive to retrieve the value of interest ("count" for views, and "login" for author). The DS2 JSON parser can help you to navigate to those values without you needing to know how many braces or colons or commas are in your way.

Here is an example of the result: a series plot from PROC SGPLOT and a one-way frequency analysis from PROC FREQ. The program also produces a detailed listing of the messages, the topic content, and the datetime stamp.

series

boardfreq
This is my first real DS2 program, so I'm open to feedback. I already know of a couple of improvements I should make, but I want to share it now as I think it's good enough to help others who are looking to do something similar.

The program requires SAS 9.4 Maintenance 3. It also works fine in the most recent version of SAS University Edition (using SAS Studio 3.4). All of the code runs using just Base SAS procedures.

/* DS2 program that uses a REST-based API */
/* Uses http package for API calls       */
/* and the JSON package (new in 9.4m3)   */
/* to parse the result.                  */
proc ds2; 
  data messages (overwrite=yes);
    /* Global package references */
    dcl package json j();
 
    /* Keeping these variables for output */
    dcl double post_date having format datetime20.;
    dcl int views;
    dcl nvarchar(128) subject author board;
 
    /* these are temp variables */
    dcl varchar(65534) character set utf8 response;
    dcl int rc;
    drop response rc;
 
    method parseMessages();
      dcl int tokenType parseFlags;
      dcl nvarchar(128) token;
      rc=0;
      * iterate over all message entries;
      do while (rc=0);
        j.getNextToken( rc, token, tokenType, parseFlags);
 
        * subject line;
        if (token eq 'subject') then
          do;
            j.getNextToken( rc, token, tokenType, parseFlags);
            subject=token;
          end;
 
        * board URL, nested in an href label;
        if (token eq 'board') then
          do;
            do while (token ne 'href');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            j.getNextToken( rc, token, tokenType, parseFlags );
            board=token;
          end;
 
        * number of views (int), nested in a count label ;
        if (token eq 'views') then
          do;
            do while (token ne 'count');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            j.getNextToken( rc, token, tokenType, parseFlags );
            views=inputn(token,'5.');
          end;
 
        * date-time of message (input/convert to SAS date) ;
        * format from API: 2015-09-28T10:16:01+00:00 ;
        if (token eq 'post_time') then
          do;
            j.getNextToken( rc, token, tokenType, parseFlags );
            post_date=inputn(token,'anydtdtm26.');
          end;
 
        * user name of author, nested in a login label;
        if (token eq 'author') then
          do; 
            do while (token ne 'login');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            * get the author login (username) value;
            j.getNextToken( rc, token, tokenType, parseFlags );
            author=token;
            output;
          end;
      end;
      return;
    end;
 
    method init();
      dcl package http webQuery();
      dcl int rc tokenType parseFlags;
      dcl nvarchar(128) token;
      dcl integer i;
 
      /* create a GET call to the API                                         */
      /* 'sas_programming' covers all SAS programming topics from communities */
      webQuery.createGetMethod(
         'http://communities.sas.com/kntur85557/' || 
         'restapi/vc/categories/id/sas_programming/posts/recent' ||
         '?restapi.response_format=json' ||
         '&restapi.response_style=-types,-null&page_size=100');
      /* execute the GET */
      webQuery.executeMethod();
      /* retrieve the response body as a string */
      webQuery.getResponseBodyAsString(response, rc);
      rc = j.createParser( response );
      do while (rc = 0);
        j.getNextToken( rc, token, tokenType, parseFlags);
        if (token = 'message') then
          parseMessages();
      end;
    end;
 
  method term();
    rc = j.destroyParser();
  end;
 
  enddata;
run;
quit;
 
/* Add some basic reporting */
proc freq data=messages noprint;
    format post_date datetime11.;
    table post_date / out=message_times;
run;
 
ods graphics / width=2000 height=600;
title '100 recent message contributions in SAS Programming';
title2 'Time in GMT';
proc sgplot data=message_times;
    series x=post_date y=count;
    xaxis minor label='Time created';
    yaxis label='Messages' grid;
run;
 
title 'Board frequency for recent 100 messages';
proc freq data=messages order=freq;
    table board;
run;
 
title 'Detailed listing of messages';
proc print data=messages;
run;
 
title;

I also shared this program on the SAS Support Communities as a discussion topic. If you want to contribute to the effort, please leave me a reply with your suggestions and improvements!

tags: DS2, JSON, REST API, SAS 9.4

The post Using SAS DS2 to parse JSON appeared first on The SAS Dummy.

December 13, 2013
 

Last year I took a bite of the then-new SAS Proc Groovy to read JSON data, since there was no direct "proc import" or "infile" or "libname" way to play with JSON. Here is a nice example from the official SAS blog, by Falko Schulz, where Proc Groovy is used to parse a Twitter JSON file:

How to import Twitter tweets in SAS DATA Step using OAuth 2 authentication style

PS: Since SAS 9.4, you can output a JSON file with Proc JSON, but there is still no JSON parser analogous to the XML libname engine.
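For the record, writing JSON with Proc JSON is a one-step job -- a minimal sketch:

proc json out="c:\temp\class.json" pretty;
  export sashelp.class;
run;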

October 29, 2012
 


see, it’s hip to be square
‘cuz SAS has a new PROC that’s GROOVY
-Chris Hemedinger, Poetry on our own terms   

These days I played with Proc Groovy (new in SAS 9.3) for a while, because Groovy natively supports the JSON (JavaScript Object Notation) data format and I had downloaded a lot of JSON data in the past few weeks (the GitHub archive, for example).

——–I’m a line separator ———

I DO wish SAS would also support JSON in a future release, in a way similar to the existing XML libname engine, ODS XML markup, and SAS XML Mapper!

UPDATE 20121030: I submitted a ticket to the SASware Ballot with this idea, and got feedback that SAS will support JSON starting with 9.4, both reading and writing. That's cool.

——–I’m a line separator end———

Groovy is a dynamic language running in a JVM (Java Virtual Machine) and looks like a lightweight version of Java. I don't know why Groovy was chosen, but it is nice to have (at least to parse JSON data) and it's in Base SAS!

Set Up

Back to Proc Groovy. To make it work, you should first set up a JDK (Java Development Kit; or at least a JVM, also called a JRE, Java Runtime Environment) and install Groovy. I tested this on a Windows 7 machine with SAS 9.3:

  • Install a JDK (or JVM). Get the proper version for your machine. I have JDK 7 installed in C:\Program Files\Java\jdk1.7.0_09. By the way, if your SAS 9.3 works well on 64-bit Windows 7, there should already be a JRE in C:\Program Files (x86)\Java\jre1.6.0_24.
  • Set up the Windows environment variables for the JRE. First, create a system variable JAVA_HOME with the value C:\Program Files\Java\jdk1.7.0_09, then add %JAVA_HOME%\bin to the existing system variable Path. If you use a JVM, set the value of JAVA_HOME to C:\Program Files\Java\jre7 or another appropriate directory.
  • Install Groovy using the latest version (currently v2.0.5). Accept all the default settings and it will set the environment variables for Groovy. You will have Groovy installed in C:\Program Files (x86)\Groovy\Groovy-2.0.5.
  • Locate the sasv9.cfg file in C:\Program Files\SASHome\SASFoundation\9.3\nls\en, find the option -JREOPTIONS, and add a line inside:

-Dtkj.app.class.path=C:\Program Files (x86)\Groovy\Groovy-2.0.5\embeddable\groovy-all-2.0.5.jar

That’s it.

Hello World in Proc Groovy

proc groovy;
    submit;
        def name='World';
        println "Hello $name!"
    endsubmit;
quit;

Read JSON data

Use a simple example from the JSON article on Wikipedia (save it to a file, test.json):

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}

The following code simply demonstrates how to:
  • read JSON file
  • print JSON file
  • extract JSON values
  • output values into SAS macro variables

options nosource;

proc groovy;

    submit;

        import groovy.json.*

        def input=new File('a:\\test\\test.json').text

        def output = new JsonSlurper().parseText(input)

        println output   
        println ""

        output.each {println it}  

        println ""

        println output.address.streetAddress

        println "Street Address: $output.address.streetAddress"

        println output.address["streetAddress"]       


        exports = [fName1:output['firstName']]   
        exports.put('fName2', output['firstName']) 
        
        endsubmit;

quit;

%put fName1: &fName1;

%put fName2: &fName2;

The output in the LOG window:
JsoninSAS_Groovy

Happy Groovying!

References

Parse JSON in SAS with Groovy – part 2 by Simon Dawson

http://groovy.codehaus.org/Documentation

Overview: GROOVY Procedure in support.sas.com

July 7, 2012
 

As of this writing, I've posted 34 blog posts during 2012 (not counting this one), which have been shared on Twitter a total of 368 times and on Facebook a total of 92 times. Those aren't exactly "viral" numbers, but I try not to get too hung up on it.

How do I know these numbers? If you are reading this blog post on our blogs.sas.com site, then you can see the "social share" counts for each individual post. For example, see the header from this recent post:


As it would be a tedious task to view each post and tally up the counts, I wrote a SAS program to do that work for me.

The little social count widgets that you see on our blog site are powered by "count" APIs that are surfaced by Twitter and Facebook servers. Each of these APIs is a little bit different, but they work basically in the same way. You create a request in the form of a URL, which contains the address of the API service plus the web page URL that you want to know the "count" of. You use HTTP to issue the request, and the response comes back in the form of a JSON string. Unlike some web service APIs, these API requests do not require authentication or application permissions, so they are easy to use with basic HTTP.

For example, the following URL request returns the number of times that this blog post has been tweeted (click on the link to see the response, and tweet this blog post to watch it go up!):

http://urls.api.twitter.com/1/urls/count.json?url=http://blogs.sas.com/content/sasdummy/2012/07/06/social-media-counts-in-sas/

The exact count in the response will change over time, but here is an example of an outrageously optimistic result:

{ "count":4032,
  "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2012\/07\/06\/social-media-counts-in-sas\/"
}
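
The Facebook Graph API call used in the program below has the same shape; you pass the page link in the ids parameter:

http://graph.facebook.com/?ids=http://blogs.sas.com/content/sasdummy/2012/07/06/social-media-counts-in-sas/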

JSON has become a ubiquitous data exchange format among web service APIs. The SAS language doesn't yet support native parsing of JSON, but with a little bit of programming you can easily parse simple JSON to get the information that you need.

The SAS program that I wrote uses these techniques to gather the Twitter and Facebook counts:

  • Use a DATA step to create a list of URLs (web pages, or links). In my example I included the URLs for the blog topics I've posted thus far in 2012.
  • Use a SAS macro to loop through each URL record and create a data set with a single record.
  • In the macro loop for each URL, use FILENAME URL to construct filerefs that will connect to Twitter and Facebook using well-formed API requests.
  • In the macro loop for each URL, use a DATA step to "input" the file (producing the API response in JSON form) and record it into a data set.
  • Outside of the macro loop, concatenate all responses into a data set (one for Twitter responses and one for Facebook responses).
  • Use Perl regular expression functions to parse the responses into just the numbers I'm looking for.
  • And finally, join all of the social counts together to create a single data set with the URLs and their current tweet/share counts.

These count values are a snapshot of the current number of shares, so I also "stamp" each record with today's date in case I want to build a historical set of these counts and see the trends over time.
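
To build that history, you could append each day's snapshot to a running table. Here's a minimal sketch (WORK.SOCIAL_HISTORY is an assumed data set name; a permanent library would let the history survive the SAS session):

/* Append today's snapshot (WORK.SOCIALCOUNTS, created by */
/* the program below) to a running history table.         */
/* PROC APPEND creates the base table on the first run.   */
proc append base=work.social_history data=work.socialcounts;
run;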

This program contains a few inefficiencies. It passes through the "links" source data many times (once for each link) and creates lots of tiny transient data sets along the way. However, the biggest impact on the program performance is the time it takes for the Twitter and Facebook APIs to respond. In my experience, the Twitter API is usually very fast. The Facebook API is sometimes fast, but sometimes a call can be blocked for several seconds (or longer). Because of their scale, both services are sensitive to overuse or abuse, and may "throttle" or deny requests from IP addresses that use them irresponsibly.

When you see the numbers, you might wonder: are they accurate? Can you rely on these results as the absolute counts of how your web content has been shared/liked via social media? Personally, I believe the answer is No. However, I think that the values are useful for relative comparisons. Without focusing on the absolute numbers, I can still learn which of my posts gathered more or less engagement on Twitter, Facebook, or both.

You can download the complete program (with sample links data) from here:
GetSocialStats_example.sas

I've included the program logic (not the sample data) in the remainder of this post. I added a few comments to the code to help convey what's going on.

/* Get count of URLs in source data                */
/* Here it assumes WORK.LINKS                      */
/* and the page link to check is in the URL column */
proc sql noprint;
  select count(URL) into :urlCount from WORK.LINKS;
quit;

/* trim this for later use in constructing data set names*/
%let urlCount = %trim(&urlCount);

%macro gatherSocialStats;
  /* This loop reads each URL link from the data set */
  /* one at a time, and then uses FILENAME URL       */
  /* to invoke the Twitter and Facebook APIs for     */
  /* Tweet counts and Share counts                   */
  %do i = 1 %to &urlCount;

    /* read the next URL */
    proc sql noprint;
      select trim(url) into :url from work.links(firstobs=&i obs=&i);
    quit;

    %put Checking %str(&url.);

    /* format of the Twitter API call */
    filename twurl url "http://urls.api.twitter.com/1/urls/count.json?url=&url." 
      /* supply PROXY= option if behind a corporate firewall/gateway */
    ;

    /* format of the Facebook API call */
    filename fburl url "http://graph.facebook.com/?ids=&url." 
      /* supply PROXY= option if behind a corporate firewall/gateway */
    ;

    data twresult&i;
      /* Running in EG? SYSECHO helps convey progress in status bar */
      SYSECHO "Checking Tweet URL &i. of &urlCount.";
      length line $ 500 url $ 500;
      infile twurl;
      url = "&url.";
      input line;
    run;

    data fbresult&i;
      /* Running in EG? SYSECHO helps convey progress in status bar */
      SYSECHO "Checking FB URL &i. of &urlCount.";
      length line $ 500 url $ 500;
      infile fburl;
      input line;
      url = "&url.";
    run;

    /* Free up the filerefs */
    filename twurl clear;
    filename fburl clear;
  %end;

  /* concatenate all Twitter counts */
  data twresults;
    set twresult1-twresult&urlCount.;
  run;

  /* concatenate all Facebook share counts */
  data fbresults;
    set fbresult1-fbresult&urlCount.;
  run;

  /* clean up all of the tiny data sets */
  proc datasets lib=work nolist;
    delete twresult1-twresult&urlCount.;
    delete fbresult1-fbresult&urlCount.;
  quit;

%mend;

%gatherSocialStats;

/* Post processing to PARSE the results, which */
/* are in JSON format */

/* Example result:
{
  "http://blogs.sas.com/content/sasdummy/2012/01/25/export-to-excel-got-easier/": 
  {
     "id": "http://blogs.sas.com/content/sasdummy/2012/01/25/export-to-excel-got-easier/",
     "shares": 3
  }
}
*/
data work.fbcounts (keep=fbshares url);
  set fbresults;
  length fbshares 8;
  if _n_ = 1 then
    do;
      retain fb_regex;
      /* looking for "shares":<a number> */
      fb_regex = prxparse("/\""shares\""\:([0-9]*)/");
    end;
  position = prxmatch(fb_regex,line);
  if (position ^= 0) then
    do;
      call prxposn(fb_regex, 1, start, length);
      fbshares = substr(line,start,length);
    end;
run;

/* Post processing to PARSE the results, which */
/* are in JSON format */

/* Example result:
  {
  "count":8,
  "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2012\/01\/27\/achieving-mediocrity\/"
  }
*/
data work.twcounts (keep=tweets url);
  set twresults;
  length tweets 8;
  if _n_ = 1 then
    do;
      retain tw_regex;
      /* looking for "count":<a number> */
      tw_regex = prxparse("/\""count\""\:([0-9]*)/");
    end;
  position = prxmatch(tw_regex,line);
  if (position ^= 0) then
    do;
      call prxposn(tw_regex, 1, start, length);
      tweets = substr(line,start,length);
    end;
run;

/* Join it all together with the URL, tweet counts, */
/* and share counts */
proc sql;
  create table work.socialcounts as 
    select t1.url, 
      ifn(t2.fbshares eq .,0,t2.fbshares) as fbshares, 
      t3.tweets,
      /* date-stamp the value for use in history/comparisons */
      today() as as_of_date format=date9.
  from work.links t1 
    left join work.fbcounts t2 on (t1.url = t2.url) 
    left join work.twcounts t3 on (t1.url = t3.url)
  ;
quit;