REST API

Digitalisation is blasting the cobwebs out from enterprises and organisations of all kinds – freeing them to innovate and take advantage of the always-on economy. But it’s also helping new disruptive players to gain an unexpectedly strong foothold in many markets. One of the key advantages these new players have [...]

Is governance getting in the way of innovation? was published on SAS Voices by Peter Pugh-Jones

JSON is the new XML. The number of SAS users who need to access JSON data has skyrocketed, thanks mainly to the proliferation of REST-based APIs and web services. Because JSON is structured data in text format, we've been able to offer simple parsing techniques that use DATA step and, most recently, PROC DS2. But finally, with SAS 9.4 Maintenance 4, we have a built-in LIBNAME engine for JSON.

Simple JSON example: Who is in space right now?

Speaking of skyrocketing, I discovered a cool web service that reports who is in space right now (at least on the International Space Station). It's actually a perfect example of a REST API, because it does just that one thing and it's easily integrated into any process, including SAS. It returns a simple stream of data that can be easily mapped into a tabular structure. Here's my example code and results, which I produced with SAS 9.4 Maintenance 4.

filename resp temp;
 
/* Neat service from Open Notify project */
proc http 
 url="http://api.open-notify.org/astros.json"
 method= "GET"
 out=resp;
run;
 
/* Assign a JSON library to the HTTP response */
libname space JSON fileref=resp;
 
/* Print result, dropping automatic ordinal metadata */
title "Who is in space right now? (as of &sysdate)";
proc print data=space.people (drop=ordinal:);
run;

[Image: who is in space right now -- PROC PRINT results]

But what if your JSON data isn't so simple? JSON can represent information in nested structures that can be many layers deep. These cases require some additional mapping to transform the JSON representation to a rectangular data table that we can use for reporting and analytics.

JSON map example: Most recent topics from SAS Support Communities

In a previous post I shared a PROC DS2 program that uses the DS2 JSON package to call and parse our SAS Support Communities API. The parsing process is robust, but it requires quite a bit of foreknowledge about the structure and fields within the JSON payload. It also requires many lines of code to extract each field that I want.

Here's a revised pass that uses the JSON engine:

/* split URL for readability */
%let url1=http://communities.sas.com/kntur85557/restapi/vc/categories/id/bi/topics/recent;
%let url2=?restapi.response_format=json%str(&)restapi.response_style=-types,-null,view;
%let url3=%str(&)page_size=100;
%let fullurl=&url1.&url2.&url3;
 
filename topics temp;
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
/* Let the JSON engine do its thing */
libname posts JSON fileref=topics;
title "Automap of JSON data";
 
/* examine resulting tables/structure */
proc datasets lib=posts; quit;
proc print data=posts.alldata(obs=20); run;

Thanks to the many layers of data in the JSON response, here are the tables that SAS creates automatically.

[Image: tables created automatically by the JSON engine]

There are 12 tables that contain various components of the message data that I want, plus the ALLDATA member that contains everything in one linear table. ALLDATA is good for examining structure, but not for analysis. You can see that it's basically name-value pairs with no data types/formats assigned.

[Image: the ALLDATA member]

I could use DATA steps or PROC SQL to merge the various tables into a single denormalized table for my reporting purposes, but there is a better way: define and apply a JSON map for the libname engine to use.

To get started, I need to rerun my JSON libname assignment with the AUTOMAP option. This creates an external file with the JSON-formatted mapping that SAS generates automatically. In my example here, the file lands in the WORK directory with the name "top.map".

filename jmap "%sysfunc(GETOPTION(WORK))/top.map";
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts JSON fileref=topics map=jmap automap=create;

This generated map is quite long -- over 400 lines of JSON metadata. Here's a snippet of the file that describes a few fields in just one of the generated tables.

"DSNAME": "messages_message",
"TABLEPATH": "/root/response/messages/message",
"VARIABLES": [
{
  "NAME": "ordinal_messages",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages"
},
{
  "NAME": "ordinal_message",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages/message"
},
{
  "NAME": "href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/href",
  "CURRENT_LENGTH": 19
},
{
  "NAME": "view_href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/view_href",
  "CURRENT_LENGTH": 134
},

By using this map as a starting point, I can create a new map file -- one that is simpler, much smaller, and defines just the fields that I want. I can reference each field by its "path" in the JSON nested structure, and I can also specify the types and formats that I want in the final data.

In my new map, I eliminated many of the tables and fields and ended up with a file that was just about 60 lines long. I also applied sensible variable names, and I even specified SAS formats and informats to transform some columns during the import process. For example, instead of reading the message "datetime" field as a character string, I coerced the value into a numeric variable with a DATETIME format:

{
  "NAME": "datetime",
  "TYPE": "NUMERIC",
  "INFORMAT": [ "IS8601DT", 19, 0 ],
  "FORMAT": ["DATETIME", 20],
  "PATH": "/root/response/messages/message/post_time/_",
  "CURRENT_LENGTH": 8
},

I called my new map file 'minmap.map' and then re-issued the libname without the AUTOMAP option:

filename minmap 'c:\temp\minmap.map';
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts json fileref=topics map=minmap;
proc datasets lib=posts; quit;
 
data messages;
 set posts.messages;
run;

Here's a snapshot of the single data set as a result.

[Image: the final MESSAGES data set]

I think you'll agree that this result is much more usable than what my first pass produced. And the amount of code is much smaller and easier to maintain than any previous SAS-based process for reading JSON.

Here's the complete program in a public GitHub gist, including my custom JSON map.


tags: JSON, REST API, SAS programming

The post Reading data with the SAS JSON libname engine appeared first on The SAS Dummy.

With DataFlux Data Management 2.7, the major component of SAS Data Quality and other SAS Data Management solutions, a REST API is automatically created for every job once it is moved to the Data Management Server. This is a great feature that enables us to easily call Data Management jobs from programming languages like Python. We can then involve the Quality Knowledge Base (QKB), a pre-built set of data quality rules, and do other data quality work that is impossible or challenging when using Python alone.

In order to make a RESTful call from Python, we first need to get the REST API information for our Data Management job. The best way to get this information is to go to the Data Management Server in your browser, where you'll find respective links for:

  • Batch Jobs
  • Real-Time Data Jobs
  • Real-Time Process Jobs

From here you can drill through to your job's REST API.

Alternatively, you can use a “shortcut” to get the information by calling the job’s REST API metadata URL directly. The URL looks like this:

http://<DM Server>:<port>/<job type>/rest/jobFlowDefns/<job id>/metadata

[Screenshot: a job's REST API metadata URL]

The <job id> is simply the job name (with subdirectory and extension), Base64 encoded. This is a common method to avoid issues with illegal URL characters like # % & * { } : < > ? / + or space. You can go to this website to Base64 encode your job name.
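If you would rather not paste job names into a website, Python's standard library produces the same encoding. A minimal sketch, using the job name from the example below:

import base64

job_name = "Demo/ParseAddress.ddf"
job_id = base64.b64encode(job_name.encode("utf-8")).decode("ascii")
print(job_id)   # RGVtby9QYXJzZUFkZHJlc3MuZGRm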

If you have many jobs on the Data Management Server it might be quicker to use the “shortcut” instead of drilling through from the top.
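As a rough sketch of that shortcut in Python with the requests module -- the server, port, job type, and Base64-encoded job ID here are the values from the ParseAddress.ddf example below, so substitute your own:

import requests

# Values from the ParseAddress.ddf example below -- substitute your own
meta_url = ("http://DMServer:21036/SASDataMgmtRTDataJob/rest/"
            "jobFlowDefns/RGVtby9QYXJzZUFkZHJlc3MuZGRm/metadata")
response = requests.get(meta_url)
print(response.text)   # the job's REST API metadata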

Here is an example of getting the REST API information for the Data Management job "ParseAddress.ddf", which is in the subdirectory Demo of Real-Time Data Services on the DM Server:

[Screenshot: ParseAddress.ddf under Real-Time Data Services]

We Base64 encode the job name “Demo/ParseAddress.ddf” using the website mentioned above…

[Screenshot: Base64 encoding the job name]

…and call the URL for the job’s REST API metadata:

http://DMServer:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/RGVtby9QYXJzZUFkZHJlc3MuZGRm/metadata

[Screenshot: the job's REST API metadata]

From here we collect the following information:

The REST API URL and Content-Type information…
[Screenshot: REST API URL and Content-Type information]

…the JSON structure for input data

[Screenshot: JSON structure for input data]

…which we need in this format when calling the Data Management job from Python:

{"inputs" : {"dataTable" : {"data" : [[ "sample string" ],[ "another string" ]], "metadata" : [{"maxChars" : 255, "name" : "Address", "type" : "string"}]}}}

…and the JSON structure for the data returned by the Data Management job.

[Screenshot: JSON structure for output data]

When you have this information, the Python code to call the Data Management job would look like this:

[Screenshot: Python code calling the Data Management job]
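Since the original code appeared only as a screenshot, here is a rough sketch of such a call using the requests module. The endpoint, headers, and sample address are assumptions pieced together from the REST metadata described above -- check your own job's metadata for the exact href and Content-Type values:

import requests

# Endpoint and Content-Type come from the job's REST API metadata (above)
url = ("http://DMServer:21036/SASDataMgmtRTDataJob/rest/"
       "jobFlowDefns/RGVtby9QYXJzZUFkZHJlc3MuZGRm")
headers = {"Content-Type": "application/json", "Accept": "application/json"}

# Input data in the JSON structure that the metadata requires
payload = {"inputs": {"dataTable": {
    "data": [["216 Main Street Anytown TX 99999"]],
    "metadata": [{"maxChars": 255, "name": "Address", "type": "string"}]
}}}

data_raw = requests.post(url, json=payload, headers=headers)
data_out = data_raw.json()   # decode the JSON response into a dictionary

# Access the relevant output data, per the REST metadata
print(data_out["outputs"]["dataTable"]["data"])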

The output data from the Data Management job will be in data_raw. We call the built-in JSON decoder from the requests module to move the output into a dictionary (data_out), from which we can access the data. The structure of the dictionary follows the REST metadata. We can access the relevant output data via data_out['outputs']['dataTable']['data'].

[Screenshots: accessing the output data in Python]

The Python program will produce output like this…

[Screenshot: output of the Python program]

You can find more information about the DataFlux Data Management REST API here.

Calling Data Management jobs from Python is straightforward and a convenient way to augment your Python code with the more robust set of Data Quality rules and capabilities found in the SAS Data Quality solution.

Learn more about SAS Data Quality.

tags: data management, DataFlux Data Management Studio, open source, REST API

Calling SAS Data Quality jobs from Python was published on SAS Users.

The study of social networks has gained importance over the years within social and behavioral research on HIV and AIDS. Social network research can show routes of potential viral transfer, and be used to understand the influence of peer norms and practices on the risk behaviors of individuals. This example analyzes the […]

Analyzing social networks using Python and SAS Viya was published on SAS Voices.

Slack is a tremendously popular app for team collaboration. At its core, it's an open messaging app that allows team members to communicate with each other in real time. It works well with "startup"-style teamwork, which requires constant communication among people who aren't always located together. I've heard a lot about Slack from colleagues and from podcasts (Slack is a frequent advertiser on tech shows), but I'd never tried it myself until recently.

I was most curious about their APIs, which allow you to integrate other applications into the Slack messaging platform. Months ago, a colleague had asked me whether it was possible to integrate SAS output with Slack. I suspected that we could do something with PROC HTTP, but didn't get around to trying until today. I can now answer for certain: Yes! It's possible...and it's sort of fun!

Get started with a Slack "webhook"

I won't go into how you get started with Slack, except to say that it's really easy (and free) to create an account and a "channel" (your message space) for a small team. Assuming that you have that going, I'm going to show you how to use SAS to publish to your channel.

Slack supports a feature called "Incoming Webhooks," which is basically a simple endpoint URL that you can send messages to. From your Slack channel, you can select Add Configuration, which takes you to the option to add a new Incoming Webhook:

[Image: adding an Incoming Webhook in Slack]

Click the button and Slack will provision a cryptic URL that's your unique endpoint. Any process that can POST content to that URL over HTTP can publish to the channel. The content itself (called the payload) is delivered in JSON format to the API.

Example 1: Simple in-line message with PROC HTTP

The Slack documentation supplies examples that use curl (command-line URL tool), but PROC HTTP is the SAS equivalent. Here's a simple example:

%let webhookUrl = https://hooks.slack.com/services/<your Webhook URL>;
 
/*
  Simple example with direct text 
*/
 
filename resp temp;
proc http
 url="&webhookUrl"
 method="POST"
 /* IN= supports text in SAS 9.4m3.  Earlier release? Use fileref with content */
 in='payload={"channel": "#fromsas", 
     "username": "sasprogram", 
     "text": "Created using PROC HTTP!"}'
 out=resp
 ;
run;

To try this, you'll have to first get your own Webhook URL and plug it into the program. I'd loan you mine, but you're not on my channel so you can't check the results...which look like this:

[Image: the simple message posted to Slack]

Example 2: Share rich messages with PROC HTTP and JSON

Slack also allows multipart messages with simple formatting, including some colors, custom icons, and working links. This requires a little bit more JSON content in the payload, including an array of attachments. Here's a more complex example:

/*
  More complex messages, with multiple parts.
  Use the attachments fields that Slack supports
*/
filename rich temp;
 
data _null_;
 file rich;
 infile datalines4;
 input;
 put _infile_;
datalines4;
payload=
  {
    "channel": "#fromsas",
    "username": "sasprogram",
    "icon_emoji": ":fax:",
    "attachments": [
      {
        "fallback":
          "New SAS Dummy post!: <http://blogs.sas.com/content/sasdummy|The SAS Dummy blog>",
        "pretext":
          "New SAS Dummy post!: <http://blogs.sas.com/content/sasdummy|The SAS Dummy blog>",
        "color": "#3030F0",
        "fields": [
          {
            "title": "Great news!",
            "value": "That Chris...he's done it again!",
            "short": false
          }
        ]
      }
    ]
  }
;;;;
 
proc http
 url="&webhookUrl"
 method="POST"
 in=rich
 out=resp
 ;
run;

Here's the result. See how I selected a nice modern emoji as the account icon? Slack has hundreds of these available.

[Image: the rich message posted to Slack]

Example 3: Data-driven JSON payload published to Slack using PROC HTTP

But the real power of this integration from SAS is the ability to push dynamic, data-driven content to your Slack channel. To accomplish that, you need to dynamically generate your JSON content with the fields that you want to share. Here's an example that publishes the output of a SAS procedure (read from a data set) to the channel:

/*
 And finally an example that publishes values from data!
*/
 
/* Calculate some data */
proc means data=sashelp.class noprint;
var age;
output out=stats;
run;
 
/* file to hold the JSON payload */
filename msg temp;
 
/* Create the start of the JSON payload */
data _null_;
 file msg ;
 infile datalines4;
 input;
 put _infile_;
datalines4;
payload=
  {
    "channel": "#fromsas",
    "username": "sasprogram",
    "icon_emoji": ":fax:",
    "attachments": [
      {
        "fallback": "Latest Stats for AGE in SASHELP.CLASS",
        "pretext": "Latest Stats for AGE in SASHELP.CLASS",
        "color": "#D000FF",
        "fields": [
;;;;
 
/* fill in the data fields in the middle */
data _null_;
 file msg mod;
 set stats end=eof;
 put '{ "short":false, "title": "' _stat_ '",';
 put '"value": "' age '" }';
 /* separate values with commas, except the last */
 if not eof then put ",";
run;
 
/*
  And finish with the tail end of the payload
*/
 
data _null_;
 file msg mod;
 infile datalines4;
 input;
 put _infile_;
datalines4;
         ]
      }
   ]
  }	
;;;;
 
proc http
 url="&webhookUrl"
 method="POST"
 in=msg
 out=resp
 ;
run;

Here's the result -- the latest figures from SASHELP.CLASS!
[Image: the data-driven message posted to Slack]

I've shared the complete example code on a public Gist on GitHub. Remember, to try it yourself you'll need to:

  • Create a Slack account, if you don't have one, along with a new Slack channel.
  • Use the Slack site to add a new Incoming Webhook for your channel.
  • Replace the webhookUrl macro value in my example code with your specific Webhook URL.

Have fun! And if you create anything really interesting, I hope you'll invite me to your Slack channel!

tags: PROC HTTP, REST API, Slack

The post How to publish to a Slack channel with SAS appeared first on The SAS Dummy.

REpresentational State Transfer (REST) is an architectural style for designing web services that access a system's resources using HTTP methods.

With the release of DataFlux Data Management 2.7 earlier this year, three new REST Application Programming Interfaces (APIs) are now available:

  • Data Management Server Batch Jobs
  • Data Management Server Real-time Data Jobs
  • Data Management Server Real-time Process Jobs

The Base URLs for the new Data Management Server REST APIs are:

Batch Jobs – http://<DM Server IP address or server name>:<port>/SASDataMgmtBatchJob/rest

Real-time Data Services – http://<DM Server IP address or server name>:<port>/SASDataMgmtRTDataJob/rest

Real-time Process Services – http://<DM Server IP address or server name>:<port>/SASDataMgmtRTProcessJob/rest

Here is a sample call to the Base URL for Real-time Data Services:

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest

[Screenshot: response from the Real-time Data Services base URL]

The Data Management Server job or service name is referred to by its ID, which is simply the name of the job or service (with its extension) Base64 encoded. You can use this site to Base64 encode text. Base64 encoding is a common standard used to encode paths or other resource names in a URL. For example, without Base64 encoding, a file name and path that contains a space can cause problems.
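For example, the round trip for the Determine_Gender.ddf service used below looks like this in Python (any Base64 tool gives the same result):

import base64

# Encode a service name to get its REST API ID...
print(base64.b64encode(b"Determine_Gender.ddf").decode())         # RGV0ZXJtaW5lX0dlbmRlci5kZGY=
# ...and decode an ID to recover the service name
print(base64.b64decode("RGV0ZXJtaW5lX0dlbmRlci5kZGY=").decode())  # Determine_Gender.ddf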

Here is a sample call for a specific Real-time Data Service using its Base64 encoded name for its ID.

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/RGV0ZXJtaW5lX0dlbmRlci5kZGY=

[Screenshot: response for the Determine_Gender.ddf service]

Note that the returned text contains information about the links to call (href), the method to use, and the content type of the request.

Before reviewing the REST API metadata information for the service, let’s first look at testing the Determine Gender data service using the Data Management Server test service utility.

[Screenshot: the Data Management Server test service utility]

The input of the service is Name and the output is Gender.

Here is an example of the REST API metadata information which includes the inputs and outputs for the Determine_Gender.ddf service.

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/RGV0ZXJtaW5lX0dlbmRlci5kZGY=/metadata

[Screenshot: REST API metadata for Determine_Gender.ddf, including inputs and outputs]

There are a number of common tools for testing REST API requests and responses, including browser add-ons and command-line utilities.

Here is an example of the request/response information for executing the Determine Gender data service. The input Names of "John Smith" and "Jane Smith" were sent to the service, and the Genders of "M" and "F" were returned. (A Mozilla Firefox add-on was used to generate this information.)

[Screenshot: request/response for executing the Determine Gender service]

The table below lists the possible HTTP status response codes that could be returned by a Data Management Server REST API request:

[Image: table of HTTP status response codes]

For more information, refer to the DataFlux Data Management Server 2.7: REST API Reference guide.

tags: data management, Data Management Server, REST API, SAS Professional Services

DataFlux Data Management Server 2.7: REST APIs was published on SAS Users.

Thanks to the proliferation of cloud services and REST-based APIs, SAS users have been making use of PROC HTTP calls (to query these web services) and some creative DATA step or PROC GROOVY code to process the JSON results. Such methods get the job done (JSON is simply text, after all), but they aren't as robust as an official JSON parser. JSON is simple: it's a series of name-value pairs that represent an object in JavaScript. But these pairs can be nested within one another, so in order to parse the result you need to know about the object structure. A parser helps with the process, but you still need to know the semantics of any JSON response.

SAS 9.4 introduced PROC JSON, which allows you to create JSON output from a data set. But it wasn't until SAS 9.4 Maintenance 3 that we got a built-in method to parse JSON content. This method was added as a DS2 package: the JSON package.

I created an example of the method at work -- using an API that powers our SAS Support Communities! The example queries communities.sas.com for the most recent posts to the SAS Programming category. Here's a small excerpt of the JSON response.

 "post_time": "2015-09-28T16:29:05+00:00",
  "views": {
  "count": 1
  },
  "subject": "Re: How to code for the consecutive values",
  "author": {
  "href": "/users/id/13884",
  "login": "ballardw"

Notice that some items, such as post_time, are simple one-level values. But other items, such as views or author, require a deeper dive to retrieve the value of interest ("count" for views, and "login" for author). The DS2 JSON parser can help you to navigate to those values without you needing to know how many braces or colons or commas are in your way.

Here is an example of the result: a series plot from PROC SGPLOT and a one-way frequency analysis from PROC FREQ. The program also produces a detailed listing of the messages, the topic content, and the datetime stamp.

[Image: series plot of recent message contributions]

[Image: one-way frequency table of boards]

This is my first real DS2 program, so I'm open to feedback. I already know of a couple of improvements I should make, but I want to share it now as I think it's good enough to help others who are looking to do something similar.

The program requires SAS 9.4 Maintenance 3. It also works fine in the most recent version of SAS University Edition (using SAS Studio 3.4). All of the code runs using just Base SAS procedures.

/* DS2 program that uses a REST-based API */
/* Uses http package for API calls       */
/* and the JSON package (new in 9.4m3)   */
/* to parse the result.                  */
proc ds2; 
  data messages (overwrite=yes);
    /* Global package references */
    dcl package json j();
 
    /* Keeping these variables for output */
    dcl double post_date having format datetime20.;
    dcl int views;
    dcl nvarchar(128) subject author board;
 
    /* these are temp variables */
    dcl varchar(65534) character set utf8 response;
    dcl int rc;
    drop response rc;
 
    method parseMessages();
      dcl int tokenType parseFlags;
      dcl nvarchar(128) token;
      rc=0;
      * iterate over all message entries;
      do while (rc=0);
        j.getNextToken( rc, token, tokenType, parseFlags);
 
        * subject line;
        if (token eq 'subject') then
          do;
            j.getNextToken( rc, token, tokenType, parseFlags);
            subject=token;
          end;
 
        * board URL, nested in an href label;
        if (token eq 'board') then
          do;
            do while (token ne 'href');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            j.getNextToken( rc, token, tokenType, parseFlags );
            board=token;
          end;
 
        * number of views (int), nested in a count label ;
        if (token eq 'views') then
          do;
            do while (token ne 'count');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            j.getNextToken( rc, token, tokenType, parseFlags );
            views=inputn(token,'5.');
          end;
 
        * date-time of message (input/convert to SAS date) ;
        * format from API: 2015-09-28T10:16:01+00:00 ;
        if (token eq 'post_time') then
          do;
            j.getNextToken( rc, token, tokenType, parseFlags );
            post_date=inputn(token,'anydtdtm26.');
          end;
 
        * user name of author, nested in a login label;
        if (token eq 'author') then
          do; 
            do while (token ne 'login');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            * get the author login (username) value;
            j.getNextToken( rc, token, tokenType, parseFlags );
            author=token;
            output;
          end;
      end;
      return;
    end;
 
    method init();
      dcl package http webQuery();
      dcl int rc tokenType parseFlags;
      dcl nvarchar(128) token;
 
      /* create a GET call to the API                                         */
      /* 'sas_programming' covers all SAS programming topics from communities */
      webQuery.createGetMethod(
         'http://communities.sas.com/kntur85557/' || 
         'restapi/vc/categories/id/sas_programming/posts/recent' ||
         '?restapi.response_format=json' ||
         '&restapi.response_style=-types,-null&page_size=100');
      /* execute the GET */
      webQuery.executeMethod();
      /* retrieve the response body as a string */
      webQuery.getResponseBodyAsString(response, rc);
      rc = j.createParser( response );
      do while (rc = 0);
        j.getNextToken( rc, token, tokenType, parseFlags);
        if (token = 'message') then
          parseMessages();
      end;
    end;
 
  method term();
    rc = j.destroyParser();
  end;
 
  enddata;
run;
quit;
 
/* Add some basic reporting */
proc freq data=messages noprint;
    format post_date datetime11.;
    table post_date / out=message_times;
run;
 
ods graphics / width=2000 height=600;
title '100 recent message contributions in SAS Programming';
title2 'Time in GMT';
proc sgplot data=message_times;
    series x=post_date y=count;
    xaxis minor label='Time created';
    yaxis label='Messages' grid;
run;
 
title 'Board frequency for recent 100 messages';
proc freq data=messages order=freq;
    table board;
run;
 
title 'Detailed listing of messages';
proc print data=messages;
run;
 
title;

I also shared this program on the SAS Support Communities as a discussion topic. If you want to contribute to the effort, please leave me a reply with your suggestions and improvements!

tags: DS2, JSON, REST API, SAS 9.4

The post Using SAS DS2 to parse JSON appeared first on The SAS Dummy.

Last year I shared this popular tip for counting how many times a web link has been shared on Twitter or Facebook. I use this technique daily to report on the social media "popularity" of our blog articles at SAS.

I wanted to add LinkedIn into the mix. Like Twitter and Facebook, LinkedIn also has a REST-based API that can be used within a SAS program. For example, if I want to know how many times my recent post "It's raining analytics" has been shared on LinkedIn, I can use PROC HTTP to hit this URL:

http://www.linkedin.com/countserv/count/share?url=http://blogs.sas.com/content/sasdummy/2013/01/15/its-raining-analytics/

Here is the JSON-formatted response (as of today):

IN.Tags.Share.handleCount ( { "count":89, "fCnt":"89", "fCntPlusOne":"90", "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2013\/01\/15\/its-raining-analytics\/" } );

Wow! That blog post has done pretty well on LinkedIn (with "count"=89) - it's my most-shared post this year.

Here's a SAS program that checks the LinkedIn shares for me:

%let url=http://blogs.sas.com/content/sasdummy/2013/01/15/its-raining-analytics/;

/* temp holding area for LinkedIn response */
filename li temp;

/* call the LinkedIn API */
proc http
  url="http://www.linkedin.com/countserv/count/share?url=&url."
  method='GET'
  /* proxyhost= if behind corp firewall */
  out=li;
run;

/* use RegEx to gather the "count":n  value */
data liresult (keep=url lishares);
  length line $ 1000 lishares 8;
  length url $ 300;
  url = "&url.";
  infile li truncover;
  input line $char1000.;

  if _n_ = 1 then
    do;
      retain li_regex;
      li_regex = prxparse("/\""count\""\:([0-9]*)/");
    end;

  position = prxmatch(li_regex,line);

  if (position ^= 0) then
    do;
      call prxposn(li_regex, 1, start, length);
      lishares = substr(line,start,length);
    end;
run;

/* clear our temp response file */
filename li clear;

Result: [screenshot of the resulting LIRESULT data set]

That's a lot of code to retrieve the answer for just one link. Thanks to the power of the SAS macro language, I can scale this to retrieve the values for an entire collection of links. With those results in hand, I can run other stats:

Of my 63 posts in the past 12 months, my links have been shared to LinkedIn an average of 4.58 times, with a total of 289 shares overall.

I'm not so naive that I consider these to be impressive numbers, but I've only just begun the habit of sharing my posts on LinkedIn. With this process as part of my daily blog reporting, I can now measure how my "LinkedIn engagement" improves as I share more content. Collect data, count, measure, report -- that's what it's all about, right?

Note: Many web articles, such as blog posts, can have multiple URLs. For example, the WordPress platform offers "short-link" GUID URLs as well as the longer, more descriptive URLs. While all of these different URLs might lead to the same page, LinkedIn counts only the URL you share. So if you are in the habit of publicizing different URLs for convenience or other tracking purposes, you might need to check each permutation of a page URL with this program to get the complete "LinkedIn shares" picture.

Reference example

Complete macro version of my LinkedIn shares program (lishares_example.sas)

tags: LinkedIn, PROC HTTP, REST API

As of this writing, I've posted 34 blog posts during 2012 (not counting this one), which have been shared on Twitter a total of 368 times and on Facebook a total of 92 times. Those aren't exactly "viral" numbers, but I try not to get too hung up on it.

How do I know these numbers? If you are reading this blog post on our blogs.sas.com site, then you can see the "social share" counts for each individual post. For example, see the header from this recent post:
[Image: social-share counts in a blog post header]

As it would be a tedious task to view each post and tally up the counts, I wrote a SAS program to do that work for me.

The little social count widgets that you see on our blog site are powered by "count" APIs that are surfaced by Twitter and Facebook servers. Each of these APIs is a little bit different, but they work basically in the same way. You create a request in the form of a URL, which contains the address of the API service plus the web page URL that you want to know the "count" of. You use HTTP to issue the request, and the response comes back in the form of a JSON string. Unlike some web service APIs, these API requests do not require authentication or application permissions, so they are easy to use with basic HTTP.

For example, the following URL request returns the number of times that this blog post has been tweeted (click on the link to see the response, and tweet this blog post to watch it go up!):

http://urls.api.twitter.com/1/urls/count.json?url=http://blogs.sas.com/content/sasdummy/2012/07/06/social-media-counts-in-sas/

The exact count in the response will change over time, but here is an example of an outrageously optimistic result:

{ "count":4032,
  "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2012\/07\/06\/social-media-counts-in-sas\/"
}

JSON has become a ubiquitous data exchange format among web service APIs. The SAS language doesn't yet support native parsing of JSON, but with a little bit of programming you can easily parse simple JSON to get the information that you need.

The SAS program that I wrote uses these techniques to gather the Twitter and Facebook counts:

  • Use a DATA step to create a list of URLs (web pages, or links). In my example I included the URLs for the blog topics I've posted thus far in 2012.
  • Use a SAS macro to loop through each URL record and create a data set with a single record.
  • In the macro loop for each URL, use FILENAME URL to construct filerefs that will connect to Twitter and Facebook using well-formed API requests.
  • In the macro loop for each URL, use a DATA step to "input" the file (producing the API response in JSON form) and record it into a data set.
  • Outside of the macro loop, concatenate all responses into a data set (one for Twitter responses and one for Facebook responses).
  • Use Perl regular expression functions to parse the responses into just the numbers I'm looking for.
  • And finally, join all of the social counts together to create a single data set with the URLs and their current tweet/share counts.

These count values are a snapshot of the current number of shares, so I also "stamp" each record with today's date in case I want to build a historical set of these counts and see the trends over time.

This program contains a few inefficiencies. It passes through the "links" source data many times (once for each link) and creates lots of tiny transient data sets along the way. However, the biggest impact on the program performance is the time it takes for the Twitter and Facebook APIs to respond. In my experience, the Twitter API is usually very fast. The Facebook API is sometimes fast, but sometimes a call can be blocked for several seconds (or longer). Because of their scale, both services are sensitive to overuse or abuse, and may "throttle" or deny requests from IP addresses that use them irresponsibly.

When you see the numbers, you might wonder: are they accurate? Can you rely on these results as the absolute counts of how your web content has been shared/liked via social media? Personally, I believe the answer is No. However, I think that the values are useful for relative comparisons. Without focusing on the absolute numbers, I can still learn which of my posts gathered more or less engagement on Twitter, Facebook, or both.

You can download the complete program (with sample links data) from here:
GetSocialStats_example.sas

I've included the program logic (not the sample data) in the remainder of this post. I added a few comments to the code to help convey what's going on.

/* Get count of URLs in source data                */
/* Here it assumes WORK.LINKS                      */
/* and the page link to check is in the URL column */
proc sql noprint;
  select count(URL) into :urlCount from WORK.LINKS;
quit;

/* trim this for later use in constructing data set names*/
%let urlCount = %trim(&urlCount);

%macro gatherSocialStats;
  /* This loop reads each URL link from the data set */
  /* one at a time, and then uses FILENAME URL       */
  /* to invoke the Twitter and Facebook APIs for     */
  /* Tweet counts and Share counts                   */
  %do i = 1 %to &urlCount;

    /* read the next URL */
    proc sql noprint;
      select trim(url) into :url from work.links(firstobs=&i obs=&i);
    quit;

    %put Checking %str(&url.);

    /* format of the Twitter API call */
    filename twurl url "http://urls.api.twitter.com/1/urls/count.json?url=&url." 
      /* supply PROXY= option if behind a corporate firewall/gateway */
    ;

    /* format of the Facebook API call */
    filename fburl url "http://graph.facebook.com/?ids=&url." 
      /* supply PROXY= option if behind a corporate firewall/gateway */
    ;

    data twresult&i;
      /* Running in EG? SYSECHO helps convey progress in status bar */
      SYSECHO "Checking Tweet URL &i. of &urlCount.";
      length line $ 500 url $ 500;
      infile twurl truncover;
      url = "&url.";
      input line $char500.;
    run;

    data fbresult&i;
      /* Running in EG? SYSECHO helps convey progress in status bar */
      SYSECHO "Checking FB URL &i. of &urlCount.";
      length line $ 500 url $ 500;
      infile fburl truncover;
      input line $char500.;
      url = "&url.";
    run;

    /* Free up the filerefs */
    filename twurl clear;
    filename fburl clear;
  %end;

  /* concatenate all Twitter counts */
  data twresults;
    set twresult1-twresult&urlCount.;
  run;

  /* concatenate all Facebook share counts */
  data fbresults;
    set fbresult1-fbresult&urlCount.;
  run;

  /* clean up all of the tiny data sets */
  proc datasets lib=work nolist;
    delete twresult1-twresult&urlCount.;
    delete fbresult1-fbresult&urlCount.;
  quit;

%mend;

%gatherSocialStats;

/* Post processing to PARSE the results, which */
/* are in JSON format */

/* Example result:
{
  "http://blogs.sas.com/content/sasdummy/2012/01/25/export-to-excel-got-easier/": 
  {
     "id": "http://blogs.sas.com/content/sasdummy/2012/01/25/export-to-excel-got-easier/",
     "shares": 3
  }
}
*/
data work.fbcounts (keep=fbshares url);
  set fbresults;
  length fbshares 8;
  if _n_ = 1 then
    do;
      retain fb_regex;
      /* looking for "shares":<a number> */
      fb_regex = prxparse("/\""shares\""\:([0-9]*)/");
    end;
  position = prxmatch(fb_regex,line);
  if (position ^= 0) then
    do;
      call prxposn(fb_regex, 1, start, length);
      fbshares = substr(line,start,length);
    end;
run;

/* Post processing to PARSE the results, which */
/* are in JSON format */

/* Example result:
  {
  "count":8,
  "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2012\/01\/27\/achieving-mediocrity\/"
  }
*/
data work.twcounts (keep=tweets url);
  set twresults;
  length tweets 8;
  if _n_ = 1 then
    do;
      retain tw_regex;
      /* looking for "count":<a number> */
      tw_regex = prxparse("/\""count\""\:([0-9]*)/");
    end;
  position = prxmatch(tw_regex,line);
  if (position ^= 0) then
    do;
      call prxposn(tw_regex, 1, start, length);
      tweets = substr(line,start,length);
    end;
run;

/* Join it all together with the URL, tweet counts, */
/* and share counts */
proc sql;
  create table work.socialcounts as 
    select t1.url, 
      ifn(t2.fbshares eq .,0,t2.fbshares) as fbshares, 
      t3.tweets,
      /* date-stamp the value for use in history/comparisons */
      today() as as_of_date format=date9.
  from work.links t1 
    left join work.fbcounts t2 on (t1.url = t2.url) 
    left join work.twcounts t3 on (t1.url = t3.url)
  ;
quit;
tags: facebook, FILENAME URL, JSON, REST API, twitter