REST API

10月 122017
 

With SAS Data Management, you can setup SAS Data Remediation to manage and correct data issues. SAS Data Remediation allows user- or role-based access to data exceptions.

When a data issue is discovered it can be sent automatically or manually to a remediation queue where it can be corrected by designated users.

Let’s look how to setup a remediation service and how to send issue records to Data Remediation.

Register the remediation service.

To register a remediation service in SAS Data Remediation we go to Data Remediation Administrator “Add New Client Application.

Under Properties we supply an ID, which can be the name of the remediation service as long as it is unique, and a Display name, which is the name showing in the Remediation UI.

Under the tab Subject Area, we can register different subject categories for this remediation service.  When calling the remediation service we can categorize different remediation issues by setting different subject areas. We can, for example, use the Subject Area to point to different Data Quality Dimensions like Completeness, Uniqueness, Validity, Accuracy, Consistency.

Under the tab Issues Types, we can register issue categories. This enables us to categorize the different remediation issues. For example, we can point to the affected part of record like Name, Address, Phone Number.

At Task Templates/Select Templates we can set a workflow to be used for each issue type. You can design your own workflow using SAS Workflow Studio or you can use a prepared workflow that comes with Data Remediation. You need to make sure that the desired workflow is loaded on to Workflow Server to link it to the Data Remediation Service. Workflows are not mandatory in SAS Data Remediation but will improve efficiency of the remediation process.

Saving the remediation service will make it available to be called.

Sending issues to Data Remediation.

When you process data, and have identified issues that you want to send to Data Remediation, you can either call Data Remediation from the job immediately where you process the data or you store the issue records in a table first and then, in a second step, create remediation records via a Data Management job.

To send records to Data Remediation you can call remediation REST API form the HTTP Request node in a Data Management job.

Remediation REST API

The REST API expects a JSON structure supplying all required information:

{
	"application": "mandatory",
	"subjectArea": "mandatory",
	"name": "mandatory",
	"description": "",
	"userDefinedFieldLabels": {
		"1": "",
		"2": "",
		"3": ""
	},
	"topics": [{
		"url": "",
		"name": "",
		"userDefinedFields": {
			"1": "",
			"2": "",
			"3": ""
		},
		"key": "",
		"issues": [{
			"name": "mandatory",
			"importance": "",
			"note": "",
			"assignee": {
				"name": ""
			},
			"workflowName": "",
			"dueDate": "",
			"status": ""
		}]
	}]
}

 

JSON structure description:

In a Data Management job, you can create the JSON structure in an Expression node and use field substitution to pass in the necessary values from the issue records. The expression code could look like this:

REM_APPLICATION= "Customer Record"
REM_SUBJECT_AREA= "Completeness"
REM_PACKAGE_NAME= "Data Correction"
REM_PACKAGE_DESCRIPTION= "Mon-Result: " &formatdate(today(),"DD MM YY") 
REM_URL= "http://myserver/Sourcesys/#ID=" &record_id
REM_ITEM_NAME= "Mobile phone number missing"
REM_FIELDLABEL_1= "Source System"
REM_FIELD_1= "CRM"
REM_FIELDLABEL_2= "Redord ID"
REM_FIELD_2= record_id
REM_FIELDLABEL_3= "-"
REM_FIELD_3= ""
REM_KEY= record_id
REM_ISSUE_NAME= "Phone Number"
REM_IMPORTANCE= "high"
REM_ISSUE_NOTE= "Violated data quality rule phone: 4711"
REM_ASSIGNEE= "Ben"
REM_WORKFLOW= "Customer Tag"
REM_DUE-DATE= "2018-11-01"
REM_STATUS= "open"
 
JSON_REQUEST= '
{
  "application":"' &REM_APPLICATION &'",
  "subjectArea":"' &REM_SUBJECT_AREA &'",
  "name":"' &REM_PACKAGE_NAME &'",
  "description":"' &REM_PACKAGE_DESCRIPTION &'",
  "userDefinedFieldLabels": {
    "1":"' &REM_FIELDLABEL_1 &'",
    "2":"' &REM_FIELDLABEL_2 &'",
    "3":"' &REM_FIELDLABEL_3 &'"
  },
  "topics": [{
    "url":"' &REM_URL &'",
    "name":"' &REM_ITEM_NAME &'",
    "userDefinedFields": {
      "1":"' &REM_FIELD_1 &'",
      "2":"' &REM_FIELD_2 &'",
      "3":"' &REM_FIELD_3 &'"
    },
    "key":"' &REM_KEY &'",
    "issues": [{
      "name":"' &REM_ISSUE_NAME &'",
      "importance":"' &REM_IMPORTANCE &'",
      "note":"' &REM_ISSUE_NOTE &'",
      "assignee": {
        "name":"' &REM_ASSIGNEE &'"
      },
      "workflowName":"' &REM_WORKFLOW &'",
      "dueDate":"' &REM_DUE_DATE &'",
      "status":"' &REM_STATUS &'"
    }]
  }]
}'

 

Tip: You could also write a global function to generate the JSON structure.

After creating the JSON structure, you can invoke the web service to create remediation records. In the HTTP Request node, you call the web service as follows:

Address:  http://[server]:[port]/SASDataRemediation/rest/groups
Method: post
Input Filed: The variable containing the JSON structure. I.e. JSON_REQUEST
Output Filed: A field to take the output from the web service. You can use the New button create a filed and set the size to 1000
Under Security… you can set a defined user and password to access Data Remediation.
In the HTTP Request node’s advanced settings set the WSCP_HTTP_CONTENT_TYPE options to application/json

 

 

 

You can now execute the Data Management job to create the remediation records in SAS Data Remediation.

Improving data quality through SAS Data Remediation was published on SAS Users.

4月 152017
 

Every day before I even wake up, I have little "SAS robots" that do work for me. These are SAS batch jobs that gather data from external services and build data marts, generate reports, and send e-mail. One of those SAS jobs gathers Google Analytics data about our SAS blogs at blogs.sas.com.

With SAS 9.4 Maintenance 4, it's finally easy (well, relatively speaking) to connect to complicated APIs like those supported by Google, and to gather information with an unattended batch process. It's made possible by recent enhancements in PROC HTTP and the new JSON library engine in SAS. The PROC HTTP enhancements make it easier to negotiate multi-step authentication schemes like OAuth2. And of course, the JSON engine makes it easier to parse JSON results into SAS data sets. If you scour the Internet and SAS conference papers, you might find examples that use PROC GROOVY or other tricks to call outside of SAS to drive these OAuth2 APIs and parse JSON. The recent enhancements make such tricks unnecessary, and thus provide a cleaner approach with fewer moving parts.

Joseph Henry, a senior software developer at SAS (and primary developer for PROC HTTP), wrote a useful SAS Global Forum paper about PROC HTTP and OAuth2. Joseph definitely understands this better than I do, so I recommend that you read his paper first. Joseph's example uses Google Drive. Come back to this post when you want to see the instructions I've hacked together -- it's what I've got working for Google Analytics.

How to use the Google APIs

There are four main steps to use the Google APIs. Two of these steps need to be performed just once, and don't need to be automated. The other two steps are necessary each time you access a Google API from a program, and they can be scripted. Here are the steps:

  1. Grant permission and obtain an authorization token for your "app" (a SAS program, in our case). This must be performed in a browser while logged into your Google account.
  2. Obtain an access token and refresh token. You can accomplish this with a SAS program that you run just once. You then save the refresh token (in a safe place!) for subsequent runs.
  3. Convert your saved refresh token into an access token. Do this at the start of every SAS job that needs to get Google Analytics data.
  4. And finally, use the Google Analytics API to get actual data!

There is a lot to learn about how Google APIs work and how to provision your Google account with access to resources like Google Analytics. That's more than I can go into here, so I'm going to assume a few things and skip ahead. I'm going to assume that you already have an API project created, and that you have a "client ID" and "client secret". The Google API documentation is very good on this topic, and there are many Internet resources to help you learn this part of the process.

Also, some Google APIs are free to use, while others cost money. And some are rate-limited -- that is, you can call the APIs only so many times within a given period. Your specific limits might differ depending on the type of accounts you have with Google. If you have a corporate relationship with Google, you might have an admin who has to provision your specific Google account to get access to the APIs. I'm glossing over all of these nuances -- this post is simply about the code.

Step 1. Grant permission and retrieve auth code

Do this step only once per account, per API permission. You perform this step in the browser while logged into your Google account. Enter the following URL, substituting your client-id as indicated. The URL needs to be all on one line, but I've broken it up here for readability.

 https://accounts.google.com/o/oauth2/v2/auth?
         scope=https://www.googleapis.com/auth/analytics.readonly
         &redirect_uri=urn:ietf:wg:oauth:2.0:oob
         &response_type=code
         &client_id=<your-client-id>.apps.googleusercontent.com 

You will be prompted to allow the "app" access to your Google Analytics data (read only). This is similar to allowing another app to access your Facebook or Twitter profile -- you've probably seen similar prompts during your own Internet citizenship. Permissions are an important component of allowing apps to act on your behalf via REST APIs.

Then, you'll be redirected to a web page with an auth code that you should copy and save. We'll use it in the next step. Setting the redirect_uri properly is very important: redirect_uri=urn:ietf:wg:oauth:2.0:oob. Otherwise the API won't generate a code that you can use in a tool-based app like SAS.

Step 2. Exchange the auth code for an access token

Next, run this PROC HTTP step in SAS with the POST method to exchange that auth code from Step 1 for an access token. It will return a JSON response with a valid Bearer access token. That token expires in 3600 seconds (1 hour). It also returns a refresh_token, which you can exchange again for a new access token after the first one expires. The refresh_token never expires (though it can be revoked via the developer console or API). Thus, you usually need to perform this step just once, unless your token is revoked for some reason.

/* file to store your result */
filename token "c:\temp\token.json";
%let code_given =<code-returned-from-step-1> ;
%let oauth2=https://www.googleapis.com/oauth2/v4/token;
%let client_id=<your-client-id>.apps.googleusercontent.com;
%let client_secret=<your-client-secret>;
proc http
/* put this all on one line! */
 url="&oauth2.?client_id=&client_id.%str(&)code=&code_given.
      %str(&)client_secret=&client_secret.%str(&)redirect_uri=urn:ietf:wg:oauth:2.0:oob
      %str(&)grant_type=authorization_code%str(&)response_type=token"
 method="POST"
 out=token
;
run;

Note: the refresh_token and client-id/secret values should be protected! Anyone who has access to these can get to your Google API data as if they were you. Consider storing them in a file that only you have read access to, and programmatically pull them in when running your SAS program under your host account.

Step 3. Exchange the refresh_token for a valid access token

Typically, you'll include this step just once at the beginning of your SAS job. This takes your saved refresh_token value and asks Google to grant an access token for use in the rest of your program. The Google APIs run very fast -- you should not need to renew the token again within the same job.

/* STEP 3. Do this every time you want to use the GA API */
/* Turn in a refresh-token for a valid access-token      */
/* Should be good for 60 minutes                         */
/* So typically run once at beginning of the job.        */
%let oauth2=https://www.googleapis.com/oauth2/v4/token;
%let client_id=<your-client-id>.apps.googleusercontent.com;
%let client_secret=<your-client-secret>;
%let refresh_token=<refresh-token-from-step-2>;
 
filename rtoken temp;
proc http
 method="POST"
 /* Again, put this all on one line */
 /* broken here for readability */
 url="&oauth2.?client_id=&client_id.
     %str(&)client_secret=&client_secret.
     %str(&)grant_type=refresh_token%str(&)refresh_token=&refresh_token."
 out=rtoken;
run;
 
/* Read the access token out of the refresh response  */
/* Relies on the JSON libname engine (9.4m4 or later) */
libname rtok json fileref=rtoken;
data _null_;
 set rtok.root;
 call symputx('access_token',access_token);
run;

Step 4. Use the Google Analytics API to gather data

Finally, we're to the point where we can retrieve data from this service! And this is where those new features in PROC HTTP come into play. In SAS 9.4m3, Joseph added support for an inline HEADERS statement, which is perfect for injecting OAuth2 information like "Bearer" token into the HTTP header.

proc http
  url="<REST-api-url-call>"
  method="GET" out=ga_resp;
  /* Headers statement makes this easy */
  headers 
    "Authorization"="Bearer &access_token."
    "client-id:"="&client_id.";
 run;

I've noticed that a lot of REST API docs use cUrl (command-line URL) as in their examples. In my experience, PROC HTTP can do just about anything that cUrl can do, and these new options make it even easier to make cUrl-based examples work in SAS.

The Google Analytics API is complex on its own, so I suggest that you spend some time in the Google API Explorer to see what types of requests yield what sort of results. Google Analytics allows you to fetch metrics across different dimensions and dates, so the results can come back summarized across the dimensions that you request. Since we're bringing this data into SAS and I probably want to further summarize in my reports, I try to get as much "unsummarized" data as possible, and I avoid aggregations such as "Average views" or "Average time on page" -- as I don't want to risk calculating new summaries based on these.

I've included my program for retrieving the page views, total time on page, entrances and exits at the daily level. This is granular enough for me to create meaningful reports all of our blog authors. The program is a SAS macro that issues one API call per day in the range specified.

/* Metrics and dimensions are defined in the Google Analytics doc */
/* Experiment in the developer console for the right mix          */
/* Your scenario might be different and would require a different */
/* type of query                                                  */
/* The GA API will "number" the return elements as                */
/* element1, element2, element3 and so on                         */
/* In my example, path and title will be 1 and 2 */
%let dimensions=  %sysfunc(urlencode(%str(ga:pagePath,ga:pageTitle)));
/* then pageviews, uniquepageviews, timeonpage will be 3, 4, 5, etc. */
%let metrics=     %sysfunc(urlencode(%str(ga:pageviews,ga:uniquePageviews,ga:timeOnPage,ga:entrances,ga:exits)));
/* this ID is the "View ID" for the GA data you want to access   */
%let id=          %sysfunc(urlencode(%str(ga:<your-view-ID>)));
 
%macro getGAdata;
%do workdate = &enddate %to &startdate %by -1;
	%let urldate=%sysfunc(putn(&workdate.,yymmdd10.));
	filename ga_resp temp;
	proc http
         /* again, put this url= value ALL on a single line! */
	 url="https://www.googleapis.com/analytics/v3/data/ga
            ?ids=&id.%str(&)start-date=&urldate.
            %str(&)end-date=&urldate.
            %str(&)metrics=&metrics.
            %str(&)dimensions=&dimensions.%str(&)max-results=20000"
	 method="GET" out=ga_resp;
	 headers 
	   "Authorization"="Bearer &access_token."
	   "client-id:"="&client_id.";
	run;
 
        /* Love this JSON libname engine! */
	libname garesp json fileref=ga_resp;
 
	data ga.ga_daily%sysfunc(compress(&urldate.,'-')) (drop=element:);
		set garesp.rows;
		drop ordinal_root ordinal_rows;
		length date 8 url $ 300 title $ 250 
	          views 8 unique_views 8 time_on_page 8 entrances 8 exits 8
	          ;
		format date yymmdd10.;
		date=&workdate.;
		/* Corerce the elements into data variables */
		/* Basic on expected sequence               */
		url = element1;
		title = element2;
		views = input(element3, 5.);
		unique_views = input(element4, 6.);
		time_on_page=input(element5, 7.2);
		entrances = input(element6, 6.);
		exits = input(element7, 6.);
	run;
%end;
%mend;
 
%getGAdata;
 
/* Assemble the daily files into one data set */
data alldays_gadata;
  set ga.ga_daily:;
run;

When it's all finally working, SAS pulls in data that looks like this:
Google Analytics data via API

Re-entry and disclaimers

This has been a pretty deep post about Google APIs and how to make them work from SAS programs. If you have experience with the Google APIs already, then I'm optimistic that this article is enough for you to get things working in SAS. If you've never used OAuth2 or any of the Google APIs, then this post might not be quite enough. Don't forget to read Joseph's paper about OAuth2 and PROC HTTP. You might also like his paper about using RESTful APIs.

Please leave a comment if you have questions or if you think that I left something out. I'll do my best to correct and clarify.

The post Using SAS to access Google Analytics APIs appeared first on The SAS Dummy.

3月 082017
 

Digitalisation is blasting the cobwebs out from enterprises and organisations of all kinds – freeing them to innovate and take advantage of the always-on economy. But it’s also helping new disruptive players to gain an unexpectedly strong foothold in many markets. One of the key advantages these new players have [...]

Is governance getting in the way of innovation? was published on SAS Voices by Peter Pugh-Jones

12月 032016
 

JSON is the new XML. The number of SAS users who need to access JSON data has skyrocketed, thanks mainly to the proliferation of REST-based APIs and web services. Because JSON is structured data in text format, we've been able to offer simple parsing techniques that use DATA step and most recently PROC DS2. But finally*, with SAS 9.4 Maintenance 4, we have a built-in LIBNAME engine for JSON.

Simple JSON example: Who is in space right now?

Speaking of skyrocketing, I discovered a cool web service that reports who is in space right now (at least on the International Space Station). It's actually a perfect example of a REST API, because it does just that one thing and it's easily integrated into any process, including SAS. It returns a simple stream of data that can be easily mapped into a tabular structure. Here's my example code and results, which I produced with SAS 9.4 Maintenance 4.

filename resp temp;
 
/* Neat service from Open Notify project */
proc http 
 url="http://api.open-notify.org/astros.json"
 method= "GET"
 out=resp;
run;
 
/* Assign a JSON library to the HTTP response */
libname space JSON fileref=resp;
 
/* Print result, dropping automatic ordinal metadata */
title "Who is in space right now? (as of &sysdate)";
proc print data=space.people (drop=ordinal:);
run;

JSON who is in space
But what if your JSON data isn't so simple? JSON can represent information in nested structures that can be many layers deep. These cases require some additional mapping to transform the JSON representation to a rectangular data table that we can use for reporting and analytics.

JSON map example: Most recent topics from SAS Support Communities

In a previous post I shared a PROC DS2 program that uses the DS2 JSON package to call and parse our SAS Support Communities API. The parsing process is robust, but it requires quite a bit of fore knowledge about the structure and fields within the JSON payload. It also requires many lines of code to extract each field that I want.

Here's a revised pass that uses the JSON engine:

/* split URL for readability */
%let url1=http://communities.sas.com/kntur85557/restapi/vc/categories/id/bi/topics/recent;
%let url2=?restapi.response_format=json%str(&)restapi.response_style=-types,-null,view;
%let url3=%str(&)page_size=100;
%let fullurl=&url1.&url2.&url3;
 
filename topics temp;
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
/* Let the JSON engine do its thing */
libname posts JSON fileref=topics;
title "Automap of JSON data";
 
/* examine resulting tables/structure */
proc datasets lib=posts; quit;
proc print data=posts.alldata(obs=20); run;

Thanks to the many layers of data in the JSON response, here are the tables that SAS creates automatically.

json Auto tables
There are 12 tables that contain various components of the message data that I want, plus the ALLDATA member that contains everything in one linear table. ALLDATA is good for examining structure, but not for analysis. You can see that it's basically name-value pairs with no data types/formats assigned.

json ALLDATA
I could use DATA steps or PROC SQL to merge the various tables into a single denormalized table for my reporting purposes, but there is a better way: define and apply a JSON map for the libname engine to use.

To get started, I need to rerun my JSON libname assignment with the AUTOMAP option. This creates an external file with the JSON-formatted mapping that SAS generates automatically. In my example here, the file lands in the WORK directory with the name "top.map".

filename jmap "%sysfunc(GETOPTION(WORK))/top.map";
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts JSON fileref=topics map=jmap automap=create;

This generated map is quite long -- over 400 lines of JSON metadata. Here's a snippet of the file that describes a few fields in just one of the generated tables.

"DSNAME": "messages_message",
"TABLEPATH": "/root/response/messages/message",
"VARIABLES": [
{
  "NAME": "ordinal_messages",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages"
},
{
  "NAME": "ordinal_message",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages/message"
},
{
  "NAME": "href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/href",
  "CURRENT_LENGTH": 19
},
{
  "NAME": "view_href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/view_href",
  "CURRENT_LENGTH": 134
},

By using this map as a starting point, I can create a new map file -- one that is simpler, much smaller, and defines just the fields that I want. I can reference each field by its "path" in the JSON nested structure, and I can also specify the types and formats that I want in the final data.

In my new map, I eliminated many of the tables and fields and ended up with a file that was just about 60 lines long. I also applied sensible variable names, and I even specified SAS formats and informats to transform some columns during the import process. For example, instead of reading the message "datetime" field as a character string, I coerced the value into a numeric variable with a DATETIME format:

{
  "NAME": "datetime",
   "TYPE": "NUMERIC",
  "INFORMAT": [ "IS8601DT", 19, 0 ],
  "FORMAT": ["DATETIME", 20],
  "PATH": "/root/response/messages/message/post_time/_",
  "CURRENT_LENGTH": 8
},

I called my new map file 'minimap.map' and then re-issued the libname without the AUTOMAP option:

filename minmap 'c:tempminmap.map';
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts json fileref=topics map=minmap;
proc datasets lib=posts; quit;
 
data messages;
 set posts.messages;
run;

Here's a snapshot of the single data set as a result.

JSON final data
I think you'll agree that this result is much more usable than what my first pass produced. And the amount of code is much smaller and easier to maintain than any previous SAS-based process for reading JSON.

Here's the complete program in public GitHub gist, including my custom JSON map.


* By the way, tags: JSON, REST API, SAS programming

The post Reading data with the SAS JSON libname engine appeared first on The SAS Dummy.

11月 022016
 

With DataFlux Data Management 2.7, the major component of SAS Data Quality and other SAS Data Management solutions, every job has a REST API automatically created once moved to the Data Management Server. This is a great feature and enables us to easily call Data Management jobs from programming languages like Python. We can then involve the Quality Knowledge Base (QKB), a  pre-built set of data quality rules, and do other Data Quality work that is impossible or challenging to do when using only Python.

calling-sas-data-quality-jobs-from-pythonIn order to make a RESTful call from Python we need to first get the REST API information for our Data Management job. The best way to get this information is to go to Data Management Server in your browser where you’ll find respective links for:

  • Batch Jobs
  • Real-Time Data Jobs
  • Real-Time Process Jobs.

From here you can drill through to your job REST API.

Alternatively, you can use a “shortcut” to get the information by calling the job’s REST API metadata URL directly. The URL looks like this:

http://<DM Server>:<port>/<job type>/rest/jobFlowDefns/<job id>/metadata

calling-sas-data-quality-jobs-from-python02

The <job id> is simply the job name (with subdirectory and extension) Base64 encoded. This is a common method to avoid issues with illegal URL characters like: # % & * { } : < > ? / + or space. You can go to this website to Base64 encode your job name.

If you have many jobs on the Data Management Server it might be quicker to use the “shortcut” instead of drilling through from the top.

Here an example to get the REST API information for the Data Management job “ParseAddress.ddf” which is in the subdirectory Demo of Real-Time Data Services on DM Server:

calling-sas-data-quality-jobs-from-python03

We Base64 encode the job name “Demo/ParseAddress.ddf” using the website mentioned above…

calling-sas-data-quality-jobs-from-python04

…and call the URL for the job’s REST API metadata:

http://DMServer:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/ RGVtby9QYXJzZUFkZHJlc3MuZGRm/metadata

calling-sas-data-quality-jobs-from-python05

From here we collect the following information:

The REST API URL and Content-Type information…
calling-sas-data-quality-jobs-from-python06

…the JSON structure for input data

calling-sas-data-quality-jobs-from-python07

…which we need in this format when calling the Data Management job from Python:

{"inputs" : {"dataTable" : {"data" : [[ "sample string" ],[ "another string" ]], "metadata" : [{"maxChars" : 255, "name" : "Address", "type" : "string"}]}}}

…and the JSON structure for the data returned by the Data Management job.

calling-sas-data-quality-jobs-from-python08

When you have this information, the Python code to call the Data Management job would look like this:

calling-sas-data-quality-jobs-from-python09

The output data from the Data Management job will be in data_raw. We call the built-in JSON decoder from the “request” module to move the output into a dictionary (data_out) from where we can access the data. The structure of the dictionary is according to the REST metadata. We can access the relevant output data via data_out[‘outputs’][‘dataTable’][‘data’]

calling-sas-data-quality-jobs-from-python10

calling-sas-data-quality-jobs-from-python11

The Python program will produce an output like this…

calling-sas-data-quality-jobs-from-python12

You can find more information about the DataFlux Data Management REST API here.

Calling Data Management jobs from Python is straight forward and is a convenient way to augment your Python code with the more robust set of Data Quality rules and capabilities found in the SAS Data Quality solution.

Learn more about SAS Data Quality.

tags: data management, DataFlux Data Management Studio, open source, REST API

Calling SAS Data Quality jobs from Python was published on SAS Users.

10月 202016
 

The study of social networks has gained importance over the years within social and behavioral research on HIV and AIDS. Social network research can show routes of potential viral transfer, and be used to understand the influence of peer norms and practices on the risk behaviors of individuals. This example analyzes the […]

Analyzing social networks using Python and SAS Viya was published on SAS Voices.

7月 232016
 

Slack is a tremendously popular app for team collaboration. At its core, it's an open messaging app that allows team members to communicate with each other in real time. It works well with "startup"-style teamwork, which requires constant communication among people who aren't always located together. I've heard a lot about Slack from colleagues and from podcasts (Slack is a frequent advertiser on tech shows), but I'd never tried it myself until recently.

I was most curious about their APIs, which allow you to integrate other applications into the Slack messaging platform. Months ago, a colleague had asked me whether it was possible to integrate SAS output with Slack. I suspected that we could do something with PROC HTTP, but didn't get around to trying until today. I can now answer for certain: Yes! It's possible...and it's sort of fun!

Get started with a Slack "webhook"

I won't go into how you get started with Slack, except to say that it's really easy (and free) to create an account and a "channel" (your message space) for a small team. Assuming that you have that going, I'm going to show you how to use SAS to publish to your channel.

Slack supports a feature called "Incoming Webhooks," which is basically a simple endpoint URL that you can send messages to. From your Slack channel, you can select to Add Configuration, which takes you to the option to add a new Incoming Webhook:

slackwebhook
Click the button and Slack will provision a cryptic URL that's your unique endpoint. Any process that uses HTTP calls to POST content to that URL can publish content. The content itself (called the payload) is delivered in JSON format to the API.

Example 1: Simple in-line message with PROC HTTP

The Slack documentation supplies examples that use curl (command-line URL tool), but PROC HTTP is the SAS equivalent. Here's a simple example:

%let webhookUrl = https://hooks.slack.com/services/<your Webhook URL>;
 
/*
  Simple example with direct text 
*/
 
filename resp temp;
proc http
 url="&webhookUrl"
 method="POST"
 /* IN= supports text in SAS 9.4m3.  Earlier release? Use fileref with content */
 in='payload={"channel": "#fromsas", 
     "username": "sasprogram", 
     "text": "Created using PROC HTTP!"}'
 out=resp
 ;
run;

To try this, you'll have to first get your own Webhook URL and plug it into the program. I'd loan you mine, but you're not on my channel so you can't check the results...which look like this:

Slack simple

Example 2: Share rich messages with PROC HTTP and JSON

Slack also allows multipart messages with simple formatting, including some colors, custom icons, and working links. This requires a little bit more JSON content in the payload, including an array of attachments. Here's a more complex example:

/*
  More complex messages, with multiple parts.
  Use the attachments fields that Slack supports
*/
filename rich temp;
 
data _null_;
 file rich;
 infile datalines4;
 input;
 put _infile_;
datalines4;
payload=
   {   
    "channel": "#fromsas", 
    "username": "sasprogram",
    "icon_emoji": ":fax:",
   "attachments":[
      {
  "fallback":
   "New SAS Dummy post!: <http://blogs.sas.com/content/sasdummy|The SAS Dummy blog>",
 "pretext":
   "New SAS Dummy post!: <http://blogs.sas.com/content/sasdummy|The SAS Dummy blog>",
         "color":"#3030F0",
         "fields":[
            {
               "title":"Great news!",
               "value":"That Chris...he's done it again!",
               "short":false
            }
         ]
      }
   ]
  }	
;;;;
 
proc http
 url="&webhookUrl"
 method="POST"
 in=rich
 out=resp
 ;
run;

Here's the result. See how I selected a nice modern emoji as the account icon? Slack has hundreds of these available.

Slack message

Example 3: Data-driven JSON payload published to Slack using PROC HTTP

But the real power of this integration from SAS is the ability to push dynamic, data-driven content to your Slack channel. To accomplish that, you need to dynamically generate your JSON content with the fields that you want to share. Here's an example that publishes the output of a SAS procedure (read from a data set) to the channel:

/*
 And finally an example that publishes values from data!
*/
 
/* Calculate some data */
proc means data=sashelp.class noprint;
var age;
output out=stats;
run;
 
/* file to hold the JSON payload */
filename msg temp;
 
/* Create the start of the JSON payload */
data _null_;
 file msg ;
 infile datalines4;
 input;
 put _infile_;
datalines4;
payload=
   {   
    "channel": "#fromsas", 
    "username": "sasprogram",
    "icon_emoji": ":fax:",
   "attachments":[
      {
         "fallback":"Latest Stats for AGE in SASHELP.CLASS",
         "pretext":"Latest Stats for AGE in SASHELP.CLASS",
         "color":"#D000FF",
         "fields":[
;;;;
 
/* fill in the data fields in the middle */
data _null_;
 file msg mod;
 set stats end=eof;
 put '{ "short":false, "title": "' _stat_ '",';
 put '"value": "' age '" }';
 /* separate values with commas, except the last */
 if not eof then put ",";
run;
 
/*
  And finish with the tail end of the payload
*/
 
data _null_;
 file msg mod;
 infile datalines4;
 input;
 put _infile_;
datalines4;
         ]
      }
   ]
  }	
;;;;
 
proc http
 url="&webhookUrl"
 method="POST"
 in=msg
 out=resp
 ;
run;

Here's the result -- the latest figures from SASHELP.CLASS!
Slack data

I've shared the complete example code on a public Gist on GitHub. Remember, to try it yourself you'll need to:

  • Create a Slack account, if you don't have one. Along with a new Slack channel.
  • Use the Slack site to add a new Incoming Webhook for your channel
  • Replace the webhookURL macro value in my example code with your specific Webhook URL.

Have fun! And if you create anything really interesting, I hope you'll invite me to your Slack channel!

tags: PROC HTTP, REST API, Slack

The post How to publish to a Slack channel with SAS appeared first on The SAS Dummy.

2月 172016
 

REpresentational State Transfer (REST) is an architectural style for designing web services that access a system's resources using HTTP methods.

With the release of DataFlux Data Management 2.7 earlier this year, three new REST Application Programming Interfaces (APIs) are now available:

  • Data Management Server Batch Jobs
  • Data Management Server Real-time Data Jobs
  • Data Management Server Real-time Process Jobs

The Base URLs for the new Data Management Server REST APIs are:

Batch Jobs – http://<DM Server IP address or server name>:<port>/ SASDataMgmtBatchJob/rest

Real-time Data Services – http://<DM Server IP address or server name>:<port>/ SASDataMgmtRTDataJob/rest

Real-time Process Services – http://<DM Server IP address or server name>:<port>/ SASDataMgmtRTProcessJob/rest

Here is a sample call to the Base URL for Real-time Data Services:

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest

RESTAPIs

The Data Management Server job or service name is referred to by its ID which is simply the name of the job or service (with its extension) Base64 encoded.  You can use this site to Base64 encode text.  Base64 encoding is a common standard use to encode paths or other resource names in any URL.  For example, without Base64 encoding, a filename and path that contains a space can cause problems.

Here is a sample call for a specific Real-time Data Service using its Base64 encoded name for its ID.

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/ RGV0ZXJtaW5lX0dlbmRlci5kZGY=

RESTAPIs2

Note that the returned text contains information about the links to call (href), method to use as well as content type of the request.

Before reviewing the REST API metadata information for the service, let’s first look at testing the Determine Gender data service using the Data Management Server test service utility.

RESTAPIs3

The input of the service is Name and the output is Gender.

Here is an example of the REST API metadata information which includes the inputs and outputs for the Determine_Gender.ddf service.

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/ RGV0ZXJtaW5lX0dlbmRlci5kZGY=/metadata

RESTAPIs4

The following are some common tools for testing REST APIs requests and responses:

Here is an example of the request/response information for executing the Determine Gender data service. The input Names of “John Smith” and “Jane Smith” were sent to the service and the Genders of “M” and “F” were returned.  (The Mozilla Firefox Add-on tool was used to generate this information.)

RESTAPIs5

The table below lists the possible HTTP status response codes that could be returned by a Data Management Server REST API request:

RESTAPIs6

For more information refer to the DataFlux Data Management Server 2.7: REST API Reference guide.

 

 

tags: data management, Data Management Server, REST API, SAS Professional Services

DataFlux Data Management Server 2.7: REST APIs was published on SAS Users.

9月 292015
 

Thanks to the proliferation of cloud services and REST-based APIs, SAS users have been making use of PROC HTTP calls (to query these web services) and some creative DATA step or PROC GROOVY code to process the JSON results. Such methods get the job done (JSON is simply text, after all), but they aren't as robust as an official JSON parser. JSON is simple: it's a series of name-value pairs that represent an object in JavaScript. But these pairs can be nested within one another, so in order to parse the result you need to know about the object structure. A parser helps with the process, but you still need to know the semantics of any JSON response.

SAS 9.4 introduced PROC JSON, which allows you to create JSON output from a data set. But it wasn't until SAS 9.4 Maintenance 3 that we have a built-in method to parse JSON content. This method was added as a DS2 package: the JSON package.

I created an example of the method working -- using an API that powers our SAS Support Communities! The example queries communities.sas.com for the most recent posts to the SAS Programming category. Here's a small excerpt of the JSON response.

 "post_time": "2015-09-28T16:29:05+00:00",
  "views": {
  "count": 1
  },
  "subject": "Re: How to code for the consecutive values",
  "author": {
  "href": "/users/id/13884",
  "login": "ballardw"

Notice that some items, such as post_time, are simple one-level values. But other items, such as views or author, require a deeper dive to retrieve the value of interest ("count" for views, and "login" for author). The DS2 JSON parser can help you to navigate to those values without you needing to know how many braces or colons or commas are in your way.

Here is an example of the result: a series plot from PROC SGPLOT and a one-way frequency analysis from PROC FREQ. The program also produces a detailed listing of the messages, the topic content, and the datetime stamp.

series

boardfreq
This is my first real DS2 program, so I'm open to feedback. I already know of a couple of improvements I should make, but I want to share it now as I think it's good enough to help others who are looking to do something similar.

The program requires SAS 9.4 Maintenance 3. It also works fine in the most recent version of SAS University Edition (using SAS Studio 3.4). All of the code runs using just Base SAS procedures.

/* DS2 program that uses a REST-based API */
/* Uses http package for API calls       */
/* and the JSON package (new in 9.4m3)   */
/* to parse the result.                  */
proc ds2; 
  data messages (overwrite=yes);
    /* Global package references */
    dcl package json j();
 
    /* Keeping these variables for output */
    dcl double post_date having format datetime20.;
    dcl int views;
    dcl nvarchar(128) subject author board;
 
    /* these are temp variables */
    dcl varchar(65534) character set utf8 response;
    dcl int rc;
    drop response rc;
 
    method parseMessages();
      dcl int tokenType parseFlags;
      dcl nvarchar(128) token;
      rc=0;
      * iterate over all message entries;
      do while (rc=0);
        j.getNextToken( rc, token, tokenType, parseFlags);
 
        * subject line;
        if (token eq 'subject') then
          do;
            j.getNextToken( rc, token, tokenType, parseFlags);
            subject=token;
          end;
 
        * board URL, nested in an href label;
        if (token eq 'board') then
          do;
            do while (token ne 'href');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            j.getNextToken( rc, token, tokenType, parseFlags );
            board=token;
          end;
 
        * number of views (int), nested in a count label ;
        if (token eq 'views') then
          do;
            do while (token ne 'count');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            j.getNextToken( rc, token, tokenType, parseFlags );
            views=inputn(token,'5.');
          end;
 
        * date-time of message (input/convert to SAS date) ;
        * format from API: 2015-09-28T10:16:01+00:00 ;
        if (token eq 'post_time') then
          do;
            j.getNextToken( rc, token, tokenType, parseFlags );
            post_date=inputn(token,'anydtdtm26.');
          end;
 
        * user name of author, nested in a login label;
        if (token eq 'author') then
          do; 
            do while (token ne 'login');
               j.getNextToken( rc, token, tokenType, parseFlags );
            end;
            * get the author login (username) value;
            j.getNextToken( rc, token, tokenType, parseFlags );
            author=token;
            output;
          end;
      end;
      return;
    end;
 
    method init();
      dcl package http webQuery();
      dcl int rc tokenType parseFlags;
      dcl nvarchar(128) token;
      dcl integer i rc;
 
      /* create a GET call to the API                                         */
      /* 'sas_programming' covers all SAS programming topics from communities */
      webQuery.createGetMethod(
         'http://communities.sas.com/kntur85557/' || 
         'restapi/vc/categories/id/sas_programming/posts/recent' ||
         '?restapi.response_format=json' ||
         '&restapi.response_style=-types,-null&page_size=100');
      /* execute the GET */
      webQuery.executeMethod();
      /* retrieve the response body as a string */
      webQuery.getResponseBodyAsString(response, rc);
      rc = j.createParser( response );
      do while (rc = 0);
        j.getNextToken( rc, token, tokenType, parseFlags);
        if (token = 'message') then
          parseMessages();
      end;
    end;
 
  method term();
    rc = j.destroyParser();
  end;
 
  enddata;
run;
quit;
 
/* Add some basic reporting */
proc freq data=messages noprint;
    format post_date datetime11.;
    table post_date / out=message_times;
run;
 
ods graphics / width=2000 height=600;
title '100 recent message contributions in SAS Programming';
title2 'Time in GMT';
proc sgplot data=message_times;
    series x=post_date y=count;
    xaxis minor label='Messages';
    yaxis label='Time created' grid;
run;
 
title 'Board frequency for recent 100 messages';
proc freq data=messages order=freq;
    table board;
run;
 
title 'Detailed listing of messages';
proc print data=messages;
run;
 
title;

I also shared this program on the SAS Support Communities as a discussion topic. If you want to contribute to the effort, please leave me a reply with your suggestions and improvements!

tags: DS2, JSON, REST API, SAS 9.4

The post Using SAS DS2 to parse JSON appeared first on The SAS Dummy.

3月 072013
 

Last year I shared this popular tip for counting how many times a web link has been shared on Twitter or Facebook. I use this technique daily to report on the social media "popularity" of our blog articles at SAS.

I wanted to add LinkedIn into the mix. Like Twitter and Facebook, LinkedIn also has a REST-based API that can be used within a SAS program. For example, if I want to know how many times my recent post "It's raining analytics" has been shared on LinkedIn, I can use PROC HTTP to hit this URL:

http://www.linkedin.com/countserv/count/share?url=http://blogs.sas.com/content/sasdummy/2013/01/15/its-raining-analytics/

Here is the JSON-formatted response (as of today):

IN.Tags.Share.handleCount ( { "count":89, "fCnt":"89", "fCntPlusOne":"90", "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2013\/01\/15\/its-raining-analytics\/" } );

Wow! That blog post has done pretty well on LinkedIn (with "count"=89) - it's my most-shared post this year.

Here's a SAS program that checks the LinkedIn shares for me:

%let url=http://blogs.sas.com/content/sasdummy/2013/01/15/its-raining-analytics/;

/* temp holding area for LinkedIn response */
filename li temp;

/* call the LinkedIn API */
proc http
  url="http://www.linkedin.com/countserv/count/share?url=&url."
  method='GET'
  /* proxyhost= if behind corp firewall */
  out=li;
run;

/* use RegEx to gather the "count":n  value */
data liresult (keep=url lishares);
  length line $ 1000 lishares 8;
  length url $ 300;
  url = "&url.";
  infile li;
  input line;

  if _n_ = 1 then
    do;
      retain li_regex;
      li_regex = prxparse("/\""count\""\:([0-9]*)/");
    end;

  position = prxmatch(li_regex,line);

  if (position ^= 0) then
    do;
      call prxposn(li_regex, 1, start, length);
      lishares = substr(line,start,length);
    end;
run;

/* clear our temp response file */
filename li clear;

Result:

That's a lot of code to retrieve the answer for just one link. Thanks to the power of the SAS macro language, I can scale this to retrieve the values for an entire collection of links. With those results in hand, I can run other stats:

Of my 63 posts in the past 12 months, my links have been shared to LinkedIn an average of 4.58 times, with a total of 289 shares overall.

I'm not so naive that I consider these to be impressive numbers, but I've only just begun the habit of sharing my posts on LinkedIn. With this process as part of my daily blog reporting, I can now measure how my "LinkedIn engagement" improves as I share more content. Collect data, count, measure, report -- that's what it's all about, right?

Note: Many web articles, such as blog posts, can have multiple URLs. For example, the WordPress platform offers "short-link" GUID URLs as well as the longer, more descriptive URLs. While all of these different URLs might lead to the same page, LinkedIn counts only the URL you share. So if you are in the habit of publicizing different URLs for convenience or other tracking purposes, you might need to check each permutation of a page URL with this program to get the complete "LinkedIn shares" picture.

Reference example

Complete macro version of my LinkedIn shares program (lishares_example.sas)

tags: LinkedIn, PROC HTTP, REST API