1月 062017
 

At SAS, we've published more repositories on GitHub as a way to share our open source projects and examples. These "repos" (that's Git lingo) are created and maintained by experts in R&D, professional services (consulting), and SAS training. Some recent examples include:

With dozens of repositories under the sassoftware account, it becomes a challenge to keep track of them all. So, I've built a process that uses SAS and the GitHub APIs to create reports for my colleagues.

Using the GitHub API

GitHub APIs are robust and well-documented. Like most APIs these days, you access them using HTTP and REST. Most of the API output is returned as JSON. With PROC HTTP and the JSON libname engine (new in SAS 9.4 Maint 4), using these APIs from SAS is a cinch.

The two API calls that we'll use for this basic report are:

Fetching the GitHub account metadata

The following SAS program calls the first API to gather some account metadata. Then, it stores a selection of those values in macro variables for later use.

/* Establish temp file for HTTP response */
filename resp temp;
 
/* Get Org metadata, including repo count */
proc http
 url="https://api.github.com/orgs/sassoftware"  
 method="GET"
 out=resp
;
run;
 
/* Read response as JSON data, extract select fields */
/* It's in the ROOT data set, found via experiment   */
libname ss json fileref=resp;
 
data meta; 
  set ss.root; 
  call symputx('repocount',public_repos);
  call symputx('acctname',name);
  call symputx('accturl',html_url);
run;
 
/* log results */
%put &=repocount;
%put &=acctname;
%put &=accturl;

Here is the output of this program (as of today):

REPOCOUNT=66
ACCTNAME=SAS Software
ACCTURL=https://github.com/sassoftware

The important piece of this output is the count of repositories. We'll need that number in order to complete the next step.

Fetching the repositories and stats

It turns out that the /repos API call returns the details for 30 repositories at a time. For accounts with more than 30 repos, we need to call the API multiple times with a &page= index value to iterate through each batch. I've wrapped this process in a short macro function that repeats the calls as many times as needed to gather all of the data. This snippet calculates the upper bound of my loop index:

/* Number of repos / 30, rounded up to next integer     */
%let pages=%sysfunc(ceil(%sysevalf(&repocount / 30)));

Given the 66 repositories on the SAS Software account right now, that results in 3 API calls.

Each API call creates verbose JSON output with dozens of fields, only a few if which we care about for this report. To simplify things, I've created a JSON map that defines just the fields that I want to capture. I came up with this map by first allowing the JSON libname engine to "autocreate" a map file with the full response. I edited that file and whittled the result to just 12 fields. (Read my previous blog post about the JSON engine to learn more about JSON maps.)

The multiple API calls create multiple data sets, which I must then concatenate into a single output data set for reporting. Then to clean up, I used PROC DATASETS to delete the intermediate data sets.

First, here's the output data:

ssgit
Here's the code segment, which is rather long because I included the JSON map inline.

/* This trimmed JSON map defines just the fields we want */
/* Created by using AUTOMAP=CREATE on JSON libname       */
/* then editing the generated map file to reduce to      */
/* minimum number of fields of interest                  */
filename repomap temp;
data _null_;
 infile datalines;
 file repomap;
 input;
 put _infile_;
 datalines;
{
  "DATASETS": [
 {
   "DSNAME": "root",
   "TABLEPATH": "/root",
   "VARIABLES": [
  {
    "NAME": "id",
    "TYPE": "NUMERIC",
    "PATH": "/root/id"
  },
  {
    "NAME": "name",
    "TYPE": "CHARACTER",
    "PATH": "/root/name",
    "CURRENT_LENGTH": 50,
    "LENGTH": 50
  },
  {
    "NAME": "html_url",
    "TYPE": "CHARACTER",
    "PATH": "/root/html_url",
    "CURRENT_LENGTH": 100,
    "LENGTH": 100
  },
  {
    "NAME": "language",
    "TYPE": "CHARACTER",
    "PATH": "/root/language",
    "CURRENT_LENGTH": 20,
    "LENGTH": 20
  },
  {
    "NAME": "description",
    "TYPE": "CHARACTER",
    "PATH": "/root/description",
    "CURRENT_LENGTH": 300,
    "LENGTH": 500
  },
  {
    "NAME": "created_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/created_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "updated_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/updated_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "pushed_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/pushed_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "size",
    "TYPE": "NUMERIC",
    "PATH": "/root/size"
  },
  {
    "NAME": "stars",
    "TYPE": "NUMERIC",
    "PATH": "/root/stargazers_count"
  },
  {
    "NAME": "forks",
    "TYPE": "NUMERIC",
    "PATH": "/root/forks"
  },
  {
    "NAME": "open_issues",
    "TYPE": "NUMERIC",
    "PATH": "/root/open_issues"
  }
   ]
 }
  ]
}
;
run;
 
/* GETREPOS: iterate through each "page" of repositories */
/* and collect the GitHub data                           */
/* Output: <account>_REPOS, a data set with all basic data  */
/*  about an account's public repositories          */
%macro getrepos;
 %do i = 1 %to &pages;
  proc http
   url="https://api.github.com/orgs/sassoftware/repos?page=&i."  
   method="GET"
   out=resp
  ;
  run;
 
  /* Use JSON engine with defined map to capture data */
  libname repos json map=repomap fileref=resp;
  data _repos&i.;
   set repos.root;
  run;
 %end;
 
 /* Concatenate all pages of data */
 data sassoftware_allrepos;
  set _repos:;
 run;
 
 /* delete intermediate repository data */
 proc datasets nolist nodetails;
  delete _repos:;
 quit;
%mend;
 
/* Run the macro */
%getrepos;

Creating a simple report

Finally, I want to create simple report listing of all of the repositories and their top-level stats. I'm using PROC SQL without a CREATE TABLE statement, which will create a simple ODS listing report for me. I use this approach instead of PROC PRINT because I transformed a couple of the columns in the same step. For example, I created a new variable with a fully formed HTML link, which ODS HTML will render as an active link in the browser. Here's a snapshot of the output, followed by the code.

samplereport

/* Best with ODS HTML output */
title "github.com/sassoftware (&acctname.): Repositories and stats";
title2 "ALL &repocount. repos, Data pulled with GitHub API as of &SYSDATE.";
title3 height=1 link="&accturl." "See &acctname. on GitHub";
proc sql;
 select 
  catt('<a href="',t1.html_url,'">',t1.name,"</a>") as Repository, 
 case 
  when length(t1.description)>50 then cat(substr(t1.description,1,49),'...')
  else t1.description
 end 
as Description,
 t1.language as Language,
 t1.created_at format=dtdate9. as Created, 
 t1.pushed_at format=dtdate9. as Last_Update, 
 t1.stars as Stars, 
 t1.forks as Forks, 
 t1.open_issues as Open_Issues
from sassoftware_allrepos t1
 order by t1.pushed_at desc;
quit;

Get the entire example

Not wanting to get too meta on you here, but I've placed the entire program on my own GitHub account. The program I've shared has a few modifications that make it easier to adapt for any organization or user on GitHub. As you play with this, keep in mind that the GitHub API is "rate limited" -- they allow only so many API calls from a single IP address in a certain period of time. That's to ensure that the APIs perform well for all users. You can use authenticated API calls to increase the rate-limit threshold for yourself, and I do that for my own production reporting process. But...that's a blog post for a different day.

tags: github, JSON, open source, PROC HTTP

The post Reporting on GitHub accounts with SAS appeared first on The SAS Dummy.

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

(required)

(required)