GitHub

December 14, 2017
 

As you might have heard, sasCommunity.org -- a wiki-based web site that has served as a user-sourced SAS repository for over a decade -- is winding down. This was a difficult decision taken by the volunteer advisory board that runs the site. However, the decision acknowledges a new reality: SAS professionals have many modern options for sharing and promoting their professional work, and they are using those options. In 2007, the birth year of sasCommunity.org, the technical/professional networking world was very different than it is today. LinkedIn was in its infancy. GitHub didn't exist. SAS Support Communities (communities.sas.com) was an experiment just getting started with a few discussion forums. sasCommunity.org (and its amazing volunteers) blazed a trail for SAS users to connect and share, and we'll always be grateful for that.

Even with the many alternatives we now have, the departure of sasCommunity.org will leave a gap in some of our professional sharing practices. In this article, I'll share some ideas that you can use to fill this gap, and to extend the reach of your SAS knowledge beyond just your SAS community colleagues. Specifically, I'll address how you can make the biggest splash and have an enduring impact with that traditional mode of SAS-knowledge sharing: the SAS conference paper.

Extending the reach of your SAS Global Forum paper

Like many of you, I've written and presented a few technical papers for SAS Global Forum (and also for its predecessor, SUGI). With each conference, SAS publishes a set of proceedings that provide perpetual access to the PDF version of my papers. If you know what you're looking for, you can find my papers in several ways:

All of these methods work with no additional effort from me. When your paper is published as part of a SAS conference, the content is automatically archived and findable within these conference assets. But as far as this goes, there is opportunity to do so much more.

Write an article for SAS Support Communities

ArtC's presenter page

sasCommunity.org supported the idea of "presenter pages" -- a mini-destination for information about your conference paper. As an author, you would create a page that contains the description of your paper, links to supporting code, and any other details that you wanted to lift out of the PDF version of your paper. Creating such a page required a bit of learning time with the wiki syntax, and just a small subset of paper presenters ever took the time to complete this step. (But some prolific contributors, such as Art Carpenter or Don Henderson, shared blurbs about dozens of their papers in this way.) Personally, I created a few pages on sasCommunity.org to support my own papers over the years.

SAS Support Communities offers a similar mechanism: the SAS Communities Library. Any community member can create an article to share his or her insights about a SAS-related topic. A conference paper is a great opportunity to add to the SAS Communities Library and bring some more attention to your work. A communities article also serves as a platform for readers to ask you questions about your work, as the library supports a commenting feature that allows for discussion.

Since sasCommunity.org announced its retirement plans, I've taken the opportunity to create new articles on SAS Support Communities to address some of my previous papers. I also updated the content, where appropriate, to ensure that my examples work with modern releases of SAS. Here are two examples of presentation pages that I created on SAS Support Communities:

One of my presentations in the SAS Communities Library

When you publish a topic in the SAS Communities Library, especially if it's a topic that people search for, your article will get an automatic boost in visitors thanks to the great search engine traffic that drives the communities site. With that in mind, use these guidelines when publishing:

  • Use relevant key words/phrases in your article title. Cute and clever titles are a fun tradition in SAS conference papers, and you should definitely keep those intact within the body of your article. But reserve the title field for a more practical description of the content you're sharing.
  • Include an image or two. Does your paper include an architecture diagram? A screen shot? A graph or plot? Use the Insert Photos button to add these to your article for visual interest and to give the reader a better idea of what's in your paper.
  • Add a snippet of code. You don't have to attach all of your sample code with hundreds of program lines, but a little bit of code can help the reader with some context. Got lots of code? We'll cover that in the next section.

To get started with the process for creating an article...see this article!

Share your code on GitHub

SAS program code is an important feature in SAS conference papers. A code snippet in a PDF-style paper can help to illustrate your points, but you cannot effectively share entire programs or code libraries within this format. Code that is locked up in a PDF document is difficult for a reader to lift and reuse. It's also impossible to revise after the paper is published.

GitHub is a free service that supports sharing and collaboration for any code-based technology, including SAS. Anyone who works with code -- data scientists, programmers, application developers -- is familiar with GitHub at least as a reader. If you haven't done so already, it might be time to create your own GitHub account and share your useful SAS code. I have several GitHub repositories (or "repos" as we GitHub hipsters say) that are related to papers, blog posts, and books that I've written. It just feels like a natural way to share code. Occasionally a reader suggests an improvement or finds a bug, and I can change the code immediately. (Alas, I cannot go back in time and change a published paper...)

A sample of conference paper code on my GitHub.

List your published work on your LinkedIn profile

So, you've presented your work at a major SAS conference! Your professional network needs to know this about you. You should list this as an accomplishment on your resume, and definitely on your LinkedIn profile.

LinkedIn offers a "publication" section -- perfect for listing books and papers that you've written. Or, you can add this to the "projects" section of your profile, especially if you collaborate with someone else that you want to include in this accomplishment. I have yet to add my entire back-catalog of conference papers, but I have added a few recent papers to my LinkedIn profile.

One of a few publications listed on my LinkedIn profile

Bonus step: write about your experience in a LinkedIn article

Introspection has a special sort of currency on LinkedIn that doesn't always translate well to other places. A LinkedIn article -- a long-form post that you write from a first-person perspective -- gives you a chance to talk about the deeper meaning of your project. This can include the story of inspiration behind your conference paper, personal lessons that you learned along the way, and the impact that the project had in your workplace and on your career. This "color commentary" adds depth to how others see your work and experience, which helps them to learn more about you and what drives you.

Here are a few examples of what I'm talking about:

It's not about you. It's about us

The techniques I've shared here might sound like "how to promote yourself." Of course, that's important -- we each need to take responsibility for our own self-promotion and ensure that our professional achievements shine through. But more importantly, these steps play a big role in helping your content to be findable -- even "stumble-uponable" (a word I've just invented). You've already invested a tremendous amount of work into researching your topic and crafting a paper and presentation -- take it the extra bit of distance to make sure that the rest of us can't miss it.

The post How to share your SAS knowledge with your professional network appeared first on The SAS Dummy.

May 8, 2017
 

In his recent article Perceptions of probability, Rick Wicklin explores how vague statements about "likeliness" translate into probabilities that we can express numerically. It's a fun, informative post -- I recommend it! You'll "Almost Certainly" enjoy it.

To prepare the article, Rick first had to download the source data from the study he cited. The data was shared as a CSV file on GitHub. Rick also had to rename the variables (column names) from the data table so that they are easier to code within SAS. Traditionally, SAS variable names must adhere to a few common programming rules: they must be alphanumeric, begin with a letter, and contain no spaces or special characters. The complete rules are documented in the SAS documentation. In this post, I'll walk through the steps I use to download and convert such a file.

Step 1. Download the CSV file with PROC HTTP

I've written before about this method for reading data from a cloud service like DropBox and GitHub. It's still my favorite technique for reading data from the Internet. You'll find lots of papers and examples that use FILENAME URL for the same job in fewer lines of code, but PROC HTTP is more robust. It runs faster, and it allows you to separate the step of fetching the file from the subsequent steps of processing that file.
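For comparison, here's what the shorter FILENAME URL approach looks like -- a minimal sketch, using the same raw-file URL that appears later in this post:

```sas
/* FILENAME URL: fewer statements, but less control than PROC HTTP */
filename probly url
  "https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv";

/* PROC IMPORT (or a DATA step) can then read the fileref directly */
```

The trade-off: FILENAME URL fetches the file at the moment you read the fileref, so you can't separate the download from the processing the way PROC HTTP allows.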

You can see the contents of the CSV file at this friendly URL: https://github.com/zonination/perceptions/blob/master/probly.csv. But that's not the URL that I need for PROC HTTP or any programmatic access. To download the file via a script, I need the "Raw" file URL, which I can access via the Raw button on the GitHub page.

GitHub preview

In this case, that's https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv. Here's the PROC HTTP step to download the CSV file into a temporary fileref.

/* Fetch the file from the web site */
filename probly temp;
proc http
 url="https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv"
 method="GET"
 out=probly;
run;

A note for SAS University Edition users: this step won't work for you, as the free software does not support access to secure (HTTPS) sites. You'll have to manually download the file via your browser and then continue with the remaining steps.

Step 2. Import the data into SAS with PROC IMPORT

SAS can process data with nonstandard variable names, including names that contain spaces and special characters. You simply have to use the VALIDVARNAME= system option to put SAS into the right mode (oops, almost wrote "mood" there but it's sort of the same thing).

(With VALIDVARNAME=ANY, almost any name is allowed, and you refer to such a variable using name-literal syntax, as in 'crime against nature'n.)

For this step, I'll set VALIDVARNAME=ANY to allow PROC IMPORT to retain the original column names from the CSV file. The same trick would work if I was importing from an Excel file, or any other data source that was a little more liberal in its naming rules.

/* Tell SAS to allow "nonstandard" names */
options validvarname=any;
 
/* import to a SAS data set */
proc import
  file=probly
  out=work.probly replace
  dbms=csv;
run;

Step 3. Create RENAME and LABEL statements with PROC SQL

This is one of my favorite SAS tricks. You can use PROC SQL SELECT INTO to create SAS programming statements for you, based on the data you're processing. Using this technique, I can build the parts of the LABEL statement and the RENAME statement dynamically, without knowing the variable names ahead of time.

The LABEL statement is simple. I'm going to build a series of assignments that look like this:

  'original variable name'n = 'original variable name'

I used the SELECT INTO clause to build a label assignment for each variable name. I used the CAT function to assemble the label assignment piece-by-piece, including the special literal syntax, the variable name, the assignment operator, and the label value within quotes. I'm fetching the variable names from SASHELP.VCOLUMN, one of the built-in dictionary tables that SAS provides to surface table and column metadata.

  select cat("'",trim(name),"'n","=","'",trim(name),"'") 
     into :labelStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK";

Here's part of the value of &labelStmt:

'Almost Certainly'n='Almost Certainly' 
'Highly Likely'n='Highly Likely' 
'Very Good Chance'n='Very Good Chance' 
'Probable'n='Probable' 
'Likely'n='Likely' 
'Probably'n='Probably' 
'We Believe'n='We Believe' 

The RENAME statement is a little trickier, because I have to calculate a new valid variable name. For this specific data source that's easy, because the only SAS "rule" that these column names violate is the ban on space characters. I can create a new name by using the COMPRESS function to remove the spaces. To be a little safer, I used the "kn" modifier on the COMPRESS function to keep only English letters, numbers, and underscores. That should cover all cases except for variable names that are too long (greater than 32 characters) or that begin with a number (or that don't contain any valid characters to begin with).

Some of the column names are one-word names that are already valid. If I include those in the RENAME statement, SAS will generate an error (you cannot "rename" a variable to its current name). I used the NVALID function to test whether each name is already a valid V7 name, and excluded those names from the RENAME statement.

/* Generate new names to comply with SAS rules.                          */
/* Assumes names contain spaces, and can fix with COMPRESS               */
/* Other deviations (like special chars, names that start with a number) */
/* would need different adjustments                                      */
/* NVALID() function can check that a name is a valid V7 name           */
proc sql noprint;
 
  /* retain original names as labels */
  select cat("'",trim(name),"'n","=","'",trim(name),"'") 
     into :labelStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK";
 
  select cat("'",trim(name),"'n","=",compress(name,,'kn')) 
     into :renameStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK"
  /* exclude those varnames that are already valid */
  AND not NVALID(trim(name),'V7');
quit;

Step 4. Modify the data set with new names and labels using PROC DATASETS

With the body of the LABEL and RENAME statements built, it's time to plug them into a PROC DATASETS step. PROC DATASETS can change data set attributes such as variable names, labels, and formats without requiring a complete rewrite of the data -- it's a very efficient operation.

I include the LABEL statement first, since it references the original variable names. Then I include the RENAME statement, which changes the variable names to their new V7-compliant values.

Finally, I reset the VALIDVARNAME= option to the normal V7 sanity. (Unless you're running in SAS Enterprise Guide, in which case the option is already set to ANY by default. Check this blog post for a less disruptive method of setting/restoring options.)

proc datasets lib=work nolist ;
  modify probly / memtype=data;
  label &labelStmt.;
  rename &renameStmt.;
  /* optional: report on the var names/labels */
  contents data=probly nodetails;
quit;
 
/* reset back to the old rules */
options validvarname=v7;

Here's the CONTENTS output from the PROC DATASETS step, which shows the final variable attributes. I now have easy-to-code variable names, and they still have their descriptive labels. My data dictionary dreams are coming true!

DATASETS rename output

Download the entire program example from my public Gist: import_renameV7.sas.

The post How to download and convert CSV files for use in SAS appeared first on The SAS Dummy.


January 6, 2017
 

At SAS, we've been publishing more and more repositories on GitHub as a way to share our open source projects and examples. These "repos" (that's Git lingo) are created and maintained by experts in R&D, professional services (consulting), and SAS training. Some recent examples include:

With dozens of repositories under the sassoftware account, it becomes a challenge to keep track of them all. So, I've built a process that uses SAS and the GitHub APIs to create reports for my colleagues.

Using the GitHub API

GitHub APIs are robust and well-documented. Like most APIs these days, you access them using HTTP and REST. Most of the API output is returned as JSON. With PROC HTTP and the JSON libname engine (new in SAS 9.4 Maint 4), using these APIs from SAS is a cinch.

The two API calls that we'll use for this basic report are GET /orgs/{org}, which returns the account metadata (including the repository count), and GET /orgs/{org}/repos, which returns the details for each of the account's public repositories.

Fetching the GitHub account metadata

The following SAS program calls the first API to gather some account metadata. Then, it stores a selection of those values in macro variables for later use.

/* Establish temp file for HTTP response */
filename resp temp;
 
/* Get Org metadata, including repo count */
proc http
 url="https://api.github.com/orgs/sassoftware"  
 method="GET"
 out=resp
;
run;
 
/* Read response as JSON data, extract select fields */
/* It's in the ROOT data set, found via experiment   */
libname ss json fileref=resp;
 
data meta; 
  set ss.root; 
  call symputx('repocount',public_repos);
  call symputx('acctname',name);
  call symputx('accturl',html_url);
run;
 
/* log results */
%put &=repocount;
%put &=acctname;
%put &=accturl;

Here is the output of this program (as of today):

REPOCOUNT=66
ACCTNAME=SAS Software
ACCTURL=https://github.com/sassoftware

The important piece of this output is the count of repositories. We'll need that number in order to complete the next step.

Fetching the repositories and stats

It turns out that the /repos API call returns the details for 30 repositories at a time. For accounts with more than 30 repos, we need to call the API multiple times with a &page= index value to iterate through each batch. I've wrapped this process in a short macro function that repeats the calls as many times as needed to gather all of the data. This snippet calculates the upper bound of my loop index:

/* Number of repos / 30, rounded up to next integer     */
%let pages=%sysfunc(ceil(%sysevalf(&repocount / 30)));

Given the 66 repositories on the SAS Software account right now, that results in 3 API calls.

Each API call creates verbose JSON output with dozens of fields, only a few of which we care about for this report. To simplify things, I've created a JSON map that defines just the fields that I want to capture. I came up with this map by first allowing the JSON libname engine to "autocreate" a map file with the full response. I edited that file and whittled the result down to just 12 fields. (Read my previous blog post about the JSON engine to learn more about JSON maps.)
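If you want to reproduce that "autocreate" step, it looks something like this -- a minimal sketch, assuming the RESP fileref holds a JSON response as in the earlier PROC HTTP step:

```sas
/* Ask the JSON engine to generate a complete map for the response */
filename automap temp;
libname full json fileref=resp map=automap automap=create;

/* Print the generated map to the log; then edit a copy of it */
/* down to just the fields you want to keep                   */
data _null_;
  infile automap;
  input;
  put _infile_;
run;
```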

The multiple API calls create multiple data sets, which I must then concatenate into a single output data set for reporting. Then to clean up, I used PROC DATASETS to delete the intermediate data sets.

First, here's the output data:

The concatenated SASSOFTWARE_ALLREPOS data set
Here's the code segment, which is rather long because I included the JSON map inline.

/* This trimmed JSON map defines just the fields we want */
/* Created by using AUTOMAP=CREATE on JSON libname       */
/* then editing the generated map file to reduce to      */
/* minimum number of fields of interest                  */
filename repomap temp;
data _null_;
 infile datalines;
 file repomap;
 input;
 put _infile_;
 datalines;
{
  "DATASETS": [
 {
   "DSNAME": "root",
   "TABLEPATH": "/root",
   "VARIABLES": [
  {
    "NAME": "id",
    "TYPE": "NUMERIC",
    "PATH": "/root/id"
  },
  {
    "NAME": "name",
    "TYPE": "CHARACTER",
    "PATH": "/root/name",
    "CURRENT_LENGTH": 50,
    "LENGTH": 50
  },
  {
    "NAME": "html_url",
    "TYPE": "CHARACTER",
    "PATH": "/root/html_url",
    "CURRENT_LENGTH": 100,
    "LENGTH": 100
  },
  {
    "NAME": "language",
    "TYPE": "CHARACTER",
    "PATH": "/root/language",
    "CURRENT_LENGTH": 20,
    "LENGTH": 20
  },
  {
    "NAME": "description",
    "TYPE": "CHARACTER",
    "PATH": "/root/description",
    "CURRENT_LENGTH": 300,
    "LENGTH": 500
  },
  {
    "NAME": "created_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/created_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "updated_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/updated_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "pushed_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/pushed_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "size",
    "TYPE": "NUMERIC",
    "PATH": "/root/size"
  },
  {
    "NAME": "stars",
    "TYPE": "NUMERIC",
    "PATH": "/root/stargazers_count"
  },
  {
    "NAME": "forks",
    "TYPE": "NUMERIC",
    "PATH": "/root/forks"
  },
  {
    "NAME": "open_issues",
    "TYPE": "NUMERIC",
    "PATH": "/root/open_issues"
  }
   ]
 }
  ]
}
;
run;
 
/* GETREPOS: iterate through each "page" of repositories */
/* and collect the GitHub data                           */
/* Output: SASSOFTWARE_ALLREPOS, a data set with all basic data */
/*  about the account's public repositories                     */
%macro getrepos;
 %do i = 1 %to &pages;
  proc http
   url="https://api.github.com/orgs/sassoftware/repos?page=&i."  
   method="GET"
   out=resp
  ;
  run;
 
  /* Use JSON engine with defined map to capture data */
  libname repos json map=repomap fileref=resp;
  data _repos&i.;
   set repos.root;
  run;
 %end;
 
 /* Concatenate all pages of data */
 data sassoftware_allrepos;
  set _repos:;
 run;
 
 /* delete intermediate repository data */
 proc datasets nolist nodetails;
  delete _repos:;
 quit;
%mend;
 
/* Run the macro */
%getrepos;

Creating a simple report

Finally, I want to create simple report listing of all of the repositories and their top-level stats. I'm using PROC SQL without a CREATE TABLE statement, which will create a simple ODS listing report for me. I use this approach instead of PROC PRINT because I transformed a couple of the columns in the same step. For example, I created a new variable with a fully formed HTML link, which ODS HTML will render as an active link in the browser. Here's a snapshot of the output, followed by the code.

A snapshot of the repository report output

/* Best with ODS HTML output */
title "github.com/sassoftware (&acctname.): Repositories and stats";
title2 "ALL &repocount. repos, Data pulled with GitHub API as of &SYSDATE.";
title3 height=1 link="&accturl." "See &acctname. on GitHub";
proc sql;
 select 
  catt('<a href="',t1.html_url,'">',t1.name,"</a>") as Repository, 
 case 
  when length(t1.description)>50 then cat(substr(t1.description,1,49),'...')
  else t1.description
 end 
as Description,
 t1.language as Language,
 t1.created_at format=dtdate9. as Created, 
 t1.pushed_at format=dtdate9. as Last_Update, 
 t1.stars as Stars, 
 t1.forks as Forks, 
 t1.open_issues as Open_Issues
from sassoftware_allrepos t1
 order by t1.pushed_at desc;
quit;

Get the entire example

Not wanting to get too meta on you here, but I've placed the entire program on my own GitHub account. The program I've shared has a few modifications that make it easier to adapt for any organization or user on GitHub. As you play with this, keep in mind that the GitHub API is "rate limited" -- they allow only so many API calls from a single IP address in a certain period of time. That's to ensure that the APIs perform well for all users. You can use authenticated API calls to increase the rate-limit threshold for yourself, and I do that for my own production reporting process. But...that's a blog post for a different day.
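Authenticated calls are mostly a matter of adding an Authorization header to the PROC HTTP step. Here's a minimal sketch, assuming you've stored a GitHub personal access token in a macro variable named GHTOKEN (a name I've made up for this example):

```sas
/* Authenticated GitHub API call: raises your rate-limit threshold */
/* &GHTOKEN. holds a personal access token (hypothetical name)     */
filename resp temp;
proc http
 url="https://api.github.com/orgs/sassoftware"
 method="GET"
 out=resp;
 headers "Authorization"="token &GHTOKEN.";
run;
```

The rest of the program works unchanged; only the request gains the header.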

tags: github, JSON, open source, PROC HTTP

The post Reporting on GitHub accounts with SAS appeared first on The SAS Dummy.

August 7, 2016
 

A few months ago I shared the news about Jupyter notebook support for SAS. If you have SAS for Linux, you can install a free open-source project called sas_kernel and begin running SAS code within your Jupyter notebooks. In my post, I hinted that support for this might be coming in the SAS University Edition. I'm pleased to say that this is one time where my crystal ball actually worked -- Jupyter support has arrived!

(Need to learn more about SAS and Jupyter? Watch this 7-minute video from SAS Global Forum.)

Start coding in the notebook format

If you download or update your instance of SAS University Edition, you'll be able to point your browser to a slightly different URL and begin running SAS programs in Jupyter. Of course, you can continue to use SAS Studio to learn SAS programming skills. Having trouble deciding which to use? You don't have to choose: you can use both!

If you've started SAS University Edition within Oracle Virtual Box, you can find SAS Studio at its familiar address: http://localhost:10080/. And you can find the Jupyter notebook environment at: http://localhost:18888/. (If you're using VMWare, the URLs are slightly different. Check the documentation.)

Why did SAS add support for Jupyter notebooks? The answer is simple: you asked for it. While we believe that SAS Studio provides a better environment for producing and managing SAS code, Jupyter notebooks are widely used by students and data scientists who want to package their code, results, and documentation in the convenient notebook format. Notebook files (*.ipynb format) are even supported on GitHub, easily shareable and viewable by others.

Now, what are the limitations?

Within SAS University Edition, the Jupyter environment supports only SAS programs. The Jupyter project can support other languages, including Python, Julia, and R (the namesake languages) and dozens of others with published language kernels. However, because of the virtual-machine core of the SAS University Edition, those other languages are not available.

Support for other languages (as well as for the Jupyter console) is available when you use Jupyter in a standalone SAS environment. In fact, the sas_kernel project recently received some exciting updates. You can now host the Jupyter environment on a different machine than your SAS server (although Linux is still the only supported SAS host), and the installation process has been streamlined. See more on the sassoftware GitHub home for the sas_kernel project.

Where can you learn more about Jupyter in SAS University Edition?

Check out the help topics for SAS University Edition, beginning with this one: What is Jupyter Notebook in SAS University Edition?

And if you need help or advice about how to make the best use of SAS University Edition, check out the SAS Analytics U community. There are plenty of experts in the forum who would love to help you learn!

tags: github, Jupyter, SAS University Edition

The post Using Jupyter and SAS together with SAS University Edition appeared first on The SAS Dummy.

January 18, 2016
 
I love GitHub for version control and collaboration, though I'm no master of it. And the tools for integrating git and GitHub with RStudio are just amazing boons to productivity.

Unfortunately, my University-supplied computer does not play well with GitHub. Various directories are locked down, and I can't push or pull to GitHub directly from RStudio. I can't even use install_github() from the devtools package, which is needed for uploading Shiny applications to Shinyapps.io. I lived with this for a bit, using git from the desktop and rsconnect from a home computer. But what a PIA.

Then I remembered I know how to put RStudio in the cloud-- why not install git there, and make that be my GitHub solution?

It works great. The steps are below. In setting it up, I discovered that Digital Ocean has changed their set-up a little bit, so I updated the earlier post as well.

1. Go to Digital Ocean and sign up for an account. By using this link, you will get a $10 credit. (Full disclosure: I will also get a $25 credit once you spend $25 real dollars there.) The reason to use this provider is that they have a system ready to run with Docker already built in, which makes it easy. In addition, their prices are quite reasonable. You will need to use a credit card or PayPal to activate your account, but you can play for a long time with your $10 credit-- the cheapest machine is $.007 per hour, up to a $5 per month maximum.

2. On your Digital Ocean page, click "Create droplet". Click on "One-click Apps" and select "Docker (1.9.1 on 14.04)". (The numbers in the parentheses are the Docker and Ubuntu versions, and might change over time.) Then choose a size (meaning cost/power) of machine and the region closest to you. You can ignore the remaining settings. Give your new computer an arbitrary name. Then click "Create Droplet" at the bottom of the page.

3. It takes a few seconds for the droplet to spin up. Then you should see your droplet dashboard. If not, click "Droplets" from the top bar. Under "More", click "Access Console". This brings up a virtual terminal to your cloud computer. Log in (your username is root) using the password that Digital Ocean sent you when the droplet spun up.

4. Start your RStudio container by typing: docker run -d -p 8787:8787 -e ROOT=TRUE rocker/hadleyverse

You can replace hadleyverse with rstudio if you like, for a quicker first-time installation, but many R users will want enough of Hadley Wickham's packages that it makes sense to install this version. The -e ROOT=TRUE is crucial for our approach to installing git into the container, but see the comment below from Petr Simicek for another way to do the same thing.

5. Log in to your Cloud-based RStudio. Find the IP address of your cloud computer on the droplet dashboard, and append :8787 to it, and just put it into your browser. For example: http://135.104.92.185:8787. Log in as user rstudio with password rstudio.

6. Install git, inside the Docker container. Inside RStudio, click Tools -> Shell.... Note: you have to use this shell; it's not the same as using the droplet terminal. Type: sudo apt-get update and then sudo apt-get install git-core to install git.

git likes to know who you are. To set git up, from the same shell prompt, type git config --global user.name "Your Handle" and git config --global user.email "an.email@somewhere.edu"

7. Close the shell, and in RStudio, set things up to work with GitHub: Go to Tools -> Global Options -> Git/SVN. Click on create RSA key. You don't need a name for it. Create it, close the window, then view it and copy it.

8. Open GitHub, go to your Profile, click "Edit Profile", "SSH keys". Click "Add key", and just paste in the stuff you copied from RStudio in the previous step.

You're done! To clone an existing repo from GitHub to your cloud machine, open a new project in RStudio, and select Version Control, then Git, and paste in the URL that GitHub provides. Then work away!

An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least two other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, other than as mentioned above, the aggregator is violating the terms by which we publish our work.
April 1, 2013
 

I have a function-like macro (recursive version) to create a sequence:

%macro _list(n,pre=ff);
%if &n=1 %then &pre.1;
%else %_list(%eval(&n-1),pre=&pre),&pre.&n;
%mend _list;

%put %_list(3); *produces ff1,ff2,ff3;

But when I read one of Ian Whitlock's papers, Names, Names, Names – Make Me a List (SGF 2007, SESUG 2008), I said: stop! I'm going to use Ian's %range instead, and I created a GitHub page to hold it (with minimal modifications due to personal preference).
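For comparison, here is a minimal iterative sketch in the spirit of %range -- this is my own simplification, not Ian's actual macro (see his paper or the GitHub page for that):

```sas
/* My own simplified sketch, not Ian Whitlock's actual %range macro */
%macro range(n, pre=ff);
    %local i;
    %do i = 1 %to &n;
        &pre.&i
    %end;
%mend range;

%put %range(3); * produces ff1 ff2 ff3 (space-delimited);
```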

I once posted My Collection of SAS Macro Repositories, credited to some SAS gurus like Richard DeVenezia. When facing a programming challenge, there is always a trade-off: should I take a look at what others wrote, or should I just write it from scratch? Searching also takes a lot of effort, so I plan to use GitHub pages to minimize my own searching efforts, and I hope they will be helpful for you too (no wasted effort anymore!). I begin with SAS list processing:

https://github.com/Jiangtang/SAS_ListProcessing/

I got most of these utility macros (with detailed comments, examples, and sources) from papers, blogs, and other websites, and the honors belong to their authors! Sometimes I will also add my own if I think there are holes to fill. To get started, you may read the README (which will be kept updated) first. Besides the individual macros, a combined file (triggered by a simple DOS command) is also available.

February 21, 2013
 

PhUSE-FDA Working Group 5 (Development of Standard Scripts for Analysis and Programming) just adopted Google Code as its collaborative programming platform. Google Code is one of the most popular and respected open-source software hosting sites in the world, and it is definitely a good choice for PhUSE-FDA WG5.

But after viewing one of WG5's working reports, Sharing Standard Statistical Scripts, and getting to know why they finally chose Google Code (rather than GitHub, which was also tested by WG5 members), I think it's necessary to clarify some misunderstandings about GitHub, where I'm also an occasional user.

As stated in Slide 11 of the report mentioned above, GitHub has:

  • Too complicated an interface
  • Too much overhead for simple development
  • Too much training and education needed
  • A design for classic programming languages like C and Java (not for things like R and SAS)

On the first point, regarding the interface, it seems only the Git command line was tested, and that may indeed be too complicated for "classic statistical programming users". Actually, GitHub offers a great GUI tool, GitHub for Windows, to help users visually clone repositories, commit changes, and perform other management tasks without typing Git commands.


It’s also worthy to mention that with GitHub for Windows, users don’t need to install any separated version control software like Git, CVS or SVN. GitHub for Windows already includes a fully functional version of msysGit. It just makes users’ life much simpler. To use Google Code, you must install and configure something like TortoiseSVN.

Second, is GitHub suitable for "things like R and SAS"? It's true that all hosts, including GitHub, are dominated by "classic programming languages like C and Java". For SAS, SAS programmers as a whole are just not active in any social coding activities; but for R, it is actually one of the most-used languages on GitHub.

Google Code is good, and a "Google Code vs. GitHub" question is mostly subjective. It seems to me that WG5's pick of Google Code rather than GitHub was based on incomplete information. I personally prefer GitHub, and there are some good reasons:

  • Use the GUI tool GitHub for Windows to maintain a minimal Git/SVN/CVS setup.
  • GitHub supplies much richer statistics reports, including charts.
  • GitHub is more socially oriented, which makes it cool in this Web 2.0 world.
December 18, 2012
 

I used "Dropbox" in the title for this post, but these techniques can be used for other cloud-based file sharing services, such as GitHub and Google Drive.

Using PROC HTTP (added in SAS 9.2), you can easily access any "cloud-based" file as long as you have a public link to reach it. I'm a cloudy sort of guy, so I've got profiles on Dropbox, Google Drive and Github. I've shared a few sample data sources on these accounts so that you can try these techniques for yourself.

Here's one working example to get you started. (In all of my examples, I included a placeholder for PROXYHOST=, which you need only if your Internet connection is routed through a gateway proxy. That's common for corporate intranet connections, less common for schools and homes.)

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" 
 out=_inbox
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;

There are a few additional considerations for accessing cloud-based data.

You need the direct download link for the file


When you click "Share Link" on a service like Dropbox, you are provided with a browser-friendly link that you can give to your friends or colleagues. When they visit that link, they are presented with a simple view of the file name and a Download button.

Here's the Share Link for my sample data set:

https://www.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat

I can't use that link in my SAS program because PROC HTTP isn't going to "click" on the Download button for me. Instead, I need the link that results from clicking on the Download button, which I can get by substituting the dl.dropbox.com domain instead of www.dropbox.com in the address:

https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat

Note: You should assume that URL schemes for the "direct download" links can be subject to change, and of course they vary based on the cloud service provider. Today, the "dl.dropbox.com" scheme works for Dropbox.

A cloud-based folder is NOT a cloud-based path for a SAS library

Dropbox allows you to share a link to a folder that contains multiple files. As tempting as it might be to try, you cannot treat a cloud folder as a "path" for a SAS library. Did you notice the FILENAME statement in my first code example? I'm providing a local file destination for the download operation. PROC HTTP is my method to copy the file that I need into my SAS session.

If I have multiple files to fetch, I'll need to repeat those steps for each file, using its direct-download URL.
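If you have more than a couple of files, you can wrap the pattern in a small macro of your own. This is just an illustrative sketch -- the macro name and parameters are my invention, not a feature of PROC HTTP:

```sas
/* Illustrative wrapper -- the macro name and parameters are my own */
%macro fetch(url=, file=);
  filename _inbox "%sysfunc(getoption(work))/&file";
  proc http method="get"
   url="&url"
   out=_inbox
   /* proxyhost="http://yourproxy.company.com" */
  ;
  run;
  filename _inbox clear;
%mend fetch;

/* one call per direct-download URL */
%fetch(url=https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat,
       file=streaming.sas7bdat);
```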

Your file might require additional processing before you can use it

My earlier program example works by copying a SAS7BDAT file (a SAS data set) into my SAS WORK folder. Because it's a data set, my SAS session recognizes it as a valid data member. I was lucky because the data set encoding was compatible with my SAS session.

Sometimes the data might not be quite ready for you to use directly. If a colleague shares a file in a CSV or Excel format, you will need to run a PROC IMPORT step before you can use it. For convenience, a colleague might use PROC CPORT to save multiple SAS data sets into a single transport file. You'll need to run it through PROC CIMPORT before you can use it.
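Here's what the CSV case might look like -- the URL below is a placeholder for your colleague's direct-download link:

```sas
/* Sketch of the CSV case; the URL is a placeholder */
filename csvin temp;

proc http method="get"
 url="https://dl.dropbox.com/s/<file-id>/sample.csv"
 out=csvin
 /* proxyhost="http://yourproxy.company.com" */
;
run;

proc import datafile=csvin
 out=work.sample
 dbms=csv
 replace;
 getnames=yes;
run;

filename csvin clear;
```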

Here's an example that uses a CPORT file that I created with the CARS and CLASS sample data sets:

filename _inbox TEMP;
 
proc http method="get" 
 url="https://dl.dropbox.com/s/5uehelsok9oslgv/sample.cpo" 
 out=_inbox
  /* proxyhost="http://yourproxy.company.com" */
;
run;
 
proc cimport file=_inbox lib=work;
run;
filename _inbox clear;

From the SAS log:

NOTE: PROCEDURE HTTP used (Total process time):
      real time           0.91 seconds
      cpu time            0.00 seconds

NOTE: PROC CIMPORT begins to create/update data set WORK.CARS
NOTE: Data set contains 15 variables and 428 observations. 
      Logical record length is 152

NOTE: PROC CIMPORT begins to create/update data set WORK.CLASS
NOTE: Data set contains 5 variables and 19 observations. 
      Logical record length is 40

You can %include SAS programs from the cloud

It might not be the wisest move to blindly access a cloud-based SAS program and execute it with your SAS session, but you can do it. And because SAS programs can contain data records (in the form of CARDS or DATALINES), this is another way to share data in a portable way.

This program downloads and runs Rick Wicklin's "Christmas Tree" challenge, which he posted on The DO Loop blog last week.

filename _inbox "%sysfunc(getoption(work))/ifsxmas.sas";
proc http method="get" 
 url="https://dl.dropbox.com/s/g1hat0woohud9yc/IFSChristmasTree.sas" 
 out=_inbox
  /* proxyhost="http://yourproxy.company.com" */
;
run;
%include _inbox;
filename _inbox clear;

Working with Google Drive and GitHub

These techniques also work with the other popular services, but the download links are different.

For files stored on GitHub, you need to find the direct link to the "raw" file in your repository. Here's a working example:

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://github.com/cjdinger/proc_http_example/raw/master/data/streaming.sas7bdat" 
 out=_inbox
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;

For Google Drive, you must first select to make the file or folder public, or at least reachable by "anyone who has the link". By default, your Google Drive content is private to your account.

The "Share Link" is more cryptic than some of the other services. The important part is the id= portion of the URL, which allows you to form the direct link like this:

https://docs.google.com/uc?export=download&id=<cryptic_id_from_Google_Drive>

Here's my SAS-enabled example. Note the use of the %str() function to avoid complaints about trying to resolve the ampersand portion of the URL:

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://docs.google.com/uc?export=download%str(&)id=0BwSh7LOTCPQ5MmlJZFdOOFJhOHc" 
 out=_inbox 
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;
tags: cloud, Dropbox, github, Google Drive, PROC HTTP, SAS programming