Chris Hemedinger

March 7, 2013
 

Last year I shared this popular tip for counting how many times a web link has been shared on Twitter or Facebook. I use this technique daily to report on the social media "popularity" of our blog articles at SAS.

I wanted to add LinkedIn into the mix. Like Twitter and Facebook, LinkedIn also has a REST-based API that can be used within a SAS program. For example, if I want to know how many times my recent post "It's raining analytics" has been shared on LinkedIn, I can use PROC HTTP to hit this URL:

http://www.linkedin.com/countserv/count/share?url=http://blogs.sas.com/content/sasdummy/2013/01/15/its-raining-analytics/

Here is the response (as of today), a JSON object wrapped in a JavaScript callback:

IN.Tags.Share.handleCount ( { "count":89, "fCnt":"89", "fCntPlusOne":"90", "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2013\/01\/15\/its-raining-analytics\/" } );

Wow! That blog post has done pretty well on LinkedIn (with "count"=89) - it's my most-shared post this year.
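To check the parsing logic outside SAS, here is a minimal Python sketch that works on the sample response shown above. It assumes the response always has the shape of that sample (a callback wrapped around one JSON object); `parse_share_count` is a hypothetical helper name, not part of any LinkedIn or SAS API.

```python
import json
import re

# Sample response copied from the post; a live call may return other values.
SAMPLE = (
    'IN.Tags.Share.handleCount ( { "count":89, "fCnt":"89", '
    '"fCntPlusOne":"90", "url":"http:\\/\\/blogs.sas.com\\/content\\/sasdummy'
    '\\/2013\\/01\\/15\\/its-raining-analytics\\/" } );'
)


def parse_share_count(response_text):
    """Return the integer "count" from a callback-wrapped JSON response.

    Hypothetical helper for illustration only.
    """
    # Keep everything from the first "{" to the last "}" and drop the wrapper.
    match = re.search(r"\{.*\}", response_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    payload = json.loads(match.group(0))
    return int(payload["count"])


print(parse_share_count(SAMPLE))
```

Parsing the payload as JSON, rather than matching one field with a pattern, also gives access to the other fields (such as `url`) if they are needed later.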

Here's a SAS program that checks the LinkedIn shares for me:

%let url=http://blogs.sas.com/content/sasdummy/2013/01/15/its-raining-analytics/;

/* temp holding area for LinkedIn response */
filename li temp;

/* call the LinkedIn API */
proc http
  url="http://www.linkedin.com/countserv/count/share?url=&url."
  method='GET'
  /* proxyhost= if behind corp firewall */
  out=li;
run;

/* use RegEx to gather the "count":n  value */
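data liresult (keep=url lishares);
  length line $ 1000 lishares 8;
  length url $ 300;
  url = "&url.";
  infile li truncover;
  /* read the whole record; list input would stop at the first blank */
  input line $char1000.;

  if _n_ = 1 then
    do;
      retain li_regex;
      /* require at least one digit after "count": */
      li_regex = prxparse("/\""count\""\:([0-9]+)/");
    end;

  position = prxmatch(li_regex,line);

  if (position ^= 0) then
    do;
      call prxposn(li_regex, 1, start, length);
      /* convert the matched digits to a numeric value */
      lishares = input(substr(line,start,length), 12.);
    end;
run;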

/* clear our temp response file */
filename li clear;

Result:

That's a lot of code to retrieve the answer for just one link. Thanks to the power of the SAS macro language, I can scale this to retrieve the values for an entire collection of links. With those results in hand, I can run other stats:

Of my 63 posts in the past 12 months, my links have been shared to LinkedIn an average of 4.58 times, with a total of 289 shares overall.

I'm not so naive that I consider these to be impressive numbers, but I've only just begun the habit of sharing my posts on LinkedIn. With this process as part of my daily blog reporting, I can now measure how my "LinkedIn engagement" improves as I share more content. Collect data, count, measure, report -- that's what it's all about, right?

Note: Many web articles, such as blog posts, can have multiple URLs. For example, the WordPress platform offers "short-link" GUID URLs as well as the longer, more descriptive URLs. While all of these different URLs might lead to the same page, LinkedIn counts only the URL you share. So if you are in the habit of publicizing different URLs for convenience or other tracking purposes, you might need to check each permutation of a page URL with this program to get the complete "LinkedIn shares" picture.
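A minimal Python sketch of that bookkeeping: build one count request per URL variant, then add up the counts that come back. The endpoint is the one used in this post. The second URL in the list is a made-up placeholder for a short link, and percent-encoding the target URL is an assumption of good practice (the SAS program above passes the URL as-is). `build_count_url` and `total_shares` are hypothetical helper names.

```python
from urllib.parse import quote

# Endpoint as used in the post.
COUNT_ENDPOINT = "http://www.linkedin.com/countserv/count/share?url="


def build_count_url(page_url):
    """Return the count-request URL for one page URL, percent-encoded."""
    # safe="" also encodes ":" "/" "?" and "=" so the target URL
    # travels as a single query parameter.
    return COUNT_ENDPOINT + quote(page_url, safe="")


def total_shares(counts_by_url):
    """Add the per-URL counts into one figure for the page."""
    return sum(counts_by_url.values())


# First URL is from the post; the second is a placeholder short link.
variants = [
    "http://blogs.sas.com/content/sasdummy/2013/01/15/its-raining-analytics/",
    "http://blogs.sas.com/content/sasdummy/?p=0000",
]

for page_url in variants:
    print(build_count_url(page_url))
```

Each generated URL would be fetched and parsed separately; `total_shares` then combines the counts keyed by variant.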

Reference example

Complete macro version of my LinkedIn shares program (lishares_example.sas)

tags: LinkedIn, PROC HTTP, REST API
February 27, 2013
 

If you're using SAS Enterprise Guide and you're not using custom tasks, you're missing out! Custom tasks are new features you can plug in – features that weren't originally packaged with the software. (And contrary to the Pulp-O-Mizer poster that I created, they do not come FROM OUTER SPACE. Usually.)

Did you know there are dozens of tasks available for you to download and install? These tasks provide all sorts of capabilities, including new analysis and reports, productivity tools for programmers and utilities that help you to manage your projects. Many of these tasks also work with the SAS Add-In for Microsoft Office.

This is the topic of my next SAS Talks session, coming up on March 14, 2013 at 1pm ET. Tune in if you can, but don't worry if you can't -- the session will be recorded and available later as part of the SAS Talks series.

In this session, you'll learn what custom tasks are available and how you can download and install them. You'll also see a few of the tasks in action as part of a live demo. Specific topics include:

  • What are custom tasks, and how are they different from (and similar to) built-in tasks?
  • How to download, install and control access to custom tasks.
  • A quick look at creating your own custom tasks.

Of course, you can expect me to mention my new book during the session. But you don't need to own the book in order to benefit from the information or from the many custom tasks that you can already use today. I hope you'll join me for the session.

tags: SAS custom tasks, sas talks
February 23, 2013
 

SAS Integration Technologies provides a flexible platform to create all types of apps, from simple utilities to full-blown applications. As part of the research for my SAS Global Forum 2013 paper (Create Your Own Client Apps Using SAS Integration Technologies), I've been trying to invent some useful examples that you can run from your Windows desktop. In this post I'll cover how you can use Windows PowerShell plus the SAS Integration Technologies client to connect to a SAS Metadata Server.

Creating objects with SAS Object Manager

When working with the SAS Integration Technologies client, you need a way to create the objects that represent the connections to the SAS services. For that, you must use the SAS Object Manager.

The SAS Object Manager includes a class named ObjectFactory. As the name implies, the "object factory" class is where your subsequent objects will be created. In our examples, we will use the ObjectFactoryMulti2 class to create the connection to the SAS server for use in our applications. After creating that connection, you can use methods on the connection object to get to the other services we need.

To get started with the SAS Object Manager in Windows PowerShell, use the New-Object -ComObject command.

$objFactory = New-Object -ComObject SASObjectManager.ObjectFactoryMulti2

Before you can connect to a SAS server, you must define its attributes to SAS Object Manager. A SAS server has several attributes: a host name, a TCP port number, and a Class Identifier. The Class Identifier is a GUID (a unique ID of 32 hexadecimal digits, usually written with hyphens) that indicates the type of SAS server that you expect to connect to.

How to Find the Correct Class Identifier
If you search support.sas.com you may be able to find a "lookup" table that maps the Class Identifier values to the specific types of SAS servers. However, the most reliable source for these values, and usually the easiest to access, is SAS itself, by way of PROC IOMOPERATE. Here is an example program:

proc iomoperate;
   list types;
quit;

Here's an excerpt of the SAS log, which contains the values:

SAS Metadata Server 
    Short type name  : Metadata 
    Class identifier : 0217e202-b560-11db-ad91-001083ff6836

SAS Stored Process Server 
    Short type name  : StoredProcess 
    Class identifier : 15931e31-667f-11d5-8804-00c04f35ac8c

SAS Workspace Server 
    Short type name  : Workspace 
    Class identifier : 440196d4-90f0-11d0-9f41-00a024bb830c
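If you keep a copy of that log excerpt, a small script can turn it into a lookup table keyed by the short type name. This is a minimal Python sketch that assumes the log layout is exactly as shown above; `class_ids_by_type` is a hypothetical helper name.

```python
import re

# Log excerpt copied from the post.
LOG_EXCERPT = """\
SAS Metadata Server
    Short type name  : Metadata
    Class identifier : 0217e202-b560-11db-ad91-001083ff6836

SAS Stored Process Server
    Short type name  : StoredProcess
    Class identifier : 15931e31-667f-11d5-8804-00c04f35ac8c

SAS Workspace Server
    Short type name  : Workspace
    Class identifier : 440196d4-90f0-11d0-9f41-00a024bb830c
"""


def class_ids_by_type(log_text):
    """Map each short type name to its class identifier (upper-cased)."""
    ids = {}
    short_name = None
    for line in log_text.splitlines():
        m = re.match(r"\s*Short type name\s*:\s*(\S+)", line)
        if m:
            short_name = m.group(1)
            continue
        # A GUID is 32 hex digits plus 4 hyphens: 36 characters in total.
        m = re.match(r"\s*Class identifier\s*:\s*([0-9A-Fa-f-]{36})", line)
        if m and short_name is not None:
            ids[short_name] = m.group(1).upper()
            short_name = None
    return ids


for name, guid in class_ids_by_type(LOG_EXCERPT).items():
    print(name, guid)
```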

You use the SAS Object Manager to create a ServerDef object, and then use the CreateObjectByServer method to establish the connection. You name the SAS Metadata Server port (which is 8561 in a default installation) and the SAS Metadata Server value for the ClassIdentifier. Here is an example:
$objFactory   = New-Object -ComObject SASObjectManager.ObjectFactoryMulti2
$objServerDef = New-Object -ComObject SASObjectManager.ServerDef
 
# assign the attributes of your metadata server
$objServerDef.MachineDNSName  = "yournode.company.com"
$objServerDef.Port            = 8561  # metadata server port
$objServerDef.Protocol        = 2     # 2 = IOM protocol
# Class Identifier for SAS Metadata Server
$objServerDef.ClassIdentifier = "0217E202-B560-11DB-AD91-001083FF6836"
 
# connect to the server
# we'll get back an OMI handle (Open Metadata Interface)
try
{
  $objOMI = $objFactory.CreateObjectByServer(
        "", 
        $true, 
        $objServerDef, 
        "sasdemo",  # metadata user ID
        "Password1" # password
        )
  Write-Host "Connected to " $objServerDef.MachineDNSName 
}
catch [system.exception]
{
  Write-Host "Could not connect to SAS metadata server: " $_.Exception
  exit -1
}

CreateObjectByServer returns a connection to the SAS Metadata Server, sometimes called OMI (which stands for "Open Metadata Interface" and is easily confused with IOM). In the above example program, the connection is in the PowerShell variable named $objOMI.

Most metadata operations require that you know the metadata ID for the repository in which your metadata resides. For most applications that's the Foundation repository. Even though it has a standard name ("Foundation"), the ID value can differ in each installation. So the next step is to use the GetRepositories method to find the ID value for the Foundation repository.

The GetRepositories method returns information within an XML-formatted structure, which you must then parse to get the information you need. Here is an example result from the GetRepositories method:

<Repositories>
 <Repository Id="A0000001.A5B4FV3C" 
    Name="Foundation" Desc="" DefaultNS="SAS"/>
 <Repository Id="A0000001.A5Q23NT1" 
    Name="BILineage" Desc="BILineage" DefaultNS="SAS"/>
</Repositories>

The nugget of information that you need from this example is A0000001.A5B4FV3C, which is the ID for the Foundation repository in this installation. Fortunately, Windows PowerShell provides an XML data type that makes it easy to filter and parse. The following code segment does the job:
# get list of repositories
$reps="" # this is an "out" param we need to define
$rc = $objOMI.GetRepositories([ref]$reps,0,"")

# parse the results as XML
[xml]$result = $reps

# filter down to "Foundation" repository
# (-eq tests for an exact name; -match would also accept partial matches)
$foundationNode = $result.Repositories.Repository | ? {$_.Name -eq "Foundation"} 
$foundationId = $foundationNode.Id

Write-Host  "Foundation ID is $foundationId"  
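
The same lookup can be checked outside PowerShell. This minimal Python sketch parses the sample GetRepositories result shown earlier and returns the Id for a named repository. It assumes the XML has the shape of that sample; `repository_id` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample result copied from the post; Id values differ per installation.
SAMPLE_XML = """
<Repositories>
 <Repository Id="A0000001.A5B4FV3C"
    Name="Foundation" Desc="" DefaultNS="SAS"/>
 <Repository Id="A0000001.A5Q23NT1"
    Name="BILineage" Desc="BILineage" DefaultNS="SAS"/>
</Repositories>
"""


def repository_id(xml_text, name="Foundation"):
    """Return the Id of the repository with exactly this Name, or None."""
    root = ET.fromstring(xml_text.strip())
    for repo in root.findall("Repository"):
        if repo.get("Name") == name:
            return repo.get("Id")
    return None


print(repository_id(SAMPLE_XML))
```

The comparison is an exact string match on the Name attribute, so a repository whose name merely contains "Foundation" would not be picked up.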

There! The connection is made, and you know the Foundation ID. The next step is to use the metadata API methods (such as GetMetadataObjects) to retrieve useful information from the SAS Metadata Server.

I'll describe that in my next post, but if you want a sneak peek you can see the complete code for the example here (hosted on GitHub).

Related links

Using Windows PowerShell to find registered tables and columns in SAS metadata
Running Windows PowerShell scripts
Build your own SAS data set viewer using Windows PowerShell
Example of using Windows PowerShell to query SAS table metadata from SAS Metadata Server (GitHub)

tags: PowerShell, sas administration, SAS Integration Technologies, sasgf13
February 13, 2013
 

John D. Cook shared a picture of "pretty squiggles" on his blog, as well as a prose description of the mathematics behind it.

I'm more of a programmer than a mathematician, but I've attempted to transcribe his description into a SAS program. I used a DATA step to generate the point values, and then PROC SGPLOT with the SERIES statement to plot them.

/* Create a data set with the 3 sets of data points */
data squiggle;
 do time=-100 to 100 by 0.01;
  blue = sin(time);
  /* hope I've got the golden ratio (phi) correct */
  green = 0.7 * sin( ((1+sqrt(5))/2) * time );
  red = blue+green;
  output;
 end;
run;
 
ods graphics / width=2000 height=200;
 
/* Plot the data using a series of SERIES plots */
proc sgplot data=squiggle noautolegend;
 series x=time y=blue  
  / lineattrs=(color=blue  pattern=solid);
 series x=time y=green 
  / lineattrs=(color=green pattern=solid);
 series x=time y=red 
  / lineattrs=(color=red   pattern=solid);
 xaxis display=none ;
 yaxis display=none ;
run;
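
As a quick check on the constant used in the DATA step, this minimal Python sketch computes the same three values for a given time and confirms that (1+sqrt(5))/2 behaves like the golden ratio, which satisfies phi^2 = phi + 1. `squiggle_point` is a hypothetical helper name; the 0.7 amplitude is taken from the SAS program above.

```python
import math

# Golden ratio, about 1.618; the positive root of x^2 = x + 1.
PHI = (1 + math.sqrt(5)) / 2


def squiggle_point(t):
    """Return (blue, green, red) for time t, mirroring the DATA step."""
    blue = math.sin(t)
    green = 0.7 * math.sin(PHI * t)
    red = blue + green
    return blue, green, red


print(round(PHI, 6), math.isclose(PHI * PHI, PHI + 1))
print(squiggle_point(1.0))
```

Because the two sine waves have frequencies in an irrational ratio, their sum never repeats exactly, which is what gives the red curve its irregular look.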

As a variation, I included a "filled" version of the plot using the BAND statement instead of SERIES.

/* a variation that uses a BAND plot instead */
proc sgplot data=squiggle noautolegend;
 band x=time lower=0 upper=blue  
  / transparency=0.6 fillattrs=(color=blue);
 band x=time lower=0 upper=green  
  / transparency=0.6 fillattrs=(color=green);
 band x=time lower=0 upper=red  
  / transparency=0.6 fillattrs=(color=red);
 xaxis display=none;
 yaxis display=none grid;
run;

I don't understand all of the topics that John covers in his blog, but I'm a regular reader anyway. As with Rick Wicklin's blog, I tend to feel smarter while I read it, even though I don't have the ability to explain what I've read to anyone else.

tags: mathematics, SGPLOT
February 10, 2013
 

Netflix has made a big splash in the news with its use of big data. By analyzing millions of data points about the viewing habits of its customers, the movie delivery giant used the insight it gained to devise the "perfect show". One of the defining characteristics of the show, aside from its cast and story line, is in its packaging. House of Cards is delivered in a series of episodes, but released all at once, thus enabling the audience to indulge in "binge viewing".

In my household we are familiar with binge viewing. We don't watch a lot of television, and "media" is all-but-banned during the week when school is in session. That means that during weekends and holidays, in between other scheduled activities, the family lets loose. And since we don't subscribe to any other cable or satellite programming, Netflix receives the biggest share of our attention.

You might remember how I've used SAS to analyze our personal Netflix viewing history. This graph (produced using PROC SGPLOT) shows how our per-day minutes-of-watching has grown ever since Netflix introduced television streaming (around April 2010).


As you can see from the data points on the right side of the plot, we've had our share of Netflix binges. Here's the breakdown by day-of-week. Saturday accounts for 23% of our viewing minutes, while Friday-Sunday add up to 53%. (That accounts for 21,517 minutes -- that's 358 hours! Just imagine if we did watch a lot of TV...).


Okay, so if Netflix's predictive model indicates "viewers watch a lot in one sitting", then our behavior (sometimes) fits that model pretty well. But what about the content? Here are the top 10 serial shows that we've streamed using our account:


I think that this is where our household viewing habits depart from the predicted model. And while House of Cards might be a compelling show, it's not likely to appear on our top 10 chart anytime soon. If Netflix had wanted to design a show especially for us, the plot pitch might have sounded like this:

A groundbreaking sci-fi historical fiction series, featuring several British and Australian actors. Chunked into 22-minute episodes, the stories revolve around two smart boys -- one a fake psychic and the other an OCD-afflicted detective -- who travel through time and space. With their only goal to fill their summer vacation with interesting activities, they consistently foil the evil plots of the antagonist, Dr. Doofenshmirtz. Subplots revolve around Dr. Doofenshmirtz acquiring and dispensing with several wives (his "evil queens"), who seem to always fall victim to his many "-inators".

Will the boys ever get "busted"? Not if the mermaids have anything to say about it.

I'm not sure that such a show would be a commercial success, even though it might be a big hit in the Hemedinger household. It was wise of Netflix to use the accumulated viewing habits of all of its customers, and not just me, to train its analytical models. And I think the House of Cards fans will agree with that.

tags: big data, Netflix, predictive analytics
February 4, 2013
 

When I travel to San Francisco in April for SAS Global Forum 2013, it will be my 12th time attending the international SAS users group conference, and my 7th consecutive year.

A lot of people assume that I automatically go every year, but the truth is that SAS employees have to earn their spots. For most of us, that means we need to write and present a paper. And all papers must be selected and approved by the section chairpersons who serve on the SAS Global Forum committee.

Since my slot is never guaranteed, every year I propose several different paper topics. Using this scattershot approach, I always hope for one or two to be selected. Any more than that (and it's happened) makes for a very busy time leading up to the event, as well as a tough schedule to juggle during the conference week.

I'll have two papers for the 2013 conference, which I consider to be a manageable number. Here's what I'm working on...

Title: For All the Hats You Wear: SAS Enterprise Guide Has Got You Covered
Conference section: SAS Enterprise Guide: Implementation and Usage
Abstract: Are you new to SAS and trying to figure out where to begin? Are you a SAS programmer, already comfortable with code but unsure about new tools? Are you a statistician seeking to apply your techniques in a new way? Are you a data manager, just trying to get your data in shape? Perhaps you're a Jack (or Jill) of all trades trying to manage work in the simplest way possible. Regardless of your background, SAS Enterprise Guide is full of features and techniques to help you achieve your objective. In this paper, we show how you can turn SAS Enterprise Guide into your tool to get work done immediately, without conforming to an entirely new way of working just to become productive.
How I got this one: I was lucky! MaryAnne DePesquo, who chairs the section this year, invited me to present on this topic. This is a great example of how the connections you make at in-person events like SAS user groups can open up new opportunities.

Title: Create Your Own Client Apps Using SAS Integration Technologies
Conference section: Application Development
Abstract: SAS Integration Technologies allows any custom client application to interact with SAS services. SAS Enterprise Guide and SAS Add-In for Microsoft Office are noteworthy examples of what can be done, but your own applications don't have to be that ambitious. This paper explains how to use SAS Integration Technologies components to accomplish focused tasks, such as run a SAS program on a remote server, read a SAS data set, run a stored process, and transfer files between the client machine and the SAS server. Working examples in Microsoft .NET (including C# and Visual Basic .NET) as well as Windows PowerShell are also provided.
How I got this one: During the call-for-papers period I reviewed the description for the Application Development section. The chairpersons indicated interest in Microsoft .NET and SAS Integration Technologies -- two topics that I know something about. So I wrote a proposal that I felt I could achieve (important!).

Now it's time to actually write the papers and develop examples that support them!

tags: SAS global forum, SAS GloFo, sas users, sasgf13
January 31, 2013
 

I've got a new trick that you can use to track progress in a long-running SAS program, while using SAS Enterprise Guide.

I've previously written about the SYSECHO statement and how you can use it to update the Task Status window with custom messages. SYSECHO is a "global" statement in the SAS language, which works great within the boundaries of a PROC step to annotate your Task Status messages. For certain interactive PROCs (like SQL), you can pepper in SYSECHO statements to give more frequent and detailed updates.

But the DATA step, which is often used for long-running operations, cannot use multiple SYSECHO statements. When SAS compiles the DATA step it processes all global statements up front, so that only one use of the SYSECHO statement takes effect. This means that if you want to use multiple SYSECHO calls to report which observation or values are being processed as the DATA step runs, you can't.

That is, you couldn't, until now. With SAS 9.3 Maintenance 2 you can use a new function, named DOSUBL, to process any SAS statements (including global statements) "just in time" as your DATA step runs. DOSUBL accepts one or more SAS statements and executes them immediately, inline with the currently running DATA step. The statements must represent a "complete" segment of a SAS program: no partial PROCs or partial DATA steps. But it can execute any PROC step (bounded by RUN or QUIT) or a DATA step (bounded by RUN) or -- key for our example -- a SAS global statement.

NOTE: DOSUBL is an experimental function in SAS 9.3, and you really need the M2 revision or later for it to work well. It's on target for a production release with SAS 9.4. It's also the topic of a paper by Rick Langston at SAS Global Forum 2013 (here's a preview from Jason's paper last year).

If you have SAS 9.3M2, try running this from your SAS Enterprise Guide session. Note how I put a condition around the DOSUBL call; I don't want to process the statement for every record, as that might get too chatty in the status window. Instead, I call it once for every 5 records.

/* Tracking where you are in the DATA step iterations */
data _null_;
 set sashelp.class;
 if mod(_n_,5)=0 then
  rc = dosubl(cats('SYSECHO "OBS N=',_n_,'";'));
 s = sleep(1); /* contrived delay - 1 second on Windows */
run;
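
To see which iterations report and what statement text DOSUBL receives, here is a minimal Python sketch of the same throttling test. It only imitates the DATA step loop and runs no SAS code. `status_statements` is a hypothetical helper name, and the 19 passed in below is the number of observations in SASHELP.CLASS.

```python
def status_statements(n_obs, every=5):
    """Yield the SYSECHO statement text for each iteration that reports."""
    for n in range(1, n_obs + 1):    # n plays the role of _N_
        if n % every == 0:           # same test as mod(_n_,5)=0
            # CATS joins its arguments after stripping blanks from each.
            yield 'SYSECHO "OBS N={}";'.format(n)


# SASHELP.CLASS has 19 observations, so three status updates are produced.
for statement in status_statements(19):
    print(statement)
```

Raising `every` makes the status window quieter; lowering it gives finer progress at the cost of more DOSUBL calls.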


Here's another example that reports on progress, but instead of triggering the update based on record count, I've triggered it based on the current BY value in process:

proc sort data=sashelp.cars out=sortedcars;
by make;
run;
 
/* Tracking which "category" value is being processed */
data _null_;
 set sortedcars;
 by make;
 if first.make then do;
  /* CATS strips leading blanks from each argument, so use ", MAKE=" */
  rc = dosubl(cats('SYSECHO "OBS N=',_n_,', MAKE=',make,'";'));
 end;
run;

With a little imagination, you can devise some sophisticated SYSECHO statements that report on other aspects of the DATA step processing: the timings, the program data vector, and so on. It might become a useful debugging technique (in lieu of the DATA step debugger, which can't run in a client application like SAS Enterprise Guide).

Other references:

Tracking progress in SAS programs with SAS Enterprise Guide
Executing a PROC from a DATA step

tags: debugging, SAS Enterprise Guide, SYSECHO
January 26, 2013
 

Last year I published an example application for searching your SAS Enterprise Guide project files (EGP files). The example shows off some of the cool features of the automation API, and it's a useful tool.

As neat an example as that was, it had some limitations. It worked only with SAS Enterprise Guide 4.3 installations, so v5.1 users couldn't use it. Some users noticed a strange behavior where the app littered the current directory with a collection of temporary files. And the app didn't support searching subfolders, a popular request from those who tried it. I'm the first to admit that the app stretches the capabilities of SAS Enterprise Guide automation.

In response to these issues, I built a new version: one that does not use the automation API but instead uses techniques that are not documented for end users. This allows me to circumvent the aforementioned limitations and provide just a simple, useful tool. (But without the source code...)

Here's the new version (packaged in a Zip file):

>> DOWNLOAD link for the EGPSearcher tool (SAS Enterprise Guide 4.3 and 5.1)

The Zip package contains 3 versions of the tool: one for 4.3, and two for 5.1 (32-bit and 64-bit). Note: Because you're downloading this from the big scary Internet, you might need to right-click on the EXE file in Windows Explorer, select Properties and Unblock the program so that it can run.

If you have interest (and a bunch of EGP files), give this new version a test run. Let me know via the blog comments how it works for you. I want to know the good and the bad, as I'd like to make it a useful tool for all.

You can still refer to the original application and its source code as a reference for how to use the automation API from a .NET application.

tags: EG-Project-Search-inator, SAS Enterprise Guide