October 25, 2016

Requirements that are the most easily described can often be the most difficult to implement. I’m referring to requests like:

  • Display a gauge with the most recently collected metric.
  • Plot an 18-month rolling window of profit.
  • Display last month’s product percent-of-total metrics for visual comparison.

Okay, these are pretty specific requests (I built a report to answer them), but requirements like these do exist.

Use Rank in SAS Visual Analytics

So, how do you implement these requests? Use rank! You might be wondering how this is possible since the rank feature requires a numeric value and these requirements are based on dates. Solution: use the TreatAs function. Let’s break it down step by step.

But first, here is a breakdown of the report objects used in this report. Notice that this report contains a section prompt via a button bar which prompts the user to select a Product Line. This section prompt filters all of the other objects by that Product Line value.


Step 1: Use TreatAs to create a metric from your date category

I am assuming that your data source has a date category. This technique works with a date, date-by-month, or date-by-year formatted data item. As long as the data item is recognized as a date, this technique will work.

This example will use the Date by Month data item. We will use the TreatAs function to create a metric, or in other words, a numeric representation of the date. That’s the great thing about dates in SAS: they simply represent the number of days before or after January 1, 1960. So the more recent the date, the larger the number, and Rank can use that number to order the dates.
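To make the arithmetic concrete, here is a minimal Python sketch of the same idea. It is not SAS or Visual Analytics code, and the function name `sas_date_value` is illustrative only. It counts days from January 1, 1960, the same way a SAS date value does.

```python
from datetime import date

# SAS stores a date as the signed number of days since January 1, 1960.
SAS_EPOCH = date(1960, 1, 1)


def sas_date_value(d: date) -> int:
    """Return the SAS-style numeric value of a calendar date."""
    return (d - SAS_EPOCH).days


# Later dates always map to larger numbers, which is what makes ranking work.
for d in (date(1959, 12, 31), date(1960, 1, 1), date(2016, 9, 1), date(2016, 10, 1)):
    print(d, sas_date_value(d))
```

Dates before 1960 come out negative. The order is still preserved, so a descending sort always puts the most recent month first.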

From the Data tab, use the drop-down menu and select New Calculated Item….


Give your new calculated data item a name.

The result type will be numeric.

Under Operators, use the search window to find the TreatAs function; then drag that onto the visual pane. For the drop-down option, select _Number_.
Finally, drag the date data item onto the visual pane. In this example, we are using Date by Month.


Step 2: Change the aggregation on your new measure to be non-additive

Next, we need to make sure this new metric that represents the Date by Month date is non-additive. We will not get the proper result if this new metric takes the sum or average when displayed on a visualization. To do this, navigate to the Data tab and click on the name of the new metric you created. In my example, I created a new metric named DateByMonthNum.

Toward the bottom of the Data tab are the data properties. Under the Aggregation property, use the drop-down menu and select one of the non-additive aggregations, such as Minimum, Median, or Maximum.
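The following Python sketch shows why the aggregation matters. A visualization aggregates the measure over every detail row that falls into a month. The sketch uses made-up rows (three in September 2016, two in October 2016) and assumes, as in the verification table in Step 3, that every row in a month carries the same numeric date value.

```python
from collections import defaultdict
from statistics import median

# Hypothetical detail rows: (month label, numeric date value of that month).
# 20698 and 20728 are the SAS date values of 01SEP2016 and 01OCT2016.
rows = [
    ("SEP2016", 20698), ("SEP2016", 20698), ("SEP2016", 20698),
    ("OCT2016", 20728), ("OCT2016", 20728),
]

by_month = defaultdict(list)
for month, value in rows:
    by_month[month].append(value)

aggregations = {"Sum": sum, "Minimum": min, "Median": median, "Maximum": max}
summary = {
    month: {name: func(values) for name, func in aggregations.items()}
    for month, values in by_month.items()
}

for month, results in summary.items():
    print(month, results)
```

With Sum, September (three rows) gets a larger value than October (two rows), so a Top Count rank would pick the wrong month. Minimum, Median, and Maximum all return each month's own date value.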


Step 3: Verify that your new measure returns the correct results

Now we can verify that when we rank our new measure, we get the expected results. To do this, I used a list table and added both the date data item Date by Month and the new metric data item DateByMonthNum. When I sort the metric data item in descending order, I get the expected results: each Date by Month value gives me a different DateByMonthNum value, and more recent Date by Month values pair with larger DateByMonthNum values.


To be sure that you properly assigned a non-additive aggregation type, you can use the Show detail data property from the Properties tab. At the detail level, you should see the same value pairs for the date and metric data items. When you deselect Show detail data, you should see exactly the same value pairs. If you do, then you have correctly assigned your non-additive aggregation type.


Step 4: Use Rank to meet report requirements

Now that we have our metric properly created, we can use the Rank feature to display the last month’s metrics or a rolling window.

Last Month’s Metrics
In this visualization I used the Gauge Object.


On the Roles tab, I assigned Profit to the Measure role and Product to the Group role. I then created a five-interval Display Rule from 0% to 50% in 10% steps, with anything over 50% grouped together under the darkest green rule.


Now we must filter this visualization to display only the last month’s profit metrics; we do this by using the Rank feature. From the Ranks tab, you must first select the category data item you wish to subset by the rank. In our example, we want to display the last month’s metrics, so we will want to add a rank for the Date by Month data item. Once selected, click the button Add Rank.


Next, we need to select the metric we want to rank by. From the By drop-down list, select our newly created metric DateByMonthNum. Then select the type of rank and how many values to return. In this example, we will return the Top Count, that is, the greatest value, and for the Count we want to return 1.


To help with the titling of the report, I added the exact same rank to a List Table object to display the data’s last month and to help report users know which month they are looking at.



Rolling 18 Month Window
The next visualization I created was a Line Chart Object plotting profit over a rolling 18-month window.


On the Roles tab, I selected Date by Month as the Category and Profit as the Measure.
On the Ranks tab, I selected the same values as I did for the list table and gauge objects, except that I selected a Count of 18 to return the top 18 values of Date by Month ranked on our newly created metric DateByMonthNum. The rank returns the 18 highest values of DateByMonthNum, which pair with the 18 most recent values of Date by Month, giving us a rolling 18-month window.
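The Top Count rank behaves roughly like "sort the months by their numeric date value in descending order and keep the first N". The Python sketch below illustrates that behavior. The month list and the `top_count` helper are hypothetical. With N = 1 it gives the gauge's most recent month, and with N = 18 it gives the rolling window.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)


def sas_date_value(d: date) -> int:
    """Days since January 1, 1960, as SAS stores dates."""
    return (d - SAS_EPOCH).days


def top_count(months, n):
    """Mimic a Top Count rank: keep the n months with the largest date value."""
    return sorted(months, key=sas_date_value, reverse=True)[:n]


# Hypothetical data: the first day of each month from Jan 2014 through Oct 2016.
months = [date(y, m, 1) for y in range(2014, 2017) for m in range(1, 13)
          if date(y, m, 1) <= date(2016, 10, 1)]

latest = top_count(months, 1)
window = top_count(months, 18)
print("Latest month:", latest[0])
print("Window:", min(window), "to", max(window))
```

The rank counts only the months that are present in the data. If a month has no rows, the 18 ranked months will span more than 18 calendar months, so check your own data for gaps.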


Other Applications

In this example, I used Rank at the month level, but you could use this technique at the day level, the quarter level, or essentially any supported date interval.

Assuming you have the proper data collected, you could also use Rank for the standard use of ranking the top X performing products, sales representatives or investment funds. You could also use rank to identify your bottom performing manufacturing equipment, car mileage, or school ratings.


tags: SAS Professional Services, SAS Programmers, SAS Visual Analytics

Use Rank in SAS Visual Analytics to display the last date, month or rolling window was published on SAS Users.

October 21, 2016

Have you ever needed to run code based on the client application that you are using? Or have you needed to know the version of SAS® software that you are running and the operating system that you are running it on? This blog post describes a few automatic macro variables that can help with gathering this information.

Application Name

You can use the &_CLIENTAPP macro variable to obtain the name of the client application. Here are some details:

  • Referencing &_CLIENTAPP in SAS® Studio returns a value of SAS Studio
  • Referencing &_CLIENTAPP in SAS® Enterprise Guide® returns a value of 'SAS Enterprise Guide'
    Note: The quotation marks around SAS Enterprise Guide are part of the value.

Program Name

You can use the &SYSPROCESSNAME macro variable to obtain the name of the current SAS process. Here are some details:

  • Referencing &SYSPROCESSNAME interactively within the DMS window returns a value of DMS Process
  • Referencing &SYSPROCESSNAME in the SAS windowing environment of your second SAS session returns a value of DMS Process (2)
  • Referencing &SYSPROCESSNAME in SAS Enterprise Guide or SAS Studio returns a value of Object Server
  • Referencing &SYSPROCESSNAME in batch returns the word Program followed by the name of the program being run (for example: Program '')
    Note: For information about other techniques for retrieving the program name, see SAS Note 24301: “How to retrieve the program name that is currently running in batch mode or interactively.”


The following code illustrates how you can use both of these macro variables to check which client application you are using and display a message in the SAS log based on that result:

%macro check;
  %if %symexist(_clientapp) %then %do;
    %if &_clientapp = SAS Studio %then %do;
      %put Running SAS Studio;
    %end;
    %else %if &_clientapp = 'SAS Enterprise Guide' %then %do;
      %put Running SAS Enterprise Guide;
    %end;
  %end;
  %else %if %index(&sysprocessname,DMS) %then %do;
    %put Running in Display Manager;
  %end;
  %else %if %index(&sysprocessname,Program) %then %do;
    /* the program name is the second blank-delimited word of &SYSPROCESSNAME */
    %let prog=%qscan(%superq(sysprocessname),2,%str( ));
    %put Running in batch and the program running is &prog;
  %end;
%mend check;
%check

SAS Session Run Mode or Server Type

Another helpful SAS read-only automatic macro variable is &SYSPROCESSMODE. You can use &SYSPROCESSMODE to obtain the current SAS session run mode or server type name. Here is a list of possible values:

  • SAS Batch Mode
  • SAS/CONNECT Session
  • SAS DMS Session
  • SAS IntrNet Server
  • SAS Line Mode
  • SAS Metadata Server
  • SAS OLAP Server
  • SAS Pooled Workspace Server
  • SAS Share Server
  • SAS Stored Process Server
  • SAS Table Server
  • SAS Workspace Server

Operating System and Version of SAS

Having the information detailed above is helpful, but you might also need to know the operating system and exact version of SAS that you are running. The following macro variables help with obtaining this information.

You can use &SYSSCP and &SYSSCPL to obtain an abbreviation of the name of your operating system.  Here are some examples:


For a complete list of values, see the “SYSSCP and SYSSCPL Automatic Macro Variables” section of SAS® 9.4 Macro Language: Reference, Fourth Edition.

SAS Release

&SYSVLONG4 is the most informative of the macro variables that provide SAS release information. You can use it to obtain the release number and maintenance level of SAS as well as a four-digit year. Here is an example:

%put &sysvlong4;

This code would print something similar to the following in the log:


Here is what this output means:

SAS release: 9.04.01

Maintenance level: M3

Ship Event date: D06292015
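You might capture this value outside SAS (for example, from a saved log) and want to split it into its parts. Here is a minimal Python sketch for that. It assumes the value is the three parts above run together, such as 9.04.01M3D06292015, and that the eight digits after the leading letter are an MMDDYYYY date. Check both assumptions against your own log before relying on them.

```python
import re
from datetime import date

# Assumed layout: release, maintenance level, then one letter and an MMDDYYYY date.
SYSVLONG4_PATTERN = re.compile(
    r"^(?P<release>\d+\.\d+\.\d+)(?P<maint>M\d+)(?P<prefix>[A-Z])(?P<mmddyyyy>\d{8})$"
)


def parse_sysvlong4(value: str) -> dict:
    """Split a SYSVLONG4-style string into release, maintenance level, and ship event."""
    match = SYSVLONG4_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"Unrecognized SYSVLONG4 value: {value!r}")
    mmddyyyy = match["mmddyyyy"]
    ship_date = date(int(mmddyyyy[4:]), int(mmddyyyy[:2]), int(mmddyyyy[2:4]))
    return {
        "release": match["release"],
        "maintenance_level": match["maint"],
        "ship_event": match["prefix"] + mmddyyyy,
        "ship_date": ship_date,
    }


print(parse_sysvlong4("9.04.01M3D06292015"))
```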

I hope that some of the tools described above are useful to you for obtaining information about your SAS environment. If you have any questions, please contact SAS Technical Support, and we will be happy to assist you. Thank you for using SAS!

tags: macro, Problem Solvers, SAS Macro, SAS Programmers

Macro variables that provide information about your SAS® environment was published on SAS Users.

October 20, 2016

In a previous blog post, I demonstrated combining the power of SAS Event Stream Processing (ESP) and the SAS Quality Knowledge Base (QKB), a key component of our SAS Data Quality offerings. In this post, I will expand on the topic and show how you can work with data from multiple QKB locales in your event stream.

To illustrate how to do this I will review an example where I have event stream data that contains North American postal codes.  I need to standardize the values appropriately depending on where they are from – United States, Canada, or Mexico – using the Postal Code Standardization definition from the appropriate QKB locale.  Note: This example assumes that the QKB for Contact Information has been installed and the license file that the DFESP_QKB_LIC environment variable points to contains a valid license for these locales.

In an ESP Compute window, I first need to initialize the call to the BlueFusion Expression Engine Language function and load the three QKB locales needed – ENUSA (English – United States), ENCAN (English – Canada), and ESMEX (Spanish – Mexico).


Next, I need to call the appropriate Postal Code QKB Standardization definition based on the country the data is coming from.  However, to do this, I first need to standardize the Country information in my streaming data; therefore, I call the Country (ISO 3-character) Standardization definition.


After that is done, I do a series of if/else statements to standardize the Postal Codes using the appropriate QKB locale definition based on the Country_Standardized value computed above.  The resulting standardized Postal Code value is returned in the output field named PostalCode_STND.
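The if/else logic is essentially a dispatch on the standardized country code. The Python sketch below mirrors only that control flow and does not call the QKB. The three `standardize_*` functions are hypothetical, deliberately simplified stand-ins for the locale-specific Postal Code standardization definitions. Passing unknown countries through unchanged is also an assumption of the sketch.

```python
import re


# Hypothetical, simplified stand-ins for the QKB Postal Code definitions.
def standardize_us(code: str) -> str:
    digits = re.sub(r"\D", "", code)  # keep digits only
    return f"{digits[:5]}-{digits[5:]}" if len(digits) == 9 else digits


def standardize_ca(code: str) -> str:
    compact = re.sub(r"\s", "", code).upper()
    return f"{compact[:3]} {compact[3:]}" if len(compact) == 6 else compact


def standardize_mx(code: str) -> str:
    return re.sub(r"\D", "", code)


# Country_Standardized (ISO 3-character code) -> (QKB locale, standardizer)
DISPATCH = {
    "USA": ("ENUSA", standardize_us),
    "CAN": ("ENCAN", standardize_ca),
    "MEX": ("ESMEX", standardize_mx),
}


def postal_code_stnd(country_standardized: str, postal_code: str) -> str:
    """Pick the locale-specific standardizer for this event's country."""
    entry = DISPATCH.get(country_standardized)
    if entry is None:
        return postal_code  # unknown country: pass the value through unchanged
    _locale, standardize = entry
    return standardize(postal_code)


print(postal_code_stnd("CAN", "k1a0b1"), postal_code_stnd("USA", "27513 1234"))
```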


I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.


Here is the XML code for the SAS ESP project reviewed in this blog:


Now that the Postal Code values for the various locales have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.

For more information, please refer to the product documentation:

Learn more about a sustainable approach to data quality.

tags: data management, SAS Data Quality, SAS Event Stream Processing, SAS Professional Services

Using multiple SAS Quality Knowledge Base locales in a SAS Event Stream Processing compute window was published on SAS Users.

October 12, 2016

Recently, one of my sons came to me and asked about something called “The Monty Hall Paradox.” They had discussed it in school, and he was having a hard time understanding it (as you often do with paradoxes).

For those of you who may not be familiar with the Monty Hall Paradox, it is named for the host of a popular TV game show called “Let’s Make a Deal.” On the show, a contestant would be selected and shown a valuable prize. Monty Hall would then explain that the prize was behind one of three doors and ask the contestant to pick a door. Once a door was selected, Monty would tease the contestant with cash to get him/her to either abandon the game or switch to another door. Invariably, the contestant would stand firm, and then Monty would proceed to show the contestant what was behind one of the other doors. Of course, it wouldn’t be any fun if the prize was behind the revealed door, so after showing the contestant an empty door, Monty would ply them with even more cash in the hopes that they would abandon the game or switch to the remaining door.

Almost without fail, the contestant would stand firm in their belief that their chosen door was the winner and would not switch to the other door.

So where’s the paradox?

When left with two doors, most people assume that they've got a 50/50 chance at winning. However, the truth is that the contestant will double his/her chance of winning by switching to the other door.

After I explained this to my son, it occurred to me that this would be an excellent exercise for coding in Python and in SAS to see how the two languages compared. Like many of you reading this blog, I’ve been programming in SAS for years, so the struggle for me was coding this in Python.

I kept it simple. I generated my data randomly and then applied simple logic to each row and compared the results.  The only difference between the two is in how the languages approach it.  Once we look at the two approaches then we can look at the answer.

First, let's look at SAS:

data choices (drop=max u u2);
max = 3;  /* number of doors */
do i = 1 to 10000;
	u = rand('uniform'); u2 = rand('uniform');
	prize = ceil(max*u);
	choice = ceil(max*u2);
	output;
end; run;

I started by generating two random numbers for each row in my data. The first random number will be used to randomize the prize door and the second will be used to randomize the choice that the contestant makes. The result is a dataset with 10,000 rows each with columns ‘prize’ and ‘choice’ to represent the doors.  They will be random integers between 1 and 3.  Our next task will be to determine which door will be revealed and determine a winner.
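The expression ceil(max*u) turns a uniform random number into a door. With max = 3, values of u in (0, 1/3] give door 1, values in (1/3, 2/3] give door 2, and values in (2/3, 1) give door 3, so each door is equally likely. Here is a small Python sketch of that mapping. It assumes u is never exactly 0, because ceil(0) would produce a nonexistent door 0. Python's random.random() can return 0.0, so the sketch redraws in that case.

```python
import math
import random
from collections import Counter


def door_from_uniform(u: float, max_door: int = 3) -> int:
    """Map u in the open interval (0, 1) to a door 1..max_door, as ceil(max*u) does."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return math.ceil(max_door * u)


def uniform_open(rng: random.Random) -> float:
    """Draw from (0, 1); random() can return 0.0, which would map to door 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


rng = random.Random(1)
counts = Counter(door_from_uniform(uniform_open(rng)) for _ in range(30_000))
print(sorted(counts.items()))
```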

If our prize and choice are two different doors, then we must reveal the third door. If the prize and choice are the same, then we must choose a door to reveal. (Note: I realize that my reveal logic is somewhat flawed. When the prize and the choice are the same door, the IF…ELSE IF chain always reveals the same one of the two remaining doors instead of picking one at random. Because the prize and choice are themselves random, this doesn’t bias the results, and coding it this way was much simpler.)

data results;
set choices;
by i;
if prize in (1,2) and choice in (1,2) then reveal=3;
else if prize in (1,3) and choice in (1,3) then reveal=2;
else if prize in (2,3) and choice in (2,3) then reveal=1;

Once we reveal a door, we must give the contestant the option to switch. The switch column holds the door of a contestant who always switches; the neverswitch column holds the door of a contestant who never switches.

if reveal in (1,3) and choice in (1,3) then do;
	switch = 2; neverswitch = choice; end;
else if reveal in (2,3) and choice in (2,3) then do;
	switch = 1; neverswitch = choice; end;
else if reveal in (1,2) and choice in (1,2) then do;
	switch = 3; neverswitch = choice; end;

Now we create a win flag for each strategy (1 = win, 0 = loss).

	switchwin = (switch=prize);
	neverswitchwin = (neverswitch=prize);
run;
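For comparison, the same reveal, switch, and win logic for a single game can be written with set arithmetic. This Python sketch is not the author's code. Unlike the IF…ELSE IF chain above, it picks Monty's door at random when the contestant's first choice is the prize. The seed value is arbitrary.

```python
import random


def play(rng):
    """Play one game; return (switch_win, never_switch_win) as 1/0 flags."""
    doors = {1, 2, 3}
    prize = rng.randint(1, 3)
    choice = rng.randint(1, 3)
    # Monty opens a door that is neither the contestant's choice nor the prize.
    reveal = rng.choice(sorted(doors - {prize, choice}))
    # Switching means taking the one door that is neither chosen nor revealed.
    (switch,) = doors - {choice, reveal}
    return int(switch == prize), int(choice == prize)


def simulate(n, seed=2016):
    """Return the win rates (always switch, never switch) over n games."""
    rng = random.Random(seed)
    switch_wins = never_wins = 0
    for _ in range(n):
        s, k = play(rng)
        switch_wins += s
        never_wins += k
    return switch_wins / n, never_wins / n


switch_rate, never_rate = simulate(10_000)
print(f"Always switch: {switch_rate:.2%}  Never switch: {never_rate:.2%}")
```

In every game exactly one of the two strategies wins, so the two rates always add up to 100%.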

Next, let’s start accumulating our results across all of our observations. We’ll keep a running tally of how many times a contestant who switches wins, as well as how many times a contestant who never switches wins.

data cumstats;
set results;
format cumswitch cumnever comma8.;
format pctswitch pctnever percent8.2;
retain cumswitch cumnever;
if _N_ = 1 then do;
	cumswitch = 0; cumnever = 0;
end;
/* add this contestant's wins to the running totals */
cumswitch = cumswitch+switchwin;
cumnever = cumnever+neverswitchwin;
pctswitch = cumswitch/i;
pctnever = cumnever/i;
run;
proc means data=results n mean std;
var switchwin neverswitchwin;
run;
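/* legend, symbol, pattern, and axis definitions for the area plot */
legend1 frame;
symbol1 interpol=splines;
symbol2 interpol=splines color=red;
pattern1 value=ms;
axis1 order=(0 to 1 by 0.1) minor=none;   /* left axis: never switch */
axis2 minor=none;                         /* horizontal axis: contestants */
axis3 order=(0 to 1 by 0.1) minor=none;   /* right axis: always switch, same scale */
title1 " Cumulative chances of winning on Let's Make a Deal ";
proc gplot data=work.cumstats;
	plot pctnever * i = 1 /
		areas=1 frame vaxis=axis1 haxis=axis2;
	plot2 pctswitch * i = 2 /
		vaxis=axis3 legend=legend1;
run; quit;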


The output of PROC MEANS shows that people who always switch (switchwin) have a win percentage of nearly 67%, while people who never switch (neverswitchwin) have a win percentage of only 33%. The area plot makes the same point graphically, showing that the win percentage of switchers is well above that of non-switchers.

Now let’s take a look at how I approached the problem in Python (keeping in mind that this language is new to me).

Now, let’s look at Python:

Copied from Jupyter Notebook

import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import accumulate
%matplotlib inline

First let's create a blank dataframe with 10,000 rows and 10 columns, then fill in the blanks with zeros.

rawdata = {'index': range(10000)}
df = pd.DataFrame(rawdata,columns=['index','prize','choice','reveal','switch','neverswitch','switchwin','neverswitchwin','cumswitch','cumnvrswt'])
df = df.fillna(0)

Now let's populate our columns. The prize column represents the door that contains the new car! The choice column represents the door that the contestant chose. We will populate them both with a random number between 1 and 3.

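for row in df['index']:
    df.loc[row, 'prize'] = random.randint(1, 3)    # door hiding the car
    df.loc[row, 'choice'] = random.randint(1, 3)   # contestant's first pick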

Now that the contestant has chosen a door, Monty Hall reveals an empty door that the contestant did not choose.

reveal = []
for i in range(len(df)):
    if (df['prize'][i] in (1,2) and df['choice'][i] in (1,2)):
        reveal.append(3)
    elif (df['prize'][i] in (1,3) and df['choice'][i] in (1,3)):
        reveal.append(2)
    elif (df['prize'][i] in (2,3) and df['choice'][i] in (2,3)):
        reveal.append(1)
df['reveal']= reveal

Here's the rub. The contestant has chosen a door, Monty has revealed a blank door, and now he's given the contestant the option to switch to the other door. Most of the time the contestant will not switch even though they should. To prove this, we create a column called 'switch' that reflects a contestant that ALWAYS switches their choice. And, a column called 'neverswitch' that represents the opposite.

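switch = []
for i in range(len(df)):
    if (df['reveal'][i] in (1,3) and df['choice'][i] in (1,3)):
        switch.append(2)
    elif (df['reveal'][i] in (1,2) and df['choice'][i] in (1,2)):
        switch.append(3)
    elif (df['reveal'][i] in (2,3) and df['choice'][i] in (2,3)):
        switch.append(1)
df['switch'] = switch
df['neverswitch'] = df['choice']   # never switching keeps the first choice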

Now let's create a flag for when the Always Switch contestant wins and a flag for when the Never Switch contestant wins.

for i in range(len(df)):
    if (df['switch'][i]==df['prize'][i]):
        df.loc[i, 'switchwin'] = 1
    if (df['neverswitch'][i]==df['prize'][i]):
        df.loc[i, 'neverswitchwin'] = 1

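Now we accumulate the total number of wins for each contestant.

df['cumswitch'] = list(accumulate(df['switchwin']))
df['cumnvrswt'] = list(accumulate(df['neverswitchwin']))
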
…and divide by the number of observations for a win percentage.

df['pctswitch'] = df['cumswitch'] / (df['index'] + 1)
df['pctnever'] = df['cumnvrswt'] / (df['index'] + 1)

Now we are ready to plot the results. Green represents the win percentage of Always Switch, blue represents the win percentage of Never Switch.

x = df['index'] + 1     # number of contestants so far
y = df['pctswitch']     # Always Switch win percentage
y2 = df['pctnever']     # Never Switch win percentage
fig, ax = plt.subplots(1, 1, figsize=(12, 9))
ax.plot(x,y,lw=3, label='Always', color='green')
ax.plot(x,y2,lw=3, label='Never',color='blue',alpha=0.5)
ax.fill_between(x,y2,y, facecolor='green',alpha=0.6)
ax.fill_between(x,0,y2, facecolor='blue',alpha=0.5)
ax.set_ylabel("Win Pct",size=14)
ax.legend()
plt.title("Cumulative chances of winning on Let's Make a Deal", size=16)
plt.show()


Why does it work?

Most people think that because there are two doors left (the door you chose and the door Monty didn’t show you) that there is a fifty-fifty chance that you’ve got the prize.  But we just proved that it’s not, so “what gives”?

Remember that the door you chose at first has a 1/3 chance of winning. That means that the other two doors combined have a 2/3 chance of winning. Even though Monty showed us what’s behind one of those two doors, the two of them together still have a 2/3 chance of winning. Since you know one of them is empty, the remaining door you didn’t pick MUST have a 2/3 chance of winning. You should switch. The green line in the Python graph (or the red line in the SAS graph) shows that after 10,000 contestants had run through the game, the people who always switched won about 67% of the time, while the people who never switched won only about 33% of the time.
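The 2/3 figure can also be checked exactly, without simulation, by enumerating every case. The Python sketch below assumes each of the 9 (prize, first choice) pairs is equally likely. When Monty has two empty doors to choose from, it assumes he picks each with equal probability.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def exact_win_probabilities():
    """Exact P(win) for 'always switch' and 'never switch' by full enumeration."""
    switch_p = Fraction(0)
    stay_p = Fraction(0)
    for prize, choice in product(DOORS, repeat=2):
        pair_p = Fraction(1, len(DOORS) ** 2)  # 9 equally likely (prize, choice) pairs
        # Monty may open any door that is neither the prize nor the first choice.
        options = [d for d in DOORS if d not in (prize, choice)]
        for reveal in options:
            p = pair_p / len(options)
            switched = next(d for d in DOORS if d not in (choice, reveal))
            if switched == prize:
                switch_p += p
            if choice == prize:
                stay_p += p
    return switch_p, stay_p


switch_p, stay_p = exact_win_probabilities()
print("Always switch:", switch_p, "| Never switch:", stay_p)
```

The three cases where the first choice is right contribute the 1/3 for staying. The six cases where it is wrong each leave the prize as the only door to switch to, which contributes the 2/3 for switching.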

My comparisons and thoughts on SAS and Python

In terms of number of lines of code required, SAS wins hands down.  I only needed 57 lines of code to get the result in SAS, compared to 74 lines in Python. I realize that experience has a lot to do with it, but I think there is an inherent verbosity to the Python code that is not necessarily there in SAS.

In terms of ease of use, I’m going to give the edge to Python. I really liked how easy it was to generate a random number between two values. In SAS, you have to actually perform arithmetic functions to do it, whereas in Python it’s a built-in function. The same was true for accumulating totals: in Python, it was the accumulate function, while in SAS it took a RETAIN statement to carry the running totals from one observation to the next.

In terms of iterative ability and working “free style,” I give the edge to SAS. With Python, it is easy to iterate, but I found myself having to start all over again to pre-define columns, packages, etc., in order to complete my analysis. With SAS, I could just code. I didn’t have to start over because I created a new column. I didn’t have to start over because I needed to figure out which package I needed, find it on GitHub, install it, and then import it.

In terms of tabular output, SAS wins.  Easy to read, easy to generate.

In terms of graphical output, Python edges SAS out. Both are verbose and tedious to get working. Python wins because the output is cleaner and there are many more options.

In terms of speed, SAS wins.  On my laptop, I could change the number of rows from 10,000 to 100,000 without noticing much of a difference in speed (0.25 – 0.5 seconds).  In Python, anything over 10,000 got slow.  10,000 rows was 6 seconds, 100,000 rows was 2 minutes 20 seconds.

Of course, this speed has a resource cost.  In those terms, Python wins.  My Anaconda installation is under 2GB of disk space, while my particular deployment of SAS requires 50GB of disk space.

Finally, in terms of mathematics, they tied.  They both produce the same answer as expected.  Of course, I used extremely common packages that are well used and tested.  Newer or more sophisticated packages are often tested against SAS as the standard for accuracy.

But in the end, comparing the two as languages is limited. Python is a much more versatile object-oriented language that has capabilities that SAS doesn’t have, while SAS’ mature DATA step can do things to data that Python has difficulty with. Most important, though, is the release of SAS Viya. Through Viya’s open APIs and microservices, SAS is transforming itself into something more than just a coding language: it aims to be the analytical platform that all data scientists can use to get their work done.

tags: Python, SAS Programmers

The Monty Hall Paradox - SAS vs. Python was published on SAS Users.

October 6, 2016

One very useful type of auditing for a SAS administrator is to have summary data about the availability and performance of various resources (platforms, servers, services) from the 30,000-foot view. Using SAS Environment Manager, it's easy to look at the availability of any one resource over various time spans: the past few hours, past day, past week, past month, and more. This is a very powerful way to summarize how much of the time a given resource was "up and responsive."


However, there's no way to see that type of information, or even a summary of that information, for all your servers at once. A typical deployment will have anywhere from 10 to 50 or more different servers, and to view the availability of them all over an extended period, you would have to visit and drill down into each resource, one at a time.

To address this problem, I've developed a simple report that summarizes the availability for all servers. It uses two data sets that are automatically generated as part of the SAS Environment Manager Data Mart: availability and resourceinventory. It doesn't provide the hour-by-hour information that the Monitoring interface of SAS Environment Manager does, but it gives you a percent of time available for each server, for the past day, week, month, or even quarter, in one summary report (if you have the data for it). For a production environment, this could provide helpful "big-picture" information on availability.
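The core arithmetic of such a report is simple. The minimal Python sketch below assumes each availability record carries a server name, an availability flag (1 = up, 0 = down), and the number of minutes the record covers. The server names and records are made up. The real data mart columns (avail, minutes) may encode availability differently, so treat these field meanings as assumptions.

```python
from collections import defaultdict

# Hypothetical availability records: (server name, avail flag, minutes covered).
records = [
    ("Metadata Server", 1, 1380), ("Metadata Server", 0, 60),
    ("Workspace Server", 1, 1440),
    ("Web Server", 1, 1200), ("Web Server", 0, 240),
]


def percent_available(rows):
    """Return {server: percent of covered minutes during which it was up}."""
    up = defaultdict(float)
    total = defaultdict(float)
    for server, avail, minutes in rows:
        total[server] += minutes
        up[server] += avail * minutes
    return {server: 100.0 * up[server] / total[server] for server in total}


for server, pct in sorted(percent_available(records).items()):
    flag = "  <-- below 100%" if pct < 100 else ""
    print(f"{server:20s} {pct:6.2f}%{flag}")
```

The "below 100%" marker plays the same role as the yellow highlighting in the finished report.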

Building the report will involve copying an existing report, used as a template, modifying a bit of the metadata for that report, and copying the SAS code that generates the report.  The report uses a stored process to do the work.

Here's how to create the report.

1. Copy the SAS code provided here and save it in a local file:

If you are working on a remote machine, such as through Remote Desktop or MobaXterm, then upload the file to the machine running the metadata server so that it will be available to copy into the report metadata.

2. Log into SAS Management Console as sasadm@saspw, the SAS administrator.

3. Select the Folders tab (upper left), right-click the Shared Data folder, select New->Folder, and name it Custom Reports:


4. Navigate to the folder Products->SAS Environment Manager->Custom and locate the existing example report called Example 2 Call EV Macros with prompts.


5. Copy the example report into the new Custom Reports folder created in Step 3 above, then use the right-click menu to rename it to Availability Report. When complete, you will see the new folder and report inside it:


6. Select the new Availability Report, right-click, and select Properties.

7. On the General tab, specify Name, Description, and Keywords as shown:


It's especially important to have the keywords "Environment Manager" in the Keywords field, so that the report will appear in the Report Center of Environment Manager.

8. On the Execution tab, select your Application server, and below, specify Store source code in metadata.  Then click the Edit Source Code... button:


9. Copy/paste the SAS code (Step 1 above) into the source code window and click OK to save it.


10.  On the Parameters tab, you'll see Titles and Footnotes, and Output Formats and Debugging.  Select and Delete the Titles and Footnotes group; leave the Output Formats and Debugging group as is.


11. Select General, then click the New Group... button (right side), and specify:

Group type:  Standard group

Displayed text: Time Period

Click OK button


12. Select the new group, Time Period.  Click the New Prompt button.  Type the string "TimePeriod" (no space) in the Name field and “Time Period” in the Displayed Text field.  Then click OK.


13.  Select the new prompt TimePeriod, click the Edit button on the right side.  Select the Prompt Type and Values tab at the top.  On the Edit Prompt screen, specify:

Prompt type:   Text

Method for populating prompt:  User selects values from a static list

Number of values: Single value

Use the Add button (right side) to insert the following four choices shown below:


Set the "week" value as the default as shown.   Click the OK button.  Now, select the new Time Period prompt and click the Move Up button (right side), so that the Time Period prompt appears first:


If desired, you can try the Test Prompts button to make sure your prompts are correct.  Click OK again to save all changes.

14. Test the new report:
Log into SAS Environment Manager as an administrator and go to the Report Center. Open the Shared Data / Custom Reports folder and find the new report "Availability Report."  (If you were already logged into Environment Manager, you have to log out/log in again to pick up the new metadata.)


Notice that you have the custom prompt for Time Period, where you can choose "day," "week," "month," or "quarter" for the time window, and the standard prompts for Outputting Styles and Logging and Debugging settings. Click the Run button at the bottom, and you should see your new report, similar to this: (Yellow indicates any percent less than 100.)


You can install this report for any site with the SAS Environment Manager Data Mart active and have it running in just a few minutes. You could also pull the SAS code from it and create your own scheduled run, weekly or monthly, to help evaluate a deployment for reliability.

A few cautions are in order:

  • You need to have at least some data in your data mart in order to run this report. If you don't have enough data to cover the time period that you request, you will get an error message.
  • The data this report provides is summary level, so if you find one or more servers that have been down a significant amount of time, the next step would be to use SAS Environment Manager to focus on those troubled resources and examine exactly WHEN they were down, for exactly how long, how many times they were down, and/or look at other metrics on those resources as a way to ascertain causes.
  • This is a simple program without much error checking. However, you can use the report's parameters to display the SAS log, which should help you debug any issues that occur. And of course, feel free to enhance it or add to it!
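The "not enough data" case in the first caution comes from a simple comparison in the SAS code. The report window starts one TimePeriod interval before the most recent end time, and if the earliest record starts after that point, &enoughdata is set to 0. Here is a Python sketch of the same check. It is a simplification: the SAS code uses INTNX with dtday, dtweek, dtmonth, or dtqtr, which follows the calendar, while this sketch assumes fixed spans of 30 days for a month and 91 days for a quarter.

```python
from datetime import datetime, timedelta

# Simplified look-back spans for the report's TimePeriod values (assumed lengths
# for months and quarters; INTNX in the SAS code follows the actual calendar).
LOOKBACK = {
    "dtday": timedelta(days=1),
    "dtweek": timedelta(weeks=1),
    "dtmonth": timedelta(days=30),
    "dtqtr": timedelta(days=91),
}


def report_window(period, earliest_start, latest_end):
    """Return (begin time, minutes in window, whether the data covers the window)."""
    begin = latest_end - LOOKBACK[period]
    total_minutes = int((latest_end - begin).total_seconds() // 60)
    enough_data = earliest_start <= begin  # SAS sets enoughdata=0 if earliest > begintime
    return begin, total_minutes, enough_data


latest = datetime(2016, 8, 31, 12, 0)
earliest = datetime(2016, 8, 20, 0, 0)
for period in ("dtday", "dtweek", "dtmonth"):
    print(period, report_window(period, earliest, latest))
```

With only 11 days of hypothetical data, the day and week windows can be reported, but the month window cannot, which corresponds to the error message mentioned above.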

One easy check you can do by hand is to verify the existence of the data sets being used, and that the libref used, ACM, has been defined.  This is what they looked like on my compute machine, where the SAS Environment Manager Service Architecture was installed:



For additional insight on how you can use SAS Environment Manager and the Report Center to analyze the "big picture" of a SAS deployment's availability, check out this YouTube video.

SAS code referenced above.

/*       SAS 9.4   M3                             */
/*      Runs report in SAS EV (Report Center) on availability of     */
/*      server resources.  Requires macro variable to be defined     */
/*      TimePeriod, with value of (dtday, dtweek, dtmonth, or dtqtr) */
/*      Requires data sets:  acm.resourceinventory                   */
/*                           acm.availability                        */
/*            from Service Architecture Framework datamart           */
/*                          Dave Naden August 2016                   */
%global _DEBUG;
%include "&SASEVCONFIGDIR/Datamart/";
ods _ALL_ CLOSE;
%let evbegin=%ev_startdate(%sysfunc(date()),-10);
%let evend=%sysfunc(datetime(),datetime.);
%let period=&TimePeriod;     /* can be dtday, dtweek, dtmonth, or dtqtr */
%macro available;
/* get the resource IDs for servers only  */
data servers;
   set acm.resourceinventory(keep=id name type invLevel);
      if (invlevel = "SERVER");
     /* remove a few servers that are not really servers  */
      realserver = 1;
      if (index(type,"Directory") > 0) then realserver = 0;
      if (index(type,"Data Mart") > 0) then realserver = 0;
      if (index(type,"Server Context") > 0) then realserver = 0;
      if (index(type,"System Info") > 0) then realserver = 0;
      if (realserver) then output;
      rename id = resource_id;
   run;
 proc sort data=servers;
    by resource_id;
 run;
/* get availability data, prepare for merge (sorted copy in WORK, ACM is not overwritten) */
 proc sort data=acm.availability(keep=resource_id name avail starttime endtime minutes)
           out=availability;
    by resource_id;
 run;
 /* keep data only if it's server data, and if there are any records  */
 /*  in the availability data set   */
 data avail;
    merge servers(in=a) availability;
       by resource_id;
       format now datetime24. ;
      if (a) and (avail ne .) ;
 run;
 /* get maximum (most recent) endtime  */
 proc summary data=avail;
    output out=timewindow(rename=(endtime=maxendtime)) max=; 
    var endtime;
 /* Get earliest start time for this data set  */
 proc summary data=avail;
    output out=earliesttime(rename=(starttime=earliest)) min=;
    var starttime; 
%let enoughdata=1;  
 /* establish the time window for report requested, and check whether  */
 /* data goes back far enough to run the report.  If not, set flag     */
 data timewindow;
    set timewindow ;
    set earliesttime;
       period = symget('period');
       begintime = intnx(period,maxendtime,-1,'S');
       format begintime datetime24.  ;
       totalminutes = intck('DTMINUTES',begintime,maxendtime,'C');
       if (earliest > begintime) then call symput('enoughdata','0');
 run;  /* step boundary: this step must execute before %if tests &enoughdata */
/* subset data set to include only records within that time window requested  */
%if &enoughdata %then %do;
data avail;
   if (_N_ = 1) then set timewindow;
   set avail;
      if (endtime < begintime) then delete;
 /* aggregate to resource level (server), calculate percent available */
 data avail(keep=name type resource_id minutesdown percentup  );
    set avail;
       by resource_id;
       retain minutesdown;
          if first.resource_id then minutesdown=0;
          if (avail < 1) then do;
             if (starttime < begintime) then do;
             /* outage began before the window: count only the minutes inside it */
                minutes = intck('DTMINUTES',begintime,endtime,'C');
             end;
             minutesdown = minutesdown + minutes;
          end;
          if last.resource_id then do;
             if (minutesdown = 0) then percentup = 100;
             else percentup = ((totalminutes - minutesdown) / totalminutes) * 100;
             output;
          end;
   call symput("begintime",put(begintime,datetime24.));
   call symput("maxendtime", put(maxendtime,datetime24.)) ;
   call symput ("currentdatetime",put(datetime(),datetime24.)) ;
 run;
   %let period = %substr(&period,3);        
proc sort data=avail;
   by percentup;
proc report data=avail center split="*";
title "Server-type resources, showing percent up time, past &period only";
title2 "Date Interval used: &begintime to &maxendtime "; 
footnote "Date of Report:  &currentdatetime ";
   column name type minutesdown percentup; 
   compute percentup;
      if percentup.sum
tags: SAS Administrators, SAS Environment Manager, SAS Professional Services

Auditing SAS server availability from 30,000 feet was published on SAS Users.

October 5, 2016

SAS Event Stream Processing (ESP) can not only process structured streaming events (a collection of fields) in real time, but it also has very advanced features for collecting and analyzing unstructured events. Twitter is one of the best-known social network applications, and it is probably the first that comes to mind when thinking about a streaming data source. Meanwhile, SAS has powerful solutions for analyzing unstructured data with SAS Text Analytics. This post is about combining two needs: collecting unstructured data from Twitter and performing text analytics on tweets (contextual extraction, content categorization and sentiment analysis).

Before moving forward, note that SAS ESP is based on a publish and subscribe model. Events are injected into an ESP model using an “adapter” or a “connector,” or by using Python and the publisher API. Target applications consume the enriched events output by ESP using the same technology, “adapters” and “connectors.” SAS ESP provides many of them in order to integrate with static and dynamic applications.

An ESP model flow is composed of “windows,” each of which represents a type of transformation to perform on streaming events. It can be basic data management (join, compute, filter, aggregate, etc.) or advanced processing (data quality, pattern detection, streaming analytics, etc.).

SAS ESP Twitter Adapters background

SAS ESP 4.2 provides two adapters to connect to Twitter as a data source and to publish events from Twitter (one event per tweet) to a running ESP model. There are no equivalent connectors for Twitter.

Both adapters are publisher-only:

  • Twitter Publisher Adapter
  • Twitter Gnip Publisher Adapter

The second one is more advanced. It uses a different API (Gnip, acquired by Twitter) and provides additional capabilities (access to the history of tweets) and better performance. The adapter builds event blocks from a Twitter Gnip firehose stream and publishes them to a source window. Access to this Twitter stream is restricted to Twitter-approved parties and requires a signed agreement.

In this article, we will focus on the first adapter. It consumes Twitter streams and injects event blocks into source windows of an ESP engine, and it relies only on freely available Twitter capabilities. The default access level of a Twitter account allows us to use the following methods:

  • Sample: Starts listening on a random sample of all public statuses.
  • Filter: Starts consuming public statuses that match one or more filter predicates.

SAS ESP Text Analytics background

SAS ESP 4.1/4.2 provides three window types (event transformation nodes) to perform Text Analytics in real time on incoming events.

The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.


Here are the SAS ESP Text Analytics features:

  • “Text Category” window:
    • Content categorization or document classification into topics
    • Automatically identify or extract content that matches predefined criteria to more easily search by, report on, and model/segment by important themes
    • Relies on “.mco” binary files coming from SAS Contextual Analysis solution
  • “Text Context” window:
    • Contextual extraction of named entities (people, titles, locations, dates, companies, etc.) or facts of interest
    • Relies on “.li” binary files coming from SAS Contextual Analysis solution
  • “Text Sentiment” window:
    • Sentiment analysis of text coming from documents, social networks, emails, etc.
    • Classify documents and specific attributes/features as having positive, negative, or neutral/mixed tone
    • Relies on “.sam” binary files coming from SAS Sentiment Analysis solution

Binary files (“.mco”, “.li”, “.sam”) cannot be reverse engineered. The original projects in their corresponding solutions (SAS Contextual Analysis or SAS Sentiment Analysis) should be used to perform modifications on those binaries.

The ESP project

The following ESP project is designed to:

  • Wait for events coming from Twitter in the source Twitter window (this is a source window, the only entry point for streaming events)
  • Perform basic event processing and counting
  • Perform text analytics on tweets (in the input stream, the tweet text is injected as a single field)


Let’s have a look at potential text analytics results.

Here is a sample of the Twitter stream that SAS ESP is able to catch (the tweet text is collected in a field called tw_Text):


The “Text Category” window, with an associated “.mco” file, is able to classify tweets into topics/categories with a related score:


The “Text Context” window, with an associated “.li” file, is able to extract terms and their corresponding entity (person, location, currency, etc.) from a tweet:


The “Text Sentiment” window, with an associated “.sam” file, is able to determine a sentiment with a probability from a tweet:


Run the Twitter adapter

To inject events into a running ESP model, start the Twitter adapter. It publishes live tweets into the sourceTwitter window of our model.


Here we search for tweets containing “iphone”, but you can change to any keyword you want to track (assuming people are tweeting on that keyword…).

There are many additional options: -f lets you follow specific user IDs, -p lets you specify locations of interest, etc.

Consume enriched events with SAS ESP Streamviewer

SAS ESP provides a way to render events graphically in real time. Here is an example of how to consume real-time events in a powerful dashboard.



With SAS ESP, you can bring the power of SAS Analytics into the real-time world. Performing text analytics (content categorization, sentiment analysis, reputation management, etc.) on the fly on text from tweets, documents, emails, etc., and triggering relevant actions as a result, has never been so simple or so fast.

tags: SAS Event Stream Processing, SAS Professional Services, SAS Text Analytics, twitter

How to perform real time Text Analytics on Twitter streaming data in SAS ESP was published on SAS Users.

September 30, 2016

In a number of my previous blogs, I have discussed auditing within a SAS environment and how to identify who has accessed data or changed reports. For many companies, keeping an audit trail is very important. If you’re an administrator in your environment and auditing is important at your organization, here are a few steps you can take to secure the auditing setup and possibly audit any changes made to it, all while ensuring there are no gaps in collecting this information.

In a SAS deployment, the logging configuration is stored in XML files for each server configuration. The XML files can be secured with OS permissions to prevent unauthorized changes. However, there are a number of ways in which a user can temporarily adjust logging settings, which may allow them to prevent audit messages from reaching a log.

Logging can be adjusted dynamically in:

  • SAS code, using logging functions and macros
  • SAS Management Console

As an example of how a user could circumvent the audit trail, let’s look at an environment where logging has been configured to audit access to SAS data sets using the settings described in this blog. When auditing is enabled, messages are recorded in the log when users access a table. In the test case for this blog, we have a stored process that prints a SAS data set. When the stored process runs, the log records the table that was opened and the user who opened it.


A user could turn auditing off by adding a call to the LOG4SAS_LOGGER function. The code below, included at the start of the stored process (or any SAS program), would turn data set audit logging off for the duration of the execution. Any access of data sets in the stored process would no longer be audited.

/* Sets the audit logger's level to OFF for the rest of this session */
data _null_;
   rc=log4sas_logger("Audit.Data.Dataset.Open", "level=off");
run;

There is an option we can add to the logger to prevent this from happening. The IMMUTABILITY option specifies whether a logger’s additivity and level settings are permanent or whether they can be changed by using the SAS language. In the code above, the level of the logger was changed using the LOG4SAS_LOGGER function.

If we add the immutability="true" setting to the logger, a user can no longer adjust the logging level dynamically using the DATA step functions. Attempts to adjust the logger's level will fail with an error. See the SAS Logging Facility documentation for more details.

<logger name="Audit.Data.Dataset.Open" immutability="true">
   <appender-ref ref="TimeBasedRollingFileAudit"/>
   <level value="Trace"/>
</logger>


However, this setting does not fully prevent dynamic changes to logging configuration. IMMUTABILITY is ignored for configuration changes made by administrators using SAS Management Console or the IOMOPERATE procedure.

Users who have access to the SAS Management Console Server Manager plug-in can change logging on the fly. In SAS Management Console, select Server Manager > SASApp > SASApp - Logical Stored Process Server, then right-click <machine> and select Connect. This displays the process ID of an active Stored Process Server. Select the PID and select the Loggers tab. This tab displays the current settings for the active loggers on the SAS server. The Audit.Data.Dataset.Open logger has a level of TRACE; this is the setting that writes audit messages about activity on the data set to the log.


An administrator can change the level of the logger by right-clicking on the logger name, and selecting Properties. In this example we will turn logging off for the Audit.Data.Dataset.Open logger.


Now any run of a stored process will no longer record audit messages about data access to the log.

Users could also use PROC IOMOPERATE to change the logging configuration on a server. The code below dynamically finds the serverid of the stored process server and then turns audit logging off. The key piece of code is the set attribute line near the end:

/* Connect to the object spawner (credentials are placeholders) */
%let spawnerURI = %str(iom://server:8581;Bridge;user=sasadm@saspw,pass=???);
proc iomoperate;
   connect uri="&spawnerURI";
   list spawned;
   list spawned out=obj_spawned;
quit;
/* Find the server ID of the spawned stored process server */
data _null_;
   set work.obj_spawned;
   where upcase(serverComponent) like '%STORED%';
   call symput("stpid", strip(serverid));
run;
/* Connect to that server and switch the audit logger off */
proc iomoperate;
   connect uri="&spawnerURI"
           spawned="&stpid";
   set attribute category="Loggers" name="Audit.Data.Dataset.Open" value="OFF";
quit;

You must be a SAS Administrator to change logging configuration dynamically with SAS Management Console or PROC IOMOPERATE. While access to this functionality can be restricted, it does leave a potential gap in the auditing setup. As a further failsafe, you can configure auditing so that any change made to the logging configuration results in audit messages about the change in a log file. Effective with SAS 9.4, the following loggers capture these changes:

  • Logging.Configuration.Logger
  • Logging.Configuration.Appender
  • Logging.Configuration.Config

Let’s see how we can audit changes that are made to the Stored Process server’s logging configuration file. In the existing configuration file, add a new appender. The new appender (below) will write messages to a separate log in the audit directory of the Stored Process server (make sure this directory exists). The log will have a name similar to AuditLogs\SASApp_LoggingChange_2016-09-15_sasbi_16024.log

<!-- Rolling log file with default rollover of midnight for Logging Auditing -->
<appender class="RollingFileAppender" name="TimeBasedRollingFileLogging">
   <param name="Append" value="false"/>
   <param name="Unique" value="true"/>
   <param name="ImmediateFlush" value="true"/>
   <rollingPolicy class="TimeBasedRollingPolicy">
      <param name="FileNamePattern" value="D:\SAS\vaconfig\Lev1\SASApp\StoredProcessServer\AuditLogs\SASApp_LoggingChange_%d_%S{hostname}_%S{pid}.log"/>
   </rollingPolicy>
   <layout>
      <param name="HeaderPattern" value="Host: '%S{hostname}', OS: '%S{os_family}', Release: '%S{os_release}', SAS Version: '%S{sup_ver_long2}', Command: '%S{startup_cmd}'"/>
      <param name="ConversionPattern" value="%d %-5p [%t] %X{Client.ID}:%u - %m"/>
   </layout>
</appender>

Now we will add a logger that references the new appender. The Audit.Logging logger will write all changes to loggers and appenders to the referenced appender.  After you make changes to logging configuration make sure you validate the server in SAS Management Console; syntax errors in logging configuration files can render SAS servers inoperable.

<logger name="Audit.Logging">
   <level value="Trace"/>
   <appender-ref ref="TimeBasedRollingFileLogging"/>
</logger>

Once you have this set up, any dynamic changes made to logging configuration, either through the SAS language, PROC IOMOPERATE or SAS Management Console, will write a record to the new logging configuration auditing log.


When auditing is a critical requirement for a SAS deployment, it is important for administrators to close any gaps that would allow actions to go unrecorded in the audit trail.

These gaps can be closed by:

  • Securing the logging configuration file using operating system permissions
  • Ensuring that key loggers are configured as immutable
  • Restricting access to the SAS Management Console Server Manager plug-in
  • Ensuring that only key administrators have access to credentials that can change auditing configuration using SAS Management Console and PROC IOMOPERATE
  • Configuring auditing for changes to the logging configuration
tags: SAS Administrators, SAS Professional Services

How to protect your audit trail in a SAS environment was published on SAS Users.

September 28, 2016

When I had my first open source software (OSS) experience over a decade ago, I was ecstatic. It was amazing to learn how easy it was to download the latest version on my personal computer, with no initial license fee. I was quickly able to analyse datasets using various statistical methods.

Organisations might feel similar excitement when they first employ people with predominantly open source programming skills. However, it becomes tricky to organise an enterprise-wide approach based solely on open source software. Decision makers within many organisations are now coming to realise the value of investing in both OSS and vendor-provided, proprietary software. Very often, open source is used widely to prototype models, whilst proprietary software, such as SAS, provides a stable platform to deploy models in real time or for batch processing, and to monitor changes and update models directly in any database or on a Hadoop platform.

Industries such as pharma and finance have realised the advantages of complementing open source software usage with enterprise solutions such as SAS.

A classic example is when pharmaceutical companies conduct clinical trials, which must follow international good clinical practice (GCP) guidelines. Some pharma organisations use SAS for operational analytics, taking advantage of standardised macros and automated statistical reporting, whilst R is used for the planning phase (e.g. simulations), for peer validation of the results (e.g. double programming) and for certain specific analyses.

In finance, transparency is required by ever more demanding regulators, a requirement that intensified after the recent financial crisis. Changing regulations, security and compliance are limiting factors to using open source technology exclusively. Basel metrics such as PD, LGD and EAD must be computed properly. A very well-known bank in the Nordics, for example, uses open source technology to build all types of models, including ensemble models, but relies on SAS’ ability to co-exist with and extend open source on its platform to deploy and operationalise those models.

Open source software and SAS working together – An example

Deriving actionable insight from data is crucial. It is often believed that when data is thoroughly tortured, the required insight will become obvious and drive business growth. Various organisations use SAS and open source technology together to maximise business opportunities and the return on all analytics investments made.

A well-known financial institution combines the flexibility of prototyping predictive models in R with the power and stability of the SAS platform, which handles massive datasets and parallelises analytic workloads, to deliver instant results from analytics and act on them quickly.

How does this work?

SAS embraces and extends open source in different ways, following the complete analytics lifecycle of Data, Discovery and Deployment.


An ensemble model built in R is brought into SAS Enterprise Miner through the ‘open source integration node’ and compared objectively with other models there. (Enterprise Miner is a drag-and-drop workflow modelling application that is easy to use without the need to code.)


Once this model has been compared and the best model identified from automatically generated fit statistics, the model can be registered in the metadata repository, making it available for use across the SAS platform.

We used SAS Model Manager to monitor the Probability of Default (PD) and Loss Given Default (LGD) models. All models are also visible to everyone within the organisation, depending on system rights and privileges, and can be used to score new data and be retrained when necessary. Alerts can also be set to monitor model degradation, with automated messages sent for real-time intervention.


Once the champion model was set and published, it was used in a Real-Time Decision Manager (RTDM) flow to score new customers applying for a loan. RTDM is a web application that allows instant assessment of new applications without the need to score the entire database.

As a result of this flexibility the bank was able to manage their workload and modernize their platform in order to make better hedging decisions and cost saving investments. Complex algorithms can now be integrated into SAS to make better predictions and manage exploding data volumes.


tags: open source, SAS Enterprise Miner

Operationalising Open Source Models Using SAS was published on SAS Users.

September 26, 2016

Over the past 37 years I've had the good fortune to be able to attend and present at hundreds of in-house, local, regional, special-interest and international SAS events. I am a conference junkie. I've not only attended thousands of presentations, Hands-On Workshops, tutorials, breakout sessions, quick tips, posters, breakfasts, luncheons, mixers and more, but have had the privilege of hearing, seeing and networking with thousands of like-minded SAS users and presenters as they share valuable tips, techniques, advice, and suggestions on how to best use the SAS software.

For me, attending, volunteering and participating at SAS conferences and events has not only brought personal satisfaction like nothing else, it has allowed me to grow myself professionally and make many life-long friends. One of my objectives while attending a conference is to identify and learn at least three new things I didn't already know about SAS software. These three new things could consist of "cool" programming tips, unique coding techniques, "best" practice conventions, or countless other SAS-related nuggets.

At the upcoming 2016 MidWest SAS Users Group (MWSUG) Educational Forum and Conference, I'll be presenting several topics near and dear to my heart, including "Top Ten SAS Performance Tuning Techniques." This 50-minute presentation highlights my personal top ten list of performance tuning techniques for SAS users to apply in their programs and applications. If you are unable to attend, here are a couple of programming tips and techniques from each performance area to consider.

CPU Techniques

1. Use IF-THEN / ELSE or SELECT-WHEN / OTHERWISE in the DATA step, or a Case expression in PROC SQL to conditionally process data.

2. CPU time and elapsed time can be reduced by using the SASFILE statement to process the same data set multiple times.
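Both CPU tips can be sketched against the SASHELP.CLASS sample table that ships with SAS. The WORK table names and the AGEGROUP variable are hypothetical, chosen for illustration rather than taken from the presentation.

```sas
/* Tip 1: SELECT-WHEN stops testing at the first true condition,   */
/* like an IF-THEN / ELSE chain, so fewer comparisons are executed. */
data work.classified;
   set sashelp.class;
   length agegroup $8;
   select;
      when (age < 13) agegroup = 'Pre-teen';
      when (age < 16) agegroup = 'Teen';
      otherwise       agegroup = 'Older';
   end;
run;

/* Tip 2: SASFILE loads a data set into memory once, so later steps  */
/* that read it avoid repeated disk I/O. Close it to free the memory. */
data work.class;
   set sashelp.class;
run;

sasfile work.class load;

proc means data=work.class mean;
   var height weight;
run;

proc freq data=work.class;
   tables sex;
run;

sasfile work.class close;
```

The SASFILE approach only pays off when the same table is read by several steps and fits comfortably in memory.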

I/O Techniques

1. Consider using data compression for large data sets.

2. Build and use indexed data sets to improve access to data subsets.
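A minimal sketch of both I/O tips, assuming the SASHELP.CARS sample table; the WORK table names are hypothetical. SASHELP.CARS is far too small to benefit, so treat this as a syntax illustration. MSGLEVEL=I asks SAS to note in the log when an index is used for a WHERE clause.

```sas
options msglevel=i;   /* log a note when an index is selected */

/* Tip 1: COMPRESS=CHAR (same as YES) uses run-length encoding;     */
/* COMPRESS=BINARY is usually better for numeric-heavy, wide tables. */
data work.cars_c(compress=char);
   set sashelp.cars;
run;

/* Tip 2: build a simple index on a column that is often used to subset */
proc datasets library=work nolist;
   modify cars_c;
   index create make;
quit;

/* A WHERE clause on the indexed column can now use the index */
data work.audi;
   set work.cars_c;
   where make = 'Audi';
run;
```

Whether SAS actually uses the index depends on how selective the WHERE clause is; the log note shows the choice it made.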

Data Storage Techniques

1. Use data compression strategies to reduce the amount of storage used to store data sets.

2. Use KEEP= or DROP= data set options to retain desired variables.
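Here is a sketch of the KEEP=/DROP= tip, again using SASHELP.CLASS; the output table and the WT_HT_RATIO variable are hypothetical. KEEP= on the input limits what is read into the program data vector, and DROP= on the output limits what is written to disk.

```sas
/* Read only the four variables the step needs ...                  */
/* ... and do not store WEIGHT once the derived ratio is computed.  */
data work.heights(drop=weight);
   set sashelp.class(keep=name sex height weight);
   wt_ht_ratio = weight / height;
run;
```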

Memory Techniques

1. Use memory-resident DATA step constructs like Hash objects to take advantage of available memory and memory speeds.

2. Use the MEMSIZE= system option to control memory usage with the SUMMARY procedure.
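The memory tips can be sketched as a small hash-object lookup plus a check of the current MEMSIZE setting. The lookup table WORK.SEXLABELS and its LABEL variable are hypothetical. MEMSIZE= can only be set when SAS starts (on the command line or in a configuration file), so the sketch only displays its value.

```sas
/* A tiny lookup table (hypothetical) to load into memory */
data work.sexlabels;
   length sex $1 label $8;
   input sex $ label $;
   datalines;
F Female
M Male
;
run;

/* The hash object keeps the lookup table in memory, so each row is */
/* matched without sorting or merging either table.                 */
data work.class_labeled;
   if 0 then set work.sexlabels;          /* define SEX and LABEL in the PDV */
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.sexlabels');
      h.defineKey('sex');
      h.defineData('label');
      h.defineDone();
   end;
   set sashelp.class;
   if h.find() ne 0 then label = 'Unknown';
run;

/* Show the MEMSIZE value this session started with */
proc options option=memsize;
run;
```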

Want to learn more SAS tips, techniques and shortcuts like these? Please join me at the MidWest SAS Users Group Conference October 9 – 11 at the Hyatt Regency in downtown Cincinnati, Ohio. Register now for three days of great educational opportunities, 100+ presentations, training, workshops, networking and more.

I look forward to meeting and seeing you there!

tags: MWSUG, SAS Programmers, US Regional Conferences

SAS performance tuning - A little bit goes a long way was published on SAS Users.

September 24, 2016

Every day, more than one hundred thousand SAS users visit our website looking for SAS information and resources. Given its importance to our user base, we’re constantly looking for ways to evolve the site.  Over the next few months, you’ll notice changes to the support website, changes we believe will provide you with a better user experience.

Today, we launch a beta version of six top-level support pages, accessible via your computer, smartphone or tablet by clicking on the banner on the support site. The beta pages may look different from what you’re used to, but they are fully functional, and you can use them to fulfill your support needs. During the beta period, all current pages will still be available for your use, but we hope you’ll give the new beta pages a try.

As a company with more than 40 years of experience delivering business analytics software, we focus on making sure that you – our customers – are the beginning point for all of our work – whether we’re developing a new product or designing a new website.

Jim Goodnight, our CEO, put it best when he said, “For the past four decades, we’ve used a simple approach with our customers. We ask them what they want, and then we develop it for them.”

And that’s what’s at the heart of our support site evolution.

We’ve listened to your comments and suggestions. We’ve heard what’s most important to you.  And we’ve designed the new support pages with you in mind, focusing on your key tasks so you can access information, find answers and get help quickly and easily. Important information is now front and center and new, simplified navigation will enable you to get what you need with fewer clicks.  We believe the new pages deliver the experience you’ve asked us for, but we’ll let you be the judge.

During the beta period, we’d really love to hear what you think of the new pages. You’ll find a “give feedback” button at the top of each new page.  Please let us know what’s working well, and what isn’t; what you like, and what you don’t. And remember to give us feedback each time you use the beta pages.

Once the beta period is over and the new pages move into production, we’ll continue to evolve the site, working on lower level pages in an ongoing effort to give you a better support experience. We hope you like the changes we’re making and that you find the new support pages useful.

tags: SAS support site

The evolution of was published on SAS Users.