
November 15, 2016
 

If your SAS Visual Analytics report requirements include linking out to separate reports without the need to pass values, you may want to consider using images to enhance the appearance of your base report. Here are three style examples using images, which you can choose from depending on your report design requirements and your report users' preferences:

1. Visually appealing
2. Generic
3. Screenshot of actual report

There is no substitute for examples, so here are some screenshots:

1. Visually appealing

Use images in SAS Visual Analytics

2. Generic

use-images-in-sas-visual-analytics02

3. Screenshot of actual report

use-images-in-sas-visual-analytics03

Image selection

Using an image in your report has never been easier. You can navigate your local machine for the image and, if you want, you can also save the image in metadata. This allows other users with access to that metadata location to use the same image, which is great when you want to impose consistency across your reports.

use-images-in-sas-visual-analytics04

use-images-in-sas-visual-analytics05

Setting link using image

To define a report link from your image, click the image and then open the Interactions tab. Use the New drop-down menu to select the type of link you wish to define. For a Report Link or Section Link, use the Browse button to navigate the metadata and select your target. If you are linking to an External URL, enter the fully qualified URL; you can also define a link to a stored process.

use-images-in-sas-visual-analytics06

Here is a breakdown of the report objects used in the main dashboard report. I also included the screenshots of my Daily, Weekly, and Monthly report examples.

Dashboard example breakdown

use-images-in-sas-visual-analytics07

Daily report example

use-images-in-sas-visual-analytics08

Weekly report example

use-images-in-sas-visual-analytics09

Monthly report example

use-images-in-sas-visual-analytics10

 

 

tags: SAS Professional Services, SAS Visual Analytics

Use images in SAS Visual Analytics to enhance your report link was published on SAS Users.

November 10, 2016
 

We all have challenges in getting an accurate and consistent view of our customers across multiple applications or sources of customer information. Suggestion-based matching is a technique found in SAS Data Quality to improve matching results for data that has arbitrary typos and incorrect spellings in it. The suggestion-based concept and benefits were described in a previous blog post. In this post, I will expand on the topic and show how to build a data job that uses suggestion-based matching in DataFlux Data Management Studio, the key component of SAS Data Quality and other SAS Data Management offerings. This article takes a simple example job to illustrate the steps needed to configure suggestion-based matching for person names.

In DataFlux Data Management Studio I first configure a Job Specific Data node to define the columns and example records that I’d like to feed into the matching process. In this example, I use a two column data table made up of Rec_ID and a Name column and sample records as shown below.

suggestion-based-matching-in-sas-data-quality

To build the suggestion-based matching feature, I have to insert and configure at least a Create Match Codes node, a Clustering node, and a Cluster Aggregation node in the data job.

suggestion-based-matching-in-sas-data-quality02

This example uses names with randomly injected typographical errors, like missing characters, additional characters, and character transpositions. Please note that these types of data errors would not be matched correctly using the standard matching technique in SAS Data Quality, which is why I am leveraging the suggestion-based matching feature. The picture below shows the person names and highlights the injected errors for Ethan Baker.

suggestion-based-matching-in-sas-data-quality03

For suggestion-based matching in SAS Data Quality I need to use a match definition that supports suggestions. The current Quality Knowledge Base (QKB) for Customer Information ships with a ready to use match definition called “Name (with suggestions).” For other data types, I can easily create new definitions or expand existing definitions to support the suggestions feature as well.

To continue building my suggestion-based matching job I next need to configure the Create Match Codes node as shown in the picture below.

suggestion-based-matching-in-sas-data-quality045

In the property screen of the Create Match Codes node I select the Locale that matches my input data and pick the “Name (with Suggestions)” definition.

Next, I check Allow generation of multiple match codes per definition for each sensitivity in the property window. This enables the data job node to generate suggestions and also create an additional Match Score field as output.

suggestion-based-matching-in-sas-data-quality05

Looking at the output of the Create Match Codes node, we can see that multiple different match codes (suggestions) and match scores are generated for a single input (Ethan Baker). Because I selected Allow generation of multiple match codes per definition for each sensitivity, the Create Match Codes node generates a match code representing the input name, plus additional match codes (suggestions) with character deletions, insertions, replacements, and transpositions applied to the input name. The score value, in the Name_Match Code Score column, indicates the closeness of the generated suggestion to the original input data. The more change operations used to create the suggestion, the lower the score value; therefore, the lower the score, the less likely it is that the suggestion is the true name.
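The QKB's suggestion logic itself is proprietary, but the underlying idea of generating single-edit variants of a name and scoring them by how many edits were applied can be sketched in Python (a conceptual illustration with made-up score values, not the actual "Name (with Suggestions)" implementation):

```python
# Conceptual sketch: generate single-edit variants (suggestions) of a name
# and score them by how many edits were applied. The scores are made up;
# this is NOT the actual QKB "Name (with Suggestions)" implementation.
def suggestions(name, base_score=100, penalty=15):
    variants = {name: base_score}  # the unchanged input scores highest
    # single-character deletions: "Ethan" -> "Ethn", "Etan", ...
    for i in range(len(name)):
        variants.setdefault(name[:i] + name[i + 1:], base_score - penalty)
    # adjacent transpositions: "Ethan" -> "tEhan", "Etahn", ...
    for i in range(len(name) - 1):
        swapped = name[:i] + name[i + 1] + name[i] + name[i + 2:]
        variants.setdefault(swapped, base_score - penalty)
    return variants

cands = suggestions("Ethan")
print(cands["Ethan"], cands["Ethn"])  # 100 85
```

Just as in the node's output, the unmodified input keeps the top score and every edited variant scores lower.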

The next step in configuring my suggestion-based matching job is the Clustering node.

suggestion-based-matching-in-sas-data-quality06

There is no specific configuration needed in the Clustering node when using it with suggestion-based matching. In my example I only set the Name_Match Code field as the match condition and pass all input fields as output (in Additional Outputs).

suggestion-based-matching-in-sas-data-quality07

Previewing the output of the Clustering node, I can already see that the misspelled names of “Ethan Baker” are clustered correctly. But because I generated multiple suggestions for each input record, I end up with multiple clusters holding the same input records. Ethan Baker, Ethn Baker, and Epthan Baker and their suggestions are assigned to clusters 0 to 7 and would also appear in single-row clusters further down the output list. It is OK at this step of the data job to have two or more clusters containing the same set of input records when using suggestion-based matching. The next node in the data job will resolve this and use the match score to determine the single best cluster.

In the properties of the Cluster Aggregation node I set the Cluster ID, Primary Key, and Score fields (which were outputs of the previous Clustering node). In order to determine the single best cluster, I select Cluster as the scoring method and Highest Mean as the scoring algorithm. The Cluster Aggregation node will compute the mean score value in each cluster. By checking Remove subclusters, I make sure only the cluster with the highest mean is output.
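Conceptually, the Highest Mean / Remove subclusters behavior can be sketched in plain Python; the cluster IDs and score values below are made up, and this only illustrates the selection logic, not the node's actual code:

```python
# Sketch of the Cluster Aggregation idea: rows are (cluster_id, rec_id, score).
# Compute the mean score per cluster and assign each record to the
# highest-mean cluster it appears in ("Highest Mean" + "Remove subclusters").
from collections import defaultdict

rows = [  # cluster_id, rec_id, score (illustrative values only)
    (0, 1, 100), (0, 2, 85), (0, 3, 85),   # cluster grouping all three Ethans
    (1, 1, 70),  (1, 2, 70),               # weaker sub-cluster
    (2, 3, 55),                            # single-row cluster
]

def best_clusters(rows):
    by_cluster = defaultdict(list)
    for cid, rec, score in rows:
        by_cluster[cid].append((rec, score))
    # mean score per cluster
    means = {cid: sum(s for _, s in members) / len(members)
             for cid, members in by_cluster.items()}
    # keep, per record, only the cluster with the highest mean
    assigned = {}
    for cid, members in by_cluster.items():
        for rec, _ in members:
            if rec not in assigned or means[cid] > means[assigned[rec]]:
                assigned[rec] = cid
    return assigned

print(best_clusters(rows))  # every record lands in cluster 0
```

Cluster 0 has the highest mean score (90), so the weaker sub-clusters are dropped and every record ends up in a single best cluster.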

suggestion-based-matching-in-sas-data-quality08

With the Cluster Aggregation node configured the output looks like this:

suggestion-based-matching-in-sas-data-quality09

The final output of the Cluster Aggregation node is reduced to just the eight input records. With the described setup I successfully matched names that contain typographical errors like additional or missing characters.

As you can see above, the accuracy of your matching rules, and ultimately, your understanding of your customers, can be augmented through use of suggestion-based matching.

For more information, please refer to the product documentation:

tags: data management, DataFlux Data Management Studio, SAS Data Quality

Using suggestion-based matching in SAS Data Quality was published on SAS Users.

November 2, 2016
 

With DataFlux Data Management 2.7, the major component of SAS Data Quality and other SAS Data Management solutions, every job has a REST API automatically created once it is moved to the Data Management Server. This is a great feature that enables us to easily call Data Management jobs from programming languages like Python. We can then leverage the Quality Knowledge Base (QKB), a pre-built set of data quality rules, and do other data quality work that is impossible or challenging to do using Python alone.

calling-sas-data-quality-jobs-from-python

In order to make a RESTful call from Python we need to first get the REST API information for our Data Management job. The best way to get this information is to go to the Data Management Server in your browser, where you’ll find respective links for:

  • Batch Jobs
  • Real-Time Data Jobs
  • Real-Time Process Jobs.

From here you can drill through to your job REST API.

Alternatively, you can use a “shortcut” to get the information by calling the job’s REST API metadata URL directly. The URL looks like this:

http://<DM Server>:<port>/<job type>/rest/jobFlowDefns/<job id>/metadata

calling-sas-data-quality-jobs-from-python02

The <job id> is simply the job name (with subdirectory and extension) Base64 encoded. This is a common method to avoid issues with illegal URL characters like # % & * { } : < > ? / + or space. You can go to this website to Base64 encode your job name.
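If you prefer not to use a website, the same encoding can be done with Python's standard library (shown here with the example job name used later in this post):

```python
# Base64-encode the Data Management job name to build the <job id>
# portion of the REST metadata URL.
import base64

job_name = "Demo/ParseAddress.ddf"
job_id = base64.b64encode(job_name.encode("utf-8")).decode("ascii")
print(job_id)  # RGVtby9QYXJzZUFkZHJlc3MuZGRm
```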

If you have many jobs on the Data Management Server it might be quicker to use the “shortcut” instead of drilling through from the top.

Here is an example of getting the REST API information for the Data Management job “ParseAddress.ddf,” which is in the subdirectory Demo of Real-Time Data Services on the DM Server:

calling-sas-data-quality-jobs-from-python03

We Base64 encode the job name “Demo/ParseAddress.ddf” using the website mentioned above…

calling-sas-data-quality-jobs-from-python04

…and call the URL for the job’s REST API metadata:

http://DMServer:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/RGVtby9QYXJzZUFkZHJlc3MuZGRm/metadata

calling-sas-data-quality-jobs-from-python05

From here we collect the following information:

The REST API URL and Content-Type information…
calling-sas-data-quality-jobs-from-python06

…the JSON structure for input data

calling-sas-data-quality-jobs-from-python07

…which we need in this format when calling the Data Management job from Python:

{"inputs" : {"dataTable" : {"data" : [[ "sample string" ],[ "another string" ]], "metadata" : [{"maxChars" : 255, "name" : "Address", "type" : "string"}]}}}

…and the JSON structure for the data returned by the Data Management job.

calling-sas-data-quality-jobs-from-python08

When you have this information, the Python code to call the Data Management job would look like this:

calling-sas-data-quality-jobs-from-python09

The output data from the Data Management job will be in data_raw. We call the built-in JSON decoder from the requests module to move the output into a dictionary (data_out) from which we can access the data. The structure of the dictionary follows the REST metadata. We can access the relevant output data via data_out['outputs']['dataTable']['data'].
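Putting the pieces together, a minimal sketch of the call might look like the following. It uses only the Python standard library; the payload shape follows the metadata shown above, while the endpoint URL and the sample strings are placeholders you would replace with your own job's values:

```python
# Sketch of calling a Data Management real-time job via its REST API.
# The JSON shapes follow the job metadata shown above; the URL is a
# placeholder for your own job's endpoint.
import json
import urllib.request

def build_payload(addresses):
    # Input structure from the job's REST metadata
    return {"inputs": {"dataTable": {
        "data": [[a] for a in addresses],
        "metadata": [{"maxChars": 255, "name": "Address", "type": "string"}],
    }}}

def call_dm_job(url, addresses):
    # POST the JSON payload to the job's endpoint and decode the reply;
    # output rows live under outputs -> dataTable -> data
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(addresses)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data_out = json.load(resp)
    return data_out["outputs"]["dataTable"]["data"]

payload = build_payload(["sample string", "another string"])
print(json.dumps(payload))  # the JSON body sent to the server
```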

calling-sas-data-quality-jobs-from-python10

calling-sas-data-quality-jobs-from-python11

The Python program will produce an output like this…

calling-sas-data-quality-jobs-from-python12

You can find more information about the DataFlux Data Management REST API here.

Calling Data Management jobs from Python is straightforward and a convenient way to augment your Python code with the more robust set of data quality rules and capabilities found in the SAS Data Quality solution.

Learn more about SAS Data Quality.

tags: data management, DataFlux Data Management Studio, open source, REST API

Calling SAS Data Quality jobs from Python was published on SAS Users.

October 25, 2016
 

Requirements that are the most easily described can often be the most difficult to implement. I’m referring to requests like:

  • Display a gauge with the most recently collected metric.
  • Plot an 18-month rolling window of profit.
  • Display last month's products' percent-of-total metrics for visual comparison.

Okay, so these are pretty specific requests (which I built a report to answer), but nonetheless, requirements like these do exist.

Use Rank in SAS Visual Analytics

So, how do you implement these requests? Use rank! You might be wondering how this is possible since the rank feature requires a numeric value and these requirements are based on dates. Solution: use the TreatAs function. Let’s break it down step by step.

But first, here is a breakdown of the report objects used in this report. Notice that this report contains a section prompt via a button bar which prompts the user to select a Product Line. This section prompt filters all of the other objects by that Product Line value.

use-rank-in-sas-visual-analytics02

Step 1: Use TreatAs to create a metric from your date category

I am assuming that your data source has a date category. This will work with a date, date by month, or date by year formatted data item. As long as the data item is recognized as a date, this technique will work.

This example will use the Date by Month data item. We will use the TreatAs function to create a metric, or in other words, a numeric representation of the date. That’s the great thing about dates in SAS: they simply represent the number of days before or after January 1, 1960. So the more recent the date, the larger the number, which means we can use rank to order them.
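The same idea can be illustrated in Python: treat each date as its day count from the SAS epoch (January 1, 1960), and ranking that number descending puts the most recent dates first (an illustration of the concept, not SAS code):

```python
# Illustration of the SAS date concept: days since January 1, 1960.
# More recent dates produce larger numbers, so ranking the number
# descending returns the most recent dates first.
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from here

def sas_date_value(d):
    # The effect of TreatAs(_Number_, date): days before/after the epoch
    return (d - SAS_EPOCH).days

months = [date(2016, 8, 1), date(2016, 10, 1), date(2016, 9, 1)]
ranked = sorted(months, key=sas_date_value, reverse=True)
print(ranked[0])  # 2016-10-01 -- the most recent month has the largest value
```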

From the Data tab, use the drop-down menu and select New Calculated Item….

use-rank-in-sas-visual-analytics03

Give your new calculated data item a name.

The result type will be numeric.

Under Operators, use the search window to find the TreatAs function, then drag it onto the visual pane. For the drop-down option, select _Number_.
Finally, drag the date data item onto the visual pane. In this example, we are using Date by Month.

use-rank-in-sas-visual-analytics04

Step 2: Change the aggregation on your new measure to be non-additive

Next, we need to make sure this new metric that represents the Date by Month date is non-additive. We will not get the proper result if this new metric takes the sum or average when displayed on a visualization. To do this, navigate to the Data tab and click on the name of the new metric you created. In my example, I created a new metric named DateByMonthNum.

Toward the bottom of the Data tab are the data properties. Under the Aggregation property use the drop-down menu and select one of the non-additive metrics such as: Minimum, Median, or Maximum.

use-rank-in-sas-visual-analytics05

Step 3: Verify that your new measure returns the correct results

Now we can verify that when we rank our new measure, we get the expected results. To do this, I used a list table and added both the date data item Date by Month and the new metric data item DateByMonthNum. Here we can see that when I sort the metric data item by descending I get the expected results where each Date by Month value gives me a different DateByMonthNum value. I can also see that the more recent Date by Month value pairs to a larger DateByMonthNum value.

use-rank-in-sas-visual-analytics06

To be sure that you properly assigned a non-additive aggregation type, you can use the Show detail data property from the Properties tab. At the detail level you will see each date and metric value pair. Once you de-select Show detail data, you should see the exact same value pairs; if you do, then you have correctly assigned your non-additive aggregation type.

use-rank-in-sas-visual-analytics07

Step 4: Use Rank to meet report requirements

Now that we have our metric properly created, we can use the Rank feature to display the last month’s metrics or a rolling window.

Last Month’s Metrics
In this visualization I used the Gauge Object.

use-rank-in-sas-visual-analytics08

On the Roles tab, I assigned Profit to the Measure role and Product to the Group role. I then created a five interval Display Rule between 0% and 50% at 10% intervals where anything over 50% is grouped together under the darkest green rule.

use-rank-in-sas-visual-analytics09

Now we must filter this visualization to display only the last month’s profit metrics; we do this by using the Rank feature. From the Ranks tab, you must first select the category data item you wish to subset by the rank. In our example, we want to display the last month’s metrics, so we will want to add a rank for the Date by Month data item. Once selected, click the button Add Rank.

use-rank-in-sas-visual-analytics10

Next we need to select the metric to rank by: from the By drop-down, select our newly created metric, DateByMonthNum. Then select the type of rank and how many values to return. In this example, we will return the Top Count, i.e., the greatest value, and for the Count we want to return 1.

use-rank-in-sas-visual-analytics11

To help with the titling of the report, I added the exact same rank to a List Table object to display the data’s last month and to help report users know which month they are looking at.

use-rank-in-sas-visual-analytics12

use-rank-in-sas-visual-analytics13

Rolling 18 Month Window
The next visualization I created was a Line Chart object plotting an 18-month rolling window of profit.

use-rank-in-sas-visual-analytics14

On the Roles tab, I selected Date by Month as the Category and Profit as the Measure.
On the Ranks tab, I selected the same values as I did for the list table and gauge objects, except that I selected a Count of 18 to return the top 18 values of Date by Month ranked on our newly created metric DateByMonthNum. The rank returns the 18 highest values for DateByMonthNum, which pair to the 18 most recent values for Date by Month, giving us a rolling 18-month window.
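What this rank does can be sketched in Python: keep the 18 rows with the largest numeric date value, which is exactly an 18-month rolling window (the profit figures below are illustrative):

```python
# Sketch of "Top Count = 18 by DateByMonthNum": the 18 largest date
# values are the 18 most recent months -- a rolling 18-month window.
from datetime import date

# Illustrative profit-by-month data covering 22 months
profit_by_month = {date(2015, m, 1): 100 + m for m in range(1, 13)}
profit_by_month.update({date(2016, m, 1): 200 + m for m in range(1, 11)})

top18 = sorted(profit_by_month, reverse=True)[:18]  # 18 most recent months
window = {m: profit_by_month[m] for m in sorted(top18)}
print(min(window), max(window))  # 2015-05-01 2016-10-01
```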

use-rank-in-sas-visual-analytics15

Other Applications

In this example I used Rank at the month level, but you could use this technique at the day level, the quarter level, or essentially any supported date interval.

Assuming you have the proper data collected, you could also use Rank for the standard use of ranking the top X performing products, sales representatives or investment funds. You could also use rank to identify your bottom performing manufacturing equipment, car mileage, or school ratings.

Other Report Screenshots

use-rank-in-sas-visual-analytics16

use-rank-in-sas-visual-analytics18

tags: SAS Professional Services, SAS Programmers, SAS Visual Analytics

Use Rank in SAS Visual Analytics to display the last date, month or rolling window was published on SAS Users.

October 21, 2016
 

ProblemSolvers

Have you ever needed to run code based on the client application that you are using? Or have you needed to know the version of SAS® software that you are running and the operating system that you are running it on? This blog post describes a few automatic macro variables that can help with gathering this information.

Application Name

You can use the &_CLIENTAPP macro variable to obtain the name of the client application. Here are some details:

  • Referencing &_CLIENTAPP in SAS® Studio returns a value of SAS Studio
  • Referencing &_CLIENTAPP in SAS® Enterprise Guide® returns a value of ‘SAS Enterprise Guide’
    Note: The quotation marks around SAS Enterprise Guide are part of the value.

Program Name

You can use the &SYSPROCESSNAME macro variable to obtain the name of the current SAS process. Here are some details:

  • Referencing &SYSPROCESSNAME interactively within the DMS window returns a value of DMS Process
  • Referencing &SYSPROCESSNAME in the SAS windowing environment of your second SAS session returns a value of DMS Process (2)
  • Referencing &SYSPROCESSNAME in SAS Enterprise Guide or SAS Studio returns a value of Object Server
  • Referencing &SYSPROCESSNAME in batch returns the word Program followed by the name of the program being run (for example: Program 'c:test.sas')
    Note: For information about other techniques for retrieving the program name, see SAS Note 24301: “How to retrieve the program name that is currently running in batch mode or interactively.”

Example

The following code illustrates how you can use both of these macro variables to check which client application you are using and display a message in the SAS log based on that result:

%macro check;
 
  %if %symexist(_clientapp) %then %do;
   %if &_clientapp = SAS Studio %then %do;
    %put Running SAS Studio;
   %end;
   %else %if &_clientapp = 'SAS Enterprise Guide' %then %do;
    %put Running SAS Enterprise Guide;
   %end;
  %end;
 
  %else %if %index(&sysprocessname,DMS) %then %do;
    %put Running in Display Manager;
  %end;
  %else %if %index(&sysprocessname,Program) %then %do;
     %let prog=%qscan(%superq(sysprocessname),2,%str( ));
     %put Running in batch and the program running is &prog;
  %end;
 
%mend check;
%check

SAS Session Run Mode or Server Type

Another helpful SAS read-only automatic macro variable is &SYSPROCESSMODE. You can use &SYSPROCESSMODE to obtain the current SAS session run mode or server type name. Here is a list of possible values:

  • SAS Batch Mode
  • SAS/CONNECT Session
  • SAS DMS Session
  • SAS IntrNet Server
  • SAS Line Mode
  • SAS Metadata Server
  • SAS OLAP Server
  • SAS Pooled Workspace Server
  • SAS Share Server
  • SAS Stored Process Server
  • SAS Table Server
  • SAS Workspace Server

Operating System and Version of SAS

Having the information detailed above is helpful, but you might also need to know the operating system and exact version of SAS that you are running. The following macro variables help with obtaining this information.

You can use &SYSSCP and &SYSSCPL to obtain an abbreviation of the name of your operating system.  Here are some examples:

macrovariables

For a complete list of values, see the “SYSSCP and SYSSCPL Automatic Macro Variables” section of SAS® 9.4 Macro Language: Reference, Fourth Edition.

SAS Release

&SYSVLONG4 is the most informative of the macro variables that provide SAS release information. You can use it to obtain the release number and maintenance level of SAS as well as a four-digit year. Here is an example:

%put &sysvlong4;

This code would print something similar to the following in the log:

9.04.01M3D06292015

Here is what this output means:

SAS release: 9.04.01

Maintenance level: M3

Ship Event date: D06292015
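A quick way to pick the value apart programmatically is a small regular expression; this sketch assumes the release-maintenance-date pattern shown above and may need adjusting for other SYSVLONG4 variants:

```python
# Decompose a &SYSVLONG4 value such as 9.04.01M3D06292015 into
# release, maintenance level, and ship-event date (MMDDYYYY).
import re

def parse_sysvlong4(value):
    # Pattern: release (9.04.01) + maintenance (M3) + 'D' + MMDDYYYY date
    m = re.fullmatch(r"(\d+\.\d+\.\d+)(M\d+)D(\d{8})", value)
    if not m:
        raise ValueError(f"unexpected SYSVLONG4 format: {value}")
    return m.groups()  # (release, maintenance, ship_event_date)

print(parse_sysvlong4("9.04.01M3D06292015"))
# ('9.04.01', 'M3', '06292015')
```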

I hope that some of the tools described above are useful to you for obtaining information about your SAS environment. If you have any questions, please contact SAS Technical Support, and we will be happy to assist you. Thank you for using SAS!

tags: macro, Problem Solvers, SAS Macro, SAS Programmers

Macro variables that provide information about your SAS® environment was published on SAS Users.

October 20, 2016
 

SAS Quality Knowledge Base locales in a SAS event stream processing compute window

In a previous blog post, I demonstrated combining the power of SAS Event Stream Processing (ESP) and the SAS Quality Knowledge Base (QKB), a key component of our SAS Data Quality offerings. In this post, I will expand on the topic and show how you can work with data from multiple QKB locales in your event stream.

To illustrate how to do this, I will review an example where I have event stream data that contains North American postal codes. I need to standardize the values appropriately depending on where they are from (United States, Canada, or Mexico) using the Postal Code Standardization definition from the appropriate QKB locale. Note: This example assumes that the QKB for Contact Information has been installed and that the license file pointed to by the DFESP_QKB_LIC environment variable contains a valid license for these locales.

In an ESP Compute window, I first need to initialize the call to the BlueFusion Expression Engine Language function and load the three QKB locales needed – ENUSA (English – United States), ENCAN (English – Canada), and ESMEX (Spanish – Mexico).

sas-quality-knowledge-base-locales-in-a-sas-event-stream-processing-compute-window01

Next, I need to call the appropriate Postal Code Standardization definition from the QKB based on the country the data is coming from. To do this, I first need to standardize the Country information in my streaming data; therefore, I call the Country (ISO 3-character) Standardization definition.

sas-quality-knowledge-base-locales-in-a-sas-event-stream-processing-compute-window02

After that is done, I use a series of if/else statements to standardize the postal codes using the appropriate QKB locale definition, based on the Country_Standardized value computed above. The resulting standardized postal code value is returned in the output field named PostalCode_STND.
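The dispatch pattern in the Compute window can be illustrated in Python; the standardization functions below are trivial stand-ins for the real QKB definitions, purely to show how the standardized country value routes each record to the right locale definition:

```python
# Conceptual sketch of the Compute-window logic: pick a postal-code
# standardizer based on the standardized country value. These functions
# are simplistic stand-ins, NOT the real QKB locale definitions.
def stnd_us(pc):
    return pc.replace(" ", "").replace("-", "")[:5]          # e.g. 27513

def stnd_ca(pc):
    pc = pc.replace(" ", "").upper()
    return pc[:3] + " " + pc[3:]                             # e.g. K1A 0B1

def stnd_mx(pc):
    return pc.strip().zfill(5)                               # e.g. 01234

STND_BY_COUNTRY = {"USA": stnd_us, "CAN": stnd_ca, "MEX": stnd_mx}

def standardize_postal_code(country_stnd, postal_code):
    fn = STND_BY_COUNTRY.get(country_stnd)
    return fn(postal_code) if fn else postal_code  # pass through unknowns

print(standardize_postal_code("CAN", "k1a0b1"))  # K1A 0B1
```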

sas-quality-knowledge-base-locales-in-a-sas-event-stream-processing-compute-window03

I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.

sas-quality-knowledge-base-locales-in-a-sas-event-stream-processing-compute-window04

Here is the XML code for the SAS ESP project reviewed in this blog:

sas-quality-knowledge-base-locales-in-a-sas-event-stream-processing-compute-window05

Now that the Postal Code values for the various locales have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.

For more information, please refer to the product documentation:

Learn more about a sustainable approach to data quality.

tags: data management, SAS Data Quality, SAS Event Stream Processing, SAS Professional Services

Using multiple SAS Quality Knowledge Base locales in a SAS Event Stream Processing compute window was published on SAS Users.

October 12, 2016
 

Recently, one of my sons came to me and asked about something called “The Monty Hall Paradox.” They had discussed it in school and he was having a hard time understanding it (as you often do with paradoxes).

For those of you who may not be familiar with the Monty Hall Paradox, it is named for the host of a popular TV game show called “Let’s Make a Deal.” On the show, a contestant would be selected and shown a valuable prize.  Monty Hall would then explain that the prize is located just behind one of three doors and asked the contestant to pick a door.  Once a door was selected, Monty would then tease the contestant with cash to get him/her to either abandon the game or switch to another door.  Invariably, the contestant would stand firm and then Monty would proceed to show the contestant what was behind one of the other doors.  Of course, it wouldn’t be any fun if the prize was behind the revealed door, so after showing the contestant an empty door Monty would then ply them with even more cash, in the hopes that they would abandon the game or switch to the remaining door.

Almost without fail, the contestant would stand firm in their belief that their chosen door was the winner and would not switch to the other door.

So where’s the paradox?

When left with two doors, most people assume that they've got a 50/50 chance of winning. However, the truth is that the contestant will double his/her chance of winning by switching to the other door: the original pick wins only 1 time in 3, so the prize is behind the remaining door 2 times in 3.

After explaining this to my son, it occurred to me that this would be an excellent exercise for coding in Python and in SAS to see how the two languages compared. Like many of you reading this blog, I’ve been programming in SAS for years so the struggle for me was coding this in Python.

I kept it simple. I generated my data randomly and then applied simple logic to each row and compared the results.  The only difference between the two is in how the languages approach it.  Once we look at the two approaches then we can look at the answer.

First, let's look at SAS:

data choices (drop=max);
do i = 1 to 10000;
	u=ranuni(1);
	u2=ranuni(2);
	max=3;
	prize = ceil(max*u);
	choice = ceil(max*u2);
	output;
end;
run;

I started by generating two random numbers for each row in my data. The first random number will be used to randomize the prize door and the second will be used to randomize the choice that the contestant makes. The result is a dataset with 10,000 rows each with columns ‘prize’ and ‘choice’ to represent the doors.  They will be random integers between 1 and 3.  Our next task will be to determine which door will be revealed and determine a winner.

If our prize and choice are two different doors, then we must reveal the third door. If the prize and choice are the same, then we must choose a door to reveal. (Note: I realize that my logic in the reveal portion is somewhat flawed, but given that I am using an IF…ELSE IF and the fact that the choices are random and there isn’t any risk of introducing bias, this way of coding it was much simpler.)

data results;
set choices;
by i;
 
if prize in (1,2) and choice in (1,2) then reveal=3;
else if prize in (1,3) and choice in (1,3) then reveal=2;
else if prize in (2,3) and choice in (2,3) then reveal=1;

Once we reveal a door, we must now give the contestant the option to switch. The switch column represents a contestant who always switches; neverswitch represents one who never switches.

if reveal in (1,3) and choice in (1,3) then do;
        switch = 2; neverswitch = choice; 
end;
else if reveal in (2,3) and choice in (2,3) then do;
	switch = 1; neverswitch = choice; 
end;
else if reveal in (1,2) and choice in (1,2) then do;
	switch = 3; neverswitch = choice; 
end;

Now we create a column for the winner.  1=win 0=loss.

	switchwin = (switch=prize);
	neverswitchwin = (neverswitch=prize);
run;

Next, let’s start accumulating our results across all of our observations.  We’ll take a running tally of how many times a contestant who switches win as well as for the contestant who never switches.

data cumstats;
set results;
format cumswitch cumnever comma8.;
format pctswitch pctnever percent8.2;
retain cumswitch cumnever;
if _N_ = 1 then do;
	cumswitch = 0; cumnever = 0;
end;
else do;
cumswitch = cumswitch+switchwin;
cumnever = cumnever+neverswitchwin;
end;
 
pctswitch = cumswitch/i;
pctnever = cumnever/i;
run;
 
proc means data=results n mean std;
var switchwin neverswitchwin;
run;
legend1
frame	;
symbol1 interpol=splines;
pattern1 value=ms;
axis1
	style=1
	width=1
	minor=none ;
axis2
	style=1
	width=1
	major=none
	minor=none ;
axis3
	style=1
	width=1
	minor=none ;
title;
title1 " Cumulative chances of winning on Let's Make a Deal ";
 
proc gplot data=work.cumstats;
	plot pctnever * i  /
	areas=1
frame	vaxis=axis1
	haxis=axis2
	lvref=1
	cvref=black
	vzero
	legend=legend1;
plot2 pctswitch * i  = 2 /
  	areas=1
	vaxis=axis3
	vzero
overlay 
 	legend=legend1 ;
run; quit; 

monthy_hall8

The output of PROC MEANS shows that people who always switch (switchwin) have a win percentage of nearly 67%, while people who never switch (neverswitchwin) have a win percentage of only 33%. The area plot proves the point graphically, showing that the win percentage of switchers is well above that of the non-switchers.

Now let’s take a look at how I approached the problem in Python (keeping in mind that this language is new to me).

Now, let’s look at Python:

Copied from Jupyter Notebook

import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import accumulate
%matplotlib inline

First let's create a blank dataframe with 10,000 rows and 10 columns, then fill in the blanks with zeros.

rawdata = {'index': range(10000)}
df = pd.DataFrame(rawdata,columns=['index','prize','choice','reveal','switch','neverswitch','switchwin','neverswitchwin','cumswitch','cumnvrswt'])
df = df.fillna(0)

Now let's populate our columns. The prize column represents the door that contains the new car! The choice column represents the door that the contestant chose. We will populate them both with a random number between 1 and 3.

prize=[]
choice=[]
for row in df['index']:
    prize.append(random.randint(1,3))
    choice.append(random.randint(1,3))   
df['prize']=prize
df['choice']=choice

Now that Monty Hall has given the contestant their choice of door, he reveals the blank door that they did not choose.

reveal=[]
for i in range(len(df)):
    if (df['prize'][i] in (1,2) and df['choice'][i] in (1,2)):
        reveal.append(3)
    elif (df['prize'][i] in (1,3) and df['choice'][i] in (1,3)):
        reveal.append(2)
    elif (df['prize'][i] in (2,3) and df['choice'][i] in (2,3)):
        reveal.append(1) 
df['reveal']= reveal

Here's the rub. The contestant has chosen a door, Monty has revealed a blank door, and now he's given the contestant the option to switch to the other door. Most of the time the contestant will not switch even though they should. To prove this, we create a column called 'switch' that reflects a contestant that ALWAYS switches their choice. And, a column called 'neverswitch' that represents the opposite.

switch=[]
neverswitch=[]
for i in range(len(df)):
    if (df['reveal'][i] in (1,3) and df['choice'][i] in (1,3)):
        switch.append(2)
    elif (df['reveal'][i] in (1,2) and df['choice'][i] in (1,2)):
        switch.append(3)
    elif (df['reveal'][i] in (2,3) and df['choice'][i] in (2,3)):
        switch.append(1) 
    neverswitch = choice
df['switch']=switch
df['neverswitch']=neverswitch

Now let's create a flag for when the Always Switch contestant wins and a flag for when the Never Switch contestant wins.

switchwin=[]
neverswitchwin=[]
for i in range(len(df)):
    if (df['switch'][i]==df['prize'][i]):
        switchwin.append(1)
    else:
        switchwin.append(0)    
    if (df['neverswitch'][i]==df['prize'][i]):
        neverswitchwin.append(1)
    else:
        neverswitchwin.append(0)     
df['switchwin']=switchwin
df['neverswitchwin']=neverswitchwin

Now we accumulate the total number of wins for each contestant.

df['cumswitch']=list(accumulate(df['switchwin']))
df['cumnvrswt']=list(accumulate(df['neverswitchwin']))

…and divide by the number of observations for a win percentage.

pctswitch = df['cumswitch']/(df['index']+1)
pctnever = df['cumnvrswt']/(df['index']+1)
df['pctswitch']=pctswitch
df['pctnever']=pctnever

Now we are ready to plot the results. Green represents the win percentage of Always Switch, blue represents the win percentage of Never Switch.

x=df['index']
y=df['pctswitch']
y2=df['pctnever']
fig, ax = plt.subplots(1, 1, figsize=(12, 9))
ax.plot(x,y,lw=3, label='Always', color='green')
ax.plot(x,y2,lw=3, label='Never',color='blue',alpha=0.5)
ax.fill_between(x,y2,y, facecolor='green',alpha=0.6)
ax.fill_between(x,0,y2, facecolor='blue',alpha=0.5)
ax.set_xlabel("Iterations",size=14)
ax.set_ylabel("Win Pct",size=14)
ax.legend(loc='best')
plt.title("Cumulative chances of winning on Let's Make a Deal", size=16)
plt.grid(True)

monthy_hall9

Why does it work?

Most people think that because there are two doors left (the door you chose and the door Monty didn’t show you) that there is a fifty-fifty chance that you’ve got the prize.  But we just proved that it’s not, so “what gives”?

Remember that the door you chose at first has a 1/3 chance of winning.  That means that the other two doors combined have a 2/3 chance in winning.  Even though Monty showed us what’s behind one of those two doors, the two of them together still have a 2/3 chance of winning.  Since you know one of them is empty, that means the door you didn’t pick MUST have a 2/3 chance of winning.  You should switch.  The green line in the Python graph (or the red line in the SAS graph) shows that after having run 10,000 contestants through the game the people that always switched won 67% of the time while the people that never switched only won 33% of the time.
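The 2/3 argument can also be checked by brute force. The sketch below (not from the original post) enumerates every equally likely (prize, first choice) pair and uses the fact that the switcher always lands on the one remaining door, so they win exactly when the first pick was wrong:

```python
from itertools import product

doors = (1, 2, 3)
# All 9 equally likely (prize, first choice) combinations.
outcomes = list(product(doors, repeat=2))

# The switcher ends on the door that is neither the first pick nor the
# revealed empty door, so they win exactly when prize != choice.
switch_wins = sum(prize != choice for prize, choice in outcomes)
stay_wins = sum(prize == choice for prize, choice in outcomes)

print(switch_wins / len(outcomes))  # 6/9 = 0.666...
print(stay_wins / len(outcomes))    # 3/9 = 0.333...
```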

My comparisons and thoughts on SAS versus Python.

In terms of number of lines of code required, SAS wins hands down.  I only needed 57 lines of code to get the result in SAS, compared to 74 lines in Python. I realize that experience has a lot to do with it, but I think there is an inherent verbosity to the Python code that is not necessarily there in SAS.

In terms of ease of use, I’m going to give the edge to Python.  I really liked how easy it was to generate a random number between two values: in Python it’s a built-in function, whereas in SAS you have to perform a little arithmetic to get one.  Accumulating running totals was a similar story: in Python it was the accumulate function, while in SAS it was a DO loop that summed each of the previous values.
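Both conveniences fit in a couple of lines (a standalone sketch, not code from the post):

```python
import random
from itertools import accumulate

# A random door in one call -- no arithmetic needed.
# randint is inclusive on both ends.
door = random.randint(1, 3)

# Running totals in one call, versus a RETAIN plus a sum in a SAS DATA step.
wins = [1, 0, 1, 1]
running = list(accumulate(wins))  # [1, 1, 2, 3]
```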

In terms of iterative ability and working “free style,” I give the edge to SAS.  With Python, it is easy to iterate, but I found myself having to start all over again to pre-define columns, packages, etc., in order to complete my analysis.  With SAS, I could just code.  I didn’t have to start over because I created a new column.  I didn’t have to start over because I needed to figure out which package I needed, find it on GitHub, install it and then import it.

In terms of tabular output, SAS wins.  Easy to read, easy to generate.

In terms of graphical output, Python edges out SAS.  Both are verbose and tedious to get working, but Python wins because the output is cleaner and there are far more options.

In terms of speed, SAS wins.  On my laptop, I could change the number of rows from 10,000 to 100,000 without noticing much of a difference in speed (0.25 – 0.5 seconds).  In Python, anything over 10,000 got slow.  10,000 rows was 6 seconds, 100,000 rows was 2 minutes 20 seconds.
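For what it’s worth, most of that Python slowdown comes from the row-by-row loops rather than the language itself. A vectorized NumPy rewrite (a sketch of mine, not the code timed above) handles 100,000 contestants almost instantly:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
prize = rng.integers(1, 4, size=n)   # door hiding the car (1-3); high end exclusive
choice = rng.integers(1, 4, size=n)  # contestant's first pick (1-3)

# After Monty reveals an empty door, the switcher always lands on the one
# remaining door, so they win exactly when the first pick was wrong.
switch_pct = (prize != choice).mean()
never_pct = (prize == choice).mean()
```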

Of course, this speed has a resource cost.  In those terms, Python wins.  My Anaconda installation is under 2GB of disk space, while my particular deployment of SAS requires 50GB of disk space.

Finally, in terms of mathematics, they tied.  They both produce the same answer as expected.  Of course, I used extremely common packages that are well used and tested.  Newer or more sophisticated packages are often tested against SAS as the standard for accuracy.

But in the end, comparing the two purely as languages only goes so far.  Python is a much more versatile object-oriented language with capabilities that SAS doesn’t have, while SAS’ mature DATA step can do things to data that Python has difficulty with.  Most important, though, is the release of SAS Viya.  Through Viya’s open APIs and microservices, SAS is transforming itself into something more than just a coding language: it aims to be the analytical platform that all data scientists can use to get their work done.

tags: Python, SAS Programmers

The Monty Hall Paradox - SAS vs. Python was published on SAS Users.

10月 062016
 

One very useful type of auditing for a SAS administrator is to have summary data about the availability and performance of various resources (platforms, servers, services) from the 30,000-foot view.  Using SAS Environment Manager, it's easy to go in and look at the availability of any one resource over various time spans--for the past few hours, past day, past week, or past month and more.  This is a very powerful way to summarize how much of the time a given resource was "up and responsive."

auditing-sas-server-availability

However, there's no way to see that type of information, or even a summary of that information, for all your servers at once.  A typical deployment will have anywhere from 10 to 50 or more different servers, and to view the availability of them all, over an extended period, you would have to visit and drill down each resource, one at a time.

To help answer this problem, I’ve developed a simple report that summarizes availability for all servers.  It uses two data sets that are generated automatically as part of the SAS Environment Manager Data Mart: availability and resourceinventory.  It doesn’t provide the hour-by-hour detail that the Monitoring interface of SAS Environment Manager does, but it gives you the percent of time available, for each server, for the past day, week, month, or even quarter, in one summary report (provided you have the data for it).  For a production environment, this can provide helpful “big-picture” information on availability.
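The core computation behind the report is simple: for each server, total the minutes spent in "down" intervals over the reporting window and divide by the total minutes. A toy sketch of that logic in pandas (the miniature table and its values are made up; the column names only mimic the acm.availability data set):

```python
import pandas as pd

# One row per availability interval; avail=1 means the server was up.
avail = pd.DataFrame({
    'resource_id': [101, 101, 102],
    'avail':       [1,    0,   1],
    'minutes':     [1380, 60, 1440],
})

total = avail.groupby('resource_id')['minutes'].sum()
down = (avail[avail['avail'] == 0]
        .groupby('resource_id')['minutes'].sum()
        .reindex(total.index, fill_value=0))   # servers never down get 0
pct_up = (total - down) / total * 100
print(pct_up)  # 101 -> 95.83..., 102 -> 100.0
```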

Building the report will involve copying an existing report, used as a template, modifying a bit of the metadata for that report, and copying the SAS code that generates the report.  The report uses a stored process to do the work.

Here's how to create the report.

1. Copy the SAS code provided here and save it in a local file:

If you are working on a remote machine, such as through Remote Desktop or mobaXterm, then upload the file to the machine running the metadata server so it will be available to copy into the report metadata.

2. Log into SAS Management Console as sasadm@saspw, the SAS administrator.

3. Select the Folders tab (upper left), right mouse on the Shared Data folder, select New->Folder, and name it Custom Reports:

auditing-sas-server-availability02

4. Navigate to the folder Products->SAS Environment Manager->Custom and locate the existing example report called Example 2 Call EV Macros with prompts.

auditing-sas-server-availability03

5. Copy the example report into the new Custom Reports folder created in Step 3 above, then use the right mouse menu to rename it to Availability Report.  When complete, you will see the new folder and report inside it:

auditing-sas-server-availability04

6. Select the new Availability Report, right mouse, select Properties.

7. On the General tab, specify Name, Description, and Keywords as shown:

auditing-sas-server-availability05

It's especially important to have the keywords "Environment Manager" in the Keywords field, so that the report will appear in the Report Center of Environment Manager.

8. On the Execution tab, select your Application server, and below, specify Store source code in metadata.  Then click the Edit Source Code... button:

auditing-sas-server-availability06

9. Copy/paste the SAS code (Step 1 above) into the source code window and click OK to save it.

auditing-sas-server-availability07

10.  On the Parameters tab, you'll see Titles and Footnotes, and Output Formats and Debugging.  Select and Delete the Titles and Footnotes group; leave the Output Formats and Debugging group as is.

auditing-sas-server-availability08

11. Select General, then click the New Group... button (right side), and specify:

Group type:  Standard group

Displayed text: Time Period

Click OK button

auditing-sas-server-availability09

12. Select the new group, Time Period.  Click the New Prompt button.  Type the string "TimePeriod" (no space) in the Name field and “Time Period” in the Displayed Text field.  Then click OK.

auditing-sas-server-availability10

13.  Select the new prompt TimePeriod, click the Edit button on the right side.  Select the Prompt Type and Values tab at the top.  On the Edit Prompt screen, specify:

Prompt type:   Text

Method for populating prompt:  User selects values from a static list

Number of values: Single value

Use the Add button (right side) to insert the following four choices shown below:

auditing-sas-server-availability11

Set the "week" value as the default as shown.   Click the OK button.  Now, select the new Time Period prompt and click the Move Up button (right side), so that the Time Period prompt appears first:

auditing-sas-server-availability12

If desired, you can try the Test Prompts button to make sure your prompts are correct.  Click OK again to save all changes.

14. Test the new report:
Log into SAS Environment Manager as an administrator and go to the Report Center. Open the Shared Data / Custom Reports folder and find the new report "Availability Report."  (If you were already logged into Environment Manager, you have to log out/log in again to pick up the new metadata.)

auditing-sas-server-availability13

Notice that you have the custom prompt for Time Period, where you can choose “day,” “week,” “month,” or “quarter” for the time window, and the standard prompts for output styles and logging and debugging settings.  Click the Run button at the bottom, and you should see your new report, similar to this (yellow indicates any percent less than 100):

auditing-sas-server-availability14

You can install this report for any site with the SAS Environment Manager Data Mart active and have it running in just a few minutes.  You could pull the SAS code from it and create your own scheduled run, weekly or monthly, as a means to help to evaluate a deployment for reliability.

A few cautions are in order:

  • You need to have at least some data in your data mart in order to run this report. If you don't have enough data to cover the time period that you request, you will get an error message.
  • The data this report provides is summary level, so if you find one or more servers
    that have been down a significant amount of time, the next step would be to use the SAS Environment Manager to focus in on those troubled resources and examine exactly WHEN they were down, for exactly how long, how many times they were down, and/or look at other metrics on those resources as a way to ascertain causes.
  • This is a simple program without much error checking.  However, you can specify in the parameters to see the SAS log, which should enable you to debug it should issues occur.  And of course feel free to enhance it or add to it!

One easy check you can do by hand is to verify the existence of the data sets being used, and that the libref used, ACM, has been defined.  This is what they looked like on my compute machine, where the SAS Environment Manager Service Architecture was installed:

auditing-sas-server-availability15

auditing-sas-server-availability16

For additional insight on how you can use SAS Environment Manger and the report center to analyze the "big picture" on a SAS deployment's availability, check out this YouTube video.

SAS code referenced above.

/*********************************************************************/
/*   Availability.sas       SAS 9.4   M3                             */
/*      Runs report in SAS EV (Report Center) on availability of     */
/*      server resources.  Requires macro variable to be defined     */
/*      TimePeriod, with value of (dtday, dtweek, dtmonth, or dtqtr) */
/*      Requires data sets:  acm.resourceinventory                   */
/*                           acm.availability                        */
/*            from Service Architecture Framework datamart           */
/*                          Dave Naden August 2016                   */
/*********************************************************************/
 
*ProcessBody;
 
%global _DEBUG
        _EVDEBUG
        _EVODSDEST
        _EVODSSTYLE
        _EVSHOWPARM
        FNOTE1
        FNOTE2
        FNOTE3
        TITLE1
        TITLE2;
 
 
%evdm_kickstart();
%include "&SASEVCONFIGDIR/Datamart/EVDMAUTOEXEC_STP.sas";
 
ods _ALL_ CLOSE;
 
%evodssetopt(evdest=&_evodsdest,style=&_evodsstyle);
%evgopts;
 
%evspinit;
 
%ev_promptselects(varname=_debug);
 
%let evbegin=%ev_startdate(%sysfunc(date()),-10);
%let evend=%sysfunc(datetime(),datetime.);
 
%let period=&TimePeriod;     /* can be dtday, dtweek, dtmonth, or dtqtr */
 
%STPBEGIN;
%macro available;
/* get the resource IDs for servers only  */
data servers;
   set acm.resourceinventory(keep=id name type invLevel);
      if (invlevel = "SERVER");
     /* remove a few servers that are not really servers  */
      realserver = 1;
      if (index(type,"Directory") > 0) then realserver = 0;
      if (index(type,"Data Mart") > 0) then realserver = 0;
      if (index(type,"Server Context") > 0) then realserver = 0;
      if (index(type,"System Info") > 0) then realserver = 0;
      if (realserver) then output;
 
      rename id = resource_id;
 
 proc sort data=servers;
    by resource_id;
 
/* get availability data, prepare for merge */
 proc sort data = acm.availability(keep=resource_id name avail starttime endtime minutes)
    out=avail;
    by resource_id;
 
 /* keep data only if it's server data, and if there's any records  */
 /*  in the availability data set   */
 data avail;
    merge servers(in=a)
          avail;
       by resource_id;
       format now datetime24. ;
      if (a) and (avail ne .) ;
 
 /* get maximum (most recent) endtime  */
 proc summary data=avail;
    output out=timewindow(rename=(endtime=maxendtime)) max=; 
    var endtime;
 
 /* Get earliest start time for this data set  */
 proc summary data=avail;
    output out=earliesttime(rename=(starttime=earliest)) min=;
    var starttime; 
 
%let enoughdata=1;  
 
 /* establish the time window for report requested, and check whether  */
 /* data goes back far enough to run the report.  If not, set flag     */
 data timewindow;
    set timewindow ;
    set earliesttime;
       period = symget('period');
       begintime = intnx(period,maxendtime,-1,'S');
       format begintime datetime24.  ;
       totalminutes = intck('DTMINUTES',begintime,maxendtime,'C');
       if (earliest > begintime) then call symput('enoughdata','0');
run; 
/* subset data set to include only records within that time window requested  */
%if &enoughdata %then %do;
data avail;
   if (_N_ = 1) then set timewindow;
   set avail;
      if (endtime < begintime) then delete;
 
 /* aggregate to resource level (server), calculate percent available */
 data avail(keep=name type resource_id minutesdown percentup  );
    set avail;
       by resource_id;
       retain minutesdown;
          if first.resource_id then minutesdown=0;
          if (avail < 1) then do;
             if (starttime < begintime) then do;
             /* get number of minutes between begintime and endtime */
                minutes = intck('DTMINUTES',begintime,endtime,'C'); 
             end; 
             minutesdown = minutesdown + minutes;
          end;   
          if last.resource_id then do;
             if (minutesdown = 0) then percentup = 100;
             else percentup = ((totalminutes - minutesdown) / totalminutes) * 100;
             output;
          end;
   call symput("begintime",put(begintime,datetime24.));
   call symput("maxendtime", put(maxendtime,datetime24.)) ;
   call symput ("currentdatetime",put(datetime(),datetime24.)) ;
   %let period = %substr(&period,3);
 
proc sort data=avail;
   by percentup;
 
 
proc report data=avail center split="*";
title "Server-type resources, showing percent of time up, past &period only";
title2 "Date Interval used: &begintime to &maxendtime ";
footnote "Date of Report:  &currentdatetime ";
   column name type minutesdown percentup; 
   compute percentup;
      if percentup.sum < 100 then
         call define("percentup","style","style={background=yellow}");
   endcomp;
run;
%end;
%else %do;
   %put ERROR: Not enough data in the Data Mart to cover the requested time period.;
%end;
%mend available;
%available;
%STPEND;
tags: SAS Administrators, SAS Environment Manager, SAS Professional Services

Auditing SAS server availability from 30,000 feet was published on SAS Users.

10月 052016
 

streamviewer

SAS Event Stream Processing (ESP) can not only process structured streaming events (a collection of fields) in real time, but also has very advanced features for collecting and analyzing unstructured events. Twitter is one of the best-known social network applications and probably the first that comes to mind when thinking about streaming data sources. On the other hand, SAS has powerful solutions to analyze unstructured data with SAS Text Analytics. This post is about merging two needs: collecting unstructured data coming from Twitter and doing some text analytics processing on tweets (contextual extraction, content categorization and sentiment analysis).

Before moving forward: SAS ESP is based on a publish and subscribe model. Events are injected into an ESP model using an “adapter,” a “connector,” or the publisher API (for example, from Python). Target applications consume enriched events output by ESP using the same technology, “adapters” and “connectors.” SAS ESP provides lots of them, in order to integrate with static and dynamic applications.

Then, an ESP model flow is composed of “windows” which are basically the type of transformation we want to perform on streaming events. It can be basic data management (join, compute, filter, aggregate, etc.) as well as advanced processing (data quality, pattern detection, streaming analytics, etc.).

SAS ESP Twitter Adapters background

SAS ESP 4.2 provides two adapters to connect to Twitter as a data source and to publish events from Twitter (one event per tweet) to a running ESP model. There are no equivalent connectors for Twitter.

The two adapters, both publisher-only, are:

  • Twitter Publisher Adapter
  • Twitter Gnip Publisher Adapter

The second one is more advanced, using a different API (GNIP, bought by Twitter) and providing additional capabilities (access to history of tweets) and performance. The adapter builds event blocks from a Twitter Gnip firehose stream and publishes them to a source window. Access to this Twitter stream is restricted to Twitter-approved parties. Access requires a signed agreement.

In this article, we will focus on the first adapter. It consumes Twitter streams and injects event blocks into source windows of an ESP engine, and it works with a free Twitter account. The default access level of a Twitter account allows us to use the following methods:

  • Sample: Starts listening on random sample of all public statuses.
  • Filter: Starts consuming public statuses that match one or more filter predicates.

SAS ESP Text Analytics background

SAS ESP 4.1/4.2 provides three window types (event transformation nodes) to perform Text Analytics in real time on incoming events.

The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.

text-analytics-on-twitter

Here are the SAS ESP Text Analytics features:

  • “Text Category” window:
    • Content categorization or document classification into topics
    • Automatically identify or extract content that matches predefined criteria to more easily search by, report on, and model/segment by important themes
    • Relies on “.mco” binary files coming from SAS Contextual Analysis solution
  • “Text Context” window:
    • Contextual extraction of named entities (people, titles, locations, dates, companies, etc.) or facts of interest
    • Relies on “.li” binary files coming from SAS Contextual Analysis solution
  • “Text Sentiment” window:
    • Sentiment analysis of text coming from documents, social networks, emails, etc.
    • Classify documents and specific attributes/features as having positive, negative, or neutral/mixed tone
    • Relies on “.sam” binary files coming from SAS Sentiment Analysis solution
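The “.sam” model itself is a compiled binary, but the general idea of lexicon-style sentiment scoring can be sketched in a few lines. This is a toy illustration only (the word lists and function are invented, not SAS Sentiment Analysis):

```python
# Tiny made-up lexicons; a real model is trained, not hand-written.
POSITIVE = {'great', 'love', 'fast'}
NEGATIVE = {'slow', 'broken', 'hate'}

def sentiment(text):
    """Classify text as positive, negative, or neutral by lexicon hits."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return 'positive' if score > 0 else 'negative' if score < 0 else 'neutral'

print(sentiment('I love my new iphone, it is fast'))  # positive
```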

Binary files (“.mco”, “.li”, “.sam”) cannot be reverse engineered. The original projects in their corresponding solutions (SAS Contextual Analysis or SAS Sentiment Analysis) should be used to perform modifications on those binaries.

The ESP project

The following ESP project is aimed to:

  • Wait for events coming from Twitter in the source Twitter window (this is a source window, the only entry point for streaming events)
  • Perform basic event processing and counting
  • Perform text analytics on tweets (in the input streaming, the tweet text is injected as a single field)

text-analytics-on-twitter02

Let’s have a look at potential text analytics results.

Here is a sample of the Twitter stream that SAS ESP is able to catch (the tweet text is collected in a field called tw_Text):

text-analytics-on-twitter03

The “Text Category” window, with an associated “.mco” file, is able to classify tweets into topics/categories with a related score:

text-analytics-on-twitter04

The “Text Context” window, with an associated “.li” file, is able to extract terms and their corresponding entity (person, location, currency, etc.) from a tweet:

text-analytics-on-twitter05

The “Text Sentiment” window, with an associated “.sam” file, is able to determine a sentiment with a probability from a tweet:

text-analytics-on-twitter06

Run the Twitter adapter

In order to inject events into a running ESP model, the Twitter adapter should be started and is going to publish live tweets into the sourceTwitter window of our model.

text-analytics-on-twitter07

Here we search for tweets containing “iphone”, but you can change to any keyword you want to track (assuming people are tweeting on that keyword…).

There are many additional options: -f allows you to follow specific user IDs, -p allows you to specify locations of interest, etc.

Consume enriched events with SAS ESP Streamviewer

SAS ESP provides a way to render events in real-time graphically. Here is an example of how to consume real-time events in a powerful dashboard.

streamviewer

Conclusion

With SAS ESP, you can bring the power of SAS Analytics into the real-time world. Performing Text Analytics (content categorization, sentiment analysis, reputation management, etc.) on the fly on text coming from tweets, documents, emails, etc. and triggering consequently some relevant actions have never been so simple and so fast.

tags: SAS Event Stream Processing, SAS Professional Services, SAS Text Analytics, twitter

How to perform real time Text Analytics on Twitter streaming data in SAS ESP was published on SAS Users.

9月 302016
 

Security

In a number of my previous blogs I have discussed auditing within a SAS environment and how to identify who has accessed data or changed reports. For many companies keeping an audit trail is very important. If you’re an administrator in your environment and auditing is important at your organization, here are a few steps to take to secure the auditing setup and audit any changes made to it, all the while ensuring there are no gaps in collecting this information.

In a SAS deployment, the logging configuration is stored in XML files for each server.  The XML files can be secured with OS permissions to prevent unauthorized changes. However, there are a number of ways in which a user can temporarily adjust logging settings, which may allow them to prevent audit messages from reaching a log.
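The idea of locking the configuration file down with OS permissions can be sketched with a stand-in file (the real logconfig.xml path varies by deployment; the path below is only a placeholder):

```python
import os
import stat
import tempfile

# Stand-in for a server's logconfig.xml (real path varies by deployment).
cfg = os.path.join(tempfile.mkdtemp(), 'logconfig.xml')
open(cfg, 'w').close()

# Owner read/write, group read, no access for others (mode 640).
os.chmod(cfg, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
mode = oct(os.stat(cfg).st_mode & 0o777)
print(mode)  # '0o640'
```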

Logging can be adjusted dynamically:

  • In SAS code, using logging functions and macros
  • In SAS Management Console
  • Using PROC IOMOPERATE

As an example of how a user could circumvent the audit trail let’s look at an environment where logging has been configured to audit access to SAS datasets using the settings described in this blog. When auditing is enabled, messages are recorded in the log when users access a table. In the test case for the blog we have a stored process that prints a SAS dataset. When the stored process runs, the log will record the table that was opened and the user that opened it.

protect-your-audit-trail-in-a-sas-environment

A user could turn auditing off by adding a call to the log4sas_logger function. The code below, included at the start of the Stored Process (or any SAS program), would turn dataset audit logging off for the duration of the execution. Any access of datasets in the Stored Process would no longer be audited.

data _null_;
rc=log4sas_logger("Audit.Data.Dataset.Open", "level=off");
run;

There is an option we can add to the logger to prevent this from happening. The IMMUTABILITY option specifies whether a logger’s additivity and level settings are permanent or whether they can be changed by using the SAS language. In the code above the level of the logger was changed using the log4sas logger function.

If we add the immutability=”true” setting to the logger, a user can no longer adjust the logging level dynamically using the DATA step functions; attempts to adjust the logger’s level will fail with an error. For more detail, see the SAS Logging Facility documentation.

<logger name="Audit.Data.Dataset.Open" immutability="true">
   <appender-ref ref="TimeBasedRollingFileAudit"/>
   <level value="Trace"/>
</logger>

protect-your-audit-trail-in-a-sas-environment6

However, this setting does not fully prevent dynamic changes to logging configuration. IMMUTABILITY is ignored for configuration changes made by administrators using SAS Management Console or the IOMOPERATE procedure.

Users who have access to the SAS Management Console Server Manager plug-in can change logging on the fly. In SAS Management Console select Server Manager > SASApp > SAS App Logical Stored Process Server > <machine> right-click and select Connect.  This will display the process ID of an active Stored Process Server. Select the PID and select the loggers tab. This tab displays the current setting for the active loggers on the SAS server. The Audit.Data.Dataset.Open logger has a level of TRACE, this is the setting that is writing audit messages about activity on the dataset to the log.

protect-your-audit-trail-in-a-sas-environment2

An administrator can change the level of the logger by right-clicking on the logger name, and selecting Properties. In this example we will turn logging off for the Audit.Data.Dataset.Open logger.

protect-your-audit-trail-in-a-sas-environment3

Now any run of a stored process will no longer record audit messages about data access to the log.

Users could also use PROC IOMOPERATE to change the logging configuration on a server. The code below dynamically finds the serverid of the stored process server and then turns audit logging off. The key piece of code is the set attribute line near the end:

%let spawnerURI = %str(iom://server:8581;Bridge;user=sasadm@saspw,pass=???);
proc iomoperate ;
connect uri="&spawnerURI";
list spawned;
list spawned out=obj_spawned;
quit;
data _null_;
set work.obj_spawned;
where upcase( serverComponent ) like '%STORED%';
call symput("stpid", strip(serverid) );
run;
proc iomoperate;
connect uri="&spawnerURI"
spawned="&stpid"   ;
set attribute category="Loggers" name="Audit.Data.Dataset.Open" value="OFF";
quit;

You must be a SAS Administrator to change logging configuration dynamically with SAS Management Console and PROC IOMOPERATE. While access to this functionality can be restricted, it does leave a potential gap in the auditing setup.  As a further failsafe you can actually configure auditing so that any changes made to logging configuration results in audit messages about the change in a log file. Effective with SAS 9.4, the following loggers capture these changes:

  • Logging.Configuration.Logger
  • Logging.Configuration.Appender
  • Logging.Configuration.Config

Let’s see how we can audit changes that are made to the Stored Process server’s logging configuration file.  In the existing configuration file, add a new appender. The new appender (below) will write messages to a separate log in the AuditLogs directory of the Stored Process server (make sure this directory exists). The log will have a name similar to SASApp_LoggingChange_2016-09-15_sasbi_16024.log

<!-- Rolling log file with default rollover of midnight for Logging Auditing -->
<appender class="RollingFileAppender" name="TimeBasedRollingFileLogging">
   <param name="Append" value="false"/>
   <param name="Unique" value="true"/>
   <param name="ImmediateFlush" value="true"/>
   <rollingPolicy class="TimeBasedRollingPolicy">
      <param name="FileNamePattern" value="D:\SAS\vaconfig\Lev1\SASApp\StoredProcessServer\AuditLogs\SASApp_LoggingChange_%d_%S{hostname}_%S{pid}.log"/>
   </rollingPolicy>
   <layout>
      <param name="HeaderPattern" value="Host: '%S{hostname}', OS: '%S{os_family}', Release: '%S{os_release}', SAS Version: '%S{sup_ver_long2}', Command: '%S{startup_cmd}'"/>
      <param name="ConversionPattern" value="%d %-5p [%t] %X{Client.ID}:%u - %m"/>
   </layout>
</appender>

Now we will add a logger that references the new appender. The Audit.Logging logger will write all changes to loggers and appenders to the referenced appender.  After you make changes to logging configuration make sure you validate the server in SAS Management Console; syntax errors in logging configuration files can render SAS servers inoperable.

<logger name="Audit.Logging">
   <level value="Trace"/>
   <appender-ref ref="TimeBasedRollingFileLogging"/>
</logger>

Once you have this set up, any dynamic changes made to logging configuration, either through the SAS language, PROC IOMOPERATE or SAS Management Console, will write a record to the new logging configuration auditing log.
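Once these records are being written, it is straightforward to sweep the audit log for configuration changes. A sketch, assuming a line formatted by the ConversionPattern shown above (the sample line and its message text are made up for illustration):

```python
import re

# Hypothetical line produced by the "%d %-5p [%t] %X{Client.ID}:%u - %m" pattern.
line = ('2016-09-15T10:32:01,123 INFO  [12] IOM Bridge:sasdemo - '
        'Logger Audit.Data.Dataset.Open level set to off')

# timestamp, level, [thread], client-id:user - message
pat = re.compile(r'^(\S+)\s+(\w+)\s+\[\d+\]\s+(.*?):(\S+) - (.*)$')
m = pat.match(line)
timestamp, level, client, user, message = m.groups()

# Flag any record that mentions an audit logger's level being altered.
suspicious = 'Audit.' in message and 'level' in message
print(user, suspicious)
```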

protect-your-audit-trail-in-a-sas-environment5

When auditing is a critical requirement for a SAS deployment it is important for administrators to close any gaps which would allow actions to fail to write audit records.

These gaps can be closed by:

  • Securing the logging configuration file using operating system permissions
  • Ensuring that key loggers are configured as immutable
  • Restricting access to the SAS Management Console Server Manager plug-in
  • Ensuring that only key administrators have access to credentials that can change auditing configuration using SAS Management Console and PROC IOMOPERATE
  • Configuring auditing for changes to the logging configuration
tags: SAS Administrators, SAS Professional Services

How to protect your audit trail in a SAS environment was published on SAS Users.