4月 212016

SAS Global ForumThere are hundreds of breakout sessions happening at SAS Global Forum in both the Users and Executive programs. Since we couldn’t just pick one to highlight from opening day, we decided to put together a SAS Global Forum day 1 session roundup, highlighting some of our very favorites!

The big data behind fantasy football

With millions of users, peak traffic seasons and thousands of requests a second for complex user-specific data, fantasy football offers many challenges for even the most talented analytical minds. Clint Carpenter, one of the principal architects of the NFL fantasy football program, shared strategies and lessons learned behind football fanatics’ favorite application.

Fantasy football combines a high volume of users with detailed personalized data; multiple data sources; various data consumers; and high peak volumes of request. The challenge is to process the data from the stadium playing field and user devices, and make it easily accessible to a variety of different services. If there’s something to learn from developing and analyzing fantasy football over the years, Carpenter said it’s these three things: don’t blindly trust data or specifications; spend time planning upfront to avoid problems in the end; and test for data integrity, performance and for the whole system. “If you test well, you will have happy stakeholders,” said Carpenter. “If you don’t, you are asking for unhappy users and sleepless nights.”

More from this presentation.

One university’s solution to the data science talent gap

Is it time for a Ph.D. in data science? If you ask Jennifer Lewis Priestly, who happens to be the director of Kennesaw State University’s new Ph.D. in data science, the answer is yes, but there are areas we have to consider and address in order to make it work.

“Closing the talent gap is a problem and a challenge for our global economy,” said Priestly. The demand for deep analytical talent in the United States could potentially be 50 to 60 percent greater than its projected supply by 2018. And that demand is creating a first for academia, forcing companies across industry sectors to chase the same talent pool of students.

But it’s not just the skills gap that has to be addressed, Priestly said we also have to consider the rising master’s degree explosion. Today, analytically-aligned master’s programs are popping up across the country, and most can be completed between 10 to 18 months. But can institutions transform a student into a data scientist that fast? Offering a data science Ph.D. allows students to dive into the complexity of data science, rather than skim the surface.

So, if we find the talent and design the program, who will teach all of these students? “We have to put these students out into the market to fill these jobs, but we also have to put them back into colleges and universities to train up our future talent,” Priestly said.

View a video of this talk.
View the presentation.

Turning data into stories

Your data matters, but unless people emotionally connect with the data presented, it’s going to fall short. By not offering context, you risk having an audience miss your vision, draw their own conclusions or misunderstand the root of the problem you are trying to solve.

The question then becomes how? How do you actually get someone to engage and connect with the numbers? You’ve got to tell a story. Bree Baich, Principal Learning and Development Specialist with SAS Best Practices, gave her session attendees tips and tricks to turning data into stories that make sense.

“Data storytelling is a real thing, connecting the logical and emotional parts of our brain to not just make sense of the data, but to connect it in a way that causes a response,” Baich said. With an easy, four-step plan, Baich helped attendees see how getting data-driven stories is easier than we think.

  1. The story setup allows your audience to become curious and garner interest from the start. It’s a way to spark curiosity upfront by using a hook.
  2. The context paints a picture of the current realities, providing real understanding of the information at hand.
  3. The options show your audience where you want them to go. Think of it as an opportunity to demonstrate why your option is the better choice that will make a real difference.
  4. The action leaves a call to action and is key to pushing stakeholders to make a decision or getting customers to purchase.
    Remember, data shouldn’t stand alone. Next time, shape it with a story!

More from this presentation.







tags: SAS Global Forum

Highlights from SAS Global Forum 2016: Fantasy football, the data science talent gap and data storytelling was published on SAS Users.

4月 192016

Copy Data to Hadoop using SASWith the release of SAS® 9.4 M3, you can now access SAS Scalable Performance Data Engine (SPD Engine) data using Hive. SAS provides a custom Hive SerDe for reading SAS SPD Engine data stored on HDFS, enabling users to access the SPD Engine table from other applications.

The SPD Engine Hive SerDe is delivered in the form of two JAR files. Users need to deploy the SerDe JAR files under “../hadoop-mapreduce/lib” and “../hive/lib” on all nodes of a Hadoop cluster to enable the environment. To access the SPD Engine table from Hive, you need to register the SPD Engine table under Hive metadata using the metastore registration utility provided by SAS.

The Hive SerDe is read-only and cannot serialize data for storage in HDFS. The Hive SerDe does not support creating, altering, or updating SPD Engine data in HDFS using HiveQL or other languages. For those functions, you would use the SPD Engine with SAS applications.


Before you can access an SPD Engine table using Hive SerDe you have to perform the following:

  • Deploy the SAS Foundation software using SAS Deployment Wizard.
  • Select the product name “SAS Hive SerDe for SPDE Data”

Accessing SPD Engine Data using Hive

This will create a subfolder under $sashome with the SerDe JAR files

[root@sasserver01 9.4]# pwd
[root@sasserver01 9.4]# ls -l
total 88
drwxr-xr-x. 3 sasinst sas 4096 Mar 8 15:52 installs
-r-xr-xr-x. 1 sasinst sas 8615 Apr 15 2015
-rw-r--r--. 1 sasinst sas 62254 Jun 24 2015 sas.HiveSerdeSPDE.jar
-rw-r--r--. 1 sasinst sas 6998 Jun 24 2015 sas.HiveSerdeSPDE.nls.jar
[root@sasserver01 9.4]#

  • You must be running a supported Hadoop distribution that includes Hive 0.13 or later:

> Cloudera CDH 5.2 or later
> Hortonworks HDP 2.1 or later
> MapR 4.0.2 or later

  • The SPD Engine table stored in HDFS must have been created using the SPD Engine.
  • The Hive SerDe is delivered as two JAR files, which must be deployed to all nodes in the Hadoop cluster.
  • The SPD Engine table must be registered in the Hive metastore using the metastore registration utility supplied by SAS. You cannot use any other method to register tables.

Deploying the Hive SerDe on the Hadoop cluster

Deploy the SAS Hive SerDe on the Hadoop cluster by executing the script “”. This script is located in the SAS Hive SerDe software deployed folder. Follow the steps below, which describe the SAS Hive SerDe deployment on a Hadoop cluster.

  • Copy the script file along with two JAR files to one of the nodes (NameNode server). For example, in my test environment, files were copied to the sascdh01 (NameNode) server with user ‘hadoop’.

[hadoop@sascdh01 SPDEHiveSerde]$ pwd
[hadoop@sascdh01 SPDEHiveSerde]$ ls -l
total 84
-rwxr-xr-x 1 hadoop hadoop 8615 Mar 8 15:57
-rw-r--r-- 1 hadoop hadoop 62254 Mar 8 15:57 sas.HiveSerdeSPDE.jar
-rw-r--r-- 1 hadoop hadoop 6998 Mar 8 15:57 sas.HiveSerdeSPDE.nls.jar
[hadoop@sascdh01 SPDEHiveSerde]$

  • The node server (NameNode) must be able to use SSH to access the other data nodes in cluster. It’s recommended to execute the deployment script as user ‘root’ or with sudo su command.
  • Switch user to ‘root’ or user with ‘sudo su’ permission.
  • Set the Hadoop CLASSPATH to include the MapReduce and Hive Library installation directory. Set the SERDE_HOSTLIST to include the server where JAR files will be deployed. For example, for my test environment the following statement is used.

export CLASSPATH=/usr/lib/hive/lib/*:/usr/lib/hadoop-mapreduce/lib/*

  • Execute the script as user ‘root’ to deploy the JAR files on all nodes under “ ../hive/lib” and “../hadoop-mapreduce/lib” subfolder. While running the script, provide the location of MapReduce and the Hive library installation folder as parameters to script.

For example:

sh -mr /usr/lib/hadoop-mapreduce/lib -hive /usr/lib/hive/lib
[root@sascdh01 SPDEHiveSerde]# sh -mr /usr/lib/hadoop-mapreduce/lib -hive /usr/lib/hive/lib
scp -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null /root/Downloads/SPDEHiveSerde/sas.HiveSerdeSPDE.jar root@sascdh01:/usr/lib/hive/lib
scp -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null /root/Downloads/SPDEHiveSerde/sas.HiveSerdeSPDE.nls.jar root@sascdh03:/usr/lib/hadoop-mapreduce/lib
[root@sascdh01 SPDEHiveSerde]#

  • Restart YARN/MapReduce and Hive services on the Hadoop cluster.

Registering the SAS Scalable Performance Data Engine table in Hive metadata

The SPD Engine table that you are planning to access from Hive must be registered to Hive metadata using the SAS provided metadata registration utility. You cannot use any other method to register tables. The utility reads an SPD Engine table’s metadata file (.mdf) in HDFS and creates Hive metadata in the Hive metastore as table properties. Registering the SPD Engine table projects a schema-like structure onto the table and creates Hive metadata about the location and structure of the data in HDFS.

Because the utility reads the SPD Engine table’s metadata file that is stored in HDFS, if the metadata is changed by the SPD Engine, you must re-register the table.

The metadata registration utility can be executed from one of the Hadoop cluster node server, preferably NameNode server. The code examples mentioned here are all from NameNode server.

The following steps describe the SPD Engine table registration to Hive metadata.

  • Set the Hadoop CLASSPATH to include a directory with the client Hadoop configuration files and SerDe JAR files.

The following example is from my test environment where two SerDe JAR files are copied under the “/home/hadoop/SPDEHiveSerde/” subfolder. This subfolder is owned by OS user ‘hadoop’, i.e., the user who will execute the table registration utility. While exporting CLASSPATH, you must also include ../hive/lib folder as part of classpath. For the Hadoop configuration XML file, here we are using /etc/hive/conf folder. If you have a separate folder for storing Hadoop configuration files, you can plug in that folder.

export CLASSPATH=/home/hadoop/SPDEHiveSerde/*:/usr/lib/hive/lib/*
export SAS_HADOOP_CONFIG_PATH=/etc/hive/conf/

As a result of exporting Hadoop CLASSPATH, the output from ‘hadoop classpath’ statement should look like as follows. Notice the value that you included in your previous export statement.

[hadoop@sascdh01 ~]$ hadoop classpath
[hadoop@sascdh01 ~]$

  • Run the SerDe JAR command with appropriate command parameters and options to register the SPD Engine table. For example, the following command executes the SerDe JAR files and registers an SPD Engine table named stocks. It specifies the HDFS directory location (/user/lasradm/SPDEData) that contains the .mdf file of that SPD Engine table. The –table and –mdflocation parameters are required.

hadoop jar /home/hadoop/SPDEHiveSerde/sas.HiveSerdeSPDE.jar -table stocks -mdflocation /user/lasradm/SPDEData

[hadoop@sascdh01 ~]$ hadoop jar /home/hadoop/SPDEHiveSerde/sas.HiveSerdeSPDE.jar -table stocks -mdflocation /user/lasradm/SPDEData
16/03/09 16:46:35 INFO hive.metastore: Trying to connect to metastore with URI thrift://
16/03/09 16:46:35 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/03/09 16:46:36 INFO hive.metastore: Connected to metastore.
16/03/09 16:46:36 INFO hive.MetastoreRegistration: Table is registered in the Hive metastore as default.stocks
[hadoop@sascdh01 ~]$

Reading SAS Scalable Performance Data Engine table data from Hive

Once the SPD Engine table is registered in Hive metadata, you can query the SPD Engine table data via Hive. If you describe the table with the formatted option, you will see that the data file locations are the SPD Engine locations. The Storage section provides information about SerDe library, which is ‘’.

hive> show tables;

Time taken: 0.025 seconds, Fetched: 15 row(s)

hive> select count(*) from stocks;
Query ID = hadoop_20160309171515_9db3aed5-0ba4-40cc-acc4-56acee10a275
Total jobs = 1
Total MapReduce CPU Time Spent: 2 seconds 860 msec
Time taken: 38.734 seconds, Fetched: 1 row(s)

hive> describe formatted stocks;
# col_name data_type comment

stock varchar(9) from deserializer
date date from deserializer
open double from deserializer
high double from deserializer
low double from deserializer
close double from deserializer
volume double from deserializer
adjclose double from deserializer

# Detailed Table Information
Database: default
Owner: anonymous
CreateTime: Wed Mar 09 16:46:36 EST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://
Table Parameters:
adjclose.length 8
adjclose.offset 48
close.length 8
close.offset 32
date.length 8
date.offset 0
high.length 8
high.offset 16
low.length 8
low.offset 24
numFiles 0
numRows -1
open.length 8
open.offset 8
rawDataSize -1
spd.byte.order LITTLE_ENDIAN
spd.column.count 8
spd.encoding ISO-8859-1
spd.mdf.location hdfs://
spd.record.length 72
spde.serde.version.number 9.43
stock.offset 56
totalSize 0
transient_lastDdlTime 1457559996
volume.length 8
volume.offset 40

# Storage Information
SerDe Library:
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.166 seconds, Fetched: 60 row(s)

How SAS Scalable Performance Data Engine SerDE reads data

The SerDe reads the data using the encoding of the SPD Engine table. Make sure that the SPD Engine table name is appropriate for the encoding associated with the cluster.

Current SerDe Implementation of Data Conversion from SAS to Hive

Accessing SPD Engine Data using Hive2


If the SPD Engine table in HDFS has any of the following features, it cannot be registered in Hive or use the SerDe. You must access it by going through SAS and the SPD Engine. The following table features are not supported:

  • Compressed or encrypted tables
  • Tables with SAS informats
  • Tables that have user-defined formats
  • Password-protected tables
  • Tables owned by the SAS Scalable Performance Data Server

Reference documents

SAS(R) 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System, Third Edition


tags: SAS Administrators, SAS Professional Services

Accessing SPD Engine Data using Hive was published on SAS Users.

4月 162016

SAS Global ForumIs it just me, or does it feel a little bit like Christmas Eve? I think it's because SAS Global Forum 2016 is right around the corner, and for many SAS users, it's the most wonderful time of the year. If you're heading to Las Vegas, get ready for three days of learning from SAS peers, exchanging ideas, discovering new techniques for using SAS, and maybe, if you play your cards right (see what I did there?), a dash of fun as well. If only there was something exciting to do in Las Vegas...

All this sounds great if you're one of the 5,000 SAS users who will be at the event (April 18-21 @ The Venetian), right? But what if you can't make the trip to Las Vegas? Is there another way to experience some of the great content that will be shared there? I'm happy to say the answer is yes!

This year, SAS will provide dozens of hours of live video streaming from the event, so you can watch select sessions from the Users and Executive Programs from wherever you are. Live coverage will include Opening Session, all the keynote talks, select breakouts, Tech Talks, updates from The Quad, interviews with SAS executives and developers, and more. Additional videos will be available on the SAS Global Forum Video Portal. Here you'll find featured, most popular, and how-to videos, as well as episodes of Inside SAS Global Forum. You can even view videos from past events. Coverage will be available for on-demand viewing after the conference as well.

Video not your thing? No worries. SAS will provide several other ways to stay up to date. For starters, you can read any of a number of blog posts from the event. Posts will come from many different SAS blogs, but all posts from SAS Global Forum will be aggregated here.

If you're on LinkedIn, Twitter or Facebook, you can stay connected with what's happening and engage with attendees on SAS’ social media channels. Join the conversation, comment on some of the cool presentations you attended or viewed, discuss the exciting news coming out of the event, or simply follow along. The channels sure to have the most activity are the SAS Users LinkedIn group, the SAS Twitter account, and the SAS Users Group Facebook page. The hashtag for SAS Global Forum is #SASGF; be sure to use the hashtag in all your posts.

With all the opportunities to follow along, connect and contribute, you can be a part of SAS Global Forum 2016, whether you're attending in person or not. And if you're a SAS user, that's almost as exciting as a visit from Santa.

tags: Livestream, SAS Global Forum

How to participate in SAS Global Forum 2016...even if you're not going was published on SAS Users.

4月 142016

protecting-personal-identityHave you ever used SAS to produce reports for publishing? Have you ever thought of or been told about suppressing data in such reports? Why do we need to suppress (in the sense of withholding, concealing, obscuring or hiding) data in published reports?

The reason is simple - in order to protect privacy of individuals - personally identifiable information (PII) - data that could potentially identify specific individuals and their sensitive or confidential information. Such sensitive data can include health insurance and medical records, age, ethnicity, race, gender, education, political or religious believes, financial and credit information, geographical location, criminal history, student education records, etc.

In the U.S., such information is considered confidential and protected by Federal Law, e.g. HIPAA - Health Insurance Portability and Accountability Act and FERPA - Family Educational Rights and Privacy Act. Many other countries have similar laws and regulations.

When SAS is used to generate and publish reports, we need to be on a lookout in order not to break the law since demographic component of any report has a potential of breaching privacy protection, especially when we report on a small group of people. For small reporting samples, even when we publish aggregated reports, there is still a risk of possible deducing or disaggregating personal data.

Grouping for data suppression

One way of obscuring small count numbers to protect people privacy is to lump them up into a larger group, call it “Others” and leave it there. However, while protecting PII this method distorts composition of the report group as it can put different demographic characteristics into “Other” category for different report groups thus making it impossible to compare them side by side.

Using custom formats for data suppression

Another way to suppress or mask small numbers is to use SAS custom formats. Let’s say we want to suppress all numbers in the range of 1 through 6, but show all other numbers as comma-formatted. We can create the following SAS user-defined custom format:

proc format;
	value suppress

This works just fine for a single variable (list) frequency or cross-tabulation frequency numbers as long as there are no Total column or Total row presented. If Totals by row or column are reported then the suppressed small number cell can be easily derived from those totals and the values of the other unsuppressed numbers.

Primary and complementary data suppression

When a row or column contains just a single suppressed small number and in addition to frequencies (or counts) we also report totals by rows and/or columns, primary data suppression that is based on the number value itself is not enough for privacy protection as it can be calculated as total minus sum of the other reported numbers in that row or column. To exclude the possibility of deducing suppressed data, this primary data suppression needs to be complemented by the secondary data suppression when a second small number in that row or column also needs to be suppressed. What further complicates matter is that as long as we suppress that second number in a row, that suppressed cell may become the only suppressed cell across the column. In such a case, it can be derived from the column total and other numbers in that column. To make sure we really protect privacy and eliminate the possibility of unraveling suppressed data in a manner we solve a Sudoku puzzle, we need to complement that secondary suppression with tertiary data suppression of that cell in that column. We will refer to those secondary, tertiary, and so on data suppressions as complementary data suppressions.

Depending on the cardinality of classification variables and data combination, the number of those complementary suppression iterations may vary.

Suppression rules

Unfortunately, there is no unambiguous scientific/statistical definition of “small count numbers” that are subject to suppression to protect personally identifiable information. Such definition of “small count numbers” that need to be suppressed in aggregated reports, suppression rules and suppression protocols vary by jurisdiction (country, state), industry, agency, study, data source, demographic characteristic, effective date, etc. Such definitions can be based on an interval between 1 and a certain threshold (in most cases between 4 and 15). For example, the Environmental Public Health Tracking Network currently suppresses cells based on non-zero counts less than six.

Interestingly, that zero usually is not considered as a “small number” to be suppressed, as zero reveals nobody’s info. Although there might be an interpretation of zeros as not suitable for primary suppression, but to be a fair game for complementary suppression, I don’t think it would make a lot of sense as it would make suppression inconsistent and dependent on the data composition. Besides, suppressing zeros and counting them as complementary may actually aid deducing values of other cells in a row or column that it is supposed to protect. Therefore, we will treat zeros as not to be suppressed neither in primary nor in complementary suppression. The reason is that zeros represent very useful and unique information without compromising any confidentiality, while “small numbers” are not so unique compared to zeros.

Numbers for totals by rows or columns are also subject to suppression by the same measure as the numbers for cross-tabulation cells.

To make our SAS implementation of the small cell numbers suppression as general as possible, we will use a range from 1 through a threshold denoted as &sup_max, where sup_max is a macro variable that can be assigned a value of any natural number.

Automatic suppression algorithm

To fully automate the process of data suppression I developed a simple yet efficient iterative algorithm for implementing small numbers cell suppression using SAS for a two-dimensional tabular report.

  1. Use proc means to calculate count totals and output results in a data table. Class variable values for the totals will have blank values.
  2. Sort the data table from step 1 by class-variable-1, count. That will order counts from smallest to largest within each value of the class-variable-1 (including blanks for totals).
  3. Read through the sorted data table from step 2 by class-variable-1 group and
    • Replace all count values within [1, &sup_max] interval with a special missing value, .S. This is the primary suppression. All primary suppression “suspects” are “eliminated” during this first iteration.
    • Also replace all count variables that are next larger to a suppressed value in a group. This is the secondary complementary suppression.
    • If there are no numbers found to be eligible for primary suppression, the whole suppression process stops after this step without suppressing any data.
  4. Repeat steps 2 and 3 with class-variable-2 variable instead of class-variable-1. This is further complementary suppression.
  5. Keep repeating steps 2 and 3 while flipping class-variable-1 and class-variable-2 until step 3 produces no suppressed value.
  6. Use proc report to output the results with .S values formatted to *.

Suppression implementation with SAS

You can view, download and run the full sample of SAS suppression code.

The following are the code highlights.

  1. Sample data. Since I am creating sample data table within the SAS code itself, I started with a pre-aggregated data table just to cut down on the code size – the code shall work as fine using a non-aggregated input table with one observation per person.
  2. In the Assign macro variables section we assign values for the classification variable names, suppression threshold value (%let sup_max = 6;) and global macro variable sup_flag (%let sup_flag = 1;). It makes this suppression code implementation ready to be used within a SAS macro or a SAS Stored Process (STP).
  3. Proc format defines formats (notsorted) for class variables with blank value ' ' reserved for ‘Total’. It also defines format cntf for counts and a format $totbgf for background color of the Total column:
    proc format;
    	value $racef (notsorted)
    	'M' = 'Martian'
    	'A' = 'Asian'
    	'B' = 'Black'
    	'H' = 'Hispanic'
    	'W' = 'White'
    	' ' = 'Total'
    	value $agegf (notsorted)
    	'1' = '<18'
    	'2' = '19-64'
    	'3' = '65-99'
    	'4' = '100+'
    	' ' = 'Total'
    	value cntf
    	.S = '*'
    	. = '0'
    	other = [comma18.0]
    	value $totbgf 
    	'Total' = "&hbcolor"
  4. Proc means with completetypes and preloadfmt options to get all possible combinations of the class variable values:
    proc means data=people_roster noprint completetypes;
    	format &tclass1 &f1.. &tclass2 &f2..;
    	class &tclass1 &tclass2 / preloadfmt order=data;
    	var count;
    	output out=people_sup sum(count)=count;
  5. Calculate number of columns in the tabular report (needed for cosmetic purpose to apply distinct color to the background of the Total column):
    /* get number of columns for table report */
    data _null_;
    	set people_sup1 (where=(_type_ eq 1)) end=e;
    	totcol + 1; 
    	if e then call symput('totcol',strip(put(totcol + 2,best.))); *<-- Column number for Total;
  6. Suppression macro %suppress parses &dsname table created with proc means above by one of the class variables &supclass and replaces small numbers between 1 and &sup_max with special missing value .S (primary suppression). It also suppresses values next to the primary suppressed value if that value is the only suppressed value in the by group (secondary suppression). In addition, this macro assigns global macro variable &sup_flag to 1 if at least one suppression was done, and to 0 otherwise:
    %macro suppress(dsname=,supclass=);
    	proc sort data=&dsname;
    		by &supclass count;
    	%let sup_flag = 0;
    	data &dsname (drop=_supnum_);
    		set &dsname;
    		by &supclass count;
    		if first.&supclass then _supnum_ = 0;
    		if (1 le count le &sup_max) or (_supnum_ eq 1 and count ne .S) then
    			count = .S;
    			call symput('sup_flag','1');
    		if (count eq .S) then _supnum_ + 1;
    %mend suppress;
  7. Macro %iterate_suppression to iterate macro %suppress call while it finds new cells to suppress (produces &sup_flag=1). For each iteration, &class1 and &class2 are swapped. This macro is invoked immediately after its definition.
    %macro iterate_supression (class1=, class2=);
    	%let nextclass = &class1;
    	%do %while (&sup_flag);
    		%suppress(dsname=people_sup, supclass=&nextclass);
    		%if &nextclass eq &class1
    			%then %let nextclass = &class2;
    			%else %let nextclass = &class1;
    %mend iterate_supression;
    %iterate_supression (class1=&tclass1, class2=&tclass2);
  8. Conditionally generate footnote if at least one cell is suppressed:
    %let ftnote=;
    data _null_;
    	set people_sup (where=(count eq .S));
    	call symput('ftnote',"j=L h=10pt '*) The data is suppressed to protect privacy'");
    footnote &ftnote;
  9. Proc report to generate table output. Option preloadfmt preserves the classification variable values order as they are defined by proc format:
    proc report data=people_sup nowd split='*' spanrows missing 
                style(header)=[font_size=11pt just=C color=#&hfcolor]
                style(column)=[font_size=10pt just=L];
          column &tclass1 count,&tclass2 count=tot;
          define &tclass1 / group  order=data "&tlabel1" f=&f1.. preloadfmt style(column)=header[vjust=m cellwidth=1.5in color=#&hfcolor]; 
          define &tclass2 / across order=data "&tlabel2" f=&f2.. preloadfmt style(header)=[background=$totbgf.];;
          define count    / display 'People Reported' style(column)=[just=R cellwidth=1in] f=cntf.;
    	  define tot      / sum noprint;
    	  compute &tclass1;
    		if &tclass1 eq ' ' then
    			call define(_col_,'style',"style=[just=C font_size=11pt color=#&hfcolor]");
                call define(_row_,'style',"style=[font_weight=bold background=#&hbcolor]");
    	  compute &tclass2;
               call define("_c&totcol._",'style',"style=[font_weight=bold just=R background=#&hbcolor]");

Suppression results

Notice, that all the data suppression manipulations are performed on a single summary data table produced by proc means at the very beginning. Since all the suppression processing is done on a small summary table, it makes its runtime insignificant. In the end, the report has all the small number cells suppressed by both, their values (primary suppression) and the values that complement small numbers in each row and each column (complementary suppression). There is no single suppressed cell in any row or column. This eliminates risk of uniquely deducing suppressed numbers. The following proc report output illustrates the resulting SAS report which is safe to publish:


Behind the curtain

In order to illustrate what is going on during execution of this suppression algorithm, I slightly modified the program to preserve all the interim data (work.people_sup1, work.people_sup2, . . . ) and print interim reports so you can see exactly what is going on at each iteration. You can view, download and run SAS suppression code version that produces interim results. Below are the reports produced at each iteration.

  1. Suppression iteration 1 (searching horizontally; if start searching vertically we still will arrive to the same end result)
    1. Start (here we do primary suppression of numbers between 1 and 6, as well as complementary suppression of second closest number; this iteration takes care of ALL primary suppressions since we search through every cell in a table):sas-report-suppression-iteration-1-start
    2. Finish:
  2. Suppression iteration 2 (searching vertically)
    1. Start:
    2. Finish:
  3. Suppression iteration 3 (searching horizontally)
    1. Start:
    2. Finish:
  4. Suppression iteration 4 (searching vertically)
    1. Start:
    2. Finish (now we have at least two suppressed cell in each column):
  5. Iteration 5 (searching horizontally) did not produced any suppressed cells since we have at least two suppressed cell in each row; flag &sup_flag is set to 0 by macro suppress and the whole suppression process automatically stops.


Your comments and suggestions are welcome as always.



Automatic data suppression in SAS reports was published on SAS Users.

4月 132016

A SAS technical resource guideOn the search for some SAS notes, workarounds or sample code? What you may not know is that there are a ton of SAS resources out there providing examples, tips and tricks that are great to have in your tool belt! Many of these resources provide sufficient insight and resolutions without the need for Technical Support assistance… you just have to know where to look!

That’s where I can help. Below, you’ll find a few handy technical resources that are great references, and definitely worth bookmarking if you haven’t already!

First, if you’re looking for guides, documentation, or technical papers, there are a few places you’ll want to visit: is a good place to start. Here you’ll find system requirements, installation guides, documentation, samples and SAS Notes. Got an error? Paste it in the "Search" search box (upper-right corner) to bring up any relevant and related SAS Notes that provide workarounds and explanations.

Seeking industry examples and references? Check out SAS White Papers. You can search papers by business topic, industry or name if you know what you’re looking for. Another great source of information is our SAS Blogs! Look for the latest posts from SAS experts highlighting new techniques, features and updates. You can also subscribe to any SAS blogs that may be of interest to stay up to date on the latest entries.

What’s more, SAS Global Forum Online Proceedings also provides archives of all SAS Global Forum proceedings, a great place to find papers and presentations from past events. Another handy resource for browsing papers on more specific topics and SAS proceedings is Lex Jansen, which searches not only SAS Global Forum, but PharmaSUG, SUGI, SESUG, NESUG, MWSUIG, WUSS, PNWSUG and SCSUG!

If you prefer videos, we’ve got those, too! The best part – they’re all free (though some may require registration or login). Note: Can’t make it to a live event? Most webinars are available on-demand typically within 48 hours:

  • SAS TutorialsAsktheExpert: Over 200 free “how to” demos and videos.
  • Ask The Expert Series: Live events covering a variety of topics. (Psst! New dates are always being added, so be sure to check back frequently for new dates and topics!).
  • SAS Talks: Check out the archives, which cover an assortment of SAS topics.
  • Webinars: Sort by name, industry, or business topic.

Finally, the SAS YouTube Channel for tutorials, event streams and coverage, and more!

Phew! But we’re not done yet. If you’re still having trouble finding just what you’re looking for, you can always connect with other SAS users and specialists online, via the SAS Support Communities which are monitored by SAS experts, or on Twitter via @SAS_Cares, our customer service focused Twitter handle.
Lastly, I’ll leave you with a few handy technical resources and references on common topics:

  • You can search SAS Procedures by product or by name (super handy when you’re looking for information on a particular procedure, or are not sure if you have the necessary software installed).
  • On that note, if you’re trying to determine which SAS products you have installed, submit the code:  proc product_status; run;  via Enterprise Guide or the SAS program editor. Viola!
  • Migrating to SAS 9.4 (for those upgrading to 9.4 – P.S. Upgrades are free!)
  • And speaking of upgrades, here’s a handy link for Maintenance Releases & Product Upgrades!
  • On Moving to 64-bit (& be sure to check out this resource as well!)
  • Installation Center (for those getting started!)

And if you find that you do need technical assistance after all, don’t forget that the SAS Technical Support team is available from 9 a.m. until 8 p.m. ET Monday through Friday!


tags: SAS Administrators, sas customer contact center, SAS Programmers

Let me help you with that! A technical SAS resource guide was published on SAS Users.

4月 122016

Places to visit while in Las Vegas for SAS Global Forum 2016People sometimes ask me, “What is there to do when I’m in town for SAS Global Forum when I’m NOT attending the conference? After getting over the shock of finding these people in my house, emptying out my fridge, I realize they are simply offering me positive reinforcement based on my post on the subject from last year.

With this year’s conference in Las Vegas, I could simply say “Stop by the casinos, catch a show or two, and enjoy the many dining options,” but they COULD have figured THAT out on their own. Instead, I started looking for things that are a little non-traditional, and Las Vegas does “non-traditional” so well. (For the record, unless stated otherwise, I’ve not visited the locations or been a customer of the services.)


Shark Reef Aquarium at Mandalay Bay is the only facility in Nevada accredited by the Association of Zoos & Aquariums. Located in the back of the Mandalay Bay Hotel and Casino, this Aquarium features numerous species, including piranha, a Komodo Dragon, Golden Crocodiles, and of course, sharks. (If you’re a Certified Diver, you can even sign up to swim with the sharks.) Call before visiting though, especially during evening hours – as I discovered the last time SAS Global Forum was in Las Vegas, this attraction has been known to close early to rent the facility to private parties.

If you’re on a budget, consider visiting the Flamingo Hotel’s Wildlife Habitat, which includes pelicans, turtles, parrots, and, of course, flamingos; The MGM Grand’s Lion Habitat; the 117,000 gallon Silverton Aquarium and nearby Bass Pro Shop freshwater exhibits; and the 50,000 gallon Atlantis Aquarium at the Forum Shoppes in Caesars Palace. All these attractions are free.

Offbeat Museums

If you’re around my age, you may remember the flamboyant pianist Liberace; unfortunately, his long-running museum has closed down his wardrobe and candelabra collection sold off. However, Las Vegas IS the obvious place for something called “The Mob Museum.” Located downtown, this relatively new facility has garnered praise from those who have visited. You could also consider visiting the National Pinball Hall of Fame, which is across the street from the old Liberace museum.


Unfortunately, the UNLV Running Rebels are done playing basketball for the year, and the NHL has yet to announce their anticipated expansion into the city. And baseball fans will have to be flexible; the Las Vegas 51s will be on the road between April 15th and 22nd. But, if you’re coming into town early or staying late, consider taking the short trip north of town to watch the AAA affiliate of the New York Mets play at Cashman Field.  Most seats are under $20 – many well under!  (I HAVE been to a game here, and thoroughly enjoyed myself.)

Television Sites

Many movies and television shows have been shot in Las Vegas. The famous “Welcome to Las Vegas” sign is an American icon and been featured in so many movies. You can find it in the median at 5100 Las Vegas Boulevard South, or you can get your photo with a replica at the airport if you prefer.  “Vegas Vic,” another famous sign featuring a cowboy, still exists and can be found on Fremont Street. (His arm no longer moves, and the speech has been turned off, but the landmark itself is still there.)

The “World Famous Gold & Silver Pawn Shop” featured in the reality TV show Pawn Stars is about 3.5 miles north on Las Vegas Blvd.  Some have advised me that “it’s just another little pawn shop” and “the TV people are almost never there”, but this is a chance to wander in and see something you recognize from theTV show, orbuy a T-shirt commemorating your visit. . On a related note, the director of the Clark County Museum, Mark Hall-Patton, often appears on the show; the museum itself is located about 30 minutes  outside of town in Henderson, Nevada. (1830 S. Boulder Hwy, $2 admission).  Danny “The Count” Koker, who rolled into his own show “Counting Cars,” offers **FREE** tours of “Count’s Kustoms,” which is located a couple of blocks behind Circus Circus. Rick’s Restorations, site of Pawn Stars now cancelled spin-off “American Restorations,” can also be found in Las Vegas, but I cannot find any reference that their Mesquite Ave. location offers tours.

Rehab” may have been canceled, but the namesake pool and “beach” club can still be found at the Hard Rock Hotel and Casino.

Bus Tours

If you’d like to get out of town, several tour companies offer hotel pick-up / drop-off coach trips to sites like Hoover Dam / Lake Mead, Red Rock Canyon, Death Valley, Area 51 (allegedly), and even the Grand Canyon, Bryce Canyon and Zion National Parks.

Check with concierge or perform some web research; companies I found include,, and (prices and availabilities vary).

Show Tickets

If you want to see a show, but are not concerned with getting in to see any particular act may be interested in obtaining day-of-show tickets at a substantial discount off list price. Tix4Tonight updates their prices and availability at 9:30am each day; there may even be some names you’ll recognize on their daily roster.

For those that want to stay on the property and will be coming into town early, the spring tour of ABC’s “Nashville” will be stopping at the Venetian Theater on the evening Friday, April 15th.

Pick and choose how you want to spend your free time while in Las Vegas –remember, of course, that attending the conference events is your number one priority while you’re at SAS Global Forum!  See you there.





tags: SAS Global Forum

Places to visit while in Las Vegas for SAS Global Forum 2016 was published on SAS Users.

4月 062016

parallel2One of the hidden gems of SAS Studio is the ability to run process flows in parallel. This feature really shines when used in a grid environment. Let’s discuss this one step at a time.

First, what is a process flow? When working in the Visual Programmer perspective, you have access to process flows. A process flow is a graphical representation of a process, where each object, be it a SAS program, a SAS Studio task, a query, and so on, is represented by a node. Nodes are connected by links that instruct SAS Studio how to move from one node to the next.

Note: Click on the images to enlarge them.

SAS Studio Parallel Process Flows1

Display 1. SAS Studio Process Flow

On the Properties tab of the current process flow, you can set the execution mode of the nodes.  With the default setting, SAS Studio runs the nodes in the order in which they were added to the process flow. If node 2 is dependent on node 1, node 1 must run completely before node 2 will run.

You can change the execution mode to Parallel as shown in Display 2. When this value is set, SAS Studio uses multiple workspace servers to run the nodes concurrently, always enforcing the correct dependencies.

SAS Studio Parallel Process Flows2

Display 2. Setting the Execution Mode to Parallel

When you use this feature in a SAS Grid environment, if the administrator has configured the workspace server sessions to be grid launched, you can achieve the benefits and the performance improvements of multi-machine parallel load balancing, without having to code any SAS/CONNECT statement. It’s a real point-and-click parallel execution engine!

Display 3 shows the process flow presented in Display 1 running in parallel execution mode. The pane is grayed out because it is not possible to interact with it until the execution is complete.

We can see that the List Data node is still running, while the Partition Data node has already finished. Thus, the two Filter Data nodes were able to start in parallel.

SAS Studio Parallel Process Flows3

Display 3. Tasks Running in Parallel in a Process Flow

In this scenario, we would guess three workspace server sessions are concurrently running our code. However, if we monitor what is happening on the back-end hosts, we notice something unexpected. There are actually five workspace server sessions running. Why? As soon as you sign in to SAS Studio, it starts two SAS sessions. These are used only for the default execution mode. If a process flow is run in parallel mode, up to three additional SAS session are started, for a total of five. Once the process flow is finished, the three additional SAS processes terminate, if there is no further activity for 30 seconds, in order to release resources.

An administrator can use a configuration property, webdms.maxParallelWorkspaces, to specify the maximum number of workspaces that can be used when SAS is running in parallel mode. The default value is 3. The maximum value is 8.

I hope you enjoy running multiple tasks concurrently. If you have already started using parallel processing, you might want to check out my earlier blog, How to avoid the pitfalls of parallel jobs!



tags: parallel processing, SAS Professional Services, SAS Programmers, sas studio

SAS Studio Parallel Process Flows was published on SAS Users.

4月 042016

VisualAnalyticsHTML5In the past, configuring a new autoload library for SAS Visual Analytics was a manual process involving steps to create very specific folder structures and to copy and edit scripts and SAS programs. No more! Updates to the SAS Deployment Manager have added a new task that creates and configures new autoload libraries automatically, which should help SAS administrators add autoload libraries to their system while reducing the possibility of making a careless mistake.

For those unfamiliar with autoload, I am talking about a directory into which a user can drop a spreadsheet, delimited file, or SAS Data Set and have that file automatically loaded for use in a SAS Visual Analytics environment.

So let’s see how this works. For a distributed environment, we need to make sure we start the SAS Deployment Manager on the compute tier machine.  Looking through the tasks, you should see “Configure Autoload Directory for SAS Visual Analytics.”

SAS Visual Analytics autoload configuration made easy

After connecting to the SAS Metadata Server with administrative credentials, you will be prompted with a SAS LASR Artifact selection. The first two selections enable you to define entirely new autoload locations serviced by brand-new LASR servers while the last selection enables you to add an autoload library to an existing LASR server. I am going to work under the assumption that I am creating a brand new Non-Distributed LASR server that will surface my brand new autoload library so I will select the first option.

SAS Visual Analytics autoload configuration made easy2

In subsequent dialogs I can provide the parameters that define a new LASR server, a new LASR library, and an autoload location where users will deposit data to be loaded.

SAS Visual Analytics autoload configuration made easy5

The SAS Deployment Manager will enter configuration processing, after which we should see confirmation that our autoload configuration was successful.

SAS Visual Analytics autoload configuration made easy3

The summary document generated by the SAS Deployment Manager indicates that we have created a new SAS LASR server and SAS LASR Library, a new drop-zone with the requisite folder structure, and a new set of scripts with which to manage the autoload library.

SAS Visual Analytics autoload configuration made easy4

All that is left for the SAS administrator to do is to execute schedule.bat for the new autoload library to initiate the job that will check the autoload drop-zone for new data.  By default that will occur every 15 minutes, but administrators can edit schedule.bat before executing it to adjust that setting.

tags: SAS Administrators, SAS Deployment Manager, SAS Professional Services, SAS Visual Analytics

SAS Visual Analytics autoload configuration made easy was published on SAS Users.

3月 302016

EnvironmentManagerThe SAS Environment Manager Report Center is a set of SAS stored processes and SAS macros which leverage the SAS Data Mart for the purpose of monitoring and auditing a SAS installation. Full documentation on the structure and functioning of the Data Mart can be found in the SAS Environment Manager 2.5 User’s Guide. This blog discusses functionality available as of the SAS 9.4 M3 release.

The intent of the Report Center is to provide most of the reporting that a SAS administrator might need “out of the box.” However, there’s lots of customization possible without doing any coding at all. You do, however, need to know something about your data, and, it helps if you’ve also got some knowledge of SAS reporting and graphing procedures. Those procedures would include PROC TABULATE, PROC TEMPLATE (for plots), PROC GPLOT, and PROC REPORT. This blog will explore a few things I’ve discovered about using the parameters to produce customized reports. In later blogs I’ll discuss using some of the reporting templates provided.

The first step is get a feel for the organization of the report center. Here’s a table to summarize what’s in it:

Opening Up the SAS Environment Manager Report Center

Often the first portions of the Report Center to be exploited are these three:

  • ARM Performance Reports
  • Audit Reports
  • ACM Reports

So, we’ll use these three as examples.

If you open ARM Performance Reports, you’ll find all the data about SAS artifacts, how much they are being used, and who is using them. SAS artifacts include such items as Stored Process Servers, Workspace Servers, tables, cubes, stored processes, OS directory usage, SAS procedure usage, and stored process times elapsed. They also report which users are using which types of SAS artifact. You can get Workspace Server load distribution by hour, and number of workspace server sessions broken down by day. This is a great first step for a SAS administrator to know what’s going on in his/her production deployment.

As for customizing, you’ll see this menu for all the ARM Performance Reports (and nearly all other reports as well).  You get to choose:

1.     Number of observations to print in report

2.     Subsetting, by begin/end date of data, WHERE clause

3.     Look and feel:  output format, ODS style

4.     Debugging options: show SAS log, macro calls/parameters, and macro generation (using the debug level selector)

Opening Up the SAS Environment Manager Report Center2

Careful: The WHERE clause is free-form, so you must know your variables names, types, and reasonable or valid values, or you will get errors.

The next interesting collection of reports in the Report Center is the Audit Reports, where you’ll find reports on all the events that could be scraped from SAS logs, such as:

  • Authentication errors
  • Group membership changes
  • Changes in user accounts
  • Other metadata changes
  • Client connections by userID
  • Most recent login for each userID

The most complex set of data is reported on by the ACM Reports (Agent Collected Metrics). This encompasses all the data that the SAS Environment Manager collects in its normal operations. As you would expect, a great deal of this data is the same as what you’d see when examining the metrics in SAS Environment Manager using the Resources->Browse menus.

ACM reports typically provide far more parameter options beyond the usual Titles, Footnotes, Date/Time ranges, and WHERE clause. This makes them far more customizable once you’re familiar with the additional parameters. For example, you typically have a prompt screen like the following:

Opening Up the SAS Environment Manager Report Center3

You can choose your data set, variables to display, many plotting options, X-axis and Y-axis variables and labels, BY groupings, layout and axis options, and many more. Plus, of course, the standard selectors for time/date, WHERE clauses, Titles and Footnotes.

So the lesson here is:  Know your data, know something about plotting/graphing in SAS and what the terms mean, be willing to experiment and have a little patience. With these assets you can get literally hundreds of great tables and graphs.

tags: SAS Administrators, SAS Environment Manager, SAS Professional Services

Opening Up the SAS Environment Manager Report Center was published on SAS Users.

3月 292016

Visual Analytics audit data collectionIn a couple of my previous blogs I discussed how to audit who made changes to data in a SAS environment. In the last couple of weeks I have been asked how to do the same thing for SAS Visual Analytics reports and explorations.  The Visual Analytics administrator overview report doesn’t answer the question, it deals mainly with usage of LASR servers and data. The same can be said for the VALASR kit reports that are available in SAS Environment Manager.

These two assets can help answer usage questions like what in-memory tables are being used and by whom, but not who is making changes to reports and exploration.

Looking at SAS Environment Manager Service Architecture report center there is not a report that will directly answer the question either. However, the good news is, the data to answer the question is being collected and stored in the Environment manager data mart.

EV Service Architecture enables the Audit.Meta.Updates logger on the metadata server, the result is metadata server Audit logs record, for all public objects:

  • Who modified the object
  • Date and time of the modification
  • Was the object added, deleted or updated

The audit logs are located in <config>Lev1SASMetaMetadataServerAuditLogs and start with the string Audit.

Here is an example of a line from the log:

2016-02-05T13:58:56,892 INFO [00016303] 573:henri@meta01 – Audit Public Object Type=SAS report (2G) Name=AnnualSalarybyDepartment ObjId=A5FMYBKR.AX00006Q has been updated.

NOTE: public objects are loosely defined as objects that are modeled in metadata and participate in the Information Service and/or Logical Object frameworks. Many, but not all, public objects live in the SAS Folders tree structure.

The SAS Environment Manager log parsing, collection and ETL process reads the information from the metadata audit logs and stores it in the environment manager data mart. The data is stored in the ARTIFACT.AUDIT_TRANSACTIONS table.

From there, if the feed to Visual Analytics is enabled, the data will be fed to the EVDMLA.AUDIT_TRANSACTIONS in-memory table, where you can use Visual Analytics Explorer or Report designer to view the audit trail of a report, exploration or any public object.

The columns of interest are:

  • Active Userid: the user who made the change.
  • Audit Record Event: the event that occurred, for changes to public objects the relevant events are updated, added and deleted.
  • Audit Record Type: there are multiple types of audit record in the table e.g. Login, access control change etc. The one we are interested in is AuditPublicObject which records changes made to public objects.
  • ObjectID: the metadata object id (can be used to get more information about the object from metadata)
  • Object Type: the type of the public object, for visual analytics reports that is “SAS report (2G” and “Visual exploration”
  • Startdt: the date and time that the event happened.
  • Identity Name:  the name of the object in metadata.

Visual Analytics Reports and Explorations

NOTE: often you will see multiple records for an update with a slightly different timestamp. This is because one save from the interface can generate multiple updates on objects in the metadata server.

The exploration below shows a graph and table displaying what audit events have occurred for Visual Analytics reports and explorations. The table shows, for each report and exploration, an audit trail of who updated the report and when the update occurred. If an object is deleted that will also be recorded. Unfortunately the detail of what changed e.g. a column was added, or a calculation was changed, is not recorded.

Visual Analytics Reports and Explorations2

In another example, this basic Visual Analytics report allows an administrator to select a report or exploration and lists the update history for the selected object.

Visual Analytics Reports and Explorations3

The bottom line, with SAS Environment Manager service architecture setup, the data mart contains information about who is adding, updating and deleting public objects in SAS metadata. If the feed to Visual Analytics is enabled, VA can be used to report an audit trail for reports and explorations.


Who changed my SAS Visual Analytics reports and explorations? was published on SAS Users.