12月 132017
 

I'm a big fan of the Import Data task in SAS Enterprise Guide, especially for its support of text-based files (CSV, tab delimited, fixed width, and more). There's no faster method for generating SAS code that reads your data exactly the way you need it. I use the tool so often that I take for granted some of its neatest features, and I forget that many new users (and even veteran users) might not know about them. In this article, I'll review a few of the cool things that this task can do for you.

Read fixed-width text files into SAS

We think of CSV files (and...alas...Excel files) as the main standard for data exchange among systems, but many legacy systems still produce and consume fixed-width text data formats. The SAS DATA step is a perfect tool for reading these files, but defining the columns and their properties can be tedious. The "Fixed columns" option on the Import Data task can make this job simple.

Suppose that you're beginning with a spec like this:

And a raw data file like this:

You can use the Import Data wizard to define the boundaries of your columns by adding boundary lines with just click-and-drag operations. Beginning with the File->Import Data task, select your source text file and advance to the second page of the wizard. When you select "Fixed columns" as the input text format, you'll see a layout ruler that looks like this:

Click at the column boundaries (referring to your original spec!) and drag the rule lines as needed to define those column boundaries. Then click Next, and fill out details for the column names and types:

Which then tells the Import Data task how to generate the proper INPUT statements:

When you click Finish, you end up with a data set that's ready for business:

Modify the properties for multiple columns -- with one step

Here's a click-saving trick. Sometimes you have an input data file that contains many columns that share the same properties: type, length, and SAS format. It can be tedious to click and modify the properties of each column that you want to import. There's a shortcut on the Define Field Attributes page of the wizard that you can use to change the attributes for several columns at the same time. Simply SHIFT+Click to select multiple column definitions on the page, then click Modify.... The "Field Attributes for Multiple Selections" window appears, and you can change the necessary attributes just once and apply to the many items you picked.

This trick works as you import any text file or Excel file.

Create SAS program code that you can reuse anywhere

In a previous article I described how the Import Data task works "behind the scenes." Some of the magic that the task performs is not captured in SAS code, and that can present a challenge when you want to reuse this work in other settings -- for example, in a batch process or in a larger SAS program. However, with a couple of tweaks you can coerce the Import Data task into creating SAS code that you can almost just "lift and shift," as is.

The first option is hidden under the Performance window, labeled as "Bypass the data cleansing process." By default, the Import Data task reformats your input text file to normalize it for a cleaner import step. While doing no harm, most of the time this step isn't needed -- especially if your original data file is well formed. And since this step changes the input file, it's isn't repeatable outside of this task. My first tip for the best reusable code: click Performance... on the first page of the wizard, then select the "Bypass.." checkbox. That guarantees that the code will be formulated to read your original raw file. (Note that the Performance button is available only when importing text files, not Excel files.)

The second option you'll want to change is related to this, but you'll find it on the final page with the Advanced Options. Select "Generalize import step to run outside of SAS Enterprise Guide." This ensures that the task won't attempt any behind-the-scenes monkey business with your original file -- everything is captured in the DATA step that the task generates. Well, almost everything...

The one missing piece, a confounding factor when you select a local text file to import on a remote SAS Workspace session, is the transfer of the local file to the remote server. SAS Enterprise Guide copies the file for you -- behind the scenes -- and there is no SAS code to represent this step.

You can take control of even this step, though, if you make use of the Copy Files task (now available for you on the Tasks->Data menu). You can then copy the file from a local source folder, and land it wherever you want on the SAS server. Modify your newly repurposed Import Data code to pull from that server-based destination, giving you more control over the individual steps in the import process.

Learn more about importing text files

If you're new to importing data into SAS, whether using a SAS program or SAS Enterprise Guide, you might learn some of the basics from these video tutorials that were produced by SAS instructors:

12月 132017
 

I'm a big fan of the Import Data task in SAS Enterprise Guide, especially for its support of text-based files (CSV, tab delimited, fixed width, and more). There's no faster method for generating SAS code that reads your data exactly the way you need it. I use the tool so often that I take for granted some of its neatest features, and I forget that many new users (and even veteran users) might not know about them. In this article, I'll review a few of the cool things that this task can do for you.

Read fixed-width text files into SAS

We think of CSV files (and...alas...Excel files) as the main standard for data exchange among systems, but many legacy systems still produce and consume fixed-width text data formats. The SAS DATA step is a perfect tool for reading these files, but defining the columns and their properties can be tedious. The "Fixed columns" option on the Import Data task can make this job simple.

Suppose that you're beginning with a spec like this:

And a raw data file like this:

You can use the Import Data wizard to define the boundaries of your columns by adding boundary lines with just click-and-drag operations. Beginning with the File->Import Data task, select your source text file and advance to the second page of the wizard. When you select "Fixed columns" as the input text format, you'll see a layout ruler that looks like this:

Click at the column boundaries (referring to your original spec!) and drag the rule lines as needed to define those column boundaries. Then click Next, and fill out details for the column names and types:

Which then tells the Import Data task how to generate the proper INPUT statements:

When you click Finish, you end up with a data set that's ready for business:

Modify the properties for multiple columns -- with one step

Here's a click-saving trick. Sometimes you have an input data file that contains many columns that share the same properties: type, length, and SAS format. It can be tedious to click and modify the properties of each column that you want to import. There's a shortcut on the Define Field Attributes page of the wizard that you can use to change the attributes for several columns at the same time. Simply SHIFT+Click to select multiple column definitions on the page, then click Modify.... The "Field Attributes for Multiple Selections" window appears, and you can change the necessary attributes just once and apply to the many items you picked.

This trick works as you import any text file or Excel file.

Create SAS program code that you can reuse anywhere

In a previous article I described how the Import Data task works "behind the scenes." Some of the magic that the task performs is not captured in SAS code, and that can present a challenge when you want to reuse this work in other settings -- for example, in a batch process or in a larger SAS program. However, with a couple of tweaks you can coerce the Import Data task into creating SAS code that you can almost just "lift and shift," as is.

The first option is hidden under the Performance window, labeled as "Bypass the data cleansing process." By default, the Import Data task reformats your input text file to normalize it for a cleaner import step. While doing no harm, most of the time this step isn't needed -- especially if your original data file is well formed. And since this step changes the input file, it's isn't repeatable outside of this task. My first tip for the best reusable code: click Performance... on the first page of the wizard, then select the "Bypass.." checkbox. That guarantees that the code will be formulated to read your original raw file. (Note that the Performance button is available only when importing text files, not Excel files.)

The second option you'll want to change is related to this, but you'll find it on the final page with the Advanced Options. Select "Generalize import step to run outside of SAS Enterprise Guide." This ensures that the task won't attempt any behind-the-scenes monkey business with your original file -- everything is captured in the DATA step that the task generates. Well, almost everything...

The one missing piece, a confounding factor when you select a local text file to import on a remote SAS Workspace session, is the transfer of the local file to the remote server. SAS Enterprise Guide copies the file for you -- behind the scenes -- and there is no SAS code to represent this step.

You can take control of even this step, though, if you make use of the Copy Files task (now available for you on the Tasks->Data menu). You can then copy the file from a local source folder, and land it wherever you want on the SAS server. Modify your newly repurposed Import Data code to pull from that server-based destination, giving you more control over the individual steps in the import process.

Learn more about importing text files

If you're new to importing data into SAS, whether using a SAS program or SAS Enterprise Guide, you might learn some of the basics from these video tutorials that were produced by SAS instructors:

12月 132017
 

PROC FREQ is one of the most popular procedures in the SAS language.  It is mostly used to describe frequency distribution of a variable or combination of variables in contingency tables.  However, PROC FREQ has much more functionality than that.  For an overview of all that it can do, see an introduction of the SAS documentation. SAS Viya does not have PROC FREQ, but that doesn’t mean you can’t take advantage of this procedure when working with BIG DATA. SAS 9.4m5 integration with SAS Viya allows you to summarize the data appropriately in CAS, and then pass the resulting summaries to PROC FREQ to get any of the statistics that it is designed to do, from your favorite SAS 9.4 coding client. In this article, you will see how easy it is to work with PROC FREQ in this manner.

These steps are necessary to accomplish this:

  1. Define the CAS environment you will be using from a SAS 9.4m5 interface.
  2. Identify variables and/or combination of variables that define the dimensionality of contingency tables.
  3. Load the table to memory that will need to be summarized.
  4. Summarize the data with a CAS enable procedure, use PROC FEDSQL for high cardinality cases.
  5. Write the appropriate PROC FREQ syntax using the WEIGHT statement to input cell count data.

Step 1: Define the CAS environment

Before you start writing any code, make sure that there is an _authinfo file in the appropriate user directory in your system (for example, c:\users\<userid>$$ with the following information:

host <cas server name> port <number> user "<cas user id>" password "<cas user password>"

This information can be found by running the following statement in the SAS Viya environment from SAS Studio:

cas;  
caslib _all_ assign;

 

Then, in your SAS 9.4m5 interface, run the following program to define the CAS environment that you will be working on:

options cashost=" " casport=;
cas mycas user=;
libname mycas cas;
/** Set the in memory shared library if you will be using any tables already promoted in CAS **/
libname public cas caslib=public;

 

PROC FREQ FOR BIG DATA

Figure 1

Figure 1 shows the log and libraries after connecting to the CAS environment that will be used to deal with Big Data summarizations.

Step 2: Identify variables

The variables here are those which will be use in the TABLE option in PROC FREQ.  Also, any numeric variable that is not part of TABLE statement can be used to determine the input cell count.

Step 3: Load tale to memory

There are two options to do this.  The first one is to load the table to the PUBLIC library in the SAS Viya environment directly.  The second option is to load it from your SAS 9.4m5 environment into CAS. Loading data into CAS can be as easy as writing a DATA step or using other more efficient methods depending on where the data resides and its size.  This is out of scope of this article.

Step 4: Summarize the data with a CAS enable procedure

Summarizing data for many cross tabulations can become very computing expensive, especially with Big Data. SAS Viya offers several ways to accomplish this (i.e. PROC MEANS, PROC MDSUMMARY, PROC FEDSQL). There are ways to optimize performance for high cardinality summarization when working with CAS.  PROC FEDSQL is my personal favorite since it has shown good performance.

Step 5: Write the PROC FREQ

When writing the PROC FREQ syntax make sure to use the WEIGHT statement to instruct the algorithm to use the appropriate cell count in the contingency tables. All feature and functionality of PROC FREQ is available here so use it to its max! The DATA option can point to the CAS library where your summarized table is, so there will be some overhead for data transfer to the SAS 9 work server. But, most of the time the summarized table is small enough that the transfer may not take long.  An alternative to letting PROC FREQ do the data transfer is doing a data step to bring the data from CAS to a SAS base library in SAS 9 and then running PROC FREQ with that table as the input.

Example

Figure 2 below shows a sample program on how to take advantage of CAS from SAS 9.4m5 to produce different analyses using PROC FREQ.

PROC FREQ for big data

Figure 2

 

Figure 3 shows the log of the summarization in CAS for a very large table in 1:05.97 minutes (over 571 million records with a dimensionality of 163,964 for all possible combinations of the values of feature, date and target).  The PROC FREQ shows three different ways to use the TABLE statement to produce the desired output from the summarized table which took only 1.22 seconds in the SAS 9.4m5 environment.

PROC FREQ for big data

Figure 3

Program

 

/** Connect to Viya Environment **/
options cashost="xxxxxxxxxxx.xxx.xxx.com" casport=5570;
cas mycas user=calara;
libname mycas cas datalimit=all;
 
/** Set the in memory shared library **/
libname public cas caslib=public datalimit=all;
 
/** Define macro variables **/
/* CAS Session where the FEDSQL will be performed */
%let casref=mycas;
 
/* Name of the output table of the frequency counts */
%let outsql=sql_sum;
 
/* Variables for cross classification cell counts */
%let dimvars=date, feature, target;
 
/* Variable for which frequencies are needed */
%let cntvar=visits;
 
/* Source table */
%let intble=public.clicks_summary;
 
proc FEDSQL sessref=&casref.;
	create table &outsql. as select &dimvars., count(&cntvar.) as freq 
		from &intble. group by &dimvars.;
	quit;
run;
 
proc freq data=&casref..&outsql.;
	weight freq;
	table date;
	table date*target;
	table feature*date / expected cellchi2 norow nocol chisq noprint;
	output out=ChiSqData n nmiss pchi lrchi;
run;

PROC FREQ for Big Data was published on SAS Users.

12月 112017
 
Self-similar Christmas tree created in SAS

Happy holidays to all my readers! My greeting-card to you is an image of a self-similar Christmas tree. The image (click to enlarge) was created in SAS by using two features that I blog about regularly: matrix computations and ODS statistical graphics.

Self-similarity in Kronecker products

I have previously shown that the repeated Kronecker product of a binary matrix leads to self-similar structures.

Specifically, if M is a binary matrix then the Kronecker product M@M is a block matrix in which each 1 in the original matrix is replaced with a copy of M and each 0 is replaced by a zero block. (Here '@' is the Kronecker product (or direct product) operator, which is implemented in SAS/IML software.) If you repeat this process three or more times and create a heat map of the product matrix, the self-similar structure is apparent.

For example, the following 5 x 7 binary matrix has a 1 in positions that approximate the shape of a Christmas tree. The direct product M@M@M is a matrix that has 53 = 125 rows and 73 = 343 columns. If you use the HEATMAPDISC subroutine in SAS/IML to create a heat map of the direct product, you obtain the following image in which a green cell represents a 1:

proc iml;
M = {0 0 0 1 0 0 0,
     0 0 1 1 1 0 0,
     0 1 1 1 1 1 0,
     1 1 1 1 1 1 1,
     0 0 0 1 0 0 0};
Tree = M @ M @ M;     /* M is 5 x 7, so Tree is 125 x 343 */
 
ods graphics / width=300px height=450px  ANTIALIASMAX=50000;
call heatmapdisc(Tree) title="Happy Holidays to All my Readers!"
     colorramp={White Green} displayoutlines=0 ShowLegend=0;

That's not a bad result for three SAS/IML statements! You can see that the "tree" has a self-similar structure: the large tree is composed of 17 smaller trees, each of which is composed of 17 mini trees.

Adding a star

My readers are worth extra effort, so I want to put a star on the top of the tree. One way to do this is to use a scatter plot and plot a special observation by using a star symbol. To plot the "tree" by using a scatter plot, it is necessary to "unpack" the 125 x 343 matrix into a list of (x, y) values (this is a wide-to-long conversion of the matrix). You can use the NDX2SUB function to extract the row and column of each 1 in the matrix, as follows:

/* extract (row, column) pairs for matrix elements that are '1' */
idx = loc(Tree = 1);                /* indices for the 1s */
s = ndx2sub(dimension(Tree), idx);  /* convert indices to subscripts */
create Tree from s[c={"RNames" "CNames"}];
append from s;
close;
 
call symputx("XPos", range(s[,2])/2); /* midrange = horiz position of star */
quit;

The previous statements create a SAS data set called TREE that contains the (x, y) positions of the 1s in the direct product matrix. The statements also saved the midrange value to a macro variable. The following SAS procedures create the location of the star, concatenate the two data sets, and create a scatter plot of the result, which is shown at the top of this article.

data star;
x = &XPos; y = 0; 
run;
 
data Tree2;
set Tree star;
run;
 
ods graphics / width=600px height=800px  ANTIALIASMAX=50000;
title "Happy Holidays";
proc sgplot data=Tree2 noborder noautolegend;
scatter x=CNames y=RNames / markerattrs=(color=ForestGreen symbol=SquareFilled size=3);
scatter x=x y=y / markerattrs=(symbol=StarFilled size=20)             /* yellow star */
                  filledoutlinedmarkers markerfillattrs=(color=yellow);
yaxis reverse display=none;
xaxis display=none;
run;

Previous SAS-created Christmas cards

All year long I blog about how to use SAS for serious pursuits like statistical modeling, data analysis, optimization, and simulation. Consequently, I enjoy my occasional forays into a fun and frivolous topic such as how to create a greeting card with SAS. If you like seeing geeky SAS-generated images, here is a list of my efforts from previous years:

Wherever you are and whatever holiday you celebrate, may you enjoy peace and happiness this season and in the New Year.

The post A self-similar Christmas tree appeared first on The DO Loop.

12月 092017
 

From national parks and healthcare to taxes and nutrition, federal civilian agencies feature an incredibly large and diverse set of missions. These agencies oversee almost every aspect of American life with an endless sea of projects, programs and general oversight. But, as Deloitte Consulting’s Mark Urbanczyk said during a recent [...]

How federal civilian agencies can improve outcomes with analytics was published on SAS Voices by Steve Bennett

12月 072017
 

The primary obstacle to becoming a data-driven business is that data is not readily available, leaving valuable insights unused in data silos. To overcome this hurdle, today’s companies are creating a new role: Chief Data Officers (CDO). Responsible for unlocking insights hidden in data silos, the CDO is tasked with [...]

Are you ready for the Chief Data Officer challenge? was published on SAS Voices by Helmut Plinke

12月 072017
 

authorization in SAS ViyaAuthorization determines what a user can see and do in an application. An authorization system is used to define access control policies, and those policies are later enforced so that access requests are granted or denied. To secure resources in SAS Viya there are three authorization systems of which you need to be aware.

The General Authorization system secures folders within the SAS Viya environment and the content of folders, for example, reports or data plans. It can also secure access to functionality.

The CAS Authorization system secures CAS libraries, tables (including rows and columns) and CAS actions.

The File system authorization system secures files and directories on the OS, for example code or source tables for CAS libraries.

In this post, I will provide a high-level overview of the General and CAS authorization systems.  If you want to dig into more detail please see the SAS Viya Administration Guide Authorization section.

An important factor in authorization is the identity of the user making the request. In a visual deployment of SAS Viya the General and CAS authorization systems use the same identity provider, an LDAP server. The other common feature between the two authorization systems is that they are deny-based. In other words, access to resources is implicitly disallowed by default. This is important as it marks a change for those familiar with metadata authorization in SAS 9.4.

You can administer both general and CAS authorization from SAS Environment Manager. CAS authorization may also be administered from CAS Server Monitor and from the programming interfaces via the accessControl action set. In SAS Viya 3.3, command-line interfaces are available to set authorization for both systems.

General Authorization System

The general authorization system is used to administer authorization for folders, content and functionality. The general authorization system is an attribute based authorization system which determines authorization based on the attributes of the:

  • Subject: attributes that describe the user attempting the access e.g. user, group, department, role, job title etc.,
  • Action: attributes that describe the action being attempted e.g. read, delete, view, update etc.
  • Resource (or object): attributes that describe the object being accessed (e.g. the object type, location, etc.)
  • Context (environment): attributes that deal with time, location etc.

This attribute model supports Boolean logic, in which rules contain “IF, THEN” statements about who is making the request, the resource, the context and the action.

In the general authorization system, information about the requesting user, the target resource, and the environment can influence access. Each access request has a context that includes environmental data such as time and device type. Environmental constraints can be incorporated using conditions.

Permission inheritance in the general authorization system flows through a hierarchy of containers. Each container conveys settings to its child members. Each child member inherits settings from its parent container. Containers can be folders, or rest endpoints which contain functionality. For example, a folder will pass on its permissions to any children which can be additional sub-folders, or content such as reports or data plans.

In the general authorization system precedence, the way in which permission conflicts are resolved, is very simple, the only factor that affects precedence is the type of rule (grant or prohibit). Prohibit rules have absolute precedence. If there is a relevant prohibit rule, effective access is always prohibited.

So, a deny anywhere in the object or identity hierarchy for the principal (user making the request) means that access is denied.

CAS Authorization

The CAS authorization system mostly focuses on data. It makes sense therefore that it is implemented in the style of a database management system (DBMS). DBMS style authorization systems focus on securing access to data. The permissions relate to data for example, read, write, update, select etc., and some CAS-specific ones like promote and limited promote.

Permission inheritance in the general authorization system flows through a hierarchy of objects, The hierarchy is CASLIB > table > rows/columns.

In the CAS authorization system precedence, the way in which permission conflicts are resolved is determined by where an access control is set and who can access control is assigned to.   The precedence rules are:

  • Direct access controls have precedence over inherited settings.
  • Identity precedence from highest to lowest is user > groups > authenticated users.

To put this another way, if a direct access control is found on an object it will determine access. If multiple direct access controls are found, a control for a user will be used over a control for a group, which will be similarly be used over a control for all authenticated users.  If no direct access control is found on, for example, a table, settings will be inherited from the CASLIB in a similar manner.

A final note on CASLIBSs, there may be additional authorization that needs to be considered to provide access to data. For example, for a path based CASLIB if host-layer access requirements are not met, grants in the CAS authorization layer do not provide access.

This has been a brief look at the two authorization systems in a SAS VIYA environment. The table below summarizes some of the information in this blog.

authorization in SAS Viya

As I noted at the start, you can find much more detail in the SAS Viya Administration Guide An introduction to authorization in SAS Viya was published on SAS Users.