SAS programmers

12月 132017
 

PROC FREQ is one of the most popular procedures in the SAS language.  It is mostly used to describe frequency distribution of a variable or combination of variables in contingency tables.  However, PROC FREQ has much more functionality than that.  For an overview of all that it can do, see an introduction of the SAS documentation. SAS Viya does not have PROC FREQ, but that doesn’t mean you can’t take advantage of this procedure when working with BIG DATA. SAS 9.4m5 integration with SAS Viya allows you to summarize the data appropriately in CAS, and then pass the resulting summaries to PROC FREQ to get any of the statistics that it is designed to do, from your favorite SAS 9.4 coding client. In this article, you will see how easy it is to work with PROC FREQ in this manner.

These steps are necessary to accomplish this:

  1. Define the CAS environment you will be using from a SAS 9.4m5 interface.
  2. Identify variables and/or combination of variables that define the dimensionality of contingency tables.
  3. Load the table to memory that will need to be summarized.
  4. Summarize the data with a CAS enable procedure, use PROC FEDSQL for high cardinality cases.
  5. Write the appropriate PROC FREQ syntax using the WEIGHT statement to input cell count data.

Step 1: Define the CAS environment

Before you start writing any code, make sure that there is an _authinfo file in the appropriate user directory in your system (for example, c:\users\<userid>$$ with the following information:

host <cas server name> port <number> user "<cas user id>" password "<cas user password>"

This information can be found by running the following statement in the SAS Viya environment from SAS Studio:

cas;  
caslib _all_ assign;

 

Then, in your SAS 9.4m5 interface, run the following program to define the CAS environment that you will be working on:

options cashost=" " casport=;
cas mycas user=;
libname mycas cas;
/** Set the in memory shared library if you will be using any tables already promoted in CAS **/
libname public cas caslib=public;

 

PROC FREQ FOR BIG DATA

Figure 1

Figure 1 shows the log and libraries after connecting to the CAS environment that will be used to deal with Big Data summarizations.

Step 2: Identify variables

The variables here are those which will be use in the TABLE option in PROC FREQ.  Also, any numeric variable that is not part of TABLE statement can be used to determine the input cell count.

Step 3: Load tale to memory

There are two options to do this.  The first one is to load the table to the PUBLIC library in the SAS Viya environment directly.  The second option is to load it from your SAS 9.4m5 environment into CAS. Loading data into CAS can be as easy as writing a DATA step or using other more efficient methods depending on where the data resides and its size.  This is out of scope of this article.

Step 4: Summarize the data with a CAS enable procedure

Summarizing data for many cross tabulations can become very computing expensive, especially with Big Data. SAS Viya offers several ways to accomplish this (i.e. PROC MEANS, PROC MDSUMMARY, PROC FEDSQL). There are ways to optimize performance for high cardinality summarization when working with CAS.  PROC FEDSQL is my personal favorite since it has shown good performance.

Step 5: Write the PROC FREQ

When writing the PROC FREQ syntax make sure to use the WEIGHT statement to instruct the algorithm to use the appropriate cell count in the contingency tables. All feature and functionality of PROC FREQ is available here so use it to its max! The DATA option can point to the CAS library where your summarized table is, so there will be some overhead for data transfer to the SAS 9 work server. But, most of the time the summarized table is small enough that the transfer may not take long.  An alternative to letting PROC FREQ do the data transfer is doing a data step to bring the data from CAS to a SAS base library in SAS 9 and then running PROC FREQ with that table as the input.

Example

Figure 2 below shows a sample program on how to take advantage of CAS from SAS 9.4m5 to produce different analyses using PROC FREQ.

PROC FREQ for big data

Figure 2

 

Figure 3 shows the log of the summarization in CAS for a very large table in 1:05.97 minutes (over 571 million records with a dimensionality of 163,964 for all possible combinations of the values of feature, date and target).  The PROC FREQ shows three different ways to use the TABLE statement to produce the desired output from the summarized table which took only 1.22 seconds in the SAS 9.4m5 environment.

PROC FREQ for big data

Figure 3

Program

 

/** Connect to Viya Environment **/
options cashost="xxxxxxxxxxx.xxx.xxx.com" casport=5570;
cas mycas user=calara;
libname mycas cas datalimit=all;
 
/** Set the in memory shared library **/
libname public cas caslib=public datalimit=all;
 
/** Define macro variables **/
/* CAS Session where the FEDSQL will be performed */
%let casref=mycas;
 
/* Name of the output table of the frequency counts */
%let outsql=sql_sum;
 
/* Variables for cross classification cell counts */
%let dimvars=date, feature, target;
 
/* Variable for which frequencies are needed */
%let cntvar=visits;
 
/* Source table */
%let intble=public.clicks_summary;
 
proc FEDSQL sessref=&casref.;
	create table &outsql. as select &dimvars., count(&cntvar.) as freq 
		from &intble. group by &dimvars.;
	quit;
run;
 
proc freq data=&casref..&outsql.;
	weight freq;
	table date;
	table date*target;
	table feature*date / expected cellchi2 norow nocol chisq noprint;
	output out=ChiSqData n nmiss pchi lrchi;
run;

PROC FREQ for Big Data was published on SAS Users.

12月 042017
 

During my 35 years of using SAS® software, I have found the CNTLIN and CNTLOUT options in the FORMAT procedure to be among the most useful features that I routinely suggest to other SAS users. The CNTLIN option enables you to create user-defined formats from a SAS data set (input control data set). The CNTLOUT option enables you to create a SAS data set (output control data set) containing format details from an entry in a SAS format catalog.

In this blog post, I provide a few examples demonstrating how to use the CNTLIN option. I also mention how to use the CNTLOUT option to store your format information in case you need to move to a new operating environment.

You can store all the format details from a SAS format catalog in a CNTLOUT data set and later restore them in a format catalog in your new operating environment using the CNTLIN option. For details, see SAS Usage Note 22194: “How to use the CNTLOUT= and CNTLIN= options in PROC FORMAT to move formats from one platform to another.”

A data set for the CNTLIN option contains variables that give specific information about ranges and values. At a minimum, the data set must contain the following variables:

FMTNAME specifies a character variable whose value is the format or informat name.
START specifies a variable that gives the range's starting value.
LABEL specifies a variable whose value is associated with a format or an informat.

For details about input and output control data sets, see the “FORMAT Procedure” section of Base SAS® 9.4 Procedures Guide, Seventh Edition.

Create a Numeric Format

The following simple example using the CNTLIN option creates a numeric format named respf:

 data test;                                         
    input response desc $20.;                       
 datalines;                                         
 1  Strongly Disagree                               
 2  Disagree                                        
 3  Neutral                                         
 4  Agree                                           
 5  Stongly Agree                                   
 ;                                                  
 run;                                               
 
 data crfmt;                                        
    set test;                                       
    start=response;                                 
    label=desc;                                     
    fmtname='respf';                                
 run;                                               
 
 proc format library=work cntlin=crfmt fmtlib;      
    select respf;                                   
 run;

Controlling Your Formats

Reveal Data Set Variables

To see the other variables that are included in data sets created by the CNTLIN and CNTLOUT options, use CNTLOUT to create a data set for the respf format created above:

 proc format library=work cntlout=outfmt;       
    select respf;                               
 run;                                      
 proc print data=outfmt;                        
 run;

Add Additional Ranges

To add another range to the respf format, you can use DATA step processing with the data set created by the CNTLOUT option. Then, re-create the format using the CNTLIN option:

data infmt;                                               
    set outfmt end=last;                                   
    output;                                                
    if last then do;                                       
       HLO='O';  /* indicates a special other range  */      
       label='NA';                                         
       output;                                             
    end;                                                   
 run;                                                     
 
 proc format library=work cntlin=infmt fmtlib;             
    select respf;                                          
 run;

Convert a State Name to Its Postal Abbreviation

One use for the CNTLIN option is to create a format that converts a state name to its 2-letter postal abbreviation. For example, this option can convert 'North Carolina' to 'NC'.  Because SAS does not have a function or format to convert state names to postal abbreviations, this is an excellent use of the CNTLIN option.

We can use data from the SASHELP.ZIPCODE data set to create a user-defined format using the CNTLIN option, as shown below:

proc sql noprint;                               
    create table crfmt as                        
    select distinct statename as start,          
           statecode as label,                   
           '$mystate' as fmtname                 
    from sashelp.zipcode;                        
 quit;                                           
 
 proc format library=work cntlin=crfmt fmtlib;   
    select $mystate;                             
 run;

Identify State Capitals

In a similar manner, we can use the MAPS.USCITY data set to create a user-defined format that identifies state capitals from the 2-letter state abbreviation. See the sample code and partial results below:

proc sql noprint;                                 
   create table crfmt as                          
   select distinct statecode as start,            
          city as label,                          
          '$mycity' as fmtname                    
   from maps.uscity                               
   where capital='Y';                             
 quit;                                            
 
proc format library=work cntlin=crfmt fmtlib;     
   select $mycity;                                
run;

Use External Data Sources

You can gather information from external data sources and read that information into a data set created by the CNTLIN option to create user-defined formats.

The following example uses ICD10 medical diagnosis codes. I downloaded a list of ICD10 codes and their descriptions into a Microsoft Excel file from the Center for Medicare & Medicaid Services website. Then, I created a user-defined format from the first 25 records:
Note: You can also download the codes as a text file.

/* This code reads in the Excel file.   */                                                                  
proc import out==myicd10                                              
   datafile= "C:\Section111ValidICD10-2017.xlsx"   
   dbms=excelcs replace;                                                
   range="'Valid ICD10 2017 &amp; NF Exclude$'";                            
   scantext=yes;                                                        
   usedate=yes;                                                         
   scantime=yes;                                                        
run;                                                                    
 
 
data crfmt;                                         
   set myicd10 (obs=25);                         
   fmtname='$myicd';                                
   start=code;                                      
   label=short_description;                         
run;  
 
title1 'ICD10 Format';                                                      
title3 'FMTLIB results only display the first 40 characters of the label';  
proc format library=work cntlin=crfmt fmtlib;       
   select $myicd;                                   
run;

A more complicated example that uses other data set variables created by the CNTLIN option is included in the linked sample program in Sample 47312: “Create a user-defined format containing decile ranges from PROC UNIVARIATE results.”

If you can think of a scenario in which the CNTLIN format would be helpful, give it a try. If you have questions, you can ask via  SAS Communities or contact us in SAS Technical Support.

Controlling your formats was published on SAS Users.

11月 292017
 

The CAS procedure (PROC CAS) enables us to interact with SAS Cloud Analytic Services (CAS) from the SAS client based on the CASL (the scripting language of CAS) specification. CASL supports a variety of data types including INT, DOUBLE, STRING, TABLE, LIST, BLOB, and others. The result of a CAS action could be any of the data types mentioned above. In this article, we will use the CAS procedure to explore techniques to interact with the results produced from CAS actions.

PROC CAS enables us to run CAS actions in SAS Viya; CAS actions are at the heart of how the CAS server receives requests, performs the relevant tasks and returns results back to the user. CAS actions representing common functionality constitute CAS action sets.  Consider the CAS action set TABLE. Within the TABLE action set, we will currently find 24 different actions relevant to various tasks that can be performed on a table. For example, the COLUMNINFO action within the TABLE action set will inform the user of the contents of a CAS in-memory table, including column names, lengths, data types, and so on. Let’s take a look at the following code to understand how this works:

proc cas;
	table.columnInfo  / table='HMEQ';
	simple.summary    / table={name='HMEQ' groupby='BAD'};
run;

In the code above, the table called ‘HMEQ’ is a distributed in-memory CAS table that contains data about applicants who were granted credit for a certain home equity loan. The categorical binary-valued variable ‘BAD’ identifies a client who has either defaulted or repaid their home equity loan. Since PROC CAS is an interactive procedure, we can invoke multiple statements within a single run block. In the code above we have executed the COLUMNINFO action from the TABLE actionset and the SUMMARY action from the SIMPLE action set within the same run block. Notice that we are able to obtain a summary statistic of all the interval variables in the dataset grouped by the binary variable ‘BAD’ without having to first sort the data by that variable. Below is a snapshot of the resulting output.

PROC CAS

Fig1: Output from table.columninfo

Fig 2: Output from executing the summary actionset with a BY group option

Let’s dig into this a little deeper by considering the following statements:

proc cas;
simple.summary result=S /table={name='hmeq'};
describe S;
run;

In the code snippet above, the result of the summary action is returned to the user in a variable ‘S’ which is a dictionary. How do we know? To find that, we have invoked the DESCRIBE statement to help us understand exactly what the form of this result, S, is. The DESCRIBE statement is used to display the data type of a variable. Let’s take a look at the log file:

The log file informs the user that ‘S’ is a dictionary with one entry of type table named “Summary.” In the above example the column names and the attributes are shown on the log file. If we want to print the entire summary table or a subset, we would use the code below:

proc cas;
simple.summary result=S /table={name='hmeq'};
print s[“Summary”];
print s[“summary”, 3:5];
run;

The first PRINT statement will fetch the entire summary table; the second PRINT statement will fetch rows 3 through 5. We can also use WHERE expression processing to create a new table with rows that match the WHERE expression. The output of the second PRINT statements above are shown in the figure below:

The result of an action could also be more complex in nature; it could be a dictionary containing dictionaries, arrays, and lists, or the result could be a list containing lists and arrays and tables etc. Therefore, it is important to understand these concepts through some simple cases. Let’s consider another example where the result is slightly more complex.

proc cas;
simple.summary result=s /table={name='hmeq', groupby='bad'};
describe s;
print s["ByGroup1.Summary", 3:5]; run;

In the example above, we are executing a summary using a BY group on a binary variable. The log shows that the result in this case is a dictionary with three entries, all of which are tables. Below is a snapshot of the log file as well as the output of PRINT statement looking at the summary for the first BY group for row 3 through 5.

If we are interested in saving the output of the summary action as a SAS data set (sas7bdat file), we will execute the SAVERESULT statement. The code below saves the summary statistics of the first BY group in the work library as a SAS dataset.

proc cas;
simple.summary result=s /table={name='hmeq', groupby='bad'};
describe s;
print s["ByGroup1.Summary", 3:5]; 
saveresult s["ByGroup1.Summary"] dataout=work.data;
run;

A helpful function we can often use is findtable. This function will search the given value for the first table it finds.

proc cas;
simple.summary result=s /table={name='hmeq'};
val = findtable(s);
saveresult val dataout=work.summary;
run;

In the example above, I used findtable to locate the first table produced within the dictionary, S, and save it under ‘Val,’ which is then directly invoked with SAVERESULT statement to save the output as a SAS dataset.

Finally, we also have the option to save the output of a CAS action in another CAS Table. The table summary1 in the code below is an in-memory CAS table that contains the output of the summary action. The code and output are shown below:

proc cas;
simple.summary /table={name='hmeq'} casout={name='mylib.summary1'}; 
fetch /table={name='mylib.summary1'};
run;

In this post, we saw several ways of interacting with the results of a CAS action using the CAS procedure. Depending on what our end goal is, we can use any of these options to either view the results or save the data for further processing.

Interacting with the results of PROC CAS was published on SAS Users.

11月 212017
 

SAS Viya provides a robust, scalable, cloud-ready, distributed runtime engine. This engine is driven by CAS (Cloud Analytic Services), providing fast processing for many data management techniques that run distributive, i.e. using all threads on all defined compute nodes.

Why

PROC APPEND is a common technique used in SAS processes. This technique will concatenate two data sets together. However, PROC APPEND will produce an ERROR if the target CAS table exists prior to the PROC APPEND.

Simulating PROC APPEND

Figure 1. SAS Log with the PROC APPEND ERROR message

Now what?

How

To explain how to simulate PROC APPEND we first need to create two CAS tables. The first CAS table is named CASUSER.APPEND_TARGET. Notice the variables table, row and variable in figure 2.

Figure 2. Creating the CAS table we need to append rows to

The second CAS table is called CASUSER.TABLE_TWO and in figure 3 we can review the variables table, row, and variable.

Figure 3. Creating the table with the rows that we need to append to the existing CAS table

To simulate PROC APPEND we will use a DATA Step. Notice on line 77 in figure 4 we will overwrite an existing CAS table, i.e. CASUSER.APEND_TARGET. On Line 78, we see the first table on the SET statement is CASUSER.APPEND_TARGET, followed by CASUSER.TABLE_TWO. When this DATA Step runs, all of the rows in CASUSER.APPEND_TARGET will be processed first, followed by the rows in CASUSER.TABLE_TWO. Also, note we are not limited to two tables on the SET statement with DATA Step; we can use as many as we need to solve our business problems.

Figure 4. SAS log validating the DATA Step ran in CAS i.e. distributed

The result table created by the DATA Step is shown in figure 5.

Figure 5. Result table from simulating PROC APPEND using a DATA Step

Conclusion

SAS Viya’s CAS processing allows us to stage data for downstream consumption by leveraging robust SAS programming techniques that run distributed, i.e. fast. PROC APPEND is a common procedure used in SAS processes. To simulate PROC APPEND when using CAS tables as source and target tables to the procedure use DATA Step.

How to simulate PROC APPEND in CAS was published on SAS Users.

11月 182017
 

Have you heard?  The ODS Destination for PowerPoint Has a New Option

It’s true.  The ODS destination for PowerPoint now has the STARTPAGE= option, which provides you with greater control and flexibility when creating presentations.

Added to the ODS POWERPOINT statement in SAS® 9.4TS1M4, the STARTPAGE= option enables you to force the creation of a new slide between procedures and between ODS layout containers.  Inserting a slide break between layout containers is one of the most impactful ways that you can use this option.

A new layout container does not automatically trigger a new slide within the presentation.  A new slide is started when the current slide is full.  That is the default, but the new STARTPAGE= option gives you the ability to start a new slide between containers even if the current slide is not full.

Examples

Shown below are four procedures placed within three layout containers.

  • The first PROC ODSTEXT step is placed in the first layout container.  Its purpose is to generate a slide of text, with that text roughly centered in the slide.
  • The second PROC ODSTEXT step is in the second container.  Its purpose is to provide useful information about the table and graph.
  • The PROC TABULATE and SGPLOT steps make up the third container.  They are the results of the analysis and, as such, need to be displayed side by side.

Default Behavior of ODS POWERPOINT, Without STARTPAGE=

Let’s look at the default behavior.  In this example, the STARTPAGE= option is not used.

ods powerpoint file='example1.pptx' options(backgroundimage="saslogo_pptx.png");
title;
ods layout gridded x=10% y=25%;
proc odstext;
	p "Have you heard?" /style=[just=c fontsize=42pt color=RoyalBlue];
	p "The STARTPAGE= option has been added to the ODS POWERPOINT statement!" /style=[just=c fontsize=24pt];
run;
ods layout end;
 
ods layout gridded rows=1 columns=1;
   ods region;
   proc odstext;
      p 'Table Shows Total Runs and Hits for Each League';
      p 'The Graph Contains One Bubble for Each Player.  The Size of Each Bubble Represents the Magnitude of the RBIs.';
   run;
ods layout end;
 
ods graphics / width=4.5in height=4in;
ods layout gridded columns=2 column_widths=(47% 47%) column_gutter=1pct;
   ods region;
   proc tabulate data=sashelp.baseball;
      class league;
      var nruns nhits nrbi;
      tables league='', sum='Totals'*(nruns nhits)*f=comma12.;
   run;
 
   ods region;
   proc sgplot data=sashelp.baseball;
      bubble x=nhits y=nruns size=nrbi/ group=league transparency=.3;
   run;
ods layout end;
ods powerpoint close;

Here is the resulting slide output:

ODS Destination for PowerPoint

Those results are not what we hoped they would be.  The output from the second PROC ODSTEXT step, which is to provide information about the table and graph, is on the first slide.  So is the graph!!!  And the graph does not look good because it is the wrong size.  The table is by itself on the second slide.  This is not the desired output at all.

Here Is Where STARTPAGE= Helps!

In this example, an ODS POWERPOINT statement with the STARTPAGE= option is added.  It is placed after the ODS LAYOUT END statement for the first container.

ods powerpoint file='example2.pptx' options(backgroundimage="saslogo_pptx.png");
title;
ods layout gridded x=10% y=25%;
proc odstext;
	p "Have you heard?" /style=[just=c fontsize=42pt color=RoyalBlue];
	p "The STARTPAGE= option has been added to the ODS POWERPOINT statement!" /style=[just=c fontsize=24pt];
run;
ods layout end;
 
ods powerpoint startpage=now; /* <---- Triggers a new slide */
 
ods layout gridded rows=1 columns=1;
   ods region;
   proc odstext;
      p 'Table Shows Total Runs and Hits for Each League';
      p 'The Graph Contains One Bubble for Each Player.  The Size of Each Bubble Represents the Magnitude of the RBIs.';
   run;
ods layout end;
 
ods graphics / width=4.5in height=4in;
ods layout gridded columns=2 column_widths=(47% 47%) column_gutter=1pct;
   ods region;
   proc tabulate data=sashelp.baseball;
      class league;
      var nruns nhits nrbi;
      tables league='', sum='Totals'*(nruns nhits)*f=comma12.;
   run;
 
   ods region;
   proc sgplot data=sashelp.baseball;
      bubble x=nhits y=nruns size=nrbi/ group=league transparency=.3;
   run;
ods layout end;
ods powerpoint close;

The STARTPAGE= option gave us exactly what we need.  The first slide contains just the text (from the first layout container).  The second slide contains more text along with the table and graph (from the second and third layout containers).

Use It Wisely!

The most important thing to know about using the STARTPAGE= option with layout containers is that it has to be placed between containers.  It cannot be placed within a container.  So you cannot put the statement between the ODS LAYOUT GRIDDED and ODS LAYOUT END statements.

For more information about ODS destination for PowerPoint and all of its options, visit this The Dynamic Duo: ODS Layout and the ODS Destination for PowerPoint.  Take a peek at it for more examples of using ODS Layout with the ODS destination for PowerPoint.

The ODS Destination for PowerPoint Has a New Option was published on SAS Users.

11月 072017
 

SAS Tips and TricksThere is certainly no shortage of terrific tips and tricks in various SAS blogs from some of our most distinguished SAS in-house experts. But, there's another group of equally qualified experts who don't often get to share their expertise on this channel: our customers. So, I went on a quest to get the inside scoop from various SAS users, polling Friends of SAS members to get their feedback on their favorite SAS tips.

We asked a few of these Friends of SAS members who are regular SAS users to share with us their top SAS tips and tricks for improving performance or something they wished they had known earlier in their SAS career. Based on that, we got a wide range of tips and tricks from a number of different SAS users – ranging from novice to expert and across various industries and product users. Check out some of them below:

FUNCTIONS

Functions are either built into SAS itself or you can write your own customized code that act in the same manner, all of which help in analyzing and processing data. There are a variety of function categories that include mathematical, date and time, character, truncation, and miscellaneous. Using functions makes us more efficient, and we don’t have to re-invent the wheel every time we want to figure something out. With this being said, some of our regular SAS users have a thing or two to say about dealing with functions that may help you out:

“Before you program any complex code, look for a SAS function that will do the task for you.”
     - John Ladds, Past President, OASUS

“Insert a line break in a concatenated string, such as: manylines = catx('0a'x,a,b,c);”
     - Aroop Ghosh, Principal Consultant, Webtalk Communications

“Use the lag function to create time related variables, for example, in time punch data”
     - Yolanda, Analyst, TD

“A good trick that I have recently learnt [sic]which can make the code less wordier is using the functions IFN and IFC as an alternative to IF THEN ELSE statements in conditional processing.”
     - Sunny Giroti, Master of Business Analytics Candidate, Schulich School of Business

“IFN can be used in place of IF THEN ELSE to shorten code”
     - Neil Menezes, Senior Business Anlyst, CTFS

“Ron Cody’s link from SAS.COM. It has many SAS function examples.”
     - John Lam, CIBC

SYNTAX/SHORTCUTS/EFFICIENCIES

You know what they say: time is money. So for a SAS programmer, finding shortcuts and ways to work more efficiently and faster are important to get a job done quicker. Here are a few ways SAS users think can make your life easy while working with SAS:

“Use missover to ensure no records are skipped when reading in a file”
     - Scott Bellefeuille, IT Solutions Developer (Merchant Services), TD Bank

“Pressing keys 'Ctrl'+'/' to comment out a line of code.”
     - Bunce Leung, Execution Manager, RBC

“Variable Lists - being able to refer to variables using double dashes to indicate all variables between first and last in a dataset is super useful for many procs. The later versions of being able to use the prefix and colon to indicate all datasets with a prefix is a great shortcut as well.”
     - Fareeza Khurshed, Manager (Statistical Services), Alberta Treasury Board and Finance

“I like to use PERL in SAS for finding stuff in character variables.”
     - Peter Timusk, Statistics Officer, Statistics Canada

“Title "SAS can give you an Inheritance". Have an ODBC driver on your local PC but not on a remote server? No problem. Use rsubmit with the inheritlib option. Your remote server will now inherit the ODBC driver and be able to access a database you thought you could only reach with your PC.”
     - Horst Wolter, Manager, TD Bank

“If you want to speed the processing of your program. Run your join statements on the "work" library. It is must faster.”
     - Estela Tavares, Economist, Statistics Canada

“When dealing with probability, can logistic be used in all cases? Trick Q - as A is N0. What about the times, probability is 0 and 1. What if the data is heavily distributed on 1s and 0s.”
     - Mukul Pandey, Student Business Analytics, Schulich School of Business

“Proc tabulate can perform descriptive statistics better than proc freq and proc means.”
     - Taha Azizi, Senior Business Insight Analyst, TD

Your turn

Were any of these tips and tricks useful? Do you use them already? What are some of your top SAS tips and tricks? Please be sure to share in the comments below!

Looking for more tips and tricks? Check out this video featuring six Canadian SAS programmers, including a few Friends of SAS members, who share some of their favourite SAS programming tips.

About Friends of SAS

If you’re not familiar with Friends of SAS, it is an exclusive online community available only to our Canadian SAS customers and partners to recognize and show our appreciation for their affinity to SAS. Members complete activities called 'challenges' and earn points that can be redeemed for rewards. There are opportunities to build powerful connections, gain privileged access to SAS resources and events, and boost your learning and development of SAS all in a fun environment.

Interested in learning more about Friends of SAS? Feel free to email myself at Natasha.Ulanowski@sas.com or Martha.Casanova@sas.com with any questions or more details.

SAS tips and tricks: Users-tell-all edition was published on SAS Users.

11月 022017
 

The purpose of this blog post is to demonstrate a SAS coding technique that allows for calculations with multiple variables across a SAS dataset, whether or not their values belong to the same or different observations.

calculations across observations of a SAS data table

What do we want?

As illustrated in the picture on the right, we want to be able to hop, or jump, back and forth, up and down across observations of a data table in order to implement calculations not just on different variables, but with their values from different observations of a data table.

In essence, we want to access SAS dataset variable values similar to accessing elements of a matrix (aij), where rows represent dataset observations, and columns represent dataset variables.

Combine and Conquer

In the spirit of my earlier post Combine and Conquer with SAS, this technique combines the functionality of the LAG function, which allows us to retrieve the variable value of a previous observation from a queue, with an imaginary, non-existent in SAS, LEAD function that reads a subsequent observation in a data set while processing the current observation during the same iteration of the DATA step.

LAG function

LAG<n> function in SAS is not your usual, ordinary function. While it provides a mechanism of retrieving previous observations in a data table, it does not work “on demand” to arbitrarily read a variable value from a previous observation n steps back. If you want to use it conditionally in some observations of a data step, you still need to call it in every iteration of that data step. That is because it retrieves values from a queue that is built sequentially for each invocation of the LAG<n> function. In essence, in order to use the LAG function even just once in a data step, you need to call it every time in each data step iteration until that single use.

Moreover, if you need to use each of the LAG1, LAG2, . . . LAGn functions just once, in order to build these queues, you have to call each of them in every data step iteration even if you are going to use them in some subsequent iterations.

LEAD function

The LEAD function is implemented in Oracle SQL and it returns data from the next or subsequent row of a data table. It allows you to query more than one row in a table at a time without having to join the table to itself.

There is no such function in SAS. However, the POINT= option of the SET statement in a SAS data step allows retrieving any observation by its number from a data set using random (direct) access to read a SAS data set. This will allow us to simulate a LEAD function in SAS.

HOP function

But why do we need two separate functions like LAG and LEAD in order to retrieve non-current observations. In essence, these two functions do the same thing, just in opposite directions. Why can’t we get by with just one function that does both backwards and forwards “hopping?”

Let’s combine and conquer.

Ideally, we would like to construct a new single function - let’s call it HOP(x, j) - that combines the best qualities of both LAG and LEAD functions. The two arguments of the HOP function would be as follows:

x – SAS variable name (numeric or character) the value of we are retrieving;

j – hop distance (numeric) – an offset from the current observation; negative values being lagging (hopping back), positive values being leading (hopping forward), and a zero-value meaning staying within the current observation.

The sign of the second argument defines whether we lag (minus) or lead (plus). The absolute value of this second argument defines how far from the current observation we hop.

Alternatively, we could have the first argument as a column number, and the second argument as a row/observation number, if we wanted this function to deal with the data table more like with a matrix. But relatively speaking, the method doesn’t really matter as long as we can unambiguously identify a data element or a cell. To stay within the data step paradigm, we will stick with the variable name and offset from the current observation (_n_) as arguments.

Let’s say we have a data table SAMPLE, where for each event FAIL_FLAG=1 we want to calculate DELTA as the difference between DATE_OUT, one observation after the event, and DATE_IN, two observations before the event:

Calculations across observations of a SAS data table

That is, we want to calculate DELTA in the observation where FAIL_FLAG = 1 as

26MAR2017 18JAN2017 = 67 (as shown in light-blue highlighting in the above figure).

With the HOP() function, that calculation in the data step would look like this:

data SAMPLE;
   set SAMPLE;
   if FAIL_FLAG then DELTA = hop(DATE_OUT,1) - hop(DATE_IN,-2);
run;

It would be reasonable to suggest that the hop() function should return a missing value when the second argument produces an observation number outside of the dataset boundary, that is when

_n_ + j < 0 or _n_ + j > num, where _n_ is the current observation number of the data step iteration; num is the number of observations in the dataset; j is the offset argument value.

Do you see anything wrong with this solution? I don’t. Except that the HOP function exists only in my imagination. Hopefully, it will be implemented soon if enough SAS users ask for it. But until then, we can use its surrogate in the form of a %HOP macro.

%HOP macro

The HOP macro grabs the value of a specified variable in an offset observation relative to the current observation and assigns it to another variable. It is used within a SAS data step, but it cannot be used in an expression; each invocation of the HOP macro can only grab one value of a variable across observations and assign it to another variable within the current observation.

If you need to build an expression to do calculations with several variables from various observations, you would need to first retrieve all those values by invoking the %hop macro as many times as the number of the values involved in the expression.

Here is the syntax of the %HOP macro, which has four required parameters:

%HOP(d,x,y,j)

d – input data table name;

x – source variable name;

y – target variable name;

j – integer offset relative to the current observation. As before, a negative value means a previous observation, a positive value means a subsequent observation, and zero means the current observation.

Using this %HOP macro we can rewrite our code for calculating DELTA as follows:

 data SAMPLE (drop=TEMP1 TEMP2);
   set SAMPLE;
   if FAIL_FLAG then
   do;
      %hop(SAMPLE,DATE_OUT,TEMP1, 1)
      %hop(SAMPLE,DATE_IN, TEMP2,-2)
      DELTA = TEMP1 - TEMP2;
   end;
run;

Note that we should not have temporary variables TEMP1 and TEMP2 listed in a RETAIN statement, as this could mess up our calculations if the j-offset throws an observation number out of the dataset boundary.

Also, the input data table name (d parameter value) is the one that is specified in the SET statement, which may or may not be the same as the name specified in the DATA statement.

In case you are wondering where you can download the %HOP macro from, here it is in its entirety:

%macro hop(d,x,y,j);
   _p_ = _n_ + &j;
   if (1 le _p_ le _o_) then set &d(keep=&x rename=(&x=&y)) point=_p_ nobs=_o_;
%mend hop;

Of course, it is “free of charge” and “as is” for your unlimited use.

Your turn

Please provide your feedback and share possible use cases for the HOP function/macro in the Comment section below. This is your chance for your voice to be heard!

Hopping for the best - calculations across SAS dataset observations was published on SAS Users.

11月 012017
 

A ghoulish Halloween Boo to all my readers! Hope my costume freaks you out, but even if it doesn't, I’m positive PROC FREQ will in a few amazing ways! Today’s Programming 2: Data Manipulation Techniques class asked about the power of PROC FREQ. Since I stopped to explain some of it's benefits to [...]

The post 3 freaky ways PROC FREQ can work in your favor appeared first on SAS Learning Post.

10月 272017
 

When loading data into CAS using PROC CASUTIL, you have two choices on how the table can be loaded:  session-scope or global-scope.  This is controlled by the PROMOTE option in the PROC CASUTIL statement.

Session-scope loaded

proc casutil;
                                load casdata="model_table.sas7bdat" incaslib="ryloll" 
                                outcaslib="otcaslib" casout="model_table”;
run;
Global-scope loaded
proc casutil;
                                load casdata="model_table.sas7bdat" incaslib="ryloll" 
                                outcaslib="otcaslib" casout="model_table" promote;
run;

 

Global-scope loaded

proc casutil;
                                load casdata="model_table.sas7bdat" incaslib="ryloll" 
                                outcaslib="otcaslib" casout="model_table" promote;
run;

 

Remember session-scope tables can only be seen by a single CAS session and are dropped from CAS when that session is terminated, while global-scope tables can be seen publicly and will not be dropped when the CAS session is terminated.

But what happens if I want to create a new table for modeling by partitioning an existing table and adding a new partition column? Will the new table be session-scoped or global-scoped? To find out, I have a global-scoped table called MODEL_TABLE that I want to partition based on my response variable Event. I will use PROC PARTITION and call my new table MODEL_TABLE_PARTITIONED.

proc partition data=OTCASLIB.MODEL_TABLE partind samppct=30;
	by Event;
	output out=OTCASLIB.model_table_partitioned;
run;

 

After I created my new table, I executed the following code to determine its scope. Notice that the Promoted Table value is set to No on my new table MODEL_TABLE_PARTITIONED which means it’s session-scoped.

proc casutil;
     list tables incaslib="otcaslib";
run;

 

promote CAS tables from session-scope to global-scope

How can I promote my table to global-scoped?  Because PROC PARTITION doesn’t provide me with an option to promote my table to global-scope, I need to execute the following PROC CASUTIL code to promote my table to global-scope.

proc casutil;
     promote casdata="MODEL_TABLE_PARTITIONED"
     Incaslib="OTCASLIB" Outcaslib="OTCASLIB" CASOUT="MODEL_TABLE_PARTITIONED";
run;

 

I know what you’re thinking.  Why do I have to execute a PROC CASUTIL every time I need my output to be seen publicly in CAS?  That’s not efficient.  There has to be a better way!

Well there is, by using CAS Actions.  Remember, when working with CAS in SAS Viya, SAS PROCs are converted to CAS Actions and CAS Actions are at a more granular level, providing more options and parameters to work with.

How do I figure out what CAS Action syntax was used when I execute a SAS PROC?  Using the PROC PARTITION example from earlier, I can execute the following code after my PROC PARTITION completes to see the CAS Action syntax that was previously executed.

proc cas;
     history;
run;

 

This command will return a lot of output, but if I look for lines that start with the word “action,” I can find the CAS Actions that were executed.  In the output, I can see the following CAS action was executed for PROC PARTITION:

action sampling.stratified / table={name='MODEL_TABLE', caslib='OTCASLIB', groupBy={{name='Event'}}}, samppct=30, partind=true, output={casOut={name='MODEL_TABLE_PARTITIONED', caslib='OTCASLIB', replace=true}, copyVars='ALL'};

 

To partition my MODEL_TABLE using a CAS Action, I would execute the following code.

proc cas;
  sampling.stratified / 
    table={name='MODEL_TABLE', caslib='OTCASLIB', groupBy={name='Event'}}, 
    samppct=30, 
    partind=true, 
    output={casOut={name='embu_partitioned', caslib='OTCASLIB'}, copyVars='ALL'};
run;

 

If I look up sampling.stratified syntax in the

proc cas;
  sampling.stratified / 
    table={name='MODEL_TABLE', caslib='OTCASLIB', groupBy={name='Event'}}, 
    samppct=30, 
    partind=true, 
    output={casOut={name='embu_partitioned', caslib='OTCASLIB', promote=true}, copyVars='ALL'};
run;

 

So, what did we learn from this exercise?  We learned that when we create a table in CAS from a SAS PROC, the default scope will be session and to change the scope to global we would need to promote it through a PROC CASUTIL statement.  We also learned how to see the CAS Actions that were executed by SAS PROCs and how we can write code in CAS Action form to give us more control.

I hope this exercise helps you when working with CAS.

Thanks.

Tip and tricks to promote CAS tables from session-scope to global-scope was published on SAS Users.

10月 262017
 

If you've visited SAS documentation (also known as the "SAS Help Center") lately, you may have noticed that we've made some fairly significant changes in the documentation for SAS products and solutions. The new site is organized in a new way, search works a little differently, and the user interface has changed. These changes are part of our continuous pursuit to provide you with the best possible experience on our website.

Below you'll find a quick summary of what's new. Check out the SAS Help Center and let us know what you think.
(You'll find ways to provide feedback at the end of this post. We'd love to hear from you!)

The SAS Help Center

For starters, SAS documentation has a new location on the web: http://documentation.sas.com and a new name: the “SAS Help Center.” You'll notice the SAS Help Center homepage serves as a gateway to documentation for a number of SAS products and solutions. We've highlighted a few of the major product documentation sets at the top of the page, with a full listing of available documentation immediately following. The user interface contains many of the same features as the documentation you used on support.sas.com, but there are a few little differences. Perhaps the most significant - search works a little differently. More on that in a bit.

Content Organization

SAS documentation is now organized into topic-focused collections. For example, SAS Viya Administration docs are together in another collection. You'll find collections for a number of different topic areas, with each collection containing all the documentation for that specific topic area. For a list of all topic areas, see the Products Index A – Z .

Searching the SAS Help Center

When you use search in the new SAS Help Center, be aware that you're only searching the specific documentation collection that you are using at the time. For example, if you're inside the SAS Viya 3.2 Administration documentation and initiate a search, you will only see results for the doc within the SAS Viya 3.2 Administration collection. If you prefer to search all doc collections at once, you can use the search on support.sas.com or use a third-party search tool, such as Google or Bing. (For tips and guidelines on using search, visit our “yourturn@sas.com.

SAS Help Center: your gateway to documentation was published on SAS Users.