Tech

August 20, 2019
 

You can now easily embed a Python script inside a SAS decision within SAS Intelligent Decisioning. If you want to execute in SAS Micro Analytic Service (MAS), you no longer need to wrap it in DS2 code. The new Python code node does it for you. Here is how you can achieve it in less than 5 minutes:

Ready? Steady? Go!

The Python Script

If you want to run the following in MAS:

X1 = 1
X2 = 2
if X1 is None:
    X1 = 0
if X2 is None:
    X2 = 0
Y = 0.55 + 1 * X1 + 2 * X2
print(Y)

Convert it to a Python function to meet the PyMAS requirements:

def execute(X1, X2):
    "Output: Y"
    if X1 is None:
        X1 = 0
    if X2 is None:
        X2 = 0
    Y = 0.55 + 1 * X1 + 2 * X2
    return Y

X1 = 1
X2 = 2
print(execute(X1, X2))

In a Jupyter Notebook, it will look like this:

Create an input data set to test the results

In SAS Studio V:

cas mysession sessopts=(metrics=true);
caslib _all_ assign;
options dscas;
 
data CASUSER.X1X2 (promote=yes);
length X1 8 X2 8;
X1=1; X2=1; output;
X1=1; X2=2; output;
X1=1; X2=3; output;
X1=1; X2=4; output;
run;
cas mysession terminate;

Create a decision in SAS Intelligent Decisioning 5.3

Choose New Python code file and call it python_logic. Copy the code from the Jupyter Notebook, from the def line through return Y. Watch out for your indentation!

Save and Close. Go to Variables:

Click on variables X1, X2, Y and change their type to Decimal.

Save the Decision.

Publish the decision to MAS

Test the publishing destination

Click on the published validation. Choose the data set you created:

Run. The code is executed.

Check the execution results

Y is the output of the Python function. For the second row in the X1X2 data set, where X1 = 1 and X2 = 2, we get Y = 0.55 + 1*1 + 2*2 = 5.55, just as in the Jupyter Notebook.

Concepts

About Decisions in SAS

Put simply, there are three main components to a decision in SAS: inputs, logic, and outputs.

Inputs: the decision needs input variables. These can come from a CAS data set, a REST API or manual inputs.

Logic: a decision is defined by business rules, conditions, analytic models, custom code (DS2), etc. The new version allows execution of Python code in PyMAS (see below).

Outputs: a decision computes an output based on inputs and logic.

About SAS Micro Analytic Service (MAS)

A picture is worth a thousand words; here is a simplified diagram of the MAS architecture (thanks to Michael Goddard):

MAS Architecture: Execution engine

You can apply or publish a decision using MAS. The SAS Micro Analytic Service provides the capability to publish a decision into operational environments.

When deployed as part of SAS Decision Manager, MAS runs as a web application with a REST interface and is called by both SAS Decision Manager and other client applications. MAS provides hosting for DS2 and Python programs and supports a "compile-once, execute-many-times" usage pattern.

The REST interface provides easy integration with client applications and adds persistence and clustering for scalability and high availability.
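To make this concrete, here is a sketch of scoring a published module over REST with PROC HTTP. The host name, the module name (python_logic), the step ID (execute), and the &access_token macro variable are all assumptions; adjust them for your deployment.

filename resp temp;
proc http
   /* POST the input values to the module's execute step; URL and token are placeholders */
   url="https://your-viya-host/microanalyticScore/modules/python_logic/steps/execute"
   method="POST"
   in='{"inputs":[{"name":"X1","value":1},{"name":"X2","value":2}]}'
   out=resp;
   headers "Content-Type"="application/json"
           "Authorization"="Bearer &access_token";
run;

The JSON response written to the RESP fileref contains the output values, including Y.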

Prerequisites for Python decisions

You need SAS Intelligent Decisioning 5.3 on SAS Viya 3.4. SAS Intelligent Decisioning 5.3 is the whiz-kid successor of SAS Decision Manager 5.2. You do not need a specific Python version in your environment, but if you use certain libraries (for example, numpy or scipy), they might depend on the Python version.

Debugging your Python-based decisions

If you cannot replicate the example, it might be useful to consult the MAS logs. Log in to your server with MobaXterm (or the software of your choice). Browse to the log of the microservice in question; for MAS, that is microanalyticservice:

cd /opt/sas/viya/config/var/log/microanalyticservice/default/

Connect to the node using SFTP and open the log files. Check for errors, such as:

2019-06-27T21:31:12,251 [00000007] ERROR App.tk.MAS - Module 'python1_0' failed to compile in user context 'provider'.

Resolve the Python error, per the messages you find in the log.

Solution for some errors

When you have made changes in your environment and have trouble getting your Python decisions to work, try restarting the following services:

  • decisionmanager
  • compsrv
  • launcher
  • runlauncher
  • microanalyticservice

Acknowledgements

Thanks to Marilyn Tomasic for finding the solution on what to do if you do not get the expected results. Thanks to Yi Jian Ching for sharing his knowledge and material.

References

Execute Python inside a SAS Decision: Learn how in less than 5 minutes was published on SAS Users.

August 16, 2019
 

The Output Delivery System (ODS) Graphics procedures provide many options to give you control over the look of your output. However, there are times when your output does not look like you thought it would.

This blog discusses how to solve some common output-related problems that we hear about in Technical Support.

All of the examples in this blog relate to creating scatter plots and bar charts from the same data set, SASHELP.CLASS. This data set, included in your SAS® installation, provides information about heights and ages for both male and female students.

Colors in the output are not as desired

Using the STYLEATTRS statement

The STYLEATTRS statement enables you to define attributes, such as color, for graphical elements.

In the following SGPLOT procedure example, the STYLEATTRS statement defines the colors for the marker symbols on a scatter plot as either blue or pink:

proc sgplot data=sashelp.class;
   styleattrs datacolors=(blue pink);
   scatter x=age y=height / group=sex
   markerattrs=(symbol=circlefilled);
run;

However, after you submit the code, the resulting plot does not use the specified colors. Instead, you see blue and red:

When defining colors for graphical elements, the DATACOLORS= option defines the colors for filled areas, and the DATACONTRASTCOLORS= option defines the colors for marker symbols and lines.

Because the scatter plot is creating marker symbols, you have to change the STYLEATTRS statement to use the DATACONTRASTCOLORS= option instead of the DATACOLORS= option. Here is the revised code:

proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink);
   scatter x=age y=height / group=sex 
   markerattrs=(symbol=circlefilled);
run;

Now, when you submit the updated code, the correct colors appear:

You can find more information about the STYLEATTRS statement in the STYLEATTRS section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Using an attribute map

An attribute map enables you to associate specific values for your plot GROUP= variable with specific graphical attributes.

The attribute map is defined in a data set that includes the following:

  • an ID variable that contains the name of the attribute map definition
  • a VALUE variable that contains the value of the plot statement GROUP= variable
  • any other variables for the attributes that you want to define

In the following example, the attribute map BARCOLORS is defined to associate the group value F with the color pink and the group value M with the color blue. Note that the FILLCOLOR variable is also used to define the colors for the bars of the bar chart.

data attrmap;
   id='barcolors';
   input value $ fillcolor $;
   datalines;
F pink
M blue
;
run;
proc format;
   value $genderfmt
      'F'='Female'
      'M'='Male';
run;
proc sgplot data=sashelp.class dattrmap=attrmap;
   vbar age / response=height group=sex groupdisplay=cluster 
   nooutline attrid=barcolors;
   format sex $genderfmt.;
run;

However, the output shows blue and red bars, instead of using the pink and blue values that you specified:

In this case, a format is defined to display the group values as Female and Male. However, the attribute map associates the unformatted values F and M with the pink and blue bar colors, so the attribute map values do not match the formatted values that the plot uses for grouping.

You need to change the attribute map VALUE variable so that it contains the formatted value of the GROUP= variable.
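Here is the first part of the code again, as a sketch of the revised DATA step; only the VALUE entries change, so that they hold the formatted values:

data attrmap;
   id='barcolors';
   input value $ fillcolor $;
   datalines;
Female pink
Male blue
;
run;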

With the updated values, the output now displays the correct colors:

You can find more information about attribute maps in the SG Attribute Maps section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Symbols in the output are not as desired

In the following PROC SGPLOT example, the STYLEATTRS statement defines the colors for the marker symbols on a scatter plot as either blue or pink. Also, the marker symbols should be either filled circles or filled squares:

ods html style=styles.htmlblue;
proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink) 
   datasymbols=(circlefilled squarefilled);
   scatter x=age y=height / group=sex;
run;

You submit the code using the STYLES.HTMLBLUE style. The output shows all the symbols as circles, and none are squares:

The ATTRPRIORITY ODS Graphics option determines how attributes are cycled. The default value for the ATTRPRIORITY option is defined in the style that is being used.

The STYLES.HTMLBLUE style sets the default value of the ATTRPRIORITY option to COLOR. With this value, the first marker symbol cycles through all of the specified colors before the second symbol is ever used.

To cycle colors and symbols together, set the ATTRPRIORITY ODS Graphics option to NONE in an ODS GRAPHICS statement. That setting prevents the plot from exhausting the colors list before moving to the next symbol:

ods graphics /attrpriority=none;
ods html style=styles.htmlblue;
proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink) 
   datasymbols=(circlefilled squarefilled);
   scatter x=age y=height / group=sex;
run;

In the updated output, squares are now seen in the correct color, pink:

You can read more about how attributes are cycled in the following blog post by Rick Wicklin:
Attrs, attrs, everywhere: The interaction between ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS in ODS graphics

Annotation is not placed in the output as desired

Adding an oval

In this example, you want the resulting scatter plot to contain an oval around the circle that represents the tallest student:

proc sql;
   create table maxheight as
   select height, age
   from sashelp.class
        having height=max(height);
quit;
 
data anno;
   set maxheight;
       drawspace='datavalue';
       function='oval';
       x=age;
       y=height;
       width=6;
       height=6;
       linecolor='red';
run;
 
proc sgplot data=sashelp.class sganno=anno;
   scatter x=age y=height;
      xaxis offsetmax=0.1 offsetmin=0.1;
      yaxis offsetmax=0.1 offsetmin=0.1;
run;

After you submit the code, you notice that the resulting plot does not include the oval:

If you have used annotation in SAS/GRAPH® software, you might be accustomed to using the X and Y variables in the annotation data set to indicate the location of the annotation. However, ODS Statistical Graphics (SG) annotation uses X1 and Y1 variables for the location of the annotation.

Therefore, you need to change the X and Y variables in the annotation data set to X1 and Y1 instead:

data anno;
   set maxheight;
       drawspace='datavalue';
       function='oval';
       x1=age;
       y1=height;
       width=6;
       height=6;
       linecolor='red';
run;

In the next version of the scatter plot, the oval now appears:

Adding a text label

In this example, you want to place a text label next to the circle that represents the tallest student.

proc sql;
   create table maxheight as
   select height, age, name
   from sashelp.class
        having height=max(height);
quit;
 
data anno;
   set maxheight;
       drawspace='datavalue';
       function='label';
       x1=age;
       y1=height;
       label=name;
       textsize=10;
       textcolor='red';
       anchor='bottom';
run;
 
proc sgplot data=sashelp.class sganno=anno;
   scatter x=age y=height;
      xaxis offsetmax=0.1 offsetmin=0.1;
      yaxis offsetmax=0.1 offsetmin=0.1;
run;

However, after you run this code, the output does not include the text label:

Again, if you are a SAS/GRAPH user, you might assume that the LABEL function can place text on a plot. However, ODS SG annotation needs to use the TEXT function instead.

In the previous DATA step, you need to change the FUNCTION variable so that it contains the value TEXT:

data anno;
   set maxheight;
       drawspace='datavalue';
       function='text';
       x1=age;
       y1=height;
       label=name;
       textsize=10;
       textcolor='red';
       anchor='bottom';
run;

After you revise the DATA step and resubmit your code, you then see that the text label appears where intended:

You can find more information about SG annotation in the SG Annotation section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Summary

These are just a few examples to demonstrate some of the common output-related problems that we hear about in Technical Support. If your graphical output does not appear as you wanted, consider the options that you are using and make sure that you are using the correct option.

Learn More

How to fix common problems in output from ODS Graphics procedures was published on SAS Users.

August 13, 2019
 

Raw data doesn’t change an organization, and neither do analytics on their own. It’s making decisions based on that data and the results of analytics that drives change through a company. Every decision is important and influences an organization. Thousands of decisions need to be made every day and many decisions are dependent on other decisions in an interconnected network.

SAS Intelligent Decisioning combines business rules management, decision processing, real-time event detection, decision governance and analytics to automate and manage decisions across the enterprise. It supports customer-facing activities such as personalized marketing and next-best action, plus decisions affecting customers, including credit services and fraud prevention.

Overview

Business rules

An integrated business rule management platform enables fast rule construction, testing, governance and integration within decision flows. You can manage rule versions for tracking and governance. The solution allows users to create complex business logic supported by sophisticated functions and integration with Lookup Tables.

Decision flows

A graphical drag-and-drop interface allows users to build decisions with minimal programming effort. Decisions are created in a decision flow that orchestrates business rules, analytical models, database access, custom code objects and more.

Graphical editor to create decisions

Further, it is possible to test and maintain different versions of decisions and business rules before deploying them for production real-time or batch execution.

The high-performance, real-time SAS Micro Analytic Service (MAS) engine can handle more than 5,000 real-time transactions per second with response times of 10 milliseconds per transaction. The REST interface for calling decisions or business rules in real time provides simple integration with most third-party applications.

Monitor test results through Decision Path tracking

New Features

The latest release of SAS Intelligent Decisioning recently became available, and I'd like to highlight some of the new features.

SQL Query Node

Users can now submit SQL directly into a SQL Query node without supplying any additional coding logic. The SQL Query node supports SELECT, INSERT, DELETE and UPDATE.

To link a SQL statement to a decision, you simply map tables and columns to decision variables, which are referenced in curly brackets as shown below. Intelligent Decisioning then automatically passes data into the SQL as appropriate.
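For illustration, here is a hypothetical query for the node. The table and column names are invented, and the exact node syntax may differ in your release; the point is that the decision variable Customer_ID is referenced in curly brackets:

select CREDIT_SCORE, CREDIT_LIMIT
from BANKDATA.CUSTOMER_SCORES
where CUSTOMER_ID = {Customer_ID}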

If you query data via a select statement, the result is returned in a Datagrid. A Datagrid is a data type for an object in Intelligent Decisioning and represents data in a table format that belongs to a single record.

Datagrids are used in many places in Intelligent Decisioning and there is a rich set of Datagrid functions to access and work with data in a Datagrid.

Python Code Node

Intelligent Decisioning provides an environment that aims to minimize the need to write code to build decisions. But if necessary, it is possible to submit code. Intelligent Decisioning supports writing code in Python as part of a decision flow. Data from a decision flow can be passed into the Python code and return values will be passed back from Python into the decision flow.

To enable coding in Python, a Python execution environment needs to be installed alongside Intelligent Decisioning. If a decision flow contains a Python Code Node, the Python code will automatically be executed in the Python environment as part of the overall decision.

Decision Flow containing Python code node

A code editor in Intelligent Decisioning allows you to edit your Python code within the environment.

A Python code editor is part of Intelligent Decisioning

Decision Node

Decision flows can call other decision flows. This opens the way to designing and building modular decisions with “pluggable” components. You can also build reusable decisions which are called by different decision flows. Building decisions in such a modular way makes it easier to read and maintain decision flows.

Drill down from one decision to the next

Treatments

Treatments are lists of attributes with fixed or dynamic values.

Treatments are used to define offers to present to a customer as a result of an inbound marketing campaign. Or treatments can be used as parameter lists to control engine settings. There are numerous use cases for treatments.

Treatment attribute list

To determine if a treatment is valid for a decision, you can set Eligibility Rules to decide when a treatment will be used. For audit reasons and to track changes over time, you can also have different versions of a treatment.

To utilize treatments, you group them together in treatment groups, which can then be called from a decision flow.

Conclusion

Managing and analyzing high volumes of data to make thousands of decisions every day in an automated fashion, and applying analytics to real-time customer interactions, requires a sophisticated and complete solution like SAS Intelligent Decisioning. It enables users to create, test, version, and trace analytically driven decisions, all in one solution.

By making smarter decisions, organizations become more efficient. As mentioned in the beginning: data doesn't change your organization, decisions do!

Learn more

Video: SAS Intelligent Decisioning | Product Overview
Documentation: SAS Intelligent Decisioning
Product: SAS Intelligent Decisioning

SAS Intelligent Decisioning: Intro and Update was published on SAS Users.

August 9, 2019
 

Opening Plenary session, Esri UC 2019

Several of my colleagues and I attended the annual Esri User Conference last month in San Diego - along with 18,000 other Geo professionals.  It was a busy week of meetings, seminars and talks about the latest in GIS and Spatial technologies.  The days were long and exhausting, but it was also exciting and a ton of fun.  As we continue to process, plan and prepare to integrate some of these technologies into SAS Visual Analytics, I thought it would be beneficial to highlight the Esri features available in VA today.

One topic that received a lot of questions during this year's SAS Global Forum in Dallas was geocoding. Geocoding is the process of transforming text address data into numeric latitude and longitude values. Once the latitude and longitude are known, they can be mapped and analyzed spatially. SAS has offered geocoding capabilities for quite some time as part of SAS/GRAPH. Beginning with SAS 9.4M5, PROC GEOCODE is included in Base SAS. See my colleague's blog posts here and here for more information on geocoding from Base SAS.
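As a quick taste of the Base SAS route, here is a minimal sketch of ZIP-centroid geocoding. It assumes the SASHELP.ZIPCODE lookup data set that ships with SAS; street-level geocoding requires additional lookup data.

/* Geocode two sample ZIP codes to their centroid coordinates */
data work.addresses;
   input zip $5.;
   datalines;
75201
27513
;
run;

proc geocode method=zip data=work.addresses out=work.geocoded
             lookup=sashelp.zipcode;
run;

The output data set WORK.GEOCODED contains the coordinates for each ZIP code.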

But geocoding is no longer limited to Base SAS. You can also geocode from within Visual Analytics, thanks to integration with the Esri geocoding API. This feature is part of the Esri Premium agreement and became available in VA 8.3. Esri premium features require an existing relationship and credentials with Esri; this post assumes that relationship exists and that your credentials have been validated. I will discuss the details of the Esri premium features in a future post, but for today the focus is how to use the Esri geocoding feature from VA with a real-world data set.

1. Getting the data into Visual Analytics

We will be using point data from the City of Dallas for the Public Library branch locations.  You can download the .csv file from the Dallas Open Data portal.  After downloading, it must be imported into VA for geocoding.

  • From the Data tab in VA, select Import > Local File
  • Navigate to the location of the Dallas library .csv file and select it
  • Adjust the default settings, if desired, and click the ‘Import Item’ button
  • Once you see the green success message, the data has been imported into VA and is ready to be geocoded. Click the ‘Cancel’ button

Message indicating successful data import

2. Selecting the data columns to geocode on

Accessing the Geocoding feature in VA follows a similar process to the steps we just performed to import the .csv file.

  • From the Data tab in VA, select Import > Esri > Geocode. Here, you must select the location of the newly imported library data set.  This path will vary depending upon the configuration of your VA instance.  For my installation, it is located at cas-shared-default > Public folder > CITY_OF_DALLAS_LIBRARY_LOCATIONS.  Once located, click the 'Select' button
  • The Geocoding Import window will open. This window should look familiar: the top half is the same as the Import Data window we just used to get the .csv file into VA.  Essentially, the geocoding process is a new data import.  It sends the selected columns to Esri via a REST API call.  The response contains the corresponding latitude and longitude values we want, which are added to our existing data and imported into VA as a new geocoded data set.  The name of the new data set will have _GEO_CODE appended to the end of the original data set name.  This name can be modified as desired.

Geocoding selection dialog window

  • At the bottom of the Geocoding Import window are two list boxes, Available items and Selected items. The Available items box on the left contains all columns in the data set.  Select the column(s) containing the address information you wish to geocode.  Double click or click the right arrow to move them to the Selected items window on the right.  In this example, we are using the Address column.
  • VA concatenates the selected column(s) to generate a sample address for geocoding. Clicking the ‘Test’ button returns coordinates for the sample address and a score representing the confidence level of the results.  In the screenshot above, our score is 71/100 for the test address.  Not bad, but it could be better.  More on this a bit later.
  • To finish the geocoding process, click the 'Import Item' button at the top of the page, as we did with the original .csv file import. This time, you are presented with a new dialog window.  Geocoding, like other Esri premium features, requires the use of credits.  This dialog indicates how many Esri credits the geocoding process will use; credits will also be discussed in detail in a future post.

Esri credit usage alert dialog

For now, select 'Yes' to continue.  When you see the green success message, the operation is complete.  We are now ready to map our Dallas Library locations.  Click 'Ok' to open the new geocoded data set.

3. Create the geography variable and display the map

Next, we need to create our geography variable from the new geocoded data set.  As part of the geocoding process, four new columns have been added to the new data set: esri_latitude, esri_longitude, esri_score, esri_address.  We only need the esri_latitude and esri_longitude columns for our map.

  • Select the Branch Name category variable and change its Classification to Geography
  • For Geography data type, select Custom Coordinates
  • Select esri_latitude for Latitude
  • Select esri_longitude for Longitude
  • Click 'OK'
  • Drag the Branch Name geography variable to the canvas to create the map

Map of non-unique geocoded addresses

What happened??  Our data set contains Dallas Public library locations, so why are the data points spread across the world?  It’s all in the data.  If you look at the original data a bit deeper, you will notice the Address field we selected for the geocoding only contains the street number and street name of the library location.  It does not contain enough information to make it unique.  Therefore, during the geocoding process, the first instance of that address will be considered a match, regardless of where it is actually located.

Detailed view of incorrect geocoded address

In the image above for the Preston Royal branch, its street number and name were a perfect match to a location in Eugene, Oregon.  Not quite what we were looking for.  So, how do we fix this?  To make our addresses unique, it requires a simple addition to the source data .csv file.

Column selection to ensure unique addresses for geocoding

We need to add a ‘City’ and ‘State’ column to the original .csv file with the values of ‘Dallas’ and ‘Texas’ assigned to all entries.  This will ensure each address is unique and within our area of interest.  Re-import the new .csv file and geocode it using the Address, City and State columns.  The result?  A confidence score of a perfect 100.  Much better than our first attempt!  This will now give us the map we desire for the Dallas Public Library locations.
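Here is one way to add those columns outside of VA before re-importing. This is just a sketch; the file names and paths are hypothetical.

/* Read the original file, add constant City and State columns, and write a new .csv */
proc import datafile="dallas_libraries.csv" out=work.libraries
            dbms=csv replace;
run;

data work.libraries;
   set work.libraries;
   length City $10 State $10;
   City='Dallas';
   State='Texas';
run;

proc export data=work.libraries outfile="dallas_libraries_citystate.csv"
            dbms=csv replace;
run;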

 

Final geocoded map of Dallas Library branches

In this post, I used real-world data to illustrate two things: the importance of knowing your data set, and how to geocode address information in SAS Visual Analytics.  Public data sets are a great resource but need to be used with a critical eye.  They may still need additional cleansing in order to work for your situation.

The geocoding feature is one example of the premium Esri features currently available in VA.  In future posts, I will go into more detail on other Esri features available, what make these features ‘premium’ and examples of how to use them.  Stay tuned!

Esri integration with SAS Visual Analytics: Geocoding was published on SAS Users.

August 8, 2019
 

Some key components of CASL are the action statements. These statements perform tasks that range from configuring the connection to the server, to summarizing large amounts of data, to processing image files. Each action has its own purpose. However, there is some overlapping functionality between actions. For example, more than one action can summarize numeric variables.
This blog looks at three actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS. Each of these actions generates summary statistics. Though there might be more actions that generate the same statistics, these three are a good place to start as you learn CASL.

Create a CAS table for these examples

The following step generates a table called mydata, stored in the casuser caslib, that will be used for the examples in this blog.

cas;
libname myuser cas caslib='casuser';

data myuser.mydata;
   length color $8;
   array X{100};
   do k=1 to 9000;
      do i=1 to 50;
         X{i} = rand('Normal', 0, 4000);
      end;
      do i=51 to 100;
         X{i} = rand('Normal', 100000, 1000000);
      end;
      if x1 < 0 then color='red';
      else if x1 < 3000 then color='blue';
      else color='green';
      output;
   end;
run;

SIMPLE.SUMMARY

The purpose of the Simple Analytics action set is to perform basic analytical functions. One of the actions in the action set is the SUMMARY action, used for generating descriptive statistics like the minimum, maximum, mean, and sum.
This example demonstrates obtaining the sum, mean, and n statistics for five variables (x1–x5) and grouping the results by color. The numeric input variables are specified in the INPUTS parameter. The desired statistics are specified in the SUBSET parameter.

proc cas;
   simple.summary /
      inputs={"x1","x2","x3","x4","x5"},
      subset={"sum","mean","n"},
      table={caslib="casuser", name="mydata", groupBy={"color"}},
      casout={caslib="casuser", name="mydata_summary", replace=true};
run;

   table.fetch /
      table={caslib="casuser", name="mydata_summary"};
run;
quit;

The SUMMARY action creates a table that is named mydata_summary. The TABLE.FETCH action is included to show the contents of the table.

The mydata_summary table can be used as input for other actions, its variable names can be changed, or it can be transposed. Now that you have the summary statistics, you can use them however you need to.
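For example, here is a sketch of transposing mydata_summary so that each color becomes one row, with one column of sums per analysis variable. The output column names _Column_ and _Sum_ are assumptions based on the summary action's typical output; verify them against the table you actually get.

proc cas;
   /* Pivot the per-variable sums into one column per variable, one row per color */
   transpose.transpose /
      table={caslib="casuser", name="mydata_summary", groupby={"color"}}
      transpose={"_Sum_"}
      id={"_Column_"}
      casout={caslib="casuser", name="mydata_summary_t", replace=true};
run;
quit;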

AGGREGATION.AGGREGATE

Many SAS® procedures have been CAS-enabled, which means you can use a CAS table as input. However, specifying a CAS table does not mean all of the processing takes place on the CAS server. Not every statement, option, or statistic is supported on the CAS server for every procedure. You need to be aware of what is not supported so that you do not run into issues if you choose to use a CAS-enabled procedure. In the documentation, refer to the CAS processing section to find the relevant details.
When a procedure is CAS-enabled, it means that, behind the scenes, it is submitting an action. The MEANS and SUMMARY procedure steps submit the AGGREGATION.AGGREGATE action.
With PROC MEANS it is common to use a BY or CLASS statement and ask for multiple statistics for each analysis variable, even different statistics for different variables. Here is an example:

proc means sum data=myuser.mydata noprint;
  by color;
   var x1 x2 x3;
   output out=test(drop=_type_ _freq_) sum(x1 x3)=x1_sum x3_sum
   max(x2)=x2_max std(x3)=x3_std;
run;

The AGGREGATE action produces the same statistics and the same structured output table as PROC MEANS.

proc cas;
   aggregation.aggregate /
      table={name="mydata", caslib="casuser", groupby={"color"}}
      casout={name="mydata_aggregate", caslib="casuser", replace=true}
      varspecs={{name='x1', summarysubset='sum', columnnames={'x1_sum'}},
                {name='x2', agg='max', columnnames={'x2_max'}},
                {name='x3', summarysubset={'sum','std'},
                 columnnames={'x3_sum','x3_std'}}}
      savegroupbyraw=true, savegroupbyformat=false, raw=true;
run;
quit;

The VARSPECS parameter might be confusing. It is where you specify the variables that you want to generate statistics for, which statistics to generate, and what the resulting column should be called. Check the documentation: depending on the desired statistic, you need to use either SUMMARYSUBSET or AGG arguments.

If you are using the GROUPBY parameter, you most likely want to use the SAVEGROUPBYRAW=TRUE parameter. Otherwise, you must list every GROUPBY variable in the VARSPECS parameter. Also, the SAVEGROUPBYFORMAT=FALSE parameter prevents the output from containing _f versions (formatted versions) of all of the GROUPBY variables.

DATAPREPROCESS.RUSTATS

The RUSTATS action, in the Data Preprocess action set, computes univariate statistics, centralized moments, quantiles, and frequency distribution statistics. This action is extremely useful when you need to calculate percentiles. If you ask for percentiles from a procedure, all of the data will be moved to the compute server and processed there, not on the CAS server.
This example has an extra step. Actions require a list of variables, which can be cumbersome when you want to generate summary statistics for more than a handful of variables. Macro variables are a handy way to insert a list of strings, variable names in this case, without having to enter all of the names yourself. The SQL procedure step generates a macro variable containing the names of all of the numeric variables. The macro variable is referenced in the INPUTS parameter.
The RUSTATS action has TABLE and INPUTS parameters like the previous actions. The REQUESTPACKAGES parameter is the parameter that allows for a request for percentiles.
The example also contains a bonus action, TRANSPOSE.TRANSPOSE. The goal is to have a final table, mydata_rustats2, with a structure like PROC MEANS would generate. The tricky part is the COMPUTEDVARSPROGRAM parameter.
The table generated by the RUSTATS action has a column called _Statistic_ that contains the name of the statistic. However, it contains “Percentile” multiple times. A different variable, _Arg1_, contains the value of the percentiles (1, 10, 20, and so on). The values of _Statistic_ and _Arg1_ need to be combined, and that new combined value generates the new variable names in the final table.
The COMPUTEDVARS parameter specifies that the name of the new variable will hold the concatenation of _Statistic_ and _Arg1_. The COMPUTEDVARSPROGRAM parameter tells CAS how to create the values for NEWID. The NEWID value is then used in the ID parameter to make the new variable names—pretty cool!

proc sql noprint;
   select quote(strip(name)) into :numvars separated by ','
   from dictionary.columns
   where libname='MYUSER' and memname='MYDATA' and type='num';
quit;

proc cas;
   dataPreprocess.rustats /
      table={name="mydata", caslib="casuser"}
      inputs={&numvars}
      requestpackages={{percentiles={1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99},
                        scales={"std"}}}
      casoutstats={name="mydata_rustats", caslib="casuser"};

   transpose.transpose /
      table={caslib="casuser", name="mydata_rustats", groupby={"_variable_"},
             computedvars={{name="newid", format="$20."}},
             computedvarsprogram="newid=strip(_statistic_)||compress(strip(_arg1_),'.-');"}
      transpose={"_value_"}
      id={"newid"}
      casOut={caslib="casuser", name="mydata_rustats2", replace=true};
run;
quit;

Here is a small portion of the final table. Remember, you can use the TABLE.FETCH action to view the table contents as well.

Summary

Summarizing numeric data is an important step in analyzing your data. CASL provides multiple actions that generate summary statistics. This blog provided a quick overview of three of those actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS.
The wonderful part of so many choices is that you can decide which one best fits your needs. Summarizing your data with actions also ensures that all of the processing occurs on the CAS server and that you are taking full advantage of its capabilities.
Be sure to use the DROPTABLE action to delete any tables that you do not want taking up space in memory:

proc cas;
   table.droptable / caslib='casuser' name='mydata' quiet=true;
   table.droptable / caslib='casuser' name='mydata_summary' quiet=true;
   table.droptable / caslib='casuser' name='mydata_aggregate' quiet=true;
   table.droptable / caslib='casuser' name='mydata_rustats' quiet=true;
   table.droptable / caslib='casuser' name='mydata_rustats2' quiet=true;
quit;
cas casauto terminate;

Learn More

Summarization in CASL was published on SAS Users.

August 5, 2019
 

As a company, SAS consistently supports #data4good initiatives designed to help those less fortunate around the world. SAS Press team members recently took some time to reflect on the SAS initiatives that inspired them. We thought this would be a good opportunity to introduce some of the team who work so hard on our SAS Press books.

Sian Roberts, Publisher

I lead the SAS Press team and oversee the publication of our books from start to finish, including manuscript acquisition, book development, production, sales, and promotion.

Having lost both my dad and grandmother to cancer, the work SAS is doing to help improve care for cancer patients by tailoring treatments for individuals particularly resonates with me. For example, the wonderful work that is being done with Amsterdam University Medical Center to use computer vision and predictive analytics to improve care for cancer patients is of particular interest to me. My hope is that by using analytics and AI on data gathered from hospitals, research institutes, pharma and biotech companies, patterns can be identified earlier, and survival rates will increase.

 
 

Suzanne Morgen, Developmental Editor

I work with authors to help them develop and write their books, then go to conferences to sell those books and recruit more authors!

At SAS Global Forum, we heard about a pilot program at the New Hanover County Department of Social Services that uses SAS to alert caseworkers to risks for children in their care. I have been a foster parent for several years, so I am excited about any new resources that would help social workers intervene earlier in kids’ lives and hopefully keep them safer and even reduce the need for foster care. I hope SAS is able to partner with many more social services departments and use analytics to help protect more kids in the state and across the country.

Emily Scheviak-Livesay, Senior Business Operations Specialist

As a SAS employee for 22 years, I have learned to wear many hats. At SAS Press, I keep the business running smoothly and manage the metadata of all our books in all formats. I also work with our partners to ensure our titles are available both in the US and globally.

I love this story about JMP working with the Animal Humane Society! I’m a huge fan of “adopt, don’t shop” and it makes me so proud to work at a company where one of our products was used to assist in furthering the cause. For JMP to be able to take a huge amount of data from various sources and turn it into valuable information for The Animal Humane Society is amazing! Helping to care and save animals is what it’s all about. It truly is a fairy “tail” ending.

 

Missy Hannah, Senior Associate Developmental Editor

I work directly with SAS Press and JMP authors to plan and implement marketing strategies for our books. I grew up with a mother who not only was a Systems Engineer but who taught me all about technology. Looking back, I was always watching her code and work with technology and IT my entire life and seeing her do this meant those things came very easily for me. But often, other young women don’t find mentors in the field of data analytics and technology. Data shows that women account for less than 20% of computer science degrees in the U.S. and hold less than 25% of STEM-related jobs. That is why the Women’s In Tech Network at SAS has been something I have really enjoyed having at my company. SAS creating the Women’s Initiative Network (WIN) and all the other work they are doing to increase women in STEM and data fields is something that really matters.

Catherine Connolly, Developmental Editor

I work with authors to develop books that support SAS’ business initiatives. My main areas of focus are JMP, data management, and IoT.

There are so many SAS initiatives through #data4good that make me proud to be a SAS employee. One initiative I read earlier this year that stuck with me was a partnership between SAS and CAP Science to combat against repeated domestic violence. CAP Science developed wearables to be worn by both the domestic violence victim and the offender. The wearable uses SAS software to continuously collect data and report on the offender’s location in real-time in an effort to stop future attacks.

 
 

We hope you enjoyed this small insight into some of our team. We are all very proud to work for a company that takes the time to improve the lives of those who need it and uses the power of data and analytics to help the world.

What SAS #data4good initiative has been your favorite? Make sure to comment below!

What really matters: SAS #data4good and the SAS Press team was published on SAS Users.

August 2, 2019
 

SAS Global Forum is a time for SAS users to come together to share their knowledge. Are you ready and willing to share your SAS knowledge?

Take a look at the proceedings from 2019 for ideas and inspiration. Search for any of the top 10 sessions below to get a feel for what was popular this year:

    1. Cool PROC SQL Tricks
    2. End to End Modeling and Machine Learning in SAS® Viya®
    3. Getting the most out of SAS® Macro Language and SQL
    4. How to be an Effective Statistician
    5. Data Governance: Harder, Better, Faster, Stronger
    6. Tell Me a Data Story: Data Visualization
    7. Let Leonardo da Vinci Inspire Your Next Presentation: Data Visualization
    8. Comparing SAS® Viya® and SAS® 9.4: How Their Features Complement Each Other
    9. From Words to Actions: Using Text Analytics to Drive Business Decisions
    10. A Beginner's Guide to ARRAYs and DO Loops

Remember, accepted abstracts for primary authors get a 50% discount on registration. Call for content closes September 30, 2019.

Still not sure?

What's keeping you from submitting? Maybe you can't seem to come up with a topic, or the thought of writing that abstract and working outline is overwhelming. Don't worry, we've got you covered. We have a great Presenter Mentoring Program that will pair you up with a seasoned expert to guide you through the process. We also have resources to help get you started. We’re here to help you throughout the process.

Award programs

We continue to support new SAS users and international attendees with our New SAS Professional and International Professional Award Programs. Check out the benefits you may be eligible for if you are a SAS professional using SAS for five years or less, or you live outside of the contiguous 48 United States. You can potentially get lodging assistance (and travel assistance for international attendees) if you submit an abstract and working outline that is accepted. Visit the conference website to learn about the many award and scholarship opportunities to help get you to Washington, DC.

SAS Global Forum 2020 is going to be epic! It's our first year holding this event at the Walter E. Washington Convention Center in downtown Washington, D.C. You won't want to miss it, so get your abstract in soon!

SAS Global Forum Call for Content is Open was published on SAS Users.

July 26, 2019
 

In a previous post, Zero to SAS in 60 Seconds- SAS Machine Learning on SAS Analytics Cloud, I documented my experience with a SAS free trial on the SAS Analytics Cloud. Well, the engineers at SAS have been busy and created another free trial. The new trial covers SAS Event Stream Processing (ESP).

This time last year (when I was just starting at SAS), I only knew ESP as extrasensory perception. I'm more enlightened now. Working through this exercise showed me how event stream processing is a powerful and effective tool that applies machine learning and streaming analytics to uncover insights for real-time decision making. In a nutshell, you create a model, stream your data, process the results, and make timely decisions based on the results.

The trial uses SAS ESPPy, allowing you to embed an ESP project inside a Python pipeline. To see ESPPy in action take a look at this video. To learn more about ESP and IoT see this article on the SAS Communities Library. In this article I chronicle my journey through the trial while introducing key concepts and operations of ESP.

Register and get started

The process to register and initial login are identical to the machine learning article. You must have a SAS Profile to participate in the trial. The only difference is you need to follow this link to sign up for the ESP trial. Please refer to the machine learning article for detailed steps of signing up and logging in.

The use case

SAS Solar Farm in Cary

The SAS Solar Farm sits on almost 12 acres of SAS Headquarters property. There are 10,276 solar panels producing more than 3.6 million kilowatt-hours annually. That's enough power for more than 325 average-sized U.S. homes.

As part of environment management, it is important to continuously monitor the operation of the solar panels to optimize configuration parameters, detect potential equipment failure, and accurately forecast the amount of energy generated. Factors considered include panel angles, time of day, seasons, and weather patterns, as the energy generated depends directly on the amount of sun available to the panels.

The ESP project in this demo is pre-loaded in the trial and is run through a Jupyter notebook. The project monitors the energy (kWh) and power (kW) generated during a specific time interval, eliminating localized outlier effects and triggering alerts when the energy generated differs between subsequent time intervals by more than a pre-defined amount.

Solar Farm Data represented as digital art

Take two minutes and watch this video on how SAS uses SAS software to create a work of art with solar farm data.

Disclaimer: no sheep were harmed during data collection or writing of this article.

Navigating the trial

Once logged into the trial, you see the Applications screen.

ESP trial Applications screen

The Data and Team options in the left pane behave exactly as those in the machine learning trial. These sections allow you to access data and manage a multi-user system. Select the SAS Event Stream Processing icon to start a JupyterLab session.

JupyterLab home screen

I will not go into the details of JupyterLab here. The left pane contains menus, file management, and other options. The pane on the right displays three options:

  • Python 3 Notebook - a blank Jupyter notebook: a document that combines live, runnable code with narrative text (Markdown), equations (LaTeX), images, interactive visualizations, and other rich output
  • Python 3 Console - a blank Python console: code consoles enable you to run code interactively in a kernel
  • Text File - a basic text editor that enables you to edit text files in JupyterLab

For this article we're going to follow along and interact with the pre-loaded demo Solar Farm ESP project. To locate the Jupyter notebook double click the demo directory from the left pane.

Select the demo directory from the left pane

Next select Event_Stream_Processing. Before proceeding with the demo, I'd highly suggest opening the README.ipynb file.

Contents of the README notebook

Here you will find overview and environment organization information for the trial. The trial uses SAS ESPPy for designing, testing, and deploying projects on ESP Servers.

Step through the demo

Before starting the trial, I needed a little background on event stream processing. I located the SAS ESP product documentation. I recommend referring to it for details on the ESP model, objects, and workflow.

To access the demo, double click the demo directory in the left pane. The trial comes with five pre-loaded demos. Feel free to try any or all of them. Double click on ESP Basic Project - Solar Farm.ipynb to display the Solar Farm notebook. The notebook walks you through the ESP model creation and execution. To run a command, place the cursor in a command cell and select the 'Run' button (the triangle-shaped button at the top of the notebook). If no output is returned when you run a cell, assume the commands ran successfully.

Below is a brief description of the steps in the project:

  1. Create the project and query used - this creates dedicated space and objects where the ESP process takes place
  2. Create input and aggregate windows - this action extracts desired data and creates data subsets from the stream
  3. Add a join window - this brings together lag and current values into the project
  4. Add a compute window - this calculates the difference between the previous and current event
  5. Add a filter window - this action filters occurrences outside a threshold value; this creates an alert for potential mechanical issues
  6. Define workflow connections - this defines the workflow between the various windows in the project
  7. Save the project - this generates an XML file for the project
  8. Load the project to the ESP Server - this loads the project and produces a graphical representation of the workflow

    Solar Farm project workflow

  9. Start streaming data - in this example, rather than streaming data in real time, the stream derives from the solar farm table data
  10. View solar farm data - this creates a graphical representation of streaming data

    Solar Farm graph for kW and kWh

While not included in the demo, the streaming data would pass through the filter and if a threshold breach occurs, an alert is created. Considering the graph above, alerts could very well have occurred just before 1:15 pm (IntkW drops from 185 to 150) and just before 2:30 pm (IntkW drops from 125 to 35).

Your turn

Now that you have a taste of ESP, feel free to step through the rest of the demos. You may also load your own data and create your own ESP models. Feel free to share your experience and what you create by leaving a comment.

SAS Event Stream Processing on SAS Analytics Cloud - my journey was published on SAS Users.

July 25, 2019
 

Years ago I saw a line of SAS code that was really puzzling. It was a statement that started with:

if 0 then … ;

What? This was a statement that would always evaluate to false. Why would anyone write such a statement? Recently, I was discussing with a good friend of mine a macro I wrote that uses an IF statement that is always false.

DATA Step compiler shortcut
In some programming languages, the "IF 0" trick ensures that the code that follows is never executed. But with the DATA step, SAS still evaluates the "zeroed-out" code during the compile phase.

Let's see why this can actually be useful.

The problem I was trying to solve was to assign the number of observations in a data set to a macro variable. First, the code, then the explanation.

data _null_;
   if 0 then set Oscar nobs=Number_of_Obs;
   call symputx('Number', Number_of_Obs);
   stop;
run;

The SET option NOBS= places the number of observations in data set Oscar into a variable that I called Number_of_Obs. This happens at compile time, before any observations are read from data set Oscar. The CALL SYMPUTX routine takes two arguments: the first is the name of a macro variable (Number), and the second is the value to assign, in this case the value of the SAS variable Number_of_Obs. The call routine assigns the value of the SAS variable to the macro variable.

Most DATA steps stop executing when they reach the end-of-file on a data set that is being read. Because you are not reading any values from data set Oscar, you need a STOP statement to prevent the DATA step from looping. When this DATA step executes, the number of observations in data set Oscar is assigned to the macro variable called Number.
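Once the step has run, the macro variable is ready to use anywhere downstream. For example:

%put NOTE: Data set Oscar contains &Number observations.;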

Now you, too, can write programs using this useful trick. The macro where I used it can be found in my book, Cody's Data Cleaning Techniques, Third Edition. You can see a copy of this macro (and all the other macros described in the book) by going to my author web site, support.sas.com/cody, scrolling down to the data cleaning book, and clicking the link to see the examples and data. One of the macros that uses this trick is called HighLow.sas. As an alternative, you can buy the book!

By the way, you can download the programs and data from any of my books by using this same method, and it is free!


A DATA step compiler trick to get the record count was published on SAS Users.