July 19, 2019
 

When a new Moon passes between the Earth and the Sun, the Moon can cast a shadow on certain regions of the Earth. This natural phenomenon creates a solar eclipse, meaning the Moon covers, or eclipses, your view of the Sun if you're in that region. No surprise that in [...]

Ring of fire: Visualizing 5,000 years of solar eclipses was published on SAS Voices by Falko Schulz

July 17, 2019
 

Problem solving: thinking inside the box

Have you ever tried to pass comma-delimited values to a SAS macro or to a SAS macro function? How can SAS distinguish commas separating parameters or arguments from commas separating parts of the values?

Passing comma-delimited value as an argument to a SAS macro function

Let’s say you want to extract the first word from the following string of characters (these words represent column names in the SASHELP.CARS data table):

make, model, type, origin

If you run the following code:

%let firstvar = %scan(make, model, type, origin, 1);

you get the following ERROR in your SAS log:

ERROR: Macro function %SCAN has too many arguments.

That is because the %SCAN macro function treats make, model, type and origin as separate arguments, since the commas between them are interpreted as argument separators.

Even if you “hide” your comma-delimited value within a macro variable, it still won’t do any good, since the macro variable gets resolved during macro compilation before being passed on to a macro or macro function for execution.

%let mylist = make, model, type, origin;
%let firstvar = %scan(&mylist, 1);

You will still get the same ERROR:

ERROR: Macro function %SCAN has too many arguments.

Passing comma-delimited value as a parameter to a SAS macro

Try submitting the following code that passes your macro variable value to a SAS macro as a parameter:

%let mylist = make, model, type, origin;
%macro subset(dsname=, varlist=);
   proc sql;
      select &varlist
      from &dsname;
   quit;
%mend subset;
%subset(dsname=SASHELP.CARS, varlist=&mylist)

You will get another version of the SAS log ERROR:

ERROR: All positional parameters must precede keyword parameters.
NOTE: Line generated by the macro variable "MYLIST".
1                 type, origin
                  ----
                  180
ERROR 180-322: Statement is not valid or it is used out of proper order.

In this case, the %subset macro gets as confused as the %scan function above because your macro variable will get resolved during macro compilation, and the SAS macro processor will see the macro invocation as:

%subset(dsname=SASHELP.CARS, varlist=make, model, type, origin)

treating each comma as a parameter separator.

All this confusion happens because SAS macro functions’ arguments and SAS macros’ parameters use commas as their separators, while resolved macro variables bring their own values’ comma delimiters into the function or macro call, thus wreaking havoc on your SAS program.

It’s time for a vacation

But don’t panic! To fight that chaos, you need to take a vacation. Not a stay-home, do-nothing vacation, but some serious vacation, with a faraway destination and travel arrangements. While a real vacation is preferable, an imaginary one would do too. I mean, to start fighting the mess with comma-separated values, pick your destination, book your hotel and flight, and start packing your stuff.

Do you have a “vacation items list”? In my family, we have an individual vacation list for every family member. How many items do you usually take with you? Ten, twenty, a hundred?

Regardless, you don’t show up at the airport checkpoint with a pile of your vacation items. That would’ve been too messy. I don’t think you would be even allowed boarding with an unpacked heap of your stuff. You come to an airport neatly rolling a single item that is called a suitcase. Well, I suppose that some of you may have two of them, but I can’t imagine more than that.

You have only started your fantasy vacation, you haven’t even checked in to your flight, but you already have a solution in your sight, a perfect combine-and-conquer solution for passing comma-delimited values, even if you have not yet realized that it’s in plain view.

Thinking inside the box

Forget about the “thinking outside the box” metaphor. You can’t solve all your problems with a single strategy. Sometimes, you need to turn your thinking on its head to solve, or even to see, the problem.

As for your airport check-in, instead of thinking outside the box, you thought “inside the box” and brought your many items “boxed” as a single item – a suitcase. A container, in a broader sense.

That is exactly how we are going to approach our comma-delimited lists problem. We are going to check them in to a macro or a macro function as a single, boxed item. Just like this:
[Image: Passing a comma-separated value to a SAS macro or SAS macro function]
Or like this:
[Image: Passing a SAS macro variable with a comma-separated value to a SAS macro or SAS macro function]

Not surprisingly, the SAS macro language provides a variety of these wonder boxes for many special occasions, collectively known as macro quoting functions. Personally, I would prefer calling them “macro masking functions,” as they have nothing to do with “quoting” per se and have everything to do with masking various characters during macro compilation or macro processing. But that is what “macro quoting” means: masking, or boxing, similar to “quoting” a character string to make it a single entity.

Different macro quoting functions mask different special characters (+ - , / ; = etc.) and mnemonics (AND OR GT EQ etc.) so that the macro facility interprets them as text instead of as language symbols.
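As a small illustration of masking characters other than commas, here is a minimal sketch that uses the two compile-time functions. The macro variable names printit and company are made up for this example, and SASHELP.CLASS is assumed to be available, as it is in a standard SAS installation.

```sas
/* %STR masks the semicolons, so they do not end the %LET statement early */
%let printit = %str(proc print data=sashelp.class; run;);

/* %NRSTR also masks & and %, so &T is not treated as a macro variable reference */
%let company = %nrstr(AT&T);

/* Write the masked value to the log */
%put &=company;

/* Generate and run the stored PROC PRINT step */
&printit
```

Without %STR(), the first semicolon would terminate the %LET statement and the rest of the text would be treated as ordinary SAS code.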

Here are all 7 SAS macro quoting functions, two of which work at macro compilation - %STR() and %NRSTR(), while the other 5 work at macro execution - %QUOTE() and %NRQUOTE(), %BQUOTE() and %NRBQUOTE(), and %SUPERQ().

You may look up what symbols they mask and the timing they apply (macro compilation vs. macro execution) in this macro quoting functions summary. You may also want to look at the following cheat sheet: Deciding When to Use a Macro Quoting Function and Which Function to Use.

As a general rule of thumb, use macro quoting functions at compilation time when you mask text constants - (make, model, type, origin); use macro quoting functions at execution time when you mask macro or macro variable references containing & or % - (&mylist).

NOTE: There are many other SAS macro functions that besides their main role also perform macro quoting, e.g. %QSCAN(), %QSUBSTR() and others; they all start with %Q.
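To illustrate why the %Q variants exist, here is a hedged sketch with %QSCAN(). The macro variables names and first are hypothetical, and the value R&D is chosen only because it contains an ampersand that should stay masked after extraction.

```sas
/* names and first are hypothetical macro variables for illustration */
%let names = %nrstr(R&D, Sales, Support);

/* %QSCAN extracts the first comma-delimited item and keeps the result masked,
   so the & in R&D is not treated as a macro variable reference */
%let first = %qscan(&names, 1, %str(,));
%put &=first;
```

With plain %SCAN() the extracted text would come back unmasked, and the macro processor would try to resolve &D.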

Masking commas within a comma-delimited value passed as an argument or a parameter

It turns out that to mask (or to “box”) comma-separated values in a macro function or a SAS macro, any macro quoting function will work. In this case I would suggest using the simplest (and shortest) one, %STR(). %STR() applies during macro compilation and serves as a perfect “box” for our comma-delimited values: it hides (masks) the commas so that the receiving macro function or macro does not confuse them with its own commas separating arguments or parameters.

With it we can re-write our above examples as:

%let firstvar = %scan(%str(make, model, type, origin), 1);
%put &=firstvar;

SAS log will produce exactly what we expected:

FIRSTVAR=make

Similarly, we can call the above SAS macro as:

%subset(dsname=SASHELP.CARS, varlist=%str(make, model, type, origin) )

It will run without ERRORs and produce a print of the SASHELP.CARS data table with 4 columns specified by the varlist parameter value:

[Image: SAS output table as a result of the macro run]

Masking commas within a macro variable value passed as an argument or parameter

When you assign a comma-delimited list as a value to a macro variable, you want to mask the commas within the resolved value during execution. Any of the execution-time macro quoting functions will mask commas.

Again, in case of multiple possibilities I would use the shortest one - %QUOTE().

With it we can re-write our above examples as:

%let mylist = make, model, type, origin;
 
%let firstvar = %scan(%quote(&mylist), 1);
 
%subset(dsname=SASHELP.CARS, varlist=%quote(&mylist))

But just keep in mind that the remaining 4 execution time macro quoting functions - %NRQUOTE(), %BQUOTE(), %NRBQUOTE() and %SUPERQ() - will work too.

NOTE: The syntax of the %SUPERQ() function is quite different from the rest of the pack. The %SUPERQ() macro function takes as its argument either a macro variable name without an ampersand or a macro text expression that yields a macro variable name without an ampersand.
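The difference in syntax can be sketched as follows, reusing the mylist macro variable from the examples above. This is only an illustration of the two calling conventions, not a recommendation of one function over the other.

```sas
%let mylist = make, model, type, origin;

/* %BQUOTE takes the macro variable reference, just like %QUOTE */
%let firstvar = %scan(%bquote(&mylist), 1);
%put &=firstvar;

/* %SUPERQ takes the macro variable NAME, without the ampersand */
%let firstvar = %scan(%superq(mylist), 1);
%put &=firstvar;
```

Both %PUT statements are expected to write FIRSTVAR=make to the log, the same result as in the %STR() example.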

Get it going

I realize that macro quoting is not a trivial matter. That is why I attempted to explain its concept on a very simple yet powerful use case. Hope you will expand on this to empower your SAS coding skills.

Passing comma-delimited values into SAS macros and macro functions was published on SAS Users.

July 17, 2019
 

In the SAS/IML language, a matrix contains data of one type: numeric or character. If you want to create a SAS data set that contains mixed-type data (numeric and character), SAS/IML 15.1 provides support to write multiple matrices to a data set by using a single statement. Specifically, the CREATE FROM and APPEND FROM statements now support writing multiple matrices of any types. SAS/IML 15.1 was released as part of SAS 9.4m6.

Write mixed-type data from SAS/IML objects

With the new enhancements to the CREATE FROM and APPEND FROM statements, you now have four ways to write mixed type data to a SAS data set:

Write multiple matrices to a data set

In SAS/IML 15.1, you can specify multiple matrices on the CREATE FROM statement. The matrices can be any type. In the following example, X is a numeric matrix and C is a character matrix:

/* define a numeric matrix and a character matrix */
proc iml;
NumerVarNames = {'N' 'N2' 'N3'};
X = { 1  2  3,
      2  4  6,
      3  6  9,
      4  8 12};
charVarNames = {'Animal' 'Flower'};
C = {'Rat'   'Iris', 
     'Pig'   'Rose',
     'Goat'  'Daisy', 
     'Duck'  'Lily'};
 
/* SAS/IML 15.1: write multiple matrices of any type to a SAS data set */
AllNames = NumerVarNames || CharVarNames;
create MyData from X C [colname=AllNames];  /* specify multiple matrices */
append from X C;                            /* repeat matrix names */
close;
QUIT;
 
proc print data=MyData noobs;
run;

The new enhancements to the CREATE FROM and APPEND FROM statements enable you to write mixed-type data to a SAS data set, but they are not limited to mixed types: you can write multiple matrices regardless of their types. For example, you can use the same technique to write multiple numeric matrices.

Notice that if you want to specify the names of the data set variables, you use a single COLNAME= option at the end of the CREATE FROM statement.
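The numeric-only case can be sketched in the same way. The matrices A and B, the variable names in VarNames, and the data set name TwoNumeric are all invented for this illustration; the sketch assumes SAS/IML 15.1 or later, as described above.

```sas
proc iml;
/* A and B are hypothetical numeric matrices with the same number of rows */
A = {1 2,
     3 4,
     5 6};
B = {10,
     20,
     30};
VarNames = {'A1' 'A2' 'B1'};          /* one name per column, in order */

create TwoNumeric from A B [colname=VarNames];  /* single COLNAME= option */
append from A B;                                /* repeat matrix names */
close TwoNumeric;
quit;

proc print data=TwoNumeric noobs;
run;
```

The names in the COLNAME= vector are listed in the same order as the columns of the matrices on the CREATE FROM statement.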

The post Write numeric and character matrices to a data set from SAS/IML appeared first on The DO Loop.

July 16, 2019
 

“They were the best of asteroids, they were the worst of asteroids.” ~ Charles Dickens Armstrong There are good asteroids, and there are bad asteroids. Good asteroids stay in their own neighborhoods and wait for us to come visit them.  Bad asteroids, however, don’t wait for an invitation – they [...]

A tale of two asteroids was published on SAS Voices by Leo Sadovy

July 15, 2019
 

With all the excitement around SAS’ new software architecture, SAS Viya, we often get asked the question:

What is it and how can it help my company conquer our analytics challenges?

Fortunately, learning more about SAS Viya has never been easier.

SAS Viya extends the SAS® Platform and provides reliable, scalable, and secure analytics inventory management and governance. It allows for faster processing, access to machine learning, plus support for other languages like Python, R, Java, and Lua. In addition, it has support for on-site, cloud, or hybrid environments. It opens SAS to more than just data scientists and allows the SAS platform to be used by business analysts, developers, executives, and more. It truly is the next step in data analytics!

To support our Viya revolution, we have published two new free e-books to illustrate the features and capabilities of SAS Viya.


Exploring SAS® Viya®: Programming and Data Management covers how to access data files, libraries, and existing code in SAS® Studio. It also includes information on new procedures in SAS Viya, how to write new code, and how to use some of the pre-installed tasks that come with SAS® Visual Data Mining and Machine Learning.

Exploring SAS® Viya®: Visual Analytics, Statistics, and Investigations covers data visualization, which enables decision-makers to see analytics presented visually so that they can grasp difficult concepts or identify new patterns. SAS offers several solutions for visualizing your data, many of which are powered by SAS Viya. This book includes four visualization solutions powered by SAS Viya: SAS Visual Analytics, SAS Visual Statistics, SAS Visual Text Analytics, and SAS Visual Investigator.

Test your new SAS skills

Ready to test out Viya for yourself? Get a free trial and test the power of the Viya engine.

For more updates on new SAS Press books and exclusive discounts subscribe to our SAS Press New Book Newsletter.

Curious about SAS® Viya®? Discover two new free SAS Press e-books! was published on SAS Users.

July 15, 2019
 

Heat maps have many uses. You can use a heat map to visualize correlation matrices, to visualize longitudinal data ("lasagna plots"), and to visualize counts in any two-dimensional table. As of SAS 9.4m3, you can create heat maps in SAS by using the HEATMAP and HEATMAPPARM statements in PROC SGPLOT. Prior to SAS 9.4m3, you could create heat maps by using the Graph Template Language (GTL) in Base SAS or the HeatmapCont and HeatmapDisc functions in SAS/IML software.

I like to emphasize the difference between a continuous heat map and a discrete heat map. In a continuous heat map, each cell is assigned a color from a continuous color ramp and the graph includes a gradient legend that associates colors with numerical values of the continuous response variable. However, sometimes the response has a small number of discrete values such as 'Low', 'Medium', and 'High'. In that case, you can create a discrete heat map, similar to the one shown to the right. A discrete heat map uses a discrete palette of colors (and a discrete legend) to visualize the response variable.

First, this article shows how to use the HEATMAPPARM statement in PROC SGPLOT to create a continuous heat map, which is the default behavior. Next, it shows how to use a SAS format to bin the response variable into ordinal categories. Third, it creates a discrete heat map, shown at right, to visualize the binned responses. Binning the response values and using a discrete heat map is especially useful when the response variable spans several orders of magnitude.

Create a continuous heat map

In this article, I use only the HEATMAPPARM statement. The difference between the HEATMAP and the HEATMAPPARM statement is that the HEATMAP statement supports binning the (x, y) values onto a uniform grid. The color in each cell is based on some statistic (frequency, sum, mean,...) that is computed over all the observations in a bin. In contrast, you use the HEATMAPPARM statement when the data are already aggregated onto a uniform grid. For each (x, y) coordinate, you have a single response value that you want to visualize by using color. This is often the case when you use heat maps to visualize tables.
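For contrast, here is a minimal sketch of the HEATMAP statement, which does the binning itself. The choice of the SASHELP.HEART data set and of 30 bins in each direction is an assumption made for illustration only; when no response variable is specified, the cell color is based on the frequency of observations in each bin.

```sas
/* HEATMAP bins the raw (x, y) values onto a uniform grid;
   the data set and bin counts are illustrative choices */
title "Binned Heat Map of Raw Observations";
proc sgplot data=sashelp.heart;
   heatmap x=Weight y=Cholesterol / nxbins=30 nybins=30;
   gradlegend;
run;
```

Because the Sales data in this article already has one response value per (Week, Product) cell, HEATMAPPARM is the better fit there.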

Suppose a store tracks sales of three products ('A', 'B', and 'C') over a 10-week period. You can use a continuous heat map to visualize the quantities sold for each product. Because there are only 10 cells in the horizontal direction, you can optionally use the DISCRETEX option to show all values, as follows:

data Sales;
input Product $ @@;
do Week = 1 to 10;
   input QtySold @@;
   output;
end;
label QtySold="Quantity Sold";
datalines;
A  5  3  2  7 10  8  5  6  9 11 
B  4  1  0  2  0  2  2  1  2  2
C 27 15 18 29 40 20 19 25 31 34
;
 
ods graphics / width=640 height=400px;
title "Continuous Heat Map";
title2 "Continuous Color Ramp and Legend";
proc sgplot data=Sales;
   heatmapparm x=Week y=Product colorresponse=QtySold / outline discretex;
   text x=Week y=Product text=QtySold / textattrs=(size=12pt) strip;
   gradlegend;
run;

[Image: Continuous heat map in SAS by using PROC SGPLOT]

To make the heat map easier to understand, I overlaid the quantities sold for each product and each week. The color of each cell is determined by using a three-color color ramp. The darkest blue corresponds to 0 items sold, the white color corresponds to 20 units sold, and the darkest red corresponds to 40 units sold. The colors for other values are linearly interpolated. A gradient legend to the right shows the association between shades of colors and units sold. You can use the COLORMODEL= option to use a different color ramp.
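A different color ramp might be specified as in the following sketch. It assumes that the Sales data set created earlier in this article exists; the white-orange-red ramp is an arbitrary choice for illustration.

```sas
/* Same heat map, but with a user-specified three-color ramp.
   Assumes the Sales data set defined earlier has been created. */
title "Continuous Heat Map";
title2 "Custom Color Ramp";
proc sgplot data=Sales;
   heatmapparm x=Week y=Product colorresponse=QtySold /
               outline discretex colormodel=(white orange red);
   text x=Week y=Product text=QtySold / textattrs=(size=12pt) strip;
   gradlegend;
run;
```

The colors in the COLORMODEL= list are given from the lowest response value to the highest.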

As I have discussed in other articles, you might not want to use a linear color ramp when the response variable is skewed or contains outliers. In this example, the store sells many more units of product 'C' than 'A' or 'B'. Consequently, most of the cells in the heat map are blue (low) and only a few are white (medium) or red (high). If you bin the counts into meaningful ordinal categories, the low and medium values will be easier to discern.

Use a format to bin the response variable

If your response variable is discrete and consists of a small number of groups, you can use a discrete heat map. Syntactically, you specify a discrete heat map by using the COLORGROUP= option (instead of COLORRESPONSE=) on the HEATMAPPARM statement. Instead of the GRADLEGEND statement, add a regular (discrete) legend by using the KEYLEGEND statement.

Let's create a discrete heat map for the Sales data by binning the QtySold variable. You can use a SAS format to bin a continuous variable into ordinal categories. The following call to PROC FORMAT bins the data into five categories by using the cut points 3, 7, 12, and 20.

By default, the SGPLOT procedure will use the data colors in the current style to assign colors to groups, such as blue, red, green, brown, and purple. However, when the categories are ordinal, you might want to use a sequential or diverging color scheme to assign colors to groups, similar to what the gradient color ramp provides. You can use the STYLEATTRS statement to assign colors to groups. The following is an initial attempt to create a discrete heat map. However, as you will see, the program contains a logical error:

proc format;
value SoldFmt       /* bin into five groups */
      low -<  3   = "Almost None"
       3 -<   7   = "Few"
       7 -<  12   = "Moderate"
       12 -< 20   = "Many" 
       20 -  high = "Most";
run;
 
title "Discrete Heat Map";
title2 "Discrete Color Palette and Legend";
/* Attempt to use STYLEATTRS to define discrete colors.
   Does not work because default group order is "data order" */
proc sgplot data=Sales;
   format QtySold SoldFmt.;         /* use a format to bin the response variable */
   styleattrs datacolors=(ModerateBlue VeryLightBlue CXF8F8F8 VeryLightRed ModerateRed);
   heatmapparm x=Week y=Product colorgroup=QtySold / outline discretex;
   keylegend;
run;

[Image: Discrete heat map in SAS by using PROC SGPLOT. The colors are assigned in data order.]

The heat map is shown, but it does not reflect the ordinal nature of the counts. I intentionally constructed the example so that the groups appear in the legend "out of order." The "Few" category appears before the "Almost None" category, and the "Most" category appears before the "Many" category. The STYLEATTRS statement correctly assigned colors to the groups, but the groups do not appear in fewest-to-most order.

This problem occurs because the order of the groups (and, therefore, their colors) is determined by the order in which they appear in the data set. There are several solutions to this problem, including sorting the data and adding fake observations to the data set. However, the best solution is to explicitly create a mapping between group values and colors. This is called a "discrete attribute map." A discrete attribute map enables you to associate colors (and other attributes) to groups, regardless of how the groups are sorted or used.
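One way to see the data order for yourself is to tabulate the formatted values in order of appearance. The following sketch assumes that the Sales data set and the SoldFmt format from the previous steps exist; PROC FREQ is used here only as a diagnostic and is not part of the original program.

```sas
/* List the formatted groups in the order in which they first
   appear in the data (ORDER=DATA), which is the order that
   determines the group colors and the legend in the plot above */
proc freq data=Sales order=data;
   format QtySold SoldFmt.;
   tables QtySold / nocum nopercent;
run;
```

For these data the table should list "Few" before "Almost None" and "Most" before "Many", which matches the legend order in the heat map.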

Use a discrete attribute map to associate colors to groups

If you encounter this "legend order" problem, a discrete attribute map is the most robust solution. The "map" is simply a data set that assigns attributes to each formatted value of the response variable. The PROC SGPLOT documentation for discrete attribute maps provides details about the names of variables in the data set.

For the heat map, the important attribute is the FILLCOLOR attribute of each cell. Thus, you need to create a data set that has five rows and two variables. The names of the primary columns must be Value and FillColor. You can hard-code the formatted values or you can use the PUT function to format the raw values, as shown in the following program. (I like the second option; it works even if you change the strings in PROC FORMAT.) You also might want to define the ID variable and the Show variable. The ID variable is optional if the data set defines only one attribute map. If you set Show="AttrMap", the legend will show all of the possible values, even if the data set does not contain all the groups.

The following DATA step defines a discrete attribute map. Use the DATTRMAP= option on the PROC SGPLOT statement to use the mapping, as follows:

data Order;                            /* create discrete attribute map */
length Value $11 FillColor $15;
input raw FillColor;
Value = put(raw, SoldFmt.);            /* use format to assign values */
retain ID 'SortOrder'                  /* name of map */
     Show 'AttrMap';                   /* always show all groups in legend */
datalines;
0  ModerateBlue
3  VeryLightBlue
7  CXF8F8F8
12 VeryLightRed
20 ModerateRed
;
 
proc sgplot data=Sales dattrmap=Order; /* use discrete attribute map */
   format QtySold SoldFmt.;
   heatmapparm x=Week y=Product colorgroup=QtySold / outline attrid=SortOrder;
   keylegend;                          /* will use the order in attribute map */
run;

[Image: Discrete heat map in SAS by using PROC SGPLOT. Colors are assigned by using a discrete attribute map.]

Success! The heat map uses the custom blue-white-red color ramp for the groups. The order of the items in the legend (and their attributes) is determined by the discrete attribute map. No matter what order the groups appear in the data, the legend will show the items in the correct ordinal order, which is least to greatest.
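The effect of Show='AttrMap' can be sketched by plotting only a subset of the data. The WHERE= subset below is an assumption made for illustration; it drops product 'C', so the "Many" and "Most" groups no longer occur in the plotted data. The Sales data set, the SoldFmt format, and the Order attribute map from the previous steps are assumed to exist.

```sas
/* Products A and B never reach 12 units, so two groups are absent
   from the data; the attribute map should still list all five
   groups in the legend because Show='AttrMap' */
title "Discrete Heat Map for Products A and B";
proc sgplot data=Sales(where=(Product in ('A' 'B'))) dattrmap=Order;
   format QtySold SoldFmt.;
   heatmapparm x=Week y=Product colorgroup=QtySold / outline attrid=SortOrder;
   keylegend;
run;
```

Keeping the full legend makes heat maps of different subsets easier to compare, because each color always refers to the same group.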

For more information about legend order and discrete attribute maps, see Warren Kuhfeld's article "Legend order and group attributes."

In summary, this article shows how to use the HEATMAPPARM statement in PROC SGPLOT to create heat maps. Use the HEATMAPPARM statement when the (x, y) values are discrete and pre-summarized. By default, the HEATMAPPARM statement creates a continuous heat map and the GRADLEGEND statement displays a gradient legend. If the response variable is discrete, use the COLORGROUP= option on the HEATMAPPARM statement and use the KEYLEGEND statement to add a discrete legend. Remember that the order of the groups is determined by the order in which the groups appear in the data, but you can define a discrete attribute map to ensure that the groups appear in a specified order.

The post Create a discrete heat map with PROC SGPLOT appeared first on The DO Loop.

July 12, 2019
 

Are you a seasoned data scientist looking for a fast, all-inclusive machine learning solution? Curious about machine learning but have little to no programming experience? Interested in using AI to take over the world? Follow my lead and use SAS VDMML to fast track your world domination.

This blog is the beginning of a series on  SAS Visual Data Mining and Machine Learning (VDMML) told from my perspective as a first-time SAS Viya user, Graduate Intern at SAS, and ABD PhD Candidate in Computer Science. I'm writing this series for two main reasons: 1) to express how surprised I am at seeing how easily complex tasks can be completed after doing it the hard way for years and 2) to provide examples to convince you, too.  

SAS VDMML is only one of many products available in SAS Viya®. Its distinguishing feature is the machine learning pipelines, which are created in a single, integrated in-memory environment via a drag and drop interface.

In this post, I will provide a high-level overview of a few of the features available in SAS VDMML. In the next posts, I will provide detailed examples and code comparisons for individual features, such as pipeline creation and autotuning. 

Tip: At the bottom of the post, I talk about a course on machine learning using SAS Viya that provides access to the software and teaches machine learning basics. 

Simple custom pipeline using SAS Viya

If you've never used SAS VDMML, here are the top 3 reasons why I think you should check it out. 

A sample of the variety of hyperparameter modifications available.

SAS VDMML creates a simplified approach to machine learning solutions beneficial to people with a wide range of expertise.

Have you been programming for as long as you can remember and are well-versed in the machine learning world?

Why spend all your precious mental energy focusing on tedious programming tasks? Instead, you could be focusing your energy on diving deeper in your data and discovering the extent of its modelling capabilities.

After spending only a week familiarizing myself with the interface, I felt confident I could perform my normal tasks with ease and with better hyperparameter tuning and more comprehensive model evaluations.

If you are wondering if these simplifications limit the customization of models, think again. For most uses, the available customization options match the level that programming provides, through features such as drop-down menus and editable text boxes, eliminating unnecessary mental overhead.

 

Open Source Code node as a supervised learning node and example code in code editor

Still itching to program?

You have options: the SAS Code node and the Open Source Code node (available for use with R and Python). Both nodes can be used in any part of the pipeline, including preprocessing, supervised learning, and post-processing.

For example, you may have preprocessing code for extra messy data already written in R. All you need to do is add the Open Source Code node, insert your code, and update the variables to match the provided macros. Or, maybe you want to try the Deep Learning toolkit and CAS action? Drop a SAS Code node into your pipeline, add your code, and you are good to go!

 

Little to no programming experience or not quite a machine learning expert?

The drag and drop interface, wide selection of templates, and the extensive evaluations allow for almost anyone to produce professional-level results in a matter of minutes. While using SAS VDMML might not require expert-level knowledge on machine learning, important projects should have an expert review the approach and results. 

Example of creating a new pipeline using an advanced template

A sample of options for preprocessing and supervised learning.

The days of spending weeks programming scripts for feature extraction, fine-tuning models, and evaluating your model are over!

For example, let's say I'm attempting to impute some variables using R. First, I might store the names of the columns separately based on the type of imputation I want to perform on each. Then, I could create the code for each type of imputation. If I'm only attempting 2 different types of imputation, I will most likely need less than 10 lines of code. Not much, right?

But, I will also need to test and verify that each variable has been imputed correctly.

Instead, the same task in SAS VDMML would just require you to drop an Imputation node into the pipeline and select via a drop-down menu how to impute the variables - no time wasted.

Additionally, you can quickly compare a variety of supervised learning methods as well as test out the same model with different pre-processing methods using the automatic evaluations provided.  

Looking to save even more time?

You can use the autotuning feature to select the best set of hyperparameters for your model by turning it on in your supervised learning node of choice and hitting run. 

Turn on the autotuning feature inside your supervised learning node, then adjust ranges for the hyperparameters.

After the run is complete, view the supervised learning model's results to see the best configuration of parameters as determined by autotuning. 

Example of the results after using autotuning for a decision tree.

All of this can be accomplished with a few clicks, which eliminates the hours spent debugging scripts and connecting the steps in your workflow.

You’ve spent the last hour transforming and creating features in your code editor of choice. Now, after waiting 30 minutes for your model to run again, you get the same results! How? Wait...you’ve forgotten to update the reference to your new data, AGAIN. (This has definitely not happened to me.)

Fortunately, SAS VDMML only allows you to view results if the pipeline is up-to-date, which ensures that all changes are accounted for. Now, instead of checking and checking again that I passed the right data to the right functions, I can immediately know that my small tweak had no effect on the results. *sigh* OR on a brighter note, that the drastic improvement is not a fluke!  

Updating the Feature Extraction node resets all child nodes below - ensuring that the pipeline stays up-to-date.

Interested in checking out SAS Viya?

Machine Learning Using SAS Viya is a course that teaches machine learning basics, gives instruction on using SAS Viya VDMML, and provides access to the SAS Viya for Learners software, all for $79. This course is the prerequisite course for the SAS Certified Specialist in Machine Learning Certification. Going through the course myself, I was able to quickly learn how to use SAS VDMML and received a refresher on many data preprocessing tactics and machine learning concepts.

Want to learn more?

Stay tuned!!

I will be posting blogs with in-depth examples of specific features in SAS VDMML and adding links to the new blog posts here as they are posted. If there are any specific features you would like to know more about, leave a comment below!

Visual machine learning using SAS Viya: a Graduate Intern’s perspective was published on SAS Users.

7月 122019
 

Are you a seasoned data scientist looking for a fast, all-inclusive machine learning solution? Curious about machine learning but have little to no programming experience? Interested in using AI to take over the world? Follow my lead and use SAS VDDML to fast track your world domination.

This blog is the beginning of a series on  SAS Visual Data Mining and Machine Learning (VDMML) told from my perspective as a first-time SAS Viya user, Graduate Intern at SAS, and ABD PhD Candidate in Computer Science. I'm writing this series for two main reasons: 1) to express how surprised I am at seeing how easily complex tasks can be completed after doing it the hard way for years and 2) to provide examples to convince you, too.  

SAS VDMML is only one of many products available in SAS Viya®. Its distinguishing feature being the machine learning pipelines which are created in a single, integrated in-memory environment via a drag and drop interface.

In this post, I will provide a high-level overview of a few of the features available in SAS VDMML. In the next posts, I will provide detailed examples and code comparisons for individual features, such as pipeline creation and autotuning. 

Tip: At the bottom of the post, I talk about a course on machine learning using SAS Viya that provides access to the software and teaches machine learning basics. 


Simple custom pipeline using SAS Viya

If you've never used SAS VDMML, here are the top 3 reasons why I think you should check it out. 


A sample of the variety of hyperparameter modifications available.

SAS VDMML creates a simplified approach to machine learning solutions beneficial to people with a wide range of expertise.

Have you been programming for as long as you can remember and are well-versed in the machine learning world?

Why spend all your precious mental energy on tedious programming tasks? Instead, you could be focusing your energy on diving deeper into your data and discovering the extent of its modelling capabilities.

After spending only a week familiarizing myself with the interface, I felt confident I could perform my normal tasks with ease and with better hyperparameter tuning and more comprehensive model evaluations.

If you are wondering if these simplifications limit the customization of models, think again. For most uses, the customization options available match what programming provides. They are exposed through features such as drop-down menus and editable text boxes, which eliminates unnecessary mental overhead.

 


Open Source Code node as a supervised learning node and example code in code editor

Still itching to program?

You have options: the SAS Code node and the Open Source Code node (available for use with R and Python). Both nodes can be used in any part of the pipeline, including preprocessing, supervised learning, and post-processing.

For example, you may have preprocessing code for extra messy data already written in R. All you need to do is add the Open Source Code node, insert your code, and update the variables to match the provided macros. Or, maybe you want to try the Deep Learning toolkit and CAS action? Drop a SAS Code node into your pipeline, add your code, and you are good to go!

 

Little to no programming experience or not quite a machine learning expert?

The drag and drop interface, wide selection of templates, and extensive evaluations allow almost anyone to produce professional-level results in a matter of minutes. While using SAS VDMML might not require expert-level knowledge of machine learning, important projects should have an expert review the approach and results.

Example of creating a new pipeline using an advanced template


A sample of options for preprocessing and supervised learning.

The days of spending weeks programming scripts for feature extraction, fine-tuning models, and evaluating your model are over!

For example, let's say I'm attempting to impute some variables using R. First, I might store the names of the columns separately based on the type of imputation I want to perform on them. Then, I could create the code for each type of imputation. If I'm only attempting 2 different types of imputation, I will most likely need fewer than 10 lines of code. Not much, right?

But, I will also need to test and verify that each variable has been imputed correctly.

Instead, the same task in SAS VDMML would just require you to drop an Imputation node into the pipeline and select via a drop-down menu how to impute the variables - no time wasted.

Additionally, you can quickly compare a variety of supervised learning methods as well as test out the same model with different pre-processing methods using the automatic evaluations provided.  

Looking to save even more time?

You can use the autotuning feature to select the best set of hyperparameters for your model by turning it on in your supervised learning node of choice and hitting run. 


Turn on the autotuning feature inside your supervised learning node, then adjust ranges for the hyperparameters.

After the run is complete, view the supervised learning model's results to see the best configuration of parameters as determined by autotuning. 


Example of the results after using autotuning for a decision tree.

All of this can be accomplished with a few clicks, which eliminates the hours spent debugging scripts and connecting the steps in your workflow.

You’ve spent the last hour transforming and creating features in your code editor of choice. Now, after waiting 30 minutes for your model to run again, you get the same results! How? Wait...you’ve forgotten to update the reference to your new data, AGAIN. (This has definitely not happened to me.)

Fortunately, SAS VDMML only allows you to view results if the pipeline is up-to-date, which ensures that all changes are accounted for. Now, instead of checking and checking again that I passed the right data to the right functions, I can immediately know that my small tweak had no effect on the results. *sigh* OR on a brighter note, that the drastic improvement is not a fluke!  

Updating the Feature Extraction node resets all child nodes below - ensuring that the pipeline stays up-to-date.

Interested in checking out SAS Viya?

Machine Learning Using SAS Viya is a course that teaches machine learning basics, gives instruction on using SAS Viya VDMML, and provides access to the SAS Viya for Learners software, all for $79. This course is the prerequisite for the SAS Certified Specialist in Machine Learning Certification. Going through the course myself, I was able to quickly learn how to use SAS VDMML and received a refresher on many data preprocessing tactics and machine learning concepts.

Want to learn more?

Stay tuned!!

I will be posting blogs with in-depth examples of specific features in SAS VDMML and adding links to the new blog posts here as they are posted. If there are any specific features you would like to know more about, leave a comment below!

Visual machine learning using SAS Viya: a Graduate Intern’s perspective was published on SAS Users.

July 10, 2019
 

Posted on behalf of SAS Press author Derek Morgan.


I was sitting in a model railroad club meeting when one of our more enthusiastic young members said, "Wouldn't it be cool if we could make a computer simulation, with trains going between stations and all. We could have cars and engines assigned to each train and timetables and…"

So, I thought to myself, “Timetables… I bet SAS can do that easily… sounds like something fun for Mr. Dates and Times."

As it turns out, the only easy part of creating a timetable is calculating the time. SAS handles the concept of elapsed time smoothly. It’s still addition and subtraction, which is the basis of how dates and times work in SAS. If a train starts at 6:00 PM (64,800 seconds of the day) and arrives at its destination 12 hours (43,200 seconds) later, it arrives at 6:00 AM the next day. The math is start time + duration = end time: 64,800 + 43,200 = 108,000 seconds, which is 21,600 seconds past the following midnight, or 6:00 AM the next day. It doesn’t matter which day; that train is always scheduled to arrive at 6:00 AM, 12 hours from when it left.
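As a minimal sketch of that arithmetic (the variable names here are illustrative, not taken from the actual simulation), a DATA step can add the duration to the start time and then split the result into a time of day and a day offset:

```sas
data _null_;
   start      = '18:00't;                /* 6:00 PM = 64,800 seconds after midnight */
   duration   = 12 * 3600;               /* 12 hours = 43,200 seconds */
   end_time   = start + duration;        /* 108,000 seconds */
   arrival    = mod(end_time, 86400);    /* time of day, wrapped past midnight */
   days_later = floor(end_time / 86400); /* whole days after departure */
   put start= time5. duration= end_time= arrival= time5. days_later=;
run;
```

The MOD function is what turns a total that runs past midnight back into a clock time; the total itself stays a plain count of seconds.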

It got a lot more complicated when the group grabbed onto the idea. One of the things they wanted to do was to add or delete trains and adjust the timing so multiple trains don’t run over the same track at the same time. This wouldn’t be that difficult in SAS; just create an interactive application, but… I’m the only one who has SAS. So how do I share my SAS data with the “outside world”? The answer was Microsoft Excel, and this is where it gets thorny.

It’s easy enough to send SAS data to Excel using ODS EXCEL and PROC REPORT, but how could I get Excel to allow the club to manipulate the data I sent?
I used the COMPUTE block in PROC REPORT to display a formula for every visible train column. I duplicated the original columns (with corresponding names to keep it all straight) and hid them in the same spreadsheet. The Excel formula code is in line 8.

Compute Block Code:

I also added three rows to the dataset at the top. The first contains the original start time for each train, the second contains an offset, which is always zero in the beginning, while the third row was blank (and formatted with a black background) to separate it from the actual schedule.


Figure 1: Schedule Adjustment File

The users can change the offset to change the starting time of a train (Column C, Figure 2.) The formula in the visible columns adds the offset to the value in each cell of the corresponding hidden column (as long as it isn’t blank.) You can’t simply add the offset to the value of the visible cell, because that would be a circular reference.

The next problem was moving a train to an earlier starting time, because Excel has no concept of negative time. (The same goes for dates: a date prior to the Excel reference date of January 1, 1900 will be a character value in Excel and cause your entire column to be imported into SAS as character data.) Similarly, you can’t enter -1:00 as an offset to move the starting time of our 5:35 AM train to 4:35 AM. Excel will treat “-1:00” as a character value, and that will cause a calculation error in Excel. In order to move that train to 4:35 AM, you have to add 23 hours to the original starting time (Column D, Figure 2).
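The 23-hour workaround is ordinary clock arithmetic, and it can be checked on the SAS side. The sketch below uses illustrative variable names and assumes the result should wrap at 24 hours, as a clock display does:

```sas
data _null_;
   original = '05:35't;                       /* 20,100 seconds after midnight */
   offset   = '23:00't;                       /* stands in for the impossible "-1:00" */
   adjusted = mod(original + offset, 86400);  /* wrap around midnight */
   put original= time5. adjusted= time5.;
run;
```

Adding 23 hours and wrapping at 86,400 seconds lands on the same clock time as subtracting one hour, which is why the positive offset works in the spreadsheet.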


Figure 2: Adjusting Train Schedules

After the users adjust the schedules, it’s time to return our Excel data to SAS, which creates more challenges. In the screenshot above, T534_LOC is the identifier of a specific train, and the timetable is kept in SAS time values. Unfortunately, PROC IMPORT using DBMS=XLSX brings the train columns into SAS as character data. T534_LOC also imports as the actual Excel value, time as a fraction of a day.


Figure 3: How the Schedule Adjustment File Imports to SAS

While I can fix that by converting the character data to numeric and multiplying by 86,400, I still need the original column name of T534_LOC for the simulation, so I would have to rename each character column and output the converted data to the original column name. There are currently 146 trains spread across 12 files, and that is a lot of work for something that was supposed to be easy! Needless to say, this “little” side project, like most model railroads, is still in progress. However, this exercise in moving time data between Microsoft Excel and SAS gave me even more appreciation for the way SAS handles date and time data.
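For a single column, the conversion described above might look like the following sketch. The input data set name (schedule_in), the temporary column name, and the sample values are hypothetical stand-ins for what PROC IMPORT produces; only T534_LOC comes from the actual schedule.

```sas
/* Hypothetical stand-in for the imported sheet: Excel time fractions stored as text */
data schedule_in;
   length T534_LOC $20;
   input T534_LOC $;
   datalines;
0.2326388889
0.2743055556
;
run;

data schedule_fixed;
   /* Move the character column aside so the original name is free again */
   set schedule_in(rename=(T534_LOC=T534_LOC_chr));
   /* Fraction of a day * 86,400 = seconds; ?? suppresses notes for unreadable cells */
   T534_LOC = round(input(T534_LOC_chr, ?? best32.) * 86400);
   format T534_LOC time5.;
   drop T534_LOC_chr;
run;
```

Rounding to the nearest second guards against floating-point residue in the Excel fractions. Repeating this for every train column is the tedious part; generating the rename and conversion statements from the column names would be one way to scale it, though that is not shown here.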

Figure 4 is a partial sample of the finished timetable file, generated as an RTF file using SAS. The data for trains 534 and 536 are from the spreadsheet in Figure 1.


Figure 4: Partial Sample Timetable

Want to learn more about how to use and manipulate dates, times, and datetimes in SAS? You'll find the answers to these questions and much more in my book The Essential Guide to SAS Dates and Times, Second Edition. Updated for SAS 9.4, with additional functions, formats, and capabilities, the Second Edition has a new chapter dedicated to the ISO 8601 standard and the formats and functions that are new to SAS, including how SAS works with Coordinated Universal Time (UTC). Chapter 1 is available as a free preview here.

For updates on new SAS Press books and great discounts, subscribe to the SAS Press New Book Newsletter.

SAS Press Author Derek Morgan on Timetables and Model Trains was published on SAS Users.