1月 102018
 

SAS Global Forum 2018 AwardsThis April, more than 5,000 SAS users and business leaders will converge on Denver CO for the premier event for SAS professionals: SAS Global Forum 2018. The event provides an excellent forum to expand your SAS knowledge and network with users of all skill levels. (Last year I found myself having lunch one day sandwiched between a consultant who had built a three-decade career around SAS and a graduate student who started using SAS three months earlier. How's that for diversity!)

And because SAS Global Forum attracts users from across the globe; in every industry imaginable; and from countless government and academic institutions, it really is a user event not to be missed. Thanks to the SAS Global Users Group Executive Board there are a couple of award programs in place to help those who might otherwise have a hard time getting to the event... well, get to the event!

New SAS® Professional Award

For relatively new SAS users who want to experience the conference for the first time, there's the New SAS® Professional Award. This award provides full-time SAS professionals with five years or less of SAS experience the opportunity to earn a free conference registration and one free pre-conference tutorial. You are eligible if you have never attended a SAS Global Forum in the past and would not otherwise be able to attend without assistance.

SAS® Global Forum International Professional Award

A similar award, the SAS® Global Forum International Professional Award, provides users outside of the 48 contiguous U.S. states a similar opportunity. To qualify for this award, you must be a full-time SAS professional who has never attended a SAS Global Forum and would not otherwise be able to attend. This award provides free registration, including meals; one free pre-conference tutorial; and an invitation to an awards recognition luncheon on Sunday, April 8.

Both awards are managed by SAS users who will assume leadership roles in future conferences.

MaryAnne DePesquo, the 2019 SAS Global Forum Chair, is in charge of the 2018 International Professional Awards, while Lisa Mendez, SAS Global Forum Chair in 2020, manages the 2018 New SAS Professional Awards. Direct questions about either program to MaryAnne or Lisa.

To be considered for either program, you must submit your application by Jan. 29, 2018. You will be notified if you received an award no later than March 5, 2018.

Hope to see you in Denver!

Apply for the New SAS Professional Award.
Apply for the SAS Global Forum International Professional Award.

Interested? Hear more from a couple of last year's award recipients

Professional Awards provide first-time attendees a chance to attend SAS Global Forum 2018 was published on SAS Users.

1月 102018
 

Last week I wrote about the 10 most popular articles from The DO Loop in 2017. My most popular articles tend to be about elementary statistics or SAS programming tips. Less popular are the articles about advanced statistical and programming techniques. However, these technical articles fill an important niche. Not everyone needs to know how to interpret a diffogram that visually compares the differences between means between groups, but those who do often send me a note of thanks, which motivates me to research and write similar articles.

Statistical programmers might want to review the following technical articles from 2017. This ain't summertime, and the reading ain't easy, but I think these articles are worth the effort. I've broken them into three categories. Enjoy!

Statistical Concepts

Visualization of regression that uses a weight variable in SAS

Statistical Data Analysis and Visualization

Visualize multivariate regression model by slicing the continuous variables. Graph created by using the EFFECTPLOT SLICEFIT statement in SAS.

Advanced SAS Programming Techniques

If you are searching for a way to enhance your SAS knowledge in this New Year, I think these 10 articles from The DO Loop are worth a second look. Was there an article from 2017 that you found particularly useful? Leave a comment.

The post 10 posts from 2017 that deserve a second look appeared first on The DO Loop.

1月 082018
 

Real-time analytics is, at its most basic, being able to present the right offer at the right time. And it’s the ultimate competitive differentiator in today’s age of highly valued customer experiences. Marketers need to understand why this capability is so important to customers, and how to implement this kind [...]

A marketer’s guide for real-time analytics was published on Customer Intelligence Blog.

1月 082018
 
Label multiple regression lines in a graph in SAS by using PROC SGPLOT

A SAS programmer asked how to label multiple regression lines that are overlaid on a single scatter plot. Specifically, he asked to label the curves that are produced by using the REG statement with the GROUP= option in PROC SGPLOT. He wanted the labels to be the slope and intercept of a linear regression line, as shown to the right. (Click to enlarge.)

Initially I thought that you could use the CURVELABEL option on the REG statement to generate labels, as follows:

proc sgplot data=sashelp.iris noautolegend;
   reg x=SepalLength y=PetalLength / group=Species CURVELABEL; /* does NOT work */
run;

However, the SAS log displays the following warning:

WARNING: CURVELABEL not supported for fit plots when a group variable is
         used.  The option will be ignored.

Fortunately, I thought of two other ways to create a graph that has a regression line for each group level, each with its own label. For linear regression, you can use the LINEPARM statement, as shown in the article "Add a diagonal line to a scatter plot." For general (possibly nonlinear) regression curves, you can find the location of the end of the curve and use the TEXT statement in PROC SGPLOT to add a label at that location.

Label the regression line for each group: The LINEPARM statement

Let's use Fisher's Iris data set for our example data. The Iris data contains 50 observations for each of three species of flowers: iris Setosa, iris Versicolor, and iris Virginica. The programmer wants to label the regression line for each species by using the slope and intercept of the line. The first step is to create a SAS data set that contains the intercept and slope for each curve. You can use the OUTEST= option in PROC REG to write the parameter estimates (intercept and slope) to a SAS data set. You can then use the CATX function in the DATA step to construct the labels, as follows:

proc sort data=sashelp.iris out=iris;
   by Species;
run;
 
/* compute parameter estimates */
proc reg data=iris outest=PE noprint;
   by Species;
   model PetalLength = SepalLength;
run;
 
/* construct labels from the parameter estimates */
data Labels;
   length Label $30;
   set PE(rename=(SepalLength=Slope));                   /* independent variable */
   Label = catx(" ", put(Intercept, BestD5.), '+',       /* separate by blank */
                     put(Slope, BestD5.), '* SepalLength');
   keep Label Species Intercept Slope;
run;
 
proc print noobs; run;

The LABELS data set contains a label for the regression line in each group. You can use other labels if you prefer. The following DATA step combines the labels with the original data. The SCATTER statement in PROC SGPLOT displays the data. The LINEPARM statement draws the lines and adds labels to the end of each line.

data Plot;
   set iris Labels;
run;
 
title "Regression Lines Labeled with Slope and Intercept";
proc sgplot data=Plot;
  scatter x=SepalLength y=PetalLength / group=Species;
  lineparm x=0 y=Intercept slope=Slope / group=Label curvelabel 
               curvelabelloc=outside clip;
run;
Label multiple regression lines in a graph in SAS by using PROC SGPLOT

Success! The regression line for each group is labeled by the formula for the line. For more information about displaying the formula for a regression line, see the SAS/STAT example "Adding Equations and Special Characters to Fit Plots."

Label the regression line for each group: The TEXT statement

The preceding method uses the LINEPARM statement, so it only works for lines. However, the user actually wanted to use the REG statement. With a little work, you can label curves that are produced by the REG statement or other curve-fitting statements. The idea is to obtain the data coordinates for the end of the curve, which will become the location of the label.

You might be thinking, if the curve is produced by a regression statement in PROC SGPLOT, how can we get the data coordinates out of the plot and into a data set? The answer is simple: You can use the ODS OUTPUT statement to write a data set that contains the data in any ODS graph. You can apply this trick to any ODS graph, including graphs created by SGPLOT. The following call to PROC SGPLOT uses an ODS OUTPUT statement to create a SAS data set that contains the data in the regression plot:

/* use ODS OUTPUT to find data coordinates of end of lines */
proc sgplot data=iris;
   ods output sgplot=RegPlot;    /* name of ODS table is 'sgplot' */
   reg x=SepalLength y=PetalLength / group=Species;
run;
 
proc contents short varnum; run; /* find names used by graph */

The variable names that are automatically manufactured by SAS procedures can be long and unwieldy, as shown by the call to PROC CONTENTS. I usually rename the long names to simpler names such as X, Y, GROUP, and so on. You should look at the structure of the REGPLOT data set so that the next DATA step makes sense. The DATA step saves only the last coordinates along the curve for each group (species).

data Coords;
set RegPlot(
    rename=(REGRESSION_SEPALLENGTH_PETAL___X = x
            REGRESSION_SEPALLENGTH_PETAL___Y = y
            REGRESSION_SEPALLENGTH_PETAL__GP = Group)
    where=(x ^=.));
by Group;
if last.Group;
keep x y Group;
run;
 
proc print noobs; run;

The COORDS data contains the location (in data coordinates) of the end of each regression line. You can overlay labels at these coordinates to label the curves. From the preceding section, the labels are in the LABELS data set, so you can merge the two data sets, as follows:

/* combine the positions and labels with original data */
data A;
merge Labels Coords(rename=(Group=Species));
by Species;
run;
 
data Plot;
set iris A;
/* optional: pad label with blanks on the left (if length is long enough) */
Label = "   " || Label;  
run;
 
proc sgplot data=Plot;
  reg x=SepalLength y=PetalLength / group=Species;
  text x=x y=y text=Label / position=right;
run;

The graph is shown at the top of this article.

Label multiple regression curves

If you study the previous section, you will see that the code does not rely on the linearity of the regression model. The same code works for polynomial regression and nonparametric regression curves such as are created by the LOESS and PBSPLINE statements in PROC SGPLOT. The following graph shows a PBSPLINE fit to the IRIS data. Because the penalized B-spline curve is nonparametric, there is no equation to display as a label. Instead, I use the Species name as a label and suppress the legend at the bottom of the graph. You can download the SAS program that creates this and all the graphs in this article.

Label multiple regression curves in a graph in SAS by using PROC SGPLOT

The post Label multiple regression lines in SAS appeared first on The DO Loop.

1月 062018
 

Deep learning is not synonymous with artificial intelligence (AI) or even machine learning. Artificial Intelligence is a broad field which aims to "automate cognitive processes." Machine learning is a subfield of AI that aims to automatically develop programs (called models) purely from exposure to training data.

Deep Learning and AI

Deep learning is one of many branches of machine learning, where the models are long chains of geometric functions, applied one after the other to form stacks of layers. It is one among many approaches to machine learning but not on equal footing with the others.

What makes deep learning exceptional

Why is deep learning unequaled among machine learning techniques? Well, deep learning has achieved tremendous success in a wide range of tasks that have historically been extremely difficult for computers, especially in the areas of machine perception. This includes extracting useful information from images, videos, sound, and others.

Given sufficient training data (in particular, training data appropriately labelled by humans), it’s possible to extract from perceptual data almost anything that a human could extract. Large corporations and businesses are deriving value from deep learning by enabling human-level speech recognition, smart assistants, human-level image classification, vastly improved machine translation, and more. Google Now, Amazon Alexa, ad targeting used by Google, Baidu and Bing are all powered by deep learning. Think of superhuman Go playing and near-human-level autonomous driving.

In the summer of 2016, an experimental short movie, Sunspring, was directed using a script written by a long short-term memory (LSTM) algorithm a type of deep learning algorithm.

How to build deep learning models

Given all this success recorded using deep learning, it's important to stress that building deep learning models is more of an art than science. To build a deep learning or any machine learning model for that matter one need to consider the following steps:

  • Define the problem: What data does the organisation have? What are we trying to predict? Do we need to collect more data? How can we manually label the data? Make sure to work with domain expert because you can’t interpret what you don’t know!
  • What metrics can we use to reliably measure the success of our goals.
  • Prepare validation process that will be used to evaluate the model.
  • Data exploration and pre-processing: This is where most time will be spent such as normalization, manipulation, joining of multiple data sources and so on.
  • Develop an initial model that does better than a baseline model. This gives some indication of whether machine learning is ideal for the problem.
  • Refine model architecture by tuning hyperparameters and adding regularization. Make changes based on validation data.
  • Avoid overfitting.
  • Once happy with the model, deploy it into production environment. This may be difficult to achieve for many organisations giving that a deep learning score code is large. This is where SAS can help. SAS has developed a scoring mechanism called "astore" which allows deep learning method to be pushed into production with just a click.

Is the deep learning hype justified?

We're still in the middle of deep learning revolution trying to understand the limitations of this algorithm. Due to its unprecedented successes, there has been a lot of hype in the field of deep learning and AI. It’s important for managers, professionals, researchers and industrial decision makers to be able to distill this hype from reality created by the media.

Despite the progress on machine perception, we are still far from human level AI. Our models can only perform local generalization, adapting to new situations that must be similar to past data, whereas human cognition is capable of extreme generalization, quickly adapting to radically novel situations and planning for long-term future situations. To make this concrete, imagine you’ve developed a deep network controlling a human body, and you wanted it to learn to safely navigate a city without getting hit by cars, the net would have to die many thousands of times in various situations until it could infer that cars are dangerous, and develop appropriate avoidance behaviors. Dropped into a new city, the net would have to relearn most of what it knows. On the other hand, humans are able to learn safe behaviors without having to die even once—again, thanks to our power of abstract modeling of hypothetical situations.

Lastly, remember deep learning is a long chain of geometrical functions. To learn its parameters via gradient descent one key technical requirements is that it must be differentiable and continuous which is a significant constraint.

Looking beyond the AI and deep learning hype was published on SAS Users.

1月 052018
 

For the last two years, I’ve spent time listening to and brainstorming with automakers, their suppliers and other technology companies about how to monetize connected vehicle data. This involves: Data that’s offloaded from a roving vehicle fleet (cars, trains, semi-trucks, farm tractors, you name it!). Data coming from driving-related mobile apps used [...]

How to monetize connected vehicles with analytics and mobility services was published on SAS Voices by Lonnie Miller

1月 042018
 

What will 2018 unveil for the data management market? I searched expert opinions on technology trends for 2018 and matched them against my own to uncover the five major trends that I think we’ll see in data management this year: 1. Data movement becomes more important. Cloud providers have proven [...]

Data management predictions for 2018 was published on SAS Voices by Helmut Plinke