1月 112018
 

If you work in a team environment, you might be accustomed to using mapped network drives for source data folders or to publish results. If you've recently moved to a SAS server environment, you might not have those mapped drives available. How can you tell?

This question was posted on the SAS Support Communities, and SAS Communities member Patrick provided a handy code snippet that does the trick. Patrick learned the trick himself from SAS Note 24818.

The following code uses a special DRIVEMAP device for the

filename diskinfo DRIVEMAP;
data _null_;
  infile diskinfo;
  input;
  put _infile_;
run;

Here's the output on my PC, where I have a two mapped network drives, U: and W:. (C: and D: are part of my laptop device.)

NOTE: The infile DISKINFO is:
      Drivemap Access Device,
      PROCESS,RECFM=V,LRECL=32767
C:
D:
U:
W:

Mapped drives are typically available only from your local machine. Even if you use SAS on a remote Windows server, the process that defines the mapped drive aliases is usually not triggered when you connect. If this is the case for you, you'll have to adjust your process to use UNC path notation (example: "\\server\folder\project" instead of the familiar "P:\project" shortcut). (Additional note for SAS admins: on a Windows SAS server,

The post How to list your mapped drives in a SAS program appeared first on The SAS Dummy.

1月 112018
 

The SAS® platform is now open to be accessed from open-source clients such as Python, Lua, Java, the R language, and REST APIs to leverage the capabilities of SAS® Viya® products and solutions. You can analyze your data in a cloud-enabled environment that handles large amounts of data in a variety of different formats. To find out more about SAS Viya, see the “SAS Viya: What's in it for me? The user.” article.

This blog post focuses on the openness of SAS® 9.4 and discusses features such as the SASPy package and the SAS kernel for Jupyter Notebook and more as clients to SAS. Note: This blog post is relevant for all maintenance releases of SAS 9.4.

SASPy

The SASPy package enables you to connect to and run your analysis from SAS 9.4 using the object-oriented methods and objects from the Python language as well as the Python magic methods. SASPy translates the objects and methods added into the SAS code before executing the code. To use SASPy, you must have SAS 9.4 and Python 3.x or later.
Note: SASPy is an open-source project that encourages your contributions.

After you have completed the installation and configuration of SASPy, you can import the SASPy package as demonstrated below:
Note: I used Jupyter Notebook to run the examples in this blog post.

1.   Import the SASPy package:

Openness of SAS® 9.4

2.   Start a new session. The sas object is created as a result of starting a SAS session using a locally installed version of SAS under Microsoft Windows. After this session is successfully established, the following note is generated:

Adding Data

Now that the SAS session is started, you need to add some data to analyze. This example uses SASPy to read a CSV file that provides census data based on the ZIP Codes in Los Angeles County and create a SASdata object named tabl:

To view the attributes of this SASdata object named tabl, use the PRINT() function below, which shows the libref and the SAS data set name. It shows the results as Pandas, which is the default result output for tables.

Using Methods to Display and Analyze Data

This section provides some examples of how to use different methods to interact with SAS data via SASPy.

Head() Method

After loading the data, you can look at the first few records of the ZIP Code data, which is easy using the familiar head() method in Python. This example uses the head() method on the SASdata object tabl to display the first five records. The output is shown below:

Describe() Method

After verifying that the data is what you expected, you can now analyze the data. To generate a simple summary of the data, use the Python describe() method in conjunction with the index [1:3]. This combination generates a summary of all the numeric fields within the table and displays only the second and third records. The subscript works only when the result is set to Pandas and does not work if set to HTML or Text, which are also valid options.

Teach_me_SAS() Method

The SAS code generated from the object-oriented Python syntax can also be displayed using SASPy with the teach_me_SAS() method. When you set the argument in this method to True, which is done using a Boolean value, the SAS code is displayed without executing the code:

ColumnInfo() Method

In the next cell, use the columnInfo() method to display the information about each variable in the SAS data set. Note: The SAS code is generated as a result of adding the teach_me_SAS() method in the last section:

Submit() Method

Then, use the submit() method to execute the PROC CONTENTS that are displayed in the cell above directly from Python. The submit method returns a dictionary with two keys, LST and LOG. The LST key contains the results and the LOG key returns the SAS log. The results are displayed as HTML. The HTML package is imported  to display the results.

The SAS Kernel Using Jupyter Notebook

Jupyter Notebook can run programs in various programming languages including SAS when you install and configure the SAS kernel. Using the SAS kernel is another way to run SAS interactively using a web-based program, which also enables you to save the analysis in a notebook. See the links above for details about installation and configuration of the SAS kernel. To verify that the SAS kernel installed successfully, you can run the following code: jupyter kernelspec list

From the command line, use the following command to start the Jupyter Notebook: Jupyter notebook. The screenshot below shows the Jupyter Notebook session that starts when you run the code. To execute SAS syntax from Jupyter Notebook, select SAS from the New drop-down list as shown below:

You can add SAS code to a cell in Jupyter Notebook and execute it. The following code adds a PRINT procedure and a SGPLOT procedure. The output is in HTML5 by default. However, you can specify a different output format if needed.

You can also use magics in the cell such as the %%python magic even though you are using the SAS kernel. You can do this for any kernel that you have installed.

Other SAS Goodness

There are more ways of interacting with other languages with SAS as well. For example, you can use the Groovy procedure to run Groovy statements on the Java Virtual Machine (JVM). You can also use the LUA procedure to run LUA code from SAS along with the ability to call most SAS functions from Lua. For more information, see “Using Lua within your SAS programs.” Another very powerful feature is the DATA step JavaObject, which provides the ability to instantiate Java classes and access fields and methods. The DATA step JavaObject has been available since SAS® 9.2.

Resources

SASPy Documentation

Introducing SASPy: Use Python code to access SAS

Come on in, we're open: The openness of SAS® 9.4 was published on SAS Users.

1月 102018
 

SAS Global Forum 2018 AwardsThis April, more than 5,000 SAS users and business leaders will converge on Denver CO for the premier event for SAS professionals: SAS Global Forum 2018. The event provides an excellent forum to expand your SAS knowledge and network with users of all skill levels. (Last year I found myself having lunch one day sandwiched between a consultant who had built a three-decade career around SAS and a graduate student who started using SAS three months earlier. How's that for diversity!)

And because SAS Global Forum attracts users from across the globe; in every industry imaginable; and from countless government and academic institutions, it really is a user event not to be missed. Thanks to the SAS Global Users Group Executive Board there are a couple of award programs in place to help those who might otherwise have a hard time getting to the event... well, get to the event!

New SAS® Professional Award

For relatively new SAS users who want to experience the conference for the first time, there's the New SAS® Professional Award. This award provides full-time SAS professionals with five years or less of SAS experience the opportunity to earn a free conference registration and one free pre-conference tutorial. You are eligible if you have never attended a SAS Global Forum in the past and would not otherwise be able to attend without assistance.

SAS® Global Forum International Professional Award

A similar award, the SAS® Global Forum International Professional Award, provides users outside of the 48 contiguous U.S. states a similar opportunity. To qualify for this award, you must be a full-time SAS professional who has never attended a SAS Global Forum and would not otherwise be able to attend. This award provides free registration, including meals; one free pre-conference tutorial; and an invitation to an awards recognition luncheon on Sunday, April 8.

Both awards are managed by SAS users who will assume leadership roles in future conferences.

MaryAnne DePesquo, the 2019 SAS Global Forum Chair, is in charge of the 2018 International Professional Awards, while Lisa Mendez, SAS Global Forum Chair in 2020, manages the 2018 New SAS Professional Awards. Direct questions about either program to MaryAnne or Lisa.

To be considered for either program, you must submit your application by Jan. 29, 2018. You will be notified if you received an award no later than March 5, 2018.

Hope to see you in Denver!

Apply for the New SAS Professional Award.
Apply for the SAS Global Forum International Professional Award.

Interested? Hear more from a couple of last year's award recipients

Professional Awards provide first-time attendees a chance to attend SAS Global Forum 2018 was published on SAS Users.

1月 102018
 

Last week I wrote about the 10 most popular articles from The DO Loop in 2017. My most popular articles tend to be about elementary statistics or SAS programming tips. Less popular are the articles about advanced statistical and programming techniques. However, these technical articles fill an important niche. Not everyone needs to know how to interpret a diffogram that visually compares the differences between means between groups, but those who do often send me a note of thanks, which motivates me to research and write similar articles.

Statistical programmers might want to review the following technical articles from 2017. This ain't summertime, and the reading ain't easy, but I think these articles are worth the effort. I've broken them into three categories. Enjoy!

Statistical Concepts

Visualization of regression that uses a weight variable in SAS

Statistical Data Analysis and Visualization

Visualize multivariate regression model by slicing the continuous variables. Graph created by using the EFFECTPLOT SLICEFIT statement in SAS.

Advanced SAS Programming Techniques

If you are searching for a way to enhance your SAS knowledge in this New Year, I think these 10 articles from The DO Loop are worth a second look. Was there an article from 2017 that you found particularly useful? Leave a comment.

The post 10 posts from 2017 that deserve a second look appeared first on The DO Loop.

1月 082018
 

Real-time analytics is, at its most basic, being able to present the right offer at the right time. And it’s the ultimate competitive differentiator in today’s age of highly valued customer experiences. Marketers need to understand why this capability is so important to customers, and how to implement this kind [...]

A marketer’s guide for real-time analytics was published on Customer Intelligence Blog.

1月 082018
 
Label multiple regression lines in a graph in SAS by using PROC SGPLOT

A SAS programmer asked how to label multiple regression lines that are overlaid on a single scatter plot. Specifically, he asked to label the curves that are produced by using the REG statement with the GROUP= option in PROC SGPLOT. He wanted the labels to be the slope and intercept of a linear regression line, as shown to the right. (Click to enlarge.)

Initially I thought that you could use the CURVELABEL option on the REG statement to generate labels, as follows:

proc sgplot data=sashelp.iris noautolegend;
   reg x=SepalLength y=PetalLength / group=Species CURVELABEL; /* does NOT work */
run;

However, the SAS log displays the following warning:

WARNING: CURVELABEL not supported for fit plots when a group variable is
         used.  The option will be ignored.

Fortunately, I thought of two other ways to create a graph that has a regression line for each group level, each with its own label. For linear regression, you can use the LINEPARM statement, as shown in the article "Add a diagonal line to a scatter plot." For general (possibly nonlinear) regression curves, you can find the location of the end of the curve and use the TEXT statement in PROC SGPLOT to add a label at that location.

Label the regression line for each group: The LINEPARM statement

Let's use Fisher's Iris data set for our example data. The Iris data contains 50 observations for each of three species of flowers: iris Setosa, iris Versicolor, and iris Virginica. The programmer wants to label the regression line for each species by using the slope and intercept of the line. The first step is to create a SAS data set that contains the intercept and slope for each curve. You can use the OUTEST= option in PROC REG to write the parameter estimates (intercept and slope) to a SAS data set. You can then use the CATX function in the DATA step to construct the labels, as follows:

proc sort data=sashelp.iris out=iris;
   by Species;
run;
 
/* compute parameter estimates */
proc reg data=iris outest=PE noprint;
   by Species;
   model PetalLength = SepalLength;
run;
 
/* construct labels from the parameter estimates */
data Labels;
   length Label $30;
   set PE(rename=(SepalLength=Slope));                   /* independent variable */
   Label = catx(" ", put(Intercept, BestD5.), '+',       /* separate by blank */
                     put(Slope, BestD5.), '* SepalLength');
   keep Label Species Intercept Slope;
run;
 
proc print noobs; run;

The LABELS data set contains a label for the regression line in each group. You can use other labels if you prefer. The following DATA step combines the labels with the original data. The SCATTER statement in PROC SGPLOT displays the data. The LINEPARM statement draws the lines and adds labels to the end of each line.

data Plot;
   set iris Labels;
run;
 
title "Regression Lines Labeled with Slope and Intercept";
proc sgplot data=Plot;
  scatter x=SepalLength y=PetalLength / group=Species;
  lineparm x=0 y=Intercept slope=Slope / group=Label curvelabel 
               curvelabelloc=outside clip;
run;
Label multiple regression lines in a graph in SAS by using PROC SGPLOT

Success! The regression line for each group is labeled by the formula for the line. For more information about displaying the formula for a regression line, see the SAS/STAT example "Adding Equations and Special Characters to Fit Plots."

Label the regression line for each group: The TEXT statement

The preceding method uses the LINEPARM statement, so it only works for lines. However, the user actually wanted to use the REG statement. With a little work, you can label curves that are produced by the REG statement or other curve-fitting statements. The idea is to obtain the data coordinates for the end of the curve, which will become the location of the label.

You might be thinking, if the curve is produced by a regression statement in PROC SGPLOT, how can we get the data coordinates out of the plot and into a data set? The answer is simple: You can use the ODS OUTPUT statement to write a data set that contains the data in any ODS graph. You can apply this trick to any ODS graph, including graphs created by SGPLOT. The following call to PROC SGPLOT uses an ODS OUTPUT statement to create a SAS data set that contains the data in the regression plot:

/* use ODS OUTPUT to find data coordinates of end of lines */
proc sgplot data=iris;
   ods output sgplot=RegPlot;    /* name of ODS table is 'sgplot' */
   reg x=SepalLength y=PetalLength / group=Species;
run;
 
proc contents short varnum; run; /* find names used by graph */

The variable names that are automatically manufactured by SAS procedures can be long and unwieldy, as shown by the call to PROC CONTENTS. I usually rename the long names to simpler names such as X, Y, GROUP, and so on. You should look at the structure of the REGPLOT data set so that the next DATA step makes sense. The DATA step saves only the last coordinates along the curve for each group (species).

data Coords;
set RegPlot(
    rename=(REGRESSION_SEPALLENGTH_PETAL___X = x
            REGRESSION_SEPALLENGTH_PETAL___Y = y
            REGRESSION_SEPALLENGTH_PETAL__GP = Group)
    where=(x ^=.));
by Group;
if last.Group;
keep x y Group;
run;
 
proc print noobs; run;

The COORDS data contains the location (in data coordinates) of the end of each regression line. You can overlay labels at these coordinates to label the curves. From the preceding section, the labels are in the LABELS data set, so you can merge the two data sets, as follows:

/* combine the positions and labels with original data */
data A;
merge Labels Coords(rename=(Group=Species));
by Species;
run;
 
data Plot;
set iris A;
/* optional: pad label with blanks on the left (if length is long enough) */
Label = "   " || Label;  
run;
 
proc sgplot data=Plot;
  reg x=SepalLength y=PetalLength / group=Species;
  text x=x y=y text=Label / position=right;
run;

The graph is shown at the top of this article.

Label multiple regression curves

If you study the previous section, you will see that the code does not rely on the linearity of the regression model. The same code works for polynomial regression and nonparametric regression curves such as are created by the LOESS and PBSPLINE statements in PROC SGPLOT. The following graph shows a PBSPLINE fit to the IRIS data. Because the penalized B-spline curve is nonparametric, there is no equation to display as a label. Instead, I use the Species name as a label and suppress the legend at the bottom of the graph. You can download the SAS program that creates this and all the graphs in this article.

Label multiple regression curves in a graph in SAS by using PROC SGPLOT

The post Label multiple regression lines in SAS appeared first on The DO Loop.

1月 062018
 

Deep learning is not synonymous with artificial intelligence (AI) or even machine learning. Artificial Intelligence is a broad field which aims to "automate cognitive processes." Machine learning is a subfield of AI that aims to automatically develop programs (called models) purely from exposure to training data.

Deep Learning and AI

Deep learning is one of many branches of machine learning, where the models are long chains of geometric functions, applied one after the other to form stacks of layers. It is one among many approaches to machine learning but not on equal footing with the others.

What makes deep learning exceptional

Why is deep learning unequaled among machine learning techniques? Well, deep learning has achieved tremendous success in a wide range of tasks that have historically been extremely difficult for computers, especially in the areas of machine perception. This includes extracting useful information from images, videos, sound, and others.

Given sufficient training data (in particular, training data appropriately labelled by humans), it’s possible to extract from perceptual data almost anything that a human could extract. Large corporations and businesses are deriving value from deep learning by enabling human-level speech recognition, smart assistants, human-level image classification, vastly improved machine translation, and more. Google Now, Amazon Alexa, ad targeting used by Google, Baidu and Bing are all powered by deep learning. Think of superhuman Go playing and near-human-level autonomous driving.

In the summer of 2016, an experimental short movie, Sunspring, was directed using a script written by a long short-term memory (LSTM) algorithm a type of deep learning algorithm.

How to build deep learning models

Given all this success recorded using deep learning, it's important to stress that building deep learning models is more of an art than science. To build a deep learning or any machine learning model for that matter one need to consider the following steps:

  • Define the problem: What data does the organisation have? What are we trying to predict? Do we need to collect more data? How can we manually label the data? Make sure to work with domain expert because you can’t interpret what you don’t know!
  • What metrics can we use to reliably measure the success of our goals.
  • Prepare validation process that will be used to evaluate the model.
  • Data exploration and pre-processing: This is where most time will be spent such as normalization, manipulation, joining of multiple data sources and so on.
  • Develop an initial model that does better than a baseline model. This gives some indication of whether machine learning is ideal for the problem.
  • Refine model architecture by tuning hyperparameters and adding regularization. Make changes based on validation data.
  • Avoid overfitting.
  • Once happy with the model, deploy it into production environment. This may be difficult to achieve for many organisations giving that a deep learning score code is large. This is where SAS can help. SAS has developed a scoring mechanism called "astore" which allows deep learning method to be pushed into production with just a click.

Is the deep learning hype justified?

We're still in the middle of deep learning revolution trying to understand the limitations of this algorithm. Due to its unprecedented successes, there has been a lot of hype in the field of deep learning and AI. It’s important for managers, professionals, researchers and industrial decision makers to be able to distill this hype from reality created by the media.

Despite the progress on machine perception, we are still far from human level AI. Our models can only perform local generalization, adapting to new situations that must be similar to past data, whereas human cognition is capable of extreme generalization, quickly adapting to radically novel situations and planning for long-term future situations. To make this concrete, imagine you’ve developed a deep network controlling a human body, and you wanted it to learn to safely navigate a city without getting hit by cars, the net would have to die many thousands of times in various situations until it could infer that cars are dangerous, and develop appropriate avoidance behaviors. Dropped into a new city, the net would have to relearn most of what it knows. On the other hand, humans are able to learn safe behaviors without having to die even once—again, thanks to our power of abstract modeling of hypothetical situations.

Lastly, remember deep learning is a long chain of geometrical functions. To learn its parameters via gradient descent one key technical requirements is that it must be differentiable and continuous which is a significant constraint.

Looking beyond the AI and deep learning hype was published on SAS Users.

1月 052018
 

For the last two years, I’ve spent time listening to and brainstorming with automakers, their suppliers and other technology companies about how to monetize connected vehicle data. This involves: Data that’s offloaded from a roving vehicle fleet (cars, trains, semi-trucks, farm tractors, you name it!). Data coming from driving-related mobile apps used [...]

How to monetize connected vehicles with analytics and mobility services was published on SAS Voices by Lonnie Miller