Dec 18, 2017
 

Slice, slice, baby! You've got to slice, slice, baby!

When you fit a regression model that has multiple explanatory variables, it is a challenge to effectively visualize the predicted values. This article describes how to visualize the regression model by slicing the explanatory variables. In SAS, you can use the SLICEFIT option in the EFFECTPLOT statement to visualize a slice of a regression surface.

Why the naive visualization fails

For a regression model that contains one explanatory variable and (optionally) one classification variable, it is easy to visualize the predicted values. Most statistical software packages make it easy to create a "fit plot." For example, the following call to PROC GLM in SAS fits a model to data for patients in a heart study:

data Heart;    /* create example data */
set sashelp.heart(obs=500);
where cholesterol < 400;
run;
 
ods graphics / attrpriority=none     /* groups determine symbols and line patterns */
               imagemap tipmax=1500; /* enable tool tips */
 
/* easy to visualize predicted values for 1 continuous and 1 categorical explanatory variable */
proc glm data=Heart plots=meanplot;  /* PLOTS= option supported in many procedures */
class Sex;
model Cholesterol = Sex Systolic;
quit;

The graph shows the observed responses versus the continuous explanatory variable and overlays two curves: one for the predicted values when Sex='Male' and the other when Sex='Female'. Creating this graph is easy because the procedure does all the work.

What happens if you add additional explanatory variables into the model and try to create the same graph? For reasons that will soon be apparent, the procedure will not automatically create the graph when there are additional variables in the model. However, you can use the OUTPUT statement to write the predicted values to a SAS data set and use PROC SGPLOT to create the graph. You will need to sort by the variable that you are plotting on the X axis, as follows:

proc glm data=Heart;
class Sex Smoking_status;
model Cholesterol = Sex Smoking_Status    /* two classification variables */
                    Systolic Weight;      /* two continuous variables */
output out=GLMOut p=Pred;                 /* output data set contains predicted values */
quit;
 
proc sort data=GLMOut; by Systolic Sex; run; /* sort by X variable for graphing */
 
title "Predicted Values";
proc sgplot data=GLMOut;
styleattrs datalinepatterns=(solid solid);
scatter x=Systolic y=Cholesterol / group=Sex transparency=0.75;
series  x=Systolic y=Pred / group=Sex tip=(Smoking_Status Weight); /* add tool tips */
yaxis min=180 max=300;    /* zoom in on predicted values */
footnote J=L "Jagged Lines Because Covariates Have Multiple Values";
run;

Figure: Response versus one explanatory variable when the model has hidden explanatory variables. Markers are observed values; the jagged lines are projections of the predicted values.

This graph looks strange. The regression model is linear, but the plot shows jagged lines for the predicted values. What is going on?

You can use the tool tips feature of the graph to understand why the curves are jagged. If you hover the cursor near a point on the jagged line, the values of the hidden explanatory variables (Weight and Smoking_Status) appear. The graph shows the tool tip at a point that corresponds to a male patient who weighs 160 pounds and who is a moderate smoker. By moving the cursor, you can discover that the previous point along the red line corresponds to a male patient who weighs 155 pounds and is a non-smoker. The subsequent point corresponds to a heavy smoker who weighs 151 pounds.

Because Weight and Smoking_Status were included in the model, the predicted values "jump" up or down as you move along the Systolic axis. Two observations that have similar Systolic values might have very different values for the other (hidden) variables. Geometrically, this graph displays the projection of the predicted values onto the two-dimensional (Systolic, Cholesterol) plane. To obtain a smooth curve, you must "slice" a response surface rather than project it.

Slice the response surfaces

The predicted values for this model form a set of 10 planes in the three-dimensional space (x, y, z) = (Systolic, Weight, Cholesterol). Each plane is the graph of predicted values for one combination of the 2 levels of Sex and the 5 levels of Smoking_Status. There is one plane for the ('Male', 'Non-smoker') patients, another for the ('Female', 'Light (1-5)') patients, and so on.

A "slice" through the response surfaces is accomplished by evaluating the model at a particular value of one of the continuous variables. This gives a two-dimensional plot that has 10 lines on it. Because 10 lines might overcrowd the display, it is common to pick a reference value for one of the classification variables and plot only the lines that are indexed by that value. For example, if you choose the reference value Smoking_Status = 'Non-smoker', the plot contains two lines that correspond to ('Male', 'Non-smoker') and ('Female', 'Non-smoker').

This might sound complicated, but SAS provides an easy implementation: the SLICEFIT option in the EFFECTPLOT statement, which is supported in several regression procedures, enables you to specify how you want to slice the surfaces and which combinations of levels you want to display.

By default, the EFFECTPLOT SLICEFIT statement creates a "sliced fit plot" that graphs the response variable versus the first continuous variable and shows the predicted values for each level of the first class variable. "First" is determined by the order in which the variables are listed on the MODEL statement. Other continuous variables are sliced (evaluated) at their mean value; other classification variables are evaluated at their last level.

PROC GLM does not support the EFFECTPLOT statement, but PROC GENMOD does. The following call to PROC GENMOD fits the same model and creates a "sliced fit plot" of the predicted values. The sliced fit plot will show the response variable (Cholesterol) versus the first continuous variable (Systolic) overlaid with predictions for males and females. The value of the Weight variable is set to 151.7, which is the mean value of the sample. The value of the Smoking_Status variable is set to 'Very Heavy (> 25)', which is the last level in alphanumeric order.

title; footnote;
ods graphics / attrpriority=none imagemap=off;
proc genmod data=Heart;
class Sex Smoking_status;
model Cholesterol = Sex Smoking_Status   /* classification variables */
                    Systolic Weight;     /* continuous variables */
/* Plot response vs first cont var for each level of first class var */  
/* Set other cont vars to MEAN; set other class vars to last level */
effectplot slicefit / obs;               /* add scatter plot of observations */
run;

Figure: Sliced fit plot for a multivariate regression model, created by the EFFECTPLOT statement in SAS.

The sliced fit plot shows smooth (not jagged) lines because the model is evaluated at constant values of the hidden variables. The values (Weight, Smoking_Status) = (151.7, 'Very Heavy (> 25)') are held constant while the model is evaluated over the range of the Systolic and Sex variables.
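To see what the SLICEFIT option automates, the following sketch builds a similar slice by hand. It swaps in a different technique than the one the article uses: append a grid of scoring rows that have a missing response, fit the model, and keep only the predictions for the grid rows. The data set names (Stats, ScoreGrid, Combined, SliceOut) are hypothetical, and the sketch assumes that EFFECTPLOT averages Weight over the complete cases that enter the fit, so the slice value might differ slightly from the 151.7 reported above.

```sas
/* 1. Range of Systolic and mean of Weight over complete cases (assumption) */
proc means data=Heart noprint;
   where cmiss(Sex, Smoking_Status, Systolic, Weight, Cholesterol) = 0;
   var Systolic Weight;
   output out=Stats min(Systolic)=SysMin max(Systolic)=SysMax mean(Weight)=WtMean;
run;

/* 2. Scoring grid: vary Systolic and Sex; hold the hidden variables constant */
data ScoreGrid;
   length Sex $6 Smoking_Status $20;
   set Stats;
   Weight = WtMean;                       /* slice Weight at its mean        */
   Smoking_Status = 'Very Heavy (> 25)';  /* slice at the last level         */
   Cholesterol = .;                       /* missing response: scored, not fit */
   do Sex = 'Female', 'Male';
      do Systolic = SysMin to SysMax by (SysMax - SysMin) / 50;
         output;
      end;
   end;
   keep Sex Smoking_Status Weight Systolic Cholesterol;
run;

/* 3. Append the grid to the data and flag the scoring rows */
data Combined;
   length Sex $6 Smoking_Status $20;
   set Heart ScoreGrid(in=inGrid);
   Grid = inGrid;
run;

/* 4. Fit the model; rows with a missing response still receive predictions */
proc glm data=Combined noprint;
   class Sex Smoking_Status;
   model Cholesterol = Sex Smoking_Status Systolic Weight;
   output out=SliceOut p=Pred;
quit;

/* 5. Plot only the grid rows */
proc sgplot data=SliceOut(where=(Grid=1));
   series x=Systolic y=Pred / group=Sex;
run;
```

Because Weight and Smoking_Status are constant on every grid row, the predicted values for each level of Sex should lie on a straight line, which is the smooth behavior of the sliced fit plot.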

Other ways to slice the response surfaces

The SLICEFIT option in the EFFECTPLOT statement supports many suboptions that enable you to control the way that the model is sliced:

  • You can plot any two variables, one continuous and one categorical. Use the X= option to specify the continuous variable and the SLICEBY= option to specify the categorical variable.
  • You can specify the statistics that are used to slice the continuous covariates. By default, the continuous covariates are sliced at their mean values. You can use the AT option to specify the following keywords: MEAN (the default), MIN, MAX, MEDIAN, or MIDRANGE. (Recall that the midrange is the value (min+max)/2.) For class variables, the REF option specifies that the last level be used.
  • You can use the AT option to specify particular values for slicing the continuous covariates and class variables.
  • You can specify multiple values for the AT option. The EFFECTPLOT statement will create a panel of sliced fit plots, one for each joint combination of specified values.
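Before you switch from MEAN to MIDRANGE, it can help to see how far apart the two statistics are for a covariate. The following sketch computes both for Weight. The output data set name WtStats is hypothetical, and restricting to complete cases for the model variables is an assumption about which observations the procedure uses.

```sas
/* Mean, min, and max of Weight over complete cases (assumption) */
proc means data=Heart noprint;
   where cmiss(Sex, Smoking_Status, Systolic, Weight, Cholesterol) = 0;
   var Weight;
   output out=WtStats mean=WtMean min=WtMin max=WtMax;
run;

data WtStats;
   set WtStats;
   WtMidrange = (WtMin + WtMax) / 2;   /* midrange = (min+max)/2 */
run;

proc print data=WtStats noobs;
   var WtMean WtMidrange WtMin WtMax;
run;
```

If a few extreme values pull the midrange away from the mean, the two AT choices can produce noticeably different slices.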

The following four EFFECTPLOT statements correspond to the four items in the previous list:

proc genmod data=Heart;
class Sex Smoking_status;
model Cholesterol = Sex Smoking_Status    /* classification variables */
                    Systolic Weight;      /* continuous variables */
/* specify the X and categorical variables */
effectplot slicefit(X=weight sliceby=Smoking_status)  / obs;
 
/* specify statistics used to slice the covariates */
effectplot slicefit / at MIDRANGE      /* new default for continuous vars */ 
                         REF;          /* default for classification vars */
 
/* specify explicit values of the covariates */
effectplot slicefit / at(Weight=150
                         Smoking_Status='Non-smoker');
 
/* specify multiple values of the covariates to get a panel */
effectplot slicefit / at(Weight=150 200
                         Smoking_Status='Non-smoker' 'Heavy (16-25)');
quit;

To save space, only the last sliced fit plot (the panel) is shown below. I have linked to the other three plots: the plot of Weight and Smoking_Status, the plot at midrange, and the plot at specified values.

Figure: Panel of sliced fit plots created by EFFECTPLOT SLICEFIT / AT(Weight=150 200 Smoking_Status='Non-smoker' 'Heavy (16-25)')

In summary, you can use the SLICEFIT option in the EFFECTPLOT statement in SAS to visualize regression models that contain many explanatory variables. The AT option enables you to specify values for the covariates. The resulting graph displays a slice through the response surface.

The EFFECTPLOT statement is also available in PROC PLM. PROC PLM enables you to visualize a model that has been saved to an item store. The OBS option (which overlays the predicted values and a scatter plot) is not available in PROC PLM because the item store does not include the observations.
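A minimal sketch of that workflow follows: fit the model once, save it with the STORE statement, and slice it later in PROC PLM. The item store name WORK.CholModel is hypothetical; because a WORK item store disappears when the session ends, you would use a permanent libref to reuse the model in a later session.

```sas
/* Fit once and save the model to an item store (hypothetical name) */
proc genmod data=Heart;
   class Sex Smoking_Status;
   model Cholesterol = Sex Smoking_Status Systolic Weight;
   store work.CholModel;
run;

/* Restore the model and slice it. No OBS option here, because the
   item store does not contain the observations. */
proc plm restore=work.CholModel;
   effectplot slicefit(x=Systolic sliceby=Sex) /
              at(Weight=150 Smoking_Status='Non-smoker');
run;
```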

The post Visualize multivariate regression models by slicing continuous variables appeared first on The DO Loop.

Dec 16, 2017
 

With the Christmas holiday approaching, I got to wondering what they call Santa in other countries. Of course, some countries don't celebrate Christmas - but most countries at least have some sort of "winter holiday," and most also have some tradition of gift-giving. So, I guess the better question might [...]

The post What do they call Santa in other countries? appeared first on SAS Learning Post.

Dec 16, 2017
 

SAS' new Support site search

“Alexa, how many ounces are in a 750-ml bottle of wine?”
“Waze, how many miles between Cary and Kannapolis, NC?”
“Google, which NFL teams are favored to win this weekend?”

Every day, many of us turn to some sort of search vehicle to help us solve a work issue or win a “discussion” with a friend. In fact, the word “Google” has now become a verb in our lexicon. Don’t know what PROC BCHOICE does? Google it!

Here at SAS, there are over 30,000 pages on the support.sas.com site, so we recognize how important an efficient search can be for our users. As part of our continuing effort to evolve the site and provide you with the best possible user experience, we’re pleased to announce that this week we launched a beta version of a new search engine. We believe the new search delivers an improved experience, but we’ll let you be the judge.

We want you to give the new search a try and provide your feedback. Type a few terms into the search box and let us know what you think. You’ll find a “Feedback” link in the right-hand column of each page. Click it, complete the short form, and give us your opinions. We want to know what works well for you and what doesn’t, and which filters you see yourself using and which you think aren’t as useful. Is there a feature you would like to see and don’t? Please use the search preview as often as you like, and feel free to give us feedback every time you do. But hurry, because the search preview will only run until the end of December.

We’ll use your feedback to put the finishing touches on our search functionality. Look for our first production release in early 2018! We want the support site search enhancements to be as important in helping you use SAS as Google is in helping you become the champion of your fantasy football team!

Thanks for your help and happy holidays!

Click the image to launch; type in "Search" box; click "Feedback" to share your comments

 

Test-Drive SAS' new Support site search was published on SAS Users.

Dec 15, 2017
 

What is blockchain and how can you analyze data in a blockchain? This article will discuss various forms of blockchain analytics from a tactical or heuristic perspective. I’ll explain how SAS technologies can provide advanced analytics for operational, value/asset and regulatory viewpoints in the diverse world of open source blockchain [...]

A practical approach to blockchain analytics was published on SAS Voices by Sam Penfield

Dec 15, 2017
 

The paradigm in which we have all lived in an electrified world is changing. The convergence of technology, changing business models, and increasing customer expectations means the way utilities have operated for the last 100+ years must change. Further, this change must embrace where the operations side of the business [...]

Coming soon to a grid near you: certainty amid confusion was published on SAS Voices by Mike F. Smith

Dec 14, 2017
 

As you might have heard, sasCommunity.org -- a wiki-based web site that has served as a user-sourced SAS repository for over a decade -- is winding down. This was a difficult decision taken by the volunteer advisory board that runs the site. However, the decision acknowledges a new reality: SAS professionals have many modern options for sharing and promoting their professional work, and they are using those options. In 2007, the birth year of sasCommunity.org, the technical/professional networking world was very different than it is today. LinkedIn was in its infancy. GitHub didn't exist. SAS Support Communities (communities.sas.com) was an experiment just getting started with a few discussion forums. sasCommunity.org (and its amazing volunteers) blazed a trail for SAS users to connect and share, and we'll always be grateful for that.

Even with the many alternatives we now have, the departure of sasCommunity.org will leave a gap in some of our professional sharing practices. In this article, I'll share some ideas that you can use to fill this gap, and to extend the reach of your SAS knowledge beyond just your SAS community colleagues. Specifically, I'll address how you can make the biggest splash and have an enduring impact with that traditional mode of SAS-knowledge sharing: the SAS conference paper.

Extending the reach of your SAS Global Forum paper

Like many of you, I've written and presented a few technical papers for SAS Global Forum (and also for its predecessor, SUGI). With each conference, SAS publishes a set of proceedings that provide perpetual access to the PDF version of my papers. If you know what you're looking for, you can find my papers in several ways:

All of these methods work with no additional effort from me. When your paper is published as part of a SAS conference, that content is automatically archived and findable within these conference assets. But as far as this goes, there is an opportunity to do so much more.

Write an article for SAS Support Communities

ArtC's presenter page

sasCommunity.org supported the idea of "presenter pages" -- a mini-destination for information about your conference paper. As an author, you would create a page that contains the description of your paper, links to supporting code, and any other details that you wanted to lift out of the PDF version of your paper. Creating such a page required a bit of learning time with the wiki syntax, and just a small subset of paper presenters ever took the time to complete this step. (But some prolific contributors, such as Art Carpenter or Don Henderson, shared blurbs about dozens of their papers in this way.) Personally, I created a few pages on sasCommunity.org to support my own papers over the years.

SAS Support Communities offers a similar mechanism: the SAS Communities Library. Any community member can create an article to share his or her insights about a SAS-related topic. A conference paper is a great opportunity to add to the SAS Communities Library and bring some more attention to your work. A communities article also serves as a platform for readers to ask you questions about your work, as the library supports a commenting feature that allows for discussion.

Since sasCommunity.org has announced its retirement plans, I took this opportunity to create new articles on SAS Support Communities to address some of my previous papers. I also updated the content, where appropriate, to ensure that my examples work for modern releases of SAS. Here are two examples of presentation pages that I created on SAS Support Communities:

One of my presentations in the SAS Communities Library

When you publish a topic in the SAS Communities Library, especially if it's a topic that people search for, your article will get an automatic boost in visitors thanks to the great search engine traffic that drives the communities site. With that in mind, use these guidelines when publishing:

  • Use relevant key words/phrases in your article title. Cute and clever titles are a fun tradition in SAS conference papers, and you should definitely keep those intact within the body of your article. But reserve the title field for a more practical description of the content you're sharing.
  • Include an image or two. Does your paper include an architecture diagram? A screen shot? A graph or plot? Use the Insert Photos button to add these to your article for visual interest and to give the reader a better idea of what's in your paper.
  • Add a snippet of code. You don't have to attach all of your sample code with hundreds of program lines, but a little bit of code can help the reader with some context. Got lots of code? We'll cover that in the next section.

To get started with the process for creating an article...see this article!

Share your code on GitHub

SAS program code is an important feature in SAS conference papers. A code snippet in a PDF-style paper can help to illustrate your points, but you cannot effectively share entire programs or code libraries within this format. Code that is locked up in a PDF document is difficult for a reader to lift and reuse. It's also impossible to revise after the paper is published.

GitHub is a free service that supports sharing and collaboration for any code-based technology, including SAS. Anyone who works with code -- data scientists, programmers, application developers -- is familiar with GitHub at least as a reader. If you haven't done so already, it might be time to create your own GitHub account and share your useful SAS code. I have several GitHub repositories (or "repos" as we GitHub hipsters say) that are related to papers, blog posts, and books that I've written. It just feels like a natural way to share code. Occasionally a reader suggests an improvement or finds a bug, and I can change the code immediately. (Alas, I cannot go back in time and change a published paper...)

A sample of conference-paper-code on my GitHub.

List your published work on your LinkedIn profile

So, you've presented your work at a major SAS conference! Your professional network needs to know this about you. You should list this as an accomplishment on your resume, and definitely on your LinkedIn profile.

LinkedIn offers a "publication" section -- perfect for listing books and papers that you've written. Or, you can add this to the "projects" section of your profile, especially if you collaborate with someone else that you want to include in this accomplishment. I have yet to add my entire back-catalog of conference papers, but I have added a few recent papers to my LinkedIn profile.

One of a few publications listed on my LinkedIn profile

Bonus step: write about your experience in a LinkedIn article

Introspection has a special sort of currency on LinkedIn that doesn't always translate well to other places. A LinkedIn article -- a long-form post that you write from a first-person perspective -- gives you a chance to talk about the deeper meaning of your project. This can include the story of inspiration behind your conference paper, personal lessons that you learned along the way, and the impact that the project had in your workplace and on your career. This "color commentary" adds depth to how others see your work and experience, which helps them to learn more about you and what drives you.

Here are a few examples of what I'm talking about:

It's not about you. It's about us

The techniques I've shared here might sound like "how to promote yourself." Of course, that's important -- we each need to take responsibility for our own self-promotion and ensure that our professional achievements shine through. But more importantly, these steps play a big role in helping your content to be findable -- even "stumble-uponable" (a word I've just invented). You've already invested a tremendous amount of work into researching your topic and crafting a paper and presentation -- take it the extra bit of distance to make sure that the rest of us can't miss it.

The post How to share your SAS knowledge with your professional network appeared first on The SAS Dummy.

Dec 14, 2017
 

The goal of all types of analytics is to provide business insight. Consider that: Descriptive analytics provides the business with insight on what happened in the past and what is happening now. Predictive analytics provides the business with insight on the probability of what will happen in the future. Prescriptive analytics provides the [...]

The post How a data-driven business supports analytics goals appeared first on The Data Roundtable.

Dec 13, 2017
 

SAS Data Preparation 2.1 is now available and it includes the ability to perform data quality transformations on your data using the definitions from the SAS Quality Knowledge Base (QKB).

The SAS Quality Knowledge Base is a collection of files that store the data and logic that define locale-specific data quality operations, such as parsing, standardization, and generating match codes to facilitate fuzzy matching. SAS software products reference the QKB when performing data quality transformations on your data. These products include: SAS Data Integration Studio, SAS DataFlux Data Management Studio/Server, SAS code via dqprocs, SAS MDM, SAS Data Loader for Hadoop, SAS Event Stream Processing, and now SAS Data Preparation, which is powered by SAS Viya.

Out-of-the-box QKB definitions include the ability to perform data quality operations on items such as Name, Address, Phone, and Email.

SAS Data Preparation 2.1

SAS Data Preparation – Data Quality Transformations

The following are the data quality transformations available in SAS Data Preparation:

  • Casing – case a text string in upper, lower, or proper case. Example using the Proper (Organization) case definition – input: "sas institute"; output: "SAS Institute".
  • Parsing – break up a text string into its tokens. Example using the Name parse definition – input: "James Michael Smith"; output: "James" (Given Name token), "Michael" (Middle Name token), and "Smith" (Family Name token).
  • Field extraction – extract relevant tokens from a text string. Example using a custom-created extraction definition for Clothing information – input: "The items purchased were a small red dress and a blue shirt, large"; output: "dress; shirt" (Item token), "red; blue" (Color token), and "small; large" (Size token).
  • Gender analysis – guess the gender of a text string. Example using the Name gender analysis definition – input: "James Michael Smith"; output: "M" (abbreviation for Male).
  • Identification analysis – guess the type of data for a text string. Example using the Contact Info identification analysis definition:
  • Match codes – generate a code to fuzzy match text strings. Example using the Name match definition at a sensitivity of 85:
    For more information on match codes, view this YouTube video on The Power of the SAS® Matchcode.
  • Standardize – put a text string into a common format. Example using the Phone standardization definition – input: "9196778000"; output: "(919) 677 8000".

Note: While all the examples above use definitions from the English (United States) locale in the SAS Quality Knowledge Base for Contact Information, QKBs are available for dozens of locales.
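Because the QKB is shared across products, the same definitions can also be called from SAS code. The following sketch applies several of the transformations above with the SAS Data Quality Server functions DQCASE, DQGENDER, DQMATCH, and DQSTANDARDIZE. It is an illustration, not the SAS Data Preparation interface itself, and it assumes that SAS Data Quality Server is licensed and that the ENUSA locale of the Contact Information QKB is installed; the DQSETUPLOC path and the DQDemo data set name are placeholders.

```sas
/* Assumption: SAS Data Quality Server and the ENUSA QKB are available.
   Replace the placeholder path with your site's QKB location. */
%dqload(dqlocale=(ENUSA), dqsetuploc='/path/to/your/QKB');

data DQDemo;                                  /* hypothetical data set name */
   length OrgIn NameIn $40 PhoneIn $20
          OrgCased $40 Gender $1 NameMatch $256 PhoneStd $20;
   OrgIn   = 'sas institute';
   NameIn  = 'James Michael Smith';
   PhoneIn = '9196778000';
   OrgCased  = dqCase(OrgIn, 'Proper (Organization)', 'ENUSA');  /* casing             */
   Gender    = dqGender(NameIn, 'Name', 'ENUSA');                /* gender analysis    */
   NameMatch = dqMatch(NameIn, 'Name', 85, 'ENUSA');             /* match code, 85     */
   PhoneStd  = dqStandardize(PhoneIn, 'Phone', 'ENUSA');         /* standardization    */
run;

proc print data=DQDemo noobs;
run;
```

The exact output strings depend on the QKB version that is installed, so compare them with the examples in the list above rather than treating those as guaranteed results.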

You can also customize the definitions in the QKB by using SAS DataFlux Data Management Studio. This allows you to update the out-of-the-box QKB definitions or create your own data types and definitions to suit your project needs. For example, you may need to create a definition to extract the clothing information from a free-form text field, as shown in the Field extraction example. These customized definitions can then be used in SAS Data Preparation as part of your data quality transformations. For more information on customizing the QKB, you can view this YouTube video.

For more information on the SAS Quality Knowledge Base (QKB), you can view its documentation.

SAS Data Preparation 2.1: Data quality transformations was published on SAS Users.

Dec 13, 2017
 

The SAS language is large. Even after 20+ years of using SAS, there are many features that I have never used. Recently it became necessary for me to learn about DICTIONARY tables in PROC SQL (and the associated SASHELP views) because I needed to programmatically obtain the text for the current value of the system title in SAS. I had heard a lot about DICTIONARY tables, but this was my first time using them in a program.

This article discusses DICTIONARY tables and shows how to use the DICTIONARY.Titles table to obtain the current value of titles and footnotes in SAS.

DICTIONARY tables and titles

DICTIONARY tables are documented in the PROC SQL documentation. They are special read-only PROC SQL tables that contain information about the current state of SAS, including the state of libraries, data sets, and system options. The documentation lists all DICTIONARY tables, and I determined that I needed to look at the DICTIONARY.Titles table in PROC SQL or (if I needed to use another SAS procedure) the SASHELP.VTitle view, which contains the same information.

After finding out which table to use, I wanted to display the contents of the table. The following call to PROC SQL displays the structure of the table (names and types of variables) and the contents. Equivalently, you can use PROC CONTENTS and PROC PRINT to show similar information for the view SASHELP.VTitle. The output shows the table for a new SAS session:

proc sql;
describe table Dictionary.Titles;         /* writes to SAS log */
select * from Dictionary.Titles;          /* display table */
quit;
 
proc contents data=Sashelp.VTitle;  run;  
proc print data=Sashelp.VTitle;     run;

The table contains three variables:

  • The TYPE variable is a one-character variable with the values 'T' (for title) or 'F' (for footnote).
  • The NUMBER variable is a numeric variable. The value 1 corresponds to the TITLE1 or FOOTNOTE1 global statement, the value 2 corresponds to the TITLE2 or FOOTNOTE2 statement, and so on.
  • The TEXT variable is a 256-character variable that contains the value of a title or footnote. The output shows that when you first start SAS, the TITLE1 statement is set to "The SAS System."

You can run an example to see how the contents of the view change after you submit TITLE and FOOTNOTE statements:

title "Normal Distribution";    /* alias for TITLE1 statement */
title2 "mu = 0; sigma = 1;";
footnote "N = 100";             /* alias for FOOTNOTE1 statement */
 
proc print data=Sashelp.VTitle;  run;

Putting a title into a macro variable

The structure of the DICTIONARY.Titles table implies that you can use a WHERE clause to subset the table. For example, the clause WHERE Type="T" & Number=1 selects only the row for the TITLE1 statement. Similarly, the clause WHERE Type="F" & Number=2 selects the row for the FOOTNOTE2 statement. You can use the SELECT INTO :MacroVar statement in PROC SQL to put data into a macro variable, as follows:

PROC SQL noprint;
select Text into :TitleText TRIMMED  /* put the trimmed value into a macro */
  from Dictionary.Titles
  where Type="T" & Number=1;
quit;
%put &=TitleText;
TITLETEXT=Normal Distribution

You need to be a little careful when you use this technique in production code. If you ask for the TITLE2 information when that title is not set, the WHERE clause returns an empty table. For example, if you clear the TITLE1 statement and rerun the previous PROC SQL statement, the SAS log displays NOTE: No rows were selected, and the value of the TitleText macro variable is not updated.

One way to handle this potential problem is to use the %LET statement to set the macro variable to an empty value before you call PROC SQL. If the macro variable is empty after PROC SQL runs, then the requested title or footnote is not set. The following macro uses this technique to set a macro variable to the value of the Nth title, where you can specify the parameter N:

/* Get the N_th title into the &TitleText macro variable */
%macro GetTitle(Number=1);
%global TitleText;
%let TitleText = ;           /* value is empty if TITLEn does not exist */
PROC SQL noprint;
select Text into :TitleText TRIMMED    /* value is set if TITLEn exists */
  from Dictionary.Titles
  where Type="T" & Number=&Number;
quit;
%mend;
 
/* test the %GetTitle macro */
title "Lognormal Distribution";   /* set TITLE1; clear TITLE2 */
%GetTitle();
%put Title1 = "&TitleText";      /* text of title1 */
%GetTitle(Number=2);
%put Title2 = "&TitleText";      /* text of title2 */
Title1 = "Lognormal Distribution"
Title2 = ""

This basic macro is sufficient for my purposes. Feel free to propose improvements in the comments. Also, let me know how you use DICTIONARY tables in your work.
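One possible improvement, offered as a sketch rather than the author's method: let the caller choose titles or footnotes, and use the automatic macro variable SQLOBS, which reports how many rows the SELECT statement returned, to set a flag that distinguishes "not set" from "set to blank." The macro name %GetTitleText and the TitleFound flag are hypothetical.

```sas
/* Hypothetical extension: Type=T for titles, Type=F for footnotes */
%macro GetTitleText(Type=T, Number=1);
%global TitleText TitleFound;
%let TitleText = ;                         /* empty if the statement is not set */
proc sql noprint;
select Text into :TitleText trimmed
  from Dictionary.Titles
  where Type="%upcase(&Type)" & Number=&Number;
quit;
%let TitleFound = %eval(&sqlobs > 0);      /* 1 if a row was found, else 0 */
%mend;

/* test the macro on FOOTNOTE1 */
footnote "N = 100";
%GetTitleText(Type=F, Number=1);
%put &=TitleFound &=TitleText;
```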

If you would like to learn more about DICTIONARY tables, the following two references will get you started. Many papers have been written about DICTIONARY tables and views. An internet search of the form
sas proceedings "dictionary table"
will reveal some of the papers from SAS conferences.

The post How to get the current TITLE in SAS appeared first on The DO Loop.