books

March 2, 2021
 

The more I use SAS Studio in the cloud via SAS OnDemand for Academics, the more I like it. To demonstrate how useful the Files tab is, I'm going to show you what happens when you drag a text file, a SAS data set, and a SAS program into the Editor window.

I previously created a folder called MyBookFiles and uploaded several files from my local computer to that folder.  You can see a partial list of files in the figure below.

Notice that there are text files, SAS data sets, SAS programs, and some Excel workbooks. Look what happens when I drag a text file (Blank_Delimiter.txt) into the Editor window.

No need to open Notepad to view this file—SAS Studio displays it for you. What about a SAS data set? As an example, I dragged a SAS data set called blood_pressure into the Editor.

You see a list of variables and some of the observations in this data set.  There are vertical and horizontal scroll bars (not shown in the figure) to see more rows or columns. If you want to see a listing of the entire data set or the first 'n' observations, you can run the List Data task, located under the Tasks and Utilities tab.
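If you prefer code to the point-and-click task, a PROC PRINT step can produce a similar listing. This is only a minimal sketch: the libref mybook and the folder path are assumptions based on the MyBookFiles folder described above, and OBS=10 is an arbitrary choice for 'n'.

```sas
/* Assumption: blood_pressure is stored in ~/MyBookFiles.        */
/* The libref name mybook is hypothetical.                       */
libname mybook "~/MyBookFiles";

title "First 10 Observations of blood_pressure";
proc print data=mybook.blood_pressure(obs=10);
run;
title;   /* clear the title so it does not carry over */
```

Removing the OBS= data set option lists the entire data set.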

For the last example, I dragged a SAS program into the editor. It appears exactly the same as if I opened it in my stand-alone version of SAS.

At this point, you can run the program or continue to write more SAS code. By the way, the tilde (~) used in the INFILE statement is a shortcut for your home directory. Follow it with the folder name and the file name.
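As a rough illustration of the tilde shortcut, the sketch below reads the text file mentioned earlier. The folder and file names come from this post, but the variable names and the layout of the file are assumptions made up for the example, so adjust the INPUT statement to match your own data.

```sas
/* ~ expands to your home directory in SAS Studio in the cloud.  */
/* Assumption: the file holds four blank-delimited values per    */
/* line; the variable names are hypothetical.                    */
data work.blank_demo;
   infile "~/MyBookFiles/Blank_Delimiter.txt";
   input Gender $ Age Height Weight;
run;
```

Folder and file names are case sensitive on the cloud server, so type them exactly as they appear in the Files tab.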

You can read more about SAS Studio in the cloud in my latest book, Getting Started with SAS Programming: Using SAS Studio in the Cloud.

Viewing files, programs, and data sets in SAS Studio was published on SAS Users.

December 14, 2020
 

Do you need to see how long patients have been treated for? Would you like to know if a patient’s dose has changed, or if the patient experienced any dose interruptions? If so, you can use a Napoleon plot, also known as a swimmer plot, in conjunction with your exposure data set to find your answers. We demonstrate how to find the answer in our recent book SAS® Graphics for Clinical Trials by Example.

You may be wondering what a Napoleon plot is. Have you ever heard of the map of Napoleon’s Russian campaign? It was a map that displayed six types of data, such as troop movement, temperature, latitude, and longitude, on one graph (Wikipedia). In the clinical setting, we try to mimic this approach by displaying several different types of safety data on one graph: hence, the name “Napoleon plot.” The plot is also known as a swimmer plot because each patient has a row in which their data is displayed, and the rows look like swimming lanes.

Code

Now that you know what a Napoleon plot is, how do you produce it? In essence, you are merely writing GTL code to produce the graph you need. The key GTL statements for a Napoleon plot are DISCRETEATTRMAP, HIGHLOWPLOT, SCATTERPLOT, and DISCRETELEGEND. Other plot statements are used as well, but these four appear in nearly every Napoleon plot. In our recent book, one of the chapters carefully walks you through each step to show you how to produce the Napoleon plot. Program 1, below, gives a small teaser of some of the code used to produce the Napoleon plot.

Program 1: Code for Napoleon Plot That Highlights Dose Interruptions

         discreteattrmap name = "Dose_Group";
            value "54" / fillattrs = (color = orange)
                         lineattrs = (color = orange pattern = solid);
            value "81" / fillattrs = (color = red)
                         lineattrs = (color = red pattern = solid);
         enddiscreteattrmap;

         discreteattrvar attrvar = id_dose_group var = exdose attrmap = "Dose_Group";

         legenditem type = marker name = "54_marker" /
            markerattrs = (symbol = squarefilled color = orange)
            label = "Xan 54mg";

         < Other legenditem statements >

         layout overlay / yaxisopts = (type = discrete
                                       display = (line label)
                                       label = "Patient");

            highlowplot y = number
                        high = eval(aendy/30.4375)
                        low = eval(astdy/30.4375) /
               group = id_dose_group
               type = bar
               lineattrs = graphoutlines
               barwidth = 0.2;
            scatterplot y = number x = eval((max_aendy + 10)/30.4375) /
               markerattrs = (symbol = completed size = 12px);
            discretelegend "54_marker" "81_marker" "completed_marker" /
               type = marker
               autoalign = (bottomright) across = 1
               location = inside title = "Dose";
         endlayout;
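
Program 1 is an excerpt, so it will not run on its own. To show where statements like these sit, here is a stripped-down sketch of a complete template and the PROC SGRENDER step that draws it. It is not the book's program: the data values, the data set name work.exposure, the template name swimmer_sketch, and the x-axis label are all made up for illustration, and the legend and completion markers are left out. Dividing study days by 30.4375 (365.25/12) converts days to months.

```sas
/* Hypothetical exposure data: one row per patient per dosing interval. */
/* EXDOSE is read as character so it matches the quoted attrmap values. */
data work.exposure;
   input number exdose $ astdy aendy;
datalines;
1 54 1 90
1 81 105 240
2 54 1 180
3 81 1 120
;

proc template;
   define statgraph swimmer_sketch;          /* hypothetical template name */
      begingraph;
         /* Map each dose value to a fixed color */
         discreteattrmap name = "Dose_Group";
            value "54" / fillattrs = (color = orange)
                         lineattrs = (color = orange pattern = solid);
            value "81" / fillattrs = (color = red)
                         lineattrs = (color = red pattern = solid);
         enddiscreteattrmap;
         discreteattrvar attrvar = id_dose_group var = exdose
                         attrmap = "Dose_Group";

         layout overlay / yaxisopts = (type = discrete label = "Patient")
                          xaxisopts = (label = "Months on Treatment");
            /* One horizontal bar per dosing interval, in months */
            highlowplot y = number
                        high = eval(aendy/30.4375)
                        low = eval(astdy/30.4375) /
               group = id_dose_group
               type = bar
               barwidth = 0.2;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data = work.exposure template = swimmer_sketch;
run;
```

In this made-up data, patient 1 has a gap between day 90 and day 105, which should appear as white space between the orange and red bars.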

Output

Without further ado, Output 1 shows you an example of a Napoleon plot. You can see that there are many patients, and so the patient labels have been suppressed. You also see that the patient who has been on the study the longest has a dose delay, indicated by the white space between the red and orange bars. While this example illustrates a simple Napoleon plot with only two types of data, dose exposure and treatment, the book has more complex examples of swimmer plots.

Output 1: Napoleon Plot that Highlights Dose Interruptions

Napoleon plot with orange and red bars showing dose exposure and treatment

How to create a Napoleon plot with Graph Template Language (GTL) was published on SAS Users.

September 22, 2020
 

Everyone knows that SAS has been helping programmers and coders build complex machine learning models and solve complex business problems for many years, but did you know that you can now also build machine learning models without a single line of code using SAS Viya?

Building on the vision and commitment to democratize analytics, SAS Viya offers multiple ways to support non-programmers and empowers people with no programming skills to get up and running quickly and build machine learning models. I touched on some of the ways this can be done via SAS Visual Analytics in my previous post on analytics for everyone with SAS Viya. In addition, SAS Viya also supports more advanced pipeline-based visual modeling via SAS Visual Data Mining and Machine Learning. The combination of these different tools within SAS Viya supporting a low-code/no-code approach to modeling makes SAS Viya an incredibly flexible and powerful analytics platform that can help drive analytics usage and adoption throughout an organization.

As analytics and machine learning become more pervasive, an analytics platform that supports a low-code/no-code approach can get more people involved, drive ongoing innovations, and ultimately accelerate digital transformation throughout an organization.

Speed

I have met my fair share of coding ninjas who blew me away with their ability to build models using keyboards with lightning speed. But when it comes to being able to quickly get an idea into a model and generate all the assessment statistics and charts, there is nothing quite like a visual approach to building machine learning models.

In SAS Viya, you can build a decision tree model literally just by dragging and dropping the relevant variables onto the canvas as shown in the animated screen flow below.

Building a machine learning model via drag and drop

In this case, we were able to quickly build a decision tree model that predicts child mortality rates around the world. Not only do we get the decision tree in all its graphical glory (on the left-hand side of the image), we also get the overall model fit measure (Average Squared Error in this case), a variable importance chart, and a lift chart, all in under 5 seconds and without having to enter a single line of code!

You also get a bunch of detailed statistical outputs, including a detailed node statistics table, without having to do anything extra. This is useful when you need to review the distribution and characteristics of specific nodes in the decision tree.

Detailed node statistics table

 

What’s more, you can leverage the same drag-and-drop paradigm to quickly tune the model. In our case, you can make simple modifications, like adding a new variable by dragging a new data item onto the canvas, or apply more complex techniques, like manually splitting or pruning a node just by clicking and selecting a node on the canvas. The whole model and visualization refresh instantly as you make changes, and you get instant feedback on the outputs of your tuning actions, which can help drive rapid iteration and idea testing.

Governance and collaboration

A graphical and components-based approach to modeling also has the added benefits of providing a stronger level of governance and fostering collaboration. Building machine learning models is often a team sport, and the ability to share and reuse models easily can dramatically reduce the cost and effort involved in building and maintaining models.

SAS Visual Data Mining and Machine Learning enables users to build complex, enterprise-grade pipeline models that support sophisticated variable selection, feature engineering techniques, as well as model comparison processes all within a single, easy-to-understand, pipeline-based design framework.

Pipeline modeling using SAS VDMML

The graphical, pipeline-based modeling framework within SAS Visual Data Mining and Machine Learning leverages common components, supports self-documentation, and allows users to leverage a template-based approach to building and sharing machine learning models quickly.

More importantly, as a new user or team member who needs to review, tune or reuse someone else’s model, it is much easier and quicker to understand the design and intent of the various components of a pipeline model and make the needed changes.

Communication and storytelling

Finally, and perhaps most importantly, a graphical, low-code/no-code approach to building machine learning models makes it much easier to communicate both the intent and potential impact of the model. Figures and numbers represent facts, but narratives and stories convey emotion and build connections. The visual modeling approaches supported by SAS Viya enable you to tell compelling stories, share powerful ideas, and inspire valuable actions.

SAS Viya enables you to make changes and apply filters on the fly within its various visual modeling environments. With the model training process and model outputs all represented visually, it makes it extremely easy to discuss business scenarios, test hypotheses, and test modeling strategies and approaches, even with people without a deep machine learning background.

There is no question that a programmatic approach to building machine learning models offers the ultimate power and flexibility and enables data scientists to build the most complex and advanced machine learning models. But when it comes to speed, governance, and communication, a graphical, low-code/no-code approach to building machine learning models definitely has a lot to offer.

To learn more about a low-code/no-code approach to building machine learning models using SAS Viya, check out my book Smart Data Discovery Using SAS® Viya®.

The value of a low-code/no-code approach to building machine learning models was published on SAS Users.

August 25, 2020
 

Analytics is playing an increasingly strategic role in the ongoing digital transformation of organizations today. However, to succeed and scale your digital transformation efforts, it is critical to enable analytics skills at all tiers of your organization. In a recent blog post covering 4 principles of analytics you cannot ignore, SAS COO Oliver Schabenberger articulated the importance of democratizing analytics. By scaling your analytics efforts beyond traditional data science teams and involving more people with strong business domain knowledge, you can gain more valuable insights and make more significant impacts.

SAS Viya was built from the ground up to fulfill this vision of democratizing analytics. At SAS, we believe analytics should be accessible to everyone. While SAS Viya offers tremendous support and will continue to be the tool of choice for many advanced users and programmers, it is also highly accessible for business analysts and insights teams that prefer a more visual approach to analytics and insights discovery.

Self-service data management

First of all, SAS Viya makes it easy for anyone to ingest and prepare data without a single line of code. The integrated data preparation components within SAS Viya support ad-hoc, agile-oriented data management tasks where you can profile, cleanse, and join data easily and rapidly.

Automatically Generated Data Profiling Report

You can execute complex joins, create custom columns, and cleanse your data via a completely drag-and-drop interface. The automation built into SAS Viya eases the often tedious task of data profiling and data cleansing via automated data type identification and transform suggestions. In an area that can be both complex and intimidating, SAS Viya makes data management tasks easy and approachable, helping you to analyze more data and uncover more insights.

Data Join Using a Visual Interface

A visual approach supporting low-code and no-code programming

Speaking of no-code, SAS Viya’s visual approach and support extend deep into data exploration and advanced modeling. Not only can you quickly build charts such as histograms and box plots using a drag and drop interface, but you can also build complex machine learning models using algorithms such as decision trees and logistic regression on the same visual canvas.

Building a Decision Tree Model Using SAS Viya

By putting appropriate guardrails in place and providing relevant, context-rich help for the user, SAS Viya empowers users to undertake data analysis using other advanced analytics techniques such as forecasting and correlation analysis. These techniques empower users to ask more complex questions and can potentially help uncover more actionable and valuable insights.

Correlation Analysis Using the Correlation Matrix within SAS Viya

Augmented analytics

Augmented analytics is an emerging area of analytics that leverages machine learning to streamline and automate the process of doing analytics and building machine learning models. SAS Viya leverages augmented analytics throughout the platform to automate various tasks. My favorite use of augmented analytics in SAS Viya, though, is the hyperparameters autotuning feature.

In machine learning, hyperparameters are parameters that you need to set before the learning process can begin. They are used only during training, and they contribute significantly to how the model is trained. It can often be challenging to find the optimal hyperparameter settings, especially if you are not an experienced modeler. This is where SAS Viya can help by making building machine learning models easier for everyone, one hyperparameter at a time.

Here is an example of using the SAS Viya autotuning feature to improve my decision tree model. Using the autotuning window, all I needed to do was tell SAS Viya how long I want the autotuning process to run for. It will then work its magic and determine the best hyperparameters to use, which, in this case, include the Maximum tree level and the number of Predictor bins. In most cases, you get a better model after coming back from getting a glass of water!

Hyperparameters Autotuning in SAS Viya

Under the hood, SAS Viya uses complex optimization techniques to try to find the best hyperparameter combinations to use, all without you having to understand how it manages this impressive feat. I should add that hyperparameter autotuning is supported with many other algorithms in SAS Viya, and you have even more autotuning options when using it via the programmatic interface!

By leveraging a visually oriented framework and augmented analytics capabilities, SAS Viya is making analytics easier and machine learning models more accessible for everyone within an organization. For more on how SAS Viya enables everyone to ask more complex questions and uncover more valuable insights, check out my book Smart Data Discovery Using SAS® Viya®.

Analytics for everyone with SAS Viya was published on SAS Users.

August 10, 2020
 

The most fundamental concept that students learning introductory SAS programming must master is how SAS handles data. This might seem like an obvious statement, but it is often overlooked by students in their rush to produce code that works. I often tell my class to step back for a moment and "try to think like SAS" before they even touch the keyboard. There are many key topics that students must understand in order to be successful SAS programmers. How does SAS compile and execute a program? What is the built-in loop that SAS uses to process data observation by observation? What are the coding differences when working with numeric and character data? How does SAS handle missing observations?

One concept that is a common source of confusion for students is how to tell SAS to treat rows versus columns. An example that we use in class is how to write a program to calculate a basic descriptive statistic, such as the mean. The approach that we discuss is to identify our goal, rows or columns, and then decide what SAS programming statements are appropriate by thinking like SAS. First, we decide if we want to calculate the mean of an observation (a row) or the mean of a variable (a column). We also pause to consider other issues such as the type of variable, in this case numeric, and how SAS evaluates missing data. Once these concepts are understood we can proceed with an appropriate method: using DATA step programming, a procedure such as MEANS, TABULATE, REPORT or SQL, and so on. For more detailed information about this example there is an excellent user group paper on this topic called "Many Means to a Mean" written by Shannon Pileggi for the Western Users of SAS Software conference in 2017. In addition, The Little SAS® Book and its companion book, Exercises and Projects for the Little SAS® Book, Sixth Edition address these types of topics in easy-to-understand examples followed up with thought-provoking exercises.

Here is an example of the type of question that our book of exercises and projects uses to address this type of concept.

Short answer question

  1. Is there a difference between calculating the mean of three variables X1, X2, and X3 using the three methods as shown in the following examples of code? Explain your answer.
    Avg1 = MEAN(X1,X2,X3);
    Avg2 = (X1 + X2 + X3) / 3;
    PROC MEANS; VAR X1 X2 X3; RUN;

Solution

In the book, we provide solutions for odd-numbered multiple choice and short answer questions, and hints for the programming exercises. Here is the solution for this question:

  1. The variable Avg1 that uses the MEAN function returns the mean of nonmissing arguments and will provide a mean value of X1, X2, and X3 for each observation (row) in the data set. The variable Avg2 that uses an arithmetic equation will also calculate the mean for each observation (row), but will return a missing value if any of the variables for that observation have a missing value. Using PROC MEANS will calculate the mean of nonmissing data for each variable (column) X1, X2, and X3 vertically.
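
To see the difference for yourself, you could run a small test like the sketch below. The data set name work.means_demo and the data values are made up for illustration; the second observation has a missing value for X2.

```sas
/* Hypothetical data: the second row has a missing X2 */
data work.means_demo;
   input X1 X2 X3;
   Avg1 = mean(X1, X2, X3);      /* row mean of the nonmissing values    */
   Avg2 = (X1 + X2 + X3) / 3;    /* missing if any of X1-X3 is missing   */
datalines;
1 2 3
4 . 6
;

title "Row Means: MEAN Function Versus Arithmetic";
proc print data=work.means_demo noobs;
run;

title "Column Means from PROC MEANS";
proc means data=work.means_demo mean;
   var X1 X2 X3;
run;
title;
```

For the first row, both Avg1 and Avg2 are 2. For the second row, Avg1 is 5 (the mean of 4 and 6), while Avg2 is missing and the log notes that missing values were generated. PROC MEANS works down the columns instead and reports means of 2.5 for X1, 2 for X2 (from its one nonmissing value), and 4.5 for X3.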

For more information about The Little SAS Book and its companion book of exercises and projects, check out these blogs:

Learning to think like SAS was published on SAS Users.

June 11, 2020
 

Whether you enjoy debugging or hate it, for programmers, debugging is a fact of life. It’s easy to misspell a keyword, scramble your array subscripts, or (heaven forbid!) forget a semicolon. That’s why we include a chapter on debugging in The Little SAS® Book and its companion book, Exercises and Projects for the Little SAS® Book. We believe that learning to debug makes you a better programmer. Once you understand a bug, you will be better prepared to avoid it in the future.

To help hone your debugging skills, here is an example of the type of problems you can find in our book of exercises and projects. See if you can find the bugs.

Programming exercise

  1. A friend tells you that she is learning SAS and wrote the following program. Unfortunately, the program won’t run. Help her improve her programming skills by finding the mistakes.

TITLE Height, Weight, and BMI;
TITLE2 by Sex and Age Group;
PROC CONTENT DATA = SASHELP.class; RUN;
DATA; SET SASHELP.class;
Height_m = Heigth * 0.0254;
Weight_kg = Weight * 0.4536;
BMI = Weight_kg / Height_m**2;
PROC FORMAT; VALUE
$sex 'M' = 'Boys' 'F' = 'Girls';
VALUE agegp 11-12 = 'Preteens
13-16 = 'Teens';
PROC TABULATE;
CLASS Sex Age; VAR Height_m Weight_kg;
TABLES (Height_m Weight_kg BMI)*
MEAN, Sex Age ALL;
FORMAT Sex $sex. Age agegp.;
RUN;
QUIT;

 

  • a. Examine the SAS data set SASHELP.CLASS including variable attributes.
  • b. Clean up the formatting of the program by adding appropriate indentation and line spacing to show the structure of the DATA and PROC steps. Make changes as needed to make the program conform to standard best practices.
  • c. Fix any errors in the code so that the program will run correctly.
  • d. Add comments to the revised program for each bug that you fix so that your friend can understand her mistakes.

Solution

In the book, we provide solutions for odd-numbered multiple choice and short answer questions, and hints for the programming exercises. So here is a hint for this exercise:

  1. Hint: This program contains four bugs. It also contains “red herrings” that are unusual for SAS code, but nonetheless do run properly and so are not actual bugs. Be sure you know how SAS handles data set names by default. SAS Enterprise Guide can format code for you; right-click the Program window and select Format Code from the pop-up menu. To format code in SAS Studio, click the Format Code icon at the top of the Program window.
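
As a refresher on those default data set names, here is a small sketch that is separate from the exercise program. It assumes a fresh SAS session, so that no WORK.DATA1 exists yet.

```sas
/* A DATA statement with no name still creates a data set.       */
/* SAS names it WORK.DATA1, then WORK.DATA2, and so on.          */
data;
   set sashelp.class;
run;

/* With no DATA= option, a procedure uses the most recently      */
/* created data set (the automatic _LAST_ data set).             */
title "Listing of the Most Recently Created Data Set";
proc print;
run;
title;
```

Relying on these defaults is legal, but naming data sets explicitly and using DATA= in every procedure makes a program much easier to follow.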

For more about The Little SAS Book and its companion book of exercises and projects, check out these blogs:

What's wrong with this code? was published on SAS Users.

May 29, 2020
 

While working at the Rutgers Robert Wood Johnson Medical School, I had access to data on over ten million visits to emergency departments in central New Jersey, including ICD-9 (International Classification of Diseases, Ninth Revision) codes along with some patient demographic data.

I also had the ozone level from several central New Jersey monitoring stations for every hour of the day for ten years. I used PROC REG (and ARIMA) to assess the association between ozone levels and the number of admissions to emergency departments diagnosed as asthma. Some of the predictor variables, besides ozone level, were pollen levels and a dichotomous variable indicating if the date fell on a weekend. (On weekdays, patients were more likely to visit their personal physician than on a weekend.) The study showed a significant association between ozone levels and asthma attacks.

It would have been nice to have the incredible diagnostics that are now produced when you run PROC REG. Imagine if I had SAS Studio back then!

In the program, I used a really interesting trick. (Thank you Paul Grant for showing me this trick so many years ago at a Boston Area SAS User Group meeting.) Here's the problem: there are many possible codes such as 493, 493.9, 493.100, 493.02, and so on that all relate to asthma. The straightforward way to check an ICD-9 code would be to use the SUBSTR function to pick off the first three digits of the code. But why be straightforward when you can be tricky or clever? (Remember Art Carpenter's advice to write clever code that no one can understand so they can't fire you!)

The following program demonstrates the =: operator:

*An interesting trick to read ICD codes;
data ICD_9;
  input ICD : $7. @@;
  if ICD =: "493" then output;
datalines;
493 770.6 999 493.9 493.90 493.100
;
title "Listing of All Asthma Codes";
proc print data=ICD_9 noobs;
run;

 

Normally, when SAS compares two strings of different lengths, it pads the shorter string with blanks to match the length of the longer string before making the comparison. The =: operator truncates the longer string to the length of the shorter string before making the comparison.
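
To compare the two behaviors side by side, the following sketch flags each code three ways: with the ordinary = operator, with the =: operator, and with the SUBSTR approach mentioned earlier. The data set name work.compare_demo and the flag variable names are made up for illustration.

```sas
*Compare =, =:, and SUBSTR on the same ICD codes;
data work.compare_demo;
  input ICD : $7. @@;
  Exact     = (ICD = "493");                /* "493" is padded with blanks    */
  Prefix    = (ICD =: "493");               /* ICD is truncated to 3 chars    */
  ViaSubstr = (substr(ICD, 1, 3) = "493");  /* the straightforward equivalent */
datalines;
493 770.6 999 493.9 493.90 493.100
;
title "Three Ways to Test for Asthma Codes";
proc print data=work.compare_demo noobs;
run;
title;
```

Exact is 1 only for the code 493 itself. Prefix and ViaSubstr agree with each other: both are 1 for 493, 493.9, 493.90, and 493.100, and 0 for 770.6 and 999.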

The usual reason to write a SAS blog is to teach some aspect of SAS programming or to just point out something interesting about SAS. While that is usually my motivation, I have an ulterior motive in writing this blog – I want to plug a new book I have just published on Amazon. It's called 10-8 Awaiting Crew: Memories of a Volunteer EMT. One of the chapters discusses the difficulty of conducting statistical studies in pre-hospital settings. This was my first attempt at a non-technical book. I hope you take a look. (Enter "10-8 awaiting crew" or "Ron Cody" in Amazon search to find the book.) Drop me an email with your thoughts at ron.cody@gmail.com.

Using SAS to estimate the link between ozone and asthma (and a neat trick) was published on SAS Users.

5月 292020
 

While working at the Rutgers Robert Wood Johnson Medical School, I had access to data on over ten million visits to emergency departments in central New Jersey, including ICD-9 (International Classification of Disease – 9th edition) codes along with some patient demographic data.

I also had the ozone level from several central New Jersey monitoring stations for every hour of the day for ten years. I used PROC REG (and ARIMA) to assess the association between ozone levels and the number of admissions to emergency departments diagnosed as asthma. Some of the predictor variables, besides ozone level, were pollen levels and a dichotomous variable indicating if the date fell on a weekend. (On weekdays, patients were more likely to visit the personal physician than on a weekend.) The study showed a significant association between ozone levels and asthma attacks.

It would have been nice to have the incredible diagnostics that are now produced when you run PROC REG. Imagine if I had SAS Studio back then!

In the program, I used a really interesting trick. (Thank you Paul Grant for showing me this trick so many years ago at a Boston Area SAS User Group meeting.) Here's the problem: there are many possible codes such as 493, 493.9, 493.100, 493.02, and so on that all relate to asthma. The straightforward way to check an ICD-9 code would be to use the SUBSTR function to pick off the first three digits of the code. But why be straightforward when you can be tricky or clever? (Remember Art Carpenter's advice to write clever code that no one can understand so they can't fire you!)

The following program demonstrates the =: operator:

*An interesting trick to read ICD codes;
data ICD_9;
  input ICD : $7. @@;
  if ICD =: "493" then output;
datalines;
493 770.6 999 493.9 493.90 493.100
;
title "Listing of All Asthma Codes";
proc print data=ICD_9 noobs;
run;

 

Normally, when SAS compares two strings of different lengths, it pads the shorter string with blanks to match the length of the longer string before making the comparison. The =: operator instead truncates the longer string to the length of the shorter string before making the comparison.
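To see the difference side by side, here is a minimal sketch (the data set name Compare and the three flag variables are made up for illustration) that applies an ordinary = comparison, the =: operator, and the SUBSTR approach to the same values:

```sas
*Compare =, =:, and SUBSTR on the same ICD values;
data Compare;
  input ICD : $7. @@;
  Exact  = (ICD = "493");               /* shorter string is padded with blanks */
  Prefix = (ICD =: "493");              /* longer string is truncated to 3 characters */
  First3 = (substr(ICD,1,3) = "493");   /* the straightforward alternative */
datalines;
493 493.9 49 770.6
;
title "Exact Match Versus Begins-With Match";
proc print data=Compare noobs;
run;
```

Exact is 1 only for the value 493, while Prefix and First3 are both 1 for 493 and 493.9 and 0 for 49 and 770.6.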

The usual reason to write a SAS blog is to teach some aspect of SAS programming or to just point out something interesting about SAS. While that is usually my motivation, I have an ulterior motive in writing this blog – I want to plug a new book I have just published on Amazon. It's called 10-8 Awaiting Crew: Memories of a Volunteer EMT. One of the chapters discusses the difficulty of conducting statistical studies in pre-hospital settings. This was my first attempt at a non-technical book. I hope you take a look. (Enter "10-8 awaiting crew" or "Ron Cody" in Amazon search to find the book.) Drop me an email with your thoughts at ron.cody@gmail.com.

Using SAS to estimate the link between ozone and asthma (and a neat trick) was published on SAS Users.

March 5, 2020
 

Have you heard that SAS offers a collection of new, high-performance CAS procedures that are compatible with a multi-threaded approach? The free e-book Exploring SAS® Viya®: Data Mining and Machine Learning is a great resource to learn more about these procedures and the features of SAS® Visual Data Mining and Machine Learning. Download it today and keep reading for an excerpt from this free e-book!

In SAS Studio, you can access tasks that help automate your programming so that you do not have to manually write your code. However, there are three options for manually writing your programs in SAS® Viya®:

  1. SAS Studio provides a SAS programming environment for developing and submitting programs to the server.
  2. Batch submission is also still an option.
  3. Open-source languages such as Python, Lua, and Java can submit code to the CAS server.

In this blog post, you will learn the syntax for two of the new, advanced data mining and machine learning procedures: PROC TEXTMINE and PROC TMSCORE.

Overview

The TEXTMINE and TMSCORE procedures integrate the functionalities from both natural language processing and statistical analysis to provide essential functionalities for text mining. The procedures support essential natural language processing (NLP) features such as tokenizing, stemming, part-of-speech tagging, entity recognition, customized stop list, and so on. They also support dimensionality reduction and topic discovery through Singular Value Decomposition.

In this example, you will learn about some of the essential functionalities of PROC TEXTMINE and PROC TMSCORE by using a text data set containing 1,830 Amazon reviews of electronic gaming systems. The data set is named Amazon. You can find similar data sets of Amazon reviews at http://jmcauley.ucsd.edu/data/amazon/.

PROC TEXTMINE

The Amazon data set has already been loaded into CAS. The review content is stored in the variable ReviewBody, and we generate a unique review ID for each review. In the procedure call shown in Program 1, we ask PROC TEXTMINE to do three tasks:

  1. parse the documents in the input table and generate the term-by-document matrix
  2. perform dimensionality reduction via Singular Value Decomposition
  3. perform topic discovery based on Singular Value Decomposition results
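Program 1 assumes that the librefs mycaslib and mylib already exist. One way to set them up is sketched here; the session name, the Casuser caslib, and the folder path are assumptions that you would replace with your own:

```sas
cas mysession;                          /* start a CAS session; the name is arbitrary */
libname mycaslib cas caslib=casuser;    /* CAS engine libref; Casuser is an assumed caslib */
libname mylib "/path/to/your/data";     /* hypothetical folder holding amazon and engstop */
```

With these in place, the two DATA steps at the top of Program 1 copy the tables from mylib into CAS.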

Program 1: PROC TEXTMINE

data mycaslib.amazon;
    set mylib.amazon;
run;

data mycaslib.engstop;
    set mylib.engstop;
run;

proc textmine data=mycaslib.amazon;
    doc_id id;
    var reviewbody;

 /*(1)*/  parse reducef=2 entities=std stoplist=mycaslib.engstop 
          outterms=mycaslib.terms outparent=mycaslib.parent
          outconfig=mycaslib.config;

 /*(2)*/  svd k=10 svdu=mycaslib.svdu outdocpro=mycaslib.docpro
          outtopics=mycaslib.topics;

run;

(1) The first task (parsing) is specified in the PARSE statement. Parameter “reducef” specifies the minimum number of times a term needs to appear in the text to be included in the analysis. Parameter “stoplist” specifies a table of terms to be excluded from the analysis, such as “the”, “this”, and “that”. Outparent is the output table that stores the term-by-document matrix, and Outterms is the output table that stores information about the terms that are included in the term-by-document matrix. Outconfig is the output table that stores configuration information for future scoring.

(2) Tasks 2 and 3 (dimensionality reduction and topic discovery) are specified in the SVD statement. Parameter K specifies the desired number of dimensions and number of topics. Parameter SVDU is the output table that stores the U matrix from SVD calculations, which is needed in future scoring. Parameter OutDocPro is the output table that stores the new matrix with reduced dimensions. Parameter OutTopics specifies the output table that stores the topics discovered.

Click the Run shortcut button or press F3 to run Program 1. The terms table shown in Output 1 stores the tagging, stemming, and entity recognition results. It also stores the number of times each term appears in the text data.
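If you want to look at the output tables yourself, a simple sketch is to print them. The table names come from Program 1; the titles and the choice of 10 rows are arbitrary, and because CAS tables have no guaranteed row order, the rows you see may differ from run to run:

```sas
title "Sample Rows from the Terms Table";
proc print data=mycaslib.terms(obs=10);
run;

title "Topics Discovered by the SVD Statement";
proc print data=mycaslib.topics;
run;
title;
```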

Output 1: Results from Program 1

PROC TMSCORE

PROC TEXTMINE is used with large training data sets. When you have new documents coming in, you do not need to re-run all the parsing and SVD computations with PROC TEXTMINE. Instead, you can use PROC TMSCORE to score new text data. The scoring procedure parses the new document(s) and projects the text data into the same dimensions using the SVD weights derived from the original training data.

In order to use PROC TMSCORE to generate results consistent with PROC TEXTMINE, you need to provide the following tables generated by PROC TEXTMINE:

  • SVDU table – provides the required information for projection into the same dimensions.
  • Config table – provides parameter values for parsing.
  • Terms table – provides the terms that should be included in the analysis.

Program 2 shows an example of TMSCORE. It uses the same input data layout used for PROC TEXTMINE code, so it will generate the same docpro and parent output tables, as shown in Output 2.

Program 2: PROC TMSCORE

proc tmscore data=mycaslib.amazon svdu=mycaslib.svdu
        config=mycaslib.config terms=mycaslib.terms
        svddocpro=mycaslib.score_docpro outparent=mycaslib.score_parent;
    var reviewbody;
    doc_id id;
run;

 

Output 2: Results from Program 2
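Scoring documents that arrive later follows the same pattern. The sketch below assumes a hypothetical CAS table, mycaslib.new_reviews, with the same id and reviewbody columns as the training data; the output table names are also made up:

```sas
*Score a hypothetical table of new reviews;
proc tmscore data=mycaslib.new_reviews svdu=mycaslib.svdu
        config=mycaslib.config terms=mycaslib.terms
        svddocpro=mycaslib.new_docpro outparent=mycaslib.new_parent;
    var reviewbody;
    doc_id id;
run;
```

Only the DATA= table and the two output tables change; the SVDU, Config, and Terms tables are the ones produced by Program 1.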

To learn more about advanced data mining and machine learning procedures available in SAS Viya, including PROC FACTMAC, PROC TEXTMINE, and PROC NETWORK, you can download the free e-book, Exploring SAS® Viya®: Data Mining and Machine Learning. Exploring SAS® Viya® is a series of e-books that are based on content from SAS® Viya® Enablement, a free course available from SAS Education. You can follow along with examples in real time by watching the videos.

 

Learn about new data mining and machine learning procedures in SAS Viya was published on SAS Users.