sas books

7月 272021
 

In the past, the COMPRESS function was useful. Since SAS version 9, it has become a blockbuster, and you might not have noticed. The major change was the addition of a new optional parameter called MODIFIERS.

The traditional use of the COMPRESS function was to remove blanks or a list of selected characters from a character string. The addition of a MODIFIER argument does two things. First, you can specify classes of characters to remove, such as all letters, all punctuation marks, or all digits. That is extremely useful, but the addition of the 'k' modifier is why I used the term blockbuster in my description. The 'k' modifier flips the function from one that removes characters from a string to one that keeps a list of characters and removes everything else. Let me show you some examples.

This first example stems from a real problem I encountered while trying to read values that contained units. My data looked something like this:

ID     Weight 
001    100lbs.
002     59Kgs.
003    210LBS
004    83kg

My goal was to create a variable called Wt that represented the person's weight in pounds as a numeric value.

First, let’s look at the code. Then, I’ll give an explanation.

data Convert;
   length ID $3 Weight $8;
   input ID Weight;
 
   Wt = input(compress(Weight,,'kd'),8.);
   /* The COMPRESS function uses two modifiers, 'k' and 'd'.  This means
      keep the digits, remove anything else.  The INPUT function does the
      character-to-numeric conversion.
   */
 
   If findc(Weight,'k','i') then Wt = Wt * 2.2;
 
   /* the FINDC function is looking for an upper or lowercase 'k' in the
      original character string.  If found, it converts the value in
      kilograms to pounds (note: 1 kg = 2.2 pounds).
   */
 
datalines;
001    100lbs.
002     59Kgs.
003    210LBS
004    83kg
;
title "Listing of Data Set Convert";
footnote "This program was run using SAS OnDemand for Academics";
proc print data=Convert noobs;
run;

The program reads the value of Weight as a character string. The COMPRESS function uses 'k' and 'd' as modifiers. Notice the two commas in the list of arguments. A single comma would interpret 'kd' as the second argument (the list of characters to remove). Including two commas notifies the function that 'kd' is the third argument (modifiers). You can list these modifiers in any order, but I like to use 'kd', and I think of it as "keep the digits." What remains is the string of digits. The INPUT function does the character-to-numeric conversion.

Your next step is to figure out if the original value of Weight contained an upper or lowercase 'k'. The FINDC function can take three arguments: the first is the string that you are examining, the second is a list of characters that you are searching for, and the third argument is the 'i' modifier that says, "ignore case" (very useful).

If the original character string (Weight) contains an uppercase or lowercase 'k', you convert from kilograms to pounds.

Here is the output:

There is one more useful application of the COMPRESS function that I want to discuss. Occasionally, you might have a text file in ASCII or EBCDIC that contains non-printing characters (usually placed there in error). Suppose you want just the digits, decimal points (periods), blanks, and commas. You need to read the original value as a text string. Let's call the original string Contains_Junk. All you need to convert these values is one line of code like this:

Valid = compress(Contains_Junk,'.,','kdas');

In this example, you are using all three arguments of the COMPRESS function. As in pre-9 versions of SAS, the second argument is a list of characters that you want to remove. However, because the third argument (modifiers) contains a 'k', the second argument is a list of characters that you want to keep. In addition to periods and commas, you use modifiers to include all digits, uppercase and lowercase letters (the 'a' modifier - 'a' for alpha), and space characters (these include spaces, tabs, and a few others such as carriage returns and linefeeds). If you did not want to include tabs and other "white space" characters, you could rewrite this line as:

Valid = compress(Contains_Junk,'., ','kd');

Here you are including a blank in the second argument and omitting the 's' in the modifier list.

You can read more about the COMPRESS function in any of the following books, available from SAS Press as an e-book or from Amazon in print form:

Or my latest programming book:

 

Questions and/or comments are welcome.

The Amazing COMPRESS Function was published on SAS Users.

7月 212021
 

In my new book, I explain how segmentation and clustering can be accomplished in three ways: coding in SAS, point-and-click in SAS Visual Statistics, and point-and-click in SAS Visual Data Mining and Machine Learning using SAS Model Studio. These three analytical tools allow you to do many diverse types of segmentation, and one of the most common methods is clustering. Clustering is still among the top 10 machine learning methods used based on several surveys across the globe.

One of the best methods for learning about your customers, patrons, clients, or patients (or simply observations in almost any data set) is to perform clustering to find clusters that have similar within-cluster characteristics and each cluster has differing combinations of attributes. You can use this method to aid in understanding your customers or profile various data sets. This can be done in an environment where SAS and open-source software work in a unified platform seamlessly. (While open source is not discussed in my book, stay tuned for future blog posts where I will discuss more fun and exciting things that should be of interest to you for clustering and segmentation.)

Let’s look at an example of clustering. The importance of looking at one’s data quickly and easily is a real benefit when using SAS Visual Statistics.

Initial data exploration and preparation

To demonstrate the simplicity of clustering in SAS Visual Statistics, the data set CUSTOMERS is used here and also throughout the book. I have loaded the CUSTOMERS data set into memory, and it is now listed in the active tab. I can easily explore and visualize this data by right-mouse-clicking and selecting Actions and then Explore and Visualize. This will take you to the SAS Visual Analytics page.

I have added four new compute items by taking the natural logarithm of four attributes and will use these newly transformed attributes in a clustering.

Performing simple clustering

Clustering in SAS Visual Statistics can be found by selecting the Objects icon on the left and scrolling down to see the SAS Visual Statistics menus as seen below. Dragging the Cluster icon onto the Report template area will allow you to use that statistic object and visualize the clusters.

Once the Cluster object is on the template, adding data items to the Data Roles is simple by checking the four computed data items.

Click the OK icon, and immediately the four data items that are being clustered will look like the report below where five clusters were found using the four data items.

There are 105,456 total observations in the data set, however, only 89,998 were used for the analysis. Some observations were not used due to the natural logarithm not being able to be computed. To see how to handle that situation easily, please pick up a copy of Segmentation Analytics with SAS Viya. Let me know if you have any questions or comments.

 

 

Clustering made simple was published on SAS Users.

5月 252021
 

In SAS Studio, the ordering of rows and columns in the Table Analysis task are, by default, arranged by the internal ordering of the values used in the table. The table arranges the variables alphabetically or numerically by increasing value. For example, traditional coding uses 1 for Yes and 0 for No, so the No column is created as the first row because the internal value is 0. There are times when it makes more sense to change the order of the rows and/or columns.

Suppose you have data on risk factors for having a heart attack (including high blood pressure) and outcome data (heart attack). A data set called Risk has data on the status of blood pressure and heart attack (simulated data).

Here are the first 10 observations from that data set:

You can use PROC FREQ to create a 2x2 table or you can use the SAS Studio task called Table Analysis (in the Statistics task list) to create your table. Regardless of whether you decide to write a program or use a SAS Studio task, the resulting table looks like this:

Because we are more interested in what causes heart attacks, we would prefer to have Yes (1) as the first row and column of the table. Here is how to do it with a short SAS program:

You create a format that labels 1 as '1:Yes' and 0 as '2:No' and associate this format with both variables in the PROC FREQ step. You also include the PROC FREQ option ORDER=formatted. This option orders values by their formatted values rather than the default ordering—by the internal values. The original table placed 0 before 1 for that reason. By being tricky and placing the 1: and 2: in the format label, you are forcing the Yes values to come before the No values (otherwise, 'No' would come before 'Yes' – alphabetical order). Here is the output:

If you decided to use a SAS Studio task to create the table, you would open the Code window and click the Edit icon. You could then add PROC FORMAT and the ORDER=formatted option in the TABLES statement.

For the curious readers who would like to see how the Risk data set was created, here is the code:

Here the RAND function is generating a Bernoulli distribution (0 or 1), based on a probability of a getting a 1. You specify this probability using the second argument of the function. One more note: The statement CALL STREAMINIT is used to generate the same series of random numbers every time you run the program. If you omit this statement, the program generates a different series of random numbers every time you run it.

You can read more about how to reorder rows and columns in a 2x2 table in my new book, A Gentle Introduction to Statistics Using SAS Studio in the Cloud. In that book, I demonstrate how to edit the SAS Studio-generated code to reorder rows and columns in a table.

Reordering rows and columns in a 2x2 table with SAS® Studio Tasks was published on SAS Users.

3月 022021
 

The more I use SAS Studio in the cloud via SAS OnDemand for Academics, the more I like it. To demonstrate how useful the Files tab is, I'm going to show you what happens when you drag a text file, a SAS data set, and a SAS program into the Editor window.

I previously created a folder called MyBookFiles and uploaded several files from my local computer to that folder.  You can see a partial list of files in the figure below.

Notice that there are text files, SAS data sets, SAS programs, and some Excel workbooks. Look what happens when I drag a text file (Blank_Delimiter.txt) into the Editor window.

No need to open Notepad to view this file—SAS Studio displays it for you. What about a SAS data set? As an example, I dragged a SAS data set called blood_pressure into the Editor.

You see a list of variables and some of the observations in this data set.  There are vertical and horizontal scroll bars (not shown in the figure) to see more rows or columns. If you want to see a listing of the entire data set or the first 'n' observations, you can run the List Data task, located under the Tasks and Utilities tab.

For the last example, I dragged a SAS program into the editor. It appears exactly the same as if I opened it in my stand-alone version of SAS.

At this point, you can run the program or continue to write more SAS code. By the way, the tilde (~) used In the INFILE statement is a shortcut for your home directory. Follow it with the folder name and the file name.

You can read more about SAS Studio in the cloud in my latest book, Getting Started with SAS Programming: Using SAS Studio in the Cloud.

Viewing files, programs, and data sets in SAS Studio was published on SAS Users.

2月 012021
 

Do you want to spend less time on the tedious task of preparing your data? I want to tell you about a magical and revolutionary SAS macro called %TK_codebook. Not only does this macro create an amazing codebook showcasing your data, it also automatically performs quality control checks on each variable. You will easily uncover potential problems lurking in your data including variables that have:

  • Incomplete formats
  • Out of range values
  • No variation in response values
  • Variables missing an assigned user-defined format
  • Variables that are missing labels

All you need is a SAS data set with labels and formats assigned to each variable and the accompanying format catalogue. Not only will this macro change the way you clean and prepare your data, but it also gives you an effortless way to evaluate the quality of data you obtain from others before you start your analysis. Look how easy it is to create a codebook if you have a SAS data set with labels and formats:

title height=12pt 'Master Codebook for Study A Preliminary Data';
title2 height=10pt 'Simulated Data for Participants in a Health Study';
title3 height=10pt 'Data simulated to include anomalies illustrating the power of %TK_codebook';
 
libname library "/Data_Detective/Formats/Blog_1_Codebooks";
 
%TK_codebook(lib=work,
	file1=STUDYA_PRELIM,
	fmtlib=LIBRARY,
	cb_type=RTF,
	cb_file=/Data_Detective/Book/Blog/SAS_Programs/My_Codebook.rtf,
	var_order=INTERNAL,
	organization = One record per CASEID,
	include_warn=YES;
run;

Six steps create your codebook

After creating titles for your codebook, this simple program provides %TK_codebook with the following instructions:

  1. Create a codebook for SAS data set STUDYA_PRELIM located in the temporary Work library automatically defined by SAS
  2. Find the formats assigned to the STUDYA_PRELIM in a format catalogue located in the folder assigned to the libref LIBRARY
  3. Write your codebook in a file named /Data_Detective/Book/Blog/SAS_Programs/My_Codebook.rtf
  4. List variables in the codebook by their INTERNAL order (order stored in the data set)
  5. Add “One record per CASEID” indicating which variable(s) uniquely identify each observation to codebook header
  6. Include reports identifying potential problems in the data

Just these few lines of code will create the unbelievably useful codebook shown below.

The data set used has many problems that can interfere with analysis. %TK_codebook creates reports showing a concise summary of only those problem variables needing close examination. These reports save you an incredible amount of time.

Using assigned formats, %TK_codebook identifies unexpected values occurring in each variable and provides a summary in the first two reports.

Values occurring outside those defined by the assigned format indicate two possible problems:

  1. A value was omitted from the format definition (Report 1 – Incomplete formats)
  2. The variable has unexpected values needing mitigation before the data is analyzed (Report 2 – Out of Range Value)

The next report lists numeric variables that have no variation in their values.

These variables need examining to uncover problems with preparing the data set.

The next two reports warn you about variables missing an assigned user-defined format. These variables will be excluded from screening for out-of-range values and incomplete format definitions.

The last report informs you about variables that are missing a label or have a label that matches the variable name.

It is easy to use %TK_codebook to resolve problems in your data and create an awesome codebook. Instead of spending your time preparing your data, you will be using your data to change the world!

Create your codebook

Download %TK_codebook from my author page, then learn to use it from my new book, The Data Detective’s Toolkit: Cutting-Edge Techniques and SAS Macros to Clean, Prepare, and Manage Data.

THE DATA DETECTIVE'S TOOLKIT | BUY IT NOW

Creating codebooks with SAS macros was published on SAS Users.

12月 142020
 

Do you need to see how long patients have been treated for? Would you like to know if a patient’s dose has changed, or if the patient experienced any dose interruptions? If so, you can use a Napoleon plot, also known as a swimmer plot, in conjunction with your exposure data set to find your answers. We demonstrate how to find the answer in our recent book SAS® Graphics for Clinical Trials by Example.

You may be wondering what a Napoleon plot is? Have you ever heard of the map of Napoleon’s Russian campaign? It was a map that displayed six types of data, such as troop movement, temperature, latitude, and longitude on one graph (Wikipedia). In the clinical setting, we try to mimic this approach by displaying several different types of safety data on one graph: hence, the name “Napoleon plot.” The plot is also known as a swimmer plot because each patient has a row in which their data is displayed, which looks like swimming lanes.

Code

Now that you know what a Napoleon plot is, how do you produce it? In essence, you are merely writing GTL code to produce the graph you need. In order to generate a Napoleon plot, some key GTL statements that are used are DISCRETEATTRMAP, HIGHLOWPLOT, SCATTERPLOT and DISCRETELEGEND. Other plot statements are used, but the statements that were just mentioned are typically used for all Napoleon plot. In our recent book, one of the chapters carefully walks you through each step to show you how to produce the Napoleon plot. Program 1, below, gives a small teaser of some of the code used to produce the Napoleon Plot.

Program 1: Code for Napoleon Plot That Highlights Dose Interruptions

	   discreteattrmap name = "Dose_Group";
            value "54" / fillattrs = (color = orange) 
                         lineattrs = (color = orange pattern = solid);     
            value "81" / fillattrs = (color = red) 
                         lineattrs = (color = red pattern = solid);
         enddiscreteattrmap;
 
         discreteattrvar attrvar = id_dose_group var = exdose attrmap = "Dose_Group";
 
         legenditem type = marker name = "54_marker" /
            markerattrs = (symbol = squarefilled color = orange)
            label = "Xan 54mg";
 
         < Other legenditem statements >
 
 
	     layout overlay / yaxisopts = (type = discrete 
                                         display = (line label)     
                                         label = "Patient")
 
	        highlowplot y = number 
                          high = eval(aendy/30.4375) 
                          low = eval(astdy/30.4375) / 
                 group = id_dose_group                       
                 type = bar 
                 lineattrs = graphoutlines 
                 barwidth = 0.2;
		 scatterplot y = number x = eval((max_aendy + 10)/30.4375) /      
                 markerattrs = (symbol = completed size = 12px);               
		 discretelegend "54_marker" "81_marker" "completed_marker" /  
                 type = marker  
                 autoalign = (bottomright) across = 1                          
                 location = inside title = "Dose";
         endlayout;

Output

Without further ado, Output 1 shows you an example of a Napoleon plot. You can see that there are many patients, and so the patient labels have been suppressed. You also see that the patient who has been on the study the longest has a dose delay indicated by the white space between the red and orange bars. While this example illustrates a simple Napoleon plot with only two types, dose exposure and treatment, the book has more complex examples of swimmer plots.

Output 1: Napoleon Plot that Highlights Dose Interruptions

Napoleon plot with orange and red bars showing dose exposure and treatment

How to create a Napoleon plot with Graph Template Language (GTL) was published on SAS Users.

9月 222020
 

Everyone knows that SAS has been helping programmers and coders build complex machine learning models and solve complex business problems for many years, but did you know that you can also now build machines learning models without a single line of code using SAS Viya?

SAS has been helping programmers and coders build complex machine learning models and solve complex business problems over many years.

Building on the vision and commitment to democratize analytics, SAS Viya offers multiple ways to support non-programmers and empowers people with no programming skills to get up and running quickly and build machine learning models. I touched on some of the ways this can be done via SAS Visual Analytics in my previous post on analytics for everyone with SAS Viya. In addition, SAS Viya also supports more advanced pipeline-based visual modeling via SAS Visual Data Mining and Machine Learning. The combination of these different tools within SAS Viya supporting a low-code/no-code approach to modeling makes SAS Viya an incredibly flexible and powerful analytics platform that can help drive analytics usage and adoption throughout an organization.

As analytics and machine learning become more pervasive, an analytics platform that supports a low-code/no-code approach can get more people involved, drive ongoing innovations, and ultimately accelerate digital transformation throughout an organization.

Speed

I have met my fair share of coding ninjas who blew me away with their ability to build models using keyboards with lightning speed. But when it comes to being able to quickly get an idea into a model and generate all the assessment statistics and charts, there is nothing quite like a visual approach to building machine learning models.

In SAS Viya, you can build a decision tree model literally just by dragging and dropping the relevant variables onto the canvas as shown in the animated screen flow below.

Building a machine learning model via drag and drop

In this case, we were able to quickly build a decision tree model that predicts child mortality rates around the world. Not only do we get the decision tree in all its graphics glory (on the left-hand side of the image), we also get the overall model fit measure (Average Standard Error in this case), a variable importance chart, as well as a lift chart all without having to enter a single line of code in under 5 seconds!

You also get a bunch of detailed statistical outputs, including a detailed node statistics table without having to do anything extra. This is useful for when you need to review the distribution and characteristics of specific nodes when using the decision tree.

Detailed node statistics table

 

What’s more, you can leverage the same drag-and-drop paradigm to quickly tune the model. In our case, you can do simple modifications like adding a new variable by simply dragging a new data item onto the canvas or more complex techniques like manually splitting or pruning a node just by clicking and selecting a node on the canvas. The whole model and visualization refreshes instantly as you make changes, and you get instant feedback on the outputs of your tuning actions, which can help drive rapid iteration and idea testing.

Governance and collaboration

A graphical and components-based approach to modeling also has the added benefits of providing a stronger level of governance and fostering collaboration. Building machine learning model is often a team sport, and the ability to share and reuse models easily can dramatically reduce the cost and effort involved in building and maintaining models.

SAS Visual Data Mining and Machine Learning enables users to build complex, enterprise-grade pipeline models that support sophisticated variable selection, feature engineering techniques, as well as model comparison processes all within a single, easy-to-understand, pipeline-based design framework.

Pipeline modeling using SAS VDMML

The graphical, pipeline-based modeling framework within SAS Visual Data Mining and Machine Learning leverages common components, supports self-documentation, and allows users to leverage a template-based approach to building and sharing machine learning models quickly.

More importantly, as a new user or team member who needs to review, tune or reuse someone else’s model, it is much easier and quicker to understand the design and intent of the various components of a pipeline model and make the needed changes.

It is much easier and quicker to understand the design and intent of the various components of a pipeline model.

Communication and storytelling

Finally, and perhaps most importantly, a graphical, low-code/no-code approach to building machine learning models makes it much easier to communicate both the intent and potential impact of the model. Figures and numbers represent facts, but narratives and stories convey emotion and build connections. The visual modeling approaches supported by SAS Viya enable you to tell compelling stories, share powerful ideas, and inspire valuable actions.

SAS Viya enables you to make changes and apply filters on the fly within its various visual modeling environments. With the model training process and model outputs all represented visually, it makes it extremely easy to discuss business scenarios, test hypotheses, and test modeling strategies and approaches, even with people without a deep machine learning background.

There is no question that a programmatic approach to building machine learning models offers the ultimate power and flexibility and enables data scientist to build the most complex and advanced machine learning models. But when it comes to speed, governance, and communications, a graphical, low-code/no-code approach to building machine learning definitely has a lot to offer.

To learn more about a low-code/no-code approach to building machine learning models using SAS Viya, check out my book Smart Data Discovery Using SAS® Viya®.

The value of a low-code/no-code approach to building machine learning models was published on SAS Users.

9月 222020
 

Everyone knows that SAS has been helping programmers and coders build complex machine learning models and solve complex business problems for many years, but did you know that you can also now build machines learning models without a single line of code using SAS Viya?

SAS has been helping programmers and coders build complex machine learning models and solve complex business problems over many years.

Building on the vision and commitment to democratize analytics, SAS Viya offers multiple ways to support non-programmers and empowers people with no programming skills to get up and running quickly and build machine learning models. I touched on some of the ways this can be done via SAS Visual Analytics in my previous post on analytics for everyone with SAS Viya. In addition, SAS Viya also supports more advanced pipeline-based visual modeling via SAS Visual Data Mining and Machine Learning. The combination of these different tools within SAS Viya supporting a low-code/no-code approach to modeling makes SAS Viya an incredibly flexible and powerful analytics platform that can help drive analytics usage and adoption throughout an organization.

As analytics and machine learning become more pervasive, an analytics platform that supports a low-code/no-code approach can get more people involved, drive ongoing innovations, and ultimately accelerate digital transformation throughout an organization.

Speed

I have met my fair share of coding ninjas who blew me away with their ability to build models using keyboards with lightning speed. But when it comes to being able to quickly get an idea into a model and generate all the assessment statistics and charts, there is nothing quite like a visual approach to building machine learning models.

In SAS Viya, you can build a decision tree model literally just by dragging and dropping the relevant variables onto the canvas as shown in the animated screen flow below.

Building a machine learning model via drag and drop

In this case, we were able to quickly build a decision tree model that predicts child mortality rates around the world. Not only do we get the decision tree in all its graphics glory (on the left-hand side of the image), we also get the overall model fit measure (Average Standard Error in this case), a variable importance chart, as well as a lift chart all without having to enter a single line of code in under 5 seconds!

You also get a bunch of detailed statistical outputs, including a detailed node statistics table without having to do anything extra. This is useful for when you need to review the distribution and characteristics of specific nodes when using the decision tree.

Detailed node statistics table

 

What’s more, you can leverage the same drag-and-drop paradigm to quickly tune the model. In our case, you can do simple modifications like adding a new variable by simply dragging a new data item onto the canvas or more complex techniques like manually splitting or pruning a node just by clicking and selecting a node on the canvas. The whole model and visualization refreshes instantly as you make changes, and you get instant feedback on the outputs of your tuning actions, which can help drive rapid iteration and idea testing.

Governance and collaboration

A graphical and components-based approach to modeling also has the added benefits of providing a stronger level of governance and fostering collaboration. Building machine learning model is often a team sport, and the ability to share and reuse models easily can dramatically reduce the cost and effort involved in building and maintaining models.

SAS Visual Data Mining and Machine Learning enables users to build complex, enterprise-grade pipeline models that support sophisticated variable selection, feature engineering techniques, as well as model comparison processes all within a single, easy-to-understand, pipeline-based design framework.

Pipeline modeling using SAS VDMML

The graphical, pipeline-based modeling framework within SAS Visual Data Mining and Machine Learning leverages common components, supports self-documentation, and allows users to leverage a template-based approach to building and sharing machine learning models quickly.

More importantly, as a new user or team member who needs to review, tune or reuse someone else’s model, it is much easier and quicker to understand the design and intent of the various components of a pipeline model and make the needed changes.

It is much easier and quicker to understand the design and intent of the various components of a pipeline model.

Communication and storytelling

Finally, and perhaps most importantly, a graphical, low-code/no-code approach to building machine learning models makes it much easier to communicate both the intent and potential impact of the model. Figures and numbers represent facts, but narratives and stories convey emotion and build connections. The visual modeling approaches supported by SAS Viya enable you to tell compelling stories, share powerful ideas, and inspire valuable actions.

SAS Viya enables you to make changes and apply filters on the fly within its various visual modeling environments. With the model training process and model outputs all represented visually, it makes it extremely easy to discuss business scenarios, test hypotheses, and test modeling strategies and approaches, even with people without a deep machine learning background.

There is no question that a programmatic approach to building machine learning models offers the ultimate power and flexibility and enables data scientist to build the most complex and advanced machine learning models. But when it comes to speed, governance, and communications, a graphical, low-code/no-code approach to building machine learning definitely has a lot to offer.

To learn more about a low-code/no-code approach to building machine learning models using SAS Viya, check out my book Smart Data Discovery Using SAS® Viya®.

The value of a low-code/no-code approach to building machine learning models was published on SAS Users.

8月 102020
 

The most fundamental concept that students learning introductory SAS programming must master is how SAS handles data. This might seem like an obvious statement, but it is often overlooked by students in their rush to produce code that works. I often tell my class to step back for a moment and "try to think like SAS" before they even touch the keyboard. There are many key topics that students must understand in order to be successful SAS programmers. How does SAS compile and execute a program? What is the built-in loop that SAS uses to process data observation by observation? What are the coding differences when working with numeric and character data? How does SAS handle missing observations?

One concept that is a common source of confusion for students is how to tell SAS to treat rows versus columns. An example that we use in class is how to write a program to calculate a basic descriptive statistic, such as the mean. The approach that we discuss is to identify our goal, rows or columns, and then decide what SAS programming statements are appropriate by thinking like SAS. First, we decide if we want to calculate the mean of an observation (a row) or the mean of a variable (a column). We also pause to consider other issues such as the type of variable, in this case numeric, and how SAS evaluates missing data. Once these concepts are understood we can proceed with an appropriate method: using DATA step programming, a procedure such as MEANS, TABULATE, REPORT or SQL, and so on. For more detailed information about this example there is an excellent user group paper on this topic called "Many Means to a Mean" written by Shannon Pileggi for the Western Users of SAS Software conference in 2017. In addition, The Little SAS® Book and its companion book, Exercises and Projects for the Little SAS® Book, Sixth Edition address these types of topics in easy-to-understand examples followed up with thought-provoking exercises.

Here is an example of the type of question that our book of exercises and projects uses to address this type of concept.

Short answer question

  1. Is there a difference between calculating the mean of three variables X1, X2, and X3 using the three methods as shown in the following examples of code? Explain your answer.
    Avg1 = MEAN(X1,X2,X3);
    Avg2 = (X1 + X2 + X3) / 3;
    PROC MEANS; VAR X1 X2 X3; RUN;

Solution

In the book, we provide solutions for odd-numbered multiple choice and short answer questions, and hints for the programming exercises. Here is the solution for this question:

  1. The variable Avg1 that uses the MEAN function returns the mean of nonmissing arguments and will provide a mean value of X1, X2, and X3 for each observation (row) in the data set. The variable Avg2 that uses an arithmetic equation will also calculate the mean for each observation (row), but will return a missing value if any of the variables for that observation have a missing value. Using PROC MEANS will calculate the mean of nonmissing data for each variable (column) X1, X2, and X3 vertically.

For more information about The Little SAS Book and its companion book of exercises and projects, check out these blogs:

Learning to think like SAS was published on SAS Users.

7月 242020
 

SAS Press has added to its selection of free downloadable eBooks with the new SAS and Open-Source Model Management® – Special Collection.

From the description:

“Turn analytical models into business value and smarter decisions with this special collection of papers about SAS Model Management. Without a structured and standardized process to integrate and coordinate all the different pieces of the model life cycle, a business can experience increased costs and missed opportunities. SAS Model Management solutions enable organizations to register, test, deploy, monitor, and retrain analytical models, leveraging any available technology – including open-source models in Python, R, and TensorFlow –into a competitive advantage.”

As we know, the diversity and multitude of tools, programming languages, algorithms and skills is reshaping Models’ Management best practices. The “New Normal” has brought an immediate need to adopt Model Management’s solutions which enable users to register, test, deploy, monitor and retrain models, in a scalable and structured way, leveraging their open-source/commercial development tool of choice.

In this eBook, we have carefully selected several papers and best practices from SAS Global Forum 2020 on how to ML in production and become an Expert in each step of the Model Management Lifecycle.

Key Take Aways

  1. How you can put ML in production and turn your models into business value.
  2. How SAS openness allows you to leverage any open-source technology.
  3. How SAS Model Management solution enables you to register, test, deploy, monitor and retrain analytical models.

Papers Selected

  • The Aftermath What Happens After You Deploy Your Models and Decisions
    By David Duling
  • Turning the Crank: A Simulation of Optimizing Model Retraining
    By David Duling
  • Open-Source Model Management with SAS® Model Manager
    By Glenn Clingroth, Hongjie Xin, and Scott Lindauer
  • Deploying Models Using SAS® and Open-Source
    By Jared Dean
  • Cows or Chickens: How You Can Make Your Models into Containers
    By Hongjie Xin, Jacky Jia, David Duling, and Chris Toth
  • Choose Your Own Adventure: Manage Model Development via a Python IDE
    By Jon Walker
  • Build Your ML Web Application Using SAS AutoML
    By Paata Ugrekhelidze
  • Monitoring the Relevance of Predictors for a Model Over Time
    By Ming-Long Lam
  • Model Validation
    By Hans-Joachim Edert and Tamara Fischer

If you are interested in other topics, like Forecasting, Computer Vision, Text Analytics, find more free downloadable eBooks in the Special Collection series at support.sas.com/freesasebooks.

SAS and Open-Source Model Management (free eBook) was published on SAS Users.