Suzanne Morgen

8月 012022

Recently, the SAS Press team moved to a new building on the SAS campus. And when the SAS Press team moves, we bring a lot of books with us! Packing and organizing all of our books gave us a chance to appreciate all of our authors’ hard work during the more than 30 years that SAS Press has existed.

One author has an outsized presence on the SAS Press bookshelves – Ron Cody. He has written over a dozen books that include some of the most popular titles for new SAS users. He taught statistics at the Rutgers Robert Wood Johnson Medical School for many years and is a frequent presenter at SAS conferences.

Few people know more about SAS than Ron, which made him the perfect person to be our first SAS Press Author of the Month. During August, you can get special deals on all of Ron's books at RedShelf. Find out more information on his author page. Ron is also hosting a free webinar on data cleaning tips and tricks on August 11th.

We recently asked Ron to share a little bit about his journey as an author and teacher. As you might imagine, Ron has a lot of SAS knowledge and advice to share.

Ron Cody's books on a bookshelf.

The Cody section of the SAS Press bookshelf.

Q: When did you decide to write your first book?

I decided to write my first SAS book (Applied Statistics and the SAS Programming Language) in 1985. It was published by Elsevier Publishing, which was later bought by Prentice Hall. At the time I was writing the book, there were no other books out there—just SAS manuals. This book is still in print (in a fifth edition).

Q: What made you decide to keep writing more books about SAS?

Once I realized I could write well enough to get published, I got the "writer's bug." Although writing a book is hard work, the reward is substantial. My motivation is more about teaching people about SAS and statistics rather than the monetary reward. My goal in writing any book is to make enough money to take my wife to dinner!

Q: Is one of your books your favorite? Or do you love them all equally?

I do have some favorites. I would put Learning SAS by Example as one. It contains a section for beginners as well as later chapters that are useful to intermediate, or even advanced SAS programmers. I particularly like my latest book, A Gentle Introduction to Statistics Using SAS Studio in the Cloud. I think I made statistical concepts accessible to non-mathematically minded people. There are only a few equations in the entire book.

Q: As a teacher, how did you encourage students who were having a hard time understanding statistics?

I try really hard to convince them that the statistical concepts really make sense, and it is sometimes the terminology that gets in the way.

Q: What do you think students struggle with the most when they are learning SAS?

I believe the overwhelming richness of SAS can intimidate a beginning programmer. That's why I start from simple examples and explain step-by-step how each program works.

Q: What is your best advice for someone who wants to learn SAS?

Buy all of my books, of course! Just kidding. There are many YouTube videos and other online resources that are useful. I think two of the best books would be The Little SAS Book, and either Learning SAS by Example (for the more serious student) or Getting Started with SAS Programming Using SAS Studio in the Cloud. The latter book is more suited to someone using SAS OnDemand for Academics.

Q: You recently published a memoir about your time as an EMT. How did it feel to reflect back on that time of your life? Any more memoirs in your future?

I thoroughly enjoyed writing a book about my years as a volunteer EMT (10-8 Awaiting Crew: Memories of a Volunteer EMT). I was fortunate that I had kept a journal and recorded details of some of the more interesting and exciting calls. As of now, I do not have another memoir in my future, but I am working on a nonfiction novel. I'm not sure how successful it will be, but I'm going to give it a try. I strive to write almost every day. I tell other beginning authors, that if you spend a few hours writing every day, it becomes much easier. I call it "getting in the groove."

Ron Cody sitting in a chair with a cat in his lap. Ron is wearing a shirt that says "using Statistics to prove a point is just MEAN."

Ron and Dudley. Credit: Jan Cody

Thanks for talking to us, Ron! We hope you will join us in celebrating Ron's accomplishments as our Author of the Month. Check out his author page to see special deals on all of Ron’s books during August and to discover more resources.


Meet our SAS Press Author of the Month – Ron Cody was published on SAS Users.

9月 132021

The Day of the Programmer is not enough time to celebrate our favorite code-creators. That’s why at SAS, we celebrate an entire week with SAS Programmer Week! If you want to extend the fun and learning of SAS Programmer Week year-round, SAS Press is here to support you with books for programmers at every level.

2021 has been a big year for learning, so we wanted to share the six most popular books for programmers this year. There are some old favorites on this list as well as some brand-new books on a variety of topics. Check out the list below, and see what your fellow programmers are reading this year!

  1. Little SAS Book: A Primer, Sixth Edition

This book is at the top of almost every list of recommended books for anyone who wants to learn SAS. And for good reason! It breaks down the basics of SAS into easy-to-understand chunks with tons of practice questions. If you are new to SAS or are interested in getting your basic certification, this is the book for you.

  1. Learning SAS by Example: A Programmer’s Guide, Second Edition

Whether you are learning SAS for the first time or just need a quick refresher on a single topic, this book is well-organized so that you can read start to finish or skip to your topic of interest. Filled with real-world examples, this is a book that should be on every SAS programmer’s bookshelf!

  1. Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS

If you work with big data, then you probably work with a lot of text. The third book on our list is for anyone who handles unstructured data. This book focuses on practical solutions to real-life problems. You’ll learn how to collect, cleanse, organize, categorize, explore, analyze, and interpret your data.

  1. End-to-End Data Science with SAS: A Hands-On Programming Guide

This book offers a step-by-step explanation of how to create machine learning models for any industry. If you want to learn how to think like a data scientist, wrangle messy code, choose a model, and evaluate models in SAS, then this book has the information that you need to be a successful data scientist.

  1. Cody's Data Cleaning Techniques Using SAS, Third Edition

Every programmer knows that garbage in = garbage out. Take out the trash with this indispensable guide to cleaning your data. You’ll learn how to find and correct errors and develop techniques for correcting data errors.

  1. SAS Graphics for Clinical Trials by Example

If you are a programmer who works in the health care and life sciences industry and want to create visually appealing graphs using SAS, then this book is designed specifically for you. You’ll learn how to create a wide range of graphs using Graph Template Language (GTL) and statistical graphics procedures to solve even the most challenging clinical graph problems.

An honorable mention also goes to the SAS Certification Guides. They are a great way to study for the certification exams for the SAS Certified Specialist: Base Programming and SAS Certified Professional: Advanced Programming credentials.

We have many books available to support you as you develop your programming skills – and some of them are free! Browse all our available titles today.

Top Books for SAS Programmers was published on SAS Users.

11月 202020

The following is an excerpt from Cautionary Tales in Designed Experiments by David Salsburg. This book is available to download for free from SAS Press. The book aims to explain statistical design of experiments (DOE) to readers with minimal mathematical knowledge and skills. In this excerpt, you will learn about the origin of Thomas Bayes’ Theorem, which is the basis for Bayesian analysis.

A black and white portrait of Thomas Bayes in a black robe with a white collar.

Source: Wikipedia

The Reverend Thomas Bayes (1702–1761) was a dissenting minister of the Anglican Church, which means he did not subscribe to the full body of doctrine espoused by the Church. We know of Bayes in the 21st century, not because of his doctrinal beliefs, but because of a mathematical discovery, which he thought made no sense whatsoever. To understand Bayes’ Theorem, we need to refer to this question of the meaning of probability.

In the 1930s, the Russian mathematician Andrey Kolomogorov (1904–1987) proved that probability was a measure on a space of “events.” It is a measure, just like area, that can be computed and compared. To prove a theorem about probability, one only needed to draw a rectangle to represent all possible events associated with the problem at hand. Regions of that rectangle represent classes of sub-events.

For instance, in Figure 1, the region labeled “C” covers all the ways in which some event, C, can occur. The probability of C is the area of the region C, divided by the area of the entire rectangle. Anticipating Kolomogorov’s proof, John Venn (1834–1923) had produced such diagrams (now called “Venn diagrams”).

Two overlapping circular shapes. One is labeled C, the other labeled D. The area where the shapes overlap is labeled C+D

Figure 1: Venn Diagram for Events C and D

Figure 1 shows a Venn diagram for the following situation: We have a quiet wooded area. The event C is that someone will walk through those woods sometime in the next 48 hours. There are many ways in which this can happen. The person might walk in from different entrances and be any of a large number of people living nearby. For this reason, the event C is not a single point, but a region of the set of all possibilities. The event D is that the Toreador Song from the opera Carmen will resound through the woods. Just as with event C, there are a number of ways in which this could happen. It could be whistled or sung aloud by someone walking through the woods, or it could have originated from outside the woods, perhaps from a car radio on a nearby street. Some of these possible events are associated with someone walking through the woods, and those possible events are in the overlap between the regions C and D. Events associated with the sound of the Toreador Song that originate outside the woods are in the part of region D that does not overlap region C.

The area of region C (which we can write P(C) and read it as “P of C”) is the probability that someone will walk through the woods. The area of region D (which we can write P(D)) is the probability that the Toreador Song will be heard in the woods. The area of the overlap between C and D (which we can write P(C and D) is the probability that someone will walk through the woods and that the Toreador Song will be heard.

If we take the area P(C and D) and divide it by the area P(C), we have the probability that the Toreador Song will be heard when someone walks through the woods. This is called the conditional probability of D, given C. In symbols

P(D|C) = P(C and D)÷ P(C)

Some people claim that if the conditional probability, P(C|D), is high, then we can state “D causes C.” But this would get us into the entangled philosophical problem of the meaning of “cause and effect.”

To Thomas Bayes, conditional probability meant just that—cause and effect. The conditioning event, C, (someone will walk through the woods in the next 48 hours) comes before the second event D, (the Toreador Song is heard). This made sense to Bayes. It created a measure of the probability for D when C came before.

However, Bayes’ mathematical intuition saw the symmetry that lay in the formula for conditional probability:

P(D|C) = P(D and C)÷ P(C) means that

P(D|C)P(C) = P(D and C) (multiply both sides of the equation by P(C)).

But just manipulating the symbols shows that, in addition,

P(D and C) = P(C|D) P(D), or

P(C|D) = P(C and D)÷ P(D).

This made no sense to Bayes. The event C (someone walks through the woods) occurred first. It had already happened or not before event D (the Toreador Song is heard). If D is a consequence of C, you cannot have a probability of C, given D. The event that occurred second cannot “cause” the event that came before it. He put these calculations aside and never sent them to the Royal Society. After his death, friends of Bayes discovered these notes and only then were they sent to be read before the Royal Society of London. Thus did Thomas Bayes, the dissenting minister, become famous—not for his finely reasoned dissents from church doctrine, not for his meticulous calculations of minor problems in astronomy, but for his discovery of a formula that he felt was pure nonsense.

P(C|D) P(D) = P(C and D) = P(D|C) P(C)

For the rest of the 18th century and for much of the 19th century, Bayes’ Theorem was treated with disdain by mathematicians and scientists. They called it “inverse probability.” If it was used at all, it was as a mathematical trick to get around some difficult problem. But since the 1930s, Bayes’ Theorem has proved to be an important element in the statistician’s bag of “tricks.”

Bayes saw his theorem as implying that an event that comes first “causes” an event that comes after with a certain probability, and an event that comes after “causes” an event that came “before” (foolish idea) with another probability. If you think of Bayes’ Theorem as providing a means of improving on prior knowledge using the data available, then it does make sense.

In experimental design, Bayes’ Theorem has proven very useful when the experimenter has some prior knowledge and wants to incorporate that into his or her design. In general, Bayes’ Theorem allows the experimenter to go beyond the experiment with the concept that experiments are a means of continuing to develop scientific knowledge.

To learn more about how probability is used in experimental design, download Cautionary Tales in Designed Experiments now!

Thomas Bayes’ theorem and “inverse probability” was published on SAS Users.

3月 192020

At SAS Press, we agree with the saying “The best things in life are free.” And one of the best things in life is knowledge. That’s why we offer free e-books to help you learn SAS or improve your skills. In this blog post, we will introduce you to one of our amazing titles that is absolutely free.

SAS Programming for R Users

Many data scientists today need to know multiple programming languages including SAS, R, and Python. If you already know basic statistical concepts and how to program in R but want to learn SAS, then SAS Programming for R Users by Jordan Bakerman was designed specifically for you! This free e-book explains how to write programs in SAS that replicate familiar functions and capabilities in R. This book covers a wide range of topics including the basics of the SAS programming language, how to import data, how to create new variables, random number generation, linear modeling, Interactive Matrix Language (IML), and many other SAS procedures. This book also explains how to write R code directly in the SAS code editor for seamless integration between the two tools.

The book is based on the free, 14-hour course of the same name offered by SAS Education available here. Keep reading to learn more about the differences between SAS and R.

SAS versus R

R is an object-oriented programming language. Results of a function are stored in an object and desired results are pulled from the object as needed. SAS revolves around the data table and uses procedures to create and print output. Results can be saved to a new data table.

Let’s briefly compare SAS and R in a general way. Look at the following table, which outlines some of the major differences between SAS and R.

Here are a few other things about SAS to note:

  • SAS has the flexibility to interact with objects. (However, the book focuses on procedural methods.)
  • SAS does not have a command line. Code must be run in order to return results.

SAS Programs

A SAS program is a sequence of one or more steps. A step is a sequence of SAS statements. There are only two types of steps in SAS: DATA and PROC steps.

  • DATA steps read from an input source and create a SAS data set.
  • PROC steps read and process a SAS data set, often generating an output report. Procedures can be called an umbrella term. They are what carry out the global analysis. Think of a PROC step as a function in R.

Every step has a beginning and ending boundary. SAS steps begin with either of the following statements:

  • a DATA statement
  • a PROC statement

After a DATA or PROC statement, there can be additional SAS statements that contain keywords that request SAS perform an operation or they can give information to the system. Think of them as additional arguments to a procedure. Statements always end with a semicolon!

SAS options are additional arguments and they are specific to SAS statements. Unfortunately, there is no rule to say what is a statement versus what is an option. Understanding the difference comes with a little bit of experience. Options can be used to do the following:

  • generate additional output like results and plots
  • save output to a SAS data table
  • alter the analytical method

SAS detects the end of a step when it encounters one of the following statements:

  • a RUN statement (for most steps)
  • a QUIT statement (for some procedures)

Most SAS steps end with a RUN statement. Think of the RUN statement as the right parentheses of an R function. The following table shows an example of a SAS program that has a DATA step and a PROC step. You can see that both SAS statements end with RUN statements, while the R functions begin and end with parentheses.

If you want to learn more about this book or any other free e-books from SAS Press, visit Subscribe to our newsletter to get the latest information on new books.

Free e-book: SAS Programming for R Users was published on SAS Users.

3月 052020

Have you heard that SAS offers a collection of new, high-performance CAS procedures that are compatible with a multi-threaded approach? The free e-book Exploring SAS® Viya®: Data Mining and Machine Learning is a great resource to learn more about these procedures and the features of SAS® Visual Data Mining and Machine Learning. Download it today and keep reading for an excerpt from this free e-book!

In SAS Studio, you can access tasks that help automate your programming so that you do not have to manually write your code. However, there are three options for manually writing your programs in SAS® Viya®:

  1. SAS Studio provides a SAS programming environment for developing and submitting programs to the server.
  2. Batch submission is also still an option.
  3. Open-source languages such as Python, Lua, and Java can submit code to the CAS server.

In this blog post, you will learn the syntax for two of the new, advanced data mining and machine learning procedures: PROC TEXTMINE and PROCTMSCORE.


The TEXTMINE and TMSCORE procedures integrate the functionalities from both natural language processing and statistical analysis to provide essential functionalities for text mining. The procedures support essential natural language processing (NLP) features such as tokenizing, stemming, part-of-speech tagging, entity recognition, customized stop list, and so on. They also support dimensionality reduction and topic discovery through Singular Value Decomposition.

In this example, you will learn about some of the essential functionalities of PROC TEXTMINE and PROC TMSCORE by using a text data set containing 1,830 Amazon reviews of electronic gaming systems. The data set is named Amazon. You can find similar data sets of Amazon reviews at


The Amazon data set has already been loaded into CAS. The review content is stored in the variable ReviewBody, and we generate a unique review ID for each review. In the proc call shown in Program 1 we ask PROC TEXTMINE to do three tasks:

  1. parse the documents in table reviews and generate the term by document matrix
  2. perform dimensionality reduction via Singular Value Decomposition
  3. perform topic discovery based on Singular Value Decomposition results



data mycaslib.engstop;
    set mylib.engstop;

proc textmine;
    doc_id id;
    var reviewbody;

 /*(1)*/  parse reducef=2 entities=std stoplist=mycaslib.engstop 
          outterms=mycaslib.terms outparent=mycaslib.parent

 /*(2)*/  svd k=10 svdu=mycaslib.svdu outdocpro=mycaslib.docpro


(1) The first task (parsing) is specified in the PARSE statement. Parameter “reducef” specifies the minimum number of times a term needs to appear in the text to be included in the analysis. Parameter “stop” specifies a list of terms to be excluded from the analysis, such as “the”, “this”, and “that”. Outparent is the output table that stores the term by document matrix, and Outterms is the output table that stores the information of terms that are included in the term by document matrix. Outconfig is the output table that stores configuration information for future scoring.

(2) Tasks 2 and 3 (dimensionality reduction and topic discovery) are specified in the SVD statement. Parameter K specifies the desired number of dimensions and number of topics. Parameter SVDU is the output table that stores the U matrix from SVD calculations, which is needed in future scoring. Parameter OutDocPro is the output table that stores the new matrix with reduced dimensions. Parameter OutTopics specifies the output table that stores the topics discovered.

Click the Run shortcut button or press F3 to run Program 1. The terms table shown in Output 1 stores the tagging, stemming, and entity recognition results. It also stores the number of times each term appears in the text data.

Output 1: Results from Program 1


PROC TEXTMINE is used with large training data sets. When you have new documents coming in, you do not need to re-run all the parsing and SVD computations with PROC TEXTMINE. Instead, you can use PROC TMSCORE to score new text data. The scoring procedure parses the new document(s) and projects the text data into the same dimensions using the SVD weights derived from the original training data.

In order to use PROC TMSCORE to generate results consistent with PROC TEXTMINE, you need to provide the following tables generated by PROC TEXTMINE:

  • SVDU table – provides the required information for projection into the same dimensions.
  • Config table – provides parameter values for parsing.
  • Terms table – provides the terms that should be included in the analysis.

Program 2 shows an example of TMSCORE. It uses the same input data layout used for PROC TEXTMINE code, so it will generate the same docpro and parent output tables, as shown in Output 2.


Proc tmscore svdu=mycaslib.svdu
        config=mycaslib.config terms=mycaslib.terms
        svddocpro=mycaslib.score_docpro outparent=mycaslib.score_parent;
    var reviewbody;
    doc_id id;


Output 2: Results from Program 2

To learn more about advanced data mining and machine learning procedures available in SAS Viya, including PROC FACTMAC, PROC TEXTMINE, and PROC NETWORK, you can download the free e-book, Exploring SAS® Viya®: Data Mining and Machine Learning. Exploring SAS® Viya® is a series of e-books that are based on content from SAS® Viya® Enablement, a free course available from SAS Education. You can follow along with examples in real time by watching the videos.


Learn about new data mining and machine learning procedures in SAS Viya was published on SAS Users.

2月 262020

Do you wish you could predict the likelihood that one of your customers will open your marketing email? Or what if you could tell whether a new medical treatment for a patient will have a better outcome than the standard treatment? If you are familiar with propensity modeling, then you know such predictions about future behavior are possible! Propensity models generate a propensity score, which is the probability that a future behavior will occur. Propensity models are used often in machine learning and predictive data analytics, particularly in the fields of marketing, economics, business, and healthcare. These models can detect and remove bias in analysis of real-world, observational data where there is no control group.

SAS provides several approaches for calculating propensity scores. This excerpt from the new book, Real World Health Care Data Analysis: Causal Methods and Implementation Using SAS®, discusses one approach for estimating propensity scores and provides associated SAS code. The example code and data used in the examples is available to download here.

A priori logistic regression model

One approach to estimating a propensity score is to fit a logistic regression model a priori, that is, identify the covariates in the model and fix the model before estimating the propensity score. The main advantage of an a priori model is that it allows researchers to incorporate knowledge external to the data into the model building. For example, if there is evidence that a covariate is correlated to the treatment assignment, then this covariate should be included in the model even if the association between this covariate and the treatment is not strong in the current data. In addition, the a priori model is easy to interpret. The directed acyclic graph approach could be very informative in building a logistic propensity score model a priori, as it clearly points out the relationship between covariates and interventions. The correlation structure between each covariate and the intervention selection is pre-specified and in a fixed form. However, one main challenge of the a priori modeling approach is that it might not provide the optimal balance between treatment and control groups.

Building an a priori model

To build an a priori model for propensity score estimation in SAS, we can use either PROC PSMATCH or PROC LOGISTIC as shown in Program 1. In both cases, the input data set is a one observation per patient data set containing the treatment and baseline covariates from the simulated REFLECTIONS study. Also, in both cases the code will produce an output data set containing the original data set with the additional estimated propensity score for each patient (_ps_).

Program 1: Propensity score estimation: a priori logistic regression



Before building a logistic model in SAS, we suggest examining the distribution of the intervention indicator at each level of the categorical variable to rule out the possibility of “complete separation” (or “perfect prediction”), which means that for subjects at some level of some categorical variable, they would all receive one intervention but not the other. Complete separation can occur for several reasons and one common example is when using several categorical variables whose categories are coded by indicators. When the logistic regression model is fit, the estimate of the regression coefficients βs is based on the maximum likelihood estimation, and MLEs under logistic regression modeling do not have a closed form. In other words, the MLE β̂ cannot be written as a function of Xi and Ti. Thus, the MLE of βs are obtained using some numerical analysis algorithms such as the Newton-Raphson method. However, if there is a covariate X that can completely separate the interventions, then the procedure will not converge in SAS. If PROC LOGISTIC was used, the following warning message will be issued.

WARNING: There is a complete separation of data points. The maximum likelihood estimate does not exist.

WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

Notice that SAS will continue to finish the computation despite issuing warning messages. However, the estimate of such βs are incorrect, and so are the estimated propensity scores. If after examining the intervention distribution at each level of the categorical variables complete separation is found, then efforts should be made to address this issue. One possible solution is to collapse the categorical variable causing the problem. That is, combine the different outcome categories such that the complete separation no longer exists.

Firth logistic regression

Another possible solution is to use Firth logistic regression. It uses a penalized likelihood estimation method. Firth bias-correction is considered an ideal solution to the separation issue for logistic regression (Heinze and Schemper, 2002). In PROC LOGISTIC, we can add an option to run the Firth logistic regression as shown in Program 2.

Program 2: Firth logistic regression

        CPFQ_B FIQ_B GAD7_B ISIX_B PHQ8_B PhysicalSymp_B SDS_B / FIRTH;



Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21.16: 2409-2419.

Propensity Score Estimation with PROC PSMATCH and PROC LOGISTIC was published on SAS Users.

2月 142020

In honor of Valentine’s day, we thought it would be fitting to present an excerpt from a paper about the LIKE operator because when you like something a lot, it may lead to love! If you want more, you can read the full paper “Like, Learn to Love SAS® Like” by Louise Hadden, which won best paper at WUSS 2019.


SAS provides numerous time- and angst-saving techniques to make the SAS programmer’s life easier. Among those techniques are the ability to search and select data using SAS functions and operators in the data step and PROC SQL, as well as the ability to join data sets based on matches at various levels. This paper explores how LIKE is featured in each one of these techniques and is suitable for all SAS practitioners. I hope that LIKE will become part of your SAS toolbox, too.

Smooth Operators

SAS operators are used to perform a number of functions: arithmetic calculations, comparing or selecting variable values, or logical operations. Operators are loosely grouped as “prefix” (for example a sign before a variable) or “infix” which generally perform an operation BETWEEN two variables. Arithmetic operations using SAS operators may include exponentiation (**), multiplication (*), and addition (+), among others. Comparison operators may include greater than (>, GT) and equals (=, EQ), among others. Logical, or Boolean, operators include such operands as || or !!, AND, and OR, and serve the purpose of grouping SAS operations. Some operations that are performed by SAS operators have been formalized in functions. A good example of this is the concatenation operators (|| and !!) and the more powerful CAT functions which perform similar, but not identical, operations. LIKE operators are most frequently utilized in the DATA step and PROC SQL via a DATA step.

There is a category of SAS operators that act as comparison operators under special circumstances, generally in where statements in PROC SQL and the data step (and DS2) and subsetting if statements in the data step. These operators include the LIKE operator and the SOUNDS LIKE operator, as well as the CONTAINS and the SAME-AND operators. It is beyond the scope of this short paper to discuss all the smooth operators, but they are definitely worth a look.

LIKE Operator

Character operators are frequently used for “pattern matching,” that is, evaluating whether a variable value equals, does not equal, or sounds like a specified value or pattern. The LIKE operator is a case-sensitive character operator that employs two special “wildcard” characters to specify a pattern: the percent sign (%) indicates any number of characters in a pattern, while the underscore (_) indicates the presence of a single character per underscore in a pattern. The LIKE operator is akin to the GREP utility available on Unix/Linux systems in terms of its ability to search strings.

The LIKE operator also includes an escape routine in case you need to use a string that includes a comparison operator such as the carat, the underscore, or the percent sign, etc. An example of the escape routine syntax, when looking for a string containing a percent sign, is:

where yourvar like ‘100%’ escape ‘%’;

Additionally, SAS practitioners can use the NOT LIKE operator to select variables WITHOUT a given pattern. Please note that the LIKE statement is case-sensitive. You can use the UPCASE, LOWCASE, or PROPCASE functions to adjust input strings prior to using the LIKE statement. You may string multiple LIKE statements together with the AND or OR operators.


The LIKE operator, described above, searches the actual spelling of operands to make a comparison. The SOUNDS LIKE operator uses phonetic values to determine whether character strings match a given pattern. As with the LIKE operator, the SOUNDS LIKE operator is useful for when there are misspellings and similar sounding names in strings to be compared. The SOUNDS LIKE operator is denoted with a short cut ‘-*’. SOUNDS LIKE is based on SAS’s SOUNDEX algorithm. Strings are encoded by retaining the original first column, stripping all letters that are or act as vowels (A, E, H, I, O, U, W, Y), and then assigning numbers to groups: 1 includes B, F, P, and V; 2 includes C, G, J, K, Q, S, X, Z; 3 includes D and T; 4 includes L; 5 includes M and N; and 6 includes R. “Tristn” therefore becomes T6235, as does Tristan, Tristen, Tristian, and Tristin.

For more on the SOUNDS LIKE operator, please read the documentation.

Joins with the LIKE Operator

It is possible to select records with the LIKE operator in PROC SQL with a WHERE statement, including with joins. For example, the code below selects records from the SASHELP.ZIPCODE file that are in the state of Massachusetts and are for a city that begins with “SPR”.

proc sql;
        a.City ,
        a.countynm  , a.city2 ,
         a.statename , a.statename2
    from sashelp.zipcode as a
    where upcase( like 'SPR%' and 
upcase(a.statename)='MASSACHUSETTS' ; 

The test print of table TEMP1 shows only cases for Springfield, Massachusetts.

The code below joins SASHELP.ZIPCODE and a copy of the same file with a renamed key column (city --> geocity), again selecting records for the join that are in the state of Massachusetts and are for a city that begins with “SPR”.

proc sql;
        a.City , b.geocity, 
        a.countynm  ,
        a.statename , b.statecode, 
        a.x, a.y
    from sashelp.zipcode as a, zipcode2 as b
    where = b.geocity and upcase( like 'SPR%' and b.statecode
= 'MA' ;

The test print of table TEMP2 shows only cases for Springfield, Massachusetts with additional variables from the joined file.

The LIKE “Condition”

The LIKE operator is sometimes referred to as a “condition,” generally in reference to character comparisons where the prefix of a string is specified in a search. LIKE “conditions” are restricted to the DATA step because the colon modifier is not supported in PROC SQL. The syntax for the LIKE “condition” is:

where firstname=: ‘Tr’;

This statement would select all first names in Table 2 above. To accomplish the same goal in PROC SQL, the LIKE operator can be used with a trailing % in a where statement.


SAS provides practitioners with several useful techniques using LIKE statements including the smooth LIKE operator/condition in both the DATA step and PROC SQL. There’s definitely reason to like LIKE in SAS programming.

To learn more about SAS Press, check out our up-and-coming titles, and to receive exclusive discounts make sure to subscribe to our newsletter.


    Gilsen, Bruce. September 2001. “SAS® Program Efficiency for Beginners.” Proceedings of the Northeast SAS Users Group Conference, Baltimore, MD.

    Roesch, Amanda. September 2011. “Matching Data Using Sounds-Like Operators and SAS® Compare Functions.” Proceedings of the Northeast SAS Users Group Conference, Portland, ME.

    Shankar, Charu. June 2019. “The Shape of SAS® Code.” Proceedings of PharmaSUG 2019 Conference, Philadelphia, PA.

Learn to Love SAS LIKE was published on SAS Users.

1月 132020

Are you ready to get a jump start on the new year? If you’ve been wanting to brush up your SAS skills or learn something new, there’s no time like a new decade to start! SAS Press is releasing several new books in the upcoming months to help you stay on top of the latest trends and updates. Whether you are a beginner who is just starting to learn SAS or a seasoned professional, we have plenty of content to keep you at the top of your game.

Here is a sneak peek at what’s coming next from SAS Press.

For students and beginners

For beginners, we have Exercises and Projects for The Little SAS® Book: A Primer, Sixth Edition, the best-selling workbook companion to The Little SAS Book by Rebecca Ottesen, Lora Delwiche, and Susan Slaughter. Exercises and Projects for The Little SAS® Book, Sixth Edition will be updated to match the updates to the new The Little SAS® Book: A Primer, Sixth Edition. This hands-on workbook is designed to hone your SAS skills whether you are a student or a professional.



For data explorers of all levels

This free e-book explores the features of SAS® Visual Data Mining and Machine Learning, powered by SAS® Viya®. Users of all skill levels can visually explore data on their own while drawing on powerful in-memory technologies for faster analytic computations and discoveries. You can manually program with custom code or use the features in SAS® Studio, Model Studio, and SAS® Visual Analytics to automate your data manipulation and modeling. These programs offer a flexible, easy-to-use, self-service environment that can scale on an enterprise-wide level. This book introduces some of the many features of SAS Visual Data Mining and Machine Learning including: programming in the Python interface; new, advanced data mining and machine learning procedures; pipeline building in Model Studio, and model building and comparison in SAS® Visual Analytics



For health care data analytics professionals

If you work with real world health care data, you know that it is common and growing in use from sources like observational studies, pragmatic trials, patient registries, and databases. Real World Health Care Data Analysis: Causal Methods and Implementation in SAS® by Doug Faries et al. brings together best practices for causal-based comparative effectiveness analyses based on real world data in a single location. Example SAS code is provided to make the analyses relatively easy and efficient. The book also presents several emerging topics of interest, including algorithms for personalized medicine, methods that address the complexities of time varying confounding, extensions of propensity scoring to comparisons between more than two interventions, sensitivity analyses for unmeasured confounding, and implementation of model averaging.


For those at the cutting edge

Are you ready to take your understanding of IoT to the next level? Intelligence at the Edge: Using SAS® with the Internet of Things edited by Michael Harvey begins with a brief description of the Internet of Things, how it has evolved over time, and the importance of SAS’s role in the IoT space. The book will continue with a collection of chapters showcasing SAS’s expertise in IoT analytics. Topics include Using SAS Event Stream Processing to process real world events, connectivity, using the ESP Geofence window, applying analytics to streaming data, using SAS Event Stream Processing in a typical IoT reference architecture, the role of SAS Event Stream Manager in managing ESP deployments in an IoT ecosystem, how to use deep learning with Your IoT Digital, accounting for data quality variability in streaming GPS data for location-based analytics, and more!




Keep an eye out for these titles releasing in the next two months! We hope this list will help in your search for a SAS book that will get you to the next step in updating your SAS skills. To learn more about SAS Press, check out our up-and-coming titles, and to receive exclusive discounts make sure to subscribe to our newsletter.

Foresight is 2020! New books to take your skills to the next level was published on SAS Users.

9月 232019

The SAS Global Forum 2020 call for content is open until Sept. 30, 2019. Are you thinking of submitting a paper? If so, we have a few tips adapted from The Global English Style Guide that will help your paper shine. By following Global English guidelines, your writing will be clearer and easier to understand, which can boost the effectiveness of your communications.

Even if you’re not planning on submitting a paper and producing technical information is not your primary job function, being aware of Global English guidelines can help you communicate more effectively with your colleagues from around the world.

1. Use Short Sentences

Short sentences are less likely to contain ambiguities or complexities. For task-oriented information, try to limit your sentences to 20 words. If you have written a long sentence, break it up into two or more shorter sentences.

2. Use Complete Sentences

Incomplete sentences can be confusing for non-native speakers because the order of sentence parts is different in other languages. In addition, incomplete sentences can cause machine-translation software to produce garbled results. For example, the phrases below are fragments that may cause issues for readers:

Original: Lots of info here. Not my best, but whatever. Waiting to hear back until I do anything.
Better: There is lots of info here. It’s not my best work, but I am waiting to hear back until I do anything.

An extremely common location to encounter sentence fragments is in the introductions to lists. For example, consider the sentence introducing the list below.

The programs we use for analysis are:
• SAS Visual Analytics
• SAS Data Mining and Machine Learning
• SAS Visual Investigator

If you use an incomplete sentence to introduce a list, consider revising the sentence to be complete, then continue on to the list, as shown below.

We use the following programs in our analysis:
• SAS Visual Analytics
• SAS Data Mining and Machine Learning
• SAS Visual Investigator

3. Untangle Long Noun Phrases

A noun phrase can be a single noun, or it can consist of a noun plus one or more preceding words such as articles, pronouns, adjectives, and other nouns. For example, the following sentence contains a noun phrase with 6 words:

The red brick two-story apartment building was on fire.

Whenever possible, limit noun phrases to no more than three words while maintaining comprehensibility.

4. Expand -ED Verbs That Follow Nouns Whenever Possible

A past participle is the form of a verb that usually ends in -ed. It can be used as both the perfect and past tense of verbs as well as an adjective. This double use can be confusing for non-native English speakers. Consider the following sentence:

This is the algorithm used by the software.

In this example, the word “used” is an adjective, but it may be mistaken for a verb. Avoid using -ed verbs in ambiguous contexts. Instead, add words such as “that” or switch to the present tense to help readers interpret your meaning. A better version of the previous example sentence follows.

This is the algorithm that is used by the software.

5. Always Revise -ING Verbs That Follow Nouns

The name of this tip could have been written as “Always Revise -ING Verbs Following Nouns.” But that’s exactly what we want to avoid! If an -ING word immediately follows and modifies a noun, then either expand it or eliminate it. These constructions are ambiguous and confusing.

6. Use “That” Liberally

The word “that” is your friend! In English, the word “that” is often omitted before a relative clause. If it doesn’t feel unnatural or forced, try to include “that” before these clauses, as shown in the following example.

Original: The file you requested could not be located.

Better: The file that you requested could not be located.

7. Choose Simple, Precise Words That Have a Limited Range of Meanings

We are not often accustomed to thinking about the alternative meanings for the words we use. But consider that many words have multiple meanings. If translated incorrectly, these words could make your writing completely incomprehensible and possibly ridiculous. Consider the alternate meanings of a few of the words in the following sentences:

When you hover over the menu, a box appears.

We are deploying containers in order to scale up efficiency.

8. Don’t Use Slang, Idioms, Colloquialisms, or Figurative Language

In the UK, retail districts are called “high street.” In the US, they are called “main street.” This is just one example of how using colloquialisms can cause confusion. And Brits and Americans both speak the same language!

Especially in formal communications, keep your writing free of regional slang and idioms that cannot be easily understood by non-native English speakers. Common phrases such as “under the weather,” “piece of cake,” or “my neck of the woods” make absolutely no sense when translated literally.

We hope these Global English tips help you write your Global Forum paper, or any other communications you might produce as part of your work. For more helpful tips, read The Global English Style Guide: Writing Clear, Translatable Documentation for a Global Market by John R. Kohl.

Top 8 Global English Guidelines was published on SAS Users.

4月 162019

This blog post is based on the Code Snippets tutorial video in the free SAS® Viya® Enablement course from SAS Education. Keep reading to learn more about code snippets or check out the video to follow along with the tutorial in real-time.

Has there ever been a block of code that you use so infrequently that you always seem to forget the options that you need? Conversely, has there ever been a block of code that you use so frequently that you grow tired of typing it all the time? Code snippets can greatly assist with both of these scenarios. In this blog post, we discuss using pre-installed code snippets and creating new code snippets within SAS Viya.

Pre-installed code snippets

Figure 1: Pre-installed Snippets

SAS Viya comes with several code snippets pre-installed, including snippets to connect to CAS. To access these snippets, expand the Snippets area on the left navigation panel of SAS Studio as shown in Figure 1. You can see that the snippets are divided into categories, making it easier to find them.

If you double-click a pre-installed code snippet, or if you click and drag the snippet into the code editor panel, then the snippet will appear in the panel.

Snippets can range from very simple to very complex. Some contain comments. Some contain macro variables. Some might be only a couple of lines of code. That is the advantage of snippets. They can be anything that you want them to be.



Create new snippets

Now, let’s create a snippet of our own. Figure 2 shows an example of code that calls PROC CARDINALITY. This code is complete and fully executable. When you have the code the way that you want in your code window, click on the shortcut button for Add to My Snippets above the code. The button is outlined in a box in Figure 2.

Figure 2: Add to My Snippets Button

A window will appear that asks you to name the snippet. Naming the snippet then saves it into the My Snippets area in the left navigation panel for future use.

Remember that snippets are extremely flexible. The code that you save does not have to be fully executable. Instead of supplying the data source in your code, you may instead include notes or comments about what needs to be added, which makes the code more general, but it is still a very useful snippet.

To use one of your saved snippets, simply navigate to the My Snippets area, then double-click on your snippet or drag it into the code window.

Want to learn more about SAS Viya? Download the free e-book Exploring SAS® Viya®: Programming and Data Management. The content in this e-book is based on SAS® Viya® Enablement," a free course available from SAS Education.

Using code snippets in SAS® Viya® was published on SAS Users.