The most fundamental concept that students learning introductory SAS programming must master is how SAS handles data. This might seem like an obvious statement, but it is often overlooked by students in their rush to produce code that works. I often tell my class to step back for a moment and "try to think like SAS" before they even touch the keyboard. There are many key topics that students must understand in order to be successful SAS programmers. How does SAS compile and execute a program? What is the built-in loop that SAS uses to process data observation by observation? What are the coding differences when working with numeric and character data? How does SAS handle missing observations?
One concept that is a common source of confusion for students is how to tell SAS to treat rows versus columns. An example that we use in class is how to write a program to calculate a basic descriptive statistic, such as the mean. The approach that we discuss is to identify our goal, rows or columns, and then decide what SAS programming statements are appropriate by thinking like SAS. First, we decide if we want to calculate the mean of an observation (a row) or the mean of a variable (a column). We also pause to consider other issues such as the type of variable, in this case numeric, and how SAS evaluates missing data. Once these concepts are understood we can proceed with an appropriate method: using DATA step programming, a procedure such as MEANS, TABULATE, REPORT or SQL, and so on. For more detailed information about this example there is an excellent user group paper on this topic called "Many Means to a Mean" written by Shannon Pileggi for the Western Users of SAS Software conference in 2017. In addition, The Little SAS® Book and its companion book, Exercises and Projects for the Little SAS® Book, Sixth Edition address these types of topics in easy-to-understand examples followed up with thought-provoking exercises.
Here is an example of the type of question that our book of exercises and projects uses to address this type of concept.
Short answer question
- Is there a difference between calculating the mean of three variables X1, X2, and X3 using the three methods as shown in the following examples of code? Explain your answer.
Avg1 = MEAN(X1,X2,X3); Avg2 = (X1 + X2 + X3) / 3; PROC MEANS; VAR X1 X2 X3; RUN;
In the book, we provide solutions for odd-numbered multiple choice and short answer questions, and hints for the programming exercises. Here is the solution for this question:
- The variable Avg1 that uses the MEAN function returns the mean of nonmissing arguments and will provide a mean value of X1, X2, and X3 for each observation (row) in the data set. The variable Avg2 that uses an arithmetic equation will also calculate the mean for each observation (row), but will return a missing value if any of the variables for that observation have a missing value. Using PROC MEANS will calculate the mean of nonmissing data for each variable (column) X1, X2, and X3 vertically.
For more information about The Little SAS Book and its companion book of exercises and projects, check out these blogs:
- The Little SAS Book 6.0: The Best-selling SAS Book Gets Even Better
- Expand Your SAS Knowledge by Learning PROC SQL