In SAS procedures, the WHERE clause is a useful way to filter observations so that the procedure receives only a subset of the data to analyze. The IML procedure supports the WHERE clause in two separate statements.

- On the USE statement, the WHERE clause acts as a global filter. The where clause applies to all subsequent READ statements that read from the open data set.
- On the READ statement, the WHERE clause further filters the data. Because you can use multiple READ statements, you can easily create matrices that contain disjoint or overlapping subsets of the data. However, be aware multiple READ statements might result in SAS reading the data multiple times, depending on the syntax that you specify.

An interesting fact about the WHERE clause in SAS/IML is that you can specify run-time expressions for the WHERE clause, which makes it a very powerful tool for data analysis.

*The WHERE clause in SAS/IML: filter data with run-time expressions #SAStip*

Click To Tweet

### The WHERE clause in the USE statement

The USE statement opens a SAS data set for read access. When you need only one filter, specify a WHERE clause on the USE statement. For example, suppose that you want a matrix that contains the age, weight, and height of all females in the Sashelp.Class data. The following program reads the female observations into the matrix X and prints the average age, weight, and height:

proc iml; varNames = {"Age" "Weight" "Height"}; use Sashelp.Class where(sex='F'); /* restrict to females */ read all var varNames into X; close Sashelp.Class; avg = mean(X); print avg[L="Mean Values for Males" colname=varNames format=5.2]; |

### The WHERE clause in the READ statement

You can also put the WHERE clause in the READ statement. This technique is useful if you intend to read the data several times. For example, the following program reads data for females into the X matrix and data for males into the Y matrix:

use Sashelp.Class; read all var varNames into X where(sex='F'); read all var varNames into Y where(sex='M'); close Sashelp.Class; |

If you use a WHERE clause on both the USE and READ statements, the SAS log will include the NOTE

`NOTE: WHERE clause has been augmented`

to inform you that the data filter combines both WHERE clauses by using a "logical AND" operator.

### Expressions in the WHERE clause

Beginning with SAS/IML 13.1 (released with SAS 9.4m1), you can use expressions in WHERE clauses. This means that you can call the READ statement in loop. During each iteration, you can read and analyze various subsets of the data during each iteration.

For example, suppose that you have several grouping variables and you want to conduct a BY-group analysis. You can use the UNIQUEBY technique to conduct a BY-group analysis with several variables. However, the UNIQUEBY technique requires that the data be sorted and fit in RAM. It also requires a bit of "bookkeeping" because you need to keep track of indices. If you don't mind the inefficiency of reading the data multiple times, a WHERE clause approach is conceptually easier to program.

As an example, suppose that you want to analyze the MPG_City variable in the Sashelp.Cars data set for each combinations of the Origin and Type variables. To keep it simple, suppose that you want to compute the mean value of MPG_City for all pairwise combinations of the Origin and Type variables, excluding the observations for American-made vehicles. This analysis is simple by using PROC MEANS. (The output for PROC MEANS is not shown.)

proc means data=Sashelp.Cars Mean; where Origin ^= "USA"; class Origin Type; var MPG_City; run; |

In PROC IML, this computation requires looping over the valid combinations of Origin and Type. To make the analysis simpler, the following call to PROC FREQ writes the valid combinations to a SAS data set:

proc freq data=Sashelp.Cars noprint; where Origin ^= "USA"; tables Origin*Type / nocum norow nocol nopercent out=FreqOut; /* unique combinations of Origin and Type */ run; |

In PROC IML, you can read the FreqOut data to obtain the unique combinations of the Origin and Type variables. You can iterate over these combinations, reading the Sashelp.Cars data multiple times. During each iteration, you can analyze one of the BY groups, as follows:

proc iml; use FreqOut; read all var {Origin Type}; /* read unique levels of BY groups */ close FreqOut; NumGroups = nrow(Origin); use Sashelp.Cars where(Origin ^= "USA"); /* open data set for reading */ Stats = j(NumGroups, 2); /* allocate vector for results */ do i = 1 to NumGroups; /* for each BY group... */ /* read unsorted data to obtain the i_th BY group */ /* Notice the EXPRESSIONS in the WHERE clause! */ read all var {MPG_City} where(origin=(Origin[i]) & type=(Type[i])); Stats[i, 1] = countn(MPG_City); /* number of nonmissing obs */ Stats[i, 2] = mean(MPG_City); /* analyze this BY group */ end; close Sashelp.Cars; print Origin Type Stats[colname={"N" "Mean"}]; |

The result of the analysis is similar to the output from PROC MEANS. Notice the use of expressions in the WHERE clause in the READ statement. The expression `origin=(Origin[i])` is interpreted as follows:

- The left side of the equal sign (
`origin`) specifies the name of a variable in the open data set. - The right side of the equal sign must be enclosed in parentheses unless it is a literal constant.
- The expression inside the parentheses can be any matrix computation that results in a scalar value, including calls to built-in or user-defined functions.

The example program reads the data set 10 times, once for each unique combination of Origin and Type. Although re-reading data is inefficient, there are three advantages: the data set does not need to be sorted, only one BY group at a time is ever in RAM, and the program statements are easy to write. By using this method, you do not have to keep track of sorting, indexing, or extracting the data. The WHERE clause in SAS/IML does the work for you.

The post The WHERE clause in SAS/IML appeared first on The DO Loop.