The other day I encountered a SAS Knowledge Base article that shows how to count the number of missing and nonmissing values for each variable in a data set. However, the code is a complicated macro that is difficult for a beginning SAS programmer to understand. (Well, it was hard for *me* to understand!) The code not only counts the number of missing values for each variable, but also creates a SAS data set with the complete results. That's a nice bonus feature, but it contributes to the complexity of the macro.

This article simplifies the process and shows an alternative way to count the number of missing and nonmissing values for each variable in a data set.

### The easy case: Count missing values for numeric variables

If you are only interested in the number of missing values for numeric variables, then a single call to the MEANS procedure computes the answer:

    /* create sample data */
    data one;
    input a $ b $ c $ d e;
    cards;
    a . a 1 3
    . b . 2 4
    a a a . 5
    . . b 3 5
    a a a . 6
    a a a . 7
    a a a 2 8
    ;
    run;

    proc means data=one NMISS N;
    run;

In many SAS procedures, including PROC MEANS, you can omit the VAR statement in order to operate on all relevant variables. For the MEANS procedure, "relevant" means "numeric."
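Conversely, if you want counts for only some of the numeric variables, you can list them on a VAR statement. The following is a minimal sketch that assumes the `one` data set from the previous step still exists in the WORK library; listing `d` and `e` is only for illustration, since they are the only numeric variables in the sample data:

```sas
/* Assumes the data set ONE was created by the DATA step above */
proc means data=one NMISS N;
   var d e;   /* explicit list of numeric variables to analyze */
run;
```

For the sample data, `d` has three missing values and `e` has none.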

### Count missing values for all variables

The MEANS procedure computes statistics for numeric variables, but other SAS procedures enable you to count the number of missing values for character and numeric variables.

The FREQ procedure is a SAS workhorse that I use almost every day. To get the FREQ procedure to count missing values, use three tricks:

- Specify a format for the variables so that the missing values all have one value and the nonmissing values have another value. PROC FREQ groups a variable's values according to the formatted values.
- Specify the MISSING and MISSPRINT options on the TABLES statement.
- Use the `_CHAR_` and `_NUMERIC_` keywords on the TABLES statement to specify that the FREQ procedure should compute statistics for all character or all numeric variables.

The following statements count the number of missing and nonmissing values for every variable: first the character variables and then the numeric ones.

    /* create a format to group missing and nonmissing */
    proc format;
     value $missfmt ' '='Missing' other='Not Missing';
     value  missfmt  . ='Missing' other='Not Missing';
    run;

    proc freq data=one;
    format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */
    tables _CHAR_ / missing missprint nocum nopercent;
    format _NUMERIC_ missfmt.;
    tables _NUMERIC_ / missing missprint nocum nopercent;
    run;

### Using the SAS/IML language to count missing values

In the SAS/IML language, you can use the COUNTN and COUNTMISS functions that were introduced in SAS/IML 9.22. Strictly speaking, you need to use only one of the functions, since the result of the other is determined by knowing the number of observations in the data set. For the sake of the example, I'll be inefficient and use both of the functions.
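To illustrate the remark about needing only one function, here is a minimal sketch for the numeric variables. It assumes the `one` data set from the first example exists, and it derives the nonmissing counts by subtracting the missing counts from the number of rows; the matrix names are my own choices for illustration:

```sas
proc iml;
use one;                              /* assumes the data set ONE from the first example */
read all var _NUM_ into x[colname=nNames];
close one;
nmiss = countmiss(x, "col");          /* number of missing values in each column */
n = nrow(x) - nmiss;                  /* nonmissing count = rows minus missing */
rNames = {"Missing", "Not Missing"};
result = nmiss // n;
print result[r=rNames c=nNames label=""];
quit;
```

The same subtraction should work for the character variables, because COUNTMISS accepts character matrices as well.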

As is the case for the PROC FREQ example, the trick is to use the `_CHAR_` and `_NUM_` keywords to read in and operate on the character and numeric variables in separate steps:

    proc iml;
    use one;
    read all var _NUM_ into x[colname=nNames];
    n = countn(x,"col");
    nmiss = countmiss(x,"col");

    read all var _CHAR_ into x[colname=cNames];
    close one;
    c = countn(x,"col");
    cmiss = countmiss(x,"col");

    /* combine results for num and char into a single table */
    Names = cNames || nNames;
    rNames = {" Missing", "Not Missing"};
    cnt = (cmiss // c) || (nmiss // n);
    print cnt[r=rNames c=Names label=""];

This is similar to the output produced by the macro in the SAS Knowledge Base article. You can also write the `cnt` matrix to a data set, if necessary.
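If you do want the counts in a data set, one possible approach is the CREATE and APPEND statements in SAS/IML. The following sketch repeats the computation so that it can run on its own. It assumes the `one` data set from the first example exists, and `MissCounts` is a hypothetical name that I chose for the output data set:

```sas
proc iml;
use one;                              /* assumes the data set ONE from the first example */
read all var _NUM_ into x[colname=nNames];
n = countn(x, "col");
nmiss = countmiss(x, "col");
read all var _CHAR_ into x[colname=cNames];
close one;
c = countn(x, "col");
cmiss = countmiss(x, "col");

/* combine results for char and num into a single table */
Names = cNames || nNames;
rNames = {"Missing", "Not Missing"};
cnt = (cmiss // c) || (nmiss // n);

/* write the table to a data set; MissCounts is a hypothetical name */
create MissCounts from cnt[rowname=rNames colname=Names];
append from cnt[rowname=rNames];
close MissCounts;
quit;
```

The resulting data set should contain two observations: a character variable named `rNames` that holds the row labels, plus one numeric variable of counts for each variable in the original data.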