The DOLIST syntax: Specify a list of numerical values in SAS
Have you ever heard of the DOLIST syntax? You might know the syntax even if you are not familiar with the name. The DOLIST syntax is a way to specify a list of numerical values to an option in a SAS procedure. Applications include:
- Specify the end points for bins of a histogram
- Specify percentiles to be output to a data set
- Specify tick marks for a custom axis on a graph
- Specify the location of reference lines on a graph
- Specify a list of parameters for an algorithm. Examples include smoothing parameters (the SMOOTH= option in PROC LOESS), sample sizes (the NTOTAL= option in PROC POWER), and initial guesses for parameters in an optimization (the PARMS statement in PROC NLMIXED and PROC NLIN)
This article demonstrates how to use the DOLIST syntax to specify a list of values in SAS procedures.
The DOLIST syntax enables you to write a single statement that specifies individual values and one or more sequences of values. The DOLIST syntax should be in the toolbox of every intermediate-level SAS programmer!
The DOLIST syntax in the SAS DATA step
According to the documentation of PROC POWER, the syntax described in this article is sometimes called the DOLIST syntax because it is based on the syntax for the iterative DO loop in the DATA step.
The most common syntax for a DO loop is DO x = start TO stop BY increment. For example, DO x = 10 TO 90 BY 20; iterates over the sequence of values 10, 30, 50, 70, and 90. If the increment is 1, you can omit the BY increment portion of the statement. You can also specify values as a comma-separated list, such as DO x = 10, 30, 50, 70, 90;, which generates the same values. What you might not know is that you can combine these two methods. For example, in the following DATA step, the values are specified by using two comma-separated lists and three sequences. For clarity, I have placed each list on a separate line, but that is not necessary:
/* the DOLIST syntax for a DO loop in the DATA step */
data A;
do pctl = 5,                /* individual value(s) */
          10 to 50 by 20,   /* a sequence of values */
          54.3, 69.1,       /* individual value(s) */
          80 to 90 by 5,    /* another sequence */
          60 to 40 by -20;  /* yet another sequence */
   output;
end;
run;

proc print; run;
The output (not shown) is a list of values: 5, 10, 30, 50, 54.3, 69.1, 80, 85, 90, 60, 40. Notice that the values do not need to be in sorted order, although they often are.
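Two details of the sequence expression are easy to overlook. The following sketch (the data set name B and the variable x are arbitrary choices for illustration) shows that the increment defaults to 1 when the BY clause is omitted, and that a sequence ends at the last value that does not pass the stop value, so the stop value itself might not appear:

```sas
/* a sketch: BY is optional, and the stop value need not be hit exactly */
data B;
do x = 1 to 3,            /* increment defaults to 1: 1, 2, 3 */
       10 to 50 by 15;    /* 10, 25, 40; the next value (55) passes 50 */
   output;
end;
run;

proc print data=B; run;
```

The printed values are 1, 2, 3, 10, 25, and 40.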
The expressions to the right of the equal sign are what I mean by the "DOLIST syntax." You can use the same syntax to specify a list of values for an option in many SAS procedures. When the SAS documentation says that an option takes a "list of values," you can often use a comma-separated list, a space-separated list, and the syntax start TO stop BY increment. (Or a combination of these expressions!) The following sections provide a few examples, but there are literally hundreds of options in SAS that support the DOLIST syntax!
Some procedures (for example, PROC SGPLOT) require the DOLIST values to be in parentheses. Consequently, I have adopted the convention of always using parentheses around DOLIST values, even if the parentheses are not strictly required. As far as I know, it is never wrong to put the DOLIST inside parentheses, and it keeps me from having to remember whether parentheses are required. The examples in this article all use parentheses to enclose DOLIST values.
Histogram bins and percentiles
You can use the DOLIST syntax to specify the endpoints of bins in a histogram. For example, in PROC UNIVARIATE, the ENDPOINTS= option in the HISTOGRAM statement supports a DOLIST. Because histograms use evenly spaced bins, usually you will specify only one sequence, as follows:
proc univariate data=sashelp.cars;
   var weight;
   histogram weight / endpoints=(1800 to 7200 by 600);   /* DOLIST sequence expression */
run;
You can also use the DOLIST syntax to specify percentiles. For example, the PCTLPTS= option on the OUTPUT statement enables you to specify which percentiles of the data should be written to a data set:
proc univariate data=sashelp.cars;
   var MPG_City;
   output out=UniOut pctlpre=P_
          pctlpts=(50 75, 95 to 100 by 2.5);   /* DOLIST */
run;
Notice that this example specifies both individual percentiles (50 and 75) and a sequence of percentiles (95, 97.5, 100).
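To check which percentiles were written, you can print the output data set. This is a minimal sketch that assumes the UniOut data set was created by the preceding PROC UNIVARIATE step. Each variable name is formed from the PCTLPRE= prefix and the percentile value; as far as I know, a decimal point in the percentile is replaced by an underscore, so the 97.5th percentile should appear as P_97_5:

```sas
/* assumes UniOut was created by the preceding PROC UNIVARIATE step */
proc print data=UniOut noobs;
run;
```

The data set has one row and one column for each of the five requested percentiles.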
Tick marks and reference lines
The SGPLOT procedure enables you to specify the locations of tick marks on the axis of a graph. Most of the time you will specify an evenly spaced set of values, but (just for fun) the following example shows how you can use the DOLIST syntax to combine evenly spaced values and a few custom values:
title "Specify Ticks on the Y Axis";
proc sgplot data=sashelp.cars;
   scatter x=Weight y=Mpg_City;
   yaxis grid values=(10 to 40 by 5, 50 60);   /* DOLIST; commas optional */
run;
As shown in the previous example, the GRID option on the XAXIS and YAXIS statements enables you to display reference lines at each tick location. However, sometimes you want to display reference lines independently from the tick marks. In that case, you can use the REFLINE statement, as follows:
title "Many Reference Lines";
proc sgplot data=sashelp.cars;
   scatter x=Weight y=MPG_City;
   refline (1800 to 6000 by 600, 7000) / axis=x;   /* many reference lines */
run;
Many statistical procedures have options that support lists. In most cases, you can use the DOLIST syntax to provide values for the list.
I have already written about how to use the DOLIST syntax to specify initial guesses for the PARMS statement in PROC NLMIXED and PROC NLIN. The documentation for the POWER procedure discusses how to specify lists of values and uses the term "DOLIST" in its discussion.
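As a small illustration of a list-valued option in PROC POWER, the following sketch requests the power of a two-sample t test for several total sample sizes. The mean difference (5) and the standard deviation (10) are made-up values chosen only for illustration. Unlike the other examples in this article, the sketch leaves off the parentheses, because PROC POWER uses parentheses to group values and, to my knowledge, its documentation writes plain number lists without them:

```sas
/* a sketch with hypothetical effect-size values */
proc power;
   twosamplemeans test=diff
      meandiff = 5                 /* hypothetical mean difference */
      stddev   = 10                /* hypothetical common standard deviation */
      ntotal   = 50 to 200 by 50   /* DOLIST: 50, 100, 150, 200 */
      power    = .;                /* solve for power at each sample size */
run;
```

The procedure should report one computed power value for each of the four sample sizes.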
Some statistical procedures enable you to specify multiple parameter values, and the analysis is repeated for each parameter in the list. One example is the SMOOTH= option in the MODEL statement of the LOESS procedure. The SMOOTH= option specifies values of the loess smoothing parameter. The following call to PROC LOESS fits four loess smoothers to the data. The call to PROC SGPLOT overlays the smoothers on a scatter plot of the data:
proc loess data=sashelp.cars plots=none;
   model MPG_City = Weight / smooth=(0.1 to 0.5 by 0.2, 0.75);   /* value-list */
   output out=LoessOut P=Pred;
run;

proc sort data=LoessOut;
   by SmoothingParameter Weight;
run;

proc sgplot data=LoessOut;
   scatter x=Weight y=MPG_City / transparency=0.9;
   series x=Weight y=Pred / group=SmoothingParameter curvelabel curvelabelpos=min;
run;
In summary, this article describes the DOLIST syntax in SAS, which enables you to simultaneously specify individual values and evenly spaced sequences of values. A sequence is specified by using the start TO stop BY increment syntax. The DOLIST syntax is valid in many SAS procedures and in the DATA step. In some procedures (such as PROC SGPLOT), the syntax needs to be inside parentheses. For readability, you can use commas to separate individual values and sequences.
Many SAS procedures accept the special syntax even if it is not explicitly mentioned in the documentation. In the documentation for an option, look for terms such as value-list or numlist or value-1 <...value-n>, which indicate that the option supports the DOLIST syntax.
The post The DOLIST syntax: Specify a list of numerical values in SAS appeared first on The DO Loop.