Writing file contents to the SAS® log
Perhaps you are reading a file and seeing “invalid data” messages in the log or you are not sure which delimiter is used between variables. You can use the null INPUT statement to read a small sample of the data. Specify the amount of data that you want to read by adjusting options in the INFILE statement. This example reads one record with 500 bytes:
data _null_; infile 'path' recfm=f lrecl=500 obs=1; input; list; run;
The LIST statement writes the current record to the log. In addition, it includes a ruled line above the record so that it is easier to see the data for each column of the file. If the data contains at least one unprintable character, you will see three lines of output for each record written.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+ 1 CHAR this is a test.123.dog.. 24 ZONE 766726726276770333066600 NUMR 48930930104534912394F7DA
The line beginning with CHAR describes the character that is represented by the two hexadecimal characters underneath. When you see a period in that line, it might actually be a period, but it often means that the hexadecimal characters do not map to a specific key on the keyboard. Also, column 15 for this data is a tab ('09'x). By looking at this record in the log, you can see that the first variable contains spaces and that the file is tab delimited.
Writing hexadecimal output for all input
Since SAS® 9.4M6 (TS1M6), the HEXLISTALL option in the LIST statement enables all lines of input data to be written in hexadecimal format to the SAS log, regardless of whether there are unprintable characters in the data:
data mylib.new / hexlistall;
Removing “invalid data” messages from the log
When SAS reads data into a data set, it is common to encounter values that are invalid based on the variable type or the informat that is specified for a variable. You should examine the SAS log to assess whether the informat is incorrect or if there are some records that are actually invalid. After you ensure that the data is being read correctly, you can dismiss any “invalid data” messages by using double question mark (??) modifiers after the variable name in question:
input date ?? mmddyy10.;
You can also use the question mark modifiers when you convert a character variable to numeric with the INPUT function:
The above syntax reads all values with the MMDDYY10. informat and then dismisses the notes to the log when some values for the OLDDATE variable are invalid.
Sharing files between UNIX and Microsoft Windows operating systems
The end-of-record markers in text files are different for UNIX and Windows operating systems. If you needed to share a file between these systems, the file used to need preprocessing in order to change the markers to the desired type or perhaps specify a delimiter. Now, the TERMSTR= option in the INFILE statement enables you to specify the end-of-record marker in the incoming file.
If you are working in a UNIX environment and you need to read a file that was created in a Windows environment, use the TERMSTR=CRLF option:
infile 'file-specification' termstr=crlf ;
If you are in a Windows environment and you need to read a file that was created in a UNIX environment, use this syntax:
infile 'file-specification' termstr=lf ;
Adapting array values from an ARRAY statement
The VNAME function makes it very convenient to use the variable names from an ARRAY statement. You most often use an ARRAY statement to obtain the values from numerous variables. In this example, the SQL procedure makes it easy to store the unique values of the variable Product into the macro variable &VARLIST and that number of values into the macro variable &CT (another easy tip). Within the DO loop, you obtain the names of the variables from the array, and those values from the array then become variable names.
proc sql noprint; select distinct(product) into :varlist separated by ' ' from one; select count(distinct product) into :ct from one; quit; …more DATA step statements… array myvar(&ct) $ &varlist; do i=1 to &ct; if product=vname(myvar(i)) then do; myvar(i)=left(put(contract,8.)); end; …more DATA step statements…
Using a mathematical equation in a DO loop
A typical DO loop has a beginning and an end, both represented by integers. Did you know you can use an equation to process data more easily? Suppose that you want to process every four observations as one unit. You can run code similar to the following:
…more DATA step statements… J+1; do i=((j*4)-3) to (j*4); set data-set-name point=I; …more DATA step statements… end;
Using an equation to point to an array element
With small data sets, you might want to put all values for all observations into a single observation. Suppose that you have a data set with four variables and six observations. You can create an array to hold the existing variables and also create an array for the new variables. Here is partial code to illustrate how the equation dynamically points to the correct new variable name in the array:
array old(4) a b c d; array test (24) var1-var24; retain var1-var24; do i=1 to nobs; set w nobs=nobs point=i; do j=1 to 4; test(((i-1)*4)+j)=old(j); end; end;
Creating a hexadecimal reference chart
How often have you wanted to know the hexadecimal equivalent for a character or vice versa? Sure, you can look up a reference chart online, but you can also create one with a short program. The BYTE function returns values in your computer’s character set. The PUT statement writes the decimal value, hexadecimal value, and equivalent character to the SAS log:
data a; do i=0 to 255; x=byte(i); put i=z3. i=hex2. x=; end; run;
When the resulting character is unprintable, such as a carriage return and line feed (CRLF) character, the X value is blank.
Hopefully, these new tips will be useful in your future programming. If you know some additional tips, please comment them below so that readers of this blog can add even more DATA step tips to their arsenal! If you would like to learn more about DATA step, check out these other blogs:
- A DATA step compiler trick to get the record count
- How to achieve repeatable results with distributed DATA Step BY Groups
- DATA STEP text file tricks