Amber Elam

January 11, 2020
 
In SAS 9.3 and earlier, the default value of the YEARCUTOFF= option is 1920. This default can cause data integrity issues because any 2-digit year of "20" in a date is assumed to be 1920 instead of 2020. If the intended year is 2020, you must set the YEARCUTOFF= option to a value larger than 1920 (and no later than 2020). Of course, the best alternative is to always specify date values with 4-digit years.

Luckily, SAS makes it easy to change the YEARCUTOFF= option so that it works best for your data. The default value for the YEARCUTOFF= option changed in SAS 9.4 to 1926. This change makes it easier for customers who are still using 2-digit years of "20" to make sure that the date is assigned to 2020. Let's review some of the questions that customers frequently ask about how SAS works with 2-digit years.

What is the YEARCUTOFF= option?

The YEARCUTOFF= option lets you specify which century SAS software should assign to dates with 2-digit years.

How do I specify the YEARCUTOFF= option in my SAS programs?

The option is specified in an OPTIONS statement. Here is an example:

options yearcutoff=1930;

You can also specify the option in an autoexec file or a config file. If you don't specify the YEARCUTOFF= option, the SAS system default is used. Remember that 1920 is the default for SAS 9.3 and earlier releases and 1926 is the default for SAS 9.4. (For reference, SAS 9.3 was released in 2011, and SAS 9.4 was first released in 2013.)
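If you are not sure which value is in effect for your session, you can ask SAS directly. The following is a minimal sketch that uses PROC OPTIONS and the GETOPTION function; the wording of the %PUT message is only an illustration.

```sas
/* Write the YEARCUTOFF= value that is currently in effect to the log */
proc options option=yearcutoff;
run;

/* Retrieve the same value with the GETOPTION function */
%put NOTE: YEARCUTOFF is currently %sysfunc(getoption(yearcutoff));
```

Checking the value first is a quick way to confirm whether a config file, an autoexec file, or an earlier OPTIONS statement has already changed the default.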

How does the YEARCUTOFF= option work?

The YEARCUTOFF= option specifies the first year of a 100-year window in which all 2-digit years are assumed to occur. For example, if the YEARCUTOFF= option is set to 1920, all 2-digit years are assumed to occur between 1920 and 2019. This means that 2-digit years from 20 through 99 are assigned a century prefix of "19" and 2-digit years from 00 through 19 are assigned a century prefix of "20."
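The 100-year window can be seen in a small test. This sketch assumes YEARCUTOFF=1920 and reads two made-up date strings with the MMDDYY8. informat; the variable names d1 and d2 are only for illustration.

```sas
options yearcutoff=1920;

data _null_;
   /* 2-digit year 20 is the first year of the window */
   d1 = input('01/11/20', mmddyy8.);
   /* 2-digit year 19 is the last year of the window */
   d2 = input('12/31/19', mmddyy8.);
   put d1= date9. d2= date9.;
   /* log shows: d1=11JAN1920 d2=31DEC2019 */
run;
```

Changing the OPTIONS statement to YEARCUTOFF=1926 and rerunning the step moves both dates into the 2000s, because 19 and 20 then fall in the later part of the 1926 to 2025 window.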

Which types of date values are affected by the YEARCUTOFF= option?

The YEARCUTOFF= option affects the interpretation of 2-digit years in the following cases:

  • Reading date values from external files
  • Specifying dates or year values in SAS functions
  • Specifying SAS date literals

The YEARCUTOFF= option does not influence the following cases:

  • Processing dates with 4-digit years
  • Processing dates already stored as SAS date values (the number of days since January 1, 1960)
  • Displaying dates with SAS date formats
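The two lists above can be checked with a short DATA step. This is a sketch that assumes YEARCUTOFF=1926 (the SAS 9.4 default); the variable names are hypothetical.

```sas
options yearcutoff=1926;

data _null_;
   lit2 = '11JAN20'd;      /* date literal with a 2-digit year: uses YEARCUTOFF= */
   lit4 = '11JAN1920'd;    /* date literal with a 4-digit year: not affected     */
   fn2  = mdy(1, 11, 20);  /* function call with a 2-digit year: uses YEARCUTOFF= */
   put lit2= date9. lit4= date9. fn2= date9.;
run;
```

With this setting, the 2-digit values should resolve to 2020, whereas the 4-digit literal stays in 1920 regardless of the option. Once the values are stored as SAS date values, later changes to YEARCUTOFF= do not alter them.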

Which value should I set the YEARCUTOFF= option to?

The optimal value depends on the range of dates in your data. The YEARCUTOFF= option should be set so that the 100-year range encompasses the range of your data values. In general, SAS recommends setting the YEARCUTOFF= option to a value equal to or slightly less than the first year in your data. For example, if the range of dates that you are processing is from 1930 - 2010, a YEARCUTOFF value of 1925 or 1930 would be appropriate. If you set YEARCUTOFF=1925, then all 2-digit years are assumed to be in the 100-year period from 1925 to 2024. If all the dates in your data fall within that range, they will be interpreted correctly.

What do I do if my dates with 2-digit years span more than 100 years?

The YEARCUTOFF= option cannot reliably assign centuries to 2-digit years if the range of dates for a variable is greater than 100 years. If the date ranges for a variable span more than 100 years, you must either specify the dates with 4-digit years or use DATA step logic to assign a century to each year (perhaps based on the value of another variable).
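Here is one way that such DATA step logic might look. It is only a sketch: the data set name, the variable names, and the cent_flag coding are all hypothetical, and your data needs some other variable that actually identifies the century.

```sas
data events;
   /* yy is the 2-digit year; cent_flag is a hypothetical indicator:   */
   /* 'A' = 1800s, 'B' = 1900s, 'C' = 2000s                             */
   input id yy cent_flag $;
   select (cent_flag);
      when ('A') year4 = 1800 + yy;
      when ('B') year4 = 1900 + yy;
      when ('C') year4 = 2000 + yy;
      otherwise  year4 = .;      /* unknown century: leave the year missing */
   end;
   datalines;
1 95 A
2 95 B
3 05 C
;
run;
```

Because the century is assigned explicitly, the result does not depend on the YEARCUTOFF= setting. The 4-digit year can then be passed to a function such as MDY to build a full SAS date value.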

But why does the YEARCUTOFF= option allow only a 100-year span? If the option allowed a longer span, some 2-digit years would match more than one year in the window, and there would be no way to determine which century was intended. For example, let's assume that YEARCUTOFF=1950 with a 150-year span and that your external data file has 2-digit years. In this scenario, the 150-year span would run from 1950 to 2099. Because the years have only 2 digits, there would be no way to determine whether the year 50 was meant to be 1950 or 2050.

How do I change the default setting for all the SAS users at my site?

Setting system default option values is usually done by a site SAS Installation Representative. The recommended method for setting a system default YEARCUTOFF= value is to specify the desired value in the system SAS configuration file. Note that even if you set a default value for all the users at your site, they can override the default value in their SAS programs, in personal autoexec files, in config files, by setting an environment variable, or when invoking SAS software.

How do I change the default setting for my own programs if I want a default that is different from the rest of the users at my site?

You can specify personal default values either in a personal configuration file or in an autoexec file. If you specify the value in a personal configuration file, the syntax depends on your operating system and is the same as that for setting the value in the system-wide configuration file on each system. If you use an autoexec file, you can specify the YEARCUTOFF= option in an OPTIONS statement.  Here is an example:

options yearcutoff=1930;
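If only one part of a program needs a different window, one possible approach is to save the current setting, change it, and restore it afterward. The following sketch assumes a cutoff of 1950 for the special section; the macro variable name saved_cutoff and the sample date string are hypothetical.

```sas
/* Remember the value that is in effect right now */
%let saved_cutoff = %sysfunc(getoption(yearcutoff));

/* Use a different 100-year window (1950 to 2049) for this section only */
options yearcutoff=1950;

data _null_;
   d = input('03/15/55', mmddyy8.);   /* 2-digit year 55 falls in 1955 here */
   put d= date9.;
run;

/* Put the original value back for the rest of the program */
options yearcutoff=&saved_cutoff;
```

Restoring the saved value keeps later steps from silently inheriting the temporary setting.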

Additional Resources

  • YEARCUTOFF= System Option section in SAS® 9.3 System Options: Reference, Second Edition
  • YEARCUTOFF= System Option section in SAS® 9.4 System Options: Reference, Fifth Edition
  • SAS Note 46368, "The default value for the YEARCUTOFF= system option has changed in SAS® 9.4 and beyond"
  • SAS Note 65307, "You might encounter an issue in which 2-digit year dates have the wrong century in SAS® 9.3 and earlier releases"

Why does my SAS date have the wrong century? was published on SAS Users.

October 21, 2017
 

Reading an external file that contains delimiters (commas, tabs, or other characters such as a pipe character or an exclamation point) is easy when you use the IMPORT procedure. It's easy as long as the variable names are on row 1, the data starts on row 2, and the first 20 rows are a good sample of your data. Unfortunately, most delimited files are not created with those restrictions in mind. So how do you read files that do not follow those restrictions?

You can still use PROC IMPORT to read the comma-, tab-, or otherwise-delimited files. However, depending on the circumstances, you might have to add the GUESSINGROWS= statement to PROC IMPORT or you might need to pre-process the delimited file before you use PROC IMPORT.

Note: PROC IMPORT is available only for use in the Microsoft Windows, UNIX, or Linux operating environments.

The following sections explain four different scenarios for using PROC IMPORT to read files that contain the delimiters that are listed above.

Scenario 1

In this scenario, I use PROC IMPORT to read a comma-delimited file that has variable names on row 1 and data starting on row 2, as shown below:

proc import datafile='c:\temp\classdata.csv' 
out=class dbms=csv replace;
run;

 

When I submit this code, the following message appears in my SAS® log:

NOTE: Invalid data for Age in line 28 9-10.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
28        Janet,F,NA,62.5,112.5 21
Name=Janet Sex=F Age=. Height=62.5 Weight=112.5 _ERROR_=1 _N_=27
NOTE: 38 records were read from the infile 'c:\temp\classdata.csv'.
      The minimum record length was 17.
      The maximum record length was 21.
NOTE: The data set WORK.CLASS has 38 observations and 5 variables.

 

In this situation, how do you prevent the Invalid Data message in the SAS log?

By default, SAS scans the first 20 rows to determine variable attributes (type and length) when it reads a comma-, tab-, or otherwise-delimited file. Beginning in SAS® 9.1, a new statement (GUESSINGROWS=) is available in PROC IMPORT that enables you to tell SAS how many rows you want it to scan in order to determine variable attributes. In SAS 9.1 and SAS® 9.2, the GUESSINGROWS= value can range from 1 to 32767. Beginning in SAS® 9.3, the GUESSINGROWS= value can range from 1 to 2147483647. Keep in mind that the more rows you scan, the longer it takes for PROC IMPORT to run.

The following program illustrates the use of the GUESSINGROWS= statement in PROC IMPORT:

proc import datafile='c:\temp\classdata.csv' 
out=class dbms=csv replace;
guessingrows=100;
run;

 

The example above includes the statement GUESSINGROWS=100, which instructs SAS to scan the first 100 rows of the external file for variable attributes. You might need to increase the GUESSINGROWS= value to something greater than 100 to obtain the results that you want.
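Once the scan reaches the row that contains NA, PROC IMPORT is likely to define Age as a character variable, because that column now holds non-numeric text. If you need a numeric age, a follow-up step such as the sketch below can convert it. The data set name class_clean and the variable name age_num are hypothetical, and the step assumes that Age was imported as character.

```sas
data class_clean;
   set class;
   /* The ?? modifier suppresses the invalid-data note for values such as NA, */
   /* so those rows simply get a missing numeric value.                        */
   age_num = input(age, ?? best12.);
run;
```

This keeps the original text in Age for reference while giving you a numeric variable for analysis.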

Scenario 2

In this scenario, my delimited file has the variable names on row 4 and the data starts on row 5. When you use PROC IMPORT, you can specify the record number at which SAS should begin reading. However, PROC IMPORT can extract variable names only from the first row of a comma-, tab-, or otherwise-delimited file.

Then how do you program PROC IMPORT so that it begins reading from a specified row?

To do that, you need to allow SAS to assign the variable names in the form VARx (where x is a sequential number). The following code illustrates how you can skip the first rows of the file, including the variable names on row 4, and start reading the data from row 5 by allowing SAS to assign the variable names:

proc import datafile='c:\temp\class.csv' out=class dbms=csv replace;
getnames=no;
datarow=5;
run;
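After the import, you will probably want to replace the generated VARx names with meaningful ones. The following sketch uses PROC DATASETS for that. It assumes that class.csv has the same five columns as the file in Scenario 1 (Name, Sex, Age, Height, Weight), which the original example does not confirm.

```sas
proc datasets library=work nolist;
   modify class;
   /* Assumed column order; adjust the names to match row 4 of your file */
   rename var1=Name var2=Sex var3=Age var4=Height var5=Weight;
quit;
```

PROC DATASETS changes the names in place, so the data set does not have to be read and rewritten.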

 

Scenario 3

In this scenario, I want to read only records 6–15 (inclusive) in the delimited file. So the question here is how can you set PROC IMPORT to read just a section of a delimited file?

To do that, you need to use the OBS= option before you execute PROC IMPORT and use the DATAROW= option within PROC IMPORT.

The following example reads the middle ten rows of a CSV file, starting at row 6:

options obs=15;

proc import out=work.test2
            datafile="c:\temp\class.csv"
            dbms=csv replace;
            getnames=yes;
            datarow=6;
run;

options obs=max;

 

Notice that I reset the OBS= option to MAX after the IMPORT procedure to ensure that any code that I run after the procedure processes all observations.
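If you would rather not change a global option, a DATA step can select the same records with the FIRSTOBS= and OBS= options in the INFILE statement. This is a sketch of that alternative, not part of the PROC IMPORT approach. It assumes that the file has the five columns shown in the Scenario 1 log, and the informat lengths are guesses.

```sas
data work.test2;
   /* DSD treats commas as delimiters and honors quoted values;           */
   /* FIRSTOBS= and OBS= limit the read to records 6 through 15 only.     */
   infile "c:\temp\class.csv" dsd firstobs=6 obs=15 truncover;
   input Name :$20. Sex :$1. Age Height Weight;
run;
```

Because the limits apply only to this INFILE statement, there is nothing to reset afterward.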

Scenario 4

In this scenario, I again use PROC IMPORT to read my external file. However, I receive more observations in my SAS data set than there are data rows in my delimited file. The external file looks fine when it is opened with Microsoft Excel. However, when I use Microsoft Windows Notepad or TextPad to view some records, my data spans multiple rows for values that are enclosed in quotation marks.  Here is a snapshot of what the file looks like in both Microsoft Excel and TextPad, respectively:

The question for this scenario is how can I use PROC IMPORT to read this data so that the observations in my SAS data set match the number of rows in my delimited file?

In this case, the external file contains embedded carriage return (CR) and line feed (LF) characters in the middle of the data value within a quoted string. The CRLF is an end-of-record marker, so the remaining text in the string becomes the next record. Here are the results from reading the CSV file that is illustrated in the Excel and TextPad files that are shown earlier:

That behavior is why you receive more observations than you expect.  Anytime SAS encounters a CRLF, SAS considers that a new record regardless of where it is found.

A sample program that removes a CRLF character (as long as it is part of a quoted text string) is available in SAS Note 26065, "Remove carriage return and line feed characters within quoted strings."

After you run the code (from the Full Code tab) in SAS Note 26065 to pre-process the external file and remove the erroneous CR/LF characters, you should be able to use PROC IMPORT to read the external file with no problems.
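To give a sense of what such pre-processing involves, here is a simplified sketch of the general idea. It is not the code from SAS Note 26065. It reads the file one byte at a time, tracks whether the current byte is inside a quoted string, and replaces any CR or LF found there with a blank. Both file names are hypothetical, and the cleaned data is written to a new file so that the original is left untouched.

```sas
data _null_;
   /* RECFM=N treats each file as a byte stream with no record boundaries */
   infile 'c:\temp\class_raw.csv' recfm=n;
   file 'c:\temp\class_clean.csv' recfm=n;
   retain in_quotes 0;

   input ch $char1.;                       /* read one byte per iteration */

   /* Each double quotation mark toggles the inside-quotes state */
   if ch = '"' then in_quotes = not in_quotes;

   /* Inside quotes, replace CR or LF with a blank; otherwise copy the byte */
   if in_quotes and ch in ('0D'x, '0A'x) then put ' ';
   else put ch $char1.;
run;
```

You would then point PROC IMPORT at the cleaned file. For production use, the tested sample in the SAS Note is the safer choice.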

For more information about PROC IMPORT, see "Chapter 35, The IMPORT Procedure" in the Base SAS® 9.4 Procedures Guide, Seventh Edition.

 

 

Tips for using the IMPORT procedure to read files that contain delimiters was published on SAS Users.