LOCALE system option

September 7, 2020
 

Locale-specific SAS® format catalogs make reporting in multiple languages more dynamic. It is easy to generate reports in different languages when you use both the LOCALE option in the FORMAT procedure and the LOCALE= system option to create these catalogs. If you are not familiar with the LOCALE= system option, see the "Resources" section below for more information.

This blog post, inspired by my work on this topic with a SAS customer, focuses on how to create and use locale-specific informats to read in numeric values from a Microsoft Excel file and then transform them into SAS character values. I incorporated this step into a macro that transforms ones and zeroes from the Excel file into meaningful information for multilingual readers.

Getting started: Creating the informats

The first step is to submit the LOCALE= system option with the value fr_FR. For the example in this article, I chose the values fr_FR and en_US for French and English from this table of LOCALE= values. (That is because I know how to say “yes” and “no” in both English and French — I need to travel more!)

   options locale=fr_fr;

The following code uses both the INVALUE statement and the LOCALE option in PROC FORMAT to create an informat that is named $PT_SURVEY:

   proc format locale library=work;
      invalue $pt_survey 1='oui' 0='non';
   run;

Now, toggle the LOCALE= system option and create a second informat using labels in a different language (in this example, it is English):
   options locale=en_us;

   proc format locale library=work;
      invalue $pt_survey 1='yes' 0='no';
   run;

In the screenshot below, which shows the output from the DATASETS procedure, you can see that PROC FORMAT created two format catalogs using the specified locale values, which are preceded by underscore characters. If the format catalogs already exist, PROC FORMAT simply adds the $PT_SURVEY informat entry type to them.

   proc datasets memtype=catalog; 
   quit;
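To see what each locale-specific catalog actually contains, one option is the FMTLIB option of PROC FORMAT. This is a minimal sketch that assumes the two PROC FORMAT LOCALE steps above have already run in the current session and that the catalogs are named FORMATS_FR_FR and FORMATS_EN_US, as described in this post:

```sas
/* Assumes the PROC FORMAT LOCALE steps above created these catalogs in WORK */

/* Print the entries stored in the French catalog */
proc format library=work.formats_fr_fr fmtlib;
run;

/* Print the entries stored in the English catalog */
proc format library=work.formats_en_us fmtlib;
run;
```

Each listing should include the $PT_SURVEY informat with the labels that were defined for that locale.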

Before you use these informats for a report, you must tell SAS where the informats are located. To do so, specify /LOCALE after the libref name within the FMTSEARCH= system option. If you do not add the /LOCALE specification, you see an error message stating either that the $PT_SURVEY informat does not exist or that it cannot be found. In the next two OPTIONS statements, SAS searches for the locale-specific informat in the FORMATS_FR_FR catalog, which PROC FORMAT created in the WORK library:

   options locale=fr_fr;
   options fmtsearch=(work/locale);

If you toggle the LOCALE= system option to have the en_US locale value, SAS then searches for the informat in the other catalog that was created, which is the FORMATS_EN_US catalog.
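A quick way to confirm that the search path behaves as described is to apply the informat under each locale and write the result to the log. This sketch assumes that both versions of $PT_SURVEY were created in the WORK library by the steps above. Because an informat is looked up when a DATA step is compiled, each locale gets its own step:

```sas
/* Assumes $PT_SURVEY exists in both WORK.FORMATS_FR_FR and WORK.FORMATS_EN_US */
options fmtsearch=(work/locale);

/* French session locale: the informat should come from the French catalog */
options locale=fr_fr;
data _null_;
   answer = input('1', $pt_survey.);
   put 'fr_FR: ' answer=;
run;

/* English session locale: the same informat name, different catalog */
options locale=en_us;
data _null_;
   answer = input('1', $pt_survey.);
   put 'en_US: ' answer=;
run;
```

If the catalogs resolve as expected, the first step writes the French label to the log and the second step writes the English label.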

Creating the Excel file for this example

For this example, you can create the Excel file by sending PROC REPORT output to the ODS EXCEL destination. Although you can create the Excel file in various ways, I chose the ODS EXCEL statement to show you some options that are helpful in this scenario and also useful at other times.
In the PROC REPORT step, I specify the TAGATTR= style attribute with "type:Number" for the Q_1 variable:

   %let  path=%sysfunc(getoption(WORK));
   filename temp "&path\surveys.xlsx"; 
   ods excel file=temp;
 
 
   data one;
      infile datalines truncover;
      input ptID Q_1;
      datalines;
   111 0
   112 1
   ;
   run;
 
   proc report data=one;
      define ptID / display style(column)={tagattr="type:String"};
      define Q_1 / style(column)={tagattr="type:Number"};
   run;
 
   ods excel close;

Now you have a file that looks like this screenshot when it is opened in Excel. Note that the data value for the Q_1 column is numeric:

The IMPORT procedure uses the DBSASTYPE= data set option to convert the numeric Excel data into SAS character values. Then I can apply the locale-specific character informat to a character variable.
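If your import method leaves Q_1 numeric (for example, because it does not accept the DBSASTYPE= option), a possible fallback is to convert the value to character in a DATA step before applying the informat. This is only a sketch: the data set WORK.IMPORTED stands in for an imported table, and the informat $YN_DEMO is a hypothetical, non-locale informat defined here just to keep the example self-contained.

```sas
/* Hypothetical stand-in for an imported table in which Q_1 stayed numeric */
data work.imported;
   input ptID Q_1;
   datalines;
111 0
112 1
;
run;

/* Hypothetical informat, defined only for this sketch */
proc format library=work;
   invalue $yn_demo '1'='yes' '0'='no';
run;

data work.converted;
   set work.imported;
   length Q_1_char $8;

   /* Convert numeric 1 to the character string '1' */
   Q_1_char = strip(put(Q_1, best8.));

   /* Apply the character informat to the character value */
   newvar = input(Q_1_char, $yn_demo.);
run;

proc print data=work.converted;
run;
```

In the real workflow, you would replace $YN_DEMO with the locale-specific $PT_SURVEY informat.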

As you will see below, in the macro, I use DBMS=EXCEL in PROC IMPORT to read the Excel file because my SAS and Microsoft Office versions are both 64-bit. (You might have to use the PCFILES LIBNAME Engine to connect to Excel through the SAS PC Files Server if you are not set up this way.)

Using the informats in a macro to create the multilingual reports

The final step is to run the macro with parameters to produce the two reports in French and English, using the locale-specific catalogs. When the macro is called, depending on the parameter value for the macro variable LOCALE, the LOCALE= system option changes, and the $PT_SURVEY informat from the locale-specific catalog is applied. These two tabular reports are produced:

Here is the full code for the example:

   %let  path=%sysfunc(getoption(WORK));
   filename temp "&path\surveys.xlsx";
   ods excel file=temp;
 
   data one;
      infile datalines truncover;
      input ptID Q_1;
      datalines;
   111 0
   112 1
   ;
   run;
 
   proc report data=one;
      define ptID / display style(column)={tagattr="type:String"};
      define Q_1 / style(column)={tagattr="type:Number"};
   run;
 
   ods excel close;
   options locale=fr_fr;
 
   proc format locale library=work;
      invalue $pt_survey 1='oui' 0='non';
   run;
 
   options locale=en_us;
 
   proc format locale library=work;
      invalue $pt_survey 1='yes' 0='no';
   run;
 
   /* Set the FMTSEARCH option */
   options fmtsearch=(work/locale);
 
   /* Compile the macro */
   %macro survey(locale,out);
      /* Set the LOCALE system option */
      options locale=&locale;
 
      /* Import the Excel file  */
      filename survey "&path\surveys.xlsx";
 
      proc import dbms=excel datafile=survey out=work.&out replace;
         getnames=yes;
         dbdsopts="dbsastype=(Q_1='char(8)')";
      run;
 
      data work.&out;
         set work.&out;
 
         /* Create a new variable for the report whose values are assigned by specifying the locale-specific informat in the INPUT function */
         newvar=input(Q_1, $pt_survey.);
         label newvar='Q_1';
      run;
 
      options missing='0';
 
      /*  Create the tabular report */
      proc tabulate data=&out;
         class ptID newvar;
 
         table ptID='Patient ID', newvar*n=' '/box="&locale";
      run;
 
   %mend survey;
 
   /* Call the macro once for each locale */
   %survey(fr_fr,fr)
   %survey(en_us,en)

For a different example that does not involve an informat, you can create a format in a locale-specific catalog to print a data set in both English and Romanian. See Example 19: Creating a Locale-Specific Format Catalog in the Base SAS® 9.4 Procedures Guide.

Resources

For more information about the LOCALE option:

For more information about reading and writing Excel files:

For more information about creating macros and using the macro facility in SAS:

Using locale-specific format catalogs to create reports in multiple languages was published on SAS Users.

September 27, 2014
 

If you live in an English-speaking country, you are used to a relatively unadorned alphabet. Compare that with Spanish and French, where vowels are decorated with accents, such as the acute accent in “acción” in Spanish and the circumflex (the "hat") in “pâte” in French. Or look at the gorgeous script you get to use if you read and write the letter "a" in Japanese: あ. Nice looking, right?

If you work with data that originates from another country or is distributed across the globe, you need to know about the SAS system options that control how the characters in your data are stored. Two of these options are ENCODING and LOCALE. These options will help guarantee that if your Japanese counterpart sends you SAS information in Japanese, you see the appropriate output, and not a series of question marks or blank boxes in your SAS session, or worse, errors in your log window.

The ENCODING system option instructs SAS how to store the data created by SAS in that session and how to read data from external sources. The LOCALE system option instructs SAS how to represent currency, date and time values, how to display menu items and tasks, and sets default papersize and timezone values.
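To see how these two options are set in your own session, you can query them directly. This short sketch uses PROC OPTIONS with the LANGUAGECONTROL group and the GETOPTION function:

```sas
/* List all language-control options for the current session */
proc options group=languagecontrol;
run;

/* Or write just the two options discussed here to the log */
%put NOTE: LOCALE=%sysfunc(getoption(locale)) ENCODING=%sysfunc(getoption(encoding));
```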

What exactly is encoding?

Encoding is not a term SAS invented. It is the way that characters are represented by computers. This W3C Internationalization page is an excellent non-SAS resource if you’d like to study the concept further.

Many encoding values exist in SAS, so that in combination with the LOCALE option setting, SAS will run in over 100 countries with SAS windows, pmenus and log messages localized accordingly. The SBCS, DBCS, and Unicode Encoding Values for Transcoding Data is a table of the current encoding values for SAS 9.4.

For example, if I invoke SAS with the LOCALE setting of Korean and an encoding value of euc-kr, I will see notes, warnings and errors written to my log in Korean:

NOTE: 변수 'a'이(가) 초기화되지 않았습니다.

The above text is translated to English as:

Note: Variable a is not initialized

Why is encoding important?

The ENCODING system option is important to SAS programmers because its setting determines how individual characters are represented by SAS. As an illustration, on Windows, using an encoding of WLATIN1, the character “á” is stored as a single byte at position 225. However, in a Unicode (UTF-8) session of SAS, the same character is stored as two bytes, and the RANK function returns 195, the value of the first byte:

/*RANK Function: */
/*Returns the position of a character in the ASCII collating sequence. */

data test;
 x='á';
 y=getoption('encoding');
 z= rank(x);
 put x=;
 put y=;
 put z=;
run;

WLATIN1 session     UTF-8 session
x=á                 x=á
y=WLATIN1           y=UTF-8
z=225               z=195
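The difference in storage also shows up in how many bytes the character occupies. As a small sketch, the LENGTH function counts bytes, whereas the KLENGTH function counts characters:

```sas
data _null_;
   x = 'á';
   enc     = getoption('encoding');
   n_bytes = length(x);    /* number of bytes used to store the value */
   n_chars = klength(x);   /* number of characters, regardless of bytes */
   put enc= n_bytes= n_chars=;
run;
```

In a WLATIN1 session, both counts should be 1. In a UTF-8 session, the byte count should be 2 while the character count stays at 1, which is why character variables sometimes need extra length when data moves to UTF-8.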

What is the default encoding value?

As stated previously, the encoding value is set based upon the value of LOCALE and the regional settings selected in the SAS Deployment Wizard during installation and deployment of SAS software.

On Windows and UNIX machines, the SAS 9.4 Intelligence Platform installation provides the Locale Setup Manager task in the SAS Deployment Manager to configure the language and region for SAS Foundation and certain SAS applications. See the SAS® 9.4 Intelligence Platform: Installation and Configuration Guide for more information.

The SAS Technical Paper Multilingual Computing with SAS® 9.4 explains how SAS is deployed for National Language Support.  Three images are automatically deployed for SAS on all Windows and UNIX machines:

  • ‘English’ is a single-byte SAS image that displays an English User Interface and English messages by default. The LOCALE and ENCODING options for the English image are set to match the Regional settings or, if the Regional Settings selection is an Asian language, it sets LOCALE and ENCODING to support en_US.
  • ‘English with DBCS’ is a double-byte SAS image that displays English User Interface and English SAS messages by default. This image supports languages that require a double-byte character set, such as Chinese. If a double-byte language is selected later in the Regional Settings dialog, the LOCALE option in the ‘English with DBCS support’ config file is set to match. Otherwise, the LOCALE defaults to ja_JP.
  • ‘Unicode Support’ is installed for all Windows and UNIX deployments, even if the SAS server is not configured for Unicode support. The ENCODING option is set to utf-8. The LOCALE of the Unicode server is set to match the Regional Settings locale selection.

What if my encoding differs from others with whom I share SAS data?

You could encounter this error because the data set encoding does not match the SAS session encoding:

ERROR: Some character data was lost during transcoding in the data set

Comparing the settings reported by PROC OPTIONS GROUP=LANGUAGECONTROL with the data set’s encoding helps you determine what steps to take so that you can access and modify the data set in question. SAS Note 52716 discusses the lost character data error in detail.

The most common method of preventing this error is to launch SAS using a different configuration file, so that the encoding for the SAS session matches that of the data set. Alternatively, you can ask the provider to send the data in a different encoding.

SAS Note 15597 shows logic to convert the encoding of a SAS data set. However, if you share data across different languages and encoding values, it is imperative that you communicate with those with whom you share data and SAS files, so that everyone uses SAS system settings that consistently allow seamless access to the shared data.
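Before you decide how to proceed, it can help to check which encoding a data set was created with. The following is a generic sketch, not the logic from the SAS Note: WORK.DEMO is a hypothetical data set, the ATTRC function reports its encoding, and the ENCODING= data set option writes a copy in another encoding (UTF-8 is chosen here only as an example).

```sas
/* Hypothetical data set used only for illustration */
data work.demo;
   length name $20;
   name = 'acción';
run;

/* Report the encoding recorded for the data set */
%let dsid = %sysfunc(open(work.demo));
%put NOTE: work.demo encoding is %sysfunc(attrc(&dsid, encoding));
%let rc = %sysfunc(close(&dsid));

/* Compare it with the session encoding */
%put NOTE: session encoding is %sysfunc(getoption(encoding));

/* Write a copy in a different encoding (UTF-8 chosen as an example) */
data work.demo_utf8 (encoding='utf-8');
   set work.demo;
run;
```

When you convert to a multi-byte encoding, check that character variables are long enough to hold the converted values.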

What if I have non-SAS data sources?

If you read or access data from a database such as Oracle, it is imperative that your database client communicates with SAS in order to correctly interpret native characters. SAS Note 51411 tells how to correct a potential problem on both Windows and UNIX systems, and the fix might require the input of your database administrator. The SAS Technical Paper Multilingual Computing with SAS® 9.4 describes configuration steps for many other database clients.

Where can I find more information on this topic?
