SAS programmers

5月 192017
 

Technical Support regularly receives incoming calls from customers who have encountered the following transcoding warning:

WARNING: Some character data was lost during transcoding in the data set xxx.xxx. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding

People are not always exactly sure what this warning means nor what to do about it. No worries! This blog provides background information about why this warning occurs and offers resources that can help you understand and resolve any encoding or truncation issues that are the source of the warning above.

In most Western character sets, each character occupies only one byte of computer memory or storage. These are known as single-byte character sets (SBCS). Character sets for languages such as Japanese, Korean, and Simplified and Traditional Chinese require mixed-width encoding. The character sets for these languages are double-byte character sets (DBCS). (The term DBCS can be misleading because not all of the characters in a double-byte character set are two bytes. Thus, the term multibyte character set [MBCS] is sometimes used instead of DBCS.)

Unicode is a multibyte character set that was created to support all languages. It includes all characters from most modern written languages and historic scripts, even hieroglyphs and cuneiform. UTF-8 is one of the more common encoding forms of the Unicode standard, and this encoding is recommended as the SAS session encoding for multilingual environments if your data sources contain characters from many regions. Each character can be up to four bytes in UTF-8. UTF-8 is the SAS session encoding for SAS® Viya™. Clients such as SAS University Edition, SAS Studio, and SAS Visual Analytics typically execute statements in a server environment that run the UTF-8 encoding.

Now, let's look at how SAS handles data from different encodings along with two causes of the warning shown earlier.

Starting in SAS®9, SAS data sets store an encoding indicator in the descriptor portion. If the encoding value of the file differs from the encoding of the currently executing SAS session, the Cross Environment Data Access (CEDA) engine is invoked when SAS reads the data set. CEDA transcodes the data, by default. Transcoding is the process of converting the contents of SAS files from one encoding to another, and this process is necessary in order to read data from around the world.

When transcoding occurs, you receive an informational note about the use of CEDA. In addition, you might see the warning (shown above) in your SAS log.

The transcoding warning is issued when you have values for character variables that have either of the following characteristics:

  • The number of bytes for a character in one encoding do not match the number of bytes that are used for the same character in another encoding (as shown in the image below). If the length of the column is not wide enough to accommodate the additional bytes, truncation of the character data occurs.

  • The characters exist in one encoding, but they do not exist in another encoding. As a result, the transcoding is not successful.

If you use a procedure (for example, the PRINT or REPORT procedures), the procedure runs to completion. The variable is truncated if the additional bytes cause the value to exceed the length of the column. Alternatively, if a character from the data set does not also exist in the target encoding or if it occupies a different code point (after transcoding), the column might be blank or another character might be substituted for the original character. The procedure generates the transcoding warning.

If you have an output data set open, SAS transcodes the character data and stops writing observations to the file at the point it encounters the problematic characters. This behavior can result in either fewer observations than expected or a data set with zero observations. When this happens, SAS generates an error message instead of a warning. This error message contains the same content as the warning message.

ERROR: Some character data was lost during transcoding in the data set xxx.xxx. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding

New Video Available to Help You Resolve the Transcoding Error or Warning

Before you attempt to resolve the transcoding error or warning, it is helpful to understand a bit about encodings and how to handle multilingual SAS data. To help you in this endeavor, SAS Technical Support worked with Education and Training to create a new, short (12 minutes) video to help you quickly troubleshoot and resolve the most common reasons for the transcoding error or warning.

Click the image below to access the video. (Note: You will be prompted to create a SAS profile if you do not have one already.)

Click image to play video

 

If you want to practice the techniques in this video, you can download the data sets that are used in the video here.

This video is based on solutions that have helped customers who called SAS Technical Support for assistance. The video offers these solutions in a visual format that is easy to follow. In addition to showing you how to launch SAS in different encodings to resolve the error, the video also introduces the Character Variable Padding (CVP) engine (Read-only) for processing SAS data files that contain multilingual data.

This video is one of a number of resources available to help you. The next section lists several tips that might help prevent or resolve transcoding warnings or errors.

Helpful Tips from the Field

The following tips have been tested and used in the field, and you might find them helpful when you work with multilingual data. The first two tips are usage tips; the others are programming tips.

  • Invoke SAS in more than one encoding. For this first tip, it is important to know that if you try to change the value of the ENCODING system option during a SAS session, SAS ignores the option. When this situation occurs, the following warning is generated:

WARNING 30-12: SAS option ENCODING is valid only at startup of the SAS System. The SAS option is ignored.

If you work with data that is stored in more than one encoding, you can invoke SAS in more than one session encoding and manage the sessions easily by setting the title bar to display the session encoding in the main SAS window.

Note: This solution only works in the Microsoft Windows operating environment. Display of UTF-8 encoded data in the SAS windowing environment is not fully supported.  SAS Enterprise Guide and SAS Studio have full support for display of UTF-8 characters.

To customize the title bar of the main SAS window.

1.  In a Windows environment, select Start All Programs SAS Additional Languages.

2.  Under Additional Languages, select the shortcut for the session encoding that you want. For example, the following image shows that SAS 9.4 (Unicode Support) is selected

3.  Right-click the shortcut and select Properties. This opens the properties dialog box for that shortcut.

4. On the Shortcut tab, place your cursor at the end of the current string in the Target text box. If it is not already listed in the box, enter the location of the configuration file that you want to use by using the –CONFIG option followed by the –AWSTITLE option.

Then include in quotation marks the text that you want to see in the main SAS window, as shown in the following example:

The entire line should look similar to the following, only it will appear as one long string:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\
u8\sasv9.cfg" -awstitle "SAS UTF8"

The following display shows the –AWSTITLE option added to the Target field of the SAS 9.4 (Unicode Support) Properties dialog box:


5
. Click OK.

Now, when you open SAS, you can immediately recognize the session encoding based on the value in the title bar.

You can pin this shortcut to your task bar by dragging it from the Start menu to the task bar. Then you should see your custom title when you hover your cursor over the shortcut.

  • Prevent an unexpected transcoding warning or error: If you read or convert a SAS data set from LATIN1 or WLATIN1 encoding to UTF-8 that contains Microsoft Word smart quotes or dashes in character variables, it can cause truncation errors. Those are not ASCII characters, and they require more than one byte when converted to UTF-8. For example, performing the following steps causes an unexpected transcoding error or warning.
    1. You use SAS in the WLATIN1 encoding.
    2. You copy a quoted string from Microsoft Word and paste that string into the DATALINES section of a SAS program in the SAS windowing environment.
    3. You create a new variable that uses the quoted string in its value.
    4. You execute the SAS program to create a permanent SAS data set.
    5. You invoke a UTF-8 SAS session to read the WLATIN1 data set in a procedure or DATA step.

You can prevent this issue by clearing the "Straight quotes" with "smart quotes" option in Word, as shown below. This option is available by selecting File Options Proofing AutoCorrect Option. On the AutoFormat tab, the option is listed under the Replace category.

  • Set the CVP engine explicitly (for UTF-8 sessions). The transcoding warning is common if you use a UTF-8 session encoding and the input data source is in the WLATIN1 encoding. The first 128 code points of the UTF-8 code page are identical to the 7-bit ASCII encoding. If your character data only contains characters within those 128 code points, SAS does not generate a warning or error when it transcodes the data. For code points above 128, the character variables must be expanded to two or more bytes. Many characters that are used in Western European languages fall into this category. You can address this issue and avoid the warning or error by adding the CVP engine to the LIBNAME statement for your input data source.

The following example LIBNAME statement illustrates how to specify the CVP engine explicitly:

libname mySAS cvp 'SAS-library-path';
data utf8dat;
set mySAS.wlatindat;
run;

In this example, the code is submitted in a UTF-8 SAS session. The CVP engine is Read-only, and it is used to read the input data set from the WLATIN1 encoding.

  • Set the CVP engine implicitly. The CVP engine provides an easy way to convert your files and avoid truncation problems by using a default value of 1.5 times the original length of the character variable. However, that value might not be sufficient for the length of the characters in your data. The expansion length should be based on the characters that are contained in your data.

For example, if your data consists of double-byte character (DBCS) data (Chinese, Japanese, and so on), most characters are three bytes and emoji characters are four bytes. Therefore, the default 1.5 value for the CVP engine might not be enough for the number of additional bytes. A value of 2.5 is a reasonable expansion for DBCS characters, but you might need more.

In such cases, you might want to try setting the CVP engine implicitly. When you set the CVP engine implicitly, you can use either the CVPBYTES= or CVPMULTIPLIER= options. The following example uses the CVPBYTES= option to accommodate the four bytes that are required for the winking emoticon in UTF-8:

libname enc 'c:\public\encdata' cvpbytes=4;

 

  • Convert text in a character variable to another encoding. Sometimes, the CVP engine does not correct the transcoding error or warning. In these cases, Technical Support usually finds that the destination encoding does not support all of the characters found in the data. For example, if you have a UTF-8 data set that contains French, Greek, Hindi, and other characters and you are running a WLATIN1 session, SAS will successfully transcode the French and other Western European characters to WLATIN1. However, the Greek and Hindi characters cannot be transcoded because there is no representation for them in the WLATIN1 encoding. As a result, an error is generated, and the data is lost.

If you have this kind of multilingual data, you can use the KPROPDATA() function to transcode a character string from and to any encoding and convert characters that are not represented in the target encoding to a character of your choice (a question mark or a space). Note that you must specify BINARY or ANY as the input encoding to prevent CEDA from transcoding your data. (See the next tip for information about suppressing transcoding.) A macro is available in the technical paper Multilingual Computing with SAS 9.4 (on page 7) that you can run to accomplish these tasks.

  • Suppress transcoding. There are certain cases in which you might want to bypass transcoding errors. For example, suppose that you know that your data contains only ASCII characters. You are providing data sets to users and you do not know whether the users will run SAS in WLATIN1 or UTF-8 session encodings. Because the characters are ASCII, SAS does not need to transcode them. You do not want CEDA notes nor the extra overhead that CEDA requires. In such a case, you can suppress transcoding without risking data loss. In the LIBNAME statement for the output data set, specify either the OUTENCODING=ASCIIANY option or the ENCODING=ASCIIANY data set option.

Example 1:

libname final 'SAS-library-path' outencoding=asciiany; 
proc copy in=sashelp out=final noclone;
   select dsname;
run;

Example 2:

libname final 'SAS-library-path'
data final.class(encoding=asciiany);
   set dsname;
run;

The value of ASCIIANY means no transcoding occurs on ASCII machines. When the data source is open on an EBCDIC machine, SAS assumes that the data is ASCII and transcodes it into the EBCDIC session encoding. Note: The CONTENTS procedure shows the encoding value: us-ascii ASCII (ANSI).

The encoding options shown above have two other values:

  • ANY: With this value, SAS does not transcode at all, which is, effectively, binary mode.
  • EBCDICANY: With this value, transcoding only occurs on ASCII machines. SAS assumes the data is EBCDIC and transcodes the character data into the ASCII session encoding.

Additional Self-Help Resources

This section lists several documents that provide more detail about encoding concepts and suggests ways to handle transcoding problems.

  • Encoding: Helping SAS speak your language. This blog by Bari Lawhorn briefly explains what an encoding is, how to determine the default encoding, what to do if your encoding differs from that of other people with whom you share SAS data, and where to find more in-depth information about encodings.
  • Multilingual Computing with SAS® 9.4: This paper, which describes new and enhanced features for handling multiple languages in SAS 9.4, covers topics such as installing and configuring the SAS System, changing the locale of a deployment, working with multibyte data, multilingual support in SAS® Visual Analytics, and more
  • SAS® and UTF-8: Ultimately the Finest. Your Data and Applications Will Thank You!: This SAS Global Forum 2017 paper by Elizabeth Bales and Wei Zheng is a perfect go-to resource when you are migrating data files and formats from another encoding to UTF-8.
  • SAS® Encoding: Understanding the Details: This SAS Press book, by Manfred Kiefer, explains the basics about character encoding that are required for creating, manipulating, and rendering any type of character. This book also provide examples for troubleshooting a variety of encoding problems.
  • SAS® 9.4 National Language Support (NLS): Reference Guide, Fifth Edition. This user guide provides detailed information about encodings, transcoding, and other NLS topics (including dictionaries of NLS autocall macros, data set options, system options, formats, informats, functions, and procedures)
  • SAS Note 15597, "How to convert SAS data set encoding." This note explains how you can convert the encoding for a SAS data set.
  • SAS Note 52716, "The error "Some character data was lost during transcoding in the data set" occurs when the data set encoding does not match the SAS® session encoding."
  • SAS Sample 55054, "PROC SQL can be used to identify data representation and encoding for all data sets in a library:" This sample enables you to query a library of data sets that are in mixed encodings so you can plan to use them in compatible session encodings or you can convert the data sets to another encoding. Using the SQL procedure in this sample, you can create output similar to the following that shows you the data representation and encoding for all data sets in your library.

 

 

Demystifying and resolving common transcoding problems was published on SAS Users.

4月 282017
 

ETL automationWhen developing SAS applications, you can feed database tables into your application by using the libname access engine either by directly referring a database table, or via SAS or database views that themselves refer to one or more of the database tables.

However, such on-the-fly data access may not be efficient enough, especially for interactive SAS applications when system response time is critical. In case of distributed systems where a database might reside on one server, and the SAS Application server – on a different physical server, user wait time caused by delays in data access across networks and databases might become intolerable. In such cases, it makes perfect sense to extract database tables and load them into SAS tables in advance, preferably placing them on (or close to) the SAS Application server. That will ensure higher operational efficiency (responsiveness) of the interactive SAS application.

SAS Data Integration Studio provides a powerful visual design tool for building, implementing, and managing data integration processes across different data sources, applications, and platforms.

However, in case of multiple tables, loading them one by one using even such a powerful tool might become a bit tedious. In an automation paradigm, being “visual” is not necessarily a good thing; when we automate it implies that we want to get things done without visualizing or even seeing them.

Here is a SAS coding approach, which I recently used for a customer project, to automate the repetitive process.

SAS code example of loading multiple Oracle tables into SAS tables

There were several Oracle tables that needed to be extracted and loaded into SAS data tables on a different physical server. The Oracle tables contained multi-year data, but we were only interested in the latest year’s data. That is those tables that have datetime stamp EVENT_DT column needed to be subset to the latest rolling 365 days.

Step 1. Creating a driver table

In order to make this process data-driven, let’s first create a driver table containing a list of the table names that needed to be extracted and loaded:

/* --------------------------------------------------- */
/* create table list to extract & load,                */
/* dt_flag indicates whether EVENT_DT variable present */
/* --------------------------------------------------- */
 
libname parmdl '/sas/data/parmdata';
 
data parmdl.tablelist;
        length tname $8 dt_flag $1;
        input tname dt_flag;
        datalines;
ADDRESS N
ACCOUNT Y
BENEFIT Y
EXCLUSN N
FINANCE Y
HOUSING Y
;

This program runs just once to create the driver table parmdl.tablelist.

Step 2. Loading multiple tables

The following data-driven program runs every time when you need to reload Oracle tables into SAS:

/* source ORACLE library */
libname oralib oracle path="xxx" schema="yyy" user="uuu"
 	PASSWORD="{SAS002}D2DF612A161F7F874C4EF97F" access=readonly;
 
/* target SAS library */
libname sasdl '/sas/data/appdata';
 
/* driver table SAS library */
libname parmdl '/sas/data/parmdata';
 
options symbolgen mprint mlogic;
%let cutoff_date = %eval(%sysfunc(today()) - 365);
 
/* --------------------------------------------------- */
/* loop through table list to extract & load           */
/* --------------------------------------------------- */
%macro ETL;
   %let dsid = %sysfunc(open(parmdl.tablelist));
   %syscall set(dsid);
   %do %while(%sysfunc(fetch(&dsid)) eq 0);
      data sasdl.&tname;
         set oralib.&tname;
         %if &dt_flag eq Y %then %str(where datepart(EVENT_DT) ge &cutoff_date;);
      run;
   %end;
   %let rc = %sysfunc(close(&dsid));
%mend ETL;
%ETL

In this code, we use a similar coding technique to that described in my earlier blog post Modifying variable attributes in all datasets of a SAS library. We loop through the initially created table parmdl.tablelist. in a macro %do-loop, and for each observation containing a single table name, implement a data step extracting one table at a time.

If macro variable &dt_flag=Y, then we generate and insert a where statement sub-setting the extracted table to the latest year’s data.

Note a very useful feature here %syscall set(dsid) that links SAS data set variables to macro variables that have the same name and data type. Since we have not defined those macro variables, in essence, it creates a namesake macro variable for each SAS data set variable.

Implementing data-driven load in SAS Data Integration Studio

The above SAS code is run on the SAS Application server – the same server where the SAS target library resides. To fully automate the ETL process, this code may be embedded into a SAS job in SAS Data Integration Studio and then scheduled via either Platform Suite for SAS or your operating system scheduler. Alternately, it can be run in batch mode or via script that that in turn can be scheduled by the operating system scheduler.

In SAS DI Studio:

1.  Create a New job and give it a meaningful name (e.g. Loading Oracle Tables into SAS Tables.)

2.  Drag & drop User Written Code transformation onto the SAS Job canvas and give it a meaningful name (e.g. ETL macro):

SAS Data Integration Studio Job

3.  Go to the Properties of this transformation and click on the Code tab, then paste your SAS code in the text entry area:

SAS DI job properties

4.  Click OK and Save your SAS Job.

Note, that we do not add Source and Target tables to the Job canvas, only User Written Code transformation. Also note, that the %ETL macro has no direct references to the database table names being extracted and loaded into SAS datasets. Instead, it receives the names of the tables to be loaded from the PARMDL.TABLELIST dataset. This makes it extremely flexible to the task of processing a completely different set of tables; all you need to do is update the table list in the driver table. No change to your SAS DI job is needed.

Now it's your turn

What is your experience with automating ETL processes? Please feel free to share it here in the Comments section.

 

Automating the loading of multiple database tables into SAS tables was published on SAS Users.

4月 212017
 

send an email that embeds a graphWhen using the SAS® system to email graphics output, a common request is to use SAS to send an email in which the graphics output is embedded in the body of the email. This functionality is not available until the second maintenance release for SAS® 9.4 (TS1M2). If you are using a version of SAS earlier than SAS 9.4 TS1M2, your best option is to create graphics output in a format such as RTF or PDF, and then attach the RTF or PDF file to your email.

Using the INLINED Option to Embed Graphics

If you are running SAS 9.4 TS1M2 or later, you can embed graphics output in an email. To do this, use the INLINED suboption with the ATTACH option in a SAS FILENAME statement that uses the EMAIL engine. Here is an example:

filename sendmail email to=("first.last@company.com")          from=("first.last@company.com")
    attach=("c:\temp\email.png" inlined='sgplot')
    type='text/html' subject="Emailing graphics output";

 

Then, later in your code, reference the value specified for the INLINED option in DATA step code that creates custom HTML output, as shown below:

                data _null_;  
        file sendmail;  
  put '<html>';
  put '<body>';
  put '<img src=cid:sgplot>';
  put '</body>';
  put '</html>';
run;

With this technique, although the graph is sent as an attachment, the attachment is hidden. When the email recipient opens the email, the attached graph is automatically displayed in the email (so that it looks like the graph is embedded in the body of the email).

Note: When using SAS to email graphics output, you must first set the EMAILSYS system option to SMTP and the EMAILHOST system option to the name of the SMTP email server at your site.

Embedding Multiple Graphs

You can also send multiple graphs in a single email using a SAS FILENAME statement as shown here:

            filename sendmail email to=("first.last@company.com") from=("first.last@company.com")
    attach=("c:\temp\email1.png" inlined='sgplot1'  "c:\temp\email2.png" inlined='sgplot2')
    type='text/html' subject="Emailing graphics output";

Then, create custom HTML output using DATA step code similar to the following:

     Data _null_;  
       file sendmail;  
 put '<html>';
 put '<body>';
 put '<img src=cid:sgplot1>';
 put '<img src=cid:sgplot2>';
 put '</body>';
 put '</html>';
    run;

Embedding a Graph and a PROC PRINT Table

This example shows how to embed a graph and PRINT procedure table in one email. Let us assume that you have a graph named sgplot.png stored in C:\Temp. You want to send an email using SAS that displays the SGPLOT graph in the body of the email directly before a table created with PROC PRINT. The following sample code demonstrates how to do this using a TITLE statement with PROC PRINT:

filename sendmail email  to=("first.last@company.com") from=("first.last@company.com")
     attach=("c:\temp\sgplot.png" inlined='sgplot') 
     type='text/html' subject="Email test of GRAPH output";
      ods _all_ close; 
ods html file=sendmail; 
title1 '<img src=cid:sgplot>';
proc print data=sashelp.class; 
run;
ods html close; 
ods listing; 
filename sendmail clear;

Embedding a Graph (Complete Program)

Here is a complete sample program that demonstrates embedding graphics in an email using graphics output created with the SGPLOT procedure:

%let workdir=%trim(%sysfunc(pathname(work)));
ods _ALL_ close; 
ods listing gpath="&workdir";
ods graphics / reset=index outputfmt=PNG imagename='email';  
title1 'Graph output emailed using SAS';
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
run;
filename sendmail email to=("first.last@company.com") from=("first.last@company.com")
     attach=("&workdir./email.png" inlined='sgplot')
     type='text/html' subject="Emailing graphics output";
      data _null_;
 file sendmail;  
 put '<html>';
 put '<body>';
 put '<img src=cid:sgplot>';
 put '</body>';
 put '</html>';
run; 
      filename sendmail clear;

In conclusion, if you are running SAS 9.4 TS1M2 or later, using the INLINED option in a FILENAME statement is an excellent option when emailing graphics output.  Note that you can use this technique to email any graphics file in PNG, GIF, or JPEG format created with the SAS SG procedures, ODS Graphics, or SAS/GRAPH® procedures (such as GPLOT and GCHART).  You can also use this technique to email graphics files created with software other than SAS.

Use SAS to send an email that embeds a graph in the body of the email was published on SAS Users.

4月 132017
 
The Penitent Magdalene

Titian (Tiziano Vecellio) (Italian, about 1487 - 1576) The Penitent Magdalene, 1555 - 1565, Oil on canvas 108.3 × 94.3 cm (42 5/8 × 37 1/8 in.) The J. Paul Getty Museum, Los Angeles; Digital image courtesy of the Getty's Open Content Program.

Even if you are a traditional SAS programmer and have nothing to do with cybersecurity, you still probably have to deal with this issue in your day-to-day work.

The world has changed, and what you do as a SAS programmer is not just between you and your computer anymore. However, I have found that many of us are still careless, negligent or reckless enough to be dangerous.

Would you scotch-tape your house key to the front door next to the lock and go on vacation? Does anybody do that? Still, some of us have no problem explicitly embedding passwords in our code.

That single deadly sin, the thing that SAS programmers (or any programmers) must not do under any circumstances, is placing unmasked passwords into their code. I must confess, I too have sinned, but I have seen the light and hope you will too.

Password usage examples

Even if SAS syntax calls for a password, never type it or paste it into your SAS programs. Ever.

If you connect to a database using a SAS/ACCESS LIBNAME statement, your libname statement might look like:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password=mypwd1;

If you specify the LIBNAME statement options for the metadata engine to connect to the metadata server, it may look like:

libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw=abcdefg;

If you use LIBNAME statement options for the metadata engine to connect to a database, it may look like:

libname oralib meta library=oralib dbuser=orauser dbpassword=orapw;

In all of the above examples, some password is “required” to be embedded in the SAS code. But it does not mean you should put it there “as is,” unmasked. SAS program code is usually saved as a text file, which is stored on your laptop or somewhere on a server. Anyone who can get access to such a file would immediately get access to those passwords, which are key to accessing databases that might contain sensitive information. It is your obligation and duty to protect this sensitive data.

Hiding passwords in a macro variable or a macro?

I’ve seen some shrewd SAS programmers who do not openly put passwords in their SAS code. Instead of placing the passwords directly where the SAS syntax calls for, they assign them to a macro variable in AUTOEXEC.SAS, an external SAS macro file, a compiled macro or some other SAS program file included in their code, and then use a macro reference, e.g.

/* in autoexec.sas or external macro */
%let opw = mypwd1;
 
/* or */
 
%macro opw;
	mypwd1
%mend opw;
/* in SAS program */
libname mydblib oracle user=myusr1 password=&opw
        path=airdb_remote schema=hrdept;
 
/* or */
 
libname mydblib oracle user=myusr1 password=%opw
        path=airdb_remote schema=hrdept;

Clever! But it’s no more secure than leaving your house key under the door mat. In fact it is much less secure. One who wants to look up your password does not even need to look under the door mat, oh, I mean look at your program file where the password is assigned to a macro variable or a macro. For a macro variable, a simple %put &opw; statement will print the password’s actual value in the SAS log. In case of a macro, one can use %put %opw; with the same result.

In other words, hiding the passwords do not actually protect them.

What to do

What should you do instead of openly placing or concealing those required passwords into your SAS code? The answer is short: encrypt passwords.

SAS software provides a powerful procedure for encrypting passwords that renders them practically unusable outside of the SAS system.

This is PROC PWENCODE, and it is very easy to use. In its simplest form, in order to encrypt (encode) your password abc123 you would need to submit just the following two lines of code:

proc pwencode in="abc123";
run;

The encrypted password is printed in the SAS log:

1 proc pwencode in=XXXXXXXX;
2 run;

{SAS002}3CD4EA1E5C9B75D91A73A37F

Now, you can use this password {SAS002}3CD4EA1E5C9B75D91A73A37F in your SAS programs. The SAS System will seamlessly take care of decrypting the password during compilation.

The above code examples can be re-written as follows:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password="{SAS002}9746E819255A1D2F154A26B7";
 
libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw="{SAS002}9FFC53315A1596D92F13B4CA";
 
libname oralib meta library=oralib dbuser=orauser
dbpassword="{SAS002}9FFC53315A1596D92F13B4CA";

Encryption methods

The {SAS002} prefix indicates encoding method. This SAS-proprietary encryption method which uses 32-bit key encryption is the default, so you don’t have to specify it in the PROC PWENCODE.

There are other, stronger encryption methods supported in SAS/SECURE:

{SAS003} – uses a 256-bit key plus 16-bit salt to encode passwords,

{SAS004} – uses a 256-bit key plus 64-bit salt to encode passwords.

If you want to encode your password with one of these stronger encryption methods you must specify it in PROC PWENCODE:

proc pwencode in="abc123" method=SAS003;
run;

SAS Log: {SAS003}50374C8380F6CDB3C91281FF2EF57DED10E6

proc pwencode in="abc123" method=SAS004;
run;

SAS Log: {SAS004}1630D14353923B5940F3B0C91F83430E27DA19164FC003A1

Beyond encryption

There are other methods of obscuring passwords to protect access to sensitive information that are available in the SAS Business Intelligence Platform. These are the AUTHDOMAIN= SAS/ACCESS option supported in LIBNAME statements, as well as PROC SQL CONNECT statements, SAS Token Authentication, and Integrated Windows Authentication. For more details, see Five strategies to eliminate passwords from your SAS programs.

Conclusion

Never place unencrypted password into your SAS program. Encrypt it!

Place this sticker in front of you until you memorize it by heart.

PROC PWENCODE sticker

 

One deadly sin SAS programmers should stop committing was published on SAS Users.

4月 012017
 

Solar farm on SAS campus The full text of Fermat's statement, written in Latin, reads "Cubum autem in duos cubos, aut quadrato-quadratum in duos quadrato-quadratos, et generaliter nullam in infinitum ultra quadratum potestatem in duos eiusdem nominis fas est dividere cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet."

The English translation is: "It is impossible for a cube to be the sum of two cubes, a fourth power to be the sum of two fourth powers, or in general for any number that is a power greater than the second to be the sum of two like powers. I have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain."

Here at SAS, we don’t take challenges lightly. After a short but intensive brainstorming, we came up with a creative and powerful SAS code that effectively proves this long-standing theorem. And it is so simple and short that not only can it be written on the margins of this blog, it can be tweeted!

Drum roll, please!

Here is the SAS code:

data _null_; 
	do n=3 by 1; 
		do a=1 by 1; 
			do b=1 by 1; 
				do c=1 by 1; 
					e = a**n + b**n = c**n;
					if e then stop; 
				end;
			end; 
		end;
	end;
run;

Or written compactly, without unnecessary spaces:

data _null_;do n=3 by 1;do a=1 by 1;do b=1 by 1;do c=1 by 1;e=a**n+b**n=c**n;if e then stop;end;end;end;end;run;

which is exactly 112 character long – well below the Twitter 140-character threshold.

Don’t be fooled by the utter simplicity and seeming unfeasibility of this code.  For the naysayers, let me clarify that we run this code in a distributed multithreaded environment where each do-loop runs as a separate thread.

We also use some creative coding techniques:

1.     Do-loop with just two options, count= and by=, but without the to= option (e.g. do c=1 by 1;). It is a valid syntax in SAS and serves the purpose of creating infinite loops when they are necessary (like in this case). You can easily test it by running the following SAS code snippet:

data _null_;
	start = datetime();
	do i=1 by 1;
		if intck('sec',start,datetime()) ge 20 then leave;
	end;
run;

The if-statement here is added solely for the purpose of specifying a wait time (e.g. 20) sufficient for persuading you in the loop’s infiniteness. Skeptics may increase this number to their comfort level or even remove (or comment out) the if-statement and enjoy the unconstrained eternity.

2.     Expression with two “=” signs in it (e.g. e = a**n + b**n = c**n;) Again, this is a perfectly valid expression in SAS and serves the purpose of assigning a variable the value of 0 or 1 resulting from a logical comparison operation. This expression can be rewritten as

e = a**n + b**n eq c**n;

or even more explicitly as

e = (a**n + b**n eq c**n);

As long as the code runs, the theorem is considered proven. If it stops, then the theorem is false.

You can try running this code on your hardware, at your own risk, of course.

We have a dedicated 128-processor UNIX server powered by an on-campus solar farm that has been autonomously running the above code for 40 years now, and there was not a single instance when it stopped running. Except pausing for the scheduled maintenance and equipment replacements.

During the course of this historic experience, we have accumulated an unprecedented amount of big data (all in-memory), converted it into event stream processing, and become a leader in data mining and business analytics.

This leads us to the following scientific conclusion: whether you are a pure mathematician or an empiricist, you can rest assured that Fermat's Last Theorem has been proven with a probability asymptotic to 1 beyond a reasonable doubt.

Have a happy 91-st day of the year 2017!

 

SAS code to prove Fermat's Last Theorem was published on SAS Users.

4月 012017
 

There's an old song that starts out, "You Can Get Anything You Want at Alice's Restaurant."  Well, maybe you are too young to know that song, but if you’re a SAS users, you’ll be glad to know that you can capture anything produced by any SAS procedure (even if the [...]

The post Capturing output from any procedure with an ODS OUTPUT statement appeared first on SAS Learning Post.

3月 292017
 

There is a well-known Russian saying that goes “Если нельзя, но очень хочется, то можно.” The English translation of it can span anywhere from “If you can’t, but want it badly, then you can” to “If you shouldn’t, but want it badly, then you should” to “If you may not, but want it badly, then you may.” Depending on your situation, any possible combination of “may,” “can,” or “should” may apply. You can even replace “want” with “need” to get a slightly different flavor.

There are known means of modifying variable attributes with PROC DATASETS, but they are limited to variable name, format, informat, and label. But what if we want/need to modify a variable length, or change a variable type? And I am not talking about creating a new variable with a different length or converting a numeric variable value into a character value of another variable. We want to change variable type and/or variable length in place, without adding new variables. If you believe it can’t be done, read the first paragraph again.

Imagine that we have two data tables that we need to concatenate into one table. However, there is one common variable that is of different type in each table – in the first table it is numeric, but in the second table it is character.

Sample data

Let’s create some sample data to emulate our situation by running the following SAS code:

libname sasdl 'C:PROJECTS_BLOG_SASchanging_variable_type_and_length_in_sas_datasets';
 
/* create study2016 data table */
data sasdl.study2016;
	length subjid dob 8 state $2 start_date end_date 8;
	infile datalines truncover;
	input subjid dob : mmddyy10. state start_date : mmddyy10. end_date : mmddyy10.;
	format dob start_date end_date mmddyy10.;
	datalines;
123456 08/15/1960 MD 01/02/2016
234567 11/13/1970 AL 05/12/2016 12/30/2016
;
 
/* create study2017 data table */
data sasdl.study2017;
	length subjid $6 dob 4 state $2 start_date end_date 4;
	infile datalines truncover;
	input subjid dob : mmddyy10. state start_date : mmddyy10. end_date : mmddyy10.;
	format dob start_date end_date mmddyy10.;
	datalines;
987654 03/15/1980 VA 02/13/2017
876543 11/13/1970 NC 01/11/2017 01/30/2017
765432 12/15/1990 NY 03/14/2017
;

The produced data tables will look as follows:

Table STUDY2016:
SAS data table 1

Table STUDY2017:
SAS data table 2

If we look at the tables’ variable properties, we will see that the subjid variable is of different type in these two data tables: it is of type Numeric (length of 8) in STUDY2016 and of type Character (length of 6) in STUDY2017:

SAS variables properties

Also, notice that variables dob, start_date, and end_date, although of the same Numeric type, have different length attributes - 8 in the STUDY2016 table, and 4 in the STUDY2017 table.

Data table concatenating problem

If we try to concatenate these two tables using PROC APPEND, SAS will generate an ERROR:

proc append base=sasdl.study2016 data=sasdl.study2017;
run;
 
NOTE: Appending SASDL.STUDY2017 to SASDL.STUDY2016.
WARNING: Variable subjid not appended because of type mismatch.
WARNING: Variable dob has different lengths on BASE and DATA
         files (BASE 8 DATA 4).
WARNING: Variable start_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
WARNING: Variable end_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
ERROR: No appending done because of anomalies listed above.
       Use FORCE option to append these files.
NOTE: 0 observations added.

Even if we do use the FORCE option as the ERROR message suggests, the result will be disappointing:

proc append base=sasdl.study2016 data=sasdl.study2017 force;
run;
 
NOTE: Appending SASDL.STUDY2017 to SASDL.STUDY2016.
WARNING: Variable subjid not appended because of type mismatch.
WARNING: Variable dob has different lengths on BASE and DATA
         files (BASE 8 DATA 4).
WARNING: Variable start_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
WARNING: Variable end_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
NOTE: FORCE is specified, so dropping/truncating will occur.

The resulting data table will have missing values for the appended subjid:

Missing values in SAS table after PROC APPEND

Solution

In order to concatenate these tables, we must make the mismatching variable subjid of the same type in both data tables, either Character or Numeric. Making them both of Character type seems more robust, since it would allow for the values to contain both digit and non-digit characters. But if you know for sure that the value contains only digits, making them both Numeric works just as well.

Let’s say we decide to make them of Character type. Also note that our numeric variables representing dates (dob, start_date and end_date) are of different lengths: they are length 8 in STUDY2016 and length 4 in STUDY2017. Let’s make them the same length as well. From the standpoint of numerical accuracy in SAS for the dates, a length of 4 seems to be quite adequate to represent them accurately.

Let’s apply all our modifications to the STUDY 2016 dataset. Even though we are going to re-build the dataset in order to modify variable type and length, we are going to preserve the variable order so it feels like we just modified those variable attributes.

Here is how it can be done.

/* create macrovariable varlist containing a list of variable names */
proc sql noprint;
	select name into :varlist separated by ' '
	from sashelp.vcolumn
	where upcase(libname) eq 'SASDL' and upcase(memname) eq 'STUDY2016';
quit;
 
/* modify variable type and length */
data sasdl.study2016 (drop=v1-v4);
	retain &varlist; *<-- preserve variable order ;
	length subjid $6 dob start_date end_date 4; *<-- define new types/lengths ;
	format dob start_date end_date mmddyy10.;   *<-- recreate formats ;
	set sasdl.study2016 (rename=(subjid=v1 dob=v2 start_date=v3 end_date=v4));
	subjid = put(v1,6.); *<-- redefine subjid variable ;
	dob = v2;            *<-- redefine dob variable ;
	start_date = v3;     *<-- redefine start_date variable ;
	end_date = v4;       *<-- redefine end_date variable ;
run;
 
/* make sure new concatenated file (study_all) does not exist */
proc datasets library=sasdl nolist;
	delete study_all;
quit;
 
/* append both (study2016 and study2017) to study_all */
proc append base=sasdl.study_all data=sasdl.study2016;
run;
proc append base=sasdl.study_all data=sasdl.study2017;
run;

In this code, first, using proc sql and SAS view sashelp.vcolumn, we create a macro variable varlist to hold the list of all the variable names in our table, sasdl.study2016.

Then in the data step, we use a retain statement to preserve the variable order. When we read the sasdl.study2016 dataset using the set statement, we rename our variables-to-be-modified to some temporary names (e.g. v1 – v4) which we eventually drop in the data statement.

Then we re-assign the values of those temporary variables to the original variable names, thereby essentially creating new variables with new type and length. Since these new variables are named exactly as the old ones, the effect is as if their type and length attributes where modified, while in fact the whole table was rebuilt and replaced. Problem solved.

When we concatenate the data tables we create a new table sasdl.study_all. Before concatenating our two tables using proc append twice, we use proc datasets to delete that new table first. Even if the table does not exist, proc datasets will at least attempt to delete it. With all the seeming redundancy of this step, you will definitely appreciate it when you try running this code more than one time.

Changing variable type and variable length in SAS datasets was published on SAS Users.

3月 282017
 

I have been using the SAS Viya environment for just over six months now and I absolutely love it.  As a long-time SAS coder and data scientist I’m thrilled with the speed and greater accuracy I’m getting out of a lot of the same statistical techniques I once used in SAS9.  So why would a data scientist want to switch over to the new SAS Viya platform? The simple response is “better, faster answers.”  There are some features that are endemic to the SAS Viya architecture that provide advantages, and there are also benefits specific to different products as well.  So, let me try to distinguish between these.

SAS Viya Platform Advantages

To begin, I want to talk about the SAS Viya platform advantages.  For data processing, SAS Viya uses something called the CAS (Cloud Analytic Services) server – which takes the place of the SAS9 workspace server.  You can still use your SAS9 installation, as SAS has made it easy to work between SAS9 and SAS Viya using SAS/CONNECT, a feature that will be automated later in 2017.

Parallel Data Loads

One thing I immediately noticed was the speed with which large data sets are loaded into SAS Viya memory.  Using Hadoop, we can stage input files in either HDFS or Hive, and CAS will lift that data in parallel into its pooled memory area.  The same data conversion is occurring, like what happened in SAS9, but now all available processors can be applied to load the input data simultaneously.  And speaking of RAM, not all of the data needs to fit exactly into memory as it did with the LASR and HPA procedures, so much larger data sets can be processed in SAS Viya than you might have been able to handle before.

Multi-threaded DATA step

After initially loading data into SAS Viya, I was pleased to learn that the SAS DATA step is multi-threaded.  Most of your SAS9 programs will run ‘as is,’ however the multi-processing really only kicks in when the system finds explicit BY statements or partition statements in the DATA step code.  Surprisingly, you no longer need to sort your data before using BY statements in Procs or DATA steps.  That’s because there is no Proc Sort anymore – sorting is a thing of the past and certainly takes some getting used to in SAS Viya.  So for all of those times where I had to first sort data before I could use it, and then execute one or more DATA steps, that all transforms into a more simplified code stream.   Steven Sober has some excellent code examples of the DATA step running in full-distributed mode in his recent article.

Open API’s

While all of SAS Viya’s graphical user interfaces are designed with consistency of look and feel in mind, the R&D teams have designed it to allow almost any front-end or REST service submit commands and receive results from either CAS or its corresponding micro-service architecture.  Something new I had to learn was the concept of a CAS action set.  CAS action sets are comprised of a number of separate actions which can be executed singly or with other actions belonging to the same set.  The cool thing about CAS actions is that there is one for almost any task you can think about doing (kind of like a blend between functions and Procs in SAS9).  In fact, all of the visual interfaces SAS deploys utilize CAS actions behind the scenes and most GUI’s will automatically generate code for you if you do not want to write it.

But the real beauty of CAS actions is that you can submit them through different coding interfaces using the open Application Programming Interface’s (API’s) that SAS has written to support external languages like Python, Java, Lua and R (check out Github on this topic).  The standardization aspect of using the same CAS action within any type of external interface looks like it will pay huge dividends to anyone investing in this approach.

Write it once, re-use it elsewhere

I think another feature that old and new users alike will adore is the “write-it-once, re-use it” paradigm that CAS actions support.  Here’s an example of code that was used in Proc CAS, and then used in Jupyter notebook using Python, followed by a R/REST example.

Proc CAS

proc cas;
dnnTrain / table={name  = 'iris_with_folds'
                   where = '_fold_ ne 19'}
 modelWeights = {name='dl1_weights', replace=true}
 target = "species"
 hiddens = {10, 10} acts={'tanh', 'tanh'}
 sgdopts = {miniBatchSize=5, learningRate=0.1, 
                  maxEpochs=10};
run;

 

Python API

s.dnntrain(table = {‘name’: 'iris_with_folds’,
                                  ‘where’: '_fold_ ne 19},
   modelweights = {‘name’: 'dl1_weights', ‘replace’: True}
   target  = "species"
   hiddens  = [10, 10], acts=['tanh', ‘tanh']
   sgdopts  = {‘miniBatchSize’: 5, ‘learningRate’: 0.1, 
                      ‘maxEpochs’: 10})

 

R API

cas.deepNeural.dnnTrain(s,
  table = list(name = 'iris_with_folds’
                   where = '_fold_ ne 19’),
  modelweights = list({name='dl1_weights', replace=T),
  target = "species"
  hiddens = c(10, 10), acts=c('tanh', ‘tanh‘)
  sgdopts = list(miniBatchSize = 5, learningRate = 0.1,
                   maxEpochs=10))

 

See how nearly identical each of these three are to one another?  That is the beauty of SAS Viya.  Using a coding approach like this means that I do not need to rely exclusively on finding SAS coding talent anymore.  Younger coders who usually know several open source languages take one look at this, understand it, and can easily incorporate it into what they are already doing.  In other words, they can stay in coding environments that are familiar to them, whilst learning a few new SAS Viya objects that dramatically extend and improve their work.

Analytics Procedure Advantages

Auto-tuning

Next, I want address some of the advantages in the newer analytics procedures.  One really great new capability that has been added is the auto-tuning feature for some machine learning modeling techniques, specifically (extreme) gradient boosting, decision tree, random forest, support vector machine, factorization machine and neural network.  This capability is something that is hard to find in the open source community, namely the automatic tuning of major option settings required by most iterative machine learning techniques.  Called ‘hyperspace parameters’ among data scientists, SAS has built-in optimizing routines that try different settings and pick the best ones for you (in parallel!!!).  The process takes longer to run initially, but, wow, the increase in accuracy without going through the normal model build trial-and-error process is worth it for this amazing feature!

Extreme Gradient Boosting, plus other popular ML techniques

Admittedly, xgboost has been in the open source community for a couple of years already, but SAS Viya has its own extreme[1] gradient boosting CAS action (‘gbtreetrain’) and accompanying procedure (Gradboost).  Both are very close to what Chen (2015, 2016) originally developed, yet have some nice enhancements sprinkled throughout.  One huge bonus is the auto-tuning feature I mentioned above.  Another set of enhancements include: 1) a more flexible tree-splitting methodology that is not limited to CART (binary tree-splitting), and 2) the handling of nominal input variables is done automatically for you, versus ‘one-hot-encoding’ you need to perform in most open source tools.  Plus, lots of other detailed option settings for fine tuning and control.

In SAS Viya, all of the popular machine learning techniques are there as well, and SAS makes it easy for you to explore your data, create your own model tournaments, and generate score code that is easy to deploy.  Model management is currently done through SAS9 (at least until the next SAS Viya release later this year), but good, solid links are provided between SAS Viya and SAS9 to make transferring tasks and output fairly seamless.  Check out the full list of SAS Viya analytics available as of March 2017.

In-memory forecasting

It is hard to beat SAS9 Forecast Server with its unique 12 patents for automatic diagnosing and generating forecasts, but now all of those industry-leading innovations are also available in SAS Viya’s distributed in-memory environment. And by leveraging SAS Viya’s optimized data shuffling routines, time series data does not need to be sorted, yet it is quickly and efficiently distributed across the shared memory array. The new architecture also has given us a set of new object packages to make more efficient use of the data and run faster than anything witnessed before. For example, we have seen 1.5 million weekly time series with three years of history take 130 hours (running single-machine and single-threaded) and reduce that down to run in 5 minutes on a 40 core networked array with 10 threads per core. Accurately forecasting 870 Gigabytes of information, in 5 minutes?!? That truly is amazing!

Conclusions

Though I first ventured into SAS Viya with some trepidation, it soon became clear that the new platform would fundamentally change how I build models and analytics.  In fact, the jumps in performance and the reductions in time spent to do routine work has been so compelling for me, I am having a hard time thinking about going back to a pure SAS9 environment.  For me it’s all about getting “better, faster answers,” and SAS Viya allows me to do just that.   Multi-threaded processing is the way of the future and I want to be part of that, not only for my own personal development, but also because it will help me achieve things for my customers they may not have thought possible before.  If you have not done so already, I highly recommend you to go out and sign up for a free trial and check out the benefits of SAS Viya for yourself.


[1] The definition of ‘extreme’ refers only to the distributed, multi-threaded aspect of any boosting technique.

References

Chen , Tianqi and Carlos Guestrin , “XGBoost: Reliable Large-scale Tree Boosting System”, 2015

Chen , Tianqi and Carlos Guestrin, “XGBoost: A Scalable Tree Boosting System”, 2016

Using the Robust Analytics Environment of SAS Viya was published on SAS Users.

3月 232017
 

SAS® users have an easy and convenient way to quickly obtain useful information (referred to as metadata) about their SAS session with a number of read-only SAS DICTIONARY tables or SASHELP views. At any time during a SAS session, information about currently defined system options, libnames, tables, columns and their [...]

The post Exploring the content of the DICTIONARIES table and VSVIEW SASHELP view appeared first on SAS Learning Post.

3月 222017
 

Editor's note: The following post is from Scott Leslie, PhD, Manager of Advanced Analytics for MedImpact Healthcare Systems, Inc. Scott will be one of the Code Doctors at SAS Global Forum 2017.

Learn more about Scott.

VISIT THE CODE CLINIC AT SASGF 2017

$0 copay, no deductible.  No waiting rooms, no outdated magazines. What kind of doctor’s office is this? While we might not be able to help with that nasty cough, SAS Code Doctors are here to help – when it comes to your SAS code, that is.

Yes, the Code Doctors return to SAS Global Forum 2017! This year the Code Clinic will have over 20 SAS experts on-call to answer your questions on syntax, SAS Solutions, best practices and concepts across a broad range of SAS topics/applications, including Base SAS, macros, report writing, ODS, SQL, SAS Enterprise Guide, statistics, and more. It’s a fantastic opportunity to review code, ask questions, develop and brainstorm with peers who have decades of experience using SAS. Bring your code on paper, a flash drive, or a laptop. We’ll have 3-4 laptops with several versions of SAS software installed: 9.1.3 to 9.4 and EG 4.1 to 7.1. And if we can’t answer your coding question at the clinic, we can easily refer you to a specialist, namely the SAS R&D section of the Quad.

So, take advantage of this personalized learning experience in the Lower Quad area of the conference. Clinic office hours are:

  • Monday 4/3, 10:00 am - 3:30 pm
  • Tuesday 4/3, 9:30 am – 2:00 pm and 3:30 pm – 6:00 pm

Here’s the detailed schedule of our all-star code doctor lineup. If you haven’t heard of these names yet, you have now...

/*Just by reading this blog…*/.

 

About Scott Leslie

Scott Leslie, PhD, is Manager of Advanced Analytics for MedImpact Healthcare Systems, Inc. with over15 years of SAS® experience in the pharmacy benefits and medical management field. His SAS knowledge areas include SAS/STAT, Enterprise Guide, and Visual Analytics. Scott presents at local, regional and international SAS user group conferences as well at various clinical and scientific conferences. He is a former executive committee member of the Western Users of SAS Software (WUSS) and contributes to the San Diego SAS Users’ Group (SANDS).

Visit the code clinic at SAS Global Forum was published on SAS Users.