Tech

6月 052017
 

Refining your data for effective reports is even easier in the 8.1 release of SAS Visual Analytics. In this blog post, I’ll take a look at the data pane, how it displays data from your active data source, and a few tasks that you might want to perform, such as viewing measure details, changing data item properties, and creating geographic data items, hierarchies, and custom categories.  In a future blog, I’ll look at creating filters, new calculated data items, and calculated aggregations.

In the SAS Visual Analytics interface, you can display the Data pane by clicking the Data icon on the left menu.

Measure details

You can view data item properties and make these changes:

  • The name of a data item.
  • The Classification of Measure or Category data item.
  • The format of a Measure or Date/Time/Datetime data item.
  • The aggregation of a Measure data item.
  • Note:  User-defined formats for Category data items are honored, but cannot be changed.  The user-defined format must be available to the CAS server where the data is loaded.

A Measure classification can be changed to Category or Geography. A Category classification can be changed to Geography.  Date and Tim classifications cannot be changed.

You can modify the aggregation for a Measure data item:

You can modify measure formats:

  • You can expand to display available Duration and Currency formats.
  • Over thirty different Currency formats are available for local or international currency.  (Example: $56,790 or USD56,789)

Geography data items identity a geographic region or location, and are typically used to visualize data on a geographic map.

  • You can create geography data items by using predefined geography classifications such as countries or states.
  • You can create a custom geographic classification by providing latitude and longitude coordinates in your data source.
  • For predefined classifications, the values of your category data items must match the lookup values documented here.

Designating a Geography data item:

Designating a custom Geography data item:

A hierarchy enables you to add drill-down functionality to a report.

  • A hierarchy can display information from general to specific.
  • You can create category, date/datetime, or geography hierarchies.
  • Not all report objects support all types of hierarchies.

  • A custom category is a category data item which associates a set of alphanumeric values with intervals, ranges, or specific values of a data item.
  • A custom category can be based on a category or a measure data item.
  • A custom category can provide functionality in a report similar to that of a user-defined format—without having to have a format previously associated with the data item.

Example:
1-20 = ‘First group’
21-30 = ‘Second group’
31-40 = ‘Third group’

Depending on the report object, some roles require only one value; other roles may require multiple values.

You can right-click on a data item in the Roles pane to remove the data item from a role.

Duplicating a data item can enable you to display data in two different ways.

Duplicating the Date data item with 939 different values can enable you to create a report that analyzes production on each day of the week: Monday, Tuesday, Wednesday,…

Duplicating a measure data item can enable you to use the column as a category data item in reports as well.

Note: You can only delete data items that you have created, such duplicated data items or calculated data items.  Deleting a data item means that it no longer appears in the data pane.

This covers many of the basic tasks that you can complete in the new data pane. In my next blog, I’ll take a look at the visual and text editors for creating filter expressions and calculated items and aggregations in the data pane.

The new Data Pane in SAS Visual Analytics - It's painless! was published on SAS Users.

6月 022017
 

There are several exciting new features in SAS Visual Analytics 8.1 that I know will excite you, including that you can now configure cascading prompts for the Report Prompt and Page Prompt areas! Prior to SAS Visual Analytics 8.1 there was a way to use parameters to configure cascading prompts for report and section prompts, but now all we have to do is define a Filter from the Actions Pane. It’s a lot less work and there are even a couple of different ways it can be done.

Method 1

Steps:
1.      Add Control Objects to Report or Page Prompt area.
2.      Assign Roles to Control Objects.
3.      Define a Filter from the Actions Pane.
4.      Test your cascading prompts and enhance controls.

Method 1, Step 1
Add the desired Control Objects to either the Report or Page Prompt areas.

Method 1, Step 2
Assign Roles to your Control Objects.

Method 1, Step 3
Create the cascading prompt by defining a Filter from the Action Pane. This means that for the selected data value in the source object, it will use that value to filter the target, i.e. return the rows where source prompt = selected value.

In this example, we are using our SASHELP Cars data set. Our source is the car’s Origin and the target is the car’s Make. If you pick Europe for Origin it will return the corresponding car Makes such as Audi, BMW, Saab, Volkswagen, etc.

Click on the source Control object, Origin, and from the Actions Pane, select Add filter.

Next, select the target object to filter and click OK.

You can check to be sure your action is defined by clicking on the Objects and looking at the Actions Pane.

Method 1, Step 4
Enhance your controls using the Options Pane and then test out your configured cascading prompts!

In my example, I made the Origin button bar required and changed the Selection background color and text color. I also added a bar chart to my report.

Method 2

Steps:
1.      Define a prompt hierarchy.
2.      Drag the prompt hierarchy to the Report or Page Prompt area.
3.      Test your cascading prompts.
4.      (Optional) Enhance controls.

Method 2, Step 1
Create a hierarchy for the data items for which you wish to create cascading prompts.

Method 2, Step 2
Drag your prompt hierarchy to either the Report or Page Prompt area.

Method 2, Step 3
Test your automatically configured cascading prompts! That’s right – this is another new feature. Well, it’s two combined, first is the Auto Controls. Second, if you drag a hierarchy data item to either the Report or Page Prompt area, SAS Visual Analytics automatically defines the Filter Actions for you.

Method 2, Step 4 (Optional)
Go back and enhance your Control Objects using the Options Pane as you like.

There you have it – cascading prompts anywhere in SAS Visual Analytics 8.1 reports: in Report Prompts, Page Prompts and of course still anywhere in the report canvas; and it’s as easy as defining Filter Actions. You may be wondering, is it possible to configure cascading prompts for different data sources? Not to worry – it is and I’ll show you in in my next blog.

Cascading Prompts as Report and Page Prompts in SAS Visual Analytics was published on SAS Users.

5月 312017
 

You probably know about the w.d and BESTw. formats. They have been around throughout my entire 38-year history of using SAS. Do you also know about the Dw.p and BESTDw.p formats? You might find that they are the best D formats around and are even better than BEST! The Dw.p. and BESTDw.p formats work well when the range of values is sufficiently large that different values need to be formatted by using different w.d formats. Specifying BESTDw.p (where p = w - 1) is often better than specifying BESTw. for columns of numbers, since the decimal does not shift when the last digit is 0.

     d10.1      d10.3      d10.6      d10.9    best10.  bestd10.9       10.5
 
2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9    0.00000
 5.6028E-9  5.6028E-9  5.6028E-9  5.6028E-9  5.6028E-9  5.6028E-9    0.00000
  1.523E-8   1.523E-8   1.523E-8   1.523E-8   1.523E-8   1.523E-8    0.00000
 0.0000008 8.31529E-7 8.31529E-7 8.31529E-7 8.31529E-7 8.31529E-7    0.00000
 0.0000167  0.0000167 0.00001670 0.00001670  0.0000167 0.00001670    0.00002
 0.0000454  0.0000454 0.00004540 0.00004540  0.0000454 0.00004540    0.00005
    0.0001  0.0001234 0.00012341 0.00012341 0.00012341 0.00012341    0.00012
    0.0498    0.04979  0.0497871 0.04978707 0.04978707 0.04978707    0.04979
    1.0000    1.00000   1.000000 1.00000000          1          1    1.00000
    2.7183    2.71828   2.718282 2.71828183 2.71828183 2.71828183    2.71828
    7.3891    7.38906   7.389056 7.38905610  7.3890561 7.38905610    7.38906
   54.5982   54.59815  54.598150 54.5981500   54.59815 54.5981500   54.59815
  403.4288  403.42879   403.4288 403.428793 403.428793 403.428793  403.42879
 8103.0839       8103  8103.0839 8103.08393 8103.08393 8103.08393 8103.08393
     22026      22026   22026.47 22026.4658 22026.4658 22026.4658 22026.4658
  24154953   24154953   24154953 24154952.8 24154952.8 24154952.8 24154952.8
  65659969   65659969   65659969 65659969.1 65659969.1 65659969.1 65659969.1

Here is a review of some of the basic formats for displaying numbers.

w.d displays a numeric value in a field w positions wide, with d positions to the right of the decimal place, and with w - d - 1 positions to the left of the decimal place. Use it when you know the range of value (such as correlation coefficients) or when you know the typical range (such as t statistics).

Example:

x = 100 / 3;
y = -9.9999;
z = -12.3333333;
put x 7.3 / y 7.4 / z 7.4;
 33.333
-9.9999
-12.333

Notice that the decimal automatically shifts for extreme values.

BESTw. displays numeric values, in a field w positions wide, using the best format for each particular value. Decimals might not align for many of the values. This can make results difficult to read.

Example:

x1 = 100 / 3;
x2 = x1 ** 2;
x3 = 1 / x1;
x4 = x1 / 10;
x5 = 12;
format x1-x5 BEST8.;
put x1 / x2 / x3 / x4 / x5;
33.33333
1111.111
0.03
3.333333
12

Dw.p displays numeric values, in a field w positions wide, possibly with a great range of values, lining up decimals for values of similar magnitude. The precision parameter is p. The Dw.p format selects a w.d (or possibly Ew.) format for each value, with the number of decimal points d chosen so that values of similar magnitude are displayed with decimal points that line up within a column of values. For a given field width w, this internal selection of w.d formats is made subject to the precision constraint specified by p. That is, every value will be displayed with at least p significant digits shown (counting digits both to the left and the right of the decimal point).

Example:

x1 = 100 / 3;
x2 = x1 ** 2;
x3 = 1 / x1;
x4 = x1 / 10;
x5 = 12;
format x1-x5 D8.;
put x1 / x2 / x3 / x4 / x5;
33.3333
1111.1
0.0300
3.3333
12.0000

BESTDw.p displays numeric values, in a field w positions wide, lining up decimals for values of similar magnitude, and displays integers without decimals. As the name implies, this format combines the BESTw. format for integers and the Dw.p format for nonintegers.

Example:

x1 = 100 / 3;
x2 = x1 ** 2;
x3 = 1 / x1;
x4 = x1 / 10;
x5 = 12;
format x1-x5 BestD8.;
put x1 / x2 / x3 / x4 / x5;
33.3333
1111.1
0.0300
3.3333
12

Example:

x1 = 100 / 3;
x2 = x1 ** 2;
x3 = 1 / x1;
x4 = x1 / 10;
x5 = 12;
format x1-x5 BestD8.7;
put x1 / x2 / x3 / x4 / x5;
33.33333
1111.111
0.030000
3.333333
12

Specifying BESTDw.p (where p = w - 1) is better than specifying BESTw. for columns, since decimals do not shift when the last digit is 0.

Example:

proc iml;
   x     = {-2 -1.2 -.52 0 .5 1.3 1.5 2.4 3 3.5 4 4.5 5 6 7 8};
   knots = {-7 -4 -2 0 5 8 12 17};
   b     = bspline(x,4,knots);
   print b[format=bestd7.6 label='BestD7.6'] '  '
         b[format=best7. label='Best7.'];
quit;
 
BestD7.6                                   Best7.
 
0.16534 0.59550 0.23099 0.00816       0    0.16534  0.5955 0.23099 0.00816       0
0.16534 0.59550 0.23099 0.00816       0    0.16534  0.5955 0.23099 0.00816       0
0.16534 0.59550 0.23099 0.00816       0    0.16534  0.5955 0.23099 0.00816       0
0.16534 0.59550 0.23099 0.00816       0    0.16534  0.5955 0.23099 0.00816       0
0.10848 0.55622 0.31543 0.01986 7.66E-6    0.10848 0.55622 0.31543 0.01986 7.66E-6
0.04958 0.45179 0.44110 0.05718 0.00035    0.04958 0.45179  0.4411 0.05718 0.00035
0.03970 0.42132 0.46769 0.07067 0.00062     0.0397 0.42132 0.46769 0.07067 0.00062
0.01209 0.28103 0.55021 0.15260 0.00407    0.01209 0.28103 0.55021  0.1526 0.00407
0.00423 0.19585 0.56541 0.22458 0.00993    0.00423 0.19585 0.56541 0.22458 0.00993
0.00134 0.13569 0.55242 0.29216 0.01839    0.00134 0.13569 0.55242 0.29216 0.01839
0.00026 0.08757 0.51761 0.36319 0.03137    0.00026 0.08757 0.51761 0.36319 0.03137
0.00002 0.05202 0.46436 0.43334 0.05025    0.00002 0.05202 0.46436 0.43334 0.05025
      0 0.02813 0.39792 0.49737 0.07659          0 0.02813 0.39792 0.49737 0.07659
      0 0.02813 0.39792 0.49737 0.07659          0 0.02813 0.39792 0.49737 0.07659
      0 0.02813 0.39792 0.49737 0.07659          0 0.02813 0.39792 0.49737 0.07659
      0 0.02813 0.39792 0.49737 0.07659          0 0.02813 0.39792 0.49737 0.07659

Dw.p Format Details. Dw.p does not imply w.d. The p specification is a precision specification. It is not a number-of-decimal-places (d) specification. First, note that the effective values for p range from 1 to 9. Dw. (no p), Dw.0, and Dw.3 all mean the same thing: p=3. Here is a column of numbers formatted 6 different ways, illustrating various choices of p:

     d10.1      d10.2      d10.3      d10.5      d10.7      d10.9
 
2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9 2.06115E-9
 0.0000061  0.0000061 6.14421E-6 6.14421E-6 6.14421E-6 6.14421E-6
 0.0000167  0.0000167  0.0000167 0.00001670 0.00001670 0.00001670
    7.3891     7.3891    7.38906   7.389056  7.3890561 7.38905610
   20.0855    20.0855   20.08554  20.085537  20.085537 20.0855369
   54.5982    54.5982   54.59815  54.598150  54.598150 54.5981500
  148.4132   148.4132  148.41316    148.413  148.41316 148.413159
  403.4288   403.4288  403.42879    403.429  403.42879 403.428793
 1096.6332  1096.6332       1097   1096.633  1096.6332 1096.63316
 8103.0839  8103.0839       8103   8103.084  8103.0839 8103.08393
     22026      22026      22026  22026.466  22026.466 22026.4658
     59874      59874      59874  59874.142  59874.142 59874.1417
    162755     162755     162755     162755  162754.79 162754.791
    442413     442413     442413     442413  442413.39 442413.392
   1202604    1202604    1202604    1202604  1202604.3 1202604.28
   8886111    8886111    8886111    8886111  8886110.5 8886110.52
  24154953   24154953   24154953   24154953   24154953 24154952.8
  65659969   65659969   65659969   65659969   65659969 65659969.1
 178482301  178482301  178482301  178482301  178482301  178482301
 485165195  485165195  485165195  485165195  485165195  485165195

Some of the results are as follows:

  • Display at least 1 significant digit. When p = 1 (the minimum) , there are relatively few groups of values and decimals that align within those groups. Hence decimals appear in fewer of the w positions, and as a result, some values might be displayed with relatively less precision. In this case, there is one group of values that is displayed with the Ew. format and three additional groups (decimals in positions 3, 6, and not displayed).
  • Display at least 2 significant digits. When p = 2, there tend to be more ranges of values where decimals align. Hence, decimals might appear in more of the w positions, and values tend to be displayed with more precision. In this case, there is one group of values that is displayed with the Ew. format and three additional groups (decimals in positions 3, 6, and not displayed).
  • Display at least 3 significant digits. When p = 3 (the default), there will tend to be still more ranges of values where decimals align. Hence, decimals might appear in still more of the w positions, and values tend to be displayed with still more precision. In this case, there is one group of values that is displayed with the Ew. format and three additional groups (decimals in positions 3, 5, and not displayed). Notice that some values are displayed with more precision than with smaller values of p.
  • Display at least 9 significant digits. When p = 9 (the maximum), there will tend to be still more ranges of values where decimals align. Hence, decimals can appear in all but the first and last of the w positions, and values will be displayed with maximum precision. Decimals will not align for many values. In this case, there is one group of values that is displayed with the Ew. format and eight additional groups (decimals in positions 2, 8 and not displayed). Notice that most values are displayed with more precision than with smaller values of p.

As p increases, more different w.d formats are likely to get chosen, values tend to be displayed with more precision, and decimals are less likely to align. The choice of p and w are related. If you want to ensure more worst-case precision, then a wider field width is appropriate. Also, one advantage of Dw.d is that it allows more moderate width columns. If all you have is w.d, and you want to ensure reasonable precision for values of different scales, they you're pretty much forced to use very wide columns with a large number of decimal places and hope for the best.

Submit the following steps to see more values that are formatted using a variety of D and other formats. The table of formatted numbers displayed above is a subset of the results displayed by these steps.

data x(drop=i j l f);
   array d[13] d0-d12;
   length l f $ 200;
   do j = 0 to 9;
      l = catx(' ', l, cats('d', j), '=', cats('"d10.',j,'"'));
      f = catx(' ', f, cats('d', j), cats('d10.',j));
   end;
   l = catx(' ', 'label',  l, 'd10 = "best10." d11 = "bestd10.9" d12 = "10.5";');
   f = catx(' ', 'format', f, 'd10 best10. d11 bestd10.9 d12 10.5;');
   call symputx('lf', catx(' ', l, f));
   do i = -20 to 20;
      d0 = exp(i);
      do j = 2 to 13; d[j] = d0; end;
      output;
   end;
run;
 
options ls=142;
proc print noobs label; &lf run;

Letting ODS pick the format. In an ODS table template, you can use the CHOOSE_FORMAT= option to choose a format. Use it with the FORMAT_WIDTH=w option to choose a good format for the specified width (w).

CHOOSE_FORMAT=MAX_ABS
picks a good format for a single number based on the maximum absolute value. This is the simplest case.

CHOOSE_FORMAT=MAX
picks a good format for a single number based on the maximum. Values are all expected to be nonnegative so no space is reserved for a minus sign.

CHOOSE_FORMAT=MIN_MAX
picks a good format for a single number based on the minimum and the maximum. Values can be mixed, positive and negative. This option provides room for a minus sign only where it is actually needed. So if all values happen to be positive, no room is reserved for a minus sign. If values range from say -5 to 10, no extra column is needed for the minus sign. If you also specify FORMAT_NDEC=ndec in PROC TEMPLATE, ndec provides a ceiling on the maximum number of decimal places.

CHOOSE_FORMAT=COMPROMISE
picks a good w.d format for a column of numbers if they are not too heterogeneous.

History. The w.d and BESTw. formats have been around for decades. Many SAS formats were written by Rick Langston, although the original mainframe w.d and BESTw. formats precede Rick. The functionality that underlies the Dw.p format has also been around for decades, but it has not always been available as a format. Mark Little wrote it, and for many years it was only available for procedures to use internally. When SAS started making output using ODS, then that functionality was surfaced as the Dw.p format. The BESTDw.p format came much later.

Conclusions. When you explicitly specify a format, you typically specify w.d when the values have a known range (such as correlation coefficients) or Dw.p when values are more diverse. You might instead use BESTDw.p when you expect a mix of integers and nonintegers. BESTw. is not the best format for columns; it is best when used for scalars. Ew. is useful for large or small numbers or when extra precision is desired. Dw.p. and BESTDw.p work well when the range of values is large. Experiment with the precision parameter p to get the best display of your results. In particular, setting p to w - 1 and specifying BESTDw.p provides a good alternative to the BESTw. format.

The BEST D formats that you have ever seen! was published on SAS Users.

5月 192017
 

Technical Support regularly receives incoming calls from customers who have encountered the following transcoding warning:

WARNING: Some character data was lost during transcoding in the data set xxx.xxx. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding

People are not always exactly sure what this warning means nor what to do about it. No worries! This blog provides background information about why this warning occurs and offers resources that can help you understand and resolve any encoding or truncation issues that are the source of the warning above.

In most Western character sets, each character occupies only one byte of computer memory or storage. These are known as single-byte character sets (SBCS). Character sets for languages such as Japanese, Korean, and Simplified and Traditional Chinese require mixed-width encoding. The character sets for these languages are double-byte character sets (DBCS). (The term DBCS can be misleading because not all of the characters in a double-byte character set are two bytes. Thus, the term multibyte character set [MBCS] is sometimes used instead of DBCS.)

Unicode is a multibyte character set that was created to support all languages. It includes all characters from most modern written languages and historic scripts, even hieroglyphs and cuneiform. UTF-8 is one of the more common encoding forms of the Unicode standard, and this encoding is recommended as the SAS session encoding for multilingual environments if your data sources contain characters from many regions. Each character can be up to four bytes in UTF-8. UTF-8 is the SAS session encoding for SAS® Viya™. Clients such as SAS University Edition, SAS Studio, and SAS Visual Analytics typically execute statements in a server environment that run the UTF-8 encoding.

Now, let's look at how SAS handles data from different encodings along with two causes of the warning shown earlier.

Starting in SAS®9, SAS data sets store an encoding indicator in the descriptor portion. If the encoding value of the file differs from the encoding of the currently executing SAS session, the Cross Environment Data Access (CEDA) engine is invoked when SAS reads the data set. CEDA transcodes the data, by default. Transcoding is the process of converting the contents of SAS files from one encoding to another, and this process is necessary in order to read data from around the world.

When transcoding occurs, you receive an informational note about the use of CEDA. In addition, you might see the warning (shown above) in your SAS log.

The transcoding warning is issued when you have values for character variables that have either of the following characteristics:

  • The number of bytes for a character in one encoding do not match the number of bytes that are used for the same character in another encoding (as shown in the image below). If the length of the column is not wide enough to accommodate the additional bytes, truncation of the character data occurs.

  • The characters exist in one encoding, but they do not exist in another encoding. As a result, the transcoding is not successful.

If you use a procedure (for example, the PRINT or REPORT procedures), the procedure runs to completion. The variable is truncated if the additional bytes cause the value to exceed the length of the column. Alternatively, if a character from the data set does not also exist in the target encoding or if it occupies a different code point (after transcoding), the column might be blank or another character might be substituted for the original character. The procedure generates the transcoding warning.

If you have an output data set open, SAS transcodes the character data and stops writing observations to the file at the point it encounters the problematic characters. This behavior can result in either fewer observations than expected or a data set with zero observations. When this happens, SAS generates an error message instead of a warning. This error message contains the same content as the warning message.

ERROR: Some character data was lost during transcoding in the data set xxx.xxx. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding

New Video Available to Help You Resolve the Transcoding Error or Warning

Before you attempt to resolve the transcoding error or warning, it is helpful to understand a bit about encodings and how to handle multilingual SAS data. To help you in this endeavor, SAS Technical Support worked with Education and Training to create a new, short (12 minutes) video to help you quickly troubleshoot and resolve the most common reasons for the transcoding error or warning.

Click the image below to access the video. (Note: You will be prompted to create a SAS profile if you do not have one already.)

Click image to play video

 

If you want to practice the techniques in this video, you can download the data sets that are used in the video here.

This video is based on solutions that have helped customers who called SAS Technical Support for assistance. The video offers these solutions in a visual format that is easy to follow. In addition to showing you how to launch SAS in different encodings to resolve the error, the video also introduces the Character Variable Padding (CVP) engine (Read-only) for processing SAS data files that contain multilingual data.

This video is one of a number of resources available to help you. The next section lists several tips that might help prevent or resolve transcoding warnings or errors.

Helpful Tips from the Field

The following tips have been tested and used in the field, and you might find them helpful when you work with multilingual data. The first two tips are usage tips; the others are programming tips.

  • Invoke SAS in more than one encoding. For this first tip, it is important to know that if you try to change the value of the ENCODING system option during a SAS session, SAS ignores the option. When this situation occurs, the following warning is generated:

WARNING 30-12: SAS option ENCODING is valid only at startup of the SAS System. The SAS option is ignored.

If you work with data that is stored in more than one encoding, you can invoke SAS in more than one session encoding and manage the sessions easily by setting the title bar to display the session encoding in the main SAS window.

Note: This solution only works in the Microsoft Windows operating environment. Display of UTF-8 encoded data in the SAS windowing environment is not fully supported.  SAS Enterprise Guide and SAS Studio have full support for display of UTF-8 characters.

To customize the title bar of the main SAS window.

1.  In a Windows environment, select Start All Programs SAS Additional Languages.

2.  Under Additional Languages, select the shortcut for the session encoding that you want. For example, the following image shows that SAS 9.4 (Unicode Support) is selected

3.  Right-click the shortcut and select Properties. This opens the properties dialog box for that shortcut.

4. On the Shortcut tab, place your cursor at the end of the current string in the Target text box. If it is not already listed in the box, enter the location of the configuration file that you want to use by using the –CONFIG option followed by the –AWSTITLE option.

Then include in quotation marks the text that you want to see in the main SAS window, as shown in the following example:

The entire line should look similar to the following, only it will appear as one long string:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\
u8\sasv9.cfg" -awstitle "SAS UTF8"

The following display shows the –AWSTITLE option added to the Target field of the SAS 9.4 (Unicode Support) Properties dialog box:


5
. Click OK.

Now, when you open SAS, you can immediately recognize the session encoding based on the value in the title bar.

You can pin this shortcut to your task bar by dragging it from the Start menu to the task bar. Then you should see your custom title when you hover your cursor over the shortcut.

  • Prevent an unexpected transcoding warning or error: If you read or convert a SAS data set from LATIN1 or WLATIN1 encoding to UTF-8 that contains Microsoft Word smart quotes or dashes in character variables, it can cause truncation errors. Those are not ASCII characters, and they require more than one byte when converted to UTF-8. For example, performing the following steps causes an unexpected transcoding error or warning.
    1. You use SAS in the WLATIN1 encoding.
    2. You copy a quoted string from Microsoft Word and paste that string into the DATALINES section of a SAS program in the SAS windowing environment.
    3. You create a new variable that uses the quoted string in its value.
    4. You execute the SAS program to create a permanent SAS data set.
    5. You invoke a UTF-8 SAS session to read the WLATIN1 data set in a procedure or DATA step.

You can prevent this issue by clearing the "Straight quotes" with "smart quotes" option in Word, as shown below. This option is available by selecting File Options Proofing AutoCorrect Option. On the AutoFormat tab, the option is listed under the Replace category.

  • Set the CVP engine explicitly (for UTF-8 sessions). The transcoding warning is common if you use a UTF-8 session encoding and the input data source is in the WLATIN1 encoding. The first 128 code points of the UTF-8 code page are identical to the 7-bit ASCII encoding. If your character data only contains characters within those 128 code points, SAS does not generate a warning or error when it transcodes the data. For code points above 128, the character variables must be expanded to two or more bytes. Many characters that are used in Western European languages fall into this category. You can address this issue and avoid the warning or error by adding the CVP engine to the LIBNAME statement for your input data source.

The following example LIBNAME statement illustrates how to specify the CVP engine explicitly:

libname mySAS cvp 'SAS-library-path';
data utf8dat;
set mySAS.wlatindat;
run;

In this example, the code is submitted in a UTF-8 SAS session. The CVP engine is Read-only, and it is used to read the input data set from the WLATIN1 encoding.

  • Set the CVP engine implicitly. The CVP engine provides an easy way to convert your files and avoid truncation problems by using a default value of 1.5 times the original length of the character variable. However, that value might not be sufficient for the length of the characters in your data. The expansion length should be based on the characters that are contained in your data.

For example, if your data consists of double-byte character (DBCS) data (Chinese, Japanese, and so on), most characters are three bytes and emoji characters are four bytes. Therefore, the default 1.5 value for the CVP engine might not be enough for the number of additional bytes. A value of 2.5 is a reasonable expansion for DBCS characters, but you might need more.

In such cases, you might want to try setting the CVP engine implicitly. When you set the CVP engine implicitly, you can use either the CVPBYTES= or CVPMULTIPLIER= options. The following example uses the CVPBYTES= option to accommodate the four bytes that are required for the winking emoticon in UTF-8:

libname enc 'c:\public\encdata' cvpbytes=4;

 

  • Convert text in a character variable to another encoding. Sometimes, the CVP engine does not correct the transcoding error or warning. In these cases, Technical Support usually finds that the destination encoding does not support all of the characters found in the data. For example, if you have a UTF-8 data set that contains French, Greek, Hindi, and other characters and you are running a WLATIN1 session, SAS will successfully transcode the French and other Western European characters to WLATIN1. However, the Greek and Hindi characters cannot be transcoded because there is no representation for them in the WLATIN1 encoding. As a result, an error is generated, and the data is lost.

If you have this kind of multilingual data, you can use the KPROPDATA() function to transcode a character string from and to any encoding and convert characters that are not represented in the target encoding to a character of your choice (a question mark or a space). Note that you must specify BINARY or ANY as the input encoding to prevent CEDA from transcoding your data. (See the next tip for information about suppressing transcoding.) A macro is available in the technical paper Multilingual Computing with SAS 9.4 (on page 7) that you can run to accomplish these tasks.

  • Suppress transcoding. There are certain cases in which you might want to bypass transcoding errors. For example, suppose that you know that your data contains only ASCII characters. You are providing data sets to users and you do not know whether the users will run SAS in WLATIN1 or UTF-8 session encodings. Because the characters are ASCII, SAS does not need to transcode them. You do not want CEDA notes nor the extra overhead that CEDA requires. In such a case, you can suppress transcoding without risking data loss. In the LIBNAME statement for the output data set, specify either the OUTENCODING=ASCIIANY option or the ENCODING=ASCIIANY data set option.

Example 1:

libname final 'SAS-library-path' outencoding=asciiany; 
proc copy in=sashelp out=final noclone;
   select dsname;
run;

Example 2:

libname final 'SAS-library-path'
data final.class(encoding=asciiany);
   set dsname;
run;

The value of ASCIIANY means no transcoding occurs on ASCII machines. When the data source is open on an EBCDIC machine, SAS assumes that the data is ASCII and transcodes it into the EBCDIC session encoding. Note: The CONTENTS procedure shows the encoding value: us-ascii ASCII (ANSI).

The encoding options shown above have two other values:

  • ANY: With this value, SAS does not transcode at all, which is, effectively, binary mode.
  • EBCDICANY: With this value, transcoding only occurs on ASCII machines. SAS assumes the data is EBCDIC and transcodes the character data into the ASCII session encoding.

Additional Self-Help Resources

This section lists several documents that provide more detail about encoding concepts and suggests ways to handle transcoding problems.

  • Encoding: Helping SAS speak your language. This blog by Bari Lawhorn briefly explains what an encoding is, how to determine the default encoding, what to do if your encoding differs from that of other people with whom you share SAS data, and where to find more in-depth information about encodings.
  • Multilingual Computing with SAS® 9.4: This paper, which describes new and enhanced features for handling multiple languages in SAS 9.4, covers topics such as installing and configuring the SAS System, changing the locale of a deployment, working with multibyte data, multilingual support in SAS® Visual Analytics, and more
  • SAS® and UTF-8: Ultimately the Finest. Your Data and Applications Will Thank You!: This SAS Global Forum 2017 paper by Elizabeth Bales and Wei Zheng is a perfect go-to resource when you are migrating data files and formats from another encoding to UTF-8.
  • SAS® Encoding: Understanding the Details: This SAS Press book, by Manfred Kiefer, explains the basics about character encoding that are required for creating, manipulating, and rendering any type of character. This book also provide examples for troubleshooting a variety of encoding problems.
  • SAS® 9.4 National Language Support (NLS): Reference Guide, Fifth Edition. This user guide provides detailed information about encodings, transcoding, and other NLS topics (including dictionaries of NLS autocall macros, data set options, system options, formats, informats, functions, and procedures)
  • SAS Note 15597, "How to convert SAS data set encoding." This note explains how you can convert the encoding for a SAS data set.
  • SAS Note 52716, "The error "Some character data was lost during transcoding in the data set" occurs when the data set encoding does not match the SAS® session encoding."
  • SAS Sample 55054, "PROC SQL can be used to identify data representation and encoding for all data sets in a library:" This sample enables you to query a library of data sets that are in mixed encodings so you can plan to use them in compatible session encodings or you can convert the data sets to another encoding. Using the SQL procedure in this sample, you can create output similar to the following that shows you the data representation and encoding for all data sets in your library.

 

 

Demystifying and resolving common transcoding problems was published on SAS Users.

5月 092017
 

Microservices are a key component of the SAS Viya architecture. In this post, I’ll introduce and explain the benefits of microservices. In a future post we’ll dig deeper into the microservices architecture.

What are microservices?

When we look at SAS Viya architecture diagrams, we can find, among the new core components, microservices.

Microservices are self-contained, lightweight pieces of software that

  • Do one thing.
  • Depend on one another to the least extent possible.
  • Are deployed independently.
  • Provide a language-agnostic API.
  • Can run one or more instances of these processes at any given time.

Note that the prefix “micro” doesn’t mean small in CPU or memory consumption. Rather, it refers to the software performing a single function or being narrow in scope.

Let’s compare to SAS 9

The SAS 9 Web Infrastructure Platform services and the overall platform are tightly coupled to metadata structures and schemas. Every maintenance action takes a bit of effort: can you apply a fix to a single application without first stopping the whole infrastructure? Can you upgrade one component and leave all of the other ones at the previous release? Can you…?

To address these and other issues, SAS R&D decomposed the metadata server,  the Web Infrastructure Platform, and  many web applications. As a result, we got many functional units. Each one is a microservice.

Let’s have a look at the following examples.

In SAS 9.4 we can open the SAS Management Console to manage users and groups:

In SAS Viya, we can do the same using the SAS Environment Manager web application:

You may think we simply switched to a different, web-based client. Actually, the real difference lies in the backend implementation. With SAS 9, the metadata server was responsible for servicing that functionality in addition to a host of other features. With SAS Viya, we have a dedicated microservice for it: the Identities microservice.

Here’s another example. We want to edit an option in the configuration of an application, like the address of the Open Street Map server to use with Visual Analytics geo maps. With SAS 9, we use the SAS Management Console to interact, as usual, with the metadata server.

With SAS Viya, we set the property with Environment Manager. And, guess what? We are using the Configuration microservice.

If you are curious and want to see a list of all the microservices deployed in your SAS Viya environment, you can, again, use the Environment Manager.

Note that in all these examples, the Environment Manager is simply serving as the GUI to a particular microservice supporting the associated feature.

What are the benefits of Microservices?

The move to a microservice-oriented architecture brings many benefits to all stakeholders, first and foremost to SAS users and SAS administrators.

Microservices are independently updatable

It is now easier for you to manage and maintain your environment. Hot fixes for a specific microservice are released just as normal updates, and the official installation process is documented in the

Just as with the previous point, there are a few exceptions: almost everything requires the SASLogon and Identities microservices, so, if they are down, nothing works.

Scalability and High Availability

When microservices are spun up, they self-register, making themselves available for processing requests. This way, supporting failover is as easy as ensuring you have at least two instances of the associated microservice up and running. It is possible to scale further for increased capacity/performance, and you can do so at the microservice level, based on the specific demand for each function (e.g., you likely won’t need as many instances of the Import VA SPK microservice as you do for the Authorization microservice).

Microservices are “open”

Microservices can run in different environments – bare OS, Cloud Foundry, Docker. Also, they are accessible to non-SAS developers through REST APIs. As an example, let’s say I’d like to retrieve the same properties for the SAS Administrators group that were shown above in Environment Manager. It’s as easy as calling a REST endpoint: http://<myserver>/identities/groups/SASAdministrators
The result can be in either XML or json.

In fact, even microservices communicate with one another using REST interfaces!

I hope this blog has been helpful.

Feel free to add comments or questions below.

Let’s talk about Microservices was published on SAS Users.

5月 092017
 

Microservices are a key component of the SAS Viya architecture. In this post, I’ll introduce and explain the benefits of microservices. In a future post we’ll dig deeper into the microservices architecture.

What are microservices?

When we look at SAS Viya architecture diagrams, we can find, among the new core components, microservices.

Microservices are self-contained, lightweight pieces of software that

  • Do one thing.
  • Depend on one another to the least extent possible.
  • Are deployed independently.
  • Provide a language-agnostic API.
  • Can run one or more instances of these processes at any given time.

Note that the prefix “micro” doesn’t mean small in CPU or memory consumption. Rather, it refers to the software performing a single function or being narrow in scope.

Let’s compare to SAS 9

The SAS 9 Web Infrastructure Platform services and the overall platform are tightly coupled to metadata structures and schemas. Every maintenance action takes a bit of effort: can you apply a fix to a single application without first stopping the whole infrastructure? Can you upgrade one component and leave all of the other ones at the previous release? Can you…?

To address these and other issues, SAS R&D decomposed the metadata server,  the Web Infrastructure Platform, and  many web applications. As a result, we got many functional units. Each one is a microservice.

Let’s have a look at the following examples.

In SAS 9.4 we can open the SAS Management Console to manage users and groups:

In SAS Viya, we can do the same using the SAS Environment Manager web application:

You may think we simply switched to a different, web-based client. Actually, the real difference lies in the backend implementation. With SAS 9, the metadata server was responsible for servicing that functionality in addition to a host of other features. With SAS Viya, we have a dedicated microservice for it: the Identities microservice.

Here’s another example. We want to edit an option in the configuration of an application, like the address of the Open Street Map server to use with Visual Analytics geo maps. With SAS 9, we use the SAS Management Console to interact, as usual, with the metadata server.

With SAS Viya, we set the property with Environment Manager. And, guess what? We are using the Configuration microservice.

If you are curious and want to see a list of all the microservices deployed in your SAS Viya environment, you can, again, use the Environment Manager.

Note that in all these examples, the Environment Manager is simply serving as the GUI to a particular microservice supporting the associated feature.

What are the benefits of Microservices?

The move to a microservice-oriented architecture brings many benefits to all stakeholders, first and foremost to SAS users and SAS administrators.

Microservices are independently updatable

It is now easier for you to manage and maintain your environment. Hot fixes for a specific microservice are released just as normal updates, and the official installation process is documented in the

Just as with the previous point, there are a few exceptions: almost everything requires the SASLogon and Identities microservices, so, if they are down, nothing works.

Scalability and High Availability

When microservices are spun up, they self-register, making themselves available for processing requests. This way, supporting failover is as easy as ensuring you have at least two instances of the associated microservice up and running. It is possible to scale further for increased capacity/performance, and you can do so at the microservice level, based on the specific demand for each function (e.g., you likely won’t need as many instances of the Import VA SPK microservice as you do for the Authorization microservice).

Microservices are “open”

Microservices can run in different environments – bare OS, Cloud Foundry, Docker. Also, they are accessible to non-SAS developers through REST APIs. As an example, let’s say I’d like to retrieve the same properties for the SAS Administrators group that were shown above in Environment Manager. It’s as easy as calling a REST endpoint: http://<myserver>/identities/groups/SASAdministrators
The result can be in either XML or json.

In fact, even microservices communicate with one another using REST interfaces!

I hope this blog has been helpful.

Feel free to add comments or questions below.

Let’s talk about Microservices was published on SAS Users.

5月 062017
 

As SAS Viya has been gaining awareness over the past year among SAS users, there has been a lot of discussion about how SAS’ Cloud Analytic Server (CAS) handles memory vs SAS’ previous technologies such as LASR and HPA.  Recently, while I was involved in delivering several SAS Viya enablement sessions, I realised that many, including myself, held an incorrect understanding of how this works, mainly around one particular CAS option called maxTableMem.

The maxTableMem option determines the memory block size that is used per table, per CAS Worker, before converting data to memory-mapped memory.  It is not intended to directly control how much data is put into memory vs how much is put into CAS_DISK_CACHE, but rather it indirectly influences this.

Let’s unpack that a bit and try to understand what it really means.

The CAS Controller doesn’t care what the value of maxTableMem is.  In a serial load example, the CAS Controller distributes the data evenly across the CAS Workers[1], which then fill up maxTableMem-sized buckets (memory blocks), emptying them (converting them to memory-mapped memory) as they fill up, only leaving non-full buckets of table data.  You should almost never  change the default setting of this option (16MB), except perhaps in cases of extremely large tables, in order to reduce the number of file handles (up to 256MB is probably sufficient in these cases).

CAS takes advantage of standard memory mapping techniques for the CAS_DISK_CACHE, and leaves the optimisation of it up to the OS.  With SASHDAT files and LASR in SAS 9.4, the SASHDAT file essentially acts as a pre-paged file, written in a memory-mapped format, so the table data in memory doesn’t need to be written to disk when it is paged out.  Should a table need to be dropped from memory to make room for other data, and subsequently needed to be read back in to memory, it would be paged in from the SASHDAT file.

With CAS, the CAS_DISK_CACHE allows us to extend this pre-paged file approach to all data sources, not just SASHDAT.  Traditional OS swap files are written to each time memory is paged out, however with CAS, regardless of the data source (SASHDAT, database, client-uploaded file etc.) most table memory will never need to be written to disk, as it will already exist in the backing store (this could be CAS_DISK_CACHE, HDFS or NFS).   Although data will be continually paged in and out of memory, the amount of writing to disk will be minimised, which is typically slower than reading data from disk.

Another advantage of the CAS_DISK_CACHE is that when data does need to be written to disk it can happen upfront when the server is less busy, rather than at the last moment when the system detects it is out of memory (pre-paging rather than demand-paging).  Once it is written, it can be paged back into memory multiple times, by multiple concurrent processes.  The CAS_DISK_CACHE also spreads the I/O across multiple devices and servers as opposed to a typical OS swap file that may only write to a single file on a single server.

While CAS supports exceeding memory capacity by using CAS_DISK_CACHE as a backing store, read/write disk operations do have a performance cost.  Therefore, for best performance, we recommend you have enough memory capacity to hold your  most commonly used tables, meaning  most of the time the entire table will be both in memory and the backing store.

If you expect to regularly exceed memory capacity, and therefore are frequently paging data in from CAS_DISK_CACHE, consider spreading the CAS_DISK_CACHE location across multiple devices and using newer solid state storage technologies in order to improve performance.[1]

Additionally, when you need CAS to peacefully co-exist with other applications that are sharing resources on the same nodes, standard Linux cgroup settings along with Hadoop YARN configuration can be utilised to control the resources that CAS sessions can exploit.

References

Paging

Notes

[1] There are exceptions to data being evenly distributed across the CAS Workers.  The main one is if the data is partitioned and the partitions are of different sizes – all the data of a partition must be on the same node therefore resulting in an uneven distribution.  Also, if a table is very small, it may end up on only a single node, and when CAS is co-located with Hadoop the data is loaded locally from each node, so CAS receives whatever the distribution of data is that Hadoop provides.

[2] A comprehensive analysis of all possible storage combinations and the impact on performance has not yet been completed by SAS.

Dr. StrangeRAM or: How I learned to stop worrying and love CAS was published on SAS Users.

5月 052017
 

SAS Visual Analytics 7.4 comes with a chock-full of new features. Report and section linking come with an added benefit. If you set up linking from one section to another section in the same report, or from one report to another report, you have the option to configure linking such that any filter prompt in the linked target location is brushed or highlighted by the values that are selected in the linked report object. And the visual report objects in that target location are filtered to reflect the context that was passed from the source location.

In SAS Visual Analytics 7.3, when you took a report link from the subscribed report to a target report with a filter prompt (or a target section in the current report) with a filter prompt, the target filter prompt was filtered by the selection made in the source report or source section. Now, with SAS Visual Analytics 7.4, if a selection is made in the source report, and a report link (or a section link) is taken to the target report (or target section), the target filter prompt is brushed. Users benefit from the flexibility to choose filter options from that filter prompt in the target location and modify that filter prompt selection as needed. Note that in both the source and target locations, common data sources should be used. If the data item is different, you are asked to map it.

To illustrate this new linking feature in SAS Visual Analytics 7.4, I created a source report and a target report. The source report has a Button Bar that filters the report objects in the source report. The target report contains the target Button Bar that receives the filtering selection made in the source report and displays the applicable button.

To illustrate the new linking enhancement, let’s take a look at the default scenario and the configured scenario where the values in the target report filter prompt are brushed or highlighted. Here are the two reports – the source and the target reports.

Source Report with Linking

Target Report

Default behavior for report and section linking

In this example, let’s take a quick look at how linking worked in SAS Visual Analytics 7.3 (and it still works the same way in SAS Visual Analytics 7.4 by default). In the following source report, I have a Button Bar in the filter prompt.

Choosing Orion Germany in the Button Bar

When I choose Orion Germany in the Button Bar, the report objects are filtered to show the filtered results.

Report Objects Filtered by Orion Germany in the Source Report

When I take a link from the Orion Germany tile in the Treemap to the target report, the Button Bar in the target report is filtered to show Orion Germany (this is the default behavior for linking) in the target report.

Target Report With Orion Germany in the Button Bar

But what if I want my users to take a report link from the source report, and be able to choose from the filter choices in the Button Bar within this target report?

SAS Visual Analytics 7.4 to the rescue!

Here’s an example of what I did with the report linking in SAS Visual Analytics 7.4 by allowing the filtering choices to be retained in the target filter prompt.

I chose Orion France in the Button Bar within the Source Report:

Choosing Orion France in the Source Report

Then, I took a report link from the Orion France tile in the Tilemap to the target report:

Target Report with Orion France Highlighted in the Button Bar

Notice how the Button Bar in the target report is brushed by Orion France, and I still have a choice of selecting a different Orion country in the Button Bar.

Design the Link for the Prompt Filters in the Target Report

It’s simple to make this happen.

1.  In the source report, where I had previously created report linking, I selected the Treemap and I chose to edit the report link by going to Interactions tab.

2.  I clicked the icon for editing this report link.

3.  In the Edit Report Link dialog, I selected the checkbox for Set the value for controls in the target report prompt bar and clicked OK. And I saved my source report. That’s it!

Note: This option sets only values on the controls that use the same data item as the source object or on data items that filter the source object. The source and target of report link should be based on the same data source. If you have multiple data sources, you are prompted to map the report link.

Linking to target reports and sections in SAS Visual Analytics 7.4 was published on SAS Users.

4月 282017
 

ETL automationWhen developing SAS applications, you can feed database tables into your application by using the libname access engine either by directly referring a database table, or via SAS or database views that themselves refer to one or more of the database tables.

However, such on-the-fly data access may not be efficient enough, especially for interactive SAS applications when system response time is critical. In case of distributed systems where a database might reside on one server, and the SAS Application server – on a different physical server, user wait time caused by delays in data access across networks and databases might become intolerable. In such cases, it makes perfect sense to extract database tables and load them into SAS tables in advance, preferably placing them on (or close to) the SAS Application server. That will ensure higher operational efficiency (responsiveness) of the interactive SAS application.

SAS Data Integration Studio provides a powerful visual design tool for building, implementing, and managing data integration processes across different data sources, applications, and platforms.

However, in case of multiple tables, loading them one by one using even such a powerful tool might become a bit tedious. In an automation paradigm, being “visual” is not necessarily a good thing; when we automate it implies that we want to get things done without visualizing or even seeing them.

Here is a SAS coding approach, which I recently used for a customer project, to automate the repetitive process.

SAS code example of loading multiple Oracle tables into SAS tables

There were several Oracle tables that needed to be extracted and loaded into SAS data tables on a different physical server. The Oracle tables contained multi-year data, but we were only interested in the latest year’s data. That is those tables that have datetime stamp EVENT_DT column needed to be subset to the latest rolling 365 days.

Step 1. Creating a driver table

In order to make this process data-driven, let’s first create a driver table containing a list of the table names that needed to be extracted and loaded:

/* --------------------------------------------------- */
/* create table list to extract &amp; load,                */
/* dt_flag indicates whether EVENT_DT variable present */
/* --------------------------------------------------- */
 
libname parmdl '/sas/data/parmdata';
 
data parmdl.tablelist;
        length tname $8 dt_flag $1;
        input tname dt_flag;
        datalines;
ADDRESS N
ACCOUNT Y
BENEFIT Y
EXCLUSN N
FINANCE Y
HOUSING Y
;

This program runs just once to create the driver table parmdl.tablelist.

Step 2. Loading multiple tables

The following data-driven program runs every time when you need to reload Oracle tables into SAS:

/* source ORACLE library */
libname oralib oracle path="xxx" schema="yyy" user="uuu"
 	PASSWORD="{SAS002}D2DF612A161F7F874C4EF97F" access=readonly;
 
/* target SAS library */
libname sasdl '/sas/data/appdata';
 
/* driver table SAS library */
libname parmdl '/sas/data/parmdata';
 
options symbolgen mprint mlogic;
%let cutoff_date = %eval(%sysfunc(today()) - 365);
 
/* --------------------------------------------------- */
/* loop through table list to extract & load           */
/* --------------------------------------------------- */
%macro ETL;
   %let dsid = %sysfunc(open(parmdl.tablelist));
   %syscall set(dsid);
   %do %while(%sysfunc(fetch(&dsid)) eq 0);
      data sasdl.&tname;
         set oralib.&tname;
         %if &dt_flag eq Y %then %str(where datepart(EVENT_DT) ge &cutoff_date;);
      run;
   %end;
   %let rc = %sysfunc(close(&dsid));
%mend ETL;
%ETL

In this code, we use a similar coding technique to that described in my earlier blog post Modifying variable attributes in all datasets of a SAS library. We loop through the initially created table parmdl.tablelist. in a macro %do-loop, and for each observation containing a single table name, implement a data step extracting one table at a time.

If macro variable &dt_flag=Y, then we generate and insert a where statement sub-setting the extracted table to the latest year’s data.

Note a very useful feature here %syscall set(dsid) that links SAS data set variables to macro variables that have the same name and data type. Since we have not defined those macro variables, in essence, it creates a namesake macro variable for each SAS data set variable.

Implementing data-driven load in SAS Data Integration Studio

The above SAS code is run on the SAS Application server – the same server where the SAS target library resides. To fully automate the ETL process, this code may be embedded into a SAS job in SAS Data Integration Studio and then scheduled via either Platform Suite for SAS or your operating system scheduler. Alternately, it can be run in batch mode or via script that that in turn can be scheduled by the operating system scheduler.

In SAS DI Studio:

1.  Create a New job and give it a meaningful name (e.g. Loading Oracle Tables into SAS Tables.)

2.  Drag & drop User Written Code transformation onto the SAS Job canvas and give it a meaningful name (e.g. ETL macro):

SAS Data Integration Studio Job

3.  Go to the Properties of this transformation and click on the Code tab, then paste your SAS code in the text entry area:

SAS DI job properties

4.  Click OK and Save your SAS Job.

Note, that we do not add Source and Target tables to the Job canvas, only User Written Code transformation. Also note, that the %ETL macro has no direct references to the database table names being extracted and loaded into SAS datasets. Instead, it receives the names of the tables to be loaded from the PARMDL.TABLELIST dataset. This makes it extremely flexible to the task of processing a completely different set of tables; all you need to do is update the table list in the driver table. No change to your SAS DI job is needed.

Now it's your turn

What is your experience with automating ETL processes? Please feel free to share it here in the Comments section.

 

Automating the loading of multiple database tables into SAS tables was published on SAS Users.

4月 262017
 

Oracle databases from SAS ViyaSAS Data Connector to Oracle lets you easily load data from your Oracle DB into SAS Viya for advanced analytics. SAS/ACCESS Interface to Oracle (on SAS Viya) provides the required SAS Data Connector to Oracle, that should be deployed to your CAS environment. Once the below described configuration steps for SAS Data Connector to Oracle are completed, SAS Studio or Visual Data Builder will be able to directly access Oracle tables and load them into the CAS engine.

SAS Data Connector to Oracle requires Oracle client components (release 12c or later) to be installed and configured and configurations deployed to your CAS server. Here is a guide that walks you through the process of installing the Oracle client on a Linux server and configuring SAS Data Connector to Oracle on SAS Viya 3.1 (or 3.2):

Step 1: Get the Oracle Instant client software (release 12.c or later)

To find the oracle client software package open a web browser and navigate to Oracle support at:
http://www.oracle.com/technetwork/topics/linuxx86-64soft-092277.html

Download the following two packages to your CAS controller server:

  • oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm
  • oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm (optional for testing only)

Step 2: Install and configure Oracle instant client

On your CAS controller server, execute the following commands to install the Oracle instant client and SQLPlus utilities.

rpm –ivh oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm
rpm –ivh oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm

The Oracle client should be installed to /usr/lib/oracle/12.1/client64.
To configure the Oracle client, create a file called tnsnames.ora, for example, in the /etc/ directory. Paste the following lines with the appropriate connection parameters of your Oracle DB into the tnsnames.ora file. (Replace "your_tnsname", "your_oracle_host", "your_oracle_port" and "your_oracle_db_service_name" with parameters according to your Oracle DB implementation)

your_tnsname =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = your_oracle_host)(PORT = your_oracle_port ))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME =your_oracle_db_service_name)
    )
  )

Next you need to set environment variables for the Oracle client to work:

LD_LIBRARY_PATH: Specifies the directory of your Oracle instant client libs
PATH: Must include your Oracle instant client bin directory
TNS_ADMIN: Directory of your tnsnames.ora file
ORACLE_HOME: Location of your Oracle instant client installation

In a console window on your CAS controller Linux server, issue the following commands to set environment variables for the Oracle client: (replace the directory locations if needed)

export LD_LIBRARY_PATH=/usr/lib/oracle/12.1/client64/lib:$LD_LIBRARY_PATH
export PATH=/usr/lib/oracle/12.1/client64/bin:$PATH 
export TNS_ADMIN=/etc 
export ORACLE_HOME=/usr/lib/oracle/12.1/client64

If you installed the SQLPlus package from Oracle, you can test connectivity with the following command: (replace "your_oracle_user" and "your_tnsname" with a valid oracle user and the tnsname configured previously)

sqlplus your_oracle_user@your_tnsname

When prompted for a password, use your Oracle DB password to log on.
You should see a “SQL>” prompt and you can issue SQL queries against the Oracle DB tables to verify your DB connection. This test indicates if the Oracle instant client is successfully installed and configured.

Step 3: Configure SAS Data Connector to Oracle on SAS Viya

Next you need to configure the CAS server to use the instant client.

The recommended way is to edit the vars.yml file in your Ansible playbook and deploy the required changes to your SAS Viya cluster.

Locate the vars.yml file on your cluster deployment and change the CAS_SETTINGS section to reflect the correct environment variables needed for CAS to use the Oracle instant client:
To do so, uncomment the lines for ORACLE_HOME and LD_LIBRARY_PATH and insert the respective path for your Oracle instant client installation as shown below.

CAS Specific ####
# Anything in this list will end up in the cas.settings file
CAS_SETTINGS:
1: ORACLE_HOME=/usr/lib/oracle/12.1/client64
#3: ODBCHOME=ODBC home directory
#4: JAVA_HOME=/usr/lib/jvm/jre-1.8.0
5:LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib

Run the ansible-playbook to deploy the changes to your CAS server.
After ansible finished the update, your cas.settings file should contain the following lines:

export ORACLE_HOME=/usr/lib/oracle/12.1/client64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib

Now you are ready to use SAS/ACCESS Interface to Oracle in SAS Viya.

Step 4: Test SAS/ACCESS Interface to Oracle in SAS Studio

Log on to SAS Studio to load data from your Oracle DB into CAS.
Execute the following SAS Code example in SAS Studio to connect to your Oracle DB and load data into CAS. Change the parameters starting with "your_" in the SAS code below according to your Oracle DB implementation.

/************************************************************************/
/*  Start a CAS session   */
/************************************************************************/
cas;
/************************************************************************/
/*  Create a Caslib for the Oracle connection   */
/************************************************************************/
caslib ORACLE datasource=(                                           
    srctype="oracle",
    uid="your_oracle_user_ID",
    pwd="your_oracle_password",
    path="//your_db_hostname:your_db_port/your_oracle_service_name",
    schema="your_oracle_schema_name" );
 
/************************************************************************/
/*  Load a table from Oracle into CAS   */
/************************************************************************/
proc casutil;
   list files incaslib="ORACLE"; 
   load casdata="your_oracle_table_name" incaslib="ORACLE" outcaslib="casuser" casout="your_cas_table_name";                   
   list tables incaslib="casuser";
quit;
/************************************************************************/
/*  Assign caslib to SAS Studio   */
/************************************************************************/
 caslib _all_ assign;

The previous example is a simple SAS program to test access to Oracle and load data from an Oracle table into CAS memory. As a result, the program loads a table in your CAS library with data from your Oracle database.

How to configure Oracle client for successful access to Oracle databases from SAS Viya was published on SAS Users.