11月 152017
 

Did you know that a SAS/IML function can recover from a run-time error? You can specify how to handle run-time errors by using a programming technique that is similar to the modern "try-catch" technique, although the SAS/IML technique is an older implementation.

Preventing errors versus handling errors

In general, SAS/IML programmers should try to detect potential problems and prevent errors from occurring. For example, before you compute the square root of a number, you should test whether the number is greater than or equal to zero. If not, you can handle the bad input. By testing the input value before you call the SQRT function, you prevent a run-time error.

However, sometimes you don't know whether a computation will fail until you attempt it. Other times, the test to determine whether an error will occur is very expensive, and it is cheaper to attempt the computation and handle the error if it occurs.

A canonical example is computing the inverse of a square matrix. It is difficult to test whether a given matrix is numerically singular. You can compute the rank of the matrix or the condition number of the matrix, but both computations are expensive because they rely on the SVD or eigenvalue decomposition. Cheaper methods include computing a determinant or using Gaussian elimination, but these methods can be numerically unstable and are not good indicators of whether a matrix is numerically singular.

Run-time error in SAS/IML modules

To understand how to handle run-time errors in SAS/IML modules, recall the following facts about how to use the execution queue in SAS/IML modules.

In summary, when a run-time error occurs in a module, the module pauses and waits for additional commands. However, if there are statements in the execution queue, those statements will be executed.

You can use these facts to handle errors. At the top of the module, use the PUSH statement to add error-handling statements into the execution queue. If an error occurs, the module pauses, sees that there are statements in the queue, and executes those statements. If one of the statements contains a RESUME statement, the module resumes execution.

Push, pause, execute, and resume

Let's apply these ideas to computing the inverse of a square matrix. The following SAS/IML function attempts to compute the inverse of the input matrix. If the matrix is singular, the computation fails and the module handles that error. The main steps of the modules are as follows:

  1. Define the statements that will be executed in the event of an error. The last executable statement should be the RESUME statement. (Technical point: The RESUME statement is only valid when the module is paused. Surround the statements with IF-THEN logic to prevent the error-handling statements from running when the module exits normally.)
  2. Push these statements into the execution queue.
  3. If an error occurs, the statements in the queue will be executed. The RESUME statement clears the error and resumes execution of the module. (If no error occurs, the queue is flushed after the RETURN statement.)
proc iml;
/* If the matrix is not invertible, return a missing value. 
   Otherwise, return the inverse.
*/
start InvEx(A); 
   errFlag = 1;           /* set flag. Assume we will get an error :-) */   
   on_error = "if errFlag then do;  AInv = .;  resume; end;";
   call push(on_error);   /* PUSH code that will be executed if an error occurs */
   AInv = inv(A);         /* if error, AInv set to missing and function resumes */
   errFlag = 0;           /* remove flag for normal exit from module */
   return ( AInv );
finish;
 
A1 = {1 1,
      0 1};
B1 = InvEx(A1);
A2 = {1 1,
      1 1};
B2 = InvEx(A2);
 
print B1, B2;

The function is called first with a nonsingular matrix, A1. The INV function does not fail, so the module exits normally. When the module exits, it flushes the execution queue. Because ErrFlag=0, the pushed statements have no effect. B1 is equal to the inverse of A1.

The function is called next with a singular matrix, A2. The INV function encounters an error and the module pauses. When the module pauses, it executes the statements in the queue. Because ErrFlag=1, the statements set AINV to a missing value and resume the module execution. The module exits normally and the B2 value is set to missing.

Summary

In summary, you can implement statements that handle run-time errors in SAS/IML modules. The key is to push error-handling code into a queue. The error-handling statements should include a RESUME statement. If the module pauses due to an error, the statements in the queue are executed and the module resumes execution. Be aware that the queue is flushed when the module exits normally, so use a flag variable and IF-THEN logic to control how the statements behave when there is no error.

The post Catch run-time errors in SAS/IML programs appeared first on The DO Loop.

11月 142017
 

As the banking industry continues to combat increasing fraud challenges, payment fraud is growing exponentially. This growth stems from a shifting payment landscape with new and varied payment options. Globally, governments are introducing new initiatives like faster payments and real-time payments which compress turnaround times. These initiatives are altering the [...]

Combat wire fraud with analytics was published on SAS Voices by Veena Hirannaiah

11月 142017
 

SAS Visual Analytics 7.4 has added the support for date parameters. Recall from my first post,  Using parameters in SAS Visual Analytics, a parameter is a variable whose value can be changed at any time by the report viewer and referenced by other report objects. These objects can be a calculated item, aggregated measure, filter, rank or display rule. And remember, every time the parameter is changed, the corresponding objects are updated to reflect that change.

Here is my updated table that lists the supported control objects and parameter types for SAS Visual Analytics 7.4. The type of parameter is required to match the type of data that is assigned to the control.
Notice that SAS Visual Analytics 7.4 has also introduced the support for multiple value selection control objects. I’ll address these in another blog.

Using Date Parameters in your SAS Visual Analytics Reports

Let’s look at an example of a SAS Visual Analytics Report using date parameters. In this fictitious report, we have been given the requirements that the user wants to pick two independent date periods for comparison. This is not the same requirement as filtering the report between a start and end date. This report requirement is such that a report user can pick two independent months in the source data to be able to analyze the change in Expense magnitude for different aggregation levels, such as Region, Product Line and Product.

In this example, we will compare two different Month,Year periods. This could easily be two different Quarter,Year or Week,Year periods; depending on the report requirements, these same steps can be applied.

In this high level breakdown, you can see in red I will create two date parameters from data driven drop-down lists. From these parameter values, I will create two calculated data items, shown in purple, and one aggregated measure that will be used in three different report objects, shown in green.

Here are the steps:

1.     Create the date parameters.

2.     Add the control objects to the report and assign roles.

3.     Create the dependent data items, i.e. the calculated data items and aggregated measure.

4.     Add the remaining report objects to the canvas and assign roles.

Step 1: Create the date parameters

First we will need to create the date parameters that will hold the values made by the report viewers. From the Data Pane, use the drop-down menu and select New Parameter….

Then create your first parameter as shown below. Give it a name.

Next, select minimum and maximum values allowed for this parameter. I used the min and max available in my data source, but you could select a more narrow range if you wanted to restrict the users to only have access to portions of the data, just so long as the values are in your data source since, in this example, we will use the data source to populate the available values in the drop-down list.

Then select a current value, this will serve as the default value that will populate when a user first opens the report.

Finally, select the format in which you want your data item to be formatted. I selected the same format as my underlying data item I will be using to populate the drop-down list.

Notice how your new parameters will now be available from your Data Pane.

Step 2: Add the control objects to the report and assign roles

Next, drag and drop the drop-down list control objects onto the report canvas. In this example, we are not using the Report or Section Prompt areas since I do not want to filter the objects in the report or section automatically. Instead, I am using these prompt values and storing them in a parameter. I will then use those values to create new calculated data items and an aggregated measure.

Once your control objects are in the report canvas, then use the Roles Pane to assign the data items to the roles. As you can see from the screenshot, we are using the Date by Month data item to seed the values of the drop-down list by assigning it to the Category role, this data item is in our data source.

Then we are going to assign our newly created parameters, Period1Parameter and Period2Parameter to the Parameter role. This will allow us to use the value selected in our calculations.

Step 3: Create the dependent data items, i.e. the calculated data items and aggregated measure

Now we are free to use our parameters as we like. In this example, I am prompting the report viewer for two values: Period 1 and Period 2 which are the two periods the user would like compared in this report. So, we will need to create two calculated data times from a single column in our source data. Since we want to display these as columns next to each other in a crosstab object and use them for an aggregated measure, this technique can be used.

Calculated Data Item: Period 1 Expenses

From the Data Pane, use the drop-down menu and select New Calculated Item…. Then use the editor to create this expression: If the Date by Month for this data row equals the parameter value selected for Period 1, then return the Expenses; else return 0.

Calculated Data Item: Period 2 Expenses

Repeat this using the Period2Parameter in the expression.

Aggregated Measure: Period Difference

Next, we want to calculate the difference between the two user selected Period Expenses. To do this, we will need to create an aggregated measure which will evaluate based on the report object’s role assignments. In other words, it will be calculated “on-the-fly” based on the visualization.

Similar to the calculated data items, use the Data Pane and from the drop-down menu select New Aggregated Measure…. Use the editor to create this expression. Notice that we are using our newly created calculated data items we defined using the parameter values. This expression does not use the parameter value directly, but indirectly through the calculated data item.

Step 4: Add the remaining report objects to the canvas and assign roles

No that we have:

  • our Control Objects to capture the user input.,
  • the Parameters to store the values.,
  • and the Calculated Data Items and Aggregated Measure created…

we can add our report objects to the canvas and assign our roles.

You can see I used all three new measures in the crosstab object. I used the aggregated measure in the bar chart and treemap but notice the different aggregation levels. There is even a hierarchy assigned to the treemap category role. This Period Difference aggregated measure calculation is done dynamically and will evaluate for each visualization with its unique role assignments, even while navigating up and down the hierarchy.

Here are some additional screenshots of different period selections.

In this first screenshot you can see the parallel period comparison between December 2010 and 2011.

In these next two screenshots, we are looking at the Thanksgiving Black Friday month of November. We are comparing the two years 2010 and 2011 again. Here we see that the Board Product from the Game Product Line is bright blue indicating an increase in magnitude of Expenses in the most recent period, Nov2011.

By double clicking on Board in the treemap, we are taken to the next level of the hierarchy, Product Description, where we see a the largest magnitude of Expenses is coming from Backgammon and Bob Board Games.

In these final two screenshots we are comparing consecutive periods, November 2011 with December 2011. We can see from the bar chart easily the Region and Product Line where there is the greatest increase in Expenses.

I’ve configured a brush interaction between all three visualizations so that when I select the tallest bar it will highlight the corresponding data values in the crosstab and treemap.

Conclusion

Now you can use date parameters in your Visual Analytics Reports. There are several applications of this feature and this is only one way you can use parameters to drive business intelligence. Using this technique to create columns based on a user selected value is great when you need to compare values when your source data isn’t structured in this manner.

Using Date Parameters in your SAS Visual Analytics Reports was published on SAS Users.

11月 132017
 

Debugging is the bane of every programmer. SAS supports a DATA step debugger, but that debugger can't be used for debugging SAS/IML programs. In lieu of a formal debugger, many SAS/IML programmers resort to inserting multiple PRINT statements into a function definition. However, there is an easier way to query the values of variables inside a SAS/IML function: Use the PAUSE statement to halt the execution of program inside the misbehaving function. You can then interactively submit PRINT statements or any other valid statements. When you are finished debugging, you can submit the RESUME statement and the function will resume execution. (Or you can submit the STOP statement to halt execution and return to the main scope of the program.)

The PAUSE statement

The SAS/IML language is interactive. The PAUSE statement pauses the execution of a SAS/IML module. While the program is paused you can print or assign local variables inside the module. When you are ready for the module to resume execution, you submit the RESUME statement. (The SAS/IML Studio application works slightly differently. The PAUSE statement brings up a dialog box that you can use to submit statements. You press the Resume button to resume execution.)

For example, suppose you write a function called 'Func' whose purpose is to compute the sum of squares of the elements in a numeric vector. While testing the function, you discover that the answer is not correct when you pass in a row vector. You decide to insert a PAUSE statement ("set a breakpoint") near the end of the module, just before the RETURN statement, as follows:

proc iml;
/* in SAS/IML, the CALL EXECUTE statement runs immediately */
start Func(x);
   y = x`*x;
 
  /* Debugging tip: Use PAUSE to enter INTERACTIVE mode INSIDE module! */
  pause "Inside 'Func'; submit RESUME to resume computations.";
  /* Execution pauses until you submit RESUME, then continues from the next statement. */
 
   return (y);
finish;
 
w = Func( {1 2 3} );

When you run the program, it prints Inside 'Func'; submit RESUME to resume computations. The program then waits for you to enter commands. The program will remain paused until you submit the RESUME statement (include a semicolon!).

Because the program has paused inside the function, you can query or set the values of local variables. For this function, there are only two local variables, x and y. Highlight the following statement, and press F3 to submit it.

print y;     /* inside module; print local variable */

The output shows that the value of y is a matrix, not a scalar, which indicates that the expression x`*x does not compute the sum of squares for this input vector. You might recall that SAS/IML has a built-in SSQ function, which computes the sum of squares. Highlight the following statement and press F3 to submit it:

y = ssq(x);  /* assign local variable inside module */

This assignment has overwritten the value of the local variable, y. When you submit a RESUME statement, the function will resume execution and return the value of y. The program is now at the main scope, which you can demonstrate by printing the value of w:

resume;      /* resume execution. Return to main scope */
print w;     /* print result at main scope */

To make the change permanent, you must edit the function definition and redefine the module. Be sure to remove the PAUSE statement when you are finished debugging: If you run a program in batch mode that contains a PAUSE statement, the program will forever wait for input!

In conclusion, the PAUSE statement enables you to pause the execution of a program inside a module and interactively query and change the local variables in the module. This can help you to debug a function. After you finish investigating the state of the local variables, you can submit the RESUME statement, which tells the function to continue executing from the line after the PAUSE statement. (Or submit STOP to exit the function.) The PAUSE statement can be a useful alternative to inserting many PRINT statements inside a function during the debugging phase of program development.

The post A tip for debugging SAS/IML modules: The PAUSE statement appeared first on The DO Loop.

11月 112017
 

Here's a Proc Print trick for grouped data. Suppose your data is divided into groups, such as males and females. You could sort by the grouping variable before printing, like this: Suppose you want to better emphasize the groups. You could add a BY statement, like this: OK, but, personally, [...]

The post Simple Proc Print trick for grouped data appeared first on SAS Learning Post.

11月 092017
 

I recently saw an alarming article on social media about an outbreak of airborne plague spreading from Madagascar to Africa (and potentially to the rest of the world). The plague?!? - I thought that only happened hundreds of years ago?!? I don't really trust news on Facebook, so I went [...]

The post Graphing mistakes to avoid ... like the plague! appeared first on SAS Learning Post.

11月 082017
 

A SAS programmer wanted to display a table in which the rows have different formats. An example is shown below. The programmer wanted columns that represent statistics and rows that represent variables. She wanted to display formats (such as DOLLAR) for some variables—but only for certain statistics. For example, the number of nonmissing observations (N) should never use the format of the variable, whereas the minimum, maximum, and mean values should.

Format rows of a table

In SAS you can easily apply a format to a column, but not to a row. However, SAS provides two techniques that enable you to control the appearance of cells in tables:

Both PROC TEMPLATE and PROC REPORT have a learning curve. If you are in a hurry, an alternative solution is to use the DATA step to create a factoid. Recall that a factoid is an ODS table that contains character values and (formatted) numbers in the same column. A factoid can display mixed-type data by converting every value to a string. This conversion process gives you complete control over the format of each cell in the table. Consequently, you can use context to format some numbers differently from other numbers.

Creating a custom factoid

Suppose that you want to generate the table shown at the top of this article. You can obtain the statistics from PROC MEANS and then transpose the data. The following statements produce an output data set that contains descriptive statistics for three variables in the Sashelp.Cars data:

proc means data=sashelp.cars noprint;
var Cylinders EngineSize MSRP;
output out=stats(drop=_TYPE_ _FREQ_
                 rename=(_STAT_=Label1)
                 where=(Label1 ^= "STD") );
run;
 
proc print data=stats noobs;
run;
Descriptive statistics with formats

I have highlighted the first row of the output data to indicate why this output is not suitable for a report. The output variables retain the format of the original variables. However, the 'N' statistic is the sample size, and a sample of size '$428' does not make sense! Even '428.000' is a strange value to see for a sample size. It would be great to format the first row with an intrger format such as 8.0.

The next DATA step creates three character variables (CVALUE1-CVALUE3) that contain formatted values of the three numerical variables in the data. The SELECT-WHEN statement does the following:

  • For each observation and for each of the three numerical variables:
    • If the LABEL1 variable is 'N', then apply the 8.0 format to the value of the numerical variable.
    • Otherwise use the VVALUE function to get the formatted value for the numerical variable.
data Factoid;
set stats;
array nValue[3] Cylinders EngineSize MSRP;      /* numerical variables */
array cValue[3] $12.;                           /* cValue[i] is formatted version of nValue[i] */
/* with a little effort you could assign labels dynamically */
label cValue1="Cylinders" cValue2="EngineSize" cValue3="MSRP";
 
do i = 1 to dim(nValue);
   select (Label1);
      when ('N')    cValue[i] = put(nvalue[i], 8.0);
      otherwise     cValue[i] = vvalue(nvalue[i]);
   end;
end;
run;
 
proc print data=Factoid label noobs;
   var Label1 cValue:;
run;

The output displays the character columns, which contain formatted numbers. With additional logic, you could display fewer decimal values for the MEAN row.

Transpose the factoid

The previous table contains the information that the programmer wants. However, the programmer asked to display the transpose of the previous table. For completeness, here are SAS statements that transpose the table:

proc transpose data=Factoid out=TFactoid;
   var cValue:;
   ID Label1;
   IDLABEL Label1;
run;
 
proc print data=TFactoid(drop=_NAME_) noobs label;
   label _LABEL_ = "Variable";
run;

The final table is displayed at the top of the article.

Summary

In summary, this article shows one way to display a column of data in which each cell potentially has a different format. The trick is to create a factoid, which means that you create a character copy of numeric data. As you create the copy, you can use the PUT function to apply a custom format, or you can use the VVALUE function to get the existing formatted value.

In general, you should try to avoid creating copies of data, so use PROC TEMPLATE and PROC REPORT if you know how. However, if you don't know how to use those tools, the factoid provides an alternative way to control the formatted values that appear in a table. Although not shown in this article, a factoid also enables you to display character and numeric values in the same column.

The post How to format rows of a table in SAS appeared first on The DO Loop.

11月 082017
 

A SAS programmer wanted to display a table in which the rows have different formats. An example is shown below. The programmer wanted columns that represent statistics and rows that represent variables. She wanted to display formats (such as DOLLAR) for some variables—but only for certain statistics. For example, the number of nonmissing observations (N) should never use the format of the variable, whereas the minimum, maximum, and mean values should.

Format rows of a table

In SAS you can easily apply a format to a column, but not to a row. However, SAS provides two techniques that enable you to control the appearance of cells in tables:

Both PROC TEMPLATE and PROC REPORT have a learning curve. If you are in a hurry, an alternative solution is to use the DATA step to create a factoid. Recall that a factoid is an ODS table that contains character values and (formatted) numbers in the same column. A factoid can display mixed-type data by converting every value to a string. This conversion process gives you complete control over the format of each cell in the table. Consequently, you can use context to format some numbers differently from other numbers.

Creating a custom factoid

Suppose that you want to generate the table shown at the top of this article. You can obtain the statistics from PROC MEANS and then transpose the data. The following statements produce an output data set that contains descriptive statistics for three variables in the Sashelp.Cars data:

proc means data=sashelp.cars noprint;
var Cylinders EngineSize MSRP;
output out=stats(drop=_TYPE_ _FREQ_
                 rename=(_STAT_=Label1)
                 where=(Label1 ^= "STD") );
run;
 
proc print data=stats noobs;
run;
Descriptive statistics with formats

I have highlighted the first row of the output data to indicate why this output is not suitable for a report. The output variables retain the format of the original variables. However, the 'N' statistic is the sample size, and a sample of size '$428' does not make sense! Even '428.000' is a strange value to see for a sample size. It would be great to format the first row with an intrger format such as 8.0.

The next DATA step creates three character variables (CVALUE1-CVALUE3) that contain formatted values of the three numerical variables in the data. The SELECT-WHEN statement does the following:

  • For each observation and for each of the three numerical variables:
    • If the LABEL1 variable is 'N', then apply the 8.0 format to the value of the numerical variable.
    • Otherwise use the VVALUE function to get the formatted value for the numerical variable.
data Factoid;
set stats;
array nValue[3] Cylinders EngineSize MSRP;      /* numerical variables */
array cValue[3] $12.;                           /* cValue[i] is formatted version of nValue[i] */
/* with a little effort you could assign labels dynamically */
label cValue1="Cylinders" cValue2="EngineSize" cValue3="MSRP";
 
do i = 1 to dim(nValue);
   select (Label1);
      when ('N')    cValue[i] = put(nvalue[i], 8.0);
      otherwise     cValue[i] = vvalue(nvalue[i]);
   end;
end;
run;
 
proc print data=Factoid label noobs;
   var Label1 cValue:;
run;

The output displays the character columns, which contain formatted numbers. With additional logic, you could display fewer decimal values for the MEAN row.

Transpose the factoid

The previous table contains the information that the programmer wants. However, the programmer asked to display the transpose of the previous table. For completeness, here are SAS statements that transpose the table:

proc transpose data=Factoid out=TFactoid;
   var cValue:;
   ID Label1;
   IDLABEL Label1;
run;
 
proc print data=TFactoid(drop=_NAME_) noobs label;
   label _LABEL_ = "Variable";
run;

The final table is displayed at the top of the article.

Summary

In summary, this article shows one way to display a column of data in which each cell potentially has a different format. The trick is to create a factoid, which means that you create a character copy of numeric data. As you create the copy, you can use the PUT function to apply a custom format, or you can use the VVALUE function to get the existing formatted value.

In general, you should try to avoid creating copies of data, so use PROC TEMPLATE and PROC REPORT if you know how. However, if you don't know how to use those tools, the factoid provides an alternative way to control the formatted values that appear in a table. Although not shown in this article, a factoid also enables you to display character and numeric values in the same column.

The post How to format rows of a table in SAS appeared first on The DO Loop.