May 05, 2020
 

When you route REPORT procedure output to any of the ODS destinations, you might want to apply style attributes to a column based on the values of multiple variables. To do that, you use CALL DEFINE statements in a COMPUTE block. This structure can require complex logic and many CALL DEFINE statements to ensure that every combination of variable values and styles is applied appropriately. However, the STYLE/MERGE and STYLE/REPLACE attributes in the CALL DEFINE statement can simplify this process. They are useful when two or more COMPUTE blocks contain CALL DEFINE statements that refer to the same cell in a table.

Using the STYLE/MERGE attribute

The STYLE/MERGE attribute combines any new styles that are specified by the STYLE= argument with any existing styles that are already applied to values in the row or column. In this example, style attributes are applied to the Sex column. In the first COMPUTE block, if the value of Sex is F, the background color of the cell is yellow. In the second COMPUTE block for Age, if the value of Age is greater than 14, the color of the text in the Sex column is red. With STYLE/MERGE, any cell in the Sex column where the value is F and the corresponding Age is greater than 14 gets both styles: a yellow background with red text.

 
proc report data=sashelp.class; 
column sex age height weight;
define sex--weight / display;
 
compute sex;
if sex = 'F' then
 call define('sex', "style", "style=[background=yellow]"); 
endcomp;
 
compute age;
if age > 14 then
 call define('sex', "style/merge", "style=[color=red]"); 
endcomp;
run;

Here is the resulting output:

Using the STYLE/REPLACE attribute

The STYLE/REPLACE attribute replaces any existing styles for a row or column with the new styles that are specified by the STYLE= argument. In this example, style attributes are applied to the Sex column again. In the first COMPUTE block, if the value of Sex is F, the background color of the cell is yellow. In the second COMPUTE block for Age, if the value of Age is greater than 14, the color of the text in the Sex column is red. With STYLE/REPLACE, any cell in the Sex column where the value is F and the corresponding Age is greater than 14 gets red text only, without any background color: the red-text style replaces the yellow background.

 
proc report data=sashelp.class; 
column sex age height weight;
define sex--weight / display;
 
compute sex;
if sex = 'F' then
 call define('sex', "style", "style=[background=yellow]"); 
endcomp;
 
compute age;
if age > 14 then	
 call define('sex', "style/replace", "style=[color=red]"); 
endcomp;
run;

Here is the resulting output:

The STYLE/MERGE and STYLE/REPLACE attributes are supported only in the CALL DEFINE statement in a COMPUTE block in PROC REPORT. These useful tools can simplify complex code and enable you to customize your PROC REPORT output with meaningful style choices.

Additional References

PROC REPORT: CALL DEFINE

Sample Note 43758: How to merge styles from multiple CALL DEFINE statements with PROC REPORT

Using STYLE/MERGE and STYLE/REPLACE in PROC REPORT was published on SAS Users.

May 04, 2020
 

[Jessica Curtis and Adam Hillman, both Forecasting Advisors at SAS, were co-authors of this post] The world has been dramatically impacted by the recent COVID-19 pandemic. Many of us are juggling a completely new lifestyle that was forced upon us overnight. As consumers find their way to a new normal, [...]

Retail forecasting through a pandemic was published on SAS Voices by Brittany Bullard

May 04, 2020
 

SAS programmers sometimes ask about ways to perform one-dimensional linear interpolation in SAS. This article shows three ways to perform linear interpolation in SAS: PROC IML (in SAS/IML software), PROC EXPAND (in SAS/ETS software), and PROC TRANSREG (in SAS/STAT software). Of these, PROC IML is the simplest to use and has the most flexibility. This article shows how to implement an efficient 1-D linear interpolation algorithm in SAS. You can download the SAS program that creates the analyses and graphs in this article.

Linear interpolation assumptions

For one-dimensional linear interpolation of points in the plane, you need two sets of numbers:

  1. Data: Let (x1, y1), (x2, y2), ..., (xn, yn) be a set of n data points. The data should not contain any missing values. The data must be ordered so that x1 < x2 < ... < xn. These values uniquely define the linear interpolation function on [x1, xn]. I call this the "sample data" or "fitting data" because it is used to create the linear interpolation model.
  2. Values to score: Let {t1, t2, ..., tk} be a set of k new values for the X variable. For interpolation, all values must be within the range of the data: x1 ≤ ti ≤ xn for all i. The goal of interpolation is to produce a new Y value for each value of ti. The scoring data is also called the "query data."

Interpolation requires a model. For linear interpolation, the model is the unique piecewise linear function that passes through each sample point and is linear on each interval [xi, xi+1]. The model is usually undefined outside of the range of the data, although there are various (nonunique) ways to extrapolate the model beyond the range of the data. You fit the model by using the data. You then score the model on the set of new values.

The following SAS data sets define example data for linear interpolation. The POINTS data set contains fitting data that define the linear model. The SCORE data set contains the new query points at which we want to interpolate. The linear interpolation is shown to the right. The sample data are shown as blue markers. The model is shown as blue lines. The query values to score are shown as a fringe plot along the X axis. The interpolated values are shown as red markers.

The data used for this example are:

/* Example data for 1-D interpolation */
data Points;  /* these points define the model */
input x y;
datalines;
0  1
1  3
4  5
5  4
7  6
8  3
10 3
;
 
data Score; /* these points are to be interpolated */
input t @@;
datalines;
2 -1 4.8 0 0.5 1 9 5.3 7.1 10.5 9
;

For convenience, the fitting data are already sorted by the X variable, which is in the range [0, 10]. The scoring data set does not need to be sorted.

The scoring data for this example contains five special values:

  • Two scoring values (-1 and 10.5) are outside of the range of the data. An interpolation algorithm should return a missing value for these values. (Otherwise, it is extrapolation.)
  • Two scoring values (0 and 1) are duplicates of X values in the data. Ideally, this should not present a problem for the interpolation algorithm.
  • The value 9 appears twice in the scoring data.

Linear interpolation in SAS by using PROC IML

As is often the case, PROC IML enables you to implement a custom algorithm in only a few lines of code. For simplicity, suppose you have a single value, t, that you want to interpolate, based on the data (x1, y1), (x2, y2), ..., (xn, yn). The main steps for linear interpolation are:

  1. Check that the X values are nonmissing and in increasing order: x1 < x2 < ... < xn. Check that t is in the range [x1, xn]. If not, return a missing value.
  2. Find the first interval that contains t. You can use the BIN function in SAS/IML to find the first value i for which x_i <= t <= x_{i+1}.
  3. Define the left and right endpoint of the interval: xL = x_i and xR = x_{i+1}. Define the corresponding response values: yL = y_i and yR = y_{i+1}.
  4. Let f = (t - xL) / (xR - xL) be the proportion of the interval to the left of t. Then p = (1 - f)*yL + f*yR is the linear interpolation at t.

The steps are implemented in the following SAS/IML function. The function accepts a vector of scoring values, t. Notice that the program does not contain any loops over the elements of t. All statements and operations are vectorized, which is very efficient.

/* Linear interpolation based on the values (x1,y1), (x2,y2),....
   The X  values must be nonmissing and in increasing order: x1 < x2 < ... < xn
   The values of the t vector are linearly interpolated.
*/
proc iml;
start LinInterp(x, y, _t);
   d = dif(x, 1, 1);                     /* check that x[i+1] > x[i] */
   if any(d<=0) then stop "ERROR: x values must be nonmissing and strictly increasing.";
   idx = loc(_t>=min(x) && _t<=max(x));  /* check for valid scoring values */
   if ncol(idx)=0 then stop "ERROR: No values of t are inside the range of x.";
 
   p = j(nrow(_t)*ncol(_t), 1, .);     /* allocate output (prediction) vector */
   t = _t[idx];                        /* subset t values inside range(x) */
   k = bin(t, x);                      /* find interval [x_i, x_{i+1}] that contains t */
   xL = x[k];   yL = y[k];             /* find (xL, yL) and (xR, yR) */
   xR = x[k+1]; yR = y[k+1];
   f = (t - xL) / (xR - xL);           /* f = fraction of interval [xL, xR] */
   p[idx] = (1 - f)#yL + f#yR;        /* interpolate between yL and yR */
   return( p );
finish;
 
/* example of linear interpolation in SAS */
use Points; read all var {'x' 'y'}; close;
use Score; read all var 't'; close;
 
pred = LinInterp(x, y, t);
create PRED var {'t' 'pred'}; append; close;
QUIT;

Visualize a linear interpolation in SAS

The previous program writes the interpolated values to the PRED data set. You can concatenate the original data and the interpolated values to visualize the linear interpolation:

/* Visualize: concatenate data and predicted (interpolated) values */
data All;
set Points Pred;
run;
 
title "Linear Interpolation";
title2 "No Extrapolation";
proc sgplot data=All noautolegend;
   series x=x y=y;
   scatter x=x y=y / markerattrs=(symbol=CircleFilled size=12)
                    name="data" legendlabel="Data";
   scatter x=t y=Pred / markerattrs=(symbol=asterisk size=12 color=red)
                    name="interp" legendlabel="Interpolated Values";
   fringe t / lineattrs=(color=red thickness=2)
                    name="score" legendlabel="Values to Score";
   xaxis grid values=(0 to 10) valueshint label="X";
   yaxis grid label="Y" offsetmin=0.05;
   keylegend "data" "score" "interp";
run;

The graph is shown at the top of this article. A few noteworthy items:

  • The values -1 and 10.5 are not scored because they are outside the range of the data.
  • The values 0 and 1 correspond to data points. The interpolated values equal the corresponding data values.
  • The other values are interpolated onto the straight line segments that connect the data.

Performance of the IML algorithm

The IML algorithm is very fast. On my Windows PC (Pentium i7), the interpolation takes only 0.2 seconds for 1,000 data points and one million scoring values. For 10,000 data points and one million scoring values, the interpolation takes about 0.25 seconds. The SAS program that accompanies this article includes timing code.
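
If you want to run a similar timing experiment yourself, here is a minimal sketch of my own (it is not the program that accompanies the article). It assumes that you are in a PROC IML session in which the LinInterp module above has been defined, for example, before the QUIT statement in the earlier program.

/* Timing sketch (assumes the LinInterp module is defined in the current
   PROC IML session). Simulate data and scoring values, then time the call. */
n = 1000;                          /* number of data points */
k = 1e6;                           /* number of scoring values */
x = 10 * T(0:n-1) / (n-1);         /* sorted X values on [0, 10] */
y = randfun(n, "Normal");          /* arbitrary Y values */
t = 10 * randfun(k, "Uniform");    /* scoring values in [0, 10] */
 
t0 = time();
p = LinInterp(x, y, t);
elapsed = time() - t0;             /* elapsed time, in seconds */
print elapsed;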

Other SAS procedures that can perform linear interpolation

According to a SAS Usage Note, you can perform linear interpolation in SAS by using PROC EXPAND in SAS/ETS software or PROC TRANSREG in SAS/STAT software. Each has some limitations that I don't like:

  • Both procedures use the missing value trick to perform the fitting and scoring in a single call. That means you must concatenate the sample data (which is often small) and the query data (which can be large).
  • PROC EXPAND requires that the combined data be sorted by X. That can be easily accomplished by calling PROC SORT after you concatenate the sample and query data. However, PROC EXPAND does not support duplicate X values! For me, this makes PROC EXPAND unusable. It means that you cannot score the model at points that are in the original data, nor can you have repeated values in the scoring data.
  • If you use PROC TRANSREG for linear interpolation, you must know the number of sample data points, n. You must specify n – 2 on the NKNOTS= option on a SPLINE transformation. Usually, this means that you must perform an extra step (DATA step or PROC MEANS) and store n – 2 in a macro variable.
  • For scoring values outside the range of the data, PROC EXPAND returns a missing value. However, PROC TRANSREG extrapolates. If t < x1, then the extrapolated value at t is y1. Similarly, if t > xn, then the extrapolated value at t is yn.

I've included examples of using PROC EXPAND and PROC TRANSREG in the SAS program that accompanies this article. You can use these procedures for linear interpolation, but neither is as convenient as PROC IML.
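
To give a flavor of the PROC EXPAND approach without reproducing that program, here is a rough sketch of the "missing value trick" based on the general approach in the SAS Usage Note: append the query points with missing Y values, sort the combined data by X, and use METHOD=JOIN to request piecewise linear interpolation. Note that with the Score data in this article, this step fails because of the duplicate X values, which is exactly the limitation described above.

/* Sketch of linear interpolation with PROC EXPAND (SAS/ETS); this is not
   the program that accompanies the article. Query points get missing Y. */
data Combined;
   set Points Score(rename=(t=x));
run;
 
proc sort data=Combined;           /* PROC EXPAND requires sorted ID values */
   by x;
run;
 
proc expand data=Combined out=Interp method=join;
   id x;
   convert y;                      /* METHOD=JOIN interpolates linearly */
run;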

With effort, you can use Base SAS routines such as the DATA step to implement a linear interpolation algorithm. An example is provided by KSharp in a SAS Support Community thread.

Summary

If you want to perform linear interpolation in SAS, the easiest and most efficient method is PROC IML. I have provided a SAS/IML function that implements linear interpolation by using two input vectors (x and y) to define the model and one input vector (t) to specify the points at which to interpolate. The function returns the interpolated values for t.

The post Linear interpolation in SAS appeared first on The DO Loop.

April 29, 2020
 

When we interviewed supply chain expert Chris Tyas a few weeks prior to the COVID-19 pandemic, we had no idea that his 40 years of experience and knowledge in the consumer product goods (CPG) industry would soon have a profound impact on the supply chain and efforts to keep supplies [...]

The importance of the supply chain during the COVID-19 crisis was published on SAS Voices by Nancy Rudolph

April 29, 2020
 

Recently I read an excellent blog post by Paul von Hippel entitled "How many imputations do you need?". It is based on a paper (von Hippel, 2018), which provides more details.

Suppose you are faced with data that has many missing values. One way to address the missing values is to use multiple imputations. If two different researchers use different random-number seeds when they perform the imputations, they will get slightly different estimates. Clearly, you would like this difference to be small, which you can accomplish by using many imputations. The paper and blog address the important question: how many imputations are enough?

The purpose of this article is simply to provide greater visibility for von Hippel's work and to advertise a SAS macro that he wrote that implements his ideas. I am not an expert on multiple imputations, so if you have questions about the macro or the method, you should follow the previous links and read the original work.

A formula for the number of imputations

Traditionally, practitioners used 5 or 10 multiple imputations to perform a missing value analysis. As mentioned in the SAS documentation of the MI procedure, "Von Hippel (2009, p. 278) shows that with a small number of imputations, only the point estimates are reliable. That is, the point estimates will not change much if the missing values are imputed again. For other statistics (such as standard error and p-value) to be reliable," you must use more imputations. This was the main reason that the default value for the NIMPUTE= option in PROC MI was changed from NIMPUTE=5 to NIMPUTE=25 in SAS/STAT 14.1.
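
For reference, requesting a specific number of imputations in PROC MI is just a matter of the NIMPUTE= option. Here is a minimal sketch with placeholder data set and variable names:

/* Minimal sketch (placeholder data set and variable names):
   explicitly request 25 imputations, the current default in PROC MI */
proc mi data=MyData nimpute=25 seed=54321 out=MIout;
   var x1 x2 y;
run;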

In his 2018 paper, von Hippel addressed the issue of "how many imputations are enough" by proposing a quadratic formula and a SAS macro that implements the formula based on data.

There is a clever trick in von Hippel's method. The formula needs a certain statistic (called the "fraction of missing information" or FMI) in order to estimate the number of imputations. However, you can't get an estimate for the FMI until you have performed imputations and a subsequent analysis! The solution to this "Catch-22" is to use the same idea that is used in power and sample size computations. First, you perform a small "pilot study" to estimate the FMI. Then you use that estimate in the formula to obtain the number of imputations that are needed for the full study.

By the way, you can use this clever trick in other resampling methods. In my book Simulating Data with SAS (Wicklin, 2013, p. 317), I discuss how to run a "small-scale simulation on a coarse grid" first, and then "refine the grid of parameter values" only where necessary.

A SAS macro for the number of imputations

Paul von Hippel provides a link to a SAS macro that he wrote that implements his two-stage method. In the file "two_stage example.sas," von Hippel shows how to automate his two-stage method for choosing an appropriate number of imputations. Again, I am not the expert, so please direct your questions to von Hippel or to an online SAS Support Community.

In summary, the purpose of this blog post is to make you aware of von Hippel's blog post, paper, and macro. It sounds like a nice feature for analysts who use PROC MI in SAS for missing value imputations. Check it out!

The post How many imputations are enough? appeared first on The DO Loop.

April 28, 2020
 

With increasing interest in Continuous Integration/Continuous Delivery (CI/CD), many SAS Users want to know what can be done for Visual Analytics reports. In this article, I will explain how to use Python and SAS Viya REST APIs to extract a report from a SAS Viya environment and import it into another environment. For those trying to understand the secret behind CI/CD and DevOps, here it is:

What you do tomorrow will be better than what you did yesterday because you gained more experience today!

About Continuous Integration/Continuous Delivery

If you apply this principle to code development, it means that you may update your code every day, test it, deploy it, and start the process all over again the following day. You might feel like Sisyphus rolling his boulder around for eternity. This is where CI/CD can help. In the code deployment process, you have many recurrent tasks that can be automated to reduce repetitiveness and boredom. CI/CD is a paradigm where improvements to code are pushed, tested, validated, and deployed to production in a continuous automated manner.

About ModelOps and AnalyticOps

I hope you now have a better understanding of what CI/CD is. You might now wonder how CI/CD relates to Visual Analytics reports, models, and so on. With the success of DevOps, which describes development operations for software, companies have extended the CI/CD paradigm to operations that are not related to software development. This is why you hear about ModelOps, AnalyticOps, and the like. Wait a second, what is the difference between writing or generating code for a model or report versus writing code for software? You create a model, you test it, you validate it, and finally you deploy it. You create a report, you test it, you validate it, and then you deploy it. Essentially, the processes are the same. This is why we apply CI/CD techniques to models, reports, and many other business-related tasks.

About tools

As with many methodologies like CI/CD, tools are developed to help users through the process. There are many tools available and some of them are more widely used. SAS offers SAS Workflow Manager for building workflows to help with ModelOps. Additionally, you have surely heard about Git and maybe even Jenkins.

  • Git is a version control system that is used by many companies with some popular implementations: GitHub, GitLab, BitBucket.
  • Jenkins is an automation program that is designed to build action flows to ease the CI/CD process.

With these tools, you have the needed architecture to start your CI/CD journey.

The steps

With a basic understanding of the CI/CD world, you might ask yourself: how does this apply to reports?

When designing a report in an environment managed by DevOps principles, here are the steps to deploy the report from a development environment to production:

  1. Design the report in the development environment.
  2. Validate the report with business stakeholders.
  3. Export the report from the development environment.
  4. Save a version of the report.
  5. Import the report into the test environment.
  6. Test the report in the test environment.
  7. Import the report into the production environment.
  8. Monitor the report usage and performance.

Note: In some companies, the development and test environments are the same. In this case, steps 4 to 6 are not required.

Walking through the steps, we see that steps 1 and 2 are manual. The other steps can be automated as part of the CI/CD process. I will not explain how to build a pipeline in Jenkins or other tools in this post. I will nevertheless provide you with the Python code to extract, import, and test a report.

The code

To perform the steps described in the previous section, you can use different techniques and programming languages. I’ve chosen to use Python and REST APIs. You might wonder why I've chosen these and not the sas-admin CLI or another language. The reason is quite simple: my Chinese zodiac sign is a snake!

Jokes aside, I opted for Python because:

  • It is easy to read and understand.
  • There is no need to compile.
  • It can run on different operating systems without adaptation.
  • The developer.sas.com site provides examples.

I’ve created four Python files:

  1. getReport.py: to extract the report from an environment.
  2. postReport.py: to create the report in an environment.
  3. testReport.py: to test the report in an environment.
  4. functions.py: contains the functions that are used in the other Python files.

All the files are available on GitHub.

The usage

Before you use the above code, you should configure your SAS Viya environment to enable access to REST APIs. Please follow the first step from this article to register your client.

You should also make sure the environment executing the Python code has Python 3 with the requests package installed. If you're missing the requests package, you will get an error message when executing the Python files.
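
To give an idea of what the scripts do behind the scenes, here is a minimal sketch (not the actual functions.py code) of requesting an OAuth access token from the SAS Logon service with the requests package. The host, client ID, secret, and credentials are placeholders; they correspond to the command-line parameters described later.

# Minimal sketch (not the actual functions.py code): get an OAuth access token
# from the SAS Logon service using the password grant. All values are placeholders.
import requests

def get_access_token(host, client_id, client_secret, user, password):
    url = host + "/SASLogon/oauth/token"
    payload = {"grant_type": "password", "username": user, "password": password}
    response = requests.post(url, data=payload, auth=(client_id, client_secret))
    response.raise_for_status()
    return response.json()["access_token"]

# Example usage with placeholder values:
# token = get_access_token("http://va85.gel.sas.com", "app", "appsecret", "myAdmin", "myAdminPW")
# headers = {"Authorization": "Bearer " + token}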

Get the report

Now that your environment is set up, you can execute the getReport code to extract the report content from your source environment. Below are the command-line arguments to pass when executing the code:

python3 getReport.py -a myAdmin -p myAdminPW -sn http://va85.gel.sas.com -an app -as appsecret -rl "/Users/sbxxab/My Folder" -rn CarsReport -o /tmp/CICD/

The parameters are:

  • a - the user that is used to connect to the SAS Viya environment.
  • p - the password of the user.
  • sn - the URL of the SAS Viya environment.
  • an - the name of the application that was defined to enable REST APIs.
  • as - the secret to access the REST APIs.
  • rl - the report location; enclose it in quotes if it contains spaces.
  • rn - the report name; enclose it in quotes if it contains spaces.
  • o - the output location that will be used.

The output location should ideally be a Git repository. This allows a commit as the next step in the CI/CD process to keep a history log of any report changes.

The output generated by getReport is a JSON file which has the following structure:

{
  "name": "CarsReport",
  "location": "/Users/sbxxab/My Folder",
  "content": {
    "@element": "SASReport",
    "xmlns": "http://www.sas.com/sasreportmodel/bird-4.2.4",
    "label": "CarsReport",
    "dateCreated": "2020-01-14T08:10:31Z",
    "createdApplicationName": "SAS Visual Analytics 8.5",
    "dateModified": "2020-02-17T15:33:13Z",
    "lastModifiedApplicationName": "SAS Visual Analytics 8.5",
    "createdVersion": "4.2.4",
    "createdLocale": "en",
    "nextUniqueNameIndex": 94,
 
    ...
 
  }
}

From the response:

  • The name value is the report name.
  • The location value is the folder location in the SAS Content.
  • The content value is the BIRD representation of the report.

The generated file is not a result of the direct extraction of the report in the SAS environment. It is a combination of multiple elements to build a file containing the required information to import the report in the target environment.

Version the report

The next step in the process is to commit the file within the Git repository. You can use the git commit command followed by a git push to upload the content to a remote repository. Here are some examples:

# Saving the report in the local repository (stage the file, then commit)
git add CarsReport.json
git commit -m "Save CarsReport"
 
# Pushing the report to a remote repository after a local commit
git push https://gitlab.sas.com/myRepository.git

Promote the report

As soon as you have saved a version of the report, you can import the report in the target environment (production). This is where the postReport comes into play. Here is a sample command line:

python3 postReport.py -a myAdmin -p myAdminPW -sn http://va85.gel.sas.com -an app -as appsecret -i /tmp/CICD/CarsReport.json

The parameters are:

  • a - the user that is used to connect to the SAS Viya environment.
  • p - the password of the user.
  • sn - the URL of the SAS Viya environment.
  • an - the name of the application that was defined to enable REST APIs.
  • as - the secret to access the REST APIs.
  • i - the input JSON file, which contains the output of getReport.py.

The execution of the code returns nothing except in the case of an error.

Testing the report

You now have access to the report in the production environment. A good practice is to test/validate access to the report. While testing manually in an interface is possible, it's best to automate. Sure, you could validate one report, but what if you had twenty? Use the testReport script to verify the report. Below are the command line arguments to execute the code:

python3 testReport.py -a myAdmin -p myAdminPW -sn http://va85.gel.sas.com -an app -as appsecret -i /tmp/CICD/CarsReport.json

The parameters are:

  • a - the user that is used to connect to the SAS Viya environment.
  • p - the password of the user.
  • sn - the URL of the SAS Viya environment.
  • an - the name of the application that was defined to enable REST APIs.
  • as - the secret to access the REST APIs.
  • i - the input JSON file, which contains the output of getReport.py.

The testReport code connects to SAS Visual Analytics using REST APIs and generates an image of the first section of the report. The generated SVG image itself is of little interest; the most useful part of the test is the duration of the execution, which gives an indication of the report's performance.

Throughout the CI/CD process, validation is important, and it is also useful to benchmark the report. This is why testReport populates a .perf file for the report. The file has the following structure:

{
  "name": "CarsReport",
  "location": "/Users/sbxxab/My Folder",
  "performance": [
    {
      "testDate": "2020-03-26T08:55:37.296Z",
      "duration": 7.571
    },
    {
      "testDate": "2020-03-26T08:55:56.449Z",
      "duration": 8.288
    }
  ]
}

From the response:

  • The name value is the report name.
  • The location value is the folder location in the SAS Content.
  • The performance array contains the date time stamp of the test and the time needed to generate the image.

The file is updated with each execution of the testReport code.
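
As a rough illustration of how such a .perf file can be maintained (this is a sketch, not the actual testReport.py code), the following Python appends a new timing record that follows the structure shown above:

# Sketch (not the actual testReport.py code): append a timing record to the
# report's .perf file, creating the file if it does not exist yet.
import json
import os
from datetime import datetime, timezone

def record_performance(perf_path, name, location, duration):
    if os.path.exists(perf_path):
        with open(perf_path) as f:
            perf = json.load(f)
    else:
        perf = {"name": name, "location": location, "performance": []}
    timestamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    perf["performance"].append({"testDate": timestamp, "duration": round(duration, 3)})
    with open(perf_path, "w") as f:
        json.dump(perf, f, indent=2)

# Example usage with placeholder values:
# record_performance("/tmp/CICD/CarsReport.perf", "CarsReport", "/Users/sbxxab/My Folder", 7.571)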

Conclusion

CI/CD is important to SAS. Many SAS users need solutions to automate their deployment processes for code, reports, models, etc. This is not something to fear because SAS Viya has the tools required to integrate into CI/CD pipelines. As you have seen in this post, we write code to ease the integration. Even if you don’t have CI/CD tools like Jenkins to orchestrate the deployment, you can execute the different Python files to promote content from one environment to another and test the deployed content.

If you want to get more information about ModelOps, I recommend having a look at this series.

Continuous Integration/Continuous Delivery – Using Python and REST APIs for SAS Visual Analytics reports was published on SAS Users.

April 27, 2020
 

Many of us are currently working from home and getting adjusted to this new way of working. If you’re an employee working on the shop floor of a manufacturing facility, however, working from home is not an option. Among the many hard decisions manufacturing leaders have had to make during [...]

Increase employee safety with production planning optimization  was published on SAS Voices by Chris Hartmann