11月 032021
 

A SAS programmer asked whether it is possible to add reference lines to the categorical axis of a bar chart. The answer is yes. You can use the VBAR statement, but I prefer to use the VBARBASIC (or VBARPARM) statement, which enables you to overlay a wide variety of graphs on a bar chart. I have previously written about using the VBARBASIC statement to overlay graphs on bar charts. The VBARBASIC chart is compatible with more graphs than the VBAR chart. See the documentation for a complete discussion of "compatible" plot types.

This article shows two ways to overlay a reference line on the categorical axis of a bar chart. But the SAS programmer wanted more. He wanted to create a bar for each day of the year. That is a lot of bars! For bar charts that have many bars, I recommend using the NEEDLE statement to create a needle plot. The second part of this article demonstrates a needle plot and overlays reference lines for certain holidays.

For simplicity, this article discusses only vertical bar charts, but all programs can be adapted to display horizontal bar charts.

Reference lines and bar charts that use the VBAR statement

First, to be clear, you can easily add horizontal reference lines to a vertical bar chart. This is straightforward. The programmer wanted to add vertical reference lines to the categorical axis, as shown in the graph to the right. In this graph, reference lines are added behind the bars for Age=12 and Age=14. I made the bars semi-transparent so that the full reference lines are visible.

As the SAS programmer discovered, the following attempt to add reference lines does not display any reference lines:

title "Bar Chart with Reference Line on Categorical Axis";
proc sgplot data=Sashelp.Class;
  refline 12 14 / axis=x lineattrs=(color=red); /* DOES NOT WORK */
  vbar Age / response=Weight transparency=0.2;
run;

Why don't the reference lines appear? As I have previously written, you must specify the formatted values for a categorical axis. This is mentioned in the documentation for the REFLINE statement, which states that "unformatted numeric values do not map to a formatted discrete axis. For example, if reference lines are drawn at points on a discrete X axis, the REFLINE values must be the formatted value that appears on the X axis." In other words, you must change the REFLINE values to be "the formatted values," which are '12' and '14'. The following call to PROC SGPLOT displays the vertical reference lines:

proc sgplot data=Sashelp.Class;
  refline '12' '14' / axis=x lineattrs=(color=red); /* YES! THIS WORKS! */
  vbar Age / response=Weight transparency=0.2;
run;

The reference lines are shown in the graph at the beginning of this section.

Reference lines and the VBARBASIC statement

I prefer to use the VBARBASIC statement for most bar charts. If you use the VBARBASIC statement, you can specify the raw reference values. To be honest, I am not sure why it works, but, in general, the VBARBASIC statement is better when you need to overlay a bar chart and other graphical elements. If you use the VBARBASIC statement, the natural syntax works as expected:

proc sgplot data=All;
  refline 12 14 / axis=x lineattrs=(color=red);   /* THIS WORKS, TOO! */
  vbarbasic Age / response=Weight transparency=0.2;
run;

The graph is the same as shown in the previous section.

Reference lines for holidays on a graph of sales by date

This section discusses an example that has hundreds of bars. Suppose you want to display a bar chart for sales by date for an entire year. For data like these, I have two recommendations:

  1. Do not use a vertical bar chart. Even if each bar requires only three pixels, the chart will be more than 3*365 ≈ 1,100 pixels wide. On a monitor that displays 72 pixels per inch, this graph would be about 40 cm (15.3 inches) wide. A better choice is to use a needle plot, which is essentially a bar chart where each bar is represented as a vertical line.
  2. The horizontal axis cannot be discrete. If it is, you will get 365 dates printed along the axis. Instead, you want to use the XAXIS TYPE=TIME option to display the bars along an axis where tick marks are placed according to months, not days. (If the categories are not dates but are "days since the beginning," you can use the XAXIS TYPE=LINEAR option instead.)

Recall that the SAS programmer wanted to display holidays on the graph of sales for each day. Rather than specify the holidays on the REFLINE statement (for example, '01JAN2003'd '25DEC2003'd), it is more convenient to put the reference line values into a SAS data set and specify the name of the You can use the HOLIDAY function in SAS to get the date associated with major government holidays.

The following SAS DATA step extracts a year's worth of data for the sale of potato chips (in 2003) from the Sashelp.Snacks data set. These data are concatenated with a separate data set that contains the holidays that you want to display by using reference lines. A needle plot shows the daily sales and the reference lines.

data Snacks;       /* sales of potato chips for each date in 2003 */
set Sashelp.Snacks;
where '01JAN2003'd <= Date <= '31DEC2003'd AND Product="Classic potato chips";
run;
 
data Reflines;     /* holidays to overlay as reference lines */
format RefDate DATE9.;
RefDate = holiday("Christmas", 2003);       output;
RefDate = holiday("Halloween", 2003);       output;
RefDate = holiday("Memorial", 2003);        output;
RefDate = holiday("NewYear", 2003);         output;
RefDate = holiday("Thanksgiving", 2003);    output;
RefDate = holiday("USIndependence", 2003);  output;
RefDate = holiday("Valentines", 2003);      output;
run;
 
data All;          /* concatentate the data and reference lines */
set Snacks RefLines;
run;
 
title "Sales and US Holidays";
title2 "Needle Plot";
proc sgplot data=All;
  refline RefDate / axis=x lineattrs=(color=red);
  needle x=Date y=QtySold;
run;

Notice that you do not have to use the XAXIS TYPE=TIME option with the NEEDLE statement. The SGPLOT procedure uses TYPE=TIME option by default when the X variable has a time, date, or datetime format. If you decide to use the VBARBASIC statement, you should include the XAXIS TYPE=TIME statement.

Summary

In summary, this article shows how to add vertical reference lines to a vertical bar chart. You can use the VBAR statement and specify the formatted reference values, but I prefer to use the VBARBASIC statement whenever I want to overlay a bar chart and other graphical elements. You can also use a needle plot, which is especially helpful when you need to display 100 or more bars.

The post Add reference lines to a bar chart in SAS appeared first on The DO Loop.

11月 022021
 

Industries including sports and entertainment, travel, manufacturing, education and government benefit from analytical insights In the United States and other parts of the world, there are signs. Record automobile traffic. Surging demand for workers. And a continued push to vaccinate. The pandemic and its effects are still very much with [...]

How analytics help move organizations to a “new normal” was published on SAS Voices by Mark Demers

11月 022021
 

There are times when it is useful to simulate data. One of the reasons I use simulated data sets is to demonstrate statistical techniques such as multiple or logistic regression. By using SAS random functions and some DATA step logic, you can create variables that follow certain distributions or are correlated with other variables. You might decide to simulate a drug study where the drug group has a higher or lower mean than a placebo group.

Because most programs that create simulated data use random numbers, let's start off by discussing the RAND function. This function can generate random numbers that follow distributions such as uniform, normal, Bernoulli, as well as dozens of other distributions. Veteran SAS programmers might be more familiar with some of the older random number functions such as RANUNI and RANNOR. RANUNI was used to generate uniform random numbers (numbers between 0 and 1) and RANNOR generated random numbers from a normal distribution. The RAND function has replaced all of the older functions and has a number of advantages over the older functions.

The first argument of the RAND function is the name of the distribution that you want to use, such as Uniform, Normal, or Bernoulli. For some of the distributions, such as Normal, you can supply parameters such as the mean and standard deviation. Here are some examples:

Function Description
rand('Uniform') Generates uniform random numbers (between 0 and 1)
rand('Normal',100,20) Generates values from a normal distribution with a mean of 100 and a standard deviation of 20
rand('Bernoulli',.4) Generates a 0 or 1 with a probability of a 1 equal to .4
rand('Binomial',.2,5) Generates random numbers that represent the number of successes in a sample of size 5 with the probability of success equal to .2

 

Important Note: if you want a reproducible series of random numbers using the RAND function, you must seed it by a call to STREAMINIT (with a positive integer argument) prior to its use. For example:

       call streaminit(132435);

To clarify the note above, here are two programs that use the RAND function—one with, and one without the call to Streaminit.

 data Without;
   do i = 1 to 5;
      x = rand('Uniform');
      output;
   end;   drop i;
run;

Here is the output from running this program twice.

Notice that the values of x are different in each run. Now let's run the same program with CALL STREAMINIT included. Here is the program.

data With;
   call streaminit(13579);
   do i = 1 to 5;
      x = rand('Uniform');
      output;
   end;
   drop i;
run;

And here are the output listings from running this program twice.

Adding CALL STREAMINIT creates the same sequence of random numbers each time the program is run. This is useful if you are generating groups for a drug study and want to be able to re-create the random sequences when it comes time to break the blind and analyze the results. Another reason I sometimes want to generate a repeatable sequence of random numbers is for problem sets included in many of my books—I want the reader to get exactly the same results as I did.

Let's switch topics and see how to write a program where you want to simulate flipping a coin. The program below uses a popular method, but not it is not as elegant as the next program I'm going to show you.

*Old fashioned way to generate "random" events;
data Toss;
   do n = 1 to 10;
      if rand('uniform') lt .5 then Result = 'Tails';
      else Result = 'Heads';
      output;
   end;
run;

In the long run, half of the uniform random numbers will be less than .5, and the proportion of heads and tails will be approximately .5. Here is a listing of data set Toss.

A more sophisticated approach takes advantage of the RAND function's ability to generate random number from multiple distributions. A Bernoulli distribution is similar to a coin toss where you can adjust the probability of a 1 or 0 by including a second parameter to the function. The Toss2 program, shown below, does just that.

*More sophisticated program;
proc format;
   value Heads_Tails 0="Heads" 1="Tails";
run;
 
data Toss2;
   do n = 1 to 10;
      Results = rand('Bernoulli',.5);
      format Results Heads_Tails.;
      output;
   end;
run;

The format Heads_Tails substitutes the labels "Heads" and "Tails" for values of 0 and 1, respectively. Here is a listing of data set Toss2.

The final discussion of this blog, concerns generating random values of two or more variables that are correlated. The example that follows generates x-y pairs that are correlated.

*Creating correlated x-y pairs;
data Corr;
   do i = 1 to 1000;
      x = rand('normal',100,10);
      y = .5*x + rand('Normal',50,10);
      output;
   end;
   drop i;
run;

By including a proportion of the x-value when creating the y-value, the x- and y-values will be correlated. Shown below is the output from PROC CORR, showing that x and y are correlated (r = .45586).

I used a SAS Studio task to create the scatterplot shown next.

You can increase or decrease the correlation by increasing the proportion of x used to create y. For example, you could use y = .8*x + rand('Normal',20,10); to create x-y pairs with a higher correlation.

You can see more examples of the RAND function in my book, SAS Functions by Example, Second Edition, available as an e-book from RedShelf or in print form from Amazon.

To learn more about how to use SAS Studio as part of OnDemand for Academics, to write SAS programs, or to use SAS Studio tasks, please take a look at my new book: Getting Started with SAS Programing: Using SAS Studio in the Cloud (available in e-book from RedShelf or in a paper version from Amazon).

I hope you enjoyed reading this blog and, as usual, I invite comments and/or questions.

 

 

 

 

 

 

Creating Simulated Data Sets was published on SAS Users.

11月 012021
 

A previous article discusses how to use SAS regression procedures to fit a two-parameter Weibull distribution in SAS. The article shows how to convert the regression output into the more familiar scale and shape parameters for the Weibull probability distribution, which are fit by using PROC UNIVARIATE.

Although PROC UNIVARIATE can fit many univariate distributions, it cannot fit a mixture of distributions. For that task, you need to use PROC FMM, which fits finite mixture models. This article discusses how to use PROC FMM to fit a mixture of two Weibull distributions and how to interpret the results. The same technique can be used to fit other mixtures of distributions. If you are going to use the parameter estimates in SAS functions such as the PDF, CDF, and RAND functions, you cannot use the regression parameters directly. You must transform them into the distribution parameters.

Simulate a mixture of Weibull data

You can use the RAND function in the SAS DATA step to simulate a mixture distribution that has two components, each drawn from a Weibull distribution. The RAND function samples from a two-parameter Weibull distribution Weib(α, β) whose density is given by
\(f(x; \alpha, \beta) = \frac{\beta}{\alpha^{\beta}} (x)^{\beta -1} \exp \left(-\left(\frac{x}{\alpha}\right)^{\beta }\right)\)
where α is a shape parameter and β is a scale parameter. This parameterization is used by most Base SAS functions and procedures, as well as many regression procedures in SAS. The following SAS DATA step simulates data from two Weibull distributions. The first component is sampled from Weib(α=1.5, β=0.8) and the second component is sampled from Weib(α=4, β=2). For the mixture distribution, the probability of drawing from the first distribution is 0.667 and the probability of drawing from the second distribution is 0.333.

After generating the data, you can call PROC UNIVARIATE to estimate the parameters for each component. Notice that this fits each component separately. If the parameter estimates are close to the parameter values, that is evidence that the simulation generated the data correctly.

/* sample from a mixture of two-parameter Weibull distributions */
%let N = 3000;
data Have(drop=i);
call streaminit(12345);
array prob [2] _temporary_ (0.667 0.333);
do i = 1 to &N;
   component = rand("Table", of prob[*]);
   if component=1 then
      d = rand("weibull", 1.5, 0.8);  /* C=Shape=1.5; Sigma=Scale=0.8 */
   else
      d = rand("weibull", 4,   2);    /* C=Shape=4; Sigma=Scale=2 */
   output;
end;
run;
 
proc univariate data=Have;
   class component;
   var d;
   histogram d / weibull NOCURVELEGEND;  /* fit (Sigma, C) for each component */
   ods select Histogram ParameterEstimates Moments;
   ods output ParameterEstimates = UniPE;
   inset weibull(shape scale) / pos=NE; 
run;
 
title "Weibull Estimates for Each Component";
proc print data=UniPE noobs;
   where Parameter in ('Scale', 'Shape');
   var Component Parameter Symbol Estimate;
run;

The graph shows a histogram for data in each component. PROC UNIVARIATE overlays a Weibull density on each histogram, based on the parameter estimates. The estimates for both components are close to the parameter values. The first component contains 1,970 observations, which is 65.7% of the total sample, so the estimated mixing probabilities are close to the mixing parameters. I used ODS OUTPUT and PROC PRINT to display one table that contains the parameter estimates from the two groups. PROC UNIVARIATE calls the shape parameter c and the scale parameter σ.

Fitting a finite mixture distribution

The PROC UNIVARIATE call uses the Component variable to identify the Weibull distribution to which each observation belongs. If you do not have the Component variable, is it still possible to estimate a two-component Weibull model?

The answer is yes. The FMM procedure fits statistical models for which the distribution of the response is a finite mixture of distributions. In general, the component distributions can be from different families, but this example is a homogeneous mixture, with both components from the Weibull family. When fitting a mixture model, we assume that we do not know which observations belong to which component. We must estimate the mixing probabilities and the parameters for the components. Typically, you need a lot of data and well-separated components for this effort to be successful.

The following call to PROC FMM fits a two-component Weibull model to the simulated data. As shown in a previous article, the estimates from PROC FMM are for the intercept and scale of the error term for a Weibull regression model. These estimates are different from the shape and scale parameters in the Weibull distribution. However, you can transform the regression estimates into the shape and scale parameters, as follows:

title "Weibull Estimates for Mixture";
proc fmm data=Have plots=density;
   model d = / dist=weibull link=log k=2;
   ods select ParameterEstimates MixingProbs DensityPlot;
   ods output ParameterEstimates=PE0;
run;
 
/* Add the estimates of Weibull scale and shape to the table of regression estimates.
   See https://blogs.sas.com/content/iml/2021/10/27/weibull-regression-model-sas.html */
data FMMPE;
set PE0(rename=(ILink=WeibScale));
if Parameter="Scale" then WeibShape = 1/Estimate;
else WeibShape = ._;  /* ._ is one of the 28 missing values in SAS */
run;
 
proc print data=FMMPE; 
   var Component Parameter Estimate WeibShape WeibScale;
run;

The program renames the ILink column to WeibScale. It also adds a new column (WeibShape) to the ParameterEstimates table. These two columns display the Weibull shape and scale parameter estimates for each component. Despite not knowing which observation came from which component, the procedure provides good estimates for the Weibull parameters. PROC FMM estimates the first component as Weib(α=1.52, β=0.74) and the second component as Weib(α=3.53, β=1.88). It estimates the mixing parameters for the first component as 0.,6 and the parameter for the second component as 0.4.

The PLOTS=DENSITY option on the PROC FMM statement produces a plot of the data and overlays the component and mixture distributions. The plot is shown below and is discussed in the next section.

The graph of the component densities

The PLOTS=DENSITY option produces a graph of the data and overlays the component and mixture distributions. In the graph, the red curve shows the density of the first Weibull component (W1(d)), the green curve shows the density of the second Weibull component (W2(d)), and the blue curve shows the density of the mixture. Technically, only the blue curve is a "true" density that integrates to unity (or 100% on a percent scale). The components are scaled densities. The integral of a component equals the mixing probability, which for these data are 0.6 and 0.4, respectively. The mixture density equals the sum of the component densities.

Look closely at the legend in the plot, which identifies the component curves by the parameter estimates. Notice, that the estimates in the legend are the REGRESSION estimates, not the shape and scale estimates for the Weibull distribution. Do not be misled by the legend. If you plot the PDF
density = PDF("Weibull", d, 0.74, 0.66); /* WRONG! */
you will NOT get the density curve for the first component. Instead, you need to convert the regression estimates into the shapes and scale parameters for the Weibull distribution. The following DATA step uses the transformed parameter estimates and demonstrates how to graph the component and mixture densities:

/* plot the Weibull component densitites and the mixture density */
data WeibComponents;
retain d1 d2;
array WeibScale[2] _temporary_ (0.7351,  1.8820);  /* =exp(Intercept) */
array WeibShape[2] _temporary_ (1.52207, 3.52965); /* =1/Scale */
array MixParm[2]   _temporary_ (0.6, 0.4);
do d = 0.01, 0.05 to 3.2 by 0.05;
   d1 = MixParm[1]*pdf("Weibull", d, WeibShape[1], WeibScale[1]); 
   d2 = MixParm[2]*pdf("Weibull", d, WeibShape[2], WeibScale[2]); 
   Component = "Mixture        "; density = d1+d2; output;
   Component = "Weib(1.52,0.74)"; density = d1;    output; 
   Component = "Weib(3.53,1.88)"; density = d2;    output; 
end;
run;
 
title "Weibull Mixture Components";
proc sgplot data=WeibComponents;
   series x=d y=density / group=Component;
   keylegend / location=inside position=NE across=1 opaque;
   xaxis values=(0 to 3.2 by 0.2) grid offsetmin=0.05 offsetmax=0.05;
   yaxis grid;
run;

The density curves are the same, but the legend for this graph displays the shape and scale parameters for the Weibull distribution. If you want to reproduce the vertical scale (percent), you can multiply the densities by 100*h, where h=0.2 is the width of the histogram bins.

In general, be aware that the PLOTS=DENSITY option produces a graph in which the legend labels refer to the REGRESSION parameters. For example, if you use PROC FMM to fit a mixture of normal distributions, the parameter estimates in the legend are for the mean and the VARIANCE of the normal distributions. However, if you intend to use those estimates in other SAS functions (such as PDF, CDF, and RAND), you must take the square root of the variance to obtain the standard deviation.

Summary

This article uses PROC FMM to fit a mixture of two Weibull distributions. The article shows how to interpret the parameter estimates from the procedure by transforming them into the shape and scale parameters for the Weibull distribution. The article also emphasizes that if you use the PLOTS=DENSITY option produces a graph, the legend in the graph contains the regression parameters, which are not the same as the parameters that are used for the PDF, CDF, and RAND functions.

The post Fit a mixture of Weibull distributions in SAS appeared first on The DO Loop.

10月 302021
 

In a September 10 post on the SAS Users blog, we announced that SAS Analytics Pro is now available for on-site or containerized cloud-native deployment. For our thousands of SAS Analytics Pro customers, this provides an entry point into SAS Viya.

SAS Analytics Pro consists of three core elements of the SAS system: Base SAS®, SAS/GRAPH® and SAS/STAT®. The containerized deployment option adds the full selection of SAS/ACCESS engines making it even easier to work with data from virtually any source.

Even better, the containerized deployment option now adds new statistical capabilities that are not available in SAS/STAT on SAS9. Thanks to SAS Viya’s continuous delivery approach, we are able to provide this additional functionality so soon after the initial release.

Below are highlights of these additional capabilities (you can find more details by following the links):

Causal Inference Procedures

Bayesian Analysis Procedures

  • Model multinomial data with cumulative probit, cumulative logit, generalized link, or other link functions in PROC BGLIMM.
  • Specify fixed scale values in a generalized linear mixed-effects model, and use an improved CMPTMODEL statement in PROC MCMC and PROC NLMIXED to fit compartment models.

Survey Procedures

Additional Capabilities:

For those SAS customers already on SAS Viya, or those considering the move, SAS Analytics Pro provides one more example of the new powers you will enjoy!

Additional statistical capabilities in the containerized deployment of SAS Analytics Pro was published on SAS Users.

10月 282021
 

From articles I've read on the web, it is clear that data is gold in the twenty-first century. Loading, enriching, manipulating and analyzing data is something in which SAS excels. Based on questions from colleagues and customers, it is clear end-users are willing to display data handled by SAS outside of the user interfaces bundled with the SAS software.

I recently completed a series of articles on the SAS Community library where I shed some light on different techniques for feeding web applications with SAS data stored in SAS Viya environment.  The series includes a discussion of options for extracting data, building a React application, how to build web applications using SAS Viya, SAS Cloud Analytic Service (CAS), SAS Compute Server, and SAS Micro Analytic Service (MAS).

I demonstrate the functionality and discuss project details in the video Develop Web Application to Extract SAS Data, found on the SAS Users YouTube Channel.

I'm tying everything together in this post as a reference point. I'll provide a link to each article along with a brief description. The Community articles have all the detailed steps for developing the application. I'm excited bring you this information, so let's get started.

Part 1 - Develop web applications series: Options for extracting data

In this first article, I explain when to use SAS Micro Analytic Service, SAS Viya Jobs, SAS Cloud Analytic Service, and SAS Compute Server.

Part 2 - Develop web applications series: Creating the React based application

To demonstrate the different options, in the second article, I create a simple web application using React JavaScript library. The application also handles authentication against SAS Viya. The application is structured in such a way to avoid redundant code and each component has a well-defined role. From here, we can build the different pages to access CAS, MAS, Compute Server or SAS Viya Jobs.

The image below offers a view of the application which starts in Part 2 and continues throughout the series..

Part 3 - Develop web applications series: Build a web application using SAS Viya Jobs

In this article, I drive you through the steps to retrieve data from the SAS environment using SAS Viya Jobs. We build out the Jobs tab and on the page, display two dropdown boxes to select a library and table. The final piece is a submit button to retrieve the data to populate a table.

Part 4 - Develop web applications series: Build a web application using SAS Cloud Analytic Service

In article number 4, we go through the steps to build a page similar to the one in the previous article, but this time the data comes directly from the SAS Cloud Analytic Service (CAS). We reuse the application structure which was created in Part 2. We focus on the CAS tab. As for the SAS Viya Jobs, we display two dropdown boxes to select a library and table. We finish again with a submit button to retrieve the data to populate a table.

Part 5 - Develop web applications series: Build a web application using SAS Compute Server

In the next article, we go through the steps to build a page similar to the ones from previous articles, but this time the data comes directly from the SAS Compute Server. We reuse the application structure created in this Part 2 article. The remainder of the article focuses on the Compute tab. As for the CAS content, we display two dropdown boxes to select a library and table. Finishing off again with the submit button to retrieve the data to populate a table.

Part 6 - Develop web applications series: Build a web application using SAS Micro Analytic Service

For the final article, you discover how to build a page to access data from the SAS Micro Analytic Service. We reuse the same basic web application built in Part 2. However, this time it will require a bit more preparation work as the SAS Micro Analytic Service (MAS) is designed for model scoring.

Bonus Material - SAS Authentication for ReactJS based applications

In this addendum to the series, I outline the authorization code OAuth flow. This is the recommended means of authenticating to SAS Viya and I provide technical background and detailed code.

Conclusion

If you followed along with the different articles in this series, you should now have a fully functional web application for accessing different data source types from SAS Viya. This application is not for use as-is in production. You should, for example add functionality to handle token expiration. You can of course tweak the interface to get the look and feel you prefer.

See all of my SAS Communities articles here.

Creating a React web app using SAS Viya was published on SAS Users.

10月 272021
 

The convergence of digitalization, AI and machine learning that has been integrated into wearable devices has become a boon for many industries, particularly healthcare. When Fitbit launched its first wearable device in 2009, it clipped on a user’s clothing and utilized an internal motion detector to track the wearer’s movement, [...]

How wearable technology has improved access to healthcare was published on SAS Voices by Caslee Sims

10月 272021
 

It can be frustrating when the same probability distribution has two different parameterizations, but such is the life of a statistical programmer. I previously wrote an article about the gamma distribution, which has two common parameterizations: one that uses a scale parameter (β) and another that uses a rate parameter (c = 1/β).

The relationship between scale and rate parameters is straightforward, but sometimes the relationship between different parameterizations is more complicated. Recently, a SAS programmer was using a regression procedure to fit the parameters of a Weibull distribution. He was confused about how the output from a SAS regression procedure relates to a more familiar parameterization of the Weibull distribution, such as is fit by PROC UNIVARIATE. This article shows how to perform two-parameter Weibull regression in several SAS procedures, including PROC RELIABILITY, PROC LIFEREG, and PROC FMM. The parameter estimates from regression procedures are not the usual Weibull parameters, but you can transform them into the Weibull parameters.

This article fits a two-parameter Weibull model. In a two-parameter model, the threshold parameter is assumed to be 0. A zero threshold assumes that the data can be any positive value.

Fitting a Weibull distribution in PROC UNIVARIATE

PROC UNIVARIATE is the first tool to reach for if you want to fit a Weibull distribution in SAS. The most common parameterization of the Weibull density is
\(f(x; \alpha, \beta) = \frac{\beta}{\alpha^{\beta}} (x)^{\beta -1} \exp \left(-\left(\frac{x}{\alpha}\right)^{\beta }\right)\)
where α is a shape parameter and β is a scale parameter. This parameterization is used by most Base SAS functions and procedures, as well as many regression procedures in SAS. The following SAS DATA step simulates data from the Weibull(α=1.5, β=0.8) distribution and fits the parameters by using PROC UNIVARIATE:

/* sample from a Weibull distribution */
%let N = 100;
data Have(drop=i);
call streaminit(12345);
do i = 1 to &N;
   d = rand("Weibull", 1.5, 0.8);    /* Shape=1.5; Scale=0.8 */
   output;
end;
run;
 
proc univariate data=Have;
   var d;
   histogram d / weibull endpoints=(0 to 2.5 by 0.25);  /* fit Weib(Sigma, C) to the data */
   probplot / weibull2(C=1.383539 SCALE=0.684287) grid; /* OPTIONAL: P-P plot */
   ods select Histogram ParameterEstimates ProbPlot;
run;

The histogram of the simulated data is overlaid with a density from the fitted Weibull distribution. The parameter estimates are Shape=1.38 and Scale=0.68, which are close to the parameter values. PROC UNIVARIATE uses the symbols c and σ for the shape and scale parameters, respectively.

The probability-probability (P-P) plot for the Weibull distribution is shown. In the P-P plot, a reference line is added by using the option weibull2(C=1.383539 SCALE=0.684287). (In practice, you must run the procedure once to get those estimates, then a second time to plot the P-P plot.) The slope of the reference line is 1/Shape = 0.72 and the intercept of the reference line is log(Scale) = -0.38. Notice that the P-P plot is plotting the quantiles of log(d), not of d itself.

Weibull regression versus univariate fit

It might seem strange to use a regression procedure to fit a univariate distribution, but as I have explained before, there are many situations for which a regression procedure is a good choice for performing a univariate analysis. Several SAS regression parameters can fit Weibull models. In these models, it is usually assumed that the response variable is a time until some event happens (such as failure, death, or occurrence of a disease). The documentation for PROC LIFEREG provides an overview of fitting a model where the logarithm of the random errors follows a Weibull distribution. In this article, we do not use any covariates. We simply model the mean and scale of the response variable.

A problem with using a regression procedure is that a regression model provides estimates for intercepts, slopes, and scales. It is not always intuitive to see how those regression estimates relate to the more familiar parameters for the probability distribution. However, the P-P plot in the previous section shows how intercepts and slopes can be related to parameters of a distribution. The documentation for the LIFEREG procedure states that the Weibull scale parameter is exp(Intercept) and the Weibull shape parameter is the reciprocal of the regression scale parameter.

Notice how confusing this is! For the Weibull distribution, the regression model estimates a SCALE parameter for the error distribution. But the reciprocal of that scale estimate is the Weibull SHAPE parameter, NOT the Weibull scale parameter! (In this article, the response distribution and the error distribution are the same.)

Weibull regression in SAS

The LIFEREG procedure includes an option to produce a probability-probability (P-P) plot, which is similar to a Q-Q plot. The LIFEREG procedure also estimates not only the regression parameters but also provides estimates for the exp(Intercept) and 1/Scale quantities. The following statements use a Weibull regression model to fit the simulated data:

title "Weibull Estimates from LIFEREG Procedure";
proc lifereg data=Have;
   model d = / dist=Weibull;
   probplot;
   inset;
run;

The ParameterEstimates table shows the estimates for the Intercept (-0.38) and Scale (0.72) parameters in the Weibull regression model. We previously saw these numbers as the parameters of the reference line in the P-P plot from PROC UNIVARIATE. Here, they are the result of a maximum likelihood estimate for the regression model. To get from these values to the Weibull parameter estimates, you need to compute Weib_Scale = exp(Intercept) = 0.68 and Weib_Shape = 1/Scale = 1.38. PROC LIFEREG estimates these quantities for you and provides standard errors and confidence intervals.

The graphical output of the PROBPLOT statement is equivalent to the P-P plot in PROC UNIVARIATE, except that PROC LIFEREG reverses the axes and automatically adds the reference line and a confidence band.

Other regression procedures

Before ending this article, I want to mention two other regression procedures that perform similar computations: PROC RELIABILITY, which is in SAS/QC software, and PROC FMM in SAS/STAT software.

The following statements call PROC RELIABILITY to fit a regression model to the simulated data:

title "Weibull Estimates from RELIABILITY Procedure";
proc reliability data=Have;
   distribution Weibull;
   model d = ;
run;

The parameter estimates are similar to the estimates from PROC LIFEREG. The output also includes an estimate of the Weibull shape parameter, which is 1/EV_Scale. The output does not include an estimate for the Weibull scale parameter, which is exp(Intercept).

In a similar way, you can use PROC FMM to fit a Weibull model. PROC FMM Is typically used to fix a mixture distribution, but you can specify the K=1 option to fit a single response distribution, as follows:

title "Weibull Estimates from FMM Procedure";
proc fmm data=Have;
   model d = / dist=weibull link=log k=1;
   ods select ParameterEstimates;
run;

The ParameterEstimates table shows the estimates for the Intercept (-0.38) and Scale (0.72) parameters in the Weibull regression model. The Weibull scale parameter is shown in the column labeled "Inverse Linked Estimate." (The model uses a LOG link, so the inverse link is EXP.) There is no estimate for the Weibull shape parameter. which is the reciprocal of the Scale estimate.

Summary

The easiest way to fit a Weibull distribution to univariate data is to use the UNIVARIATE procedure in Base SAS. The Weibull shape and scale parameters are directly estimated by that procedure. However, you can also fit a Weibull model by using a SAS regression procedure. If you do this, the regression parameters are the Intercept and the scale of the error distribution. You can transform these estimates into estimates for the Weibull shape and scale parameters. This article shows the output (and how to interpret it) for several SAS procedures that can fit a Weibull regression model.

Why would you want to use a regression procedure instead of PROC UNIVARIATE? One reason is that the response variable (failure or survival) might depend on additional covariates. A regression model enables you to account for additional covariates and still understand the underlying distribution of the random errors. A second reason is that the FMM procedure can fit a mixture of distributions. To make sense of the results, you must be able to interpret the regression output in terms of the usual parameters for the probability distributions.

The post Interpret estimates for a Weibull regression model in SAS appeared first on The DO Loop.

10月 232021
 

Rijkswaterstaat (RWS) is the Netherlands main agency for design, construction, management and maintenance for waterways and infrastructure. Their mission is to promote safety, mobility and quality of life in the Netherlands. They are the masterminds behind some of the most prestigious water projects in the world. In a recent panel [...]

A conversation with Rijkswaterstaat: How SAS is helping keep the Netherlands safe was published on SAS Voices by Olivia Ojeda