10月 172018
 

In a recent article about nonlinear least squares, I wrote, "you can often fit one model and use the ESTIMATE statement to estimate the parameters in a different parameterization." This article expands on that statement. It shows how to fit a model for one set of parameters and use the ESTIMATE statement to obtain estimates for a different set of parameters.

A canonical example of a probability distribution that has multiple parameterizations is the gamma distribution. You can parameterize the gamma distribution in terms of a scale parameter or in terms of a rate parameter, where the rate parameter is the reciprocal of the scale parameter. The density functions for the gamma distribution with scale parameter β or rate parameter c = 1 / β are as follows:
Shape α, scale β: f(x; α, β) = 1 / (βα Γ(α)) xα-1 exp(-x/β)
Shape α, rate c: f(x; α, c) = cα / Γ(α) xα-1 exp(-c x)

You can use PROC NLMIXED to fit either set of parameters and use the ESTIMATE statement to obtain estimates for the other set. In general, you can do this whenever you can explicitly write down an analytical formula that expresses one set of parameters in terms of another set.

Estimating the scale or rate parameter in a gamma distribution

Let's implement this idea on some simulated data. The following SAS DATA step simulates 100 observations from a gamma distribution with shape parameter α = 2.5 and scale parameter β = 1 / 10. A call to PROC UNIVARIATE estimates the parameters from the data and overlays a gamma density on the histogram of the data:

/* The gamma model has two different parameterizations: a rate parameter and a scale parameter.
   Generate gamma-distributed data and fit a model for each parameterization. Use the 
   ESTIMATE stmt to estimate the parameter in the model that was not fit. 
*/
%let N = 100;                                    /* N = sample size */
data Gamma;
aParm = 2.5; scaleParm = 1/10;  rateParm = 1/scaleParm;
do i = 1 to &N;
   x = rand("gamma", aParm, scaleParm); 
   output;
end;
run;
 
proc univariate data=Gamma;
   var x;
   histogram x / gamma;
run;
Distribution of gamma(2.5, 0.1) distributed data, overlaid with density estimate

The parameter estimates are (2.99, 0.09), which are close to the parameter values that were used to generate the data. Notice that PROC UNIVARIATE does not provide standard errors or confidence intervals for these point estimates. However, you can get those statistics from PROC NLMIXED. PROC NLMIXED also supports the ESTIMATE statement, which can estimate the rate parameter:

/*Use PROC NLMIXED to fit the shape and scale parameters in a gamma model */
title "Fit Gamma Model, SCALE Parameter";
proc nlmixed data=Gamma;
   parms a 1 scale 1;             * initial value for parameter;
   bounds 0 < a scale;
   model x ~ gamma(a, scale);
   rate = 1/scale;
   estimate 'rate' rate;
   ods output ParameterEstimates=PE;
   ods select ParameterEstimates ConvergenceStatus AdditionalEstimates;
run;

The parameter estimates are the same as from PROC UNIVARIATE. However, PROC NLMIXED also provides standard errors and confidence intervals for the parameters. More importantly, the ESTIMATE statement enables you to obtain an estimate of the rate parameter without needing to refit the model. The next section explains how the rate parameter is estimated.

New estimates and changing variables

You might wonder how the ESTIMATE statement computes the estimate for the rate parameter from the estimate of the scale parameter. The PROC NLMIXED documentation states that the procedure "computes approximate standard errors for the estimates by using the delta method (Billingsley 1986)." I have not studied the "delta method," but it seems to a way to approximate a nonlinear transformation by using the first two terms of its Taylor series, otherwise known as a linearization.

In the ESTIMATE statement, you supply a transformation, which is c = 1 / β for the gamma distribution. The derivative of that transformation is -1 / β2. The estimate for β is 0.09. Plugging that estimate into the transformations give 1/0.09 as the point estimate for c and 1/0.092 as a linear multiplier for the size of the standard error. Geometrically, the linearized transformation scales the standard error for the scale parameter into the standard error for the rate parameter. The following SAS/IML statements read in the parameter estimates for β. It then uses the transformation to produce a point estimate for the rate parameter, c, and uses the linearization of the transformation to estimate standard errors and confidence intervals for c.

proc iml;
start Xform(p);
   return( 1 / p );          /* estimate rate from scale parameter */
finish;
start JacXform(p);
   return abs( -1 / p##2 );  /* Jacobian of transformation */
finish;
 
use PE where (Parameter='scale');
read all var {Estimate StandardError Lower Upper};
close;
 
xformEst = Xform(Estimate);
xformStdErr = StandardError * JacXform(Estimate);
CI = Lower || Upper;
xformCI = CI * JacXform(Estimate);
AdditionalParamEst = xformEst || xformStdErr || xformCI;
print AdditionalParamEst[c={"Est" "StdErr" "LowerCL" "UpperCL"} F=D8.4];
quit;

The estimates and other statistics are identical to those produced by the ESTIMATE statement in PROC NLMIXED. For a general discussion of changing variables in probability and statistics, see these Penn State course notes. For a discussion about two different parameterizations of the lognormal distribution, see the blog post "Geometry, sensitivity, and parameters of the lognormal distribution."

In case you are wondering, you obtain exactly the same parameter estimates and standard errors if you fit the rate parameter directly. That is, the following call to PROC NLMIXED produces the same parameter estimates for c.

/*Use PROC NLMIXED to fit gamma model using rate parameter */
title "Fit Gamma Model, RATE Parameter";
proc nlmixed data=Gamma;
   parms a 1 rate 1;             * initial value for parameter;
   bounds 0 < a rate;
   model x ~ gamma(a, 1/rate);
   ods select ParameterEstimates;
run;

For this example, fitting the scale parameter is neither more nor less difficult than fitting the rate parameter. If I were faced with a distribution that has two different parameterizations, one simple and one more complicated, I would try to fit the simple parameters to data and use this technique to estimate the more complicated parameters.

The post Parameter estimates for different parameterizations appeared first on The DO Loop.

10月 162018
 

What if you could automatically detect supply chain anomalies as they happen, or even predict them in advance? You'd be able to take timely corrective action and help maximize revenue, margins, customer satisfaction and shareholder value. There's no question: Supply chain planning and execution is complex. From design and sourcing, to [...]

The supply chain of things: IoT in supply chain was published on SAS Voices by Scott Nalick

10月 152018
 


Old and new SAS users alike learned the tricks of the data trade from our Little SAS Book! We hope these fun tips from our exercise and project book teach you even more about how to master the data analytics game!

From Rebecca Ottesen:

Tip #1: Grouping Quantitative Variables
My favorite tip to share with students and SAS users is how to use PROC FORMAT to group quantitative variables into categories. A format can be created with a VALUE statement that specifies the ranges relevant to the category groupings. Then, this format can be applied with a FORMAT statement during an analysis to group the variable accordingly (don't forget the CLASS statement when applicable). You can also create categorical variables in the DATA step by applying the format in an assignment statement with a PUT function.

From Lora Delwiche:

Tip #2: Commenting Blocks of Code
This tip I learned from fellow SAS Press author Alan Wilson at SAS Global Forum 2008 in San Antonio. It might be a bit overly dramatic to say that this tip changed my life, but that’s not far from the truth! So, I am paying this tip forward. Thank you, Alan!

To comment out a whole block of code, simply highlight the lines of code, hold down the control key, and press the forward slash ( /). SAS will take those lines of code and turn them into comments by adding a /* to the beginning of each line and an */ at the end of each line.

To convert the commented lines back to code, highlight the lines again, hold down the control and shift keys, and press the forward slash ( /). This works in both the SAS Windowing environment (Display Manager) and SAS Enterprise Guide.

If you are using SAS Studio as your programming interface, you comment the same way, but to uncomment, just hold down the control key and then press the forward slash.

From Susan J. Slaughter:
Tip #3: Susan's Macro Mottos
There is no question that writing and debugging SAS macros can be a challenge. So I have two "macro mottos" that I use to help keep me on track.

“Remember, you are writing a program that writes a program.”

This is the most important concept to keep in mind whenever you work with SAS macros. If you feel the least bit confused by a macro, repeating this motto can help you to see what is going on. I speak from personal experience here. This is my macro mantra.

“To avoid mangling your macros, always write them one piece at a time.”

This means, write your program in standard SAS code first. When that is working and bug-free, then add your %MACRO and %MEND statements. When they are working, then add your parameters, if any, one at a time. If you make sure that each macro feature you add is working before you add another one, then debugging will be vastly simplified.

And, this is the best time ever to learn SAS! When I first encountered SAS, there were only two ways that I could get help. I could either ask another graduate student who might or might not know the answer, or I could go to the computer center and borrow the SAS manual. (There was only one.) Today it's totally different.

I am continually AMAZED by the resources that are available now—many for FREE. Here are four resources that every new SAS user should know about:

1. SAS Studio
This is a wonderful new interface for SAS that runs in a browser and has both programming and point-and-click features. SAS Studio is free for students, professors, and independent learners. You can download the SAS University Edition to run SAS Studio on your own computer, or use SAS OnDemand for Academics via the Internet.

2. Online classes
Two of the most popular self-paced e-learning classes are available for free: SAS Programming 1: Essentials, and Statistics 1. These are real classes which in the past people paid hundreds of dollars to take.

3. Videos
You can access hundreds of SAS training videos, tutorials, and demos at support.sas.com/training. Topics range from basic (What is SAS?) to advanced (SAS 9.4 Metadata Clustering).

4. Community of SAS users
If you encounter a problem, it is likely that someone else faced a similar situation and figured out how to solve it. On communities.sas.com you can post questions and get answers from SAS users and developers. On the site, www.lexjansen.com, you can find virtually every paper ever presented at a SAS users group conference.

If you want even more tips and tricks, check out our Exercises and Projects for The Little SAS Book, Fifth Edition! Let us know if enjoyed these tips in the comment boxes below.

New to SAS? Ready to learn more? Check out these tips and tricks from the authors of Exercises and Projects for The Little SAS Book, 5th Edition was published on SAS Users.

10月 152018
 

There are several ways to use SAS to get the unique values for a data variable. In Base SAS, you can use the TABLES statement in PROC FREQ to generate a table of unique values (and the counts). You can also use the DISTINCT function in PROC SQL to get the same information. In the SAS/IML language, you can use the UNIQUE function to find the unique values of a vector. In all these cases, the values are returned in sorted (alphanumeric) order. For example, here are three ways to obtain the unique values of the TYPE variable in the Sashelp.Cars data set. Only the result from PROC IML is shown:

/* Three ways to get unique values of a variable in sorted order */
proc freq data=sashelp.cars;
tables Type / missing;  /* MISSING option treats mising values as a valid category */
run;
 
proc sql;
   select distinct(Type) as uType
   from sashelp.cars;
quit;
 
proc iml;
use sashelp.cars;
   read all var "Type";
close;
 
uType = unique(Type);
print uType;

Unique values in data order

The FREQ procedure has a nice option that I use FREQently: you can use the ORDER=DATA option to obtain the unique categories in the order in which they appear in a data set. Over the years, I've used the ORDER=DATA option for many purposes, including a recent post that shows how to create a "Top 10" chart. Another useful option is ORDER=FREQ, which orders the categories in descending order by frequency.

Recently a colleague asked how to obtain the unique values of a SAS/IML vector in the order in which they appear in the vector. I thought it might also be useful to support an option to return the unique values in (descending) order of frequency, so I wrote the following function. The first argument to the UniqueOrder function is the data vector. The second (optional) argument is a string that determines the order of the unique values. If the second argument has the value "DATA", then the unique values are returned in data order. If the second argument has the value "FREQ", then the unique values are returned in descending order by frequency. The function is shown below. I test the function on a character vector (TYPE) and a numeric vector (CYLINDERS) that contains some missing values.

proc iml;
/* Return the unique values in a row vector. The second parameter
   determines the sort order of the result.
   "INTERNAL": Order by alphanumeric values (default)
   "DATA"    : Order in which elements appear in vector
   "FREQ"    : Order by frequency
*/
start UniqueOrder(x, order="INTERNAL");
   if upcase(order)="DATA" then do;
      u = unique(x);              /* unique values (sorted) */
      idx = j(ncol(u),1,0);       /* vector to store indices in data order */
      do i = 1 to ncol(u);
         idx[i] = loc(x=u[i])[1]; /* 1st location of i_th unique value */
      end;
      call sort(idx);             /* sort the indices ==> data order */
      return ( T(x[idx]) );       /* put the values in data order */
   end;
   else if upcase(order)="FREQ" then do;
      call tabulate(u, freq, x, "missing"); /* compute freqs; missing is valid level */
      call sortndx(idx, freq`) descend=1;   /* order descending by freq */
      return ( T(u[idx]) );
   end;
   else return unique(x);
finish;
 
/* test the function by calling it on a character and numeric vectors */
use sashelp.cars;
   read all var "Type";
   read all var "Cylinders";
close;
 
uData = UniqueOrder(Type, "data"); /* unique values in the order they appear in data */
uFreq = UniqueOrder(Type, "freq"); /* unique values in descending frequency order */
print uData, uFreq;
 
uData = UniqueOrder(Cylinders, "data");
uFreq = UniqueOrder(Cylinders, "freq");
print uData, uFreq;

The results of the function are similar to the results of PROC FREQ when you use the ORDER= option. There is a small difference when the data contain missing values. PROC FREQ always lists the missing value first, regardless of where the missing values appear in the data or how many missing values there are. In contrast, the SAS/IML function handles missing and nonmissing values equivalently.

In summary, whether you are implementing an analysis in Base SAS or SAS/IML, this article shows how to obtain the unique values of data in sorted order, in order of frequency, or in the order that the values appear in the data.

The post Get the unique values of a variable in data order appeared first on The DO Loop.

10月 122018
 

Note: Today’s utility industry is in upheaval. All of the assumptions the business has run on have been turned on their heads. This post is the first in a three-part series looking at how analytics are helping utilities navigate this challenging landscape and find new opportunities for improvements in operations, [...]

The digital utility era is here -- and now it’s time for a platform was published on SAS Voices by Mike F. Smith

10月 112018
 

Every brand offers a digital experience for a reason. But to achieve on your mission, raising awareness and attracting visitors through online media is critical. No visitors? Game over. Let’s dive into a business case using website visitor data to sas.com. Suppose that a manager asks: What did our web [...]

SAS Customer Intelligence 360: Visual forecasting, traffic acquisition, and digital media was published on Customer Intelligence Blog.

10月 112018
 

Smart retailers know that omnichannel customer experience isn't just about marketing anymore.  It’s about bridging all your digital and physical channels to recognize customers wherever they are, collecting data and understanding the retail customer’s purchasing journey. By taking customer data, product data, and supply chain data - and applying predictive and prescriptive [...]

Retailers: is your customer experience strategy working? was published on SAS Voices by Greg Heidrick