October 22, 2018
 

This blog post was also written by Reece Clifford.

Who’s responsible for x, y, z sales territory? What is the largest number of people they engaged with in a month? What type of location leads to the best response from a meeting?

To get the complete answer to these sales team-related questions, you need to trust your data. You need to be able to cut and slice high-quality data to prepare for analytics to drive innovation in your company. With SAS Data Preparation alongside SAS Decision Manager, you can do all this. Its many features allow you to perform out-of-the-box column and row transformations to increase your data quality and build the foundations for data-driven innovation.

This blog will discuss how you can leverage SAS Decision Manager to enrich data when preparing it through SAS Data Preparation.

The use case

As posed above, we want to create a SAS Data Preparation plan to map a sales person to a postcode area. We use a SAS Decision Manager rule to find the sales person for a postcode area and map the person to the address. To trigger the rule, we are going to call it from SAS Data Preparation.

In SAS Decision Manager, we import a CSV file to create a Lookup Table that maps a sales person to a postcode area. Lookup Tables are tables of key-value pairs and provide the ability to access reference data from business rules.

Next, we create a rule to map a postcode and sales person. A rule specifies conditions to be evaluated and actions to be taken if those conditions are satisfied. Rules are grouped together into rule sets. Most rules correspond to the form:

if condition_expressions then action_expressions

For our rule, we have an incoming postcode plus a record ID. The postcode is assumed to be a UK postcode. We extract the first two characters of the postcode and look up the sales person in the Lookup Table that we just imported.

The rule outputs the sales person (representative) and the record ID. When we have tested and published the rule, it's ready to be used in a SAS Data Preparation Plan.
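Outside SAS, the logic of this rule can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual SAS Decision Manager rule; the area codes and names are invented examples.

```python
# Sketch of the rule's logic: extract the postcode area (first two
# characters of a UK postcode) and map it to a sales representative
# via a key-value lookup table.

def assign_rep(record_id, postcode, lookup):
    """Return (record_id, representative) for a UK postcode."""
    area = postcode.strip().upper()[:2]   # e.g. "CB2 1TN" -> "CB"
    rep = lookup.get(area)                # None if the area is unmapped
    return record_id, rep

# Illustrative lookup table; the areas and names are invented.
reps = {"CB": "Alice", "OX": "Bob"}

print(assign_rep(101, "cb2 1tn", reps))   # (101, 'Alice')
```

The Lookup Table in SAS Decision Manager plays the role of the `reps` dictionary here: reference data maintained separately from the rule logic.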

In SAS Data Preparation, we load a table with address data that we want to enrich with the appropriate sales person.

  1. We need to make sure the table column names and the rule input parameter names match. Therefore, we rename the field ADDRESS_ID to ID, as ID is the rule input name. The second rule input parameter is Postcode, which matches the table column name, so no action is needed.

  2. We can then call the previously-created rule in SAS Decision Manager to map a sales person to an area. This is done by adding some CASL code to the Code node in the SAS Data Preparation plan. The code is shown below with a brief explanation of the parameters.
    As the rule has two output parameters, we receive only two columns when executing the code step.

CASL Code

loadactionset "ds2";
action runModel submit / 
	modelTable={name="MONITORRULES", caslib="DCMRULES"}
	modelName="Mon_Person"
	table= {name= _dp_inputTable, caslib=_dp_inputCaslib}
	casout= {name= _dp_outputTable, caslib=_dp_outputCaslib};

Parameter settings for the CASL call:

  • modelTable name: name of the table that the rule set was published to.
  • modelTable caslib: name of the caslib that the rule set was published to.
  • modelName: name of the decision flow to execute.
  • table name: table name of the decision flow input data (set to _dp_inputTable).
  • table caslib: caslib of the input table (set to _dp_inputCaslib).
  • casout name: table name of the decision flow output data (set to _dp_outputTable).
  • casout caslib: caslib of the output table (set to _dp_outputCaslib).

 

Decision Manager Publishing Dialogue

 

  3. We then want to bring back the remaining columns from the input table. We do this by joining the table in the SAS Data Preparation plan back to the original table on the rule output field ID and the table's field ADDRESS_ID.

Conclusion

We have answered our initial question of which sales person is mapped to which region by enriching our data in a user-friendly, efficient process in SAS Data Preparation. We can now gain further insight from our data to answer more of the questions posed at the beginning of this blog. This can be done with additional rules in SAS Decision Manager, with further functions in the current SAS Data Preparation plan, or by using the output table in another plan. Ultimately, this will facilitate data-driven innovation via reporting or advanced analytics in your organisation.

Using SAS Decision Manager to enrich the data prep process was published on SAS Users.

October 22, 2018
 

A SAS programmer asked how to rearrange elements of a matrix. The rearrangement he wanted was rather complicated: certain blocks of data needed to move relative to other blocks, but the values within each block were to remain unchanged. It turned out that the mathematical operation he needed is called a block transpose. The BTRAN function in SAS/IML performs block-transpose operations, so the complicated rearrangement was easy to implement with a judicious call to the BTRAN function.

This article discusses the block-transpose operation and gives an example. It also shows how a block transpose can conveniently transform wide data into long data and vice versa.

The block-transpose operation

In general, a block matrix can contain blocks of various sizes. However, the BTRAN function requires that all blocks be the same size. Specifically, suppose A is a block matrix where each block is an n x m matrix. That means that A is an (r n) x (c m) matrix for some whole numbers r and c. The BTRAN function transposes the blocks to create a (c n) x (r m) block matrix.

The idea is more easily explained by a picture. The following image shows a matrix A that is composed of six blocks (of the same size). The block-transpose operation rearranges the blocks but leaves the elements within the blocks unchanged.

A block-transpose operation transposes the blocks but leaves the contents of each block unchanged
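To make the operation concrete outside of SAS, here is a small pure-Python sketch of the same block-transpose: block (i, j) of the input becomes block (j, i) of the output, with the elements inside each block untouched. This is an illustration of the operation, not the SAS/IML implementation.

```python
def btran(A, n, m):
    """Block-transpose a matrix given as a list of lists.
    A is (r*n) x (c*m); the result is (c*n) x (r*m): block (i, j)
    moves to block position (j, i) with its contents unchanged.
    Mirrors the semantics of the SAS/IML call BTRAN(A, n, m)."""
    r, c = len(A) // n, len(A[0]) // m
    out = [[None] * (r * m) for _ in range(c * n)]
    for i in range(r):                # block row of A
        for j in range(c):            # block column of A
            for p in range(n):        # row within the block
                for q in range(m):    # column within the block
                    out[j*n + p][i*m + q] = A[i*n + p][j*m + q]
    return out

A = [[ 1,  2,  3,  4,  5,  6],
     [ 7,  8,  9, 10, 11, 12],
     [13, 14, 15, 16, 17, 18],
     [19, 20, 21, 22, 23, 24]]        # a 2 x 3 grid of 2 x 2 blocks
B = btran(A, 2, 2)                    # a 3 x 2 grid: a 6 x 4 matrix
print(len(B), len(B[0]))              # 6 4
```

Note that the first row of `B` is `[1, 2, 13, 14]`: the top-left block stays put, and the block that was below it now sits to its right.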

The BTRAN function in SAS/IML

Let's see how the block transpose works on an example in the SAS/IML language. The following statements create a 4 x 12 matrix. I have overlaid some grid lines to help you visualize this matrix as a 2 x 3 block matrix, where each block is 2 x 4.

proc iml;
x = repeat(1:12,2);
x = x // (10+x);
print x;
A 4x12 matrix visualized as a 2x3 block matrix. Each block is 2x4.

You can use the BTRAN function to apply a block transpose. The first argument is the matrix that contains the data. The second and third arguments specify the size of the blocks:

y = btran(x, 2, 4);    /* block size is 2x4 */
print y;
A 6x8 matrix visualized as a 3x2 block matrix. Each block is 2x4.

The output is a 6 x 8 matrix, visualized as a 3 x 2 block matrix.

Block transpose for wide-to-long transforms

If the number of rows in the block equals the number of rows in the data, then the BTRAN function stacks columns. In particular, if the block size is n x 1, the BTRAN function stacks multiple variables into a single column, just like the SHAPECOL function. For example, the Sashelp.Iris data contains four variables named SepalLength, SepalWidth, PetalLength, and PetalWidth. The following SAS/IML statements stack the four variables into a single column and create a new column that identifies the name of the original variable:

proc iml;
/* wide to long */
varNames = {"SepalLength" "SepalWidth" "PetalLength" "PetalWidth"};
use Sashelp.Iris;
   read all var "Species";        /* 150 x 1 */
   read all var varNames into X;  /* 150 x 4 */
close;
n = nrow(X); m = ncol(X);         /* n = 150; m = 4 */
 
ID = repeat(Species, m);          /* repeat the vars that are not transposed */
Var = shapecol(repeat(varNames, n), n*m); /* create column that contains variable names */
Value = btran(X, n, 1);           /* vector with 150*4 rows */ 
 
create test var {ID "Var" "Value"}; append; close;
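The column-stacking that BTRAN(X, n, 1) performs is simply a column-major flattening, which can be sketched in plain Python. The two rows of invented data below stand in for the 150 iris observations; this is an illustration of the reshaping, not the SAS/IML code.

```python
# BTRAN(X, n, 1) stacks the m columns of an n x m matrix into a single
# (n*m) x 1 vector, one whole column after another (column-major order).
var_names = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
X = [[5.1, 3.5, 1.4, 0.2],       # two illustrative rows standing in
     [4.9, 3.0, 1.4, 0.2]]       # for the 150 iris observations
n, m = len(X), len(X[0])

value = [X[i][j] for j in range(m) for i in range(n)]     # stacked values
var = [var_names[j] for j in range(m) for _ in range(n)]  # matching labels
print(value[:4])                  # [5.1, 4.9, 3.5, 3.0]
```

Each variable name is repeated n times so that the label column lines up with the stacked values, just as the REPEAT/SHAPECOL combination does above.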

Block transpose for long-to-wide transforms

In a similar way, you can transpose balanced data from long form to wide form. (Recall that "balanced data" means that each group has the same number of observations.) For example, the Sashelp.Iris data contains a Species variable. The first 50 observations have the value 'Setosa', the next 50 have the value 'Versicolor', and the last 50 have the value 'Virginica'. You can use a WHERE clause or a BY statement to analyze each species separately, but suppose you want to create a new data set that contains only 50 observations and has variables named SepalLengthSetosa, SepalLengthVersicolor, SepalLengthVirginica, and so forth for the other variables. The BTRAN function makes this easy: just specify a block size of 50 x 4, as follows.

/* create pairwise combinations of var names and group levels */
BlockRows = 50;
w = btran(X, BlockRows, m);                  /* 50 x 12 matrix */
 
s = Species[ do(1, n, BlockRows) ];          /* s = {'Setosa' 'Versicolor' 'Virginica'} */
vNames = rowcatc(expandgrid(varNames, s));   /* combine var names and species */
print vNames, w[c=vNames L="Measurements (cm)"];
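The long-to-wide step can also be sketched in plain Python: with balanced groups, the three 50 x 4 blocks are laid side by side to form a 50 x 12 matrix. The data below are invented placeholders for the iris measurements; this illustrates the reshaping, not the SAS/IML code.

```python
# Long-to-wide sketch: 150 x 4 balanced data (3 groups of 50 rows)
# becomes 50 x 12 by concatenating each group's block horizontally,
# which is what BTRAN(X, 50, 4) does.
n_per_group, n_groups, m = 50, 3, 4
X = [[float(r * m + c) for c in range(m)]     # invented 150 x 4 data
     for r in range(n_per_group * n_groups)]

w = [sum((X[g*n_per_group + i][:] for g in range(n_groups)), [])
     for i in range(n_per_group)]
print(len(w), len(w[0]))    # 50 12
```

Row i of `w` holds row i of the first group, then row i of the second group, and so on, which is why the design must be balanced for this trick to work.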

In summary, the BTRAN function is a useful function when you need to rearrange blocks of data without changing the values in a block. For balanced designs, you can use the BTRAN function to convert data between wide and long formats.

The post Transpose blocks to reshape data appeared first on The DO Loop.

October 19, 2018
 

Did you know that you can now chat with SAS Technical Support? Technical Chat enables you to quickly engage with a knowledgeable consultant when you have a SAS question or need help with troubleshooting an issue.

Technical Chat is a great tool for quick questions like these:

  • “What does this error mean?”
  • “How do I apply my new license?”
  • “What release of the operating system is supported on SAS® 9.4?”
  • “How do I activate JMP® without an internet connection?”
  • “Which function can I use to obtain the antilog of a value?”
  • “What is the status of my track?”

How to start a Technical Chat

To get started, click the orange Technical Chat button on select Technical Support web pages. Technical Chat is currently available in the United States and Canada, Monday–Friday from 9 a.m. to 6 p.m. Eastern Time. If the button is not available, technical assistance through other channels is listed on this SAS Technical Support page.  (And don't forget about SAS Support Communities -- peer-to-peer support that's available 24/7.)

Having trouble with a DATA step program or an ODS statement? Specialists in the areas of SAS programming, SAS Studio, and graphics might be available during select afternoon hours. When these specialists are available, you can request their assistance as soon as you click the Technical Chat button. When prompted with What is the nature of your inquiry?, select Usage of Base SAS, SAS/Studio or graphics. For all other questions, select All other products and/or tasks. Generalists are available throughout the day to answer questions.

Ensure that you have the following information available:

  • Site number
  • Operating system
  • Release of SAS, including the maintenance level

When the chat begins, the chat agent’s name appears at the top of the window, as in the following example:

Although every effort is made to resolve your question during the chat, sometimes the chat agent needs to open a track with a subject matter expert. Your question will still be addressed with the same urgency and professionalism that you are accustomed to when working with SAS Technical Support!

Your feedback counts!

When the chat is complete, you can request an emailed copy of the chat transcript. You can also rate your chat experience and provide feedback. Your responses are important to us as we continue to evaluate and improve our Technical Chat services.

Try it out. . . . Chat with SAS Technical Support!

Solve your SAS questions with Technical Chat was published on SAS Users.

October 17, 2018
 

Any of you who are even slightly into politics, or have followed any news lately, have probably seen something about the DNA test Elizabeth Warren took to prove/disprove her Native American ancestry. The test indicates she might have had a Native American ancestor 6 to 10 generations ago. That's a [...]

The post DNA and your family tree ... 6-10 generations back appeared first on SAS Learning Post.

October 17, 2018
 

What can you learn about wildfires when you provide a room full of analysts with 7 years of US wildfire data and the tools they need to analyze it? A lot. At a recent data dive, we split 35 data scientists into 9 teams, provided multiple data sets containing information [...]

3 wildfire predictions from a recent SAS hackathon that might surprise you was published on SAS Voices by Alison Bolen

October 17, 2018
 

In a recent article about nonlinear least squares, I wrote, "you can often fit one model and use the ESTIMATE statement to estimate the parameters in a different parameterization." This article expands on that statement. It shows how to fit a model for one set of parameters and use the ESTIMATE statement to obtain estimates for a different set of parameters.

A canonical example of a probability distribution that has multiple parameterizations is the gamma distribution. You can parameterize the gamma distribution in terms of a scale parameter or in terms of a rate parameter, where the rate parameter is the reciprocal of the scale parameter. The density functions for the gamma distribution with scale parameter β or rate parameter c = 1 / β are as follows:
Shape α, scale β: f(x; α, β) = 1 / (β^α Γ(α)) x^(α-1) exp(-x/β)
Shape α, rate c: f(x; α, c) = c^α / Γ(α) x^(α-1) exp(-c x)
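A quick numeric check (in Python, using only the standard library) confirms that the two parameterizations describe the same density when c = 1/β:

```python
import math

# The two gamma density parameterizations, written out directly.
def gamma_pdf_scale(x, alpha, beta):
    """Gamma density with shape alpha and scale beta."""
    return x**(alpha - 1) * math.exp(-x / beta) / (beta**alpha * math.gamma(alpha))

def gamma_pdf_rate(x, alpha, c):
    """Gamma density with shape alpha and rate c = 1/beta."""
    return c**alpha / math.gamma(alpha) * x**(alpha - 1) * math.exp(-c * x)

# With c = 1/beta, both formulas give the same value at any x > 0.
x, alpha, beta = 0.3, 2.5, 0.1
assert math.isclose(gamma_pdf_scale(x, alpha, beta),
                    gamma_pdf_rate(x, alpha, 1 / beta))
```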

You can use PROC NLMIXED to fit either set of parameters and use the ESTIMATE statement to obtain estimates for the other set. In general, you can do this whenever you can explicitly write down an analytical formula that expresses one set of parameters in terms of another set.

Estimating the scale or rate parameter in a gamma distribution

Let's implement this idea on some simulated data. The following SAS DATA step simulates 100 observations from a gamma distribution with shape parameter α = 2.5 and scale parameter β = 1 / 10. A call to PROC UNIVARIATE estimates the parameters from the data and overlays a gamma density on the histogram of the data:

/* The gamma model has two different parameterizations: a rate parameter and a scale parameter.
   Generate gamma-distributed data and fit a model for each parameterization. Use the 
   ESTIMATE stmt to estimate the parameter in the model that was not fit. 
*/
%let N = 100;                                    /* N = sample size */
data Gamma;
aParm = 2.5; scaleParm = 1/10;  rateParm = 1/scaleParm;
do i = 1 to &N;
   x = rand("gamma", aParm, scaleParm); 
   output;
end;
run;
 
proc univariate data=Gamma;
   var x;
   histogram x / gamma;
run;
Distribution of gamma(2.5, 0.1) distributed data, overlaid with density estimate

The parameter estimates are (2.99, 0.09), which are close to the parameter values that were used to generate the data. Notice that PROC UNIVARIATE does not provide standard errors or confidence intervals for these point estimates. However, you can get those statistics from PROC NLMIXED. PROC NLMIXED also supports the ESTIMATE statement, which can estimate the rate parameter:

/*Use PROC NLMIXED to fit the shape and scale parameters in a gamma model */
title "Fit Gamma Model, SCALE Parameter";
proc nlmixed data=Gamma;
   parms a 1 scale 1;             * initial value for parameter;
   bounds 0 < a scale;
   model x ~ gamma(a, scale);
   rate = 1/scale;
   estimate 'rate' rate;
   ods output ParameterEstimates=PE;
   ods select ParameterEstimates ConvergenceStatus AdditionalEstimates;
run;

The parameter estimates are the same as from PROC UNIVARIATE. However, PROC NLMIXED also provides standard errors and confidence intervals for the parameters. More importantly, the ESTIMATE statement enables you to obtain an estimate of the rate parameter without needing to refit the model. The next section explains how the rate parameter is estimated.

New estimates and changing variables

You might wonder how the ESTIMATE statement computes the estimate for the rate parameter from the estimate of the scale parameter. The PROC NLMIXED documentation states that the procedure "computes approximate standard errors for the estimates by using the delta method (Billingsley 1986)." I have not studied the "delta method," but it seems to be a way to approximate a nonlinear transformation by using the first two terms of its Taylor series, otherwise known as a linearization.

In the ESTIMATE statement, you supply a transformation, which is c = 1 / β for the gamma distribution. The derivative of that transformation is -1 / β^2. The estimate for β is 0.09. Plugging that estimate into the transformation gives 1/0.09 as the point estimate for c, and the absolute derivative 1/0.09^2 serves as a linear multiplier for the size of the standard error. Geometrically, the linearized transformation scales the standard error for the scale parameter into the standard error for the rate parameter. The following SAS/IML statements read in the parameter estimates for β. The program then uses the transformation to produce a point estimate for the rate parameter, c, and uses the linearization of the transformation to estimate the standard error and confidence interval for c.

proc iml;
start Xform(p);
   return( 1 / p );          /* estimate rate from scale parameter */
finish;
start JacXform(p);
   return abs( -1 / p##2 );  /* Jacobian of transformation */
finish;
 
use PE where (Parameter='scale');
read all var {Estimate StandardError Lower Upper};
close;
 
xformEst = Xform(Estimate);
xformStdErr = StandardError * JacXform(Estimate);
CI = Lower || Upper;
xformCI = CI * JacXform(Estimate);
AdditionalParamEst = xformEst || xformStdErr || xformCI;
print AdditionalParamEst[c={"Est" "StdErr" "LowerCL" "UpperCL"} F=D8.4];
quit;

The estimates and other statistics are identical to those produced by the ESTIMATE statement in PROC NLMIXED. For a general discussion of changing variables in probability and statistics, see these Penn State course notes. For a discussion about two different parameterizations of the lognormal distribution, see the blog post "Geometry, sensitivity, and parameters of the lognormal distribution."
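The delta-method arithmetic is simple enough to verify by hand in any language. The sketch below uses β̂ = 0.09 from the text, but the standard error is an invented placeholder, not the value reported by PROC NLMIXED:

```python
# Delta-method sketch for the transformation c = 1/beta: the point
# estimate transforms directly, and the standard error is scaled by
# the absolute derivative |dc/dbeta| = 1/beta^2.
# se_beta is illustrative, not the value from PROC NLMIXED.
beta_hat, se_beta = 0.09, 0.005

c_hat = 1 / beta_hat                 # transformed point estimate
jac = abs(-1 / beta_hat**2)          # |derivative| of the transformation
se_c = se_beta * jac                 # linearized standard error

print(round(c_hat, 3), round(se_c, 3))   # 11.111 0.617
```

The same scaling factor is applied to the endpoints of the confidence interval, which is exactly what the `JacXform` multiplication does in the SAS/IML program above.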

In case you are wondering, you obtain exactly the same parameter estimates and standard errors if you fit the rate parameter directly. That is, the following call to PROC NLMIXED produces the same parameter estimates for c.

/*Use PROC NLMIXED to fit gamma model using rate parameter */
title "Fit Gamma Model, RATE Parameter";
proc nlmixed data=Gamma;
   parms a 1 rate 1;             * initial value for parameter;
   bounds 0 < a rate;
   model x ~ gamma(a, 1/rate);
   ods select ParameterEstimates;
run;

For this example, fitting the scale parameter is neither more nor less difficult than fitting the rate parameter. If I were faced with a distribution that has two different parameterizations, one simple and one more complicated, I would try to fit the simple parameters to data and use this technique to estimate the more complicated parameters.

The post Parameter estimates for different parameterizations appeared first on The DO Loop.

October 16, 2018
 

What if you could automatically detect supply chain anomalies as they happen, or even predict them in advance? You'd be able to take timely corrective action and help maximize revenue, margins, customer satisfaction and shareholder value. There's no question: Supply chain planning and execution is complex. From design and sourcing, to [...]

The supply chain of things: IoT in supply chain was published on SAS Voices by Scott Nalick

October 15, 2018
 


Old and new SAS users alike learned the tricks of the data trade from our Little SAS Book! We hope these fun tips from our exercise and project book teach you even more about how to master the data analytics game!

From Rebecca Ottesen:

Tip #1: Grouping Quantitative Variables
My favorite tip to share with students and SAS users is how to use PROC FORMAT to group quantitative variables into categories. A format can be created with a VALUE statement that specifies the ranges relevant to the category groupings. Then, this format can be applied with a FORMAT statement during an analysis to group the variable accordingly (don't forget the CLASS statement when applicable). You can also create categorical variables in the DATA step by applying the format in an assignment statement with a PUT function.
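The same idea — define the range boundaries once and reuse them to label values — can be sketched in Python. This is an analogy to the PROC FORMAT approach, not SAS code, and the cut points and labels are invented for illustration:

```python
from bisect import bisect_right

# Define category boundaries once, like a VALUE statement in PROC FORMAT,
# then reuse them wherever a quantitative value needs a category label.
cuts = [18, 35, 65]                  # boundaries between categories
labels = ["minor", "young adult", "adult", "senior"]

def age_group(age):
    """Map a quantitative age to its category label."""
    return labels[bisect_right(cuts, age)]

print([age_group(a) for a in (12, 18, 40, 70)])
# ['minor', 'young adult', 'adult', 'senior']
```

As with a SAS format, changing the groupings means editing the boundaries in one place rather than every analysis that uses them.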

From Lora Delwiche:

Tip #2: Commenting Blocks of Code
This tip I learned from fellow SAS Press author Alan Wilson at SAS Global Forum 2008 in San Antonio. It might be a bit overly dramatic to say that this tip changed my life, but that’s not far from the truth! So, I am paying this tip forward. Thank you, Alan!

To comment out a whole block of code, simply highlight the lines of code, hold down the control key, and press the forward slash (/). SAS will turn those lines of code into comments by adding a /* to the beginning of each line and an */ at the end of each line.

To convert the commented lines back to code, highlight the lines again, hold down the control and shift keys, and press the forward slash (/). This works in both the SAS windowing environment (Display Manager) and SAS Enterprise Guide.

If you are using SAS Studio as your programming interface, you comment the same way, but to uncomment, just hold down the control key and then press the forward slash.

From Susan J. Slaughter:
Tip #3: Susan's Macro Mottos
There is no question that writing and debugging SAS macros can be a challenge. So I have two "macro mottos" that I use to help keep me on track.

“Remember, you are writing a program that writes a program.”

This is the most important concept to keep in mind whenever you work with SAS macros. If you feel the least bit confused by a macro, repeating this motto can help you to see what is going on. I speak from personal experience here. This is my macro mantra.

“To avoid mangling your macros, always write them one piece at a time.”

This means, write your program in standard SAS code first. When that is working and bug-free, then add your %MACRO and %MEND statements. When they are working, then add your parameters, if any, one at a time. If you make sure that each macro feature you add is working before you add another one, then debugging will be vastly simplified.

And, this is the best time ever to learn SAS! When I first encountered SAS, there were only two ways that I could get help. I could either ask another graduate student who might or might not know the answer, or I could go to the computer center and borrow the SAS manual. (There was only one.) Today it's totally different.

I am continually AMAZED by the resources that are available now—many for FREE. Here are four resources that every new SAS user should know about:

1. SAS Studio
This is a wonderful new interface for SAS that runs in a browser and has both programming and point-and-click features. SAS Studio is free for students, professors, and independent learners. You can download the SAS University Edition to run SAS Studio on your own computer, or use SAS OnDemand for Academics via the Internet.

2. Online classes
Two of the most popular self-paced e-learning classes are available for free: SAS Programming 1: Essentials, and Statistics 1. These are real classes that people used to pay hundreds of dollars to take.

3. Videos
You can access hundreds of SAS training videos, tutorials, and demos at support.sas.com/training. Topics range from basic (What is SAS?) to advanced (SAS 9.4 Metadata Clustering).

4. Community of SAS users
If you encounter a problem, it is likely that someone else faced a similar situation and figured out how to solve it. On communities.sas.com you can post questions and get answers from SAS users and developers. On the site, www.lexjansen.com, you can find virtually every paper ever presented at a SAS users group conference.

If you want even more tips and tricks, check out our Exercises and Projects for The Little SAS Book, Fifth Edition! Let us know if you enjoyed these tips in the comment boxes below.

New to SAS? Ready to learn more? Check out these tips and tricks from the authors of Exercises and Projects for The Little SAS Book, 5th Edition was published on SAS Users.