11月 042019
 

According to the Price Waterhouse Cooper 2018 Global Economic Crime and Fraud Survey, the reported rate of economic crime is on the rise, up to 49% in 2018. That makes the use case I want to share particularly relevant, no matter what industry or sector you're in. This use case [...]

How using analytics and AI to detect payment fraud netted an immediate $16 million ROI was published on SAS Voices by David Pope

11月 042019
 

Understanding multivariate statistics requires mastery of high-dimensional geometry and concepts in linear algebra such as matrix factorizations, basis vectors, and linear subspaces. Graphs can help to summarize what a multivariate analysis is telling us about the data. This article looks at four graphs that are often part of a principal component analysis of multivariate data. The four plots are the scree plot, the profile plot, the score plot, and the pattern plot.

The graphs are shown for a principal component analysis of the 150 flowers in the Fisher iris data set. In SAS, you can create the graphs by using PROC PRINCOMP. By default, the scatter plots that display markers also label the markers by using an ID variable (such as name, state, patient ID, ...) or by using the observation number if you do not specify an ID variable. To suppress the labels, you can create an ID variable that contains a blank string, as follows:

data iris;
set Sashelp.iris;
id = " ";                  /* create empty label variable */
run;

The following statements create the four plots as part of a principal component analysis. The data are the measurements (in millimeters) of length and width of the sepal and petal for 150 iris flowers:

ods graphics on;
proc princomp data=iris         /* use N= option to specify number of PCs */
              STD               /* optional: stdize PC scores to unit variance */
              out=PCOut         /* only needed to demonstate corr(PC, orig vars) */
              plots=(scree profile pattern score);
   var SepalLength SepalWidth PetalLength PetalWidth;  /* or use _NUMERIC_ */
   ID id;                       /* use blank ID to avoid labeling by obs number */
   ods output Eigenvectors=EV;  /* to create loadings plot, output this table */
run;

The principal components are linear combinations of the original data variables. Before we discuss the graph, let's identify the principal components and interpret their relationship to the original variables. The linear coefficients for the PCs (sometimes called the "loadings") are shown in the columns of the Eigenvectors table.

  • The first PC is the linear combination PC1 = 0.52*SepalLength – 0.27*SepalWidth + 0.58*PetalLength + 0.56*PetalWidth. You can interpret this as a contrast between the SepalWidth variable and an equally weighted sum of the other variables.
  • For the second PC, the coefficients for the PetalLength and PetalWidth variables are very small. Therefore, the second PC is approximately PC2 ≈ 0.38*SepalLength + 0.92*SepalWidth. You can interpret this weighted sum as a vector that points mostly in the direction of the SepalWidth variable but has a small component in the direction of the SepalLength variable.
  • In a similar way, the third PC is primarily a weighted contrast between the SepalLength and PetalWidth variables, with smaller contributions from the other variables.
  • The fourth PC is a weighted contrast between the SepalWidth and PetalLength variables (with positive coefficients) and the SepalLength and PetalWidth variables (with negative coefficients).

Note that the principal components (which are based on eigenvectors of the correlation matrix) are not unique. If v is a PC vector, then so is -v. If you compare PCs from two different software packages, you might notice that a PC from one package is the negative of the same PC from another package.

You could present this table graphically by creating a "loadings plot," as shown in the last section of this article.

The scree plot

Recall that the main idea behind principal component analysis (PCA) is that most of the variance in high-dimensional data can be captured in a lower-dimensional subspace that is spanned by the first few principal components. You can therefore to "reduce the dimension" by choosing a small number of principal components to retain. But how many PCs should you retain? The scree plot is a line plot of the eigenvalues of the correlation matrix, ordered from largest to smallest. (If you use the COV option, it is a plot of the eigenvalues of the covariance matrix.)

You can use the scree plot as a graphical tool to help you choose how many PCs to retain. In the scree plot for the iris data, you can see (on the "Variance Explained" plot) that the first two eigenvalues explain about 96% of the variance in the four-dimensional data. This suggests that you should retain the first two PCs, and that a projection of the data onto the first to PCs will give you a good way to visualize the data in a low-dimensional linear subspace.

The profile plot

The profile plot shows the correlations between each PC and the original variables. To some extent, you can guess the sign and the approximate magnitude of the correlations by looking at the coefficients that define each PC as a linear combination of the original variables. The correlations are shown in the following "Component Pattern Profiles" plot.

The profile plot reveals the following facts about the PCs:

  • The first PC (solid blue line) is strongly positively correlated with SepalLength, PetalLength, and PetalWidth. It is moderately negatively correlated with SepalWidth.
  • The second PC (dashed reddish line) is positively correlated with SepalLength and SepalWidth.
  • The third and fourth PCs have only small correlations with the original variables.

The component pattern plot shows all pairwise correlations at a glance. However, if you are analyzing many variables (say 10 or more), the plot will become very crowded and hard to read. In that situation, the pattern plots are easier to read, as shown in the next section.

The pattern plots

The output from PROC PRINCOMP includes six "component pattern" plots, which show the correlations between the principal components and the original variables. Because there are four PCs, a component pattern plot is created for each pairwise combination of PCs: (PC1, PC2), (PC1, PC3), (PC1, PC4), (PC2, PC3), (PC2, PC4), and (PC3, PC4). In general, if there are k principal components, there are N(N-1)/2 pairwise combinations of PCs.

Each plot shows the correlations between the original variables and the PCs. For example, the graph shown to the right shows the correlations between the original variables and the first two PCs. Each point represents the correlations between an original variable and two PCs. The correlations with the first PC are plotted on the horizontal axis; the correlations with the second PC are plotted on the vertical axis.

Six graphs require a lot of space for what is essentially a grid of sixteen numbers. If you want to see the actual correlations in a table, you can call PROC CORR on the OUT= data set, as follows:

/* what are the correlations between PCs and orig vars? */
proc corr data=PCOUT noprob nosimple;
   var SepalLength SepalWidth PetalLength PetalWidth;
   with Prin1-Prin4;
run;

If you know you want to keep only two PCs, you can use the N=2 option on the PROC PRINCOMP statement, which will reduce the number of graphs that are created.

The score plots

The score plots indicate the projection of the data onto the span of the principal components. As in the previous section, this four-dimensional example results in six score plots, one for each pairwise combination of PCs. You will get different plots if you create PCs for the covariance matrix (the COV option) as opposed to the correlation matrix (the default). Similarly, if you standardize the PCs (the STD option) or do not standardize them (the default), the corresponding score plot will be different.

The score plot for the first two PCs is shown. Notice that it uses equal scales for the axes. The graph shows that the first principal component separates the data into two clusters. The left cluster contains the flower from the Iris setosa species. You can see a few outliers, such as one setosa flower whose second PC score (about -2.5) is much smaller than the other setosa flowers.

ODS graphics provide an easy way to generate a quick look at the data. However, if you want to more control over the graphs, it is often simplest to output the results to a SAS data set and customize the plots by hand. You can use the OUT= option to write the PCs to a data set. The following call to PROC SGPLOT creates the same score plot but colors the markers by the Species variable and adds a grid of reference lines.

title "Score Plot";
title2 "Observations Projected onto PC1 and PC2";
proc sgplot data=PCOut aspect=1;
   scatter x=Prin1 y=Prin2 / group=species;
   xaxis grid label="Component 1 (72.96%)";
   yaxis grid label="Component 2 (22.85%)";
run;

The loadings plots

A loadings plot is a plot of two columns of the Eigenvectors table. PROC PRINCOMP does not create a loadings plot automatically, but there are two ways to create it. One way is to use the ODS OUTPUT to write the Eigenvectors table to a SAS data set. The previous call to PROC PRINCOMP created a data set named EV. The following call to PROC SGPLOT creates a score plot that projects the observations onto the first two PCs.

title "Loadings Plot";
title2 "Variables Projected onto PC1 and PC2";
proc sgplot data=EV aspect=1;
   vector x=Prin1 y=Prin2 / datalabel=Variable;
   xaxis grid label="Component 1 (72.96%)";
   yaxis grid label="Component 2 (22.85%)";
run;

The loadings plot shows the relationship between the PCs and the original variables. You can use the graph to show how the original variables relate to the PCs, or the other way around. For example, the graph indicates that the PetalWidth and PetalLength variables point in the same direction as PC1. The graph also shows that the second PC is primarily in the direction of the SepalWidth variable, with a small shift towards the direction of the SepalLength variable. The second PC is essentially orthogonal to the and PetalWidth and PetalLength variables.

The second way to create a loadings plot is to use PROC FACTOR, as shown by the following statements. To documentation for PROC FACTOR compares the PROC FACTOR analysis to the PROC PRINCOMP analysis.

proc factor data=iris N=2         /* use N= option to specify number of PCs */
     method=principal       
     plots=(initloadings(vector));
   var SepalLength SepalWidth PetalLength PetalWidth;  /* or use _NUMERIC_ */
run;

Summary

In summary, PROC PRINCOMP can compute a lot of graphs that are associated with a principal component analysis. This article shows how to interpret the most-used graphs. The scree plot is useful for determining the number of PCs to keep. The component pattern plot shows the correlations between the PCs and the original variables. The component pattern plots show similar information, but each plot displays the correlations between the original variables and a pair of PCs. The score plots project the observations onto a pair of PCs. The loadings plot project the original variables onto a pair of PCs.

When you analyze many variables, the number of graphs can be overwhelming. I suggest that you use the WHERE option in the ODS SELECT statement to restrict the number of pattern plots and score plots. For example, the following statement creates only two pattern plots and two score plots:

   ods select Eigenvectors 
              ScreePlot PatternProfilePlot
              where=(_label_ in: ('2 by 1','4 by 3'));  /* limit pattern plots and score plots */

There is one more plot that is sometimes used. It is called a "biplot" and it combines the information in a score plot and a loadings plot. I will discuss the biplot in a subsequent article.

The post How to interpret graphs in a principal component analysis appeared first on The DO Loop.

11月 012019
 

When I started working in data and analytics 30 years ago, information security wasn’t high on the agenda for organizations. That's changed with the rise of the Internet, and now that cloud is becoming more and more prevalent in organizations, information security is no longer just the domain of specialists [...]

Cloud security: Questions you need to ask when choosing a cloud provider was published on SAS Voices by David Annis

10月 312019
 

Buy my costume, Georgie!

While growing up in the 80's, I watched The Golden Girls on TV with my Grandma Betty. Now, when my sister visits, we binge watch reruns on TV Land. I was excited when I saw for this Halloween, you could buy Golden Girls costumes! Too bad they sold out right away, making them one this year's most popular costumes.

For the record, I wasn't planning to dress up tonight as a Golden Girl, but the news got me to thinking how Halloween costumes have changed over the years. What was popular when? Hence, this post. I explain how to use SAS REST APIs to append a table containing historic costume data with this year's most popular costumes (including the Golden Girls and Pennywise from It). While looking at costume data in this example, consider the append steps as a template to translate any table needing updates in SAS Viya, using REST APIs.

The data

There I am, in the middle

I created a data set containing the most popular Halloween costumes from the last 50 years (1968-2018). I compiled the data from several sources who couldn't seem to agree on the best-selling costume for a given year, so I combined the lists. Many years have two entries. The data here isn't as important as the append table procedure. What fun to review the costumes list! It was not hard to tell in what year certain movies (and their sequels) were released. Only one costume I wore made the list – my 1979 Ace Frehley outfit!

The process

The procedures in this example run on SAS Viya, utilizing the Cloud Analytics Services (CAS) REST APIs. CAS REST APIs use CAS actions to perform statistical methods across a variety of SAS products. You can learn more about CAS REST APIs and CAS Actions in the Using SAS Cloud Analytics Service REST APIs to run CAS Actions post or on developer.sas.com.

Below, I'll detail each REST call along with sample code. I originally used Postman to organize my calls. This allowed me to utilize pre and post-call scripting to handle responses and create variables. You can find the entire Postman collection here on GitHub. For ease-of-display purposes in this post, I'll use equivalent cURL commands.

Pre-requisites

I registered my client, obtained access token, and added it as an environment variable ACCESSTOKEN. For more information on registering a client or getting an access token, see my earlier post Authentication to SAS Viya: a couple of approaches.

Create a CAS session

Before running any CAS actions, we need to establish a connection to the SAS Viya server.

curl -X POST https://sasserver:8777/cas/sessions \
  -H 'Authorization: Bearer $ACCESSTOKEN' \
  -H 'Content-Type: application/vnd.sas.cas.session+json'

The result of this call is a session id in the form of a089ce2b-8116-7a40-b3e3-6e39b7b5566d. This will be used in all subsequent REST calls. You could easily create another variable for further use. In the examples below I substitute the actual session id with <session-id>. You will need to substitute this place holder when attempting the steps on your own.

Create a global Caslib HALLOWEEN

Data in CAS is stored in a Caslib. In the step below, I create a Caslib called HALLOWEEN and link it to a physical server path (/home/sasdemo/halloween), where the table is stored.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.addCaslib \
  -H 'Authorization: Bearer $ACCESSTOKEN' \
  -H 'Content-Type: application/json' \
  -d {"name":"HALLOWEEN","path":"/home/sasdemo/halloween","description":"HALLOWEEN","subDirectories":"false","permission":"PUBLICWRITE","session":"false","dataSource":{"srcType":"path"},"createDirectory":"true","hidden":"false","transient":"false"}

Note that I created the directory ~/halloween and set permissions as needed. Further, since the Caslib is global, other users have access to the data. This step (and the next step) are one time requests. If you were to repeat this process you would not need to create the Caslib nor upload the data.

Copy data set costumesByYear into HALLOWEEN's path

Now that we have a Caslib and a path, we load the data table to the server. In this instance, I copy the costumesByYear.xlsx file into /home/sasdemo/halloween. There are multiple ways to upload data to the server. You can read more about the various methods in the SAS documentation.

Create a temporary Caslib TEMP

While our data lives in the HALLOWEEN Caslib, we want to create a temporary Caslib to run the append step. We will then save the appended table back into HALLOWEEN. The following code creates a new Caslib called TEMP.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.addCaslib \
  -H 'Authorization: Bearer $ACCESSTOKEN' \
  -H 'Content-Type: application/json' \
  -d {"name":"TEMP","path":"/home/sasdemo/temp","description":"TEMP","subDirectories":"false","permission":"PUBLICWRITE","session":"false","dataSource":{"srcType":"path"},"createDirectory":"true","hidden":"false","transient":"false"}

Now we're ready to load the data into memory and append the table.

Load costumesByYear into memory

First, we load costumesByYear into memory in the TEMP Caslib.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.loadTable
  -H 'Authorization: Bearer $ACCESSTOKEN' \
  -H 'Content-Type: application/json' \
  -d {"path":"costumesByYear.xlsx","caslib":"HALLOWEEN","importOptions":{"fileType":"EXCEL"},"casOut":{"caslib":"TEMP","name":"costumesByYear","promote":"true"}}

Create a temporary table data2019 with containing append data

Next, we create a new table called data2019 with costume data for 2019 in TEMP.

curl -X PUT https://sasserver:8777/cas/sessions/<session-id>/actions/upload
  -H 'Authorization: Bearer $ACCESSTOKEN' \
  -H 'Content-Type: text/plain' \
  -H 'JSON-Parameters: {"casOut":{"caslib":"TEMP","name":"data2019","promote":"true"},"importOptions":{"fileType":"CSV"}}' \
  --data-binary $'Year,Costume\n2019,The Golden Girls\n2019,Pennywise'

Run data step to append data2019 to costumesByYear table

Finally, we run data step code to append table data2019 to table costumesByYear.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/runCode \
  -H 'Authorization: Bearer $ACCESSTOKEN' \
  -H 'Content-Type: application/json' \
  -d {"code": "data temp.costumesbyyear(append=force) ; set temp.data2019;run;"}

Save the costumesByYear table back to the HALLOWEEN CASlib

Now that we have successfully appended the costumesByYear table in the TEMP Caslib, we are ready to save back to the HALLOWEEN Caslib.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.save \
  -H 'Authorization: Bearer $ACCESSTOKEN' \
  -H 'Content-Type: application/json' \
  -d {"table":{"name":"costumesbyyear","caslib":"TEMP","singlePass":"false"},"name":"costumesbyyear","replace":"true","compress":"false","caslib":"HALLOWEEN","exportOptions":{"fileType":"BASESAS"}}

Delete TEMP Caslib

The TEMP Caslib is just that, temporary. With the code below we drop the Caslib and all its data.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/table.dropCaslib \
  -H 'Authorization: Bearer $ACCESSTOKEN' \
  -H 'Content-Type: application/json' \
  -d {"caslib":"TEMP"}

Delete the CAS session

The final step is to close our connection to the CAS server.

curl -X DELETE https://sasserver:8777/cas/sessions/<session-id> \
  -H 'Authorization: Bearer $ACCESSTOKEN'

Wrapping it up

There you have it. With a few simple commands we were able to load, append, and save a table. This example is fairly simple in scope, but translates into more complex use cases. The steps for my 2 x 50 table are the same as it would be for a 5GB table with hundreds of columns and millions of rows.

I have asked my mother to send the Polaroid photo of me as Ace in 1979. She just has to dig it out of a photo album. Check back in a week so you can gain fodder and poke fun at me.

Additional Resources

developer.sas.com - developers site for SAS
GitHub resources - GitHub repository for code used in this post

Append tables in SAS® Viya® with REST APIs – a treat, no tricks was published on SAS Users.

10月 302019
 

Every year at Halloween, I post an article that shows a SAS trick that is a real treat. This article shows how to use the INTNX function to find dates that are related to a specified date. The INTNX function is a sweet treat, indeed.

I previously wrote an article about how to use the INTCK and INTNX functions to compute with dates. These Base SAS functions are very powerful and deserve to be known more widely. In particular, the INTNX function enables you to compute the next or previous date that is a certain number of time units away from a known date. The "time unit" is usually a day, week, a month, or a year, but the function supports many options.

Recently, a SAS programmer asked how to get "the first and last days of the previous month." The programmer added that the expression needs to be used in a WHERE clause. Because the expression needs to appear in a WHERE clause, it should be concise and not require several statements or temporary variables.

The first day of a previous (or subsequent) time interval

Finding the first day of the previous month is an ideal situation for using the INTNX function. The basic syntax of the INTNX function is
      INTNX(timeUnit, startDate, numberOfUnits)
This form of the INTNX function returns the first day of the specified time unit. For example, the following statements give dates relative to the bombing of Pearl Harbor on 07DEC1941. The time interval is 'month'. Notice that you can ask for dates after the given date (a positive number of time units) or before the given date (a negative number of units). If you specify 0 for the third argument, you get the current month.

data Months;
date = '07DEC1941'd;
FirstDayCurrMonth = intnx('month', Date,  0);   /*  0 = current month */
FirstDayPrevMonth = intnx('month', Date, -1);   /* -1 = previous month */
FirstDayNextMonth = intnx('month', Date,  1);   /*  1 = next month */
FirstDay6Months   = intnx('month', Date,  6);   /*  6 = six months later */
format _ALL_ date9.;
run;
 
proc print data=Months noobs;
run;

Because the time unit is 'month' for this example, the calculated dates are the first day of the months relative to 07DEC1941. If you change 'month' to 'year', all the calculated dates will be 01JAN of some year relative to 1941.

The last (or same) day of a previous (or subsequent) time interval

A cool fact about the INTNX function is that it supports an optional fourth argument that enables you to specify whether you want the calculated date to be at the beginning, the middle, or the end of the specified time interval. You can even specify that you want the "same" characteristics as the source date, which is useful for finding anniversaries of an event. For example, the following statements vary the fourth argument, which can be one of four values:

data FirstLastMiddle;
date = '07DEC1941'd;
FirstDayPrevMonth = intnx('month', Date, -1, 'B');   /* B = beginning */
LastDayPrevMonth  = intnx('month', Date, -1, 'E');   /* E = end */
MiddlePrevMonth   = intnx('month', Date, -1, 'M');   /* M = middle */
FirstAnniv        = intnx('year',  Date,  1, 'S');   /* S = same */
format _ALL_ date9.;
run;
 
proc print data=FirstLastMiddle noobs;
run;

The program shows that you can find the first day of the previous month, the last day of the previous month, the middle of the previous month, or an anniversary of the specified date. In particular, the program answers the programmer's question by showing a concise "one-liner" that you can use to get the first and last days of the previous month.

In summary, the INTNX function is a powerful tool for working with dates. It enables you to find dates that are related to a specified date. You can use the first argument to specify the time unit (day, week, month, year,...) and the third argument to specify the number of time units before or after the specified data. The optional fourth argument determines whether you want the first, last, middle, or "same" portion of the time interval.

Whether you work with dates in SAS every day or whether you work with them occasionally, the INTNX function is a sweet treat to remember. No tricks required.

The post Compute the first or last day of a month or year appeared first on The DO Loop.

10月 302019
 

Check mark
When I'm about to make a major purchase, I appreciate being able to compare products features at a glance, side by side. I am sure you have seen these ubiquitous comparison tables with check marks showing which features are characteristic of different products and which are not.

These data visualizations, sometimes called comparison matrixes, are also commonly known as checklist tables or checklist table charts. Such charts are extremely useful, persuasive visuals as they allow us to quickly identify differences as well as commonalities between comparable products or solutions and quickly decide which one of them is more desirable or suitable for our needs.

For example, here is such a table that I created in MS Word:
Checklist table example created in Word

SAS code to create checklist table chart

Thanks to SAS’ ability to use Unicode characters in formatted data, it’s very easy to create such a checklist table in SAS. Just imagine that each cell value with visible check mark is assigned value of 1 and each cell value with no check mark is assigned value of 0. That is exactly the data table that lies behind this data visualization. To print this data table with proper formatting, we will format number 1 to a more visually appealing check mark, and 0 to a “silent” blank. Here is the SAS code to accomplish this:

data CHECKLIST;
   length FEATURE $10;
   input FEATURE $1-10 A B C;
   label
      FEATURE = 'Feature'
      A = 'Product A'
      B = 'Product B'
      c = 'Product C';
   datalines;
Feature 1 1 1 1
Feature 2 1 0 1
Feature 3 0 1 1
Feature N 1 0 1
;
 
proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      other = ' ';
   value chcolor
      1 = green;
run;
 
ods html path='c:\temp' file='checklist1.html' style=HTMLBlue;
 
proc print data=CHECKLIST label noobs;
   var FEATURE / style={fontweight=bold};
   var A B C / style={color=chcolor. just=center fontweight=bold};
   format A B C chmark.;
run;
 
ods html close;

If you run this SAS code, your output will look much as the one above created in MS Word:
Checklist table created in SAS
Key elements of the SAS code that produce this checklist table are user-defined formats in the PROC FORMAT. You format the values of 1 to a Unicode 2714 corresponding to a checkmark character ✔ in a user-defined format chmark. Also, the value of 1 is formatted to green color in the chcolor user-defined format. The syntax for using Unicode symbols in user-defined formats is this:

value chmark
1 = '(*ESC*){unicode "2714"x}'

NOTE: ESC here must be upper-case; x at the end stands for “hexadecimal.”

Unicode characters for checklist tables

Unicode or Unicode Transformation Format (UTF) is an international encoding standard by which each letter, digit or symbol is assigned a unique numeric value that applies across different platforms and programs. The Unicode standard is supported by many operating systems and all modern browsers.

It is implemented in HTML, XML, Java, JavaScript, E-mail, ASP, PHP, etc. The most commonly used Unicode encodings standards are UTF-8 and UTF-16. HTML 5 supports both UTF-8 and UTF-16.

You can use this HTML Unicode (UTF-8) Reference to look up and choose symbols you can embed in your report using SAS user-defined formats. They are grouped by categories to make it easier to find the ones you needed.

Here is just a small random sample of the Unicode symbols that can be used to spice up your checklist tables to get their different flavors:

Unicode characters and codes
You can also apply colors to all these symbols the way we did it in the SAS code example above.

Different flavors of checklist tables

By just changing user-defined formats for the symbol shapes and colors we can get quite a variety of different checklist tables.

For example, we can format 0 to ✘ instead of blank and also make it red to explicitly visualize feature exclusion from product (in addition to explicit inclusion). All we need to do is to modify our PROC FORMAT to look like this:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      0 = '(*ESC*){unicode "2718"x}';
   value chcolor
      1 = green
      0 = red;
run;

SAS output comparison matrix will look a bit more dramatic and persuasive:
SAS-generated checklist table
Or, if you’d like, you can use the following format definition:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2611"x}'
      0 = '(*ESC*){unicode "2612"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following SAS-generated ballot-like table checklist:
Ballot-like checklist table created in SAS
Here is another one:

proc format;
   value chmark
      1 = '(*ESC*){unicode "1F5F9"x}'
      0 = '(*ESC*){unicode "20E0"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following variation of the checklist table:
Another SAS-generated checklist table
As you can see, the possibilities are endless.

Your thoughts?

Do you find these comparison matrixes or checklist tables useful? Do you envision SAS producing them for your presentation, documentation, data story or marketing materials? What Unicode symbols do you like? Can you come up with some creative usages of symbols and colors? For example, table cells background colors...

How to create checklist tables in SAS® was published on SAS Users.

10月 302019
 

Check mark
When I'm about to make a major purchase, I appreciate being able to compare products features at a glance, side by side. I am sure you have seen these ubiquitous comparison tables with check marks showing which features are characteristic of different products and which are not.

These data visualizations, sometimes called comparison matrixes, are also commonly known as checklist tables or checklist table charts. Such charts are extremely useful, persuasive visuals as they allow us to quickly identify differences as well as commonalities between comparable products or solutions and quickly decide which one of them is more desirable or suitable for our needs.

For example, here is such a table that I created in MS Word:
Checklist table example created in Word

SAS code to create checklist table chart

Thanks to SAS’ ability to use Unicode characters in formatted data, it’s very easy to create such a checklist table in SAS. Just imagine that each cell value with visible check mark is assigned value of 1 and each cell value with no check mark is assigned value of 0. That is exactly the data table that lies behind this data visualization. To print this data table with proper formatting, we will format number 1 to a more visually appealing check mark, and 0 to a “silent” blank. Here is the SAS code to accomplish this:

data CHECKLIST;
   length FEATURE $10;
   input FEATURE $1-10 A B C;
   label
      FEATURE = 'Feature'
      A = 'Product A'
      B = 'Product B'
      c = 'Product C';
   datalines;
Feature 1 1 1 1
Feature 2 1 0 1
Feature 3 0 1 1
Feature N 1 0 1
;
 
proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      other = ' ';
   value chcolor
      1 = green;
run;
 
ods html path='c:\temp' file='checklist1.html' style=HTMLBlue;
 
proc print data=CHECKLIST label noobs;
   var FEATURE / style={fontweight=bold};
   var A B C / style={color=chcolor. just=center fontweight=bold};
   format A B C chmark.;
run;
 
ods html close;

If you run this SAS code, your output will look much as the one above created in MS Word:
Checklist table created in SAS
Key elements of the SAS code that produce this checklist table are user-defined formats in the PROC FORMAT. You format the values of 1 to a Unicode 2714 corresponding to a checkmark character ✔ in a user-defined format chmark. Also, the value of 1 is formatted to green color in the chcolor user-defined format. The syntax for using Unicode symbols in user-defined formats is this:

value chmark
1 = '(*ESC*){unicode "2714"x}'

NOTE: ESC here must be upper-case; x at the end stands for “hexadecimal.”

Unicode characters for checklist tables

Unicode or Unicode Transformation Format (UTF) is an international encoding standard by which each letter, digit or symbol is assigned a unique numeric value that applies across different platforms and programs. The Unicode standard is supported by many operating systems and all modern browsers.

It is implemented in HTML, XML, Java, JavaScript, E-mail, ASP, PHP, etc. The most commonly used Unicode encodings standards are UTF-8 and UTF-16. HTML 5 supports both UTF-8 and UTF-16.

You can use this HTML Unicode (UTF-8) Reference to look up and choose symbols you can embed in your report using SAS user-defined formats. They are grouped by categories to make it easier to find the ones you needed.

Here is just a small random sample of the Unicode symbols that can be used to spice up your checklist tables to get their different flavors:

Unicode characters and codes
You can also apply colors to all these symbols the way we did it in the SAS code example above.

Different flavors of checklist tables

By just changing user-defined formats for the symbol shapes and colors we can get quite a variety of different checklist tables.

For example, we can format 0 to ✘ instead of blank and also make it red to explicitly visualize feature exclusion from product (in addition to explicit inclusion). All we need to do is to modify our PROC FORMAT to look like this:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      0 = '(*ESC*){unicode "2718"x}';
   value chcolor
      1 = green
      0 = red;
run;

SAS output comparison matrix will look a bit more dramatic and persuasive:
SAS-generated checklist table
Or, if you’d like, you can use the following format definition:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2611"x}'
      0 = '(*ESC*){unicode "2612"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following SAS-generated ballot-like table checklist:
Ballot-like checklist table created in SAS
Here is another one:

proc format;
   value chmark
      1 = '(*ESC*){unicode "1F5F9"x}'
      0 = '(*ESC*){unicode "20E0"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following variation of the checklist table:
Another SAS-generated checklist table
As you can see, the possibilities are endless.

Your thoughts?

Do you find these comparison matrixes or checklist tables useful? Do you envision SAS producing them for your presentation, documentation, data story or marketing materials? What Unicode symbols do you like? Can you come up with some creative usages of symbols and colors? For example, table cells background colors...

How to create checklist tables in SAS® was published on SAS Users.

10月 302019
 

I suffer from arthritis. You can tell just by watching me walk: Depending on the day, I have a slight limp, which varies in severity based on a number of factors such as the time of day and recent physical activity. Years of treatment for my condition have shown me [...]

I applied AI to my arthritis assessment. Here’s what happened. was published on SAS Voices by Mark Wolff

10月 292019
 

Thank you to Lora Delwiche and Susan Slaughter for providing the following information:

Six editions is a lot! If you had told us back when we wrote the first edition of The Little SAS Book that someday we would write a sixth, we would have wondered how we could possibly find that much to say. After all, it is supposed to be The Little SAS Book, isn’t it? But the developers at SAS are constantly hard at work inventing new and better ways of analyzing and visualizing data. And some of those ways turn out to be so fundamental that they belong even in a little book about SAS.

Interface independence

One of the biggest changes to SAS software in recent years is the proliferation of interfaces. SAS programmers have more choices than ever before. Previous editions contained some sections specific to the SAS windowing environment (also called Display Manager). We wrote this edition for all SAS programmers whether you use SAS Studio, SAS Enterprise Guide, the SAS windowing environment, or run in batch. That sounds easy, but it wasn’t. There are differences in how SAS behaves with different interfaces, and these differences can be very fundamental. In particular, the system option that sets the rules for names of variables varies depending on how you run SAS. So old sections had to be rewritten, and we added a whole new section showing how to use variable names containing blanks and special characters.

New ways to read and write Microsoft Excel files

Previous editions already covered how to read and write Microsoft Excel files, but SAS developers have created new ways that are even better. This edition contains new sections about the XLSX LIBNAME engine and the ODS EXCEL destination.

More PROC SQL

From the very first edition, The Little SAS Book always covered PROC SQL. But it was in an appendix, and over time we noticed that most people ignore appendices. So for this edition, we removed the appendix and added new sections on using PROC SQL to:

• Subset your data
• Join data sets
• Add summary statistics to a data set
• Create macro variables with the INTO clause

For people who are new to SQL, these sections provide a good introduction; for people who already know SQL, they provide a model of how to leverage SQL in your SAS programs.

Updates and additions throughout the book

Almost every section in this edition has been changed in some way. We added new options, made sure everything is up-to-date, and ran every example in every SAS interface noting any differences. For example, PROC SGPLOT has some new options, the default ODS style for PDF has changed, and the LISTING destination behaves differently in different interfaces. Here’s a short list, in no particular order, of new or expanded topics in the sixth edition:

• More examples with permanent SAS data sets, CSV files, or tab-delimited files
• More log notes throughout the book showing what to look for
• LIKE or sounds-like (=*) operators in WHERE statements
• CROSSLIST, NOCUM, and NOPRINT options in PROC FREQ
• Grouping data with a user-defined format and the PUT function
• Iterative DO groups
• DO WHILE and DO UNTIL statements
• %DO statements

Even though we have added a lot to this edition, it is still a little book. In fact, this edition is shorter than the last—by 12 pages! We think this is the best edition yet. For a sneak preview check out the free book excerpt. You can also learn more about SAS Press, check out the up-and-coming titles, and to exclusive discounts -- make sure to subscribe to the newsletter.

The Little SAS Book 6.0: The best-selling SAS book gets even better was published on SAS Users.

10月 282019
 

If you're looking for advice on developing an analytics strategy, there's no shortage of resources, including this from SAS: Building your data and analytics strategy.  If, on the other hand, you're looking for advice on how to apply analytics to strategic planning, your search has likely to come up wanting.  [...]

8 ways analytics can support strategic planning and decision making was published on SAS Voices by Leo Sadovy