5月 042018
 

Landmark population health study involving 50,000 northern Nevadans turns to SAS to reveal hidden health insights These are exciting times in health care. Earlier this week I read that the NIH will soon open enrollment for its “All of Us” initiative. The research program aims to compile the genetic and health [...]

Healthy Nevada Project drives life-changing results was published on SAS Voices by Saurabh Gupta

5月 032018
 

Have you ever lied about your age? When you were younger, perhaps you exaggerated your age to watch an R-rated movie, buy cigarettes, get into a night club, or drink alcohol? And when people reach their 30s or 40s, they might subtract a few years when people ask their age. [...]

The post Proof that Southerners lie about their age! appeared first on SAS Learning Post.

5月 022018
 

Oklahoma State University (OSU) has corralled its data faster than a tumbleweed in a whirlwind, and has bold plans to transform its institutional research efforts. I recently met with OSU's Institutional Research and Information Management (IRIM) team, which provides information, research, decision support, and analysis on demand to the OSU [...]

Oklahoma State University visualizes data ‘til the cows come home was published on SAS Voices by Georgia Mariani

5月 022018
 

Order matters. When you create a graph that has a categorical axis (such as a bar chart), it is important to consider the order in which the categories appear. Most software defaults to alphabetical order, which typically gives no insight into how the categories relate to each other.

Alphabetical order rarely adds any insight to the data visualization. For a one-dimensional graph (such as a bar chart) software often provides alternative orderings. For example, in SAS you can use the CATEGORYORDER= option to sort categories of a bar chart by frequency. PROC SGPLOT also supports the DISCRETEORDER= option on the XAXIS and YAXIS statements, which you can use to specify the order of the categories.

It is much harder to choose the order of variables for a two-dimensional display such as a heat map (for example, of a correlation matrix) or a matrix of scatter plots. Both of these displays are often improved by strategically choosing an order for the variables so that adjacent levels have similar properties. Catherine Hurley ("Clustering Visualizations of Multidimensional Data", JCGS, 2004) discusses several ways to choose a particular order for p variables from among the p! possible permutations. This article implements the single-link clustering technique for ordering variables. The same technique is applicable for ordering levels of a nominal categorical variable.

Order variables in two-dimensional displays

I first implemented the single-link clustering method in 2008, which was before I started writing this blog. Last week I remembered my decade-old SAS/IML program and decided to blog about it. This article discusses how to use the program; see Hurley's paper for details about how the program works.

To illustrate ordering a set of variables, the following program creates a heat map of the correlation matrix for variables from the Sashelp.Cars data set. The variables are characteristics of motor vehicles. The rows and columns of the matrix display the variables in alphabetical order. The (i,j)th cell of the heat map visualizes the correlation between the i_th and j_th variable:

proc iml;
/* create correlation matric for variables in alphabetical order */
varNames = {'Cylinders' 'EngineSize' 'Horsepower' 'Invoice' 'Length' 
            'MPG_City' 'MPG_Highway' 'MSRP' 'Weight' 'Wheelbase'};
use Sashelp.Cars;  read all var varNames into Y;  close;
corr = corr(Y);
 
call HeatmapCont(corr) xvalues=varNames yvalues=varNames 
            colorramp=colors range={-1.01 1.01} 
            title="Correlation Matrix: Variables in Alphabetical Order";
Heat map of correlation matrix; variables in alphabetical order

The heat map shows that the fuel economy variables (MPG_City and MPG_Highway) are negatively correlated with most other variables. The size of the vehicle and the power of the engine are positively correlated with each other. The correlations with the price variables (Invoice and MSRP) are small.

Although this 10 x 10 heat map visualizes all pairwise correlations, it is possible to permute the variables so that highly correlated variables are adjacent to each other. The single-link cluster method rearranges the variables by using a symmetric "merit matrix." The merit matrix can be a correlation matrix, a distance matrix, or some other matrix that indicates how the i_th category is related to the j_th category.

The following statements assume that the SingleLinkFunction is stored in a module library. The statements load and call the SingleLinkCluster function. The "merit matrix" is the correlation matrix so the algorithm tries to put strongly positively correlated variables next to each other. The function returns a permutation of the indices. The permutation induces a new order on the variables. The correlation matrix for the new order is then displayed:

/* load the module for single-link clustering */
load module=(SingleLinkCluster);
permutation = SingleLinkCluster( corr );  /* permutation of 1:p */
V = varNames[permutation];                /* new order for variables */
R = corr[permutation, permutation];       /* new order for matrix */
call HeatmapCont(R) xvalues=V yvalues=V 
            colorramp=colors range={-1.01 1.01} 
            title="Correlation: Variables in Clustered Order";
Heat map of correlation matrix; variables in clustered order

In this permuted matrix, the variables are ordered according to their pairwise correlation. For this example, the single-link cluster algorithm groups together the three "size" variables (Length, Wheelbase, and Weight), the three "power" variables (EngineSize, Cylinders, and Horsepower), the two "price" variables (MSRP and Invoice), and the two "fuel economy" variables (MPG_Highway and MPG_City). In this representation, strong positive correlations tend to be along the diagonal, as evidenced by the dark green colors near the matrix diagonal.

The same order is useful if you want to construct a huge scatter plot matrix for these variables. Pairs of variables that are "interesting" tend to appear near the diagonal. In this case, the definition of "interesting" is "highly correlated," but you could choose some other statistic to build the "merit matrix" that is used to cluster the variables. The scatter plot matrix is shown below. Click to enlarge.

Scatter plot matrix of 10 variables. The highly correlated variable tend to appear together so that diagonal scatter plot exhibit high correlation.

Other merit matrices

I remembered Hurley's article while I was working on an article about homophily in married couples. Recall that homophily is the tendency for similar individuals to associate, or in this case to marry partners that have similar academic interests. My heat map used the data order for the disciplines, but I wondered whether the order could be improved by using single-link clustering. The data in the heat map are not symmetric, presumably because women and men using different criteria to choose their partners. However, you can construct the "average homophily matrix" in a natural way. If A is any square matrix then (A` + A)/2 is a symmetric matrix that averages the lower- and upper-triangular elements of A. The following image shows the "average homophily" matrix with the college majors ordered by clusters:

The heat map shows the "averaged data," but in a real study you would probably want to display the real data. You can see that green cells and red cells tend to occur in blocks and that most of the dark green cells are close to the diagonal. In particular, the order of adjacent majors on an axis gives information about affinity. For example, here are a few of the adjacent categories:

  • Biology and Medical/Health
  • Engineering tech, computers, and engineering
  • Interdisciplinary studies, philosophy, and theology/religion
  • Math/statistics and physical sciences
  • Communications and business

Summary

In conclusion, when you create a graph that has a nominal categorical axis, you should consider whether the graph can be improved by ordering the categories. The default order, which is often alphabetical, makes it easy to locate a particular category, but there are other orders that use the data to reveal additional structure. This article uses the single-link cluster method, but Hurley (2004) mentions several other methods.

The post Order variables in a heat map or scatter plot matrix appeared first on The DO Loop.

5月 022018
 

During SAS Global Forum 2018, SAS instructor Charu Shankar sat down with four SAS users to get their take on what makes them a SAS user. Read through to find valuable tips they shared and up your SAS game. I’m sure you will come away inspired, as you discover some universal commonalities in being a SAS user.

The post What makes a SAS user? Order, logic and magic: Louise Hadden appeared first on SAS Learning Post.

5月 012018
 

As health care evolves, its entire ecosystem – from payers and providers to pharmaceutical companies and government agencies – seeks to find common ground. More data is available than ever. But transforming information into innovation is challenging. Organizations strive to create shared goals, internally and externally, trying to improve patient [...]

How will IoT and AI drive transformation in health care and life sciences? was published on SAS Voices by Kristine Vick

5月 012018
 

The title of this blog says what you really need to know: SAS Enterprise Guide does have a future, and it's a bright one. Ever since SAS Studio debuted in 2014, onlookers have speculated about its impact on the development of SAS Enterprise Guide.

I think that we have been consistent with our message that SAS Enterprise Guide serves an important purpose -- a power-user interface for SAS on the desktop -- and that the product will continue to get support and new features. But that doesn't stop folks from wondering whether it might meet sudden demise like a favorite Star Wars or Game of Thrones character.

I recently recorded a session with Amy Peters, the SAS product manager for SAS Enterprise Guide and SAS Studio. Amy loves to meet with SAS users and hear their successes, their concerns, and their ideas. Her enthusiasm for SAS Enterprise Guide comes through in this video, even as I bumble my way through the prototype of the Next Big Release.

Coming soon: the features of a modern IDE

In addition to a much-needed makeover and modern appearance, the new version of SAS Enterprise Guide (scheduled for sometime in 2019) addresses many of the key requests that we hear from SAS users. First, the new version blows open the window management capabilities. You can open and view many items -- programs, data, log, results -- at the same time, and arrange those views exactly as you want. You can spread your workspace over multiple displays. And you can tear away or dock each item to suit your working style.

Screenshot of Future EG

(in development) screenshot of SAS Enterprise Guide

Second, you can decide whether you want to work with a SAS Enterprise Guide project -- or just simply write and run code. Currently you must start with a project before you can create or open anything else. The new version allows you leverage a project to organize your work...or not, depending on your need at the moment.

And finally, you can expect more alignment and collaboration features between SAS Studio and SAS Enterprise Guide. We see that more users find themselves using both interfaces for related tasks, and presenting a common experience is important. SAS Studio runs in your browser while SAS Enterprise Guide works on your desktop. Each application has different capabilities related to that, but there's no reason that they need to be so different, right?

For more information about what the future will bring, check out the communities article that recaps the SAS Global Forum 2018 presentation. It includes an attached presentation slide deck with many exciting screenshots and roadmap details. All of this is subject to change, of course (including release dates!), but I think it's safe to say the future is bright for SAS users who love their tools.

The post A productive future for SAS Enterprise Guide appeared first on The SAS Dummy.

4月 302018
 

How old was the oldest person in your family, or the oldest person you personally know? And how do they compare to the oldest people in the world? ... Perhaps you can easily make the comparison, with this cool graph! But before we get started, here's a picture of my [...]

The post Keeping track of the oldest people in the world appeared first on SAS Learning Post.