Data Visualization

August 8, 2011
At a recent meeting with Alan Brown of Syngenta, one of the world’s leading agribusinesses, he described how his role has evolved from being a chemist into being a global statistical troubleshooter. In this role, Alan helps other parts of Syngenta, notably production, to increase yields and resolve problems through [...]
July 20, 2011
With all the fanfare in New York for Derek Jeter reaching the 3,000-hit mark, I couldn’t resist the opportunity to gather some data to put this rare event in perspective. First off, let me qualify my interest and passion around this topic. I was born and raised in Pelham, New York, which is roughly 15 minutes from Yankee Stadium. I grew up following Mickey Mantle, Roger Maris, Yogi Berra and Whitey Ford in the 1960s. My passion for baseball and the Yankees is clearly a result of those memorable early days of going to see them with my Dad.

Much later on in my life, I embarked on a fantasy baseball endeavor with my Eastman Kodak colleagues, and I have been active since 1988 with this hobby. One of the most memorable moves that I made in our keeper fantasy baseball league was to trade Cal Ripken for Derek Jeter during his rookie season. The rest, as they say, is history. Derek Jeter has been the JMP Yankees shortstop his entire career, and the JMP Yankees have been a consistent championship contender year in and year out.

Visualizing the data on the 3,000-hit club in JMP only increased my appreciation of Derek Jeter’s career. My first observation is that the 28 players who have reached the lofty milestone of 3,000 hits did so at a mean age of 39. Jeter is 37.

[Figure: JMP graph of the distribution of ages of players who have reached 3,000 hits]

Further evaluation of the distribution of the year in which each player reached 3,000 hits led to another interesting observation. When one fits the distribution in JMP, a two-component normal mixture fit results. Curiously enough, MLB expansion began in the 1960s, and the number of teams has since increased from 16 to 30. The number of players to reach 3,000 hits was 8 before 1960 and 19 since 1960. One could argue that pitching dilution is the reason for this.

I also thought it would be cool to visualize the data utilizing Graph Builder in JMP, adding a reference line for the mean age of 39, and also to see the frequency of the 3,000-hit event over time.

Quite honestly, even though I am a biased Yankee fan, this graphic really makes me appreciate how extraordinary a player Derek Jeter is and how glad I am that I was able to lure him away for Cal Ripken a long time ago.

EDITOR'S NOTE July 27: The JMP data table for the 3,000-hit club is now available in the JMP File Exchange. (You need a SAS login to get the free download.)
July 11, 2011
As data volumes continue to grow, analysts, business users and students must rely on increasingly sophisticated techniques to extract meaningful, actionable information from their data. JMP is proof that they won’t have to sacrifice ease of use for predictive power: With JMP, popular data mining and forecasting tools are accessible to students, teachers and professionals in a wide variety of disciplines.

The Partition and Stepwise Regression platforms help users determine which of the explanatory variables in their data offer the most insight, while tools such as Principal Components Analysis and Factor Analysis enable users to make better sense of “wide” data sets (often involving dozens or even hundreds of variables) by constructing a handful of new variables, losing very little predictive power in the process. JMP’s forecasting tools, which include ARIMA and exponential smoothing models, help to predict what the future values of a series may look like, given its past values.
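To make the forecasting idea concrete, here is a minimal Python sketch of simple exponential smoothing, the most basic member of the smoothing family mentioned above. It is an illustration only, not how JMP implements its models: the function names, the choice to start the level at the first observation, and the sample numbers in the usage note are all assumptions made for this example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation in `series`.

    `alpha` is the smoothing weight in (0, 1]: values near 1 track the
    latest observation closely, values near 0 change slowly.
    Assumption for this sketch: the level starts at the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    levels = [series[0]]
    for value in series[1:]:
        # New level = weighted average of the new value and the old level.
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels


def forecast_next(series, alpha):
    """One-step-ahead forecast: the most recent smoothed level."""
    return simple_exponential_smoothing(series, alpha)[-1]
```

For example, with the made-up series `[10, 20, 30]` and `alpha = 0.5`, the smoothed levels are 10, 15 and 22.5, so the forecast for the next period is 22.5. The forecast lags behind the upward trend, which is why trending or seasonal series usually call for richer smoothing models or ARIMA.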

Now, we’ve made it even easier for new users of JMP to harness the power of these techniques, by adding to our collection of One-Page Guides, which are available in the Learning Library. These One-Page Guides offer instructions, screenshots and tips for analyzing data with JMP. More than 40 are already posted, with more on the way. The latest additions include:

• Classification trees
• Regression trees
• ARIMA models
• Smoothing models
• Principal Components Analysis
• Factor Analysis
• Clustering
• Stepwise Regression
• Discriminant Analysis
• Neural Networks

Check them out and get started!
June 29, 2011
Journal of Quality Technology has appointed Bradley Jones as its new editor starting next year. JQT is a journal published by ASQ that emphasizes applied techniques in industrial statistics, including experimental design, Brad’s specialty.

Brad’s current role at SAS is Principal Research Fellow at JMP, where he develops software for optimal experimental design, advances research in that and closely related fields, and travels to present at seminars and conferences. Optimal experimental design is extremely important for engineers so that they can learn the most from a given number of runs in an experiment. Experiments are the key to learning how to make the best products, run the best processes, reduce variation and waste, and improve yields and performance.

Congratulations, Brad.
June 21, 2011
JMP wasn’t around when Anne Milley was taking quantitative analytics courses in college.
“Back then, it was all programming,” explained Milley, an economics major. “The visuals were just hideous.”

Today, as senior director of analytic strategy in JMP Product Marketing at SAS, Milley appreciates the value that robust and interactive graphics in JMP bring to the analytical experience.

“Now you can visually wallow in your data,” she says. “You can see what you need to pay attention to immediately, visually, and that’s wonderful.” Outliers and trends that can remain hidden in columns of numbers become apparent immediately when displayed graphically.

With organizations today collecting more data than ever before, the ability to more quickly derive meaning from vast stores of raw data is becoming increasingly important. Milley advocates a strategic approach to analytics that increases its value across the enterprise.

She enjoys helping organizations increase their “analytic bandwidth” to get more bang from their technology buck by creating analytic centers of excellence – internal teams that promote the strategic use of analytics to support enterprise goals. That will be the topic of her complimentary SAS TALKS webcast at 1 p.m. ET on Thursday, June 23.

Tune in and you’re likely to hear Milley refer to the 80-20 rule: Most analysts spend 80 percent of their time preparing their data and only 20 percent of their time exploring and analyzing it to make discoveries. She’ll talk in the webcast about strategies for shifting that ratio in favor of analysis, including ways to communicate results.

“Matching the right technology paradigms to the right skill sets is very important,” she explains. “Some people prefer to write code, some prefer to work visually, and some like both.” JMP lets users choose which approach to take. Using JMP, programmers can connect to SAS and R, while users who need graphical representations can use JMP with Excel to make data more interactive.

Whichever approach individual users take, finding ways to use data to greater competitive advantage is a universal goal. “Analytics is relevant everywhere, and it’s something that you can’t get away from, so you might as well recognize its power and invest in it,” says Milley.
June 15, 2011
Kaiser Fung is a statistician with more than a decade of experience in applying statistical methods to unlocking the relationship between marketing and customer behaviors. He leads a team of statisticians at Sirius XM Radio responsible for gaining insight into customers and operational best practices.

You may know him from his popular blog, Junk Charts, which pioneered the critical examination of data and graphics in the mass media. He was also a keynote speaker at the JMP Discovery Summit in 2010, at which he gave a speech that asked “What Happens After the Math Is Done?” And he is the author of the book Numbers Rule Your World. (We've got 25 signed copies for a special JMP Blog giveaway. Details are below.)

Fung believes the analytics community should spend more time thinking about how to most effectively support data-driven decision making -- and he explains this point in a new story "On a mission to make more practical sense of data":

    Information-design guru Edward Tufte threatens that every time someone gives a PowerPoint presentation, he will kill a kitten. Kaiser Fung knows Tufte is only kidding, but Fung is still concerned.

    Fung has seen too much not to be concerned – too many slides jammed with absurd amalgamations of information.

    There’s so much information to ponder, so readily at hand. But insights are often lost in the clutter or lack practical value.

    Don’t get Fung wrong: He is in favor of data and data-driven decision making. He worries about how presenters lose the audience when conveying quantitative information.

    “A lot of things are needed in order to bring insights about numbers to the point where you can get nontechnical people to move in a direction that is essentially driven by evidence and not simply by data,” Fung says.

Read the whole story, including a sidebar about the role of JMP in Fung's work.

And if you are among the first 25 people to comment on this blog post explaining your strategies for promoting data-driven business decision making in your organization, you could win a signed hardcover copy of Fung's book, Numbers Rule Your World. Your contribution to this discussion should be between 50 and 75 words long. Be sure to enter your e-mail address when you write your comment so we can contact you if you are a winner. Only one book per commenter.
June 8, 2011
Last week, SAS and JMP hosted Interface 2011: Statistical, Machine Learning, and Visualization Algorithms, 42nd Symposium on the Interface. The deep statistical issues were mostly over my head, but I found enough discussion of visualizations to keep me interested. Leland Wilkinson shared his recent research on Venn diagrams, which may find its way into JMP Genomics, where similar diagrams are already used for some results. I gave a presentation on some of the challenges JMP has faced integrating geographic map visualizations into an interactive statistical product.

Related to mapping, Dan Carr presented some of his work on "micromaps." I have his book, Visualizing Data Patterns with Micromaps, but it was very helpful to see the presentation and talk with him afterward. Micromaps incorporate small maps into arrangements that allow multivariate data exploration. He defines several types of micromaps, and it occurred to me that JMP can already approximate the conditional micromaps in Graph Builder.

Here is a pseudo-micromap of Europe colored by Human Development Index (HDI) for each country. HDI is a development measure used by the United Nations that combines life expectancy, education and income. The grouping variables condition the maps and make it easier to see the relationships between the grouping variables, the coloring variables and the countries.

Each country appears in one panel. In this case, we can see that lower HDI appears to be correlated with lower GDP per capita since countries in the top three panels are generally bluish, and because it's a map, we can also see that those countries are in eastern Europe. I don't see a similar pattern going across, suggesting median age is not correlated with HDI.
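The conditioning step behind this kind of display can be sketched in a few lines of Python. This is only an illustration of the idea, not Graph Builder's actual logic: the function names are hypothetical, and splitting each grouping variable into three rank-based bins is an assumption made for the example.

```python
def tertile_bins(values):
    """Map each value to bin 0, 1 or 2 (low, middle, high) by its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Integer arithmetic splits the ranks into three near-equal groups.
        bins[idx] = rank * 3 // len(values)
    return bins


def assign_panels(names, row_values, col_values):
    """Place each item in exactly one (row_bin, col_bin) panel.

    `row_values` and `col_values` are the two conditioning variables,
    for instance GDP per capita and median age.
    """
    rows = tertile_bins(row_values)
    cols = tertile_bins(col_values)
    panels = {}
    for name, r, c in zip(names, rows, cols):
        panels.setdefault((r, c), []).append(name)
    return panels
```

Calling `assign_panels(["A", "B", "C", "D", "E", "F"], [1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1])` with these placeholder values puts A and B in panel (0, 2), C and D in panel (1, 1), and E and F in panel (2, 0). Each panel would then draw the full map but color only its own members, which is what lets a pattern across rows or columns stand out.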

My mock-up misses some key micromap features, including:

  • simple, reduced-detail maps
  • enlarged tiny countries
  • sliders to adjust the grouping categories
  • categorical coloring to simplify comparison

We hope to address those with a JMP add-in. Let us know if you have a particular interest in micromaps.
June 2, 2011
We are seeing lots of interest in a new illustrated white paper that is available for download: Moving from SPSS to JMP: A Transition Guide by Dr. Jason Brinkley of the Department of Biostatistics at East Carolina University.

As its title indicates, the purpose of the paper is "to transition users who are familiar with SPSS to performing analysis in JMP." Dr. Brinkley accomplishes this through the use of an example that is independent of either SPSS or JMP. And he shows the latest versions of both programs: SPSS 19 and JMP 9.

"Looking across several instances, we start to see a pattern emerge in the overall differences between JMP and SPSS. Point and click in SPSS is a mechanism for generating SPSS code, so users decide which analysis and options they want to perform and then submit the generated code to obtain the output. The code and results are listed in the output file, which can be saved, copied or manipulated. By contrast, JMP dynamically links the data and reports in order to create an interaction; users start with a general area of analysis and then are allowed to customize output to add different features or analytics. The dynamic link between data and output makes exploring unusual observations very simple, and the interactivity of features such as Graph Builder allows users to create and update visuals in real time," Dr. Brinkley writes in the paper's conclusion.

Here are the main topics he covers in the paper:

  • Importing and Cleaning Data
  • Visualization
  • Descriptive Statistics
  • Custom Tables
  • Correlation
  • Inference, including Two-Sample t-Test, Crosstabs/Contingency Tables and Linear Regression
  • Data Manipulation, including Creating New Variables and File Splitting
  • Saving and Reproducing Output

Get your free copy of the paper and make your own transition from SPSS to JMP easier!