Data Visualization

May 21, 2011
 
May 12 is becoming the annual event for data visualisation, with world-renowned visualisation guru Stephen Few presenting in London on that day for the second year running. It was perfect timing for him to visit the UK, the home of David McCandless, following Stephen's recent blog post that has rocked the world of information graphics.

Author of leading visualisation books such as Show Me the Numbers and Now You See It, Stephen presented a message on May 12 that was music to my ears: In this world of increasingly complex data, keep your visuals simple and accurate if you want to get your point across.

JMP had a simple set of messages that expanded on Stephen’s theme:

  • JMP allows you to easily see patterns in your data.
  • As your data size and complexity grow, you can bring in JMP’s analytics to make this manageable (and even extend it to predictive modelling with JMP Pro).
  • If you want to explain your data to clients or executives, use JMP to help you tell the story.

This last point is a particularly interesting one. Much of JMP’s history has been about using the software to discover (see Discovery Summit for evidence); what we neglected to mention was that it is one thing to discover insights, and it is another to reveal them to the world in a way that meets the high expectations of the digital-content audience. One analyst in a marketing communications company said to me, “The designers in our company treat us as second-class citizens; it’s cool to have a tool that makes them go ‘wow’ when I show something!” So there you have it: JMP is “eye candy for analysts”!
May 12, 2011
 
The editor of the KD Nuggets newsletter, Gregory Piatetsky-Shapiro, Ph.D., has attracted quite a number of subscribers over the years with a variety of interesting news items, polls and topics. It's hard to believe he is now posting the 12th annual poll on Data Mining / Analytic Tools Used. Despite the many issues with the data from this admittedly unscientific poll, I was curious to bring this small time series data set into JMP and look at what poll participants had volunteered about their use of modeling tools over more than a decade.

Several of my statistician friends are quick to remind us that no meaningful conclusions can be drawn from such polls because they are fraught with bias and data quality issues. Vendor ballot-stuffing is a particular issue in this poll, which has been commented on in the 2001 and 2003 poll results.

First, what qualifies as a data mining tool is certainly a factor — many statisticians who have been doing predictive modeling for years may not consider themselves as doing data mining. Data mining/Analytics is a very broad cross-disciplinary area. [Full disclosure: Since JMP has included some data mining capabilities starting with JMP 6 (we are now at JMP 9 and also have 64-bit JMP Pro), I asked Gregory if JMP could be included in this year’s survey, and he kindly agreed.]

Second, given the number of new product entrants, exits, acquisitions as well as population/participant changes over time, there are considerable issues in the time dimension of this data. Some of this is evident in the missing data for various vendors. JMP’s Graph Builder shows variation in the number of votes versus percent of total votes each year by vendor.

Graph Builder in JMP displays KD Nuggets poll data

There are almost certainly some fat-fingering mistakes I made in collecting the data from the past poll results on www.kdnuggets.com. And this is by no means an exhaustive list of issues with the data.

Data issues aside, many still find it interesting to look at what information people volunteer and how that may be changing over time. We are by nature curious. I also combined tools in an attempt to reflect acquisitions at a vendor/tool-provider level while still keeping the detailed votes at the product level (some assumptions had to be made given repackaging/naming, but all appears to be directionally correct).

In the bubble plot below (output as Flash), you can play the “data movie” to see a changing blend of colors over time since the bubbles are colored by the tool provider/vendor. The open source tools are showing strong growth, which is consistent with what we observe from other sources, most notably a similar poll done by Karl Rexer of Rexer Analytics. Last year’s results of his poll are available. His survey is now in its fifth year, and active survey links and access codes are on Rexer Analytics and on Dean Abbott’s blog.



Back to the KD Nuggets poll results: using “your own code” could mean commercial, open source or a combination thereof (as is the case with JMP and R, with a growing number of examples of JMP Scripting Language and R on the JMP File Exchange). Many commercial software tools are also showing recent growth. This is consistent with the well-observed trend of organizations leveraging analytics to create more value. By the way, if you want to volunteer your perspectives, here on this blog or in either of these popular data mining/analytics polls, we invite you to participate.
April 21, 2011
 
JMP Product Manager Jeff Perkinson presented the April 14 Mastering JMP session, Using Geographic and Custom Maps. I divided his demo into three parts for you to watch at your leisure.

By default, JMP installs map files in a Maps directory. Each map consists of two JMP data files with a common prefix:

  • -Name file that contains the unique names for the different regions.
  • -XY file that contains the latitude and longitude coordinates of the boundaries.

The two files are implicitly linked by a Shape ID column. For example:

  • The World-Country-Name.jmp file contains the exact names and abbreviations for different countries throughout the world.
  • The World-Country-XY.jmp file contains the latitude and longitude numbers for each country, by Shape ID.

The Shape ID strings in the -Name and -XY .jmp files must match. Hint: If JMP does not recognize the names in your data, check the built-in map files for spelling and names that JMP recognizes. For example, United Kingdom is a country name, but Great Britain is not.
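The name check in the hint above can be sketched outside JMP. This is only an illustration in Python, not how JMP does its matching: the helper `unmatched_names` and the short list of region names are hypothetical, and it assumes names must match exactly.

```python
def unmatched_names(data_names, map_names):
    """Return the names in the data that have no exact match among the map's region names."""
    known = set(map_names)
    return [name for name in data_names if name not in known]


# Hypothetical excerpt of region names, standing in for a -Name file
map_names = ["United Kingdom", "France", "Germany"]

# Names as they might appear in your own data table
data_names = ["Great Britain", "France"]

missing = unmatched_names(data_names, map_names)
print(missing)
```

Any name the check reports is one to look up in the built-in -Name file and correct in your data.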

Here are some of the questions that arose during the webcast and corresponding answers provided by Jeff and JMP Developer Xan Gregg.

Q: Do the built-in maps use the 1984 WGS projection definitions?
A: JMP uses one of two projections when the axes are set to "geodesic" scaling. For the whole world, JMP uses a projection called Kavrayskiy VII, which is similar to that used by National Geographic. For more local areas, like the US Map, JMP uses the Albers equal-area projection used by the US Geological Survey. That method lets JMP compare filled areas accurately.

Q: What are Choropleth maps?
A: Choropleth maps, such as those you can display in JMP using .shp files, shade or give patterns to areas in proportion to the measurement of the statistical variable being displayed on the map. This lets you visualize differences across a geographic area.

Q: Do you have a list of ZIP codes with longitude and latitude built into JMP or know where I can find them?
A: ZIP code shapes are not included with JMP. The .shp files for them can be downloaded from the US Census Bureau (http://www.census.gov/geo/www/cob/dv2000.html) one state at a time.

Q: Do Simple Earth and Detailed Earth maps show detail to 1 square mile?
A: Simple Earth and Detailed Earth do not have high enough resolution to draw an area of 1 square mile. JMP will attempt to display it, but you will see “fat pixels.” To zoom in to that level of detail, you may try one of the maps offered on the NASA server.

Q: Does JMP support SAS/GRAPH maps?
A: SAS/GRAPH includes map data sets that can be converted for use as shape (.shp) files with JMP. These data sets are in the JMP Maps library. They come as a pair of data sets: the traditional map data set contains the XY coordinate data, and the feature table contains the common place names.

Most of the traditional map data sets include unprojected latitude and longitude variables in radians. They can be used with JMP after they’ve been converted to degrees and the longitude variable has been adjusted for projection.
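As a rough illustration of the unit conversion mentioned above, here is a minimal Python sketch. It is not the converter from the JMP File Exchange. The function name `to_degrees` is hypothetical, and the optional sign flip is an assumption about how longitude is stored in the source data; check your own data set before relying on it.

```python
import math


def to_degrees(x_rad, y_rad, flip_longitude=True):
    """Convert one unprojected (longitude, latitude) pair from radians to degrees.

    flip_longitude is an assumption: it negates the longitude for data sets
    that store west longitudes as positive values.
    """
    lon = math.degrees(x_rad)
    lat = math.degrees(y_rad)
    if flip_longitude:
        lon = -lon
    return lon, lat


# A point stored as (pi/2, pi/4) radians
lon, lat = to_degrees(math.pi / 2, math.pi / 4)
print(lon, lat)
```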

To convert SAS maps, you can download the SAS map to JMP map converter from the JMP File Exchange using your SAS login.

Q: In the past, SAS GIS has had difficulty importing .shp files that are donuts, or polygons within polygons. Does JMP overcome that issue?
A: JMP does support shapes with holes. South Africa is one example. Also, SAS/GRAPH includes a number of map data sets that can be converted for use as shape files with JMP. These data sets are in the JMP Maps library.

Q: Please share with us the steps for adding a .shp file.
A: Mike Vorburger's blog post describes the steps and gives information about locating .shp files.

Q: Where do we save our .shp files?
A: Save the .shp files to the same place as the built-in files, in the Maps folder, and JMP will automatically find them. The locations differ according to operating system:
  • Windows XP: \Local Settings\Application Data\SAS\JMP\Maps
  • Windows Vista: \AppData\Local\SAS\JMP\Maps
  • Macintosh: /Library/Application Support/JMP/Maps

    Q: Where can we locate other .shp files?
    A: Some government agencies, such as the US Census Bureau, offer free files, including a variety of recent TIGER/Line® Shapefiles that contain features such as roads, railroads and rivers, as well as legal and statistical geographic areas. Commercial Geographic Information System businesses, such as ESRI with its ArcGIS map service, also offer map shapes.

    Q: Jeff said that the NASA map server changed location after JMP 9.0 was released, so the JMP built-in link to the NASA server is no longer accurate. When will it work?
    A: We updated JMP 9.0.2 to link to the new NASA server location. You can download the update by May 1 from the JMP Software Updates page in the Support area of our website.

    A FINAL TIP: Want to set JMP to look for updates today or regularly (daily, weekly or monthly)? Just specify your wish using JMP File>Preferences>JMP Updates.
    April 13, 2011
     
    I have always been interested in studies that rank US cities according to a variety of attributes. It’s one reason I was drawn to the Raleigh, North Carolina, area more than a decade ago. These studies typically rate cities based on income, job growth, taxes, crime, etc.

    Kiplinger, a Washington, D.C.-based publisher of business forecasts and personal finance advice, recently published a new one with a different spin on the criteria. The process Kiplinger used is described below:

    “Our process is based on the work of Kevin Stolarick, of the Martin Prosperity Institute, a think tank that studies economic prosperity. Stolarick came up with a formula that identifies cities with current and likely future growth in high-quality jobs and income. We also weighed affordability and public-transit infrastructure – the latter being an important factor to ensure continued growth in certain metro areas.

    Stolarick also included in the formula a measurement of the 'creative class,' a product of his work with Richard Florida, academic director of the Martin Institute and author of The Rise of the Creative Class. Creative-class workers – scientists, engineers, educators, writers, artists, entertainers and others – inject both economic and cultural vitality into a city and help make it a vibrant place to live.”

    The results for 2010 were published in a less-than-informative table. So I downloaded the data into JMP and created the visual below using a Bubble Plot.

    JMP Bubble Plot of all metro regions

    Why a Bubble Plot and not a map of the US? Well, in this case there were too many variables to adequately show on a map. However, I’m open to suggestions.

    Even though there’s no time variable, the Bubble Plot can show the data more elegantly than any other graph type I could think of. Each bubble represents a metro region, sized by population and colored by percent of creative class. The Y-axis is the cost of living, and the X-axis is the median household income.

    The first thing that jumps out at the viewer is the large bubble well north of any other. While I’m sure you can guess the city, I’ll bet you’re as surprised as I am that no other city even comes close to it. So which city sits atop the others like the Sun?

    JMP Bubble Plot of all metro regions but with the highest one highlighted to show that it's New York and Northern New Jersey

    It's New York and Northern New Jersey, of course. That metro region sits up there because the study rates its cost of living at 400% of the US average. Yes, that’s right – 400%. I went back and checked the number, and accurate or not, it's what the study shows.

    Now, the sweet spot of the graph is the lower right side with colored bubbles as close to red as possible. This will yield cities that have a low cost of living, high average income per household and a rich creative class. Using the Data Filter in JMP, I zeroed in on those cities.

    JMP Data Filter

    First, I selected cities with no more than 125% cost of living. Then, I narrowed those to cities with a minimum of $50,000 household income. Lastly, I restricted my view to those that had at least 30% creative citizens.
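For readers who want to try the same selection outside the Data Filter, here is a minimal Python sketch. It reads the three criteria above as conditions that must all hold, which is my assumption. The function `sweet_spot`, the field names and the four rows are made up for illustration and are not the Kiplinger figures.

```python
def sweet_spot(cities, max_cost=125, min_income=50_000, min_creative=30):
    """Return the metro names that satisfy all three thresholds at once."""
    return [
        c["metro"]
        for c in cities
        if c["cost_of_living"] <= max_cost
        and c["income"] >= min_income
        and c["creative_pct"] >= min_creative
    ]


# Made-up rows for illustration only
cities = [
    {"metro": "Metro A", "cost_of_living": 110, "income": 62_000, "creative_pct": 34},
    {"metro": "Metro B", "cost_of_living": 400, "income": 64_000, "creative_pct": 35},
    {"metro": "Metro C", "cost_of_living": 95, "income": 45_000, "creative_pct": 31},
    {"metro": "Metro D", "cost_of_living": 120, "income": 55_000, "creative_pct": 25},
]

selected = sweet_spot(cities)
print(selected)
```

Loosening any one threshold in the call shows how sensitive the shortlist is to that criterion.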

    So which metro area should I move to, based on those criteria?

    The winner is Oxnard-Thousand Oaks-Ventura, California...

    Bubble Plot in JMP with Oxnard-Thousand Oaks-Ventura, California highlighted

    Followed by Minneapolis-St Paul-Bloomington, Minnesota-Wisconsin...

    Bubble Plot in JMP with Minneapolis-St Paul-Bloomington, MN-WI highlighted

    But wait ... the Raleigh-Cary, North Carolina, area where I now live is in a pretty decent spot.

    Bubble Plot in JMP with Raleigh-Cary, NC, highlighted

    So, I guess I’ll stay put.

    If you want to explore this data yourself, visit the Best Cities 2010 page at the Kiplinger website.

    And if you find a better way to show it, post a comment.
    April 12, 2011
     
    If you’ve been to the Discovery Summit before, you know it’s not one to miss. Registration is now open for the analytics event of the year, Sept. 13-16 in Denver.

    If you haven’t attended, consider what you’ll get out of the experience.

    10 event takeaways
    • Breakout presentations from some of the finest analysts.
    • Keynotes from authorities in statistics, technology and innovation.
    • An expanded network that includes JMP users from all industries.
    • Proven statistical techniques for thoroughly exploring your data.
    • Time with JMP developers to investigate ideas and issues.
    • An appreciation for a wide variety of JMP applications.
    • Hands-on practice with JMP when you sign up for pre-conference training.
    • Memories from an evening at Red Rocks Amphitheatre and a Colorado Rockies game.
    • Inspiration for spreading analytics across your organization.
    • One more item checked off your bucket list!

    What will you put in?
    The conference registration fee is only $500. And many JMP users will qualify for discounted pricing through one of the following programs:

    Corporate Colleague Discount: Has a colleague from your organization registered at full price? If so, you can sign up at half the price.
    JMP or SAS® Users Group Member Discount: Are you a current member of a registered users group? If so, you can sign up at half the price.
    Government and Academic Discount: Do you teach or take classes full time? If so, you can sign up at half the price with a school-affiliated email address. Are you a government employee? Sign up at half the price with a government email address.

    Our keynote speakers are amazing. We’re not just blowing hot air! Take a look:


    Jonah Lehrer
    The Popular Science Prodigy


    Jeffrey Ma
    The Real-Life Inspiration for the Film 21


    John Sall
    The Co-Founder of SAS, No. 1 on Fortune’s ‘100 Best Companies’ List


    David Salsburg
    The Storied Statistician

    And stay tuned for announcements regarding our steering committee’s call for papers decisions. Their choices will round out the week of talks with presentations from JMP users.

    We hope to see you at Discovery Summit 2011!
    April 7, 2011
     
    Contributed by Bill Roehl, Data Geek at Capella University (@garciasn)

    Analysts love raw data, and end-users love to see that same data displayed in beautiful charts and pictures with exciting color. Dr. Danni Bayn, a Research Analyst at Capella University in Minneapolis, provides a drop-in method for SAS users to present data in ways that end-users will most definitely appreciate. Bayn explained that through the use of Circos, data visualization software written in the architecture-independent Perl scripting language, SAS programmers can provide exactly what end-users are looking to see. Thankfully, she gives a step-by-step SAS program detailing everything a SAS user needs to pipe their boring tabular data out into thrilling and informative circular layouts.

    Beginning with an example taken from Wired magazine, carrying a hefty warning about the possible spoiler it may be if you have not yet seen the end of the popular television series Lost, Bayn provided a brief glimpse of the capabilities of the Circos software package. She showed the tool’s powerful capability for drilling down within the chart to display complex associations between the show’s characters. Bayn provided a fun and exciting look into the package’s true power through popular culture.

    In her paper, Bayn notes that Circos is rarely used outside genomics and high-profile media pieces because it is complex to install and operate. She provides a method, executable almost entirely within SAS, for installing the Circos software and converting SAS data sets into a format that the tool can import.

    Utilizing four descriptive macros, the software is installed and operated without the need for anything more than the most basic knowledge of SAS programming. While the tool itself is powerful, Bayn admits she has only harnessed a small portion of its true power through the development of her macros. However, she promised that when time permits she would like to expand on the capabilities already present and provide SAS users with even more options with which to make stunning circular data visualizations.

    Bayn’s excellent presentation spawned several after-talk discussions about how Circos may be used by others to provide new and informative ways to display their own tabular data. This is exactly why SAS Global Forum exists: to provide a single location for SAS users to learn from one another and take home with them new ways to perform their job better.

    Way to go, Danni!
    March 23, 2011
     

    I believe I would have interviewed AnnMaria De Mars even if you hadn't sent me scads of e-mails and tweets suggesting her as a perfect candidate for the SAS Rock Stars series. I "met" AnnMaria when I started looking for SAS users on Twitter – nearly three years ago while preparing for my first SAS Global Forum conference. I was a newbie at SAS back then. My real introduction to AnnMaria was through her blog. After you've read it, you'll see why so many people who care nothing about statistics read AnnMaria's blog – it's because of her gift for storytelling and knack for cutting through life's pettiness.


    For today’s interview, I’ve caught up with Dr. De Mars as she waits in her Las Vegas hotel room to watch her daughter fight. Among AnnMaria’s many accomplishments, she is a former gold medalist in the World Judo Championship. Now, she has the good fortune to see that passion in her third daughter, a professional fighter in mixed martial arts.



    1. I usually get to know SAS users as professionals first, by what they do. So, who is AnnMaria? Tell me about your formal role using SAS – teaching and consulting.
      Until a few months ago, I was the Senior Statistical Consultant at USC – teaching SAS workshops to faculty, staff and graduate students, writing SAS documentation and providing individual consulting. For the past 25 years, I have been the partner or sole proprietor of one consulting company or another. Currently, I am the President and owner of The Julia Group, a consulting company specializing in statistical consulting, program evaluation and SAS programming. Right now, I am finishing the report on the beneficiary satisfaction survey for Ticket to Work, a work incentive program of the Social Security Administration.


    2. How do you use SAS at The Julia Group?
      As a contractor group, we do a lot of survey analysis using the SURVEYSELECT PROC to select samples and analyze data. We also do a lot of statistical analysis, both for surveys and program evaluations. Analysis of covariance, repeated measures analysis of variance and MANOVA are probably the most frequent techniques for testing whether there are differences between treatment and control covarying for any pre-existing differences, comparing experimental and control group differences on pre-test and post-test or comparing groups for differences on multiple variables. We also do a lot of logistic regression – trying to predict which student will pass, which patient will survive or other categorical outcomes. Data visualization with SAS is also a common usage. We use SAS to present data in a meaningful way to people; we use both SAS/GRAPH® and JMP® for that. 


    3. How long have you been using SAS® Enterprise Guide®? Some SAS users say they like the feeling of Base SAS better. Why did you start using SAS Enterprise Guide?
      I started using SAS when I was pregnant with my first child, so it has been more than 28 years. I started using SAS Enterprise Guide – I believe at version 1 – many years ago. It was so slow, I decided it was a piece of junk and didn’t touch it again until three years ago, when I was at USC. I thought I should at least try everything that the university licensed and make recommendations as to what should be installed everywhere, what should be discontinued and what should continue at the current level of support. The speed had improved dramatically, as had the usability. I still use SAS Enterprise Guide almost as often as I use SAS, though not as much. That probably sounds contradictory. I might use SAS Enterprise Guide to create a graph, and see what it looks like or create a summary table. Or, I might use the characterize data task to get a quick look at the data quality.


    4. Now, can you tell us a little about your social side? I love your blog, and I know many of my colleagues and Twitter friends do, too. How long ago did you start blogging, and what was the impetus?  
      I started blogging because, when I was working for a large organization, I was told by the person in charge of the website that my pages on statistics and statistical software were too informal and too controversial. I was advised that our organization only had one voice and one personality and that was the Chief Information Officer. I was told, "If you want to have your own voice, why don't you start a blog or something! You either need to do that or learn to be like everyone else!"

      I thought it would be a lot easier for me to start a blog than to learn to be like everyone else. This is actually the third blog I have started. The first was for judo, when I was Director of Development for one of the national organizations. I still write that one, too, and I think when some of the people who read my judo blog find and follow me on Twitter as @annmariastat they are probably disappointed there is not much discussion of how to armbar an opponent into submission.


    5. Many readers will not have met you in person, but I have. You are a tiny woman, but I know better than to spar with you. Why did you take up martial arts and how has that impacted your life?
      I was a short, fat kid with thick glasses who sat inside, ate and read books all day. My mom managed to get a YMCA membership one year, drove me to the Y, opened the car door, pushed me out and said, "Go join something." In those pre-Title IX days, few sports accepted girls, but the judo instructor had a sister who had wanted to join. By the time I came along, she was a black belt. I have three brothers, so I was pretty used to fighting, and I was good at judo from day one. I ended up being the first American to win a gold medal in the World Championships. I think it was a positive impact on my life going into fields that have been male-dominated – many of my classes as an undergraduate were 90 percent male. I was an engineer in 1982 when women were even scarcer than they are now. I am so accustomed to being the only woman in the room that I am now happily surprised when there are other women on committees or projects.


    6. SAS sponsors science- and math-related events to encourage young people to aim for careers in math or science. What led you to your field?
      I was good at math through high school, but not as much in college (a combination of being a 16-year-old freshman, full-time work and parties interfered). When I was in graduate school for my PhD, I objected to articles we were assigned that argued Hispanics have a lower average IQ because they are genetically less intelligent. The professor said, "AnnMaria, you just don't understand statistics." So ... I decided that would be my specialization for my PhD. I was very fortunate to be at a university with a lot of really good people in applied areas of statistics. The thing that was most helpful to me was to have a number of mentors, particularly the late Dr. Richard Eyman, who not only taught me a lot but also encouraged me to go further, introduced me to other people in the field and was constantly loading me down with stacks of books I must read or courses that I must take even after I had completed every required statistics course. He really instilled in me the idea of learning not everything you need, but everything you can possibly ever learn. 


    7. I know you have an unusual, but very interesting, passion. Could you tell me a little about your research? 
      I have a lot of interesting passions. Have you been talking to my husband? Two projects I have worked on a lot are evaluation of blended learning (combination of online and classroom instruction) for direct care staff for people with disability and chronic illness on American Indian reservations and analysis of data on ethics.


    8. Which of your SAS projects was the most fun to work on?
      In 28 years? That's like picking your favorite child! MANY years ago, I did a meta-analysis of home environment effects on cognitive development, trying to answer the question "Why do some children from what seem to be very poor, almost toxic, environments turn out well?" It was for the Theory Construction and Research Methodology workshop of the National Council on Family Relations. They do publish proceedings, but since that was pre-Internet, it is not available online. Two of the more interesting things I worked on lately were an article on Internet usage by Native Americans on reservations and analysis of ethics data. One of the reasons the article on Internet usage was interesting is that I set myself the challenge to do all of the analyses for a scientific article using SAS Enterprise Guide. The article was published in Rural Special Education Quarterly in July. The other reason that was interesting is because people had assumed a lot of things about people with disabilities living on the reservations, but they didn't have any actual data, so we got to go out and collect that and analyze it. Plus, the people I worked with were just really interesting in and of themselves.

      A while back, I had someone contact me for a genetics project. They had mapped the genomes of a male and a female. They wanted to run a simulation to randomly create 100,000 offspring by random combinations of the genes and then use those records as input to another program to compare the distribution of traits in the population to what would have been observed in a population that was truly random. They could then compare the data and see which genetic combinations did not appear, which led them to speculate that perhaps those were either lethal combinations or something that caused the offspring to be quickly weeded out by predators, like a slow rabbit. One reason it was fun was because I generalized from the parallel analysis criterion we used way back in graduate school to decide on the number of factors to come up with an analogy for what we could do to create a sort of population value to test against. Another reason it was fun, as you can tell by my explanation, is that genetics is far out of my field so I was working with people who knew a lot about their area but not a lot about SAS while I was at the other end.


    9. Is there something else really cool about you that I’m missing? Do you volunteer, raise money for the poor, are you a cancer survivor, do you raise animals, etc.?
      My children are as far apart as four people could be. The oldest is a journalist who writes for ESPN and Fox News Latino. The second teaches history at an inner city middle school. The third is a professional fighter in mixed martial arts. And the fourth is a seventh grader on the student council for her third year.

      My current company, The Julia Group, is a spinoff of Spirit Lake Consulting, a company I co-founded with two partners. When I decided to run my own company, I had to come up with a new name. I was in North Dakota, readying to sign the paperwork and talking to my husband on the phone. He asks jokingly of our youngest daughter, "What do you think Mom ought to name her new company?" and the little one in the car seat pipes up, "She should name it after me!"

      Hence, The Julia Group. Julia De Mars is named after Gaston Julia, the mathematician who the Julia set of fractals is named after.

      While trying to convince me he was cool enough to date me, my husband wrote a program to create fractals, made a pink fractal and e-mailed it to me as an attachment on Valentine's Day. This is back when very few people had e-mail, much less knew about attachments. Obviously, it worked. The shareware fees from the fractal program pretty much paid for all of the baby furniture, clothes and toys. Unfortunately, now she's bigger and wants more expensive toys.


    10. Why do you attend SAS Global Forums? I know you are a SAS Rock Star, so I’m wondering if you go to learn things, teach or network? What is the high you get from going?
      We're a small company and our senior partners tend to be specialized – in medicine, qualitative research, curriculum design – so usually if I have a technical problem to figure out, I'm on my own. The papers from SAS Global Forum are a great help, as are SAS-L, saspedia, the SAS blogs and the people I follow on Twitter. The big advantage of SAS Global Forum is that it is all in one place. I can go to a session on Bayesian procedures, to another on multiple imputation and a third on macros, all in the same morning. Also, I can learn about new procedures or statistics BEFORE I need them, so when a possible use comes up, I remember what I heard three months ago. It's not just the new features coming out, sometimes it's new ways of using old features, like logistic regression to calculate propensity scores, or some cool macros to read in your PROC CONTENTS output and create a report on available data. And, I take a class before and after the conference.

      SAS Global Forum is a really high concentration of smart people all in one place, and there are people I have met there, like, and look forward to meeting again. The main reason I go, though, is for the sessions. I go to a presentation almost every hour of every day except for the first one in the morning (I don't do mornings). I attend as many of the SAS Presents exhibits as I can.

      Basically, I go to soak up as much knowledge as I can. I like to learn stuff.

      AnnMaria won’t be presenting at this year’s SAS Global Forum. She hasn’t had any free time for fun stuff like writing a SAS paper. She says that she is kept quite busy “writing grant proposals, bidding on contracts, writing reports for clients, [writing] journal articles and blogs (not to mention doing the actual programming and research design) and coaching.”


    You can keep up with her tips and programming insights by following her blog. You can also check out these papers. How do you know AnnMaria De Mars?

    March 14, 2011
     
    One of the biggest challenges for Microsoft Excel users is modifying their Excel spreadsheets so the data can be properly analyzed in JMP. Below is an example of data an Excel user sent me:



    He asked how he could produce the above graph in JMP.

    The majority of Excel users organize their time series data as shown above, with each column representing a period of time. And this makes sense to anyone who views such data in tabular form. But for JMP to graph this, the data needs to be reorganized. The JMP Graph Builder will not read multiple time periods that are broken up into separate columns as one continuous stream of data. It needs them organized into their common denominators. By that I mean the time variable (or dimension) needs to be in one column, with the measured values in another.
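The reorganization described above is a wide-to-long reshape. To make the idea concrete, here is a minimal Python sketch. It is not what JMP runs internally: the function `stack`, the column names ("Day", "Time", "Calls") and the two sample rows are all hypothetical.

```python
def stack(rows, id_cols, time_cols, time_name="Time", value_name="Calls"):
    """Turn one row per day with one column per time period
    into one row per (day, time period) pair."""
    stacked = []
    for row in rows:
        for t in time_cols:
            record = {k: row[k] for k in id_cols}  # carry the identifying columns along
            record[time_name] = t                   # the old column header becomes a value
            record[value_name] = row[t]             # the old cell becomes the measurement
            stacked.append(record)
    return stacked


# Made-up wide table: one column per time period
wide = [
    {"Day": "Mon", "1:00": 10, "2:00": 14},
    {"Day": "Tue", "1:00": 12, "2:00": 9},
]

long_rows = stack(wide, id_cols=["Day"], time_cols=["1:00", "2:00"])
for record in long_rows:
    print(record)
```

Each wide row becomes as many long rows as there are time columns, so two days with two time periods yield four rows.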

    So the first thing I did was use Tables->Stack to stack the time columns as shown below:







    Once I had the time values in one column, I was easily able to create the following graph:



    The one above duplicated the initial graph. I added the two below so you can compare the data by each day of the week. I prefer the trellis graph (the top one) because I can more easily compare each day’s calls from day to day.





    However, if you look at the original single line graph and compare it to the one I created in JMP, you’ll notice the JMP graph is not showing the average data point along with the line. If I ask JMP to display the dots as well, I’ll get the following:



    This is because, while JMP will calculate the average before it displays the line, it’s more informative to display all the data points in addition to the line than just the single calculated data point. The graph above not only shows the average value, but also how the data is spread for each time period. Notice how widely dispersed the calls are from 2:00-3:00.

    Still, I could reproduce the original graph by simply creating a new file with just the average amounts. First, I summarized the data using Tables->Summary:





    Then I used Graph Builder to graph from the following summarized table.



    Graphing this data and adding grid lines gives me a result that more closely resembles the original.
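The summarizing step above amounts to averaging the stacked values within each time period. Here is a minimal Python sketch of that calculation, with a hypothetical helper `mean_by` and made-up rows. It only illustrates the arithmetic and is not JMP's implementation.

```python
from collections import defaultdict
from statistics import mean


def mean_by(rows, key, value):
    """Group rows by `key` and return the mean of `value` for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: mean(v) for k, v in groups.items()}


# Made-up stacked table: one row per (day, time period) pair
long_rows = [
    {"Day": "Mon", "Time": "1:00", "Calls": 10},
    {"Day": "Tue", "Time": "1:00", "Calls": 12},
    {"Day": "Mon", "Time": "2:00", "Calls": 14},
    {"Day": "Tue", "Time": "2:00", "Calls": 9},
]

averages = mean_by(long_rows, key="Time", value="Calls")
print(averages)
```

Plotting only these averages gives the single line of the original Excel chart, at the cost of hiding the spread within each time period.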

    March 11, 2011
     
    On March 10 in her Mastering JMP live webcast, Dara Hammond demonstrated techniques for visualizing data using dynamic graphics. She shared ways to modify, customize, export and animate graphs; manipulate graph axis settings, points and display characteristics; and use the shape and geographic mapping functions to plot results.

    As promised, here are some links to information that came up during the session. Some items require registration or login.