May 05, 2017
 

In SAS’ Omni Channel Analytics webinar series, we review the challenges we’re seeing within SAS’ customer base in achieving the omnichannel vision. We have found that many retailers and consumer goods packaging companies struggle with the same issues as they embark upon this journey. Common obstacles are: Data is [...]

The post Taming the omnichannel data monster appeared first on The Data Roundtable.

May 05, 2017
 

SAS Visual Analytics 7.4 comes chock-full of new features. Report and section linking come with an added benefit. If you set up linking from one section to another section in the same report, or from one report to another report, you have the option to configure linking such that any filter prompt in the linked target location is brushed or highlighted by the values that are selected in the linked report object. The visual report objects in that target location are also filtered to reflect the context that was passed from the source location.

In SAS Visual Analytics 7.3, when you took a report link from the source report to a target report (or a target section in the current report) with a filter prompt, the target filter prompt was filtered by the selection made in the source report or source section. Now, with SAS Visual Analytics 7.4, if a selection is made in the source report and a report link (or a section link) is taken to the target report (or target section), the target filter prompt can instead be brushed. Users can then choose other filter options from that prompt in the target location and modify the selection as needed. Note that the source and target locations should use a common data source. If the data item is different, you are asked to map it.

To illustrate this new linking feature in SAS Visual Analytics 7.4, I created a source report and a target report. The source report has a Button Bar that filters the report objects in the source report. The target report contains the target Button Bar that receives the filtering selection made in the source report and displays the applicable button.

Let’s look first at the default scenario and then at the configured scenario, where the values in the target report filter prompt are brushed or highlighted. Here are the two reports – the source and the target reports.

Source Report with Linking

Target Report

Default behavior for report and section linking

In this example, let’s take a quick look at how linking worked in SAS Visual Analytics 7.3 (and it still works the same way in SAS Visual Analytics 7.4 by default). In the following source report, I have a Button Bar in the filter prompt.

Choosing Orion Germany in the Button Bar

When I choose Orion Germany in the Button Bar, the report objects are filtered to show the filtered results.

Report Objects Filtered by Orion Germany in the Source Report

When I take a link from the Orion Germany tile in the Treemap to the target report, the Button Bar in the target report is filtered to show Orion Germany (this is the default behavior for linking) in the target report.

Target Report With Orion Germany in the Button Bar

But what if I want my users to take a report link from the source report, and be able to choose from the filter choices in the Button Bar within this target report?

SAS Visual Analytics 7.4 to the rescue!

Here’s an example of what I did with the report linking in SAS Visual Analytics 7.4 by allowing the filtering choices to be retained in the target filter prompt.

I chose Orion France in the Button Bar within the Source Report:

Choosing Orion France in the Source Report

Then, I took a report link from the Orion France tile in the Treemap to the target report:

Target Report with Orion France Highlighted in the Button Bar

Notice how the Button Bar in the target report is brushed by Orion France, and I still have a choice of selecting a different Orion country in the Button Bar.

Design the Link for the Prompt Filters in the Target Report

It’s simple to make this happen.

1.  In the source report, where I had previously created report linking, I selected the Treemap and chose to edit the report link by going to the Interactions tab.

2.  I clicked the icon for editing this report link.

3.  In the Edit Report Link dialog, I selected the checkbox for Set the value for controls in the target report prompt bar and clicked OK. And I saved my source report. That’s it!

Note: This option sets values only on the controls that use the same data item as the source object or on data items that filter the source object. The source and target of the report link should be based on the same data source. If you have multiple data sources, you are prompted to map the report link.

Linking to target reports and sections in SAS Visual Analytics 7.4 was published on SAS Users.

May 03, 2017
 

I recently saw an interesting data visualization on the flowingdata website, which analyzed & compared the causes of fatal crashes in the US, by month and time-of-day. At first I thought it was a really cool visualization, but after I studied it a while, I realized that I had misinterpreted [...]

The post When do fatal crashes happen? appeared first on SAS Learning Post.

May 03, 2017
 

If a financial analyst says it is "likely" that a company will be profitable next year, what probability would you ascribe to that statement? If an intelligence report claims that there is "little chance" of a terrorist attack against an embassy, should the ambassador interpret this as a one-in-a-hundred chance, a one-in-ten chance, or some other value?

Analysts often use vague statements like "probably" or "chances are slight" to convey their beliefs that a future event will or will not occur. Government officials and policy-makers who read reports from analysts must interpret and act on these vague statements. If the reader of a report interprets a phrase differently from what the writer intended, that can lead to bad decisions.

Assigning probabilities to statements

Original box plot: Distribution of probabilities for word phrases

In the book Psychology of Intelligence Analysis (Heuer, 1999), the author presents "the results of an experiment with 23 NATO military officers accustomed to reading intelligence reports. They were given a number of sentences such as: "It is highly unlikely that ...." All the sentences were the same except that the verbal expressions of probability changed. The officers were asked what percentage probability they would attribute to each statement if they read it in an intelligence report."

The results are summarized in the adjacent dot plot from Heuer (Chapter 12), which shows how the officers assessed the probability of various statements. The graph includes a gray box for some statements. The box is not a statistical box plot. Rather, it indicates the probability range according to a nomenclature proposed by Kent (1964), who tried to get the intelligence community to agree that certain phrases would be associated with certain probability ranges.

For some statements (such as "better than even" and "almost no chance") there was general agreement among the officers. For others, there was large variability in the probability estimates. For example, many officers interpreted "probable" as approximately a 75% chance, but quite a few interpreted it as less than 50% chance.

A modern re-visualization

Zonination's box plot: Distribution of probabilities for word phrases

The results of this experiment are interesting on many levels, but I am going to focus on the visualization of the data. I do not have access to the original data, but this experiment was repeated in 2015 when the user "Zonination" got 46 users on Reddit (who were not military experts) to assign probabilities to the statements. His visualization of the resulting data won a 2015 Kantar Information is Beautiful Award. The visualization uses box plots to show the schematic distribution and overlays the 46 individual estimates by using a jittered, semi-transparent scatter plot. The Zonination plot is shown at the right. Notice that the "boxes" in this second graph are determined by quantiles of the data, whereas in the first graph they were theoretical ranges.

Creating the graph in SAS

I decided to remake Zonination's plot by using PROC SGPLOT in SAS. I made several modifications that improve the readability and clarity of the plot.

  • I sorted the categories by the median probability. The median is a robust estimate of the "consensus probability" for each statement. The sorted categories indicate the relative order of the statements in terms of perceived likelihood. For example, an "unlikely" event is generally perceived as more probable than an event that has "little chance." For details about sorting the variables in SAS, see my article about how to sort variables by a statistic.
  • I removed the colors. Zonination's rainbow-colored chart is aesthetically pleasing, but the colors do not add any new information about the data. However, the colors help the eye track horizontally across the graph, so I used alternating bands to visually differentiate adjacent categories. You can create color bands by using the COLORBANDS= option in the YAXIS statement.
  • To reduce overplotting of markers, I used systematic jittering instead of random jittering. In random jittering, each vertical position is randomly offset. In systematic (centered) jittering, the markers are arranged so that they are centered on the "spine" of the box plot. Vertical positions are changed only when the markers would otherwise overlap. You can use the JITTER option in the SCATTER statement to systematically jitter marker positions.
  • Zonination's plot displays some markers twice, which I find confusing. Outliers are displayed once by the box plot and a second time by the jittered scatter plot. In my version, I suppress the display of outliers by the box plot by using the NOOUTLIERS option in the HBOX statement.
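
The first modification, ordering the categories by their median, can be sketched directly on a long-form data set. This is only an illustration and not the code from the downloadable program: it assumes the data are already in a data set named Long with the variables _Label_ and _Value_ that appear in the PROC SGPLOT call, and the names Med, MedianValue, and LongSorted are hypothetical.

```sas
/* Sketch: order the rows of a long-form data set by the median of each category.
   Assumes Long has a character variable _Label_ (the phrase) and a numeric
   variable _Value_ (the assigned probability). Med, MedianValue, and
   LongSorted are hypothetical names chosen for this example. */
proc means data=Long noprint nway;
   class _Label_;
   var _Value_;
   output out=Med(keep=_Label_ MedianValue) median=MedianValue;
run;

/* Attach each category's median to its observations and sort by it */
proc sql;
   create table LongSorted as
   select L.*, M.MedianValue
   from Long as L inner join Med as M
        on L._Label_ = M._Label_
   order by M.MedianValue, L._Label_;
quit;
```

Because the YAXIS statement uses DISCRETEORDER=DATA, the categories are displayed in the order in which they first appear in the data, so plotting a sorted data set such as LongSorted would place them in median order.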

You can download the SAS code that creates the data, sorts the variables by median, and creates the plot. The following call to PROC SGPLOT shows the HBOX and SCATTER statements that create the plot:

title "Perceptions of Probability";
proc sgplot data=Long noautolegend;
   hbox _Value_ / category=_Label_ nooutliers nomean nocaps;  
   scatter x=_Value_ y=_Label_ / jitter transparency=0.5
                     markerattrs=GraphData2(symbol=circlefilled size=4);
   yaxis reverse discreteorder=data labelpos=top labelattrs=(weight=bold)
                     colorbands=even colorbandsattrs=(color=gray transparency=0.9)
                     offsetmin=0.0294 offsetmax=0.0294; /* half of 1/k, where k=number of categories */
   xaxis grid values=(0 to 100 by 10);
   label _Value_ = "Assigned Probability (%)" _label_="Statement";
run;
SAS box plot: Distribution of probabilities for word phrases
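
The OFFSETMIN= and OFFSETMAX= values come from the rule stated in the code comment: half of 1/k. With the k = 17 categories in this graph, that is 1/34, which rounds to 0.0294. A minimal sketch of the arithmetic (the variable names k and offset are chosen only for this example):

```sas
/* Sketch: compute the axis offset as half of 1/k */
data _null_;
   k = 17;               /* number of categories (phrases) in the plot */
   offset = 1 / (2*k);   /* half of the fraction of the axis given to one category */
   put offset= 6.4;      /* write the value to the log with four decimal places */
run;
```

If the number of categories changes, the same formula gives the new offset.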

The graph indicates that some responders either didn't understand the task or intentionally gave ridiculous answers. Of the 17 categories, nine contain extreme outliers, such as assigning certainty (100%) to the phrases "probably not," "we doubt," and "little chance." However, the extreme outliers do not affect the statistical conclusions about the distribution of probabilities because box plots (which use quartiles) are robust to outliers.

The SAS graph, which uses systematic jittering, reveals a fact about the data that was hidden in the graphs that used random jittering. Namely, most of the data values are multiples of 5%. Although a few people responded with values such as 88.7%, 1%, or 3%, most values (about 80%) are rounded to the nearest 5%. For the phrases "likely" and "we believe," 44 of 46 responses (96%) were a multiple of 5%. In contrast, for the phrase "almost no chance," only 18 of 46 responses (39%) were multiples of 5% because many responses were 1%, 2%, or 3%.
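
Counts like these can be tabulated with a short query. The following is a sketch rather than the code used for the article; it assumes the same long-form data set Long with variables _Label_ and _Value_, and the column names N, NumMult5, and PropMult5 are hypothetical.

```sas
/* Sketch: for each phrase, count the responses that are multiples of 5.
   In PROC SQL a comparison evaluates to 1 (true) or 0 (false), so the
   SUM of the comparison is a count and the MEAN is a proportion. */
proc sql;
   select _Label_,
          count(_Value_)              as N,
          sum(mod(_Value_, 5) = 0)    as NumMult5,
          mean(mod(_Value_, 5) = 0)   as PropMult5 format=percent8.1
   from Long
   where _Value_ is not missing
   group by _Label_
   order by PropMult5;
quit;
```

Sorting by the proportion puts the phrases with the fewest rounded responses at the top of the listing.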

As with the military officers in the original study, there is considerable variation in the way that the Reddit users assign a probability to certain phrases. It is interesting that some phrases (for example, "We believe," "Likely," and "Probable") have the same median value but wildly different interquartile ranges. For clarity, speakers and writers should use phrases that have small variation or (even better!) provide their own assessment of probability.

Does something about this perception study surprise you? Do you have an opinion about the best way to visualize these data? Leave a comment.

The post Perceptions of probability appeared first on The DO Loop.

May 03, 2017
 

The list of SAS credentials keeps growing every year, as more and more SAS users want to validate their application of SAS skills in different business topic areas, such as SAS Data Management, SAS Administration, and more. The field of Big Data is no exception, and the SAS Global Certification [...]

The post Do SAS Big Data credentials equal big professional value? appeared first on SAS Learning Post.

May 02, 2017
 

For many years the humble spreadsheet has held many different roles and responsibilities supporting finance, marketing, sales -- pretty much every department in your business. There's always someone with a “magic spreadsheet,” but how effective is this culture that always uses the same format to consume data? My view of [...]

The spreadsheet: Friend or foe? was published on SAS Voices by Tim Clark

May 01, 2017
 
How transferable are features in deep neural networks? https://arxiv.org/abs/1411.1792
TensorFlow CNN for fast style transfer https://github.com/lengstrom/fast-style-transfer
https://github.com/HappyShadowWalker/ChineseTextClassify Chinese text classification using the Sogou text classification corpus
https://lukeoakdenrayner.wordpress.com/2017/04/24/the-end-of-human-doctors-understanding-medicine/ The End of Human Doctors – Understanding Medicine
Machine Learning in Science and Industry slides http://arogozhnikov.github.io/2017/04/20/machine-learning-in-science-and-industry.html https://github.com/yandexdataschool/MLAtGradDays
all the available code repos for the NIPS 2016's top papers https://www.reddit.com/r/MachineLearning/comments/5hwqeb/project_all_code_implementations_for_nips_2016/
Best Practices for Applying Deep Learning to Novel Applications https://arxiv.org/abs/1704.01568v1?utm_campaign=Revue newsletter&utm_medium=Newsletter&utm_source=revue
Medical Image Analysis with Deep Learning https://medium.com/@taposhdr/medical-image-analysis-with-deep-learning-i-23d518abf531
https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf
Chinese rumor database http://rumor.thunlp.org/


 
 Posted at 10:33 PM