10月 182017
 

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots are also discussed.

How to create a diffogram in SAS

The diffogram (also called a mean-mean scatter diagram) is automatically created when you use the PDIFF=ALL option on the LSMEANS statement in several SAS/STAT regression procedures such as PROC GLM and PROC GLMMIX, assuming that you have enabled ODS graphics. The documentation for PROC GLM contains an example that uses data about the chemical composition of shards of pottery from four archaeological sites in Great Britain. Researchers want to determine which sites have shards that are chemically similar. The data are contained in the pottery data set. The following call to PROC GLM performs an ANOVA of the calcium oxide in the pottery at the sites. The PDIFF=ALL option requests an analysis of all pairwise comparisons between the LS-means of calcium oxide for the different sites. The ADJUST=TUKEY option is one way to adjust the confidence intervals for multiple comparisons.

ods graphics on;
proc glm data=pottery plots(only)=( diffplot(center)          /* diffogram */
                                    meanplot(cl ascending) ); /* plot of means and CIs */
   label Ca = "Calcium Oxide (%)";
   class Site;
   model Ca = Site;
   lsmeans Site / pdiff=all cl adjust=tukey;  /* all pairwise comparisons of means w/ adjusted CL */
   ods output LSMeanDiffCL=MeanDiff;          /* optional: save mean differences and CIs */
quit;

Two graphs are requested: the diffogram (or "diffplot") and a "mean plot" that shows the group means and 95% confidence intervals. The ODS OUTPUT statement creates a data set from a table that contains the mean differences between pairs of groups, along with 95% confidence intervals for the differences. You can use that information to construct a plot of the mean differences, as shown later in this article.

How to interpret a diffogram

Diffogram in SAS showing multiple comparisons of means

The diffogram, which is shown to the right (click to enlarge), is my favorite graph for multiple comparisons of means. Every diffogram displays a diagonal reference line that has unit slope. Horizontal and vertical reference lines are placed along the axes at the location of the means of the groups. For these data, there are four vertical and four horizontal reference lines. At the intersection of most reference lines there is a small colored line segment. These segments indicate which of the 4(4-1)/2 = 6 pairwise comparisons of means are significant.

Let's start at the top of the diffogram. The mean for the Caldecot site is about 0.3, as shown by a horizontal reference line near Y=0.3. On the reference line for Caldecot are three line segments that are centered at the mean values for the other groups, which are (from left to right) IslandThorns, AshleyRails, and Llanederyn. The first two line segments (in blue) do not intersect the dashed diagonal reference line, which indicates that the means for the (IslandThorns, Caldecot) and (AshleyRails, Caldecot) pairs are significantly different. The third line segment (in red) intersects the diagonal reference line, which indicates that the (Llanederyn, Caldecot) comparison is not significant.

Similarly, the mean for the Llanederyn site is about 0.2, as shown by a horizontal reference line near Y=0.2. On the reference line for Llanederyn are two line segments, which represent the (IslandThorns, Llanederyn) and (AshleyRails, Llanederyn) comparisons. Neither line segment intersects the diagonal reference line, which indicates that the means for those pairs are significantly different.

Lastly, the mean for the AshleyRails site is about 0.05, as shown by a horizontal reference line near Y=0.05. The line segment on that reference line represents the (IslandThorns, AshleyRails) comparison. It intersects the diagonal reference line, which indicates that those means are not significantly different.

The colors for the significant/insignificant pairs depends on the ODS style, but it is easy to remember the simple rule: if a line segment intersects the diagonal reference line, then the corresponding group means are not significantly different. A mnemonic is "Intersect? Insignificant!"

The mean plot with confidence interval: Use wisely!

Mean plot and 95% confidence intervals, created with SAS

The previous section shows how to use the diffogram to visually determine which pairs of means are significantly different. This section reminds you that you should not try to use the mean plot (shown at the right) for making those inferences.

I have seen presentations in which the speaker erroneously claims that "the means of these groups are significantly different because their 95% confidence intervals do not overlap." That is not a correct inference. In general, the overlap (or lack thereof) between two (1 – α)100% confidence intervals does not give sufficient information about whether the difference between the means is significant at the α level. In particular, you can construct examples where

  • Two confidence intervals overlap, but the difference of means is significant. (See Figure 2 in High (2014).)
  • Two confidence intervals do not overlap, but the difference of means is not significant.

The reason is twofold. First, the confidence intervals in the plot are constructed by using the sample sizes and standard deviations for each group, whereas tests for the difference between the means are constructed by using pooled standard deviations and sample sizes. Second, if you are making multiple comparisons, you need to adjust the widths of the intervals to accommodate the multiple (simultaneous) inferences.

"But Rick," you might say, "in the mean plot for these data, the Llanederyn and Caldecot confidence intervals overlap. The intervals for IslandThorns and AshleyRails also overlap. And these are exactly the two pairs that are not significantly different, as shown by the diffogram!" Yes, that is true for these data, but it is not true in general. Use the diffogram, not the means plot, to visualize multiple comparisons of means.

Plot the pairwise difference of means and confidence intervals

In addition to the diffogram, you can visualize comparisons of means by plotting the confidence intervals for the pairwise mean differences. Those intervals that contain 0 represent insignificant differences. Recall that the call to PROC GLM included an ODS output statement that created a data set (MeanDiff) that contains the mean differences. The following DATA step constructs labels for each pair and computes whether each pairwise difference is significant:

data Intervals;
length Pair $28.;
length S $16.;
set MeanDiff;
/* The next line is data dependent. For class variable 'C', concatenate C and _C variables */
Pair = catx(' - ', Site, _Site);           
Significant = (0 < LowerCL | UpperCL < 0); /* is 0 in interior of interval? */
S = ifc(Significant, "Significant", "Not significant");
run;
 
title "Pairwise Difference of LSMeans (Tukey Adjustment)";
title2 "95% Confidence Intervals of Mean Difference";
footnote J=L "Pairs Whose Intervals Contain 0 Are Not Significantly Different";
proc sgplot data=Intervals;
   scatter y=Pair x=Difference / group=S name="CIs"
           xerrorlower=LowerCL xerrorupper=UpperCL;
   refline 0 / axis=x;
   yaxis reverse colorbands=odd display=(nolabel) 
                 offsetmin=0.06 offsetmax=0.06;
   keylegend "CIs" / sortorder=ascending;
run;
Plot of mean differences and confidence intervals, created with SAS

The resulting graph is shown to the right. For each pair of groups, the graph shows an estimate for the difference of means and the Tukey-adjusted 95% confidence intervals for the difference. Intervals that contain 0 indicate that the difference of means is not significant. Intervals that do not contain 0 indicate significant differences.

Although the diffogram and the difference-of-means plot provide the same information, I prefer the diffogram because it shows the values for the means rather than for the mean differences. Furthermore, the height of the difference-of-means plot needs to be increased as more groups are added (there are k(k-1)/2 rows for k groups), whereas the diffogram can accommodate a moderate number of groups without being rescaled. On the other hand, it can be difficult to read the reference lines in the diffogram when there are many groups, especially if some of the groups have similar means.

For more about multiple comparisons tests and how to visualize the differences between means, see the following references:

The post The diffogram and other graphs for multiple comparisons of means appeared first on The DO Loop.

10月 182017
 

In growing areas such as the Research Triangle Park, there is always the tough decision between developing land for business, or keeping it natural for parks and recreation. SAS is fortunate to have the great hiking trails at Umstead Park just across the road to the north - it's the [...]

The post RDU airport -vs- surrounding parks and nature appeared first on SAS Learning Post.

10月 182017
 

In growing areas such as the Research Triangle Park, there is always the tough decision between developing land for business, or keeping it natural for parks and recreation. SAS is fortunate to have the great hiking trails at Umstead Park just across the road to the north - it's the [...]

The post RDU airport -vs- surrounding parks and nature appeared first on SAS Learning Post.

10月 182017
 

You know the value of your SAS Certification; does the rest of the world? A recent MONEY and Payscale.com study named SAS Analytics skills as the most valuable skills to have in today’s job market and certification is the best way to demonstrate those skills. Recognition like this helps solidify [...]

The post Share your SAS Certification experience and maximize value appeared first on SAS Learning Post.

10月 172017
 

moving content between SAS Viya environmentsIn a SAS Viya 3.2 environment two types of content can be created: SAS Visual Analytics Reports and Data Plans. For administrators, who may want to manage that content within a folder structure, there are some things to keep in mind. In the current release, both types of content can be moved around in folders, but the objects cannot be copied. In addition, SAS Viya 3.2 supports the promotion of SAS Visual Analytics Reports, but doesn’t support the promotion of Data Plans (support for Plans is coming in SAS Viya 3.3). So, what if I want to copy a report between, say my personal folders, to a production folder?

If you want copy a Report or Data Plan within an environment there is an easy way. When the object is open in edit mode you can do a Save As to save a copy to a different location in the folder structure.

Between environments, Reports can be exported and imported using the SAS Visual Analytics, when you are editing your content (Report or Data Plan) you can access a “diagnostics” window. The diagnostics window will show you the json (or xml) used to render the Report or Plan. To enter the diagnostics window use the keystrokes:

  • ctl+alt+d for SAS Visual Data Builder.
  • ctl+alt+b for SAS Visual Analytics.

In the steps below I will use the diagnostics window to save a Data Plan so that it can be loaded to a different SAS Viya Environment. The steps for a SAS Visual Analytics report are very similar.

In SAS Visual Data Builder when editing your Data Plan select ctl-alt-d to open the SAS Visual Data Builder Diagnostics window. The source tab of the window shows the json that will render the data plan.

Click Save to save the json to a text file and close the dialog. The json file will be saved in the browsers default downloads folder.

Copy the saved text file to a location accessible to the SAS Viya environment where you want to import the plan. In that environment, open Data Builder and click New to open a new Data Plan.

Click ctl-alt-d on the empty data plan and cut and paste the json from your text file replacing the json in the diagnostics window.

Click Parse to check the json.A message should be displayed indicating that the  “plan text was parsed successfully.”  Once you have parsed the text, click Run and the plan is loaded into SAS Visual Data Builder.

In SAS Visual Data Builder, select Save As and save the plan to any location in the folder structure.

The assumption with this approach is that the data is available in the same location in both environments.

You can do much the same with SAS Visual Analytics reports. The key-stroke is ctl-alt-b to open the SAS Visual Analytics Diagnostics window.  You can see the report xml or json on the BIRD tab.

To copy a single report between environments, you can select json and then save the json to a file. In the target environment open a new report, paste the json in the BIRD tab, parse and load and then save the report to a folder. This can be a useful approach if you want to relocate a report to a different location in your target environment. The transfer service currently will only import reports to the same folder location in the target that they are located in the source environment.

I hope you found this tip useful.

A tip for moving content between SAS Viya environments was published on SAS Users.

10月 172017
 

Would you like to format your macro variables in SAS? Good news. It's easy!  Just use the %FORMAT function, like this: %let x=1111; Log %put %format(&amp;x,dollar11.); $1,111 %put %format(&amp;x,roman.); MCXI %put %format(&amp;x,worddate.); January 16, 1963   %let today=%sysfunc(today()); %put %format(&amp;today,worddate.); October 13, 2017   %put %format(Macro,$3.); Mac What?!  You never [...]

The post How to format a macro variable appeared first on SAS Learning Post.

10月 172017
 

Professionals with deep analytical knowledge, particularly using SAS, are in high demand. But to build a successful career, you first have to acquire the necessary analytical skills. Recently, I interviewed Andrea Baroni, a graduate from Lancaster University, and asked him to share his experience learning SAS and how it has helped advance his professional career. Your background Mayra [...]

The post How SAS can help your career appeared first on SAS Analytics U Blog.

10月 162017
 

Last week Warren Kuhfeld wrote about a graph called the "lines plot" that is produced by SAS/STAT procedures in SAS 9.4M5. (Notice that the "lines plot" has an 's'; it is not a line plot!) The lines plot is produced as part of an analysis that performs multiple comparisons of means. Although the graphical version of the lines plot is new in SAS 9.4M5, the underlying analysis has been available in SAS for decades. If you have an earlier version of SAS, the analysis is presented as a table rather than as a graph.

Warren's focus was on the plot itself, with an emphasis on how to create it. However, the plot is also interesting for the statistical information it provides. This article discusses how to interpret the lines plot in a multiple comparisons of means analysis.

The lines plot in SAS

You can use the LINES option in the LSMEANS statement to request a lines plot in SAS 9.4M5. The following data are taken from Multiple Comparisons and Multiple Tests (p. 42-53 of the First Edition). Researchers are studying the effectiveness of five weight-loss diets, denoted by A, B, C, D, and E. Ten male subjects are randomly assigned to each method. After a fixed length of time, the weight loss of each subject is recorded, as follows:

/* Data and programs from _Multiple Comparisons and Multiple Tests_
   Westfall, Tobias, Rom, Wolfinger, and Hochberg (1999, First Edition) */
data wloss;
do diet = 'A','B','C','D','E';
   do i = 1 to 10;  input WeightLoss @@;  output;  end;
end;
datalines;
datalines;
12.4 10.7 11.9 11.0 12.4 12.3 13.0 12.5 11.2 13.1
 9.1 11.5 11.3  9.7 13.2 10.7 10.6 11.3 11.1 11.7
 8.5 11.6 10.2 10.9  9.0  9.6  9.9 11.3 10.5 11.2
 8.7  9.3  8.2  8.3  9.0  9.4  9.2 12.2  8.5  9.9
12.7 13.2 11.8 11.9 12.2 11.2 13.7 11.8 11.5 11.7
;

You can use PROC GLM to perform a balanced one-way analysis of variance and use the MEANS or LSMEANS statement to request pairwise comparisons of means among the five diet groups:

proc glm data=wloss;
   class diet;
   model WeightLoss = diet;
   *means diet / tukey LINES; 
   lsmeans diet / pdiff=all adjust=tukey LINES;
quit;

In general, I use the LSMEANS statement rather than the MEANS statement because LS-means are more versatile and handle unbalanced data. (More about this in a later section.) The PDIFF=ALL option requests an analysis of all pairwise comparisons between the LS-means of weight loss for the different diets. The ADJUST=TUKEY option is a common way to adjust the widths of confidence intervals to accommodate the multiple comparisons. The analysis produces several graphs and tables, including the following lines plot.

Lines plot for multiple comparisons of means in SAS

How to interpret a lines plot

In the lines plot, the vertical lines visually connect groups whose LS-means are "statistically indistinguishable." Statistically speaking, two means are "statistically indistinguishable" when their pairwise difference is not statistically significant.

If you have k groups, there are k(k-1)/2 pairwise differences that you can examine. The lines plot attempts to summarize those comparisons by connecting groups whose means are statistically indistinguishable. Often there are fewer lines than pairwise comparisons, so the lines plot displays a summary of which groups have similar means.

In the previous analysis, there are five groups, so there are 10 pairwise comparisons of means. The lines plot summarizes the results by using three vertical lines. The leftmost line (blue) indicates that the means of the 'B' and 'C' groups are statistically indistinguishable (they are not significantly different). Similarly, the upper right vertical bar (red) indicates that the means of the pairs ('E','A'), ('E','B'), and ('A','B') are not significantly different from each other. Lastly, the lower right vertical bar (green) indicates that the means for groups 'C' and 'D' are not significantly different. Thus in total, the lines plot indicates that five pairs of means are not significantly different.

The remaining pairs of mean differences (for example, 'E' and 'D') are significantly different. By using only three vertical lines, the lines plot visually associates pairs of means that are essentially the same. Those pairs that are not connected by a vertical line are significantly different.

Advantages and limitations of the lines plot

Advantages of the lines plot include the following:

  • The groups are ordered according to the observed means of the groups.
  • The number of vertical lines is often much smaller than the number of pairwise comparisons between groups.

Notice that the groups in this example are the same size (10 subjects). When the group sizes are equal (the so-called "balanced ANOVA" case), the lines plot can always correctly represent the relationships between the group means. However, that is not always true for unbalanced data. Westfall et al. (1999, p. 69) provide an example in which using the LINES option on the MEANS statement produces a misleading plot.

The situation is less severe when you use the LSMEANS statement, but for unbalanced data it is sometimes impossible for the lines plot to accurately connect all groups that have insignificant mean differences. In those cases, SAS appends a footnote to the plot that alerts you to the situation and lists the additional significances not represented by the plot.

In my next blog post, I will show some alternative graphical displays that are appropriate for multiple comparisons of means for unbalanced groups.

Summary

In summary, the new lines plot in SAS/STAT software is a graphical version of an analysis that has been in SAS for decades. You can create the plot by using the LINES option in the LSMEANS statement. The lines plot indicates which groups have mean differences that are not significant. For balanced data (or nearly balanced), it does a good job of summarizes which differences of means are not significant. For highly unbalanced data, there are other graphs that you can use. Those graphs will be discussed in a future article.

The post Graphs for multiple comparisons of means: The lines plot appeared first on The DO Loop.

10月 132017
 

Every year in early October, the eyes of the world turn to Sweden and Norway, where the Nobel Prize winners are announced to the world. The Nobel Prize is considered the world's most prestigious award. Since 1901, the Prize has been presented to individuals and organizations that have made significant achievements in the fields of physics, chemistry, physiology or medicine, world peace and literature in each year (there were several exceptions during war years). In 1968, Sveriges Riksbank established the Sveriges Riksbank Prize in Economic Sciences in memory of Alfred Nobel, founder of the Nobel Prize. Today, individuals or organizations who are awarded Nobel Prizes and the Prize in Economic Sciences are called Nobel Laureates.

So far, more than 900 Nobel Laureates have been awarded. In this post, I wanted to learn a little more about these impressive individuals. Where were these Nobel Laureates from? Why do they get awarded? Is there any common characteristics you’ll find in these Laureates? Below you’ll find a preliminary analysis of Nobel Laureates using SAS Visual Analytics.

The analysis is based on data from List_of_Nobel_laureates, List of Nobel laureates by university affiliation and Nobel Laureates datasets at Kaggle, which definitely has some missing and inconsistent values. I have cleaned the data to correct for some obvious inconsistency as possible for my analysis.

How many Nobel Laureates have their been so far?

Recently, 12 new Nobel Laureates were awarded by the 2017 Nobel Prizes and Prize in Economic Sciences, and that makes 923 Laureates in total since the first Nobel Prize in 1901. Some Laureates share one prize, so we see more shared Laureates total in below table. While we see 27 organization winners of the Peace prize, most Laureates are individual winners.

analysis of the Nobel Laureates

The chart below shows the overall trend of annual total Nobel Laureates is increasing year-over-year, as more and more winners are sharing the Prize. The purple circle on the plot indicates that there are shared winners in that year. The average number of winners is about eight each year. Yet there was only one winner in 1916 for the Literature Prize. The most winners came in 2001, with 15 Laureates sharing the prizes. I also note from the chart that during the First World War, there were very few Nobel Prizes awarded, and during the Second World War, there were none.

Moreover, we know that most Nobel Laureates are awarded one Nobel Prize, yet I learned from childhood that the female scientist Marie Curie received two Nobel Prizes. If you search the datasets for winners awarded more than one Prize, you’ll find four scientists accomplished this feat. They are: Marie Curie, Linus Pauling, John Bardeen and Frederick Sanger.

Do Nobel Laureates live longer?

The answer is YES, per the research by Prof. Andrew Oswald from University of Warwick. Winning a Nobel prize adds about 1.5 years to the lifespan of Nobel Laureates compared to those who were merely nominated. Of course, it is not because of the monetary benefits that come with the Nobel Prize, but because of ‘the deep links between mind and body’, and that ‘happiness’ may make people live longer, which makes sense to me.

Since I don’t have the data of Nobel Prize nominees, let’s only test the lifespan of the Nobel Laureates and the ages they got awarded. The average life age of all Nobel Laureates is nearly 80, much older than the global average life expectancy of 71.4 years-old (according to World Health Organization 2015). Digging a bit more, we see Martin Luther King is the Nobel laureate (Peace, 1964) who died at youngest age. He was assassinated at 39 years old. Laureates who lived longest are Rita Levi-Montalcini (Medicine, 1986) and Ronald H. Coase (Economics, 1991), who both lived to 103 years old. You may also notice that the distribution of the Laureates’ lifespan is left skewed, the Nobel Prize winners certainly live longer than most.

In addition, something more worth noting:

  • The most laureates with the longest lifespans are from the Economics and Medicine categories. The Nobel Prize winning economists live longer than other categories’ winners on average. The average lifespan of these economists is about 86 years-old, five years longer than the second category of Medicine.
  • Economics winners are winning the awards at the highest age – 67 years-old on average. More digging shows that the oldest awarded age is 90 when Leonid Hurwicz (Economics, 2007) was awarded his Prize. We see the average awarded age of Physics winners is 56, which is 10+ years younger than that of the Economics winners. Thus, we get the impression that economists need more time to have outstanding achievements.
  • If we compare the time span between Laurates’ average awarded ages and their lifespan, the Physics Prize winners enjoy the longest life time after winning the award – about 20 years on average.
  • It is also worth noting that the Nobel Peace winners have the largest span of awarded age, about 70 years’ span. That’s because the youngest Nobel Laureates Malala Yousafzai, who got awarded of Nobel Peace Prize at 17 years-old in 2014.

The chart below is created in SAS Visual Analytics and shows the awarded ages of all individual Nobel Laureates in different prize categories. The reference line is the average awarded age of 59. It is very easy to note that no Nobel Prize was awarded during 1940-1943 due to the Second World War.

From which universities have Nobel Laureates graduated?

Next, let’s look at the educational background of Nobel Laureates. The left chart below obviously shows that much more Nobel winners hold Doctorate degrees than those of Bachelor or Master degrees. If we see the chart for Literature and Peace categories on the right, the difference is not that big. From the data, we know that the educational background of Nobel Laureates in Physics, Chemistry, Medicine and Economics categories (I call these four categories the scientific categories for easier description later) has the higher percentage of doctorate than that of winners in the Literature and Peace categories.

To learn more about the universities the Laureates in these scientific categories are graduated from, I ranked the top 10 university affiliations for the scientific categories in below chart, and their distribution among these categories, as well as the countries in which these universities are located.

The top 10 university affiliations were selected basing on the highest degree of the scientific categories’ Laureates obtained. That is, if one winner held a Master degree from Harvard University and a Doctorate degree from University of Cambridge, he/she is counted in University of Cambridge but not in the Harvard University. From the parallel coordinates plot, you may have noticed that the Physics in University of Cambridge and the Medicine in Harvard University are their greatest majors respectively. On the right, it shows the countries where these top 10 university affiliations are in United States, United Kingdom, France and Germany. The bar charts on the left show the percentage of educational degrees (Doctorate, Master, Bachelor) of each in the scientific categories (according to the available dataset). In the bottom chart, top 10 universities are ranked by their percentages. Perhaps now you have a great university in your mind for future education?

Next, I created the chart below to show the top eight countries having the university affiliations that more Nobel Prize winners graduated from. (Here the chart only shows for scientific categories, thus it excludes the Nobel Literature Prize and Peace prize.). An obvious trend we see from the chart is that the United States has the most Laureates spanning in the scientific categories after the Second World War, while Germany has more Laureates in the scientific categories comparatively before World War II.

Why do the Nobel Laureates get awarded?

Per the ‘nobelprize.org’, in his excerpt of the will, Alfred Nobel (1833-1896) dictates that his entire remaining estate should be used to endow "prizes to those who, during the preceding year, shall have conferred the greatest benefit to mankind." So Alfred's interests are reflected in the Prize, which said “The whole of his remaining realizable estate constitutes a fund, and the annually interest shall be divided into five equal parts, which shall be apportioned as follows: one part to the person who shall have made the most important discovery or invention within the field of physics; one part to the person who shall have made the most important chemical discovery or improvement; one part to the person who shall have made the most important discovery within the domain of physiology or medicine; one part to the person who shall have produced in the field of literature the most outstanding work in an ideal direction; and one part to the person who shall have done the most or the best work for fraternity between nations, for the abolition or reduction of standing armies and for the holding and promotion of peace congresses.”

Since it’s not easy to seek evidence in the datasets that Nobel Laureates are awarded by fulfilling Alfred’s will, what I do is to use SAS Visual Analytics text topics analysis performing some preliminary text analysis of the ‘Motivation’ field in the dataset for a validation to some extent. The ‘Motivation’ is given by ‘nobelprize.org’ for why the Laureate gets awarded. The analysis shows that the most frequently mentioned word is ‘discovery’, while the most 5 frequently appeared words include ‘work’, ‘development’, ‘contribution’, and ‘theory’. And from the topics analysis result, the top 10 topics are about ‘discovery’, ‘human”, “structure”, “economic”,” technique”, etc., which are reflecting Alfred Nobel‘s will in establishing the Prize. Moreover, the sentimental analysis result shows that the statements in the ‘Motivation’ field are mainly neutral (being ‘objective’), even though there are few positive and negative sentimental statements.

 

I hope you’ve found this analysis of Nobel Laureates data interesting. I believe there are still many other perspectives you can analyze to get insights. Is there anything interesting you see?

A preliminary analysis of the Nobel Laureates was published on SAS Users.