July 6, 2018
 

SAS Visual Text Analytics provides dictionary-based and non-domain-specific tokenization functionality for Chinese documents; however, sometimes you still want to get N-gram tokens. This can be especially helpful when the documents are domain-specific and most of the tokens are not included in the SAS-provided Chinese dictionary.

What is an N-gram?

An N-gram is a contiguous sequence of N items from a given text, with n representing any positive integer starting from 1. When n is 1, the sequence is called a unigram; when n is 2, a bigram; when n is 3, a trigram. For example, suppose we have a text in Chinese "我爱中国。", which means "I love China." Its N-gram sequences look like the following:

n Size   N-gram Sequence
1        [我], [爱], [中], [国], [。]
2        [我爱], [爱中], [中国], [国。]
3        [我爱中], [爱中国], [中国。]

How many N-gram tokens are in a given sentence?

If Token_Count_of_Sentence is the number of tokens in a given sentence, then the number of N-grams is:

Count of N-grams = Token_Count_of_Sentence – ( n - 1 )

The following table shows the N-gram token count of "我爱中国。" with different n sizes.

n Size   N-gram Sequence                   Token Count
1        [我], [爱], [中], [国], [。]        5 = 5 – (1 – 1)
2        [我爱], [爱中], [中国], [国。]      4 = 5 – (2 – 1)
3        [我爱中], [爱中国], [中国。]        3 = 5 – (3 – 1)

In real natural language processing (NLP) tasks, we often want to get unigrams, bigrams, and trigrams together when we set N to 3. Similarly, when we set N to 4, we want unigrams, bigrams, trigrams, and four-grams together.

N-gram theory is very simple, and under some conditions it has a big advantage over dictionary-based tokenization methods, especially when the corpus you are working on contains many out-of-dictionary words or when you don't have a dictionary at all.

How to get N-grams with SAS?

SAS is a powerful language for manipulating data. Below you'll find a program I wrote that uses the DATA step to get N-grams.

data data_test;
   infile cards dlm='|' missover;
   input _document_ text :$100.;
cards;
1|我爱中国。
;
run;
 
data NGRAMS;
   set data_test;
   _tmpStr_ = text;                              /* working copy of the text                */
   do while (klength(_tmpStr_)>0);               /* each pass starts at the next character  */
      _maxN_=min(klength(_tmpStr_), 3);          /* N-grams up to length 3                  */
      do _i_=1 to _maxN_;
         _term_ = ksubstr(_tmpStr_, 1, _i_);     /* first _i_ characters form an _i_-gram   */
         output;
      end;
      if klength(_tmpStr_)>1 then _tmpStr_ = ksubstr(_tmpStr_, 2);  /* drop first character */
      else _tmpStr_ = '';
   end;
   keep _document_ _term_ _i_;
run;

Let's see the SAS results.

proc sort data=NGRAMS;
   by _document_ _i_;
run;
 
proc print; run;

N-gram results

N-gram tokenization is the first step of many NLP tasks. For most NLP tasks, the second step is to calculate the term frequency–inverse document frequency (TF-IDF). Here's the approach:

tfidf(t, d, D) = tf(t, d) * idf(t, D)
idf(t, D) = log_e( total number of documents in D / number of documents that contain term t )

Here t denotes a term, d denotes a document, and D denotes the collection of documents.
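As a quick arithmetic check (the numbers here are for illustration only): if the collection has 3 documents, a term appears once in the current document, and the term occurs in 2 of the 3 documents, then tf = 1 and idf = log(3/2) ≈ 0.405, so tfidf ≈ 0.405. A one-off DATA step confirms this:

data _null_;
   tf        = 1;                    /* occurrences of the term in one document */
   totalDocs = 3;                    /* documents in the collection D           */
   numDocs   = 2;                    /* documents that contain the term         */
   idf   = log(totalDocs/numDocs);   /* natural log, as in the formula above    */
   tfidf = tf*idf;
   put idf= tfidf=;                  /* prints idf=0.405... tfidf=0.405...      */
run;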

Suppose that you need to process lots of documents -- let me show you how to do it using SAS Viya. I used these four steps.

Step 1: Start CAS Server and create a CAS library.

cas casauto host="host.example.com" port=5570;
libname mycas cas;
 
Step 2: Load your data into CAS.

To simplify the code, I use only three sentences for demo purposes.

data mycas.data_test;
   infile cards dlm='|' missover;
   input _document_ text :$100.;
cards;
1|我爱中国。
2|我是中国人。
3|我是山西人。
;
run;

Once the data is loaded into CAS, you can run the following code to check the column information and record count of your corpus.

proc cas;
  table.columnInfo / table="data_test";
run;
 
  table.recordCount / table="data_test";
run;
quit;

Step 3: Tokenize texts into N-grams

%macro TextToNgram(dsin=, docvar=, textvar=, N=, dsout=);
proc cas;
   loadactionset "dataStep";
   dscode =
      "data &dsout;
         set &dsin;
         length _term_ varchar(&N);
         _tmpStr_ = &textvar;
         do while (klength(_tmpStr_)>0);
            _maxN_=min(klength(_tmpStr_), &N);
            do _i_=1 to _maxN_;
              _term_ = ksubstr(_tmpStr_, 1, _i_);
              output;
            end;
            if klength(_tmpStr_)>1 then _tmpStr_ = ksubstr(_tmpStr_, 2); 
            else _tmpStr_ = ''; 
         end;
         keep &docvar _term_;
      run;";
   runCode code = dscode; 
run;
quit;
%mend TextToNgram;
 
%TextToNgram(dsin=data_test, docvar=_document_, textvar=text, N=3, dsout=NGRAMS);

Step 4: Calculate TF-IDF.

%macro NgramTfidfCount(dsin=, docvar=, termvar=, dsout=);
proc cas;
/* term frequency: count of each term within each document */
simple.groupBy / table={name="&dsin"}
                 inputs={"&docvar", "&termvar"}
                 aggregator="n" 
                 casout={name="NGRAMS_Count", replace=true};
run;
quit;
 
proc cas;
/* unique document-term pairs */
simple.groupBy / table={name="&dsin"}
                 inputs={"&docvar", "&termvar"}
                 casout={name="term_doc_nodup", replace=true};
run;
 
/* unique documents, used to get the total document count */
simple.groupBy / table={name="term_doc_nodup"}
                 inputs={"&docvar"}
                 casout={name="doc_nodup", replace=true};
run;
numRows result=r/ table={name="doc_nodup"};
totalDocs = r.numrows;
run;
 
/* number of documents that contain each term */
simple.groupBy / table={name="term_doc_nodup"}
                 inputs={"&termvar"}
                 aggregator="n" 
                 casout={name="term_numdocs", replace=true};
run;
 
mergePgm = 
    "data &dsout;"
      || "merge NGRAMS_Count(keep=&docvar &termvar _score_ rename=(_score_=tf))
            term_numdocs(keep=&termvar _score_ rename=(_score_=numDocs));"
      || "by &termvar;"
      || "idf=log("||totalDocs||"/numDocs);"
      || "tfidf=tf*idf;"
      || "run;";
print mergePgm;
dataStep.runCode / code=mergePgm;
run;
quit;
%mend NgramTfidfCount;
 
%NgramTfidfCount(dsin=NGRAMS, docvar=_document_, termvar=_term_, dsout=NGRAMS_TFIDF);

Now let's see the TFIDF result of the first sentence.

proc print data=mycas.NGRAMS_TFIDF;
   where _document_=1;
run;

ngram results

These N-gram methods are not designed only for Chinese documents; documents in any language can be tokenized this way. However, the tokenization granularity for English documents differs from Chinese documents: it is word-based rather than character-based. To handle English documents, you only need to make small changes to my code.
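For example, here is a minimal word-based sketch (my own variation, not part of the original program). It assumes whitespace-delimited English text in the same data_test layout and produces unigrams through trigrams; punctuation handling is ignored for simplicity, and NGRAMS_EN is just a placeholder name.

data NGRAMS_EN;
   set data_test;
   length _term_ $200;
   _nWords_ = countw(text, ' ');                 /* word-based rather than character-based */
   do _start_ = 1 to _nWords_;                   /* each word starts a new N-gram          */
      _maxN_ = min(_nWords_ - _start_ + 1, 3);   /* up to trigrams                         */
      do _i_ = 1 to _maxN_;
         _term_ = scan(text, _start_, ' ');
         do _j_ = 2 to _i_;                      /* append the next _i_-1 words            */
            _term_ = catx(' ', _term_, scan(text, _start_ + _j_ - 1, ' '));
         end;
         output;
      end;
   end;
   keep _document_ _term_ _i_;
run;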

How to get N-grams and TF-IDF count from Chinese documents was published on SAS Users.

July 5, 2018
 

Our company talks to utilities all over the world about the value of analytics and "the digital utility," and we share analytics use cases across: assets and operations; customers; portfolio; and corporate operations (see diagram below). In this third post of my four-part series, I'll highlight a customer analytics use case. Customer [...]

Analytics use cases for utilities: Customer analytics was published on SAS Voices by David Pope

July 5, 2018
 

SAS enables you to evaluate a regression model at any location within the range of the data. However, sometimes you might be interested in how the predicted response is increasing or decreasing at specified locations. You can use finite differences to compute the slope (first derivative) of a regression model. This numerical approximation technique is most useful for nonparametric regression models that cannot be simply written in terms of an analytical formula.

A nonparametric model of drug absorption

The following data are the hypothetical concentrations of a drug in a patient's bloodstream at times (measured in hours) during a 72-hour period after the drug is administered. For a real drug, a pharmacokinetic researcher might construct a parametric model and fit the model by using PROC NLMIXED. The following example uses the EFFECT statement to fit a regression model that uses cubic splines. Other nonparametric models in SAS include loess curves, generalized additive models, adaptive regression, and thin-plate splines.

data Drug;
input Time Concentration @@;
datalines;
1  3    3  7   6 19  12 73  18 81
24 71  36 38  42 28  48 20  72 12
;
 
proc glmselect data=Drug;
   effect spl = spline(Time/ naturalcubic basis=tpf(noint) knotmethod=percentiles(5));
   model Concentration = spl / selection=none; /* fit model by using spline effects */
   store out=SplineModel;                      /* store model for future scoring */
quit;

Because the data are not evenly distributed in time, a graph of the spline fit evaluated at the data points does not adequately show the response curve. Notice that the call to PROC GLMSELECT used a STORE statement to store the model to an item store. You can use PROC PLM to score the model on a uniform grid of values to visualize the regression model:

/* use uniform grid to visualize curve */
data ScoreData;
do Time = 0 to 72;
   output;
end;
run;
 
/* score the model on the uniform grid */
proc plm restore=SplineModel noprint;
   score data=ScoreData out=ScoreResults;
run;
 
/* merge fitted curve with original data and plot the fitted curve */
data All;
set Drug ScoreResults;
run;
 
title "Observed and Predicted Blood Content of Drug";
proc sgplot data=All noautolegend; 
   scatter x=Time y=Concentration;
   series x=Time y=Predicted / name="fit" legendlabel="Predicted Response";
   keylegend "fit" / position=NE location=inside opaque;
   xaxis grid values=(0 to 72 by 12) label="Hours";
   yaxis grid;
run;
Nonparametric model: How can you evaluate the derivative?

Finite difference approximations for derivatives of nonparametric models

A researcher might be interested in knowing the slope of the regression curve at certain time points. The slope indicates the rate of change of the response variable (the blood-level concentration). Because nonparametric regression curves do not have explicit formulas, you cannot use calculus to compute a derivative. However, you can use finite difference formulas to compute a numerical derivative at any point.

There are several finite difference formulas for the first derivative. The forward and backward formulas are less accurate than the central difference formula. Let h be a small value. Then the approximate derivative of a function f at a point t is given by
f′(t) ≈ [ f(t + h) – f(t – h) ] / (2h)

The formula says that you can approximate the slope at t by evaluating the model at the points t ± h. You can use the DIF function to compute the difference between the response function at adjacent time points and divide that difference by 2h. The code that scores the model is similar to the more-familiar case of scoring on a uniform grid. The following statements evaluate the derivative of the model at six-hour intervals.
/* compute derivatives at specified points */
data ScoreData;
h = 0.5e-5;
do t = 6 to 48 by 6;        /* six-hour intervals */
   Time = t - h;  output;
   Time = t + h;  output;
end;
keep t Time h;
run;
 
/* score the model at each time point */
proc plm restore=SplineModel noprint;
   score data=ScoreData out=ScoreOut;  /* Predicted column contains f(x+h) and f(x-h) */
run;
 
/* compute first derivative by using central difference formula */
data Deriv;
set ScoreOut;
Slope = dif(Predicted) / (2*h);  /* [f(x+h) - f(x-h)] / 2h */
Time = t;                        /* estimate slope at this time */
if mod(_N_,2)=0;                 /* process observations in pairs; drop even obs */
drop h t;
run;
 
proc print data=Deriv;
run;

The output shows that after six hours the drug is entering the bloodstream at 6.8 units per hour. By 18 hours, the rate of absorption has slowed to 0.6 units per hour. After 24 hours, the rate of absorption is negative, which means that the blood-level concentration is decreasing. At approximately 30 hours, the drug is leaving the bloodstream at the rate of -3.5 units per hour.

This technique generalizes to other nonparametric models. If you can score the model, you can use the central difference formula to approximate the first derivative in the interior of the data range.

For more about numerical derivatives, including a finite-difference approximation of the second derivative, see Warren Kuhfeld's article on derivatives for penalized B-splines. Warren's article is focused on how to obtain the predicted values that are generated by the built-in regression models in PROC SGPLOT (LOESS and PBSPLINE), but it contains derivative formulas that apply to any regression curve.
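For reference, a standard central difference approximation of the second derivative (a general formula, not specific to Warren's article) is

f″(t) ≈ [ f(t + h) – 2 f(t) + f(t – h) ] / h²

You can compute it with the same scoring approach by evaluating the model at the three points t – h, t, and t + h.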

The post Compute derivatives for nonparametric regression models appeared first on The DO Loop.

July 3, 2018
 

In FiveThirtyEight's Significant Digits section, they recently mentioned that nine states could potentially have 50% or more females in their legislature after the upcoming election. My first thought - I need to see some graphs! They linked to a New York Times article that had some interesting graphs. There was [...]

The post When will women control state legislatures? appeared first on SAS Learning Post.


July 3, 2018
 

At SAS, we love data. Data is central to our corporate vision: to transform a world of data into a world of intelligence. We're also famous for enjoying M&Ms, but to us they are more than a sweet snack. They're also another source of data.

My colleague Pete Privitera, with a team of like-minded "makers," built a device that they named SnackBot. SnackBot is an internet-connected sensor that measures the flow of M&Ms in a particular SAS break room. There's a lot to love about this project. You can learn more by watching its origin story in this video:

As the number of M&Ms changes, SnackBot takes a reading and records the M&M count in a database. Most readings reflect a decrease in candy pieces, as my colleagues help themselves to a treat. But once per week, the reading shows a drastic increase -- as our facilities staff restocks the canister. SnackBot has its own website. It also has its own API, and you know what that means (right?). It means that we can use SAS to read and analyze the sensor data.

Reading sensor data into SAS with the SnackBot API

The SnackBot system offers a REST API with a JSON data response. As with any REST API, we can use PROC HTTP to fetch the data, and the JSON library engine to parse the response into a SAS data set.

%let start = '20MAY2018:0:0:0'dt;
 
/* format the start/end per the API needs */
%let start_time= %sysfunc(putn(&start.,is8601dt26.));
%let end_time=   %sysfunc(datetime(),is8601dt26.);
 
/* Call the SnackBot API from snackbot.net */
filename resp temp;
proc http
  method="GET"
  url="http://snackbot.net/snackdata?start_time=&start_time.%str(&)end_time=&end_time.%str(&)utc_offset_minutes=-240"
  out=resp;
run;
 
/* JSON libname engine to read the result       */
/* Simple record layout, all in the ROOT member */
libname mms json fileref=resp;
data mmlevels;
  set mms.root;
run;

I've written about how to use SAS with REST APIs in several other blog posts, so I won't dwell on this part of the process here. This short program retrieves the raw data from SnackBot, which represents a series of M&M "levels" (count of remaining pieces) and a timestamp for each measurement. It's a good start. Though there are only two fields to work with here, there's quite a bit we can do with these data.

raw SnackBot data

Add features to the raw sensor data

With a few additional DATA step statements and some built-in SAS formats, we can derive several interesting characteristics of these data for use in further analysis.

First, we need to convert the character-formatted datetime field to a proper SAS datetime value. That's easily achieved with the INPUT function and the ANYDTDTM informat. (Rick Wicklin wrote a helpful article about how the ANYDT* informats work.)

data mmlevels;
  set mms.root;
  drop ordinal_root timestamp;
  /* Convert the TIMESTAMP field to native value -- it's a character */
  datetime = input(timestamp, anydtdtm.);
  date = datepart(datetime);
  time = timepart(datetime);
  dow = date;
  qhour = round(datetime,'0:15:0'T);
  format  datetime datetime20. 
          qhour datetime20.
          date date9.
          time timeampm10.
          dow downame.;
run;

For convenience, I duplicated the datetime value a few times and applied different formats so we can get different views of the same value: datetime, just the date, just the time-of-day, and the day-of-week. I also used the ROUND function to "round" the raw datetime value to the nearest quarter hour. I'll explain why I've done that in a later step, but the ROUNDing of a time value is one of the documented unusual uses of the ROUND function.
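To make the quarter-hour rounding concrete, here's a tiny standalone check (the timestamp is invented for illustration):

data _null_;
   dt = '20MAY2018:11:08:42'dt;
   qhour = round(dt, '0:15:0'T);            /* snap to the nearest multiple of 15 minutes */
   put dt= datetime20. qhour= datetime20.;  /* qhour=20MAY2018:11:15:00                   */
run;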

SnackBot data with features

Even with this small amount of data preparation, we can begin to analyze the characteristics of these data. For example, let's look at the descriptive stats for the data classified by day-of-week:

title "SnackBot readings per day-of-week";
proc means data=mmlevels mean stddev max min;
 var pieces;
 class dow;
run;

SnackBot by day of week

The "N Obs" column shows the number of measurements taken over the entire "study period" broken down by day-of-week. If a measurement is a proxy for a "number-of-pieces-changed" event, then we can see that most events happen on Wednesday, Thursday, and Friday. From this, can you guess which day the M&M canister is refilled?

Let's take another slice through these data, but this time looking at time-of-day. For this, I used PROC FREQ to count the measurements by hour. I applied the HOUR2. format, which allows the SAS procedure to group these data into hour-long intervals with no need for additional data prep. (I've written previously about how to use SAS formats to derive new categories without expensive data rewriting.) Then I used PROC SGPLOT to produce a step plot for the 24-hour cycle.

/* Count of readings per hour of the day */ 
title "SnackBot readings per hour";
proc freq data=mmlevels ;
 table time / out=perhour;
 format time hour2.;
run;
 
ods graphics / height=400 width=800;
 
title "SnackBot readings per hour";
proc sgplot data=perhour des="Readings per hour of day";
 step x=time y=count;
 xaxis min='0:0:0't max='24:0:0't label="Time of day" grid;
 yaxis label="Servings";
run;

SnackBot hour step

From the chart, we can see that most M&M "events" happen around 11am, and then again between 2pm and 4pm. From personal experience, I can confirm that those are the times when I hear the M&Ms calling to me.

Expand the time series to regular intervals

The SnackBot website can tell you how many M&Ms are remaining right now. But what if you want to know how many were remaining last Friday? Or on any typical Monday morning?

The sensor data that we've analyzed so far is sparse -- that is, there are data entries for each "change" event, but not for every discrete time interval in the study period. I don't know how the SnackBot sensor records its readings -- it might sample the M&M levels every minute, or every second. Regardless, the API reports (and probably stores) only the records that represent a change. If SnackBot records that the final 24 pieces were depleted at 25JUN2018:07:45:00 (a Monday morning), bringing the count to 0, how many M&Ms remain at 1pm later that day? The data don't tell us explicitly with a recorded reading. But we can assume at that point that the count was still 0. The next recorded reading occurs at 27JUN2018:10:30:00 (a Wednesday), bringing the count to 1332 -- oh joy!

If we want to create a useful time series visualization of the M&M candy counts over time, we need to expand the time series from these sparse recordings to regular intervals. SAS offers a few sophisticated time series procedures to accomplish this: PROC EXPAND, PROC TIMESERIES, and PROC TIMEDATA. Each of these offers powerful econometrics methods for interpolation and forecasting -- and that's more than we need for this situation. For my example, I took a more low-tech approach.
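If you do have SAS/ETS licensed, a step-function interpolation with PROC EXPAND is one possible alternative. This is only a rough, untested sketch (the output table name mm_expanded is a placeholder), not the approach used below:

proc expand data=mmlevels out=mm_expanded to=minute15;
   id datetime;                                      /* datetime of each reading     */
   convert pieces / observed=beginning method=step;  /* carry each level forward     */
run;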

First, I created an empty data set with datetime entries at quarter-hour intervals, covering the study period of the data we're looking at.

/* Empty data set with 15 minute interval slots    */
/* Regular intervals for the entire "study" period */
data timeslots;
  last = datetime();
  length qhour 8;
  format qhour datetime20.;
  drop last i;
  do i = &start. to last by '0:15:00't;
    qhour = i;
    output;
  end;
run;

Then I used a DATA step to merge these empty slots with the actual event data that I had rounded to the nearest quarter hour (remember that?):

/* Merge the sample data with the timeslots */
data expand;
  merge mmlevels(keep=pieces qhour) timeslots;
  by qhour;
run;

Finally, I used a variation of a last-observation-carried-forward (LOCF) approach to fill in the remaining empty slots. If a reading at 20MAY2018:11:15:00 reports 132 pieces remaining, then that value should be RETAINed for each 15-minute slot until the next reading at 20MAY2018:17:30:00. (That reading is 82 pieces -- meaning somebody helped themselves to 50 pieces. Recommended serving size for plain M&Ms is 20 pieces, but I'm not passing judgement.) I also recorded a text value for the day-of-week to help with the final visualization.

/* for empty timeslots, carry the sample data   */
/* forward, so we always have a count of pieces */
/* Variation on a LOCF technique                */
data final;
  set expand;
  length day $ 3;
  /* 3-char value for day of week */
  day=put(datepart(qhour),weekdate3.);
  retain hold;
  if not missing(pieces) then
    hold=pieces;
  else pieces=hold;
  drop hold;
  if not missing(pieces);
run;

Now I have data that represents the regular intervals that we need.

SnackBot regular intervals

Putting it all together

For my final visualization, I created a series plot for the study period. It shows the rise and fall of M&Ms levels in one SAS break room over several weeks. For additional "color", I annotated the plot with a block chart to delineate the days of the week.

title 'Plain M&M pieces on S1 tracked by SnackBot';
ods graphics / height=300 width=1600;
 
proc sgplot data=final des='M&M pieces tracked by SnackBot';
 
  /* plot the data as a series */ 
  series x=qhour y=pieces / lineattrs=(color=navy thickness=3px);
 
  /* Yes, these are the "official" M&M colors               */
  /* Will be applied in data-order, so works best when data */
  /* begins on a Sunday                                     */
  styleattrs datacolors=(red orange yellow green blue CX593B18 red);
  /* block areas to indicate days-of-week                   */
  block x=qhour block=day / transparency=0.65
    valueattrs=(weight=bold size=10pt color=navy);
 
  xaxis minor display=(nolabel);
  yaxis display=(nolabel) grid max=1600 minor;
run;

You can see the pattern. M&Ms are typically filled on Wednesday to the canister capacity of about 1400 pieces. We usually enter into the weekend with 0 remaining, but there are exceptions. The week of May 27 was our Memorial Day holiday, which explains the lack of activity on Monday (and even Tuesday) during that week as SAS folks took advantage of a slow week with their vacation plans.

SnackBot visualization

More about SAS and M&Ms data

You can download the complete code for this example from my public Gist on GitHub. The example code should work with SAS University Edition and SAS OnDemand for Academics, as well as with any SAS environment that can reach the internet with PROC HTTP.

For more M&M data fun, check out Rick Wicklin's article about the distribution of colors in plain M&Ms. SnackBot does not (yet) report on how many and which color of M&Ms are taken per serving, but using statistics, we can predict that!

The post The Internet of Snacks: SnackBot data and what it reveals about SAS life appeared first on The SAS Dummy.

July 2, 2018
 

Looking to connect and engage with other like-minded people in a meaningful way? A #SASchat might be just what you're looking for! #SASChat is SAS' version of a Twitter chat: a live, online event that works like any get-together. Participating in a #SASchat is a great way to: Engage with an [...]

Everything you need to know about the weekly #SASChat was published on SAS Voices by Kristine Vick

July 2, 2018
 
Who was the greatest US president? Presidential greatness scores through 2018

Which president of the United States is ranked the greatest by presidential historians? This article visualizes the results of the 2018 Presidential Greatness Survey, which was created and administered by B. Rottinghaus and J. Vaughn. They analyzed 166 responses from experts in political science who ranked the 44 US presidents on a 0–100 "greatness scale" where 0 represents abject failure, 50 is average, and 100 is great. The survey results are presented in their 2018 report.

Table 1 of the report (p. 2-3) provides the average greatness scores of the presidents among all respondents. Just as "beauty is in the eye of the beholder," so too "greatness" is a subjective notion that depends on the values and beliefs of the assessor. Table 2A (p. 6-7) provides the average greatness scores for respondents by the political party and political ideology of the respondents. Of the respondents, 57.2% were Democrats, 30.1% were Independent or Other, and 12.7% were Republicans. Similarly, 58.4% self-identified as liberal or somewhat liberal, 24.1% self-identified as moderate, and 17.5% reported being conservative or somewhat conservative. The relatively large proportion of liberals should be considered when interpreting the survey results.

I imported the survey results into SAS and combined them to produce a visualization of the data. The graph at the right (click to enlarge) shows the presidents ranked by their overall mean greatness score, which is shown by an X. The mean scores for the conservative, moderate, and liberal respondents are shown as filled circles. The party of the president is indicated by a colored bar that shows the range of scores.
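A rough sketch of how such a graph can be built with PROC SGPLOT follows. The data set and variable names here are hypothetical placeholders, not the actual program (which you can download at the end of this article):

proc sgplot data=greatness;                          /* one row per president, in rank order */
   highlow y=president low=loScore high=hiScore /    /* colored bar shows range of scores    */
           group=party type=bar transparency=0.5;
   scatter y=president x=consScore / markerattrs=(symbol=circlefilled);
   scatter y=president x=modScore  / markerattrs=(symbol=circlefilled);
   scatter y=president x=libScore  / markerattrs=(symbol=circlefilled);
   scatter y=president x=meanScore / markerattrs=(symbol=X size=9);     /* overall mean      */
   yaxis discreteorder=data reverse display=(nolabel);
   xaxis label="Mean greatness score" grid;
run;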

A few results are apparent from the graph:

  • Lincoln, Washington, FDR, Teddy Roosevelt, and Jefferson are the five greatest presidents according to these experts.
  • Obama and Reagan are two recent presidents who are ranked highly.
  • Statistically speaking, there are large groups of presidents for which the relative ranking is uncertain. For example, the presidents ranked 12–20 (Madison through Monroe) have mean scores that are similar.
  • The current president, Donald Trump, is ranked last. In fairness, Trump had not yet completed his first year in office at the time of the survey (December 2017). The authors of the report dedicate several pages (p. 10–13) to analyzing a survey question about who was the Most Polarizing president, a category that Trump won handily.
  • Other presidents who scored low include James Buchanan (the "Do-Nothing President"), William Henry Harrison (who died 31 days into his term), and Franklin Pierce (a weak leader in a contentious pre-Civil War era).

You can also graph the data by using the political parties of the experts rather than their ideologies. The graph is very similar.

Bias and disagreement in ranking presidents

One of the most interesting aspects of these data is the relative widths of the ranges of most 20th and 21st century presidents. These wide intervals are caused when experts of one political ideology judge a president much differently than experts from another ideology. Typically, the conservative-liberal disparity amounts to 10–15 points on the "greatness" scale, but for several presidents (Coolidge and Obama) the average conservative score is more than 20 points different from the average liberal score.

This disparity is evidence of the partisan state of politics in the US. Even these experts, who presumably use historical facts rather than opinions to guide their ranking, are not immune from seeing current events through the lens of their own personal values and biases.

To better visualize the range of disagreement among the experts, I graphed the difference between the conservative and liberal scores for each president. Again, the color of the line indicates the political party of the president. The 20th century began with McKinley's presidency. For almost every president since McKinley, the gap for Democratic presidents is negative, indicating that the conservative experts judged the presidency less favorably than the liberal experts. For Republican presidents, the direction of the gap is reversed: the conservative experts ranked the presidency more favorably than their liberal counterparts. The exception is Teddy Roosevelt, whose progressive agenda (the "Square Deal" and conservation measures such as national parks and forests) appeals to liberal experts.

Summary

Rottinghaus and Vaughn's 2018 Presidential Greatness Survey analyzes the opinions of 166 experts as to which US presidents are considered great, average, or a failure. The first graph in this blog post shows that although there is general agreement among experts for many 18th and 19th century presidents (Lincoln and Washington were great; Pierce, W. H. Harrison, and Buchanan were not.), there is less agreement for modern presidents. As seen in the second graph, the variation tends to correlate along ideological grounds, with conservatives and liberals having very different views about which presidents were great.

You can download the SAS program that creates the graphs in this article. For a different visualization of these data, see Robert Lawrence's article "All the U.S. Presidents, Ranked by 'Greatness'."

The post Ranking US presidents appeared first on The DO Loop.