8月 142018
 

Even though I’ve worked at SAS for nearly 30 years, I still get excited when great things come together for our customers! This year we are hosting the very first hackathon at our Analytics Experience conference in San Diego - the AnalyticsX Hackathon. Analytics Experience is in its third year [...]

The post Announcing the AnalyticsX Hackathon: Are you up for it? appeared first on SAS Learning Post.

8月 132018
 

Data in the cloud makes it easily accessible, and can help businesses run more smoothly. SAS Viya runs its calculations on Cloud Analytics Service (CAS). David Shannon of Amadeus Software spoke at SAS Global Forum 2018 and presented his paper, Come On, Baby, Light my SAS Viya: Programming for CAS. (In addition to being an avid SAS user and partner, David must be an avid Doors fan.) This article summarizes David's overview of how to run SAS programs in SAS Viya and how to use CAS sessions and libraries.

If you're using SAS Viya, you're going to need to know the basics of CAS to be able to perform calculations and use SAS Viya to its potential. SAS 9 programs are compatible with SAS Viya, and will run as-is through the CAS engine.

Using CAS sessions and libraries

Use a CAS statement to kick off a session, then use CAS libraries (caslibs) to store data and resources. To start the session, simply code "cas;" Each CAS session is given its own unique identifier (UUID) that you can use to reconnect to the session.

Handpicked Related VIDEO: SAS programming in the cloud: CASL code

There are a few significant codes that can help you to master CAS operations. Consider these examples, based on a CAS session that David labeled "speedyanalytics":

  • What CAS sessions do I have running?
    cas _all_ list;
  • Get the version and license specifics from the CAS server hosting my session:
    cas speedyanalytics listabout;
  • I want to sign out of SAS Studio for now, so I will disconnect from my CAS session, but return to it later…
    cas speedyanalytics disconnect;
  • ...later in the same or different SAS Studio session, I want to reconnect to the CAS session I started earlier using the UUID I previous grabbed from the macro variable or SAS log:
    cas uuid="&speedyanalytics_uuid";
  • At the end of my program(s), shutdown all my CAS sessions to release resources on the server:
    cas _all_ terminate;

Using CAS libraries

CAS libraries (caslib) are the method to access data that is being stored in memory, as well as the related metadata.

From the library, you can load data into CAS tables in a couple of different ways:

  1. Takes a sample data set, calculate a new measure and stores the output in memory
  2. Proc COPY can bring existing SAS data into a caslib
  3. Proc CASUTIL loads tables into caslibs

The Proc CASUTIL allows you to save your tables (named "classsi" data in David's examples) for future use through the SAVE statement:

proc casutil;
 save casdata="classsi" casout="classsi";
run;

And reload like this in a future session, using the LOAD statement:

proc casutil;
 load casdata="classsi" casout="classsi";
run;

When accessing your CAS libraries, remember that there are multiple levels of scope that can apply. "Session" refers to data from just the current session, whereas "Global" allows you to reach data from all CAS sessions.

Programming in CAS

Showing how to put CAS into action, David shared this diagram of a typical load/save/share flow:

Existing SAS 9 programs and CAS code can both be run in SAS Viya. The calculations and data memory occurs through CAS, the Cloud Analytics Service. Before beginning, it's important to understand a general overview of CAS, to be able to access CAS libraries and your data. For more about CAS architecture, read this paper from CAS developer Jerry Pendergrass.

The performance case for SAS Viya

To close out his paper, David outlined a small experiment he ran to demonstrate performance advantages that can be seen by using SAS Viya v3.3 over a standard, stand-alone SAS v9.4 environment. The test was basic, but performed reads, writes, and analytics on a 5GB table. The tests revealed about a 50 percent increase in performance between CAS and SAS 9 (see the paper for a detailed table of comparison metrics). SAS Viya is engineered for distributive computing (which works especially well in cloud deployments), so more extensive tests could certainly reveal even further increases in performance in many use cases.

Additional resources

A quick introduction to CAS in SAS Viya was published on SAS Users.

8月 132018
 

In the oil industry you can make or lose money based on how good your forecasts are, so I’ve pulled together six papers that discuss different ways in which you can leverage analytics to optimize your output and more accurately predict your production performance. Written by employees at oil and [...]

6 real-world ways to use optimization and forecasting in the oil and gas industry was published on SAS Voices by David Pope

8月 132018
 

My colleague, Robert Allison, recently published an interesting visualization of the relationship between chess ratings and age. His post was inspired by the article "Age vs Elo — Your battle against time," which was published on the chess.com website. ("Elo" is one of the rating systems in chess.) Robert Allison's article indicates how to download the 2014 Elo ratings for 70,963 active players into a SAS data set. You can use PROC SGPLOT to create a scatter plot of the rating and age of each player. Because the graph displays so many observations, I followed Robert's advice and used semi-transparency to bring out hidden features in the data. I also colored each marker according to whether the player is male (blue) or female (light red). The results are shown below:

Elo rating in chess versus age for 70,000+ active chess players in 2014.

As pointed out in the previous articles, ratings depend on age in a nonlinear fashion. The ratings for young players tend to be less than for young adults. Ratings tend to be highest for players in their mid to late twenties. After age 30, ratings tend to decrease with age. The purpose of this article is to use statistical regression to predict ratings from age.

Predict percentiles of Elo rating as a function of age

The chess.com article shows a line plot that is formed by binning the players' ages into five-year age ranges, computing the average rating in each bin, and then connecting the means in each age group. This bin-and-connect-the-means method is less powerful than regression, but approximates the nonlinear relationship between age and the average rating for that age. A more powerful statistical technique is quantile regression. In quantile regression, you model the percentiles of the response variable, such as the 25th, 50th, or 90th percentile of the distribution of ratings for a given age. In other words, quantile regression enables you to compute separate predictions for the ratings of poor players, for good players, and for great players. A previous article compares quantile regression with the simpler binning technique.

I used the QUANTREG procedure in SAS to performs quantile regression to model the 10th, 25th, 50th, 75th, and 90th percentiles of rating as a function of age. (Quantiles are numbers between 0 and 1, whereas percentiles are between 0 and 100, but otherwise the terms are interchangeable.) I ignored the gender of the players. The following graph is the default plot that is created by PROC QUANTREG. The lowest curve shows the predicted rating for the 10th percentile of players (the weak players) as a function of age. The middle curve shows the predicted rating for the 50th percentile of players (the average players). The highest curve shows the predicted rating for the 90th percentile, who are excellent chess players.

Elo rating in chess versus age. The 10th, 25th, 50th, 75th, and 90th percentiles are shown for active players.

For the shown percentiles, the predicted chess ratings increase quickly for young players, then peak for players in their mid-twenties. This happens for all percentiles, from the 10th to the 90th. For example, the median player (50th percentile) at age 29 has a rating of about 2050. In contrast, the 90th percentile for a 29-year-old is about 2356.

This plot should remind you of a children's weight and height charts in a doctor's office. Just as the mother of a young child might use the doctor's chart to estimate her child's percentile for weight and height, so too can a chess player use this chart to estimate his percentile for rating. If you are a chess player, find your age on the horizontal axis. Then trace a line straight up until you reach your personal Elo rating. Use the relative position of the surrounding curves to estimate your rating percentile. For example, a 40-year-old with a 1900 rating could conclude from the graph that he is between the 25th and 50th percentile, but closer to the 50th.

As I said, this graph is created by default, but we can do better. In the following section, I add separate curves for women and men. I also add grid lines to make it easier to estimate coordinates in the graph.

Adding gender to the predict percentiles

Chess is enjoyed by both men and women. If you add gender to the quantile regression model, you obtain separate curves for men and women for each percentile. The following graph shows the predicted percentiles for the 10th, 50th, and 90th percentiles, for both men and women. Notice that the predicted male ratings are higher than the predicted female ratings at every age and for each quantile. Notice also that the predicted scores for females tend to peak later in life: about 30 years for females versus 25 for males. After peaking, the female curves drop off more quickly than their male counterparts.

Percentile curves for mean and women. Shown are the 10th, 50th, and 90th percentiles

Percentiles for Elo chess ratings by gender

The previous section overlays the male and female percentile curves. This makes it easy to compare males and females, but even with only three quantiles, the graph is crowded. If you want to display five or more percentile curves, it makes sense to separate the graph into a panel that contains two graphs, one for males and one for females. Again, this is similar to the height-weight percentile charts in the pediatrician offices, which often have separate charts for boys and girls.

The following panel shows side-by-side predictions for quantiles of ratings as a function of age. If you are a chess player, you can use these charts to estimate your rating percentile. For your current age and gender, what is your percentile?

In summary, I used data for Elo ratings in chess to build a quantile regression model that predicts percentiles of ratings for a given age and gender. Using these graphs, you can see the predicted Elo ratings for poor, good, and great chess players for every combination of age and gender. If you are a chess player yourself, you can locate your age-score value in one of the plots and estimate your percentile among your age-gender peer group.

If you are a SAS user and are interested in the details, you can download the SAS program that creates the analysis and graphs. It uses PROC QUANTREG to predict the quantile curves, and uses tips and techniques from the articles "How to score and graph a quantile regression model in SAS" and "Plot curves for levels of two categorical variables in SAS."

The post A quantile regression analysis of chess ratings by age appeared first on The DO Loop.

8月 112018
 

With analytics, brands can see the world as their customers do ‒ and shape customer experience in real time. And according to latest Harvard Business Review Analytics Services Report, Real-Time Analytics: The Key to Unlocking Customer Insights & Driving the Customer Experience, real-time analytics is a critical enabler. There are [...]

Who owns real-time customer interactions in the C-suite? was published on Customer Intelligence Blog.

8月 102018
 

Elections are in the news again, therefore I have been on the lookout for interesting graphs. I recently found some graphs of the Census Current Population Survey (CPS) Voting and Registration Supplement data, and tried to improve them. Follow along if you're interested in voter data, or creating better graphs! [...]

The post Survey says! .... US voters are _________. appeared first on SAS Learning Post.

8月 102018
 

Elections are in the news again, therefore I have been on the lookout for interesting graphs. I recently found some graphs of the Census Current Population Survey (CPS) Voting and Registration Supplement data, and tried to improve them. Follow along if you're interested in voter data, or creating better graphs! [...]

The post Survey says! .... US voters are _________. appeared first on SAS Learning Post.

8月 102018
 

Hash tables are a very powerful and flexible data structure. Most SAS applications of hash tables focus on just one of their many powerful facilities: table lookup. Hash tables are a fantastic table lookup tool and their use for that should never be diminished. However, hash tables can do so [...]

The post Five things you (probably) don’t know you can do with a hash table appeared first on SAS Learning Post.