My colleague, Robert Allison, recently published an interesting visualization of the relationship between chess ratings and age. His post was inspired by the article "Age vs Elo — Your battle against time," which was published on the chess.com website. ("Elo" is one of the rating systems in chess.) Robert Allison's article indicates how to download the 2014 Elo ratings for 70,963 active players into a SAS data set. You can use PROC SGPLOT to create a scatter plot of the rating and age of each player. Because the graph displays so many observations, I followed Robert's advice and used semi-transparency to bring out hidden features in the data. I also colored each marker according to whether the player is male (blue) or female (light red). The results are shown below:

As pointed out in the previous articles, ratings depend on age in a nonlinear fashion. The ratings for young players tend to be less than for young adults. Ratings tend to be highest for players in their mid to late twenties. After age 30, ratings tend to decrease with age. The purpose of this article is to use statistical regression to predict ratings from age.

### Predict percentiles of Elo rating as a function of age

The chess.com article shows a line plot that is formed by binning the players' ages into five-year age ranges, computing the average rating in each bin, and then connecting the means in each age group. This bin-and-connect-the-means method is less powerful than regression, but approximates the nonlinear relationship between age and the average rating for that age. A more powerful statistical technique is quantile regression. In quantile regression, you model the percentiles of the response variable, such as the 25th, 50th, or 90th percentile of the distribution of ratings for a given age. In other words, quantile regression enables you to compute separate predictions for the ratings of poor players, for good players, and for great players. A previous article compares quantile regression with the simpler binning technique.

I used the QUANTREG procedure in SAS to performs quantile regression to model the 10th, 25th, 50th, 75th, and 90th percentiles of rating as a function of age. (Quantiles are numbers between 0 and 1, whereas percentiles are between 0 and 100, but otherwise the terms are interchangeable.) I ignored the gender of the players. The following graph is the default plot that is created by PROC QUANTREG. The lowest curve shows the predicted rating for the 10th percentile of players (the weak players) as a function of age. The middle curve shows the predicted rating for the 50th percentile of players (the average players). The highest curve shows the predicted rating for the 90th percentile, who are excellent chess players.

For the shown percentiles, the predicted chess ratings increase quickly for young players, then peak for players in their mid-twenties. This happens for all percentiles, from the 10th to the 90th. For example, the median player (50th percentile) at age 29 has a rating of about 2050. In contrast, the 90th percentile for a 29-year-old is about 2356.

This plot should remind you of a children's weight and height charts in a doctor's office. Just as the mother of a young child might use the doctor's chart to estimate her child's percentile for weight and height, so too can a chess player use this chart to estimate his percentile for rating. If you are a chess player, find your age on the horizontal axis. Then trace a line straight up until you reach your personal Elo rating. Use the relative position of the surrounding curves to estimate your rating percentile. For example, a 40-year-old with a 1900 rating could conclude from the graph that he is between the 25th and 50th percentile, but closer to the 50th.

As I said, this graph is created by default, but we can do better. In the following section, I add separate curves for women and men. I also add grid lines to make it easier to estimate coordinates in the graph.

### Adding gender to the predict percentiles

Chess is enjoyed by both men and women. If you add gender to the quantile regression model, you obtain separate curves for men and women for each percentile. The following graph shows the predicted percentiles for the 10th, 50th, and 90th percentiles, for both men and women. Notice that the predicted male ratings are higher than the predicted female ratings at every age and for each quantile. Notice also that the predicted scores for females tend to peak later in life: about 30 years for females versus 25 for males. After peaking, the female curves drop off more quickly than their male counterparts.

### Percentiles for Elo chess ratings by gender

The previous section overlays the male and female percentile curves. This makes it easy to compare males and females, but even with only three quantiles, the graph is crowded. If you want to display five or more percentile curves, it makes sense to separate the graph into a panel that contains two graphs, one for males and one for females. Again, this is similar to the height-weight percentile charts in the pediatrician offices, which often have separate charts for boys and girls.

The following panel shows side-by-side predictions for quantiles of ratings as a function of age. If you are a chess player, you can use these charts to estimate your rating percentile. For your current age and gender, what is your percentile?

In summary, I used data for Elo ratings in chess to build a quantile regression model that predicts percentiles of ratings for a given age and gender. Using these graphs, you can see the predicted Elo ratings for poor, good, and great chess players for every combination of age and gender. If you are a chess player yourself, you can locate your age-score value in one of the plots and estimate your percentile among your age-gender peer group.

If you are a SAS user and are interested in the details, you can download the SAS program that creates the analysis and graphs. It uses PROC QUANTREG to predict the quantile curves, and uses tips and techniques from the articles "How to score and graph a quantile regression model in SAS" and "Plot curves for levels of two categorical variables in SAS."

The post A quantile regression analysis of chess ratings by age appeared first on The DO Loop.

With analytics, brands can see the world as their customers do ‒ and shape customer experience in real time. And according to latest Harvard Business Review Analytics Services Report, Real-Time Analytics: The Key to Unlocking Customer Insights & Driving the Customer Experience, real-time analytics is a critical enabler. There are [...]

I am unabashedly a data zealot, but not a data scientist. I believe in the power of data and data analytics, and that the data revolution has started. It will transform all our lives in ways we cannot even imagine. This is one of the things that attracted me to [...]

The wizarding world of data was published on SAS Voices by John Maynard

Elections are in the news again, therefore I have been on the lookout for interesting graphs. I recently found some graphs of the Census Current Population Survey (CPS) Voting and Registration Supplement data, and tried to improve them. Follow along if you're interested in voter data, or creating better graphs! [...]

The post Survey says! .... US voters are _________. appeared first on SAS Learning Post.

Elections are in the news again, therefore I have been on the lookout for interesting graphs. I recently found some graphs of the Census Current Population Survey (CPS) Voting and Registration Supplement data, and tried to improve them. Follow along if you're interested in voter data, or creating better graphs! [...]

The post Survey says! .... US voters are _________. appeared first on SAS Learning Post.

Hash tables are a very powerful and flexible data structure. Most SAS applications of hash tables focus on just one of their many powerful facilities: table lookup. Hash tables are a fantastic table lookup tool and their use for that should never be diminished. However, hash tables can do so [...]

The post Five things you (probably) don’t know you can do with a hash table appeared first on SAS Learning Post.

Recently I’ve been listening to the BBC Radio Series 50 Things That Made the Modern Economy, which was first broadcast in 2016. One of the episodes considers the impact of a simple box (the shipping container) and concludes its invention was a major contributor to the post-war boom in global trade. It’s worth a listen, if you can.

Notwithstanding the tenuous link, containerization is having perhaps an equally significant impact on Cloud Computing and I want to share a recent experience which highlights the convenience of containers. I’m not aiming to summarize all the multiple SAS initiatives in the Cloud (including SAS Viya and Cloud Foundry) here rather it’s to share a few observations about a specific offering for SAS 9.4.

Recently I attended a demonstration by SAS’ Doug Liming on SAS Analytics for Containers. While this product was launched in 2016, until now I confess I’d not appreciated its simplicity or potential. I’d like to use this blog post to share what I saw & learned because this session served as a bit of an epiphany for me.

As a reminder SAS Analytics for Containers consists of:

• Foundation SAS (Base, STAT & Graph) ready-packaged to be deployed in a Docker container.
• SAS Studio.
• Optional SAS/Access connectors & Accelerators.

In the space of 20 minutes, Doug took us through the The power and potential of simplicity: SAS 9.4 and Containers was published on SAS Users.

We’re bringing the concept of #VideoTag to LinkedIn. What's #VideoTag, you ask? It's an online adaptation of the old schoolyard game. In short, you record a video of yourself, upload it to LinkedIn and tag others to respond. It’s a fun, easy way to spur conversation online by showing your [...]