My high school basketball coach started preparing us for the tournaments in the season’s first practice. He talked about the “long haul” of tournament basketball, and geared our strategies toward a successful run at the end of the season.
I thought about the “long haul” when considering my brackets for this year’s NCAA Tournament, and came to this conclusion: instead of seeking to predict who might win a particular game, I wanted to use analytics to identify which teams were most likely to win multiple games. The question that I sought to answer was simply, “Could I look at regular season data and recognize the characteristics inherent in teams that win multiple games in the NCAA Tournament?”
I prepared and extracted features from data representing the last 5 regular seasons. I took tournament data from the same period and counted the number of wins per team (per season). This number would be my target value (0, 1, 2, 3, 4, 5, or 6 wins). Only teams that participated in the tournaments were included in the analysis.
I used SAS Enterprise Miner’s High Performance Random Forest Node to build 10,000 trees (in less than 14 seconds), and I determined my “top 10 stats” by simply observing which factors were split on the most.
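The same idea can be sketched outside of SAS Enterprise Miner. The snippet below is a hypothetical Python analogue (not the author’s actual flow, and the feature names and synthetic data are invented for illustration): train a random forest and rank features by how often they are chosen as split variables across all trees.

```python
# Hypothetical sketch: rank features by split counts in a random forest.
# The feature names and synthetic data below are illustrative only.
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["win_pct", "road_win_pct", "ato", "margin_max", "blocks_pg"]
X = rng.random((300, len(features)))
# Toy target: tournament wins (0-6), loosely driven by winning percentage.
y = np.minimum(6, (X[:, 0] * 7).astype(int))

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Count how many internal nodes split on each feature across all trees.
split_counts = Counter()
for tree in forest.estimators_:
    for f in tree.tree_.feature:
        if f >= 0:  # negative values mark leaf nodes
            split_counts[features[f]] += 1

for name, count in split_counts.most_common():
    print(name, count)
```

With the toy target driven by winning percentage, `win_pct` dominates the split counts, which mirrors how the “top 10 stats” ranking below was read off the forest.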
Here are the results, my “top 10 statistics to consider” (remember that the statistics represented are from the regular season, not the tournament).
1 --- Winning Percentage. Winners win, right? The further a team moves into the tournament, the more evident this becomes.
- Teams that win a single game have an average winning percentage of .729
- Teams that win 6 games have an average winning percentage of .858
- No team that has won a Final Four game over the last 5 years has a winning percentage less than .706
- Teams that won 6 games have a minimum winning percentage of .765.
2 --- Winning games by wide margins. Teams that advance in the tournament have beaten teams by wide margins during the regular season – meaning that at some point over the course of the year, a team went all out and won big! From a former player’s perspective, it doesn’t matter “who” you beat by a wide margin, but rather whether you have the drive to crush the opponent.
- Teams that won 6 games have beaten some team by 49 points, differentiating themselves from even the 5-win teams by 9 points!
3 --- The ratio of assists to turnovers (ATO). Teams that take care of and distribute the ball tend to be making assists instead of turnovers. From my perspective, the ATO indicates whether or not a team dictates the action.
- Over the last 5 years, no team that won 6 games had an ATO less than 1.19!
- Teams that have won at least 5 had an average ATO of 1.04.
- Teams that won less than 5 had average ATOs of less than 1.
4 --- Winning percentage on the road. We’ve already noted that overall winning percentage is important, but it’s also important to win on the road, since tournament games are rarely played on a team’s home floor!
- Teams that don’t win any tournament games win 52% of their road games
- Teams that win 1-2 games win 57.8%
- Teams that win 3-5 win 63%
- Teams that win 6 win 78% of their road games, and average only 2.4 (road) losses per year
- No team that has won at least 5 games has lost more than 5 on the road (in the last 5 years)!
5 --- The ratio of a team’s field goal percentage to the opposition’s field goal percentage. Winning games on the big stage requires both scoring and defense! A ratio above 1 indicates that you score the ball better than you allow your opposition to score.
- Teams that win 2 or fewer games have a ratio of 1.12
- Teams that win 3-5 games have a ratio of 1.18
- Teams that win 6 games have a ratio of 1.23 – no team that has won 6 games had a ratio of less than 1.19!
6 --- The ratio of turnovers to the turnovers created (TOR). I recall coaches telling me that a turnover committed by our team was essentially a 4-point play: 2 that we didn’t get, and 2 they did.
- Teams that win the most tournament games have an average TOR of 0.89. This means they turn the ball over at a minimal rate when compared to the turnovers they create.
- Over the past 5 years, teams that won 6 games have an average TOR .11 better than the rest of the pack, which can be interpreted this way: they force roughly 10% more turnovers than they commit.
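A quick arithmetic check of what a TOR of 0.89 implies (illustrative only, using the figure above):

```python
# TOR = turnovers committed per turnover forced, so its reciprocal is
# turnovers forced per turnover committed.
tor = 0.89
forced_per_committed = 1 / tor
print(round(forced_per_committed, 2))  # 1.12: roughly 12% more forced than committed
```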
7 --- Just as important as beating teams by wide margins are the close games! Close games build character, and provide preparation for the tournament.
- Teams that win 6 games play more close games than any other group. The average minimum differential for this group is 1.6 points
- Teams winning fewer games average a differential of 1.8 points.
8 --- Defending the 3. Teams that win more games in the tournament defend the 3 point shot only slightly better than the other teams, but they are twice as consistent in doing it! So, regardless of who’s coming to play, look for some sticky D beyond the arc!
- On average, teams allow a 3-point field goal percentage of .328
- Teams winning the most tournament games defend only slightly better, at .324; however, the standard deviation is the more interesting statistic: their consistency in defending the 3-point shot is almost twice that of the other teams!
9 --- Teams that win are good at the stripe! Free throws close games. Make them and get away with the win!
- Teams that win the most games shoot for an average of .730 while the rest of the pack sits at .700
10 --- Teams that win the most games block shots! They play defense, period.
- Teams that win the most tournament games average over 5 blocks per game.
- Teams winning 6 games have blocked at least 3.4 shots per game (over the last 5 years)
Next steps? Take what’s been learned and apply it to this year’s tournament teams, and then as Larry Bird used to do, ask the question, “who’s playing for second?”
As a business user with limited statistical skills, I don’t think I could build a credit scorecard without the help of SAS Enterprise Miner. As you can see from the flow chart, SAS Enterprise Miner, a descriptive and predictive modeling software, does an amazing job of developing models and streamlining the process.
The flow chart presents my whole credit score modeling process, which is divided into three parts: creating the preliminary scorecard, performing reject inference, and building the final scorecard. I will cover the whole process in the Insurance and Finance Users Group (IFSUG) virtual session on Feb 3, 2017. In this blog I wanted to emphasize the second part, which is sometimes easy to ignore.
The data for the preliminary scorecard comes only from accepted loan applications. However, the scorecard modeler needs to apply the scorecard to all applicants, both accepted and rejected. To solve this sample bias problem, reject inference is performed.
Before inferring the behavior (good or bad) of the rejected applicants, the data needs to be examined. I used the StatExplore node to explore the data and found a significant number of missing values, which is problematic: the regression model used here for scorecard creation and reject inference ignores observations that contain missing values, which reduces the size of the training data set. Less training data can substantially weaken the predictive power of the model.
To address this problem, the Impute node is used to impute the missing values. In the node’s Properties panel, the modeler can choose from a variety of imputation methods. In this model, Tree Surrogate is selected for class variables and Median for interval variables.
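The same imputation idea can be sketched outside Enterprise Miner. Below is a hypothetical pandas example (the column names and values are invented): median fills interval variables, and the mode stands in here as a simple substitute for the tree-surrogate method used for class variables.

```python
# Illustrative imputation sketch: median for interval (numeric) variables,
# mode as a simple stand-in for the tree-surrogate method on class variables.
import pandas as pd

df = pd.DataFrame({
    "income": [52000, None, 61000, 48000],   # interval variable
    "home_owner": ["Y", "N", None, "Y"],     # class variable
})

df["income"] = df["income"].fillna(df["income"].median())
df["home_owner"] = df["home_owner"].fillna(df["home_owner"].mode()[0])
print(df)
```

After imputation, no observation is dropped by a downstream regression for missing values, which is the whole point of the step.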
However, in the Impute node the data role is set to Train. In order to use the data in the Reject Inference node, the data role needs to be changed to Score. A SAS Code node is placed in between for this purpose, which reads:
data &em_export_score;
   set &em_import_data;
run;
Last but not least, Reject Inference Node is used to infer the performance of the rejected loan applicant data. SAS Enterprise Miner offers three standard, industry-accepted methods for inferring the performance of the rejected applicant data by the use of a model that is built on the accepted applicants. We won’t explore the three methods in detail here, as the emphasis of the blog is on the process.
To hear more on this topic, please register for the IFSUG virtual session, Credit Score Modeling in SAS Enterprise Miner on February 3rd from 11am-12pm ET.
About Xiaoyuan Zhang
Xiaoyuan Zhang grew up in Zhaoyuan, China, on the coast of the Bohai Sea. Her town is famous for its ancient gold mine, hot springs and its unusual and tasty seafood. Her undergraduate degree is from China Agricultural University in Beijing, where she majored in Marketing Intelligence and graduated with honors. She also graduated with honors from Drexel University with a master’s degree in Finance. She has passed two CFA exams and learned Enterprise Miner in one of her courses. She specializes in efficient credit score modeling with SAS Enterprise Miner. She is using some of her post-graduation free time to study "regular SAS", to tutor and to volunteer.
One of the most powerful sales tools is often something that you can’t foresee or control. Even though customers read papers, visit websites and talk with a salesperson, another factor can make all the difference – a referral from a friend or coworker.
Think about the way that sites like Google, Yelp and others have changed the way consumers make everyday decisions, such as choosing restaurants. You can go to the restaurant nearest you or one you’ve visited before. Or, you can try something new by looking at your smartphone to see which dining spot has the highest ratings or the best reviews. Why? People show a preference for the personal experience of those in their networks.
For business-to-business software companies like SAS, the impact of customer advocacy is critical. These influencers can set the tone and provide a consistent positive influence throughout the customer journey. Unfortunately, this type of advocacy is tough to measure and hard to predict.
The challenge: Acquisition and retention
Although a customer may be a single record in your database, she doesn’t exist in a vacuum. Each contact has a connection to others within her business or the industry. Understanding and fostering good relationships can have a huge effect on your retention and loyalty efforts.
During our effort to map a modern customer journey, the SAS marketing team focused on different phases of this cycle. The customer journey contained these phases:
- Acquisition – which includes need, research, decide and buy.
- Retention – which includes adopt, use and recommend.
On the retention side, the team knew from anecdotal evidence that some SAS customers were advocates of the technology and for the company overall. In fact, several SAS regional offices and divisions had data confirming the idea that finding and rewarding high-value customers led to big returns. What was lacking was an overarching program for getting customers to advocate for SAS technology.
For a larger effort, the team assessed the customer behavior data, examining those who attended events, provided feedback on surveys, sent ideas to R&D, and generally stayed engaged with the company. From a revenue standpoint, those people were often the ones advocating for the use of new SAS technologies or the expansion of existing deployments.
What was less understood was the reach of these influencers and how their activities affected others. With that information, SAS could identify more advocates and nurture that behavior.
The approach: Identify advocates by scoring BFF behaviors
The SAS marketing team members started by digging into the data they had on customers. They first identified a segment of top accounts containing more than 20,000 individual contacts, and began to examine the behaviors exhibited by that group, including:
- Live event attendance.
- Website traffic.
- Technical support queries.
- Customer satisfaction survey data.
- Customer reference activity.
- Webinar attendance.
- White paper downloads.
This information provided a better understanding of the range of activities that customers undertake. However, simply cataloging the behaviors wasn’t enough. The team applied a scoring model for different types of interactions. This allowed the team to weight certain activities, helping to further identify which customers were the best advocates—“BFFs” (best friends forever) as the marketing team began to call them.
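A scoring model like the one described can be sketched very simply. The weights and activity names below are hypothetical, invented for illustration, not SAS’s actual model:

```python
# Hypothetical advocacy scoring sketch: weight each interaction type and
# sum over a contact's activity counts. Weights are illustrative only.
ACTIVITY_WEIGHTS = {
    "live_event": 5,
    "customer_reference": 8,
    "satisfaction_survey": 3,
    "webinar": 2,
    "white_paper_download": 1,
    "support_query": 1,
}

def advocacy_score(activities):
    """Sum weighted interactions for one contact; unknown activities score 0."""
    return sum(ACTIVITY_WEIGHTS.get(a, 0) * n for a, n in activities.items())

contact = {"live_event": 2, "webinar": 3, "white_paper_download": 4}
print(advocacy_score(contact))  # 2*5 + 3*2 + 4*1 = 20
```

Ranking contacts by such a score is what surfaces the likely “BFFs” within an account.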
The results: Advocacy campaigns that matter
SAS marketing used the information to create a model that is the foundation for customer-focused data exploration. The initial effort helped shed light on how influential advocates can shape retention and additional sales. As a result, sales and marketing worked together to highlight BFFs within key accounts in an ongoing effort to foster better relationships with those key individuals.
Initiatives to locate and encourage advocates used the model to identify the likely candidates within customer organizations. The team then designed campaigns and outreach efforts to give these advocates the tools to foster and expand their influence.
The marketing team now focuses on advocacy campaigns that target potential BFFs. The goal is to build more SAS advocacy during the recommend phase of the customer journey.
Acquisition and retention campaigns begin by doing advanced segmentation in SAS Marketing Automation. Campaign workflows are created that are backed by analytics, ensuring that communications to customers are appropriate and relevant. Through the collection of both contact and response history data, attribution can be performed in SAS Visual Analytics that allows marketers to see correlations and cross-promotion opportunities.
Interested in learning how to leverage SAS Marketing Automation techniques for advanced segmentation? Explore our SAS Marketing Automation: Designing and Executing Outbound Marketing Campaigns and Customer Segmentation Using SAS Enterprise Miner course offerings.
Editor’s note: This post is part of a series excerpted from Adele Sweetwood’s book, The Analytical Marketer: How to Transform Your Marketing Organization. Each post is a real-world case study of how to improve your customers’ experience and optimize your marketing campaigns.
With my first open source software (OSS) experience over a decade ago, I was ecstatic. It was amazing to learn how easy it was to download the latest version on my personal computer, with no initial license fee. I was quickly able to analyse datasets using various statistical methods.
Organisations might feel similar excitement when they first employ people with predominantly open source programming skills. However, it becomes tricky to organize an enterprise-wide approach based solely on open source software. Decision makers within many organisations are now coming to realize the value of investing in both OSS and vendor-provided, proprietary software. Very often, open source is used widely to prototype models, whilst proprietary software, such as SAS, provides a stable platform to deploy models in real time or for batch processing, monitor changes and update them, directly in any database or on a Hadoop platform.
Industries such as pharma and finance have realised the advantages of complementing open source software usage with enterprise solutions such as SAS.
A classic example is when pharmaceutical companies conduct clinical trials, which must follow international good clinical practice (GCP) guidelines. Some pharma organisations use SAS for operational analytics, taking advantage of standardized macros and automated statistical reporting, whilst R is used for the planning phase (i.e. simulations), for the peer-validation of the results (i.e. double programming) and for certain specific analyses.
In finance, transparency is required by ever more demanding regulators, a demand that has intensified since the recent financial crisis. Changing regulations, security and compliance are all factors that weigh against using open source technology exclusively. Basel metrics such as PD, LGD and EAD must be computed properly. A very well-known bank in the Nordics, for example, uses open source technology to build all types of models, including ensemble models, but relies on SAS’ ability to coexist with and extend open source on its platform to deploy and operationalise open source models.
Open source software and SAS working together – An example
The appetite for deriving actionable insight from data is strong. It is often believed that if the data is tortured thoroughly enough, the required insight will become obvious and drive business growth. Various organisations use SAS and open source technology together to maximize business opportunities and the ROI on their analytics investments.
Using the flexibility of prototyping predictive models in R, and the power of SAS’s stable platform to handle massive datasets and parallelize analytic workloads, a well-known financial institution combines both to deliver instant results from analytics and take quick action.
How does this work?
SAS embraces and extends open source in different ways, following the complete analytics lifecycle of Data, Discovery and Deployment.
An ensemble model built in R is used within SAS Enterprise Miner for objective comparison (Enterprise Miner is a drag-and-drop workflow modelling application that is easy to use without the need to code) – the R model is included via the Open Source Integration node.
Once the models have been compared and the best model identified from automatically generated fit statistics, the model can be registered in the metadata repository, making it available for use across the SAS platform.
We used SAS Model Manager to monitor the Probability of Default (PD) and Loss Given Default (LGD) models. All models are also visible to everyone within the organization, depending on system rights and privileges, and can be used to score and retrain on new datasets when necessary. Alerts can also be set to monitor model degradation, with automated messages sent for real-time intervention.
Once the champion model was set and published, it was used in a Real Time Decision Manager (RTDM) flow to score new customers applying for loans. RTDM is a web application that allows instant assessment of new applications without the need to score the entire database.
As a result of this flexibility, the bank was able to manage its workload and modernize its platform in order to make better hedging decisions and cost-saving investments. Complex algorithms can now be integrated into SAS to make better predictions and manage exploding data volumes.