Analytics Life Cycle

March 31, 2022

Editor's note: This is the first in a series of articles.

According to the McKinsey Global Survey on the State of AI in 2021, the adoption of AI is continuing to grow at an unprecedented rate. Fifty-six percent of all respondents reported AI adoption – including machine learning (ML) – in at least one function, up from 50% in 2020.

Businesses are deploying ML models for everything from data exploration, prediction and learning business logic to more accurate decision making and policymaking. ML is also solving problems that have stumped traditional analytical approaches, such as those involving unstructured data from graphics, sound, video, computer vision and other high-dimensional, machine-generated sources.

But as organizations build and scale their use of models, governance challenges increase as well. Most notably, they struggle with:

    Managing data quality and exploding data volumes. Most large enterprises host their data in a mix of modern and legacy databases, data warehouses, and ERP and CRM services – both on-premises and in the cloud. Unless organizations have ongoing data management and quality systems supporting them, data scientists may inadvertently use inaccurate data to build models.
    Collaborating across business and IT departments. Model development requires multidisciplinary teams of data scientists, IT infrastructure and line-of-business experts across the organization working together. This can be a difficult task for many enterprises due to poor workflow management, skill gaps between roles, and unclear divisions of roles and responsibilities among stakeholders.
    Building models with existing programming skills. Mastering a programming language can take years, so if developers can build models using the skills they already have, they can deploy new models faster. Modern machine learning services must let developers build in their language of choice and provide a low-code/no-code interface that nontechnical employees can use to build models.
    Scaling models. Enterprises must be able to deploy models anywhere – in applications, in the cloud or on the edge. To ensure the best performance, models need to be as lightweight as possible and have access to a scalable compute engine.
    Efficiently monitoring models. Once deployed, ML models will begin to drift and degrade over time due to external real-world factors. Data scientists must be able to monitor models for degradation, and quickly retrain and redeploy them into production so they continue to deliver maximum value.
    Using repeatable, traceable components. To minimize the time a given model is out of production during rescoring and training, models must be built using repeatable and traceable components. Without a component library and documented version history, there is no way to understand which components were used to build a model, which means it must be rebuilt from scratch.
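
The monitoring challenge above is often tackled with a simple distribution-drift statistic. As an illustrative sketch (not a specific SAS feature – the function name and thresholds below are conventional rules of thumb, not part of any product), a population stability index (PSI) compares the distribution of a model input or score at training time against what the model sees in production:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare two samples of a model input or score.

    By common convention, PSI < 0.1 is read as stable, 0.1-0.25 as
    moderate drift, and > 0.25 as a signal to investigate and retrain.
    """
    # Bin edges come from the training-time (expected) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins at a tiny proportion to avoid log(0).
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 10_000)   # scores at training time
prod_scores = rng.normal(1.0, 1.0, 10_000)    # shifted: the model has drifted
print(population_stability_index(train_scores, prod_scores))
```

A scheduled job computing a statistic like this against each day's scoring data is one concrete way to trigger the retrain-and-redeploy loop described above.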

To help you address these challenges, SAS develops its services – and integrations in the Microsoft Cloud – with the analytics life cycle in mind. The analytics life cycle enables businesses to move seamlessly from questions to decisions by connecting DataOps, artificial intelligence and ModelOps in a continuous and deeply interrelated process (see Figure 1). Let us take a closer look at each of these elements:

    DataOps. Borrowing from agile software development practices, DataOps provides an agile approach to data access, quality, preparation and governance. It enables greater reliability, adaptability, speed and collaboration in your efforts to operationalize data and analytics workflows.
    Artificial intelligence. Data scientists use a combination of techniques to understand the data and build predictive models. They use statistics, machine learning, deep learning, natural language processing, computer vision, forecasting, optimization, and other techniques to answer real-world questions.
    ModelOps. ModelOps focuses on getting AI models through validation, testing and deployment phases as quickly as possible while ensuring quality results. It also focuses on ongoing monitoring, retraining and governance to ensure peak performance and transparent decisions.

Figure 1

So how can we apply the analytics life cycle to help us solve the challenges we listed above? To answer that, we will have to take a closer look at ModelOps.

Based on longstanding DevOps principles, the SAS ModelOps process allows you to move to validation, testing and deployment as quickly as possible while ensuring quality results. It enables you to manage and scale models to meet demand, and continuously monitor them to spot and fix early signs of degradation.

ModelOps also increases confidence in ML models while reducing risk through an efficient and highly automated governance process. This ensures high-quality analytics results and the realization of expected business value. At every step, ModelOps ensures that deployment-ready models are regularly cycled from the data science team to the IT operations team. And, when needed, model retraining occurs promptly based on feedback received during model monitoring.

    Managing data quality and exploding data volumes. ModelOps ensures that the data used to train models aligns with the operational data that will be used in production. Managing data in a data warehouse, such as Azure Synapse Analytics, helps you ingest data from multiple sources and perform all ELT/ETL steps, so data is ready to explore and model.
    Collaborating across business and IT departments. ModelOps empowers data scientists, IT infrastructure and line-of-business experts to work in harmony thanks to a mutual understanding of their counterparts and ultimate end users.
    Building models with existing programming skills. Make it easier for everyone on your team to build models using their preferred programming language, including SAS, Python and R, in addition to visual drag-and-drop tools for a faster building experience.
    Scaling models. Deploy your models anywhere in the Microsoft Cloud, including applications, services, containers and edge devices.
    Efficiently monitoring models. Models are developed with a deployment mindset and deployed with a monitoring mindset so data scientists and analysts can monitor and quickly retrain models as they degrade.
    Using repeatable, traceable components. There are no black box models anymore because the business always knows the data it uses to train the model, monitors that model for efficacy, tracks the history of the code used in training the models, and uses automation for deployment and repeatability.

Next time, you will learn how SAS and Microsoft together empower your ModelOps steps.

To learn more about ModelOps and our partnership with Microsoft, see our whitepaper: ModelOps with SAS Viya on Azure.

How ModelOps addresses your biggest machine learning challenges was published on SAS Users.

July 25, 2019

Recommendations on SAS Support Communities

If you visit the SAS Support Communities and sign in with your SAS Profile, you'll experience a little bit of SAS AI with every topic that you view.

While it appears to be a simple web site widget, the "Recommended by SAS" sidebar is made possible by an application of the full Analytics Life Cycle. This includes data collection and prep, model building and test, API wrappers with a gateway for monitoring, model deployment in containers with orchestration in Kubernetes, and model assessment using feedback from click actions on the recommendations. We built this by using a combination of SAS analytics and open source tools -- see the SAS Global Forum paper by my colleague, Jared Dean, for the full list of ingredients.

Jared and I have been working for over a year to bring this recommendation engine to life. We discussed it at SAS Global Forum 2018, and near the end of 2018 it finally went into production on SAS Support Communities. The engine scores user visits for new recommendations thousands of times per day, and it is updated each day with new data and a new scoring model.

Now that the recommendation engine is available, Jared and I met again in front of the camera. This time we discussed how the engine is working and the effort required to get it into production. Like many analytics projects, the hardest part of the journey was that "last mile," but we (and the entire company, actually) were very motivated to bring you a live example of SAS analytics in action. You can watch the full video at (where else?) SAS Support Communities. The video is 17 minutes long -- longer than most "explainer"-type videos. But there was a lot to unpack, and I think you'll agree there is much to learn from the experience. Not ready to binge on our video? I'll use the rest of this article to cover some highlights.

Good recommendations begin with clean data

The approach of our recommendation engine is based upon your viewing behavior, especially as compared to the behavior of others in the community. With this approach, we don't need to capture much information about you personally, nor do we need information about the content you're reading. Rather, we just need the unique IDs (numbers) for each topic that is viewed, and the ID (again, a number) for the logged-in user who viewed it. One benefit of this approach is that we don't have to worry about surfacing any personal information in the recommendation API that we'll ultimately build. That makes the conversation with our IT and Legal colleagues much easier.

Our communities platform captures details about every action -- including page views -- that happens on the site. We use SAS and the community platform APIs to fetch this data every day so that we can build reports about community activity and health. We now save off a special subset of this data to feed our recommendation engine. Here's an example of the transactions we're using. It's millions of records, covering nearly 100,000 topics and nearly 150,000 active users.

Sample data records for the model

Building user-item recommendations with PROC FACTMAC

Starting with these records, Jared uses SAS DATA step to prep the data for further analysis and a pass through the algorithm he selected: factorization machines. As Jared explains in the video, this algorithm shines when the data are represented in sparse matrices. That's what we have here. We have thousands of topics and thousands of community members, and we have a record for each "view" action of a topic by a member. Most members have not viewed most of the topics, and most of the topics have not been viewed by most members. With today's data, that results in a 13 billion cell matrix, but with only 3.3 million view events. Traditional linear algebra methods don't scale to this type of application.
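
To see why sparsity matters here, a quick back-of-the-envelope calculation helps (the exact topic and user counts below are illustrative, chosen only to be consistent with the "nearly 100,000 topics and nearly 150,000 active users" figures above):

```python
# Illustrative counts, consistent with the figures quoted in the text.
topics = 95_000          # nearly 100,000 topics
users = 140_000          # nearly 150,000 active users
views = 3_300_000        # observed "view" events

cells = topics * users   # every (user, topic) pair is a potential cell
density = views / cells  # fraction of the matrix actually observed

print(f"{cells:,} cells")              # on the order of 13 billion
print(f"{density:.4%} of cells filled")
```

Well under a tenth of a percent of the matrix is observed, which is exactly the regime where factorization machines shine and dense linear algebra does not.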

Jared uses PROC FACTMAC (part of SAS Visual Data Mining and Machine Learning) to create an analytics store (ASTORE) for fast scoring. Using the autotuning feature, PROC FACTMAC selects the best combination of values for factors and iterations. And Jared caps the run time at 3600 seconds (1 hour) -- because we need this to run in a predictable time window for updating each day.

proc factmac data=mycas.weighted_factmac  outmodel=mycas.factors_out;
   autotune maxtime=3600 objective=MSE 
       TUNINGPARAMETERS=(nfactors(init=20) maxiter(init=200) learnstep(init=0.001) ) ;
   input user_uid conversation_uid /level=nominal;
   target rating /level=interval;
   savestate rstore=mycas.sascomm_rstore;
run;

Using containers to build and containers to score

To update the model with new data each day and then deploy the scoring model as an ASTORE, Jared uses multiple SAS Viya environments. These SAS Viya environments need to "live" only for a short time -- for building the model and then for scoring data. We use Docker containers to spin these up as needed within the cloud environment hosted by SAS IT.

Jared makes the distinction between the "building container," which hosts the full stack of SAS Viya and everything that's needed to prep data and run FACTMAC, and the "scoring container," which contains just the ASTORE and enough code infrastructure (including the SAS Micro Analytic Service, or MAS) to score recommendations. This scoring container is lightweight and is actually run on multiple nodes so that our engine scales to lots of requests. And the fact that it does just the one thing -- score topics for user recommendations -- makes it an easier case for SAS IT to host as a service.

DevOps flow for the recommendation engine

Monitoring API performance and alerting

To access the scoring service, Jared built a simple API using a Python Flask app. The API accepts just one input: the user ID (a number). It returns a list of recommendations and scores. Here's my Postman snippet for testing the engine.
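
A minimal sketch of what such a Flask wrapper might look like is below. The route path, the stub scorer, and the response shape are all assumptions for illustration -- the real service calls the ASTORE through MAS rather than this placeholder function:

```python
from flask import Flask, jsonify

app = Flask(__name__)

def score_user(user_uid: int, top_n: int = 5):
    """Placeholder for the real ASTORE/MAS scoring call.

    Returns (topic, score) pairs; a real implementation would look up
    the user's latent factors and rank candidate topics with them.
    """
    return [{"conversation_uid": i, "score": 1.0 / (i + 1)} for i in range(top_n)]

@app.route("/recommendations/<int:user_uid>")
def recommendations(user_uid):
    # The API accepts just one input -- the user ID -- and returns
    # a list of recommended topics with scores.
    return jsonify({"user_uid": user_uid,
                    "recommendations": score_user(user_uid)})

if __name__ == "__main__":
    app.run(port=5000)
```

Keeping the wrapper this thin is part of what makes the scoring container easy to replicate across nodes.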

To provision this API as a hosted service that can be called from our community web site, we use an API gateway tool called Apigee. Apigee allows us to control access with API keys, and also monitors the performance of the API. Here's a sample performance report for the past 7 days.

In addition to this dashboard for reporting, we have integrated proactive alerts into Microsoft Teams, the tool we use for collaboration on this project. I scheduled a SAS program that tests the recommendations API daily, and the program then posts to a Teams channel (using the Teams API) with the results. I'll share the specific steps for this Microsoft Teams integration in another article. But I'll tell you this: the process is very similar to the technique I shared about publishing to a Slack channel with SAS.
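
Our daily check is a SAS program, but the pattern is easy to sketch in Python. The webhook URL below is a placeholder (you create one under the channel's Incoming Webhook connector), and incoming webhooks accept a simple JSON payload with a "text" field:

```python
import json
import urllib.request

def build_teams_message(api_ok: bool, n_recs: int) -> dict:
    """Build the JSON payload for a Teams incoming webhook."""
    status = "PASSED" if api_ok else "FAILED"
    return {"text": f"Daily recommendations API check {status}: "
                    f"{n_recs} recommendations returned."}

def post_to_teams(webhook_url: str, payload: dict) -> None:
    # webhook_url is a placeholder -- paste the URL generated by the
    # channel's Incoming Webhook connector here.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

payload = build_teams_message(api_ok=True, n_recs=5)
print(payload["text"])
```

Separating payload construction from the HTTP call also makes the alert message easy to test without touching the network.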

Are visitors selecting recommended content?

To make it easier to track recommendation clicks, we added special parameters to the recommended topics URLs to capture the clicks as Google Analytics "events." Here's what that data looks like within the Google Analytics web reporting tool:
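
Here's a sketch of the URL-tagging idea. The parameter names are hypothetical -- any stable query parameters that your analytics property is configured to record as event fields would do:

```python
from urllib.parse import urlencode

def tag_recommendation_url(topic_url: str, score_id: str) -> str:
    """Append tracking parameters so a click can be logged as an event.

    score_id is the unique ID of the recommendation score the engine
    generated, so each click ties back to a specific prediction.
    """
    params = {"utm_source": "sas-recommend", "rec_score_id": score_id}
    sep = "&" if "?" in topic_url else "?"
    return topic_url + sep + urlencode(params)

url = tag_recommendation_url(
    "https://communities.sas.com/t5/some-topic/td-p/12345", "abc-001")
print(url)
```

Because the score ID rides along in the URL, every click event can later be joined back to the model run that produced it.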

You might know that I use SAS with the Google Analytics API to collect web metrics. I've added a new use case for that trick, so now I collect data about the "SAS Recommended Click" events. Each click event contains the unique ID of the recommendation score that the engine generated. Here's what that raw data looks like when I collect it with SAS:

With the data in SAS, we can use that to monitor the health/success of the model in SAS Model Manager, and eventually to improve the algorithm.

Challenges and rewards

This project has been exciting from Day 1. When Jared and I saw the potential for using our own SAS Viya products to improve visitor experience on our communities, we committed ourselves to see it through. Like many analytics applications, this project required buy-in and cooperation from other stakeholders, especially SAS IT. Our friends in IT helped with the API gateway and it's their cloud infrastructure that hosts and orchestrates the containers for the production models. Putting models into production is often referred to as "the last mile" of an analytics project, and it can represent a difficult stretch. It helps when you have the proper tools to manage the scale and the risks.

We've all learned a lot in the process. We learned how to ask for services from IT and to present our case, with both benefits and risks. And we learned to mitigate those risks by applying security measures to our API, and by limiting the execution scope and data of the API container (which lives outside of our firewall).

Thanks to extensive preparation and planning, the engine has been running almost flawlessly for 8 months. You can experience it yourself by visiting SAS Support Communities and logging in with your SAS Profile. The recommendations that you see will be personal to you (whether they are good recommendations...that's another question). We have plans to expand the engine's use to anonymous visitors as well, which will significantly increase the traffic to our little API. Stay tuned!

The post Building a recommendation engine with SAS appeared first on The SAS Dummy.