How old was the oldest person in your family, or the oldest person you personally know? And how do they compare to the oldest people in the world? ... Perhaps you can easily make the comparison, with this cool graph! But before we get started, here's a picture of my [...]
Plastic pollution in the oceans is becoming a huge problem. And, as with any problem, finding the solution starts with identifying the source of the problem. A recent study estimated that 95% of the plastic pollution in our oceans comes from 10 rivers - let's put some visual analytics to [...]
The post Can you guess which 10 rivers produce 95% of ocean's plastic pollution? appeared first on SAS Learning Post.
In this blog post, I will build a Visual Forecasting (VF) Pipeline, which is a process flow diagram whose nodes represent tasks in the VF Process. The objective is to show how to perform the full analytics life cycle with large volumes of data: from accessing data and assigning variable roles accurately, to building forecasting models, to select a champion model and overriding the system generated forecast. In this blog post I will use 1,337 time series related to a chemical company and will illustrate the main steps you would use for your own applications and datasets.
In future posts, I will work in the Programming application, a collection of SAS procedures and CAS actions for direct coding or access through tasks in SAS Studio, and will develop and assess VF models via Python code.
In a VF pipeline, teams can easily save forecast components to the Toolbox to later share in this collaborative environment.
Forecasting Node in Visual Analytics
This section briefly describes what is available in SAS Visual Analytics, the rest of the blog discusses SAS Visual Forecasting 8.2.
In SAS Visual Analytics the Forecast object will select the model that best fits the data out of these models: ARIMA, Damped trend exponential smoothing, Linear exponential smoothing, Seasonal exponential smoothing, Simple exponential smoothing, Winters method (additive), or Winters method (multiplicative). Currently there are no diagnostic statistics (MAPE, RMSE) for the model selected.
You can do “what-if analysis” using the Scenario Analysis and Goal Seeking functionalities. Scenario Analysis enables you to forecast hypothetical scenarios by specifying the future values for one or more underlying factors that contribute to the forecast. For example, if you forecast the profit of a company, and material cost is an underlying factor, then you might use scenario analysis to determine how the forecasted profit would change if the material cost increased by 10%. Goal Seeking enables you to specify a target value for your forecast measure, and then determine the values of underlying factors that would be required to achieve the target value. For example, if you forecast the profit of a company, and material cost is an underlying factor, then you might use Goal Seeking to determine what value for material cost would be required to achieve a 10% increase in profit.
Another neat feature in SAS Visual Analytics is that one can apply different filters to the final forecast. Filters are underlying factors or different levels of the hierarchy, and the resulting plot incorporates those filters.
Data Requirements for Forecasting
There are specific data requirements when working in a forecasting project, a time series dataset that contains at least two variables: 1) the variable that you want to forecast which is known as the target of your analysis, for example, Revenue and 2) a time ID variable that contains the time stamps of the target variable. The intervals of this variable are regularly spaced. Your time series table can contain other time-varying variables. When your time series table contains more than one individual series, you might have classification variables, as shown in the photo below. Distribution Center and Product Key are classification variables. Optionally, you can designate numerical variables (ex: Discount) as independent variables in your models.
You also have the option of adding a table of attributes to the time series table. Attributes are categorical variables that define qualities of the time series. Attribute variables are similar to BY variables, but are not used to identify the series that you want to forecast. In this post, the data I am using includes Distribution Center, Supplier Name, Product Type, Venue and Product Category. Notice that the attributes are time invariant, and that the attribute table is much smaller than the time series table.
The two data sets (SkinProduct and SkinProductAttributes) used in this blog contain 1,337 time series related to a chemical company. This picture shows a few rows of the two data sets used in this post, note that DATE intervals are regularly spaced weeks. The SkinProduct dataset is referred as the Time Series table in the SkinProductAttributes dataset as the Attribute Data.
Developing Models in SAS Visual Forecasting 8.2
Step One: Create a Forecasting Project and Assign Variables
From the SAS Home menu select the action Build Models that will take you to SAS Model Studio, where you select New Project, and enter 1) the Name of your project, 2) the Type of project, make sure you enter “Forecasting” and 3) the data source.
In the Data tab, assign the variables roles by using the icon in the upper right corner.
The roles assigned in this post are typical of role assignments in forecasting projects. Notice these variables are in the Time Series table. Also, notice that the classification variables are ordered from highest to lowest hierarchy:
Time Variable: Date
Dependent Variable: Revenue
Classification Variables: Distribution Center and Product Key
In the Time Series table, you might have additional variables you’d like to assign the role “independent” that should be considered for model generation. Independent variables are the explanatory, input, predictor, or causal variables that can be used to model and forecast the dependent variable. In this post, the variable “Discount” is assigned the role “independent”. To do this assignment: right click on the variable, and select Edit Variables.
To bring in the second dataset with the Attribute variables, follow the steps in this photo:
Step two: Automated Modeling with Pipelines
The objective in this step is to select a champion model. Working in the Pipelines tab, one explores the time series plots, uses the code editor to modify the default model, adds a 2nd model to the pipeline, compares the models and selects a champion model.
After step one is completed, select the Pipelines Tab
The first node in the VF pipelines is the Data node. After right-clicking and running this node, one can see the time series by selecting Explore Time Series. Notice that one can filter by the attribute variables, and that the table shows the exact historical data values.
Auto-forecasting is the next node in the default pipeline. Remember that we are modeling 1,332 time series. For each time series, the Auto-forecasting node automatically diagnoses the statistical characteristics of the time series, generates a list of appropriate time series models, automatically selects the model, and generates forecasts. This node evaluates and selects for each time series the ARIMAX and exponential smoothing models.
One can customize and modify the forecasting models (except the Hierarchical Forecasting model) by editing the model’s code. For example, to add the class of models for intermittent demand to the auto- forecasting node, one could open the code editor for that node and replace these lines
rc = diagspec.setESM();
rc = diagspec.setARIMAX();
rc = diagspec.setIDM();
To open the code editor see photo below. After changes, save code and close the editor.
At this point, you can run the Auto Forecasting node, and after looking at its results, save it to the toolbox, so the editing changes are saved and later reused or shared with team members.
By expanding the Nodes pane and the Forecasting Modeling pane on the left, you can select from several models and add a 2nd modeling node to the pipeline
The next photo shows a pipeline with the Naïve Forecast as the second model. It was added to the pipeline by dropping its node into the parent node (data node). This is the resulting pipeline:
After running the Model Comparison node, compare the WMAE (Weighted Mean Absolute Error) and WMAPE (Weighted Mean Absolute Percent Error) and select a champion model.
You can build several pipelines using different model strategies. In order to select a champion model from all the models developed in the pipelines one uses the Pipeline Comparison tab.
Before you work on any overrides for your forecasting project, you need to make sure that you are working with the best pipeline and modeling node for your data. SAS Visual Forecasting selects the best fit model in each pipeline. After each pipeline is run, the champion pipeline is selected based on the statistics of fit that you chose for the selection criteria. If necessary, you can change the selected champion pipeline.
Step Three: Overrides
The Overrides tab is used to manually adjust the forecasts in the future. For example, if you want to account for some promotions that your company and its competitors are running and that are not captured by the models.
The Overrides module allows users to select subsets of time series at the aggregate level by selecting attribute values in the attribute table that you defined in the data tab. The filters based on the attributes are highly customizable and do not restrict you to use the hierarchy that was used for the modeling. The section of a filter using attribute is often referred to as faceted search. Whenever you create a new filter based on a selection of values of the attributes (also known as facets), the aggregate for all series that match the facets will be displayed on your main panel.
There is a wealth of information in the Overrides overview: 1) a list of the BY variables, as well as attribute variables, available to use as filters.
2) a Plot of the Time Series Aggregation and Overrides displaying historical and forecast data, and
3) a Forecast and Overrides table which can be used to create, edit and submit, override values for a time series based on external factors that are not included in the forecast models
Using SAS Visual Forecasting 8.2 you can effectively model and forecast time series in large scale. The Visual Forecasting Pipeline greatly facilitates the automatic forecasting of large volumes of data, and provides a structured and robust method with efficient and flexible processes.
In May of 2016, I expressed my personal excitement for the release of SAS 360 Discover to the SAS platform, and the subject of predictive marketing analytics. Since then, I have worked with brands across industries like sports, non-profits, and financial services to learn about valuable use cases. Before diving [...]
Data and analytics touch our lives every day. Consider: A call from your bank warning of a suspicious transaction. A well-timed discounted offer for something you need. Most people realize that data and analytics are behind these things, but they remain on the periphery of mainstream conversations. We need to [...]
If you grew up as a space-junkie kid like me, then you've probably been watching the news for the past few days, wondering when and where China's Tiangong 1 space station was going to fall out of orbit, and crash to Earth. According to Popular Mechanics (PM), that happened at 8:16pm [...]
The post Tracking the reentry of China's space space station appeared first on SAS Learning Post.
BREAKING NEWS. Today, shortly after midnight on the U.S. East Coast, Cary, NC-based SAS Institute successfully completed its first space exploration mission.
This interplanetary expedition was conducted on a SAS-designed manned spacecraft powered by our state-of-the-art atomic Collider Acceleration System (CAS) engine. A crew of three SAS volunteers took part in that undertaking. These brave souls were:
All three specialize in terrestrial communications (i.e. social media) and received special training on extra-terrestrial travel and communications. Their mission was to study the Venus space area up close to find out what causes the gravitational field anomaly that has recently been observed there.
How it all started
Those of us who attended last year’s SAS Global Forum in Orlando must remember the inspiring speech by Canadian astronaut Chris Hadfield. The main idea I took away from his speech was that success is not a good teacher, as it teaches us nothing; failure, on the hand, is a very good teacher, at least for those of us who are willing to learn the lesson. But, as we all know, there is a time to fail/learn, and there is a time to succeed.
I can’t speak for all of you, but, man, were we inspired by that speech! We at SAS knew right then that we, too, wanted to explore the “final frontier.” As a data analytics software company, all our studies start with data explorations. Our public relations group worked tirelessly with the major stakeholder Government agencies and private companies (NASA, SpaceX, ROSCOSMOS, etc.) to get ahold of the data. When our analysts finally did get access to the data, they were overwhelmed by its size. That was really BIG data (literally of cosmic proportions) – data about every little pocket of spacetime in our Solar system, collected over multiple years of astronomical observations and Space exploration programs.
Using our flagship SAS Viya Analytics software, we mined these vast data archives, employing various predictive modeling, computational, and heuristic techniques such as automatic machine deep space learning, 3D artificial intelligence simulation, and, most importantly, natural, coffee-stimulated human intelligence.
What caught our attention was the space area around Venus. Planet Venus is notorious for being an outlier. First of all, it spins slower than any other planet in the Solar system, even slower than it revolves around the Sun. In fact it spins about 243 times slower than Earth. That means that a day there lasts approximately 243 Earth-days, making it longer than a Venusian year, which is only about 225 Earth-days long.
Second of all, it spins backwards, in the opposite direction from most other planets, including Earth, so that on Venus the sun rises in the west.
Third, it has the highest mean surface temperature of all the Solar System planets – reaching up to 726 °K (452 °C or 870 °F), which is 1.6 times hotter than Mercury, the closest planet to the Sun. This is because of Venus’ thick atmosphere composed mostly of greenhouse gases (carbon dioxide and sulfur dioxide), which trap a good portion of the Sun’s heat.
However, the most unusual thing that we discovered was an aberration in Venus’ gravitational field, suggesting a significant mass (possibly large enough to be a planet) hidden behind it.
The following bubble plot uses a logarithmic scale for x-axis (distance from the Sun) and visualizes our finding:
Now we can see it clearly. Not only does it show an unknown small planet behind Venus, it also explains why it is not visible from Earth: its period of revolution around the Sun is exactly the same as that of Venus, which is why it has always been obstructed by Venus, and not visible from Earth.
Due to its very close proximity to Venus, there is a good chance that even a slight tangential nudge experienced by planet “2-?” might break gravitational equilibrium, causing it to start orbiting Venus as a moon rather than the Sun. We will be observing this situation carefully.
Another interesting finding is that this unknown planet is a much more hospitable place than Venus, as it has an oxygen-dominated atmosphere and a relaxing surface temperature only slightly higher than that of Earth, as it is shown in the following chart:
At this point we had had enough modeling and needed some hard proof.
Space exploration mission
We leveraged our best human intelligence resources around the world to progress rapidly through all the required phases of spacecraft design and construction, crew selection and training, and finally launching and completing the space mission.
All in all, it took just under a year (11 months to be precise) to bring this project to completion. The data collection and exploration phase took around one month, which is in line with the capacity of the SAS Viya analytical environment. The design and build phase took about five months, and it was conducted in parallel with crew selection and training; finally, the fly phase also took 5 months including launch, travel to the Venus space area with a flyby of Venus and the planet X, and a return to Earth. As you can see from the picture at the very top, the unknown planet X hidden behind Venus does indeed exist. However, further analysis and study will be necessary to determine the nature of the observed surface irregularities.
Your participation and input are requested
If you would like you to engage in the fascinating field of space exploration, you are welcome to use the following SAS-generated summary data table:
If you are a detail-oriented type, it will be obvious to you that the circumference of the new planet oddly equals 200π2 (km), which defies all the canons of geometry. You are welcome to prove or disprove the possibility of such an unusual occurrence.
So far, we have referred to this new planet as “2-?” and “planet X”, but it’s about time to give it a name, a real name as all other 8 planets of our Solar System have.
Our first inclination was to name it after our newest SAS analytical software environment, Viya. But do we really want to dilute the brand by applying it to two different though prominent objects?!
That is why we decided to reach out to you, our readers, to solicit ideas for the name of the new planet. Please provide a brief justification for your name suggestion. We also welcome any insights, hypotheses and data stories you might come up with based on the collected data. We greatly appreciate your input.
Please click here for full disclosure.
These days, it's all about music streaming ... You sign up for a service, you let them know your music preferences, and they create a virtual "radio station" that plays the songs you probably want to hear. But that just doesn't compare with going to a physical record store, buying [...]
Are you going to Denver, Colorado, and wondering what fun/interesting/eclectic things you can do there? Then this is the map for you! For the past couple of years, I've made maps of the city SAS Global Forum is in, pointing out some of the attractions that conference attendees might want [...]
Unless you live under a rock, you've probably seen news reports that Russian trolls have been posting on social media to allegedly conduct what they called "information warfare against the United States, with the stated goal of spreading distrust towards the candidates and the political system in general." NBC recently [...]