Data Visualization

October 30, 2019
 

When I'm about to make a major purchase, I appreciate being able to compare product features at a glance, side by side. I am sure you have seen these ubiquitous comparison tables with check marks showing which features are characteristic of different products and which are not.

These data visualizations, sometimes called comparison matrixes, are also commonly known as checklist tables or checklist table charts. Such charts are extremely useful, persuasive visuals as they allow us to quickly identify differences as well as commonalities between comparable products or solutions and quickly decide which one of them is more desirable or suitable for our needs.

For example, here is such a table that I created in MS Word:
Checklist table example created in Word

SAS code to create checklist table chart

Thanks to SAS’ ability to use Unicode characters in formatted data, it’s very easy to create such a checklist table in SAS. Just imagine that each cell with a visible check mark is assigned a value of 1 and each cell with no check mark is assigned a value of 0. That is exactly the data table that lies behind this data visualization. To print this data table with proper formatting, we will format the number 1 as a more visually appealing check mark, and 0 as a “silent” blank. Here is the SAS code to accomplish this:

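data CHECKLIST;
   length FEATURE $10;
   /* FEATURE occupies columns 1-10; A, B and C are 1/0 flags */
   input FEATURE $1-10 A B C;
   label
      FEATURE = 'Feature'
      A = 'Product A'
      B = 'Product B'
      C = 'Product C';
   datalines;
Feature 1 1 1 1
Feature 2 1 0 1
Feature 3 0 1 1
Feature N 1 0 1
;

proc format;
   /* 1 is displayed as a check mark, anything else as a blank */
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      other = ' ';
   /* 1 is colored green; other values keep the default color */
   value chcolor
      1 = green;
run;

/* Route the output to an HTML file */
ods html path='c:\temp' file='checklist1.html' style=HTMLBlue;

proc print data=CHECKLIST label noobs;
   var FEATURE / style={fontweight=bold};
   /* chcolor. picks the text color from each cell's value */
   var A B C / style={color=chcolor. just=center fontweight=bold};
   /* chmark. turns 1/0 into a check mark or a blank */
   format A B C chmark.;
run;

ods html close;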

If you run this SAS code, your output will look much like the one above created in MS Word:
Checklist table created in SAS
The key elements of the SAS code that produce this checklist table are the user-defined formats in PROC FORMAT. In the user-defined format chmark, the value 1 is formatted as Unicode character 2714, the check mark ✔. In the user-defined format chcolor, the value 1 is mapped to the color green. The syntax for using Unicode symbols in user-defined formats is this:

value chmark
1 = '(*ESC*){unicode "2714"x}'

NOTE: ESC here must be upper-case; x at the end stands for “hexadecimal.”

Unicode characters for checklist tables

Unicode is an international encoding standard by which each letter, digit or symbol is assigned a unique numeric value (a code point) that applies across different platforms and programs. Unicode Transformation Format (UTF) encodings define how those values are stored as bytes. The Unicode standard is supported by many operating systems and all modern browsers.

It is implemented in HTML, XML, Java, JavaScript, e-mail, ASP, PHP, etc. The most commonly used Unicode encodings are UTF-8 and UTF-16. HTML5 supports both UTF-8 and UTF-16.

You can use this HTML Unicode (UTF-8) Reference to look up and choose symbols you can embed in your report using SAS user-defined formats. They are grouped by categories to make it easier to find the ones you need.

Here is just a small random sample of the Unicode symbols that can be used to spice up your checklist tables to get their different flavors:

Unicode characters and codes
You can also apply colors to all these symbols the way we did it in the SAS code example above.
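If you want to preview which character a hexadecimal code stands for before putting it into a format, a few lines of Python will do. This is only a lookup aid, not part of the SAS program; the helper name `symbol` is made up for illustration, and the codes listed are the ones used in this post's format definitions.

```python
# Hexadecimal Unicode code points used in this post's format definitions.
codes = ["2714", "2718", "2611", "2612", "1F5F9", "20E0"]


def symbol(hex_code):
    """Return the character for a hexadecimal Unicode code point."""
    # int(..., 16) plays the same role as the trailing x in "2714"x.
    return chr(int(hex_code, 16))


for code in codes:
    print(f"U+{code}: {symbol(code)}")
```

Whether a given symbol actually renders in your report still depends on the fonts available to the browser or viewer.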

Different flavors of checklist tables

By just changing user-defined formats for the symbol shapes and colors we can get quite a variety of different checklist tables.

For example, we can format 0 as ✘ instead of a blank and also make it red to explicitly visualize a feature's exclusion from a product (in addition to explicit inclusion). All we need to do is modify our PROC FORMAT to look like this:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      0 = '(*ESC*){unicode "2718"x}';
   value chcolor
      1 = green
      0 = red;
run;

The resulting SAS comparison matrix will look a bit more dramatic and persuasive:
SAS-generated checklist table
Or, if you’d like, you can use the following format definition:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2611"x}'
      0 = '(*ESC*){unicode "2612"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following SAS-generated ballot-like checklist table:
Ballot-like checklist table created in SAS
Here is another one:

proc format;
   value chmark
      1 = '(*ESC*){unicode "1F5F9"x}'
      0 = '(*ESC*){unicode "20E0"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following variation of the checklist table:
Another SAS-generated checklist table
As you can see, the possibilities are endless.

Your thoughts?

Do you find these comparison matrixes or checklist tables useful? Do you envision SAS producing them for your presentation, documentation, data story or marketing materials? What Unicode symbols do you like? Can you come up with some creative uses of symbols and colors? For example, table cell background colors...

How to create checklist tables in SAS® was published on SAS Users.

October 24, 2019
 

“Analytics Can Save Higher Education. Really.” is a call to action for the higher education community to leverage data and analytics for better decision making at colleges and universities. It stresses the importance of using data and analytics to improve student outcomes, campus operations and much more. Oklahoma State University [...]

Establishing an analytics culture: An interview with Oklahoma State University was published on SAS Voices by Georgia Mariani

August 28, 2019
 

Moving Average (MA) is a common indicator in stocks, securities and futures trading in financial markets, used to gauge momentum and confirm trends. MA is often used to smooth out short-term fluctuations and show long-term trends. But most MA indicators have big lags in signaling a changing trend. To capture a trend reversal faster, several new MA indicators are now available that detect trend changes more quickly, and of those, the Hull Moving Average (HMA) is one of the most popular. This post demonstrates its superiority.

A closer look at HMA

Developed by Alan Hull, HMA is faster and thus a more useful signal than others. Its main advantage over general MA indicators is its relative smoothness as it signals change. Commonly used MA indicators include the Simple Moving Average (SMA), the Weighted Moving Average (WMA) and so on. SMA calculates the arithmetic mean of the prices, which gives each value equal weight. WMA averages the values using predetermined weights.

Since moving averages are computed from prior data, all MA indicators suffer the significant drawback of being lagging indicators. Even with a shorter-period moving average, which has less lag than one with a longer period, a stock price may drop sharply before an MA indicator signals the trend change. The Hull Moving Average (HMA) uses weighted moving averages and the square root of the period instead of the actual period itself, which makes it more responsive to the most recent price activity while maintaining smoothness.

According to Alan Hull, the formula for HMA is:
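In its commonly cited form, the formula can be written as follows. The notation is an assumption made here: n is the period length, WMA with a subscript k is a weighted moving average over k periods, and int(·) is the rounding to a whole number of periods that the calculation later in this post applies (5/2 is taken as 3 and √5 as 2).

```latex
\mathrm{HMA}_{n} \;=\; \mathrm{WMA}_{\,\mathrm{int}(\sqrt{n})}\!\left( 2\,\mathrm{WMA}_{\,\mathrm{int}(n/2)} \;-\; \mathrm{WMA}_{\,n} \right)
```

The two inner WMAs are computed on the price series; the outer WMA is computed on their combination.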

We see that the major computing components in HMA are three WMAs. Following the specification here, we have the corresponding WMA formula as pictured below. In the WMA formula, the weight of each price value depends on the position of the value and the period length: the more recent the value, the higher its weight, and the shorter the period, the higher the weights.
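As a sketch of the standard linearly weighted form (the symbols are assumptions made here: P with a subscript is the price at a given time, t is the current time and n is the period length):

```latex
\mathrm{WMA}_{n}(t) \;=\; \frac{\sum_{i=0}^{n-1} (n-i)\,P_{t-i}}{\sum_{i=0}^{n-1} (n-i)}
\;=\; \frac{n\,P_{t} + (n-1)\,P_{t-1} + \dots + 1\cdot P_{t-n+1}}{n(n+1)/2}
```

The newest price carries the normalized weight 2/(n+1), so it grows as the period shrinks. Note that the calculated items later in this post use the row sequence number 'tid' as the weight, rather than the weights n, n-1, ..., 1 shown here.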

HMA in action

In the remainder of this post, I will show how to calculate the HMA of a stock price using calculated items in SAS Visual Analytics and show that HMA gives faster upward/downward signals than SMA. I use the data from SASHELP.STOCKS with ‘IBM’ as an example. The data needs to be sorted by date, and a column (named ‘tid’) added to hold the sequence number, before loading into SAS Visual Analytics for calculation. The data preparation code can be found here. After loading the data into SAS Visual Analytics, we can start by creating the calculated items. Here, I set the period length to 5 in the calculation (i.e. the period is 5 in the formula) and calculate HMA for the ‘Close’ price of IBM stock as an example.

1. Calculate the first WMA using the AggregateCells operator in SAS Visual Analytics. I name it 'WMA(5/2 days)'. Note that I’ve rounded 5⁄2 to the integer 3. That is, the aggregation starts two rows before the current row (-2) and ends at the current row (0). The corresponding formula of the calculated item ‘WMA(5/2 days)’ in SAS VA is:

AggregateCells(_Sum_, ( 'Close'n * 'tid'n ), default, CellIndex(current, -2), CellIndex(current, 0)) / AggregateCells(_Sum_, 'tid'n, default, CellIndex(current, -2), CellIndex(current, 0))

 

2. Similarly, calculate the second WMA in SAS Visual Analytics and name it ‘WMA(5 days)’. The corresponding formula is:
AggregateCells(_Sum_, ( 'Close'n * 'tid'n ), default, CellIndex(current, -4), CellIndex(current, 0)) / AggregateCells(_Sum_, 'tid'n, default, CellIndex(current, -4), CellIndex(current, 0))

3. Now we calculate the HMA, which computes the third WMA from the two WMAs obtained above. In SAS Visual Analytics, if we directly apply a similar approach for this last WMA calculation, it shows a message about operands requiring a group. So here, I need a workaround to make the aggregation work.

4. To work around the problem, I create an aggregated item named ‘sumtid’ that gives the row sequence number as an aggregation. To do this, first create a calculated item named ‘One’ with the constant value 1; then use the AggregateCells operator to create ‘sumtid’, which returns the current row number: AggregateCells(_Sum_, 'One'n, default, CellIndex(start, 0), CellIndex(current, 0)).

5. Now we can compute the HMA in the same way as the previous two WMAs. Name it ‘HMA for close (5 days)’. Since int(√5)=2, the starting position of the aggregation is set to the previous row (-1) and the ending position is set to the current row (0). Note that the operands now use the aggregated item ‘sumtid’. The formula for the ‘HMA for close (5 days)’ item is:

AggregateCells(_Sum_, ( ( ( 2 * 'WMA(5/2 days)'n ) - 'WMA(5 days)'n ) * 'sumtid'n ), default, CellIndex(current, -1), CellIndex(current, 0)) / AggregateCells(_Sum_, 'sumtid'n, default, CellIndex(current, -1), CellIndex(current, 0))

So far, we’ve created the Hull Moving Average of the IBM stock Close price and saved it in the calculated item ‘HMA for close (5 days)’. We can easily draw its time series plot in SAS Visual Analytics. Now, I'll create a Simple Moving Average with equal weights, named ‘SMA for the close (5 days)’, and then compare it with the HMA. The formula for ‘SMA for the close (5 days)’ is: AggregateCells(_Average_, 'Close'n, default, CellIndex(current, -4), CellIndex(current, 0))
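To check the arithmetic outside of SAS Visual Analytics, here is a minimal Python sketch of the same comparison. It is not the calculated-item approach above: it assumes the standard linear WMA weights 1..n rather than the 'tid' weights, the function names are made up for illustration, and the price series is invented. The rounding of 5/2 to 3 and of √5 to 2 follows the choices made in the steps above.

```python
import math


def wma(values, period):
    """Weighted moving average with linear weights 1..period (newest weighted most)."""
    weights = range(1, period + 1)
    denom = sum(weights)
    out = []
    for i in range(len(values)):
        window = values[max(0, i - period + 1): i + 1]
        if len(window) < period or any(v is None for v in window):
            out.append(None)  # not enough history yet
        else:
            out.append(sum(w * v for w, v in zip(weights, window)) / denom)
    return out


def sma(values, period):
    """Simple moving average: every value in the window gets equal weight."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - period + 1): i + 1]
        out.append(sum(window) / period if len(window) == period else None)
    return out


def hma(values, period):
    """Hull Moving Average: WMA over sqrt(period) of 2*WMA(period/2) - WMA(period)."""
    half = int(period / 2 + 0.5)   # 5/2 -> 3, the rounding used in the post
    root = int(math.sqrt(period))  # sqrt(5) -> 2, as in the post
    short = wma(values, half)
    full = wma(values, period)
    raw = [None if s is None or f is None else 2 * s - f
           for s, f in zip(short, full)]
    return wma(raw, root)


# Invented, steadily rising prices (not the IBM data).
prices = [float(p) for p in range(1, 11)]
print("latest price:", prices[-1])
print("HMA(5):", hma(prices, 5)[-1])
print("SMA(5):", sma(prices, 5)[-1])
```

On this steadily rising series the HMA stays within about a third of a step of the latest price, while the 5-period SMA trails it by two full steps, which is the lag difference the charts in this post illustrate.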

Now let’s visualize ‘SMA for the close (5 days)’ and ‘HMA for close (5 days)’. In the charts below, each grey vertical bar shows the monthly price span of IBM stock, and the red lines correspond to SMA and HMA respectively. With the upper SMA line, we see constant lags behind price changes and poor smoothness. With the bottom HMA line, we see it keep up rapidly with price activity while maintaining good smoothness.

Below is the comparison of ‘SMA for the close (5 days)’, ‘HMA for close (5 days)’ and the Close price. Besides smoothing out some fluctuations in the Close price, the HMA indeed gives a better signal than the SMA in indicating a turning point when there is an upward/downward trend reversal. Note the obvious lags of SMA compared to HMA. For example, compare the trends around the reference line in the visualization below. The Close price reached a local peak in Jun1992 and started to go down from Jul1992. HMA quickly reflected the downward turn with one lag at Aug1992, while SMA still showed a rising trend at that point. SMA needed one more lag before it started to go down and gave the reversal signal.

Now it’s easy to understand why HMA is a better indicator than SMA to signal the reversal point. What has been your experience with HMA?

How to Calculate Hull Moving Average in SAS Visual Analytics was published on SAS Users.

August 27, 2019
 

I was born in a country (Brazil) where voting is mandatory. Most of my family still lives there, and now that I live in the US, they ask me about American politics all the time. One thing that often catches them by surprise is that not only is voting not [...]

Examining voter registration data with SAS Visual Analytics was published on SAS Voices by Frank Silva

August 9, 2019
 

Opening Plenary session, Esri UC 2019

Several of my colleagues and I attended the annual Esri User Conference last month in San Diego - along with 18,000 other Geo professionals.  It was a busy week of meetings, seminars and talks about the latest in GIS and Spatial technologies.  The days were long and exhausting, but it was also exciting and a ton of fun.  As we continue to process, plan and prepare to integrate some of these technologies into SAS Visual Analytics, I thought it would be beneficial to highlight the Esri features available in VA today.

One topic that received a lot of questions during this year’s SAS Global Forum in Dallas was geocoding.  Geocoding is the process of transforming text address data into numeric latitude and longitude values.  Once the latitude and longitude are known, they can be mapped and analyzed spatially.  SAS has offered geocoding capabilities for quite some time as a part of SAS/GRAPH.  Beginning with SAS 9.4M5, PROC GEOCODE has moved into Base SAS.  See my colleague’s blog posts here and here for more information on geocoding from Base SAS.

But geocoding is no longer limited to just Base SAS.  You can also geocode from within Visual Analytics, thanks to the integration with the Esri geocoding API.  This feature is part of the Esri Premium agreement and became available in VA 8.3.  Esri premium features require an existing relationship and credentials with Esri.  This post assumes that relationship exists and your credentials have been validated.  I will discuss the details of the Esri premium features in a future post, but for today the focus is how to use the Esri Geocoding feature from VA with a real-world data set.

1. Getting the data into Visual Analytics

We will be using point data from the City of Dallas for the Public Library branch locations.  You can download the .csv file from the Dallas Open Data portal.  After downloading, it must be imported into VA for geocoding.

  • From the Data tab in VA, select Import > Local File
  • Navigate to the location of the Dallas library .csv file and select it
  • Adjust the default settings, if desired, and click the ‘Import Item’ button
  • Once you see the green success message, the data has been imported into VA and is ready to be geocoded. Click the ‘Cancel’ button

Message indicating successful data import

2. Selecting the data columns to geocode on

Accessing the Geocoding feature in VA follows a similar process to the steps we just performed to import the .csv file.

  • From the Data tab in VA, select Import > Esri > Geocode. Here, you must select the location of the newly imported library data set.  This path will vary depending upon the configuration of your VA instance.  For my installation, it is located at cas-shared-default > Public folder > CITY_OF_DALLAS_LIBRARY_LOCATIONS.  Once located, click the 'Select' button
  • The Geocoding Import window will open. This window should look familiar.  The top half is the same as the Import data window we just used to get the .csv file into VA.  Essentially, the geocoding process is a new data import.  It will send the selected columns to Esri via a REST API call.  The response will contain the corresponding latitude and longitude values we desire.  They will be added to our existing data set and imported into VA as a new geocoded data set.  The name of the new data set will have _GEO_CODE appended to the end of the original data set name.  This name can be modified as desired.

Geocoding selection dialog window

  • At the bottom of the Geocoding Import window are two list boxes, Available items and Selected items. The Available items box on the left contains all columns in the data set.  Select the column(s) containing the address information you wish to geocode.  Double click or click the right arrow to move them to the Selected items window on the right.  In this example, we are using the Address column.
  • VA concatenates the selected column(s) to generate a sample address for geocoding. Clicking the ‘Test’ button returns coordinates for the sample address and a score representing the confidence level of the results.  In the screenshot above, our score is 71/100 for the test address.  Not bad, but it could be better.  More on this a bit later.
  • To finish the geocoding process, click the ‘Import Item’ button at the top of the page, as we did with the original .csv file import. This time, you will be presented with a new dialog window.  Geocoding, as with other Esri premium features, requires the use of credits.  This dialog indicates how many Esri credits will be used by the geocoding process; credits will also be discussed in detail in a future post.

Esri credit usage alert dialog

For now, select 'Yes' to continue.  When you see the green success message, the operation is complete.  We are now ready to map our Dallas Library locations.  Click 'Ok' to open the new geocoded data set.

3. Create the geography variable and display the map

Next, we need to create our geography variable from the new geocoded data set.  As part of the geocoding process, four new columns have been added to the new data set: esri_latitude, esri_longitude, esri_score, esri_address.  We only need the esri_latitude and esri_longitude columns for our map.

  • Select the Branch Name category variable and change its Classification to Geography
  • For Geography data type, select Custom Coordinates
  • Select esri_latitude for Latitude
  • Select esri_longitude for Longitude
  • Click 'OK'
  • Drag the Branch Name geography variable to the canvas to create the map

Map of non-unique geocoded addresses

What happened??  Our data set contains Dallas Public library locations, so why are the data points spread across the world?  It’s all in the data.  If you look at the original data a bit deeper, you will notice the Address field we selected for the geocoding only contains the street number and street name of the library location.  It does not contain enough information to make it unique.  Therefore, during the geocoding process, the first instance of that address will be considered a match, regardless of where it is actually located.

Detailed view of incorrect geocoded address

In the image above for the Preston Royal branch, its street number and name were a perfect match to a location in Eugene, Oregon.  Not quite what we were looking for.  So, how do we fix this?  Making our addresses unique requires a simple addition to the source data .csv file.

Column selection to ensure unique addresses for geocoding

We need to add a ‘City’ and ‘State’ column to the original .csv file with the values of ‘Dallas’ and ‘Texas’ assigned to all entries.  This will ensure each address is unique and within our area of interest.  Re-import the new .csv file and geocode it using the Address, City and State columns.  The result?  A confidence score of a perfect 100.  Much better than our first attempt!  This will now give us the map we desire for the Dallas Public Library locations.
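If you would rather not edit the file by hand, the two columns can be appended with a short script. The sketch below uses Python's standard csv module; the function name and the two sample rows are made up for illustration, and the real file has more columns and rows than shown here.

```python
import csv
import io


def add_city_state(src, dst, city="Dallas", state="Texas"):
    """Copy a CSV from src to dst, appending constant City and State columns."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames) + ["City", "State"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["City"] = city
        row["State"] = state
        writer.writerow(row)


# Hypothetical sample standing in for the downloaded library file.
sample = io.StringIO(
    "Branch Name,Address\n"
    "Example Branch A,100 Main St\n"
    "Example Branch B,200 Oak Ave\n"
)
result = io.StringIO()
add_city_state(sample, result)
print(result.getvalue())
```

To run it on real files, pass file objects opened with newline="" in place of the in-memory buffers, then import the new .csv file as before.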

 

Final geocoded map of Dallas Library branches

In this post, I used real-world data to illustrate two things: the importance of knowing your data set, and how to geocode address information in SAS Visual Analytics.  Public data sets are a great resource but need to be used with a critical eye.  They may still need additional cleansing in order to work for your situation.

The geocoding feature is one example of the premium Esri features currently available in VA.  In future posts, I will go into more detail on other Esri features available, what makes these features ‘premium’ and examples of how to use them.  Stay tuned!

Esri integration with SAS Visual Analytics: Geocoding was published on SAS Users.

July 26, 2019
 

In a previous post, Zero to SAS in 60 Seconds- SAS Machine Learning on SAS Analytics Cloud, I documented my experience with a SAS free trial on the SAS Analytics Cloud. Well, the engineers at SAS have been busy and created another free trial. The new trial covers SAS Event Stream Processing (ESP).

This time last year (when just starting at SAS), I only knew ESP as extrasensory perception. I'm more enlightened now. Working through this exercise introduced me to how event stream processing is a powerful and effective tool for analyzing data using machine learning and streaming analytics to uncover insights for real-time decision making. In a nutshell, you create a model, stream your data, process the results, and make timely decisions based on the results.

The trial uses SAS ESPPy, allowing you to embed an ESP project inside a Python pipeline. To see ESPPy in action take a look at this video. To learn more about ESP and IoT see this article on the SAS Communities Library. In this article I chronicle my journey through the trial while introducing key concepts and operations of ESP.

Register and get started

The registration and initial login process is identical to the one in the machine learning article. You must have a SAS Profile to participate in the trial. The only difference is that you need to follow this link to sign up for the ESP trial. Please refer to the machine learning article for detailed steps on signing up and logging in.

The use case

SAS Solar Farm in Cary

The SAS Solar Farm sits on almost 12 acres of SAS Headquarters property. There are 10,276 solar panels producing more than 3.6 million kilowatt hours annually. That’s enough power for more than 325 average sized U.S. homes.

As part of the environment management, it is important to continuously monitor the operation of the solar panels to optimize configuration parameters, detect potential equipment failure, and accurately forecast the amount of energy generated. Factors considered include panel angles, time of day, seasons, and weather patterns, as the energy generated depends directly on the amount of sun available to the panels.

The ESP project in this demo is pre-loaded in the trial and is run through a Jupyter notebook. The project shows the monitoring of energy (kWh) and power (kW) generated during a specific time interval eliminating localized outlier effects and triggering alerts when there is a pre-defined difference in the energy generated between subsequent time intervals.

Solar Farm Data represented as digital art

Take two minutes and watch this video on how SAS uses SAS software to create a work of art with solar farm data.

Disclaimer: no sheep were harmed during data collection or writing of this article.

Navigating the trial

Once logged into the trial, you see the Applications screen.

ESP trial Applications screen

The Data and Team options in the left pane behave exactly as those in the machine learning trial. These sections allow you to access data and manage a multi-user system. Select the SAS Event Stream Processing icon to start a JupyterLab session.

JupyterLab home screen

I will not go into the details of JupyterLab here. The left pane contains menus, file management, and other options. The pane on the right displays three options:

Python 3 Notebook - a blank Jupyter notebook - documents that combine live, runnable code with narrative text (Markdown), equations (LaTeX), images, interactive visualizations and other rich output
Python 3 Console - a blank Python console - code consoles enable you to run code interactively in a kernel
Text File - basic text editor - enables you to edit text files in JupyterLab

For this article we're going to follow along and interact with the pre-loaded demo Solar Farm ESP project. To locate the Jupyter notebook double click the demo directory from the left pane.

Select the demo directory from the left pane

Next select Event_Stream_Processing. Before proceeding with the demo, I'd highly suggest opening the README.ipynb file.

Contents of the README notebook

Here you will find overview and environment organization information for the trial. The trial uses SAS ESPPy for designing, testing, and deploying projects on ESP Servers.

Step through the demo

Before starting the trial, I needed a little background on event stream processing. I located the SAS ESP product documentation. I recommend referring to it for details on the ESP model, objects, and workflow.

To access the demo, double click the demo directory from the left pane. The trial comes with five pre-loaded demos. Feel free to try any/all of them. Double click on ESP Basic Project - Solar Farm.ipynb to display the Solar Farm notebook. The notebook walks you through the ESP model creation and execution. To run a command place the cursor in a command cell and select the 'Run' button (triangle-shaped button at the top of the notebook). If no response returns when running the cell block, assume the commands ran successfully.

Below is a brief description of the steps in the project:

  1. Create the project and query used - this creates dedicated space and objects where the ESP process takes place
  2. Create input and aggregate windows - this action extracts desired data and creates data subsets from the stream
  3. Add a join window - this brings together lag and current values into the project
  4. Add a compute window - this calculates the difference between the previous and current event
  5. Add a filter window - this action filters occurrences outside a threshold value; this creates an alert for potential mechanical issues
  6. Define workflow connections - this defines the workflow between the various windows in the project
  7. Save the project - this generates an XML file for the project
  8. Load the project to the ESP Server - this loads the project and produces a graphical representation of the workflow

    Solar Farm project workflow

  9. Start streaming data - in this example, rather than streaming data in real time, the stream derives from the solar farm table data
  10. View solar farm data - this creates a graphical representation of streaming data

    Solar Farm graph for kW and kWh

While not included in the demo, the streaming data would pass through the filter and if a threshold breach occurs, an alert is created. Considering the graph above, alerts could very well have occurred just before 1:15 pm (IntkW drops from 185 to 150) and just before 2:30 pm (IntkW drops from 125 to 35).
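To make the join, compute, and filter steps more concrete, here is a minimal plain-Python sketch of the same idea: compare each reading with the previous one and raise an alert when the drop exceeds a threshold. This is not ESPPy code and not the demo's implementation. The function name `detect_drops`, the `threshold` value, and the `t0`..`t3` labels are assumptions made for illustration; only the four IntkW values come from the graph described above.

```python
from collections import namedtuple

# One alert per threshold breach (field names are illustrative)
Alert = namedtuple("Alert", ["label", "previous", "current", "drop"])


def detect_drops(events, threshold):
    """Yield an Alert whenever a reading falls by more than `threshold`
    compared with the reading immediately before it."""
    previous = None
    for label, value in events:
        if previous is not None:
            # "join" + "compute": pair the lag value with the current one
            drop = previous - value
            # "filter": keep only the events that breach the threshold
            if drop > threshold:
                yield Alert(label, previous, value, drop)
        previous = value


# Hypothetical event labels; the IntkW values echo the drops noted above
readings = [("t0", 185), ("t1", 150), ("t2", 125), ("t3", 35)]

# threshold=30 is an assumed value, not one taken from the demo
alerts = list(detect_drops(readings, threshold=30))

for alert in alerts:
    print(f"{alert.label}: IntkW fell from {alert.previous} to {alert.current}")
```

With the assumed threshold of 30, the 185 to 150 and 125 to 35 drops are flagged, while the smaller 150 to 125 change passes through without an alert.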

Your turn

Now that you have a taste of ESP, feel free to step through the rest of the demos. You may also load your own data and create your own ESP models. Feel free to share your experience and what you create by leaving a comment.

SAS Event Stream Processing on SAS Analytics Cloud - my journey was published on SAS Users.

July 19, 2019
 

When a new Moon passes between the Earth and the Sun, the Moon can cast a shadow on certain regions of the Earth. This natural phenomenon creates a solar eclipse, meaning the Moon covers, or eclipses, your view of the Sun if you're in that region. No surprise that in [...]

Ring of fire: Visualizing 5,000 years of solar eclipses was published on SAS Voices by Falko Schulz

July 12, 2019
 

Are you a seasoned data scientist looking for a fast, all-inclusive machine learning solution? Curious about machine learning but have little to no programming experience? Interested in using AI to take over the world? Follow my lead and use SAS VDMML to fast track your world domination.

This blog is the beginning of a series on  SAS Visual Data Mining and Machine Learning (VDMML) told from my perspective as a first-time SAS Viya user, Graduate Intern at SAS, and ABD PhD Candidate in Computer Science. I'm writing this series for two main reasons: 1) to express how surprised I am at seeing how easily complex tasks can be completed after doing it the hard way for years and 2) to provide examples to convince you, too.  

SAS VDMML is only one of many products available in SAS Viya®. Its distinguishing feature is its machine learning pipelines, which are created in a single, integrated in-memory environment via a drag-and-drop interface.

In this post, I will provide a high-level overview of a few of the features available in SAS VDMML. In the next posts, I will provide detailed examples and code comparisons for individual features, such as pipeline creation and autotuning. 

Tip: At the bottom of the post, I talk about a course on machine learning using SAS Viya that provides access to the software and teaches machine learning basics. 

simpleDecisionTreePipeline

Simple custom pipeline using SAS Viya

If you've never used SAS VDMML, here are the top 3 reasons why I think you should check it out. 

sasvdmml_hyperparameters

A sample of the variety of hyperparameter modifications available.

SAS VDMML creates a simplified approach to machine learning solutions beneficial to people with a wide range of expertise.

Have you been programming for as long as you can remember and are well-versed in the machine learning world?

Why spend all your precious mental energy on tedious programming tasks? Instead, you could be diving deeper into your data and discovering the extent of its modelling capabilities.

After spending only a week familiarizing myself with the interface, I felt confident I could perform my normal tasks with ease and with better hyperparameter tuning and more comprehensive model evaluations.

If you are wondering whether these simplifications limit how far you can customize a model, think again. For most uses, the available options match what programming provides, but they are exposed through drop-down menus and editable text boxes, which eliminates unnecessary mental overhead.

 

opensourcecodenode

Open Source Code node as a supervised learning node and example code in code editor

Still itching to program?

You have options: the SAS Code node and the Open Source Code node (available for use with R and Python). Both nodes can be used in any part of the pipeline, including preprocessing, supervised learning, and post-processing.

For example, you may have preprocessing code for extra messy data already written in R. All you need to do is add the Open Source Code node, insert your code, and update the variables to match the provided macros. Or, maybe you want to try the Deep Learning toolkit and CAS action? Drop a SAS Code node into your pipeline, add your code, and you are good to go!

 

Little to no programming experience or not quite a machine learning expert?

The drag-and-drop interface, wide selection of templates, and extensive evaluations allow almost anyone to produce professional-level results in a matter of minutes. While using SAS VDMML might not require expert-level knowledge of machine learning, important projects should have an expert review the approach and results.

Example of creating a new pipeline using an advanced template

sasvdmml_pipelinenodeoptions

A sample of options for preprocessing and supervised learning.

The days of spending weeks programming scripts for feature extraction, fine-tuning models, and evaluating your model are over!

For example, let's say I'm attempting to impute some variables using R. First, I might store the column names in separate groups based on the type of imputation I want to perform on each. Then, I could write the code for each type of imputation. If I'm only attempting 2 different types of imputation, I will most likely need fewer than 10 lines of code. Not much, right?

But, I will also need to test and verify that each variable has been imputed correctly.

Instead, the same task in SAS VDMML would just require you to drop an Imputation node into the pipeline and select via a drop-down menu how to impute the variables - no time wasted.
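For a sense of what the hand-written route involves, here is a rough sketch of the steps described above: group the columns by imputation type, impute each group, then verify the result. It is written in Python with only the standard library rather than R, and the `impute` helper, the column names, and the sample rows are all made up for illustration.

```python
from statistics import mean, mode


def impute(rows, mean_cols, mode_cols):
    """Return copies of `rows` with None replaced by the column mean
    (for `mean_cols`) or the most frequent value (for `mode_cols`)."""
    fill = {}
    for col in mean_cols:
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in mode_cols:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    return [
        {col: (fill[col] if val is None and col in fill else val)
         for col, val in row.items()}
        for row in rows
    ]


# Hypothetical data with one missing value per column
rows = [
    {"age": 30, "city": "Cary"},
    {"age": None, "city": "Cary"},
    {"age": 50, "city": None},
]

# Step 1: columns grouped by the type of imputation they should receive
imputed = impute(rows, mean_cols=["age"], mode_cols=["city"])

# Step 2: the verification that still has to be written and run by hand
missing = [(i, col) for i, row in enumerate(imputed)
           for col, val in row.items() if val is None]
print("remaining missing values:", missing)
```

The imputation itself is short, but the check at the end is exactly the kind of extra step that is easy to forget.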

Additionally, you can quickly compare a variety of supervised learning methods as well as test out the same model with different pre-processing methods using the automatic evaluations provided.  

Looking to save even more time?

You can use the autotuning feature to select the best set of hyperparameters for your model by turning it on in your supervised learning node of choice and hitting run. 

sasvdmml_autotuning

Turn on the autotuning feature inside your supervised learning node, then adjust ranges for the hyperparameters.

After the run is complete, view the supervised learning model's results to see the best configuration of parameters as determined by autotuning. 

sasvdmml_autotuningresults

Example of the results after using autotuning for a decision tree.
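If you have never tuned hyperparameters by hand, the sketch below shows the simplest version of the idea: try every combination in a set of ranges and keep the one with the lowest validation error. This is a plain exhaustive grid search, not the search strategy SAS VDMML uses for autotuning. The `grid_search` helper, the hyperparameter names, and the toy error function are assumptions for illustration only.

```python
from itertools import product


def grid_search(score, grid):
    """Evaluate `score(**params)` for every combination in `grid` and
    return (best_params, best_score), where a lower score is better."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_validation_error(max_depth, min_leaf_size):
    """Stand-in for training a decision tree and measuring its error.
    A real run would fit and validate a model here."""
    return (max_depth - 6) ** 2 + (min_leaf_size - 10) ** 2


# Hypothetical ranges, similar in spirit to the ranges you adjust in the node
grid = {"max_depth": range(1, 11), "min_leaf_size": [5, 10, 20]}

best_params, best_score = grid_search(toy_validation_error, grid)
print(best_params, best_score)
```

Even this toy version evaluates 30 combinations, which hints at why handing the search over to the software saves time.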

All of this can be accomplished with a few clicks, which eliminates the hours spent debugging scripts and connecting the steps in your workflow.

stressedatcomputer

You’ve spent the last hour transforming and creating features in your code editor of choice. Now, after waiting 30 minutes for your model to run again, you get the same results! How? Wait...you’ve forgotten to update the reference to your new data, AGAIN. (This has definitely not happened to me.)

Fortunately, SAS VDMML only allows you to view results if the pipeline is up-to-date, which ensures that all changes are accounted for. Now, instead of checking and checking again that I passed the right data to the right functions, I can immediately know that my small tweak had no effect on the results. *sigh* OR on a brighter note, that the drastic improvement is not a fluke!  

Updating the Feature Extraction node resets all child nodes below - ensuring that the pipeline stays up-to-date.
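The behavior is easy to picture as a small tree of nodes in which changing one node marks everything below it as stale. The sketch below is my own toy model of that idea, assuming a simple parent-to-child structure. The `Node` class and its methods are hypothetical and say nothing about how SAS VDMML is implemented internally.

```python
class Node:
    """A pipeline step that knows its children and whether it is current."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.up_to_date = False

    def add_child(self, child):
        self.children.append(child)
        return child

    def run(self):
        """Run this node and everything below it."""
        self.up_to_date = True
        for child in self.children:
            child.run()

    def update(self):
        """Changing a node marks it and all of its descendants as stale."""
        self.up_to_date = False
        for child in self.children:
            child.update()

    def results(self):
        if not self.up_to_date:
            raise RuntimeError(f"{self.name} must be rerun before viewing results")
        return f"results of {self.name}"


# Data -> Feature Extraction -> Decision Tree
data = Node("Data")
features = data.add_child(Node("Feature Extraction"))
tree = features.add_child(Node("Decision Tree"))

data.run()
print(tree.results())

features.update()  # tweak the feature extraction settings
try:
    tree.results()
except RuntimeError as err:
    print(err)
```

After the update, the Data node is still current, but the Decision Tree refuses to show results until it has been rerun.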

Interested in checking out SAS Viya?

Machine Learning Using SAS Viya is a course that teaches machine learning basics, gives instruction on using SAS Viya VDMML, and provides access to the SAS Viya for Learners software, all for $79. This course is the prerequisite for the SAS Certified Specialist in Machine Learning certification. Going through the course myself, I was able to quickly learn how to use SAS VDMML and received a refresher on many data preprocessing tactics and machine learning concepts.

Want to learn more?

Stay tuned!!

I will be posting blogs with in-depth examples of specific features in SAS VDMML and adding links to the new blog posts here as they are posted. If there are any specific features you would like to know more about, leave a comment below!

Visual machine learning using SAS Viya: a Graduate Intern’s perspective was published on SAS Users.
