Cindy Wang

September 16, 2020
 

There are three types of visualization APIs defined in the SAS Viya REST API reference documentation: Reports, Report Images and Report Transforms. You may have seen the posts on how to use Reports and Report Images. In this post, I'm going to show you how to use the Report Transforms API. The scenario I am using changes the data source of a SAS Visual Analytics report and saves the transformed report.

Overview of the Report Transforms API

The Report Transforms API provides simple alterations to SAS Visual Analytics reports, and it uses the 'application/vnd.sas.report.transform' media type for the transformation (passed in the REST API call header). When part of a request, the transform performs editing or modifications to a report. If part of a response, the transform describes the operation performed on the report. Some typical transformations include:

  • Replace a data source of a report.
  • Change the theme of a report.
  • Translate labels in a report.
  • Generate an automatic visualization report of a specified data source and columns.

To use the Transforms API, we need to properly set the request body and some attributes for the transform. After the transform, the response contains the transformed report or a reference to the report.

Prepare the source report

This step is very straightforward. In SAS Visual Analytics, create a report and save it to a folder (instructions on creating a report are found in this video). For this example, I'll use the 'HPS.CARS' table as the data source and create a bar chart. I save the report with the name 'Report 1' in 'My Folder'. I'll use this report as the original report in the transform.

Generate the request body

I will use PROC HTTP to call the Transforms API, using the 'POST' method with '/reportTransforms/dataMappedReports' appended to the base URL. The call needs a properly set request body.

  1. Get the reportURI: In an earlier post I outlined how to get the reportURI via REST API, so I won't go into details. If you'd like an easy way, try this: in SAS Visual Analytics, choose the 'Copy Link…' item from the menu. In the pop-up dialog, expand 'Options' and choose 'Embedded Web Component'; you will see a string of the form reportUri='/reports/reports/…', and that's it. In the request body, we set 'inputReportUri' to this string to specify the original report, 'Report 1'.
  Report URI from SAS Visual Analytics

  2. Decide on changes to the data source: Here I'd like to change the data source from 'HPS.CARS' to 'CASUSER.CARS_NEW'. The new table uses three columns from 'HPS.CARS' as mapped below.
  Select columns to include in new table

  3. Specify the data sources in the request body: The request requires two data sources, 'original' and 'replacement', respectively representing the data sources in the original report and the transformed report. Note that the 'namePattern' value specifies how the data source is identified. If it is set to 'uniqueName', the data source is identified by its unique data item name in the XML file of the report. If it is set to 'serverLibraryTable', the data source is identified by the CAS server, CAS library and table names together. The snippets below show the data source sections in the request body. I like to use 'serverLibraryTable' to specify the data source for both the original and the transformed report, which is clear and simple.
    /* data source identification for original report */
      {
        "namePattern": "serverLibraryTable",
        "purpose": "original",
        "server": "cas-shared-default",
        "library": "HPS",
        "table": "CARS"
      }
     
    /* data source identification for transformed report */
      {
        "namePattern": "serverLibraryTable",
        "purpose": "replacement",
        "server": "cas-shared-default",
        "library": "CASUSER",
        "table": "CARS_NEW",
        "replacementLabel": "NEW CARS",
        "dataItemReplacements": [
          {
            "originalColumn": "dte",
            "replacementColumn": "date"
          },
          {
            "originalColumn": "wght",
            "replacementColumn": "weight"
          },	
          {
            "originalColumn": "dest",
            "replacementColumn": "region"
          }
        ]
      }

Set more attributes for the transform

Besides the request body, we need to set some other attributes for the Transforms API when changing the data source. These include 'useSavedReport', 'saveResult', 'failOnDataSourceError' and 'validate'.

  • useSavedReport specifies whether to find the input (original) report as a permanent resource. Since I am using a saved report in the repository, I set it to true.
  • saveResult specifies whether to save the transformed report permanently in the repository. I am going to save the transformed report in the repository, so I set it to true.
  • failOnDataSourceError specifies whether the transform fails when there is a data source error. The default value is false, and I leave it as such.
  • validate decides whether the transform performs XML schema validation. The default value is false, and I leave it as such.

Decide on a target report and folder

I'll save the transformed report with the name 'Transformed Report 1' in the same folder as the original 'Report 1'. I set 'resultReportName' to 'Transformed Report 1', and set the 'resultReport' with 'name' and 'description' attributes. I also need the folderURI of the 'My Folder' directory; you may refer to my previous post to see how to get the folderURI using REST APIs.

Below is the section of the settings for the target report and folder:

"resultReportName": "Transformed Report 1",
"resultParentFolderUri": "/folders/folders/cf981702-fb8f-4c6f-bef3-742dd898a69c",
"resultReport": {
    "name": "Transformed Report 1",
    "description": "TEST report transform"
}

Perform the transform

Now we have set all the necessary parameters and are ready to run the transform. I put my entire set of code on GitHub. Running the code creates the 'Transformed Report 1' report in 'My Folder', with the data source changed to 'CASUSER.CARS_NEW' containing the three mapped columns.
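For reference, here is a minimal sketch of what the call can look like. The request body simply assembles the pieces shown above; note that the 'dataSources' property name, the placeholder report URI, and passing the transform attributes as query parameters are my assumptions here, so check the Report Transforms API reference for the exact schema:

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));

/* Assemble the request body from the sections shown above */
FILENAME reqBody TEMP ENCODING='UTF-8';
DATA _null_;
   FILE reqBody;
   PUT '{';
   PUT '"inputReportUri": "/reports/reports/<report id>",';
   PUT '"dataSources": [';   /* property name assumed; check the API reference */
   PUT '  {"namePattern": "serverLibraryTable", "purpose": "original",';
   PUT '   "server": "cas-shared-default", "library": "HPS", "table": "CARS"},';
   PUT '  {"namePattern": "serverLibraryTable", "purpose": "replacement",';
   PUT '   "server": "cas-shared-default", "library": "CASUSER", "table": "CARS_NEW",';
   PUT '   "replacementLabel": "NEW CARS",';
   PUT '   "dataItemReplacements": [';
   PUT '     {"originalColumn": "dte",  "replacementColumn": "date"},';
   PUT '     {"originalColumn": "wght", "replacementColumn": "weight"},';
   PUT '     {"originalColumn": "dest", "replacementColumn": "region"}]}';
   PUT '],';
   PUT '"resultReportName": "Transformed Report 1",';
   PUT '"resultParentFolderUri": "/folders/folders/cf981702-fb8f-4c6f-bef3-742dd898a69c",';
   PUT '"resultReport": {"name": "Transformed Report 1", "description": "TEST report transform"}';
   PUT '}';
RUN;

/* POST the transform; useSavedReport and saveResult are set to true as discussed above */
FILENAME tranFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD="POST" oauth_bearer=sas_services IN=reqBody OUT=tranFile
     URL="&BASE_URI/reportTransforms/dataMappedReports?useSavedReport=true%nrstr(&)saveResult=true";
     HEADERS "Accept" = "application/vnd.sas.report.transform+json"
             "Content-Type" = "application/vnd.sas.report.transform+json";
RUN;
LIBNAME tranFile json;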

Check the result

If the API fails to create the transformed report, the PROC SQL statements display an error code and error message. For example, if the replacement data source is not valid, it returns errors similar to the following.

Invalid data source error message

If the API successfully creates the transformed report, "201 Created" is displayed in the log. You can find more info about the transformed report in the response body of tranFile from PROC HTTP. You can also log into the SAS Visual Analytics user interface to verify that the transformed report opens successfully and the data sources are changed as expected. Below is a screenshot of the original report and the transformed report; you may have already noticed from the data labels that they use different data sources.

Original and transformed reports in SAS Visual Analytics

Finally

A wealth of other transformations is available through the Report Transforms API. Check out the SAS Developer site for more information.

Using REST API to transform a Visual Analytics Report was published on SAS Users.

August 21, 2020
 

SAS Viya is an open analytics platform accessible from interfaces or various coding languages, and REST APIs are among the most widely used interfaces. Multiple resources exist on how to access SAS Visual Analytics reports using the SAS Viya REST APIs. For example, in Programmatically listing data sources in SAS Visual Analytics, my colleague Michael Drutar shows how to list the data sources of VA reports. And in Using SAS Viya REST APIs to access images from SAS Visual Analytics, Joe Furbee demonstrates how to retrieve report images. In this post, I am going to show you how to get the path for SAS Visual Analytics reports using REST APIs.

Full API reference documentation for SAS REST APIs is on developer.sas.com. You can exercise REST APIs in several ways, such as curl, browsers, browser plugins, or any other REST client. Here I am going to access the SAS Viya Visualization and Core Services REST APIs with SAS code. The Visualization service APIs provide access to reports, report images, and report transforms. The Core Services APIs provide operations for resources like folders, files, authorization, and so on.

Composition of a report object

The chart below describes the object composition of VA reports from an API perspective. We see the report object itself has metadata storing the report properties, such as id, name, creator, modified date, and links. Each VA report object is identified uniquely by its ID in SAS Viya. The report content object, presented in either XML or JSON format, is stored separately from the report object. The report content object enumerates the data and image resources that generate visual elements such as graphs, tables, and images.

Reports API definition
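As an illustration, the metadata of a report object looks roughly like the sketch below (the shape follows the fields described above; the values are made up):

{
  "id": "cecac7d7-b957-412e-9709-a3fe504f00b1",
  "name": "Report 1",
  "createdBy": "cindy",
  "creationTimeStamp": "2020-08-01T09:30:00.000Z",
  "modifiedTimeStamp": "2020-08-15T10:05:00.000Z",
  "links": [ { "rel": "self", "href": "/reports/reports/cecac7d7-b957-412e-9709-a3fe504f00b1" } ]
}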

Get a list of reports

Let's begin with a scenario of getting a list of reports. These reports may be returned from a search or a filter in Viya, or a list you've got at hand. (The SAS Viya support filter link has more information on using the filter.) Here I'm using a filter to get a list of reports named 'Report 2'. I use Proc HTTP to access the Reports API in the Visualization service with a 'GET' request and '/reports/reports?filter=eq(name,'Report 2')' in the URL. Note, the HEADERS of Proc HTTP need to be set properly to generate expected results. Below is a snippet for this.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME rptFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD = "GET" oauth_bearer=sas_services OUT = rptFile
      /* get a list of reports, say report name is 'Report 2' */
      URL = "&BASE_URI/reports/reports?filter=eq(name,'Report 2')";
      HEADERS "Accept" = "application/vnd.sas.collection+json"
               "Accept-Item" = "application/vnd.sas.summary+json";
RUN;
LIBNAME rptFile json;

Running the code above returns a list in the ITEMS table of the rptFile JSON library. It returns about 10 reports with the same name 'Report 2', each with a unique id.

ITEMS table report for 'Report 2' query
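To take a quick look at the returned list, you can print a few columns from the ITEMS table (a small sketch; the exact column names created by the JSON engine may vary slightly in your environment):

PROC PRINT DATA=rptFile.ITEMS;
   VAR id name createdBy creationTimeStamp;
RUN;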

Get the report content object of a VA report

Using the Reports API of the Visualization service, we can get the report content object of a VA report. As shown in the snippet below, by making a 'GET' request to the SAS Viya server with '/reports/reports/<report id>/content' in the URL, the report content object is retrieved.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME rptFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=rptFile
           URL = "&BASE_URI/reports/reports/<report id>/content";
           HEADERS "Accept" = "application/vnd.sas.report.content+json";
RUN;
LIBNAME rptFile json;

In the output, we see the rptFile JSON library enumerates the data and image resources in the report content object. Below is what I retrieved from a report content object.

Contents of rptFile json library

Notice the DATASOURCES_CASRESOURCE table, which Michael uses in Programmatically listing data sources in SAS Visual Analytics. You may explore more information in these tables if interested, such as report states, visual elements, etc. In this post, I won't dig further into the report content object.

Get the metadata of a report object

Next, I am going to get the metadata of a report object with its unique report id, using the Reports API in the Visualization service. I use a 'GET' request with '/reports/reports/<report id>' in the URL. By running the code snippet below, I get the metadata of the report object in the rptFile JSON library.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME rptFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=rptFile
        URL = "&BASE_URI/reports/reports/cecac7d7-b957-412e-9709-a3fe504f00b1";
        HEADERS "Accept" = "application/vnd.sas.report+json";
RUN;
LIBNAME rptFile json;

Below is part of the ALLDATA table from the rptFile library. The table contains metadata of the report object, including its unique id, name, creator, creationTimeStamp, modifiedTimeStamp, links, and so on. But in the table, I can't find the folder location of the report object.

ALLDATA table from the rptFile library

Get the report object folder location

So far, I've retrieved most of the metadata info we are looking for, but not the report object folder location. All VA reports are put under the /SAS Content/ folder or its subfolders in SAS Viya. Yet, no such information exists in the report object or the report content object. How can I get the path of a VA report under the /SAS Content/ folder?

The answer is to use the Folders service in the Core Services APIs. Folders provide an organizational structure for SAS content as well as external content in Viya. The folder object itself is a virtual container for other resources or folders, and it persists only the URIs of resources managed by other services.

A folder object has two types of members: child and reference. Whereas resources can have references in multiple folders, they can be the child of only a single folder. Resources like VA reports are added as child members of a folder, and the folder persists the URI of the VA report. Thus, we work backward from the child report to its folder by looking up the ancestors of the report object.

By using the Folders API in Core Services with a 'GET' request and '/folders/ancestors?childUri=<report URI>' in the URL, the PROC HTTP code below gets the ancestors of the VA report, from which we can build the full path.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME fldFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=fldFile
          URL = "&BASE_URI/folders/ancestors?childUri=/reports/reports/cecac7d7-b957-412e-9709-a3fe504f00b1";
          HEADERS "Accept" = "application/vnd.sas.content.folder.ancestor+json";
RUN;
LIBNAME fldFile json;

From the fldFile.ANCESTORS table, we see the metadata of the ancestor folders, including folder id, folder name, creator, type, parentFolderURI, and so on. The screenshot below is part of the ANCESTORS table. The path of this specific report concatenates these subfolders into the full path /SAS Content/NLS/Cindy/.

Folder path detailed in the ANCESTORS table

Get the path for VA reports

Now that I have several reports, I need to go through the above steps repeatedly for each one. So I wrote SAS code to handle this:

  1. Filter the reports named 'Report 2', using the Reports API in the Visualization service. Save the list of reports in the ds_rpts dataset. The results include metadata for report id, name, createdBy, CreatedAt, and LastModified.
  2. For each report in the ds_rpts data set, call the macro named 'save_VA_Report_Path(reportURI)'. The macro accesses the Folders API in Core Services and saves the path for a given report back in the rptPath column of the ds_rpts data set (a simplified sketch of the macro follows this list).
  3. Print the list of reports with path and other metadata.
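Here is a simplified sketch of the macro idea (the complete code is on GitHub; the internals below, including the assumption that the nearest ancestor is returned first, are illustrative):

%macro save_VA_Report_Path(reportURI);
   %let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
   FILENAME fldFile TEMP ENCODING='UTF-8';
   PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=fldFile
        URL="&BASE_URI/folders/ancestors?childUri=&reportURI";
        HEADERS "Accept" = "application/vnd.sas.content.folder.ancestor+json";
   RUN;
   LIBNAME fldFile json;
   /* concatenate the ancestor folder names into a path such as /SAS Content/NLS/Cindy */
   DATA _null_;
      SET fldFile.ANCESTORS END=eof;
      LENGTH path $ 512;
      RETAIN path;
      path = catx('/', name, path);   /* assumes the nearest ancestor comes first */
      IF eof THEN CALL symputx('rptPath', cats('/', path));
   RUN;
%mend save_VA_Report_Path;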

The code yields the following output:

List of reports with paths and metadata

You may access my code samples from GitHub and give them a try in your environment. I ran the code with SAS Studio 5.2 and VA on SAS Viya 3.5. You can modify the filter condition as needed (such as createdBy, contains, or more from the SAS Viya support filter documentation).

Finally

The Reports API is one of many SAS Viya REST APIs. In this post, I've provided multiple discovery paths to follow. You can find more information about this and other APIs on the SAS Viya REST APIs page on the developers portal.

Discover Visual Analytics Report Paths with REST APIs was published on SAS Users.

September 13, 2019
 

Time-series decomposition is an important technique for time series analysis, especially for seasonal adjustment and trend strength measurement. Decomposition deconstructs a time series into several components, with each representing a certain pattern or characteristic. This post shows you how to use SAS® Visual Analytics to visually show the decomposition of a time series so that you can understand more about its underlying patterns.

Characteristics of time series decomposition

Time series decomposition generally splits a time series into three components: 1) a trend-cycle component, which can be further decomposed into trend and cycle components; 2) a seasonal component; and 3) a residual component, combined in an additive or multiplicative fashion.

In additive decomposition, the cyclical, seasonal, and residual components are absolute deviations from the trend component, and they do not depend on the trend level. In multiplicative decomposition, the cyclical, seasonal and residual components are relative deviations from the trend. Thus, we often see different magnitudes of the seasonal, cyclical and residual components when compared with the trend component, while the trend component keeps the same scale as the original series.

How to begin a time series decomposition

SAS provides several procedures for time series decomposition; I use PROC TIMESERIES in this post. The first step is to decide whether to use additive or multiplicative decomposition. PROC TIMESERIES provides multiplicative (MODE=MULT), additive (MODE=ADD), pseudo-additive (MODE=PSEUDOADD) and log-additive (MODE=LOGADD) decomposition. You can also use the default MODE option of MULTORADD to let SAS make the decision based on the features of your data. Conveniently, you can always use a log transformation whenever there is a need to change a multiplicative relationship to an additive one. The plot option in PROC TIMESERIES can produce graphs of the generated trend-cycle, seasonal and residual components. In this post, I output the OUTDECOMP dataset from PROC TIMESERIES, load the data, and visualize the decomposed time series with SAS Visual Analytics to understand more about their patterns.

See how it's done

I decompose the time series in the SASHELP.AIR dataset as an example. The series contains monthly international air travel data from Jan 1949 to Dec 1960, as pictured below:

We see an obvious upward trend and significant seasonality in the original series, with more and more intense fluctuation around the trend. This indicates that a multiplicative decomposition of the trend and seasonality components is more appropriate. I get the decomposed components using this SAS code (a minimal sketch follows). Here I do not explicitly set the mode option, letting SAS use the default MODE=MULTORADD. Since the values in this time series are strictly positive, SAS eventually uses MODE=MULT to generate the decomposed series in the OUTDECOMP dataset (see details in the documentation).
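A minimal sketch of that step; the component keywords I list on the DECOMP statement are my selection, so check the PROC TIMESERIES documentation for the full set:

proc timeseries data=sashelp.air outdecomp=decomp;
   id date interval=month;
   var air;
   /* no MODE= given, so the default MODE=MULTORADD applies;
      with strictly positive values SAS ends up using MODE=MULT */
   decomp tcc sc ic tcs sic tc cc sa;
run;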

When you load the data set into SAS Visual Analytics and make visualizations, it's very straightforward to draw time series plots showing each of the decomposed series.

Note that the magnitudes of the Trend-Cycle-Seasonal and Trend-Cycle components are much larger than those of the Seasonal, Irregular and Cycle components. The upward trend and increasing volatility of the Trend-Cycle-Seasonal component reveal an obvious multiplicative composition of Trend-Cycle and Seasonal components. The formula should be: Trend-Cycle-Seasonal Component = Trend-Cycle Component * Seasonal Component.

Can you visually show the multiplicative relationship in the series?

I can easily apply a log transformation to the decomposed series using calculated items in SAS Visual Analytics, and accordingly show the additive relationship of the transformed series. The visualization below shows the additive relationship of the log transformation of the Trend-Cycle-Seasonal component with the log transformations of the Trend-Cycle component and the Seasonal component, which is equivalent to the pre-transformed multiplicative relationship.

In the visualization below, the moss-green line series at the bottom of the chart shows the Log Seasonal component, with each vertical black line representing its value. The lines at the top show that adding the value of the orange line series (the Log Trend-Cycle component) to the value of the mint-green vertical lines (the Log Seasonal component) yields the pine-green line series (the Log Trend-Cycle-Seasonal component).

In the list table, note that the value of the calculated item 'Trend-Cycle Component * Seasonal Component' is equal to the 'Trend-Cycle-Seasonal Component' value highlighted in blue, which indicates the multiplicative composition of the 'Trend-Cycle Component' and the 'Seasonal Component' into the 'Trend-Cycle-Seasonal Component'. Also, the sum of the calculated item 'Log Trend-Cycle Component' and the 'Log Seasonal Component' is equal to the value of the 'Log Trend-Cycle-Seasonal Component' in light green. They verify the multiplicative and additive relationships, respectively.

More ways to expose and view patterns

Besides the above multiplicative decomposition, we can dig for more multiplicative or additive relationships from the original series and the decomposed series. Here are the formulas:

Original Series = Trend-Cycle-Seasonal Component * Irregular Component

Seasonal-Irregular Component = Seasonal Component * Irregular Component

Original Series = Seasonal Adjusted Series * Seasonal Component

Trend-Cycle Component = Trend Component + Cycle Component 1

[ 1 Note: Despite setting the MODE=MULT option, SAS Proc Timeseries uses the Hodrick-Prescott filter, which always decomposes the trend-cycle component into the trend component and cycle component in an additive fashion. ]

Considering that the decomposed dataset from any time series will have the fixed structure shown below, we can easily apply the visualizations in SAS Visual Analytics to the decomposed series from different time series. Simply apply the new dataset: all the calculated items are inherited accordingly, and the new data is applied to the visualizations automatically. That's what I like most about visualizing time series decomposition in SAS Visual Analytics.

A final decomposition comparison

Let's compare the multiplicative decomposition and the additive decomposition of the same series. Note that the Trend-Cycle components (as well as the Trend and Cycle components) from the multiplicative and additive decompositions are the same, meaning that it is the seasonal component that is decomposed differently in the two approaches.

In the screenshot below, we see that the two seasonal components have a similar style of seasonal fluctuation, but the values of the seasonal components differ greatly between the multiplicative and additive decompositions. The different decomposition methods also lead to different Trend-Cycle-Seasonal, Irregular and Seasonal-Irregular components. In addition, we still see some patterns in the Irregular component from the additive decomposition.

In the multiplicative decomposition, however, the Irregular component looks more random. Thus, the multiplicative decomposition is a better choice than the additive decomposition for the SASHELP.AIR time series.

PROC TIMESERIES provides classical decomposition of time series, and SAS has other procedures that can perform more complex decompositions. If you want to visualize time series decomposition in a way you like, give SAS Visual Analytics a try!


How to Visualize Time Series Decomposition using SAS Visual Analytics was published on SAS Users.

August 28, 2019
 

The Moving Average (MA) is a common indicator in stocks, securities and futures trading in financial markets, used to gauge momentum and confirm trends. An MA is often used to smooth out short-term fluctuations and show long-term trends, but most MA indicators have big lags in signaling a changing trend. To capture a trend reversal faster, several newer MA indicators are now available that detect trend changes more quickly, and of those, the Hull Moving Average (HMA) is one of the most popular. This post demonstrates its superiority.

A closer look at HMA

Developed by Alan Hull, the HMA is faster and thus a more useful signal than others. Its main advantage over general MA indicators is its relative smoothness as it signals change. Commonly used MA indicators include the Simple Moving Average (SMA), the Weighted Moving Average (WMA) and so on. The SMA calculates the arithmetic mean of the prices, giving each value equal weight. The WMA averages the values with predetermined weights.

Since moving averages are computed from prior data, all MA indicators suffer the significant drawback of being lagging indicators. Even with a shorter period, which has less lag than a longer one, a stock price may drop sharply before an MA indicator signals the trend change. The Hull Moving Average (HMA) uses weighted moving averages and the square root of the period instead of the actual period itself, which makes it more responsive to the most recent price activity while maintaining smoothness.

According to Alan Hull, the formula for the HMA with period n is:

HMA(n) = WMA(sqrt(n)) applied to the series [ 2 × WMA(n/2) − WMA(n) ]

We see that the major computing components in the HMA are three WMAs. Referring to the WMA specification, the corresponding WMA formula is:

WMA(n) = [ n × P(t) + (n−1) × P(t−1) + … + 1 × P(t−n+1) ] / [ n + (n−1) + … + 1 ]

where P(t) is the most recent price. In the WMA formula, the weight of each price value depends on the position of the value and the period length: the more recent the value, the higher the weight, and the shorter the period, the higher the weights.

HMA in action

In the remainder of this post, I will show how to calculate the HMA of a stock price using calculated items in SAS Visual Analytics, and show that the HMA gives faster upward/downward signals than the SMA. I use the data from SASHELP.STOCKS with 'IBM' as an example. The data needs to be sorted by date, with a column (named 'tid') added to hold the sequence number, before loading into SAS Visual Analytics for the calculation. The data preparation code can be found here. After loading the data into SAS Visual Analytics, we can start creating the calculated items. Here, I set the period length to 5 (i.e., n = 5 in the formula) and calculate the HMA for the 'Close' price of the IBM stock.
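The data preparation amounts to something like the sketch below (assuming the SASHELP.STOCKS table, with a Stock column identifying the company):

/* keep the IBM rows, sort by date, and add a row sequence column 'tid' */
proc sort data=sashelp.stocks(where=(stock='IBM')) out=ibm;
   by date;
run;

data ibm;
   set ibm;
   tid = _n_;   /* sequence number, used as the weight in the WMA calculations */
run;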

1. Calculate the first WMA like so...

... using the AggregateCells operator in SAS Visual Analytics. I name it 'WMA(5/2 days)'. Note I've rounded (5/2) to an integer of 3; that is, the aggregation starts from two rows back (-2) and ends at the current row (0). The corresponding formula of the calculated item 'WMA(5/2 days)' in SAS VA is:

AggregateCells(_Sum_, ( 'Close'n * 'tid'n ), default, CellIndex(current, -2), CellIndex(current, 0)) / AggregateCells(_Sum_, 'tid'n, default, CellIndex(current, -2), CellIndex(current, 0))

 

2. Similarly, calculate the second WMA in SAS Visual Analytics:

Name it as ‘WMA(5 days)’. The corresponding formula is:
AggregateCells(_Sum_, ( 'Close'n * 'tid'n ), default, CellIndex(current, -4), CellIndex(current, 0)) / AggregateCells(_Sum_, 'tid'n, default, CellIndex(current, -4), CellIndex(current, 0))

3. Now we calculate the HMA, which computes the third WMA using the two WMAs we got from the calculations above. In SAS Visual Analytics, if we directly apply a similar approach for the last WMA calculation, it shows a message that the operands require a group. So here I need a workaround to make the aggregation work.

4. To work around the problem, I create an aggregated item named 'sumtid' to indicate the row sequence number in an aggregated way. To do this, first create a calculated item named 'One' with the constant value 1; then use the AggregateCells operator to create 'sumtid', which returns the current row number: AggregateCells(_Sum_, 'One'n, default, CellIndex(start, 0), CellIndex(current, 0)).

5. Now we can compute the HMA in a similar way as we did for the previous two WMAs. Name it 'HMA for close (5 days)'. Since int(√5) = 2, the starting position of the aggregation is set to the previous row (-1) and the ending position is set to the current row (0). Note the operands now use the aggregated item 'sumtid'. The formula for the 'HMA for close (5 days)' item is:

AggregateCells(_Sum_, ( ( ( 2 * 'WMA(5/2 days)'n ) - 'WMA(5 days)'n ) * 'sumtid'n ), default, CellIndex(current, -1), CellIndex(current, 0)) / AggregateCells(_Sum_, 'sumtid'n, default, CellIndex(current, -1), CellIndex(current, 0))

So far, we've created the Hull Moving Average of the IBM stock Close price and saved it in the calculated item 'HMA for close (5 days)'. We can easily draw its time series plot in SAS Visual Analytics. Now I'll create a Simple Moving Average, 'SMA for the close (5 days)', with equal weights, and then compare it with the HMA. The formula for 'SMA for the close (5 days)' is: AggregateCells(_Average_, 'Close'n, default, CellIndex(current, -4), CellIndex(current, 0))

Now let's visualize 'SMA for the close (5 days)' and 'HMA for close (5 days)' respectively. In the charts below, each grey vertical bar shows the monthly price span of the IBM stock, and the red lines correspond to the SMA and HMA respectively. The upper SMA line shows a constant lag behind price changes and poor smoothness, while the bottom HMA line keeps up rapidly with price activity while maintaining good smoothness.

Below is a comparison of 'SMA for the close (5 days)', 'HMA for close (5 days)' and the Close price. Besides smoothing out some fluctuations in the Close price, the HMA indeed gives a better signal than the SMA in indicating a turning point when there is an upward/downward trend reversal. Note the obvious lag of the SMA compared to the HMA. For example, compare the trends around the reference line in the visualization below. The Close price reached a local peak in Jun 1992 and started to decline in Jul 1992. The HMA quickly reflected the downward turn with one month of lag, in Aug 1992, while the SMA still showed a rising trend; the SMA turned down one month later, giving the reversal signal with an extra lag.

Now it's easy to understand why the HMA is a better indicator than the SMA for signaling reversal points. What has been your experience with HMA?

How to Calculate Hull Moving Average in SAS Visual Analytics was published on SAS Users.

August 17, 2018
 

Data density estimation is often used in statistical analysis as well as in data mining and machine learning. Visualizing a data density estimate shows the data's characteristics, like distribution, skewness and modality. The most widely used visualizations for data density are the boxplot, histogram, kernel density estimate, and a few other plots. SAS has several procedures that can create such plots. Here, I'll visualize a kernel density estimate superimposed on a histogram using SAS Visual Analytics.

A histogram shows the data distribution through continuous interval bins, and it is a very useful visualization for presenting the data distribution. With a histogram, we get a rough view of the density of the value distribution. However, the bin width (or number of bins) has a significant impact on the shape of a histogram and thus gives different impressions to viewers. For example, the two histograms below use the same data, with the left one using 6 bins and the right one using 4 bins. Different bin widths show different distributions for the same data. In addition, a histogram is not smooth enough to compare visually with mathematical density models. Thus, many people use kernel density estimates, which vary more smoothly with the distribution.

Kernel density estimation (KDE) is a widely used non-parametric approach to estimating the probability density of a random variable. Non-parametric means the estimation adjusts to the observations in the data, and it is more flexible than parametric estimation. To plot a KDE, we need to choose the kernel function and its bandwidth. The kernel function is used to compute the kernel density estimates, and the bandwidth controls the smoothness of the KDE plot; it is essentially the width of the sliding window used to generate the density. SAS offers several ways to generate kernel density estimates. Here I use PROC UNIVARIATE to create the KDE output as an example (for simplicity, I set c = SJPI to have SAS select the bandwidth by the Sheather-Jones plug-in method), then make the corresponding visualization in SAS Visual Analytics.

Visualize the kernel density estimates using SAS code

It is straightforward to run kernel density estimation using SAS PROC UNIVARIATE. Take the variable MSRP in the SASHELP.CARS dataset as an example. The min/max values of the MSRP column are 10280 and 192465, respectively. I plot the histogram with 15 bins in this example. Below is the sample code segment I used to construct the kernel density estimate of the MSRP column:

title 'Kernel density estimates of MSRP';
proc univariate data = sashelp.cars noprint;	
   histogram MSRP / kernel (c = SJPI) endpoints = 10280 to 192465 by 12145 outkernel = KDE  odstitle = title; 
run;

Run the above code in SAS Studio, and we get the following graph.
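To make the KDE output available to SAS Visual Analytics, load it to CAS first; a minimal sketch (the session and caslib names are illustrative):

cas mysess;   /* start a CAS session */
proc casutil sessref=mysess;
   /* load the OUTKERNEL dataset produced by PROC UNIVARIATE */
   load data=work.kde outcaslib="casuser" casout="KDE" promote;
quit;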

Visualize the kernel density estimates using SAS Visual Analytics

  1. In SAS Visual Analytics, load the SASHELP.CARS and the KDE dataset (from previous Proc UNIVARIATE) to the CAS server.
  2. Drag and drop a ‘Precision Container’ in the canvas, and put a histogram and a numeric series plot in the container.
  3. Assign corresponding data to the histogram plot: assign CARS.MSRP as the histogram Measure, and 'Frequency Percent' as the histogram Frequency. Set the options of the histogram with the following settings:
    Object -> Title: No title;

    Graph Frame: Grid lines: disabled

    Histogram -> Bin range: Measure values; check the ‘Set a fixed bin count’ and set ‘Bin count’ to 15.

    X Axis options:

       Fixed minimum: 10280

       Fixed maximum: 192465

       Axis label: disabled

       Axis Line: enabled

       Tick value: enabled

    Y Axis options:

       Fixed minimum: 0

       Fixed maximum: 0.5

       Axis label: disabled

       Axis Line: disabled

       Tick value: disabled

  4. Assign the corresponding KDE data to the numeric series plot. Define a calculated item, Percent, as ('Percent of Observations Per Data Unit'n / 100) with the format PERCENT12.2, and assign it to the 'Y axis'; assign 'Data Value' to the 'X axis'. Now set the options of the numeric series plot with the following settings:
    Object -> Title: No title;

    Style -> Line/Marker: (change the first color to purple)

    Graph Frame -> Grid lines: disabled

    Series -> Line thickness: 2

    X Axis options:

       Axis label: disabled

       Axis Line: disabled

       Tick value: disabled

    Y Axis options:

       Fixed minimum: 0

       Fixed maximum: 0.5

       Axis label: enabled

       Axis Line: enabled

       Tick value: enabled

    Legend:

       Visibility: Off

  5. Now we can start to overlay the two charts. As can be seen in the screenshot below, SAS Visual Analytics 8.3 provides a smart guide with the precision container, which shows grids to help you align the objects in it. If you hold the Ctrl button while dragging the numeric series plot to overlay the histogram, some fine grids are displayed by the smart guide to help you with basic alignment. It is a little tricky, though: to make the overlay precise, you may fine-tune the Left/Top/Width/Height values in the Layout section of the VA Options panel. The goal is to make the intersections of the axes coincide.

After that, we can add a text object above the charts we just made, and we are done: the kernel density estimate superimposed on a histogram, shown in the screenshot below, looks similar to what we got from SAS PROC UNIVARIATE. (If you'd like to use the PROC KDE UNIVAR statement for data density estimation, you can visualize it in SAS Visual Analytics in a similar way.)

To go further, I made a KDE with a scatter plot, where we can also get an impression of the data density from the little circles, and another KDE plot with a needle plot, where the data density is represented by the barcode-like lines. Both are created in ways similar to the histogram example above.

So far, I've shown you how I visualize KDE using SAS Visual Analytics. There are other approaches to visualize kernel density estimates in SAS Visual Analytics; for example, you may create a custom graph in Graph Builder and import it into SAS Visual Analytics for the visualization. In any case, KDE is a good visualization to help you understand more about your data. Why not give it a try?

Visualizing kernel density estimates in SAS Visual Analytics was published on SAS Users.

March 30, 2018
 

Gradient boosting is one of the most widely used machine learning models in practice, and more and more people are using it in Kaggle competitions. Are you interested in seeing how to use a gradient boosting model for classification in SAS Visual Data Mining and Machine Learning? Here I play with the classification of Fisher's Iris flower dataset using gradient boosting, which may serve as a starting point for those interested in trying the classification models in SAS Visual Data Mining and Machine Learning.

Fisher's Iris data is a well-known dataset in data mining. Per Wikipedia, Fisher developed a linear discriminant model to distinguish the species from each other by the features provided in the dataset. You may already have seen people run different classification models on this dataset, such as neural networks. What I am interested in is seeing how well the SAS gradient boosting model does at the species classification.

#1  Explore the dataset

We can easily load Fisher's Iris dataset from SASHELP.IRIS into SAS Viya. The dataset consists of 50 samples from each of three species (Iris setosa, Iris virginica and Iris versicolor), 150 records in total, with five attributes: Petal Length, Petal Width, Sepal Length, Sepal Width and Iris Species. The dataset itself is already well formed, with neither missing values nor outliers. Take a quick look at the dataset in SAS Visual Analytics below.

Gradient boosting

From the chart, we see that the iris species 'Setosa' can easily be distinguished from the 'Versicolor' and 'Virginica' species by the length and width of their petals and sepals. However, this is not the case for the latter two species: some of their observations overlap closely, which makes the two a little hard to distinguish by these features.

#2  Prepare Data

Not much effort is needed to prepare the data for the prediction, but one thing I'd like to mention here is the standardization of measure variables. By viewing the measure details in SAS Visual Analytics, we see that neither the Petal Length distribution nor the Petal Width distribution is normal. You may wonder if we need to normalize the data before feeding it to the model, but this leads to one thing I really like about the gradient boosting model: users do not need to explicitly standardize quantitative data. Tree-based models should be robust to such problems in an input feature, since the algorithm is based on node splits. (Here is an article discussing a similar problem.)

So my data preparation is just partitioning the data before starting the classification of iris species. I need to make sure each partition follows the same distribution of species in the iris dataset. This can be achieved easily in SAS Visual Analytics by adding a partition data item: set the sampling method to 'Stratified sampling' and add 'Iris Species' as the column to be stratified by. I define two partitions, training and validation, with 60% for training and 40% for validation, using random seed 1234. Thus a categorical data item 'Partition' is added, with a value of 0 for validation and 1 for training. (For easier understanding in the charts, I've created a custom category called 'Partitions' based on the 'Partition' data item values.)

The charts below show that the 150 rows in Fisher’s Iris dataset are distributed equally into three species, and the created partitions are sampled with the same percentage among the three species.
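If you prefer code, a rough equivalent of that stratified 60/40 split can be sketched with PROC SURVEYSELECT (this mirrors the idea, not VA's exact internal sampling):

proc sort data=sashelp.iris out=iris;
   by species;
run;

/* 60% training sample within each species; Selected=1 -> training, 0 -> validation */
proc surveyselect data=iris out=iris_part samprate=0.6 seed=1234 outall;
   strata species;
run;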

#3  Train the gradient boosting model

Training various models in SAS Visual Data Mining and Machine Learning lets us appreciate the advantages of visualization, and it's very straightforward for users. In the 'Objects' tab, drag and drop 'Gradient Boosting' onto the canvas. Assign 'Iris Species' as the response variable, and Petal Length, Petal Width, Sepal Length and Sepal Width as predictors. Then set the 'Partition' data item as the Partition ID. After that, the system trains the model and shows the model assessment. I've taken a screenshot for the 'Virginica' event below.

The response variable Iris Species has three event levels, 'Setosa', 'Versicolor' and 'Virginica', and we can choose the desired event level to look at the model output. In addition, we may switch the assessment plot from Lift to the ROC plot or the Misclassification plot. (Note: the misclassification plot is based on the event level, so it shows the 'Setosa' and 'NOT Setosa' species if we choose the 'Setosa' event.) Below is a screenshot with the ROC plot and the model assessment statistics.

In practice, training models usually costs a lot of effort in tuning model parameters. SAS Visual Data Mining and Machine Learning provides the 'Autotune' feature to help with this: users decide some settings like maximum iterations, seconds, and evaluations, and the product chooses optimal values for the hyperparameters of the model. Considering that this dataset has only 150 samples, I won't bother with hyperparameter tuning here.

#4  Make prediction by the model

Now I can start to make predictions from the gradient boosting model for the data in the validation partition. There are several ways to go here. In Visual Data Mining and Machine Learning, in the right-click menu, click either the 'Export model…' or the 'Derive predicted…' item. The first exports the model code, so you can run it in SAS Studio against the data to be predicted. The latter is very straightforward in SAS Visual Data Mining and Machine Learning: it pops up the 'New Prediction Items' page, where you may choose to get the predicted value and its probability values for all the levels of Iris Species. These data items are added to the iris CAS table for further evaluation. Since the iris dataset has three species in the sample, I set 'All levels' so the prediction gives the classification for all three species and their probabilities.

#5  Review the prediction result

In the model assessment tab, we already see the model assessment statistics for model evaluation. We may also switch to the 'Variable Importance' tab, or the 'Lift', 'ROC' and 'Misclassification' tabs, to see more about the model. Here I'd like to visually compare the predicted species values with the iris species values provided in the dataset.

To show how many failures of the classification visually, I perform following actions:

  • In SAS Visual Analytics, create a list table to show all 150 rows of the iris dataset. Since there is no primary key in the dataset, the SAS Visual Analytics list table will do aggregation for measure variables by default, so be sure to set the ‘Detail data’ option in the Options tab.
  • Create a calculated item (named ‘equals’) to compare if the values of ‘Iris Species’ and ‘Predicted: Iris Species’ columns are equal: {IF ( 'Iris Species'n = 'Predicted: Iris Species'n ) RETURN 1 ELSE 0. }
  • Define a display rule with the calculated item to highlight the misclassified rows. I've sorted the table by the 'equals' value above, so the rows where the 'Iris Species' and 'Predicted: Iris Species' columns differ are shown on top.

We see four rows misclassified by the model: three of them from the training partition and one from the validation partition. So far, the result doesn't look bad, right?

We may continue to tune the parameters of the gradient boosting model easily in SAS Visual Data Mining and Machine Learning to improve the model. For example, if I set a smaller leaf size of 2 instead of the default value of 5, the model accuracy improves (too good to be true?). See the screenshot below for a comparison.

Of course, you may like to try tuning other parameters, or generating more features to refine the model. In any case, it is easy and straightforward to do classification using a gradient boosting model in SAS Visual Data Mining and Machine Learning. There are also many other models in SAS Visual Data Mining and Machine Learning you may like to run for classification. Would you like to play with the other models for practice?

Play with classification of Iris data using gradient boosting was published on SAS Users.

October 13, 2017
 

Every year in early October, the eyes of the world turn to Sweden and Norway, where the Nobel Prize winners are announced. The Nobel Prize is considered the world's most prestigious award. Since 1901, the Prize has been presented each year to individuals and organizations that have made significant achievements in the fields of physics, chemistry, physiology or medicine, world peace and literature (with several exceptions during war years). In 1968, Sveriges Riksbank established the Sveriges Riksbank Prize in Economic Sciences in memory of Alfred Nobel, founder of the Nobel Prize. Today, individuals or organizations who are awarded the Nobel Prizes and the Prize in Economic Sciences are called Nobel Laureates.

So far, more than 900 Nobel Laureates have been awarded. In this post, I wanted to learn a little more about these impressive individuals. Where were these Nobel Laureates from? Why did they get awarded? Are there any common characteristics among these Laureates? Below you'll find a preliminary analysis of Nobel Laureates using SAS Visual Analytics.

The analysis is based on data from List_of_Nobel_laureates, List of Nobel laureates by university affiliation, and the Nobel Laureates datasets at Kaggle, which definitely have some missing and inconsistent values. I have cleaned the data to correct obvious inconsistencies as much as possible for my analysis.

How many Nobel Laureates have there been so far?

Recently, 12 new Nobel Laureates were awarded the 2017 Nobel Prizes and Prize in Economic Sciences, which makes 923 Laureates in total since the first Nobel Prize in 1901. Some Laureates share one prize, so we see a larger total of shared Laureates in the table below. While we see 27 organization winners of the Peace prize, most Laureates are individual winners.

analysis of the Nobel Laureates

The chart below shows that the overall trend of annual total Nobel Laureates is increasing year over year, as more and more winners share the Prize. The purple circles on the plot indicate years with shared winners. The average number of winners is about eight each year. Yet there was only one winner in 1916, for the Literature Prize. The most winners came in 2001, with 15 Laureates sharing the prizes. I also note from the chart that very few Nobel Prizes were awarded during the First World War, and none during the Second World War.

Moreover, while most Nobel Laureates are awarded one Nobel Prize, I learned from childhood that the female scientist Marie Curie received two Nobel Prizes. If you search the datasets for winners awarded more than one Prize, you'll find four scientists who accomplished this feat: Marie Curie, Linus Pauling, John Bardeen and Frederick Sanger.

Do Nobel Laureates live longer?

The answer is YES, per research by Prof. Andrew Oswald of the University of Warwick. Winning a Nobel prize adds about 1.5 years to the lifespan of Nobel Laureates compared to those who were merely nominated. Of course, it is not because of the monetary benefits that come with the Nobel Prize, but because of 'the deep links between mind and body', and the idea that 'happiness' may make people live longer, which makes sense to me.

Since I don't have data on Nobel Prize nominees, let's only examine the lifespans of the Nobel Laureates and the ages at which they were awarded. The average lifespan of all Nobel Laureates is nearly 80, much older than the global average life expectancy of 71.4 years (according to the World Health Organization, 2015). Digging a bit more, we see Martin Luther King (Peace, 1964) is the Nobel Laureate who died at the youngest age; he was assassinated at 39 years old. The Laureates who lived longest are Rita Levi-Montalcini (Medicine, 1986) and Ronald H. Coase (Economics, 1991), who both lived to 103 years old. You may also notice that the distribution of the Laureates' lifespans is left skewed; the Nobel Prize winners certainly live longer than most.

In addition, something more worth noting:

  • The Laureates with the longest lifespans are mostly from the Economics and Medicine categories. The Nobel Prize winning economists live longer than the other categories' winners on average. The average lifespan of these economists is about 86 years, five years longer than that of the second category, Medicine.
  • Economics winners win the award at the highest age, 67 years old on average. More digging shows the oldest awarded age is 90, when Leonid Hurwicz (Economics, 2007) was awarded his Prize. We see the average awarded age of Physics winners is 56, more than 10 years younger than that of the Economics winners. Thus we get the impression that economists need more time to achieve outstanding results.
  • If we compare the span between the Laureates' average awarded age and their lifespan, the Physics Prize winners enjoy the longest lifetime after winning the award, about 20 years on average.
  • It is also worth noting that the Nobel Peace winners have the largest span of awarded ages, about 70 years. That's because of the youngest Nobel Laureate, Malala Yousafzai, who was awarded the Nobel Peace Prize at 17 years old in 2014.

The chart below is created in SAS Visual Analytics and shows the awarded ages of all individual Nobel Laureates in different prize categories. The reference line is the average awarded age of 59. It is very easy to note that no Nobel Prize was awarded during 1940-1943 due to the Second World War.

From which universities have Nobel Laureates graduated?

Next, let's look at the educational background of the Nobel Laureates. The left chart below clearly shows that many more Nobel winners hold doctorate degrees than bachelor's or master's degrees. In the chart for the Literature and Peace categories on the right, the difference is not that big. From the data, we know that the educational background of Nobel Laureates in the Physics, Chemistry, Medicine and Economics categories (I call these four the scientific categories for easier description later) has a higher percentage of doctorates than that of winners in the Literature and Peace categories.

To learn more about the universities from which the Laureates in these scientific categories graduated, I ranked the top 10 university affiliations for the scientific categories in the chart below, along with their distribution among these categories and the countries in which these universities are located.

The top 10 university affiliations were selected based on the highest degree the scientific categories' Laureates obtained. That is, if a winner held a Master's degree from Harvard University and a Doctorate from the University of Cambridge, he or she is counted under the University of Cambridge and not Harvard University. From the parallel coordinates plot, you may have noticed that Physics at the University of Cambridge and Medicine at Harvard University are their strongest fields, respectively. On the right, it shows that these top 10 university affiliations are located in the United States, the United Kingdom, France and Germany. The bar charts on the left show the percentage of educational degrees (Doctorate, Master, Bachelor) for each of the scientific categories (according to the available dataset). In the bottom chart, the top 10 universities are ranked by their percentages. Perhaps now you have a great university in mind for future education?

Next, I created the chart below to show the top eight countries with the university affiliations from which the most Nobel Prize winners graduated. (The chart covers only the scientific categories; it excludes the Nobel Literature and Peace Prizes.) An obvious trend in the chart is that the United States has the most Laureates in the scientific categories after the Second World War, while Germany had comparatively more Laureates in the scientific categories before World War II.

Why do the Nobel Laureates get awarded?

Per nobelprize.org, in an excerpt of his will, Alfred Nobel (1833-1896) dictates that his entire remaining estate should be used to endow "prizes to those who, during the preceding year, shall have conferred the greatest benefit to mankind." Alfred's interests are reflected in the Prize, as the will says: "The whole of his remaining realizable estate constitutes a fund, and the annual interest shall be divided into five equal parts, which shall be apportioned as follows: one part to the person who shall have made the most important discovery or invention within the field of physics; one part to the person who shall have made the most important chemical discovery or improvement; one part to the person who shall have made the most important discovery within the domain of physiology or medicine; one part to the person who shall have produced in the field of literature the most outstanding work in an ideal direction; and one part to the person who shall have done the most or the best work for fraternity between nations, for the abolition or reduction of standing armies and for the holding and promotion of peace congresses."

Since it's not easy to find evidence in the datasets that Nobel Laureates are awarded for fulfilling Alfred's will, what I did was use SAS Visual Analytics text topics analysis to perform some preliminary text analysis of the 'Motivation' field in the dataset, as a validation to some extent. The 'Motivation' is given by nobelprize.org as the reason the Laureate was awarded. The analysis shows that the most frequently mentioned word is 'discovery', while the five most frequent words include 'work', 'development', 'contribution' and 'theory'. From the topic analysis results, the top 10 topics are about 'discovery', 'human', 'structure', 'economic', 'technique', etc., which reflect Alfred Nobel's will in establishing the Prize. Moreover, the sentiment analysis shows that the statements in the 'Motivation' field are mainly neutral (that is, 'objective'), with only a few positive and negative sentiment statements.

 

I hope you’ve found this analysis of Nobel Laureates data interesting. I believe there are still many other perspectives you can analyze to get insights. Is there anything interesting you see?

A preliminary analysis of the Nobel Laureates was published on SAS Users.

February 25, 2017
 

As a practitioner of visual analytics, I read the featured blog 'Visualizations: Comparing Tableau, SPSS, R, Excel, Matlab, JS, Python, SAS' last year with great interest. In the post, the blogger Tim Matteson asked readers to guess which software was used to create his 18 graphs. My buddy, Emily Gao, suggested that I see how SAS VA would do at recreating these visualizations. I agreed.

SAS Visual Analytics (VA) is better known for its interactive visual analysis, and it’s also able to create nice visualizations. Users can easily create professional charts and visualizations without SAS coding. So what I am trying to do in this post, is to load the corresponding data to SAS VA environment, and use VA Explorer and Designer to mimic Matteson’s visualizations.

I want to specially thank Robert Allison for his valuable advice during the process of writing this post. Robert Allison is a SAS graph expert, and I have learned a lot from his posts. I read his blog on creating the 18 amazing graphs using pure SAS code, and I copied most of the data from his blog for these visualizations, which saved me a lot of time in preparing the data.

So, here's my attempt at recreating Matteson's 18 visualizations using SAS Visual Analytics.

Chart 1

This visualization is created using two customized bar charts in VA, placed together with the precision layout so they look like one chart. The customization of the bar charts is done with the 'Custom Graph Builder' in SAS VA: set the reverse order for the X axis, set the axes direction to horizontal, don't show axis labels for the X and Y axes, uncheck 'show tick marks', and so on. Compared with Matteson's visualization, my version displays the tick values on the X axis as non-negative numbers, since people generally expect positive values for a frequency.

Another thing: I used a custom sort on the category to define the order of the items in the bar chart. This can be done by right-clicking the category and selecting ‘Edit Custom Sort…’ to get the desired order. You may also have noticed that the legend looks a bit odd for the Neutral response: it is split into Neutral_1stHalf and Neutral_2ndHalf, which I needed in order to show the data symmetrically in the visualization in VA.
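To illustrate, here is a minimal sketch of that data preparation as a SAS DATA step, assuming a hypothetical ‘survey’ dataset with a response category and a count per category (all dataset and variable names here are illustrative, not from the original report data): the left-hand categories are negated and Neutral is split in half so the bars plot symmetrically around zero.

data survey_diverging;
   set survey;
   length response_plot $20;
   if response in ('Strongly Disagree', 'Disagree') then do;
      response_plot = response;
      count_plot = -count;            /* plot to the left of zero */
      output;
   end;
   else if response = 'Neutral' then do;
      /* split the Neutral count evenly across both sides of the axis */
      response_plot = 'Neutral_1stHalf';
      count_plot = -count/2;
      output;
      response_plot = 'Neutral_2ndHalf';
      count_plot = count/2;
      output;
   end;
   else do;
      response_plot = response;
      count_plot = count;             /* plot to the right of zero */
      output;
   end;
run;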

Chart 2

VA can easily create a grouped bar chart with the desired sort order for the countries and the questions. However, we can’t place the question text horizontally atop each grouped bar in VA; VA uses a vertical section bar instead, with a tooltip that shows the whole question text when the mouse hovers over it. We can also see the value of each bar section interactively in VA by hovering over it.

Chart 3

Matteson’s chart looks a bit scattered to me, while Robert’s does a great job with the label text and markers in the scatterplot matrix. Here I used VA Explorer to create the scatterplot matrix for the data; it omits the diagonal cells and the redundant symmetric half of the matrix for easier analysis. The result can then be exported to a report, where the color of the data points can be changed.

Chart 4

I used the ‘Numeric Series Plot’ to draw this chart of job losses in the recession. It was straightforward: I just adjusted a few settings, such as checking ‘Show markers’ in the Properties tab, unchecking ‘Show label’ for the X axis, and unchecking ‘Use filled markers’. To refine the X axis label with different fonts, I used the ‘Precision’ layout instead of the default ‘Tile’ layout, then dragged in a ‘Text’ object to hold the desired X axis label.

Chart 5

VA draws grouped bar charts easily and automatically. Disable the X axis label and set a grey color for the ‘Header background’. What we need to add here are some display rules mapping colors to values. For the formatted text at the bottom, use the ‘Text’ object. (Note: VA puts the Age_range values at the bottom of the chart.)

Chart 6

SAS VA does not support 3D charts, so I could not make a chart similar to the one Robert created with SAS code. Instead, I created a network diagram using the Karate Club dataset. The detected communities (0, 1, 2, 3) are shown in different colors. The diagram can be exported as an image in VA Explorer (VAE).

I used the following code to generate the data needed for this visualization:


/* The Zachary’s Karate Club dataset is from: http://support.sas.com/documentation/cdl/en/procgralg/68145/HTML/default/viewer.htm#procgralg_optgraph_examples07.htm
   It describes the social network of friendships in a karate club at a U.S. university.
*/
data LinkSetIn;
   input from to weight @@;
   datalines;
 0  9  1  0 10  1  0 14  1  0 15  1  0 16  1  0 19  1  0 20  1  0 21  1
 0 23  1  0 24  1  0 27  1  0 28  1  0 29  1  0 30  1  0 31  1  0 32  1
 0 33  1  2  1  1  3  1  1  3  2  1  4  1  1  4  2  1  4  3  1  5  1  1
 6  1  1  7  1  1  7  5  1  7  6  1  8  1  1  8  2  1  8  3  1  8  4  1
 9  1  1  9  3  1 10  3  1 11  1  1 11  5  1 11  6  1 12  1  1 13  1  1
13  4  1 14  1  1 14  2  1 14  3  1 14  4  1 17  6  1 17  7  1 18  1  1
18  2  1 20  1  1 20  2  1 22  1  1 22  2  1 26 24  1 26 25  1 28  3  1
28 24  1 28 25  1 29  3  1 30 24  1 30 27  1 31  2  1 31  9  1 32  1  1
32 25  1 32 26  1 32 29  1 33  3  1 33  9  1 33 15  1 33 16  1 33 19  1
33 21  1 33 23  1 33 24  1 33 30  1 33 31  1 33 32  1
;
run;
/* Perform the community detection using resolution levels (1, 0.5) on the Karate Club data. */
proc optgraph
   data_links            = LinkSetIn
   out_nodes             = NodeSetOut
   graph_internal_format = thin;
   community
      resolution_list    = 1.0 0.5
      out_level          = CommLevelOut
      out_community      = CommOut
      out_overlap        = CommOverlapOut
      out_comm_links     = CommLinksOut;
run;
 
/* Create the dataset of detected communities (0, 1, 2, 3) for resolution level 1.0 */
proc sql;
	create table mylib.newlink as 
	select a.from, a.to, b.community_1, c.nodes from LinkSetIn a, NodeSetOut b, CommOut c
 	where a.from=b.node and b.community_1=c.community and c.resolution=1 ;
quit;

Chart 7

I created this map using the ‘Geo Coordinate Map’ in VA. I needed to create a geography variable by right-clicking ‘World-cities’ and selecting Geography->Custom…, then setting Latitude to ‘Unprojected degrees latitude’ and Longitude to ‘Unprojected degrees longitude’. To get the black continents in the map, go to the VA preferences and check ‘Invert application colors’ under Theme. Remember to set the ‘Marker size’ to 1 and change the first marker color to black so that it shows as white when the application colors are inverted.

Chart 8

This is a very simple scatter chart in VA. I only set transparency to reveal overlapping values. The blue text in the upper-left corner is a text object.

Chart 9

To get this black-background graph, set the ‘Wall background’ color to black, then change the ‘Line/Marker’ color in the data colors section accordingly. I also checked the ‘Show markers’ option and increased the marker size to 6.

Chart 10

There is nothing special about creating this scatter plot in VA. I simply created several reference lines and unchecked ‘Use filled markers’ with a smaller marker size. The transparency of the markers is set to 30%.

Chart 11

In VA’s current release, if we use a category variable for color, the marker shape automatically changes for each color. So I created a customized scatter plot with the VA Custom Graph Builder to keep the marker always round. Nothing else special: just set the transparency to clearly show the overlapping values. As always, we can add an image object in VA with the precision layout.

Chart 12

I used the Geo Bubble Map to create this visualization. I needed to create a custom geography variable from the trap variable, using ‘lat_deg’ and ‘lon_deg’ as latitude and longitude respectively. Then I renamed the NumMosquitos measure to ‘Total Mosquitos’ and used it for the bubble size. To show the presence of West Nile virus, I used a display rule in VA, and I created an image to explain the colored icons in the display rule. The precision layout is enabled so that text and images can be added to this visualization.

Chart 13

This visualization is also created with the Geo Bubble Map in VA. First I did some data manipulation, squaring the magnitude purely for the sake of bubble-size resolution, so the sizes show more contrast. Then I created display rules to show the significance of the earthquakes with different colors and set the bubble transparency to 30% for clarity. I also created an image to explain the colored icons.

Be aware that some manipulation of the original longitude data is also needed. The geographic coordinates use the prime meridian as the reference, so to show the Americas on the right-hand side of the map, we need to add 360 to their (negative) longitude values.
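Here is a minimal sketch of both manipulations as a SAS DATA step, assuming a hypothetical ‘earthquakes’ dataset with ‘magnitude’ and ‘lon_deg’ columns (the names are illustrative, not from the original data):

data earthquakes_prep;
   set earthquakes;
   magnitude_sq = magnitude**2;                  /* square the magnitude for bubble-size contrast */
   if lon_deg < 0 then lon_deg = lon_deg + 360;  /* shift the Americas to the right side of the map */
run;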

Chart 14

My understanding is that one of the key points of Matteson’s visualization is to show the control/interaction feature. Happily, VA has various control objects for interactive analysis. For the upper part of this visualization, I simply used a list table object. The trick is using a display rule to mimic the style: before assigning any data to the list table in VA, I created a display rule with an expression, and at that point the rule can target ‘Any measure value’ in the expression. (Otherwise, you would need to define a display rule for each column with its own expression.) Just define ‘Any measure value’ as missing or greater than some value, with an appropriate fill color for the cell. (VA doesn’t support filling a cell with a pattern as Robert did for missing values, so I used grey for missing values to differentiate them from 0, which gets a light color.)

For the lower part, I created a new dataset of intervention items and put it in a list control and a list table. The horizontal bar chart on the right is a target bar chart with the expected duration as the target value; the label on each bar shows the actual duration.

Chart 15

VA does not have the solid-modeling animation Matteson used in his original chart, but VA does support animated bubble plots in an interactive mode. So I made this visualization from Robert’s animation dataset as an imitation of the famous animation by the late Hans Rosling, in his memory. I set up the animation by creating a date variable from the first day of each year (just for simplicity). One customization: I used the Custom Graph Builder to add a new role so the bubble plot can display data labels, and I set the country name as the bubble label in VA Designer. Certainly, we can always filter to the countries of interest in VA for further analysis.
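As a minimal sketch of that date preparation, assuming a hypothetical ‘population’ dataset with a numeric ‘year’ column (names are illustrative):

data population_anim;
   set population;
   anim_date = mdy(1, 1, year);   /* first day of each year, to drive the animation */
   format anim_date date9.;
run;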

VA can’t show only a subset of the bubble labels as Robert did with SAS code. So, to clearly show the labels of the countries of interest, I ranked the top 20 countries by average population and set a filter to show data from 1950 to 2011. I used a screen-capture tool to save the animation as a .gif file. Be sure to click the chart to see the animation.

Chart 16

I think the point of Matteson’s original chart is to show the overview axis on a line chart, since I don’t see anything else special about it. So I drew this time series plot with the overview axis enabled in VA, using the SASHELP.STOCK dataset. It shows the date on the X axis with tick marks split into months, and it can be zoomed down to the day level interactively in VA. The overview axis supports zooming in and out, as well as moving the focused period.

Chart 17

For this visualization, I used a customized bubble plot (in the Custom Graph Builder, add a Data Label role to the Bubble Plot) so that bubble labels are displayed. I used one reference line labeled ‘Gross Avg.’ and two reference lines for the X and Y axes, which visually creates four quadrants. As usual, I added four text objects to hold the labels at each corner in the precision layout.

Chart 18

Matteson made an impressive 3D chart, and Robert recreated a very beautiful 3D chart in pure SAS code, but VA does not have any 3D charts. So for this visualization, I simply loaded the data into VA and dragged the variables into a visualization in VAE. I chose the best fit from the fit-line list and exported the visualization to a report, then added display rules based on the value of Yield. Since VA shows display rules in the information panel, I created an image of the colored markers to serve as a legend in the visualization and placed it in the precision layout.

There you have it. Matteson’s 18 visualizations recreated in VA.

How did I do?

18 Visualizations created by SAS Visual Analytics was published on SAS Users.