SAS Visual Analytics

11月 142017
 

SAS Visual Analytics 7.4 has added the support for date parameters. Recall from my first post,  Using parameters in SAS Visual Analytics, a parameter is a variable whose value can be changed at any time by the report viewer and referenced by other report objects. These objects can be a calculated item, aggregated measure, filter, rank or display rule. And remember, every time the parameter is changed, the corresponding objects are updated to reflect that change.

Here is my updated table that lists the supported control objects and parameter types for SAS Visual Analytics 7.4. The type of parameter is required to match the type of data that is assigned to the control.
Notice that SAS Visual Analytics 7.4 has also introduced the support for multiple value selection control objects. I’ll address these in another blog.

Using Date Parameters in your SAS Visual Analytics Reports

Let’s look at an example of a SAS Visual Analytics Report using date parameters. In this fictitious report, we have been given the requirements that the user wants to pick two independent date periods for comparison. This is not the same requirement as filtering the report between a start and end date. This report requirement is such that a report user can pick two independent months in the source data to be able to analyze the change in Expense magnitude for different aggregation levels, such as Region, Product Line and Product.

In this example, we will compare two different Month,Year periods. This could easily be two different Quarter,Year or Week,Year periods; depending on the report requirements, these same steps can be applied.

In this high level breakdown, you can see in red I will create two date parameters from data driven drop-down lists. From these parameter values, I will create two calculated data items, shown in purple, and one aggregated measure that will be used in three different report objects, shown in green.

Here are the steps:

1.     Create the date parameters.

2.     Add the control objects to the report and assign roles.

3.     Create the dependent data items, i.e. the calculated data items and aggregated measure.

4.     Add the remaining report objects to the canvas and assign roles.

Step 1: Create the date parameters

First we will need to create the date parameters that will hold the values made by the report viewers. From the Data Pane, use the drop-down menu and select New Parameter….

Then create your first parameter as shown below. Give it a name.

Next, select minimum and maximum values allowed for this parameter. I used the min and max available in my data source, but you could select a more narrow range if you wanted to restrict the users to only have access to portions of the data, just so long as the values are in your data source since, in this example, we will use the data source to populate the available values in the drop-down list.

Then select a current value, this will serve as the default value that will populate when a user first opens the report.

Finally, select the format in which you want your data item to be formatted. I selected the same format as my underlying data item I will be using to populate the drop-down list.

Notice how your new parameters will now be available from your Data Pane.

Step 2: Add the control objects to the report and assign roles

Next, drag and drop the drop-down list control objects onto the report canvas. In this example, we are not using the Report or Section Prompt areas since I do not want to filter the objects in the report or section automatically. Instead, I am using these prompt values and storing them in a parameter. I will then use those values to create new calculated data items and an aggregated measure.

Once your control objects are in the report canvas, then use the Roles Pane to assign the data items to the roles. As you can see from the screenshot, we are using the Date by Month data item to seed the values of the drop-down list by assigning it to the Category role, this data item is in our data source.

Then we are going to assign our newly created parameters, Period1Parameter and Period2Parameter to the Parameter role. This will allow us to use the value selected in our calculations.

Step 3: Create the dependent data items, i.e. the calculated data items and aggregated measure

Now we are free to use our parameters as we like. In this example, I am prompting the report viewer for two values: Period 1 and Period 2 which are the two periods the user would like compared in this report. So, we will need to create two calculated data times from a single column in our source data. Since we want to display these as columns next to each other in a crosstab object and use them for an aggregated measure, this technique can be used.

Calculated Data Item: Period 1 Expenses

From the Data Pane, use the drop-down menu and select New Calculated Item…. Then use the editor to create this expression: If the Date by Month for this data row equals the parameter value selected for Period 1, then return the Expenses; else return 0.

Calculated Data Item: Period 2 Expenses

Repeat this using the Period2Parameter in the expression.

Aggregated Measure: Period Difference

Next, we want to calculate the difference between the two user selected Period Expenses. To do this, we will need to create an aggregated measure which will evaluate based on the report object’s role assignments. In other words, it will be calculated “on-the-fly” based on the visualization.

Similar to the calculated data items, use the Data Pane and from the drop-down menu select New Aggregated Measure…. Use the editor to create this expression. Notice that we are using our newly created calculated data items we defined using the parameter values. This expression does not use the parameter value directly, but indirectly through the calculated data item.

Step 4: Add the remaining report objects to the canvas and assign roles

No that we have:

  • our Control Objects to capture the user input.,
  • the Parameters to store the values.,
  • and the Calculated Data Items and Aggregated Measure created…

we can add our report objects to the canvas and assign our roles.

You can see I used all three new measures in the crosstab object. I used the aggregated measure in the bar chart and treemap but notice the different aggregation levels. There is even a hierarchy assigned to the treemap category role. This Period Difference aggregated measure calculation is done dynamically and will evaluate for each visualization with its unique role assignments, even while navigating up and down the hierarchy.

Here are some additional screenshots of different period selections.

In this first screenshot you can see the parallel period comparison between December 2010 and 2011.

In these next two screenshots, we are looking at the Thanksgiving Black Friday month of November. We are comparing the two years 2010 and 2011 again. Here we see that the Board Product from the Game Product Line is bright blue indicating an increase in magnitude of Expenses in the most recent period, Nov2011.

By double clicking on Board in the treemap, we are taken to the next level of the hierarchy, Product Description, where we see a the largest magnitude of Expenses is coming from Backgammon and Bob Board Games.

In these final two screenshots we are comparing consecutive periods, November 2011 with December 2011. We can see from the bar chart easily the Region and Product Line where there is the greatest increase in Expenses.

I’ve configured a brush interaction between all three visualizations so that when I select the tallest bar it will highlight the corresponding data values in the crosstab and treemap.

Conclusion

Now you can use date parameters in your Visual Analytics Reports. There are several applications of this feature and this is only one way you can use parameters to drive business intelligence. Using this technique to create columns based on a user selected value is great when you need to compare values when your source data isn’t structured in this manner.

Using Date Parameters in your SAS Visual Analytics Reports was published on SAS Users.

11月 022017
 

In my first four posts in the education analytics blog series, we learned how education customers are using SAS, the positive impact for their users and institution, and some of their best practices. In talking to customers for this series, one of the many things we've learned is that they [...]

Education analytics: Use of SAS evolving was published on SAS Voices by Georgia Mariani

10月 172017
 

moving content between SAS Viya environmentsIn a SAS Viya 3.2 environment two types of content can be created: SAS Visual Analytics Reports and Data Plans. For administrators, who may want to manage that content within a folder structure, there are some things to keep in mind. In the current release, both types of content can be moved around in folders, but the objects cannot be copied. In addition, SAS Viya 3.2 supports the promotion of SAS Visual Analytics Reports, but doesn’t support the promotion of Data Plans (support for Plans is coming in SAS Viya 3.3). So, what if I want to copy a report between, say my personal folders, to a production folder?

If you want copy a Report or Data Plan within an environment there is an easy way. When the object is open in edit mode you can do a Save As to save a copy to a different location in the folder structure.

Between environments, Reports can be exported and imported using the SAS Visual Analytics, when you are editing your content (Report or Data Plan) you can access a “diagnostics” window. The diagnostics window will show you the json (or xml) used to render the Report or Plan. To enter the diagnostics window use the keystrokes:

  • ctl+alt+d for SAS Visual Data Builder.
  • ctl+alt+b for SAS Visual Analytics.

In the steps below I will use the diagnostics window to save a Data Plan so that it can be loaded to a different SAS Viya Environment. The steps for a SAS Visual Analytics report are very similar.

In SAS Visual Data Builder when editing your Data Plan select ctl-alt-d to open the SAS Visual Data Builder Diagnostics window. The source tab of the window shows the json that will render the data plan.

Click Save to save the json to a text file and close the dialog. The json file will be saved in the browsers default downloads folder.

Copy the saved text file to a location accessible to the SAS Viya environment where you want to import the plan. In that environment, open Data Builder and click New to open a new Data Plan.

Click ctl-alt-d on the empty data plan and cut and paste the json from your text file replacing the json in the diagnostics window.

Click Parse to check the json.A message should be displayed indicating that the  “plan text was parsed successfully.”  Once you have parsed the text, click Run and the plan is loaded into SAS Visual Data Builder.

In SAS Visual Data Builder, select Save As and save the plan to any location in the folder structure.

The assumption with this approach is that the data is available in the same location in both environments.

You can do much the same with SAS Visual Analytics reports. The key-stroke is ctl-alt-b to open the SAS Visual Analytics Diagnostics window.  You can see the report xml or json on the BIRD tab.

To copy a single report between environments, you can select json and then save the json to a file. In the target environment open a new report, paste the json in the BIRD tab, parse and load and then save the report to a folder. This can be a useful approach if you want to relocate a report to a different location in your target environment. The transfer service currently will only import reports to the same folder location in the target that they are located in the source environment.

I hope you found this tip useful.

A tip for moving content between SAS Viya environments was published on SAS Users.

10月 132017
 

Every year in early October, the eyes of the world turn to Sweden and Norway, where the Nobel Prize winners are announced to the world. The Nobel Prize is considered the world's most prestigious award. Since 1901, the Prize has been presented to individuals and organizations that have made significant achievements in the fields of physics, chemistry, physiology or medicine, world peace and literature in each year (there were several exceptions during war years). In 1968, Sveriges Riksbank established the Sveriges Riksbank Prize in Economic Sciences in memory of Alfred Nobel, founder of the Nobel Prize. Today, individuals or organizations who are awarded Nobel Prizes and the Prize in Economic Sciences are called Nobel Laureates.

So far, more than 900 Nobel Laureates have been awarded. In this post, I wanted to learn a little more about these impressive individuals. Where were these Nobel Laureates from? Why do they get awarded? Is there any common characteristics you’ll find in these Laureates? Below you’ll find a preliminary analysis of Nobel Laureates using SAS Visual Analytics.

The analysis is based on data from List_of_Nobel_laureates, List of Nobel laureates by university affiliation and Nobel Laureates datasets at Kaggle, which definitely has some missing and inconsistent values. I have cleaned the data to correct for some obvious inconsistency as possible for my analysis.

How many Nobel Laureates have their been so far?

Recently, 12 new Nobel Laureates were awarded by the 2017 Nobel Prizes and Prize in Economic Sciences, and that makes 923 Laureates in total since the first Nobel Prize in 1901. Some Laureates share one prize, so we see more shared Laureates total in below table. While we see 27 organization winners of the Peace prize, most Laureates are individual winners.

analysis of the Nobel Laureates

The chart below shows the overall trend of annual total Nobel Laureates is increasing year-over-year, as more and more winners are sharing the Prize. The purple circle on the plot indicates that there are shared winners in that year. The average number of winners is about eight each year. Yet there was only one winner in 1916 for the Literature Prize. The most winners came in 2001, with 15 Laureates sharing the prizes. I also note from the chart that during the First World War, there were very few Nobel Prizes awarded, and during the Second World War, there were none.

Moreover, we know that most Nobel Laureates are awarded one Nobel Prize, yet I learned from childhood that the female scientist Marie Curie received two Nobel Prizes. If you search the datasets for winners awarded more than one Prize, you’ll find four scientists accomplished this feat. They are: Marie Curie, Linus Pauling, John Bardeen and Frederick Sanger.

Do Nobel Laureates live longer?

The answer is YES, per the research by Prof. Andrew Oswald from University of Warwick. Winning a Nobel prize adds about 1.5 years to the lifespan of Nobel Laureates compared to those who were merely nominated. Of course, it is not because of the monetary benefits that come with the Nobel Prize, but because of ‘the deep links between mind and body’, and that ‘happiness’ may make people live longer, which makes sense to me.

Since I don’t have the data of Nobel Prize nominees, let’s only test the lifespan of the Nobel Laureates and the ages they got awarded. The average life age of all Nobel Laureates is nearly 80, much older than the global average life expectancy of 71.4 years-old (according to World Health Organization 2015). Digging a bit more, we see Martin Luther King is the Nobel laureate (Peace, 1964) who died at youngest age. He was assassinated at 39 years old. Laureates who lived longest are Rita Levi-Montalcini (Medicine, 1986) and Ronald H. Coase (Economics, 1991), who both lived to 103 years old. You may also notice that the distribution of the Laureates’ lifespan is left skewed, the Nobel Prize winners certainly live longer than most.

In addition, something more worth noting:

  • The most laureates with the longest lifespans are from the Economics and Medicine categories. The Nobel Prize winning economists live longer than other categories’ winners on average. The average lifespan of these economists is about 86 years-old, five years longer than the second category of Medicine.
  • Economics winners are winning the awards at the highest age – 67 years-old on average. More digging shows that the oldest awarded age is 90 when Leonid Hurwicz (Economics, 2007) was awarded his Prize. We see the average awarded age of Physics winners is 56, which is 10+ years younger than that of the Economics winners. Thus, we get the impression that economists need more time to have outstanding achievements.
  • If we compare the time span between Laurates’ average awarded ages and their lifespan, the Physics Prize winners enjoy the longest life time after winning the award – about 20 years on average.
  • It is also worth noting that the Nobel Peace winners have the largest span of awarded age, about 70 years’ span. That’s because the youngest Nobel Laureates Malala Yousafzai, who got awarded of Nobel Peace Prize at 17 years-old in 2014.

The chart below is created in SAS Visual Analytics and shows the awarded ages of all individual Nobel Laureates in different prize categories. The reference line is the average awarded age of 59. It is very easy to note that no Nobel Prize was awarded during 1940-1943 due to the Second World War.

From which universities have Nobel Laureates graduated?

Next, let’s look at the educational background of Nobel Laureates. The left chart below obviously shows that much more Nobel winners hold Doctorate degrees than those of Bachelor or Master degrees. If we see the chart for Literature and Peace categories on the right, the difference is not that big. From the data, we know that the educational background of Nobel Laureates in Physics, Chemistry, Medicine and Economics categories (I call these four categories the scientific categories for easier description later) has the higher percentage of doctorate than that of winners in the Literature and Peace categories.

To learn more about the universities the Laureates in these scientific categories are graduated from, I ranked the top 10 university affiliations for the scientific categories in below chart, and their distribution among these categories, as well as the countries in which these universities are located.

The top 10 university affiliations were selected basing on the highest degree of the scientific categories’ Laureates obtained. That is, if one winner held a Master degree from Harvard University and a Doctorate degree from University of Cambridge, he/she is counted in University of Cambridge but not in the Harvard University. From the parallel coordinates plot, you may have noticed that the Physics in University of Cambridge and the Medicine in Harvard University are their greatest majors respectively. On the right, it shows the countries where these top 10 university affiliations are in United States, United Kingdom, France and Germany. The bar charts on the left show the percentage of educational degrees (Doctorate, Master, Bachelor) of each in the scientific categories (according to the available dataset). In the bottom chart, top 10 universities are ranked by their percentages. Perhaps now you have a great university in your mind for future education?

Next, I created the chart below to show the top eight countries having the university affiliations that more Nobel Prize winners graduated from. (Here the chart only shows for scientific categories, thus it excludes the Nobel Literature Prize and Peace prize.). An obvious trend we see from the chart is that the United States has the most Laureates spanning in the scientific categories after the Second World War, while Germany has more Laureates in the scientific categories comparatively before World War II.

Why do the Nobel Laureates get awarded?

Per the ‘nobelprize.org’, in his excerpt of the will, Alfred Nobel (1833-1896) dictates that his entire remaining estate should be used to endow "prizes to those who, during the preceding year, shall have conferred the greatest benefit to mankind." So Alfred's interests are reflected in the Prize, which said “The whole of his remaining realizable estate constitutes a fund, and the annually interest shall be divided into five equal parts, which shall be apportioned as follows: one part to the person who shall have made the most important discovery or invention within the field of physics; one part to the person who shall have made the most important chemical discovery or improvement; one part to the person who shall have made the most important discovery within the domain of physiology or medicine; one part to the person who shall have produced in the field of literature the most outstanding work in an ideal direction; and one part to the person who shall have done the most or the best work for fraternity between nations, for the abolition or reduction of standing armies and for the holding and promotion of peace congresses.”

Since it’s not easy to seek evidence in the datasets that Nobel Laureates are awarded by fulfilling Alfred’s will, what I do is to use SAS Visual Analytics text topics analysis performing some preliminary text analysis of the ‘Motivation’ field in the dataset for a validation to some extent. The ‘Motivation’ is given by ‘nobelprize.org’ for why the Laureate gets awarded. The analysis shows that the most frequently mentioned word is ‘discovery’, while the most 5 frequently appeared words include ‘work’, ‘development’, ‘contribution’, and ‘theory’. And from the topics analysis result, the top 10 topics are about ‘discovery’, ‘human”, “structure”, “economic”,” technique”, etc., which are reflecting Alfred Nobel‘s will in establishing the Prize. Moreover, the sentimental analysis result shows that the statements in the ‘Motivation’ field are mainly neutral (being ‘objective’), even though there are few positive and negative sentimental statements.

 

I hope you’ve found this analysis of Nobel Laureates data interesting. I believe there are still many other perspectives you can analyze to get insights. Is there anything interesting you see?

A preliminary analysis of the Nobel Laureates was published on SAS Users.

10月 132017
 

In this education analytics series of blog posts, we have been on a journey to learn how education customers are turning their data into insights to be a more data-informed and analytical organization. So far on our journey, we learned how education customers are using SAS, the positive impact that SAS [...]

Education analytics: Best practices for using data visualization and analytics was published on SAS Voices by Georgia Mariani

10月 092017
 

My new SAS Press book “An Introduction to SAS Visual Analytics” (written in collaboration with Tricia Aanderud and Rob Collum) covers all of the different aspects of SAS® Visual Analytics, including how to develop reports, load data, and handle administration. Below is an example of the types of tips that you can find [...]

The post An introduction to SAS Visual Analytics: the Parallel Period function of the Derived Item calculations appeared first on SAS Learning Post.

9月 082017
 

In SAS Visual Analytics 8.1, report creators have the ability to include drive-distance and drive-time in their geographical maps, but only if their site has an Esri ArcGIS Online account and they have valid credentials for the account.

In the user Settings for SAS Visual Analytics Geographic Mapping 8.1 release, there are three choices for selection of a geographic map provider.  The map provider creates the background map for geo maps and for network diagrams that display a map.

The map provider options are:

  • OpenStreet Map service, hosted at SAS.
  • Esri ArGIS Online Services, which only requirFinale acceptance of the terms and conditions.
  • Esri premium services, which requires a credential validation.

If Esri premium services is selected, there is an additional prompt for valid credentials, and you must still accept the Esri ArcGIS Online Services terms in order to select the premium services checkbox.

It’s also worth pointing out here, that even if you have Esri premium credentials, in order for these credentials to be validated in SAS Visual Analytics, you must also be a member of the ESRI Users custom group.  Users can be added to this group in SAS Environment Manager, as shown below.

Note that without the Esri ‘premium’ service and validated credentials, when you right-click and Create geographic selection in your report map, you are only able to select the Distance selection, which displays the radial distance for the selection point.

With premium services in effect, you can also select drive-time or drive-distance.  An example of a drive-time selection is shown here.  Drive-time creates an irregular selection based on the distance that can be driven in the specified amount of time.

A drive-distance example is shown below.  Drive-distance creates an irregular selection based on the driving distance using roads.

When selecting drive-time or drive distance, you can also add breaks to show, as in the example below, the 5-mile distance, the 10-mile distance, and the 15-mile distance on the maps.

It’s also worth pointing out, that if a viewer of the report has not had Esri premium credentials validated, the viewer will be unable to view the drive-distances and drive-time features.  The settings for users of the report viewer are also stored in the Report Viewer Geographic Mapping user settings.

If a user is adding a connection to the server in SAS Mobile BI 8.15 and their account is a member of the Esri Users group, they will be prompted for their Esri premium credentials when adding the server connection:

I hope you’ve found this post helpful.

How do I access the Premium Esri Map Service for my SAS Visual Analytics reports? was published on SAS Users.

8月 312017
 

In my first post of this blog series, we learned how three education customers are using SAS. Today, we'll hear about the positive impact that SAS and analytics are providing for users and the education institutions. In this post, you'll hear from: Linda Sullivan, Assistant Vice President for Institutional Knowledge Management, [...]

Education analytics: The impact of using SAS and analytics was published on SAS Voices by Georgia Mariani

8月 172017
 

In this blog post I am going to cover the example of importing data into SAS Viya using Cloud Analytic Services (CAS) actions via REST API. For example, you may want to import data into a CASLib via REST API.  This means you can perform an import of data outside of the SAS Self-Service Import user interface environment using REST API.  Once this data is loaded into CAS it is available for use in applications such as SAS Visual Analytics and SAS Visual Data Builder.

Introduction

To import data into SAS Viya via REST API, you need to make a series of REST API calls:

1.     Start CAS Session
2.     Load Data into a CASLib
3.     End CAS Session

I will walk through these various REST API calls in the sections below using the REST API testing application HTTPRequestor, which is a free add-on to the Mozilla Firefox browser.

Before I perform any of my REST API calls, I need to Base-64 encode my credentials. The input for encoding the credentials is: I used the site https://www.base64encode.org/ to encode my credentials.  Note: You can use other methods (e.g., Python) to encode your credentials. Use the preferred method by your organization to ensure you are meeting their security protocols.

Below is the header Authorization information I will be sending with each of my requests.

Authorization Header

1.     Start CAS Session

First, I need to start a CAS Session. Below is an example request for starting a CAS Session:

POST https://<YourCASServer:Port>/cas/sessions

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{}

This request returns the CASSessionUUID needed in the next step.

I construct my request in HTTPRequestor as follows and submit the request:

Start CAS Session Request/Response

Here is a screenshot of the raw transaction information.

Start CAS Session Raw Transaction

I need to copy the CAS Session UUID information that was returned for use in the subsequent REST API calls since their CAS Actions must be performed within a CAS Session.

2.     Load Data into a CASLib

Now that I have started my CAS session and have its UUID, I can load the table to CAS. Below is an example request for the table.loadTable CAS Action:

POST 
https://<YourCASServer:Port>/cas/sessions/<CASSessionUUID>/actions/table.load
Table

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{"casLib":"<InputCASLib>","importOptions":{"fileType":"<FileType>"},"path":"<InputFilePathAndName>",
 "casout":{"caslib":"<OutputCASLib>","name":"<OutputTableName>","promote":true}}

 

This request returns a log message: “NOTE: Cloud Analytic Services made the file <InputFilePathAndName> available as table <OutputTableName> in caslib <OutputCASLib>.”

For my example, I will load the SAS data set BASEBALL located in the helpdata CASLib to the Public CASLib and call the CAS Table SAS_BASEBALL.  I am copying the data to the Public CASLib to make it more readily available to all CAS users. Let’s first confirm that the SAS_BASEBALL table does not currently exist in the Public CASLib.

Public CASLib Before LoadTable CAS Action Called

I construct my request in HTTPRequestor as follows and submit the request:

Load Table Request/Response

Here is a screenshot of the raw transaction information.

Load Table Raw Transaction

Next, I will confirm that the SAS_BASEBALL data set is now loaded in the Public CASLib.

Public CASLib After LoadTable CAS Action Called

The SAS_BASEBALL data set is now available for use in applications such as SAS Visual Analytics and SAS Visual Data Builder.

3.     End CAS Session

Finally, I need to terminate my CAS Session. Below is an example request for the session.endSession CAS Action:

POST https://&lt;YourCASServer:Port&gt;/cas/sessions/&lt;CASSessionUUID&gt;/actions/session.endSession

Authorization: Basic &lt;Base-64EncodedCredentials&gt;
 Content-Type: application/json

{}

 

This request returns a status of 0 indicating there was no error and the CASSessionUUID specified in the request has ended.

I construct my request in HTTPRequestor as follows and submit the request:

End CAS Session Request/Response

Here is a screenshot of the raw transaction information.

End CAS Session Raw Transaction

Conclusion

These calls can be strung together so you could schedule their execution. For more information on SAS Viya and REST APIs, refer to the following documentation the SAS Cloud Analytics REST API documentation.

Load Data into SAS Viya via REST API was published on SAS Users.

8月 152017
 

CAS data modelingThe CAS physical data model, i.e.what features CAS offers for data storage, and how to use them to maximize performance in CAS (and consequently SAS Visual Analytics 8.1 too).

So, specifically let’s answer the question:

What CAS physical table storage features can we use to get better performance in CAS and SAS Visual Analytics/CAS?

CAS Physical Table Storage Features

The following data storage features affect how CAS tables are physically structured:

  • Compression
  • Partitioning
  • Sorting
  • Repeated Tables
  • Extended Data Types (Varchar)
  • User Defined Formats

Compression — the Storage Option that Degrades Performance

data public.MegaCorp (compress=yes);
   set baselib.MegaCorp;
run;

Partitioning and Sorting

Partitioning is a powerful tool for improving Bar Charts, Decision Tree, Linear Regression) provide grouping as well as classification functionality.

When performing analyses/processing, CAS first groups the data into the required BY-groups. Pre-partitioning on commonly-used BY-groups means CAS can skip this step, vastly improving performance.

Within partitions, tables can be sorted by non-partition-key variables. Pre-sorting by natural ordering variables (e.g. time) allows CAS to skip the ordering step in many cases just like partitioning allows CAS to skip the grouping step.

For a full use-case, consider a line graph that groups sales by region and plots by date. This graph object would benefit greatly from a CAS table that is pre-partitioned by region and pre-sorted by date.

Join Optimization

Partitioning can also support join operations since both the CAS FedSQL Merge Join algorithm utilize BY-GROUP operations to support their processing.

Pre-partitioning tables in anticipation of joins will greatly improve join performance. A good use case is partitioning both a large transaction table and an equally large reference table (e.g. an enormous Customer table) by the common field, customerID. When a DATA Step MERGE or a FedSQL join is performed between the two tables on that field, the join/merge will take advantage of partitioning for the BY-GROUP operation resulting in something similar to a partition-wise join.

Like Compression, partitioning and sorting can be implemented via CAS actions as well as data set options. Using the data set options is demonstrated below:

data mycas.bigOrderTable (partition=(region division) orderby=(year quarter month));
   set CASorBase.bigOrderTable;
run;

Repeated Tables

By default, in distributed CAS Server deployments, CAS divides incoming tables into blocks and distributes those blocks among its DUPLICATE data set option or the Repeated Tables have two main use-cases in CAS:

1.     Join Optimization
2.     Small Table Operation Optimization

Join Optimization

For join operations, the default data distribution scheme can result in significant network traffic as matching records from the two tables travel between worker nodes to meet. If one of the two tables was created with the DUPLICATE/REPEAT option, then every possible record from that table is available on every node to the other table. There is no need for any network traffic.

Small Table Operation Optimization

For small tables, even single table operations can perform better with repeated instead of divided distribution. LASR actually implemented the “High Volume Access to Smaller Tables” feature for the same reason. When a table is repeated, CAS runs any required operation on a single worker node against the full copy of the table that resides there, instead of distributing the work.

As stated, repeated tables can be implemented with the DUPLICATE data set option, it can also be implemented with the REPEAT option on the PROC CASUTIL LOAD statement. The CASUTIL method is shown below:

proc casutil ;
   load data=sashelp.prdsale outcaslib=”caspath”
           casout=”prdsale” replace REPEAT ;
quit ;

Extended Data Types (VARCHAR)

With Viya 3.2 comes SAS’ first widespread implementation of variable length character fields. While Base SAS offers variable length character fields through compression, Viya 3.2 is the first major SAS release to include a save storage space, it also improves performance by reducing the size of the record being processed. CAS, like any other processing engine, will process narrower records more quickly than wide records.

User Defined Formats

User defined formats (UDFs) exist in CAS in much the same way they do in Base SAS. Their primary function, of course, is to provide display formatting for raw data values. Think about a format for direction. The raw data might be: “E”, “W”, “N”, “S” while the corresponding format values might be “East”, “West”, “North”, “South.”

So how might user defined formats improve performance in CAS? The same way they do in Base SAS, and the same way that VARCHAR does, by reducing the size of the record that CAS has to process. Imagine replacing multiple 200 byte description fields with 1 byte codes. If you had 10 such fields, the record length would decrease 1990 bytes ((10 X 200) – 10). This is an extreme example but it illustrates the point: User defined formats can reduce the amount of data that CAS has to process and, consequently, will lead to performance gains.

CAS data modeling for performance was published on SAS Users.