SAS Viya

October 5, 2018

In my earlier blog, I described how to create maps in SAS Visual Analytics 8.2 if you have an ESRI shapefile with granular geographies, such as counties, that you wish to combine into regions. Since posting that blog in January 2018, I have received a lot of questions from users on a range of mapping topics, so I thought a more general post on using (and troubleshooting) custom polygons in SAS Visual Analytics on Viya was in order. Since version 8.3 is now generally available, this post is tailored to the 8.3 version of SAS Visual Analytics, but the custom polygon functionality hasn't really changed between the 8.2 and 8.3 releases.

What are custom polygons?

Custom polygons are geographic boundaries that enable you to visualize data as shaded areas on the map; the result is sometimes referred to as a choropleth map. For example, suppose you work for a non-profit organization which is trying to decide where to put a new senior center. So you create a map that shows the population of people over 65 years of age by US census tract. The darker polygons suggest a larger number of seniors, and thus a potentially better location to build a senior center:

SAS Visual Analytics 8.3 includes a few predefined polygonal shapes, including countries and states/provinces. But if you need something more granular, you can upload your own polygonal shapes.

How do I create my own polygonal shapes?

To create a polygonal map, you need two components:

  1. A dataset with a measure variable and a region ID variable. For example, you may have population as a measure, and census tract ID as a region ID. A simple frequency can be used as a measure, too.
  2. A “polygon provider” dataset, which contains the same region ID as above, plus geographic coordinates of each vertex in each polygon, a segment ID and a sequence number.

So where do I get this mysterious polygon provider? Typically, you will need to search for a shapefile that contains the polygons you need, and do a little bit of data preparation. The shapefile format is a geographic data format from ESRI. When you download a shapefile and look at it on the file system, you will see that it contains several files. For example, my 2010 Census Tract shapefile includes all these components (typically a .shp file with the polygon geometry, a .shx index, a .dbf attribute table, and a .prj projection file):

Sometimes you may see other components present as well. Make sure to keep all components together.

To prepare this data for SAS Visual Analytics, you have two options.

Preparing shapefile for SAS Visual Analytics: The long way

One method to prepare the polygon provider is to run PROC MAPIMPORT to convert the shapefile into a SAS dataset, add a sequence ID field, and then load the result into the Cloud Analytic Services (CAS) server in SAS Viya. The sequence ID is mandatory, as it helps SAS Visual Analytics draw the lines connecting the vertices in the correct order.

A colleague recently reached out for help with a map of Census block groups for Chatham County in North Carolina. Let’s look at his example:

The shapefile was downloaded from here. We then ran the following code on my desktop:

libname geo 'C:\...\Data';

proc mapimport datafile="C:\...\Data\Chatham_County__2010_Census_Block_Groups.shp"
   out=work.chatham_cbg;
run;

data geo.chatham_cbg;
   set chatham_cbg;
   seqno=_n_;
run;

We then manually loaded the geo.chatham_cbg dataset in CAS using self-service import in SAS Visual Analytics.

Preparing shapefile for SAS Visual Analytics: The short way

Instead of doing these steps yourself, you can use the %shpimprt macro. The macro will automatically run PROC MAPIMPORT, create a sequence ID variable and load the table into CAS. Here's an example:

%shpimprt(shapefilepath=/path/Chatham_County__2010_Census_Block_Groups.shp, id=GEOID, outtable=Chatham_CBG, cashost=my_viya_host.com, casport=5570, caslib='Public');

For this macro to work, the shapefile must be copied to a location that your SAS Viya server can access, and the code needs to be executed in an environment that has SAS Viya installed. So, it wouldn’t work if I tried to run it on my desktop, which only has SAS 9.4 installed. But it works beautifully if I run it in SAS Studio on my SAS Viya machine.

Configuring the polygon provider

The next step is to configure the polygon provider inside your report. I provided a detailed description of this in my earlier blog, so here I’ll just summarize the steps:

  • Add your data to the SAS Visual Analytics report, locate the region ID variable, right-click and select New Geography
  • Give it a name and select Custom Polygonal Shapes as geography type
  • Click on the Custom Polygon Provider box and select Define New Polygon Provider
  • Configure your polygon provider by selecting the library, table and ID column. The values in your ID column must match the values of the region ID variable in the dataset you are visualizing. The ID column, however, does not need to have the same name as in the visualization dataset.
  • If necessary, configure advanced options of the polygon provider (more on that in the troubleshooting section of this blog).

If all goes well, you should see a preview of your polygons and a percentage of regions mapped. Click OK to save your geographic item, and feel free to use it in the Geo Map object.

I followed your instructions, but the map is not working. What am I missing?

I observed a few common troubleshooting issues with custom maps, and all are fairly easy to fix. The table below summarizes symptoms and solutions.

Symptom: In the Geographic Item preview, 0% of the regions are mapped.
Solution: Check that the values of the region ID variable match between the main dataset and the polygon provider dataset.

Symptom: I successfully created the map, but the colors of the polygons all look the same. I know I have a range of values, but the map doesn't convey the differences.
Solution: In your main dataset, you probably have missing region ID values or region IDs that don't exist in the polygon provider dataset. Add a filter to your Geo Map object to exclude region IDs that can't be mapped.

Symptom: Only a subset of regions is rendered.
Solution: You may have too many points (vertices) in your polygon provider dataset. SAS Visual Analytics can render up to 250,000 points. If you have a large number of polygons represented in a detailed way, you can easily exceed this limit. You have two options, which you can mix and match:

(1) Filter the map to show fewer polygons.

(2) Reduce the level of detail in the polygon provider dataset using PROC GREDUCE, as in the sketch below. Also, if you imported data using the %shpimprt macro, it has an option to reduce the dataset.
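For option (2), here is a minimal sketch of thinning out the Chatham County polygon provider with PROC GREDUCE; the output dataset name and the DENSITY threshold of 2 are assumptions you would tune for your own map:

/* Assign a DENSITY value (0-5) to each vertex; lower values mark more important points */
proc greduce data=geo.chatham_cbg out=work.chatham_red;
   id geoid;
run;

/* Keep only the coarser-detail vertices, then rebuild the sequence ID */
data geo.chatham_cbg_reduced;
   set work.chatham_red;
   where density <= 2;   /* assumed threshold */
   seqno = _n_;
run;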

Symptom: In the Geographic Item preview, the note shows that 100% of the regions are mapped, but the regions don't render, or the regions are rendered in the wrong location (e.g., in the middle of the ocean) and/or at an incorrect scale.
Solution: This is probably the trickiest issue, and the most likely culprit is an incorrectly specified coordinate space code (EPSG code). The EPSG code corresponds to the type of projection applied to the latitude and longitude in the polygon provider dataset (and the originating shapefile). Projection is a method of displaying points from a sphere (the Earth) on a two-dimensional plane (a flat surface). See this tutorial if you want to know more about projections.

There are several projection types and numerous flavors of each type. The default EPSG code used in SAS Visual Analytics is EPSG:4326, which corresponds to the unprojected coordinate system.  If you open advanced properties of your polygon provider, you can see the current EPSG code:

Finding the correct EPSG code can be tricky, as not all shapefiles have consistent and reliable metadata built in. Here are a couple of things you can try:

(1)    Open your shapefile as a layer in a mapping application such as ArcMap (licensed by ESRI) or QGIS (open source) and view the properties of the layer. In many cases the EPSG code will appear in the properties.

(2)    Go to the location of your shapefile and open the .prj file in Notepad. It will show the projection information for your shapefile, although it may look a bit cryptic. Take note of the unit of measure (e.g., feet), datum (e.g., NAD 83) and projection type (e.g., Lambert Conformal Conic). Then, go to https://epsg.io/ and search for your geography.  Going back to the example for Chatham county, I searched for North Carolina. If more than one code is listed, select a few codes that seem to match your .prj information the best, then go back to SAS Visual Analytics and change the polygon provider Coordinate Space property. You may have to try a few codes before you find the one that works best.

Symptom: I ruled out a projection issue, and the note in the Geographic Item preview shows that 100% of the regions are mapped, but the regions still don't render.
Solution: Take a look at your polygon provider preparation code and double-check that the order of observations didn't accidentally get changed. The order of records may change, for example, if you use a PROC SQL join when you prepare the dataset. If you accidentally changed the order of the records prior to assigning the sequence ID, it can result in an illogical order of points which SAS Visual Analytics will have trouble rendering. Remember, the sequence ID is needed so that SAS Visual Analytics can render the outlines of each polygon correctly.

You can validate the order of records by mapping the polygon provider using PROC GMAP, for example:

proc gmap map=geo.chatham_cbg data=geo.chatham_cbg;
   id geoid;
   choro geoid / nolegend levels=1;
run;

For example, in image #1 below, the records are ordered correctly. In image #2, the order of records is clearly wrong, hence the crisscrossing lines.


As you can see, custom regional maps in SAS Visual Analytics 8.3 are pretty straightforward to implement. The few "gotchas" I described will help you troubleshoot some of the common issues you may encounter.

P.S. I would like to thank Falko Schulz for his help in reviewing this blog.

Troubleshooting custom polygon maps in SAS Visual Analytics 8.3 was published on SAS Users.

September 13, 2018

With the release of SAS Viya 3.4, you can easily build large-scale machine learning models and seamlessly publish and run models to Hadoop, or other external databases such as Teradata, without the data ever leaving the Hadoop environment. In this process, SAS Viya:

1) Converts the model into MapReduce Code.

2) Executes the MapReduce code.

3) Returns a new, scored dataset in Hadoop.

SAS Viya is a new, distributed in-memory product that allows users to easily build predictive models at scale. Using the SAS Model Studio interface, I can build complex models without the need to write large amounts of underlying code.

For this blog post, I'll go through the steps to build my model using a telecommunications dataset to predict customer churn. Under the “Data” tab, I can see all of my variables, assign the proper roles, and view the dataset.

With the data prepared, I build a pipeline to perform data preprocessing steps such as imputation and binning and build several predictive models, including Regression, Neural Networks and Gradient Boosting. Pipelines are powerful because they automate the heavy lifting of the model building process, allowing you to solve problems faster. In addition, pipelines are re-usable across different users and datasets, allowing the adoption of best practices across an organization.

After building the models, I combine the models into one ensemble model with ease, and compare their performance on the validation sample. I determine that the gradient boosting model is the most accurate based on the misclassification rate. You can pick from a large number of accuracy criteria, including KS Statistic, AUC, MCR or F1.

After identifying the best model, I publish the model to Hadoop. This allows me to perform future scoring at the data source, meaning data does not have to leave Hadoop. I could have configured the system to publish the model directly from SAS Model Studio; however, I publish and score the model via SAS code for maximum flexibility. With SAS Studio, I can easily control, and change, where I write my resulting models and datasets in Hadoop.

In the “Compare Models” tab, I then download the score code, which provides me with the following:

  1. A .sas file containing DS2 code that performs all the data preprocessing steps, such as binning and imputation, from the pipeline above. I load this .sas file into a location that can be viewed in SAS Studio.
  2. A .sashdat file in the “Models” caslib that is a binary representation of our model, called an ASTORE, used to score our model in Hadoop.

Opening the dm_epscore.sas file in SAS Studio, the comments at the top tell me which ASTORE file is needed to publish the model.

This scoring file allows the data preparation within the pipeline above to be published to Hadoop as well. In this case, the file is binning the variables before building the Gradient Boosting.

The scoring file then invokes the ASTORE file needed to score the model in Hadoop.

Now, I switch to SAS Studio to publish and score my model in Hadoop.  The full code can be found here.

Below is the syntax to publish the model. I'm sure to set the classpath variable to the appropriate jar and config files for my Hadoop cluster. Note that you will need permission to read and write from the modeldir directory. Publishing the model converts the .sas scoring code and the ASTORE file into MapReduce code for execution in the cluster.
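The screenshots of the publishing code aren't reproduced here, but below is a hypothetical sketch of what it can look like with PROC SCOREACCEL. The session name, model name, file paths, modeldir, and classpath values are all placeholders for your environment; confirm the exact options against the SAS In-Database documentation:

cas mysess;

proc scoreaccel sessref=mysess;
   publishmodel
      target=hadoop
      modelname="telco_churn"
      modeltype=DS2
      programfile="/local/path/dm_epscore.sas"      /* score code from Model Studio */
      storefiles="/local/path/dm_epscore.sashdat"   /* the ASTORE binary */
      modeldir="/user/ankram"                       /* HDFS directory you can read and write */
      classpath="/opt/hadoop/jars:/opt/hadoop/conf" /* jar and config files for the cluster */
      ;
run;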


This publishing code will create a directory called “telco_churn” in my home directory in HDFS, /user/ankram. In a SAS Viya environment co-located with Hadoop, the “CASUSERHDFS” Caslib is by default pointed to this location, allowing me to ensure the “telco_churn” file was successfully published.


The next step is to score the model in Hadoop. The code below scores the “looking_glass_v4” table in Hive and creates a new table called “looking_glass_v4_scored”, without the data ever leaving Hadoop.
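As a matching hypothetical sketch (the table, schema, and model names are placeholders), the scoring step with the RUNMODEL statement could look like this:

proc scoreaccel sessref=mysess;
   runmodel
      target=hadoop
      modelname="telco_churn"
      modeldir="/user/ankram"                /* where the model was published */
      intable="looking_glass_v4"             /* Hive input table */
      outtable="looking_glass_v4_scored"     /* scored output stays in Hadoop */
      schema="default"                       /* Hive schema */
      ;
run;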


If everything is configured properly, the log should show that the SAS Embedded Process executed correctly.

Using a previously created caslib called “Hivelib” that points to the default schema in the Hive Server, I can now load the “Looking_Glass_v4_scored” dataset into CAS to view the table.
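That load step can be as simple as the following PROC CASUTIL sketch (the caslib names are assumptions based on the description above):

proc casutil;
   load casdata="looking_glass_v4_scored" incaslib="hivelib"
        casout="looking_glass_v4_scored" outcaslib="casuser";
run;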


Using the Table Viewer, I can then see the predicted probabilities of churn for each individual.

To conclude, many organizations have very large datasets, oftentimes terabytes or larger, and often find that minimizing data movement is critical to successfully putting models into production. The in-database technologies for Hadoop on SAS Viya allow you and your fellow data scientists to easily prepare data and score large-scale models entirely in Hadoop, with the data never leaving the environment. You can now focus on solving more problems and are no longer at the mercy of large datasets and network latency.

Publishing and running models to Hadoop in SAS Viya was published on SAS Users.

September 5, 2018

Typically, when filters are applied in SAS Visual Analytics they affect all the records and aggregations in linked objects. For example, in the typical sales report below, when filters are applied, they change all the measures of the linked objects.

With this kind of filtering, it becomes difficult to calculate measures which require a different level of aggregation. In the above image, the expectation is that ‘Total Customers’ should not change irrespective of the ‘Region’, ‘State’, ‘Category’ and ‘Subcategory’ control selections. ‘Total Customers (Geo)’ should change only based on the ‘Region’ and ‘State’ control selections. ‘Total Customers (Geo and Prod)’ should change based on all the controls mentioned above. In the above example, only the ‘Total Customers (Geo and Prod)’ calculation is correct.

We will learn to create measures with different levels of aggregation by using ‘Customer Penetration’ measure as an example.

Customer Penetration = Distinct customers at selected geography and product level / Distinct customers at selected geography level

Selective filtering may be used for creating similar reports, such as Dealer Participation, Sales Contribution, etc. The section below walks through the creation of a customer penetration report with selective filtering.

Customer penetration using SAS Visual Analytics 8.2 (selective filtering)

Customer penetration is used to analyze whether marketing and sales strategies are working or not. Managers often use customer penetration or dealer participation measures, along with other measures, to gauge the popularity of a product, category or brand.

This report requirement is such that the numerator in the ‘Customer Penetration’ formula should be filtered based on the region and state list control selections, while the denominator should be filtered based on the region, state, category and subcategory list control selections. This is not the same as filtering the whole table through common list controls. In general, if you link a table with a control, all the measures in that table will be filtered according to the selected value(s) in the control. However, our requirement is different, so instead of linking controls and tables we will use control parameters to achieve our objective.

Assume we have a customer transaction table with the following variables:

Before we move on, be ready with the basic report as shown in the image below:

 
Once you are ready with the report as per the above image, create parameters for ‘Region’, ‘State’, ‘Category’, ‘SubCategory’:

Region Parameter



State Parameter



Category Parameter



SubCategory Parameter


Now create the following two calculated items derived from ‘Customer_ID’:

Geo_Customer_ID
Equivalent to ‘Customer_ID’, but populated only for the selected geography levels; the rest are filled with missing values.


Geo_and_Prod_Customer_ID
Equivalent to ‘Customer_ID’, but populated only for the selected geography and product levels; the rest are filled with missing values.
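The expression screenshots aren't reproduced here. As a purely hypothetical sketch of what these two calculated items can look like in the text expression editor (assuming the four parameters above are named pRegion, pState, pCategory and pSubCategory, and that each list control allows a single selection), the logic is roughly:

Geo_Customer_ID:

if ( 'Region'n = 'pRegion'p ) and ( 'State'n = 'pState'p )
return 'Customer_ID'n
else ''

Geo_and_Prod_Customer_ID:

if ( 'Region'n = 'pRegion'p ) and ( 'State'n = 'pState'p )
and ( 'Category'n = 'pCategory'p ) and ( 'SubCategory'n = 'pSubCategory'p )
return 'Customer_ID'n
else ''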


Create the following two aggregated measures:

Total Customers (Geo)
You need to subtract the distinct count related to missing ‘Geo_Customer_ID’, which is 1.


Total Customers (Geo and Prod)
You need to subtract the distinct count related to missing ‘Geo_and_Prod_Customer_ID’, which is 1.


Now you can create an aggregated measure ‘Customer Penetration’.

Customer Penetration = Total Customers (Geo and Prod) / Total Customers (Geo)


Final report will look like this:


Comparative images with default and selective filtering implementation:


If you compare the above images, you will find differences in the highlighted measures: in the first image the aggregation levels are based on selective filtering, while in the second image the aggregation level is uniform.

Note – ‘Total Customers’ is the count of distinct ‘Customer_ID’ values; i.e., the total customer count is independent of the geography and product hierarchy selections.

Conclusion

This process allows you to use control parameters in ‘If Then Else…’ statements to create a variable (calculated item) having character values. You can utilize this feature in several other applications – this is just one way you can use parameters to fulfil a business requirement.

Selective filtering in SAS Visual Analytics 8.2 was published on SAS Users.

August 30, 2018

A combination of SAS Grid Manager and SAS Viya can change the game for IT leaders looking to take on peak computing demands without sacrificing reliability or driving higher costs. Maybe that’s why we fielded so many questions about SAS Grid Manager and SAS Viya in our recent webinar about how the two can work together to process massive volumes of data – fast.

Participants asked us so many great questions that we wanted to share the answers here, assuming that you may have the same questions.  This is the first of two blog posts focusing on some of the very best questions we received.  Stay tuned for more soon – and if you don’t see your own burning questions posed here, just post your question in the comments and we’ll respond.

1. Do SAS Grid Manager and SAS Viya need to be collocated in the same data center?

And if they’re in different data centers, can SAS Grid Manager and SAS Viya communicate with one another? Communication shouldn’t be a functional concern: if there’s network connectivity between the two data centers, they can communicate. Just be mindful of a few things:

  • The size of data being processed in each environment.
  • How much data is going back and forth between the two.
  • The impact all that data movement can have on response times and overall performance.

In addition to performance, there's a greater sensitivity around data handling in physically separated deployments. The emergence of stricter data protection regulations increases the complexity of compliance when moving data between locations with different legal jurisdictions. It will be important to consider the additional performance implications of encryption of the transferred data as well. Ultimately, having compute as close as possible to the data it needs results in less complexity and better performance.

In this case it is important to remember the old saying “just because you can doesn’t mean that you should.” When SAS Grid Manager and SAS Viya are collocated, they can share the same data. I need to be clear that there are implications of sharing the same data, for example SAS data sets: the data cannot be opened by both a process running on the grid and by the SAS Viya analytics server at the same time. If your business processes can accommodate this requirement, then sharing the same physical copies of data in storage may save your organization money as well as ease compliance efforts. I also cannot sufficiently stress the need to complete a proof of concept with production data volumes and job complexity to compare the performance of hosting SAS Grid Manager and SAS Viya in the same data center against having them in geographically separated data centers.

2. Should SAS Viya and SAS Grid Manager 9.4M5 run on the same OS for integration purposes?

From an integration perspective, they can definitely run on different operating systems. In fact, as of SAS 9.4M5, you can access a CAS (Cloud Analytic Services) server from Solaris, AIX, or 64-bit Windows. The CAS server itself will run on Linux – and soon we’ll roll out SAS Viya for Windows server.

Short version: Crossing operating systems does not present a functional problem.  Those of you who have used SAS in heterogeneous environments know that there are performance implications when processing data that is not native to the running session.  You should carefully consider the performance implications of deploying in a heterogeneous topology before committing to a mixed environment.

3. How do SAS Viya and SAS Grid Manager compare in terms of complexity – particularly in the context of platform administration?

They’re actually very comparable. Some details vary, but in many cases the underlying concepts and effort required are similar – especially when you reach the level of multi-node administration, keeping multiple hosts patched, and those sorts of issues.

4. SAS Grid Manager requires high-performance storage. Do we need to have that same level of storage (such as IBM General Parallel File System) for SAS Viya? 

No – SAS Viya relies on Cloud Analytic Services (CAS), so it doesn’t have the same storage requirements as SAS Grid Manager. It’s more like what you’d find in a Hadoop environment – the CAS reference architecture is mainly a collection of nodes with local storage. CAS performs memory-mapping to disk, so jobs that need resources larger than the total RAM can use disk cache to continue running. SAS Viya can ingest data serially or in parallel, so for customers that have the ability to use cost-efficient distributed file systems, movement of data into CAS can be done in parallel.

Note, the shared file system that is part of an existing SAS Grid Manager environment could be further leveraged as a means to share data between the SAS Grid and SAS Viya environments.

SAS’ recommended IO throughput figures for SAS Grid Manager deployments are based upon years of experience with customers who have been unsatisfied with the performance of their chosen storage. The resulting best practice is one that minimizes performance complaints and allows customers to process very large data in the timeliest manner. If SAS Viya is deployed with a multi-node analytics server (MPP mode), then a shared file system is required. The SAS Viya Cloud Analytic Services (CAS) server has been designed with Network File System (NFS) in mind.

Our customers get the best value from environments built with a blend of storage solutions, including both shared files systems for job/user/application concurrency as well as less expensive distributed storage for workloads that may not require concurrency like large machine learning and AI training problems. The latter is where SAS Viya shines.

These were all great questions that we thought deserved more detail than we could offer in a webinar – and there are more!  Soon we’ll post a second set of questions that you can use to inform your work with SAS Grid Manager and SAS Viya.  In the meantime, feel free to post any further questions in the comment section of this post.  We’ll answer them quickly.

4 FAQs about SAS Grid Manager and SAS Viya was published on SAS Users.

August 15, 2018

SAS Viya 3.4 has some new functionality that provides real help for those who want to transition from SAS Visual Analytics on 9.4 to SAS Viya. In prior releases of SAS Viya you could promote reports and explorations (and a few other supporting objects). In SAS Viya 3.4, promotion support is added for many additional SAS 9.4 resources, making it easier to make the leap to SAS Viya. In this blog, I will review this new functionality.

In SAS Viya 3.4, the following objects participate in promotion from SAS 9.4.

  • Configuration
    • Identities
    • Authorization
    • Data definitions
  • Content
    • Folders
    • Reports
    • Explorations
    • Stored processes
    • Supporting resources (such as themes, images, graphs templates)

The details of support for each resource are unique and are discussed below.

Identities

User and group promotion from SAS 9.4 to SAS Viya is used to support transitioning the authorization settings that are associated with content to the target environment. Metadata is exported to support the mapping of SAS 9.4 identity metadata (Users and Groups) to SAS Viya identities (Users, Groups and Custom groups).

During promotion of identity metadata:

  • User connections are mapped using the metadata DefaultAuth:logonid to the SAS Viya identity ID
  • Metadata-only groups from SAS 9.4 are converted to SAS Viya Custom groups (except SAS General Servers and SAS System Services)
  • If custom groups of the same name (or sometimes the same purpose but a different name) exist in the target, the group is preserved and any mapped members from the source system are added to the group.

Authorization

Identities are “promoted” to support re-implementation of authorization. You do not have to explicitly export authorization, as it is included with libraries, tables, folders and reports when they are exported. Promotion of authorization is optional: if you don’t wish to include authorization, but rather re-implement it in SAS Viya, you can switch this functionality off at import time.

SAS Viya has two authorization systems, the general authorization system for folders and content, and the CAS authorization system for data. These authorization systems are different than the metadata authorization model in SAS 9.4. So what happens when you promote content that includes authorization?

General Authorization (folders and content)

Promotion will attempt to convert SAS 9.4 authorization to rules in the General authorization system.  During the process:

  • Explicit Access Control Entries are converted to SAS Viya Rules
  • Access Control Entries with denials are discarded
  • Access Control Templates are not promoted

In addition, if an object (folder/report):

  • does not exist in the target environment, relevant authorization is set for the object and the access control entries from the source are implemented as rules on the object.
  • exists in the target environment, then access control entries from the source are merged with any pre-existing authorizations in the target environment.

CAS Authorization

The CAS authorization system covers CASlibs and data.  Promotion will attempt to convert SAS 9.4 authorization on libraries and tables to access controls in the CAS authorization system. During the process:

  • Access Control Entries are not promoted unless they are applied directly to a library or table.
  • Access Control Entries are converted to CAS access controls.
  • Row-level permissions are preserved.
  • If an object exists in the target environment no authorization settings are imported.
  • Access Control Templates are not promoted.

For details of how individual permissions for both data and content are mapped from SAS 9.4 to SAS Viya, see the documentation, which has great coverage of the steps to follow.

The Process

To finish off, I'll share a few observations on the process of exporting from SAS 9.4 and importing in SAS Viya. Like SAS 9.4 promotion, you need to import in a specific order. This allows the software to make the relevant connections to dependent resources. For example, if the CASLIB already exists in the target, then imported tables can be mapped to it. Typically, the order is: identities > library definitions > tables > reports and folders. To support this process, make sure, during export, that you have a separate package for each resource type. Here are some considerations for the export process.

You should export:

  • Identities (users and groups) from the security folders in SAS 9.4 metadata to a separate package.
  • Only groups that you need in the target environment (you can subset any irrelevant SAS 9 groups at export time).
  • LASR and Base Libraries and tables directly from the library definition in the folder tree (this prevents extraneous folders being created in the target environment).
  • Libraries in a separate package from tables so that they may be imported first and be available for mapping when the tables are imported.
  • Content and reports from the base of the folder tree so that all directly applied access control entries will be included in the package.

Prior to importing, make sure that users and groups are configured correctly in LDAP. As I already mentioned, physical data is not promoted so ensure that required data and formats are accessible to the SAS Viya environment.

The new functionality for promotion is a great start in helping with the transition from SAS 9.4 to SAS Viya. Look for more functionality in future releases.

New functionality for transitioning from SAS Visual Analytics on 9.4 to SAS Viya was published on SAS Users.

August 13, 2018

Data in the cloud makes it easily accessible, and can help businesses run more smoothly. SAS Viya runs its calculations on Cloud Analytics Service (CAS). David Shannon of Amadeus Software spoke at SAS Global Forum 2018 and presented his paper, Come On, Baby, Light my SAS Viya: Programming for CAS. (In addition to being an avid SAS user and partner, David must be an avid Doors fan.) This article summarizes David's overview of how to run SAS programs in SAS Viya and how to use CAS sessions and libraries.

If you're using SAS Viya, you're going to need to know the basics of CAS to be able to perform calculations and use SAS Viya to its potential. SAS 9 programs are compatible with SAS Viya, and will run as-is through the CAS engine.

Using CAS sessions and libraries

Use a CAS statement to kick off a session, then use CAS libraries (caslibs) to store data and resources. To start the session, simply submit "cas;". Each CAS session is given its own unique identifier (UUID) that you can use to reconnect to the session.


There are a few significant commands that can help you master CAS operations. Consider these examples, based on a CAS session that David labeled "speedyanalytics":

  • What CAS sessions do I have running?
    cas _all_ list;
  • Get the version and license specifics from the CAS server hosting my session:
    cas speedyanalytics listabout;
  • I want to sign out of SAS Studio for now, so I will disconnect from my CAS session, but return to it later…
    cas speedyanalytics disconnect;
  • ...later in the same or a different SAS Studio session, I want to reconnect to the CAS session I started earlier using the UUID I previously grabbed from the macro variable or SAS log:
    cas uuid="&speedyanalytics_uuid";
  • At the end of my program(s), shutdown all my CAS sessions to release resources on the server:
    cas _all_ terminate;

Using CAS libraries

CAS libraries (caslib) are the method to access data that is being stored in memory, as well as the related metadata.

From the library, you can load data into CAS tables in a few different ways (the first two are sketched below):

  1. A DATA step can take a sample data set, calculate a new measure, and store the output in memory
  2. PROC COPY can bring existing SAS data into a caslib
  3. PROC CASUTIL loads tables into caslibs
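As a minimal sketch of the first two approaches (assuming an active CAS session; the libref mycas and the table names are placeholders):

cas mysession;
libname mycas cas;

/* 1. DATA step: calculate a new measure and store the output in memory */
data mycas.class_bmi;
   set sashelp.class;
   bmi = 703 * weight / (height * height);  /* new measure */
run;

/* 2. PROC COPY: bring an existing SAS data set into a caslib */
proc copy in=sashelp out=mycas;
   select cars;
run;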

PROC CASUTIL allows you to save your tables (named "classsi" in David's examples) for future use through the SAVE statement:

proc casutil;
 save casdata="classsi" casout="classsi";
run;

And reload like this in a future session, using the LOAD statement:

proc casutil;
 load casdata="classsi" casout="classsi";
run;

When accessing your CAS libraries, remember that there are multiple levels of scope that can apply. "Session" refers to data from just the current session, whereas "Global" allows you to reach data from all CAS sessions.

Programming in CAS

Showing how to put CAS into action, David shared this diagram of a typical load/save/share flow:

Existing SAS 9 programs and CAS code can both be run in SAS Viya. The calculations and in-memory data management occur through CAS, the Cloud Analytics Service. Before beginning, it's important to have a general overview of CAS, to be able to access CAS libraries and your data. For more about CAS architecture, read this paper from CAS developer Jerry Pendergrass.

The performance case for SAS Viya

To close out his paper, David outlined a small experiment he ran to demonstrate performance advantages that can be seen by using SAS Viya v3.3 over a standard, stand-alone SAS v9.4 environment. The test was basic, but performed reads, writes, and analytics on a 5GB table. The tests revealed about a 50 percent increase in performance between CAS and SAS 9 (see the paper for a detailed table of comparison metrics). SAS Viya is engineered for distributive computing (which works especially well in cloud deployments), so more extensive tests could certainly reveal even further increases in performance in many use cases.

Additional resources

A quick introduction to CAS in SAS Viya was published on SAS Users.

August 2, 2018

SAS Viya has opened an entirely new set of capabilities, allowing SAS to run analytics on cloud technology in real time. One of the best new features of SAS Viya is its ability to pair with open source platforms, allowing developers the freedom of language and implementation to integrate with the power of SAS analytics.

At SAS Global Forum 2018, Sean Ankenbruck and Grace Heyne Lybrand from Zencos Consulting led the talk, SAS Viya: The Beauty of REST in Action. While the paper – and this blog post – outlines the use of Python and SAS Viya, note that SAS Viya integrates with R, Java and Lua as well.

Nonetheless, this Python integration example shows how easy it is to integrate SAS Viya and open source technologies. Here is the basic workflow:

1. A developer creates a web application, in a language of their choice
2. A user enters data in the web application
3. The collected data is passed to SAS Viya via the defined APIs
4. Analysis is performed in Viya using SAS actions
5. Results are passed back to the web application
6. The web application presents the results to the user

About the process

SAS’ Cloud Analytic Services (CAS) acts as a server to analyze data, and REST APIs are used to integrate many programming languages into SAS Viya. REST stands for Representational State Transfer, and is a set of constraints that allows scalability and integration of multiple web-based systems. In layman’s terms, it’s a set of software design patterns that provides handy connector points from one web app to another. The REST API is what developers use to interact with and submit requests through the processing system.

CAS actions are what allow “tasks” to be completed on SAS Viya. These “tasks” fall under the categories of Statistics, Analytics, System, and Data Mining and Machine Learning.

Integration with Python

To access CAS through Python, the SAS Scripting Wrapper for Analytics Transfer (SWAT) package is used, letting Python conventions dictate CAS actions. To create this interface, data must be captured through a web application in a format that Python can transmit to SAS Viya.

In order to connect Python and CAS, the following is necessary:

• Hostname
• CAS Port
• Username
• Password

Let’s see it in action

As an example, one project about wine preferences collected data through a questionnaire and stored it using Python's Pandas library. When the information was gathered, it was uploaded to SAS Viya. A model was created with common terms reviewers use to describe wines, feeding into a decision tree. The CAS server scored the users' responses in real time, and then sent the results back to the user, providing them with suggested wines to match their inputs.

Process for model

Code to utilize tree:

conn.loadactionset("decisionTree") 
conn.decisionTree.dTreeTrain( 
      casOut = {"name":"tree_model"},
      inputs = [{vars}], 
      modelId = "DT_wine_variety", 
      table = {"caslib":"public", "name": "wines_model_data"}, 
      target = "variety")

"Decision" given to user

Conclusion

SAS Viya has opened SAS to a plethora of opportunities, allowing many different programming languages to be interpreted and quickly integrated, giving analysts and data scientists more flexibility.

Additional resources

At Your Service: Using SAS® Viya™ and Python to Create Worker Programs for Real-Time Analytics, Jon Klopfer, Scott Koval, and Mia List
SAS Viya
sas-viya-programming on github
python-swat on github
SAS Global Forum

Additional SAS Viya talks from SAS Global Forum

A Need for Speed: Loading Data via the Cloud, Henry Christoffels
Come On, Baby, Light my SAS® Viya®: Programming for CAS, David Shannon
Just Enough SAS® Cloud Analytic Services: CAS Actions for SAS® Visual Analytics Report Developers, Michael Drutar
Running SAS Viya on Oracle Cloud without Sacrificing Performance, Dan Grant
Command-Line Administration in SAS® Viya®, Danny Hamrick
Five Approaches for High-Performance Data Loading to the SAS® Cloud Analytic Services Server, Rob Collum

How SAS Viya uses REST APIs to integrate with Python was published on SAS Users.

July 26, 2018

SAS Text Analytics analyzes documents at the document level by default, but sometimes sentence-level analysis gains further insights into the data. Two years ago, the SAS Text Analytics team did some research on sentence-level text analysis and shared their discoveries in the SGF paper Getting More from the Singular Value Decomposition (SVD): Enhance Your Models with Document, Sentence, and Term Representations. Recently my team started working on a concept extraction project. We need to extract all sentences containing one or two query words, so that linguists don't need to read whole documents in order to write concept extraction rules. This improves their efficiency in rule development and tuning significantly.

Sentence boundary detection

Sentence boundary detection is a challenge in Natural Language Processing -- it's more complicated than you might expect. For example, most sentences in English end with a period, but sometimes a period is used to denote an abbreviation or as a part of an ellipsis. My colleagues Biljana and Teresa wrote an article about the complexities of how a period may be used. If you are interested in this topic, please check out their article Text analytics through linguists' eyes: When is a period not a full stop?

Sentence boundary rules differ between languages, and when you work with multilingual data you might want to write one set of code to manipulate all the data in its varied languages. For example, a period in German is used to denote the end of an ordinal number token; in Chinese, the sentence-final period is a different character from the English period; and Thai does not use a period to denote the end of a sentence.

Here are several sentence boundary examples:

Sentence | Language | Text
1 | English | Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990.
2 | English | I paid $23.45 for this book.
3 | English | We earn more and more money, but we feel less and less happier. So…what happened to us?
4 | Chinese | 北京确实人多车多,但是根源在哪里?
5 | Chinese | 在于首都集中了太多全国性资源。
6 | German | Was sind die Konsequenzen der Abstimmung vom 12. Juni?

How to tokenize documents into sentences with SAS?

There are several methods to build a sentence tokenizer with SAS Text Analytics. Here I only list three methods:

  • Method 1: Use the CAS actions compileConcept, validateConcept, and applyConcept with SAS Visual Text Analytics
  • Method 2: Use the CAS action tpParse with SAS Visual Analytics
  • Method 3: Use SAS Data Step code and SAS 9

Among the above three methods, I recommend the first method, because it can extract sentences and keep the raw texts intact. With the second method, uppercase letters are changed into lowercase letters after parsing with SAS, and some unseen characters will be replaced with white spaces. The third method is based on traditional SAS 9 technology (not SAS Viya), so it might not scale to large data as well.

In my article, I show the SAS code of only the first two methods. For details of the SAS code for the last method, please check out the paper Getting More from the Singular Value Decomposition (SVD): Enhance Your Models with Document, Sentence, and Term Representations.

Method 1: Use CAS actions compileConcept, validateConcept, and applyConcept

The applyConcept action performs concept extraction using a concept extraction model that you compile and validate.

%macro sentenceTokenizer1(
   dsIn=,
   docVar=,
   textVar=,
   language=,
   dsOut=
);
/* Rule for determining sentence boundaries */
data sascas1.concept_rule;
   length rule $ 200;
   ruleId=1;
   rule='ENABLE:SentBoundaries';
   output;
 
   ruleId=2;
   rule='PREDICATE_RULE:SentBoundaries(first,last):(SENT,"_first{_w}","_last{_w}")';
   output;
run;
 
proc cas;
textRuleDevelop.validateConcept / 
   table={name="concept_rule"}
   config='rule'
   ruleId='ruleId'
   language="&language"
   casOut={name='outValidation',replace=TRUE}
;
run;
quit;
 
/* Compile concept rule; */
proc cas;
textRuleDevelop.compileConcept / 
   table={name="concept_rule"}
   config="rule"
   enablePredefined=false
   language="&language"
   casOut={name="outli", replace=TRUE}
;
run;
quit;
 
/* Get Sentences */
proc cas;
textRuleScore.applyConcept / 
   table={name="&dsIn"}
   docId="&docVar"
   text="&textVar"
   language="&language"
   model={name="outli"}
   matchType="best"
   casOut={name="outpos_eli", replace=TRUE}
   factOut={name="&dsOut", replace=TRUE, where="_fact_argument_=''"}
;
run;
quit;
 
proc cas;
   table.dropTable name="concept_rule" quiet=true; run;
   table.dropTable name="outli" quiet=true; run;
   table.dropTable name="outpos_eli" quiet=true; run;
quit; 
%mend sentenceTokenizer1;

Method 2: Use CAS action tpParse

The tpParse action performs natural language parsing of the documents.

%macro sentenceTokenizer2(
   dsIn=,
   docVar=,
   textVar=,
   language=,
   dsOut=
);
/* Parse the data set */
proc cas;
textparse.tpParse /
   docId="&docVar"
   documents={name="&dsIn"}
   text="&textVar"
   language="&language"
   cellWeight="NONE"
   stemming=false
   tagging=false
   noungroups=false
   entities="none"
   selectAttribute={opType="IGNORE",tagList={}}
   selectPos={opType="IGNORE",tagList={}}
   offset={name="offset",replace=TRUE}
;
run;
 
/* Get Sentences */
proc cas;
table.partition / 
   table={name="offset" 
          groupby={{name="_document_"}, {name="_sentence_"}}
          orderby={{name="_start_"}}
         }
   casout={name="offset" replace=true};
run;
 
datastep.runCode /
code= "
data &dsOut;
   set offset;
   by _document_ _sentence_ _start_;
   length _text_ varchar(20000);
   if first._sentence_ then do;
      _text_='';
      _lag_end_ = -1;
   end;  
   if _start_=_lag_end_+1 then
      _text_=cats(_text_, _term_);
   else
      _text_=trim(_text_)||repeat(' ',_start_-_lag_end_-2)||_term_;
   _lag_end_=_end_;  
   if last._sentence_ then output;
   retain _text_ _lag_end_;
   keep _document_ _sentence_ _text_;
run;
";
run;   
quit;
 
proc cas;
   table.dropTable name="offset" quiet=true; run;
quit; 
%mend sentenceTokenizer2;

Here are three examples for using each of these tokenizer methods:

/*-------------------------------------*/
/* Start CAS Server.                   */
/*-------------------------------------*/
cas casauto host="host.example.com" port=5570;
libname sascas1 cas;
 
/*-------------------------------------*/
/* Example 1: Chinese texts            */
/*-------------------------------------*/
data sascas1.text_zh;
   infile cards dlm='|' missover;
   input _document_ text :$200.;
   cards;
1|北京确实人多车多,但是根源在哪里?在于首都集中了太多全国性资源。
;
run;   
 
%sentenceTokenizer1(
   dsIn=text_zh,
   docVar=_document_,
   textVar=text,
   language=chinese,
   dsOut=sentences_zh1
);
 
%sentenceTokenizer2(
   dsIn=text_zh,
   docVar=_document_,
   textVar=text,
   language=chinese,
   dsOut=sentences_zh2
);
 
/*-------------------------------------*/
/* Example 2: English texts            */
/*-------------------------------------*/
data sascas1.text_en;
   infile cards dlm='|' missover;
   input _document_ text :$500.;
   cards;
1|Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990.
2|I paid $23.45 for this book.
3|We earn more and more money, but we feel less and less happier. So…what happened to us?
;
run;   
 
%sentenceTokenizer1(
   dsIn=text_en,
   docVar=_document_,
   textVar=text,
   language=english,
   dsOut=sentences_en1
);
 
%sentenceTokenizer2(
   dsIn=text_en,
   docVar=_document_,
   textVar=text,
   language=english,
   dsOut=sentences_en2
);
 
 
/*-------------------------------------*/
/* Example 3: German texts             */
/*-------------------------------------*/
data sascas1.text_de;
   infile cards dlm='|' missover;
   input _document_ text :$600.;
   cards;
1|Was sind die Konsequenzen der Abstimmung vom 12. Juni?
;
run;   
 
%sentenceTokenizer1(
   dsIn=text_de,
   docVar=_document_,
   textVar=text,
   language=german,
   dsOut=sentences_de1
);
 
%sentenceTokenizer2(
   dsIn=text_de,
   docVar=_document_,
   textVar=text,
   language=german,
   dsOut=sentences_de2
);

The sentences extracted from the three examples are shown in the table below.

Example | Doc | Text | Sentence (Method 1) | Sentence (Method 2)
English | 1 | Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990. | Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990. | rolls-royce motor cars inc. said it expects its u.s. sales to remain steady at about 1,200 cars in 1990.
English | 2 | I paid $23.45 for this book. | I paid $23.45 for this book. | i paid $23.45 for this book.
English | 3 | We earn more and more money, but we feel less and less happier. So…what happened to us? | We earn more and more money, but we feel less and less happier. | we earn more and more money, but we feel less and less happier.
English | 3 | (second sentence) | So…what happened? | so…what happened?
Chinese | 1 | 北京确实人多车多,但是根源在哪里?在于首都集中了太多全国性资源。 | 北京确实人多车多,但是根源在哪里? | 北京确实人多车多,但是根源在哪里?
Chinese | 1 | (second sentence) | 在于首都集中了太多全国性资源。 | 在于首都集中了太多全国性资源。
German | 1 | Was sind die Konsequenzen der Abstimmung vom 12. Juni? | Was sind die Konsequenzen der Abstimmung vom 12. Juni? | was sind die konsequenzen der abstimmung vom 12. juni?

From the above table, you can see that there is no difference between the two methods with Chinese textual data, but there are many differences between the two methods with English or German textual data. So which method should you use? It depends on the SAS products that you have available. Method 1 depends on the compileConcept, validateConcept, and applyConcept actions, and requires SAS Visual Text Analytics. Method 2 depends on the tpParse action in SAS Visual Analytics. If you have both products available, then consider your use case. If you are working on text analytics that are case insensitive, such as topic detection or text clustering, you may choose method 2. Otherwise, if the text analytics are case sensitive, such as named entity recognition, you must choose method 1. (And of course, if you don't have SAS Viya, you can use method 3 with SAS 9 and guidance from the cited paper.)

If you have SAS Viya, I suggest trying the above sentence tokenization method with your data and then run text mining actions on the sentence-level data to see what insights you will get.

How to tokenize documents into sentences was published on SAS Users.

July 25, 2018

I recently joined SAS in a brand new role: I'm a Developer Advocate.  My job is to help SAS customers who want to access the power of SAS from within other applications, or who might want to build their own applications that leverage SAS analytics.  For my first contribution, I decided to write an article about a quick task that would interest developers and that isn't already heavily documented. So was born this novice's experience in using R (and RStudio) with SAS Viya. This writing will chronicle my journey from the planning stages, all the way to running commands from RStudio on the data stored in SAS Viya. This is just the beginning; we will discuss at the end where I should go next.

Why use SAS Viya with R?

From the start, I asked myself, "What's the use case here? Why would anyone want to do this?" After a bit of research and discussion with my SAS colleagues, the answer became clear. R is a popular programming language used by data scientists, developers, and analysts – even within organizations that also use SAS. However, R has some well-known limitations when working with big data, and our SAS customers are often challenged to combine the work of a diverse set of tools into a well-governed analytics lifecycle. Combining the developers' familiarity with R programming and the power and flexibility of SAS Viya for data storage, analytical processing, and governance, this seemed like a perfect exercise. For the purpose of this scenario, think of SAS Viya as the platform, and Cloud Analytic Services (CAS) as where all the data is stored and processed.

How I got started with SAS Viya

I did not want to start with the task of deploying my own SAS Viya environment. This is a non-trivial activity, and not something an analyst would tackle, so the major pre-req here is you'll need access to an existing SAS Viya setup.  Fortunately for me, here at SAS we have preconfigured SAS Viya environments available on a private cloud that we can use for demos and testing.  So, SAS Viya is my server-side environment. Beyond that, a client is all I needed. I used a generic Windows machine and got busy loading some software.

What documentation did I use/follow?

I started with the official SAS documentation: SAS Scripting Wrapper for Analytics Transfer (SWAT) for R.

The Process

The first two things I installed were R and RStudio, which I found at these locations:

https://cran.r-project.org/
https://www.rstudio.com/products/rstudio/download/

The installs were uneventful, so I won't list all those steps here. Next, I installed a couple of pre-req R packages and attempted to install the SAS Scripting Wrapper for Analytics Transfer (SWAT) package for R. Think of SWAT as what allows R and SAS to work together. In an R command line, I entered the following commands:

> install.packages('httr')
> install.packages('jsonlite')
> install.packages('https://github.com/sassoftware/R-swat/releases/download/v1.2.1/R-swat-1.2.1-linux64.tar.gz', repos=NULL, type='file')

When attempting the last command, I hit an error:

…
ERROR: dependency 'dplyr' is not available for package 'swat'
* removing 'C:/Program Files/R/R-3.5.1/library/swat'
In R CMD INSTALL
Warning message:
In install.packages("https://github.com/sassoftware/R-swat/releases/download/v1.2.1/R-swat-1.2.1-linux64.tar.gz",  :
installation of package 'C:/Users/sas/AppData/Local/Temp/2/RtmpEXUAuC/downloaded_packages/R-swat-1.2.1-linux64.tar.gz'
  had non-zero exit status

The install failed. Based on the error message, it turns out I had forgotten to install another R package:

> install.packages("dplyr")

(This dependency is documented in the R SWAT documentation, but I missed it. Since this could happen to anyone – right? – I decided to come clean here. Perhaps you'll learn from my misstep.)

After installing the dplyr package in the R session, I reran the swat install and was happy to hit a return code of zero. Success!

For the brevity of this post, I decided to not configure an authentication file and will be required to pass user credentials when making connections. I will configure authinfo in a follow-up post.

Testing my RStudio->SAS Viya connection

From RStudio, I ran the following command to connect to the CAS server:

> library(swat)
> conn <- CAS("mycas.company.com", 8777, protocol='http', user='user', password='password')

Now that I succeeded in connecting my R client to the CAS server, I was ready to load data and start making API calls.

How did I decide on a use case?

I'm in the process of moving houses, so I decided to find a data set on property values in the area to do some basic analysis, to see if I was getting a good deal. I did a quick google search and downloaded a .csv from a local government site. At this point, I was all set up, connected, and had data. All I needed now was to run some CAS Actions from RStudio.

CAS actions are commands that you submit through RStudio to tell the CAS server to 'do' something. One or more objects are returned to the client -- for example, a collection of data frames. CAS actions are organized into action sets and are invoked via APIs. You can find the full list of CAS actions in the SAS Viya documentation. To get my downloaded .csv file into CAS, I used the cas.read.csv function:
> citydata <- cas.read.csv(conn, "C:\\Users\\sas\\Downloads\\property.csv", sep=';')
NOTE: Cloud Analytic Services made the uploaded file available as table PROPERTY in caslib CASUSER(user).

What analysis did I perform?

I purposefully kept my analysis brief, as I just wanted to make sure that I could connect, run a few commands, and get results back.

My RStudio session, including all of the things I tried

Here is a brief series of CAS action commands that I ran from RStudio:

Get the mean value of a variable:

> cas.mean(citydata$TotalSaleValue)
          Column     Mean
1 TotalSaleValue 343806.5

Get the standard deviation of a variable:

> cas.sd(citydata$TotalSaleValue)
          Column      Std
1 TotalSaleValue 185992.9

Get boxplot data for a variable:

> cas.percentile.boxPlot(citydata$TotalSaleValue)
$`BoxPlot`
          Column     Q1     Q2     Q3     Mean WhiskerLo WhiskerHi Min     Max      Std    N
1 TotalSaleValue 239000 320000 418000 343806.5         0    685000   0 2318000 185992.9 5301

Get boxplot data for another variable:

> cas.percentile.boxPlot(citydata$TotalBldgSqFt)
$`BoxPlot`
         Column   Q1   Q2   Q3     Mean WhiskerLo WhiskerHi Min   Max      Std    N
1 TotalBldgSqFt 2522 2922 3492 3131.446      1072      4943 572 13801 1032.024 5301

Did I succeed?

I think so. Let's say the house I want is 3,000 square feet and costs $258,000. As you can see in the box plot data, I'm getting a good deal: the house size is in the second quartile, while the house cost falls in the first quartile. Yes, this is not the most in-depth statistical analysis, but I'll get more into that in a future article.

What's next?

This activity has really sparked my interest to learn more and I will continue to expand my analysis, attempt more complex statistical procedures and create graphs. A follow up blog is already in the works. If this article has piqued your interest in the subject, I'd like to ask you: What would you like to see next? Please comment and I will turn my focus to those topics for a future post.

Using RStudio with SAS Viya was published on SAS Users.

July 25, 2018

In SAS Visual Analytics 8.3, a Data View is a reusable and shareable template for a data source. That means that the data view is tied to the data source, and not to the report. If you update a data view it will not automatically propagate those changes into a report.
 
So, what can a data view do for you? Plenty! Here are just a few of the settings and customizations that a data view can save for a data source (taken from the documentation):

  • Data item settings such as names, formats, classifications, and aggregations
  • Data source filters
  • Hierarchies
  • Derived data items
  • Calculated items
  • Custom categories
  • Duplicate data items
  • Show / hide status for data items
  • Unique row identifier selection

Create a Data View

Now you must be wondering, how do you save all these wonderful customizations for your data source? Answer: by creating a Data View.
 
To get started, use the Data Source menu and select Save data view…. In this example, I created a hierarchy for the SASHELP CARS data set but as you can see from the list above you could have created many more calculations, custom categories, etc.
 
Then give the Data View a name. A few other things you may notice about this Save Data View dialogue are the options for Default data view and Shared data view.
 

Default data view

A default data view is automatically applied whenever the data source is added to the report.
 
Each user can create their own data view of the source data and select their own default data view. This could lead to each user having a personalized default view. But what if you want to share your data views with others on your team? Or have everyone start with the same default view? That is when you need to first be an Application Administrator and second use the Shared data view option.

Shared data view

In order to be able to share a data view, you must be an Application Administrator. Then the option to share a data view will be available. Once a data view is shared for a data source, other users with access to that data source will be able to apply that data view.

Apply a Data View

Data views are templates of saved settings, hierarchies, custom categories, calculated data items, etc., which can be combined in an infinite number of ways. Therefore, it follows that multiple data views can be applied to the same data source. In the example above, I created a new hierarchy for the SASHELP CARS data set. But I could also create a new data view which changes the aggregation of the MPG measures to reflect the average aggregation and not the default sum aggregation.

To apply a data view: open a new report, select your data source, then use the Data Source menu and select Data views…. You will see any individually created data views as well as any shared data views. Highlight the data view you wish to apply, then select Apply. Repeat for all of the data views you wish to apply.

If any data items are duplicated with the addition of data views then, as shown below, those data items are given an "(n)" suffix after their names.

Administrator-controlled Default Data View

We've learned what Data Views are and that we can share them. How can we ensure that all the users who select a data source get the same starting point with a particular data view? To set this up, you must be an Application Administrator and the Data View must be Shared.
 
Once these two criteria are met, you can navigate to the report's overflow menu and select Edit administration settings. Then select the data source and which data view to apply as the default for all users.


Caution: If the user has already selected a personal default data view, then the personal default data view overrides the administrator-set default data view. Remember that an individual user can apply a personal or another shared data view and override the default data view.

Conclusion

Data Views are just one of the exciting new features in SAS Visual Analytics 8.3. A few key points to remember:

  • Data Views are tied to a data source, not a report. If a data view is edited, those edits do not propagate to the reports that applied that Data View.
  • A data source can have multiple Data Views applied.
  • Only an Application Administrator can share a data view with other users as well as define a default data view for a data source for all users. Any personal defined default data views override the administrator-set default data view.
  • Data Views are a template of data settings and edits – not a fully robust semantic layer where updates are pushed to all instances of usage. While Data Views can be used to assist in defining commonly used calculations and custom categories, remember that each user can still create their own data views and thus override the administrator-set default.

Using Data Views in SAS Visual Analytics was published on SAS Users.