Data Visualization

Jul 19, 2019
 

When a new Moon passes between the Earth and the Sun, the Moon can cast a shadow on certain regions of the Earth. This natural phenomenon creates a solar eclipse, meaning the Moon covers, or eclipses, your view of the Sun if you're in that region. No surprise that in [...]

Ring of fire: Visualizing 5,000 years of solar eclipses was published on SAS Voices by Falko Schulz

Jul 12, 2019
 

Are you a seasoned data scientist looking for a fast, all-inclusive machine learning solution? Curious about machine learning but have little to no programming experience? Interested in using AI to take over the world? Follow my lead and use SAS VDMML to fast track your world domination.

This blog is the beginning of a series on SAS Visual Data Mining and Machine Learning (VDMML), told from my perspective as a first-time SAS Viya user, Graduate Intern at SAS, and ABD PhD Candidate in Computer Science. I'm writing this series for two main reasons: 1) to share how surprised I was at how easily complex tasks can be completed after doing them the hard way for years, and 2) to provide examples that might convince you, too.

SAS VDMML is only one of many products available in SAS Viya®. Its distinguishing feature is its machine learning pipelines, which are created in a single, integrated in-memory environment via a drag-and-drop interface.

In this post, I will provide a high-level overview of a few of the features available in SAS VDMML. In upcoming posts, I will provide detailed examples and code comparisons for individual features, such as pipeline creation and autotuning.

Tip: At the bottom of the post, I talk about a course on machine learning using SAS Viya that provides access to the software and teaches machine learning basics. 


Simple custom pipeline using SAS Viya

If you've never used SAS VDMML, here are the top 3 reasons why I think you should check it out. 


A sample of the variety of hyperparameter modifications available.

SAS VDMML offers a simplified approach to machine learning that benefits people with a wide range of expertise.

Have you been programming for as long as you can remember and are well-versed in the machine learning world?

Why spend all your precious mental energy on tedious programming tasks? Instead, you could be focusing that energy on diving deeper into your data and discovering the extent of its modelling capabilities.

After spending only a week familiarizing myself with the interface, I felt confident I could perform my usual tasks with ease, with better hyperparameter tuning and more comprehensive model evaluations than before.

If you are wondering whether these simplifications limit model customization, think again. For most uses, the customization options available through features such as drop-down menus and editable text boxes match what programming provides, while eliminating unnecessary mental overhead.

 


Open Source Code node as a supervised learning node and example code in code editor

Still itching to program?

You have options: the SAS Code node and the Open Source Code node (available for use with R and Python). Both nodes can be used in any part of the pipeline, including preprocessing, supervised learning, and post-processing.

For example, you may already have preprocessing code for extra-messy data written in R. All you need to do is add the Open Source Code node, insert your code, and update the variables to match the provided macros. Or maybe you want to try the Deep Learning toolkit and its CAS actions? Drop a SAS Code node into your pipeline, add your code, and you are good to go!

 

Little to no programming experience or not quite a machine learning expert?

The drag-and-drop interface, wide selection of templates, and extensive evaluations allow almost anyone to produce professional-level results in a matter of minutes. While using SAS VDMML might not require expert-level knowledge of machine learning, important projects should still have an expert review the approach and results.

Example of creating a new pipeline using an advanced template


A sample of options for preprocessing and supervised learning.

The days of spending weeks programming scripts for feature extraction, fine-tuning models, and evaluating your model are over!

For example, let's say I'm imputing some variables using R. First, I might store the names of each column separately based on the type of imputation I want to perform on it. Then, I could write the code for each type of imputation. If I'm only attempting 2 different types of imputation, I will most likely need fewer than 10 lines of code. Not much, right?

But, I will also need to test and verify that each variable has been imputed correctly.

Instead, the same task in SAS VDMML would just require you to drop an Imputation node into the pipeline and select via a drop-down menu how to impute the variables - no time wasted.
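For comparison, here is roughly what those two imputations look like as code outside of VA. This is a sketch in SAS (assuming SAS/STAT for PROC STDIZE) rather than R; the dataset and column names are hypothetical.

   /* Median-impute two numeric columns; REPONLY replaces only missing values */
   proc stdize data=work.customers out=work.customers_imp
               reponly method=median;
      var age income;
   run;
 
   /* Impute a character column with a constant value */
   data work.customers_imp;
      set work.customers_imp;
      if missing(region) then region = "Unknown";
   run;

And even for this small task, you still need the verification step that the Imputation node gives you for free.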

Additionally, you can quickly compare a variety of supervised learning methods as well as test out the same model with different pre-processing methods using the automatic evaluations provided.  

Looking to save even more time?

You can use the autotuning feature to select the best set of hyperparameters for your model by turning it on in your supervised learning node of choice and hitting run. 


Turn on the autotuning feature inside your supervised learning node, then adjust ranges for the hyperparameters.

After the run is complete, view the supervised learning model's results to see the best configuration of parameters as determined by autotuning. 


Example of the results after using autotuning for a decision tree.
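If you would rather stay in code, the same autotuning capability is exposed through the VDMML procedures. Here is a minimal sketch, shown with a gradient boosting model rather than a decision tree, assuming an active CAS session with a mycas libref and the sample HMEQ data already loaded:

   /* The AUTOTUNE statement searches the hyperparameter space automatically */
   proc gradboost data=mycas.hmeq;
      target bad / level=nominal;
      input loan mortdue value / level=interval;
      input reason job / level=nominal;
      autotune;
   run;

The node-based autotuning shown above accomplishes the same search without any of this code.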

All of this can be accomplished with a few clicks, which eliminates the hours spent debugging scripts and connecting the steps in your workflow.

You’ve spent the last hour transforming and creating features in your code editor of choice. Now, after waiting 30 minutes for your model to run again, you get the same results! How? Wait...you’ve forgotten to update the reference to your new data, AGAIN. (This has definitely not happened to me.)

Fortunately, SAS VDMML only allows you to view results if the pipeline is up-to-date, which ensures that all changes are accounted for. Now, instead of checking and checking again that I passed the right data to the right functions, I immediately know that my small tweak had no effect on the results. *sigh* Or, on a brighter note, that the drastic improvement is not a fluke!

Updating the Feature Extraction node resets all child nodes below - ensuring that the pipeline stays up-to-date.

Interested in checking out SAS Viya?

Machine Learning Using SAS Viya is a course that teaches machine learning basics, gives instruction on using SAS Viya VDMML, and provides access to the SAS Viya for Learners software, all for $79. This course is the prerequisite for the SAS Certified Specialist in Machine Learning certification. Going through the course myself, I was able to quickly learn how to use SAS VDMML and got a refresher on many data preprocessing tactics and machine learning concepts.

Want to learn more?

Stay tuned!!

I will be posting blogs with in-depth examples of specific features in SAS VDMML and adding links to the new posts here as they are published. If there are any specific features you would like to know more about, leave a comment below!

Visual machine learning using SAS Viya: a Graduate Intern’s perspective was published on SAS Users.

Jul 3, 2019
 

One of my favorite parts of summer is a relaxing weekend by the pool. Summer is the time I finally get to catch up on my reading list, which has been building over the year. So, if expanding your knowledge is a goal of yours this summer, SAS Press has a shelf full of new titles for you to explore. To help you navigate your selection, we asked some of our authors which SAS books are on their reading lists this summer.

Teresa Jade


Teresa Jade, co-author of SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, has already started The DS2 Procedure: SAS Programming Methods at Work by Peter Eberhardt. Teresa reports that the book “is a concise, well-written book with good examples. If you know a little bit about the SAS DATA step, then you can leverage what you know to more quickly get up to speed with DS2 and understand the differences and benefits.”
 
 
 

Derek Morgan

Derek Morgan, author of The Essential Guide to SAS® Dates and Times, Second Edition, tells us his go-to books this summer are Art Carpenter’s Complete Guide to the SAS® REPORT Procedure and Kirk Lafler's PROC SQL: Beyond the Basics Using SAS®, Third Edition. He also notes that he “learned how to use hash objects from Don Henderson’s Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study.”
 

Chris Holland

Chris Holland, co-author of Implementing CDISC Using SAS®: An End-to-End Guide, Revised Second Edition, recommends Richard Zink’s JMP and SAS book, Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP® and SAS®, which describes how to improve efficiency while reducing costs in trials with centralized monitoring techniques.
 
 
 
 
 

And our recommendations this summer?

Download our two new free e-books, which illustrate the features and capabilities of SAS® Viya® and SAS® Visual Analytics on SAS® Viya®.

Want to be notified when new books become available? Sign up to receive information about new books delivered right to your inbox.

Summer reading – Book recommendations from SAS Press authors was published on SAS Users.

May 21, 2019
 

If you spend any time working with maps and spatial data, a fundamental understanding of coordinate systems and map projections becomes necessary.  It’s the foundation of how spatial data and maps work.  These areas invariably evoke trepidation and some angst, even in the most seasoned map professional.  And rightfully so: it can get complicated quickly.  Fortunately, most of those worries can be set aside when creating maps with SAS Visual Analytics, without requiring a degree in Geodesy.

Visual Analytics includes several different coordinate system definitions configured out-of-the-box.  Like the predefined geography types (see Fundamentals of SAS Visual Analytics geo maps), they are selected from a drop-down list during the geography variable setup.  With the details handled by VA, all you need to do is know which coordinate space your data uses and select the appropriate one.

The four Coordinate spaces included with VA are:

  1. World Geodetic System (WGS84)
    Area of coverage: World.  Used by GPS navigation systems and NATO military geodetic surveying.  This is the VA default and should work in most situations.
  2. Web Mercator
    Area of coverage: World.  Format used by Google maps, OpenStreetMap, Bing maps and other web map providers.
  3. British National Grid (OSGB36)
    Area of coverage: United Kingdom – Great Britain, Isle of Man
  4. Singapore Transverse Mercator (SVY21)
    Area of coverage: Singapore onshore/offshore

But what if your data does not use one of these?  For those situations, VA also supports custom coordinate spaces.  With this option, you can specify the definition of your desired coordinate space using industry standard formats for EPSG codes or Proj4 strings.  Before we get into the details of how to use custom coordinate spaces in VA, let’s take a step back and review the basics of coordinate spaces and projections.

Background

A coordinate space is simply a grid designed to cover a specific area of the Earth.  Some have global coverage (WGS84, the default in VA) and others cover relatively small areas (SVY21/Singapore Transverse Mercator).  Each coordinate space is defined by several parameters, including but not limited to:

  • Center coordinates (origin)
  • Coverage area (‘bounds’ or ‘extent’)
  • Unit of measurement (feet or meters)

Comparison of coordinate space definitions included in Visual Analytics -- Source: http://epsg.io

The image above compares the four coordinate space definitions included with VA.  The two on the right, BNG and Singapore Transverse Mercator, have a limited extent.  A red rectangle outlines the area of coverage for each region.  The two on the left, WGS84 and Mercator, are both world maps.  At first glance, they may appear to have the same coverage area, but they are not interchangeable.  The origin for both is located at the intersection of the Equator and the Prime Meridian.  However, the similarities end there.  Notice the extent for WGS84 covers the entire latitude range, from -90 to +90.  Mercator, on the other hand, covers from -85 to +85 latitude, so the first 5 degrees from each Pole are not included.  Another difference is the unit of measurement.  WGS84 is measured in un-projected degrees, which is indicative of a spherical Geographic Coordinate System (GCS).  Mercator uses meters, which implies a Projected Coordinate System (PCS) used for a flat surface, i.e., a screen or paper.

The projection itself is a complex mathematical operation that transforms the spherical surface of the GCS into the flat surface of the PCS.  This transformation introduces distortion in one or more qualities of the map: shape, area, direction, or distance.  The process of map projection compares to peeling an orange. Removing the peel and placing it on a flat surface will cause parts of it to stretch, tear or separate as it flattens. The same thing happens to a map projection.

A flat map will always have some degree of distortion.  The amount of distortion depends on the projection used.  Select a projection that minimizes the distortion in the areas most important to the map.  For example, are you creating a navigation map where direction is critical?  How about a World map to compare land mass of various countries?  Or maybe a local map of Municipality services where all factors are equally important?  These decisions are important if you are collecting and creating your data set from the field.  But, if you are using existing data sets, chances are that decision has already been made for you.  It then becomes a task of understanding what coordinate system was selected and how to use it within VA.

Using a Custom Coordinate Space in VA

When using VA’s custom coordinate space option, it is critical the geography variable and the dataset use the same coordinate space.  This tells VA how to align the grid used by the data with the grid used by the underlying map.  If they align, the data will be placed at the expected location.  If they don’t align, the data will appear in the wrong location or may not be displayed at all.

Illustration of aligning the map and data grids

To illustrate the process of using a custom coordinate space in VA, we will be creating a custom region map of the Oklahoma City School Districts.  The data can be found on the Oklahoma City Open Data Portal.  We will use the Esri shapefile format.  As you may recall from a previous blog post, Creating custom region maps with SAS Visual Analytics, the first step is to import the Esri shapefile data into a SAS dataset.

Once the shapefile has been successfully imported into SAS, we then must determine the coordinate system of the data.  While WGS84 is common and will work in many situations, it should not be assumed.  The first place to look is at the source, the data provider.  Many Open Data portals will have the coordinate system listed along with the metadata and description of the dataset.  But when using an Esri shapefile, there is an easier way to find what we need.

Locate the directory where you unzipped the original shapefile.  Inside of that directory is a file with a .prj extension.  This file defines the projection and coordinate system used by the shapefile.  Below are the contents of our .prj file with the first parameter highlighted.  We are only interested in this value.  Here, you can see the data has been defined in the Oklahoma State Plane coordinate system -- not in VA’s default WGS84.  So, we must use a custom coordinate system when defining the geography variable.

PROJCS["NAD_1983_StatePlane_Oklahoma_North_FIPS_3501_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101004]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]], PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",1968500],PARAMETER["False_Northing",0],PARAMETER["Central_Meridian",-98],PARAMETER["Standard_Parallel_1",35.5666666666667],PARAMETER["Standard_Parallel_2",36.7666666666667],PARAMETER["Scale_Factor",1],PARAMETER["Latitude_Of_Origin",35],UNIT["Foot_US",0.304800609601219]]

Next, we need to look up the Oklahoma State Plane coordinate system to find a definition VA understands.  From the main page of the SpatialReference.org website, type ‘Oklahoma State Plane’ into the search box. Four results are returned.  Compare the results with the string highlighted above.  You can see the third option is what we are looking for: NAD 1983 StatePlane Oklahoma North FIPS 3501 Feet.

Selecting the appropriate definition based on the .prj file contents

To get the definitions we need for VA, click the third link for the option NAD 1983 StatePlane Oklahoma North FIPS 3501 Feet.  Here you will see a grey box with a bulleted list of links.  Each of these links represent a definition for the Oklahoma StatePlane coordinate space.

Visual Analytics supports two of the listed formats, EPSG and Proj4.  EPSG stands for European Petroleum Survey Group, an organization that publishes a database of coordinate system and projection information.  The syntax of this format is epsg:<number> or esri:<number>, where <number> is a 4-6 digit code for the desired coordinate system.  In our case, the format we need is the title of the page:

ESRI:102724

The second format supported by VA is Proj4, the third link in the image above.  This format consists of a string of space-delimited name value pairs.  The Oklahoma StatePlane proj4 definition we are interested in is:

+proj=lcc +lat_1=35.56666666666667 +lat_2=36.76666666666667 +lat_0=35 +lon_0=-98 +x_0=600000.0000000001 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs

Now that we have identified the coordinate system used by our data set and looked up its definition, we are ready to configure VA to use it.

Using a Projected Coordinate System definition in VA

The following section assumes you are familiar with custom region maps and setting up a polygon provider.  If not, see my previous post on that process, Creating custom region maps with SAS Visual Analytics.  The first step in setting up a geography variable for a custom region map is to start with the polygon provider.  At the bottom of the ‘Edit Polygon Provider’ window, there is an ‘Advanced’ section that is collapsed by default.  Expand it to see the Coordinate Space option.  By default, it is populated with the value EPSG:4326, which is the EPSG code for WGS84.  Since our Oklahoma City School District data does not use WGS84, we need to replace this value with the code that we looked up from SpatialReference.org (ESRI:102724).

Using the same Custom Coordinate definition for Polygon provider and geography variable

Next, we must make sure to configure the geography variable itself with the same coordinate space as the polygon provider.  On the ‘Edit Geography Item’ window, the Coordinate Space option is the last item.  Again, we must change this from the default WGS84 to ESRI:102724.  From the dropdown list, select the option ‘Custom’; a new entry box appears where we can enter the custom coordinate space definition.  If configured correctly, you should see your map in the preview thumbnail and a 100% mapped indicator.

Congratulations!  The setup was successful.  Now, simply click OK and drag the geography variable to the canvas.  VA’s auto-map feature will recognize it and display the custom region map.

In this post, I showed how to identify the coordinate system of your Esri shapefile data, look up its EPSG and proj4 definitions, and configure VA to use it via the Custom Coordinate space option.  While the focus was on a custom region map, the technique also applies to Custom Coordinate maps, minus the polygon provider setup.  The support for custom coordinate spaces in VA allows the mapping of practically any spatial dataset, giving you a new level of power and flexibility in your mapping efforts.

Essentials of Map Coordinate Systems and Projections in Visual Analytics was published on SAS Users.

May 6, 2019
 

App security is top of mind for just about everybody – users, IT folks, business executives. Rightfully so. Mobile apps and the devices on which they reside tend to travel around, without the physical boundaries that confine traditional desktop computers.

In chatting with folks who are evaluating the SAS Visual Analytics app for their mobile devices, the conversation eventually winds up with a focus on security and the big question comes up:

How is this app secure?

Great question! Here’s a whirlwind tour of the security features that have been built into the SAS Visual Analytics app for Windows 10, Android, and iOS devices. The app is a young kid now, not a toddler anymore; it has been around for about six years. And during its growth journey, the app has been beefed up with rock-solid features to address security for Visual Analytics reports viewed from mobile devices.

Before we take a look at the security features in the app, here are a few things you should know:

    • The app is free.
    • No license is needed to use the app.
    • You can download it anytime from the app store, and try out the sample reports in the app.
    • If you already have SAS Visual Analytics deployed in your organization, you can connect to your server, add reports to the app, and start interacting with your reports from your smartphone or tablet. The Help available in the app walks you through these steps.

Now, let’s get back to security for Visual Analytics reports on mobile devices. Here are five things that make the Visual Analytics app robust and secure on mobile devices.

    1. Device Whitelisting: If you want to connect to your SAS Visual Analytics server from the app, your administrator will “whitelist” your mobile device. Your device is first registered as a valid device that can connect to the Visual Analytics server. The whitelist affects devices, not users. If you happen to lose your mobile device, your administrator can remove the device from the whitelist and prevent access to the reports and data. The option to “blacklist” devices is also available.
    2. Cached Reports: After you add Visual Analytics reports to your app, if you don’t want the report data to remain with the report in the app, your administrator can enable the cached report feature. Data is downloaded only when you open and view the report on your mobile device. When you close the report, that data is removed from the device. For enhanced security, thumbnail images for report tiles in your app will not display for cached reports.
    3. Passcode: To prevent anyone other than yourself from opening the Visual Analytics app, you can set a 4-digit passcode for the app. There are two kinds of passcodes: required and optional. A required passcode is mandated by the server – when you connect to the server, you will create a passcode. Then, whenever you open the app or view a report from that server, you must enter the passcode. An optional passcode, on the other hand, is a passcode that you choose to use to lock up the app – it is not required to access the server, it is needed only to open the app. In addition, there are several features for passcode use that solidify security and access to the app: time-out, lock-out and so forth. I’ll go over these features in an upcoming blog.
    4. SSL/HTTPS: If the Visual Analytics server is set up with SSL/HTTPS, the data viewed in the reports on your mobile device is encrypted.
    5. Offline: If you were offline for a specified number of days, you must sign into the server again. If you don’t, the app does not download reports, update reports, or open reports for viewing.

Cached Reports

One of the security features we just talked about was the cached report feature. Here’s how cached report thumbnails are displayed in the Visual Analytics app on Windows 10, without any images.

When you tap the thumbnail for the cached report, data is immediately downloaded and the report opens in the app for viewing and interaction:

When you close this cached report in the app, the data is removed from the device and the cached report thumbnail displays in the app without any images.

Thanks for joining me on this whirlwind security tour of the SAS Visual Analytics app. Now you know the many different security mechanisms that are in place to protect your organization’s data and reports accessed from the mobile app.

Five key security features in the SAS Visual Analytics app was published on SAS Users.

Apr 11, 2019
 

What's the impact of using data governance and analytics for the business side of education? It's an interesting question, and during a video interview, Dale Pietrzak, Ed.D., Director of Institutional Effectiveness and Accreditation (IEA) at the University of Idaho shared details on the results they're realizing from using SAS for data [...]

The impact of data governance and analytics: An interview with the U. of Idaho was published on SAS Voices by Georgia Mariani

Apr 8, 2019
 
The catch phrase “everything happens somewhere” is increasingly common these days.  That “somewhere” translates into a location on the Earth: a latitude and longitude.  When one of these “somewheres” is combined with many others, you quickly have a robust spatial data set that becomes actionable with the right analytic tools.

Opportunities for Spatial Analytics are increasing

In today’s modern world, GPS-enabled devices are ubiquitous, and their use continues to increase daily.  Cell phones, cars, fitness trackers, and cameras are all able to locate and track our position.  As a result, the location analytics market is expected to grow to over USD 16 Billion by 2021, up 17.6% from 2016 [1].

Waldo Tobler, an American-Swiss geographer and cartographer, developed his First Law of Geography based on this concept of everything happening somewhere.  He stated, “Everything is related to everything else, but near things are more related than distant things”[2].  As analytic professionals, we are accustomed to working with these correlations using scatterplots, heatmaps, or clustering models.  But what happens when we add a geographic map into the analysis?

Maps offer the ability to unlock a new level of insight into our data that traditional graphs do not offer: personal connection.  As humans, we naturally relate to our surroundings on a spatial level.   It helps build our perspective and frame of reference through which we view and navigate the world.  We feel a sense of loss when a physical landmark from our childhood – a building, tree, park, or route we used to walk to school – is destroyed or changed from the memories we have of it.  In this sense, we are connected, spatially and emotionally, to our surroundings.

We inherently understand how data relates to the world around us, at some level, just by viewing it on a map.  Whether it is a body of water or a mountain affecting a driving route or maybe a trendy area of a city causing housing prices to increase faster than the local average, a map connects us with these facts intuitively.  We come to these basic conclusions based solely on our experiences in the world and knowledge of the physical landmarks in the map.

One of the best examples of this is the 1854 Cholera outbreak in London.  Dr. John Snow was one of the first to use a map to understand the origin of an epidemiological outbreak.  He created a map of the affected London neighborhood by plotting the location of all known Cholera deaths.  In addition to the deaths, he also plotted the location of 13 community wells that served as the public water supply.  Using this data, he was able to see a clustering of deaths around a single pump.  Armed with this information, Dr. Snow was able to convince local officials to remove the handle from the Broad Street pump.  Once removed, new cases of Cholera quickly began to diminish.  This helped prove his theory that the outbreak’s origin was not airborne, as commonly believed at the time, but water-borne. [3]

1854 London Cholera deaths: Tabular data vs. Coordinate map [3]

Let’s look at how Dr. Snow’s map helped mitigate the outbreak and prove his theory.  The image above compares the data of the recorded deaths and community wells in tabular form to a Coordinate map.  It is obvious from the coordinate map that there is a clustering of points.  Town officials and those familiar with the neighborhood could easily get a sense of where the outbreak was concentrated.  The map told a better story by connecting their personal experience of the area to the locations of the deaths and ultimately to the wells.  Something a data table or traditional graph could not do.

Maps of London Cholera deaths with modern analytic overlays [3]

Today, with the computing power and modern analytic methods available to us, we can take the analysis even further.  The examples above show the same coordinate map with added Voronoi polygon and cluster analysis overlays.  The concentration around the Broad Street pump becomes even clearer, showing why Geographic Maps are an important tool to have in your analytic toolbox.
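Recreating the basic coordinate map takes only a few lines of modern SAS code.  Here is a sketch using the SGMAP procedure (available in recent releases of Base SAS), assuming a hypothetical dataset of death locations with longitude and latitude columns:

   /* Plot point locations over an OpenStreetMap background */
   proc sgmap plotdata=work.cholera_deaths;
      openstreetmap;
      scatter x=longitude y=latitude;
   run;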

SAS Global Forum 2019 is being held April 28-May 1, 2019 in Dallas, Texas.  If you are planning to go to this year’s event, be sure to attend one of our presentations on the latest mapping features included in SAS Visual Analytics and BASE SAS.  While you’re there, don’t forget to stop by the SAS Mapping booth located in the QUAD to say ‘Hi!’ and let us help with your spatial data needs.  See you in Dallas!

Introduction to Esri Integration in SAS Visual Analytics

  • Monday, April 29, 4:30-5:30p, Room: Level 1, D162

There’s a Map for That! What’s New and Coming Soon in SAS Mapping Technologies

  • Tuesday April 30, 4:00-4:30p, Room: Level 1, D162

Creating Great Maps in ODS Graphics Using the SGMAP Procedure

  • Wednesday May 01, 11:30a-12:30p, Room: Level 1, D162

[1] https://www.marketsandmarkets.com/Market-Reports/location-analytics-market-177193456.html

[2] https://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography

[3] https://www1.udel.edu/johnmack/frec682/cholera/

How the 1854 Cholera outbreak showed us the importance of spatial analysis was published on SAS Users.

Mar 27, 2019
 

SAS Visual Analytics supports region maps for countries, US states, and provinces out-of-the-box.  These work well for small-scale maps covering the world, a continent, or a single country.  However, other regions are often needed.  Beginning in version 8.3, VA supports custom polygons to display regions such as sales territories, counties, or zip codes.

Region (choropleth) maps use a fill color to show relationships between the regions based upon a response value from your data.  Using custom polygons in VA follows the same steps outlined in previous posts for predefined or custom coordinate geography items, with just a few additional steps.  Here’s the basic flow:

  • Identify your data
  • Import polygon shapefile into SAS dataset
  • Import the shape dataset into VA
  • Create a Custom polygon provider
  • Create the geography item
  • Create and customize the map

Before we begin

VA supports two sources for creating custom polygons: Esri shapefiles and Esri Feature Services.  The goal for this post is to show how to create custom polygons using an Esri shapefile.

Typically, when working with custom polygons, you will have two datasets: the first defines the custom regions (shape data) and the second contains the data you wish to map (business data).  The shape data is derived from an Esri shapefile or feature service.  The business data can be in a shapefile or any format supported by VA (.sas7bdat, .csv, .xls, etc). It contains the information you want to analyze distributed across the regions defined by the shape data.

It is recommended that you verify the imported shape data before using it in your final map.  This will confirm the data is valid and make debugging an issue easier should you encounter any errors.  To verify, use the same dataset for both the shape and business data.  The example below will use this approach.

Access to a GIS application such as Esri’s ArcGIS or QGIS is recommended.  There are two areas where they can help you prepare to use custom polygons in your VA map:

  • Creating a shapefile to define polygons specific to your business need or application
  • Viewing the attribute table of existing shapefiles to determine its unique identifier column

For this example, we will be creating a map of registered Neighborhood Associations in Boise, Idaho. To follow along, download the data from the City of Boise open data site: Boise Neighborhood Associations

1. Identify your data

Shape data

The shape data defining the custom regions needs to be in an Esri shapefile format. These files can be created in a GIS application or obtained from a wide variety of online sources such as: the US Census Bureau (http://www.census.gov); local and state municipalities; state agencies such as the Department of Transportation; and university GIS departments.  Most municipalities now have Open Data portals that provide a wealth of reliable data for public use.  These sources are maintained by dedicated staff and are updated regularly.

Business data

The business data can be specific to your company’s operation or customer base.  Or it can be broad and general using census or demographic information.  It answers the question of What you want to analyze on the map.  The business data must contain a column that aligns with your shape data.  For example: If you want to map the age distribution and spending habits of your target customers across zip codes, then your business data must have a column for zip codes that allows it to be joined to a zip code region in the shape data.

2. Import polygon data into a SAS dataset

VA 8.3 does not support the native shapefile format. To use a shapefile in VA, you must first import it into SAS.  Included with Viya 3.4, the %shpimprt macro will convert a shapefile into a SAS dataset and load it into CAS.  You can find the documentation for it here: %shpimprt documentation.

Alternatively, the shapefile can be manually imported with these basic steps:

  • Import the shapefile into SAS
  • Add a sequence column to the dataset
  • Reduce the density of the dataset
  • Limit the dataset based on the density value

Additional details and sample code for each of these steps can be found in the text file linked here: Manual shapefile import steps.
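As a preview of what the linked steps cover, here is a condensed sketch of the manual import, assuming SAS/GRAPH is available; the file path is hypothetical, and the ID column (OBJECTID) is specific to this example.

   /* 1. Import the shapefile into SAS */
   proc mapimport out=work.boise_neighborhoods
      datafile="C:\data\Boise_Neighborhoods.shp";
   run;
 
   /* 2. Add a sequence column that preserves point order */
   data work.boise_neighborhoods;
      set work.boise_neighborhoods;
      sequence + 1;
   run;
 
   /* 3. Reduce the density of the dataset (GREDUCE adds a DENSITY column) */
   proc greduce data=work.boise_neighborhoods out=work.boise_reduced;
      id objectid;
   run;
 
   /* 4. Limit the dataset based on the density value */
   data work.boise_reduced;
      set work.boise_reduced;
      where density <= 3;
   run;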

3. Import the shape dataset into VA

Next, we must import the dataset into VA, if using the manual shapefile import process.  To do this, locate the data pane on the left of VA.  From the ‘Open Data Source’ window, select Import > Local File.  Navigate to the location of the SAS dataset created from Step 2 and click the Open button.

Adjust the target location as needed, based on your VA installation, and make note of the location selected.  This path will be required to configure the custom polygon provider. Review and adjust the other options as needed.  Click the blue ‘Import Item’ button at the top of the window to start the import process.  A message will appear indicating the import status. Upon successful import, click the 'OK' button to open the dataset.

Since we are using the same dataset for the shape and business data, we need to make a copy of the category variable that will be used for our map. Right click on ‘ASSOCIATIO’ and select ‘Duplicate’.  Next, let’s change the names of both variables to better distinguish them from one another:

  • Change ‘ASSOCIATIO’ to ‘Business data’
  • Change ‘ASSOCIATIO (1)’ to ‘Shape data’

4. Create the geography item

We are now ready to start creating the geography item.  With Custom polygons, an additional step is required beyond what was described in previous posts with predefined and custom coordinates geography items.  We must define a Custom Polygon provider so VA knows how to locate and display the Boise Neighborhood Associations.  This is needed only once and is part of the geography item setup you are familiar with.

Our goal is to map the regions of the Boise Neighborhood Associations, so we will use ‘Shape data’ for our geography item.  Locate it in the VA data panel and change its Classification type to ‘Geography’.  From the ‘Geography data type’ dropdown, select ‘Custom polygonal shapes’. Several new fields will be displayed.  In the ‘Custom polygon provider’ dropdown, click the ‘Define new polygon provider’ button.

A ‘New Polygon Provider’ window will appear.  All fields shown are required.  The Advanced section has additional options, but they are not needed for this example.

Configure the fields based on the following:

  • Name / Label – Enter ‘Boise Neighborhoods’ for both (these values do not have to be the same)
  • Type – The default CAS Table is the correct option for this example.
  • Server / Library – These values must match those used for the data upload in Step 3.
  • Table – Select the name of the table uploaded in Step 3 (Boise_Neighborhoods)
  • ID Column – The unique identifier column of the dataset. Used to join the shape and business data together. (Select OBJECTID)
  • Sequence Column – This column is created during the import process from Step 2. Needed by VA to display the custom regions. (Select SEQUENCE)

The custom polygon provider is now configured.  All that is needed to finish the geography item setup, is to identify the Region ID.  This is the crucial step that will join the shape data to the business data.  The Region ID column must match the ID Column chosen when the custom polygon provider was setup.  Since we are using the same dataset in this example, that value is the same: OBJECTID.

In cases where different datasets are used for the shape and business data, the name of Region ID and ID Column may be different.  The column labels are not important, but their content must match for the join to occur.

Notice that once you select the correct RegionID value, the preview window will display the custom regions from the imported shape data.  The Latitude and Longitude columns are not required in this example.  Click the ‘OK’ button, to finish the setup.

5. Create and customize the map

You are now ready to create your map.  Drag the Boise Neighborhoods geography item to the report canvas.  Let’s enhance the appearance of our map by making a few style changes:

  • Set a Color role to shade the Neighborhood Association regions (Roles > Color > Business data)
  • Position the legend on the left of the map (Options > Legend)
  • Adjust the transparency of the fill color to 45% (Options > Map Transparency)
  • Change the map service to Esri World Street Map (Options > Map service)

Final map with custom polygons.

Congratulations!  You have just created your first custom region map.  In this post we discussed how to use the Custom Polygon provider to define your own regions using an Esri shapefile.  Compared to the Predefined and Custom Coordinate options, custom polygons give you additional flexibility and control over how your spatial data is analyzed.

Creating custom region maps with SAS Visual Analytics was published on SAS Users.

Mar 15, 2019
 
SAS makes it easy for you to create a large amount of procedure output with very few statements. However, when you create a large amount of procedure output with the Output Delivery System (ODS), your SAS session might stop responding or run slowly. In some cases, SAS generates a “Not Responding” message. Beginning with SAS® 9.3, the SAS windowing environment creates HTML output by default and enables ODS Graphics by default. If your code creates a large amount of either HTML output or ODS Graphics output, you can experience performance issues in SAS. This blog article discusses how to work around this issue.

Option 1: Enable the Output window instead of the Results Viewer window

By default, the SAS windowing environment with SAS 9.3 and SAS® 9.4 creates procedure output in HTML format and displays that HTML output in the Results Viewer window. However, when a large amount of HTML output is displayed in the Results Viewer window, performance might suffer. To display HTML output in the Results Viewer window, SAS uses an embedded version of Internet Explorer within the SAS environment. And because Internet Explorer does not process large amounts of HTML output well, it can slow down your results.

If you do not need to create HTML output, you can display procedure output in the Output window instead. To do so, add the following statements to the top of your code before the procedure step:

   ods _all_ close; 
   ods listing;

The Output window can show results faster than HTML output that is displayed in the Results Viewer window.

If you want to enable the Output window via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, select Create listing and clear the Create HTML check box.
    4. Click OK.

A large amount of output in the Output window, which typically does not cause a performance issue, might still generate an “Output window is full” message. In that case, you can route your LISTING output to a disk file. Use either the PRINTTO procedure or the ODS LISTING statement with the FILE= option. Here is an example:

   ods _all_ close; 
   ods listing file="sasoutput.lst"; 
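The PROC PRINTTO alternative routes listing output to a file until you reset it.  A sketch, with an arbitrary filename:

   proc printto print="sasoutput.lst";
   run;
 
   /* ...procedure steps whose listing output goes to the file... */
 
   proc printto;  /* reset output to the default destination */
   run;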

Option 2: Disable ODS Graphics

Beginning with SAS 9.3, the SAS windowing environment enables ODS Graphics by default. Therefore, most SAS/STAT® procedures now create graphics output automatically. Naturally, graphics output can take longer to create than regular text output. If you are running a SAS/STAT procedure but you do not need to create graphics output, add the following statement to the code before the procedure step:

   ods graphics off; 

If you want to set this option via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, clear the Use ODS Graphics check box.
    4. Click OK.

For maximum efficiency, you can combine the ODS GRAPHICS OFF statement with the statements listed in the previous section, as shown here:

   ods _all_ close;
   ods listing;
   ods graphics off; 

Option 3: Write ODS output to disk

You can ask SAS to write ODS output to disk but not to create output in the Results Viewer window. To do so, add the following statement to your code before your procedure step:

   ods results off;

Later in your SAS session, if you decide that you want to see output in the Results Viewer window, submit this statement:

   ods results on;

If you want to disable the Results Viewer window via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, clear the View results as they are generated check box.
    4. Click OK.

The ODS RESULTS OFF statement is a valuable debugging tool because it enables you to write ODS output to disk without viewing it in the Results Viewer window. You can then inspect the ODS output file on disk to check the size of it (before you open it).
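For example, here is a minimal sketch that writes HTML output to disk without rendering it in the Results Viewer window; the filename is arbitrary:

   ods results off;
   ods html file="bigreport.html";
   proc print data=sashelp.cars;
   run;
   ods html close;
   ods results on;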

Option 4: Suppress specific procedure output from the ODS results

In certain situations, you might use multiple procedure steps to send output to ODS. However, if you want to exclude certain procedure output from being written to ODS, use the following statement:

   ods exclude all;

Ensure that you place the statement right before the procedure step that contains the output that you want to suppress.

If necessary, use the following statement when you want to resume sending subsequent procedure output to ODS:

   ods exclude none;
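Putting the two statements together around specific procedure steps looks like this; the procedures shown are placeholders:

   ods exclude all;
   proc contents data=sashelp.class; /* output suppressed */
   run;
 
   ods exclude none;
   proc means data=sashelp.class;    /* output resumes */
   run;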

Five reasons to use ODS EXCLUDE to suppress SAS output discusses the ODS EXCLUDE statement in more detail.

Conclusion

Certain web browsers display large HTML files better than others. When you use SAS to create large HTML files, you might try using a web browser such as Chrome, Firefox, or Edge instead of Internet Explorer. However, even browsers such as Chrome, Firefox, and Edge might run slowly when processing a very large HTML file.

Instead, as a substitute for HTML, you might consider creating PDF output (with the ODS PDF destination) or RTF output (with the ODS RTF destination). However, if you end up creating a very large PDF or RTF file, then Adobe (for PDF output) and Microsoft Word (for RTF output) might also experience performance issues.

The information in this blog mainly pertains to the SAS windowing environment. For information about how to resolve ODS issues in SAS® Enterprise Guide®, refer to Take control of ODS results in SAS Enterprise Guide.

How to view or create ODS output without causing SAS® to stop responding or run slowly was published on SAS Users.