SAS 9 to SAS Viya

February 29, 2020
 

Stored processes were a very popular feature in SAS 9.4. They were used in reporting, analytics and web application development. In Viya, the equivalent feature is jobs. Viya 3.5 enhanced jobs so that they can be used in much the same way as stored processes. In addition, there is preliminary support for promoting SAS 9.4 stored processes to Viya. In this post, I will examine what is sure to be a popular feature for SAS 9.4 users starting out with Viya.

What is a job? In Viya, that is a complicated question, as there are a number of different types of jobs. In the context of this post, we are talking about jobs that can be created in SAS Studio or in the Job Execution application. A SAS Viya job consists of a program and its definition (metadata related to the program). The job definition includes information such as the job name, the author, and when it was created. After you have created a job definition, you have an execution URL that you can share with others at your site. The execution URL can be entered into a web browser and run without opening SAS Studio, or you can run a job directly from SAS Studio. When running a SAS job, SAS Studio uses the SAS Job Execution Web Application.

In addition, you can create a task prompt (XML) or an HTML form to provide a user interface to the job. When the user selects an option in the prompt and submits the job, the data specified in the form or task prompt is passed to the SAS session as global macro variables. The SAS program runs and the results are returned to the web browser. Sounds a lot like a stored process!

For more information on authoring jobs in SAS Studio, see SAS® Studio 5.2 Developer’s Guide: Working with Jobs.

With the release of Viya 3.5, there is preliminary support for the promotion of 9.4 Stored Processes to Viya Jobs. For more information on this functionality, see SAS® Viya® 3.5 Administration: Promotion (Import and Export).

How does it work? Stored processes are exported from SAS 9.4 using the export wizard or on the command-line and the resultant package can be imported to Viya using SAS Environment Manager or the sas-admin transfer CLI. The import process converts the stored processes to job definitions which can be run in SAS Studio or directly via a URL.
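As a rough illustration, the command-line import with the sas-admin transfer CLI might look like the two commands below. The package file name is a placeholder, and the subcommands and options are recalled from memory rather than taken from this post, so verify them against the CLI help for your release before relying on them.

sas-admin transfer upload --file /tmp/stored_processes.spk
sas-admin transfer import --id <id-returned-by-upload>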

The new job definition includes the job metadata, SAS code and task prompts.

1. Job Metadata

The data is not included in the promotion process. However, it must be made available to the Viya compute server so that the job code and any dynamic prompts can access it. There are two Viya server contexts that need to have access to the data:

  • Job Execution Context: used when running jobs via a URL
  • SAS Studio Context: used when running jobs from SAS Studio

To make the data available, the compute server has to access it via a libname and the table must exist in the library.

To add a libname to the SAS Job Execution compute context in SAS Environment Manager, select Contexts > Compute contexts, and then select the SAS Job Execution compute context. Then select Edit > Advanced.

In the box labelled “Enter each autoexec statement on a new line:”, add a line for the libname. If you keep the same 8-character libname as in 9.4, you will have less to change in your code and prompts.
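For example, a single line like the following (the libref and path are placeholders; use the libref your stored process code expects and the location of the data on the Viya compute server) makes the library available to every job that runs in that context:

libname findata "/gelcontent/data/finance";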

NOTE: the libname could be added to the /opt/sas/viya/config/etc/compsrv/default/autoexec_usermods.sas file on the compute server. While this is perhaps easier than adding to each context that requires it, this would make it available to every Viya compute process.

2. SAS Code

The SAS code that makes up the stored process is copied without modification to the Viya job definition. In most cases, the code will need to be edited so that it will run in the Viya environment. The SAS code in the definition can be modified using SAS Studio or the SAS Job Execution Web Application. Edit the code so that:

  • Any libnames in the new job code point to the location of the data in Viya
  • Any other SAS 9 metadata-related code is removed from the job (see the before-and-after sketch below)
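As an illustration only (your code will vary, and the libref, path, and table names below are assumptions), a typical before-and-after edit removes the stored process macros and metadata LIBNAME engine and points the libref at the data location configured for the Viya compute context:

/* SAS 9.4 stored process code */
*ProcessBody;
%stpbegin;
libname findata meta library="Finance Data";
proc print data=findata.financial__summary;
run;
%stpend;

/* Edited Viya job code */
libname findata "/gelcontent/data/finance";
proc print data=findata.financial__summary;
run;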

As you test your new jobs in Viya, additional changes may be needed to get them running.

3. Task Prompts

Prompts that were defined in metadata for the SAS 9 stored process are converted to task prompts and stored within the job definition. The XML for the prompt definition can be accessed and edited from SAS Studio or the SAS Job Execution Web Application. For more information on working with task prompts in Viya, see the SAS Studio Developer's Guide.

If you have shared prompts you want to include, it is recommended that you select “Include dependent objects” when exporting from SAS 9.4. If you do not select this option, any shared prompts will be omitted from the package. If the shared prompts, libnames and tables are included in the package with the stored processes, the SAS 9 metadata-based library and table definitions referenced in dynamic prompts are converted to use a library.table reference in Viya. When this happens, the XML for the prompt will include the libname.tablename of the dataset used to populate the prompt values. For example:

<DataSource active="true" name="DataSource2" defaultValue="FINDATA.FINANCIAL__SUMMARY">

If the libnames and tables are not included in the package, the prompt will show a URL mapping to the table's location in 9.4 metadata. For example:

<DataSource active="true" name="DataSource2" url="/gelcorp/financecontent/Data/Source Data/FINANCIAL__SUMMARY(Table)">

For the prompt to work in the latter case, you need to edit it and provide the libname.table reference in the same way as it is shown in the first example.

Including libraries and tables in a package imported to Viya results in the SAS 9.4 folders that contained those libraries and tables being created in Viya. This may result in Viya folders that are not needed, because data does not reside in folders in Viya. As an administrator, you can choose to:

  • Include the dependent data and tables in the package, and clean up any extra folders after promotion
  • Exclude the dependent data and tables from the package, and edit the data source in the prompt XML to reference the libname.table (this is not a great option if you have shared prompts)

Issues encountered while converting SAS 9 prompts to the Viya SAS Prompt Interface will cause warning messages to be included at the beginning of the XML that defines the prompt interface.

As I mentioned earlier, you can run the job from SAS Studio or from the Job Execution Web Application. You can also run the job from a URL, just as you could with stored processes in SAS 9.4. To get the URL for the job, view the job's properties in SAS Studio.

Earlier releases of Viya provided a different way to support stored processes: enabling access to the SAS 9.4 stored process server and its stored processes from Viya. This approach is still supported in Viya 3.5 because, while jobs can replace some stored processes, they cannot currently be embedded in a Viya Visual Analytics report.

For more information, please check out:

Jobs: Stored processes in Viya was published on SAS Users.

May 21, 2019
 

If you spend any time working with maps and spatial data, having a fundamental understanding of coordinate systems and map projections becomes necessary.  It’s the foundation of how spatial data and maps work.  These areas invariably evoke trepidation and some angst, even in the most seasoned map professional.  And rightfully so, it can get complicated quickly. Fortunately, most of those worries can be set aside when creating maps with SAS Visual Analytics, without requiring a degree in Geodesy.

Visual Analytics includes several different coordinate system definitions configured out-of-the-box.  Like the predefined geography types (see Fundamentals of SAS Visual Analytics geo maps), they are selected from a drop-down list during the geography variable setup.  With the details handled by VA, all you need to know is which coordinate space your data uses so you can select the appropriate one.

The four Coordinate spaces included with VA are:

  1. World Geodetic System (WGS84)
    Area of coverage: World.  Used by GPS navigation systems and NATO military geodetic surveying.  This is the VA default and should work in most situations.
  2. Web Mercator
    Area of coverage: World.  Format used by Google maps, OpenStreetMap, Bing maps and other web map providers.
  3. British National Grid (OSGB36)
    Area of coverage: United Kingdom – Great Britain, Isle of Man
  4. Singapore Transverse Mercator (SVY21)
    Area of coverage: Singapore onshore/offshore

But what if your data does not use one of these?  For those situations, VA also supports custom coordinate spaces.  With this option, you can specify the definition of your desired coordinate space using industry standard formats for EPSG codes or Proj4 strings.  Before we get into the details of how to use custom coordinate spaces in VA, let’s take a step back and review the basics of coordinate spaces and projections.

Background

A coordinate space is simply a grid designed to cover a specific area of the Earth.  Some have global coverage (WGS84, the default in VA) and others cover relatively small areas (SVY21/Singapore Transverse Mercator).  Each coordinate space is defined by several parameters, including but not limited to:

  • Center coordinates (origin)
  • Coverage area (‘bounds’ or ‘extent’)
  • Unit of measurement (feet or meters)

Comparison of coordinate space definitions included in Visual Analytics -- Source: http://epsg.io

The image above compares the four coordinate space definitions included with VA.  The two on the right, BNG and Singapore Transverse Mercator, have a limited extent.  A red rectangle outlines the area of coverage for each region.  The two on the left, WGS84 and Mercator, are both world maps.  At first glance, they may appear to have the same coverage area, but they are not interchangeable.  The origin for both is located at the intersection of the Equator and the Prime Meridian.  However, the similarities end there.  Notice the extent for WGS84 covers the entire latitude range, from -90 to +90.  Mercator on the other hand, covers from -85 to +85 latitude, so the first 5 degrees from each Pole are not included.  Another difference is the unit of measurement.  WGS84 is measured in un-projected degrees, which is indicative of a spherical Geographic Coordinate System (GCS).  Mercator uses meters, which implies a Projected Coordinate System (PCS) used for a flat surface, ie. a screen or paper.

The projection itself is a complex mathematical operation that transforms the spherical surface of the GCS into the flat surface of the PCS.  This transformation introduces distortion in one or more qualities of the map: shape, area, direction, or distance.  The process of map projection compares to peeling an orange. Removing the peel and placing it on a flat surface will cause parts of it to stretch, tear or separate as it flattens. The same thing happens to a map projection.

A flat map will always have some degree of distortion.  The amount of distortion depends on the projection used.  Select a projection that minimizes the distortion in the areas most important to the map.  For example, are you creating a navigation map where direction is critical?  How about a World map to compare land mass of various countries?  Or maybe a local map of Municipality services where all factors are equally important?  These decisions are important if you are collecting and creating your data set from the field.  But, if you are using existing data sets, chances are that decision has already been made for you.  It then becomes a task of understanding what coordinate system was selected and how to use it within VA.

Using a Custom Coordinate Space in VA

When using VA’s custom coordinate space option, it is critical the geography variable and the dataset use the same coordinate space.  This tells VA how to align the grid used by the data with the grid used by the underlying map.  If they align, the data will be placed at the expected location.  If they don’t align, the data will appear in the wrong location or may not be displayed at all.

Illustration of aligning the map and data grids

To illustrate the process of using a custom coordinate space in VA, we will be creating a custom region map of the Oklahoma City School Districts.  The data can be found on the Oklahoma City Open Data Portal.  We will use the Esri shapefile format.  As you may recall from a previous blog post, Creating custom region maps with SAS Visual Analytics, the first step is to import the Esri shapefile data into a SAS dataset.

Once the shapefile has been successfully imported into SAS, we then must determine the coordinate system of the data.  While WGS84 is common and will work in many situations, it should not be assumed.  The first place to look is at the source, the data provider.  Many Open Data portals will have the coordinate system listed along with the metadata and description of the dataset.  But when using an Esri shapefile, there is an easier way to find what we need.

Locate the directory where you unzipped the original shapefile.  Inside of that directory is a file with a .prj extension.  This file defines the projection and coordinate system used by the shapefile.  Below are the contents of our .prj file with the first parameter highlighted.  We are only interested in this value.  Here, you can see the data has been defined in the Oklahoma State Plane coordinate system -- not in VA’s default WGS84.  So, we must use a custom coordinate system when defining the geography variable.

PROJCS["NAD_1983_StatePlane_Oklahoma_North_FIPS_3501_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101004]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]], PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",1968500],PARAMETER["False_Northing",0],PARAMETER["Central_Meridian",-98],PARAMETER["Standard_Parallel_1",35.5666666666667],PARAMETER["Standard_Parallel_2",36.7666666666667],PARAMETER["Scale_Factor",1],PARAMETER["Latitude_Of_Origin",35],UNIT["Foot_US",0.304800609601219]]

Next, we need to look up the Oklahoma State Plane coordinate system to find a definition VA understands.  From the main page of the SpatialReference.org website, type ‘Oklahoma State Plane’ into the search box. Four results are returned.  Compare the results with the string highlighted above.  You can see the third option is what we are looking for: NAD 1983 StatePlane Oklahoma North FIPS 3501 Feet.

Selecting the appropriate definition based on the .prj file contents

To get the definitions we need for VA, click the third link for the option NAD 1983 StatePlane Oklahoma North FIPS 3501 Feet.  Here you will see a grey box with a bulleted list of links.  Each of these links represent a definition for the Oklahoma StatePlane coordinate space.

Visual Analytics supports two of the listed formats, EPSG and Proj4.  EPSG stands for European Petroleum Survey Group, an organization that publishes a database of coordinate system and projection information.  The syntax of this format is epsg:<number> or esri:<number>, where <number> is a 4-6 digit code for the desired coordinate system.  In our case, the format we need is the title of the page:

ESRI:102724

The second format supported by VA is Proj4, the third link in the image above.  This format consists of a string of space-delimited name value pairs.  The Oklahoma StatePlane proj4 definition we are interested in is:

+proj=lcc +lat_1=35.56666666666667 +lat_2=36.76666666666667 +lat_0=35 +lon_0=-98 +x_0=600000.0000000001 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs

Now that we have identified the coordinate system used by our data set and looked up its definition, we are ready to configure VA to use it.

Using a Projected Coordinate System definition in VA

The following section assumes you are familiar with custom region maps and setting up a polygon provider.  If not, see my previous post on that process, Creating custom region maps with SAS Visual Analytics.  The first step in setting up a geography variable for a custom region map is to start with the polygon provider.  At the bottom of the ‘Edit Polygon Provider’ window, there is an ‘Advanced’ section that is collapsed by default.  Expand it to see the Coordinate Space option.  By default, it is populated with the value EPSG:4326, which is the EPSG code for WGS84.  Since our Oklahoma City School District data does not use WGS84, we need to replace this value with the code we looked up on SpatialReference.org (ESRI:102724).

Using the same Custom Coordinate definition for Polygon provider and geography variable

Next, we must make sure to configure the geography variable itself with the same coordinate space as the polygon provider.  On the ‘Edit Geography Item’ window, the Coordinate Space option is the last item.  Again, we must change this from the default WGS84.  From the dropdown list, select the option ‘Custom’.  A new entry box appears where we can enter the custom coordinate space definition, ESRI:102724.  If configured correctly, you should see your map in the preview thumbnail and a 100% mapped indicator.

Congratulations!  The setup was successful.  Now, simply click OK and drag the geography variable to the canvas.  VA’s auto-map feature will recognize it and display the custom region map.

In this post, I showed how to identify the coordinate system of your Esri shapefile data, look up its EPSG and Proj4 definitions, and configure VA to use it via the custom coordinate space option.  While the focus was on a custom region map, the technique also applies to custom coordinate maps, minus the polygon provider setup.  The support of custom coordinate spaces in VA allows the mapping of practically any spatial dataset, giving you a new level of power and flexibility in your mapping efforts.

Essentials of Map Coordinate Systems and Projections in Visual Analytics was published on SAS Users.

April 16, 2019
 

This blog post is based on the Code Snippets tutorial video in the free SAS® Viya® Enablement course from SAS Education. Keep reading to learn more about code snippets or check out the video to follow along with the tutorial in real-time.

Has there ever been a block of code that you use so infrequently that you always seem to forget the options that you need? Conversely, has there ever been a block of code that you use so frequently that you grow tired of typing it all the time? Code snippets can greatly assist with both of these scenarios. In this blog post, we discuss using pre-installed code snippets and creating new code snippets within SAS Viya.

Pre-installed code snippets

Figure 1: Pre-installed Snippets

SAS Viya comes with several code snippets pre-installed, including snippets to connect to CAS. To access these snippets, expand the Snippets area on the left navigation panel of SAS Studio as shown in Figure 1. You can see that the snippets are divided into categories, making it easier to find them.
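For instance, the pre-installed snippet for starting a CAS session contains code along these lines (the session name and options shown are illustrative; the snippet shipped with your release may differ slightly):

cas mySession sessopts=(caslib=casuser timeout=1800 locale="en_US");
caslib _all_ assign;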

If you double-click a pre-installed code snippet, or if you click and drag the snippet into the code editor panel, then the snippet will appear in the panel.

Snippets can range from very simple to very complex. Some contain comments. Some contain macro variables. Some might be only a couple of lines of code. That is the advantage of snippets. They can be anything that you want them to be.

 

 

Create new snippets

Now, let’s create a snippet of our own. Figure 2 shows an example of code that calls PROC CARDINALITY. This code is complete and fully executable. When you have the code the way that you want in your code window, click on the shortcut button for Add to My Snippets above the code. The button is outlined in a box in Figure 2.

Figure 2: Add to My Snippets Button
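The code in Figure 2 is an image in the original post; a minimal sketch of a PROC CARDINALITY call that could be saved as a snippet might look like this (the CAS library and table names are placeholders):

proc cardinality data=casuser.baseball outcard=casuser.baseball_card;
run;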

A window will appear that asks you to name the snippet. Naming the snippet then saves it into the My Snippets area in the left navigation panel for future use.

Remember that snippets are extremely flexible. The code that you save does not have to be fully executable. Instead of supplying the data source in your code, you may instead include notes or comments about what needs to be added, which makes the code more general, but it is still a very useful snippet.

To use one of your saved snippets, simply navigate to the My Snippets area, then double-click on your snippet or drag it into the code window.

Want to learn more about SAS Viya? Download the free e-book Exploring SAS® Viya®: Programming and Data Management. The content in this e-book is based on "SAS® Viya® Enablement," a free course available from SAS Education.

Using code snippets in SAS® Viya® was published on SAS Users.

April 8, 2019
 
The catch phrase “everything happens somewhere” is increasingly common these days.  That “somewhere” translates into a location on the Earth; a latitude and longitude.  When one of these “somewheres” is combined with many other “somewheres,” you quickly have a robust spatial data set that becomes actionable with the right analytic tools.

Opportunities for Spatial Analytics are increasing

In today’s modern world, GPS-enabled devices are ubiquitous, and their use continues to increase daily.  Cell phones, cars, fitness trackers, and cameras are all able to locate and track our position.  As a result, the location analytics market is expected to grow to over USD 16 Billion by 2021, up 17.6% from 2016 [1].

Waldo Tobler, an American-Swiss geographer and cartographer, developed his First Law of Geography based on this concept of everything happening somewhere.  He stated, “Everything is related to everything else, but near things are more related than distant things”[2].  As analytic professionals, we are accustomed to working with these correlations using scatterplots, heatmaps, or clustering models.  But what happens when we add a geographic map into the analysis?

Maps offer the ability to unlock a new level of insight into our data that traditional graphs do not offer: personal connection.  As humans, we naturally relate to our surroundings on a spatial level.   It helps build our perspective and frame of reference through which we view and navigate the world.  We feel a sense of loss when a physical landmark from our childhood – a building, tree, park, or route we used to walk to school – is destroyed or changed from the memories we have of it.  In this sense, we are connected, spatially and emotionally, to our surroundings.

We inherently understand how data relates to the world around us, at some level, just by viewing it on a map.  Whether it is a body of water or a mountain affecting a driving route or maybe a trendy area of a city causing housing prices to increase faster than the local average, a map connects us with these facts intuitively.  We come to these basic conclusions based solely on our experiences in the world and knowledge of the physical landmarks in the map.

One of the best examples of this is the 1854 Cholera outbreak in London.  Dr. John Snow was one of the first to use a map for understanding the origin of an epidemiological outbreak.  He created a map of the affected London neighborhood by plotting the location of all known Cholera deaths.  In addition to the deaths, he also plotted the location of 13 community wells that served as the public water supply.  Using this data, he was able to see a clustering of deaths around a single pump.  Armed with this information, Dr. Snow was able to convince local officials to remove the handle from the Broad Street pump.  Once removed, new cases of Cholera quickly began to diminish.  This helped prove his theory the outbreak’s origin was not air-borne as commonly believed during that time, but rather of a water-borne origin. [3]

1854 London Cholera deaths: Tabular data vs. Coordinate map [3]

Let’s look at how Dr. Snow’s map helped mitigate the outbreak and prove his theory.  The image above compares the data of the recorded deaths and community wells in tabular form to a Coordinate map.  It is obvious from the coordinate map that there is a clustering of points.  Town officials and those familiar with the neighborhood could easily get a sense of where the outbreak was concentrated.  The map told a better story by connecting their personal experience of the area to the locations of the deaths and ultimately to the wells.  Something a data table or traditional graph could not do.

Maps of London Cholera deaths with modern analytic overlays [3]

Today, with the computing power and modern analytic methods available to us, we can take the analysis even further.  The examples above show the same coordinate map with added Voronoi polygon and cluster analysis overlays.  The concentration around the Broad Street pump becomes even clearer, showing why Geographic Maps are an important tool to have in your analytic toolbox.

SAS Global Forum 2019 is being held April 28-May 1, 2019 in Dallas, Texas.  If you are planning to go to this year’s event, be sure to attend one of our presentations on the latest mapping features included in SAS Visual Analytics and BASE SAS.  While you’re there, don’t forget to stop by the SAS Mapping booth located in the QUAD to say ‘Hi!’ and let us help with your spatial data needs.  See you in Dallas!

Introduction to Esri Integration in SAS Visual Analytics

  • Monday, April 29, 4:30-5:30p, Room: Level 1, D162

There’s a Map for That! What’s New and Coming Soon in SAS Mapping Technologies

  • Tuesday April 30, 4:00-4:30p, Room: Level 1, D162

Creating Great Maps in ODS Graphics Using the SGMAP Procedure

  • Wednesday May 01, 11:30a-12:30p, Room: Level 1, D162

[1] https://www.marketsandmarkets.com/Market-Reports/location-analytics-market-177193456.html

[2] https://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography

[3] https://www1.udel.edu/johnmack/frec682/cholera/

How the 1854 Cholera outbreak showed us the importance of spatial analysis was published on SAS Users.

March 27, 2019
 

SAS Visual Analytics supports region maps for countries, US states, and provinces out-of-the-box.  These work well for small-scale maps covering the world, a continent, or a single country.  However, other regions are often needed.  Beginning in version 8.3, VA supports custom polygons to display regions such as sales territories, counties, or zip codes.

Region (choropleth) maps use a fill color to show relationships between the regions based upon a response value from your data.  Using custom polygons in VA follows the same steps outlined in previous posts for predefined or custom coordinate geography items, with just a few additional steps.  Here’s the basic flow:

  • Identify your data
  • Import polygon shapefile into SAS dataset
  • Import the shape dataset into VA
  • Create a Custom polygon provider
  • Create the geography item
  • Create and customize the map

Before we begin

VA supports two sources for creating custom polygons: Esri shapefiles and Esri Feature Services.  The goal for this post is to show how to create custom polygons using an Esri shapefile.

Typically, when working with custom polygons, you will have two datasets: the first defines the custom regions (shape data) and the second contains the data you wish to map (business data).  The shape data is derived from an Esri shapefile or feature service.  The business data can be in a shapefile or any format supported by VA (.sas7bdat, .csv, .xls, etc). It contains the information you want to analyze distributed across the regions defined by the shape data.

It is recommended that you verify the imported shape data before using it in your final map.  This will confirm the data is valid and make debugging an issue easier should you encounter any errors.  To verify, use the same dataset for both the shape and business data.  The example below will use this approach.

Access to a GIS application such as Esri’s ArcGIS or QGIS is recommended.  There are two areas where they can help you prepare to use custom polygons in your VA map:

  • Creating a shapefile to define polygons specific to your business need or application
  • Viewing the attribute table of existing shapefiles to determine its unique identifier column

For this example, we will be creating a map of registered Neighborhood Associations in Boise, Idaho. To follow along, download the data from the City of Boise open data site: Boise Neighborhood Associations

1. Identify your data

Shape data

The shape data defining the custom regions needs to be in an Esri shapefile format. These files can be created in a GIS application or obtained from a wide variety of online sources such as: the US Census Bureau (http://www.census.gov); local and state municipalities; state agencies such as the Department of Transportation; and university GIS departments.  Most municipalities now have Open Data portals that provide a wealth of reliable data for public use.  These sources are maintained by dedicated staff and are updated regularly.

Business data

The business data can be specific to your company’s operation or customer base.  Or it can be broad and general using census or demographic information.  It answers the question of What you want to analyze on the map.  The business data must contain a column that aligns with your shape data.  For example: If you want to map the age distribution and spending habits of your target customers across zip codes, then your business data must have a column for zip codes that allows it to be joined to a zip code region in the shape data.

2. Import polygon data into a SAS dataset

VA 8.3 does not support the native shapefile format. To use a shapefile in VA, you must first import it into SAS.  Included with Viya 3.4, the %shpimprt macro will convert a shapefile into a SAS dataset and load it into CAS.  You can find the documentation for it here: %shpimprt documentation.

Alternatively, the shapefile can be manually imported with these basic steps:

  • Import the shapefile into SAS
  • Add a sequence column to the dataset
  • Reduce the density of the dataset
  • Limit the dataset based on the density value

Additional details and sample code for each of these steps can be found in the text file linked here: Manual shapefile import steps.
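As a rough sketch only (this is not the linked sample code; the file path, ID variable, and density cutoff are assumptions to adjust for your own shapefile), the manual steps look roughly like this:

/* 1. Import the shapefile into SAS */
proc mapimport datafile="/path/to/Boise_Neighborhoods.shp" out=work.boise;
run;

/* 2. Add a sequence column to the dataset */
data work.boise_seq;
  set work.boise;
  sequence = _n_;
run;

/* 3. Reduce the density of the dataset (GREDUCE adds a DENSITY column) */
proc greduce data=work.boise_seq out=work.boise_red;
  id OBJECTID;
run;

/* 4. Limit the dataset based on the density value (cutoff chosen for illustration) */
data work.boise_final;
  set work.boise_red;
  if density <= 3;
run;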

3. Import the shape dataset into VA

Next, we must import the dataset into VA, if using the manual shapefile import process.  To do this, locate the data pane on the left of VA.  From the ‘Open Data Source’ window, select Import > Local File.  Navigate to the location of the SAS dataset created from Step 2 and click the Open button.

Adjust the target location as needed, based on your VA installation, and make note of the location selected.  This path will be required to configure the custom polygon provider. Review and adjust the other options as needed.  Click the blue ‘Import Item’ button at the top of the window to start the import process.  A message will appear indicating the import status. Upon successful import, click the 'OK' button to open the dataset.

Since we are using the same dataset for the shape and business data, we need to make a copy of the category variable that will be used for our map. Right click on ‘ASSOCIATIO’ and select ‘Duplicate’.  Next, let’s change the names of both variables to better distinguish them from one another:

  • Change ‘ASSOCIATIO’ to ‘Business data’
  • Change ‘ASSOCIATIO (1)’ to ‘Shape data’

4. Create the geography item

We are now ready to start creating the geography item.  With Custom polygons, an additional step is required beyond what was described in previous posts with predefined and custom coordinates geography items.  We must define a Custom Polygon provider so VA knows how to locate and display the Boise Neighborhood Associations.  This is needed only once and is part of the geography item setup you are familiar with.

Our goal is to map the regions of the Boise Neighborhood Associations, so we will use ‘Shape data’ for our geography item.  Locate it in the VA data panel and change its Classification type to ‘Geography’.  From the ‘Geography data type’ dropdown, select ‘Custom polygonal shapes’. Several new fields will be displayed.  In the ‘Custom polygon provider’ dropdown, click the ‘Define new polygon provider’ button.

A ‘New Polygon Provider’ window will appear.  All fields shown are required.  The Advanced section has additional options, but they are not needed for this example.

Configure the fields based on the following:

  • Name / Label – Enter ‘Boise Neighborhoods’ for both (these values do not have to be the same)
  • Type – The default CAS Table is the correct option for this example.
  • Server / Library – These values must match those used for the data upload in Step 3.
  • Table – Select the name of the table uploaded in Step 3 (Boise_Neighborhoods)
  • ID Column – The unique identifier column of the dataset. Used to join the shape and business data together. (Select OBJECTID)
  • Sequence Column – This column is created during the import process from Step 2. Needed by VA to display the custom regions. (Select SEQUENCE)

The custom polygon provider is now configured.  All that is needed to finish the geography item setup, is to identify the Region ID.  This is the crucial step that will join the shape data to the business data.  The Region ID column must match the ID Column chosen when the custom polygon provider was setup.  Since we are using the same dataset in this example, that value is the same: OBJECTID.

In cases where different datasets are used for the shape and business data, the name of Region ID and ID Column may be different.  The column labels are not important, but their content must match for the join to occur.

Notice that once you select the correct RegionID value, the preview window will display the custom regions from the imported shape data.  The Latitude and Longitude columns are not required in this example.  Click the ‘OK’ button, to finish the setup.

5. Create and customize the map

You are now ready to create your map.  Drag the Boise Neighborhoods geography item to the report canvas.  Let’s enhance the appearance of our map by making a few style changes:

  • Set a Color role to shade the Neighborhood Association regions (Roles > Color > Business data)
  • Position the legend on the left of the map (Options > Legend)
  • Adjust the transparency of the fill color to 45% (Options > Map Transparency)
  • Change the map service to Esri World Street Map (Options > Map service)

Final map with custom polygons.

Congratulations!  You have just created your first custom region map.  In this post we discussed how to use the Custom Polygon provider to define your own regions using an Esri shapefile.  Compared to the Predefined and Custom Coordinate options, custom polygons give you additional flexibility and control over how your spatial data is analyzed.

Creating custom region maps with SAS Visual Analytics was published on SAS Users.

April 19, 2018
 

A very common coding technique SAS programmers use is identifying the largest value for a given column by using the DATA step BY statement with the DESCENDING option. In this example, I wanted to find the largest value of the number of runs (nRuns) for each team in the SASHELP.BASEBALL dataset. Using a SAS workspace server, one would write:


Figure 1. Single Threaded DATA Step in SAS Workspace Server
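The code in Figure 1 is an image in the original post; a sketch of the single-threaded approach it describes, sorting descending and keeping the first observation per team, would look something like this:

proc sort data=sashelp.baseball out=work.baseball;
  by team descending nRuns;
run;

data work.maxruns;
  set work.baseball;
  by team descending nRuns;
  if first.team then output;  /* the first observation in each team group has the largest nRuns */
run;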

Figure 2 shows the results of the code we ran in Figure 1:

Figure 2. Result from SAS Code Displayed in Figure 1

To run this DATA step distributed, we will leverage SAS® Cloud Analytic Services in SAS® Viya™. Notice in Figure 3 that there is no need for the PROC SORT step, which is required when running the DATA step single-threaded in a SAS workspace server. This is because SAS® Cloud Analytic Services in SAS® Viya™ automatically groups and orders the data for BY-group processing. The BY statement in CAS does not currently support the DESCENDING option, so instead we will process each team in ascending order and keep the last observation in each BY group, which simulates the descending sort.

Figure 3. Distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™
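Figure 3 is also an image in the original post; a sketch of the distributed version, with ascending BY-group processing and last-observation logic standing in for DESCENDING (the session, caslib, and table names are placeholders), might look like this:

cas mySession;
libname mycas cas caslib=casuser;

data mycas.baseball;
  set sashelp.baseball;
run;

data mycas.maxruns;
  set mycas.baseball;
  by team nRuns;             /* CAS groups and orders the data; no PROC SORT and no DESCENDING */
  if last.team then output;  /* the last observation in each team group has the largest nRuns */
run;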

Figure 4 shows the results when running distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™.

Figure 4. Results of Distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™

Conclusion

Until the BY statement running in SAS® Cloud Analytic Services in SAS® Viya™ supports DESCENDING, use this technique to ensure your DATA step runs distributed.

Read more SAS Viya posts.

Read our SAS 9 to SAS Viya whitepaper.

How to Simulate DESCENDING BY Variables in DATA Step Code that Runs Distributed in SAS® Viya™ was published on SAS Users.

January 24, 2018
 

This article and accompanying technical white paper are written to help SAS 9 users process existing SAS 9 code multi-threaded in SAS Viya 3.3.
Read the full paper, Getting Your SAS 9 Code to Run Multi-Threaded in SAS Viya 3.3.

The Future is Multi-threaded Processing Using SAS® Viya®

When I first began researching how to run SAS 9 code as multi-threaded in SAS Viya, I decided to compile all the relevant information and detail the internal processing rules that guide how the SAS Viya Cloud Analytics Services (CAS) server handles code and data. What I found was that there are a simple set of guidelines, which if followed, help ensure that most of your existing SAS 9 code will run multi-threaded. I have stashed a lot of great information into a single whitepaper that is available today called “Getting Your SAS 9 Code to Run Multi-threaded in SAS Viya”.

Starting with the basic distinctions between single and parallel processing is key to understanding why and how some of the parallel processing changes have been implemented. You see, SAS Viya technology constructs code so that everything runs in pooled memory using multiple processors. Redefining SAS for this parallel processing paradigm has led to huge gains in decreasing program run-times, as well as concomitant increases in accuracy for a variety of machine learning techniques. Using SAS Viya products helps revolutionize how we think about undertaking large-scale work because now we can complete so many more tasks in a fraction of the time it took before.

The new SAS Viya products bring a ton of value compared to other choices you might have in the analytics marketplace. Unfortunately most open source libraries and packages, especially those developed for use in Python and R, are limited to single-threading. SAS Viya offers a way forward by coding in these languages using an alternative set of SAS objects that can run as parallel, multi-threaded, distributed processes. The real difference is in the shared memory architecture, which is not the same as parallel, distributed processing that you hear claimed from most Hadoop and cloud vendors. Even though parallel, distributed processing is faster than single-threading, it proverbially hits a performance wall that is far below what pooled and persisted data provides when using multi-threaded techniques and code.

For these reasons, I believe that SAS Viya is the future of data/decision science, with shared memory running against hundreds if not thousands of processors, and returning results back almost instantaneously. And it’s not for just a handful of statistical techniques. I’m talking about running every task in the analytics lifecycle as a multi-threaded process, meaning end-to-end processing through data staging, model development and deployment, all potentially accomplished through a point-and-click interface if you choose. Give SAS Viya a try and you will see what I am talking about. Oh, and don’t forget to read my technical white paper that provides a checklist of all the things that you may need to consider when transitioning your SAS 9 code to run multi-threaded in SAS Viya!

Questions or Comments?

If you have any questions about the details found in this paper, feel free to leave them in the comments field below.

Getting your SAS 9 code to run multi-threaded in SAS Viya 3.3 was published on SAS Users.

November 29, 2017
 

SAS Viya is an exciting addition to the SAS Platform, allowing you to conduct analysis faster than ever before and providing you the flexibility to utilize open source technologies and generate insights from data in any computing environment. The blog post “Top 12 Advantages of SAS Viya” does a great [...]

The post Learn more about SAS Viya with resources from SAS Education appeared first on SAS Learning Post.

October 21, 2017
 

There are many compelling reasons existing SAS users might want to start integrating SAS Viya into their SAS9 programs and applications.  For me, it comes down to ease-of-use, speed, and faster time-to-value.  With the ability to traverse the (necessarily iterative) analytics lifecycle faster than before, we are now able to generate output quicker – better supporting vital decision-making in a reduced timeframe.   In addition to the positive impacts this can have on productivity, it can also change the way we look at current business challenges and how we design possible solutions.

Earlier this year I wrote about how SAS Viya provides a robust analytics environment to handle all of your big data processing needs.  Since then, I’ve been involved in testing the new SAS Viya 3.3 software that will be released near the end of 2017 and found some additional advantages I think warrant attention.  In this article, I rank order the main advantages of SAS Viya processing and new capabilities coming to SAS Viya 3.3 products.  While the new SAS Viya feature list is too long to list everything individually, I’ve put together the top reasons why you might want to start taking advantage of SAS Viya capabilities of the SAS platform.

1.     Multi-threaded everything, including the venerable DATA-step

In SAS Viya, everything that can run multi-threaded does.  This is the single most important aspect of the SAS Viya architecture for existing SAS customers.  As part of this new holistic approach to data processing, SAS has enabled the highly flexible DATA step to run multi-threaded, requiring very little modification of code in order to begin taking advantage of this significant new capability (more on that in a soon-to-be-released blog).  Migrating to SAS Viya is important especially in those cases where long-running jobs consist of very long DATA steps that act as processing bottlenecks, where constraints exist because of older single-threading configurations.

2.     No sorting necessary!

While not 100% true, most sort routines can be removed from your existing SAS programs.  Ask yourself the question: “What portion of my runtimes are due strictly to sorting?”  The answer is likely around 10-25%, maybe more.  In general, the concept of sorting goes away with in-memory processing.  SAS Viya does its own internal memory shuffling as a replacement.  The SAS Viya CAS engine takes care of partitioning and organizing the data so you don’t have to.  So, take those sorts out your existing code!

3.     VARCHAR informat (plus other “variable-blocking” informats/formats)

Not available in SAS 9.4, the VARCHAR informat/format allows you to store byte information without having to allocate room for blank spaces.  Because storage for columnar (input) values varies by row, you have the potential to achieve an enormous amount of (blank space) savings, which is especially important if you are using expensive (fast) disk storage space.  This represents a huge value in terms of potential data storage size reduction.
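As a small, hedged example (the table and column names are made up), a VARCHAR column can be declared in a CAS DATA step like this:

data mycas.comments;
  length comment varchar(*);  /* stores only the bytes each row needs instead of a fixed width */
  set mycas.raw_comments;
run;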

4.     Reduced I/O in the form of data reads and writes from Hive/HDFS and Teradata to CAS memory

SAS Viya can leverage Hive/HDFS and Teradata platforms by loading (lifting) data up and writing data back down in parallel using CAS pooled memory.  Data I/O, namely reading data from disk and converting it into a SAS binary format needed for processing, is the single most limiting factor of SAS 9.4.  Once you speed up your data loading, especially for extremely large data sets, you will be able to generate faster time to results for all analyses and projects.

5.     Persisted data can stay in memory to support multiple users or processing steps

Similar to SAS LASR, CAS can be structured to persist large data sets in memory, indefinitely.  This allows users to access the same data at the same time and eliminates redundancy and repetitive I/O, potentially saving valuable compute cycles.  Essentially, you can load the data once and then as many people (or processing steps) can reuse it as many times as needed thereafter.
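For example, a table can be loaded once and promoted to a global caslib so that other users and later processing steps can reuse it without reloading (the caslib and table names below are placeholders):

proc casutil;
  load data=sashelp.cars outcaslib="public" casout="cars" promote;
quit;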

6.     State-of-the-art Machine Learning (ML) techniques (including Gradient Boosting, Random Forest, Support Vector Machines, Factorization Machines, Deep Learning and NLP analytics)

All the most popular ML techniques are represented, giving you the flexibility to customize model tournaments to include those techniques most appropriate for your given data and problem set.  We also provide assessment capabilities, thus saving you valuable time to get the types of information you need to make valid model comparisons (like ROC charts, lift charts, etc.) and pick your champion models.  We do not have extreme Gradient Boosting, Factorization Machines, or a specific Assessment procedure in SAS 9.4.  Also, GPU processing is supported in SAS Viya 3.3 for Deep Neural Networks and Convolutional Neural Networks (this has not been available previously).

7.     In-memory TRANSPOSE

The task of transposing data amounts to about 80% of any model building exercise, since predictive analytics requires a specialized data set called a ‘one-row-per-subject’ Analytic Base Table (ABT).  SAS Viya allows you to transpose in a fraction of the time that it used to take to develop the critical ABT outputs.  A phenomenal time-saver of a procedure that now runs entirely multi-threaded, in-memory.
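A hedged sketch of an in-memory transpose against CAS tables (the table and variable names are placeholders) looks like this:

proc transpose data=mycas.transactions out=mycas.abt prefix=month_;
  by customer_id;
  id month;
  var amount;
run;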

8.     APIs!!!

The ability to code from external interfaces gives coders the flexibility they need in today’s fast-moving programming world.  SAS Viya supports native language bindings for Lua, Java, Python and R.  This means, for example, that you can launch SAS processes from a Jupyter Notebook while staying within a Python coding environment.  SAS also provides a REST API for use in data science and IT departments.

9.     Improved model build and deployment options

The core SAS Viya machine learning techniques support auto-tuning.  SAS has the most effective hyper-parameter search and optimization routines, allowing data scientists to arrive at the correct algorithm settings with higher probability and speed, giving them better answers with less effort.  And because ML scoring code output is significantly more complex, SAS Viya Data Mining and Machine Learning allows you to deploy compact binary score files (called Astore files) into databases to help facilitate scoring.  These binary files do not require compilation and can be pushed to ESP-supported edge analytics.  Additionally, training within event streams is being examined for a future release.

10.    Tons of new SAS visual interface advantages

A.     Less coding – SAS Viya acts as a code generator, producing batch code for repeatability and score code for easier deployment.  Both batch code and score code can be produced in a variety of formats, including SAS, Java, and Python.

B.     Improved data integration between SAS Viya visual analytics products – you can now edit your data in-memory and pass it effortlessly through to reporting, modeling, text, and forecasting applications (new tabs in a single application interface).

C.     Ability to compare modeling pipelines – now data scientists can compare champion models from any number of pipelines (think of SAS9 EM projects or data flows) they’ve created.

D.     Best practices and white box templates – once only available as part of SAS 9 Rapid Predictive Modeler, Model Studio now gives you easy access to basic, intermediate and advanced model templates.

E.     Reusable components – Users can save their best work (including pipelines and individual nodes) and share it with others.  Collaborating is easier than ever.

11.    Data flexibility

You can load big data without having all that data fit into memory.  Before in HPA or LASR engines, the memory environment had to be sized exactly to fit all the data.  That prior requirement has been removed using CAS technology – a really nice feature.

12.    Overall consolidation and consistency

SAS Viya seeks to standardize on common algorithms and techniques provided within every analytic technique so that you don’t get different answers when attempting to do things using alternate procedures or methods. For instance, our deployment of Stochastic Gradient Descent is now the same in every technique that uses that method.  Consistency also applies to the interfaces, as SAS Viya attempts to standardize the look-and-feel of various interfaces to reduce your learning curve when using a new capability.

The net result of these Top 12 advantages is that you have access to state-of-the-art technology, jobs finish faster, and you ultimately get faster time-to-value.  While this idea has been articulated in some of the above points, it is important to re-emphasize because SAS Viya benefits, when added together, result in higher throughputs of work, a greater flexibility in terms of options, and the ability to keep running when other systems would have failed.  You just have a much greater efficiency/productivity level when using SAS Viya as compared to before.  So why not use it?

Learn more about SAS Viya.
Tutorial Library: An introduction to SAS Viya programming for SAS 9 programmers.
Blog: Adding SAS Viya to your SAS 9 programming toolbox.

Top 12 Advantages of SAS Viya was published on SAS Users.

August 5, 2017
 

It's a common mantra many parents use to encourage their children to expand their food choices and try something new. Even as adults we’re often more comfortable easing into the unfamiliar, taking small bites, tiny samples, even dipping a toe in the water before diving in headfirst.

Software is no different.

We’ve introduced trials of our latest products – and we encourage you to test them out, see what they are like and consider how they could benefit you and your organization.

SAS is currently offering a free trial of SAS Viya products, the latest evolution of the SAS platform. SAS Viya is a new engine, driving fast analytic answers for more data types, introducing new products, and accessible to those who code in SAS, use other coding languages – or prefer an interactive interface. Extending SAS 9, customers are enjoying the benefits: running existing code faster, adding new machine learning methods into existing SAS analysis, and solving entirely new business questions – all from one, governed platform.

To help you learn more about this modernization of the SAS platform, we are now offering product trials at no cost and with no installation necessary. They even come with test data to deliver a complete experience. To start your free trial of SAS Viya products, just sign up at www.sas.com/viya, select ‘Try It for Free’, and choose your analytics trial experience. You’ll use your existing SAS profile – or create one if you don’t already have one – to register and get started. In just a couple of minutes after we receive your request, we’ll send you a link with instructions, videos and step-by-step scripts you can follow to sign onto your SAS-hosted image. All you need is a web browser and a little curiosity and you’re on your way to expanding your analytics arsenal.

So, what do you say? Try it… we know you’ll like it.

Try it - you'll like it... was published on SAS Users.