Having the ability to know what the best decision is in any given scenario sounds like a superpower. What may surprise you is that this “superpower” already exists. SAS® calls it intelligent decisioning. Decisioning is a powerful tool in the business world. It is useful to both the company and [...]
It’s safe to say that SAS Global Forum is a conference designed for users, by users. As your conference chair, I am excited by this year’s top-notch user sessions. More than 150 sessions are available, many by SAS users just like you. Wherever you work or whatever you do, you’ll find sessions relevant to your industry or job role. New to SAS? Been using SAS forever and want to learn something new? Managing SAS users? We have you covered. Search for sessions by industry or topic, then add those sessions to your agenda and personal calendar.
Creating a customizable agenda and experience
Besides two full days of amazing sessions, networking opportunities and more, many user sessions will be available on the SAS Users YouTube channel on May 20, 2021 at 10:00am ET. After you register, build your agenda and attend the sessions that most interest you when the conference begins. Once you’ve viewed a session, you can chat with the presenter. Don’t know where to start? Sample agendas are available in the Help Desk.
For the first time, proceedings will live on SAS Support Communities. Presenters have been busy adding their papers to the community. Everything is there, including full paper content, video presentations, and code on GitHub. It all premieres on “Day 3” of the conference, May 20. Have a question about the paper or code? You’ll be able to post a question on the community and ask the presenter.
Want training or help with your code?
Code Doctors are back this year. Check out the agenda for the specific times they’re available and make your appointment, so you’ll be sure to catch them and get their diagnosis of code errors. If you’re looking for training, you’ll be quite happy. Training is also back this year and it’s free! SAS instructor-led demos will be available on May 20, along with the user presentations on the SAS Users YouTube channel.
Chat with attendees and SAS
It is hard to replicate the buzz of a live conference, but we’ve tried our best to make you feel like you’re walking the conference floor. And we know networking is always an important component to any conference. We’ve made it possible for you to network with colleagues and SAS employees. Simply make your profile visible (by clicking on your photo) to connect with others, and you can schedule a meeting right from the attendee page. That’s almost easier than tracking down someone during the in-person event.
We know the exhibit hall is also a big draw for many attendees. This year’s Innovation Hub (formerly known as The Quad) has industry-focused booths and technology booths, where you can interact in real-time with SAS experts. There will also be a SAS Lounge where you can learn more about various SAS services and platforms such as SAS Support Communities and SAS Analytics Explorers.
Get started now
I’ve highlighted a lot in this blog post, but I encourage you to view this 7-minute Innovation Hub video. It goes in depth on the Hub and all its features.
This year there is no reason not to register for SAS Global Forum…and attend as few or as many sessions as you want. Why? Because the conference is FREE!
Where else can you get such quality SAS content and learning opportunities? Nowhere, which is why I encourage you to register today. See you soon!
I can’t believe it’s true, but SAS Global Forum is just over a month away. I have some exciting news to share with you, so let’s start with the theme for this year:
New Day. New Answers. Inspired by Curiosity.
What a fitting theme for this year! Technology continues to evolve, so each new day is a chance to seek new answers to what can sometimes feel like impossible challenges. Our curiosity as humans drives us to seek out better ways to do things. And I hope your curiosity will drive you to register for this year’s SAS Global Forum.
We are excited to offer a global event across three regions. If you’re in the Americas, the conference is May 18-20. In Asia Pacific? Then we’ll see you May 19-20. And we didn’t forget about Europe. Your dates are May 25-26. We hope these region-specific dates and the virtual nature of the conference means more SAS users than ever will join us for an inspiring event. Curious about the exciting agenda? It’s all on the website, so check it out.
Keynote speakers that you’ll talk about for months to come
Want to be inspired to chase your “impossible” dreams? Or hear more about the future of AI? How about learning about work-life balance and your mental health? We have you covered. SAS executives are gearing up to host an exciting lineup of extremely smart, engaging and thought-provoking keynote speakers like Adam Grant, Ayesha Khanna and Hakeem Oluseyi.
And who knows, we might have a few more surprises up our sleeve. You’ll just have to register and attend to find out.
Papers and proceedings: simplified and easy to find
Have you joined the SAS Global Forum online community? You should, because that’s where you’ll find all the discussion around the conference…before, during and after. It’s also where you’ll find a link to the 2021 proceedings, when they become available. Authors are busy preparing their presentations now and they are hard at work staging their proceedings in the community. Join the community so you can connect with other attendees and know when the proceedings become available.
Stay tuned for even more details
SAS Global Forum is the place where creativity meets curiosity, and amazing analytics happens! I encourage you to regularly check the conference website, as we’re continually adding new sessions and events. You don’t want to miss this year’s conference, so don’t forget to register for SAS Global Forum. See you soon!
Registration is open for a truly inspiring SAS Global Forum 2021 was published on SAS Users.
The people, the energy, the quality of the content, the demos, the networking opportunities…whew, all of these things combine to make SAS Global Forum great every year. And that is no exception this year.
Preparations are in full swing for an unforgettable conference. I hope you’ve seen the notifications that we set the date (actually, multiple dates around the world) so that you can enjoy the content in your region and in your time zone. No one needs to set their alarm for 1:00am to attend the conference!
Go ahead and save the date(s)…you don’t want to miss this event!
Content, content, content
We are working hard to replicate the energy and excitement of a live conference in the virtual world. But we know content is king, so we have some amazing speakers and content lined up to make the conference relevant for you. There will be more than 150 breakout sessions for business leaders and SAS users, plus the demos will allow you to see firsthand the innovative solutions from SAS, and the people who make them. I, for one, am looking forward to attending live sessions that will allow attendees the opportunity to ask presenters questions and have them respond in real time.
Our keynote speakers, while still under wraps for now, will have you on the edge of your seats (or couches…no judgement here!).
Networking and entertainment
You read that correctly. We will have live entertainment that'll have you glued to the screen. And you’ll be able to network with SAS experts and peers alike. But you don’t have to wait until the conference begins to network: the SAS Global Forum virtual community is up and running. Join the group to start engaging with other attendees, and maybe take a guess or two at who the live entertainment might be.
A big thank you
We are working hard to bring you the best conference possible, but this isn’t a one-woman show. It takes a team, so I would like to introduce and thank the conference teams for 2021. The Content Advisory Team ensures the Users Program sessions meet the needs of our diverse global audience. The Content Delivery Team ensures that conference presenters and authors have the tools and resources needed to provide high-quality presentations and papers. And, finally, the SAS Advisers help us in a multitude of ways. Thank you all for your time and effort so far!
Registration opens in April, so stay tuned for that announcement. I look forward to “seeing” you all in May.
There’s nothing worse than being in the middle of a task and getting stuck. Being able to find quick tips and tricks to help you solve the task at hand, or simply entertain your curiosity, is key to maintaining your efficiency and building everyday skills. But how do you get quick information that’s ALSO engaging? By adding some personality to traditionally routine tutorials, you can learn and may even have fun at the same time. Cue the SAS Users YouTube channel.
With more than 50 personality-filled videos published to date and over 10,000 hours watched, there’s no shortage of learning going on. Our team of experts loves to share their knowledge and passion (with personal flavor!) to give you solutions to those everyday tasks.
What better way to round out the year than provide a roundup of our most popular videos from 2020? Check out these crowd favorites:
- How to convert character to numeric in SAS
- How to import data from Excel to SAS
- How to export SAS data to Excel
Most hours watched
- How to import data from Excel to SAS
- How to convert character to numeric in SAS
- Simple Linear Regression in SAS
- How to export SAS data to Excel
- How to Create Macro Variables and Use Macro Functions
- The SAS Exam Experience | See a Performance-Based Question in Action
- How to Import CSV files into SAS
- SAS Certification Exam: 4 tips for success
- SAS Date Functions FAQs
- Merging Data Sets in SAS Using SQL
- Combining Data in SAS: DATA Step vs SQL
- How to Concatenate Values in SAS
- How to Market to Customers Based on Online Behavior
- How to Plan an Optimal Tour of London Using Network Optimization
- Multiple Linear Regression in SAS
- How to Build Customized Object Detection Models
Looking forward to 2021
We’ve got you covered! SAS will continue to publish videos throughout 2021. Subscribe now to the SAS Users YouTube channel, so you can be notified when we’re publishing new videos. Be on the lookout for some of the following topics:
- Transforming variables in SAS
- Tips for working with SAS Technical Support
- How to use Git with SAS
2020 roundup: SAS Users YouTube channel how to tutorials was published on SAS Users.
If you’re like me and the rest of the conference team, you’ve probably attended more virtual events this year than you ever thought possible. You can see the general evolution of virtual events by watching the early ones from April or May and comparing them to the recent ones. We at SAS Global Forum are studying the virtual event world, and we’re learning what works and what needs to be tweaked. We’re using that knowledge to plan the best possible virtual SAS Global Forum 2021.
Everything is virtual these days, so what do we mean by virtual?
Planning a good virtual event takes time, and we’re working through the process now. One thing is certain -- we know the importance of providing quality content and an engaging experience for our attendees. As always, but this time virtually, we want to give attendees the opportunity to learn from other SAS users, hear about new and exciting developments from SAS, and connect and network with experts, peers, partners and SAS. Yes, I said network. We realize it won’t be the same as a live event, but we are hopeful we can provide attendees with an incredible experience where you connect, learn and share with others.
Call for content is open
One of the differences between SAS Global Forum and other conferences is that SAS users are front and center, and the soul of the conference. We can’t have an event without user content. And that’s where you come in! The call for content opened November 17 and lasts through December 21, 2020. Selected presenters will be notified in January 2021. Presentations will be different in 2021; they will be 30 minutes in length, including time for Q&A when able. And since everything is virtual, video is a key component to your content submission. We ask for a 3-minute video along with your title and abstract.
The Student Symposium is back
Calling all postsecondary students -- there’s still time to build a team for the Student Symposium. If you are interested in data science and want to showcase your skills, grab a teammate or two and a faculty advisor and put your thinking caps on. Applications are due by December 21, 2020.
I encourage you to visit the SAS Global Forum website for up-to-date information, follow #SASGF on social channels and join the SAS communities group to engage with the conference team and other attendees.
Connect, learn and share during virtual SAS Global Forum 2021 was published on SAS Users.
Several of my colleagues and I attended the annual Esri User Conference last month in San Diego - along with 18,000 other Geo professionals. It was a busy week of meetings, seminars and talks about the latest in GIS and Spatial technologies. The days were long and exhausting, but it was also exciting and a ton of fun. As we continue to process, plan and prepare to integrate some of these technologies into SAS Visual Analytics, I thought it would be beneficial to highlight the Esri features available in VA today.
One topic that received a lot of questions during this year’s SAS Global Forum in Dallas was geocoding. Geocoding is the process of transforming text address data into numeric latitude and longitude values. Once the latitude and longitude are known, they can be mapped and analyzed spatially. SAS has offered geocoding capabilities for quite some time as part of SAS/GRAPH. Beginning with SAS 9.4M5, PROC GEOCODE has moved into Base SAS. See my colleague’s blog posts here and here for more information on geocoding from Base SAS.
But geocoding is no longer limited to just Base SAS. You can also geocode from within Visual Analytics, thanks to the integration with the Esri geocoding API. This feature is part of the Esri Premium agreement and became available in VA 8.3. Esri premium features require an existing relationship and credentials with Esri. This post assumes that relationship exists and your credentials have been validated. I will discuss the details of the Esri premium features in a future post, but for today the focus is how to use the Esri Geocoding feature from VA with a real-world data set.
1. Getting the data into Visual Analytics
We will be using point data from the City of Dallas for the Public Library branch locations. You can download the .csv file from the Dallas Open Data portal. After downloading, it must be imported into VA for geocoding.
- From the Data tab in VA, select Import > Local File
- Navigate to the location of the Dallas library .csv file and select it
- Adjust the default settings, if desired, and click the ‘Import Item’ button
- Once you see the green success message, the data has been imported into VA and is ready to be geocoded. Click the ‘Cancel’ button
2. Selecting the data columns to geocode on
Accessing the Geocoding feature in VA follows a similar process to the steps we just performed to import the .csv file.
- From the Data tab in VA, select Import > Esri > Geocode. Here, you must select the location of the newly imported library data set. This path will vary depending upon the configuration of your VA instance. For my installation, it is located at cas-shared-default > Public folder > CITY_OF_DALLAS_LIBRARY_LOCATIONS. Once located, click the 'Select' button
- The Geocoding Import window will open. This window should look familiar. The top half is the same as the Import data window we just used to get the .csv file into VA. Essentially, the geocoding process is a new data import. It will send selected columns to Esri via a REST API call. The response will contain the corresponding latitude and longitude values we desire. They will be added to our existing data set and imported into VA as a new geocoded data set. The name of the new data set will have _GEO_CODE appended to the end of the original data set name. This name can be modified as desired.
- At the bottom of the Geocoding Import window are two list boxes, Available items and Selected items. The Available items box on the left contains all columns in the data set. Select the column(s) containing the address information you wish to geocode. Double click or click the right arrow to move them to the Selected items window on the right. In this example, we are using the Address column.
- VA concatenates the selected column(s) to generate a sample address for geocoding. Clicking the ‘Test’ button returns coordinates for the sample address and a score representing the confidence level of the results. In the screenshot above, our score is 71/100 for the test address. Not bad, but it could be better. More on this a bit later.
- To finish the geocoding process, click the ‘Import Item’ button at the top of the page, as we did with the original .csv file import. This time, you will be presented with a new dialog window. Geocoding, like other Esri premium features, requires the use of credits. This dialog indicates how many Esri credits will be used by the geocoding process and will also be discussed in detail in a future post.
For now, select 'Yes' to continue. When you see the green success message, the operation is complete. We are now ready to map our Dallas Library locations. Click 'Ok' to open the new geocoded data set.
3. Create the geography variable and display the map
Next, we need to create our geography variable from the new geocoded data set. As part of the geocoding process, four new columns have been added to the new data set: esri_latitude, esri_longitude, esri_score, esri_address. We only need the esri_latitude and esri_longitude columns for our map.
- Select the Branch Name category variable and change its Classification to Geography
- For Geography data type, select Custom Coordinates
- Select esri_latitude for Latitude
- Select esri_longitude for Longitude
- Click 'OK'
- Drag the Branch Name geography variable to the canvas to create the map
What happened?? Our data set contains Dallas Public library locations, so why are the data points spread across the world? It’s all in the data. If you look at the original data a bit deeper, you will notice the Address field we selected for the geocoding only contains the street number and street name of the library location. It does not contain enough information to make it unique. Therefore, during the geocoding process, the first instance of that address will be considered a match, regardless of where it is actually located.
In the image above for the Preston Royal branch, its street number and name were a perfect match to a location in Eugene, Oregon. Not quite what we were looking for. So, how do we fix this? Making our addresses unique requires a simple addition to the source data .csv file.
We need to add a ‘City’ and ‘State’ column to the original .csv file with the values of ‘Dallas’ and ‘Texas’ assigned to all entries. This will ensure each address is unique and within our area of interest. Re-import the new .csv file and geocode it using the Address, City and State columns. The result? A confidence score of a perfect 100. Much better than our first attempt! This will now give us the map we desire for the Dallas Public Library locations.
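If you would rather script the change than edit the file by hand, the addition can be sketched in a few lines of Python. This is only an illustration: the column names Address, City and State come from this post, while the function name add_city_state and the two sample rows are made up for the example.

```python
import csv
import io


def add_city_state(csv_text, city="Dallas", state="Texas"):
    """Return CSV text with constant City and State columns appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["City", "State"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Same value for every entry, so each address becomes unambiguous.
        row["City"] = city
        row["State"] = state
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-row extract; the real file has more columns and rows.
sample = "Branch Name,Address\nBranch A,123 Example St\nBranch B,456 Sample Ave\n"
updated = add_city_state(sample)
print(updated)
```

For a real file, read its contents into a string, pass it to the function, and write the result to a new .csv file before re-importing it into VA.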
In this post, I used real-world data to illustrate two things: the importance of knowing your data set, and how to geocode address information in SAS Visual Analytics. Public data sets are a great resource but need to be used with a critical eye. They may still need additional cleansing in order to work for your situation.
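One way to apply that critical eye to geocoded output is a simple bounding-box check. The sketch below assumes the geocoded rows have been exported with the esri_latitude and esri_longitude columns described earlier. The bounding box values are rough approximations for the Dallas area, and the function name and sample rows are hypothetical.

```python
# Rough bounding box around Dallas, Texas (approximate values chosen for illustration).
DALLAS_BBOX = {"min_lat": 32.5, "max_lat": 33.1, "min_lon": -97.1, "max_lon": -96.5}


def outside_bbox(rows, bbox=DALLAS_BBOX):
    """Return the rows whose esri_latitude/esri_longitude fall outside bbox."""
    flagged = []
    for row in rows:
        lat = float(row["esri_latitude"])
        lon = float(row["esri_longitude"])
        inside = (bbox["min_lat"] <= lat <= bbox["max_lat"]
                  and bbox["min_lon"] <= lon <= bbox["max_lon"])
        if not inside:
            flagged.append(row)
    return flagged


# Hypothetical geocoded rows: one near downtown Dallas, one near Eugene, Oregon.
rows = [
    {"Branch Name": "Branch A", "esri_latitude": "32.78", "esri_longitude": "-96.80"},
    {"Branch Name": "Branch B", "esri_latitude": "44.05", "esri_longitude": "-123.09"},
]
suspect = outside_bbox(rows)
print([r["Branch Name"] for r in suspect])
```

Any row that lands in the flagged list is a candidate for the kind of ambiguous match described above and deserves a closer look at its source address.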
The geocoding feature is one example of the premium Esri features currently available in VA. In future posts, I will go into more detail on other Esri features available, what makes these features ‘premium’ and examples of how to use them. Stay tuned!
Esri integration with SAS Visual Analytics: Geocoding was published on SAS Users.
If you spend any time working with maps and spatial data, having a fundamental understanding of coordinate systems and map projections becomes necessary. It’s the foundation of how spatial data and maps work. These areas invariably evoke trepidation and some angst, even in the most seasoned map professional. And rightfully so: it can get complicated quickly. Fortunately, most of those worries can be set aside when creating maps with SAS Visual Analytics, without requiring a degree in Geodesy.
Visual Analytics includes several different coordinate system definitions configured out-of-the-box. Like the Predefined geography types (see Fundamentals of SAS Visual Analytics geo maps), they are selected from a drop-down list during the geography variable setup. With the details handled by VA, all you need to know is what coordinate space your data uses and select the appropriate one.
The four Coordinate spaces included with VA are:
- World Geodetic System (WGS84)
Area of coverage: World. Used by GPS navigation systems and NATO military geodetic surveying. This is the VA default and should work in most situations.
- Web Mercator
Area of coverage: World. Format used by Google maps, OpenStreetMap, Bing maps and other web map providers.
- British National Grid (OSGB36)
Area of coverage: United Kingdom – Great Britain, Isle of Man
- Singapore Transverse Mercator (SVY21)
Area of coverage: Singapore onshore/offshore
But what if your data does not use one of these? For those situations, VA also supports custom coordinate spaces. With this option, you can specify the definition of your desired coordinate space using industry standard formats for EPSG codes or Proj4 strings. Before we get into the details of how to use custom coordinate spaces in VA, let’s take a step back and review the basics of coordinate spaces and projections.
A coordinate space is simply a grid designed to cover a specific area of the Earth. Some have global coverage (WGS84, the default in VA) and others cover relatively small areas (SVY21/Singapore Transverse Mercator). Each coordinate space is defined by several parameters, including but not limited to:
- Center coordinates (origin)
- Coverage area (‘bounds’ or ‘extent’)
- Unit of measurement (feet or meters)
The image above compares the four coordinate space definitions included with VA. The two on the right, BNG and Singapore Transverse Mercator, have a limited extent. A red rectangle outlines the area of coverage for each region. The two on the left, WGS84 and Mercator, are both world maps. At first glance, they may appear to have the same coverage area, but they are not interchangeable. The origin for both is located at the intersection of the Equator and the Prime Meridian. However, the similarities end there. Notice the extent for WGS84 covers the entire latitude range, from -90 to +90. Mercator, on the other hand, covers from about -85 to +85 latitude, so roughly the first 5 degrees from each Pole are not included. Another difference is the unit of measurement. WGS84 is measured in un-projected degrees, which is indicative of a spherical Geographic Coordinate System (GCS). Mercator uses meters, which implies a Projected Coordinate System (PCS) used for a flat surface, i.e., a screen or paper.
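To make the degrees-versus-meters difference concrete, here is a minimal sketch of the standard spherical Web Mercator formulas. It is not how VA implements the conversion internally; the function names are chosen for this example, and the radius is the WGS84 semi-major axis that Web Mercator conventionally uses.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def max_web_mercator_latitude():
    """Latitude (degrees) where y reaches pi * R, which makes the world map square."""
    return math.degrees(2 * math.atan(math.exp(math.pi)) - math.pi / 2)


print(wgs84_to_web_mercator(-96.80, 32.78))  # a point near Dallas, in meters
print(max_web_mercator_latitude())           # slightly more than 85 degrees
```

The second function shows where the latitude cutoff comes from: y grows without bound toward the Poles, so the map is cut off at the latitude where its height equals its width.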
The projection itself is a complex mathematical operation that transforms the spherical surface of the GCS into the flat surface of the PCS. This transformation introduces distortion in one or more qualities of the map: shape, area, direction, or distance. The process of map projection compares to peeling an orange. Removing the peel and placing it on a flat surface will cause parts of it to stretch, tear or separate as it flattens. The same thing happens to a map projection.
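The amount of stretching can be put into numbers. For a spherical Mercator projection, lengths at a given latitude are stretched by a factor of 1 / cos(latitude), and areas by the square of that factor. The short sketch below (function name chosen for illustration) prints the factor at a few latitudes.

```python
import math


def mercator_scale_factor(lat_deg):
    """Linear stretch of a spherical Mercator map at a latitude (1.0 at the Equator)."""
    return 1.0 / math.cos(math.radians(lat_deg))


for lat in (0, 30, 60, 80):
    k = mercator_scale_factor(lat)
    print(f"lat {lat:>2}: lengths x{k:.2f}, areas x{k * k:.2f}")
```

At 60 degrees latitude, lengths are doubled and areas quadrupled, which is why high-latitude land masses look so large on Mercator world maps.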
A flat map will always have some degree of distortion. The amount of distortion depends on the projection used. Select a projection that minimizes the distortion in the areas most important to the map. For example, are you creating a navigation map where direction is critical? How about a World map to compare land mass of various countries? Or maybe a local map of Municipality services where all factors are equally important? These decisions are important if you are collecting and creating your data set from the field. But, if you are using existing data sets, chances are that decision has already been made for you. It then becomes a task of understanding what coordinate system was selected and how to use it within VA.
Using a Custom Coordinate Space in VA
When using VA’s custom coordinate space option, it is critical the geography variable and the dataset use the same coordinate space. This tells VA how to align the grid used by the data with the grid used by the underlying map. If they align, the data will be placed at the expected location. If they don’t align, the data will appear in the wrong location or may not be displayed at all.
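A quick range check can catch the most obvious mismatch before you ever open VA. The sketch below only tests whether a coordinate pair could be WGS84 degrees at all. The function name and the sample values are hypothetical, and passing the check does not prove the data is WGS84.

```python
def could_be_wgs84_degrees(x, y):
    """True when (x, y) fits inside the longitude/latitude range of WGS84."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


# Approximate longitude/latitude of Oklahoma City, in degrees.
print(could_be_wgs84_degrees(-97.5, 35.5))

# Hypothetical projected coordinate in feet, typical of a State Plane data set.
print(could_be_wgs84_degrees(2100000.0, 170000.0))
```

Values in the hundreds of thousands or millions are a strong hint that the data uses a projected coordinate system measured in feet or meters.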
To illustrate the process of using a custom coordinate space in VA, we will be creating a custom region map of the Oklahoma City School Districts. The data can be found on the Oklahoma City Open Data Portal. We will use the Esri shapefile format. As you may recall from a previous blog post, Creating custom region maps with SAS Visual Analytics, the first step is to import the Esri shapefile data into a SAS dataset.
Once the shapefile has been successfully imported into SAS, we then must determine the coordinate system of the data. While WGS84 is common and will work in many situations, it should not be assumed. The first place to look is at the source, the data provider. Many Open Data portals will have the coordinate system listed along with the metadata and description of the dataset. But when using an Esri shapefile, there is an easier way to find what we need.
Locate the directory where you unzipped the original shapefile. Inside of that directory is a file with a .prj extension. This file defines the projection and coordinate system used by the shapefile. Below are the contents of our .prj file with the first parameter highlighted. We are only interested in this value. Here, you can see the data has been defined in the Oklahoma State Plane coordinate system -- not in VA’s default WGS84. So, we must use a custom coordinate system when defining the geography variable.
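Because a .prj file is plain text in well-known text (WKT) format, the first parameter can also be pulled out programmatically. The sketch below is an illustration only: the sample string is an abbreviated stand-in for a real .prj file, which carries many more parameters, and the function name is made up for this example.

```python
import re


def coordinate_system_name(prj_text):
    """Return the name that follows the leading PROJCS[ or GEOGCS[ keyword, or None."""
    match = re.match(r'\s*(?:PROJCS|GEOGCS)\["([^"]+)"', prj_text)
    return match.group(1) if match else None


# Abbreviated, illustrative stand-in for the contents of a .prj file.
sample_prj = (
    'PROJCS["NAD_1983_StatePlane_Oklahoma_North_FIPS_3501_Feet",'
    'GEOGCS["GCS_North_American_1983"]]'
)
name = coordinate_system_name(sample_prj)
print(name.replace("_", " "))
```

A file that starts with PROJCS describes a projected coordinate system, while one that starts with GEOGCS describes a geographic one measured in degrees.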
Next, we need to look up the Oklahoma State Plane coordinate system to find a definition VA understands. From the main page of the SpatialReference.org website, type ‘Oklahoma State Plane’ into the search box. Four results are returned. Compare the results with the string highlighted above. You can see the third option is what we are looking for: NAD 1983 StatePlane Oklahoma North FIPS 3501 Feet.
To get the definitions we need for VA, click the third link for the option NAD 1983 StatePlane Oklahoma North FIPS 3501 Feet. Here you will see a grey box with a bulleted list of links. Each of these links represents a definition for the Oklahoma StatePlane coordinate space.
Visual Analytics supports two of the listed formats, EPSG and Proj4. EPSG stands for European Petroleum Survey Group, an organization that publishes a database of coordinate system and projection information. The syntax of this format is epsg:<number> or esri:<number>, where <number> is a 4-6 digit code for the desired coordinate system. In our case, the format we need is the title of the page: ESRI:102724.
The second format supported by VA is Proj4, the third link in the image above. This format consists of a string of space-delimited name value pairs. The Oklahoma StatePlane proj4 definition we are interested in is:
+proj=lcc +lat_1=35.56666666666667 +lat_2=36.76666666666667 +lat_0=35 +lon_0=-98 +x_0=600000.0000000001 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs
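Since the string is just space-delimited name-value pairs, it is easy to take apart and inspect. The following sketch (the function name parse_proj4 is made up for this example) splits the definition above into a dictionary.

```python
def parse_proj4(definition):
    """Split a Proj4 string into a dict; flags without '=' (e.g. +no_defs) map to True."""
    params = {}
    for token in definition.split():
        key, sep, value = token.lstrip("+").partition("=")
        params[key] = value if sep else True
    return params


OKLAHOMA_NORTH_FT = (
    "+proj=lcc +lat_1=35.56666666666667 +lat_2=36.76666666666667 "
    "+lat_0=35 +lon_0=-98 +x_0=600000.0000000001 +y_0=0 "
    "+ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs"
)
params = parse_proj4(OKLAHOMA_NORTH_FT)
print(params["proj"], params["datum"], params["to_meter"])
```

Here proj=lcc identifies a Lambert conformal conic projection, and the to_meter value is the length of one US survey foot in meters, which matches the ‘Feet’ in the coordinate system name.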
Now that we have identified the coordinate system used by our data set and looked up its definition, we are ready to configure VA to use it.
Using a Projected Coordinate System definition in VA
The following section assumes you are familiar with custom region maps and setting up a polygon provider. If not, see my previous post on that process, Creating custom region maps with SAS Visual Analytics. The first step in setting up a geography variable for a custom region map is to start with the polygon provider. At the bottom of the ‘Edit Polygon Provider’ window, there is an ‘Advanced’ section that is collapsed by default. Expand it to see the Coordinate Space option. By default, it is populated with the value EPSG:4326, which is the EPSG code for WGS84. Since our Oklahoma City School District data does not use WGS84, we need to replace this value with the code that we looked up from SpatialReference.org (ESRI:102724).
Next, we must make sure to configure the geography variable itself with the same coordinate space as the polygon provider. On the ‘Edit Geography Item’ window, the Coordinate Space option is the last item. Again, we must change this from the default WGS84 to ESRI:102724. From the dropdown list, select the option ‘Custom’. A new entry box appears where we can enter the custom coordinate space definition. If configured correctly, you should see your map in the preview thumbnail and a 100% mapped indicator.
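Before pasting a definition into either Coordinate Space box, a quick format check can catch typos. The sketch below follows the two formats described in this post, an authority code with 4-6 digits or a Proj4 string. It is an assumption about what a well-formed value looks like, not a description of VA's own validation, and the case-insensitive match is a choice made for this example.

```python
import re

# epsg:<number> or esri:<number>, where <number> has 4 to 6 digits.
AUTHORITY_CODE = re.compile(r"^(epsg|esri):\d{4,6}$", re.IGNORECASE)


def definition_kind(text):
    """Classify a coordinate space definition as 'code', 'proj4' or 'unknown'."""
    text = text.strip()
    if AUTHORITY_CODE.match(text):
        return "code"
    if text.startswith("+proj="):
        return "proj4"
    return "unknown"


for candidate in ("ESRI:102724", "EPSG:4326", "+proj=lcc +lat_0=35", "WGS84"):
    print(candidate, "->", definition_kind(candidate))
```

Using the same checked string in both the polygon provider and the geography variable helps keep the two settings aligned.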
Congratulations! The setup was successful. Now, simply click OK and drag the geography variable to the canvas. VA’s auto-map feature will recognize it and display the custom region map.
In this post, I showed how to identify the coordinate system of your Esri shapefile data, look up its EPSG and Proj4 definitions, and configure VA to use it via the Custom Coordinate space option. While the focus was on a custom region map, the technique also applies to Custom Coordinate maps, minus the polygon provider setup. The support of custom coordinate spaces in VA allows the mapping of practically any spatial dataset, giving you a new level of power and flexibility in your mapping efforts.
In today’s modern world, GPS-enabled devices are ubiquitous, and their use continues to increase daily. Cell phones, cars, fitness trackers, and cameras are all able to locate and track our position. As a result, the location analytics market is expected to grow to over USD 16 billion by 2021, up 17.6% from 2016.
Waldo Tobler, an American-Swiss geographer and cartographer, developed his First Law of Geography based on this concept of everything happening somewhere. He stated, “Everything is related to everything else, but near things are more related than distant things”. As analytic professionals, we are accustomed to working with these correlations using scatterplots, heatmaps, or clustering models. But what happens when we add a geographic map into the analysis?
Maps offer the ability to unlock a new level of insight into our data that traditional graphs do not offer: personal connection. As humans, we naturally relate to our surroundings on a spatial level. It helps build our perspective and frame of reference through which we view and navigate the world. We feel a sense of loss when a physical landmark from our childhood – a building, tree, park, or route we used to walk to school – is destroyed or changed from the memories we have of it. In this sense, we are connected, spatially and emotionally, to our surroundings.
We inherently understand how data relates to the world around us, at some level, just by viewing it on a map. Whether it is a body of water or a mountain affecting a driving route or maybe a trendy area of a city causing housing prices to increase faster than the local average, a map connects us with these facts intuitively. We come to these basic conclusions based solely on our experiences in the world and knowledge of the physical landmarks in the map.
One of the best examples of this is the 1854 cholera outbreak in London. Dr. John Snow was one of the first to use a map to understand the origin of an epidemiological outbreak. He created a map of the affected London neighborhood by plotting the location of all known cholera deaths. He also plotted the location of 13 community wells that served as the public water supply. Using this data, he was able to see a clustering of deaths around a single pump. Armed with this information, Dr. Snow convinced local officials to remove the handle from the Broad Street pump. Once it was removed, new cases of cholera quickly began to diminish. This helped prove his theory that the outbreak was water-borne rather than air-borne, as was commonly believed at the time.
Let’s look at how Dr. Snow’s map helped mitigate the outbreak and prove his theory. The image above compares the data of the recorded deaths and community wells in tabular form to a Coordinate map. It is obvious from the coordinate map that there is a clustering of points. Town officials and those familiar with the neighborhood could easily get a sense of where the outbreak was concentrated. The map told a better story by connecting their personal experience of the area to the locations of the deaths and ultimately to the wells. That is something a data table or traditional graph could not do.
Today, with the computing power and modern analytic methods available to us, we can take the analysis even further. The examples above show the same coordinate map with added Voronoi polygon and cluster analysis overlays. The concentration around the Broad Street pump becomes even clearer, showing why geographic maps are an important tool to have in your analytic toolbox.
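The idea behind a Voronoi overlay can be sketched in a few lines of Python: each Voronoi cell contains the points that are closer to one site (here, a pump) than to any other, so assigning every recorded death to its nearest pump and counting the results gives the same grouping. The coordinates and pump names below are made up for illustration and are not Dr. Snow’s actual data; straight-line distance is also an assumption, since walking distance along streets would differ.

```python
import math
from collections import Counter

# Hypothetical planar coordinates in arbitrary units (NOT the real 1854 data).
pumps = {
    "Pump A": (0.0, 0.0),
    "Pump B": (10.0, 0.0),
    "Pump C": (5.0, 8.0),
}
deaths = [(0.5, 0.2), (1.0, -0.5), (-0.3, 0.8), (0.9, 1.1), (9.0, 0.5), (5.2, 7.0)]


def nearest_pump(point, pumps):
    """Return the name of the pump closest to `point` by straight-line distance."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


# Assigning a point to its nearest site is the same as finding
# which Voronoi cell the point falls in.
deaths_per_pump = Counter(nearest_pump(d, pumps) for d in deaths)

for name, count in deaths_per_pump.most_common():
    print(name, count)
```

A pump whose count is far higher than its neighbors’ stands out immediately, which is the same signal the map overlay conveys visually.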
SAS Global Forum 2019 is being held April 28-May 1, 2019 in Dallas, Texas. If you are planning to go to this year’s event, be sure to attend one of our presentations on the latest mapping features included in SAS Visual Analytics and BASE SAS. While you’re there, don’t forget to stop by the SAS Mapping booth located in the QUAD to say ‘Hi!’ and let us help with your spatial data needs. See you in Dallas!
- Monday, April 29, 4:30-5:30p, Room: Level 1, D162
- Tuesday, April 30, 4:00-4:30p, Room: Level 1, D162
- Wednesday, May 1, 11:30a-12:30p, Room: Level 1, D162
SAS Visual Analytics supports region maps for countries, US states, and provinces out-of-the-box. These work well for small-scale maps covering the world, a continent, or a single country. However, other regions are often needed. Beginning in version 8.3, VA supports custom polygons to display regions such as sales territories, counties, or zip codes.
Region (choropleth) maps use a fill color to show relationships between the regions based upon a response value from your data. Using custom polygons in VA follows the same steps outlined in previous posts for predefined or custom coordinate geography items, with just a few additional steps. Here’s the basic flow:
- Identify your data
- Import polygon shapefile into SAS dataset
- Import the shape dataset into VA
- Create a Custom polygon provider
- Create the geography item
- Create and customize the map
Before we begin
VA supports two sources for creating custom polygons: Esri shapefiles and Esri Feature Services. The goal for this post is to show how to create custom polygons using an Esri shapefile.
Typically, when working with custom polygons, you will have two datasets: the first defines the custom regions (shape data) and the second contains the data you wish to map (business data). The shape data is derived from an Esri shapefile or feature service. The business data can be in a shapefile or any format supported by VA (.sas7bdat, .csv, .xls, etc.). It contains the information you want to analyze, distributed across the regions defined by the shape data.
It is recommended that you verify the imported shape data before using it in your final map. This will confirm the data is valid and make debugging an issue easier should you encounter any errors. To verify, use the same dataset for both the shape and business data. The example below will use this approach.
Access to a GIS application such as Esri’s ArcGIS or QGIS is recommended. There are two areas where they can help you prepare to use custom polygons in your VA map:
- Creating a shapefile to define polygons specific to your business need or application
- Viewing the attribute table of an existing shapefile to determine its unique identifier column
For this example, we will be creating a map of registered Neighborhood Associations in Boise, Idaho. To follow along, download the data from the City of Boise open data site: Boise Neighborhood Associations
1. Identify your data
The shape data defining the custom regions needs to be in an Esri shapefile format. These files can be created in a GIS application or obtained from a wide variety of online sources such as: the US Census Bureau (http://www.census.gov); local and state municipalities; state agencies such as the Department of Transportation; and university GIS departments. Most municipalities now have Open Data portals that provide a wealth of reliable data for public use. These sources are maintained by dedicated staff and are updated regularly.
The business data can be specific to your company’s operation or customer base. Or it can be broad and general, using census or demographic information. It answers the question of what you want to analyze on the map. The business data must contain a column that aligns with your shape data. For example, if you want to map the age distribution and spending habits of your target customers across zip codes, then your business data must have a column for zip codes that allows it to be joined to a zip code region in the shape data.
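As a rough illustration of that alignment, the Python sketch below summarizes customer records by zip code and attaches the summary to each region through the shared zip code column. The records, column names, and values are all hypothetical; in practice VA performs the join for you once the geography item is configured.

```python
from collections import defaultdict

# Hypothetical shape data: one record per region, keyed by zip code.
shape_regions = [{"zip": "83702"}, {"zip": "83703"}, {"zip": "83704"}]

# Hypothetical business data: one record per customer.
customers = [
    {"zip": "83702", "age": 34, "spend": 120.0},
    {"zip": "83702", "age": 46, "spend": 80.0},
    {"zip": "83704", "age": 29, "spend": 200.0},
]

# Summarize the business data per zip code.
totals = defaultdict(lambda: {"n": 0, "age": 0, "spend": 0.0})
for row in customers:
    summary = totals[row["zip"]]
    summary["n"] += 1
    summary["age"] += row["age"]
    summary["spend"] += row["spend"]

# Join the summaries to the regions on the shared zip code column.
# Regions with no customers are kept, with empty values.
joined = []
for region in shape_regions:
    summary = totals.get(region["zip"])
    joined.append({
        "zip": region["zip"],
        "customers": summary["n"] if summary else 0,
        "avg_age": summary["age"] / summary["n"] if summary else None,
        "total_spend": summary["spend"] if summary else 0.0,
    })

for row in joined:
    print(row)
```

Without the common zip code column there would be nothing to join on, which is why the business data must carry it.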
2. Import polygon data into a SAS dataset
VA 8.3 does not support the native shapefile format. To use a shapefile in VA, you must first import it into SAS. Included with Viya 3.4, the %shpimprt macro will convert a shapefile into a SAS dataset and load it into CAS. You can find the documentation for it here: %shpimprt documentation.
Alternatively, the shapefile can be manually imported with these basic steps:
- Import the shapefile into SAS
- Add a sequence column to the dataset
- Reduce the density of the dataset
- Limit the dataset based on the density value
Additional details and sample code for each of these steps can be found in the text file linked here: Manual shapefile import steps.
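To show why the sequence column is added before the dataset is thinned out, here is a small Python sketch of the last three steps. It is not the SAS code from the linked file: the vertex rows, the density values, and the threshold are all hypothetical, and the density scores are simply given rather than computed by a density-reduction step.

```python
# Hypothetical vertex rows: (region id, x, y, density).
# A higher density value marks a vertex that adds less detail to the outline.
vertices = [
    (1, 0.0, 0.0, 0),
    (1, 0.5, 0.01, 5),
    (1, 1.0, 0.0, 0),
    (1, 1.0, 1.0, 1),
    (1, 0.0, 1.0, 0),
    (2, 2.0, 0.0, 0),
    (2, 2.5, 0.02, 6),
    (2, 3.0, 0.0, 0),
    (2, 2.5, 1.0, 0),
]

MAX_DENSITY = 1  # hypothetical threshold: keep vertices at or below this value

# Add a sequence number that records the original drawing order.
sequenced = [
    {"id": rid, "x": x, "y": y, "density": density, "sequence": seq}
    for seq, (rid, x, y, density) in enumerate(vertices, start=1)
]

# Limit the dataset by density. The sequence values of the surviving
# rows still describe the order in which each outline should be drawn.
reduced = [row for row in sequenced if row["density"] <= MAX_DENSITY]

for row in reduced:
    print(row["id"], row["sequence"], row["x"], row["y"])
```

Because the sequence is assigned first, dropping rows leaves gaps in the numbering but never changes the relative order of the remaining vertices.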
3. Import the shape dataset into VA
Next, if you used the manual shapefile import process, import the dataset into VA. To do this, locate the data pane on the left of VA. From the ‘Open Data Source’ window, select Import > Local File. Navigate to the location of the SAS dataset created in Step 2 and click the Open button.
Adjust the target location as needed, based on your VA installation, and make note of the location selected. This path will be required to configure the custom polygon provider. Review and adjust the other options as needed. Click the blue ‘Import Item’ button at the top of the window to start the import process. A message will appear indicating the import status. Upon successful import, click the 'OK' button to open the dataset.
Since we are using the same dataset for the shape and business data, we need to make a copy of the category variable that will be used for our map. Right-click on ‘ASSOCIATIO’ and select ‘Duplicate’. Next, let’s change the names of both variables to better distinguish them from one another:
- Change ‘ASSOCIATIO’ to ‘Business data’
- Change ‘ASSOCIATIO (1)’ to ‘Shape data’
4. Create the geography item
We are now ready to start creating the geography item. With Custom polygons, an additional step is required beyond what was described in previous posts with predefined and custom coordinates geography items. We must define a Custom Polygon provider so VA knows how to locate and display the Boise Neighborhood Associations. This is needed only once and is part of the geography item setup you are familiar with.
Our goal is to map the regions of the Boise Neighborhood Associations, so we will use ‘Shape data’ for our geography item. Locate it in the VA data panel and change its Classification type to ‘Geography’. From the ‘Geography data type’ dropdown, select ‘Custom polygonal shapes’. Several new fields will be displayed. In the ‘Custom polygon provider’ dropdown, click the ‘Define new polygon provider’ button.
A ‘New Polygon Provider’ window will appear. All fields shown are required. The Advanced section has additional options, but they are not needed for this example.
Configure the fields based on the following:
- Name / Label – Enter ‘Boise Neighborhoods’ for both (these values do not have to be the same)
- Type – The default CAS Table is the correct option for this example.
- Server / Library – These values must match those used for the data upload in Step 3.
- Table – Select the name of the table uploaded in Step 3 (Boise_Neighborhoods)
- ID Column – The unique identifier column of the dataset. Used to join the shape and business data together. (Select OBJECTID)
- Sequence Column – This column is created during the import process from Step 2. Needed by VA to display the custom regions. (Select SEQUENCE)
The custom polygon provider is now configured. All that is needed to finish the geography item setup is to identify the Region ID. This is the crucial step that will join the shape data to the business data. The Region ID column must match the ID Column chosen when the custom polygon provider was set up. Since we are using the same dataset in this example, that value is the same: OBJECTID.
In cases where different datasets are used for the shape and business data, the names of the Region ID and ID Column may differ. The column labels are not important, but their content must match for the join to occur.
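When the two datasets come from different sources, it can help to compare the key values before building the map. The Python sketch below is one way to do that outside of VA; the helper name `compare_keys` and the sample values are hypothetical, and it does not model how VA itself performs the join.

```python
def compare_keys(shape_ids, business_ids):
    """Report which key values appear in both datasets and which appear in only one."""
    shape_set = set(shape_ids)
    business_set = set(business_ids)
    return {
        "matched": sorted(shape_set & business_set),
        "regions_without_data": sorted(shape_set - business_set),
        "data_without_region": sorted(business_set - shape_set),
    }


# Hypothetical key values: numeric IDs in the shape data,
# the same IDs stored as text in the business data.
shape_ids = [1, 2, 3, 4]
business_ids = ["1", "2", "5"]

# Compared as-is, nothing matches because 1 and "1" are different values.
raw_report = compare_keys(shape_ids, business_ids)
print(raw_report["matched"])

# After converting both sides to text, the overlap becomes visible.
report = compare_keys([str(v) for v in shape_ids], business_ids)
print(report)
```

Any value listed under `data_without_region` would have no polygon to draw, and any value under `regions_without_data` would have no response value to shade.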
Notice that once you select the correct Region ID value, the preview window will display the custom regions from the imported shape data. The Latitude and Longitude columns are not required in this example. Click the ‘OK’ button to finish the setup.
5. Create and customize the map
You are now ready to create your map. Drag the Boise Neighborhoods geography item to the report canvas. Let’s enhance the appearance of our map by making a few style changes:
- Set a Color role to shade the Neighborhood Association regions (Roles > Color > Business data)
- Position the legend on the left of the map (Options > Legend)
- Adjust the transparency of the fill color to 45% (Options > Map Transparency)
- Change the map service to Esri World Street Map (Options > Map service)
Congratulations! You have just created your first custom region map. In this post we discussed how to use the Custom Polygon provider to define your own regions using an Esri shapefile. Compared to the Predefined and Custom Coordinate options, custom polygons give you additional flexibility and control over how your spatial data is analyzed.
Creating custom region maps with SAS Visual Analytics was published on SAS Users.