Falko Schulz

6月 072018

Light lengthens the day and allows us more time to learn, socialize, contemplate and create. Exploring NASA nighttime satellite images shows how illumination patterns have changed over time. Increases and decreases in illumination show the effects of human civilization on earth. From population collapse and destruction in war zones to economic [...]

The dark side: analyzing global changes in nighttime illumination was published on SAS Voices by Falko Schulz

4月 272018

Analyzing ticket sales and customer data for large sports and entertainment events is a complex endeavor. But SAS Visual Analytics makes it easy, with location analytics, customer segmentation, predictive artificial intelligence (AI) capabilities – and more. This blog post covers a brief overview of these features by using a fictitious event company [...]

Analyze ticket sales using location analytics and customer segmentation in SAS Visual Analytics was published on SAS Voices by Falko Schulz

1月 312018

Can you use a data visualization tool to display building maps, floor designs and other Esri data? With the recent addition of custom polygon support in SAS Visual Analytics 8.2, customers wondered if this feature can be utilized to render different types of regional overlays. A common request is to [...]

Building and visualizing custom polygons in SAS Visual Analytics was published on SAS Voices by Falko Schulz

4月 102017

Earth is an explosive world! Data from the Smithsonian Institution's Global Volcanism Program (GVP) documents Earth's volcanoes and their eruptive history over the past 10,000 years. The GVP database includes the names, locations, types, and features of more than 1,500 volcanoes. Let's look closer into volcanic eruptions across the globe [...]

How to design an infographic about volcanic eruptions using SAS Visual Analytics was published on SAS Voices by Falko Schulz

3月 282017

How many meteorites have hit the earth in the last 4,000 years? Where have they landed? And which ones were the biggest? Can we show all of this information - and more in an intuitive data visualization? It turns out NASA provides public data about recorded meteorite impacts on earth all the [...]

How to design a meteorite infographic using NASA data and SAS was published on SAS Voices by Falko Schulz

10月 202016

The study of social networks has gained importance over the years within social and behavioral research on HIV and AIDS. Social network research can show routes of potential viral transfer, and be used to understand the influence of peer norms and practices on the risk behaviors of individuals. This example analyzes the […]

Analyzing social networks using Python and SAS Viya was published on SAS Voices.

8月 192014
Introduction Understanding the behavior of your customers is key to improving and maintaining revenue streams. It is a an important part when crafting successful marketing campaigns. Using SAS Visual Analytics you can analyze, explore and visualize user behavior, click paths and other event-based scenarios. Monitoring the customer journey by visualizing all […]
12月 192013
Being so close before Christmas I thought it would be a good idea to see what route Santa Claus is planning this year. Not just because I'm living in Australia and Santa usually comes in t-shirts & shorts but also because it's a long way to get down here. So maybe SAS can help here by applying some advanced analytics to optimize his busy tour schedule?

So where to start. A website called Travel by GPS has a web page dedicated to tracking Santa. They also offer to download Santa's route as Google Earth KML file. Perfect! So let's have a look what the current route looks like.

Creating XML map files for Google Earth KML

The simplest way of importing XML documents into SAS is by using the XML Libname Engine. The way the engine maps XML tags into individual columns is by using a XML map file. The following code snippet creates a temporary map file which is referenced by the XML libname statement.

/* Generate the XML map to import graph attribute keys from the XML */
filename keymap temp;
data _null_; 
  infile datalines truncover; 
  file keymap; 
  input line $1000.; 
  put line; 
<?xml version="1.0" encoding="utf-8"?>
<SXLEMAP name="KMLMAP" version="2.1">
  <TABLE name="route">
    <TABLE-PATH syntax="XPathENR">/kml/Document/Placemark</TABLE-PATH>

    <COLUMN name="name">
      <PATH syntax="XPathENR">/kml/Document/Placemark/name</PATH>
    <COLUMN name="description">
      <PATH syntax="XPathENR">/kml/Document/Placemark/description</PATH>
    <COLUMN name="coordinates">
      <PATH syntax="XPathENR">/kml/Document/Placemark/Point/coordinates</PATH>
filename graph "Naughty_and_Nice.kml" encoding="utf-8";
libname graph xmlv2 xmlmap=keymap access=readonly;

Import Google Earth KML data into SAS

Since the libname already gives us access to the XML document as SAS data set - it would be just a matter of referencing the table via a Data Step. However looking at the actual content - it appears columns such as description contain some unnessary HTML tags.

So let's extend a data step here to perform some minor data manipulation using Perl Regular Expressions. We are also going to split the coordinates into two individual numeric columns so we can reference these later within SAS Visual Analytics.

data route;
  length stop 8.;
  set graph.route end=last;
  stop = _n_;
  /* remove carriage returns */
  description = prxchange('s/(\n|\r)//', -1, description);
  /* create regular expression to catch the text between the tags*/
  retain re;
  if _N_ = 1 then
      re = prxparse('/(?)(.*?)(?=)/');
  /* extract the text */
  if prxmatch(re, description) then
      description = prxposn(re, 1, description);
  /* split coordinates in lat/lng */
  length lat 8. lng 8.;
  lng = input(scan(coordinates,1,","),8.);
  lat = input(scan(coordinates,2,","),8.);
  /* add a link to the next stop */
  length next_stop 8.;
  if not last then
    next_stop = stop + 1;
  /* throw away things we don't need */
  drop re coordinates;

Calculate distance between two route points

To understand Santa's route it would be beneficial to see the actual distance he needs to travel between two points. SAS provides a great geodist function to do so. And finally we are also going to create a route from one stop to the next by joining the table with itself.

proc sql noprint;
	create table Naughty_and_Nice as
	select 	a.stop label="Stop",
		a.next_stop label="Next Stop",
		a.name label="Client",
		a.description label="Description",
		a.lat label="Latitude",
		a.lng label="Longitude",
		b.lat as next_lat label="Next Latitude",
		b.lng as next_lng label="Next Longitude",
		geodist(a.lat, a.lng, b.lat, b.lng) as distance label="Distance (km)"
	from route a left join route b on a.next_stop eq b.stop
	order by a.stop

Explore the data with SAS Visual Analytics

Loading the data into SAS Visual Analytics Explorer and applying coordinates to our route stops reveals the un-optimized route Santa would take if he would simply go from one stop to the next according to the order in our data base. Based on the color legend it appears the maximum distance between route stops can be as long as 18,000 km. Considering the circumference of the earth at the equator is 40,076 km (24,902 mi) it would suggest Santa needs to travel halfway around the world to reach the next stop - clearly we should be able to optimize this route!

Optimizing Santas Route

SAS provides network optimization algorithms as part of SAS/OR. So let's apply the typical Traveling Salesman Problem to our Santa route. The first SQL statement creates a list of all possible pairs of potential stops. The following PROC OPTNET call will take the challenge and create the optimal route for Santa.

/* Create a list of all the possible pairs of potential stops */
proc sql noprint;
	create table Naughty_and_Nice as
	select 	a.stop label="Stop",
		a.name label="Client",
		a.description label="Description",
		a.lat label="Latitude",
		a.lng label="Longitude",
		b.stop as next_stop label="Next Stop",
		b.lat as next_lat label="Next Latitude",
		b.lng as next_lng label="Next Longitude",
		geodist(a.lat, a.lng, b.lat, b.lng) as distance label="Distance (km)"
	from route as a, route as b
	where a.stop ne b.stop
	order by a.stop
/* Find optimal tour for Santa using OPTNET */
proc optnet
   data_links = data.naughty_and_nice
   out_nodes  = TSPTourNodes;
      from    = stop
      to      = next_stop
      weight  = distance;
      out     = TSPTourLinks;
My slightly ageing computer managed to solve this puzzle in a bit more than an hour.

NOTE: ------------------------------------------------------------------------------------------------
NOTE: Running OPTNET version 13.1.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: Data input used 0.23 (cpu: 0.23) seconds
NOTE: The number of nodes in the input graph is 551.
NOTE: The number of links in the input graph is 151525.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: Processing the traveling salesman problem.
NOTE: The initial TSP heuristics found a tour with cost 189274.24153 using 3.77 (cpu: 3.42) seconds.
NOTE: The MILP presolver value NONE is applied.
NOTE: The MILP solver is called.
NOTE: The MILP solvers symmetry detection found 146616 orbits. The largest orbit contains 15
NOTE: The MILP solver added 155 cuts with 542405 cut coefficients at the root.
NOTE: Optimal within relative gap.
NOTE: Objective = 179509.55266.
NOTE: Processing the traveling salesman problem used 4116.06 (cpu: 4090.08) seconds.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: Data output used 0.02 (cpu: 0.00) seconds.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: The data set WORK.TSPTOURNODES has 551 observations and 2 variables.
NOTE: The data set WORK.TSPTOURLINKS has 551 observations and 3 variables.
NOTE: PROCEDURE OPTNET used (Total process time):
      real time           1:08:36.57
      cpu time            1:08:31.34
NOTE: There were 551 observations read from the data set WORK.TSPTOURLINKS.
NOTE: The data set DATA.TSPTOURLINKS has 551 observations and 3 variables.

Optimal Santa route in SAS Visual Analytics

Reloading our new optimized data set shows the new route Santa should take.

A closer look to North America shows how Santa is travelling across the states.

Combining advanced visual analytics with network optimization algorithms show the great potential a system can deliver. The challenge to find the shortest possible route applies to many industries including planning, logistics, manufacturing, genomics, etc. Although solving the routing problem for Santa Claus has its own special meaning.

Merry Christmas Everyone!

tags: SAS Programmers, SAS Visual Analytics
12月 122013
Twitter is becoming an important data source for organizations who want to better understand customer feedback on social channels. Before you can analyze Twitter data, however, you’ll need to import it into SAS. This post explains one way to do just that. Twitter supports authentication methods such as OAuth signed [...]