2月 122018
 

Did you know that SAS can combine or "merge" a symbol and a line pattern into a single legend item, as shown below? This kind of legend is useful when you are overlaying a group of curves onto a scatter plot. It enables the reader to quickly associate values of a categorical variable with colors, line patterns, and marker symbols in a plot.

Merged legend that combines symbols and line patterns in SAS

Legend items for marker/curve overlays

When you use PROC SGPLOT and the GROUP= option to create a graph, the SGPLOT procedure automatically displays the group attributes (such a symbol, color, and line pattern) in a legend. If you overlay multiple plot types (such as a series plot on a scatter plot) the default behavior is to create a legend for the first plot statement. You can use the KEYLEGEND statement to control which plots contribute to the legend. In the following example, the KEYLEGEND statement creates a legend that shows the attributes for the scatter plot (the marker shapes) and also the series plot (line patterns):

data ScatCurve;    /* example data: scatter plot and curves */
call streaminit(1);
do Group = 1 to 2;
   do x = 0 to 5 by 0.2;
      Pred = Group + (1/Group)*x - (0.2*x)**2;
      y = Pred + rand("Normal",0,0.5);
      output;
   end;
end;
run;
 
ods graphics / antialias=on subpixel=on;
title "Legend Not Merged";
proc sgplot data=ScatCurve;
   scatter x=x y=y / group=Group name="obs" markerattrs=(symbol=CircleFilled);
   series x=x y=Pred / group=Group name="curves"; 
   keylegend "obs" "curves" / title="Group"; /* separate items for markers and lines */
run;
Legend that shows symbols and line patterns separately

The legend contains all the relevant information about symbols, colors, and line patterns, but it is longer than it needs to be. When you display curves and markers for the same groups, you can obtain a more compact representation by merging the symbols and line patterns into a single legend, as shown in the next sections.

The MERGEDLEGEND statement in GTL

If you are comfortable with the Graph Template Language (GTL) in SAS, then you can use the MERGEDLEGEND statement to create a merged legend. The following statements create a graph template that overlays a scatter plot and a series plot and creates a merged legend in which each item contains both a symbol and a line pattern:

proc template;
  define statgraph ScatCurveLegend;
  dynamic _X _Y _Curve _Group _Title;       /* dynamic variables */
  begingraph;
  entrytitle _Title;  
   layout overlay; 
     scatterplot x=_X y=_Y / group=_Group name="obs" markerattrs=(symbol=CircleFilled);
     seriesplot x=_X y=_Curve / group=_Group name="curves";
     /* specify exactly two names for the MERGEDLEGEND statement */
     mergedlegend "obs" "curves" / border=true title=_Group;
   endlayout;
  endgraph;
  end;
run;
 
proc sgrender data=ScatCurve template=ScatCurveLegend;
   dynamic _Title="Overlay Curves, Merged Legend"
           _X="x" _Y="y" _Curve="Pred" _Group="Group";
run;
Merged legend that combines symbols and line patterns in SAS

The graph is the same as before, but the legend is different. The MERGEDLEGEND statement is used in many graphs that are created by SAS regression procedures. As you can see, the legend shows the symbol and line pattern for each group.

The LEGENDITEM statement in PROC SGPLOT

Unfortunately, the MERGEDLEGEND statement is not supported by PROC SGPLOT. However, SAS 9.4M5 supports the LEGENDITEM statement in PROC SGPLOT. The LEGENDITEM statement enables you to construct a custom legend and gives you complete control over every item in a legend. The following example constructs a legend that uses the symbols, line patterns, and group values that are present in the graph. Notice that you have to specify these attributes manually, as shown in the following example:

title "Merged Legend by Using LEGENDITEM in SGPLOT";
proc sgplot data=ScatCurve;
   scatter x=x y=y / group=Group markeratters=(symbol=CircleFilled);
   series x=x y=Pred / group=Group; 
   legenditem type=markerline name="I1" /
              label="1" lineattrs=GraphData1 markerattrs=GraphData1(symbol=CircleFilled);
   legenditem type=markerline name="I2" /
              label="2" lineattrs=GraphData2 markerattrs=GraphData2(symbol=CircleFilled);
   keylegend "I1" "I2" / title="Group"; 
run;

The graph and legend are identical to the previous graph and are not shown.

The advantage of the LEGENDITEM statement is that you can layout the legend however you choose; the legend is not tied to the attributes in any previous graph component. However, this is also a disadvantage. If you change the marker attributes in the SCATTER statement, the legend will not reflect that change until you manually modify each LEGENDITEM statement. Although there is no denying the power of the LEGENDITEM statement, the MERGEDLEGEND statement in the GTL always faithfully and automatically reflects the attributes in the SCATTERPLOT and SERIESPLOT statements.

In summary, the SG procedures in SAS automatically create a legend. When you overlay multiple plots, you can use the KEYLEGEND statement to control which plots contribute to the legend. However, it is also possible to merge the symbols and line patterns into a single compact legend. In the GTL, you can use the MERGEDLEGEND statement. In SAS 9.4M5, PROC SGPLOT supports the LEGENDITEM statement to customize the items that appear in a legend.

The post Merged legends: Overlay a symbol and line in a legend item appeared first on The DO Loop.

2月 092018
 

As the old saying goes, the only constant is change. The corollary that I’ll add is this: and the pace of change feels like it's directly proportional to Moore’s Law. Utilities used to be favorites for conservative investors. Smooth. Steady. When there was change, it was incremental. But today, uncertainty [...]

Trends to watch for utilities in the analytics economy was published on SAS Voices by Mike F. Smith

2月 092018
 

Want to see my newly minted certified professional badge? Scroll down to take a peek. Yes, I managed to successfully complete the Base SAS Programmer certification exam… with, ahem, flying colors I might add. Here are my tips to tackle the Base SAS certification exam: 1.  Get clear on the [...]

The post Demystifying certification (Part 3): To the finish appeared first on SAS Learning Post.

2月 082018
 

In my last article, I worked with an example of using custom polygon data to create a regional geo map in SAS Visual Analytics 7.4. In this article, I will use almost the same example to illustrate the ease of implementing custom polygons to produce the same regional map in SAS Visual Analytics 8.2.

In this example, as in my last blog, the site has sales data for each sales region in the US and would like to display a geo map of the regions.

The six sales regions are:

Custom polygons in SAS Visual Analytics

We will again start with the MAPSGFK.US_STATES dataset, which contains the data required to overlay all states of the US on a VA region geomap and has these columns:

As in my last post, we will add the sales regions (REGION) column and values using data step code, and then use GREMOVE to remove the state boundaries, leaving the region boundary points.  For a look at that code, see my previous blog.

The following datastep adds the necessary columns/values to the polygon dataset so that the form of the data is what is expected by VA.  Note that the LAT and LONG columns are already in unprojected form, so we just assign those values to Y and X, so our column names will more closely match what we will see in the VA interface when creating the geographic data item.   We also create a SEQUENCE column, required by VA 8.2,  using the values of the internal variable, _n_.

data mydata.regions;
   set mydata.regions;
   sequence=_n_;
   id=region;
   x=long;
   y=lat;
   keep ID SEQUENCE SEGMENT X Y ;
   run;

The polygon table, REGIONS,  now has the following columns.

The dataset containing the region and measure data, REGIONSALES contains these columns:

Both datasets should be loaded into memory. Sign in to SAS Visual Analytics – Explore and Visualize Data and create a new report with data source REGIONSALES.

Create a new Geography data item from REGION as shown below, also specifying a New Polygon Provider with values shown on the next several screen shots.  Give the new provider a name and label, and specify the CAS server, library, and table name.

Scroll down to add the ID, Sequence, Segment, latitude and longitude columns.

The new geography data item, after clicking OK:

Now create a Geo Map of type Regions as shown:

Please Creating a regional map with custom polygons in SAS Visual Analytics 8.2 was published on SAS Users.

2月 082018
 

By default, SAS Visual Analytics 7.4 supports country and state level polygons for regional geomaps. In SAS Visual Analytics 7.4, custom shape files are now supported, as well. This means that if a site has their own custom polygon data that defines custom regions, it’s possible to create a region geomap that displays those regions.

Implementing the process requires completing some preparatory steps, explicitly execution of some SAS code, but the steps are explained in Appendix 2 of the SAS Visual Analytics 7.4: Administration Guide. The SAS program that completes the steps is provided for download at http://support.sas.com/rnd/datavisualization/vageo/va74polygons.sas.

Two examples using the program are provided in Appendix 2 for US counties and German provinces. The instructions in Appendix 2 assume that the custom polygon data is provided in ESRI shape file format, which is likely the most common use-case. The site will need access to a SAS programming environment and SAS/GRAPH software, and whoever completes the process will need access to the SAS Visual Analytics configuration directory and the ability to restart services—so an administrator-type person will be required.

One common request is to provide a regional geomap, where the regions are site-defined groups of states or provinces of a country. In this example problem, the site has sales data for each sales region in the US and would like to display a geo map of the regions.

Custom regional map in SAS Visual AnalyticsFor this type of region/province example, you will likely be able to use one of the maps already provided by SAS in the MAPSGFK library to produce your region boundaries. For more information on the datasets in the MAPSGFK library, see this paper. 

The MAPSGFK.US_STATES dataset contains the data required to overlay all states of the US on a VA region geomap and has these columns:

The highlighted columns, STATECODE, LONG, and LAT will be particularly useful, but first, the sales region (REGION) column and values must be added using simple data step code. The unnecessary FIPS code (STATE) can be dropped in the same DATA step.  Note that the region values are assigned in upper case, as these will later be converted to ID values, which VA expects to be in upper case.

data regions;
   length region $ 12;
   drop state;
   set mapsgfk.us_states;
      if statecode in ('AK','HI','PR') then delete;
      else if statecode in ('WA','MT','OR','ID','WY')
         then region='NORTHWEST';
      else if statecode in ('CA','NV','UT','AZ','CO','NM')
         then region='SOUTHWEST'; 
      else if statecode in ('ND','SD','NE','MN','WI','MI','IA','IL','IN')
         then region='NORTHCENTRAL'; 
      else if statecode in ('KS','OK','TX','MO','AR')
         then region='SOUTHCENTRAL'; 
      else if statecode in ('ME','NH','VT','MA','RI','CT','NY','PA','NJ','OH','DE',
'MD','DC')then region='NORTHEAST';
      else if statecode in ('KY','WV','VA','TN','NC','MS','AL','LA','GA','SC','FL')
         then region='SOUTHEAST';
      run;

The data is then sorted by the REGION values, a requirement of the SAS/GRAPH GREMOVE procedure, which is used to remove the internal state boundary data points, leaving the region boundary points only.

proc sort data=regions;
   by region;
 proc gremove data=regions out=mapscstm.regions1;
    by region;
    id statecode;
    run;

To complete the process, since the LAT and LONG values are already in the form that VA needs (unprojected) and we are using a SAS dataset rather than the ESRI shape file format, we’ll only use a part of the code from the downloadable program mentioned at the beginning of the blog.

First, create a mapscstm directory under /SASHome/SASFoundation/9.4 to store the custom polygon dataset.  Make sure that the library is accessible to the SAS session by including a libname statement in the appserver_autoexec_usermods.sas file, found in config/Lev1/SASApp, and then restarting the Object Spawner.

Example:

libname MAPSCSTM “SASHome/SASFoundation/9.4/mapscstm”;

Tip:  Be sure to back up the original ATTRLOOKUP and CENTLOOKUP datasets before running any additional code, as you will be modifying the originals.

To complete creation of the polygon dataset, you will need to execute only a part of the downloadable program to:
• Make sure that your polygon dataset has all of the columns expected by SAS Visual Analytics.
• Add the region attributes to the ATTRLOOKUP.
• Add the region center point locations to the CENTLOOKUP dataset.

%let REGION_LABEL=USRegions;   /* The label for the custom region */
 %let REGION_PREFIX=R1; /* unique ISO 2-Letter Code  */
 %let REGION_ISO=000; /* unique ISO Code  */
 %let REGION_DATASET=MAPSCSTM.REGIONS1;  /* Polygon data set to be 
              created - be sure to use suffix "1" */

Note that the downloadable program includes additional macro assignments and additional code, but since our data is already in the form of a SAS dataset, rather than ESRI shape file format, we won’t be using all of the code.

The following datastep adds the necessary columns/values to the polygon dataset so that the form of the data is what is expected by VA.  Note that the LAT and LONG columns are already in unprojected form, so we just assign the same values to X and Y.  (VA doesn’t actually use the X,Y columns from the polygon dataset.)

data &REGION_DATASET.;
   set &REGION_DATASET.;
   where density <= 3; 
   id=region;
   idname=region;
   x=long;  
   y=lat;
   ISO = "&REGION_ISO.";
   RESOLUTION = 1;
   LAKE = 0;
   ISOALPHA2 = "&REGION_PREFIX.";
   AdminType = "regions";
   keep ID SEGMENT IDNAME LONG LAT X Y ISO DENSITY RESOLUTION LAKE ISOALPHA2 AdminType;
   run;

Then PROC SQL steps are executed to add rows relative to the custom polygons to the ATTRLOOKUP and CENTLOOKUP datasets:

This step adds the USRegions row to ATTRLOOKUP:

proc sql;
   insert into valib.attrlookup
      values ( 
         "&REGION_LABEL.",         /* IDLABEL=State/Province Label */
         "&REGION_PREFIX.",        /* ID=SAS Map ID Value */
         "&REGION_LABEL.",         /* IDNAME=State/Province Name */
         "",                       /* ID1NAME=Country Name */
         "",                       /* ID2NAME */
         "&REGION_ISO.",           /* ISO=Country ISO Numeric Code */
         "&REGION_LABEL.",         /* ISONAME */
         "&REGION_LABEL.",         /* KEY */
         "",                       /* ID1=Country ISO 2-Letter Code */
         "",                       /* ID2 */
         "",                       /* ID3 */
         "",                       /* ID3NAME */
         0                         /* LEVEL (0=country level, 1=state level) */
         );
quit;

This step adds a row to ATTRLOOKUP for each individual region:

proc sql;
   insert into valib.attrlookup
      select distinct 
         IDNAME,            /* IDLABEL=State/Province Label */
         ID,                /* ID=SAS Map ID Value */
         IDNAME,            /* IDNAME=State/Province Name */
 
         "&REGION_LABEL.",  /* ID1NAME=Country Name */
         "",                /* ID2NAME */
         "&REGION_ISO.",    /* ISO=Country ISO Numeric Code */
         "&REGION_LABEL.",  /* ISONAME */
         trim(IDNAME) || "|&REGION_LABEL.",  /* KEY */
         "&REGION_PREFIX.",   /* ID1=Country ISO 2-Letter Code */
         "",                  /* ID2 */
         "",                  /* ID3 */
         "",                  /* ID3NAME */
         1                    /* LEVEL (1=state level) */
   from &REGION_DATASET.;
quit;

This step calculates and adds the central location point for each of the regions to the CENTLOOKUP dataset.   The site data contains only the 48 contiguous states (no Alaska or Hawaii). If Alaska and Hawaii had been included, a different algorithm would need to be used to calculate the central location.

proc sql;
   /* Add custom region */
   insert into valib.centlookup
      select distinct
         "&REGION_DATASET." as mapname,
         "&REGION_PREFIX." as ID,
         avg(x) as x,
         avg(y) as y
      from &REGION_DATASET.;
 
   /* Add custom provinces */
   insert into valib.centlookup
      select distinct
         "&REGION_DATASET." as mapname,
         ID as ID,
         avg(x) as x,
         avg(y) as y
      from &REGION_DATASET.
         group by id;
quit;

After executing the code above, you will need to restart the Web Application server, so that SAS Visual Analytics has access to the new polygons.

Code is also included in the downloadable program to create a dataset for validating your results. The validate dataset includes a column for the ID and IDNAME of the regions, in addition to two randomly calculated measures.  In our case, we will instead just use our original REGIONSALES dataset containing the regional sales data.

1. Sign into SAS Visual Analytics and create a new exploration with data source REGIONSALES.
2. Create a Geo data item from State: Right-click Regions, select Geography?Subdivision(State, Province) Names. From the Country or Region drop-down list, select the USRegions region label.
3. Create a geo map visualization. Select Regions for the map style, Regions for the Geography role, and salesamt for the Color role.

Your regions should display, similar to this:

You can also include the region data item in a hierarchy with the state data item to produce a drill-down region map:

Or a bubble or coordinate map:

I hope this example has been helpful to users of SAS Visual Analytics 7.4.  In my next blog, you will see that this process is tremendously simplified by new mapping features in SAS Visual Analytics 8.2.

Creating a custom regional map in SAS Visual Analytics 7.4 was published on SAS Users.

2月 072018
 
Multiple-Birthday Problem: The distribution of the number of shared birthdays among 23 random people

If N random people are in a room, the classical birthday problem provides the probability that at least two people share a birthday. The birthday problem does not consider how many birthdays are in common. However, a generalization (sometimes called the Multiple-Birthday Problem) examines the distribution of the number of shared birthdays. Specifically, among N people, what is the probability that exactly k birthdays are shared (k = 1, 2, 3, ..., floor(N/2))? The bar chart at the right shows the distribution for N=23. The heights of the bars indicate the probability of 0 shared birthdays, 1 shared birthday, and so on.

This article uses simulation in SAS to examine the multiple-birthday problem. If you are interested in a theoretical treatment, see Fisher, Funk, and Sams (2013, p. 5-14).

The probability of multiple shared birthdays for N=23

You can explore the classical birthday problem by using probability theory or you can estimate the probabilities by using Monte Carlo simulation. The simulation in this article assumes 365 equally distributed birthdays, but see my previous article to extend the simulation to include leap days or a nonuniform distribution of birthdays.

The simulation-based approach enables you to investigate the multiple birthday problem. To begin, consider a room that contains N=23 people. The following SAS/IML statements are taken from my previous article. The Monte Carlo simulation generates one million rooms of size 23 and estimates the proportion of rooms in which the number of shared birthdays is 0, 1, 2, and so forth.

proc iml;
/* Function that simulates B rooms, each with N people, and counts the number of shared 
   (matching) birthdays in each room. The return value is a row vector that has B counts. */
start BirthdaySim(N, B);                 
   bday = Sample(1:365, B||N);             /* each column is a room */
   match = N - countunique(bday, "col");   /* number of unique birthdays in each col */
   return ( match );
finish;
 
call randseed(12345);                      /* set seed for random number stream */
NumPeople = 23;
NumRooms = 1e6;
match = BirthdaySim(NumPeople, NumRooms);  /* 1e6 counts */
 
call tabulate(k, freq, match);  /* summarize: k=number of shared birthdays, freq=counts */ 
prob = freq / NumRooms;         /* proportion of rooms that have 0, 1, 2,... shared birthdays */
print prob[F=BEST6. C=(char(k,2)) L="Estimate of Probability (N=23)"];
Multiple-Birthday Problem: The distribution of shared birthdays among 23 random people

The output summarizes the results. In less than half the rooms, no person shared a birthday with anyone else. In about 36% of the rooms, one birthday is shared by two or more people. In about 12% of the room, there were two birthdays that were shared by four or more people. About 2% of the rooms had three birthdays shared among six or more individuals, and so forth. In theory, there is a small probability of a room in which 8, 9, 10, or 11 birthdays are shared, but the probability of these events is very small. The probability estimates are plotted in the bar chart at the top of this article.

There is a second way to represent this probability distribution: you can create a stacked bar chart that shows the cumulative probability of k or fewer shared birthdays for k=1, 2, 3,.... You can see that the probability of 0 matching birthdays is less than 50%, the probability of 1 or fewer matches is 87%, the probability of 2 or fewer matches is 97%, and so forth. The probability for 5, 6, or 7 matches is not easily determined from either chart.

Multiple-Birthday Problem: The cumulative distribution of shared birthdays among 23 random people

Mathematically speaking, the information in the stacked bar chart is equivalent to the information in the regular bar chart. The regular bar chart shows the PMF (probability mass function) for the distribution whereas the stacked chart shows the CDF (cumulative distribution function).

The probability of shared birthdays among N people

The previous section showed the distribution of shared birthdays for N=23. You can download the SAS program that simulates the multiple birthday problem for rooms that contain N=2, 3, ..., and 60 people. For each simulation, the program computes estimates for the probabilities of k matching birthdays, where k ranges from 0 to 30.

Multiple-Birthday Problem: Three distributions of the number of shared birthdays among N=25, 40, and 60 random people

There are a few ways to visualize those 59 distributions. One way is to plot 59 bar charts, either in a panel or by using an animation. A panel of three bar charts is shown to the right for rooms of size N=25, 40, and 60. You can see that the bar chart for N=25 is similar to the one shown earlier. For N=40, most rooms have between one and three shared birthdays. For rooms with N=60 people, most rooms have between three and six shared birthdays. The horizontal graphs are truncated at k=12, even though a few simulations generated rooms that contain more than 12 matches.

Obviously plotting all 59 bar charts would require a lot of space. A different way to visualize these 59 distributions is to place 59 stacked bar charts side by side. This visualization is more compact. In fact, instead of 59 stacked bar charts, you might want to create a stacked band plot, which is what I show below.

The following stacked band chart is an attempt to visualize the distribution of shared birthdays for ALL room sizes N=2, 3, ..., 60. (Click to enlarge.)

Multiple-Birthday Problem: The cumulative distribution of the probability of k or fewer shared birthdays among N random people, N=2,3,...,60

How do you interpret this graph? A room that contains N=23 people corresponds to drawing a vertical line at N=23. That vertical line intersects the red, green, and brown bands near 0.5, 0.87, and 0.97. Therefore, these are the probabilities that a room with 23 people contains 0 shared birthdays, 1 or fewer shared birthdays, and 2 or fewer shared birthdays. The vertical heights of the bands qualitatively indicate the individual probabilities.

Next consider a room that contains N=40 people. A vertical line at N=40 intersects the colored bands at 0.11, 0.37, 0.66, 0.86, and 0.95. These are the cumulative probabilities P(k ≤ 0), P(k ≤ 1), P(k ≤ 2), and so forth. If you need the exact values, you can use PROC PRINT to display the actual probability estimates. For example:

proc print data=BDayProb noobs;
   where N=40 and k < 5;
run;
Multiple-Birthday Problem: The distribution of shared birthdays among N=40 random people

Summary

In summary, this article examines the multiple-birthday problem, which is a generalization of the classical birthday problem. If a room contains N random people, the multiple-birthday problem examines the probability that k birthdays are shared by two or more people for k = 0, 1, 2, ..., floor(N/2). This article shows how to use Monte Carlo simulation in SAS to estimate the probabilities. You can visualize the results by using a panel of bar charts or by using a stacked band plot to visualize the cumulative distribution. You can download the SAS program that simulates the multiple birthday problem, estimates the probabilities, and visualizes the results.

What do you think? Do you like the stacked band plot that summarizes the results in one graph, or do you prefer a series of traditional bar charts? Leave a comment.

The post The distribution of shared birthdays in the Birthday Problem appeared first on The DO Loop.

2月 072018
 

In an earlier blog, I asked you to participate in CertMag/GoCertify’s Annual IT Salary survey. The response was fantastic and I’m happy to report that we made the list of the Top 75 IT certifications out of more than 900. This marks the first time SAS has been part of [...]

The post You know the value of your SAS Certification; does the rest of the world? appeared first on SAS Learning Post.

2月 072018
 

Jazz up your Geo Map or Network Analysis graph by applying icon-based display rule markers instead of color markers on the map. With SAS Visual Analytics, you may have already used display rules by populating intervals or adding color-mapped values for report objects. Now, you can jazz up your Geo Map or Network Analysis object by choosing from a curated set of icons and applying icon-based display rule markers.SAS Visual Analytics Geo Map

The set of curated icons in SAS Visual Analytics 8.2 are classified into these groups for use with icon-based display rules:

SAS Visual Analytics Geo Map
Here is an example of the display-rule icons that are available for Status:

SAS Visual Analytics Geo Map
When your mouse hovers over an icon, the name of that icon is displayed.

Applying Icon-Based Display Rules to a Geo Map

While working with a data source that included a measure for the total number of cellular mobile subscriptions per 100 in each country, I wanted to display the results in a Geo Map. Before creating the display rules, I looked at the data for the mobile cellular subscriptions for various countries, and decided that I wanted to create four display rules, each one associated with the number of mobile cellular subscriptions per 100. The icons that I wanted for my display rules were all available under Status. So here’s how I decided to set up my operators, values, and the icon style and color:


Here are the steps I followed to setup the icon-based display rules.

Create the New Geography Item

1.  In my new SAS Visual Analytics 8.2 report, I went to Objects, chose Geo Map (available under Graphs) and dragged it over to the blank canvas.
2.  From Data, I searched for my data source and added it to the report.
3.  In my data source, I highlighted the category (Country), right clicked, and selected New Geography.
4.  In the New Geography Item dialog, I entered a name for the new geographic item that I was creating: Country (Geographic Item).

Change Geo Map Type to Coordinates

5.  I select the Geo Map, go to Options in SAS Visual Analytics and scroll down to Map.
6.  By default, the Type is set to Bubbles. I change it to Coordinates (this is a requirement to create the icon-based display rules).


7.  The default size for the Marker size is 11. I change it to 14 because I would like my markers to show up slightly bigger in the Geo Map.
8.   By default, Legend is displayed for the Geo Map and Visibility is set to On. I chose not to display Legend information for the Geo Map, so I chose Off for Visibility.

Choose Role for the Geo Map

9.  I choose Roles, and I am ready to assign the geographic data item to Category. So I choose the Country (Geographic Item) that I had just created, and drag it over to Category. I now see the Country (Geographic) data role applied to the Geo Map.

Create Icon-Based Display Rules for the Geo Map

10.  I click on Rules and under Display Rules, I click on New rule.


11.  In the New Display Rule dialog, I chose <= for Operator and entered a 250 for Value.
12.  I click on Style and choose Red as the color for this display rule.


13.  I click on Icon, and I am presented with seven categories for the icons. When I hover an icon, the icon name is displayed.
14.  I click on the Significantly Lower icon and click OK.


15.  A quick review of what I just created and I click OK.


16.  I continue to create three additional display rules for my Geo Map.

Now, I have completed creating the four display rules. Here’s how they show in the SAS Visual Analytics Viewer:


17.  When all of the display rules have been created, the Geo Map displays with the colorful icon-based display rules applied to the various countries.


You’ve just seen how you can create icon-based display rules for a Geo Map. You can also create icon-based display rules for a Network Analysis object as well.

Jazz up a Geo Map with colorful icon-based display rules was published on SAS Users.

2月 062018
 

Good news learners! SAS University Edition has gone back to school and learned some new tricks.

With the December 2017 update, SAS University Edition now includes the SASPy package, available in its Jupyter Notebook interface. If you're keeping track, you know that SAS University Edition has long had support for Jupyter Notebook. With that, you can write and run SAS programs in a notebook-style environment. But until now, you could not use that Jupyter Notebook to run Python programs. With the latest update, you can -- and you can use the SASPy library to drive SAS features like a Python coder.

Oh, and there's another new trick that you'll find in this version: you can now use SAS (and Python) to access data from HTTPS websites -- that is, sites that use SSL encryption. Previous releases of SAS University Edition did not include the components that are needed to support these encrypted connections. That's going to make downloading web data much easier, not to mention using REST APIs. I'll show one HTTPS-enabled example in this post.

How to create a Python notebook in SAS University Edition

When you first access SAS University Edition in your web browser, you'll see a colorful "Welcome" window. From here, you can (A) start SAS Studio or (B) start Jupyter Notebook. For this article, I'll assume that you select choice (B). However, if you want to learn to use SAS and all of its capabilities, SAS Studio remains the best method for doing that in SAS University Edition.

When you start the notebook interface, you're brought into the Jupyter Home page. To get started with Python, select New->Python 3 from the menu on the right. You'll get a new empty Untitled notebook. I'm going to assume that you know how to work with the notebook interface and that you want to use those skills in a new way...with SAS. That is why you're reading this, right?

Move data from a pandas data frame to SAS

pandas is the standard for Python programmers who work with data. The pandas module is included in SAS University Edition -- you can use it to read and manipulate data frames (which you can think of like a table). Here's an example of retrieving a data file from GitHub and loading it into a data frame. (Read more about this particular file in this article. Note that GitHub uses HTTPS -- now possible to access in SAS University Edition!)

import saspy
import pandas as pd
 
df = pd.read_csv('https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv')
df.describe()

Here's the result. This is all straight Python stuff; we haven't started using any SAS yet.

Before we can use SAS features with this data, we need to move the data into a SAS data set. SASPy provides a dataframe2sasdata() method (shorter alias: df2sd) that can import your Python pandas data frame into a SAS library and data set. The method returns a SASdata object. This example copies the data into WORK.PROBLY in the SAS session:

sas = saspy.SASsession()
probly = sas.df2sd(df,'PROBLY')
probly.describe()

The SASdata object also includes a describe() method that yields a result that's similar to what you get from pandas:

Drive SAS procedures with Python

SASPy includes a collection of built-in objects and methods that provide APIs to the most commonly used SAS procedures. The APIs present a simple "Python-ic" style approach to the work you're trying to accomplish. For example, to create a SAS-based histogram for a variable in a data set, simply use the hist() method.

SASPy offers dozens of simple API methods that represent statistics, machine learning, time series, and more. You can find them documented on the GitHub project page. Note that since SAS University Edition does not include all SAS products, some of these API methods might not work for you. For example, the SASml.forest() method (representing

In SASPy, all methods generate SAS program code behind the scenes. If you like the results you see and want to learn the SAS code that was used, you can flip on the "teach me SAS" mode in SASPy.

sas.teach_me_sas('true')

Here's what SASPy reveals about the describe() and hist() methods we've already seen:

Interesting code, right? Does it make you want to learn more about SCALE= option on PROC SGPLOT?

If you want to experiment with SAS statements that you've learned, you don't need to leave the current notebook and start over. There's also a built-in %%SAS "magic command" that you can use to try out a few of these SAS statements.

%%SAS
proc means data=sashelp.cars stackodsoutput n nmiss median mean std min p25 p50 p75 max;run;

Python limitations in SAS University Edition

SAS University Edition includes over 300 Python modules to support your work in Jupyter Notebook. To see a complete list, run the help('modules') command from within a Python notebook. This list includes the common Python packages required to work with data, such as pandas and NumPy. However, it does not include any of the popular Python-based machine learning modules, nor any modules to support data visualization. Of course, SASPy has support for most of this within its APIs, so why would you need anything else...right?

Because SAS University Edition is packaged in a virtual machine that you cannot alter, you don't have the option of installing additional Python modules. You also don't have access to the Jupyter terminal, which would allow you to control the system from a shell-like interface. All of this is possible (and encouraged) when you have your own SAS installation with your own instance of SASPy. It's all waiting for you when you've outgrown the learning environment of SAS University Edition and you're ready to apply your SAS skills and tech to your official work!

Learn more

The post Coding in Python with SAS University Edition appeared first on The SAS Dummy.