Analysts often use a simple moving average to get an idea of the trends in data. This is simply an average of a subset of time periods, and the size of the subset can differ depending on the application. The technique can be used with data based on time periods, such as sales data, expense data, telecom data, or stock market data. The average is called ‘moving’ because it is continually recomputed as more data becomes available. This type of average is also called a ‘rolling average’ or ‘running average’. In this post, I’ll share a little bit about how to use the periodic operators in SAS Visual Analytics Designer to calculate a simple moving average.

The report below, created in the designer, shows the summary of the Amount column by month. The Three-Month Moving Average column displays the average of a month and the previous two months’ Amount sums. The 3-month sum is simply divided by 3.

This Three-Month Moving Average column is an aggregation, calculated using the RelativePeriod Periodic operator.  Both the visual and the text forms of the aggregation are shown below:

The RelativePeriod operator returns an aggregated value (the sum of Amount, in this case) for a period relative to the current one (for example, the previous month). The data item for the period calculation is Month, which is a date value with an associated YYYYMM format. The interval is _ByMonth_, and the 0, -1, and -2 offset values represent the current month, the previous month, and the month before the previous, respectively. The division by three creates the 3-month moving average, but the number of RelativePeriod expressions and the divisor could be adjusted to calculate an average based on any number of months.
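To make the arithmetic concrete outside of the designer, here is a minimal Python sketch of the same idea: sum the values at offsets 0, -1, and -2 and divide by the window size. The function name and the sample numbers are hypothetical, and returning a missing value (None) for the first two months is my assumption, not a statement of how the designer treats them.

```python
from typing import List, Optional


def moving_average(monthly_sums: List[float], window: int = 3) -> List[Optional[float]]:
    """Simple moving average over the current period and the (window - 1) before it."""
    result: List[Optional[float]] = []
    for i in range(len(monthly_sums)):
        if i < window - 1:
            # Not enough earlier periods yet (assumption: treat as missing).
            result.append(None)
            continue
        # Offsets 0, -1, -2, ... mirror the RelativePeriod offsets in the text.
        total = sum(monthly_sums[i + offset] for offset in range(0, -window, -1))
        result.append(total / window)
    return result


# Hypothetical monthly Amount sums for four consecutive months.
monthly = [120.0, 90.0, 150.0, 180.0]
print(moving_average(monthly))  # [None, None, 120.0, 140.0]
```

Changing `window` plays the same role as changing the number of RelativePeriod expressions and the divisor.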

The report below displays a 3-Year Moving Average column.

In this report, the RelativePeriod operator is used to calculate the three-year moving average. The data item for the period calculation is Year, which is a date value with an associated Year format.

The interval is _ByYear_, with the 0, -1, and -2 offset values representing the current year, the previous year, and the year before the previous, respectively. Again, the number of RelativePeriod expressions and the divisor could be adjusted to calculate an average based on any number of years.

The ParallelPeriod operator is used in the report below to display a 3-month moving average based on amounts corresponding to the same month in each of the three years. This report, of course, has missing values for all months up until the third year of data.

The first moving average value to be calculated is based on data from January 2010, 2011, and 2012. The measure is Amount and the periodic item for the aggregation is Month. The inner interval for the aggregation is _ByMonth_ and the outer interval is _ByYear_. The 0, -1, and -2 offset values represent months of the current year, the previous year, and the year before the previous, respectively. The number of ParallelPeriod expressions and the divisor could also be adjusted to calculate an average based on a different number of years.
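The parallel-period logic can be sketched in a few lines of Python. This is only an illustration of the calculation, not of the designer's internals: the function name, the dictionary layout keyed by (year, month), and the sample amounts are all hypothetical, and returning None when any of the three years is missing is an assumption.

```python
from typing import Dict, Optional, Tuple


def parallel_period_average(
    amounts: Dict[Tuple[int, int], float], year: int, month: int, years: int = 3
) -> Optional[float]:
    """Average the same month across the current year and the years before it."""
    # Offsets 0, -1, -2 step back one year at a time (the outer interval),
    # while the month (the inner interval) stays fixed.
    values = [amounts.get((year + offset, month)) for offset in range(0, -years, -1)]
    if any(v is None for v in values):
        return None  # assumption: missing until enough years of data exist
    return sum(values) / years


# Hypothetical January sums for three years.
amounts = {(2010, 1): 100.0, (2011, 1): 130.0, (2012, 1): 160.0}
print(parallel_period_average(amounts, 2012, 1))  # 130.0
print(parallel_period_average(amounts, 2011, 1))  # None
```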

Do keep in mind that for all of these periodic aggregations the ‘aggregation by’ column representing month or year must be included in the report.

I hope these examples using the periodic operators will be helpful to you in creating your Visual Analytics reports.

For many people, building something from scratch, no matter how simple or complex, is fascinating. That’s why programs like How It’s Made are so appealing and, for me, addictive. That is the inspiration for this blog: I will walk you through building a set of graphs and show how I improve each visualization through my own iterative process. Like all forms of art, a visualization is never complete; constant improvement, tweaking, and alteration are required to accommodate the constant influx of data and the ever-changing needs of our audience.

These graphs use telecom data about cell phone network service including call duration and data usage.

### Example 1: Calls versus Drops

In this first example, I noticed that the data contained the number of calls and the number of dropped calls. As with most analytics, audiences are interested in the outliers. In this case, the outlier is the poorly performing event of a call being dropped. This data would prove useful if a company wanted to research poorly performing cell technology, whether in the handset itself or in the cell towers. It could also be used to find dead zones where additional towers may need to be added. In this example, I decided to plot the data against the 24-hour day to determine whether the volume of calls affected the number of calls dropped.

Example 1: Iteration 1
Naturally, I started with a bar chart visualization. I plotted the hours of the day (24-hour scale) on the x-axis and the number of calls and the number of dropped calls on the y-axis. At first glance, it looks like the number of dropped calls varies somewhat with the hour of the day.

Example 1: Iteration 2
Since the number of calls and the number of drops are on the same scale, we can easily take the ratio of the two to plot the call drop rate: the number of dropped calls divided by the number of calls. I used an aggregated measure to create this ratio, which is evaluated on the fly, depending on the Group By variable. In this example the “ByGroup” is the hour of the day, and I used a Percent format.

This now gives us one bar to evaluate against the hours of the day. We can see that the Call Drop Rate does not fluctuate as much as the previous graph could lead one to believe.

I also added a reference line at 5% to make it easier to see which hour of the day fell below or above the 5% rate.
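For readers who want to see the arithmetic behind the aggregated measure, here is a rough Python sketch. It sums calls and drops within each hour first and only then divides, which is how I understand a ratio evaluated per By Group to work. The record layout, function name, and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def drop_rate_by_hour(records: Iterable[Tuple[int, int, int]]) -> Dict[int, float]:
    """records are (hour, calls, dropped_calls); returns dropped / calls per hour."""
    calls: Dict[int, int] = defaultdict(int)
    drops: Dict[int, int] = defaultdict(int)
    for hour, n_calls, n_drops in records:
        calls[hour] += n_calls
        drops[hour] += n_drops
    # Divide the summed drops by the summed calls (a ratio of sums,
    # not an average of row-level ratios).
    return {h: drops[h] / calls[h] for h in sorted(calls) if calls[h] > 0}


# Hypothetical rows: two for hour 8, one for hour 20.
records = [(8, 200, 8), (8, 300, 12), (20, 400, 30)]
rates = drop_rate_by_hour(records)
REFERENCE_LINE = 0.05  # the 5% reference line from the text
hours_above = [h for h, r in rates.items() if r > REFERENCE_LINE]
print(hours_above)  # [20]
```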

Example 1: Iteration 3
Finally, I noticed that the data contained a Cell Technology category. I thought it would be interesting to see if a certain technology was more unreliable than another. To do this, I added Cell Technology to the graph. I liked the visualization the best when I changed the bar chart to a horizontal orientation, used a row lattice for the Cell Technology and kept the 5% reference line. This now gives me an enhanced “quick glance” comparison ability to see that the 4G Cell Technology seems to have the most consistently high Call Drop Rate (over 5%) for all hours of the day.

### Example 2: Call Duration

In this second example, I used the Voice_Seconds data item to study the duration of calls over the course of the 24-hour day. This visualization could help determine the peak hours of the day for voice calls and, potentially, the best time to schedule any required maintenance so that it affects the fewest customers.

Example 2: Iteration 1
Again, I started with a bar chart visualization where I plotted hours of the day on the x-axis and Voice_Seconds on the y-axis. The first thing I noticed was that at hour 20 there was a peak _SUM_ of over 3 million Voice_Seconds. This immediately made me want to find out how long 3 million seconds is, and made me realize that I needed to look at the average of Voice_Seconds instead.

Example 2: Iteration 2
The first thing I did was create another aggregated measure to produce the Average Call Duration. To do this I took the sum of Voice_Seconds divided by the sum of the number of calls for a By Group.

The next thing I wanted to do was provide a reference point for how long 1,000 seconds is in minutes. Granted, I could convert Voice_Seconds into a new metric, but instead I decided to use a reference line at 900 seconds, which equates to 15 minutes.
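The calculation behind the Average Call Duration measure and the reference line is simple enough to sketch in Python. The function names and the sample values are hypothetical; the only facts taken from the text are the formula (sum of Voice_Seconds divided by sum of calls) and the 900-second reference line.

```python
from typing import List, Optional


def average_call_duration(voice_seconds: List[float], calls: List[int]) -> Optional[float]:
    """Sum of Voice_Seconds divided by the sum of calls for one By Group."""
    total_calls = sum(calls)
    if total_calls == 0:
        return None  # assumption: no calls means no average
    return sum(voice_seconds) / total_calls


def seconds_to_minutes(seconds: float) -> float:
    return seconds / 60


# Hypothetical rows for one hour of the day.
avg = average_call_duration([54000, 36000], [60, 40])
print(avg)                      # 900.0
print(seconds_to_minutes(900))  # 15.0
```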

Example 2: Iteration 3
Lastly, I wanted to see if the type of Cell Technology had any impact on the distribution or length of call, mostly just because I was curious.  I was surprised that this data shows an average call of 15 minutes. That’s a long personal call when I consider most of my calls consist of “are you on your way?” and “we forgot x at the store – please pick it up on your way home”.

If this were call center data you would be able to determine how quickly issues were getting resolved. If this were sales call data, and representatives were following a script, this visualization would show, on average, how long those calls took and maybe the longer calls would result in a sale. So you could see which hours of the day sold more product.

### Example 3: Data Usage

In this third example, I explored the data usage. I had two data items available for use: mbytes_up and mbytes_down. This visualization could help determine the peak hours, and therefore the best time to perform system upgrades or maintenance. Combined with tower locations, those peak hours could also help determine whether additional hardware would improve network speed.

Example 3: Iteration 1
I started with the bar chart visualization and plotted the hours of the day on the x-axis and the mbytes_up and mbytes_down on the y-axis. Again, the first thing I noticed was that I was looking at the _SUM_ for the metrics and the large difference in the numbers for up versus down usage.

Example 3: Iteration 2
The first thing I did was convert the megabytes to gigabytes by creating new calculated data items and then I created the Average Upload and Download by creating aggregated measures.

Here are the two metrics I created for Upload:

This makes the visualization a bit easier to consume, now that we can compare the average upload or download size per session. I also added two reference lines at 5 GB and 15 GB. I still felt that the data needed to be visualized a bit better to understand the usage: I know most typical cell phone plans allow for 5 GB of data per month, so an average session using more than 15 GB just seems like a lot.
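As a rough illustration of the two steps (convert megabytes to gigabytes, then average per session), here is a small Python sketch. Several things in it are my assumptions rather than details from the report: the conversion factor of 1024, the existence of a per-row session count, and all of the names and numbers.

```python
from typing import List, Optional

MB_PER_GB = 1024  # assumption: binary conversion; a report could equally use 1000


def mbytes_to_gbytes(mbytes: float) -> float:
    """Calculated item: convert megabytes to gigabytes."""
    return mbytes / MB_PER_GB


def average_gb_per_session(mbytes: List[float], sessions: List[int]) -> Optional[float]:
    """Aggregated measure: total gigabytes divided by total sessions."""
    total_sessions = sum(sessions)
    if total_sessions == 0:
        return None
    return sum(mbytes_to_gbytes(mb) for mb in mbytes) / total_sessions


# Hypothetical upload rows: 20 GB over 2 sessions and 10 GB over 1 session.
print(average_gb_per_session([20480, 10240], [2, 1]))  # 10.0
```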

Example 3: Iteration 3
To further classify the data I added Cell Technology to the visualization and broke it into two visualizations: one for upload and one for download. Once I did that, the visualization really started to show different data usage patterns.

Both visualizations show the 4G technology doesn’t even reach 5 GB, which makes me think that the customers with new phones and new service plans are sticking to their data allowance. But the customers with the older technology of 3G may be “grandfathered” in with their unlimited data plan and making the most of it.

This blog has taken you through three examples of how I iteratively develop visualizations using SAS Visual Analytics. Ultimately, what this process shows you is that the more specific business question you have, the better a visualization you can create.

Update-in-place supports the ability to update a SAS Deployment within a major SAS release. Updates often provide new versions of SAS products. However, when using the SAS Deployment Wizard to perform an update-in-place you cannot selectively update a machine or product. As a general rule if you want to update one product in a SAS Deployment you have to update the whole deployment. With the latest version of SAS Studio, that’s not the case.  You can now update from version 3.4 to version 3.5 of SAS Studio without updating any other part of your SAS deployment.

SAS Studio 3.5 contains some interesting new functionality:

• A new batch submit feature.
• The ability to create global settings for all SAS Studio users at your site.
• A new Messages window that displays information about the programs, tasks, queries, and process flows that you run.
• New keyboard shortcuts to add and insert code snippets.
• Many new tasks for statistical process control, multivariate analysis, econometric analysis, and power and sample size analysis. For more information, see SAS Studio Tasks.

For my purposes, I was really interested in using the batch submit feature. Using “Batch Submit” a user can run a saved SAS program in batch mode, which means that the program will run in the background while you continue to use SAS Studio. When you run a program in batch mode, you can view the status of programs that have been submitted, and you can cancel programs that are currently running.

So how does this “selective update” work? Somewhat unusually for a product update, it is available via a hot fix documented in note 57898: Upgrade SAS® Studio 3.4 to SAS® Studio 3.5 without upgrading other products.

SAS Studio is available in three different deployment flavors: SAS Studio Mid-Tier (the enterprise edition), SAS Studio Basic, and SAS Studio Single-User. The hot fix is available for the enterprise and basic editions. To apply the hot fix, the current deployment must be at SAS 9.4 M3. For SAS Studio Single-User, an MSI file has been added to the downloads section of support.sas.com to allow users to download SAS Studio 3.5 to run against their existing Windows desktop SAS for releases 9.4M1 and higher.

The hot fix is a container hot fix, meaning the hot fix delivers one or more “MEMBER” hot fixes in one downloadable unit. Container hot fixes have some special rules you must follow when applying them.

• They must be applied separately to each machine. The installation process will apply only those MEMBER hot fixes which are applicable based on the SAS Deployment Registry for each specific machine.
• They may contain MEMBER hot fixes for multiple operating systems. The SAS Deployment Manager will apply only those MEMBER hot fixes which are applicable for the operating system on each specific machine.
• They often contain pre and/or post installation steps outlined in the instructions provided.

A review of the hot fix instructions shows that to complete the update for the SAS Studio Mid-Tier the web application must be rebuilt and redeployed.

To apply the container hot fix on my three-tier deployment, which has a Windows metadata server, a Linux compute tier, and a Linux middle tier, I downloaded the hot fix to a network-accessible location and followed the process documented in the hot fix instructions. To summarize:

Create a deployment registry report on each machine. The reports showed that:

• SAS Studio Basic is installed on the Linux compute tier.
• SAS Studio Enterprise is installed on the Linux middle tier.

### Update SAS Studio Basic

Stop all SAS servers in the deployment. Run the SAS Deployment Manager on the Linux compute tier, select Apply Hot fixes, and then select the directory where the hot fix was downloaded. The Wizard updates SAS Studio Basic. A review of the hot fix documentation shows no post-deployment steps are required for SAS Studio Basic.

### Update SAS Studio Mid-Tier (Enterprise)

Run the SAS Deployment Manager on the Linux middle tier, select Apply Hot fixes, and then select the directory where the hot fix was downloaded. The Wizard updates SAS Studio Mid-Tier.

A review of the hot fix documentation shows that, to complete the update, the SAS Studio Web Application must be rebuilt and redeployed.

Start the SAS Metadata Server and use the SAS Deployment Manager on the middle-tier machine to rebuild just the SAS Studio Mid-Tier. Then start all SAS servers and use the SAS Deployment Manager on the middle-tier machine to redeploy just the SAS Studio Mid-Tier.

When the redeploy is completed, I log on to SAS Studio. Selecting Help > About shows that I now have SAS Studio 3.5.

If I navigate the folder tree and select a SAS program I can now right-click on the program and select “Batch Submit” to run the program in the background.

If you are excited about the new functionality of SAS Studio 3.5, I think you will agree that the hot fix provides an easy path to update the software.

A quick way to update to SAS Studio 3.5 was published on SAS Users.

Have you ever had problems matching data that has typographical errors in it? Because typos and misspelled words are arbitrary by nature, a specific matching technique is required to tackle those cases. SAS Data Quality, with its traditional deterministic matching approach, is not well suited to correctly matching typos such as character transpositions and missing or additional characters in words. But SAS Data Quality provides a feature called suggestion based matching that is especially designed for matching data with typos. Suggestion based matching provides a more probabilistic approach to matching. With suggestion based matching, SAS Data Quality outputs multiple matchcodes based on alternative “suggestions” for a data field. Each suggestion also includes a score that reflects the closeness of the suggestion to the input word.

Let's dive a little deeper.

### The concept of SAS Data Quality suggestion based matching

A prerequisite for suggestion based matching in SAS Data Quality is a dictionary of known words along with a frequency count for each word. The matching engine generates “suggestions” for potentially misspelled input values from the known words dictionary. The suggestions are derived from the input data by applying spelling changes such as character deletions, insertions, replacements, and transpositions. Whitespace insertion, casing, and context-dependent pronunciation can also be taken into account. For each suggestion, a score that reflects its closeness to the input is calculated. The higher the match score for a suggestion, the more likely it is the true entity.

The known words dictionary is generated with the idea that correctly spelled entity names are more frequent than, and therefore outnumber, the misspelled ones. This is an important aspect of the match score calculation for the suggestions. The matching engine generates suggestions by taking the input value, “injecting” character transpositions, replacements, and other typos, and comparing the results against the known words dictionary entries. During the whole process, the input value is seen as a potentially corrupted version of the true entity. By making various character-based alterations to the input data, the matching engine tries to find possible candidate entities in the known words dictionary. If one of the suggestions matches a word in the known words dictionary, the engine calculates a match score based on the frequency count and the changes required to create the suggestion. This concept potentially results in a list of possible matches, including the true entity and other misspelled or “close enough” words identified as potential matches. The resulting match score can then help to resolve the true entity.
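To give a feel for the concept, here is a small Python sketch of the general technique: generate every string that is one character edit away from the input, keep the ones found in a known words dictionary, and score them by frequency. This is not the SAS Data Quality implementation. The scoring formula (relative frequency, halved when one edit was needed), the function names, and the tiny dictionary are all assumptions made for illustration.

```python
import string
from typing import Dict, List, Set, Tuple


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggestions(value: str, known_words: Dict[str, int]) -> List[Tuple[str, float]]:
    """Return (known word, score) pairs, best first."""
    value = value.lower()
    total = sum(known_words.values())
    scored: Dict[str, float] = {}
    if value in known_words:
        # No edits needed: score is the plain relative frequency.
        scored[value] = known_words[value] / total
    for candidate in edits1(value):
        if candidate in known_words and candidate != value:
            # Hypothetical penalty of 0.5 for needing one edit.
            scored[candidate] = 0.5 * known_words[candidate] / total
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical known words dictionary with frequency counts.
known = {"smith": 90, "smyth": 10}
print(suggestions("smth", known))  # [('smith', 0.45), ('smyth', 0.05)]
```

Both dictionary entries are one insertion away from the input, so the frequency count is what ranks "smith" ahead of "smyth".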

Suggestion based matching requires more compute resources and therefore slows down the data throughput of the matching process. Still, it is a proven approach to provide match results for input data that contains character-level data quality issues. To minimize the performance impact, suggestion based matching is best used as a second iteration for input data that could not be matched using the standard matchcode method.

When I attended my first SAS conference in 2003 I was not only a first-timer, I was a first time presenter.  Needless to say I was a bit nervous.  I did not know what to expect.  Was my topic good enough for these savvy programmers and statisticians?  Well my first time was an experience I will never forget.  I gave my presentation to a relatively full room and I thought it went well enough, but I was shocked when I found out I got best paper in the section.  Ever since then I have been actively involved with SAS conferences, whether presenting or helping as a conference committee member.  After years of presenting and helping, I was asked to be the Academic Chair.  I was beyond thrilled. But I won’t let the position stop me from actually presenting some material that I have found to be very helpful at this year’s Midwest SAS Users Group.

One of the things I do in my job is look for functions that can make my work easier.  Once I find these functions, I like to research them and see how I can incorporate them into my programming to make it more efficient.  This year at MWSUG I will share some of my findings via an e-Poster, “When ANY Function will Just NOT Do.”

The e-Poster illustrates the concept of what I like to refer to as the “ANY and NOT Functions.” Some of the functions in this group are ANYALNUM, NOTALNUM, ANYALPHA and NOTALPHA.  Below are some snippets of code that show how some of these functions can be used to determine if there is an alphabetic character, a number or punctuation in the variable.

```
/* checks for first instance of ... */
alnum  = anyalnum(value);  /* alphanumeric     */
nalnum = notalnum(value);  /* non-alphanumeric */
alpha  = anyalpha(value);  /* alphabetic       */
nalpha = notalpha(value);  /* non-alphabetic   */
digit  = anydigit(value);  /* digit            */
ndigit = notdigit(value);  /* non-digit        */
punct  = anypunct(value);  /* punctuation      */
npunct = notpunct(value);  /* non-punctuation  */
```
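If you do not have a SAS session handy, the behavior of these functions can be approximated in Python. The sketch below is an emulation, not SAS code: each helper returns the 1-based position of the first matching character, or 0 when there is none, which is how the SAS functions report their result. The character classes here are plain ASCII sets and may not match SAS exactly, and SAS character variables are padded with trailing blanks, which a Python string is not.

```python
import string
from typing import Callable


def first_position(value: str, predicate: Callable[[str], bool]) -> int:
    """1-based position of the first character satisfying predicate, else 0."""
    for pos, ch in enumerate(value, start=1):
        if predicate(ch):
            return pos
    return 0


def anydigit(value: str) -> int:
    return first_position(value, lambda ch: ch in string.digits)


def notdigit(value: str) -> int:
    return first_position(value, lambda ch: ch not in string.digits)


def anyalpha(value: str) -> int:
    return first_position(value, lambda ch: ch in string.ascii_letters)


def notalpha(value: str) -> int:
    return first_position(value, lambda ch: ch not in string.ascii_letters)


def anypunct(value: str) -> int:
    return first_position(value, lambda ch: ch in string.punctuation)


value = "AB-123"
print(anyalpha(value), notalpha(value))  # 1 3
print(anydigit(value), notdigit(value))  # 4 1
print(anypunct(value))                   # 3
```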

Want to learn more about these functions?  At MWSUG this year you can see how they can be used along with other common SAS functions to extract numbers from a text string or how to build ISO 8601 dates.

So please join me at the MidWest SAS Users Group Conference October 9 – 11 at the Hyatt Regency in downtown Cincinnati, Ohio. Register now for three days of great educational opportunities, 100+ presentations, training, workshops, networking and more.

Hope to see you there!

MWSUG preview: When ANY Function will just NOT do! was published on SAS Users.

Tablets, phablets, smartphones.

These mobile devices not only travel to different corners of the earth with their owners; they participate in certain adventures that can result in an unexpected turn of events.

Thanks to their mobility, these devices can be misplaced. And they could be found later. In rare cases, they can get lost. In the event that a user is separated from his or her mobile device, there are security mechanisms in place for protecting access to your organization’s server where data and reports reside.

Whether mobile devices accompany their SAS Mobile BI 7.33 users to the Himalayas or to the Sahara Desert, they certainly need to be tracked and managed by administrators. In my last blog, I talked about how an app-specific passcode protects access to the SAS Mobile BI app by preventing anyone other than the SAS Mobile BI user from accessing the app on the mobile device. Now, let’s take a look at how your administrators manage and protect access from the SAS Mobile BI app on your devices to connect to servers in your organization.

The SAS Visual Analytics 7.3 suite of applications includes the Administrator application with the Mobile Devices tab. The Mobile Devices tab is somewhat like an air traffic control system for an airport. Just as airplanes that land and take off are monitored and managed at the air traffic control tower by personnel, mobile devices that connect to your organization’s server with the SAS Mobile BI app are monitored and managed in the Administrator application’s Mobile Devices tab.

SAS Visual Analytics Administrator runs on the same server where your SAS Visual Analytics reports are stored and accessed. It maintains a logon history that gives your administrator details about mobile devices that logged on or attempted to log on to the server from the SAS Mobile BI app. For example, a timestamp indicates when a device connected to the server. A management history equips administrators with data on mobile devices that were whitelisted, blacklisted, or removed from either the whitelist or the blacklist.

### Managing Mobile Devices

Regardless of how many mobile devices are installed with the SAS Mobile BI app, security administration is required for every device that accesses data and reports on the server. Every mobile device has a unique identifier, and this unique identifier is used by SAS Visual Analytics Administrator to determine if the device is allowed to access the server.

To control mobile devices’ access to your organization’s server, your administrator manages server access by implementing either a whitelist or a blacklist from the server. By default, blacklisting is enforced on servers that are accessed by the SAS Mobile BI app.

### Inclusion Approach to Managing Devices

The whitelist scenario follows the inclusion approach. By default, you cannot connect to the server via SAS Mobile BI until your device ID is added to the whitelist by your administrator. If the unique device ID is added to the whitelist by your administrator, you can use the device to subscribe, view, and interact with SAS Visual Analytics reports (via the SAS Mobile BI app).

### Exclusion Approach to Managing Devices

The blacklist scenario follows the exclusion approach. By default, everyone can connect to the server from SAS Mobile BI on their mobile devices unless their device IDs are added to the blacklist by the administrator. Any device whose unique device ID is not on the blacklist can connect to the server from the SAS Mobile BI app. For instance, if you lose your mobile device, your administrator can go to the Logon History, select the device (listed by device ID, user name, etc.), and add it to the blacklist. After that, the device cannot be used to log on to the server from the app.
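The difference between the two approaches comes down to what happens to a device that appears on neither list. The Python sketch below models that decision. It is only a conceptual illustration: the function name, the parameter names, and the device IDs are hypothetical, and the assumption that only the whitelist is consulted when whitelisting is enforced is mine.

```python
from typing import Set


def can_connect(
    device_id: str,
    whitelist_enabled: bool,
    whitelist: Set[str],
    blacklist: Set[str],
) -> bool:
    """Decide whether a device may reach the server."""
    if whitelist_enabled:
        # Inclusion approach: denied unless explicitly listed.
        return device_id in whitelist
    # Exclusion approach (the default): allowed unless explicitly listed.
    return device_id not in blacklist


# Hypothetical device IDs.
whitelist = {"IPHONE-001"}
blacklist = {"TABLET-LOST"}

print(can_connect("NEW-DEVICE", False, whitelist, blacklist))   # True
print(can_connect("TABLET-LOST", False, whitelist, blacklist))  # False
print(can_connect("NEW-DEVICE", True, whitelist, blacklist))    # False
print(can_connect("IPHONE-001", True, whitelist, blacklist))    # True
```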

### The Easy Way to Switch from Blacklisting to Whitelisting

By default, blacklisting is enforced on the server that is accessed by the SAS Mobile BI app, and the viewerservices.enable.whitelist.support configuration property in SAS Management Console (SAS Configuration Manager for SAS Visual Analytics Transport Service) is set to false. If you are an administrator and wish to switch from blacklisting to whitelisting, the easiest way to do it is to select whitelisting in the Administrator’s Mobile Devices tab and add the device IDs to the whitelist. The viewerservices.enable.whitelist.support configuration property in SAS Management Console is then automatically updated and set to true. This is the easier method for switching from blacklisting to whitelisting because it does not require restarting the server. If you were to go to SAS Management Console first and set the viewerservices.enable.whitelist.support configuration property to true, you would need to restart the server.

### Request to Add Devices to a Whitelist

There are a couple of different ways that your administrator can obtain and add device IDs to a whitelist. If the unique device ID is already known to your administrator, he or she can easily add it to the whitelist in the Administrator application’s Mobile Devices tab. Alternatively, if you install the SAS Mobile BI app on a new mobile device that is not being managed from the server, the app can take you to your email with template text that includes your mobile device ID. Just send that email to your administrator to request server access from your mobile device.

### Suspending or Allowing Server Access from the App

Now here is my favorite part of device management. Access from SAS Mobile BI to the server, as we have just noted, is determined by either whitelist or blacklist management of devices, not by user accounts. This approach extends flexibility for SAS Mobile BI users. For example, I have an iPhone and a Galaxy Tab Pro tablet. I have the SAS Mobile BI app on both devices, and I use both of them to access the server, subscribe to SAS Visual Analytics reports, view them, and interact with them. If I happen to misplace my Galaxy Tab Pro tablet and can’t seem to find it, I notify my administrator so that server access from the app on this device can be removed.

My administrator, who follows the whitelist approach, removes my Galaxy Tab Pro’s unique device ID from the whitelist. Then, I can no longer use the SAS Mobile BI app on this device to log on to our server and subscribe to SAS Visual Analytics reports. However, I can continue to use my iPhone (which has remained in the whitelist) to log on to the server, subscribe, view and interact with reports on the server.

A few days later, I find my misplaced Galaxy Tab Pro tablet. I email my administrator indicating that the device is back in my possession. My administrator adds the unique device ID for my Galaxy Tab Pro back to the server’s whitelist. Voila! I am back in business, using SAS Mobile BI to connect to the server from my Android tablet.

### In a Nutshell

There are several mechanisms available for securing the SAS Mobile BI app, and managing device access to your organization’s servers is one of them. In my next blog, we will take a look at how tethering works to protect data access for SAS Visual Analytics reports from the SAS Mobile BI app.

Managing Server Access for SAS Mobile BI Users was published on SAS Users.

After posting a couple of blogs on the subject of dates and date formats in Visual Analytics Designer, I got a question from a user who wondered how to compare data for a selected date to data from the same day of the previous year. Here’s one way to do this.

The example report enables a user to type in a date value in a variety of formats and displays the sale amount for the specified date, along with the sale amount for the same day of the previous year.

The data source includes information on thousands of orders. Irrelevant data items have been hidden, with the items of interest shown below. Transaction Date has an associated MMDDYYYY format and Transaction Weekday is simply a duplicate of the date with an associated Day of Week format.

A parameter, Param date, is associated with the Parameter role of the Text input field and will store the value typed in the field.

Several calculated data items are created for ‘behind the scenes’ filtering. Ref Date is a data item that stores the date converted from the text that the user typed.

Note that the ANYDTDTE informat is a great one to use when you are uncertain as to exactly how users will be typing in a date value.
Ref Date (Yr-1) is the ‘same day a year ago’ date.
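
The calculated items themselves are built in the designer's expression editor. As a rough analogue only, the following DATA step sketch shows the same two ideas in Base SAS: reading a typed date with the ANYDTDTE informat, and shifting it back one year with the INTNX function. The variable names (`typed`, `ref_date`, `ref_date_prev`) and the sample strings are made up for illustration, and this is not the designer expression.

```sas
/* Sketch only: variable names and sample strings are assumptions */
data _null_;
  length typed $20;
  /* The same date typed three different ways */
  do typed = '03/15/2014', '15MAR2014', '2014-03-15';
    /* ANYDTDTE tries to work out the layout of the typed value */
    ref_date = input(typed, anydtdte20.);
    /* 'same' keeps the same day of the month, one year earlier */
    ref_date_prev = intnx('year', ref_date, -1, 'same');
    put typed= ref_date= date9. ref_date_prev= date9.;
  end;
run;
```

Ambiguous strings such as 03/04/2014 are interpreted according to the DATESTYLE system option, so it is worth checking how your site has that option set.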

A filter on the list table completes the report:
( Ref Date = Transaction Date ) OR ( Ref Date (Yr-1) = Transaction Date )
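
For readers who think in code, the same filter logic can be sketched in PROC SQL. Everything here is hypothetical: the `work.orders` table, its rows, and the `param_date` macro variable simply stand in for the report's data source and the Param date parameter.

```sas
/* Made-up rows standing in for the order data */
data work.orders;
  input transaction_date :mmddyy10. sale_amount;
  format transaction_date mmddyy10.;
  datalines;
03/15/2014 120
03/15/2013 95
03/16/2014 80
;
run;

/* Hypothetical stand-in for the value typed into the text input */
%let param_date = 03/15/2014;

proc sql;
  /* Keep only the reference date and the same day one year earlier */
  select transaction_date, sum(sale_amount) as sale_amount
  from work.orders
  where transaction_date = input("&param_date", anydtdte20.)
     or transaction_date = intnx('year', input("&param_date", anydtdte20.), -1, 'same')
  group by transaction_date;
quit;
```

With these sample rows, the query keeps the two March 15 rows and drops the March 16 row.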
The data items below are now used to populate the report showing the data for the specified date for comparison with the data for the same day of the previous year.

With the addition of a few more calculated data items and filters, some additional report objects can offer alternate ways of displaying the data.
Prod Sale (ref date) and Prod Sale (Yr-1) are calculated as below:

The addition of the two new data items allows the information to be presented in the list table below, with this filter applied: (Prod Sale (ref date) NotMissing OR Prod Sale (Yr-1) NotMissing)

The same filter (Prod Sale (ref date) NotMissing OR Prod Sale (Yr-1) NotMissing) can be applied to a crosstab object to produce the result below:

Adding one more calculated data item, Date, and a new crosstab object with a filter on Date enables yet another display.

I found this to be an interesting example, both the problem and the solutions. I hope this blog gives you more ideas about using your dates to the best advantage in your reports.

If you are using the second maintenance release of SAS 9.3 (TS1M2) or later, you might have noticed that you have several map-related libraries that are defined for you.

• The MAPS library contains the old map data sets that have been provided with SAS/GRAPH® software for many years.  The source for these data sets was mainly freely available data or purchased data. As a result, it became difficult or impossible to provide updates to this data.
• The MAPSGFK library contains new map data sets that are licensed through GfK Geomarketing and that are provided as part of SAS/GRAPH software.
• The MAPSSAS library points to the same location as the MAPS library.

### Determining which library to use

You should use the MAPSGFK data sets to produce your maps.  There are several advantages to using the MAPSGFK data sets:

• The older MAPS library data sets contain outdated data, and this library will not be updated.
• The MAPSGFK data is updated more frequently.
• The MAPSGFK data sets standardize the variables in the data set. For example, the X and Y variables always contain the projected values, and LONG and LAT always contain the unprojected values.

Each of the data sets also contains the ID variable, as shown in the following example, to enable you to create a map without knowing about the boundaries that are contained in the data set.

```sas
/* The ID variable links the attribute (response) data to the map data */
proc gmap data=mapsgfk.ireland_attr map=mapsgfk.ireland;
  id id;
  choro idname;
run;
quit;
```

As mentioned above, the MAPSGFK library data sets contain both projected and unprojected values, which is helpful when you use annotation with maps or when you create a subset of a map.

• The MAPSGFK.PROJPARM data set contains the parameters that were used when the projected data was created for each of the data sets.  You can use these parameters with the GPROJECT procedure when you project an annotate data set. For more information about projecting an annotate data set using MAPSGFK.PROJPARM, see the sample code that appears in the section Code to Project Annotate Data with a GfK Map Data Set ("Chapter 37: GMAP Procedure") in SAS/GRAPH® 9.4: Reference, Fourth Edition.
• Many of the MAPSGFK data sets include a lower level of hierarchy for boundaries than was available previously in the map data sets, as shown in this example:
```sas
proc gmap data=mapsgfk.africa1 map=mapsgfk.africa1;
  id id;
  choro id / nolegend;
run;
quit;
```

This code sample generates the following output:

You can use the GREMOVE procedure to remove internal boundaries that you don’t want as part of your map.

• In earlier releases, the names of the map data sets in the MAPS library were limited to eight characters. The MAPSGFK data sets do not have that restriction, so you can use longer names. The longer data-set names enable you to determine the map data-set content more easily.
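
To illustrate the GREMOVE step mentioned above, here is a minimal sketch that removes county boundaries to leave state outlines. It assumes that the MAPSGFK.US_COUNTIES data set contains STATE and COUNTY variables; confirm the variable names with PROC CONTENTS before relying on it. The `work.counties` and `work.states` names are arbitrary.

```sas
/* Assumption: MAPSGFK.US_COUNTIES has STATE and COUNTY variables */

/* BY-group processing needs the map sorted by the boundary to keep */
proc sort data=mapsgfk.us_counties out=work.counties;
  by state;
run;

/* ID names the current (lower-level) unit areas; BY names the new ones */
proc gremove data=work.counties out=work.states;
  by state;
  id county;
run;

/* Draw the result: one polygon set per state, no county lines */
proc gmap data=work.states map=work.states;
  id state;
  choro state / discrete nolegend;
run;
quit;
```

The same pattern should apply to other MAPSGFK data sets that carry more than one boundary level, with the BY and ID variables changed to match.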

### Determining which MAPSGFK data sets to use

To help you determine which data set to use to create your map, you can use dictionary tables in the SQL procedure to generate a list of the data sets that are contained in the MAPSGFK library, along with their associated labels.

This example illustrates how you can query the dictionary table for selected columns (in this case, MEMNAME and MEMLABEL):

```sas
/* List each MAPSGFK data set with its label */
proc sql;
  select memname, memlabel
  from dictionary.tables
  where libname='MAPSGFK';
quit;
```

The following output is a partial display of the results:

The data sets that end in _ATTR are the attribute data sets for the map data set of the same base name. You can use the attribute data sets to obtain names and other information that is associated with variables in the map data set.

You can find more information about the contents of the data sets by using the DICTIONARY.COLUMNS dictionary table in PROC SQL.

```sas
/* List the variables in the MAPSGFK.NAMERICA data set */
proc sql;
  select name, type, length, label
  from dictionary.columns
  where libname='MAPSGFK' and memname='NAMERICA';
quit;
```

The following output shows the results:

In some cases, changing your existing PROC GMAP code to use the MAPSGFK data sets rather than the MAPS data sets might be as simple as changing the data set name in the MAP= option of the PROC GMAP statement. In other cases, changing data sets can involve changing the variables that are listed in the ID statement of PROC GMAP, adding a GREMOVE procedure step to remove a lower-level map boundary, or removing or modifying a GPROJECT procedure step. You can find tips about modifying your existing PROC GMAP code to work with MAPSGFK data sets in the section Using GfK Map Data Sets with Existing Code ("Chapter 37: GMAP Procedure") in SAS/GRAPH® 9.4: Reference, Fourth Edition.

You can find more information about the new map data sets and modifying your existing map programs to use the MAPSGFK data sets in the SAS Global Forum paper The New SAS® Map Data Sets (by Darrell Massengill) and in the SAS/GRAPH Concepts section of SAS/GRAPH® 9.4: Reference, Fourth Edition.

MAPS, MAPSGFK and MAPSSAS, Oh my! was published on SAS Users.

Reference lines on a visualization help identify goals or targets, acceptable or unacceptable ranges, and so on: basically, any metric that puts a frame of reference around the values on the visualization.

The Percent of Total of a metric is used to help identify a part-to-whole relationship. It answers the question, how much of the whole does this piece represent?

In this blog, let’s take a look at how you can use both the Percent of Total metric and Reference Lines to enhance your data visualizations using SAS Visual Analytics.

### Example 1

In this section, we are reporting on the Percent of Total for Revenue. First, look at the single select List control object. You’ll see I have displayed the available Product Lines and their corresponding frequency percent. This allows the report viewer to quickly understand the number of rows associated with that Product Line when compared to the whole of the data.

Next, under the List control object, we have a Stacked Bar Chart graphing the Revenue (Percent of Total) which allows the report viewer to understand the part-to-whole relationship of the Products that make up that Product Line. While we can clearly see that the Board Product, colored in blue, is outperforming the other two Products, it may be difficult to tell which remaining Product is pulling in the higher Revenue.

The Grouped By Bar Chart in the middle of the report can be used to quickly compare the performance of each Product. I added a reference line so that the report viewer can quickly identify which Products are pulling in more than 25% of the Revenue for that Product Line.

Here are some additional views from this report:

In this view, we have selected the Action Figure Product Line, which makes up 57.32% of the Frequency Percent. We can see that not one individual Product in the Action Figure Product Line reaches 25% of the Revenue’s Percent of Total and that they are all hovering near 15%. By using the Revenue (Percent of Total) metric, all of the data is normalized, and the static reference line allows for quick and easy comparison across all of the Product Lines.
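
Outside of the designer, the arithmetic behind this view can be sketched in PROC SQL: each Product's revenue is divided by the total for its Product Line, and the result is compared with the 25% reference value. The `work.sales` table, its column names, and its rows are invented for illustration and are not the report's data.

```sas
/* Made-up revenue rows: two product lines with a few products each */
data work.sales;
  length product_line $12 product $12;
  input product_line $ product $ revenue;
  datalines;
LineA Prod1 600
LineA Prod2 250
LineA Prod3 150
LineB Prod4 500
LineB Prod5 500
;
run;

proc sql;
  /* sum(revenue) is computed per product line and remerged onto each row */
  select product_line,
         product,
         revenue,
         revenue / sum(revenue) as pct_of_total format=percent8.1,
         case when revenue / sum(revenue) > 0.25 then 'Yes' else 'No' end
           as above_reference
  from work.sales
  group by product_line;
quit;
```

Because every Product Line is rescaled so that its products sum to 100%, the same 0.25 threshold can be applied to each line regardless of how much revenue that line brings in. SAS writes a note to the log about remerging summary statistics, which is expected here.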

And in this view, for the Promotional Product Line, which makes up 7.05% of the Frequency Percent of the data, we can see that a few Products are outperforming the others. Because these are promotional products, this can be expected as trends and styles fluctuate.

After examining this report about Revenue (Percent of Total), you can easily think of other reports that would be useful to an organization. For the high and low Revenue (Percent of Total) values, is the number of employees assigned to the Products and/or Product Lines appropriate? What about the Operational and Material Expenses for your most and least revenue-generating Products and/or Product Lines? Is the spending reasonable? Are the Product Material Costs justified?

Using a part-to-whole comparison visualization can help identify other areas of business that may need further investigation.

### Example 2

Here is another report example using the Revenue (Percent of Total) metric with reference lines. In this example, we plotted Revenue (Percent of Total) against the months of the year. Here we can see how the Revenue (Percent of Total) is dispersed across the months. I’ve also added a

Because the Promotional Product Line shows the most fluctuation, we can see the breakdown of the Revenue (Percent of Total) and how it maps to the different months. As with the other report, this can lead to additional reports that ask whether the spikes are seasonal or that pair these findings with other market events. The Revenue (Percent of Total) for iPhone Covers is higher in July 2011. Was there a new iPhone release that month? It may also be good to learn what was happening in April 2011, when the Revenue (Percent of Total) for Backpacks increased.

Combining the results of these reports with other groups in the organization can help determine which business decisions are having the desired impact on the bottom line results. Are the marketing strategies effective? Are the planned expense reductions being met? Are we making better use of our product material waste?

Reference lines can help by making it easy to quickly identify whether or not targets are being met. And by using the Revenue (Percent of Total), a single reference line can be used across several categories since the scale has been adjusted to 100%.