Tech

9月 212016
 

mwsug-2016-logoI started out as a Psychology major. During my third year as an undergraduate, I was hired on as a research assistant for my advisor in her cognitive psychology lab. Through this and progressively more complicated psychological research experience, I quickly grew to love statistics. By the end of that year, I decided to declare it as a second major. My first introduction to SAS was as a fourth-year undergraduate psychology student - still new to my statistics degree curriculum and working on a large-scale meta-analysis project spanning years of data. I had never programmed before seeing my first SAS procedure. I broke down in tears, terrified at what I had gotten myself into. I toughed it out (with help from my statistics professor), finished my psychology honors thesis with top grades and went on later to use SAS in my statistics thesis for good measure.

About a year later, in 2011, that same statistics professor encouraged us to submit our work for presentation at MWSUG, sweetening the deal with a promise of extra credit if we did. I hopped on that opportunity and submitted both my psychology thesis as well as my statistics thesis that night. A couple of months later, I received an email…they accepted both of my papers and awarded me a FULL student scholarship to attend!

I have come a long way from presenting my first thesis projects (I just arrived home from my 27th conference last weekend). I have learned to love not only SAS, but the statistics behind each procedure. This year, at MWSUG 2016 in Cincinnati, OH. I will be presenting 3 projects. One project will be in ePoster format. As the chair of this section (yes, this is correct. I’ve gone from terrified student to a section chair!), I felt the need to support it with my own research as well. This project is dedicated to the common and very pesky concept of Multicollinearity.

What is Multicollinearity? Why, it is precisely the statistical phenomenon wherin there exists a perfect or exact relationship between the identified predictor variables and the interested outcome variable. Simply put, it is the existence of predictor co-dependence. Coincidently, it is quite easy to detect. You can do so with three very simple to utilize options and one procedure, such as those given in the below example:

/* Examination of the Correlation Matrix */
Proc corr data=temp;
Var hypertension aspirin hicholesterol anginachd smokingstatus obese_BMI exercise _AGE_G sex alcoholbinge; Run;
 
/* Multicollinearity Investigation: VIF TOL COLLIN */
Proc reg data=temp;
Model stroke = hypertension aspirin hicholesterol anginachd smokingstatus obese_BMI exercise _AGE_G sex alcoholbinge / vif tol collin;    
Run;     Quit;

Through the CORR procedure, we can examine the correlation matrix and manually check for any predictor variables that show high correlation with other variables. In our REG procedure, we can indicate the VIF, TOL, and COLLIN options in the MODEL statement to pull information measuring the variance inflation factor, tolerance, and collinearity.

Would you like to learn more about how to interpret the results produced by these procedures? Would you like to learn ways to control for multicollinearity after it has been detected? Come check out my poster at MWSUG 2016, October 9-11 in Cincinnati! I would love to chat about your multicollinearity issues, interest, or just other curious questions.

tags: MWSUG, papers & presentations, US Regional Conferences

Multicollinearity and MWSUG – a beautiful match was published on SAS Users.

9月 162016
 

ProblemSolversIf you use SAS® software to create a report that contains multiple graphs, you know that each graph appears on a separate page by default. But now you want to really impress your audience by putting multiple graphs on a page. Keep reading because this blog post describes how to achieve that goal.

With newer versions of SAS, there are many different options for putting multiple graphs on a single page. This blog post details these different options based on the following ODS destinations: ODS PDF, ODS RTF, and ODS HTML destinations.

To put multiple graphs on a page (whether you are using ODS or not), the SAS/GRAPH® procedure PROC GREPLAY is typically a good option and is mentioned several times in this blog post. For detailed information about using PROC GREPLAY, see SAS Note 48183: “Using PROC GREPLAY to put multiple graphs on the same page.”

The ODS PDF Destination

The ODS PDF destination is the most commonly used ODS destination for putting multiple graphs on a single page and it also offers the most options, which are described below.

STARTPAGE=NO Option

One way to put multiple graphs on a single PDF page is to use the STARTPAGE=NO option in the ODS PDF statement. Here is sample SAS code that demonstrates how to stack two SGPLOT graphs vertically on the same PDF page using the STARTPAGE=NO option in the ODS PDF statement:


title1; 
ods _all_ close; 
ods pdf file='c:tempsastest.pdf' startpage=no notoc dpi=300;

ods graphics / reset=all height=4in width=7in;  
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="Audi";
run;
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="BMW";
run;

ods pdf close;
ods preferences;  

Note that in the code above, the vertical height of the graphics output is reduced to 4 inches using the HEIGHT option in the ODS GRAPHICS statement. When using a traditional SAS/GRAPH procedure (such as GPLOT), you must specify the vertical height of your graph output using the VSIZE option in a GOPTIONS statement, as shown here:

goptions vsize=4in;

For sample code that demonstrates how to use the STARTPAGE=NO option in the ODS PDF statement to put four graphs on the same PDF page (two across and two down), see SAS Note 48569: “Use the STARTPAGE option in the ODS PDF statement to put multiple graphs on a single page in a PDF document.”

ODS LAYOUT

Another option for putting multiple graphs (and tables) on the same PDF page is ODS LAYOUT. With ODS PDF, absolute layout allows you to define specific regions on the page for your graph and table output. For sample code that demonstrates how to put multiple graphs and tables on the same PDF page using ODS LAYOUT, see SAS Note 55808: “ODS LAYOUT: Placing text, graphs, and images on the same PDF page.”

PROC GREPLAY

You can also put multiple graphs on the same page in a PDF document using the SAS/GRAPH procedure PROC GREPLAY. While the GREPLAY procedure has been around a long time, it is still a very useful option in many situations.

However, note that PROC GREPLAY can only replay graphics output that has previously been written to a SAS/GRAPH catalog, so it has the following limitations:

  • It cannot directly replay graphics output created with the SG procedures and ODS Graphics.
  • It cannot directly replay text-based output such as that created with Base® SAS procedures like PROC PRINT and PROC REPORT.

For sample SAS code that demonstrates how to put four GCHART graphs on the same PDF page using PROC GREPLAY, see SAS Note 44955: “Use PROC GREPLAY with the ODS PDF statement to place four graphs on the same page.”

For sample SAS code that demonstrates how to put six GCHART graphs on the same PDF page using PROC GREPLAY, see SAS Note 44973: “Use PROC GREPLAY to place size graphs on a single page in a PDF document.”

Although you can use PROC GREPLAY with graphics output created with the SG procedures and ODS Graphics, you must use the three-step process demonstrated in SAS Note 41461: “Put multiple PROC SGPLOT outputs on the same PDF page using PROC GREPLAY.”

The ODS RTF Destination

The following sample code demonstrates how to stack multiple graphs vertically on the same page using the SGPLOT procedure in combination with the STARTPAGE=NO option in the ODS RTF statement:

options nodate nonumber; 
 
ods _all_ close; 
ods rtf file='sastest.rtf' startpage=no image_dpi=300; 
 
ods graphics / reset=all outputfmt=png height=3in width=7in; 
 
title1 'Graph for Audi'; 
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="Audi";
run;
 
title1 'Graph for BMW'; 
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="BMW";
run;
 
title1 'Graph for Volvo'; 
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="Volvo";
run;
 
ods _all_ close; 
ods preferences;

Because ODS LAYOUT is not fully supported with the ODS RTF destination, I recommend using PROC GREPLAY if you want to arrange multiple graphs on the same RTF page in a grid. For sample SAS code that demonstrates how to put four graphs on a single page in an RTF document, see SAS Note 45147: “Use PROC GREPLAY to place four graphs on a single page in an RTF document.”

For sample SAS code that demonstrates how to put six graphs on a single page in a RTF document, see SAS Note 45150: “Use PROC GREPLAY to place six graphs on a single page in an RTF document.”

The ODS HTML Destination

When you create multiple graphs with the ODS HTML destination, they are stacked vertically on the same web page by default. You can scroll up and down through the graphics output using your web browser’s scroll bar. In most situations, using PROC GREPLAY is a good option for displaying multiple graphs on a single web page.

Another option is to use the HTMLPANEL ODS tagset to display a panel of graphs via the web. For documentation about using the HTMLPANEL tagset, see “The htmlpanel Tagset Controls Paneling.” For sample code that demonstrates how to use the HTMLPANEL ODS tagset to display multiple graphs and tables via the web in a two-by-two grid, see SAS Note 38066: “Use the HTMLPANEL ODS tagset to put multiple graphs and tables on the same web page.”

In conclusion, this blog post covers just a few of the methods you can use to put multiple graphs on a page. There are more options available than those discussed above. For example, for sample code that puts multiple graphs on the same page using PROC SGPANEL, see SAS Note 35049: “Risk panel graph.” For sample code that puts two graphs side-by-side using Graph Template Language and PROC SGRENDER, see SAS Note 49696: “Generate side-by-side graphs with Y and Y2 axes with the Graph Template Language (GTL).”

tags: ods, Problem Solvers, SAS ODS, SAS Programmers

Do you need multiple graphs on a page? We have got you covered! was published on SAS Users.

9月 162016
 

The SAS Quality Knowledge Base (QKB) is a collection of files which store data and logic that define data cleansing operations such as parsing, standardization, and generating match codes to facilitate fuzzy matching. Various SAS software products reference the QKB when performing data quality operations on your data. One of these products is SAS Event Stream Processing (ESP). SAS ESP enables programmers to build applications that quickly process and analyze streaming events. In this blog, we will look at combining the power of these two products – SAS ESP and the SAS QKB.

SAS Event Stream Processing (ESP) Studio can call definitions from the SAS Quality Knowledge Base (QKB) in its Compute window. The Compute window enables the transformation of input events into output events through computed manipulations of the input event stream fields. One of the computed manipulations that can be used is calling the QKB definitions by using the BlueFusion Expression Engine Language function.

Before QKB definitions can be used in ESP projects the QKB must be installed on the SAS ESP server. Also, two environment variables must be set: DFESP_QKB and DFESP_QKB_LIC. The environment variable DFESP_QKB should be set to the path where the QKB data was installed. The environment variable DFESP_QKB_LIC should be set to the path and filename that contains the license(s) for the QKB locale(s).

In this post, I will explore the example of calling the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale in the ESP Compute window. The Source window is reading in events that contain US State data that may or may not be standardized in the 2-character US State abbreviation.

SASEventStreamProcessing

As part of the event stream analysis I want to perform, I need the US_State values to be in a standard format. To do this I will utilize the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale.

First, I need to initialize the call to the BlueFusion Expression Engine Language function and load the ENUSA (English – United States) locale. Note: The license file that the DFESP_QKB_LIC environment variable points to must contain a license for this locale.

SASEventStreamProcessing02

Next, I need to call the QKB definition and return its result. In this case, I am calling the BlueFusion standardize function. This function expects the following inputs: Definition Name, Input Field to Standardize, Output Field for Standardized Value. In this case the Definition Name is State/Province (Abbreviation), the Input Field to Standardize is US_State, and the Output Field for the Standardized Value is result. Note: The field result was declared in the Initialize expression pictured above. This result value is returned in the output field named US_State_STND.

SASEventStreamProcessing03

I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.

SASEventStreamProcessing04

Here is the XML code for the SAS ESP project reviewed in this blog:

SASEventStreamProcessing05

Now that the US_State values have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.

For more information, please refer to the product documentation:
SAS Event Stream Processing
DataFlux Expression Engine Language
SAS Quality Knowledge Base

tags: DataFlux Data Management Studio, SAS Event Stream Processing, SAS Professional Services

Using SAS Quality Knowledge Base Definitions in a SAS Event Stream Processing Compute Window was published on SAS Users.

9月 132016
 

SAS releases regular updates to software products in the form of hot fixes and maintenance releases. Hot fixes are SAS' timely response to customer-reported problems, as well as a way to deliver occasional security-related updates that can affect any software product.

At SAS we call them "hot fixes." Other companies might call these "updates," "patches" or (the old-timey term) "zaps." I like the term "hot fix," because it connotes 1) a timely release of code, and 2) something that you apply to an established production system (while it's running "hot").

An important part of the hot fix process is learning when there is a new fix available. Beginning this month, we now have a new SAS Hot Fix Announcements board -- on SAS Support Communities -- where you can learn about newly available fixes. If you're a SAS administrator, these notices provide more complete and detailed information about available hot fixes. And if you are a business user, it behooves you to stay informed about fixes so you can pass information on to your IT department.

Find your fixes: by e-mail, RSS feeds or search

Subscribe optionBy using the community board to host hot fix notices, we've now provided you with many options to subscribe to this content. Like any community board, you can click Subscribe and have the notifications e-mailed to you as soon as they are posted. Or you can use your favorite RSS reader to peruse the latest entries when the timing is right for you. Finally, you can use the Search field on the hot fix board to find just the fixes you need. Simply type a SAS product name, a version number, or text from a SAS Note to find the matching hot fixes. Visit this topic on the SAS Support Communities to learn more about how to pull the information you need, or to have it pushed to you automatically.

Here's a picture of the Hot Fix RSS feed in my instance of Microsoft Outlook:

Hot fix RSS feed in Microsoft Outlook

Hot fix RSS feed in Microsoft Outlook

TSNEWS-L: it has served us well

For decades, SAS Technical Support has used the TSNEWS-L mailing list to send brief, weekly summaries of the recent hot fixes. While this is good for raising awareness that "a fix is available" for a product you use, the information in the e-mail contains no details about what issues each hot fix addresses. Nor does it include a direct link to download the fix. For now, this information will still be delivered via TSNEWS-L -- but we hope that you find the community board to be more flexible.

A peek behind the hot fix curtain

It shouldn't surprise you that we use SAS software to help bring you the details about SAS fixes.

To build these improved notfications, the SAS Communities team and SAS Technical Support worked together to automate the publishing process for hot fix bulletins. All of the hot fix details were already tracked within internal-only operational databases and displayed on the Technical Support Hot Fix download site. Now, a SAS-based process assembles this data, creates a hot fix bulletin entry for each fix and publishes the entry automatically to SAS Support Communities. It's a wonderful mix of SAS data management, REST APIs, DATA step and more. It's a 300-line SAS program (including plenty of comments) that runs every weekday morning -- and it completes in about 15 seconds. It's time well-spent to keep SAS users in the know!

tags: SAS Administrators, sas communities, SAS Technical Support

How to stay informed about SAS hot fixes was published on SAS Users.

9月 092016
 

If you’re doing data processing in the cloud or using container-enabled infrastructures to deploy your software, you’ll want to learn more about SAS Analytics for Containers. This new solution puts SAS into your existing container-enabled environment – think Docker or Kubernetes – giving data scientists and analysts the ability to perform sophisticated analyses using SAS, and all from the cloud.

The product’s coming out party is the Analytics Experience 2016 conference, September 12-14, 2016 at the Bellagio in Las Vegas. In advance of that event, I sat down with Product Manager Donna DeCapite to learn a little more about SAS Analytics for Containers and find out why it’s such a big deal for organizations who use containers for their applications.

Larry LaRusso: Before we get into details around the solution, and with apologies for my ignorance, let me start out with a really basic question. What are containers?

Donna DeCapite: Cloud containers are all the rage in the IT world. They’re an alternative to virtual machines.  They allow applications and any of its dependencies to be deployed and run in isolated space. Organizations will build and deploy in the container environment because it allows you to build only the necessary system libraries and functions to run a given piece of software. IT prefer it because it’s easy to replicate, and it’s faster and easier to deploy.

LL: And SAS Analytics for Containers will allow organizations to run SAS’ analytics in this environment, the containers?

DD: In short, yes. SAS Analytics for Containers provides a powerful set of data access, analysis and graphical tools to organizations within a container-based infrastructure like Docker. This takes advantage of the build once, run anywhere flexibility of the container environment, making it easier and faster to use SAS Analytics in the cloud.

LL: Who, within an organization, will be the primary users?

DC: Really anyone working with containers in the cloud or anyone working with DevOps. Data scientists will embrace SAS Analytics for Containers because it allows them to access data from nearly any source and easily perform sophisticated analyses using SAS Studio, our browser-based interface, or Jupyter Notebook, an open source notebook-style interface. For SAS developers, the product allows them to quickly provision IT resource to sandbox development ideas. And as I mentioned earlier, IT will appreciate the ease with which applications can be deployed, distributed and managed.

LL: You mentioned data scientists, so now I’m curious; we’re talking complex analyses here, yes?

DC: Definitely. Regressions, decision trees, Bayesian analysis, spatial point pattern analysis, missing data analysis, and many additional statistical analysis techniques can be performed with SAS Analytics for Containers. And in addition to sophisticated statistical and predictive analytics, there are a ton of prebuilt SAS procedures included to handle common tasks like data manipulation, information storage, and report writing, all available via SAS Studio and its assistive nature.

LL: What about organizations that have massive amounts of data? Can they use SAS Analytics for Containers as well?

DC: Yes, SAS Analytics for Containers allows you to take advantage of the processing power of your Hadoop Cluster by leveraging the SAS Accelerators for Hadoop like Code and Scoring Accelerator.

LL: Thank you so much for your time Donna. I know you’ve educated me quite a deal! How about more information. Where can individuals learn more about SAS Analytics for Containers?

DC: If you’re going to Analytics Experience 2016, consider coming to Michael Ames’ table talk, Cloud Computing: How Does It Work? It’s scheduled for Monday, September 12 from 4:30-5:00pm Vegas time. If you’re not going to the event, the SAS website is the best place for more information. Probably the best place to start is the SAS Analytics for Containers home page.

tags: analytics conference, analytics experience, cloud computing, SAS Analytics for Containers, sas studio

Analytics in the Cloud gets a whole lot easier with SAS Analytics for Containers was published on SAS Users.

9月 092016
 
How the SAS Global Forum Presenter Mentoring Program can help would-be presenters

Stephanie Thompson, Datamum, Presenter Mentoring Lead

For a SAS professional, presenting at SAS Global Forum 2017 can be a very rewarding. It can enhance your conference experience, help expand the knowledge of the broader SAS community, and advance your career by putting you on display as an expert SAS professional. It can also be a little scary, especially if you’ve never presented to an audience of SAS peers, or you have, but still get nervous thinking about the process of submitting an idea, preparing your talk and presenting it live.

Luckily, your SAS Global Forum Executive Board has created an incredible resource available to help would-be presenters: the SAS Global Forum Presenter Mentoring Program. I recently sat down with Stephanie Thompson and Cindy Wilson, the mentoring program leads, to learn a little more about this awesome service and how it can help presenters feel great about their upcoming presentation.

Cindy

Cindy Wilson, Eli Lilly, Presenter Mentoring International Focus

Larry LaRusso: I have an idea for a paper to submit for SAS Global Forum but I am not sure how to put this idea into a submission. Can the mentoring program help?

Cindy Wilson:  It sure can! You can get help in preparing your abstract submission through the Presenter Mentoring Program. But, that’s just the start of the service. Presenter Mentors can help with all aspects of your submission.

LL: And that means help developing an abstract concept right through putting together the final presentation?

CW: Yes. Presenter Mentors will help you develop your concept for consideration by the conference team. If your submission is accepted, they will help you right up through the conference. Help using the paper template, presentation tips, focusing your paper on conveying how you used SAS to solve your problem, and even tips for getting the most out of SAS Global Forum are all ways a Presenter Mentor can help.

LL: Can anyone, anywhere with an idea request a mentor?

Stephanie Thompson: Over the years, I have worked with potential presenters from all over the U.S. and the world. Everyone with an idea is welcome to request assistance.

LL: What types of SAS users serve as mentors?

CW: Presenter Mentors are seasoned SAS users and SAS Global Forum presenters who are willing to share their knowledge with those needing help. They come from many types of industries and have experience in all areas of SAS. Some focus on breakout sessions and others e-Posters, so all types of presentations are covered. Mentees are matched with the Presenter Mentor who best fits in terms of the topic of the paper, industry area, and communication preferences.

LL: Will mentors edit my paper?

ST: The purpose of the Presenter Mentoring Program is really to help potential presenters develop their ideas and bring out the SAS of their work. The paper is the work of the submitter and Presenter Mentors want the author’s voice to come through. Editing is not the focus of the program. However, there have been occasions where I have worked with authors whose first language is not English. If help editing is requested, I have helped as much as I could. This is something the mentor and mentee would discuss. Presenter Mentors are SAS experts and not professional editors, certainly something to keep in mind.

LL: Tell me a little about your experience as a Presenter Mentor for past SAS Global Forums.

ST: I have really enjoyed working with presenters across the world. So many people have some great ideas to share and sometimes they just need a little help putting it all together.   Several mentees that I have worked with have become regular presenters at SAS Global Forum and I have kept in touch with a few others over the years. I truly enjoy getting to meet them at the conference after working with them beforehand.

LL: Who do you think can benefit from a Presenter Mentor?

CW: Students who want to transform their research into a SAS Global Forum paper can benefit. Many times the research results are interesting but the focus of the conference paper needs to be how you used SAS to get those results. First-time attendees can also benefit by learning what the norms and expectations are for submissions. Once you have been through the process once, it is easier the next time. Mentors help make that first experience go smoothly. Lastly, anyone who just needs a second set of eyes or a little help getting their ideas across can benefit from the program. No one is turned away.

tags: SAS Global Forum, sas global forum presenter mentoring program

How the SAS Global Forum Presenter Mentoring Program can help would-be presenters was published on SAS Users.

9月 072016
 

Calculate a moving average using periodic operators in SAS Visual AnalyticsThumbAnalysts often use a simple moving average to get an idea of the trends in data. This is simply an average of a subset of time periods, and the size of the subset can differ depending on the application. The technique can be used with data based on time periods, such as sales data, expense data, telecom data, or stock market data. The average is called ‘moving’ because it is continually recomputed as more data becomes available. This type of average is also called a ‘rolling average’ or ‘running average’. In this post, I’ll share a little bit about how to use the periodic operators in SAS Visual Analytics Designer to calculate a simple moving average.

The report below, created in the designer, shows the summary of the Amount column by month. The Three-Month Moving Average column displays the average of a month and the previous two months’ Amount sums. The 3-month sum is simply divided by 3.

Calculate a moving average using periodic operators in SAS Visual Analytics

This Three-Month Moving Average column is an aggregation, calculated using the RelativePeriod Periodic operator.  Both the visual and the text forms of the aggregation are shown below:

Calculate a moving average using periodic operators in SAS Visual Analytics2

Calculate a moving average using periodic operators in SAS Visual Analytics3

The RelativePeriod operator returns aggregated values - sum of Amount, in this case - relative to the current period – in this case, the previous month. The data item for the period calculation is Month, which is a date value with an associated YYYYMM format. The interval is _ByMonth_, and the 0, -1, and -2 offset values represent the current month, the previous month, and the month before the previous, respectively. The division by three creates the 3-month moving average, but the number of RelativePeriod expressions and the divisor could be adjusted to calculate an average based on any number of months.

The report below displays a 3-Year Moving Average column.

Calculate a moving average using periodic operators in SAS Visual Analytics4

In this report, the RelativePeriod operator is used to calculate the three-year moving average. The data item for the period calculation is Year, which is a date value with an associated Year format.

Calculate a moving average using periodic operators in SAS Visual Analytics5

The interval is _ByYear_, with the 0, -1, and -2 offset values representing the current year, the previous year, and the year before the previous, respectively. Again, the number of RelativePeriod expressions and the divisor could be adjusted to calculate an average based on any number of years.

Calculate a moving average using periodic operators in SAS Visual Analytics6

Calculate a moving average using periodic operators in SAS Visual Analytics7

The ParallelPeriod operator is used in the report below to display a 3-month moving average based on amounts corresponding to the same month in each of the three years. This report, of course, has missing values for all months up until the third year of data.

Calculate a moving average using periodic operators in SAS Visual Analytics8

Calculate a moving average using periodic operators in SAS Visual Analytics9

The first moving average value to be calculated is based on data from January 2010, 2011, and 2012. The measure is Amount and the periodic item for the aggregation Month. The inner interval for the aggregation is _ByMonth_ and the outer interval is _ByYear_. The 0, -1, and -2 offset values represent months of the current year, the previous year, and the year before the previous, respectively. These ParallelPeriod expressions and the divisor could also be adjusted to calculate an average based a different divisor.

Calculate a moving average using periodic operators in SAS Visual Analytics10

Calculate a moving average using periodic operators in SAS Visual Analytics11

Do keep in mind that for all of these periodic aggregations the ‘aggregation by’ column representing month or year must be included in the report.

I hope these examples using the periodic operators will be helpful to you in creating your Visual Analytics reports.

tags: business intelligence, SAS Professional Services, SAS Visual Analytics

Calculate a moving average using periodic operators in SAS Visual Analytics Designer was published on SAS Users.

9月 062016
 

For many people, building something from scratch, no matter how simple or complex, is fascinating. That’s why programs similar to How’s It Made are so appealing and, for me, addicting. And thus, the inspiration for this blog; I will walk you through building a set of graphs and how to improve each visualization through my own personal iterative process. Like all forms of art, a visualization is never complete, as constant improvement, tweaking and alterations are required to accommodate the constant influx of data and the ever changing needs of our audience.

These graphs use telecom data about cell phone network service including call duration and data usage.

Example 1: Calls versus Drops

In this first example, I noticed that the data contained the number of calls and the number of dropped calls. Like most analytics, audiences are interested in the outliers. In this case, we look at the poor performing occurrence of a call being dropped. This data would prove useful if a company wanted to research poor performing cell technology either of the handset itself or of the cell towers. It could also be used to find any dead zones, where additional towers may need to be added. In this example, I decided to plot the data against the 24-hour day to determine if volume of calls impacted the number of calls dropped.

Example 1: Iteration 1
Naturally, I started with a bar chart visualization. I plotted the hours of the day (24-hour scale) on the x-axis and the number of calls and the number of dropped calls on the y-axis. At first glance, it looks like there is some variation in the number of dropped calls and the hour of the day.

Visualize cell phone data in SAS Visual Analytics

Example 1: Iteration 2
Since the number of calls and the number of drops are of the same scale, we can easily take the ratio of the two to plot the call drop rate. The equation is to take the number of dropped calls divided by the number of calls. I used an aggregated measure to create this ratio, which will be evaluated on the fly, depending on the Group By variable. In this example the “ByGroup” is the hour of the day, and I used a Percent format.
Visualize cell phone data in SAS Visual Analytics2

This now gives us one bar to evaluate against the hours of the day. We can see that the Call Drop Rate does not fluctuate as much as the previous graph could lead one to believe.

I also added a reference line at 5% to make it easier to see which hour of the day fell below or above the 5% rate.

Visualize cell phone data in SAS Visual Analytics3

Example 1: Iteration 3
Finally, I noticed that the data contained a Cell Technology category. I thought it would be interesting to see if a certain technology was more unreliable than another. To do this, I added Cell Technology to the graph. I liked the visualization the best when I changed the bar chart to a horizontal orientation, used a row lattice for the Cell Technology and kept the 5% reference line. This now gives me an enhanced “quick glance” comparison ability to see that the 4G Cell Technology seems to have the most consistently high Call Drop Rate (over 5%) for all hours of the day.
Visualize cell phone data in SAS Visual Analytics4

Example 2: Call Duration

Example 2: Call Duration
In this second example, I used the Voice_Seconds data item to study the duration of calls over the course of the 24-hour day. This visualization could help determine what the peak hours of the day are for voice calls and potentially the best time to schedule any required maintenance to impact the least amount of customers.

Example 2: Iteration 1
Again, I stared with a bar chart visualization where I plotted hours of the day on the x-axis and the Voice_Seconds on the y-axis. The first thing I noticed was that at hour 20 there was a peak _SUM_ of over 3 million Voice_Seconds. This immediately prompted me to want to find out how long 3 million seconds was and that I need to look at the average of Voice_Seconds.

Visualize cell phone data in SAS Visual Analytics5

Example 2: Iteration 2
The first thing I did was create another aggregated measure to produce the Average Call Duration. To do this I took the sum of Voice_Seconds divided by the sum of the number of calls for a By Group.

Visualize cell phone data in SAS Visual Analytics6

The next thing I wanted to do was provide a reference point for how long 1,000 seconds is in minutes. Granted, I could convert Voice_Seconds into a new metric but instead I decided to use a reference line where 900 seconds equates to 15 minutes.

Visualize cell phone data in SAS Visual Analytics7

Example 2: Iteration 3
Lastly, I wanted to see if the type of Cell Technology had any impact on the distribution or length of call, mostly just because I was curious.  I was surprised that this data shows an average call of 15 minutes. That’s a long personal call when I consider most of my calls consist of “are you on your way?” and “we forgot x at the store – please pick it up on your way home”.

If this were call center data you would be able to determine how quickly issues were getting resolved. If this were sales call data, and representatives were following a script, this visualization would show, on average, how long those calls took and maybe the longer calls would result in a sale. So you could see which hours of the day sold more product.

Visualize cell phone data in SAS Visual Analytics8

Example 3: Data Usage

In this third example, I explored the data usage. I had two data items available for use: mbytes_up and mbytes_down. This visualization could help determine peak hours for which to perform system upgrades or maintenance. It could also help identify those peak hours and then add tower locations to help determine if additional hardware could help network speed performance.

Example 3: Iteration 1
I started with the bar chart visualization and plotted the hours of the day on the x-axis and the mbytes_up and mbytes_down on the y-axis. Again, the first thing I noticed was that I was looking at the _SUM_ for the metrics and the large difference in the numbers for up versus down usage.
Visualize cell phone data in SAS Visual Analytics9

Example 3: Iteration 2
The first thing I did was convert the megabytes to gigabytes by creating new calculated data items and then I created the Average Upload and Download by creating aggregated measures.

Here are the two metrics I created for Upload:
Visualize cell phone data in SAS Visual Analytics10

And here are the two metrics I created for Download:

Visualize cell phone data in SAS Visual Analytics11

This makes the visualization a bit easier to consume, now that we can compare the average upload or download size per session. I also added two reference lines at 5 GB and 15 GB. I still felt like the data needed to be visualized a bit better to understand the usage since I know most typical cell phone plans allow for 5 GB of data per month and the average session using more than 15 GB, it just seems like a lot.

Visualize cell phone data in SAS Visual Analytics12

Example 3: Iteration 3
To further classify the data I added Cell Technology to the visualization and broke it into two visualizations: one for upload and one for download. Once I did that, the visualization really started to show different data usage patterns.

Both visualizations show the 4G technology doesn’t even reach 5 GB, which makes me think that the customers with new phones and new service plans are sticking to their data allowance. But the customers with the older technology of 3G may be “grandfathered” in with their unlimited data plan and making the most of it.

Average Upload (GB)
Visualize cell phone data in SAS Visual Analytics13

Average Download (GB)

Visualize cell phone data in SAS Visual Analytics14

This blog has taken you through three examples of how I iteratively develop visualizations using SAS Visual Analytics. Ultimately, what this process shows you is that the more specific business question you have, the better a visualization you can create.

tags: SAS Professional Services, SAS Visual Analytics

Steps to visualize cell phone data in SAS Visual Analytics: Can you hear me know? was published on SAS Users.

8月 312016
 

update_to_SASStudio_ 3.5Update-in-place supports the ability to update a SAS Deployment within a major SAS release. Updates often provide new versions of SAS products. However, when using the SAS Deployment Wizard to perform an update-in-place you cannot selectively update a machine or product. As a general rule if you want to update one product in a SAS Deployment you have to update the whole deployment. With the latest version of SAS Studio, that’s not the case.  You can now update from version 3.4 to version 3.5 of SAS Studio without updating any other part of your SAS deployment.

SAS Studio 3.5 contains some interesting new functionality:

  • A new batch submit feature.
  • The ability to create global settings for all SAS Studio users at your site.
  • A new Messages window that displays information about the programs, tasks, queries, and process flows that you run.
  • A new table of contents in results.
  • New keyboard shortcuts to add and insert code snippets.
  • Many new tasks for statistical process control, multivariate analysis, econometric analysis, and power and sample size analysis. For more information, see SAS Studio Tasks.

For my purposes, I was really interested in using the batch submit feature. Using “Batch Submit” a user can run a saved SAS program in batch mode, which means that the program will run in the background while you continue to use SAS Studio. When you run a program in batch mode, you can view the status of programs that have been submitted, and you can cancel programs that are currently running.

So how does this “selective update” work? Somewhat unusual for a product update, it is available via a hot fix documented in the note 57898: Upgrade SAS® Studio 3.4 to SAS® Studio 3.5 without upgrading other products.

SAS Studio is available in three different deployment flavors: SAS Studio Mid-Tier (the enterprise edition), SAS Studio Basic, and SAS Studio Single-User. The hot fix is available for the enterprise and basic edition. In addition, in order to apply the hot fix the current deployment must be at SAS 9.4 M3. For SAS Studio Single-User, an MSI file has been added to the downloads section of support.sas.com to allow users to download SAS Studio 3.5 to run against their existing Windows desktop SAS for releases 9.4M1 and higher.

The hot fix is a container hot fix, meaning the hot fix delivers one or more “MEMBER” hot fixes in one downloadable unit. Container hot fixes have some special rules you must follow when applying them.

  • They must be applied separately to each machine. The installation process will apply only those MEMBER hot fixes which are applicable based on the SAS Deployment Registry for each specific machine.
  • They may contain MEMBER hot fixes for multiple operating systems. The SAS Deployment Manager will apply only those MEMBER hot fixes which are applicable for the operating system on each specific machine.
  • They often contain pre and/or post installation steps outlined in the instructions provided.

A review of the hot fix instructions shows that to complete the update for the SAS Studio Mid-Tier the web application must be rebuilt and redeployed.

To apply the container hot fix on my three tier deployment, which has a Windows metadata server, LINUX compute tier and LINUX middle tier, I downloaded the hot fix to a network accessible location and followed the process documented in the hot fix instructions. To summarize:

Create a deployment registry report on each machine. The reports showed that:

SAS Studio Basic is installed on the Linux compute tier.

Update to SAS Studio 3.5

SAS Studio Enterprise is installed on the Linux middle-tier.

Update to SAS Studio 3.5_1

Update SAS Studio Basic

Stop all SAS servers in the deployment. Run the SAS Deployment Manager on the LINUX compute tier and select Apply Hot fixes and then select the directory where the hot fix was downloaded. The Wizard updates SAS Studio Basic. A review of the hot fix documentation shows no post-deployment steps are required for SAS Studio Basic.

Update to SAS Studio 3.5_2

Update SAS Studio Mid-Tier (Enterprise)

Run the SAS Deployment Manager on the LINUX middle-tier tier and select Apply Hot fixes and then select the directory where the hot fix was downloaded. The Wizard updates SAS Studio Mid-Tier.

Update to SAS Studio 3.5_3

A review of the hot fix documentation shows that, to complete the update, the SAS Studio Web Application must be rebuilt and redeployed.

Update to SAS Studio 3.5_4

Start the SAS Metadata Server and use the SAS Deployment Manager on the middle-tier to rebuild just the SAS Studio Middle-Tier. Start all SAS Servers and use the SAS Deployment Manager on the middle-tier machine to redeploy just the SAS Studio Middle Tier.

When the redeploy is completed, I logon to SAS Studio. Selecting Help > About shows that now I have SAS Studio 3.5.

Update to SAS Studio 3.5_5

If I navigate the folder tree and select a SAS program I can now right-click on the program and select “Batch Submit” to run the program in the background.

Update to SAS Studio 3.5_6

If you are excited about the new functionality of SAS Studio 3.5, I think you will agree that the hot fix provides an easy path to update the software.

tags: deployment, SAS Administrators, SAS architecture, SAS Professional Services, sas studio

A quick way to update to SAS Studio 3.5 was published on SAS Users.

8月 302016
 

Suggestion based matching in SAS Data QualityHave you ever had problems matching data that has typographical errors in it? Because of the nature of arbitrary typos and incorrect spelled words a specific matching technique is required to tackle those cases. SAS Data Quality, with its traditional, in nature deterministic matching approach is by nature not best suited for correctly matching typos such as character transpositions and missing or additional characters in words. But SAS provides a feature called suggestion based matching in SAS Data Quality especially designed for matching data with typos. Suggestion based matching provides a more probabilistic alike way towards matching. With suggestion based matching, SAS Data Quality will output multiple matchcodes based on alternative “suggestions” for a data field. Each suggestion also includes a score that reflects the closeness of the suggestion to input word.

Let's dive a little deeper.

The concept of SAS Data Quality suggestion based matching

Suggestion based matching in SAS Data Quality02Prerequisite for suggestion based matching in SAS Data Quality is a dictionary of known words along with a frequency count for each word. The matching engine will generate “suggestions” for potentially misspelled input values from the known words dictionary. The generated suggestions are made of the data input, but with spelling errors like character deletions, insertions, replacements, and transpositions. Whitespace insertion, casing, and context-dependent pronunciation can also be taken into account. For each suggestion a score, that reflects the closeness of it to the input, is calculated. The higher the match score for a suggestion, the more likely it is the true entity.

The known words dictionary is generated with the idea that the number of correctly spelled entity names are more frequent and therefore outnumber the misspelled ones. This is an important aspect of the match score calculation for the suggestions. The matching engine will generate suggestions by taking the input value and “inject” character transpositions, replacements and other typos and compare it against the known words dictionary entries. During the whole process the input value is seen as the potentially corrupted version of the true entity. By making various character based alterations to the input data, the matching engine tries to find possible candidate entities in the known words dictionary. If one of the suggestions matches a word of the known words dictionary the engine calculates a match score based on the frequency count and the changes required to create the suggestion. This concept potentially results in a list of possible matches including the true entity and other misspelled or “close enough” words identified as Suggestion based matching in SAS Data Quality03potential matches. The resulting match score can finally help to resolve the true entity.

Suggestion based matching involves more compute resource and therefore will slow down data throughput of the matching process. Still, it is a proven approach to provide match results for input data that contains character level data quality issues. To minimize performance implication, suggestion based matching is best used as a second iteration for input data that could not be matched using the standard matchcode method.

Suggestion based matching in SAS Data Quality04

tags: data management, SAS Data Quality, suggestion based matching

Improve matching results with suggestion based matching in SAS Data Quality was published on SAS Users.