June 29, 2016

I recently read an article in which the winner of a Kaggle Competition was not shy about sharing his technique for winning not one, but several of the analytical competitions.

“I always use Gradient Boosting,” he said. And then added, “but the key is Feature Engineering.”

A couple days later, a friend who read the same article called and asked, “What is this Feature Engineering that he’s talking about?”

It was a timely question, as I was in the process of developing a risk model for a client, and specifically, I was working through the stage of Feature Engineering.

The act, or I should say the “art,” of Feature Engineering takes time – in fact, it takes much more time than building the actual model. Because many clients want a quick return on analytics, I was giving some thought to how I would explain the value of the effort being put into engineering features from their data.

But first, I needed to respond to my friend’s question, and I began by describing the environments in which we typically work; “the companies we work with are not void of analytical talent, in fact some institutions have dozens of highly qualified statisticians with many years of experience in their respective fields. Still, they may not be getting the results from their models they are expecting.”

I was quickly asked, “So, how do you get better results? Neural Nets? Random Forests?”

“It’s not the method that sets my results apart from theirs,” I answered. “In fact, the success of my modeling projects is generally solidified before I begin considering ‘which’ method.” I paused and pondered how to continue, “Before I consider methods, I simply derive more information from their data than they do, and that is Feature Engineering.”

Feature Engineering

Feature Engineering is a vital component in the process of attaining extraordinary Machine Learning results, but it doesn’t inspire as many intriguing conversations, papers, books, or talks at conferences. Regardless, I know that the success of my projects, which is typically attributed to the application of Machine Learning techniques, is actually due to effective Feature Engineering that not only improves the model but also creates information that the clients can more readily understand.

How did I justify the time spent “Feature Engineering” to the client mentioned above? First, I developed a model with PROC LOGISTIC using only the primary data given to me and called it “Model 1.”

Model 1 was then compared to “Model 2,” a model built using the primary data plus the newly developed features. Examples of those new features that proved significant were those constructed as follows:

  • Ratios of various events relative to time.
  • Continuous variables dimensionally reduced into categories.
  • Simple counts of events.
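
As a rough sketch of how features like those above might be derived in a DATA step – the data set and variable names here (accounts, n_purchases, days_on_book, balance) are hypothetical, not from the original project:

```sas
/* Hypothetical feature engineering sketch; names are illustrative. */
data work.features;
   set work.accounts;

   /* Ratio of an event count relative to time */
   if days_on_book > 0 then purchase_rate = n_purchases / days_on_book;

   /* Continuous variable reduced into categories */
   length balance_band $6;
   if      balance < 1000  then balance_band = 'LOW';
   else if balance < 10000 then balance_band = 'MEDIUM';
   else                         balance_band = 'HIGH';

   /* Simple count of events */
   n_events = sum(n_purchases, n_returns, n_disputes);
run;
```

Each derived column then becomes a candidate input for Model 2 alongside the primary data.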

And then I proactively promoted the findings!

I often use the OUTROC option to output the necessary components such that I can create a chart of ROC curves using PROC SGPLOT. An example of this process can be viewed in Graph 1 where the value of Feature Engineering is clearly evident in the chart detailing the curves from both models: the c-statistic improved from 0.70 to 0.90.
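
A sketch of that OUTROC/SGPLOT process – the data set names, response variable, and predictors here are hypothetical stand-ins:

```sas
/* Fit each model, capturing ROC coordinates with OUTROC= */
proc logistic data=work.base;
   model default(event='1') = x1 x2 / outroc=work.roc1;
run;

proc logistic data=work.engineered;
   model default(event='1') = x1 x2 new_feat1 new_feat2 / outroc=work.roc2;
run;

/* Stack the two ROC data sets and label each curve */
data work.roc_both;
   set work.roc1(in=a) work.roc2;
   length model $8;
   model = ifc(a, 'Model 1', 'Model 2');
run;

/* Overlay the curves; OUTROC= supplies _1MSPEC_ and _SENSIT_ */
proc sgplot data=work.roc_both;
   series x=_1mspec_ y=_sensit_ / group=model;
   lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);
   xaxis label='1 - Specificity';
   yaxis label='Sensitivity';
run;
```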

[Graph 1: ROC curves for Model 1 and Model 2]

However, there is still a story to be shared, as the improvement is reflected not only in a stronger c-statistic but also in the actual classification of default. For example:

  • Model 1 correctly predicts default in 20% of accounts categorized as “most likely to default,” but it misses on 80%!
  • Model 2 correctly predicts default in 86% of those accounts it categorizes as most likely to default, but misses on only 14%.

How did this happen?

While we could drill down into a discussion of significant factors to seek an in-depth explanation of the results, a more general observation is simply this: Model 2 accomplishes its mission by correctly reclassifying a substantial portion of the population as “low risk” and leaves those with default concerns to be assessed with greater consideration.

Essentially, the default model is vastly improved by better discerning those customers whom we need not worry about, leaving an uncluttered pool of customers that require attention (see Graph 2).


Instead of assessing the default potential for 900,000 accounts, the model instead deals with 100,000 “risky” accounts. It essentially improves its chances of success by creating an easier target to hit, and the result (see Graph 3) is that when it categorizes an account as “high risk,” it means it!


Thanks to Feature Engineering, the business now possesses very actionable information.

How Do I Engineer the Features Necessary to Attain Such Results?

Working from the SAS Data Step and select SAS Procedures, I work through a process in which I attempt to create useful features with the following techniques and methods:

  • Calculate statistics like the minimums, maximums, averages, medians and ranges thinking that extremes (or the lack) might help define interesting behaviors.
  • Count occurrences of events considering that I might highlight statistically interesting habitual behaviors.
  • Develop ratios seeking to add predictive value to already analytically powerful variables as well as to variables that might have previously lacked statistical vigor.
  • Develop quintiles across variables of interest seeking to create expressive segments of the population while also dealing with extreme values.
  • Apply dimensionality reduction techniques, ranks, clustering etc. expecting that grouping those with similar behaviors will be statistically beneficial.
  • Consider the element of time as an important interaction with any feature that has been developed.
  • Use regression to identify trends in continuous variables thinking that moves up or down (whether fast or slow) will be interesting.
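
Two of the techniques above – per-group summary statistics and quintiles – might be sketched as follows; the data sets and variables (transactions, customer_id, amount, balance) are hypothetical:

```sas
/* Per-customer min, max, mean, median, and range of amounts */
proc means data=work.transactions noprint;
   class customer_id;
   var amount;
   output out=work.stats min= max= mean= median= range= / autoname;
run;

/* Quintiles of a variable of interest; also tames extreme values */
proc rank data=work.accounts out=work.ranked groups=5;
   var balance;
   ranks balance_quintile;
run;
```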

As with the guy who won the Kaggle competitions, I’ve now revealed some of my secret weapons regarding the building of successful models.

What’s your secret weapon?



tags: machine learning

Great machine learning starts with resourceful feature engineering was published on SAS Users.

June 28, 2016

It’s no secret: here at SAS, we care about our customers. In fact, we care so much that we developed a program with that exact name: our SAS Cares program is here to help you with all things SAS!

We know you may be so awesome that you never need us, but even excellent people need help sometimes (I should know). If you find yourself in a tight SAS spot, have a burning SAS question, or even just want to write some SAS poetry, we’re here to help. Our SAS Cares program is extensive and can help you with any problem, question, or comment you’d like to share with us. We make it easy for you to connect with us, no matter what you are looking for, or what channel you are on. You can find information about all the services available from the SAS Cares Program in this blog from Maggie Miller.

A little bird (tweet, tweet!) told me you’d like to know more about one of the main ways SAS connects with our customers (which also happens to be my personal favorite!), through our social customer service channels.

SAS’ social properties are excellent places for you to ask questions, make suggestions, interact with other users, and give us feedback.  We are currently on Twitter, Facebook, LinkedIn, and YouTube, constantly monitoring these channels, looking for your questions and opportunities to support you. And, because we’re only a click away, we’d like you to consider us your first stop for pretty much anything about SAS. Most of the time, we’ll be able to give you exactly what you need, and in the rare instances we don’t have the answer, we’ll point you to resources that can help you find it.

For example, if you have questions about using SAS or SAS programming, we might send you to our SAS Support Communities. Or if you are getting program errors, or have installation questions, we can help you find the right Product Documentation and SAS Notes, or, if needed, open a Technical Support track for you. If you are not sure where to start, start with us. Our social channels are perfect for real-time interaction. We are here for you, and thrilled to help!

Here’s how you can connect with us via social media:

  • Twitter: @SAS_Cares
    Our primary social customer service channel, ready to handle anything SAS related.
    Reply, mention, and direct message us. Follow us for helpful resources and links for #SASusers!
  • Facebook: SAS Software and SAS Analytics U
    We post the latest SAS news and helpful tips. Like, Follow, Comment, Message, and post to our SAS pages.
  • LinkedIn: SAS
    Primary professional landing spot for SAS users, followed by nearly 300,000 SAS professionals. Follow us and comment on our posts.
  • YouTube: SAS Software
    Subscribe, view, and comment on our videos. Many of them teach specific SAS concepts and techniques, and will help you become a more effective and efficient SAS user.

So give us a try: reach out to us when you need SAS help, have some SAS feedback, or just want to share… because who cares? SAS Cares!

tags: sas cares, sas customer contact center

SAS Cares: help just a tweet, post, or click away was published on SAS Users.

June 24, 2016

XML has become one of the major standards for moving data across the Internet. Some of XML’s strengths are the abilities to better describe data and to be more extensible than any of its predecessors, such as CSV. Due to the increased popularity of XML for moving data, I provide a few tips in this article that will help when you need to read XML files into SAS software.

Reading XML Files

You can read XML files into SAS using either the XML engine or the XMLV2 engine. The XML engine was the first engine that SAS created to read XML files. The XMLV2 engine includes new functionality and enhancements and is aliased to the XML92 engine in SAS® 9.2.

It is easy to read XML files using the XMLV2 engine when the XML file that you read uses the GENERIC markup type and conforms to a very rectangular definition. Here is an example of the XML file layout that is required to be read natively using the XMLV2 engine:

[Figure: required rectangular XML file layout]
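
As a rough sketch of that rectangular layout – element names and values here are hypothetical – a single root element contains repeating row elements, each holding one child element per column:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<STUDENTS>
   <STUDENT>
      <Name>Sam</Name>
      <Score>95</Score>
   </STUDENT>
   <STUDENT>
      <Name>Kim</Name>
      <Score>88</Score>
   </STUDENT>
</STUDENTS>
```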

If the file is not in this format, the following informative error message is generated in the SAS log:

[Figure: SAS log error message referencing an XMLMap file]

The XMLMap file referenced in the error message is a SAS-specific file that instructs the engine about how to create rows, columns, and other information from the XML markup. You can use either of the following methods to generate the XMLMap file:

  • the AUTOMAP= option within the SAS session beginning with SAS® 9.3 TS1M2
  • the SAS XML Mapper, which is a stand-alone Java application

Generating Dynamic Maps within the SAS Session

Using the XMLV2 engine, you can create a dynamic XMLMap file and use it to generate SAS data sets by specifying the AUTOMAP= and the XMLMAP= options in the LIBNAME statement. The AUTOMAP= option has two possible values: REPLACE and REUSE. REPLACE updates an existing XMLMap file, whereas REUSE uses an existing XMLMap file. Both values create a new XMLMap file if one does not currently exist. When the file is automapped, one or more data sets are created. This method creates a representation that is as relational as possible based on the XML markup. In addition, it generates surrogate keys that enable you to combine the data using the SQL procedure or the DATA step. The following example demonstrates this method.

Here is the XML data that is read:

[Figure: XML data to be read]

Here is the SAS session code:

filename datafile 'c:\test\students2.xml';
filename mapfile "";  /* path for the XMLMap file to be generated */
libname datafile xmlv2 xmlmap=mapfile automap=replace;

proc copy in=datafile out=work;
run;

Here is the output:

[Figure: output data sets]
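
If automapping produces separate parent and child tables, the generated surrogate keys can be used to recombine them. A hedged sketch – the table names and the exact key column names are assumptions, so check the data sets your automap actually produces:

```sas
/* Hypothetical recombination of automapped tables via surrogate keys */
proc sql;
   create table work.students_flat as
   select s.name, a.street, a.city
   from work.student as s
        inner join work.address as a
        on s.student_ORDINAL = a.student_ORDINAL;
quit;
```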

For more information about using the XMLV2 engine, refer to the “SAS® 9.4 XMLV2 LIBNAME Engine Tip Sheet.”

Using SAS XML Mapper

If you answer “yes” to the following questions, you should use SAS XML Mapper to generate the XMLMap file rather than generating this file dynamically within the SAS session:

  • Do you need to generate an XMLMap from an XML schema file?
  • Do you need to generate a custom XMLMap file or map only a portion of the file?
  • Do you need to view the data to be imported before reading the XML file?
  • Do you need to add specific metadata such as data types, informats/formats, column widths, or names?
  • Do you want to generate sample code for the import of the XML file?

The following image shows SAS XML Mapper once you click the icon that runs the automapping feature, which enables you to quickly generate tables from the XML markup:

[Figure: SAS XML Mapper after automapping]

Using SAS XML Mapper, you can also generate a custom XMLMap file.

For more information, refer to the SAS video tutorials about creating XMLMap files.

I hope this article helps you easily read XML files into SAS regardless of the file structure. This information should also help you determine the appropriate method to use when generating an XMLMap file.

tags: SAS Problem Solvers, SAS Programmers

Tips for reading XML files into SAS® software was published on SAS Users.

June 22, 2016

Report designers often discover after aggregating data by groups in the Visual Analytics Designer that it would also be nice to see additional aggregations of the data, for example, a maximum or minimum of that sum across groups. This means creating an ‘aggregation of an aggregation.’ If you plan your report objectives in advance of loading your data, you can do this by creating an aggregation initially in the Visual Data Builder, rather than the Designer.  And in many cases, it’s possible to get the desired report results by simply loading a small additional table to memory.

Here’s an example of a report with a sum aggregation on the profit measure and Facility Region as the category data item. The report also shows the maximum, minimum, and average of the regional profit values, along with the difference between each region’s profit sum and the average. It’s possible to assign a SUM aggregation to the Profit measure, but the challenge appears when trying then to create a MAX aggregation across the regions.

[Figure: report showing regional profit aggregations]

If you have awareness of the need for this type of report prior to loading the data, you can easily load a much smaller version of the data with the Profit data item already pre-aggregated. The data builder query, shown below, creates and loads a file containing only the FacilityRegion as a GROUP BY column and the Profit column as a SUM aggregation.

[Figures: Visual Data Builder query and resulting output]
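
In SQL terms, the pre-aggregation performed by that data builder query amounts to something like the following; the source table name is a hypothetical stand-in, while the column names follow the example:

```sas
/* Sketch of the GROUP BY / SUM pre-aggregation step */
proc sql;
   create table work.profit_by_region as
   select FacilityRegion,
          sum(Profit) as Profit_SUM
   from work.facility_detail
   group by FacilityRegion;
quit;
```

Loading only this small summary table to memory is what makes the later “aggregation of an aggregation” possible in the Designer.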

In the designer, for reporting purposes, the Profit_SUM data item is renamed to Sum of Profit for Region.

[Figure: renamed Sum of Profit for Region data item]

Now the Sum of Profit for Region data item can be used to calculate these aggregated measures:  Max Regional Profit Sum, Min Regional Profit Sum, and Avg Regional Profit Sum, using the formulas shown below:

[Figure: formulas for the aggregated measures]

Once you have the Avg Regional Profit Sum calculation, you can create the additional aggregated measure, Regional Profit – Average:

[Figure: Regional Profit – Average formula]

So, this is a list table displaying all of the information that was derived from the two columns and four rows of data.

[Figure: list table with the derived measures]

To emphasize the region with the maximum profit and minimum profit, you may want to add the following display rules.

[Figure: display rules]

And since you have the original larger table loaded as well, you can add an additional data source and some interactions, so that the user can click on a row to view more detail data corresponding to each region of interest.

[Figure: interactions with the detail data source]

Now, for example, the user can see what facilities contributed to the max value for the Western region, and the corresponding Expenses as well!

[Figure: detail data for the Western region]

So, although it may not be practical to pre-aggregate the values of every measure in your data, sometimes pre-aggregating and storing the much smaller table can give you the benefit of being able to get the additional aggregations that you need for a specialized report, while relying on a larger stored table for the details.

tags: SAS Professional Services, SAS Programmers, SAS Visual Analytics

Need an aggregation of an aggregation? Use Visual Data Builder to pre-aggregate your data was published on SAS Users.

June 14, 2016

As an addendum to my previous two blogs on using the SAS Environment Manager Report Center, this blog illustrates further tips and tricks you can use to help make the creation of your custom reports easier.

The Ad-Hoc Reporting section of the Report Center is specifically designed to provide a “testing ground” for reports you may want to try. It can be your most useful tool in designing and creating the reports you want. Notice that the first four “reports” are most useful if you have a good idea of what content you want your report to have; they present  specific report content, and it’s easy to see what type of data these reports would  contain, just based on the titles or names shown.

[Figure: Ad-Hoc Reporting section of the Report Center]

Selecting any of these four reports will generate a list of all possible reports in that category. From here, you can select the one(s) you want. Here are a few screenshots to illustrate:

APM Metadata Audit: If you make this selection in the Ad Hoc Reporting section, you will be presented with a choice of all reports in that category, under the Report Name field dropdown list in the right-hand panel. Notice that most of these report selections correspond to finished reports already created in the main report section, marked on the left below. If you just run this Ad Hoc report with all the defaults, you’ll get the same report as the one listed in the Nightly Reports >> Audit Reports section. However, you can test some additional parameters using the Ad Hoc version’s menu shown.

[Figure: APM Metadata Audit Ad Hoc report parameters]

You will find similar choices for all four report groups in that section of the Ad Hoc Reporting. Thus, you have a way to tinker with many of the other standard reports, and try various modifications to them. For example, if you are interested in any of the “ARM Performance Reports,” you can open the ARM Performance section in the Ad Hoc Reporting section to get a set of reports that (mostly) mirror those reports:

[Figure: ARM Performance Ad Hoc reports]

This allows you to explore different parameter options beyond what’s provided by the standard reports, such as chart type, number of time bins, and subgrouping time periods.

A different approach is to explore your data using any of the remaining Ad Hoc reports that are organized around the type of output plot you want. As discussed in the previous blog, the name of the Ad Hoc report shows the type of plot generated based on the SAS PROC name. These reports allow you to select a dataset to explore from any of the following three built-in DataMart libraries:  ACM, ARTIFACT, or KITS.  The Ad Hoc reports that can be used in this way are circled below:

[Figure: Ad Hoc reports for data exploration]

For example, if I want a basic chart, and I want to use the dataset ARTIFACT.AUDITACCESSC, then I can select the Ad Hoc report called “evChart TimeSeq Explorer,” choose my data set from the dropdown list, and then try varying any parameters provided in the interface to explore and come up with many variations and charts based on those two choices. Notice that with this group of Ad Hoc reports, there are more parameter choices than with other reports.

You can select many reporting options, such as multiple variables, type of chart, class variables, break variables, subgroupings, statistics, as well as the more standard subsetting and output format options. You can generate literally hundreds of different reports using varying parameter selections using these Ad Hoc Reports.

As I mentioned in a previous blog in this series, if you want to create a custom report for production that you will run many times, you will want to make a permanent, new copy of the stored process being called, with the correct parameters specified on the reporting macro being called.  I outlined the steps to do this in the previous blog: Creating Custom Reports in the Report Center.

The way to do this is to model your report on an existing report, then make modifications to it. You will want to know the following about the report you are using as a model:

  • What stored process is being used?
  • What reporting macro(s) are being called?
  • What parameters are being specified to the reporting macro?
  • What data set from the Data Mart is being used?

The best way to answer these questions is by using the debugging facility that is built into the report center. All reports provide the following debugging options:

  • Debug Level
  • Session Information: Log
  • Macro Invocation Parameters

[Figure: report debugging options]

By browsing the SAS log and checking the called reporting macro and its parameters, you can quickly answer the four questions listed above.

If you know what data set you want to use but need to explore that data set further, there are two approaches:

  1. Try using the Ad-Hoc Reporting section as described above to explore the data sets.
  2. Use another data browsing tool; I’ve found that SAS Studio is perfect for this task.

When the Service Architecture Framework is initialized, the following data libraries are added to metadata automatically: ACM, ARTIFACT, and KITS.
For these cases you can just browse data directly using SAS Studio. For the other Data Mart libraries (EVDB and EVDMRAW), you will have to first define those libnames and register the tables in metadata. Once this is done you can explore the data sets in exactly the same way.
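
A hedged sketch of that registration step, assuming the library has already been defined in metadata (for example, via SAS Management Console); the library name here matches the post, but the path and any repository details are hypothetical and depend on your deployment:

```sas
/* Assign the physical library; the path is an illustrative guess */
libname evdb base '/opt/sas/config/Lev1/Web/SASEnvironmentManager/Datamart/evdb';

/* Register (or update) the library's tables in metadata */
proc metalib;
   omr (library="EVDB");
   report;
run;
```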

tags: SAS Administrators, SAS Environment Manager, SAS Professional Services

Auditing using the SAS Environment Manager Report Center– tips and tricks was published on SAS Users.

June 10, 2016

“Shall we play a game?”

If you’re a child of the ’80s like me, you might recognize this famous line from the movie WarGames. This innocent-sounding question comes not from one of the movie’s human stars, but from a military super-computer named Joshua, after a bored high school student, played by Matthew Broderick, accesses the computer’s hard drive.

Thinking he’s hacked into a video game company, Broderick’s character accepts Joshua’s challenge and chooses the most intriguing game he can find: global thermonuclear war. To Joshua, though, it’s not just a game. Joshua is an intelligent computer programmed to learn through simulations like the one Broderick’s character initiates. And because the computer actually does control the arsenal of U.S. nuclear weapons, it’s a “game” that puts the planet on the brink of World War III.

If you want to see how the movie ends, I’d encourage you to check it out.

The reason I mention the movie is because, in addition to scaring the heck out of me, WarGames was my first exposure to machine learning, the idea that computers can learn and adapt based on the data they collect. Of course, machine learning has changed a lot since my 1983 Hollywood introduction.

Today, progressive organizations in all industries and disciplines are using the technology – which draws from various fields of study like artificial intelligence, data mining, statistics and optimization – to transform data into business value.

Machine learning defined

The concept is pretty straightforward. Computers “learn” from patterns in data to build a model that, when exposed to new data, can independently adapt without human intervention. The process repeats itself over and over again, with each iteration building on previous information and computations to create more accurate results or better predictions of future outcomes.

Machine learning is revolutionary for many analytically-mature organizations. Big data means even bigger possibilities. Consider what organizations like Amazon and Netflix are doing with recommendation systems based largely on machine learning principles and millions of data points.

Facebook is another great example. I’m sure you’ve noticed that when you upload a photo with people in it, Facebook will suggest that you tag certain individuals it recognizes as Facebook users, even pre-populating names to make it easier for you. That’s all done by a computer using facial recognition and an algorithm that analyzes an individual’s facial features to match faces with names.

My favorite machine learning resources

I won’t pretend to be an expert, but fascinating applications of machine learning like these push me to learn more about the science. It’s why I’m writing this post. If you’re interested in the concepts around machine learning, SAS has some excellent resources that have helped guide me in my discovery process. I believe you would find them helpful as well.

  1. For starters, SAS has an awesome web page on machine learning that includes a discussion on what it is, why it matters, who’s using it and how it works. You’ll also find a 60-second overview video and another video demonstrating how SAS Enterprise Miner uses pattern recognition to accurately recognize pictures and deliver that information to a user without human intervention. The best thing about the site, though, is that it serves as an aggregator for a number of other articles and resources to help you learn more. It probably is the best place to begin.
  2. Another excellent resource is a new O’Reilly white paper, The Evolution of Analytics: Opportunities and Challenges for Machine Learning in Business. SAS Data Scientist Patrick Hall, Solution Architect Wen Phan and Senior Marketing Specialist Katie Whitson give a really good overview of what machine learning is and what it is not. They also outline opportunities for businesses to effectively use machine learning and how to overcome some of the challenges they might encounter trying to weave it into their existing analytics strategy. This includes excellent discussions around talent scarcity, data and infrastructure challenges and the change management required to execute an analytics strategy that includes machine learning.
    Perhaps the best part of the report is the section highlighting some of the modern applications for machine learning – recommendation systems, streaming analytics, deep learning and cognitive computing. The authors highlight two companies – Geneia in the healthcare industry and Equifax in financial services – that are effectively using machine learning, walking readers through the journey these companies have taken to get to where they are today. Really interesting stuff.
  3. To discuss machine learning with your peers I recommend the SAS Data Mining Community. Conversations around machine learning are happening in this space all the time. With nearly 100,000 members from across all fields and disciplines, and thousands logged in at any given time, SAS Communities are a great place to ask and answer questions, or just follow along with the conversation. One discussion thread you should definitely check out is titled Recently Published Machine Learning Resources. Hall, one of the authors mentioned above, highlights a couple of useful articles, the white paper, and other resources. He also references a GitHub repo that includes a ton of good resources, including several machine learning quick reference tables outlining best practices.

So there you have it, a few good places to start if you’re looking to build your knowledge of machine learning. The possibilities machine learning presents to the world are very exciting.

Just please be careful if that system you build asks you to play a game.


tags: machine learning, SAS

3 good resources for humans who want to learn more about machine learning was published on SAS Users.

June 4, 2016

In DataFlux Data Management Studio, the data quality nodes (e.g., Parsing, Standardization, and Match Codes) in a data job use definitions from the SAS Quality Knowledge Base (QKB).  These definitions are based on a locale (Language and Country combination).  Sometimes you would like to work with multi-locale data within the same data job and these data quality nodes have LOCALE attributes as part of their Advanced Properties to help you do this.

For example, you may want to work with data from the United States, Canada, and the United Kingdom within the same data job.  Note:  You must have the QKB locale data installed and be licensed for any locales that you plan to use in your data job.

The Advanced properties you will need to use are LOCALE_FIELD and LOCALE_LIST.  LOCALE_FIELD specifies the column name that contains the 5-character locale value to use for each record.  LOCALE_LIST specifies the list of locales that should be loaded into memory for use within the node.

The first step in using these Advanced properties in a data quality node in a data job is you need a field that contains the 5-character QKB locale information.  The first 2-characters represent the language and the last 3-characters represent the country.  For example, ENUSA represents the English – United States locale and ESESP represents the Spanish – Spain locale.  You can use the Locale Guessing node to create a field that contains a locale value based on a Locale Guess definition from the Quality Knowledge Base (QKB).  Alternatively, you can use a custom Standardization scheme to assign the 5-character locale information as shown in the example below.


[Figure: QKB Locale Standardization Scheme]


[Figure: Apply QKB Locale Standardization Scheme]

The application of the Standardization scheme is then followed up with an Expression node to assign the locale FRCAN (French – Canada) if the province is Quebec.  Now each record has its 5-character locale information in a field called DF_Locale.

[Figure: QKB Locale Field Results]

Once you have the Locale field as part of your input data, you enter the information as usual for the data quality node.

[Figure: Match Codes Node Properties]

Then you map the field with the 5-character locale information to the LOCALE_FIELD Advanced property for the data quality node. You also need to list the locales that should be loaded into memory in the LOCALE_LIST advanced property. Note: You could pass in this list as values using a macro variable.

[Figure: Match Codes Node Advanced Properties]

Note: The definition used in the selected node must exist in all locales referenced.  For example, the State/Province Match definition only exists in the English – United States, English – Canada, and French – Canada locales.  Therefore, if you are using that definition in a Match Codes node you can only pass in data that is from one of those three locales; otherwise, executing the data job will produce an error.

Here is an example data job that uses the Advanced properties of LOCALE_FIELD and LOCALE_LIST to generate match codes for multi-locale data. Notice that there is minimal branching in the data flow. The only branching needed is for the United States and Canada data records, which are branched to generate the match codes for their State/Province data.

Multi-Locale Data Job Example

In conclusion, the Advanced properties of LOCALE_FIELD and LOCALE_LIST are useful when you want to work with data from multiple locales within the same data job. For more information on Advanced properties for nodes in DataFlux Data Management Studio, refer to the topic “Advanced Properties” in the DataFlux Data Management Studio 2.7: User’s Guide.

tags: data management, DataFlux Data Management Studio, SAS Professional Services

Using Multiple Quality Knowledge Base Locales in a DataFlux Data Management Studio Data Job was published on SAS Users.

May 27, 2016

Nowadays, nearly every organization analyzes data to some degree, and most are working with “Big Data.”  At SAS Global Forum 2016 in Las Vegas, a vast number of papers were presented to share new and interesting ways our customers are using data to IMAGINE. CREATE. INNOVATE., as this year’s conference tagline reminds us.

Below you’ll find a collection of a few of my favorites on big data topics, ranging from SAS Grid Manager to Hadoop to SAS Federation Server. The common point? It’s easier than ever to modernize your architecture now. I hope these papers help you continue to advance your organization.

Paper 2020-2016: SAS® Grid Architecture Solution Using IBM Hardware
Whayne Rouse and Andrew Scott, Humana Inc.

This paper is an examination of Humana’s journey from SAS® 9.2 to SAS® 9.4M3 and from a monolithic environment to a SAS Grid Manager environment on new hardware and new storage. You will find tips such as the importance of understanding the old environment before starting, and of applying that understanding to building the new environment.

Paper 11684-2016: Multi-Tenancy in SAS® – Is It Worth the Fight?
Christopher Blake, RBS; Sergey Iglov, RBS; Dileep Pournami, RBS; Ekaitz Goienola, SAS UK

This paper explains how Royal Bank of Scotland analyzed the benefits of creating a shared Enterprise Business Intelligence platform in SAS alongside the risks and organizational barriers to the approach. The objective is to have a single, multi-tenant environment offering a variety of SAS software, including SAS Grid Manager, SAS Visual Analytics, SAS Enterprise Miner, and others.

Paper 11562-2016: Beyond Best Practice: Grid Computing in the Modern World
Jan Bigalke, Allianz Managed Operation & Services SE & Gregory S. Nelson, ThotWave Technologies

After an introduction detailing how SAS Grid Manager works, this paper describes multiple best practices implemented at Allianz Managed Operation & Services SE to configure and administer a multi-tenant grid environment.

Paper SAS6281-2016: Introducing SAS® Grid Manager for Hadoop
Cheryl Doninger and Doug Haigh, SAS Institute Inc.

With this paper you can learn the capabilities and benefits of SAS Grid Manager for Hadoop as well as some configuration tips. You will also find sample SAS Grid jobs to illustrate different ways to access and analyze your Hadoop data with your SAS Grid jobs.

Paper SAS2140-2016: Best Practices for Resource Management in Hadoop
James Kochuba, SAS Institute Inc.

YARN is the Hadoop answer to resource orchestration. SAS technologies can take advantage of a shared Hadoop environment, creating the ultimate analytic environment. This paper helps to set the expectations and tuning choices when setting up and using a shared Hadoop environment encompassing SAS technologies such as SAS/ACCESS® Interfaces, SAS® In-Database Analytics, SAS® In-Memory Analytics and SAS® Grid Computing.

Paper SAS2180-2016: How to Leverage the Hadoop Distributed File System as a Storage Format for the SAS® Scalable Performance Data Server and the SAS® Scalable Performance Engine
Steven Sober, SAS Institute Inc.

In this paper Steven explores how the SAS Scalable Performance Data Server and SAS Scalable Performance Data Engine leverage the Hadoop Distributed File System (HDFS). He explores the different capabilities of both engines and shows benchmarks detailing the differences in reading/writing compressed tables versus regular ones.

Paper SAS6223-2016: What’s New in SAS® Federation Server 4.2
Tatyana Petrova, SAS Institute Inc.

This paper provides an overview of the latest features of the product and includes examples for leveraging SAS Federation Server capabilities, such as data quality functions, data masking, and query performance optimization.

tags: big data, SAS Global Forum, SAS Professional Services

Seven of my favorite big data presentations from SAS Global Forum 2016 was published on SAS Users.

May 25, 2016

Pick your category? If this title seems familiar, that’s because in my last blog, Use parameters to pick your metric in VA Reports, I covered how to use parameters to allow your users to pick which metric they want to view in their visualizations. This is a great technique that offers a solution to many report requirements.

But, what if your users require specific axes labels and titles for your visualizations? What if your users require reference lines? If you encounter these requirements then consider using a stack container to meet these needs.

Let’s take a look; but first, here is a breakdown of the report we will be looking at in this blog. This report does not have any report-level prompts, but it does have two section prompts. Section prompts filter the data for every object in the section. There is a drop-down list control object that prompts the user for Year, and there is also a button bar control object that prompts the user for Continent.

Then in the report body there is a list control object, a text box and a stack container. The list control object prompts the user for Country. The text box provides the first half of the report title. And the stack container provides a way to organize multiple visualizations on your report; it layers or “stacks” the objects as if they were in a slide deck. The stack container provides navigation options to cycle through the visualization objects that were added. In this example, I added two bar charts and one line chart object to the stack container.

Use a stack container in SAS Visual Analytics

In this first view, I have the Product Line bar chart selected from the stack container. Notice in this bar chart I have two reference lines defined for the Y-axis: one at 250,000 and one at 500,000 for Order Total. Having these reference lines remain static as I select different continents from the button bar section prompt helps to compare these values across the different countries and regions. You can use reference lines to help identify goals or targets, acceptable or unacceptable ranges, previous year average or even previous month average, etc., basically any metric that puts a frame of reference around the values on the visualization.

Use a stack container to pick your category in Visual Analytics02

In this second view, I have the Vendor Type bar chart selected from the stack container. Notice in this bar chart I do not have any reference lines.

Use a stack container to pick your category in Visual Analytics03

In this third view, I have the Month line chart selected from the stack container.

Use a stack container to pick your category in Visual Analytics04

I have used the text box in this report to help with the custom titles. In this case, the report title begins with Total Orders By, and each visualization in the stack container uses its category role as its title (either Product Line, Vendor Type or Month) to complete the report title.

Use a stack container to pick your category in Visual Analytics05

What can you do with a stack container?

You cannot create an interaction or link to or from the stack container itself, but you can from the individual objects that are inside the stack container.

The way the stack container is used in this report allows the user to focus on the metric Order Totals and then examine these values across several category data items. As you can see, I have created an interaction from the list control object to each visualization in the stack container which allows the report user to filter out outliers or compare specific countries.

Use a stack container to pick your category in Visual Analytics06

In this screenshot, I have North America selected for my continent. We can see that the values for United States greatly exceed the values for Mexico and Canada. We can also see how far the United States values are from our reference lines.

Use a stack container to pick your category in Visual Analytics07

If I remove United States from this visualization (by using the list control to select only Canada and Mexico), I can see more clearly how these values compare to each other and to the reference lines.

Use a stack container to pick your category in Visual Analytics08

How is the stack container implemented?

Simply drag and drop the Stack Container object from the Objects tab on to your report. Then, drag and drop the different objects into the stack container. In my example, I added two bar charts and a line chart. From the stack container’s Properties tab you can order the objects you added to the stack container by using the up and down arrows.

Use a stack container to pick your category in Visual Analytics09

How do you name the stack container objects?

These are the individual Names of each visualization object from the Properties tab. In this screenshot, I have the first bar chart selected, and I named this bar chart Product Line.

Use a stack container to pick your category in Visual Analytics10

You can also select different navigation properties for the stack container:

  • Navigation control location: Top left, Top Center, Top Right, Bottom Left, Bottom Center, or Bottom Right
  • Navigation button type: Buttons, Links, or Dots

Use a stack container to pick your category in Visual Analytics11

Just like in the pick your metric blog, you can use this report technique to allow your report users to pick either different metrics, categories or both!

The stack container is one of my favorite report objects and really provides an interactive reporting experience to examine and explore the data.

tags: SAS Professional Services, SAS Visual Analytics

Use a stack container to pick your category in SAS Visual Analytics Reports was published on SAS Users.

May 24, 2016

Being a SAS consultant is about solving problems. In our day-to-day work we solve all sorts of problems – technical problems, data problems, programming problems, optimization problems – you name it. And in the grand scheme of things we solve business problems.

But without a well-defined business problem, all our problem-solving efforts become irrelevant. What is the benefit of optimizing a SAS program so it runs in 2 seconds instead of 20? Well, you can claim a ten-fold improvement, but “so what?” if that program is intended to run just once! Given the number of hours you spent on such an optimization, you were definitely solving the wrong problem.

The ice cream maker

There was one event early in my life that made an unforgettable impression on me and forever changed my problem-solving mindset. When I was a teenager, my father and I bought Mom a present for her birthday – an ice cream maker – a little bowl with a slow electrical mixer that you place into a fridge. Yes, it was a cleverly self-serving gift on our part, but, hey, that was what she really wanted!

Unfortunately for all of us, the gift turned out to be totally useless, as our old refrigerator had such a construction that its door’s rigid rubber-covered-metal gasket would just cut off the ice cream maker’s wire, which was supposed to be plugged in outside of the fridge. Obviously, we were all pretty upset about it. My father was contemplating making a hole somewhere in the refrigerator to accommodate the wire. Mom pretended she never really cared about the ice cream maker in the first place. The whole gift situation was just terrible.

A couple of weeks later we discovered a lottery ticket among the ice cream maker’s documentation – you know, one of those worthless no-chance promotional lottery tickets that no one ever wins.

Never in my life have I won anything with a lottery… except that time. No, we didn’t win a cash jackpot. Nor had we won a book titled “Converting an ice cream maker into a wireless device for dummies.” The win was much more meaningful. We had won a new refrigerator. It had a nice, soft rubber gasket around its door that perfectly allowed for the ice cream maker’s wire. The beauty of this win was not just that it was the perfect solution to our problem, but that, in fact, the solution (the winning lottery ticket) came packaged with the source of our problem (the ice cream maker)!

I can’t tell you how much that lucky event shaped my professional and overall attitude. From then on, whenever I faced a problem, I would always search for that little “lottery ticket.”  I know it’s there; I just need to find it, and that will turn the problem on its head. From then on, I saw the opportunity in any problem, and the satisfaction of solving the right one.

Problem-driven or purpose-driven solution?

The customer was straight to the point. They had a huge SAS data table that needed to be updated on a daily basis, at about 10 am when all the primary data sources were finished refreshing. The problem was that updating that SAS table was taking about 30 minutes, during the entirety of which the table was not available to their business analysts, as it was locked by the updating process. This created a 30 minute downtime for the business analysts. The customer formulated the problem as “too long a data table update,” with the suggested solution being “to optimize the process of updating the table”.

Sounds logical, doesn’t it? But we should never take for granted the customer’s vision of the problem, especially when it’s accompanied by a suggested solution.  In most cases, this will lead to solving the wrong problem. Just keep your mind open and help your customers to keep their minds open too.

Problems do not exist by themselves without a purpose, a goal. The customer is the ultimate authority of setting those goals and purposes. Get that purpose to be articulated first before jumping to formulating a problem. Picture this dialogue:

- What is your goal - to minimize the data table refresh time or to minimize the business analysts’ downtime?
- Aren’t those the same?
- Let’s assume not.
- But how?
- Let’s not ask “how” just yet. If I reduce your business analysts’ downtime to zero, would you care if your data table is being updated 30 minutes or 15 minutes?
- No. Well, then our goal is to minimize the business analysts’ downtime.

We just made a huge leap – from an imaginary problem to a purpose. In fact, the duration of the table update is not important as long as the analysts’ downtime is minimized. So the real problem is the downtime, which we need to reduce or, ideally, eliminate.

When the real problem is well-defined, finding a solution is a matter of practice. As Albert Einstein once said, “If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute resolving it.”  It makes a lot of sense, since spending all your time on solving the wrong problem does not move you an iota in the right direction.

Creative problem-solving

There are many proven methods and techniques of creative problem-solving. One of them is using analogies. To arrive at “how” (the solution), let’s consider the following analogy. Suppose we want to renovate a restaurant but we don’t want to stop its operations and lose its patrons during that renovation. What would we do? An ideal solution would be to build a new restaurant building next to the old one. After we finish the construction, we would simply take the sign from the old building and affix it to the new one. Problem solved. We can demolish or reuse the old building.

For SAS data tables that would translate into this. Let’s say our data table is called ANALYTICS. Instead of rebuilding it “in place,” we build a new table, let’s call it ANALYTICS_NEW. We can spend 30 minutes or 15 minutes doing this – it does not matter. After we finish table-building we can just make two quick renames:

  1. Rename  ANALYTICS table to ANALYTICS_OLD;
  2. Rename ANALYTICS_NEW table to ANALYTICS.

Using the CHANGE statement of PROC DATASETS, these two renames take about 0 seconds – that is our new downtime. After that, we can delete the ANALYTICS_OLD table. It looks like we just found an ideal purpose-driven solution that effectively eliminates (not just reduces) the downtime.
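As a sketch in Base SAS, the whole swap might look like the following. The library name ANLIB and the source table in the first step are hypothetical stand-ins; only the PROC DATASETS renames come from the approach described above:

```sas
/* Step 1: build the replacement table; ANALYTICS stays available the whole time */
data anlib.analytics_new;       /* anlib is a hypothetical library */
   set anlib.analytics_feed;    /* hypothetical refreshed source data */
   /* ... the usual 15-30 minutes of update logic ... */
run;

/* Step 2: two near-instant renames - the only "downtime" */
proc datasets library=anlib nolist;
   change analytics     = analytics_old
          analytics_new = analytics;
quit;

/* Step 3: clean up the previous version */
proc datasets library=anlib nolist;
   delete analytics_old;
quit;
```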

I can foresee some inquisitive minds asking why we can’t do the table update in place, without locking it, by using SAS/SHARE software.  Yes, we can, and that would be a “refrigerator” solution to an “ice cream maker” problem. Unfortunately, not all refrigerators are won in a lottery.

Are you solving the wrong problem? was published on SAS Users.