
April 28, 2016
 

Look at the report below. Imagine being asked to allow your users to select which measure, highlighted in yellow, they are looking at: Revenue, Expense or Profit. This is a frequent report requirement, and I’m going to outline just one of the ways you can design your report to satisfy this request using parameters.

In a previous blog, I talked about Using parameters in SAS Visual Analytics reports to prompt users to drive either an aggregated measure or calculated item. This example will allow your users to dynamically select which measures they want to see in their visualizations.

[Screenshot 1]

Steps

  1. Create the custom category to drive the button bar
  2. Create the parameter to hold the button bar’s selection
  3. Create the calculated data item that will hold the selected measure’s value
  4. Add report objects and assign roles

Step 1: Create the custom category to drive the button bar

On the Data tab, use the drop-down menu and select New Custom Category….

Next, from the New Custom Category dialog, select any category that has a cardinality greater than the number of metric choices you want to give your user. In this example, we will allow our report users to select among Revenue, Expense and Profit, so we have three options. Therefore, select a category with a cardinality greater than 3; any category with 4 or more distinct values will work.

[Screenshot 2]

The next part might seem awkward, but it prevents us from having to load a separate table into LASR. If you have the role and capabilities to load data into LASR, you could load a 3-row table that contains one column with the entries Revenue, Expense and Profit, and create your control object from that data source. However, you would then create a dependency: both tables must be loaded into LASR for the report to work. The technique outlined in this blog allows us to use just this one data source, but it restricts the placement of the control object. Using this technique means that you cannot use this control object as a report or section prompt. If you need to use this control as a report or section prompt, then you will need to load a separate table.

Otherwise, follow along.

Name your custom category List of Measures. Next, create the labels of the measures you wish your report users to select from. Again, in my example, I want to allow my report users to pick Revenue, Expense or Profit. Hint: Double-click Label 1 to rename it, then use the New label button to add the others.

Next, drag and drop at least one value into each custom category label grouping. The custom category data item can be saved only once each label grouping has a value. You may leave the default radio button, Group remaining values as Other, selected.

Finally, click OK to save your new custom category.

[Screenshot 3]

Step 2: Create the parameter to hold the button bar’s selection

Now that we have the custom category to feed the button bar’s values, we need to create a parameter to hold the button bar’s selection. From the Data tab, use the drop-down menu and select New Parameter….

Select Character as the Type and give your parameter a meaningful name. You can leave the current value, or default value, blank. When we assign the parameter to the button bar, the value will be populated upon selection.

[Screenshot 4]

Step 3: Create the calculated data item that will hold the selected measure’s value

Here we need to create a calculated data item that will hold the value of the selected measure. This is the measure we will use in our table and graph objects. In pseudo-code, we want to create a new measure that behaves as follows:

If user selects "Expense" Return <Expense>
Else, If the user selects "Profit" Return <Profit>
Else Return <Revenue>

From the Data tab, use the drop-down menu and select New Calculated Item….

[Screenshot 5]

Then use the Visual or Text editor to create a new calculated item named Measure.
Use nested IF…ELSE statements from the Boolean operators and the x = y operator from the Comparison operators.
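
If you use the Text editor, the finished expression looks something like the sketch below. Treat this as a rough outline rather than exact syntax: the expression editor syntax varies by SAS Visual Analytics release, 'MeasureParameter'p stands in for whatever you named your parameter in Step 2, and 'Expense'n, 'Profit'n and 'Revenue'n are the measure data items from this example.

IF ( 'MeasureParameter'p = 'Expense' )
RETURN 'Expense'n
ELSE ( IF ( 'MeasureParameter'p = 'Profit' )
       RETURN 'Profit'n
       ELSE 'Revenue'n )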

[Screenshot 6]

Step 4: Add report objects and assign roles

Now we can add our report objects to the report. IMPORTANT: Because our technique uses an unrelated category as the source of our custom category, we cannot put the button bar in the report or section prompt areas. If we put the button bar in either of those areas, it would automatically filter the data and we would not get the intended results. DO NOT put the button bar in the report or section prompt area.

In this sample report, you can see I’ve added the Button Bar Control object and the Bar Chart, Crosstab and Line Chart objects to the report area. I want to narrow this section by year, so I have added a Drop-Down List Control to the section prompt area.

[Screenshot 7]

Now let’s look at the role assignments for our various objects. Here is what the report looks like before any style enhancements:

[Screenshot 8]

Drop-Down List Control Object

[Screenshot 9]

Button Bar Control Object

This is where we want to assign our newly created custom category and save that selection to our parameter. This will give us all the categories of our custom category, including the “Other” category.

[Screenshot 10]

Next, we will need to add a filter on this object to remove the “Other” category. From the Filters tab, add a filter for the List of Measures category and deselect the “Other” category.

[Screenshot 11]

Bar Chart, Crosstab and Line Chart

Use our new calculated item as the measure for each of the Bar Chart, Crosstab and Line Chart objects.

[Screenshot 12]

There is one interaction, where the Bar Chart filters the Crosstab.

[Screenshot 13]

And here is what the report looks like after I’ve altered some of the objects’ styles from the Style tab. I colored the selected bar value to coordinate with the rest of the report objects.
[Screenshot 14]

In this screenshot, I’ve selected the Promotional Product Line.

[Screenshot 15]

Other Applications

As you can envision, this technique can be used for more than just metrics: you could also use it to allow your users to pick a category to use in your objects, or both!

Just bear in mind that you are restricted to using the Control Objects that support parameters. This includes:

  • Text Input
  • Button Bar
  • Drop-Down List
  • Slider (single-point only)


tags: SAS Professional Services, SAS Visual Analytics

Use parameters to pick your metric in Visual Analytics Reports was published on SAS Users.

April 26, 2016
 

When was the last time an informational graph or chart caught your eye? I mean, really caught your eye in a way that made you want to emblazon it on a greeting card or frame it for your office?

What’s that…never?

Me neither, until I had the opportunity to see some of the striking visuals and graphics by David McCandless and hear about the thought and passion that goes into his work as a data journalist. McCandless, the author of Knowledge is Beautiful, was a keynote speaker at SAS® Global Forum April 21, an event traditionally focused on the more technical and logistical aspects of analyzing data.

I’m not exaggerating when I say that I was moved by the informative digital images displayed across the conference venue jumbo screens the way some might be moved by a famous painting or sculpture. They revealed depth of understanding and presented analytical findings in such unexpected ways through story, shape, color and connection.

They were beautiful, indeed. But McCandless was quick to point out that it’s important that data visualization transcends aesthetic beauty and aids comprehension.

That’s important when you’re faced with billions of numbers and facts. “Images allow us to see something important in a sea of data,” he said. “They tell a story.”

McCandless says the story often lies not in the data points themselves but in the gaps and modulations. “When you visualize data this way, you have a different relationship with it,” he said. “To be able to see it, see the data, helps us understand.”

Many in the analytics world have heard the phrase that data is the new oil, the new fuel to power and motivate business. But McCandless offered a twist on the modern-day buzz phrase. “I like to think of data as the new soil,” he said. “Get in and get your hands dirty.” What is revealed could take root and flourish in ways you never imagined.

McCandless also encouraged attendees to give themselves the gift of time and spontaneity when digging into data. “Inject a little play and you may get unexpected results,” he said. He shared visuals created around his own areas of interest, images crafted just for fun, ranging from based-on-truth movies to more than 80,000 horoscopes. Playing with data is a great way to learn techniques, stretch the imagination and reveal more memorable ways of sharing business data.

The visual graphics you create may not find a spot above the living room couch, but if they hang in the minds of decision-makers and compel those who rely upon your analysis to change the way they see things, I think McCandless would agree: That is a beautiful thing.

View the full keynote presentation (and catch a glimpse of some of those stunning graphics) on the livestream archive.


tags: data visualization, SAS Global Forum

Data is the new soil; get your hands dirty was published on SAS Users.

April 22, 2016
 

The DS2 programming language gives you the following powerful capabilities:

  • The precision that results from using the new supported data types
  • Access to the new expressions, write methods, and packages available in the DS2 syntax
  • Ability to execute SAS Federated Query Language (FedSQL) from within the DS2 program
  • Ability to execute code outside of a SAS session such as on SAS® High-Performance Analytics Server or the SAS® Federation Server
  • Access to the threaded processing in products such as the SAS® In-Database Code Accelerator, SAS High-Performance Analytics Server, and SAS® Enterprise Miner™

Some DATA step functionality is not available in DS2, at least not in the form you are used to. However, don’t lose hope, because this article discusses ways to mimic some of the missing DATA step features within DS2.

Simulate Missing SET Statement Data Set Options

Many of the SET statement options are not allowed within DS2, such as the OBS= data set option. However, you can simulate some of these options by using FedSQL. For example, you can use a LIMIT clause similar to the following in place of OBS=:

{select * from work.temp limit 10}

Here is an example:

data one;
   do i = 1 to 100;
      output;
   end;
run;

proc ds2;
data new(overwrite=yes);
   method run();
      set {select * from work.one limit 10};
   end;
enddata;
run;
quit;

proc print data=new;
run;

 

Prevent Errors from Duplicate Data Set Names

In a DATA step, an automatic overwrite occurs when you issue a DATA statement that contains a data set name that was used previously in your SAS session.

For example, in the DATA step, you can add the following code:

data one;
   x=100;
run;
data one;
   x=200;
run;

This code overwrites the previous ONE data set. However, this automatic overwrite does not occur within DS2, and an error is generated if a specified data set name already exists. To work around this problem, you can use the OVERWRITE option as shown below.
proc ds2;
data one(overwrite=yes);
   dcl double x;
   method init();
      x=100;
   end;
run;
quit;

Specify Name Literals

In a DATA step, you can use name literals. However, in DS2, they are specified differently.

In Base SAS® with the VALIDVARNAME system option set to ANY, you can use a name literal like the following:

'My var'n=100;

This strategy does not work in DS2, but you can use double quotation marks to get the same results:

"My var"=100;

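Putting that fragment into a complete step, here is a minimal runnable sketch (the quoted variable name is just the example from above):

proc ds2;
data one(overwrite=yes);
   /* double quotation marks delimit an identifier that contains a space */
   dcl double "My var";
   method init();
      "My var"=100;
   end;
enddata;
run;
quit;
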
Substitute Missing Statements

The ATTRIB, LABEL, LENGTH, FORMAT, and INFORMAT statements are missing from DS2. However, you can use the DECLARE statement with the HAVING clause to perform these functions.

Here is an example:

dcl double aa having
   label 'var aa'
   format comma8.2;
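
Wrapped in a complete step, that declaration might look like the sketch below; the table name attrs is arbitrary, and running PROC CONTENTS afterward is one way to confirm that the label and format were applied:

proc ds2;
data attrs(overwrite=yes);
   /* HAVING attaches the label and format that the LABEL and FORMAT
      statements would supply in a DATA step */
   dcl double aa having
      label 'var aa'
      format comma8.2;
   method init();
      aa=1234.56;
   end;
enddata;
run;
quit;

proc contents data=attrs;
run;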

Create an Array

In Base SAS, you use the ARRAY statement in a DATA step to create an array to reference variables within the program data vector (PDV). Here is an example:

data one;
   array list(4) x1-x4;
   do i = 1 to 4;
      list(i)=i;
   end;
run;

The ARRAY statement does not exist in DS2, but the following code shows how to use an equivalent statement called VARARRAY:

proc ds2;
data one(overwrite=yes);
   vararray double x[4];
   declare double i;
   method init();
      do i = 1 to 4;
         x[i]=i;
      end;
   end;
enddata;
run;
quit;

Note: The VARARRAY statement must be outside the METHOD statement.

Enable Macro Variable Resolution

To reference a macro variable as a character value in the DATA step, you place double quotation marks around the macro variable as shown in the following example:

%let val=This is a test;
data _null_;
   dval="&val";
   put dval=;
run;

In DS2, double quotation marks are used only to delimit an identifier. Single quotation marks are required to delimit constants. If the above code were run within DS2, a warning similar to the following would occur:

[Screenshot: SAS log warning about the quoted string]

To get a better understanding of the difference between an identifier and constant text, consider the following two examples:

VARA='test';
VARB="vara";

Within DS2, the first assignment statement creates a variable called VARA and assigns it the text string test. The second assignment statement creates a variable called VARB and also assigns it the text string test: because the second statement uses double quotation marks, vara is treated as an identifier, and that identifier’s value is placed into the variable VARB.
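
Here is a minimal runnable sketch of that behavior (the variable names come from the fragment above):

proc ds2;
data _null_;
   dcl varchar(10) vara varb;
   method init();
      vara='test';     /* single quotation marks: a character constant */
      varb="vara";     /* double quotation marks: an identifier, so VARB
                          receives VARA's value */
      put vara= varb=; /* both write the value test */
   end;
enddata;
run;
quit;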

Since constant text is represented by single quotation marks in DS2, there needs to be a way to resolve a macro variable within single quotation marks. Luckily, DS2 provides a SAS-supplied autocall macro called %TSLIT that enables macro variable resolution within single quotation marks. Here is an example:

%let val=This is a test;
proc ds2;
data _null_;
   method init();
      declare char(14) dval;
      dval=%tslit(&val);
      put dval=;
   end;
enddata;
run;
quit;

I hope this blog post has been helpful. If you have any questions, please contact SAS Technical Support and we will be happy to assist you. Thanks for using SAS!

 

tags: Problem Solvers, PROC DS2, SAS Programmers

Solutions for missing DATA step features within DS2 was published on SAS Users.

April 22, 2016
 

When you attend SAS® Global Forum - a conference where you’re surrounded by data scientists, programmers and those who grew up as the smartest people in the room – you expect to hear talk about big data and advanced analytics.

What you don’t expect to hear are compelling messages about the importance of art, storytelling…and unicorns.

Ellen Warrillow, President of Data Insight Group, Inc., delivering her talk at SAS Global Forum

But Ellen Warrillow, President of Data Insight Group, Inc., couldn’t have been more convincing in her April 19 session highlighting the magic formula for becoming a well-sought-after marketing analyst. Her first hint: It requires much more than good programming skills.

She believes in the marriage of art and science. “When you put those two together, that’s where you get the wonder,” she said.

Wonder? Maybe that’s where unicorns come in.

In a sense, perhaps. Actually, unicorns – a rare breed of marketing technologists who understand both marketing and marketing technology (with a nod to John Ellett, contributor to Forbes) – are those who take the time to be curious and recognize that storytelling and imagery are like an analyst’s Trojan horse. Warrillow says they’re the way you get in.

For the data story to have real power, she believes, it needs to be memorable, impactful and personal. “Ask what the business will do with the results,” she said. “Think about what the listener might tell their boss or their coworker at the water cooler. That will be memorable.”

Today’s data visualization products make this easier than ever before. To build new skills in these seemingly foreign areas, she also suggested tapping into the power of user groups and creative teams in other parts of the organization.

Warrillow offered five tips to becoming that rare breed unicorn:

  1. Take time to align your analytic objectives with business objectives.
  2. Ask yourself what questions the business is asking. Insist on time to be curious and wonder.
  3. Tell stories to help your audience relate both rationally and emotionally to your message.
  4. Match the picture to the content and ensure it is telling the story.
  5. Look for ways that new technology may provide you with more efficient and effective ways to do your job.

“It’s a tall order,” she said. “Unicorns are rare and they’re hard to find. But the more you can take the time to understand all the pieces, the better analyst you’re going to be.”

Let the magic begin.

View the full paper
View the full presentation

tags: SAS Global Forum

Always be yourself, unless you can be a unicorn was published on SAS Users.

April 22, 2016
 

Editor's Note: In addition to the summary included in this blog, you can view videos of the following product demonstrations from the Technology Connection at SAS Global Forum 2016 by clicking on the links below:


Executive VP and CTO Armistead Sapp delivers opening remarks at the Technology Connection

“For over 40 years, we’ve seen it, solved it,” said SAS Executive Vice President and Chief Technology Officer Armistead Sapp in his opening remarks at the SAS® Global Forum Technology Connection. If his explanation of SAS differentiators and the road ahead serve as any indication, SAS is just getting started.

So what does set SAS apart? Sapp believes it’s:

  • 40 years of analytics in action.
  • Technology that meets users’ skill sets.
  • Innovation driven by strategy.
  • Analytics that impact the world.

Since last year’s conference alone, he said, a total of 326 products have been released, including 88 deployment tools and utilities. That’s a lot of code, but Sapp reiterated that SAS’ priorities remain unchanged: solve for quality first, then performance, then new features.

SAS CEO Jim Goodnight and other presenters announced several new offerings during Opening Session the night before, but an impressive crew of Technology Connection speakers and demonstrations gave attendees a look under the hood.

Senior Product Manager Mike Frost speaks at SAS Global Forum Technology Connection

SAS Senior Product Manager Mike Frost served as the on-stage ringmaster, guiding attendees through real organizational scenarios and dilemmas presented via video.

“Whether you’re a data scientist, statistician, IT analyst, business analyst or even someone who employs or manages folks in these roles, you will be able to see how what we’re doing in SAS® Viya™ will deliver value,” he said.

Technical presenters (in order of appearance), the product offering they demonstrated, and key takeaways included:

SAS® Cloud Analytics – Juthika Khargharia, Senior Solutions Architect, Products, Marketing and Enablement:

  • Access analytic applications from a web browser to quickly build predictive models.
  • Access SAS’ API for embedded analytics from any client language and incorporate analytics into current business applications and processes.
  • Zero set-up, with no worries about spinning up a cluster or installing any software. Users get a secure, cloud-based location to store data to analyze and save results of those analyses.

SAS® Visual Analytics – Jeff Diamond, Director, Research and Development

  • SAS Visual Analytics is running on SAS Viya.
  • Users can move seamlessly between data exploration, report design and modeling, as three offerings have merged into this new single user experience: SAS® Visual Analytics Designer, SAS® Visual Analytics Explorer and SAS® Visual Statistics.
  • The interface has been rewritten as an HTML5 application.

SAS® Customer Intelligence 360 – Michele Eggers, Senior Customer Intelligence Product Line Director, Products, Marketing and Enablement

  • Analytics “the way you need it.”
  • Software as a Service cloud offering with volume-based pricing.
  • Omni-channel, offering the most comprehensive customer intelligence hub.

SAS® Visual Investigator – Gordon Robinson, R&D Director, Products, Marketing and Enablement

  • Detection of threats can now be automated. The offering can pull information from websites, social media and various databases, drawing associations between disparate datasets.
  • Can be used by analysts to perform efficient investigations.
  • Can be configured to meet the needs of many types of solutions, including fraud, public security and more.

SAS® Environment Manager – Evan Guarnaccia, Solutions Architect, Products, Marketing and Enablement

  • Web-based window into a seamless administrative experience.
  • Can process simple alerts and it can use machine learning to identify problematic conditions that have not yet been modeled.
  • SAS Viya is architected to make maximum use of the capabilities in these technologies.

Vice President of Product Management Ryan Schmiedl offered closing remarks, talking more about the journey SAS and its customers have been on over the years.

“We are continuing to deliver on our promise to solve today’s problems and tomorrow’s problems,” he said. “It’s powerful stuff, revolutionary things. It’s going to change the market.”

tags: SAS Global Forum

Highlights from SAS Global Forum: Technology Connection was published on SAS Users.

April 22, 2016
 

Impressive innovations and exciting announcements took center stage (literally) at Opening Session of SAS Global Forum 2016. Near the end of the session, SAS CEO Jim Goodnight shared news about SAS’ new architecture that had everyone abuzz.

SAS® Viya™ - There’s a new headliner in Vegas

“We are unveiling a quantum leap forward in making analytics easier to use and accessible to everyone,” Goodnight said. “It’s a major breakthrough and it’s called SAS Viya.”

Goodnight was also quick to point out that SAS Viya will work with customers’ existing SAS 9 software.

Goodnight invited Vice President of Analytic Server Research and Development Oliver Schabenberger, who led the development work for SAS Viya, to join him on stage to discuss the new cloud-based analytic and data management architecture.

Jim Goodnight shares exciting announcements at SAS Global Forum 2016 Opening Session

“We see great diversity in the ways our customers approach and consume analytics,” Schabenberger explained. “From small data to big data. From simple analytics to the toughest machine learning problems. Data in motion and data at rest. Structured and unstructured data. Single users and hundreds of concurrent users. In the cloud and on premises. Data scientists and business users.”

SAS has developed a truly unified and integrated modern environment that everyone can use, whether you are a data scientist or a business analyst. “The beauty of SAS Viya is that it’s unified, open, simple and powerful, and built for the cloud,” said Schabenberger. “Today we are moving to a multi-cloud architecture.”

Goodnight encouraged customers to “be sure to try it out. I think you will enjoy the new SAS Viya.”

The SAS Viya procedural interface will be available to early adopters in 30 days, with visual interfaces scheduled for a September release. Customers can apply to be part of the SAS Viya early preview program.

SAS Customer Intelligence 360 and SAS Analytics for IoT announced

SAS Viya wasn’t the only “star” of the evening.

Goodnight lauded the company’s continuing efforts to globalize and expand ways to make our software faster and easier to use. On the development side, he highlighted SAS Customer Intelligence 360, SAS® Forecast Studio, SAS® Event Stream Processing, SAS® Cybersecurity and the next generation of high performance analytics.

Executive Vice President and SAS Chief Revenue Officer Carl Farrell took the stage to share examples of the many diverse uses of SAS. “Today, our customers are so much more educated on big data and analytics,” Farrell said. “CEOs are realizing that analytics can help them draw more value for their business around that data.”

Farrell singled out several customers, including Idea Cellular Ltd. in India, which is processing a billion transactions a day (something that was impossible before high-performance analytics), and Macy’s customer intelligence project, which is focused on making real-time offers to customers as they walk through a store, creating a personal and immediate experience.

Farrell also said he was so proud of the SAS work being done outside of business, in the data for good realm, specifically mentioning work in Chile combatting the Zika virus and the work of the Black Dog Institute, which conducts research to improve the lives of people with mental illness.

“Our customers are doing amazing things with SAS that we couldn’t have imagined 40 years ago, and this is just the tip of the iceberg and there’s so much more to come,” Farrell said.

Jeromey Farmer accepts the 2016 User Feedback Award from Annette Harris.

Speaking of stars, Senior Vice President of Technical Support Annette Harris applauded the SAS Super Users for their work in support communities. “SAS users have a rich tradition of helping each other in peer-to-peer forums,” said Harris.

Harris also recognized the 2016 SAS User Feedback Award winner, Jeromey Farmer, a Treasury Officer from the Federal Reserve Bank of St. Louis, noting that SAS gained strong insights from Farmer into how SAS can more seamlessly integrate in a complex and secure environment.

SAS Executive Vice President and Chief Marketing Officer Randy Guard took the stage to announce SAS® Analytics for IoT and to talk about some macro trends he is seeing, including the digital transformation taking place in business and technology. He cited an IDC report stating that by the end of 2017, two-thirds of all CEOs will have company-wide digital transformation at the top of their agenda.

Customers want help in managing their data, including streaming data, and want analytics embedded in their applications, he added. He calls the latter “analytics any way you want it.”

Customers also want software as a service, including self-service, and want to know how to monetize the connectivity and continuous load of data. “That hits our sweet spot in analytics at SAS,” he said. “The transformation is under way and we are investing money to make this transition smoother for our customers.”

40 and Forward

Woven throughout Opening Session were references to SAS’ 40 years in business.

Asked about what has changed over the years, Goodnight recalled that when SAS started, there was one product on a single machine. Now we have more than 200 products on dozens of machines. Back then, a computer could process about 500 instructions a second. Now it’s up to 2 to 3 billion instructions a second. The very first disk drives were two feet across, with tapes containing about five million bytes. Now we can get 1.2 terabytes in the size of a K-cup.

As for key milestones over the 40 years, Goodnight said two things came to mind. One was the introduction of multivendor architecture in the mid-1980s so our software could run on all platforms, and the other was the advent of massively parallel computing.

Not surprisingly, given the milestone anniversary year for SAS, the Opening Session ended with a video retrospective looking back on world news from the 1970s through today, with a cameo appearance by Goodnight from the early days of SAS.

If you want to view a recording of Opening Session, visit the SAS Global Forum Video Portal.

tags: SAS Analytics for IoT, SAS Customer Intelligence 360, SAS Global Forum, SAS Viya

Highlights from SAS Global Forum: Opening Session was published on SAS Users.

April 21, 2016
 

Editor's Note: There are hundreds of breakout sessions happening at SAS Global Forum in both the Users and Executive programs. Since we couldn’t just pick one to highlight, we decided to put together a SAS Global Forum day 2 session roundup, highlighting some of our very favorites!

Don’t overlook data management when it comes to cybersecurity analytics

There’s a constant buzz in the market around analytics and its role in the cybersecurity space, but the conversation often overlooks the important role data management plays. Data management is a fundamental component that SAS cyber experts want to be sure organizations understand: just because the investment is being made in cyber analytics doesn’t mean companies can ignore data quality and data management.

“There are countless solutions and dollars spent to protect organizations,” said SAS’ Director of Cybersecurity Christopher Smith. “All of those pieces – firewalls, endpoints and email gateways – play a vital role, but those systems don’t communicate with each other.” Even with all the investment organizations are making to protect themselves, there is still no greater insight being gained into what’s actually happening inside company walls.

What’s needed is business context, and that’s something isolated solutions cannot provide. While those systems are valuable in identifying what’s good, what’s bad and what can be defined, they offer limited business intelligence.

But the challenge isn’t just about obtaining data; it’s about the speed, type, structure and volume of data being generated per second.

“We are working in a society where everyone is looking for a silver bullet,” said Vice President of Business Consulting and Data Management Evan Levy. “People are buying products to solve problems, but it’s more complicated than that. The volume, need and diversity of content and sources isn’t something we could have ever predicted.”

Levy said that’s where data management becomes critical. Companies have to enlist the proper data management techniques to avoid lagging in security and exposing themselves to added risk with every attack. By looking at what’s actually happening, companies can see what the data is saying and then develop an effective response.

The fear today is not what happened; it’s the unknown of what else has happened that we haven’t yet identified. “Once data is created it will always be an asset to the business,” said Smith, which means it must be catalogued to offer value. Effective cyber protection requires sophisticated analytic prowess with rich data history in order to protect organizations from clever and skilled hackers.

Learning from past mistakes

In his April 20 Executive Conference breakout session, Sterling Price, Director of Customer Analytics at Walmart Stores, Inc., cautioned against relying too heavily on completed analytical projects and assuming that new technologies and massive data sets produce accurate and relevant results. He used several historical examples, from the Google Flu Trends prediction mishap to the faulty prediction of the 1936 US presidential race, to help prove the point.

Big data, it turns out, is simply the newest phenomenon tempting leaders to believe their outcomes are statistically sound. "We owe our organizations objective analysis based on science, not wishful thinking," said Price.

Here are five points gleaned from his personal experience at Walmart as well as the historical examples he shared:

  1. Don't fall prey to the belief that results will be accurate and useful because of how much data was used.
  2. We still need to sample things, but a badly chosen large sample, even a really big one, is much worse than a well-chosen small sample.
  3. Methodology still matters. Big data by itself does nothing. How we use it defines its value.
  4. Scalability should be considered up front.
  5. Don't mistake statistical significance for practical significance. They are not the same.

More from this presentation.

Arrest Prediction and Analysis in New York City

Analyzing "stop and frisk" data captured by the New York City Police Department can lead to insights that help cops make better decisions about whether to arrest a person or not, say two Oklahoma State University graduate students.

Karan Rudra and Maitreya Kadiyala looked at open source data from the NYPD to understand the propensity of arrest and optimize frisk activities. This type of analysis can potentially reduce the number of stops and impact the arrest rate.

The pair examined 56 variables, including in which precinct a stop occurred, whether a stop led to an arrest, whether the officer produced an ID and shield, and whether a person was stopped inside or outside of a building.

Using SAS® Enterprise Miner™, they built and compared four models, determining that a polynomial regression model was the best. Some findings from their research include:

  • In the Bronx and Manhattan, females have the highest percentage of arrests after a stop and frisk.
  • In Staten Island, though there are a high number of stops per area, the number of resulting arrests is comparatively low.
  • Blacks and Hispanics have a higher percentage of arrests after a stop.
  • The overall arrest rate of the data sample was 6 percent.

More from this presentation

tags: SAS Global Forum

Highlights from SAS Global Forum: Cybersecurity, analytic failure and getting arrested was published on SAS Users.

April 21, 2016
 

There are hundreds of breakout sessions happening at SAS Global Forum in both the Users and Executive programs. Since we couldn’t just pick one to highlight from opening day, we decided to put together a SAS Global Forum day 1 session roundup, highlighting some of our very favorites!

The big data behind fantasy football

With millions of users, peak traffic seasons and thousands of requests a second for complex user-specific data, fantasy football offers many challenges for even the most talented analytical minds. Clint Carpenter, one of the principal architects of the NFL fantasy football program, shared strategies and lessons learned behind football fanatics’ favorite application.

Fantasy football combines a high volume of users with detailed personalized data; multiple data sources; various data consumers; and high peak request volumes. The challenge is to process the data from the stadium playing field and user devices, and make it easily accessible to a variety of different services. If there’s something to learn from developing and analyzing fantasy football over the years, Carpenter said it’s these three things: don’t blindly trust data or specifications; spend time planning upfront to avoid problems in the end; and test for data integrity, performance and the whole system. “If you test well, you will have happy stakeholders,” said Carpenter. “If you don’t, you are asking for unhappy users and sleepless nights.”

More from this presentation.

One university’s solution to the data science talent gap

Is it time for a Ph.D. in data science? If you ask Jennifer Lewis Priestly, who happens to be the director of Kennesaw State University’s new Ph.D. in data science, the answer is yes, but there are areas we have to consider and address in order to make it work.

“Closing the talent gap is a problem and a challenge for our global economy,” said Priestly. The demand for deep analytical talent in the United States could potentially be 50 to 60 percent greater than its projected supply by 2018. And that demand is creating a first for academia, forcing companies across industry sectors to chase the same talent pool of students.

But it’s not just the skills gap that has to be addressed; Priestly said we also have to consider the explosion of master’s programs. Today, analytically aligned master’s programs are popping up across the country, and most can be completed in 10 to 18 months. But can institutions transform a student into a data scientist that fast? Offering a data science Ph.D. allows students to dive into the complexity of data science, rather than skim the surface.

So, if we find the talent and design the program, who will teach all of these students? “We have to put these students out into the market to fill these jobs, but we also have to put them back into colleges and universities to train up our future talent,” Priestly said.

View a video of this talk.
View the presentation.

Turning data into stories

Your data matters, but unless people emotionally connect with the data presented, it’s going to fall short. By not offering context, you risk having an audience miss your vision, draw their own conclusions or misunderstand the root of the problem you are trying to solve.

The question then becomes how? How do you actually get someone to engage and connect with the numbers? You’ve got to tell a story. Bree Baich, Principal Learning and Development Specialist with SAS Best Practices, gave her session attendees tips and tricks to turning data into stories that make sense.

“Data storytelling is a real thing, connecting the logical and emotional parts of our brain to not just make sense of the data, but to connect it in a way that causes a response,” Baich said. With an easy, four-step plan, Baich helped attendees see that crafting data-driven stories is easier than we think.

  1. The story setup allows your audience to become curious and garner interest from the start. It’s a way to spark curiosity upfront by using a hook.
  2. The context paints a picture of the current realities, providing real understanding of the information at hand.
  3. The options show your audience where you want them to go. Think of it as an opportunity to demonstrate why your option is the better choice that will make a real difference.
  4. The action leaves a call to action and is key to pushing stakeholders to make a decision or getting customers to purchase.

Remember, data shouldn’t stand alone. Next time, shape it with a story!

More from this presentation.


tags: SAS Global Forum

Highlights from SAS Global Forum 2016: Fantasy football, the data science talent gap and data storytelling was published on SAS Users.

April 19, 2016
 

With the release of SAS® 9.4 M3, you can now access SAS Scalable Performance Data Engine (SPD Engine) data using Hive. SAS provides a custom Hive SerDe for reading SAS SPD Engine data stored on HDFS, enabling users to access the SPD Engine table from other applications.

The SPD Engine Hive SerDe is delivered in the form of two JAR files. Users need to deploy the SerDe JAR files under “../hadoop-mapreduce/lib” and “../hive/lib” on all nodes of a Hadoop cluster to enable the environment. To access the SPD Engine table from Hive, you need to register the SPD Engine table under Hive metadata using the metastore registration utility provided by SAS.

The Hive SerDe is read-only and cannot serialize data for storage in HDFS. The Hive SerDe does not support creating, altering, or updating SPD Engine data in HDFS using HiveQL or other languages. For those functions, you would use the SPD Engine with SAS applications.

Requirements

Before you can access an SPD Engine table using Hive SerDe you have to perform the following:

  • Deploy the SAS Foundation software using SAS Deployment Wizard.
  • Select the product name “SAS Hive SerDe for SPDE Data”


This will create a subfolder under $sashome with the SerDe JAR files:

[root@sasserver01 9.4]# pwd
/opt/sas/sashome/SASHiveSerDeforSPDEData/9.4
[root@sasserver01 9.4]# ls -l
total 88
drwxr-xr-x. 3 sasinst sas 4096 Mar 8 15:52 installs
-r-xr-xr-x. 1 sasinst sas 8615 Apr 15 2015 sashiveserdespde-installjar.sh
-rw-r--r--. 1 sasinst sas 62254 Jun 24 2015 sas.HiveSerdeSPDE.jar
-rw-r--r--. 1 sasinst sas 6998 Jun 24 2015 sas.HiveSerdeSPDE.nls.jar
[root@sasserver01 9.4]#

  • You must be running a supported Hadoop distribution that includes Hive 0.13 or later:

> Cloudera CDH 5.2 or later
> Hortonworks HDP 2.1 or later
> MapR 4.0.2 or later

  • The SPD Engine table stored in HDFS must have been created using the SPD Engine (see the sketch after this list).
  • The Hive SerDe is delivered as two JAR files, which must be deployed to all nodes in the Hadoop cluster.
  • The SPD Engine table must be registered in the Hive metastore using the metastore registration utility supplied by SAS. You cannot use any other method to register tables.
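
For reference, here is a minimal sketch of creating such a table with the SPD Engine. This example is an assumption rather than part of the original walkthrough: HDFSHOST=DEFAULT presumes the SAS session is already configured for the cluster (SAS_HADOOP_CONFIG_PATH and SAS_HADOOP_JAR_PATH set), the HDFS path matches the registration example used later, and SASHELP.STOCKS is used because its columns match the stocks table described below.

/* Sketch: create an SPD Engine table in HDFS so that it can later be
   registered in the Hive metastore. */
libname spdedata spde '/user/lasradm/SPDEData' hdfshost=default;

data spdedata.stocks;
   set sashelp.stocks;
run;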

Deploying the Hive SerDe on the Hadoop cluster

Deploy the SAS Hive SerDe on the Hadoop cluster by executing the script “sashiveserdespde-installjar.sh”. This script is located in the folder where the SAS Hive SerDe software was deployed. Follow the steps below, which describe the SAS Hive SerDe deployment on a Hadoop cluster.

  • Copy the script file along with the two JAR files to one of the nodes (the NameNode server). For example, in my test environment, the files were copied to the sascdh01 (NameNode) server as user ‘hadoop’.

[hadoop@sascdh01 SPDEHiveSerde]$ pwd
/home/hadoop/SPDEHiveSerde
[hadoop@sascdh01 SPDEHiveSerde]$ ls -l
total 84
-rwxr-xr-x 1 hadoop hadoop 8615 Mar 8 15:57 sashiveserdespde-installjar.sh
-rw-r--r-- 1 hadoop hadoop 62254 Mar 8 15:57 sas.HiveSerdeSPDE.jar
-rw-r--r-- 1 hadoop hadoop 6998 Mar 8 15:57 sas.HiveSerdeSPDE.nls.jar
[hadoop@sascdh01 SPDEHiveSerde]$

  • The node server (NameNode) must be able to use SSH to access the other data nodes in the cluster. It’s recommended to execute the deployment script as user ‘root’ or with the sudo su command.
  • Switch user to ‘root’ or a user with ‘sudo su’ permission.
  • Set the Hadoop CLASSPATH to include the MapReduce and Hive library installation directories. Set SERDE_HOSTLIST to include the servers where the JAR files will be deployed. For example, my test environment uses the following statements.

export CLASSPATH=/usr/lib/hive/lib/*:/usr/lib/hadoop-mapreduce/lib/*
export HADOOP_CLASSPATH=$CLASSPATH
export SERDE_HOSTLIST="xxxxx..xxxx.com xxxxx..xxxx.com xxxxx..xxxx.com"

  • Execute the script as user ‘root’ to deploy the JAR files on all nodes under the “../hive/lib” and “../hadoop-mapreduce/lib” subfolders. While running the script, provide the locations of the MapReduce and Hive library installation folders as parameters to the script.

For example:

sh sashiveserdespde-installjar.sh -mr /usr/lib/hadoop-mapreduce/lib -hive /usr/lib/hive/lib
[root@sascdh01 SPDEHiveSerde]# sh sashiveserdespde-installjar.sh -mr /usr/lib/hadoop-mapreduce/lib -hive /usr/lib/hive/lib
scp -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null /root/Downloads/SPDEHiveSerde/sas.HiveSerdeSPDE.jar root@sascdh01:/usr/lib/hive/lib
....
.........
scp -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null /root/Downloads/SPDEHiveSerde/sas.HiveSerdeSPDE.nls.jar root@sascdh03:/usr/lib/hadoop-mapreduce/lib
[root@sascdh01 SPDEHiveSerde]#

  • Restart YARN/MapReduce and Hive services on the Hadoop cluster.

Registering the SAS Scalable Performance Data Engine table in Hive metadata

The SPD Engine table that you are planning to access from Hive must be registered in the Hive metastore using the SAS-provided metadata registration utility. You cannot use any other method to register tables. The utility reads an SPD Engine table’s metadata file (.mdf) in HDFS and creates Hive metadata in the Hive metastore as table properties. Registering the SPD Engine table projects a schema-like structure onto the table and creates Hive metadata about the location and structure of the data in HDFS.

Because the utility reads the SPD Engine table’s metadata file that is stored in HDFS, if the metadata is changed by the SPD Engine, you must re-register the table.

The metadata registration utility can be executed from one of the Hadoop cluster node servers, preferably the NameNode server. The code examples shown here are all from the NameNode server.

The following steps describe the SPD Engine table registration to Hive metadata.

  • Set the Hadoop CLASSPATH to include a directory with the client Hadoop configuration files and SerDe JAR files.

The following example is from my test environment, where the two SerDe JAR files were copied under the “/home/hadoop/SPDEHiveSerde/” subfolder. This subfolder is owned by OS user ‘hadoop’, the user who will execute the table registration utility. When exporting the CLASSPATH, you must also include the ../hive/lib folder. For the Hadoop configuration XML files, we are using the /etc/hive/conf folder here; if you store your Hadoop configuration files in a separate folder, specify that folder instead.

export CLASSPATH=/home/hadoop/SPDEHiveSerde/*:/usr/lib/hive/lib/*
export SAS_HADOOP_CONFIG_PATH=/etc/hive/conf/
export HADOOP_CLASSPATH=$SAS_HADOOP_CONFIG_PATH:$CLASSPATH

After exporting the Hadoop CLASSPATH, the output from the ‘hadoop classpath’ command should look like the following. Notice the values that you included in the previous export statements.

[hadoop@sascdh01 ~]$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/etc/hive/conf/:/home/hadoop/SPDEHiveSerde/*:/usr/lib/hive/lib/*
[hadoop@sascdh01 ~]$

  • Run the SerDe JAR command with the appropriate parameters and options to register the SPD Engine table. For example, the following command registers an SPD Engine table named stocks. It specifies the HDFS directory location (/user/lasradm/SPDEData) that contains the .mdf file of that SPD Engine table. The -table and -mdflocation parameters are required.

hadoop jar /home/hadoop/SPDEHiveSerde/sas.HiveSerdeSPDE.jar com.sas.hadoop.serde.spde.hive.MetastoreRegistration -table stocks -mdflocation /user/lasradm/SPDEData

[hadoop@sascdh01 ~]$ hadoop jar /home/hadoop/SPDEHiveSerde/sas.HiveSerdeSPDE.jar com.sas.hadoop.serde.spde.hive.MetastoreRegistration -table stocks -mdflocation /user/lasradm/SPDEData
16/03/09 16:46:35 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxxxxxx.xxxx.xxx.com:9083
16/03/09 16:46:35 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/03/09 16:46:36 INFO hive.metastore: Connected to metastore.
16/03/09 16:46:36 INFO hive.MetastoreRegistration: Table is registered in the Hive metastore as default.stocks
[hadoop@sascdh01 ~]$

Reading SAS Scalable Performance Data Engine table data from Hive

Once the SPD Engine table is registered in the Hive metastore, you can query the SPD Engine table data via Hive. If you describe the table with the formatted option, you will see that the data file locations are the SPD Engine locations. The Storage section provides information about the SerDe library, which is ‘com.sas.hadoop.serde.spde.hive.SpdeSerDe’.

hive> show tables;
OK
…..
…….

stocks
Time taken: 0.025 seconds, Fetched: 15 row(s)
hive>

hive> select count(*) from stocks;
Query ID = hadoop_20160309171515_9db3aed5-0ba4-40cc-acc4-56acee10a275
Total jobs = 1
…..
……..
………………
Total MapReduce CPU Time Spent: 2 seconds 860 msec
OK
699
Time taken: 38.734 seconds, Fetched: 1 row(s)
hive>

hive> describe formatted stocks;
OK
# col_name data_type comment

stock varchar(9) from deserializer
date date from deserializer
open double from deserializer
high double from deserializer
low double from deserializer
close double from deserializer
volume double from deserializer
adjclose double from deserializer

# Detailed Table Information
Database: default
Owner: anonymous
CreateTime: Wed Mar 09 16:46:36 EST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://xxxxxxxxx.xxxx.xxx.com:8020/user/lasradm/SPDEData/stocks_spde
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
adjclose.length 8
adjclose.offset 48
close.length 8
close.offset 32
date.length 8
date.offset 0
high.length 8
high.offset 16
low.length 8
low.offset 24
numFiles 0
numRows -1
open.length 8
open.offset 8
rawDataSize -1
spd.byte.order LITTLE_ENDIAN
spd.column.count 8
spd.encoding ISO-8859-1
spd.mdf.location hdfs://xxxxxxxxx.xxxx.xxx.com:8020/user/lasradm/SPDEData/stocks.mdf.0.0.0.spds9
spd.record.length 72
spde.serde.version.number 9.43
stock.offset 56
totalSize 0
transient_lastDdlTime 1457559996
volume.length 8
volume.offset 40

# Storage Information
SerDe Library: com.sas.hadoop.serde.spde.hive.SpdeSerDe
InputFormat: com.sas.hadoop.serde.spde.hive.SPDInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.166 seconds, Fetched: 60 row(s)
hive>

How the SAS Scalable Performance Data Engine SerDe reads data

The SerDe reads the data using the encoding of the SPD Engine table. Make sure that the SPD Engine table name is appropriate for the encoding associated with the cluster.

Current SerDe Implementation of Data Conversion from SAS to Hive

[Table: current SerDe data type conversions from SAS to Hive]

Limitations

If the SPD Engine table in HDFS has any of the following features, it cannot be registered in Hive or used with the SerDe. You must access it through SAS and the SPD Engine. The following table features are not supported:

  • Compressed or encrypted tables
  • Tables with SAS informats
  • Tables that have user-defined formats
  • Password-protected tables
  • Tables owned by the SAS Scalable Performance Data Server

Reference documents

SAS® 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System, Third Edition

 

tags: SAS Administrators, SAS Professional Services

Accessing SPD Engine Data using Hive was published on SAS Users.

April 16, 2016
 

Is it just me, or does it feel a little bit like Christmas Eve? I think it's because SAS Global Forum 2016 is right around the corner, and for many SAS users, it's the most wonderful time of the year. If you're heading to Las Vegas, get ready for three days of learning from SAS peers, exchanging ideas, discovering new techniques for using SAS, and maybe, if you play your cards right (see what I did there?), a dash of fun as well. If only there was something exciting to do in Las Vegas...

All this sounds great if you're one of the 5,000 SAS users who will be at the event (April 18-21 @ The Venetian), right? But what if you can't make the trip to Las Vegas? Is there another way to experience some of the great content that will be shared there? I'm happy to say the answer is yes!

This year, SAS will provide dozens of hours of live video streaming from the event, so you can watch select sessions from the Users and Executive Programs from wherever you are. Live coverage will include Opening Session, all the keynote talks, select breakouts, Tech Talks, updates from The Quad, interviews with SAS executives and developers, and more. Additional videos will be available on the SAS Global Forum Video Portal. Here you'll find featured, most popular, and how-to videos, as well as episodes of Inside SAS Global Forum. You can even view videos from past events. Coverage will be available for on-demand viewing after the conference as well.

Video not your thing? No worries. SAS will provide several other ways to stay up to date. For starters, you can read any of a number of blog posts from the event. Posts will come from many different SAS blogs, but all posts from SAS Global Forum will be aggregated here.

If you're on LinkedIn, Twitter or Facebook, you can stay connected with what's happening and engage with attendees on SAS’ social media channels. Join the conversation, comment on some of the cool presentations you attended or viewed, discuss the exciting news coming out of the event, or simply follow along. The channels sure to have the most activity are the SAS Users LinkedIn group, the SAS Twitter account, and the SAS Users Group Facebook page. The hashtag for SAS Global Forum is #SASGF; be sure to use the hashtag in all your posts.

With all the opportunities to follow along, connect and contribute, you can be a part of SAS Global Forum 2016, whether you're attending in person or not. And if you're a SAS user, that's almost as exciting as a visit from Santa.

tags: Livestream, SAS Global Forum

How to participate in SAS Global Forum 2016...even if you're not going was published on SAS Users.