February 6, 2017
 

The financial sector has always been subjected to regulatory compliance laws and directives. Consumers, lawmakers and politicians would expect no less. But it's fair to say that the financial sector has witnessed a "hockey stick" trend regarding new regulations in recent years. Last year I talked about how compliance is […]

The post It's time to stop the reactionary compliance tactics appeared first on The Data Roundtable.

February 6, 2017
 

Suppose you create a scatter plot in SAS with PROC SGPLOT. What color does PROC SGPLOT use for the markers? If you specify the GROUP= option so that markers are colored by a grouping variable, what colors are used to represent the various groups? The following scatter plot shows the colors that are used by default for the HTMLBlue style. They are shades of blue, red, green, brown, and magenta.

data A;        /* example data with groups 1, 2, ..., 5 */
do Color = 1 to 5;
   x = Color; y = Color;  output;
end;
run;
 
title "Marker Colors Used for GROUP= Option";
title2 "HTMLBlue Style";
proc sgplot data=A;
xaxis grid; yaxis grid;
scatter x=x y=y / group=Color markerattrs=(size=24 symbol=SquareFilled);
run;
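[Figure: scatter plot of five filled square markers, one per group, showing the default marker colors for the HTMLBlue style]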

Notice that these marker colors are not fully saturated colors, so they are not the SAS color names RED, BLUE, GREEN, BROWN, and MAGENTA. So what colors are these? What are their RGB values?




Colors come from styles

Colors are defined by styles, and you can use ODS style elements to set marker colors. A style defines elements called GraphDataDefault, GraphData1, GraphData2, GraphData3, and so forth. Each element contains several attributes such as colors and line patterns. The complete list of style elements and attributes for ODS graphics is in the documentation, but for this article, the important fact is that the GraphDatan:ContrastColor attribute determines the marker color for the nth group. These are the colors in the previous scatter plot.
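As a minimal sketch of referring to a style element directly, the MARKERATTRS= option accepts the name of a style element, so the markers take the GraphData2 contrast color without a hard-coded color value. The Sashelp.Class data set is an assumption used only for illustration; it is not part of the original example.

```sas
/* Sketch: markers inherit their color from the GraphData2 style element.
   Sashelp.Class is used only for illustration. */
proc sgplot data=sashelp.class;
   scatter x=height y=weight / markerattrs=GraphData2(symbol=CircleFilled size=10);
run;
```

Because the color comes from the style, the same statement picks up a different color if the ODS style changes.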

Display an ODS style template

Styles are defined by ODS templates. You can use the SOURCE statement in PROC TEMPLATE to display a template. In the style template, the GraphDatan:ContrastColor attributes are set by using keywords named gcdata1, gcdata2, gcdata3, etc.

If you display the styles.HTMLBlue template, you will see that the HTMLBlue style inherits from the Statistical style, and it is the Statistical style that defines the contrast colors. The following statements write the contents of the Statistical template to the SAS log:

proc template;
source styles.statistical;
quit;

The template is long, and it is hard to scroll through the log to discover which colors are associated with each attribute. But that's no problem: you can use SAS to find and display only the information about contrast colors.

Display the marker colors as RGB and hexadecimal values

For years Warren Kuhfeld has been showing SAS customers how to view, edit, and use ODS templates to customize the graphs that are produced by SAS statistical procedures. A powerful technique that he uses is to write a template to a file and then use the DATA step to modify the template.

I will not modify the template but merely display information from it. The following program uses PROC TEMPLATE to write the template to a text file and then uses a DATA step to find every line that contains the keyword 'gcdata'. For each such line, the program extracts the keyword and its color. The keyword-value pairs are saved to a data set, which is sorted and displayed:

libname temp "C:/temp";
proc template;
source styles.statistical / file='temp.tmp'; /* write template to text file */
quit;
 
data Colors;
keep Num Name Color R G B;
length Name Color $8;
infile 'temp.tmp';                    /* read from text file */
input;
/* example string:  'gcdata1' = cx445694 */
k = find(_infile_,'gcdata','i');      /* if k=0 then string not found */
if k > 0 then do;                     /* Found line that contains 'gcdata' */
   s = substr(_infile_, k);           /* substring from 'gcdata' to end of line */
   j = index(s, "'");                 /* index of closing quote  */
   Name = substr(s, 1, j-1);          /* keyword                 */
   if j = 7 then Num = 0;             /* string is 'gcdata'      */
   else                               /* extract number 1, 2, ... for strings */
      Num = inputn(substr(s, 7, j-7), "best2.");  /* gcdata1, gcdata2,...     */
   j = index(s, "=");                 /* index of equal sign     */
   Color = compress(substr(s, j+1));  /* color value for keyword */
   R = inputn(substr(Color, 3, 2), "HEX2.");   /* convert hex to RGB */
   G = inputn(substr(Color, 5, 2), "HEX2.");
   B = inputn(substr(Color, 7, 2), "HEX2.");
end;
if k > 0;
run;
 
proc sort data=Colors; by Num; run;
 
proc print data=Colors; 
var Name Color R G B;
run;
[Output: PROC PRINT table that lists Name, Color, R, G, and B for each gcdata keyword]

Success! The output shows the contrast colors for the HTMLBlue style. The 'gcdata' color is the fill color (a dark blue) for markers when no GROUP= option is specified. The 'gcdatan' colors are used for markers that are colored by group membership. Obviously you could use this same technique to display other style attributes, such as line patterns or bar colors ('gdata').
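The hex-to-RGB conversion in the DATA step can be checked on a single value. The following sketch, which is an illustration rather than part of the original program, converts the color cx445694 to decimal components and then rebuilds the hex string with the HEX2. format:

```sas
/* Sketch: round trip between a hex color and its R, G, B components */
data _null_;
   Color = "cx445694";                       /* value used for gcdata1 in this article */
   R = input(substr(Color, 3, 2), hex2.);    /* hex 44 -> 68  */
   G = input(substr(Color, 5, 2), hex2.);    /* hex 56 -> 86  */
   B = input(substr(Color, 7, 2), hex2.);    /* hex 94 -> 148 */
   Hex = cats("cx", put(R, hex2.), put(G, hex2.), put(B, hex2.));
   put R= G= B= Hex=;                        /* writes the values to the log */
run;
```

The log should show R=68 G=86 B=148 Hex=cx445694.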

If you prefer a visual summary of the attributes for an ODS style, see section "ODS Style Comparisons" in the SAS/STAT documentation. That section is part of the chapter "Statistical Graphics Using ODS," which could have been titled "Everything you always wanted to know about ODS graphics but were afraid to ask."

An application of setting marker colors

I prefer to use style elements and discrete attribute maps to set colors for markers. But if you are pressed for time, you can use the STYLEATTRS statement to set the colors that are used for the GROUP= option. The STYLEATTRS statement requires a list of hexadecimal color values or SAS color names. The following call to PROC SGPLOT uses the hex values for GraphData2:ContrastColor and GraphData3:ContrastColor:

/* use colors for HTMLBlue style */
%let gcdata1 = cx445694;        
%let gcdata2 = cxA23A2E;
%let gcdata3 = cx01665E;
title "Origin in {Europe, USA}";
proc sgplot data=sashelp.cars;
where origin^='Asia' and type^="Hybrid";                /* omit first category */
   styleattrs DataContrastColors = (&gcdata2 &gcdata3); /* use 2nd and 3rd colors */
   scatter x=weight y=mpg_city / group=Origin markerattrs=(symbol=CircleFilled);
   keylegend / location=inside position=TopRight across=1;
run;

It would be great if you could specify a style-independent syntax such as

styleattrs DataContrastColors=(GraphData2:ContrastColor GraphData3:ContrastColor);

Unfortunately, that syntax is not supported. The STYLEATTRS statement requires a list of color values or SAS color names.

Although this trick is interesting, in general I prefer to use styles (rather than hard-coded color values) in production code. However, if you want to know the RGB/hex values for a style, this trick shows how you can get them from an ODS template.
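For comparison, here is a minimal sketch of the style-based alternative that uses a discrete attribute map. It assumes that the MarkerStyleElement column is supported by your SAS release (it is documented for SAS 9.4). The data set name AttrMap and the ID value "Origin" are hypothetical names chosen for this illustration.

```sas
/* Sketch: assign style elements (not hard-coded colors) to group values */
data AttrMap;
   length ID $8 Value $6 MarkerStyleElement $12;
   ID = "Origin";                              /* name referenced by ATTRID= */
   Value = "Europe"; MarkerStyleElement = "GraphData2"; output;
   Value = "USA";    MarkerStyleElement = "GraphData3"; output;
run;

title "Origin in {Europe, USA}";
proc sgplot data=sashelp.cars dattrmap=AttrMap;
   where origin^='Asia' and type^="Hybrid";
   scatter x=weight y=mpg_city / group=Origin attrid=Origin;
   keylegend / location=inside position=TopRight across=1;
run;
```

Because the map refers to GraphData2 and GraphData3 by name, the colors follow whatever ODS style is in effect.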

tags: SAS Programming, Statistical Graphics

The post What colors does PROC SGPLOT use for markers? appeared first on The DO Loop.

February 3, 2017
 

Having addressed the adaptability and power of an analytics environment in my last two posts, I thought I'd close out this mini-series of blogs by  providing the business and technology implications of three attributes that need to define any truly open and unified analytics environment: Cohesion Business: The platform enables […]

3 attributes of an open and unified analytics environment was published on SAS Voices.

February 3, 2017
 

I will begin with a short story.

Like many employers, McDougall Scientific, my employer, requires its employees to review, with their co-workers and managers, what they learned at a conference or course. They are also asked to suggest applications of their learnings so that McDougall might realize value from the expense, both in time and money, of sending them to continuing education events.

Fei Wang, my co-worker, and I attended SAS Global Forum last year in Vegas. During her presentation to co-workers upon our return, Fei not only provided a comprehensive overview of the conference format, sessions, and learning opportunities, but she also chose one presentation to highlight that will fundamentally improve one of our business processes.

Although Fei attended many sessions and learned much, session 8480-2016, with thanks to Steven Black, will save McDougall enough time and money to dwarf the expenditure of sending Fei to SAS Global Forum.

“But John,” you might ask, “why not simply search the proceedings after the conference?” Well, because we would never think to search for CRF annotation automation. Innovation of this sort is more easily found by attending the conference. Discovering valuable nuggets like Steven’s idea is a common occurrence at SAS Global Forum.

The value that employers realize from SAS Global Forum is the reason “content is king,” a cliché first introduced by the magazine publishing industry in the mid-1970s.

Our speakers represent every region of the world!

Though there are a number of great benefits from attending the conference, great content continues to reign supreme at SAS Global Forum. This year’s conference is no different. The 2017 Content Advisory Team has assembled a stellar lineup of well over 600 sessions: invited speakers, contributed papers, hands-on workshops, tutorials and posters. And I am very proud to report that 25 countries are contributing speakers this year, with every region of the world represented: North, Central, and South Africa, Europe, Australia, the Middle East, Asia and the Americas. This sort of global diversity brings new ideas and new ways of looking at and solving problems that grow your knowledge and help move your organization forward.

In addition to all of this great technical content, we have made special effort to organize sessions that help SAS Users better present their work. As Melissa Marshall famously claims, “Science not communicated is science not done.” Therefore, in keeping with the SAS Global Users Group’s mission to champion the needs of SAS users around the globe, here is a sampling of sessions that will help you better communicate.

The list starts with Melissa herself!

Present Your Science: Transforming Technical Talks
Session T108, Melissa Marshall, Principal, Melissa Marshall Consulting LLC

This versatile half-day workshop covers the full gamut: content strategy, slide design, and presentation delivery. With a dynamic combination of lecture, discussion, video analysis, and exercises, this workshop will truly transform how technical professionals present their work and will help foster a culture of improved communications throughout the SAS community.

How the British Broadcasting Corporation Uses Data to Tell Stories in a Visually Compelling Way
Session 0824, Amanda J Farnsworth, Head of Visual Journalism, BBC News

… data is often seen as a dry, detached, unemotional thing that's hard to understand and for many, easy to ignore. At the BBC, employees have been thinking hard about how to use data to tell stories in a visually compelling way that connects with audiences and makes them more curious about the world that we live in. And, there is an ever-increasing amount of data with which to tell those stories. Governments are publishing more big data sets about health, education, crime, and social makeup. Academics are generating huge amounts of data as a consequence of research. Businesses and other organizations conduct their own research and polling. The BBC’s aim is to take that data and make it relevant at a personal level, answering the audiences' number one question: what does this mean for me?

Convince Me: Constructing Persuasive Presentations
Session 0862, Frank Carillo, CEO and Anne Coffey, Senior Director, E.C.G. Inc.

Data outputs do not a persuasive argument make. Effective persuasion requires a combination of logic and emotion supported by facts. Statisticians dedicate their lives to analyzing data such that it is appropriate supporting evidence. While the appropriate evidence is essential to convince your listeners, you first have to be able to gain and maintain their attention and trust. Persuasive presentations fight for hearts and minds, and are not a dry, unbiased recitation of facts or analyses. This session is designed to provide suggestions for how to utilize successful structures and create emotional connections.

Data Visualization Best Practices: Practical Storytelling Using SAS®
Session T117, Greg S Nelson, CEO, Thotwave Technologies LLC.

Data means little without our ability to visually convey it. Whether building a business case to open a new office, acquiring customers, presenting research findings, forecasting or comparing the relative effectiveness of a program, we are crafting a story that is defined by the graphics that we use to tell it. Using practical, real-world examples, students will learn how to critically think about visualizations.

Presentations as Listeners Like Them: How to Tailor for an Audience
Session 0408, Frank Carillo, CEO and Anne Coffey, Senior Director, E.C.G. Inc.

Data doesn't speak for itself. We speak for it, and how we do that influences how people view and interpret that data. One of the most overlooked aspects of presenting data is analyzing the audience. At no point in history have speakers had to face such heterogeneous audiences as they do today: there might be as many as five different generations in the room, cross-functional teams have broad areas of expertise, and international companies integrate different cultures and customs. This session is designed to teach attendees how to analyze not the data, but the listeners. Who is your audience? What is important to them? What is your message …?

tags: papers & presentations, SAS Global Forum

At SAS Global Forum, Content is King was published on SAS Users.

February 3, 2017
 

It was just a few years ago that the idea of an Internet of Things (IoT) seemed far off, something out of a science-fiction movie. After all, why would a vehicle need to talk to the road?  Why would our utility meters need to talk to the central office? The […]

How can the Internet of Things help government agencies? was published on SAS Voices.

February 1, 2017
 

Each day, the SAS Customer Contact Center participates in hundreds of interactions with customers, prospective customers, educators, students and the media. While the team responds to inbound calls, web forms, social media requests and emails, the live-chat sessions that occur on the corporate website make up the majority of these interactions.

The information contained in these chat transcripts can be a useful way to get feedback from customers and prospects. As a result, the contact center is frequently asked by departments across the company what customers are saying about the company and its products, and what types of questions are asked.

The challenge

Chat transcripts are a source for measuring the relative happiness of those engaged with SAS. Using sentiment analysis, this information can help paint a more accurate picture of the health of customer relationships.

The live-chat feature includes an exit survey that provides some data including the visitor’s overall satisfaction with the chat agent and with SAS. While 13 percent of chat visitors complete the exit survey (which is above the industry average), that means thousands of chat sessions only have the transcript as a record of participant sentiment.

Analyzing chat transcripts often required the contact center to pore through the text to identify trends. With other, more pressing priorities, the manual review provided only anecdotal information.

The approach

Performing more formal analytics using text information gets tricky due to the nature of text data. Text, unlike tabular data in databases or spreadsheets, is unstructured. There are no columns that dictate what bits of data go where. And, words can be assembled in nearly infinite combinations.

For the SAS team, however, the information contained within these transcripts was a valuable asset. Using text analytics, the team could start to uncover and understand trends and connections across thousands of chat sessions.

SAS turned to SAS Text Miner to conduct a more thorough analysis of the chat transcripts. The contact center worked with subject-matter experts across SAS to feed this text information into the analytics engine. The team used a variety of dimensions in the analysis:

  • Volume of the chat transcripts across different topics.
  • Web pages where the chat session originated.
  • Location of the customer.
  • Contact center agent who responded.
  • Duration of the chat session.
  • Products or initiatives mentioned within the text.

In addition, North Carolina State University’s Institute for Advanced Analytics began to use the chat data for a text analytics project focused on sentiment analysis. This partnership between the university and SAS helped students learn how to uncover trends in positive and negative sentiment across topics.

The results

After applying SAS text analytics to the chat data, the SAS contact center better understood the volume and type of inquiries and how they were being addressed. By tracking the URLs of the web pages where chats were launched, the analysis could often point to areas of the corporate website that needed updates or improvements.

Information from chat sessions also helped tune SAS’ strategy. After the announcement of Windows 10, the contact center received customer questions about the operating system, including some negative sentiment about a perceived lack of support. Based on this feedback, SAS released a statement to customers assuring them that Windows 10 was an integral part of the product roadmap.

The project with NC State University has also provided an opportunity for SAS and soon-to-be analytics professionals to continue and expand on the analysis of chat transcripts. They continue to look at the sentiment data and how it changes across different categories (products in use, duration of chat) to see if there are any trends to explore further.

Today, sentiment analysis feeds the training process for new chat agents and enables managers to highlight examples where an agent was able to turn a negative chat session into a positive resolution.

SAS Sentiment Analysis and SAS Text Analytics, combined with SAS Customer Intelligence solutions such as SAS Marketing Automation and SAS Real Time Decision Manager, allow marketing organizations like SAS to understand sentiment or emotion within text strings (chat, email, social, even voice to text) and use that information to inform sales, service, support and marketing efforts.

If you’d like to learn more about how to use SAS Sentiment Analysis to explore sentiment in electronic chat text, register for our SAS Sentiment Analysis course. And, the book, Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS, offers insights into SAS Text Miner capabilities and more.

==

Editor’s note: This post is part of a series excerpted from Adele Sweetwood’s book, The Analytical Marketer: How to Transform Your Marketing Organization. Each post is a real-world case study of how to improve your customers’ experience and optimize your marketing campaigns.

tags: Adele Sweetwood, contact center, live chat, SAS Text Miner, sentiment analysis, text analytics, The Analytical Marketer

Using chat transcripts to understand customer sentiment was published on Customer Intelligence Blog.

February 1, 2017
 

In a previous article, I showed how to simulate data for a linear regression model with an arbitrary number of continuous explanatory variables. To keep the discussion simple, I simulated a single sample with N observations and p variables. However, to use Monte Carlo methods to approximate the sampling distribution of statistics, you need to simulate many samples from the same regression model.

This article shows how to simulate many samples efficiently. Efficient simulation is the main emphasis of my book Simulating Data with SAS. For a detailed discussion about simulating data from regression models, see chapters 11 and 12.

The SAS DATA step in my previous post contains four steps. To simulate multiple samples, put a DO loop around the steps that generate the error term and the response variable for each observation in the model. The following program modifies the previous program and creates a single data set that contains NumSamples (=100) samples. Each sample is identified by an ordinal variable named SampleID.

/* Simulate many samples from a  linear regression model */
%let N = 50;            /* N = sample size               */
%let nCont = 10;        /* p = number of continuous variables */
%let NumSamples = 100;  /* number of samples                  */
data SimReg(keep= SampleID i Y x:);
call streaminit(54321);
array x[&nCont];        /* explanatory variables are named x1-x&nCont */
 
/* 1. Specify model coefficients. You can hard-code values such as
array beta[0:&nCont] _temporary_ (-4 2 -1.33 1 -0.8 0.67 -0.57 0.5 -0.44 0.4 -0.36);
      or you can use a formula such as the following */
array beta[0:&nCont] _temporary_;
do j = 0 to &nCont;
   beta[j] = 4 * (-1)**(j+1) / (j+1);       /* formula for beta[j] */
end;
 
do i = 1 to &N;              /* for each observation in the sample */
   do j = 1 to &nCont;
      x[j] = rand("Normal"); /* 2. Simulate explanatory variables  */
   end;
 
   eta = beta[0];                       /* model = intercept term  */
   do j = 1 to &nCont;
      eta = eta + beta[j] * x[j];       /*     + sum(beta[j]*x[j]) */
   end;
 
   /* 5. simulate response for each sample */
   do SampleID = 1 to &NumSamples;      /* <== LOOP OVER SAMPLES   */
      epsilon = rand("Normal", 0, 1.5); /* 3. Specify error distrib*/
      Y = eta + epsilon;                /* 4. Y = model + error    */
      output;
   end;
end;
run;

The efficient way to analyze simulated samples with SAS is to use BY-group processing. With BY-group processing you can analyze all samples with a single procedure call. The following statements sort the data by the SampleID variable and call PROC REG to analyze all samples. The NOPRINT option ensures that the procedure does not spew out thousands of tables and graphs. (For procedures that do not support the NOPRINT option, there are other ways to turn off ODS when analyzing simulated data.) The OUTEST= option saves the parameter estimates for all samples to a SAS data set.

proc sort data=SimReg;
   by SampleID i;
run;
 
proc reg data=SimReg outest=PE NOPRINT;
   by SampleID;
   model y = x:;
quit;
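For a procedure that does not support NOPRINT, one common approach is to suppress the displayed output with ODS statements and capture the table you need with ODS OUTPUT. The following sketch applies that idea to PROC REG purely for illustration. The data set name PE2 is hypothetical, and the final statements assume that ODS Graphics was on before the simulation.

```sas
/* Sketch: suppress displayed output but still capture a table with ODS OUTPUT */
ods graphics off;
ods exclude all;                        /* send no tables to open destinations */
ods noresults;                          /* add no entries to the Results window */

proc reg data=SimReg plots=none;
   by SampleID;
   model y = x:;
   ods output ParameterEstimates=PE2;   /* PE2 is a hypothetical data set name */
quit;

ods exclude none;                       /* restore normal ODS behavior */
ods results;
ods graphics on;                        /* assumes ODS Graphics was on before */
```

Unlike the OUTEST= data set, the ParameterEstimates table is in long form: one row per parameter for each value of SampleID.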

The PE data set contains NumSamples rows. Each row contains the parameter estimates (the intercept and the p regression coefficients) for the analysis of one simulated sample. The distribution of estimates is an approximation to the true (theoretical) sampling distribution of the statistics. The following image visualizes the joint distribution of the estimates of four regression coefficients. You can see that the distribution of the estimates appears to be multivariate normal and centered at the values of the population parameters.
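As a rough sketch of how you might summarize and plot the estimates in the PE data set, the following statements compute descriptive statistics and draw a scatter plot matrix. The choice of the first four coefficients is an assumption; the graph in the original post might show different ones.

```sas
/* Sketch: summarize the Monte Carlo estimates stored by OUTEST= */
proc means data=PE N Mean StdDev P5 P95;
   var Intercept x1-x4;                 /* intercept and first four coefficients */
run;

/* Sketch: joint distribution of four coefficient estimates */
proc sgscatter data=PE;
   matrix x1-x4 / diagonal=(histogram);
run;
```

If the simulation is correct, the sample means should be close to the population values given by the formula for beta: -4 for the intercept and 2, -1.33, 1, and -0.8 for x1 through x4.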

You can download the SAS program that simulates the data, analyzes it, and produces the graph. The program is very efficient. For 10,000 random samples of size N=50 that contain p=10 variables, it takes about one second to run the Monte Carlo simulation and analyses.

tags: Simulation, Statistical Programming

The post Simulate many samples from a linear regression model appeared first on The DO Loop.

February 1, 2017
 

SAS® Federation Server provides a central, virtual environment for administering and securing access to your data. It also allows you to combine data from multiple sources without moving or copying the data. SAS Federation Server Manager, a web-based application, is used to administer SAS Federation Server(s).

Data privacy is a major concern for organizations. SAS Federation Server lets you control access to your data effectively and efficiently, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, and names. In this three-part series, I will explore the topic of controlling data access using SAS Federation Server.

The series covers the following topics:

SAS Metadata Server performs authentication for users and groups in SAS Federation Server, and SAS Federation Server Manager is used to help control access to the data. Note: Permissions applied at a particular data source cannot be bypassed with SAS Federation Server security. If permissions are denied at the source data, for example on a table, then users will always be denied access to that table, no matter what permissions are set in SAS Federation Server.

In this blog post, I build on the example in my previous post and demonstrate how you can use SAS Federation Server Manager to control access to columns and rows in tables and views.

Previously, I gave the Finance Users group access to the SALARY table. Robert is a member of the Finance Users group, so he has access to the SALARY table; however, I want to restrict his access to the IDNUM column on the table. To do this, first I view the SALARY table Authorizations in Federation Server Manager, then I select the arrow to the right of the table name to view its columns.

Next, I select the IDNUM column. I then add the user Robert and set his SELECT permission to Deny for the column.

Note: There are 5 columns on the SALARY table.
Since he was denied access to the IDNUM column, Robert is only able to view 4 out of 5 columns.

Susan is also a member of the Finance Users group, so she has access to the SALARY table; however, I want to restrict her access to only rows where the JOBCODE starts with a “Q.” To do this, first I view the SALARY table Authorizations in Federation Server Manager.

Next, I select the Row Authorizations tab and select New Filter. I use the SQL Clause Builder to build my condition of JOBCODE LIKE 'Q%'.

Next, I select the Users and Groups tab and add Susan to restrict her access to the filter I just created.

Finally, I select OK to save the changes I made to Row Authorizations.

Susan is now only able to view the rows of the SALARY table where the JOBCODE begins with “Q.”
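To make the effect of the two restrictions concrete, the following sketch mimics them with ordinary PROC SQL queries on a small, made-up table. It does not call SAS Federation Server at all. The rows, the table, and the three-column layout are hypothetical, and the real SALARY table has five columns.

```sas
/* Sketch: hypothetical rows, not the SALARY table from this post */
data Salary;
   length IDNUM $4 JOBCODE $3;
   input IDNUM $ JOBCODE $ SALARY;
   datalines;
1001 QA1 52000
1002 FA2 61000
1003 QC3 48000
1004 ME1 55000
;
run;

proc sql;
   /* Row filter: what a user restricted to JOBCODE LIKE 'Q%' would see */
   select * from Salary where JOBCODE like 'Q%';
   /* Column denial: what a user denied SELECT on IDNUM would see */
   select JOBCODE, SALARY from Salary;
quit;
```

The first query returns only the rows for IDNUM 1001 and 1003; the second returns all rows but omits the IDNUM column.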

In this blog entry, I covered the second part of this series on Securing sensitive data using SAS Federation Server at the row and column level:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking


tags: SAS Administrators, SAS Federation Server, SAS Professional Services

Securing sensitive data using SAS Federation Server at the row and column level was published on SAS Users.