
I will begin with a short story.

SAS Global Forum, Content is King

Like many employers, McDougall Scientific, my employer, requires its employees to review, with their co-workers and managers, what they learned at a conference or course. They are also asked to suggest applications of their learnings so that McDougall might realize value from the expense, both in time and money, of sending them to continuing education events.

Fei Wang, my co-worker, and I attended SAS Global Forum last year in Vegas. During her presentation to co-workers upon our return, Fei not only provided a comprehensive overview of the conference format, sessions, and learning opportunities, but she also chose one presentation to highlight that will fundamentally improve one of our business processes.

Although Fei attended many sessions and learned much, session 8480-2016, with thanks to Steven Black, will save McDougall enough time and money to dwarf the expenditure of sending Fei to SAS Global Forum.

“But John,” you might ask, “why not simply search the proceedings after the conference?” Well, because we would never think to search for CRF annotation automation. Innovation of this sort is more easily found by attending the conference. Discovering valuable nuggets like Steven’s idea is a common occurrence at SAS Global Forum.

The value that employers realize from SAS Global Forum is the reason “content is king,” a cliché first introduced by the magazine publishing industry in the mid-1970s.

Our speakers represent every region of the world!

Though there are a number of really great benefits from attending the conference, great content continues to reign supreme at SAS Global Forum. This year's conference is no different. The 2017 Content Advisory Team has assembled a stellar lineup of well over 600 sessions: invited speakers, contributed papers, hands-on workshops, tutorials and posters. And, I am very proud to report that 25 countries are contributing speakers this year, with every region of the world represented: Africa, Europe, Australia, the Middle East, Asia, and North, Central, and South America. This sort of global diversity brings new ideas and new ways of looking at and solving problems that really grow your knowledge and help move your organization forward.

In addition to all of this great technical content, we have made a special effort to organize sessions that help SAS users better present their work. As Melissa Marshall famously claims, “Science not communicated is science not done.” Therefore, in keeping with the SAS Global Users Group’s mission to champion the needs of SAS users around the globe, here is a sampling of sessions that will help you better communicate.

The list starts with Melissa herself!

Present Your Science: Transforming Technical Talks
Session T108, Melissa Marshall, Principal, Melissa Marshall Consulting LLC

This versatile half-day workshop covers the full gamut: content strategy, slide design, and presentation delivery. With a dynamic combination of lecture, discussion, video analysis, and exercises, this workshop will truly transform how technical professionals present their work and will help foster a culture of improved communications throughout the SAS community.
Read More

How the British Broadcasting Corporation Uses Data to Tell Stories in a Visually Compelling Way
Session 0824, Amanda J Farnsworth, Head of Visual Journalism, BBC News

… data is often seen as a dry, detached, unemotional thing that's hard to understand and for many, easy to ignore. At the BBC, employees have been thinking hard about how to use data to tell stories in a visually compelling way that connects with audiences and makes them more curious about the world that we live in. And, there is an ever-increasing amount of data with which to tell those stories. Governments are publishing more big data sets about health, education, crime, and social makeup. Academics are generating huge amounts of data as a consequence of research. Businesses and other organizations conduct their own research and polling. The BBC’s aim is to take that data and make it relevant at a personal level, answering the audiences' number one question: what does this mean for me?
Read More

Convince Me: Constructing Persuasive Presentations
Session 0862, Frank Carillo, CEO and Anne Coffey, Senior Director, E.C.G. Inc.

Data outputs do not a persuasive argument make. Effective persuasion requires a combination of logic and emotion supported by facts. Statisticians dedicate their lives to analyzing data such that it is appropriate supporting evidence. While the appropriate evidence is essential to convince your listeners, you first have to be able to gain and maintain their attention and trust. Persuasive presentations fight for hearts and minds, and are not a dry, unbiased recitation of facts or analyses. This session is designed to provide suggestions for how to utilize successful structures and create emotional connections.
Read More

Data Visualization Best Practices: Practical Storytelling Using SAS®
Session T117, Greg S Nelson, CEO, Thotwave Technologies LLC.

Data means little without our ability to visually convey it. Whether building a business case to open a new office, acquiring customers, presenting research findings, forecasting or comparing the relative effectiveness of a program, we are crafting a story that is defined by the graphics that we use to tell it. Using practical, real-world examples, students will learn how to critically think about visualizations.
Read More

Presentations as Listeners Like Them: How to Tailor for an Audience
Session 0408, Frank Carillo, CEO and Anne Coffey, Senior Director, E.C.G. Inc.

Data doesn't speak for itself. We speak for it, and how we do that influences how people view and interpret that data. One of the most overlooked aspects of presenting data is analyzing the audience. At no point in history have speakers had to face such heterogeneous audiences as they do today: there might be as many as five different generations in the room, cross-functional teams have broad areas of expertise, and international companies integrate different cultures and customs. This session is designed to teach attendees how to analyze not the data, but the listeners. Who is your audience? What is important to them? What is your message …?
Read More

tags: papers & presentations, SAS Global Forum

At SAS Global Forum, Content is King was published on SAS Users.


It was just a few years ago that the idea of an Internet of Things (IoT) seemed far off, something out of a science-fiction movie. After all, why would a vehicle need to talk to the road?  Why would our utility meters need to talk to the central office? The […]

How can the Internet of Things help government agencies? was published on SAS Voices.


Each day, the SAS Customer Contact Center participates in hundreds of interactions with customers, prospective customers, educators, students and the media. While the team responds to inbound calls, web forms, social media requests and emails, the live-chat sessions that occur on the corporate website make up the majority of these interactions.

The information contained in these chat transcripts can be a useful way to get feedback from customers and prospects. As a result, the contact center is frequently asked by departments across the company what customers are saying about the company and its products – and what types of questions are asked.

The challenge

Chat transcripts are a source for measuring the relative happiness of those engaged with SAS. Using sentiment analysis, this information can help paint a more accurate picture of the health of customer relationships.

The live-chat feature includes an exit survey that provides some data, including the visitor’s overall satisfaction with the chat agent and with SAS. Although 13 percent of chat visitors complete the exit survey (above the industry average), thousands of chat sessions have only the transcript as a record of participant sentiment.

Analyzing chat transcripts previously required the contact center to pore over the text manually to identify trends. With other, more pressing priorities, this manual review yielded only anecdotal information.

The approach

Performing more formal analytics using text information gets tricky due to the nature of text data. Text, unlike tabular data in databases or spreadsheets, is unstructured. There are no columns that dictate what bits of data go where. And, words can be assembled in nearly infinite combinations.

For the SAS team, however, the information contained within these transcripts was a valuable asset. Using text analytics, the team could start to uncover and understand trends and connections across thousands of chat sessions.

SAS turned to SAS Text Miner to conduct a more thorough analysis of the chat transcripts. The contact center worked with subject-matter experts across SAS to feed this text information into the analytics engine. The team used a variety of dimensions in the analysis:

  • Volume of the chat transcripts across different topics.
  • Web pages where the chat session originated.
  • Location of the customer.
  • Contact center agent who responded.
  • Duration of the chat session.
  • Products or initiatives mentioned within the text.

In addition, North Carolina State University’s Institute for Advanced Analytics began to use the chat data for a text analytics project focused on sentiment analysis. This partnership between the university and SAS helped students learn how to uncover trends in positive and negative sentiment across topics.

The results

After applying SAS text analytics to the chat data, the SAS contact center better understood the volume and type of inquiries and how they were being addressed. Often, by tracking the URLs of the web pages that were the launch point for a chat, the analysis could point to areas of the corporate website that needed updates or improvements.

Information from chat sessions also helped tune SAS’ strategy. After the announcement of Windows 10, the contact center received customer questions about the operating system, including some negative sentiment about a perceived lack of support. Based on this feedback, SAS released a statement to customers assuring them that Windows 10 was an integral part of the product roadmap.

The project with NC State University has also provided an opportunity for SAS and soon-to-be analytics professionals to continue and expand on the analysis of chat transcripts. They continue to look at the sentiment data and how it changes across different categories (products in use, duration of chat) to see if there are any trends to explore further.

Today, sentiment analysis feeds the training process for new chat agents and enables managers to highlight examples where an agent was able to turn a negative chat session into a positive resolution.

SAS Sentiment Analysis and SAS Text Analytics, combined with SAS Customer Intelligence solutions such as SAS Marketing Automation and SAS Real Time Decision Manager, allow marketing organizations like SAS to understand sentiment or emotion within text strings (chat, email, social, even voice to text) and use that information to inform sales, service, support and marketing efforts.

If you’d like to learn more about how to use SAS Sentiment Analysis to explore sentiment in electronic chat text, register for our SAS Sentiment Analysis course. And, the book, Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS, offers insights into SAS Text Miner capabilities and more.

==

Editor’s note: This post is part of a series excerpted from Adele Sweetwood’s book, The Analytical Marketer: How to Transform Your Marketing Organization. Each post is a real-world case study of how to improve your customers’ experience and optimize your marketing campaigns.

tags: Adele Sweetwood, contact center, live chat, SAS Text Miner, sentiment analysis, text analytics, The Analytical Marketer

Using chat transcripts to understand customer sentiment was published on Customer Intelligence Blog.


In a previous article, I showed how to simulate data for a linear regression model with an arbitrary number of continuous explanatory variables. To keep the discussion simple, I simulated a single sample with N observations and p variables. However, to use Monte Carlo methods to approximate the sampling distribution of statistics, you need to simulate many samples from the same regression model.

This article shows how to simulate many samples efficiently. Efficient simulation is the main emphasis of my book Simulating Data with SAS. For a detailed discussion about simulating data from regression models, see chapters 11 and 12.

The SAS DATA step in my previous post contains four steps. To simulate multiple samples, put a DO loop around the steps that generate the error term and the response variable for each observation in the model. The following program modifies the previous program and creates a single data set that contains NumSamples (=100) samples. Each sample is identified by an ordinal variable named SampleID.

/* Simulate many samples from a  linear regression model */
%let N = 50;            /* N = sample size               */
%let nCont = 10;        /* p = number of continuous variables */
%let NumSamples = 100;  /* number of samples                  */
data SimReg(keep= SampleID i Y x:);
call streaminit(54321);
array x[&nCont];        /* explanatory variables are named x1-x&nCont */
 
/* 1. Specify model coefficients. You can hard-code values such as
array beta[0:&nCont] _temporary_ (-4 2 -1.33 1 -0.8 0.67 -0.57 0.5 -0.44 0.4 -0.36);
      or you can use a formula such as the following */
array beta[0:&nCont] _temporary_;
do j = 0 to &nCont;
   beta[j] = 4 * (-1)**(j+1) / (j+1);       /* formula for beta[j] */
end;
 
do i = 1 to &N;              /* for each observation in the sample */
   do j = 1 to &nCont;
      x[j] = rand("Normal"); /* 2. Simulate explanatory variables  */
   end;
 
   eta = beta[0];                       /* model = intercept term  */
   do j = 1 to &nCont;
      eta = eta + beta[j] * x[j];       /*     + sum(beta[j]*x[j]) */
   end;
 
   /* 5. simulate response for each sample */
   do SampleID = 1 to &NumSamples;      /* <== LOOP OVER SAMPLES   */
      epsilon = rand("Normal", 0, 1.5); /* 3. Specify error distrib*/
      Y = eta + epsilon;                /* 4. Y = model + error    */
      output;
   end;
end;
run;

The efficient way to analyze simulated samples with SAS is to use BY-group processing. With BY-group processing you can analyze all samples with a single procedure call. The following statements sort the data by the SampleID variable and call PROC REG to analyze all samples. The NOPRINT option ensures that the procedure does not spew out thousands of tables and graphs. (For procedures that do not support the NOPRINT option, there are other ways to turn off ODS when analyzing simulated data.) The OUTEST= option saves the parameter estimates for all samples to a SAS data set.

proc sort data=SimReg;
   by SampleID i;
run;
 
proc reg data=SimReg outest=PE NOPRINT;
   by SampleID;
   model y = x:;
quit;

The PE data set contains NumSamples rows. Each row contains the parameter estimates (the intercept and the p coefficients) for the analysis of one simulated sample. The distribution of estimates is an approximation to the true (theoretical) sampling distribution of the statistics. The following image visualizes the joint distribution of the estimates of four regression coefficients. You can see that the distribution of the estimates appears to be multivariate normal and centered at the values of the population parameters.
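One simple way to examine the approximate sampling distribution is to compute summary statistics of the estimates across the samples. The following sketch assumes the PE data set created by the OUTEST= option above; the variable names (Intercept and x1-x10) follow the naming conventions of the OUTEST= data set.

/* Sketch: summarize the Monte Carlo distribution of the estimates.
   Assumes the PE data set from the previous PROC REG call. */
proc means data=PE N Mean Std Min Max;
   var Intercept x1-x&nCont;
run;

The means of the estimates should be close to the population parameters, and the standard deviations approximate the standard errors of the estimates.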

You can download the SAS program that simulates the data, analyzes it, and produces the graph. The program is very efficient. For 10,000 random samples of size N=50 that contain p=10 variables, it takes about one second to run the Monte Carlo simulation and analyses.

tags: Simulation, Statistical Programming

The post Simulate many samples from a linear regression model appeared first on The DO Loop.


SAS® Federation Server provides a central, virtual environment for administering and securing access to your data. It also allows you to combine data from multiple sources without moving or copying the data. SAS Federation Server Manager, a web-based application, is used to administer SAS Federation Server(s).

Data privacy is a major concern for organizations, and one of the features of SAS Federation Server is that it allows you to effectively and efficiently control access to your data, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, names, etc. In this three-part series, I will explore the topic of controlling data access using SAS Federation Server.

The series covers the following topics:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server, and SAS Federation Server Manager is used to help control access to the data. Note: Permissions applied for a particular data source cannot be bypassed with SAS Federation Server security. If permissions are denied at the source data, for example on a table, then users will always be denied access to that table, no matter what permissions are set in SAS Federation Server.

In this blog post, I build on the example in my previous post and demonstrate how you can use SAS Federation Server Manager to control access to columns and rows in tables and views.

Previously, I gave the Finance Users group access to the SALARY table. Robert is a member of the Finance Users group, so he has access to the SALARY table; however, I want to restrict his access to the IDNUM column on the table. To do this, first I view the SALARY table Authorizations in Federation Server Manager, then I select the arrow to the right of the table name to view its columns.

Next, I select the IDNUM column. I then add the user Robert and set his SELECT permission to Deny for the column.

Note: There are 5 columns on the SALARY table.
Since he was denied access to the IDNUM column, Robert is only able to view 4 out of 5 columns.

Susan is also a member of the Finance Users group, so she has access to the SALARY table; however, I want to restrict her access to only rows where the JOBCODE starts with a “Q.” To do this, first I view the SALARY table Authorizations in Federation Server Manager.

Next, I select the Row Authorizations tab and select New Filter. I use the SQL Clause Builder to build my condition of JOBCODE LIKE 'Q%'.

Next, I select the Users and Groups tab and add Susan to restrict her access to the filter I just created.

Finally, I select OK to save the changes I made to Row Authorizations.

Susan is now only able to view the rows of the SALARY table where the JOBCODE begins with “Q.”
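Conceptually, the row authorization behaves as if the filter condition were appended to every query Susan runs against the table. The following PROC SQL sketch is an illustration only, not Federation Server syntax; it shows the effective result using the table and column names from this example.

/* Illustration only: the effective result of the row authorization.
   Susan's queries against SALARY return only rows that satisfy the filter. */
proc sql;
   select *
   from SALARY
   where JOBCODE like 'Q%';   /* the condition built in the SQL Clause Builder */
quit;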

In this blog entry, I covered the second part of this series on Securing sensitive data using SAS Federation Server at the row and column level:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking


tags: SAS Administrators, SAS Federation Server, SAS Professional Services

Securing sensitive data using SAS Federation Server at the row and column level was published on SAS Users.


You have all seen, or perhaps even created, some really bad graphics: Cluttered, confusing, too small, incomprehensible. Or worse, the author may have committed one of the three unforgivable sins of data visualization by deceptively distorting a map, truncating the axis so as to misrepresent the data, or used double […]

How to make your pie chart worse was published on SAS Voices.


Recently, SAS shipped the fourth maintenance release of SAS 9.4. Building on this foundation, SAS Studio reached a new milestone: its 3.6 release. All editions have been upgraded, including Personal, Basic and Enterprise. In this blog post, I want to highlight the new features that have been introduced. In subsequent posts I’ll discuss some of these features in more detail.

SAS Studio 3.6 includes many new features and enhancements, including:

1  -  new preferences to personalize even more of the SAS Studio user experience. In detail, it is now possible to:

  • control whether items in the navigation pane, such as libraries, files and folders, are automatically refreshed after running a program, task or query.

  • determine whether, at startup, SAS Studio attempts to restore the tabs that were open when the prior session was last closed.

2  -  enhancements to the background submit feature (previously known as batch submit), with more control over the output and log files. SAS Studio 3.6 also enforces a new behavior: if the background SAS program is a file on the server and not an FTP reference, then the current working directory is automatically set to the directory where the code resides. This enables the use of relative paths in code to reference artifacts such as additional SAS code to include with %include statements (e.g., %include "./macros.sas"), references to data files (e.g., libname data ".";), or images to be included in ODS output.

3  -  the ability to generate HTML graphs in the SVG format instead of the PNG format.

4  -  many new analytical tasks for power and sample size analysis, cluster analysis and network optimization.
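To illustrate the new working-directory behavior for background submits, suppose a program is saved on the server as /projects/reports/main.sas (a hypothetical path) and contains the following statements. When it runs in the background, the relative references resolve against /projects/reports:

/* Sketch: relative paths in a background-submitted program.
   The directory /projects/reports is a hypothetical example. */
%include "./macros.sas";   /* resolves to /projects/reports/macros.sas */
libname data ".";          /* assigns /projects/reports as the DATA library */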

Impressive new features to be sure, but that’s not all. Here’s a bonus feature that I personally find really interesting.

  • The navigation pane includes new categories, both in the code snippets section and in the task section, to streamline the integration between SAS 9.4 and SAS Viya. A new category of Viya Cloud Analytic Services code snippets helps you connect to SAS Viya and work with CAS tables. New Viya Machine Learning tasks enable you to run SAS code in a SAS Viya environment. You can do all this while working from your 9.4 environment.

tags: SAS Professional Services, sas studio

SAS Studio 3.6 new features was published on SAS Users.


If you are a SAS programmer and use the GROUP= option in PROC SGPLOT, you might have encountered a thorny issue: if you use a WHERE clause to omit certain observations, then the marker colors for groups might change from one plot to another. This happens because the marker colors depend on the data by default. If you change the number of groups (or the order of groups), the marker colors also change.

A simple example demonstrates the problem. The following scatter plots are colored by the ORIGIN variable in the SasHelp.Cars data. The ORIGIN variable has three levels: Asia, Europe, and USA. On the left, all values of the ORIGIN variable are present in the graph. On the right, the Asian vehicles are excluded by using WHERE ORIGIN^="Asia". Notice that the colors of the markers on the right are not consistent with the values on the left.

Warren Kuhfeld wrote an excellent introduction to legend order and group attributes, and he describes several other ways that group colors can change from one plot to another. To solve this problem, Kuhfeld and other experts recommend that you create a discrete attribute map. A discrete attribute map is a SAS data set that specifies the colors to use for each group value. If you are not familiar with discrete attribute maps, I provide several references at the end of this article.




Automatic creation of a discrete attribute map

The discrete attribute map is powerful, flexible, and enables the programmer to completely determine the legend order and color for all categories. However, I rarely use discrete attribute maps in my work because the process requires the manual creation of a data set. The data set has to contain all categories (spelled and capitalized correctly) and you have to remember (or look up) the structure of the data set. Furthermore, many examples use hard-coded color values such as CXFFAAAA or "LightBlue," whereas I prefer to use the GraphDatan style elements in the current ODS style.

However, I recently realized that PROC FREQ and the SAS DATA step can lessen the burden of creating a discrete attribute map. The documentation for the discrete attribute map mentions that you can define a column named MarkerStyleElement (or MarkerStyle), which specifies the names of styles elements such as GraphData1, GraphData2, and so on. Therefore, you can use PROC FREQ to write the category levels to a data set, and use a simple DATA step to add the MarkerStyleElement variable. For example, you can create a discrete attribute map for the ORIGIN variable, as follows:

/* semi-automatic way to create a DATTRMAP= data set */
%let VarName = Origin;           /* specify name of grouping variable */
proc freq data=sashelp.cars ORDER=FORMATTED;   /* or ORDER=DATA|FREQ  */
   tables &VarName / out=Attrs(rename=(&VarName=Value));
run;
 
data DAttrs;
ID = "&VarName";                 /* or "ID_&VarName" */
set Attrs(keep=Value);
length MarkerStyleElement $11.;
MarkerStyleElement = cats("GraphData", 1+mod(_N_-1, 12)); /* GraphData1, GraphData2, etc */
run;
 
proc print; run;
Structure of discrete attribute maps for DATTRMAP= option in PROC SGPLOT

Voila! The result is a valid discrete attribute data set for the ORIGIN variable. The DAttrs data set contains all the information you need to ensure that the first category is always displayed by using the GraphData1 element, the second category is displayed by using GraphData2, and so on. The program does not require that you manually type the categories or even know how many categories there are. Obviously, you could write a macro that makes it easy to generate these statements.

This data set uses the alphabetical order of the formatted values to determine the group order. However, you can use the ORDER=DATA option in PROC FREQ to order the categories by their order of appearance in the data set, or the ORDER=FREQ option to order by the most frequent categories. Because most SAS-supplied styles define 12 style elements, the MOD function is used to handle categorical variables that have more than 12 levels.
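As a sketch of such a macro (the name MakeAttrMap and its parameters are my own invention, not a SAS-supplied macro), you could wrap the PROC FREQ and DATA step from the previous program as follows:

/* Sketch: generate a discrete attribute map for one grouping variable.
   The macro name and parameters are hypothetical. */
%macro MakeAttrMap(data=, var=, out=DAttrs, order=FORMATTED);
   proc freq data=&data ORDER=&order;
      tables &var / out=_Attrs(rename=(&var=Value));
   run;
   data &out;
   ID = "&var";
   set _Attrs(keep=Value);
   length MarkerStyleElement $11.;
   MarkerStyleElement = cats("GraphData", 1 + mod(_N_-1, 12));
   run;
%mend MakeAttrMap;

%MakeAttrMap(data=sashelp.cars, var=Origin);  /* creates WORK.DAttrs */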

Use the discrete attribute map

To use the discrete attribute map, you need to specify the DATTRMAP= option on the PROC SGPLOT statement. You also need to specify the ATTRID= option on every SGPLOT statement that will use the map. Notice that I set the value of the ID variable to be the name of the GROUP= variable. (If that is confusing, you could choose a different value, as noted in the comments of the program.) The following statements are similar to the statements that create the right-hand graph at the top of this article, except this call to PROC SGPLOT uses the DAttrs discrete attribute map:

proc sgplot data=sashelp.cars DATTRMAP=DAttrs;
where origin^='Asia' and type^="Hybrid";
   scatter x=weight y=mpg_city / group=Origin ATTRID=Origin 
                                 markerattrs=(symbol=CircleFilled);
   keylegend / location=inside position=TopRight across=1;
run;
Markers colored by attributes specified in a discrete attribute data set, using PROC SGPLOT and the DATTRMAP= option

Notice that the colors in this scatter plot are the same as for the left-hand graph at the top of this article. The group colors are now consistent, even though the number of groups is different.

Generalizing the automatic creation of a discrete attribute map

The previous section showed how to create a discrete attribute map for one variable. You can use a similar approach to automatically create a discrete attribute map that contains several variables. The main steps are as follows:

  1. Use ODS OUTPUT to save the OneWayFreqs tables from PROC FREQ to a SAS data set.
  2. Use the SUBSTR function to extract the variable name into the ID variable.
  3. Use the COALESCEC function to form a Value column that contains the values of the categorical variables.
  4. Use BY-group processing and the UNSORTED option to assign the style elements GraphDatan.
ods select none;
proc freq data=sashelp.cars;
   tables Type Origin;        /* specify VARS here */ 
   ods output OneWayFreqs=Freqs;
run;
ods select all;
 
data Freqs2;
set Freqs;
length ID $32.;
ID = substr(Table, 6);        /* original values are "Table VarName" */
Value = COALESCEC(F_Type, F_Origin);  /* also specify F_VARS here */
keep ID Value;
run;
 
data DAttrs(drop=count);
set Freqs2;
length MarkerStyleElement $11.;
by ID notsorted;
if first.ID then count = 0;
count + 1;
MarkerStyleElement = cats("GraphData", 1 + mod(count-1, 12));
run;

The preceding program is not completely general, but it shows the main ideas. You can adapt the program to your own data. If you are facile with the SAS macro language, you can even write a macro that generates appropriate code for an arbitrary number of variables. Leave a comment if this technique proves useful in your work or if you have ideas for improving the technique.

References

tags: Statistical Graphics, Statistical Programming

The post Automate the creation of a discrete attribute map appeared first on The DO Loop.