
May 30, 2018
 

SAS Enterprise Miner has been a leader in data mining and modeling for over 20 years. The system offers over 80 different nodes that help users analyze, score and model their data. With a wide range of functionalities, there can be a number of different ways to produce the results you want.

At SAS® Global Forum 2018, Principal Systems Engineer Melodie Rush spoke about her experience with SAS® Enterprise Miner™ and compiled a list of hints that she believes will help users of all levels. This article previews her full presentation, Top 10 Tips for SAS Enterprise Miner Based on 20 Years’ Experience. The paper includes images and further details for each of the tips noted below; I’d encourage you to check it out to learn more.

Top Ten Tips for Enterprise Miner

Tip 1: How to find the node you’re looking for

If you struggle to find the node that best fits what you need, there’s a system that can simplify the search.

Nodes are organized into the Sample, Explore, Modify, Model, and Assess tabs. Find which of these best describes what you are trying to do, and scroll across each node, listed alphabetically, to read its description.

Tip 2: Add node from diagram workspace

Double-click any node on the toolbar to see its properties.

Tip 3: Clone a process flow

Highlight the process flow by dragging your mouse across it, copy it (right-click and select Copy, or press CTRL+C), and paste it (right-click and select Paste, or press CTRL+V) where you want to insert the copy.

Tip 4: New features

  • There’s a new tab, HPDM (High-Performance Data Mining), which contains several new nodes that cover data mining and machine learning algorithms.
  • There are two new nodes under Utility that incorporate Open Source and SAS Viya.
  • The Open Source Integration node allows you to use R language code in SAS Enterprise Miner diagrams.
  • A SAS Viya Code node now incorporates code that will be used in SAS Viya and CAS, and algorithms from SAS Visual Data Mining and Machine Learning.
  • To save and share your results, there are now the Register Model and Save Data nodes under Utility.
  • You can now register models to the SAS Metadata Server to score or compare easily.
  • A Save Data node lets you save training, validation, test, score, or transaction data as SAS, JMP, Excel, CSV or tab-delimited files.

Tip 5: The unknown node

The Reporter node, under Utility, allows you to easily document your Enterprise Miner process flow diagrams. A .pdf or .rtf file is created with an image of the process flow.

Tip 6: The node that changes everything

The Metadata node, on the Utility tab, allows you to change metadata information and values in your diagram. You also can capture settings to then apply to data in another diagram.

Tip 7: How to generate a scorecard

A scorecard emphasizes which variables and values from your model are important. Values are reported on a scale of 0 to 1,000, with higher values indicating a greater likelihood that the event you’re measuring occurs. To generate one, have the Reporter node follow a Score node, and then change the Nodes property to Summary under the Reporter node properties.

Tip 8: How to override the 512 level limit

If you are faced with the error message, “Maximum target levels of 512 exceeded,” your input is producing more than 512 distinct target values. To get around this, you need to change EM_TRAIN_MAXLEVELS to another value. To do so, either change the macro value in the node properties or change the macro value in the project start code.
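In project start code, that change is a one-line %LET statement. As a sketch (the new limit of 1024 is an arbitrary example; pick a value above your number of distinct target levels):

```sas
/* Project start code: raise the 512-level target limit */
%let EM_TRAIN_MAXLEVELS = 1024;
```

Because project start code runs every time the project opens, the new limit applies to all diagrams in the project.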

Tip 9: Which variable selection method should I use?

Instead of choosing just one variable selection method, you can combine different ones, such as Decision Trees, Forward, Chi-Square, and others. The results are then combined according to the selection property you choose:

  • None – no changes are made from the original metadata.
  • Any – reject a variable if any of the previous variable selection nodes reject it.
  • All – reject a variable if all of the previous variable selection nodes reject it.
  • Majority – reject a variable if the majority of the previous variable selection nodes reject it.

Tip 10: Interpreting neural networks

Decision trees can be produced to interpret neural networks: change the Prediction variable to be your Target, and set the original Target variable to Rejected.

Conclusion

With so many options to create models that best suit your preferences, these tips will help sharpen your focus and allow you to use SAS Enterprise Miner more efficiently and effectively. This presentation was one in a series of talks on SAS Enterprise Miner presented at SAS® Global Forum 2018.

Additional Resources

SAS Enterprise Miner
SAS Enterprise Learning Tutorials
Getting Started With SAS Enterprise Miner Tutorial Videos

Additional SAS Enterprise Miner talks from Global Forum 2018

A Case Study of Mining Social Media Data for Disaster Relief: Hurricane Irma
Bogdan Gadidov, Linh Le, Analytics and Data Science Institute, Kennesaw State University

A Study of Modelling Approaches for Predicting Dropout in a Business College
Xuan Wang, Helmut Schneider, Louisiana State University

Analysis of Nokia Customer Tweets with SAS® Enterprise Miner™ and SAS® Sentiment Analysis Studio
Vaibhav Vanamala, MS in Business Analytics, Oklahoma State University

Analysis of Unstructured Data: Topic Mining & Predictive Modeling using Text
Ravi Teja Allaparthi

Association Rule Mining of Polypharmacy Drug Utilization Patterns in Health Care Administrative Data Using SAS® Enterprise Miner™
Dingwei Dai, Chris Feudtner, The Children’s Hospital of Philadelphia

Bayesian Networks for Causal Analysis
Fei Wang and John Amrhein, McDougall Scientific Ltd.

Classifying and Predicting Spam Messages Using Text Mining in SAS® Enterprise Miner™
Mounika Kondamudi, Oklahoma State University

Image Classification Using SAS® Enterprise Miner 14.1

Model-Based Fiber Network Expansion Using SAS® Enterprise Miner™ and SAS® Visual Analytics
Nishant Sharma, Charter Communications

Monte Carlo K-Means Clustering SAS Enterprise Miner
Donald K. Wedding, PhD, Director of Data Science, Sprint Corporation

Retail Product Bundling – A new approach
Bruno Nogueira Carlos, Youman Mind Over Data

Using Market Basket Analysis in SAS® Enterprise Miner™ to Make Student Course Enrollment Recommendations
Shawn Hall, Aaron Osei, and Jeremiah McKinley, The University of Oklahoma

Using SAS® Enterprise Miner for Categorization of Customer Comments to Improve Services at USPS
Olayemi Olatunji, United States Postal Service Office of Inspector General

Top 10 tips for SAS Enterprise Miner based on 20 years’ experience was published on SAS Users.


May 22, 2018
 

SAS Viya is the latest extension of the SAS Platform and is interoperable with SAS® 9.4. Designed to bring analytics to the enterprise, it seamlessly scales for data of any size, type, speed and complexity. It was also a star at this year’s SAS Global Forum 2018. In this series of articles, we will review several of the most interesting SAS Viya talks from the event. Our first installment reviews Hadley Christoffels’ talk, A Need For Speed: Loading Data via the Cloud.

You can read all the articles in this series or check out the individual interviews by clicking on the titles below:
Part 1: Technology that gets the most from the Cloud.


Technology that gets the most from the Cloud

Few would argue about the value the effective use of data can bring an organization. Advancements in analytics, particularly in areas like artificial intelligence and machine learning, allow organizations to analyze more complex data and deliver faster, more accurate results.

However, in his SAS Global Forum 2018 paper, A Need For Speed: Loading Data via the Cloud, Hadley Christoffels, CEO of Boemska, reminded the audience that 80% of an analyst’s time is still spent on the data. Getting insight from your data is where the magic happens, but the real value of powerful analytical methods like artificial intelligence and machine learning can only be realized when “you shorten the load cycle the quicker you get to value.”

Data Management is critical and still the most common area of investment in analytical software, making data management a primary responsibility of today’s data scientist. “Before you can get to any value the data has to be collected, has to be transformed, has to be enriched, has to be cleansed and has to be loaded before it can be consumed.”

Benefits of cloud adoption

The cloud can help, to a degree. According to Christoffels, “cloud adoption has become a strategic imperative for enterprises.” The advantages of moving to a cloud architecture are many, but the two greatest are elasticity and scalability.

Elasticity, as defined by Christoffels, allows you to dynamically provision or remove virtual machines (VMs), while scalability refers to increasing or decreasing capacity within existing infrastructure, either vertically, by moving the workload to a bigger or smaller VM, or horizontally, by provisioning additional VMs and distributing the application load between them.

“I can stand up VMs in a matter of seconds, I can add more servers when I need it, I can get a bigger one when I need it and a smaller one when I don’t, but, especially when it comes to horizontal scaling, you need technology that can make the most of it.” Cloud-readiness and multi-threaded processing make SAS® Viya® the perfect tool to take advantage of the benefits of “clouding up.”

SAS® Viya® can address complex analytical challenges and speed up data management processes. “If you have software that can only run on a single instance, then scaling horizontally means nothing to you because you can’t make use of that multi-threaded, parallel environment. SAS Viya is one of those technologies,” Christoffels said.

Challenges you need to consider

According to Christoffels, it’s important, when moving your processing to the cloud, that you understand and address existing performance challenges and whether it will meet your business needs in an agile manner. Inefficiencies on-premise are annoying; inefficiencies in the cloud are annoying and costly, since you pay for that resource.

It’s not the best use of the architecture to take what you have on premise and just shift it. “Finding and improving and eliminating inefficiencies is a massive part in cutting down the time data takes to load.”

Boemska, Christoffels’ company, has tools to help businesses find inefficiencies and understand the impact users have on the environment, including:

  1. Real-time diagnostics that look at CPU usage, memory usage, SAS workload, and more.
  2. Insight and comparison reports that provide a historic view over a given timeframe, essential when trying to optimize and shave off costly time in the cloud.
  3. Utilization reports to better understand how the platform is used.

Optimizing inefficiencies with SAS Viya

But scaling vertically and horizontally on cloud-based infrastructure to speed the loading and data management process solves only part of the problem. Christoffels said SAS Viya’s capabilities complete the picture. SAS Viya offers a number of benefits in a cloud infrastructure, he noted: code amendments that make use of the new techniques available in SAS Viya, such as the multi-threaded DATA step or CAS action sets, can be extremely powerful.

One simple example of the benefits of SAS Viya, Christoffels said, is that with in-memory processing, PROC SORT is a procedure that’s no longer needed; SAS Viya does “grouping on the fly,” meaning you can remove sort routines from existing programs, which by itself can cut down processing time significantly.
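A minimal sketch of what that looks like in practice (the tables and the BY variable here are made up; the pattern, not the names, is the point):

```sas
/* SAS 9 pattern: data must be sorted before BY-group processing */
proc sort data=work.sales;
   by region;
run;

data work.region_totals;
   set work.sales;
   by region;
   if last.region then output;   /* one row per region */
run;

/* SAS Viya pattern: the multithreaded DATA step running in CAS
   groups on the fly, so the PROC SORT step can simply be removed */
data casuser.region_totals;
   set casuser.sales;
   by region;
   if last.region then output;
run;
```

The second step assumes the table has been loaded into a CAS library (casuser here); the sort routine disappears rather than being rewritten.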

For a SAS programmer, the fact that SAS Viya can run multithreaded, that you don’t have to do these sorts, the way it handles grouping on the fly, and the fact that multithreaded capability is built into how you deal with tables are all “significant,” according to Christoffels.

Conclusion

Data preparation and load processes have a direct impact on how applications can begin and subsequently complete. Many organizations are using the Cloud platform to speed up the process, but to take full advantage of the infrastructure you have to apply the right software technology. SAS Viya enables the full realization of Cloud benefits through performance improvements, such as the transposing of data and the transformation of data using the DATA step or CAS Action Sets.

Additional Resources

SAS Global Forum Video: A Need For Speed: Loading Data via the Cloud
SAS Global Forum 2018 Paper: A Need For Speed: Loading Data via the Cloud
SAS Viya
SAS Viya Products


Read all the posts in this series.

Part 1: Technology that gets the most from the Cloud

Technology that gets the most from the Cloud was published on SAS Users.

May 19, 2018
 

Regardless of the environment in which you run SAS (whether it is SAS® Foundation, SAS® Studio, or SAS® Enterprise Guide®), SAS uses a default location on your host system as a working directory. When you do not specify a different directory within your code, the default location is where SAS stores output.

Beginning with SAS® 9.4 TS1M4, you can use a new DATA step function, DLGCDIR, to change the location for your working directory. You can use this function in Microsoft Windows or UNIX/Linux environments.

Make sure that any directory that you specify with the DLGCDIR function is an existing directory that you have Write or Update access to.

Finding Out What Your Current Directory Is

To determine what your current working directory in SAS is, submit the following code:

   data _null_;
      rc=dlgcdir();
      put rc=;
   run;

Changing Your Windows Directory

The following sample code for Windows sets the working directory in SAS as the TEMP folder on your C: drive:

   data _null_; 
      rc=dlgcdir("c:\temp");
      put rc=;
   run;

Changing Your Linux Directory

This sample code (for a Linux environment) changes the working directory in SAS to /u/your/linux/directory:

   data _null_;
      rc=dlgcdir("/u/your/linux/directory");
      put rc=;
   run;

Changing Your Directory: Other Tips

The DLGCDIR function temporarily changes the working directory for the current SAS or client session. However, you can create an autoexec file that contains the DATA step code that uses the DLGCDIR function. The autoexec file then executes the code each time you invoke SAS.
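For example, an autoexec.sas file could contain the same DATA step shown above (the directory c:\myproject is a hypothetical stand-in for your own path):

```sas
/* autoexec.sas: set the SAS working directory at every startup */
data _null_;
   rc=dlgcdir("c:\myproject");
   put rc=;
run;
```

Because SAS runs the autoexec file at invocation, the working directory is set before any of your session code executes.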

In most situations, it is still recommended that you specify the intended target directory for the Output Delivery System (ODS) and in other SAS statements. For example, when you use the ODS HTML statement, you should specify the target directory with the PATH option, as shown here:

   ods html path="c:\temp" (url=none) file="sasoutput.html";

Similarly, with the ODS PDF statement, you should specify the target directory with the FILE option, as shown here:

   ods pdf file="c:\temp\sasoutput.pdf";

I hope you've found this post helpful.

How to change your working directory for SAS® with the DLGCDIR DATA step function was published on SAS Users.

May 8, 2018
 

The European Union’s General Data Protection Regulation (GDPR), taking effect on 25 May 2018, pertains not only to organizations located within the EU; it applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.

If the GDPR acronym does not mean much to you, think of the one that does – HIPAA, FERPA, COPPA, CIPSEA, or any other that is relevant to your jurisdiction – this blog post is equally applicable to all of them.

The GDPR prohibits personal data processing revealing such individual characteristics as race or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, as well as the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health, and data concerning a natural person’s sex life or sexual orientation. It also has special rules for data relating to criminal convictions or offenses and the processing of children’s personal data.

Whenever SAS users produce reports on demographic data, there is always a risk of inadvertently revealing personal data protected by law, especially when reports are generated automatically or interactively via dynamic data queries. Even for aggregate reports there is a high potential for such exposure.

Suppose you produce an aggregate cross-tabulation report on a small demographic group, representing a count distribution by students’ grade and race. It is highly probable that you can get the count of 1 for some cells in the report, which will unequivocally identify persons and thus disclose their education record (grade) by race. Even if the count is not equal to 1, but is equal to some other small number, there is still a risk of possible deducing or disaggregating of Personally Identifiable Information (PII) from surrounding data (other cells, row and column totals) or related reports on that small demographic group.

The following are four selected SAS tools that help you protect personal data in SAS reports by suppressing counts in reports on small demographic groups.

1. Automatic data suppression in SAS reports

This blog post explains the fundamental concepts of data suppression algorithms. It takes you behind the scenes of the iterative process of complementary data suppression and walks you through SAS code implementing a primary and secondary complementary suppression algorithm. The suppression code uses BASE SAS – DATA STEPs, SAS macros, PROC FORMAT, PROC MEANS, and PROC REPORT.
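As a minimal taste of primary suppression only (the dataset, variables, and the threshold of 5 are hypothetical; the linked post’s full algorithm also handles complementary suppression of totals), a format can mask any cell count small enough to identify individuals:

```sas
/* Mask cell counts of 1-4 so individuals cannot be identified */
proc format;
   value suppct
      1-4   = '   *'        /* suppressed small counts  */
      other = [comma8.];    /* all other counts display */
run;

proc report data=work.grade_by_race nowd;
   column grade race,count;
   define grade / group  'Grade';
   define race  / across 'Race';
   define count / analysis sum format=suppct. ' ';
run;
```

On its own this is not sufficient: a suppressed cell can still be deduced from row and column totals, which is exactly why the iterative complementary suppression described above is needed.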

2. Implementing Privacy Protection-Compliant SAS® Aggregate Reports

This SAS Global Forum 2018 paper solidifies and expands on the above blog post. It walks you through the intricate logic of an enhanced complementary suppression process, and demonstrates SAS coding techniques to implement and automatically generate aggregate tabular reports compliant with privacy protection law. The result is a set of SAS macros ready for use in any reporting organization responsible for compliance with privacy protection.

3. In SAS Visual Analytics you can create derived data items that are aggregated measures.  SAS Visual Analytics 8.2 on SAS Viya introduces a new Type for the aggregated measures derived data items called Data Suppression. Here is an excerpt from the documentation on the Data Suppression type:

“Obscures aggregated data if individual values could easily be inferred. Data suppression replaces all values for the measure on which it is based with asterisk characters (*) unless a value represents the aggregation of a specified minimum number of values. You specify the minimum in the Suppress data if count less than parameter. The values are hidden from view, but they are still present in the data query. The calculation of totals and subtotals is not affected.

Some additional values might be suppressed when a single value would be suppressed from a subgroup. In this case, an additional value is suppressed so that the suppressed value cannot be inferred from totals or subtotals.

A common use of suppressed data is to protect the identity of individuals in aggregated data when some crossings are sparse. For example, if your data contains testing scores for a school district by demographics, but one of the demographic categories is represented only by a single student, then data suppression hides the test score for that demographic category.

When you use suppressed data, be sure to follow these best practices:

  • Never use the unsuppressed version of the data item in your report, even in filters and ranks. Consider hiding the unsuppressed version in the Data pane.
  • Avoid using suppressed data in any object that is the source or target of a filter action. Filter actions can sometimes make it possible to infer the values of suppressed data.
  • Avoid assigning hierarchies to objects that contain suppressed data. Expanding or drilling down on a hierarchy can make it possible to infer the values of suppressed data.”

This Data Suppression type functionality is significant as it represents the first such functionality embedded directly into a SAS product.

4. Is it sensitive? Mask it with data suppression

This blog post provides an example of using the above Data Suppression type aggregated measures derived data items in SAS Visual Analytics.

We need your feedback!

We want to hear from you.  Is this blog post useful? How do you comply with GDPR (or other Privacy Law of your jurisdiction) in your organization? What SAS privacy protection features would you like to see in future SAS releases?

SAS tools for GDPR privacy compliant reporting was published on SAS Users.


April 21, 2018
 

Have you ever been working in the macro facility and needed a macro function, but you could not locate one that would achieve your task? With the %SYSFUNC macro function, you can access most SAS® functions. In this blog post, I demonstrate how %SYSFUNC can help in your programming needs when a macro function might not exist. I also illustrate the formatting feature that is built in to %SYSFUNC. %SYSFUNC also has a counterpart called %QSYSFUNC that masks the returned value, in case special characters are returned.
%SYSFUNC enables the execution of SAS functions and user-written functions, such as those created with the FCMP procedure. Within the DATA step, arguments to the functions require quotation marks, but because %SYSFUNC is a macro function, you do not enclose the arguments in quotation marks. The examples here demonstrate this.
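For example, the same function call is written differently in the two contexts. A minimal sketch:

```sas
/* DATA step: character arguments require quotation marks */
data _null_;
   x = upcase('hello');
   put x=;                        /* writes x=HELLO to the log */
run;

/* Macro facility: %SYSFUNC arguments are NOT quoted */
%put %sysfunc(upcase(hello));    /* writes HELLO to the log */
```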

%SYSFUNC has two possible arguments. The first argument is the SAS function, and the second argument (which is optional) is the format to be applied to the value returned from the function. Suppose you had a report and wanted to display today's date in word format within the title:

   title "Today is %sysfunc(today(),worddate20.)";

The title appears like this:

   "Today is               July 4, 2018"

Because the date is right-justified, there are leading blanks before the date. In this case, you need to introduce another function to remove the blank spaces. Luckily %SYSFUNC enables the nesting of functions, but each function that you use must have its own associated %SYSFUNC. You can rewrite the above example by adding the STRIP function to remove any leading or trailing blanks in the value:

   title "Today is %sysfunc(strip(%sysfunc(today(),worddate20.)))";

The title now appears like this:

    "Today is July 4, 2018"

The important thing to notice is the use of two separate functions. Each function is contained within its own %SYSFUNC.

Suppose you had a macro variable that contained blank spaces and you wanted to remove them. There is no macro COMPRESS function that removes all blanks. However, with %SYSFUNC, you have access to one. Here is an example:

   %let list=a    b    c; 
   %put %sysfunc(compress(&list));

The value that is written to the log is as follows:

   abc
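Because %SYSFUNC passes your arguments straight through to the underlying function, the optional arguments of COMPRESS work here too. For instance, the 's' modifier removes all space characters, including tabs (a small sketch; note the consecutive commas that leave the second argument null):

```sas
%let list=a    b    c;
/* Second argument (characters to remove) is left null;
   the third argument 's' removes all space characters */
%put %sysfunc(compress(&list,,s));   /* writes abc to the log */
```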

In this last example, I use %SYSFUNC to work with SAS functions where macro functions do not exist.

The example checks to see whether an external file is empty. It uses the following SAS functions: FILEEXIST, FILENAME, FOPEN, FREAD, FGET, and FCLOSE. There are other ways to accomplish this task, but this example illustrates the use of SAS functions within %SYSFUNC.

   %macro test(outf);
   %let filrf=myfile;
 
   /* The FILEEXIST function returns a 1 if the file exists; else, a 0
   is returned. The macro variable &OUTF resolves to the filename
   that is passed into the macro. This function is used to determine
   whether the file exists. In this case you want to find the file
   that is contained within &OUTF. Notice that there are no quotation
   marks around the argument, as you will see in all cases below. If
   the condition is false, the %ELSE portion is executed, and a
   message is written to the log stating that the file does not
   exist.*/
 
   %if %sysfunc(fileexist(&outf)) %then %do;
 
   /* The FILENAME function returns 0 if the operation was successful; 
   else, a nonzero is returned. This function can assign a fileref
   for the external file that is located in the &OUTF macro 
   variable. */
 
   %let rc=%sysfunc(filename(filrf,&outf));
 
   /* The FOPEN function returns 0 if the file could not be opened; 
   else, a nonzero is returned. This function is used to open the
   external file that is associated with the fileref from &FILRF. */
 
   %let fid=%sysfunc(fopen(&filrf));
 
   /* The %IF macro checks to see whether &FID has a value greater
   than zero, which means that the file opened successfully. If the
   condition is true, we begin to read the data in the file. */
 
   %if &fid > 0 %then %do;
 
   /* The FREAD function returns 0 if the read was successful; else, a
   nonzero is returned. This function is used to read a record from
   the file that is contained within &FID. */
 
   %let rc=%sysfunc(fread(&fid));
 
   /* The FGET function returns a 0 if the operation was successful. A
   returned value of -1 is issued if there are no more records
   available. This function is used to copy data from the file data 
   buffer and place it into the macro variable, specified as the
   second argument in the function. In this case, the macro variable
   is MYSTRING. */   
 
   %let rc=%sysfunc(fget(&fid,mystring));
 
   /* If the read was successful, the log will write out the value
   that is contained within &MYSTRING. If nothing is returned, the
   %ELSE portion is executed. */
 
   %if &rc = 0 %then %put &mystring;
   %else %put file is empty;
 
   /* The FCLOSE function returns a 0 if the operation was successful;
   else, a nonzero value is returned. This function is used to close
   the file that was referenced in the FOPEN function. */
 
   %let rc=%sysfunc(fclose(&fid));
   %end;
 
   /* The FILENAME function is used here to deassign the fileref 
   FILRF. */
 
   %let rc=%sysfunc(filename(filrf));
   %end;
   %else %put file does not exist;
   %mend test;
   %test(c:\testfile.txt)

There are times when the value that is returned from the function used with %SYSFUNC contains special characters. Those characters then need to be masked. This can be done easily by using %SYSFUNC’s counterpart, %QSYSFUNC. Suppose we run the following example:

   %macro test(dte);
   %put &dte;
   %mend test;
 
   %test(%sysfunc(today(), worddate20.))

The above code would generate an error in the log, similar to the following:

   1  %macro test(dte);
   2  %put &dte;
   3  %mend test;
   4
   5  %test(%sysfunc(today(), worddate20.))
   MLOGIC(TEST):  Beginning execution.
   MLOGIC(TEST):  Parameter DTE has value July 20
   ERROR: More positional parameters found than defined.
   MLOGIC(TEST):  Ending execution.

The WORDDATE format returns the value like this: July 20, 2017. In a parameter list, a comma is a delimiter, so this macro call appears to pass two positional parameters. However, the macro definition contains only one positional parameter, so an error is generated. To correct this problem, rewrite the macro invocation in the following way:

   %test(%qsysfunc(today(), worddate20.))

The %QSYSFUNC macro function masks the comma in the returned value so that it is seen as text rather than as a delimiter.

For a list of the functions that are not available with %SYSFUNC, see the SAS Macro Language Reference documentation.

How to expand the number of available SAS functions within the macro language was published on SAS Users.

April 19, 2018
 

In SAS Visual Analytics 7.4 on 9.4M5 and SAS Visual Analytics 8.2 on SAS Viya, the periodic operators have a new additional parameter that controls how filtering on the date data item used in the calculation affects the calculated values.

The new parameter values are _ApplyAllFilters_ (the default), _IgnoreAllTimeFrameFilters_, and _IgnoreInteractiveTimeFrameFilters_.

These parameter values enable you to improve the appearance of reports based on calculations that use periodic operators. You can have periods that produce missing values for periodic calculations removed from the report, but still available for use in the calculations for later periods. These parameter settings also enable you to provide users with a prompt for choosing the data to display in a report, without having any effect on the calculations themselves.

The following will illustrate the points above, using periodic Revenue calculations based on monthly data from the MEGA_CORP table. New aggregated measures representing Previous Month Revenue (RelativePeriod) and Same Month Last Year (ParallelPeriod) will be displayed as measures in a crosstab. The default _ApplyAllFilters_ is in effect for both, as shown below, but there are no current filters on report or objects.

The Change from Previous Month and Change From Same Month Last Year calculations, respectively, are below:

The resulting report is a crosstab with Date by Month and Product Line in the Row roles, and Revenue, along with the four aggregations, in the Column roles. All calculations are accurate, but of course they result in missing values for the first month (Jan2009) and for the first year (2009).

An improvement to the appearance of the report might be to show only Date by Month values beginning with Jan2010, where there are no missing values. Why not apply a filter to the crosstab (shown below), so that the interval shown is Jan2010 to the most recent date?

With the above filter applied to the crosstab, the result is shown below—same problem, different year!

This is where the new parameter on our periodic operators is useful. We would like all months to be used in the calculations, but only the months with non-missing values for both periodic calculations to be shown in the crosstab. So, edit both periodic calculations to change the default _ApplyAllFilters_ to _IgnoreAllTimeFrameFilters_, so that the filters filter the data in the crosstab but not the calculations. When the report is refreshed, only the months with non-missing values are shown:
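For reference, the two edited aggregated-measure expressions might look something like this in the SAS Visual Analytics expression editor. The data item names are assumptions based on the MEGA_CORP example, and you should check the product documentation for the exact operator signatures in your release:

```
RelativePeriod(_Sum_, 'Revenue'n, 'Date'n, _ByMonth_, -1, _IgnoreAllTimeFrameFilters_)
ParallelPeriod(_Sum_, 'Revenue'n, 'Date'n, _ByMonth_, _ByYear_, 0, _IgnoreAllTimeFrameFilters_)
```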

This periodic operator parameter is also useful if you want to enable users to select a specific month, for viewing only a subset of the crosstab results.

For a selection prompt, add a Drop-Down list to select a MONYY value and define a filter action from the Drop-Down list to the Crosstab. To prevent selection of a month value with missing calculation values, you will also want to apply a filter to the Drop-Down list as you did for the crosstab, displaying months Jan2010 and after in the list.

Now the user can select a month, with all calculations relative to that month displayed, shown in the examples below:

Note that, at this point, since you’ve added the action from the drop-down list to the crosstab, you no longer need the filter on the crosstab itself. In addition, if you remove the crosstab filter, then all of your filters will come from prompts or actions, so you could use the _IgnoreInteractiveTimeFrameFilters_ parameter on your periodic calculations instead of the _IgnoreAllTimeFrameFilters_ parameter.

You will also notice that, in release 8.2 of SAS Visual Analytics, the performance of the periodic calculations has been greatly improved, with more of the work done in CAS.

Be sure to check out all of the periodic operators, documented for both SAS Visual Analytics 7.4 and SAS Visual Analytics 8.2.

SAS Visual Analytics filters on periodic calculations: Apply them or ignore them! was published on SAS Users.

April 19, 2018
 

A very common coding technique SAS programmers use is identifying the largest value for a given column by using the DATA step BY statement with the DESCENDING option. In this example, I wanted to find the largest value of the number of runs (nRuns) for each team in the SASHELP.BASEBALL data set. Using a SAS workspace server, one would write:


Figure 1. Single Threaded DATA Step in SAS Workspace Server

Figure 2 shows the results of the code we ran in Figure 1:

Figure 2. Result from SAS Code Displayed in Figure 1

To run this DATA step distributed, we will leverage SAS® Cloud Analytic Services (CAS) in SAS® Viya™. Notice in Figure 3 that there is no need for the PROC SORT step, which is required when running the DATA step single-threaded in a SAS workspace server. This is because the DATA step running in SAS® Cloud Analytic Services groups the data for BY processing automatically. However, the BY statement in CAS does not support the DESCENDING option, so instead we will sort each BY group in ascending order and output the last observation in each group.

Figure 3. Distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™

Figure 4 shows the results when running distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™.

Figure 4. Results of Distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™

Conclusion

Until the BY statement running in SAS® Cloud Analytic Services in SAS® Viya™ supports DESCENDING, use this technique to ensure your DATA step runs distributed.
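The technique can be sketched as follows. The original post's code appears in its figures; the library and table names here are assumptions for illustration:

```sas
/* SAS 9 workspace server (Figures 1-2):
   sort descending, then keep the FIRST row per team */
proc sort data=sashelp.baseball out=baseball;
   by team descending nRuns;
run;

data maxruns;
   set baseball;
   by team descending nRuns;
   if first.team;     /* first row = largest nRuns for the team */
run;

/* CAS (Figures 3-4): no PROC SORT and no DESCENDING; sort ascending
   within each BY group and keep the LAST row instead of the first */
data casuser.maxruns;
   set casuser.baseball;
   by team nRuns;
   if last.team;      /* last ascending row = largest nRuns for the team */
run;
```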

Read more SAS Viya posts.

Read our SAS 9 to SAS Viya whitepaper.

How to Simulate DESCENDING BY Variables in DATA Step Code that Runs Distributed in SAS® Viya™ was published on SAS Users.

April 17, 2018
 

“Customer experiences are defined by peak moments,” said Dan Heath, in his typical charismatic fashion during the keynote on the Executive Track of SAS Global Forum 2018. “Customer experiences” is not an alien term by any means, but “peak moments”? What was that?

The best way to get a detailed understanding is to simply get your copy of The Power of Moments, authored by the Heath brothers. Heath went on to use examples from a bank in Canada whose ATM would strike up a human dialog with a surprised consumer who had walked up just to withdraw a few dollars, and then pop out a free ticket to Disney for a mother of two, or a round-trip ticket for a grandmother to visit her beautiful grandchildren. Peak moments. OK, now I get it.

Heath also challenged everyone in the audience – including yours truly – to submit our own interpretation of peak moments within the next 24 hours to get a free copy of his book. But, as luck would have it, I did not get around to submitting it because the peak moment I experienced was a few hours later during the SAS Grid Dinner. And it was no surprise that this conference was in Denver with a beautiful view of the glorious mountain peaks -- the perfect backdrop for customers to experience their peak moments!


Fast forward to the SAS Grid Dinner. A panel of customers with years of SAS Grid Manager experience had a great conversation with the audience, a discussion eloquently moderated by Cheryl Doninger, VP, Business Intelligence, Research & Development, SAS. It was clear that this was a panel that had done many things right from the get-go and had taken up the challenges of technology, culture, and change management in different ways. It was really nice to see them openly share their experiences and findings for the benefit of the hundreds of audience members listening with rapt attention.

And, then it hit me! We were looking at SAS customers on the panel. Going back to Heath’s assertion, I started wondering what could have been the peak moment for every one of these customers as they continued to evangelize the enterprise-wide adoption of SAS Grid Manager. Here is a synopsis of various peak moments shared by the customers in response to my question. Heath! Are you reading this?

Peak Moment One
SAS Grid Manager users are happy with the uptime and availability of Grid Manager. One customer called it a Layer of Happiness.

Peak Moment Two
Users who had initially issued a stern warning – “You ain’t taking my PC SAS away” – were happy to see 25 percent to 50 percent performance and throughput improvements with SAS Grid Manager.

Peak Moment Three
The loudest and most aggressive naysayer who was opposed to the migration to SAS Grid Manager became an advocate after experiencing tangible proof points and measurable outcomes.

Peak Moment Four
200 users were seamlessly migrated without a single glitch. Worked like a charm! Nodes were decommissioned unbeknownst to the users.

Peak Moment Five
Taking a moment to reflect on the overall experience – what could have been and how smooth the overall experience was.

Peak Moment Six
Models had to be certified when moving from the previous platform to the SAS Grid, but no code changes were needed to move them.

Peak Moment Seven
SAS Grid Manager being fully utilized at 80 percent to 90 percent capacity, just like the mainframes. Maximizing utilization is always a strong indicator of systemic adoption across the enterprise.

There you have it.  These were peak moments that the customers shared in response to my question.

But, here is what I was pleasantly surprised by! It took them seconds to come out with their respective peak moments. They did not have to reflect. They did not have to think hard. It came to them naturally. And that is what made it a peak moment for me! Just seeing these real-life customers share openly what worked for them and their users and why. These seven peak moments took me to seventh heaven!

So, Dan Heath, I could not meet your deadline of submitting my peak moment because I was waiting to experience one myself – albeit a tad bit late! Guess what could be another peak moment for me? You reading this blog and sending me a copy of your book!

What are your peak moments as a customer?  What are peak moments your customers have experienced?  Please share them here.  And we will wait for the Heath brothers to synthesize that into another wonderful book for the rest of us to read!

Peak moments define SAS Global Forum 2018 was published on SAS Users.