Data Management

April 11, 2019
 

What's the impact of using data governance and analytics for the business side of education? It's an interesting question, and during a video interview, Dale Pietrzak, Ed.D., Director of Institutional Effectiveness and Accreditation (IEA) at the University of Idaho shared details on the results they're realizing from using SAS for data [...]

The impact of data governance and analytics: An interview with the U. of Idaho was published on SAS Voices by Georgia Mariani

March 14, 2019
 

I am obsessed with jigsaw puzzles. Specifically, 1000-piece mystery puzzles, entertaining not just for their pictorial humor, but also for the challenge. Unlike traditional puzzles, you don't know what you are putting together because the completed puzzle isn't pictured on the box. Mystery puzzles are constructed so that you must [...]

Puzzle obsession ... piecing together a 360-degree view of a patient. was published on SAS Voices by Heather Hallett

December 14, 2018
 

By now you’ve seen the headlines and the hype proclaiming data as the new oil. The well-meaning intent of these proclamations is to cast data in the role of primary economic driver for the 21st century, just as oil was for the 20th century. As analogies go, it’s not too [...]

No, data is not the new oil was published on SAS Voices by Leo Sadovy

November 13, 2018
 

In my previous blog post I demonstrated how to create your own CAS actions and action sets.  In this post, we will explore how to create your own CAS functions using the CAS Language (CASL).  A function is a component of the CASL programming language that can accept arguments, perform a computation or other operation, and return a value.  The value that is returned can be used in an assignment statement or elsewhere in expressions.

About SAS functions

SAS provides two types of supplied functions: built-in functions and common functions.  Built-in functions contain functionality that is unique to CASL.  These allow you to perform operations on your result tables, arrays, and dictionaries, and provide run-time support for your CASL programs.  Built-in functions cannot be replaced with user-defined functions.

Conversely, common functions provide functionality that is common to other SAS functions.  When used in a CASL program, these functions take a CASL value and return a CASL value.  Unlike built-in functions, common functions can be replaced with user-defined functions.

Since the capabilities of built-in functions are unique to CASL, let’s look at these in-depth and demonstrate with an example.  Save the following FedSQL code in an external file called hmeqsql.sas.  This code will be read into CAS and stored as a variable.
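The FedSQL code itself is not reproduced in this excerpt. A minimal sketch of what hmeqsql.sas might contain is shown below; it assumes the HMEQ sample table is available in a caslib named HMEQ and that the query writes its result to the table referenced later as hmeq.out. The column names come from the standard HMEQ sample data.

create table hmeq.out {options replace=true} as
select job,
       count(*)     as n_loans,
       avg(loan)    as avg_loan,
       avg(debtinc) as avg_debtinc
from hmeq.hmeq
group by job;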

The execDirect action executes FedSQL code in CAS.  The READPATH built-in function reads the FedSQL code saved in hmeqsql.sas and stores it in the CASL variable sqlcode which is used as input to the query parameter.

The fetch action displays the first 20 rows from the output table hmeq.out.
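A sketch of those two steps inside PROC CAS might look like the following. The file path is an assumption, and hmeq.out is read here as table OUT in a caslib named HMEQ.

proc cas;
   /* READPATH reads the FedSQL code from disk into the CASL variable sqlcode */
   sqlcode = readpath("/local/data/hmeqsql.sas");

   /* execDirect runs the FedSQL code in CAS */
   fedSql.execDirect / query=sqlcode;

   /* fetch displays the first 20 rows of the output table */
   table.fetch /
      table={name="out", caslib="hmeq"}
      to=20;
quit;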

If you don’t feel like looking through the documentation for a built-in or common function, a list of each can be generated programmatically.  Run the following code to see a list of built-in functions.

Partial list of CASL built-in functions

Run the following code to see a list of common functions.

Partial list of common functions

User-defined CASL functions

In addition to the built-in and common functions supplied by SAS, you can also create your own functions using the FUNCTION statement.  User-defined functions can be called in CASL expressions and provide a great deal of flexibility.  The following example creates four different functions for temperature conversion.
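The original functions are not shown in this excerpt; the sketch below creates four plausible conversion functions (the function names are assumptions, not necessarily those used in the original post).

proc cas;
   /* four user-defined temperature-conversion functions */
   function c2f(c);
      return 1.8 * c + 32;      /* Celsius to Fahrenheit */
   end;
   function f2c(f);
      return (f - 32) / 1.8;    /* Fahrenheit to Celsius */
   end;
   function c2k(c);
      return c + 273.15;        /* Celsius to Kelvin     */
   end;
   function k2c(k);
      return k - 273.15;        /* Kelvin to Celsius     */
   end;

   /* the functions can be called immediately in the same session */
   print c2f(100);
quit;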

After creating these functions, they can be called immediately, or you can store them in an external file and call them via a %include statement.  In this example, the user-defined functions have been stored in an external file called FunctionStore.sas.  You can call one, all, or any number of your user-defined functions.
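A sketch of that pattern, assuming FunctionStore.sas holds the FUNCTION definitions shown above and using a hypothetical path:

proc cas;
   /* bring the stored function definitions into this session */
   %include "/local/data/FunctionStore.sas";

   /* call any number of the user-defined functions */
   print c2f(100);
   print f2c(212);
   print c2k(25);
quit;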

The output from each function call is displayed in the log.

Lastly, if you want to see all user-defined functions, run the FUNCTIONLIST statement.  A list will be printed to the log.
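For example (the %include path is again hypothetical):

proc cas;
   %include "/local/data/FunctionStore.sas";
   functionlist;   /* writes the user-defined functions to the log */
quit;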

More about CASL programming and using functions in CASL

Check out these resources for further information on programming in the CASL language and using functions in CASL.

Customize your CASL code with built-in and user-defined functions was published on SAS Users.

November 1, 2018
 

This blog post was also written by SAS' Bari Lawhorn.

We have had several requests from customers who want to use SAS® software to automate the download of data from a website when there is no application programming interface (API) to do it. As an example, the Yahoo Finance website recently changed their service to decommission their API, and this generated an interesting challenge for one of our customers. This SAS programmer wanted to download historical stock price data "unattended," without having to click through a web page. While working on this case, we discovered that the Yahoo Finance website requires a cookie-crumb combination to download. To help you automate downloads from websites that do not have an API, this blog post takes you through how we used the DEBUG feature of PROC HTTP to achieve partial automation, and eventually full automation with this case.

Partial automation

To access the historical data for Apple stock (symbol: AAPL) on the Yahoo Finance website, we use this URL: https://finance.yahoo.com/quote/AAPL/history?p=AAPL

We click Historical Data --> Download Data and get a CSV file with historical stock price data for Apple. We could save this CSV file and read it into SAS. But, we want a process that does not require us to click in the browser.

Because we know the HTTP procedure, we right-click Download Data and then select Copy link address, as shown in the following screen shot from the Google Chrome browser:

Note: The context menu that contains Copy link address looks different in each browser.

Using this link address, we expect to get a direct download of the data into a CSV file (note that your crumb= will differ from ours):

filename out "c:\temp\aapl.csv";
 
proc http
 url='https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1535399845&period2=1538078245&interval=1d&events=history&crumb=hKubrf50i1P'
 method="get" out=out;
run;

However, the above code results in the following log message:

NOTE: PROCEDURE HTTP used (Total process time):
real time           0.25 seconds
cpu time            0.14 seconds
 
NOTE: 401 Unauthorized

When we see this note, we know that the investigation needs to go further. To see the full exchange between SAS and the web server, we rerun the step with the DEBUG statement added:

filename out "c:\temp\aapl.csv";
proc http
 url='https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1535399845&period2=1538078245&interval=1d&events=history&crumb=hKubrf50i1P'
 method="get" out=out;
 debug level=3;
run;

When we run the code, here's what we see in the log (snipped for convenience):

> GET
/v7/finance/download/AAPL?period1=1535399845&period2=1538078245&interval=1d&events=history&crumb=h
Kubrf50i1P HTTP/1.1
 
> User-Agent: SAS/9
> Host: query1.finance.yahoo.com
> Accept: */*
> Connection: Keep-Alive
> Cookie: B=fpd0km1dqnqg3&b=3&s=ug
> 
< HTTP/1.1 401 Unauthorized
< WWW-Authenticate: crumb
< Content-Type: application/json;charset=utf-8
 
…more output…
 
< Strict-Transport-Security: max-age=15552000
 
…more output…
 
{    "finance": {        "error": {            "code": "Unauthorized",
"description": "Invalid cookie"        }    }}
NOTE: PROCEDURE HTTP used (Total process time):
      real time           0.27 seconds
      cpu time            0.15 seconds
 
NOTE: 401 Unauthorized

The log snippet reveals that we did not provide the Yahoo Finance website with a valid cookie. It is important to note that the response header for the URL shows crumb for the authentication method (the line that shows WWW-Authenticate: crumb). A little web research helps us determine that the Yahoo site wants a cookie-crumb combination, so we need to also provide the cookie. But why did we not need this step when we were using the browser? We used a tool called Fiddler to examine the HTTP traffic and discovered that the cookie was cached when we first clicked in the browser on the Yahoo Finance website.

Luckily, starting in SAS® 9.4M3 (TS1M3), PROC HTTP will set cookies and save them across HTTP steps if the response contains a "set-cookie:  <some cookie>" header when it successfully connects to a URL. So, we try this download in two steps. The first step does two things:

  • Sets the cookie for the Yahoo Finance website.
  • Adds the DEBUG statement so that we can obtain the crumb value from the log.

filename out "c:\temp\Output.txt";
 
filename hdrout "c:\temp\Response.txt";
 
proc http
 out=out
 headerout=hdrout
 url="https://finance.yahoo.com/quote/AAPL/history?p=AAPL"
 method="get";
 debug level=3;
run;

Here's our log snippet showing the set-cookie header and the crumb we copy and use in our next PROC HTTP step:

…more output…
< set-cookie: B=2ehn8rhdsf5r2&b=3&s=fe; expires=Wed, 17-Oct-2019 20:11:14 GMT; path=/;
domain=.yahoo.com
 
…more output…
 
Initialized"},"account-switch-uh-0-AccountSwitch":{"status":"initialized"}}},"CrumbStore":{"crumb":
"4fKG9lnt5jw"},"UserStore":{"guid":"","login":"","alias":"","firstName":"","comscoreC14":-1,"isSig

The second step uses the cached cookie from Yahoo Finance and, in combination with the full link that includes the appropriate crumb value (the one shown after "CrumbStore" above), downloads the CSV file into our c:\temp directory.

filename out "c:\temp\aapl.csv";
 
proc http
 url='https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1535399845&period2=1538078245&interval=1d&events=history&crumb=4fKG9lnt5jw'
 method="get"
 out=out;
run;

With the cookie value in place, our download attempt succeeds!

Here is our log snippet:

31
32   proc http
33       out=data
34       headerout=hdrout2
35       url='https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1534602937&peri
35 ! od2=1537281337&interval=1d&events=history&crumb=4fKG9lnt5jw'
36       method="get";
37   run;
 
NOTE: PROCEDURE HTTP used (Total process time):
      real time           0.37 seconds
      cpu time            0.17 seconds
 
NOTE: 200 OK

Full automation

This partial automation requires us to visit the website and right-click on the download link to get the URL. There’s nothing streamlined about that, and SAS programmers want full automation!

So, how can we fully automate the process? In this section, we'll share a "recipe" for how to get the crumb value -- a value that changes with each transaction. To get the current crumb, we use the first PROC HTTP statement to "screen scrape" the URL and to cache the cookie value that comes back in the response. In this example, we store the first response in the Output.txt file, which contains all the content from the page:

filename out "c:\temp\Output.txt";
filename hdrout "c:\temp\Response.txt";
 
proc http 
    out=out 
    headerout=hdrout
    url="https://finance.yahoo.com/quote/AAPL/history?p=AAPL" 
    method="get";
run;

It is a little overwhelming to examine the web page in its entirety. And the HTML page contains some very long lines, some of them over 200,000 characters long! However, we can still use the SAS DATA step to parse the file and retrieve the text or information that might change on a regular basis, such as the crumb value.

In this DATA step we read chunks of the text data and scan the buffer for the "CrumbStore" keyword. Once found, we're able to apply what we know about the text pattern to extract the crumb value.

data crumb (keep=crumb);
  infile out  recfm=n lrecl=32767;
  /* the @@ directive says DON'T advance pointer to next line */
  input txt: $32767. @@;
  pos = find(txt,"CrumbStore");
  if (pos>0) then
    do;
      crumb = dequote(scan(substr(txt,pos),3,':{}'));
      /* cookie value can have unicode characters, so must URLENCODE */
      call symputx('getCrumb',urlencode(trim(crumb)));
      output;
    end;
run;
 
%put &=getCrumb.;

Example result:

 102        %put &=getCrumb.;
 GETCRUMB=PWDb1Ve5.WD

We feel so good about finding the crumb, we're going to treat ourselves to a whole cookie. Anybody care for a glass of milk?

Complete Code for Full Automation

The following code brings it all together. We also added a PROC IMPORT step and a bonus highlow plot to visualize the results. We've adjusted the file paths so that the code works just as well on SAS for Windows or Unix/Linux systems.

/* use WORK location to store our temp files */
filename out "%sysfunc(getoption(WORK))/output.txt";
filename hdrout "%sysfunc(getoption(WORK))/response1.txt";
 
/* This PROC step caches the cookie for the website finance.yahoo.com */
/* and captures the web page for parsing later                        */
proc http 
  out=out
  headerout=hdrout
  url="https://finance.yahoo.com/quote/AAPL/history?p=AAPL" 
  method="get";
run;
 
/* Read the response and capture the cookie value from     */
/* the CrumbStore field.                                   */
/* The file has very long lines, longer than SAS can       */
/* store in a single variable.  So we read in <32k chunks. */
data crumb (keep=crumb);
  infile out  recfm=n lrecl=32767;
  /* the @@ directive says DON'T advance pointer to next line */
  input txt: $32767. @@;
  pos = find(txt,"CrumbStore");
  if (pos>0) then
    do;
      crumb = dequote(scan(substr(txt,pos),3,':{}'));
      /* cookie value can have unicode characters, so must URLENCODE */
      call symputx('getCrumb',urlencode(trim(crumb)));
      output;
    end;
run;
 
%put &=getCrumb.;
 
filename data "%sysfunc(getoption(WORK))/data.csv";
filename hdrout2 "%sysfunc(getoption(WORK))/response2.txt";
 
proc http 
    out=data 
    headerout=hdrout2
    url="https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1535835578%str(&)period2=1538427578%str(&)interval=1d%str(&)events=history%str(&)crumb=&getCrumb."
    method="get";
run;
 
proc import
 file=data
 out=history
 dbms=csv
 replace;
run;
 
proc sgplot data=history;
  highlow x=date high=high low=low / open=open close=close;
  xaxis display=(nolabel) minor;
  yaxis display=(nolabel);
run;


Disclaimer: As we've seen, Yahoo Finance could change their website at any time, so the URLs in this blog post might not be accurate at a later date. Note that, as of the time of this writing, the above code runs error-free with Base SAS 9.4M5. And it also works in SAS University Edition and SAS OnDemand for Academics!

How to automate a data download with PROC HTTP was published on SAS Users.

October 22, 2018
 

This blog post was also written by Reece Clifford.

Who’s responsible for the x, y, z sales territory? What’s the largest number of people they engaged with in a month? What type of location leads to the best response from a meeting?

To get the complete answer to these sales team-related questions, you need to trust your data. You need to be able to cut and slice high-quality data to prepare for analytics to drive innovation in your company. With SAS Data Preparation alongside SAS Decision Manager, you can do all this. Its many features allow you to perform out-of-the-box column and row transformations to increase your data quality and build the foundations for data-driven innovation.

This blog will discuss how you can leverage SAS Decision Manager to enrich data when preparing it through SAS Data Preparation.

The use case

As posed above, we want to create a SAS Data Preparation plan to map a sales person to a postcode area. We use a SAS Decision Manager rule to find the sales person for a postcode area and map the person to the address. To trigger the rule, we are going to call it from SAS Data Preparation.

In SAS Decision Manager we import a csv file to create a Lookup Table mapping a sales person to a postcode area. Lookup Tables are tables of key-value pairs and provide the ability to access reference data from business rules.

Next, we create a rule to map a postcode and sales person. A rule specifies conditions to be evaluated and actions to be taken if those conditions are satisfied. Rules are grouped together into rule sets. Most rules correspond to the form:

if condition_expressions then action_expressions

For our rule, we are going to have an incoming postcode plus a record ID. The postcode is assumed to be a UK postcode. We extract the first two characters of the postcode and look up the sales person from the Lookup Table that we have just imported.

The rule outputs the sales person (representative) and the record ID. When we have tested and published the rule, it's ready to be used in a SAS Data Preparation Plan.

In SAS Data Preparation, we load a table with address data that we want to enrich by the appropriate sales person.

  1. We need to make sure that the table column names and the rule input parameter names match. Therefore, we rename the field ADDRESS_ID to ID, because ID is the rule input name. The second rule input parameter, Postcode, already matches the table column name, so no action is needed.

  2. We can then call the previously created rule in SAS Decision Manager to map a sales person to an area. This is done by adding some CASL code to the Code node in the SAS Data Preparation plan. The code is featured below with a brief explanation of its parameters.
    As the rule has two output parameters, we receive only two columns when executing the code step.

CASL Code

loadactionset "ds2";
action runModel submit / 
	modelTable={name="MONITORRULES", caslib="DCMRULES"}
	modelName="Mon_Person"
	table= {name= _dp_inputTable, caslib=_dp_inputCaslib}
	casout= {name= _dp_outputTable, caslib=_dp_outputCaslib};

Parameters settings for CASL call

modelTable name Name of the table where the rule set was published to.
modelTable caslib Name of the caslib where the rule set was published to.
modelName Name of the decision flow to execute.
table name Table name of the decision flow input data.

(Set to _dp_inputTable)

table caslib caslib name of the table input data.

(Set to _dp_inputCaslib)

casout name Table name of the decision flow output data.

(Set to _dp_outputTable)

casout caslib caslib name of the table output data.

(Set to _dp_outputCaslib)

 

Decision Manager Publishing Dialogue

 

  3. Finally, we bring back the columns from the input table. We do this by joining the table in the SAS Data Preparation plan to the original table (again) on the rule output field ID and the table's field ADDRESS_ID, as sketched below.
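For illustration only, an equivalent join expressed in FedSQL against CAS might look like the following sketch; the table and caslib names are assumptions, not the ones used in the actual plan.

proc cas;
   /* join the rule output (ID, representative) back to the address table */
   fedSql.execDirect / query=
      "create table casuser.address_enriched {options replace=true} as
       select a.*, r.representative
       from casuser.address_data a
       left join casuser.rule_output r
         on a.ADDRESS_ID = r.ID";
quit;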

Conclusion

We have answered our initial question of which sales person is mapped to which region by enriching our data in a user-friendly, efficient process in SAS Data Preparation. We can now begin to gain further insight from our data to answer more of the questions posed at the beginning of this blog and help drive innovation. This can be done with additional rules in SAS Decision Manager or with functions in SAS Data Preparation, either in the current plan or by using the output table in another plan. Ultimately, this will facilitate data-driven innovation via reporting or advanced analytics in your organisation.

Using SAS Decision Manager to enrich the data prep process was published on SAS Users.

May 22, 2018
 

SAS® Viya® is our latest extension of the SAS Platform and is interoperable with SAS® 9.4. Designed to enable analytics for the enterprise, it seamlessly scales to data of any size, type, speed and complexity. It was also a star at this year’s SAS Global Forum 2018. In this series of articles, we will review several of the most interesting SAS Viya talks from the event. Our first installment reviews Hadley Christoffels’ talk, A Need For Speed: Loading Data via the Cloud.

You can read all the articles in this series or check out the individual interviews by clicking on the titles below:
Part 1: Technology that gets the most from the Cloud.


Technology that gets the most from the Cloud

Few would argue about the value the effective use of data can bring an organization. Advancements in analytics, particularly in areas like artificial intelligence and machine learning, allow organizations to analyze more complex data and deliver faster, more accurate results.

However, in his SAS Global Forum 2018 paper, A Need For Speed: Loading Data via the Cloud, Hadley Christoffels, CEO of Boemska, reminded the audience that 80% of an analyst’s time is still spent on the data. Getting insight from your data is where the magic happens, but the real value of powerful analytical methods like artificial intelligence and machine learning can only be realized when “you shorten the load cycle the quicker you get to value.”

Data Management is critical and still the most common area of investment in analytical software, making data management a primary responsibility of today’s data scientist. “Before you can get to any value the data has to be collected, has to be transformed, has to be enriched, has to be cleansed and has to be loaded before it can be consumed.”

Benefits of cloud adoption

The cloud can help, to a degree. According to Christoffels, “cloud adoption has become a strategic imperative for enterprises.” The advantages of moving to a cloud architecture are many, but the two greatest are elasticity and scalability.

Elasticity, as Christoffels defined it, allows you to dynamically provision or remove virtual machines (VMs), while scalability refers to increasing or decreasing capacity within existing infrastructure by scaling vertically (moving the workload to a bigger or smaller VM) or horizontally (provisioning additional VMs and distributing the application load between them).

“I can stand up VMs in a matter of seconds, I can add more servers when I need it, I can get a bigger one when I need it and a smaller one when I don’t, but, especially when it comes to horizontal scaling, you need technology that can make the most of it.” Cloud-readiness and multi-threaded processing make SAS® Viya® the perfect tool to take advantage of the benefits of “clouding up.”

SAS® Viya® can address complex analytical challenges and speed up data management processes. “If you have software that can only run on a single instance, then scaling horizontally means nothing to you because you can’t make use of that multi-threaded, parallel environment. SAS Viya is one of those technologies,” Christoffels said.

Challenges you need to consider

According to Christoffels, it’s important, when moving your processing to the cloud, that you understand and address existing performance challenges and whether it will meet your business needs in an agile manner. Inefficiencies on-premise are annoying; inefficiencies in the cloud are annoying and costly, since you pay for that resource.

It’s not the best use of the architecture to take what you have on premise and just shift it. “Finding and improving and eliminating inefficiencies is a massive part in cutting down the time data takes to load.”

Boemska, Christoffels’ company, has tools to help businesses find inefficiencies and understand the impact users have on the environment, including:

  1. Real-time diagnostics looking at CPU Usage, Memory Usage, SAS Workload, etc.
  2. Insight and comparison provides a historic view over a certain timeframe, essential when trying to optimize and shave off costly time when working in the cloud.
  3. Utilization reports to better understand how the platform is used.

Optimizing inefficiencies with SAS Viya

But scaling vertically and horizontally from cloud-based infrastructure to speed the loading and data management process solves only part of the problem; Christoffels said SAS Viya capabilities complete the picture. SAS Viya offers a number of benefits in a cloud infrastructure. Code amendments that make use of the new techniques and benefits now available in SAS Viya, such as the multi-threaded DATA step or CAS action sets, can be extremely powerful.

One simple example of the benefits of SAS Viya, Christoffels said, is that with in-memory processing, PROC SORT is a procedure that’s no longer needed; SAS Viya does “grouping on the fly,” meaning you can remove sort routines from existing programs, which by itself can cut down processing time significantly.
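As a rough illustration of that point (the table, libref, and variable names are assumptions), the traditional SAS 9 pattern sorts before BY-group processing, while the same DATA step run in CAS forms the BY groups on the fly:

/* SAS 9 pattern: PROC SORT is required before BY-group processing */
proc sort data=work.sales;
   by region;
run;

data work.region_totals;
   set work.sales;
   by region;
   if first.region then total = 0;
   total + amount;
   if last.region then output;
run;

/* SAS Viya pattern: run the DATA step in CAS and drop the sort.           */
/* Assumes an active CAS session and a CAS libref, e.g. libname mycas cas; */
data mycas.region_totals;
   set mycas.sales;            /* sales already loaded to CAS */
   by region;                  /* no prior PROC SORT needed   */
   if first.region then total = 0;
   total + amount;
   if last.region then output;
run;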

The fact that SAS Viya can run multi-threaded, that you don’t have to do these sorts, the way it handles grouping on the fly, and the fact that this multi-threaded capability is built into how you deal with tables are all “significant” for a SAS programmer, according to Christoffels.

Conclusion

Data preparation and load processes have a direct impact on how applications can begin and subsequently complete. Many organizations are using the Cloud platform to speed up the process, but to take full advantage of the infrastructure you have to apply the right software technology. SAS Viya enables the full realization of Cloud benefits through performance improvements, such as the transposing of data and the transformation of data using the DATA step or CAS Action Sets.

Additional Resources

SAS Global Forum Video: A Need For Speed: Loading Data via the Cloud
SAS Global Forum 2018 Paper: A Need For Speed: Loading Data via the Cloud
SAS Viya
SAS Viya Products


Read all the posts in this series.

Part 1: Technology that gets the most from the Cloud

Technology that gets the most from the Cloud was published on SAS Users.

May 8, 2018
 

The European Union’s General Data Protection Regulation (GDPR), taking effect on 25 May 2018, pertains not only to organizations located within the EU; it applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.

If the GDPR acronym does not mean much to you, think of the one that does – HIPAA, FERPA, COPPA, CIPSEA, or any other that is relevant to your jurisdiction – this blog post is equally applicable to all of them.

The GDPR prohibits personal data processing revealing such individual characteristics as race or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, as well as the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health, and data concerning a natural person’s sex life or sexual orientation. It also has special rules for data relating to criminal convictions or offenses and the processing of children’s personal data.

Whenever SAS users produce reports on demographic data, there is always a risk of inadvertently revealing personal data protected by law, especially when reports are generated automatically or interactively via dynamic data queries. Even for aggregate reports there is a high potential for such exposure.

Suppose you produce an aggregate cross-tabulation report on a small demographic group, representing a count distribution by students’ grade and race. It is highly probable that you will get a count of 1 for some cells in the report, which will unequivocally identify persons and thus disclose their education record (grade) by race. Even if the count is not equal to 1, but to some other small number, there is still a risk that Personally Identifiable Information (PII) could be deduced or disaggregated from surrounding data (other cells, row and column totals) or from related reports on that small demographic group.

The following are four selected SAS tools that help you protect personal data in SAS reports by suppressing counts in reports on small demographic groups.

1. Automatic data suppression in SAS reports

This blog post explains the fundamental concepts of data suppression algorithms. It takes you behind the scenes of the iterative process of complementary data suppression and walks you through SAS code implementing a primary and secondary complementary suppression algorithm. The suppression code uses BASE SAS – DATA STEPs, SAS macros, PROC FORMAT, PROC MEANS, and PROC REPORT.
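As a flavor of what such code does, here is a minimal sketch of primary suppression only, using a hypothetical threshold of 5 and hypothetical data set and variable names; the linked post adds the complementary (secondary) suppression logic.

/* mask any cell count below the suppression threshold */
proc format;
   value supp low-<5 = '*'
              other  = [best8.];
run;

/* count students by grade and race */
proc freq data=class_data noprint;
   tables grade*race / out=counts(keep=grade race count);
run;

/* cross-tabulation report with small counts masked */
proc report data=counts;
   column grade race,count;
   define grade / group;
   define race  / across;
   define count / analysis sum format=supp. 'Count';
run;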

2. Implementing Privacy Protection-Compliant SAS® Aggregate Reports

This SAS Global Forum 2018 paper solidifies and expands on the above blog post. It walks you through the intricate logic of an enhanced complementary suppression process, and demonstrates SAS coding techniques to implement and automatically generate aggregate tabular reports compliant with privacy protection law. The result is a set of SAS macros ready for use in any reporting organization responsible for compliance with privacy protection.

3. In SAS Visual Analytics you can create derived data items that are aggregated measures.  SAS Visual Analytics 8.2 on SAS Viya introduces a new type of aggregated-measure derived data item called Data Suppression. Here is an excerpt from the documentation on the Data Suppression type:

“Obscures aggregated data if individual values could easily be inferred. Data suppression replaces all values for the measure on which it is based with asterisk characters (*) unless a value represents the aggregation of a specified minimum number of values. You specify the minimum in the Suppress data if count less than parameter. The values are hidden from view, but they are still present in the data query. The calculation of totals and subtotals is not affected.

Some additional values might be suppressed when a single value would be suppressed from a subgroup. In this case, an additional value is suppressed so that the suppressed value cannot be inferred from totals or subtotals.

A common use of suppressed data is to protect the identity of individuals in aggregated data when some crossings are sparse. For example, if your data contains testing scores for a school district by demographics, but one of the demographic categories is represented only by a single student, then data suppression hides the test score for that demographic category.

When you use suppressed data, be sure to follow these best practices:

  • Never use the unsuppressed version of the data item in your report, even in filters and ranks. Consider hiding the unsuppressed version in the Data pane.
  • Avoid using suppressed data in any object that is the source or target of a filter action. Filter actions can sometimes make it possible to infer the values of suppressed data.
  • Avoid assigning hierarchies to objects that contain suppressed data. Expanding or drilling down on a hierarchy can make it possible to infer the values of suppressed data.”

This Data Suppression type functionality is significant as it represents the first such functionality embedded directly into a SAS product.

4. Is it sensitive? Mask it with data suppression

This blog post provides an example of using the above Data Suppression type aggregated measures derived data items in SAS Visual Analytics.

We need your feedback!

We want to hear from you.  Is this blog post useful? How do you comply with GDPR (or other Privacy Law of your jurisdiction) in your organization? What SAS privacy protection features would you like to see in future SAS releases?

SAS tools for GDPR privacy compliant reporting was published on SAS Users.
