December 02, 2016
 

Nearly every organization has to deal with big data, and that often means dealing with big data problems. For some organizations, especially government agencies, addressing these problems provides more than a competitive advantage; it helps them ensure public confidence in their work or meet standards mandated by law. In this blog I want to share how SAS worked with a government revenue collection agency to successfully manage its big data issues and seamlessly integrate with Hadoop and other technologies.

Hadoop Security

We all know Hadoop pretty well, and if you haven't heard of Hadoop yet, it is about time you invest some resources to learn more about this emerging de facto standard for storage and compute. The core of Apache Hadoop consists of a storage part, known as HDFS (Hadoop Distributed File System), and a processing part (called MapReduce). Hadoop splits large files into blocks and distributes them across the nodes of a cluster.

Hadoop was initially developed to solve web-scale problems like webpage search and indexing at Yahoo. However, the potential of the platform to handle big data and analytics caught the attention of a number of industries. Since the initial use of Hadoop was to count webpages and implement algorithms like PageRank, security was never considered a major requirement, until enterprises across the world started adopting the platform.

Security incidents and massive fines have become commonplace, and financial institutions, in particular, are doing everything they can to avoid such incidents. Security should never be an afterthought; it should be considered in the initial design of the system. The five core pillars of Enterprise Security are as follows:

[Image: sas-integration-with-hadoop05]

Our customer had four of the five core pillars, from Administration to Auditing, covered using tools provided by their Hadoop vendor. For data protection, while there are options in the open-source community, the organization decided to use a third-party data security company's product to protect data at rest on top of Cloudera Navigator Encryption. They refer to it as "Double Encryption."

The challenge

SAS has multiple products around the Hadoop ecosystem to provide the best support for customers. The traditional way of working with Hadoop is through SAS/ACCESS, which typically pulls data out of Hadoop using Hive. However, for larger installations where data movement is a concern, SAS provides the Embedded Process technology, which allows you to push SAS code into a Hadoop cluster and run it alongside the data blocks. Pushing the compute to the data in this way is a highly efficient way to work with large data sets inside Hadoop.

Our customer's data security vendor's product supports access via Hive UDFs, which means you can tokenize and detokenize data when working with SAS/ACCESS Interface to Hadoop through PROC SQL and other options, more or less out of the box. In addition, the vendor's API can be wrapped with PROC FCMP and PROC PROTO to add new SAS language functions for tokenizing and detokenizing data directly in Base SAS.
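For example, a minimal pass-through sketch might look like the following; the connection values, table name, and the detokenize() UDF name are hypothetical placeholders for the vendor's registered function.

/* A hedged sketch: explicit SQL pass-through to Hive, calling a
   hypothetical detokenize() UDF registered by the security vendor */
proc sql;
   connect to hadoop (server="hive.example.com" port=10000 schema=default);
   select * from connection to hadoop
      (select id, detokenize(ssn) as ssn from customers);
   disconnect from hadoop;
quit;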

However, SAS Embedded Process has no default support for our customer's security vendor, so the SAS products that utilize the EP (SAS Code Accelerator, SAS Scoring Accelerator and the LASR-based products) cannot work with data tokenized by the vendor. This was a major challenge for our customer, who wanted to use SAS products like SAS Visual Analytics and SAS Visual Statistics on large volumes of data stored in Hadoop.

The challenge, then, was to make SAS Embedded Process work with the data security vendor's software so that data is detokenized before being passed to SAS procedures.

The possible solutions

We considered various solutions before agreeing on one that satisfied all current requirements and could be extended to meet the future needs of our customer. Let's discuss the top two candidates and the final implementation.

Solution 1: SerDe approach

Our first approach was to create a custom Hive SerDe that wraps the data security company's APIs. As of SAS 9.4M3, the SAS Embedded Process (EP) can read and write via SerDe APIs, subject to some constraints and limitations, including DS2's SET/MERGE capabilities and the question of how identity credentials are passed from SAS to the company's APIs.

[Image: sas-integration-with-hadoop02]

The approach had several drawbacks, the biggest being file format support. It was discarded because it would have meant significant rework for every new data format released by the Hadoop community. While an organization would generally standardize on a few formats for its use cases, this is nonetheless a limiting factor.

Solution 2: Use HDMD with Custom Input Formats

The second approach was to use HDMD with custom input formats, which SAS HDMD lets you plug in. A high-level architectural diagram looks something like Figure 2. This approach works with a variety of file formats, and we have tested it with Parquet, Avro, and ORC with good results. The idea is to load a data set onto Hadoop (or use an existing one) and generate an HDMD file for it. We plug our custom reader into the HDMD file, and as part of the custom reader we make a number of calls to the data security company's API. The API invokes the vendor's protect and unprotect procedures, as the requirements dictate, and passes the results back to the client.

[Figure 2: sas-integration-with-hadoop03]

What is a (custom) input format, anyway?

Data inside Hadoop is typically stored on HDFS (Hadoop Distributed File System) and must be read from the filesystem before it can be processed. This is the job of an input format, which has the following responsibilities:

  • Compute input splits
    • Input splits represent the portions of the data that each Map phase will process; a unique input split is passed to each task. At the start of a MapReduce job, the input format splits the data into multiple parts based on logical record boundaries and the HDFS block size. To get the input splits, the following method is called:
      • List<InputSplit> getSplits(JobContext ctx)
  • Provide the logic to read an input split
    • Each mapper gets a unique input split to process. The input format provides the logic to read the split, in the form of a RecordReader implementation. The record reader reads the split and emits <key,value> pairs as input for each map function. The record reader is created using the following method:
      • RecordReader<K,V> createRecordReader(InputSplit is, TaskAttemptContext ctx)

All the common formats provide a way to split the data and read records. However, if you want to read a custom data set that Hadoop cannot parse out of the box, you are better off writing a custom input format.

How do you write a custom input format?

Writing a custom input format requires Java skills (Java being the language in which Hadoop itself is written). You can either implement the abstract methods of the InputFormat class or extend one of the pre-existing input formats. In our case, we extended FileInputFormat and overrode a few critical methods:

  • getSplits()
  • getRecordReader()

getSplits() creates the splits from the input data, while getRecordReader() (the older mapred API's counterpart of createRecordReader()) returns a Java object that can read custom records; in our case, that object wrapped the security vendor's API.

You can use one of the predefined record reader classes or implement your own, which is the likely path if you are writing a custom input format. In our case, we implemented the RecordReader interface, including the next() method, which is called whenever a new record is read. This is the method where your core business logic lives. Our integration logic inspects the data, determines the logged-in user (available as part of the JobConf object), and then calls the vendor's APIs to decrypt the data. Sample code can be requested by contacting me directly.
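To make the shape of that reader concrete, here is a minimal sketch using the org.apache.hadoop.mapred API (the one that exposes getRecordReader() and next(), as described above). DetokenizingClient is a hypothetical stand-in for the security vendor's API; everything else uses standard Hadoop classes.

// A sketch only: wrap the stock line reader and detokenize each record
// before downstream processing (for example, the SAS Embedded Process) sees it.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class DetokenizingInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        // Reuse the stock line reader for the actual I/O ...
        final RecordReader<LongWritable, Text> base =
                new LineRecordReader(job, (FileSplit) split);
        // ... and wrap it so each record is detokenized in next().
        return new RecordReader<LongWritable, Text>() {
            // Hypothetical vendor client; the logged-in user is available
            // from the JobConf for policy checks.
            private final DetokenizingClient client =
                    new DetokenizingClient(job.getUser());

            public boolean next(LongWritable key, Text value) throws IOException {
                if (!base.next(key, value)) return false;       // end of split
                value.set(client.detokenize(value.toString())); // core logic
                return true;
            }
            public LongWritable createKey()  { return base.createKey(); }
            public Text createValue()        { return base.createValue(); }
            public long getPos() throws IOException { return base.getPos(); }
            public float getProgress() throws IOException { return base.getProgress(); }
            public void close() throws IOException { base.close(); }
        };
    }
}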

Integrating a custom input format with SAS

Integrating a custom input format is fairly easy with SAS. SAS allows us to plug in custom formats, which are called before the data is processed via SAS Embedded Process using HDMD files.

When you generate an HDMD file using PROC HDMD, you can specify your custom input format, and it becomes part of the generated XML file. Refer to the PROC HDMD documentation for details.
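A hedged sketch of what that PROC HDMD call might look like follows; it assumes the FORMAT=CUSTOM and INPUT_CLASS= options described in the documentation, and the library, file, class, and column names are all hypothetical. Verify the exact syntax against the documentation for your release.

/* A sketch only: register a hypothetical custom input format class */
proc hdmd name=gridlib.salary_hdmd format=custom
     data_file='salary_tokenized'
     input_class='com.example.DetokenizingInputFormat';
   column emp_id   int;
   column emp_name varchar(40);
   column salary   double;
run;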

The generated HDMD file would look something like this.

[Image: sas-integration-with-hadoop04]

When loading the data from HDFS, SAS ensures that the specified input format is called before any data processing takes place.

The ultimate solution

The solution was demonstrated using data from the tax authority and included tokenization of data via Hive UDFs, detokenization of data according to the policies set on the data security appliance, and analytics using SAS Visual Analytics. Only users with permissions on the specific policy were able to view the data in the clear, while users without permissions saw only the tokenized data. This additional security helped the enterprise protect users' information from inadvertent access and resulted in widespread use of big data technologies within the enterprise.

Summary

As you can see from the example above, SAS is open for business and already provides deep integration with Hadoop and other technologies using custom APIs. The sky is the limit for those willing to explore the capabilities of SAS.

tags: Global Technology Practice, Hadoop, SAS/ACCESS Interface to Hadoop

SAS integration with Hadoop - one success story was published on SAS Users.

December 02, 2016
 
About two years ago we published a quick and easy guide to setting up your own RStudio server in the cloud using the Docker service and Digital Ocean. The process is incredibly easy-- about the only cumbersome part is retyping a random password. Today the excitement in virtual private servers is that Amazon is getting into the market, with their Lightsail product. They are not undercutting Digital Ocean entirely-- in fact, their prices look to be just about identical. But Amazon's interface may have some advantages for you, so here's how to get Docker and RStudio running with Amazon Lightsail.



1. Log in to Lightsail

2. Create an Instance; choose the Base OS, and Ubuntu (as of this writing 16.04 LTS)

3. Name it what you like

4. Wait for boot up. Once it's running, click "connect" under the three dots. This opens a console window where you are already logged in, saving some headache vs. Digital Ocean.

5. Time for console commands. Type: sudo apt-get install docker.io and then answer Y for yes when prompted to confirm the installation.

6. Type: sudo service docker start

7. Now you can start your docker/rstudio container. See our earlier blog post or this link for resources. Shortcuts:

a. Plain RStudio: sudo docker run -d -p 8787:8787 rocker/rstudio

b. All of Hadleyverse: sudo docker run -d -p 8787:8787 rocker/hadleyverse

c. Custom password: sudo docker run -d -p 8787:8787 -e USER=ken -e PASSWORD=ken rocker/hadleyverse

d. Enable root: sudo docker run -d -p 8787:8787 -e ROOT=TRUE rocker/rstudio

8. Important! While the container is starting, go back to the Lightsail tab in your browser and click the three dots on the "Running" instance to Manage, then click on the Networking tab. In the table of two enabled ports, click on the plus ("Add Another"). Leave "Custom" and "All" under "Application" and "Protocol", respectively, and change the port range to 8787. Save.

9. The public IP is printed on the Networking page there. Cut and paste into your browser with :8787 appended. Your username and password are both rstudio, unless you changed them. To allow additional users onto your cloud server, see this page.

An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work, except as noted above.
December 01, 2016
 

When I was a kid, I always looked forward to Casey Kasem's American Top 40 song countdown at the end of the year. Did I listen to check whether my favorite songs had made the list, or to critique how well the people making the list had done in picking the 'right' […]

The post My top 10 graph blog posts of 2016! appeared first on SAS Learning Post.

December 01, 2016
 

In my earlier post about WHERE and IF statements, I announced that the DATA step debugger has finally arrived in SAS Enterprise Guide. (I admit that I might have buried the lead in that post.) Let's use this post to talk about the new debugger and how it works.

First, let's address some important limitations. This tool is for debugging DATA step code. It can't be used to debug PROC SQL or PROC IML or SAS macro programs. Next, it can't be used to debug DATA steps that read data from CARDS or DATALINES. That's an unfortunate limitation, but it's a side effect of the way the DATA step "debug" mode works with client applications like SAS Enterprise Guide. (Workaround: load your data in a separate step, then debug your more complex DATA step logic in a subsequent step.)

Ye olde DATA step debugger

1986 called; they want their debugger back.

If you've been around SAS programs for a while then you might remember the full-screen DATA step debugger in the SAS windowing environment. Introduced as production in SAS 6.09E (E="enhanced!"), it was basic but it did the job, relying on command-line processing to direct the debugger actions. It had only two windows: one for the source, and one for the "log", meaning the debugger console log. You could set breakpoints, variable watch conditions, examine variables and calculate values -- all with commands that you typed. (Even though I'm writing this in the past tense and it seems like I'm eulogizing, this debugger still lives on in Base SAS!)

The new DATA step debugger

The new debugging environment, introduced in SAS Enterprise Guide 7.13, has all of the features of its ancestor. And it's much more usable, with toolbars and windows that allow you to control its behavior. But keyboard junkies, don't worry -- that command line is still there too!

To activate the debugger, click the new "bug" toolbar icon in the program editor window. Once activated, you can click the bug in the left "gutter" of the program editor to begin a debug session. (You can also press F5 to debug the active DATA step.)
[Image: Starting the Debugger]
Examine the screenshot below. You see the source window on top and the console window at the bottom, plus a convenient "watch" window that shows much of the content in the program data vector (PDV). That's all of the variables defined in the DATA step, plus automatic variables like _N_ and _ERROR_.

[Image: EG debugger]
As you step through the DATA step, the line pointer in the source window advances to show the next line that will execute. You can use keyboard shortcuts (F10), the toolbar, or a typed command ("step") to execute that line and advance. With every step, the watch window is updated with the latest values of the variables in your step. When a variable changes value, it's colored red. If you want the DATA step to break processing when a certain variable changes value, check the Watch box for that variable.

Diving deeper with advanced debugging

Here's another example of debugging a different DATA step program. This program uses a BY statement and FIRST.variable logic, and you can see the additional automatic variables (FIRST.Make and LAST.Make) that the debugger is tracking. I also used END=eof on the SET statement; that adds the eof "flag" variable into the mix during run time.

[Image: egdebug_adv]
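For reference, here is a minimal sketch (my reconstruction, not the exact program from the screenshot) of the kind of DATA step being debugged: it has the same moving parts, namely a BY statement, FIRST./LAST. logic, END=eof on the SET statement, and a running_price variable like the one referenced in the commands below.

data work.price_by_make;
   set sashelp.cars end=eof;   /* eof flag appears in the watch window */
   by make;
   if first.make then running_price = 0;
   running_price + msrp;       /* sum statement; watch it change at each step */
   if last.make then output;
run;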
In the Debug Console window you can see that I've issued some pretty fancy commands. The DATA step debugger allows you to set breakpoints that trigger on specific conditions. For example, "b 8 when (running_price > 10000)" will break on Line 8 when the value of running_price exceeds 10,000. "b 8 after 5" will break on Line 8 after 5 passes through the DATA step. You can set and clear line-specific breakpoints by clicking in the "gutter" (that left-hand margin next to the line numbers).

The "list _all_" command reveals the details about your open data sets and files. Here's what I see during the run of my program.

[Image: list command]
Other commands let you SET variable values, EXAMINE variables, CALCulate expressions, GO and JUMP to specific lines, and more. The SAS documentation contains a complete reference for DATA step debugger commands, and most of them work exactly as documented, even within SAS Enterprise Guide.

This old-but-still-relevant SAS Global Forum paper (written by a SAS user) also covers some useful debugging concepts in SAS, which you can apply in this new environment.

A personal note: eating my words

I've presented "SAS Enterprise Guide for SAS programmers" as a topic in one form or another for the past 15 years. Every so often the topic of the DATA step debugger comes up, and I've said "don't look for it anytime soon." Knowing how the full-screen debugger is closely tied to the SAS windowing environment, I didn't hold out hope for a client application like SAS Enterprise Guide to get it working. Kudos to the R&D team! They creatively found a solution with the "/ldebug" option, an even more obscure debugging approach that works in SAS batch mode. I think this feature will be a tremendous productivity boost for experienced SAS programmers, and a useful learning and teaching tool for those just getting started with the DATA step.

tags: SAS Enterprise Guide, SAS programming

The post Using the DATA step debugger in SAS Enterprise Guide appeared first on The SAS Dummy.

December 01, 2016
 

Data virtualization is an agile way to provide virtual views of data from multiple sources without moving the data. Think of data virtualization as another arrow in your quiver in terms of how you approach combining data from different sources to augment your existing Extract, Transform and Load (ETL) batch processes. SAS® Federation Server is a unique data virtualization offering that provides not only blending of data, but also on-demand data masking, encryption and cleansing of the data. It provides a central, virtual environment for administering and securing access to your Personally Identifiable Information (PII) and other data.

Data privacy is a major concern for organizations, and one of the features of SAS Federation Server is that it allows you to effectively and efficiently control access to your data, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, names, etc. In this three part blog series, I will explore the topic of controlling data access using SAS Federation Server. The series will cover the following topics:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server, and SAS Federation Server Manager is used to help control access to the data. In this blog, I want to explore controlling access to specific sources of data using SAS Federation Server. Obviously, you can secure data at its source by using secured metadata-bound libraries in SAS Metadata Server or by using a database's or file's own security mechanisms. However, SAS Federation Server can also control access to these data sources by authenticating against the users and groups in SAS Management Console and setting authorizations within SAS Federation Server Manager.

To show how SAS Federation Server can be used to control access to data, I will explore an example where the Finance Users in our fictitious company SHOULD have access to the Salary data in a SAS data set, but our Business Users should NOT. Instead, our Business Users should have access to all other BASE tables, with the exception of SALARY. In my scenario, Kate is a Finance User, and David and Sally are Business Users. These users have already been set up as such in SAS Metadata Server.

The SAS Federation Server Administrator has set up the BASE catalog and schema information in Federation Server Manager. The SALARY table is located in the Employee_Info schema within the Global catalog.

[Image: securing-sensitive-data-using-sas-federation-server01]

The SAS Federation Server Administrator has also explicitly granted the CONNECT and SELECT permissions to both the Business Users and Finance Users groups for the BASE Data Service.

[Image: securing-sensitive-data-using-sas-federation-server02]

[Image: securing-sensitive-data-using-sas-federation-server03]

This gives both groups permission to connect to and select information from the items within this Data Service. The permissions are inherited by all child items of the Data Service – Data Source Names, Catalogs, Schemas, Tables and Views. For example, note that the Business Users group has inherited the Grant setting for the CONNECT permission on the BASE Data Source Name (DSN) and the SELECT permission on the EMPLOYEES table. Permission inheritance is denoted by a diamond symbol.

[Image: securing-sensitive-data-using-sas-federation-server04]

[Image: securing-sensitive-data-using-sas-federation-server05]

For the SALARY table, the SAS Federation Server Administrator has explicitly denied the SELECT permission for the Business Users group, whereas the Finance Users group has inherited the Grant setting for the SELECT permission on the SALARY table.

[Image: securing-sensitive-data-using-sas-federation-server06]

[Image: securing-sensitive-data-using-sas-federation-server07]

Kate, who is a member of the Finance Users group, has permission to select records from the SALARY table.

[Image: securing-sensitive-data-using-sas-federation-server08]

Note: The user does not need to know the physical location where the SAS data resides. They simply refer to the Federation Server Data Source Name, which in this case is BASE.
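As a rough illustration, access from a SAS session might look like the following sketch, which uses the FEDSVR LIBNAME engine; the host name, port, and credentials shown here are hypothetical placeholders.

/* A hedged sketch: point a libref at the BASE data source name (DSN) */
libname fedlib fedsvr server="fedsvr.example.com" port=24141 dsn=BASE
        user="Kate" password="XXXXXXXX";

proc print data=fedlib.salary(obs=5);
run;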

By denying the Business Users group the SELECT permission on the SALARY table, David, who is a member of the Business Users group, does NOT have access to select records from this table. He is denied access.

[Image: securing-sensitive-data-using-sas-federation-server09]

However, David still has access to the EMPLOYEES table since the Business Users group inherited the SELECT permission for that table.

[Image: securing-sensitive-data-using-sas-federation-server10]

If I want to prevent David from accessing any of the tables or views in the Employee_Info schema, but still allow other Business Users to access them, then as the SAS Federation Server Administrator I can explicitly deny the user David the SELECT permission for the Employee_Info schema, as shown below.

[Image: securing-sensitive-data-using-sas-federation-server11]

Now, David inherits the Deny setting for the SELECT permission for all tables and views within that schema and he will no longer be able to access the EMPLOYEES table.

[Image: securing-sensitive-data-using-sas-federation-server12]

However, Sally, another member of the Business Users group, is still able to access the EMPLOYEES table.

[Image: securing-sensitive-data-using-sas-federation-server13]

In this blog entry, I covered the first part of this series on controlling data access with SAS Federation Server 4.2:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

I’ll be posting Part 2 of this series soon. Keep an eye out for it.

For more information on SAS Federation Server:

tags: SAS Federation Server, SAS Professional Services, Securing data

Securing sensitive data using SAS Federation Server at the data source level was published on SAS Users.

December 01, 2016
 

Hi there! I am Murali Sastry, a student pursuing my Master’s degree in Analytics at Capella University (CU). I love data and really enjoy digging into it to find valuable insights. If you’re a student in an analytics program like I am, let me give you a bit of advice. […]

The post Analytics competition provides students real-world opportunity (and why that’s so important!) appeared first on SAS Analytics U Blog.

November 30, 2016
 

Balance. This is the challenge facing any organisation wishing to exploit their customer data in the digital age. On one side we have the potential for a massive explosion of customer data. We can collect real-time social media data, machine data, behavioural data and of course our traditional master and […]

The post How can data privacy and protection help drive better analytics? appeared first on The Data Roundtable.

November 30, 2016
 

Do you want to create customized SAS graphs by using PROC SGPLOT and the other ODS graphics procedures? An essential skill that you need to learn is how to merge, join, append, and concatenate SAS data sets that come from different sources. The SAS statistical graphics procedures (SG procedures) enable you to overlay all kinds of customized curves, markers, and bars. However, the SG procedures expect all the data for a graph to be in a single SAS data set. Therefore it is often necessary to append two or more data sets before you can create a complex graph.

This article discusses two ways to combine data sets in order to create ODS graphics. An alternative is to use the SG annotation facility to add extra curves or markers to the graph. Personally, I prefer to use the techniques in this article for simple features, and reserve annotation for adding highly complex and non-standard features.

Overlay curves

[Image: sgplotoverlay]

In a previous article, I discussed how to structure a SAS data set so that you can overlay curves on a scatter plot.

The diagram at the right shows the main idea of that article. The X and Y variables contain the original data, which are the coordinates for a scatter plot. Secondary information was appended to the end of the data. The X1 and Y1 variables contain the coordinates of a custom scatter plot smoother. The X2 and Y2 variables contain the coordinates of a different scatter plot smoother.

This structure enables you to use the SGPLOT procedure to overlay two curves on the scatter plot. You use a SCATTER statement and two SERIES statements to create the graph. See the previous article for details.

Overlay markers: Wide form

In addition to overlaying curves, I sometimes want to add special markers to the scatter plot. In this article I will show how to add a marker that shows the location of the sample mean. This article shows how to use PROC MEANS to create an output data set that contains the coordinates of the sample mean, then append that data set to the original data.




The following statements use PROC MEANS to compute the sample mean for four variables in the SasHelp.Iris data set, which contains the measurements for 150 iris flowers. To emphasize the general syntax of this computation, I use macro variables, but that is not necessary:

%let DSName = Sashelp.Iris;
%let VarNames = PetalLength PetalWidth SepalLength SepalWidth;
 
proc means data=&DSName noprint;
var &VarNames;
output out=Means(drop=_TYPE_ _FREQ_) mean= / autoname;
run;

The AUTONAME option on the OUTPUT statement tells PROC MEANS to append the name of the statistic to the variable names. Thus the output data set contains variables with names like PetalLength_Mean and SepalWidth_Mean. As shown in the diagram in the previous section, this enables you to append the new data to the end of the old data in "wide form" as follows:

data Wide;
   set &DSName Means; /* add four new variables; pad with missing values */
run;
 
ods graphics / attrpriority=color subpixel;
proc sgplot data=Wide;
scatter x=SepalWidth y=PetalLength / legendlabel="Data";
ellipse x=SepalWidth y=PetalLength / type=mean;
scatter x=SepalWidth_Mean y=PetalLength_Mean / 
         legendlabel="Sample Mean" markerattrs=(symbol=X color=firebrick);
run;
[Figure: Scatter plot with markers for sample means]

The first SCATTER statement and the ELLIPSE statement use the original data. Recall that the ELLIPSE statement draws an approximate confidence ellipse for the mean of the population. The second SCATTER statement uses the sample means, which are appended to the end of the original data. The second SCATTER statement draws a red marker at the location of the sample mean.

You can use this same method to plot other sample statistics (such as the median) or to highlight special values such as the origin of a coordinate system.
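For example, here is a minimal variation on the PROC MEANS call above that computes the medians instead; the resulting Medians data set can be appended and plotted exactly as the Means data set was.

/* Same pattern, different statistic: request the median in the OUTPUT statement */
proc means data=&DSName noprint;
var &VarNames;
output out=Medians(drop=_TYPE_ _FREQ_) median= / autoname;
run;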

Overlay markers: Long form

In some situations it is more convenient to append the secondary data in "long form." In the long form, the secondary data set contains the same variable names as in the original data. You can use the SAS data step to create a variable that identifies the original and supplementary observations. This technique can be useful when you want to show multiple markers (sample mean, median, mode, ...) by using the GROUP= option on one SCATTER statement.

The following call to PROC MEANS does not use the AUTONAME option. Therefore the output data set contains variables that have the same name as the input data. You can use the IN= data set option to create an ID variable that identifies the data from the computed statistics:

/* Long form. New data has same name but different group ID */
proc means data=&DSName noprint;
var &VarNames;
output out=Means(drop=_TYPE_ _FREQ_) mean=;
run;
 
data Long;
set &DSName Means(in=newdata);
if newdata then 
   GroupID = "Mean";
else GroupID = "Data";
run;

The DATA step created the GroupID variable, which has the values "Data" for the original observations and the value "Mean" for the appended observations. This data structure is useful for calling PROC SGSCATTER, which supports the GROUP= option, but does not support multiple PLOT statements, as follows:

ods graphics / attrpriority=none;
proc sgscatter data=Long 
   datacontrastcolors=(steelblue firebrick)
   datasymbols=(Circle X);
plot (PetalLength PetalWidth)*(SepalLength SepalWidth) / group=groupID;
run;
[Figure: Scatter plot matrix with markers for sample means]

In conclusion, this article demonstrates a useful technique for adding markers to a graph. The technique requires that you concatenate the original data with supplementary data. Appending and merging data is a technique that is used often when creating ODS statistical graphics in SAS. It is a great technique to add to your programming toolbox.

tags: SAS Programming, Statistical Graphics, Tips and Techniques

The post Append data to add markers to SAS graphs appeared first on The DO Loop.