Data Management

4月 282017
 

ETL automationWhen developing SAS applications, you can feed database tables into your application by using the libname access engine either by directly referring a database table, or via SAS or database views that themselves refer to one or more of the database tables.

However, such on-the-fly data access may not be efficient enough, especially for interactive SAS applications when system response time is critical. In case of distributed systems where a database might reside on one server, and the SAS Application server – on a different physical server, user wait time caused by delays in data access across networks and databases might become intolerable. In such cases, it makes perfect sense to extract database tables and load them into SAS tables in advance, preferably placing them on (or close to) the SAS Application server. That will ensure higher operational efficiency (responsiveness) of the interactive SAS application.

SAS Data Integration Studio provides a powerful visual design tool for building, implementing, and managing data integration processes across different data sources, applications, and platforms.

However, in case of multiple tables, loading them one by one using even such a powerful tool might become a bit tedious. In an automation paradigm, being “visual” is not necessarily a good thing; when we automate it implies that we want to get things done without visualizing or even seeing them.

Here is a SAS coding approach, which I recently used for a customer project, to automate the repetitive process.

SAS code example of loading multiple Oracle tables into SAS tables

There were several Oracle tables that needed to be extracted and loaded into SAS data tables on a different physical server. The Oracle tables contained multi-year data, but we were only interested in the latest year’s data. That is those tables that have datetime stamp EVENT_DT column needed to be subset to the latest rolling 365 days.

Step 1. Creating a driver table

In order to make this process data-driven, let’s first create a driver table containing a list of the table names that needed to be extracted and loaded:

/* --------------------------------------------------- */
/* create table list to extract & load,                */
/* dt_flag indicates whether EVENT_DT variable present */
/* --------------------------------------------------- */
 
libname parmdl '/sas/data/parmdata';
 
data parmdl.tablelist;
        length tname $8 dt_flag $1;
        input tname dt_flag;
        datalines;
ADDRESS N
ACCOUNT Y
BENEFIT Y
EXCLUSN N
FINANCE Y
HOUSING Y
;

This program runs just once to create the driver table parmdl.tablelist.

Step 2. Loading multiple tables

The following data-driven program runs every time when you need to reload Oracle tables into SAS:

/* source ORACLE library */
libname oralib oracle path="xxx" schema="yyy" user="uuu"
 	PASSWORD="{SAS002}D2DF612A161F7F874C4EF97F" access=readonly;
 
/* target SAS library */
libname sasdl '/sas/data/appdata';
 
/* driver table SAS library */
libname parmdl '/sas/data/parmdata';
 
options symbolgen mprint mlogic;
%let cutoff_date = %eval(%sysfunc(today()) - 365);
 
/* --------------------------------------------------- */
/* loop through table list to extract & load           */
/* --------------------------------------------------- */
%macro ETL;
   %let dsid = %sysfunc(open(parmdl.tablelist));
   %syscall set(dsid);
   %do %while(%sysfunc(fetch(&dsid)) eq 0);
      data sasdl.&tname;
         set oralib.&tname;
         %if &dt_flag eq Y %then %str(where datepart(EVENT_DT) ge &cutoff_date;);
      run;
   %end;
   %let rc = %sysfunc(close(&dsid));
%mend ETL;
%ETL

In this code, we use a similar coding technique to that described in my earlier blog post Modifying variable attributes in all datasets of a SAS library. We loop through the initially created table parmdl.tablelist. in a macro %do-loop, and for each observation containing a single table name, implement a data step extracting one table at a time.

If macro variable &dt_flag=Y, then we generate and insert a where statement sub-setting the extracted table to the latest year’s data.

Note a very useful feature here %syscall set(dsid) that links SAS data set variables to macro variables that have the same name and data type. Since we have not defined those macro variables, in essence, it creates a namesake macro variable for each SAS data set variable.

Implementing data-driven load in SAS Data Integration Studio

The above SAS code is run on the SAS Application server – the same server where the SAS target library resides. To fully automate the ETL process, this code may be embedded into a SAS job in SAS Data Integration Studio and then scheduled via either Platform Suite for SAS or your operating system scheduler. Alternately, it can be run in batch mode or via script that that in turn can be scheduled by the operating system scheduler.

In SAS DI Studio:

1.  Create a New job and give it a meaningful name (e.g. Loading Oracle Tables into SAS Tables.)

2.  Drag & drop User Written Code transformation onto the SAS Job canvas and give it a meaningful name (e.g. ETL macro):

SAS Data Integration Studio Job

3.  Go to the Properties of this transformation and click on the Code tab, then paste your SAS code in the text entry area:

SAS DI job properties

4.  Click OK and Save your SAS Job.

Note, that we do not add Source and Target tables to the Job canvas, only User Written Code transformation. Also note, that the %ETL macro has no direct references to the database table names being extracted and loaded into SAS datasets. Instead, it receives the names of the tables to be loaded from the PARMDL.TABLELIST dataset. This makes it extremely flexible to the task of processing a completely different set of tables; all you need to do is update the table list in the driver table. No change to your SAS DI job is needed.

Now it's your turn

What is your experience with automating ETL processes? Please feel free to share it here in the Comments section.

 

Automating the loading of multiple database tables into SAS tables was published on SAS Users.

3月 092017
 

SAS® Federation Server provides a central, virtual environment for administering and securing access to your data. It also allows you to combine data from multiple sources without moving or copying the data. SAS Federation Server Manager, a web-based application, is used to administer SAS Federation Server(s).

Data privacy is a major concern for organizations and one of the features of SAS Federation Server is it allows you to effectively and efficiently control access to your data, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, names, etc. In this three-part blog series, I will explore the topic of controlling data access using SAS Federation Server. The series covers the following topics:

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server and SAS Federation Server Manager is used to help control access to the data. SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server and SAS Federation Server Manager is used to help control access to the data.  Note: Permissions applied for a particular data source cannot be bypassed with SAS Federation Server security. If permissions are denied at the source data, for example on a table, then users will always be denied access to that table, no matter what permissions are set in SAS Federation Server.

In this post, I will build on the examples from my previous articles and demonstrate how you can use data masking to conceal actual data values from users, but still allow them access for analysis and reporting purposes.

In previous posts, I gave the Finance Users group access to the SALARY table. Linda is a member of the Finance Users group, so currently she has access to the SALARY table.

However, I want to restrict her access. She needs access to the Salary info for analytic purposes, but does not need to know the identifying data of IDNUM, so I can hide that column from her. She does need the JOBCODE information for her analytics; however, she does not need to know the actual JOBCODE information associated with the record, so that data can be masked to prevent her from viewing that identifying information.

First, I create a FedSQL View of the SALARY table. FedSQL is the implementation of SQL that SAS Federation Server uses to access relational data.  For the view, I set the Security to Use the definer’s privileges when accessed since I will eventually deny Linda the rights to view the underlying table to the view.

Here is the default code for the view:

I change the code to the following to remove the IDNUM column from the view and mask the JOBCODE column, so Linda will not know what is the real JOBCODE associated with the Salary.

There are several data masking functions available for use. In this instance, I use the TRANC function to mask the JOBCODE field using transliterated values by replacing the first three characters with other values.  Refer to the Data Masking section of the SAS Federation Server Manager 4.2: User’s Guide for more information on the different data masking functions.

Now that I have created the FedSQL view, I then need to grant Linda authorization to it.

Next, I need to deny Linda authorization to the SALARY table, so she won’t be able to access the original table.

Linda is only able to view the SALARY_VIEW with the IDNUM column removed and the JOBCODE information masked.

Linda is denied access to the SALARY table.

However, Kate another member of the Finance team is able to view the full SALARY table with the IDNUM column and the real information (non-masked) in the JOBCODE column.

In this blog entry, I covered the third part of this series on controlling data access to SAS Federation Server 4.2.  Other blogs in the series include

For more information on SAS Federation Server visit the:

Securing sensitive data using SAS Federation Server data masking was published on SAS Users.

2月 112017
 

People come from all over the world to attend this highlight of the season. It’s been a tradition for decades. Hotels book months in advance. Traffic is horrendous in the city center. The coveted tickets can cost thousands of dollars, but tens of thousands of people are lucky enough to score them. In […]

It's February. Game On! was published on SAS Voices.

2月 112017
 

People come from all over the world to attend this highlight of the season. It’s been a tradition for decades. Hotels book months in advance. Traffic is horrendous in the city center. The coveted tickets can cost thousands of dollars, but tens of thousands of people are lucky enough to score them. In […]

It's February. Game On! was published on SAS Voices.

1月 302017
 

I’ve had several meetings lately on data management, and especially integration, where the ability to explore alternatives has been critical. And the findings from our internet of things (IoT) early adopters survey confirms that the ecosystem nature of data sources in IoT deployments means we need to expand the traditional […]

It’s data integration, but not as we know it was published on SAS Voices.

12月 202016
 

Data governance seems to be the hottest topic at data-related conferences this year, and the question I get asked most often is, “where do we start?” Followed closely by how do we do it, what skills do we need, how do we convince the rest of the organisation to get […]

4 steps to get started with data governance was published on SAS Voices.

12月 162016
 

“Analytics” and “data scientist” aren’t new terms, but they are trending buzzwords. The popularity of these concepts has created a false impression: Analytics are mysterious abstractions that can only be decoded if you have a white lab coat and an advanced degree in computer science. The reality couldn’t be more different. […]

No data scientist? No analytics platform? No problem. was published on SAS Voices.

12月 142016
 

The insurance industry is becoming increasingly focused on the digitalization of its business processes. There are many factors driving digitalization, but it’s clear that a reliable and meaningful database is the basic prerequisite successful digitalization strategy. Insurance companies are increasingly prioritizing digitalization, not because this issue is currently "in," but […]

Drivers for the digitalization of insurance was published on SAS Voices.