Data Management

August 18, 2017
 

SAS Viya deployments use credentials for accessing databases and other third-party products that require authentication. In this blog post, I will look at how this sharing of credentials is implemented in SAS Environment Manager.

In SAS Viya, domains are used to store the:

  • Credentials required to access external data sources.
  • Identities that are allowed to use those credentials.

There are three types of domains:

  • Authentication stores credentials that are used to access an external source that can then be associated with a caslib.
  • Connection used when the external database has been set up to require a User ID but no password.
  • Encryption stores an encryption key required to read data at rest in a path assigned to a caslib.

In this blog post we will focus on authentication domains which are typically used to provide access to data in a database management system. It is a pretty simple concept; an authentication domain makes a set of credentials available to a set of users. This allows SAS Viya to seamlessly access a resource. The diagram below shows a logical view of a domain. In this example, the domain PGAuth stores the credentials for a Postgres database, and makes those credentials available to two groups (and their members) and three users.

How does this work when a user accesses data in a database caslib? The following steps are performed:

1.     Log on to SAS Viya using personal credentials: the user’s identity is established including group memberships.

2.     Access a CASLIB for a database: using the user’s identity and the authentication domain of the CASLIB, Viya will look up the credentials associated with that identity in the domain.

3.     Two results are possible. A credential match is:

  • Found: the credentials are passed to the database authentication provider to determine access to the data.
  • Not found: no access to the data is provided.

To manage domains in SAS Environment Manager you must be an administrator. In SAS Environment Manager select Security > Domains. There are two views available:  Domains and Credentials. The Domains view lists all defined domains. You can access the credentials for a domain by right-clicking on the domain and selecting Credentials.

The Credentials view lists all credentials defined and the domains for which they are associated.

Whatever way you get to a credential, you can edit it by right-clicking and selecting Edit. In the edit dialog, you can specify the Identities (users and groups) that can use the credential, and the User ID and Password of the credential. Note that only users who are already listed in the Identities field will be able to edit this field, so make sure you are included in it (directly or through group membership) before saving.

To use an authentication domain, you reference it in the CASLIB definition. When defining a non-path based CASLIB you must select a domain to provide user credentials to connect to the database server. This can be done when creating a new CASLIB in SAS Environment Manager in the Data > Libraries area.

If you use code to create or access your caslib, use the authenticationdomain option. In this example, we specify authenticationdomain in the table.addcaslib action.
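A minimal CASL sketch of that idea is shown below. The caslib name, description, and connection settings are placeholders, and the parameter names should be checked against the table.addCaslib documentation for your release; only the authenticationDomain reference to the PGAuth domain from the earlier diagram ties back to this example.

/* A hedged sketch: registering a PostgreSQL caslib that picks up its credentials
   from an authentication domain. All names and connection settings are placeholders. */
proc cas;
   table.addCaslib /
      caslib="pgdata"                              /* name of the new caslib             */
      description="PostgreSQL data secured by PGAuth"
      dataSource={srctype="postgres",              /* PostgreSQL data connector          */
                  authenticationDomain="PGAuth",   /* domain that stores the credentials */
                  server="pgserver.example.com",
                  database="salesdb",
                  schema="public"};
quit;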

If a user is not attached to the authentication domain directly, or through a group membership, they will not be able to access the credentials. An error will occur when they attempt to access the data.

This has been a brief look at storing and using credentials to access databases from SAS Viya. You can find more detail in the SAS Viya Administration Guide.

SAS Viya sharing credentials for database access was published on SAS Users.

August 17, 2017
 

In this blog post I am going to cover how to import data into SAS Viya using Cloud Analytic Services (CAS) actions via the REST API. This lets you load data into a CASLib outside of the SAS Self-Service Import user interface. Once the data is loaded into CAS, it is available for use in applications such as SAS Visual Analytics and SAS Visual Data Builder.

Introduction

To import data into SAS Viya via REST API, you need to make a series of REST API calls:

1.     Start CAS Session
2.     Load Data into a CASLib
3.     End CAS Session

I will walk through these various REST API calls in the sections below using the REST API testing application HTTPRequestor, which is a free add-on to the Mozilla Firefox browser.

Before I perform any of my REST API calls, I need to Base-64 encode my credentials. The input for encoding is the credentials in the form userid:password. I used the site https://www.base64encode.org/ to encode my credentials. Note: You can use other methods (e.g., Python) to encode your credentials. Use the method preferred by your organization to ensure you are meeting its security protocols.
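As a hedged alternative that stays in SAS, the $BASE64X format can produce the same encoded string; the user ID and password below are placeholders.

/* A minimal sketch: Base-64 encode "userid:password" for the Authorization header.
   The credentials are placeholders, not real values. */
data _null_;
   length encoded $ 128;
   encoded = put("myuserid:mypassword", $base64x128.);   /* encode the literal     */
   put "Authorization: Basic " encoded;                  /* resulting header value */
run;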

Below is the header Authorization information I will be sending with each of my requests.

Authorization Header

1.     Start CAS Session

First, I need to start a CAS Session. Below is an example request for starting a CAS Session:

POST https://<YourCASServer:Port>/cas/sessions

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{}

This request returns the CASSessionUUID needed in the next step.

I construct my request in HTTPRequestor as follows and submit the request:

Start CAS Session Request/Response

Here is a screenshot of the raw transaction information.

Start CAS Session Raw Transaction

I need to copy the CAS Session UUID information that was returned for use in the subsequent REST API calls since their CAS Actions must be performed within a CAS Session.
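If you would rather script this step than use HTTPRequestor, here is a hedged PROC HTTP sketch; the server name, port, and encoded credentials are placeholders, and the JSON libname engine is used to pull the session UUID out of the response.

/* A hedged sketch of step 1 using PROC HTTP. Server, port, and credentials are
   placeholders; adjust for your environment. */
filename resp temp;

proc http
   url="https://mycasserver.example.com:8777/cas/sessions"
   method="POST"
   in="{}"
   out=resp;
   headers "Authorization"="Basic <Base-64EncodedCredentials>"
           "Content-Type"="application/json";
run;

/* The response body is JSON such as {"session":"<CASSessionUUID>"} */
libname resp json fileref=resp;

data _null_;
   set resp.root;
   call symputx('casSessionUuid', session);   /* keep the UUID for the next calls */
run;

%put CAS session UUID: &casSessionUuid;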

2.     Load Data into a CASLib

Now that I have started my CAS session and have its UUID, I can load the table to CAS. Below is an example request for the table.loadTable CAS Action:

POST https://<YourCASServer:Port>/cas/sessions/<CASSessionUUID>/actions/table.loadTable

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{"casLib":"<InputCASLib>","importOptions":{"fileType":"<FileType>"},"path":"<InputFilePathAndName>",
 "casout":{"caslib":"<OutputCASLib>","name":"<OutputTableName>","promote":true}}

 

This request returns a log message: “NOTE: Cloud Analytic Services made the file <InputFilePathAndName> available as table <OutputTableName> in caslib <OutputCASLib>.”

For my example, I will load the SAS data set BASEBALL located in the helpdata CASLib to the Public CASLib and call the CAS Table SAS_BASEBALL.  I am copying the data to the Public CASLib to make it more readily available to all CAS users. Let’s first confirm that the SAS_BASEBALL table does not currently exist in the Public CASLib.

Public CASLib Before LoadTable CAS Action Called

I construct my request in HTTPRequestor as follows and submit the request:

Load Table Request/Response

Here is a screenshot of the raw transaction information.

Load Table Raw Transaction
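Filled in for this example, the request body might look like the following; the physical file name baseball.sas7bdat and the basesas file type are assumptions about how the source data set is stored in the helpdata CASLib.

{"casLib":"helpdata","importOptions":{"fileType":"basesas"},"path":"baseball.sas7bdat",
 "casout":{"caslib":"Public","name":"SAS_BASEBALL","promote":true}}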

Next, I will confirm that the SAS_BASEBALL data set is now loaded in the Public CASLib.

Public CASLib After LoadTable CAS Action Called

The SAS_BASEBALL data set is now available for use in applications such as SAS Visual Analytics and SAS Visual Data Builder.

3.     End CAS Session

Finally, I need to terminate my CAS Session. Below is an example request for the session.endSession CAS Action:

POST https://<YourCASServer:Port>/cas/sessions/<CASSessionUUID>/actions/session.endSession

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{}

 

This request returns a status of 0 indicating there was no error and the CASSessionUUID specified in the request has ended.

I construct my request in HTTPRequestor as follows and submit the request:

End CAS Session Request/Response

Here is a screenshot of the raw transaction information.

End CAS Session Raw Transaction

Conclusion

These calls can be strung together so you could schedule their execution. For more information on SAS Viya and REST APIs, refer to the SAS Cloud Analytics REST API documentation.

Load Data into SAS Viya via REST API was published on SAS Users.

August 15, 2017
 

In this post we will look at the CAS physical data model, i.e., what features CAS offers for data storage, and how to use them to maximize performance in CAS (and consequently SAS Visual Analytics 8.1, too).

So, specifically let’s answer the question:

What CAS physical table storage features can we use to get better performance in CAS and SAS Visual Analytics/CAS?

CAS Physical Table Storage Features

The following data storage features affect how CAS tables are physically structured:

  • Compression
  • Partitioning
  • Sorting
  • Repeated Tables
  • Extended Data Types (Varchar)
  • User Defined Formats

Compression — the Storage Option that Degrades Performance

CAS tables live in memory, so compression mainly buys a smaller memory footprint: compressed tables must be decompressed every time they are accessed, which costs CPU time and slows analyses. Use compression to fit more data into memory rather than to speed anything up. It can be enabled with the compress= data set option:

data public.MegaCorp (compress=yes);
   set baselib.MegaCorp;
run;

Partitioning and Sorting

Partitioning is a powerful tool for improving BY-group processing performance. Many SAS Visual Analytics objects (e.g., Bar Charts, Decision Trees, Linear Regressions) provide grouping as well as classification functionality.

When performing analyses/processing, CAS first groups the data into the required BY-groups. Pre-partitioning on commonly-used BY-groups means CAS can skip this step, vastly improving performance.

Within partitions, tables can be sorted by non-partition-key variables. Pre-sorting by natural ordering variables (e.g. time) allows CAS to skip the ordering step in many cases just like partitioning allows CAS to skip the grouping step.

For a full use-case, consider a line graph that groups sales by region and plots by date. This graph object would benefit greatly from a CAS table that is pre-partitioned by region and pre-sorted by date.

Join Optimization

Partitioning can also support join operations, since both the DATA Step MERGE and the CAS FedSQL Merge Join algorithms utilize BY-GROUP operations to support their processing.

Pre-partitioning tables in anticipation of joins will greatly improve join performance. A good use case is partitioning both a large transaction table and an equally large reference table (e.g. an enormous Customer table) by the common field, customerID. When a DATA Step MERGE or a FedSQL join is performed between the two tables on that field, the join/merge will take advantage of partitioning for the BY-GROUP operation resulting in something similar to a partition-wise join.
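A hedged sketch of that use case is shown below; the caslib (mycas) and table names are placeholders.

/* A hedged sketch: pre-partition both tables by customerID, then MERGE them in CAS.
   The mycas caslib and the table names are placeholders. */
data mycas.transactions_p (partition=(customerID));
   set mycas.transactions;
run;

data mycas.customer_p (partition=(customerID));
   set mycas.customer;
run;

data mycas.trans_customer;
   merge mycas.transactions_p (in=intrans) mycas.customer_p;
   by customerID;                 /* BY-group operation exploits the partitions */
   if intrans;                    /* keep only transaction rows                 */
run;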

Like Compression, partitioning and sorting can be implemented via CAS actions as well as data set options. Using the data set options is demonstrated below:

data mycas.bigOrderTable (partition=(region division) orderby=(year quarter month));
   set CASorBase.bigOrderTable;
run;

Repeated Tables

By default, in distributed CAS Server deployments, CAS divides incoming tables into blocks and distributes those blocks among its worker nodes. A table can instead be repeated, that is, copied in full to every worker node, using the DUPLICATE data set option or the REPEAT option of PROC CASUTIL. Repeated tables have two main use-cases in CAS:

1.     Join Optimization
2.     Small Table Operation Optimization

Join Optimization

For join operations, the default data distribution scheme can result in significant network traffic as matching records from the two tables travel between worker nodes to meet. If one of the two tables was created with the DUPLICATE/REPEAT option, then every possible record from that table is available on every node to the other table. There is no need for any network traffic.

Small Table Operation Optimization

For small tables, even single table operations can perform better with repeated instead of divided distribution. LASR actually implemented the “High Volume Access to Smaller Tables” feature for the same reason. When a table is repeated, CAS runs any required operation on a single worker node against the full copy of the table that resides there, instead of distributing the work.

As stated, repeated tables can be implemented with the DUPLICATE data set option; they can also be implemented with the REPEAT option on the PROC CASUTIL LOAD statement. The CASUTIL method is shown below:

proc casutil;
   load data=sashelp.prdsale outcaslib="caspath"
           casout="prdsale" replace REPEAT;
quit;

Extended Data Types (VARCHAR)

With Viya 3.2 comes SAS’ first widespread implementation of variable length character fields. While Base SAS offers variable length character fields through compression, Viya 3.2 is the first major SAS release to include a true variable-length character data type, VARCHAR. Not only does VARCHAR save storage space, it also improves performance by reducing the size of the record being processed. CAS, like any other processing engine, will process narrower records more quickly than wide records.
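As a hedged sketch (the caslib, table, and column names are placeholders), a wide fixed-width character column can be stored as VARCHAR while copying a table within CAS:

/* A hedged sketch: keep a 200-byte description column as VARCHAR so only the
   characters actually present are stored. Names are placeholders. */
data mycas.customer_slim;
   length description_v varchar(200);   /* variable-length character column */
   set mycas.customer;
   description_v = description;         /* copy the value                   */
   drop description;                    /* drop the fixed-width original    */
run;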

User Defined Formats

User defined formats (UDFs) exist in CAS in much the same way they do in Base SAS. Their primary function, of course, is to provide display formatting for raw data values. Think about a format for direction. The raw data might be: “E”, “W”, “N”, “S” while the corresponding format values might be “East”, “West”, “North”, “South.”
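A hedged sketch of defining that direction format and publishing it to CAS might look like this; the session name and format library name are assumptions.

/* A hedged sketch: define the direction format and make it available to the CAS
   session. The session name and casfmtlib value are assumptions. */
cas mysession;

proc format casfmtlib="userformats";
   value $dirfmt
      'E' = 'East'
      'W' = 'West'
      'N' = 'North'
      'S' = 'South';
run;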

So how might user defined formats improve performance in CAS? The same way they do in Base SAS, and the same way that VARCHAR does: by reducing the size of the record that CAS has to process. Imagine replacing multiple 200-byte description fields with 1-byte codes. If you had 10 such fields, the record length would decrease by 1,990 bytes ((10 x 200) – 10). This is an extreme example but it illustrates the point: user defined formats can reduce the amount of data that CAS has to process and, consequently, will lead to performance gains.

CAS data modeling for performance was published on SAS Users.

August 8, 2017
 

If I’m having a tough week, I might feel sorry for myself for a minute or two. But to snap myself out of it, all I have to do is think of the data managers. Those folks have to contend with duplicate records, bad information, data proliferation, accountability, permissions, privacy, [...]

Data management headaches? We have the cure was published on SAS Voices by Nick Lisi

June 21, 2017
 

IT organizations today are constantly challenged to do more with less. Reusing data processing jobs and employing best practices in monitoring the health of your data are proven ways to improve the productivity of data professionals. Dataflux Data Management Studio is a component of both the SAS Data Quality and the SAS Data Management offerings that allows you to create data processing jobs to integrate, cleanse and monitor your data.

You can write global functions for SAS Data Management jobs that can be reused in any expression in the system, in either data or process flow jobs. Global functions can be called from expression nodes, monitor rules, profile filters, surviving record indicators, process flow if-nodes and more.

Global functions are defined in a text file and saved in the Data Management installation directory under “etc/udf” in Data Management Studio or Data Management Server respectively.

Each global function has to have a unique name, is wrapped in a function / end function block, can process any number of input parameters, and returns a single value of integer, real, date, string or boolean type.

Hello World

For a start, let’s create a “hello world” function.

  • If it does not exist, create a folder in the installation directory under “etc/udf” (DM Studio and DM Server).
  • In “etc/udf” create a file named hello_world.txt.
  • In the hello_world file, create the function as follows:
function hello_world return string
return "hello world!"
end function
  • Save the file and restart DM Studio, if necessary, in order to use hello_world().

The new function is fully integrated in Data Management Studio. You can see the new function in an expression node under Function->Other or as expression language help in the expression node editor.

Handling Parameters

Global functions can handle any number of parameters. Parameter helper functions are available to access input parameters inside a function:

  • parametercount() returns the number of parameters that have been passed into the function call. This is helpful if the incoming parameters are unknown.
integer i
for i = 1 to parametercount() 
begin
   // process each parameter
end

 

  • parametertype(integer) returns the type of the parameter for the given parameter position. The first parameter is 1. The return value will either be integer, real, date, string or Boolean.
  • parameterstring(integer), parameterinteger(integer), parameterboolean(integer), parameterdate(integer), parameterreal(integer) these functions return the value of the parameter as specified by position, or null if the parameter doesn’t exist. You can use these functions if you know the incoming parameter type at a given position.
  • parameter(integer) returns the value of the parameter as specified by position, or null if the parameter doesn’t exist. If you don’t know the incoming parameter type you can use this function. Note: Using the parameter() function may require additional overhead to coerce the values to the field type. Using the specific data type parameter functions above will eliminate the cost of coercion.

Global Function Example

This global function will check if password rules are followed.

//////////////////////////////////////////////////////////////////////////
// Function:     check_psw_rule
// Inputs:       string
// Output:       boolean -> true == passed check; false == failed check
// Description:  Check the rules for password. The rules are:
//               Need to be at least 8 characters long
//               Need to have at least one lower case character
//               Need to have at least one upper case character
//               Need to have at least one number
//////////////////////////////////////////////////////////////////////////
function check_psw_rule return boolean
	string check_str
	boolean rc
	regex r
 
	check_str= parameterstring(1)   //copy input parameter to variable
 
	rc= false                       //set default return value to failed (false)
	if(len(check_str) < 8)          //check if at least 8 characters
		return rc
	r.compile("[a-z]")              
	if (!r.findfirst(check_str))    //check if at least one lower case character
		return rc
	r.compile("[A-Z]")
	if (!r.findfirst(check_str))    //check if at least one upper case character
		return rc
	r.compile("[0-9]")
	if (!r.findfirst(check_str))    //check if at least one number
		return rc
	rc= true                        //return true if all checks passed
	return rc
end function

 

This function can be called from any expression in a Data Management job:

boolean  check_result
check_result= check_psw_rule(password)

Global functions can also call other global functions

Just a few things to be aware of. There is a late binding process, which means if function B() wants to call function A(), then function A() needs to be loaded first. The files that global functions are stored in are loaded alphabetically by file name, so the file name containing function A() has to occur alphabetically before the file name containing function B().

Best Practices

Here are some best practice tips which will help you to be most successful writing global functions:

  1. Create one file per expression function.
    This allows for global functions to easily be deployed and shared.
  2. Use lots of comments.
    Describe the function’s purpose, expected parameters, and outputs to improve the readability and reusability of your code.
  3. Test the expressions in data jobs first.
    Write a global function body as an expression first and test it via preview. This way it is easier to find typos, syntax errors and to ensure that the code is doing what you would like it to do.
  4. Debugging - If the global function is not loading, check the platform_date.log.  For Studio, this could for example be found under: C:\Users\<your_id>\AppData\Roaming\DataFlux\DMStudio\studio1

You now have a taste of how to create reusable functions in Data Management Studio to help you both improve the quality of your data as well as improve the productivity of your data professionals. Good luck and please let us know what kind of jobs you are using to help your organization succeed.

Writing your own functions in SAS Data Quality using Dataflux Data Management Studio was published on SAS Users.

June 21, 2017
 

Today in higher education, savvy users expect to have the information they need to make data-informed decisions at their fingertips. As such, leaders in institutional research (IR) are under pressure to provide these users with accurate data, reports and analyses. IR has been tasked with transforming data and reports in [...]

6 examples of data management, reporting and analytics in higher education was published on SAS Voices by Georgia Mariani

April 28, 2017
 

When developing SAS applications, you can feed database tables into your application by using the libname access engine, either by directly referencing a database table or via SAS or database views that themselves refer to one or more of the database tables.

However, such on-the-fly data access may not be efficient enough, especially for interactive SAS applications where system response time is critical. In the case of distributed systems, where a database might reside on one server and the SAS Application server on a different physical server, user wait time caused by delays in data access across networks and databases might become intolerable. In such cases, it makes perfect sense to extract database tables and load them into SAS tables in advance, preferably placing them on (or close to) the SAS Application server. That will ensure higher operational efficiency (responsiveness) of the interactive SAS application.

SAS Data Integration Studio provides a powerful visual design tool for building, implementing, and managing data integration processes across different data sources, applications, and platforms.

However, in case of multiple tables, loading them one by one using even such a powerful tool might become a bit tedious. In an automation paradigm, being “visual” is not necessarily a good thing; when we automate it implies that we want to get things done without visualizing or even seeing them.

Here is a SAS coding approach, which I recently used for a customer project, to automate the repetitive process.

SAS code example of loading multiple Oracle tables into SAS tables

There were several Oracle tables that needed to be extracted and loaded into SAS data tables on a different physical server. The Oracle tables contained multi-year data, but we were only interested in the latest year’s data. That is those tables that have datetime stamp EVENT_DT column needed to be subset to the latest rolling 365 days.

Step 1. Creating a driver table

In order to make this process data-driven, let’s first create a driver table containing a list of the table names that needed to be extracted and loaded:

/* --------------------------------------------------- */
/* create table list to extract & load,                */
/* dt_flag indicates whether EVENT_DT variable present */
/* --------------------------------------------------- */
 
libname parmdl '/sas/data/parmdata';
 
data parmdl.tablelist;
        length tname $8 dt_flag $1;
        input tname dt_flag;
        datalines;
ADDRESS N
ACCOUNT Y
BENEFIT Y
EXCLUSN N
FINANCE Y
HOUSING Y
;

This program runs just once to create the driver table parmdl.tablelist.

Step 2. Loading multiple tables

The following data-driven program runs every time when you need to reload Oracle tables into SAS:

/* source ORACLE library */
libname oralib oracle path="xxx" schema="yyy" user="uuu"
 	PASSWORD="{SAS002}D2DF612A161F7F874C4EF97F" access=readonly;
 
/* target SAS library */
libname sasdl '/sas/data/appdata';
 
/* driver table SAS library */
libname parmdl '/sas/data/parmdata';
 
options symbolgen mprint mlogic;
%let cutoff_date = %eval(%sysfunc(today()) - 365);
 
/* --------------------------------------------------- */
/* loop through table list to extract & load           */
/* --------------------------------------------------- */
%macro ETL;
   %let dsid = %sysfunc(open(parmdl.tablelist));
   %syscall set(dsid);
   %do %while(%sysfunc(fetch(&dsid)) eq 0);
      data sasdl.&tname;
         set oralib.&tname;
         %if &dt_flag eq Y %then %str(where datepart(EVENT_DT) ge &cutoff_date;);
      run;
   %end;
   %let rc = %sysfunc(close(&dsid));
%mend ETL;
%ETL

In this code, we use a similar coding technique to that described in my earlier blog post Modifying variable attributes in all datasets of a SAS library. We loop through the initially created table parmdl.tablelist in a macro %do-loop, and for each observation containing a single table name, implement a data step extracting one table at a time.

If macro variable &dt_flag=Y, then we generate and insert a where statement sub-setting the extracted table to the latest year’s data.

Note a very useful feature here: %syscall set(dsid) links SAS data set variables to macro variables that have the same name and data type. Since we have not defined those macro variables, in essence it creates a namesake macro variable for each SAS data set variable.

Implementing data-driven load in SAS Data Integration Studio

The above SAS code is run on the SAS Application server, the same server where the SAS target library resides. To fully automate the ETL process, this code may be embedded into a SAS job in SAS Data Integration Studio and then scheduled via either Platform Suite for SAS or your operating system scheduler. Alternatively, it can be run in batch mode or via a script that in turn can be scheduled by the operating system scheduler.

In SAS DI Studio:

1.  Create a New job and give it a meaningful name (e.g. Loading Oracle Tables into SAS Tables.)

2.  Drag & drop User Written Code transformation onto the SAS Job canvas and give it a meaningful name (e.g. ETL macro):

SAS Data Integration Studio Job

3.  Go to the Properties of this transformation and click on the Code tab, then paste your SAS code in the text entry area:

SAS DI job properties

4.  Click OK and Save your SAS Job.

Note, that we do not add Source and Target tables to the Job canvas, only User Written Code transformation. Also note, that the %ETL macro has no direct references to the database table names being extracted and loaded into SAS datasets. Instead, it receives the names of the tables to be loaded from the PARMDL.TABLELIST dataset. This makes it extremely flexible to the task of processing a completely different set of tables; all you need to do is update the table list in the driver table. No change to your SAS DI job is needed.

Now it's your turn

What is your experience with automating ETL processes? Please feel free to share it here in the Comments section.

 

Automating the loading of multiple database tables into SAS tables was published on SAS Users.

March 9, 2017
 

SAS® Federation Server provides a central, virtual environment for administering and securing access to your data. It also allows you to combine data from multiple sources without moving or copying the data. SAS Federation Server Manager, a web-based application, is used to administer SAS Federation Server(s).

Data privacy is a major concern for organizations, and one of the features of SAS Federation Server is that it allows you to effectively and efficiently control access to your data, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, names, etc. In this three-part blog series, I explore the topic of controlling data access using SAS Federation Server.

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server, and SAS Federation Server Manager is used to help control access to the data. Note: Permissions applied for a particular data source cannot be bypassed with SAS Federation Server security. If permissions are denied at the source data, for example on a table, then users will always be denied access to that table, no matter what permissions are set in SAS Federation Server.

In this post, I will build on the examples from my previous articles and demonstrate how you can use data masking to conceal actual data values from users, but still allow them access for analysis and reporting purposes.

In previous posts, I gave the Finance Users group access to the SALARY table. Linda is a member of the Finance Users group, so currently she has access to the SALARY table.

However, I want to restrict her access. She needs access to the Salary info for analytic purposes, but does not need to know the identifying data of IDNUM, so I can hide that column from her. She does need the JOBCODE information for her analytics; however, she does not need to know the actual JOBCODE information associated with the record, so that data can be masked to prevent her from viewing that identifying information.

First, I create a FedSQL View of the SALARY table. FedSQL is the implementation of SQL that SAS Federation Server uses to access relational data.  For the view, I set the Security to Use the definer’s privileges when accessed since I will eventually deny Linda the rights to view the underlying table to the view.

Here is the default code for the view:

I change the code to remove the IDNUM column from the view and mask the JOBCODE column, so Linda will not know the real JOBCODE associated with the salary.

There are several data masking functions available for use. In this instance, I use the TRANC function to mask the JOBCODE field using transliterated values by replacing the first three characters with other values.  Refer to the Data Masking section of the SAS Federation Server Manager 4.2: User’s Guide for more information on the different data masking functions.
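A hedged sketch of the modified view is shown below; the schema name is a placeholder and the exact SYSCAT.DM.MASK arguments are an assumption to verify against the Data Masking section referenced above.

/* A hedged sketch only: drop IDNUM from the view and mask JOBCODE. The schema
   name is a placeholder and the SYSCAT.DM.MASK call is an assumption. */
CREATE VIEW "HR"."SALARY_VIEW" AS
   SELECT
      SYSCAT.DM.MASK('TRANC', t."JOBCODE") AS "JOBCODE",   /* transliterated mask */
      t."SALARY"                                           /* IDNUM is omitted    */
   FROM "HR"."SALARY" t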

Now that I have created the FedSQL view, I then need to grant Linda authorization to it.

Next, I need to deny Linda authorization to the SALARY table, so she won’t be able to access the original table.

Linda is only able to view the SALARY_VIEW with the IDNUM column removed and the JOBCODE information masked.

Linda is denied access to the SALARY table.

However, Kate, another member of the Finance team, is able to view the full SALARY table with the IDNUM column and the real (non-masked) information in the JOBCODE column.

In this blog entry, I covered the third part of this series on controlling data access with SAS Federation Server 4.2.

For more information, visit the SAS Federation Server product documentation.

Securing sensitive data using SAS Federation Server data masking was published on SAS Users.