Mary Kathryn Queen

March 9, 2017
 

SAS® Federation Server provides a central, virtual environment for administering and securing access to your data. It also allows you to combine data from multiple sources without moving or copying the data. SAS Federation Server Manager, a web-based application, is used to administer SAS Federation Server(s).

Data privacy is a major concern for organizations, and one of the features of SAS Federation Server is that it allows you to control access to your data effectively and efficiently, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, and names. In this three-part blog series, I will explore the topic of controlling data access using SAS Federation Server. The series covers the following topics:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server, and SAS Federation Server Manager is used to help control access to the data. Note: Permissions applied to a particular data source cannot be bypassed with SAS Federation Server security. If permissions are denied at the source data, for example on a table, then users will always be denied access to that table, no matter what permissions are set in SAS Federation Server.

In this post, I will build on the examples from my previous articles and demonstrate how you can use data masking to conceal actual data values from users, but still allow them access for analysis and reporting purposes.

In previous posts, I gave the Finance Users group access to the SALARY table. Linda is a member of the Finance Users group, so currently she has access to the SALARY table.

However, I want to restrict her access. She needs the salary information for analytic purposes, but she does not need the identifying IDNUM data, so I can hide that column from her. She also needs the JOBCODE information for her analytics, but she does not need to know the actual JOBCODE value associated with each record, so that column can be masked to prevent her from viewing that identifying information.

First, I create a FedSQL view of the SALARY table. FedSQL is the implementation of SQL that SAS Federation Server uses to access relational data. For the view, I set the Security option to Use the definer's privileges when accessed, since I will eventually deny Linda the rights to view the view's underlying table.

Here is the default code for the view:
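A minimal sketch follows (the catalog and schema names here follow the earlier posts in this series and are assumptions; the code that Federation Server Manager generates may differ):

    /* Default view definition: select every column from the underlying table */
    CREATE VIEW "Global"."Employee_Info"."SALARY_VIEW" AS
       SELECT * FROM "Global"."Employee_Info"."SALARY"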

I change the code to the following to remove the IDNUM column from the view and mask the JOBCODE column, so Linda will not know the real JOBCODE associated with the salary.
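A sketch of the revised view using the SYSCAT.DM.MASK function with the TRANC rule (the additional name/value arguments that restrict the transliteration to the first three characters are omitted here, and the remaining column names are assumptions):

    /* IDNUM is simply left out of the select list, which hides it from view users */
    CREATE VIEW "Global"."Employee_Info"."SALARY_VIEW" AS
       SELECT SYSCAT.DM.MASK('TRANC', "JOBCODE") AS "JOBCODE",  /* transliterated mask */
              "SALARY"
       FROM "Global"."Employee_Info"."SALARY"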

There are several data masking functions available for use. In this instance, I use the TRANC function to mask the JOBCODE field with transliterated values, replacing the first three characters with other values. Refer to the Data Masking section of the SAS Federation Server Manager 4.2: User’s Guide for more information on the different data masking functions.

Now that I have created the FedSQL view, I need to grant Linda authorization to it.

Next, I need to deny Linda authorization to the SALARY table, so she won’t be able to access the original table.

Linda is only able to view the SALARY_VIEW with the IDNUM column removed and the JOBCODE information masked.

Linda is denied access to the SALARY table.

However, Kate, another member of the Finance team, is able to view the full SALARY table, with the IDNUM column and the real (non-masked) information in the JOBCODE column.

In this blog entry, I covered the third part of this series on controlling data access using SAS Federation Server 4.2. Other blogs in the series include:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level

For more information on SAS Federation Server:

Securing sensitive data using SAS Federation Server data masking was published on SAS Users.

February 1, 2017
 

SAS® Federation Server provides a central, virtual environment for administering and securing access to your data. It also allows you to combine data from multiple sources without moving or copying the data. SAS Federation Server Manager, a web-based application, is used to administer SAS Federation Server(s).

Data privacy is a major concern for organizations, and one of the features of SAS Federation Server is that it allows you to control access to your data effectively and efficiently, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, and names. In this three-part series, I will explore the topic of controlling data access using SAS Federation Server.

The series covers the following topics:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server, and SAS Federation Server Manager is used to help control access to the data. Note: Permissions applied to a particular data source cannot be bypassed with SAS Federation Server security. If permissions are denied at the source data, for example on a table, then users will always be denied access to that table, no matter what permissions are set in SAS Federation Server.

In this blog post, I build on the example in my previous post and demonstrate how you can use SAS Federation Server Manager to control access to columns and rows in tables and views.

Previously, I gave the Finance Users group access to the SALARY table. Robert is a member of the Finance Users group, so he has access to the SALARY table; however, I want to restrict his access to the IDNUM column on the table. To do this, first I view the SALARY table Authorizations in Federation Server Manager, then I select the arrow to the right of the table name to view its columns.

Next, I select the IDNUM column. I then add the user Robert and set his SELECT permission to Deny for the column.

Note: There are 5 columns on the SALARY table.
Since he was denied access to the IDNUM column, Robert is only able to view 4 out of 5 columns.

Susan is also a member of the Finance Users group, so she has access to the SALARY table; however, I want to restrict her access to only rows where the JOBCODE starts with a “Q.” To do this, first I view the SALARY table Authorizations in Federation Server Manager.

Next, I select the Row Authorizations tab and select New Filter. I use the SQL Clause Builder to build my condition of JOBCODE LIKE 'Q%'.
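The saved filter is an ordinary SQL predicate. Conceptually, the rows Susan can see are the ones the following query would return (a sketch; the catalog and schema names follow the first post in this series):

    SELECT *
       FROM "Global"."Employee_Info"."SALARY"
       WHERE "JOBCODE" LIKE 'Q%'   /* only rows whose JOBCODE starts with Q */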

Next, I select the Users and Groups tab and add Susan to restrict her access to the filter I just created.

Finally, I select OK to save the changes I made to Row Authorizations.

Susan is now only able to view the rows of the SALARY table where the JOBCODE begins with “Q.”

In this blog entry, I covered the second part of this series on Securing sensitive data using SAS Federation Server at the row and column level:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

More information on SAS Federation Server:

tags: SAS Administrators, SAS Federation Server, SAS Professional Services

Securing sensitive data using SAS Federation Server at the row and column level was published on SAS Users.

December 1, 2016
 

Data virtualization is an agile way to provide virtual views of data from multiple sources without moving the data. Think of data virtualization as another arrow in your quiver in terms of how you approach combining data from different sources to augment your existing Extract, Transform, and Load (ETL) batch processes. SAS® Federation Server is a unique data virtualization offering that provides not only blending of data, but also on-demand data masking, encryption, and cleansing of the data. It provides a central, virtual environment for administering and securing access to your Personally Identifiable Information (PII) and other data.

Data privacy is a major concern for organizations, and one of the features of SAS Federation Server is that it allows you to control access to your data effectively and efficiently, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, and names. In this three-part blog series, I will explore the topic of controlling data access using SAS Federation Server. The series will cover the following topics:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server and SAS Federation Server Manager is used to help control access to the data. In this blog, I want to explore controlling data access to specific sources of data using SAS Federation Server.  Obviously, you can secure data at its source by using secured metadata-bound libraries in SAS Metadata Server or by using a database’s or file’s own security mechanisms. However, SAS Federation Server can be used to control access to these data sources by authenticating with the users and groups in SAS Management Console and setting authorizations within SAS Federation Server Manager.

In order to show how SAS Federation Server can be used to control access to data, I will explore an example where the Finance Users in our fictitious company SHOULD have access to the Salary data in a SAS data set, but our Business Users should NOT. Instead, our Business Users should have access to all other BASE tables with the exception of SALARY. In my scenario, Kate is a Finance User and David and Sally are Business Users. These users have already been set up as such in SAS Metadata Server.

The SAS Federation Server Administrator has setup the BASE catalog and schema information in Federation Server Manager. The SALARY table is located in the Employee_Info schema within the Global catalog.


The SAS Federation Server Administrator has also explicitly granted the CONNECT and SELECT permissions to both the Business Users and Finance Users group for the BASE Data Service.


This gives both groups permission to connect to and select information from the items within this Data Service. The information is inherited by all children items of the Data Service – Data Source Names, Catalogs, Schemas, Tables and Views. For example, note that the Business Users group has inherited the Grant setting for the CONNECT permission on the BASE Data Source Name (DSN) and the SELECT permission on the EMPLOYEES table. Permission inheritance is denoted by a diamond symbol (◆).


For the SALARY table, the SAS Federation Server Administrator has explicitly denied the SELECT permission for the Business Users group whereas the Finance Users groups has inherited the Grant setting for the SELECT permission on the SALARY table.


Kate, who is a member of the Finance Users group, has permission to select records from the SALARY table.


Note: The user does not need to know the physical location where the SAS data resides. They simply refer to the Federation Server Data Source Name, which in this case is BASE.

By denying the Business Users group the SELECT permission on the SALARY table, David, who is a member of the Business Users group, does NOT have access to select records from this table. He is denied access.


However, David still has access to the EMPLOYEES table since the Business Users group inherited the SELECT permission for that table.


If I want to prevent David from accessing any of the tables or views in the Employee_Info schema, but still allow other Business Users to access them, then as the SAS Federation Server Administrator I can explicitly deny the user, David, the SELECT permission for the Employee_Info schema as shown below.


Now, David inherits the Deny setting for the SELECT permission for all tables and views within that schema and he will no longer be able to access the EMPLOYEES table.


However, Sally, another member of the Business Users group, is still able to access the EMPLOYEES table.


In this blog entry, I covered the first part of this series on controlling data access using SAS Federation Server 4.2:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

I’ll be posting Part 2 of this series soon. Keep an eye out for it.

For more information on SAS Federation Server:

tags: SAS Federation Server, SAS Professional Services, Securing data

Securing sensitive data using SAS Federation Server at the data source level was published on SAS Users.

October 20, 2016
 

In a previous blog post, I demonstrated combining the power of SAS Event Stream Processing (ESP) and the SAS Quality Knowledge Base (QKB), a key component of our SAS Data Quality offerings. In this post, I will expand on the topic and show how you can work with data from multiple QKB locales in your event stream.

To illustrate how to do this I will review an example where I have event stream data that contains North American postal codes.  I need to standardize the values appropriately depending on where they are from – United States, Canada, or Mexico – using the Postal Code Standardization definition from the appropriate QKB locale.  Note: This example assumes that the QKB for Contact Information has been installed and the license file that the DFESP_QKB_LIC environment variable points to contains a valid license for these locales.

In an ESP Compute window, I first need to initialize the call to the BlueFusion Expression Engine Language function and load the three QKB locales needed – ENUSA (English – United States), ENCAN (English – Canada), and ESMEX (Spanish – Mexico).
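A minimal sketch of that initialize expression, written in DataFlux Expression Engine Language (the variable names are assumptions):

    // Declare and initialize the BlueFusion object, then load the three locales
    bluefusion bf
    integer rc
    string result

    bf = bluefusion_initialize()
    rc = bf.loadqkb('ENUSA')   // English - United States
    rc = bf.loadqkb('ENCAN')   // English - Canada
    rc = bf.loadqkb('ESMEX')   // Spanish - Mexico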


Next, I need to call the appropriate Postal Code QKB Standardization definition based on the country the data is coming from.  However, to do this, I first need to standardize the Country information in my streaming data; therefore, I call the Country (ISO 3-character) Standardization definition.


After that is done, I write a series of if/else statements to standardize the Postal Codes using the appropriate QKB locale definition, based on the Country_Standardized value computed above. The resulting standardized Postal Code value is returned in the output field named PostalCode_STND.
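A sketch of that branching logic (the leading locale argument on the standardize call is an assumption about how a call is directed to one of several loaded locales; the definition and input field names are illustrative):

    // Route each record to the locale-appropriate Postal Code standardization
    if Country_Standardized == 'USA' then
       rc = bf.standardize('ENUSA', 'Postal Code', Postal_Code, result)
    else if Country_Standardized == 'CAN' then
       rc = bf.standardize('ENCAN', 'Postal Code', Postal_Code, result)
    else if Country_Standardized == 'MEX' then
       rc = bf.standardize('ESMEX', 'Postal Code', Postal_Code, result)
    PostalCode_STND = result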


I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.


Here is the XML code for the SAS ESP project reviewed in this blog:


Now that the Postal Code values for the various locales have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.

For more information, please refer to the product documentation:
SAS Event Stream Processing
DataFlux Expression Engine Language
SAS Quality Knowledge Base

Learn more about a sustainable approach to data quality.

tags: data management, SAS Data Quality, SAS Event Stream Processing, SAS Professional Services

Using multiple SAS Quality Knowledge Base locales in a SAS Event Stream Processing compute window was published on SAS Users.

September 16, 2016
 

The SAS Quality Knowledge Base (QKB) is a collection of files which store data and logic that define data cleansing operations such as parsing, standardization, and generating match codes to facilitate fuzzy matching. Various SAS software products reference the QKB when performing data quality operations on your data. One of these products is SAS Event Stream Processing (ESP). SAS ESP enables programmers to build applications that quickly process and analyze streaming events. In this blog, we will look at combining the power of these two products – SAS ESP and the SAS QKB.

SAS Event Stream Processing (ESP) Studio can call definitions from the SAS Quality Knowledge Base (QKB) in its Compute window. The Compute window enables the transformation of input events into output events through computed manipulations of the input event stream fields. One of the computed manipulations that can be used is calling the QKB definitions by using the BlueFusion Expression Engine Language function.

Before QKB definitions can be used in ESP projects the QKB must be installed on the SAS ESP server. Also, two environment variables must be set: DFESP_QKB and DFESP_QKB_LIC. The environment variable DFESP_QKB should be set to the path where the QKB data was installed. The environment variable DFESP_QKB_LIC should be set to the path and filename that contains the license(s) for the QKB locale(s).
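For example, on a Linux ESP server the two variables might be set as follows (the installation paths shown are assumptions):

    export DFESP_QKB=/opt/sas/qkb/ci27
    export DFESP_QKB_LIC=/opt/sas/qkb/license/qkb.lic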

In this post, I will explore the example of calling the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale in the ESP Compute window. The Source window is reading in events that contain US State data that may or may not be standardized in the 2-character US State abbreviation.


As part of the event stream analysis I want to perform, I need the US_State values to be in a standard format. To do this I will utilize the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale.

First, I need to initialize the call to the BlueFusion Expression Engine Language function and load the ENUSA (English – United States) locale. Note: The license file that the DFESP_QKB_LIC environment variable points to must contain a license for this locale.
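A minimal sketch of that Initialize expression, written in DataFlux Expression Engine Language (the variable names are assumptions; result is declared here so that it can be reused in the next expression):

    // Declare and initialize the BlueFusion object, then load the locale
    bluefusion bf
    integer rc
    string result

    bf = bluefusion_initialize()
    rc = bf.loadqkb('ENUSA')   // English - United States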


Next, I need to call the QKB definition and return its result. In this case, I am calling the BlueFusion standardize function, which expects the following inputs: the definition name, the input field to standardize, and the output field for the standardized value. Here, the definition name is State/Province (Abbreviation), the input field is US_State, and the output field for the standardized value is result. Note: The field result was declared in the Initialize expression above. This result value is returned in the output field named US_State_STND.
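A sketch of that call, matching the inputs described above:

    // Standardize US_State and surface the value in US_State_STND
    rc = bf.standardize('State/Province (Abbreviation)', US_State, result)
    US_State_STND = result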


I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.


Here is the XML code for the SAS ESP project reviewed in this blog:


Now that the US_State values have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.

For more information, please refer to the product documentation:
SAS Event Stream Processing
DataFlux Expression Engine Language
SAS Quality Knowledge Base

tags: DataFlux Data Management Studio, SAS Event Stream Processing, SAS Professional Services

Using SAS Quality Knowledge Base Definitions in a SAS Event Stream Processing Compute Window was published on SAS Users.

July 22, 2016
 

In DataFlux Data Management Studio, the predominant component of the SAS Data Quality bundle, the data quality nodes in a data job use definitions from something called the SAS Quality Knowledge Base (QKB). The QKB supports over 25 languages and provides a set of pre-built rules, definitions, and reference data that conduct operations such as parsing, standardizing, and fuzzy matching to help you cleanse your data. The QKB comes with pre-built definitions for both customer and product data and allows for customization and the addition of rules to accommodate new data types and rules specific to your business. (You can learn more about the QKB here.)

Sometimes you may want to work with an alternate QKB installation that contains different definitions within the same data job. For example, your default QKB may be the Contact Information QKB; however, in your data flow you may want to use a definition that exists in the Product Data QKB. These data quality nodes have the BF_PATH attribute as part of their Advanced Properties, enabling you to do this.
Note: You must have the alternate QKB data installed and be licensed for any QKB locales that you plan to use in your data job.

Here is an example data job that uses the Advanced property of BF_PATH to call the Brand/Manufacturer Extraction definition from the Product Data QKB. It also calls Standardization definitions from the default QKB of Contact Info. Notice that from the data flow perspective, it is one seamless flow.


The BF_PATH advanced property setting for the Extraction node that is using the Brand/Manufacturer definition from the Product QKB contains the path of where the Product Data QKB was installed.
Note: The path setting could be set as a macro variable.  The path information can be obtained from the QKB registration information in the Administration riser bar in Data Management Studio.


Once BF_PATH advanced property is set, you will not be able to use the user interface to make your definition selection. You will need to know the definition name and any other relevant information for the node, so you can add the information using the appropriate Advanced properties.

In my example, I need to set the definition and token fields for the Brand/Manufacturer definition using the PARSE_DEF Advanced property and its companion token properties.


Note: I was able to find out the needed information by viewing the Product Data QKB information in the Administration riser bar in Data Management Studio.


Here is the user interface for the Extraction node that is using the Brand/Manufacturer definition from the Product QKB and its data preview.

Note: The definition cannot be displayed since it is not in the Active QKB.  You can select the Extraction field and Additional Output information on the user interface.


In conclusion, the Advanced property of BF_PATH is useful when you want to use an Alternate QKB installation in your data job.  For more information on Advanced properties for nodes in DataFlux Data Management Studio, refer to the topic “Advanced Properties” in the DataFlux Data Management Studio 2.7: User’s Guide. For more information on the SAS Quality Knowledge Base (QKB), refer to its documentation. Finally, to learn more about SAS Data Quality, visit sas.com/dataquality.

tags: data management, data quality, DataFlux Data Management Studio

Using an alternate Quality Knowledge Base (QKB) in a DataFlux Data Management Studio data job was published on SAS Users.

June 4, 2016
 

In DataFlux Data Management Studio, the data quality nodes (e.g., Parsing, Standardization, and Match Codes) in a data job use definitions from the SAS Quality Knowledge Base (QKB). These definitions are based on a locale (a language and country combination). Sometimes you would like to work with multi-locale data within the same data job, and these data quality nodes have LOCALE attributes as part of their Advanced Properties to help you do this.

For example, you may want to work with data from the United States, Canada, and the United Kingdom within the same data job.  Note:  You must have the QKB locale data installed and be licensed for any locales that you plan to use in your data job.

The Advanced properties you will need to use are LOCALE_FIELD and LOCALE_LIST.  LOCALE_FIELD specifies the column name that contains the 5-character locale value to use for each record.  LOCALE_LIST specifies the list of locales that should be loaded into memory for use within the node.

The first step in using these Advanced properties in a data quality node is to create a field that contains the 5-character QKB locale information. The first 2 characters represent the language and the last 3 characters represent the country. For example, ENUSA represents the English – United States locale and ESESP represents the Spanish – Spain locale. You can use the Locale Guessing node to create a field that contains a locale value based on a Locale Guess definition from the Quality Knowledge Base (QKB). Alternatively, you can use a custom Standardization scheme to assign the 5-character locale information, as shown in the example below.

 

QKB Locale Standardization Scheme

 

Apply QKB Locale Standardization Scheme

The application of the Standardization scheme is then followed up with an Expression node to assign the locale FRCAN (French – Canada) if the province is Quebec.  Now each record has its 5-character locale information in a field called DF_Locale.
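A sketch of such an Expression node rule in DataFlux Expression Engine Language (the input field name State_Province is an assumption; DF_Locale follows the post):

    // Override the assigned locale for Quebec records
    if upper(State_Province) == 'QUEBEC' then
       DF_Locale = 'FRCAN'   // French - Canada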

QKB Locale Field Results

Once you have the Locale field as part of your input data, you enter the information as usual for the data quality node.

Match Codes Node Properties

Then you map the field with the 5-character locale information to the LOCALE_FIELD Advanced property for the data quality node. You also need to list the locales that should be loaded into memory in the LOCALE_LIST advanced property. Note: You could pass in this list as values using a macro variable.

Match Codes Node Advanced Properties

Note: The definition used in the selected node must exist in all locales referenced.  For example, the State/Province Match definition only exists in the English – United States, English – Canada, and French – Canada locales.  Therefore, if you are using that definition in a Match Codes node you can only pass in data that is from one of those three locales; otherwise, executing the data job will produce an error.

Here is an example data job that uses the Advanced properties of LOCALE_FIELD and LOCALE_LIST to generate match codes for multi-locale data. Notice that there is minimal branching in the data flow. The only branching needed is for the United States and Canada data records, which are branched to generate match codes for their State/Province data.

Multi-Locale Data Job Example

In conclusion, the Advanced properties of LOCALE_FIELD and LOCALE_LIST are useful when you want to work with data from multiple locales within the same data job. For more information on Advanced properties for nodes in DataFlux Data Management Studio, refer to the topic “Advanced Properties” in the DataFlux Data Management Studio 2.7: User’s Guide.

tags: data management, DataFlux Data Management Studio, SAS Professional Services

Using Multiple Quality Knowledge Base Locales in a DataFlux Data Management Studio Data Job was published on SAS Users.

May 14, 2016
 

Trusted data is key to driving accurate reporting and analysis and, ultimately, making the right decision. SAS Data Quality and SAS Data Management are two offerings that help create a trusted, blended view of your data. Both contain DataFlux Data Management Studio, a key component in profiling, enriching, monitoring, governing, and cleansing your data. Clustering is one data quality activity that helps reduce the number of duplicates in a given data set: you use fuzzy matching to group similar names or addresses together. As an example, a marketing analyst might want to remove duplicate customer names or addresses from a customer list in order to reduce mailing costs.

Have you ever wondered how the cluster results would differ if you changed the match code sensitivity for one of your data columns, or removed a column from one of your cluster conditions or added a new cluster condition? Well, wonder no more!

The Cluster Diff node in DataFlux Data Management Studio compares the results of two different Clustering nodes based on the same input data.  This is useful for comparing the results of different cluster conditions and/or different match code sensitivities.

All records from the input set must be passed to both Clustering nodes and both Clustering nodes must pass out all their data in the same order for this comparison to work. To summarize, in both Clustering nodes, you must select the All clusters output option and you cannot use the Sort output by cluster number option.


Clustering Node Properties

The results of both Clustering nodes are then fed into the Cluster Diff node.  In order to perform the comparison, the Cluster Diff node must know the unique identifier for the input records (Record ID) and the Cluster number that is returned from the respective Clustering node.


Cluster Diff Node Properties

The Diff type value describes the type of change found when comparing cluster numbers between the two Clustering nodes. The possible values for Diff type are COMBINE, DIVIDE, NETWORK, and SAME, which is represented as a period (.). The results of the comparison are reviewed as a Diff set. Within a Diff set:

  • If the records were in different clusters on the “left table” and in the same cluster on the “right table,” then its Diff type is COMBINE.
  • If the records were in the same cluster on the “left table” and in different clusters on the “right table,” then its Diff type is DIVIDE.
  • If the records were in the same cluster on the “left table” and in the same cluster on the “right table,” then its Diff type is “.” (SAME).
  • If when comparing the “left table” cluster to the “right table” clusters at least one record is added to the cluster AND at least one record is removed from the cluster, then its Diff type is NETWORK.

Cluster Diff Node Results

The Cluster Diff node is not a node that is typically used in a production matching job. However, it is a node that is useful in helping you compare and contrast different match code sensitivities and/or cluster conditions that enable you to achieve the best matching results for your data set.

 

tags: data management, DataFlux Data Management Studio, SAS Professional Services

Improving matching results in DataFlux Data Management Studio with cluster comparison was published on SAS Users.

February 17, 2016
 

REpresentational State Transfer (REST) is an architectural style for designing web services that access a system's resources using HTTP methods.

With the release of DataFlux Data Management 2.7 earlier this year, three new REST Application Programming Interfaces (APIs) are now available:

  • Data Management Server Batch Jobs
  • Data Management Server Real-time Data Jobs
  • Data Management Server Real-time Process Jobs

The Base URLs for the new Data Management Server REST APIs are:

Batch Jobs – http://<DM Server IP address or server name>:<port>/SASDataMgmtBatchJob/rest

Real-time Data Services – http://<DM Server IP address or server name>:<port>/SASDataMgmtRTDataJob/rest

Real-time Process Services – http://<DM Server IP address or server name>:<port>/SASDataMgmtRTProcessJob/rest

Here is a sample call to the Base URL for Real-time Data Services:

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest


The Data Management Server job or service name is referred to by its ID, which is simply the name of the job or service (with its extension) Base64 encoded. Base64 encoding is a common standard used to encode paths or other resource names in a URL; without it, a filename or path that contains a space, for example, can cause problems.
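For example, with the common base64 command-line utility:

    # Base64-encode the service name to get its ID
    echo -n "Determine_Gender.ddf" | base64
    # RGV0ZXJtaW5lX0dlbmRlci5kZGY=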

Here is a sample call for a specific Real-time Data Service using its Base64 encoded name for its ID.

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/RGV0ZXJtaW5lX0dlbmRlci5kZGY=


Note that the returned text contains information about the links to call (href), the method to use, and the content type of the request.

Before reviewing the REST API metadata information for the service, let’s first look at testing the Determine Gender data service using the Data Management Server test service utility.


The input of the service is Name and the output is Gender.

Here is an example of the REST API metadata information which includes the inputs and outputs for the Determine_Gender.ddf service.

http://my_dmserver.com:21036/SASDataMgmtRTDataJob/rest/jobFlowDefns/RGV0ZXJtaW5lX0dlbmRlci5kZGY=/metadata


The following are some common tools for testing REST API requests and responses:

Here is an example of the request/response information for executing the Determine Gender data service. The input Names of “John Smith” and “Jane Smith” were sent to the service, and the Genders of “M” and “F” were returned. (A Mozilla Firefox add-on was used to generate this information.)


A Data Management Server REST API request returns standard HTTP status response codes that indicate success or the type of failure.

For more information, refer to the DataFlux Data Management Server 2.7: REST API Reference guide.

 

 

tags: data management, Data Management Server, REST API, SAS Professional Services

DataFlux Data Management Server 2.7: REST APIs was published on SAS Users.

December 17, 2015
 

Sometimes when trying to fuzzy match names, you want to fuzzy match just a portion of the name: for example, Family Name and/or Given Name. A common mistake that people make is to feed the Family Name and Given Name columns separately into the Match Codes node instead of the Match Codes (Parsed) node.

So why is this a mistake?  The Name Match definition is designed to accept a full name and then parse the information into its tokens – Name Prefix, Given Name, Middle Name, Family Name, Name Suffix, and Title/Additional Info.


Let’s look at a case where I want to have separate match codes for Given Name and Family Name and my data is the following:
Michael Smith
Mike Smith

The following are the match codes generated using the Match Codes node with the Name Match definition (English – United States) at a sensitivity of 85.


Match Code Node Results

Notice that the two records would NOT match using this approach, since the Given_Name_MatchCode values are different. They do not match because, when only one name is supplied to the Name Match definition, the parser in most cases assumes that you supplied only the Family Name, so the nickname equivalents of the Given Name are not applied to the input.

The following are the match codes generated using the Match Codes (Parsed) node with the Name Match definition (English – United States) at a sensitivity of 85, feeding the Given Name and Family Name fields into the appropriate tokens in separate calls to the node.


Match Code (Parsed) Node Results

Notice that the names will match in this case, since both the Given_Name_MatchCode and Family_Name_MatchCode values are the same. Using the Match Codes (Parsed) node ensures that the names are assigned to the proper tokens and that the proper logic is applied to each token. Therefore, this is the approach you should use when trying to match names based on just Given Name and/or Family Name.

Note:  You can also use the ParsedView of proc DQMATCH to generate match codes for parsed data. Refer to the SAS® 9.4: Data Quality Server Reference guide section on “DQMATCH – Example 4: Creating Match Codes for Parsed Values” for more information.

Now that you have this knowledge you can avoid making this common Name matching mistake on your data quality projects!

tags: data management, data quality, SAS Professional Services

Matching Names: How to Avoid a Common Mistake was published on SAS Users.