data integration

2月 202018
 

When using conventional methods to access and analyze data sets from Teradata tables, SAS brings all the rows from a Teradata table to SAS Workspace Server. As the number of rows in the table grows over time, it adds to the network latency to fetch the data from a database management system (DBMS) and move it to SAS Workspace Server. Considering big data, SAS Workspace Server may not have enough capacity to hold all the rows from a Teradata table.

SAS In-Database processing can help solve the problem of returning too much data from the database. SAS In-Database processing allows you to perform data operations inside the DBMS and use the distributed processing over multiple Access Module Processors (AMPs). Select SAS procedures take advantage of Teradata SQL functionality, and in some cases leverage SAS functions deployed inside the DBMS. The goal of in-database processing is to reduce the I/O required to transfer the data from Teradata to SAS.

SAS® In-Database Processing in Teradata

Using SAS In-Database processing, you can run scoring models, some SAS procedures, DS2 threaded programs, and formatted SQL queries inside the Teradata database.

The list of SAS In-Database features supported for Teradata include:

  • Format publishing and SAS_PUT()function
  • Scoring Models
  • Select BASE SAS® Procedures ( FREQ, RANK, REPORT, SORT, SUMMARY/MEAN , TABULATE)
  • Select SAS/STAT® Procedures (CORR, CANCORR, DMDB, DMINE, DMREG, FACTOR, PRINCOMP,
  • REG, SCORE, TIMESERIES, VARCLUS )
  • DS2 Threaded programs
  • Data quality operations
  • Extract and transform data

SAS In-Database Deployment Package for Teradata

The in-database deployment package for Teradata includes the following:

  • The SAS formats library, accelterafmt-######.rpm, installs a SAS formats library on the Teradata server. By having a SAS formats library on your Teradata system, you can publish SAS formats in Teradata, which enables you to process SAS statements with SAS formats in the Teradata database. This also enables you to publish SAS PUT functions to Teradata as a SAS_PUT() function. This software can be found in your SAS Install folder under /SAS-install-directory/SASFormatsLibraryforTeradata/3.1/TeradataonLinux/.
  • The SAS® Embedded Process package, sepcoretera-######.rpm, installs SAS Embedded Process in the Teradata database. This is the core package of in-database components. This software can be found in your software depot under folder /depot/standalone_installs/SAS_Core_Embedded_Process_Package_for_Teradata/13_0/Teradata_on_Linux.
  • The SASEPFUNC package, sasepfunc-#####.x86_64.tar.gz, installs SAS Embedded Process support functions on Teradata. SAS Embedded Process support functions are Teradata stored procedures that generate SQL to interface with SAS Embedded Process. The script from the package creates a Teradata database named SAS_SYSFNLIB with a list of tables, views, functions, and procedures to support SAS Embedded Process. The same script also adds a list of functions in the TD_SYSFNLIB database. The package can be obtained from the Teradata support group.

The following figure shows the list of objects from the SAS_SYSFNLIB database to support SAS Embedded Process:

The following shows the list of objects from the TD_SYSFNLIB database to support SAS Embedded Process:

  • The SAS® Quality Knowledge Base package, sasqkb_ci-27.#####.noarch.rpm, installs SAS Quality Knowledge Base on the Teradata server. This is an optional package to SAS Embedded Process. This package is needed along with SAS® Quality Accelerator, if you are planning to run data cleansing operations in the Teradata database. The package can be downloaded from the SAS support site.
  • The SAS Quality Accelerator package. There are two scripts (dq_install.sh and dq_grant.sh) located under SAS-Install-directory to install the data quality accelerator at Teradata. This is an optional package to SAS Embedded Process, and needed only if you are planning to run data cleansing operations in Teradata. The software install files can be found in the folder /SAS-install-directory/SASDataQualityAcceleratorforTeradata/9.4/dqacctera/sasmisc/. As a part of script execution, it adds a list of objects (procedures, functions) to the SAS_SYSFNLIB database.

Sample list of data quality related objects from the SAS_SYSFNLIB database.

Examples of running DS2 Code to perform data quality, data extract, and transform operations in Teradata:

The following example describes the execution of DS2 code by using SAS Data Quality Accelerator and SAS Quality Knowledge Base to match and extract a data set from the Teradata database. The log shows that both Threads program and Data program ran in the Teradata database as in-database program execution.

CODE

Stay tuned in for the next part of the SAS In-Database Processing in Teradata blog series. Coming up is one about publishing SAS format in Teradata.

SAS In-Database Processing in Teradata DBMS was published on SAS Users.

1月 302017
 

I’ve had several meetings lately on data management, and especially integration, where the ability to explore alternatives has been critical. And the findings from our internet of things (IoT) early adopters survey confirms that the ecosystem nature of data sources in IoT deployments means we need to expand the traditional […]

It’s data integration, but not as we know it was published on SAS Voices.