Uttam Kumar

March 30, 2018
 

Multi Node Data Transfer

With SAS Viya 3.3, a new data transfer mechanism, MultiNode Data Transfer, has been introduced to transfer data between a data source and SAS Cloud Analytic Services (CAS), in addition to the Serial and Parallel data transfer modes. The new mechanism is an extension of the Serial Data Transfer mode. In MultiNode Data Transfer mode, each CAS Worker makes a simultaneous, concurrent connection to read and write data from the source DBMS or distributed data system.

In CAS, SAS Data Connectors are used for Serial mode and SAS Data Connect Accelerators are used for Parallel mode data transfer between CAS and a DBMS. The SAS Data Connector can also be used for the MultiNode data transfer mechanism. When the Data Connector is installed on all nodes of a multi-node CAS environment, it can make concurrent data access connections from each CAS Worker to read and write data from the data source environment.

The CAS Controller controls the MultiNode data transfer. It directs each CAS Worker node on how to query the source data and obtain the needed data. The CAS Controller checks the source data table for the first numeric column and uses its values to divide the table into slices with a MOD function of the number of CAS nodes specified. The higher the cardinality of the selected numeric column, the more evenly the data can be divided into slices. If CAS chooses a low-cardinality column, you could end up with poor data distribution on the CAS Worker nodes. The CAS Controller directs each CAS Worker to submit a query to obtain the needed slice of data. During this process, each CAS Worker makes an independent, concurrent request to the data source environment.

Data is transferred from the source environment to the CAS worker nodes directly using a single thread connection, bypassing the CAS Controller.

The following diagrams describe the data access from CAS to the data source environment using MultiNode Data Transfer mode. CAS is hosted on a multi-node environment with the SAS Data Connector installed on each node (CAS Controller and Workers). A CASLIB is defined with NUMREADNODES= and NUMWRITENODES= values other than 1. With each data table access request, the CAS Controller scans the source data table for the first numeric column and uses its values to prepare a query for each CAS Worker to run. Each CAS Worker node submits an individual query to get its slice of the data, something like:

Select * from SourceTable where mod(NumericField, NUMREADNODES) = WorkerNodeNumber

The data moves from the DBMS gateway server to each CAS Worker node directly using a single-threaded connection, bypassing the CAS Controller. It's a kind of parallel load using the serial mechanism, but it's not a massively parallel data load. Notice the bottleneck at the DBMS gateway server: the data transfer always passes through the DBMS gateway server to the CAS Worker nodes.

Multi Node Data Transfer

Prerequisites to enable MultiNode Data Transfer include:

  • The CAS environment is a multi-node environment (multiple CAS Worker Nodes).
  • The SAS Data Connector for the data source is installed on each CAS Worker and Controller node.
  • The data source client connection components are installed on each CAS Worker and Controller node.

By default, the SAS Data Connector uses serial data transfer mode. To enable MultiNode Data Transfer mode, you must use the NUMREADNODES= and NUMWRITENODES= parameters in the CASLIB statement and specify a value other than 1. If the value is 0, CAS uses all available CAS Worker nodes. MultiNode Data Transfer mode can use only the number of available worker nodes; if you specify more than the available nodes, the log prints a warning message.

The following code example describes a data load using MultiNode data transfer mode. It assigns a CASLIB using serial mode with NUMREADNODES=10 and NUMWRITENODES=10 and loads data from a Hive table into CAS. Because the NUMREADNODES= value is other than 1, the load follows the MultiNode mechanism. Notice the warning message in the log stating that the number of read nodes exceeds the available Worker nodes. Specifying a number higher than the available CAS Worker nodes is one way to verify whether CAS is using MultiNode data transfer mode. If you specify NUMREADNODES=0, CAS uses all available nodes, but the SAS log prints no message about multi-node usage.

CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
caslib HiveSrl datasource=(srctype="hadoop",
server="xxxxxxx.xxx",
username="hadoop",
dataTransferMode="SERIAL",
NUMREADNODES=10, 
NUMWRITENODES=10,
hadoopconfigdir="/opt/MyHadoop/CDH/Config",
hadoopjarpath="/opt/MyHadoop/CDH/Jars",
schema="default");
proc casutil;
load casdata="prdsal2_1G" casout="prdsal2_1G"
outcaslib="HiveSrl" incaslib="HiveSrl" ;
quit;

SAS Log extract:

….
77 proc casutil;
78 ! load casdata="prdsal2_1G" casout="prdsal2_1G"
79 outcaslib="HiveSrl" incaslib="HiveSrl" ;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to Hadoop.
WARNING: The value of numReadNodes(10) exceeds the number of available worker nodes(7). The load will proceed with numReadNodes=7. 
…
..

On the database side, in this case Hive, note the queries submitted by the CAS Worker nodes. Each includes the MOD-function WHERE clause described above.

On the Hadoop Resource Manager user interface, you can see the corresponding job execution for each query submitted by the CAS Worker nodes.

When using MultiNode mode to load data to CAS, data distribution depends on the cardinality of the numeric column selected by CAS for the MOD function. Notice that the CAS data distribution for the table loaded above is not ideal, since CAS selected a column ('year') that, in this case, is not well suited for distributing data across the CAS Worker nodes. The MultiNode mechanism offers no option to specify which column to use for query preparation and, ultimately, data distribution.
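
One way to check the resulting distribution is to look at per-node row counts for the loaded table. The following is a minimal sketch using the table.tableDetails CAS action; the level="node" setting is an assumption about your CAS release, and the caslib and table names reuse the earlier load example.

proc cas;
   /* Per-node detail for the loaded table (sketch; level="node" is assumed
      to be supported in your release and to return one row per CAS worker) */
   table.tableDetails /
      caslib="HiveSrl",
      name="prdsal2_1G",
      level="node";
quit;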

If CAS cannot find suitable columns for MultiNode data transfer mode, it will use standard Serial mode to transfer data as shown in the following log:

……..
74
74 ! load casdata="prdsal2_char" casout="prdsal2_char"
75 outcaslib="HiveSrl" incaslib="HiveSrl" ;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to Hadoop.
WARNING: The value of numReadNodes(10) exceeds the number of available worker nodes(7). The load will proceed with numReadNodes=7.
WARNING: Unable to find an acceptable column for multi-node reads. Load will proceed with numReadNodes = 1. 
NOTE: Cloud Analytic Services made the external data from prdsal2_char available as table PRDSAL2_CHAR in caslib HiveSrl.
…….

Data platforms supported for MultiNode Data Transfer using the Data Connector:

  • Hadoop
  • Impala
  • Oracle
  • PostgreSQL
  • Teradata
  • Amazon Redshift
  • DB2
  • MS SQL Server
  • SAP HANA

SAS selects the slicing column for a MultiNode data read based on the following order of data types:

  • INT (includes BIGINT, INTEGER, SMALLINT, TINYINT)
  • DECIMAL
  • NUMERIC
  • DOUBLE

Multi-Node Write:

While this post focused on loading data from a data source into CAS, multi-node data transfer also works when saving from CAS back to the data source. The important parameter when saving is NUMWRITENODES instead of NUMREADNODES. The behavior of multi-node saving is similar to that of multi-node loading.
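
For example, a save through the same caslib defined earlier follows the multi-node path on the write side. This is a minimal sketch that reuses the HiveSrl caslib and table from the load example; the output table name is illustrative.

proc casutil;
   /* Save the in-memory CAS table back to Hive through the HiveSrl caslib.  */
   /* Because NUMWRITENODES= was set to a value other than 1 in the CASLIB   */
   /* statement, each CAS worker writes its own slice of the data.           */
   save casdata="prdsal2_1G" casout="prdsal2_1G_copy"
        incaslib="HiveSrl" outcaslib="HiveSrl" replace;
quit;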

Summary:

The SAS Data Connector can be used for MultiNode data transfer by installing the Data Connector and DBMS client components on all CAS Worker nodes, without additional license fees. The source data is transferred directly from the DBMS gateway server to the CAS Worker nodes, divided up by a simple MOD function. With this mechanism, optimal data distribution across CAS nodes is not guaranteed. It is suggested to use all CAS Worker nodes by specifying NUMREADNODES=0 when loading data to CAS using MultiNode mode.

Important links for more information about this topic:

Multi Node Data Transfer to CAS was published on SAS Users.

February 20, 2018
 

When using conventional methods to access and analyze data sets from Teradata tables, SAS brings all the rows from a Teradata table to the SAS Workspace Server. As the number of rows in the table grows over time, this adds network latency to fetch the data from the database management system (DBMS) and move it to the SAS Workspace Server. With big data, the SAS Workspace Server may not have enough capacity to hold all the rows from a Teradata table.

SAS In-Database processing can help solve the problem of returning too much data from the database. SAS In-Database processing allows you to perform data operations inside the DBMS and use the distributed processing over multiple Access Module Processors (AMPs). Select SAS procedures take advantage of Teradata SQL functionality, and in some cases leverage SAS functions deployed inside the DBMS. The goal of in-database processing is to reduce the I/O required to transfer the data from Teradata to SAS.

SAS® In-Database Processing in Teradata

Using SAS In-Database processing, you can run scoring models, some SAS procedures, DS2 threaded programs, and formatted SQL queries inside the Teradata database.
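
As a simple illustration, an eligible Base SAS procedure such as FREQ can push its work into Teradata when SQL generation is enabled. The following is a minimal sketch; the server, credentials, and table names are placeholders for your environment.

/* Allow eligible Base procedures to generate SQL that runs inside Teradata */
options sqlgeneration=dbms;

/* SAS/ACCESS Interface to Teradata libref (connection values are placeholders) */
libname tdlib teradata server="tdprod" user=myuser password=XXXXXXXX database=sales;

/* The frequency counts are computed inside Teradata; only summary results
   are returned to the SAS session */
proc freq data=tdlib.transactions;
   tables region;
run;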

The list of SAS In-Database features supported for Teradata include:

  • Format publishing and SAS_PUT()function
  • Scoring Models
  • Select Base SAS® procedures (FREQ, RANK, REPORT, SORT, SUMMARY/MEANS, TABULATE)
  • Select SAS/STAT® procedures (CORR, CANCORR, DMDB, DMINE, DMREG, FACTOR, PRINCOMP, REG, SCORE, TIMESERIES, VARCLUS)
  • DS2 Threaded programs
  • Data quality operations
  • Extract and transform data

SAS In-Database Deployment Package for Teradata

The in-database deployment package for Teradata includes the following:

  • The SAS formats library, accelterafmt-######.rpm, installs a SAS formats library on the Teradata server. By having a SAS formats library on your Teradata system, you can publish SAS formats in Teradata, which enables you to process SAS statements with SAS formats in the Teradata database. This also enables you to publish SAS PUT functions to Teradata as a SAS_PUT() function. This software can be found in your SAS Install folder under /SAS-install-directory/SASFormatsLibraryforTeradata/3.1/TeradataonLinux/.
  • The SAS® Embedded Process package, sepcoretera-######.rpm, installs SAS Embedded Process in the Teradata database. This is the core package of in-database components. This software can be found in your software depot under folder /depot/standalone_installs/SAS_Core_Embedded_Process_Package_for_Teradata/13_0/Teradata_on_Linux.
  • The SASEPFUNC package, sasepfunc-#####.x86_64.tar.gz, installs SAS Embedded Process support functions on Teradata. SAS Embedded Process support functions are Teradata stored procedures that generate SQL to interface with SAS Embedded Process. The script from the package creates a Teradata database named SAS_SYSFNLIB with a list of tables, views, functions, and procedures to support SAS Embedded Process. The same script also adds a list of functions in the TD_SYSFNLIB database. The package can be obtained from the Teradata support group.

The following figure shows the list of objects from the SAS_SYSFNLIB database to support SAS Embedded Process:

The following shows the list of objects from the TD_SYSFNLIB database to support SAS Embedded Process:

  • The SAS® Quality Knowledge Base package, sasqkb_ci-27.#####.noarch.rpm, installs SAS Quality Knowledge Base on the Teradata server. This is an optional package to SAS Embedded Process. This package is needed along with SAS® Quality Accelerator, if you are planning to run data cleansing operations in the Teradata database. The package can be downloaded from the SAS support site.
  • The SAS Quality Accelerator package. There are two scripts (dq_install.sh and dq_grant.sh) located under SAS-Install-directory to install the data quality accelerator at Teradata. This is an optional package to SAS Embedded Process, and needed only if you are planning to run data cleansing operations in Teradata. The software install files can be found in the folder /SAS-install-directory/SASDataQualityAcceleratorforTeradata/9.4/dqacctera/sasmisc/. As a part of script execution, it adds a list of objects (procedures, functions) to the SAS_SYSFNLIB database.

Sample list of data quality related objects from the SAS_SYSFNLIB database.

Examples of running DS2 Code to perform data quality, data extract, and transform operations in Teradata:

The following example describes the execution of DS2 code by using SAS Data Quality Accelerator and SAS Quality Knowledge Base to match and extract a data set from the Teradata database. The log shows that both the thread program and the data program ran in the Teradata database as in-database program execution.

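The original code listing is not reproduced here. As a rough, minimal sketch of the pattern, a threaded DS2 program that runs inside Teradata through the SAS Embedded Process looks something like the following; the libref tdlib, the table names, and the comment placeholder for the data quality function calls are illustrative.

proc ds2 ds2accel=any;                 /* request in-database (accelerated) execution */
   thread extract_th / overwrite=yes;
      method run();
         set tdlib.customer_raw;       /* each thread reads its rows inside Teradata */
         /* data quality matching / transformation logic would go here, for
            example match-code generation using the functions supplied by the
            SAS Data Quality Accelerator */
      end;
   endthread;

   data tdlib.customer_extract (overwrite=yes);
      dcl thread extract_th t;
      method run();
         set from t;                   /* collect the thread program output */
      end;
   enddata;
run;
quit;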

Stay tuned for the next part of the SAS In-Database Processing in Teradata blog series. Coming up is one about publishing SAS formats in Teradata.

SAS In-Database Processing in Teradata DBMS was published on SAS Users.

November 16, 2017
 

As a SAS Viya user, you may be wondering whether it is possible to execute data append and data update concurrently to a global Cloud Analytic Services (CAS) table from two or more CAS sessions. (Learn more about CAS.) How would this impact the report view while data append or data update is running on a global CAS table? These questions are even more important for those using the programming interface to load and update data in CAS. This post discusses data append, data update, and concurrency in CAS.

Two or more CAS sessions can simultaneously submit a data append and data update process to a CAS table, but only one process at a time can run against the same CAS table. The multiple append and update processes execute in serial, one after another, never running in a concurrent fashion. Whichever CAS session is first to acquire the write lock on a global CAS table prevails, appending or updating the data first. The other append and update processes must wait in a queue to acquire the write lock.

During the data append process, the appended data is not available to end users or reports until all rows are inserted and committed into the CAS table. While data append is running, users can still render reports against the CAS table using the original data, but excluding the appended rows.

Similarly, during the data update process, the updated data is not available to users or reports until the update process is complete. However, CAS lets you render reports using the original (non-updated) data, as the CAS table is always available for read processes. During the data update process, CAS makes additional in-memory copies of the blocks containing the rows to be updated in order to perform the update. Once the update process is complete, the additional, now obsolete copies of the blocks are removed from CAS. Data updates to a global CAS table are an expensive operation in terms of CPU and memory usage. You have to factor in the additional overhead memory or CAS_CACHE space to support the updates. The space requirement depends on the number of rows affected by the update process.

At any given time, there could be only one active write process (append/update) against a global CAS table. However, there could be many concurrent active read processes against a global CAS table. A global CAS table is always available for read processes, even when an append or update process is running on the same CAS table.
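
For example, while one session is appending, another session can still fetch committed rows from the same global table. The following is a minimal sketch using the table.fetch action; the caslib and table names match the append example that follows.

proc cas;
   /* Read from the global table while a write from another session is running; */
   /* only rows already committed are returned                                   */
   table.fetch /
      table={caslib="caspath", name="big_PRDSALE"},
      to=10;           /* return only the first 10 rows */
quit;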

The following log example describes two simultaneous CAS sessions executing data appends to a CAS table. Both append processes were submitted to CAS with a gap of a few seconds. Notice the execution time for the second CAS session MYSESSION1 is double the time that it took the first CAS session to append the same size of data to the CAS table. This shows that both appends were executing one after another. The amount of memory used and the CAS_CACHE location also shows that both processes were running one after another in a serial fashion.

Log from simultaneous CAS session MYSESSION submitting APPEND

58 proc casutil ;
NOTE: The UUID '1411b6f2-e678-f546-b284-42b6907260e9' is connected using session MYSESSION.
59 load data=mydata.big_prdsale
60 outcaslib="caspath" casout="big_PRDSALE" append ;
NOTE: MYDATA.BIG_PRDSALE was successfully added to the "caspath" caslib as "big_PRDSALE".
61 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 49.58 seconds
cpu time 5.05 seconds

Log from simultaneous CAS session MYSESSION1 submitting APPEND

58 proc casutil ;
NOTE: The UUID 'a20a246e-e0cc-da4d-8691-c0ff2a222dfd' is connected using session MYSESSION1.
59 load data=mydata.big_prdsale1
60 outcaslib="caspath" casout="big_PRDSALE" append ;
NOTE: MYDATA.BIG_PRDSALE1 was successfully added to the "caspath" caslib as "big_PRDSALE".
61 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 1:30.33
cpu time 4.91 seconds

 

When the data append process from MYSESSION1 was submitted alone (no simultaneous process), the execution time is around the same as for the first session MYSESSION. This also shows that when two simultaneous append processes were submitted against the CAS table, one was waiting for the other to finish. At one time, only one process was running the data APPEND action to the CAS table (no concurrent append).

Log from a lone CAS session MYSESSION1 submitting APPEND

58 proc casutil ;
NOTE: The UUID 'a20a246e-e0cc-da4d-8691-c0ff2a222dfd' is connected using session MYSESSION1.
59 load data=mydata.big_prdsale1
60 outcaslib="caspath" casout="big_PRDSALE" append ;
NOTE: MYDATA.BIG_PRDSALE1 was successfully added to the "caspath" caslib as "big_PRDSALE".
61 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 47.63 seconds
cpu time 4.94 seconds

 

The following log example describes two simultaneous CAS sessions submitting data updates on a CAS table. Both update processes were submitted to CAS in a span of a few seconds. Notice the execution time for the second CAS session MYSESSION1 is double the time it took the first session to update the same number of rows. The amount of memory used and the CAS_CACHE location also shows that both processes were running one after another in a serial fashion. While the update process was running, memory and CAS_CACHE space increased, which suggests that the update process makes copies of to-be-updated data rows/blocks. Once the update process is complete, the space usage in memory/CAS_CACHE returned to normal.

When the data UPDATE action from MYSESSION1 was submitted alone (no simultaneous process), the execution time is around the same as for the first CAS session.

Log from a simultaneous CAS session MYSESSION submitting UPDATE

58 proc cas ;
59 table.update /
60 set={
61 {var="actual",value="22222"},
62 {var="country",value="'FRANCE'"}
63 },
64 table={
65 caslib="caspath",
66 name="big_prdsale",
67 where="index in(10,20,30,40,50,60,70,80,90,100 )"
68 }
69 ;
70 quit ;
NOTE: Active Session now MYSESSION.
{tableName=BIG_PRDSALE,rowsUpdated=86400}
NOTE: PROCEDURE CAS used (Total process time):
real time 4:37.68
cpu time 0.05 seconds

 

Log from a simultaneous CAS session MYSESSION1 submitting UPDATE

57 proc cas ;
58 table.update /
59 set={
60 {var="actual",value="22222"},
61 {var="country",value="'FRANCE'"}
62 },
63 table={
64 caslib="caspath",
65 name="big_prdsale",
66 where="index in(110,120,130,140,150,160,170,180,190,1100 )"
67 }
68 ;
69 quit ;
NOTE: Active Session now MYSESSION1.
{tableName=BIG_PRDSALE,rowsUpdated=86400}
NOTE: PROCEDURE CAS used (Total process time):
real time 8:56.38
cpu time 0.09 seconds

 

The following memory usage snapshot from one of the CAS nodes describes the usage of memory before and during the CAS table update. Notice the values for “used” and “buff/cache” columns before and during the CAS table update.

Memory usage on a CAS node before starting a CAS table UPDATE

Memory usage on a CAS node during CAS table UPDATE

Summary

When simultaneous data append and data update requests are submitted against a global CAS table from two or more CAS sessions, they execute in a serial fashion (no concurrent process execution). To execute data updates on a CAS table, you need additional overhead memory/CAS_CACHE space. While the CAS table is going through a data append or data update process, it remains accessible for rendering reports.

Concurrent data append and update to a global CAS table was published on SAS Users.

April 19, 2016
 

With the release of SAS® 9.4 M3, you can now access SAS Scalable Performance Data Engine (SPD Engine) data using Hive. SAS provides a custom Hive SerDe for reading SAS SPD Engine data stored on HDFS, enabling users to access the SPD Engine table from other applications.

The SPD Engine Hive SerDe is delivered in the form of two JAR files. Users need to deploy the SerDe JAR files under “../hadoop-mapreduce/lib” and “../hive/lib” on all nodes of a Hadoop cluster to enable the environment. To access the SPD Engine table from Hive, you need to register the SPD Engine table under Hive metadata using the metastore registration utility provided by SAS.

The Hive SerDe is read-only and cannot serialize data for storage in HDFS. The Hive SerDe does not support creating, altering, or updating SPD Engine data in HDFS using HiveQL or other languages. For those functions, you would use the SPD Engine with SAS applications.
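
For reference, an SPD Engine table in HDFS is created from SAS with an SPD Engine libref that points to an HDFS path. The following is a minimal sketch; the configuration path, HDFS directory, and the use of SASHELP.STOCKS as source data are illustrative, and SAS_HADOOP_JAR_PATH must also be set for your cluster.

/* Point SAS to the Hadoop client configuration files (path is a placeholder) */
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/thirdparty/Hadoop_Conf";

/* SPD Engine libref that stores its data files in HDFS */
libname spdehdfs spde '/user/lasradm/SPDEData' hdfshost=default;

/* Create the SPD Engine table that can later be registered in the Hive metastore */
data spdehdfs.stocks;
   set sashelp.stocks;
run;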

Requirements

Before you can access an SPD Engine table using the Hive SerDe, you must perform the following steps:

  • Deploy the SAS Foundation software using SAS Deployment Wizard.
  • Select the product name “SAS Hive SerDe for SPDE Data”

Accessing SPD Engine Data using Hive

This creates a subfolder under $sashome containing the SerDe JAR files:

[root@sasserver01 9.4]# pwd
/opt/sas/sashome/SASHiveSerDeforSPDEData/9.4
[root@sasserver01 9.4]# ls -l
total 88
drwxr-xr-x. 3 sasinst sas 4096 Mar 8 15:52 installs
-r-xr-xr-x. 1 sasinst sas 8615 Apr 15 2015 sashiveserdespde-installjar.sh
-rw-r--r--. 1 sasinst sas 62254 Jun 24 2015 sas.HiveSerdeSPDE.jar
-rw-r--r--. 1 sasinst sas 6998 Jun 24 2015 sas.HiveSerdeSPDE.nls.jar
[root@sasserver01 9.4]#

  • You must be running a supported Hadoop distribution that includes Hive 0.13 or later:

> Cloudera CDH 5.2 or later
> Hortonworks HDP 2.1 or later
> MapR 4.0.2 or later

  • The SPD Engine table stored in HDFS must have been created using the SPD Engine.
  • The Hive SerDe is delivered as two JAR files, which must be deployed to all nodes in the Hadoop cluster.
  • The SPD Engine table must be registered in the Hive metastore using the metastore registration utility supplied by SAS. You cannot use any other method to register tables.

Deploying the Hive SerDe on the Hadoop cluster

Deploy the SAS Hive SerDe on the Hadoop cluster by executing the script "sashiveserdespde-installjar.sh". This script is located in the folder where the SAS Hive SerDe software is deployed. Follow the steps below, which describe the SAS Hive SerDe deployment on a Hadoop cluster.

  • Copy the script file along with two JAR files to one of the nodes (NameNode server). For example, in my test environment, files were copied to the sascdh01 (NameNode) server with user ‘hadoop’.

[hadoop@sascdh01 SPDEHiveSerde]$ pwd
/home/hadoop/SPDEHiveSerde
[hadoop@sascdh01 SPDEHiveSerde]$ ls -l
total 84
-rwxr-xr-x 1 hadoop hadoop 8615 Mar 8 15:57 sashiveserdespde-installjar.sh
-rw-r--r-- 1 hadoop hadoop 62254 Mar 8 15:57 sas.HiveSerdeSPDE.jar
-rw-r--r-- 1 hadoop hadoop 6998 Mar 8 15:57 sas.HiveSerdeSPDE.nls.jar
[hadoop@sascdh01 SPDEHiveSerde]$

  • The node server (NameNode) must be able to use SSH to access the other data nodes in the cluster. It's recommended to execute the deployment script as user 'root' or with the sudo su command.
  • Switch user to ‘root’ or user with ‘sudo su’ permission.
  • Set the Hadoop CLASSPATH to include the MapReduce and Hive library installation directories. Set SERDE_HOSTLIST to include the servers where the JAR files will be deployed. For example, the following statements are used in my test environment.

export CLASSPATH=/usr/lib/hive/lib/*:/usr/lib/hadoop-mapreduce/lib/*
export HADOOP_CLASSPATH=$CLASSPATH
export SERDE_HOSTLIST="xxxxx..xxxx.com xxxxx..xxxx.com xxxxx..xxxx.com"

  • Execute the script as user 'root' to deploy the JAR files on all nodes under the "../hive/lib" and "../hadoop-mapreduce/lib" subfolders. While running the script, provide the locations of the MapReduce and Hive library installation folders as parameters to the script.

For example:

sh sashiveserdespde-installjar.sh -mr /usr/lib/hadoop-mapreduce/lib -hive /usr/lib/hive/lib
[root@sascdh01 SPDEHiveSerde]# sh sashiveserdespde-installjar.sh -mr /usr/lib/hadoop-mapreduce/lib -hive /usr/lib/hive/lib
scp -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null /root/Downloads/SPDEHiveSerde/sas.HiveSerdeSPDE.jar root@sascdh01:/usr/lib/hive/lib
....
.........
scp -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null /root/Downloads/SPDEHiveSerde/sas.HiveSerdeSPDE.nls.jar root@sascdh03:/usr/lib/hadoop-mapreduce/lib
[root@sascdh01 SPDEHiveSerde]#

  • Restart YARN/MapReduce and Hive services on the Hadoop cluster.

Registering the SAS Scalable Performance Data Engine table in Hive metadata

The SPD Engine table that you are planning to access from Hive must be registered to Hive metadata using the SAS provided metadata registration utility. You cannot use any other method to register tables. The utility reads an SPD Engine table’s metadata file (.mdf) in HDFS and creates Hive metadata in the Hive metastore as table properties. Registering the SPD Engine table projects a schema-like structure onto the table and creates Hive metadata about the location and structure of the data in HDFS.

Because the utility reads the SPD Engine table’s metadata file that is stored in HDFS, if the metadata is changed by the SPD Engine, you must re-register the table.

The metadata registration utility can be executed from one of the Hadoop cluster node servers, preferably the NameNode server. The code examples shown here are all from the NameNode server.

The following steps describe the SPD Engine table registration to Hive metadata.

  • Set the Hadoop CLASSPATH to include a directory with the client Hadoop configuration files and SerDe JAR files.

The following example is from my test environment, where the two SerDe JAR files are copied under the "/home/hadoop/SPDEHiveSerde/" subfolder. This subfolder is owned by OS user 'hadoop', i.e., the user who will execute the table registration utility. When exporting the CLASSPATH, you must also include the ../hive/lib folder. For the Hadoop configuration XML files, the /etc/hive/conf folder is used here; if you have a separate folder for storing Hadoop configuration files, you can plug in that folder.

export CLASSPATH=/home/hadoop/SPDEHiveSerde/*:/usr/lib/hive/lib/*
export SAS_HADOOP_CONFIG_PATH=/etc/hive/conf/
export HADOOP_CLASSPATH=$SAS_HADOOP_CONFIG_PATH:$CLASSPATH

As a result of exporting the Hadoop CLASSPATH, the output from the 'hadoop classpath' statement should look as follows. Notice the values that you included in your previous export statements.

[hadoop@sascdh01 ~]$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/etc/hive/conf/:/home/hadoop/SPDEHiveSerde/*:/usr/lib/hive/lib/*
[hadoop@sascdh01 ~]$

  • Run the SerDe JAR command with the appropriate parameters and options to register the SPD Engine table. For example, the following command executes the SerDe JAR file and registers an SPD Engine table named stocks. It specifies the HDFS directory location (/user/lasradm/SPDEData) that contains the .mdf file of that SPD Engine table. The -table and -mdflocation parameters are required.

hadoop jar /home/hadoop/SPDEHiveSerde/sas.HiveSerdeSPDE.jar com.sas.hadoop.serde.spde.hive.MetastoreRegistration -table stocks -mdflocation /user/lasradm/SPDEData

[hadoop@sascdh01 ~]$ hadoop jar /home/hadoop/SPDEHiveSerde/sas.HiveSerdeSPDE.jar com.sas.hadoop.serde.spde.hive.MetastoreRegistration -table stocks -mdflocation /user/lasradm/SPDEData
16/03/09 16:46:35 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxxxxxx.xxxx.xxx.com:9083
16/03/09 16:46:35 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/03/09 16:46:36 INFO hive.metastore: Connected to metastore.
16/03/09 16:46:36 INFO hive.MetastoreRegistration: Table is registered in the Hive metastore as default.stocks
[hadoop@sascdh01 ~]$

Reading SAS Scalable Performance Data Engine table data from Hive

Once the SPD Engine table is registered in Hive metadata, you can query the SPD Engine table data via Hive. If you describe the table with the formatted option, you will see that the data file locations are the SPD Engine locations. The Storage section provides information about SerDe library, which is ‘com.sas.hadoop.serde.spde.hive.SpdeSerDe’.

hive> show tables;
OK
…..
…….

stocks
Time taken: 0.025 seconds, Fetched: 15 row(s)
hive>

hive> select count(*) from stocks;
Query ID = hadoop_20160309171515_9db3aed5-0ba4-40cc-acc4-56acee10a275
Total jobs = 1
…..
……..
………………
Total MapReduce CPU Time Spent: 2 seconds 860 msec
OK
699
Time taken: 38.734 seconds, Fetched: 1 row(s)
hive>

hive> describe formatted stocks;
OK
# col_name data_type comment

stock varchar(9) from deserializer
date date from deserializer
open double from deserializer
high double from deserializer
low double from deserializer
close double from deserializer
volume double from deserializer
adjclose double from deserializer

# Detailed Table Information
Database: default
Owner: anonymous
CreateTime: Wed Mar 09 16:46:36 EST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://xxxxxxxxx.xxxx.xxx.com:8020/user/lasradm/SPDEData/stocks_spde
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
adjclose.length 8
adjclose.offset 48
close.length 8
close.offset 32
date.length 8
date.offset 0
high.length 8
high.offset 16
low.length 8
low.offset 24
numFiles 0
numRows -1
open.length 8
open.offset 8
rawDataSize -1
spd.byte.order LITTLE_ENDIAN
spd.column.count 8
spd.encoding ISO-8859-1
spd.mdf.location hdfs://xxxxxxxxx.xxxx.xxx.com:8020/user/lasradm/SPDEData/stocks.mdf.0.0.0.spds9
spd.record.length 72
spde.serde.version.number 9.43
stock.offset 56
totalSize 0
transient_lastDdlTime 1457559996
volume.length 8
volume.offset 40

# Storage Information
SerDe Library: com.sas.hadoop.serde.spde.hive.SpdeSerDe
InputFormat: com.sas.hadoop.serde.spde.hive.SPDInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.166 seconds, Fetched: 60 row(s)
hive>

How the SAS Scalable Performance Data Engine SerDe reads data

The SerDe reads the data using the encoding of the SPD Engine table. Make sure that the SPD Engine table name is appropriate for the encoding associated with the cluster.

Current SerDe Implementation of Data Conversion from SAS to Hive


Limitations

If the SPD Engine table in HDFS has any of the following features, it cannot be registered in Hive or use the SerDe. You must access it by going through SAS and the SPD Engine. The following table features are not supported:

  • Compressed or encrypted tables
  • Tables with SAS informats
  • Tables that have user-defined formats
  • Password-protected tables
  • Tables owned by the SAS Scalable Performance Data Server

Reference documents

SAS(R) 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System, Third Edition

 

tags: SAS Administrators, SAS Professional Services

Accessing SPD Engine Data using Hive was published on SAS Users.

February 23, 2016
 

Copy Data to Hadoop using SAS

The SAS Data Loader directive 'Copy Data to Hadoop' enables you to copy data from a DBMS to Hadoop Hive tables.

The SAS Data Loader for Hadoop can be configured to copy data from any external database which offers JDBC database connectivity. SAS Data Loader uses the Apache Sqoop™ and Oozie components installed with the Hadoop cluster to copy data from external databases. SAS Data Loader accesses the same external database directly to display schemas and lists of tables. For this reason, the SAS Data Loader client needs the same set of JDBC drivers that are installed on the Hadoop cluster.

The SAS Data Loader directive "Copy Data to Hadoop" generates and submits an Oozie workflow with Sqoop tasks to an Oozie server on the Hadoop cluster. The Oozie workflow executes on the Hadoop cluster using MapReduce steps to copy data from the database into HDFS.

Prerequisites

  • The external database-specific JDBC driver installed on the Hadoop cluster and on the SAS Data Loader client machine
  • Access to Oozie and Sqoop services running on the Hadoop cluster
  • A valid user ID and password to access the RDBMS database
  • A valid user ID and password to access the Hadoop cluster
  • The database connectors (JAR files) placed in the Sqoop lib and in the Oozie shared lib folder on HDFS
  • Database connection defined for vendor-specific database under SAS Data Loader database configuration

Database connector

To connect to and access external databases from a Hadoop environment, you need a vendor-specific JDBC driver. To copy data from an external database to a Hadoop Hive table using Oozie and Sqoop, the database vendor's JDBC driver (JAR file) needs to be installed in the $sqoop_home/lib folder on the Hadoop cluster and in the Oozie shared lib folder on HDFS.

For example, if you are connecting to a MySQL database, you place the MySQL connector JAR file under the $sqoop_home/lib path at the OS level and under the Oozie shared lib folder /user/oozie/share/lib/lib_20151218113109/sqoop/ on HDFS.

[root@sascdh01 lib]# pwd
/usr/lib/sqoop/lib
[root@sascdh01 lib]# ls -l mysql*
-rw-r--r-- 1 root root 972009 Feb 1 15:08 mysql-connector-java-5.1.36-bin.jar
[root@sascdh01 lib]#

[root@sascdh01 ~]# hadoop fs -ls /user/oozie/share/lib/lib_20151218113109/sqoop/mysql-connector*
-rw-r--r-- 3 oozie oozie 972009 2016-01-29 18:09 /user/oozie/share/lib/lib_20151218113109/sqoop/mysql-connector-java-5.1.36-bin.jar
[root@sascdh01 ~]#

The same JDBC driver must also be installed on the SAS Data Loader client machine.

Example

The following SAS Data Loader directive example illustrates copying data from a MySQL database table to a Hadoop Hive table. The data transfer operation executes on the Hadoop cluster using an Oozie workflow and MapReduce steps. The data is streamed directly from the database server to the Hadoop cluster.


The following screen illustrates the database configuration set-up using JDBC mechanism for this example.


Here is a code extract from the above directive. You'll notice it is submitting an Oozie workflow to the Hadoop cluster.

And here is the log extract from the above directive execution, showing execution of the Oozie workflow:

On the Oozie web console, you can see the Oozie jobs that have been submitted; their status shows as running, killed, or succeeded. By double-clicking the job ID, you can view the sequence of actions executed by the Oozie workflow.



On a YARN Resource Manager user interface, you can see the MapReduce tasks that have been submitted to execute the Sqoop and Hive tasks.


On HDFS, you can see the data files that have been created per the target table mentioned in the SAS Data Loader directive.

[root@sascdh01 ~]# hadoop fs -ls /user/hive/warehouse/dept
Found 2 items
-rwxrwxrwt 3 sasdemo hadoop 22 2016-02-01 15:25 /user/hive/warehouse/dept/part-m-00000
-rwxrwxrwt 3 sasdemo hadoop 22 2016-02-01 17:14 /user/hive/warehouse/dept/part-m-00000_copy_1
[root@sascdh01 ~]#

Calling the SAS Data Loader directive using REST API

Representational State Transfer (REST) is an architectural style for designing web services that typically communicate over Hypertext Transfer Protocol (HTTP) using Uniform Resource Identifier (URI) paths. A REST application programming interface (API) is a set of routines, protocols, and tools for building software applications. A REST API is a collection of URIs, HTTP calls to those URIs, and some JSON or XML resource representations that frequently contain relational links.

The purpose of the SAS Data Loader REST API is to give scheduling systems the ability to run a SAS Data Loader directive. This API is included with SAS Data Loader and enables users to obtain a list of the existing directives and their IDs and then execute the directive without using the user interface. This API also enables users to create batch files to execute a directive and monitor the job’s state. The batch file can also be scheduled to run at specific times or intervals.

The SAS Data Loader REST API consists of the following components:

  • Base URI – http://www.hostname.com/SASDataLoader/rest
  • Resources – the base URI plus additional path levels that provide context for the service request, such as http://www.hostname.com/SASDataLoader/rest/customer/1234.
  • HTTP methods – a call such as GET, PUT, POST, or DELETE that specifies the action.
  • Representations (Media Types) – an Internet media type for the data that is a JSON or XML representation of the resource.
  • Status Codes – the HTTP response code to the action.

Example

The following example illustrates the execution of a saved SAS Data Loader directive using the curl command-line tool. The "Cygwin with curl" application has been installed on the SAS Data Loader client machine to demonstrate this example. The saved directive name used in this example is 'ProfileData'.

To call the SAS Data Loader directive from external application using REST API, first determine the ID of the directive and locate the URL to run the directive. The following curl statement displays information about the supplied directive name, which includes the directive ID and the URL to run.

$ curl -H "Accept: application/json" http://192.168.180.132/SASDataLoader/rest/directives?name=ProfileData

output:
{"links":{"version":1,"links":[{"method":"GET","rel":"self","href":"http://192.168.180.132:80/SASDataLoader/rest/directives?name=ProfileData&start=0&limit=10","uri":"/directives?name=ProfileData&start=0&limit=10"}]},"name":"items","accept":"application/vnd.sas.dataloader.directive.summary+json","start":0,"count":1,"items":[{"links":[{"method":"GET","rel":"up","href":"http://192.168.180.132:80/SASDataLoader/rest/directives","uri":"/directives","type":"application/vnd.sas.collection+json"},{"method":"GET","rel":"self","href":"http://192.168.180.132:80/SASDataLoader/rest/directives/7ab97872-6537-469e-8e0b-ecce7b05c2c5","uri":"/directives/7ab97872-6537-469e-8e0b-ecce7b05c2c5","type":"application/vnd.sas.dataloader.directive+json"},{"method":"GET","rel":"alternate","href":"http://192.168.180.132:80/SASDataLoader/rest/directives/7ab97872-6537-469e-8e0b-ecce7b05c2c5","uri":"/directives/7ab97872-6537-469e-8e0b-ecce7b05c2c5","type":"application/vnd.sas.dataloader.directive.summary+json"},{"method":"POST","rel":"execute","href":"http://192.168.180.132:80/SASDataLoader/rest/jobs?directive=7ab97872-6537-469e-8e0b-ecce7b05c2c5","uri":"/jobs?directive=7ab97872-6537-469e-8e0b-ecce7b05c2c5","type":"application/vnd.sas.dataloader.job+json"}],"version":1,"id":"7ab97872-6537-469e-8e0b-ecce7b05c2c5","name":"ProfileData","description":"Generate a profile report of the data in a table","type":"profileData","category":"dataQuality","creationTimeStamp":"2016-02-02T15:08:36.931Z","modifiedTimeStamp":"2016-02-02T15:08:36.931Z"}],"limit":10,"version":2}

$

Once the directive ID and URL are identified, execute the saved directive by submitting an HTTP POST request to that URL. When the POST request is submitted, it returns JSON text containing the job ID.

$ curl -H "Accept: application/json" -X POST http://192.168.180.132/SASDataLoader/rest/jobs?directive=7ab97872-6537-469e-8e0b-ecce7b05c2c5

output:
{"links":[{"method":"GET","rel":"self","href":"http://192.168.180.132:80/SASDataLoader/rest/jobs/22","uri":"/jobs/22"},{"method":"DELETE","rel":"delete","href":"http://192.168.180.132:80/SASDataLoader/rest/jobs/22","uri":"/jobs/22"},{"method":"GET","rel":"state","href":"http://192.168.180.132:80/SASDataLoader/rest/jobs/22/state","uri":"/jobs/22/state"},{"method":"GET","rel":"code","href":"http://192.168.180.132:80/SASDataLoader/rest/jobs/22/code","uri":"/jobs/22/code"},{"method":"GET","rel":"log","href":"http://192.168.180.132:80/SASDataLoader/rest/jobs/22/log","uri":"/jobs/22/log"},{"method":"PUT","rel":"cancel","href":"http://192.168.180.132:80/SASDataLoader/rest/jobs/22/state?value=canceled","uri":"/jobs/22/state?value=canceled"}],"version":1,"id":"22","state":"starting","directiveName":"ProfileData","elapsedTime":0.0}

Using the job ID from the previous statement's output, you can view the status of the submitted job. The following statements show the status as "running" while the job is executing and "completed" when it finishes.

$ curl -H "Accept: text/plain" http://192.168.180.132/SASDataLoader/rest/jobs/22/state

output:
running

$ curl -H "Accept: text/plain" http://192.168.180.132/SASDataLoader/rest/jobs/22/state

output:
completed

The status of a SAS Data Loader directive that has been called and executed using the REST API can also be viewed and monitored in the SAS Data Loader "Run Status" window. The following screen captures illustrate the in-progress and successfully completed jobs that were called using the REST API.


 

 

tags: SAS Administrators, SAS Data Loader for Hadoop, SAS Professional Services

Copy data to Hadoop using SAS Data Loader was published on SAS Users.

October 28, 2015
 

SAS 9.4 M3 introduces a new procedure named PROC SQOOP. This procedure enables users to access the Apache Sqoop utility from a SAS session to transfer data between a database and HDFS. Using SAS PROC SQOOP lets you submit Sqoop commands from within your SAS application to your Hadoop cluster.
PROC SQOOP is licensed with SAS/ACCESS® Interface to Hadoop; it's not part of the Base SAS® license. PROC SQOOP is supported in UNIX and Windows SAS.

Sqoop commands are passed to the cluster using the Apache Oozie Workflow Scheduler for Hadoop. PROC SQOOP defines an Oozie workflow for your Sqoop task, which is then submitted to an Oozie server using a RESTful API.

PROC SQOOP works similarly to the Apache Sqoop command-line interface (CLI), using the same syntax. The procedure provides feedback as to whether the job completed successfully and where to get more details in your Hadoop cluster.

Database Connector

Sqoop can be used with any Java Database Connectivity (JDBC) compliant database and automatically supports several databases. In some cases, the database vendor’s JDBC driver (JAR file) might need to be installed in the “$sqoop_home/lib” path on the Sqoop client machine.

For example, if you are connecting to a MySQL database, you place the MySQL connector JAR file under the Oozie lib folder (for example, /usr/lib/oozie/lib):

    [root@sascdh01 ~]# ls -l /usr/lib/oozie/lib/my*
    -rw-r--r-- 1 root root 972009 Oct 15 02:48 /usr/lib/oozie/lib/mysql-connector-java-5.1.36-bin.jar

SQOOP setup

  • Download Sqoop-1 (1.4.x) from the Apache Sqoop website. Sqoop 1.4.5 is recommended. Be sure to get a Sqoop JAR file that is compatible with your Hadoop.
  • For each database that you plan to use Sqoop with, you must download a compatible JDBC driver or Sqoop connector from the associated database vendor. The connectors (JAR files) should be placed in the Sqoop lib directory and in the Oozie share lib in HDFS.
  • PROC SQOOP uses your Hadoop cluster configuration files. Set the environment variable SAS_HADOOP_CONFIG_PATH, which points to the location of your Hadoop configuration directory.
  • SQOOP JAR files are not required on the SAS client machine. PROC SQOOP uses Apache Oozie, which provides REST API communication to the Hadoop cluster without local JAR files. Set the environment variable SAS_HADOOP_RESTFUL=1 to connect to the Hadoop server by using the WebHDFS or HttpFS REST API.

To use PROC SQOOP, you must have the following information:

  • Database connection information; each vendor has its own connection options
  • Database user ID and password
  • HDFS file that contains database password ( for some database cases)
  • Hadoop user ID and password
  • Oozie URL ( host and port #)
  • NameNode service URL information (host and port #)
  • JobTracker/Resource manager service URL information (host and port #)
  • Oozie Workflow output Path
  • Sqoop command

Example
The following code example illustrates a data transfer from a MySQL database table to HDFS. The data transfer operation executes on the Hadoop cluster using an Oozie workflow and MapReduce steps. The data is streamed directly from the database server to the Hadoop cluster without routing through the SAS Workspace Server. In the PROC SQOOP statement, you provide the environment properties for where the data is located and the target location. In the command section, you provide the native Sqoop statement for the specific required actions.

OPTIONS SET=SAS_HADOOP_CONFIG_PATH="/opt/sas/thirdparty/Hadoop_Conf/CDH524";
OPTIONS SET=SAS_HADOOP_RESTFUL=1 ;

proc sqoop
 hadoopuser='sasdemo'
 dbuser='hdp' dbpwd='xxxxxx'
 oozieurl='http://xxxxxxx.xxxx.sas.com:11000/oozie'
 namenode='hdfs://xxxxxxx.xxxx.sas.com:8020'
 jobtracker='xxxxxxx.xxxx.sas.com:8032'
 wfhdfspath='hdfs://xxxxxxx.xxxx.sas.com:8020/user/sasdemo/myworkflow.xml'
 deletewf
 command=' import --connect jdbc:mysql://XXXXX.XXXX.sas.com/hdpdata --append -m 1 --table department --target-dir /user/sasdemo/department ';
 run;

Log extract from the above code execution
………….
…………………..
NOTE: SAS initialization used:
real time 0.03 seconds
cpu time 0.02 seconds

1 OPTIONS SET=SAS_HADOOP_CONFIG_PATH="/opt/sas/thirdparty/Hadoop_Conf/CDH524";
2 OPTIONS SET=SAS_HADOOP_RESTFUL=1 ;
3
4
5
6 proc sqoop
7 hadoopuser='sasdemo'
8 dbuser='hdp' dbpwd=XXXXXXXXX
9 oozieurl='http://xxxxx.xxxx.sas.com:11000/oozie'
10 namenode='hdfs://xxxxx.xxxx.sas.com:8020'
11 jobtracker=xxxxx.xxxx.sas.com:8032'
12 wfhdfspath='hdfs://xxxxx.xxxx.sas.com:8020/user/sasdemo/myworkflow.xml'
13 deletewf
14 command=' import --connect jdbc:mysql://xxxxx.xxxx.sas.com/hdpdata --append -m 1 --table department
--target-dir
14 ! /user/sasdemo/department ';
15 run;

NOTE: Job ID : 0000004-151015031507797-oozie-oozi-W
NOTE: Status : SUCCEEDED
NOTE: PROCEDURE SQOOP used (Total process time):
real time 55.89 seconds
cpu time 0.05 seconds
……….
………………..

On the Oozie web console, you can see the Oozie jobs that have been submitted; their status shows as running, killed, or succeeded.
On the YARN Resource Manager user interface, you can see the MapReduce tasks that have been submitted and are executing.

On HDFS, you can see the data files that have been created per the --target-dir specified in the Sqoop command.

 

[root@sascdh01 ~]# hadoop fs -ls /user/sasdemo/department
Found 3 items
-rw-r--r-- 3 sasdemo supergroup 22 2015-10-15 10:56 /user/sasdemo/department/part-m-00000
-rw-r--r-- 3 sasdemo supergroup 22 2015-10-15 11:54 /user/sasdemo/department/part-m-00001
-rw-r--r-- 3 sasdemo supergroup 22 2015-10-15 12:03 /user/sasdemo/department/part-m-00002

Dependencies on SAS_HADOOP_RESTFUL environment variable
The SAS_HADOOP_RESTFUL environment variable determines whether to connect to the Hadoop server through JAR files, HttpFS, or WebHDFS. The default setting for this variable is 0, which connects to the Hadoop cluster using JAR files. PROC SQOOP uses Apache Oozie, which provides REST API communication to a Hadoop cluster without JAR files, so when running PROC SQOOP you need this variable set to 1.
If you don't set the environment variable SAS_HADOOP_RESTFUL=1 in your SAS session, you could see a strange error message while executing the PROC SQOOP statement. The following SAS log reports issues with the 'hadoopuser' parameter, and the process assumes that the Hadoop cluster is enabled with Kerberos security. However, security is disabled on this Hadoop cluster, so the Kerberos error message is misleading.

..........
.....
NOTE: SAS initialization used:
real time 0.02 seconds
cpu time 0.02 seconds

1 OPTIONS SET=SAS_HADOOP_CONFIG_PATH="/opt/sas/thirdparty/Hadoop_Conf/CDH524";
2 /* OPTIONS SET=SAS_HADOOP_RESTFUL=1; */
3
4
5 proc sqoop
6 hadoopuser='sasdemo'
7 dbuser='hdp' dbpwd=XXXXXXXXX
8 oozieurl='http://xxxxx.xxxx.sas.com:11000/oozie'
9 namenode='hdfs://xxxxx.xxxx.sas.com:8020'
10 jobtracker=xxxxx.xxxx.sas.com:8032'
11 wfhdfspath='hdfs://xxxxx.xxxx.sas.com:8020/user/sasdemo/myworkflow.xml'
12 deletewf
13 command=' import --connect jdbc:mysql://xxxxx.xxxx.sas.com/hdpdata --append -m 1 --table department
--target-dir
13 ! /user/sasdemo/department ';
14 run;

ERROR: HADOOPUSER should not be provided for Kerberos enabled clusters.
ERROR: The path was not found: /user/sasdemo/SAS_SQOOPaq7ddo96.
NOTE: PROCEDURE SQOOP used (Total process time):
real time 3.41 seconds
cpu time 0.09 seconds
...........
......

Reference document

Base SAS® Procedure Guide, Fourth Edition

tags: Hadoop, PROC SQOOP, SAS Professional Services

Using SAS PROC SQOOP was published on SAS Users.