SAS Data Integration Studio

1月 182019
 

It seems that everyone knows about GitHub -- the service that hosts many popular open source code projects. The underpinnings of GitHub are based on Git, which is itself an open-source implementation of a source management system. Git was originally built to help developers collaborate on Linux (yet another famous open source project) -- but now we all use it for all types of projects.

There are other free and for-pay services that use Git, like Bitbucket and GitLab. And there are countless products that embed Git for its versioning and collaboration features. In 2014, SAS developers added built-in Git support for SAS Enterprise Guide.

Since then, Git (and GitHub) have grown to play an even larger role in data science operations and DevOps in general. Automation is a key component for production work -- including check-in, check-out, commit, and rollback. In response, SAS has added Git integration to more SAS products, including:

  • the Base SAS programming language, via a collection of SAS functions.
  • SAS Data Integration Studio, via a new source control plugin
  • SAS Studio (experimental in v3.8)

You can use this Git integration with any service that supports Git (GitHub, GitLab, etc.), or with your own private Git servers and even just local Git repositories.

SAS functions for Git

Git infrastructure and functions were added to SAS 9.4 Maintenance 6. The new SAS functions all have the helpful prefix of "GITFN_" (signifying "Git fun!", I assume). Here's a partial list:

GITFN_CLONE  Clones a Git repository (for example, from GitHub) into a directory on the SAS server.
GITFN_COMMIT  Commits staged files to the local repository
GITFN_DIFF Returns the number of diffs between two commits in the local repository and creates a diff record object for the local repository.
GITFN_PUSH  Pushes the committed files in the local repository to the remote repository.
GITFN_NEW_BRANCH  Creates a Git branch

 

The function names make sense if you're familiar with Git lingo. If you're new to Git, you'll need to learn the terms that go with the commands: clone, repo, commit, stage, blame, and more. This handbook provided by GitHub is friendly and easy to read. (Or you can start with this xkcd comic.)

You can

data _null_;
 version = gitfn_version();
 put version=;             
 
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "c:\Projects\sas-dummy-blog");
 put rc=;
run;

In one line, this function fetches an entire collection of code files from your source control system. Here's a more concrete example that fetches the code to a work space, then runs a program from that repository. (This is safe for you to try -- here's the code that will be pulled/run. It even works from SAS University Edition.)

options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
libname repo "&repoPath.";
libname repo clear;
 
/* Fetch latest code from GitHub */
data _null_;
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "&repoPath.");
 put rc=;
run;
 
/* run the code in this session */
%include "&repoPath./rng_example_thanos.sas";

You could use the other GITFN functions to stage and commit the output from your SAS jobs, including log files, data sets, ODS results -- whatever you need to keep and version.

Using Git in SAS Data Integration Studio

SAS Data Integration Studio has supported source control integration for many years, but only for CVS and Subversion (still in wide use, but they aren't media darlings like GitHub). By popular request, the latest version of SAS Data Integration Studio adds support for a Git plug-in.

Example of Git in SAS DI Studio

See the documentation for details:

Read more about setup and use in the available here as part of our "Custom Tasks Tuesday" series.

Using Git in SAS Enterprise Guide

This isn't new, but I'll include it for completeness. SAS Enterprise Guide supports built-in Git repository support for SAS programs that are stored in your project file. You can use this feature without having to set up any external Git servers or repositories. Also, SAS Enterprise Guide can recognize when you reference programs that are managed in an external Git repository. This integration enables features like program history, compare differences, commit, and more. Read more and see a demo of this in action here.

program history

If you use SAS Enterprise Guide to edit and run SAS programs that are managed in an external Git repository, here's an important tip. Change your project file properties to "Use paths relative to the project for programs and importable files." You'll find this checkbox in File->Project Properties.

With this enabled, you can store the project file (EGP) and any SAS programs together in Git, organized into subfolders if you want. As long as these are cloned into a similar structure on any system you use, the file paths will resolve automatically.

The post Using built-in Git operations in SAS appeared first on The SAS Dummy.

10月 192017
 

The goal of this article is to describe the steps needed to configure the bridge that allows SAS Data Integration 4.902, based on SAS 9.4M4, to load data directly into CAS on SAS Viya 3.2.

Of course, SAS 9.4M5 simplifies this process, as a SAS/CONNECT communication will no longer be required, enabled by a direct function within SAS Data Integration to CAS - but for those of you who may not move immediately from SAS 9.4M4 to SAS 9.4M5, this could be helpful.

It is assumed here that SAS/CONNECT has been installed and configured on both environments, SAS 9.4M4 and SAS Viya 3.2.

Validate the connection from SAS 9.4M4 to SAS Viya 3.2

⇒     Check the status of the SAS/CONNECT Spawner on SAS Viya, on the machine where this service is installed.

SAS Viya 3.2 Bridge for SAS Data Integration Studio

⇒     Note the machine and the listening port of the SAS/CONNECT Spawner on SAS Viya.
⇒     Open SAS Studio on SAS 9.4M4 and sign-in.
⇒     Run the following SAS code with your machine details and a valid SAS Viya user account and check the results.

SAS Viya 3.2 Bridge for SAS Data Integration Studio

⇒     If successful, sign-off the SAS/CONNECT session and sign-out from SAS Studio SAS 9.4M4

Setup SAS9.4M4 metadata

⇒     Open SAS Management Console 9.4M4 as sasadm@saspw.
⇒     On the “Server Manager” plugin, add a “New Server…”

  • Server type: “SAS Application Server”
  • Name: “SASViya”
  • Select “Connect Server” as the sub-type of server you want to add to this SAS Application Server
  • Configure the “Connect Server” as shown below (you might have to create a new authentication domain for SAS Viya) and set the values accordingly (server where the SAS/CONNECT Spawner on SAS Viya is listening)

⇒     On the “Server Manager” plugin, add a “New Server…”

  • Server type: “SAS Cloud Analytic Services Server”
  • Name: “CAS Server”
  • Configure the “CAS Server” as shown below and set the values accordingly (CAS controller)

⇒     On the “User Manager” plugin, set a login for the SASViya application server, on a user or group that you will use in SAS Data Integration Studio

⇒     On the “Data Library Manager” plugin, add a “New Library…”

  • Library type: “SAS Cloud Analytic Services Library”
  • Name: CAS_DATA
  • Assign the library to the SASViya server

⇒     Configure the CAS library as shown below and set the values accordingly (the CASLIB must exist in the SAS Viya environment; here CASPATH is the name of an existing CASLIB).

⇒     Specify the server and connection information as shown below:

Build a SAS Data Integration Studio job to load data into CAS

⇒     Open SAS Data Integration Studio 4.902 as a user who holds, directly or not, a login for the ViyaAuth authentication domain.
⇒     Test the CAS_DATA library by “Register(ing) tables…”

  • In SAS Environment Manager on SAS Viya, some tables must have been loaded before into the CASLIB (the one that is pointed on by the CAS library, here CASPATH), so that you can display some tables in the “Register Tables…” wizard.
  • If you see some tables then it looks like you are all set.

⇒     If you want to go further and test the “Cloud Analytic Services Transfer” DI transformation, create the metadata for a target table in CAS with appropriate columns.

⇒     Build a job that loads a source table to this target table using the “Cloud Analytic Services Transfer” (“Access” folder) as shown below:

⇒     The “Cloud Analytic Services Transfer” is basically a “Data Transfer” transformation that fits with CAS; it enables you to provide some CAS options such as COPIES; by default the table is PROMOTED.

⇒     Run it and check if it ran successfully and if the table is available from the SAS Viya world.

SAS/CONNECT on SAS Viya configuration directory

⇒     The SAS/CONNECT on SAS Viya configuration is located by default here /opt/sas/viya/config/etc/connectserver/default

⇒     You might want to customize/clean some of the files here.

  • Check the CASHOST option in autoexec files, sometimes the value is not appropriate.

Normally, options here are sourced from the CONNECTSERVER_CONFIGURATION in vars.yml during the deployment of SAS Viya.

SAS Viya 3.2 Bridge for SAS Data Integration Studio 4.902 (on SAS 9.4M4) configuration steps was published on SAS Users.

1月 132013
 

SAS2DISJob

This is for for SAS programmers who might be reluctant to check out a GUI tool like SAS Data Integration Studio (DIS for short, which is graphic tool to implement ETL processes: extract, transform, load). DIS translates all users dragged nodes, transformations and process into SAS codes which are traditionally written by SAS programmers. I find one of the benefits of using DIS is that I can package all the ETL work to other players(technical or nontechnical) then they can replay the job by dragging and clicking.

Actually besides translating all the visual nodes to SAS codes, DIS can also reversely incorporate your SAS codes to the graphic job shown as above. So SAS programmers can easily jump into DIS by importing their codes to get the graphic workflow (to get the first impression). Here is an one-minute demo:

Save the following codes in a file(from SAS SQL onlinedoc),

%let dir=C:\;

libname source BASE "&dir";
libname target BASE "&dir";

data source.one;
    input x y;
datalines;
1 2
2 3
;

data source.two;
    input x y;
datalines;
2 5
3 6
4 9
;

proc sql;
    create table target.three as
        select o.x,o.y
        from source.one as o, source.two as t
        where o.x=t.x
        ;
quit;

proc print data=target.three;
run;

Just create a test folder in DIS (this demo was created in a working repository, in a Windows DIS 4.21 machine ) then right click mouse to “Import”-“Import SAS Code” to import the file created above then run the job generated and all set (you will get all stuff showed above).

Note that the SAS libraries must be registered first in the SAS Metadata Server then you can use it. This demo omit this process only for demo purpose.

You can check out the SQL Join by double clicking the SQL node:

SAS2DISJob_SQLJoin

and your codes in Code panel (still yours!):

image

Furthermore, without getting know bunch of the tool first, you can jump into DIS quickly by creating customized Transformations which are basically SAS codes with macro variables as the perimeters in the visual options box. Pretty neat? Just check it out!

12月 232012
 

logo

Last week I got another SAS certification, SAS Certified Data Integration Developer for SAS 9 and I must say it can’t be better as a SAS holiday gift for myself!

I took all my SAS certificates when employed (when in Sanofi Pasteur, 2010, I got two SAS programming certificates, Base and Advanced). You may say it’s kind of certificate for sake of certificate and not for applying a job like college students. You are right and few reasons I like to earn such certificates at work:

  • It doesn’t hurt! Little better than nothing and holding a certificate will never make a thing worse off.
  • It’s highly relevant. Only few additional efforts needed to take such SAS certificate examinations when worked as SAS programmer and solution consultant.
  • It’s necessary(somehow). When worked as a SAS programmer at pharma,  I played data manipulations techniques heavily, but never used a indexing method because it’s not needed due to limited cases of clinical data: recruiting a thousand subjects is even not a easy task! Then I took SAS Advance test and I filled the hole of SAS index.
  • It’s fun; actually I mean the certificate fee was paid by my employers (thanks!)…I got this from Prof. N. Gregory Mankiw:

Most people who pursue an academic career do so because they are fascinated by their subject. It is for this reason that professors report among the highest rates of job satisfaction of all professions. Professors have found what they like to do, and they have found someone to pay them to do it.

(then change “people/professor” to “SAS Programmer”)

I’m always enamored of playing with SAS in any kind of forms: writing SAS codes, configuring SAS products, reading SAS books, taking SAS certificates, and very important, I’m paid to do these!

11月 202012
 

You can run the following piece of codes successfully in these 3 SAS programming environments:

  • BASE SAS
  • Enterprise Guide: create a new “File-New-Program”
  • SAS Data Integration Studio: create a new “Tool-Code Editor”

%let species="Setosa" "Versicolor";

data a;
    set sashelp.iris;
    where species in (&species);
run;

Then create a Transformation in SAS Data Integration Studio (DIS for short; I use version 4.3 in a Win 7 machine) using the codes above as source code (remember deleting first line) and create a simple Prompt to assign the macro variable &species with default values as "Setosa" "Versicolor":

SAS_DIS_Transformation

Drag this Transformation in a single node job and run, then you will get such errors:

127        data a;
128            set sashelp.iris;
129            where species in (&species);
NOTE: Line generated by the macro variable "SPECIES".
129        "Setosa", "Versicolor"

           _
           22
            _
            76
ERROR: Syntax error while parsing WHERE clause.
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
              a missing value, -. 

ERROR 76-322: Syntax error, statement will be ignored.

130        run;

So, what happened inside DIS since such codes go well in BASE SAS, SAS EG and the Code Editor in DIS itself? The default value "Setosa" "Versicolor" was assigned to macro variable &species in the DIS Prompt (see picture above) and you would expect the following effect like what I wrote in open codes in BASE SAS:

%let species="Setosa" "Versicolor";

Actually NO. In DIS, this action was translated into such clause:

%let species = %nrquote(%"Setosa%", %"Versicolor%");

Here is the point: since there quotation symbols were masked by %nrquote function, they are no longer valid syntax characters (then we got errors!).

The solution: use a %unquote function on the macro variable &species to reverse the masking effort by %nrquote(note in our open codes example in BASE SAS, both functions were not used):

data a;
    set sashelp.iris;
    where species in (%unquote(&species));
run;

Notes on Macro Variable Generated by DIS

%let species = %nrquote(%"Setosa%", %"Versicolor%");

1)%nrquote function used while it is dated for a long time… %nrquote and %quote were replaced by %BQUOTE and %NRBQUOTE Functions.

2)All quotation symbols(“) were preceded by a percent sign (%). That’s %nrquote and %quote needed and why they are outdated any more.