tips & techniques

March 7, 2019
 

As of December 2018, any customer with a valid SAS Viya order is able to package and deploy their SAS Viya software in Docker containers. SAS has provided a fully documented and supported project (or “recipe”) for easily building these containers. So how can you start? You can simply stop reading this article and go directly to the GitHub repository and follow the instructions there. Otherwise, in this article, Jeff Owens, a solutions architect at SAS, provides a little color commentary around the process in case it is helpful…

First of all, what is the point of these containers?

Well, at its core, remember that SAS and its massively parallel, in-memory counterpart, Cloud Analytic Services (CAS), is a powerful runtime for data processing and analytics. A runtime is simply an engine responsible for processing and executing a particular type of code (i.e., SAS code). Traditionally, the SAS runtime would live on a centralized server somewhere and users would submit their “jobs” to that SAS runtime (server) in a variety of ways. The SAS server supports a number of different products, tasks, etc. – but for this discussion let’s just focus on the scenario where a job is a “.sas” file, perhaps developed in an IDE like SAS Enterprise Guide or SAS Studio, and submitted to the SAS runtime engine via the IDE itself, a bash shell, or maybe even SAS’ enterprise-grade scheduler and job management solution – SAS Grid. In these cases, the SAS and CAS servers run on dedicated, always-on physical servers.

The brave new containerized world in which we live provides us a new deployment model: submit the job and create the runtime server at the same time. Plus, only consume the exact resources from the host machine or the Kubernetes cluster the specific job requires. And when the job finishes, release those resources for others to use. Kubernetes and PaaS clusters are quite likely shared environments, and one of the major themes in the rise of containers is the further abstraction between hardware and software. Some of that may be easier said than done, particularly for customers with very large volumes of jobs to manage, but it is indeed possible today with SAS Viya on Docker/Kubernetes.

Another effective (and more immediate) use of this containerized version of SAS Viya is simply as an ad hoc, on-demand, temporary development environment. The container package includes SAS Studio, so one can quickly spin up a full SAS Viya programming sandbox – SAS Studio as well as the SAS and CAS runtimes. There you can develop and test SAS code, and just as quickly tear the environment down when it is no longer needed. This is useful for users who: (a) don’t have access to an “always-on” environment for whatever reason, (b) want to try out experimental code that could potentially consume resources from a shared “always-on” SAS environment, and/or (c) have a Kubernetes cluster with many more resources available than their always-on environment and want to try a BIG job.

Yes, it is possible to deploy the entire SAS Viya stack (microservices and all) via Kubernetes but that discussion is for another day. This post focuses strictly on the SAS Viya programming components and running on a single machine Docker host rather than a Kubernetes cluster.

Build the container image

I will begin here with a fresh single-machine RHEL 7.5 server running on OpenStack. But this machine could have been running on any cloud or VM platform, and I could use any (modern enough) flavor of Linux thanks to how Docker works. My machine here has 8 CPUs, 16GB RAM, and a 50GB root volume. Less or more is fine. A couple of notes to help understand how to configure an instance:

  • The final Docker container image we will end up with will be ~10GB in size and, like all Docker images, will live in /var/lib/docker/images by default.
    • Yes, that is large for a container. Most of this size is just static bins and libs that support the extensive SAS language. Compare that to an Anaconda image, which is ~3.6GB.
  • As for RAM, remember any tables loaded to CAS are loaded to memory (and will swap to disk as needed). So, your memory choice should be directly dependent on the data sizes you expect to work with.
  • Similar story for cores – CAS code is multithreaded, so more cores = more parallelization.

The first step is to install Docker.

Following along with sas-container-recipes now, the first thing I should do is mirror the repo for my order. Note, this is not a required step – you could build this container directly from SAS repos if you wanted, but we’ll mirror as a best practice. We could simply mirror and serve it over the local filesystem of our build host, but since I promised color I’ll serve it over the web instead. So, these commands run on a separate RHEL server. If you choose to mirror on your build host, make sure you have the disk space (~30GB should be plenty). You will also need your SAS_Viya_deployment_data.zip file available on the SAS Customer Support site. Run the following code to execute the setup.

$ wget https://support.sas.com/installation/viya/34/sas-mirror-manager/lax/mirrormgr-linux.tgz
$ tar xf mirrormgr-linux.tgz
$ rm -f mirrormgr-linux.tgz
$ mkdir -p /repos/viyactr
$ mirrormgr mirror --deployment-data SAS_Viya_deployment_data.zip --path /repos/viyactr --platform x64-redhat-linux-6 --latest
$ yum install httpd -y
$ systemctl start httpd
$ systemctl enable httpd
$ ln -s /repos/viyactr /var/www/html/sas_repo

Next, I go ahead and clone the sas-container-recipes repo locally, upload my SAS_Viya_deployment_data.zip file, and I am ready to run the build command. As a bonus, I am also going to use my site’s (SAS’) sssd.conf file so my container will use our corporate Active Directory for authentication. If you do not need or want that integration you can skip the “vi addons/sssd.conf” line and change the “--addons” option to “addons/auth-demo” so your container seeds with a single “sasdemo:sasdemo” user:password instead.

$ # upload SAS_Viya_deployment_data.zip to this machine somehow
$ git clone https://github.com/sassoftware/sas-container-recipes.git
$ cd sas-container-recipes/
$ vi addons/sssd.conf # <- paste in your site's sssd.conf file
$ ./build.sh \
--type single \
--zip ~/SAS_Viya_deployment_data.zip \
--mirror-url http://jo.openstack.sas.com/sas_repo \
--addons "addons/auth-sssd"

The build should take about 45 minutes and produce a single container image for you (there might be a few images, but it is just one with a thin layer or two on top). You might want to give this image a new name (docker tag) and push it into your own private registry (docker push). Aside from that, we are ready to run it.
If you are curious, look in the addons directory for the other optional layers you can add to your container. Several tools are available for easily configuring connections to external databases.

Run the container

Here is the run command we can use to launch the container. Note the image name I use here is “sas-viya-programming:xxxxxx” – this is the image that has my sssd layer built on top of it.

$ docker run \
--detach \ 
--rm \ 
--env CASENV_CAS_VIRTUAL_HOST=$(hostname -f) \ 
--env CASENV_CAS_VIRTUAL_PORT=8081 \ 
--publish 5570:5570 \ 
--publish 8081:80 \ 
--name sas-viya-programming \ 
--hostname sas-viya-programming \ 
sas-viya-programming:xxxxxx

Connect to the container

And now, in a web browser, I can go to http://<hostname>:8081/SASStudio (substituting my Docker host's name) and I will end up in SAS Studio, where I can sign in with my internal SAS credentials. To stop the container, use the name you gave it: “docker stop sas-viya-programming”. Because we used the “--rm” flag the container will be removed (completely destroyed) when we stop it.

Note we are explicitly mapping in the HTTP port (8081:80) so we easily know how to get to SAS Studio. If you want to start up another container on the same host, you will need to use a different port or else you’ll get an “address already in use” error. Also note we might be interested in connecting directly to this CAS server from something other than SAS Studio (localhost) – a remote Python client, for example. We can use the other port we mapped in (5570:5570) to connect to the CAS server.
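For instance, from another SAS Viya programming environment you could point a CAS session at that published port. Here is a minimal sketch, assuming a hypothetical Docker host name of docker-host.example.com and credentials that are valid inside the container (typically supplied via an authinfo file):

/* connect a CAS session to the containerized CAS server */
/* docker-host.example.com is a hypothetical host name   */
cas mySession host="docker-host.example.com" port=5570;
 
/* list the available caslibs to verify the connection */
caslib _all_ list;
 
/* terminate the session when done */
cas mySession terminate;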

Persist the data

Running this container with the above command means anything and everything done inside the container (configuration changes, code, data) will not persist if the container stops and a new one is started later. Luckily, this is a very standard and easy-to-solve scenario with Docker and Kubernetes. Here are a few targets inside the container you might be interested in mounting a volume to:

  • /tmp – this is where CAS_DISK_CACHE is by default, not to mention SASWORK. Those are scratch space used by the runtimes. If you are working with small data and don’t care too much about performance, no need to worry about this. But to optimize your container we would suggest mounting a Docker volume to this location (or, ideally, bind mount a high-performance storage device here). Note that generally Docker prefers us to use Docker volumes in lieu of bind mounts, but that is more for manageability, security, and portability than performance.
  • /data – this directory doesn’t necessarily exist, but when you mount a volume into a container the target location will be created. So, you could call this target whatever you want, assuming it doesn’t exist yet. Bind mounting is tempting here and OK to do, but consider the scenario when another user wants to run your container following instructions you provided them – better to use a Docker volume than force them to create the directory on the host. If you have an NFS location, bind mounting that makes sense.
  • /code – same spiel as with /data. Once you are in the container you can save your work and it will persist in the docker volume from run to run of your container.

Here is what an updated docker run command might look like with these volumes included:

$ docker run \ 
--detach \ 
--rm \ 
--env CASENV_CAS_VIRTUAL_HOST=$(hostname -f) \ 
--env CASENV_CAS_VIRTUAL_PORT=8081 \ 
--volume mydata:/data \ 
--volume /nfsdata:/nfsdata \ # example syntax for bind mount instead of docker volume mount 
--volume mycode:/code \ 
--volume sastmp:/tmp \ 
--publish 5570:5570 \ 
--publish 8081:80 \ 
--name sas-viya-programming \ 
--hostname sas-viya-programming \ 
sas-viya-programming:xxxxxx

Can I run this on my laptop?

Yes. You would just need to install Docker on your laptop (go to docker.com for that). You can certainly follow the instructions from the top to build and run locally. You can even push this container image out to an internal registry so other users could skip the build and just run.

So far, we have only talked about the “ad hoc” or “sandbox” dev type of use case for this container. A later article may cover how to run in batch mode, or maybe we will move straight to multi-containers and Kubernetes. In the meantime, though, it is also possible to submit a .sas program as a batch job to this single container we have built.

Give it a try!

Try creating your own image and deploying a container. Feel free to comment on your experience.

More info:

SAS Communities Article: Running SAS Analytics in a Docker container
SAS Global Forum Paper: Docker Toolkit for Data Scientists – How to Start Doing Data Science in Minutes!
SAS Global Forum Tech Talk Video: Deploying and running SAS in Containers

Getting Started with SAS Containers was published on SAS Users.

February 6, 2019
 

Splitting external text data files into multiple files

Recently, I worked on a cybersecurity project that entailed processing a staggering number of raw text files about web traffic. Millions of rows had to be read and parsed to extract variable values.

The problem was complicated by the varying record composition. Each external raw file was a collection of records of different structures that required different parsing logic. Besides, those heterogeneous records could not possibly belong to the same rectangular data tables with fixed sets of columns.

Solving the problem

To solve the problem, I decided to employ a "divide and conquer" strategy: to split the external file into many files, each with a homogeneous structure, then parse them separately to create as many output SAS data sets.

My plan was to use a SAS DATA Step for looping through the rows (records) of the external file, read each row, identify its type, and based on that, write it to a corresponding output file.

This is similar to how we would split a data set into many:

 
data CARS_ASIA CARS_EUROPE CARS_USA;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   output CARS_ASIA;
      when('Europe') output CARS_EUROPE;
      when('USA')    output CARS_USA;
   end;   
run;

But how do you switch between the output files? The idea came from SAS' Chris Hemedinger, who suggested using multiple FILE statements to redirect output to different external files.

Splitting an external raw file into many

As you know, one can use the PUT statement in a SAS DATA step to output a character string or a combination of character strings and variable values into an external file. That external file (a destination) is defined by a FILE statement. For example:

 
filename inf  'c:\temp\input_file.txt';
filename out1 'c:\temp\traffic.txt';
filename out2 'c:\temp\system.txt';
filename out3 'c:\temp\threat.txt';
filename out4 'c:\temp\other.txt';
 
data _null_;
   infile inf;
   input REC_TYPE $10. @;
   input;
   select(REC_TYPE);
      when('TRAFFIC') file out1;
      when('SYSTEM')  file out2;
      when('THREAT')  file out3;
      otherwise       file out4;
   end;
   put _infile_;
run;

In this code, the first INPUT statement retrieves the value of REC_TYPE. The trailing @ line-hold specifier ensures that the input record is held for the execution of the next INPUT statement within the same iteration of the DATA step. Your INPUT statement may not look exactly like this one, but the point is that you need to capture the field(s) of interest while staying on the same row.

The second INPUT statement reads the whole raw file record into the _infile_ DATA Step automatic variable.

Depending on the value of the REC_TYPE variable assigned in the first INPUT statement, the SELECT block toggles the FILE definition between one of the four filerefs: out1, out2, out3, or out4.

Then the PUT statement outputs the _infile_ automatic variable value to the output file defined in the SELECT block.

Splitting a data set into several external files

A similar technique can be used to split a data table into several external raw files. Let’s combine the above two code samples to demonstrate how:

 
filename outasi 'c:\temp\cars_asia.txt';
filename outeur 'c:\temp\cars_europe.txt';
filename outusa 'c:\temp\cars_usa.txt';
 
data _null_;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   file outasi;
      when('Europe') file outeur;
      when('USA')    file outusa;
   end;
   put _all_; 
run;

This code reads the observations of the SASHELP.CARS data table and, depending on the value of the ORIGIN variable, the PUT _ALL_ statement outputs all the variables (including the automatic variables _ERROR_ and _N_) as named values (VARIABLE_NAME=VARIABLE_VALUE pairs) to one of the three external raw files specified by their respective file references (outasi, outeur, or outusa).

You can modify this code to produce delimited files with full control over which variables to output and in what order. For example, the following code sample produces three files with comma-separated values:

 
data _null_;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   file outasi dlm=',';
      when('Europe') file outeur dlm=',';
      when('USA')    file outusa dlm=',';
   end;
   put make model type origin msrp invoice; 
run;

You may use different delimiters for your output files. In addition, rather than using the mutually exclusive SELECT block, you may use different logic for redirecting your output to different external files, as shown in the sketch below.
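For instance, here is a minimal sketch of the same idea using IF-THEN/ELSE logic instead of SELECT, reusing the filerefs defined above (the FILE statement is executable, so it can be run conditionally):

data _null_;
   set SASHELP.CARS;
   /* redirect output based on ORIGIN, with a catch-all for everything else */
   if origin='Asia' then file outasi dlm=',';
   else if origin='Europe' then file outeur dlm=',';
   else file outusa dlm=',';
   put make model type origin msrp invoice;
run;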

Bonus: How to zip your output files as you create them

For those readers who are patient enough to have read to this point, here is another tip. As described in this series of blog posts by Chris Hemedinger, in SAS you can read your external raw files directly from zipped files without unzipping them first, as well as write your output raw files directly into zipped files. You just need to specify that in your FILENAME statement. For example:

UNIX/Linux

 
filename outusa ZIP '/sas/data/temp/cars_usa.txt.gz' GZIP;

Windows

 
filename outusa ZIP 'c:\temp\cars.zip' member='cars_usa.txt';
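Reading works the same way in reverse. Here is a minimal sketch that echoes a member of the archive created above to the SAS log, without unzipping it first:

filename inzip ZIP 'c:\temp\cars.zip' member='cars_usa.txt';
 
data _null_;
   infile inzip;
   input;
   put _infile_;
run;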

Your turn

What is your experience with creating multiple external raw files? Could you please share with the rest of us?

How to split a raw file or a data set into many external raw files was published on SAS Users.

January 25, 2019
 

Need to authenticate on REST API calls

In my blog series regarding SAS REST APIs (article 1, article 2, article 3) I outlined how to integrate SAS analytical capabilities into applications. I detailed how to construct REST calls, build body parameters and interpret the responses. I've not yet covered authentication for the operations, purposefully putting the cart before the horse. If you're not authenticated, you can't do much, so this post will help to get the horse and cart in the right order.

While researching authentication I ran into multiple, informative articles and papers on SAS and OAuth. A non-exhaustive list includes Stuart Rogers' article on SAS Viya authentication options, one of which is OAuth. Also, I found several resources on connecting to external applications from SAS with explanations of OAuth. For example, Joseph Henry provides an overview of OAuth and using it with PROC HTTP and Chris Hemedinger explains securing REST API credentials in SAS programs in this article. Finally, the SAS Viya REST API documentation covers details on application registration and access token generation.

Consider this post a quick guide to summarize these resources and shed light on authenticating via authorization code and passwords.

What OAuth grant type should I use?

Choosing the grant method to get an access token with OAuth depends entirely on your application. You can get more information on which grant type to choose here. This post covers two grant methods: authorization code and password. Authorization code grants are generally used with web applications and considered the safest choice. Password grants are most often used by mobile apps and applied in more trusted environments.

The process, briefly

Getting an external application connected to the SAS Viya platform requires the following steps:

  1. Use the SAS Viya configuration server's Consul token to obtain an ID token for registering a new client ID.
  2. Use the ID token to register the new client ID and secret.
  3. Obtain the authorization code.
  4. Acquire the access OAuth token for the client ID using the authorization code.
  5. Call the SAS Viya API using the access token for authentication.

Registering the client (steps 1 and 2) is a one-time process. You will need a new authorization code (step 3) if the access token is revoked. The access and refresh tokens (step 4) are created once and only need to be refreshed if/when the token expires. Once you have the access token, you can call any API (step 5) if your access token is valid.

Get an access token using an authorization code

Step 1: Get the SAS Viya Consul token to register a new client

The first step to register the client is to get the consul token from the SAS server. As a SAS administrator (sudo user), access the consul token using the following command:

$ export CONSUL_TOKEN=`cat /opt/sas/viya/config/etc/SASSecurityCertificateFramework/tokens/consul/default/client.token`
$ echo $CONSUL_TOKEN
64e01b03-7dab-41be-a104-2231f99d7dd8

The Consul token returns and is used to obtain an access token used to register the new application. Use the following cURL command to obtain the token:

$ curl -k -X POST "https://sasserver.demo.sas.com/SASLogon/oauth/clients/consul?callback=false&serviceId=app" \
     -H "X-Consul-Token: 64e01b03-7dab-41be-a104-2231f99d7dd8"
 {"access_token":"eyJhbGciOiJSUzI1NiIsIm...","token_type":"bearer","expires_in":35999,"scope":"uaa.admin","jti":"de81c7f3cca645ac807f18dc0d186331"}

The returned token can be lengthy. To assist in later use, create an environment variable from the returned token:

$ export IDTOKEN="eyJhbGciOiJSUzI1NiIsIm..."

Step 2: Register the new client

Change the client_id, client_secret, and scopes in the code below. Scopes should always include "openid" along with any other groups this client needs to get in the access tokens. You can specify "*" but then the user gets prompted for all their groups, which is tedious. The example below just uses one group named "group1".

$ curl -k -X POST "https://sasserver.demo.sas.com/SASLogon/oauth/clients" \
       -H "Content-Type: application/json" \
       -H "Authorization: Bearer $IDTOKEN" \
       -d '{
        "client_id": "myclientid", 
        "client_secret": "myclientsecret",
        "scope": ["openid", "group1"],
        "authorized_grant_types": ["authorization_code","refresh_token"],
        "redirect_uri": "urn:ietf:wg:oauth:2.0:oob"
       }'
{"scope":["openid","group1"],"client_id":"app","resource_ids":["none"],"authorized_grant_types":["refresh_token","authorization_code"],"redirect_uri":["urn:ietf:wg:oauth:2.0:oob"],"autoapprove":[],"authorities":["uaa.none"],"lastModified":1547138692523,"required_user_groups":[]}

Step 3: Approve access to get authentication code

Place the following URL in a browser. Change the hostname and myclientid in the URL as needed.

https://sasserver.demo.sas.com/SASLogon/oauth/authorize?client_id=myclientid&response_type=code

The browser redirects to the SAS login screen. Log in with your SAS user credentials.

SAS Login Screen

On the Authorize Access screen, select the openid checkbox (and any other required groups) and click the Authorize Access button.

Authorize Access form

After submitting the form, you'll see an authorization code. For example, "lB1sxkaCfg". You will use this code in the next step.

Authorization Code displays

Step 4: Get an access token using the authorization code

Now we have the authorization code and we'll use it in the following cURL command to get the access token to SAS.

$ curl -k https://sasserver.demo.sas.com/SASLogon/oauth/token -H "Accept: application/json" -H "Content-Type: application/x-www-form-urlencoded" \
     -u "myclientid:myclientsecret" -d "grant_type=authorization_code&code=lB1sxkaCfg"
{"access_token":"eyJhbGciOiJSUzI1NiIsImtpZ...","token_type":"bearer","refresh_token":"eyJhbGciOiJSUzI1NiIsImtpZC...","expires_in":35999,"scope":"openid","jti":"b35f26197fa849b6a1856eea1c722933"}

We use the returned token to authenticate and authorize the calls made between the client and SAS. We also get a refresh token we use to issue a new token when the current one expires. This way we can avoid repeating all the previous steps. I explain the refresh process further down.

We will again create environment variables for the tokens.

$ export ACCESS_TOKEN="eyJhbGciOiJSUzI1NiIsImtpZCI6ImxlZ..."
$ export REFRESH_TOKEN="eyJhbGciOiJSUzI1NiIsImtpZC..."

Step 5: Use the access token to call SAS Viya APIs

The prep work is complete. We can now send requests to SAS Viya and get some work done. Below is an example REST call that returns user preferences.

$ curl -k https://sasserver.demo.sas.com/preferences/ -H "Authorization: Bearer $ACCESS_TOKEN"
{"version":1,"links":[{"method":"GET","rel":"preferences","href":"/preferences/preferences/stpweb1","uri":"/preferences/preferences/stpweb1","type":"application/vnd.sas.collection","itemType":"application/vnd.sas.preference"},{"method":"PUT","rel":"createPreferences","href":"/preferences/preferences/stpweb1","uri":"/preferences/preferences/stpweb1","type":"application/vnd.sas.preference","responseType":"application/vnd.sas.collection","responseItemType":"application/vnd.sas.preference"},{"method":"POST","rel":"newPreferences","href":"/preferences/preferences/stpweb1","uri":"/preferences/preferences/stpweb1","type":"application/vnd.sas.collection","responseType":"application/vnd.sas.collection","itemType":"application/vnd.sas.preference","responseItemType":"application/vnd.sas.preference"},{"method":"DELETE","rel":"deletePreferences","href":"/preferences/preferences/stpweb1","uri":"/preferences/preferences/stpweb1","type":"application/vnd.sas.collection","itemType":"application/vnd.sas.preference"},{"method":"PUT","rel":"createPreference","href":"/preferences/preferences/stpweb1/{preferenceId}","uri":"/preferences/preferences/stpweb1/{preferenceId}","type":"application/vnd.sas.preference"}]}

Use the refresh token to get a new access token

To use the refresh token to get a new access token, simply send a cURL command like the following:

$ curl -k https://sasserver.demo.sas.com/SASLogon/oauth/token -H "Accept: application/json" \
     -H "Content-Type: application/x-www-form-urlencoded" -u "app:appclientsecret" -d "grant_type=refresh_token&refresh_token=$REFRESH_TOKEN"
{"access_token":"eyJhbGciOiJSUzI1NiIsImtpZCI6ImxlZ...","token_type":"bearer","refresh_token":"eyJhbGciOiJSUzI1NiIsImtpZCSjYxrrNRCF7h0oLhd0Y","expires_in":35999,"scope":"openid","jti":"a5c4456b5beb4493918c389cd5186f02"}

Note the access token is new, and the refresh token remains static. Use the new token for future REST calls. Make sure to replace the ACCESS_TOKEN variable with the new token. Also, the access token has a default life of ten hours before it expires. Most applications deal with expiring and refreshing tokens programmatically. If you wish to change the default expiry of an access token in SAS, make a configuration change in the JWT properties in SAS.
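If you drive the refresh from SAS itself, the cURL call above translates naturally to PROC HTTP as well – again a sketch, assuming the registered client ID/secret and a REFRESH_TOKEN macro variable:

filename resp temp;
 
/* exchange the refresh token for a new access token;       */
/* %nrstr(&) keeps the form-field separator away from the   */
/* macro processor while &refresh_token. still resolves     */
proc http
   url="https://sasserver.demo.sas.com/SASLogon/oauth/token"
   method="POST"
   webusername="myclientid"
   webpassword="myclientsecret"
   auth_basic
   in="grant_type=refresh_token%nrstr(&)refresh_token=&refresh_token."
   out=resp;
   headers "Accept"="application/json"
           "Content-Type"="application/x-www-form-urlencoded";
run;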

Get an access token using a password

The steps to obtain an access token with a password are the same as with the authorization code. I highlight the differences below, without repeating all the steps.
The process for obtaining the ID token and using it to get an access token for registering the client is the same as described earlier. The first difference when using password authentication comes when registering the client: in the code below, notice the key authorized_grant_types has a value of password, not authorization_code.

$ curl -k -X POST https://sasserver.demo.sas.com/SASLogon/oauth/clients -H "Content-Type: application/json" \
       -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZ..." \
       -d '{
        "client_id": "myclientid", 
        "client_secret": "myclientsecret",
        "scope": ["openid", "group1"],
        "authorized_grant_types": ["password","refresh_token"],
        "redirect_uri": "urn:ietf:wg:oauth:2.0:oob"
        }'
{"scope":["openid","group1"],"client_id":"myclientid","resource_ids":["none"],"authorized_grant_types":["refresh_token","authorization_code"],"redirect_uri":["urn:ietf:wg:oauth:2.0:oob"],"autoapprove":[],"authorities":["uaa.none"],"lastModified":1547801596527,"required_user_groups":[]}

The client is now registered on the SAS Viya server. To get the access token, we send a command like the one we used with the authorization code grant, this time supplying the user name and password.

$ curl -k https://sasserver.demo.sas.com/SASLogon/oauth/token \
     -H "Content-Type: application/x-www-form-urlencoded" -u "myclientid:myclientsecret" -d "grant_type=password&username=sasdemo&password=mypassword"
{"access_token":"eyJhbGciOiJSUzI1NiIsImtpZCI6Imx...","token_type":"bearer","refresh_token":"eyJhbGciOiJSUzI1NiIsImtpZ...","expires_in":43199,"scope":"DataBuilders ApplicationAdministrators SASScoreUsers clients.read clients.secret uaa.resource openid PlanningAdministrators uaa.admin clients.admin EsriUsers scim.read SASAdministrators PlanningUsers clients.write scim.write","jti":"073bdcbc6dc94384bcf9b47dc8b7e479"}

From here, sending requests and refreshing the token steps are identical to the method explained in the authorization code example.

Final thoughts

At first, OAuth seems a little intimidating; however, after registering the client and creating the access and refresh tokens, the application will handle all authentication components. This process runs smoothly if you plan and make decisions up front. I hope this guide clears up any questions you may have on securing your application with SAS. Please leave questions or comments below.

Authentication to SAS Viya: a couple of approaches was published on SAS Users.

November 3, 2018
 

When you begin to work within the SAS Viya ecosystem, you learn that the central piece is SAS Cloud Analytic Services (CAS). CAS allows all clients in the SAS Viya ecosystem to communicate and run analytic methods. The great part about SAS Viya is that the R client can drive CAS directly using familiar objects and constructs for R programmers.

The SAS Scripting Wrapper for Analytics Transfer (SWAT) package is an R interface to CAS. With this package, you can load data into memory and apply CAS actions to transform, summarize, model and score the data. You can still retain the ease-of-use of R on the client side to further post process CAS result tables.

But before you can do any analysis in CAS, you need some data to work with and a way to get to it. There are two data access components in CAS:

  1. Caslibs, definitions that give access to a resource that contains data.
  2. CASTables, for analyzing data from a caslib resource. You load the data into a CASTable, which contains information about the data in the columns.

Other references you may find of interest include this GitHub repository where you can find more information on installing and configuring CAS and SWAT. Also available is this article on using RStudio with SAS Viya.

The following excerpt from SAS® Viya®: The R Perspective, the book I co-authored with my SAS colleague Xiangxiang Meng, demonstrates the way the R client in SAS Viya allows you to select data with precision. The examples come from the iris flower data set, which is available in the SASHELP library in all distributions of SAS. The CASTable object sorttbl is sorted by the Sepal.Width column.

Rather than using fixed values of rows and columns to select data, we can create conditions that are based on the data in the table to determine which rows to select. The specification of conditions is done using the same syntax as that used by data.frame objects. CASTable objects support R’s various comparison operators and build a filter that subsets the rows in the table. You can then use the result of that comparison to index into a CASTable. It sounds much more complicated than it is, so let’s look at an example.

A comparison expression such as sorttbl$Petal.Length > 6.5 creates a computed column that is used in a WHERE expression on the CASTable. This expression can then be used as an index value for a CASTable. Indexing this way essentially creates a Boolean mask: wherever the expression values are true, the rows of the table are returned; wherever the expression is false, the rows are filtered out.

> expr <- sorttbl$Petal.Length > 6.5
> newtbl <- sorttbl[expr,]
> head(newtbl) 
 
  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species 
1          7.7         2.6          6.9         2.3 virginica 
2          7.7         2.8          6.7         2.0 virginica 
3          7.6         3.0          6.6         2.1 virginica 
4          7.7         3.8          6.7         2.2 virginica

These two steps are commonly entered on one line.

> newtbl <- sorttbl[sorttbl$Petal.Length > 6.5,]
> head(newtbl) 
 
  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species 
1          7.7         2.6          6.9         2.3 virginica 
2          7.7         2.8          6.7         2.0 virginica 
3          7.6         3.0          6.6         2.1 virginica 
4          7.7         3.8          6.7         2.2 virginica

We can further filter rows out by indexing another comparison expression.

> newtbl2 <- newtbl[newtbl$Petal.Width < 2.2,]
> head(newtbl2) 
 
  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species 
1          7.7         2.8          6.7         2.0 virginica 
2          7.6         3.0          6.6         2.1 virginica

Comparisons can be joined using the element-wise logical operators & (and) and | (or). You must be careful with these operators, though, due to operator precedence: they have lower precedence than comparisons such as greater-than and less-than, but it is still safer to enclose your comparisons in parentheses.

> newtbl3 <- sorttbl[(sorttbl$Petal.Length > 6.5) & (sorttbl$Petal.Width < 2.2),]
> head(newtbl3) 
 
  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species 
1          7.7         2.8          6.7         2.0 virginica 
2          7.6         3.0          6.6         2.1 virginica

In all cases, we are not changing anything about the underlying data in CAS. We are simply constructing a query that is executed with the CASTable when it is used as the parameter in a CAS action. You can see what is happening behind the scenes by displaying the attributes of the resulting CASTable objects.

> attributes(newtbl3) 
 
$conn 
CAS(hostname=server-name.mycompany.com, port=8777, username=username, session=11ed56e2-f9dd-9346-8d01-44a496e68880, protocol=http) 
 
$tname
[1] "iris" 
 
$caslib 
[1] "" 
 
$where 
[1] "((\"Petal.Length\"n > 6.5) AND (\"Petal.Width\"n < 2.2))" 
 
$orderby 
[1] "Sepal.Width" 
 
$groupby 
[1] "" 
 
$gbmode 
[1] "" 
 
$computedOnDemand 
[1] FALSE 
 
$computedVars 
[1] "" 
 
$computedVarsProgram 
[1] "" 
 
$XcomputedVarsProgram 
[1] "" 
 
$XcomputedVars 
[1] "" 
 
$names 
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  
[5] "Species"      
 
$class 
[1] "CASTable" 
attr(,"package") 
[1] "swat"

You can also do mathematical operations on columns with constants or on other columns within your comparisons.

> iris[(iris$Petal.Length + iris$Petal.Width) * 2 > 17.5,] 
 
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species 
118          7.7         3.8          6.7         2.2 virginica 
119          7.7         2.6          6.9         2.3 virginica

The list of supported operators is shown in the following table:

Operator                Numeric Data   Character Data
+ (add)                 ✔
- (subtract)            ✔
* (multiply)            ✔
/ (divide)              ✔
%% (modulo)             ✔
%/% (integer division)  ✔
^ (power)               ✔

The supported comparison operators are shown in the following table.

Operator                       Numeric Data   Character Data
== (equality)                  ✔              ✔
!= (inequality)                ✔              ✔
< (less than)                  ✔              ✔
> (greater than)               ✔              ✔
<= (less than or equal to)     ✔              ✔
>= (greater than or equal to)  ✔              ✔

 

As you can see in the preceding tables, you can do comparisons on character columns as well. In the following example, all of the rows in which Species is equal to "virginica" are selected and saved to a new CASTable object virginica. Note that in this case, data is still not duplicated.

> tbl <- defCasTable(conn, 'iris')
> virginica <- tbl[tbl$Species == 'virginica',]
> dim(virginica) 
 
[1] 50  5 
 
> head(virginica) 
 
  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species 
1          7.7         3.0          6.1         2.3 virginica 
2          6.3         3.4          5.6         2.4 virginica 
3          6.4         3.1          5.5         1.8 virginica 
4          6.0         3.0          4.8         1.8 virginica 
5          6.9         3.1          5.4         2.1 virginica 
6          6.7         3.1          5.6         2.4 virginica

It’s easy to create powerful filters that are executed in CAS while still using R syntax. However, the similarities to data.frame objects don’t end there: CASTable objects can also create computed columns and by-groups using similar techniques.

Want to learn more? Get your copy of SAS Viya: The R Perspective

How to use SAS® Viya® and R for dynamic data selection was published on SAS Users.

October 10, 2018
 

Deep learning (DL) is a subset of neural networks, which have been around since the 1960s. Computing resources and the need for large amounts of training data were the crippling factors for neural networks. But with the growing availability of computing resources such as multi-core machines, graphics processing unit (GPU) accelerators, and specialized hardware, DL is becoming much more practical for business problems.

Financial institutions use a large number of computations to evaluate portfolios, price securities, and financial derivatives. For example, every cell in a spreadsheet potentially implements a different formula. Time is also usually of the essence so having the fastest possible technology to perform financial calculations with acceptable accuracy is paramount.

In this blog, we talk to Henry Bequet, Director of High-Performance Computing and Machine Learning in the Finance Risk division of SAS, about how he uses DL as a technology to maximize performance.

Henry discusses how the performance of numerical applications can be greatly improved by using DL. Once a DL network is trained to compute analytics, using that DL network becomes drastically faster than more classic methodologies like Monte Carlo simulations.

We asked him to explain deep learning for numerical analysis (DL4NA) and the most common questions he gets asked.

Can you describe the deep learning methodology proposed in DL4NA?

Yes, it starts with writing your analytics in a transparent and scalable way. All content that is released as a solution by the SAS financial risk division uses the "many task computing" (MTC) paradigm. Simply put, when writing your analytics using the many task computing paradigm, you organize code in SAS programs that define task inputs and outputs. A job flow is a set of tasks that will run in parallel, and the job flow will also handle synchronization.

Fig 1.1 A Sequential Job Flow

The job flow in Figure 1.1 visually gives you a hint that the two tasks can be executed in parallel. The addition of the task into the job flow is what defines the potential parallelism, not the task itself. The task designer or implementer doesn’t need to know that the task is being executed at the same time as other tasks. It is not uncommon to have hundreds of tasks in a job flow.

Fig 1.2 A Complex Job Flow

Using that information, the SAS platform, and the Infrastructure for Risk Management (IRM) is able to automatically infer the parallelization in your analytics. This allows your analytics to run on tens or hundreds of cores. (Most SAS customers run out of cores before they run out of tasks to run in parallel.) By running SAS code in parallel, on a single machine or on a grid, you gain orders of magnitude of performance improvements.

This methodology also has the benefit of expressing your analytics in the form of Y= f(x), which is precisely what you feed a deep neural network (DNN) to learn. That organization of your analytics allows you to train a DNN to reproduce the results of your analytics originally written in SAS. Once you have the trained DNN, you can use it to score tremendously faster than the original SAS code. You can also use your DNN to push your analytics to the edge. I believe that this is a powerful methodology that offers a wide spectrum of applicability. It is also a good example of deep learning helping data scientists build better and faster models.

Fig 1.3 Example of a DNN with four layers: two visible layers and two hidden layers.

The number of neurons of the input layer is driven by the number of features. The number of neurons of the output layer is driven by the number of classes that we want to recognize, in this case, three. The number of neurons in the hidden layers as well as the number of hidden layers is up to us: those two parameters are model hyper-parameters.

How do I run my SAS program faster using deep learning?

In the financial risk division, I work with banks and insurance companies all over the world that are faced with increasing regulatory requirements like CCAR and IFRS17. Those problems are particularly challenging because they involve big data and big compute.

The good news is that new hardware architectures are emerging with the rise of hybrid computing. Computers are increasingly built as a combination of traditional CPUs and innovative devices like GPUs, TPUs, FPGAs, and ASICs. Those hybrid machines can run significantly faster than legacy computers.

The bad news is that hybrid computers are hard to program, and each of them is specific: code written for a GPU won’t run on an FPGA – it won’t even run on a different generation of the same device. Consequently, software developers and software vendors are reluctant to jump into the fray, and data scientists and statisticians are left out of the performance gains. So there is a gap – a big gap, in fact.

To fill that gap is the raison d’être of my new book, Deep Learning for Numerical Applications with SAS. Check it out and visit the SAS Risk Management Community to share your thoughts and concerns on this cross-industry topic.

Deep learning for numerical analysis explained was published on SAS Users.

July 17, 2018
 

Attention SAS administrators! SAS batch jobs, whether run on schedule or manually, usually produce date-stamped SAS logs, which are essential for automated system maintenance and troubleshooting. Similar log files are created by various SAS infrastructure services (Metadata Server, middle-tier servers, etc.). However, as time goes on, the relevance of such logs diminishes while the clutter stockpiles. In some cases, this may even lead to disk space problems.

There are multiple ways to solve this problem, either by deleting older log files or by stashing them away for auditing purposes (zipping and archiving). One solution would be using Unix/Linux or Windows scripts run on schedule. The other is much "SAS-sier."

Let SAS clean up its "mess"

We are going to write SAS code that you can run manually or on schedule, which for a specified directory (folder) deletes all .log files that are older than 30 days.
First, we need to capture the contents of that directory, then select those file names with the .log extension, and finally subset that selection to the files whose Date Modified is earlier than today's date minus 30 days.

Perhaps the easiest way to get the contents of a directory is by using the X statement to submit DOS’ DIR command from within SAS, redirecting its output (>) to a file, e.g.

x 'dir > dirlist.txt';

or by using the PIPE option in the FILENAME statement:

filename DIRLIST pipe 'dir "C:\Documents and Settings"';

However, SAS administrators know that in many organizations, due to cyber-security concerns, IT department policies do not allow the X statement; it is disabled by setting the SAS XCMD system option to NOXCMD. This is usually done system-wide for the whole SAS Enterprise client-server installation via the SAS configuration. In this case, no operating system command can be executed from within SAS. Try running any X statement in your environment; if it is disabled you will get the following ERROR in the SAS log:

ERROR: Shell escape is not valid in this SAS session.
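You can also check up front whether operating system commands are allowed in your session by querying the option setting – a quick sketch:

/* the log will show XCMD or NOXCMD */
proc options option=xcmd;
run;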

To avoid that potential roadblock, we’ll use a different technique of capturing the contents of a directory along with file date stamps.

Macro to delete old log files in a directory/folder

The following SAS macro cleans up a Unix directory or a Windows folder by removing old .log files. I must admit that this statement is a little misleading – the macro is much more powerful. Not only can it delete old .log files, it can remove ANY file type specified by its extension.

%macro mr_clean(dirpath=,dayskeep=30,ext=.log);
   data _null_;
      length memname $256;
      deldate = today() - &dayskeep;
      rc = filename('indir',"&dirpath");
      did = dopen('indir');
      if did then
      do i=1 to dnum(did);
         memname = dread(did,i);
         if reverse(trim(memname)) ^=: reverse("&ext") then continue;
         rc = filename('inmem',"&dirpath/"!!memname);
         fid = fopen('inmem');
         if fid then 
         do;
            moddate = input(finfo(fid,'Last Modified'),date9.);
            rc = fclose(fid);
            if . < moddate <= deldate then rc = fdelete('inmem');
         end;
      end; 
      rc = dclose(did);
      rc = filename('inmem');
      rc = filename('indir');
   run;
%mend mr_clean;

This macro has 3 parameters:

  • dirpath - directory path (required);
  • dayskeep - days to keep (optional, default 30);
  • ext - file extension (optional, default .log).

This macro works in both Windows and Linux/Unix environments. Please note that dirpath and ext parameter values are case-sensitive.

Here are examples of the macro invocation:

1. Using defaults

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean)

With this macro call, all files with extension .log (default) which are older than 30 days (default) will be deleted from the specified directory.

2. Using default extension

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean,dayskeep=20)

With this macro call, all files with extension .log (default) which are older than 20 days will be deleted from the specified directory.

3. Using explicit parameters

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean,dayskeep=10,ext=.xls)

With this macro call, all files with extension .xls (Excel files) which are older than 10 days will be deleted from the specified directory.

Old file deletion SAS macro code explanation

The above SAS macro logic and actions are done within a single data _NULL_ step. First, we calculate the date from which file deletion starts (going back) deldate = today() - &dayskeep. Then we assign fileref indir to the specified directory &dirpath:

rc = filename('indir',"&dirpath");

Then we open that directory:

did = dopen('indir');

and if it opened successfully (did>0) we loop through its members which can be either files or directories:

do i=1 to dnum(did);

In that loop, first we grab the directory member name:

memname = dread(did,i);

and look for our candidates for deletion, i.e., determine if that name (memname) ends with "&ext". In order to do that we reverse both character strings and compare their first characters. If they don’t match (^=: operator) then we are not going to touch that member - the continue statement skips to the end of the loop. If they do match it means that the member name does end with "&ext" and it’s a candidate for deletion. We assign fileref inmem to that member:

rc = filename('inmem',"&dirpath/"!!memname);

Note that forward slash (/) Unix/Linux path separator in the above statement is also a valid path separator in Windows. Windows will convert it to back slash (\) for display purposes, but it interprets forward slash as a valid path separator along with back slash.
Then we open that file using fopen function:

fid = fopen('inmem');

If inmem points to a directory, the open will fail (fid=0) and we will skip the following do-group that is responsible for the file deletion. If it is a file and it opens successfully (fid>0), then we go through the deletion do-group, where we first grab the file's Last Modified date as moddate, close the file, and, if moddate <= deldate, delete that file:

rc = fdelete('inmem');

Then we close the directory and un-assign filerefs for the members and directory itself.

Deleting old files across multiple directories/folders

Macro %mr_clean is flexible enough to address various SAS administrators needs. You can use this macro to delete old files of various types across multiple directories/folders. First, let’s create a driver table as follows:

data delete_instructions;
   length days 8 extn $9 path $256;
   infile datalines truncover;
   input days 1-2 extn $ 4-12 path $ 14-269;
   datalines;
30 .log      C:\PROJECTS\Automatically deleting old files\Logs1
20 .log      C:\PROJECTS\Automatically deleting old files\Logs2
25 .txt      C:\PROJECTS\Automatically deleting old files\Texts
35 .xls      C:\PROJECTS\Automatically deleting old files\Excel
30 .sas7bdat C:\PROJECTS\Automatically deleting old files\SAS_Backups
;

This driver table specifies how many days to keep files of certain extensions in each directory. In this example, perhaps the most beneficial deletion applies to the SAS_Backups folder since it contains SAS data tables (extension .sas7bdat). Data files typically have much larger size than SAS log files, and therefore their deletion frees up much more of the valuable disk space.

Then we can use this driver table to loop through its observations and dynamically build macro invocations using CALL EXECUTE:

data _null_;
   set delete_instructions;
   s = cats('%nrstr(%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,'))');
   call execute(s);
run;

Alternatively, we can use the DOSUBL() function to dynamically execute our macro at every iteration of the driver table:

data _null_;
   set delete_instructions;
   s = cats('%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,')');
   rc = dosubl(s);
run;

Put it on autopilot

When it comes to cleaning your old files (logs, backups, etc.), the best practice for SAS administrators is to schedule your cleaning job to automatically run on a regular basis. Then you can forget about this chore around your "SAS house" as %mr_clean macro will do it quietly for you without the noise and fuss of a Roomba.

Your turn, SAS administrators

Would you use this approach in your SAS environment? Any suggestions for improvement? How do you deal with old log files? Other old files? Please share below.

SAS administrators tip: Automatically deleting old SAS logs was published on SAS Users.

December 20, 2017
 

While SAS program development is usually done in an interactive SAS environment (SAS Enterprise Guide, SAS Display Manager, SAS Studio, etc.), running SAS programs in a production or operations environment is routinely done in batch mode.

Why run SAS programs in batch mode?

First and foremost, this is done for automation, as the batch process does not require human participation at the time of run. It can be scheduled to run (using Operating System scheduler or other scheduling software) while we sleep, at any time of the day or at any time interval between two consecutive runs.

Running SAS programs in batch mode allows streamlining SAS processing by eliminating the possibility of human error and by submitting multiple SAS jobs (programs) all at once or in a sequence that secures program and/or data dependencies.

SAS batch processing also takes care of self-documenting, as it automatically generates and stores SAS logs and outputs.

Imagine the following scenario. Every night, a SAS batch process “wakes up” at 3 a.m. and runs an ETL process on a SAS Application server that extracts multiple tables from a database, transforms, combines, and loads them into a SAS datamart; then moves some data tables across the network and loads them into SAS LASR server, so when you are back to work in the morning your SAS Visual Analytics application has all its data refreshed and ready to roll. Of course, the process schedule can be custom-tailored to your particular needs; your batch jobs may run every 15 minutes, once a week, every first Friday of the month – you name it.

What is a batch script file?

To submit a single SAS program in batch mode manually, you could submit an OS command that looks something like the following:

Unix/Linux

sas /sas/code/proj1/job1.sas -log /sas/code/proj1/job1.log

DOS/Windows

"C:\Program Files\SASHome\SASFoundation\9.4\Sas.exe" -SYSIN c:\proj1\job1.sas -NOSPLASH -ICON -LOG c:\proj1\job1.log

However, submitting an OS command manually has too many drawbacks: it’s too much typing, it only submits one SAS program at a time, and most importantly – it is manual, which means it is prone to human error.

Usually, these OS commands are packaged into so called batch files (shell scripts in Unix) that allow for sequential, parallel, as well as conditional execution of multiple OS line commands. They can be run either manually, or automatically – on schedule, or called by other batch scripts.

In a Windows/DOS Operating System, these script files are called batch files and have .bat filename extensions. In Unix-like operating systems, such as Linux, these script files are called shell scripts and have .sh filename extensions.

Since Windows batch files are similar, but slightly different from the Unix (and its open source cousin Linux) shell scripts, in the below examples we are going to use Unix/Linux shell scripts only, in order to avoid any confusion. And we are going to use terms Unix and Linux interchangeably.

Here is the typical content of a Linux shell script file to run a single SAS program:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

Note that the shell script syntax allows for some basic programming features like a current datetime function, formatting, and variables. It also provides conditional processing similar to “if-then-else” logic. For detailed information on the shell scripting language you may refer to the following BASH shell script tutorial or any other source covering the many dialects or flavors of shell scripting (C Shell, Korn Shell, etc.).

Let’s save the above shell script as the following file:
/sas/code/project1/program1.sh

How to submit a SAS program via Unix script

In order to run this shell script we would submit the following Linux command:
/sas/code/project1/program1.sh

Or, if we navigate to the directory first:
cd /sas/code/project1

then we can submit an abbreviated Linux command
./program1.sh
When run, this shell script not only executes a SAS program (program1.sas), but for every run it also creates and saves a uniquely named SAS log file. You may create the SAS log file in the same directory where the SAS code is stored, as specified in the shell script above, or specify another directory of your choice.

For example, it creates the following SAS log file:
/sas/code/project1/program1_2017.12.06_09.15.20.log

The file name uniqueness is achieved by adding a date/time stamp suffix between the SAS program name and .log file name extension, in this particular case indicating that this SAS log file was created on December 6, 2017, at 09:15:20 (hours:minutes:seconds).
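As a side note, the SAS program itself can record how it was launched. Here is a small sketch using the SYSPROCESSMODE automatic macro variable, available in SAS 9.4:

/* writes e.g. "SAS Batch Mode" or "SAS DMS Session" to the log */
%put NOTE: This program is running in: &sysprocessmode;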

Unix script for submitting multiple SAS programs

Unix scripts may contain not only OS commands, but also other Unix script calls. You can mix-and-match OS commands and other script calls.

When scripts are created for each individual SAS program that you intend to run in a batch, you can easily combine them into a program flow by creating a flow script containing those single program scripts. For example, let’s create a script file /sas/code/project1/flow1.sh with the following contents:

/sas/code/project1/program1.sh
/sas/code/project1/program2.sh
/sas/code/project1/program3.sh

When submitted as

/sas/code/project1/flow1.sh

it will sequentially execute three scripts - program1.sh, program2.sh, and program3.sh, each of which will execute the corresponding SAS program - program1.sas, program2.sas, and program3.sas, and produce three SAS logs - program1.log, program2.log, and program3.log.

Unix script file permissions

In order to be executable, UNIX script files must have certain permissions. If you create the script file and want to execute it yourself only, the file permissions can be as follows:

-rwxr-----, or 740 in octal representation.

This means that you (the Owner of the script file) have Read (r), Write (w) and Execute (x) permissions – the first rwx triad; the Group owning the script file has only Read (r) permission – the middle r-- triad; Others have no permissions to the script file at all – the final --- triad.

If you want to give yourself (Owner) and Group execution permissions then your script file permissions can be as:

-rwxr-x---, or 750 in octal representation.

In this case, your group has Read (r) and Execute (x) permissions, as shown by the middle r-x triad.

In Unix, file permissions are assigned using the chmod Unix command.

Note that in both examples above we do not give Others any permissions at all. Remember that file permissions are a security feature, and you should assign them at the minimum level necessary.

Conditional execution of scripts and SAS programs

Here is an example of a Unix script file that allows running multiple SAS programs and OS commands at different times.

#!/bin/sh

#1 extract data from a database
/sas/code/etl/etl.sh

#2 copy data to the Visual Analytics autoload directory
scp -B userid@sasAPPservername:/sas/data/*.sas7bdat userid@sasVAservername:/sas/config/.../AutoLoad

#3 run weekly, every Monday
dow=$(date +%w)
if [ $dow -eq 1 ]
then
   /sas/code/alerts_generation.sh
fi

#4 run monthly, first Friday of every month
dom=$(date +%d)
if [ $dow -eq 5 -a $dom -le 7 ]
then
   /sas/code/update_history.sh
   /sas/code/update_transactions.sh
fi

In this script, the following logical operators are used: -eq (equal), -le (less or equal), -a (logical and).

As you can see, the script logic takes care of branching to execute different SAS programs when certain timing conditions are met. With such an approach, you would need to schedule only this single script to run at a specified time/interval, say daily at 3 a.m.

In this case, the script will “wake up” every morning at 3 a.m. and execute its component scripts either unconditionally, or conditionally.

If one of the included programs needs to run at a different, lesser frequency (e.g. every Monday, or monthly on first Friday of every month) the script logic will trigger those executions at the appropriate times.

In the above script example, steps #1 and #2 will execute every time (unconditionally) the script runs (daily). Step #1 runs an ETL program to extract data from a database; step #2 copies the extracted data across the network from the SAS application server to the SAS LASR Analytic Server’s drop zone, from where it is automatically loaded (autoloaded) into LASR.

Step #3 will run conditionally every Monday ($dow -eq 1). Step #4 will run conditionally on the first Friday of every month ($dow -eq 5 -a $dom -le 7).

For more information on how to format dates for use in shell scripts, please refer to this post.
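
For scheduling, a single crontab entry is enough. Here is a minimal sketch that runs a driver script daily at 3 a.m. (the script path is made up for illustration):

# crontab entry: minute hour day-of-month month day-of-week command
0 3 * * * /sas/code/daily_driver.sh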

Do you run your SAS programs in batch?

Please share your batch experiences in the comment section below. I am sure the rest of us will really appreciate it!

Running SAS programs in batch under Unix/Linux was published on SAS Users.

November 7, 2017

There is certainly no shortage of terrific tips and tricks in various SAS blogs from some of our most distinguished SAS in-house experts. But, there's another group of equally qualified experts who don't often get to share their expertise on this channel: our customers. So, I went on a quest to get the inside scoop from various SAS users, polling Friends of SAS members to get their feedback on their favorite SAS tips.

We asked a few of these Friends of SAS members who are regular SAS users to share with us their top SAS tips and tricks for improving performance, or something they wished they had known earlier in their SAS careers. We got back a wide range of tips and tricks from a number of different SAS users - ranging from novice to expert, across various industries and products. Check out some of them below:

FUNCTIONS

Functions are either built into SAS itself, or you can write your own customized code that acts in the same manner; both help in analyzing and processing data. There are a variety of function categories, including mathematical, date and time, character, truncation, and miscellaneous. Using functions makes us more efficient, and we don’t have to re-invent the wheel every time we want to figure something out. With this being said, some of our regular SAS users have a thing or two to say about dealing with functions that may help you out:

“Before you program any complex code, look for a SAS function that will do the task for you.”
     - John Ladds, Past President, OASUS

“Insert a line break in a concatenated string, such as: manylines = catx('0a'x,a,b,c);”
     - Aroop Ghosh, Principal Consultant, Webtalk Communications

“Use the lag function to create time related variables, for example, in time punch data”
     - Yolanda, Analyst, TD

“A good trick that I have recently learnt [sic] which can make the code less wordier is using the functions IFN and IFC as an alternative to IF THEN ELSE statements in conditional processing.”
     - Sunny Giroti, Master of Business Analytics Candidate, Schulich School of Business

“IFN can be used in place of IF THEN ELSE to shorten code”
     - Neil Menezes, Senior Business Analyst, CTFS

“Ron Cody’s link from SAS.COM. It has many SAS function examples.”
     - John Lam, CIBC
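
A couple of the tips above mention IFN and IFC. Here is a minimal sketch of how they condense IF-THEN-ELSE logic (the data set and variable names are made up for illustration):

data bonus;
   set employees;
   /* IFN returns a numeric value: ifn(condition, true-value, false-value) */
   bonus_pct = ifn(sales > 100000, 0.10, 0.05);
   /* IFC is the character counterpart of IFN */
   tier = ifc(sales > 100000, 'Gold', 'Silver');
run;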

SYNTAX/SHORTCUTS/EFFICIENCIES

You know what they say: time is money. So, for a SAS programmer, finding shortcuts and ways to work more efficiently and faster is important to get a job done quicker. Here are a few tips that SAS users think can make your life easier while working with SAS:

“Use missover to ensure no records are skipped when reading in a file”
     - Scott Bellefeuille, IT Solutions Developer (Merchant Services), TD Bank

“Pressing keys 'Ctrl'+'/' to comment out a line of code.”
     - Bunce Leung, Execution Manager, RBC

“Variable Lists - being able to refer to variables using double dashes to indicate all variables between first and last in a dataset is super useful for many procs. The later versions of being able to use the prefix and colon to indicate all datasets with a prefix is a great shortcut as well.”
     - Fareeza Khurshed, Manager (Statistical Services), Alberta Treasury Board and Finance

“I like to use PERL in SAS for finding stuff in character variables.”
     - Peter Timusk, Statistics Officer, Statistics Canada

“Title "SAS can give you an Inheritance". Have an ODBC driver on your local PC but not on a remote server? No problem. Use rsubmit with the inheritlib option. Your remote server will now inherit the ODBC driver and be able to access a database you thought you could only reach with your PC.”
     - Horst Wolter, Manager, TD Bank

“If you want to speed up the processing of your program, run your join statements on the "work" library. It is much faster.”
     - Estela Tavares, Economist, Statistics Canada

“When dealing with probability, can logistic be used in all cases? Trick Q - as A is no. What about the times probability is 0 and 1? What if the data is heavily distributed on 1s and 0s?”
     - Mukul Pandey, Student Business Analytics, Schulich School of Business

“Proc tabulate can perform descriptive statistics better than proc freq and proc means.”
     - Taha Azizi, Senior Business Insight Analyst, TD
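
To illustrate the variable-list shorthand mentioned above, here is a minimal sketch (assuming a table SCORES with variables Q1 through Q10):

proc means data=scores;
   var q1--q10;   /* positional list: every variable stored between Q1 and Q10 */
run;

proc print data=scores;
   var q:;        /* prefix list: every variable whose name starts with Q */
run;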

Your turn

Were any of these tips and tricks useful? Do you use them already? What are some of your top SAS tips and tricks? Please be sure to share in the comments below!

Looking for more tips and tricks? Check out this video featuring six Canadian SAS programmers, including a few Friends of SAS members, who share some of their favourite SAS programming tips.

About Friends of SAS

If you’re not familiar with Friends of SAS, it is an exclusive online community available only to our Canadian SAS customers and partners to recognize and show our appreciation for their affinity to SAS. Members complete activities called 'challenges' and earn points that can be redeemed for rewards. There are opportunities to build powerful connections, gain privileged access to SAS resources and events, and boost your learning and development of SAS all in a fun environment.

Interested in learning more about Friends of SAS? Feel free to email me at Natasha.Ulanowski@sas.com or Martha.Casanova@sas.com with any questions or for more details.

SAS tips and tricks: Users-tell-all edition was published on SAS Users.

November 2, 2017

The purpose of this blog post is to demonstrate a SAS coding technique that allows for calculations with multiple variables across a SAS dataset, whether or not their values belong to the same or different observations.

calculations across observations of a SAS data table

What do we want?

As illustrated in the picture on the right, we want to be able to hop, or jump, back and forth, up and down across observations of a data table in order to implement calculations not just on different variables, but with their values from different observations of a data table.

In essence, we want to access SAS dataset variable values similar to accessing elements of a matrix a(i,j), where rows represent dataset observations and columns represent dataset variables.

Combine and Conquer

In the spirit of my earlier post Combine and Conquer with SAS, this technique combines the functionality of the LAG function, which allows us to retrieve the variable value of a previous observation from a queue, with an imaginary LEAD function (non-existent in SAS) that reads a subsequent observation in a data set while processing the current observation during the same iteration of the DATA step.

LAG function

The LAG<n> function in SAS is not your usual, ordinary function. While it provides a mechanism for retrieving previous observations in a data table, it does not work “on demand” to arbitrarily read a variable value from a previous observation n steps back. If you want to use it conditionally in some observations of a data step, you still need to call it in every iteration of that data step. That is because it retrieves values from a queue that is built sequentially with each invocation of the LAG<n> function. In essence, in order to use the LAG function even just once in a data step, you need to call it in every data step iteration up to that single use.

Moreover, if you need to use each of the LAG1, LAG2, ..., LAGn functions just once, then in order to build their queues you have to call each of them in every data step iteration, even if you are going to use them only in some subsequent iterations.
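
Here is a minimal sketch of that behavior (HAVE, X, and FLAG are made-up names):

data want;
   set have;
   lagx = lag(x);                  /* called unconditionally, so the queue is fed
                                      one value on every iteration */
   if flag = 1 then prev_x = lagx; /* the conditional use comes afterwards */
   /* by contrast, "if flag = 1 then prev_x = lag(x);" would feed the queue only
      on flagged iterations and return X from the previous FLAGGED observation */
run;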

LEAD function

The LEAD function is implemented in Oracle SQL; it returns data from the next (subsequent) row of a data table. It allows you to query more than one row in a table at a time without having to join the table to itself.

There is no such function in SAS. However, the POINT= option of the SET statement in a SAS data step allows retrieving any observation by its number, using random (direct) access to read a SAS data set. This allows us to simulate a LEAD function in SAS.
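
As a preview of that simulation, here is a minimal sketch that grabs X from the next observation (HAVE and X are made-up names; the POINT= and NOBS= variables are temporary and are not written to the output data set):

data want;
   set have;
   _p_ = _n_ + 1;   /* observation number of the next row */
   if (1 le _p_ le _o_) then
      set have(keep=x rename=(x=lead_x)) point=_p_ nobs=_o_;
   else lead_x = .; /* no next row: we are past the end */
run;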

HOP function

But why do we need two separate functions like LAG and LEAD in order to retrieve non-current observations? In essence, these two functions do the same thing, just in opposite directions. Why can’t we get by with just one function that does both backwards and forwards “hopping?”

Let’s combine and conquer.

Ideally, we would like to construct a new single function - let’s call it HOP(x, j) - that combines the best qualities of both LAG and LEAD functions. The two arguments of the HOP function would be as follows:

x – the SAS variable name (numeric or character) whose value we are retrieving;

j – hop distance (numeric), an offset from the current observation; negative values mean lagging (hopping back), positive values mean leading (hopping forward), and zero means staying within the current observation.

The sign of the second argument defines whether we lag (minus) or lead (plus). The absolute value of this second argument defines how far from the current observation we hop.

Alternatively, we could have the first argument be a column number and the second argument a row/observation number if we wanted this function to deal with the data table more like a matrix. But the method doesn’t really matter, as long as we can unambiguously identify a data element or a cell. To stay within the data step paradigm, we will stick with the variable name and the offset from the current observation (_n_) as arguments.

Let’s say we have a data table SAMPLE, where for each event FAIL_FLAG=1 we want to calculate DELTA as the difference between DATE_OUT, one observation after the event, and DATE_IN, two observations before the event:

Calculations across observations of a SAS data table

That is, we want to calculate DELTA in the observation where FAIL_FLAG = 1 as

26MAR2017 - 18JAN2017 = 67 (as shown in the figure above).

With the HOP() function, that calculation in the data step would look like this:

data SAMPLE;
   set SAMPLE;
   if FAIL_FLAG then DELTA = hop(DATE_OUT,1) - hop(DATE_IN,-2);
run;

It would be reasonable to suggest that the HOP function should return a missing value when the second argument produces an observation number outside of the dataset boundary, that is, when

_n_ + j < 1 or _n_ + j > num, where _n_ is the current observation number of the data step iteration, num is the number of observations in the dataset, and j is the offset argument value.

Do you see anything wrong with this solution? I don’t. Except that the HOP function exists only in my imagination. Hopefully, it will be implemented soon if enough SAS users ask for it. But until then, we can use its surrogate in the form of a %HOP macro.

%HOP macro

The HOP macro grabs the value of a specified variable from an observation at a given offset relative to the current observation and assigns it to another variable. It is used within a SAS data step, but it cannot be used in an expression; each invocation of the HOP macro can only grab one value of a variable across observations and assign it to another variable within the current observation.

If you need to build an expression to do calculations with several variables from various observations, you would need to first retrieve all those values by invoking the %hop macro as many times as the number of the values involved in the expression.

Here is the syntax of the %HOP macro, which has four required parameters:

%HOP(d,x,y,j)

d – input data table name;

x – source variable name;

y – target variable name;

j – integer offset relative to the current observation. As before, a negative value means a previous observation, a positive value means a subsequent observation, and zero means the current observation.

Using this %HOP macro we can rewrite our code for calculating DELTA as follows:

data SAMPLE (drop=TEMP1 TEMP2);
   set SAMPLE;
   if FAIL_FLAG then
   do;
      %hop(SAMPLE,DATE_OUT,TEMP1, 1)
      %hop(SAMPLE,DATE_IN, TEMP2,-2)
      DELTA = TEMP1 - TEMP2;
   end;
run;

Note that we should not have the temporary variables TEMP1 and TEMP2 listed in a RETAIN statement, as this could mess up our calculations if the j-offset throws an observation number outside the dataset boundary.

Also, the input data table name (d parameter value) is the one that is specified in the SET statement, which may or may not be the same as the name specified in the DATA statement.

In case you are wondering where you can download the %HOP macro from, here it is in its entirety:

%macro hop(d,x,y,j);
   _p_ = _n_ + &j;            /* observation number at the requested offset */
   if (1 le _p_ le _o_) then  /* guard against hopping outside the dataset  */
      set &d(keep=&x rename=(&x=&y)) point=_p_ nobs=_o_;
%mend hop;

Of course, it is “free of charge” and “as is” for your unlimited use.

Your turn

Please provide your feedback and share possible use cases for the HOP function/macro in the Comment section below. This is your chance for your voice to be heard!

Hopping for the best - calculations across SAS dataset observations was published on SAS Users.

September 15, 2017

Whether you are a SAS code creator, a blogger, a technical writer, an editor-in-chief, an executive, a secretary, a developer or programmer in any programming language, or simply someone who uses a computer or hand-held device for writing, you need to read this blog post – your life is about to change forever!

Did you know that you can use a web browser as a SAS code editor? I’m not talking about browser-based SAS programming interfaces like SAS University Edition or SAS Studio; these are full-blown applications. I’m talking about converting a regular web browser into a “notepad” where you can type, display, and save your SAS code. Or non-SAS code. Or practically anything. And you don’t even have to be connected to the Internet to use this browser functionality.

Converting a web browser into a notepad

This trick works with most modern browsers:

  • Chrome
  • Firefox
  • Opera
  • Safari

It will not work on Internet Explorer 11.

Try this: open your web browser (I am using Firefox in the examples below) or a new tab in your browser and type the following in the URL field (case insensitive):

data:text/html,<html contentEditable>

Hit Enter. Then click anywhere in the browser body.

Your browser has just turned into a Notepad. You can now type anything in it, including SAS code:

Using web browser as SAS code editor

In order to save your SAS code in Firefox, click on File ⇒ Save Page As… and save it as type Text Files (*.txt;*.text):

Saving SAS code in a browser as text file

This functionality is possible thanks to HTML5’s contentEditable attribute and the browsers’ ability to handle data URLs.
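
If you prefer a file you can double-click instead of typing the URL, the same two ingredients work as a tiny HTML document (a minimal sketch; the file name is made up):

<!-- notepad.html: open it in any modern browser and start typing -->
<html contentEditable>
<title>Notepad</title>
</html>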

I don’t know about you, but I find this browser-notepad feature very cool and handy. Not only does it allow you to type SAS code in your browser, but it also gives you the capability to take notes and copy and paste excerpts or code snippets from other web pages or non-web applications. If you use WebEx, Skype, or Lync to present one of the SAS web browser-based products, such as SAS Visual Analytics or SAS Visual Statistics, you can share your browser with your audience and make one of the tabs a typeable area. Then, during your presentation, you may switch between browser tabs depending on whether you are presenting SAS VA/VS or your own on-the-fly typing.

Bookmark notepad in a browser

If you like this Notepad browser feature, you can easily bookmark it by placing it on the Bookmarks toolbar. In this case, I suggest typing the following line in the URL field:

data:text/html,<html contentEditable><title>Notepad</title>

and then dragging the icon in front of this URL string and dropping it onto the Bookmarks toolbar to create a button. Then, every time you need a Notepad, it is at your fingertips; you just need to click the button:

Bookmark SAS code editor in a browser

Styling your new SAS editor in a browser

By default, your browser editor does not look pretty. However, you can apply CSS styles to it to make it look better. You can control the font (style, size, color), margins, padding, background, and other CSS style attributes. For example, try the following URL:

data:text/html, <textarea style="width:100%; height:100%; padding:20px; font-size:2em; font-family: SAS Monospace; color:darkblue; border:none; border-left: 10px solid lightblue; margin-left: 30px;" autofocus/>

Your web browser editor becomes much more presentable:

Customize SAS code editor in a browser

The autofocus attribute places the cursor immediately in the typing area of the browser notepad, without your having to click on the browser body first.

I want to hear from you!

Do you like this editable browser feature? Would you use it to enhance your presentations? Do you envision yourself writing SAS code in a browser? An article? A blog post? What other uses can you envision for such a web browser transformation? Do you have any ideas to expand this notepad browser functionality beyond presenting, typing, taking notes, copying/pasting, and saving your SAS code? Can you apply SAS color syntax highlighting in a browser? Or a background image? How about submitting your SAS code from a browser?

Using a web browser as a SAS code editor was published on SAS Users.