
Mar 22, 2019
 

As data volumes continue to surge (with no signs of slowing down), the cost to handle all that data can have a very real impact on IT budgets.

But for all the talk of the growing value of data, the cost side of the equation is often overlooked. After all, it not only takes a lot of drives to hold all that data, it requires significant computing resources to handle it. The costs quickly add up, forcing IT leaders to make tough choices about where to make their IT investments.

This is as true for SAS users as it is for any other IT organization. We work closely with our customers and know that the cost of our analytics software is only part of the budget puzzle – total cost of ownership (TCO) casts a long shadow over their budget decisions. So we were as curious as they were when Intel released its Intel® Optane™ DC Persistent Memory.

What is persistent memory?

Data written to persistent memory remains accessible using memory instructions even after the process that created or last modified it has ended. In this respect, persistent memory is like a hard disk drive (HDD). But data can be accessed from persistent memory considerably faster than it can be from an HDD.

Of course, as a longtime Intel collaborator, we worked closely together to optimize SAS software for this innovation. But how would it work in real-life applications outside of the labs? Would the Intel Optane DC Persistent Memory - SAS® Viya® combination have a real impact, or would it just represent yet another incremental improvement – nice, but not enough to change the IT investment-performance equation?

Testing confirms performance

There was only one way to find out, of course: Test it.

So that’s what we did. In collaboration with Intel, we ran SAS Viya 3.4 on Intel Optane DC Persistent Memory. More specifically, the testing compared performance of a two-socket second-generation Intel® Xeon® Platinum 8280 processor with DRAM to a two-socket second-generation Intel Xeon Platinum 8280 processor with Intel Optane DC persistent memory. (See editor's note below for details.)

We didn’t hold back, either – we ran multiple instances of a large model containing 400 gigabytes of data, concurrently. All these jobs finished in a matter of minutes. Just as important, the costs of achieving performance in a persistent memory environment with SAS Viya, when compared to DRAM and SAS Viya, make their own compelling case.

Intel Optane persistent memory delivers more than 95 percent of DRAM's performance, at a cost per unit of performance that is approximately 18 percent lower. With second-generation Intel Xeon Scalable processors and Intel Optane DC persistent memory in Memory Mode, it's clear that users can take advantage of the benefits of systems with larger memory capacity, with little or no performance degradation.

Greater memory capacity = substantial cost savings

But performance may not even be the most interesting part of this story – not when you compare it to the cost benefits of running in this environment. In short, compared with DRAM – equal capacity memory, equal CPU, and so on – using Intel Optane DC Persistent Memory leads to a reduction in costs of 20 percent or more. That’s a tangible, meaningful impact for any IT budget, allowing CIOs to invest more heavily to address other pressing needs. By contrast, system costs increase dramatically with higher DRAM capacity.

Plus, it may be possible in the future to use the memory module in different modes – an opportunity we’re currently testing in non-volatile random-access memory (NVRAM) mode with the goal of allowing applications to take advantage of this benefit.

How does this combination of persistent memory tools and SAS Viya contribute to faster speeds and lower costs? The massive scale of data typically handled by SAS customers can be handled by fewer servers, because each can now have a much greater memory capacity. This equation can translate into a significant reduction in the total cost of ownership.

There are a lot of factors, many of which are unique to your organization, that go into reducing the total cost of ownership – so it goes without saying that your mileage may vary. Regardless, this is a conversation your organization needs to be having right now, if it hasn’t already. We’re happy to answer your questions about running SAS Viya together with Intel Optane DC Persistent Memory.

Want a deeper dive? | See the video on Intel Optane DC Persistent Memory

Editor's Note: Specs for this test included:

  • SAS® Viya® In-memory Analytics: SAS® Viya 3.4 VDMML application.
  • Workload: 3 concurrent logistic regression tasks each running on 400GB datasets.
  • Testing by: Intel and SAS completed on February 15, 2019.
  • Baseline hardware for comparison: 2S Intel® Xeon® Platinum 8280 processor, 2.7GHz, 28 cores, turbo and HT on, BIOS SE5C620.86B.0D.01.0286.011120190816, 1536GB total memory, 24 slots / 64GB / 2666 MT/s / DDR4 LRDIMM, 1x 800GB, Intel SSD DC S3710 OS Drive + 1x 1.5TB Intel Optane SSD DC P4800X NVMe Drive for CAS_DISK_CACHE + 1x 1.5TB Intel SSD DC P4610 NVMe Drive for application data, CentOS Linux* 7.6 kernel 4.19.8.
  • New hardware tested: 2S Intel® Xeon® Platinum 8280 processor, 2.7GHz, 28 cores, turbo and HT on, BIOS SE5C620.86B.0D.01.0286.011120190816, 1536GB Intel Optane DCPMM configured in Memory Mode (8:1), 12 slots / 128GB / 2666 MT/s, 192GB DRAM, 12 slots / 16GB / 2666 MT/s DDR4 LRDIMM, 1x 800GB, Intel SSD DC S3710 OS Drive + 1x 1.5TB Intel Optane SSD DC P4800X NVMe Drive for CAS_DISK_CACHE + 1x 1.5TB Intel SSD DC P4610 NVMe Drive for application data, CentOS Linux* 7.6 kernel 4.19.8.

Persistent memory: one way to rein in IT costs was published on SAS Users.

Mar 21, 2019
 

SAS Decision Manager enables you to build and test decisions to use in batch processes, real-time web applications, or with SAS Event Stream Processing (ESP).

In this blog, I explain how to use Rulesets in an Event Stream Processing project. If you are streaming data using SAS ESP and your data stream involves making decisions, you can build Rulesets in SAS Decision Manager and use them in your event stream project. ESP can invoke the code generated by SAS Decision Manager and execute it in its Micro Analytic Service (MAS) engine.

Retrieving code for Rulesets

To use a Decision Manager Ruleset within an ESP event stream project, you need to export the DS2 code that Decision Manager generates and point ESP to that code so it can execute it. To export the code, we use the SAS Viya REST API to:
• Obtain an access token for SAS Viya
• Retrieve the ID of the required Ruleset
• Retrieve the Decision Manager DS2 code via the Ruleset ID

Obtain an access token to SAS Viya

Before using SAS Viya APIs, your SAS administrator must register a client identifier. The SAS Logon OAuth API uses OAuth2 to securely identify your application before it connects to the SAS Viya platform. See Registering clients for information on how clients are registered. Once a client is successfully registered, the SAS administrator provides you with the client identifier and client secret to authenticate an API request.
To obtain an access token, call:
http://{{ViyaServer}}/SASLogon/oauth/token

If successfully executed, you will get an access token for all further REST calls.
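
As a rough sketch (assuming the commonly used password grant type, and the client identifier and secret your administrator registered – the placeholder names are mine), the call might look like this with curl:

curl -X POST "http://{{ViyaServer}}/SASLogon/oauth/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -u "{{clientID}}:{{clientSecret}}" \
  -d "grant_type=password&username={{username}}&password={{password}}"

The token is returned in the access_token field of the JSON response.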

Retrieve the Ruleset ID

We need the ID of the Ruleset we want to use in ESP. The REST endpoint requires this ID to return the DS2 code.
To get the ID, call the endpoint that lists all available Rulesets:
http://{{ViyaServer}}/businessRules/ruleSets

If successfully executed, you will receive the Ruleset ID in the "id" field of the "items" list.
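
A curl sketch of this call, passing the access token obtained above:

curl "http://{{ViyaServer}}/businessRules/ruleSets" \
  -H "Authorization: Bearer {{accessToken}}"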

Retrieve the Ruleset code

With this new ID, we can export the DS2 code for the Ruleset.
To get the code, call the Ruleset code endpoint, substituting the Ruleset ID retrieved in the previous step for {{rulesetID}}:
http://{{ViyaServer}}/businessRules/ruleSets/{{rulesetID}}/code

If executed successfully, you will receive the DS2 code for the Ruleset.
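
A curl sketch of this call (the Accept media type shown is my assumption, based on the media-type conventions SAS Viya REST APIs use for generated DS2 source):

curl "http://{{ViyaServer}}/businessRules/ruleSets/{{rulesetID}}/code" \
  -H "Authorization: Bearer {{accessToken}}" \
  -H "Accept: text/vnd.sas.source.ds2"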

Preparing the code

Copy the DS2 code from the REST call into a file, save the file with a descriptive name (e.g., the name of the Ruleset), and move it to a location where ESP can access it.

Invoke Decision Manager Code in ESP

Now that the code is saved to a file in a location that ESP can access, we can invoke it from our ESP project.

We need to register the ruleset code file we saved.
Open the ESP project and go to Micro Analytic Service Modules at the project level.

Add a new Micro Analytic Service Module for the ruleset code file and fill in all required fields.


To invoke the code in the event stream, add a Calculate Window.
In Settings choose Calculation = User-specified.

Under Handlers, select the source and ensure the field values are set correctly.


Set the fields for the output schema of the calculate window. Note that the field names and types must match the names and types used in the Ruleset.
Save the project.

You are now ready to run your project in Test mode to check if it works.

Conclusion

SAS Decision Manager allows you to build decisions in an environment independent of ESP. This gives you the freedom to design and test decisions in a less technical environment without touching the event stream. After testing a decision, you can simply "hook it into" your event stream.
Other users can work on and update decisions by simply supplying a new or updated code file. This makes your event stream more flexible and easier to maintain. To learn more, please check out these sources.

Video: SAS Decision Manager
Article: Using SAS Decision Manager to enrich the data prep process

Calling SAS Decision Manager Rulesets in ESP was published on SAS Users.

Mar 15, 2019
 
SAS makes it easy for you to create a large amount of procedure output with very few statements. However, when you create a large amount of procedure output with the Output Delivery System (ODS), your SAS session might stop responding or run slowly. In some cases, SAS generates a “Not Responding” message. Beginning with SAS® 9.3, the SAS windowing environment creates HTML output by default and enables ODS Graphics by default. If your code creates a large amount of either HTML output or ODS Graphics output, you can experience performance issues in SAS. This blog article discusses how to work around this issue.

Option 1: Enable the Output window instead of the Results Viewer window

By default, the SAS windowing environment with SAS 9.3 and SAS® 9.4 creates procedure output in HTML format and displays that HTML output in the Results Viewer window. However, when a large amount of HTML output is displayed in the Results Viewer window, performance might suffer. To display HTML output in the Results Viewer window, SAS uses an embedded version of Internet Explorer within the SAS environment. And because Internet Explorer does not process large amounts of HTML output well, it can slow down your results.

If you do not need to create HTML output, you can display procedure output in the Output window instead. To do so, add the following statements to the top of your code before the procedure step:

   ods _all_ close; 
   ods listing;

The Output window can show results faster than HTML output that is displayed in the Results Viewer window.

If you want to enable the Output window via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, select Create listing and clear the Create HTML check box.
    4. Click OK.

A large amount of output in the Output window, which typically does not cause a performance issue, might still generate an “Output window is full” message. In that case, you can route your LISTING output to a disk file. Use either the PRINTTO procedure or the ODS LISTING statement with the FILE= option. Here is an example:

   ods _all_ close; 
   ods listing file="sasoutput.lst"; 
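
Alternatively, here is a minimal sketch that routes the same output to a disk file with the PRINTTO procedure (the file name is illustrative):

   proc printto print='sasoutput.lst' new;  /* route procedure output to a disk file */
   run;
   /* ... run your procedure steps here ... */
   proc printto;                            /* restore the default output location */
   run;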

Option 2: Disable ODS Graphics

Beginning with SAS 9.3, the SAS windowing environment enables ODS Graphics by default. Therefore, most SAS/STAT® procedures now create graphics output automatically. Naturally, graphics output can take longer to create than regular text output. If you are running a SAS/STAT procedure but you do not need to create graphics output, add the following statement to the code before the procedure step:

   ods graphics off; 

If you want to set this option via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, clear the Use ODS Graphics check box.
    4. Click OK.

For maximum efficiency, you can combine the ODS GRAPHICS OFF statement with the statements listed in the previous section, as shown here:

   ods _all_ close;
   ods listing;
   ods graphics off; 

Option 3: Write ODS output to disk

You can ask SAS to write ODS output to disk but not to create output in the Results Viewer window. To do so, add the following statement to your code before your procedure step:

   ods results off;

Later in your SAS session, if you decide that you want to see output in the Results Viewer window, submit this statement:

   ods results on;

If you want to disable the Results Viewer window via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, clear the View results as they are generated check box.
    4. Click OK.

The ODS RESULTS OFF statement is a valuable debugging tool because it enables you to write ODS output to disk without viewing it in the Results Viewer window. You can then inspect the ODS output file on disk to check the size of it (before you open it).
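
For example, this minimal sketch writes a potentially large report to disk without rendering it in the Results Viewer window:

   ods results off;
   ods html path="c:\temp" file="big_report.html";
   proc freq data=sashelp.cars;
      tables make*type;   /* a crosstab that generates a fair amount of output */
   run;
   ods html close;
   ods results on;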

Option 4: Suppress specific procedure output from the ODS results

In certain situations, you might use multiple procedure steps to send output to ODS. However, if you want to exclude certain procedure output from being written to ODS, use the following statement:

   ods exclude all;

Ensure that you place the statement right before the procedure step that contains the output that you want to suppress.

If necessary, use the following statement when you want to resume sending subsequent procedure output to ODS:

   ods exclude none;
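
Putting the two statements together, here is a minimal sketch that suppresses only the output of the first step, while the second step's output is sent to ODS as usual:

   ods exclude all;   /* suppress output from the next step */
   proc contents data=sashelp.cars;
   run;
   ods exclude none;  /* resume sending output to ODS */
   proc means data=sashelp.cars;
   run;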

The article Five reasons to use ODS EXCLUDE to suppress SAS output discusses the ODS EXCLUDE statement in more detail.

Conclusion

Certain web browsers display large HTML files better than others. When you use SAS to create large HTML files, you might try using a web browser such as Chrome, Firefox, or Edge instead of Internet Explorer. However, even browsers such as Chrome, Firefox, and Edge might run slowly when processing a very large HTML file.

Instead, as a substitute for HTML, you might consider creating PDF output (with the ODS PDF destination) or RTF output (with the ODS RTF destination). However, if you end up creating a very large PDF or RTF file, then Adobe (for PDF output) and Microsoft Word (for RTF output) might also experience performance issues.

The information in this blog mainly pertains to the SAS windowing environment. For information about how to resolve ODS issues in SAS® Enterprise Guide®, refer to Take control of ODS results in SAS Enterprise Guide.

How to view or create ODS output without causing SAS® to stop responding or run slowly was published on SAS Users.

Mar 7, 2019
 

As of December 2018, any customer with a valid SAS Viya order is able to package and deploy their SAS Viya software in Docker containers. SAS has provided a fully documented and supported project (or “recipe”) for easily building these containers. So how can you start? You can simply stop reading this article and go directly to the GitHub repository and follow the instructions there. Otherwise, in this article, Jeff Owens, a solutions architect at SAS, provides a little color commentary around the process in case it is helpful…

First of all, what is the point of these containers?

Well, at its core, remember that SAS and its massively parallel, in-memory counterpart, Cloud Analytic Services (CAS), make up a powerful runtime for data processing and analytics. A runtime is simply an engine responsible for processing and executing a particular type of code (i.e., SAS code). Traditionally, the SAS runtime would live on a centralized server somewhere and users would submit their "jobs" to that SAS runtime (server) in a variety of ways. The SAS server supports a number of different products, tasks, etc. – but for this discussion let's just focus on the scenario where a job is a ".sas" file, perhaps developed in an IDE like Enterprise Guide or SAS Studio, and submitted to the SAS runtime engine via the IDE itself, a bash shell, or maybe even SAS' enterprise-grade scheduler and job management solution – SAS Grid. In these cases, the SAS and CAS servers are on dedicated, always-on physical servers.

The brave new containerized world in which we live provides us a new deployment model: submit the job and create the runtime server at the same time. Plus, consume only the exact resources from the host machine or the Kubernetes cluster that the specific job requires, and when the job finishes, release those resources for others to use. Kubernetes and PaaS clusters are quite likely shared environments, and one of the major themes in the rise of containers is the further abstraction between hardware and software. Some of that may be easier said than done, particularly for customers with very large volumes of jobs to manage, but it is indeed possible today with SAS Viya on Docker/Kubernetes.

Another effective (and more immediate) use of this containerized version of SAS Viya is simply as an ad hoc, on-demand, temporary development environment. The container package includes SAS Studio, so one can quickly spin up a full SAS Viya programming sandbox – SAS Studio as well as the SAS and CAS runtimes. Here they can develop and test SAS code, and just as quickly tear the environment down when no longer needed. This is useful for users that: (a) don't have access to an "always-on" environment for whatever reason, (b) want to try out experimental code that could potentially consume resources from a shared "always-on" SAS environment, and/or (c) have a Kubernetes cluster with many more resources available than their always-on environment and want to try a BIG job.

Yes, it is possible to deploy the entire SAS Viya stack (microservices and all) via Kubernetes but that discussion is for another day. This post focuses strictly on the SAS Viya programming components and running on a single machine Docker host rather than a Kubernetes cluster.

Build the container image

I will begin here with a fresh single-machine RHEL 7.5 server running on OpenStack. But this machine could have been running on any cloud or VM platform, and I could use any (modern enough) flavor of Linux thanks to how Docker works. My machine here has 8 CPUs, 16GB RAM, and a 50GB root volume. Less or more is fine. A couple of notes to help understand how to configure an instance:

  • The final docker container image we will end up with will be ~10GB in size and like all docker images will live in /var/lib/docker/images by default.
    • Yes, that is large for a container. Most of this size is just static bins and libs that support the very developed SAS language. Compare to an Anaconda image which is ~3.6GB.
  • As for RAM, remember any tables loaded to CAS are loaded to memory (and will swap to disk as needed). So, your memory choice should be directly dependent on the data sizes you expect to work with.
  • Similar story for cores – CAS code is multithreaded, so more cores = more parallelization.

The first step is to install Docker.

Following along with sas-container-recipes now, the first thing I should do is mirror the repo for my order. Note, this is not a required step – you could build this container directly from SAS repos if you wanted, but we’ll mirror as a best practice. We could simply mirror and serve it over the local filesystem of our build host, but since I promised color I’ll serve it over the web instead. So, these commands run on a separate RHEL server. If you choose to mirror on your build host, make sure you have the disk space (~30GB should be plenty). You will also need your SAS_Viya_deployment_data.zip file available on the SAS Customer Support site. Run the following code to execute the setup.

$ wget https://support.sas.com/installation/viya/34/sas-mirror-manager/lax/mirrormgr-linux.tgz
$ tar xf mirrormgr-linux.tgz
$ rm -f mirrormgr-linux.tgz
$ mkdir -p /repos/viyactr
$ mirrormgr mirror --deployment-data SAS_Viya_deployment_data.zip --path /repos/viyactr --platform x64-redhat-linux-6 --latest
$ yum install httpd -y
$ systemctl start httpd
$ systemctl enable httpd
$ ln -s /repos/viyactr /var/www/html/sas_repo

Next, I clone the sas-container-recipes repo locally, upload my SAS_Viya_deployment_data.zip file, and I am ready to run the build command. As a bonus, I am also going to use my site's (SAS') sssd.conf file so my container will use our corporate Active Directory for authentication. If you do not need or want that integration you can skip the "vi addons/sssd.conf" line and change the "--addons" option to "addons/auth-demo" so your container seeds with a single "sasdemo:sasdemo" user:password instead.

$ # upload SAS_Viya_deployment_data.zip to this machine somehow
$ git clone https://github.com/sassoftware/sas-container-recipes.git
$ cd sas-container-recipes/
$ vi addons/sssd.conf # <- paste in your site's sssd.conf file
$ ./build.sh \
--type single \
--zip ~/SAS_Viya_deployment_data.zip \
--mirror-url http://jo.openstack.sas.com/sas_repo \
--addons "addons/auth-sssd"

The build should take about 45 minutes and produce a single container image for you (there might be a few images, but it is just one with a thin layer or two on top). You might want to give this image a new name (docker tag) and push it into your own private registry (docker push). Aside from that, we are ready to run it.
If you are curious, look in the addons directory for the other optional layers you can add to your container. Several tools are available for easily configuring connections to external databases.

Run the container

Here is the run command we can use to launch the container. Note the image name I use here is “sas-viya-programming:xxxxxx” – this is the image that has my sssd layer built on top of it.

$ docker run \
--detach \
--rm \
--env CASENV_CAS_HOST=$(hostname -f) \
--env CASENV_CAS_VIRTUAL_PORT=8081 \
--publish 5570:5570 \
--publish 8081:80 \
--name sas-viya-programming \
--hostname sas-viya-programming \
sas-viya-programming:xxxxxx

Connect to the container

And now, in a web browser, I can go to http://<docker-host>:8081/SASStudio and I will end up in SAS Studio, where I can sign in with my internal SAS credentials. To stop the container, use the name you gave it: "docker stop sas-viya-programming". Because we used the "--rm" flag the container will be removed (completely destroyed) when we stop it.

Note we are explicitly mapping in the HTTP port (8081:80) so we easily know how to get to SAS Studio. If you want to start up another container on the same host, you will need to use a different port or else you'll get an "address already in use" error. Also note we might be interested in connecting directly to this CAS server from something other than SAS Studio (localhost) – a remote Python client, for example. We can use the other port we mapped (5570:5570) to connect to the CAS server.

Persist the data

Running this container with the above command means anything and everything done inside the container (configuration changes, code, data) will not persist if the container stops and a new one started later. Luckily this is a very standard and easy to solve scenario with Docker and Kubernetes. Here are a couple of targets inside the container you might be interested in mounting a volume to:

  • /tmp – this is where CAS_DISK_CACHE is by default, not to mention SASWORK. Those are scratch space used by the runtimes. If you are working with small data and don’t care too much about performance, no need to worry about this. But to optimize your container we would suggest mounting a Docker volume to this location (or, ideally, bind mount a high-performance storage device here). Note that generally Docker prefers us to use Docker volumes in lieu of bind mounts, but that is more for manageability, security, and portability than performance.
  • /data – this directory doesn't necessarily exist, but when you mount a volume into a container the target location will be created. So, you could call this target whatever you want, assuming it doesn't exist yet. Bind mounting is tempting here and OK to do, but consider the scenario when another user wants to run your container following instructions you provided them – better to use a Docker volume than force them to create the directory on the host. If you have an NFS location, bind mounting that makes sense.
  • /code – same spiel as with /data. Once you are in the container you can save your work and it will persist in the docker volume from run to run of your container.

Here is what an updated docker run command might look like with these volumes included (the /nfsdata line shows bind-mount syntax – mapping a host path – in place of a named Docker volume):

$ docker run \
--detach \
--rm \
--env CASENV_CAS_HOST=$(hostname -f) \
--env CASENV_CAS_VIRTUAL_PORT=8081 \
--volume mydata:/data \
--volume /nfsdata:/nfsdata \
--volume mycode:/code \
--volume sastmp:/tmp \
--publish 5570:5570 \
--publish 8081:80 \
--name sas-viya-programming \
--hostname sas-viya-programming \
sas-viya-programming:xxxxxx

Can I run this on my laptop?

Yes. You would just need to install Docker on your laptop (go to docker.com for that). You can certainly follow the instructions from the top to build and run locally. You can even push this container image out to an internal registry so other users could skip the build and just run.

So far, we have only talked about the “ad-hoc” or “sandbox” dev type of use case for this container. A later article may cover how to run in batch mode or maybe we will move straight to multi-containers & Kubernetes. In the meantime though, here is how to submit a .sas program as a batch job to this single container we have built.

Give it a try!

Try creating your own image and deploying a container. Feel free to comment on your experience.

More info:

SAS Communities Article- Running SAS Analytics in a Docker container
SAS Global Forum Paper- Docker Toolkit for Data Scientists – How to Start Doing Data Science in Minutes!
SAS Global Forum Tech Talk Video- Deploying and running SAS in Containers

Getting Started with SAS Containers was published on SAS Users.

Mar 6, 2019
 

In automated production (or business operations) environments, we often run SAS job flows in batch mode and on schedule. A SAS job flow is a collection of several inter-dependent SAS programs executed as a single process.

In my earlier posts, Running SAS programs in batch under Unix/Linux and Let SAS write batch scripts for you, I described how you can run SAS programs in batch mode by creating UNIX/Linux scripts that in turn incorporate other scripts invocations.

In this scenario you can run multiple SAS programs sequentially or in parallel, all while having a single root script kicked off on schedule. The whole SAS processing flow runs like a chain reaction.

Why and when to stop SAS batch flow process

However, sometimes we need to automatically stop and terminate that chain job flow execution if certain criteria are met (or not met) in a program of that process flow.
Let’s say the first job in our batch flow is a data preparation step (ETL) where we extract data tables from a database and prepare them for further processing. The rest of the batch process depends on successful completion of that critical first job. The process is kicked off at 3:00 a.m. daily; however, sometimes the database connection is unavailable, or the database itself has not finished refreshing, or something else happens, resulting in the ETL program completing with ERRORs.

This failure means that our data has not updated properly and there is no reason to continue running the remainder of the job flow process as it might lead to undesired or even disastrous consequences. In this situation we want to automatically terminate the flow execution and send an e-mail notification to the process owners and/or SAS administrators informing them about the mishap.

How to stop SAS batch flow process in UNIX/Linux

Suppose, we run the following main.sh script on UNIX/Linux:

#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
#2 run the rest of processing flow
/sas/code/processing/tail.sh

The etl.sh script runs the SAS ETL process as follows:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/etl/etl.sas"
logname="/sas/code/etl/etl_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

We want to run tail.sh shell script (which itself runs multiple other scripts) only if etl.sas program completes successfully, that is if SAS ETL process etl.sas that is run by etl.sh completes with no ERRORs or WARNINGs. Otherwise, we want to terminate the main.sh script and do not run the rest of the processing flow.

To do this, we re-write our main.sh script as:

 
#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
 
if [ $exitcode -eq 0 ]
   then
      #2 run the rest of processing flow
      /sas/code/processing/tail.sh
fi

In this code, we use a special shell script variable ($? for the Bourne and Korn shells, $STATUS for the C shell) to capture the exit status code of the previously executed OS command, /sas/code/etl/etl.sh:

exitcode=$?

Then the optional echo command just prints the captured value of that status for our information.

Every UNIX/Linux command executed by the shell script or user has an exit status represented by an integer number in the range of 0-255. The exit code of 0 means the command executed successfully without any errors; a non-zero value means the command was a failure.
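
You can see this behavior for yourself at the shell prompt (the exact non-zero value varies by command):

$ ls /no/such/directory
ls: cannot access /no/such/directory: No such file or directory
$ echo $?
2
$ echo "hello"
hello
$ echo $?
0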

The SAS System plays nicely with the UNIX/Linux operating system. When a SAS session terminates, it passes its own exit status code back to the shell (again, $? for the Bourne and Korn shells, and $STATUS for the C shell). A value of 0 indicates successful termination. For additional flexibility, SAS lets you control the exit status code with the ABORT statement. The possible conditions and their exit status codes are:

   Condition                                             Exit Status Code
   ---------------------------------------------------   ----------------
   All steps terminated normally                         0
   SAS issued WARNINGs                                   1
   SAS issued ERRORs                                     2
   User issued ABORT statement                           3
   User issued ABORT RETURN statement                    4
   User issued ABORT ABEND statement                     5
   SAS could not initialize because of a severe error    6
   User issued ABORT RETURN - n statement                n
   User issued ABORT ABEND - n statement                 n
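
For example, here is a minimal sketch (my own illustration, assuming a batch-mode session and a hypothetical staging table named work.extract) of a DATA step that forces an exit status of 2 when the extracted table comes back empty:

data _null_;
   /* nobs= is resolved at compile time, so the check runs before any rows are read */
   if nobs = 0 then abort return 2;   /* terminate SAS with exit status code 2 */
   set work.extract nobs=nobs;
   stop;
run;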

Since our etl.sh script executes SAS code etl.sas, the exit status code is passed by the SAS System to etl.sh and consequently to our main.sh shell script.

Then, in the main.sh script we check if that exit code equals to 0 and then and only then run the remaining flow by executing the tail.sh shell script. Otherwise, we skip tail.sh and exit from the main.sh script reaching its end.

Alternatively, the main.sh script can be implemented with an explicit exit as follows:

#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
 
if [ $exitcode -ne 0 ]
   then exit
fi
 
#2 run the rest of processing flow
/sas/code/processing/tail.sh

In this shell script code example, we check the exit return code value, and if it is NOT equal to 0, we explicitly terminate the main.sh shell script using the exit command, which gets us out of the script immediately without executing any subsequent commands. In this case, our #2 command invoking the tail.sh script never gets executed, which effectively stops the batch flow process.

If you also need to automatically send an e-mail notification to the designated people about the failed batch flow process, you can do it in a separate SAS job that runs right before the exit command. Then the if-statement will look something like this:

 
if [ $exitcode -ne 0 ]
   then
      # send an email and exit
      /sas/code/etl/email_etl_failure.sh
      exit
fi

That is, immediately after the email is sent, the shell script and the whole batch flow process get terminated by the exit command; no shell script commands beyond that if-statement will be executed.

A word of caution

Be extra careful if you use the special script variable $? directly in a script's logical expression, without assigning it to an interim variable. For example, you could use the following script command sequence:

/sas/code/etl/etl.sh
if [ $? -ne 0 ]
. . .

However, let’s say you insert another script command between them, for example:

/sas/code/etl/etl.sh
echo "Status=$? (0=SUCCESS,1=WARNING,2=ERROR)"
if [ $? -ne 0 ]
. . .

Then the $? variable in the if [ $? -ne 0 ] statement will hold the exit code of the previous echo command, not of the /sas/code/etl/etl.sh command as you might intend.

Hence, I suggest capturing the $? value in an interim variable (e.g. exitcode=$?) right after the command whose exit code you are going to inspect, and then referencing that interim variable (as $exitcode) in your subsequent script statements. That will save you from the trouble of inadvertently referring to the wrong exit code when you insert additional commands during script development.

Your thoughts

What do you think about this approach? Did you find this blog post useful? Did you ever need to terminate your batch job flow? How did you go about it? Please share with us.

How to conditionally terminate a SAS batch flow process in UNIX/Linux was published on SAS Users.

Feb 28, 2019
 

Across organizations of all types, massive amounts of information are stored in unstructured formats such as video, images, audio, and of course, text. Let’s talk more about text and natural language processing. We know that there is tremendous value buried in call center and chat dialogues, survey comments, product reviews, technical notes, legal contracts, and other sources where context is captured in words versus numbers. But how can we extract the signal we want amidst all the noise?

In this post, we will examine this problem using publicly available descriptions of side effects or adverse events that patients have reported following a vaccination. This Vaccine Adverse Event Reporting System (VAERS) is managed by the CDC and FDA. Among other objectives, these agencies use it to:

* Monitor increases in known adverse events and detect new or unusual vaccine adverse events

* Identify potential patient risk factors, including temporal, demographic, or geographic reporting clusters

Below is a view of the raw data. It contains a text field which holds freeform case notes, along with structured fields which contain the patient’s location, age, sex, date, vaccination details, and flags for serious outcomes such as hospitalization or death.

In this dashboard, notice how easily we can search for the keyword “seizure” to filter to patients who have reported this symptom in the comments. However, analysts need much more than just search. They need to be able to not only investigate all the symptoms an individual patient is experiencing, but also see what patterns are emerging in aggregate so they can detect systemic safety or process issues. To do this, we need to harvest the insights from the freeform text field, and for that we’ll use SAS Visual Text Analytics.

In this solution, we can do many types of text analysis – which you choose depends on the nature of the data and your goals. When we load the data into the solution, it first displays all the variables in the table and detects their types. We could profile the structured fields further to see summary statistics and determine if any data cleansing is appropriate, but for now let’s just build a quick text model for the SYMPTOM_TEXT variable.

After assigning this variable to the “Text” role, SAS Visual Text Analytics automatically builds a pipeline which we can use to string together analytic tasks. In this default pipeline, first we parse the data and identify key entities, and then the solution assigns a sentiment label to each document, discovers topics (i.e. themes) of interest, and categorizes the collection in a meaningful way. Each of these nodes is interactive.

In this post, we’ll show just a tiny piece of overall functionality – how to automatically extract custom entities and relationships using a combination of machine learning and linguistic rules. In the Concepts node, we provide several standard entities to use out of the box. For example, here are the automatic matches to the pre-defined “DATE” concept:

However, for this data, we’re interested in extracting something different – patient symptoms, and where on the body they occurred. Since neither open source Named Entity Recognition (NER) models nor SAS Pre-defined Concepts will do something as domain-specific as this out of the box, it’s up to us to define what we mean by a symptom or a body part under Custom Concepts.

For Body Parts, we started with a list of expected parts from medical dictionaries and subject matter experts. As I iterate through and inspect the results, I might see a keyword or phrase that I missed. In the upcoming version of SAS Visual Text Analytics, I will be able to simply highlight it and right click to add it to the rule set.
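
For the curious, a seeded keyword list like this is expressed in the Concepts node as simple CLASSIFIER rules. A hypothetical fragment of a bodyPart definition (the terms are my own illustration) might look like:

CLASSIFIER:arm
CLASSIFIER:left arm
CLASSIFIER:knee
CLASSIFIER:injection site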

We also will be adding a powerful new feature that applies machine learning to suggest additional rules for us. Note that this isn’t a simple thesaurus lookup! Instead, an algorithm is using the matches you’ve already told it are good, combined with the data itself, to learn the pattern you’re interested in. The suggested rules are placed in a new Sandbox area where you can test and evaluate them before adding them to your final definition.

We will also be able to auto-generate fact rules. This will help us pull out meaningful relationships between two entities and suggest a generalized pattern for modeling it. Here, we’ll have the machine determine the best relationship between Body Parts and Localized Symptoms, so that we can answer questions like, “where does it hurt?”, or “what body part was red (or itchy or swollen or tingly, etc.)?”. For this data, the tool suggested a rule which looks for a body part within 6 terms of a symptom, regardless of order, so long as both are contained in the same sentence.
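
As a sketch, a rule of that shape written in the LITI syntax used by the Concepts node – assuming custom concepts named bodyPart and localizedSymptom, which are my placeholder names – could read:

PREDICATE_RULE:(part,symp):(SENT,(DIST_6,(_part{bodyPart}),(_symp{localizedSymptom})))

Here the SENT operator restricts both matches to the same sentence, and DIST_6 requires them to occur within six terms of each other, in either order.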

Let’s apply just these few simple rules to our entire dataset and go back to the dashboard view. Looking at the results, we can now see much richer potential for finding insights in the data. I can easily select a single patient and see an entire list of his/her side effects alongside key details about the vaccination. I can also compare the most commonly reported symptoms by age group, gender, or geography, or see which body parts and symptoms may be predictors of a severe outcome like hospitalization or death.

Of course, there is much more we could do with this data. We could extract the name of the vaccine that was administered, the time to symptom onset, the duration of the symptoms, and other important information. However, even this simple example illustrates the technique and power of contextual extraction, and how it can enhance our ability to analyze large collections of complex data. Currently, concept rule generation is at the forefront of our research efforts, in its first experimental stages. This, along with the sandbox testing environment, will make it even faster and easier for analysts to do this work in SAS Visual Text Analytics. Here are a few other resources to check out if you want to dig in further.

Article: Reduce the cost-barrier of generating labeled text data for machine learning algorithms

Paper: Analyzing Text In-Stream and at the Edge

Automatically extracting key information from textual data was published on SAS Users.

Feb 27, 2019
 

In this post, we continue our discussion of geography variables, the foundation of Visual Analytics Geo maps. This time we will look at Custom Coordinates.  As with any statistical graph, understanding your data is key.  But when using Custom Coordinates for geographic maps, this understanding becomes even more important.

Use the Custom Coordinate geography variable when your data does not match one of VA’s predefined geography types (see previous post, Fundamentals of SAS Visual Analytics geo maps).  For Custom coordinates, your data set must include latitude and longitude values as separate variables.   These values should be sourced from trustworthy providers and validated for accuracy prior to loading into VA.

When using Custom Coordinates, the Coordinate Space must also be considered.  The coordinate space defines the grid used to plot your data.  The underlying map is also based on a grid.  In order for your data to display correctly on a map, these grids must match.  Visual Analytics uses the World Geodetic System (WGS84) as the default coordinate space (grid).  This will work for most scenarios, including the example below.

Once you have selected a dataset and confirmed it contains the required spatial information, you can now create a Custom Geography variable.  In this example, I am using the variable Business Address from the dataset Wake_Co_Pizza.  Let’s get started.

  1. Begin by opening VA and navigate to the Data panel on the left of the application.
  2. Select the dataset and locate the variable that you wish to map. Click the down arrow to the right of the variable and chose ‘Geography’ from the Classification dropdown menu.
  3. The ‘Edit Geography Item’ window appears. Select Custom coordinates in the ‘Geography data type’ dropdown.   Three new dropdown lists appear that are specific to the Custom coordinates data type: ‘Latitude (y)’, ‘Longitude (x)’ and ‘Coordinate Space’.

When using the Custom coordinates data type, we must tell VA where to find the spatial data in our dataset.  We do this using the Latitude (y) and Longitude (x) dropdown lists.  They contain all measures from your dataset.  In this example, the variable ‘Latitude World Geodetic System’ contains our latitude values and the variable  ‘Longitude World Geodetic System’ contains our longitude values.   The ‘Coordinate Space’ dropdown defaults to World Geodetic System (WGS84) and is the correct choice for this example.

  4. Click the OK button to complete the setup once the latitude and longitude variables have been selected from their respective dropdown lists. You should see a new ‘Geography’ section in the Data panel.  The name of the variable (or its edited value) will be displayed beside a globe icon to indicate it is a geography variable.  In this case we see the variable Business Address.

 

Congratulations!  You have now created a custom geography variable and are ready to display it on a map.  To do this, simply drag it from the Data panel and drop it on the report canvas.  The auto-map feature of VA will recognize it as a geography variable and display the data as a bubble map with an OpenStreetMap background.

In this post, we created a custom geography variable using the default Coordinate Space.  Using a custom geography variable gives you the flexibility of mapping data sets that contain valid latitude and longitude values.  Next time, we will take our exploration of the geography variable one step further and explore using custom polygons in your maps.

Using Custom Coordinates for map creation in SAS Visual Analytics was published on SAS Users.

Feb 22, 2019
 

Have you heard? SAS recently announced a new practical programming credential, SAS® Certified Specialist: Base Programming Using SAS® 9.4. This new practical programming credential is different than our previous exams. Now it requires you to take a performance-based exam, in which you access a SAS environment and then write and execute SAS code. Practical, right? At the end, your answers are scored for correctness.

This new SAS Certified Specialist credential will run in parallel with the current SAS Certified Base Programmer credential until June 2019. So make sure to check out the complete exam content guide for a list of objectives that are tested on the exam!

A new certification guide


SAS is also releasing an accompanying certification guide to help you prepare for the new programming credential: SAS® Certified Specialist Prep Guide: Base Programming Using SAS® 9.4. This new certification guide has been streamlined to remove redundancy throughout the chapters and has been aligned with the courses, SAS® Programming 1: Essentials and SAS® Programming 2: Data Manipulation Techniques, along with the exam content guide.

The new certification guide also includes a workbook that provides programming scenarios to help you prepare for the performance-based portion of the exam. Make sure to also review the SAS® 9.4 Base Programming Exam Experience Tutorial to help you prepare!

Here are some of the changes made to the certification guide:

    • Removal of INFILE/INPUT statements. These statements are no longer taught in SAS® Programming 1 and SAS® Programming 2 courses or tested in the exam. Thus, these statements have been replaced with the SET statement and the IMPORT procedure.
    • Removal of arrays. Arrays are no longer taught in SAS® Programming 1 and SAS® Programming 2 courses or tested in the exam.
    • Addition of the TRANSPOSE and EXPORT procedures, as well as macro variables. These additions are tested and are taught in the SAS® Programming 1 and SAS® Programming 2 courses.
    • Updating of existing examples and the addition of more annotated examples that are easier to follow and have better quality graphics. These changes help to improve the readability of examples, and there is a closer relationship between the sample questions in the book and the questions that appear on the actual exam.
    • Installation of sample data, and the way you work with SAS libraries has been simplified.
    • A new companion piece, the Quick Syntax Reference Guide, is also available for download.

More opportunities to test your skills

For even more programming practice check out Ron Cody’s Learning SAS by Example. This is full of practical examples, and exercises with solutions which allow you to test your programming skills for the exam.

Not ready for the certification exam just yet? You can also browse our books for getting started with SAS. There you will find, of course, The Little SAS Book: A Primer, Fifth Edition and Exercises and Projects for The Little SAS® Book, Fifth Edition. Both are great guides for new users who want to learn SAS and practice before taking the new Base Programming Specialist exam!

Thinking about getting SAS® certified? Check out the new SAS certification guide! was published on SAS Users.

Feb 15, 2019
 
Beginning with SAS® 9.4, you can embed graphics output within HTML output using the ODS HTML5 destination. This technique works with SAS/GRAPH® procedures (such as GPLOT and GCHART), SG procedures (such as SGPLOT and SGRENDER), and when you create graphics output with ODS Graphics enabled. Most (if not all) existing web browsers support graphics output embedded in HTML5 output.

Note: The default graphics output format for the ODS HTML5 destination is Scalable Vector Graphics (SVG). SVG documents display clearly at any size in any viewer or browser that supports SVG. So, SVG files are ideal for display on a computer monitor, PDA, or cell phone, as well as in printed documents. Because SVG is a vector format, a single SVG document can be rendered at any screen resolution without compromising the clarity of the document. Here's an example:

The same SVG graph, scaled at 90% and then at 200%. But 100% crisp!

SAS/GRAPH procedures

When you use the ODS HTML5 destination with a SAS/GRAPH procedure, specify a value of SVG, PNG, or JPEG for the DEVICE option in the GOPTIONS statement. The following sample PROC GPLOT code embeds SVG graphics inside the resulting HTML output:

goptions device=svg;
ods _all_ close;  
ods html5 path="c:\temp" file="svg_graph.html"; 
symbol1 i=none v=squarefilled; 
proc gplot data=sashelp.cars; 
  plot mpg_city * horsepower;   
  where make="Honda"; 
run;
quit;  
ods html5 close; 
ods preferences;

Note that the ODS PREFERENCES statement above resets the ODS environment back to its default settings when you use the SAS windowing environment.

When you use the PNG or JPEG device driver with the ODS HTML5 destination, add the BITMAP_MODE="INLINE" option to the ODS HTML5 statement. Here is an example:

goptions device=png;
ods _all_ close; 
ods html5 path="c:\temp" file="png_graph.html" options(bitmap_mode="inline");
symbol1 i=none v=squarefilled; 
proc gplot data=sashelp.cars; 
  plot mpg_city * horsepower;   
  where make="Honda"; 
run;
quit;  
ods html5 close; 
ods preferences;

ODS Graphics and SG procedures

When you use SG procedures and ODS Graphics, specify a value of SVG, PNG, or JPEG for the OUTPUTFMT option in the ODS GRAPHICS statement. The following sample code uses PROC SGPLOT to embed SVG graphics inside the HTML output with the ODS HTML5 destination:

ods _all_ close; 
ods html5 path="c:\temp" file="svg_graph.html"; 
ods graphics on / reset=all outputfmt=svg;
proc sgplot data=sashelp.cars; 
  scatter y=mpg_city x=horsepower / markerattrs=(size=9PT symbol=squarefilled);   
  where make="Honda"; 
run;
ods html5 close; 
ods preferences;  

The following sample code uses PROC SGPLOT to embed PNG graphics inside the HTML output with the ODS HTML5 destination:

ods _all_ close; 
ods html5 path="c:\temp" file="png_graph.html" options(bitmap_mode="inline");
ods graphics on / reset=all outputfmt=png;
proc sgplot data=sashelp.cars; 
  scatter y=mpg_city x=horsepower / markerattrs=(size=9PT symbol=squarefilled);   
  where make="Honda"; 
run;
ods html5 close; 
ods preferences; 

The technique above also works when you use the ODS GRAPHICS ON statement with other procedures that produce graphics output (such as the LIFETEST procedure).

Note that the ODS HTML5 destination supports the SAS Graphics Accelerator. The SAS Graphics Accelerator enables users with visual impairments or blindness to create, explore, and share data visualizations. It supports alternative presentations of data visualizations that include enhanced visual rendering, text descriptions, tabular data, and interactive sonification. Sonification uses non-speech audio to convey important information about the graph.

You can use the ODS HTML5 destination in most situations where you need to embed all of your output into a single HTML output location. For example, when you email HTML output as an attachment or when you create graphics output via a SAS stored process. If you currently use the ODS HTML destination, you might want to experiment with the ODS HTML5 destination to see whether it meets your needs even if you cannot completely switch to it yet.

Embed scalable graphics using the ODS HTML5 destination was published on SAS Users.

Feb 12, 2019
 

Multi-tenancy is one of the exciting new capabilities of SAS Viya. Because it is so new, there is quite a lot of misinformation going around about it. I would like to offer you five key things to know about multi-tenancy before implementing a project using this new paradigm.

All tenants share one SAS Viya deployment

Just as apartment units exist within a larger, common building, all tenants, including the provider, exist within one, single SAS Viya deployment. Tenants share some SAS Viya resources such as the physical machines, most microservices, and possibly the SAS Infrastructure Data Server. Other SAS Viya resources are duplicated per tenant such as the CAS server and compute launcher. Regardless, the key point here is that because there is one SAS Viya deployment, there is one, and only one, SAS license that applies to all tenants. Adding a new tenant to a multi-tenant deployment could have licensing ramifications depending upon how the CAS server resources are allocated.

Decision to use multi-tenancy must be made at deployment time

Many people, myself included, are not very comfortable with commitment. Making a decision that cannot be changed is something we avoid. Deciding whether your SAS Viya deployment supports multi-tenancy cannot be put off for later.

This decision must be made at the time the software is deployed. There is currently no way to convert a multi-tenant deployment to a single-tenant deployment or vice versa short of redeployment, so choose wisely. As with marriage, the decision to go single-tenant or multi-tenant should not be taken lightly and there are benefits to each configuration that should be considered.

Each tenant is accessed by separate login

Let’s return to our apartment analogy. Just as each apartment owner has a separate key that opens only the apartment unit they lease, SAS Viya requires users to log on (authenticate) to a specific tenant space before allowing them access.

SAS Viya facilitates this by accessing each tenant by way of a separate sub-domain address. As shown in the diagram below, a user wishing to use the Acme tenant must access the deployment with a URL of acme.viya.sas.com while a GELCorp user would use a URL of gelcorp.viya.sas.com.

This helps create total separation of tenant access and allows administrators to define and restrict user access for each tenant. It does, however, mean that each tenant space is authenticated individually and there is no notion of single sign-on between tenants.

No content is visible between tenants

You will notice in both images above that there are brick walls between each of the tenants. This is to illustrate how tenants are completely separated from one another. One tenant cannot see any other tenant’s content, data, users, groups or even that other tenants exist in the system.

One common scenario for multi-tenancy is to keep business units within a single corporation separated. For example, we could set up Sales as a tenant, Finance as a tenant, and Human Resources as a tenant. This works very well if we want to truly segregate the departments' work. But what happens when Sales wants to share a report with Finance or Finance wants to publish a report for the entire company to view?

There are two options for this situation:
• We could export content from one tenant and import it into the other tenant(s). For example, we would export a report from the Sales tenant and import it into the Finance tenant, assuming that data the report needs is available to both. But now we have the report (and data) in two places and if Sales updates the report we must repeat the export/import process.
• We could set up a separate tenant at the company level for shared content. Because identities are not shared between tenants, this would require users to log off the departmental tenant and log on to the corporate tenant to see shared reports.

There are pros and cons to using multi-tenancy for departmental separation and the user experience must be considered.

Higher administrative burden

Managing and maintaining a multi-tenant deployment is more complex than taking care of a single-tenant deployment. Multi-tenancy requires additional CAS servers, additional microservices, possibly additional machines, and multiple administrative personas. The additional resources can complicate backup strategies, authorization models, operating system security, and resource management of shared resources.

There are also more levels of administration, which require an administrator persona for the provider of the environment and separate administrator personas for each tenant. Each of these administration personas has a different scope regarding which aspects of the deployment it can interact with. For example, the provider administrator can see all system resources, all system activity, logs and tenants, but cannot see any tenant content.

Tenant administrators can only see and interact with dedicated tenant resources such as their CAS server and can also manage all tenant content. They cannot, however, see system resources, other tenants, or logs.

Therefore, coordinating management of a complete multi-tenant deployment will require multiple administration personas, careful design of operating system group membership to protect and maintain sufficient access to files and processes, and possibly multiple logins to accomplish administrative tasks.

Now what?

I have pointed out a handful of key concepts that differ between the usual single-tenant deployments and what you can expect with a multi-tenant deployment of SAS Viya. I am obviously just scratching the surface on these topics. Here are a couple of other resources to check out if you want to dig in further.

Documentation: Multi-tenancy: Concepts
Article: Get ready! SAS Viya 3.4 highlights for the Technical Architect

5 things to know about multi-tenancy was published on SAS Users.