
July 3, 2019
 

One of my favorite parts of summer is a relaxing weekend by the pool. Summer is the time I get to finally catch up on my reading list, which has been building over the year. So, if expanding your knowledge is a goal of yours this summer, SAS Press has a shelf full of new titles for you to explore. To help you navigate your selection, we asked some of our authors what SAS books were on their reading lists for this summer.

Teresa Jade


Teresa Jade, co-author of SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, has already started The DS2 Procedure: SAS Programming Methods at Work by Peter Eberhardt. Teresa reports that the book “is a concise, well-written book with good examples. If you know a little bit about the SAS DATA step, then you can leverage what you know to more quickly get up to speed with DS2 and understand the differences and benefits.”
 
 
 

Derek Morgan

Derek Morgan, author of The Essential Guide to SAS® Dates and Times, Second Edition, tells us his go-to books this summer are Art Carpenter’s Complete Guide to the SAS® REPORT Procedure and Kirk Lafler's PROC SQL: Beyond the Basics Using SAS®, Third Edition. He also notes that he “learned how to use hash objects from Don Henderson’s Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study.”
 

Chris Holland

Chris Holland, co-author of Implementing CDISC Using SAS®: An End-to-End Guide, Revised Second Edition, recommends Richard Zink’s JMP and SAS book, Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP® and SAS®, which describes how to improve efficiency while reducing costs in trials with centralized monitoring techniques.
 
 
 
 
 

And our recommendations this summer?

Download our two new free e-books which illustrate the features and capabilities of SAS® Viya®, and SAS® Visual Analytics on SAS® Viya®.

Want to be notified when new books become available? Sign up to receive information about new books delivered right to your inbox.

Summer reading – Book recommendations from SAS Press authors was published on SAS Users.

June 26, 2019
 

"There's a way to do it better - find it." - Thomas A. Edison

Finding better SAS code

When it comes to SAS coding, this quote by Thomas A. Edison is my best advisor. Time permitting, I love finding better ways of implementing SAS code.

But what code feature means “better” – brevity, clarity or efficiency? It all depends on the purpose of your code. When code is to illustrate a coding concept or technique, clarity is paramount. However, when processing large data volumes in near real-time, code efficiency becomes critical, not just a luxury or convenience. And brevity won’t hurt in either case. Ideally, your code should be a combination of all three features - brevity, clarity and efficiency.

Parsing a character string

In this blog post we will solve the problem of parsing a character string to find the position of the n-th occurrence of a group of characters (a substring) in that string.

The closest out-of-the-box solution to this problem is the SAS FIND() function, except that this function searches only for the single, first instance of a specified substring within a character string. Close enough; with some do-looping we can easily construct what we want.

After some internet and soul searching to find the Nth occurrence of a substring within a string, I came up with the following DATA STEP code snippet:

   p = 0;
   do i=1 to n until(p=0); 
      p = find(s, x, p+1);
   end;

Here, s is a text string (character variable) to be parsed; x is a character variable holding a group of characters that we are searching for within s; p is a position of x value found within s; n is an instance number.

If there is no n-th instance of x within s found, then the code returns p=0.

In this code, each do-loop iteration searches for x within s starting from position p+1, where p is the position found in the prior iteration: p = find(s,x,p+1);.

Notice that if there are fewer than n instances of x within s, the do-loop ends prematurely, based on the until(p=0) condition, thus cutting the number of iterations to the minimum necessary.

Reverse string search

Since the FIND() function allows for a string search in a reverse direction (from right to left) by making the third argument negative, the above code snippet can be easily modified to do just that: find the Nth instance (from right to left) of a group of characters within a string. Here is how you can do that:

   p = length(s) + 1;
   do i=1 to n until(p=0); 
      p = find(s, x, -p+1);
   end;

The difference here is that we start from position length(s)+1 instead of 0, and each iteration searches for substring x within string s starting from position -(p-1) = -p+1, that is, from right to left.

Testing SAS code

You can run the following SAS code to test and see how these searches work:

data a;
   s='AB bhdf +BA s Ab fs ABC Nfm AB ';
   x='AB';
   n=3;
 
   /* from left to right */
   p = 0;
   do i=1 to n until(p=0); 
      p = find(s, x, p+1);
   end;
   put p=;
 
   /* from right to left */
   p = length(s) + 1;
   do i=1 to n until(p=0); 
      p = find(s, x, -p+1);
   end;
   put p=;
run;

FINDNTH() function

We can also combine the above left-to-right and right-to-left searches into a single user-defined SAS function by means of SAS Function Compiler (PROC FCMP) procedure:

proc fcmp outlib=sasuser.functions.findnth;
   function findnth(str $, sub $, n);
      p = ifn(n>=0,0,length(str)+1);
      do i=1 to abs(n) until(p=0);
         p = find(str,sub,sign(n)*p+1);
      end;
      return (p);
   endsub;
run;

We conveniently named it findnth() to match the Tableau FINDNTH(string, substring, occurrence) function that returns the position of the nth occurrence of substring within the specified string, where the occurrence argument defines n.

Except that our findnth() function allows for both a positive (for left-to-right searches) and a negative (for right-to-left searches) third argument, while Tableau’s function only allows for left-to-right searches.

Here is an example of the findnth() function usage:

options cmplib=sasuser.functions;
data a;
   s='AB bhdf +BA s Ab fs ABC Nfm AB ';
   x='AB';
   n=3;
 
   /* from left to right */
   p=findnth(s,x,n);
   put p=;
 
   /* from right to left */
   p=findnth(s,x,-n);
   put p=;
run;

Using Perl regular expression

As an alternative solution, I also implemented SAS code for finding the n-th occurrence of a substring within a string using a Perl regular expression (regex, or PRX):

data a;
   s='AB bhdf +BA s Ab fs ABC Nfm AB ';
   x='AB';
   n=3;
 
   /* using regex */
   xid = prxparse('/'||x||'/o');
   p = 0;
   do i=1 to n until(p=0);
      from = p + 1;
      call prxnext(xid, from, length(s), s, p, len);
   end;
   put p=;
run;

However, efficiency benchmarking tests demonstrated that the above solutions using the FIND() function or the FINDNTH() SAS user-written function run roughly twice as fast as this regex solution.
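If you want to reproduce this kind of comparison yourself, here is a minimal sketch; the test table and the one-million-row count are arbitrary choices of mine, and you read the timings from the real time values that the SAS log reports for the two DATA _NULL_ steps:

options cmplib=sasuser.functions;
 
/* hypothetical test table: one million copies of the sample string */
data test;
   s='AB bhdf +BA s Ab fs ABC Nfm AB ';
   x='AB';
   do id=1 to 1e6;
      output;
   end;
run;
 
/* FIND()-based search via the findnth() function defined above */
data _null_;
   set test;
   p = findnth(s, x, 3);
run;
 
/* PRX-based search, as in the regex example above */
data _null_;
   set test;
   xid = prxparse('/'||x||'/o');
   p = 0;
   do i=1 to 3 until(p=0);
      from = p + 1;
      call prxnext(xid, from, length(s), s, p, len);
   end;
run;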

Challenge

Can you come up with an even better solution to the problem of finding the Nth instance of a substring within a string? Please share your thoughts and solutions with us. Thomas A. Edison would have been proud of you!

Finding n-th instance of a substring within a string was published on SAS Users.

June 18, 2019
 

What is Item Response Theory?

Item Response Theory (IRT) is a way to analyze responses to tests or questionnaires with the goal of improving measurement accuracy and reliability.

A common application is in testing a student’s ability or knowledge. Today, all major psychological and educational tests are built using IRT. The methodology can significantly improve measurement accuracy and reliability while providing potentially significant reductions in assessment time and effort, especially via computerized adaptive testing. For example, the SAT and GRE both use Item Response Theory for their tests. IRT takes into account both the number of questions answered correctly and the difficulty of each question.

In recent years, IRT models have also become increasingly popular in health behavior, quality of life, and clinical research. There are many different models for IRT. Three of the most popular are:

The Rasch model

Two-parameter model

Graded Response model

Early IRT models (such as the Rasch model and two-parameter model) concentrate mainly on dichotomous responses. These models were later extended to incorporate other formats, such as ordinal responses, rating scales, partial credit scoring, and multiple category scoring.

Item Response Theory Models Using SAS

Ron Cody and Jeffrey K. Smith’s book, Test Scoring and Analysis Using SAS, uses SAS PROC IRT to show how to develop your own multiple-choice tests, score students, produce student rosters (in print form or Excel), and explore item response theory (IRT).

Aimed at non-statisticians working in education or training, the book describes item analysis and test reliability in easy-to-understand terms and teaches SAS programming to score tests, perform item analysis, and estimate reliability.
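To give you a feel for the procedure itself, here is a minimal sketch of a PROC IRT call; the data set name and item variables are hypothetical, and the full syntax is covered in the documentation listed under Further resources below:

proc irt data=scored_items;
   var q1-q10;                     /* binary (0/1) scored item responses */
   model q1-q10 / resfunc=twop;    /* fit a two-parameter logistic model */
run;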

For those with a more statistical background, Bayesian Analysis of Item Response Theory Models Using SAS describes how to estimate and check IRT models using the SAS MCMC procedure. Written especially for psychometricians, scale developers, and practitioners, the book provides numerous annotated programs that you can easily modify for your own applications.

Assessment has played, and continues to play, an integral part in our work and educational settings. IRT models continue to be increasingly popular in many other fields, such as medical research, health sciences, quality-of-life research, and even marketing research. With the use of IRT models, you can not only improve scoring accuracy but also economize test administration by adaptively using only the discriminative items.

Interested in learning more? Check out our chapter previews available for free. Want to learn more about SAS Press? Explore our online bookstore and subscribe to our newsletter to get all the latest discounts, news, and more.

Further resources

SAS Blogs:
New at SAS: Psychometric Testing by Charu Shankar
SAS author’s tip: Bayesian analysis of item response theory models

SAS Communities:
SAS Communities: Custom Task Tuesday: SAS Global Forum/PROC IRT Edition!

SAS Global Forum Paper:
Item Response Theory: What It Is and How You Can Use the IRT Procedure to Apply It by Xinming An and Yiu-Fai Yung

SAS Documentation:
The IRT Procedure
SAS/STAT 14.1 User Guide: The IRT Procedure
SAS/STAT 14.2 User Guide: Help Center

Understanding Item Response Theory with SAS was published on SAS Users.

June 11, 2019
 

This article is a follow-on to a recent post from Jeff Owens, Getting started with SAS Containers. In that post, Jeff discussed building and running a single container for a SAS Viya runtime/IDE. Today we will go through how to build and run the full SAS Viya stack - visual components and all - in Kubernetes. Step 1 is building the container images and Step 2 is running the containers. For both steps, you can go to the sas-container-recipes GitHub repo for more detail and to obtain the tools needed to accomplish this task. An in-depth guide and more information is located on the wiki page in the repository.

The project development team at SAS has done an incredible job of making this new and intuitive way to dynamically create large collections of containers easy and foolproof, despite my long-winded explanation...

Building the Container Images

Keeping with the recipes theme, we are going to need to prepare a few ingredients to make this work. Of course, you will need a valid SAS_Viya_deployment_data.zip file containing your ordered products.

Build Machine

First, you need a Build Machine. This can be a lightweight server, but it needs to be running Linux. The build machine in this example is 2cpu x 8GB RAM, running RHEL 7.6. Hint – 2 cores is the minimum but the more you use for the build the better (faster). I have installed Docker version 18.09.5 here and I have a 100GB volume attached to my docker root (by default this is /var/lib/docker but you can easily change the location in your /etc/docker/daemon.json file).

You can review full system requirements in the GitHub repository here. This article covers the "multiple" or "full" deployment types so focus on that column in the table.
This build machine is going to execute the build script, which builds each one of your containers, pushes them to your Docker registry, and creates the corresponding Kubernetes manifest files needed to launch your deployment.

Make sure you have cloned the sas-container-recipes repository to this machine.

Docker Registry

You will need access to a Docker registry. Your build machine must be able to push images into it, and your Kubernetes machines must be able to pull images from it. Prior to building, make sure you run the docker login myregistry.com command using the build uid. This docker login ensures a file is present at $HOME/.docker/config.json. This is a requirement whether or not you secure the registry with a form of authentication. Note, if your registry does not respond to pings you will need to add the --skip-docker-url-validation parameter to the build command.

Mirror Repo (Optional)

Similar to the single-container build, it is a good idea to create a mirror repository to host your SAS RPMs. A local mirror gives you consistent performance during installation and a consistent build. However, if your containers are able to connect to ses.sas.download then you can skip the mirror step. Beware of the network implications and the fluid nature of these repos.

LDAP

Just like any other SAS Viya environment, all users/groups/authentication/authorization are managed by connecting to an external LDAP. This could be a quick-and-dirty OpenLDAP server we stand up ourselves, or a corporate Active Directory server. Regardless, we will have to be able to make this connection if we want to use SAS Viya's visual interfaces. The easiest and best way to handle this connection is with a sitedefault.yml file. Below is a sample sitedefault.yml that would hypothetically connect to host.com's corporate LDAP. You need to construct your own sitedefault file using values for your LDAP. Consult SAS documentation (linked above) for further information.

config:
    application:
        sas.logon.initial.password: sasboot
        sas.identities.providers.ldap.connection:
            host: myldap.host.com
            port: 389
            userDN: 'CN=ldapadmin,DC=host,DC=com'
            password: ldappassword
        sas.identities.providers.ldap.group:
            baseDN: OU=Groups,DC=host,DC=com
        sas.identities.providers.ldap.user:
            baseDN: DC=host,DC=com
        sas.identities:
            administrator: youruserid

Additionally, we will need to make sure a few of our containers have "host integration" with this same LDAP (specifically, the CAS container and the programming container). The way we do that is with a standard sssd.conf file. You should hopefully be able to track down a valid sssd.conf file for your site from an administrator. Hint – it may be necessary to add homedir (/home/%u) and default shell (/bin/bash) overrides to this file depending on your LDAP configuration.

The way one would apply these two files here is:

  1. place sssd.conf in the addons/auth-sssd directory and include the --addons "addons/auth-sssd" option when you run build.sh, as we do in the example later.
  2. place sitedefault.yml in the top level of sas-container-recipes. If the recipe sees a sitedefault.yml file here, it will base64 encode it and embed it as a value in the consul.yml config map. If you didn't do this beforehand, you can add your sitedefault.yml file later with the optional post-build step below, which is necessary only if you did not include sitedefault.yml pre-build.
    cat sitedefault.yml | base64 --wrap=0

    Next, copy and paste the output into your consul.yml configmap (by default you can find this in builds/full/manifests/kubernetes/configmaps/consul.yml). You want to add a new key/value similar to the following:

    consul_key_value_data_enc: Y29uZmlnOgogICAgYXBwbGlj......XNvZW1zaXRlLERDPWNvbQo=

Ingress

Ingress is a crucial component to make this come together because the only way to access your SAS Viya environment is through your Ingress. The recipe gives us an Ingress resource (one of the generated Kubernetes manifests files); however, an Ingress resource is simply an internal HTTP routing rule. We will need to make sure we have manually installed a valid Ingress controller inside of our Kubernetes environment which can be a little tricky if you are new to Kubernetes. The Ingress controller reads and applies routing rules (Ingress resources) such as the ones created by the recipes.

Traefik and NGINX are the two most popular industry options. Or you might use the native Ingress controllers offered by AWS, Azure, or GCP if you are running your Kubernetes cluster in the cloud. But to reiterate, you will need an Ingress controller up and running.

Once your Ingress controller is up, you need to edit the provided manifests_usermods.yml. You should set SAS_K8S_INGRESS_DOMAIN to be the DNS name that resolves to a Kubernetes node that can reach your Ingress controller. And while you have this file open, you can also set a unique name for the Kubernetes namespace into which you want these resources deployed (the default is "sas-viya"). This manifests_usermods.yml file is available in the util/ directory, so if you are going to use it, first make a copy of the file in the top-level sas-container-recipes directory and edit it there.
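As a rough illustration only (the file documents its own keys and defaults, so treat the names and values below as placeholders rather than definitive settings), an edited copy might contain entries along these lines:

SAS_K8S_NAMESPACE: sas-viya-prod
SAS_K8S_INGRESS_DOMAIN: viya.mycompany.com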

Kubernetes namespace

Build.sh

With all this in place we are ready to build. To summarize, the “pre-build” config needed here are the files we touched in this sas-container-recipes project:

Relevant pre-build files

So, we can go ahead and launch the build script. I prefer using environment variables for easier readability along with copying and pasting when things change - new registries, mirrors, tags, etc.

SAS_VIYA_DEPLOYMENT_DATA_ZIP=/path/to/SAS_Viya_deployment_data.zip
MIRROR_URL=mymirror.com/myrepo #optional
DOCKER_REGISTRY_URL=myregistry.com
SAS_RECIPE_TYPE=full
DOCKER_REGISTRY_NAMESPACE=viya
SAS_DOCKER_TAG=prod
 
./build.sh --type $SAS_RECIPE_TYPE \
--mirror-url $MIRROR_URL \
--docker-registry-url $DOCKER_REGISTRY_URL \
--docker-registry-namespace $DOCKER_REGISTRY_NAMESPACE \
--zip $SAS_VIYA_DEPLOYMENT_DATA_ZIP \
--tag $SAS_DOCKER_TAG \
--addons "addons/auth-sssd"

Once complete:

  1. We store container images (30-40 of them depending on the software you have ordered) locally in the build host's docker images directory.
  2. All these images also are tagged and pushed to our Docker Registry. For your organizational reference, the naming convention used is:
    $DOCKER_REGISTRY_URL/$DOCKER_REGISTRY_NAMESPACE/<image-name>:$SAS_DOCKER_TAG
  3. All our Kubernetes manifests files are available on the build machine in sas-container-recipes/builds/full/manifests/kubernetes. These fully configured manifest files are ready to use. They reference the images we have built and pushed.
  4. The build log gives us instructions for how to apply these resources to Kubernetes. These are simple commands you should be able to copy and paste to stand up your Viya environment.

Build log instructions

For the curious
The list below is what happened during the build process. Feel free to skip this section; you do not need to know how any of this works to use the recipes:

  1. You, the builder, invoke build.sh. This is a wrapper script around the greater build framework. This script creates a "builder container." Check out the Dockerfile in the top level of the recipes directory. This builder container builds from a golang base image because the build process is written in a few Go files (new as of April 2019). Several files from the sas-container-recipes project are copied into this container, including said Go files.
    • Note, we did not have to install Go on our build machine since Go is running inside a container.
    • If you are interested in seeing what the builder container looks like, you can run this command: docker run -it --rm --entrypoint /bin/bash sas-container-recipes-builder:$SAS_DOCKER_TAG.
    • A 'sas' user is created inside of this container - this user has the same uid as the user who invoked build.sh on the host.
  2. build.sh also created a new subdirectory on the host called 'builds/<buildtype>-<timestamp>'. This will contain logs, manifests, and various templates used during this specific build.
  3. build.sh then runs that builder container and the real work gets underway. The entry point for the builder is:  go run main.go container.go order.go.  All those arguments you specified when invoking build.sh pass right into this Go program.  Also, the newly created "builds" directory mounts into the container at /sas-container-recipes/builds.
    • The host's /var/run/docker.sock file mounts into this container - this allows the builder container to run docker (docker in docker)
  4. This Go program then:
    • Generates a playbook from your deployment data file (SOE zip) using the sas-orchestration tool (https://support.sas.com/en/documentation/install-center/viya/deployment-tools/34/command-line-interface.html).
    • Creates Kubernetes manifests for the images set to build.
    • Gathers sets of Ansible roles to install in each container, based on the entitlement of your software order.
    • Generates a Dockerfile for each container, where each applicable Ansible role installs in a new Docker layer
    • Creates a "build context" for each container with the generated Dockerfile and the Ansible role files.
    • Starts a docker build process for each container. The Dockerfile installs ansible and executes the playbook "locally" (inside of each container).
    • Pushes these images into your registry as each build finishes.
    • Note, this happens inside of containers, and the builds execute concurrently. Recall this build machine has 2 cores, so only 2 containers build at a time and it took several hours.  If we used a 16-core machine, this whole build would go faster.  In another terminal, look at docker stats during the build.  Another significant “performance” impact is the network bandwidth between your build machine and your registry.

Running the Containers

We are going to run these containers inside of a Kubernetes environment. Here are the finishing touches needed to give us a completely containerized SAS Viya environment running in Kubernetes. Note that by default this deploys into a new namespace inside of your Kubernetes cluster and isolates the resources from anything else running.

Kubernetes Environment

Since we built the full stack, we'll need to make sure we have sufficient resources to run all of these containers at the same time. We'll need a minimum of 8 cores and 80GB RAM available. Remember CAS is a multithreaded, in-memory runtime, so the more cores and RAM you provide, the more horsepower you'll have for doing actual analytical work with SAS and CAS.

Kubectl

Hopefully, if you've gotten this far you are familiar with kubectl, which is the client tool/interface used with a Kubernetes cluster. Consider it a CLI wrapper around the Kubernetes API. But for thoroughness, you will launch your SAS Viya deployment from whichever machine you are running kubectl on. If this happens to be the same machine you built on, then you can stay inside of the sas-container-recipes directory you started in, and copy and paste those kubectl apply -f... commands. Or you can copy your manifest files somewhere else and modify those commands accordingly. In either instance, once those commands run, your environment is up, and you should be able to access SAS Environment Manager and other SAS web apps. If you added your userid as an administrator in the sitedefault.yml file, then you can log in as yourself with admin access.
Apply the manifests:

Apply the manifests
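The exact commands are printed at the end of your own build log; as a rough sketch (using the manifests location noted above and the default "sas-viya" namespace), they boil down to applying the generated manifests and then checking on the pods:

# Illustration only -- copy the exact apply sequence from your build log
kubectl apply -f builds/full/manifests/kubernetes/ -R -n sas-viya
 
# Check on the pods as they start
kubectl get pods -n sas-viya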

And after a few minutes your pods should be up (the first time takes the longest since images must be pulled). Note that a pod showing as Running doesn't mean all your SAS Viya services are running. It may take up to 30 minutes for all services to be up and stabilized.

Pods list

With your Ingress and DNS rules set up correctly, you should be able to reach your environment:

SAS login screen

Based on properly configured sitedefault.yml and sssd.conf files, you should be able to log in as an LDAP user.

Miscellaneous Notes

Scaling

Once your SAS Viya environment is up and running in Kubernetes, the following kubectl command adds CAS worker nodes to scale out the capacity of our CAS server.

kubectl scale deployment sas-viya-cas-worker --replicas=5 -n sas-viya-prod

Note, there isn’t any value in adding any more workers than you have physical nodes in your cluster.

Performance

SAS is a powerful programming language designed to handle heavy workloads on large data. General hardware performance has historically been a chief concern to customers implementing SAS. Containers bring a whole new wrinkle to the concept of performance given the general notion of hardware abstraction. One performance-related question is: how can we ever guarantee the IO provided by the underlying filesystem (SASWORK, CAS_DISK_CACHE)? As with Kubernetes and storage/state in general, no easy answer exists. It falls to the Kubernetes operator to make high-performance filesystems (e.g., local SSD) available on all nodes a SAS programming or CAS container might land on, and to manually edit the corresponding manifest files to leverage those host disks. Alternatively, we can try to limit the burden on these scratch disk spaces. For CAS, this means ensuring we have more RAM available than data in use.

Amnesia

See the summary section below for a caveat about this deployment methodology – this is not quite a complete implementation for “production” types of environments, at least not without understanding the customer's configuration requirements. You should have a discussion with your sales team about some of these details. But please be aware that building and deploying as we did here leaves us with an “Amnesiac Viya” (a useful term coined by an astute SAS employee). That is, there is no state here. If and when you take your environment down, or scale pods to 0 across services, this will yield a "brand new" or "fresh" environment once brought back up. The good news is this also means if we run into any issues, we can easily delete the whole namespace and restart. If you want to persist any user data, config, reports, code, etc., you will have to manually attach storage to a few locations.

Full vs Multiple

Note, here we used SAS_DEPLOYMENT_TYPE=full. This built the entire Viya stack, visual interfaces, microservices and all. Alternatively, if we set the deployment type to "multiple" we get three container images – programming, httpproxy, and cas. This would be all we need if we wanted to write SAS code, whether we wanted to use SAS Studio or an external IDE like Jupyter. And we could still scale out our CAS cluster the same way as we did in our full environment.

Summary

Just like everyone else's, the SAS container strategy is quickly evolving. SAS Viya, as a scalable, highly available, services-oriented architecture, is a perfect fit to run in containers inside of the Kubernetes orchestration framework. Kubernetes brings tremendous operational benefits to the table for this type of software: smoother deployments, higher uptime, instant scale, and much more efficient hardware usage, to name a few.

As you will see in the build log when running the recipe, this is an "EXPERIMENTAL" deployment process. The recipes are an excellent way to get your hands on a Kubernetes version of SAS Viya early. Future releases of SAS Viya will be fully "containerized" and "kubernetes-ized" so customers won’t be building their own containers in this manner. Rather, SAS will provide a Helm chart to customers that will pull container images straight from SAS and apply them into their Kubernetes environments appropriately. Further, many aspects of SAS Viya’s infrastructure will be redesigned to be more "Kubernetes native," but the general feel of this model is what sysadmins/operators should see from SAS going forward.

Deploying the Full SAS Viya Stack in Kubernetes was published on SAS Users.

June 4, 2019
 


Two sayings I’ve heard countless times throughout my life are “Work smarter, not harder,” and “Use the best tool for the job.” If you need to drive a nail, you pick up a hammer, not a wrench or a screwdriver. In the programming world, this could mean using an existing function library instead of writing your own or using an entirely different language because it’s more applicable to your problem. While that sounds good in practice, in the workplace you don’t always have that freedom.

So, what do you do when you’re given a hammer and told to fasten a screw? Or, like in the title of this article’s case, what do you do when you have Python functions you want to use in SAS?

Recently I was tasked with documenting an exciting new feature for SAS — the ability to call Python functions from within SAS. In this article I will highlight everything I’ve learned along the way to bring you up to speed on this powerful new tool.

PROC FCMP Python Objects

Starting with the May 2019 release of SAS 9.4M6, the FCMP procedure added support for submitting and executing functions written in Python from within a SAS session using the new Python object. If you’re unfamiliar with PROC FCMP, I’d suggest reading the documentation. In short, FCMP, or the SAS Function Compiler, enables users to write their own functions and subroutines that can then be called from just about anywhere a SAS function can be used in SAS. Users are not restricted to using Python only inside a PROC FCMP statement. You can create an FCMP function that calls Python code, and then call that FCMP function from the DATA step. You can also use one of the products or solutions that support Python objects, including SAS High Performance Risk and SAS Model Implementation Platform.

The Why and How

So, what made SAS want to include this feature in our product? The scenario we imagined when creating this feature was a customer who already had resources invested in Python modeling libraries but now wanted to integrate those libraries into their SAS environment. As much fun as it sounds to convert and validate thousands of lines of Python code into SAS code, wouldn’t it be nice if you could simply call Python functions from SAS? Whether you’re in the scenario above with massive amounts of Python code, or you’re simply more comfortable coding in Python, PROC FCMP is here to help you. Your Python code is submitted to a Python interpreter of your choice. Results are packaged into a Python tuple and brought back inside SAS for you to continue programming.

Programming in Two Languages at Once

So how do you program in SAS and Python at the same time? Depending on your installation of SAS, you may be ready to start, or there could be some additional environment setup you need to complete first. In either case, I recommend pulling up the Using PROC FCMP Python Objects documentation before we continue. The documentation outlines an output string that must be added to your Python code before it can be submitted from SAS. When you call a Python function from SAS, the return value(s) is stored in a SAS dictionary. If you’re unfamiliar with SAS dictionaries, you can read more about them in Dictionaries: Referencing a New PROC FCMP Data Type.

Getting Started

There are multiple methods to load your Python code into the Python object. In the code example below, I’ll use the SUBMIT INTO statement to create an embedded Python block and show you the basic framework needed to execute Python functions in SAS.

/* A basic example of using PROC FCMP to execute a Python function */
proc fcmp;
 
/* Declare Python object */
declare object py(python);
 
/* Create an embedded Python block to write your Python function */
submit into py;
def MyPythonFunction(arg1, arg2):
	"Output: ResultKey"
	Python_Out = arg1 * arg2
	return Python_Out
endsubmit;
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
rc = py.call("MyPythonFunction", 5, 10);
 
/* Store the result in a SAS variable and examine the value */
SAS_Out = py.results["ResultKey"];
put SAS_Out=;
run;

You can gather from this example that there are essentially five parts to using PROC FCMP Python objects in SAS:

  1. Declaring your Python object.
  2. Loading your Python code.
  3. Publishing your Python code to the interpreter.
  4. Executing your Python Code.
  5. Retrieving your results in SAS.

From the SAS side, those are all the pieces you need to get started importing your Python code. Now what about more complicated functions? What if you have working models made using thousands of lines and a variety of Python packages? You still use the same program structure as before. This time I’ll be using the INFILE method to import my Python function library by specifying the file path to the library. You can follow along by copying my Python code into a .py file. The file, blackscholes.py, contains this code:

def internal_black_scholes_call(stockPrice, strikePrice, timeRemaining, volatility, rate):
    import numpy
    from scipy import stats
    import math
    if ((strikePrice != 0) and (volatility != 0)):
        d1 = (math.log(stockPrice/strikePrice) + (rate + (volatility**2)\
                       /  2) * timeRemaining) / (volatility*math.sqrt(timeRemaining))
        d2 = d1 - (volatility * math.sqrt(timeRemaining))
        callPrice = (stockPrice * stats.norm.cdf(d1)) - \
        (strikePrice * math.exp( (-rate) * timeRemaining) * stats.norm.cdf(d2))
    else:
        callPrice=0
    return (callPrice)
 
def black_scholes_call(stockPrice, strikePrice, timeRemaining, volatility, rate):
    "Output: optprice"
    import numpy
    from scipy import stats
    import math
    optPrice = internal_black_scholes_call(stockPrice, strikePrice,\
                                           timeRemaining, volatility, rate)
    callPrice = float(optPrice)
    return (callPrice,)

My example isn’t quite 1000 lines, but you can see the potential of having complex functions all callable inside SAS. In the next example, I’ll call these Python functions from SAS.

/* Using PROC FCMP to execute Python functions from a file */
proc fcmp;
 
/* Declare Python object */
declare object py(python);
 
/* Use the INFILE method to import Python code from a file */
rc = py.infile("C:\Users\PythonFiles\blackscholes.py");
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
rc = py.call("black_scholes_call", 132.58, 137, 0.041095, .2882, .0222);
 
/* Store the result in a SAS variable and examine the value */
SAS_Out = py.results["optprice"];
put SAS_Out=;
run;

Calling Python Functions from the DATA step

You can take this a step further and make it usable in the DATA step, outside of a PROC FCMP statement. We can use our program from the previous example as a starting point. From there, we just need to wrap the inner Python function call in an outer FCMP function. This function-within-a-function design may be giving you flashbacks of Inception, but I promise you this exercise won’t leave you confused and questioning reality. Even if you’ve never used FCMP before, creating the outer function is straightforward.

/* Creating a PROC FCMP function that calls a Python function  */
proc fcmp outlib=work.myfuncs.pyfuncs;
 
/* Create the outer FCMP function */
/* These arguments are passed to the inner Python function */
function FCMP_blackscholescall(stockprice, strikeprice, timeremaining, volatility, rate);
 
/* Create the inner Python function call */
/* Declare Python object */
declare object py(python);
 
/* Use the INFILE method to import Python code from a file */
rc = py.infile("C:\Users\PythonFiles\blackscholes.py");
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
/* Since this is the inner function, instead of values in the call        */
/* you will pass the outer FCMP function arguments to the Python function */
rc = py.call("black_scholes_call", stockprice, strikeprice, timeremaining, volatility, rate);
 
/* Store the inner function Python output in a SAS variable                              */
FCMP_out = py.results["optprice"];
 
/* Return the Python output as the output for outer FCMP function                        */
return(FCMP_out);
 
/* End the FCMP function                                                                 */
endsub;
run;
 
/* Specify the function library you want to call from                                    */
options cmplib=work.myfuncs;
 
/*Use the DATA step to call your FCMP function and examine the result                    */
data _null_;
   result = FCMP_blackscholescall(132.58, 137, 0.041095, .2882, .0222);
   put result=;
run;

With your Python function neatly tucked away inside your FCMP function, you can call it from the DATA step. You also effectively reduced the statements needed for future calls to the Python function from five to one by having an FCMP function ready to call.

Looking Forward

So now that you can use Python functions in SAS just like SAS functions, how are you going to explore using these two languages together? The PROC FCMP Python object expands the capabilities of SAS and, as a result, your capabilities as a SAS user. Depending on your experience level, completing a task in Python might be easier for you than completing that same task in SAS. Or you could be in the scenario I mentioned before where you have a major investment in Python and converting to SAS is non-trivial. In either case, PROC FCMP now has the capability to help you bridge that gap.

SAS or Python? Why not use both? Using Python functions inside SAS programs was published on SAS Users.

May 14, 2019
 

Interested in making business decisions with big data analytics? Our Wiley SAS Business Series book Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value by Bart Baesens, Wouter Verbeke, and Cristian Danilo Bravo Roman has just the information you need to learn how to use SAS to make data and analytics decision-making a part of your core business model!

This book combines the authorial team’s worldwide consulting experience and high-quality research to open up a road map to handling data, optimizing data analytics for specific companies, and continuously evaluating and improving the entire process.

In the following excerpt from their book, the authors describe a value-centric strategy for using analytics to heighten the accuracy of your enterprise decisions:

“'Data is the new oil' is a popular quote pinpointing the increasing value of data and — to our liking — accurately characterizes data as raw material. Data are to be seen as an input or basic resource needing further processing before actually being of use.”

Analytics process model

In our book, we introduce the analytics process model that describes the iterative chain of processing steps involved in turning data into information or decisions, which is quite similar actually to an oil refinery process. Note the subtle but significant difference between the words data and information in the sentence above. Whereas data fundamentally can be defined to be a sequence of zeroes and ones, information essentially is the same but implies in addition a certain utility or value to the end user or recipient.

So, whether data are information depends on whether the data have utility to the recipient. Typically, for raw data to be information, the data first need to be processed, aggregated, summarized, and compared. In summary, data typically need to be analyzed, and insight, understanding, or knowledge should be added for data to become useful.

Applying basic operations on a dataset may already provide useful insight and support the end user or recipient in decision making. These basic operations mainly involve selection and aggregation. Both selection and aggregation may be performed in many ways, leading to a plentitude of indicators or statistics that can be distilled from raw data. Providing insight by customized reporting is exactly what the field of business intelligence (BI) is about.

Business intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to — and analysis of — information to improve and optimize decisions and performance.

This model defines the subsequent steps in the development, implementation, and operation of analytics within an organization.

    Step 1
    As a first step, a thorough definition of the business problem to be addressed is needed. The objective of applying analytics needs to be unambiguously defined. Some examples are: customer segmentation of a mortgage portfolio, retention modeling for a postpaid Telco subscription, or fraud detection for credit cards. Defining the perimeter of the analytical modeling exercise requires a close collaboration between the data scientists and business experts. Both parties need to agree on a set of key concepts; these may include how we define a customer, transaction, churn, or fraud. Whereas this may seem self-evident, it appears to be a crucial success factor to make sure a common understanding of the goal and some key concepts is agreed on by all involved stakeholders.

    Step 2
    Next, all source data that could be of potential interest need to be identified. The golden rule here is: the more data, the better! The analytical model itself will later decide which data are relevant and which are not for the task at hand. All data will then be gathered and consolidated in a staging area which could be, for example, a data warehouse, data mart, or even a simple spreadsheet file. Some basic exploratory data analysis can then be considered using, for instance, OLAP facilities for multidimensional analysis (e.g., roll-up, drill down, slicing and dicing).

    Step 3
    After we move to the analytics step, an analytical model will be estimated on the preprocessed and transformed data. Depending on the business objective and the exact task at hand, a particular analytical technique will be selected and implemented by the data scientist.

    Step 4
    Finally, once the results are obtained, they will be interpreted and evaluated by the business experts. Results may be clusters, rules, patterns, or relations, among others, all of which will be called analytical models resulting from applying analytics. Trivial patterns (e.g., an association rule stating that spaghetti and spaghetti sauce are often purchased together) that may be detected by the analytical model are interesting, as they help to validate the model. But of course, the key issue is to find the unknown yet interesting and actionable patterns (sometimes also referred to as knowledge diamonds) that can provide new insights into your data that can then be translated into new profit opportunities!

    Step 5
    Once the analytical model has been appropriately validated and approved, it can be put into production as an analytics application (e.g., decision support system, scoring engine). Important considerations here are how to represent the model output in a user-friendly way, how to integrate it with other applications (e.g., marketing campaign management tools, risk engines), and how to make sure the analytical model can be appropriately monitored and back-tested on an ongoing basis.

Book giveaway!

If you are as excited about business analytics as we are and want a copy of Bart Baesens’ book Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value, enter to win a free copy in our book giveaway today! The first 5 commenters to correctly answer the question below get a free copy of Baesens’ book! Winners will be contacted via email.

Here's the question:
What free SAS Press e-book did Bart Baesens write the foreword to?

We look forward to your answers!

Further resources

Want to prove your business analytics skills to the world? Check out our Statistical Business Analyst Using SAS 9 certification guide by Joni Shreve and Donna Dea Holland! This certification is designed for SAS professionals who use SAS/STAT software to conduct and interpret complex statistical data analysis.

For more information about the certification and certification prep guide, watch this video from co-author Joni Shreve on their SAS Certification Prep Guide: Statistical Business Analysis Using SAS 9.

Big data in business analytics: Talking about the analytics process model was published on SAS Users.

May 10, 2019
 

May 12th is #NationalLimerickDay! If you saw our Valentine’s Day poem, you know we at SAS Press love creating poems and fun rhymes, so check out our limericks below!

So, what’s a limerick?

National Limerick Day is observed each year on May 12th and honors the birthday of the famed English artist, illustrator, author and poet Edward Lear (May 12, 1812 – Jan. 29, 1888). Lear’s poetry is most famous for its nonsense or absurdity, and mostly consists of prose and limericks.

His book, “Book of Nonsense,” published in 1846, popularized the limerick poem.

A limerick poem has five lines and is often very short, humorous, and full of nonsense. To create a limerick the first two lines must rhyme with the fifth line, and the third and fourth lines rhyme together. The limerick’s rhythm is officially described as anapestic meter.

To celebrate, we want to ask all lovers of SAS books to enjoy the limericks written by us and to see if you can create your own! Can you top our limericks on our love for SAS Books? Check out our handy how-to limerick links below.

Our limericks

There once was a software named SAS
helping tons of analysts complete tasks.
a Text Analytics book to extract meaning as data flies by
and a Portfolio and Investment Analysis book so you’ll never go awry.
You know our SAS books are first-class!

We enjoyed meeting our awesome users at SAS Global Forum
who enjoy our books with true decorum.
a SAS Administration book on building from the ground up
and a new book about PROC SQL you need to pick-up.
Checkout our SAS books today, you’ll adore ‘em!

For more about SAS Books and some more of our SAS Press fun, subscribe to our newsletter. You’ll get all the latest news and exclusive newsletter discounts. Also check out all our new SAS books at our online bookstore.

Resources:
Wiki-How: How to Write A Limerick
Limerick Generator: Create a Limerick in Seconds

Happy National Limerick Day from SAS Press! was published on SAS Users.

May 6, 2019
 

App security is at the top of mind for just about everybody – users, IT folks, business executives. Rightfully so. Mobile apps and the devices on which they reside tend to travel around, without the physical boundaries that confine traditional desktop computers.

In chatting with folks who are evaluating the SAS Visual Analytics app for their mobile devices, the conversation eventually winds up with a focus on security and the big question comes up:

How is this app secure?

Great question! Here’s a whirlwind tour of the security features that have been built into the SAS Visual Analytics app for Windows 10, Android, and iOS devices. The app is now a young kid and not a toddler anymore; it has been around for about six years. And during its growth journey, the app has been beefed up with rock-solid features to address security for Visual Analytics reports viewed from mobile devices.

Before we take a look at the security features in the app, here are a few things you should know:

    • The app is free.
    • No license is needed to use the app.
    • You can download it anytime from the app store, and try out the sample reports in the app.
    • If you already have SAS Visual Analytics deployed in your organization, you can connect to your server, add reports to the app, and start interacting with your reports from your smartphone or tablet. The Help available in the app walks you through these steps.

Now, let’s get back to security for Visual Analytics reports on mobile devices. Here are five things that make the Visual Analytics app robust and secure on mobile devices.

    1. Device Whitelisting: If you want to connect to your SAS Visual Analytics server from the app, your administrator will “whitelist” your mobile device. Your device is first registered as a valid device that can connect to the Visual Analytics server. The whitelist affects devices, not users. If you happen to lose your mobile device, your administrator can remove the device from the whitelist and prevent access to the reports and data. The option to “blacklist” devices is also available.
    2. Cached Reports: After you add Visual Analytics reports to your app, if you don’t want the report data to remain with the report in the app, your administrator can enable the cached report feature. Data is downloaded only when you open and view the report on your mobile device. When you close the report, that data is removed from the device. For enhanced security, thumbnail images for report tiles in your app will not display for cached reports.
    3. Passcode: To prevent anyone other than yourself from opening the Visual Analytics app, you can set a 4-digit passcode for the app. There are two kinds of passcodes: required and optional. A required passcode is mandated by the server – when you connect to the server, you will create a passcode. Then, whenever you open the app or view a report from that server, you must enter the passcode. An optional passcode, on the other hand, is a passcode that you choose to use to lock up the app – it is not required to access the server, it is needed only to open the app. In addition, there are several features for passcode use that solidify security and access to the app: time-out, lock-out and so forth. I’ll go over these features in an upcoming blog.
    4. SSL/HTTPS: If the Visual Analytics server is set up with SSL/HTTPS, the data viewed in the reports on your mobile device is encrypted.
    5. Offline: If you are offline for a specified number of days, you must sign in to the server again. If you don’t, the app does not download reports, update reports, or open reports for viewing.

Cached Reports

One of the security features we just talked about was the cached report feature. Here’s how cached report thumbnails are displayed in the Visual Analytics app on Windows 10, without any images.

When you tap the thumbnail for the cached report, data is immediately downloaded and the report opens in the app for viewing and interaction:

When you close this cached report in the app, the data is removed from the device and the cached report thumbnail displays in the app without any images.

Thanks for joining me on this whirlwind security tour of the SAS Visual Analytics app. Now you know the many different security mechanisms that are in place to protect your organization’s data and reports accessed from the mobile app.

Five key security features in the SAS Visual Analytics app was published on SAS Users.