configuration

June 19, 2019
 

As a data scientist, have you ever felt the need for an evolved analytics platform that brings together the disparate strengths of open source and commercial software, a system that enables advanced analytic capabilities? This is now possible and easy to implement. With its many deployment options, SAS Viya lets you choose where data is stored, where compute happens, and how models are deployed.

Let’s say you want to expand your model development process with SAS Viya analytical capabilities, and you don’t want to wait for such an environment to be up and running. Unfortunately, you have neither the infrastructure nor the experience to install SAS Viya. Going the traditional route, you would face:

  • Protracted hardware procurement and provisioning
  • Deployment planning and coordination with IT
  • Effort and time required for software installation/configuration

This may be the right path for many organizations, but I think we all recognize that the traditional approach can take days, weeks and, yes, sometimes months.

What if you could get up and running with a full SAS Viya platform in two hours? If you have some affinity for cloud-based solutions, SAS offers you the AWS SAS Viya Cloud Rapid Deployment tool. SAS released this AWS Quick Start as a rapid deployment architecture for SAS Viya on AWS. Deployable products include SAS Visual Data Mining and Machine Learning, SAS Visual Statistics and SAS Visual Analytics.

The goal of this article is to walk you through how I launched such an AWS SAS Viya Quick Start. I strongly advise you to watch this related video by my colleague Erwan Granger. Much of what is covered here appears in Erwan's video. The recording predates the SAS Viya 3.4 release, but the main concepts are still the same.

What you will need

The following is a list of items you need to complete this task.

  • AWS Account with appropriate creation privileges
  • A valid SAS Viya License; this means you will need a SAS Software Order Confirmation e-mail
  • Optional: your own DNS name and SSL certificate. To deploy with these, you need to register a domain managed by Amazon Route 53 (for instructions on registering the domain, see the Route 53 documentation), and you can request and register a certificate with AWS Certificate Manager.

Furthermore, it’s good to know this Quick Start provides two deployment options. You can deploy SAS Viya into a new Virtual Private Cloud (VPC) or into an existing VPC. The first option builds a new AWS environment consisting of the VPC, private and public subnets, NAT gateways, security groups, Ansible controllers, and other infrastructure components, and then deploys SAS Viya into this new VPC. The second option provisions SAS Viya in your existing AWS infrastructure. I decided to go for the first option.

What you will build

Here's an architectural overview of what we will build:

SAS Viya architecture on the AWS Cloud

You can find exactly the same architecture on the SAS Viya AWS Quick Start landing page.

Configure the build

We’ll be following the build process outlined in the Quick Start guide. On the landing page, next to the "What you’ll build" tab, you can click on "How to deploy". From there, launch the "Deploy into a new VPC" wizard.

Deploy into a new VPC wizard

Prerequisite prep

Make sure you are signed in with your AWS account and have chosen the region where you want to deploy. On the first screen you can leave the default Amazon S3 template URL. That template is the basis for the AWS CloudFormation stack we are launching. CloudFormation is an AWS tool that spins up resources in the right order, and the template is the blueprint document for the stack. By keeping the default template, we will build exactly the architecture displayed above.

Pre-req prep template

Now click "Next" and move to the page where we can specify more details and the required CloudFormation parameters.

CloudFormation parameters

The first parameter is the SAS Viya Software Order file, which is the Amazon S3 location of the Software Order e-mail attachment.

SAS Viya install package location

In the Administration section, you provide parameters to configure your AWS architecture. That way, you control access, instance types, and whether you use a SAS Viya mirror repository.

CloudFormation administration parameters

Administration parameter definitions:

  • The name of an Amazon EC2 key pair, so you can access the Ansible controller
  • The AWS Availability Zone for the public and private subnets
  • Allowable IP range for HTTP traffic; must be a valid IP CIDR range
  • Allowable IP Range for SSH traffic to the Ansible controller; must be a valid IP CIDR range
  • SAS Administrator password
  • Password for Default (sasuser) user
  • Amazon EC2 Instance type for CAS Compute VM
  • Amazon EC2 Instance type for SAS Viya Services VM
  • (Optional) Location of SAS Viya Deployment Repository data
  • (Optional) Operator Email
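
Both IP range parameters must be valid CIDR ranges, so it can be worth checking the values locally before pasting them into the wizard. The following is a minimal sketch in POSIX shell. `is_valid_cidr` is a hypothetical helper (it is not part of the Quick Start), it only covers IPv4 notation, and it does not check whether the host bits are zero.

```shell
# Hypothetical helper: succeeds (exit 0) when $1 looks like a valid IPv4 CIDR range.
is_valid_cidr() {
  cidr=$1
  # Shape check: four dot-separated numbers, a slash, then a prefix length.
  printf '%s\n' "$cidr" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$' || return 1
  addr=${cidr%/*}
  prefix=${cidr#*/}
  # The prefix length must be between 0 and 32.
  [ "$prefix" -le 32 ] || return 1
  # Split the address on dots and require every octet to be at most 255.
  old_ifs=$IFS
  IFS=.
  set -- $addr
  IFS=$old_ifs
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Example: 203.0.113.0/24 is a documentation range used purely as an illustration.
if is_valid_cidr "203.0.113.0/24"; then
  echo "valid CIDR"
fi
```

A value such as 10.0.0.256/8 or 10.0.0.0/33 makes the function return a non-zero status, which is the kind of typo that would otherwise only surface when the stack parameters are validated.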

If you want to work with custom DNS names and SSL, you will need to provide the next three parameters as well.

DNS and SSL configuration (optional)

DNS and SSL parameters:

You may accept the defaults on the remaining parameters.

Optional parameters

After clicking "Next", another set of optional parameters is available. I mostly accept the defaults provided. The lone exception is Rollback on failure.

Optional administration parameters

Based on what I’ve learned from Erwan's video, the safer choice is "No" on the Rollback option. This way, if the deployment process encounters issues, the log will identify the step in which the error occurred. Of course, this means you are responsible for manually deleting AWS-created resources that are no longer necessary. The easiest way to do this is by deleting the CloudFormation stacks afterward.
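
This article uses the console wizard, but if you prefer to script the launch, the same rollback choice and the later cleanup can be expressed with the AWS CLI. The sketch below is a dry run that only prints the commands. The stack name, the template URL, and the parameters.json file are placeholders rather than values from the Quick Start, and the `CAPABILITY_IAM` flag is an assumption about what the template requires.

```shell
# Dry-run sketch: prints the AWS CLI commands instead of running them.
# STACK_NAME, TEMPLATE_URL and parameters.json are placeholders, not Quick Start values.
STACK_NAME="sas-viya-quickstart"
TEMPLATE_URL="https://example.com/path/to/quickstart-template.yaml"

run() {
  # Print by default; set DRY_RUN=0 to execute for real.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# --disable-rollback keeps failed resources in place so the logs can be inspected,
# which corresponds to choosing "No" for Rollback on failure in the wizard.
# CAPABILITY_IAM is an assumption: adjust it to whatever the template asks you to acknowledge.
run aws cloudformation create-stack \
  --stack-name "$STACK_NAME" \
  --template-url "$TEMPLATE_URL" \
  --parameters file://parameters.json \
  --capabilities CAPABILITY_IAM \
  --disable-rollback

# Clean up afterward by deleting the stack, which removes the resources it created.
run aws cloudformation delete-stack --stack-name "$STACK_NAME"
```

Keeping the commands behind a `run` wrapper makes it easy to review exactly what would be sent to AWS before anything is created.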

Kick off the build

To conclude the deployment wizard, click "Next" once more and acknowledge the AWS resources that will be created. Clicking "Create stack" starts the deployment process.

Start the build process

You can monitor the deployment log using AWS CloudWatch. In his video, Erwan demonstrates this at around minute 23.

After a successful deployment, you will find two AWS CloudFormation stacks created. The Outputs tab gives you direct links to SAStudioV and SASDrive.
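
The same outputs can also be read from the command line. This is a small sketch that assumes the AWS CLI is installed and configured; `show_stack_outputs` is a hypothetical helper name and the stack name you pass is your own.

```shell
# Hypothetical helper: lists the output keys and values of a CloudFormation stack as a table.
show_stack_outputs() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' \
    --output table
}

# Usage (requires AWS credentials): show_stack_outputs "your-stack-name"
```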

SAS Studio and SAS Drive stacks

That’s it. You are deployed and ready to begin using your SAS Viya environment!

Additional Reference

Alexander Koller writes about SAS on AWS and takeaways for preparing for the AWS associate solution architect exam.

Your experiences and opinion matter

New forces are shaping the analytics ecosystem. Increased competition, rising customer expectations, and emerging technologies such as AI and machine learning are challenging IT departments to evolve their analytic ecosystems to meet the demands of their business partners.

How is your organization doing this? How does your Analytics Cloud strategy compare to the market? And what do your peers think about migrating Analytics to the cloud? We can give you some insights and an industry benchmark on the topic.

Tell us about your experience in this five-minute survey, and we will be happy to share a detailed industry insight report with you to answer these questions.

Deploy SAS Viya on AWS - Quick Start was published on SAS Users.

June 11, 2019
 

This article is a follow-on to a recent post from Jeff Owens, Getting started with SAS Containers. In that post, Jeff discussed building and running a single container for a SAS Viya runtime/IDE. Today we will go through how to build and run the full SAS Viya stack, visual components and all, in Kubernetes. Step 1 is building the container images and Step 2 is running the containers. For both steps, you can go to the sas-container-recipes GitHub repo for more detail and to obtain the tools needed to accomplish this task. An in-depth guide and more information are located on the wiki page in the repository.

The project development team at SAS has done an incredible job of making this new and intuitive way to dynamically create large collections of containers easy and foolproof, despite my long-winded explanation...

Building the Container Images

Keeping with the recipes theme, we are going to need to prepare a few ingredients to make this work. Of course, you will need a valid SAS_Viya_deployment_data.zip file containing your ordered products.

Build Machine

First, you need a Build Machine. This can be a lightweight server, but it needs to be running Linux. The build machine in this example is 2cpu x 8GB RAM, running RHEL 7.6. Hint – 2 cores is the minimum but the more you use for the build the better (faster). I have installed Docker version 18.09.5 here and I have a 100GB volume attached to my docker root (by default this is /var/lib/docker but you can easily change the location in your /etc/docker/daemon.json file).
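
If you relocate the Docker root to a larger volume as described, the change is a single key in daemon.json. The following is a minimal sketch, assuming a Docker release where the key is named `data-root`. The mount point /data/docker is a hypothetical path, and the script writes only a sample file to a scratch location rather than touching /etc/docker.

```shell
# Hypothetical mount point for the dedicated volume.
DOCKER_ROOT="/data/docker"
SAMPLE_FILE="${TMPDIR:-/tmp}/daemon.json.sample"

# Write a sample daemon.json for review; copy it to /etc/docker/daemon.json yourself.
cat > "$SAMPLE_FILE" <<EOF
{
  "data-root": "$DOCKER_ROOT"
}
EOF

cat "$SAMPLE_FILE"

# After installing the file and restarting the Docker daemon, you can confirm the location with:
#   docker info --format '{{ .DockerRootDir }}'
```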

You can review full system requirements in the GitHub repository here. This article covers the "multiple" or "full" deployment types so focus on that column in the table.
This build machine executes the build script, which builds each of your containers, pushes them to your Docker registry, and creates the corresponding Kubernetes manifest files needed to launch your deployment.

Make sure you have cloned the sas-container-recipes repository to this machine.

Docker Registry

You will need access to a Docker registry. Your build machine must be able to push images into it, and your Kubernetes machines must be able to pull images from it. Prior to building, make sure you run the docker login myregistry.com command as the build user. This docker login ensures that a file is present at ~/.docker/config.json in that user's home directory. This is a requirement whether or not you secure the registry with a form of authentication. Note that if your registry does not respond to pings, you will need to add the --skip-docker-url-validation parameter to the build command.
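
A quick way to confirm the login step left its config file behind is to test for it before starting a long build. This is a sketch only; `check_registry_login` is a hypothetical helper, and it relies on the Docker CLI convention of storing its config under `$DOCKER_CONFIG` or, when that is unset, under .docker in the user's home directory.

```shell
# Hypothetical pre-build check: confirms the Docker client config written by `docker login` exists.
check_registry_login() {
  # The Docker CLI keeps its config in $DOCKER_CONFIG, or ~/.docker when that variable is unset.
  config_file="${DOCKER_CONFIG:-$HOME/.docker}/config.json"
  if [ -f "$config_file" ]; then
    echo "found $config_file"
  else
    echo "missing $config_file (run 'docker login myregistry.com' as the build user first)"
    return 1
  fi
}

# Usage, as the same user that will run the build:
#   check_registry_login && echo "ready to build"
```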

Mirror Repo (Optional)

Similar to the single containers build, it is a good idea to create a mirror repository to host your SAS rpms. A local mirror gives you consistent performance during installation and a consistent build. However, if your containers are able to connect to ses.sas.download then you can skip the mirror step. Beware of the network implications and the fluid nature of these repos.

LDAP

Just like any other SAS Viya environment, all users/groups/authentication/authorization are managed by connecting to an external LDAP. This could be a quick-and-dirty OpenLDAP server we stand up ourselves, or a corporate Active Directory server. Regardless, we will have to be able to make this connection if we want to use SAS Viya's visual interfaces. The easiest and best way to handle this connection is with a sitedefault.yml file. Below is a sample sitedefault.yml that would hypothetically connect to host.com's corporate LDAP. You need to construct your own sitedefault file using values for your LDAP. Consult SAS documentation (linked above) for further information.

config:
    application:
        sas.logon.initial.password: sasboot
        sas.identities.providers.ldap.connection:
            host: myldap.host.com
            port: 389
            userDN: 'CN=ldapadmin,DC=host,DC=com'
            password: ldappassword
        sas.identities.providers.ldap.group:
            baseDN: OU=Groups,DC=host,DC=com
        sas.identities.providers.ldap.user:
            baseDN: DC=host,DC=com
        sas.identities:
            administrator: youruserid

Additionally, we will need to make sure a few of our containers have "host integration" with this same LDAP (specifically, the CAS container and the programming container). The way we do that is with a standard sssd.conf file. You should hopefully be able to track down a valid sssd.conf file for your site from an administrator. Hint – it may be necessary to add homedir (/home/%u) and default shell (/bin/bash) overrides to this file depending on your LDAP configuration.
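
To see whether an sssd.conf you were handed already carries those overrides, you can grep for them before the build. This sketch assumes the hint refers to the sssd options `override_homedir` and `override_shell`; `check_sssd_overrides` is a hypothetical helper, and the section the options belong in depends on your site's file.

```shell
# Hypothetical helper: reports whether the two overrides mentioned above are set in an sssd.conf.
# Typical lines to add to the relevant section would be:
#   override_homedir = /home/%u
#   override_shell = /bin/bash
check_sssd_overrides() {
  conf=$1
  for key in override_homedir override_shell; do
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
      echo "$key: set"
    else
      echo "$key: not set"
    fi
  done
}

# Usage: check_sssd_overrides /path/to/sssd.conf
```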

The way one would apply these two files here is:

  1. place sssd.conf in the addons/auth-sssd directory and include the --addons "addons/auth-sssd" option when you run build.sh, as we do in the example later.
  2. place sitedefault.yml in the top level of sas-container-recipes. If the recipe sees a sitedefault.yml file here, it will base64 encode it and embed it as a value in the consul.yml config map. If you did not do this before the build, you can still add your sitedefault.yml file afterward. The step below is an optional, post-build alternative that is only necessary if you did not include sitedefault.yml before the build:
    cat sitedefault.yml | base64 --wrap=0

    Next, copy and paste the output into your consul.yml configmap (by default you can find this in builds/full/manifests/kubernetes/configmaps/consul.yml). You want to add a new key/value similar to the following:

    consul_key_value_data_enc: Y29uZmlnOgogICAgYXBwbGlj......XNvZW1zaXRlLERDPWNvbQo=
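
Because a truncated or wrapped base64 value is hard to spot by eye, it can help to decode the string again and compare it with the source file before pasting it into the config map. The sketch below uses a scratch file with illustrative contents in place of your real sitedefault.yml and assumes GNU base64, as in the command above.

```shell
# Scratch file standing in for your real sitedefault.yml; the contents are illustrative only.
workdir=$(mktemp -d)
cat > "$workdir/sitedefault.yml" <<'EOF'
config:
    application:
        sas.identities:
            administrator: youruserid
EOF

# Encode on a single line (GNU base64), as in the command above.
encoded=$(base64 --wrap=0 < "$workdir/sitedefault.yml")

# Decode again and compare with the original before pasting the value anywhere.
if printf '%s' "$encoded" | base64 --decode | cmp -s - "$workdir/sitedefault.yml"; then
  echo "round trip OK"
fi

# The line to add to the consul.yml config map.
echo "consul_key_value_data_enc: $encoded"
```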

Ingress

Ingress is a crucial component to make this come together because the only way to access your SAS Viya environment is through your Ingress. The recipe gives us an Ingress resource (one of the generated Kubernetes manifests files); however, an Ingress resource is simply an internal HTTP routing rule. We will need to make sure we have manually installed a valid Ingress controller inside of our Kubernetes environment which can be a little tricky if you are new to Kubernetes. The Ingress controller reads and applies routing rules (Ingress resources) such as the ones created by the recipes.

Traefik and NGINX are the two most popular industry options. Or you might use the native Ingress controllers offered by AWS, Azure, or GCP if you are running your Kubernetes cluster in the cloud. But to reiterate, you will need an Ingress controller up and running.

Once your Ingress controller is up, you need to edit the provided manifests_usermods.yml. You should set SAS_K8S_INGRESS_DOMAIN to the DNS name that resolves to a Kubernetes node that can reach your Ingress controller. While you have this file open, you can also set a unique name for the Kubernetes namespace into which you want these resources deployed (the default is "sas-viya"). The manifests_usermods.yml file is available in the util/ directory, so to use it, first copy it into the top-level sas-container-recipes directory and edit it there.

Kubernetes namespace

Build.sh

With all this in place we are ready to build. To summarize, the “pre-build” configuration consists of the files we touched in the sas-container-recipes project:

Relevant pre-build files
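
The pre-build files discussed so far can be checked in one pass before launching a build that may run for hours. This is a minimal sketch; `preflight` is a hypothetical helper, and the three paths are the ones named in the sections above, relative to the top of your sas-container-recipes clone.

```shell
# Hypothetical pre-flight check: verifies the pre-build files named above exist in the repo clone.
preflight() {
  repo=${1:-.}
  missing=0
  for f in sitedefault.yml manifests_usermods.yml addons/auth-sssd/sssd.conf; do
    if [ -f "$repo/$f" ]; then
      echo "ok       $f"
    else
      echo "missing  $f"
      missing=1
    fi
  done
  # Non-zero exit status when at least one file is absent.
  return "$missing"
}

# Usage from the top of the clone: preflight . && echo "pre-build files in place"
```

Note that sitedefault.yml is optional before the build, since it can be added to the config map afterward, so a "missing" line for it is a reminder rather than an error.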

So, we can go ahead and launch the build script. I prefer using environment variables for easier readability along with copying and pasting when things change - new registries, mirrors, tags, etc.

SAS_VIYA_DEPLOYMENT_DATA_ZIP=/path/to/SAS_Viya_deployment_data.zip
MIRROR_URL=mymirror.com/myrepo #optional
DOCKER_REGISTRY_URL=myregistry.com
SAS_RECIPE_TYPE=full
DOCKER_REGISTRY_NAMESPACE=viya
SAS_DOCKER_TAG=prod
 
# --mirror-url is optional; drop that line if you are not using a mirror.
./build.sh --type "$SAS_RECIPE_TYPE" \
--mirror-url "$MIRROR_URL" \
--docker-registry-url "$DOCKER_REGISTRY_URL" \
--docker-registry-namespace "$DOCKER_REGISTRY_NAMESPACE" \
--zip "$SAS_VIYA_DEPLOYMENT_DATA_ZIP" \
--tag "$SAS_DOCKER_TAG" \
--addons "addons/auth-sssd"

Once complete:

  1. Container images (30-40 of them, depending on the software you have ordered) are stored locally in the build host's Docker image store.
  2. All these images are also tagged and pushed to our Docker registry. For your organizational reference, the naming convention used is:
    $DOCKER_REGISTRY_URL/$DOCKER_REGISTRY_NAMESPACE/-:$SAS_DOCKER_TAG
  3. All our Kubernetes manifest files are available on the build machine in sas-container-recipes/builds/full/manifests/kubernetes. These fully configured manifest files are ready to use. They reference the images we have built and pushed.
  4. The build log gives us instructions for how to apply these resources to Kubernetes. These are simple commands you should be able to copy and paste to stand up our SAS Viya environment.

Build log instructions

For the curious
The list below describes what happened during the build process. Feel free to skip this section; you do not need to know how any of this works to use the recipes:

  1. You, the builder, invoke build.sh. This is a wrapper script around the greater build framework. This script creates a "builder container." Check out the Dockerfile in the top level of the recipes directory. This builder container builds from a golang base image because the build process is written in a few Go files (new as of April 2019). Several files from the sas-container-recipes project, including said Go files, are copied into this container.
    • Note, we did not have to install Go on our build machine since Go is running inside a container.
    • If you are interested in seeing what the builder container looks like, you can run this command: docker run -it --rm --entrypoint /bin/bash sas-container-recipes-builder:$SAS_DOCKER_TAG.
    • A 'sas' user is created inside of this container - this user has the same uid as the user who invoked build.sh on the host.
  2. build.sh also creates a new subdirectory on the host called 'builds/<buildtype>-<timestamp>'. This will contain logs, manifests, and various templates used during this specific build.
  3. build.sh then runs that builder container and the real work gets underway. The entry point for the builder is:  go run main.go container.go order.go.  All those arguments you specified when invoking build.sh pass right into this Go program.  Also, the newly created "builds" directory mounts into the container at /sas-container-recipes/builds.
    • The host's /var/run/docker.sock file mounts into this container - this allows the builder container to run docker (docker in docker)
  4. This Go program then:
    • Generates a playbook from your deployment data file (SOE zip) using the [sas-orchestration tool](https://support.sas.com/en/documentation/install-center/viya/deployment-tools/34/command-line-interface.html).
    • Creates Kubernetes manifests for the images set to build.
    • Gathers sets of Ansible roles to install in each container, based on the entitlement of your software order.
    • Generates a Dockerfile for each container, where each applicable Ansible role installs in a new Docker layer
    • Creates a "build context" for each container with the generated Dockerfile and the Ansible role files.
    • Starts a docker build process for each container. The Dockerfile installs ansible and executes the playbook "locally" (inside of each container).
    • Pushes these images into your registry as each build finishes.
    • Note, this happens inside of containers, and the builds execute concurrently. Recall this build machine has 2 cores, so only 2 containers build at a time and it took several hours.  If we used a 16-core machine, this whole build would go faster.  In another terminal, look at docker stats during the build.  Another significant “performance” impact is the network bandwidth between your build machine and your registry.

Running the Containers

We are going to run these containers inside of a Kubernetes environment. Here are the finishing touches needed to give us a completely containerized SAS Viya environment running in Kubernetes. Note that by default this deploys into a new namespace inside your Kubernetes cluster, which isolates the resources from anything else running.

Kubernetes Environment

Since we built the full stack, we'll need to make sure we have sufficient resources to run all of these containers at the same time. We'll need a minimum of 8 cores and 80GB RAM available. Remember CAS is a multithreaded, in-memory runtime, so the more cores and RAM you provide, the more horsepower you'll have for doing actual analytical work with SAS and CAS.

Kubectl

Hopefully, if you've gotten this far you are familiar with kubectl, the client tool used to interact with a Kubernetes cluster. Consider it a CLI wrapper around the Kubernetes API. For thoroughness: you launch your SAS Viya deployment from whichever machine you run kubectl on. If this happens to be the same machine you built on, then you can stay inside of the sas-container-recipes directory you started in, and copy and paste those kubectl apply -f... commands. Or you can copy your manifest files somewhere else and modify those commands accordingly. In either instance, once those commands run, your environment is up, and you should be able to access SAS Environment Manager and other SAS web apps. If you added your userid as an administrator in the sitedefault.yml file, then you can log in as yourself with admin access.
Apply the manifests:

Apply the manifests

After a few minutes your pods should be up (the first time takes the longest, since the images must be pulled). Note that a pod in the Running state doesn’t mean all your SAS Viya services are running. It may take up to 30 minutes for all services to be up and stabilized.

Pods list
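
While waiting, a small filter over the pod list gives a single number to watch instead of a long table. This is a sketch only; `count_unready_pods` is a hypothetical helper that assumes the default `kubectl get pods` column order (NAME, READY, STATUS, ...). A count of zero still does not guarantee that every SAS Viya service has finished stabilizing.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin and counts
# pods that are not Running or whose READY column (for example 1/1) shows unready containers.
count_unready_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) n++
  } END { print n + 0 }'
}

# Usage against a live cluster (the namespace is whatever you set in manifests_usermods.yml):
#   kubectl get pods -n sas-viya --no-headers | count_unready_pods
```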

With your Ingress and DNS rules set up correctly, you should be able to reach your environment:

SAS login screen

Based on properly configured sitedefault.yml and sssd.conf files, you should be able to log in as an LDAP user.

Miscellaneous Notes

Scaling

Once your SAS Viya environment is up and running in Kubernetes, the following kubectl command adds CAS worker nodes to scale out the capacity of our CAS server.

kubectl scale deployment sas-viya-cas-worker --replicas=5 -n sas-viya-prod

Note, there isn’t any value in adding any more workers than you have physical nodes in your cluster.
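
That rule of thumb can be folded into the scale command itself. The sketch below only prints the command; `cas_scale_command` is a hypothetical helper, and the node count is passed in by hand here (on a live cluster it could come from something like `kubectl get nodes --no-headers | wc -l`).

```shell
# Hypothetical helper: prints a kubectl scale command, capping the replica count at the node count.
cas_scale_command() {
  requested=$1
  node_count=$2
  namespace=${3:-sas-viya}
  if [ "$requested" -gt "$node_count" ]; then
    requested=$node_count
  fi
  echo "kubectl scale deployment sas-viya-cas-worker --replicas=$requested -n $namespace"
}

# Example: ask for 5 workers on a 3-node cluster in the namespace used above.
cas_scale_command 5 3 sas-viya-prod
```

With these arguments the printed command requests 3 replicas rather than 5, since the cluster in the example only has three nodes.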

Performance

SAS is a powerful programming language designed to handle heavy workloads on large data. General hardware performance has historically been a chief concern to customers implementing SAS. Containers bring a whole new wrinkle to the concept of performance given the general notion of hardware abstraction. One performance-related question is: how can we ever guarantee the IO provided by the underlying filesystem (SASWORK, CAS_DISK_CACHE)? As with storage and state in Kubernetes in general, no easy answer exists. It falls on the Kubernetes operator to make high-performance filesystems (e.g., local SSD) available on all nodes where a SAS programming or CAS container might land, and to manually edit the corresponding manifest files to use those host disks. Alternatively, we can try to limit the burden on these scratch disk spaces. For CAS, this means ensuring we have more RAM available than data in use.

Amnesia

See the summary section below for a caveat about this deployment methodology: this is not quite a complete implementation for “production” types of environments, at least not without understanding the customer's configuration requirements. You should have a discussion with your sales team about some of these details. But please be aware that building and deploying as we did here leaves us with an “Amnesiac Viya” (a useful term coined by an astute SAS employee). That is, there is no state here. If and when you take your environment down, or scale pods to 0 across services, you get a "brand new" or "fresh" environment once it is brought back up. The good news is that if we run into any issues, we can easily delete the whole namespace and restart. If you want to persist any user data, config, reports, code, etc., you will have to manually attach storage to a few locations.

Full vs Multiple

Note, here we used SAS_RECIPE_TYPE=full. This built the entire SAS Viya stack, visual interfaces, microservices and all. Alternatively, if we set the recipe type to "multiple" we get three container images: programming, httpproxy, and cas. This would be all we need if we wanted to write SAS code, whether we wanted to use SAS Studio or an external IDE like Jupyter. And we could still scale out our CAS cluster the same way as we did in our full environment.

Summary

Like everyone else's, the SAS container strategy is evolving quickly. SAS Viya, as a scalable, highly available, services-oriented architecture, is a perfect fit to run in containers inside the Kubernetes orchestration framework. Kubernetes brings tremendous operational benefits to the table for this type of software: smoother deployments, higher uptime, instant scale, and much more efficient hardware usage, to name a few.

As you will see in the build log when running the recipe, this is an "EXPERIMENTAL" deployment process. The recipes are an excellent way to get your hands on a Kubernetes version of SAS Viya early. Future releases of SAS Viya will be fully "containerized" and "kubernetes-ized" so customers won’t be building their own containers in this manner. Rather, SAS will provide a Helm chart to customers that will pull container images straight from SAS and apply them into their Kubernetes environments appropriately. Further, many aspects of SAS Viya’s infrastructure will be redesigned to be more "Kubernetes native," but the general feel of this model is what sysadmins/operators should see from SAS going forward.

Deploying the Full SAS Viya Stack in Kubernetes was published on SAS Users.

March 22, 2019
 

As data volumes continue to surge (with no signs of slowing down), the cost to handle all that data can have a very real impact on IT budgets.

But for all the talk of the growing value of data, the cost side of the equation is often overlooked. After all, it not only takes a lot of drives to hold all that data, it requires significant computing resources to handle it. The costs quickly add up, forcing IT leaders to make tough choices about where to make their IT investments.

This is as true for SAS users as it is for any other IT organization. We work closely with our customers and know that the cost of our analytics software is only part of the budget puzzle; total cost of ownership (TCO) casts a long shadow over their budget decisions. So we were as curious as they were when Intel released its Intel® Optane™ DC Persistent Memory.

What is persistent memory?

Data written to persistent memory remains accessible using memory instructions even after the process that created or last modified it is gone. In this respect, persistent memory is like a hard disk drive (HDD). But data can be accessed from persistent memory considerably faster than from an HDD.

Of course, as a longtime Intel collaborator, SAS worked closely with Intel to optimize SAS software for this innovation. But how would it work in real-life applications outside of the labs? Would the combination of Intel Optane DC Persistent Memory and SAS® Viya® have a real impact, or would it just represent yet another incremental improvement: nice, but not enough to change the IT investment-performance equation?

Testing confirms performance

There was only one way to find out, of course: Test it.

So that’s what we did. In collaboration with Intel, we ran SAS Viya 3.4 on Intel Optane DC Persistent Memory. More specifically, the testing compared performance of a two-socket second-generation Intel® Xeon® Platinum 8280 processor with DRAM to a two-socket second-generation Intel Xeon Platinum 8280 processor with Intel Optane DC persistent memory. (See editor's note below for details.)

We didn’t hold back, either – we ran multiple instances of a large model containing 400 gigabytes of data, concurrently. All these jobs finished in a matter of minutes. Just as important, the costs of achieving performance in a persistent memory environment with SAS Viya, when compared to DRAM and SAS Viya, make their own compelling case.

Intel Optane persistent memory delivers more than 95 percent of DRAM performance, at a cost per unit of performance that is approximately 18 percent lower. With second-generation Intel Xeon Scalable Processors and Intel Optane DC persistent memory in Memory Mode, it’s clear that users can take advantage of the benefits of systems with larger capacity memory, with little or no performance degradation.

Greater memory capacity = substantial cost savings

But performance may not even be the most interesting part of this story – not when you compare it to the cost benefits of running in this environment. In short, compared with DRAM – equal capacity memory, equal CPU, and so on – using Intel Optane DC Persistent Memory leads to a reduction in costs of 20 percent or more. That’s a tangible, meaningful impact for any IT budget, allowing CIOs to invest more heavily to address other pressing needs. By contrast, system costs increase dramatically with higher DRAM capacity.

Plus, it may be possible in the future to use the memory module in different modes – an opportunity we’re currently testing in non-volatile random-access memory (NVRAM) mode with the goal of allowing applications to take advantage of this benefit.

How does this combination of persistent memory tools and SAS Viya contribute to faster speeds and lower costs? The massive scale of data typically handled by SAS customers can be handled by fewer servers, because each can now have a much greater memory capacity. This equation can translate into a significant reduction in the total cost of ownership.

There are a lot of factors, many of which are unique to your organization, that go into reducing the total cost of ownership – so it goes without saying that your mileage may vary. Regardless, this is a conversation your organization needs to be having right now, if it hasn’t already. We’re happy to answer your questions about running SAS Viya together with Intel Optane DC Persistent Memory.

Want a deeper dive? | See the video on Intel Optane DC Persistent Memory

Editor's Note: Specs for this test included:

  • SAS® Viya® In-memory Analytics: SAS® Viya 3.4 VDMML application.
  • Workload: 3 concurrent logistic regression tasks each running on 400GB datasets.
  • Testing by: Intel and SAS, completed on February 15, 2019.
  • Baseline hardware for comparison: 2S Intel® Xeon® Platinum 8280 processor, 2.7GHz, 28 cores, turbo and HT on, BIOS SE5C620.86B.0D.01.0286.011120190816, 1536GB total memory, 24 slots / 64GB / 2666 MT/s / DDR4 LRDIMM, 1x 800GB, Intel SSD DC S3710 OS Drive + 1x 1.5TB Intel Optane SSD DC P4800X NVMe Drive for CAS_DISK_CACHE + 1x 1.5TB Intel SSD DC P4610 NVMe Drive for application data, CentOS Linux* 7.6 kernel 4.19.8.
  • New hardware tested: 2S Intel® Xeon® Platinum 8280 processor, 2.7GHz, 28 cores, turbo and HT on, BIOS SE5C620.86B.0D.01.0286.011120190816, 1536GB Intel Optane DCPMM configured in Memory Mode (8:1), 12 slots / 128GB / 2666 MT/s, 192GB DRAM, 12 slots / 16GB / 2666 MT/s DDR4 LRDIMM, 1x 800GB, Intel SSD DC S3710 OS Drive + 1x 1.5TB Intel Optane SSD DC P4800X NVMe Drive for CAS_DISK_CACHE + 1x 1.5TB Intel SSD DC P4610 NVMe Drive for application data, CentOS Linux* 7.6 kernel 4.19.8.

Persistent memory: one way to rein in IT costs was published on SAS Users.

March 7, 2019
 

As of December 2018, any customer with a valid SAS Viya order is able to package and deploy their SAS Viya software in Docker containers. SAS has provided a fully documented and supported project (or “recipe”) for easily building these containers. So how can you start? You can simply stop reading this article and go directly to the GitHub repository and follow the instructions there. Otherwise, in this article, Jeff Owens, a solutions architect at SAS, provides a little color commentary around the process in case it is helpful…

First of all, what is the point of these containers?

Well, at its core, remember that SAS, together with its massively parallel, in-memory counterpart, Cloud Analytic Services (CAS), is a powerful runtime for data processing and analytics. A runtime is simply an engine responsible for processing and executing a particular type of code (i.e. SAS code). Traditionally, the SAS runtime would live on a centralized server somewhere and users would submit their “jobs” to that SAS runtime (server) in a variety of ways. The SAS server supports a number of different products, tasks, etc., but for this discussion let’s just focus on the scenario where a job is a “.sas” file, perhaps developed in an IDE like Enterprise Guide or SAS Studio, and submitted to the SAS runtime engine via the IDE itself, a bash shell, or maybe even SAS’ enterprise-grade scheduler and job management solution, SAS Grid. In these cases, the SAS and CAS servers are on dedicated, always-on physical servers.

The brave new containerized world in which we live provides us a new deployment model: submit the job and create the runtime server at the same time. Plus, only consume the exact resources from the host machine or the Kubernetes cluster the specific job requires. And when the job finishes, release those resources for others to use. Kubernetes and PaaS clusters are quite likely shared environments, and one of the major themes in the rise of the containers is the further abstraction between hardware and software. Some of that may be easier said than done, particularly for customers with very large volumes of jobs to manage, but it is indeed possible today with SAS Viya on Docker/Kubernetes.

Another effective (and more immediate) usage of this containerized version of SAS Viya is simply an ad hoc, on-demand, temporary development environment. The container package includes SAS Studio, so one can quickly spin up a full SAS Viya programming sandbox – SAS Studio as well as the SAS & CAS runtimes. Here they can develop and test SAS code, and just as quickly tear the environment down when no longer needed. This is useful for users who: (a) don’t have access to an “always-on” environment for whatever reason, (b) want to try out experimental code that could potentially consume resources from a shared “always-on” SAS environment, and/or (c) have a Kubernetes cluster with many more resources available than their always-on environment and want to try a BIG job.

Yes, it is possible to deploy the entire SAS Viya stack (microservices and all) via Kubernetes but that discussion is for another day. This post focuses strictly on the SAS Viya programming components and running on a single machine Docker host rather than a Kubernetes cluster.

Build the container image

I will begin here with a fresh single-machine RHEL 7.5 server running on OpenStack. But this machine could have been running on any cloud or VM platform, and I could use any (modern enough) flavor of Linux thanks to how Docker works. My machine here has 8 CPUs, 16GB RAM, and a 50GB root volume. Less or more is fine. A couple of notes to help understand how to configure an instance:

  • The final Docker container image we will end up with will be ~10GB in size and, like all Docker images, will live under /var/lib/docker by default.
    • Yes, that is large for a container. Most of this size is just static bins and libs that support the very developed SAS language. Compare to an Anaconda image which is ~3.6GB.
  • As for RAM, remember any tables loaded to CAS are loaded to memory (and will swap to disk as needed). So, your memory choice should be directly dependent on the data sizes you expect to work with.
  • Similar story for cores – CAS code is multithreaded, so more cores = more parallelization.

The first step is to install Docker.

Following along with sas-container-recipes now, the first thing I should do is mirror the repo for my order. Note, this is not a required step – you could build this container directly from SAS repos if you wanted, but we’ll mirror as a best practice. We could simply mirror and serve it over the local filesystem of our build host, but since I promised color I’ll serve it over the web instead. So, these commands run on a separate RHEL server. If you choose to mirror on your build host, make sure you have the disk space (~30GB should be plenty). You will also need your SAS_Viya_deployment_data.zip file available on the SAS Customer Support site. Run the following code to execute the setup.

$ wget https://support.sas.com/installation/viya/34/sas-mirror-manager/lax/mirrormgr-linux.tgz
$ tar xf mirrormgr-linux.tgz
$ rm -f mirrormgr-linux.tgz
$ mkdir -p /repos/viyactr
$ mirrormgr mirror --deployment-data SAS_Viya_deployment_data.zip --path /repos/viyactr --platform x64-redhat-linux-6 --latest
$ yum install httpd -y
$ systemctl start httpd
$ systemctl enable httpd
$ ln -s /repos/viyactr /var/www/html/sas_repo

Next, I go ahead and clone the sas-container-recipes repo locally and upload my SAS_Viya_deployment_data.zip file, and I am ready to run the build command. As a bonus, I am also going to use my site’s (SAS’) sssd.conf file so my container will use our corporate Active Directory for authentication. If you do not need or want that integration you can skip the “vi addons/sssd.conf” line and change the “--addons” option to “addons/auth-demo” so your container seeds with a single “sasdemo:sasdemo” user:password instead.

$ # upload SAS_Viya_deployment_data.zip to this machine somehow
$ git clone https://github.com/sassoftware/sas-container-recipes.git
$ cd sas-container-recipes/
$ vi addons/sssd.conf # <- paste in your site’s sssd.conf file
$ ./build.sh \
--type single \
--zip ~/SAS_Viya_deployment_data.zip \
--mirror-url http://jo.openstack.sas.com/sas_repo \
--addons "addons/auth-sssd"

The build should take about 45 minutes and produce a single container image for you (there might be a few images, but it is just one with a thin layer or two on top). You might want to give this image a new name (docker tag) and push it into your own private registry (docker push). Aside from that, we are ready to run it.
If you are curious, look in the addons directory for the other optional layers you can add to your container. Several tools are available for easily configuring connections to external databases.

Run the container

Here is the run command we can use to launch the container. Note the image name I use here is “sas-viya-programming:xxxxxx” – this is the image that has my sssd layer built on top of it.

$ docker run \
--detach \
--rm \
--env CASENV_CAS_VIRTUAL_HOST=$(hostname -f) \
--env CASENV_CAS_VIRTUAL_PORT=8081 \
--publish 5570:5570 \
--publish 8081:80 \
--name sas-viya-programming \
--hostname sas-viya-programming \
sas-viya-programming:xxxxxx

Connect to the container

And now, in a web browser, I can go to port 8081 on my Docker host (<hostname>:8081/SASStudio) and I will end up in SAS Studio, where I can sign in with my internal SAS credentials. To stop the container, use the name you gave it: “docker stop sas-viya-programming”. Because we used the “--rm” flag, the container will be removed (completely destroyed) when we stop it.

Note we are explicitly mapping in the HTTP port (8081:80) so we easily know how to get to SAS Studio. If you want to start up another container here on the same host, you will need to use a different port or else you’ll get an “address already in use” error. Also note we might be interested in connecting directly to this CAS server from something other than SAS Studio (localhost), a remote Python client for example. We can use the other port we mapped in (5570:5570) to connect to the CAS server.
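Before pointing a browser or a remote client at the container, it can help to confirm that the two published ports actually answer. The following is a minimal sketch using only the Python standard library; the helper name port_is_open and the "localhost" host name are assumptions for illustration, and a successful TCP connect only shows that something is listening, not that SAS Studio or CAS has finished starting.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or host not resolvable: treat all as "not open".
        return False


if __name__ == "__main__":
    docker_host = "localhost"  # assumption: replace with your Docker host name
    for label, port in (("SAS Studio (HTTP)", 8081), ("CAS", 5570)):
        state = "reachable" if port_is_open(docker_host, port) else "not reachable"
        print(f"{label} on port {port}: {state}")
```

If you publish the HTTP port on a different host port for a second container, pass that port number instead of 8081.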

Persist the data

Running this container with the above command means anything and everything done inside the container (configuration changes, code, data) will not persist if the container stops and a new one is started later. Luckily this is a very standard and easy-to-solve scenario with Docker and Kubernetes. Here are a couple of targets inside the container you might be interested in mounting a volume to:

  • /tmp – this is where CAS_DISK_CACHE is by default, not to mention SASWORK. Those are scratch space used by the runtimes. If you are working with small data and don’t care too much about performance, no need to worry about this. But to optimize your container we would suggest mounting a Docker volume to this location (or, ideally, bind mount a high-performance storage device here). Note that generally Docker prefers us to use Docker volumes in lieu of bind mounts, but that is more for manageability, security, and portability than performance.
  • /data – this directory doesn’t necessarily exist, but when you mount a volume into a container the target location will be created. So, you could call this target whatever you want, assuming it doesn’t exist yet.  Bind mounting is tempting here and OK to do, but consider the scenario when another user wants to run your container following instructions you provided them – better to use a Docker volume than force them to create the directory on the host.  If you have an NFS location, bind mounting that makes sense.
  • /code – same spiel as with /data. Once you are in the container you can save your work and it will persist in the docker volume from run to run of your container.

Here is what an updated docker run command might look like with these volumes included:

$ # /nfsdata:/nfsdata below is example syntax for a bind mount instead of a Docker volume mount
$ docker run \
--detach \
--rm \
--env CASENV_CAS_VIRTUAL_HOST=$(hostname -f) \
--env CASENV_CAS_VIRTUAL_PORT=8081 \
--volume mydata:/data \
--volume /nfsdata:/nfsdata \
--volume mycode:/code \
--volume sastmp:/tmp \
--publish 5570:5570 \
--publish 8081:80 \
--name sas-viya-programming \
--hostname sas-viya-programming \
sas-viya-programming:xxxxxx
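
Long docker run commands like this one are easy to mistype, so some users prefer to assemble them programmatically. Here is a rough sketch, not part of sas-container-recipes: the function build_docker_run is hypothetical, and the environment variable names are supplied by the caller and should be taken from the recipe documentation. It only builds the argument list and prints it; nothing is executed.

```python
import shlex


def build_docker_run(image, env=None, volumes=None, publish=None,
                     name="sas-viya-programming"):
    """Assemble a `docker run` argument list like the one shown above.

    env:     mapping of environment variable name -> value
    volumes: mapping of source (Docker volume name, or absolute host path
             for a bind mount) -> target path inside the container
    publish: mapping of host port -> container port
    """
    args = ["docker", "run", "--detach", "--rm"]
    for key, value in (env or {}).items():
        args += ["--env", f"{key}={value}"]
    for source, target in (volumes or {}).items():
        args += ["--volume", f"{source}:{target}"]
    for host_port, container_port in (publish or {}).items():
        args += ["--publish", f"{host_port}:{container_port}"]
    args += ["--name", name, "--hostname", name, image]
    return args


command = build_docker_run(
    image="sas-viya-programming:xxxxxx",
    env={"CASENV_CAS_VIRTUAL_PORT": "8081"},
    volumes={
        "mydata": "/data",       # Docker volume
        "/nfsdata": "/nfsdata",  # bind mount (absolute host path)
        "mycode": "/code",       # Docker volume
        "sastmp": "/tmp",        # Docker volume for CAS_DISK_CACHE and SASWORK
    },
    publish={5570: 5570, 8081: 80},
)
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) once you are happy with the printed command.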

Can I run this on my laptop?

Yes. You would just need to install Docker on your laptop (go to docker.com for that). You can certainly follow the instructions from the top to build and run locally. You can even push this container image out to an internal registry so other users could skip the build and just run.

So far, we have only talked about the “ad-hoc” or “sandbox” dev type of use case for this container. A later article may cover how to run in batch mode or maybe we will move straight to multi-containers & Kubernetes. In the meantime though, here is how to submit a .sas program as a batch job to this single container we have built.

Give it a try!

Try creating your own image and deploying a container. Feel free to comment on your experience.

More info:

SAS Communities Article- Running SAS Analytics in a Docker container
SAS Global Forum Paper- Docker Toolkit for Data Scientists – How to Start Doing Data Science in Minutes!
SAS Global Forum Tech Talk Video- Deploying and running SAS in Containers

Getting Started with SAS Containers was published on SAS Users.

March 30, 2018
 

As a follow on from my previous blog post, where we looked at the different use cases for using Kerberos in SAS Viya 3.3, in this post I want to delve into more details on configuring Kerberos delegation with SAS Viya 3.3. SAS Viya 3.3 supports the use of Kerberos delegation to authenticate to SAS Logon Manager and then use the delegated credentials to access SAS Cloud Analytic Services. This was the first use case we illustrated in the previous blog post.

As a reminder this is the scenario we are discussing in this blog post:

Kerberos Delegation

In this post we’ll examine:

  • The implications of using Kerberos delegation.
  • The prerequisites.
  • How authentication is processed.
  • How to configure Kerberos delegation.

Why would we want to configure Kerberos delegation for SAS Viya 3.3? Kerberos will provide us with a strong authentication mechanism for the Visual interfaces, SAS Cloud Analytic Services, and Hadoop in SAS Viya 3.3. With Kerberos enabled, no end-user credentials will be sent from the browser to the SAS Viya 3.3 environment. Instead Kerberos relies on a number of encrypted tickets and a trusted third party to provide authentication. Equally, leveraging Kerberos Delegation means that both the SAS Cloud Analytic Services session and the connection to Hadoop will all be running as the end-user. This better allows you to trace operations to a specific end-user and to more thoroughly apply access controls to the end-user.

Implications

Configuring Kerberos delegation will involve configuring Kerberos authentication for both the Visual interfaces and SAS Cloud Analytic Services. First, we’ll look at the implications for the Visual interfaces.

Once we configure Kerberos for authentication of SAS Logon Manager it replaces the default LDAP provider for end-users. This means that the only way for end-users to authenticate to SAS Logon Manager will be with Kerberos. In SAS Viya 3.3 there is no concept of fallback authentication.

Kerberos will be our only option for end-user authentication and we will be unable to use the sasboot account to access the environment. Configuring Kerberos authentication for SAS Logon Manager will be an all-or-nothing approach.

While the web clients will be using Kerberos for authentication, any client using the OAuth API directly will still use the LDAP provider. This means when we connect to SAS Cloud Analytic Services from SAS Studio (which does not integrate with SAS Logon) we will still be obtaining an OAuth token using the username and password of the user accessing SAS Studio.

If we make any mistakes when we configure Kerberos, or if we have not managed to complete the prerequisites correctly, the SAS Logon Manager will not start correctly. The SAS Logon Manager bootstrap process will error and SAS Logon Manager will fail to start. If SAS Logon Manager fails to start then there is no way to gain access to the SAS Viya 3.3 visual interfaces. In such a case the SAS Boot Strap configuration tool must be used to repair or change the configuration settings. Finally, remember that using Kerberos for SAS Logon Manager does not change the requirement for the identities microservice to connect to an LDAP provider. Since the identities microservice is retrieving information from LDAP about users and groups, we need to ensure the username part of the Kerberos principal for each end-user matches the username returned from LDAP. SAS Logon Manager will strip the realm from the user principal name and use this value in the comparison.
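
The realm-stripping comparison described above can be pictured with a small sketch. This is only an illustration of the idea, not SAS Logon Manager code: the function names are hypothetical, and the exact, case-sensitive comparison is an assumption.

```python
def username_from_principal(principal: str) -> str:
    """Drop the realm from a user principal name (user@REALM -> user)."""
    return principal.split("@", 1)[0]


def principal_matches_ldap(principal: str, ldap_usernames) -> bool:
    """True if the realm-stripped principal equals a username from LDAP.

    Assumption: the comparison is exact and case-sensitive.
    """
    return username_from_principal(principal) in set(ldap_usernames)


# Hypothetical values for illustration only.
ldap_users = ["jsmith", "adoe"]
print(principal_matches_ldap("jsmith@EXAMPLE.COM", ldap_users))
print(principal_matches_ldap("unknown@EXAMPLE.COM", ldap_users))
```

A quick check like this against an export of your LDAP usernames can flag accounts whose Kerberos principal would not line up before you switch SAS Logon Manager to Kerberos.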

Then considering SAS Cloud Analytic Services, we will be adding Kerberos to the other supported mechanisms for authentication. We will not replace the other mechanisms the way we do for SAS Logon Manager. This means we will not prevent users from connecting with a username and password from the Programming interfaces. As with the configuration of SAS Logon Manager, issues with the configuration can cause SAS Cloud Analytic Services to fail to start. Therefore, it is recommended to complete the configuration of SAS Cloud Analytic Services after the deployment has completed and you are certain things are working correctly.

Prerequisites

To be able to use Kerberos delegation with SAS Viya 3.3 a number of prerequisites need to be completed.

Service Principal Name

First, a Kerberos Service Principal Name (SPN) needs to be registered for both the HTTP service class and the sascas service class. This will take the form <service class>/<HOSTNAME>, where the <HOSTNAME> is the value that will be used by clients to request a Kerberos Service Ticket. In most cases for HTTP the <HOSTNAME> will just be the fully qualified hostname of the machine where the Apache HTTP Server is running. If you are using aliases or alternative DNS registrations then finding the correct name to use might not be so straightforward. For SAS Cloud Analytic Services, the <HOSTNAME> will be the CAS Controller hostname.

Next, by registering we mean that this Service Principal Name must be provided to the Kerberos Key Distribution Center (KDC). If we are using Microsoft Active Directory, each SPN must be registered against an object in the Active Directory database. Objects that can have an SPN registered against them are users or computers. We recommend using a user object in Active Directory to register each SPN against. We also recommend that different users are used for HTTP and CAS.

So, we have two service accounts in Active Directory and we register the SPN against each service account. There are different ways the SPN can be registered in Active Directory. The administrator could perform these tasks manually using the GUI, using an LDAP script, PowerShell script, using the setspn command, or using the ktpass command. Using these tools multiple SPNs can be registered against the service account, which is useful if there are different hostnames the end-users might use to access the service. In most cases using these tools will only register the SPN; however, using the ktpass command will also change the User Principal Name for the service account. More on this shortly.

As an alternative to Microsoft Active Directory, customers could be using a different Kerberos KDC, such as MIT Kerberos or Heimdal Kerberos. For these implementations of Kerberos there is no difference between a user and a service. The database used by these KDCs just stores information on principals and does not provide a distinction between a User Principal Name and a Service Principal Name.
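
To make the <service class>/<HOSTNAME> form concrete, here is a small sketch that builds the two SPNs this scenario needs. The helper names and the example.com host names are hypothetical; the sketch only formats strings and does not register anything with a KDC.

```python
def build_spn(service_class: str, hostname: str) -> str:
    """Return an SPN in the <service class>/<HOSTNAME> form."""
    if not service_class or not hostname:
        raise ValueError("service class and hostname are both required")
    if "/" in service_class or "/" in hostname or "@" in hostname:
        raise ValueError("unexpected '/' or '@' in SPN component")
    return f"{service_class}/{hostname}"


def required_spns(http_hostname: str, cas_controller_hostname: str) -> dict:
    """SPNs for the HTTP and sascas service classes used in this post."""
    return {
        "HTTP": build_spn("HTTP", http_hostname),
        "sascas": build_spn("sascas", cas_controller_hostname),
    }


# Hypothetical host names for illustration only.
for service, spn in required_spns("viya.example.com", "cas.example.com").items():
    print(f"{service}: {spn}")
```

If end-users reach the web tier through an alias, the HTTP host name passed in should be that alias, since it is the name clients use when they request the service ticket.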

Trusted for Delegation

For the Kerberos authentication to be delegated from SAS Logon Manager to SAS Cloud Analytic Services and then from SAS Cloud Analytic Services to Secured Hadoop, the two service accounts that have the SPNs registered against them must be trusted for delegation. Without this, the scenario will not work. You can only specify that an account is trusted for delegation after the Service Principal Name has been registered. The option is not available until you have completed that step. The picture below shows an example of the delegation settings in Active Directory.

If the Secured Hadoop environment is configured using a different Kerberos Key Distribution Center (KDC) to the rest of the environment it will not prevent the end-to-end scenario working. However, it will add further complexity. You will need to ensure there is a cross-realm trust configured to the Hadoop KDC for the end-to-end scenario to work.

Kerberos Keytab

Once you have registered each of the SPNs you’ll need to create a Kerberos keytab for each service account. Again, there are multiple tools available to create the Kerberos keytab. We recommend using the ktutil command on Linux, since this is independent of the KDC and makes no changes to the Kerberos database when creating the keytab. Some tools like ktpass will make changes when generating the keytab.

In the Kerberos keytab we need to have the User Principal Name (UPN) and associated Kerberos keys for that principal. The Kerberos keys are essentially encrypted versions of the password for the principal. As discussed above for the SPN, the UPN in the Kerberos keytab can take different forms depending on the tools used to register the SPN.

When using ktpass to register the SPN and create the keytab in a single step, the UPN of the account in Active Directory will be set to the same value as the SPN, whereas using the setspn command or performing the task manually will leave the UPN unchanged. Equally, for MIT Kerberos or Heimdal Kerberos, since there is no differentiation between principals, the UPN for the keytab will be the SPN registered with the KDC.

Once the Kerberos keytabs have been created they will need to be made available to any hosts with the corresponding service deployed.

Kerberos Configuration File

Finally, as far as prerequisites are concerned we might need to provide a Kerberos configuration file for the host where SAS Logon Manager is deployed. This configuration should identify the default realm and other standard Kerberos settings. The Kerberos implementation in Java should be able to use network queries to find the default realm and Kerberos Key Distribution Center. However, if there are issues with the network discovery, then providing a Kerberos configuration file will allow us to specify these options.

The Kerberos configuration file should be placed in the standard location for the operating system. So on Linux this would be /etc/krb5.conf. If we want to specify a different location we can also specify a JVM option to point to a different location. This would be the java.security.krb5.conf option. Equally, if we cannot create a Kerberos configuration file we could set the java.security.krb5.realm and java.security.krb5.kdc options to identify the Kerberos Realm and Kerberos Key Distribution Center. We’ll show how to set JVM options below.

Authentication Process

The process of authenticating an end-user is shown in the figure below:

Where the steps are:

A.  Kerberos is used to authenticate to SAS Logon Manager. SAS Logon Manager uses the Kerberos Keytab for HTTP/<HOSTNAME> to validate the Service Ticket. Delegated credentials are stored in the Credentials microservice.
B.  A standard internal OAuth connection is made to SAS Cloud Analytic Services, where the origin field in the OAuth token includes Kerberos and the claims include the custom group ID “CASHostAccountRequired”.
C.  The presence of the additional Kerberos origin causes SAS Cloud Analytic Services to get the CAS client to make a second connection attempt using Kerberos. The Kerberos credentials for the end-user are obtained from the Credentials microservice. SAS Cloud Analytic Services Controller uses the Kerberos Keytab for sascas/<HOSTNAME> to validate the Service Ticket and authenticate the end-user. Delegated credentials are placed in the end-user ticket cache.
D.  SAS Cloud Analytic Services uses the credentials in the end-user ticket cache to authenticate as the end-user to the Secured Hadoop environment.

Configuration

Kerberos authentication must be configured for both SAS Logon Manager and SAS Cloud Analytic Services. Also, any end-user must be added to a new custom group.

SAS Logon Manager Configuration

SAS Logon Manager is configured in SAS Environment Manager.

Note: Before attempting any configuration, ensure at least one valid LDAP user is a member of the SAS Administrators custom group.

The configuration settings are within the Definitions section of SAS Environment Manager. For the sas.logon.kerberos definition you need to set the following properties:

For more information see the

SAS Logon Manager will need to be restarted for these new JVM options to be picked up. The same method can be used to set the JVM options for identifying the Kerberos Realm and KDC where we would add the following:

  • Name = java_option_krb5realm
  • Value = -Djava.security.krb5.realm=<REALM>
  • Name = java_option_krb5kdc
  • Value = -Djava.security.krb5.kdc=<KDC HOSTNAME>

Or for setting the location of the Kerberos configuration file where we would add:

  • Name = java_option_krb5conf
  • Value = -Djava.security.krb5.conf=/etc/krb5.conf
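
The Name/Value pairs listed above follow a simple pattern, which the sketch below reproduces. The function krb5_jvm_options is hypothetical and only formats the strings you would then enter in SAS Environment Manager. The pairing check reflects the JDK rule that the java.security.krb5.realm and java.security.krb5.kdc system properties are set together; the realm and KDC values shown are placeholders.

```python
def krb5_jvm_options(realm=None, kdc=None, conf_path=None):
    """Return (name, value) pairs for the Kerberos JVM options described above."""
    if (realm is None) != (kdc is None):
        raise ValueError("set the realm and the KDC together, or neither")
    options = []
    if realm is not None:
        options.append(("java_option_krb5realm",
                        f"-Djava.security.krb5.realm={realm}"))
        options.append(("java_option_krb5kdc",
                        f"-Djava.security.krb5.kdc={kdc}"))
    if conf_path is not None:
        options.append(("java_option_krb5conf",
                        f"-Djava.security.krb5.conf={conf_path}"))
    return options


# Placeholder realm and KDC host, for illustration only.
for name, value in krb5_jvm_options(realm="EXAMPLE.COM", kdc="kdc.example.com"):
    print(f"Name = {name}\nValue = {value}")
```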

SAS Cloud Analytic Services Configuration

The configuration for SAS Cloud Analytic Services is not performed in SAS Environment Manager and is completed by changing files on the file system. The danger of changing files on the file system is that re-running the deployment Ansible playbook might overwrite any changes you make. You can either remake any changes to the file system, make the changes to both the file system and the playbook files, or make the changes in the playbook files and re-run the playbook to change the file system. Here I will list the changes in both the configuration files and the playbook files.

There is only one required change and then two optional changes. The required change is to define the authentication methods that SAS Cloud Analytic Services will use. In the file casconfig_usermods.lua located in:

/opt/sas/viya/config/etc/cas/default

Add the following line:

cas.provlist = 'oauth.ext.kerb'

Note: Unlike the SAS Logon Manager option above, this is separated with full-stops!

In the same file we can make two optional changes. These optional changes enable you to override default values. The first is the default Service Principal Name that SAS Cloud Analytic Services will use. If you cannot use sascas/<HOSTNAME> you can add the following to the casconfig_usermods.lua:

-- Add Env Variable for SPN
env.CAS_SERVER_PRINCIPAL = 'CAS/HOSTNAME.COMPANY.COM'

This sets an environment variable with the new value of the Service Principal Name. The second optional change is to set another environment variable. This will allow you to put the Kerberos Keytab in any location and call it anything. The default name and location is:

/etc/sascas.keytab

If you want to put the keytab somewhere else or call it something else add the following to the casconfig_usermods.lua

-- Add Env Variable for keytab location
env.KRB5_KTNAME = '/opt/sas/cas.keytab'

These changes can then be reflected in the vars.yml within the playbook by adding the following to the CAS_CONFIGURATION section:

CAS_CONFIGURATION:
   env:
     CAS_SERVER_PRINCIPAL: 'CAS/HOSTNAME.COMPANY.COM'
     KRB5_KTNAME: '/opt/sas/cas.keytab'
   cfg:
     provlist: 'oauth.ext.kerb'

With this in place we can restart the SAS Cloud Analytic Services Controller to pick-up the changes.
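
Because the same three settings appear in both casconfig_usermods.lua and vars.yml, one way to keep them in step is to generate both fragments from a single set of values. This is only a sketch with hypothetical helper names; it prints text for you to review and paste, and it assumes the values contain no single quotes.

```python
def _check(value: str) -> str:
    """Reject values that would break the single-quoted strings below."""
    if "'" in value:
        raise ValueError(f"single quote not supported in value: {value!r}")
    return value


def lua_lines(provlist: str, env=None) -> list:
    """Lines to add to casconfig_usermods.lua."""
    lines = [f"cas.provlist = '{_check(provlist)}'"]
    for key, value in (env or {}).items():
        lines.append(f"env.{key} = '{_check(value)}'")
    return lines


def vars_yml_fragment(provlist: str, env=None) -> str:
    """CAS_CONFIGURATION fragment for vars.yml, indented as in this post."""
    lines = ["CAS_CONFIGURATION:"]
    if env:
        lines.append("   env:")
        lines += [f"     {key}: '{_check(value)}'" for key, value in env.items()]
    lines += ["   cfg:", f"     provlist: '{_check(provlist)}'"]
    return "\n".join(lines)


settings = {
    "CAS_SERVER_PRINCIPAL": "CAS/HOSTNAME.COMPANY.COM",
    "KRB5_KTNAME": "/opt/sas/cas.keytab",
}
print("\n".join(lua_lines("oauth.ext.kerb", settings)))
print(vars_yml_fragment("oauth.ext.kerb", settings))
```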

Custom Group

If you attempted to test accessing SAS Cloud Analytic Services at this point from the Visual interfaces as an end-user, you would see that they were not delegating credentials and the CAS session was not running as the end-user. The final step is to create a custom group in SAS Environment Manager. This custom group can be called anything, perhaps “Delegated Users”, but the ID for the group must be “CASHostAccountRequired”. Without this the CAS session will not be run as the end-user and delegated Kerberos credentials will not be used to launch the session.
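
Putting the authentication steps and the custom group requirement together, the condition for a delegated session can be summarized as a toy model. This is not SAS code: the function and its inputs are hypothetical simplifications of the OAuth token contents, and it only restates the two conditions described in this post (a Kerberos origin and membership of the group with ID CASHostAccountRequired).

```python
DELEGATION_GROUP_ID = "CASHostAccountRequired"  # group ID required by this post


def session_runs_as_end_user(token_origins, token_group_ids) -> bool:
    """Toy model: both conditions must hold for a delegated CAS session.

    token_origins:   origin values carried in the OAuth token (assumed strings)
    token_group_ids: group IDs carried in the token claims (assumed strings)
    """
    has_kerberos_origin = any("kerberos" in origin.lower()
                              for origin in token_origins)
    in_delegation_group = DELEGATION_GROUP_ID in token_group_ids
    return has_kerberos_origin and in_delegation_group


# Hypothetical token contents for illustration only.
print(session_runs_as_end_user(["kerberos"], ["CASHostAccountRequired"]))
print(session_runs_as_end_user(["kerberos"], ["SASAdministrators"]))
```

The second call shows the situation described above: Kerberos authentication succeeded, but without the custom group the session is not launched as the end-user.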

Summary

What we have outlined in this article is the new feature of SAS Viya 3.3 that enables Kerberos delegation throughout the environment. It allows you to have end-user sessions in SAS Cloud Analytic Services that are able to use Kerberos to connect to Secured Hadoop. I hope you found this helpful.

SAS Viya 3.3 Kerberos Delegation was published on SAS Users.

August 2, 2016
 

One of the jobs of SAS Administrators is keeping the SAS license current.  In the past, all you needed to do was update the license for Foundation SAS and you were done. This task can be performed by selecting the Renew SAS Software option in the SAS Deployment Manager.

More recently, many SAS solutions require an additional step which updates the license information in metadata. The license information is stored in metadata so that middle-tier applications can access it in order to check whether the license is valid. Not all solutions require that the SAS Installation Data (SID) file be stored in metadata; however, the list of solutions that do require it is growing and includes SAS Visual Analytics. For a full list you can check this SASNOTE. To update the license information in metadata, run the SAS Deployment Manager and select Update SID File in Metadata.

Recently, I performed a license renewal for a Visual Analytics environment. A couple of days later it occurred to me that I might not have performed the update of the SID file in metadata. That prompted the obvious question: how do I check the status of my license file in metadata?

To check the status of a SAS Foundation license you can use PROC setinit. PROC setinit will return the details of the SAS license in the SAS log.

proc setinit;run;

steps to update your SAS License

The above output of PROC setinit shows the:

  • Expiration Date as 25MAY2017
  • Grace Period ends on 09JUL2017
  • Warning Period ends on 04SEP2017

This indicates that the software expires on 25MAY2017; however, nothing will happen during the Grace Period. During the Warning Period, messages in the SAS log will warn the user that the software is expiring. When the Warning Period ends on 04SEP2017, the SAS Software will stop functioning. PROC setinit only checks the status of the Foundation SAS license, not the license in metadata.
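
The three dates define four phases, which can be sketched as a small date comparison. This is an illustration only, not how SAS evaluates the license: the function names are hypothetical, the grace period end is read as 09JUL2017, and treating each end date as inclusive is an assumption.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_sas_date(text: str) -> date:
    """Parse a DDMMMYYYY date such as 25MAY2017."""
    text = text.strip().upper()
    return date(int(text[5:9]), MONTHS[text[2:5]], int(text[:2]))


def license_phase(today: date, expiration: date,
                  grace_end: date, warning_end: date) -> str:
    """Classify a date against the three dates reported by PROC setinit."""
    if today <= expiration:
        return "valid"
    if today <= grace_end:
        return "grace period"    # nothing happens yet
    if today <= warning_end:
        return "warning period"  # warnings appear in the SAS log
    return "terminated"          # software stops functioning


expiration = parse_sas_date("25MAY2017")
grace_end = parse_sas_date("09JUL2017")    # assumption: grace period end date
warning_end = parse_sas_date("04SEP2017")
print(license_phase(date(2017, 8, 1), expiration, grace_end, warning_end))
```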

If the foundation license is up-to-date but the license stored in metadata is expired the web applications will not work. It turns out SAS Environment Manager will also monitor the status of the SAS license. But is it the Foundation license or the license stored in metadata?

To see the status of the license in SAS Environment Manager, select Resources then select Browse > Platforms > SAS 9.4 Application Server Tier. The interface displays:

  • Days Until License Expiration:  the number of days until the license expires.
  • Days Until License Termination: the number of days until the software stops working.
  • Days Until License Termination Warning: the number of days until the Grace period.

steps to update your SAS License

Some testing revealed that Environment Manager is monitoring not the status of the foundation license but the status of the license in metadata. This is an important point, because as we noted earlier not all SAS solutions require the SID to be updated in metadata. Since Environment Manager monitors the license by checking the status of the SID file in metadata, administrators are recommended, as a best practice, to always update the SID file in metadata.

SAS Environment Manager with Service Architecture configured will also generate events that warn of license termination when the license termination date is within a month.

In addition, as of SAS 9.4 M3, SAS Management Console has an option to View metadata setinit details. To access this functionality you must be a member of the SAS Administrators Group or the Management Console: Advanced Role.

To check on a SID file in metadata open SAS Management Console and in the plug-ins tab:

1.     Expand Metadata Manager

2.     Select Metadata Utilities

3.     Right-click and select View metadata setinit details

steps to update your SAS License

Selecting the option gives details of the current SID file in metadata, with similar information as PROC setinit displays including the expiration date, the grace period and the warning period.  In addition it displays the date the SID file was last updated in metadata.

steps to update your SAS License

The takeaway: to fully renew SAS software, and ensure that SAS Environment Manager has the correct date for its metrics on license expiration, always use SAS Deployment Manager to both Update the SAS License, AND Update the SID File in Metadata.

To check if your SAS Deployment license has been fully updated, do the following:

1.     Run PROC setinit to view the status of the SAS Foundation license.

2.     Use SAS Management Console or SAS Environment Manager to check if the SID file has been updated in metadata.

For more information on this topic see the video, “Use SAS Environment Manager to Get SAS License Expiration Notice” and additional resources below:

 

SAS® Deployment Wizard and SAS® Deployment Manager 9.4: User’s Guide: Update SID File in Metadata
SAS® Deployment Wizard and SAS® Deployment Manager 9.4: User’s Guide: Renew SAS Software
SAS® 9.4 Intelligence Platform: System Administration Guide: Managing Setinit (License) Information in Metadata
SAS® Environment Manager 2.5 User’s Guide


Two steps to update your SAS License and check if it is updated was published on SAS Users.

August 20, 2015
 

Everyone who codes with SAS knows what the SASWORK directory space is, and everyone who has ever managed a medium-large installation knows that you need to monitor this space to avoid a huge buildup of worthless disk usage.  One of the most common snarls happens when large SAS jobs go bust for one reason or another, and the work space does not get cleaned up properly.  Here’s a technique you can use, with the help of SAS Environment Manager, to get a proxy for the amount of disk space being used–it’s not perfect, but it’s better than being in the dark.

Before illustrating the technique, a little explanation is needed.  In SAS Environment Manager, you will find two types of SAS directories, both at the Server level:

  • SAS Config Level Directory 9.4, referring to the …/Lev1 directory
  • SAS Home Directory, referring to the …/SASHome/SASFoundation directory

The SASWORK directory is an additional Service level resource, underneath the SAS Home Directory, with the full name of:

<machine> SAS Home Directory 9.4 SAS work directory

WORK1

Further, the SASWORK directory (or, “work directory”) can be located anywhere on the machine–it’s always some place outside the physical hierarchies of SAS Config and SASHome.

Here we are interested in monitoring the work directory.  The problem is that the SAS EV agents are only able to scan and collect information about the disk volume where this work directory resides; they cannot get to the level of just the work directory by itself.  Therefore the metrics we can observe provide the amount of space being used on the entire disk volume on which the work directory resides.  We will use the metric called “Use Percent”–this is the same “Use Percent” metric that’s found in the alerts in the Service Architecture Framework:

WORK2
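To make the limitation concrete, here is a minimal Python sketch of a volume-level use percentage. It is not how the SAS Environment Manager agent computes its metric (the agent's exact formula is not described here); it only illustrates that a measurement taken at a directory path reports on the whole disk volume underneath it. The helper name `volume_use_percent` is hypothetical, and `/tmp` is simply the work directory location used in this example.

```python
import shutil


def volume_use_percent(path):
    """Return the percentage of the disk volume holding `path` that is in use."""
    # disk_usage reports totals for the entire volume, not for `path` alone
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "/tmp" is the SASWORK location in this article's example; adjust as needed
    print(f"Use percent of the volume containing /tmp: {volume_use_percent('/tmp'):.1f}%")
```

Pointing the function at any other directory on the same volume returns the same number, which is exactly the imprecision discussed above.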

Despite this limitation, it’s still useful for our purposes to monitor this “work directory” object, so here’s how it’s done:

1.  Confirm the location and the resource for the SAS workspace.  On the main interface, logged in as a SAS administrator, select Resource->Browse->Services, then search on the string “work directory”.   Notice that there are two SAS work directories in this example, one on the compute01 machine, one on the meta01 machine, since this particular installation has two machines with base SAS installed:

WORK3

Here we select the compute01 machine by clicking on it.  The properties indicate the location of the SASWORK directory, which is /tmp:

WORK4

Note that Use Percent is one of the metrics, and also note the file system location:  /tmp on the Linux server.  You can confirm this if you like by opening a SAS session and running a SAS PROC OPTIONS.  The work directory is indicated below:

WORK5

2. Go to the Dashboard interface, and add a portlet of type “Metric Viewer” to the interface.  On the bottom of the right column, in the “Add Content to this column” portlet, choose the Metric Viewer option in the dropdown list and click on the plus “+” sign to add the new portlet.

WORK6

3. Click on the configuration button located at the top right corner of the new Metric Viewer portlet:

WORK7

4. Enter the following properties:

Description:               SASWORK Disk Volume

Resource Type:          – SAS Home Directory 9.4 SAS Directory

Metric:                         – Use Percent

Then at the bottom of the screen, select the Add to List button. Move the object called “compute01 SAS Home Directory 9.4 SAS work directory” to the right, using the arrow, and select OK. You should see this screen:

WORK8

5. Select the OK button, and you will see your new portlet with the “Use Percent” metric displayed:

WORK9

As stated earlier, this metric is imperfect because we are not getting the actual SASWORK space only but rather the work space plus the rest of the disk volume, but it’s better than nothing.  There are two potential solutions to this problem:

1.  It’s considered a best practice on a production site to create a separate disk volume to be used only for SASWORK–in that case the metric gives us the precise measure that we want.  In the case of Windows that would be a separate new drive letter (D:, E:, etc.)

2. It’s possible to use a resource type of “FileServer Directory Tree,” point it at the physical SASWORK directory location, and get the total disk space being used. However, this will not work unless the userID running the SAS EV agent has read permissions to all the subdirectories of the SASWORK area.  Each SAS user gets their own subdirectory within the SASWORK area, and each user is normally the only one that has directory read permissions to their own work area.  Therefore this solution would only work in a few unique cases, such as where the agent userID has specifically been given read permissions to the entire SASWORK directory.
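The permission problem in the second option is easy to see with a small Python sketch. This is an illustration only, not the “FileServer Directory Tree” implementation: the hypothetical helper `directory_tree_size` walks a directory, sums the file sizes it can read, and records every path it could not read. Run as a user without read access to other users' work subdirectories, the total will be an undercount.

```python
import os


def directory_tree_size(root):
    """Sum file sizes under `root`; return (total_bytes, unreadable_paths)."""
    total = 0
    unreadable = []

    def on_error(err):
        # os.walk calls this when it cannot list a directory (e.g. permission denied)
        unreadable.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                # lstat avoids following symbolic links out of the tree
                total += os.lstat(full_path).st_size
            except OSError:
                unreadable.append(full_path)
    return total, unreadable
```

Calling `directory_tree_size("/tmp")` returns the byte total plus the list of skipped paths; a non-empty list means the figure cannot be trusted as the true SASWORK usage. Note that this sums apparent file sizes, which can differ from the blocks actually allocated on disk.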

tags: configuration, SAS Administrators, SAS Environment Manager, SAS Professional Services

Monitor your SASWORK directory from SAS Environment Manager was published on SAS Users.

July 29, 2015
 

SAS 9.4 M3 was released in July 2015 with some interesting new features and functionality for platform SAS administrators.  In this blog I will review the major new features at a very high level. For details, see the SAS 9.4 System Administration guide.

SAS 9.4 M3 includes a new release of SAS Environment Manager (2.5), and some nice new features.

Some highlights: a federated data mart lets you collect metric data from several SAS deployments and view it in one place, log collection and discovery have been improved, and support has been added for collecting metric data from a SAS grid. For details on these and a few additional enhancements to SAS Environment Manager see the SAS Environment Manager Users Guide.

The SAS Administration interface available in SAS Environment Manager is now an HTML5 interface and includes new metadata management capabilities including:

  • Server manager module enables you to manage server definitions in metadata. For the current release, you can browse any type of server that has been defined in SAS metadata. You can create and edit definitions for SAS LASR Analytic Servers.
  • Library manager module enables you to manage SAS library definitions in metadata. For the current release, you can browse any type of library that has been defined in SAS metadata. You can create and edit definitions for Base SAS libraries and SAS LASR Analytic Server libraries.
  • SAS Backup Manager graphical user interface (more on that later).

For details of the new metadata management features see the SAS(R) Environment Manager 2.5 Administration: User’s Guide.

In support of Metadata Server clustering, a new feature has been added to the Metadata Analyze and Repair Tools. Metadata Server Cluster Synchronization verifies that metadata is synchronized among all the nodes of a metadata server cluster.

The SAS Deployment Backup and Recovery tool has a number of new features. An exciting one is a new interface available on the Administration tab of SAS Environment Manager. The new interface supports scheduling, configuring, monitoring, and performing integrated backups. The interface incorporates most of the functions of the Deployment Backup and Recovery tool’s batch commands. For details on using the new user interface see the SAS(R) Environment Manager 2.5 Administration: User’s Guide.

In addition to the new user interface, the tool has some additional enhancements. You can now:

  • include or exclude specific tiers, specific instances of the SAS Web Infrastructure Platform Data Server, or particular databases
  • reorganize metadata repositories during a backup
  • define filters that specify which subdirectories and files are to be included or excluded when backing up sub-directories within the configuration directory
  • specify additional (custom) directories within the configuration directory to be backed up.

Click here for more information on SAS Deployment Backup and Recovery tool.

When promoting content in 9.4 M3 you can use the -disableX11 option to run the batch import or export tool on UNIX without setting the DISPLAY variable. This removes a dependency on X11 for the batch export and import tools.

That is a really high level review of new features for platform administrators in SAS 9.4 M3. I hope you found this post helpful.

tags: configuration, SAS Administrators, SAS Environment Manager

Great new functionality for SAS Administrators in 9.4 M3 was published on SAS Users.

April 27, 2015
 

SAS recently performed testing using the Intel Cloud Edition for Lustre* Software - Global Support (HVM) available on AWS marketplace to determine how well a standard workload mix using SAS Grid Manager performs on AWS.  Our testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. You can find the detailed results in the technical paper, SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre.

In addition to the paper, Amazon will be publishing a post on the AWS Big Data Blog that will take a look at the approach to scaling the underlying AWS infrastructure to run SAS Grid Manager to meet the demands of SAS applications with demanding I/O requirements.  We will add the exact URL to the blog as a comment once it is published.

System design overview – network, instance sizes, topology, performance

For our testing, we set up the following AWS infrastructure to support the compute and IO needs for these two components of the system:

  • the SAS workload that was submitted using SAS Grid Manager
  • the underlying Lustre file system required to meet the clustered file system requirement of SAS Grid Manager.

SAS Grid Manager and Lustre shared file configuration on AWS cloud

The SAS Grid nodes in the cluster are i2.8xlarge instances.  The 8xlarge instance size provides proportionally the best network performance to shared storage of any instance size, assuming minimal EBS traffic.  The i2 instance also provides high performance local storage, which is covered in more detail in the following section.

The use of an 8xlarge size for the Lustre cluster is less impactful since there is significant traffic to both EBS and the file system clients, although an 8xlarge is still more optimal.  The Lustre file system has a caching strategy, and you will see higher throughput to clients in the case of frequent cache hits, which effectively reduces the network traffic to EBS.

Steps to maximize storage I/O performance

The shared storage for SAS applications needs to be high speed temporary storage.  Typically temporary storage has the most demanding load.  The high I/O instance family, I2, and the recently released dense storage instance, D2, provide high aggregate throughput to ephemeral (local) storage.  For the SAS workload tested, the i2.8xlarge has 6.4 TB of local SSD storage, while the D2 has 48 TB of HDD.

Throughput testing and results

We wanted to achieve a throughput of at least 100 MB/sec/core to temporary storage, and 50-75 MB/sec/core to shared storage.  The i2.8xlarge has 16 cores (32 virtual CPUs; each virtual CPU is a hyperthread on a core, and a core has two hyperthreads).  Testing done with lower-level testing tools (fio and a SAS tool, iotest.sh) showed a throughput of about 3 GB/sec to ephemeral (temporary) storage and about 1.5 GB/sec to shared storage.  The shared storage performance does not take into account file system caching, which Lustre does well.
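For readers who want to check the measured figures against the per-core targets, the arithmetic can be sketched in a few lines of Python. The only inputs are the numbers quoted above; the conversion of 1 GB to 1000 MB is an assumption (the post does not say whether decimal or binary units were used), and the approximate "about 3 GB/sec" and "about 1.5 GB/sec" readings are treated as exact for illustration.

```python
CORES = 16          # physical cores on an i2.8xlarge, as stated in the text
MB_PER_GB = 1000    # assumption: decimal units


def per_core_mb(total_gb_per_sec, cores=CORES):
    """Convert an aggregate GB/sec figure into MB/sec per core."""
    return total_gb_per_sec * MB_PER_GB / cores


# Measured aggregate throughput quoted in the post (approximate values)
temp_per_core = per_core_mb(3.0)      # ephemeral (temporary) storage
shared_per_core = per_core_mb(1.5)    # shared (Lustre) storage

# Aggregate throughput implied by the per-core targets
temp_target_gb = 100 * CORES / MB_PER_GB
shared_target_gb = (50 * CORES / MB_PER_GB, 75 * CORES / MB_PER_GB)

print(f"Temporary storage: {temp_per_core} MB/sec/core (target: at least 100)")
print(f"Shared storage: {shared_per_core} MB/sec/core (target: 50-75)")
print(f"Targets as aggregates: {temp_target_gb} GB/sec temporary, "
      f"{shared_target_gb[0]}-{shared_target_gb[1]} GB/sec shared")
```

Under these assumptions both measurements clear their targets: roughly 187.5 MB/sec/core to temporary storage and roughly 93.75 MB/sec/core to shared storage.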

This testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. For full details of the testing configuration and results, please see the SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre technical white paper.

 

tags: cloud computing, configuration, grid, SAS Administrators

The post Can I run SAS Grid Manager in the AWS cloud? appeared first on SAS Users.

April 22, 2015
 

SAS System software supports a wide variety of architecture and deployment possibilities. It’s wild when you think about it because you can scale the analytic power of SAS from the humblest single CPU laptop machine all the way up to hundreds-of-machines clusters.

When SAS deployments involve many machines, it’s natural to look for time- and effort-saving options that simplify the initial installation as well as ongoing administration. Electing to employ a shared SAS configuration directory is one of those options. But what does that even mean?

Deploying SAS with a shared configuration directory is always optional. It’s not a technical requirement in any sense. But there are times when it’s really nice to have and SAS does support it in the proper circumstances.  Here are some tips on when to take advantage of shared configuration capabilities.

First, you need file-sharing technology

To create a shared configuration directory, we must first set up a way to share a single physical directory with multiple machines. A shared file system is one physical storage location that is

  • visible to (mounted on) multiple host machines
  • accessible to SAS on each machine by the same directory path.

There are many ways to accomplish this. The simplest place to start in UNIX (and Linux) environments is to define a shared filesystem using Network Attached Storage (or NAS) technology. An NAS-mounted filesystem essentially leverages the computer’s built-in networking ability to share one machine’s local disk such that it’s accessible to multiple machines.

This is fine for a proof-of-concept or small development/test deployment, but for a large production environment, chances are you will want to invest in a more robust and scalable technology. A Storage Area Network (or SAN) is a dedicated, resilient and highly available storage solution with faster connectivity than the standard network interfaces leveraged by NAS. There’s a lot more to shared filesystems than just NAS and SAN, but that’s a topic well covered elsewhere. Visit the SAS Support web site for Scalability and Performance Papers to view the SAS Technical Paper: A Survey of Shared File Systems.

Identify which SAS configuration directory to share

Next, we need to identify which SAS configuration directory to share. And that’s going to depend on your SAS server topology. Let’s begin with the standard SAS Enterprise Business Intelligence platform, which is a common building block for most SAS deployments. Here we’ve got three major service tiers:

  • Metadata
  • Compute (Workspace, Stored Process, OLAP, etc.)
  • Middle (Web)

For performance, efficiency, and availability purposes, we’ve elected to place each of those service tiers into their own set of host machines. That is, we’re going to physically separate those logical tiers by their function:

Diagram showing SAS deployment defined as compute tier, metadata tier and middle tier.

The graphic below shows the necessary deployment steps described by the Planning Application when we choose the topology above from the SAS Deployment Wizard (or SDW):

Output from Planning Application

The takeaway here: separating the tiers in this way means that each tier will have its own configuration directory.  If you choose a multiple machine topology, then on each tier, you must:

  • run the SDW
  • select a configuration directory that is not shared with any other tier

Avoid this wrong turn!

It’s important to heed this advice:  when you’ve chosen a plan with separated tiers, then you must not allow those distinct tiers to write to the same configuration directory.

The SDW warns you if you try to do it:

Warning from SDW that configuration already exists.

But if you ignore the warning, the SDW will successfully deploy the software for the first as well as the subsequent tiers. SAS services will successfully start up and validate. Everything will appear to work, except for one major problem: the SAS Deployment Registry is overwritten with each new configuration deployment.

That means that in the future, installers for migration, hotfixes and maintenance updates will not be able to see all of the details of the full deployment – only the information for that last SDW configuration is retained. When that day comes, it will create a major headache for support purposes.

Configuring the Compute Tier on a shared directory—an example

Notice that up to this point, we’ve been talking about how the configuration directory must be deployed by tier, not by host machine. Each tier has its own considerations, but the Compute Tier is where we can share the configuration directory across multiple machines.

The Compute Tier can consist of one or more machines.  It’s very scalable both vertically and horizontally. For some deployments, there could be dozens, even hundreds, of machines in the SAS Compute Tier. In those circumstances, we don’t want to deploy a separate configuration for each one if we don’t have to, so let’s zoom in on the Compute Tier. In this diagram, we have seven different host machines of varying sizes – all run the same OS version and the same release of SAS. It will save us a lot of installation, configuration, and administration time if they all share a common configuration directory.

Compute Tier comprising seven machines of different sizes

When we run the configuration portion of the SAS Deployment Wizard for the Compute tier, we provide the shared file system’s directory path (in the diagram above, that’s /compute/config). And we only need to run the SDW configuration one time. After configuration is complete, all of the SAS configuration files you’re familiar with are visible and accessible by all machines of the Compute Tier. So with a single deployment run of the SDW, all of the machines in the Compute Tier have access to the same configuration.  So what are the benefits?

  • From a SAS installer’s perspective, it’s great not having to run the SDW for configuration on each and every host of the Compute Tier.
  • For the SAS administrator who is charged with daily operations and maintenance, a shared configuration means that making a change in one place is available to all intended machines.
  • Further, when it comes time to deploy hot fixes or maintenance updates, the installation tools also need to run only once for this shared configuration directory.

Finishing the configuration

There is some additional follow-through necessary, depending on your SAS release:

  • For SAS 9.4 M1 and earlier releases of SAS, some additional configuration work was required. Certain operational and log files were generically named, and if those filenames were not changed, there would be file-locking conflicts as processes on different host machines attempted to write to the same physical file. The procedure is to modify certain scripts to insert variables into the filename references, which ensures each host machine writes to its own unique files on the shared filesystem.
  • Beginning with SAS 9.4 M2, these manual edits of executable files are no longer required.  Filename references now include the hostname by default so everything plays nicely in a shared configuration environment. Yay!
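The idea behind both bullets, giving each host its own file name inside the shared directory, can be sketched generically in Python. This is not the content of the SAS scripts and not the naming pattern SAS actually uses; the helper `per_host_file_name` and the `_<hostname>` suffix are hypothetical, chosen only to show why host-qualified names avoid the file-locking conflict.

```python
import os
import socket


def per_host_file_name(base_name, hostname=None):
    """Insert the host name before the extension: server.log -> server_<host>.log."""
    # Default to the name of the machine this process runs on
    host = hostname or socket.gethostname()
    stem, ext = os.path.splitext(base_name)
    return f"{stem}_{host}{ext}"


if __name__ == "__main__":
    # Two machines sharing one configuration directory get distinct files
    print(per_host_file_name("server.log", "compute01"))
    print(per_host_file_name("server.log", "compute02"))
```

With a generic name such as `server.log`, every machine in the Compute Tier would open the same physical file; once the host name is part of the name, each machine writes only to its own file.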

For any release of SAS, you must also make manual changes to the SAS metadata. At this point in the process, you have only deployed a single configuration directory, you have not yet informed the overall SAS deployment of how many server machines are participating in the Compute Tier. Follow the steps provided in the SAS® 9.4 Intelligence Platform: Application Server Administration Guide for Creating Metadata for Load-Balancing Clusters.

Configuring the Metadata Tier and Middle Tier

If you’ve decided to deploy a SAS Metadata Server cluster to ensure high-availability of your metadata services, then you must deploy at least three installations of the SAS Metadata Server. Each of those installations will have its own dedicated configuration directory – they do not share! The only thing shared between the nodes of a metadata cluster is the common network-mounted directory for metadata backups (not shown here).

Diagram of configuration files for the Metadata Tier

The same holds true if you choose to cluster the SAS Web Application Server. Let’s say you will deploy a horizontal two-node cluster of your SAS Web Application Servers that will be load-balanced by the SAS Web Server. Each node of that web app server cluster will have its own configuration directory – they do not share either!

Diagram of shared configuration for the Middle Tier

The point is, each of those cluster nodes (for meta and middle) requires its own configuration deployment. Now aren’t you glad we can perform just one configuration deployment in the Compute Tier to share the configuration directory for any number of machines participating there!

Takeaways

In this discussion, we have learned:

  • A SAS configuration directory can be shared across multiple machines in the logical Compute tier (as we have it defined separately from the Metadata and Middle tiers) – saving initial deployment effort as well as ongoing administration and maintenance effort
  • Clusters of SAS Metadata Servers should not share a configuration directory
  • Clusters of SAS middle-tier services should not share a configuration directory
  • Do not use the SAS Deployment Wizard to deploy a new configuration on top of another one in the same directory
  • Some shared filesystem technologies are better suited for supporting SAS I/O patterns than others, so choose wisely.  This list of Scalability and Performance Papers can help.

tags: configuration, SAS Administrators, SAS Professional Services

The post Deploying SAS software--save time and effort with shared configuration appeared first on SAS Users.