configuration

March 22, 2019
 

As data volumes continue to surge (with no signs of slowing down), the cost to handle all that data can have a very real impact on IT budgets.

But for all the talk of the growing value of data, the cost side of the equation is often overlooked. After all, it not only takes a lot of drives to hold all that data, it requires significant computing resources to handle it. The costs quickly add up, forcing IT leaders to make tough choices about where to make their IT investments.

This is as true for SAS users as it is for any other IT organization. We work closely with our customers and know that the cost of our analytics software is only part of the budget puzzle – total cost of ownership (TCO) casts a long shadow over their budget decisions. So we were as curious as they were when Intel released its Intel® Optane™ DC Persistent Memory.

What is persistent memory?

Data written to persistent memory remains accessible using memory instructions even after the process that created or last modified it has ended. In this respect, persistent memory is like a hard disk drive (HDD). But the speed at which the data can be accessed from persistent memory is considerably faster than the speed delivered by an HDD.

Of course, as a longtime Intel collaborator, we worked closely together to optimize SAS software for this innovation. But how would it work in real-life applications outside of the labs? Would the Intel Optane DC Persistent Memory - SAS® Viya® combination have a real impact, or would it just represent yet another incremental improvement – nice, but not enough to change the IT investment-performance equation?

Testing confirms performance

There was only one way to find out, of course: Test it.

So that’s what we did. In collaboration with Intel, we ran SAS Viya 3.4 on Intel Optane DC Persistent Memory. More specifically, the testing compared performance of a two-socket second-generation Intel® Xeon® Platinum 8280 processor with DRAM to a two-socket second-generation Intel Xeon Platinum 8280 processor with Intel Optane DC persistent memory. (See editor's note below for details.)

We didn’t hold back, either – we ran multiple instances of a large model containing 400 gigabytes of data, concurrently. All these jobs finished in a matter of minutes. Just as important, the costs of achieving performance in a persistent memory environment with SAS Viya, when compared to DRAM and SAS Viya, make their own compelling case.

Intel Optane persistent memory delivers more than 95 percent of DRAM performance, at a cost per unit of performance that is approximately 18 percent lower. With second-generation Intel Xeon Scalable processors and Intel Optane DC persistent memory in Memory Mode, it’s clear that users can take advantage of systems with larger memory capacity, with little or no performance degradation.

Greater memory capacity = substantial cost savings

But performance may not even be the most interesting part of this story – not when you compare it to the cost benefits of running in this environment. In short, compared with DRAM – equal capacity memory, equal CPU, and so on – using Intel Optane DC Persistent Memory leads to a reduction in costs of 20 percent or more. That’s a tangible, meaningful impact for any IT budget, allowing CIOs to invest more heavily to address other pressing needs. By contrast, system costs increase dramatically with higher DRAM capacity.

Plus, it may be possible in the future to use the memory module in different modes – an opportunity we’re currently testing in non-volatile random-access memory (NVRAM) mode with the goal of allowing applications to take advantage of this benefit.

How does this combination of persistent memory tools and SAS Viya contribute to faster speeds and lower costs? The massive scale of data typically handled by SAS customers can be handled by fewer servers, because each can now have a much greater memory capacity. This equation can translate into a significant reduction in the total cost of ownership.

There are a lot of factors, many of which are unique to your organization, that go into reducing the total cost of ownership – so it goes without saying that your mileage may vary. Regardless, this is a conversation your organization needs to be having right now, if it hasn’t already. We’re happy to answer your questions about running SAS Viya together with Intel Optane DC Persistent Memory.

Want a deeper dive? See the video on Intel Optane DC Persistent Memory.

Editor's Note: Specs for this test included:

  • SAS® Viya® In-memory Analytics: SAS® Viya 3.4 VDMML application.
  • Workload: 3 concurrent logistic regression tasks each running on 400GB datasets.
  • Testing by: Intel and SAS completed on February 15, 2019.
  • Baseline hardware for comparison: 2S Intel® Xeon® Platinum 8280 processor, 2.7GHz, 28 cores, turbo and HT on, BIOS SE5C620.86B.0D.01.0286.011120190816, 1536GB total memory, 24 slots / 64GB / 2666 MT/s DDR4 LRDIMM, 1x 800GB Intel SSD DC S3710 OS drive + 1x 1.5TB Intel Optane SSD DC P4800X NVMe drive for CAS_DISK_CACHE + 1x 1.5TB Intel SSD DC P4610 NVMe drive for application data, CentOS Linux* 7.6, kernel 4.19.8.
  • New hardware tested: 2S Intel® Xeon® Platinum 8280 processor, 2.7GHz, 28 cores, turbo and HT on, BIOS SE5C620.86B.0D.01.0286.011120190816, 1536GB Intel Optane DCPMM configured in Memory Mode (8:1), 12 slots / 128GB / 2666 MT/s, plus 192GB DRAM, 12 slots / 16GB / 2666 MT/s DDR4 LRDIMM, 1x 800GB Intel SSD DC S3710 OS drive + 1x 1.5TB Intel Optane SSD DC P4800X NVMe drive for CAS_DISK_CACHE + 1x 1.5TB Intel SSD DC P4610 NVMe drive for application data, CentOS Linux* 7.6, kernel 4.19.8.

Persistent memory: one way to rein in IT costs was published on SAS Users.

March 7, 2019
 

As of December 2018, any customer with a valid SAS Viya order is able to package and deploy their SAS Viya software in Docker containers. SAS has provided a fully documented and supported project (or “recipe”) for easily building these containers. So how can you start? You can simply stop reading this article and go directly to the GitHub repository and follow the instructions there. Otherwise, in this article, Jeff Owens, a solutions architect at SAS, provides a little color commentary around the process in case it is helpful…

First of all, what is the point of these containers?

Well, at its core, remember that SAS and its massively parallel, in-memory counterpart, Cloud Analytic Services (CAS), are a powerful runtime for data processing and analytics. A runtime is simply an engine responsible for processing and executing a particular type of code (i.e., SAS code). Traditionally, the SAS runtime would live on a centralized server somewhere and users would submit their “jobs” to that SAS runtime (server) in a variety of ways. The SAS server supports a number of different products, tasks, etc. – but for this discussion let’s just focus on the scenario where a job is a “.sas” file, perhaps developed in an IDE such as SAS Enterprise Guide or SAS Studio, and submitted to the SAS runtime engine via the IDE itself, a bash shell, or maybe even SAS’ enterprise-grade scheduler and job management solution – SAS Grid. In these cases, the SAS and CAS servers are on dedicated, always-on physical servers.

The brave new containerized world in which we live provides us a new deployment model: submit the job and create the runtime server at the same time. Plus, only consume the exact resources from the host machine or the Kubernetes cluster that the specific job requires. And when the job finishes, release those resources for others to use. Kubernetes and PaaS clusters are quite likely shared environments, and one of the major themes in the rise of containers is the further abstraction between hardware and software. Some of that may be easier said than done, particularly for customers with very large volumes of jobs to manage, but it is indeed possible today with SAS Viya on Docker/Kubernetes.

Another effective (and more immediate) use of this containerized version of SAS Viya is simply an ad hoc, on-demand, temporary development environment. The container package includes SAS Studio, so one can quickly spin up a full SAS Viya programming sandbox – SAS Studio as well as the SAS and CAS runtimes. Here they can develop and test SAS code, and just as quickly tear the environment down when no longer needed. This is useful for users that: (a) don’t have access to an “always-on” environment for whatever reason, (b) want to try out experimental code that could potentially consume resources from a shared “always-on” SAS environment, and/or (c) have a Kubernetes cluster with many more resources available than their always-on environment and want to try a BIG job.

Yes, it is possible to deploy the entire SAS Viya stack (microservices and all) via Kubernetes but that discussion is for another day. This post focuses strictly on the SAS Viya programming components and running on a single machine Docker host rather than a Kubernetes cluster.

Build the container image

I will begin here with a fresh single-machine RHEL 7.5 server running on OpenStack. But this machine could have been running on any cloud or VM platform, and I could use any (modern enough) flavor of Linux thanks to how Docker works. My machine here has 8 CPUs, 16 GB of RAM, and a 50 GB root volume. Less or more is fine. A couple of notes to help understand how to configure an instance:

  • The final docker container image we will end up with will be ~10GB in size and like all docker images will live in /var/lib/docker/images by default.
    • Yes, that is large for a container. Most of this size is just static bins and libs that support the breadth of the SAS language. Compare to an Anaconda image, which is ~3.6GB.
  • As for RAM, remember any tables loaded to CAS are loaded to memory (and will swap to disk as needed). So, your memory choice should be directly dependent on the data sizes you expect to work with.
  • Similar story for cores – CAS code is multithreaded, so more cores = more parallelization.

The first step is to install Docker.
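
If you need a sketch of that step on a RHEL/CentOS host, installing Docker CE roughly looks like the following; treat the repository URL and package choice as assumptions, since your organization may standardize on a different Docker build:

$ sudo yum install -y yum-utils
$ sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum install -y docker-ce
$ sudo systemctl enable --now docker
$ docker --version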

Following along with sas-container-recipes now, the first thing I should do is mirror the repo for my order. Note, this is not a required step – you could build this container directly from SAS repos if you wanted, but we’ll mirror as a best practice. We could simply mirror and serve it over the local filesystem of our build host, but since I promised color I’ll serve it over the web instead. So, these commands run on a separate RHEL server. If you choose to mirror on your build host, make sure you have the disk space (~30GB should be plenty). You will also need your SAS_Viya_deployment_data.zip file, which you can download from the SAS Customer Support site. Run the following code to execute the setup.

$ wget https://support.sas.com/installation/viya/34/sas-mirror-manager/lax/mirrormgr-linux.tgz
$ tar xf mirrormgr-linux.tgz
$ rm -f mirrormgr-linux.tgz
$ mkdir -p /repos/viyactr
$ mirrormgr mirror --deployment-data SAS_Viya_deployment_data.zip --path /repos/viyactr --platform x64-redhat-linux-6 --latest
$ yum install httpd -y
$ systemctl start httpd
$ systemctl enable httpd
$ ln -s /repos/viyactr /var/www/html/sas_repo
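
A quick, optional sanity check that httpd is serving the mirror (the hostname here is the one used in the build command later; substitute your own):

$ curl -s -o /dev/null -w "%{http_code}\n" http://jo.openstack.sas.com/sas_repo/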

Next, I go ahead and clone the sas-container-recipes repo locally and upload my SAS_Viya_deployment_data.zip file, and I am ready to run the build command. As a bonus, I am also going to use my site’s (SAS’) sssd.conf file so my container will use our corporate Active Directory for authentication. If you do not need or want that integration, you can skip the “vi addons/sssd.conf” line and change the “--addons” option to “addons/auth-demo” so your container seeds with a single “sasdemo:sasdemo” user:password instead.

$ # upload SAS_Viya_deployment_data.zip to this machine somehow
$ git clone https://github.com/sassoftware/sas-container-recipes.git
$ cd sas-container-recipes/
$ vi addons/sssd.conf # <- paste in your site’s sssd.conf file
$ ./build.sh \
--type single \
--zip ~/SAS_Viya_deployment_data.zip \
--mirror-url http://jo.openstack.sas.com/sas_repo \
--addons "addons/auth-sssd"

The build should take about 45 minutes and produce a single container image for you (there might be a few images, but it is just one with a thin layer or two on top). You might want to give this image a new name (docker tag) and push it into your own private registry (docker push). Aside from that, we are ready to run it.
If you are curious, look in the addons directory for the other optional layers you can add to your container. Several tools are available for easily configuring connections to external databases.
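
If you do decide to re-tag the image and push it into a private registry as mentioned above, the commands might look like the following sketch (the registry hostname, repository path, and tag are hypothetical placeholders):

$ docker tag sas-viya-programming:xxxxxx registry.example.com/sas/sas-viya-programming:19w34
$ docker push registry.example.com/sas/sas-viya-programming:19w34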

Run the container

Here is the run command we can use to launch the container. Note the image name I use here is “sas-viya-programming:xxxxxx” – this is the image that has my sssd layer built on top of it.

$ docker run \
--detach \ 
--rm \ 
--env CASENV_CAS_HOST=$(hostname -f) \ 
--env CASENV_CAS_VIRTUAL_PORT=8081 \ 
--publish 5570:5570 \ 
--publish 8081:80 \ 
--name sas-viya-programming \ 
--hostname sas-viya-programming \ 
sas-viya-programming:xxxxxx

Connect to the container

And now, in a web browser, I can go to http://<hostname>:8081/SASStudio (using my Docker host’s name) and I will end up in SAS Studio, where I can sign in with my internal SAS credentials. To stop the container, use the name you gave it: “docker stop sas-viya-programming”. Because we used the “--rm” flag, the container will be removed (completely destroyed) when we stop it.

Note we are explicitly mapping in the HTTP port (8081:80) so we easily know how to get to SAS Studio. If you want to start up another container here on the same host, you will need to use a different port or else you’ll get an address already in use error. Also note we might be interested in connecting directly to this CAS server from something other than SAS Studio (localhost). A remote python client for example. We can use the other port we mapped in (5570:5570) to connect to the CAS server.
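
For example, a second container on the same host could be started like the sketch below – ports 8082 and 5571 are arbitrary unused host ports, and everything else mirrors the command above:

$ docker run \
--detach \
--rm \
--env CASENV_CAS_HOST=$(hostname -f) \
--env CASENV_CAS_VIRTUAL_PORT=8082 \
--publish 5571:5570 \
--publish 8082:80 \
--name sas-viya-programming-2 \
--hostname sas-viya-programming-2 \
sas-viya-programming:xxxxxx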

Persist the data

Running this container with the above command means anything and everything done inside the container (configuration changes, code, data) will not persist if the container stops and a new one started later. Luckily this is a very standard and easy to solve scenario with Docker and Kubernetes. Here are a couple of targets inside the container you might be interested in mounting a volume to:

  • /tmp – this is where CAS_DISK_CACHE is by default, not to mention SASWORK. Those are scratch space used by the runtimes. If you are working with small data and don’t care too much about performance, no need to worry about this. But to optimize your container we would suggest mounting a Docker volume to this location (or, ideally, bind mount a high-performance storage device here). Note that generally Docker prefers us to use Docker volumes in lieu of bind mounts, but that is more for manageability, security, and portability than performance.
  • /data – this directory doesn’t necessarily exist, but when you mount a volume into a container the target location will be created. So, you could call this target whatever you want, assuming it doesn’t exist yet. Bind mounting is tempting here and OK to do, but consider the scenario when another user wants to run your container following instructions you provided them – better to use a Docker volume than force them to create the directory on the host. If you have an NFS location, bind mounting that makes sense.
  • /code – same spiel as with /data. Once you are in the container you can save your work and it will persist in the docker volume from run to run of your container.

Here is what an updated docker run command might look like with these volumes included (the /nfsdata line shows bind-mount syntax, where an existing host path is used instead of a named Docker volume):

$ docker run \
--detach \
--rm \
--env CASENV_CAS_VIRTUAL_HOST=$(hostname -f) \
--env CASENV_CAS_VIRTUAL_PORT=8081 \
--volume mydata:/data \ 
--volume /nfsdata:/nfsdata \
--volume mycode:/code \ 
--volume sastmp:/tmp \ 
--publish 5570:5570 \ 
--publish 8081:80 \ 
--name sas-viya-programming \ 
--hostname sas-viya-programming \ 
sas-viya-programming:xxxxxx
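
Named volumes such as mydata, mycode, and sastmp are created automatically the first time they are referenced. To list them and find the host directory backing each one:

$ docker volume ls
$ docker volume inspect sastmp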

Can I run this on my laptop?

Yes. You would just need to install Docker on your laptop (go to docker.com for that). You can certainly follow the instructions from the top to build and run locally. You can even push this container image out to an internal registry so other users could skip the build and just run.

So far, we have only talked about the “ad hoc” or “sandbox” dev type of use case for this container. A later article may cover how to run in batch mode, or maybe we will move straight to multi-containers and Kubernetes. In the meantime, it is also possible to submit a .sas program as a batch job to this single container we have built.

Give it a try!

Try creating your own image and deploying a container. Feel free to comment on your experience.

More info:

SAS Communities article: Running SAS Analytics in a Docker container
SAS Global Forum paper: Docker Toolkit for Data Scientists – How to Start Doing Data Science in Minutes!
SAS Global Forum Tech Talk video: Deploying and running SAS in Containers

Getting Started with SAS Containers was published on SAS Users.

March 30, 2018
 

As a follow-on from my previous blog post, where we looked at the different use cases for using Kerberos in SAS Viya 3.3, in this post I want to delve into more detail on configuring Kerberos delegation with SAS Viya 3.3. SAS Viya 3.3 supports the use of Kerberos delegation to authenticate to SAS Logon Manager and then use the delegated credentials to access SAS Cloud Analytic Services. This was the first use case we illustrated in the previous blog post.

As a reminder this is the scenario we are discussing in this blog post:

Kerberos Delegation

In this post we’ll examine:

  • The implications of using Kerberos delegation.
  • The prerequisites.
  • How authentication is processed.
  • How to configure Kerberos delegation.

Why would we want to configure Kerberos delegation for SAS Viya 3.3? Kerberos will provide us with a strong authentication mechanism for the Visual interfaces, SAS Cloud Analytic Services, and Hadoop in SAS Viya 3.3. With Kerberos enabled, no end-user credentials will be sent from the browser to the SAS Viya 3.3 environment. Instead, Kerberos relies on a number of encrypted tickets and a trusted third party to provide authentication. Equally, leveraging Kerberos delegation means that both the SAS Cloud Analytic Services session and the connection to Hadoop will be running as the end-user. This better allows you to trace operations to a specific end-user and to more thoroughly apply access controls to the end-user.

Implications

Configuring Kerberos delegation will involve configuring Kerberos authentication for both the Visual interfaces and SAS Cloud Analytic Services. First, we’ll look at the implications for the Visual interfaces.

Once we configure Kerberos for authentication of SAS Logon Manager it replaces the default LDAP provider for end-users. This means that the only way for end-users to authenticate to SAS Logon Manager will be with Kerberos. In SAS Viya 3.3 there is no concept of fallback authentication.

Kerberos will be our only option for end-user authentication and we will be unable to use the sasboot account to access the environment. Configuring Kerberos authentication for SAS Logon Manager will be an all-or-nothing approach.

While the web clients will be using Kerberos for authentication, any client using the OAuth API directly will still use the LDAP provider. This means when we connect to SAS Cloud Analytic Services from SAS Studio (which does not integrate with SAS Logon) we will still be obtaining an OAuth token using the username and password of the user accessing SAS Studio.

If we make any mistakes when we configure Kerberos, or if we have not managed to complete the prerequisites correctly, SAS Logon Manager will not start correctly. The SAS Logon Manager bootstrap process will error and SAS Logon Manager will fail to start. If SAS Logon Manager fails to start, then there is no way to gain access to the SAS Viya 3.3 visual interfaces. In such a case the SAS Boot Strap configuration tool must be used to repair or change the configuration settings.

Finally, remember that using Kerberos for SAS Logon Manager does not change the requirement for the identities microservice to connect to an LDAP provider. Since the identities microservice is retrieving information from LDAP about users and groups, we need to ensure the username part of the Kerberos principal for the end-users matches the username returned from LDAP. SAS Logon Manager will strip the realm from the user principal name and use this value in the comparison.

Then considering SAS Cloud Analytic Services, we will be adding Kerberos to the other supported mechanisms for authentication. We will not replace the other mechanisms the way we do for SAS Logon Manager. This means we will not prevent users from connecting with a username and password from the Programming interfaces. As with the configuration of SAS Logon Manager, issues with the configuration can cause SAS Cloud Analytic Services to fail to start. Therefore, it is recommended to complete the configuration of SAS Cloud Analytic Services after the deployment has completed and you are certain things are working correctly.

Prerequisites

To be able to use Kerberos delegation with SAS Viya 3.3 a number of prerequisites need to be completed.

Service Principal Name

First a Kerberos Service Principal Name (SPN) needs to be registered for both the HTTP service class and the sascas service class. This will take the form <service class>/<HOSTNAME>, where the <HOSTNAME> is the value that will be used by clients to request a Kerberos Service Ticket. In most cases for HTTP the <HOSTNAME> will just be the fully qualified hostname of the machine where the Apache HTTP Server is running. If you are using aliases or alternative DNS registrations, then finding the correct name to use might not be so straightforward. For SAS Cloud Analytic Services, the <HOSTNAME> will be the CAS Controller hostname.

Next, by registering we mean that this Service Principal Name must be provided to the Kerberos Key Distribution Center (KDC). If we are using Microsoft Active Directory, each SPN must be registered against an object in the Active Directory database. Objects that can have an SPN registered against them are users or computers. We recommend using a user object in Active Directory to register each SPN against. We also recommend that different users are used for HTTP and CAS.

So, we have two service accounts in Active Directory and we register the SPN against each service account. There are different ways the SPN can be registered in Active Directory. The administrator could perform these tasks manually using the GUI, using an LDAP script, PowerShell script, using the setspn command, or using the ktpass command. Using these tools multiple SPNs can be registered against the service account, which is useful if there are different hostnames the end-users might use to access the service. In most cases using these tools will only register the SPN; however, using the ktpass command will also change the User Principal Name for the service account. More on this shortly.
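
As an illustration only, registering and then listing the SPNs with the setspn command from a Windows command prompt might look like the following; the service account names (svc-sas-http, svc-sas-cas) and the hostnames are hypothetical, and a domain administrator would normally run these:

C:\> setspn -S HTTP/sasweb.company.com svc-sas-http
C:\> setspn -S sascas/cascontroller.company.com svc-sas-cas
C:\> setspn -L svc-sas-http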

As an alternative to Microsoft Active Directory, customers could be using a different Kerberos KDC, such as MIT Kerberos or Heimdal Kerberos. For these implementations of Kerberos there is no difference between a user and a service. The database used by these KDCs just stores information on principals and does not provide a distinction between a User Principal Name and a Service Principal Name.

Trusted for Delegation

For the Kerberos authentication to be delegated from SAS Logon Manager to SAS Cloud Analytic Services, and then from SAS Cloud Analytic Services to Secured Hadoop, the two service accounts that have the SPNs registered against them must be trusted for delegation. Without this, the scenario will not work. You can only specify that an account is trusted for delegation after the Service Principal Name has been registered; the option is not available until you have completed that step. The picture below shows an example of the delegation settings in Active Directory.

If the Secured Hadoop environment is configured using a different Kerberos Key Distribution Center (KDC) to the rest of the environment it will not prevent the end-to-end scenario working. However, it will add further complexity. You will need to ensure there is a cross-realm trust configured to the Hadoop KDC for the end-to-end scenario to work.

Kerberos Keytab

Once you have registered each of the SPNs you’ll need to create a Kerberos keytab for each service account. Again, there are multiple tools available to create the Kerberos keytab. We recommend using the ktutil command on Linux, since this is independent of the KDC and makes no changes to the Kerberos database when creating the keytab. Some tools like ktpass will make changes when generating the keytab.
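
For example, creating a keytab for the HTTP principal with MIT ktutil might look like the sketch below; the principal, realm, key version number (-k), encryption type, and output path are all assumptions to adapt to your site, and ktutil will prompt for the principal's password after the addent command:

$ ktutil
ktutil:  addent -password -p HTTP/sasweb.company.com@COMPANY.COM -k 2 -e aes256-cts-hmac-sha1-96
ktutil:  wkt /etc/sashttp.keytab
ktutil:  quit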

In the Kerberos keytab we need to have the User Principal Name (UPN) and associated Kerberos keys for that principal. The Kerberos keys are essentially encrypted versions of the password for the principal. As we have discussed above, about the SPN, depending on the tools used to register it the UPN for the Kerberos keytab could take different forms.

When using ktpass to register the SPN and create the keytab in a single step, the UPN of the account in Active Directory will be set to the same value as the SPN, whilst using the setspn command or performing the task manually will leave the UPN unchanged. Equally, for MIT Kerberos or Heimdal Kerberos, since there is no differentiation between principals, the UPN for the keytab will be the SPN registered with the KDC.

Once the Kerberos keytabs have been created they will need to be made available to any hosts with the corresponding service deployed.

Kerberos Configuration File

Finally, as far as prerequisites are concerned we might need to provide a Kerberos configuration file for the host where SAS Logon Manager is deployed. This configuration should identify the default realm and other standard Kerberos settings. The Kerberos implementation in Java should be able to use network queries to find the default realm and Kerberos Key Distribution Center. However, if there are issues with the network discovery, then providing a Kerberos configuration file will allow us to specify these options.

The Kerberos configuration file should be placed in the standard location for the operating system. So on Linux this would be /etc/krb5.conf. If we want to specify a different location we can also specify a JVM option to point to a different location. This would be the java.security.krb5.conf option. Equally, if we cannot create a Kerberos configuration file we could set the java.security.krb5.realm and java.security.krb5.kdc options to identify the Kerberos Realm and Kerberos Key Distribution Center. We’ll show how to set JVM options below.
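
As a minimal sketch (the realm COMPANY.COM and the KDC hostname kdc.company.com are hypothetical placeholders), the configuration file could be created like this:

$ sudo tee /etc/krb5.conf > /dev/null <<'EOF'
[libdefaults]
    default_realm = COMPANY.COM
    dns_lookup_kdc = false

[realms]
    COMPANY.COM = {
        kdc = kdc.company.com
        admin_server = kdc.company.com
    }
EOF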

Authentication Process

The process of authenticating an end-user is shown in the figure below:

Where the steps are:

A.  Kerberos used to authenticate to SAS Logon Manager. SAS Logon Manager uses the Kerberos Keytab for HTTP/<HOSTNAME> to validate the Service Ticket. Delegated credentials are stored in the Credentials microservice.
B.  Standard internal OAuth connection to SAS Cloud Analytic Services, where the origin field in the OAuth token includes Kerberos and the claims include the custom group ID “CASHostAccountRequired”.
C.  The presence of the additional Kerberos origin causes SAS Cloud Analytic Services to get the CAS client to make a second connection attempt using Kerberos. The Kerberos credentials for the end-user are obtained from the Credentials microservice. SAS Cloud Analytic Services Controller uses the Kerberos Keytab for sascas/<HOSTNAME> to validate the Service Ticket and authenticate the end-user. Delegated credentials are placed in the end-user ticket cache.
D.  SAS Cloud Analytic Services uses the credentials in the end-user ticket cache to authenticate as the end-user to the Secured Hadoop environment.

Configuration

Kerberos authentication must be configured for both SAS Logon Manager and SAS Cloud Analytic Services. Also, any end-user must be added to a new custom group.

SAS Logon Manager Configuration

SAS Logon Manager is configured in SAS Environment Manager.

Note: Before attempting any configuration, ensure at least one valid LDAP user is a member of the SAS Administrators custom group.

The configuration settings are within the Definitions section of SAS Environment Manager. For the sas.logon.kerberos definition you need to set the following properties:

For more information, see the SAS Viya 3.3 administration documentation.

SAS Logon Manager will need to be restarted for these new JVM options to be picked up. The same method can be used to set the JVM options for identifying the Kerberos Realm and KDC where we would add the following:

  • Name = java_option_krb5realm
  • Value = -Djava.security.krb5.realm=<REALM>
  • Name = java_option_krb5kdc
  • Value = -Djava.security.krb5.kdc=<KDC HOSTNAME>

Or for setting the location of the Kerberos configuration file where we would add:

  • Name = java_option_krb5conf
  • Value = -Djava.security.krb5.conf=/etc/krb5.conf
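
On a typical Linux host, the restart mentioned above is done through the SAS Viya service scripts; the service name below is the usual default, but verify it against your own deployment (for example with systemctl list-units 'sas-viya-*') before relying on it:

$ sudo systemctl restart sas-viya-saslogon-default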

SAS Cloud Analytic Services Configuration

The configuration for SAS Cloud Analytic Services is not performed in SAS Environment Manager; it is completed by changing files on the file system. The danger of changing files on the file system is that re-running the deployment Ansible playbook might overwrite any changes you make. Your choices are to either remake any changes to the file system, make the changes to both the file system and the playbook files, or make the changes in the playbook files and re-run the playbook to change the file system. Here I will list the changes in both the configuration files and the playbook files.

There is only one required change and then two optional changes. The required change is to define the authentication methods that SAS Cloud Analytic Services will use. In the file casconfig_usermods.lua, located in:

/opt/sas/viya/config/etc/cas/default

Add the following line:

cas.provlist = 'oauth.ext.kerb'

Note: Unlike the SAS Logon Manager option above, this is separated with full-stops!

In the same file we can make two optional changes. These optional changes enable you to override default values. The first is the default Service Principal Name that SAS Cloud Analytic Services will use. If you cannot use sascas/<HOSTNAME> you can add the following to the casconfig_usermods.lua:

-- Add Env Variable for SPN
env.CAS_SERVER_PRINCIPAL = 'CAS/HOSTNAME.COMPANY.COM'

This sets an environment variable with the new value of the Service Principal Name. The second optional change is to set another environment variable. This will allow you to put the Kerberos Keytab in any location and call it anything. The default name and location is:

/etc/sascas.keytab

If you want to put the keytab somewhere else or call it something else add the following to the casconfig_usermods.lua

-- Add Env Variable for keytab location
env.KRB5_KTNAME = '/opt/sas/cas.keytab'

These changes can then be reflected in the vars.yml within the playbook by adding the following to the CAS_CONFIGURATION section:

CAS_CONFIGURATION:
   env:
     CAS_SERVER_PRINCIPAL: 'CAS/HOSTNAME.COMPANY.COM'
     KRB5_KTNAME: '/opt/sas/cas.keytab'
   cfg:
     provlist: 'oauth.ext.kerb'

With this in place we can restart the SAS Cloud Analytic Services Controller to pick-up the changes.
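
Again as a sketch, on a systemd host that restart usually looks something like the following; confirm the exact service name in your deployment:

$ sudo systemctl restart sas-viya-cascontroller-default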

Custom Group

If you attempted to test accessing SAS Cloud Analytic Services at this point from the Visual interfaces as an end-user you would see that they were not delegating credentials and the CAS session was not running as the end-user. The final step is to create a custom group in SAS Environment Manager. This custom group can be called anything, perhaps “Delegated Users”, but the ID for the group must be “CASHostAccountRequired“. Without this the CAS session will not be run as the end-user and delegated Kerberos credentials will not be used to launch the session.

Summary

What we have outlined in this article is the new feature of SAS Viya 3.3 that enables Kerberos delegation throughout the environment. It allows you to have end-user sessions in SAS Cloud Analytic Services that are able to use Kerberos to connect to Secured Hadoop. I hope you found this helpful.

SAS Viya 3.3 Kerberos Delegation was published on SAS Users.

August 2, 2016
 

One of the jobs of SAS Administrators is keeping the SAS license current.  In the past, all you needed to do was update the license for Foundation SAS and you were done. This task can be performed by selecting the Renew SAS Software option in the SAS Deployment Manager.

More recently, many SAS solutions require an additional step, which updates the license information in metadata. The license information is stored in metadata so that middle-tier applications can access it in order to check whether the license is valid. Not all solutions require that the SAS installation data (SID) file be stored in metadata; however, the list of solutions that do require it is growing and includes SAS Visual Analytics. For a full list you can check this SASNOTE. To update the license information in metadata, run the SAS Deployment Manager and select Update SID File in Metadata.

Recently, I performed a license renewal for a Visual Analytics environment. A couple of days later it occurred to me that I might not have performed the update of the SID file in metadata. That prompted the obvious question: how do I check the status of my license file in metadata?

To check the status of a SAS Foundation license you can use PROC setinit. PROC setinit will return the details of the SAS license in the SAS log.

proc setinit;run;


The above output of PROC setinit shows the:

  • Expiration Date as 25MAY2017
  • Grace Period ends on 09JUL2017
  • Warning Period ends on 04SEP2017

This indicates that the software expires on 25MAY2017, however nothing will happen during the Grace Period. During the Warning Period messages in the SAS log will warn the user that the software is expiring. When the Warning Period ends on 04SEP2017 the SAS Software will stop functioning. PROC setinit is only checking the status of the Foundation SAS license, not the license in metadata.

If the foundation license is up-to-date but the license stored in metadata is expired the web applications will not work. It turns out SAS Environment Manager will also monitor the status of the SAS license. But is it the Foundation license or the license stored in metadata?

To see the status of the license in SAS Environment Manager, select Resources then select Browse > Platforms > SAS 9.4 Application Server Tier. The interface displays:

  • Days Until License Expiration:  the number of days until the license expires.
  • Days Until License Termination: the number of days until the software stops working.
  • Days Until License Termination Warning: the number of days until the Grace period.


Some testing revealed that Environment Manager is monitoring not the status of the foundation license but the status of the license in metadata. This is an important point because, as we noted earlier, not all SAS solutions require the SID to be updated in metadata. Since Environment Manager monitors the license by checking the status of the SID file in metadata, we recommend, as a best practice, that administrators always update the SID file in metadata.

SAS Environment Manager with Service Architecture configured will also generate events that warn of license termination when the license termination date is within a month.

In addition, as of SAS 9.4 M3, SAS Management Console has an option to View metadata setinit details. To access this functionality you must be a member of the SAS Administrators Group or the Management Console: Advanced Role.

To check on a SID file in metadata open SAS Management Console and in the plug-ins tab:

1. Expand Metadata Manager.

2. Select Metadata Utilities.

3. Right-click and select View metadata setinit details.


Selecting the option gives details of the current SID file in metadata, with similar information to what PROC setinit displays, including the expiration date, the grace period, and the warning period. In addition, it displays the date the SID file was last updated in metadata.


The takeaway: to fully renew SAS software, and ensure that SAS Environment Manager has the correct date for its metrics on license expiration, always use SAS Deployment Manager to both Update the SAS License, AND Update the SID File in Metadata.

To check if your SAS Deployment license has been fully updated, do the following:

1.     Run PROC setinit to view the status of the SAS Foundation license.

2.     Use SAS Management Console or SAS Environment Manager to check if the SID file has been updated in metadata.

For more information on this topic see the video, “Use SAS Environment Manager to Get SAS License Expiration Notice” and additional resources below:

 

SAS® Deployment Wizard and SAS® Deployment Manager 9.4: User’s Guide – Update SID File in Metadata
SAS® Deployment Wizard and SAS® Deployment Manager 9.4: User’s Guide – Renew SAS Software
SAS® 9.4 Intelligence Platform: System Administration Guide – Managing Setinit (License) Information in Metadata
SAS® Environment Manager 2.5 User’s Guide

tags: configuration, SAS Administrators, SAS architecture, SAS Environment Manager, SAS Professional Services

Two steps to update your SAS License and check if it is updated was published on SAS Users.

August 20, 2015
 

Everyone who codes with SAS knows what the SASWORK directory space is, and everyone who has ever managed a medium-to-large installation knows that you need to monitor this space to avoid a huge buildup of worthless disk usage. One of the most common snarls happens when large SAS jobs go bust for one reason or another, and the work space does not get cleaned up properly. Here’s a technique you can use, with the help of SAS Environment Manager, to get a proxy for the amount of disk space being used – it’s not perfect, but it’s better than being in the dark.

Before illustrating the technique, a little explanation is needed.  In SAS Environment Manager, you will find two types of SAS directories, both at the Server level:

  • SAS Config Level Directory 9.4, referring to the …/Lev1 directory
  • SAS Home Directory, referring to the …/SASHome/SASFoundation directory

The SASWORK directory is an additional Service level resource, underneath the SAS Home Directory, with the full name of:

<machine> SAS Home Directory 9.4 SAS work directory


Further, the SASWORK directory (or, “work directory”) can be located anywhere on the machine–it’s always some place outside the physical hierarchies of SAS Config and SASHome.

Here we are interested in monitoring the work directory.  The problem is that the SAS EV agents are only able to scan and collect information about the disk volume where this work directory resides; they cannot get to the level of just the work directory by itself.  Therefore the metrics we can observe provide the amount of space being used on the entire disk volume on which the work directory resides.  We will use the metric called “Use Percent”–this is the same “Use Percent” metric that’s found in the alerts in the Service Architecture Framework:


Despite this limitation, it’s still useful for our purposes to monitor this “work directory” object, so here’s how it’s done:

1.  Confirm the location and the resource for the SASWORK directory. On the main interface, logged in as a SAS administrator, select Resource->Browse->Services, then search on the string “work directory”. Notice that there are two SAS work directories in this example, one on the compute01 machine and one on the meta01 machine, since this particular installation has two machines with Base SAS installed:


Here we select the compute01 machine by clicking on it. The properties indicate the location of the SASWORK directory, which is /tmp:


Note that Use Percent is one of the metrics, and also note the file system location:  /tmp on the Linux server.  You can confirm this if you like by opening a SAS session and running a SAS PROC OPTIONS.  The work directory is indicated below:


2. Go to the Dashboard interface, and add a portlet of type “Metric Viewer” to the interface.  On the bottom of the right column, in the “Add Content to this column” portlet, choose the Metric Viewer option in the dropdown list and click on the plus “+” sign to add the new portlet.


3. Click on the configuration button located at the top right corner of the new Metric Viewer portlet:


4. Enter the following properties:

Description: SASWORK Disk Volume

Resource Type: SAS Home Directory 9.4 SAS Directory

Metric: Use Percent

Then at the bottom of the screen, select the Add to List button. Move the object called “compute01 SAS Home Directory 9.4 SAS work directory” to the right, using the arrow, and select OK. You should see this screen:


5. Select the OK button, and you will see your new portlet with the “Use Percent” metric displayed:


As stated earlier, this metric is imperfect because it measures not just the SASWORK space but the entire disk volume on which the work directory resides – still, it’s better than nothing. There are two potential solutions to this problem:

1.  It’s considered a best practice on a production site to create a separate disk volume to be used only for SASWORK–in that case the metric gives us the precise measure that we want.  In the case of Windows that would be a separate new drive letter (D:, E:, etc.)

2. It’s possible to use a resource type of “FileServer Directory Tree," point it at the physical SASWORK directory location, and get the total disk space being used, HOWEVER, this will not work unless the userID running the SAS EV agent has read permissions to all the subdirectories of the SASWORK area.  Each SAS user gets their own subdirectory within the SASWORK area, and each user is normally the only one that has directory read permissions to their own work area.  Therefore this solution would only work in a few unique cases, such as where the agent userID has specifically been given read permissions to the entire SASWORK directory.

tags: configuration, SAS Administrators, SAS Environment Manager, SAS Professional Services

Monitor your SASWORK directory from SAS Environment Manager was published on SAS Users.

July 29, 2015
 

SAS 9.4 M3 released in July 2015 with some interesting new features and functionality for platform SAS administrators. In this blog post I will review the major new features at a very high level. For details you can see the SAS 9.4 System Administration Guide.

SAS 9.4 M3 includes a new release of SAS Environment Manager (2.5), and some nice new features.

Some highlights include: a federated data mart enables you to collect metric data in data marts for several SAS deployments and view the collected metric data in one place; log collection and discovery have been improved; and support has been added for collecting metric data from a SAS grid. For details on these and a few additional enhancements to SAS Environment Manager, see the SAS Environment Manager User’s Guide.

The SAS Administration interface available in SAS Environment Manager is now an HTML5 interface and includes new metadata management capabilities including:

  • Server Manager module, which enables you to manage server definitions in metadata. For the current release, you can browse any type of server that has been defined in SAS metadata. You can create and edit definitions for SAS LASR Analytic Servers.
  • Library Manager module, which enables you to manage SAS library definitions in metadata. For the current release, you can browse any type of library that has been defined in SAS metadata. You can create and edit definitions for Base SAS libraries and SAS LASR Analytic Server libraries.
  • SAS Backup Manager graphical user interface (more on that later).

For details of the new metadata management features, see the SAS® Environment Manager 2.5 Administration: User’s Guide.

In support of metadata server clustering, a new feature has been added to the Metadata Analyze and Repair tools. Metadata Server Cluster Synchronization verifies that metadata is synchronized among all the nodes of a metadata server cluster.

The SAS Deployment Backup and Recovery tool has a number of new features. An exciting one is a new interface available on the Administration tab of SAS Environment Manager. The new interface supports scheduling, configuring, monitoring, and performing integrated backups. The interface incorporates most of the functions of the Deployment Backup and Recovery tool’s batch commands. For details of the using the new user interface see the SAS(R) Environment Manager 2.5 Administration: User’s Guide.

In addition to the new user interface, the tool has some additional enhancements. You can now:

  • include or exclude specific tiers, specific instances of the SAS Web Infrastructure Platform Data Server, or particular databases
  • reorganize metadata repositories during a backup
  • define filters that specify which subdirectories and files are to be included or excluded when backing up sub-directories within the configuration directory
  • specify additional (custom) directories within the configuration directory to be backed up.

Click here for more information on SAS Deployment Backup and Recovery tool.

When promoting content in 9.4 M3 you can use the -disableX11 option to run the batch import or export tool on UNIX without setting the DISPLAY variable. This removes a dependency on X11 for the batch export and import tools.

That is a really high level review of new features for platform administrators in SAS 9.4 M3. I hope you found this post helpful.

tags: configuration, SAS Administrators, SAS Environment Manager

Great new functionality for SAS Administrators in 9.4 M3 was published on SAS Users.

April 27, 2015
 

SAS recently performed testing using the Intel Cloud Edition for Lustre* Software - Global Support (HVM) available on AWS marketplace to determine how well a standard workload mix using SAS Grid Manager performs on AWS.  Our testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. You can find the detailed results in the technical paper, SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre.

In addition to the paper, Amazon will be publishing a post on the AWS Big Data Blog that will take a look at the approach to scaling the underlying AWS infrastructure to run SAS Grid Manager to meet the demands of SAS applications with demanding I/O requirements.  We will add the exact URL to the blog as a comment once it is published.

System design overview – network, instance sizes, topology, performance

For our testing, we set up the following AWS infrastructure to support the compute and I/O needs for these two components of the system:

  • the SAS workload that was submitted using SAS Grid Manager
  • the underlying Lustre file system required to meet the clustered file system requirement of SAS Grid Manager.

SAS Grid Manager and Lustre shared file system configuration on the AWS cloud

The SAS Grid nodes in the cluster are i2.8xlarge instances.  The 8xlarge instance size provides proportionally the best network performance to shared storage of any instance size, assuming minimal EBS traffic.  The i2 instance also provides high performance local storage, which is covered in more detail in the following section.

The use of an 8xlarge size for the Lustre cluster is less impactful since there is significant traffic to both EBS and the file system clients, although an 8xlarge is still preferable. The Lustre file system has a caching strategy, and you will see higher throughput to clients in the case of frequent cache hits, which effectively reduces the network traffic to EBS.

Steps to maximize storage I/O performance

The shared storage for SAS applications needs to be high speed temporary storage.  Typically temporary storage has the most demanding load.  The high I/O instance family, I2, and the recently released dense storage instance, D2, provide high aggregate throughput to ephemeral (local) storage.  For the SAS workload tested, the i2.8xlarge has 6.4 TB of local SSD storage, while the D2 has 48 TB of HDD.

Throughput testing and results

We wanted to achieve a throughput of at least 100 MB/sec/core to temporary storage, and 50-75 MB/sec/core to shared storage. The i2.8xlarge has 16 cores (32 virtual CPUs; each virtual CPU is a hyperthread on a core, and a core has two hyperthreads). Testing done with lower-level testing tools (fio and a SAS tool, iotest.sh) showed a throughput of about 3 GB/sec to ephemeral (temporary) storage and about 1.5 GB/sec to shared storage. The shared storage performance does not take into account file system caching, which Lustre does well.
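
For reference, a generic fio sequential-write test of a work directory location might look like the sketch below. This is only an illustration of the kind of lower-level test used, not the exact workload Intel and SAS ran, and the directory path and job count are placeholders:

$ fio --name=saswork-seqwrite --directory=/saswork --rw=write --bs=1M --size=4G \
      --numjobs=16 --ioengine=libaio --direct=1 --group_reporting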

This testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. For full details of the testing configuration and results, please see the SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre technical white paper.

 

tags: cloud computing, configuration, grid, SAS Administrators

The post Can I run SAS Grid Manager in the AWS cloud? appeared first on SAS Users.

April 22, 2015
 

SAS System software supports a wide variety of architecture and deployment possibilities. It’s wild when you think about it, because you can scale the analytic power of SAS from the humblest single-CPU laptop all the way up to clusters of hundreds of machines.

When SAS deployments involve many machines, it’s natural to look for time- and effort-saving options that simplify the initial installation as well as ongoing administration. Electing to employ a shared SAS configuration directory is one of those options. But what does that even mean?

Deploying SAS with a shared configuration directory is always optional. It’s not a technical requirement in any sense. But there are times when it’s really nice to have and SAS does support it in the proper circumstances. Here are some tips on when to take advantage of shared configuration capabilities.

First, you need file-sharing technology

To create a shared configuration directory, we must first set up a way to share a single physical directory with multiple machines. A shared file system is one physical storage location that is

  • visible to (mounted on) multiple host machines
  • accessible to SAS on each machine by the same directory path.

There are many ways to accomplish this. The simplest place to start in UNIX (and Linux) environments is to define a shared filesystem using Network Attached Storage (or NAS) technology. An NAS-mounted filesystem essentially leverages the computer’s built-in networking ability to share one machine’s local disk such that it’s accessible to multiple machines.

This is fine for a proof-of-concept or small development/test deployment, but for a large production environment, chances are you will want to invest in a more robust and scalable technology. A Storage Area Network (or SAN) is a dedicated, resilient and highly available storage solution with faster connectivity than the standard network interfaces leveraged by NAS. There’s a lot more to shared filesystems than just NAS and SAN, but that’s a topic well covered elsewhere. Visit the SAS Support web site for Scalability and Performance Papers to view the SAS Technical Paper: A Survey of Shared File Systems.

Identify which SAS configuration directory to share

Next, we need to identify which SAS configuration directory to share. And that’s going to depend on your SAS server topology. Let’s begin with the standard SAS Enterprise Business Intelligence platform, which is a common building block for most SAS deployments. Here we’ve got three major service tiers:

  • Metadata
  • Compute (Workspace, Stored Process, OLAP, etc.)
  • Middle (Web)

For performance, efficiency, and availability purposes, we’ve elected to place each of those service tiers into their own set of host machines. That is, we’re going to physically separate those logical tiers by their function:

Diagram showing SAS deployment defined as compute tier, metadata tier and middle tier.

The graphic below shows the necessary deployment steps described by the Planning Application when we choose the topology above from the SAS Deployment Wizard (or SDW):

The takeaway here: separating the tiers in this way means that each tier will have its own configuration directory. If you choose a multiple-machine topology, then on each tier, you must:

  • run the SDW
  • select a configuration directory that is not shared with any other tier

Avoid this wrong turn!

It’s important to heed this advice:  when you’ve chosen a plan with separated tiers, then you must not allow those distinct tiers to write to the same configuration directory.

The SDW warns you if you try to do it:

Warning from SDW that configuration already exists.

But if you ignore the warning, the SDW will successfully deploy the software for the first as well as the subsequent tiers. SAS services will successfully start up and validate. Everything will appear to work – except for one major problem: the SAS Deployment Registry is overwritten with each new configuration deployment.

That means that in the future, installers for migration, hotfixes and maintenance updates will not be able to see all of the details of the full deployment – only the information for that last SDW configuration is retained. When that day comes, it will create a major headache for support purposes.

Configuring the Compute Tier on a shared directory—an example

Notice that up to this point, we’ve been talking about how the configuration directory must be deployed by tier, not by host machine. Each tier has its own considerations, but the Compute Tier is where we can share the configuration directory across multiple machines.

The Compute Tier can consist of one or more machines.  It’s very scalable both vertically and horizontally. For some deployments, there could be dozens, even hundreds, of machines in the SAS Compute Tier. In those circumstances, we don’t want to deploy a separate configuration for each one if we don’t have to, so let’s zoom in on the Compute Tier. In this diagram, we have seven different host machines of varying sizes – all run the same OS version and the same release of SAS. It will save us a lot of installation, configuration, and administration time if they all share a common configuration directory.

Compute Tier comprising seven machines of different sizes

When we run the configuration portion of the SAS Deployment Wizard for the Compute Tier, we provide the shared file system’s directory path (in the diagram above, that’s /compute/config). And we only need to run the SDW configuration one time. After configuration is complete, all of the SAS configuration files you’re familiar with are visible and accessible to every machine in the Compute Tier – a single deployment run of the SDW gives all of those machines access to the same configuration. So what are the benefits?

  • From a SAS installer’s perspective, it’s great not having to run the SDW for configuration on each and every host of the Compute Tier.
  • For the SAS administrator who is charged with daily operations and maintenance, a shared configuration means that making a change in one place is available to all intended machines.
  • Further, when it comes time to deploy hot fixes or maintenance updates, the installation tools also need to run only once for this shared configuration directory.

Finishing the configuration

There is some additional follow-through necessary, depending on your SAS release:

  • For SAS 9.4 M1 and earlier releases of SAS, some additional configuration work was required. Certain operational and log files were generically named, and if those filenames were not changed, there would be file-locking conflicts as processes on different host machines attempted to write to the same physical file. The procedure is to modify certain scripts to insert variables into the filename references, which then ensures each host machine writes to its own unique files on the shared filesystem.
  • Beginning with SAS 9.4 M2, these manual edits of executable files are no longer required. Filename references now include the hostname by default (see the illustration below), so everything plays nicely in a shared configuration environment. Yay!
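As an illustration (the names here are hypothetical, and the exact pattern depends on your release and logging configuration), two hosts sharing /compute/config would each write their own spawner log rather than contending for a single file:

/compute/config/Lev1/ObjectSpawner/Logs/ObjectSpawner_<date>_hosta_<pid>.log
/compute/config/Lev1/ObjectSpawner/Logs/ObjectSpawner_<date>_hostb_<pid>.log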

For any release of SAS, you must also make manual changes to the SAS metadata. At this point in the process, you have only deployed a single configuration directory; you have not yet informed the overall SAS deployment of how many server machines are participating in the Compute Tier. Follow the steps provided in the SAS® 9.4 Intelligence Platform: Application Server Administration Guide for Creating Metadata for Load-Balancing Clusters.

Configuring the Metadata Tier and Middle Tier

If you’ve decided to deploy a SAS Metadata Server cluster to ensure high-availability of your metadata services, then you must deploy at least three installations of the SAS Metadata Server. Each of those installations will have its own dedicated configuration directory – they do not share! The only thing shared between the nodes of a metadata cluster is the common network-mounted directory for metadata backups (not shown here).

Diagram of configuration files for the Metadata Tier

The same holds true if you choose to cluster the SAS Web Application Server. Let’s say you will deploy a horizontal two-node cluster of your SAS Web Application Servers that will be load-balanced by the SAS Web Server. Each node of that web app server cluster will have its own configuration directory – they do not share either!

Diagram of shared configuration for the Middle Tier

The point is, each of those cluster nodes (for the metadata and middle tiers) requires its own configuration deployment. Now aren’t you glad we can perform just one configuration deployment in the Compute Tier to share the configuration directory across any number of machines participating there!

Takeaways

In this discussion, we have learned:

  • A SAS configuration directory can be shared across multiple machines in the logical Compute tier (as we have it defined separately from the Metadata and Middle tiers) – saving initial deployment effort as well as ongoing administration and maintenance effort
  • Clusters of SAS Metadata Servers should not share a configuration directory
  • Clusters of SAS middle-tier services should not share a configuration directory
  • Do not use the SAS Deployment Wizard to deploy a new configuration on top of another one in the same directory
  • Some shared file system technologies are better suited for supporting SAS I/O patterns than others – so choose wisely. This list of Scalability and Performance Papers can help.
tags: configuration, SAS Administrators, SAS Professional Services


10月 092014
 

My previous posts in this series covered the fundamentals of Kerberos authentication and how we can simplify processes by placing SAS and Hadoop in the same realm. For SAS applications to interact with a secure Hadoop environment, we must address the third key practice:

Ensure Kerberos prerequisites are met when installing and configuring SAS applications that interact with Hadoop.

The prerequisites must be met during installation and deployment of SAS software, specifically SAS/ACCESS Interface to Hadoop for SAS 9.4.

1) Make the correct versions of the Hadoop JAR files available to SAS.

If you’ve installed other SAS/ACCESS products before, you’ll find installing SAS/ACCESS Interface to Hadoop is different. For other SAS/ACCESS products, you generally install the RDBMS client application and then make parts of this client available via the LD_LIBRARY_PATH environment variable.

With SAS/ACCESS to Hadoop, the client is essentially a collection of JAR files. When you access Hadoop through SAS/ACCESS to Hadoop, these JAR files are loaded into memory. The SAS Foundation interacts with Java through the jproxy process, which loads the Hadoop JAR files.

You will find the instructions for copying the required Hadoop JAR files and setting the SAS_HADOOP_JAR_PATH environment variable in the SAS® 9.4 Hadoop Configuration Guide for Base SAS® and SAS/ACCESS®.
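As a minimal sketch (the directory shown is hypothetical; substitute the location where you copied the JAR files from your cluster), the variable is typically set as an operating-system environment variable or, as sketched here, with OPTIONS SET= from within a SAS session:

/* Point SAS to the directory holding the Hadoop client JAR files */
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/jars";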

2) Make the appropriate configuration files available to SAS.

The configuration for the Hadoop client is provided via XML files. When Kerberos is enabled for the Hadoop cluster, the cluster configuration changes, so you must remember to refresh the copies of the configuration files that SAS uses. The XML files contain security-specific properties, and the files required depend on the version of MapReduce used in the Hadoop cluster. These XML configuration files must contain all the appropriate options for SAS and Hadoop to connect properly.

  • If you are using MapReduce 1, you need the Hadoop core, Hadoop HDFS, and MapReduce configuration files.
  • If you are using MapReduce 2, you need the Hadoop core, Hadoop HDFS, MapReduce 2, and YARN configuration files.

The files are placed in a directory available to SAS Foundation and this location is set via the SAS_HADOOP_CONFIG_PATH environment variable.  The SAS® 9.4 Hadoop Configuration Guide for Base SAS® and SAS/ACCESS® describes how to make the cluster configuration files available to SAS Foundation.
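A corresponding sketch for the cluster configuration files (again, the directory shown is hypothetical):

/* Point SAS to the directory holding the cluster's XML configuration files */
/* (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and so on) */
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";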

3) Make the user’s Kerberos credentials available to SAS.

The SAS process will need to have access to the user’s Kerberos credentials for it to make a successful connection to the Hadoop cluster.  There are two different ways this can be achieved, but essentially SAS requires access to the user’s Kerberos Ticket-Granting-Ticket (TGT) via the Kerberos Ticket Cache.

Enable users to enter a kinit command interactively from the SAS server. My previous post Understanding Hadoop security described the steps required for a Hadoop user to access a Hadoop client:

  • launch a remote connection to a server
  • run a kinit command
  • then run the Hadoop client.

The same steps apply when you are accessing the client through SAS/ACCESS to Hadoop. You can make a remote SSH connection to the server where SAS is installed. Once logged into the system, you run the command kinit, which initiates your Kerberos credentials and prompts for your Kerberos password. This step obtains your TGT and places it in the Kerberos Ticket Cache. Once completed, you can start a SAS session and run SAS code containing SAS/ACCESS to Hadoop statements. This method provides access to the secure Hadoop environment, and SAS will interact with Kerberos to provide the strong authentication of the user.

However, in reality, how many SAS users run their SAS code by first making a remote SSH connection to the server where SAS is installed? Clearly, SAS clients such as SAS Enterprise Guide or the new SAS Studio do not function in this way: these are proper client-server applications. SAS software does not directly interact with Kerberos. Instead, SAS relies on the underlying operating system and APIs to make those connections. If you’re running a client-server application, the interactive shell environment isn’t available, and users cannot run the kinit command. SAS clients need the operating system to perform the kinit step for users automatically. This requirement means that the operating system itself must be integrated with Kerberos, using the password the user supplies at logon to obtain a Kerberos Ticket-Granting Ticket (TGT).

Integrate the operating system of the SAS server into the Kerberos realm for Hadoop. Integrating the operating system with Kerberos does not necessarily mean that the user accounts are stored in a directory server. You can configure Kerberos for authentication with local accounts. However, the user accounts must exist with all the same settings (UID, GID, etc.) on all of the hosts in the environment. This requirement includes the SAS server and the hosts used in the Hadoop environment.

Managing all these local user accounts across multiple machines creates considerable management overhead for the environment. As such, it makes sense to use a directory server such as LDAP to store the user details in one place. Then the operating system can be configured to use Kerberos for authentication and LDAP for user properties.

If SAS is running on Linux, you’d expect to use a PAM (Pluggable Authentication Module) configuration to perform this step, and the PAM should be configured to use Kerberos for authentication. This results in a TGT being generated as a user’s session is initialized.

The server where SAS code will be run must also be configured to use PAM, either through the SAS Deployment Wizard during the initial deployment or manually after the deployment is complete. Both methods update the sasauth.conf file in the <SAS_HOME>/SASFoundation/9.4/utilities/bin directory and set the value of methods to “pam”.

This step alone is not sufficient for SAS to use PAM. You must also make entries in the PAM configuration that describe which authentication services are used when sasauth performs an authentication. Specifically, the “account” and “auth” module types are required. The PAM configuration of the host is locked down to the root user, so you will need the support of your IT organization to complete this step. More details are found in the Configuration Guide for SAS 9.4 Foundation for UNIX Environments.
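As a rough sketch only (module names and include files vary by Linux distribution and site policy, so treat these lines purely as an illustration and follow the Configuration Guide and your IT organization’s standards), the two pieces involved look something like this:

# <SAS_HOME>/SASFoundation/9.4/utilities/bin/sasauth.conf
methods=pam

# /etc/pam.d/sasauth  (hypothetical entries for a Red Hat-style system)
auth     include   system-auth
account  include   system-auth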

With this configuration in place, a Kerberos Ticket-Granting-Ticket should be generated as the user’s session is started by the SAS Object Spawner. The TGT will be automatically available for the client-server applications. On most Linux systems, this Kerberos TGT will be placed in the user’s Kerberos Ticket Cache, which is a file located, by default, in /tmp. The ticket cache normally has a name /tmp/krb5cc_<uid>_<rand>, where the last section of the filename is a set of random characters allowing for a user to log in multiple times and have separate Kerberos Ticket Caches.

Given that SAS does not know in advance what the full filename will be, the PAM configuration should define an environment variable, KRB5CCNAME, that points to the correct Kerberos Ticket Cache. SAS and other processes use this environment variable to access the Kerberos Ticket Cache. Running the following code in a SAS session will print the value of the KRB5CCNAME environment variable in the SAS log:

%let krb5env=%sysget(KRB5CCNAME);
%put &KRB5ENV;

This should write something like the following to the SAS log:

43         %let krb5env=%sysget(KRB5CCNAME);
44         %put &KRB5ENV;
FILE:/tmp/krb5cc_100001_ELca0y

Now that the Kerberos Ticket-Granting-Ticket is available to the SAS session running on the server, the end user is able to submit code using SAS/ACCESS to Hadoop statements that access a secure Hadoop environment.
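For example, a minimal sketch of such code might look like the following (the server name is hypothetical, and depending on your release and cluster setup additional options such as the Hive service principal may be needed). Note that no user ID or password appears on the LIBNAME statement; authentication is carried by the Kerberos ticket in the user’s ticket cache:

/* Connect to Hive on the secure Hadoop cluster */
libname hdp hadoop server="hadoop01.example.com" port=10000;

/* A simple test: list the tables visible through the library */
proc datasets library=hdp;
quit;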

In my next blog in the series, we will look at what happens when we connect to a secure Hadoop environment from a distributed High Performance Analytics Environment.


tags: authentication, configuration, Hadoop, Kerberos, SAS Administrators, security
10月 082014
 

When SAS is used for analysis on large volumes of data (in the gigabytes), SAS reads and writes the data using large-block sequential IO. To gain optimal performance from the hardware when doing these IOs, we strongly suggest that you review the information below to ensure that the infrastructure components (CPUs, memory, IO subsystem) are all configured as optimally as possible.

Operating-system tuning. Tuning guidelines for working with SAS on various operating systems can be found in SAS Usage Note 53873.

CPU. SAS recommends the use of current generation processors whenever possible for all systems.

Memory. For each tier of the environment, SAS recommends the following minimum memory guidelines:

  • SAS Compute tier: A minimum of 8GB of RAM per core
  • SAS Middle tier: A minimum of 24GB, or 8GB of RAM per core, whichever is larger
  • SAS Metadata tier:  A minimum of 8GB of RAM per core

It is also important to understand the amount of virtual memory that is required in the system. SAS recommends that virtual memory be 1.5 to 2 times the amount of physical RAM. If, in monitoring your system, it is evident that the machine is paging a lot, then SAS recommends either adding more memory or moving the paging file to a drive with a more robust I/O throughput rate compared to the default drive. In some cases, both of these steps may be necessary.
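As a worked example, consider a hypothetical 16-core compute server: 16 cores x 8GB per core gives a minimum of 128GB of physical RAM, which in turn suggests roughly 192GB to 256GB of virtual memory (1.5 to 2 times 128GB).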

IO configuration. Configuring the IO subsystem (disks within the storage, adapters coming out of the storage, the interconnect between the storage and processors, and the input into the processors) to deliver the IO throughput recommended by SAS will keep the processors busy, allow workloads to execute without delays, and make SAS users happy. Here are the recommended IO throughput rates for the typical file systems required by the SAS Compute tier, with a worked example after the list:

  • Overall IO throughput needs to be a minimum of 100-125 MB/sec/core.
  • For SAS WORK, a minimum of 100 MB/sec/core
  • For permanent SAS data files, a minimum of 50-75 MB/sec/core
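To continue the worked example for the same hypothetical 16-core compute server: the overall IO subsystem should sustain roughly 16 x 100-125 MB/sec = 1,600-2,000 MB/sec, the SAS WORK file system about 1,600 MB/sec, and the file systems holding permanent SAS data files about 800-1,200 MB/sec.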

For more information regarding how SAS does IO, please review the Best Practices for Configuring your IO Subsystem for SAS® 9 Applications (Revised May 2014) paper.

IO throughput. Additionally, it is a good idea to establish baseline IO capabilities before end users begin placing demands on the system, and to support monitoring the IO if end users begin reporting changes in performance. To test the IO throughput, platform-specific test scripts are available from SAS.

File system. The Best Practices for Configuring your IO Subsystem paper above lists the preferred local file systems for SAS (for example, JFS2 for AIX, XFS for RHEL, NTFS for Windows). Specific tuning for these file systems can be found in the operating-system tuning papers referenced above.

For SAS Grid Computing implementations, a clustered file system is required. SAS has tested SAS Grid Manager with many file systems, and the results of that testing, along with any available tuning guidelines, can be found in the paper A Survey of Shared File Systems (updated August 2013). In addition to this overall paper, there are more detailed papers on Red Hat’s GFS2 and IBM’s GPFS clustered file systems in SAS Usage Note 53875.

Due to the nature of SAS WORK (the temporary file system for SAS applications), which performs large sequential reads and writes and then destroys the files when the SAS session terminates, SAS does not recommend NFS-mounted file systems for it. NFS has a history of file-locking issues, and the network can negatively influence the performance of SAS when accessing files across it, especially when doing writes.

Storage array. Storage arrays play an important part in the IO subsystem infrastructure. SAS has several papers on tuning guidelines for various storage arrays, available through SAS Usage Note 53874.

Miscellaneous. In addition to the above information, there are some general papers on how to set up the infrastructure to best support SAS; these are available for your review.

Finally, SAS recommends regular monitoring of the environment to ensure ample compute resources for SAS. Additional papers that provide guidelines for appropriate monitoring can be found in SAS Usage Note 53877.

tags: configuration, deployment, performance, SAS Administrators