configuration

March 30, 2018
 

As a follow-on from my previous blog post, where we looked at the different use cases for using Kerberos in SAS Viya 3.3, in this post I want to delve into more detail on configuring Kerberos delegation with SAS Viya 3.3. SAS Viya 3.3 supports the use of Kerberos delegation to authenticate to SAS Logon Manager and then use the delegated credentials to access SAS Cloud Analytic Services. This was the first use case we illustrated in the previous blog post.

As a reminder this is the scenario we are discussing in this blog post:

Kerberos Delegation

In this post we’ll examine:

  • The implications of using Kerberos delegation.
  • The prerequisites.
  • How authentication is processed.
  • How to configure Kerberos delegation.

Why would we want to configure Kerberos delegation for SAS Viya 3.3? Kerberos will provide us with a strong authentication mechanism for the Visual interfaces, SAS Cloud Analytic Services, and Hadoop in SAS Viya 3.3. With Kerberos enabled, no end-user credentials will be sent from the browser to the SAS Viya 3.3 environment. Instead Kerberos relies on a number of encrypted tickets and a trusted third party to provide authentication. Equally, leveraging Kerberos Delegation means that both the SAS Cloud Analytic Services session and the connection to Hadoop will all be running as the end-user. This better allows you to trace operations to a specific end-user and to more thoroughly apply access controls to the end-user.

Implications

Configuring Kerberos delegation will involve configuring Kerberos authentication for both the Visual interfaces and SAS Cloud Analytic Services. First, we’ll look at the implications for the Visual interfaces.

Once we configure Kerberos for authentication to SAS Logon Manager, it replaces the default LDAP provider for end-users. This means that the only way for end-users to authenticate to SAS Logon Manager will be with Kerberos. In SAS Viya 3.3 there is no concept of fallback authentication.

Kerberos will be our only option for end-user authentication and we will be unable to use the sasboot account to access the environment. Configuring Kerberos authentication for SAS Logon Manager will be an all-or-nothing approach.

While the web clients will be using Kerberos for authentication, any client using the OAuth API directly will still use the LDAP provider. This means when we connect to SAS Cloud Analytic Services from SAS Studio (which does not integrate with SAS Logon) we will still be obtaining an OAuth token using the username and password of the user accessing SAS Studio.

If we make any mistakes when we configure Kerberos, or if we have not completed the prerequisites correctly, SAS Logon Manager will not start correctly: the bootstrap process will error and SAS Logon Manager will fail to start. If SAS Logon Manager fails to start, there is no way to gain access to the SAS Viya 3.3 visual interfaces, and the SAS Boot Strap configuration tool must be used to repair or change the configuration settings.

Finally, remember that using Kerberos for SAS Logon Manager does not change the requirement for the identities microservice to connect to an LDAP provider. Since the identities microservice is retrieving information from LDAP about users and groups, we need to ensure the username part of the Kerberos principal for the end-users matches the username returned from LDAP. SAS Logon Manager will strip the realm from the user principal name and use this value in the comparison.

Then considering SAS Cloud Analytic Services, we will be adding Kerberos to the other supported mechanisms for authentication. We will not replace the other mechanisms the way we do for SAS Logon Manager. This means we will not prevent users from connecting with a username and password from the Programming interfaces. As with the configuration of SAS Logon Manager, issues with the configuration can cause SAS Cloud Analytic Services to fail to start. Therefore, it is recommended to complete the configuration of SAS Cloud Analytic Services after the deployment has completed and you are certain things are working correctly.

Prerequisites

To be able to use Kerberos delegation with SAS Viya 3.3 a number of prerequisites need to be completed.

Service Principal Name

First, a Kerberos Service Principal Name (SPN) needs to be registered for both the HTTP service class and the sascas service class. This will take the form <service class>/<HOSTNAME>, where <HOSTNAME> is the value that clients will use to request a Kerberos Service Ticket. In most cases for HTTP the <HOSTNAME> will just be the fully qualified hostname of the machine where the Apache HTTP Server is running. If you are using aliases or alternative DNS registrations, then finding the correct name to use might not be so straightforward. For SAS Cloud Analytic Services, the <HOSTNAME> will be the CAS Controller hostname.

Next, by registering we mean that this Service Principal Name must be provided to the Kerberos Key Distribution Center (KDC). If we are using Microsoft Active Directory, each SPN must be registered against an object in the Active Directory database. Objects that can have an SPN registered against them are users or computers. We recommend using a user object in Active Directory to register each SPN against, and we recommend that different users are used for HTTP and CAS.

So, we have two service accounts in Active Directory and we register the SPN against each service account. There are different ways the SPN can be registered in Active Directory. The administrator could perform these tasks manually using the GUI, with an LDAP script or PowerShell script, with the setspn command, or with the ktpass command. Using these tools, multiple SPNs can be registered against the service account, which is useful if there are different hostnames the end-users might use to access the service. In most cases these tools will only register the SPN; however, the ktpass command will also change the User Principal Name for the service account. More on this shortly.
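
For example, on a domain-joined Windows host an administrator might register and verify the SPNs with setspn as follows. This is a hedged sketch: the hostnames and the svc-sas-http/svc-sas-cas account names are illustrative, not values from the original post.

:: Register each SPN against its service account (-S checks for duplicates first)
setspn -S HTTP/viya.example.com svc-sas-http
setspn -S sascas/cascontroller.example.com svc-sas-cas
:: List the SPNs now registered against an account
setspn -L svc-sas-http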

As an alternative to Microsoft Active Directory, customers could be using a different Kerberos KDC, such as MIT Kerberos or Heimdal Kerberos. For these implementations of Kerberos there is no difference between a user and a service. The database used by these KDCs just stores information on principals and does not distinguish between a User Principal Name and a Service Principal Name.

Trusted for Delegation

For the Kerberos authentication to be delegated from SAS Logon Manager to SAS Cloud Analytic Services, and then from SAS Cloud Analytic Services to Secured Hadoop, the two service accounts that have the SPNs registered against them must be trusted for delegation. Without this, the scenario will not work. You can only specify that an account is trusted for delegation after the Service Principal Name has been registered; the option is not available until you have completed that step. The picture below shows an example of the delegation settings in Active Directory.

If the Secured Hadoop environment is configured using a different Kerberos Key Distribution Center (KDC) from the rest of the environment, that will not prevent the end-to-end scenario from working; however, it will add further complexity. You will need to ensure there is a cross-realm trust configured to the Hadoop KDC for the end-to-end scenario to work.

Kerberos Keytab

Once you have registered each of the SPNs you’ll need to create a Kerberos keytab for each service account. Again, there are multiple tools available to create the Kerberos keytab. We recommend using the ktutil command on Linux, since this is independent of the KDC and makes no changes to the Kerberos database when creating the keytab. Some tools like ktpass will make changes when generating the keytab.
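
With MIT Kerberos on Linux, the keytab can be built interactively with ktutil. A minimal sketch: the principal, key version number, encryption type, and output path below are illustrative and must match what your KDC actually holds:

ktutil
ktutil:  addent -password -p HTTP/viya.example.com@EXAMPLE.COM -k 2 -e aes256-cts-hmac-sha1-96
ktutil:  wkt /etc/sashttp.keytab
ktutil:  quit

Repeat the same steps for the sascas principal, writing to its own keytab file.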

In the Kerberos keytab we need the User Principal Name (UPN) and the associated Kerberos keys for that principal. The Kerberos keys are essentially encrypted versions of the password for the principal. As discussed above for the SPN, the UPN in the Kerberos keytab can take different forms depending on the tool used to register the SPN.

When using ktpass to register the SPN and create the keytab in a single step, the UPN of the account in Active Directory will be set to the same value as the SPN; using the setspn command or performing the task manually leaves the UPN unchanged. Equally, for MIT Kerberos or Heimdal Kerberos, since there is no differentiation between principals, the UPN for the keytab will be the SPN registered with the KDC.

Once the Kerberos keytabs have been created they will need to be made available to any hosts with the corresponding service deployed.

Kerberos Configuration File

Finally, as far as prerequisites are concerned, we might need to provide a Kerberos configuration file for the host where SAS Logon Manager is deployed. This configuration file should identify the default realm and other standard Kerberos settings. The Kerberos implementation in Java should be able to use network queries to find the default realm and Kerberos Key Distribution Center. However, if there are issues with the network discovery, providing a Kerberos configuration file allows us to specify these options.

The Kerberos configuration file should be placed in the standard location for the operating system: on Linux this is /etc/krb5.conf. If we want to use a different location, we can point to it with the java.security.krb5.conf JVM option. Equally, if we cannot create a Kerberos configuration file, we can set the java.security.krb5.realm and java.security.krb5.kdc options to identify the Kerberos Realm and Kerberos Key Distribution Center. We'll show how to set JVM options below.
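
A minimal Kerberos configuration file might look like the following sketch, where the EXAMPLE.COM realm and the kdc.example.com hostname are placeholders for your own values:

[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
    admin_server = kdc.example.com
  }

[domain_realm]
  .example.com = EXAMPLE.COM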

Authentication Process

The process of authenticating an end-user is shown in the figure below:

Where the steps are:

A.  Kerberos used to authenticate to SAS Logon Manager. SAS Logon Manager uses the Kerberos Keytab for HTTP/<HOSTNAME> to validate the Service Ticket. Delegated credentials are stored in the Credentials microservice.
B.  Standard internal OAuth connection to SAS Cloud Analytic Services, where the origin field in the OAuth token includes Kerberos and the claims include the custom group ID "CASHostAccountRequired".
C.  The presence of the additional Kerberos origin causes SAS Cloud Analytic Services to get the CAS client to make a second connection attempt using Kerberos. The Kerberos credentials for the end-user are obtained from the Credentials microservice. SAS Cloud Analytic Services Controller uses the Kerberos Keytab for sascas/<HOSTNAME> to validate the Service Ticket and authenticate the end-user. Delegated credentials are placed in the end-user ticket cache.
D.  SAS Cloud Analytic Services uses the credentials in the end-user ticket cache to authenticate as the end-user to the Secured Hadoop environment.

Configuration

Kerberos authentication must be configured for both SAS Logon Manager and SAS Cloud Analytic Services. Also, any end-user must be added to a new custom group.

SAS Logon Manager Configuration

SAS Logon Manager is configured in SAS Environment Manager.

Note: Before attempting any configuration, ensure at least one valid LDAP user is a member of the SAS Administrators custom group.

The configuration settings are within the Definitions section of SAS Environment Manager, where you set the required properties on the sas.logon.kerberos definition.


JVM options are likewise set as name-and-value pairs in SAS Environment Manager, and SAS Logon Manager will need to be restarted for these new JVM options to be picked up. The same method can be used to set the JVM options for identifying the Kerberos Realm and KDC, where we would add the following:

  • Name = java_option_krb5realm
  • Value = -Djava.security.krb5.realm=<REALM>
  • Name = java_option_krb5kdc
  • Value = -Djava.security.krb5.kdc=<KDC HOSTNAME>

Or for setting the location of the Kerberos configuration file where we would add:

  • Name = java_option_krb5conf
  • Value = -Djava.security.krb5.conf=/etc/krb5.conf

SAS Cloud Analytic Services Configuration

The configuration for SAS Cloud Analytic Services is not performed in SAS Environment Manager and is completed by changing files on the file system. The danger of changing files on the file system is that re-running the deployment Ansible playbook might overwrite any changes you make. The choices you have are to either reapply any changes to the file system, make the changes in both the file system and the playbook files, or make the changes in the playbook files and re-run the playbook to update the file system. Here I will list the changes in both the configuration files and the playbook files.

There is only one required change and then two optional changes. The required change is to define the authentication methods that SAS Cloud Analytic Services will use. In the file casconfig_usermods.lua located in:

/opt/sas/viya/config/etc/cas/default

Add the following line:

cas.provlist = 'oauth.ext.kerb'

Note: Unlike the SAS Logon Manager option above, this is separated with full-stops!

In the same file we can make two optional changes. These optional changes enable you to override default values. The first is the default Service Principal Name that SAS Cloud Analytic Services will use. If you cannot use sascas/<HOSTNAME> you can add the following to the casconfig_usermods.lua:

-- Add Env Variable for SPN
env.CAS_SERVER_PRINCIPAL = 'CAS/HOSTNAME.COMPANY.COM'

This sets an environment variable with the new value of the Service Principal Name. The second optional change is to set another environment variable. This will allow you to put the Kerberos Keytab in any location and call it anything. The default name and location is:

/etc/sascas.keytab

If you want to put the keytab somewhere else or call it something else add the following to the casconfig_usermods.lua

-- Add Env Variable for keytab location
env.KRB5_KTNAME = '/opt/sas/cas.keytab'

These changes can then be reflected in the vars.yml within the playbook by adding the following to the CAS_CONFIGURATION section:

CAS_CONFIGURATION:
   env:
     CAS_SERVER_PRINCIPAL: 'CAS/HOSTNAME.COMPANY.COM'
     KRB5_KTNAME: '/opt/sas/cas.keytab'
   cfg:
     provlist: 'oauth.ext.kerb'

With this in place we can restart the SAS Cloud Analytic Services Controller to pick-up the changes.
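
Before restarting, it can be worth confirming that the keytab is readable and contains the expected principal. A hedged check with the MIT klist tool, using the keytab path configured above:

klist -kte /opt/sas/cas.keytab

The output should list the sascas principal (or your override) with its key version numbers and encryption types.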

Custom Group

If you attempted to test accessing SAS Cloud Analytic Services from the Visual interfaces at this point, you would see that credentials were not being delegated and the CAS session was not running as the end-user. The final step is to create a custom group in SAS Environment Manager. This custom group can be called anything, perhaps "Delegated Users", but the ID for the group must be "CASHostAccountRequired". Without this, the CAS session will not be run as the end-user and delegated Kerberos credentials will not be used to launch the session.

Summary

What we have outlined in this article is the new feature of SAS Viya 3.3 that enables Kerberos delegation throughout the environment. It allows you to have end-user sessions in SAS Cloud Analytic Services that are able to use Kerberos to connect to Secured Hadoop. I hope you found this helpful.

SAS Viya 3.3 Kerberos Delegation was published on SAS Users.

August 2, 2016
 

One of the jobs of SAS Administrators is keeping the SAS license current.  In the past, all you needed to do was update the license for Foundation SAS and you were done. This task can be performed by selecting the Renew SAS Software option in the SAS Deployment Manager.

More recently, many SAS solutions require an additional step that updates the license information in metadata. The license information is stored in metadata so that middle-tier applications can access it in order to check whether the license is valid. Not all solutions require that the SAS Installation Data (SID) file be stored in metadata; however, the list of solutions that do require it is growing and includes SAS Visual Analytics. For a full list you can check this SASNOTE. To update the license information in metadata, run the SAS Deployment Manager and select Update SID File in Metadata.

Recently, I performed a license renewal for a Visual Analytics environment. A couple of days later it occurred to me that I might not have performed the update of the SID file in metadata. That prompted the obvious question: how do I check the status of my license file in metadata?

To check the status of a SAS Foundation license you can use PROC setinit. PROC setinit will return the details of the SAS license in the SAS log.

proc setinit;run;


The output of PROC SETINIT shows the:

  • Expiration Date as 25MAY2017
  • Grace Period ends on 09JUL2017
  • Warning Period ends on 04SEP2017

This indicates that the software expires on 25MAY2017, however nothing will happen during the Grace Period. During the Warning Period messages in the SAS log will warn the user that the software is expiring. When the Warning Period ends on 04SEP2017 the SAS Software will stop functioning. PROC setinit is only checking the status of the Foundation SAS license, not the license in metadata.

If the Foundation license is up to date but the license stored in metadata has expired, the web applications will not work. It turns out SAS Environment Manager also monitors the status of the SAS license. But is it the Foundation license or the license stored in metadata?

To see the status of the license in SAS Environment Manager, select Resources then select Browse > Platforms > SAS 9.4 Application Server Tier. The interface displays:

  • Days Until License Expiration:  the number of days until the license expires.
  • Days Until License Termination: the number of days until the software stops working.
  • Days Until License Termination Warning: the number of days until the Grace period.


Some testing revealed that SAS Environment Manager monitors not the status of the Foundation license but the status of the license in metadata. This is an important point because, as we noted earlier, not all SAS solutions require the SID file to be updated in metadata. Since Environment Manager monitors the license by checking the status of the SID file in metadata, we recommend as a best practice that administrators always update the SID file in metadata.

SAS Environment Manager with Service Architecture configured will also generate events that warn of license termination when the termination date is within a month.

In addition, as of SAS 9.4 M3, SAS Management Console has an option to View metadata setinit details. To access this functionality you must be a member of the SAS Administrators Group or the Management Console: Advanced Role.

To check on a SID file in metadata, open SAS Management Console and, on the Plug-ins tab:

1. Expand Metadata Manager.

2. Select Metadata Utilities.

3. Right-click and select View metadata setinit details.


Selecting the option gives details of the current SID file in metadata, with information similar to what PROC SETINIT displays, including the expiration date, the grace period, and the warning period. In addition, it displays the date the SID file was last updated in metadata.


The takeaway: to fully renew SAS software, and ensure that SAS Environment Manager has the correct date for its metrics on license expiration, always use SAS Deployment Manager to both Update the SAS License, AND Update the SID File in Metadata.

To check if your SAS Deployment license has been fully updated, do the following:

1. Run PROC SETINIT to view the status of the SAS Foundation license.

2. Use SAS Management Console or SAS Environment Manager to check if the SID file has been updated in metadata.

For more information on this topic see the video, “Use SAS Environment Manager to Get SAS License Expiration Notice” and additional resources below:

 

SAS® Deployment Wizard and SAS® Deployment Manager 9.4: User's Guide: Update SID File in Metadata
SAS® Deployment Wizard and SAS® Deployment Manager 9.4: User's Guide: Renew SAS Software
SAS® 9.4 Intelligence Platform: System Administration Guide: Managing Setinit (License) Information in Metadata
SAS® Environment Manager 2.5 User's Guide

tags: configuration, SAS Administrators, SAS architecture, SAS Environment Manager, SAS Professional Services

Two steps to update your SAS License and check if it is updated was published on SAS Users.

August 20, 2015
 

Everyone who codes with SAS knows what the SASWORK directory space is, and everyone who has ever managed a medium-to-large installation knows that you need to monitor this space to avoid a huge buildup of worthless disk usage. One of the most common snarls happens when large SAS jobs go bust for one reason or another, and the work space does not get cleaned up properly. Here's a technique you can use, with the help of SAS Environment Manager, to get a proxy for the amount of disk space being used. It's not perfect, but it's better than being in the dark.

Before illustrating the technique, a little explanation is needed.  In SAS Environment Manager, you will find two types of SAS directories, both at the Server level:

  • SAS Config Level Directory 9.4, referring to the …/Lev1 directory
  • SAS Home Directory, referring to the …/SASHome/SASFoundation directory

The SASWORK directory is an additional Service level resource, underneath the SAS Home Directory, with the full name of:

<machine> SAS Home Directory 9.4 SAS work directory


Further, the SASWORK directory (or, “work directory”) can be located anywhere on the machine–it’s always some place outside the physical hierarchies of SAS Config and SASHome.

Here we are interested in monitoring the work directory. The problem is that the SAS EV agents are only able to scan and collect information about the disk volume where this work directory resides; they cannot get to the level of just the work directory by itself. Therefore the metrics we can observe provide the amount of space being used on the entire disk volume on which the work directory resides. We will use the metric called "Use Percent", the same "Use Percent" metric that's found in the alerts in the Service Architecture Framework.


Despite this limitation, it’s still useful for our purposes to monitor this “work directory” object, so here’s how it’s done:

1.  Confirm the location and the resource for the SAS workspace. On the main interface, logged in as a SAS administrator, select Resource->Browse->Services, then search on the string "work directory". Notice that there are two SAS work directories in this example, one on the compute01 machine and one on the meta01 machine, since this particular installation has two machines with base SAS installed.


Here we select the compute01 machine by clicking on it. The properties indicate the location of the SASWORK directory, which is /tmp.


Note that Use Percent is one of the metrics, and also note the file system location: /tmp on the Linux server. You can confirm this by opening a SAS session and running PROC OPTIONS (for example, proc options option=work; run;), which reports the work directory location.


2. Go to the Dashboard interface, and add a portlet of type "Metric Viewer" to the interface. At the bottom of the right column, in the "Add Content to this column" portlet, choose the Metric Viewer option in the dropdown list and click on the plus "+" sign to add the new portlet.


3. Click on the configuration button located at the top right corner of the new Metric Viewer portlet.


4. Enter the following properties:

  • Description: SASWORK Disk Volume
  • Resource Type: SAS Home Directory 9.4 SAS Directory
  • Metric: Use Percent

Then at the bottom of the screen, select the Add to List button. Move the object called "compute01 SAS Home Directory 9.4 SAS work directory" to the right, using the arrow, and select OK.


5. Select the OK button, and you will see your new portlet with the "Use Percent" metric displayed.


As stated earlier, this metric is imperfect because we are not measuring the SASWORK directory alone but rather the entire disk volume on which it resides, but it's better than nothing. There are two potential solutions to this problem:

1.  It’s considered a best practice on a production site to create a separate disk volume to be used only for SASWORK–in that case the metric gives us the precise measure that we want.  In the case of Windows that would be a separate new drive letter (D:, E:, etc.)

2. It’s possible to use a resource type of “FileServer Directory Tree," point it at the physical SASWORK directory location, and get the total disk space being used, HOWEVER, this will not work unless the userID running the SAS EV agent has read permissions to all the subdirectories of the SASWORK area.  Each SAS user gets their own subdirectory within the SASWORK area, and each user is normally the only one that has directory read permissions to their own work area.  Therefore this solution would only work in a few unique cases, such as where the agent userID has specifically been given read permissions to the entire SASWORK directory.

tags: configuration, SAS Administrators, SAS Environment Manager, SAS Professional Services

Monitor your SASWORK directory from SAS Environment Manager was published on SAS Users.

July 29, 2015
 

SAS 9.4 M3 released in July 2015 with some interesting new features and functionality for platform SAS administrators.  In this blog I will review at a very high level the major new features. For details you can see the SAS 9.4 System Administration guide.

SAS 9.4 M3 includes a new release of SAS Environment Manager (2.5), and some nice new features.

Some highlights include: a federated data mart that enables you to collect metric data from several SAS deployments and view it in one place, improved log collection and discovery, and support for collecting metric data from a SAS grid. For details on these and a few additional enhancements to SAS Environment Manager, see the SAS Environment Manager Users Guide.

The SAS Administration interface available in SAS Environment Manager is now an HTML5 interface and includes new metadata management capabilities including:

  • Server manager module, which enables you to manage server definitions in metadata. For the current release, you can browse any type of server that has been defined in SAS metadata. You can create and edit definitions for SAS LASR Analytic Servers.
  • Library manager module enables you to manage SAS library definitions in metadata. For the current release, you can browse any type of library that has been defined in SAS metadata. You can create and edit definitions for Base SAS libraries and SAS LASR Analytic Server libraries.
  • SAS Backup Manager graphical user interface (more on that later).

For details of the new metadata management features, see the SAS® Environment Manager 2.5 Administration: User's Guide.

In support of Metadata Server clustering, a new feature has been added to the Metadata Analyze and Repair Tools: Metadata Server Cluster Synchronization verifies that metadata is synchronized among all the nodes of a metadata server cluster.

The SAS Deployment Backup and Recovery tool has a number of new features. An exciting one is a new interface available on the Administration tab of SAS Environment Manager. The new interface supports scheduling, configuring, monitoring, and performing integrated backups. The interface incorporates most of the functions of the Deployment Backup and Recovery tool’s batch commands. For details of the using the new user interface see the SAS(R) Environment Manager 2.5 Administration: User’s Guide.

In addition to the new user interface, the tool has some additional enhancements. You can now:

  • include or exclude specific tiers, specific instances of the SAS Web Infrastructure Platform Data Server, or particular databases
  • reorganize metadata repositories during a backup
  • define filters that specify which subdirectories and files are to be included or excluded when backing up sub-directories within the configuration directory
  • specify additional (custom) directories within the configuration directory to be backed up.

Click here for more information on SAS Deployment Backup and Recovery tool.

When promoting content in 9.4 M3 you can use the -disableX11 option to run the batch import or export tool on UNIX without setting the DISPLAY variable. This removes a dependency on X11 for the batch export and import tools.
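
For example, a batch export might now be run on UNIX without a DISPLAY set. This is a hedged sketch: the metadata host, credentials, and package contents below are illustrative, not values from the original post:

./ExportPackage -disableX11 -host meta.example.com -port 8561 \
  -user sasadm@saspw -password '********' \
  -package /tmp/reports.spk -objects "/Shared Data/Reports(Folder)"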

That is a really high level review of new features for platform administrators in SAS 9.4 M3. I hope you found this post helpful.

tags: configuration, SAS Administrators, SAS Environment Manager

Great new functionality for SAS Administrators in 9.4 M3 was published on SAS Users.

April 27, 2015
 

SAS recently performed testing using the Intel Cloud Edition for Lustre* Software - Global Support (HVM) available on AWS marketplace to determine how well a standard workload mix using SAS Grid Manager performs on AWS.  Our testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. You can find the detailed results in the technical paper, SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre.

In addition to the paper, Amazon will be publishing a post on the AWS Big Data Blog that will take a look at the approach to scaling the underlying AWS infrastructure to run SAS Grid Manager to meet the demands of SAS applications with demanding I/O requirements.  We will add the exact URL to the blog as a comment once it is published.

System design overview – network, instance sizes, topology, performance

For our testing, we set up the following AWS infrastructure to support the compute and IO needs for these two components of the system:

  • the SAS workload that was submitted using SAS Grid Manager
  • the underlying Lustre file system required to meet the clustered file system requirement of SAS Grid Manager.

SAS Grid Manager and Lustre shared file configuration on the AWS cloud

The SAS Grid nodes in the cluster are i2.8xlarge instances.  The 8xlarge instance size provides proportionally the best network performance to shared storage of any instance size, assuming minimal EBS traffic.  The i2 instance also provides high performance local storage, which is covered in more detail in the following section.

The use of an 8xlarge size for the Lustre cluster is less impactful since there is significant traffic to both EBS and the file system clients, although an 8xlarge is still preferable. The Lustre file system has a caching strategy, and you will see higher throughput to clients in the case of frequent cache hits, which effectively reduces the network traffic to EBS.

Steps to maximize storage I/O performance

The shared storage for SAS applications needs to be high speed temporary storage.  Typically temporary storage has the most demanding load.  The high I/O instance family, I2, and the recently released dense storage instance, D2, provide high aggregate throughput to ephemeral (local) storage.  For the SAS workload tested, the i2.8xlarge has 6.4 TB of local SSD storage, while the D2 has 48 TB of HDD.

Throughput testing and results

We wanted to achieve a throughput of at least 100 MB/sec/core to temporary storage, and 50-75 MB/sec/core to shared storage. The i2.8xlarge has 16 cores (32 virtual CPUs; each virtual CPU is a hyperthread on a core, and a core has two hyperthreads). Testing done with lower-level testing tools (fio and a SAS tool, iotest.sh) showed a throughput of about 3 GB/sec to ephemeral (temporary) storage and about 1.5 GB/sec to shared storage. The shared storage performance does not take into account file system caching, which Lustre does well.
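
For reference, a comparable large-block sequential write test can be run with fio. A minimal sketch: the directory, file size, and job count are illustrative, not the parameters used in the benchmark:

# 1MB sequential writes across 16 jobs, bypassing the page cache
fio --name=seqwrite --directory=/saswork --rw=write --bs=1M \
    --size=4G --numjobs=16 --direct=1 --group_reporting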

This testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. For full details of the testing configuration and results, please see the SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre technical white paper.

 

tags: cloud computing, configuration, grid, SAS Administrators

The post Can I run SAS Grid Manager in the AWS cloud? appeared first on SAS Users.

April 22, 2015
 

SAS System software supports a wide variety of architecture and deployment possibilities. It's wild when you think about it, because you can scale the analytic power of SAS from the humblest single-CPU laptop all the way up to clusters of hundreds of machines.

When SAS deployments involve many machines, it’s natural to look for time- and effort-saving options that simplify the initial installation as well as ongoing administration. Electing to employ a shared SAS configuration directory is one of those options. But what does that even mean?
Deploying SAS with a shared configuration directory is always optional. It’s not a technical requirement in any sense. But there are times when it’s really nice to have and SAS does support it in the proper circumstances.  Here are some tips on when to take advantage of shared configuration capabilities.

First, you need file-sharing technology

To create a shared configuration directory, we must first set up a way to share a single physical directory with multiple machines. A shared file system is one physical storage location that is

  • visible to (mounted on) multiple host machines
  • accessible to SAS on each machine by the same directory path.

There are many ways to accomplish this. The simplest place to start in UNIX (and Linux) environments is to define a shared filesystem using Network Attached Storage (NAS) technology. A NAS-mounted filesystem essentially leverages the computer's built-in networking ability to share one machine's local disk such that it's accessible to multiple machines.
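
As a minimal sketch of the NAS approach (the server name, export options, and paths are illustrative), the same directory is exported once and mounted at an identical location on every participating host:

# On the file server, in /etc/exports:
/compute/config  *.exnet.xyz.com(rw,sync,no_root_squash)

# On each SAS host:
sudo mount -t nfs nas01.exnet.xyz.com:/compute/config /compute/config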

This is fine for a proof-of-concept or small development/test deployment, but for a large production environment, chances are you will want to invest in a more robust and scalable technology. A Storage Area Network (or SAN) is a dedicated, resilient and highly available storage solution with faster connectivity than the standard network interfaces leveraged by NAS. There’s a lot more to shared filesystems than just NAS and SAN, but that’s a topic well covered elsewhere. Visit the SAS Support web site for Scalability and Performance Papers to view the SAS Technical Paper: A Survey of Shared File Systems.

Identify which SAS configuration directory to share

Next, we need to identify which SAS configuration directory to share. And that’s going to depend on your SAS server topology. Let’s begin with the standard SAS Enterprise Business Intelligence platform, which is a common building block for most SAS deployments. Here we’ve got three major service tiers:

  • Metadata
  • Compute (Workspace, Stored Process, OLAP, etc.)
  • Middle (Web)

For performance, efficiency, and availability purposes, we’ve elected to place each of those service tiers into their own set of host machines. That is, we’re going to physically separate those logical tiers by their function:

Diagram showing SAS deployment defined as compute tier, metadata tier and middle tier.

The graphic below shows the necessary deployment steps described by the Planning Application when we choose the topology above from the SAS Deployment Wizard (or SDW):

Output from the Planning Application

The takeaway here: separating the tiers in this way means that each tier will have its own configuration directory. If you choose a multiple-machine topology, then on each tier, you must:

  • run the SDW
  • select a configuration directory that is not shared with any other tier

Avoid this wrong turn!

It’s important to heed this advice:  when you’ve chosen a plan with separated tiers, then you must not allow those distinct tiers to write to the same configuration directory.

The SDW warns you if you try to do it:

Warning from SDW that configuration already exists.

But if you ignore the warning, the SDW will successfully deploy the software for the first as well as the subsequent tiers. SAS services will successfully startup and validate. Everything will appear to work – except for one major problem: the SAS Deployment Registry is overwritten with each new configuration deployment.

That means that in the future, installers for migration, hotfixes and maintenance updates will not be able to see all of the details of the full deployment – only the information for that last SDW configuration is retained. When that day comes, it will create a major headache for support purposes.

Configuring the Compute Tier on a shared directory—an example

Notice that up to this point, we’ve been talking about how the configuration directory must be deployed by tier, not by host machine. Each tier has its own considerations, but the Compute Tier is where we can share the configuration directory across multiple machines.

The Compute Tier can consist of one or more machines.  It’s very scalable both vertically and horizontally. For some deployments, there could be dozens, even hundreds, of machines in the SAS Compute Tier. In those circumstances, we don’t want to deploy a separate configuration for each one if we don’t have to, so let’s zoom in on the Compute Tier. In this diagram, we have seven different host machines of varying sizes – all run the same OS version and the same release of SAS. It will save us a lot of installation, configuration, and administration time if they all share a common configuration directory.

Compute Tier comprising seven machines of different sizes

When we run the configuration portion of the SAS Deployment Wizard for the Compute Tier, we provide the shared file system's directory path (in the diagram above, that's /compute/config), and we only need to run the SDW configuration one time. After configuration is complete, all of the SAS configuration files you're familiar with are visible and accessible by all machines of the Compute Tier. With a single deployment run of the SDW, every machine in the Compute Tier has access to the same configuration. So what are the benefits?

  • From a SAS installer’s perspective, it’s great not having to run the SDW for configuration on each and every host of the Compute Tier.
  • For the SAS administrator who is charged with daily operations and maintenance, a shared configuration means that making a change in one place is available to all intended machines.
  • Further, when it comes time to deploy hot fixes or maintenance updates, the installation tools also need to run only once for this shared configuration directory.

Finishing the configuration

There is some additional follow-through necessary, depending on your SAS release:

  • For SAS 9.4 M1 and earlier releases of SAS, some additional configuration work was required. Certain operational and log files were generically named, and if those filenames were not changed, there would be file-locking conflicts as processes on different host machines attempted to write to the same physical file. The procedure is to modify certain scripts to insert variables into the filename references, which ensures each host machine writes to its own unique files on the shared filesystem.
  • Beginning with SAS 9.4 M2, these manual edits of executable files are no longer required.  Filename references now include the hostname by default so everything plays nicely in a shared configuration environment. Yay!

For any release of SAS, you must also make manual changes to the SAS metadata. At this point in the process, you have only deployed a single configuration directory, you have not yet informed the overall SAS deployment of how many server machines are participating in the Compute Tier. Follow the steps provided in the SAS® 9.4 Intelligence Platform: Application Server Administration Guide for Creating Metadata for Load-Balancing Clusters.

Configuring the Metadata Tier and Middle Tier

If you’ve decided to deploy a SAS Metadata Server cluster to ensure high-availability of your metadata services, then you must deploy at least three installations of the SAS Metadata Server. Each of those installations will have its own dedicated configuration directory – they do not share! The only thing shared between the nodes of a metadata cluster is the common network-mounted directory for metadata backups (not shown here).

Diagram of configuration files for the Metadata Tier

The same holds true if you choose to cluster the SAS Web Application Server. Let’s say you will deploy a horizontal two-node cluster of your SAS Web Application Servers that will be load-balanced by the SAS Web Server. Each node of that web app server cluster will have its own configuration directory – they do not share either!

Diagram of shared configuration for the Middle Tier

The point is, each of those cluster nodes (for meta and middle) requires their own configuration deployment. Now aren’t you glad we can perform just one configuration deployment in the Compute Tier to share the configuration directory for any number of machines participating there!

Takeaways

In this discussion, we have learned:

  • A SAS configuration directory can be shared across multiple machines in the logical Compute tier (as we have it defined separately from the Metadata and Middle tiers) – saving initial deployment effort as well as ongoing administration and maintenance effort
  • Clusters of SAS Metadata Servers should not share a configuration directory
  • Clusters of SAS middle-tier services should not share a configuration directory
  • Do not use the SAS Deployment Wizard to deploy a new configuration on top of another one in the same directory
  • Some shared filesystem technologies are better suited for supporting SAS I/O patterns than others – so choose wisely.  This list of  Scalability and Performance Papers can help.
tags: configuration, SAS Administrators, SAS Professional Services

The post Deploying SAS software--save time and effort with shared configuration appeared first on SAS Users.

October 9, 2014
 

My earlier posts in this series covered the fundamentals of Kerberos authentication and how we can simplify processes by placing SAS and Hadoop in the same realm. For SAS applications to interact with a secure Hadoop environment, we must address the third key practice:

Ensure Kerberos prerequisites are met when installing and configuring SAS applications that interact with Hadoop.

The prerequisites must be met during installation and deployment of SAS software, specifically SAS/ACCESS Interface to Hadoop for SAS 9.4.

1) Make the correct versions of the Hadoop JAR files available to SAS.

If you've installed other SAS/ACCESS products before, you'll find installing SAS/ACCESS Interface to Hadoop is different. For other SAS/ACCESS products, you generally install the RDBMS client application and then make parts of this client available via the LD_LIBRARY_PATH environment variable.

With SAS/ACCESS to Hadoop, the client is essentially a collection of JAR files. When you access Hadoop through SAS/ACCESS to Hadoop, these JAR files are loaded into memory. The SAS Foundation interacts with Java through the jproxy process, which loads the Hadoop JAR files.

You will find the instructions for copying the required Hadoop JAR files and setting the SAS_HADOOP_JAR_PATH environment variable in the SAS® 9.4 Hadoop Configuration Guide for Base SAS® and SAS/ACCESS®.

2) Make the appropriate configuration files available to SAS.

The configuration for the Hadoop client is provided via XML files. The cluster configuration is updated when Kerberos is enabled for the Hadoop cluster, and you must remember to refresh the copies of the cluster configuration files that SAS uses when this happens. The XML files contain properties specific to security, and the files required depend on the version of MapReduce being used in the Hadoop cluster. When Kerberos is enabled, these XML configuration files must contain all the appropriate options for SAS and Hadoop to connect properly.

  • If you are using MapReduce 1, you need the Hadoop core, Hadoop HDFS, and MapReduce configuration files.
  • If you are using MapReduce 2, you need the Hadoop core, Hadoop HDFS, MapReduce 2, and YARN configuration files.

The files are placed in a directory available to SAS Foundation and this location is set via the SAS_HADOOP_CONFIG_PATH environment variable.  The SAS® 9.4 Hadoop Configuration Guide for Base SAS® and SAS/ACCESS® describes how to make the cluster configuration files available to SAS Foundation.
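
Putting the first two prerequisites together, the SAS session environment ends up with settings along these lines. A hedged sketch; the directories are illustrative, and many sites set these in the SAS configuration files rather than per shell:

export SAS_HADOOP_JAR_PATH=/opt/sas/hadoop/jars
export SAS_HADOOP_CONFIG_PATH=/opt/sas/hadoop/conf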

3) Make the user’s Kerberos credentials available to SAS.

The SAS process will need to have access to the user’s Kerberos credentials for it to make a successful connection to the Hadoop cluster.  There are two different ways this can be achieved, but essentially SAS requires access to the user’s Kerberos Ticket-Granting-Ticket (TGT) via the Kerberos Ticket Cache.

Enable users to enter a kinit command interactively from the SAS server. My previous post Understanding Hadoop security described the steps required for a Hadoop user to access a Hadoop client:

  • launch a remote connection to a server
  • run a kinit command
  • then run the Hadoop client.

The same steps apply when you are accessing the client through SAS/ACCESS to Hadoop. You can make a remote SSH connection to the server where SAS is installed. Once logged into the system, you run the command kinit, which initiates your Kerberos credentials and prompts for your Kerberos password. This step obtains your TGT and places it in the Kerberos Ticket Cache. Once completed, you can start a SAS session and run SAS code containing SAS/ACCESS to Hadoop statements. This method provides access to the secure Hadoop environment, and SAS will interact with Kerberos to provide the strong authentication of the user.
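
In practice, the interactive sequence looks something like this hedged sketch, where the server name and principal are illustrative:

ssh sasserver.example.com
kinit user1@EXAMPLE.COM    # prompts for the Kerberos password and obtains the TGT
klist                      # confirm the ticket cache now contains the TGT

With a TGT in the cache, a SAS session started from this shell can run SAS/ACCESS to Hadoop statements against the secure cluster.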

However, in reality, how many SAS users run their SAS code by first making a remote SSH connection to the server where SAS is installed? Clearly, the SAS clients such as SAS Enterprise Guide or the new SAS Studio do not function in this way: these are proper client-server applications. SAS software does not directly interact with Kerberos. Instead, SAS relies on the underlying operating system and APIs to make those connections. If you’re running a client-server application, the interactive shell environment isn’t available, and users cannot run the kinit command. SAS clients need the operating system to perform the kinit step for users automatically. This requirement means that the operating system itself must be integrated with Kerberos, providing the user’s Kerberos password to obtain a Kerberos-Ticket-Granting Ticket (TGT).

Integrate the operating system of the SAS server into the Kerberos realm for Hadoop. Integrating the operating system with Kerberos does not necessarily mean that the user accounts are stored in a directory server. You can configure Kerberos for authentication with local accounts. However, the user accounts must exist with all the same settings (UID, GID, etc.) on all of the hosts in the environment. This requirement includes the SAS server and the hosts used in the Hadoop environment.

Managing all these local user accounts across multiple machines will be considerable management overhead for the environment. As such, it makes sense to use a directory server such as LDAP to store the user details in one place. Then the operating system can be configured to use Kerberos for authentication and LDAP for user properties.

If SAS is running on Linux, you’d expect to use a PAM (Pluggable Authentication Module) configuration to perform this step, and the PAM should be configured to use Kerberos for authentication. This results in a TGT being generated as a user’s session is initialized.

The server where SAS code will be run must also be configured to use PAM, either through the SAS Deployment Wizard during the initial deployment or manually after the deployment is complete. Both methods update the sasauth.conf file in the <SAS_HOME>/SASFoundation/9.4/utilities/bin directory and set the value of methods to "pam".

This step is not sufficient for SAS to use PAM.  You must also make entries in the PAM configuration that describe what authentication services are used when sasauth performs an authentication.  Specifically, the “account” and “auth” module types are required.  The PAM configuration of the host is locked down to the root user, and you will need the support of your IT organization to complete this step. More details are found in the Configuration Guide for SAS 9.4 Foundation for UNIX Environments.
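
As a hedged sketch only (module names and stacking vary by distribution and site policy, and the real file must be maintained with your IT organization), a sasauth PAM service file on a Red Hat-style system that already authenticates users through Kerberos might simply include the system stack:

# /etc/pam.d/sasauth
auth     include  password-auth
account  include  password-auth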

With this configuration in place, a Kerberos Ticket-Granting-Ticket should be generated as the user’s session is started by the SAS Object Spawner. The TGT will be automatically available for the client-server applications. On most Linux systems, this Kerberos TGT will be placed in the user’s Kerberos Ticket Cache, which is a file located, by default, in /tmp. The ticket cache normally has a name /tmp/krb5cc_<uid>_<rand>, where the last section of the filename is a set of random characters allowing for a user to log in multiple times and have separate Kerberos Ticket Caches.

Given that SAS does not know in advance what the full filename will be, the PAM configuration should define an environment variable KRB5CCNAME which points to the correct Kerberos Ticket Cache.  SAS and other processes use the environment variable to access the Kerberos Ticket Cache. Running the following code in a SAS session will print in the SAS log the value of the KRB5CCNAME environment variable:

%let krb5env=%sysget(KRB5CCNAME);
%put &KRB5ENV;

Which should put something like the following in the SAS log:

43         %let krb5env=%sysget(KRB5CCNAME);
44         %put &KRB5ENV;
FILE:/tmp/krb5cc_100001_ELca0y

Now that the Kerberos Ticket-Granting-Ticket is available to the SAS session running on the server, the end user is able to submit code using SAS/ACCESS to Hadoop statements that access a secure Hadoop environment.

In my next blog in the series, we will look at what happens when we connect to a secure Hadoop environment from a distributed High Performance Analytics Environment.


tags: authentication, configuration, Hadoop, Kerberos, SAS Administrators, security
October 8, 2014
 

When SAS is used for analysis on large volumes of data (in the gigabytes), SAS reads and writes the data using large-block sequential IO. To gain the optimal performance from the hardware when doing these IOs, we strongly suggest that you review the information below to ensure that the infrastructure (CPUs, memory, IO subsystem) is configured as optimally as possible.

Operating-system tuning. Tuning guidelines for working with SAS on various operating systems can be found in SAS Usage Note 53873.

CPU. SAS recommends the use of current generation processors whenever possible for all systems.

Memory. For each tier of the environment, SAS recommends the following minimum memory guidelines:

  • SAS Compute tier: A minimum of 8GB of RAM per core
  • SAS Middle tier: A minimum of 24GB, or 8GB of RAM per core, whichever is larger
  • SAS Metadata tier:  A minimum of 8GB of RAM per core

It is also important to understand the amount of virtual memory that is required in the system. SAS recommends that virtual memory be 1.5 to 2 times the amount of physical RAM. If, in monitoring your system, it is evident that the machine is paging a lot, then SAS recommends either adding more memory or moving the paging file to a drive with a more robust I/O throughput rate compared to the default drive. In some cases, both of these steps may be necessary.

IO configuration. Configuring the IO subsystem (disks within the storage, adaptors coming out of the storage, interconnect between the storage and processors, input into the processors) to be able to deliver the IO throughput recommended by SAS will keep the processor busy, allow the workloads to execute without delays and make the SAS users happy.  Here are the recommended IO throughput for the typical file systems required by the SAS Compute tier:

  • Overall IO throughput needs to be a minimum of 100-125 MB/sec/core.
  • For SAS WORK, a minimum of 100 MB/sec/core
  • For permanent SAS data files, a minimum of 50-75 MB/sec/core

For more information regarding how SAS does IO, please review the Best Practices for Configuring your IO Subsystem for SAS® 9 Applications (Revised May 2014) paper.

IO throughput. Additionally, it is a good idea to establish baseline IO capabilities before end-users begin placing demands on the system, as well as to support monitoring the IO if end-users begin suggesting changes in performance. To test the IO throughput, platform-specific test scripts such as iotest.sh are available from SAS.
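
For a rough spot check of sequential write throughput, a dd run with a large block size approximates the SAS IO pattern. A minimal sketch; the path and size are illustrative, and this is no substitute for the full test scripts:

# 4 GiB of 1MB sequential writes, bypassing the page cache
dd if=/dev/zero of=/saswork/ddtest.dat bs=1M count=4096 oflag=direct
rm /saswork/ddtest.dat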

File system. The Best Practices for Configuring IO paper above lists the preferred local file systems for SAS (i.e., JFS2 for AIX, XFS for RHEL, NTFS for Windows). Specific tuning for these file systems can be found in the above operating-system tuning papers.

For SAS Grid Computing implementations, a clustered file system is required.  SAS has tested SAS Grid Manager with many file systems, and the results of that testing along with any available tuning guidelines can be found in the A Survey of Shared File Systems (updated August 2013) paper.  In addition to this overall paper, there are more detailed papers on Red Hat’s GFS2 and IBM’s GPFS clustered file systems on the SAS Usage Note 53875.

Due to the nature of SAS WORK (the temporary file system for SAS applications), which does large sequential reads and writes and then destroys these files at the termination of the SAS session, SAS does not recommend NFS-mounted file systems. NFS has a history of file-locking issues, and the network can negatively influence the performance of SAS when accessing files across it, especially when doing writes.

Storage array. Storage arrays play an important part in the IO subsystem infrastructure. SAS has several papers on tuning guidelines for various storage arrays; see SAS Usage Note 53874.

Miscellaneous. In addition to the above information, there are general papers available for your review on how to set up the infrastructure to best support SAS.

Finally, SAS recommends regular monitoring of the environment to ensure ample compute resources for SAS.  Additional papers are available that provide guidelines for appropriate monitoring.   These can be found on the SAS Usage Note 53877.

tags: configuration, deployment, performance, SAS Administrators
October 16, 2013
 

"Do I really need a detailed technical architecture before I start my SAS Deployment?"

My team gets asked these questions all the time: Do we really need to spend the time for the above exercise? Why can't we just start the deployment of SAS and fix issues if they come up?

The main reason you need more planning upfront is that some issues will require a complete deinstall, reconfigure, and reinstall to fix. That process takes longer than the initial architecture exercise.

There are several very good SAS Global Forum papers that talk about what needs to be accounted for in this exercise. They will help you cover all the bases before you start the installation and deployment of SAS software.

Additionally, hardware configuration is another area where we see SAS customers not planning well enough. Over time, many papers have covered:

  • the best way to configure your hardware
  • the operating system you will be using
  • the file systems (including shared/clustered file systems)
  • the storage array that will be used.  

Resources related to hardware configuration and planning are available in a list of papers useful for troubleshooting system performance problems. Please bookmark this site, as we will be adding more papers and updating the existing ones frequently.

 

tags: configuration, deployment, SAS Administrators
July 3, 2013
 

In my last post, I introduced the hardware solutions (such as a virtual IP switch or IP load balancer) that enable client applications to access services regardless of whether they are running on a primary or a failover server in a grid-enabled environment configured with high availability. In this post, I’ll detail the use of DNS resolution to ensure access to SAS servers.

About DNS resolution

Every client uses DNS resolution to find the IP address from the name of the server where it knows a service is running. In a high-availability scenario, the environment is usually configured to use aliases instead of real server names, such as meta_alias.exnet.xyz.com instead of sgcwin071.exnet.xyz.com or sgcwin072.exnet.xyz.com in the graphic below.

The corporate DNS does not know on which of the two or more possible hosts SAS services are running (we have no hardware load balancer here) so the software solution requires some means of integrating with the corporate DNS to return the correct IP address.

With SAS Grid Manager, it is EGO itself that does this, or more specifically, a component called EGO Service Director, and it can return the correct IP address in a couple of different ways. The key factor determining the appropriate configuration is whether the EGO Service Director can send dynamic updates to the corporate DNS server.

Enabling dynamic corporate DNS updates

Dynamic updates may conflict with your organization's IT policies: a compromised corporate DNS may bring down the whole network, so this option may not be appropriate in many settings. However, if the answer is "yes," EGO Service Director is granted write access to the corporate DNS. The alias for the location of the EGO Service Director (the named process) is kept up to date in the corporate DNS. As soon as EGO starts SAS services on a host, those aliases are written to an EGO DNS database.

What happens when there is a server failure? In the following example, the SAS server sgcwin071 failed. Once EGO starts the application on the failover server, it sends the address of this new host to the corporate DNS server. The entry for the meta_alias is updated in the DNS server, so when SAS Management Console makes a request to connect to the SAS Metadata Server on meta_alias on port 5555, the DNS server returns the address of the failover host sgcwin072.

Enabling DNS resolution with EGO DNS server

A more common option is to configure EGO Service Director as a stand-alone DNS server, which can serve as the authoritative name server for the SAS subdomain and respond to DNS queries for the high-availability SAS services it manages.

The virtual hostnames for the EGO high-availability services will always be in a subdomain of the corporate DNS domain. For instance, if the corporate domain were exnet.xyz.com, then all virtual hostnames for EGO high-availability services would be in the subdomain ego.exnet.xyz.com by default. It is important that the corporate DNS server be configured with multiple Name Server records for the EGO subdomain, one for each of the redundant nodes that can possibly execute the EGO DNS server.

These are the steps that SAS Management Console follows in order to connect to the SAS Metadata Server in this scenario:

  1. SAS Management Console makes a request to connect to the SAS Metadata Server on meta_alias.ego.exnet.xyz.com on port 5555.
  2. The corporate DNS finds in its internal table that all queries for addresses in the form *.ego.exnet.xyz.com are to be rerouted to another DNS running on IP1 or IP2 or IP3
  3. The EGO DNS receives the query and responds to SAS Management Console with the physical IP address for the meta_alias name, which is bound to the physical server where the SAS Metadata Server is running.
  4. The connection request is properly routed to the sgcwin071.exnet.xyz.com host, where the SAS Metadata Server is running.
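
You can verify the delegation and the resolution from any client with dig, as in this hedged sketch using the example names above:

dig NS ego.exnet.xyz.com             # should list the redundant EGO DNS nodes
dig meta_alias.ego.exnet.xyz.com     # should return the IP of the active host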

If SAS server sgcwin071 fails, EGO starts the managed service on the failover server, then it updates its internal DNS table with the address of the failover machine for the meta_alias name. When a new instance of SAS Management Console makes a request to connect to the SAS Metadata Server on meta_alias on port 5555, EGO DNS server returns the address of the failover machine, sgcwin072, as shown below.

For the software solution, the choice between direct use of the corporate DNS and implementing an EGO DNS server is usually determined by IT governance policies. Additionally, both types of software solutions outlined above have drawbacks. By default, Windows clients cache DNS entries for several minutes, so they will not get the new IP address until the cache expires; for all that time, they will not be able to connect to the failover host. To prevent this issue, SAS administrators must disable the DNS cache for all Windows clients, which generates extra DNS traffic for all look-ups.

Comparison of hardware and software solutions

The following table shows a comparison of the hardware and software solutions:

Conclusion
The method used to resolve the virtual hostname to the SAS service's current physical location is completely hidden from the client; to the client, it is exactly like connecting to any other host. The two solutions differ only in how virtual hostnames are resolved. With either solution, hardware or software, the fundamental concepts are the same:

  • Define a virtual hostname for the services that are to be high-availability within the grid.
  • Any client wishing to access a high-availability grid service must use the virtual hostname.
  • The virtual hostname is resolved to the current physical location of the service within the grid.

You can find more detailed configuration information regarding EGO Service Director and DNS integration in the High Availability Services with SAS Grid Manager.

tags: configuration, grid, SAS Administrators, servers