SAS architecture

December 7, 2017
 

In SAS Viya deployments, identities are managed by the environment’s configured identity provider. In visual SAS Viya deployments, the identity provider must be an LDAP (Lightweight Directory Access Protocol) server. Initial setup of a SAS Viya deployment requires configuration to support reading the identity information (users and groups) from LDAP. SAS Viya 3.3 adds support for multi-tenancy, which has implications for the way users and groups must be stored in LDAP. For the SAS administrator, this means at least a rudimentary understanding of LDAP is required. In this blog post, I will review some key LDAP basics for the SAS Viya administrator.

A basic understanding of LDAP ensures SAS Viya administrators can speak the same language as the LDAP administrator.

What is LDAP?

LDAP is a lightweight protocol for accessing directory servers. A directory server is a hierarchical, object-oriented database. An LDAP server can be used to organize anything. One of the most common uses is as an identity management system, to organize user and group membership.

LDAP directories are organized as a tree:

  • A directory is a tree of directory entries.
  • An entry contains a set of attributes.
  • An attribute has a name, and one or more values.

LDAP basics for the SAS Viya administrator

Below is an entry for a user Henrik. It has common attributes like:

  • uid: user ID
  • cn: common name
  • l: location (locality)
  • displayName: name to display

The attribute value pairs are the details of the entry.

The objectclass attribute is worth a special mention. Every entry has at least one objectclass attribute and often more than one. The objectclass is used to provide the rules for the object including required and allowed attributes. For example, the inetorgperson object class specifies attributes about people who are part of an organization, including items such as uid, name, employeenumber etc.

LDAP Tree

Let’s now look at the organization of the tree. DC is the “domain component.” You will often see examples of LDAP structures that use DNS names for the domain component, such as: dc=sas,dc=com. This is not required, but since DNS itself often implies organizational boundaries, it usually makes sense to use the existing naming structure. In this example the domain component is “dc=geldemo,dc=com”. The next level down is the organizational unit (ou).  An organizational unit is a grouping or collection of entries. Organizational units can contain additional organizational units.

But how do we find the objects we want in the directory tree? Every entry in a directory has a unique identifier, called the Distinguished Name (DN). The distinguished name is the full path to the object in the directory tree. For example, the distinguished name of Henrik is uid=Henrik,ou=users,ou=gelcorp,dc=viyademo,dc=com. The distinguished name is the path to the object from lowest to highest (yes, it seems backward to me too).
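As a rough illustration of reading a DN from lowest to highest, the following Python sketch splits the example DN into its components. The function names are hypothetical, and the parsing is deliberately simplified: it assumes no escaped commas and no multi-valued components, both of which real DNs may contain.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific first.

    Simplified for illustration: assumes no escaped commas.
    """
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.strip().partition("=")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs


def parent_dn(dn):
    """Return the DN of the entry one level up the tree."""
    components = [c.strip() for c in dn.split(",")]
    return ",".join(components[1:])


henrik = "uid=Henrik,ou=users,ou=gelcorp,dc=viyademo,dc=com"

# As written, the DN runs from the entry itself up to the root.
lowest_to_highest = parse_dn(henrik)

# Reversing it gives the path you would walk down from the root.
root_to_entry = list(reversed(lowest_to_highest))

for attr, value in root_to_entry:
    print(f"{attr}={value}")
```

Reading the reversed list top to bottom walks the tree from the domain components, through the organizational units, down to the user entry.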

LDAP Queries and Filters

Like any other database, LDAP can be queried, and it has its own particular syntax for defining queries. LDAP queries are Boolean expressions in the format:

attribute operator value

For example:

uid = sasgnn

 

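Attribute can be any valid LDAP attribute (e.g., name, uid, city), and value is the value that you wish to search for. The usual operators are available, including equal to (=), greater than or equal to (>=), less than or equal to (<=), and approximately equal to (~=).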

Using LDAP filters, you can link two or more Boolean expressions together using the “filter choices” AND (&) and OR (|). Unusually, the LDAP “filter choices” are always placed in front of the expressions. Each search criterion must be put in parentheses, and then the whole term has to be bracketed one more time. Here are some examples of LDAP queries that may make the syntax easier to follow:

  • (sn=Jones): return all entries with a surname equal to Jones.
  • (objectclass=inetorgperson): return all entries that use the inetorgperson object class.
  • (mail=*): return all entries that have the mail attribute.
  • (&(objectclass=inetorgperson)(o=Orion)): return all entries that use the inetorgperson object class and that have the organization attribute equal to Orion (people in the Orion organization).
  • (&(objectclass=groupofnames)(|(o=Orion)(o=Executive))): return all entries that use the groupofnames object class and that have the organization attribute equal to Orion OR the organization attribute equal to Executive (groups in the Orion or Executive organizations).
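To make the prefix notation concrete, here is a minimal Python sketch that composes filter strings like the ones in the list. The helper names (eq, present, and_, or_) are made up for illustration and are not part of any LDAP library. Values are not escaped, so the sketch assumes they contain no special characters such as parentheses or asterisks.

```python
def eq(attribute, value):
    """Equality test, e.g. (sn=Jones)."""
    return f"({attribute}={value})"


def present(attribute):
    """Presence test, e.g. (mail=*)."""
    return f"({attribute}=*)"


def and_(*filters):
    """AND: the & goes in front, and the whole term is bracketed again."""
    return "(&" + "".join(filters) + ")"


def or_(*filters):
    """OR: the | goes in front, and the whole term is bracketed again."""
    return "(|" + "".join(filters) + ")"


people_in_orion = and_(eq("objectclass", "inetorgperson"), eq("o", "Orion"))

groups_in_orion_or_executive = and_(
    eq("objectclass", "groupofnames"),
    or_(eq("o", "Orion"), eq("o", "Executive")),
)

print(people_in_orion)
print(groups_in_orion_or_executive)
```

Because each helper returns a fully bracketed term, the helpers can be nested to any depth without counting parentheses by hand.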

Why do I need to know this?

How will you apply this LDAP knowledge in SAS Viya? To enable SAS Viya to access your identity provider, you must update the SAS Identities service configuration. As an administrator, the most common items to change are:

  • BaseDN: the entry in the tree from which the LDAP server starts its search.
  • ObjectFilter: the filter used to identify and limit the users and groups returned.

There is a separate BaseDN and ObjectFilter for users and for groups.

To return users and groups to SAS Viya from our example LDAP server, we would set:

sas.identities.providers.ldap.group.baseDN=ou=gelcorp,ou=groups,dc=viyademo,dc=com

sas.identities.providers.ldap.users.baseDN=ou=gelcorp,ou=users,dc=viyademo,dc=com

 

This would tell SAS Viya to begin its search for users and groups at those locations in the tree.

The object filter will then determine what entries are returned for users and groups from a search of the LDAP tree starting at the BaseDN. For example:

sas.identities.providers.ldap.group.objectFilter:
(&(objectClass=GroupOfNames)(o=GELCorp LTD))

sas.identities.providers.ldap.users.objectFilter:
(&(objectClass=inetOrgPerson)(o=GELCorp LTD))
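The following Python sketch imitates how a base DN and an object filter work together. It is only a sketch under stated assumptions: the directory is a hand-written in-memory list, every entry in it (user1, user2, svc1, sales, othercorp) is hypothetical, and the filter is reduced to a set of equality conditions that must all hold. It does not talk to a real LDAP server or to SAS Viya.

```python
def normalize_dn(dn):
    """Lower-case a DN and strip spaces so DNs can be compared."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def in_subtree(dn, base_dn):
    """True if dn is the base entry itself or sits below it in the tree."""
    dn, base_dn = normalize_dn(dn), normalize_dn(base_dn)
    return dn == base_dn or dn.endswith("," + base_dn)


def matches_all(entry, conditions):
    """True if every attribute=value condition holds (an AND filter)."""
    attrs = {
        name.lower(): [v.lower() for v in values]
        for name, values in entry["attrs"].items()
    }
    return all(
        value.lower() in attrs.get(name.lower(), [])
        for name, value in conditions.items()
    )


def search(entries, base_dn, conditions):
    """Return the DNs under base_dn that satisfy the conditions."""
    return [
        e["dn"] for e in entries
        if in_subtree(e["dn"], base_dn) and matches_all(e, conditions)
    ]


# Hypothetical directory contents, for illustration only.
ENTRIES = [
    {"dn": "uid=user1,ou=gelcorp,ou=users,dc=viyademo,dc=com",
     "attrs": {"objectClass": ["inetOrgPerson"], "o": ["GELCorp LTD"]}},
    {"dn": "uid=user2,ou=othercorp,ou=users,dc=viyademo,dc=com",
     "attrs": {"objectClass": ["inetOrgPerson"], "o": ["OtherCorp"]}},
    {"dn": "uid=svc1,ou=gelcorp,ou=users,dc=viyademo,dc=com",
     "attrs": {"objectClass": ["account"], "o": ["GELCorp LTD"]}},
    {"dn": "cn=sales,ou=gelcorp,ou=groups,dc=viyademo,dc=com",
     "attrs": {"objectClass": ["groupOfNames"], "o": ["GELCorp LTD"]}},
]

users = search(
    ENTRIES,
    "ou=gelcorp,ou=users,dc=viyademo,dc=com",
    {"objectClass": "inetOrgPerson", "o": "GELCorp LTD"},
)
groups = search(
    ENTRIES,
    "ou=gelcorp,ou=groups,dc=viyademo,dc=com",
    {"objectClass": "GroupOfNames", "o": "GELCorp LTD"},
)
print(users)
print(groups)
```

In this toy directory, user2 is excluded because it sits outside the user base DN, and svc1 is excluded because it is under the base DN but does not match the object filter.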

 

There are a lot of LDAP clients available that will allow you to connect to an LDAP server and view, query, edit, and update LDAP trees and their entries. In addition, LDIF (LDAP Data Interchange Format) is a text file format that includes data and commands that provide a simple way to communicate with a directory so as to read, write, rename, and delete entries.

This has been a high-level overview of LDAP. Here are some additional sources of information that may help.

Basic LDAP concepts

LDAP Query Basics

Quick Introduction to LDAP

How To Use LDIF Files to Make Changes to an OpenLDAP System

LDAP basics for the SAS Viya administrator was published on SAS Users.

August 31, 2016
 

Update-in-place supports the ability to update a SAS deployment within a major SAS release. Updates often provide new versions of SAS products. However, when using the SAS Deployment Wizard to perform an update-in-place, you cannot selectively update a machine or product. As a general rule, if you want to update one product in a SAS deployment, you have to update the whole deployment. With the latest version of SAS Studio, that’s not the case. You can now update from version 3.4 to version 3.5 of SAS Studio without updating any other part of your SAS deployment.

SAS Studio 3.5 contains some interesting new functionality:

  • A new batch submit feature.
  • The ability to create global settings for all SAS Studio users at your site.
  • A new Messages window that displays information about the programs, tasks, queries, and process flows that you run.
  • A new table of contents in results.
  • New keyboard shortcuts to add and insert code snippets.
  • Many new tasks for statistical process control, multivariate analysis, econometric analysis, and power and sample size analysis. For more information, see SAS Studio Tasks.

For my purposes, I was really interested in using the batch submit feature. Using “Batch Submit” a user can run a saved SAS program in batch mode, which means that the program will run in the background while you continue to use SAS Studio. When you run a program in batch mode, you can view the status of programs that have been submitted, and you can cancel programs that are currently running.

So how does this “selective update” work? Somewhat unusually for a product update, it is available via a hot fix documented in SAS Note 57898: Upgrade SAS® Studio 3.4 to SAS® Studio 3.5 without upgrading other products.

SAS Studio is available in three different deployment flavors: SAS Studio Mid-Tier (the enterprise edition), SAS Studio Basic, and SAS Studio Single-User. The hot fix is available for the enterprise and basic editions. In addition, in order to apply the hot fix, the current deployment must be at SAS 9.4 M3. For SAS Studio Single-User, an MSI file has been added to the downloads section of support.sas.com to allow users to download SAS Studio 3.5 to run against their existing Windows desktop SAS for releases 9.4M1 and higher.

The hot fix is a container hot fix, meaning the hot fix delivers one or more “MEMBER” hot fixes in one downloadable unit. Container hot fixes have some special rules you must follow when applying them.

  • They must be applied separately to each machine. The installation process will apply only those MEMBER hot fixes which are applicable based on the SAS Deployment Registry for each specific machine.
  • They may contain MEMBER hot fixes for multiple operating systems. The SAS Deployment Manager will apply only those MEMBER hot fixes which are applicable for the operating system on each specific machine.
  • They often contain pre and/or post installation steps outlined in the instructions provided.

A review of the hot fix instructions shows that to complete the update for the SAS Studio Mid-Tier the web application must be rebuilt and redeployed.

To apply the container hot fix on my three-tier deployment, which has a Windows metadata server, a Linux compute tier, and a Linux middle tier, I downloaded the hot fix to a network-accessible location and followed the process documented in the hot fix instructions. To summarize:

Create a deployment registry report on each machine. The reports showed that:

SAS Studio Basic is installed on the Linux compute tier.

SAS Studio Enterprise is installed on the Linux middle-tier.

Update SAS Studio Basic

Stop all SAS servers in the deployment. Run the SAS Deployment Manager on the Linux compute tier, select Apply Hot Fixes, and then select the directory where the hot fix was downloaded. The wizard updates SAS Studio Basic. A review of the hot fix documentation shows no post-deployment steps are required for SAS Studio Basic.

Update SAS Studio Mid-Tier (Enterprise)

Run the SAS Deployment Manager on the Linux middle tier, select Apply Hot Fixes, and then select the directory where the hot fix was downloaded. The wizard updates SAS Studio Mid-Tier.

A review of the hot fix documentation shows that, to complete the update, the SAS Studio Web Application must be rebuilt and redeployed.

Start the SAS Metadata Server and use the SAS Deployment Manager on the middle tier to rebuild just the SAS Studio Mid-Tier. Start all SAS servers and use the SAS Deployment Manager on the middle-tier machine to redeploy just the SAS Studio Mid-Tier.

When the redeploy is completed, I log on to SAS Studio. Selecting Help > About shows that I now have SAS Studio 3.5.

If I navigate the folder tree and select a SAS program, I can now right-click on the program and select “Batch Submit” to run the program in the background.

If you are excited about the new functionality of SAS Studio 3.5, I think you will agree that the hot fix provides an easy path to update the software.

tags: deployment, SAS Administrators, SAS architecture, SAS Professional Services, sas studio

A quick way to update to SAS Studio 3.5 was published on SAS Users.

August 2, 2016
 

One of the jobs of SAS Administrators is keeping the SAS license current.  In the past, all you needed to do was update the license for Foundation SAS and you were done. This task can be performed by selecting the Renew SAS Software option in the SAS Deployment Manager.

More recently, many SAS solutions require an additional step, which updates the license information in metadata. The license information is stored in metadata so that middle-tier applications can access it in order to check whether the license is valid. Not all solutions require that the SAS Installation Data (SID) file be stored in metadata; however, the list of solutions that do require it is growing and includes SAS Visual Analytics. For a full list, you can check this SASNOTE. To update the license information in metadata, run the SAS Deployment Manager and select Update SID File in Metadata.

Recently, I performed a license renewal for a Visual Analytics environment. A couple of days later it occurred to me that I might not have performed the update of the SID file in metadata. That prompted the obvious question: how do I check the status of my license file in metadata?

To check the status of a SAS Foundation license you can use PROC setinit. PROC setinit will return the details of the SAS license in the SAS log.

proc setinit;run;

The above output of PROC setinit shows the:

  • Expiration Date as 25MAY2017
  • Grace Period ends on 09JUL2017
  • Warning Period ends on 04SEP2017

This indicates that the software expires on 25MAY2017; however, nothing will happen during the Grace Period. During the Warning Period, messages in the SAS log will warn the user that the software is expiring. When the Warning Period ends on 04SEP2017, the SAS software will stop functioning. PROC setinit checks only the status of the Foundation SAS license, not the license in metadata.

If the Foundation license is up-to-date but the license stored in metadata is expired, the web applications will not work. It turns out SAS Environment Manager will also monitor the status of the SAS license. But is it the Foundation license or the license stored in metadata?
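The three dates can be read as boundaries between license phases. The Python sketch below classifies a given day against them. The function names and the status labels are hypothetical, the grace period end is taken to be 09JUL2017, and treating each boundary date as inclusive is an assumption rather than documented SAS behavior.

```python
from datetime import date

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_sas_date(text):
    """Parse a DDMONYYYY string such as 25MAY2017 into a date."""
    day, month, year = text[0:2], text[2:5].upper(), text[5:9]
    return date(int(year), MONTHS[month], int(day))


def license_status(today, expiration, grace_end, warning_end):
    """Classify a day into one of four phases (boundaries inclusive)."""
    if today <= expiration:
        return "licensed"
    if today <= grace_end:
        return "grace period"      # expired, but nothing happens yet
    if today <= warning_end:
        return "warning period"    # warnings appear in the SAS log
    return "stopped"               # the software no longer functions


expiration = parse_sas_date("25MAY2017")
grace_end = parse_sas_date("09JUL2017")    # assumed year 2017
warning_end = parse_sas_date("04SEP2017")

for day in (date(2017, 5, 1), date(2017, 6, 15),
            date(2017, 8, 1), date(2017, 9, 5)):
    print(day, license_status(day, expiration, grace_end, warning_end))
```

This only mirrors the Foundation license dates reported by PROC setinit; it says nothing about the license information held in metadata.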

To see the status of the license in SAS Environment Manager, select Resources then select Browse > Platforms > SAS 9.4 Application Server Tier. The interface displays:

  • Days Until License Expiration:  the number of days until the license expires.
  • Days Until License Termination: the number of days until the software stops working.
  • Days Until License Termination Warning: the number of days until the Grace period.

Some testing revealed that Environment Manager is monitoring not the status of the Foundation license but the status of the license in metadata. This is an important point because, as we noted earlier, not all SAS solutions require the SID to be updated in metadata. Since Environment Manager monitors the license by checking the status of the SID file in metadata, it is recommended, as a best practice, that administrators always update the SID file in metadata.

SAS Environment Manager, with Service Architecture configured, will also generate events that warn of license termination when the license termination date is within a month.

In addition, as of SAS 9.4 M3, SAS Management Console has an option to View metadata setinit details. To access this functionality you must be a member of the SAS Administrators Group or the Management Console: Advanced Role.

To check on a SID file in metadata, open SAS Management Console and, in the Plug-ins tab:

1. Expand Metadata Manager

2. Select Metadata Utilities

3. Right-click and select View metadata setinit details

Selecting the option gives details of the current SID file in metadata, with information similar to what PROC setinit displays, including the expiration date, the grace period, and the warning period. In addition, it displays the date the SID file was last updated in metadata.

The takeaway: to fully renew SAS software, and ensure that SAS Environment Manager has the correct date for its metrics on license expiration, always use SAS Deployment Manager to both Update the SAS License, AND Update the SID File in Metadata.

To check if your SAS Deployment license has been fully updated, do the following:

1.     Run PROC setinit to view the status of the SAS Foundation license.

2.     Use SAS Management Console or SAS Environment Manager to check if the SID file has been updated in metadata.

For more information on this topic see the video, “Use SAS Environment Manager to Get SAS License Expiration Notice” and additional resources below:

 

SAS® Deployment Wizard and SAS® Deployment Manager 9.4: User’s Guide: Update SID File in Metadata
SAS® Deployment Wizard and SAS® Deployment Manager 9.4: User’s Guide: Renew SAS Software
SAS® 9.4 Intelligence Platform: System Administration Guide: Managing Setinit (License) Information in Metadata
SAS® Environment Manager 2.5 User’s Guide

tags: configuration, SAS Administrators, SAS architecture, SAS Environment Manager, SAS Professional Services

Two steps to update your SAS License and check if it is updated was published on SAS Users.

October 26, 2015
 

SAS Grid Manager for Hadoop is a brand new product released with SAS 9.4M3 this summer. It gives you the ability to co-locate your SAS Grid jobs on your Hadoop data nodes to let you further leverage your investment in your Hadoop infrastructure. This is possible because SAS Grid Manager for Hadoop is integrated with the native components, specifically YARN and Oozie, of your Hadoop ecosystem. Let's review the architecture of this new offering.

First of all, the official name, SAS Grid Manager for Hadoop, shows that it is a brand new product, not just an addition or a different configuration of the “classic” SAS Grid Manager, which I will subsequently refer to as “for Platform” to distinguish the two.

For an end user, grid usage and functionality remains the same, but an architect will notice that many components of the offering have changed. Describing these components will be the focus of the remainder of this post.

Let me start by showing a picture of a sample software architecture, so that it will be easier to recognize all the pieces with a visual schema in front of us. The following is one possible deployment architecture; there are other deployment choices.

[Diagram: SAS Grid Manager for Hadoop 9.4M3 architecture]

Third party components

Just as SAS Grid Manager for Platform builds on top of third party software from Platform Computing (part of IBM), SAS Grid Manager for Hadoop requires Hadoop to function. There is a big difference, though.

SAS Grid Manager for Platform includes all of the required Platform Computing components, as they are delivered, installed and supported by SAS.

On the other side, SAS Grid Manager for Hadoop considers all of the Hadoop components (highlighted in yellow in the above diagram) as prerequisites. As such, customers are required to procure, install and support Hadoop before SAS gets installed.

Hadoop, as you know, includes many different components. The diagram lists the ones that are needed for SAS Grid Manager:

  • HDFS provides cluster-wide file system storage.
  • YARN is used for resource management.
  • Oozie is the scheduling service.
  • Hue is required if the Oozie web GUI is surfaced through Hue.
  • Hive is required at install time for the SAS Deployment Wizard to be able to access the required Hadoop configuration and jar files.
  • Hadoop jars and config files need to be on every machine, including clients.

YARN Resource Manager, HDFS Name Node, Hive, and Oozie are not necessarily on the same machine. By default, the SAS grid control server needs to be on the machine that YARN Resource Manager is on.

SAS Components

SAS programming interfaces to grid have not changed, apart from the lower-level libraries to connect to the third party software. As such, SAS will deploy the traditional SAS grid control server, SAS grid nodes, and either the SAS thin client (aka SASGSUB) or the full SAS client (SAS Display Manager).

In a typical SAS Grid deployment, a shared directory is used to share the installation and configuration directories between machines in the grid. With SAS Grid Manager for Hadoop, you can either use NFS to mount a shared directory on all cluster hosts or use the SAS Deployment Manager (SDM) to work with the cluster manager to distribute the deployment to the cluster hosts. The SDM has the ability to create Cloudera parcels and Ambari packages to enable the distribution of the installation and configuration directories from the grid control server to the grid nodes.

One notable missing component is the SAS Grid Manager plug-in for SAS Management Console. This management interface is tightly coupled with Platform Computing GMS, and cannot be used with Hadoop.

The Middle Tier

You will notice in the above diagram that the middle tier is faded. In fact, no middle tier components are included in SAS Grid Manager for Hadoop. However, a middle tier will generally be included and deployed as part of other solutions licensed on top of SAS Grid Manager, so you will still be able to program using SAS Studio and monitor the SAS infrastructure using SAS Environment Manager.

Please note that I say “monitor the SAS infrastructure”, not “monitor the SAS grid.” There are no plug-ins or modules within SAS Environment Manager that are specific to SAS Grid Manager for Hadoop.   This is by design because SAS is part of your overall Hadoop environment and therefore the SAS Grid workload can be monitored using your favorite Hadoop management tools.

Hadoop provides plenty of web interfaces to monitor, manage and configure its environment. As such, you will be able to use YARN Web UI to monitor and manage submitted SAS jobs, as well as Hue web UI to review scheduled workflows.

The Storage

Discussing grid storage is never a quick task and could require a full blog post on its own. It is worth noting some architecture peculiarities related to SAS Grid Manager for Hadoop. HDFS can be used to store shared data, and is used to store scheduled jobs, workflows, and logs. But we still require a traditional, POSIX-compliant file system for items such as SAS Work, SASGSUB, and solution-specific projects.

Conclusion

SAS Grid Manager for Hadoop enables customers to co-locate their SAS Grid and all of the associated SAS workload on their existing Hadoop cluster. We have briefly discussed the key components that are included in – or are missing from – this new offering. I hope you found this post helpful. As always, any comments are welcome.

tags: grid, Hadoop, SAS architecture, SAS Grid Manager, SAS Grid Manager for Hadoop, SAS Professional Services

SAS Grid Manager for Hadoop architecture was published on SAS Users.

October 14, 2015
 

Sizing is a topic that solutions managers typically leave until the end, after decisions about the application have been settled. But there are often many variables that can impact the final size requirement. We have seen across our customer base that sizing and the number of environments are determined by predicted data volumes, the types of environments that need to be supported, and the budget available.

Environments

Technical architects spend time debating what environment is right for their business, and of course this is no easy decision. Often the business changes its mind, data volumes increase (often with little or no advance warning), data sources vary, and different teams need access, and with this, performance issues creep in.

Production – this one is a must so is easy to say yes to. It’s perhaps the easiest of the estimates as long as the solutions team is able to predict the volume of data.

Undersizing is a common problem here for many reasons. The most common is that the solution has been far more successful than expected and has attracted more users and/or data sources. The second most common cause is that the procurement team has persuaded the solutions managers that they can make do with fewer resources. Finally, we also sometimes see incorrect assumptions being used when sizing.

Test – what will be tested, when, and how frequently are the key questions here.

Development – Increasingly we are finding customers trying to minimize the environments. However, for data quality a development server is pretty key to have in place. Don’t let the development server be an afterthought.

Architecture workshop

The most effective way we have found to determine optimum sizing is an in-depth workshop with an experienced architect. But such a workshop typically requires a lead time of two to four weeks to set up, as experts review requirements and work on proposed options. Beware the fallacy of budget constraints: companies may try to save money by reducing environments, but this can end up costing more.

Sometimes, though, solutions managers discover very late in the implementation cycle that they need to revisit their sizing or number of environments. If a resizing has to take place, with additional budget secured and the installation reworked, this can have a real impact on time, resources, and cost and, more importantly, on the time it takes to start gaining business benefit.

In such situations, we have found three workarounds:

  1. Spend time with the vendor’s architect team to understand what is required now and moving forwards – get advice
  2. Understand the business’s expectations and requirements for the next 24 months so it’s a robust, scalable solution
  3. Get back on track as soon as possible so the business can realize value from improved data quality/analytics/access to Hadoop etc.

Interesting article here by David Lashin virtualised environments.

Communication is king throughout the sizing exercise: between the technical teams of both vendor and customer, and between the business and IT teams, about exact usage, data volumes and, critically, growth plans for the coming 12 to 18 months.
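To make the growth-plan conversation concrete, it can help to put rough numbers on the table. The sketch below is only an illustration of compounding growth. The starting volume and monthly growth rate are hypothetical figures, not values from any real sizing exercise, and `project_volume` is a name made up for this example.

```python
def project_volume(current_tb: float, monthly_growth: float, months: int) -> float:
    """Return the projected data volume (in TB) after compounding monthly growth.

    monthly_growth is a fraction, e.g. 0.03 for an assumed 3% growth per month.
    """
    return current_tb * (1 + monthly_growth) ** months


if __name__ == "__main__":
    # Hypothetical inputs: 5 TB today, growing 3% per month.
    for horizon in (12, 18):
        projected = project_volume(5.0, 0.03, horizon)
        print(f"{horizon} months: about {projected:.1f} TB")
```

Even a crude projection like this gives the business and IT teams a shared figure to challenge before the architecture workshop.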

Summary

Across all organizations we see various debates happening. Our advice is to get this topic out on the table and into the open as early as possible. It is critical to the success of any project: it will make the deployment smoother and speed up adoption, and therefore reduce the time to value (which is critical).

We would love to hear from you – does any of the above resonate with you? Have you had good/bad experiences? Share your thoughts with caroline.hermon@sas.com

For more hot topics and tips from Caroline at the Coalface, follow me @hermon100!

tags: data management, SAS architecture, sizing

Sizing: the long and short of it was published on SAS Users.

June 10, 2015
 

Recently my wife and I took our annual anniversary trip – this time we went to the Grand Canyon, staying in Las Vegas. In researching our options to fly from Raleigh-Durham (RDU) to Las Vegas (LAS), we had several different selection criteria:

  • what time we wanted to leave
  • what time we wanted to arrive
  • price
  • number of stops
  • layover time
  • airline – loyalty program
  • type of aircraft – seating, amenities, food, wifi

All of the flights we looked at would get us from RDU to LAS and back. So the destination wasn’t the issue – it was how much value we placed on each of the attributes: arriving in the afternoon (hotel check-in is 3:00pm) versus spending more money for a nonstop flight, for example. We made our decisions based on our specific needs at that time. We also had different opinions about what was important (I’m basically cheap, and my wife refuses to take the red-eye flight).

The evaluation of storage for a SAS solution can be viewed in a similar fashion. There are tradeoffs to be made, or certainly criteria which will be evaluated and prioritized. This blog posting will briefly examine three such attributes and how they may impact storage in a SAS environment.

Who says you can’t have it all?

This diagram highlights three of the more common attributes that are considered when evaluating storage. While there are certainly other considerations (capacity, interfaces, architecture), these three are usually involved in most storage decisions. This diagram also suggests that there are tradeoffs to be considered: for example, between price and performance (higher performance may require higher price). Let’s briefly examine each of these, and where we may see tradeoffs in a SAS environment.

Price

Price is usually among the first attributes that come up in any discussion of storage. Everyone is looking to save money, and unfortunately storage often gets compromised. Consider this scenario: our SAS deployment will need about 5 terabytes (TB) of storage. In terms of raw capacity, a new 5TB disk drive can be bought from a number of online vendors for around $150.00 USD. While this drive may meet the capacity requirements, it most likely is not the best selection for a SAS deployment – especially if there are performance or availability considerations. Typical enterprise-class SAS storage may involve configurations with multiple disks and controllers, and perhaps shared storage such as  Network Attached Storage (NAS) or Storage Area Networks (SAN). Factoring in these, and possibly other, considerations would most likely (significantly!) increase the price of our storage.

Performance

SAS applications are consumers of storage, and have significant performance expectations for I/O throughput. Many SAS field consultants can share stories of under-performing storage leading to failed deployments and unhappy customers. SAS has minimum recommended I/O throughput rates of file systems that are to be used in a SAS environment, and the Performance Evaluation team within SAS R&D has written several papers that document best practices and tuning guidelines. There is even a usage note about testing throughput for your SAS9 File Systems. Multiple configuration options are reviewed and discussed, ranging from shared file systems to external SAN or NAS arrays.
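As a rough first check before running any formal test, you can time a sequential write on the file system in question. The sketch below is an assumption-laden illustration and not the procedure from the SAS usage note. The function name, the 64 MB default and the use of a temporary file are all choices made for this example. It measures only sequential writes, so it says nothing about read throughput or mixed workloads.

```python
import os
import tempfile
import time


def sequential_write_mb_per_sec(directory: str, total_mb: int = 64, chunk_mb: int = 1) -> float:
    """Write roughly total_mb of data to a temporary file in `directory`
    and return the observed sequential write rate in MB per second."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = total_mb // chunk_mb
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(chunks):
            f.write(chunk)
        # Flush Python's buffer and ask the OS to push the data to the device,
        # so the timing is not just measuring the page cache.
        f.flush()
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return (chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Hypothetical target: the system temp directory. Point this at the
    # file system you actually plan to use for SAS data or SASWORK.
    rate = sequential_write_mb_per_sec(tempfile.gettempdir())
    print(f"Sequential write: {rate:.0f} MB/s")
```

A number from a script like this is only a sanity check. Compare it against the published SAS recommendations using the documented test method before drawing conclusions.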

Availability

Deploying SAS applications into a business-critical environment, or one with availability requirements such as a Service Level Agreement, requires careful attention to the type and configuration of storage used. Since SAS is implemented on the host OS file systems, commonly used high availability strategies can be used effectively. From simple strategies, such as configuring local storage using RAID mirroring, to more complex enterprise-class solutions, such as redundancy through a SAN, the appropriate level of high availability can be designed and deployed so that the storage meets the needs of the business.

So how does all this fit together?

As you can see, none of these criteria should be considered independent of the others when designing and evaluating storage solutions for SAS environments. There will be tradeoffs made in the evaluation process, and priorities will be established. For some areas, such as performance, there are guidelines established by SAS R&D. In other areas, specific needs of the customer (a specific SLA, for example) may dictate specific design decisions. In addition, there’s some flexibility in certain areas – filesystems containing SAS permanent data should be allocated to a more available, more protected storage area than the temporary filesystem of SASWORK. A detailed analysis of the storage needs of the SAS deployment as a part of the overall architecture design will consider these three, in addition to other criteria.

 

In case you were wondering, we didn’t take the red-eye.

tags: SAS architecture, SAS Professional Services, storage

The post Evaluating tradeoffs when designing storage for SAS applications appeared first on SAS Users.

February 25, 2015
 

My Performance Validation team in SAS R&D is constantly working with our partners to test how their storage arrays work with SAS. In late 2014, we finalized several papers that discuss how a mixed analytics workload performs on several storage arrays. While doing this testing, we also listed lessons learned in the tuning guidelines of each paper.

Please review the papers listed below:

 

These papers, along with lots of other papers for other storage, can be found in Usage Note 53874: Troubleshooting system performance problems: I/O subsystem and storage papers.  Please bookmark this SAS Usage note as we update this list of papers regularly.

Let me know if you have questions about these papers or if there are other new storage systems that you would like SAS to test.

tags: flash storage, SAS Administrators, SAS architecture
October 27, 2014
 

Perhaps it is my astrological sign, but as a Gemini, I seem to be cursed (or blessed) with a duality that consists of balancing my creative side with the exacting nature of my logical side. In my personal life, I enjoy woodworking. Whether it is creating “art in the round” from a block of wood or designing a functional piece that must withstand daily use, they both exercise this polarity - the creative process as well as the fine-tuned aspects of engineering. I cannot touch a piece of furniture without running my fingers over the joints and appreciating the beauty and exactness of the work.

I find those two sides converging in my designs of SAS environments: the creative aspect lies in the fact that there are lots of ways to design one, whereas the engineering side ensures that it performs.

 
As SAS administrators, we are often left to deal with the design of others and forced to make it work. Meanwhile, the requirements evolve, the usage grows beyond expectation, and the unintended consequences of your own success force you to maintain and innovate on an almost continual basis.

An example of this is SAS Visual Analytics. Traditional BI and even ETL architectures are designed and tuned to handle a specific workload. Memory, CPU and I/O are optimized to take advantage of client-server requests and batch windows for loading and transforming data. These have become fairly predictable and manageable. However, faced with the growing demand for massively large, in-memory processing for discovery and insight, the same rules about architecture and design need to be reconsidered. Combined with in-database processing, both models force the designer to anticipate the proper design and architect for the unexpected.

Considering everything from storage to virtualization, version control to disaster recovery and everything in between, most SAS administrators don’t get the opportunity to practice their architecture and design skills nor do most have the broad range of skills required to do this for enterprise-class architectures.

I am excited to say that you will have the opportunity to get some real world experience in a workshop specifically designed for people who want to learn about enterprise architecture.  At SAS Global Forum 2015 next April, I will be joined by a team of international experts who do this every day.  This workshop is designed to make you think about how to translate technical and functional requirements into logical and physical designs.  Essentially, we will go through the life cycle of an enterprise architecture design—from requirements to sizing to performance management.

The workshop attendees will participate in large group sessions to learn about the desired characteristics of their design and work through the detailed design in small teams to create an optimal solution.  Each team will be given an opportunity to be paired with senior architects and experts in storage, I/O, database, networking, virtual and distributed computing, metadata, governance and SAS technologies.  They will be challenged with creating a design that satisfies the stated and implied requirements, and each team will have an opportunity to compete with other teams for the best design rated on a number of characteristics.  The team designs will be showcased at the first-ever SAS Administrators Reception and a grand champion announced.

We will be working over the winter months to bring you the absolute best experience. In the meantime, please feel free to offer suggestions about design topics that you are struggling with or have solved in your own experiences.

Remember, Happy Data, Happy Users!

--greg

 

tags: SAS Administrators, SAS architecture, SAS Global Forum
September 19, 2013
 

My family are all Lord of the Rings Trilogy fans. As a novice in the world of SAS administration, I find discussing the SAS middle tier architecture a little like traveling through Middle Earth. For me, it’s new and fascinating terrain. And like other travelers, I would find it useful to have a map of my SAS environment.

My journey started with this question recently posed to the SAS Deployment community:  “Is there a plan to support WebSphere again as an SAS 9.4 midtier web application server?”

Both replies are correct. With SAS 9.4, you won’t have the added cost, installation and maintenance of a third-party web application server, such as JBoss, WebSphere, or WebLogic. The SAS 9.4 Intelligence Platform license includes its own embedded web server and web application server:

But there’s more to the middle tier story. As SAS administrators, you’ll appreciate these new capabilities and features that simplify your life:

  • Middle tier components are installed as a single unit and configured automatically.
  • Multiple web application servers are clustered and load-balanced by default.
  • Web servers are optimized and tuned to meet the requirements of SAS workloads.
  • Patch management has been streamlined.
  • SAS Technical Support is the single point of contact for support issues and questions.
  • Everything you need to build a robust cloud platform is included in the box.

These new capabilities are available because of architectural and software changes between SAS 9.3 middle tier and SAS 9.4 middle tier. Here are some maps and travel guides you might find useful: 

Side-by-side diagrams of SAS 9.3 and SAS 9.4 middle tier architectures

Let’s take a closer look at the middle tier components and how they interact:

SAS Web Application Server
SAS Web Application Server is a lightweight server dedicated to running SAS web applications. Its deployment footprint is reduced because the embedded web application server functions as a web container, which is a small part of the overall functionality included in third-party commercial web application servers.

SAS 9.4 also simplifies deploying and managing a web application server because the server and the software that automates server configuration tasks are packaged together. You’ll use the SAS Deployment Wizard to install and to apply updates or SAS hot fixes, just as with any other piece of SAS software. Additionally, the configuration tools that are packaged with the software are designed to interact with the SAS Metadata Server and other SAS software products to maintain reliability and reduce administration of the SAS deployment.

The SAS deployment tools can install and configure the SAS Web Application Server for horizontal and vertical clustering.

SAS Web Server
SAS Web Server is an HTTP server that is configured as a single connection point for SAS web applications. Using the SAS Deployment Wizard you can configure it with the following features:

  • load-balancing proxy server when the SAS Web Application Server is clustered
  • HTTPS support for SAS Web Server with CA-signed certificates
  • caching of static web content such as JavaScript files, cascading style sheets, and graphics files

SAS Cache Locator
The SAS Cache Locator software is used to tell new, connecting members like SAS Web Application Server where running members are located and provides load balancing for server use. Whether one or two locators are installed depends on your deployment topology:

  • In a single machine deployment, the SAS Deployment Wizard prompts for a cache locator port on the Web Application Server: Cache Locator Configuration and Scheduling Services Cache Locator pages. If you specify different port numbers, then two locators are configured.
  • In a multiple machine deployment, two locators are configured. One is configured on the primary middle-tier machine and one is configured on the server-tier machine.

The SAS Deployment Wizard does not install and configure more than two locators. The two locators are peers, and when one is down, the other can do all the work. Together, the two locators provide failover support.
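The peer arrangement described above can be pictured with a small sketch. This is not how SAS Cache Locator is implemented. It only illustrates the general idea of a connecting member trying each peer in turn. The host names, the port number and the `pick_locator` function are all hypothetical, and the health check is passed in as a plain function so the example runs without a network.

```python
from typing import Callable, Optional, Sequence, Tuple

# A locator is identified here by a (host, port) pair. Purely illustrative.
Locator = Tuple[str, int]


def pick_locator(
    locators: Sequence[Locator],
    is_up: Callable[[Locator], bool],
) -> Optional[Locator]:
    """Return the first locator that reports as running, or None if all are down."""
    for locator in locators:
        if is_up(locator):
            return locator
    return None


if __name__ == "__main__":
    # Hypothetical two-locator deployment: one on the middle tier, one on the server tier.
    peers = [("midtier.example.com", 41415), ("servertier.example.com", 41415)]
    # Simulate the middle-tier locator being down.
    chosen = pick_locator(peers, lambda loc: loc[0] != "midtier.example.com")
    print(chosen)
```

Because either peer can answer, losing one locator does not stop new members from finding the running ones.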

SAS JMS Broker
A SAS JMS Broker instance is configured as a server on the machine that is used for the SAS middle tier. This software fully implements the Java Message Service 1.1 specification and acts as a message broker. Some SAS web applications use JMS connection factories, queues and topics for implementing business logic. These resources are configured in SAS Web Application Server for use by the SAS web applications. SAS JMS Broker also provides advanced features such as clustering, multiple message stores and the ability to use file systems and databases as a JMS persistence provider.

SAS Environment Manager
SAS Environment Manager software includes an agent process that is installed on each server-tier and middle-tier machine in the deployment. Each agent gathers performance metrics and sends the data to a server process that runs on a middle-tier machine. The server process includes a web application server that provides a web-based administrative interface. Administrators use the SAS Environment Manager Web Application to monitor and manage numerous components in the SAS environment.

Additionally, plug-ins have been created to interact with SAS-specific parts of the deployment. For example, the SAS plug-ins collect metrics on the SAS Metadata Server such as journal usage, client connections, CPU usage, and memory utilization.

Java Runtime Environment
A Java environment is included in the SAS 9.4 middle tier, eliminating the need to install a separate JRE.

Other Components
The SAS 9.4 middle tier may include other components such as the SAS Web Infrastructure Platform, SAS web applications and other SAS products and solutions.

Additional Information:

 

tags: SAS 9.4, SAS Administrators, SAS architecture