Margaret Crevar

022017
 

SAS Global Forum 2017 is just a month away and, if you’re a SAS administrator, it’s a great place to meet your peers, share your experiences and attend presentations on SAS administration tips and tricks.

SAS Global Forum 2017 takes place in Orlando, FL, April 2-5. You can find more information at https://www.sas.com/en_us/events/sas-global-forum/sas-global-forum-2017.html. The schedule there covers the entire conference and includes pre- and post-conference events.

If you’re an administrator, though, I wanted to highlight a few events that would be of particular interest to you:

On Sunday, April 2nd from 2-4 pm there is a “Helping the SAS Administrator Succeed” event. More details can be found here.

On Monday, April 3rd from 6:30-8:00 pm the SAS Users Group for Administrators (SUGA) will be hosting a Community Linkup, with panelists on hand to help answer questions from SAS administrators. Location will be in the Dolphin Level – Asia 4.

There are two post-conference tutorials for SAS administrators:

Introduction to SAS Grid Manager, Wednesday, April 5th from 2:30-6:30pm
SAS Metadata Security, Thursday, April 6th from 8:00am-noon
More details can be found here.

For a list of the papers on the topic of SAS Administration, you can visit this link. You will see that SAS Administration has been broken down into the Architecture, Deployment, SAS Administration and Security subtopic areas.

Some of the key papers under each subtopic area are:

Architecture
Twelve Cluster Technologies Available in SAS 9.4
Deploying SAS on Software-Defined and Virtual Storage Systems
Shared File Systems:  Determining the Best Choice for your Distributed SAS Foundation Applications
Do You have a Disaster Recovery Plan for Your SAS Infrastructure

Deployment
Pillars of a Successful SAS Implementation with Lessons from Boston Scientific
Getting the Latest and Greatest from SAS 9.4: Best Practices for Upgrades and Migrations
Migrating Large, Complex SAS Environments: In-Place versus New Build

SAS Administration
SAS Metadata Security 201:  Security Basics for a New Administrator
SAS Environment Manager: Advanced Topics
The Top Ten SAS Studio Tips for SAS Grid Manager Administrators
Implementing Capacity Management Policies on a SAS LASR Analytic Server Platform: Can You Afford Not To?
Auditing in SAS Visual Analytics
SAS Viya: What it Means for SAS Administration

Security
Guidelines for Protecting Your Computer, Network, and Data from Malware Threats
Getting Started with Designing and Implementing a SAS® 9.4 Metadata and File System Security Design
SAS® Metadata Security 301: Auditing your SAS Environment
SAS® Users Audit: An Automated Approach to Metadata Reporting
SAS® Metadata Security

In addition to the breakout sessions, there is an Administration Super Demo station where short presentations will be given. The schedule for these presentations is:

Sunday, April 2nd:
17:00     Shared File Systems for SAS Grid Manager
18:00     Where to Place SAS WORK in your SAS Grid Infrastructure

Monday, April 3rd:
11:00     Hands-on Secure Socket Layer Configuration for SAS 9.4 Environment Manager
12:00     Introduction to Configuring SAS Metadata Security for Multitenancy
13:00     SAS Viya Overview
14:00     Accelerate your SAS Programs with GPUs
15:00     Authentication and Identity Management with SAS Viya

Tuesday, April 4th:
11:00     Accelerate your SAS Programs with GPUs
12:00     Accelerating your Analytics Adoption with the Analytics Fast Track
13:00     New Deployment Experience for SAS
14:00     Managing Authorization in SAS Viya
15:00     Clustering in SAS Viya
16:00     A Docker Container Toolbox for the Data Scientist

As you can see, there is a lot for SAS administrators to learn, and there are plenty of opportunities to socialize with fellow administrators.

Here’s to seeing you in sunny Florida next month.

P.S. SAS administrators don’t have to go to SAS Global Forum to get help administering their environment. In addition to SAS Global Forum and the SUGA group mentioned above, you can find out more information on resources for administrators in this blog. You can also visit our new webpage devoted just to users who administer their organization’s SAS environment. You can find that page here.

Resources for SAS Administrators at SAS Global Forum 2017 … and beyond was published on SAS Users.

042016
 

In my current role I have the privilege of managing the Performance Lab in SAS R&D. Helping users work through performance challenges is a critical part of the Lab’s mission. This spring, my team has been actively testing new and enhanced storage arrays from EMC along with the Veritas clustered file system. We have documented our findings in SAS Usage Note 42197, “List of Useful Papers.”

The two flash-based storage systems we tested from EMC are the new DSSD D5 appliance and the XtremIO array. The bottom line: both performed very well with a mixed analytics workload. For more details on the results of the testing, along with the tuning guidelines for using SAS with this storage, please review these papers:

As with all storage, please validate that it can deliver all the “bells and whistles” you will need to support the failover and high-availability needs of your SAS applications.

In addition to the storage testing, we tested with the latest version of the Veritas InfoScale clustered file system. We had great results in a distributed SAS environment with several SAS compute nodes all accessing data in this clustered file system. We learned a lot in this testing and captured it in the following paper:

My team plans to continue testing new storage and file system technologies throughout the remainder of 2016. If there is a storage array or technology you would like to have tested, please let us know by sharing it in the comments section below or contacting me directly.

 

tags: performance, storage

Testing EMC Storage and Veritas shared file systems was published on SAS Users.

302016
 

I look forward to SAS Global Forum each year, and this past conference ranked up there as one of the best I've ever attended. This year there were so many wonderful presentations, Super Demos and workshops on the topic of administration of SAS and the underlying hardware infrastructure needed for SAS applications. Below you'll find a list of the top 15 presentations based on attendance. We hope you find these papers useful.

10360 - Nine Frequently Asked Questions about Getting Started with SAS® Visual Analytics
2440 - Change Management: The Secret to a Successful SAS® Implementation
8220 - Optimizing SAS® on Red Hat Enterprise Linux (RHEL) 6 and 7
SAS6840 - Reeling Them Back In Keeping SAS® Visual Analytics Users Happy, Behind the Scenes
SAS2820 - Helpful Hints for Transitioning to SAS® 9.4
SAS6761 - Best Practices for Configuring Your I/O Subsystem for SAS®9 Applications
10861 - Best Practices in Connecting External Databases to SAS®
SAS6280 - Hands-On Workshop: SAS® Environment Manager
SAS5680 - Advanced Topics in SAS® Environment Manager
9680 - SAS Big Brother
SAS4240 - Creating a Strong Business Case for SAS® Grid Manager: Translating Grid Computing Benefits to Business Benefits
10962 - SAS® Metadata Security 201: Security Basics for a New SAS Administrator
8860 - SAS® Metadata Security 101: A Primer for SAS Administrators and Users Not Familiar with SAS
9920 - UCF SAS® Visual Analytics: Implementation, Usage, and Performance
11202 - Let the Schedule Manager Take Care of Scheduling Jobs in SAS® Management Console 9.4

Note: It's hard to believe, but we're already thinking about topics to present at SAS Global Forum 2017. If you have suggestions for SAS Administration paper topics you'd like to see at next year's conference, please share your thoughts in the comments below or contact me directly. 

tags: papers & presentations, SAS Administrators, SAS Global Forum

15 top Global Forum 2016 papers for SAS administrators was published on SAS Users.

162016
 

With memory being affordable now, we are constantly being asked by customers about doubling and tripling the amount of RAM that SAS recommends. More is better, right?

Often, but we have found a specific scenario using SAS data sets where that is not the case. Remember that increasing memory generally means a commensurate increase not only in operational and computational memory, but in the host system file cache as well. The host system file cache is the portion of memory where SAS pages all of its READS and WRITES to and from storage.

Consider a case that arose recently. A customer had a host with a lot of memory, and hence hundreds of gigabytes of host system file cache, all to himself. This was a quiet system in which he enjoyed the spoils of excess.

Here is what can backfire when a large host cache is populated with very large SAS files. Consider the SAS program:


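               /* Step 1: create newds by reading the existing data set testds */
               DATA newds;
                  SET testds;
               RUN;

               /* Step 2: read newds and write it back under the same name ("in place") */
               DATA newds;
                  SET newds;
               RUN;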

The first DATA step creates a new file by reading an existing file. In the second DATA step, we are updating the data set newds “in place.” The second DATA step runs significantly longer than the first, even though the two appear to be doing the same type of operation. The reason is that the newds.sas7bdat.lck file created in SAS WORK by the DATA statement cannot simply be committed and closed as newds.sas7bdat until all the data associated with the original newds.sas7bdat named in the SET statement has been flushed from the file cache (i.e., RAM). The pages of the original newds.sas7bdat that reside in the host cache are not being updated; they are being used to create a copy of that file in the new locked file, newds.sas7bdat.lck. So we can’t commit the new file until the pages from the old file with the same name are flushed from the host cache and the original file is deleted on storage.

If this file is hundreds of gigabytes in size and most of its pages reside in the host system file cache, this flush can take a considerable amount of time, much longer than the simple rename of the file that completes the first DATA step above. In the second DATA step, the original file must be emptied from cache by the page flush daemons, and removed from storage, so that it can be replaced by the newds.sas7bdat.lck version before that version can be closed and committed to storage.

So, very large SAS data files that fit into the host system file cache and have to be flushed before SAS can update the file with the same name can lead to much longer response times for that operation. The delay is commensurate with the size of the file and how many of its pages reside in the host cache. To avoid this behavior, it is generally not a good idea to update a very large file “in place,” i.e., to overwrite a file using the same name.
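One way to follow this advice is to write the result to a data set with a different name rather than overwriting the input. The sketch below assumes the newds data set from the program above; the name newds_v2 is hypothetical and chosen only for illustration.

```sas
/* Read newds and write the result under a different name,   */
/* so the output does not replace a file with the same name. */
DATA newds_v2;   /* newds_v2 is a hypothetical name */
   SET newds;
RUN;
```

Downstream steps would then read newds_v2. Whether and when to remove the original file is a separate decision that depends on your storage and retention needs.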

When can too much memory hurt SAS? was published on SAS Users.

November 18, 2015
 

From time to time we’ll hear from customers who are encountering performance issues. SAS has a sound methodology for resolving these issues and we are always here to keep your SAS system humming. However, many problems can be resolved with some simple suggestions. This blog will discuss different types of performance issues you might encounter, with some suggestions on how to effectively resolve them.

Situation: You are a new SAS customer or are simply running a new SAS application on new hardware
Suggestion: Be sure you’ve read and applied all the guidelines in the various tuning papers that have been written:

Making sure you understand the performance issues will help us determine what the next steps are. It’s worth noting that 90% of performance issues arise because hardware, operating system and/or storage have not been configured according to the tuning guidelines listed above. In a recent case we were able to get a 20% performance gain from a long-running ETL process by adjusting two RHEL kernel parameters that have been documented for many years in our tuning paper.

Situation: Your SAS application has been running and over time gets slower
Suggestion: Determine whether the number of concurrent SAS sessions/users and/or the volume of data (both input and lookup tables) has increased. Growth in either is the top reason for a gradual slowdown.

Situation: Your SAS application took a significant performance hit overnight or in a short time frame.
Suggestion: The first thing you want to do is see whether any maintenance (tweaking of your system, hotfix, patch, …) has been applied to your operating system, VMware, and/or storage arrays. Many customers have applied maintenance (not to SAS) and then found SAS jobs suddenly taking 2-5 times longer to run. You’ll want to check that all the operating system settings, mount options, and VMware settings are the same after the maintenance as they were before it.
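If you record your settings before and after maintenance, the comparison itself can be done in SAS. The following is only a sketch: the data set names, the setting names and the values are all hypothetical placeholders, and in practice the snapshots would come from whatever tool your IT staff uses to capture operating system, mount and VMware settings.

```sas
/* Hypothetical snapshot recorded before maintenance */
DATA before;
   LENGTH setting $40 value $40;
   INPUT setting $ value $;
   DATALINES;
example_kernel_param 20
example_mount_option noatime
;
RUN;

/* Hypothetical snapshot recorded after maintenance */
DATA after;
   LENGTH setting $40 value $40;
   INPUT setting $ value $;
   DATALINES;
example_kernel_param 40
example_mount_option noatime
;
RUN;

/* PROC COMPARE with an ID statement expects both inputs sorted by the ID variable */
PROC SORT DATA=before; BY setting; RUN;
PROC SORT DATA=after;  BY setting; RUN;

/* Report every setting whose value differs between the two snapshots */
PROC COMPARE BASE=before COMPARE=after;
   ID setting;
RUN;
```

In this made-up data only example_kernel_param differs, so it is the one value PROC COMPARE would flag for follow-up.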

In conclusion, if you are having performance issues, check the suggested tuning guidelines. Also, be sure to keep track of all the settings for the hardware and storage infrastructure when applying maintenance to make sure these settings are the same afterwards as they were before.

Of course, if you have followed the guidelines and maintenance is not the reason for your performance issues, please contact us. We are here to help.

tags: performance, SAS Administrators, tuning

Tips to keep your SAS system humming was published on SAS Users.

272015
 

SAS recently performed testing using the Intel Cloud Edition for Lustre* Software - Global Support (HVM) available on AWS marketplace to determine how well a standard workload mix using SAS Grid Manager performs on AWS.  Our testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. You can find the detailed results in the technical paper, SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre.

In addition to the paper, Amazon will be publishing a post on the AWS Big Data Blog that will take a look at the approach to scaling the underlying AWS infrastructure to run SAS Grid Manager to meet the demands of SAS applications with demanding I/O requirements.  We will add the exact URL to the blog as a comment once it is published.

System design overview – network, instance sizes, topology, performance

For our testing, we set up the following AWS infrastructure to support the compute and IO needs for these two components of the system:

  • the SAS workload that was submitted using SAS Grid Manager
  • the underlying Lustre file system required to meet the clustered file system requirement of SAS Grid Manager.

SAS Grid Manager and Lustre shared file configuration on AWS cloud

The SAS Grid nodes in the cluster are i2.8xlarge instances.  The 8xlarge instance size provides proportionally the best network performance to shared storage of any instance size, assuming minimal EBS traffic.  The i2 instance also provides high performance local storage, which is covered in more detail in the following section.

The use of an 8xlarge size for the Lustre cluster is less impactful since there is significant traffic to both EBS and the file system clients, although an 8xlarge is still the better choice. The Lustre file system has a caching strategy, and you will see higher throughput to clients in the case of frequent cache hits, which effectively reduce the network traffic to EBS.

Steps to maximize storage I/O performance

The shared storage for SAS applications needs to be high speed temporary storage.  Typically temporary storage has the most demanding load.  The high I/O instance family, I2, and the recently released dense storage instance, D2, provide high aggregate throughput to ephemeral (local) storage.  For the SAS workload tested, the i2.8xlarge has 6.4 TB of local SSD storage, while the D2 has 48 TB of HDD.

Throughput testing and results

We wanted to achieve a throughput of at least 100 MB/sec/core to temporary storage, and 50-75 MB/sec/core to shared storage. The i2.8xlarge has 16 cores (32 virtual CPUs; each virtual CPU is a hyperthread on a core, and a core has two hyperthreads). Testing done with lower-level testing tools (fio and a SAS tool, iotest.sh) showed a throughput of about 3 GB/sec to ephemeral (temporary) storage and about 1.5 GB/sec to shared storage. The shared storage performance does not take into account file system caching, which Lustre does well.
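To put those numbers side by side, the per-core targets can be converted to aggregate figures for a 16-core instance and the measured aggregates converted back to per-core rates. The DATA step below is a back-of-the-envelope sketch; it assumes the measured figures are per instance and treats 1 GB as 1000 MB, and the variable names are made up for illustration.

```sas
DATA _NULL_;
   cores    = 16;      /* physical cores on an i2.8xlarge, per the text */
   gb_to_mb = 1000;    /* assumption: 1 GB is treated as 1000 MB */

   /* Aggregate targets implied by the per-core goals */
   temp_target_mb   = 100 * cores;
   shared_target_lo = 50 * cores;
   shared_target_hi = 75 * cores;

   /* Per-core rates implied by the measured aggregate throughput */
   temp_per_core   = 3   * gb_to_mb / cores;
   shared_per_core = 1.5 * gb_to_mb / cores;

   PUT "Temporary storage target (MB/sec): " temp_target_mb;
   PUT "Shared storage target (MB/sec): " shared_target_lo "to " shared_target_hi;
   PUT "Measured temporary MB/sec/core: " temp_per_core;
   PUT "Measured shared MB/sec/core: " shared_per_core;
RUN;
```

Under these assumptions the targets work out to 1600 MB/sec for temporary storage and 800 to 1200 MB/sec for shared storage, while the measured rates come to about 187.5 and 93.75 MB/sec/core, so both exceed the stated goals.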

This testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. For full details of the testing configuration and results, please see the SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre technical white paper.

 

tags: cloud computing, configuration, grid, SAS Administrators

The post Can I run SAS Grid Manager in the AWS cloud? appeared first on SAS Users.

082015
 

This year, the number of presentations, Super Demos and workshops to help with the administration of SAS software and its hardware infrastructure is growing at SAS Global Forum 2015.  This is wonderful news to the SAS Administrators coming to the conference.

These presentations start on Sunday with a 3.5-hour workshop offered by Greg Nelson of Thotwave, “SAS Administration: Understanding SAS Enterprise Architecture.”

SAS staff will be presenting on a variety of administration topics. Here is a sampling of topics of interest during the conference:

SAS1761 - Proven Practices for Managing the Enterprise Administrators of a SAS Software Deployment

SAS1857 - Hands-Off SAS Administration – Using Batch Tools to Make Your Life Easier

SAS1682 - A Practical Approach to Managing a Multi-Tenant SAS Intelligence Platform Deployment

SAS1844 - Securing Hadoop Clusters While Still Retaining Your Sanity

SAS1501 - Best Practices for Configuring your IO Subsystem for SAS®9 Applications

SAS1500 - Frequently Asked Questions Regarding Storage Configurations

SAS1955 - Latest and Greatest: Best Practices for Migrating to SAS 9.4

SAS1520 - Operations Integration, Audits, and Performance Analysis: Getting the Most Out of SAS Environment Manager

SASSD4549 - Monitoring and Managing with SAS Environment Manager

SASSD4553 - Reporting Using the SAS Environment Manager

SAS1904 - Your Top Ten SAS Middle-Tier Questions

SASSD4546 - Deploying SAS 9.4 into a Public Cloud Like AWS

SASSD4548 - SAS in a Snap with Virtual Applications

SAS1928 - SAS Grid Manager – A Building Block Approach

SAS1897 - Planning for the Worst – SAS Grid Manager and Disaster Recovery

SAS1968 - Important Things to Know when Deploying SAS Grid Manager

SASSD4551 - SAS Security Commitment

SAS1779 - Row-level Security and SAS Visual Analytics

For a complete list of presentations, please visit SAS Global Forum Connect, filter on session topic area “Administration” or job role “SAS Administrator” or use the keyword search to find topics that you are interested in.

We look forward to seeing you at the conference and to listening to your needs.

tags: papers & presentations, SAS Administrators, SAS Global Forum

The post SAS Global Forum:  spotlight on SAS administrators appeared first on SAS Users.

112015
 

FULLSTIMER is a SAS system option that takes operating system information collected while a SAS process runs and writes that information to the SAS log. Using it can add up to 10 additional lines to your SAS log for each SAS step, so why would I recommend turning it on?

This additional information includes memory utilization, a date/time stamp for when each step finished, context switch information, and some other operating-system-specific information about the SAS step that just finished. Why would you need this much information?

This data is very useful in helping your SAS administrator and SAS support personnel determine why a SAS process may be running slower than expected. Having this information collected every time a SAS job is run means that data can be used to help determine which SAS step ran slower and at what time and under what circumstances.

Since the IT staff for most organizations are collecting hardware monitor data on a daily basis, they can then use the information from the SAS log to pinpoint what time of day the performance issue occurred, on what system and using what file systems.

Again, this is just one way SAS users can be proactive in trying to solve any future performance issues. And all you need to do is add -FULLSTIMER to your SAS configuration file or to the SAS command line that you use to invoke SAS.
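Besides the configuration file and the command line, the option can also be turned on from inside a program with an OPTIONS statement. The snippet below is a minimal sketch of that alternative; the trivial DATA step is there only so that there is a step to report on.

```sas
/* Turn on FULLSTIMER for the rest of this SAS session */
OPTIONS FULLSTIMER;

/* Write the current setting of the option to the SAS log */
PROC OPTIONS OPTION=FULLSTIMER;
RUN;

/* Steps run from here on get the additional resource lines in the log */
DATA _NULL_;
   x = 1;
RUN;
```

Setting the option in the configuration file has the advantage that every job picks it up without code changes, which fits the goal of collecting this information every time a SAS job runs.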

If you have any questions on the above, please let us know.  Here are additional resources if you want to learn more about SAS FULLSTIMER and its use:

SAS timer - the key to writing efficient SAS code

Improving performance: Determine the cause

Tune your SAS system for max performance

Troubleshoot Your Performance Issues: SAS® Technical Support Shows You How

Increasing IT’s awareness of SAS: A few good practices

tags: performance, SAS Administrators
252015
 

My Performance Validation team in SAS R&D is constantly working with our partners to test how their storage arrays work with SAS. In late 2014, we finalized several papers that discuss how a mixed analytics workload performs on several storage arrays. While doing this testing, we also listed the lessons learned in the tuning guidelines of each paper.

Please review the papers listed below:

 

These papers, along with lots of other papers for other storage, can be found in Usage Note 53874: Troubleshooting system performance problems: I/O subsystem and storage papers.  Please bookmark this SAS Usage note as we update this list of papers regularly.

Let me know if you have questions about these papers or if there are other new storage systems that you would like SAS to test.

tags: flash storage, SAS Administrators, SAS architecture
072015
 

With the growing use of SAS on commodity hardware, many organizations are running lots of SAS servers on separate instances of an operating system in a SAS infrastructure. This configuration is great for optimizing resources, but when these SAS servers have to share data, SAS recommends the use of a clustered file system.

This recommendation presents an issue for some companies. Because clustered file systems are not part of their standard operating system, it is an additional expense. So, to avoid driving up the cost of the hardware infrastructure for SAS, some IT administrators are proposing the use of NFS to share files among the SAS servers running on different instances of an operating system. Let’s look in more detail at the pros and cons for NFS as a shared file system with SAS.

When NFS is a wise choice

So, let’s quickly discuss why using NFS may be wise. NFS is great when used in a mostly “read” environment or for SAS shops that have small (less than 1GB) SAS data files. So it can be a good fit for permanent SAS data files that SAS jobs access primarily in a read-only manner.

NFS can be affected by network bandwidth, but network speed and capacity are not the main problem per se. The real issue is largely one of NFS metadata cache coherency, which causes the cached file system metadata to be “dumped” very frequently. NFS does this every time a read or write lock is placed on a file or one of the file’s attributes, such as its size, changes. This dumping of the cached metadata drastically interrupts large sequential writes and affects the ability to process the data, because the file system is constantly re-reading the metadata over the network and updating its cache.

When NFS is not optimal

NFS file sharing may not perform adequately when SAS users are making lots of updates to files or when data manipulation requires a significant amount of temporary storage. The SAS WORK file system is 50% write, 50% read and 100% delete when the SAS session is properly terminated. This type of I/O does not work well with NFS, as we’ve learned from SAS customer experience and testing within SAS. (Details on why this does not work well can be found in this SAS paper: A Survey of Shared File Systems, updated October 2014.)

Please note that there are several storage arrays (many from Network Appliance and EMC Isilon) that only support an NFS-based file system. We are sure the underlying storage array is great, but because the file system associated with them is NFS-based, we strongly recommend they not be used for SAS WORK or for permanent SAS data files where lots of writes occur.

As always, please let us know if you have any questions or comments on the above.

tags: clustered file systems, NFS file system, SAS Administrators