storage

September 4, 2019
 

Editor’s note: This article is a continuation of the series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. Access all the articles in the series here.

In a previous article in this series, Accessing Databases in the Cloud – SAS Data Connectors and Amazon Web Services, I covered SAS and the database as a service (DBaaS) offerings from Amazon Web Services (AWS). Today, I cover the various storage options available on AWS and how to connect to and interact with them from SAS.

Object Storage

Amazon Simple Storage Service (S3) is a low-cost, scalable cloud object storage service for any type of data in its native format. Individual Amazon S3 objects can range in size from 1 byte all the way to 5 terabytes (TB). Amazon S3 organizes these objects into buckets, and every bucket name is globally unique. You access a bucket directly through an API from anywhere in the world, provided you have been granted permission. By default, a bucket grants the least access possible. Amazon advertises 11 9’s, or 99.999999999%, of durability, meaning that losing an object is extremely unlikely, though not impossible. Data replicates automatically across availability zones to meet this durability. You can reduce the number of replicas or use one of the various archive tiers to reduce your object storage cost. Costs are calculated based on the amount of storage used per month, with added costs for requests and data transfers.
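To put the durability and pricing statements into numbers, here is a minimal Python sketch. It assumes that "11 9's" is read as a per-object, per-year figure, and every rate passed to `monthly_s3_cost` is a hypothetical placeholder rather than a real AWS price; the function and parameter names are illustrative only.

```python
def expected_annual_object_loss(object_count: int, nines: int = 11) -> float:
    """Expected number of objects lost per year at a durability of `nines` nines."""
    # 11 nines of durability leaves a 1e-11 chance of losing a given object in a year.
    annual_loss_probability = 10.0 ** -nines
    return object_count * annual_loss_probability


def monthly_s3_cost(stored_gb: float, requests: int, transfer_out_gb: float, *,
                    rate_per_gb: float, rate_per_1000_requests: float,
                    rate_per_gb_out: float) -> float:
    """Storage charge plus request and transfer charges, using caller-supplied rates."""
    storage = stored_gb * rate_per_gb
    request_charge = (requests / 1000) * rate_per_1000_requests
    transfer = transfer_out_gb * rate_per_gb_out
    return storage + request_charge + transfer


if __name__ == "__main__":
    # Ten million objects at 11 nines: expect to lose roughly 0.0001 objects per year.
    print(expected_annual_object_loss(10_000_000))
    # Hypothetical rates, chosen only to show how the three charges add up.
    print(round(monthly_s3_cost(5000, 1_000_000, 100,
                                rate_per_gb=0.02,
                                rate_per_1000_requests=0.005,
                                rate_per_gb_out=0.09), 2))
```

Read the first result as an average: with ten million objects stored, you would expect to lose about one object every 10,000 years. That is a very small risk, but it is not the same as zero.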

SAS and S3

Support for Amazon Web Services S3 as a Caslib data source for SAS Cloud Analytic Services (CAS) was added in SAS Viya 3.4. This data source enables you to access SASHDAT files and CSV files in S3. You can use the CASLIB statement or the table.addCaslib action to add a Caslib for S3. SAS is currently exploring native object storage integration with AWS S3 for more file types. For other file types you can copy the data from S3 and then use a SAS Data Connector to load the data into memory. For example, if I had Excel data in S3, I could use PROC S3 to copy the data locally and then load the data into CAS using the SAS Data Connector to PC Files.

Block Storage

Amazon Elastic Block Store (EBS) is the block storage service designed for use with Amazon Elastic Compute Cloud (EC2). The storage is accessible only when a volume is attached to an instance and mounted by its operating system. Each volume can be treated as an independent disk drive controlled by a server operating system. You mount an EBS volume to an operating system as if it were a physical disk. EBS volumes are valuable because they are the storage that can persist when you terminate your compute instance. You can choose from four different volume types that offer different performance levels at corresponding costs.

SAS and EBS

EBS is used as the permanent SAS data storage and persists through a restart of your SAS environment. The choice you make among the different EBS volume types has a direct impact on the performance that you get from SAS. One thing to consider is using compute instances that have enhanced EBS performance or dedicated solid state drive instance storage. For example, the SAS Viya on AWS QuickStart uses Storage Optimized and Memory Optimized compute instances with local NVMe-based SSDs that are physically connected to the host server. This instance storage is coupled to the lifetime of the instance, so it is beneficial for performance but is not a place for permanent data.

SAS Cloud Analytic Services (CAS) is an in-memory server that relies on the CAS Disk Cache as the virtual memory storage backend. This is especially true if you are reading data from a database. In this case, make sure you have enough block storage, in the form of EBS volumes, for use as the CAS Disk Cache.

File Storage

Amazon Elastic File System (EFS) provides access to data through a shared file system. EFS is an elastic network file system that grows and shrinks as you add or remove files, so you only pay for the storage you consume. Users create, delete, modify, read, and write files organized logically in a directory structure for intuitive access. This allows simultaneous access for multiple users to a common set of file data managed with user and group permissions. For workloads that need a high-performance file system, AWS also offers Amazon FSx for Lustre.

SAS and EFS

EFS shared file system storage can be a powerful tool if you use a SAS Grid architecture. If your SAS architecture requires a shared location that any node in a group can access and write to, then EFS could meet that requirement. To access the data stored in your network file system, you have to mount the EFS file system. You can mount your Amazon EFS file systems to any EC2 instance, or to any on-premises server connected to your Amazon VPC.

BONUS: Serverless

Amazon Athena is a query service for Amazon S3. This service makes it easy to submit queries against the objects stored in S3. You can run analysis on this data using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run. Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet.
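To make "standard SQL against objects in S3" concrete, here is a minimal Python sketch of the request that the AWS SDK for Python (boto3) expects when you start an Athena query. The table name `sales`, the database `demo_db`, the bucket `example-bucket`, and the helper `build_athena_request` are all hypothetical. The sketch only builds the request, so it runs without AWS credentials.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for the Athena StartQueryExecution call."""
    # Athena writes query results to S3, so the output location must be an S3 URI.
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical query over a table that Athena has mapped onto files in S3.
request = build_athena_request(
    "SELECT region, COUNT(*) AS orders FROM sales GROUP BY region",
    database="demo_db",
    output_location="s3://example-bucket/athena-results/",
)

# With boto3 installed and AWS credentials configured, the request could be
# submitted like this (not executed here):
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```

Because the query runs inside Athena, the client only submits SQL and later reads the results from the output location. No cluster or server has to be provisioned first.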

SAS and Athena

Amazon Athena is ODBC/JDBC compliant, which means I can use SAS/ACCESS Interface to ODBC or SAS/ACCESS Interface to JDBC to connect from SAS. Download an Amazon Athena ODBC driver and submit code from SAS just as you would for any ODBC data source. Athena is a great tool if you want to use the serverless computing power of Amazon to query data in S3.

Finally

Many times, we do not have a choice in the technologies we use or the infrastructure on which they sit. Luckily, if you use AWS, integration with SAS is not a concern. I’ve now covered databases and storage for AWS. In future articles, I’ll cover the same topics for Microsoft Azure and Google Cloud Platform.

Storage in the Cloud – SAS and Amazon Web Services was published on SAS Users.

August 4, 2016
 

In my current role I have the privilege of managing the Performance Lab in SAS R&D. Helping users work through performance challenges is a critical part of the Lab’s mission. This spring, my team has been actively testing new and enhanced storage arrays from EMC along with the Veritas clustered file system. We have documented our findings in SAS Usage Note 42197, “List of Useful Papers.”

The two flash-based storage systems we tested from EMC are the new DSSD D5 appliance and the XtremIO array. The bottom line: both performed very well with a mixed analytics workload. For more details on the results of the testing along with the tuning guidelines for using SAS with this storage, please review these papers:

As with all storage, please validate that the storage can deliver all the “bells and whistles” you will need to support the failover and high availability needs of your SAS applications.

In addition to the storage testing, we tested with the latest version of the Veritas InfoScale clustered file system. We had great results in a distributed SAS environment with several SAS compute nodes all accessing data in this clustered file system. We learned a great deal from this testing and captured it in the following paper:

My team plans to continue testing new storage and file system technologies throughout the remainder of 2016. If there is a storage array or technology you would like to have tested, please let us know by sharing it in the comments section below or contacting me directly.

 

tags: performance, storage

Testing EMC Storage and Veritas shared file systems was published on SAS Users.

June 10, 2015
 

Recently my wife and I took our annual anniversary trip – this time we went to the Grand Canyon, staying in Las Vegas. In researching our options to fly from Raleigh-Durham (RDU) to Las Vegas (LAS), we had several different selection criteria:

  • what time we wanted to leave
  • what time we wanted to arrive
  • price
  • number of stops
  • layover time
  • airline – loyalty program
  • type of aircraft – seating, amenities, food, wifi

All of the flights we looked at would get us from RDU to LAS and back. So the destination wasn’t the issue – it was how much value we placed on each of the attributes: arriving in the afternoon (hotel check-in is 3:00pm) versus spending more money for a nonstop flight, for example. We made our decisions based on our specific needs at that time. We also had different opinions about what was important (I’m basically cheap, and my wife refuses to take the red-eye flight).

The evaluation of storage for a SAS solution can be viewed in a similar fashion. There are tradeoffs to be made, or certainly criteria which will be evaluated and prioritized. This blog posting will briefly examine three such attributes and how they may impact storage in a SAS environment.

Who says you can’t have it all?

This diagram highlights three of the more common attributes that are considered when evaluating storage: price, performance, and availability. While there are certainly other considerations (capacity, interfaces, architecture), these three are usually involved in most storage decisions. This diagram also suggests that there are tradeoffs to be considered: for example, between price and performance (higher performance may require higher price). Let’s briefly examine each of these, and where we may see tradeoffs in a SAS environment.

Price

Price is usually among the first attributes that come up in any discussion of storage. Everyone is looking to save money, and unfortunately storage often gets compromised. Consider this scenario: our SAS deployment will need about 5 terabytes (TB) of storage. In terms of raw capacity, a new 5TB disk drive can be bought from a number of online vendors for around $150.00 USD. While this drive may meet the capacity requirements, it most likely is not the best selection for a SAS deployment – especially if there are performance or availability considerations. Typical enterprise-class SAS storage may involve configurations with multiple disks and controllers, and perhaps shared storage such as Network Attached Storage (NAS) or Storage Area Networks (SAN). Factoring in these, and possibly other, considerations would most likely (significantly!) increase the price of our storage.

Performance

SAS applications are heavy consumers of storage, and have significant performance expectations for I/O throughput. Many SAS field consultants can share stories of under-performing storage leading to failed deployments and unhappy customers. SAS publishes minimum recommended I/O throughput rates for file systems that are to be used in a SAS environment, and the Performance Evaluation team within SAS R&D has written several papers that document best practices and tuning guidelines. There is even a usage note about testing throughput for your SAS 9 file systems. Multiple configuration options are reviewed and discussed, ranging from shared file systems to external SAN or NAS arrays.
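As a rough illustration of what a throughput test measures, here is a minimal Python sketch that times a sequential write to a directory and reports megabytes per second. It is not the tool referenced in the SAS usage note. The function name, file size, and block size are assumptions chosen for brevity.

```python
import os
import tempfile
import time


def sequential_write_mb_per_sec(directory: str, total_mb: int = 64,
                                block_kb: int = 64) -> float:
    """Write `total_mb` of zeros to a temporary file in `directory` and return MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            # Force the data to the device so the timing is not just a cache write.
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if the write fails.
        os.remove(path)
    return total_mb / elapsed


if __name__ == "__main__":
    rate = sequential_write_mb_per_sec(tempfile.gettempdir())
    print(f"Sequential write: {rate:.1f} MB/s")
```

A file this small mostly exercises caches, so treat the number as a smoke test only. A meaningful test writes and reads files larger than the server's memory, from several processes at once, on the actual file systems that SAS will use.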

Availability

Deploying SAS applications into a business-critical environment, or one with availability requirements such as a Service Level Agreement, requires careful attention to the type and configuration of storage used. Since SAS is implemented on the host OS file systems, commonly used high availability strategies can be used effectively. From simple strategies, such as configuring local storage using RAID mirroring, to more complex enterprise-class solutions, such as redundancy through a SAN, the appropriate level of high availability can be designed and deployed to ensure that the storage meets the needs of the business.
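Even the simplest availability strategy changes the price. Here is a minimal Python sketch, assuming plain mirroring in which every block is stored on two drives, that reuses the 5 TB and $150 figures from the price scenario above. The function names are illustrative, and the calculation ignores controllers, enclosures, and spare drives.

```python
import math


def drives_needed(usable_tb: float, drive_tb: float, copies: int = 2) -> int:
    """Number of drives required when every block is stored `copies` times."""
    raw_tb = usable_tb * copies
    return math.ceil(raw_tb / drive_tb)


def drive_cost(usable_tb: float, drive_tb: float, price_per_drive: float,
               copies: int = 2) -> float:
    """Cost of the drives alone for a mirrored configuration."""
    return drives_needed(usable_tb, drive_tb, copies) * price_per_drive


if __name__ == "__main__":
    # 5 TB usable on mirrored 5 TB drives at $150 each.
    print(drives_needed(5, 5), drive_cost(5, 5, 150.0))
```

Mirroring alone doubles the number of drives for the same usable capacity. That is the price and availability tradeoff in its smallest form.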

So how does all this fit together?

As you can see, none of these criteria should be considered independently of the others when designing and evaluating storage solutions for SAS environments. There will be tradeoffs made in the evaluation process, and priorities will be established. For some areas, such as performance, there are guidelines established by SAS R&D. In other areas, specific needs of the customer (a specific SLA, for example) may dictate specific design decisions. In addition, there’s some flexibility in certain areas – filesystems containing SAS permanent data should be allocated to a more available, more protected storage area than the temporary filesystem of SASWORK. A detailed analysis of the storage needs of the SAS deployment as a part of the overall architecture design will consider these three, in addition to other criteria.

 

In case you were wondering, we didn’t take the red-eye.

tags: SAS architecture, SAS Professional Services, storage

The post Evaluating tradeoffs when designing storage for SAS applications appeared first on SAS Users.