AWS

5月 022023
 

In 2019, I penned the article We’ve all gone to cloud -- but what about my SAS data(base)? At the time, containerised Viya (4) wasn’t out yet and integration with Microsoft was in its early days, so let’s revisit some of those topics and see what’s changed. There is a lot of lower-level detail out there, but I've taken writing this article as an opportunity to look at the wider landscape and the direction of our customers, partners and SAS as we move SAS applications to be cloud-native in Viya 4.

Before I dive into the details, it's worth framing this conversation with the caveat that I’m mainly regarding the typical analytical platform data use cases in SAS, whether it's analytical data engineering (ETL) or analytics or visualisation. I’ll save the discussion on real-time and SOA-type applications of SAS for another time, as there’s a lot to be said there too. This article is about the target repository for storage. A follow-up article is planned on how we get the data up to the cloud as this has been a frequent question recently.

Let’s first look at cloud-native ‘Database’ Access/Connectivity. SAS continues to enhance its support for Cloud-native databases like Snowflake, Synapse and BigQuery. Updates include further pushdown of functions, a wider support for output data types, a wider range of performance options like MODE= and Specify Projects on BigQuery for example, and single sign-on for Snowflake in Azure. There is now extensive connectivity (and in-database) support for Spark, with support for Databricks on Azure, GCP and AWS as well as Microsoft’s Synapse.

It is also important to remember that we also support a wide range of traditional databases in their cloud-native form. For example, Google Cloud Platform Cloud SQL for SQL Server or Microsoft Azure SQL Server Big Data Clusters can be accessed via our SAS/ACCESS Interface to Microsoft SQL Server. This page provides a great summary of the wide range of over 40 cloud-native databases/stores 40+ we support here, and SAS documentation has a good summary listing to aid usage and setup.

Customers will be happy to know that SAS has added direct connectivity to the parquet file format in SAS Compute (v9 compatible engine), similar to the support for ORC. This support was originally file-based only but now includes object storage for both AWS and GCP with Azure to follow in 2023. Why is this important? First of all, parquet has become the default object-based cloud storage choice for customers in the UKI based on my experience. Parquet is open so all your SAS, Python, Spark and other language programmers can simply share data, with no conversion necessary. On the SAS front, this means you can take existing v9 programs and map them directly to data stored in object storage just by ‘changing’ the libnames mapping when you run them in Viya. Parquet tables have a columnar format and can often be compressed smaller than compressed SAS datasets; I’ve often seen them at 50% of the size. However, there are still some advantages to SAS datasets. Items like SAS indexes, integrity constraints, etc. aren’t supported, so parquet is not a direct replacement for SAS datasets. This documentation page details the current restrictions on parquet support. Some of these restrictions are driven by parquet not supporting those features, other features like partitions are on SAS’ roadmap for future implementation.

Object Storage on the cloud gives us another potential option. When we move a SAS Grid forward to Viya (this functionality is now called SAS Workload Management in Viya) as data stored in object storage, it is available on all nodes just like it would be on a clustered file system. Again, you need to look at the performance characteristics and your workload to see if it will meet your requirements. The balance might be there, particularly if a lot of steps are more transitionary, i.e. you source the data from S3, some of your ETL builds into this shared area. Read this previous article on why I think the dynamics of the cloud change how you might design a SAS Grid when you move it to Viya.

A major improvement using Viya is the single native sign-on into the Azure database eco-system, enabled through our co-engineering work with Microsoft. If I’m an end user, there's no need for storing passwords when using FILENAMES to access ADLS, or connecting to databases like SQL server or Synapse. For more details, have a look at this documentation page. Although the most advanced Single sign-on features exist for SAS Viya on Microsoft Azure, SAS has added IAM integration points on AWS, for example with S3 and Redshift. Speaking to Product Management, the security integration improvements will continue further in AWS and GCP in the near future. Keep your eyes peeled for updates over this year!

One other key item to mention is SAS' recent work with SingleStore, which will give customers the option to add SingleStore as a dedicated analytics Database. Our customers get an independent data and analytics solution that works at scale and is completely portable including on-premise. The foundation for this is tight integration between SAS Viya and SingleStore, which includes items like the tiering of storage costs, security, streaming into the database, with instant analytics and interactions/transfers to SAS including items like the pushdown of SAS functionality into the database (specialist Visual Analytics interactions, for example). Because we OEM, there is a single stop for technical queries giving you accountability and access to world-class support. Evidently, we have customers deploying this and seeing considerable TCO reduction versus traditional bets in breed solutions.

As two final footnotes, not everyone will be moving to the cloud in the immediate future and SAS will continue its strong support for on-premise deployments as illustrated by the support for Kubernetes on-site and being able to use newer file formats like parquet in these deployments. Equally, if you’ve moved to the cloud but not all your SAS data has arrived SAS Cloud Data Exchange is now available again on the 2023.3 release of Viya with some net new functionality.

Learn more

Moving SAS and its data to the cloud (AKA Viya): Making those tricky data decisions was published on SAS Users.

9月 042019
 

Editor’s note: This article is a continuation of the series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. Access all the articles in the series here.

In a previous article in this series, Accessing Databases in the Cloud – SAS Data Connectors and Amazon Web Services, I covered SAS and database as a service (DBaaS) and storage offerings from Amazon Web Services (AWS). Today, I cover the various storage options available on AWS and how connect to and interact with them from SAS.

Object Storage

Amazon Simple Storage Service (S3) is a low-cost, scalable cloud object storage for any type of data in its native format. Individual Amazon S3 objects can range in size from 1 byte all the way to 5 terabytes (TB). Amazon S3 organizes these objects into buckets. A bucket is globally unique. You access the bucket directly through an API from anywhere in the world, if granted permissions. The default granted to the bucket is least access. Amazon advertises 11 9’s, or 99.999999999% of durability, meaning that you never lose your data. Data replicates automatically across availability zones to meet this durability. You can reduce the number of replicants or use one of the various tiers of archive services to reduce your object storage cost. Costs are calculated based on terabytes of storage per month with added costs for request and transfers of data.

SAS and S3

Support for Amazon Web Services S3 as a Caslib data source for SAS Cloud Analytic Services (CAS) was added in SAS Viya 3.4. This data source enables you to access SASHDAT files and CSV files in S3. You can use the CASLIB statement or the table.addCaslib action to add a Caslib for S3. SAS is currently exploring native object storage integration with AWS S3 for more file types. For other file types you can copy the data from S3 and then use a SAS Data Connector to load the data into memory. For example, if I had Excel data in S3, I could use PROC S3 to copy the data locally and then load the data into CAS using the SAS Data Connector to PC Files.

Block Storage

Amazon Elastic Block Store (EBS) is the block storage service designed for use with Amazon Elastic Compute Cloud (EC2). Only when attached to an operating system is the storage class accessible. Storage volumes can be treated as an independent disk drive controlled by a server operating system. You would mount an EBS volume to an operating system as if it were a physical disk. EBS volumes are valuable because they are the storage that will persist when you terminate your compute instance. You can choose from four different volume types that supply performance levels at corresponding costs.

SAS and EBS

EBS is used as the permanent SAS data storage and persists through a restart of your SAS environment. The performance choices made when selecting from the different EBS volume type will have a direct impact on the performance that you get from SAS. One thing to consider is using compute instances that have enhanced EBS performance or dedicated solid state drive instance storage. For example, the SAS Viya on AWS QuickStart uses Storage Optimized and Memory Optimized compute instances with local NVMe-based SSDs that are physically connected to the host server that is coupled to the lifetime of the instance. This is beneficial for performance.

SAS Cloud Analytic Services (CAS) is an in-memory server that relies on the CAS Disk Cache as the virtual memory storage backend. This is especially true if you are reading data from a database. In this case, make sure you have enough block storage, in the form of EBS volumes for use as the CAS Disk Cache.

File Storage

Amazon Elastic File System (EFS) provides access to data through a shared file system. EFS is an elastic network file system that grows and shrinks as you add or remove files, so you only pay for the storage you consume. Users create, delete, modify, read, and write files organized logically in a directory structure for intuitive access. This allows simultaneous access for multiple users to a common set of file data managed with user and group permissions. Amazon FSx for Lustre is the high-performance file system service.

SAS and EFS

EFS shared file system storage can be a powerful tool if utilizing a SAS Grid architecture. If you have a requirement in your SAS architecture for a shared location that any node in a group can access and write to, then EFS could meet your requirement. To access the data stored in your network file system you will have to mount the EFS file system. You can mount your Amazon EFS file systems to any EC2 instance, or any on-premises server connected to your Amazon VPC.

BONUS: Serverless

Amazon Athena is query service for Amazon S3. This service makes it easy to submit queries against the objects stored in S3. You can run analysis on this data using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run. Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet.

SAS and Athena

Amazon Athena is ODBC/JDBC compliant which means I can use SAS/ACCESS Interface to ODBC or SAS/ACCESS Interface to JDBC to connect using SAS. Download an Amazon Athena ODBC driver and submit code from SAS just like you would any ODBC data source. Athena is a great tool if you want to use the serverless computing power of Amazon to query data in S3.

Finally

Many times, we do not have a choice of technologies we use and infrastructures on which they sit. Luckily, if you use AWS, integration with SAS is not a concern. I’ve now covered databases and storage for AWS. In future articles, I’ll cover the same topics for Microsoft Azure and Google Cloud Platform.

Additional Resources

Storage in the Cloud – SAS and Amazon Web Services was published on SAS Users.

8月 222019
 

Editor’s note: This is the first article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Amazon Web Services.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic and responds to demand while only paying for what you use. As more technologies moves to a cloud-based architecture, companies must consider questions like: Where is my data going to be stored? Do I want a hybrid solution? What cloud storage options do I have? What storage solution best meets my business needs?. Another question requiring an answer is: Is the software I use cloud-ready?. The answer in the case of SAS is, YES! SAS offers various cloud deployment patterns on various cloud providers and supports integration with cloud storage services.

This is part one in a series covering database as a service (DBaaS) and storage offerings from Amazon Web Services (AWS). Microsoft Azure and Google Cloud Platform will be covered in future articles. The goal is to supply a breakdown of these services to better understanding the business requirements of these offerings and how they relate to SAS. I will focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya are called SAS Data Connect Accelerators and are synonymous with the SAS Embedded Process.

SAS integration with AWS

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Amazon Web Services database variants. A database running in AWS is much like your on-premise database, but instead Amazon is managing the software and hardware. Amazon’s DBaaS offerings take care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premise or with a different cloud provider.

AWS databases

Amazon offers database service technologies familiar to users. It is important to understand the new terminology and how the different database services best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in AWS. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand and software that updates and create backups automatically means customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article I cover various database management systems, the AWS offering for each database type, and SAS integration. First let's consider the diagram below depicting a decision flow chart to determine integration points between AWS database services and SAS.

Integration points between AWS database services and SAS

Trace you path in the diagram and read on to learn more about connection details.

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two methods serve two distinct purposes and are optimized depending in how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries usually have minimal complexity, in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input information, processing the data and updating existing data to reflect the collected and processed information. Transactional databases prevent two users accessing the same data concurrently. Examples include order entry, retail sales, and financial transaction systems. Amazon offers several types of transactional database services. You can organize Amazon Relational Database Service (RDS) into three categories: enterprise licenses, open source, and cloud native.

Enterprise License

Many customers already have workloads built around enterprise databases. Amazon provides a turn-key enterprise solution for customers not looking to break their relationship with enterprise vendors or refactor their existing workflows. AWS offers Oracle and Microsoft SQL Server as a turn-key enterprise solution in RDS. Both offerings include the required software license, however Oracle also allows you to “Bring Your Own License” (BYOL). SAS has extended SAS Data Connector support for both cloud variants. You can use your existing license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with these RDS databases.

Remember you can install and manage your own database on a virtual machine if there is not an available database as a service offering. The traditional backup and update responsibilities are left to the customer in this case. For example, both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata are supported for Teradata installed on AWS.

Open Source

Amazon provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. SAS has extended SAS Data Connector support for all these cloud variants. You can use your existing license for SAS Data Connector to MYSQL to connect to either RDS MYSQL or RDS MariaDB and SAS Data Connector to PostgreSQL to interface with RDS PostgreSQL.

Cloud Native

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Amazon Aurora. You can use your existing license for SAS Data Connector to MYSQL to connect to either Aurora MYSQL or and SAS Data Connector to PostgreSQL to interface with Aurora PostgreSQL.

Analytical Databases (OLAP)

Analytical Databases optimize on read performance. These databases work best from complex queries in smaller volume. When working with an analytical database you are typically doing analysis on multidimensional data interactively from multiple perspectives. Redshift is the analytical database service offered by Amazon. SAS has a dedicated product called SAS Data Connector to Amazon Redshift that was purpose built for analytics workloads running in the Amazon cloud.

NoSQL Databases

A non-relational or NoSQL database is any database not conforming to the relational database model. These databases are more easily scalable to a cluster of machines. NoSQL databases are a more natural fit for the cloud because the loose dependencies make the data easier to distribute and scale. The different NoSQL databases are designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. Here is a brief overview of the most common data structures.

Key-Value Database

A key-value database stores data as a collection of key-value pairs. The key acts as a unique identifier for each record. Amazon’s key-value database as a service is DynamoDB. SAS interacts with DynamoDB using industry standard ODBC or JDBC drivers.

Columnar Database

Data in a traditional relational database is sorted by rows. The alternative columnar databases optimize by sorting data quickly using columns, saving valuable time and network I/O. Redshift is the columnar database service offered by Amazon. SAS has a dedicated product called SAS Data Connector to Amazon Redshift that was purpose built for this database.

Document Database

A document database queries data in documents typically stored in JSON format. DocumentDB is the document database service offering from Amazon. SAS interacts with DocumentDB using industry standard ODBC or JDBC drivers. DocumentDB is MongoDB-compatible which means existing MongoDB drivers and tolls work with DocumentDB. Currently SAS is building out functionally to support SAS Data Connector to MongoDB and you should expect that to expand further into DocumentDB as well.

Graph Database

Amazon Neptune is the graph database service designed to work with a complex hierarchy of interconnected data. These design of these types of databases queries relationships in data and reduce the number of table joins. SAS interacts with Amazon Neptune using industry standard ODBC or JDBC drivers.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Amazon offers Elastic Map Reduce as their Hadoop as a service offering. Amazon Elastic supports both SAS Data Connector to Hadoop and SAS Data Connect Accelerator for Hadoop.

Finally

It is important to think about the use case for your database and the type of data that you plan to store before you select an AWS database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases always remember that you will be charged for the storage that you use but also for the data that you move out of the database. To do analysis and reporting on your data may require data transfer. Be aware of these costs and think about how you can lower these costs by keeping frequently accessed data cached somewhere or remain on-premise.

Additional Resources

  1. Support for Databases in SAS® Viya® 3.4
  2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Amazon Web Services was published on SAS Users.

6月 192019
 

As a data scientist, did you ever come to the point where you felt the need for an evolved analytics platform bringing together the disparate skills of open source and commercial software? A system that can enable advanced analytic capabilities. This is now possible and easy to implement. With many deployment possibilities, SAS Viya allows you to choose the data storage location where compute happens, and the deployment methods for models.

Let’s say you want to expand your model development process with SAS Viya analytical capabilities and you don’t want to wait for getting such environment up and running. Unfortunately, you have no infrastructure, nor the experience to install SAS Viya. Moving the traditional way, you could go for:

  • Protracted hardware procurement and provisioning
  • Deployment planning and coordination with IT
  • Effort and time required for software installation/configuration

This solution may be the right path for many organizations, but I think we all recognize this: the traditional approach could take days, weeks and yes sometimes months.

What if you could get up and running with a full SAS Viya platform in two hours? If you have some affinity for cloud-based solutions, SAS offers you the AWS SAS Viya Cloud Rapid Deployment tool. SAS released this AWS Quick Start as a rapid deployment architecture for SAS Viya on AWS. Deployable products include SAS Visual Data Mining and Machine Learning, SAS Visual Statistics and SAS Visual Analytics.

The goal of this article is to brief you how I launched such an AWS SAS Viya Quickstart. I strongly advise you to watch this related video by my colleague Erwan Granger. Much of what is covered here appears in Erwan's video. The recording predates the SAS Viya 3.4 release, but main concepts are still the same.

What you will need

The following is a list of items you need to complete this task.

  • AWS Account with appropriate creation privileges
  • A valid SAS Viya License; this means you will need a SAS Software Order Confirmation e-mail
  • Optional: you deploy with your own DNS Name and SSL Certificate. In that case you need to register a domain managed by Amazon Route 53. For instructions on registering the domain, see the Route 53 documentation. And you can request and register a certificate with AWS Certificate Manager.

Furthermore, it’s good to know this Quick Start provides two deployment options. You can deploy SAS Viya into a new Virtual Private Cloud (VPC) or into an existing VPC. The first option builds a new AWS environment consisting of the VPC, private and public subnets, NAT gateways, security groups, Ansible controllers, and other infrastructure components, and then deploys SAS Viya into this new VPC. The second option provisions SAS Viya in your existing AWS infrastructure. I decided to go for the first option.

What you will build

Here's an architectural overview of what we will build:

SAS Viya architecture on the AWS Cloud

You can find exactly the same architecture on the SAS Viya AWS Quick Start landing page.

Configure the build

We’ll be following the build process outlined in the Quick Start guide. On the landing page, next to the "What you’ll build tab" you can click on "How to deploy". From there launch the "Deploy into a new VPC" wizard.

Deploy into a new VPC wizard

Prerequisite prep

Make sure you sign in with your AWS account and you have chosen the region where you want to deploy. On that first screen you can leave the Amazon S3 template URL default. That template is the basics for the AWS CloudFormation we are launching. CloudFormation is a tool from AWS that allows you to spin up resources in the right order. The template is the blueprint document for your CloudFormation. By keeping the default template, we will build exactly the architecture displayed above.

Pre-req prep template

Now click "Next" and move to the page where we can specify more details and the required parameters of the CloudFormation parameters.

Cloudfourmation parameters

The first parameter is the SAS Viya Software Order file, which is the Amazon S3 location of the Software Order e-mail attachment.

SAS Viya install package location

In the Administration section, you provide parameters to configure your AWS architecture. That way, you control access, instance type, and if you will use a SAS Viya Mirror repository.

CloudFormation administration parameters

Administration parameter definitions:

  • The name of an Amazon EC2 key pair, so you can access the Ansible controller
  • The Amazon Availability zone for the public and private subnet
  • Allowable IP range for HTTP traffic; must be a valid IP CIDR range
  • Allowable IP Range for SSH traffic to the Ansible controller; must be a valid IP CIDR range
  • SAS Administrator password
  • Password for Default (sasuser) user
  • Amazon EC2 Instance type for CAS Compute VM
  • Amazon EC2 Instance type for SAS Viya Services VM
  • (Optional) Location of SAS Viya Deployment Repository data
  • (Optional) Operator Email

If you want to work with custom DNS names and SSL, you will need to provide the next three parameters as well.

DNS and SSL configuration (optional)

DNS and SSL parameters:

You may accept the defaults on the remaining parameters.

Optional parameters

After clicking "Next" another set of optional parameters are available. I mostly go with accepting the default parameters provided. The lone exception is the Rollback on failure.

Optional administration parameters

Based on what I’ve learned from Erwan's video, the safer choice is "No" on the Rollback option. This way, if the deployment process encounters issues, the log will identify in which step the error occurred. Of course this means you are responsible to manually delete AWS created resources that are not longer necessary. The easiest way to do this is by deleting the CloudFormation Stacks afterward.

Kick off the build

To conclude the deployment wizard, click "Next" once more and acknowledge the necessary AWS resources to create. By clicking "Create stack" the deployment process starts.

Start the build process

You can monitor the deployment log using AWS CloudWatch. In his video, Erwan demonstrates this at around minute 23.

After a successful formation you will find two AWS CloudFormation Stacks created. The Outputs gives you the direct links to SAStudioV and SASDrive.

SAS Studio and SAS Drive stacks

That’s it. You are deployed and ready to begin using your SAS Viya environment!

Additional Reference

Alexander Koller writes about SAS on AWS and takeaways for preparing for the AWS associate solution architect exam.

Your experiences and opinion matter

New forces are shaping the analytics ecosystem. Because of increased competition, rise in customer expectations and new, emerging technology such as AI and Machine Learning, challenging IT departments with evolving their analytic ecosystems to meet the demands of their business partners.

How is your organization doing this? How does your Analytics Cloud strategy compare to the market? And what do your peers think about migrating Analytics to the cloud? We can give you some insights and an industry benchmark on the topic.

Tell us about your experience in this 5 minute survey and we will be happy to share a detailed industry insight report with you, to answer these questions.

Deploy SAS Viya on AWS - Quick Start was published on SAS Users.