cloud db & storage

November 5, 2019
 

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Microsoft Azure. Access all the articles in the series here.

The series

This is the next iteration of a series covering database as a service (DBaaS) and storage offerings in the cloud, this time from Microsoft Azure. I have already published two articles on Amazon Web Services. One of those articles covers the DBaaS offerings and the other covers storage offerings for Amazon Web Services. I will cover Google Cloud Platform in future articles. The goal of these articles is to supply a breakdown of these services to better understand the business requirements of these offerings and how they relate to SAS. I would encourage you to read all the articles in the series even if you are already using a specific cloud provider. Many of the core technologies and services are offered across the different cloud providers. These articles focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya, called the SAS Data Connect Accelerator, are synonymous with the SAS Embedded Process.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic: it responds to demand, and you pay only for what you use. As more technologies move to a cloud-based architecture, companies must consider questions like: Where do I store my data? What cloud services best meet my business requirements? Which cloud vendor should I use? Can I migrate my applications to the cloud? If you are looking to migrate your SAS infrastructure to Azure, look at the SAS Viya QuickStart Template for Azure to see a rapid deployment pattern to get the SAS Viya platform up and running in Azure.

SAS integration with Azure

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Azure database variants. A database running in Azure is much like your on-premise database, except that Microsoft manages the software and hardware. Azure’s DBaaS offerings take care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premise or with a different cloud provider.

Azure databases

Azure offers database service technologies familiar to many users. If you read my previous article on SAS Data Connectors and Amazon Web Services, you are sure to see many parallels. It is important to understand the terminology and how the different database services in Azure best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in Azure. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand, and software that updates itself and creates backups, mean customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article I cover various database management systems, the Azure offering for each database type, and SAS integration. First, let's consider the diagram below, a decision flow chart for determining the integration points between Azure database services and SAS. Trace your path in the diagram and read on to learn more about connection details.

Integration points between Azure database services and SAS
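
As a rough text companion to the flow chart, the pairings named in the sections of this article can be written as a simple lookup. This is only a sketch: the service and connector names come from the article, while the dictionary layout and the find_connector() helper are hypothetical. A result of None means the article names no formally supported connector for that service.

```python
# Service and connector names are taken from this article; the dictionary
# layout and the find_connector() helper are hypothetical illustrations.
CONNECTORS = {
    "SQL Server on Virtual Machines": "SAS Data Connector to Microsoft SQL Server",
    "Azure Database for MySQL": "SAS Data Connector to MySQL",
    "Azure Database for PostgreSQL": "SAS Data Connector to PostgreSQL",
    "Azure SQL Database": "SAS Data Connector to Microsoft SQL Server",
    "Azure SQL Data Warehouse": "SAS Data Connector to ODBC",
    "Azure Cosmos DB": "SAS Data Connector to ODBC",
    "Azure HDInsight": "SAS Data Connector to Hadoop",
}


def find_connector(service):
    """Return the connector this article names for a service, or None."""
    return CONNECTORS.get(service)


print(find_connector("Azure SQL Database"))
print(find_connector("Azure Database for MariaDB"))  # not formally supported yet
```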

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two groups serve distinct purposes, and each is optimized for how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries usually have minimal complexity but arrive in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input information, processing the data and updating existing data to reflect the collected and processed information. Transactional databases use transactions and locking to keep two users from making conflicting changes to the same data at the same time. Examples include order entry, retail sales, and financial transaction systems. Azure offers several types of transactional database services. You can organize the Azure transactional database services into three categories: enterprise licenses, open source, and cloud native.
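
To make the order entry example concrete, here is a minimal sketch of one short transaction, using Python's built-in sqlite3 module as a stand-in for a transactional database. The table names, column names, and the place_order() helper are hypothetical. The point is only that the order row and the stock update succeed or fail together.

```python
import sqlite3


def place_order(conn, item, qty):
    """Record an order and reduce stock as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty)
            )
            cur = conn.execute(
                "UPDATE stock SET on_hand = on_hand - ? "
                "WHERE item = ? AND on_hand >= ?",
                (qty, item, qty),
            )
            if cur.rowcount != 1:
                # Not enough stock: raising here also undoes the INSERT above.
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False


# Hypothetical schema and data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, on_hand INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('widget', 5)")
conn.commit()

print(place_order(conn, "widget", 3))  # enough stock, order is recorded
print(place_order(conn, "widget", 3))  # only 2 left, order is rolled back
```

After both calls, the orders table holds a single row and the stock stands at 2, because the second order was rolled back as a unit.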

Enterprise License

Many customers have workloads built around an enterprise database. Azure is an interesting use case because Microsoft is also a traditional enterprise database vendor. Amazon, for example, does not have existing on-premise enterprise database customers. Oracle Cloud is the other big player in the enterprise market looking to migrate existing customers to its cloud. Slightly off topic, but possibly of interest to some: SAS does support customers running their Oracle database on Oracle Cloud Platform using the SAS Data Connector to Oracle. Azure offers a solution for customers looking to continue their relationship with Microsoft without refactoring their existing workflows. Customers bring an existing enterprise database license to Azure and run SQL Server on Virtual Machines. SAS has extended SAS Data Connector support for SQL Server on Virtual Machines. You can also use your existing SAS license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with SQL Server on Virtual Machines.

Remember you can install and manage your own database on a virtual machine. For example, support for both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata is available for Teradata installed on Azure. If there is not an available database as a service offering, the traditional backup and update responsibilities are left to the customer.

SQL Server Stretch Database is another service available in Azure. If you are not prepared to add more storage to your existing on-premise SQL Server database, you can add capacity using the resources available in Azure. SQL Server Stretch will scale your data to Azure without having to provision any more servers on-premise. New SQL Server capacity will be running in Azure instead of in your data center.

Open Source

Azure provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. You can use your existing SAS license for SAS Data Connector to MySQL to connect to Azure Database for MySQL, and SAS Data Connector to PostgreSQL to interface with Azure Database for PostgreSQL. SAS does not yet formally support Azure Database for MariaDB. MariaDB is a variant of MySQL, so validation of support for SAS Data Connector is coming soon. If you need support for Azure Database for MariaDB, please comment below and I will share your feedback with product management and testing.

Cloud Native

Azure SQL Database is an iteration of Microsoft SQL Server built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Azure SQL Database. You can use your existing license for SAS Data Connector to Microsoft SQL Server to connect to Azure SQL Database.

Analytical Databases (OLAP)

Analytical databases are optimized for read performance. These databases work best with complex queries in smaller volumes. When working with an analytical database you are typically doing analysis on multidimensional data interactively from multiple perspectives. Azure SQL Data Warehouse is the analytical database service offered by Azure. The SAS Data Connector to ODBC combined with a recent version of the Microsoft-supplied ODBC driver is currently the best way to interact with Azure SQL Data Warehouse. Look for the SAS Data Connector to Microsoft SQL Server to support SQL Data Warehouse soon.
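
For readers new to ODBC, the sketch below assembles the kind of connection string the Microsoft ODBC driver accepts. The keywords (Driver, Server, Database, Uid, Pwd, Encrypt, TrustServerCertificate) are standard for that driver. The helper name, the server and database names, and the credentials are hypothetical, and the driver name must match the version actually installed. How a DSN or connection string is handed to the SAS Data Connector to ODBC depends on your SAS configuration and is not shown here.

```python
def azure_sql_odbc_string(server, database, user, password,
                          driver="ODBC Driver 17 for SQL Server"):
    """Build a connection string for the Microsoft ODBC driver.

    Assumes the values contain no ';' or '}' characters, which would
    need extra escaping.
    """
    parts = [
        ("Driver", "{" + driver + "}"),
        ("Server", f"tcp:{server},1433"),  # 1433 is the default SQL Server port
        ("Database", database),
        ("Uid", user),
        ("Pwd", password),
        ("Encrypt", "yes"),
        ("TrustServerCertificate", "no"),
    ]
    return ";".join(f"{key}={value}" for key, value in parts) + ";"


# Hypothetical server, database, and credentials.
print(azure_sql_odbc_string(
    "mywarehouse.database.windows.net", "salesdw", "sasuser", "secret"
))
```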

NoSQL Databases

A non-relational or NoSQL database is any database not conforming to the relational database model. These databases are more easily scalable to a cluster of machines. NoSQL databases are a more natural fit for the cloud because the loose dependencies make the data easier to distribute and scale. Each type of NoSQL database is designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. If you want a brief overview of these database structures, I cover them in my AWS database blog.

For Microsoft Azure, Azure Cosmos DB is the option available for NoSQL databases. Cosmos DB is multi-model, meaning you can build out your databases to fit the NoSQL model you prefer. Use the SAS Data Connector to ODBC to interact with your data in Azure Cosmos DB.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Azure offers HDInsight as its Hadoop as a service offering. Azure HDInsight supports both SAS Data Connector to Hadoop and SAS Data Connect Accelerator for Hadoop.

Finally

It is important to think about the use case for your database and the type of data you plan to store before you select an Azure database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases, remember that you will be charged for the storage you use and for the data that you move out of the database. Performing analysis and reporting on your data may require data transfer. Be aware of these costs and think about how you can lower them by caching frequently accessed data or keeping it on-premise. Another strategy I’ve seen becoming more popular is taking advantage of the SAS Micro Analytics Service to move the models you have built to run in the cloud provider where your data is stored. Data transfer is cheaper if that data moves between cloud services instead of outside of the cloud provider. Micro Analytics Service allows you to score the data in place, without moving it out of the cloud provider and without having to install SAS.
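
To see why data movement matters, here is a back-of-the-envelope sketch of the two charges mentioned above: storage held and data moved out. The rates and volumes are hypothetical placeholders, not Azure prices, so check your provider's price list before drawing conclusions.

```python
def monthly_cost(stored_gb, egress_gb, storage_rate, egress_rate):
    """Estimate a monthly bill from storage held and data moved out."""
    return stored_gb * storage_rate + egress_gb * egress_rate


# Hypothetical rates in dollars per GB; real prices vary by service and region.
STORAGE_RATE = 0.10
EGRESS_RATE = 0.08

# Same 500 GB stored; the difference is how much data leaves the provider.
pull_everything = monthly_cost(500, 2000, STORAGE_RATE, EGRESS_RATE)
score_in_place = monthly_cost(500, 50, STORAGE_RATE, EGRESS_RATE)

print(f"pull data out for analysis: ${pull_everything:.2f}")
print(f"score in place, move results only: ${score_in_place:.2f}")
```

With these made-up numbers the storage charge is identical in both cases, and the whole difference comes from the volume of data transferred out.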

Additional Resources
1. Support for Databases in SAS® Viya® 3.4
2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Microsoft Azure was published on SAS Users.

September 4, 2019
 

Editor’s note: This article is a continuation of the series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. Access all the articles in the series here.

In a previous article in this series, Accessing Databases in the Cloud – SAS Data Connectors and Amazon Web Services, I covered SAS and database as a service (DBaaS) offerings from Amazon Web Services (AWS). Today, I cover the various storage options available on AWS and how to connect to and interact with them from SAS.

Object Storage

Amazon Simple Storage Service (S3) is low-cost, scalable cloud object storage for any type of data in its native format. Individual Amazon S3 objects can range in size from 1 byte all the way to 5 terabytes (TB). Amazon S3 organizes these objects into buckets. Each bucket name is globally unique. You access the bucket directly through an API from anywhere in the world, if you have been granted permission. By default, a bucket grants the least access. Amazon advertises 11 9’s, or 99.999999999%, of durability, meaning that losing an object is extremely unlikely. Data replicates automatically across availability zones to meet this durability. You can reduce the number of replicas or use one of the various archive tiers to reduce your object storage cost. Costs are calculated based on terabytes of storage per month, with added costs for requests and transfers of data.
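
To put the durability figure in perspective, the arithmetic below treats 99.999999999% as an annual per-object survival rate and treats objects as independent. Both are simplifying assumptions made for illustration. The object count and the function name are hypothetical.

```python
from fractions import Fraction

# 99.999999999% annual durability, written exactly as a fraction.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_objects_lost_per_year(object_count):
    """Expected number of objects lost in a year at the stated durability."""
    return object_count * (1 - DURABILITY)


loss = expected_objects_lost_per_year(10_000_000)
print(loss)      # expected objects lost per year, as an exact fraction
print(1 / loss)  # years per single expected loss
```

Under these assumptions, storing ten million objects works out to an expected loss of one object roughly every 10,000 years.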

SAS and S3

Support for Amazon Web Services S3 as a Caslib data source for SAS Cloud Analytic Services (CAS) was added in SAS Viya 3.4. This data source enables you to access SASHDAT files and CSV files in S3. You can use the CASLIB statement or the table.addCaslib action to add a Caslib for S3. SAS is currently exploring native object storage integration with AWS S3 for more file types. For other file types you can copy the data from S3 and then use a SAS Data Connector to load the data into memory. For example, if I had Excel data in S3, I could use PROC S3 to copy the data locally and then load the data into CAS using the SAS Data Connector to PC Files.

Block Storage

Amazon Elastic Block Store (EBS) is the block storage service designed for use with Amazon Elastic Compute Cloud (EC2). The storage is accessible only when it is attached to an operating system. Storage volumes can be treated as independent disk drives controlled by a server operating system. You would mount an EBS volume to an operating system as if it were a physical disk. EBS volumes are valuable because they are the storage that will persist when you terminate your compute instance. You can choose from four different volume types that supply performance levels at corresponding costs.

SAS and EBS

EBS is used as the permanent SAS data storage and persists through a restart of your SAS environment. The performance choices made when selecting from the different EBS volume types will have a direct impact on the performance that you get from SAS. One thing to consider is using compute instances that have enhanced EBS performance or dedicated solid state drive instance storage. For example, the SAS Viya on AWS QuickStart uses Storage Optimized and Memory Optimized compute instances with local NVMe-based SSDs. These drives are physically connected to the host server, which is beneficial for performance, but their contents last only as long as the instance.

SAS Cloud Analytic Services (CAS) is an in-memory server that relies on the CAS Disk Cache as the virtual memory storage backend. This is especially true if you are reading data from a database. In this case, make sure you have enough block storage, in the form of EBS volumes, for use as the CAS Disk Cache.

File Storage

Amazon Elastic File System (EFS) provides access to data through a shared file system. EFS is an elastic network file system that grows and shrinks as you add or remove files, so you only pay for the storage you consume. Users create, delete, modify, read, and write files organized logically in a directory structure for intuitive access. This allows simultaneous access for multiple users to a common set of file data managed with user and group permissions. Amazon FSx for Lustre is the high-performance file system service.

SAS and EFS

EFS shared file system storage can be a powerful tool if utilizing a SAS Grid architecture. If you have a requirement in your SAS architecture for a shared location that any node in a group can access and write to, then EFS could meet your requirement. To access the data stored in your network file system you will have to mount the EFS file system. You can mount your Amazon EFS file systems to any EC2 instance, or any on-premises server connected to your Amazon VPC.

BONUS: Serverless

Amazon Athena is a query service for Amazon S3. This service makes it easy to submit queries against the objects stored in S3. You can run analysis on this data using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run. Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet.

SAS and Athena

Amazon Athena is ODBC/JDBC compliant which means I can use SAS/ACCESS Interface to ODBC or SAS/ACCESS Interface to JDBC to connect using SAS. Download an Amazon Athena ODBC driver and submit code from SAS just like you would any ODBC data source. Athena is a great tool if you want to use the serverless computing power of Amazon to query data in S3.

Finally

Many times, we do not have a choice of technologies we use and infrastructures on which they sit. Luckily, if you use AWS, integration with SAS is not a concern. I’ve now covered databases and storage for AWS. In future articles, I’ll cover the same topics for Microsoft Azure and Google Cloud Platform.

Storage in the Cloud – SAS and Amazon Web Services was published on SAS Users.