database engines

11月 052019
 

"Moving to cloud" is top of agenda for a lot of the customers I meet. They see the potential for agility or cost reduction. Interestingly when I was speaking to the CTO of one our customers that was an earlier adopter of cloud, he didn’t see an overall cost reduction and didn’t mind. But they did experience increased agility in IT which directly lead to business improvements and an impact on the bottom line. Food for thought.

One of the biggest costs of running an analytics system can be the storage or database cost, whether through typical databases like Oracle, DB2,  Hadoop, or Teradata or just SAN alongside your SAS implementation. Getting the storage equation right on cloud can bring your TCO (Total Cost of Ownership) right down; some of our customers have halved it.

When moving to a public cloud some customers just assume that they can "lift and shift" and still get the full advantages of cloud. Unfortunately often this isn’t true -- some small refactoring changes nearly always make sense.  Often the virtual hardware on the cloud doesn’t have the same performance criteria as on-premise which means your SAS jobs will run slower when retrieving data. (Buyer beware: some instance types have lower I/O throughput.) To get faster data I/O (throughput) you might need to boost your SAS instance types which increase overall cloud costs. This can also be true for storage types too, so you may end up with more storage volume than you need if you are to guarantee a level of I/O throughput. (For more tech details, Conor Hogan, SAS Solutions Architect, describes the cloud and storage options when using SAS in the cloud.)

In this article I'll cover two main themes. First, thinking differently with SAS Grid storage on cloud, especially with regard to databases. Then, I'll look at getting access to that troublesome data that has stayed on-premise, addressing security and data-on-cloud including cloud caches.

Databases on cloud: do you need one?

I often get asked what database(s) do you support on cloud, and how does the access pattern work. To learn which databases are supported, view the current list for SAS Viya and the current list for SAS 9.4. Most of the traditional databases work fine as infrastructure-as-a-service, and follow standard rules. It starts to get more interesting as customers use the native databases in AWS, Google Cloud and Azure. These each have their own nuances, which customers didn’t quite understand at the start of the journey.  (SAS recently released an access engine that uses native Google BigQuery APIs, with great performance.)

Prior to this customers had been accessing cloud data via ODBC or JDBC, which have limitations in comparison to the new native access engines. However it’s important to note that JDBC was enhanced recently to open up more options to cloud native databases, giving SAS customers a platform-neutral option to try a number of the new native cloud database players.

We also added support for Snowflake (a scalable cloud native database) and will be furthering our support for Azure native databases like HDInsights.  SAS 9.4M6 (or higher) or SAS Viya 3.4 (or higher) is required for SAS/ACCESS Interface to Google BigQuery or Snowflake.

Incidentally a number of the database connectors on SAS Viya also support multi-node access which allows each node in SAS Viya to read/write data in parallel to your database, massively speeding up your loading times. There’s a useful article on the SAS Communities that tells you more.

There are quite a few good papers on our conference proceedings covering connecting to cloud databases like Redshift here or within SAS communities like this example to Amazon RDS.

But do you need actually need a database? You could just use the storage on the instances, but that might not be as efficient if you are wanting to scale and take machines up and down in the cloud. You might want to use lower cost persistent storage like AWS S3.  Luckily, SAS has started to build out its options for reaching into native-cloud storage buckets like S3 in Amazon, with the ability in SAS Viya to read and write directly in parallel without any additional processing layers required. (caveats exist.), have a look here for more details. Storage on S3 is cheap, but the catch with cloud is that moving data up and down the wire between on- and off-premise might be expensive.

Many SAS Grid Computing users are considering migrating to the cloud, but wonder about what to do with their Clustered File system (CFS), mainly used as a central store, to store SAS data sets in between processing steps, including SASWORK. There are Cloud Clustered File Systems available (and they can be expensive), but because we are moving to cloud we need to think a bit more laterally, more cloud native and decouple work steps to take advantage of cloud design principles.

Let’s look at an example. The diagram shows a typically data management or analytics data flow, that could be running at scale every day in a SAS grid. The great thing about this design  is it allows recovery if a node goes from the last step, as the data is written to a file system that all the nodes can see (SASWORK on a CFS), but has the disadvantage of high cost (relatively on cloud).  One way to overcome this could be to re-factoring your design. This could be relatively simple. Let’s look at two simple ways of refactoring this:

  • Don't worry about the job 'blowing up', cloud 'hardware' fails relatively infrequently on cloud. Remember Cloud resources are disposable, to restart from the beginning is very viable proposition, so just spin up a new copy on demand, maybe on a bigger instance if you need to make up time, replacing the failed machine. The later version of SAS Grid Manager for LSF from 9.4 M5 allows dynamic bursting/scaling on demand within AWS, so this is really easy to achieve.
  • Replace SASWORK with either a database or S3 or move the work into CAS on SAS Viya where node redundancy will protect your data…between steps and failure.

SAS has completed testing of Amazon FSx for Lustre, which offers a new off the shelf highly performant clustered file system suitable for grid, at a reasonable price point. However, be aware that this is sold in large blocks, and may require an even larger size to achieve sufficient throughput for a SAS Grid. We understand AWS is working to reduce these constraints.

Looking forward to SAS Viya 3.5 (4Q2019) we will be introducing further direct access data to cloud native storage sources: parquet on s3 (an extension of  the caslib concept) and CASLIBS (direct read and write access) and on Azure Data Lake Storage, Gen 2  -- supporting CSV and ORC formats for direct access.

New SAS tools to help

We covered storage transitions and considerations for design patterns when we lift and shift a SAS system to the cloud and can move most of the storage or the data to the cloud with it. This could include refactoring the workload to be more cloud friendly. This design fits with ‘data gravity’ paradigm, which implies keeping the data near the processing to reduce I/O or network latency low.  Increasingly we see cloud design patterns separating compute and processing to allow flexible processing models, including hydrating data on demand.

Unfortunately, in real life the situation isn’t so clear cut, sometimes all the data can’t be moved to the cloud due to cost, security, historical constraints or maybe because the data SAS processes originates from sources that must stay on-premise next to the primary systems they support.

Luckily SAS Viya offers us some very clever solutions to help with this…

The first of these is SAS Cloud Data Exchange (CDE) that comes as part of our SAS Data Preparation product with SAS Viya.  CDE is a data connection capability that securely performs high-volume data transfers (via https) from an on-premises data store(s) to a cloud-based instance of SAS Viya for use in SAS Viya applications.  This is done by installing an on-premise local SAS data agent that queries the local source(s) and then pulls the relevant data up to your CAS Cloud instance. This allows end users with a Graphical user Interface to reach through and get to on-premise data, while the administrators don’t have to setup multiple connections (and holes through a firewall) for each access engine to reach on-premise data. There a great summary article of how this works on SAS Communities.

One of the other key considerations we see from some customers is not leaving the data in the cloud permanently. This is where the SAS Viya caching and in-memory model come into play. Used in conjunction with CDE the CAS engine can be used as a semi-temporary store for your data -- think of it as a cache for analysts. SAS Viya’s built in scheduling capabilities could be used to pull in large tables before users arrive and the parallel loading capabilities of SAS Cloud Data Exchange also make on-demand request very feasible. This does requires some design and thought behind the strategies that are used for this in-memory cache in order to meet core situational requirements.

One of the other great advantages of this solution is that that it can also offer an effective route to access data for open source developers (a clever cache) in the cloud. CAS in conjunction with SAS Cloud Data Exchange can provide a secure and audited connectivity back to the on-premise data, and expose it through native Python or R packages that support SAS.  The python-swat package allows the Python client access to SAS Cloud Analytic Services. It allows users to execute CAS actions, access data in CAS and process the results all from Python.

In fact, python-swat mimics much of the API of the Pandas package, so that using CAS should feel familiar to current Pandas users. And if they so choose, they can download the data from CAS as required.

This rounds off my initial thoughts on considerations of data when migrating SAS to cloud -- thoughts and discussion are welcome as ever.

We’ve all gone to cloud -- but what about my SAS data(base)? was published on SAS Users.

11月 052019
 

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Microsoft Azure. Access all the articles in the series here.

The series

This is the next iteration of a series covering database as a service (DBaaS) and storage offerings in the cloud, this time from Microsoft Azure. I have already published two articles on Amazon Web Services. One of those articles covers the DBaaS offerings and the other covers storage offerings for Amazon Web Services. I will cover Google Cloud Platform in future articles. The goal of these articles is to supply a breakdown of these services to better understand the business requirements of these offerings and how they relate to SAS. I would encourage you to read all the articles in the series even if you are already using a specific cloud provider. Many of the core technologies and services are offered across the different cloud providers. These articles focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya, called the SAS Data Connect Accelerator, are synonymous with the SAS Embedded Process.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic and responds to demand while only paying for what you use. As more technologies move to a cloud-based architecture, companies must consider questions like: Where do I store my data? What cloud services best meet my business requirements? Which cloud vendor should I use? Can I migrate my applications to the cloud? If you are looking to migrate your SAS infrastructure to Azure, look at the SAS Viya QuickStart Template for Azure to see a rapid deployment pattern to get the SAS Viya platform up and running in Azure.

SAS integration with Azure

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Azure database variants. A database running in Azure is much like your on-premise database, but instead Microsoft manages the software and hardware. Azure’s DBaaS offerings takes care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premise or with a different cloud provider.

Azure databases

Azure offers database service technologies familiar to many users. If you read my previous article on SAS Data Connectors and Amazon Web Services, you are sure to see many parallels. It is important to understand the terminology and how the different database services in Azure best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in Azure. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand and software that updates and creates backups means customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article I cover various database management systems, the Azure offering for each database type, and SAS integration. First let's consider the diagram below depicting a decision flow chart to determine integration points between Azure database services and SAS. Trace you path in the diagram and read on to learn more about connection details.

Integration points between Azure database services and SAS

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two methods serve two distinct purposes and are optimized depending in how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries usually have minimal complexity, in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input information, processing the data and updating existing data to reflect the collected and processed information. Transactional databases prevent two users accessing the same data concurrently. Examples include order entry, retail sales, and financial transaction systems. Azure offers several types of transactional database services. You can organize the Azure transactional database service into three categories: enterprise licenses, open source, and cloud native.

Enterprise License

Many customers have workloads built around an enterprise database. Azure is an interesting use case because Microsoft is also a traditional enterprise database vendor. Amazon, for example, does not have existing on-premise enterprise database customers. Oracle cloud is the other big player in the enterprise market looking to migrate existing customers to their cloud. Slightly off topic, but it may be of interest to some, SAS does support customers running their Oracle database on Oracle Cloud Platform using their SAS Data Connector to Oracle. Azure offers a solution for customers looking to continue their relationship with Microsoft without refactoring their existing workflows. Customers bring an existing enterprise database licenses to Azure and run SQL Server on Virtual Machines. SAS has extended SAS Data Connector support for SQL Server on Virtual Machines. You can also use your existing SAS license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with SQL Server on Virtual Machines.

Remember you can install and manage your own database on a virtual machine. For example, support for both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata is available for Teradata installed on Azure. If there is not an available database as a service offering, the traditional backup and update responsibilities are left to the customer.

SQL Server Stretch Database is another service available in Azure. If you are not prepared to add more storage to your existing on-premise SQL Server database, you can add capacity using the resources available in Azure. SQL Server Stretch will scale your data to Azure without having to provision any more servers on-premise. New SQL Server capacity will be running in Azure instead of in your data center.

Open Source

Azure provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. You can use your existing SAS license for SAS Data Connector to MYSQL to connect to Azure Database for MYSQL and SAS Data Connector to PostgreSQL to interface with Azure Database for PostgreSQL. SAS has not yet formally supported Azure Database for MariaDB. MariaDB is a variant of MySQL, so validation of support for SAS Data Connector is coming soon. If you need support for MariaDB in Azure database, please comment below and I will share your feedback with product management and testing.

Cloud Native

Azure SQL Database is an iteration of Microsoft SQL Server built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Azure SQL Database. You can use your existing license for SAS Data Connector to Microsoft SQL Server to connect to Azure SQL Database.

Analytical Databases (OLAP)

Analytical Databases optimize on read performance. These databases work best from complex queries in smaller volume. When working with an analytical database you are typically doing analysis on multidimensional data interactively from multiple perspectives. Azure SQL Data Warehouse is the analytical database service offered by Azure. The SAS Data Connector to ODBC combined with a recent version of the Microsoft-supplied ODBC driver is currently the best way to interact with Azure SQL Data Warehouse. Look for the SAS Data Connector to Microsoft SQL Server to support SQL Data Warehouse soon.

NoSQL Databases

A non-relational or NoSQL database is any database not conforming to the relational database model. These databases are more easily scalable to a cluster of machines. NoSQL databases are a more natural fit for the cloud because the loose dependencies make the data easier to distribute and scale. The different NoSQL databases are designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. If you want a brief overview of these database structures, I cover them in my AWS database blog.

For Microsoft Azure, CosmosDB is the option available for NoSQL databases. CosmosDB is multi-model, meaning you can build out your databases to fit the NoSQL model you prefer. Use the SAS Data Connector to ODBC to interact with your Data in Azure CosmosDB.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Azure offers HDInsights as their Hadoop as a service offering. Azure HDInsights supports both SAS Data Connector to Hadoop and SAS Data Connect Accelerator for Hadoop.

Finally

It is important to think about the use case for your database and the type of data you plan to store before you select an Azure database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases, remember that you will be charged for the storage you use and for the data that you move out of the database. Performing analysis and reporting on your data may require data transfer. Be aware of these costs and think about how you can lower these by keeping frequently accessed data cached somewhere or remain on-premise. Another strategy I’ve seen becoming more popular is taking advantage of the SAS Micro Analytics Service to move the models you have built to run in the cloud provider where your data is stored. Data transfer is cheaper if that data moves between cloud services instead of outside of the cloud provider. Micro Analytics Service allows you to score the data in place without movement from a cloud provider and without having to do an install of SAS.

Additional Resources
1. Support for Databases in SAS® Viya® 3.4
2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Microsoft Azure was published on SAS Users.

11月 052019
 

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Microsoft Azure. Access all the articles in the series here.

The series

This is the next iteration of a series covering database as a service (DBaaS) and storage offerings in the cloud, this time from Microsoft Azure. I have already published two articles on Amazon Web Services. One of those articles covers the DBaaS offerings and the other covers storage offerings for Amazon Web Services. I will cover Google Cloud Platform in future articles. The goal of these articles is to supply a breakdown of these services to better understand the business requirements of these offerings and how they relate to SAS. I would encourage you to read all the articles in the series even if you are already using a specific cloud provider. Many of the core technologies and services are offered across the different cloud providers. These articles focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya, called the SAS Data Connect Accelerator, are synonymous with the SAS Embedded Process.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic and responds to demand while only paying for what you use. As more technologies move to a cloud-based architecture, companies must consider questions like: Where do I store my data? What cloud services best meet my business requirements? Which cloud vendor should I use? Can I migrate my applications to the cloud? If you are looking to migrate your SAS infrastructure to Azure, look at the SAS Viya QuickStart Template for Azure to see a rapid deployment pattern to get the SAS Viya platform up and running in Azure.

SAS integration with Azure

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Azure database variants. A database running in Azure is much like your on-premise database, but instead Microsoft manages the software and hardware. Azure’s DBaaS offerings takes care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premise or with a different cloud provider.

Azure databases

Azure offers database service technologies familiar to many users. If you read my previous article on SAS Data Connectors and Amazon Web Services, you are sure to see many parallels. It is important to understand the terminology and how the different database services in Azure best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in Azure. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand and software that updates and creates backups means customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article I cover various database management systems, the Azure offering for each database type, and SAS integration. First let's consider the diagram below depicting a decision flow chart to determine integration points between Azure database services and SAS. Trace you path in the diagram and read on to learn more about connection details.

Integration points between Azure database services and SAS

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two methods serve two distinct purposes and are optimized depending in how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries usually have minimal complexity, in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input information, processing the data and updating existing data to reflect the collected and processed information. Transactional databases prevent two users accessing the same data concurrently. Examples include order entry, retail sales, and financial transaction systems. Azure offers several types of transactional database services. You can organize the Azure transactional database service into three categories: enterprise licenses, open source, and cloud native.

Enterprise License

Many customers have workloads built around an enterprise database. Azure is an interesting use case because Microsoft is also a traditional enterprise database vendor. Amazon, for example, does not have existing on-premise enterprise database customers. Oracle cloud is the other big player in the enterprise market looking to migrate existing customers to their cloud. Slightly off topic, but it may be of interest to some, SAS does support customers running their Oracle database on Oracle Cloud Platform using their SAS Data Connector to Oracle. Azure offers a solution for customers looking to continue their relationship with Microsoft without refactoring their existing workflows. Customers bring an existing enterprise database licenses to Azure and run SQL Server on Virtual Machines. SAS has extended SAS Data Connector support for SQL Server on Virtual Machines. You can also use your existing SAS license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with SQL Server on Virtual Machines.

Remember you can install and manage your own database on a virtual machine. For example, support for both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata is available for Teradata installed on Azure. If there is not an available database as a service offering, the traditional backup and update responsibilities are left to the customer.

SQL Server Stretch Database is another service available in Azure. If you are not prepared to add more storage to your existing on-premise SQL Server database, you can add capacity using the resources available in Azure. SQL Server Stretch will scale your data to Azure without having to provision any more servers on-premise. New SQL Server capacity will be running in Azure instead of in your data center.

Open Source

Azure provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. You can use your existing SAS license for SAS Data Connector to MYSQL to connect to Azure Database for MYSQL and SAS Data Connector to PostgreSQL to interface with Azure Database for PostgreSQL. SAS has not yet formally supported Azure Database for MariaDB. MariaDB is a variant of MySQL, so validation of support for SAS Data Connector is coming soon. If you need support for MariaDB in Azure database, please comment below and I will share your feedback with product management and testing.

Cloud Native

Azure SQL Database is an iteration of Microsoft SQL Server built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Azure SQL Database. You can use your existing license for SAS Data Connector to Microsoft SQL Server to connect to Azure SQL Database.

Analytical Databases (OLAP)

Analytical Databases optimize on read performance. These databases work best from complex queries in smaller volume. When working with an analytical database you are typically doing analysis on multidimensional data interactively from multiple perspectives. Azure SQL Data Warehouse is the analytical database service offered by Azure. The SAS Data Connector to ODBC combined with a recent version of the Microsoft-supplied ODBC driver is currently the best way to interact with Azure SQL Data Warehouse. Look for the SAS Data Connector to Microsoft SQL Server to support SQL Data Warehouse soon.

NoSQL Databases

A non-relational or NoSQL database is any database not conforming to the relational database model. These databases are more easily scalable to a cluster of machines. NoSQL databases are a more natural fit for the cloud because the loose dependencies make the data easier to distribute and scale. The different NoSQL databases are designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. If you want a brief overview of these database structures, I cover them in my AWS database blog.

For Microsoft Azure, CosmosDB is the option available for NoSQL databases. CosmosDB is multi-model, meaning you can build out your databases to fit the NoSQL model you prefer. Use the SAS Data Connector to ODBC to interact with your Data in Azure CosmosDB.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Azure offers HDInsights as their Hadoop as a service offering. Azure HDInsights supports both SAS Data Connector to Hadoop and SAS Data Connect Accelerator for Hadoop.

Finally

It is important to think about the use case for your database and the type of data you plan to store before you select an Azure database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases, remember that you will be charged for the storage you use and for the data that you move out of the database. Performing analysis and reporting on your data may require data transfer. Be aware of these costs and think about how you can lower these by keeping frequently accessed data cached somewhere or remain on-premise. Another strategy I’ve seen becoming more popular is taking advantage of the SAS Micro Analytics Service to move the models you have built to run in the cloud provider where your data is stored. Data transfer is cheaper if that data moves between cloud services instead of outside of the cloud provider. Micro Analytics Service allows you to score the data in place without movement from a cloud provider and without having to do an install of SAS.

Additional Resources
1. Support for Databases in SAS® Viya® 3.4
2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Microsoft Azure was published on SAS Users.

8月 222019
 

Editor’s note: This is the first article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Amazon Web Services.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic and responds to demand while only paying for what you use. As more technologies moves to a cloud-based architecture, companies must consider questions like: Where is my data going to be stored? Do I want a hybrid solution? What cloud storage options do I have? What storage solution best meets my business needs?. Another question requiring an answer is: Is the software I use cloud-ready?. The answer in the case of SAS is, YES! SAS offers various cloud deployment patterns on various cloud providers and supports integration with cloud storage services.

This is part one in a series covering database as a service (DBaaS) and storage offerings from Amazon Web Services (AWS). Microsoft Azure and Google Cloud Platform will be covered in future articles. The goal is to supply a breakdown of these services to better understanding the business requirements of these offerings and how they relate to SAS. I will focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya are called SAS Data Connect Accelerators and are synonymous with the SAS Embedded Process.

SAS integration with AWS

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Amazon Web Services database variants. A database running in AWS is much like your on-premise database, but instead Amazon is managing the software and hardware. Amazon’s DBaaS offerings take care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premise or with a different cloud provider.

AWS databases

Amazon offers database service technologies familiar to users. It is important to understand the new terminology and how the different database services best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in AWS. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand and software that updates and create backups automatically means customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article I cover various database management systems, the AWS offering for each database type, and SAS integration. First let's consider the diagram below depicting a decision flow chart to determine integration points between AWS database services and SAS.

Integration points between AWS database services and SAS

Trace you path in the diagram and read on to learn more about connection details.

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two methods serve two distinct purposes and are optimized depending in how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries usually have minimal complexity, in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input information, processing the data and updating existing data to reflect the collected and processed information. Transactional databases prevent two users accessing the same data concurrently. Examples include order entry, retail sales, and financial transaction systems. Amazon offers several types of transactional database services. You can organize Amazon Relational Database Service (RDS) into three categories: enterprise licenses, open source, and cloud native.

Enterprise License

Many customers already have workloads built around enterprise databases. Amazon provides a turn-key enterprise solution for customers not looking to break their relationship with enterprise vendors or refactor their existing workflows. AWS offers Oracle and Microsoft SQL Server as a turn-key enterprise solution in RDS. Both offerings include the required software license, however Oracle also allows you to “Bring Your Own License” (BYOL). SAS has extended SAS Data Connector support for both cloud variants. You can use your existing license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with these RDS databases.

Remember you can install and manage your own database on a virtual machine if there is not an available database as a service offering. The traditional backup and update responsibilities are left to the customer in this case. For example, both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata are supported for Teradata installed on AWS.

Open Source

Amazon provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. SAS has extended SAS Data Connector support for all these cloud variants. You can use your existing license for SAS Data Connector to MYSQL to connect to either RDS MYSQL or RDS MariaDB and SAS Data Connector to PostgreSQL to interface with RDS PostgreSQL.

Cloud Native

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Amazon Aurora. You can use your existing license for SAS Data Connector to MYSQL to connect to either Aurora MYSQL or and SAS Data Connector to PostgreSQL to interface with Aurora PostgreSQL.

Analytical Databases (OLAP)

Analytical Databases optimize on read performance. These databases work best from complex queries in smaller volume. When working with an analytical database you are typically doing analysis on multidimensional data interactively from multiple perspectives. Redshift is the analytical database service offered by Amazon. SAS has a dedicated product called SAS Data Connector to Amazon Redshift that was purpose built for analytics workloads running in the Amazon cloud.

NoSQL Databases

A non-relational or NoSQL database is any database not conforming to the relational database model. These databases are more easily scalable to a cluster of machines. NoSQL databases are a more natural fit for the cloud because the loose dependencies make the data easier to distribute and scale. The different NoSQL databases are designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. Here is a brief overview of the most common data structures.

Key-Value Database

A key-value database stores data as a collection of key-value pairs. The key acts as a unique identifier for each record. Amazon’s key-value database as a service is DynamoDB. SAS interacts with DynamoDB using industry standard ODBC or JDBC drivers.

Columnar Database

Data in a traditional relational database is sorted by rows. The alternative columnar databases optimize by sorting data quickly using columns, saving valuable time and network I/O. Redshift is the columnar database service offered by Amazon. SAS has a dedicated product called SAS Data Connector to Amazon Redshift that was purpose built for this database.

Document Database

A document database queries data in documents typically stored in JSON format. DocumentDB is the document database service offering from Amazon. SAS interacts with DocumentDB using industry standard ODBC or JDBC drivers. DocumentDB is MongoDB-compatible which means existing MongoDB drivers and tolls work with DocumentDB. Currently SAS is building out functionally to support SAS Data Connector to MongoDB and you should expect that to expand further into DocumentDB as well.

Graph Database

Amazon Neptune is the graph database service designed to work with a complex hierarchy of interconnected data. These design of these types of databases queries relationships in data and reduce the number of table joins. SAS interacts with Amazon Neptune using industry standard ODBC or JDBC drivers.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Amazon offers Elastic Map Reduce as their Hadoop as a service offering. Amazon Elastic supports both SAS Data Connector to Hadoop and SAS Data Connect Accelerator for Hadoop.

Finally

It is important to think about the use case for your database and the type of data that you plan to store before you select an AWS database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases always remember that you will be charged for the storage that you use but also for the data that you move out of the database. To do analysis and reporting on your data may require data transfer. Be aware of these costs and think about how you can lower these costs by keeping frequently accessed data cached somewhere or remain on-premise.

Additional Resources

  1. Support for Databases in SAS® Viya® 3.4
  2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Amazon Web Services was published on SAS Users.

11月 132013
 

Wow did the summer fly by! Now that grad school is over, it’s about time to resume the “it’s all about the data” series. In the last several posts, I tried to lay a foundation for understanding how SAS stores and manages data for use in business intelligence and analytic applications. Now, I’d like to turn your attention to third-party database engines, which is the term (or SAS-ism) for when data is housed in non-SAS data structures. Examples include the obvious choices like Microsoft SQL Server, Oracle, MySQL, Teradata and a dozen or so additional supported third party database interfaces.

This article provides a quick overview of how SAS “talks” to third-party databases. If you are an experienced SAS programmer, you probably know much of what’s covered here. This topic has been written about extensively over the years, so while I wont go into detail on every nuance, I thought (thot!) it would be useful to provide a high level overview for those SAS administrators who are not SAS programmers but have to support the data needs of their user community. The focus here is the necessary preconditions for connectivity and the differences between explicitly passing through your query to the database versus letting SAS handle the negotiations. Let’s get started!

I can’t seem to connect to my database from SAS, what’s wrong?
Whether you are on Windows or UNIX, you need to both an ODBC driver manager and an ODBC driver to be installed. The error in the log below is for SQL Server and shows that the connection could not be made. You need to look at your ODBC.ini file (UNIX) or ODBC Administrator (Windows) and confirm that you have specified the details correctly. For more information on how to connect, take a look at the articles here for UNIX and Windows

 

For Oracle, you need an Oracle client driver installed on the server, and the driver needs to know where to look for the database. This driver is usually defined in a file called tnsnames.ora that lives within the client driver directory. For more information, take a look at the articles here for connecting to Oracle databases on UNIX and Windows. Note that you can use either SAS/Access to Oracle (native driver) or SAS/Access to ODBC to make these connections from within SAS.

What is SQL Pass-thru and why should I care?
In SAS, you also have the option of using two features to access your third-party database:  libname access or explicit pass-thru capabilities.

Libname access is just like it sounds – you simply use the LIBNAME statement in SAS and specify the engine along with your credentials and database specific attributes. Once the libref is instantiated, then you can use references to the library just as if it were a SAS library. Here is a quick example:

Explicit pass-thru, on the other hand, is different in that you are not letting SAS do any of the database negotiations for you. Instead, you tell SAS to pass through everything in your program directly to the database for processing. Here is the same connection to sqlsvr, this time using explicit pass-thru.

This is perhaps one of the better-documented SAS capabilities as it relates to best practice. For example, Jessica Hampton discusses SQL pass-thru in her NESUG 2011 paper "SQL Pass-Through and the ODBC Interface". Here, she compares the libname method (implicit) versus explicit pass-thru method. 

There are so many options for connecting SAS to my database, which is the best?
This is a common question. After all, you can use Microsoft SQL Server using ODBC, SAS/Access to Microsoft SQL Server and OLE DB. For Oracle, you also have the option of using the native driver as well as ODBC. You’ll also see that this is true for many other database such as Teradata, mySQL, and so on.  Generally the choice comes down to performance or flexibility:

  • The best performance will always be the native driver for that database.
  • However, if you want the flexibility of talking to lots of databases and don’t mind having to find and purchase individual ODBC drivers, then the more general SAS/Access to ODBC may be right for you.

One word of caution, however:  if you use explicit SQL pass-thru, then you have to comply with the rules of the database driver that you are using. For example, there are lots of capabilities within the Oracle SQL language that are not available in ODBC. So if you want the full power of using database-specific language elements, then you will want to go with the native drivers.

What is the best option for connecting to Microsoft SQL Server?
For Microsoft SQL Server, people often ask if they should use SAS/Access to ODBC or the native SAS/Access to SQL Server engine? One of the best “how-to” articles on SAS and Microsoft SQL Server can be found in the SAS Tech Note "Accessing a Microsoft SQL Server Database from SAS on Microsoft Windows".  Although this note is a bit dated, it provides a good foundation for most users.

What might be surprising to some is that the native engine is actually a Microsoft SQL Server ODBC driver that is shipped with the SAS/Access product. SAS licenses the DataDirect ODBC driver for you so that you don’t have to find and eventually purchase another ODBC driver for your environment.

 The DataDirect drivers have been proven to work efficiently with SAS, and they go through extensive testing before each version of SAS is released. On the other hand, if you want the flexibility of using your own ODBC drivers or have a need to connect to other database via ODBC, then SAS/Access to ODBC might be the route for you.

In the SAS Community, you may see papers that talk about Microsoft SQL Server that use ODBC and some use the native access engine:  sqlsvr. Be sure to use the engine that you have licensed as well as the syntax appropriate to that engine.

There are other methods that can be used to get data in and out of Microsoft SQL Server from SAS such as OLE DB, but I’ll leave that to you to read "Getting Started with SAS/ACCESS Interface to OLE DB" as homework.

 Can I use SAS/Access to ODBC and SAS/Access to Microsoft SQL Server on the same machine?
A slightly more confusing answer comes when people want to use both SAS/Access to ODBC and SAS/Access to Microsoft SQL Server on the same machine. Without going into a detailed explanation, the short answer is “yes – you can, but you might not want to do that.”

Supporting SAS/Access products on the same machine requires that you set up the environment so that the drivers are defined correctly in the odbc.ini file and may necessitate using two startup scripts when launching SAS depending on how you want to install and configure. As long as you are precise about where you install the software and how the environment variables are instantiated, it should work.

Where do I go from here?
Obviously this topic is huge! Since there are already so many good references around this topic, I’ll just highlight a few more:

For those new SAS administrators, are there other questions that you have about accessing data from third party relational database management systems?

Remember--Happy Data, Happy Users!

--greg

tags: database engines, Oracle, SAS Administrators, sql
2月 132013
 

I am a big fan of SAS Management Console. This is the central location where you can perform many of your SAS administration tasks. One of the most important tasks of a SAS Administrator is making sure data is available to the right people when they need it. Most of my customers have data that sit in several different locations. It is very common for that data to reside on a DBMS. We can use SAS Management Console to make it easy for SAS users to work with DBMS data.

First thing to note—like many things within SAS Metadata, SAS Management Console itself is role-based. If you are handing out SAS Management Console to more people than just your SAS Administrators, you can set it up so that a user can see only the plug-ins that apply to their job. For example, a user can have a role that allows them to see the Library Manager but not the Server Manager or the User Manager plug-ins.

There are two main steps to set up access to DBMS tables. First, you register the DBMS server. Second, you register the DBMS library. A SAS administrator who has privileges to work with the Server Manager plug-in in SAS Management Console can complete the necessary tasks to set up the access to DBMS tables. For illustration purposes, I’m going to show a SQL Server DBMS. However, these steps can be generalized to any DBMS.

Register the DBMS server

First, let’s register a SQL Server inside SAS Management Console using the Server Manager plug-in.

  1. Open up SAS Management Console and log in as a user that has administrator privileges. Right-click Server Manager and select the New Server option to access the New Server wizard.

  1. Select the database server type from the Database Servers list. Then, click Next.

  1. Enter an appropriate server name in the Name field. Click Next.

  1. From the drop down menu, select the appropriate data source type if you are working with ODBC. I am selecting “ODBC- Microsoft SQL Server(PC Client)”. Accept the remaining defaults for the server properties. Click Next.

  1. Specify the database vendor-specific values on the connection properties page. If the user credentials for the database are different from the credentials used to log in to SAS, then you must create an authentication domain to store valid database credentials. Click New if you wish to create the authentication domain now. Otherwise, use the drop-down menu under Authentication Domain to select the appropriate authentication. Then click Next.

  1. Examine the final page of the wizard to ensure that the proper values have been entered. Click Finish to save the wizard settings.

Register the DBMS library

Next, following our previous example, let’s register the SQL Server library.

  1. In SAS Management Console, expand Data Library Manager. Right-click Libraries and select the New Library option to access the New Library wizard.

  1. Select the database type from the Database Data list. Click Next.

  1. Enter an appropriate library name in the Name field. The Location field is the Metadata Folder structure. You can change it by clicking Browse. Click Next when finished.

  1. Select an application server from the list, and use the right arrow to assign the application server. This step makes the library available to the server and makes the library visible to users of the server. Click Next.

  1. Specify a libref on the library properties page. A libref is simply an 8 character alias to a library, and it follows standard SAS syntax (no special characters, cannot begin with a numeric). You can also click Advanced Options to perform tasks such as pre-assigning a library. However, I always recommend to my customers not to pre-assign a library if they are hitting a production database. Click Next to access the next page of the wizard.

  1. On the server and connection page, select the database server we set up previously. Contact your database administrator if you are unsure of the correct value for the schema field. Click Next.

  1. Examine the final page of the wizard to ensure that the proper values have been entered. Click Finish to save the library settings.

From here, you can register tables and work with your DBMS data the same way you would work with SAS tables!

Please note that some of the techniques discussed here require a SAS/Access engine to allow SAS to reach the data source. The above technique used the SAS/Access to ODBC engine. See
Chapter 2 of the SAS 9.3 Intelligence Platform, Data Administration Guide: Connecting to Common Data Sources
.

tags: database engines, SAS Administrators, SAS Management Console
1月 162013
 

Happy belated New Year!  For what it’s worth, no, I am not calling you a “dummy” or writing this just for Chris Hemedinger– just wanted to get your attention.  This blog is actually an introduction to a series that I am kicking off this year entitled:  "It's all about the data…"

Whether you be “grey and grizzled” or new to this SAS stuff, sometimes it can be helpful to see how others think about things.  In this series, I am going to cover a lot of ground, which is ultimately paramount to everything we do in SAS.  I love reading what Wendy McHenry and the other bloggers write about SAS Metadata and could never compete with their metadata mind-share!  Instead, I am going to focus on data, data, data! – none of this “data about data” stuff!

So this may be as boring as watching disks spin in a refrigerated data center, but I trust that you will keep me honest and alert with how you think about these things.

In this series, I will start with an overview of how SAS “eats” data and what SAS cares about (metaphorically speaking, of course).    Here, we will cover the fundamentals of how SAS works across all data engines. We will follow this up with subsequent articles covering topics like:

  • Understanding SAS Libraries (Physical versus metadata, native versus database, security 101, I/O and system characteristics)
  • How SAS clients see data (understanding permissions, library references, pre-assigned libraries, metadata library options, batch versus interactive and details on the how DI Studio, Enterprise Guide, Web Report Studio see data as well as  as best practice on managing library references)
  • Third-party database engines:
    • Oracle Libraries
    • SQL Server Libraries
    • ODBC
    • Testing and Debugging Connection Issues including benchmarking performance (and of course tuning)
    • Understanding database credentials (shared ids vs. dedicated)
    • Predicting Data Growth – Simple tools for estimating volumetrics and managing cleanup
    • File Systems Primer (types of storage, understanding raid, improving I/O, benchmarking your own performance)
    • Data Architecture for SAS Grid Manager

Hopefully you will find this interesting and challenge me with topics that will expand the list with the things that you care about most.

Until then, see you soon!

-greg

tags: database engines, SAS Administrators