Joe Furbee

June 4, 2020
 

Learning never stops. When SAS had to change this year’s SAS Global Forum (SGF) to a virtual event, everyone was disappointed. I am, however, super excited about all of the papers and stream of video releases over the last month (and I encourage you to register for the upcoming live event in June). For now, I made a pact with myself to read or watch one piece of SGF-related material per day. While I haven’t hit my goal 100%, I sure have learned a lot from all the reading and viewing. One particular paper, Using Jupyter to Boost Your Data Science Workflow, and its accompanying video by Hunter Glanz caught my eye this week. This post elaborates on one piece of his material: how to save Jupyter notebooks in other file formats.

Hunter’s story

Hunter is a professor who teaches multiple classes using SAS® University Edition, which comes equipped with an integrated Jupyter notebook. His focus is on SAS programming, and he requires his students to create notebooks to complete assignments; however, he wants to see the results of their work, not to run their raw code. The notebooks include text, code, images, reports, etc. Let's explore how the students can transform their native notebooks into other, more consumable formats. We'll also discuss other use cases in which SAS users may want to create a copy of their work from a notebook, to, say, a .pdf, .html, or .py file, just to name a few.

What you’ll find here and what you won’t

This post will not cover how to use Jupyter notebooks with SAS or other languages. There is a multitude of other resources, starting with Hunter’s work, to explore those topics. This post will cover how to produce other file formats from SAS, Python, and R notebooks. I’ll outline multiple methods: a point-and-click option, inline code written directly in the notebook, and finally the command line.

Many of the processes discussed below are language agnostic. When there are distinct differences, I’ll make a note.

A LITTLE about Jupyter notebooks

A Jupyter notebook is a web application that lets users run commands, view results, include images, and write inline text, all in one document. The all-encompassing notebook lets users tell a complete story without having to use multiple apps. Jupyter notebooks were originally created for the Python language, and are now available for many other programming languages. JupyterLab, the notebooks’ cousin, is a later, more sophisticated version, but for this writing, we’ll focus on the notebook. The functionality in this use case is similar.

Where do we start? First, we need to install the notebook (unless you're working in SAS University Edition, which already includes it).

Install Anaconda

The easiest way to get started with the Jupyter Notebook App is by installing Anaconda (this will also install JupyterLab). Anaconda is an open source distribution for scientific computing that simplifies package management and deployment. Out of the box, the notebook from the Anaconda install includes the Python kernel. For use with other languages, you need to install additional kernels.

Install additional language kernels

In this post, we’ll focus on Python, R, and SAS. The Python kernel is readily available after the Anaconda install. For the R language, follow the instructions on the GitHub R kernel repository. I also found the instructions on How to Install R in Jupyter with IRKernel in 3 Steps quite straightforward and useful. Further, here are the official install instructions for the SAS kernel and a supporting SAS Community Library article.

With the additional kernels in place, you should see all available languages when creating a new notebook, as pictured below.

Available kernels list

File conversion methods

Now we’re ready to dive into the export process. Let’s look at three approaches in detail.

Download (Export) option

Once you’ve opened your notebook and run the code, select File-> Download As (appears as Export Notebook As… in JupyterLab).

"Download As"  option in Jupyter notebook

"Export Notebook As" option in JupyterLab

HTML format output

Notice the list of options, some more familiar than others. Select the HTML option and Jupyter converts your entire notebook (text, commands, figures, images, etc.) into a file with a .html extension. The resulting file opens in a browser and displays as expected. See the images below for a comparison of the .ipynb and .html files.

SAS code in a Jupyter notebook

Corresponding SAS code notebook in html form

SAS (aka script) format output

Using the Download As-> SAS option renders a .sas file, which is shown opened in Enterprise Guide below. Note: when using a different kernel, say Python or R, you have the option to save in that language's script format.

SAS code saved from a notebook displayed in Enterprise Guide

One thing to note here: only the code appears in the output file. The markdown text, figures, etc., from the original notebook can't be displayed in Enterprise Guide, so they are omitted.

PDF format output

There is one special case (two, actually) I need to mention. If you want to create PDF (or LaTeX, which is used to create PDF files) output from your notebook, you need additional software. For converting to PDF, Jupyter uses the TeX document preparation ecosystem. If you attempt to download without TeX, the conversion fails and you get a message to download TeX. Depending on your OS, the TeX distribution will have a different name, but it will include TeX in the name. In certain instances, you may also need Pandoc for some formats. I suggest installing both to be safe. Install TeX from its download site, and do the same for Pandoc.

Once the files are created, they appear in my File Explorer.

New SAS file in Windows File Explorer

Cheaters may never win, but they can create a PDF quickly

Well, now that we’ve covered how to properly convert and download a .pdf file, there may be an easier way. While in the notebook, press the Ctrl + P keys. In the Print window, select the Save to PDF option, choose a file destination and save. It works, but I felt less accomplished afterward. Your choice.

Inline code option

Point-and-click is a perfectly valid option, but let’s say you want to introduce automation into your world. The jupyter nbconvert command lets you transform the current notebook into any of the formats mentioned earlier. All you need to do is run the command with a couple of parameters from within the notebook.

In Python, you can invoke the nbconvert command through the os library's system() function. The following lines show the general structure.

import os
os.system("jupyter nbconvert myNotebook.ipynb --to html")

An example with Python

The example below is from a Python notebook. The "0" output code indicates success.

Code to create a PDF file from a Python notebook
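The screenshot isn't reproduced here, but a cell along these lines would produce the same result. This is a minimal sketch, not the exact code from the image, and the notebook name is only an example.

import os

# Convert this notebook to PDF (TeX must be installed); os.system returns 0 on success
rc = os.system("jupyter nbconvert myPythonNotebook.ipynb --to pdf")
print(rc)   # 0 indicates the conversion succeeded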

An example with SAS

As you see with the Python example, the code is just that: Python. Generally, you cannot run Python code in a Jupyter notebook running the SAS kernel. Luckily we have Jupyter magics, which allow us to write and run Python code inside a SAS kernel. The magics are a two-way street and you can also run SAS code inside a Python shell. See the SASPy documentation for more information.

The code below is from a SAS notebook, but is running Python code (triggered by the %%python magic).

Code to create a PDF file from a SAS notebook
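Again, the screenshot isn't reproduced here, but based on the description the cell looks something like the sketch below. The %%python magic comes from the text above; the notebook name EmployeeChurnSASCode.ipynb is inferred from the output file mentioned next, so treat it as an assumption.

%%python
import os

# Python running inside a SAS-kernel notebook via the %%python magic;
# nbconvert writes a PDF copy of the notebook next to the .ipynb file
os.system("jupyter nbconvert EmployeeChurnSASCode.ipynb --to pdf")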

The EmployeeChurnSASCode.pdf file is created in the same directory as the original notebook file:

Jupyter file system display in a web browser

An example with R

Things are fairly straightforward in an R notebook. However, you must install and load the nbconvert package.

Code to create an HTML file from an R notebook

The first line installs the package, the second line loads the package, and the third actually does the conversion. Double-check your paths if you run into trouble.

The command line

The last method we'll look at is the command line. This option is the same regardless of the language you're working with, and the possibilities are endless. You could include it in a script, use it in code to run and display in a web app, or create the file and email it to a colleague. The examples below were all run on a Windows machine using the Anaconda command prompt.

An example with a SAS notebook

Convert sasNotebook.ipynb to a SAS file.

>> ls -la |grep sasNotebook
-rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
 
>> jupyter nbconvert --to script sasNotebook.ipynb
[NbConvertApp] Converting notebook sasNotebook.ipynb to script
[NbConvertApp] Writing 351 bytes to sasNotebook.sas
 
>> ls -la |grep sasNotebook
-rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
-rw-r--r-- 1 jofurb 1049089     369 May 29 14:57 sasNotebook.sas

An example with a Python notebook

Convert 1_load_data.ipynb to a PDF file.

>> ls -la |grep 1_load
-rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
 
>> jupyter nbconvert 1_load_data.ipynb --to pdf
[NbConvertApp] Converting notebook 1_load_data.ipynb to pdf
[NbConvertApp] Writing 27341 bytes to .\notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', '.\\notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', '.\\notebook']
[NbConvertApp] WARNING | b had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 32957 bytes to 1_load_data.pdf
 
>> ls -la |grep 1_load
-rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
-rw-r--r-- 1 jofurb 1049089  32957 May 29 15:23 1_load_data.pdf

An example with an R notebook

Convert HR_R.ipynb to an R file.

>> ls -la | grep HR
-rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
 
>> jupyter nbconvert HR_R.ipynb --to script
[NbConvertApp] Converting notebook HR_R.ipynb to script
[NbConvertApp] Writing 981 bytes to HR_R.r
 
>> ls -la | grep HR
-rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
-rw-r--r-- 1 jofurb 1049089   1021 May 29 15:44 HR_R.r

Wrapping things up

Whether you’re a student of Hunter’s, an analyst creating a report, or a data scientist monitoring data streaming models, you may need to transform your work from a Jupyter notebook into a more consumable asset. Regardless of the language of your notebook, you have multiple choices for saving your work, including menu options, inline code, and the command line. This is a great way to show off your creation in a very consumable form.

How to save Jupyter notebooks in assorted formats was published on SAS Users.

April 13, 2020
 

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various storage options available in Microsoft Azure. Access all the articles in the series here.

In this edition of the series on SAS and cloud integration, I cover the various storage options available on Microsoft Azure and how to connect to and interact with them. I focus on three key storage services: object storage, block storage, and file storage. In my previous articles I have covered topics regarding database as a service (DBaaS) and storage offerings from Amazon Web Services (AWS) as well as DBaaS on Azure.

Object Storage

Azure Blob Storage is a low-cost, scalable cloud object storage service for any type of data. Objects are a great way to store large amounts of unstructured data in their native formats. Individual Azure Blob objects can be up to 4.75 terabytes (TB) in size. Azure organizes these objects into different storage accounts. Because a storage account provides a globally unique namespace for your data, no two storage accounts can have the same name, and the data is accessible from anywhere in the world over HTTP or HTTPS.

A Container organizes a set of Blobs, similar to a traditional directory in a file system. You access Azure Blobs directly through an API from anywhere in the world. For security reasons, it is vital to grant least-privilege access to a Blob.

Make sure you are being intentional about opening objects up and are not exposing any sensitive data. Security controls are available on individual blobs and on the containers that organize them. By default, blobs are created with no public read access; you then grant permissions to individual users and groups.

The total cost of blob storage depends on the volume of data stored, the type of operations performed, data transfer costs, and data redundancy choices. You can reduce the number of replicas or use one of the various tiers of archive services to reduce the cost of your object storage. Cost is calculated from the terabytes of storage used per month, and you incur added costs for data requests and transfers over the network. Data movement is an unpredictable expense for many users.

Azure Blob Storage Tiers
  • Hot – Frequently accessed data
  • Cool – Infrequently accessed data; archived at least 30 days
  • Archive – Rarely accessed data; archived at least 180 days

 
In SAS Viya 3.5, direct support is available for objects stored in Azure Data Lake Storage Gen2. Azure Data Lake Storage Gen2 extends Azure Blob Storage capabilities and optimizes it for analytics workloads. If you want to read SAS data sets, CSV, or ORC files from Azure Blob Storage, you can read them directly using a CASLIB statement to Azure Data Lake Storage (ADLS). If you have files in a different format, you can always copy them to a local file system accessible to the CAS controller and use CAS actions to load the tables into memory. Making HTTP requests directly from within your SAS code using PROC HTTP favors automation of the download process. Remember, no restrictions exist on file types for objects moving into object storage, so this may require a SAS Data Connector to read some local file system file types.

Block Storage

Azure Disks is the block storage service designed for use with Azure Virtual Machines. You may only access block storage when it is attached to an operating system. When thinking about Azure Disks, treat the storage volumes as independent disk drives controlled by the server operating system. Mount an Azure Disk to an operating system as if it were a physical disk. Azure Disks are valuable because they are the persisting storage when you terminate your compute instance. You can choose from four different volume types that supply performance levels at corresponding costs.

Azure offers a choice of HDD or three different performance classes of SSD: Standard, Premium, and Ultra. You can use Ultra Disk if you need the lowest latency and scalable performance. Standard SSD is the most cost-effective, while Premium SSD is the high-performance disk offering. The table below sums up the four offerings.

Azure Disk Storage Types
  • Standard HDD – Backups and non-critical workloads
  • Standard SSD – Development or test environments and lightly used workloads
  • Premium SSD – Production environments and time-sensitive workloads
  • Ultra SSD – High throughput and IOPS; transaction-heavy workloads

 
Azure Disks are the permanent SAS data storage, persisting through a restart of your SAS environment. The Azure Disk type you select has a direct impact on the performance you get from SAS. A best practice is to use compute instances with enhanced Azure Disks performance or dedicated solid state drive instance storage.

File Storage

Azure Files provides access to data through a shared file system. The elastic network file system grows and shrinks as you add or remove files, so you only pay for the storage you consume. Users create, delete, modify, read, and write files organized logically in a directory structure for intuitive access. This service allows simultaneous access for multiple users to a common set of files, managed by user and group permissions.

Azure Files is a powerful tool, especially if you are utilizing a SAS Grid architecture. If your SAS architecture requires a shared location that any node in a group can access and write to, then Azure Files could meet that requirement. To access the data stored in your network file system, you will have to mount the file system to your operating system. You can mount Azure Files to any Azure Virtual Machine, or even to an on-premise server within your Azure Virtual Network. Azure Files is a fantastic way to set up a shared file system, not only for your data but also to share projects and code between users.

Finally

Storage is a key component of cloud computing because it enables users to stop their compute instances while their most important data remains in place. Storage services make it much easier to manage and scale your data. For example, Blob storage is a great place to store files that you want to make available to anyone, anywhere.

Block storage drives the performance of your environment. Abundant and performant block storage is essential to making your application run against the massive scale of data that SAS is eager to consume. Block storage is where your operating system and software ultimately are installed.

File storage is a great service to attach shared file systems to your compute instances. This is a great place to collaborate or migrate file system data from one compute instance to another. SAS is a compute engine running on data.

Without a robust set of storage tools to persist that data, you may not get the performance you desire, or the progress you make will be lost when you shut down your compute instances.

Resources

Storage in the Cloud – SAS and Azure was published on SAS Users.

December 4, 2019
 

Site relaunches with improved content, organization and navigation.

In 2016, a cross-divisional SAS team created developer.sas.com. Their mission: Build a bridge between SAS (and our software) and open source developers.

The initial effort made available basic information about SAS® Viya® and integration with open source technologies. In June 2018, the Developer Advocate role was created to build on that foundation. Collaborating with many of you, the SAS Communities team has improved the site by clarifying its scope and updating it consistently with helpful content.

Design is an iterative process. One idea often builds on another.

-- businessman Mark Parker

The team is happy to report that recently developer.sas.com relaunched, with marked improvements in content, organization and navigation. Please check it out and share with others.

New overview page on developer.sas.com

The developer experience

The developer experience goes beyond the developer.sas.com portal. The Q&A below provides more perspective and background.

What is the developer experience?

Think of the developer experience (DX) as equivalent to the user experience (UX), only the developer interacts with the software through code, not points and clicks. Developers expect and require an easy interface to software code, good documentation, support resources and open communication. All this interaction occurs on the developer portal.

What is a developer portal?

The white paper Developer Portal Components captures the key elements of a developer portal. Without going into detail, the portal must contain (or link to) these resources: an overview page, onboarding pages, guides, API reference, forums and support, and software development kits (SDKs). In conjunction with the Developers Community, the site’s relaunch includes most of these items.

Who are these developers?

Many developers fit somewhere in these categories:

  • Data scientists and analysts who code in open source languages (mainly Python and R in this case).
  • Web application developers who create apps that require data and processing from SAS.
  • IT service admins who manage customer environments.

All need to interact with SAS but may not have written SAS code. We want this population to benefit from our software.

What is open source and how is SAS involved?

Simply put, open source software is just what the name implies: the source code is open to all. Many of the programs in use every day are based on open source technologies: operating systems, programming languages, web browsers and servers, etc. Leveraging open source technologies and integrating them with commercial software is a popular industry trend today. SAS is keeping up with the market by providing tools that allow open source developers to interact with SAS software.

What is an API?

All communications between open source and SAS are possible through APIs, or application programming interfaces. APIs allow software systems to communicate with one another. Software companies expose their APIs so developers can incorporate functionality and send or request data from the software.

Why does SAS care about APIs?

APIs allow the use of SAS analytics outside of SAS software. By allowing developers to communicate with SAS through APIs, customer applications easily incorporate SAS functions. SAS has created various libraries to aid in open source integration. These tools allow developers to code in the language of their choice, yet still interface with SAS. Most of these tools exist on github.com/sassoftware or on the REST API guides page.

A use case for SAS APIs

A classic use of SAS APIs is for a loan default application. A bank creates a model in SAS that determines the likelihood of a customer defaulting on a loan based on multiple factors. The bank also builds an application where a bank representative enters the information for a new potential customer. The bank application code uses APIs to communicate this information to the SAS model and return a credit decision.
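As a purely illustrative sketch of that flow, the bank application's request might look something like this in Python. The endpoint URL, field names, and response shape below are hypothetical and are not an actual SAS API.

import requests

# Hypothetical call from the bank application to a deployed SAS scoring model
applicant = {"income": 52000, "loan_amount": 15000, "years_of_credit_history": 7}

resp = requests.post(
    "https://sasserver.example.com/loanDefaultModel/score",   # hypothetical endpoint
    json=applicant,
    headers={"Authorization": "Bearer <access-token>"})

decision = resp.json()   # e.g. {"decision": "approve", "probability_of_default": 0.08}
print(decision)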

What is a developer advocate?

A developer advocate is someone who helps developers succeed with a platform or technology. Their role is to act as a bridge between the engineering team and the developer community. At SAS, the developer advocate fields questions and comments on the Developers Community and works with R&D to provide answers. The administration of developer.sas.com also falls under the responsibility of the developer advocate.

We’re not done

The site will continue to evolve, with additions of other SAS products and offerings, and other initiatives. Check back often to see what’s new.
Now that you are an open source and SAS expert, please check out the new developer.sas.com. We encourage feedback and suggestions for content. Leave comments and questions on the site or contact Joe Furbee: joe.furbee@sas.com.

developer.sas.com 2.0: More than just a pretty interface was published on SAS Users.

November 5, 2019
 

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Microsoft Azure. Access all the articles in the series here.

The series

This is the next iteration of a series covering database as a service (DBaaS) and storage offerings in the cloud, this time from Microsoft Azure. I have already published two articles on Amazon Web Services. One of those articles covers the DBaaS offerings and the other covers storage offerings for Amazon Web Services. I will cover Google Cloud Platform in future articles. The goal of these articles is to supply a breakdown of these services to better understand the business requirements of these offerings and how they relate to SAS. I would encourage you to read all the articles in the series even if you are already using a specific cloud provider. Many of the core technologies and services are offered across the different cloud providers. These articles focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya, called the SAS Data Connect Accelerator, are synonymous with the SAS Embedded Process.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic and responds to demand while only paying for what you use. As more technologies move to a cloud-based architecture, companies must consider questions like: Where do I store my data? What cloud services best meet my business requirements? Which cloud vendor should I use? Can I migrate my applications to the cloud? If you are looking to migrate your SAS infrastructure to Azure, look at the SAS Viya QuickStart Template for Azure to see a rapid deployment pattern to get the SAS Viya platform up and running in Azure.

SAS integration with Azure

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Azure database variants. A database running in Azure is much like your on-premise database, except that Microsoft manages the software and hardware. Azure’s DBaaS offerings take care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premise or with a different cloud provider.

Azure databases

Azure offers database service technologies familiar to many users. If you read my previous article on SAS Data Connectors and Amazon Web Services, you are sure to see many parallels. It is important to understand the terminology and how the different database services in Azure best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in Azure. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand and software that updates and creates backups mean customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article I cover various database management systems, the Azure offering for each database type, and SAS integration. First, let's consider the diagram below depicting a decision flow chart to determine integration points between Azure database services and SAS. Trace your path in the diagram and read on to learn more about connection details.

Integration points between Azure database services and SAS

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two methods serve two distinct purposes and are optimized depending on how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries are usually of minimal complexity but come in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input information, processing the data and updating existing data to reflect the collected and processed information. Transactional databases prevent two users from accessing the same data concurrently. Examples include order entry, retail sales, and financial transaction systems. Azure offers several types of transactional database services. You can organize the Azure transactional database services into three categories: enterprise licenses, open source, and cloud native.

Enterprise License

Many customers have workloads built around an enterprise database. Azure is an interesting use case because Microsoft is also a traditional enterprise database vendor. Amazon, for example, does not have existing on-premise enterprise database customers. Oracle cloud is the other big player in the enterprise market looking to migrate existing customers to their cloud. Slightly off topic, but it may be of interest to some: SAS does support customers running their Oracle database on Oracle Cloud Platform using the SAS Data Connector to Oracle. Azure offers a solution for customers looking to continue their relationship with Microsoft without refactoring their existing workflows. Customers can bring existing enterprise database licenses to Azure and run SQL Server on Virtual Machines. SAS has extended SAS Data Connector support for SQL Server on Virtual Machines. You can also use your existing SAS license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with SQL Server on Virtual Machines.

Remember you can install and manage your own database on a virtual machine. For example, support for both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata is available for Teradata installed on Azure. If there is not an available database as a service offering, the traditional backup and update responsibilities are left to the customer.

SQL Server Stretch Database is another service available in Azure. If you are not prepared to add more storage to your existing on-premise SQL Server database, you can add capacity using the resources available in Azure. SQL Server Stretch will scale your data to Azure without having to provision any more servers on-premise. New SQL Server capacity will be running in Azure instead of in your data center.

Open Source

Azure provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. You can use your existing SAS license for SAS Data Connector to MySQL to connect to Azure Database for MySQL, and SAS Data Connector to PostgreSQL to interface with Azure Database for PostgreSQL. SAS has not yet formally supported Azure Database for MariaDB. MariaDB is a variant of MySQL, so validation of support for the SAS Data Connector is coming soon. If you need support for MariaDB in Azure Database, please comment below and I will share your feedback with product management and testing.

Cloud Native

Azure SQL Database is an iteration of Microsoft SQL Server built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Azure SQL Database. You can use your existing license for SAS Data Connector to Microsoft SQL Server to connect to Azure SQL Database.

Analytical Databases (OLAP)

Analytical databases are optimized for read performance. These databases work best for complex queries in smaller volumes. When working with an analytical database you are typically doing analysis on multidimensional data interactively from multiple perspectives. Azure SQL Data Warehouse is the analytical database service offered by Azure. The SAS Data Connector to ODBC combined with a recent version of the Microsoft-supplied ODBC driver is currently the best way to interact with Azure SQL Data Warehouse. Look for the SAS Data Connector to Microsoft SQL Server to support SQL Data Warehouse soon.

NoSQL Databases

A non-relational or NoSQL database is any database not conforming to the relational database model. These databases are more easily scalable to a cluster of machines. NoSQL databases are a more natural fit for the cloud because the loose dependencies make the data easier to distribute and scale. The different NoSQL databases are designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. If you want a brief overview of these database structures, I cover them in my AWS database blog.

For Microsoft Azure, CosmosDB is the option available for NoSQL databases. CosmosDB is multi-model, meaning you can build out your databases to fit the NoSQL model you prefer. Use the SAS Data Connector to ODBC to interact with your data in Azure CosmosDB.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Azure offers HDInsights as their Hadoop as a service offering. Azure HDInsights supports both SAS Data Connector to Hadoop and SAS Data Connect Accelerator for Hadoop.

Finally

It is important to think about the use case for your database and the type of data you plan to store before you select an Azure database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases, remember that you will be charged for the storage you use and for the data that you move out of the database. Performing analysis and reporting on your data may require data transfer. Be aware of these costs and think about how you can lower them by keeping frequently accessed data cached somewhere or keeping it on-premise. Another strategy I’ve seen becoming more popular is taking advantage of SAS Micro Analytic Service to move the models you have built to run in the cloud provider where your data is stored. Data transfer is cheaper if that data moves between cloud services instead of outside of the cloud provider. Micro Analytic Service allows you to score the data in place, without movement from a cloud provider and without having to install SAS.

Additional Resources
1. Support for Databases in SAS® Viya® 3.4
2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Microsoft Azure was published on SAS Users.

October 31, 2019
 

Buy my costume, Georgie!

While growing up in the '80s, I watched The Golden Girls on TV with my Grandma Betty. Now, when my sister visits, we binge watch reruns on TV Land. I was excited when I saw that, for this Halloween, you could buy Golden Girls costumes! Too bad they sold out right away, making them one of this year's most popular costumes.

For the record, I wasn't planning to dress up tonight as a Golden Girl, but the news got me thinking about how Halloween costumes have changed over the years. What was popular when? Hence, this post. I explain how to use SAS REST APIs to append a table containing historic costume data with this year's most popular costumes (including the Golden Girls and Pennywise from It). While this example looks at costume data, consider the append steps as a template for updating any table in SAS Viya using REST APIs.

The data

There I am, in the middle

I created a data set containing the most popular Halloween costumes from the last 50 years (1968-2018). I compiled the data from several sources who couldn't seem to agree on the best-selling costume for a given year, so I combined the lists. Many years have two entries. The data here isn't as important as the append table procedure. What fun to review the costumes list! It was not hard to tell in what year certain movies (and their sequels) were released. Only one costume I wore made the list – my 1979 Ace Frehley outfit!

The process

The procedures in this example run on SAS Viya, utilizing the Cloud Analytics Services (CAS) REST APIs. CAS REST APIs use CAS actions to perform statistical methods across a variety of SAS products. You can learn more about CAS REST APIs and CAS Actions in the Using SAS Cloud Analytics Service REST APIs to run CAS Actions post or on developer.sas.com.

Below, I'll detail each REST call along with sample code. I originally used Postman to organize my calls. This allowed me to utilize pre- and post-call scripting to handle responses and create variables. You can find the entire Postman collection here on GitHub. For ease of display in this post, I'll use equivalent cURL commands.

Pre-requisites

I registered my client, obtained an access token, and added it as an environment variable, ACCESSTOKEN. For more information on registering a client or getting an access token, see my earlier post Authentication to SAS Viya: a couple of approaches.
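If you would rather script this step than use Postman, here is a hedged Python sketch of requesting a token with the password grant. It assumes a client (here called app/appsecret) has already been registered as described in that post; the server name and user credentials are placeholders.

import requests

# Request an access token from the SAS Viya SASLogon service (password grant)
token_url = 'https://sasserver.example.com/SASLogon/oauth/token'
resp = requests.post(
    token_url,
    auth=('app', 'appsecret'),            # registered client_id / client_secret (placeholders)
    data={'grant_type': 'password',
          'username': 'sasdemo',          # placeholder user credentials
          'password': 'sasdemo_password'})
resp.raise_for_status()

access_token = resp.json()['access_token']   # use this as the Bearer token in later calls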

Create a CAS session

Before running any CAS actions, we need to establish a connection to the SAS Viya server.

curl -X POST https://sasserver:8777/cas/sessions \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/vnd.sas.cas.session+json'

The result of this call is a session id in the form of a089ce2b-8116-7a40-b3e3-6e39b7b5566d. This id is used in all subsequent REST calls. You could easily create another variable for it. In the examples below I substitute the actual session id with <session-id>. You will need to substitute this placeholder when attempting the steps on your own.
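For example, a short Python sketch could capture the id in a variable and build the action URLs from it. The host and port match the cURL examples; the 'session' key is my assumption of where the id is returned, so verify it against your own server's response.

import requests

base = 'https://sasserver:8777'
headers = {'Authorization': f'Bearer {access_token}'}   # token from the pre-requisites step

# Create the CAS session and capture its id
resp = requests.post(f'{base}/cas/sessions', headers=headers)
resp.raise_for_status()
session_id = resp.json()['session']       # assumed key name; inspect your response to confirm

# Base URL used by the remaining action calls
actions_url = f'{base}/cas/sessions/{session_id}/actions'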

Create a global Caslib HALLOWEEN

Data in CAS is stored in a Caslib. In the step below, I create a Caslib called HALLOWEEN and link it to a physical server path (/home/sasdemo/halloween), where the table is stored.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.addCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name":"HALLOWEEN","path":"/home/sasdemo/halloween","description":"HALLOWEEN","subDirectories":"false","permission":"PUBLICWRITE","session":"false","dataSource":{"srcType":"path"},"createDirectory":"true","hidden":"false","transient":"false"}'

Note that I created the directory ~/halloween and set permissions as needed. Further, since the Caslib is global, other users have access to the data. This step (and the next) is a one-time request. If you were to repeat this process, you would not need to create the Caslib or upload the data again.

Copy data set costumesByYear into HALLOWEEN's path

Now that we have a Caslib and a path, we load the data table to the server. In this instance, I copy the costumesByYear.xlsx file into /home/sasdemo/halloween. There are multiple ways to upload data to the server. You can read more about the various methods in the SAS documentation.

Create a temporary Caslib TEMP

While our data lives in the HALLOWEEN Caslib, we want to create a temporary Caslib to run the append step. We will then save the appended table back into HALLOWEEN. The following code creates a new Caslib called TEMP.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.addCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name":"TEMP","path":"/home/sasdemo/temp","description":"TEMP","subDirectories":"false","permission":"PUBLICWRITE","session":"false","dataSource":{"srcType":"path"},"createDirectory":"true","hidden":"false","transient":"false"}'

Now we're ready to load the data into memory and append the table.

Load costumesByYear into memory

First, we load costumesByYear into memory in the TEMP Caslib.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.loadTable \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"path":"costumesByYear.xlsx","caslib":"HALLOWEEN","importOptions":{"fileType":"EXCEL"},"casOut":{"caslib":"TEMP","name":"costumesByYear","promote":"true"}}'

Create a temporary table data2019 containing the append data

Next, we create a new table called data2019 with costume data for 2019 in TEMP.

curl -X PUT https://sasserver:8777/cas/sessions/<session-id>/actions/upload \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: text/plain' \
  -H 'JSON-Parameters: {"casOut":{"caslib":"TEMP","name":"data2019","promote":"true"},"importOptions":{"fileType":"CSV"}}' \
  --data-binary $'Year,Costume\n2019,The Golden Girls\n2019,Pennywise'

Run data step to append data2019 to costumesByYear table

Finally, we run data step code to append table data2019 to table costumesByYear.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/runCode \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"code": "data temp.costumesbyyear(append=force) ; set temp.data2019;run;"}'

Save the costumesByYear table back to the HALLOWEEN CASlib

Now that we have successfully appended the costumesByYear table in the TEMP Caslib, we are ready to save back to the HALLOWEEN Caslib.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.save \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"table":{"name":"costumesbyyear","caslib":"TEMP","singlePass":"false"},"name":"costumesbyyear","replace":"true","compress":"false","caslib":"HALLOWEEN","exportOptions":{"fileType":"BASESAS"}}'

Delete TEMP Caslib

The TEMP Caslib is just that, temporary. With the code below we drop the Caslib and all its data.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.dropCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"caslib":"TEMP"}'

Delete the CAS session

The final step is to close our connection to the CAS server.

curl -X DELETE https://sasserver:8777/cas/sessions/<session-id> \
  -H "Authorization: Bearer $ACCESSTOKEN"

Wrapping it up

There you have it. With a few simple commands we were able to load, append, and save a table. This example is fairly simple in scope, but it translates into more complex use cases. The steps for my 2 x 50 table are the same as they would be for a 5 GB table with hundreds of columns and millions of rows.
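If you would rather drive the same workflow from Python than from raw REST calls, here is a hedged sketch of the load, append, and save steps using the SWAT package. The host, port, and credentials are placeholders, and it assumes the HALLOWEEN and TEMP caslibs already exist as created above.

import swat
import pandas as pd

conn = swat.CAS('sasserver', 5570, 'myuser', 'mypassword')   # placeholder connection details

# Load the existing table from the HALLOWEEN caslib into the TEMP caslib
conn.loadtable(path='costumesByYear.xlsx', caslib='HALLOWEEN',
               importoptions={'fileType': 'EXCEL'},
               casout={'caslib': 'TEMP', 'name': 'costumesByYear', 'promote': True})

# Upload this year's rows directly from a pandas DataFrame
new_rows = pd.DataFrame({'Year': [2019, 2019],
                         'Costume': ['The Golden Girls', 'Pennywise']})
conn.upload_frame(new_rows, casout={'caslib': 'TEMP', 'name': 'data2019', 'promote': True})

# Append with DATA step code, then save the result back to the HALLOWEEN caslib
conn.datastep.runcode(code='data temp.costumesbyyear(append=force); set temp.data2019; run;')
conn.table.save(table={'caslib': 'TEMP', 'name': 'costumesbyyear'},
                caslib='HALLOWEEN', name='costumesbyyear', replace=True)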

I have asked my mother to send the Polaroid photo of me as Ace in 1979. She just has to dig it out of a photo album. Check back in a week so you can have some fodder to poke fun at me.

Additional Resources

developer.sas.com - developers site for SAS
GitHub resources - GitHub repository for code used in this post

Append tables in SAS® Viya® with REST APIs – a treat, no tricks was published on SAS Users.

September 4, 2019
 

Editor’s note: This article is a continuation of the series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. Access all the articles in the series here.

In a previous article in this series, Accessing Databases in the Cloud – SAS Data Connectors and Amazon Web Services, I covered SAS and database as a service (DBaaS) and storage offerings from Amazon Web Services (AWS). Today, I cover the various storage options available on AWS and how to connect to and interact with them from SAS.

Object Storage

Amazon Simple Storage Service (S3) is low-cost, scalable cloud object storage for any type of data in its native format. Individual Amazon S3 objects can range in size from 1 byte all the way to 5 terabytes (TB). Amazon S3 organizes these objects into buckets. A bucket name is globally unique. You access the bucket directly through an API from anywhere in the world, if granted permissions. By default, a bucket grants the least access possible. Amazon advertises 11 9's, or 99.999999999%, durability, meaning your data is virtually never lost. Data replicates automatically across availability zones to meet this durability. You can reduce the number of replicas or use one of the various tiers of archive services to reduce your object storage cost. Costs are calculated based on terabytes of storage per month, with added costs for requests and transfers of data.

SAS and S3

Support for Amazon Web Services S3 as a Caslib data source for SAS Cloud Analytic Services (CAS) was added in SAS Viya 3.4. This data source enables you to access SASHDAT files and CSV files in S3. You can use the CASLIB statement or the table.addCaslib action to add a Caslib for S3. SAS is currently exploring native object storage integration with AWS S3 for more file types. For other file types you can copy the data from S3 and then use a SAS Data Connector to load the data into memory. For example, if I had Excel data in S3, I could use PROC S3 to copy the data locally and then load the data into CAS using the SAS Data Connector to PC Files.
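As a hedged illustration, the snippet below adds an S3 caslib with the table.addCaslib action from Python/SWAT; the CASLIB statement in SAS takes equivalent options. The bucket, region spelling, and credential options are assumptions, so confirm the exact option names in the CASLIB documentation for S3.

import swat

conn = swat.CAS('sasserver.example.com', 5570, 'myuser', 'mypassword')   # placeholder host/port

# Register an S3 caslib (option names and values below are illustrative assumptions)
conn.table.addcaslib(
    name='s3data',
    datasource={'srcType': 's3',
                'bucket': 'my-analytics-bucket',
                'region': 'US_East',
                'accessKeyId': 'AKIAEXAMPLE',
                'secretAccessKey': 'secret-key-value'})

# CSV and SASHDAT files in the bucket can then be loaded into memory
conn.loadtable(path='sales.csv', caslib='s3data',
               importoptions={'fileType': 'CSV'},
               casout={'name': 'sales'})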

Block Storage

Amazon Elastic Block Store (EBS) is the block storage service designed for use with Amazon Elastic Compute Cloud (EC2). The storage is accessible only when attached to an operating system. Storage volumes can be treated as independent disk drives controlled by a server operating system. You would mount an EBS volume to an operating system as if it were a physical disk. EBS volumes are valuable because they are the storage that persists when you terminate your compute instance. You can choose from four different volume types that supply performance levels at corresponding costs.

SAS and EBS

EBS is used as the permanent SAS data storage and persists through a restart of your SAS environment. The performance choices made when selecting from the different EBS volume types have a direct impact on the performance that you get from SAS. One thing to consider is using compute instances that have enhanced EBS performance or dedicated solid state drive instance storage. For example, the SAS Viya on AWS QuickStart uses Storage Optimized and Memory Optimized compute instances with local NVMe-based SSDs that are physically connected to the host server; this storage is coupled to the lifetime of the instance and is beneficial for performance.

SAS Cloud Analytic Services (CAS) is an in-memory server that relies on the CAS Disk Cache as the virtual memory storage backend. This reliance is especially heavy if you are reading data from a database. In this case, make sure you have enough block storage, in the form of EBS volumes, for use as the CAS Disk Cache.

File Storage

Amazon Elastic File System (EFS) provides access to data through a shared file system. EFS is an elastic network file system that grows and shrinks as you add or remove files, so you only pay for the storage you consume. Users create, delete, modify, read, and write files organized logically in a directory structure for intuitive access. This allows simultaneous access for multiple users to a common set of file data managed with user and group permissions. Amazon FSx for Lustre is the high-performance file system service.

SAS and EFS

EFS shared file system storage can be a powerful tool if utilizing a SAS Grid architecture. If you have a requirement in your SAS architecture for a shared location that any node in a group can access and write to, then EFS could meet your requirement. To access the data stored in your network file system you will have to mount the EFS file system. You can mount your Amazon EFS file systems to any EC2 instance, or any on-premises server connected to your Amazon VPC.

BONUS: Serverless

Amazon Athena is a query service for Amazon S3. This service makes it easy to submit queries against the objects stored in S3. You can run analysis on this data using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run. Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet.

SAS and Athena

Amazon Athena is ODBC/JDBC compliant which means I can use SAS/ACCESS Interface to ODBC or SAS/ACCESS Interface to JDBC to connect using SAS. Download an Amazon Athena ODBC driver and submit code from SAS just like you would any ODBC data source. Athena is a great tool if you want to use the serverless computing power of Amazon to query data in S3.
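To make that concrete, here is a hedged sketch driven from a Jupyter notebook with the saspy package: it submits a LIBNAME statement for an ODBC DSN that you would have configured with the Athena driver. The DSN name, library name, and table are placeholders.

import saspy

sas = saspy.SASsession()    # uses your existing saspy configuration

result = sas.submit("""
    /* 'AthenaDSN' is a placeholder for an ODBC DSN set up with the Athena driver */
    libname athena odbc dsn="AthenaDSN";

    /* Query S3-backed data through Athena like any other ODBC library */
    proc sql;
        select category, count(*) as n
        from athena.my_s3_table
        group by category;
    quit;
""")
print(result['LST'])        # listing output from the submitted step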

Finally

Many times, we do not have a choice in the technologies we use or the infrastructure on which they sit. Luckily, if you use AWS, integration with SAS is not a concern. I’ve now covered databases and storage for AWS. In future articles, I’ll cover the same topics for Microsoft Azure and Google Cloud Platform.

Additional Resources

Storage in the Cloud – SAS and Amazon Web Services was published on SAS Users.

August 22, 2019
 

Editor’s note: This is the first article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Amazon Web Services.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic and responds to demand while only paying for what you use. As more technologies move to a cloud-based architecture, companies must consider questions like: Where is my data going to be stored? Do I want a hybrid solution? What cloud storage options do I have? What storage solution best meets my business needs? Another question requiring an answer is: Is the software I use cloud-ready? In the case of SAS, the answer is YES! SAS offers various cloud deployment patterns on various cloud providers and supports integration with cloud storage services.

This is part one in a series covering database as a service (DBaaS) and storage offerings from Amazon Web Services (AWS). Microsoft Azure and Google Cloud Platform will be covered in future articles. The goal is to supply a breakdown of these services so you can better understand which offerings meet your business requirements and how they relate to SAS. I will focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya are called SAS Data Connect Accelerators and are synonymous with the SAS Embedded Process.

SAS integration with AWS

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Amazon Web Services database variants. A database running in AWS is much like your on-premises database, except that Amazon manages the software and hardware. Amazon’s DBaaS offerings take care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premises or with a different cloud provider.

AWS databases

Amazon offers database service technologies familiar to users. It is important to understand the new terminology and how the different database services best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in AWS. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand, and software that updates and creates backups automatically, mean customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article I cover various database management systems, the AWS offering for each database type, and SAS integration. First let's consider the diagram below depicting a decision flow chart to determine integration points between AWS database services and SAS.

Integration points between AWS database services and SAS

Trace your path in the diagram and read on to learn more about connection details.

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two methods serve two distinct purposes and are optimized depending on how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries usually have minimal complexity but arrive in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input information, processing the data and updating existing data to reflect the collected and processed information. Transactional databases prevent two users from modifying the same data at the same time. Examples include order entry, retail sales, and financial transaction systems. Amazon offers several types of transactional database services. You can organize Amazon Relational Database Service (RDS) into three categories: enterprise licenses, open source, and cloud native.

Enterprise License

Many customers already have workloads built around enterprise databases. Amazon provides a turn-key enterprise solution for customers not looking to break their relationship with enterprise vendors or refactor their existing workflows. AWS offers Oracle and Microsoft SQL Server as turn-key enterprise solutions in RDS. Both offerings include the required software license; however, Oracle also allows you to “Bring Your Own License” (BYOL). SAS has extended SAS Data Connector support for both cloud variants. You can use your existing license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with these RDS databases.

Remember you can install and manage your own database on a virtual machine if there is not an available database as a service offering. The traditional backup and update responsibilities are left to the customer in this case. For example, both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata are supported for Teradata installed on AWS.

Open Source

Amazon provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. SAS has extended SAS Data Connector support for all these cloud variants. You can use your existing license for SAS Data Connector to MYSQL to connect to either RDS MYSQL or RDS MariaDB and SAS Data Connector to PostgreSQL to interface with RDS PostgreSQL.

Cloud Native

Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Amazon Aurora. You can use your existing license for SAS Data Connector to MYSQL to connect to Aurora MYSQL, or SAS Data Connector to PostgreSQL to interface with Aurora PostgreSQL.
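For instance, a hedged SWAT sketch of connecting to Aurora PostgreSQL through the SAS Data Connector to PostgreSQL might look like the following. The cluster endpoint, database, credentials, and option names are placeholders, so consult the data connector documentation for the authoritative list.

import swat

conn = swat.CAS('sasserver.example.com', 5570, 'myuser', 'mypassword')   # placeholder host/port

# Register a caslib backed by the PostgreSQL data connector (values are assumptions)
conn.table.addcaslib(
    name='aurora',
    datasource={'srcType': 'postgres',
                'server': 'mycluster.cluster-example.us-east-1.rds.amazonaws.com',
                'database': 'sales',
                'username': 'dbuser',
                'password': 'dbpassword'})

# Load a database table into CAS memory
conn.loadtable(path='orders', caslib='aurora', casout={'name': 'orders'})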

Analytical Databases (OLAP)

Analytical databases are optimized for read performance. These databases work best for complex queries in smaller volumes. When working with an analytical database you are typically doing analysis on multidimensional data interactively from multiple perspectives. Redshift is the analytical database service offered by Amazon. SAS has a dedicated product called SAS Data Connector to Amazon Redshift that was purpose built for analytics workloads running in the Amazon cloud.

NoSQL Databases

A non-relational or NoSQL database is any database not conforming to the relational database model. These databases are more easily scalable to a cluster of machines. NoSQL databases are a more natural fit for the cloud because the loose dependencies make the data easier to distribute and scale. The different NoSQL databases are designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. Here is a brief overview of the most common data structures.

Key-Value Database

A key-value database stores data as a collection of key-value pairs. The key acts as a unique identifier for each record. Amazon’s key-value database as a service is DynamoDB. SAS interacts with DynamoDB using industry standard ODBC or JDBC drivers.

Columnar Database

Data in a traditional relational database is organized by rows. The alternative, columnar databases, optimize by storing and retrieving data by column, saving valuable time and network I/O. Redshift is the columnar database service offered by Amazon. SAS has a dedicated product called SAS Data Connector to Amazon Redshift that was purpose built for this database.

Document Database

A document database queries data in documents, typically stored in JSON format. DocumentDB is the document database service offering from Amazon. SAS interacts with DocumentDB using industry standard ODBC or JDBC drivers. DocumentDB is MongoDB-compatible, which means existing MongoDB drivers and tools work with DocumentDB. SAS is currently building out functionality to support SAS Data Connector to MongoDB, and you should expect that to expand further into DocumentDB as well.

Graph Database

Amazon Neptune is the graph database service designed to work with a complex hierarchy of interconnected data. The design of these databases optimizes for querying relationships in data and reduces the number of table joins. SAS interacts with Amazon Neptune using industry standard ODBC or JDBC drivers.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Amazon offers Elastic MapReduce (EMR) as its Hadoop-as-a-service offering. Amazon EMR supports both SAS Data Connector to Hadoop and SAS Data Connect Accelerator for Hadoop.

Finally

It is important to think about the use case for your database and the type of data that you plan to store before you select an AWS database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases, remember that you will be charged not only for the storage you use but also for the data you move out of the database. Performing analysis and reporting on your data may require data transfer. Be aware of these costs and think about how you can lower them, for example by caching frequently accessed data or keeping it on-premises.

Additional Resources

  1. Support for Databases in SAS® Viya® 3.4
  2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Amazon Web Services was published on SAS Users.

July 26, 2019
 

In a previous post, Zero to SAS in 60 Seconds- SAS Machine Learning on SAS Analytics Cloud, I documented my experience with a SAS free trial on the SAS Analytics Cloud. Well, the engineers at SAS have been busy and created another free trial. The new trial covers SAS Event Stream Processing (ESP).

This time last year (when just starting at SAS), I only knew ESP as extrasensory perception. I'm more enlightened now. Working through this exercise showed me how event stream processing is a powerful and effective tool for analyzing data, using machine learning and streaming analytics to uncover insights for real-time decision making. In a nutshell, you create a model, stream your data, process the results, and make timely decisions based on the results.

The trial uses SAS ESPPy, allowing you to embed an ESP project inside a Python pipeline. To see ESPPy in action take a look at this video. To learn more about ESP and IoT see this article on the SAS Communities Library. In this article I chronicle my journey through the trial while introducing key concepts and operations of ESP.

Register and get started

The registration and initial login processes are identical to those in the machine learning article. You must have a SAS Profile to participate in the trial. The only difference is you need to follow this link to sign up for the ESP trial. Please refer to the machine learning article for detailed steps on signing up and logging in.

The use case

SAS Solar Farm in Cary

The SAS Solar Farm sits on almost 12 acres of SAS Headquarters property. There are 10,276 solar panels producing more than 3.6 million kilowatt hours annually. That’s enough power for more than 325 average sized U.S. homes.

As part of the environment management, it is important to continuously monitor the operation of the solar panels to optimize configuration parameters, detect potential equipment failure, and accurately forecast the amount of energy generated. Factors considered include panel angles, time of day, seasons, and weather patterns, as the energy generated depends directly on the amount of sun available to the panels.

The ESP project in this demo is pre-loaded in the trial and is run through a Jupyter notebook. The project monitors the energy (kWh) and power (kW) generated during a specific time interval, eliminating localized outlier effects and triggering alerts when there is a pre-defined difference in the energy generated between subsequent time intervals.

Solar Farm Data represented as digital art

Take two minutes and watch this video on how SAS uses SAS software to create a work of art with solar farm data.

Disclaimer: no sheep were harmed during data collection or writing of this article.

Navigating the trial

Once logged into the trial, you see the Applications screen.

ESP trial Applications screen

The Data and Team options in the left pane behave exactly as those in the machine learning trial. These sections allow you to access data and manage a multi-user system. Select the SAS Event Stream Processing icon to start a JupyterLab session.

JupyterLab home screen

I will not go into the details of JupyterLab here. The left pane contains menus, file management, and other options. The pane on the right displays three options:

Python 3 Notebook - a blank Jupyter notebook - documents that combine live, runnable code with narrative text (Markdown), equations (LaTeX), images, interactive visualizations and other rich output
Python 3 Console - a blank Python console - code consoles enable you to run code interactively in a kernel
Text File - basic text editor - enables you to edit text files in JupyterLab

For this article we're going to follow along and interact with the pre-loaded demo Solar Farm ESP project. To locate the Jupyter notebook double click the demo directory from the left pane.

Select the demo directory from the left pane

Next select Event_Stream_Processing. Before proceeding with the demo, I'd highly suggest opening the README.ipynb file.

Contents of the README notebook

Here you will find overview and environment organization information for the trial. The trial uses SAS ESPPy for designing, testing, and deploying projects on ESP Servers.
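To give a feel for what that looks like before you open the notebook, here is a heavily hedged sketch of the first ESPPy steps: connecting to the ESP server, creating a project, and deploying it. The host, port, and project name are placeholders rather than the trial's actual values, and the window definitions are elided.

import esppy

# Connect to the ESP server (host and port are assumptions)
esp = esppy.ESP('espserver.example.com', 9900)

# Create a project, the container for the source, aggregate, join,
# compute, and filter windows described in the steps that follow
proj = esp.create_project('solar_farm')

# ... window definitions and workflow connections go here ...

# Deploy the project to the ESP server
esp.load_project(proj)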

Step through the demo

Before starting the trial, I needed a little background on event stream processing. I located the SAS ESP product documentation. I recommend referring to it for details on the ESP model, objects, and workflow.

To access the demo, double click the demo directory from the left pane. The trial comes with five pre-loaded demos. Feel free to try any or all of them. Double click on ESP Basic Project - Solar Farm.ipynb to display the Solar Farm notebook. The notebook walks you through the ESP model creation and execution. To run a command, place the cursor in a command cell and select the 'Run' button (the triangle-shaped button at the top of the notebook). If no response returns when running the cell block, assume the commands ran successfully.

Below is a brief description of the steps in the project:

  1. Create the project and query used - this creates dedicated space and objects where the ESP process takes place
  2. Create input and aggregate windows - this action extracts desired data and creates data subsets from the stream
  3. Add a join window - this brings together lag and current values into the project
  4. Add a compute window - this calculates the difference between the previous and current event
  5. Add a filter window - this action filters occurrences outside a threshold value; this creates an alert for potential mechanical issues
  6. Define workflow connections - this defines the workflow between the various windows in the project
  7. Save the project - this generates an XML file for the project
  8. Load the project to the ESP Server - this loads the project and produces a graphical representation of the workflow

    Solar Farm project workflow

  9. Start streaming data - in this example, rather than streaming data in real time, the stream derives from the solar farm table data
  10. View solar farm data - this creates a graphical representation of streaming data

    Solar Farm graph for kW and kWh

While not included in the demo, the streaming data would pass through the filter and, if a threshold breach occurred, an alert would be created. Considering the graph above, alerts could very well have occurred just before 1:15 pm (IntkW drops from 185 to 150) and just before 2:30 pm (IntkW drops from 125 to 35).

Your turn

Now that you have a taste of ESP, feel free to step through the rest of the demos. You may also load your own data and create your own ESP models. Feel free to share your experience and what you create by leaving a comment.

SAS Event Stream Processing on SAS Analytics Cloud - my journey was published on SAS Users.