SAS Viya

6月 292017
 

One of the big benefits of the SAS Viya platform is how approachable it is for programmers of other languages. You don't have to learn SAS in order to become productive quickly. We've seen a lot of interest from people who code in Python, maybe because that language has become known for its application in machine learning. SAS has a new product called SAS Visual Data Mining and Machine Learning. And these days, you can't offer such a product without also offering something special to those Python enthusiasts.

Introducing Python SWAT

And so, SAS has published the Python SWAT project (where "SWAT" stands for the SAS scripting wapper for analytical transfer. The project is a Python code library that SAS released using an open source model. That means that you can download it for free, make changes locally, and even contribute those changes back to the community (as some developers have already done!). You'll find it at github.com/sassoftware/python-swat.

SAS developer Kevin Smith is the main contributor on Python SWAT, and he's a big fan of Python. He's also an expert in SAS and in many programming languages. If you're a SAS user, you probably run Kevin's code every day; he was an original developer on the SAS Output Delivery System (ODS). Now he's a member of the cloud analytics team in SAS R&D. (He's also the author of more than a few conference papers and SAS books.)

Kevin enjoys the dynamic, fluid style that a scripting language like Python affords - versus the more formal "code-compile-build-execute" model of a compiled language. Watch this video (about 14 minutes) in which Kevin talks about what he likes in Python, and shows off how Python SWAT can drive SAS' machine learning capabilities.

New -- but familiar -- syntax for Python coders

The analytics engine behind the SAS Viya platform is called CAS, or SAS Cloud Analytic Services. You'll want to learn that term, because "CAS" is used throughout the SAS documentation and APIs. And while CAS might be new to you, the Python approach to CAS should feel very familiar for users of Python libraries, especially users of pandas, the Python Data Analysis Library.

CAS and SAS' Python SWAT extends these concepts to provide intuitive, high-performance analytics from SAS Viya in your favorite Python environment, whether that's a Jupyter notebook or a simple console. Watch the video to see Kevin's demo and discussion about how to get started. You'll learn:

  • How to connect your Python session to the CAS server
  • How to upload data from your client to the CAS server
  • How SWAT extends the concept of the DataFrame API in pandas to leverage CAS capabilities
  • How to coax CAS to provide descriptive statistics about your data, and then go beyond what's built into the traditional DataFrame methods.

Learn more about SAS Viya and Python

There are plenty of helpful resources to help you learn about using Python with SAS Viya:

And finally, what if you don't have SAS Viya yet, but you're interested in using Python with SAS 9.4? Check out the SASPy project, which allows you to access your traditional SAS features from a Jupyter notebook or Python console. It's another popular open source project from SAS R&D.

The post Using Python to work with SAS Viya and CAS appeared first on The SAS Dummy.

6月 142017
 

In SAS Viya 3.2, the Self-Service Import provides a mechanism for a user to import (copy) data into the SAS Cloud Analytic Services (CAS) environment. The data is copied as a .sashdat file into the selected CAS Library location when it is imported.  Self-Service Import data can only be imported into CAS libraries of type PATH, HDFS, or DNFS.

The Self-Service Import functionality is available in the following applications:

  • SAS Visual Data Builder
  • SAS Visual Analytics
  • SAS Environment Manager – Data

To have access to Self-Service Import, the end user must be granted the Read permission on the /casManagement_capabilities/importData object URI in the Security ⇨ Rules area of SAS Environment Manager.

Self-Service Import supports importing data to CAS from Local, Server, and Social Media sources.

Self-Service Import in SAS Viya

SAS Viya 3.2: Self-Service Import

Local

Local file data can be imported from Microsoft Excel (.XLSX or .XLS), text file (.TXT or .CSV), the clipboard, or a SAS Data Set (SASHDAT or SAS7BDAT). The file(s) must exist on a file system available to your PC.

Server

After providing the appropriate server connection information, a table from LASR or select database types can be imported. The currently supported database types are:  Oracle, Teradata, Hadoop, PostgreSQL, and Impala. The Server selections displayed are dependent on your licensing and configuration.

Social Media

After authentication with the social media provider (Twitter, Facebook, Google Analytics, or YouTube), data can be imported through the social media provider’s public API. Access to these APIs is subject to the social media provider’s applicable license terms, terms of use, and other usage terms and policies.

Currently, there is a size limit for file imports that is set on the CAS Management service Configuration screen in SAS Environment Manager. The default size is 4GB. The local file importer has a 4GB limit because that is what the smallest size limit browser (Internet Explorer) is restricted to; however, Chrome and other browsers will allow larger file sizes, which is why there is a property that allows an Admin to set a higher limit. A modification to the max-file-size property requires a restart of the casManagement service.

Social Media and DBMS importers have no explicit limits. However, there is a limitation of the disk size of where casManagement is running because the uploaded file gets written to a temporary file on the server relative to casManagement.

For more information refer to the Self-Service Import section of the The Self-Service Import in SAS Viya 3.2 was published on SAS Users.

6月 072017
 

Innovation used to happen in structured cycles. The new invention was often a planned event and the domain of a select few departments within an organisation. But in today's always-on economy enterprises need to innovate on a continuous basis to keep up with new players that base their entire businesses [...]

Don’t trip up on the edge of innovation was published on SAS Voices by Peter Pugh-Jones

5月 232017
 

I do not like to stand still. I am a lifelong learner, I take guitar lessons during the week, and I am known for riding my Segway around SAS campus and at events. SAS works the same way; we never stand still. We are continuously innovating and moving forward. Last [...]

Developing with passion: 4 new technologies shaping the future of SAS was published on SAS Voices by Oliver Schabenberger

5月 232017
 

I do not like to stand still. I am a lifelong learner, I take guitar lessons during the week, and I am known for riding my Segway around SAS campus and at events. SAS works the same way; we never stand still. We are continuously innovating and moving forward. Last [...]

Developing with passion: 4 new technologies shaping the future of SAS was published on SAS Voices by Oliver Schabenberger

5月 232017
 

I do not like to stand still. I am a lifelong learner, I take guitar lessons during the week, and I am known for riding my Segway around SAS campus and at events. SAS works the same way; we never stand still. We are continuously innovating and moving forward. Last [...]

Developing with passion: 4 new technologies shaping the future of SAS was published on SAS Voices by Oliver Schabenberger

5月 092017
 

Microservices are a key component of the SAS Viya architecture. In this post, I’ll introduce and explain the benefits of microservices. In a future post we’ll dig deeper into the microservices architecture.

What are microservices?

When we look at SAS Viya architecture diagrams, we can find, among the new core components, microservices.

Microservices are self-contained, lightweight pieces of software that

  • Do one thing.
  • Depend on one another to the least extent possible.
  • Are deployed independently.
  • Provide a language-agnostic API.
  • Can run one or more instances of these processes at any given time.

Note that the prefix “micro” doesn’t mean small in CPU or memory consumption. Rather, it refers to the software performing a single function or being narrow in scope.

Let’s compare to SAS 9

The SAS 9 Web Infrastructure Platform services and the overall platform are tightly coupled to metadata structures and schemas. Every maintenance action takes a bit of effort: can you apply a fix to a single application without first stopping the whole infrastructure? Can you upgrade one component and leave all of the other ones at the previous release? Can you…?

To address these and other issues, SAS R&D decomposed the metadata server,  the Web Infrastructure Platform, and  many web applications. As a result, we got many functional units. Each one is a microservice.

Let’s have a look at the following examples.

In SAS 9.4 we can open the SAS Management Console to manage users and groups:

In SAS Viya, we can do the same using the SAS Environment Manager web application:

You may think we simply switched to a different, web-based client. Actually, the real difference lies in the backend implementation. With SAS 9, the metadata server was responsible for servicing that functionality in addition to a host of other features. With SAS Viya, we have a dedicated microservice for it: the Identities microservice.

Here’s another example. We want to edit an option in the configuration of an application, like the address of the Open Street Map server to use with Visual Analytics geo maps. With SAS 9, we use the SAS Management Console to interact, as usual, with the metadata server.

With SAS Viya, we set the property with Environment Manager. And, guess what? We are using the Configuration microservice.

If you are curious and want to see a list of all the microservices deployed in your SAS Viya environment, you can, again, use the Environment Manager.

Note that in all these examples, the Environment Manager is simply serving as the GUI to a particular microservice supporting the associated feature.

What are the benefits of Microservices?

The move to a microservice-oriented architecture brings many benefits to all stakeholders, first and foremost to SAS users and SAS administrators.

Microservices are independently updatable

It is now easier for you to manage and maintain your environment. Hot fixes for a specific microservice are released just as normal updates, and the official installation process is documented in the

Just as with the previous point, there are a few exceptions: almost everything requires the SASLogon and Identities microservices, so, if they are down, nothing works.

Scalability and High Availability

When microservices are spun up, they self-register, making themselves available for processing requests. This way, supporting failover is as easy as ensuring you have at least two instances of the associated microservice up and running. It is possible to scale further for increased capacity/performance, and you can do so at the microservice level, based on the specific demand for each function (e.g., you likely won’t need as many instances of the Import VA SPK microservice as you do for the Authorization microservice).

Microservices are “open”

Microservices can run in different environments – bare OS, Cloud Foundry, Docker. Also, they are accessible to non-SAS developers through REST APIs. As an example, let’s say I’d like to retrieve the same properties for the SAS Administrators group that were shown above in Environment Manager. It’s as easy as calling a REST endpoint: http://<myserver>/identities/groups/SASAdministrators
The result can be in either XML or json.

In fact, even microservices communicate with one another using REST interfaces!

I hope this blog has been helpful.

Feel free to add comments or questions below.

Let’s talk about Microservices was published on SAS Users.

5月 092017
 

Microservices are a key component of the SAS Viya architecture. In this post, I’ll introduce and explain the benefits of microservices. In a future post we’ll dig deeper into the microservices architecture.

What are microservices?

When we look at SAS Viya architecture diagrams, we can find, among the new core components, microservices.

Microservices are self-contained, lightweight pieces of software that

  • Do one thing.
  • Depend on one another to the least extent possible.
  • Are deployed independently.
  • Provide a language-agnostic API.
  • Can run one or more instances of these processes at any given time.

Note that the prefix “micro” doesn’t mean small in CPU or memory consumption. Rather, it refers to the software performing a single function or being narrow in scope.

Let’s compare to SAS 9

The SAS 9 Web Infrastructure Platform services and the overall platform are tightly coupled to metadata structures and schemas. Every maintenance action takes a bit of effort: can you apply a fix to a single application without first stopping the whole infrastructure? Can you upgrade one component and leave all of the other ones at the previous release? Can you…?

To address these and other issues, SAS R&D decomposed the metadata server,  the Web Infrastructure Platform, and  many web applications. As a result, we got many functional units. Each one is a microservice.

Let’s have a look at the following examples.

In SAS 9.4 we can open the SAS Management Console to manage users and groups:

In SAS Viya, we can do the same using the SAS Environment Manager web application:

You may think we simply switched to a different, web-based client. Actually, the real difference lies in the backend implementation. With SAS 9, the metadata server was responsible for servicing that functionality in addition to a host of other features. With SAS Viya, we have a dedicated microservice for it: the Identities microservice.

Here’s another example. We want to edit an option in the configuration of an application, like the address of the Open Street Map server to use with Visual Analytics geo maps. With SAS 9, we use the SAS Management Console to interact, as usual, with the metadata server.

With SAS Viya, we set the property with Environment Manager. And, guess what? We are using the Configuration microservice.

If you are curious and want to see a list of all the microservices deployed in your SAS Viya environment, you can, again, use the Environment Manager.

Note that in all these examples, the Environment Manager is simply serving as the GUI to a particular microservice supporting the associated feature.

What are the benefits of Microservices?

The move to a microservice-oriented architecture brings many benefits to all stakeholders, first and foremost to SAS users and SAS administrators.

Microservices are independently updatable

It is now easier for you to manage and maintain your environment. Hot fixes for a specific microservice are released just as normal updates, and the official installation process is documented in the

Just as with the previous point, there are a few exceptions: almost everything requires the SASLogon and Identities microservices, so, if they are down, nothing works.

Scalability and High Availability

When microservices are spun up, they self-register, making themselves available for processing requests. This way, supporting failover is as easy as ensuring you have at least two instances of the associated microservice up and running. It is possible to scale further for increased capacity/performance, and you can do so at the microservice level, based on the specific demand for each function (e.g., you likely won’t need as many instances of the Import VA SPK microservice as you do for the Authorization microservice).

Microservices are “open”

Microservices can run in different environments – bare OS, Cloud Foundry, Docker. Also, they are accessible to non-SAS developers through REST APIs. As an example, let’s say I’d like to retrieve the same properties for the SAS Administrators group that were shown above in Environment Manager. It’s as easy as calling a REST endpoint: http://<myserver>/identities/groups/SASAdministrators
The result can be in either XML or json.

In fact, even microservices communicate with one another using REST interfaces!

I hope this blog has been helpful.

Feel free to add comments or questions below.

Let’s talk about Microservices was published on SAS Users.

5月 062017
 

As SAS Viya has been gaining awareness over the past year among SAS users, there has been a lot of discussion about how SAS’ Cloud Analytic Server (CAS) handles memory vs SAS’ previous technologies such as LASR and HPA.  Recently, while I was involved in delivering several SAS Viya enablement sessions, I realised that many, including myself, held an incorrect understanding of how this works, mainly around one particular CAS option called maxTableMem.

The maxTableMem option determines the memory block size that is used per table, per CAS Worker, before converting data to memory-mapped memory.  It is not intended to directly control how much data is put into memory vs how much is put into CAS_DISK_CACHE, but rather it indirectly influences this.

Let’s unpack that a bit and try to understand what it really means.

The CAS Controller doesn’t care what the value of maxTableMem is.  In a serial load example, the CAS Controller distributes the data evenly across the CAS Workers[1], which then fill up maxTableMem-sized buckets (memory blocks), emptying them (converting them to memory-mapped memory) as they fill up, only leaving non-full buckets of table data.  You should almost never  change the default setting of this option (16MB), except perhaps in cases of extremely large tables, in order to reduce the number of file handles (up to 256MB is probably sufficient in these cases).

CAS takes advantage of standard memory mapping techniques for the CAS_DISK_CACHE, and leaves the optimisation of it up to the OS.  With SASHDAT files and LASR in SAS 9.4, the SASHDAT file essentially acts as a pre-paged file, written in a memory-mapped format, so the table data in memory doesn’t need to be written to disk when it is paged out.  Should a table need to be dropped from memory to make room for other data, and subsequently needed to be read back in to memory, it would be paged in from the SASHDAT file.

With CAS, the CAS_DISK_CACHE allows us to extend this pre-paged file approach to all data sources, not just SASHDAT.  Traditional OS swap files are written to each time memory is paged out, however with CAS, regardless of the data source (SASHDAT, database, client-uploaded file etc.) most table memory will never need to be written to disk, as it will already exist in the backing store (this could be CAS_DISK_CACHE, HDFS or NFS).   Although data will be continually paged in and out of memory, the amount of writing to disk will be minimised, which is typically slower than reading data from disk.

Another advantage of the CAS_DISK_CACHE is that when data does need to be written to disk it can happen upfront when the server is less busy, rather than at the last moment when the system detects it is out of memory (pre-paging rather than demand-paging).  Once it is written, it can be paged back into memory multiple times, by multiple concurrent processes.  The CAS_DISK_CACHE also spreads the I/O across multiple devices and servers as opposed to a typical OS swap file that may only write to a single file on a single server.

While CAS supports exceeding memory capacity by using CAS_DISK_CACHE as a backing store, read/write disk operations do have a performance cost.  Therefore, for best performance, we recommend you have enough memory capacity to hold your  most commonly used tables, meaning  most of the time the entire table will be both in memory and the backing store.

If you expect to regularly exceed memory capacity, and therefore are frequently paging data in from CAS_DISK_CACHE, consider spreading the CAS_DISK_CACHE location across multiple devices and using newer solid state storage technologies in order to improve performance.[1]

Additionally, when you need CAS to peacefully co-exist with other applications that are sharing resources on the same nodes, standard Linux cgroup settings along with Hadoop YARN configuration can be utilised to control the resources that CAS sessions can exploit.

References

Paging

Notes

[1] There are exceptions to data being evenly distributed across the CAS Workers.  The main one is if the data is partitioned and the partitions are of different sizes – all the data of a partition must be on the same node therefore resulting in an uneven distribution.  Also, if a table is very small, it may end up on only a single node, and when CAS is co-located with Hadoop the data is loaded locally from each node, so CAS receives whatever the distribution of data is that Hadoop provides.

[2] A comprehensive analysis of all possible storage combinations and the impact on performance has not yet been completed by SAS.

Dr. StrangeRAM or: How I learned to stop worrying and love CAS was published on SAS Users.

4月 262017
 

Oracle databases from SAS ViyaSAS Data Connector to Oracle lets you easily load data from your Oracle DB into SAS Viya for advanced analytics. SAS/ACCESS Interface to Oracle (on SAS Viya) provides the required SAS Data Connector to Oracle, that should be deployed to your CAS environment. Once the below described configuration steps for SAS Data Connector to Oracle are completed, SAS Studio or Visual Data Builder will be able to directly access Oracle tables and load them into the CAS engine.

SAS Data Connector to Oracle requires Oracle client components (release 12c or later) to be installed and configured and configurations deployed to your CAS server. Here is a guide that walks you through the process of installing the Oracle client on a Linux server and configuring SAS Data Connector to Oracle on SAS Viya 3.1 (or 3.2):

Step 1: Get the Oracle Instant client software (release 12.c or later)

To find the oracle client software package open a web browser and navigate to Oracle support at:
http://www.oracle.com/technetwork/topics/linuxx86-64soft-092277.html

Download the following two packages to your CAS controller server:

  • oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm
  • oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm (optional for testing only)

Step 2: Install and configure Oracle instant client

On your CAS controller server, execute the following commands to install the Oracle instant client and SQLPlus utilities.

rpm –ivh oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm
rpm –ivh oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm

The Oracle client should be installed to /usr/lib/oracle/12.1/client64.
To configure the Oracle client, create a file called tnsnames.ora, for example, in the /etc/ directory. Paste the following lines with the appropriate connection parameters of your Oracle DB into the tnsnames.ora file. (Replace "your_tnsname", "your_oracle_host", "your_oracle_port" and "your_oracle_db_service_name" with parameters according to your Oracle DB implementation)

your_tnsname =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = your_oracle_host)(PORT = your_oracle_port ))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME =your_oracle_db_service_name)
    )
  )

Next you need to set environment variables for the Oracle client to work:

LD_LIBRARY_PATH: Specifies the directory of your Oracle instant client libs
PATH: Must include your Oracle instant client bin directory
TNS_ADMIN: Directory of your tnsnames.ora file
ORACLE_HOME: Location of your Oracle instant client installation

In a console window on your CAS controller Linux server, issue the following commands to set environment variables for the Oracle client: (replace the directory locations if needed)

export LD_LIBRARY_PATH=/usr/lib/oracle/12.1/client64/lib:$LD_LIBRARY_PATH
export PATH=/usr/lib/oracle/12.1/client64/bin:$PATH 
export TNS_ADMIN=/etc 
export ORACLE_HOME=/usr/lib/oracle/12.1/client64

If you installed the SQLPlus package from Oracle, you can test connectivity with the following command: (replace "your_oracle_user" and "your_tnsname" with a valid oracle user and the tnsname configured previously)

sqlplus your_oracle_user@your_tnsname

When prompted for a password, use your Oracle DB password to log on.
You should see a “SQL>” prompt and you can issue SQL queries against the Oracle DB tables to verify your DB connection. This test indicates if the Oracle instant client is successfully installed and configured.

Step 3: Configure SAS Data Connector to Oracle on SAS Viya

Next you need to configure the CAS server to use the instant client.

The recommended way is to edit the vars.yml file in your Ansible playbook and deploy the required changes to your SAS Viya cluster.

Locate the vars.yml file on your cluster deployment and change the CAS_SETTINGS section to reflect the correct environment variables needed for CAS to use the Oracle instant client:
To do so, uncomment the lines for ORACLE_HOME and LD_LIBRARY_PATH and insert the respective path for your Oracle instant client installation as shown below.

CAS Specific ####
# Anything in this list will end up in the cas.settings file
CAS_SETTINGS:
1: ORACLE_HOME=/usr/lib/oracle/12.1/client64
#3: ODBCHOME=ODBC home directory
#4: JAVA_HOME=/usr/lib/jvm/jre-1.8.0
5:LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib

Run the ansible-playbook to deploy the changes to your CAS server.
After ansible finished the update, your cas.settings file should contain the following lines:

export ORACLE_HOME=/usr/lib/oracle/12.1/client64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib

Now you are ready to use SAS/ACCESS Interface to Oracle in SAS Viya.

Step 4: Test SAS/ACCESS Interface to Oracle in SAS Studio

Log on to SAS Studio to load data from your Oracle DB into CAS.
Execute the following SAS Code example in SAS Studio to connect to your Oracle DB and load data into CAS. Change the parameters starting with "your_" in the SAS code below according to your Oracle DB implementation.

/************************************************************************/
/*  Start a CAS session   */
/************************************************************************/
cas;
/************************************************************************/
/*  Create a Caslib for the Oracle connection   */
/************************************************************************/
caslib ORACLE datasource=(                                           
    srctype="oracle",
    uid="your_oracle_user_ID",
    pwd="your_oracle_password",
    path="//your_db_hostname:your_db_port/your_oracle_service_name",
    schema="your_oracle_schema_name" );
 
/************************************************************************/
/*  Load a table from Oracle into CAS   */
/************************************************************************/
proc casutil;
   list files incaslib="ORACLE"; 
   load casdata="your_oracle_table_name" incaslib="ORACLE" outcaslib="casuser" casout="your_cas_table_name";                   
   list tables incaslib="casuser";
quit;
/************************************************************************/
/*  Assign caslib to SAS Studio   */
/************************************************************************/
 caslib _all_ assign;

The previous example is a simple SAS program to test access to Oracle and load data from an Oracle table into CAS memory. As a result, the program loads a table in your CAS library with data from your Oracle database.

How to configure Oracle client for successful access to Oracle databases from SAS Viya was published on SAS Users.