Cloud Computing

5月 222018

SAS ViyaSAS Viya Presentations is our latest extension of the SAS Platform and interoperable with SAS® 9.4. Designed to enable analytics to the enterprise, it seamlessly scales for data of any size, type, speed and complexity. It was also a star at this year’s SAS Global Forum 2018. In this series of articles, we will review several of the most interesting SAS Viya talks from the event. Our first installment reviews Hadley Christoffels’ talk, A Need For Speed: Loading Data via the Cloud.

You can read all the articles in this series or check out the individual interviews by clicking on the titles below:
Part 1: Technology that gets the most from the Cloud.

Technology that gets the most from the Cloud

Few would argue about the value the effective use of data can bring an organization. Advancements in analytics, particularly in areas like artificial intelligence and machine learning, allow organizations to analyze more complex data and deliver faster, more accurate results.

However, in his SAS Global Forum 2018 paper, A Need For Speed: Loading Data via the Cloud, Hadley Christoffels, CEO of Boemska, reminded the audience that 80% of an analyst’s time is still spent on the data. Getting insight from your data is where the magic happens, but the real value of powerful analytical methods like artificial intelligence and machine learning can only be realized when “you shorten the load cycle the quicker you get to value.”

Data Management is critical and still the most common area of investment in analytical software, making data management a primary responsibility of today’s data scientist. “Before you can get to any value the data has to be collected, has to be transformed, has to be enriched, has to be cleansed and has to be loaded before it can be consumed.”

Benefits of cloud adoption

The cloud can help, to a degree. According to Christoffels, “cloud adoption has become a strategic imperative for enterprises.” The advantages of moving to a cloud architecture are many, but the two greatest are elasticity and scalability.

Elasticity, defined by Christoffels, allows you to dynamically provision or remove virtual machines (VM), while scalability refers to increasing or decreasing capacity within existing infrastructure by scaling vertically, moving the workload to a bigger or smaller VM, or horizontally, by provisioning additional VM’s and distributing the application load between them.

“I can stand up VMs in a matter of seconds, I can add more servers when I need it, I can get a bigger one when I need it and a smaller one when I don’t, but, especially when it comes to horizontal scaling, you need technology that can make the most of it.” Cloud-readiness and multi-threaded processing make SAS® Viya® the perfect tool to take advantage of the benefits of “clouding up.”

SAS® Viya® can addresses complex analytical challenges and speed up data management processes. “If you have software that can only run on a single instance, then scaling horizontally means nothing to you because you can’t make use of that multi-threaded, parallel environment. SAS Viya is one of those technologies,” Christoffels said.

Challenges you need to consider

According to Christoffels, it’s important, when moving your processing to the cloud, that you understand and address existing performance challenges and whether it will meet your business needs in an agile manner. Inefficiencies on-premise are annoying; inefficiencies in the cloud are annoying and costly, since you pay for that resource.

It’s not the best use of the architecture to take what you have on premise and just shift it. “Finding and improving and eliminating inefficiencies is a massive part in cutting down the time data takes to load.”

Boemska, Christoffels’ company, has tools to help businesses find inefficiencies and understand the impact users have on the environment, including:

  1. Real-time diagnostics looking at CPU Usage, Memory Usage, SAS Workload, etc.
  2. Insight and comparison provides a historic view in a certain timeframe, essential when trying to optimize and shave off costly time when working in cloud.
  3. Utilization reports to better understand how the platform is used.

Optimizing inefficiencies with SAS Viya

But scaling vertically and horizontally from cloud-based infrastructure to speed the loading and data management process solves only part of the problem. Christoffels said SAS Viya capabilities completes the picture. SAS Viya offers a number of benefits in a Cloud infrastructure, Christoffels said. Code amendments that make use of the new techniques and benefits now available in SAS Viya, such as the multi-threaded DATA step or CAS Action Sets, can be extremely powerful.

One simple example of the benefits of SAS Viya, Christoffels said, is that with in-memory processing, PROC SORT is a procedure that’s no longer needed; SAS Viya does “grouping on the fly,” meaning you can remove sort routines from existing programs, which of itself, can cut down processing time significantly.

As a SAS Programmer, just the fact that SAS Viya can run multithreaded, the fact that you don’t have to do these sorts, the way it handles grouping on the fly, the fact that multithreaded nature and capability is built into how you deal with tables are all “significant,” according to Christoffels.


Data preparation and load processes have a direct impact on how applications can begin and subsequently complete. Many organizations are using the Cloud platform to speed up the process, but to take full advantage of the infrastructure you have to apply the right software technology. SAS Viya enables the full realization of Cloud benefits through performance improvements, such as the transposing of data and the transformation of data using the DATA step or CAS Action Sets.

Additional Resources

SAS Global Forum Video: A Need For Speed: Loading Data via the Cloud
SAS Global Forum 2018 Paper: A Need For Speed: Loading Data via the Cloud
SAS Viya
SAS Viya Products

Read all the posts in this series.

Part 1: Technology that gets the most from the Cloud

Technology that gets the most from the Cloud was published on SAS Users.

9月 092016

If you’re doing data processing in the cloud or using container-enabled infrastructures to deploy your software, you’ll want to learn more about SAS Analytics for Containers. This new solution puts SAS into your existing container-enabled environment – think Docker or Kubernetes – giving data scientists and analysts the ability to perform sophisticated analyses using SAS, and all from the cloud.

The product’s coming out party is the Analytics Experience 2016 conference, September 12-14, 2016 at the Bellagio in Las Vegas. In advance of that event, I sat down with Product Manager Donna DeCapite to learn a little more about SAS Analytics for Containers and find out why it’s such a big deal for organizations who use containers for their applications.

Larry LaRusso: Before we get into details around the solution, and with apologies for my ignorance, let me start out with a really basic question. What are containers?

Donna DeCapite: Cloud containers are all the rage in the IT world. They’re an alternative to virtual machines.  They allow applications and any of its dependencies to be deployed and run in isolated space. Organizations will build and deploy in the container environment because it allows you to build only the necessary system libraries and functions to run a given piece of software. IT prefer it because it’s easy to replicate, and it’s faster and easier to deploy.

LL: And SAS Analytics for Containers will allow organizations to run SAS’ analytics in this environment, the containers?

DD: In short, yes. SAS Analytics for Containers provides a powerful set of data access, analysis and graphical tools to organizations within a container-based infrastructure like Docker. This takes advantage of the build once, run anywhere flexibility of the container environment, making it easier and faster to use SAS Analytics in the cloud.

LL: Who, within an organization, will be the primary users?

DC: Really anyone working with containers in the cloud or anyone working with DevOps. Data scientists will embrace SAS Analytics for Containers because it allows them to access data from nearly any source and easily perform sophisticated analyses using SAS Studio, our browser-based interface, or Jupyter Notebook, an open source notebook-style interface. For SAS developers, the product allows them to quickly provision IT resource to sandbox development ideas. And as I mentioned earlier, IT will appreciate the ease with which applications can be deployed, distributed and managed.

LL: You mentioned data scientists, so now I’m curious; we’re talking complex analyses here, yes?

DC: Definitely. Regressions, decision trees, Bayesian analysis, spatial point pattern analysis, missing data analysis, and many additional statistical analysis techniques can be performed with SAS Analytics for Containers. And in addition to sophisticated statistical and predictive analytics, there are a ton of prebuilt SAS procedures included to handle common tasks like data manipulation, information storage, and report writing, all available via SAS Studio and its assistive nature.

LL: What about organizations that have massive amounts of data? Can they use SAS Analytics for Containers as well?

DC: Yes, SAS Analytics for Containers allows you to take advantage of the processing power of your Hadoop Cluster by leveraging the SAS Accelerators for Hadoop like Code and Scoring Accelerator.

LL: Thank you so much for your time Donna. I know you’ve educated me quite a deal! How about more information. Where can individuals learn more about SAS Analytics for Containers?

DC: If you’re going to Analytics Experience 2016, consider coming to Michael Ames’ table talk, Cloud Computing: How Does It Work? It’s scheduled for Monday, September 12 from 4:30-5:00pm Vegas time. If you’re not going to the event, the SAS website is the best place for more information. Probably the best place to start is the SAS Analytics for Containers home page.

tags: analytics conference, analytics experience, cloud computing, SAS Analytics for Containers, sas studio

Analytics in the Cloud gets a whole lot easier with SAS Analytics for Containers was published on SAS Users.

1月 182016
I love GitHub for version control and collaboration, though I'm no master of it. And the tools for integrating git and GitHub with RStudio are just amazing boons to productivity.

Unfortunately, my University-supplied computer does not play well with GitHub. Various directories are locked down, and I can't push or pull to GitHub directly from RStudio. I can't even use install_github() from the devtools package, which is needed for loading Shiny applications up to I lived with this for a bit, using git from the desktop and rsconnect from a home computer. But what a PIA.

Then I remembered I know how to put RStudio in the cloud-- why not install R there, and make that be my GitHub solution?

It works great. The steps are below. In setting it up, I discovered that Digital Ocean has changed their set-up a little bit, so I update the earlier post as well.

1. Go to Digital Ocean and sign up for an account. By using this link, you will get a $10 credit. (Full disclosure: I will also get a $25 credit once you spend $25 real dollars there.) The reason to use this provider is that they have a system ready to run with Docker already built in, which makes it easy. In addition, their prices are quite reasonable. You will need to use a credit card or PayPal to activate your account, but you can play for a long time with your $10 credit-- the cheapest machine is $.007 per hour, up to a $5 per month maximum.

2. On your Digital Ocean page, click "Create droplet". Click on "One-click Apps" and select "Docker (1.9.1 on 14.04)". (The numbers in the parentheses are the Docker and Ubuntu version, and might change over time.) Then a size (meaning cost/power) of machine and the region closest to you. You can ignore the settings. Give your new computer an arbitrary name. Then click "Create Droplet" at the bottom of the page.

3. It takes a few seconds for the droplet to spin up. Then you should see your droplet dashboard. If not, click "Droplets" from the top bar. Under "More", click "Access Console". This brings up a virtual terminal to your cloud computer. Log in (your username is root) using the password that digital ocean sent you when the droplet spun up.

4. Start your RStudio container by typing: docker run -d -p 8787:8787 -e ROOT=TRUE rocker/hadleyverse

You can replace hadleyverse with rstudio if you like, for a quicker first-time installation, but many R users will want enough of Hadley Wickham's packages that it makes sense to install this version. The -e ROOT=TRUE is crucial for our approach to installing git into the container, but see the comment below from Petr Simicek below for another way to do the same thing.

5. Log in to your Cloud-based RStudio. Find the IP address of your cloud computer on the droplet dashboard, and append :8787 to it, and just put it into your browser. For example: Log in as user rstudio with password rstudio.

6. Install git, inside the Docker container. Inside RStudio, click Tools -> Shell.... Note: you have to use this shell, it's not the same as using the droplet terminal. Type: sudo apt-get update and then sudo apt-get install git-core to install git.

git likes to know who you are. To set git up, from the same shell prompt, type git config --global "Your Handle" and git config --global ""

7. Close the shell, and in RStudio, set things up to work with GitHub: Go to Tools -> Global Options -> Git/SVN. Click on create RSA key. You don't need a name for it. Create it, close the window, then view it and copy it.

8. Open GitHub, go to your Profile, click "Edit Profile", "SSH keys". Click "Add key", and just paste in the stuff you copied from RStudio in the previous step.

You're done! To clone an existing repos from Github to your cloud machine, open a new project in RStudio, and select Version Control, then Git, and paste in the URL name that GitHub provides. Then work away!

An unrelated note about aggregators:We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, other than as mentioned above, the aggregator is violating the terms by which we publish our work.
4月 272015

SAS recently performed testing using the Intel Cloud Edition for Lustre* Software - Global Support (HVM) available on AWS marketplace to determine how well a standard workload mix using SAS Grid Manager performs on AWS.  Our testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. You can find the detailed results in the technical paper, SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre.

In addition to the paper, Amazon will be publishing a post on the AWS Big Data Blog that will take a look at the approach to scaling the underlying AWS infrastructure to run SAS Grid Manager to meet the demands of SAS applications with demanding I/O requirements.  We will add the exact URL to the blog as a comment once it is published.

System design overview – network, instance sizes, topology, performance

For our testing, we set up the following AWS infrastructure to support the compute and IO needs for these two components of the system:

  • the SAS workload that was submitted using SAS Grid Manager
  • the underlying Lustre file system required to meet the clustered file system requirement of SAS Grid Manager.

SAS Grid Manager and Lustre shared file configuration on AWS clour

The SAS Grid nodes in the cluster are i2.8xlarge instances.  The 8xlarge instance size provides proportionally the best network performance to shared storage of any instance size, assuming minimal EBS traffic.  The i2 instance also provides high performance local storage, which is covered in more detail in the following section.

The use of an 8xlarge size for the Lustre cluster is less impactful since there is significant traffic to both EBS and the file system clients, although an 8xlarge is still is more optimal.  The Lustre file system has a caching strategy, and you will see higher throughput to clients in the case of frequent cache hits which effectively reduces the network traffic to EBS.

Steps to maximize storage I/O performance

The shared storage for SAS applications needs to be high speed temporary storage.  Typically temporary storage has the most demanding load.  The high I/O instance family, I2, and the recently released dense storage instance, D2, provide high aggregate throughput to ephemeral (local) storage.  For the SAS workload tested, the i2.8xlarge has 6.4 TB of local SSD storage, while the D2 has 48 TB of HDD.

Throughput testing and results

We wanted to achieve a throughput of least 100 MB/sec/core to temporary storage, and 50-75 MB/sec/core to shared storage.  The i2.8xlarge has 16 cores (32 virtual CPUs, each virtual CPU is a hyperthread on a core, and a core has two hyperthreads).  Testing done with lower level testing tools (fio and a SAS tool,  showed a throughput of about 3 GB/sec to ephemeral (temporary) storage and about 1.5 GB/sec to shared storage.  The shared storage performance does not take into account file system caching, which Lustre does well.

This testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. For full details of the testing configuration and results, please see the SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre technical white paper.


tags: cloud computing, configuration, grid, SAS Administrators

The post Can I run SAS Grid Manager in the AWS cloud? appeared first on SAS Users.

4月 022015

In my last blog post, I introduced SAS Visual Analytics for UN Comtrade, which helps anyone glean answers from the most comprehensive collection of international trade data in the world. I’d like to share some of what we learned in the development process as it pertains to big data, high […]

The post Big data lessons learned from visualizing 27 years of international trade data appeared first on SAS Voices.

1月 162015

cloud4modelsThis is the last of my series of posts on the NIST definition of cloud computing. As you can see from this Wikipedia definition, calling anything a “cloud” is likely to be the fuzziest way of describing it.

In meteorology, a cloud is a visible mass of liquid droplets or frozen crystals made of water or various chemicals suspended in the atmosphere above the surface of a planetary body. These suspended particles are also known as aerosols and are studied in the cloud physics branch of meteorology.

Not that there is anything wrong with the label “cloud”--it’s a shortcut that allows us to quickly convey an idea. But for anything beyond that, when talking about functionality, we would be well advised to define and describe “cloud” in as much detail as possible so that all people involved have the same picture in their mind, and not whatever it is they think of when they think of “cloud”.

The NIST definitions help us narrow down features, functionality and models, but those are still only broad categories that leave certain gaps in which misunderstandings can easily sprout. I encourage you to use these definitions, but also to go further and describe cloud architectures by using terms that are as precise as possible.

In recent posts, I talked about the five characteristics of cloud, as well as the three service models. In this final installment of the series, I will discuss the four cloud deployment models.

Public cloud

The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.

Of the four deployment models, this one is the easiest to grasp. A public cloud is simply a cloud that you rent out to others. Essentially, if you build it (physically) and you charge money for others to use it, then you have public cloud. The typical providers of public clouds include Amazon Web Services (AWS), Microsoft Azure, Rackspace, Google Cloud.

Despite the name, the public cloud resources you are renting out are not necessarily accessible to the general public.

Private cloud

The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

This deployment model is a bit more challenging to describe. The “private” in “private cloud” does not imply any more “privacy” or “security” than a public cloud would, but instead means “your own private use”. Another term you may hear is “corporate cloud”, a cloud used solely by a corporation, not by its customers.

For example, a private cloud may be set up within an organization so that different divisions have access to a shared virtual computing environment, for a variety of purposes:  development, testing, training, demos.  These are typically resources that are not accessible to anyone outside the organization.

Some organizations will decide to build an on-premises cloud using physical servers and software such as OpenStack. Others will opt instead to rent out the required resources from a cloud provider like AWS. Regardless of the choice, both of these fall under private cloud.

Private cloud environments are usually thought of as server configurations physically running on-premises. However, if the organization decides tomorrow to replace all these servers with Amazon instances running in the AWS cloud, it would still be considered a private cloud.  This is because, once again, it would be used for internal purposes, even though it is physically off-premises.

In doing research for this blog post, I came across numerous articles on the internet that seem to confuse private cloud with either “on-premises” or “secure access”. For example, many people consider AWS’ VPC offering to be a private cloud. As others have pointed out, it is not inherently a private cloud. It is a more secure way of accessing public cloud resources.

Community cloud

The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.

If tomorrow, I decided to start my own insurance company and create a new public cloud dedicated to only insurance companies, this would make it a community cloud. I could host it myself or have another public cloud provider manage the back-end for me.

There are few examples of these, and I have not worked with any of them myself. A good example of community cloud is GovCloud. It is created, hosted and managed by AWS. But it is addressed to all the branches of the US government. There is also the NYSE Capital Market Community Platform, which is sort of a financial-industry cloud.

Hybrid cloud

The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

I have to admit that this definition confuses me a bit. I may just write to the NIST to ask what they meant!

The distinction between private and public is not a distinction of location of the processing, but more of the type of processing that you do. So, to me, needing more resources and bursting to the cloud does not change the type of processing that you do.

I would expect that what makes for a hybrid cloud is the use of differing cloud technologies together. So you could have an on-premises OpenStack Cloud for baseline processing and obtain (burst to) AWS instances for peak usage. This would also mean that a hybrid cloud (made up of different cloud platforms) could then be either private, public, or community.

Parting words

The NIST definitions I have shared in this series of blog posts help us narrow down features, functionality and models so we can be more accurate when talking about the cloud.

In my opinion, they provide a solid base of understanding, and general classification, but they also don't go far enough along the branches of choices when it comes to cloud computing. The five characteristics, three service models and four deployment models are more than just marketing buzzwords. They are the foundation on which the detailed technical cloud architecture should be built. They are the start of the cloud discussion, they are not the whole discussion.

tags: cloud computing, SAS Professional Services, SAS Programmers
12月 052014

cloud3svcsIn a recent post, I talked about the 5 essential characteristics of cloud computing. In today’s post, I will cover the three service models available in the cloud as they are defined by the National Institute of Standards and Technology, or NIST.

Like the story of Goldilocks, when it comes to choosing service models for the cloud, there is no right or wrong.  Your choice depends who you are and what you want. (Granted, that’s my own interpretation of the story).

The idea of this post is not to add more hype around cloud service models (if that’s even possible!) but to use simple examples to illustrate them.

Software as a Service (SaaS)

“The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.”

Do you remember when you had to install an e-mail client on your desktop? Remember Outlook Express? (shudders!). The younger readers will not believe me, but in those prehistoric times, you downloaded your e-mails to your computer. And if you were not careful, they were deleted from the server. Yes, deleted! What a concept!

But this started to change with the advent of web-based e-mail, like Hotmail and GMail. These were likely the first examples of Software-As-A-Service. And you no longer needed a desktop e-mail client.

When I explain SaaS to friends and family, I usually say that it's a lot like renting a home:

  • You pay for the use (number of month) of a house or apartment
  • If something major breaks, your landlord will be the one to fix it
  • You can bring your furniture, may be even paint the inside of your home, but you can't re-model
  • If it turns out to be too small after a while, it's easy to leave and find a bigger place

Infrastructure as a Service (IaaS)

“The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).”

Infrastructure as a Service is also fairly straightforward to understand. Instead of renting out access to an application, you rent out the essential building blocks that enable you to build or run any type of application you desire. Amazon Web Services (AWS) is the archetypal example, and is the leading IaaS provider at this time. On AWS, you can rent a wide variety of servers, storage or network devices for dollars per hour. You can then assemble and build upon those to create any type of software application.

To continue with my house theme, IaaS would be like renting out the tools necessary to build your house and renting the land on which it stands. So, you could make your house twice as big when you have family over, but the price would go up. You could also decide to build hundreds of apartments, and rent them out to others, therefore making a profit.

Platform as a Service (PaaS)

“The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.”

I kept this one for last because it’s the hardest one to explain, and in my opinion, the one that is the least clearly defined at this point. It sits between IaaS and SaaS, but where exactly will depend on who you talk to. Essentially, we want more flexibility than SaaS, but less grunt work than IaaS. The examples I use when talking about this are usually the Apple App Store, Google App Engine or the platforms.

To continue stretching my house analogy to its limits, PaaS is a little like those new 3-D printed houses. It allows for quick building of a house. You can have any shape you want, within reason, but you cannot choose what materials it’s made of. You offer the service of building the house, and the customer can provide the blueprints for it. Or you provide standard modules like bathroom and kitchen, which can then be assembled like LEGO blocks.

Which service model is right for SAS users?

It is unlikely that SAS Institute will become an IaaS provider. We do, however, leverage IaaS providers to host SAS solutions in regions where we do not have a SAS-owned data center.  As hosted solutions become more popular with customers, this trend will likely increase.

SAS Visual Analytics for SAS Cloud is a recent SaaS offering. Over time, SAS will start offering the SaaS option for more products and solutions.

The advent of new application packaging and container technologies offer opportunities for SAS to provide users with PaaS options that will enable them to build their own SAS-based applications

In my opinion, it is likely that this will be the “not too hot, not too cold” spot for us, as it will enable us to continue providing the great analytics tools that we always have, and let our customers put these tools together into applications that fit their own specific needs.

Parting words

I hope that you found this interesting and that it gave you a better understanding of these terms. If you want to read further on these topics, I really like these two other ways of describing the three services models: the "host/build/consume" shortcut, and "Pizza-As-A-Service", the other PaaS.

If you have comments, concerns or feedback, please share in the comment box below.

tags: cloud computing, SAS Professional Services, SAS Programmers