Cloud Computing

April 27, 2015

SAS recently performed testing using the Intel Cloud Edition for Lustre* Software - Global Support (HVM) available on AWS marketplace to determine how well a standard workload mix using SAS Grid Manager performs on AWS.  Our testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. You can find the detailed results in the technical paper, SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre.

In addition to the paper, Amazon will be publishing a post on the AWS Big Data Blog that will take a look at the approach to scaling the underlying AWS infrastructure to run SAS Grid Manager to meet the demands of SAS applications with demanding I/O requirements.  We will add the exact URL to the blog as a comment once it is published.

System design overview – network, instance sizes, topology, performance

For our testing, we set up the following AWS infrastructure to support the compute and I/O needs for these two components of the system:

  • the SAS workload that was submitted using SAS Grid Manager
  • the underlying Lustre file system required to meet the clustered file system requirement of SAS Grid Manager.

SAS Grid Manager and Lustre shared file configuration on the AWS cloud

The SAS Grid nodes in the cluster are i2.8xlarge instances.  The 8xlarge instance size provides proportionally the best network performance to shared storage of any instance size, assuming minimal EBS traffic.  The i2 instance also provides high performance local storage, which is covered in more detail in the following section.

The use of an 8xlarge size for the Lustre cluster matters less, since there is significant traffic to both EBS and the file system clients, although an 8xlarge is still preferable.  The Lustre file system has a caching strategy, and you will see higher throughput to clients when cache hits are frequent, which effectively reduces the network traffic to EBS.

Steps to maximize storage I/O performance

In addition to shared storage, SAS applications need high-speed temporary storage, and the temporary storage typically carries the most demanding load.  The high I/O instance family, I2, and the recently released dense storage instance family, D2, provide high aggregate throughput to ephemeral (local) storage.  For the SAS workload tested, the i2.8xlarge offers 6.4 TB of local SSD storage, while the d2.8xlarge offers 48 TB of local HDD storage.

Throughput testing and results

We wanted to achieve a throughput of at least 100 MB/sec/core to temporary storage, and 50-75 MB/sec/core to shared storage.  The i2.8xlarge has 16 cores (32 virtual CPUs; each virtual CPU is a hyperthread, and each core has two hyperthreads).  Testing done with lower-level tools (fio and a SAS tool) showed a throughput of about 3 GB/sec to ephemeral (temporary) storage and about 1.5 GB/sec to shared storage.  The shared storage performance does not take into account file system caching, which Lustre does well.
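To tie these figures together, here is a quick back-of-the-envelope check. The numbers come straight from the text above; the script itself is only an illustrative sketch, not part of the original testing:

```python
# Per-node throughput targets vs. measured results for an i2.8xlarge SAS Grid node.
CORES = 16  # 32 vCPUs = 16 physical cores, two hyperthreads per core

# Per-core targets from the test plan (MB/sec)
TEMP_TARGET_PER_CORE = 100
SHARED_TARGET_PER_CORE = (50, 75)

# Aggregate per-node targets
temp_target = CORES * TEMP_TARGET_PER_CORE                        # 1600 MB/sec
shared_target = tuple(CORES * t for t in SHARED_TARGET_PER_CORE)  # (800, 1200) MB/sec

# Approximate measured aggregates (MB/sec): ~3 GB/sec ephemeral, ~1.5 GB/sec shared
measured_temp, measured_shared = 3000, 1500

print(f"temporary: target {temp_target} MB/sec, "
      f"measured {measured_temp} MB/sec ({measured_temp / CORES:.1f} MB/sec/core)")
print(f"shared:    target {shared_target[0]}-{shared_target[1]} MB/sec, "
      f"measured {measured_shared} MB/sec ({measured_shared / CORES:.1f} MB/sec/core)")
```

Both measured aggregates comfortably exceed the targets: roughly 187.5 MB/sec/core to ephemeral storage and about 94 MB/sec/core to shared storage, before even accounting for Lustre's file system caching.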

This testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. For full details of the testing configuration and results, please see the SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre technical white paper.


tags: cloud computing, configuration, grid, SAS Administrators

The post Can I run SAS Grid Manager in the AWS cloud? appeared first on SAS Users.

April 2, 2015

In my last blog post, I introduced SAS Visual Analytics for UN Comtrade, which helps anyone glean answers from the most comprehensive collection of international trade data in the world. I’d like to share some of what we learned in the development process as it pertains to big data, high […]

The post Big data lessons learned from visualizing 27 years of international trade data appeared first on SAS Voices.

January 16, 2015

This is the last of my series of posts on the NIST definition of cloud computing. As you can see from this Wikipedia definition, calling anything a “cloud” is likely to be the fuzziest way of describing it.

In meteorology, a cloud is a visible mass of liquid droplets or frozen crystals made of water or various chemicals suspended in the atmosphere above the surface of a planetary body. These suspended particles are also known as aerosols and are studied in the cloud physics branch of meteorology.

Not that there is anything wrong with the label “cloud”--it’s a shortcut that allows us to quickly convey an idea. But for anything beyond that, when talking about functionality, we would be well advised to define and describe “cloud” in as much detail as possible so that all people involved have the same picture in their mind, and not whatever it is they think of when they think of “cloud”.

The NIST definitions help us narrow down features, functionality and models, but those are still only broad categories that leave certain gaps in which misunderstandings can easily sprout. I encourage you to use these definitions, but also to go further and describe cloud architectures by using terms that are as precise as possible.

In recent posts, I talked about the five characteristics of cloud, as well as the three service models. In this final installment of the series, I will discuss the four cloud deployment models.

Public cloud

The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.

Of the four deployment models, this one is the easiest to grasp. A public cloud is simply a cloud that you rent out to others. Essentially, if you build it (physically) and you charge money for others to use it, then you have a public cloud. Typical public cloud providers include Amazon Web Services (AWS), Microsoft Azure, Rackspace, and Google Cloud.

Despite the name, the public cloud resources you are renting out are not necessarily accessible to the general public.

Private cloud

The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

This deployment model is a bit more challenging to describe. The “private” in “private cloud” does not imply any more “privacy” or “security” than a public cloud would, but instead means “your own private use”. Another term you may hear is “corporate cloud”, a cloud used solely by a corporation, not by its customers.

For example, a private cloud may be set up within an organization so that different divisions have access to a shared virtual computing environment, for a variety of purposes:  development, testing, training, demos.  These are typically resources that are not accessible to anyone outside the organization.

Some organizations will decide to build an on-premises cloud using physical servers and software such as OpenStack. Others will opt instead to rent out the required resources from a cloud provider like AWS. Regardless of the choice, both of these fall under private cloud.

Private cloud environments are usually thought of as server configurations physically running on-premises. However, if the organization decides tomorrow to replace all these servers with Amazon instances running in the AWS cloud, it would still be considered a private cloud.  This is because, once again, it would be used for internal purposes, even though it is physically off-premises.

In doing research for this blog post, I came across numerous articles on the internet that seem to confuse private cloud with either “on-premises” or “secure access”. For example, many people consider AWS’ VPC offering to be a private cloud. As others have pointed out, it is not inherently a private cloud. It is a more secure way of accessing public cloud resources.

Community cloud

The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.

If, tomorrow, I decided to start my own insurance company and create a new cloud dedicated only to insurance companies, that would make it a community cloud. I could host it myself or have a public cloud provider manage the back-end for me.

There are few examples of these, and I have not worked with any of them myself. A good example of a community cloud is AWS GovCloud. It is created, hosted and managed by AWS, but it is dedicated to the branches of the US government. There is also the NYSE Capital Market Community Platform, which is essentially a financial-industry cloud.

Hybrid cloud

The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

I have to admit that this definition confuses me a bit. I may just write to the NIST to ask what they meant!

The distinction between private and public is not a distinction of location of the processing, but more of the type of processing that you do. So, to me, needing more resources and bursting to the cloud does not change the type of processing that you do.

I would expect that what makes for a hybrid cloud is the use of differing cloud technologies together. So you could have an on-premises OpenStack Cloud for baseline processing and obtain (burst to) AWS instances for peak usage. This would also mean that a hybrid cloud (made up of different cloud platforms) could then be either private, public, or community.

Parting words

The NIST definitions I have shared in this series of blog posts help us narrow down features, functionality and models so we can be more accurate when talking about the cloud.

In my opinion, they provide a solid base of understanding and general classification, but they don't go far enough along the branches of choices when it comes to cloud computing. The five characteristics, three service models and four deployment models are more than just marketing buzzwords. They are the foundation on which a detailed technical cloud architecture should be built. They are the start of the cloud discussion, not the whole discussion.

tags: cloud computing, SAS Professional Services, SAS Programmers
December 5, 2014

In a recent post, I talked about the 5 essential characteristics of cloud computing. In today’s post, I will cover the three service models available in the cloud as they are defined by the National Institute of Standards and Technology, or NIST.

Like the story of Goldilocks, when it comes to choosing service models for the cloud, there is no right or wrong.  Your choice depends on who you are and what you want. (Granted, that’s my own interpretation of the story).

The idea of this post is not to add more hype around cloud service models (if that’s even possible!) but to use simple examples to illustrate them.

Software as a Service (SaaS)

“The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.”

Do you remember when you had to install an e-mail client on your desktop? Remember Outlook Express? (shudders!). The younger readers will not believe me, but in those prehistoric times, you downloaded your e-mails to your computer. And if you were not careful, they were deleted from the server. Yes, deleted! What a concept!

But this started to change with the advent of web-based e-mail, like Hotmail and GMail. These were likely the first examples of Software-As-A-Service. And you no longer needed a desktop e-mail client.

When I explain SaaS to friends and family, I usually say that it's a lot like renting a home:

  • You pay for the use (number of months) of a house or apartment
  • If something major breaks, your landlord will be the one to fix it
  • You can bring your furniture, maybe even paint the inside of your home, but you can't remodel
  • If it turns out to be too small after a while, it's easy to leave and find a bigger place

Infrastructure as a Service (IaaS)

“The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).”

Infrastructure as a Service is also fairly straightforward to understand. Instead of renting access to an application, you rent the essential building blocks that enable you to build or run any type of application you desire. Amazon Web Services (AWS) is the archetypal example, and is the leading IaaS provider at this time. On AWS, you can rent a wide variety of servers, storage and network devices for dollars per hour. You can then assemble and build upon those to create any type of software application.

To continue with my house theme, IaaS would be like renting out the tools necessary to build your house and renting the land on which it stands. So, you could make your house twice as big when you have family over, but the price would go up. You could also decide to build hundreds of apartments, and rent them out to others, therefore making a profit.

Platform as a Service (PaaS)

“The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.”

I kept this one for last because it’s the hardest one to explain, and in my opinion, the one that is the least clearly defined at this point. It sits between IaaS and SaaS, but where exactly will depend on who you talk to. Essentially, we want more flexibility than SaaS, but less grunt work than IaaS. The examples I use when talking about this are usually the Apple App Store, Google App Engine, or similar platforms.

To continue stretching my house analogy to its limits, PaaS is a little like those new 3-D printed houses. It allows for quick building of a house. You can have any shape you want, within reason, but you cannot choose what materials it’s made of. You offer the service of building the house, and the customer can provide the blueprints for it. Or you provide standard modules like bathroom and kitchen, which can then be assembled like LEGO blocks.

Which service model is right for SAS users?

It is unlikely that SAS Institute will become an IaaS provider. We do, however, leverage IaaS providers to host SAS solutions in regions where we do not have a SAS-owned data center.  As hosted solutions become more popular with customers, this trend will likely increase.

SAS Visual Analytics for SAS Cloud is a recent SaaS offering. Over time, SAS will start offering the SaaS option for more products and solutions.

The advent of new application packaging and container technologies offers opportunities for SAS to provide users with PaaS options that will enable them to build their own SAS-based applications.

In my opinion, it is likely that this will be the “not too hot, not too cold” spot for us, as it will enable us to continue providing the great analytics tools that we always have, and let our customers put these tools together into applications that fit their own specific needs.

Parting words

I hope that you found this interesting and that it gave you a better understanding of these terms. If you want to read further on these topics, I really like these two other ways of describing the three service models: the "host/build/consume" shortcut, and "Pizza-As-A-Service", the other PaaS.

If you have comments, concerns or feedback, please share in the comment box below.

tags: cloud computing, SAS Professional Services, SAS Programmers
December 3, 2014
In 2012, we presented a post showing how to run RStudio in the cloud on an Amazon server. There were 7 steps, including one with 7 sub-steps, one of which had 6 sub-sub-steps. It was still pretty easy, for what it was-- an effectively free computer in the cloud to run R on.

Today, we show the modern-- 3 years later!-- way to get the same result, only this approach is much easier, and the resulting installation includes all the best goodies of RStudio, including Markdown -> PDF and Hadley Wickham's packages pre-installed. Update, 2016: Digital Ocean has changed its set-up slightly. Check out the first step or two of this post in place of the first two steps below, if you're just starting out.

The approach builds on Docker, an infrastructure that saves start-up time and overhead, as well as efforts led by Dirk Eddelbuettel and Carl Boettiger to develop a Docker application of R. This project is called Rocker, and interested readers are encouraged to read the details. But if you want to just get up and running, here are the simple steps to get going.

1. Go to Digital Ocean and sign up for an account. By using this link, you will get a $10 credit. (Full disclosure: Ken will also get a $25 credit once you spend $25 real dollars there.) The reason to use this provider is that they have a system ready to run with Docker already built in. In addition, their prices are quite reasonable. You will need to use a credit card or PayPal to activate your account, but you can play for a long time with your $10 credit-- the cheapest machine is $.007 per hour, up to a $5 per month maximum.

2. On your Digital Ocean page, click "Create droplet". Then choose an (arbitrary) name, a size (meaning cost/power) of machine, and the region closest to you. You can ignore the settings. Under "Select Image", choose the "Applications" tab and select "Docker (1.3.2 on 14.04)". (The numbers in the parentheses are the Docker and Ubuntu version, and might change over time.) Then click "Create Droplet" at the bottom of the page.

3. It takes about a minute for the machine to start up. When it's ready, click the "Console Access" button. This opens a text terminal to your Ubuntu machine, inside your web page. Press enter to get a prompt, and log in (your username is root) using the password that was sent to your e-mail. You'll have to change the password.

4a. To start a terminal session of R, type
docker run --rm -ti rocker/r-base
You should see a bunch of messages about pulling and downloading, but eventually you will get the ">" prompt-- you can do R in here, but who would want to?

4b. To get RStudio server running, type
docker run -d -p 8787:8787 rocker/rstudio
But this is really not where you want to be. Instead, run the following command, to get a set-up that includes more useful packages installed in and with R.
docker run -d -p 8787:8787 rocker/hadleyverse

5. Use it! The IP address of your server is displayed below the terminal where you typed in your docker command. Open a new browser tab and go to the address http://(ip address):8787. You'll see the RStudio login screen, and can enter "rstudio" (without the quotes) as both the username and password. The system is well tuned enough that you can open a new file --> markdown --> PDF and immediately click "Knit PDF", and see the example document beautifully presented back to you in moments.

That's it. It's still way cooler than sliced bread. Let us know if you try it, and if you run into any trouble. Oh, and if you're feeling creeped out by the standard username and password in your RStudio, you can set them up from your docker command as follows.
docker run -d -p 8787:8787 -e USER=ken -e PASSWORD=ken rocker/hadleyverse
Other customization details and further information can be found on this Rocker page.

I should perhaps have noted that what you are running here is in fact RStudio Server, and that you can allow additional users on your RStudio using instructions found here.

An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.
November 7, 2014

If someone asks you whether SAS runs in the cloud, there are exactly two wrong answers: "yes" and "no". Instead, this question should spark a discussion. It should be a discussion about which of the five characteristics of cloud computing they are interested in. The answers will point you in one direction or another.

If you have seen any SAS presentations about the cloud, you probably have seen this diagram. This terminology comes to us from the National Institute of Standards and Technology, and more precisely, from The NIST Definition of Cloud Computing, which has been adopted by SAS.

“Why do we need definitions and standards?” some of you may ask.  If ever there was a term loaded with hype and misunderstanding, it is “cloud”. The NIST provides us with a common language to talk about it.

In today’s installment of this three-part series, I will discuss the five essential characteristics of cloud computing from NIST, and try to illustrate them with simple examples applied to situations most would be familiar with.

On-demand self-service

“A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.”

Do you remember that day long ago when you signed up for your first online e-mail account? (mine was "Hotmail", back in 1999).  Did you have to talk to an employee of the e-mail provider to obtain the account?  No, you did not. (Ok, if you did, then you probably have an older account than mine!). That’s the idea we are getting at here. This applies not just to services, like e-mail, but also to the raw computing resources that underpin software solutions: servers, CPUs, databases, etc.

Broad network access

“Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).”

To stay with my e-mail analogy, I can access my e-mail account from anywhere in the world, using any computer or using a smart phone, tablet or mobile device. I can walk into a cyber café in New Zealand and quickly access those resources. That’s what broad network access means: any device, anywhere.

Resource pooling

“The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.”

I am pretty sure that GMail does not hold my e-mails in a specific server in a specific location. They do not want to buy a new blade each time someone signs up. Instead, my data is likely scattered across many machines, and these machines also hold the e-mails of many others like me.

Sharing resources is what pooling is all about. Just like the carpool lane you may have used to come to the office this morning, it makes more sense to squeeze four people in a car, rather than to have four cars at 25% capacity each. Please tell your carpool buddies tomorrow that your car is now considered to be a “multi-tenant vehicle”. They’ll love it.

Rapid elasticity

“Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.”

In the era of binge-watching, Netflix knows that the releases of series like Orange Is The New Black or House of Cards will mean a huge upsurge in their viewership. Since all the Netflix content is delivered via Amazon Web Services, when the demand for shows grows, the quantity of Amazon resources used in the background grows as well. Then, as soon as everyone goes to bed after a 12-hour TV marathon and the demand drops, so does the supply.

Measured service

“Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.”

This one is an interesting concept because of its dual nature. On the one hand, Amazon will scale the services it offers Netflix simply because it is measuring the increasing demand and knows it will run out unless it does so.  But on the other hand, it also measures exactly how much service it’s providing Netflix, so that it can be sure to send them a detailed (and likely quite large) monthly bill.  In a way, very much like the electric companies do.

Parting words

Also, and this is only my personal opinion, standards are good. Well, I don’t necessarily agree with all standards, but I like that they exist. One can disagree with a standard. And reluctantly follow it anyways. Or deviate from a standard - given a good, justifiable reason.

In my mind, the opposite of a standard is a “free-for-all”. And that is never a good starting point for discussions.

Of course, I also share xkcd’s wariness against new standards.

Thoughts, comments, questions, better examples, disagreement on standards?

tags: cloud computing, SAS Administrators, SAS Professional Services, SAS Programmers
October 5, 2012
I vividly remember the process of changing TV channels when I was a kid.  Turn the channel select knob, turn a dial to rotate the antenna, then make fine adjustments to tune in the channel.  This process usually resulted in an adequate picture, but it required considerable effort, the quality [...]
October 4, 2012
Among the volumes of information about SAS we were showered with during our new SAS employee orientation, two seemingly insignificant tidbits stuck in my mind. SAS has dedicated about 12 acres of its campus grounds to a state-of-the-art solar farm. SAS prefers employment to outsourcing. Thus, the people we would [...]