Cloud Computing

September 9, 2016
 

If you’re doing data processing in the cloud or using container-enabled infrastructures to deploy your software, you’ll want to learn more about SAS Analytics for Containers. This new solution puts SAS into your existing container-enabled environment – think Docker or Kubernetes – giving data scientists and analysts the ability to perform sophisticated analyses using SAS, and all from the cloud.

The product’s coming-out party is the Analytics Experience 2016 conference, September 12-14, 2016 at the Bellagio in Las Vegas. In advance of that event, I sat down with Product Manager Donna DeCapite to learn a little more about SAS Analytics for Containers and find out why it’s such a big deal for organizations that use containers for their applications.

Larry LaRusso: Before we get into details around the solution, and with apologies for my ignorance, let me start out with a really basic question. What are containers?

Donna DeCapite: Cloud containers are all the rage in the IT world. They’re an alternative to virtual machines. They allow an application and all of its dependencies to be deployed and run in an isolated space. Organizations build and deploy in a container environment because it lets them package only the system libraries and functions needed to run a given piece of software. IT teams prefer containers because they’re easy to replicate, and faster and easier to deploy.
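To make that concrete, here’s a minimal illustration using the Docker command line (the image name is just an example): a single command starts an isolated environment that carries only what its image packaged, rather than a full virtual machine.

docker run --rm -it ubuntu:14.04 /bin/bash   # start an isolated shell; --rm cleans up the container on exit

Inside that shell, the application sees its own file system and libraries, isolated from the host.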

LL: And SAS Analytics for Containers will allow organizations to run SAS’ analytics in this environment, the containers?

DD: In short, yes. SAS Analytics for Containers provides a powerful set of data access, analysis and graphical tools to organizations within a container-based infrastructure like Docker. This takes advantage of the build once, run anywhere flexibility of the container environment, making it easier and faster to use SAS Analytics in the cloud.

LL: Who, within an organization, will be the primary users?

DD: Really anyone working with containers in the cloud or anyone working with DevOps. Data scientists will embrace SAS Analytics for Containers because it lets them access data from nearly any source and easily perform sophisticated analyses using SAS Studio, our browser-based interface, or Jupyter Notebook, an open-source notebook-style interface. For SAS developers, the product makes it quick to provision IT resources to sandbox development ideas. And as I mentioned earlier, IT will appreciate the ease with which applications can be deployed, distributed and managed.

LL: You mentioned data scientists, so now I’m curious; we’re talking complex analyses here, yes?

DD: Definitely. Regressions, decision trees, Bayesian analysis, spatial point pattern analysis, missing data analysis, and many other statistical techniques can be performed with SAS Analytics for Containers. And beyond the sophisticated statistical and predictive analytics, a ton of prebuilt SAS procedures are included to handle common tasks like data manipulation, information storage, and report writing, all available through SAS Studio’s assistive interface.

LL: What about organizations that have massive amounts of data? Can they use SAS Analytics for Containers as well?

DD: Yes, SAS Analytics for Containers allows you to take advantage of the processing power of your Hadoop cluster by leveraging the SAS accelerators for Hadoop, such as the Code Accelerator and the Scoring Accelerator.

LL: Thank you so much for your time, Donna. You’ve certainly educated me quite a bit! Where can individuals learn more about SAS Analytics for Containers?

DD: If you’re going to Analytics Experience 2016, consider coming to Michael Ames’ table talk, Cloud Computing: How Does It Work? It’s scheduled for Monday, September 12 from 4:30-5:00pm Vegas time. If you’re not going to the event, the SAS website has more information; the best place to start is the SAS Analytics for Containers home page.

tags: analytics conference, analytics experience, cloud computing, SAS Analytics for Containers, sas studio

Analytics in the Cloud gets a whole lot easier with SAS Analytics for Containers was published on SAS Users.

January 18, 2016
 
I love GitHub for version control and collaboration, though I'm no master of it. And the tools for integrating git and GitHub with RStudio are just amazing boons to productivity.

Unfortunately, my University-supplied computer does not play well with GitHub. Various directories are locked down, and I can't push or pull to GitHub directly from RStudio. I can't even use install_github() from the devtools package, which is needed for loading Shiny applications up to Shinyapps.io. I lived with this for a bit, using git from the desktop and rsconnect from a home computer. But what a PIA.

Then I remembered I know how to put RStudio in the cloud-- why not install R there, and make that my GitHub solution?

It works great. The steps are below. In setting it up, I discovered that Digital Ocean has changed their set-up a little bit, so I updated the earlier post as well.

1. Go to Digital Ocean and sign up for an account. By using this link, you will get a $10 credit. (Full disclosure: I will also get a $25 credit once you spend $25 real dollars there.) The reason to use this provider is that they have a system ready to run with Docker already built in, which makes it easy. In addition, their prices are quite reasonable. You will need to use a credit card or PayPal to activate your account, but you can play for a long time with your $10 credit-- the cheapest machine is $.007 per hour, up to a $5 per month maximum.

2. On your Digital Ocean page, click "Create droplet". Click on "One-click Apps" and select "Docker (1.9.1 on 14.04)". (The numbers in the parentheses are the Docker and Ubuntu versions, and might change over time.) Then choose a size (meaning cost/power) of machine and the region closest to you. You can ignore the other settings. Give your new computer an arbitrary name. Then click "Create Droplet" at the bottom of the page.

3. It takes a few seconds for the droplet to spin up. Then you should see your droplet dashboard; if not, click "Droplets" from the top bar. Under "More", click "Access Console". This brings up a virtual terminal to your cloud computer. Log in (your username is root) using the password that Digital Ocean sent you when the droplet spun up.

4. Start your RStudio container by typing: docker run -d -p 8787:8787 -e ROOT=TRUE rocker/hadleyverse

You can replace hadleyverse with rstudio if you like, for a quicker first-time installation, but many R users will want enough of Hadley Wickham's packages that it makes sense to install this version. The -e ROOT=TRUE is crucial for our approach to installing git into the container, but see the comment from Petr Simicek below for another way to do the same thing.
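For reference, that lighter-weight variant is:

docker run -d -p 8787:8787 -e ROOT=TRUE rocker/rstudio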

5. Log in to your Cloud-based RStudio. Find the IP address of your cloud computer on the droplet dashboard, and append :8787 to it, and just put it into your browser. For example: http://135.104.92.185:8787. Log in as user rstudio with password rstudio.

6. Install git inside the Docker container. Inside RStudio, click Tools -> Shell.... Note: you have to use this shell; it's not the same as the droplet terminal. Type sudo apt-get update and then sudo apt-get install git-core to install git.

git likes to know who you are. To set git up, from the same shell prompt, type git config --global user.name "Your Handle" and git config --global user.email "an.email@somewhere.edu"
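Put together, the whole sequence in the RStudio shell is:

sudo apt-get update
sudo apt-get install git-core
git config --global user.name "Your Handle"
git config --global user.email "an.email@somewhere.edu"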

7. Close the shell, and in RStudio, set things up to work with GitHub: go to Tools -> Global Options -> Git/SVN. Click "Create RSA key". You don't need a name for it. Create it, close the window, then view it and copy it.
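(If you'd rather generate the key from the shell than from the RStudio dialog, the equivalent is roughly the following, assuming the default key location:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub

Copy the output of the cat command.)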

8. Open GitHub, go to your Profile, click "Edit Profile", "SSH keys". Click "Add key", and just paste in the stuff you copied from RStudio in the previous step.

You're done! To clone an existing repos from Github to your cloud machine, open a new project in RStudio, and select Version Control, then Git, and paste in the URL name that GitHub provides. Then work away!

An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, other than as mentioned above, the aggregator is violating the terms by which we publish our work.
April 27, 2015
 

SAS recently performed testing using the Intel Cloud Edition for Lustre* Software - Global Support (HVM) available on AWS marketplace to determine how well a standard workload mix using SAS Grid Manager performs on AWS.  Our testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. You can find the detailed results in the technical paper, SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre.

In addition to the paper, Amazon will be publishing a post on the AWS Big Data Blog that takes a look at scaling the underlying AWS infrastructure to run SAS Grid Manager and meet the needs of SAS applications with demanding I/O requirements. We will add the exact URL to the blog as a comment once it is published.

System design overview – network, instance sizes, topology, performance

For our testing, we set up the following AWS infrastructure to support the compute and I/O needs of these two components of the system:

  • the SAS workload that was submitted using SAS Grid Manager
  • the underlying Lustre file system required to meet the clustered file system requirement of SAS Grid Manager.

SAS Grid Manager and Lustre shared file system configuration on the AWS cloud

The SAS Grid nodes in the cluster are i2.8xlarge instances.  The 8xlarge instance size provides proportionally the best network performance to shared storage of any instance size, assuming minimal EBS traffic.  The i2 instance also provides high performance local storage, which is covered in more detail in the following section.

The use of an 8xlarge size for the Lustre cluster matters less, since there is significant traffic to both EBS and the file system clients, although an 8xlarge is still preferable. The Lustre file system has a caching strategy, and you will see higher throughput to clients when cache hits are frequent, which effectively reduces the network traffic to EBS.

Steps to maximize storage I/O performance

Storage for SAS applications needs to be high-speed, and temporary storage typically carries the most demanding load. The high-I/O instance family, I2, and the recently released dense-storage instance family, D2, provide high aggregate throughput to ephemeral (local) storage. For the SAS workload tested, the i2.8xlarge has 6.4 TB of local SSD storage, while the d2.8xlarge has 48 TB of HDD.

Throughput testing and results

We wanted to achieve a throughput of at least 100 MB/sec/core to temporary storage, and 50-75 MB/sec/core to shared storage. The i2.8xlarge has 16 cores (32 virtual CPUs; each virtual CPU is a hyperthread, and each core has two hyperthreads). Testing done with lower-level tools (fio, and a SAS tool, iotest.sh) showed a throughput of about 3 GB/sec to ephemeral (temporary) storage and about 1.5 GB/sec to shared storage. The shared storage figure does not take into account file system caching, which Lustre does well.
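As a rough sketch of this kind of measurement (the directory, file size, and job count here are illustrative, not the exact parameters we used), a sequential-write fio run against temporary storage might look like this:

fio --name=seqwrite --directory=/saswork --rw=write --bs=1M --size=10g --numjobs=16 --ioengine=libaio --direct=1 --group_reporting

The group_reporting option aggregates the per-job numbers into a single throughput figure.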

This testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. For full details of the testing configuration and results, please see the SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre technical white paper.

 

tags: cloud computing, configuration, grid, SAS Administrators

The post Can I run SAS Grid Manager in the AWS cloud? appeared first on SAS Users.

April 2, 2015
 

In my last blog post, I introduced SAS Visual Analytics for UN Comtrade, which helps anyone glean answers from the most comprehensive collection of international trade data in the world. I’d like to share some of what we learned in the development process as it pertains to big data, high […]

The post Big data lessons learned from visualizing 27 years of international trade data appeared first on SAS Voices.

January 16, 2015
 

This is the last of my series of posts on the NIST definition of cloud computing. As you can see from the Wikipedia definition below, calling anything a “cloud” is likely to be the fuzziest way of describing it.

In meteorology, a cloud is a visible mass of liquid droplets or frozen crystals made of water or various chemicals suspended in the atmosphere above the surface of a planetary body. These suspended particles are also known as aerosols and are studied in the cloud physics branch of meteorology.

Not that there is anything wrong with the label “cloud”--it’s a shortcut that allows us to quickly convey an idea. But for anything beyond that, when talking about functionality, we would be well advised to define and describe “cloud” in as much detail as possible so that all people involved have the same picture in their mind, and not whatever it is they think of when they think of “cloud”.

The NIST definitions help us narrow down features, functionality and models, but those are still only broad categories that leave certain gaps in which misunderstandings can easily sprout. I encourage you to use these definitions, but also to go further and describe cloud architectures by using terms that are as precise as possible.

In recent posts, I talked about the five characteristics of cloud, as well as the three service models. In this final installment of the series, I will discuss the four cloud deployment models.

Public cloud

The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.

Of the four deployment models, this one is the easiest to grasp. A public cloud is simply a cloud that you rent out to others. Essentially, if you build it (physically) and you charge money for others to use it, then you have a public cloud. Typical public cloud providers include Amazon Web Services (AWS), Microsoft Azure, Rackspace, and Google Cloud.

Despite the name, the public cloud resources you are renting out are not necessarily accessible to the general public.

Private cloud

The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

This deployment model is a bit more challenging to describe. The “private” in “private cloud” does not imply any more “privacy” or “security” than a public cloud would, but instead means “your own private use”. Another term you may hear is “corporate cloud”, a cloud used solely by a corporation, not by its customers.

For example, a private cloud may be set up within an organization so that different divisions have access to a shared virtual computing environment, for a variety of purposes:  development, testing, training, demos.  These are typically resources that are not accessible to anyone outside the organization.

Some organizations will decide to build an on-premises cloud using physical servers and software such as OpenStack. Others will opt instead to rent out the required resources from a cloud provider like AWS. Regardless of the choice, both of these fall under private cloud.

Private cloud environments are usually thought of as server configurations physically running on-premises. However, if the organization decides tomorrow to replace all these servers with Amazon instances running in the AWS cloud, it would still be considered a private cloud.  This is because, once again, it would be used for internal purposes, even though it is physically off-premises.

In doing research for this blog post, I came across numerous articles on the internet that seem to confuse private cloud with either “on-premises” or “secure access”. For example, many people consider AWS’ VPC offering to be a private cloud. As others have pointed out, it is not inherently a private cloud. It is a more secure way of accessing public cloud resources.

Community cloud

The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.

If, tomorrow, I decided to start my own insurance company and create a new cloud dedicated only to insurance companies, that would make it a community cloud. I could host it myself or have a public cloud provider manage the back end for me.

There are few examples of these, and I have not worked with any of them myself. A good example of a community cloud is GovCloud, which is created, hosted and managed by AWS but serves the branches of the US government. There is also the NYSE Capital Market Community Platform, a sort of financial-industry cloud.

Hybrid cloud

The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

I have to admit that this definition confuses me a bit. I may just write to the NIST to ask what they meant!

The distinction between private and public is not about where the processing happens, but about what kind of processing you do. So, to me, needing more resources and bursting to the cloud does not change the type of processing being done.

I would expect that what makes for a hybrid cloud is the use of differing cloud technologies together. So you could have an on-premises OpenStack Cloud for baseline processing and obtain (burst to) AWS instances for peak usage. This would also mean that a hybrid cloud (made up of different cloud platforms) could then be either private, public, or community.

Parting words

The NIST definitions I have shared in this series of blog posts help us narrow down features, functionality and models so we can be more accurate when talking about the cloud.

In my opinion, they provide a solid base of understanding and general classification, but they don't go far enough along the branches of choices when it comes to cloud computing. The five characteristics, three service models and four deployment models are more than just marketing buzzwords. They are the foundation on which a detailed technical cloud architecture should be built. They are the start of the cloud discussion, not the whole discussion.

tags: cloud computing, SAS Professional Services, SAS Programmers
December 5, 2014
 

In a recent post, I talked about the five essential characteristics of cloud computing. In today’s post, I will cover the three service models available in the cloud as they are defined by the National Institute of Standards and Technology, or NIST.

Like the story of Goldilocks, when it comes to choosing service models for the cloud, there is no right or wrong. Your choice depends on who you are and what you want. (Granted, that’s my own interpretation of the story.)

The idea of this post is not to add more hype around cloud service models (if that’s even possible!) but to use simple examples to illustrate them.

Software as a Service (SaaS)

“The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.”

Do you remember when you had to install an e-mail client on your desktop? Remember Outlook Express? (shudders!). The younger readers will not believe me, but in those prehistoric times, you downloaded your e-mails to your computer. And if you were not careful, they were deleted from the server. Yes, deleted! What a concept!

But this started to change with the advent of web-based e-mail, like Hotmail and Gmail. These were likely the first examples of Software as a Service. And you no longer needed a desktop e-mail client.

When I explain SaaS to friends and family, I usually say that it's a lot like renting a home:

  • You pay for the use (number of months) of a house or apartment
  • If something major breaks, your landlord will be the one to fix it
  • You can bring your furniture, maybe even paint the inside of your home, but you can't remodel
  • If it turns out to be too small after a while, it's easy to leave and find a bigger place

Infrastructure as a Service (IaaS)

“The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).”

Infrastructure as a Service is also fairly straightforward to understand. Instead of renting out access to an application, you rent out the essential building blocks that enable you to build or run any type of application you desire. Amazon Web Services (AWS) is the archetypal example, and is the leading IaaS provider at this time. On AWS, you can rent a wide variety of servers, storage or network devices for dollars per hour. You can then assemble and build upon those to create any type of software application.
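As a concrete sketch (assuming the AWS command-line tools are installed and configured, and using a placeholder AMI ID rather than a real image), renting one such server comes down to a single call:

aws ec2 run-instances --image-id ami-12345678 --instance-type m3.medium --count 1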

To continue with my house theme, IaaS would be like renting the tools necessary to build your house and the land on which it stands. So, you could make your house twice as big when you have family over, but the price would go up. You could also decide to build hundreds of apartments and rent them out to others, thereby making a profit.

Platform as a Service (PaaS)

“The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.”

I kept this one for last because it’s the hardest one to explain, and in my opinion, the one that is the least clearly defined at this point. It sits between IaaS and SaaS, but where exactly will depend on who you talk to. Essentially, we want more flexibility than SaaS, but less grunt work than IaaS. The examples I use when talking about this are usually the Apple App Store, Google App Engine or the SalesForce.com platforms.

To continue stretching my house analogy to its limits, PaaS is a little like those new 3-D printed houses. It allows for quick building of a house. You can have any shape you want, within reason, but you cannot choose what materials it’s made of. You offer the service of building the house, and the customer can provide the blueprints for it. Or you provide standard modules like bathroom and kitchen, which can then be assembled like LEGO blocks.

Which service model is right for SAS users?

It is unlikely that SAS Institute will become an IaaS provider. We do, however, leverage IaaS providers to host SAS solutions in regions where we do not have a SAS-owned data center.  As hosted solutions become more popular with customers, this trend will likely increase.

SAS Visual Analytics for SAS Cloud is a recent SaaS offering. Over time, SAS will start offering the SaaS option for more products and solutions.

The advent of new application packaging and container technologies offers opportunities for SAS to provide users with PaaS options that will enable them to build their own SAS-based applications.

In my opinion, it is likely that this will be the “not too hot, not too cold” spot for us, as it will enable us to continue providing the great analytics tools that we always have, and let our customers put these tools together into applications that fit their own specific needs.

Parting words

I hope that you found this interesting and that it gave you a better understanding of these terms. If you want to read further on these topics, I really like these two other ways of describing the three services models: the "host/build/consume" shortcut, and "Pizza-As-A-Service", the other PaaS.

If you have comments, concerns or feedback, please share in the comment box below.

tags: cloud computing, SAS Professional Services, SAS Programmers
December 3, 2014
 
In 2012, we presented a post showing how to run RStudio in the cloud on an Amazon server. There were 7 steps, including one with 7 sub-steps, one of which had 6 sub-sub-steps. It was still pretty easy, for what it was-- an effectively free computer in the cloud to run R on.

Today, we show the modern-- 3 years later!-- way to get the same result, only this approach is much easier, and the resulting installation includes all the best goodies of RStudio, including Markdown -> PDF and Hadley Wickham's packages pre-installed. Update, 2016: Digital Ocean has changed their set-up slightly. Check out the first step or two of this post in place of the first two steps below, if you're just starting out.

The approach builds on Docker, an infrastructure that saves start-up time and overhead, as well as efforts led by Dirk Eddelbuettel and Carl Boettiger to develop a Docker application of R. This project is called Rocker, and interested readers are encouraged to read the details. But if you want to just get up and running, here are the simple steps to get going.



1. Go to Digital Ocean and sign up for an account. By using this link, you will get a $10 credit. (Full disclosure: Ken will also get a $25 credit once you spend $25 real dollars there.) The reason to use this provider is that they have a system ready to run with Docker already built in. In addition, their prices are quite reasonable. You will need to use a credit card or PayPal to activate your account, but you can play for a long time with your $10 credit-- the cheapest machine is $.007 per hour, up to a $5 per month maximum.

2. On your Digital Ocean page, click "Create droplet". Then choose an (arbitrary) name, a size (meaning cost/power) of machine, and the region closest to you. You can ignore the settings. Under "Select Image", choose the "Applications" tab and select "Docker (1.3.2 on 14.04)". (The numbers in the parentheses are the Docker and Ubuntu version, and might change over time.) Then click "Create Droplet" at the bottom of the page.

3. It takes about a minute for the machine to start up. When it's ready, click the "Console Access" button. This opens a text terminal to your Ubuntu machine, inside your web page. Press enter to get a prompt, and log in (your username is root) using the password that was sent to your e-mail. You'll have to change the password.

4a. To start a terminal session of R, type
docker run --rm -ti rocker/r-base
You should see a bunch of messages about pulling and downloading, but eventually you will get the ">" prompt-- you can do R in here, but who would want to?

4b. To get RStudio server running, type
docker run -d -p 8787:8787 rocker/rstudio
But this is really not where you want to be. Instead, run the following command, to get a set-up that includes more useful packages installed in and with R.
docker run -d -p 8787:8787 rocker/hadleyverse
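To confirm the container came up before heading to the browser, you can list the running containers:

docker ps

You should see the rocker/hadleyverse image listed, along with its 8787 port mapping.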


5. Use it! The IP address of your server is displayed below the terminal where you typed in your docker command. Open a new browser tab and go to the address http://(ip address):8787. For example: http://135.104.92.185:8787. You'll see the RStudio login screen, and can enter "rstudio" (without the quotes) as the username and password. The system is well tuned enough that you can open a new file --> markdown --> PDF and immediately click "Knit PDF", and see the example document beautifully presented back to you in moments.

That's it. It's still way cooler than sliced bread. Let us know if you try it, and if you run into any trouble. Oh, and if you're feeling creeped out by the standard username and password in your RStudio, you can set them up from your docker command as follows.
docker run -d -p 8787:8787 -e USER=ken -e PASSWORD=ken rocker/hadleyverse
Other customization details and further information can be found on this Rocker page.

Update
I should perhaps have noted that what you are running here is in fact RStudio Server, and that you can allow additional users on your RStudio using instructions found here.
