As a SAS instructor, I’m often on the road, but, in April, my work travel path is going to take me to a place I haven’t visited since I was 12 years old. The occasion? SAS Global Forum 2017. The location? Walt Disney World® in Orlando. While the main conference [...]
Editor’s note: This is the second in a series of articles to help current SAS programmers add SAS Viya to their analytics skillset. In this post, Advisory Solutions Architect Steven Sober explores how to accomplish distributed data management using SAS Viya. Read additional posts in the series.
This article in the SAS Viya series will explore how to accomplish distributed data management using SAS Viya. In my next article, we will discuss how SAS programmers can collaborate with their open source colleagues to leverage SAS Viya for distributed data management.
Distributed Data Management
SAS Viya provides a robust, scalable, cloud-ready distributed data management platform. This platform provides multiple techniques for data management that run distributed, i.e., using all cores on all compute nodes defined to the SAS Viya platform. The four techniques we will explore here are DATA Step, PROC DS2, PROC FEDSQL and PROC TRANSPOSE. With these four techniques, SAS programmers and open source programmers can quickly apply complex business rules that stage data for downstream consumption, i.e., analytics, visualizations, and reporting.
The rule for getting your code to run distributed is to ensure all source and target tables reside in the in-memory component of SAS Viya, i.e., Cloud Analytic Services (CAS).
The following statement is an example of starting a new CAS session. In the coding examples that follow, we will reference this session using the keyword MYSESS. Also note that this CAS session uses one of the default CAS libraries, CASUSER.
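A minimal sketch of such a statement, assuming the SAS client is already configured to reach a CAS server (for example through the CASHOST= and CASPORT= system options, which are an assumption about your environment):

```sas
/* Start a CAS session and name it MYSESS.                      */
/* Assumes the connection to the CAS server is already set up.  */
cas mysess;
```

The session name is arbitrary; MYSESS is simply the name this article uses.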
Binding a LIBNAME to a CAS session
Now that we have started a CAS session, we can bind a LIBNAME to that session using the following syntax:
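A minimal sketch of the binding, assuming a session named MYSESS is already running; the libref name CASUSER is chosen here to match the caslib it points to:

```sas
/* Bind the libref CASUSER to the CASUSER caslib through session MYSESS. */
libname casuser cas caslib=casuser sessref=mysess;
```

Once the libref is assigned, tables in that caslib can be referenced with the usual two-level name, such as casuser.mytable (a hypothetical table name).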
Note: CASUSER is one of the default CAS libraries created when you start a CAS session. In the following coding examples we will utilize CASUSER for our source and target tables that reside in CAS.
To list all default and end-user CAS libraries, use the following statement:
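One way to do this, sketched here on the assumption that a CAS session is active, is the CASLIB statement with the _ALL_ keyword:

```sas
/* Write the definition of every caslib visible to the active session to the SAS log. */
caslib _all_ list;
```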
Collaborative distributed data management using SAS Viya was published on SAS Users.
a. Like the DATA step, you use a two-level name to reference these tables.
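As an illustration of two-level names, the sketch below uses the same libref.table form in a DATA step and in PROC FEDSQL. The table names cars_source, cars_target and cars_premium and the computed column mpg_avg are hypothetical; SASHELP.CARS is used only because it ships with SAS.

```sas
/* Skip these two statements if the session and libref already exist. */
cas mysess;
libname casuser cas caslib=casuser sessref=mysess;

/* Load a sample table into CAS so that the source resides in memory. */
data casuser.cars_source;
   set sashelp.cars;
run;

/* Source and target are both CAS tables, so this step is eligible to run in CAS. */
data casuser.cars_target;
   set casuser.cars_source;
   mpg_avg = mean(mpg_city, mpg_highway);
run;

/* The same two-level naming in FedSQL, submitted to the MYSESS session. */
proc fedsql sessref=mysess;
   create table casuser.cars_premium as
   select make, model, msrp
   from casuser.cars_source
   where msrp > 50000;
quit;
```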
For many years, we’ve been saying that to do advanced analytics, well, you must have good quality, clean and standardised data. And now we’re fast approaching the deadline for businesses to be compliant with the GDPR regulations (with fines for noncompliance up to four per cent of revenue). SAS’ capabilities [...]
Editor’s note: This is the first in a series of posts to help current SAS programmers add SAS Viya to their analytics skillset. In this post, SAS instructors Stacey Syphus and Marc Huber introduce you to the new Transitioning from Programming in SAS 9 to SAS Viya video library, designed to show SAS programmers [...]
The post Transitioning from programming in SAS 9 to SAS Viya appeared first on SAS Learning Post.
Digitalisation is blasting the cobwebs out from enterprises and organisations of all kinds – freeing them to innovate and take advantage of the always-on economy. But it’s also helping new disruptive players to gain an unexpectedly strong foothold in many markets. One of the key advantages these new players have [...]
When mentioning to friends that I’m going to Orlando for SAS Global Forum 2017, they asked if I would be taking my kids. Clearly my friends have not attended a SAS Global Forum before as there have been years where I never even left the hotel! My kids would NOT enjoy it… but, […]
The post Learn about SAS Studio, SAS Enterprise Guide and (drumroll) SAS Viya at SAS Global Forum 2017! appeared first on SAS Learning Post.
SAS® Viya™ 3.1 represents the third generation of high performance computing from SAS. Our journey started a long time ago and, along the way, we have introduced a number of high performance technologies into the SAS software platform:
- In-Database processing where SAS data quality and analytical processing occur within the data source, minimizing data movement and leveraging the native language of the data source.
- Grid computing to distribute processing over a number of computing nodes in a cluster.
- High Performance Analytics where in-memory calculations are processed across the nodes of a cluster including High Performance Risk and High Performance Data Mining.
- In-memory Visual Analytics and Visual Statistics powered by the SAS LASR analytic server.
Introducing Cloud Analytic Services (CAS)
SAS Viya introduces Cloud Analytic Services (CAS) and continues this story of high performance computing. CAS is the runtime engine and microservices environment for data management and analytics in SAS Viya and introduces some new and interesting innovations for customers. CAS is an in-memory technology and is designed for scale and speed. Whilst it can be set up on a single machine, it is more commonly deployed across a number of nodes in a cluster of computers for massively parallel processing (MPP). The parallelism is further increased when we consider using all the cores within each node of the cluster for multi-threaded, analytic workload execution. In an MPP environment, just because there are a number of nodes, it doesn’t mean that using all of them is always the most efficient for analytic processing. CAS maintains node-to-node communication in the cluster and uses an internal algorithm to determine the optimal distribution and number of nodes to run a given process.
However, processing in-memory can be expensive, so what happens if your data doesn’t fit into memory? Well, CAS has that covered. CAS will automatically spill data to disk in such a way that only the data that are required for processing are loaded into the memory of the system. The rest of the data are memory-mapped to the filesystem in an efficient way for loading into memory when required. This way of working means that CAS can handle data that are larger than the available memory that has been assigned.
The CAS in-memory engine is made up of a number of components - namely the CAS controller and, in an MPP distributed environment, CAS worker nodes. Depending on your deployment architecture and data sources, data can be read into CAS either in serial or parallel.
What about resilience to data loss if a node in an MPP cluster becomes unavailable? Well, CAS has that covered too. CAS maintains a replicate of the data within the environment. The number of replicates can be configured but the default is to maintain one extra copy of the data within the environment. This is done efficiently by having the replicate data blocks cached to disk as opposed to consuming resident memory.
One of the most interesting developments with the introduction of CAS is the way that an end user can interact with SAS Viya. CAS actions are a new programming construct and, with CAS, if you are a Python, Java, SAS or Lua developer, you can communicate with CAS using an interactive computing environment such as a Jupyter Notebook. One of the benefits of this is that a Python developer, for example, can utilize SAS analytics on a high performance, in-memory distributed architecture, all from their Python programming interface. In addition, we have introduced open REST APIs, which means you can call native CAS actions and submit code to the CAS server directly from a Web application or other programs written in any language that supports REST.
Whilst CAS represents the most recent step in our high performance journey, SAS Viya does not replace SAS 9. These two platforms can co-exist, even on the same hardware, and indeed can communicate with one another to leverage the full range of technology and innovations from SAS. To find out more about CAS, take a look at the early preview trial. Or, if you would like to explore the capabilities of SAS Viya with respect to your current environment and business objectives, speak to your local SAS representative about arranging a ‘Path to SAS Viya workshop’ with SAS.
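For SAS developers, a minimal sketch of calling CAS actions looks like the following. It assumes a reachable CAS server; the session name mysess is hypothetical, and the two actions shown are only examples of what can be called.

```sas
/* Start a session (skip if one named MYSESS is already running). */
cas mysess;

proc cas;
   session mysess;                        /* direct the actions below to this session    */
   builtins.serverStatus;                 /* report the state of the CAS server nodes    */
   table.tableInfo / caslib="casuser";    /* list in-memory tables in the CASUSER caslib */
quit;
```

The same actions are what a Python, Java or Lua client would call through its own interface.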
The holiday season is over – and you survived. You’ve made a lot of personal resolutions for 2017 - go to the gym, eat less sugar, save more money, visit Grandma more often. These are all great personal resolutions for 2017, but what about your analytics resolutions? If you are having trouble with your analytics resolutions then let us help you out. The recent release of SAS 9.4 M4 will help you make 2017 your best analytics year yet.
Resolution 1: Build more accurate models faster!
Now you will be able to leverage the power of the two most advanced analytics platforms on the market, SAS 9 and SAS Viya, from one interface. Using SAS/CONNECT, users can call powerful SAS Viya analytics from within a process flow in Enterprise Miner. Would you prefer to use the super-fast, autotuned gradient boosting in SAS Viya? No problem! Call SAS Viya analytics directly from Enterprise Miner using the SAS Viya Code node. Then, from the same process flow, you can also call open source models, all from one interface, SAS Enterprise Miner. Do you prefer to use SAS Studio on SAS 9? You will be able to call SAS Viya analytics from SAS Studio as well. With SAS 9.4 M4, SAS gives you the ability to use both of SAS’ powerful platforms from one interface.
Resolution 2: Score your unstructured models in Hadoop without moving your data!
Got Hadoop? Got a lot of unstructured data? Now SAS Contextual Analysis allows you to score models in Hadoop using the SAS Code Accelerator add-on. Identify new insights with your unstructured text without ever having to move your data. Score it all in Hadoop. Uncover new trends and topics buried in documents, emails, social media and other unstructured text that is stored in Hadoop. You will be able to do it faster because you won’t have to move that data outside of Hadoop. SAS just keeps getting better in 2017.
Resolution 3: Make better forecasts using the weather!
Through SAS/ETS, econometricians and others wanting to incorporate weather data into their models can now do so directly through two new interface engines. SASERAIN enables SAS users to retrieve weather data from the World Weather Online website. And SASENOAA provides access to severe weather data from the National Oceanic and Atmospheric Administration (NOAA) Severe Weather Data Inventory (SWDI) web service. So now you’ll know why there was that big sales spike for rock salt and snow shovels in July! Who says there is no climate change in 2017?
Resolution 4: Estimate causal effects more efficiently!
The new CAUSALTRT procedure in SAS/STAT estimates the average causal effect of a binary treatment variable T on a continuous or discrete outcome Y. Depending on the application, the variable T can represent an intervention (such as smoking cessation – which is a great 2017 resolution - versus control), an exposure to a condition (such as attending private versus public schools), or an existing characteristic of subjects (such as high versus low socioeconomic status). The CAUSALTRT procedure estimates two types of causal effects: the average treatment effect and the average treatment effect for the treated. And best of all, the causal inference methods that the CAUSALTRT procedure implements are designed primarily for use with data from nonrandomized trials or observational studies, where you observe T and Y without assigning subjects randomly to the treatment conditions.
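As a rough sketch of what such an analysis might look like, the step below estimates the average treatment effect by inverse probability weighting. The data set work.study and the variables Quit (binary treatment T), WeightChange (continuous outcome Y), Age and Gender are all hypothetical; only the procedure and statement names come from SAS/STAT.

```sas
/* Hypothetical observational data: WORK.STUDY with treatment Quit ('Yes'/'No'). */
proc causaltrt data=work.study method=ipwr;
   class Gender;
   /* Propensity score model: probability of treatment given covariates. */
   psmodel Quit(ref='No') = Age Gender;
   /* Outcome variable whose average causal effect is estimated. */
   model WeightChange;
run;
```

Which estimation method is appropriate depends on the data, so treat METHOD=IPWR here as one illustrative choice rather than a recommendation.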
Resolution 5: Design better factory floors!
A factory floor can be a complicated place, with raw materials coming in one side, and finished products going out the other. Options are virtually unlimited for the placement of materials and equipment – and a poorly designed layout can dramatically reduce production capability. Yet experimenting with different layouts would be extremely costly and time consuming. Thankfully, SAS Simulation Studio (a component of SAS/OR) provides a rich – and animated – environment for testing alternatives and coming up with the most appropriate design. And it can handle any kind of discrete-event simulation, integrating with JMP for experimental design and input analysis, and with JMP and SAS for source data and analysis of simulation results. How will your factory floor simulation impact your productivity in 2017?
Much of my recent work has been along the theme of modernization. Analytics is not new for many of our customers, but standing still in this market is akin to falling behind. In order to continue to innovate and remain competitive, organizations need to be prepared to embrace new technologies […]
The study of social networks has gained importance over the years within social and behavioral research on HIV and AIDS. Social network research can show routes of potential viral transfer, and be used to understand the influence of peer norms and practices on the risk behaviors of individuals. This example analyzes the […]
Analyzing social networks using Python and SAS Viya was published on SAS Voices.