big data

October 23, 2021
 

Rijkswaterstaat (RWS) is the Netherlands' main agency for the design, construction, management and maintenance of waterways and infrastructure. Their mission is to promote safety, mobility and quality of life in the Netherlands. They are the masterminds behind some of the most prestigious water projects in the world. In a recent panel [...]

A conversation with Rijkswaterstaat: How SAS is helping keep the Netherlands safe was published on SAS Voices by Olivia Ojeda

October 9, 2021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Note that instead of using the CAS procedure, I could execute the same action from Python, R, or even a REST API, with slight changes to the syntax for each language.

Why use the DATA Step?

It's pretty simple: the DATA step is a powerful way to process your data. It gives you full control of each row and column, lets you easily create multiple output tables, and provides a variety of statements to do pretty much anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq /
        table={name='cars', caslib='casuser'},
        input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables, one per unique value, using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!
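For a short DATA step, the string can be passed inline. The following is a minimal sketch, assuming the cars table from above is loaded in the casuser caslib; the output table name usa_only is hypothetical and chosen only for illustration.

```sas
proc cas;
    /* The whole DATA step is passed as one quoted string.            */
    /* casuser.usa_only is a hypothetical output table name.          */
    dataStep.runCode /
        code="data casuser.usa_only; set casuser.cars; if Origin='USA'; run;";
quit;
```

An inline string works for one-liners like this, but it quickly becomes hard to read as the program grows.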

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as a variable. Here, the DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.
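To confirm that the tables exist, one option is to list the in-memory tables in the caslib with the table.tableInfo action. This is a small sketch that assumes the three tables were written to the casuser caslib, as in the code above.

```sas
proc cas;
    /* List the in-memory tables in the casuser caslib.               */
    /* Asia, Europe and USA should appear alongside the cars table.   */
    table.tableInfo / caslib="casuser";
quit;
```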

DATA Step in CAS has Limitations

One thing to be aware of is that not all DATA step functionality is available in CAS. If you use the runCode action with a statement or function that is not supported in CAS, you will receive an error. Let's look at an example using the FIRST function, which returns the first character of a string and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The runCode action returns an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this, you will need to find a CAS-supported method to complete the task. (HINT: Instead of the FIRST function, you can use the SUBSTR function here.)
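Following the hint, the failing step could be rewritten with SUBSTR. This is a sketch rather than the only fix; the variable name firstLetter and the output table name casuser.good are hypothetical.

```sas
proc cas;
    source firstLetter;
        data casuser.good;
            set casuser.cars;
            /* Start at position 1 and take 1 character, which is    */
            /* what first(Model) was meant to return.                */
            NewCol=substr(Model,1,1);
        run;
    endsource;
    dataStep.runCode / code=firstLetter;
quit;
```

If the Model values carry leading blanks, the first character will be a blank, so you may want to left-align the value before taking the substring.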

For more information, visit Restrictions and Supported Language Elements. Be sure to select the documentation that matches the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

September 28, 2021
 

SQL is an important language for any programmer working with data. In SAS Cloud Analytic Services (CAS) you can execute SQL queries using the fedSQL.execDirect CAS action!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

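In this example, I will use the CAS procedure to execute the fedSQL.execDirect CAS action. The query selects the Toyota rows from the cars table, computes the average of city and highway MPG, and sorts the results by that average in descending order.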

proc cas;
    fedSQL.execDirect / 
        query="select Make,
                      Model,
                      MSRP,
                      mean(MPG_City,MPG_Highway) as MPG_Avg
               from casuser.cars
               where Make='Toyota'
               order by MPG_Avg desc";
quit;

The execDirect action executes the query in the distributed CAS environment and returns the expected results.

Using a SOURCE block

While the query we executed was simple, this is not always the case. Adding a complicated query as a string can make writing the query difficult.

Instead of specifying the query as a string in the CAS procedure, use the SOURCE statement to embed text in a variable. In the following example, I'll execute the same query as before. However, this time I'll nest the query inside a SOURCE block by specifying the SOURCE statement and name the variable MPG_toyota. Then I'll add the query inside the SOURCE block and use the ENDSOURCE statement to end the block.

proc cas;
    source MPG_toyota;
        select Make,
               Model,
               MSRP,
               mean(MPG_City,MPG_Highway) as MPG_Avg
        from casuser.cars
        where Make='Toyota'
        order by MPG_Avg desc;
    endsource;
 
    fedSQL.execDirect / query=MPG_toyota;
quit;

After the SOURCE block is complete, you can reference the variable as the value of the query parameter in the execDirect action.

The results returned are the same, but using a SOURCE block makes the code easier to write and maintain.

Summary

In SAS Viya, FedSQL provides a scalable, threaded, high-performance way to query data and create new CAS tables from existing tables in the CAS server. In this example we saw two distinct ways to run SQL code on SAS Viya. This is only the beginning. See the resources below for more details on PROC FedSQL.
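Both examples in this post only query data. As an illustration of creating a new CAS table from an existing one, here is a minimal sketch using CREATE TABLE ... AS with the same execDirect action. The variable name makeToyota and the output table name casuser.toyota_mpg are hypothetical, and the sketch assumes no table with that name exists yet.

```sas
proc cas;
    source makeToyota;
        /* casuser.toyota_mpg is a hypothetical output table name */
        create table casuser.toyota_mpg as
        select Make,
               Model,
               MSRP,
               mean(MPG_City,MPG_Highway) as MPG_Avg
        from casuser.cars
        where Make='Toyota';
    endsource;

    fedSQL.execDirect / query=makeToyota;
quit;
```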

Resources

SAS® Viya®: FedSQL Programming for SAS® Cloud Analytic Services
FEDSQL Procedure
execDirect CAS Action
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing SQL in SAS Viya was published on SAS Users.

August 19, 2021
 

In Part 1 of my series fetch CAS, fetch!, I executed the fetch CAS action to return rows from a CAS table. That was great, but what can you do with the results? Maybe you want to create a visualization that includes the top five cars by MSRP for all Toyota vehicles? How can we accomplish this task? We'll cover this question and provide several other examples in this post.

Save the results of a CAS action as a SAS data set

First, execute the table.fetch CAS action on the CARS in-memory table to filter for Toyota cars, return the Make, Model and MSRP columns, and sort the results by MSRP. Then save the results of the action in a variable using the result= option. An action returns its results to the client as a dictionary. The fetch action returns a dictionary with a single key, with the result table as the value. In this example, I'll name the variable toyota.

proc cas;
    table.fetch result=toyota / 
          table={name="cars", caslib="casuser",
                 where="Make='Toyota'",
                 vars={"Make","Model","MSRP"}
          },
          sortBy={
                 {name="MSRP", order="DESCENDING"}
          },
          index=FALSE,
          to=5;
...

After executing the code, the results of the action are stored in the variable toyota and not shown in the output.
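To see what the variable holds, you can inspect it before saving anything. The sketch below repeats the same fetch call and then uses the CASL DESCRIBE and PRINT statements; it assumes the result table is stored under the fetch key, as described above.

```sas
proc cas;
    table.fetch result=toyota /
        table={name="cars", caslib="casuser",
               where="Make='Toyota'",
               vars={"Make","Model","MSRP"}},
        sortBy={{name="MSRP", order="DESCENDING"}},
        index=FALSE,
        to=5;

    describe toyota;      /* write the structure of the dictionary to the log */
    print toyota.fetch;   /* print the result table stored under the fetch key */
quit;
```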

Next, use the SAVERESULT statement to save the result table stored in the toyota variable. Since the variable is a dictionary, specify the variable name toyota, a dot, then the fetch key. This will access the result table from the dictionary. Finally, specify the DATAOUT= option with the name of the SAS data set to create.

proc cas;
    table.fetch result=toyota / 
          table={name="cars", caslib="casuser",
                 where="Make='Toyota'",
                 vars={"Make","Model","MSRP"}
          },
          sortBy={
                 {name="MSRP", order="DESCENDING"}
          },
          index=FALSE,
          to=5;
 
    saveresult toyota.fetch dataout=work.top5;
quit;

After executing the code, the result table is saved as a SAS data set. The SAS data set is named top5 and saved to the WORK library.

 

 

Wondering what else we can do? Let's take a look.

Visualize the SAS data set

Now that the result table is saved as a SAS data set, you can use the SGPLOT procedure to create a bar chart! Consider the code below.

title justify=left height=14pt "Top 5 Toyota Cars by MSRP";
proc sgplot data=work.top5
         noborder nowall;
    vbar Model / 
          response=MSRP 
          categoryorder=respdesc
          nooutline
          fillattrs=(color=cx0379cd);
    label MSRP="MSRP";
run;
title;

There it is! We processed our data in CAS using the fetch action, returned a smaller subset of results to the client, then used traditional SAS programming techniques on the smaller table. This method works similarly in other languages like Python and R, where you can then use the native visualization packages of the language!

You can now use your imagination on what else to do with the raw data from the CARS table or from the top5 results table we produced with the table.fetch action. Feel free to get creative.

Summary

CAS actions are optimized to run in a distributed environment on extremely large tables. Your CAS table can contain millions or even billions of rows. Since the data in CAS can be extremely large, the goal is to process and subset the table on the CAS server, then return a smaller amount of data to the client for additional processing, visualization or modeling.

Additional resources

fetch Action
SAVERESULT Statement
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code used in this post

CAS-Action! fetch CAS, fetch! - Part 2 was published on SAS Users.

June 17, 2021
 

Small and midsize (SMB) manufacturers are critical drivers of innovation and productivity, and agility often gives them a competitive advantage over larger organizations. But they have unique challenges compared to larger manufacturers who have the resources and capital to achieve greater economies of scale. To take advantage of their agility, [...]

An analytics how-to for small and midsize manufacturers was published on SAS Voices by Loren Sylvan

May 11, 2021
 

It’s safe to say that SAS Global Forum is a conference designed for users, by users. As your conference chair, I am excited by this year’s top-notch user sessions. More than 150 sessions are available, many by SAS users just like you. Wherever you work or whatever you do, you’ll find sessions relevant to your industry or job role. New to SAS? Been using SAS forever and want to learn something new? Managing SAS users? We have you covered. Search for sessions by industry or topic, then add those sessions to your agenda and personal calendar.

Creating a customizable agenda and experience

Besides two full days of amazing sessions, networking opportunities and more, many user sessions will be available on the SAS Users YouTube channel on May 20, 2021 at 10:00am ET. After you register, build your agenda and attend the sessions that most interest you when the conference begins. Once you’ve viewed a session, you can chat with the presenter. Don’t know where to start? Sample agendas are available in the Help Desk.

For the first time, proceedings will live on SAS Support Communities. Presenters have been busy adding their papers to the community. Everything is there, including full paper content, video presentations, and code on GitHub. It all premieres on “Day 3” of the conference, May 20. Have a question about the paper or code? You’ll be able to post a question on the community and ask the presenter.

Want training or help with your code?

Code Doctors are back this year. Check out the agenda for the specific times they’re available and make your appointment, so you’ll be sure to catch them and get their diagnosis of code errors. If you’re looking for training, you’ll be quite happy. Training is also back this year and it’s free! SAS instructor-led demos will be available on May 20, along with the user presentations on the SAS Users YouTube channel.

Chat with attendees and SAS

It is hard to replicate the buzz of a live conference, but we’ve tried our best to make you feel like you’re walking the conference floor. And we know networking is always an important component of any conference. We’ve made it possible for you to network with colleagues and SAS employees. Simply make your profile visible (by clicking on your photo) to connect with others, and you can schedule a meeting right from the attendee page. That’s almost easier than tracking down someone during the in-person event.

We know the exhibit hall is also a big draw for many attendees. This year’s Innovation Hub (formerly known as The Quad) has industry-focused booths and technology booths, where you can interact in real-time with SAS experts. There will also be a SAS Lounge where you can learn more about various SAS services and platforms such as SAS Support Communities and SAS Analytics Explorers.

Get started now

I’ve highlighted a lot in this blog post, but I encourage you to view this 7-minute Innovation Hub video. It goes in depth on the Hub and all its features.

This year there is no reason not to register for SAS Global Forum…and attend as few or as many sessions as you want. Why? Because the conference is FREE!

Where else can you get such quality SAS content and learning opportunities? Nowhere, which is why I encourage you to register today. See you soon!

SAS Global Forum: Your experience, your way was published on SAS Users.

November 19, 2020
 
SAS loves data. It's our raison d'être. We've been dealing with Big Data since long before the term was first used in 2005. A brief history of Big Data*:

  • In 1887, Herman Hollerith invented punch cards and a reader to organize census data.
  • In 1937, the US government had a punch-card reading machine created to keep track of 26 M Americans and 3 M employers as a result of the Social Security Act.
  • In 1943, Colossus was created to decipher Nazi codes during World War II.
  • In 1952, the National Security Agency was created to decrypt intelligence signals during the Cold War.
  • In 1965, the US Government built the first data center to store 742 M tax returns and 175 M sets of fingerprints.
  • In 1989, British computer scientist Tim Berners-Lee coined the phrase "World Wide Web", combining hypertext with the Internet.
  • In 1995, the first super-computer was built.
  • In 2005, Roger Mougalas from O'Reilly Media coined the term Big Data.
  • In 2006, Hadoop was created.

The story continues to the point where 90 percent of the data available today has been created in the last two years!

As SAS (and the computing world) moves to the cloud, the question "How do I deal with my data (Big and otherwise), which used to be on-prem, in the cloud?" is at the forefront for many organizations. I ran across a series of relevant articles by my colleague, Nicolas Robert, on the SAS Support Communities about SAS data access and storage on Google Cloud Storage (GCS). This post organizes the articles so you can quickly get an overview of the various options for SAS to access data in GCS.

Accessing Google Cloud Storage (GCS) with SAS Viya 3.5 – An overview

As the title suggests, this is an overview of the series. Some basic SAS terminology and capabilities are discussed, followed by an overview of GCS data options for SAS. Options include:

  • gsutil - the "indirect" way
  • REST API - the "web" way
  • gcsfuse - the "dark" way
  • BigQuery - the "smart" way.

In the overview Nicolas provides the pros and cons of each offering to help you decide which option works best for your situation. Below is a list of subsequent articles providing technical details, specific steps for usage, and sample code for each option.

Accessing files on Google Cloud Storage (GCS) using REST

The Google Cloud Platform (GCP) provides an API for manipulating objects in Google Cloud Storage. In this article, Nicolas provides step-by-step instructions on using this API to access GCS files from SAS.

Accessing files on Google Cloud Storage (GCS) using SAS Viya 3.5 and Cloud Storage FUSE (gcsfuse)

Cloud Storage FUSE provides a command-line utility, named “gcsfuse”, which helps you mount a GCS bucket to a local directory so the bucket’s contents are visible and accessible locally like any other file. In this article, Nicolas presents rules for CLI usage, options for mounting a GCS bucket to a local directory, and SAS code for accessing the data.
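Once a bucket is mounted, SAS reads its files through ordinary paths. The sketch below is not taken from Nicolas's article; it assumes the bucket has already been mounted with gcsfuse at the hypothetical directory /mnt/gcs-bucket and that it contains a hypothetical file named cars.csv.

```sas
/* Assumption: the GCS bucket is already mounted at /mnt/gcs-bucket   */
/* (hypothetical path) and contains cars.csv (hypothetical file).     */
proc import datafile="/mnt/gcs-bucket/cars.csv"
            out=work.cars_gcs
            dbms=csv
            replace;
    guessingrows=max;   /* scan all rows when inferring column types */
run;
```

Because the mount makes the bucket look like a local directory, no GCS-specific options are needed in the SAS code itself.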

SAS Viya 3.5 and Google Cloud Storage (GCS) Performance Feedback

In this article, Nicolas provides the results of a performance test of GCS integrated with SAS when accessed from cloud instances. New releases of SAS will only help facilitate integration and improve performance.

Accessing files on Google Cloud Storage (GCS) through Google BigQuery

Google BigQuery naturally interacts with Google Cloud Storage using popular big data file formats (Avro, Parquet, ORC) as well as commodity file formats like CSV and JSON. And since SAS can access Google BigQuery, SAS can access those GCS resources under the covers. In the final article, Nicolas debunks the myth that using Google BigQuery as middleware between SAS and GCS is cumbersome, not direct and requires data duplication.

Finally

Being able to access a wide variety of data on the major cloud providers' object storage technologies has become essential if not already mandatory. I encourage you to browse through the various articles, find your specific area of interest, and try out some of the detailed concepts.

* Big Data history compiled from A Short History Of Big Data, by Dr Mark van Rijmenam.

Accessing Google Cloud Storage (GCS) with SAS Viya was published on SAS Users.