CAS

1月 072022
 

Welcome to the sixth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to generate simple descriptive statistics of a CAS table.

Let's begin by confirming the cars table is loaded into memory. With a connection to CAS established, execute the tableInfo action to view available in-memory tables. If necessary, you can execute the following code in SAS Studio to load the sashelp.cars table into memory.

conn.tableinfo(caslib="casuser")

The results show the cars table is loaded into memory and available for processing. Next, reference the cars table in the variable tbl. Then use the print function to show the value of the variable.

tbl = conn.CASTable('cars', caslib='casuser')
print(tbl)
CASTable('cars', caslib='casuser')

The results show that the tbl variable references the cars table in the CAS server.

Preview the CAS Table

First things first. Remember, the SWAT package blends the world of Pandas and CAS into one. So you can begin with the traditional head method to preview the CAS table.

tbl.head()

The SWAT head method returns five rows from the CAS server to the client as expected.

The Describe Method

Next, let's retrieve descriptive statistics of all numeric columns by using the familiar describe method on the CAS table.

tbl.describe()

The SWAT describe method returns the same descriptive statistics as the Pandas describe method. The only difference is that the SWAT version uses the CAS API to convert the describe method into CAS actions behind the scenes to process the data on the distributed CAS server. CAS processes the data and returns summarized results back to the client as a SASDataFrame, which is a subclass of the Pandas DataFrame. You can now work with the results as you would a Pandas DataFrame.

Summary CAS Action

Instead of using the familiar describe method, let's use a CAS action to do something similar. Here I'll use the summary CAS action.

tbl.summary()

Summary CAS Action

The results of the summary action return a CASResults object (Python dictionary) to the client. The CASResults object contains a single key named Summary with a SASDataFrame as the value. The SASDataFrame shows a variety of descriptive statistics.  While the summary action does not return exactly the same statistics as the describe method, it can provide additional insights into your data.

What if we don't want all the statistics for all of the data?

Selecting Columns and Summary Statistics with the Summary Action

Let's add additional parameters to the summary action. I'll add the inputs parameter to specify the columns to analyze in the CAS server.

tbl.summary(inputs = ['MPG_City','MPG_Highway'])

The results show only the MPG_City and MPG_Highway columns were analyzed.

Next, I'll use the subSet parameter to specify the summary statistics to produce. Here I'll obtain the MEAN, MIN and MAX.

tbl.summary(inputs = ['MPG_City','MPG_Highway'],
                       subSet = ['mean','min','max'])

The results processed only the MPG_City and MPG_Highway columns, and returned only the specified summary statistics to the client.

Creating a Calculated Column

Lastly, let's create a calculated column within the summary action. There are a variety of ways to do this. I like to add it as a parameter to the CASTable object. You can do that by specifying the tbl object, then computedVarsProgram parameter. Within computedVarsProgram you can use SAS assignment statements with most SAS functions. Here we will create a new column name MPG_Avg that takes the mean of MPG_City and MPG_Highway. Lastly, add the new column to the inputs parameter.

tbl.computedVarsProgram = 'MPG_Avg = mean(MPG_City, MPG_Highway);'
tbl.summary(inputs = ['MPG_City','MPG_Highway', 'MPG_Avg'],
                       subSet = ['mean','min','max'])

In the results I see the calculated column and requested summary statistics.

Summary

The SWAT package blends the world of Pandas and CAS. You can use many of the familiar Pandas methods within the SWAT package, or the flexible, highly optimized CAS actions like summary to easily obtain summary statistics of your data in the massively parallel processing CAS engine.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Index
SWAT API Reference
CAS Action Documentation
SAS® Cloud Analytic Services: Fundamentals
SAS Scripting Wrapper for Analytics Transfer (SWAT)
CAS Action! - a series on fundamentals
Execute the following code in SAS Studio to load the sashelp.cars table into memory

Getting Started with Python Integration to SAS® Viya® - Part 6 - Descriptive Statistics was published on SAS Users.

12月 112021
 

SAS Studio is a very common interface to use to work with SAS® Cloud Analytic Services (CAS) data. The Libraries tree in SAS Studio displays SAS libraries (librefs) that contain your SAS data sets. To view CAS data in the Libraries tree, you need to associate a libref with the CAS library (or caslib) that contains your CAS tables.

This article discusses how you can define librefs to be associated with caslibs. The terms “libref” and “caslib” might be new to you, so here is a quick explanation of these terms.

  • A SAS library reference, or libref, is a shortcut name that points to a storage location on the server where your SAS files are stored.
  • A CAS library name, or caslib, is a shortcut name that references an in-memory space that holds your CAS tables.

The LIBNAME statement enables you to create these types of shortcut names.

What you see in the Libraries tree

The Libraries tree in SAS Studio displays the defined librefs where you can select and view your data sets. In the following image, the Libraries tree displays the default libraries:




Any user-defined libraries would appear in this list as well. Note: No caslibs are displayed in this list.

To view caslibs in the Libraries tree, you need to associate a libref with each caslib.

Use the CASLIB _ALL_ ASSIGN; statement to associate all caslibs

After you establish a CAS session (by using the CAS statement), you can associate a libref with each of the defined caslibs by running the following statement:

caslib _all_ assign;

After you submit this code, you can see in the log that a libref was created for each caslib, and the name of the caslib is used as the name of the libref.

After you define a libref for each caslib, the Libraries tree is then updated to include these new librefs, as shown in the following image. The librefs that are associated with caslibs are identified by a cloud and snowflake icon that is next to the caslib name:



You can also use the following SAS Studio snippet to create a libref for each caslib:
SAS Snippets ► SAS Viya Cloud Analytic Services ► Generate SAS librefs for caslibs

Use the LIBNAME statement and CAS engine to associate specific caslibs

The CASLIB _ALL_ ASSIGN; statement defines a libref for each of your caslibs. However, if you do not want to include all caslibs in the Libraries tree, you can define librefs for only the caslibs that you want to include.

To do that, use the LIBNAME statement with the CAS engine to define a libref that is associated with a specific caslib. The following statement shows the syntax to define the libref:

libname libref cas caslib=caslib;

Here, libref is the name that you choose for the SAS libref, and caslib is the name of the assigned caslib that you want to associate with the libref.

When you use the CASLIB _ALL_ ASSIGN; statement, the caslib name is used for the libref name. But the libref name does not have to be the same as the caslib name.

For example, the following statement creates a libref named MYCAS that is associated with the CASUSER caslib:

libname MYCAS cas caslib=CASUSER;

Ensure that the caslib is already defined before you associate it with a libref.

If you are not sure whether a caslib is defined, the following code generates a list of the available defined caslibs. The list is generated on the Results tab in SAS Studio:

proc cas;
table.caslibInfo;
quit;

Note that you can also use the CASLIB _ALL_ LIST; statement to view the list of caslibs in the SAS Studio log.

Whether you use the table.caslibInfo action or the CASLIB _ALL_ LIST; statement is a matter of preference for where you want to view the results.

Additional options are supported for the LIBNAME statement and can be found in the CAS LIBNAME engine documentation.

Troubleshooting

Some librefs are not shown

What if you followed the above instructions and see some, but not all, of the librefs for your caslibs in the Libraries tree?

The rules for caslib names are not the same as the rules for librefs, so not all caslib names can be used as libref names.

For example, when you use the CASLIB _ALL_ ASSIGN; statement, you might see a note like the following written to the log:

NOTE: CASLIB SystemData for session CASAUTO will not be mapped to SAS Library SystemData. The CASLIB name is not valid for use as a libref.

This note is produced because the LIBNAME statement attempted to define a libref named SystemData to be associated with the caslib SystemData. However, a libref name is limited to eight characters, so SystemData could not be used as a libref name.

The name that is used for a SAS libref must conform to the rules for a SAS name.

If a caslib name is longer than eight characters, you can use the LIBNAME statement with the CAS engine to associate your caslib with a libref that meets the requirements for a SAS name.

For example, to define a libref for the SystemData caslib, you can use a shorter libref name, such as SYSDATA:

libname sysdata cas caslib=SystemData;

In the following image, the Libraries tree now includes the SYSDATA libref, which enables you to view the CAS tables that are stored in the SystemData caslib.



Another issue that you might encounter is that the name of the caslib is defined as a SAS name literal, such as containing character spaces in the caslib name, and you cannot associate a libref with that caslib. The CAS LIBNAME engine does not support a SAS name literal for a caslib name.

When you want to specify a name for a caslib, keep these SAS name rules in mind.

CAS tables are not shown

What if you defined librefs for each of the caslibs that you want to view in the Libraries tree, but some CAS tables are not displayed for a caslib?

If the name of a CAS table does not comply with the VALIDMEMNAME rules, then the table is not displayed in SAS Studio. SAS Note 63538 provides more information about this issue.

Librefs are not assigned automatically

What if you want to be able to see your caslibs in the Libraries tree in SAS Studio every time that you sign in? For this preference, you can edit your autoexec file in SAS Studio and include the statements to automatically define the librefs for your caslibs.

Follow these steps:

  1. Select Options ► Autoexec file.
  2. In the file, enter each of the statements that you want to run every time that you sign in. Here are examples:
cas;
caslib _all_ assign;

You can also include the LIBNAME statements for your caslibs as well as any other SAS statements that you want to run each time you sign in to SAS Studio.

  1. Click Run to submit the statements.
  2. Open the Log tab to verify that your statements ran without error.
  3. Click Save.

With each subsequent sign-in for SAS Studio, the statements from your autoexec file are run automatically, and you then can view the librefs for your CAS tables in the Libraries tree.

Conclusion

CAS libraries enable you to access your CAS data. By defining the librefs for those caslibs in SAS Studio, you can have more interactive access to the CAS data.

Learn more

How to view CAS tables in SAS® Studio was published on SAS Users.

12月 112021
 

SAS Studio is a very common interface to use to work with SAS® Cloud Analytic Services (CAS) data. The Libraries tree in SAS Studio displays SAS libraries (librefs) that contain your SAS data sets. To view CAS data in the Libraries tree, you need to associate a libref with the CAS library (or caslib) that contains your CAS tables.

This article discusses how you can define librefs to be associated with caslibs. The terms “libref” and “caslib” might be new to you, so here is a quick explanation of these terms.

  • A SAS library reference, or libref, is a shortcut name that points to a storage location on the server where your SAS files are stored.
  • A CAS library name, or caslib, is a shortcut name that references an in-memory space that holds your CAS tables.

The LIBNAME statement enables you to create these types of shortcut names.

What you see in the Libraries tree

The Libraries tree in SAS Studio displays the defined librefs where you can select and view your data sets. In the following image, the Libraries tree displays the default libraries:




Any user-defined libraries would appear in this list as well. Note: No caslibs are displayed in this list.

To view caslibs in the Libraries tree, you need to associate a libref with each caslib.

Use the CASLIB _ALL_ ASSIGN; statement to associate all caslibs

After you establish a CAS session (by using the CAS statement), you can associate a libref with each of the defined caslibs by running the following statement:

caslib _all_ assign;

After you submit this code, you can see in the log that a libref was created for each caslib, and the name of the caslib is used as the name of the libref.

After you define a libref for each caslib, the Libraries tree is then updated to include these new librefs, as shown in the following image. The librefs that are associated with caslibs are identified by a cloud and snowflake icon that is next to the caslib name:



You can also use the following SAS Studio snippet to create a libref for each caslib:
SAS Snippets ► SAS Viya Cloud Analytic Services ► Generate SAS librefs for caslibs

Use the LIBNAME statement and CAS engine to associate specific caslibs

The CASLIB _ALL_ ASSIGN; statement defines a libref for each of your caslibs. However, if you do not want to include all caslibs in the Libraries tree, you can define librefs for only the caslibs that you want to include.

To do that, use the LIBNAME statement with the CAS engine to define a libref that is associated with a specific caslib. The following statement shows the syntax to define the libref:

libname libref cas caslib=caslib;

Here, libref is the name that you choose for the SAS libref, and caslib is the name of the assigned caslib that you want to associate with the libref.

When you use the CASLIB _ALL_ ASSIGN; statement, the caslib name is used for the libref name. But the libref name does not have to be the same as the caslib name.

For example, the following statement creates a libref named MYCAS that is associated with the CASUSER caslib:

libname MYCAS cas caslib=CASUSER;

Ensure that the caslib is already defined before you associate it with a libref.

If you are not sure whether a caslib is defined, the following code generates a list of the available defined caslibs. The list is generated on the Results tab in SAS Studio:

proc cas;
table.caslibInfo;
quit;

Note that you can also use the CASLIB _ALL_ LIST; statement to view the list of caslibs in the SAS Studio log.

Whether you use the table.caslibInfo action or the CASLIB _ALL_ LIST; statement is a matter of preference for where you want to view the results.

Additional options are supported for the LIBNAME statement and can be found in the CAS LIBNAME engine documentation.

Troubleshooting

Some librefs are not shown

What if you followed the above instructions and see some, but not all, of the librefs for your caslibs in the Libraries tree?

The rules for caslib names are not the same as the rules for librefs, so not all caslib names can be used as libref names.

For example, when you use the CASLIB _ALL_ ASSIGN; statement, you might see a note like the following written to the log:

NOTE: CASLIB SystemData for session CASAUTO will not be mapped to SAS Library SystemData. The CASLIB name is not valid for use as a libref.

This note is produced because the LIBNAME statement attempted to define a libref named SystemData to be associated with the caslib SystemData. However, a libref name is limited to eight characters, so SystemData could not be used as a libref name.

The name that is used for a SAS libref must conform to the rules for a SAS name.

If a caslib name is longer than eight characters, you can use the LIBNAME statement with the CAS engine to associate your caslib with a libref that meets the requirements for a SAS name.

For example, to define a libref for the SystemData caslib, you can use a shorter libref name, such as SYSDATA:

libname sysdata cas caslib=SystemData;

In the following image, the Libraries tree now includes the SYSDATA libref, which enables you to view the CAS tables that are stored in the SystemData caslib.



Another issue that you might encounter is that the name of the caslib is defined as a SAS name literal, such as containing character spaces in the caslib name, and you cannot associate a libref with that caslib. The CAS LIBNAME engine does not support a SAS name literal for a caslib name.

When you want to specify a name for a caslib, keep these SAS name rules in mind.

CAS tables are not shown

What if you defined librefs for each of the caslibs that you want to view in the Libraries tree, but some CAS tables are not displayed for a caslib?

If the name of a CAS table does not comply with the VALIDMEMNAME rules, then the table is not displayed in SAS Studio. SAS Note 63538 provides more information about this issue.

Librefs are not assigned automatically

What if you want to be able to see your caslibs in the Libraries tree in SAS Studio every time that you sign in? For this preference, you can edit your autoexec file in SAS Studio and include the statements to automatically define the librefs for your caslibs.

Follow these steps:

  1. Select Options ► Autoexec file.
  2. In the file, enter each of the statements that you want to run every time that you sign in. Here are examples:
cas;
caslib _all_ assign;

You can also include the LIBNAME statements for your caslibs as well as any other SAS statements that you want to run each time you sign in to SAS Studio.

  1. Click Run to submit the statements.
  2. Open the Log tab to verify that your statements ran without error.
  3. Click Save.

With each subsequent sign-in for SAS Studio, the statements from your autoexec file are run automatically, and you then can view the librefs for your CAS tables in the Libraries tree.

Conclusion

CAS libraries enable you to access your CAS data. By defining the librefs for those caslibs in SAS Studio, you can have more interactive access to the CAS data.

Learn more

How to view CAS tables in SAS® Studio was published on SAS Users.

11月 042021
 

Think about what a modern implementation of SAS looks like for a customer. Programmers rely on robust environments to run the models and programs that answer business questions. These environments can be different for platforms like SAS® 9 and SAS® Viya®. They can be deployed across distributed servers, either on premises or using a cloud provider (sometimes both at the same time). These environments could even be set up across geographic regions for programmers across time zones. And we’re just thinking about the SAS servers—not counting data sources and third-party servers. All of these systems have their own suites of monitoring tools, which only show small slices of the big picture.

Observing all environments

SAS Enterprise Session Monitor aims to be the single point of contact for observing distributed systems. It brings unparalleled visibility to understanding environments using detailed system and application-level metrics for every session of SAS that is launched. This goes beyond traditional monitoring and into observability—aggregating, correlating, and analyzing a steady stream of constant data from systems to effectively troubleshoot or debug environments and sessions. Sessions in this case are those that come from SAS 9, SAS Viya, build servers, testing environments, and more. SAS Enterprise Session Monitor receives that data, displays it live in the tool, and stores the data for historical review in an embedded database.

SAS Enterprise Session Monitor is extensible and customizable: administrators can build patterns using regular expressions to track third-party sessions or custom in-house applications. If a process runs on a Windows or Linux server, SAS Enterprise Session Monitor can be configured to record metrics about it.

What metrics are collected?

SAS Enterprise Session Monitor collects and stores system metrics and logs, many monitoring tools do. Here is where things begin to get interesting, however: SAS Enterprise Session Monitor collects application-level metrics about SAS user sessions. The size of the SASWORK area is monitored and the amount of space in the CAS_DISK_CACHE. Users of SAS Enterprise Session Monitor are able see within DATA and PROC steps as code executes within SAS 9 or SAS Compute Server sessions. SAS Viya users can see the CAS actions that execute within their CAS sessions.

This information is presented in the form of spans which appear on a time-series graph along with session information such as CPU usage, memory usage and disk usage. This user activity is tracked for all user sessions, across all platforms. This code-level analysis can help to understand which SAS Procedures are used, which (and how frequently) datasets are opened, and which users are using the environments at different times.

Grand central admin-station

Administrators use SAS Enterprise Session Monitor to make sure their environments are stable and performant. Historical data can be used to profile workloads, charge back departments or help promote jobs between development, testing and production environments. Critical system resources are tracked to better understand when peak usage time is and to understand where resource constraints occur. This stored historical data can also be used for troubleshooting purposes, and all sessions and jobs can be searched for error events to help in problem analysis. Profiles of scheduled batch jobs can be graphed to see when large numbers of sequential programs could be redesigned to run in parallel. SAS Enterprise Session Monitor knows when distributed workloads should be linked together – in a SAS Grid or MPP CAS deployment.

Lower total cost of ownership

Administrators can use SAS Enterprise Session Monitor to accurately right-size their infrastructure with all the metrics collected — whether that is in the cloud or on premises. Accurate user counts and licensing can be determined for concurrent users in all distributed environments. And with accurate information coming in from distributed environments and multiple nodes, potential problems can be identified, and administrators can accelerate time to resolution and reduce system downtime on production or business-critical systems.

A drag-and-drop interface also allows for workloads from different teams to generate cost allocation rules so that costs can be charged back to departments depending on their usage of system resources. This allows for accurate tracking of cooperative resource sharing.

Empowering development teams

Developers (data scientists, analysts or programmers) use SAS Enterprise Session Monitor in real time to monitor or view progress of their code as it runs. This improves the developer experience and closes the feedback loop as they can see issues before something is promoted to production. Developers can use it to prioritize jobs and have insight into what is happening during their program execution.

This empowers individual programmers, as well as teams of developers: teams can be configured to have access to their other team members’ sessions in SAS Enterprise Session Monitor. Privileged users can also be configured to allow team leads or power users to terminate sessions and view SAS program logs from the SAS Enterprise Session Monitor interface in a secure and audited way.

Other tidbits

I mentioned how SAS Enterprise Session Monitor can analyze batch job flows, visualizing them into graphs that display total runtime and dependencies. Taking this a step further, the batch job flows can be viewed through Relative Comparisons — a feature where two defined time spans can be compared. Simply put, this means that one set of scheduled work can be compared to a previous run. This can give detailed information when evaluating whether to change a program or model, or when performing root-cause analysis of issues that impact the runtime of the scheduled work.

Lastly, developers can use real-time custom chart annotations that show up on the time-series graph. The %esmtag statement generates these annotations and can be used much like %put statements. These can be used as status checkpoints or observation counts, providing feedback in real time as the developer watches the program execute. These annotations are searchable in SAS Enterprise Session Monitor.

Summary

I hope you can feel my excitement about this tool and are able to see a few reasons to check this offering out — the potential for what can be monitored is almost endless. Here’s a quick recap:

    • Enterprise Session Monitor provides visibility into many different types of SAS workloads. Servers and microservices across multiple SAS 9 and SAS Viya environments can be monitored in one place. Even third-party tools and data sources can be monitored with a little customization.

    • Developers use it to close the feedback loop when developing new SAS programs.

    • Administrators use it to solve platform issues—through session management, live data and historical data about SAS processes and system resources.

Additional resources

SAS Enterprise Session Monitor documentation

Configuration and Usage of SAS Enterprise Session Monitor

SAS Enterprise Session Monitor - Obsessing over Observability was published on SAS Users.

10月 282021
 

From articles I've read on the web, it is clear that data is gold in the twenty-first century. Loading, enriching, manipulating and analyzing data is something in which SAS excels. Based on questions from colleagues and customers, it is clear end-users are willing to display data handled by SAS outside of the user interfaces bundled with the SAS software.

I recently completed a series of articles on the SAS Community library where I shed some light on different techniques for feeding web applications with SAS data stored in SAS Viya environment.  The series includes a discussion of options for extracting data, building a React application, how to build web applications using SAS Viya, SAS Cloud Analytic Service (CAS), SAS Compute Server, and SAS Micro Analytic Service (MAS).

I demonstrate the functionality and discuss project details in the video Develop Web Application to Extract SAS Data, found on the SAS Users YouTube Channel.

I'm tying everything together in this post as a reference point. I'll provide a link to each article along with a brief description. The Community articles have all the detailed steps for developing the application. I'm excited bring you this information, so let's get started.

Part 1 - Develop web applications series: Options for extracting data

In this first article, I explain when to use SAS Micro Analytic Service, SAS Viya Jobs, SAS Cloud Analytic Service, and SAS Compute Server.

Part 2 - Develop web applications series: Creating the React based application

To demonstrate the different options, in the second article, I create a simple web application using React JavaScript library. The application also handles authentication against SAS Viya. The application is structured in such a way to avoid redundant code and each component has a well-defined role. From here, we can build the different pages to access CAS, MAS, Compute Server or SAS Viya Jobs.

The image below offers a view of the application which starts in Part 2 and continues throughout the series..

Part 3 - Develop web applications series: Build a web application using SAS Viya Jobs

In this article, I drive you through the steps to retrieve data from the SAS environment using SAS Viya Jobs. We build out the Jobs tab and on the page, display two dropdown boxes to select a library and table. The final piece is a submit button to retrieve the data to populate a table.

Part 4 - Develop web applications series: Build a web application using SAS Cloud Analytic Service

In article number 4, we go through the steps to build a page similar to the one in the previous article, but this time the data comes directly from the SAS Cloud Analytic Service (CAS). We reuse the application structure which was created in Part 2. We focus on the CAS tab. As for the SAS Viya Jobs, we display two dropdown boxes to select a library and table. We finish again with a submit button to retrieve the data to populate a table.

Part 5 - Develop web applications series: Build a web application using SAS Compute Server

In the next article, we go through the steps to build a page similar to the ones from previous articles, but this time the data comes directly from the SAS Compute Server. We reuse the application structure created in this Part 2 article. The remainder of the article focuses on the Compute tab. As for the CAS content, we display two dropdown boxes to select a library and table. Finishing off again with the submit button to retrieve the data to populate a table.

Part 6 - Develop web applications series: Build a web application using SAS Micro Analytic Service

For the final article, you discover how to build a page to access data from the SAS Micro Analytic Service. We reuse the same basic web application built in Part 2. However, this time it will require a bit more preparation work as the SAS Micro Analytic Service (MAS) is designed for model scoring.

Bonus Material - SAS Authentication for ReactJS based applications

In this addendum to the series, I outline the authorization code OAuth flow. This is the recommended means of authenticating to SAS Viya and I provide technical background and detailed code.

Conclusion

If you followed along with the different articles in this series, you should now have a fully functional web application for accessing different data source types from SAS Viya. This application is not for use as-is in production. You should, for example add functionality to handle token expiration. You can of course tweak the interface to get the look and feel you prefer.

See all of my SAS Communities articles here.

Creating a React web app using SAS Viya was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

9月 282021
 

SQL is an important language for any programmer working with data. In SAS Cloud Analytic Services (CAS) you can execute SQL queries using the fedSQL.execDirect CAS action!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the

proc cas;
    fedSQL.execDirect / 
        query="select Make,
                      Model,
                      MSRP,
                      mean(MPG_City,MPG_Highway) as MPG_Avg
               from casuser.cars
               where Make='Toyota'
               order by MPG_Avg desc";
quit;

And the results:

The execDirect action executes the query in the distributed CAS environment and returns the expected results.

Using a SOURCE block

While the query we executed was simple, this is not always the case. Adding a complicated query as a string can make writing the query difficult.

Instead of specifying the query as a string in the CAS procedure, use the SOURCE statement to embed text in a variable. In the following example, I'll execute the same query as before. However, this time I'll nest the query inside a SOURCE block by specifying the SOURCE statement and name the variable MPG_toyota. Then I'll add the query inside the SOURCE block and use the ENDSOURCE statement to end the block.

proc cas;
    source MPG_toyota;
        select Make,
               Model,
               MSRP,
               mean(MPG_City,MPG_Highway) as MPG_Avg
        from casuser.cars
        where Make='Toyota'
        order by MPG_Avg desc;
    endsource;
 
    fedSQL.execDirect / query=MPG_toyota;
quit;

After the SOURCE block is complete, you can reference the variable as the value to the query parameter in the execDirect action.

And the results:

The results returned are the same, but using a SOURCE block makes the code easier to write and maintain.

Summary

In SAS Viya, FedSQL provides a scalable, threaded, high-performance way to query data and create new CAS tables from existing tables in the CAS server. In this example we saw two distinct ways to run SQL code on SAS Viya. This is only the beginning. See the resources below for more details on PROC FedSQL.

Resources

SAS® Viya®: FedSQL Programming for SAS® Cloud Analytic Services
FEDSQL Procedure
execDirect CAS Action
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing SQL in SAS Viya was published on SAS Users.

9月 232021
 

In my previous post CAS-Action! Simply Distinct - Part 1 I reviewed using the simple.distinct CAS action to explore distinct and missing values in a distributed CAS table.

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

And now, back to the distinct action. What if we want to do more? Maybe you want to create a CSV file that documents the percentage of distinct values in each column? Let's explore some possibilities.

To complete the task I'll break it down into four steps.

Step 1 - Find the number of rows in the CAS table

To find the number of rows in a CAS table use the simple.numRows CAS action. Let's execute the numRows action and store the results in a variable. I'll also PRINT and DESCRIBE the results to take a closer look at the output.

proc cas;
    simple.numRows result=n / table={name="cars",caslib="casuser"};
    describe n;
    print n;
...

And the results:
Output of the simple.numRows CAS action

The results of the DESCRIBE statement show the output of the action is a dictionary with a key named numRows and an integer as the value. The PRINT statement shows the value of the dictionary, numRows=428.

Now that we have the total number of rows, we can use that number in our calculation.

Step 2 - Find the number of distinct values in each column

Next, let's execute the distinct action and store the results in a variable named d. Then execute the PRINT statement to confirm the results.

...
    simple.distinct result=d /
            table={name="cars",caslib="casuser"};
    print d;
...

And the resluts:
Results of the simple.distinct CAS action
The results of the distinct action are as expected. Each column with the number of distinct values.

Step 3 - Create a calculated column that computes the percentage of distinct values

Now that we have the number of distinct values in each column, and the total number of rows in the CARS CAS table, we can calculate the total percent of distinct values in each column.

Consider the code:

...
    pctDistinct=d.Distinct.compute({"PctDistinct","Percent Distinct",percent7.2}, nDistinct/n.numRows)
                          [ , {"Column","NDistinct","PctDistinct"} ];
    print pctDistinct;
...

To add a calculated column to a result table use the compute operator. In the first argument, specify column metadata inside an array (column name, label and format). In the second argument, specify the expression. My expression nDistinct/n.numRows divides the distinct values in each column by the total number of rows in the CARS table.

After the compute operator, select specific rows and columns from the result table using bracket notation. Here I'll select all rows, and only the columns Column, nDistinct and PctDistinct.

Lastly, I used the PRINT statement to confirm the results.

New calculated result table

In the output we can see the new result table with the computed column.

Step 4 - Save the results table as a CSV file

Lastly, let's put it all together!

I'll add the code from the pervious steps, then save the table as a CSV file using the SAVERESULTS statement with the CSV= option.

%let outpath=/*specify output file location*/;
 
proc cas;
* Specify the CAS table *;
    casTbl={name="cars", caslib="casuser"};
 
* Store the number of rows in the CAS table *;
    simple.numRows result=n / table=casTbl;
 
* Store the number of distinct values in each column *;
    simple.distinct result=d / table=casTbl;
 
* Calculate the percentage of distinct values in each column *;
    pctDistinct=d.Distinct.compute({"PctDistinct","Percent Distinct",percent7.2}, nDistinct/n.numRows)
                          [ , {"Column","NDistinct","PctDistinct"} ];
 
* Save the result table as a CSV file *;
    saveresult pctDistinct csv="&outpath/pctDistinctCars.csv";
quit;

In the CSV= option I specified the outpath macro variable that contains the location of output folder, and add the name of the CSV file.

After executing the code the log indicates everything ran successfully, and a CSV file was created in the specified location. Next I'll find and open the CSV file.

My CSV file is located in my outfiles folder:

Then double click on the file to open it in SAS Studio:

Summary

The distinct action is a flexible and easy way to explore your data. It allows you to quickly explore your distributed CAS tables, then process and save the results in a variety of formats to fit your needs.

Additional Resources

distinct CAS action
CAS-Action! Simply Distinct - Part 1
CASL Result Tables
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Simply Distinct - Part 2 was published on SAS Users.

9月 152021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dive into exploring your data by viewing the number of distinct and missing values that exist in each column using the simple.distinct CAS action.

In this example, I will use the CAS procedure to execute the distinct action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax from other languages.

Determine the Number of Distinct and Missing Values in a CAS Table

To begin, let's use the simple.distinct CAS action on the CARS in-memory table to view the action's default behavior.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"};
quit;

In the preceeding code, I specify the CAS procedure, the action, then reference the in-memory table. The results of the call are displayed below.

The results allow us to quickly explore the CAS table and see the number of distinct and missing values. That's great, but what if you only want to see specific columns?

Specify the Columns in the Distinct Action

Sometimes your CAS tables contain hundreds of columns, but you are only interested in a select few. With the distinct action, you can specify a subset of columns using the inputs parameter. Here I'll specify the Make, Origin and Type columns.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        inputs={"Make","Origin","Type"};
quit;

After executing the code the results return the information for only the Make, Origin and Type columns.

Next, let's explore what we can do with the results.

Create a CAS Table with the Results

Some actions allow you to create a CAS table with the results. You might want to do this for a variety of reasons like use the new CAS table in a SAS Visual Analytics dashboard or in a data visualization procedure like SGPLOT.

To create a CAS table with the distinct action result, add the casOut parameter and specify new CAS table information, like name and caslib.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        casOut={name="distinctCars", caslib="casuser"};
quit;

After executing the code, the action returns information about the name and caslib of the new CAS table, and the number of rows and columns.

Visualize the Number of Distinct Values in Every Column

Lastly, what if you want to create a data visualization to better explore the table? Maybe you want to visualize the number of distinct values for each column? This task can be accomplished with variety of methods. However, since I know my newly created distinctCars CAS table has only 15 rows, I'll reference the CAS table directly using SGPLOT procedure.

This method works as long as the LIBNAME statement references your caslib correctly. I recommend this method when you know the CAS table is a manageable size. This is important because the CAS server does not execute the SGPLOT procedure on a distributed CAS table. The CAS server instead transfers the entire CAS table back to the client for processing.

To begin, the following LIBNAME statement will reference the casuser caslib.

libname casuser cas caslib="casuser";

Once the LIBNAME statement is correct, all you need to do is specify the CAS table in the DATA option of the SGPLOT procedure.

title justify=left height=14pt "Number of Distinct Values for Each Column in the CARS Table";
proc sgplot data=casuser.distinctCars
            noborder nowall;
    vbar Column / 
        response=NDistinct
        categoryorder=respdesc
        nooutline
        fillattrs=(color=cx0379cd);
    yaxis display=(NOLABEL);
    xaxis display=(NOLABEL);
quit;

The results show a bar chart with the number of distinct values for each column.

Summary

The simple.distinct CAS action is an easy way to explore a distributed CAS table. With one simple action, you can easily see how many distinct values are in each column, and the number of missing rows!

In Part 2 of this post, I'll further explore the simple.distinct CAS action and offer more ideas on how to interpret and use the results.

Additional Resources

distinct CAS action
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code

CAS-Action! Simply Distinct - Part 1 was published on SAS Users.