Developers

1月 072022
 

Welcome to the sixth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to generate simple descriptive statistics of a CAS table.

Let's begin by confirming the cars table is loaded into memory. With a connection to CAS established, execute the tableInfo action to view available in-memory tables. If necessary, you can execute the following code in SAS Studio to load the sashelp.cars table into memory.

conn.tableinfo(caslib="casuser")

The results show the cars table is loaded into memory and available for processing. Next, reference the cars table in the variable tbl. Then use the print function to show the value of the variable.

tbl = conn.CASTable('cars', caslib='casuser')
print(tbl)
CASTable('cars', caslib='casuser')

The results show that the tbl variable references the cars table in the CAS server.

Preview the CAS Table

First things first. Remember, the SWAT package blends the world of Pandas and CAS into one. So you can begin with the traditional head method to preview the CAS table.

tbl.head()

The SWAT head method returns five rows from the CAS server to the client as expected.

The Describe Method

Next, let's retrieve descriptive statistics of all numeric columns by using the familiar describe method on the CAS table.

tbl.describe()

The SWAT describe method returns the same descriptive statistics as the Pandas describe method. The only difference is that the SWAT version uses the CAS API to convert the describe method into CAS actions behind the scenes to process the data on the distributed CAS server. CAS processes the data and returns summarized results back to the client as a SASDataFrame, which is a subclass of the Pandas DataFrame. You can now work with the results as you would a Pandas DataFrame.

Summary CAS Action

Instead of using the familiar describe method, let's use a CAS action to do something similar. Here I'll use the summary CAS action.

tbl.summary()

Summary CAS Action

The results of the summary action return a CASResults object (Python dictionary) to the client. The CASResults object contains a single key named Summary with a SASDataFrame as the value. The SASDataFrame shows a variety of descriptive statistics.  While the summary action does not return exactly the same statistics as the describe method, it can provide additional insights into your data.

What if we don't want all the statistics for all of the data?

Selecting Columns and Summary Statistics with the Summary Action

Let's add additional parameters to the summary action. I'll add the inputs parameter to specify the columns to analyze in the CAS server.

tbl.summary(inputs = ['MPG_City','MPG_Highway'])

The results show only the MPG_City and MPG_Highway columns were analyzed.

Next, I'll use the subSet parameter to specify the summary statistics to produce. Here I'll obtain the MEAN, MIN and MAX.

tbl.summary(inputs = ['MPG_City','MPG_Highway'],
                       subSet = ['mean','min','max'])

The results processed only the MPG_City and MPG_Highway columns, and returned only the specified summary statistics to the client.

Creating a Calculated Column

Lastly, let's create a calculated column within the summary action. There are a variety of ways to do this. I like to add it as a parameter to the CASTable object. You can do that by specifying the tbl object, then computedVarsProgram parameter. Within computedVarsProgram you can use SAS assignment statements with most SAS functions. Here we will create a new column name MPG_Avg that takes the mean of MPG_City and MPG_Highway. Lastly, add the new column to the inputs parameter.

tbl.computedVarsProgram = 'MPG_Avg = mean(MPG_City, MPG_Highway);'
tbl.summary(inputs = ['MPG_City','MPG_Highway', 'MPG_Avg'],
                       subSet = ['mean','min','max'])

In the results I see the calculated column and requested summary statistics.

Summary

The SWAT package blends the world of Pandas and CAS. You can use many of the familiar Pandas methods within the SWAT package, or the flexible, highly optimized CAS actions like summary to easily obtain summary statistics of your data in the massively parallel processing CAS engine.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Index
SWAT API Reference
CAS Action Documentation
SAS® Cloud Analytic Services: Fundamentals
SAS Scripting Wrapper for Analytics Transfer (SWAT)
CAS Action! - a series on fundamentals
Execute the following code in SAS Studio to load the sashelp.cars table into memory

Getting Started with Python Integration to SAS® Viya® - Part 6 - Descriptive Statistics was published on SAS Users.

11月 042021
 

Think about what a modern implementation of SAS looks like for a customer. Programmers rely on robust environments to run the models and programs that answer business questions. These environments can be different for platforms like SAS® 9 and SAS® Viya®. They can be deployed across distributed servers, either on premises or using a cloud provider (sometimes both at the same time). These environments could even be set up across geographic regions for programmers across time zones. And we’re just thinking about the SAS servers—not counting data sources and third-party servers. All of these systems have their own suites of monitoring tools, which only show small slices of the big picture.

Observing all environments

SAS Enterprise Session Monitor aims to be the single point of contact for observing distributed systems. It brings unparalleled visibility to understanding environments using detailed system and application-level metrics for every session of SAS that is launched. This goes beyond traditional monitoring and into observability—aggregating, correlating, and analyzing a steady stream of constant data from systems to effectively troubleshoot or debug environments and sessions. Sessions in this case are those that come from SAS 9, SAS Viya, build servers, testing environments, and more. SAS Enterprise Session Monitor receives that data, displays it live in the tool, and stores the data for historical review in an embedded database.

SAS Enterprise Session Monitor is extensible and customizable: administrators can build patterns using regular expressions to track third-party sessions or custom in-house applications. If a process runs on a Windows or Linux server, SAS Enterprise Session Monitor can be configured to record metrics about it.

What metrics are collected?

SAS Enterprise Session Monitor collects and stores system metrics and logs, many monitoring tools do. Here is where things begin to get interesting, however: SAS Enterprise Session Monitor collects application-level metrics about SAS user sessions. The size of the SASWORK area is monitored and the amount of space in the CAS_DISK_CACHE. Users of SAS Enterprise Session Monitor are able see within DATA and PROC steps as code executes within SAS 9 or SAS Compute Server sessions. SAS Viya users can see the CAS actions that execute within their CAS sessions.

This information is presented in the form of spans which appear on a time-series graph along with session information such as CPU usage, memory usage and disk usage. This user activity is tracked for all user sessions, across all platforms. This code-level analysis can help to understand which SAS Procedures are used, which (and how frequently) datasets are opened, and which users are using the environments at different times.

Grand central admin-station

Administrators use SAS Enterprise Session Monitor to make sure their environments are stable and performant. Historical data can be used to profile workloads, charge back departments or help promote jobs between development, testing and production environments. Critical system resources are tracked to better understand when peak usage time is and to understand where resource constraints occur. This stored historical data can also be used for troubleshooting purposes, and all sessions and jobs can be searched for error events to help in problem analysis. Profiles of scheduled batch jobs can be graphed to see when large numbers of sequential programs could be redesigned to run in parallel. SAS Enterprise Session Monitor knows when distributed workloads should be linked together – in a SAS Grid or MPP CAS deployment.

Lower total cost of ownership

Administrators can use SAS Enterprise Session Monitor to accurately right-size their infrastructure with all the metrics collected — whether that is in the cloud or on premises. Accurate user counts and licensing can be determined for concurrent users in all distributed environments. And with accurate information coming in from distributed environments and multiple nodes, potential problems can be identified, and administrators can accelerate time to resolution and reduce system downtime on production or business-critical systems.

A drag-and-drop interface also allows for workloads from different teams to generate cost allocation rules so that costs can be charged back to departments depending on their usage of system resources. This allows for accurate tracking of cooperative resource sharing.

Empowering development teams

Developers (data scientists, analysts or programmers) use SAS Enterprise Session Monitor in real time to monitor or view progress of their code as it runs. This improves the developer experience and closes the feedback loop as they can see issues before something is promoted to production. Developers can use it to prioritize jobs and have insight into what is happening during their program execution.

This empowers individual programmers, as well as teams of developers: teams can be configured to have access to their other team members’ sessions in SAS Enterprise Session Monitor. Privileged users can also be configured to allow team leads or power users to terminate sessions and view SAS program logs from the SAS Enterprise Session Monitor interface in a secure and audited way.

Other tidbits

I mentioned how SAS Enterprise Session Monitor can analyze batch job flows, visualizing them into graphs that display total runtime and dependencies. Taking this a step further, the batch job flows can be viewed through Relative Comparisons — a feature where two defined time spans can be compared. Simply put, this means that one set of scheduled work can be compared to a previous run. This can give detailed information when evaluating whether to change a program or model, or when performing root-cause analysis of issues that impact the runtime of the scheduled work.

Lastly, developers can use real-time custom chart annotations that show up on the time-series graph. The %esmtag statement generates these annotations and can be used much like %put statements. These can be used as status checkpoints or observation counts, providing feedback in real time as the developer watches the program execute. These annotations are searchable in SAS Enterprise Session Monitor.

Summary

I hope you can feel my excitement about this tool and are able to see a few reasons to check this offering out — the potential for what can be monitored is almost endless. Here’s a quick recap:

    • Enterprise Session Monitor provides visibility into many different types of SAS workloads. Servers and microservices across multiple SAS 9 and SAS Viya environments can be monitored in one place. Even third-party tools and data sources can be monitored with a little customization.

    • Developers use it to close the feedback loop when developing new SAS programs.

    • Administrators use it to solve platform issues—through session management, live data and historical data about SAS processes and system resources.

Additional resources

SAS Enterprise Session Monitor documentation

Configuration and Usage of SAS Enterprise Session Monitor

SAS Enterprise Session Monitor - Obsessing over Observability was published on SAS Users.

10月 282021
 

From articles I've read on the web, it is clear that data is gold in the twenty-first century. Loading, enriching, manipulating and analyzing data is something in which SAS excels. Based on questions from colleagues and customers, it is clear end-users are willing to display data handled by SAS outside of the user interfaces bundled with the SAS software.

I recently completed a series of articles on the SAS Community library where I shed some light on different techniques for feeding web applications with SAS data stored in SAS Viya environment.  The series includes a discussion of options for extracting data, building a React application, how to build web applications using SAS Viya, SAS Cloud Analytic Service (CAS), SAS Compute Server, and SAS Micro Analytic Service (MAS).

I demonstrate the functionality and discuss project details in the video Develop Web Application to Extract SAS Data, found on the SAS Users YouTube Channel.

I'm tying everything together in this post as a reference point. I'll provide a link to each article along with a brief description. The Community articles have all the detailed steps for developing the application. I'm excited bring you this information, so let's get started.

Part 1 - Develop web applications series: Options for extracting data

In this first article, I explain when to use SAS Micro Analytic Service, SAS Viya Jobs, SAS Cloud Analytic Service, and SAS Compute Server.

Part 2 - Develop web applications series: Creating the React based application

To demonstrate the different options, in the second article, I create a simple web application using React JavaScript library. The application also handles authentication against SAS Viya. The application is structured in such a way to avoid redundant code and each component has a well-defined role. From here, we can build the different pages to access CAS, MAS, Compute Server or SAS Viya Jobs.

The image below offers a view of the application which starts in Part 2 and continues throughout the series..

Part 3 - Develop web applications series: Build a web application using SAS Viya Jobs

In this article, I drive you through the steps to retrieve data from the SAS environment using SAS Viya Jobs. We build out the Jobs tab and on the page, display two dropdown boxes to select a library and table. The final piece is a submit button to retrieve the data to populate a table.

Part 4 - Develop web applications series: Build a web application using SAS Cloud Analytic Service

In article number 4, we go through the steps to build a page similar to the one in the previous article, but this time the data comes directly from the SAS Cloud Analytic Service (CAS). We reuse the application structure which was created in Part 2. We focus on the CAS tab. As for the SAS Viya Jobs, we display two dropdown boxes to select a library and table. We finish again with a submit button to retrieve the data to populate a table.

Part 5 - Develop web applications series: Build a web application using SAS Compute Server

In the next article, we go through the steps to build a page similar to the ones from previous articles, but this time the data comes directly from the SAS Compute Server. We reuse the application structure created in this Part 2 article. The remainder of the article focuses on the Compute tab. As for the CAS content, we display two dropdown boxes to select a library and table. Finishing off again with the submit button to retrieve the data to populate a table.

Part 6 - Develop web applications series: Build a web application using SAS Micro Analytic Service

For the final article, you discover how to build a page to access data from the SAS Micro Analytic Service. We reuse the same basic web application built in Part 2. However, this time it will require a bit more preparation work as the SAS Micro Analytic Service (MAS) is designed for model scoring.

Bonus Material - SAS Authentication for ReactJS based applications

In this addendum to the series, I outline the authorization code OAuth flow. This is the recommended means of authenticating to SAS Viya and I provide technical background and detailed code.

Conclusion

If you followed along with the different articles in this series, you should now have a fully functional web application for accessing different data source types from SAS Viya. This application is not for use as-is in production. You should, for example add functionality to handle token expiration. You can of course tweak the interface to get the look and feel you prefer.

See all of my SAS Communities articles here.

Creating a React web app using SAS Viya was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

8月 072021
 

I'm excited to curate a series of posts focused on CAS Actions. However, before I dive into details and code, I thought I'd take a moment to lay a foundation for a better understanding of CAS and CAS actions. I'll cover things here at a high level as there's quite a bit of information out there already. Please see the Additional Resources section at the end of this article for more details.

CAS actions are highly optimized units of work for SAS Viya's Cloud Analytics Services (CAS) distributed computing engine. Think of CAS actions as powerful functions or methods created specifically to process data in the CAS server. Actions can load data, manage tables, perform general analytics, statistics, and machine learning, as well as execute DATA step, FedSQL, DS2 and more. Did I mention they can do a lot?

Before we dive into details about actions, let's quickly review the CAS server in SAS Viya.

CAS Server

First, the CAS Server. It provides cloud-based, run-time environment for data management and analytics in SAS Viya. You process data in all stages of the analytic life cycle using the power of distributing computing. To process data in the CAS server, you must first load data into memory from a data source.

Data Sources

Data sources connect to the CAS server through caslibs. Caslibs can access a wide variety of data sources including relational databases, streaming data, SAS data sets, unstructured data, and familiar file formats, such as XML, JSON, CSV, and XLSX.

CAS Actions

Once you have data available in the CAS server you can begin to process it using CAS actions. Actions are the native language of the CAS server. All actions are aggregated with other actions in action sets. You can think of an action set as a package, and the actions inside an action set as methods. There are dozens of CAS action sets and hundreds of CAS actions available.

Executing Actions

There are a variety of interfaces available to execute CAS actions. One method is using the native CAS Language (CASL). CASL is a statement based scripting language. With CASL you can use general programming logic to execute actions, work with the results, and develop analytic pipelines. You can also execute actions through the CAS API using languages like SAS, FedSQL, Python, R, Java, Lua and REST.

Using Familiar Language Methods

One major benefit of CAS actions is the CAS API allows you to execute familiar syntax in your language of choice. When using familiar methods, the CAS API will convert them into actions behind the scenes.

For example, to retrieve the first ten rows of a CAS table you can use the SAS PRINT procedure, the Python head method, or the R head function. All of these methods will convert to the fetch action behind the scenes through the CAS API. Many familiar procedures, methods and functions are available. See the SAS Viya Programming for Developers documentation for more details about specific languages.

NOTE: THE SWAT package is required to use Python and R with the CAS server.

Executing Actions in a Variety of Languages

Another benefit of CAS actions is that you can directly execute actions from a variety of languages. This allows for all types of programmers to work together using common actions. For example, you can execute the same fetch action in SAS, R and Python.

NOTE: THE SWAT package is required to use Python and R with the CAS server.

SAS Viya Applications

Lastly, SAS Viya has a many applications that can work with data in the CAS server, These applications provide a variety of functionality. From point and click to programming, executing actions behind the scenes.

Summary

Hopefully, I've provide enough of a base for CAS, CAS actions, and CAS action sets for you to get started. Here's what you need to know to move forward:

  • SAS Viya's CAS server is a powerful distributed comping engine that processes big data fast.
  • CAS actions are the native language of the CAS server.
  • CAS actions can be executed in a variety of languages like CASL, SAS, Python, R, Lua, Java and REST.
  • SAS Viya has additional applications to work with CAS data.

Additional Resources

SAS® Cloud Analytic Services: Fundamentals
The Architecture of the SAS® Cloud Analytic Services in SAS® Viya™
What is CASL?
SAS Viya Programming examples on GitHub
Using SAS Cloud Analytics Service REST APIs to run CAS Actions
SAS Developers Portal
SAS Developers Community

CAS Actions and Action Sets - a brief intro was published on SAS Users.

8月 072021
 

I'm excited to bring a series of posts centered on CAS Actions. The topics in the list below will be covered in posts set to publish in the next few weeks. Please check back often for new releases.

CAS Actions and Action Sets - a brief intro
CAS-Action! fetch CAS, fetch! - Part 1
CAS-Action! fetch CAS, fetch! - Part 2
CAS-Action! Show me the ColumnInfo!
CAS-Action! Simply Distinct - Part 1
CAS-Action! Simply Distinct - Part 2
CAS-Action! executing SQL in CAS

CAS Action! - a series on fundamentals was published on SAS Users.

6月 232021
 

Welcome to the fifth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, how to work with the results, and how your data is organized on the CAS server. In this post I'll discuss loading files into the CAS server. Before proceeding, I need to clear up some terminology. There are two sources for loading files into memory, server-side and client-side. It's important to understand the difference.

Server-Side vs Client-Side

Let's start off with an image depicting server- and client-side file locations.

Server-side files are mapped to a caslib. That is, files in the data source portion of a caslib are server-side. For more infomation on data sources, see Part 4 of this series. Client-side files are files that are not mapped to a caslib. These could be files stored in a SAS library or other local files. That's about it!

In this post, I'll focus on loading server-side files into memory (Part 6 of the series will cover client-side files). In Python, we have two data loading options, the loadTable CAS action, or the load_path method. The techniques are almost identical. The difference is the loadTable action returns a dictionary of information and the load_path method returns a CASTable. If you use the loadTable action you must reference a new table in a separate step. In general, I prefer the actions as they are available in any language such as CASL, R, and Python. Conversely, methods like load_path are specific to a language.

Let's look at a few examples of loading different types of datafiles into CAS.

Loading a SAS Data Set into Memory

In the first example, I want to load the cars.sas7bdat SAS data set from the casuser caslib into memory. Let's confirm the table exists in the caslib by executing the fileInfo action. I made my connection to CAS, naming it conn. For more information about making a connection to CAS, visit Part 1 of the series.

conn.fileInfo(caslib="casuser")

In the results I can see that the cars.sas7bdat file is available in the casuser caslib.

Next, I'll use the loadTable action. First I'll specify the path parameter with the full file name, then the caslib parameter specifying the input caslib, and lastly the casOut parameter to specify the new CAS table. In the casOut parameter you specify a dictionary of key value pairs. Here, I want to name the new table cars_sas and place the table in the casuser caslib.

conn.loadTable(path="cars.sas7bdat", caslib="casuser",
               casOut={
                       "name":"cars_sas",
                       "caslib":"casuser"
               })

After executing the loadTable action, the result returns a dictionary of information about the new table.

Now that the file is loaded into memory, I can process the data by making a reference to the in-memory table using the CASTable method. Then I'll use the head method to view the first 5 rows.

cars = conn.CASTable("cars_sas",caslib="casuser")
cars.head()

Loading a CSV file into Memory

In the next example, I'll load a CSV file into memory. Even though it's a different file type, we still use the loadTable action. The only change needed is the file name in the path parameter. That's it!

conn.loadTable(path="cars.csv", caslib="casuser",
           casOut={
                "name":"cars_csv",
                "caslib":"casuser"
           })
 
csv = conn.CASTable("cars_csv", caslib="casuser")
csv.head()

 

Loading a Text file into Memory

Lastly, data is not always stored in the correct format. Sometimes when loading files you need to modify the default parameters. In this final scenario, let's explore an example using the importOptions parameter. importOptions takes a variety of key value pairs as the values to dictate how to load the file. The options vary depending on the file type.

The file in this example contains no column names, and the delimiter is a vertical bar (or pipe).

I'll begin with specifying the column names in a variable named colNames. Then, in the loadTable action I'll add the importOptions parameter and specify the following:

  • file type I want to import, using the fileType subparameter,
  • the delimiter, using the delimiter subparameter,
  • the getNames subparameter with the value False, since no column names exist,
  • and lastly, the vars subparameter with the names of the columns.

And as in the previous examples, I'll print out the first five lines of the resulting table from memory.

colNames=['Make', 'Model', 'Type', 'Origin', 'DriveTrain', 'MSRP', 'Invoice',
          'EngineSize', 'Cylinders', 'Horsepower', 'MPG_City', 'MPG_Highway',
          'Weight', 'Wheelbase', 'Length']
 
conn.loadTable(path="cars_delim_bar.txt", caslib="casuser",
           casOut={
                "name":"cars_text",
                "caslib":"casuser"
           },
           importOptions={"fileType":"DELIMITED",
                          "delimiter":"|",
                          "getNames":False,
                          "vars":colNames}
          )
 
csv = conn.CASTable("cars_text", caslib="casuser")
csv.head()

Summary

The loadTable action is flexible and an easy way to load tables into memory. Benefits of the loadTable action include:

  • it's filetype agnostic
  • has many options available based on the file types being imported into memory
  • available in any CAS-compatible language like Python, R and CASL with slight changes to the syntax based on the language used

 

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 5 - Loading Server-Side Files into Memory was published on SAS Users.

6月 232021
 

Welcome to the fifth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, how to work with the results, and how your data is organized on the CAS server. In this post I'll discuss loading files into the CAS server. Before proceeding, I need to clear up some terminology. There are two sources for loading files into memory, server-side and client-side. It's important to understand the difference.

Server-Side vs Client-Side

Let's start off with an image depicting server- and client-side file locations.

Server-side files are mapped to a caslib. That is, files in the data source portion of a caslib are server-side. For more infomation on data sources, see Part 4 of this series. Client-side files are files that are not mapped to a caslib. These could be files stored in a SAS library or other local files. That's about it!

In this post, I'll focus on loading server-side files into memory (Part 6 of the series will cover client-side files). In Python, we have two data loading options, the loadTable CAS action, or the load_path method. The techniques are almost identical. The difference is the loadTable action returns a dictionary of information and the load_path method returns a CASTable. If you use the loadTable action you must reference a new table in a separate step. In general, I prefer the actions as they are available in any language such as CASL, R, and Python. Conversely, methods like load_path are specific to a language.

Let's look at a few examples of loading different types of datafiles into CAS.

Loading a SAS Data Set into Memory

In the first example, I want to load the cars.sas7bdat SAS data set from the casuser caslib into memory. Let's confirm the table exists in the caslib by executing the fileInfo action. I made my connection to CAS, naming it conn. For more information about making a connection to CAS, visit Part 1 of the series.

conn.fileInfo(caslib="casuser")

In the results I can see that the cars.sas7bdat file is available in the casuser caslib.

Next, I'll use the loadTable action. First I'll specify the path parameter with the full file name, then the caslib parameter specifying the input caslib, and lastly the casOut parameter to specify the new CAS table. In the casOut parameter you specify a dictionary of key value pairs. Here, I want to name the new table cars_sas and place the table in the casuser caslib.

conn.loadTable(path="cars.sas7bdat", caslib="casuser",
               casOut={
                       "name":"cars_sas",
                       "caslib":"casuser"
               })

After executing the loadTable action, the result returns a dictionary of information about the new table.

Now that the file is loaded into memory, I can process the data by making a reference to the in-memory table using the CASTable method. Then I'll use the head method to view the first 5 rows.

cars = conn.CASTable("cars_sas",caslib="casuser")
cars.head()

Loading a CSV file into Memory

In the next example, I'll load a CSV file into memory. Even though it's a different file type, we still use the loadTable action. The only change needed is the file name in the path parameter. That's it!

conn.loadTable(path="cars.csv", caslib="casuser",
           casOut={
                "name":"cars_csv",
                "caslib":"casuser"
           })
 
csv = conn.CASTable("cars_csv", caslib="casuser")
csv.head()

 

Loading a Text file into Memory

Lastly, data is not always stored in the correct format. Sometimes when loading files you need to modify the default parameters. In this final scenario, let's explore an example using the importOptions parameter. importOptions takes a variety of key value pairs as the values to dictate how to load the file. The options vary depending on the file type.

The file in this example contains no column names, and the delimiter is a vertical bar (or pipe).

I'll begin with specifying the column names in a variable named colNames. Then, in the loadTable action I'll add the importOptions parameter and specify the following:

  • file type I want to import, using the fileType subparameter,
  • the delimiter, using the delimiter subparameter,
  • the getNames subparameter with the value False, since no column names exist,
  • and lastly, the vars subparameter with the names of the columns.

And as in the previous examples, I'll print out the first five lines of the resulting table from memory.

colNames=['Make', 'Model', 'Type', 'Origin', 'DriveTrain', 'MSRP', 'Invoice',
          'EngineSize', 'Cylinders', 'Horsepower', 'MPG_City', 'MPG_Highway',
          'Weight', 'Wheelbase', 'Length']
 
conn.loadTable(path="cars_delim_bar.txt", caslib="casuser",
           casOut={
                "name":"cars_text",
                "caslib":"casuser"
           },
           importOptions={"fileType":"DELIMITED",
                          "delimiter":"|",
                          "getNames":False,
                          "vars":colNames}
          )
 
csv = conn.CASTable("cars_text", caslib="casuser")
csv.head()

Summary

The loadTable action is flexible and an easy way to load tables into memory. Benefits of the loadTable action include:

  • it's filetype agnostic
  • has many options available based on the file types being imported into memory
  • available in any CAS-compatible language like Python, R and CASL with slight changes to the syntax based on the language used

 

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 5 - Loading Server-Side Files into Memory was published on SAS Users.

6月 182021
 

Welcome to the fourth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, and how to work with the results. Now it's time to understand how your data is organized on the CAS server. To understand how the data is catalogued you must understand caslibs.

So the big question is, "What exactly is a caslib?"

Caslib Overview

Let's start with an image as an introduction to caslibs.

A caslib contains connection information about a data source and an in-memory space to process data. A caslib also contains session information and a variety of access controls for users and scope. In this post I'll focus on exploring the in-memory and data source portion of a caslib.

  • The data source portion holds information about the caslib such as the path that holds a variety of data files like xlsx, sas7bdat, csv, txt, or sashdat. It can also represent a database connection.
  • The in-memory portion contains data source files loaded into memory and available for distributed processing as CAS tables.

For detailed information about caslibs, visit the SAS documentation.

Exploring Caslibs

Let's look at an example. You can follow along by logging into SAS Viya and making a connection to the CAS server with your Python client. I will be using SAS Viya for Learners (using the Jupyter Notebook option), and have already made my connection to CAS and named my connection conn. For more information about making a connection to CAS, visit Part 1 of the series.

After making a connection to CAS, use the caslibInfo CAS action to view the available caslibs.

conn.caslibInfo()

The result of the caslibInfo action returns a table, as seen above. The table includes the name of each caslib, it's type, a description, and the path. The table also includes a caslib metadata:

  • The Local column indicates the scope of the caslib. A 0 represents a global caslib, and a 1 is local.
  • The Active column indicates if the caslib is active. A 1 represents an active caslib. If a CAS action has no caslib specified, the active caslib is used.
  • The Personal column indicates if the caslib is only available to you. By default, all users get a casuser caslib, only accessible by that user.

View Available Data Source Files in a Caslib

Next, let's view the data source files in the casuser caslib. Data source files are physical files available to the CAS server. These are known as server-side files since they are associated with a caslib.

To view the available data source files, use the fileinfo action with the caslib parameter. Note, if you do not use the caslib parameter, the fileinfo action uses the active caslib.

conn.fileInfo(caslib="casuser")

The results of the fileInfo action show that my casuser caslib's data source portion contains csv, txt and sas7bdat files.

View Available In-Memory Tables in a Caslib

Now, let's see if any tables are loaded into memory. To view available in-memory tables, use the tableinfo action with the caslib parameter.

conn.tableinfo(caslib="casuser")

The results of the tableInfo action show one table named CARS loaded into memory. Once a table is loaded into the in-memory portion, you can begin processing the data. Here I'll make a reference the CARS CAS table using the CASTable method. Then I'll use the head method to view the first 5 rows of the table.

cars=conn.CASTable("CARS", caslib="casuser")
cars.head()

The results of the head method show the first 5 rows of the CARS table.

Summary

In conclusion, the most important concept to understand about caslibs are the two main areas: data source and in-memory. The former contains connection information to a data source and the latter refers to CAS tables available for processing. Using the actions above, you can easily explore the data in your environment.

In Part V of the series, we'll look at loading data into memory. Stay tuned.

Additional and related resources:

Getting Started with Python Integration to SAS® Viya® - Part 4 - Exploring Caslibs was published on SAS Users.