Tips and Tricks

8月 012022
 

Recently, the SAS Press team moved to a new building on the SAS campus. And when the SAS Press team moves, we bring a lot of books with us! Packing and organizing all of our books gave us a chance to appreciate all of our authors’ hard work during the more than 30 years that SAS Press has existed.

One author has an outsized presence on the SAS Press bookshelves – Ron Cody. He has written over a dozen books that include some of the most popular titles for new SAS users. He taught statistics at the Rutgers Robert Wood Johnson Medical School for many years and is a frequent presenter at SAS conferences.

Few people know more about SAS than Ron, which made him the perfect person to be our first SAS Press Author of the Month. During August, you can get special deals on all of Ron's books at RedShelf. Find out more information on his author page. Ron is also hosting a free webinar on data cleaning tips and tricks on August 11th.

We recently asked Ron to share a little bit about his journey as an author and teacher. As you might imagine, Ron has a lot of SAS knowledge and advice to share.

Ron Cody's books on a bookshelf.

The Cody section of the SAS Press bookshelf.

Q: When did you decide to write your first book?

I decided to write my first SAS book (Applied Statistics and the SAS Programming Language) in 1985. It was published by Elsevier Publishing, which was later bought by Prentice Hall. At the time I was writing the book, there were no other books out there—just SAS manuals. This book is still in print (in a fifth edition).

Q: What made you decide to keep writing more books about SAS?

Once I realized I could write well enough to get published, I got the "writer's bug." Although writing a book is hard work, the reward is substantial. My motivation is more about teaching people about SAS and statistics rather than the monetary reward. My goal in writing any book is to make enough money to take my wife to dinner!

Q: Is one of your books your favorite? Or do you love them all equally?

I do have some favorites. I would put Learning SAS by Example as one. It contains a section for beginners as well as later chapters that are useful to intermediate, or even advanced SAS programmers. I particularly like my latest book, A Gentle Introduction to Statistics Using SAS Studio in the Cloud. I think I made statistical concepts accessible to non-mathematically minded people. There are only a few equations in the entire book.

Q: As a teacher, how did you encourage students who were having a hard time understanding statistics?

I try really hard to convince them that the statistical concepts really make sense, and it is sometimes the terminology that gets in the way.

Q: What do you think students struggle with the most when they are learning SAS?

I believe the overwhelming richness of SAS can intimidate a beginning programmer. That's why I start from simple examples and explain step-by-step how each program works.

Q: What is your best advice for someone who wants to learn SAS?

Buy all of my books, of course! Just kidding. There are many YouTube videos and other online resources that are useful. I think two of the best books would be The Little SAS Book, and either Learning SAS by Example (for the more serious student) or Getting Started with SAS Programming Using SAS Studio in the Cloud. The latter book is more suited to someone using SAS OnDemand for Academics.

Q: You recently published a memoir about your time as an EMT. How did it feel to reflect back on that time of your life? Any more memoirs in your future?

I thoroughly enjoyed writing a book about my years as a volunteer EMT (10-8 Awaiting Crew: Memories of a Volunteer EMT). I was fortunate that I had kept a journal and recorded details of some of the more interesting and exciting calls. As of now, I do not have another memoir in my future, but I am working on a nonfiction novel. I'm not sure how successful it will be, but I'm going to give it a try. I strive to write almost every day. I tell other beginning authors, that if you spend a few hours writing every day, it becomes much easier. I call it "getting in the groove."


Ron Cody sitting in a chair with a cat in his lap. Ron is wearing a shirt that says "using Statistics to prove a point is just MEAN."

Ron and Dudley. Credit: Jan Cody

Thanks for talking to us, Ron! We hope you will join us in celebrating Ron's accomplishments as our Author of the Month. Check out his author page to see special deals on all of Ron’s books during August and to discover more resources.

 

Meet our SAS Press Author of the Month – Ron Cody was published on SAS Users.

6月 302022
 

Welcome to the seventh installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to learn how to filter CAS tables.

Load and explore data

First, I'll load the cars.csv file into memory from the sassfotware GitHub page. I'll start by using my CAS connection object conn, followed by the upload_file method from the SWAT package. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. The data in this example is small, for training purposes. Processing data in the CAS server is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv',
      casout={'name':'cars', 'caslib':'casuser'})

The response from the command is minimal, but there's much going on. Specifically:

  • Cloud Analytic Services renders the uploaded file available as table CARS in caslib CASUSER(Peter).
  • The table CARS is created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.

Next, I'll view the type and value of the tbl variable to confirm it's a CASTable object that references the CARS table in the CAS server.

display(type(tbl), tbl)
# and the command results:
swat.cas.table.CASTable
CASTable('CARS', caslib='CASUSER(Peter)')

The results show tbl is a CASTable object and references the CARS table in the CAS server. Now, when I execute methods on the tbl object, the processing occurs in CAS.

Further, let's preview the CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame as expected.

Finally, I'll execute the SWAT shape attribute to view the number of rows and columns in the CAS table.

tbl.shape
# and the results
(428, 15)

Notice, the CARS CAS table contains 428 rows and 15 columns.

Filter a CAS Table

Now that we have created the table in the caslib and are familiar with the data, let's learn how to filter CAS tables!

Python Technique

First, I'll start with the traditional Pandas filtering technique. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. This will filter the CAS table for rows where the Make of the car is Acura or Toyota,

makeFilter = (tbl['Make']=='Acura') | (tbl['Make']=='Toyota')
tbl[makeFilter].shape
# and the results
(35, 15)

The results show that 35 rows in the CARS CAS table have the Make Toyota or Acura.

Query Method

You can also filter a CAS table using the query method in the SWAT package. Wait a minute, does this look familiar? I mentioned SWAT blends the world of Pandas as SAS. I'll again filter for a Make of Toyota or Acura .

tbl.query("Make = 'Acura' or Make = 'Toyota'").shape
# and the results
(35, 15)

Notice, the results are the same.

isin Method

All right. Sorry, but again I'll show you another familiar method from the SWAT package. Since I am filtering for specific categories, I can also use the SWAT isin method, which works similarly to the Pandas isin method!

tbl[tbl.Make.isin(['Acura', 'Toyota'])].shape
# and the results
(35, 15)

Notice, again the results are the same.

CAS Table WHERE Parameter

Lastly, I'll show you a specific CAS technique, which is very efficient if you need to filter data for multiple methods or actions. To begin, let's display the type and value of the tbl object again.

display(type(tbl), tbl)
# and the results
swat.cas.table.CASTable
CASTable('CARS', caslib='CASUSER(Peter)')

Notice the tbl variable is a CASTable object, which is a reference to a CAS table in the CAS server, not an actual table. In this example, it references the CARS table in the Casuser caslib.

The CASTable object enables you to add a variety of parameters to the object to filter, create columns, group and more. In this example, I'll add the where parameter to the CASTable object to filter for car makes of Toyota and Acura. Then I'll display the value of the CASTable object again.

tbl.where = 'Make = "Acura" or Make = "Toyota"'
display(tbl)

CASTable('CARS', caslib='CASUSER(Peter)', where='Make = "Acura" or Make = "Toyota"')

Notice the CASTable still references the CARS table in the Casuser caslib, but the where parameter has been added to the object. The where parameter is applied when the object is used with an action or method.

Let's check it out. Here I'll specify the tbl variable (the CASTable object) then the shape attribute.

tbl.shape
# and the results
(35, 15)

Notice the results again show 35 cars are either Toyota or Acura.

Now, what is nice about using this method is if I want to continue to use this filter with other methods or actions, all I need to do is use the tbl object. Perhaps you want the value counts of each Make? The SWAT package has the value_counts method available.

tbl.Make.value_counts()
# and the results
Toyota    28
Acura      7

The response displays the counts only for rows where Make is Acura or Toyota. This occurs because the where parameter is applied to the CAS table when the value_counts method is executed.

Delete the WHERE parameter

Lastly, if you want to delete the where parameter from the CASTable object, use the del_params method.

tbl.del_params('where')
display(tbl)
# and the results
CASTable('CARS', caslib='CASUSER(Peter)')

Once the parameter is deleted you can revert back to analyzing the entire table. Here I'll use the shape attribute again to view how many rows are in the original CAS table.

tbl.shape
# and the results
(428, 15)

Summary

SAS Viya offers various filtering options for CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 7 - Filtering CAS Tables was published on SAS Users.

5月 312022
 

If you use formatted variables in a CLASS statement in procedures such as MEANS or UNIVARIATE, SAS will use the formatted values of those variables and not the internal values. For example, consider you have a data set (Health) with variables Subj, Age, Weight, and Height. You want to see the mean Weight and Height for three age groups: 0 to 40, 41-70, and greater than 70. A program to create the Health data set and compute the means is shown below.

Data Health;
   input Subj $ Age Weight Height;
datalines;
001 23 120 72
002 35 200 80
003 77 98 63
004 51 230 75
005 15 98 54
006 44 160 68
007 83 111 65
008 10 . 45
009 60 200 .
;
proc format;
   value AgeGrp 0-40 = '0 to 40'
                41-70 = '41 to 70'
                71-high = '>70';
run;
 
proc means data=Health n nmiss mean;
   class Age;
   format Age AgeGrp.;
   var Weight Height;
run;

Here is the output:

Notice that there was no need to run another DATA step and create a variable representing age ranges. Not only is this an easier way to get these results—it is also much more efficient.

The other day, I received an email from a person and he posed the following question. How can I see the mean of the variable Sales (in the SASHELP data set Shoes) for three ranges of Sales: 0 to 100,000; 100,000 to 200,000; 200,000 and above?

My first thought was to create a variable (called Range) and then use that variable in a CLASS statement with PROC MEANS. However, after some thought, I wondered if I could use the Sales variable in the CLASS statement (and supply a format) and use the same variable in the VAR statement. As is often the case, the best way to answer a SAS programming question is to write a short program and see what happens. Here is what I suggested:

proc format;
  value SalesRange 0 - <100000 = 'Lower'
              100000 - <200000 = 'Middle'
              200000 - high = 'Upper';
run;
 
proc means data=sashelp.Shoes n nmiss mean std;
   class Sales;  
   format Sales SalesRange.;
   var Sales;
run;

Here is the output:

As you can see from the output, it worked just fine.

Before we leave this topic, let's use the first example, referencing the Health data set, to demonstrate a Multi-Label Format. Suppose you want to see means of Weight and Height for two different ranges of Age. One range is the original 0–40, 41–70, and 70+; the other is 0–50 and 51+. Here is a way to accomplish this using a single format:

proc format;
   value AgeGrp (multilabel) 0-40 = '0 to 40'
                             41-70 = '41 to 70'
                             71-high = '>70'
 
                             0-50 = '0 to 50'
                             51-high = '>50';
run;
 
proc means data=Health n nmiss mean;
   class Age / MLF;
   format Age AgeGrp.;
   var Weight Height;
run;

You need to add the option "multilabel" following the format name on the VALUE statement.  Without this option, you will get an error message if you have overlapping ranges. Next, use the option MLF (multi-label format) on the CLASS statement. Here is the output:

You now have the mean Weight and Height for two separate ranges of Age.

Consider using a formatted variables in CLASS statements for those procedures that allow this statement. It will save you time and reduce CPU time as well.

If you liked this blog, then you might like my latest book: Cody’s Collection of SAS Blogs. It gathers all of my best tips and tricks from my blogs in one place for easy reference.

Using formatted CLASS variables was published on SAS Users.

4月 202022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. This post builds upon CAS-Action! Create Columns in CAS Tables - Part 1 by showing how to add formats and modify the length of computed columns.

I'll start by building off the following code where I create three new columns in the PRODUCTS table:

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the results:

Notice in the results the Total_Price column does not have a format, and the Return_fix column truncates the value Yes. Let's fix these issues.

Add Formats to a Computed Column

First, I'll add a format to the Total_Price computed column. To add formats to a computed column use the computedVars sub-parameter. ComputedVars enables you to specify column metadata (labels and formats) for computed columns. The parameter expects a list of dictionaries, even if one column is specified.

For example, in the computedVars sub-parameter I'll add list with a single dictionary that specifies the name of a column Total_Price and apply the SAS dollar format. Next, I'll execute the fetch action to preview the table.

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVars = {
                            {name = 'Total_Price', format = 'dollar16.2'}
                   },
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the results:

Notice in the results above the format was applied to the Total_Price column. However, now the additional computed columns are not displayed in the output. To fix this issue, add all of the computed columns to the computedVars sub-parameter as a dictionary with the name key, even if you do not modify the column metadata.

In the code below, the Product_fix and Return_fix columns were added to the list in the computedVars sub-parameter.

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVars = {
                            {name = 'Total_Price', format = 'dollar16.2'},
                            {name = 'Product_fix'},
                            {name = 'Return_fix'}
                   },
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the results:

Notice in the results that adding each computed column in the computedVars sub-parameter fixed the issue and the computed columns are shown.

However, we still have one more issue. Look at the truncated values in the Return_fix column. How can we fix that?

Modify Computed Column Lengths

When using an IF/THEN/ELSE statement to create a new column in SAS, it sets the column width by using the length of the first assignment statement. In this example, the first assignment statement sets the value of the Return_fix column to No, so the column length is set to 2.

To modify the length of the computed column add a SAS LENGTH statement in the computedVarsProgram parameter prior to the first assignment statement. Here I'll set the Return_fix column to length of 3.

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
 
        length Return_fix varchar(3);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVars = {
                            {name = 'Total_Price', format = 'dollar16.2'},
                            {name = 'Product_fix'},
                            {name = 'Return_fix'}
                   },
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the results:

In the results the truncation issues in the Return_fix column are corrected.

Summary

In summary, using the computedVars sub parameter allows you to add formats to the computed columns, and the computedVarsProgram parameter enables you to set the column lengths. A few key points:

  • When using the computedVars sub-parameter, all computed columns from the computedVarsProgram sub-parameter must be added. Otherwise, the computed columns are not returned.
  • The computedVars sub-parameter takes a list of dictionaries as it's value.

Additional resources

fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code
Create Data Code

CAS-Action! Create Columns in CAS Tables - Part 2 was published on SAS Users.

4月 132022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's learn how to create calculated columns in CAS tables.

In this example, I will use the CAS procedure to create columns in CAS tables for use in CAS actions. I could execute similar code in Python, R and more with some slight changes to the syntax for the specific language. The other languages also include specific methods to accomplish the same objective.

I'll start by executing the following code to create a CAS table named PRODUCTS in the Casuser caslib with 80 million rows. Then I'll create a variable named productsTbl to reference the PRODUCTS CAS table, and use the variable as the value to the table parameter in the fetch CAS action.

proc cas;
    productsTbl = {name = 'products', caslib = 'casuser'};
 
    table.fetch / table = productsTbl;
quit;

And the response:

Create a Calculated Column

To create a calculated column you add the CAS table reference. The computedVarsProgram parameter specifies an expression for each computed variable and takes a string as it's value. The string can be:

  • an assignment statement that can include most SAS functions.
  • a SAS IF/THEN/ELSE statement
  • a SAS LENGTH or LABEL statement

I'll start by creating a simple calculated column named Total_Price that multiplies Price by Quantity. This calculation triggers when referencing the productsTbl CAS table in the table parameter of a CAS action. Here I'll use the fetch action to view 20 rows of the table.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = 'Total_Price = Price * Quantity;'};
 
    table.fetch / table = productsTbl;
quit;

And the response:

Notice in the results above, the new column Total_Price was created during the execution of the fetch action. The original CAS table was not modified.

Create Multiple Calculated Columns

Next, I'll create a second column by adding a new assignment statement in the string. The second calculated column, Product_fix renders upper case values in the original Product column. In the string you must end each assignment statement with a semicolon. Above all, a new column is required since the computedVarsProgram sub parameter cannot overwrite existing columns in the table. To update values in a CAS table column, use the table.update action.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = 'Total_Price = Price * Quantity;
                                          Product_fix = upcase(Product);'};
 
    table.fetch / table = productsTbl;
quit;

And the response:

Notice in the results above, the fetch action created the Total_Price and Product_fix columns.

Use Conditional Logic

In addition, you can also use a traditional SAS IF/THEN/ELSE statement in the computedVarsProgram sub parameter. For example, I'll create a new column named Return_fix that replaces missing values in the Return column with the value No.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = 'Total_Price = Price * Quantity;
                                          Product_fix = upcase(Product);
                                          if Return = "" then Return_fix = "No"; 
                                              else Return_fix = "Yes";'
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the response:

In the results above there are three new calculated columns: Total_Price, Product_fix and Return_fix.

Storing the Calculated Columns in a Variable

Lastly, instead of adding the calculated columns as a string, I like to store them in a variable using the CAS language SOURCE block statement. Storing the string in the the SOURCE block enables you to embed the calculated columns as a text string and avoid quotation issues that can occur. I'll name my SOURCE block variable createColumns and use it as the value in the computedVarsProgram sub parameter.

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the response:

The results show the same result as the previous section.

While this is great, how can I format the new Total_Price column using the traditional SAS dollar format? Look closely at the Return_fix column. Do you see an issue with the Yes values? Well, stay tuned for part 2!

Summary

In summary, using the computedVarsProgram sub parameter allows you to easily create calculated columns in a CAS tables during the execution of CAS actions. A few key points:

  • The computedVarsProgram sub parameter takes a string as it's value and uses traditional SAS assignment statements, IF/THEN/ELSE statements, and you can also use the LABEL and LENGTH statements.
  • Using the computedVarsProgram sub parameter does not modify the original CAS table. It is simply applied during the execution of a CAS action.
  • If you create a dictionary variable with the CAS table reference, you can then use that variable in a variety of actions within the table parameter.
  • Using a SOURCE block enables you to embed your text in a variable without using a string.

Additional resources

fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Create Columns in CAS Tables - Part 1 was published on SAS Users.

4月 112022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's learn how to filter rows in CAS tables.

In this example, I will use the CAS procedure to filter rows in CAS tables for use in CAS actions. I could execute similar code in Python, R and more with some slight changes to the syntax for the specific language. The other languages also include specific methods to accomplish the same objective.

I'll start by executing the following code to create a CAS table named PRODUCTS in the Casuser caslib with 80 million rows. Then, I'll create a variable named productsTbl to reference the PRODUCTS CAS table, and use that variable as the value to the table parameter in the numRows and fetch CAS actions.

proc cas;
    productsTbl = {name = 'products', caslib = 'casuser'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

The numRows action shows the PRODUCTS table has 80 million rows, and the fetch action shows the first 20 rows of the table.

Create a Simple Filter

To filter a CAS table you add the where sub parameter to the castable reference with the where expression as a string. The string is similar to using the WHERE statement in SAS.

Here, I'll add the where key to the productsTbl dictionary that references the CAS table, and filter for all rows where the Product column equals Hats. The where expression is only applied when an action is executed on the productsTbl variable (the CAS table reference). The underlying CAS table is not modified.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'Product = "Hats"'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

After executing the numRows and fetch actions, the results show that there are 8,001,336 rows that contain the product Hats.

Use a SAS Function to Filter a CAS Table

Within the where expression you can also use most SAS Functions. Here, I'll use the UPCASE function to upper case all values in the Product column to standardize the case, then filter for rows with the value HATS. The results will be the same as the previous section since there are no underlying case issues in the CAS table. Where you would see a difference would be if the table contained mixed case versions of Hats, such as hats, HAts, hATs, etc. Without the function, the query is case sensitive and would only pick up Hats. All iterations of Hats would be picked up with using the UPCASE function.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS"'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

Multiple WHERE Expressions

Just like the WHERE statement in SAS, you can specify two or more where expressions with a logical operator. Here I'll filter the CAS table for all rows where the Product column contains the value HATS, and a StoreID less than 15. The entire where expression is stored as a string.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS" and StoreID < 15'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

Notice that the results show that 1,789,106 rows have the product hats with a store ID less than 15.

Create a Calculated Column as a Filter

You can create a calculated expression to use as a filter. The calculated expression is not added as a column to the table output, instead it is used to filter the CAS table during the execution of the actions. Here I'll multiple the Price by Quantity, and filter for all rows greater than 40.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS" and 
                                    StoreID < 15 and 
                            Price * Quantity > 40'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

Notice in the results the calculated column is not output, it was only used to filter the rows. With all three where expressions, we now have a total of 436,728 rows.

Storing the Where Expressions in a Variable

Lastly, I want to clean up my code. Instead of adding the where expressions as a single string, I can use a SOURCE block in the CAS language. The SOURCE block enables embedding text in the program and assigning it to a given variable. It avoids any possible quoting issues and helps compartmentalize your code. Here, I'll name my SOURCE block filter and add the where expressions. Then, I'll use the filter variable as the value to the where key. The results are the same as the previous code.

proc cas;
    source filter;
        upcase(Product) = "HATS" and 
        StoreID < 15 and 
        Price * Quantity > 40;
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = filter};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

Summary

In summary, using the where sub-parameter allows you to easily filter your tables for use in CAS actions. A few key points:

  • The where sub parameter takes a string as its value and is similar to using the WHERE expression in SAS.
  • Using the where sub-parameter does not modify the original CAS table. It is simply applied when executing an action on the CAS table reference.
  • Within the where expression you can use most SAS functions.
  • If you create a dictionary variable with the CAS table reference, you can then use that variable in a variety of actions within the table parameter.
  • Using a SOURCE block enables you to embed your where expressions in a variable without using a string.

Additional resources

fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Filtering Rows in CAS Tables was published on SAS Users.

9月 282021
 

SQL is an important language for any programmer working with data. In SAS Cloud Analytic Services (CAS) you can execute SQL queries using the fedSQL.execDirect CAS action!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the

proc cas;
    fedSQL.execDirect / 
        query="select Make,
                      Model,
                      MSRP,
                      mean(MPG_City,MPG_Highway) as MPG_Avg
               from casuser.cars
               where Make='Toyota'
               order by MPG_Avg desc";
quit;

And the results:

The execDirect action executes the query in the distributed CAS environment and returns the expected results.

Using a SOURCE block

While the query we executed was simple, this is not always the case. Adding a complicated query as a string can make writing the query difficult.

Instead of specifying the query as a string in the CAS procedure, use the SOURCE statement to embed text in a variable. In the following example, I'll execute the same query as before. However, this time I'll nest the query inside a SOURCE block by specifying the SOURCE statement and name the variable MPG_toyota. Then I'll add the query inside the SOURCE block and use the ENDSOURCE statement to end the block.

proc cas;
    source MPG_toyota;
        select Make,
               Model,
               MSRP,
               mean(MPG_City,MPG_Highway) as MPG_Avg
        from casuser.cars
        where Make='Toyota'
        order by MPG_Avg desc;
    endsource;
 
    fedSQL.execDirect / query=MPG_toyota;
quit;

After the SOURCE block is complete, you can reference the variable as the value to the query parameter in the execDirect action.

And the results:

The results returned are the same, but using a SOURCE block makes the code easier to write and maintain.

Summary

In SAS Viya, FedSQL provides a scalable, threaded, high-performance way to query data and create new CAS tables from existing tables in the CAS server. In this example we saw two distinct ways to run SQL code on SAS Viya. This is only the beginning. See the resources below for more details on PROC FedSQL.

Resources

SAS® Viya®: FedSQL Programming for SAS® Cloud Analytic Services
FEDSQL Procedure
execDirect CAS Action
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing SQL in SAS Viya was published on SAS Users.

9月 232021
 

In my previous post CAS-Action! Simply Distinct - Part 1 I reviewed using the simple.distinct CAS action to explore distinct and missing values in a distributed CAS table.

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

And now, back to the distinct action. What if we want to do more? Maybe you want to create a CSV file that documents the percentage of distinct values in each column? Let's explore some possibilities.

To complete the task I'll break it down into four steps.

Step 1 - Find the number of rows in the CAS table

To find the number of rows in a CAS table use the simple.numRows CAS action. Let's execute the numRows action and store the results in a variable. I'll also PRINT and DESCRIBE the results to take a closer look at the output.

proc cas;
    simple.numRows result=n / table={name="cars",caslib="casuser"};
    describe n;
    print n;
...

And the results:
Output of the simple.numRows CAS action

The results of the DESCRIBE statement show the output of the action is a dictionary with a key named numRows and an integer as the value. The PRINT statement shows the value of the dictionary, numRows=428.

Now that we have the total number of rows, we can use that number in our calculation.

Step 2 - Find the number of distinct values in each column

Next, let's execute the distinct action and store the results in a variable named d. Then execute the PRINT statement to confirm the results.

...
    simple.distinct result=d /
            table={name="cars",caslib="casuser"};
    print d;
...

And the resluts:
Results of the simple.distinct CAS action
The results of the distinct action are as expected. Each column with the number of distinct values.

Step 3 - Create a calculated column that computes the percentage of distinct values

Now that we have the number of distinct values in each column, and the total number of rows in the CARS CAS table, we can calculate the total percent of distinct values in each column.

Consider the code:

...
    pctDistinct=d.Distinct.compute({"PctDistinct","Percent Distinct",percent7.2}, nDistinct/n.numRows)
                          [ , {"Column","NDistinct","PctDistinct"} ];
    print pctDistinct;
...

To add a calculated column to a result table use the compute operator. In the first argument, specify column metadata inside an array (column name, label and format). In the second argument, specify the expression. My expression nDistinct/n.numRows divides the distinct values in each column by the total number of rows in the CARS table.

After the compute operator, select specific rows and columns from the result table using bracket notation. Here I'll select all rows, and only the columns Column, nDistinct and PctDistinct.

Lastly, I used the PRINT statement to confirm the results.

New calculated result table

In the output we can see the new result table with the computed column.

Step 4 - Save the results table as a CSV file

Lastly, let's put it all together!

I'll add the code from the pervious steps, then save the table as a CSV file using the SAVERESULTS statement with the CSV= option.

%let outpath=/*specify output file location*/;
 
proc cas;
* Specify the CAS table *;
    casTbl={name="cars", caslib="casuser"};
 
* Store the number of rows in the CAS table *;
    simple.numRows result=n / table=casTbl;
 
* Store the number of distinct values in each column *;
    simple.distinct result=d / table=casTbl;
 
* Calculate the percentage of distinct values in each column *;
    pctDistinct=d.Distinct.compute({"PctDistinct","Percent Distinct",percent7.2}, nDistinct/n.numRows)
                          [ , {"Column","NDistinct","PctDistinct"} ];
 
* Save the result table as a CSV file *;
    saveresult pctDistinct csv="&outpath/pctDistinctCars.csv";
quit;

In the CSV= option I specified the outpath macro variable that contains the location of output folder, and add the name of the CSV file.

After executing the code the log indicates everything ran successfully, and a CSV file was created in the specified location. Next I'll find and open the CSV file.

My CSV file is located in my outfiles folder:

Then double click on the file to open it in SAS Studio:

Summary

The distinct action is a flexible and easy way to explore your data. It allows you to quickly explore your distributed CAS tables, then process and save the results in a variety of formats to fit your needs.

Additional Resources

distinct CAS action
CAS-Action! Simply Distinct - Part 1
CASL Result Tables
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Simply Distinct - Part 2 was published on SAS Users.

9月 152021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dive into exploring your data by viewing the number of distinct and missing values that exist in each column using the simple.distinct CAS action.

In this example, I will use the CAS procedure to execute the distinct action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax from other languages.

Determine the Number of Distinct and Missing Values in a CAS Table

To begin, let's use the simple.distinct CAS action on the CARS in-memory table to view the action's default behavior.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"};
quit;

In the preceeding code, I specify the CAS procedure, the action, then reference the in-memory table. The results of the call are displayed below.

The results allow us to quickly explore the CAS table and see the number of distinct and missing values. That's great, but what if you only want to see specific columns?

Specify the Columns in the Distinct Action

Sometimes your CAS tables contain hundreds of columns, but you are only interested in a select few. With the distinct action, you can specify a subset of columns using the inputs parameter. Here I'll specify the Make, Origin and Type columns.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        inputs={"Make","Origin","Type"};
quit;

After executing the code the results return the information for only the Make, Origin and Type columns.

Next, let's explore what we can do with the results.

Create a CAS Table with the Results

Some actions allow you to create a CAS table with the results. You might want to do this for a variety of reasons like use the new CAS table in a SAS Visual Analytics dashboard or in a data visualization procedure like SGPLOT.

To create a CAS table with the distinct action result, add the casOut parameter and specify new CAS table information, like name and caslib.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        casOut={name="distinctCars", caslib="casuser"};
quit;

After executing the code, the action returns information about the name and caslib of the new CAS table, and the number of rows and columns.

Visualize the Number of Distinct Values in Every Column

Lastly, what if you want to create a data visualization to better explore the table? Maybe you want to visualize the number of distinct values for each column? This task can be accomplished with variety of methods. However, since I know my newly created distinctCars CAS table has only 15 rows, I'll reference the CAS table directly using SGPLOT procedure.

This method works as long as the LIBNAME statement references your caslib correctly. I recommend this method when you know the CAS table is a manageable size. This is important because the CAS server does not execute the SGPLOT procedure on a distributed CAS table. The CAS server instead transfers the entire CAS table back to the client for processing.

To begin, the following LIBNAME statement will reference the casuser caslib.

libname casuser cas caslib="casuser";

Once the LIBNAME statement is correct, all you need to do is specify the CAS table in the DATA option of the SGPLOT procedure.

title justify=left height=14pt "Number of Distinct Values for Each Column in the CARS Table";
proc sgplot data=casuser.distinctCars
            noborder nowall;
    vbar Column / 
        response=NDistinct
        categoryorder=respdesc
        nooutline
        fillattrs=(color=cx0379cd);
    yaxis display=(NOLABEL);
    xaxis display=(NOLABEL);
quit;

The results show a bar chart with the number of distinct values for each column.

Summary

The simple.distinct CAS action is an easy way to explore a distributed CAS table. With one simple action, you can easily see how many distinct values are in each column, and the number of missing rows!

In Part 2 of this post, I'll further explore the simple.distinct CAS action and offer more ideas on how to interpret and use the results.

Additional Resources

distinct CAS action
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code

CAS-Action! Simply Distinct - Part 1 was published on SAS Users.

8月 282021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dig in with the table.columnInfo action.

In this post we'll l look at exploring the column attributes of a CAS table. Knowing the column names, data types, and any additional formats or labels associated with the columns makes it easier to work with the data. One way to see this type of information on a CAS table is to use the table.columnInfo CAS action!

In this example, I will use the CAS procedure to execute the columnInfo action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language.

Return CAS table Column Information

To view the column information of your CAS table, use the columnInfo CAS action with the table parameter. That's it! Refer to the code below.

proc cas;
    table.columnInfo / table={name="cars", caslib="casuser"};
quit;

The results would appear similar to the following:

table.columnInfo CAS action results

and return a variety of information about each column in the CAS table:

  • the column name
  • a label if one has been applied
  • the Id, which indicates the position of the column in the table
  • the data type of the column
  • the column length
  • the format, formatted length, width and decimal

Create a CSV Data Dictionary Using the ColumnInfo Action

What if instead, you want to create a data dictionary documenting the CAS table? With the columnInfo action you can export the results to a CSV file!

I'll use the columnInfo action again, but this time I'll store the results in a variable. The variable is a dictionary, so I need to reference the dictionary ci, then the columnInfo key to access the table. Next, I'll create two computed columns using the compute operator. The first column contains the name of the CAS table, and the second, the caslib of the table. I'll print the new table to confirm the results.

proc cas;
    table.columnInfo result=ci / table={name="cars", caslib="casuser"};
 
    ciTbl = ci.ColumnInfo.compute({"TableName","Table Name"}, "cars")
                         .compute({"Caslib"}, "casuser");
 
    print ciTbl;
    ...

The code produces the folloiwng result:

The new result table documents the CAS table columns, table name and caslib.

Lastly, I'll export the result table to a CSV file. First, I'll specify the folder location using the outpath variable. Then use the SAVERESULT statement to save the result table as a CSV file named carsDataDictionary.csv.

    ...
    outpath="specify a folder location";
    saveresult ciTbl csv=outpath || "carsDataDictionary.csv";
quit;

 

After I execute the CAS procedure I can find and open the CSV file to view the documented CAS table!

Summary

The table.columnInfo CAS action is a simple and easy way to show column information about your distributed CAS table. Using the results of the action allow you to create a data dictionary in a variety of formats.

Additional resources

table.columnInfo CAS action
CAS Action! - a series on fundamentals
SAS® Cloud Analytic Services: Fundamentals
CASL Result Tables
SAVERESULT Statement
Code

CAS-Action! Show me the ColumnInfo! was published on SAS Users.