Tech

9月 152021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dive into exploring your data by viewing the number of distinct and missing values that exist in each column using the simple.distinct CAS action.

In this example, I will use the CAS procedure to execute the distinct action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax from other languages.

Determine the Number of Distinct and Missing Values in a CAS Table

To begin, let's use the simple.distinct CAS action on the CARS in-memory table to view the action's default behavior.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"};
quit;

In the preceeding code, I specify the CAS procedure, the action, then reference the in-memory table. The results of the call are displayed below.

The results allow us to quickly explore the CAS table and see the number of distinct and missing values. That's great, but what if you only want to see specific columns?

Specify the Columns in the Distinct Action

Sometimes your CAS tables contain hundreds of columns, but you are only interested in a select few. With the distinct action, you can specify a subset of columns using the inputs parameter. Here I'll specify the Make, Origin and Type columns.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        inputs={"Make","Origin","Type"};
quit;

After executing the code the results return the information for only the Make, Origin and Type columns.

Next, let's explore what we can do with the results.

Create a CAS Table with the Results

Some actions allow you to create a CAS table with the results. You might want to do this for a variety of reasons like use the new CAS table in a SAS Visual Analytics dashboard or in a data visualization procedure like SGPLOT.

To create a CAS table with the distinct action result, add the casOut parameter and specify new CAS table information, like name and caslib.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        casOut={name="distinctCars", caslib="casuser"};
quit;

After executing the code, the action returns information about the name and caslib of the new CAS table, and the number of rows and columns.

Visualize the Number of Distinct Values in Every Column

Lastly, what if you want to create a data visualization to better explore the table? Maybe you want to visualize the number of distinct values for each column? This task can be accomplished with variety of methods. However, since I know my newly created distinctCars CAS table has only 15 rows, I'll reference the CAS table directly using SGPLOT procedure.

This method works as long as the LIBNAME statement references your caslib correctly. I recommend this method when you know the CAS table is a manageable size. This is important because the CAS server does not execute the SGPLOT procedure on a distributed CAS table. The CAS server instead transfers the entire CAS table back to the client for processing.

To begin, the following LIBNAME statement will reference the casuser caslib.

libname casuser cas caslib="casuser";

Once the LIBNAME statement is correct, all you need to do is specify the CAS table in the DATA option of the SGPLOT procedure.

title justify=left height=14pt "Number of Distinct Values for Each Column in the CARS Table";
proc sgplot data=casuser.distinctCars
            noborder nowall;
    vbar Column / 
        response=NDistinct
        categoryorder=respdesc
        nooutline
        fillattrs=(color=cx0379cd);
    yaxis display=(NOLABEL);
    xaxis display=(NOLABEL);
quit;

The results show a bar chart with the number of distinct values for each column.

Summary

The simple.distinct CAS action is an easy way to explore a distributed CAS table. With one simple action, you can easily see how many distinct values are in each column, and the number of missing rows!

In Part 2 of this post, I'll further explore the simple.distinct CAS action and offer more ideas on how to interpret and use the results.

Additional Resources

distinct CAS action
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code

CAS-Action! Simply Distinct - Part 1 was published on SAS Users.

9月 132021
 

The Day of the Programmer is not enough time to celebrate our favorite code-creators. That’s why at SAS, we celebrate an entire week with SAS Programmer Week! If you want to extend the fun and learning of SAS Programmer Week year-round, SAS Press is here to support you with books for programmers at every level.

2021 has been a big year for learning, so we wanted to share the six most popular books for programmers this year. There are some old favorites on this list as well as some brand-new books on a variety of topics. Check out the list below, and see what your fellow programmers are reading this year!

  1. Little SAS Book: A Primer, Sixth Edition

This book is at the top of almost every list of recommended books for anyone who wants to learn SAS. And for good reason! It breaks down the basics of SAS into easy-to-understand chunks with tons of practice questions. If you are new to SAS or are interested in getting your basic certification, this is the book for you.

  1. Learning SAS by Example: A Programmer’s Guide, Second Edition

Whether you are learning SAS for the first time or just need a quick refresher on a single topic, this book is well-organized so that you can read start to finish or skip to your topic of interest. Filled with real-world examples, this is a book that should be on every SAS programmer’s bookshelf!

  1. Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS

If you work with big data, then you probably work with a lot of text. The third book on our list is for anyone who handles unstructured data. This book focuses on practical solutions to real-life problems. You’ll learn how to collect, cleanse, organize, categorize, explore, analyze, and interpret your data.

  1. End-to-End Data Science with SAS: A Hands-On Programming Guide

This book offers a step-by-step explanation of how to create machine learning models for any industry. If you want to learn how to think like a data scientist, wrangle messy code, choose a model, and evaluate models in SAS, then this book has the information that you need to be a successful data scientist.

  1. Cody's Data Cleaning Techniques Using SAS, Third Edition

Every programmer knows that garbage in = garbage out. Take out the trash with this indispensable guide to cleaning your data. You’ll learn how to find and correct errors and develop techniques for correcting data errors.

  1. SAS Graphics for Clinical Trials by Example

If you are a programmer who works in the health care and life sciences industry and want to create visually appealing graphs using SAS, then this book is designed specifically for you. You’ll learn how to create a wide range of graphs using Graph Template Language (GTL) and statistical graphics procedures to solve even the most challenging clinical graph problems.

An honorable mention also goes to the SAS Certification Guides. They are a great way to study for the certification exams for the SAS Certified Specialist: Base Programming and SAS Certified Professional: Advanced Programming credentials.

We have many books available to support you as you develop your programming skills – and some of them are free! Browse all our available titles today.

Top Books for SAS Programmers was published on SAS Users.

9月 102021
 

As its thousands of users know, SAS Analytics Pro consists of three core elements of the SAS system: Base SAS®, SAS/GRAPH® and SAS/STAT®. It provides the fundamental capabilities of data handling, data visualization, and statistical analysis either through coding or through the SAS Studio interface. For many years, SAS Analytics Pro has been deployed on-site as the entry-level workhorse to the SAS system.

Now, SAS Analytics Pro includes a new option for containerized cloud-native deployment. In addition, the containerized option comes with the full selection of SAS/ACCESS engines making it even easier to work with data from virtually any source. For organizations considering the move to the cloud, or those already there, SAS Analytics Pro provides an exciting new option for cloud deployment.

What is SAS Analytics Pro?

SAS Analytics Pro is an easy-to-use, yet powerful package for accessing, manipulating, analyzing and presenting information. It lets organizations improve productivity with all the tools and methods needed for desktop data analysis – in one package.

  • Organizations can get analysis, reporting and easy-to-understand visualizations from one vendor. Rather than having to piece together niche software packages from different vendors, this consolidated portfolio reduces the cost of licensing, maintenance, training and support – while ensuring that consistent information is available across your enterprise.
  • Innovative statistical techniques are provided with procedures constantly being updated to reflect the latest advances in methodology. Organizations around the world rely on SAS to provide accurate answers to data questions, along with unsurpassed technical support.
  • SAS software integrates into virtually any computing environment, unifying your computing efforts to get a single view of your data, and freeing analysts to focus on analysis rather than data issues.
  • Easily build analytical-style graphs, maps and charts with virtually any style of output that is needed, so you can deliver analytic results where they’re needed most

Why data scientists should care about cloud and containers?

With the new containerized cloud-native deployment option for SAS Analytics Pro, this raises the question of why data scientists should care about cloud and containers? This question was addressed in a SAS Users blog post by SAS R&D director Brent Laster.

This post characterizes a container as a “self-contained environment with all the programs, configuration, initial data, and other supporting pieces to run applications.” The nice thing for data scientists is that this environment can be treated as a stand-alone unit, to turn on and run at any time – sort of a “portable machine.” It provides a complete virtual system configured to run your targeted application.

Using popular container runtime environments (e.g., Docker), containers can be an efficient way for individual users to deploy and manage software applications. This is especially useful for applications like SAS Analytics Pro, which participates in SAS Viya’s continuous delivery approach, releasing updates on a regular basis.

For large, IT-managed environments, containers can call for something like a “data center” to simplify deployment and management of dynamic workloads. The most prominent one today is Kubernetes, which automates key needs around containers including deployment, scaling, scheduling, healing, and monitoring – so the data scientist doesn’t have to.

The combination of containers and cloud environments provides an evolutionary jump in the infrastructure and runtime environments where data scientists run their applications. And this gives them a similar jump in being able to provide the business value their customers demand. Containerized and cloud-native deployment of SAS Analytics Pro provides the automatic optimization of resources and the automatic management of workloads that your organization needs to be competitive.

Note that existing customers can continue programming in SAS in a small footprint environment while availing themselves of the SAS container-based continuous delivery process. And if you aren’t already a SAS customer, cloud deployment gives you one more good reason to start letting SAS Analytics Pro deliver value to your organization.

Learn more

Website: SAS® Analytics Pro
Training: SAS® Global Certification Program

On-Premises Documentation:

Cloud-Native Documentation:

SAS Analytics Pro now available for on-site or containerized cloud-native deployment was published on SAS Users.

9月 022021
 

Deleting any trailing characters in SAS stringsIn my previous post, we solved the task of removing specified leading characters from SAS strings. In this post, we tackle the complementary task of removing trailing characters.

While removing trailing blanks is well covered in SAS by the TRIM() and TRIMN() functions, removing non-blank trailing characters remains a bit of a mystery that can pop up during text string processing.

For example, you may need to clean up the following strings by removing all trailing x's from them:

012345x
012345xxx
012345xx

These extra characters can result from data entry errors, prior data manipulations, etc. No matter how you get them in, you want them out.

How to remove any trailing characters

For instance, let’s remove all occurrences of the arbitrary trailing character '*'. The following diagram illustrates what we are going to achieve and how:
Diagram: Deleting trailing characters
In order to remove a specified character (in this example '*') from all trailing positions in a string, we need to search our string from right to left starting from the rightmost non-blank character position and find the position p of the first character in that string that is not equal to the specified character. Note, that despite our right-to-left direction of search the position p=10 is still enumerated from left-to-right. Then we can extract the substring starting from position 1 with the length equal to the found position p.

Unlike in our leading characters removal solution, out of two contenders for our search functionality, VERIFY and FINDC, the VERIFY function has to be dropped from the competition as it does not provide right-to-left search functionality. However, the FINDC function stays on track. Here is a possible solution using the FINDC() function.

Using FINDC() function with negative start-position

The FINDC(X, C, ’K’, -LENGTH(X)) function searches string X from right to left starting from the last non-blank character position determined by the optional start-position argument equal to LENGTH(X), and returns the position P of the first character that does not appear in C.

Here we use the K modifier that switches the default behavior of searching for any character that appears in C to searching for any character that does not appear in C.

The direction of search is defined by the minus sign in front of the start-position (a negative start-position argument translates into searching from right to left.)

Then we can apply the SUBSTR(X, 1, P) function that extracts a substring of X starting from position 1 with a length of P which is effectively a substring of the first P characters in X.

Special considerations

Before we proceed to the code implementation of the outlined solution let’s consider the following edge case.

If our string X consists of all '*' characters and nothing else the FINDC() function will find no character (and therefore no position) that is not equal to '*'. In this case it will return 0. However, 0 is not a valid third argument value for the SUBSTR(X, 1, P) function. Valid values are 1 . . . through VLENGTH(X) – the length attribute of X. Having a 0 value for the third argument will trigger the automatic data step variable _ERROR_=1 and the following note generated in the SAS log:

NOTE: Invalid third argument to function SUBSTR at line ## column #.

Therefore, we need to handle this special case separately, conditionally using SUBSTR(X, 1, P) for P>0 and assigning blank ('') otherwise.

Code implementation for removing trailing characters

Now we are ready to put everything together. First, let’s create a test data table:

data TEST;
   input X $ 1-20;
   datalines;
*It's done***
*********
**01234*ABC**
No trailing *'s
;

Then we apply the logic described above. The following DATA step illustrates our coding solution for deleting trailing characters:

data CLEAN (keep=X Y);
   set TEST;
   C = '*'; *<- trailing character(s) to be removed;
 
   P = findc(X, C, 'K', -length(X));
   if P then Y = substr(X, 1, P); 
        else Y = '';
 
   put _n_= / X= / P= / Y= /;
run;

The SAS log will show interim and final results by the DATA step iterations:

_N_=1
X=*It's done***
P=10
Y=*It's done
 
_N_=2
X=*********
P=0
Y=
 
_N_=3
X=**01234*ABC**
P=11
Y=**01234*ABC
 
_N_=4
X=No trailing *'s
P=15
Y=No trailing *'s

Here is the output data table CLEAN showing the original and the resulting strings X and Y side by side:
Removing any trailing characters in SAS strings

Conclusion

The solution presented in this blog post expands trailing character deletion functionality beyond solely blanks (which are handled by the TRIM and TRIMN functions). Moreover, using this coding technique, we can simultaneously remove a variety of trailing characters. For example, if we have a string X='012345xxx.%' and specify C = 'x.%' (the order of characters listed within the value of C does not matter), then all three characters 'x', '.', and '%' will be removed from all trailing positions of X. The resulting string will be Y='012345'.

In addition, numerous modifiers of the FINDC() function allow specifying many characters in bulk, without explicitly listing them one by one. For example, we may augment a list of characters being removed by adding the D modifier as in P = FINDC(X, C, 'KD', -LENGTH(X)) which will remove all trailing digits in addition to those characters specified in C. Similarly, we may throw in the U modifier as in P = FINDC(X, C, 'KDU', -LENGTH(X)) which adds all uppercase letters to the list of trailing characters to be removed. And so on.

Additional resources

Questions? Thoughts? Comments?

Do you find this post useful? Do you have questions, concerns, comments? Please share with us below.

Removing trailing characters from SAS strings was published on SAS Users.

9月 012021
 

Analytics and Artificial Intelligence (AI) are changing the way we interact with the world around us – increasing productivity and improving the way we make decisions. SAS and Microsoft are partnering to inspire greater trust and confidence in every decision by driving innovation and delivering proven AI in the cloud.

In this demo, see how intelligent decisioning and machine learning from SAS and Microsoft help Contoso Bank – a fictitious banking customer – simplify and reduce risk in its home loan portfolio.

Let’s get started.

Part 1: Data and Discovery

Organizations can run faster and smarter by enabling employees to uncover insights. See how SAS and Microsoft help Contoso Bank gain new insight into its portfolio by bringing together data management, analytics and AI capabilities with seamless integration into the Azure data estate.

Key Product Features:
• Use built-in Power BI tools like smart narratives and sentiment analysis to quickly analyze structured and unstructured data.
• Connect your SAS Viya and Microsoft Azure environments with single sign-on via Azure Active Directory.
• Catalog your datasets across SAS and Microsoft in SAS Information Catalog for a holistic view of your data environment.
• Integrate data from Azure Synapse Analytics and other Azure data sources into a combined dataset in SAS Data Studio.
• No-code intelligence features in SAS Visual Analytics explain analytic outputs in natural language.

Part 2: Model and Deploy

AI has the potential to transform organizations. See how SAS and Microsoft enable Contoso Bank to quickly build and operationalize predictive models by bringing together SAS Viya advanced analytics and AI capabilities with Azure Machine Learning.

Key Product Features:
• Bring models from SAS Visual Analytics into SAS Model Studio as a candidate for production use.
• Create automatically generated pipelines in SAS Model Studio to select the best features for modeling.
• Register models built in open-source Jupyter notebooks within Azure Machine Learning into SAS Model Manager.
• Publish models from SAS Model Manager in Azure Machine Learning to be deployed in the Microsoft ecosystem.
• Schedule SAS model manager to monitor model drift in the SAS or Microsoft ecosystem to identify the right time to retrain models.

Part 3: Automate and Monitor

Building a data-driven organization means increasing productivity with the necessary insights and tools. See how SAS and Microsoft can help Contoso Bank rapidly operationalize the analytics and AI capabilities of SAS Viya through Power Apps and Power Automate to help employees make better decisions.

Key Product Features:
• Build decision flows in SAS Intelligent Decisioning to make calculated decisions at speed.
• Use AI Builder in Power Platform to extract and process information in Power Platform.
• Access SAS Intelligent Decisioning’s decision access engine in low-code applications by using Power Apps to ingest data and receive decisioning outputs.
• Connect to SAS Intelligent Decisioning from Power Apps and Power Automate with the SAS Decisioning connector.
• Embed Power Apps in Microsoft Teams or access via a mobile friendly web app.

To learn more about how SAS Viya integrates with Microsoft, check out our white paper SAS and Microsoft: Shaping the future of AI and analytics in the cloud.

Transforming Your Business With SAS® Viya® on Microsoft Azure was published on SAS Users.

8月 282021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dig in with the table.columnInfo action.

In this post we'll l look at exploring the column attributes of a CAS table. Knowing the column names, data types, and any additional formats or labels associated with the columns makes it easier to work with the data. One way to see this type of information on a CAS table is to use the table.columnInfo CAS action!

In this example, I will use the CAS procedure to execute the columnInfo action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language.

Return CAS table Column Information

To view the column information of your CAS table, use the columnInfo CAS action with the table parameter. That's it! Refer to the code below.

proc cas;
    table.columnInfo / table={name="cars", caslib="casuser"};
quit;

The results would appear similar to the following:

table.columnInfo CAS action results

and return a variety of information about each column in the CAS table:

  • the column name
  • a label if one has been applied
  • the Id, which indicates the position of the column in the table
  • the data type of the column
  • the column length
  • the format, formatted length, width and decimal

Create a CSV Data Dictionary Using the ColumnInfo Action

What if instead, you want to create a data dictionary documenting the CAS table? With the columnInfo action you can export the results to a CSV file!

I'll use the columnInfo action again, but this time I'll store the results in a variable. The variable is a dictionary, so I need to reference the dictionary ci, then the columnInfo key to access the table. Next, I'll create two computed columns using the compute operator. The first column contains the name of the CAS table, and the second, the caslib of the table. I'll print the new table to confirm the results.

proc cas;
    table.columnInfo result=ci / table={name="cars", caslib="casuser"};
 
    ciTbl = ci.ColumnInfo.compute({"TableName","Table Name"}, "cars")
                         .compute({"Caslib"}, "casuser");
 
    print ciTbl;
    ...

The code produces the folloiwng result:

The new result table documents the CAS table columns, table name and caslib.

Lastly, I'll export the result table to a CSV file. First, I'll specify the folder location using the outpath variable. Then use the SAVERESULT statement to save the result table as a CSV file named carsDataDictionary.csv.

    ...
    outpath="specify a folder location";
    saveresult ciTbl csv=outpath || "carsDataDictionary.csv";
quit;

 

After I execute the CAS procedure I can find and open the CSV file to view the documented CAS table!

Summary

The table.columnInfo CAS action is a simple and easy way to show column information about your distributed CAS table. Using the results of the action allow you to create a data dictionary in a variety of formats.

Additional resources

table.columnInfo CAS action
CAS Action! - a series on fundamentals
SAS® Cloud Analytic Services: Fundamentals
CASL Result Tables
SAVERESULT Statement
Code

CAS-Action! Show me the ColumnInfo! was published on SAS Users.

8月 242021
 

Editor's note: Jack Liu is a member of SAS Analytics Explorers, a SAS community that is dedicated to exploring analytics, sharing knowledge, having fun and helping SAS users in their careers. Members were recently invited to share their analytics journeys, and Jack responded with his impressive story. If you're a loyal SAS advocate and you love analytics, check out SAS Analytics Explorers as a way to connect with others!

Inspired to find a better way of working

Unlike many members of the SAS Analytics Explorers community, I don't have a background in statistics or data science. Arriving in Australia in 2007 as an international student, I studied at Griffith University on the Gold Coast majoring in Hotel Management. In the years to follow, I worked my way up from a casual uni student housekeeper to the assistant front office manager at a 5-star hotel in downtown Brisbane.

It was at that point I felt somewhat burnt out in the hospitality industry. Not because I was trying to smile for every guest every day, but because of how manually things were done in the hotel. A simple guest restaurant bill reconciliation that should be done in a few seconds with a proper automated system in place, was instead done by two people spending hours matching pages of printed receipts against a printed spreadsheet.

Thinking like an analyst

So, in 2015, after a three month holiday, I left my hotel career behind and joined Bupa Australia in their Health Insurance division. As I was switching sectors at that point, I started from the bottom to better understand health insurance. I started working at Bupa as a junior contact centre consultant after weeks of training. With my strong customer service focus and large-scale operational experience gained in my hotel days, it didn't take me long to progress into Bupa's national contact centre operations team. It was there that I felt déjà vu, where many things that could be done by machines were instead done by people. My data and reporting assignments made me consider whether our analytics could be done better.

With this curious mindset, I joined the Business Integration team, where I learnt how to use SAS and other reporting tools, and how to think like a data analyst. Fast forward to today -- I'm now working for Bupa Asia Pacific as a Senior Resource Planning and Reporting Analyst. As part of the Workforce Optimisation team, I gather and analyse data related to customer demand in contact centres and retail stores. I use that data to provide insights and reports about staffing levels and performance at each engagement point with our members.

Many might say my career path is a bit all over the place in the last ten years, but wherever I am, I have always been chasing better performance, improved productivity, automation and insights, and using those to make people happier -- be that as hotel guests, insurance members or stakeholders.

SAS products are used in many parts of Bupa's business. In my current portfolio, I mainly use SAS Studio and SAS Enterprise Guide to conduct data mining, data cleansing, analytics, and process automation. One thing I particularly like about SAS Enterprise Guide is how user friendly the point-and-click system is. It makes it easy to teach and train people with no data background.

Being a good "citizen"

The concept of 'data citizen' was mentioned many times at SAS Global Forum this year. As someone without a data background, I believe strongly that the more people I can help to establish a data mindset in an organisation, the better it will be for the organisation’s future. For users like myself, the first step into the data world can be very intimidating if you start with coding in SQL, Python or R. However, with SAS Enterprise Guide, the users are put at ease with a familiar desktop application experience. There are also many high quality tutorials on the SAS Users YouTube channel. Almost any question can be answered with a quick Google search -- almost always leading to the SAS Community (https://communities.sas.com/).

On top of all these great resources, there is our beloved SAS Analytics Explorers platform. Not only has it gathered a great number of fun activities, it has also provided rewarding points to keep users engaged, and the leaderboard is something that people can be proud of. From time to time I would hear people telling me that learning data and analytics at the beginning is hard work, and while that is true to an extent, SAS Analytics Explorers has certainly made it more interesting and would put a smile on your face watching the points rolling up after a hard day at work.

LOVE SAS? | JOIN SAS ANALYTICS EXPLORERS

Throughout the years, my team and I have conquered many challenges using SAS products, providing data, analytics, reports and insights to the business. I'm inspired by "Curiosity Forever". It is the people, not any technical problem, that motivates me to keep exploring. I love the Chinese proverb that says "Give a man a fish you feed him for a day. Teach a man to fish and you feed him for a lifetime." My curiosity in recent years has driven me to nurture the next generation of data citizens.

Working to inspire others

Countless times, I've seen my fellow data analysts work for days non-stop to be that "knight in shining armour" who saves the day with answers and insights. While we feel proud and appreciated as part of the data community, I would rather not be the hero. At the end of the day, the more people with a data mindset that we have in the business, the easier it will be for us to leap into a data-driven future. Having this in my mind, I have been curious about how to get more people engaged with the data community, especially those without a data or statistics background. And for that, I think SAS Enterprise Guide is a perfect platform. It is quick to learn and easy to use, and new users can get to a visualisation of basic data relatively easily, helping them to feel a sense of achievement and be motivated to keep going.

Introducing people into the data world to become future data citizens has been one of my biggest achievements in the last few years. I have personally introduced six colleagues without a data background to SAS Enterprise Guide -- and my team has helped dozens of people here at Bupa on their journey of becoming future data citizens.

I'll end with a quote I admire from President Obama, "Change is never easy, but always possible." Have a “Curiosity Forever” mindset and explore what's possible. Good Luck!

My SAS journey: from data citizen to analytics "explorer" was published on SAS Users.

8月 232021
 

Illustration for trimming leading characters in SAS stringsAs in many other programming languages, there is a very useful SAS function that removes leading blanks in character strings. It is the ubiquitous LEFT function.

The LEFT(x) function left-aligns a character string x, which effectively removes leading blanks.

However, in many SAS applications we need a similar but more versatile data cleansing functionality allowing for removal of other leading characters, not just blanks. For example, consider some bank account numbers that are stored as the following character strings:

123456789
0123456789
000123456789

These strings represent the same account number recorded with either no, one, or several leading zeros. One way of standardizing this data is by removing the leading 0's. And while we're at it, why don’t we address the leading character removal functionality for any leading characters, not just zeros.

How to remove any leading characters

For example, let’s remove all occurrences of the arbitrary leading character '*'. The following diagram illustrates what we are going to achieve and how:

In order to remove a specified character (in this example '*') from all leading positions in a string, we need to search our string from left to right and find the position of the first character in that string that is not equal to the specified character. In this case, it’s a blank character in position 4. Then we can extract a substring starting from that position till the end of the string.

I can see two possible solutions.

Solution 1: Using VERIFY() function

The VERIFY (X, C) function searches string X from left to right and returns the position P of the first character that does not appear in the value of C.

Then we can apply the SUBSTR(X,P) function that extracts a substring of X starting from position P till the end of the string X.

Solution 2: Using FINDC() function


The FINDC(X, C, ‘K’) function also searches string X from left to right and returns the position P of the first character that does not appear in C. (The modifier ‘K’ switches the default behavior of searching for any character that appears in C to searching for any character that does not appear in C.)

Then, as with the VERIFY() function, we can apply the SUBSTR(X,P) function that extracts a substring of X starting from position P till the end of the string X.

Special considerations

So far so good, and everything will be just hunky-dory, right? Not really - unless we cover our bases by handling edge cases.

Have we thought of what would happen if our string X consisted of all '*' characters and nothing else? In this special case, both the verify() function and findc() function will find no position of the character that is not equal to '*' and thus return 0.

However, 0 is not a valid second argument value for the SUBSTR(X,P) function. Valid values are 1 . . . through length(X). Having a 0 value for the second argument will trigger the automatic data step variable _ERROR_=1 and the following note generated in the SAS log:

NOTE: Invalid second argument to function SUBSTR at line ## column #.

Therefore, we need to handle this special case separately, conditionally using SUBSTR(X,P) for P>0 and assigning blank ('') otherwise.

Code implementation for removing leading characters

Let’s put everything together. First, we'll create a test data table:

data TEST;
   input X $ 1-20;
   datalines;
*** It's done*
*********
**01234*ABC**
No leading *'s
;

Then we apply the logic described above. The following DATA step illustrates our two implemented coding solutions for removing leading characters:

data CLEAN (keep=X Y Z);
   set TEST;
   C = '*'; *<- leading character(s) to be removed;
 
   P1 = verify(X,C); *<- Solution 1;
   if P1 then Y = substr(X, P1);    
         else Y = '';
 
   P2 = findc(X,C,'K'); *<- Solution 2;
   if P2 then Z = substr(X, P2); 
         else Z = '';
 
   put _n_= / X= / P1= / Y= / P2= / Z= /;
run;

Alternatively, we can replace the IF-THEN-ELSE construct with this IFC() function one-liner:

data CLEAN (keep=X Y Z);
   set TEST;
   C='*'; *<- leading character(s) to be removed;
 
   P1 = verify(X,C); *<- Solution 1;
   Y = ifc(P1, substr(X, P1), '');
 
   P2 = findc(X,C,'K'); *<- Solution 2;
   Z = ifc(P2, substr(X, P2), '');
 
   put _n_= / X= / P1= / Y= / P2= / Z= /;
run;

The SAS log will show interim and final results by the DATA step iterations:

_N_=1
X=*** It's done*
P1=4
Y=It's done*
P2=4
Z=It's done*
 
_N_=2
X=*********
P1=10
Y=
P2=10
Z=
 
_N_=3
X=**01234*ABC**
P1=3
Y=01234*ABC**
P2=3
Z=01234*ABC**
 
_N_=4
X=No leading *'s
P1=1
Y=No leading *'s
P2=1
Z=No leading *'s

Here is the output data table CLEAN showing the original string X, and resulting strings Y (solution 1) and Z (solution 2) side by side:
Removing any leading characters in SAS strings
As you can see, both solutions (1 & 2) produce identical results.

Conclusion

Compared to the LEFT() function, the solution presented in this blog post not only expands leading character removal/cleansing functionality beyond the blank character exclusively. Using this coding technique we can simultaneously remove a variety of leading characters (including but not limited to blank). For example, if we have a string X=' 0.000 12345' and specify C = ' 0.' (the order of characters listed within the value of C does not matter), then all three characters ' ', '0', and '.' will be removed from all leading positions of X. The resulting string will be Y='12345'.

Additional resources

Questions? Thoughts? Comments?

Do you find this post useful? Do you have questions, concerns, comments? Please share with us below.

Removing leading characters from SAS strings was published on SAS Users.

8月 202021
 

This article was co-written by Marinela Profi, Product Marketing Manger for AI, Data Science and Open-Source. Check out her blog profile for more information.

Artificial Intelligence (AI) is changing the way people and organizations improve decision-making and move about their lives – from text translation, to chatbots and predictive analytics. However, many organizations are struggling to realize its potential as model deployment processes remain disconnected, creating unforeseen headaches and manual work. Additionally, other requirements like performance monitoring, retraining, and integration into core business processes must be streamlined for optimal teamwork and resource usage.

SAS and Microsoft are partnering to inspire greater trust and confidence in every decision, by driving innovation and proven AI in the cloud. With a combined product roadmap, SAS and Microsoft are working tirelessly to improve offerings and connectivity between SAS Viya and Microsoft Azure environments across industries. That’s why we are especially excited to announce SAS Viya users can now publish SAS and open-source models in Azure Machine Learning.

The SAS and Microsoft team built a tightly integrated connection between SAS Model Manager and Azure Machine Learning to register, validate, and deploy SAS and open-source models to Azure Machine Learning with just a few clicks. From there, data scientists can enrich their applications with SAS or open-source models within their Azure environment.

This integration will enable users to:

1) Extend SAS models stored in SAS Model Manager into the Azure Machine Learning registry, offering more opportunities for collaboration across the enterprise.

2) Deploy SAS and open-source models from SAS Model Manager to Azure Machine Learning on the same Azure Kubernetes cluster you have already set up in Azure Machine Learning. Before deploying the model, you can validate the model and ensure it meets your criteria.

3) Seamlessly connect your SAS Viya and Microsoft environments without the hassle of verifying multiple licenses with single sign-on authentication via Azure Active Directory (Azure AD).

Get started

Step 1: To get started, use Azure AD for simplified SAS Viya access.

Step 2: SAS Model Manager governs, deploys, and monitors all types of SAS and open-source models (i.e., Python, R). On the home page, you can see the projects you and your team are working on in addition to “What’s new” and “How to” videos with the latest updates.

Step 3: Compare different models to identify the most accurate “champion model.” Deploy the model throughout the Microsoft ecosystem from cloud to edge with customizable runtimes, centralized monitoring, and management capabilities.

Step 4: Using the provided artifacts, Azure Machine Learning creates executable containers supporting SAS and open-source models. You can use the endpoints created through model deployment for the scoring of the data.

Step 5: Schedule SAS Model Manager to detect model drift and automatically retrain models in case of poor performance or bias detection.

Discover more

If you want to know more about SAS Model Manager and our partnership with Microsoft, check out the resources below:

“What’s New with SAS Model Manager” article series to find out the latest and greatest updates.
SAS Viya on Azure to solve 100 different use cases on Data for Good, industries and startups.

Let us know what you think!

We would love to hear what you think about this new experience and how we can improve it. If you have any feedback for the team, please share your thoughts and ideas in the comments section below.

Deploying SAS and open-source models to Azure Machine Learning has never been easier was published on SAS Users.

8月 192021
 

In Part 1 of my series fetch CAS, fetch!, I executed the fetch CAS action to return rows from a CAS table. That was great, but what can you do with the results? Maybe you want to create a visualization that includes the top five cars by MSRP for all Toyota vehicles? How can we accomplish this task? We'll cover this question and provide several other examples in this post.

Save the results of a CAS action as a SAS data set

First, execute the table.fetch CAS action on the CARS in-memory table to filter for Toyota cars, return the Make, Model and MSRP columns, and sort the results by MSRP. Then save the results of the action in a variable using the results option. The results of an action return a dictionary to the client. The fetch action returns a dictionary with a single key, and the result table as the value. In this example, I'll name the variable toyota.

proc cas;
    table.fetch result=toyota / 
          table={name="cars", caslib="casuser",
                 where="Make='Toyota'",
                 vars={"Make","Model","MSRP"}
          },
          sortBy={
                 {name="MSRP", order="DESCENDING"}
          },
          index=FALSE,
          to=5;
...

After executing the code, the results of the action are stored in the variable toyota and not shown in the output.

Next, use the SAVERESULT statement to save the result table stored in the toyota variable. Since the variable is a dictionary, specify the variable name toyota, a dot, then the fetch key. This will access the result table from the dictionary. Finally, specify the DATAOUT= option with the name of the SAS data set to create.

proc cas;
    table.fetch result=toyota / 
          table={name="cars", caslib="casuser",
                 where="Make='Toyota'",
                 vars={"Make","Model","MSRP"}
          },
          sortBy={
                 {name="MSRP", order="DESCENDING"}
          },
          index=FALSE,
          to=5;
 
     saveresult toyota.fetch dataout=work.top5;
quit;

After executing the code, the result table is saved as a SAS data set. The SAS data set is named top5 and saved to the WORK library.

 

 

Wondering what else can we do? Let's take a look.

Visualize the SAS data set

Now that the result table is saved as a SAS data set, you can use the SGPLOT procedure to create a bar chart! Consider the code below.

title justify=left height=14pt "Top 5 Toyota Cars by MSRP";
proc sgplot data=work.top5
         noborder nowall;
    vbar Model / 
          response=MSRP 
          categoryorder=respdesc
          nooutline
          fillattrs=(color=cx0379cd);
    label MSRP="MSRP";
quit;
title;

There it is! We processed our data in CAS using the fetch action, returned a smaller subset of results back to the client, then used traditional SAS programming techniques on the smaller table. This method will work similarly in other languages like Python and R. Then you can then use the native visualization packages of the language!

You can now use your imagination on what else to do with the raw data from the CARS table or from the top5 results table we produced with the table.fetch action. Feel free to get creative.

Summary

CAS actions are optimized to run in a distributed environment on extremely large tables. Your CAS table can contain millions or even billions of rows. Since the data in CAS can be extremely large, the goal is to process and subset the table on the CAS server, then return a smaller amount of data to the client for additional processing, visualization or modeling.

Additional resources

fetch Action
SAVERESULT Statement
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code used in this post

CAS-Action! fetch CAS, fetch! - Part 2 was published on SAS Users.