SAS Viya

May 30, 2023
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In this post I'll discuss how to load multiple CSV files into memory as a single table using the loadTable action.

Load and prepare data on the CAS server

To start, we need to create multiple CSV files in a folder. I created this script to load and prepare the CSV files in the CAS server for this post. This should work in your environment. The script does the following:

  • Loads the WARRANTY_CLAIMS_0117.sashdat file from the Samples caslib into memory.
  • Modifies the in-memory table by renaming and dropping columns.
  • Adds a subdirectory in the Casuser caslib named csv_file_blogs.
  • Saves a CSV file for each distinct year in the csv_file_blogs folder.
  • Drops the original CAS table.
import swat
import pandas as pd
 
## Connect to CAS
conn = swat.CAS(host, port, username, password) ## Enter your CAS connection information
 
##
## Data prep functions 
##
def prep_data():
    """
    Load and prepare the WARRANTY_CLAIMS_0117.sashdat file in CAS
    """
    ## Load the WARRANTY_CLAIMS_0117.sashdat from the Samples caslib into memory in Casuser
    conn.loadTable(path='WARRANTY_CLAIMS_0117.sashdat', caslib='samples',
                   casout={'name':'warranty_claims', 
                           'caslib':'casuser',
                           'replace':True})
    ##
    ## DATA PREP
    ## 
 
    ## Reference the CAS table in an object
    castbl = conn.CASTable('warranty_claims', caslib = 'casuser')
 
    ## Store the column names and labels in a dataframe
    df_col_names = castbl.columnInfo()['ColumnInfo'].loc[:,['Column','Label']]
 
    ## Create a list of dictionaries of how to rename each column using the column labels
    renameColumns = []
    for row in df_col_names.iterrows():
        colName = row[1].values[0]
        labelName = row[1].values[1].replace(' ','_')
        renameColumns.append(dict(name=colName, rename=labelName))
 
    ## List of columns to keep in the CAS table
    keepColumns = {'Campaign_Type', 'Platform','Trim_Level','Make','Model_Year','Engine_Model',
                   'Vehicle_Assembly_Plant','Claim_Repair_Start_Date', 'Claim_Repair_End_Date'}
 
    ## Rename and drop columns to make the table easier to use
    castbl.alterTable(columns = renameColumns, keep = keepColumns)
 
    ## Return the CASTable object reference
    return castbl
 
 
 
def save_cas_table_as_csv_files(cas_table_reference):
    """
    Create a subdirectory in Casuser and save multiple CSV files in it.
    """
    ## Create a subdirectory in the Casuser caslib named csv_file_blogs
    conn.addCaslibSubdir(name = 'casuser', path = 'csv_file_blogs')
 
    ## Create a CSV file for each year
    for year in list(cas_table_reference.Model_Year.unique()):
        (cas_table_reference
         .query(f"Model_Year ='{year}'")
         .save(name = f'csv_file_blogs/warranty_claims_{year}.csv', 
               caslib = 'casuser',
               replace = True)
        )
 
    ## Drop the CAS Table
    cas_table_reference.dropTable()
 
    ## View files in the csv_file_blogs subdirectory
    fi = conn.fileInfo(allFiles = True, caslib = 'casuser')
    fi_subdir = conn.fileInfo(path = 'csv_file_blogs', caslib = 'casuser')
    display(fi, fi_subdir)
 
 
## Create the CAS table
castbl = prep_data()
 
## Save the CAS table as a CSV file for each year
save_cas_table_as_csv_files(castbl)
 
## and the results
NOTE: Cloud Analytic Services made the file WARRANTY_CLAIMS_0117.sashdat available as table WARRANTY_CLAIMS in caslib CASUSER(Peter).
NOTE: Cloud Analytic Services saved the file csv_file_blogs/warranty_claims_2015.csv in caslib CASUSER(Peter).
NOTE: Cloud Analytic Services saved the file csv_file_blogs/warranty_claims_2016.csv in caslib CASUSER(Peter).
NOTE: Cloud Analytic Services saved the file csv_file_blogs/warranty_claims_2017.csv in caslib CASUSER(Peter).
NOTE: Cloud Analytic Services saved the file csv_file_blogs/warranty_claims_2018.csv in caslib CASUSER(Peter).
NOTE: Cloud Analytic Services saved the file csv_file_blogs/warranty_claims_2019.csv in caslib CASUSER(Peter).
NOTE: Cloud Analytic Services dropped table warranty_claims from caslib CASUSER(Peter.Styliadis).

The results show that five CSV files named warranty_claims_<year>.csv were created in the csv_file_blogs subdirectory of the Casuser caslib.

Next, I'll use the fileInfo action to view the new csv_file_blogs subdirectory in the Casuser caslib. In the fileInfo action, use the includeDirectories parameter to include subdirectories in the results.

conn.fileInfo(includeDirectories = True, caslib = 'casuser')

Lastly, I'll view the available files in the csv_file_blogs subdirectory. To view files in a caslib subdirectory, specify the folder name in the path parameter.

conn.fileInfo(path = 'csv_file_blogs', caslib = 'casuser')

The results show that the subdirectory has five CSV files.

Load all of the CSV files as a single table

To easily load multiple CSV files as a single in-memory CAS table use the table.loadTable CAS action. The only requirements when loading multiple CSV files:

  • All of the CSV files must end with .csv.
  • Set the multiFile option to True in the importOptions parameter.
  • The CSV files must have the same number of columns and the columns must have the same data type.

Here, I'll use the loadTable action with the path parameter to specify the subdirectory csv_file_blogs and the caslib parameter to specify the Casuser caslib. The importOptions parameter uses the fileType option to determine the other parameters that apply. The CSV file type has a multiFile option. If you set the multiFile option to True, it will try to load all CSV files from the path into a single in-memory table. The casOut parameter simply specifies output CAS table information.

conn.loadTable(path="csv_file_blogs", caslib = 'casuser',  ## Specify the subdirectory name (csv_file_blogs) and the input caslib name
               importOptions = {                           ## Specify the import options
                   'fileType' : 'CSV',
                   'multiFile' : True
               },
               casOut = {                                  ## Specify output CAS table information
                   'name' : 'allCSVFiles',
                   'caslib' : 'casuser',
                   'replace' : True
               })
 
## and the results
NOTE: The file, '/cas/data/caslibs/casuserlibraries/peter/csv_file_blogs/warranty_claims_2015.csv' was used to create the CAS Table column names.
NOTE: The CSV file table load for table, 'allCSVFiles' produced 153217 rows from 5 files.
NOTE: Cloud Analytic Services made the file csv_file_blogs available as table ALLCSVFILES in caslib CASUSER(Peter).

The action concatenated each CSV file and loaded them as a single distributed CAS table named ALLCSVFILES.

Next, I'll run the tableInfo action to view available in-memory tables in the Casuser caslib.

conn.tableInfo(caslib = 'casuser')

The action results show one CAS table is in memory.

Then I'll make a client-side reference to the distributed CAS table and run the head method from the SWAT package.

allcsvfilesTbl = conn.CASTable('allcsvfiles', caslib = 'casuser')
allcsvfilesTbl.head()

Finally, I'll run the SWAT value_counts method on the Model_Year column.

(allcsvfilesTbl   ## CAS table
 .Model_Year      ## CAS column
 .value_counts()  ## SWAT value_counts method
)
## and the results
2017.0    70479
2018.0    43975
2016.0    32707
2019.0     3510
2015.0     2546
dtype: int64

The results show all five years of data, one year per CSV file, were imported into the CAS table.

Add file name and path columns to the CAS table

The importOptions parameter has a variety of CSV options you can use to modify how to import the files. Two useful options are showFile and showPath. The showFile option includes a column that shows the CSV file name. The showPath option includes a column that shows the fully-qualified path to the CSV file.

conn.loadTable(path="csv_file_blogs", caslib = 'casuser',  ## Specify the subdirectory name (csv_file_blogs) and the input caslib name
               importOptions = {                           ## Specify the import options
                   'fileType' : 'CSV',
                   'multiFile' : True,                     
                   'showFile' : True,
                   'showPath' : True
               },
               casOut = {                                  ## Specify output CAS table information
                   'name' : 'allCSVFiles_path_info',
                   'caslib' : 'casuser',
                   'replace' : True
               })
 
## and the results
NOTE: The file, '/cas/data/caslibs/casuserlibraries/peter/csv_file_blogs/warranty_claims_2015.csv' was used to create the CAS Table column names.
NOTE: The CSV file table load for table, 'allCSVFiles_path_info' produced 153217 rows from 5 files.
NOTE: Cloud Analytic Services made the file csv_file_blogs available as table ALLCSVFILES_PATH_INFO in caslib CASUSER(Peter).

I'll run the tableInfo action to view available CAS tables.

conn.tableInfo(caslib = 'casuser')


The action shows that two CAS tables are now in memory.

I'll reference and preview the new CAS table.

allcsvfiles_path_infoTbl = conn.CASTable('allcsvfiles_path_info', caslib = 'casuser')
allcsvfiles_path_infoTbl.head()

The preview shows the new CAS table has a column named path and a column named fileName.
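
If you'd rather list the columns programmatically than eyeball the preview, the columnInfo action (the same action used in the data prep script earlier) returns one row per column:

allcsvfiles_path_infoTbl.columnInfo()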

Lastly, I'll use the SWAT value_counts method to view how many rows came from each CSV file. I'll specify the CAS table reference, the column name, then the value_counts method.

(allcsvfiles_path_infoTbl
 .fileName
 .value_counts()
)
 
## and the results
warranty_claims_2017.csv    70479
warranty_claims_2018.csv    43975
warranty_claims_2016.csv    32707
warranty_claims_2019.csv     3510
warranty_claims_2015.csv     2546
dtype: int64

The results show the CSV files were concatenated into a single CAS table. We can see how many rows came from each file.
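
The fileName column also makes it easy to work with the rows from a single file. Here's a quick sketch using the query method from earlier; the file name below is one of the five created above:

(allcsvfiles_path_infoTbl
 .query("fileName = 'warranty_claims_2019.csv'")  ## Filter for rows from one CSV file
 .head())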

Summary

The SWAT package blends the world of pandas and CAS to process your distributed data. In this example I focused on using the table.loadTable CAS action to concatenate multiple CSV files into a single distributed CAS table with a single action call.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 19 - Load Multiple CSV Files was published on SAS Users.

May 27, 2023
 

This year, SAS Innovate in Orlando brought executives and other industry luminaries together to inspire attendees to use analytics to outpace tomorrow. "Data and analytics enables you to get to the true value of a human being," said Moneyball author Michael Lewis. Chief Marketing Officer Jenn Chase, who hosted the event, [...]

A look back at SAS Innovate in Orlando was published on SAS Voices by Caslee Sims

May 22, 2023
 

The shift to cloud computing has dramatically transformed how organizations and businesses store, manage and analyze their data. The benefits of this shift are well-known: Moving to the cloud increases the potential to gain more value from their data through faster access, greater scalability and cost-efficiency. All plain sailing, right? Not [...]

3 facts about cloud economics that could help your organization was published on SAS Voices by Spiros Potamitis

May 9, 2023
 

Humanity collectively creates 2.5 quintillion bytes of data daily, presenting incredible opportunities for organizations—especially those who collect it in the cloud. Cloud-based data offers excellent insights for organizations, including a meaningful look at customer needs and operational improvements. However, these benefits come with risks, especially regarding security. According to IBM [...]

How SAS and Microsoft are using the power of partners to protect data in the cloud was published on SAS Voices by Albert Qian

May 5, 2023
 

Oftentimes, the very thought of moving house is worse than the reality of moving. Sometimes we know we need more space, a bigger garden or a faster commute to work but we delay deciding because it just seems too hard. I see parallels with this every time I’ve moved. I delay the decision as I don’t want the stress and hassle, but eventually circumstances force my hand and a move often must happen without the luxury of time. However, if I’ve done the necessary planning and preparation, the move tends to go smoothly and then I start reaping the benefits and wondering why I didn’t do it earlier.

There are lessons here for when we think about moving data and analytics platforms to the cloud. The technology world never stands still, and transformation and change are constant. With that in mind, spending some time thinking about the why, what, how and when of a move to the cloud and Viya 4 is a worthwhile activity. Even if the answer is not to move right now the exercise will provide useful assets that will help when the time is right.

This article sets out a series of high-level considerations to make sure you start the journey to the cloud and Viya 4 on the right foot.

Document your current challenges

A good place to start is to take stock of where you are today. What are the key challenges with your current platform, and are there emerging corporate strategies around cloud, data or the open-source ecosystem that will need addressing in the future?

A typical list might look something like this:

  • Concerns on costs of data storage
  • CAPEX cost on infrastructure
  • Cost and operational risks of obsolete hardware / operating system
  • IT Strategy requires the use of Cloud computing when modernising IT portfolios
  • Increased infrastructure costs due to poor visibility of existing workloads and inefficient job allocation
  • Organizational and Data silos that limit re-use of analytics assets
  • Analytics teams lacking the productivity to meet demand from the business
  • Lack of governance and oversight on analytical models

Having these challenges documented provides a baseline for a future business case that can assess the benefits of moving to the cloud and Viya 4.

Understand your organisation's motivation for change

Motivation for change can vary between organisations. There will be key outcomes that you wish to meet when considering why you should move to the cloud and Viya 4. A better understanding of this lets your organisation align behind the key drivers and provides the focus points for any business case. Motivations might fall into the following areas:

Business Agility

  • Migrating to the Cloud and Viya 4 helps increase your overall operational agility. This lets you react to market conditions more quickly through activities such as expanding into new markets, selling lines of your business, and acquiring available assets that offer a competitive advantage.
  • The traditional deployment timescales and effort for SAS environments are no longer a constraint. Moving the SAS estate to the cloud and Viya 4 lets you flex and take advantage of cloud-native storage scalability, deployment, and update patterns.

Operational Costs

  • Key considerations of operational costs are the unit price of infrastructure, the ability to match supply and demand, and having transparency over infrastructure spending.
  • Data and analytics platforms can be expensive to run and maintain, with storage typically 80% of the overall infrastructure cost. Simple measures, including rightsizing infrastructure and introducing alternative storage options, let you review storage technologies against business needs, matching SLA and performance requirements with operational costs.

Workforce Productivity

  • Typically, when moving to the cloud productivity is increased by two key factors: not having to wait for infrastructure and having access to the breadth and depth of cloud services that you would otherwise have to build and maintain. In fact, it is common to see workforce productivity improvements of 30-50% following a large migration.
  • Automation significantly reduces SAS deployment effort and timescales, and the SAS Viya low-code/no-code interfaces boost data engineers' and data scientists' productivity.

Cost Avoidance

  • Eliminating the need for hardware refresh and constant maintenance programs is the key contributor to cost avoidance. Most organisations now have no interest in the cost and effort required to execute a big on-premises refresh cycle or a data centre renewal. Moving the SAS estate to the cloud and Viya 4 allows you to realise these benefits more quickly.

Operational Resilience

  • The ability to provide services and meet compliance and availability requirements gives more options to solve traditional issues such as lifecycle management, performance and availability, and helps ensure business continuity and disaster recovery.
  • On-premises SAS deployments can sometimes be maintained and operated poorly, with organisations scared to make operational changes to their environments for fear of introducing business risk. A move to the cloud and the Viya 4 cloud-native architecture can help deliver a fresh opportunity to ensure resilience across infrastructure, software and data.

Map out the high-level benefits of moving

By spending some time analysing the current challenges and the key motivations for a change, it is possible to produce a prioritised list of benefits and recommendations that are the most relevant to your organisation. An example might look like the following:

Each entry below gives the rank, the business priority, the benefit, and how the benefit can be realised with Cloud & Viya:

  1. High: Secure, governed and compliant analytics. Migration decisions can be triggered by security vulnerabilities that can't be effectively patched and hardware that has become obsolete and reached the end of its useful life. These concerns can be removed by moving to SAS Viya and phasing out its on-premises support.
  2. High: Transparent and trusted analytics. Responsible AI and the overall management of analytics are becoming more important to organisations with the increased adoption of analytics, in part due to the negative impact on a company's reputation that can result from poor management of analytics. SAS Viya allows you to utilise frameworks such as ModelOps to achieve the appropriate oversight of models and decisions.
  3. High: Reduce data storage costs and increase processing speed by using cloud-native storage. There is a much wider set of data storage options that can be exploited within cloud-native architectures, and these can provide a better balance between cost, availability and performance; for example, object storage (S3, ADLS Gen2, etc.) or cloud-native databases (e.g., Synapse, Snowflake, Redshift).
  4. Medium: Increase analytics productivity and ROI. By allowing users the choice of language and access to low-code/no-code interfaces with which to develop analytics, productivity will rise. Additionally, highly skilled data scientists can spend more time on the tasks that require their attention, because manual, repeated tasks can be automated as part of the open platform.
  5. Medium: Future-proof IT infrastructure and processes. SAS Viya supports modern ModelOps processes with cloud-native software deployment techniques: a microservice architecture that facilitates fail-over support and reduces system downtime; modern, open APIs that are readily available to interact with other software for wider ecosystem integration; and a container-based architecture that results in a scalable environment, easily transferable across public, private and on-premises deployments in a cost-effective manner.
  6. Medium: Address skills and resources shortages. The right talent is defined by the technology available for people to use, as well as the ability to leverage the skills the customer already has internally. By providing an open platform that democratizes analytics, customers have the choice across all talent in the marketplace, whether Python, R or SAS data scientists or low-code/no-code business users.
  7. Medium: Leverage cloud and hosted services to reduce CAPEX. Organisations have a choice in how they deploy SAS to reduce CAPEX. SAS software can be provided through the SAS Cloud, public cloud or the infrastructure of choice, with remote managed services available to look after the platform.
  8. Low: Increase operational efficiency. By reducing the dependency on data scientists through democratizing analytics to support low-code/no-code users, organisations can spread the analytical requirements across a wider user base, reducing the need to pay for high-cost, specific skill sets.
  9. Low: Reduce technical debt and obsolescence. SAS Viya is a cloud-native technology that leverages CI/CD software updates. This means no need to migrate to new versions or take software outages for updates.
  10. Low: Oversight of analytics targeted to business outcomes. SAS Viya unifies model governance with a centrally managed repository that allows simple and efficient model deployment to a range of destinations, including containers and APIs. Organisations can centrally manage all their analytical models and related assets over time and compared to other models, while ensuring traceability and governance.

Next steps: getting into the detail

By following these considerations, you can create a baseline for any move to the cloud and Viya 4. The next step is to gather more detailed, granular information that allows you to establish the relative feasibility and complexity of the move as well as the outline approach and its associated costs.

Just like moving house, the process of moving to Viya can be broken down into a sequence of logical steps and considerations.

To help with gathering the necessary detail, SAS provides a rich content assessment framework that analyses your current environment. It creates a detailed inventory of SAS content (jobs, reports, models, OLAP cubes, stored processes, etc.) and even analyses SAS code to establish what can be moved to Viya as-is and what elements might need to change. The code analysis goes right down to the exact line numbers in individual programs that will need to be changed (e.g., identifying where it finds hard-coded paths) and can even automate the changes. Gathering this detailed information allows a comprehensive assessment to be created that can even be used to produce fixed-price proposals, providing certainty and reducing the risk of any future move. SAS can also help with establishing the business case for change. We can bring to bear TCO calculators and a variety of business value templates that can help accelerate the process of getting organizational buy-in for a move to the cloud and Viya 4.

Please contact your SAS account teams if you want more information and a further article on this topic will be released shortly.

Learn more

Getting your house in order: How to start planning for a move to the cloud and SAS Viya 4 was published on SAS Users.

May 4, 2023
 

How does the Content Assessment help with code?

Migrating to a new system can be expensive and intimidating as a SAS 9 environment has many components that need to be moved. This article will focus on the code elements in an environment and debunk the myth that SAS 9 code will not run in the latest version of Viya.

The SAS 9 Content Assessment is a collection of applications that are designed to help users understand the various characteristics of their deployment. SAS Content Assessment delivers information that is critical to helping you to upgrade to the latest version of SAS (Viya).

Across the UK&I business, we have completed over 20 content assessments spanning the banking, insurance, commercial and public sectors. We have reviewed over 3 million individual pieces of code containing over 1 billion lines. This code has been collected from Data Integration Jobs, Enterprise Guide Projects, Base SAS and Stored Processes.

Looking at all this data and using the content assessment tool allows clearer movement paths to be created, which means you can move sooner and at a lower cost.

So, what works in Viya?

The good thing to know is that most of your SAS code will work in the same way as it does today. SAS has over 40 years of experience with its language, and we appreciate that you use it to run important, critical processes. Therefore, we work hard to ensure that the latest version will run your old code.

In reviewing over 3 million pieces of code, we found that less than 1% needs to be updated for the new ways of working without a metadata server. About 9% of the 3 million pieces had hard-coded paths that would need to be changed. Hard coding is prevalent across all sites, and it affects all coding languages, new and old.

The next sections discuss the changes that need to be made to your code when an exception occurs.

So, what makes up the 1%?

The 1% highlighted across the 3 million pieces falls into the following categories:

XCMD


These have been around for many decades and have allowed great versatility, as you can execute OS commands within a SAS program. However, as security policies get tighter and tighter, the usage of XCMDs is coming under the spotlight. With that in mind, it's worth considering whether to take the opportunity of an upgrade to resolve your technical debt in this area. Many of the common OS XCMDs can be performed with SAS procedures today, so if you are moving, copying, deleting, and generally managing a file system, consider the SAS alternative.

Procedure

There are several procedures that simply won't work with Viya, because the concept of a metadata server no longer exists. A careful review of these commands needs to take place, with alternatives provided that allow the same functionality. In a lot of cases, the content assessment has been allowed to run against the SAS Config location, which means it picks up several hits based on background SAS code.

The procedures that won't work in Viya are either metadata-bound procedures or oldies which have now been replaced with more efficient procedures.

Libname (Metalib)

As we continue looking at what works, the next category is libnames. This might seem strange, as a libname is generally a path, and if you change the path then everything will work. The ones flagged here are METALIBs, and they won't work in Viya because they require a metadata server to assign the libname.


There are a few other libname engines that you might want to watch for.

Assignment

This category flags cases where the output of a command that is not supported is assigned to a value. A typical example is a metadata function being used to set an "rc" return code. In Viya this would fail, and therefore a warning is set on the report so that an alternative can be found.

If you are using metadata functions and assigning their output, they will be present on your report.

What about the 9% with hard coding issues?

Inevitably when people write code, they include paths that are hard coded and directly linked to the infrastructure they are processing on. It is always best, where possible, to make these paths dynamic and have macros generating the paths. This allows you to move to new hardware with limited changes. However, there is no way to mandate this best practice and hard-coded paths can end up in the top, middle and bottom of the code. We also appreciate that not all code is batch code and user-written code does not always need the same levels of control or best practices to be applied.

Some of the hard-coded items which are highlighted in the reports are:

Filename

A common example uses filename to send an email; other common ones are paths that are then used with infile statements to read raw files.

File

Here, a file statement is used to generate flat files, and the path would need to be changed.


Libname

One of the most common hard-coded paths is libname. This example is easier to change as the macro &DATA_PATH can be changed centrally in an autoexec. In most cases the physical path is present.

Path Assignment

This category is highlighted when a %let statement is used to reference a physical path. This is another good way of centralising paths, as the path can be changed once and then applied many times across multiple pieces of code.


%include

The process will also highlight any hard-coded paths which relate to %includes. These can be used to call other programs, such as assigning a core set of macros for a session.


Infile

Any references with infile will also be highlighted on the report.


Conclusion

This all sounds like quite a lot of change to get to Viya. Does SAS offer anything to help?

The quick answer is yes. SAS has years of experience in upgrading systems and helping customers move to the latest version. The content assessment tool also has a modified SAS code capability that allows source-to-target substitution of hard-coded paths. This means the level of work to change the 9% is reduced significantly. It could also be a great opportunity to think about centralising paths, making them dynamic and ultimately creating more portable code.

Learn more

Debunking the myth that SAS code doesn't work in Viya: One billion lines later and the results are... was published on SAS Users.

May 2, 2023
 

In 2019, I penned the article We’ve all gone to cloud -- but what about my SAS data(base)? At the time, containerised Viya (4) wasn’t out yet and integration with Microsoft was in its early days, so let’s revisit some of those topics and see what’s changed. There is a lot of lower-level detail out there, but I've taken writing this article as an opportunity to look at the wider landscape and the direction of our customers, partners and SAS as we move SAS applications to be cloud-native in Viya 4.

Before I dive into the details, it's worth framing this conversation with the caveat that I'm mainly considering the typical analytical platform data use cases in SAS, whether that's analytical data engineering (ETL), analytics or visualisation. I'll save the discussion on real-time and SOA-type applications of SAS for another time, as there's a lot to be said there too. This article is about the target repository for storage. A follow-up article is planned on how we get the data up to the cloud, as this has been a frequent question recently.

Let's first look at cloud-native database access and connectivity. SAS continues to enhance its support for cloud-native databases like Snowflake, Synapse and BigQuery. Updates include further pushdown of functions, wider support for output data types, a wider range of performance options (like MODE= and the ability to specify projects on BigQuery), and single sign-on for Snowflake on Azure. There is now extensive connectivity (and in-database) support for Spark, with support for Databricks on Azure, GCP and AWS as well as Microsoft's Synapse.

It is also important to remember that we support a wide range of traditional databases in their cloud-native form. For example, Google Cloud Platform Cloud SQL for SQL Server or Microsoft Azure SQL Server Big Data Clusters can be accessed via our SAS/ACCESS Interface to Microsoft SQL Server. This page provides a great summary of the 40+ cloud-native databases/stores we support, and SAS documentation has a good summary listing to aid usage and setup.

Customers will be happy to know that SAS has added direct connectivity to the parquet file format in SAS Compute (v9 compatible engine), similar to the support for ORC. This support was originally file-based only but now includes object storage for both AWS and GCP with Azure to follow in 2023. Why is this important? First of all, parquet has become the default object-based cloud storage choice for customers in the UKI based on my experience. Parquet is open so all your SAS, Python, Spark and other language programmers can simply share data, with no conversion necessary. On the SAS front, this means you can take existing v9 programs and map them directly to data stored in object storage just by ‘changing’ the libnames mapping when you run them in Viya. Parquet tables have a columnar format and can often be compressed smaller than compressed SAS datasets; I’ve often seen them at 50% of the size. However, there are still some advantages to SAS datasets. Items like SAS indexes, integrity constraints, etc. aren’t supported, so parquet is not a direct replacement for SAS datasets. This documentation page details the current restrictions on parquet support. Some of these restrictions are driven by parquet not supporting those features, other features like partitions are on SAS’ roadmap for future implementation.

Object storage on the cloud gives us another potential option. When we move a SAS Grid forward to Viya (this functionality is now called SAS Workload Management in Viya), data stored in object storage is available on all nodes just like it would be on a clustered file system. Again, you need to look at the performance characteristics and your workload to see if it will meet your requirements. The balance might be there, particularly if a lot of steps are more transitory, i.e. you source the data from S3 and some of your ETL builds into this shared area. Read this previous article on why I think the dynamics of the cloud change how you might design a SAS Grid when you move it to Viya.

A major improvement in Viya is the single native sign-on into the Azure database ecosystem, enabled through our co-engineering work with Microsoft. If I'm an end user, there's no need to store passwords when using FILENAMEs to access ADLS, or when connecting to databases like SQL Server or Synapse. For more details, have a look at this documentation page. Although the most advanced single sign-on features exist for SAS Viya on Microsoft Azure, SAS has added IAM integration points on AWS, for example with S3 and Redshift. Speaking to Product Management, the security integration improvements will continue further in AWS and GCP in the near future. Keep your eyes peeled for updates over this year!

One other key item to mention is SAS' recent work with SingleStore, which gives customers the option to add SingleStore as a dedicated analytics database. Customers get an independent data and analytics solution that works at scale and is completely portable, including on-premises. The foundation for this is tight integration between SAS Viya and SingleStore, which includes the tiering of storage costs, security, streaming into the database with instant analytics, and interactions/transfers to SAS such as the pushdown of SAS functionality into the database (specialist Visual Analytics interactions, for example). Because we OEM SingleStore, there is a single stop for technical queries, giving you accountability and access to world-class support. We already have customers deploying this and seeing considerable TCO reduction versus traditional best-in-breed solutions.

As two final footnotes: not everyone will be moving to the cloud in the immediate future, and SAS will continue its strong support for on-premises deployments, as illustrated by the support for Kubernetes on-site and for newer file formats like parquet in these deployments. Equally, if you've moved to the cloud but not all your SAS data has arrived, SAS Cloud Data Exchange is now available again in the 2023.3 release of Viya with some net new functionality.

Learn more

Moving SAS and its data to the cloud (AKA Viya): Making those tricky data decisions was published on SAS Users.

April 21, 2023
 

Public health agencies around the world are struggling to use the mountains of data at their disposal to be proactive against health threats to their constituents. Whether facing infectious diseases, systemic conditions or environmental challenges, public health personnel need tools to enhance their efficiency and unite their data toward a common goal.

These agencies are also being asked to protect their communities from a wider range of threats with fewer resources. SAS Viya has out-of-the-box tools that public health stakeholders need to collect their data, build robust disease surveillance reports and harness predictive analytics to serve their communities.

At SAS, our epidemiologists are laser-focused on empowering public health agencies to make informed decisions and take actions to protect against threats to human health. SAS is highly experienced at helping governments predict and quickly respond to crises, as well as manage everyday public health activities. SAS helps public health agencies quickly find answers and improve agency effectiveness, whether streamlining data ingestion using open source or creating dynamic visualizations for public use. In these short videos, we will demonstrate a few common and essential public health functions using SAS Viya. Our goal is to show the art of the possible within the realm of accessibility. We appreciate your feedback and Happy Exploring!

1. Create a Surveillance Report

Creating disease surveillance reports is a critical task for public health agencies. Learn how Viya 4 empowers public health personnel to efficiently build reliable reports.

2. Share a Surveillance Report

After developing the disease surveillance reports, there will be a variety of stakeholders who need to be able to receive the information. Using Viya 4 makes sharing these reports easier among stakeholders and constituents.

3. Create Layered Maps

Understanding where disease events are having an impact is crucial for crafting a response. Which communities are being affected and where do resources need to flow? Viya 4's capabilities include powerful, out-of-the-box geocoding tools for any environment.

Public Health Transformation: Ways to use SAS Viya was published on SAS Users.

April 20, 2023
 

The role of SAS administrators is crucial in upholding a company's analytics infrastructure. Their responsibility lies in safeguarding data, making it readily available, and ensuring the necessary personnel have access, which is why it was essential to simplify how administrators find the support and resources they need to be efficient in their roles.

SAS has redesigned the support page made just for administrators to provide them with a seamless experience to install and maintain their SAS environment. Make sure to bookmark this page as the resource to continually check.

Some of the new updates include...

➢ Receive the latest updates

Always stay informed about important details, like the latest new technology developments, software updates and patches.

➢ Ease of navigation

Quickly find the information you need, without having to spend hours searching through multiple resources.

➢ More learning and skills development opportunities

From video tutorials to detailed technical books, access a wide range of resources that will help you solve problems and learn new skills. Maintaining a growth mindset with continuous learning helps you stay ahead in your critical role.

➢ Connect with the community

Join a community of other like-minded administrators to ask questions, find solutions, share ideas, and learn new tips.

Explore the recently upgraded support site designed specifically for SAS administrators.

SAS Administrators: Check out the new and improved support site was published on SAS Users.

April 20, 2023
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In this post I'll discuss how to update rows in a distributed CAS table.

Load and prepare data in the CAS server

I created a script to load and prepare data in the CAS server. This should work in your environment. The script:

  • connects to the CAS server
  • loads the RAND_RETAILDEMO.sashdat file from the Samples caslib into memory in the Casuser caslib and names the CAS table rand_retaildemo
  • lower cases all column names
  • keeps specific columns
  • creates a new column named age_dup that is a duplicate of the age column
  • previews the new CAS table
## Packages
import swat
import os
import pandas as pd
pd.set_option('display.max_columns', 50)
import numpy as np
 
## General CAS connection syntax
conn = swat.CAS(host, port, username, password) ## Add your connection information
 
## Load the RAND_RETAILDEMO.sashdat file into memory on the CAS server
conn.loadTable(path = 'RAND_RETAILDEMO.sashdat', caslib = 'samples',
               casout = {
                      'name' : 'rand_retaildemo',
                      'caslib' : 'casuser',
                      'replace' : True
               })
 
## Reference the CAS table
retailTbl = conn.CASTable('rand_retaildemo', caslib = 'casuser')
 
## Create a copy of the table with a new column
(retailTbl
 .eval("age_dup = age", inplace = False)          ## create a duplicate of the age column
 .copyTable(casout = {'name':'rand_retaildemo',
                      'caslib':'casuser',
                      'replace':True})
)
 
 
## Create a list of columns to rename 
newColNames = [{'name':col,'rename':col.lower()} for col in retailTbl.columns.to_list()]
 
## List of columns to keep
keepColumns = ['custid','bucket','age','age_dup','loyalty_card','brand_name','channeltype','class']
 
## Rename and keep columns
retailTbl.alterTable(columns = newColNames, 
                     keep = keepColumns)
 
## Preview the new CAS table
display(retailTbl.shape, 
        retailTbl.tableDetails(),
        retailTbl.tableInfo(caslib = 'casuser'),
        retailTbl.head())
 
## and the results
NOTE: Cloud Analytic Services made the file RAND_RETAILDEMO.sashdat available as table RAND_RETAILDEMO in caslib CASUSER(Peter).

The results show:

  • the file RAND_RETAILDEMO.sashdat was loaded into memory in the Casuser caslib as the CAS table RAND_RETAILDEMO
  • the CAS table RAND_RETAILDEMO has 930,046 rows and 8 columns
  • the CAS table is separated into 383 blocks
  • the Casuser caslib has one table loaded into memory (RAND_RETAILDEMO)
  • a five-row preview of the RAND_RETAILDEMO CAS table

Update rows in a table

The preview of the CAS table shows the brand_name, channeltype and class columns contain character values. I want to standardize the casing of the character data in each of those columns by:

  • upper casing all values in the brand_name column
  • lower casing all values in the channeltype column
  • proper casing all values in the class column

I'll use the table.update CAS action to update rows in a CAS table. In the update action, specify the set parameter with a list of dictionaries; each dictionary specifies a column to update and how to update it. Within each dictionary, var specifies the column name and value specifies how to update the column. The value key takes SAS functions or constants.

retailTbl.update(set = [
    {'var':'brand_name', 'value':'upcase(brand_name)'},
    {'var':'channeltype', 'value':'lowcase(channeltype)'},
    {'var':'class', 'value':'propcase(class)'}
])

The results show I updated all 930,046 rows with the update action. Next, I'll preview the updated CAS table.

retailTbl.head()

The preview shows I updated the case of the brand_name, channeltype and class columns.

View missing values

Next I'll execute the distinct action on the age and age_dup columns to confirm they have missing values. The age_dup column is simply a copy of the age column.

retailTbl.distinct(inputs = ['age', 'age_dup'])

The results show each column has 673,447 missing values.
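
If you want that count programmatically rather than reading it off the output, you can index into the action's results. This is a small sketch; it assumes the distinct action returns its summary DataFrame under the 'Distinct' key with an NMiss column, the standard SWAT result shape:

## Pull the number of missing values for the age column from the distinct action results
nMissingAge = retailTbl.distinct(inputs = ['age'])['Distinct'].loc[0, 'NMiss']
nMissingAge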

Update rows based on a condition

After finding missing values, I've determined I want to replace them with the mean age. I'll start by storing the mean age in the variable meanAge.

meanAge = retailTbl.age.mean().round(3)
meanAge
 
## and the results
43.577

Specifying the rows to update

One way to update values conditionally is by filtering the CAS table for the rows to update. You can do that by using the query method to filter for all rows where age is missing, then adding the update action to update the age column with the meanAge variable value.

(retailTbl
 .query("age is null")
 .update(set = [
     {'var':'age', 'value':f'{meanAge}'}])
)

The update action updated 673,447 rows. This number matches the total number of missing values in the age column. Let's run the distinct action to confirm no missing values exist in age.

retailTbl.distinct(inputs = ['age', 'age_dup'])

The distinct action shows the age column does not have any missing values.  I'll confirm the unique values in the age column by running the value_counts method.

(retailTbl
 .age
 .value_counts()
)

The value_counts method shows that 673,447 rows have the mean age value 43.577.

Update rows using conditional logic

Another way to update rows is using conditional logic. You can perform conditional logic with the SAS IFN function (IFC for character). The IFN function returns a numeric value based on whether an expression is true, false, or missing. Here I'll specify the IFN function and test if age_dup is missing. If age_dup is missing, return the meanAge value. Otherwise, return the original age_dup value.

(retailTbl
 .update(set = [
     {'var':'age_dup', 'value':f'ifn(age_dup = . , {meanAge}, age_dup)'}])
)

The update action updated all 930,046 rows. All rows are updated because conditional logic checks each row and updates the values. While this method works, for this scenario the previous method is a bit more efficient since only the necessary rows are updated.

Let's confirm no missing values exist in the age_dup column.
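
retailTbl.distinct(inputs = ['age', 'age_dup'])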

The distinct action shows the age_dup column does not have any missing values.  I'll confirm the unique values in the age_dup column by running the value_counts method.

(retailTbl
 .age_dup
 .value_counts()
)

The value_counts method shows that 673,447 rows have the mean age value 43.577. These are the expected results.

Save the CAS table as a data source file

Lastly, I'll save the updated CAS table as a data source file using the table.save action. For more information on saving CAS tables check out my previous post Part 17 - Saving CAS tables.

retailTbl.save(name = 'retail_clean.parquet', caslib = 'casuser')
 
## and the results
NOTE: Cloud Analytic Services saved the file retail_clean.parquet in caslib CASUSER(Peter).
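
Once the work is complete, it's good practice to end the CAS session. The SWAT terminate method ends the CAS session and closes the connection:

## End the CAS session and close the connection
conn.terminate()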

 

Summary

The SWAT package blends the world of pandas and CAS to process your distributed data. In this example I focused on using the Python SWAT package to update rows in a CAS table. Updating data in CAS tables is a bit different than updating DataFrames using pandas. Understanding how to update CAS tables enables you to avoid having to create a copy of a large in-memory CAS table, improving speed and conserving resources.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 18 - Update rows in a table was published on SAS Users.