CAS

2月 152018
 

In this article, I will set out clear principles for how SAS Viya 3.3 will interoperate with Kerberos. My aim is to present some overview concepts for how we can use Kerberos authentication with SAS Viya 3.3. We will look at both SAS Viya 3.3 clients and SAS 9.4M5 clents. In future blog posts, we’ll examine some of these use cases in more detail.

With SAS Viya 3.3 clients we have different use cases for how we can use Kerberos with the environment. In the first case, we use Kerberos delegation throughout the environment.

Use Case 1 – SAS Viya 3.3

The diagram below illustrates the use case where Kerberos delegation is used into, within, and out from the environment.

How SAS Viya 3.3 will interoperate with Kerberos

In this diagram, we show the end-user relying on Kerberos or Integrated Windows Authentication to log onto the SAS Logon Manager as part of their access to the visual interfaces. SAS Logon Manager is provided with a Kerberos keytab and HTTP principal to enable the Kerberos connection. In addition, the HTTP principal is flagged as “trusted for delegation” so that the credentials sent by the client include the delegated or forwardable Ticket-Granting Ticket (TGT). The configuration of SAS Logon Manager with SAS Viya 3.3 includes a new option to store this delegated credential. The delegated credential is stored in the credentials microservice, and secured so that only the end-user to which the credential belongs can access it.

When the end-user accesses SAS CAS from the visual interfaces the initial authentication takes place with the standard internal OAuth token. However, since the end-user stored a delegated credential when accessing the SAS Logon Manager an additional Origin attribute is set on the token of “Kerberos.” The internal OAuth token also contains the groups the end-user is a member of within the Claims. Since we want this end-user to run the SAS CAS session as themselves they must have been added to a custom group with the ID=CASHostAccountRequired. When the SAS CAS Controller receives the OAuth token with the additional Kerberos Origin, it requests the visual interface to make a second Kerberized connection. So, the visual interface retrieves the delegated credential from the credentials microservice and uses this to request a Service Ticket to connect to SAS CAS.

SAS CAS has been provided with a Kerberos keytab and a sascas principal to enable the Kerberos connection. Since the sascas principal is flagged as “trusted for delegation,” the credentials sent by the visual interfaces include a delegated or forwardable Ticket-Granting Ticket (TGT). SAS CAS validates the Service Ticket, which in turn authenticates the end-user. The SAS CAS Controller then launches the session as the end-user and constructs a Kerberos ticket cache containing the delegated TGT. Now, within their SAS CAS session the end-user can connect to the Secured Hadoop environment as themselves since the SAS CAS session has access to a TGT for the end-user.

This means in this first use case all access to, within, and out from the SAS Viya 3.3 environment leverages strong Kerberos authentication. This is our “gold-standard” for authenticating the end-user to each part of the environment.

But, it is strictly dependent on the end-user being a member of the custom group with ID=CASHostAccountRequired, and the two principals (HTTP and sascas) being trusted for delegation. Without both the Kerberos delegation will not take place.

Use Case 1a – SAS Viya 3.3

The diagram below illustrates a slight deviation on the first use case.

Here, either through choice or by omission, the end-user is not a member of the custom group with the ID=CASHostAccountRequired. Now even though the end-user connects with Kerberos and irrespective of the configuration of SAS Logon Manager to store delegated credentials the second connection using Kerberos is not made to SAS CAS. Now the SAS CAS session runs as the account that launched the SAS CAS controller, cas by default. Since, the session is not running as the end-user and SAS CAS did not receive a Kerberos connection, the Kerberos ticket cache that is generated for the session does not contain the credentials of the end-user. Instead, the Kerberos keytab and principal supplied to SAS CAS are used to establish the credentials in the Kerberos ticket cache.

This means that even though Kerberos was used to connect to SAS Logon Manager the connection to the Secured Hadoop environment is as the sascas principal and not the end-user.

The same situation could be arrived at if the HTTP principal for SAS Logon Manager is not trusted for delegation.

Use Case 1b – SAS Viya 3.3

A final deviation to the initial use case is shown in the following diagram.

In this case the end-user connects to SAS Logon Manager with any other form of authentication. This could be the default LDAP authentication, external OAuth, or external SAML authentication. Just as in use case 1a, this means that the connection to SAS CAS from the visual interfaces only uses the internal OAuth token. Again, since no delegated credentials are used to connect to SAS CAS the session is run as the account that launched the SAS CAS controller. Also, the ticket cache created by the SAS Cloud Analytic Service Controller contains the credentials from the Kerberos keytab, i.e. the sascas principal. This means that access to the Secured Hadoop environment is as the sascas principal and not the end-user.

Use Case 2 – SAS Viya 3.3

Our second use case covers those users entering the environment via the programming interfaces, for example SAS Studio. In this case, the end-users have entered a username and password, a credential set, into SAS Studio. This credential set is used to start their individual SAS Workspace Session and to connect to SAS CAS from the SAS Workspace Server. This is illustrated in the following figure.

Since the end-users are providing their username and password to SAS CAS it behaves differently. SAS CAS uses its own Pluggable Authentication Modules (PAM) configuration to validate the end-user’s credentials and hence launch the SAS CAS session process running as the end-user. However, in addition the SAS CAS Controller also uses the username and password to obtain an OAuth token from SAS Logon Manager and then can obtain any access control information from the SAS Viya 3.3 microservices. Obtaining the OAuth token from the SAS Logon Manager ensures any restrictions or global caslibs defined in the visual interfaces are observed in the programming interfaces.

With the SAS CAS session running as the end-user and any access controls validated, the SAS CAS session can access the Secured Hadoop cluster. Now since the SAS CAS session was launched using the PAM configuration, the Kerberos credentials used to access Hadoop will be those of the end-user. This means the PAM configuration on the machines hosting SAS CAS must be linked to Kerberos. This PAM configuration then ensures the Kerberos Ticket-Granting Ticket is available to the CAS Session as is it launched.

Next, we consider three further use cases where the client is SAS 9.4 maintenance 5. Remember that SAS 9.4 maintenance 5 can make a direct connection to SAS CAS without requiring SAS/CONNECT. The use cases we will discuss will illustrate the example with a SAS 9.4 maintenance 5 web application, such as SAS Studio. However, the statements and basic flows remain the same if the SAS 9.4 maintenance 5 client is a desktop application like SAS Enterprise Guide.

Use Case 3 – SAS 9.4 maintenance 5

First, let’s consider the case where our SAS 9.4 maintenance 5 end-user enters their username and password to access the SAS 9.4 environment. This is illustrated in the following diagram.

In this case, since the SAS 9.4 Workspace Server is launched using a username and password, these are cached on the launch of the process. This enables the SAS 9.4 Workspace Server to use these cached credentials when connecting to SAS CAS. However, the same process occurs if instead of the cached credentials being provided by the launching process, they are provided by another mechanism. These credentials could be provided from SAS 9.4 Metadata Server or from an authinfo file in the user’s home directory on the SAS 9.4 environment. In any case, the process on the SAS Cloud Analytic Server Controller is the same.

The username and password used to connect are validated through the PAM stack on the SAS CAS Controller, as well as being used to generate an internal OAuth token from the SAS Viya 3.3 Logon Manager. The PAM stack, just as in the SAS Viya 3.3 programming interface use case 2 above, is responsible for initializing the Kerberos credentials for the end-user. These Kerberos credentials are placed into a Kerberos Ticket cache which makes them available to the SAS CAS session for the connection to the Secured Hadoop environment. Therefore, all the different sessions within SAS 9.4, SAS Viya 3.3, and the Secured Hadoop environment run as the end-user.

Use Case 4 – SAS 9.4 maintenance 5

Now what about the case where the SAS 9.4 maintenance 5 environment is configured for Kerberos authentication throughout. The case where we have Kerberos delegation configured in SAS 9.4 is shown here.

Here the SAS 9.4 Workspace Server is launched with Kerberos credentials, the Service Principal for the SAS 9.4 Object Spawner will need to be trusted for delegation. This means that a Kerberos credential for the end-user is available to the SAS 9.4 Workspace Server. The SAS 9.4 Workspace Server can use this end-user Kerberos credential to request a Service Ticket for the connection to SAS CAS. SAS CAS is provided with a Kerberos keytab and principal it can use to validate this Service Ticket. Validating the Service Ticket authenticates the SAS 9.4 end-user to SAS CAS. The principal for SAS CAS must also be trusted for delegation. We need the SAS CAS session to have access to the Kerberos credentials of the SAS 9.4 end-user.

These Kerberos credentials made available to the SAS CAS are used for two purposes. First, they are used to make a Kerberized connection to the SAS Viya Logon Manager, this is to obtain the SAS Viya internal OAuth token. Therefore, the SAS Viya Logon Manager must be configured to accept Kerberos connections. Second, the Kerberos credentials of the SAS 9.4 end-user are used to connect to the Secure Hadoop environment.

In this case since all the various principals are trusted for delegation, our SAS 9.4 end-user can perform multiple authentication hops using Kerberos with each component. This means that through the use of Kerberos authentication the SAS 9.4 end-user is authenticated into SAS CAS and out to the Secure Hadoop environment.

Use Case 5 – SAS 9.4 maintenance 5

Finally, what about cases where the SAS 9.4 maintenance 5 session is not running as the end-user but as a launch credential; this is illustrated here.

The SAS 9.4 session in this case could be a SAS Stored Process Server, Pooled Workspace Server, or a SAS Workspace server leveraging a launch credential such as sassrv. The key point being that now the SAS 9.4 session is not running as the end-user and has no access to the end-user credentials. In this case we can still connect to SAS CAS and from there out to the Secured Hadoop environment. However, this requires some additional configuration. This setup will leverage One-Time-Passwords generated by the SAS 9.4 Metadata Server, so the SAS 9.4 Metadata Server must be made aware of the SAS CAS. This is done by adding a SAS 9.4 metadata definition for the SAS CAS. Our connection from SAS 9.4 must then be “metadata aware,” achieved by using authdomain=_sasmeta_ on the connection.

Equally, the SAS Viya 3.3 side of the environment must be able to validate the One-Time-Password used to connect to SAS CAS. When SAS CAS receives the One-Time-Password on the connection, it is sent to the SAS Viya Logon Manager for validation and to obtain a SAS Viya internal OAuth token. We need to add some configuration to the SAS Viya Logon Manager to enable this to validate the One-Time-Password. We configured the SAS Viya Logon Manager with the details of where the SAS 9.4 Web Infrastructure Platform is running. The SAS Viya Logon Manager passes the One-Time-Password to the SAS 9.4 Web Infrastructure Platform to validate the One-Time-Password. After the One-Time-Password is validated a SAS Viya internal OAuth token is generated and passed back to SAS CAS.

Since SAS CAS does not have access to the end-user credentials, the session that is created will be run using the account used to launch the controller process, cas by default. Since the end-user credentials are not available, the Kerberos credentials that are initialized for the session are from the Kerberos keytab provided to SAS CAS. Then the connection to the Secured Hadoop environment will be made using those Kerberos credentials of the principal assigned to the SAS CAS.

Summary

We have presented several use cases above. The table below can be used to summarize and differentiate the use cases based on key factors.

SAS Viya 3.3 - Some Kerberos principles was published on SAS Users.

12月 072017
 

authorization in SAS ViyaAuthorization determines what a user can see and do in an application. An authorization system is used to define access control policies, and those policies are later enforced so that access requests are granted or denied. To secure resources in SAS Viya there are three authorization systems of which you need to be aware.

The General Authorization system secures folders within the SAS Viya environment and the content of folders, for example, reports or data plans. It can also secure access to functionality.

The CAS Authorization system secures CAS libraries, tables (including rows and columns) and CAS actions.

The File system authorization system secures files and directories on the OS, for example code or source tables for CAS libraries.

In this post, I will provide a high-level overview of the General and CAS authorization systems.  If you want to dig into more detail please see the SAS Viya Administration Guide Authorization section.

An important factor in authorization is the identity of the user making the request. In a visual deployment of SAS Viya the General and CAS authorization systems use the same identity provider, an LDAP server. The other common feature between the two authorization systems is that they are deny-based. In other words, access to resources is implicitly disallowed by default. This is important as it marks a change for those familiar with metadata authorization in SAS 9.4.

You can administer both general and CAS authorization from SAS Environment Manager. CAS authorization may also be administered from CAS Server Monitor and from the programming interfaces via the accessControl action set. In SAS Viya 3.3, command-line interfaces are available to set authorization for both systems.

General Authorization System

The general authorization system is used to administer authorization for folders, content and functionality. The general authorization system is an attribute based authorization system which determines authorization based on the attributes of the:

  • Subject: attributes that describe the user attempting the access e.g. user, group, department, role, job title etc.,
  • Action: attributes that describe the action being attempted e.g. read, delete, view, update etc.
  • Resource (or object): attributes that describe the object being accessed (e.g. the object type, location, etc.)
  • Context (environment): attributes that deal with time, location etc.

This attribute model supports Boolean logic, in which rules contain “IF, THEN” statements about who is making the request, the resource, the context and the action.

In the general authorization system, information about the requesting user, the target resource, and the environment can influence access. Each access request has a context that includes environmental data such as time and device type. Environmental constraints can be incorporated using conditions.

Permission inheritance in the general authorization system flows through a hierarchy of containers. Each container conveys settings to its child members. Each child member inherits settings from its parent container. Containers can be folders, or rest endpoints which contain functionality. For example, a folder will pass on its permissions to any children which can be additional sub-folders, or content such as reports or data plans.

In the general authorization system precedence, the way in which permission conflicts are resolved, is very simple, the only factor that affects precedence is the type of rule (grant or prohibit). Prohibit rules have absolute precedence. If there is a relevant prohibit rule, effective access is always prohibited.

So, a deny anywhere in the object or identity hierarchy for the principal (user making the request) means that access is denied.

CAS Authorization

The CAS authorization system mostly focuses on data. It makes sense therefore that it is implemented in the style of a database management system (DBMS). DBMS style authorization systems focus on securing access to data. The permissions relate to data for example, read, write, update, select etc., and some CAS-specific ones like promote and limited promote.

Permission inheritance in the general authorization system flows through a hierarchy of objects, The hierarchy is CASLIB > table > rows/columns.

In the CAS authorization system precedence, the way in which permission conflicts are resolved is determined by where an access control is set and who can access control is assigned to.   The precedence rules are:

  • Direct access controls have precedence over inherited settings.
  • Identity precedence from highest to lowest is user > groups > authenticated users.

To put this another way, if a direct access control is found on an object it will determine access. If multiple direct access controls are found, a control for a user will be used over a control for a group, which will be similarly be used over a control for all authenticated users.  If no direct access control is found on, for example, a table, settings will be inherited from the CASLIB in a similar manner.

A final note on CASLIBSs, there may be additional authorization that needs to be considered to provide access to data. For example, for a path based CASLIB if host-layer access requirements are not met, grants in the CAS authorization layer do not provide access.

This has been a brief look at the two authorization systems in a SAS VIYA environment. The table below summarizes some of the information in this blog.

authorization in SAS Viya

As I noted at the start, you can find much more detail in the SAS Viya Administration Guide An introduction to authorization in SAS Viya was published on SAS Users.

11月 292017
 

The CAS procedure (PROC CAS) enables us to interact with SAS Cloud Analytic Services (CAS) from the SAS client based on the CASL (the scripting language of CAS) specification. CASL supports a variety of data types including INT, DOUBLE, STRING, TABLE, LIST, BLOB, and others. The result of a CAS action could be any of the data types mentioned above. In this article, we will use the CAS procedure to explore techniques to interact with the results produced from CAS actions.

PROC CAS enables us to run CAS actions in SAS Viya; CAS actions are at the heart of how the CAS server receives requests, performs the relevant tasks and returns results back to the user. CAS actions representing common functionality constitute CAS action sets.  Consider the CAS action set TABLE. Within the TABLE action set, we will currently find 24 different actions relevant to various tasks that can be performed on a table. For example, the COLUMNINFO action within the TABLE action set will inform the user of the contents of a CAS in-memory table, including column names, lengths, data types, and so on. Let’s take a look at the following code to understand how this works:

proc cas;
	table.columnInfo  / table='HMEQ';
	simple.summary    / table={name='HMEQ' groupby='BAD'};
run;

In the code above, the table called ‘HMEQ’ is a distributed in-memory CAS table that contains data about applicants who were granted credit for a certain home equity loan. The categorical binary-valued variable ‘BAD’ identifies a client who has either defaulted or repaid their home equity loan. Since PROC CAS is an interactive procedure, we can invoke multiple statements within a single run block. In the code above we have executed the COLUMNINFO action from the TABLE actionset and the SUMMARY action from the SIMPLE action set within the same run block. Notice that we are able to obtain a summary statistic of all the interval variables in the dataset grouped by the binary variable ‘BAD’ without having to first sort the data by that variable. Below is a snapshot of the resulting output.

PROC CAS

Fig1: Output from table.columninfo

Fig 2: Output from executing the summary actionset with a BY group option

Let’s dig into this a little deeper by considering the following statements:

proc cas;
simple.summary result=S /table={name='hmeq'};
describe S;
run;

In the code snippet above, the result of the summary action is returned to the user in a variable ‘S’ which is a dictionary. How do we know? To find that, we have invoked the DESCRIBE statement to help us understand exactly what the form of this result, S, is. The DESCRIBE statement is used to display the data type of a variable. Let’s take a look at the log file:

The log file informs the user that ‘S’ is a dictionary with one entry of type table named “Summary.” In the above example the column names and the attributes are shown on the log file. If we want to print the entire summary table or a subset, we would use the code below:

proc cas;
simple.summary result=S /table={name='hmeq'};
print s[“Summary”];
print s[“summary”, 3:5];
run;

The first PRINT statement will fetch the entire summary table; the second PRINT statement will fetch rows 3 through 5. We can also use WHERE expression processing to create a new table with rows that match the WHERE expression. The output of the second PRINT statements above are shown in the figure below:

The result of an action could also be more complex in nature; it could be a dictionary containing dictionaries, arrays, and lists, or the result could be a list containing lists and arrays and tables etc. Therefore, it is important to understand these concepts through some simple cases. Let’s consider another example where the result is slightly more complex.

proc cas;
simple.summary result=s /table={name='hmeq', groupby='bad'};
describe s;
print s["ByGroup1.Summary", 3:5]; run;

In the example above, we are executing a summary using a BY group on a binary variable. The log shows that the result in this case is a dictionary with three entries, all of which are tables. Below is a snapshot of the log file as well as the output of PRINT statement looking at the summary for the first BY group for row 3 through 5.

If we are interested in saving the output of the summary action as a SAS data set (sas7bdat file), we will execute the SAVERESULT statement. The code below saves the summary statistics of the first BY group in the work library as a SAS dataset.

proc cas;
simple.summary result=s /table={name='hmeq', groupby='bad'};
describe s;
print s["ByGroup1.Summary", 3:5]; 
saveresult s["ByGroup1.Summary"] dataout=work.data;
run;

A helpful function we can often use is findtable. This function will search the given value for the first table it finds.

proc cas;
simple.summary result=s /table={name='hmeq'};
val = findtable(s);
saveresult val dataout=work.summary;
run;

In the example above, I used findtable to locate the first table produced within the dictionary, S, and save it under ‘Val,’ which is then directly invoked with SAVERESULT statement to save the output as a SAS dataset.

Finally, we also have the option to save the output of a CAS action in another CAS Table. The table summary1 in the code below is an in-memory CAS table that contains the output of the summary action. The code and output are shown below:

proc cas;
simple.summary /table={name='hmeq'} casout={name='mylib.summary1'}; 
fetch /table={name='mylib.summary1'};
run;

In this post, we saw several ways of interacting with the results of a CAS action using the CAS procedure. Depending on what our end goal is, we can use any of these options to either view the results or save the data for further processing.

Interacting with the results of PROC CAS was published on SAS Users.

11月 212017
 

SAS Viya provides a robust, scalable, cloud-ready, distributed runtime engine. This engine is driven by CAS (Cloud Analytic Services), providing fast processing for many data management techniques that run distributive, i.e. using all threads on all defined compute nodes.

Why

PROC APPEND is a common technique used in SAS processes. This technique will concatenate two data sets together. However, PROC APPEND will produce an ERROR if the target CAS table exists prior to the PROC APPEND.

Simulating PROC APPEND

Figure 1. SAS Log with the PROC APPEND ERROR message

Now what?

How

To explain how to simulate PROC APPEND we first need to create two CAS tables. The first CAS table is named CASUSER.APPEND_TARGET. Notice the variables table, row and variable in figure 2.

Figure 2. Creating the CAS table we need to append rows to

The second CAS table is called CASUSER.TABLE_TWO and in figure 3 we can review the variables table, row, and variable.

Figure 3. Creating the table with the rows that we need to append to the existing CAS table

To simulate PROC APPEND we will use a DATA Step. Notice on line 77 in figure 4 we will overwrite an existing CAS table, i.e. CASUSER.APEND_TARGET. On Line 78, we see the first table on the SET statement is CASUSER.APPEND_TARGET, followed by CASUSER.TABLE_TWO. When this DATA Step runs, all of the rows in CASUSER.APPEND_TARGET will be processed first, followed by the rows in CASUSER.TABLE_TWO. Also, note we are not limited to two tables on the SET statement with DATA Step; we can use as many as we need to solve our business problems.

Figure 4. SAS log validating the DATA Step ran in CAS i.e. distributed

The result table created by the DATA Step is shown in figure 5.

Figure 5. Result table from simulating PROC APPEND using a DATA Step

Conclusion

SAS Viya’s CAS processing allows us to stage data for downstream consumption by leveraging robust SAS programming techniques that run distributed, i.e. fast. PROC APPEND is a common procedure used in SAS processes. To simulate PROC APPEND when using CAS tables as source and target tables to the procedure use DATA Step.

How to simulate PROC APPEND in CAS was published on SAS Users.

11月 202017
 

Many SAS users have inquired about SAS Cloud Analytic Services’ (CAS) Distributed Network File System (Learn more about CAS.)

The “NFS” in “DNFS”

Let’s start at the beginning. The “NFS” in DNFS stands for “Network File System” and refers to the ability to share files across a network. As the picture below illustrates, a network file system lets numerous remote hosts access another host’s files.

Understanding DNFS

NFS

There are numerous network file system protocols that can be used for file sharing – e.g. CIFS, Ceph, Lustre – but the most common on Linux is NFS. While NFS is the dominant file-sharing protocol, the “NFS” part of the DNFS does not correspond to the NFS protocol. Currently all the DNFS supported file systems are based on NFS, but DNFS may support file systems based on other protocols in the future. So, it’s best to think of the “NFS” part of “DNFS” as a generic “network file system” (clustered file system) and not the specific NFS protocol.

The “D” in “DNFS”

The “D” in DNFS stands for “Distributed” but it does not refer to the network file system. By definition, that is already distributed since the file system is external to the machines accessing it. The “Distributed” in DNFS refers to CAS’ ability to use a network file system in a massively parallel way. With a supported file system mounted identically on each CAS node, CAS can access (both write) the file system’s CSV and SASHDAT files from every worker node in parallel.

This parallel file access is not an attribute of the file system, it is a capability of CAS. By definition, network file systems facilitate access at the file level, not the block level. With DNFS, CAS actively manages parallel block level I/O to the network file system, making sure file seek locations and block I/O operations do not overlap, etc.

DNFS

 

DNFS as CAS Backing Store

Not only can CAS perform multi-machine parallel I/O from network file systems, it can also memory-map NFS SASHDAT files directly into CAS. Thus, SASHDAT files on DNFS act as both the CASlib data source as well as the virtual memory “backing store,” often negating the need for CAS to utilize memory mapping (mmap()).

Note 1: Data transformations on load, such as row filtering and field selection, as well as encryption can trigger CAS_DISK_CACHE usage. Since the data must be transformed (subset and/or decrypted), CAS copies the transformed data into CAS_DISK_CACHE to support CAS processing.

Note 2: It is possible to use DNFS atop an encrypting file system or hardware storage device. Here, the HDAT blocks are stored encrypted but transmitted to the CAS I/O layer decrypted. Assuming no other transformations, CAS_DISK_CACHE will not be used in this scenario.

DNFS Memory Mapping

Performance Considerations

DNFS-based CAS loading will only be as fast as the slowest component involved. The chosen NFS architecture (hardware and CAS connectivity) should support I/O throughput commensurate with the CAS installation and in-line with the implementation’s service level agreements. So, while NetApp ONTAP clustering architecture. A different file system technology might look a little different but the same basic ideas will apply.

DNFS w/ Multi Machine File System

As described earlier, CAS manages the parallel I/O operations. Requests from CAS are sent to the appliance and handled by the NFS metadata server. The storage device implementing the NFS protocol points CAS DNFS to the proper file and block locations on the NFS data servers which pass the data to the CAS worker nodes directly.

Understanding DNFS was published on SAS Users.

11月 162017
 

As a SAS Viya user, you may be wondering whether it is possible to execute data append and data update concurrently to a global Cloud Analytic Services (CAS) table from two or more CAS sessions. (Learn more about CAS.) How would this impact the report view while data append or data update is running on a global CAS table? These questions are even more important for those using the programming interface to load and update data in CAS. This post discusses data append, data update, and concurrency in CAS.

Two or more CAS sessions can simultaneously submit a data append and data update process to a CAS table, but only one process at a time can run against the same CAS table. The multiple append and update processes execute in serial, one after another, never running in a concurrent fashion. Whichever CAS session is first to acquire the write lock on a global CAS table prevails, appending or updating the data first. The other append and update processes must wait in a queue to acquire the write lock.

During the data append process, the appended data is not available to end users or reports until all rows are inserted and committed into the CAS table. While data append is running, users can still render reports against the CAS table using the original data, but excluding the appended rows.

Similarly, during the data update process, the updated data is not available to users or reports until the update process is complete. However, CAS lets you render reports using the original (non-updated) data, as the CAS table is always available for the read process. During the data update process, CAS makes additional copies into memory of the to-be-updated blocks containing rows in order to perform the update statement. Once the update process is complete, the additional and now obsolete copies of blocks, are removed from CAS. Data updates to a global CAS table is an expensive operation in terms of CPU and memory usage. You have to factor in the additional overhead memory or CAS_CACHE space to support the updates. The space requirement depends on the number of rows being affected by the update process.

At any given time, there could be only one active write process (append/update) against a global CAS table. However, there could be many concurrent active read processes against a global CAS table. A global CAS table is always available for read processes, even when an append or update process is running on the same CAS table.

The following log example describes two simultaneous CAS sessions executing data appends to a CAS table. Both append processes were submitted to CAS with a gap of a few seconds. Notice the execution time for the second CAS session MYSESSION1 is double the time that it took the first CAS session to append the same size of data to the CAS table. This shows that both appends were executing one after another. The amount of memory used and the CAS_CACHE location also shows that both processes were running one after another in a serial fashion.

Log from simultaneous CAS session MYSESSION submitting APPEND

58 proc casutil ;
NOTE: The UUID '1411b6f2-e678-f546-b284-42b6907260e9' is connected using session MYSESSION.
59 load data=mydata.big_prdsale
60 outcaslib="caspath" casout="big_PRDSALE" append ;
NOTE: MYDATA.BIG_PRDSALE was successfully added to the "caspath" caslib as "big_PRDSALE".
61 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 49.58 seconds
cpu time 5.05 seconds

Log from simultaneous CAS session MYSESSION1 submitting APPEND

58 proc casutil ;
NOTE: The UUID 'a20a246e-e0cc-da4d-8691-c0ff2a222dfd' is connected using session MYSESSION1.
59 load data=mydata.big_prdsale1
60 outcaslib="caspath" casout="big_PRDSALE" append ;
NOTE: MYDATA.BIG_PRDSALE1 was successfully added to the "caspath" caslib as "big_PRDSALE".
61 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 1:30.33
cpu time 4.91 seconds

 

When the data append process from MYSESSION1 was submitted alone (no simultaneous process), the execution time is around the same as for the first session MYSESSION. This also shows that when two simultaneous append processes were submitted against the CAS table, one was waiting for the other to finish. At one time, only one process was running the data APPEND action to the CAS table (no concurrent append).

Log from a lone CAS session MYSESSION1 submitting APPEND

58 proc casutil ;
NOTE: The UUID 'a20a246e-e0cc-da4d-8691-c0ff2a222dfd' is connected using session MYSESSION1.
59 load data=mydata.big_prdsale1
60 outcaslib="caspath" casout="big_PRDSALE" append ;
NOTE: MYDATA.BIG_PRDSALE1 was successfully added to the "caspath" caslib as "big_PRDSALE".
61 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 47.63 seconds
cpu time 4.94 seconds

 

The following log example describes two simultaneous CAS sessions submitting data updates on a CAS table. Both update processes were submitted to CAS in a span of a few seconds. Notice the execution time for the second CAS session MYSESSION1 is double the time it took the first session to update the same number of rows. The amount of memory used and the CAS_CACHE location also shows that both processes were running one after another in a serial fashion. While the update process was running, memory and CAS_CACHE space increased, which suggests that the update process makes copies of to-be-updated data rows/blocks. Once the update process is complete, the space usage in memory/CAS_CACHE returned to normal.

When the data UPDATE action from MYSESSION1 was submitted alone (no simultaneous process), the execution time is around the same as for the first CAS session.

Log from a simultaneous CAS session MYSESSION submitting UPDATE

58 proc cas ;
59 table.update /
60 set={
61 {var="actual",value="22222"},
62 {var="country",value="'FRANCE'"}
63 },
64 table={
65 caslib="caspath",
66 name="big_prdsale",
67 where="index in(10,20,30,40,50,60,70,80,90,100 )"
68 }
69 ;
70 quit ;
NOTE: Active Session now MYSESSION.
{tableName=BIG_PRDSALE,rowsUpdated=86400}
NOTE: PROCEDURE CAS used (Total process time):
real time 4:37.68
cpu time 0.05 seconds

 

Log from a simultaneous CAS session MYSESSION1 submitting UPDATE

57 proc cas ;
58 table.update /
59 set={
60 {var="actual",value="22222"},
61 {var="country",value="'FRANCE'"}
62 },
63 table={
64 caslib="caspath",
65 name="big_prdsale",
66 where="index in(110,120,130,140,150,160,170,180,190,1100 )"
67 }
68 ;
69 quit ;
NOTE: Active Session now MYSESSION1.
{tableName=BIG_PRDSALE,rowsUpdated=86400}
NOTE: PROCEDURE CAS used (Total process time):
real time 8:56.38
cpu time 0.09 seconds

 

The following memory usage snapshot from one of the CAS nodes describes the usage of memory before and during the CAS table update. Notice the values for “used” and “buff/cache” columns before and during the CAS table update.

Memory usage on a CAS node before starting a CAS table UPDATE

Memory usage on a CAS node during CAS table UDPATE

Summary

When simultaneous data append and data update requests are submitted against a global CAS table from two or more CAS sessions, they execute in a serial fashion (no concurrent process execution). To execute data updates on a CAS table, you need an additional overhead memory/CAS_CACHE space. While the CAS table is going through the data append or data update process, the CAS table is still accessible to rendering reports.

Concurrent data append and update to a global CAS table was published on SAS Users.

10月 272017
 

When loading data into CAS using PROC CASUTIL, you have two choices on how the table can be loaded:  session-scope or global-scope.  This is controlled by the PROMOTE option in the PROC CASUTIL statement.

Session-scope loaded

proc casutil;
                                load casdata="model_table.sas7bdat" incaslib="ryloll" 
                                outcaslib="otcaslib" casout="model_table”;
run;
Global-scope loaded
proc casutil;
                                load casdata="model_table.sas7bdat" incaslib="ryloll" 
                                outcaslib="otcaslib" casout="model_table" promote;
run;

 

Global-scope loaded

proc casutil;
                                load casdata="model_table.sas7bdat" incaslib="ryloll" 
                                outcaslib="otcaslib" casout="model_table" promote;
run;

 

Remember session-scope tables can only be seen by a single CAS session and are dropped from CAS when that session is terminated, while global-scope tables can be seen publicly and will not be dropped when the CAS session is terminated.

But what happens if I want to create a new table for modeling by partitioning an existing table and adding a new partition column? Will the new table be session-scoped or global-scoped? To find out, I have a global-scoped table called MODEL_TABLE that I want to partition based on my response variable Event. I will use PROC PARTITION and call my new table MODEL_TABLE_PARTITIONED.

proc partition data=OTCASLIB.MODEL_TABLE partind samppct=30;
	by Event;
	output out=OTCASLIB.model_table_partitioned;
run;

 

After I created my new table, I executed the following code to determine its scope. Notice that the Promoted Table value is set to No on my new table MODEL_TABLE_PARTITIONED which means it’s session-scoped.

proc casutil;
     list tables incaslib="otcaslib";
run;

 

promote CAS tables from session-scope to global-scope

How can I promote my table to global-scoped?  Because PROC PARTITION doesn’t provide me with an option to promote my table to global-scope, I need to execute the following PROC CASUTIL code to promote my table to global-scope.

proc casutil;
     promote casdata="MODEL_TABLE_PARTITIONED"
     Incaslib="OTCASLIB" Outcaslib="OTCASLIB" CASOUT="MODEL_TABLE_PARTITIONED";
run;

 

I know what you’re thinking.  Why do I have to execute a PROC CASUTIL every time I need my output to be seen publicly in CAS?  That’s not efficient.  There has to be a better way!

Well there is, by using CAS Actions.  Remember, when working with CAS in SAS Viya, SAS PROCs are converted to CAS Actions and CAS Actions are at a more granular level, providing more options and parameters to work with.

How do I figure out what CAS Action syntax was used when I execute a SAS PROC?  Using the PROC PARTITION example from earlier, I can execute the following code after my PROC PARTITION completes to see the CAS Action syntax that was previously executed.

proc cas;
     history;
run;

 

This command will return a lot of output, but if I look for lines that start with the word “action,” I can find the CAS Actions that were executed.  In the output, I can see the following CAS action was executed for PROC PARTITION:

action sampling.stratified / table={name='MODEL_TABLE', caslib='OTCASLIB', groupBy={{name='Event'}}}, samppct=30, partind=true, output={casOut={name='MODEL_TABLE_PARTITIONED', caslib='OTCASLIB', replace=true}, copyVars='ALL'};

 

To partition my MODEL_TABLE using a CAS Action, I would execute the following code.

proc cas;
  sampling.stratified / 
    table={name='MODEL_TABLE', caslib='OTCASLIB', groupBy={name='Event'}}, 
    samppct=30, 
    partind=true, 
    output={casOut={name='embu_partitioned', caslib='OTCASLIB'}, copyVars='ALL'};
run;

 

If I look up sampling.stratified syntax in the

proc cas;
  sampling.stratified / 
    table={name='MODEL_TABLE', caslib='OTCASLIB', groupBy={name='Event'}}, 
    samppct=30, 
    partind=true, 
    output={casOut={name='embu_partitioned', caslib='OTCASLIB', promote=true}, copyVars='ALL'};
run;

 

So, what did we learn from this exercise?  We learned that when we create a table in CAS from a SAS PROC, the default scope will be session and to change the scope to global we would need to promote it through a PROC CASUTIL statement.  We also learned how to see the CAS Actions that were executed by SAS PROCs and how we can write code in CAS Action form to give us more control.

I hope this exercise helps you when working with CAS.

Thanks.

Tip and tricks to promote CAS tables from session-scope to global-scope was published on SAS Users.

8月 182017
 

SAS Viya deployments use credentials for accessing databases and other third-party products that require authentication. In this blog post, I will look at how this sharing of credentials is implemented in SAS Environment Manager.

In SAS Viya, domains are used to store the:

  • Credentials required to access external data sources.
  • Identities that are allowed to use those credentials.

There are three types of domains:

  • Authentication stores credentials that are used to access an external source that can then be associated with a caslib.
  • Connection used when the external database has been set up to require a User ID but no password.
  • Encryption stores an encryption key required to read data at rest in a path assigned to a caslib.

In this blog post we will focus on authentication domains which are typically used to provide access to data in a database management system. It is a pretty simple concept; an authentication domain makes a set of credentials available to a set of users. This allows SAS Viya to seamlessly access a resource. The diagram below shows a logical view of a domain. In this example, the domain PGAuth stores the credentials for a Postgres database, and makes those credentials available to two groups (and their members) and three users.

How does this work when a user accesses data in a database caslib? The following steps are performed:

1.     Log on to SAS Viya using personal credentials: the user’s identity is established including group memberships.

2.     Access a CASLIB for a database: using the user’s identity and the authentication domain of the CASLIB, Viya will look up the credentials associated with that identity in the domain.

3.     Two results are possible. A credential match is:

  • 1.     Found: the credentials are passed to the database authentication provider to determine access to the data.
  • 2.     Not found: no access to the data is provided.

To manage domains in SAS Environment Manager you must be an administrator. In SAS Environment Manager select Security > Domains. There are two views available:  Domains and Credentials. The Domains view lists all defined domains. You can access the credentials for a domain by right-clicking on the domain and selecting Credentials.

The Credentials view lists all credentials defined and the domains for which they are associated.

Whatever way you get to a credential, you can edit it by right-clicking and selecting Edit. In the edit dialog, you can specify the Identities (users and groups) that can use the credential, and the User ID and Password of the credential.  Note that only users who are already listed in the Identities field will be able to edit this field, so make sure you are in this field (directly or through group membership) prior to saving.

To use an authentication domain, you reference it in the CASLIB definition. When defining a non-path based CASLIB you must select a domain to provide user credentials to connect to the database server. This can be done when creating a new CASLIB in SAS Environment Manager in the Data > Libraries area.

If you use code to create or access your caslib, use the authenticationdomain option. In this example, we specify authenticationdomain in the table.addcaslib action.

If a user is not attached to the authentication domain directly, or through a group membership, they will not be able to access the credentials. An error will occur when they attempt to access the data.

This has been a brief look at storing and using credentials to access databases from SAS Viya. You can find  more detail in the SAS Viya Administration Guide in the section titled SAS Viya sharing credentials for database access was published on SAS Users.

8月 172017
 

In this blog post I am going to cover the example of importing data into SAS Viya using Cloud Analytic Services (CAS) actions via REST API. For example, you may want to import data into a CASLib via REST API.  This means you can perform an import of data outside of the SAS Self-Service Import user interface environment using REST API.  Once this data is loaded into CAS it is available for use in applications such as SAS Visual Analytics and SAS Visual Data Builder.

Introduction

To import data into SAS Viya via REST API, you need to make a series of REST API calls:

1.     Start CAS Session
2.     Load Data into a CASLib
3.     End CAS Session

I will walk through these various REST API calls in the sections below using the REST API testing application HTTPRequestor, which is a free add-on to the Mozilla Firefox browser.

Before I perform any of my REST API calls, I need to Base-64 encode my credentials. The input for encoding the credentials is: I used the site https://www.base64encode.org/ to encode my credentials.  Note: You can use other methods (e.g., Python) to encode your credentials. Use the preferred method by your organization to ensure you are meeting their security protocols.

Below is the header Authorization information I will be sending with each of my requests.

Authorization Header

1.     Start CAS Session

First, I need to start a CAS Session. Below is an example request for starting a CAS Session:

POST https://<YourCASServer:Port>/cas/sessions

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{}

This request returns the CASSessionUUID needed in the next step.

I construct my request in HTTPRequestor as follows and submit the request:

Start CAS Session Request/Response

Here is a screenshot of the raw transaction information.

Start CAS Session Raw Transaction

I need to copy the CAS Session UUID information that was returned for use in the subsequent REST API calls since their CAS Actions must be performed within a CAS Session.

2.     Load Data into a CASLib

Now that I have started my CAS session and have its UUID, I can load the table to CAS. Below is an example request for the table.loadTable CAS Action:

POST 
https://<YourCASServer:Port>/cas/sessions/<CASSessionUUID>/actions/table.load
Table

Authorization: Basic <Base-64EncodedCredentials>
 Content-Type: application/json

{"casLib":"<InputCASLib>","importOptions":{"fileType":"<FileType>"},"path":"<InputFilePathAndName>",
 "casout":{"caslib":"<OutputCASLib>","name":"<OutputTableName>","promote":true}}

 

This request returns a log message: “NOTE: Cloud Analytic Services made the file <InputFilePathAndName> available as table <OutputTableName> in caslib <OutputCASLib>.”

For my example, I will load the SAS data set BASEBALL located in the helpdata CASLib to the Public CASLib and call the CAS Table SAS_BASEBALL.  I am copying the data to the Public CASLib to make it more readily available to all CAS users. Let’s first confirm that the SAS_BASEBALL table does not currently exist in the Public CASLib.

Public CASLib Before LoadTable CAS Action Called

I construct my request in HTTPRequestor as follows and submit the request:

Load Table Request/Response

Here is a screenshot of the raw transaction information.

Load Table Raw Transaction

Next, I will confirm that the SAS_BASEBALL data set is now loaded in the Public CASLib.

Public CASLib After LoadTable CAS Action Called

The SAS_BASEBALL data set is now available for use in applications such as SAS Visual Analytics and SAS Visual Data Builder.

3.     End CAS Session

Finally, I need to terminate my CAS Session. Below is an example request for the session.endSession CAS Action:

POST https://&lt;YourCASServer:Port&gt;/cas/sessions/&lt;CASSessionUUID&gt;/actions/session.endSession

Authorization: Basic &lt;Base-64EncodedCredentials&gt;
 Content-Type: application/json

{}

 

This request returns a status of 0 indicating there was no error and the CASSessionUUID specified in the request has ended.

I construct my request in HTTPRequestor as follows and submit the request:

End CAS Session Request/Response

Here is a screenshot of the raw transaction information.

End CAS Session Raw Transaction

Conclusion

These calls can be strung together so you could schedule their execution. For more information on SAS Viya and REST APIs, refer to the following documentation the SAS Cloud Analytics REST API documentation.

Load Data into SAS Viya via REST API was published on SAS Users.

6月 292017
 

One of the big benefits of the SAS Viya platform is how approachable it is for programmers of other languages. You don't have to learn SAS in order to become productive quickly. We've seen a lot of interest from people who code in Python, maybe because that language has become known for its application in machine learning. SAS has a new product called SAS Visual Data Mining and Machine Learning. And these days, you can't offer such a product without also offering something special to those Python enthusiasts.

Introducing Python SWAT

And so, SAS has published the Python SWAT project (where "SWAT" stands for the SAS scripting wapper for analytical transfer. The project is a Python code library that SAS released using an open source model. That means that you can download it for free, make changes locally, and even contribute those changes back to the community (as some developers have already done!). You'll find it at github.com/sassoftware/python-swat.

SAS developer Kevin Smith is the main contributor on Python SWAT, and he's a big fan of Python. He's also an expert in SAS and in many programming languages. If you're a SAS user, you probably run Kevin's code every day; he was an original developer on the SAS Output Delivery System (ODS). Now he's a member of the cloud analytics team in SAS R&D. (He's also the author of more than a few conference papers and SAS books.)

Kevin enjoys the dynamic, fluid style that a scripting language like Python affords - versus the more formal "code-compile-build-execute" model of a compiled language. Watch this video (about 14 minutes) in which Kevin talks about what he likes in Python, and shows off how Python SWAT can drive SAS' machine learning capabilities.

New -- but familiar -- syntax for Python coders

The analytics engine behind the SAS Viya platform is called CAS, or SAS Cloud Analytic Services. You'll want to learn that term, because "CAS" is used throughout the SAS documentation and APIs. And while CAS might be new to you, the Python approach to CAS should feel very familiar for users of Python libraries, especially users of pandas, the Python Data Analysis Library.

CAS and SAS' Python SWAT extends these concepts to provide intuitive, high-performance analytics from SAS Viya in your favorite Python environment, whether that's a Jupyter notebook or a simple console. Watch the video to see Kevin's demo and discussion about how to get started. You'll learn:

  • How to connect your Python session to the CAS server
  • How to upload data from your client to the CAS server
  • How SWAT extends the concept of the DataFrame API in pandas to leverage CAS capabilities
  • How to coax CAS to provide descriptive statistics about your data, and then go beyond what's built into the traditional DataFrame methods.

Learn more about SAS Viya and Python

There are plenty of helpful resources to help you learn about using Python with SAS Viya:

And finally, what if you don't have SAS Viya yet, but you're interested in using Python with SAS 9.4? Check out the SASPy project, which allows you to access your traditional SAS features from a Jupyter notebook or Python console. It's another popular open source project from SAS R&D.

The post Using Python to work with SAS Viya and CAS appeared first on The SAS Dummy.