
July 15, 2021
 

I am a long-time SAS 9 administrator, and I feel very confident in my understanding of SAS 9 administration. I will admit I don't know everything, but I have been administering SAS since the days of SAS 9.1.3. I often tell my students I am a general practitioner when it comes to SAS 9: I know a little about a lot of things. Even so, I still feel I can handle most questions when it comes to SAS 9 administration. I will always yield to someone with more knowledge, but I feel I can hold my own.

I am comfortable with SAS 9 administration. I'm confident in my understanding of the SAS Servers, metadata and all the other parts that make up the SAS 9 environment. I was great at deploying SAS 9 in various environments, which mostly meant some form of on-premises installation. It was the norm for me to expect to be able to go to the air-conditioned room, with the raised floor, and marvel at my server farm. There were downsides to this that we all knew: the upkeep of physical servers, the up-front cost of those servers and scalability, just to mention a few factors. Even with those barriers, I still embraced the tried-and-true work associated with being a SAS 9 administrator.

Sometimes things change and bring us out of our comfort zone

As I was snuggled up to SAS 9 administration like a certain character and his blanket, SAS embraced microservices, cloud and Kubernetes. I’ll admit that I had a brief OMG moment. As an instructor I should, in theory, know more than the student. Well, I knew this was one time that might not be the case. I consider myself technically savvy, but I was concerned because I didn’t really know where to begin and it all seemed daunting to me.

Then I had to realize that though SAS has transformed, some things remain the same. I won’t say my thoughts are the thoughts of anyone else at SAS, but it’s my way to proverbially eat the elephant. (No elephants were harmed in erasing my confusion, by the way. 😊)

If you want an amazing introduction to SAS Viya and Kubernetes, check out Vasilij Nevlev's SAS Global Forum Paper titled, Introduction to Azure Kubernetes Service and SAS® Viya® 2020. Vasilij's analysis is a useful tool, but I needed a bit more to be able to feel like "I got it."

Making associations with things I already knew

Although there was a dramatic shift in the overall architecture, the tool for managing the latest release of SAS Viya is very similar to the tool used to manage SAS Viya 3.5. I was excited to see that SAS Environment Manager remained much the same. You can still use SAS Environment Manager to manage data, server configuration, content, and security. I am comforted in that.

This association thing was going well, so I decided to continue the process. The next similarity I noticed was in the configuration. I was pleased to know that SAS Viya still relied on a YAML file for configuration options. That told me that although things had changed, much had remained the same.

The final point I will touch on here is the commands used with Kubernetes. Now I know what you might be thinking: "Raymond, those kubectl commands are the stumbling block for me!" I get what you are saying. It is another "language" for us to learn. For me, I'm already used to navigating commands in a Linux environment. In addition, the kubectl command line is supported on the Microsoft, Amazon and Google platforms. That might not matter much to you as an administrator, but as an instructor it is wonderful news. SAS Viya is supported on those platforms and now I just have to learn one command line interface.
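
To give you a taste, here's a quick sketch of a few everyday kubectl commands. The namespace and pod name below are placeholders, not anything from a real deployment:

# List the pods running in a SAS Viya namespace (placeholder namespace)
kubectl get pods -n sas-viya

# Show detailed status for a single pod (placeholder pod name)
kubectl describe pod sas-logon-app-abc123 -n sas-viya

# Follow the logs for that same pod
kubectl logs -f sas-logon-app-abc123 -n sas-viya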

All of this gets me excited about administration of SAS Viya and what is happening with the software. I hope you share in my excitement. And I hope to meet you in one of our SAS Viya Administration courses soon.

Going from SAS 9 to SAS Viya and I can hardly contain myself was published on SAS Users.

May 20, 2021
 

In my previous post, Group-based access to Azure Files in the new SAS Viya, I walked through an example of making one or more Azure File Shares available to SAS Viya end user sessions with a group-based access pattern.

This post extends the scenario and covers the case where we configure a SAS Viya environment with an identity provider that does not provide POSIX attributes for determining group membership. In the example environment, the SAS Identities Service has been configured to accept users and groups pushed from Azure Active Directory. We will assume the more likely scenario where we have not enabled Azure Active Directory Domain Services, which would allow communication over LDAP protocols. Instead, we will rely on the default and more cloud-native SCIM protocol.

SCIM explained, briefly

While the details of SCIM are out of scope for this post, at its core SCIM (System for Cross-domain Identity Management) is a REST- and JSON-based communication protocol. It touts the benefits of being a more standardized, secure, and automation-ready solution for identity management than its alternatives. All of these characteristics make it an ideal choice for cloud-native solutions and platforms like SAS Viya on Kubernetes. Learn more about configuring SAS Viya on Kubernetes to use a SCIM client for identity management in this series of SAS Community articles by my colleague Stuart Rogers.

When choosing to use a SCIM client, such as Azure Active Directory or Okta, as your identity provider for SAS Viya, you'll need to be aware of the implications that will impact SAS user flows. One such ramification is that the SCIM client will provide no POSIX attributes to determine group membership. In my last post, we created Kubernetes persistent volume(s) to specify how to mount Azure File Share(s) into our Pods. We defined which POSIX group owned the mount in the PV manifest.

But in this case, what GID should we specify to ensure that we have a group-based access model when we don't know what POSIX groups the user is in? That's where these additional steps come in, valid at version 2020.1.4 and above. After you complete them, you can continue on with creating your Persistent Volumes, Persistent Volume Claims, transformers, and other necessary components to make the data available to end user sessions, just as we did in the previous post.

Creating your Custom Group

Since we will not have POSIX group memberships from our SCIM client, we rely on the SAS Viya General Authorization System to create Custom Groups and assign users to them. In this example, I created a Custom Group called Project.Auto Auction with the unique ID of auto_auction_group. I like to use the Project. prefix just as a matter of personal preference because it keeps similar usage patterns grouped together in the alphabetical list. Be sure not to use any special characters other than underscores for the actual IDs though. I then added members to the group.

Note that you can add a combination of provisioned SCIM groups to your Custom Groups instead. If a SCIM group which contains all of your intended users is provisioned to your SAS Viya environment, you can skip this step of creating a Custom Group and move on to the next section. Note, though, that at the time of this writing, Azure Active Directory (AAD) only allows provisioning of Office 365 Security Groups via your AAD SCIM client (not standard Office 365 groups).

Now we have a Custom Group we can use to provide access to our Azure File Share. We also need to ensure we can map it to a POSIX GID because of the way Azure File Shares are mounted into Pods. The SAS Identities Service is the microservice on the SAS Viya side that interfaces with your SCIM client to keep track of users and groups in your environment. In cases where we do not receive POSIX attributes from the identity provider, the SAS Identities Service uses an algorithm to map unique group names into GIDs. We'll see evidence of this in the next section.

Determining the GID of that custom group

Another implication of opting for the more cloud-native SCIM approach to managing identities rather than LDAP is that the password OAuth flow is no longer relevant. The SCIM paradigm keeps individual applications from handling username and password combinations and instead relies on your application redirecting users to the identity provider itself. When we want to interact with SAS Viya REST endpoints, then, we can't use the password grant type, which would expect us to provide a username and password. Instead, we'll need to use the authorization_code or client_credentials grant types. I wrote an entire series on the topic of selecting the appropriate grant type to communicate with SAS Viya REST endpoints if you're interested in learning more. There is also great information about communicating with SAS Viya APIs in this post by Joe Furbee and this white paper by Mike Roda.

For the purpose of this post, I will just provide an example of the authorization_code OAuth flow so we can authenticate as ourselves without sending our username/password combination to SAS Viya in the REST call. First, in my browser, I navigate to:

 https://myServer.com/SASLogon/oauth/authorize?client_id=sas.cli&response_type=code

Where, of course, you will replace the myServer part of the URL with your environment URL. The value for the client_id property in the URL corresponds to a client that was registered by default. It was created for the SAS Viya Admin CLI. We'll reuse it here because it has the necessary properties for the authorization_code flow without giving users the opportunity to accidentally decline scopes that are crucial for this to function properly. If you would like to create your own client instead, information on doing so is available in all three resources linked in this section of the post.

When I navigate to this URL, I'm redirected (if not already logged in) to Azure Active Directory's login page where I log in as usual. I'm then automatically redirected back to SAS Logon. The next REST call we will make, to get the GID for a SCIM or Custom Group, requires membership in the SAS Administrators Custom Group, so at the SAS Logon prompt I assume my SAS Administrators group rights. Finally, I'm redirected to a screen like this:

Next, I can use a tool like Postman (or curl or whatever you prefer) and trade that code for an OAuth token:
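
Here's a minimal curl sketch of that exchange. I'm assuming the sas.cli client accepts an empty client secret for Basic authentication, and <codeFromBrowser> is the code value from the screen above:

curl -X POST "https://myServer.com/SASLogon/oauth/token" \
     -u "sas.cli:" \
     -d "grant_type=authorization_code" \
     -d "code=<codeFromBrowser>"

The access_token property in the JSON response is the OAuth token to pass as a Bearer token in the calls that follow.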

Due to the configuration of the sas.cli client, the returned token takes on the scope of my individual user, including my Administrator's group rights.

Then I make another call, this time to one of the SCIM endpoints of SAS Logon, to get the GID of the Custom Group I created in the previous step. Note that I have the actual ID (auto_auction_group) of that Custom Group in the URL, not the name of it (Project.Auto Auction).
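
As a sketch, that call could look like this with curl; I'm assuming the /identifier child resource of the group endpoint here, which returns the computed IDs:

curl -s "https://myServer.com/identities/groups/auto_auction_group/identifier" \
     -H "Authorization: Bearer <accessToken>"

Look for the gid property in the JSON response.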


Note: the identities API is not a publicly documented API and is not supported by SAS Technical Support or R&D.

This GID is what we will use when defining the Persistent Volume for our Azure File Share. This way, anyone placed in this Custom Group has rights to that mount in our Pods based on the group part of the permissions we set in that PV manifest. That will make more sense after you read Part 1 of this series.

If you're looking for the ID of a particular provisioned SCIM group instead of a Custom Group, make a call like the following to narrow down the responses to only provisioned SCIM groups:

https://myServer.com/identities/groups?limit=50&providerId=scim

From the response, locate the common name of your SCIM group, and its corresponding id. Remember to replace spaces with %20 when you use this ID in your REST calls.

To locate the UID of the user we want to own the files for our mount, we can make a similar call but this time to the user identifier endpoint:
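
A hedged curl sketch of that call, assuming the same /identifier pattern (replace <userId> with the user's actual ID):

curl -s "https://myServer.com/identities/users/<userId>/identifier" \
     -H "Authorization: Bearer <accessToken>"

The uid property in the response is the value you'd use for the uid mount option in the PV manifest.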

Validations

We can do validations on a few users we expect to be in that Custom Group using the call in the screenshot above. We can make similar calls to the identifier endpoint for users we know are not in this Custom Group and ensure we don't see that GID listed in their secondaryGids property.
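
If you'd rather script those checks, a quick shell loop along these lines works too (the second username is hypothetical; jq is just there for readability):

for u in taflem someOtherUser; do
   echo "$u:"
   curl -s "https://myServer.com/identities/users/$u/identifier" \
        -H "Authorization: Bearer <accessToken>" | jq '.secondaryGids'
done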

And finally, to keep the post short, I've left out the actual mount and permissions validations. Feel free to repeat the steps in the previous post and validate they yield the same results.

SAS Viya on Kubernetes with Azure Active Directory: Group-based access to Azure File Shares was published on SAS Users.

May 11, 2021
 

It’s safe to say that SAS Global Forum is a conference designed for users, by users. As your conference chair, I am excited by this year’s top-notch user sessions. More than 150 sessions are available, many by SAS users just like you. Wherever you work or whatever you do, you’ll find sessions relevant to your industry or job role. New to SAS? Been using SAS forever and want to learn something new? Managing SAS users? We have you covered. Search for sessions by industry or topic, then add those sessions to your agenda and personal calendar.

Creating a customizable agenda and experience

Besides two full days of amazing sessions, networking opportunities and more, many user sessions will be available on the SAS Users YouTube channel on May 20, 2021 at 10:00am ET. After you register, build your agenda and attend the sessions that most interest you when the conference begins. Once you’ve viewed a session, you can chat with the presenter. Don’t know where to start? Sample agendas are available in the Help Desk.

For the first time, proceedings will live on SAS Support Communities. Presenters have been busy adding their papers to the community. Everything is there, including full paper content, video presentations, and code on GitHub. It all premieres on "Day 3" of the conference, May 20. Have a question about the paper or code? You'll be able to post a question on the community and ask the presenter.

Want training or help with your code?

Code Doctors are back this year. Check out the agenda for the specific times they’re available and make your appointment, so you’ll be sure to catch them and get their diagnosis of code errors. If you’re looking for training, you’ll be quite happy. Training is also back this year and it’s free! SAS instructor-led demos will be available on May 20, along with the user presentations on the SAS Users YouTube channel.

Chat with attendees and SAS

It is hard to replicate the buzz of a live conference, but we’ve tried our best to make you feel like you’re walking the conference floor. And we know networking is always an important component to any conference. We’ve made it possible for you to network with colleagues and SAS employees. Simply make your profile visible (by clicking on your photo) to connect with others, and you can schedule a meeting right from the attendee page. That’s almost easier than tracking down someone during the in-person event.

We know the exhibit hall is also a big draw for many attendees. This year’s Innovation Hub (formerly known as The Quad) has industry-focused booths and technology booths, where you can interact in real-time with SAS experts. There will also be a SAS Lounge where you can learn more about various SAS services and platforms such as SAS Support Communities and SAS Analytics Explorers.

Get started now

I’ve highlighted a lot in this blog post, but I encourage you to view this 7-minute Innovation Hub video. It goes in depth on the Hub and all its features.

This year there is no reason not to register for SAS Global Forum…and attend as few or as many sessions as you want. Why? Because the conference is FREE!

Where else can you get such quality SAS content and learning opportunities? Nowhere, which is why I encourage you to register today. See you soon!

SAS Global Forum: Your experience, your way was published on SAS Users.

April 24, 2021
 

You've probably heard by now the new SAS Viya runs on containers orchestrated by Kubernetes. There is a lot to know and learn for both experienced users and k8s noobies alike. I would recommend searching the SAS Communities Library for a wealth of information on deployment, administration, usability, and more.

If you're here, you're probably looking to deploy SAS Viya on Azure Kubernetes Service (AKS) and you're thinking about making data available to users' sessions. Maybe you have SAS deployed today and are already doing that with an NFS server? This is still possible in this new paradigm, but perhaps you haven't deployed an NFS server in the cloud (or at least inside the same Virtual Network as your AKS cluster) and you're looking to avoid communicating over public IP addresses. Or maybe you just don't want to manage another VM to hold an NFS server. Enter: Azure Files. File shares are a great option if you want to make the same data available to multiple VMs, similar to the way an NFS export works. In our case with SAS Viya on Kubernetes, what we want to do is make the same data available to multiple containers, which we can do with Azure Files shares as well.

Pro Tip: Opt for Premium File Shares for better performance. These run on SSD-based hardware and use the NFS protocol. The Standard ones run on HDD-based hardware and the SMB protocol, which tends to be slower once data grows beyond a few MBs.

If you're not yet familiar with the relationships between Storage Classes, Persistent Volumes, and Persistent Volume Claims, here is a great walkthrough. While it's written with Azure storage options in mind, the general relationships between those k8s objects are always the same. Now that we're all on the same page, we know we need to create a Persistent Volume (PV) that refers to an Azure File share, and then define a Persistent Volume Claim (PVC) tying back to that PV. Our container specs request the PVC and, in turn, have the Azure File share mounted into them. This allows multiple containers access to the same data. A quick read through this guide on doing all of this highlights why we're here... What if you don't want to mount this data with 777 or 755 permissions?

That's the point of today's post: to show how you can use Azure Files with group-based access patterns to support a line of business (LOB)- or project-based security model for accessing data with SAS Viya.

So here’s the punchline

Follow the steps below as your guide for accomplishing everything we've already talked about, plus some SAS Viya-specific steps to ensure that user access occurs as intended. In the next section we'll get into details, so you can replicate this setup in your deployment.

But one more quick note: the rest of this post assumes that your SAS Viya environment is configured to make calls to an LDAP provider to facilitate user and group information. If you've opted to leverage a SCIM provider like Azure Active Directory or Okta instead, keep reading but stay tuned for my next post which will cover a couple additional steps you'll need to account for.

  1. Make sure there’s an LDAP group for each LOB or project in your security model. Put all the relevant users in them. Your life will be simpler if you have your SAS Administrator(s) in each group as well, but we can work around that if it's not feasible.
  2. Create a storage account for each LOB/project (sort of optional) and then create your Azure File Share(s).
  3. For each storage account, create a Kubernetes Secret to store the Access Key (this is why a separate Storage Account per LOB/project is optional; technically you could share the keys but it's not recommended).
  4. For each File Share create a Persistent Volume (PV), a Persistent Volume Claim (PVC), and Patch Transformers to make sure the data gets mounted to all necessary containers.
  5. Create another PatchTransformer to add the mount point(s) to the CAS allowlist.
  6. Make sure all the volumes and claims end up in the resources section of your kustomization.yaml.
  7. Finally, make sure all of the transformers are listed in the transformers section of your kustomization.yaml, then build and apply.
  8. Put all users in the CASHostAccountRequired Custom Group.
  9. Bounce all the CAS containers.
  10. Rejoice!

Let’s walk through that, a little slower now…

LDAP GROUPS

For this example I'm going to stick with just one group: the HR line of business (LOB). That group's GID in my LDAP environment is 5555.

STORAGE ACCOUNTS

Note: If the Storage Accounts and File Shares already exist at your site, with or without data already in them, that's totally fine; we'll just create the secret(s) and you can skip this step.

For this walkthrough I'm using a separate Storage Account for each LOB/project so we can also separate the access keys. It's just generally a good practice, since access keys are already a hefty object, but there's technically nothing stopping you from using the same Storage Account for multiple LOBs/BUs.

A word to the wise: if you create a storage account for this, make sure to pick this on the “Networking” tab:

and this on the "Advanced" tab:

That unique combination will yield no errors and no public access to your data. Regardless of whether you created the storage accounts, or they pre-existed, we’ll need to store the key for each one in a Kubernetes Secret. We'll see that in a later step.

Finally, create a File Share for each LOB/project in your storage account(s), however you decide to organize them. You can put data in now or wait until later.
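
If you prefer scripting over the portal, here's a sketch using the Azure CLI. The resource group name is a placeholder; the account and share names match my example, and the FileStorage/Premium_LRS combination gives you the Premium File Shares mentioned in the Pro Tip above:

az storage account create \
   --name hrdemodata \
   --resource-group <myResourceGroup> \
   --kind FileStorage \
   --sku Premium_LRS

az storage share-rm create \
   --resource-group <myResourceGroup> \
   --storage-account hrdemodata \
   --name viyahrdemodata \
   --quota 1024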

KUBERNETES SPECIFICATIONS

A quick note on the format of each of the code blocks to follow: I include the filename and path in a # comment on the initial line. The paths will make more sense in the last section where we see how to put these all together. Where I use $deploy, I am referring to whatever directory you created to store all of the files used to customize and build your SAS Viya environment.

Create a Secret manifest

All right, let's get into the meat and potatoes. First you'll create each LOB/project Secret (or just one if you went that route). I will create it in a separate namespace from the one SAS Viya will be deployed into so that I can reuse this secret for multiple SAS Viya deployments. For that reason, I chose not to put it in the same $deploy directory with the rest of my deployment-specific manifests. Here's an example of one way to do it:

apiVersion: v1
kind: Secret
metadata:
   name: hr-azure-sa-key
type: Opaque
stringData:
   azurestorageaccountname: hrdemodata
   azurestorageaccountkey: <keyStringHere>

I am keeping this simple for illustrative purposes. Be aware when creating this Secret, k8s will only base64 encode it. There are other extensions and add-ons for actually encrypting your Secrets in k8s but that's out of scope for this post. You could replace the stringData property with the data property and then use an already base64-encoded value for visual obfuscation, but that's easily decodable, so this really shouldn't be stored in source control regardless.
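
As an alternative to a manifest file, you can create the same Secret imperatively, which at least keeps the key string out of a file that could accidentally land in source control. A sketch, using the afs-data namespace referenced in the PV below:

kubectl -n afs-data create secret generic hr-azure-sa-key \
    --from-literal=azurestorageaccountname=hrdemodata \
    --from-literal=azurestorageaccountkey='<keyStringHere>'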

Create a Persistent Volume (PV) manifest

Next, you’ll create your PersistentVolume similar to this example:

# $deploy/site-config/volumesAndClaims/hr-data-storage-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
   name: hr-data-storage-<deploy>
   # The label is used for matching the exact claim
   labels:
      usage: hr-data-storage-<deploy>
spec:
   capacity:
      storage: 1Gi
   accessModes:
      - ReadWriteMany
   persistentVolumeReclaimPolicy: Retain
   azureFile:
      secretName: hr-azure-sa-key
      shareName: viyahrdemodata
      secretNamespace: afs-data
      readOnly: false
   mountOptions:
   - dir_mode=0770
   - file_mode=0770
   - uid=2222
   - gid=5555

Consider replacing the <deploy> suffix at the end of the name and usage properties with some identifier unique to this SAS Viya deployment, like the name of the namespace. I'll explain why in the next section.

The secretName comes from the name property in the first step and the shareName is the name of the File Share for this particular LOB inside the key's Storage Account. secretNamespace is pretty self-explanatory: it's where I created the secret. I did this in a separate namespace from where my SAS Viya deployment will live so I can reuse the same secret for multiple environments, if I need to.

We still have the mountOptions specification here, but this pattern equates to giving anyone in our HR LDAP group (gid 5555) full access, but not world access. The SAS Admin (uid 2222) in my case owns the files. You might pick a LOB/project lead here instead.

Create a Persistent Volume Claim (PVC) manifest

Moving along, we need to create the PersistentVolumeClaim, which is what the containers needing access to the data (to serve up to users' sessions) will request. It ties them back to the volume specification we just created.

# $deploy/site-config/volumesAndClaims/hr-data-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hr-volume-claim
  annotations:
    # Set this annotation to NOT let Kubernetes automatically create
    # a persistent volume for this volume claim.
    volume.beta.kubernetes.io/storage-class: ""
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      usage: hr-data-storage-<deploy>

In this case we don't need to add a unique identifier to our PVC's name because it is already a namespace-scoped resource. There is a one-to-one mapping between PVCs and PVs, and because PVs are not scoped to any namespace by nature, we end up tying each PV to a namespace through this mapping. Just as I created my Secret in its own namespace to reuse it for multiple SAS Viya environments, I expect to have multiple but similar PVs, their only difference being which SAS Viya environment they belong to. Hence, the unique identifier on those.

Take note of the fancy annotation here. The comment tells all. By the way, I am modeling this after the examples in the Azure Files Kubernetes Plug-In docs. The very last attribute is also very important: the value hr-data-storage-<deploy> matches the usage label from the PersistentVolume spec above. This is what links these two things together.

Mount the PV using the PVC

To make sure that these volumes get mounted into all CAS and SAS Compute containers for both in-memory and traditional sessions, we'll use PatchTransformers that target specific kinds of named k8s objects, like this:

# $deploy/site-config/transformers/add-hr-demo-data-to-CAS.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: add-azure-hr-data
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/volumes/-
    value:
      name: hrdata
      persistentVolumeClaim:
        claimName: hr-volume-claim
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
    value:
      name: hrdata
      mountPath: /mnt/hrdata
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
# $deploy/site-config/transformers/add-hr-demo-data-to-Compute.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: add-azure-hr-data-compute
patch: |-
  - op: add
    path: /template/spec/volumes/-
    value:
      name: hrdata
      persistentVolumeClaim:
        claimName: hr-volume-claim
  - op: add
    path: /template/spec/containers/0/volumeMounts/-
    value:
      name: hrdata
      mountPath: /mnt/hrdata
target:
  kind: PodTemplate
  name: sas-compute-job-config
  version: v1

The target is, somewhat confusingly, listed at the end of both manifests. In the first one we're targeting all CASDeployments, which are the k8s Custom Resources that define CAS. In the second we're targeting the PodTemplate called "sas-compute-job-config", which is how we ensure this data gets mounted into all of the containers that get dynamically spun up to support individual user sessions.

The claimName in both manifests has a value equal to the metadata.name property from the PVC manifest in the last step. The name and mountPath values within both of the patches (in my case "hrdata" and "/mnt/hrdata" respectively) can be whatever you like. The latter will end up being the name of the directory inside the containers.

Add the new mount points to the CAS allowlist

The details of CAS allowlists are a little out of scope for this post, but my colleague Gerry Nelson covered this topic in depth in his Accessing path-based data from CAS in Viya article. The general concept, though, is that CAS can only point to data in approved locations, which was implemented to strengthen the security posture when SAS moved to containerized Kubernetes deployments. If we miss this step, users will not be able to load any of this data into CAS to work with it in memory.

# $deploy/site-config/transformers/add-demo-data-to-allowlist.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-allowlist-paths
patch: |-
  - op: add
    path: /spec/appendCASAllowlistPaths
    value:
      - /cas/data/caslibs
      - /mnt/hrdata
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1

The target is similar to what we saw in the last step; it's in charge of making sure this change gets applied to any CAS pods that are created. The list of values within the patch makes up the new allowlist. The first path in the list should stay to maintain access to the OOTB paths, and then you can add additional entries to represent your newly-mounted data. Note the paths you add here should correspond exactly to the mountPath values from the PatchTransformers in the last step (in my case, "/mnt/hrdata"). And the fact that we specify a list here means we can reuse this same transformer for all of the projects/LOBs in our security model.

Put all users in the CASHostAccountRequired Custom Group

Finally, we need to add an environment variable to all the CAS pods ensuring all users will spawn PIDs with their own UIDs and GIDs. If we skip this step, then our users' sessions will attempt to access data mounted into our containers as a service account, usually 'cas'. This will not work with the group-based access we set up when defining our PVs above, since only group members have access. If you're familiar with this process in earlier versions of SAS Viya, this is the equivalent of creating the CASHostAccountRequired custom group and then ensuring all users end up in it, except easier.

# $deploy/site-config/transformers/set-casallhostaccounts-envvar.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: casallhostaccounts-environment-variable
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/env/-
    value:
      name: CASENV_CASALLHOSTACCOUNTS
      value: "ALL"
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1

Not too much new here. The target behaves as before and the patch adds an environment variable to the proper level in the manifest for the target CASDeployment Custom Resource.

A quick aside

I just have to mention this. If you get curious and exec into a CAS container and try to run an ls on one of these newly-mounted paths, you will get denied. That's because the new SAS Viya no longer requires a tight coupling with user identities on the hosts it's running on - as it shouldn't. Configuring something like SSSD so the underlying hosts (in our case, containers) can facilitate users and groups from your LDAP/AD, which is required in a SAS Viya 3.x environment, would add too much size and complexity to our containers. Instead, the CAS containers themselves run as an internal 'sas' user (UID 1001) which doesn't have any access to our mounts because of the access model we set up when we defined the PVs above.

So how do users actually get access to the data then? By default in the new SAS Viya, the SAS Identities Service will pull your users' UID and GID information out of the configured LDAP provider. This, in combination with the environment variable above, spawns the PIDs for your users' CAS sessions as their UID/GID. These sessions, associated with a PID running as the user, are then not denied when trying to access allowable mount points. Meaning, if the HR group in our example (GID 5555) is among a user's secondary groups, access is allowed; otherwise it's denied.

And what if your environment is configured to use a SCIM provider like Azure Active Directory or Okta instead of an LDAP provider? Likely, in that case, your provider won't supply user and group information (unless you have something like Domain Services for Azure Active Directory turned on). So how will we manage group-based access without UID and GID information coming from the provider? Stay tuned for the next post!

PUTTING IT ALL TOGETHER

Now we need to make sure all the resources we've defined actually get applied to our environment. For the sake of simplicity, I'm assuming you already have SAS Viya deployed into a namespace on Kubernetes and you're familiar with how we use Kustomize to bundle up our customizations to the vanilla manifests shipped to you and apply that bundle to Kubernetes. If not, keep reading to capture the gist and then take a look at the How do I kustomize a SAS Viya deployment? SAS Communities article.

Structure the sub-directories inside of your site-specific directory (site-config) however makes sense for your business. Here's how I went about it for the files relevant to this post:

# $deploy/site-config
/transformers
  - add-demo-data-to-allowlist.yaml
  - add-hr-demo-data-to-CAS.yaml
  - add-hr-demo-data-to-Compute.yaml
  - set-casallhostaccounts-envvar.yaml
/volumesAndClaims
  - hr-data-storage-volume.yaml
  - hr-data-volume-claim.yaml
  - kustomization.yaml

The only file out of that list we've yet to see is this one:

# $deploy/site-config/volumesAndClaims/kustomization.yaml
resources:
  - hr-data-storage-volume.yaml
  - finance-data-storage-volume.yaml
  - hr-data-volume-claim.yaml
  - finance-data-volume-claim.yaml

This directory structure/organization, along with the kustomization.yaml inside the same directory, allows me to add the entire $deploy/site-config/volumesAndClaims path to the resources section of my main kustomization.yaml file, which keeps it cleaner. When I'm ready to add PVs and PVCs to support more projects/LOBs, I will add them to the list in the $deploy/site-config/volumesAndClaims/kustomization.yaml file. A peek at my main kustomization.yaml:

# $deploy/kustomization.yaml 
...
resources:
  - sas-bases/base
  - ...
  - site-config/volumesAndClaims
  - ...
...

Next, we need to add all the Transformers we created to the transformers section of our main kustomization.yaml file.

# $deploy/kustomization.yaml
...
transformers:
  - ...
  - site-config/transformers/add-hr-demo-data-to-CAS.yaml
  - site-config/transformers/add-hr-demo-data-to-Compute.yaml
  - site-config/transformers/add-demo-data-to-allowlist.yaml
  - site-config/transformers/set-casallhostaccounts-envvar.yaml
  - ...
...

If you're a visual person like I am, I've included a map of these files and to which namespace they should be deployed at the bottom of this post that might help you.

Then, in typical Kustomize fashion, we'll bundle up our customizations and rewrite the site.yaml file that holds the manifests for this SAS Viya deployment. We'll then apply that with kubectl, which conveniently only applies the net-new changes to our namespace.

# $deploy/
kustomize build -o site.yaml
kubectl apply -f site.yaml -n $namespace

To finish things off, we need to bounce the CAS server and any of its Workers, which we can do by deleting the Pods controlled by the CAS Operator. Note, in general, Pods are the only "safe" thing to delete because the Replica Set that backs them ensures they get replaced.

kubectl -n $namespace delete pods -l casoperator.sas.com/server=default

VALIDATING THE END RESULT

After completing the process, we want to verify the results. First, start a SAS Studio session and validate the CASALLHOSTACCOUNTS variable was successfully set with the serverStatus action using the following code:

cas;
 
proc cas;
   builtins.serverStatus result=results;
   /* describe results; */
   print "CAS Host Account Required: " results.About.CASHostAccountRequired;
run;

Verify the output shows “ALL”.

Next, create the global caslibs in SAS Viya. For instructions on creating these, see this page of our documentation (hint: use PATH, not DNFS). Make sure to use the mountPath value from your $deploy/site-config/transformers/add-hr-demo-data-to-CAS.yaml file above.

Finally, we'll apply some authorizations on the global caslib itself, which will add to the POSIX permissions we've already set on our PV. The image below is an example of the project/LOB-based security pattern I typically use for caslibs, applied to my newly-created HR caslib. No full promote! With this pattern, only users in the HR Members LDAP group (GID 5555) will even see this caslib exists, let alone work with it. They won't be able to alter this caslib or adjust who else has access to it. For more information on setting authorizations on caslibs, see this section of our documentation.

This is me logged in as myself (taflem), a member of the HR line of business group, showing I can view and promote tables from source. In other words, I can see DIABETES.sashdat exists in the Azure File Share I created for HR and I can load it into memory for this caslib.

I won't be able to load it into memory associated with any other caslibs that I have access to (the out-of-the-box Public caslib for example) because I (acting as the SAS Admin) don't allow full promote permissions on any caslibs in this environment like you see in the screenshot above. Accomplishing this requires editing the authorizations of all caslibs in your environment and removing (not denying) Full Promote permissions from them. The benefit of doing this consistently is that, for example, users won't be able to take HR data associated with this caslib, which only HR group members have access to, and make it available while it's in memory to any other caslibs which might have wider permissions.

A few more validations… still logged in as myself, I can write back to source and prove the written file maintains the same POSIX permissions (this way of mounting File Shares is like the setgid bit on steroids). I can also validate I can’t do anything with the sister finance caslib, which I set up following the exact same pattern we just walked through for all of the HR Kubernetes objects. This is because I associated that PV with a different LDAP group, just for Finance LOB users, which I'm not a member of.

AND IN CONCLUSION…

I always feel like I do this to myself where I write a really detailed blog post and don’t save energy for a conclusion :). But with this pattern, you can create a group-based security model for path-based data using Azure Files shares. I hope it helps you on your journey to a SAS Viya environment that's well-balanced between security and usability!

As promised, here's a map of these files and which namespace they should get deployed to:

Group-based access to Azure Files in the new SAS Viya was published on SAS Users.

4月 242021
 

You've probably heard by now the new SAS Viya runs on containers orchestrated by Kubernetes. There is a lot to know and learn for both experienced users and k8s noobies alike. I would recommend searching the SAS Communities Library for a wealth of information on deployment, administration, usability, and more.

If you're here, you're probably looking to deploy SAS Viya on Azure Kubernetes Service (AKS) and you're thinking about making data available to users' sessions. Maybe you have SAS deployed today and are doing that already with an NFS server? This is still possible in this new paradigm, but perhaps you haven't already deployed an NFS server in the cloud, or at least inside of the same Virtual Network as your AKS cluster and you're looking to avoid communicating over public IP addresses. Or, maybe you just don't want to manage another VM to hold an NFS server. Enter: Azure Files. These are a great option if you want to make the same data available to multiple VMs, similar to the way an NFS export works. In our case with SAS Viya on Kubernetes, what we want to do is make the same data available to multiple containers which we can do with Azure Files shares as well.

Pro Tip: Opt for Premium File Shares for better performance. These run on SSD-based hardware and use the NFS protocol. The Standard ones run on HDD-based hardware and SMB protocols, which tends to be slower with data over a few MBs.

If you're not yet familiar with the relationships between Storage Classes, Persistent Volumes, and Persistent Volume Claims, here is a great walkthrough. While it's written with Azure storage options in mind, the general relationships between those k8s objects are always the same. Now that we're all on the same page, we know we need to create a Persistent Volume (PV), which refers to an Azure File share and then define a Persistent Volume Claim (PVC) tying back to that PV. Our container specs request the PVC and in turn, have the Azure File share mounted into them. This allows multiple containers access to the same data. A quick read through this guide on doing all of this highlights why we're here...What if you don't want to mount this data with 777 or 755 permissions?

That's the point of today's post: to show how you can use Azure Files with group-based access patterns to support a line of business (LOB)- or project-based security model for accessing data with SAS Viya.

So here’s the punchline

Follow the steps below as your guide for accomplishing everything we've already talked about, plus some SAS Viya-specific steps to ensure that user access occurs as intended. In the next section we'll get into details, so you can replicate this setup in your deployment.

But one more quick note: the rest of this post assumes that your SAS Viya environment is configured to make calls to an LDAP provider to facilitate user and group information. If you've opted to leverage a SCIM provider like Azure Active Directory or Okta instead, keep reading but stay tuned for my next post which will cover a couple additional steps you'll need to account for.

  1. Make sure there’s an LDAP group for each LOB or project in your security model. Put all the relevant users in them. Your life will be simpler if you have your SAS Administrator(s) in each group as well, but we can work around that if it's not feasible.
  2. Create a storage account for each LOB/project (sort of optional) and then create your Azure File Share(s).
  3. For each storage account, create a Kubernetes Secret to store the Access Key (this is why a separate Storage Account per LOB/project is optional; technically you could share the keys but it's not recommended).
  4. For each File Share create a Persistent Volume (PV), a Persistent Volume Claim (PVC), and Patch Transformers to make sure the data gets mounted to all necessary containers.
  5. Create another PatchTransformer to add the mount point(s) to the CAS allowlist.
  6. Make sure all the volumes and claims end up in the resources section of your kustomize.yaml.
  7. Finally, make sure all of the transformers are listed in the transformers section of your kustomization.yaml, then build and apply.
  8. Put all users in the CASHostAccountRequired Custom Group.
  9. Bounce all the CAS containers.
  10. Rejoice!

Let’s walk through that, a little slower now…

LDAP GROUPS

For this example I'm going to stick with just one group: the HR line of business (LOB). That group's GID in my LDAP environment is 5555.

STORAGE ACCOUNTS

Note: If the Storage Accounts and File Shares already exist at your site, with or without data already in them, that's totally fine, we'll just create the secret(s) and you can skip this step.

For this walk through I'm using a separate Storage Account for each LOB/project so we can also separate the access keys. Just generally a good practice since access keys are already a hefty object, but there's technically nothing stopping you from using the same Storage Account for multiple LOBs/BUs.

A word to the wise: if you create a storage account for this, make sure to pick this on the “Networking” tab:

and this on the "Advanced" tab:

That unique combination will yield no errors and no public access to your data. Regardless of whether you created the storage accounts, or they pre-existed, we’ll need to store the key for each one in a Kubernetes Secret. We'll see that in a later step.

Finally, create a File Share for each LOB/project in your storage account(s, however you decide). You can put data in now or wait until later.

KUBERNETES SPECIFICATIONS

A quick note on the of each of the code blocks to follow: I include the filename and path in a # comment in the initial line. The paths will make more sense in the last section where we see how to put these all together. Where I use $deploy, I am referring to whatever directory you create to store all of the files to customize and build your SAS Viya environment.

Create a Secret manifest

All right, let's get into the meat and potatoes. First you'll create each LOB/project Secret (or just one if you went that route). I will create it in a separate namespace from the one SAS Viya will be deployed into so that I can reuse this secret for multiple SAS Viya deployments. For that reason, I chose not to put it in the same $deploy directory with the rest of my deployment-specific manifests. Here's an example of one way to do it:

apiVersion: v1
kind: Secret
metadata:
   name: hr-azure-sa-key
type: Opaque
stringData:
   azurestorageaccountname: hrdemodata
   azurestoragecountkey: <keyStringHere>

I am keeping this simple for illustrative purposes. Be aware when creating this Secret, k8s will only base64 encode it. There are other extensions and add-ons for actually encrypting your Secrets in k8s but that's out of scope for this post. You could replace the stringData property with the data property and then use an already base64-encoded value for visual obfuscation, but that's easily decodable, so this really shouldn't be stored in source control regardless.

Create a Persistent Volume (PV) manifest

Next, you’ll create your PersistentVolume similar to this example:

# $deploy/site-config/volumesAndClaims/hr-data-storage-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
   name: hr-data-storage-<deploy>
   # The label is used for matching the exact claim
   labels:
      usage: hr-data-storage-<deploy>
spec:
   capacity:
      storage: 1Gi
   accessModes:
      - ReadWriteMany
   persistentVolumeReclaimPolicy: Retain
   azureFile:
      secretName: hr-azure-sa-key
      shareName: viyahrdemodata
      secretNamespace: afs-data
      readOnly: false
   mountOptionns:
   - dir_mode=0770
   - file_mode=0770
   - uid=2222
   - gid=5555

Consider replacing the the <deploy> suffix at the end of the name and usage properties with some unique identifier to this SAS Viya deployment, like the name of the namespace. I'll explain why in the next section.

The secretName comes from the name property in the first step and the shareName is the name of the File Share for this particular LOB inside the key's Storage Account. secretNamespace is pretty self-explanatory, it’s where I created the secret. I did this in a separate namespace from where my SAS Viya deployment will live so I can reuse the same secret for multiple environments, if I need to.

We still have the mountOptions specification here, but this pattern equates to giving anyone in our HR LDAP group (gid 5555) full access, but not world access. The SAS Admin (uid 2222) in my case owns the files. You might pick a LOB/project lead here instead.

Create a Persistent Volume Claim (PVC) manifest

Moving along, we need to create the PersistentVolumeClaim which is what the containers needing access to the data to serve up to users' sessions will request. It will tie them back to the volume specification we just created.

# $deploy/site-config/volumesAndClaims/hr-data-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hr-volume-claim
  annotations:
    # Set this annotation to NOT let Kubernetes automatically create
    # a persistent volume for this volume claim.
    volume.beta.kubernetes.io/storage-class: ""
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      usage: hr-data-storage-<deploy>

In this case we don't need to add a unique identifier to our PVC's name because it is a namespace-scoped resource already. There is a one-to-one mapping to PVs, but because PVs are not scoped to any namespace by nature, we end up tying to it to one due to this mapping. Just as when creating my Secret in it's own namespace to reuse it for multiple SAS Viya environments, I expect to have multiple but similar PVs - their only difference being which SAS Viya environment they belong to. Hence, the unique identifier on those.

Take note of the fancy annotation here. The comment tells all. By the way, I am modelling this after the examples in the Azure Files Kubernetes Plug-In docs. The very last attribute is also very important: the name hr-data-storage- is the value of the usage label from the PersistentVolume spec above. This is what links these two things together.

Mount the PV using the PVC

To make sure that these volumes get mounted into all CAS and SAS Compute containers for both in-memory and traditional sessions, we'll use PatchTransformers and targert specific types and named k8s objects like this:

# $deploy/site-config/transformers/add-hr-demo-data-to-CAS.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: add-azure-hr-data
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/volumes/-
    value:
      name: hrdata
      persistentVolumeClaim:
        claimName: hr-volume-claim
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
    value:
      name: hrdata
      mountPath: /mnt/hrdata
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
# $deploy/site-config/transformers/add-hr-demo-data-to-Compute.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: add-azure-hr-data-compute
patch: |-
  - op: add
    path: /template/spec/volumes/-
    value:
      name: hrdata
      persistentVolumeClaim:
        claimName: hr-volume-claim
  - op: add
    path: /template/spec/containers/0/volumeMounts/-
    value:
      name: hrdata
      mountPath: /mnt/hrdata
target:
  kind: PodTemplate
  name: sas-compute-job-config
  version: v1

The target is, somewhat confusingly, listed at the end of both manifests. In the first one we're targeting all CASDeployments which are the k8s Custom Resources that define CAS. In the second we're targeting all PodTemplates that are called "sas-compute-job-config" which is how we ensure this data gets mounted into all of the containers that get dynamically spun up to support individual user sessions.

The claimName in both manifests has a value equal to the metadata.name property from the PVC manifest in the last step. The name and mountPath value within both of the patches (in my case those values are "hrdata" and "/mnt/hrdata" respectively) can be whatever you like. The later will end up being the name of the directory inside the containers.

Add the new mount points to the CAS allowlist

The details of CAS allowlists are a little out of scope for this post, but my colleague Gerry Nelson, covered this topic deeply his Accessing path-based data from CAS in Viya article. The general concept though is we can only point to data from approved locations, which was implemented to strengthen our security posture when we moved to containerized Kubernetes deployments. If we miss this step, users will not be able to load any of this data to CAS to work with it in memory.

# $deploy/site-config/transformers/add-demo-data-to-allowlist.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-allowlist-paths
patch: |-
  - op: add
    path: /spec/appendCASAllowlistPaths
    value:
      - /cas/data/caslibs
      - /mnt/hrdata
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1

The target is similar to what we saw in the last step, it's in charge of making sure this change gets applied to any CAS pods that are created. The list of values within the patch make up the new allowlist. The first path in the list should stay to maintain access to the OOTB paths, and then you can add additional entries to represent your newly-mounted data. Note the paths you add here should correspond exactly to the mountPath values from the PatchTransformers in the last step (in my case that was "/mnt/hrdata"). But the fact we specify a list here means we use this same transformer for all of the projects/LOBs in our security model.

Put all users in the CASHostAccountRequired Custom Group

Finally, we need to add an environment variable to all the CAS pods ensuring all users will spawn PIDs with their own UIDs and GIDs. If we skip this step, then our user's sessions will attempt to access data mounted into our containers as a service account, usually 'cas'. This will not work with the group-based access we set up when defining our PVs above, since only group members have access. If you're familiar with this process in earlier versions of SAS Viya, this would be the equivalent of creating the CASHostAccountRequired custom group and then ensuring all users end up in it, except easier.

# $deploy/site-config/transformers/set-casallhostaccounts-envvar.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: casallhostaccounts-environment-variable
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/env/-
    value:
      name: CASENV_CASALLHOSTACCOUNTS
      value: "ALL"
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1

Not too much new here. The target behaves as before and the patch adds an environment variable to the proper level in the manifest for the target CASDeployment Custom Resource.

A quick aside

I just have to mention this. If you get curious and exec into a CAS container and try to run an ls on one of these newly-mounted paths, you will get denied. That's because the new SAS Viya no longer requires a tight coupling with user identities on the hosts it's running on - as it shouldn't. Configuring something like SSSD so the underlying hosts (in our case, containers) can facilitate users and groups from your LDAP/AD, which is required in a SAS Viya 3.x environment, would add too much size and complexity to our containers. Instead, the CAS containers themselves run as an internal 'sas' user (UID 1001) which doesn't have any access to our mounts because of the access model we set up when we defined the PVs above.

So how do users actually get access to the data then? By default in the new SAS Viya, the SAS Identities Service will pull your users UID and GID information out of the configured LDAP provider. This, in combination with this environment variable, spawns PIDs for your user's CAS sessions as their UID/GID. Further, these sessions, associated with a PID running as the user, are not denied when trying to access allowable mount points. Meaning, if the HR group in our example (GID 5555) is in their secondary groups then they will be allowed, and otherwise denied.

And what if you're environment is configured to use a SCIM provider like Azure Active Directory or Okta instead of an LDAP provider? Likely, in that case your provider won't facilitate user and group information (unless you have something like Domain Services for Azure Active Directory turned on). So how will we manage group-based access without UID and GID information coming from the provider? Stay tuned for the next post!

PUTTING IT ALL TOGETHER

Now we need to make sure all the resources we've defined actually get applied to our environment. For the sake of simplicity, I'm assuming you already have SAS Viya deployed into a namespace on Kubernetes and you're familiar with how we use Kustomize to bundle up our customizations to the vanilla manifests shipped to you and apply that bundle to Kubernetes. If not, keep reading to capture the gist and then take a look at the How do I kustomize a SAS Viya deployment? SAS Communities article.

Structure the sub-directories inside of your site-specific (site-config) however makes sense to your business. Here's how I went about it for the files relevant to this post:

# $deploy/site-config
/transformers
  - add-demo-data-to-allowlist.yaml
  - add-hr-demo-data-to-CAS.yaml
  - add-hr-demo-data-to-Compute.yaml
  - set-casallhostaccounts-envvar.yaml
/volumesAndClaims
  - hr-data-storage-volume.yaml
  - hr-data-volume-claim.yaml
  - kustomization.yaml

The only file out of that list we've yet to see is this one:

# $deploy/site-config/volumesAndClaims/kustomization.yaml
resources:
  - hr-data-storage-volume.yaml
  - finance-data-storage-volume.yaml
  - hr-data-volume-claim.yaml
  - finance-data-volume-claim.yaml

This directory structure/organization, along with the kustomization.yaml inside the same directory, allows me to add the entire $deploy/site-config/volumesAndClaims path to the resources section of my main kustomization.yaml file, which keeps it cleaner. When I'm ready to add PVs and PVCs to support more projects/LOBs, I will add them to the list in the $deploy/site-config/volumesAndClaims/kustomization.yaml file. A peek at my main kustomization.yaml:

# $deploy/kustomization.yaml 
...
resources:
  - sas-bases/base
  - ...
  - site-config/volumesAndClaims
  - ...
...

Next, we need to add all the Transformers we created to the transformers section of our main kustomization.yaml file.

# $deploy/kustomization.yaml
...
transformers:
  - ...
  - site-config/transformers/add-hr-demo-data-to-CAS.yaml
  - site-config/transformers/add-hr-demo-data-to-Compute.yaml
  - site-config/transformers/add-demo-data-to-allowlist.yaml
  - site-config/transformers/set-casallhostaccounts-envvar.yaml
  - ...
...

If you're a visual person like I am, I've included a map of these files and to which namespace they should be deployed at the bottom of this post that might help you.

Then in typical Kustomize fashion, we'll bundle up our customizations and rewrite the site.yaml file that holds the manifests for this SAS Viya deployment. We'll then apply that with kubectl, which conveniently only applies the net-new changes to our namespace.

# $deploy/
kustomize build -o site.yaml
kubectl apply -f site.yaml -n $namespace

To finish things off, we need to bounce the CAS server and any of its Workers, which we can do by deleting the Pods controlled by the CAS Operator. Note, in general, Pods are the only "safe" thing to delete because the Replica Set that backs them ensures they get replaced.

kubectl -n $namespace delete pods -l casoperator.sas.com/server=default
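If you want to watch the operator recreate the Pods after the delete, a quick check with the same label selector and namespace does the trick:

# watch the CAS controller and worker Pods come back up
kubectl -n $namespace get pods -l casoperator.sas.com/server=default --watch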

VALIDATING THE END RESULT

After completing the process, we want to verify the results. First, start a SAS Studio session and validate the CASALLHOSTACCOUNTS variable was successfully set with the serverStatus action using the following code:

cas;
 
proc cas;
   builtins.serverStatus result=results;
   /* describe results; */
   print "CAS Host Account Required: " results.About.CASHostAccountRequired;
run;

Verify the output shows “ALL”.

Next, create the global caslibs in SAS Viya. For instructions on creating these, see this page of our documentation (hint: use PATH, not DNFS). Make sure to use the mountPath value from your $deploy/site-config/transformers/add-hr-demo-data-to-CAS.yaml file above.
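If you prefer code to clicks, here's a minimal sketch of creating the same kind of global caslib from a SAS Studio session instead. Note that creating global caslibs requires administrator privileges, and /mnt/hrdata is a hypothetical stand-in for your actual mountPath value:

/* create a global, path-based caslib pointing at the mounted Azure File Share */
/* /mnt/hrdata is a placeholder for your mountPath                            */
cas mysession;
caslib hrdata datasource=(srctype="path") path="/mnt/hrdata" global;
 
/* confirm the new caslib is visible */
caslib _all_ list;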

Finally, we'll apply some authorizations on the global caslib itself, which will add to the POSIX permissions we've already set on our PV. The image below is an example of the project/LOB-based security pattern I typically use for caslibs, applied to my newly-created HR caslib. No full promote! With this pattern, only users in the HR Members LDAP group (GID 5555) will even see this caslib exists, let alone work with it. They won't be able to alter this caslib or adjust who else has access to it. For more information on setting authorizations on caslibs, see this section of our documentation.

This is me logged in as myself (taflem), a member of the HR line of business group, showing I can view and promote tables from source. In other words, I can see DIABETES.sashdat exists in the Azure File Share I created for HR and I can load it into memory for this caslib.

I won't be able to load it into memory under any other caslib I have access to (the out-of-the-box Public caslib, for example) because I (acting as the SAS Admin) don't allow Full Promote permissions on any caslib in this environment, as you see in the screenshot above. Accomplishing this requires editing the authorizations of all caslibs in your environment and removing (not denying) the Full Promote permission. The benefit of doing this consistently is that users can't take HR data associated with this caslib, which only HR group members have access to, and make it available in memory through other caslibs that might have wider permissions.

A few more validations… still logged in as myself, I can write back to source and prove the written file maintains the same POSIX permissions (this way of mounting File Shares is like the setgid bit on steroids). I can also validate I can’t do anything with the sister finance caslib, which I set up following the exact same pattern we just walked through for all of the HR Kubernetes objects. This is because I associated that PV with a different LDAP group, just for Finance LOB users, which I'm not a member of.

AND IN CONCLUSION…

I always feel like I do this to myself where I write a really detailed blog post and don’t save energy for a conclusion :). But with this pattern, you can create a group-based security model for path-based data using Azure Files shares. I hope it helps you on your journey to a SAS Viya environment that's well-balanced between security and usability!

As promised, here's a map of these files and which namespace they should get deployed to (click to make it larger and zoom in):

Group-based access to Azure Files in the new SAS Viya was published on SAS Users.

December 17, 2020
 

There’s nothing worse than being in the middle of a task and getting stuck. Being able to find quick tips and tricks to help you solve the task at hand, or simply entertain your curiosity, is key to maintaining your efficiency and building everyday skills. But how do you get quick information that’s ALSO engaging? By adding some personality to traditionally routine tutorials, you can learn and may even have fun at the same time. Cue the SAS Users YouTube channel.

With more than 50 personality-filled videos published to date and over 10,000 hours watched, there's no shortage of learning going on. Our team of experts love to share their knowledge and passion (with personal flavor!) to give you solutions to those everyday tasks.

What better way to round out the year than provide a roundup of our most popular videos from 2020? Check out these crowd favorites:

Most viewed

  1. How to convert character to numeric in SAS
  2. How to import data from Excel to SAS
  3. How to export SAS data to Excel

Most hours watched

  1. How to import data from Excel to SAS
  2. How to convert character to numeric in SAS
  3. Simple Linear Regression in SAS
  4. How to export SAS data to Excel
  5. How to Create Macro Variables and Use Macro Functions
  6. The SAS Exam Experience | See a Performance-Based Question in Action
  7. How to Import CSV Files into SAS
  8. SAS Certification Exam: 4 tips for success
  9. SAS Date Functions FAQs
  10. Merging Data Sets in SAS Using SQL

Latest hits

  1. Combining Data in SAS: DATA Step vs SQL
  2. How to Concatenate Values in SAS
  3. How to Market to Customers Based on Online Behavior
  4. How to Plan an Optimal Tour of London Using Network Optimization
  5. Multiple Linear Regression in SAS
  6. How to Build Customized Object Detection Models

Looking forward to 2021

We’ve got you covered! SAS will continue to publish videos throughout 2021. Subscribe now to the SAS Users YouTube channel, so you can be notified when we’re publishing new videos. Be on the lookout for some of the following topics:

  • Transforming variables in SAS
  • Tips for working with SAS Technical Support
  • How to use Git with SAS

2020 roundup: SAS Users YouTube channel how to tutorials was published on SAS Users.

November 20, 2020
 

If you’re like me and the rest of the conference team, you’ve probably attended more virtual events this year than you ever thought possible. You can see the general evolution of virtual events by watching the early ones from April or May and compare them to the recent ones. We at SAS Global Forum are studying the virtual event world, and we’re learning what works and what needs to be tweaked. We’re using that knowledge to plan the best possible virtual SAS Global Forum 2021.

Everything is virtual these days, so what do we mean by virtual?

Planning a good virtual event takes time, and we're working through the process now. One thing is certain -- we know the importance of providing quality content and an engaging experience for our attendees. We want to provide attendees with the same opportunities as always, but virtually: to continue to learn from other SAS users, hear about new and exciting developments from SAS, and connect and network with experts, peers, partners and SAS. Yes, I said network. We realize it won't be the same as a live event, but we are hopeful we can provide attendees with an incredible experience where you connect, learn and share with others.

Call for content is open

One of the differences between SAS Global Forum and other conferences is that SAS users are front and center, and the soul of the conference. We can’t have an event without user content. And that’s where you come in! The call for content opened November 17 and lasts through December 21, 2020. Selected presenters will be notified in January 2021. Presentations will be different in 2021; they will be 30 minutes in length, including time for Q&A when able. And since everything is virtual, video is a key component to your content submission. We ask for a 3-minute video along with your title and abstract.

The Student Symposium is back

Calling all postsecondary students -- there’s still time to build a team for the Student Symposium. If you are interested in data science and want to showcase your skills, grab a teammate or two and a faculty advisor and put your thinking caps on. Applications are due by December 21, 2020.

Learn more

I encourage you to visit the SAS Global Forum website for up-to-date information, follow #SASGF on social channels and join the SAS communities group to engage with the conference team and other attendees.

Connect, learn and share during virtual SAS Global Forum 2021 was published on SAS Users.

April 13, 2020
 

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various storage options available in Microsoft Azure. Access all the articles in the series here.

In this edition of the series on SAS and cloud integration, I cover the various storage options available on Microsoft Azure and how to connect to and interact with them. I focus on three key storage services: object storage, block storage, and file storage. In my previous articles I covered database as a service (DBaaS) and storage offerings from Amazon Web Services (AWS) as well as DBaaS on Azure.

Object Storage

Azure Blob Storage is a low-cost, scalable cloud object storage service for any type of data. Objects are a great way to store large amounts of unstructured data in their native formats. Individual Azure Blob objects can be up to 4.75 terabytes (TB) in size. Azure organizes these objects into storage accounts. Because a storage account is a globally unique namespace for your data, no two storage accounts can have the same name, and your data is accessible from anywhere in the world over HTTP or HTTPS.

A Container organizes a set of Blobs, similar to a directory in a traditional file system. You access Azure Blobs directly through an API from anywhere in the world. For security reasons, it is vital to grant least-privilege access to a Blob.

Make sure you are being intentional about opening objects up and are not exposing any sensitive data. Security controls are available on individual blobs and on the containers that organize them. The default is to create containers and blobs with no public read access; from there you can grant permissions to individual users and groups.
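For orientation, creating a storage account and a private container with the Azure CLI looks something like this sketch; the account, resource group, and container names are hypothetical placeholders:

# create a storage account, then a container with no public read access (placeholder names)
az storage account create --name mystorageacct --resource-group my-rg --location eastus --sku Standard_LRS
az storage container create --name hrdata --account-name mystorageacct --public-access off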

The total cost of blob storage depends on the volume of data stored, the types of operations performed, data transfer costs, and data redundancy choices. You can reduce the number of replicas or use one of the various archive tiers to lower the cost of your object storage. Cost is calculated on the terabytes of storage used per month, and you incur added costs for data requests and transfers over the network. Data movement is an unpredictable expense for many users.

Azure Blob Storage Tiers

  Hot      – Frequently accessed data
  Cool     – Infrequently accessed data, stored for at least 30 days
  Archive  – Rarely accessed data, stored for at least 180 days

In SAS Viya 3.5, direct support is available for objects stored in Azure Data Lake Storage Gen2, which extends Azure Blob Storage capabilities and optimizes it for analytics workloads. If you want to read SAS datasets, CSV or ORC files from Azure Blob Storage, you can read them directly using a CASLIB statement to Azure Data Lake Storage (ADLS). If you have files in a different format, you can copy them to a local file system accessible to the CAS controller and use CAS actions to load tables into memory. Making HTTP requests directly from within your SAS code using PROC HTTP is a good way to automate that download. Remember, there are no restrictions on the file types moving into object storage, so reading some local file system filetypes may require a SAS Data Connector.
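As a sketch of that direct route, here's roughly what an ADLS CASLIB looks like; the account name, file system, and application/tenant IDs are hypothetical placeholders for your own Azure AD app registration, so check the CASLIB documentation for the exact options in your release:

/* a minimal sketch: CASLIB to an ADLS Gen2 file system (placeholder values) */
cas mysession;
caslib adls datasource=(
   srctype="adls",
   accountname="mystorageacct",
   filesystem="analytics",
   applicationid="11111111-2222-3333-4444-555555555555",
   tenantid="99999999-8888-7777-6666-555555555555"
);
 
proc casutil;
   list files incaslib="adls";   /* browse the file system contents */
run;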

Block Storage

Azure Disks is the block storage service designed for use with Azure Virtual Machines. You may only access block storage when it is attached to an operating system. When thinking about Azure Disks, treat the storage volumes as independent disk drives controlled by the server operating system. Mount an Azure Disk to an operating system as if it were a physical disk. Azure Disks are valuable because they are the persisting storage when you terminate your compute instance. You can choose from four different volume types that supply performance levels at corresponding costs.
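For example, preparing and mounting a freshly attached data disk on a Linux VM looks something like this; /dev/sdc and /sasdata are hypothetical, so check lsblk for the actual device name on your VM:

# find the new disk, create a file system, and mount it (device and path are placeholders)
lsblk
sudo mkfs -t ext4 /dev/sdc
sudo mkdir -p /sasdata
sudo mount /dev/sdc /sasdata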

Azure offers a choice of HDD or three different performance classes of SSD: Standard, Premium, and Ultra. Use Ultra Disk if you need the lowest latency and scalable performance. Standard SSD is the most cost effective, while Premium SSD is the high-performance disk offering. The table below sums up the four offerings.

Azure Disk Storage Types

  Standard HDD – Backups and non-critical workloads
  Standard SSD – Development or test environments and lightly used workloads
  Premium SSD  – Production environments and time-sensitive workloads
  Ultra SSD    – High throughput and IOPS for transaction-heavy workloads

Azure Disks are the permanent SAS data storage, persisting through a restart of your SAS environment. The Azure Disk type you select has a direct impact on the performance you get from SAS. A best practice is to use compute instances with enhanced Azure Disks performance or dedicated solid-state drive instance storage.

File Storage

Azure Files provides access to data through a shared file system. The elastic network file system grows and shrinks as you add or remove files, so you only pay for the storage you consume. Users create, delete, modify, read, and write files organized logically in a directory structure for intuitive access. This service allows multiple users simultaneous access to a common set of files, with data managed by user and group permissions.

Azure Files is a powerful tool, especially if you're utilizing a SAS Grid architecture. If your SAS architecture requires a shared location that any node in a group can access and write to, Azure Files could meet that requirement. To access the data stored in your network file system, you mount the file system to your operating system. You can mount Azure Files to any Azure Virtual Machine, or even to an on-premise server within your Azure Virtual Network. Azure Files is a fantastic way to set up a shared file system, not only for your data but also to share projects and code between users.
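Mounting an Azure Files share over SMB from a Linux VM looks roughly like this; the storage account name, share name, and mount point are hypothetical, and the account key is assumed to be in the STORAGE_KEY environment variable:

# mount an Azure Files share over SMB (placeholder names)
sudo mkdir -p /mnt/sasshare
sudo mount -t cifs //mystorageacct.file.core.windows.net/sasshare /mnt/sasshare \
   -o vers=3.0,username=mystorageacct,password=$STORAGE_KEY,dir_mode=0770,file_mode=0660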

Finally

Storage is a key component of cloud computing because it enables users to stop their compute instances while their most important data remains in place. Storage services make it much easier to manage and scale your data. For example, Blob storage is a great place to store files that you want to make available to anyone, anywhere.

Block storage drives the performance of your environment. Abundant and performant block storage is essential to making your application run against the massive scale of data that SAS is eager to consume. Block storage is where your operating system and software ultimately are installed.

File storage is a great service to attach shared file systems to your compute instances. This is a great place to collaborate or migrate file system data from one compute instance to another. SAS is a compute engine running on data.

Without a robust set of storage tools to persist that data, you may not get the performance you desire, or the progress you make will be lost when you shut down your compute instances.


Storage in the Cloud – SAS and Azure was published on SAS Users.

January 31, 2020
 

Everyone is talking about artificial intelligence (AI) and how it affects our lives -- there are even AI toothbrushes! But how do businesses use AI to help them compete in the market? According to Gartner research, only half of all AI projects are deployed and 90% take more than three [...]

Driving faster value from analytics – how to deploy models and decisions quickly was published on SAS Voices by Janice Newell

November 5, 2019
 

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Microsoft Azure. Access all the articles in the series here.

The series

This is the next iteration of a series covering database as a service (DBaaS) and storage offerings in the cloud, this time from Microsoft Azure. I have already published two articles on Amazon Web Services: one covers the DBaaS offerings and the other covers the storage offerings. I will cover Google Cloud Platform in future articles. The goal of these articles is to supply a breakdown of these services to better understand the business requirements of these offerings and how they relate to SAS. I would encourage you to read all the articles in the series even if you are already using a specific cloud provider, since many of the core technologies and services are offered across the different cloud providers. These articles focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya, called the SAS Data Connect Accelerator, are synonymous with the SAS Embedded Process.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic and responds to demand, and you pay only for what you use. As more technologies move to a cloud-based architecture, companies must consider questions like: Where do I store my data? What cloud services best meet my business requirements? Which cloud vendor should I use? Can I migrate my applications to the cloud? If you are looking to migrate your SAS infrastructure to Azure, look at the SAS Viya QuickStart Template for Azure to see a rapid deployment pattern to get the SAS Viya platform up and running in Azure.

SAS integration with Azure

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Azure database variants. A database running in Azure is much like your on-premise database, but instead Microsoft manages the software and hardware. Azure's DBaaS offerings take care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premise or with a different cloud provider.

Azure databases

Azure offers database service technologies familiar to many users. If you read my previous article on SAS Data Connectors and Amazon Web Services, you are sure to see many parallels. It is important to understand the terminology and how the different database services in Azure best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in Azure. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand, and software that updates itself and creates backups, mean customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article I cover various database management systems, the Azure offering for each database type, and SAS integration. First, consider the diagram below depicting a decision flow chart to determine integration points between Azure database services and SAS. Trace your path in the diagram and read on to learn more about connection details.

Integration points between Azure database services and SAS

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two methods serve two distinct purposes and are optimized depending on how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries usually have minimal complexity but arrive in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input information, processing the data and updating existing data to reflect the collected and processed information. Transactional databases prevent two users from accessing the same data concurrently. Examples include order entry, retail sales, and financial transaction systems. Azure offers several types of transactional database services, which you can organize into three categories: enterprise licenses, open source, and cloud native.

Enterprise License

Many customers have workloads built around an enterprise database. Azure is an interesting use case because Microsoft is also a traditional enterprise database vendor. Amazon, for example, does not have existing on-premise enterprise database customers. Oracle Cloud is the other big player in the enterprise market looking to migrate existing customers to its cloud. Slightly off topic, but it may be of interest to some: SAS does support customers running their Oracle database on Oracle Cloud Platform using the SAS Data Connector to Oracle. Azure offers a solution for customers looking to continue their relationship with Microsoft without refactoring their existing workflows. Customers can bring existing enterprise database licenses to Azure and run SQL Server on Virtual Machines. SAS has extended SAS Data Connector support for SQL Server on Virtual Machines. You can also use your existing SAS license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with SQL Server on Virtual Machines.

Remember you can install and manage your own database on a virtual machine. For example, support for both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata is available for Teradata installed on Azure. If there is not an available database as a service offering, the traditional backup and update responsibilities are left to the customer.

SQL Server Stretch Database is another service available in Azure. If you are not prepared to add more storage to your existing on-premise SQL Server database, you can add capacity using the resources available in Azure. SQL Server Stretch will scale your data to Azure without having to provision any more servers on-premise. New SQL Server capacity will be running in Azure instead of in your data center.

Open Source

Azure provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. You can use your existing SAS license for SAS Data Connector to MySQL to connect to Azure Database for MySQL and SAS Data Connector to PostgreSQL to interface with Azure Database for PostgreSQL. SAS has not yet formally supported Azure Database for MariaDB. MariaDB is a variant of MySQL, so validation of support for the SAS Data Connector is coming soon. If you need support for MariaDB in Azure Database, please comment below and I will share your feedback with product management and testing.
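To make that concrete, here's a minimal sketch of a CASLIB using the SAS Data Connector to PostgreSQL against an Azure Database for PostgreSQL server; the server, database, and credentials are hypothetical placeholders:

/* a sketch of a CASLIB to Azure Database for PostgreSQL (placeholder values) */
caslib azpg datasource=(
   srctype="postgres",
   server="myserver.postgres.database.azure.com",
   database="sasdata",
   username="casuser@myserver",
   password="XXXXXXXX"
);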

Cloud Native

Azure SQL Database is an iteration of Microsoft SQL Server built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Azure SQL Database. You can use your existing license for SAS Data Connector to Microsoft SQL Server to connect to Azure SQL Database.
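The same connector syntax also covers SQL Server on Virtual Machines from the enterprise section above. A minimal sketch, with placeholder values and option names that may vary slightly by release:

/* a sketch of a CASLIB to Azure SQL Database (placeholder values) */
caslib azsql datasource=(
   srctype="sqlserver",
   server="myserver.database.windows.net",
   database="sasdata",
   username="casuser",
   password="XXXXXXXX"
);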

Analytical Databases (OLAP)

Analytical databases are optimized for read performance. These databases work best with complex queries in smaller volumes. When working with an analytical database you are typically doing analysis on multidimensional data interactively from multiple perspectives. Azure SQL Data Warehouse is the analytical database service offered by Azure. The SAS Data Connector to ODBC combined with a recent version of the Microsoft-supplied ODBC driver is currently the best way to interact with Azure SQL Data Warehouse. Look for the SAS Data Connector to Microsoft SQL Server to support SQL Data Warehouse soon.
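Until then, a connection through the ODBC connector might look like this sketch; it assumes a DSN named azuredw has already been configured with the Microsoft ODBC driver on the CAS hosts, all values are placeholders, and the exact connection options for your release are worth checking in the ODBC data connector documentation:

/* a sketch of a CASLIB through the SAS Data Connector to ODBC (placeholder DSN) */
caslib azdw datasource=(
   srctype="odbc",
   conopts="DSN=azuredw",
   username="casuser",
   password="XXXXXXXX"
);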

NoSQL Databases

A non-relational or NoSQL database is any database not conforming to the relational database model. These databases are more easily scalable to a cluster of machines. NoSQL databases are a more natural fit for the cloud because the loose dependencies make the data easier to distribute and scale. The different NoSQL databases are designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. If you want a brief overview of these database structures, I cover them in my AWS database blog.

For Microsoft Azure, CosmosDB is the option available for NoSQL databases. CosmosDB is multi-model, meaning you can build out your databases to fit the NoSQL model you prefer. Use the SAS Data Connector to ODBC to interact with your data in Azure CosmosDB.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Azure offers HDInsight as its Hadoop-as-a-service offering. Azure HDInsight supports both the SAS Data Connector to Hadoop and the SAS Data Connect Accelerator for Hadoop.
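For reference, a CAS connection to Hadoop generally follows the sketch below; the server name and paths are hypothetical, and it assumes the Hadoop client JARs and configuration files have already been staged at the paths you point to:

/* a sketch of a CASLIB using the SAS Data Connector to Hadoop (placeholder values) */
caslib hdi datasource=(
   srctype="hadoop",
   server="myhdinsight-head.example.com",
   username="casuser",
   hadoopjarpath="/opt/sas/hadoopjars",
   hadoopconfigdir="/opt/sas/hadoopconfig"
);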

Finally

It is important to think about the use case for your database and the type of data you plan to store before you select an Azure database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases, remember that you will be charged for the storage you use and for the data that you move out of the database. Performing analysis and reporting on your data may require data transfer, so be aware of these costs and think about how you can lower them by keeping frequently accessed data cached somewhere or on-premise. Another strategy I've seen becoming more popular is taking advantage of the SAS Micro Analytics Service to move the models you have built to run in the cloud provider where your data is stored. Data transfer is cheaper if that data moves between cloud services instead of outside of the cloud provider. Micro Analytics Service allows you to score the data in place, without movement from a cloud provider and without having to install SAS.

Additional Resources
1. Support for Databases in SAS® Viya® 3.4
2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Microsoft Azure was published on SAS Users.