3月 032018

Report data shared by educational institutions, government agencies, healthcare organizations, and human resource departments can contain sensitive or confidential data. Data in such reports are suppressed selectively to protect the identities of individuals or to prevent the report’s audience from easily inferring individual values. The Data Suppression feature in SAS Visual Analytics 8.2 is easy to use when you need to selectively suppress aggregated data values in your reports.

All you need to do is create a calculated data item for Data Suppression and apply it to a report object such as a list table or a crosstab.  You could apply Data Suppression to a variety of report objects, but suppressing data for cells in either list tables or crosstabs is a common practice.

Here are a couple of examples where data suppression is applicable:

  • Universities and schools that release data on their students often use a cell threshold value in their report data to protect the risk of identifying specific students when the number of students in a class falls below the defined threshold value, and individual values for test scores or other criteria such as race can be easily determined by looking at the data.
  • In official reports with federal statistics that are provided by the Centers for Disease Control and Prevention in the U.S., certain data cells in the reports are suppressed to protect the confidentiality of patients and eliminate the risk of disclosing their identity. Patient data in such reports are suppressed by using a cell suppression threshold value of 16.

Before we jump into data suppression in SAS Visual Analytics, a quick note on understanding two kinds of data suppression.

Data Suppression by Using the withComplement Option

When a calculated data item is created for Data Suppression, SAS Visual Analytics applies the  withComplement option by default, and an additional complementary value is hidden randomly (by displaying an asterisk)  when you suppress the data for a single aggregated value.  This is done to prevent easy inference of the data values by viewing the total, subtotals, or other cell values.

Data Suppression by Using the withoutComplement Setting

If a calculated data item for Data Suppression is created by using the withoutComplement option, SAS Visual Analytics suppresses (by using an asterisk) only the aggregated data values that you chose to suppress, and no other additional complementary values are hidden with asterisks.

Let’s Do It

As an instructional exercise for data suppression, I chose a small subset of the data for high school students and their SAT test scores in the state of the North Carolina. I added three list tables to my report. My first list table has no data suppression (so we can see the data that I intend to suppress). My second list table will have data suppression without complementary values, and my third list table will have data suppression with complementary values.

In the first list table, the TESTED column shows the number of students that took the SAT test in each high school. If 14 or fewer than 14 students took the SAT test, I want to suppress the display of the number of students in the TESTED column for that high school.

Create the Calculated Data Item for Data Suppression Without Complementary Values

1.  In SAS Visual Analytics, I click on Data, right click on TESTED (the measure upon which my calculated item for data suppression will be created), and select New calculation.

2.  In the Create Calculation dialog, I change the Type to Suppression. By default, SAS Visual Analytics fills in the default value of 5 observations for the Suppress data if count less than: parameter field. I plan to change this value and the condition; for now, I keep the default value so I click OK.

Edit the Calculated Data Item for Data Suppression Without Complementary Values

1.  To edit the calculated item that I just created, I click on Data, right click on the calculated item I just created (TESTED (Data suppression) 1 and choose Edit.

2.  In the Visual mode, I see the calculated item for data suppression.

3.  I click on Text because I want to suppress low values for the TESTED column (which is the number of students that took the test) to 14 and below, and not the number of observations (Frequency) that are suppressed by default. So I edited the condition for data suppression and saved it:

4.   My second list table already has roles assigned to it. Now I added the newly created calculated data item: TESTED (Data Suppression) 1.
This List Table now shows asterisks for values suppressed in the TESTED column for any high school where 14 or fewer than 14 students took the SAT test.

All values for the TESTED measure upon which my condition is based are replaced with asterisk characters. It is important to note that although the suppressed values for TESTED are hidden from view with asterisks, they are still present in the data source. Therefore, I should hide the original measure (in this case, TESTED) from view in the report to prevent the accidental use of the TESTED measure for other report objects in the same report – (we’ll take a quick look at that at the end).

Create the Calculated Data Item for Data Suppression With Complementary Value

1.  I click on Data, right click on TESTED, and select New calculation.

2.  In the Create Calculation dialog, I change the Type to Suppression and click OK to save this new calculated item.

Edit the Calculated Data Item for Data Suppression With Calculated Value Suppression

1.  To edit the calculated item that I just created, I right click on the calculated item for data suppression and choose Edit.

2.  In the Edit Calculated Item dialog, I click Text to see the text version of the calculated data item, and I edited the condition to ensure that data is suppressed for high schools where the total number of students tested equals 13.

My List Table now shows values suppressed in the TESTED column for the high school where 13 students took the SAT test. In addition, another value in the TESTED column is also suppressed randomly by SAS Visual Analytics – in this case, it was for Creswell High School. The random suppression of another value is done to prevent your audience from looking at the Totals column and guessing the number of students that took the SAT test in each high school.

Be sure to follow the three best practices that are described for data suppression in the SAS Visual Analytics 8.2 documentation:

The TESTED measure does not display anymore.

For details on how to show or hide data items, see Is it sensitive? Mask it with data suppression was published on SAS Users.

3月 012018

Let’s say that you are administering a SAS 9.4 environment that is working just fine. You’ve checked that your full backups are indeed happening and you’ve even tried restoring from one of your backups. You are prepared for anything, right? Well, I’d like to propose a scenario to you. You probably have users responsible for creating reports, maybe even very important reports. What if something happened to one of these reports? Perhaps the user wants to revert to an earlier version. Perhaps the report was accidentally deleted or even corrupted, what then? Restoring a full backup in this situation might help this one user but would likely inconvenience most other users. With a little more preparation, you could “magically” restore a single report if needed. Here’s what you need to do: create a backup of only these critical reports using the promotion tools.

The promotion tools include:

  • the Export SAS Package Wizard and the Import SAS Package Wizard available in SAS Management Console, SAS Data Integration Studio, and SAS OLAP Cube Studio.
  • the batch export tool and the batch import tool.

Note: Starting with the third maintenance of SAS 9.4, you can use the -disableX11 option to run the batch import and batch export tools on UNIX without setting the DISPLAY variable.

You can use the promotion tools on almost anything found in the SAS Folder tree, especially if you use SAS Management Console. If you use the wizards in SAS Data Integration Studio or SAS OLAP Cube Studio, those applications only allow you to access and export/import objects that pertain to that application, a subset of what is available in SAS Management Console.

You may be thinking that using an interactive wizard is not really the answer you are looking for and you may be right. The batch tools are a great solution if you want to schedule the exporting of some objects on a regular basis. If you are unfamiliar with the promotion tools, I would suggest you start with the interactive wizards. You will find that the log produced by the wizard includes the equivalent command line you would use. It’s a nice way to explore how to invoke the batch tools.

Creating the Export Package

How to invoke the Export SAS Package Wizard:

1.  Right-click on a folder or object in the SAS Folders tree and select Export SAS Package.

Selectively backing up metadata

2.  Enter the location and name of the package file to be created and set options as appropriate.

You can opt to Include dependent objects when retrieving initial collection of objects here or you can select specific dependent objects on the next screen.

Filtering offers some very interesting ways of selecting objects including:

  • By object name
  • By object type
  • By when objects were created
  • By when objects were last modified

3.  Select the objects to export. If you started the process with a folder, you will be presented with the folder and all of its contents selected by default. You can deselect specific objects as you like.

In this example, we only want the Marketing folder and its contents. Deselect the other folders. You want to be careful to not create a package file that is too big.

You can click on individual objects and explore what dependencies the object has, what other metadata objects use the current object, options and properties for the object.

In this example, the Marketing Unit Report is dependent on the MEGACORP table whose metadata is found in the /Shared Data/LASR Data folder. When you import this report, you will need to associate the report with the same or similar table in order for the report to be fully functional.

If you had selected Include dependent objects when retrieving initial collection of objects on the previous screen, all of the dependent objects would be listed and be selected for export by default.

Bonus things you get by default in the export package include:

  • Permissions set directly on the objects
  • For most object types, the export tools include both metadata and the associated physical content. For example, with reports you get both the metadata and associated report XML. For a complete list of physical content promoted with metadata objects, refer to:

    5.  When the export process is complete (hopefully without errors) review the log.

    At the top of the log, you can see the location of the log file in case you want to refer to it later.

    If you scroll to the end of the log, you’ll find the command line to invoke the batch export tool to create the same package.

    Considerations for Exporting

    Importing to the Rescue

    Let’s talk about what happens if and when you actually need to import some or all of the objects in a package file.
    Let’s take a look at what we would need to do to replace an accidentally deleted report, Marketing Unit Report.

    How to invoke the Import SAS Package Wizard:

    5.  Right-click on the same folder you started the export, SAS Folders folder in our example, and select Import SAS Package. It is important to initiate the import from the same folder you started the export if you want to end up with the same folder structure.

    6.  If needed, use the Browse functionality to locate the correct package file.

    Include access controls

    By default, Include access controls is not selected. This option will import permission settings directly applied to the objects in the package. It will not import any permissions if there were only inherited permissions on the object in the source environment.

    Since we are bringing the report back into the folder it originally came from, it makes sense to also include direct permissions, if there were any.

    If you do not check the Include access controls box and there are in face some direct permissions on objects being imported, you will get this warning later in the wizard:

    Select objects to import

    If you’re not sure whether to select to import All objects or New objects only, you can always start with all objects. You can use the Back buttons in the wizard to go back to previous prompts and change selections, at least before you kick off the actual import process.

    7.  If you selected import all objects on the first screen, you will see a listing of all objects. Each object will have an icon indicating if the object currently exists where you are doing the import or not. The red exclamation mark indicates the object currently exists and doing the import of this object will overwrite the current object with the copy from the package. The asterisk icon indicates that the object does not currently exist and will be created by the import process.

    In our example, the Marketing Unit Report does not currently exist in the Marketing folder but is in the package file so it is labeled with an asterisk. The other two reports are both in the folder and the package file so they are labeled with red exclamation marks.

    You’ll want to make the appropriate selections here. If you want all of the contents of the package to be written to the Marketing folder, overwriting the first two reports and adding the Marketing Unit Report, leave all objects selected. If one of the reports had become corrupted, you could use this method to overwrite the current copy with the version stored in the package file.

    If you just want to replace the missing Marketing Unit Report, make sure only that object is selected as below:

    By default, objects are imported into the same folder structure they were in when the export package was created.

    8.  Part of the import process is to establish associations between the objects you are importing and metadata not included in the package. You are first presented with a list of the metadata values you will need to select.

    9.  Set the target value(s) as needed.

    In our example, we definitely want the report to use the same table it used originally.
    If we were moving objects to a new folder or a new environment, you might want to associate the report with a different table.

    If you use the batch import tool, changing these associations would be done in a substitution properties file.

    10.  Review the import summary and initiate the import process.

    11.  Hopefully, the process completes without errors and you can review the log.

    12.  Finish things off by testing the content you imported. In this case, we would log in to SAS Visual Analytics and view the Marketing Unit Report.

    Considerations for Importing

    • If you initiated the export from the SAS Folders folder and try to import the package from another folder, Marketing for example, the wizard will recreate everything in the package, including a new Marketing subfolder which is probably not what you intended.

    Notice the new Marketing folder inside the current Marketing folder. In addition, all three reports are considered new since the new Marketing subfolder does not currently exist.

    • The account you use to do the import should have enough access to metadata and the operating system.

    Next Steps

    • Decide what you want to export, how often, and how long you want to keep a specific package file.
    • Once you’ve gotten comfortable with the wizards and you want to schedule an export (or several), you should try out the batch export and import tools. When you name the export package, you can consider customizing the package name to include the date to avoid overwriting the same package file each time.

    Review the documentation on both the wizards and batch tools in the Admin Notebook: Making the case for selectively backing up metadata was published on SAS Users.

2月 282018

Goutam Chakraborty is a busy man. In addition to serving as SAS professor of marketing analytics at Oklahoma State University, Dr. Chakraborty is the director and founder of the SAS and Oklahoma State University MS in Business Analytics and an award winning author and professor. He teaches courses in such areas as business analytics, marketing analytics, data mining, marketing research, and web strategy, and has been preparing students to enter the workforce with advanced skills in marketing and analytics for more than 20 years. Throw in the regular consulting engagements he has with some of the world's top companies and it makes you wonder if Dr. Chakraborty has time to add anything else to his already full plate. Well, this year at least, you add SAS Global Forum 2018 Chair to the list - likely at the expense of a good night's sleep.

As the largest gathering of SAS users in the world, SAS Global Forum will attract more than 5,000 SAS professionals for several days of learning and networking. Recently, I sat down with Dr. Chakraborty to talk with him a bit about this year's conference, which takes place April 8-11, 2018 in Denver. I left excited about SAS Global Forum 2018 and, at the expense of losing credibility as a fair and balanced reporter, convinced that Dr. Chakraborty is one of the nicest individuals you'll ever meet.

Larry LaRusso: I know you've been preparing to chair SAS Global Forum 2018 for more than three years, but now that the event is only a few weeks away, how excited are you to kick this thing off?
Goutam Chakraborty: More excited than you know Larry. I've participated in many SAS Global Forums, but serving as chair gives you the ability to influence every aspect of the event, from speaker and content selection to charity-related events and networking opportunities. It's been a wonderful opportunity to give back to the SAS user community, one I'll never forget.

LL: What excites you most about this year's event?
GC: There are so many new things about this year's conference, all geared toward providing an enriching experience for all SAS users. I'll mention three that immediately come to mind.

One thing we've tried to do well this year is connect industry with academics. While we'll have a full program of events and talks specifically geared toward students and professors, this year we'll emphasize partnerships with industries in a new way. I might be most excited about Sunday's Talent Connection. This event brings students and SAS professionals together to network, discuss career opportunities and share knowledge, research and partnership opportunities that might exist with each other. I anticipate it being a great success for both students and industry looking to connect with young analytical talent.

Another strong focus for us is career development and learning for SAS users at all levels. We'll have a full menu of traditional training and certification opportunities for data scientists, business and data analysts and SAS programmers, but we're also providing opportunities to build on soft-skills development, such as networking, analytical story-telling and much more. We'll also have an on-site Learning Lab, available for several hours each day, where users can explore more than 25 e-learning courses for free.

Finally, I'll mention our volunteer opportunities. We'll have several ways for users to give back, but I'm particularly excited about our STEM-related charity event. During meals and evening networking receptions, both Monday and Tuesday, attendees will have the opportunity to work with RAFT Colorado (Resource Area For Teaching), and build STEM-inspired teaching kits for local teachers to use in their classrooms. Each kit will repurpose educational items RAFT has collected and make them available to teachers as creative tools for teaching STEM – inspiring the next generation of thinkers, innovators, problem-solvers and creators. It's an extraordinary opportunity to impact local area children.

LL: Speaking of extraordinary, this year's conference theme is "Inspire the Extraordinary." What does that theme mean to you?
GC: It means never accept "good enough." I always tell my students to push for something above and beyond what's expected of them, to be extra-ordinary. We expect the same for this year's SAS Global Forum. Knowing the event like I do, I feel confident we're going to deliver a SAS Global Forum that surprises and delights our users in a way they didn't expect.

LL: We all know that one of the best things about SAS Global Forum is its incredible content. What can you tell us about the content you’re putting together for this year’s event?
GC: Thanks to tons of hard work and research from a lot of SAS users, we've selected fantastic content from renowned speakers from across the world. Perhaps the best part of our content planning this year is the variety. Topics range from deep hard-core programming to high-level strategic thinking about data and analytics. From sessions that will help you to develop yourself personally as a better human-being to learning about optimizing Monday night NFL schedule for best viewership to thinking strategically about data as a currency – there is something of value for everyone.

SAS Global Forum 2018LL: SAS Global Forum is likely to attract more than 5,000 data scientists, analytics professionals and business leaders. Every year it amazes me how many of those users are attending SAS Global Forum for the first time. What advice would you give first-timers?
GC: First piece of advice: Have a plan and build a personalized agenda so you don’t get overwhelmed by the large number of available sessions. Second, take every opportunity to engage and network with other attendees. One of the best things about this conference is how willing veteran SAS users (regulars at this conference) are to help and welcome newcomers. So, take advantage of it. If you are sitting down for breakfast or lunch, take the time to introduce yourself to people around you. You may be surprised where it could lead. I'd also encourage attendees to take time to visit the Quad. The Quad is a casual and interactive space where attendees can network with other SAS enthusiasts, view demos and visit with experts from SAS and our sponsors. And, last but not the least, have some fun! Attend the social events we have planned, especially the Kick Back Party at Mile High Stadium on Tuesday evening.

LL: As an academician, I know you’re passionate about learning? What additional learning opportunities, beyond the session talks, are available to attendees?
GC: There are so many learning opportunities at SAS Global Forum that it is mind-numbing. Of course, the 20 and 50 minute session talks are the main modes of content delivery, but there are also e-posters, table talks and super demos in the Quad. We'll also have dozens of pre-conference tutorials, post-conference training, and all the activity in the Learning Labs, including hands-on workshops and the ability to take individual e-learning courses.

LL: Given your personal interests, I know one of your goals for this year’s conference is to increase participation in the event for students and professors. Can you tell me a little more about the special events you have planned for this audience?
GC: For starters, SAS Global Forum is completely “free” for students! As long as you are a full-time enrolled student of an accredited, degree-granting academic institution you can attend free of charge. There are credit hour minimums that must be reached to be eligible, so I'd encourage students to visit the website for complete details.

Programmatically, we have the Sunday afternoon sessions entirely dedicated to academics. We have a fantastic academic keynote speaker, Temple Grandin from Colorado State University, and special training sessions for professors interested in teaching analytics at their universities. For students, we offer a number of opportunities to network and special courses, such as how to best use social media for networking while looking for a job, to help them make a successful transition from student to working professional. We also encourage students, and really anyone who has an interest, to attend the presentations students make as winners of the SAS Global Forum Student Symposium Student Symposium. Though closed now, the Symposium provides an opportunity for teams of two to four students and a faculty adviser to showcase their skills and compete with other teams in the application of SAS Analytics in solving a big data problem. This year, more than 60 teams entered; the top eight will present 20-minute talks during the event.

LL: Dr. Chakraborty, I've taken a lot of your time, but is there anything else you'd like to share with our readers?
GC: Actually, I'd like to thank the many volunteers who have helped put this conference together. From serving on our SAS Global Users Group Executive Board to helping evaluate and select talks, to serving in our Presenter Mentor Program, hundreds of users have invested their time to make this conference the best one yet. SAS Global Forum is truly a user's conference and we depend on the user community to plan, promote and execute so many tasks and activities related to the event. Though I can't call them out by name, I would be remiss if I didn't mention their contributions and take a minute to thank them.

LL: Well let's hope they're reading! Dr. Chakraborty, I want to thank you again for your time. I look forward to seeing you in Denver in April.

Visit the SAS Global Forum 2018 website for more information and to register. Conference Proceedings will be available shortly before the event begins.

Continue the conversation: Join our live Tweetchat, Wednesday, March 7, 2018

How are you inspiring the extraordinary?

The next analytics extraordinary use case is just waiting to be discovered. We believe that in the hands of lifelong learners, the future of data is unlimited, especially when education and business join forces. That is why we are warming up to SAS Global Forum 2018 in Denver with a tweetchat on Wednesday 7th March (simply search #SASchat or #SASGF). We kick off at 6pm CET, 5pm UK, noon ET and 9am Pacific. Will you join us? The discussion will kick off with the following questions, posed to our expert panel:

  • Why is there more interest in lifelong learning now?
  • How does lifelong learning contribute to the analytics economy?
  • What are your favorite examples of analytics in the not-for-profit sector?
  • How is the education sector influencing the development of citizen data scientists?
  • What trends do you see in the consumption of analytics?

A conversation with SAS Global Forum 2018 Chair Goutam Chakraborty was published on SAS Users.

2月 212018

Data protection with GDPRHere are some new tips for masking. The new EU General Data Protection Regulation (GDPR) requires your company to implement (quote) all necessary technical and organizational measures and to take into consideration the available technology at the time of the processing and technological developments. So, how can you comply with this requirement in the real world? In Part 1, we anonymized field content or replaced it with aliases. That can be sufficient, but it doesn’t have to be. That’s why we’ll cover beta functions in this article (the ideal solution for pseudonymization), personal data that has slipped through the cracks, and the exciting question of ...

Read part 1 of this series: Pseudonymagical: masking data to get up to speed with GDPR

How random can your birth be?

The exact date of your birth is important to you, naturally. The analytics experts working with your data, on the other hand, aren’t looking to send you birthday wishes anyway (missing opt-in?!). What they’re interested in is your approximate age, maybe even just the decade. The SQL code from Part 1 moves the date of birth randomly plus or minus five days. Someone who knows your birth date would therefore be unable to locate your records within a stolen database. Privacy risk abated!

But even that should be verified… with respect to providing proof of “appropriate measures,” in other words, cluster size. In our example of around 5,000 VIP customers, there is only one who is in their 20’s and has a postal code beginning with the numeral 1. The time required to indirectly identify the individual (Recital 21, GDPR) could be rather low here. In the worst case scenario, legally too low.

Enter the beta function: the ideal solution for pseudonymization

Luckily, Recital 29 of the General Data Protection Regulation tells us how to handle this problem. The information required to pinpoint an individual is simply stored separately. That can be accomplished using a key or a mathematical function, in other words a macro, with a secret key that I only use – but don’t know about the math hidden behind it. The law doesn’t tell us how tricky this logic has to be, though. This so-called beta function should satisfy two additional conditions from an analytical standpoint:

  • It must be invertible (a hash is not, for instance).
  • The result of the masking should be monotonic, which means: high original value = high new value (encryption doesn’t do this).

Why? Well, we don’t want to affect the analytic modelling too much - ideally, the function would output something linear or slightly exponential… Here is a √2 example I’ve kept simple:


Mathematically, this is a coordinate transformation - or you can also think of it in terms of Star Trek: people are being beamed to an unfamiliar planet. There is a different gravity field than the earth there (a different coordinate system), but it applies to everyone equally — which means that lightweight visitors on the planet can still jump higher there than their heavyweight colleagues. The same applies accordingly to age etc.


When using the birth date or the age, I, as an analytics expert, have no knowledge of how this beaming works technically, but trust that when I’m developing models (and later when scoring) them, that nothing about the behavior has changed. By the way, the computer and correlation don’t care anyway - neither have any concept of age. (It just feels a bit strange for humans.)

We don’t lose the “true” age. It can be re-calculated using another beta function. With what is known as the inverse, but it’s available only to authorized employees - for instance to fraud or legal people during data protection lawsuits. In these cases, your customer can safely be beamed back to earth, so to speak.

A complaint from my office mate

“But how do I explain to the boss my model behavior for these 300-year-olds?!” ... Well in this era of machine learning, neural networks are gaining in popularity and are as selective as they are indescribable. On our side, the math behind it is at least deterministic and explainable; good to know that this key code is no longer stored on your PC, not glued to its data source and target, but remote and safe – because of modern data protection to protect you and the data. And that’s a good thing.

Final aspect: the data for relevant columns has now been subjected to smart masking, the logic is in a central repository, and it’s working in secret. But what about those seemingly harmless fields way in the back, mostly empty and irrelevant, which then in the form of a sales memo or notice suddenly reveal the name of the wife, the second email address, or the former employer? The author who created them thought it was extremely practical, since they didn’t find anywhere else in the contract template where they could enter and save the information.


SAS Data Quality has pre-configured, transparent sets of rules that you can tweak as necessary to detect many of these types of cases using heuristics. That’s indispensable because if I don’t know about it, I can’t protect against it. (If I forget about the tiny basement window when installing the security system, I can be sure that the robbers won’t cooperate by breaking down the front door).

That is a prerequisite for an inventory of the data warehouse, the estimate of the GDPR implementation expense — and here an additional safeguard. Because in the code above, a firewall filter is applied to the data: if the name of a human being slips through the cracks, then only asterisks are displayed when it is output. The field “Note” is always replaced by the description of the category, such as “This is where a telephone number is hidden. After approval by the data protection officer, you may read it – but not for now.”

Are you ready for the GDPR? Learn how your peers are preparing in this global survey report.

Disclaimer: The author of this blog is not an attorney. None of the statements in this article can be construed as legal advice nor can they serve as a substitute for professional legal consultation. All code samples are for illustrative purposes only.

Beam your customers into invisibility: a data protection masked ball to get you up to speed with the GDPR was published on SAS Users.

2月 212018

Masking data for GDPRYou work with data. Data about your customers. It's likely that your customers' identity could be determined from the data you’ve collected. Starting in May 2018, a new data protection law will be in effect. This means you’re going to have to document which technical measures you’ve implemented to prevent your customers from being identified. That’s tricky, but isn’t everything already pseudonymized or anonymous or something? Won’t the IT department just take care of it? No. They won’t. This blog post gives you a detailed breakdown of what’s possible and where standard software can help with masking data for GDPR.

Read part 2 of this series: Beam your customers into invisibility: a data protection masked ball to get you up to speed with the GDPR

In the past, compliance has been relatively easy. The data protection officer was sure to take care of everything, documenting the steps taken together with IT and the compliance team. It felt like the powerful data protection law was there to protect people in charge - like you – from the data’s owners out there. Too abstract for you? Here’s an analogy:

Let’s say my car’s inspection is up in May. I have placed my trust in a copy of the traffic regulations that rests in my glove box. I tell the inspector, this is how I’ve always done things! And my perfectly maintained car has been running great for years. Wait a minute, the new mechanic seems to have missed something last time... and now you want my documentation?! Huh? What documentation?

Data protection by design and by default (Art. 25 GDPR)

But let’s go back to (your) data, which you obtained from the data warehouse (DWH), received from generous colleagues’ collections, or harvested fresh from the data lake ... you know, the stuff that self-service big data analytics is made of. Mix it all together, and the computer is able to guess the customer’s behavior. Lawyers are less inclined to wax poetic here, referring instead to “profiling” (GDPR Article 4), which can quickly extend beyond the initial purpose of the processing. The bad part? If the customer doesn’t reasonably expect this kind of processing of their information, they can submit a complaint and even demand their data be deleted at once (Article 22 and Recital 71 GDPR).

What to do? Well, get rid of personally identifiable information right from the start! So, just select fewer data points? Yes. But if you need them for your analytics, write down in advance exactly how the data will be processed ... that’ll give your data scientist a headache. Even fragments with very minimal data could still be traced back to single out the “wife of a dentist, aged 30 to 40, living in Smallsville.” So it’s better to properly “pseudonymize” or anonymize from the start.

In other words, you have to replace all dates with asterisks or a random number or encrypt them or hash them - then, in the best-case scenario, the record is anonymous. But this procedure, unfortunately, also makes it useless for analysis in most cases. Or even worse, completely wrong with respect to segmentation, model development, or scoring. More about that later. Here’s an example of “common” customer data:

Let’s say we have a file that contains a great deal of personally identifiable information. It could even be lurking in comment fields or notes - without ever being listed in a data dictionary or the records of processing activities. But it’s hiding in there, in unassuming properties, selected out of curiosity or by mistake, and thus stored. And simply storing it brings the GDPR into play. Oops!

Totally anonymous: ensure compliance by making data useless?

So how exactly do you make data unidentifiable without completely destroying all of its value? When masking, you should deploy standard functions that use quality-controlled logic in such a way that users who processes the data are unable to view the algorithmn. In SQL, it may look something like this:

     SYSCAT.DM.MASK ( ‘ENCRYPT’, PUT(A.CustomerNo , 8.), ‘alg’, ‘AES’, ‘key’, ‘12345)
          AS CustomerNr_encrypt,
     SYSCAT.DM.MASK ( ‘HASH’, A.IBAN , ‘alg’, ‘SHA256’, ‘key’, ‘12345)
          AS IBAN_hash,
     SYSCAT.DM.MASK ( ‘TRANC’,A.”IBAN” , ‘FROM’, ‘1234567890’, ‘TO’, ‘XXXXXXXXXX’,
               ‘START’, 10 , ‘LENGTH’, 9 )
          AS IBAN_tranc,
     PUT(SYSCAT.DM.MASK ( ‘RANDATE’, A.Bithdate, ‘VARY’, 5, ‘UNITS’, ‘DAY), DDMMYYP10.)
          AS Birthdate,
     SYSCAT.DM.MASK ( ‘RANDOM’, A.Balance, ‘VARY’, 100 )
          AS Balance,
                               A.COMMENTFIELD, ‘PDP - Personal Data (Core)’, ‘Individual’,’DEDEU’ ) ne “)
               THEN* * *’ ELSE A.COMMENTFIELD END)
          AS Commentfield_without_name,

The results look appealing.

But the following methodological problem quickly arises. Who in the corporate data flow carries out this step, and where should they store the results? Depending on the type of processing, the technician or analytics expert may not be permitted to even view the data in the first place, making it impossible for them to apply masking on top! How about IT (or the application developers) doing a generic anonymization of the entire DWH in one go? Well, that’s a sure way to secure the data, and a sure way to make it useless for a lot of other things, too!

Who’s masking who?

The solution: no one touches those official tables (see note at bottom). The user just no longer (physically) accesses them directly, but rather via dynamic views, which only read specified data. These views handle masking “on-the-fly,” in real time or from the cache – without destroying the original data. In the simplest cases (as in our example code above), the content of the respective fields is modified “for all selectors.” Or depending on a control table, they’re flipped off like a switch the instant a user communicates their opt-out.

The SAS Federation Server hosts these types of views, gently inserting itself between business users, analytics experts, and internal consumers of reports. Once logged in, the server knows the user’s groups, roles, and permissions. This can be elegantly leveraged in the design of the views. The very same view can allow that team with the VIP flag to also see the balance, whereas the fraud team sees everything, while the rest get harmless asterisks in place of sensitive information. All from the same physical data, controlled through a central platform.

New on the inside, proven on the outside: controlled protection for switching on and off

Access to these types of centrally administered views is accomplished via ODBC / JDBC / API. Or, for SAS users, via the trusty libname statement. There, instead of “ORACLE,” “FEDSVR” is your keyword. Everything else looks just like it always has. The tables (some of which are not actual tables) appear, the code is ready to run. Such libref can, of course, also be pre-assigned in the metadata context.

LIBNAME mydwh FEDSVR DSN=dwh1 SERVER=”” PORT=24141 SCHEMA=Team_Alpha;

A double benefit: the IT team can take its time rebuilding the physical layer (DBMS portings, modifications to the DDL, switching to a different DB, etc.) without the phones ringing off the hook with important departments calling to complain. Plus, every access can be logged. The SAS Federation Server can optionally record queries as well, regardless of whatever sneaky macro data step was used - everything is processed and logged. And users cannot get around it by deleting their SAS logs.

This activity should not be misconstrued as surveillance. The GDPR is what demands proof of who, how, where, and when ... and for data breaches, within 72 hours. Like when someone at the company burns a CD with sensitive tax information or commits other internal fraud. And as with the deletion approach, it’s already a big step in the right direction to be able to say, “yes, that’s how it would work — should we turn it on?”

Summary of the first part

The General Data Protection Regulation requires your company to implement the “appropriate technical and organisational measures” and “taking into consideration the available technology at the time of the processing and technological developments.” This article has given you some suggestions for things you can discuss with colleagues. And what’s more, the software is already out there - there’s no need to patch together a solution yourself. Instead, you can use that valuable time to generate the documentation for compliance instead.

P.S. There’s no time to lose to comply with the GDPR. Learn how to get ready in this global survey report.

Attachment: Legal texts

Profiling GDPR Art. 4, Par. 4:
‘profiling’ means any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyze or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behavior, location or movements;

The GDPR is more specific with anonymous pseudonymization (Recital 26 and 29):
To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. The principles of data protection should therefore not apply to anonymous information, namely (...) to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.

In order to create incentives to apply pseudonymization when processing personal data, measures of pseudonymization should, whilst allowing general analysis, be possible within the same controller when that controller has taken technical and organisational measures necessary to ensure, for the processing concerned, that this Regulation is implemented, and that additional information for attributing the personal data to a specific data subject is kept separately. The controller processing the personal data should indicate the authorised persons within the same controller.

A final note: To ensure the separate “right to be forgotten” in the data warehouse (and countless source systems) as per Art. 17 GDPR, special measures must be taken. But this is not what we’re talking about above — deleted records are “gone,” and can no longer be processed or pseudonymized by means of profiling. Note: In the event that they are not intended (or permitted) to disappear from the relevant database immediately, approaches such as the one described could offer a temporary solution.

Disclaimer: The author of this blog is not an attorney. None of the statements in this article can be construed as legal advice nor can they serve as a substitute for professional legal consultation. All code samples are for illustrative purposes only.

Pseudonymagical: masking data to get up to speed with GDPR was published on SAS Users.

2月 202018

When using conventional methods to access and analyze data sets from Teradata tables, SAS brings all the rows from a Teradata table to SAS Workspace Server. As the number of rows in the table grows over time, it adds to the network latency to fetch the data from a database management system (DBMS) and move it to SAS Workspace Server. Considering big data, SAS Workspace Server may not have enough capacity to hold all the rows from a Teradata table.

SAS In-Database processing can help solve the problem of returning too much data from the database. SAS In-Database processing allows you to perform data operations inside the DBMS and use the distributed processing over multiple Access Module Processors (AMPs). Select SAS procedures take advantage of Teradata SQL functionality, and in some cases leverage SAS functions deployed inside the DBMS. The goal of in-database processing is to reduce the I/O required to transfer the data from Teradata to SAS.

SAS® In-Database Processing in Teradata

Using SAS In-Database processing, you can run scoring models, some SAS procedures, DS2 threaded programs, and formatted SQL queries inside the Teradata database.

The list of SAS In-Database features supported for Teradata include:

  • Format publishing and SAS_PUT()function
  • Scoring Models
  • DS2 Threaded programs
  • Data quality operations
  • Extract and transform data

SAS In-Database Deployment Package for Teradata

The in-database deployment package for Teradata includes the following:

  • The SAS formats library, accelterafmt-######.rpm, installs a SAS formats library on the Teradata server. By having a SAS formats library on your Teradata system, you can publish SAS formats in Teradata, which enables you to process SAS statements with SAS formats in the Teradata database. This also enables you to publish SAS PUT functions to Teradata as a SAS_PUT() function. This software can be found in your SAS Install folder under /SAS-install-directory/SASFormatsLibraryforTeradata/3.1/TeradataonLinux/.
  • The SAS® Embedded Process package, sepcoretera-######.rpm, installs SAS Embedded Process in the Teradata database. This is the core package of in-database components. This software can be found in your software depot under folder /depot/standalone_installs/SAS_Core_Embedded_Process_Package_for_Teradata/13_0/Teradata_on_Linux.
  • The SASEPFUNC package, sasepfunc-#####.x86_64.tar.gz, installs SAS Embedded Process support functions on Teradata. SAS Embedded Process support functions are Teradata stored procedures that generate SQL to interface with SAS Embedded Process. The script from the package creates a Teradata database named SAS_SYSFNLIB with a list of tables, views, functions, and procedures to support SAS Embedded Process. The same script also adds a list of functions in the TD_SYSFNLIB database. The package can be obtained from the Teradata support group.

The following figure shows the list of objects from the SAS_SYSFNLIB database to support SAS Embedded Process:

The following shows the list of objects from the TD_SYSFNLIB database to support SAS Embedded Process:

  • The SAS® Quality Knowledge Base package, sasqkb_ci-27.#####.noarch.rpm, installs SAS Quality Knowledge Base on the Teradata server. This is an optional package to SAS Embedded Process. This package is needed along with SAS® Quality Accelerator, if you are planning to run data cleansing operations in the Teradata database. The package can be downloaded from the SAS support site.
  • The SAS Quality Accelerator package. There are two scripts ( and located under SAS-Install-directory to install the data quality accelerator at Teradata. This is an optional package to SAS Embedded Process, and needed only if you are planning to run data cleansing operations in Teradata. The software install files can be found in the folder /SAS-install-directory/SASDataQualityAcceleratorforTeradata/9.4/dqacctera/sasmisc/. As a part of script execution, it adds a list of objects (procedures, functions) to the SAS_SYSFNLIB database.

Sample list of data quality related objects from the SAS_SYSFNLIB database.

Examples of running DS2 Code to perform data quality, data extract, and transform operations in Teradata:

The following example describes the execution of DS2 code by using SAS Data Quality Accelerator and SAS Quality Knowledge Base to match and extract a data set from the Teradata database. The log shows that both Threads program and Data program ran in the Teradata database as in-database program execution.


Stay tuned in for the next part of the SAS In-Database Processing in Teradata blog series. Coming up is one about publishing SAS format in Teradata.

SAS In-Database Processing in Teradata DBMS was published on SAS Users.

2月 172018

Keyboard MacrosIt is not laziness—it is efficiency!!! Programmers are often called lazy; we even call ourselves lazy. But we are not lazy, we are just being efficient. It makes no sense to type the same code over and over again or use more keystrokes than are absolutely necessary.

Keyboard Macros

You might not have heard of keyboard macros. Or, perhaps, you do not know how they could help you. I am very fond of keyboard macros; let me show you why!

In SAS Technical Support, supporting the SAS® Output Delivery System (ODS) and Base SAS® procedures, I often use the same statements to set up test programs. For example, I want any style templates that I create to go into the Work directory. I also use the same data set name all of the time. I have created keyboard macros for the statements, data set names, and options that I use daily.

When I press Ctrl+Alt+w, the following is inserted into my program:

ods path(prepend) work.templat(update);

When I press Ctrl+Alt+p, the following is inserted into my program:


How did I do that? I recorded a keyboard macro that contains the code that I want. Then, I assigned keys that insert the code when I press them.

Here are the steps for recording your very own keyboard macro in the SAS Enhanced Editor:

1.  Select Tools ► Keyboard Macros ► Record New Macro.

2.  Enter the code that you want to be your new keyboard macro. Consider typing slowly because any backspaces that you use are included in the recording.

3.  After you are done entering text, you need to tell SAS to stop recording. Select Tools ►Keyboard Macros ►Stop Recording.

4.  A pop-up dialog box appears that lets you give the new macro a name and assign the keys that you want to be associated with the macro. You can set the key combination that make sense to you. Just make sure that you do not use a combination that is already assigned to another macro.

Now, whenever you need to insert that piece of code, just use the keys that you assigned!

In SAS® Enterprise Guide®, you can find keyboard macros under Program ► Editor Macros, instead of the Tools drop-down menu. The recording and key assignment steps are the same in both applications.

You can also create keyboard macros that perform tasks.

The Macros selection opens a pop-up dialog box that contains a Create button.

Clicking Create opens another dialog box.

With the Categories option set to All, you can see all of the commands that are already available. Moving these over to the Keyboard macro contents section enables you to build a macro that performs a task that you need to accomplish on a regular basis.

For example, I have combined these commands to select a whole block of code, like from the PROC statement down to the RUN statement.

Keyboards macros are available in the Enhanced Editor in Display Manager SAS (DMS) and in SAS Enterprise Guide. They cannot be used with the Program Editor in DMS or in SAS® Studio.

You can export and import keyboard macros. The file created when you export has the .kmf extension. You can find the options for importing and exporting in the Macros dialog box. You can share your keyboard macros with your friends, or just to keep them as a backup copy in case you need to reinstall SAS.

For more information, see the Using "Keyboard Macros" section in "Using the Enhanced Editor."

Function Keys

You have probably used the F8 key to submit your program, or the F4 key to recall your last program. Did you know that you can set or change those instructions?

In the Enhanced Editor, you can get the list of assigned keys by entering keys into the command bar or by selecting Keys under Tools ► Options.

I test a lot, which means that I am routinely clearing the log, the results viewer, and the output window. I have assigned an F key, F12, to clear everything and bring the focus back to the Enhanced Editor (see the commands in the screenshot below). I have to press only one key to clean everything up! I use the F12 key over and over again.

The keys that you assign in DMS are valid from both the Enhanced Editor and the Program Editor.

SAS Enterprise Guide includes a large number of commands by default. A lot of them already have keys assigned, but some do not. You can see the list of the commands and their assigned keys by selecting Enhanced Editor Keys under the Program drop-down menu.

Currently, it is not possible to modify the function keys in SAS Studio. However, a number of keys are already defined that you might find useful. You can see the function key shortcuts by clicking the question mark in the upper right, choosing SAS Studio help, and then selecting the option for Accessibility Features. Here are links to additional resources:

I highly recommend using keyboard macro and function keys. Why type the same thing over and over again? Increase your productivity by handing the repetitive tasks over to SAS.

Efficiency at your fingertips: Keyboard macros and function keys was published on SAS Users.

2月 162018

pricing and promotionThe consumer packaged goods (CPG) and Retail industry are going through a period of significant change. Both retailers and manufacturers are struggling to find growth and improve profitability. One strategy is through consolidation - e.g., Kraft-Heinz, Keurig- Dr Pepper Snapple Group on the manufacturer side, as well as Safeway-Albertsons, Ahold-Delhaize, Walgreens-Rite Aid on the retailer side. The thinking here is that these mergers would lead to large operational efficiencies and focused growth strategies.

Another important lever to drive growth is pricing and promotion. Companies have realized the importance of getting the pricing right and running high-impact promotions in a highly competitive market. As consumer shop multiple channels and new retail formats begin to permeate (e.g., smaller format stores, new entrants such as Aldi and Lidl), the importance of price-promo continues to increase. Pricing and promotion have become the second largest item on CPG manufacturer’s P&L, after cost-of-goods. Similarly for retailers, price-promo decisions have become critical for growth, maybe even survival. This is manifested in the growth in investment focused on pricing and promotion decisions. In some cases this investment could be as much as 20-25% of net revenue of the company.

However, despite the heavy investment in price-promo, the impact of these decisions is declining. A recent IRI study indicated that the price and promo elasticities (response of volume to pricing change) have been steadily declining over the past 3-4 years. Consumers are willing to buy less when faced with decreases in “regular or base” price as well as promoted price.  The study indicated that the “lift” from promotions had decreased by about 1,000 basis points over the past four years.  There is, therefore, an immediate need to manage price and promotion decisions in a more creative and impactful manner.

Three areas of improvement

What does this mean? What can companies do to improve the impact of their pricing and promotion investment? We believe that there are three important areas of improvement. The first area is around a more refined understanding of the impact of price-promo decisions.  The new focus is on understanding the true impact of merchandising through both traditional and new lenses, including stockpiling, cross-retailer pricing and advanced price engines. Being able to more accurately predict the pattern of consumer behavior allows for automation and faster and better decisions.

The second area is around rapid and dynamic decision making. This involves a focus on new techniques such as Artificial Intelligence and Machine Learning to drive price-promo decisions. AI/ML is already getting entrenched within demand identification, product development and in-market execution as well as marketing. Within CPG and retail pricing, this will be accomplished by (a) speed in dealing with the regularly-repeated manual tasks in an efficient manner and (b) new levels of insight and accuracy based upon market trends that enable pricing analysts to focus their efforts on the areas that matter in a dynamic manner. It is imperative to move from a user-driven, manual pricing adjustments to dynamic “smart solutions.”

Another important area of change in pricing and promotion is “personalized pricing;”that is allowing manufacturers and retailers to customize price-promo decisions towards individual consumer/shopper segments. This is done by combining frequent shopper (FSP) data with traditional price-promo modeling for an in-depth evaluation of merchandising strategies as well as developing custom offers that would stimulate demand within these segments. IRI research shows that FSP/loyalty card holders react differently to brand price changes. For example, Brand Loyals react stronger to base price changes, while Brand Non-Loyals react stronger to base price reductions, promotional prices and quality merchandising tactics​.

In our session titled “New Frontiers in Pricing Analytics” at the SAS Global Forum 2018, we will provide a detailed overview of the state of the industry and how it is evolving. We will provide an overview of the new techniques and technologies in this space as well as where things are headed in the future. We hope to see you there.


Shifting sands in pricing and promotion was published on SAS Users.

2月 152018

How effective is your organization at leveraging data and analytics to power your business models?

This question is surprising hard for many organizations to answer.  Most organizations lack a roadmap against which they can measure their effectiveness for using data and analytics to optimize key business processes, uncover new business opportunities or deliver a differentiated customer experience. They do not understand what’s possible with respect to integrating big data and data science into the organization’s business model (see Figure 1).

the economic value of data

Figure 1: Big Data Business Model Maturity Index

My SAS Global Forum 2018 presentation on Tuesday April 10, 2018 will discuss the transformative potential of big data and advanced analytics, and will leverage the Big Data Business Model Maturity Index as a guide for helping organizations understand where and how they can leverage data and analytics to power their business models.

Digital Twins, Analytics Profiles and the Power of One

We all understand that the volume and variety of data are increasing exponentially.  Your customers are leaving their digital fingerprints across the Internet via their website, social media, and mobile devices usage.  The Internet of Things will unleash an estimated 44 Zettabytes of data across 7 billion connected people by 2020.

However, big data isn’t really about big; it’s about small. It’s about understanding your customer and product behaviors at the level of the individual.  Big Data is about building detailed behavioral or analytic profiles for each individual (see Figure 2).

Figure 2: Building Individual Behavioral or Analytic Profiles

If you want to better serve your customers, you need to understand their tendencies, behaviors, inclinations, preferences, interests and passions at the level of each individual customer.

Customers’ expectations of their vendors are changing due to their personal experiences.  From recommending products, services, movies, music, routes and even spouses, customers are expecting their vendors to understand they well enough that these vendors can provide a hyper-personalized customer experience.

Demystifying Data Science (AI | ML | DL)

Too many organizations are spending too much time confusing too many executives on the capabilities of data science.  The concept of data science is simple; data science is about identifying the variables and metrics that might be better predictors of business and operational performance (see Figure 3).

Figure 3: A Moneyball Definition of Data Science

Whether using basic statistics, predictive analytics, data mining, machine learning, or deep learning, almost all of data science benefits are achieved from the simple formula of: Input (A) → Response (B).

Source: Andrew Ng, “What Artificial Intelligence Can and Can’t Do Right Now”

By collaborating closely with the business subject matter experts to choosing Input (A), those variables and metrics that might be better predictor of performance, the data science team can achieve more accurate, more granular, lower latency Response (B).  And the creative creation and selection of Input (A) creatively has already revolutionized many industries, and is poised to revolutionize more.

Data Monetization and the Economic Value of Data

Data is an unusual asset – it doesn’t deplete, it doesn’t wear out and it can be used across an infinite number of use cases at near zero marginal cost.  Organizations have no other assets with those unique characteristics.  And while traditional accounting methods of valuing assets works well with physical assets, account methods fall horribly – dangerously – short in properly determining the economic value of data.

Instead of using traditional accounting techniques to determine the value of the organization’s data, apply economic and data science concepts to determine the economic value of the data based upon it’s ability to optimize key business and operational processes, reduce compliance and security risks, uncover new revenue opportunities and create a more compelling, differentiated customer experience (see Figure 4).

Figure 4: Data Lake 3.0: Collaborative Value Creation Platform

The data lake, which can house both data and analytic models, is transformed from a simple data repository into a “collaborative value creation platform” that facilities the capture, refinement and sharing of the data and analytic digital assets across the enterprise.

Creating the Intelligent Enterprise

When you add up all of these concepts and advancements – Big Data, Analytic Profiles, Data Science and the Economic Value of Data – organizations are poised for digital transformation (see Figure 5).

Figure 5: Achieving Digital Transformation

And what is Digital Transformation?

Digital Transformation is application of digital capabilities to processes, products, and assets to improve efficiency, enhance customer value, manage risk, and uncover new monetization opportunities.

Looking forward to seeing you at my SAS Global Forum 2018 session and helping your organizations on its digital transformation!

Data monetization and the economic value of data was published on SAS Users.

2月 152018

In this article, I will set out clear principles for how SAS Viya 3.3 will interoperate with Kerberos. My aim is to present some overview concepts for how we can use Kerberos authentication with SAS Viya 3.3. We will look at both SAS Viya 3.3 clients and SAS 9.4M5 clents. In future blog posts, we’ll examine some of these use cases in more detail.

With SAS Viya 3.3 clients we have different use cases for how we can use Kerberos with the environment. In the first case, we use Kerberos delegation throughout the environment.

Use Case 1 – SAS Viya 3.3

The diagram below illustrates the use case where Kerberos delegation is used into, within, and out from the environment.

How SAS Viya 3.3 will interoperate with Kerberos

In this diagram, we show the end-user relying on Kerberos or Integrated Windows Authentication to log onto the SAS Logon Manager as part of their access to the visual interfaces. SAS Logon Manager is provided with a Kerberos keytab and HTTP principal to enable the Kerberos connection. In addition, the HTTP principal is flagged as “trusted for delegation” so that the credentials sent by the client include the delegated or forwardable Ticket-Granting Ticket (TGT). The configuration of SAS Logon Manager with SAS Viya 3.3 includes a new option to store this delegated credential. The delegated credential is stored in the credentials microservice, and secured so that only the end-user to which the credential belongs can access it.

When the end-user accesses SAS CAS from the visual interfaces the initial authentication takes place with the standard internal OAuth token. However, since the end-user stored a delegated credential when accessing the SAS Logon Manager an additional Origin attribute is set on the token of “Kerberos.” The internal OAuth token also contains the groups the end-user is a member of within the Claims. Since we want this end-user to run the SAS CAS session as themselves they must have been added to a custom group with the ID=CASHostAccountRequired. When the SAS CAS Controller receives the OAuth token with the additional Kerberos Origin, it requests the visual interface to make a second Kerberized connection. So, the visual interface retrieves the delegated credential from the credentials microservice and uses this to request a Service Ticket to connect to SAS CAS.

SAS CAS has been provided with a Kerberos keytab and a sascas principal to enable the Kerberos connection. Since the sascas principal is flagged as “trusted for delegation,” the credentials sent by the visual interfaces include a delegated or forwardable Ticket-Granting Ticket (TGT). SAS CAS validates the Service Ticket, which in turn authenticates the end-user. The SAS CAS Controller then launches the session as the end-user and constructs a Kerberos ticket cache containing the delegated TGT. Now, within their SAS CAS session the end-user can connect to the Secured Hadoop environment as themselves since the SAS CAS session has access to a TGT for the end-user.

This means in this first use case all access to, within, and out from the SAS Viya 3.3 environment leverages strong Kerberos authentication. This is our “gold-standard” for authenticating the end-user to each part of the environment.

But, it is strictly dependent on the end-user being a member of the custom group with ID=CASHostAccountRequired, and the two principals (HTTP and sascas) being trusted for delegation. Without both the Kerberos delegation will not take place.

Use Case 1a – SAS Viya 3.3

The diagram below illustrates a slight deviation on the first use case.

Here, either through choice or by omission, the end-user is not a member of the custom group with the ID=CASHostAccountRequired. Now even though the end-user connects with Kerberos and irrespective of the configuration of SAS Logon Manager to store delegated credentials the second connection using Kerberos is not made to SAS CAS. Now the SAS CAS session runs as the account that launched the SAS CAS controller, cas by default. Since, the session is not running as the end-user and SAS CAS did not receive a Kerberos connection, the Kerberos ticket cache that is generated for the session does not contain the credentials of the end-user. Instead, the Kerberos keytab and principal supplied to SAS CAS are used to establish the credentials in the Kerberos ticket cache.

This means that even though Kerberos was used to connect to SAS Logon Manager the connection to the Secured Hadoop environment is as the sascas principal and not the end-user.

The same situation could be arrived at if the HTTP principal for SAS Logon Manager is not trusted for delegation.

Use Case 1b – SAS Viya 3.3

A final deviation to the initial use case is shown in the following diagram.

In this case the end-user connects to SAS Logon Manager with any other form of authentication. This could be the default LDAP authentication, external OAuth, or external SAML authentication. Just as in use case 1a, this means that the connection to SAS CAS from the visual interfaces only uses the internal OAuth token. Again, since no delegated credentials are used to connect to SAS CAS the session is run as the account that launched the SAS CAS controller. Also, the ticket cache created by the SAS Cloud Analytic Service Controller contains the credentials from the Kerberos keytab, i.e. the sascas principal. This means that access to the Secured Hadoop environment is as the sascas principal and not the end-user.

Use Case 2 – SAS Viya 3.3

Our second use case covers those users entering the environment via the programming interfaces, for example SAS Studio. In this case, the end-users have entered a username and password, a credential set, into SAS Studio. This credential set is used to start their individual SAS Workspace Session and to connect to SAS CAS from the SAS Workspace Server. This is illustrated in the following figure.

Since the end-users are providing their username and password to SAS CAS it behaves differently. SAS CAS uses its own Pluggable Authentication Modules (PAM) configuration to validate the end-user’s credentials and hence launch the SAS CAS session process running as the end-user. However, in addition the SAS CAS Controller also uses the username and password to obtain an OAuth token from SAS Logon Manager and then can obtain any access control information from the SAS Viya 3.3 microservices. Obtaining the OAuth token from the SAS Logon Manager ensures any restrictions or global caslibs defined in the visual interfaces are observed in the programming interfaces.

With the SAS CAS session running as the end-user and any access controls validated, the SAS CAS session can access the Secured Hadoop cluster. Now since the SAS CAS session was launched using the PAM configuration, the Kerberos credentials used to access Hadoop will be those of the end-user. This means the PAM configuration on the machines hosting SAS CAS must be linked to Kerberos. This PAM configuration then ensures the Kerberos Ticket-Granting Ticket is available to the CAS Session as is it launched.

Next, we consider three further use cases where the client is SAS 9.4 maintenance 5. Remember that SAS 9.4 maintenance 5 can make a direct connection to SAS CAS without requiring SAS/CONNECT. The use cases we will discuss will illustrate the example with a SAS 9.4 maintenance 5 web application, such as SAS Studio. However, the statements and basic flows remain the same if the SAS 9.4 maintenance 5 client is a desktop application like SAS Enterprise Guide.

Use Case 3 – SAS 9.4 maintenance 5

First, let’s consider the case where our SAS 9.4 maintenance 5 end-user enters their username and password to access the SAS 9.4 environment. This is illustrated in the following diagram.

In this case, since the SAS 9.4 Workspace Server is launched using a username and password, these are cached on the launch of the process. This enables the SAS 9.4 Workspace Server to use these cached credentials when connecting to SAS CAS. However, the same process occurs if instead of the cached credentials being provided by the launching process, they are provided by another mechanism. These credentials could be provided from SAS 9.4 Metadata Server or from an authinfo file in the user’s home directory on the SAS 9.4 environment. In any case, the process on the SAS Cloud Analytic Server Controller is the same.

The username and password used to connect are validated through the PAM stack on the SAS CAS Controller, as well as being used to generate an internal OAuth token from the SAS Viya 3.3 Logon Manager. The PAM stack, just as in the SAS Viya 3.3 programming interface use case 2 above, is responsible for initializing the Kerberos credentials for the end-user. These Kerberos credentials are placed into a Kerberos Ticket cache which makes them available to the SAS CAS session for the connection to the Secured Hadoop environment. Therefore, all the different sessions within SAS 9.4, SAS Viya 3.3, and the Secured Hadoop environment run as the end-user.

Use Case 4 – SAS 9.4 maintenance 5

Now what about the case where the SAS 9.4 maintenance 5 environment is configured for Kerberos authentication throughout. The case where we have Kerberos delegation configured in SAS 9.4 is shown here.

Here the SAS 9.4 Workspace Server is launched with Kerberos credentials, the Service Principal for the SAS 9.4 Object Spawner will need to be trusted for delegation. This means that a Kerberos credential for the end-user is available to the SAS 9.4 Workspace Server. The SAS 9.4 Workspace Server can use this end-user Kerberos credential to request a Service Ticket for the connection to SAS CAS. SAS CAS is provided with a Kerberos keytab and principal it can use to validate this Service Ticket. Validating the Service Ticket authenticates the SAS 9.4 end-user to SAS CAS. The principal for SAS CAS must also be trusted for delegation. We need the SAS CAS session to have access to the Kerberos credentials of the SAS 9.4 end-user.

These Kerberos credentials made available to the SAS CAS are used for two purposes. First, they are used to make a Kerberized connection to the SAS Viya Logon Manager, this is to obtain the SAS Viya internal OAuth token. Therefore, the SAS Viya Logon Manager must be configured to accept Kerberos connections. Second, the Kerberos credentials of the SAS 9.4 end-user are used to connect to the Secure Hadoop environment.

In this case since all the various principals are trusted for delegation, our SAS 9.4 end-user can perform multiple authentication hops using Kerberos with each component. This means that through the use of Kerberos authentication the SAS 9.4 end-user is authenticated into SAS CAS and out to the Secure Hadoop environment.

Use Case 5 – SAS 9.4 maintenance 5

Finally, what about cases where the SAS 9.4 maintenance 5 session is not running as the end-user but as a launch credential; this is illustrated here.

The SAS 9.4 session in this case could be a SAS Stored Process Server, Pooled Workspace Server, or a SAS Workspace server leveraging a launch credential such as sassrv. The key point being that now the SAS 9.4 session is not running as the end-user and has no access to the end-user credentials. In this case we can still connect to SAS CAS and from there out to the Secured Hadoop environment. However, this requires some additional configuration. This setup will leverage One-Time-Passwords generated by the SAS 9.4 Metadata Server, so the SAS 9.4 Metadata Server must be made aware of the SAS CAS. This is done by adding a SAS 9.4 metadata definition for the SAS CAS. Our connection from SAS 9.4 must then be “metadata aware,” achieved by using authdomain=_sasmeta_ on the connection.

Equally, the SAS Viya 3.3 side of the environment must be able to validate the One-Time-Password used to connect to SAS CAS. When SAS CAS receives the One-Time-Password on the connection, it is sent to the SAS Viya Logon Manager for validation and to obtain a SAS Viya internal OAuth token. We need to add some configuration to the SAS Viya Logon Manager to enable this to validate the One-Time-Password. We configured the SAS Viya Logon Manager with the details of where the SAS 9.4 Web Infrastructure Platform is running. The SAS Viya Logon Manager passes the One-Time-Password to the SAS 9.4 Web Infrastructure Platform to validate the One-Time-Password. After the One-Time-Password is validated a SAS Viya internal OAuth token is generated and passed back to SAS CAS.

Since SAS CAS does not have access to the end-user credentials, the session that is created will be run using the account used to launch the controller process, cas by default. Since the end-user credentials are not available, the Kerberos credentials that are initialized for the session are from the Kerberos keytab provided to SAS CAS. Then the connection to the Secured Hadoop environment will be made using those Kerberos credentials of the principal assigned to the SAS CAS.


We have presented several use cases above. The table below can be used to summarize and differentiate the use cases based on key factors.

SAS Viya 3.3 - Some Kerberos principles was published on SAS Users.