
April 6, 2018
 

SAS Global Forum 2018 is just days away, and if you're one of the 6,000 SAS users heading to Denver for the event, get ready for four days of learning from SAS peers, exchanging ideas, discovering new techniques for using SAS, and a bit of fun as well.

But what if you can't make the trip to Denver this year? Is there another way to experience some of the great content that will be shared there? I'm happy to say the answer is yes!

SAS Global Forum 2018 Virtual Event

Once again, SAS will provide dozens of hours of live video streaming from the event. Register for an All-Access Pass and watch select sessions, including the Opening Session, keynote talks, select breakouts, Tech Talks, updates from The Quad, interviews with SAS executives and developers, and more. You can find livestream presentations and add them to your calendar here. Coverage will also be available for on-demand viewing on the SAS Global Forum Video Portal after the conference.

Get involved via Social Media

Video not your thing? No worries. SAS will provide several other ways to stay up to date. For starters, you can read any of a number of blog posts from the event. Posts will come from many different SAS blogs, but all posts from SAS Global Forum will be aggregated here.

If you're on LinkedIn, Twitter or Facebook, you can stay connected with what's happening and engage with attendees on SAS’ social media channels. Join the conversation, comment on some of the cool presentations you attended or viewed, discuss the exciting news coming out of the event, or simply follow along. The channels sure to have the most activity are the SAS Users LinkedIn group, the SAS Twitter account, the SAS Users Group Facebook page, and Instagram. The hashtag for SAS Global Forum is #SASGF; be sure to use it in all your posts.

SAS Global Forum 2018 Proceedings

If you want a deeper dive into a particular talk, or if you're interested in presentations that may not be part of the live stream, the proceedings for SAS Global Forum 2018 are already available. Here you'll find abstracts, PDFs of papers, e-posters, slides, data files, and more.

With all the opportunities to follow along, connect and contribute, you can be a part of SAS Global Forum 2018, whether you're attending in person or not.

Enjoy the event...whether you're going or not!

Participate in SAS Global Forum 2018...even if you're not going was published on SAS Users.

April 5, 2018
 

SAS Global Forum 2018 takes place April 8-11 in Denver. The following post is from Sebastian Dziadkowiec and Piotr Czetwertynski, presenters at the event. You can join Sebastian and Piotr for their talk: “An Agile Approach to Building an Omni-Channel Customer Experience” on April 9 at 2 p.m. in Meeting Room 302. We'll also post their presentation here after the event has concluded.

Keys to building a successful and future-proof omni-channel customer experience

Most organizations acknowledge that building a seamless and consistent customer experience is critical to long-term success. The big question is: Now what? With all of the channels to stitch together – from brick and mortar experiences to online clicks – how do you track and make sense of all that customer data? And, more importantly, how do you use that data to create the very best customer experience?

Over many years of implementing SAS Customer Intelligence and helping our clients give their customers exactly what they want and when they want it, our team has identified some characteristics that make for successful projects. Here are some of the key components that most often make or break a Customer Intelligence project.

Time to market

Everyone likes to see value generated quickly, and reaching the break-even point for a project within weeks of launch is critical. In the case of campaign management, it is possible. Instead of following the traditional waterfall path, with all the IT-heavy components like requirements gathering and analysis, solution design, and many streams of implementation and testing, it is worth considering releasing a minimum viable product as soon as possible. Such an approach allows us to focus on delivering business value and field-testing all the creative ideas, rather than building an IT system in perfect accordance with requirements, one that may no longer be relevant on the day of release.

Applying analytics in the decisioning process

Go beyond the traditional, rule-based approach to get the most out of the data you have. Nowadays, everyone talks about machine learning, big data, NBA, artificial intelligence and so on. It is up to each organization and CI project to forge those buzzwords into real value by embedding advanced analytics techniques in the decisioning process. There are many ways to boost various use cases with these advanced methods; make sure you will be able to use all you need and integrate their results seamlessly, regardless of when and how you engage with your customers.

While working on a CI project you should also keep other areas in mind: project organization, building a future-proof solution that will stay relevant for years, and a constant search for additional opportunities to use the available data and solutions to generate incremental value beyond the core scope of the customer intelligence project.

There isn’t a one-size-fits-all approach to implementing a CI project, but these lessons learned can greatly increase your chances for project success: successful delivery that generates a high ROI in a short timeframe while staying relevant in the long run, through the very best possible customer experience.

Find out more at SAS Global Forum 2018

Join Sebastian and Piotr for their “An Agile Approach to Building an Omni-Channel Customer Experience” Breakout Session at SAS Global Forum April 9 at 2 p.m. in Meeting Room 302.

About the Authors

Piotr Czetwertyński

Piotr is a Customer Analytics Manager at Accenture. He has 11 years of experience in campaign management and analytics. Currently, he is one of the people responsible for launching the Accenture Center of Excellence for SAS CI in Warsaw, Poland.

Piotr currently focuses on solutioning and strategy in the areas of campaign management, BI and analytics.

Sebastian Dziadkowiec

Sebastian has 8 years of experience in technology and management consulting, mostly in the communications industry. He has been through the entire project lifecycle on numerous engagements, starting as a programmer, then as a business and technical analyst, and up to solution architect and team manager on large-scale analytics projects.

Sebastian specializes in analytics solutions technology architecture, particularly focusing on customer intelligence and big data. He serves as the technology lead in the Accenture Center of Excellence for SAS CI in Warsaw, Poland.

 

 

Keys to building a successful and future-proof omni-channel customer experience was published on SAS Users.

April 1, 2018
 

BREAKING NEWS. Today, shortly after midnight on the U.S. East Coast, Cary, NC-based SAS Institute successfully completed its first space exploration mission.

This interplanetary expedition was conducted on a SAS-designed manned spacecraft powered by our state-of-the-art atomic Collider Acceleration System (CAS) engine. A crew of three SAS volunteers took part in that undertaking. These brave souls were:

  • spacecraft commander
  • spacecraft pilot/engineer
  • spacecraft mission specialist

All three specialize in terrestrial communications (i.e. social media) and received special training on extra-terrestrial travel and communications. Their mission was to study the Venus space area up close to find out what causes the gravitational field anomaly that has recently been observed there.

How it all started

Those of us who attended last year’s SAS Global Forum in Orlando must remember the inspiring speech by Canadian astronaut Chris Hadfield. The main idea I took away from his speech was that success is not a good teacher, as it teaches us nothing; failure, on the other hand, is a very good teacher, at least for those of us who are willing to learn the lesson. But, as we all know, there is a time to fail/learn, and there is a time to succeed.

I can’t speak for all of you, but, man, were we inspired by that speech! We at SAS knew right then that we, too, wanted to explore the “final frontier.” As a data analytics software company, all our studies start with data explorations. Our public relations group worked tirelessly with the major stakeholder Government agencies and private companies (NASA, SpaceX, ROSCOSMOS, etc.) to get ahold of the data. When our analysts finally did get access to the data, they were overwhelmed by its size. That was really BIG data (literally of cosmic proportions) – data about every little pocket of spacetime in our Solar system, collected over multiple years of astronomical observations and Space exploration programs.

Predictive Modeling

Using our flagship SAS Viya Analytics software, we mined these vast data archives, employing various predictive modeling, computational, and heuristic techniques such as automatic machine deep space learning, 3D artificial intelligence simulation, and, most importantly, natural, coffee-stimulated human intelligence.

What caught our attention was the space area around Venus. Planet Venus is notorious for being an outlier. First of all, it spins slower than any other planet in the Solar System, even slower than it revolves around the Sun. In fact, it spins about 243 times slower than Earth. That means a day there lasts approximately 243 Earth days, making it longer than a Venusian year, which is only about 225 Earth days long.

Second of all, it spins backwards, in the opposite direction from most other planets, including Earth, so that on Venus the sun rises in the west.

Third, it has the highest mean surface temperature of all the Solar System planets – reaching up to 726 K (452 °C or 870 °F), which is 1.6 times hotter than Mercury, the closest planet to the Sun. This is because of Venus’ thick atmosphere composed mostly of greenhouse gases (carbon dioxide and sulfur dioxide), which trap a good portion of the Sun’s heat.

However, the most unusual thing that we discovered was an aberration in Venus’ gravitational field, suggesting a significant mass (possibly large enough to be a planet) hidden behind it.

The following bubble plot uses a logarithmic scale for the x-axis (distance from the Sun) and visualizes our finding:

SAS VA bubble plot showing the planets' period of revolution around the Sun

Now we can see it clearly. Not only does the plot show an unknown small planet behind Venus, it also explains why that planet is not visible from Earth: its period of revolution around the Sun is exactly the same as that of Venus, so it has always been obstructed by Venus.

Due to its very close proximity to Venus, there is a good chance that even a slight tangential nudge experienced by planet “2-?” might break gravitational equilibrium, causing it to start orbiting Venus as a moon rather than the Sun.  We will be observing this situation carefully.

Another interesting finding is that this unknown planet is a much more hospitable place than Venus, as it has an oxygen-dominated atmosphere and a relaxing surface temperature only slightly higher than that of Earth, as shown in the following chart:

SAS VA bubble plot - Mean surface temperature of the Solar System planets

At this point we had had enough modeling and needed some hard proof.

Space exploration mission

We leveraged our best human intelligence resources around the world to progress rapidly through all the required phases of spacecraft design and construction, crew selection and training, and finally launching and completing the space mission.

All in all, it took just under a year (11 months to be precise) to bring this project to completion. The data collection and exploration phase took around one month, which is in line with the capacity of the SAS Viya analytical environment. The design and build phase took about five months, and it was conducted in parallel with crew selection and training; finally, the flight phase also took five months, including launch, travel to the Venus space area with a flyby of Venus and the planet X, and a return to Earth. As you can see from the picture at the very top, the unknown planet X hidden behind Venus does indeed exist. However, further analysis and study will be necessary to determine the nature of the observed surface irregularities.

Your participation and input are requested

If you would like to engage in the fascinating field of space exploration, you are welcome to use the following SAS-generated summary data table:

Data about planets of the Solar System

If you are a detail-oriented type, it will be obvious to you that the circumference of the new planet oddly equals 200π² km, which defies all the canons of geometry. You are welcome to prove or disprove the possibility of such an unusual occurrence.

So far, we have referred to this new planet as “2-?” and “planet X”, but it’s about time to give it a name, a real name like the other 8 planets of our Solar System have.

Our first inclination was to name it after our newest SAS analytical software environment, Viya. But do we really want to dilute the brand by applying it to two different though prominent objects?!

That is why we decided to reach out to you, our readers, to solicit ideas for the name of the new planet.  Please provide a brief justification for your name suggestion. We also welcome any insights, hypotheses and data stories you might come up with based on the collected data. We greatly appreciate your input.

Disclosure

Please click here for full disclosure.

SAS discovers a new planet in the Solar System was published on SAS Users.

March 30, 2018
 

Gradient boosting is one of the most widely used machine learning models in practice, and more and more people are using it in Kaggle competitions. Are you interested in seeing how to use a gradient boosting model for classification in SAS Visual Data Mining and Machine Learning? Here I play with the classification of Fisher’s Iris flower dataset using gradient boosting, and this may serve as a starting point for those interested in trying the classification models in the SAS Visual Data Mining and Machine Learning product.

Fisher’s Iris data is a well-known dataset in data mining. Per Wikipedia, Fisher developed a linear discriminant model to distinguish the species from each other by the features provided in the dataset. You may have already seen people run different classification models on this dataset, such as neural networks. What I am interested in is seeing how well a SAS gradient boosting model classifies the species.

#1  Explore the dataset

We can easily load Fisher’s Iris dataset from SASHelp.Iris into SAS Viya. The dataset consists of 50 samples from each of three species of Iris (Setosa, Virginica, and Versicolor), for a total of 150 records with five attributes: Petal Length, Petal Width, Sepal Length, Sepal Width, and Iris Species. The dataset itself is already well-formed, with neither missing values nor outliers. Take a quick look at the dataset in SAS Visual Analytics below.

Gradient boosting

From the chart, we see that the ‘Setosa’ species can easily be distinguished from the ‘Versicolor’ and ‘Virginica’ species by the length and width of its petals and sepals. However, this is not the case for the latter two species: some of their observations overlap closely, which makes it a little harder to tell them apart by these features.
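If you prefer code to the visual interface, one way to get the Iris table into CAS is a short program like the sketch below. This is an illustration only and not from the original post; the session name and the caslib/table names are placeholders.

   cas mySession;                        /* start a CAS session                              */
   proc casutil;
      /* copy SASHELP.IRIS into the Casuser caslib as an in-memory table */
      load data=sashelp.iris outcaslib="casuser" casout="iris" promote;
   quit;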

#2  Prepare Data

There is not much effort needed to prepare the data for the prediction. But one thing I’d like to mention here is the standardization of measure variables. By viewing the measure details in SAS Visual Analytics, we see that neither the Petal Length distribution nor the Petal Width distribution is normal. You may wonder whether we need to normalize the data before feeding it to the model, and this leads to one great thing I like about the gradient boosting model: users do not need to explicitly standardize quantitative data. Tree-based models are robust to this kind of issue in an input feature, since the algorithm is based on node splits. (Here is an article discussing a similar problem.)

So, here my data preparation is just partitioning the data before starting the classification of iris species. I need to make sure each partition follows the same distribution of species in the iris dataset. This can be achieved easily in SAS Visual Analytics by adding a partition data item: set the Sampling method to ‘Stratified sampling’ and add ‘Iris Species’ as the column to be stratified by. I define two partitions, a training partition and a validation partition, with 60% for training and 40% for validation, using random seed 1234. Thus, a categorical data item ‘Partition’ is added, with a value of 0 for the validation partition and 1 for the training partition. (For easier understanding in the charts, I’ve created a custom category called ‘Partitions’ based on the ‘Partition’ data item values.)
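For reference, a roughly equivalent stratified partition could also be created with code. The following is only a sketch with illustrative names, assuming the Iris table has been loaded into CAS as casuser.iris; the PARTIND option adds a partition indicator column (1 = sampled for training, 0 = validation).

   proc partition data=casuser.iris partind samppct=60 seed=1234;
      by Species;                               /* stratify the sampling by species */
      output out=casuser.iris_part copyvars=(_ALL_);
   run;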

The charts below show that the 150 rows in Fisher’s Iris dataset are distributed equally into three species, and the created partitions are sampled with the same percentage among the three species.

#3  Train the gradient boosting model

Training various models in SAS Visual Data Mining and Machine Learning allows us to appreciate the advantages of visualization, and it’s very straightforward for users. In the ‘Objects’ tab, drag and drop ‘Gradient Boosting’ onto the canvas. Assign ‘Iris Species’ as the response variable, and ‘Petal Length’, ‘Petal Width’, ‘Sepal Length’, and ‘Sepal Width’ as predictors. Then set the ‘Partition’ data item as the Partition ID. After that, the system trains the model and shows the model assessment. I’ve taken a screenshot for the ‘Virginica’ event below.

The response variable Iris Species has three event levels (‘Setosa’, ‘Versicolor’, and ‘Virginica’), and we can choose the desired event level to look at the model output. In addition, we may switch the assessment plot from Lift to a ROC plot or a Misclassification plot. (Note: the misclassification plot is based on the event level, so it shows ‘Setosa’ and ‘NOT Setosa’ if we choose the ‘Setosa’ event.) Below is a screenshot with the ROC plot and the model assessment statistics.

In practice, training models usually costs a lot of effort in tuning model parameters. SAS Visual Data Mining and Machine Learning provides the ‘Autotune’ feature to help with this: users decide a few settings, such as the maximum number of iterations, seconds, and evaluations, and the product chooses the optimal values for the hyperparameters of the model. Considering that this dataset has only 150 samples, I won’t bother with hyperparameter tuning.
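For readers who would rather script the training step, a hypothetical PROC GRADBOOST call is sketched below. It is not the code behind the screenshots, the option names should be checked against your release, and the partition indicator column is assumed to hold 1 for training and 0 for validation.

   proc gradboost data=casuser.iris_part seed=1234;
      partition rolevar=_PartInd_(train='1' validate='0');
      input PetalLength PetalWidth SepalLength SepalWidth / level=interval;
      target Species / level=nominal;
      autotune maxevals=20;      /* let autotuning search the hyperparameter space */
   run;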

#4  Make prediction by the model

Now I can make predictions from the gradient boosting model for the data in the validation partition. There are several ways to go here. In SAS Visual Data Mining and Machine Learning, on the right-click menu, click either ‘Export model…’ or ‘Derive predicted…’. The first exports the model code, so you can run it in SAS Studio against the data to be predicted. The latter is very straightforward in SAS Visual Data Mining and Machine Learning: it opens the ‘New Prediction Items’ page, where you may choose to get the predicted value and its probability values for all the levels of Iris Species. These data items are added to the iris CAS table for further evaluation. Since the iris dataset has three species, I set ‘All levels’ so the prediction gives the classification for all three species and their probabilities.

#5  Review the prediction result

In the model assessment tab, we already see the model assessment statistics for model evaluation. We may also switch to the ‘Variable Importance’ tab, or the ‘Lift’, ‘ROC’, and ‘Misclassification’ tabs, to see more about the model. Here I’d like to visually compare the predicted species value with the iris species value provided in the dataset.

To show visually how many classifications failed, I perform the following actions:

  • In SAS Visual Analytics, create a list table to show all 150 rows of the iris dataset. Since there is no primary key in the dataset, the SAS Visual Analytics list table will do aggregation for measure variables by default, so be sure to set the ‘Detail data’ option in the Options tab.
  • Create a calculated item (named ‘equals’) to compare if the values of ‘Iris Species’ and ‘Predicted: Iris Species’ columns are equal: {IF ( 'Iris Species'n = 'Predicted: Iris Species'n ) RETURN 1 ELSE 0. }
  • Define a display rule with the calculated item to highlight the misclassified rows. I’ve sorted the table by the ‘equals’ value above, so the rows where the ‘Iris Species’ and ‘Predicted: Iris Species’ values differ are shown on top.

We see four rows are misclassified by the model: three of them are from the training partition and one from the validation partition. So far, the result doesn’t look bad, right?

We may continue to tune the parameters of the gradient boosting model easily in SAS Visual Data Mining and Machine Learning to improve it. For example, if I set a smaller leaf size value of 2 instead of the default value of 5, the model accuracy improves (too good to be true?). See the screenshot below for a comparison.

Of course, you may like to try tuning other parameters, or to generate more features to refine the model. Either way, it is easy and straightforward to do classification using the gradient boosting model in SAS Visual Data Mining and Machine Learning. In addition, there are many other models in SAS Visual Data Mining and Machine Learning you may like to run for classification. Would you like to play with the other models for practice?

Play with classification of Iris data using gradient boosting was published on SAS Users.

March 30, 2018
 

As a follow-on from my previous blog post, where we looked at the different use cases for using Kerberos in SAS Viya 3.3, in this post I want to delve into more detail on configuring Kerberos delegation with SAS Viya 3.3. SAS Viya 3.3 supports the use of Kerberos delegation to authenticate to SAS Logon Manager and then use the delegated credentials to access SAS Cloud Analytic Services. This was the first use case we illustrated in the previous blog post.

As a reminder this is the scenario we are discussing in this blog post:

Kerberos Delegation

In this post we’ll examine:

  • The implications of using Kerberos delegation.
  • The prerequisites.
  • How authentication is processed.
  • How to configure Kerberos delegation.

Why would we want to configure Kerberos delegation for SAS Viya 3.3? Kerberos provides us with a strong authentication mechanism for the visual interfaces, SAS Cloud Analytic Services, and Hadoop in SAS Viya 3.3. With Kerberos enabled, no end-user credentials are sent from the browser to the SAS Viya 3.3 environment. Instead, Kerberos relies on a number of encrypted tickets and a trusted third party to provide authentication. Equally, leveraging Kerberos delegation means that both the SAS Cloud Analytic Services session and the connection to Hadoop run as the end-user. This better allows you to trace operations to a specific end-user and to more thoroughly apply access controls to the end-user.

Implications

Configuring Kerberos delegation will involve configuring Kerberos authentication for both the Visual interfaces and SAS Cloud Analytic Services. First, we’ll look at the implications for the Visual interfaces.

Once we configure Kerberos for authentication of SAS Logon Manager it replaces the default LDAP provider for end-users. This means that the only way for end-users to authenticate to SAS Logon Manager will be with Kerberos. In SAS Viya 3.3 there is no concept of fallback authentication.

Kerberos will be our only option for end-user authentication and we will be unable to use the sasboot account to access the environment. Configuring Kerberos authentication for SAS Logon Manager will be an all-or-nothing approach.

While the web clients will be using Kerberos for authentication, any client using the OAuth API directly will still use the LDAP provider. This means when we connect to SAS Cloud Analytic Services from SAS Studio (which does not integrate with SAS Logon) we will still be obtaining an OAuth token using the username and password of the user accessing SAS Studio.

If we make any mistakes when we configure Kerberos, or if we have not managed to complete the prerequisites correctly, the SAS Logon Manager will not start correctly. The SAS Logon Manager bootstrap process will error and SAS Logon Manager will fail to start. If SAS Logon Manager fails to start then there is no way to gain access to the SAS Viya 3.3 visual interfaces. In such a case the SAS Boot Strap configuration tool must be used to repair or change the configuration settings. Finally, remember using Kerberos for SAS Logon Manager does not change the requirement for the identities microservice to connect to an LDAP provider. Since the identities microservice is retrieving information from LDAP about users and groups we need to ensure the username part of the Kerberos principal for the end-users match the username returned from LDAP. SAS Logon Manager will strip the realm from the user principal name and use this value in the comparison.

Then considering SAS Cloud Analytic Services, we will be adding Kerberos to the other supported mechanisms for authentication. We will not replace the other mechanisms the way we do for SAS Logon Manager. This means we will not prevent users from connecting with a username and password from the Programming interfaces. As with the configuration of SAS Logon Manager, issues with the configuration can cause SAS Cloud Analytic Services to fail to start. Therefore, it is recommended to complete the configuration of SAS Cloud Analytic Services after the deployment has completed and you are certain things are working correctly.

Prerequisites

To be able to use Kerberos delegation with SAS Viya 3.3 a number of prerequisites need to be completed.

Service Principal Name

First, a Kerberos Service Principal Name (SPN) needs to be registered for both the HTTP service class and the sascas service class. This takes the form <service class>/<HOSTNAME>, where <HOSTNAME> is the value that will be used by clients to request a Kerberos Service Ticket. In most cases for HTTP, the <HOSTNAME> will just be the fully qualified hostname of the machine where the Apache HTTP Server is running. If you are using aliases or alternative DNS registrations, then finding the correct name to use might not be so straightforward. For SAS Cloud Analytic Services, the <HOSTNAME> will be the CAS Controller hostname.

Next, by registering we mean that this Service Principal Name must be provided to the Kerberos Key Distribution Center (KDC). If we are using Microsoft Active Directory, each SPN must be registered against an object in the Active Directory database. Objects that can have an SPN registered against them are users or computers. We recommend using a user object in Active Directory to register each SPN against, and we also recommend that different users are used for HTTP and CAS.

So, we have two service accounts in Active Directory, and we register the SPN against each service account. There are different ways the SPN can be registered in Active Directory. The administrator could perform these tasks manually using the GUI, using an LDAP script, using a PowerShell script, using the setspn command, or using the ktpass command. Using these tools, multiple SPNs can be registered against the service account, which is useful if there are different hostnames the end-users might use to access the service. In most cases these tools will only register the SPN; however, the ktpass command will also change the User Principal Name for the service account. More on this shortly.
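As an illustration only, on Windows the two SPNs might be registered with the setspn command along these lines; the hostnames and service account names below are placeholders, not values from this post.

   setspn -S HTTP/webserver.company.com svc-sas-http
   setspn -S sascas/cascontroller.company.com svc-sas-cas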

As an alternative to Microsoft Active Directory, customers could be using a different Kerberos KDC, such as MIT Kerberos or Heimdal Kerberos. For these implementations of Kerberos there is no difference between a user and a service. The database used by these KDCs just stores information on principals and does not provide a distinction between a User Principal Name and a Service Principal Name.

Trusted for Delegation

For the Kerberos authentication to be delegated from SAS Logon Manager to SAS Cloud Analytic Services, and then from SAS Cloud Analytic Services to the Secured Hadoop environment, the two service accounts that have the SPNs registered against them must be trusted for delegation. Without this, the scenario will not work. You can only specify that an account is trusted for delegation after the Service Principal Name has been registered; the option is not available until you have completed that step. The picture below shows an example of the delegation settings in Active Directory.

If the Secured Hadoop environment is configured using a different Kerberos Key Distribution Center (KDC) to the rest of the environment it will not prevent the end-to-end scenario working. However, it will add further complexity. You will need to ensure there is a cross-realm trust configured to the Hadoop KDC for the end-to-end scenario to work.

Kerberos Keytab

Once you have registered each of the SPNs you’ll need to create a Kerberos keytab for each service account. Again, there are multiple tools available to create the Kerberos keytab. We recommend using the ktutil command on Linux, since this is independent of the KDC and makes no changes to the Kerberos database when creating the keytab. Some tools like ktpass will make changes when generating the keytab.

In the Kerberos keytab we need to have the User Principal Name (UPN) and the associated Kerberos keys for that principal. The Kerberos keys are essentially encrypted versions of the password for the principal. As discussed above for the SPN, depending on the tools used to register it, the UPN for the Kerberos keytab could take different forms.

When using ktpass to register the SPN and create the keytab in a single step, the UPN of the account in Active Directory will be set to the same value as the SPN, whilst using the setspn command or performing the task manually will leave the UPN unchanged. Equally, for MIT Kerberos or Heimdal Kerberos, since there is no differentiation between principals, the UPN for the keytab will be the SPN registered with the KDC.
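As a hypothetical example, a keytab for the HTTP principal could be built with ktutil roughly as follows; the principal, key version number, encryption type, and output path are placeholders and must match what is registered in your KDC.

   ktutil
   addent -password -p HTTP/webserver.company.com@COMPANY.COM -k 2 -e aes256-cts
   wkt /etc/sashttp.keytab
   quit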

Once the Kerberos keytabs have been created they will need to be made available to any hosts with the corresponding service deployed.

Kerberos Configuration File

Finally, as far as prerequisites are concerned we might need to provide a Kerberos configuration file for the host where SAS Logon Manager is deployed. This configuration should identify the default realm and other standard Kerberos settings. The Kerberos implementation in Java should be able to use network queries to find the default realm and Kerberos Key Distribution Center. However, if there are issues with the network discovery, then providing a Kerberos configuration file will allow us to specify these options.

The Kerberos configuration file should be placed in the standard location for the operating system. So on Linux this would be /etc/krb5.conf. If we want to specify a different location we can also specify a JVM option to point to a different location. This would be the java.security.krb5.conf option. Equally, if we cannot create a Kerberos configuration file we could set the java.security.krb5.realm and java.security.krb5.kdc options to identify the Kerberos Realm and Kerberos Key Distribution Center. We’ll show how to set JVM options below.

Authentication Process

The process of authenticating an end-user is shown in the figure below:

Where the steps are:

A.  Kerberos used to authenticate to SAS Logon Manager. SAS Logon Manager uses the Kerberos Keytab for HTTP/<HOSTNAME> to validate the Service Ticket. Delegated credentials are stored in the Credentials microservice.
B.  Standard internal OAuth connection to SAS Cloud Analytic Services, where the origin field in the OAuth token includes Kerberos and the claims include the custom group ID “CASHostAccountRequired”.
C.  The presence of the additional Kerberos origin causes SAS Cloud Analytic Services to get the CAS client to make a second connection attempt using Kerberos. The Kerberos credentials for the end-user are obtained from the Credentials microservice. SAS Cloud Analytic Services Controller uses the Kerberos Keytab for sascas/<HOSTNAME> to validate the Service Ticket and authenticate the end-user. Delegated credentials are placed in the end-user ticket cache.
D.  SAS Cloud Analytic Services uses the credentials in the end-user ticket cache to authenticate as the end-user to the Secured Hadoop environment.

Configuration

Kerberos authentication must be configured for both SAS Logon Manager and SAS Cloud Analytic Services. Also, any end-user must be added to a new custom group.

SAS Logon Manager Configuration

SAS Logon Manager is configured in SAS Environment Manager.

Note: Before attempting any configuration, ensure at least one valid LDAP user is a member of the SAS Administrators custom group.

The configuration settings are within the Definitions section of SAS Environment Manager. For the sas.logon.kerberos definition you need to set the following properties:

For more information see the

SAS Logon Manager will need to be restarted for these new JVM options to be picked up. The same method can be used to set the JVM options for identifying the Kerberos Realm and KDC where we would add the following:

  • Name = java_option_krb5realm
  • Value = -Djava.security.krb5.realm=<REALM>
  • Name = java_option_krb5kdc
  • Value = -Djava.security.krb5.kdc=<KDC HOSTNAME>

Or for setting the location of the Kerberos configuration file where we would add:

  • Name = java_option_krb5conf
  • Value = -Djava.security.krb5.conf=/etc/krb5.conf

SAS Cloud Analytic Services Configuration

The configuration for SAS Cloud Analytic Services is not performed in SAS Environment Manager; it is completed by changing files on the file system. The danger of changing files on the file system is that re-running the deployment Ansible playbook might overwrite any changes you make. The choices you have are to either remake any changes to the file system, make the changes to both the file system and the playbook files, or make the changes in the playbook files and re-run the playbook to change the file system. Here I will list the changes in both the configuration files and the playbook files.

There is only one required change and then two optional changes. The required change is to define the authentication methods that SAS Cloud Analytic Services will use. In the file casconfig_usermods.lua, located in:

/opt/sas/viya/config/etc/cas/default

Add the following line:

cas.provlist = 'oauth.ext.kerb'

Note: Unlike the SAS Logon Manager option above, this is separated with full-stops!

In the same file we can make two optional changes. These optional changes enable you to override default values. The first is the default Service Principal Name that SAS Cloud Analytic Services will use. If you cannot use sascas/<HOSTNAME> you can add the following to the casconfig_usermods.lua:

-- Add Env Variable for SPN
env.CAS_SERVER_PRINCIPAL = 'CAS/HOSTNAME.COMPANY.COM'

This sets an environment variable with the new value of the Service Principal Name. The second optional change is to set another environment variable. This will allow you to put the Kerberos Keytab in any location and call it anything. The default name and location is:

/etc/sascas.keytab

If you want to put the keytab somewhere else or call it something else add the following to the casconfig_usermods.lua

-- Add Env Variable for keytab location
env.KRB5_KTNAME = '/opt/sas/cas.keytab'

These changes can then be reflected in the vars.yml within the playbook by adding the following to the CAS_CONFIGURATION section:

CAS_CONFIGURATION:
   env:
     CAS_SERVER_PRINCIPAL: 'CAS/HOSTNAME.COMPANY.COM'
     KRB5_KTNAME: '/opt/sas/cas.keytab'
   cfg:
     provlist: 'oauth.ext.kerb'

With this in place we can restart the SAS Cloud Analytic Services Controller to pick up the changes.

Custom Group

If you attempted to test accessing SAS Cloud Analytic Services at this point from the Visual interfaces as an end-user you would see that they were not delegating credentials and the CAS session was not running as the end-user. The final step is to create a custom group in SAS Environment Manager. This custom group can be called anything, perhaps “Delegated Users”, but the ID for the group must be “CASHostAccountRequired“. Without this the CAS session will not be run as the end-user and delegated Kerberos credentials will not be used to launch the session.

Summary

What we have outlined in this article is the new feature of SAS Viya 3.3 that enables Kerberos delegation throughout the environment. It allows you to have end-user sessions in SAS Cloud Analytic Services that are able to use Kerberos to connect to Secured Hadoop. I hope you found this helpful.

SAS Viya 3.3 Kerberos Delegation was published on SAS Users.

March 30, 2018
 

With SAS Viya 3.3, a new data transfer mechanism, “MultiNode Data Transfer,” has been introduced to transfer data between the data source and SAS Cloud Analytic Services (CAS), in addition to the Serial and Parallel data transfer modes. The new mechanism is an extension of the Serial Data Transfer mode. In MultiNode Data Transfer mode, each CAS Worker makes a simultaneous, concurrent connection to read and write data from the source DBMS or distributed data system.

In CAS, SAS Data Connectors are used for Serial mode and SAS Data Connect Accelerators are used for Parallel mode data transfer between CAS and the DBMS. The SAS Data Connector can also be used for the MultiNode data transfer mechanism. In a multi-node CAS environment where the Data Connector is installed on all nodes, the Data Connector can take advantage of the environment and make concurrent data access connections from each CAS Worker to read and write data from the data source environment.

The CAS Controller controls the MultiNode Data Transfer. It directs each CAS Worker node on how to query the source data and obtain the needed data. The CAS Controller checks the source data table for the first numeric column and uses its values to divide the table into slices using a MOD function with the number of CAS nodes specified. The higher the cardinality of the selected numeric column, the more evenly the data can be divided into slices. If CAS chooses a low-cardinality column, you could end up with poor data distribution on the CAS Worker nodes. The CAS Controller then directs each CAS Worker to submit a query to obtain its slice of data. During this process, each CAS Worker makes an independent, concurrent request to the data source environment.

Data is transferred from the source environment to the CAS worker nodes directly using a single thread connection, bypassing the CAS Controller.

The following diagrams describe the data access from CAS to the data source environment using MultiNode Data Transfer mode. CAS is hosted in a multi-node environment with the SAS Data Connector installed on each node (CAS Controller and Workers). A CASLIB is defined with NUMREADNODES= and NUMWRITENODES= values other than 1. With each data table access request, the CAS Controller scans the source data table for the first numeric column and uses its values to prepare a query for each CAS Worker to run. Each CAS Worker node then submits an individual query to get its slice of the data, something like:

Select * from SourceTable where mod(NumericField, NUMREADNODES) = WorkerNodeNumber

The data moves from the DBMS gateway server to each CAS Worker node directly using a single-thread connection, bypassing the CAS Controller. It’s a kind of parallel load using the serial mechanism, but it’s not a massively parallel data load. Notice the bottleneck at the DBMS gateway server: the data transfer always passes through the DBMS gateway server to the CAS Worker nodes.

Multi Node Data Transfer

Prerequisites to enable MultiNode Data Transfer include:

  • The CAS environment is a multi-node environment (multiple CAS Worker Nodes).
  • The SAS Data Connector for the data source is installed on each CAS Worker node and on the Controller node.
  • The data source client connection components are installed on each CAS Worker node and on the Controller node.

By default, the SAS Data Connector uses serial data transfer mode. To enable MultiNode Data Transfer mode, you must use the NUMREADNODES= and NUMWRITENODES= parameters in the CASLIB statement and specify a value other than 1. If the value is specified as 0, CAS uses all available CAS Worker nodes. MultiNode Data Transfer mode can use only the number of available nodes; if you specify more than the available nodes, the log prints a warning message.

The following code example describes a data load using “MultiNode” data transfer mode. It assigns a CASLIB using serial mode with NUMREADNODES=10 and NUMWRITENODES=10 and loads data from a Hive table into CAS. Because the NUMREADNODES= value is other than 1, it follows the MultiNode mechanism. Notice the warning message in the log stating that the number of read nodes exceeds the available Worker nodes; specifying a number higher than the available CAS Worker nodes is one way to verify that CAS is using MultiNode data transfer mode. If you specify NUMREADNODES=0, it will use all available nodes, but there is no message or warning in the SAS log about multi-node usage.

CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
caslib HiveSrl datasource=(srctype="hadoop",
server="xxxxxxx.xxx",
username="hadoop",
dataTransferMode="SERIAL",
NUMREADNODES=10, 
NUMWRITENODES=10,
hadoopconfigdir="/opt/MyHadoop/CDH/Config",
hadoopjarpath="/opt/MyHadoop/CDH/Jars",
schema="default");
proc casutil;
load casdata="prdsal2_1G" casout="prdsal2_1G"
outcaslib="HiveSrl" incaslib="HiveSrl" ;
quit;

SAS Log extract:

….
77 proc casutil;
78 ! load casdata="prdsal2_1G" casout="prdsal2_1G"
79 outcaslib="HiveSrl" incaslib="HiveSrl" ;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to Hadoop.
WARNING: The value of numReadNodes(10) exceeds the number of available worker nodes(7). The load will proceed with numReadNodes=7. 
…
..

On the database side, in this case Hive, note the queries submitted by the CAS Worker nodes. Each includes the MOD function WHERE clause described above.

On the Hadoop Resource Manager user interface, you can see the corresponding job execution for each query submitted by the CAS Worker nodes.

When using MultiNode mode to load data into CAS, data distribution depends on the cardinality of the numeric column selected by CAS for the MOD function operation. Notice that the CAS data distribution for the table loaded above is not ideal, since CAS selected a column (‘year’) that is not well suited (in this case) for data distribution across the CAS Worker nodes. There is no option in the MultiNode mechanism to specify a column name to be used for query preparation and, ultimately, for data distribution.

If CAS cannot find a suitable column for MultiNode data transfer mode, it uses standard Serial mode to transfer data, as shown in the following log:

……..
74
74 ! load casdata="prdsal2_char" casout="prdsal2_char"
75 outcaslib="HiveSrl" incaslib="HiveSrl" ;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to Hadoop.
WARNING: The value of numReadNodes(10) exceeds the number of available worker nodes(7). The load will proceed with numReadNodes=7.
WARNING: Unable to find an acceptable column for multi-node reads. Load will proceed with numReadNodes = 1. 
NOTE: Cloud Analytic Services made the external data from prdsal2_char available as table PRDSAL2_CHAR in caslib HiveSrl.
…….

Data platforms supported with MultiNode Data Transfer using the Data Connector:

  • Hadoop
  • Impala
  • Oracle
  • PostgreSQL
  • Teradata
  • Amazon Redshift
  • DB2
  • MS SQL Server
  • SAP HANA

The order of data types that SAS uses to divide data into slices for a MultiNode data read is:

  • INT (includes BIGINT, INTEGER, SMALLINT, TINYINT)
  • DECIMAL
  • NUMERIC
  • DOUBLE

Multi-Node Write:

While this post focused on loading data from a data source into CAS, multi-node data transfer also works when saving from CAS back to the data source. The important parameter when saving is NUMWRITENODES instead of NUMREADNODES. The behavior of multi-node saving is similar to that of multi-node loading.
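For example, a save back to the same Hive CASLIB might look like the following sketch (the table names are placeholders); because the CASLIB above was assigned with NUMWRITENODES=10, the save follows the MultiNode mechanism.

   proc casutil;
      save casdata="prdsal2_1G" incaslib="HiveSrl"
           casout="prdsal2_1G_copy" outcaslib="HiveSrl" replace;
   quit;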

Summary:

The SAS Data Connector can be used for MultiNode data transfer by installing the Data Connector and DBMS client components on all CAS Worker nodes, without additional license fees. The source data is transferred directly from the DBMS gateway server to the CAS Worker nodes, divided up by a simple MOD function. With this mechanism, optimal data distribution across the CAS nodes is not guaranteed. It is suggested to use all CAS Worker nodes by specifying NUMREADNODES=0 when loading data to CAS using MultiNode mode.

Important links for more information about this topic:

Multi Node Data Transfer to CAS was published on SAS Users.

March 30, 2018
 

SAS Global Forum 2018 takes place April 8-11 in Denver. The event will attract nearly 6,000 SAS professionals from around the world and from nearly every industry for a learning and networking experience second to none. At this year's event I'll be presenting the paper Getting More Insight into Your Forecast Errors with the GLMSELECT and QUANTSELECT Procedures on Tuesday, April 10 from 2:30-3:30 in Meeting Room 401. SAS Global Forum is my favorite SAS user event for so many reasons. Here are the top 10 reasons I'm particularly excited about this year's event.

Reason #1: Opening Session
It's always impressive to see the latest and greatest SAS software has to offer. I'm particularly looking forward to hearing the latest SAS Viya news and developments in Machine Learning. Keynotes by Oliver Schabenberger and other SAS Executives highlight the event.

Reason #2: Meeting my favorite SAS bloggers
It's inspiring to see how Rick Wicklin, Robert Allison, Chris Hemedinger and Sanjay Matange spend a lot of (spare?) time to provide SAS Tips for the SAS Community and SAS blogs. Thank you!

Reason #3: Presenting my paper
Inspired by Paul Goodwin, my paper will include lots of @SASsoftware code and SAS tips for SAS users and the SAS community.

Reason #4: Co-Authoring paper 2419: "An Easier and Faster Way to Untranspose a Wide File"
You'll learn about important data prep and data quality tasks and the benefits of using @SASsoftware code to perform these tasks.

Reason #5: Meeting other SAS Press authors
It'll be a lot of fun meeting other SAS book authors including Tricia Aanderud, Sanjay Matange, Rick Wicklin, Chris Hemedinger and Robert Allison to discuss their experiences and meet the charming ladies of the SAS Press team.

Reason #6: Meeting SAS Press readers
I'm excited about meeting the SAS users who read SAS books at the SAS Press booth in the QUAD to discuss their experiences with SAS software for analytics.

Reason #7: Taking care of our SAS customers from Austria, Germany and Switzerland
It'll be good to see so many SAS software users from our region, many of whom are travelling to Denver to see news on SAS analytics, Machine Learning, Deep Learning, Artificial Intelligence and SAS Viya.

Reason #8: Meeting students who use SAS
It will be exhilarating exchanging ideas with students using SAS software for their research and classes. It's so refreshing to see the increasing interest in SAS from universities around the world.

Reason #9: Getting inspired
Always a great thing to see nearly 6,000 people with an interest in SAS software and SAS analytics interacting and exchanging best practices. I always return to work with a renewed sense of motivation and energy.

Reason #10: Interest in my SAS books and papers
Thanks a lot for your interest in my work! It is always a pleasure to discuss all things SAS with each and every one of you.

Hope to see many of you in Denver!

10 Reasons why I look forward to SAS Global Forum was published on SAS Users.

March 29, 2018
 

With the release of SAS Viya 3.3, you now have the ability to pass implicit SQL queries to a variety of SQL data sources, including Hive. Under an implicit pass-through, users can write SAS-compliant SQL code, and SAS will:

  1. Convert as much code as possible into database native SQL.
  2. Execute the resulting query in-database.
  3. Bring the result back into SAS Viya.

My SAS Viya environment is co-located within a Hortonworks Hadoop environment. Within this environment, I have set up multiple tables within Hive, which provides structure and a query-like environment for Hadoop data. Using the SAS Data Explorer in SAS Viya, I can easily see the different tables in the Hive environment and visually inspect them without having to load the data into SAS. The screenshot below shows the Hive table va_service_detail, which contains anonymized data related to recent hospital stays.

SQL Pass-through to Hive in SAS Viya

In my Hive environment, I have a second table called va_member_detail, which contains information about the individuals who were hospitalized in the above table, va_service_detail. A summary of this Hive table can be found in the screenshot below.

Using this data, I would like to perform an analysis to determine why patients are readmitted to the hospital, and to understand how we can preventatively keep patients healthy. I will need to join these two tables so that I have visit-level and patient-level information in one table. Since medical data is large and messy, I would like to import only the needed information into SAS for my analysis. The simplest way to do this is through an implicit SQL pass-through to Hive, as shown below:
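The query itself appears as a screenshot in the original post. The sketch below shows what such an implicit pass-through might look like; the caslib name Hadoop, the member_id join key, and the selected columns are assumptions for illustration.

   proc fedsql sessref=mySession;
      create table casuser.readmission_detail as
      select s.*, m.age, m.gender
      from hadoop.va_service_detail s
           inner join hadoop.va_member_detail m
              on s.member_id = m.member_id;
   quit;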

With an implicit pass-through, I write normal SAS FedSQL code on top of a SAS library called “Hadoop” that points to my Hive server. Once the code is submitted, the SAS System performs the following steps:

  1. Translates the SAS FedSQL code into HiveQL.
  2. Executes the HiveQL script in Hive.
  3. Loads the resulting data in parallel into SAS.

Looking at the log, we can see that the SQL statement was “fully offloaded to the underlying data source via full pass-through,” meaning that SAS successfully executed the query, in its entirety, in Hive. With the SAS Embedded Process for Hadoop, the resulting table is then lifted in parallel from Hive into SAS Viya, making it available for analysis.

As we can see in the log, it took 42 seconds to execute the query in Hive and bring the result into SAS. To compare efficiency, I redid the analysis by loading va_service_detail and va_member_detail into the memory of the SAS server and performing the join there. That execution took 58 seconds, but it required three in-memory tables to do so, along with much more data passing through the network. The implicit pass-through has the benefits of increased speed and decreased latency in data transfer by pushing the query to its source, in this case Hive.

Conclusion

Implicit SQL pass-through to Hive in SAS Viya is a must-have tool for any analyst working with Hadoop data. With normal SQL syntax in a familiar SAS interface, analysts can push powerful queries down into Hive, speeding up their analysis while limiting data transfer. Analysts can effectively work with large, ever-growing data sizes and speed up the time to value in solving key business challenges.

Implicit SQL Pass-through to Hive in SAS Viya was published on SAS Users.

March 22, 2018
 

Generating HTML output might be something that you do daily. After all, HTML is now the default format for Display Manager SAS output, and it is one of the available formats in SAS® Enterprise Guide®. In addition, SAS® Studio generates HTML 5.0 output by default. The many faces of HTML are also seen in everyday operations, which can include the following:

  • Creating reports for the corporate intranet.
  • Creating a responsive design so that content is displayed well on all devices (including mobile devices).
  • Emailing HTML within the body of an email message.
  • Embedding figures in a web page, making the page easier to send in an email.

These tasks show the need for and the true power and flexibility of HTML. This post shows you how to create HTML outputs for each of these tasks with the Output Delivery System (ODS). Some options to use include the HTML destination (which generates HTML 4.1 output by default) or the HTML5 destination (which generates HTML 5.0 output by default).

Reports

With the HTML destination and PROC REPORT, you can create a summary report that includes drill-down data along with trafficlighting.

   ods html path="c:\temp" file="summary.html";	
 
   proc report data=sashelp.prdsale;
      column Country  Actual Predict; 
      define Country / group;
      define actual / sum;
      define predict / sum;
      compute Country;
         drillvar=cats(country,".html");
         call define(_col_,"url",drillvar);
      endcomp;
   run;
 
   ods html close;
 
   /* Create Detail data */
 
   %macro detail(country);
   ods html path="c:\temp" file="&country..html";
 
   proc report data=sashelp.prdsale(where=(country="&country"));
      column Country region product Predict Actual; 
      compute actual;
         if actual.sum > predict.sum then 
         call define(_col_,"style","style={background=green}");
   endcomp;
   run;
 
   ods html close;
   %mend;
 
   %detail(CANADA)
   %detail(GERMANY)
   %detail(U.S.A.)

Generating HTML output

In This Example

  • The first ODS HTML statement uses a COMPUTE block to create drill-down data for each Country variable. The CALL DEFINE statement within the COMPUTE block uses the URL access method.
  • The second ODS HTML statement creates targets for each of the drill-down values in the summary table by using SAS macro language to subset the data. The filename is based on the value.
  • Trafficlighting is added to the drill-down data. The added color is set to occur within a row when the data value within the Actual Sales column is larger than the data value for the Predicted Sales column.

HTML on Mobile Devices

One approach to generating HTML files is to assume that users access data from mobile devices first. Therefore, each user who accesses a web page on a mobile device should have a good experience. However, the viewport (visible area) is smaller on a mobile device, which often creates a poor viewing experience. Using the VIEWPORT meta tag in the METATEXT= option tells the mobile browser how to size the content that is displayed. In the following output, the content width is set to be the same as the device width, and the  initial-scale property controls the zoom level when the page first loads.

<meta name="viewport" content="width=device-width, initial-scale=1">

 ods html path="C:\temp" file="mobile.html" 
 metatext='name="viewport" content="width=device-width, initial-scale=1"';
   proc print data=sashelp.prdsale;
      title "Viewing Output Using Mobile Device";
   run;
   ods html close;

In This Example

  • The HTML destination and the METATEXT= option set the width of the output to the width of the mobile device, and the zoom level for the initial load is set.

HTML within Email

Sending SMTP (HTML) email enables you to send HTML within the body of a message. The body can contain styled output as well as embedded images. To generate HTML within email, you must set the EMAILSYS= option to SMTP, and the EMAILHOST= option must be set to the email server. To generate the email, use a FILENAME statement with the EMAIL access method, along with an HTML destination. You can add an image by using the ATTACH= option along with the INLINED= option to add a content identifier, which is defined in a later TITLE statement. For content to appear properly in the email, the CONTENT_TYPE= option must be set to text/html.

The MSOFFICE2K destination is used here instead of the HTML destination because it holds the style better for non-browser-based applications, like Microsoft Office. The ODSTEXT procedure adds the text to the message body.

   filename mymail email to="chevell.parker@sas.com"
                       subject="Forecast Report"
                       attach=('C:\SAS.png' inlined="logo")
                       content_type="text/html";   
 
   ods msoffice2k file=mymail rs=none style=htmlblue options(pagebreak="no");
     title j=l '<img src="cid:logo" width="120" height="100" />';
     title2 "Report for Company XYZ";
 
 
   proc odstext;
      H3 "Confidential!";
   run;
 
   title;   
   proc print data=sashelp.prdsale;
   run;
 
   ods msoffice2k close;

In This Example

  • The FILENAME statement with the EMAIL access method is used.
  • The ATTACH= option specifies the image to include.
  • The INLINED= option specifies a content identifier.
  • The CONTENT_TYPE= option is text/html for HTML output.
  • The ODSTEXT procedure adds the text before the table.
  • The TITLE statement defines the “logo” content identifier.

Graphics within HTML

The ODS HTML5 destination has many benefits, such as the ability to embed graphics directly in an HTML file (and the default file format is SVG). The ability to embed the figure is helpful when you need to email the HTML file, because the file is self-contained. You can also add a table of contents inline to this file.

ods graphics / height=2.5in width=4in;
ods html5 path="c:\temp" file="html5output.html";
   proc means data=sashelp.prdsale;
   run;
 
   proc sgplot data=sashelp.prdsale;
      vbar product / response=actual;
   run;
 
   ods html5 close;

In This Example

  • The ODS HTML5 statement creates a table along with an embedded figure. The image is stored as an SVG file within the HTML file.

Conclusion

HTML is used in many ways when it comes to reporting. Various ODS destinations can accommodate the specific output that you need.

The many faces of HTML was published on SAS Users.

March 17, 2018
 

This is a continuation of my previous blog post on SAS Data Studio and the Code transform. In this post, I will review some additional examples of using the Code transform in a SAS Data Studio data plan to help you prepare your data for analytic reports and/or models.

Create a Unique Identifier Example

The DATA step code below combines the _THREADID_ and the _N_ variables to create a UniqueID for each record.

SAS Data Studio Code Transform
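The screenshot of the code is not reproduced here; a minimal sketch of what such a Code transform step might contain follows, with placeholder input and output table names.

   data out_table;
      set in_table;
      length UniqueID $ 32;
      /* combine the thread number and the row counter into one key */
      UniqueID = catx('_', _threadid_, _n_);
   run;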

The variable _THREADID_ returns the number that is associated with the thread that the DATA step is running in within a server session. The variable _N_ is an internal system variable that counts the iterations of the DATA step as it automatically loops through the rows of an input data set. The _N_ variable is initially set to 1 and increases by 1 each time the DATA step loops past the DATA statement. The DATA step loops past the DATA statement for every row that it encounters in the input data. Because the DATA step is a built-in loop that iterates through each row in a table, the _N_ variable can be used as a counter variable in this case.

_THREADID_ and _N_ are variables that are created automatically by the SAS DATA step and saved in memory. For more information on automatic DATA step variables, refer to the DATA step documentation.

Cluster Records Example

The DATA step code below combines the _THREADID_ and the counter variables to create a unique ClusterNum for each BY group.
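Again, the screenshot is not reproduced; the sketch below (with placeholder table, BY-variable, and column names) shows the pattern being described.

   data out_table;
      set in_table;
      by CustomerGroup;
      if first.CustomerGroup then counter + 1;   /* sum statement: counter is retained */
      ClusterNum = catx('_', _threadid_, counter);
      drop counter;
   run;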

This code uses the concept of FIRST.variable to increase the counter at the beginning of each new grouping. FIRST.variable and LAST.variable are variables that CAS creates for each BY variable. CAS sets FIRST.variable when it is processing the first observation in a BY group, and sets LAST.variable when it is processing the last observation in a BY group. These assignments enable you to take different actions, based on whether processing is starting for a new BY group or ending for a BY group. For more information, refer to the documentation topic on BY-group processing in the DATA step.

De-duplication Example

The DATA step code below outputs the last record of each BY group; therefore, de-duplicating the data set by writing out only one record per grouping.
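A sketch of this pattern, using the same placeholder names as above, might be:

   data out_table;
      set in_table;
      by CustomerGroup;
      if last.CustomerGroup then output;   /* keep only the last record per group */
   run;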

Below are the de-duplication results on the data set used in the previous Cluster Records Example section.

For more information about the DATA step, refer to the DATA step documentation.

Below is the resulting customers2.xlsx file in the Public CAS library.

For more information on the available action sets, refer to the SAS® Cloud Analytic Services 3.3: CASL Reference guide.

For more information on SAS Data Studio and the Code transform, refer to Part 1 of this post.

SAS Data Studio Code Transform (Part 2) was published on SAS Users.