Your statistical software probably provides a function that computes quantiles of common probability distributions such as the normal, exponential, and beta distributions. Because there are infinitely many probability distributions, you might encounter a distribution for which a built-in quantile function is not implemented. No problem! This article shows how to numerically compute the quantiles of any probability distribution from the definition of the cumulative distribution (CDF).

In SAS, the QUANTILE function computes the quantiles for about 25 distributions. This article shows how you can use numerical root-finding methods (and possibly numerical integration) in SAS/IML software to compute the quantile function for ANY continuous distribution. I have previously written about related topics and particular examples, such as the following:

### The quantile is the root of an integral equation

Computing a quantile would make a good final exam question for an undergraduate class in numerical analysis. Although some distributions have an explicit CDF, many distributions are defined only by a probability density function (the PDF, f(x)) and numerical integration must be used to compute the cumulative distribution (the CDF, F(x)). A canonical example is the normal distribution. I've previously shown how to use numerical integration to compute a CDF from a PDF by using the definition F(x) = ∫ f(t) dt, where the lower limit of the integral is –∞ and the upper limit is x.

Whether the CDF is defined analytically or through numerical integration, the quantile for p is found implicitly as the solution to the equation F(x) = p, where p is a probability in the interval (0,1). This is illustrated by the graph at the right.

Equivalently, you can define G(x; p) = F(x) – p so that the quantile is the root of the equation G(x; p) = 0. For well-behaved densities that occur in practice, a numerical root is easily found because the CDF is monotonically increasing. (If you like pathological functions, see the Cantor staircase distribution.)

### Example: Create a custom distribution

SAS/IML software provides the QUAD subroutine, which provides numerical integration, and the FROOT function, which solves for roots. Thus SAS/IML is an ideal computational environment for computing quantiles for custom distributions.

As an example, consider a distribution that is a mixture of an exponential and a normal distribution:
F(x) = 0.4 Fexp(x; 0.5) + 0.6 Φ(x; 10, 2),
where Fexp(x; 0.5) is the exponential distribution with scale parameter 0.5 and Φ(x; 10, 2) is the normal CDF with mean 20 and standard deviation 2. In this case, you do not need to use numerical integration to compute the CDF. You can compute the CDF as a linear combination of the exponential and normal CDFs, as shown in the following SAS/IML function:

```/* program to numerically find quantiles for a custom distribution */ proc iml; /* Define the cumulative distribution function here. */ start CustomCDF(x); F = 0.4*cdf("Exponential", x, 0.5) + 0.6*cdf("Normal", x, 10, 2); return F; finish;```

The previous section shows the graph of the CDF on the interval [0, 16]. The vertical and horizontal lines correspond to the first, second and third quartiles of the distribution. The quartiles are close to the values Q1 ≈ 0.5, Q2 ≈ 8, and Q3 ≈ 10.5. The next section shows how to compute the quantiles.

### Compute quantiles for an arbitrary distribution

As long as you can define a function that evaluates the CDF, you can find quantiles. For unbounded distributions, it is usually helpful to plot the CDF so that you can visually estimate an interval that contains the quantile. (For bounded distributions, the support of the distribution contains all quantiles.) For the mixture distribution in the previous section, it is clear that the quantiles are in the interval [0, 16].

The following program finds arbitrary quantiles for whichever CDF is evaluated by the CustomCDF function. To find quantiles for a different function, you can modify the CustomCDF and change the interval on which to find the quantiles. You do not need to modify the RootFunc or CustomQuantile functions.

```/* Express CDF(x)=p as the root for the function CDF(x)-p. */ start RootFunc(x) global(gProb); return CustomCDF(x) - gProb; /* quantile for p is root of CDF(x)-p */ finish;   /* You need to provide an interval on which to search for the quantiles. */ start CustomQuantile(p, Interval) global(gProb); q = j(nrow(p), ncol(p), .); /* allocate result matrix */ do i = 1 to nrow(p)*ncol(p); /* for each element of p... */ gProb = p[i]; /* set global variable */ q[i] = froot("RootFunc", Interval); /* find root (quantile) */ end; return q; finish;   /* Example: look for quartiles in interval [0, 16] */ probs = {0.25 0.5 0.75}; /* Q1, Q2, Q3 */ intvl = {0 16}; /* interval on which to search for quantiles */ quartiles = CustomQuantile(probs, intvl); print quartiles[colname={Q1 Q2 Q3}];```

In summary, you can compute an arbitrary quantile of an arbitrary continuous distribution if you can (1) evaluate the CDF at any point and (2) numerically solve for the root of the equation CDF(x)-p for a probability value, p. Because the support of the distribution is arbitrary, the implementation requires that you provide an interval [a,b] that contains the quantile.

The computation should be robust and accurate for non-pathological distributions provided that the density is not tiny or zero at the value of the quantile. Although this example is illustrated in SAS, the same method will work in other software.

The post Compute the quantiles of any distribution appeared first on The DO Loop.

Do a quick search on “data scientist” on any of the popular job boards and there’s no denying the global shortage of data scientists is a real one. And, whether you’re looking for the salary commensurate with the prestigious title, a fast pass to the C-suite, or simply want to [...]

The post SAS Academy for Data Science creates top-rate analytical professionals appeared first on SAS Learning Post.

It is not laziness—it is efficiency!!! Programmers are often called lazy; we even call ourselves lazy. But we are not lazy, we are just being efficient. It makes no sense to type the same code over and over again or use more keystrokes than are absolutely necessary.

### Keyboard Macros

You might not have heard of keyboard macros. Or, perhaps, you do not know how they could help you. I am very fond of keyboard macros; let me show you why!

In SAS Technical Support, supporting the SAS® Output Delivery System (ODS) and Base SAS® procedures, I often use the same statements to set up test programs. For example, I want any style templates that I create to go into the Work directory. I also use the same data set name all of the time. I have created keyboard macros for the statements, data set names, and options that I use daily.

When I press Ctrl+Alt+w, the following is inserted into my program:

`ods path(prepend) work.templat(update);`

When I press Ctrl+Alt+p, the following is inserted into my program:

sashelp.class

How did I do that? I recorded a keyboard macro that contains the code that I want. Then, I assigned keys that insert the code when I press them.

Here are the steps for recording your very own keyboard macro in the SAS Enhanced Editor:

1.  Select Tools ► Keyboard Macros ► Record New Macro.

2.  Enter the code that you want to be your new keyboard macro. Consider typing slowly because any backspaces that you use are included in the recording.

3.  After you are done entering text, you need to tell SAS to stop recording. Select Tools ►Keyboard Macros ►Stop Recording.

4.  A pop-up dialog box appears that lets you give the new macro a name and assign the keys that you want to be associated with the macro. You can set the key combination that make sense to you. Just make sure that you do not use a combination that is already assigned to another macro.

Now, whenever you need to insert that piece of code, just use the keys that you assigned!

In SAS® Enterprise Guide®, you can find keyboard macros under Program ► Editor Macros, instead of the Tools drop-down menu. The recording and key assignment steps are the same in both applications.

You can also create keyboard macros that perform tasks.

The Macros selection opens a pop-up dialog box that contains a Create button.

Clicking Create opens another dialog box.

With the Categories option set to All, you can see all of the commands that are already available. Moving these over to the Keyboard macro contents section enables you to build a macro that performs a task that you need to accomplish on a regular basis.

For example, I have combined these commands to select a whole block of code, like from the PROC statement down to the RUN statement.

Keyboards macros are available in the Enhanced Editor in Display Manager SAS (DMS) and in SAS Enterprise Guide. They cannot be used with the Program Editor in DMS or in SAS® Studio.

You can export and import keyboard macros. The file created when you export has the .kmf extension. You can find the options for importing and exporting in the Macros dialog box. You can share your keyboard macros with your friends, or just to keep them as a backup copy in case you need to reinstall SAS.

For more information, see the Using "Keyboard Macros" section in "Using the Enhanced Editor."

### Function Keys

You have probably used the F8 key to submit your program, or the F4 key to recall your last program. Did you know that you can set or change those instructions?

In the Enhanced Editor, you can get the list of assigned keys by entering keys into the command bar or by selecting Keys under Tools ► Options.

I test a lot, which means that I am routinely clearing the log, the results viewer, and the output window. I have assigned an F key, F12, to clear everything and bring the focus back to the Enhanced Editor (see the commands in the screenshot below). I have to press only one key to clean everything up! I use the F12 key over and over again.

The keys that you assign in DMS are valid from both the Enhanced Editor and the Program Editor.

SAS Enterprise Guide includes a large number of commands by default. A lot of them already have keys assigned, but some do not. You can see the list of the commands and their assigned keys by selecting Enhanced Editor Keys under the Program drop-down menu.

Currently, it is not possible to modify the function keys in SAS Studio. However, a number of keys are already defined that you might find useful. You can see the function key shortcuts by clicking the question mark in the upper right, choosing SAS Studio help, and then selecting the option for Accessibility Features. Here are links to additional resources:

I highly recommend using keyboard macro and function keys. Why type the same thing over and over again? Increase your productivity by handing the repetitive tasks over to SAS.

The consumer packaged goods (CPG) and Retail industry are going through a period of significant change. Both retailers and manufacturers are struggling to find growth and improve profitability. One strategy is through consolidation - e.g., Kraft-Heinz, Keurig- Dr Pepper Snapple Group on the manufacturer side, as well as Safeway-Albertsons, Ahold-Delhaize, Walgreens-Rite Aid on the retailer side. The thinking here is that these mergers would lead to large operational efficiencies and focused growth strategies.

Another important lever to drive growth is pricing and promotion. Companies have realized the importance of getting the pricing right and running high-impact promotions in a highly competitive market. As consumer shop multiple channels and new retail formats begin to permeate (e.g., smaller format stores, new entrants such as Aldi and Lidl), the importance of price-promo continues to increase. Pricing and promotion have become the second largest item on CPG manufacturer’s P&L, after cost-of-goods. Similarly for retailers, price-promo decisions have become critical for growth, maybe even survival. This is manifested in the growth in investment focused on pricing and promotion decisions. In some cases this investment could be as much as 20-25% of net revenue of the company.

However, despite the heavy investment in price-promo, the impact of these decisions is declining. A recent IRI study indicated that the price and promo elasticities (response of volume to pricing change) have been steadily declining over the past 3-4 years. Consumers are willing to buy less when faced with decreases in “regular or base” price as well as promoted price.  The study indicated that the “lift” from promotions had decreased by about 1,000 basis points over the past four years.  There is, therefore, an immediate need to manage price and promotion decisions in a more creative and impactful manner.

### Three areas of improvement

What does this mean? What can companies do to improve the impact of their pricing and promotion investment? We believe that there are three important areas of improvement. The first area is around a more refined understanding of the impact of price-promo decisions.  The new focus is on understanding the true impact of merchandising through both traditional and new lenses, including stockpiling, cross-retailer pricing and advanced price engines. Being able to more accurately predict the pattern of consumer behavior allows for automation and faster and better decisions.

The second area is around rapid and dynamic decision making. This involves a focus on new techniques such as Artificial Intelligence and Machine Learning to drive price-promo decisions. AI/ML is already getting entrenched within demand identification, product development and in-market execution as well as marketing. Within CPG and retail pricing, this will be accomplished by (a) speed in dealing with the regularly-repeated manual tasks in an efficient manner and (b) new levels of insight and accuracy based upon market trends that enable pricing analysts to focus their efforts on the areas that matter in a dynamic manner. It is imperative to move from a user-driven, manual pricing adjustments to dynamic “smart solutions.”

Another important area of change in pricing and promotion is “personalized pricing;”that is allowing manufacturers and retailers to customize price-promo decisions towards individual consumer/shopper segments. This is done by combining frequent shopper (FSP) data with traditional price-promo modeling for an in-depth evaluation of merchandising strategies as well as developing custom offers that would stimulate demand within these segments. IRI research shows that FSP/loyalty card holders react differently to brand price changes. For example, Brand Loyals react stronger to base price changes, while Brand Non-Loyals react stronger to base price reductions, promotional prices and quality merchandising tactics​.

In our session titled “New Frontiers in Pricing Analytics” at the SAS Global Forum 2018, we will provide a detailed overview of the state of the industry and how it is evolving. We will provide an overview of the new techniques and technologies in this space as well as where things are headed in the future. We hope to see you there.

Shifting sands in pricing and promotion was published on SAS Users.

This question is surprising hard for many organizations to answer.  Most organizations lack a roadmap against which they can measure their effectiveness for using data and analytics to optimize key business processes, uncover new business opportunities or deliver a differentiated customer experience. They do not understand what’s possible with respect to integrating big data and data science into the organization’s business model (see Figure 1).

Figure 1: Big Data Business Model Maturity Index

My SAS Global Forum 2018 presentation on Tuesday April 10, 2018 will discuss the transformative potential of big data and advanced analytics, and will leverage the Big Data Business Model Maturity Index as a guide for helping organizations understand where and how they can leverage data and analytics to power their business models.

### Digital Twins, Analytics Profiles and the Power of One

We all understand that the volume and variety of data are increasing exponentially.  Your customers are leaving their digital fingerprints across the Internet via their website, social media, and mobile devices usage.  The Internet of Things will unleash an estimated 44 Zettabytes of data across 7 billion connected people by 2020.

However, big data isn’t really about big; it’s about small. It’s about understanding your customer and product behaviors at the level of the individual.  Big Data is about building detailed behavioral or analytic profiles for each individual (see Figure 2).

Figure 2: Building Individual Behavioral or Analytic Profiles

If you want to better serve your customers, you need to understand their tendencies, behaviors, inclinations, preferences, interests and passions at the level of each individual customer.

Customers’ expectations of their vendors are changing due to their personal experiences.  From recommending products, services, movies, music, routes and even spouses, customers are expecting their vendors to understand they well enough that these vendors can provide a hyper-personalized customer experience.

### Demystifying Data Science (AI | ML | DL)

Too many organizations are spending too much time confusing too many executives on the capabilities of data science.  The concept of data science is simple; data science is about identifying the variables and metrics that might be better predictors of business and operational performance (see Figure 3).

Figure 3: A Moneyball Definition of Data Science

Whether using basic statistics, predictive analytics, data mining, machine learning, or deep learning, almost all of data science benefits are achieved from the simple formula of: Input (A) → Response (B).

Source: Andrew Ng, “What Artificial Intelligence Can and Can’t Do Right Now”

By collaborating closely with the business subject matter experts to choosing Input (A), those variables and metrics that might be better predictor of performance, the data science team can achieve more accurate, more granular, lower latency Response (B).  And the creative creation and selection of Input (A) creatively has already revolutionized many industries, and is poised to revolutionize more.

### Data Monetization and the Economic Value of Data

Data is an unusual asset – it doesn’t deplete, it doesn’t wear out and it can be used across an infinite number of use cases at near zero marginal cost.  Organizations have no other assets with those unique characteristics.  And while traditional accounting methods of valuing assets works well with physical assets, account methods fall horribly – dangerously – short in properly determining the economic value of data.

Instead of using traditional accounting techniques to determine the value of the organization’s data, apply economic and data science concepts to determine the economic value of the data based upon it’s ability to optimize key business and operational processes, reduce compliance and security risks, uncover new revenue opportunities and create a more compelling, differentiated customer experience (see Figure 4).

Figure 4: Data Lake 3.0: Collaborative Value Creation Platform

The data lake, which can house both data and analytic models, is transformed from a simple data repository into a “collaborative value creation platform” that facilities the capture, refinement and sharing of the data and analytic digital assets across the enterprise.

### Creating the Intelligent Enterprise

When you add up all of these concepts and advancements – Big Data, Analytic Profiles, Data Science and the Economic Value of Data – organizations are poised for digital transformation (see Figure 5).

Figure 5: Achieving Digital Transformation

And what is Digital Transformation?

Digital Transformation is application of digital capabilities to processes, products, and assets to improve efficiency, enhance customer value, manage risk, and uncover new monetization opportunities.

Looking forward to seeing you at my SAS Global Forum 2018 session and helping your organizations on its digital transformation!

Data monetization and the economic value of data was published on SAS Users.

In this article, I will set out clear principles for how SAS Viya 3.3 will interoperate with Kerberos. My aim is to present some overview concepts for how we can use Kerberos authentication with SAS Viya 3.3. We will look at both SAS Viya 3.3 clients and SAS 9.4M5 clents. In future blog posts, we’ll examine some of these use cases in more detail.

With SAS Viya 3.3 clients we have different use cases for how we can use Kerberos with the environment. In the first case, we use Kerberos delegation throughout the environment.

### Use Case 1 – SAS Viya 3.3

The diagram below illustrates the use case where Kerberos delegation is used into, within, and out from the environment.

In this diagram, we show the end-user relying on Kerberos or Integrated Windows Authentication to log onto the SAS Logon Manager as part of their access to the visual interfaces. SAS Logon Manager is provided with a Kerberos keytab and HTTP principal to enable the Kerberos connection. In addition, the HTTP principal is flagged as “trusted for delegation” so that the credentials sent by the client include the delegated or forwardable Ticket-Granting Ticket (TGT). The configuration of SAS Logon Manager with SAS Viya 3.3 includes a new option to store this delegated credential. The delegated credential is stored in the credentials microservice, and secured so that only the end-user to which the credential belongs can access it.

When the end-user accesses SAS CAS from the visual interfaces the initial authentication takes place with the standard internal OAuth token. However, since the end-user stored a delegated credential when accessing the SAS Logon Manager an additional Origin attribute is set on the token of “Kerberos.” The internal OAuth token also contains the groups the end-user is a member of within the Claims. Since we want this end-user to run the SAS CAS session as themselves they must have been added to a custom group with the ID=CASHostAccountRequired. When the SAS CAS Controller receives the OAuth token with the additional Kerberos Origin, it requests the visual interface to make a second Kerberized connection. So, the visual interface retrieves the delegated credential from the credentials microservice and uses this to request a Service Ticket to connect to SAS CAS.

SAS CAS has been provided with a Kerberos keytab and a sascas principal to enable the Kerberos connection. Since the sascas principal is flagged as “trusted for delegation,” the credentials sent by the visual interfaces include a delegated or forwardable Ticket-Granting Ticket (TGT). SAS CAS validates the Service Ticket, which in turn authenticates the end-user. The SAS CAS Controller then launches the session as the end-user and constructs a Kerberos ticket cache containing the delegated TGT. Now, within their SAS CAS session the end-user can connect to the Secured Hadoop environment as themselves since the SAS CAS session has access to a TGT for the end-user.

This means in this first use case all access to, within, and out from the SAS Viya 3.3 environment leverages strong Kerberos authentication. This is our “gold-standard” for authenticating the end-user to each part of the environment.

But, it is strictly dependent on the end-user being a member of the custom group with ID=CASHostAccountRequired, and the two principals (HTTP and sascas) being trusted for delegation. Without both the Kerberos delegation will not take place.

### Use Case 1a – SAS Viya 3.3

The diagram below illustrates a slight deviation on the first use case.

Here, either through choice or by omission, the end-user is not a member of the custom group with the ID=CASHostAccountRequired. Now even though the end-user connects with Kerberos and irrespective of the configuration of SAS Logon Manager to store delegated credentials the second connection using Kerberos is not made to SAS CAS. Now the SAS CAS session runs as the account that launched the SAS CAS controller, cas by default. Since, the session is not running as the end-user and SAS CAS did not receive a Kerberos connection, the Kerberos ticket cache that is generated for the session does not contain the credentials of the end-user. Instead, the Kerberos keytab and principal supplied to SAS CAS are used to establish the credentials in the Kerberos ticket cache.

This means that even though Kerberos was used to connect to SAS Logon Manager the connection to the Secured Hadoop environment is as the sascas principal and not the end-user.

The same situation could be arrived at if the HTTP principal for SAS Logon Manager is not trusted for delegation.

Use Case 1b – SAS Viya 3.3

A final deviation to the initial use case is shown in the following diagram.

In this case the end-user connects to SAS Logon Manager with any other form of authentication. This could be the default LDAP authentication, external OAuth, or external SAML authentication. Just as in use case 1a, this means that the connection to SAS CAS from the visual interfaces only uses the internal OAuth token. Again, since no delegated credentials are used to connect to SAS CAS the session is run as the account that launched the SAS CAS controller. Also, the ticket cache created by the SAS Cloud Analytic Service Controller contains the credentials from the Kerberos keytab, i.e. the sascas principal. This means that access to the Secured Hadoop environment is as the sascas principal and not the end-user.

### Use Case 2 – SAS Viya 3.3

Our second use case covers those users entering the environment via the programming interfaces, for example SAS Studio. In this case, the end-users have entered a username and password, a credential set, into SAS Studio. This credential set is used to start their individual SAS Workspace Session and to connect to SAS CAS from the SAS Workspace Server. This is illustrated in the following figure.

Since the end-users are providing their username and password to SAS CAS it behaves differently. SAS CAS uses its own Pluggable Authentication Modules (PAM) configuration to validate the end-user’s credentials and hence launch the SAS CAS session process running as the end-user. However, in addition the SAS CAS Controller also uses the username and password to obtain an OAuth token from SAS Logon Manager and then can obtain any access control information from the SAS Viya 3.3 microservices. Obtaining the OAuth token from the SAS Logon Manager ensures any restrictions or global caslibs defined in the visual interfaces are observed in the programming interfaces.

With the SAS CAS session running as the end-user and any access controls validated, the SAS CAS session can access the Secured Hadoop cluster. Now since the SAS CAS session was launched using the PAM configuration, the Kerberos credentials used to access Hadoop will be those of the end-user. This means the PAM configuration on the machines hosting SAS CAS must be linked to Kerberos. This PAM configuration then ensures the Kerberos Ticket-Granting Ticket is available to the CAS Session as is it launched.

Next, we consider three further use cases where the client is SAS 9.4 maintenance 5. Remember that SAS 9.4 maintenance 5 can make a direct connection to SAS CAS without requiring SAS/CONNECT. The use cases we will discuss will illustrate the example with a SAS 9.4 maintenance 5 web application, such as SAS Studio. However, the statements and basic flows remain the same if the SAS 9.4 maintenance 5 client is a desktop application like SAS Enterprise Guide.

### Use Case 3 – SAS 9.4 maintenance 5

First, let’s consider the case where our SAS 9.4 maintenance 5 end-user enters their username and password to access the SAS 9.4 environment. This is illustrated in the following diagram.

In this case, since the SAS 9.4 Workspace Server is launched using a username and password, these are cached on the launch of the process. This enables the SAS 9.4 Workspace Server to use these cached credentials when connecting to SAS CAS. However, the same process occurs if instead of the cached credentials being provided by the launching process, they are provided by another mechanism. These credentials could be provided from SAS 9.4 Metadata Server or from an authinfo file in the user’s home directory on the SAS 9.4 environment. In any case, the process on the SAS Cloud Analytic Server Controller is the same.

The username and password used to connect are validated through the PAM stack on the SAS CAS Controller, as well as being used to generate an internal OAuth token from the SAS Viya 3.3 Logon Manager. The PAM stack, just as in the SAS Viya 3.3 programming interface use case 2 above, is responsible for initializing the Kerberos credentials for the end-user. These Kerberos credentials are placed into a Kerberos Ticket cache which makes them available to the SAS CAS session for the connection to the Secured Hadoop environment. Therefore, all the different sessions within SAS 9.4, SAS Viya 3.3, and the Secured Hadoop environment run as the end-user.

### Use Case 4 – SAS 9.4 maintenance 5

Now what about the case where the SAS 9.4 maintenance 5 environment is configured for Kerberos authentication throughout. The case where we have Kerberos delegation configured in SAS 9.4 is shown here.

Here the SAS 9.4 Workspace Server is launched with Kerberos credentials, the Service Principal for the SAS 9.4 Object Spawner will need to be trusted for delegation. This means that a Kerberos credential for the end-user is available to the SAS 9.4 Workspace Server. The SAS 9.4 Workspace Server can use this end-user Kerberos credential to request a Service Ticket for the connection to SAS CAS. SAS CAS is provided with a Kerberos keytab and principal it can use to validate this Service Ticket. Validating the Service Ticket authenticates the SAS 9.4 end-user to SAS CAS. The principal for SAS CAS must also be trusted for delegation. We need the SAS CAS session to have access to the Kerberos credentials of the SAS 9.4 end-user.

These Kerberos credentials made available to the SAS CAS are used for two purposes. First, they are used to make a Kerberized connection to the SAS Viya Logon Manager, this is to obtain the SAS Viya internal OAuth token. Therefore, the SAS Viya Logon Manager must be configured to accept Kerberos connections. Second, the Kerberos credentials of the SAS 9.4 end-user are used to connect to the Secure Hadoop environment.

In this case since all the various principals are trusted for delegation, our SAS 9.4 end-user can perform multiple authentication hops using Kerberos with each component. This means that through the use of Kerberos authentication the SAS 9.4 end-user is authenticated into SAS CAS and out to the Secure Hadoop environment.

### Use Case 5 – SAS 9.4 maintenance 5

Finally, what about cases where the SAS 9.4 maintenance 5 session is not running as the end-user but as a launch credential; this is illustrated here.

The SAS 9.4 session in this case could be a SAS Stored Process Server, Pooled Workspace Server, or a SAS Workspace server leveraging a launch credential such as sassrv. The key point being that now the SAS 9.4 session is not running as the end-user and has no access to the end-user credentials. In this case we can still connect to SAS CAS and from there out to the Secured Hadoop environment. However, this requires some additional configuration. This setup will leverage One-Time-Passwords generated by the SAS 9.4 Metadata Server, so the SAS 9.4 Metadata Server must be made aware of the SAS CAS. This is done by adding a SAS 9.4 metadata definition for the SAS CAS. Our connection from SAS 9.4 must then be “metadata aware,” achieved by using authdomain=_sasmeta_ on the connection.

Equally, the SAS Viya 3.3 side of the environment must be able to validate the One-Time-Password used to connect to SAS CAS. When SAS CAS receives the One-Time-Password on the connection, it is sent to the SAS Viya Logon Manager for validation and to obtain a SAS Viya internal OAuth token. We need to add some configuration to the SAS Viya Logon Manager to enable this to validate the One-Time-Password. We configured the SAS Viya Logon Manager with the details of where the SAS 9.4 Web Infrastructure Platform is running. The SAS Viya Logon Manager passes the One-Time-Password to the SAS 9.4 Web Infrastructure Platform to validate the One-Time-Password. After the One-Time-Password is validated a SAS Viya internal OAuth token is generated and passed back to SAS CAS.

Since SAS CAS does not have access to the end-user credentials, the session that is created will be run using the account used to launch the controller process, cas by default. Since the end-user credentials are not available, the Kerberos credentials that are initialized for the session are from the Kerberos keytab provided to SAS CAS. Then the connection to the Secured Hadoop environment will be made using those Kerberos credentials of the principal assigned to the SAS CAS.

### Summary

We have presented several use cases above. The table below can be used to summarize and differentiate the use cases based on key factors.

SAS Viya 3.3 - Some Kerberos principles was published on SAS Users.

You might have seen a SAS Global Forum infographic floating around the web. And maybe you  wondered how you might create something similar using SAS software? If so, then this blog's for you - I have created my own version of the infographic using SAS/Graph, and I'll show you how [...]

The post Building a SAS Global Forum infographic appeared first on SAS Learning Post.

When I first learned to program in SAS, I remember being confused about the difference between CLASS statements and BY statements. A novice SAS programmer recently asked when to use one instead of the other, so this article explains the difference between the CLASS statement and BY variables in SAS procedures.

The BY statement and the CLASS statement in SAS both enable you to specify one or more categorical variables whose levels define subgroups of the data. (For simplicity, we consider only a single categorical variable.) The primary difference is that the BY statement computes many analyses, each on a subset of the data, whereas the CLASS statement computes a single analysis of all the data. Specifically,

• The BY statement repeats an analysis on every subgroup. The subgroups are treated as independent samples. If a BY variable defines k groups, the output will contains k copies of every table and graph, one copy for the first group, one copy for the second group, and so on.
• The CLASS statement enables you to include a categorical variable as part of an analysis. Often the CLASS variable is used to compare the groups, such as in a t test or an ANOVA analysis. In regression models, the CLASS statement enables you to estimate parameters for the levels of a categorical variable, thereby estimating the effect of each level on the response. Another use of a CLASS variable is to define categories for a classification task, such as a discriminant analysis.

To illustrate the differences between an analysis that uses a BY statement and one that uses a CLASS statement, let's create a subset (called Cars) of the Sashelp.Cars data. The levels of the Origin variable indicate whether a vehicle is manufactured in "Asia", "Europe", or the "USA". For efficiency reasons, most classical SAS procedures require that you sort the data when you use a BY statement. Therefore, a call to PROC SORT creates a sorted version of the data called CarsSorted, which will be used for the BY-group analyses.

```data Cars; set Sashelp.Cars; where cylinders in (4,6,8) and type ^= 'Hybrid'; run;   proc sort data=Cars out=CarsSorted; by Origin; run;```

### Descriptive statistics for grouped data

When you generate descriptive statistics for groups of data, the univariate statistics are identical whether you use a CLASS statement or a BY statement. What changes is the way that the statistics are displayed. When you use the CLASS statement, you get one table that contains all statistics or one graph that shows the distribution of each subgroup. However, when you use the BY statement you get multiple tables and graphs.

The following statements use the CLASS statement to produce descriptive statistics. PROC UNIVARIATE displays one (paneled) graph that shows a comparative histogram for the vehicles that are made in Asia, Europe, and USA. PROC MEANS displays one table that contains descriptive statistics:

```proc univariate data=Cars; class Origin; var Horsepower; histogram Horsepower / nrows=3; /* must use NROWS= to get panel */ ods select histogram; run;   proc means data=Cars N Mean Std; class Origin; var Horsepower Weight Mpg_Highway; run;```

In contrast, if you run a BY-group analysis on the levels of the Origin variable, you will see three times as many tables and graphs. Each analysis is preceded by a label that identifies each BY group. Notice that the BY-group analysis uses the sorted data.

```proc means data=CarsSorted N Mean Std; by Origin; var Horsepower Weight Mpg_Highway; run;```

Always remember that the output from a BY statement is equivalent to the output from running the procedure multiple times on subsets of the data. For example, the previous statistics could also be generated by calling PROC MEANS three times, each call with a different WHERE clause, as follows:

```proc means N Mean Std data=CarsSorted( where=(origin='Asia') ); var Horsepower Weight Mpg_Highway; run; proc means N Mean Std data=CarsSorted( where=(origin='Europe') ); var Horsepower Weight Mpg_Highway; run; proc means N Mean Std data=CarsSorted( where=(origin='USA') ); var Horsepower Weight Mpg_Highway; run;```

In fact, if you ever find yourself repeating an analysis many times (perhaps by using a macro loop), you should consider whether you can rewrite your program to be more efficient by using a BY statement.

### Comparing groups: Use the CLASS statement

As a general rule, you should use a CLASS statement when you want to compare or contrast groups. For example, the following call to PROC GLM performs an ANOVA analysis on the horsepower (response variable) for the three groups defined by the Origin variable. The procedure automatically creates a graph that displays three boxplots, one for each group. The procedure also computes parameter estimates for the levels of the CLASS variable (not shown).

```proc glm data=Cars; /* by default, create graph with side-by-side boxplots */ class Origin; model Horsepower = Origin / solution; run;```

You can specify multiple variables on the CLASS statement to include multiple categorical variables in a model. Any variables that are not listed on the CLASS statement are assumed to be continuous. Thus the following call to PROC GLM analyzes a model that has one continuous and one classification variable. The procedure automatically produces a graph that overlays the three regression curves on the data:

```ods graphics /antialias=on; title "CLASS Variable Regression: One Model with Multiple Parameters"; proc GLM data=Cars plots=FitPlot; class Origin; model Horsepower = Origin | Weight / solution; ods select ParameterEstimates ANCOVAPlot; quit;```

In contrast, if you use a BY statement, the Origin variable cannot be part of the model but is used only to subset the data. If you use a BY statement, you obtain three different models of the form Horsepower = Weight. You get three parameter estimates tables and three graphs, each showing one regression line overlaid on a subset of the data.

### Predicted Values: CLASS VARIABLE versus BY Variable

When you use a BY statement and fit three models of the form Horsepower = Weight, the procedure fits a total of six parameters. Notice that when you use the CLASS statement and fit the model Horsepower = Origin | Weight, you also fit six free parameters. It turns out that these two methods produce the same predicted values. In fact, you can combine the parameter estimates (for the GLM parameterization) for the CLASS model to obtain the parameter estimates from the BY-variable analysis, as shown below. Each parameter estimate for the BY-variable models are obtained as the sum of two estimates for the CLASS-variable analysis:

For many regression models, the predicted values for the BY-variable analyses are the same as for a particular model that uses a CLASS variable. As shown above, you can even see how the parameters are related when you use a GLM or reference parameterization. However, the CLASS variable formulation can fit models (such as the equal-slope model Horsepower = Origin Weight) that are not available when you use a BY variable to fit three separate models. Furthermore, the CLASS statement provides parameter estimates so that you can see the effect of the groups on the response variable. It is more difficult to compare the models that are produced by using the BY statement.

### Other CLASS-like statements in SAS

Some SAS procedures use other syntax to analyze groups. In particular, the SGPLOT procedure calls classification variables "group variables." If you want to overlay graphs for multiple groups, you can use the GROUP= option on many SGPLOT statements. (Some statements support the CATEGORY= option, which is similar.) For example, to replicate the two-variable regression analysis from PROC GLM, you can use the following statements in PROC SGPLOT:

```proc sgplot data=Cars; reg y=Horsepower x=Weight / group=Origin; /* Horsepower = Origin | Weight */ run;```

### Summary

In summary, use the BY statement in SAS procedures when you want to repeat an analysis for every level of one or more categorical variables. The variables define the subsets but are not otherwise part of the analysis. In classical SAS procedures, the data must be sorted by the BY variables. A BY-group analysis can produce many tables and graphs, so you might want to suppress the ODS output and write the results to a SAS data set.

Use the CLASS statement when you want to include a categorical variable in a model. A CLASS statement often enables you to compare or contrast subgroups. For example, in regression models you can evaluate the relative effect of each level on the response variable.

In some cases, the BY statement and the CLASS statement produce identical statistics. However, the CLASS statement enables you to fit a wider variety of models.

The post The difference between CLASS statements and BY statements in SAS appeared first on The DO Loop.

In the hype and excitement surrounding artificial intelligence and big data, most of us miss out on critical aspects related to collection, processing, handling and analyzing data. It's important for data science practitioners to understand these critical aspects and add a human touch to big data. What are these aspects? [...]