September 8, 2017
 

As Hurricane Irma makes its way through the Caribbean, and heads towards the United States, the big question on everyone's mind ... is the hurricane going to hit my city? Or, as some people like to say, "should I buy milk & bread?" Let's analyze & map some data to [...]

The post What cities are in Hurricane Irma's path? appeared first on SAS Learning Post.

September 8, 2017
 

In SAS Visual Analytics 8.1, report creators have the ability to include drive-distance and drive-time in their geographical maps, but only if their site has an Esri ArcGIS Online account and they have valid credentials for the account.

In the user settings for geographic mapping in the SAS Visual Analytics 8.1 release, there are three choices for the geographic map provider. The map provider creates the background map for geo maps and for network diagrams that display a map.

The map provider options are:

  • OpenStreetMap service, hosted at SAS.
  • Esri ArcGIS Online Services, which only requires acceptance of the terms and conditions.
  • Esri premium services, which require credential validation.

If Esri premium services is selected, there is an additional prompt for valid credentials, and you must still accept the Esri ArcGIS Online Services terms in order to select the premium services checkbox.

It’s also worth pointing out that even if you have Esri premium credentials, in order for those credentials to be validated in SAS Visual Analytics, you must also be a member of the Esri Users custom group.  Users can be added to this group in SAS Environment Manager, as shown below.

Note that without the Esri premium service and validated credentials, when you right-click and choose Create geographic selection in your report map, you can select only the Distance selection, which displays the radial distance from the selection point.

With premium services in effect, you can also select drive-time or drive-distance.  An example of a drive-time selection is shown here.  Drive-time creates an irregular selection based on the distance that can be driven in the specified amount of time.

A drive-distance example is shown below.  Drive-distance creates an irregular selection based on the driving distance using roads.

When selecting drive-time or drive-distance, you can also add breaks to show, as in the example below, the 5-mile, 10-mile, and 15-mile distances on the map.

Note also that if a viewer of the report has not had Esri premium credentials validated, the viewer will be unable to use the drive-distance and drive-time features.  The settings for users of the report viewer are likewise stored in the Report Viewer Geographic Mapping user settings.

If a user is adding a connection to the server in SAS Mobile BI 8.15 and their account is a member of the Esri Users group, they will be prompted for their Esri premium credentials when adding the server connection:

I hope you’ve found this post helpful.

How do I access the Premium Esri Map Service for my SAS Visual Analytics reports? was published on SAS Users.

September 7, 2017
 

I suppose we've all been watching Hurricane Irma rip through the Caribbean like a giant buzzsaw blade, with wind speeds over 180 mph. This is one of those rare Category 5 storms. But just how rare are Category 5 hurricanes? According to the Wikipedia page, hurricanes with wind speeds >= 157 mph are [...]

The post How rare are Category 5 hurricanes? appeared first on SAS Learning Post.

September 7, 2017
 

If you use SAS regression procedures, you are probably familiar with the "stars and bars" notation, which enables you to construct interaction effects in regression models. Although you can construct many regression models by using that classical notation, a friend recently reminded me that the EFFECT statement in SAS provides greater control over the interaction terms in a regression model.

The EFFECT statement is supported in many SAS procedures, including the GLIMMIX, GLMSELECT, and LOGISTIC procedures. Because those procedures can output a design matrix, you can use a model that is generated by an EFFECT statement in any SAS procedure, even older procedures that do not support the EFFECT statement. I have previously shown how you can use the SPLINE option in the EFFECT statement to generate spline effects. This article deals with polynomial effects, which are effects that are formed by elementwise multiplication of continuous variables, such as x1*x1, x1*x2, and x1*x1*x2*x3.
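For example, here is a minimal sketch of that workflow. The variable list, the DEGREE=2 polynomial effect, and the use of Cholesterol as an arbitrary numeric response are illustrative choices, not taken from the original article: PROC GLMSELECT writes the design matrix to a data set via the OUTDESIGN= option, and an older procedure such as PROC REG then fits the model.

/* Hedged sketch: build the design matrix with GLMSELECT, fit it with REG.
   SELECTION=NONE suppresses variable selection. */
proc glmselect data=Sashelp.Heart outdesign(addinputvars)=Design noprint;
   effect quad = polynomial(Height Weight / degree=2);
   model Cholesterol = quad / selection=none;
run;

/* GLMSELECT stores the design-column names in the _GLSMOD macro variable */
proc reg data=Design plots=none;
   model Cholesterol = &_GLSMOD;
quit;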

The following statements rename a few continuous variables in the Sashelp.Heart data set, which is used in the examples that follow. The new variables (x1, x2, x3, and x4) are easier to type, and the short names make it easier to see how interaction effects are formed.

data Heart / view=Heart;
set Sashelp.Heart;
rename Height=x1  Weight=x2  Diastolic=x3  Systolic=x4;
run;

A review of "stars and bars" notation in SAS

Recall that many SAS regression procedures (such as GLM, GENMOD, LOGISTIC, MIXED,...) support the bar operator (|) to specify interactions between effects. For example, the following MODEL statement specifies that the model should include all main effects and all higher-order interactions:

proc logistic data=Heart;
   model Y = x1 | x2 | x3 | x4;   /* all main effects and interactions up to 4-way */
run;

The previous MODEL statement includes all two-way, three-way, and four-way interaction effects between distinct variables. In practice, fitting a model with so many effects leads to overfitting, so most analysts restrict the model to two-way interactions. In SAS you can use the "at" operator (@) to limit the order of the interaction terms in the model. For example, the following syntax specifies that the model contains only main effects and two-way interactions:

model Y = x1 | x2 | x3 | x4 @2;   /* main effects and two-way interactions */
/* equivalent: model Y = x1 x2 x3 x4   x1*x2 x1*x3 x1*x4   x2*x3 x2*x4   x3*x4; */

Use the EFFECT statement to build polynomial effects

Notice that the bar operator does not generate the interaction of a variable with itself. For example, the terms x1*x1 and x2*x2 are not generated. Notice also that you need to explicitly type out each variable when you use the bar operator. You cannot use "colon notation" (x:) or hyphens (x1-x4) to specify a range of variables. For four variables, this is an inconvenience; for hundreds of variables, it is a serious problem.

The EFFECT statement enables you to create polynomial effects, which have the following advantages:

  • You can use colon notation or a hyphen to specify a range of variables. (You can also specify a space-separated list of variable names.)
  • You can control the degree of the polynomial terms and the maximum value of the exponent that appears in each term.
  • You can generate interactions between a variable and itself.

The POLYNOMIAL option in the EFFECT statement is described in terms of multivariate polynomials. Recall that a multivariate monomial is a product of powers of variables that have nonnegative integer exponents. For example, x^2 y z is a monomial that contains three variables. The degree of a monomial is the sum of the exponents of the variables. The previous monomial has degree 4 because 4 = 2 + 1 + 1.

Use the EFFECT statement to generate two-way interactions

The syntax for the EFFECT statement is simple. For example, to generate all main effects and two-way interactions, including second-degree terms like x1*x1 and x2*x2, use the following syntax:

ods select ParameterEstimates(persist);
proc logistic data=Heart;
   effect poly2 = polynomial(x1-x4 / degree=2);
   model Status = poly2;
/* equivalent:  model Status = x1 | x2 | x3 | x4  @2     
                               x1*x1 x2*x2 x3*x3 x4*x4;  */
run;

The name of the effect is 'poly2'. It is a polynomial effect that contains all terms that involve first- and second-degree monomials. Thus it contains the main effects, the two-way interactions between variables, and the terms x1*x1, x2*x2, x3*x3, and x4*x4. The equivalent model in "stars and bars" notation is shown in the comment. The models are the same, although the variables are listed in a different order.

You can also use the colon operator to select variables that have a common prefix. For example, to specify all polynomial effects for variables that begin with the prefix 'x', you can use EFFECT poly2 = POLYNOMIAL(x: / DEGREE=2).

You can use the MDEGREE= option to control the maximum degree of any exponent in a monomial. For example, the monomials x1*x1 and x1*x2 are both second-degree monomials, but the maximum exponent that appears in the first monomial is 2 whereas the maximum exponent that appears in the second monomial is 1. The following syntax generates only monomial terms for which the maximum exponent is 1, which means that x1*x1 and x2*x2 will not appear:

proc logistic data=Heart;
   effect poly21 = polynomial(x1-x4 / degree=2 mdegree=1);  /* exclude x1*x1, x2*x2, etc. */
   model Status = poly21;
/* equivalent:  model Status = x1 | x2 | x3 | x4  @2    */
run;

Use the EFFECT statement to generate two-way interactions between lists of variables

A novel use of polynomial effects is to generate all two-way interactions between variables in one list and variables in another list. For example, suppose you are interested in the interactions between the lists (x1 x2) and (x3 x4), but you are not interested in within-list interactions such as x1*x2 and x3*x4. By using polynomial effects, you can create the two lists and then use the bar operator to request all main and two-way interactions between elements of the lists, as follows:

proc logistic data=Heart;
   effect list1 = polynomial(x1-x2);    /* first list of variables */
   effect list2 = polynomial(x3-x4);    /* second list of variables */
   model Status = list1 | list2;        /* main effects and pairwise interactions between lists */
/* equivalent:
   model Status = x1 x2                    
                  x3 x4                
                  x1*x3 x1*x4 x2*x3 x2*x4; */
run;

Notice that you can use the EFFECT statement multiple times in the same procedure. This is a powerful technique! By using multiple EFFECT statements, you can name sets of variables that have similar properties, such as demographic effects (age, sex), lifestyle effects (diet, exercise), and physiological effects (blood pressure, cholesterol). You can then easily construct models that contain interactions between these sets, such as LIFESTYLE | PHYSIOLOGY.
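As a sketch of that idea, the groupings below use numeric Sashelp.Heart variables as hypothetical stand-ins for the lifestyle and physiological sets:

proc logistic data=Sashelp.Heart;
   effect lifestyle  = polynomial(Smoking MRW);         /* stand-in lifestyle list */
   effect physiology = polynomial(Diastolic Systolic);  /* stand-in physiological list */
   model Status = lifestyle | physiology;               /* main effects + between-list interactions */
run;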

In summary, the POLYNOMIAL option in the EFFECT statement enables you to control the interactions between variables in a model. You can form models that are equivalent to the "stars and bars" syntax or specify more complex models. In particular, you can use the EFFECT statement multiple times to construct lists of variables and interactions between elements of the lists. For more information about polynomial effects, see the SAS documentation for the POLYNOMIAL option in the EFFECT statement.

The post Construct polynomial effects in SAS regression models appeared first on The DO Loop.

September 7, 2017
 
In many introductory image recognition tasks, the famous MNIST data set is typically used. However, there are some issues with this data:

1. It is too easy. For example, a simple MLP model can achieve 99% accuracy, and even a 2-layer CNN can achieve 99% accuracy.

2. It is overused. Literally every introductory machine learning article or image recognition task uses this data set as the benchmark. But because it is so easy to get nearly perfect classification results, its usefulness as a benchmark is discounted, and it is not really representative of modern machine learning/AI tasks.

Hence the Fashion-MNIST dataset, which was developed as a direct drop-in replacement for the MNIST data in the sense that:

1. It is the same size and style: 28x28 grayscale images
2. Each image is associated with 1 out of 10 classes, which are:
       0:T-shirt/top,
       1:Trouser,
       2:Pullover,
       3:Dress,
       4:Coat,
       5:Sandal,
       6:Shirt,
       7:Sneaker,
       8:Bag,
       9:Ankle boot
3. 60,000 training samples and 10,000 testing samples

Here is a snapshot of some samples:
Since its release, there have been multiple submissions benchmarking this data, and some of them achieve 95%+ accuracy, most notably residual networks and separable CNNs.
I am also benchmarking against this data, using keras. keras is a high-level framework for building deep learning models, with a choice of TensorFlow, Theano, or CNTK as the backend. It is easy to install and use. For my application, I used the CNTK backend. You can refer to this article on its installation.

Here, I will benchmark two models. One is an MLP with a 256-512-100-10 layer structure, and the other is a VGG-like CNN. Code is available at my github: https://github.com/xieliaing/keras-practice/tree/master/fashion-mnist

The first model achieved accuracy between 0.89 and 0.90 on the testing data after 100 epochs, while the latter achieved accuracy above 0.94 on the testing data after 45 epochs. First, read in the Fashion-MNIST data:

import numpy as np
import io, gzip, requests
train_image_url = "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz"
train_label_url = "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz"
test_image_url = "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz"
test_label_url = "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz"

def readRemoteGZipFile(url, isLabel=True):
    # download the gzipped IDX file and decompress it in memory
    response = requests.get(url, stream=True)
    gzip_content = response.content
    fObj = io.BytesIO(gzip_content)
    content = gzip.GzipFile(fileobj=fObj).read()
    if isLabel:
        offset = 8    # label files have an 8-byte header
    else:
        offset = 16   # image files have a 16-byte header
    result = np.frombuffer(content, dtype=np.uint8, offset=offset)
    return result

train_labels = readRemoteGZipFile(train_label_url, isLabel=True)
train_images_raw = readRemoteGZipFile(train_image_url, isLabel=False)

test_labels = readRemoteGZipFile(test_label_url, isLabel=True)
test_images_raw = readRemoteGZipFile(test_image_url, isLabel=False)

train_images = train_images_raw.reshape(len(train_labels), 784)
test_images = test_images_raw.reshape(len(test_labels), 784)
Let's first visualize the data using t-SNE, which is often said to be among the most effective dimension reduction tools for visualization. This plot function is borrowed from a scikit-learn example.

from sklearn import manifold
from time import time
import matplotlib.pyplot as plt
from matplotlib import offsetbox
plt.rcParams['figure.figsize']=(20, 10)
# Scale and visualize the embedding vectors
def plot_embedding(X, Image, Y, title=None):
    x_min, x_max = np.min(X, 0), np.max(X, 0)
    X = (X - x_min) / (x_max - x_min)

    plt.figure()
    ax = plt.subplot(111)
    for i in range(X.shape[0]):
        plt.text(X[i, 0], X[i, 1], str(Y[i]),
                 color=plt.cm.Set1(Y[i] / 10.),
                 fontdict={'weight': 'bold', 'size': 9})

    if hasattr(offsetbox, 'AnnotationBbox'):
        # only print thumbnails with matplotlib > 1.0
        shown_images = np.array([[1., 1.]])  # just something big
        for i in range(X.shape[0]):
            dist = np.sum((X[i] - shown_images) ** 2, 1)
            if np.min(dist) < 4e-3:
                # don't show points that are too close
                continue
            shown_images = np.r_[shown_images, [X[i]]]
            imagebox = offsetbox.AnnotationBbox(
                offsetbox.OffsetImage(Image[i], cmap=plt.cm.gray_r),
                X[i])
            ax.add_artist(imagebox)
    plt.xticks([]), plt.yticks([])
    if title is not None:
        plt.title(title)

t-SNE is very computationally expensive, so for impatient people like me, I used 1,000 samples for a quick run. If your PC is fast enough and you have time, you can run t-SNE against the full dataset.

sampleSize = 1000
# sample from train_labels, which is defined at this point
samples = np.random.choice(range(len(train_labels)), size=sampleSize)
tsne = manifold.TSNE(n_components=2, init='pca', random_state=0)
t0 = time()
sample_images = train_images[samples]
sample_targets = train_labels[samples]
X_tsne = tsne.fit_transform(sample_images)
t1 = time()
plot_embedding(X_tsne,
               sample_images.reshape(sample_targets.shape[0], 28, 28),
               sample_targets,
               "t-SNE embedding of the digits (time %.2fs)" % (t1 - t0))
plt.show()
We see that several features, such as overall mass, the split at the bottom, and symmetry, separate the categories. Deep learning excels here because you don't have to engineer the features manually; the algorithm extracts them for you.

To build our own networks, we first import some libraries:

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D, AveragePooling2D
from keras.layers.advanced_activations import LeakyReLU
We also do standard data preprocessing:

# reshape to (n, 28, 28, 1) for the CNN, then scale pixels from [0, 255] to [-1, 1]
X_train = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
X_test = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')

X_train /= 255
X_test /= 255

X_train -= 0.5
X_test -= 0.5

X_train *= 2.
X_test *= 2.

# one-hot encode the labels
Y_train = train_labels
Y_test = test_labels
Y_train2 = keras.utils.to_categorical(Y_train).astype('float32')
Y_test2 = keras.utils.to_categorical(Y_test).astype('float32')
Here is the simple MLP implemented in keras:

# note: the MLP consumes flat 784-pixel vectors, so it is trained on the
# 2-D (n, 784) representation rather than the 4-D tensor above
mlp = Sequential()
mlp.add(Dense(256, input_shape=(784,)))
mlp.add(LeakyReLU())
mlp.add(Dropout(0.4))
mlp.add(Dense(512))
mlp.add(LeakyReLU())
mlp.add(Dropout(0.4))
mlp.add(Dense(100))
mlp.add(LeakyReLU())
mlp.add(Dropout(0.5))
mlp.add(Dense(10, activation='softmax'))
mlp.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
mlp.summary()

This model achieved almost 90% accuracy on the test dataset at about 100 epochs. Now, let's build a VGG-like CNN model. We use an architecture that is similar to VGG but still very different. Because the image data are small, the original VGG architecture would very likely overfit and perform poorly on the testing data, as observed in the publicly submitted benchmarks listed above. Building such a model in keras is natural and easy:

num_classes = len(set(Y_train))
model3=Sequential()
model3.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same",
                  input_shape=X_train.shape[1:], activation='relu'))
model3.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation='relu'))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Dropout(0.5))
model3.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation='relu'))
model3.add(Conv2D(filters=256, kernel_size=(3, 3), padding="valid", activation='relu'))
model3.add(MaxPooling2D(pool_size=(3, 3)))
model3.add(Dropout(0.5))
model3.add(Flatten())
model3.add(Dense(256))
model3.add(LeakyReLU())
model3.add(Dropout(0.5))
model3.add(Dense(256))
model3.add(LeakyReLU())
#model3.add(Dropout(0.5))
model3.add(Dense(num_classes, activation='softmax'))
model3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model3.summary()
This model has about 1.5 million parameters. We call the fit method to train the model:

model3_fit=model3.fit(X_train, Y_train2, validation_data = (X_test, Y_test2), epochs=50, verbose=1, batch_size=500)
After 40 epochs, this model achieves accuracy of 0.94 on the testing data. Obviously, this model also suffers from overfitting; we will address that issue later.
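In the meantime, a common keras idiom (not necessarily the remedy hinted at above, just a sketch) is to stop training once the validation loss stops improving:

from keras.callbacks import EarlyStopping

# stop training when the validation loss has not improved for 5 consecutive
# epochs, which limits how far the model can drift into overfitting
early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
model3_fit = model3.fit(X_train, Y_train2,
                        validation_data=(X_test, Y_test2),
                        epochs=50, verbose=1, batch_size=500,
                        callbacks=[early_stop])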

Posted at 8:48 AM
September 6, 2017
 

When it comes to the SAS web experience on sas.com, support.sas.com (and more), we have a vision and mission to create experiences that connect people to the things that matter – quickly, easily and enjoyably. You could say that we’re a user-task-focused bunch of web people here, and so whether you’re a web visitor to SAS researching a solution area or evaluating a product, looking for how-to content or applying for a job, our goal is to make sure you complete that task easily.

Tools like SAS Customer Intelligence 360 help us do this by taking the guesswork out of creating the most optimized web experiences possible. And while its AI, machine learning, omnichannel marketing and analytics capabilities are all the rage – and for good reason – don’t lose sight of the power and impact of good old-fashioned a/b and multivariate testing.

The power of small in a big customer journey

If you think of your website as a product, and think of that product as being comprised of dozens, maybe hundreds of small interactions that users engage with – imagery, video, buttons, content, forms, etc. – then the ability to refine and improve those small interactions for users can have big impact and investment return for the product as a whole. Herein lies the beauty of web testing – the ability (really art and science) of taking these small interactions and testing them to refine and improve the user experience.

So what does this look like in real life, and how do you do it with SAS Customer Intelligence 360?

In the top right corner of the sas.com homepage we have a small “sticky” orange call-to-action button for requesting a SAS demo. We like to test this button.

A button test? Yes, I know – it doesn’t get much smaller than this, which is why I affectionately refer to this particular button as “the little button that could.” It’s small but mighty, and by the end of this year, this little button will have helped to generate several hundred demo and pricing requests for sales. That’s good for SAS, but better for our site visitors because we’re helping to easily connect them with a high-value task they’re looking to accomplish during their customer journey.

How do we know this button is mighty? We’ve tested a ton of variations with this little guy, measuring click-through rate (CTR) and conversion rate (CVR). It started off as a “Contact Us” button, and went through a/b test variations as “Connect with SAS,” “How to Buy” and “Request Pricing” as we came to realize what users were actually contacting us for. So here we are today with our control as “Request a SAS Demo.”

Setting up a simple a/b test like this literally takes no longer than five minutes in SAS Customer Intelligence 360. Here's how:

  • First, you set up a spot.
  • Next, you create your message or creative.
  • Finally, you create your web task.

Easy breezy. Activate it, let it run, get statistical significance, declare a winner, optimize, rinse and repeat.

Now, add segmentation targeting to the mix

So now let’s take our testing a step further. We’ve tested our button to where we have a strong control, but what if we now refine our testing and run a test targeted to a particular segment, such as a “return user” segment to our website - and test button variations of “Request a SAS Demo” vs. “Explore SAS Services.”

Why do this? The hypothesis is that for new users to our site, requesting a SAS demo is a top task, and our control button is helping users accomplish that. Repeat visitors, who know more about SAS, our solutions and products, may be deeper in the customer journey and doing more repeat research and validation on www.sas.com. If so, what might be more appropriate content for that audience? Maybe it’s our services content. SAS has great services available to SAS users - such as Training, Certification, Consulting, Technical Support, and more. Would this content be more relevant for a return visitor who is possibly deeper in the research-and-evaluate phase, or maybe already a customer? Let’s find out.

Setting up this segmentation a/b test is just like I noted above – you create a spot, build the creative, and set up your task. After you have set up the task, you have the option to select a “Target” as part of it, and for this test, we select “New or Return User” as the criteria from the drop-down, and then “Returning” as the value. Then just activate it and see what optimization goodness takes place.

So, how did our test perform?

I’ll share results and what we learn from this test in the upcoming weeks. Regardless of the results, though, it’s not really about which variation wins; it’s about what we learn from trying to improve the user experience, which allows us to continue to design and build good, effective, optimized user experiences. Tools like SAS Customer Intelligence 360 and its web testing and targeting capabilities allow us to do that faster and more efficiently than ever.


Using SAS at SAS: SAS Customer Intelligence 360, a/b testing and web optimization was published on Customer Intelligence Blog.

September 5, 2017
 

Correlation is a fundamental statistical concept that measures the linear association between two variables. There are multiple ways to think about correlation: geometrically, algebraically, with matrices, with vectors, with regression, and more. To paraphrase the great songwriter Paul Simon, there must be 50 ways to view your correlation! But don't "slip out the back, Jack," this article describes only seven of them.

How can we understand these many different interpretations? As the song says, "the answer is easy if you take it logically." These seven ways to view your Pearson correlation are based on the wonderful paper by Rodgers and Nicewander (1988), "Thirteen ways to look at the correlation coefficient," which I recommend for further reading.

1. Graphically

The simplest way to visualize correlation is to create a scatter plot of the two variables. A typical example is shown to the right. (Click to enlarge.) The graph shows the heights and weights of 19 students. The variables have a strong linear "co-relation," which is Galton's original spelling for "correlation." For these data, the Pearson correlation is r = 0.8778, although few people can guess that value by looking only at the graph.

For data that are approximately bivariate normal, the points will align northwest-to-southeast when the correlation is negative and will align southwest-to-northeast when the data are positively correlated. If the point-cloud is an amorphous blob, the correlation is close to zero.

In reality, most data are not well-approximated by a bivariate normal distribution. Furthermore, Anscombe's Quartet provides a famous example of four point-clouds that have exactly the same correlation but very different appearances. (See also the images in the Wikipedia article about correlation.) So although a graphical visualization can give you a rough estimate of a correlation, you need computations for an accurate estimate.

2. The sum of crossproducts

In elementary statistics classes, the Pearson sample correlation coefficient between two variables x and y is usually given as a formula that involves sums. The numerator of the formula is the sum of crossproducts of the centered variables. The denominator involves the sums of squared deviations for each variable. In symbols:

r = Σ (xᵢ - x̄)(yᵢ - ȳ) / ( √(Σ (xᵢ - x̄)²) √(Σ (yᵢ - ȳ)²) )

The terms in the numerator involve the first central moments and the terms in the denominator involve the second central moments. Consequently, Pearson's correlation is sometimes called the product-moment correlation.

3. The inner product of standardized vectors

I have a hard time remembering complicated algebraic formulas. Instead, I try to visualize problems geometrically. The way I remember the correlation formula is as the inner (dot) product between two centered and standardized vectors. In vector notation, the centered vectors are x - x̄ and y - ȳ. A vector is standardized by dividing by its length, which in the Euclidean norm is the square root of the sum of the squares of its elements. Therefore you can define u = (x - x̄) / || x - x̄ || and v = (y - ȳ) / || y - ȳ ||. Looking back at the equation in the previous section, you can see that the correlation between the vectors x and y is the inner product r = u · v. This formula shows that the correlation coefficient is invariant under affine transformations of the data.
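To make this concrete, here is a small SAS/IML sketch that computes the correlation as the inner product of centered, unit-length vectors. It assumes the height and weight data are the 19 students in Sashelp.Class; if so, the printed value matches the r = 0.8778 reported above.

proc iml;
use Sashelp.Class;                      /* heights and weights of 19 students */
read all var {Height Weight} into Z;
close;
xc = Z[,1] - mean(Z[,1]);               /* center each variable */
yc = Z[,2] - mean(Z[,2]);
u = xc / sqrt(ssq(xc));                 /* standardize to unit length */
v = yc / sqrt(ssq(yc));
r = u` * v;                             /* inner product = Pearson correlation */
print r;
quit;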

4. The angle between two vectors

Linear algebra teaches that the inner product of two vectors is related to the angle between the vectors. Specifically, u · v = ||u|| ||v|| cos(θ), where θ is the angle between the vectors u and v. Dividing both sides by the lengths of u and v and using the equations in the previous section, we find that r = cos(θ), where θ is the angle between the vectors.

This equation gives two important facts about the Pearson correlation. First, the correlation is bounded in the interval [-1, 1]. Second, it gives the correlation for three important geometric cases. When x and y have the same direction (θ = 0), then their correlation is +1. When x and y have opposite directions (θ = π), then their correlation is -1. When x and y are orthogonal (θ = π/2), their correlation is 0.

5. The standardized covariance

Recall that the covariance between two variables x and y is

s_xy = Σ (xᵢ - x̄)(yᵢ - ȳ) / (n - 1)

The covariance between two variables depends on the scales of measurement, so the covariance is not a useful statistic for comparing the linear association. However, if you divide by the standard deviation of each variable, then the variables become dimensionless. Recall that the standard deviation is the square root of the variance, which for the x variable is given by

s_x² = Σ (xᵢ - x̄)² / (n - 1)

Consequently, the expression s_xy / (s_x s_y) is a standardized covariance. If you expand the terms algebraically, the "n - 1" terms cancel and you are left with the Pearson correlation.

6. The slope of the regression line between two standardized variables

Most textbooks for elementary statistics mention that the correlation coefficient is related to the slope of the regression line that estimates y from x. If b is the slope, then r = b (s_x / s_y). That's a messy formula, but if you standardize the x and y variables, then the standard deviations are unity and the correlation equals the slope.

This fact is demonstrated by using the same height and weight data for 19 students. The graph to the right shows the data for the standardized variables, along with an overlay of the least squares regression line. The slope of the regression line is 0.8778, which is the same as the correlation between the variables.
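Here is one way to verify this numerically, again assuming the data are in Sashelp.Class: standardize both variables with PROC STDIZE, then read the slope from PROC REG.

proc stdize data=Sashelp.Class out=Std method=std;   /* (x - mean) / std for each variable */
   var Height Weight;
run;

proc reg data=Std plots=none;
   model Weight = Height;   /* the slope estimate equals the Pearson correlation */
quit;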

7. Geometric mean of regression slopes

The previous section showed a relationship between two of the most important concepts in statistics: correlation and regression. Interestingly, the correlation is also related to another fundamental concept, the mean. Specifically, when two variables are positively correlated, the correlation coefficient is equal to the geometric mean of two regression slopes: the slope of y regressed on x (b_x) and the slope of x regressed on y (b_y).

To derive this result, start from the equation in the previous section, which is r = b_x (s_x / s_y). By symmetry, it is also true that r = b_y (s_y / s_x). Multiplying these two equations gives r² = b_x b_y, so r = √(b_x b_y), which is the geometric mean of the two slopes.

Summary

This article shows multiple ways that you can view a correlation coefficient: analytically, geometrically, via linear algebra, and more. The Rodgers and Nicewander (1988) article includes other ideas, some of which are valid only asymptotically or approximately.

If you want to see the computations for these methods applied to real data, you can download a SAS program that produces the computations and graphs in this article.

What is your favorite way to think about the correlation between two variables? Leave a comment.

The post 7 ways to view correlation appeared first on The DO Loop.

September 1, 2017
 

International Talk Like a Pirate Day is Sept 19 ... which always gets me thinking and wondering about modern-day pirates. Most movies focus on pirates from the Golden Age of piracy (a couple hundred years ago), when pirates typically stole ships and booty (treasure). But modern day pirates usually board [...]

The post Where do modern day pirates attack? appeared first on SAS Learning Post.

September 1, 2017
 

For those of you using SAS Customer Intelligence 360, or generally just interested in the space of web content targeting and personalization capabilities, I wanted to share the latest on what we’re doing at SAS.

As the SAS web experience division, we use SAS Customer Intelligence 360 on sas.com for data collection and a variety of a/b testing and content targeting. One of the things we’re starting to use it for is industry segmentation and targeting. Not in a creepy, over-the-top kind of way, but more subtly, in a way that helps to improve the user experience on sas.com.

One of the larger user segments we have on sas.com is academic-related visitors: students, educators and researchers. That’s great for us, because these are important users of SAS.

The other good thing is that we already have great content and resources for those visitors on sas.com as part of our SAS Academic Program section. The challenge and opportunity is: “How do we surface that relevant content to the right user, in a way that then allows them to find what they’re looking for quicker, easier and more efficiently?” That's good for our users and good for us.

For us, part of the answer is to use SAS Customer Intelligence 360 targeting features that allow us to identify the right user, and deliver the right message.

It's as easy as 1-2-3

The way it works is pretty simple:

  1. We set up a spot in SAS Customer Intelligence 360. A spot is a place on a web page where we’ll target and deliver content to. In this instance, the spot we created is the feature banner spot on the sas.com homepage.

  2. Then we simply set up the creative. The creative is the content we'll use to deliver to the spot we created. For this targeting, we created a new homepage feature creative that is designed specifically for our academic segment (students, educators, researchers) and drives them to a specific call to action in our SAS Academic Program content on sas.com. So most visitors will see a standard promo about X, while students and educators will see the page below.
  3. Lastly, we create the web task. A web task allows us to target content to the website visitor and collect insights and performance data. When we set up this task, one of the steps is to create rules for targeting as applicable. For this experiment, we chose to target by industry group, one of the targeting options within SAS Customer Intelligence 360 that allows us to select an industry group by SIC (Standard Industrial Classification) code. We selected education services. We also created a targeting rule (see the image below) for any referral traffic that comes to sas.com from a top-level domain URL ending in ".edu", since we have a good number of backlinks to sas.com from academic institutions that use SAS.

Once activated, SAS Customer Intelligence 360 recognizes a user's IP address and associated SIC code, or the referral URL, that identifies them as a visitor from an academic institution. It then places them into our segmentation bucket and displays our targeted content and experience.

There's a ton more we can do around this, but getting up and running is incredibly easy, and it's proving effective (see the performance tracking below).

If you're interested in learning more, here are the latest and greatest in free tutorials and SAS Customer Intelligence 360 documentation.

 

Using SAS at SAS: 3 simple steps to better web content targeting was published on Customer Intelligence Blog.