November 27, 2018
 

Like many B2B and B2C organizations, our corporate website, www.sas.com, is a critical channel for how people learn about SAS and interact with us in the digital space. We have millions of visitors from around the world on a monthly basis looking to learn more about who we are, what [...]

Using SAS at SAS: The power of experimentation using SAS Customer Intelligence 360 was published on Customer Intelligence Blog.

November 27, 2018
 

Do you remember The Matrix movies, that started coming out in 1999? Hopefully this movie franchise didn't give you a fear of virtual reality and AI. The thing I remember most from the movie was the really cool slow-motion video effects (from multiple angles) in the fight scenes. And the [...]

The post Creating a 'Matrix-like' movie scroll animation, from a text file appeared first on SAS Learning Post.

November 26, 2018
 
Funnel plot of the proportion of unimmunized students in NC kindergarten classes

Last week my colleague, Robert Allison, visualized data regarding immunization rates for kindergarten classes in North Carolina. One of his graphs was a scatter plot that displayed the proportion of unimmunized students versus the size of the class for 1,885 kindergarten classes in NC. This scatter plot is the basis for a statistical plot that is known as a funnel plot for proportions. The plot to the right shows a funnel plot that I created based on Robert's analysis.

The basic idea of a funnel plot is that small random samples have more variation than large random samples from the same population. If students are randomly chosen from a population in which some small proportion of children have an attribute, it might not be unusual if 40% of the students in a five-student class have the attribute (that's 2 out of 5) whereas it might be highly unusual to see such a high proportion in a 100-student class. The funnel plot enhances a scatter plot by adding curves that indicate a reasonable range of values for the proportion, given the size of a random sample.

For a discussion of funnel plots and how to create a funnel plot in SAS, see the article "Funnel plots for proportions." You can download the immunization data and the SAS program that I used to create the funnel plot for proportions.

A funnel plot for proportions

The preceding funnel plot contains the following statistical features:
  • A scatter plot of the data. Each marker represents a school.
  • A horizontal reference line. The line indicates the average proportion for the data. For these data, the statewide average proportion of unimmunized kindergarteners is 4.58%.
  • A curve that indicates an upper confidence limit for the proportion of unimmunized students, assuming that the classes are a random sample from a population in which 4.58% of the students are unimmunized. When a marker (school) appears above the curve, the proportion of unimmunized students in that school is significantly higher than the statewide proportion.
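The upper curve can be approximated with a short calculation. The sketch below is only an illustration, not the SAS program used for the plot: it assumes a normal approximation to the binomial distribution and a one-sided 95% level, both of which are assumptions made here (the downloadable program may use exact binomial quantiles and a different level). The function name `funnel_upper_limit` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def funnel_upper_limit(p, n, level=0.95):
    """Approximate upper limit for a sample proportion.

    p     : population proportion (e.g. the statewide average)
    n     : class size
    level : one-sided confidence level (assumed value, not from the article)

    Uses the normal approximation p + z*sqrt(p(1-p)/n), which is rough
    for small n; the result is capped at 1.
    """
    z = NormalDist().inv_cdf(level)
    return min(1.0, p + z * sqrt(p * (1 - p) / n))


p_state = 0.0458  # statewide proportion of unimmunized kindergarteners
for n in (5, 20, 100, 400):
    # The limit shrinks toward p_state as the class size grows,
    # which produces the funnel shape.
    print(n, round(funnel_upper_limit(p_state, n), 4))
```

A school would be flagged when its observed proportion lies above the limit for its class size. Because the approximation is poor for very small classes, treat the values for small n with caution.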

This funnel plot uses the shape (and color) of a marker to indicate whether the school is public (circle), private (upward-pointing triangle), or a charter school (right-pointing triangle). The plot includes tool tips so that you can hover the mouse over a marker and see the name and county of the school.

The graph indicates that there are dozens of schools for which the proportion of unimmunized students far exceeds the state average. A graph like this enables school administrators and public health officials to identify schools that have a larger-than-expected proportion of unimmunized students. Identifying schools is the first step to developing initiatives that can improve the immunization rate in school-age children.

Funnel plots for each school district

You can use a WHERE statement or BY-group processing in PROC SGPLOT to create a funnel plot for each county. A graph that shows only the schools in a particular district is more relevant for local school boards and administrators. The following graphs show the proportion of unimmunized students in Mecklenburg County (near Charlotte) and Wake County (near Raleigh), which are the two largest school districts in NC.

Proportion of unimmunized students in Mecklenburg County kindergarten classes
Proportion of unimmunized students in Wake County kindergarten classes

The first graph shows that Mecklenburg County has several schools that contain more than 60 kindergarten students and for which 25% or more of the students are unimmunized. In fact, some large schools have more than 40% unimmunized! In Wake County, fewer schools have extreme proportions, but there are still many schools for which the proportion of unimmunized students is quite large relative to the statewide average.

As Robert pointed out in his blog post, these are not official figures, so it is possible that some of the extreme proportions are caused by data entry errors rather than by hordes of unimmunized students.

Funnel plots for public, private, and charter schools

The following graph shows a panel of plots for public, private, and charter schools. There are many public schools whose proportion of unimmunized students is well above the statewide average. For the private and charter schools, about 10 schools stand out in each group.

I think the plot of private schools is particularly interesting. When the media reports on outbreaks of childhood diseases in schools, there is often a mention of a "religious exemption," which is when a parent or guardian states that a child will not be immunized because of their religious beliefs. These reports often note that private schools are frequently affiliated with a particular religion or church. I had therefore assumed that private schools would have a larger proportion of unimmunized students because of the religious exemption. These data do not indicate which students are unimmunized because of a religious exemption, but the panel of funnel plots indicates that, overall, not many private schools have an abnormally large proportion of unimmunized students. In fact, the private schools show smaller deviations from the expected value than the public and charter schools.

Summary

In summary, I started with one of Robert Allison's graphs and augmented it to create a funnel plot for proportions. A funnel plot shows the average proportion and confidence limits for proportions (given the sample size). If the students in the schools were randomly sampled from a population where 4.58% of students are unimmunized, then few schools would be outside of the confidence curve. Of course, in reality, schools are not random samples. Many features of school performance—including unimmunized students—depend on local socioeconomic conditions. By taking into account the size of the classes, the funnel plot identifies schools that have an exceptionally large proportion of unimmunized students. A funnel plot can help school administrators and public health officials identify schools that might benefit from intervention programs and additional resources, or for which the data were incorrectly entered.

The post A funnel plot for immunization rates appeared first on The DO Loop.

November 24, 2018
 
First load necessary packages

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D
import keras.backend as K
import scipy, imageio
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline
Then show original picture of my Jeep

# First, read the image into a matrix.
# We can use pyplot's imshow() method to display the image.
# This is the Wrangler JK Rubicon Unlimited that I used to own.
#
img_data = imageio.imread('./pics/wranglerJK.jpg')
print(img_data.data.shape)

img = Image.fromarray(img_data, 'RGB')
plt.imshow(img)




Now, build our 2-D convolutional function that takes a custom filter matrix and computes the filtered output image matrix.
def my_init(shape, dtype=None):
    # Keras passes shape = (kernel_rows, kernel_cols, input_channels, output_channels).
    # Copy the global filter_mat into every (input, output) channel pair.
    new_mat = np.zeros(shape)
    for i in range(shape[2]):
        for j in range(shape[3]):
            new_mat[:, :, i, j] = filter_mat
    return np.array(new_mat, dtype=dtype)


def MyFilter(filter_mat):
    print(len(filter_mat.shape))
    if len(filter_mat.shape) != 2:
        print('Invalid filter matrix. It must be 2-D')
        return []

    kernel_size = filter_mat.shape
    row, col, depth = img_data.shape
    input_shape = img_data.shape
    filter_size = row*col*depth
    print(filter_size)

    # A single Conv2D layer whose weights are fixed to the custom filter
    model = Sequential()
    model.add(Conv2D(depth,
                     kernel_size=kernel_size,
                     input_shape=input_shape,
                     padding='same',
                     activation='linear',
                     data_format='channels_last',
                     kernel_initializer=my_init,
                     name='Conv'))
    model.add(Dense(1, activation='linear'))
    model.compile(optimizer='sgd', loss='mse')
    model.summary()

    # Build a backend function that returns the output of the 'Conv' layer
    inX = model.input
    outputs = [layer.output for layer in model.layers if layer.name == 'Conv']
    functions = [K.function([inX], [out]) for out in outputs]

    layer_outs = [func([img_data.reshape(1, row, col, depth)]) for func in functions]
    activationLayer = layer_outs[0][0]
    # Rescale the output to [0, 1] so that imshow() can display it
    temp = activationLayer - np.min(activationLayer)
    normalized_activationLayer = temp/np.max(temp)
    return normalized_activationLayer.reshape(row, col, depth)
Now, insert our own fixed filter matrix and get the output, using pyplot.imshow() to display the filtered picture. This time we throw in an edge detector.
filter_mat = np.array([-1, -2, -3, 0, 0, 0, 1, 2, 3]).reshape(3, 3)

outLayer = MyFilter(filter_mat)
plt.imshow(outLayer)
Below is the filtered picture.
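To see what the Conv layer computes without Keras, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not the code used above: it handles a single channel only (the Keras layer additionally sums the result over the input channels), it assumes an odd-sized kernel, and the function name `cross_correlate_same` and the toy image are made up for this example. Like Conv2D, it slides the kernel without flipping it (cross-correlation) and pads with zeros so the output has the same size as the input.

```python
import numpy as np


def cross_correlate_same(image, kernel):
    """Slide `kernel` over a 2-D `image` with zero padding ('same' size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2          # padding on each side (odd kernels)
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + kh, c:c + kw]
            out[r, c] = np.sum(window * kernel)   # elementwise product, then sum
    return out


# The same edge detector that is used in this post
edge_filter = np.array([-1, -2, -3, 0, 0, 0, 1, 2, 3]).reshape(3, 3)

# Hypothetical toy image: dark top half, bright bottom half
toy = np.zeros((6, 6))
toy[3:, :] = 1.0

response = cross_correlate_same(toy, edge_filter)
print(response)
```

The filter subtracts the row above a pixel from the row below it, so the response is nonzero only where brightness changes vertically, which is why horizontal edges stand out in the filtered picture.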


 
November 22, 2018
 

This week I noticed that they've started building the lot where they sell Christmas trees near SAS (at the intersection of Maynard & Reedy Creek Rd). They put up a nice rustic wooden fence, and lights, and maybe even a fire pit to keep their workers warm. They sell some [...]

The post Finding a cut-your-own Christmas tree in North Carolina appeared first on SAS Learning Post.

November 21, 2018
 

Burger and fries, wine and cheese, peanut butter and jelly … some things just go better together. For organizations embarking on digital transformation, AI and IoT just go better together. These two distinct technologies, AI and IoT (or AIoT), are a natural fit. To take an analogy from the human [...]

AI and IoT, better together to accelerate digital transformation was published on SAS Voices by David Tareen

November 21, 2018
 

A data analyst asked how to compute parameter estimates in a linear regression model when the underlying data matrix is rank deficient. This situation can occur if one of the variables in the regression is a linear combination of other variables. It also occurs when you use the GLM parameterization of a classification variable. In the GLM parameterization, the columns of the design matrix are linearly dependent. As a result, the matrix of crossproducts (the X`X matrix) is singular. In either case, you can understand the computation of the parameter estimates by learning about generalized inverses in linear systems. This article presents an overview of generalized inverses. A subsequent article will specifically apply generalized inverses to the problem of estimating parameters for regression problems with collinearities.

What is a generalized inverse?

Recall that the inverse matrix of a square matrix A is a matrix G such that A*G = G*A = I, where I is the identity matrix. When such a matrix exists, it is unique and A is said to be nonsingular (or invertible). If there are linear dependencies in the columns of A, then an inverse does not exist. However, you can define a series of weaker conditions that are known as the Penrose conditions:

  1. A*G*A = A
  2. G*A*G = G
  3. (A*G)` = A*G
  4. (G*A)` = G*A

Any matrix, G, that satisfies the first condition is called a generalized inverse (or sometimes a "G1" inverse) for A. A matrix that satisfies the first and second condition is called a "G2" inverse for A. The G2 inverse is used in statistics to compute parameter estimates for regression problems (see Goodnight (1979), p. 155). A matrix that satisfies all four conditions is called the Moore-Penrose inverse or the pseudoinverse. When A is square but singular, there are infinitely many matrices that satisfy the first two conditions, but the Moore-Penrose inverse is unique.
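The four conditions are easy to check numerically. The following NumPy sketch is an illustration only: the 2 x 2 matrix, the candidate inverse `G2`, and the helper name `penrose_conditions` are made up for this example and do not come from the SAS program in this article. It compares a matrix that satisfies only the first two conditions with the Moore-Penrose inverse returned by `numpy.linalg.pinv`.

```python
import numpy as np


def penrose_conditions(A, G, tol=1e-10):
    """Return four booleans, one per Penrose condition, for a candidate G."""
    return [
        np.allclose(A @ G @ A, A, atol=tol),        # 1. A*G*A = A
        np.allclose(G @ A @ G, G, atol=tol),        # 2. G*A*G = G
        np.allclose((A @ G).T, A @ G, atol=tol),    # 3. A*G is symmetric
        np.allclose((G @ A).T, G @ A, atol=tol),    # 4. G*A is symmetric
    ]


# Hypothetical singular matrix: the second row is twice the first
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# A hand-picked G2 inverse: it satisfies conditions 1 and 2 only
G2 = np.array([[1.0, 0.0],
               [0.0, 0.0]])

# Moore-Penrose inverse; rcond is set explicitly so that the
# numerically zero singular value is discarded
MP = np.linalg.pinv(A, rcond=1e-10)

print("G2 inverse:    ", penrose_conditions(A, G2))
print("Moore-Penrose: ", penrose_conditions(A, MP))
```

For this toy matrix, the hand-picked G2 fails the two symmetry conditions, whereas the pseudoinverse satisfies all four.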

Computations with generalized inverses

In regression problems, the parameter estimates are obtained by solving the "normal equations." The normal equations are the linear system (X`*X)*b = (X`*Y), where X is the design matrix, Y is the vector of observed responses, and b is the vector of parameter estimates to be solved for. The matrix A = X`*X is symmetric. If the columns of the design matrix are linearly dependent, then A is singular. The following SAS/IML program defines a symmetric singular matrix A and a right-hand-side vector c, which you can think of as X`*Y in the regression context. The call to the DET function computes the determinant of the matrix. A zero determinant indicates that A is singular and that there are infinitely many vectors b that solve the linear system:

proc iml;
A = {100  50 20 10,
      50 106 46 23,
      20  46 56 28,
      10  23 28 14 };
c = {130, 776, 486, 243};
 
det = det(A);         /* demonstrate that A is singular */
print det;

For nonsingular matrices, you can use either the INV or the SOLVE function in SAS/IML to solve for the unique solution of the linear system. However, both functions give errors when called with a singular matrix. SAS/IML provides several ways to compute a generalized inverse, including the SWEEP function and the GINV function. The SWEEP function is an efficient way to use Gaussian elimination to solve the symmetric linear systems that arise in regression. The GINV function computes the Moore-Penrose pseudoinverse. The following SAS/IML statements compute two different solutions for the singular system A*b = c:

b1 = ginv(A)*c;       /* solution even if A is not full rank */
b2 = sweep(A)*c;
print b1 b2;

The SAS/IML language also provides a way to obtain any of the other infinitely many solutions to the singular system A*b = c. Because A is a 4 x 4 matrix of rank 3, it has a one-dimensional kernel (null space). The HOMOGEN function in SAS/IML computes a basis for the null space. That is, it computes a vector that is mapped to the zero vector by A. The following statements compute the unit basis vector for the kernel. The output shows that the vector is mapped to the zero vector:

xNull = homogen(A);   /* basis for nullspace of A */
print xNull (A*xNull)[L="A*xNull"];

All solutions to A*b = c are of the form b + α*xNull, where b is any particular solution.
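For readers who do not have SAS/IML, the family of solutions can be sketched in NumPy. This is a rough analogue under stated assumptions, not a translation of the SAS functions: `numpy.linalg.pinv` plays the role of GINV, the null-space vector is taken from the singular value decomposition instead of HOMOGEN (so its sign may differ), and NumPy has no SWEEP function, so the second particular solution below was computed by hand by fixing the last component at zero. The names `b_mp`, `b_zero_last`, and `solution` are hypothetical.

```python
import numpy as np

A = np.array([[100,  50, 20, 10],
              [ 50, 106, 46, 23],
              [ 20,  46, 56, 28],
              [ 10,  23, 28, 14]], dtype=float)
c = np.array([130, 776, 486, 243], dtype=float)

# Particular solution from the Moore-Penrose inverse; rcond is set
# explicitly so that the numerically zero singular value is discarded
b_mp = np.linalg.pinv(A, rcond=1e-10) @ c

# The right-singular vector for the smallest singular value spans the
# one-dimensional null space and has unit length
_, sing_vals, Vt = np.linalg.svd(A)
x_null = Vt[-1]

# Another particular solution (hand-computed, last component fixed at 0)
b_zero_last = np.array([-3.0, 7.0, 4.0, 0.0])


def solution(alpha):
    """Member of the solution family b_mp + alpha * x_null."""
    return b_mp + alpha * x_null


print("singular values:", sing_vals)
print("b_mp           :", b_mp)
print("A @ x_null     :", A @ x_null)
print("residual for alpha=2:", A @ solution(2.0) - c)
print("norms:", np.linalg.norm(b_mp), np.linalg.norm(b_zero_last))
```

Every vector returned by `solution` reproduces c up to rounding error, and the pseudoinverse solution has the smaller norm of the two particular solutions.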

Properties of the Moore-Penrose solution

You can verify that the Moore-Penrose matrix GINV(A) satisfies the four Penrose conditions, whereas the G2 inverse (SWEEP(A)) only satisfies the first two conditions. I mentioned that the singular system has infinitely many solutions, but the Moore-Penrose solution (b1) is unique. It turns out that the Moore-Penrose solution is the solution that has the smallest Euclidean norm. Here is a computation of the norm for three solutions to the system A*b = c:

/* GINV gives the estimate that has the smallest L2 norm: */
GINVnorm  = norm(b1);
sweepNorm = norm(b2);
/* you can add alpha*xNull to any solution to get another solution */
b3 = b1 + 2*xNull;  /* here's another solution (alpha=2) */
otherNorm = norm(b3);
print ginvNorm sweepNorm otherNorm;

Because all solutions are of the form b1 + α*xNull, where xNull is the basis for the nullspace of A, you can graph the norm of the solutions as a function of α. The graph is shown below and indicates that the Moore-Penrose solution is the minimal-norm solution:

alpha = do(-2.5, 2.5, 0.05);
norm = j(1, ncol(alpha), .);
do i = 1 to ncol(alpha);
   norm[i] = norm(b1 + alpha[i] * xNull);
end;
title "Euclidean Norm of Solutions b + alpha*xNull";
title2 "b = Solution from Moore-Penrose Inverse";
title3 "xNull = Basis for Nullspace of A";
call series(alpha, norm) 
     other="refline 0 / axis=x label='b=GINV';refline 1.78885 / axis=x label='SWEEP';";
Graph of norm of solutions to the singular system A*b=c. The norm is plotted for vectors b + alpha*x_Null where b is the Moore-Penrose solution and x_Null is a basis for the nullspace of A.
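The shape of the curve follows from a short calculation, sketched here with b_1 denoting the Moore-Penrose solution and x_Null the unit basis vector of the null space. The Moore-Penrose solution lies in the row space of A, which is orthogonal to the null space, so the cross term vanishes:

```latex
\[
b_1^{\top} x_{\mathrm{Null}} = 0, \qquad \lVert x_{\mathrm{Null}} \rVert = 1
\]
\[
\lVert b_1 + \alpha\, x_{\mathrm{Null}} \rVert^2
  = \lVert b_1 \rVert^2
  + 2\alpha\, b_1^{\top} x_{\mathrm{Null}}
  + \alpha^2 \lVert x_{\mathrm{Null}} \rVert^2
  = \lVert b_1 \rVert^2 + \alpha^2
\]
```

The squared norm is therefore smallest at α = 0, and any other solution, such as the sweep solution marked by the second reference line, has a strictly larger norm.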

In summary, a singular linear system has infinitely many solutions. You can obtain a particular solution by using the sweep operator or by finding the Moore-Penrose solution. You can use the HOMOGEN function to obtain the full family of solutions. The Moore-Penrose solution is expensive to compute but has an interesting property: it is the solution that has the smallest Euclidean norm. The sweep solution is more efficient to compute and is used in SAS regression procedures such as PROC GLM to estimate parameters in models that include classification variables and use a GLM parameterization. The next blog post explores regression estimates in more detail.

The post Generalized inverses for matrices appeared first on The DO Loop.

November 21, 2018
 

I recently read an article that said a school in Asheville, North Carolina had the worst chickenpox outbreak in the state in 2 decades. The article was interesting, and it also let me know I had a hole in my knowledge ... "What?!? - There's a chickenpox vaccine?!?" When I [...]

The post Immunization rates in North Carolina schools appeared first on SAS Learning Post.