10月 042018
 

You can now conduct a live demo of SAS Visual Analytics on your mobile device to participants who are geographically dispersed by using the Present Screen feature, a tucked-away option in the SAS Visual Analytics app for iPad, iPhone, and Android devices. Let’s say I am looking at a report on my mobile device, and I have questions about a couple of items for my colleagues Joe and Anita, both of whom are located in two different cities. The three of us are able to see the report while I demo it, drawing their attention to specific areas of interest in the report.

Sitting in my office, or from any location where I have a Wi-Fi or cellular connection, I can use the Present Screen feature to do a live shared presentation with Joe and Anita. And I don't have to present just one report. During the live presentation, I can close a report, open a different one, perform interactions, and move around in the app between different reports.

And here’s the real beauty of this feature. Neither Joe nor Anita need to have the SAS Visual Analytics app on a mobile device, or SAS Visual Analytics running on a desktop. The only requirement for participants is that they have a mobile device (could be an iOS, Android, or Windows device), a laptop, or a desktop system.  Internet access via Wi-Fi (or a cellular connection for mobile devices), an email client for receiving an email notification, and a Web browser are necessary. VPN connectivity might be required if the participants' organization requires VPN.

Plus, you can conduct your live presentation for up to 10 people! Before we move on, note that you can also use the iOS feature, AirDrop, on your iOS device to engage participants with the Present Screen feature. This is useful if you’re in a room with a bunch of folks who have iOS devices, and you want to do live sharing.

Ready to try it? Here’s a short checklist of what you need to do a live presentation of SAS Visual Analytics reports.

Requirements for the presenter

  • Use any one of these devices to present your screen for a shared presentation to your participants: iPad, iPhone, Android tablet or smartphone
  • SAS Visual Analytics app installed on your device
  • Connection via Wi-Fi or cellular to the server where the report(s) reside
  • Subscription to the reports that you want to present and share
  • Email client on the iPad or phone
  • VPN connectivity if your organization requires it

Couple of things to note

Say you're presenting to 10 participants via email or AirDrop.  Note that as the presenter, you must have the SAS Visual Analytics app open in your device with the Present Screen feature selected and active for your guests to be able to see your reports. Once you exit the app, the screen presentation session ends and the email or AirDrop invitations are no longer valid.

And a note on participant information. After a participant forwards an email invitation to view the presentation to a colleague, the recipient is required to enter a name and email address before joining your live screen presentation. You, as the presenter, get to see the names of the folks that have joined your screen presentation. Participant names and email addresses simply let the presenter know who all have joined to presentation. Neither the participants' names nor their email addresses are validated by the SAS Visual Analytics server.

Supported versions for SAS Visual Analytics reports

You can present and share reports with the Present Screen feature for these versions of SAS Visual Analytics:

7.3, 7.4, 8.1, 8.2, 8.3

How to start the screen presentation

In the Analytics report I'm subscribed to from the SAS Visual Analytics app on my iPad, I choose Present Screen.

Choosing Present Screen in the subscribed report

I am reminded that I can present my screen to a maximum of 10 participants. I click OK.

The app prompts me to send email or choose AirDrop to present my screen to participants - I choose email.

My email client opens, and the app includes instructions along with the link that takes them to my live screen presentation. I enter the email addresses for my participants and send the email.

In the email that Anita receives on her desktop PC, she taps on the link which takes her to her Web browser where my screen presentation is set to start soon.

Anita enters her name and email address, and notes that the presentation has not yet started.

Joe, who has logged on to the presentation from his Android phone, is also presented with the same message on his smartphone.

To begin the screen presentation, I tap on the blinking cursor in the app.

The app reminds me that now my participants can see everything on my iPad screen.

Next, a blue bar at the top of the report indicates that my screen presentation is live and can be seen by Anita and Joe. Now I can begin my report presentation, or exit this report and open a different report to share.

Here is my screen on the iPad Mini where I started my screen presentation:

Anita's screen on her Windows desktop monitor:

Joe's screen on the Android smartphone:

When I am finished with the presentation, I tap on the 'stop' button to end it.

A message displays to indicate that the presentation has ended. Here's an example of that message from Joe's Android smartphone.

Do it live! How to present your screen from the SAS Visual Analytics app was published on SAS Users.

10月 032018
 

It is sometimes necessary for researchers to simulate data with thousands of variables. It is easy to simulate thousands of uncorrelated variables, but more difficult to simulate thousands of correlated variables. For that, you can generate a correlation matrix that has special properties, such as a Toeplitz matrix or a first-order autoregressive (AR(1)) correlation matrix. I have previously written about how to generate a large Toeplitz matrix in SAS. This article describes three useful results for an AR(1) correlation matrix:

  • How to generate an AR(1) correlation matrix in the SAS/IML language
  • How to use a formula to compute the explicit Cholesky root of an AR(1) correlation matrix.
  • How to efficiently simulate multivariate normal variables with AR(1) correlation.

Generate an AR(1) correlation matrix in SAS

The AR(1) correlation structure is used in statistics to model observations that have correlated errors. (For example, see the documentation of PROC MIXED in SAS.) If Σ is AR(1) correlation matrix, then its elements are constant along diagonals. The (i,j)th element of an AR(1) correlation matrix has the form Σ[i,j] = ρ|i – j|, where ρ is a constant that determines the geometric rate at which correlations decay between successive time intervals. The exponent for each term is the L1 distance between the row number and the column number. As I have shown in a previous article, you can use the DISTANCE function in SAS/IML 14.3 to quickly evaluate functions that depend on the distance between two sets of points. Consequently, the following SAS/IML function computes the correlation matrix for a p x p AR(1) matrix:

proc iml;
/* return p x p matrix whose (i,j)th element is rho^|i - j| */
start AR1Corr(rho, p); 
   return rho##distance(T(1:p), T(1:p), "L1");
finish;
 
/* test on 10 x 10 matrix with rho = 0.8 */
rho = 0.8;  p = 10;
Sigma = AR1Corr(rho, p);
print Sigma[format=Best7.];

A formula for the Cholesky root of an AR(1) correlation matrix

Every covariance matrix has a Cholesky decomposition, which represents the matrix as the crossproduct of a triangular matrix with itself: Σ = RTR, where R is upper triangular. In SAS/IML, you can use the ROOT function to compute the Cholesky root of an arbitrary positive definite matrix. However, the AR(1) correlation matrix has an explicit formula for the Cholesky root in terms of ρ. This explicit formula does not appear to be well known by statisticians, but it is a special case of a general formula developed by V. Madar (Section 5.1, 2016), who presented a poster at a Southeast SAS Users Group (SESUG) meeting a few years ago. An explicit formula means that you can compute the Cholesky root matrix, R, in a direct and efficient manner, as follows:

/* direct computation of Cholesky root for an AR(1) matrix */
start AR1Root(rho, p);
   R = j(p,p,0);                /* allocate p x p matrix */
   R[1,] = rho##(0:p-1);        /* formula for 1st row */
   c = sqrt(1 - rho**2);        /* scaling factor: c^2 + rho^2 = 1 */
   R2 = c * R[1,];              /* formula for 2nd row */
   do j = 2 to p;               /* shift elements in 2nd row for remaining rows */
      R[j, j:p] = R2[,1:p-j+1]; 
   end;
   return R;
finish;
 
R = AR1Root(rho, p);   /* compare to R = root(Sigma), which requires forming Sigma */
print R[L="Cholesky Root" format=Best7.];

You can compute an AR(1) covariance matrix from the correlation matrix by multiplying the correlation matrix by a positive scalar, σ2.

Efficient simulation of multivariate normal variables with AR(1) correlation

An efficient way to simulate data from a multivariate normal population with covariance Σ is to use the Cholesky decomposition to induce correlation among a set of uncorrelated normal variates. This is the technique used by the RandNormal function in SAS/IML software. Internally, the RandNormal function calls the ROOT function, which can compute the Cholesky root of an arbitrary positive definite matrix.

When there are thousands of variables, the Cholesky decomposition might take a second or more. If you call the RandNormal function thousands of times during a simulation study, you pay that one-second penalty during each call. For the AR(1) covariance structure, you can use the explicit formula for the Cholesky root to save a considerable amount of time. You also do not need to explicitly form the p x p matrix, Σ, which saves RAM. The following SAS/IML function is an efficient way to simulate N observations from a p-dimensional multivariate normal distribution that has an AR(1) correlation structure with parameter ρ:

/* simulate multivariate normal data from a population with AR(1) correlation */
start RandNormalAR1( N, Mean, rho );
    mMean = rowvec(Mean);
    p = ncol(mMean);
    U = AR1Root(rho, p);      /* use explicit formula instead of ROOT(Sigma) */
    Z = j(N,p);
    call randgen(Z,'NORMAL');
    return (mMean + Z*U);
finish;
 
call randseed(12345);
p = 1000;                  /* big matrix */
mean = j(1, p, 0);         /* mean of MVN distribution */
 
/* simulate data from MVN distribs with different values of rho */
v = do(0.01, 0.99, 0.01);  /* choose rho from list 0.01, 0.02, ..., 0.99 */
t0 = time();               /* time it! */
do i = 1 to ncol(v);
   rho = v[i];
   X = randnormalAR1(500, mean, rho);  /* simulate 500 obs from MVN with p vars */
end;
t_SimMVN = time() - t0;     /* total time to simulate all data */
print t_SimMVN;

The previous loop generates a sample that contains N=500 observations and p=1000 variables. Each sample is from a multivariate normal distribution that has an AR(1) correlation, but each sample is generated for a different value of ρ, where ρ = 0.01. 0.02, ..., 0.99. On my desktop computer, this simulation of 100 correlated samples takes about 4 seconds. This is about 25% of the time for the same simulation that explicitly forms the AR(1) correlation matrix and calls RandNormal.

In summary, the AR(1) correlation matrix is an easy way to generate a symmetric positive definite matrix. You can use the DISTANCE function in SAS/IML 14.3 to create such a matrix, but for some applications you might only require the Cholesky root of the matrix. The AR(1) correlation matrix has an explicit Cholesky root that you can use to speed up simulation studies such as generating samples from a multivariate normal distribution that has an AR(1) correlation.

The post Fast simulation of multivariate normal data with an AR(1) correlation structure appeared first on The DO Loop.

10月 022018
 

For those new to SAS Customer Intelligence 360, its functionality can seem vast, but I found it easy to jump in and get started. With this cloud-based software you can combine multiple data sources to create a full picture of your site visitors and better know your customers. SAS Customer [...]

Using SAS at SAS: A new user's perspective on SAS Customer Intelligence 360 was published on Customer Intelligence Blog.

10月 022018
 

For those new to SAS Customer Intelligence 360, its functionality can seem vast, but I found it easy to jump in and get started. With this cloud-based software you can combine multiple data sources to create a full picture of your site visitors and better know your customers. SAS Customer [...]

Using SAS at SAS: A new user's perspective on SAS Customer Intelligence 360 was published on Customer Intelligence Blog.

10月 022018
 

Over the years, the US has drilled for crude oil in several locations, such as Pennsylvania, Texas, Alaska, and the Gulf of Mexico. A few years ago, as the US started drilling more in North Dakota, there were forecasts that we would surpass Saudi Arabia in crude oil production. And recently, [...]

The post Does the US really produce the most oil now?!? appeared first on SAS Learning Post.

10月 022018
 

Once a disaster is over, and the frenzy of news stories and social media posts has subsided, it can seem like the crisis has passed. However, for those Hurricane Florence survivors left with ruined homes and businesses, damaged schools and buildings, there remains a struggle to return to normalcy. As [...]

SAS mobilizes volunteers, analytics experts to support Hurricane Florence relief was published on SAS Voices by I-sah Hsieh

10月 022018
 

I often get asked for programming tips. Here, I share three of my favorite tips for beginners. Tip #1: COUNTC and CATS Functions Together The CATS function concatenates all of its arguments after it strips leading and trailing blanks. The COUNTC function counts characters. Together, they can let you operate [...]

The post Three of My Favorite Programming Tips appeared first on SAS Learning Post.

10月 012018
 

The solar farm at SAS world headquarters is a treasure trove of data. Jessica Peter, Senior User Experience Designer at SAS, had an idea about using that treasure in an art installation to show how data can tell a story. Her idea became a reality when she and others at SAS [...]

What can you learn from an AI art installation? was published on SAS Voices by Lane Whatley

10月 012018
 

Programmers on a SAS discussion forum recently asked about the chi-square test for proportions as implemented in PROC FREQ in SAS. One person asked the basic question, "how do I test the null hypothesis that the observed proportions are equal to a set of known proportions?" Another person said that the null hypothesis was rejected for his data, and he wanted to know which categories were "responsible for the rejection." This article answers both questions and points out a potential pitfall when you specify the proportions for a chi-square goodness-of-fit test in PROC FREQ.

The basic idea: The proportion of party affiliations for a group of voters

To make these questions concrete, let's look at some example data. According to a 2016 Pew research study, the party affiliation of registered voters in the US in 2016 was as follows: 33% of voters registered as Democrats, 29% registered as Republicans, 34% were Independents, and 4% registered as some other party. If you have a sample of registered voters, you might want to ask whether the observed proportion of affiliations matches the national averages. The following SAS data step defines the observed frequencies for a hypothetical sample of 300 voters:

data Politics;
length Party $5;
input Party $ Count;
datalines;
Dem   125
Repub  79
Indep  86
Other  10
;

You can use the TESTP= option on the TABLES statement in PROC FREQ to compare the observed proportions with the national averages for US voters. You might assume that the following statements perform the test, but there is a potential pitfall. The following statements contain a subtle error:

proc freq data=Politics;
/* National Pct:    D=33%, R=29%, I=34%, Other=04% */
tables Party / TestP=(0.33 0.29 0.34 0.04) nocum; /* WARNING: Contains an error! */
weight Count;
run;

If you look carefully at the OneWayFreqs table that is produced, you will see that the test proportions that appear in the fourth column are not the proportions that we intended to specify! The problem is that order of the categories in the table is alphabetical whereas the proportions in the LISTP= option correspond to the order that the categories appear in the data. In an effort to prevent this mistake, the documentation for the TESTP= option warns you to "order the values to match the order in which the corresponding variable levels appear in the one-way frequency table." The order of categories is important in many SAS procedures, so always think about the order! (The ESTIMATE and CONTRAST statements in linear regression procedures are other statements where order is important.)

Specify the test proportions correctly

To specify the correct order, you have two options: (1) list the proportions for the TESTP= option according to the alphabetical order of the categories, or (2) use the ORDER=DATA option on the PROC FREQ statement to tell the procedure to use the order of the categories as they appear in the data. The following statement uses the ORDER=DATA option to specify the proportions:

proc freq data=Politics ORDER=DATA;   /* list proportions in DATA order */
/*                 D=33%, R=29%, I=34%, Other=04% */
tables Party / TestP=(0.33 0.29 0.34 0.04);  /* cellchi2 not available for one-way tables */
weight Count;
ods output OneWayFreqs=FreqOut;
output out=FreqStats N ChiSq;
run;

The analysis is now correctly specified. The chi-square table indicates that the observed proportions are significantly different from the national averages at the α = 0.05 significance level.

Which categories are "responsible" for rejecting the null hypothesis?

A SAS programmer posted a similar analysis on a discussion and asked whether it was possible to determine which categories were the most different from the specified proportions. The analysis shows that the chi-square test rejects the null hypothesis, but does not indicate whether only one category is different than expected or whether many categories are different.

Interestingly, PROC FREQ supports such an option for two-way tables when the null hypothesis is the independence of the two variables. Recall that the chi-square statistic is a sum of squares, where each cell in the table contributes one squared value to the sum. The CELLCHI2 option on the TABLES statement "displays each table cell’s contribution to the Pearson chi-square statistic.... The cell chi-square is computed as
(frequencyexpected)2 / expected
where frequency is the table cell frequency (count) and expected is the expected cell frequency" under the null hypothesis.

Although the option is not supported for one-way tables, it is straightforward to use the DATA step to compute each cell's contribution. The previous call to PROC FREQ used the ODS OUTPUT statement to write the OneWayFreqs table to a SAS data set. It also wrote a data set that contains the sample size and the chi-square statistic. You can use these statistics as follows:

/* create macro variables for sample size and chi-square statistic */
data _NULL_;
   set FreqStats;
   call symputx("NumObs", N);         
   call symputx("TotalChiSq", _PCHI_);
run;
 
/* compute the proportion of chi-square statistic that is contributed
   by each cell in the one-way table */
data Chi2;
   set FreqOut;
   ExpectedFreq = &NumObs * TestPercent / 100;
   Deviation = Frequency - ExpectedFreq;
   ChiSqContrib = Deviation**2 / ExpectedFreq;  /* (O - E)^2 / E */
   ChiSqPropor = ChiSqContrib / &TotalChiSq;    /* proportion of chi-square contributed by this cell */
   format ChiSqPropor 5.3;
run;
 
proc print data=Chi2; 
   var Party Frequency TestPercent ExpectedFreq Deviation ChiSqContrib ChiSqPropor; 
run;

The table shows the numbers used to compute the chi-square statistic. For each category of the PARTY variable, the table shows the expected frequencies, the deviations from the expected frequencies, and the chi-square term for each category. The last column is the proportion of the total chi-square statistic for each category. You can see that the 'Dem' category contributes the greatest proportion. The interpretation is that the observed count of the 'Dem' group is much greater than expected and this is the primary reason why the null hypothesis is rejected.

You can also create a bar chart that shows the contributions to the chi-square statistic. You can create the "chi-square contribution plot" by using the following statements:

title "Proportion of Chi-Square Statistic for Each Category";
proc sgplot data=Chi2;
   vbar Party / response=ChiSqPropor datalabel=ChiSqPropor;
   xaxis discreteorder=data;
   yaxis label="Proportion of Chi-Square Statistic" grid;
run;
Contribution of each cell to the chi-square statistic

The bar chart makes it clear that the frequency of the 'Dem' group is the primary factor in the size of the chi-square statistic. The "chi-square contribution plot" is a visual companion to the Deviation Plot, which is produced automatically by PROC FREQ when you specify the PLOTS=DEVIATIONPLOT option. The Deviation Plot shows whether the counts for each category are more than expected or less than expected. When you combine the two plots, you can make pronouncements like Goldilocks:

  • The 'Dem' group contributes the most to the chi-square statistic because the observed counts are "too big."
  • The 'Indep' group contributes a moderate amount because the counts are "too small."
  • The remaining groups do not contribute much because their counts are "just right."

Summary

In summary, this article addresses three topics related to testing the proportions of counts in a one-way frequency table. You can use the TESTP= option to specify the proportions for the null hypothesis. Be sure that you specify the proportions in the same order that they appear in the OneWayFreqs table. (The ORDER=DATA option is sometimes useful for this.) If the data proportions do not fit the null hypothesis, you might want to know why. One way to answer this question is to compute the contributions of each category to the total chi-square computation. This article shows how to display that information in a table or in a bar chart.

The post Chi-square tests for proportions in one-way tables appeared first on The DO Loop.