8月 202018
 

Using small multiples is a neat way to display a lot of information in a small amount of space. But depending on how deeply you want to analyze and scrutinize the data, you need to be careful in choosing just how small you make your small multiples. Let's look at [...]

The post Most common jobs in the US - a small multiples comparison appeared first on SAS Learning Post.

8月 202018
 

I love data; I’m a real and unabashed data geek. I'm the sci-fi nerd who has fun with data from Star Wars and analyzes World of Warcraft logs using SAS. More importantly, I love what data can do. I love the way it can show people new insights and new ideas, [...]

Having fun, getting involved and feeling useful: Data for Good was published on SAS Voices by Mary Osborne

8月 202018
 
Video killed the radio star....
We can't rewind, we've gone too far.
      -- The Buggles (1979)

"You kids have it easy," my father used to tell me. "When I was a kid, I didn't have all the conveniences you have today." He's right, and I could say the same thing to my kids, especially about today's handheld technology. In particular, I've noticed that powerful handheld technology (especially the modern calculator) has killed the standard probability tables that were once ubiquitous in introductory statistics textbooks.

In my first probability and statistics course, I constantly referenced the 23 statistical tables (which occupied 44 pages!) in the appendix of my undergraduate textbook. Any time I needed to compute a probability or test a hypothesis, I would flip to a table of probabilities for the normal, t, chi-square, or F distribution and use it to compute a probability (area) or quantile (critical value). If the value I needed wasn't tabulated, I had to manually perform linear interpolation from two tabulated values. I had no choice: my calculator did not have support for these advanced functions.

In contrast, kids today have it easy! When my son took AP statistics in high school, his handheld calculator (a TI-84, which costs about $100) could compute the PDF, CDF, and quantiles of all the important probability distributions. Consequently, his textbook did not include an appendix of statistical tables.

It makes sense that publishers would choose to omit these tables, just as my own high school textbooks excluded the trig and logarithm tables that were prevalent in my father's youth. When handheld technology can reproduce all the numbers in a table, why waste the ink and paper?

In fact, by using SAS software, you can generate and display a statistical table with only a few statements. To illustrate this, let's use SAS to generate two common statistical tables: a normal probability table and a table of critical values for the chi-square statistic.

A normal probability table

A normal probability table gives an area under the standard normal density curve. As discussed in a Wikipedia article about the standard normal table, there are three equivalent kinds of tables, but I will use SAS to produce the first table on the list. Given a standardized z-score, z > 0, the table gives the probability that a standard normal random variate is in the interval (0, z). That is, the table gives P(0 < Z < z) for a standard normal random variable Z. The graph below shows the shaded area that is given in the body of the table.

You can create the table by using the SAS DATA step, but I'll use SAS/IML software. The rows of the table indicate the z-score to the first decimal place. The columns of the table indicate the second decimal place of the z-score. The key to creating the table is to recall that you can call any Base SAS function from SAS/IML, and you can use vectors of parameters. In particular, the following statements use the EXPANDGRID function to generate all two-decimal z-scores in the range [0, 3.4]. The program then calls the CDF function to evaluate the probability P(Z < z) and subtracts 1/2 to obtain the probability P(0 < Z < z). The SHAPE function reshapes the vector of probabilities into a 10-column table. Finally, the PUTN function converts the column and row headers into character values that are printed at the top and side of the table.

proc iml;
/* normal probability table similar to 
   https://en.wikipedia.org/wiki/Standard_normal_table#Table_examples  */
z1 = do(0, 3.4, 0.1);          /* z-score to first decimal place */
z2 = do(0, 0.09, 0.01);        /* second decimal place */
z = expandgrid(z1, z2)[,+];    /* sum of all pairs from of z1 and z2 values */
p = cdf("Normal", z) - 0.5;    /* P( 0 < Z < z ) */
Prob = shape( p, ncol(z1) );   /* reshape into table with 10 columns */
 
z1Lab = putn(z1, "3.1");       /* formatted values of z1 for row headers */
z2Lab = putn(z2, "4.2");       /* formatted values of z2 for col headers */
print Prob[r=z1Lab c=z2Lab F=6.4
           L="Standard Normal Table for P( 0 < Z < z )"];
Standard Normal Probability Table

To find the probability between 0 and z=0.67, find the row for z=0.6 and then move over to the column labeled 0.07. The value of that cell is P(0 < Z < 0.67) = 0.2486.

Chi-square table of critical values

Some statistical tables display critical values for a test statistic instead of probabilities. This section shows how to construct a table of the critical values of a chi-square test statistic for common significance levels (α). The rows of the table correspond to degrees of freedom; the columns correspond to significance levels. The following graph shows the situation. The shaded area corresponds to significance levels.

The corresponding table provides the quantile of the distribution that corresponds to each significance level. Again, you can use the DATA step, but I have chosen to use SAS/IML software to generate the table:

/* table of critical values for the chi-square distribution
   https://flylib.com/books/3/287/1/html/2/images/xatab02.jpg */
df = (1:30) || do(40,100,10);   /* degrees of freedom */
/* significance levels for common one- or two-sided tests */
alpha = {0.99 0.975 0.95 0.9 0.1 0.05 0.025 0.01}; 
g = expandgrid(df, alpha);                    /* all pairs of (df, alpha) values */
p = quantile("ChiSquare", 1 - g[,2], g[,1]);  /* compute quantiles for (df, 1-alpha) */
CriticalValues = shape( p, ncol(df) );        /* reshape into table */
 
dfLab = putn(df, "3.");                       /* formatted row headers */
pLab = putn(alpha, "5.3");                    /* formatted col headers */
print CriticalValues[rowname=dfLab colname=pLab F=6.3 
      L="Critical Values of Chi-Square Distribution"];

To illustrate how to use the table, suppose that you have computed a chi-square test statistic for 9 degrees of freedom. You want to determine if you should reject the (one-sided) null hypothesis at the α = 0.05 significance level. You trace down the table to the row for df=9 and then trace over to the column for 0.05. The value in that cell is 16.919, so you would reject the null hypothesis if your test statistic exceeds that critical value.

Just as digital downloads of songs have supplanted records and CDs, so, too, have modern handheld calculators replaced the statistical tables that used to feature prominently in introductory statistics courses. However, if you ever feel nostalgic for the days of yore, you can easily resurrect your favorite table by writing a few lines of SAS code. To be sure, there are some advanced tables (the Kolmogorov-Smirnov test comes to mind) that have not yet been replaced by a calculator key, but the simple tables are dead. Killed by the calculator.

It might be ill to speak of the dead, but I say, "good riddance"; I never liked using tables those tables anyway. What are your thoughts? If you are old enough to remember tables do you have fond memories of using them? If you learned statistics recently, are you surprised that tables were once so prevalent? Share your experiences by leaving a comment.

The post Calculators killed the standard statistical table appeared first on The DO Loop.

8月 182018
 

Get ready to have your mind blown. Whether or not you plan to attend Analytics Experience in San Diego on September 17, you'll be inspired by the speakers we have lined up to keynote the event. There's an inventor. A data scientist. A world class athlete. And a photographer. They've [...]

4 inspiring videos from Analytics Experience keynote speakers was published on SAS Voices by Alison Bolen

8月 172018
 

Data density estimation is often used in statistical analysis as well as in data mining and machine learning. Visualization of data density estimation will show the data’s characteristics like distribution, skewness and modality, etc. The most widely-used visualizations people used for data density are boxplot, histogram, kernel density estimates, and some other plots. SAS has several procedures that can create such plots. Here, I'll visualize the kernel density estimates superimposing on histogram using SAS Visual Analytics.

A histogram shows the data distribution through some continuous interval bins, and it is a very useful visualization to present the data distribution. With a histogram, we can get a rough view of the density of the values distribution. However, the bin width (or number of bins) has significant impact to the shape of a histogram and thus gives different impressions to viewers. For example, we have same data for the two below histograms, the left one with 6 bins and the right one with 4 bins. Different bin width shows different distribution for same data. In addition, histogram is not smooth enough to visually compare with the mathematical density models. Thus, many people use kernel density estimates which looks more smoothly varying in the distribution.

Kernel density estimates (KDE) is a widely-used non-parametric approach of estimating the probability density of a random variable. Non-parametric means the estimation adjusts to the observations in the data, and it is more flexible than parametric estimation. To plot KDE, we need to choose the kernel function and its bandwidth. Kernel function is used to compute kernel density estimates. Bandwidth controls the smoothness of KDE plot, which is essentially the width of the sliding window used to generate the density. SAS offers several ways to generate the kernel density estimates. Here I use the Proc UNIVARIATE to create KDE output as an example (for simplicity, I set c = SJPI to have SAS select the bandwidth by using the Sheather-Jones plug-in method), then make the corresponding visualization in SAS Visual Analytics.

Visualize the kernel density estimates using SAS code

It is straightforward to run kernel density estimates using SAS Proc UNIVARIATE. Take the variable MSRP in SASHELP.CARS dataset as an example. The min/max value of MSRP column is 10280 and 192465 respectively. I plot the histogram with 15 bins here in the example. Below is the sample codes segment I used to construct kernel density estimates of the MSRP column:

title 'Kernel density estimates of MSRP';
proc univariate data = sashelp.cars noprint;	
   histogram MSRP / kernel (c = SJPI) endpoints = 10280 to 192465 by 12145 outkernel = KDE  odstitle = title; 
run;

Run above code in SAS Studio, and we get following graph.

Visualize the kernel density estimates using SAS Visual Analytics

  1. In SAS Visual Analytics, load the SASHELP.CARS and the KDE dataset (from previous Proc UNIVARIATE) to the CAS server.
  2. Drag and drop a ‘Precision Container’ in the canvas, and put a histogram and a numeric series plot in the container.
  3. Assign corresponding data to the histogram plot: assign CARS.MSRP as histogram Measure, and ‘Frequency Percent’ as histogram Frequency; Set the options of the histogram with following settings:
    Object -> Title: No title;

    Graph Frame: Grid lines: disabled

    Histogram -> Bin range: Measure values; check the ‘Set a fixed bin count’ and set ‘Bin count’ to 15.

    X Axis options:

       Fixed minimum: 10280

       Fixed maximum: 192465

       Axis label: disabled

       Axis Line: enabled

       Tick value: enabled

    Y Axis options:

       Fixed minimum: 0

       Fixed maximum: 0.5

       Axis label: disabled

       Axis Line: disabled

       Tick value: disabled

  1. Assign corresponding KDE data to the numeric series plot. Define a calculated item: Percent as (‘Percent of Observations Per Data Unit’n / 100) with the format of ‘PERCENT12.2’, and assign it to the ‘Y axis’; assign the ‘Data Value’ to the ‘X axis.’ Now set the options of the numeric series plot with following settings:
    Object -> Title: No title;

    Style -> Line/Marker: (change the first color to purple)

    Graph Frame -> Grid lines: disabled

    Series -> Line thickness: 2

    X Axis options:

       Axis label: disabled

       Axis Line: disabled

       Tick value: disabled

    Y Axis options:

       Fixed minimum: 0

       Fixed maximum: 0.5

       Axis label: enabled

       Axis Line: enabled

       Tick value: enabled

    Legend:

       Visibility: Off

  1. Now we can start to overlay the two charts. As can be seen in the screenshot below, SAS Visual Analytics 8.3 provides a smart guide with precision container, which shows grids to help you align the objects in it. If you hold the ctrl button while dragging the numeric series plot to overlay the histogram, some fine grids displayed by the smart guide to help you with basic alignment. It is a little tricky though, to make the overlay precisely, you may fine tune the value of the Left/Top/Width/Height in the Layout of VA Options panel. The goal is to make the intersection of the axes coincides with each other.

After that, we can add a text object above the charts we just made, and done with the kernel density estimates superimposing on a histogram shown in below screenshot, similarly as we got from SAS Proc UNIVARIATE. (If you'd like to use PROC KDE UNIVAR statement for data density estimates, you can visualize it in SAS Visual Analytics in a similar way.)

To go further, I make a KDE with a scatter plot where we can also get impression of the data density with those little circles; another KDE plot with a needle plot where the data density is also represented by the barcode-like lines. Both are created in similar ways as described in above histogram example.

So far, I’ve shown you how I visualize KDE using SAS Visual Analytics. There are other approaches to visualize the kernel density estimates in SAS Visual Analytics, for example, you may create a custom graph in Graph Builder and import it into SAS Visual Analytics to do the visualization. Anyway, KDE is a good visualization in helping you understand more about your data. Why not give a try?

Visualizing kernel density estimates in SAS Visual Analytics was published on SAS Users.

8月 172018
 

When I'm at a social gathering, someone always asks what type of work I do. I like to keep my social life separate from my work, therefore I usually give a vague answer such as "software" (and quickly change the topic). How vague or specific is your response? How vague [...]

The post What are the most common occupations in each US state? appeared first on SAS Learning Post.

8月 152018
 

SAS Viya logoSAS Viya 3.4 has some new functionality that provides real help for those who want to transition from SAS Visual Analytics on 9.4 to SAS Viya. In prior releases of SAS Viya you could promote reports and explorations (and a few other supporting objects). In SAS Viya 3.4, promotion support is added for many additional SAS 9.4 resources, making it easier to make the leap to SAS Viya. In this blog, I will review this new functionality.

In SAS Viya 3.4, the following objects participate in promotion from SAS 9.4.

  • Configuration
    • Identities
    • Authorization
    • Data definitions
  • Content
    • Folders
    • Reports
    • Explorations
    • Stored processes
    • Supporting resources (such as themes, images, graphs templates)

The details of support for each resource are unique and are discussed below.

Identities

User and group promotion from SAS 9.4 to SAS Viya is used to support the transition to the target environment of authorization settings that are associated with content.  Metadata is exported to support the mapping of SAS 9.4 identity metadata (Users and Groups) to SAS Viya identities (Users, Groups and Custom groups).

During promotion of identity metadata:

  • Users connections are mapped using metadata DefaultAuth:logonid to SAS Viya identity id
  • Metadata-only groups from SAS 9.4 are converted to SAS Viya Custom groups (except SAS General Servers and SAS System Services)
  • If custom groups of the same name (or sometimes the same purpose but a different name) exist in the target, the group is preserved and any mapped members from the source system are added to the group.

Authorization

Identities are “promoted” to support re-implementation of authorization. You do not have to explicitly export authorization as it is included with libraries, tables, folders and reports when they are exported. Promotion of authorization is optional. If you don’t wish to include authorization, but rather re-implement it in
SAS Viya, you can switch this functionality off at import time.

SAS Viya has two authorization systems, the general authorization system for folders and content, and the CAS authorization system for data. These authorization systems are different than the metadata authorization model in SAS 9.4. So what happens when you promote content that includes authorization?

General Authorization (folders and content)

Promotion will attempt to convert SAS 9.4 authorization to rules in the General authorization system.  During the process:

  • Explicit Access Control Entries are converted to SAS Viya Rules
  • Access Control Entries with denials are discarded
  • Access Control Templates are not promoted

In addition, if an object (folder/report):

  • does not exist in the target environment,relevant authorization is set for the object and the access control entries from the source are implemented as rules on the object.
  • existsin the target environment, then access control entries from the source are merged with any pre-existing authorizations in the target environment.

CAS Authorization

The CAS authorization system covers CASlibs and data.  Promotion will attempt to convert SAS 9.4 authorization on libraries and tables to access controls in the CAS authorization system. During the process:

  • Access Control Entries are not promoted unless they are applied directly to a library or table.
  • Access Control Entries are converted to CAS access controls.
  • Row-level permissions are preserved.
  • If an object exists in the target environment no authorization settings are imported.
  • Access Control Templates are not promoted.

For details of how individual permissions for both data and content are mapped from SAS 9.4 to SAS Viya see the documentation has great coverage of the steps to follow.

The Process

To finish off, I'll share few observations on the process of exporting from 9.4 and importing in SAS Viya. Like SAS 9.4 promotion, you need to import in a specific order. This allows the software to make the relevant connections to dependent resources. For example, if the CASLIB already exists in the target, then import tables can be mapped to it. Typically, the order is: identities > library definitions > tables > reports and folders. To support this process, make sure, during export, you have a separate package for each resource type. Some considerations for the export process.

You should export:

  • Identities (users and groups) from the security folders in SAS 9.4 metadata to a separate package.
  • Only groups that you need in the target environment (you can subset any irrelevant SAS 9 groups at export time).
  • LASR and Base Libraries and tables directly from the library definition in the folder tree (this prevents extraneous folders being created in the target environment).
  • Libraries in a separate package from tables so that they may be imported first and be available for mapping when the tables are imported.
  • Content and reports from the base of the folder tree so that all directly applied access control entries will be included in the package.

Prior to importing, make sure that users and groups are configured correctly in LDAP. As I already mentioned, physical data is not promoted so ensure that required data and formats are accessible to the SAS Viya environment.

The new functionality for promotion is a great start in helping with the transition from SAS 9.4 to SAS Viya. Look for more functionality in future releases.

New functionality for transitioning from SAS Visual Analytics on 9.4 to SAS Viya was published on SAS Users.

8月 152018
 

What if? Two simple words. A bold question. Driver of change and progress. Fuel for innovation. “What if?” perfectly captures core values of the SAS culture: to be curious, forward-thinking and to challenge assumptions in solving problems. Today, I would like to introduce to you the people behind four of [...]

Curiosity: The force that drives innovation in SAS® technologies was published on SAS Voices by Oliver Schabenberger

8月 152018
 

What if? Two simple words. A bold question. Driver of change and progress. Fuel for innovation. “What if?” perfectly captures core values of the SAS culture: to be curious, forward-thinking and to challenge assumptions in solving problems. Today, I would like to introduce to you the people behind four of [...]

Curiosity: The force that drives innovation in SAS® technologies was published on SAS Voices by Oliver Schabenberger