12月 092019
 

Building on my last post, How to create checklist tables in SAS®, this one shows you how to compare SAS data Check mark and cross mark sets that include common and uncommon columns. You'll learn how to visualize side-by-side columns commonalities and differences in data tables.

As before, we're working with a comparison matrix (aka checklist table) where check-marks / x-marks indicate included / excluded columns.

Data tables will be comparable products while their columns (variables) will represent product features. We'll add background color to highlight which attributes are different in the common columns. Since there might be several different attributes for a given column, we will use a hierarchy typelengthlabel to indicate only the highest mismatched level of hierarchy. For example:

  • If same-named columns have different type (Numeric vs. Character), their corresponding check-mark will be shown on a light-red background, which indicates the highest degree of mismatch.
  • If same-named columns have the same type, a yellow background will indicate any difference in variables length.
  • When same-named variables type and length match, a light-blue background marks any difference in variables label.

SAS code to create color-enhanced comparison matrix

Let’s compare variable attributes in two data tables: one is SAS-supplied SASHELP.CARS, and another WORK.NEWCARS that I derive from the first one, slightly scrambling its column definitions:

data WORK.NEWCARS (drop=temp:);
   set SASHELP.CARS (rename=(Origin=Region EngineSize=temp1 Make=temp2));
   length EngineSize $3 Make $20;
   EngineSize = put(temp1,3.1);
   Make = temp2; 
   label Type='New Car Type';
run;

In this NEWCARS data table, I did the following:

  • Replaced column name Origin with Region
  • Changed type of column EngineSize from Numeric to Character
  • Changed length of column Make from $13 to $20
  • Changed label of column Type from blank to “New Car Type”

Now let’s build the comparison matrix:

proc contents data=SASHELP.CARS noprint out=DS1(keep=Name Type Length Label);
run;
 
proc contents data=WORK.NEWCARS noprint out=DS2(keep=Name Type Length Label);
run;
 
data comparison_matrix;
   merge
      DS1(in=in1 rename=(Type=Typ1 Length=Len1 Label=Lab1))
      DS2(in=in2 rename=(Type=Typ2 Length=Len2 Label=Lab2));
   by Name;
 
   /* set symbol shape: 1=V; 0=X */
   ds1 = 1; ds2 = 1;
   if in1 and not in2 then ds2 = 0; else
   if in2 and not in1 then ds1 = 0;
 
   /* add background color */
   if ds1=ds2=1 then
   select;
      when(Typ1^=Typ2) do; ds1=2; ds2=2; end;
      when(Len1^=Len2) do; ds1=3; ds2=3; end;
      when(Lab1^=Lab2) do; ds1=4; ds2=4; end;
      otherwise; 
   end;
 
   label
      Name = 'Column Name'
      ds1 = 'SASHELP.CARS'
      ds2 = 'WORK.NEWCARS'
      ;
run;
 
proc format;
   value chmark
      0   = '(*ESC*){unicode "2718"x}'
      1-4 = '(*ESC*){unicode "2714"x}'
      ;
   value chcolor
      0   = red
      1-4 = green
      ;
   value bgcolor
      2 = 'cxffccbb'
      3 = 'cxffe177'
      4 = 'cxd4f8d4' 
      ;
run;
 
ods html path='c:\temp' file='comp_marix.html' style=Seaside;
ods escapechar='^';
title 'Data set columns comparison matrix';
 
proc odstext;
   p '<div align="center">Mismatch Legend:'||
     '<span style="background-color:#ffccbb;margin-left:17px">^_^_^_^_</span> Type'||
     '<span style="background-color:#ffe177;margin-left:17px">^_^_^_^_</span> Length'||
     '<span style="background-color:#d4f8d4;margin-left:17px">^_^_^_^_</span> Label</div>'
   / style=[fontsize=9pt];
run;
 
title; 
proc print data=comparison_matrix label noobs;
   var Name / style={fontweight=bold width=100px};
   var ds1 ds2 / style={color=chcolor. backgroundcolor=bgcolor. just=center fontweight=bold width=120px};
   format ds1 ds2 chmark.;
run;
 
ods html close;

Here is a brief explanation of the code:

  1. Two PROC CONTENTS produce alphabetical lists (as datasets) of the data table column names, as well as their attributes (type, length, label)
  2. The DATA STEP merges these 2 lists and creates DS1 and DS2 variables indicating common name (values 1, 2, 3, 4) or uncommon name (value 0).
  3. PROC FORMAT creates 3 user-defined formats chmark, chcolor, bgcolor responsible for checkmark shape, checkmark color, and background color respectively. For checkmark shape, we use Unicode characters, and for colors we use both, color names (e.g. red, green) and hexadecimal RGB color notations (e.g. 'cxFFCCBB').
  4. PROC ODSTEXT’s P statement is used to display color legend for the comparison matrix.
  5. Finally, PROC PRINT with user-defined formats produces our color-enhanced comparison matrix.

Data tables comparison matrix – OUTPUT

The above code will generate the following HTML output with the comparison matrix for variables in two data sets:

Comparison matrix for common/uncommon variables in 2 datasets

Adding more detail to the comparison matrix chart

We can further enhance our output comparison matrix by adding detailed descriptive information about differences between variable attributes. For comprehensive view, we can add a COMMENTS column that spells out differences (attributes mismatches). In addition to the hierarchical logic defining only one mismatch of the highest degree indicated by color highlighting above, comments can include all found discrepancies. Simply add the following two pieces of SAS code:

1. Add the following group of statements to the above DATA Step (right after SELECT statement):

 length Comments $200;
   if ds1>1 then
   do;
      if Typ1^=Typ2 then Comments = catx(' ', Comments, 'Type1=',   Typ1, '; Type2=',   Typ2, ';');
      if Len1^=Len2 then Comments = catx(' ', Comments, 'Length1=', Len1, '; Length2=', Len2, ';');
      if Lab1^=Lab2 then Comments = catx(' ', Comments, 'Label1=',  Lab1, '; Label2=',  Lab2, ';');
   end;

Depending on your needs this Comments can be added unconditionally – you would just need to remove IF-THEN logic keeping only:

length Comments $200;
Comments = catx(' ', Comments, 'Type1=',   Typ1, '; Type2=',   Typ2, ';');
Comments = catx(' ', Comments, 'Length1=', Len1, '; Length2=', Len2, ';');
Comments = catx(' ', Comments, 'Label1=',  Lab1, '; Label2=',  Lab2, ';');

2. Add the following statement to the above PROC PRINT (right before the FORMAT statement):

var comments / style={width=250px};

Then your HTML output will look as follows:

Detailed comparison matrix for common/uncommon variables in 2 datasets

Conclusion

Comparison matrix charts are a convenient tool for data development and metadata validation when you're comparing a data table’s metadata against requirements descriptions.

It allows us to quickly identify tables’ common and uncommon variables, as well as common variable inconsistencies by type, length and other attributes, such as labels and formats.

We can easily add detailed descriptive information when needed.

On a related note

While this post focused on visualizing SAS data sets comparison vis-à-vis common and uncommon columns, it's worth noting SAS websites have plenty of info on finding common variables (or columns) in data sets. For example:

Your thoughts?

Do you find this material useful? What other usages of the checklist tables and color-enhanced comparison matrices can you suggest?

How to compare SAS data tables for common/uncommon columns was published on SAS Users.

12月 092019
 

Recently I showed how to visualize and analyze longitudinal data in which subjects are measured at multiple time points. A very common situation is that the data are collected at two time points. For example, in medicine it is very common to measure some quantity (blood pressure, cholesterol, white-blood cell count,...) for each patient before a treatment and then measure the same quantity after an intervention.

This situation leads to pairs of observations. It is useful to visualize the pre/post measurements in a spaghetti plot, a scatter plot, or a histogram of the differences. This article points out that you do not need to create these plots manually: the TTEST procedure can produce all these plots automatically, as well as provide a statistical analysis of paired data.

Visualize pre/post data

When given measurements before and after an intervention (sometimes called "pre/post data"), I like to create a scatter plot of the "before" variable versus the "after" variable. To demonstrate, the following SAS statements define a subset of the data that were analyzed in the previous article. The subset contains 50 children who had high levels of lead in their blood. Lead is toxic, especially in children. The children were given a compound (called succimer) to reduce the blood-lead level and the levels were re-measured one week later.

data PrePost;
   input id lead0 lead1 @@;
   label lead0 = "Baseline Blood Lead Level (mcg/dL)"
         lead1 = "Post-Treatment Blood Lead Level (mcg/dL)";
datalines;
  2   26.5   14.8   3   25.8   23.0  5   20.4    2.8  6   20.4    5.4
 12   24.8   23.1  14   27.9    6.3 19   35.3   25.5 20   28.6   15.8
 22   29.6   15.8  23   21.5    6.5 25   21.8   12.0 26   23.0    4.2
 27   22.2   11.5  29   25.0    3.9 31   26.0   21.4 32   19.7   13.2
 36   29.6   17.5  39   24.4   16.4 40   33.7   14.9 43   26.7    6.4
 44   26.8   20.4  45   20.2   10.6 48   20.2   17.5 49   24.5   10.0
 53   27.1   14.9  54   34.7   39.0 57   24.5    5.1 64   27.7    4.0
 65   24.3   24.3  66   36.6   23.3 68   34.0   10.7 69   32.6   19.0
 70   29.2    9.2  71   26.4   15.3 72   21.8   10.6 79   21.1    5.6
 82   22.1   21.0  85   28.9   12.5 87   19.8   11.6 89   23.5    7.9
 90   29.1   16.8  91   30.3    3.5 93   30.6   28.2 94   22.4    7.1
 95   31.2   10.8  96   31.4    3.9 97   41.1   15.1 98   29.4   22.1
 99   21.9    7.6 100   20.7    8.1
;
 
title "Pre/Post Measurements";
title2 "Created Manually";
proc sgplot data=PrePost aspect=1 noautolegend;
   scatter x=Lead0 y=Lead1 / jitter;
   lineparm x=0 y=0 slope=1 / clip;
   xaxis grid;
   yaxis grid;
run;

The graph shows the initial measurement (on the X axis) versus the post-treatment measurement (on the Y axis) for each patient in the study. The diagonal line is the identity line. A marker above the line indicates a patient whose measurement increased after treatment. A marker below the line indicates a patient whose measurement decreased. For these data, most children experienced a decrease in blood-lead levels, which indicates that the treatment was effective.

Use PROC TTEST to visualize pre/post data

The goal of ODS statistical graphics in SAS procedures is to provide an analyst with standard visualizations of data that can accompany and enhance a statistical analysis. For pre/post data, a common analysis is to perform a paired t test for the mean difference between the pre- and post-intervention measurements. It turns out that if you run a PROC TTEST analysis, the output not only includes statistical tables but also three common graphs that visualize the data. If the data are in "wide form," the PROC TTEST syntax is simple:

ods graphics on;
proc ttest data=PrePost;
   paired Lead0*Lead1;
run;

Think about this syntax: When you type three PROC TTEST statements, you get a statistical analysis AND three standard visualizations! The graphs appear automatically when you turn on ODS graphics, without requiring any effort on your part.

The first graph from PROC TTEST is a histogram of the difference (pre-treatment minus post-treatment). The graph is actually a panel, with a small horizontal box plot underneath that gives another view of the distribution of the differences. You can see that the differences range from -5 to 28. The median change appears to be 13 and the interquartile range is from 8 to 18. If the differences are approximately normally distributed, the light green rectangle indicates a 95% confidence interval for the mean.

The second graph is a spaghetti plot of the patient profiles. Each line segment represents a patient. The lines connect the pre-treatment values (on the left) to the post-treatment values (on the right). You can see that most (but not all) patients experienced a decrease in blood-lead levels. The red line in the graph indicates the mean pre- and post-treatment levels.

The third graph is a version of the scatter plot that I created earlier by using PROC SGPLOT. Like my version, the graph includes a diagonal reference line. In addition, the graph displays a symbol that shows the pre- and post-treatment mean values of the response variable. The vertical distance between the mean marker and the reference line indicates the average change since the treatment.

PROC TTEST also creates a fourth graph (not shown), which is a normal Q-Q plot of the difference. This graph is useful if you are assessing the normality of the distribution of differences.

Summary

PROC TTEST creates excellent graphs that enable you to visualize pre/post data. With only a few SAS statements, you can produce three common graphs that visualize the change, presumably due to the treatment. Using PROC TTEST is quicker and easier than creating the graphs yourself. Furthermore, one of the graphs (the histogram/box plot panel) is a graph that is not easily created by using PROC SGPLOT. So next time you want to visualize paired data, reach for PROC TTEST.

The post Visualize data before and after a treatment appeared first on The DO Loop.

12月 092019
 

Recently I showed how to visualize and analyze longitudinal data in which subjects are measured at multiple time points. A very common situation is that the data are collected at two time points. For example, in medicine it is very common to measure some quantity (blood pressure, cholesterol, white-blood cell count,...) for each patient before a treatment and then measure the same quantity after an intervention.

This situation leads to pairs of observations. It is useful to visualize the pre/post measurements in a spaghetti plot, a scatter plot, or a histogram of the differences. This article points out that you do not need to create these plots manually: the TTEST procedure can produce all these plots automatically, as well as provide a statistical analysis of paired data.

Visualize pre/post data

When given measurements before and after an intervention (sometimes called "pre/post data"), I like to create a scatter plot of the "before" variable versus the "after" variable. To demonstrate, the following SAS statements define a subset of the data that were analyzed in the previous article. The subset contains 50 children who had high levels of lead in their blood. Lead is toxic, especially in children. The children were given a compound (called succimer) to reduce the blood-lead level and the levels were re-measured one week later.

data PrePost;
   input id lead0 lead1 @@;
   label lead0 = "Baseline Blood Lead Level (mcg/dL)"
         lead1 = "Post-Treatment Blood Lead Level (mcg/dL)";
datalines;
  2   26.5   14.8   3   25.8   23.0  5   20.4    2.8  6   20.4    5.4
 12   24.8   23.1  14   27.9    6.3 19   35.3   25.5 20   28.6   15.8
 22   29.6   15.8  23   21.5    6.5 25   21.8   12.0 26   23.0    4.2
 27   22.2   11.5  29   25.0    3.9 31   26.0   21.4 32   19.7   13.2
 36   29.6   17.5  39   24.4   16.4 40   33.7   14.9 43   26.7    6.4
 44   26.8   20.4  45   20.2   10.6 48   20.2   17.5 49   24.5   10.0
 53   27.1   14.9  54   34.7   39.0 57   24.5    5.1 64   27.7    4.0
 65   24.3   24.3  66   36.6   23.3 68   34.0   10.7 69   32.6   19.0
 70   29.2    9.2  71   26.4   15.3 72   21.8   10.6 79   21.1    5.6
 82   22.1   21.0  85   28.9   12.5 87   19.8   11.6 89   23.5    7.9
 90   29.1   16.8  91   30.3    3.5 93   30.6   28.2 94   22.4    7.1
 95   31.2   10.8  96   31.4    3.9 97   41.1   15.1 98   29.4   22.1
 99   21.9    7.6 100   20.7    8.1
;
 
title "Pre/Post Measurements";
title2 "Created Manually";
proc sgplot data=PrePost aspect=1 noautolegend;
   scatter x=Lead0 y=Lead1 / jitter;
   lineparm x=0 y=0 slope=1 / clip;
   xaxis grid;
   yaxis grid;
run;

The graph shows the initial measurement (on the X axis) versus the post-treatment measurement (on the Y axis) for each patient in the study. The diagonal line is the identity line. A marker above the line indicates a patient whose measurement increased after treatment. A marker below the line indicates a patient whose measurement decreased. For these data, most children experienced a decrease in blood-lead levels, which indicates that the treatment was effective.

Use PROC TTEST to visualize pre/post data

The goal of ODS statistical graphics in SAS procedures is to provide an analyst with standard visualizations of data that can accompany and enhance a statistical analysis. For pre/post data, a common analysis is to perform a paired t test for the mean difference between the pre- and post-intervention measurements. It turns out that if you run a PROC TTEST analysis, the output not only includes statistical tables but also three common graphs that visualize the data. If the data are in "wide form," the PROC TTEST syntax is simple:

ods graphics on;
proc ttest data=PrePost;
   paired Lead0*Lead1;
run;

Think about this syntax: When you type three PROC TTEST statements, you get a statistical analysis AND three standard visualizations! The graphs appear automatically when you turn on ODS graphics, without requiring any effort on your part.

The first graph from PROC TTEST is a histogram of the difference (pre-treatment minus post-treatment). The graph is actually a panel, with a small horizontal box plot underneath that gives another view of the distribution of the differences. You can see that the differences range from -5 to 28. The median change appears to be 13 and the interquartile range is from 8 to 18. If the differences are approximately normally distributed, the light green rectangle indicates a 95% confidence interval for the mean.

The second graph is a spaghetti plot of the patient profiles. Each line segment represents a patient. The lines connect the pre-treatment values (on the left) to the post-treatment values (on the right). You can see that most (but not all) patients experienced a decrease in blood-lead levels. The red line in the graph indicates the mean pre- and post-treatment levels.

The third graph is a version of the scatter plot that I created earlier by using PROC SGPLOT. Like my version, the graph includes a diagonal reference line. In addition, the graph displays a symbol that shows the pre- and post-treatment mean values of the response variable. The vertical distance between the mean marker and the reference line indicates the average change since the treatment.

PROC TTEST also creates a fourth graph (not shown), which is a normal Q-Q plot of the difference. This graph is useful if you are assessing the normality of the distribution of differences.

Summary

PROC TTEST creates excellent graphs that enable you to visualize pre/post data. With only a few SAS statements, you can produce three common graphs that visualize the change, presumably due to the treatment. Using PROC TTEST is quicker and easier than creating the graphs yourself. Furthermore, one of the graphs (the histogram/box plot panel) is a graph that is not easily created by using PROC SGPLOT. So next time you want to visualize paired data, reach for PROC TTEST.

The post Visualize data before and after a treatment appeared first on The DO Loop.

12月 062019
 

Another year, another traditional Christmas song or carol turned into a fun technology-related version! This is the sixth year and my ninth song. I hope you enjoy your 2019 holiday song, based on this famous tune. The Data Science and AI Song Computer vision processing on an open stack The [...]

The Data Science and AI Song was published on SAS Voices by David Pope

12月 052019
 

This is a second article about analyzing longitudinal data, which features measurements that are repeatedly taken on subjects at several points in time. The previous article discusses a response-profile analysis, which uses an ANOVA method to determine differences between the means of an experimental group and a placebo group. The response-profile analysis has limitations, including the fact that longitudinal data are autocorrelated and so do not satisfy the independence assumption of ANOVA. Furthermore, the method does not enable you to model the response profile of individual subjects; it produces only a mean response-by-time profile.

This article shows how a mixed model can produce a response profile for each subject. A mixed model also addresses other limitations of the response-profile analysis. This blog post is based on the introductory article, "A Primer in Longitudinal Data Analysis", by G. Fitzmaurice and C. Ravichandran (2008), Circulation, 118(19), p. 2005-2010.

The data (from Fitzmaurice and C. Ravichandran, 2008) are the blood lead levels for 100 inner-city children who were exposed to lead in their homes. Half were in an experimental group and were given a compound called succimer as treatment. The other half were in a placebo group. Blood-lead levels were measured for each child at baseline (week 0), week 1, week 4, and week 6. The data are visualized in the previous article. You can download the SAS program that creates the data and graphs in this article.

Advantages of the mixed model for longitudinal data

The main advantage of a mixed-effect model is that each subject is assumed to have his or her own mean response curve that explains how the response changes over time. The individual curves are a combination of two parts: "fixed effects," which are common to the population and shared by all subjects, and "random effects," which are specific to each subject. The term "mixed" implies that the model incorporates both fixed and random effects.

You can use a mixed model to do the following:

  1. Model the individual response-by-time curves.
  2. Model autocorrelation in the data. This is not discussed further in this blog post.
  3. Model time as a continuous variable, which is useful for data that includes mistimed observations and parametric models of time, such as a quadratic model or a piecewise linear model.

The book Applied Longitudinal Analysis (G. Fitzmaurice, N. Laird, and J. Ware, 2011, 2nd Ed.) discusses almost a dozen ways to model the data for blood-lead level in children. This blog post briefly shows how to implement three models in SAS that incorporate random intercepts. The models are the response-profile model, a quadratic model, and a piecewise linear model.

Visualize mixed models

I've previously written about how to visualize mixed models in SAS. One of the techniques is to create a spaghetti plot that shows the predicted response profiles for each subject in the study. Because we will examine three different models, the following statements define a macro that will sort the predicted values and plot them in a spaghetti plot:

%macro SortAndPlot(DSName);
proc sort data=&DSName;
   by descending Treatment ID Time;
run;
 
proc sgplot data=&DSName dattrmap=Order;
   series x=Time y=Pred / group=ID groupLC=Treatment break lineattrs=(pattern=solid)
                       attrid=Treat;
   legenditem type=line name="P" / label="Placebo (P)" lineattrs=GraphData1; 
   legenditem type=line name="A" / label="Succimer (A)" lineattrs=GraphData2; 
   keylegend "A" "P";
   xaxis values=(0 1 4 6) grid;
   yaxis label="Predicted Blood Lead Level (mcg/dL)";
run;
%mend;

A response-profile model with a random intercept

In the response-profile analysis, the data were analyzed by using PROC GLM, although these data do not satisfy the assumptions of PROC GLM. This article uses PROC MIXED in SAS/STAT software for the analyses. You can use the REPEATED statement in PROC MIXED to specify that the measurements for individuals are autocorrelated. We will use an unstructured covariance matrix for the model (TYPE=UN), but Fitzmaurice, Laird, and Ware (2011) discuss other options.

In the response-profile analysis, the model predicts the mean response for each treatment group. However, the baseline measurements for each subject are all different. For example, some start the trial with a blood-lead level that is higher than the mean, others start lower than the mean. To adjust for this variation among subjects, you can use the RANDOM INTERCEPT statement in PROC MIXED. Use the OUTPRED= option on the MODEL statement to write the predicted values for each subject to a data set, then plot the results. The following statements repeat the response-profile model of the previous blog post but include an intercept for each subject.

/* Repeat the response-profile analysis, but 
   use the RANDOM statement to add random intercept for each subject */
proc mixed data=TLCView;
   class id Time(ref='0') Treatment(ref='P');
   model y = Treatment Time Treatment*Time / s chisq outpred=MixedOut;
   repeated Time / type=un subject=id r;   /* measurements are repeated for subjects */
   random intercept / subject=id;          /* each subject gets its own intercept */
run;
 
title "Predicted Individual Growth Curves";
title2 "Random Intercept Model";
%SortAndPlot(MixedOut);
Random intercept model or response profiles

In this model, the shape of the response-profile curve is the same for all subjects in each treatment group. However, the curves are shifted up or down to better match the subject's individual profile.

A mixed model with a quadratic response curve

From the shape of the predicted response curve in the previous section, you might conjecture that a quadratic model might fit the data. You can fit a quadratic model in PROC MIXED by treating Time as a continuous variable. However, Time is also used in the REPEATED statement, and that statement requires a discrete CLASS variable. A resolution to this problem is to create a copy of the Time variable (call it T). You can include T in the CLASS and REPEATED statements and use Time in the MODEL statement as a continuous effect. The third analysis will require a linear spline variable, so the DATA VIEW also creates that variable, called T1.

/* Make "discrete time" (t) to use in REPEATED statement.
   Make spline effect with knot at t=1. */
data TLCView / view=TLCView;
set tlc;
t = Time;                       /* discrete copy of time */
T1 = ifn(Time<1, 0, Time - 1);  /* knot at Time=1 for PWL analysis */
run;

You can now use Time as a continuous effect and T to specify that measurements are repeated for the subjects.

/* Model time as continuous and use a quadratic model in Time. 
   For more about quadratic growth models, see
   https://support.sas.com/resources/papers/proceedings/proceedings/sugi27/p253-27.pdf */
proc mixed data=TLCView;
   class id t(ref='0') Treatment(ref='P');
   model y = Treatment Time Time*Time Treatment*Time / s outpred=MixedOutQuad;
   repeated t / type=un subject=id r;      /* measurements are repeated for subjects */
   random intercept / subject=id;          /* each subject gets its own intercept */
run;
 
title2 "Random Intercept; Quadratic in Time";
%SortAndPlot(MixedOutQuad);
Quadratic model for response profiles with random intercepts

The graph shows the individual predicted responses for each subject, but the quadratic model does not seem to capture the dramatic drop in the blood-lead level at T=1.

A mixed model with a piecewise linear response curve

The next model uses a piecewise linear model instead of a quadratic model. I've previously written about how to use spline effects in SAS to model data by using piecewise polynomials.

For the four time points, the mean response profile seems to go down for the experimental (succimer) group until T=1, and then increase at approximately a constant rate. Is this behavior caused by an underlying biochemical process? I don't know, but if you believe (based on domain knowledge) that this reflects an underlying process, you can incorporate that belief in a piecewise linear model. The first linear segment models the blood-lead level for T ≤ 1; the other segment models the blood-lead level for T > 1.

/* Piecewise linear (PWL) model with knot at Time=1.
   For more about PWL models, see Hwang (2015) 
   "Hands-on Tutorial for Piecewise Linear Mixed-effects Models Using SAS PROC MIXED"
   https://www.lexjansen.com/pharmasug-cn/2015/ST/PharmaSUG-China-2015-ST08.pdf    */
proc mixed data=TLCView;
   class id t(ref='0') Treatment(ref='P');
   model y = Treatment Time T1 Treatment*Time Treatment*T1 / s outpred=MixedOutPWL;
   repeated t / type=un subject=id r;      /* measurements are repeated for subjects */
   random intercept / subject=id;          /* each subject gets its own intercept */
run;
 
title2 "Random Intercept; Piecewise Linear Model";
%SortAndPlot(MixedOutPWL);
Piecewise linear model for response profiles with random intercepts

This graph shows a model that is piecewise linear. It assumes that the blood-lead level falls constantly during the first week of the treatment, then either falls or rises constantly during the remainder of the study. You could use the slopes of the lines to report the average rate of change during each time period.

Further reading

There are many papers and many books written about mixed models in SAS. This article presents data and ideas that are discussed in detail in the book Applied Longitudinal Analysis (2012, 2nd Ed) by G. Fitzmaurice, N. Laird, and J. Ware. For an informative article about piecewise-linear mixed models, see Hwang (2015) "Hands-on Tutorial for Piecewise Linear Mixed-effects Models Using SAS PROC MIXED" For a comprehensive discussion of mixed models and repeated-measures analysis, I recommend SAS for Mixed Models, either the 2nd edition or the new edition.

Many people have questions about how to model longitudinal data in SAS. Post your questions to the SAS Support Communities, which has a dedicated community for statistical analysis.

The post Longitudinal data: The mixed model appeared first on The DO Loop.

12月 042019
 

If you’re like me, you struggle to buy gifts. Most folks in my inner circle already have everything they need and most of what they want. Most folks, that is, except the tech-lovers. That’s because there’s always something new on the horizon. There’s always a new gadget or program. Or a new way to learn those things. If you know folks who fall into this category, and who love SAS, boy do I have some gift ideas for you:

Try SAS for free

If you know someone who is interested in SAS but doesn’t work with it on a daily basis, or someone looking to experiment in a different area of SAS, let them know about our free software trials. Who doesn’t love free? And you’ll look like a rock star when you point out that they can explore areas like SAS Data Preparation, SAS Visual Forecasting, SAS Visual Statistics and lots more, for free.

SAS University Edition

Let’s keep things in the vein of free, shall we? SAS University Edition includes SAS Studio, Base SAS, SAS/STAT, SAS/IML, SAS/ACCESS, and several time series forecasting procedures from SAS/ETS. It’s the same software used by sites around the world; that means it’s the most up-to-date statistical and quantitative methods. And it’s free for academic, noncommercial use.

Registration to SAS Global Forum 2020

Oh my gosh, you would be your loved one’s favorite person! I’ve been to many SAS Global Forums and I love seeing the excitement on someone’s face as they explore the booths, sit in on sessions, and network to their heart’s content. SAS Global Forum is the place to be for SAS professionals, thought leaders, decision makers, partners, students, and academics. There truly is something for everyone at this event for analytics enthusiasts. And who wouldn’t want to be in Washington, D.C. in the Spring? It’s cherry blossom time! Be sure to register before Jan. 29, 2020 to get early bird pricing and save $600 off the on-site registration fee.

SAS books

I worked for many years in SAS Press and saw, firsthand, how excited folks get over a new book. I mean, who doesn’t want the 6th edition of the Little SAS Book or the latest from Ron Cody? Browse more than 100 titles and find the perfect gift. Don’t forget, SAS Press is offering 25% off everything in the bookstore for the month of December! Use promo code HOLIDAYS25 at checkout when placing your order. Offer expires at midnight Tuesday, December 31, 2019 (US).

A SAS Learning subscription

Give the gift of unlimited learning with access to our selection of e-learning and video tutorials. You can give a monthly or yearly subscription. Help the special people in your life get a head start on a new career. The gift of education is one of the best gifts you can give.

SAS OnDemand for Academics

For those who don't want to install anything, but run SAS in the cloud, we offer SAS OnDemand for Academics. Users get free access to powerful SAS software for statistical analysis, data mining and forecasting. Point-and-click functionality means there’s no need to program. But if you like to program, you do can do that, too! It’s really the best of both worlds.

SAS blogs

I began this post with something free, and I’ll end it with something free: learning from our SAS blogs. Pick from areas such as customer intelligence, operations research, data science and more. You can also read content from specific regions such as Korea, Latin America, Japan and others. To get you started, here are a few posts from three of our most popular blog authors:

Chris Hemedinger, author of The SAS Dummy:

Rick Wicklin, author of The DO Loop:

Robert Allison, major contributor to Graphically Speaking:

Happy holidays and happy learning. Now go help someone geek out. And I won’t tell if you purchase a few of these things for yourself!

Gifts to give the SAS fan in your life was published on SAS Users.

12月 042019
 

If you’re like me, you struggle to buy gifts. Most folks in my inner circle already have everything they need and most of what they want. Most folks, that is, except the tech-lovers. That’s because there’s always something new on the horizon. There’s always a new gadget or program. Or a new way to learn those things. If you know folks who fall into this category, and who love SAS, boy do I have some gift ideas for you:

Try SAS for free

If you know someone who is interested in SAS but doesn’t work with it on a daily basis, or someone looking to experiment in a different area of SAS, let them know about our free software trials. Who doesn’t love free? And you’ll look like a rock star when you point out that they can explore areas like SAS Data Preparation, SAS Visual Forecasting, SAS Visual Statistics and lots more, for free.

SAS University Edition

Let’s keep things in the vein of free, shall we? SAS University Edition includes SAS Studio, Base SAS, SAS/STAT, SAS/IML, SAS/ACCESS, and several time series forecasting procedures from SAS/ETS. It’s the same software used by sites around the world; that means it’s the most up-to-date statistical and quantitative methods. And it’s free for academic, noncommercial use.

Registration to SAS Global Forum 2020

Oh my gosh, you would be your loved one’s favorite person! I’ve been to many SAS Global Forums and I love seeing the excitement on someone’s face as they explore the booths, sit in on sessions, and network to their heart’s content. SAS Global Forum is the place to be for SAS professionals, thought leaders, decision makers, partners, students, and academics. There truly is something for everyone at this event for analytics enthusiasts. And who wouldn’t want to be in Washington, D.C. in the Spring? It’s cherry blossom time! Be sure to register before Jan. 29, 2020 to get early bird pricing and save $600 off the on-site registration fee.

SAS books

I worked for many years in SAS Press and saw, firsthand, how excited folks get over a new book. I mean, who doesn’t want the 6th edition of the Little SAS Book or the latest from Ron Cody? Browse more than 100 titles and find the perfect gift. Don’t forget, SAS Press is offering 25% off everything in the bookstore for the month of December! Use promo code HOLIDAYS25 at checkout when placing your order. Offer expires at midnight Tuesday, December 31, 2019 (US).

A SAS Learning subscription

Give the gift of unlimited learning with access to our selection of e-learning and video tutorials. You can give a monthly or yearly subscription. Help the special people in your life get a head start on a new career. The gift of education is one of the best gifts you can give.

SAS OnDemand for Academics

For those who don't want to install anything, but run SAS in the cloud, we offer SAS OnDemand for Academics. Users get free access to powerful SAS software for statistical analysis, data mining and forecasting. Point-and-click functionality means there’s no need to program. But if you like to program, you do can do that, too! It’s really the best of both worlds.

SAS blogs

I began this post with something free, and I’ll end it with something free: learning from our SAS blogs. Pick from areas such as customer intelligence, operations research, data science and more. You can also read content from specific regions such as Korea, Latin America, Japan and others. To get you started, here are a few posts from three of our most popular blog authors:

Chris Hemedinger, author of The SAS Dummy:

Rick Wicklin, author of The DO Loop:

Robert Allison, major contributor to Graphically Speaking:

Happy holidays and happy learning. Now go help someone geek out. And I won’t tell if you purchase a few of these things for yourself!

Gifts to give the SAS fan in your life was published on SAS Users.

12月 042019
 

Site relaunches with improved content, organization and navigation.

In 2016, a cross-divisional SAS team created developer.sas.com. Their mission: Build a bridge between SAS (and our software) and open source developers.

The initial effort made available basic information about SAS® Viya® and integration with open source technologies. In June 2018, the Developer Advocate role was created to build on that foundation. Collaborating with many of you, the SAS Communities team has improved the site by clarifying its scope and updating it consistently with helpful content.

Design is an iterative process. One idea often builds on another.

-- businessman Mark Parker

The team is happy to report that recently developer.sas.com relaunched, with marked improvements in content, organization and navigation. Please check it out and share with others.

New overview page on developer.sas.com

The developer experience

The developer experience goes beyond the developer.sas.com portal. The Q&A below provides more perspective and background.

What is the developer experience?

Think of the developer experience (DX) as equivalent to the user experience (UX), only the developer interacts with the software through code, not points and clicks. Developers expect and require an easy interface to software code, good documentation, support resources and open communication. All this interaction occurs on the developer portal.

What is a developer portal?

The white paper Developer Portal Components captures the key elements of a developer portal. Without going into detail, the portal must contain (or link to) these resources: an overview page, onboarding pages, guides, API reference, forums and support, and software development kits (SDKs). In conjunction with the Developers Community, the site’s relaunch includes most of these items.

Who are these developers?

Many developers fit somewhere in these categories:

  • Data scientists and analysts who code in open source languages (mainly Python and R in this case).
  • Web application developers who create apps that require data and processing from SAS.
  • IT service admins who manage customer environments.

All need to interact with SAS but may not have written SAS code. We want this population to benefit from our software.

What is open source and how is SAS involved?

Simply put, open source software is just what the name implies: the source code is open to all. Many of the programs in use every day are based on open source technologies: operating systems, programming languages, web browsers and servers, etc. Leveraging open source technologies and integrating them with commercial software is a popular industry trend today. SAS is keeping up with the market by providing tools that allow open source developers to interact with SAS software.

What is an API?

All communications between open source and SAS are possible through APIs, or application programming interfaces. APIs allow software systems to communicate with one another. Software companies expose their APIs so developers can incorporate functionality and send or request data from the software.

Why does SAS care about APIs?

APIs allow the use of SAS analytics outside of SAS software. By allowing developers to communicate with SAS through APIs, customer applications easily incorporate SAS functions. SAS has created various libraries to aid in open source integration. These tools allow developers to code in the language of their choice, yet still interface with SAS. Most of these tools exist on github.com/sassoftware or on the REST API guides page.

A use case for SAS APIs

A classic use of SAS APIs is for a loan default application. A bank creates a model in SAS that determines the likelihood of a customer defaulting on a loan based on multiple factors. The bank also builds an application where a bank representative enters the information for a new potential customer. The bank application code uses APIs to communicate this information to the SAS model and return a credit decision.

What is a developer advocate?

A developer advocate is someone who helps developers succeed with a platform or technology. Their role is to act as a bridge between the engineering team and the developer community. At SAS, the developer advocate fields questions and comments on the Developers Community and works with R&D to provide answers. The administration of developer.sas.com also falls under the responsibility of the developer advocate.

We’re not done

The site will continue to evolve, with additions of other SAS products and offerenings, and other initiatives. Check back often to see what’s new.
Now that you are an open source and SAS expert, please check out the new developer.sas.com. We encourage feedback and suggestions for content. Leave comments and questions on the site or contact Joe Furbee: joe.furbee@sas.com.

developer.sas.com 2.0: More than just a pretty interface was published on SAS Users.

12月 032019
 

Longitudinal data are used in many health-related studies in which individuals are measured at multiple points in time to monitor changes in a response variable, such as weight, cholesterol, or blood pressure. There are many excellent articles and books that describe the advantages of a mixed model for analyzing longitudinal data. Recently, I encountered an introductory article that summarizes the essential issues in a little more than five pages! You can download the article for free: "A Primer in Longitudinal Data Analysis", by G. Fitzmaurice and C. Ravichandran (2008), Circulation, 118(19), p. 2005-2010.

The article analyzes a set of longitudinal data in two ways. First, the authors use a traditional linear model to perform an "analysis of response profiles." Then, the authors discuss how a mixed model can correct some of the deficiencies of the analysis. This blog post analyzes the same data by using PROC GLM in SAS. A subsequent blog post analyzes the same data by using PROC MIXED in SAS.

Longitudinal Data: Treatment of lead-exposed children

Fitzmaurice and C. Ravichandran analyze data for a randomized trial involving toddlers who were exposed to high levels of lead. The article analyzes a subset of 100 children. Half the children were given a treatment (called succimer) and the other half were given a placebo. The blood lead levels were measured for each child at baseline (week 0), week 1, week 4, and week 6. The main scientific question is "whether the two treatment groups differ in their patterns of change from baseline in mean blood lead levels" (p. 2006).

The children in the subset were measured at all four time points. There are no missing values or mistimed measurements. (This situation is fairly unusual in longitudinal data, which is often plagued by missed appointments or individuals who leave the study.) The following "spaghetti plot" shows the individual measurements for the 100 children at each time point. Each line represents a child. Blue lines indicate that the child was in the placebo group; red lines indicate the experimental group that was given succimer.

You can download the SAS program that contains the data and the programs that create the graphs and tables in this article.

Analysis of response profiles

In an analysis of response profiles, you compare the mean response-by-time profile for the treatment and placebo groups. You can visualize the mean response over time for each group by using the VBAR statement in PROC SGPLOT. The following graph shows the mean value and standard error for each time point for each treatment group:

If the treatment is ineffective, the line segments for the two treatment groups will be approximately parallel. The graph shows that this is not that case for these data. The visualization indicates that the mean blood-lead value for the treatment group (Treatment='A') is lower than for the placebo group at 1 and 4 weeks.

You can use PROC GLM to confirm that these differences are statistically significant and to estimate the effect that taking succimer had on the mean blood-lead level:

proc glm data=tlc;
   class Time(ref='0') Treatment(ref='P');
   model y = Treatment Time Treatment*Time / solution;
   output out=GLMOut Predicted=Pred;
quit;

According to the Type 3 statistics, all three effects in the model are significant. The parameter estimates (outlined in red) indicate that the mean blood-lead level for children in the Treatment='A' group is 11.4 mcg/dL lower than the children in the placebo group after 1 week. Similarly, after four weeks the mean of the experimental group is 8.8 mcg/dL lower than the placebo group. These are both significant differences, so the response-profile analysis provides a positive answer to the research question: the profile for the treatment group is lower than the placebo group.

Advantages and disadvantages of the analysis of response profiles

As discussed in Fitzmaurice and Ravichandran (2008), the analysis of the response profile has several advantages:

  • It is familiar to researchers who have experience with ANOVA.
  • It does not require any advanced statistical modeling , such as modeling the covariance of the repeated measurements.

However, this simple method suffers from several statistical problems:

  • Longitudinal data do not satisfy the assumptions of linear regression and ANOVA. Because the data contains repeated measures from the same individuals, the residual errors are neither independent nor do they have constant variance (homoscedastic).
  • Some participants in a study might miss an appointment or drop out of the study. Others might be measured at time points that were not part of the design (for example, at 2 or 3 weeks). These two problems are known as incomplete data and mistimed measurements, respectively. Although the first can be handled by using an unbalanced ANOVA, the second is a problem that does not have a simple solution within an ANOVA model that uses discrete time points.
  • The response-profile analysis does not enable you to model each individual's response as a function of time.

Prediction of individual response profiles

The inability to model individual trajectories is often the reason that researchers abandon the response-profile analysis in favor of a more complicated mixed model. To be clear, the GLM model can make predictions, but the predicted values for every child in the placebo group are the same. Similarly, the predicted values for every child in the experimental group are the same. This is shown in the following panel of graphs, which shows the predicted response curves for six children in the study, three from each treatment group.

/* Look at predictions for each individual. They are identical! */
proc sort data=GLMOut(where=(ID in (1,2,4,5,6,7))) out=GLMSubset;
   by Treatment ID;
run;
 
title2 "Fixed Effect Model";
proc sgpanel data=GLMSubset dattrmap=Order;
   panelby Treatment ID;
   scatter x=Time y=y / group=Treatment attrid=Treat;
   series x=Time y=Pred / group=Treatment attrid=Treat;
run;

Notice that the predicted responses are the same across the top row (ID=2, 5, and 6). These children were all in the experimental group. Although the predicted values seem to fit the actual observed response for ID=2, the predicted responses for ID=5 and ID=6 are much higher than the observed responses. Although the predicted response is the best prediction for an "average patient," it does not account for individual differences in the study participants.

The same is true for the patients in the placebo group, three of which are plotted in the second row. The predicted values are "too low" for ID=1 and are "too high" for ID=4.

If modeling the individual profiles is important, then clearly this method is not sufficient. If you want to model the individual profiles, you can use a linear mixed model. The mixed model also addresses other deficiencies of the response-profile analysis. The mixed model is described in the next blog post.

You can download the SAS code and data for the response-profile analysis.

Further reading

This blog post is a brief summary of the article "A Primer in Longitudinal Data Analysis" (Fitzmaurice and Ravichandran, 2008). See the article for more details. Also, these data and these ideas are also discussed in the book Applied Longitudinal Analysis (2012, 2nd Ed) by G. Fitzmaurice, N. Laird, and J. Ware. You can download data (and SAS programs) from the book at the book's web site.

The post Longitudinal data: The response-profile model appeared first on The DO Loop.