Christmas

12月 152021
 

Suppose you are creating a craft project for the Christmas holidays, and you want to choose a palette of Christmas colors to give it a cheery holiday appearance. You could use one of the many online collections of color palettes to choose a Christmas-themed palette. However, I didn't want to merely take a palette that someone else designed. As a statistician, I wanted to create my own Christmas palette by using some sort of mathematical technique that creates a "statistically perfect" set of Christmas colors from other palettes!

Artists use their talents to create aesthetically pleasing and harmonious palettes of colors. But why should artists have all the fun? In this article, I show how a statistician might create a palette of Christmas colors! Starting from existing palettes of colors, the technique uses principal component analysis to select a linear subspace of colors. From within that subspace, you can create your own palette of colors. An example is the green-gold-red palette that is shown below. It was made by math, but I think it is pretty!

The same technique could also be used to create autumn palettes, spring palettes, and so forth, but don't take this too seriously: this article is intended to be geeky fun for the holidays. I'll let others debate whether this mathematical technique is useful for creating art.

Outline of the method

Here is the general idea behind creating a statistical palette of colors:

  1. Collect 6-10 existing palettes that you like. This process should produce 40-60 colors that are related to your theme (such as "Christmas"). I have written about how to read and visualize palettes in SAS.
  2. Each color can be represented a triplet of values in RGB space. You can use principal component (PC) analysis to project the colors into a plane that captures most of the variation in the colors. The graph at the right shows the principal component scores for 41 colors in seven Christmas palettes.
  3. Each palette is represented as a path through this linear subspace. So, you can choose a new sequence of points in this subspace and then REVERSE the projection to obtain a new palette of RGB colors.
  4. There are infinitely many paths you could choose, but one choice is to use vertices of a regular polygon to create a "statistically perfect" set of Christmas colors.

Paths through principal component space

For each palette, if you connect the sequence of colors in the palette, you obtain a path through the plane of the principal component scores. If you try to display the paths for all seven palettes, you will get a jumbled mess of lines that crisscross the graph. To make it easier to visualize the paths for each palette, the following graph limits the display to two palettes. The "Jingle Bell" palette has six colors. It starts with two green colors, moves to golden colors, and ends with two red colors. The "Unwrapping My Gifts" palette has five colors. It starts with two green colors, moves to a silver color, and ends with a red and dark burgundy color. The other palettes trace similar paths through the plane of scores.

Create your own path

If each palette corresponds to a path, then each path corresponds to a palette. In other words, if you choose a sequence five or six coordinates in the space of principal component scores, the path defines a new Christmas palette.

The ancient Greeks thought that a circle was the most perfect shape. Regular polygons, which approximate the circle, are also seen as perfect in their symmetry and beauty. Therefore, it makes sense to use a regular hexagon to create a new Christmas palette. The following graph demonstrates how you can choose a new Christmas palette. The circle represents a path through the space of colors. The gray stars represent six points on the circle, which are also vertices on a regular hexagon. I will define a new palette by using math to figure out what colors to associate with the coordinates of the six stars. I've ordered the six points starting from the point that I know will be green (because it is in the lower-left corner) and ending with the point that I know will be red (because it is in the upper-left corner).

Inverting the principal component transformation

Until now, we've only considered the projection of the RGB colors onto the plane of the first two PCs. Of course, this projection is not invertible: There are many colors that share the same two PC scores. However, in general, the transformation from RGB space to the space of all principal component scores is invertible. You can represent the transformation as a matrix by using the spectral decomposition of the correlation matrix of the colors. To embed the plane of the first two PCs, you need to set a value for the coordinate of the third PC. The obvious statistical choice is to use PC3=0, which represents the average score in the third PC. In other words, you can associate the first two PC scores with the plane that has ordered triplets (PC1, PC2, 0).

With that choice, it is straightforward to apply the inverse PC transformation. The inverse transformation maps PC scores into colors in RGB space. You can use the SAS IML language to carry out this transformation. If you visualize a dense set of points on the circle, you get the following spectrum of colors:

Any smaller palette will contain a subset of these colors. For example, if you choose six points on the circle that are vertices of a regular histogram, you get the palette that is shown at the top of this article. The new palette is based on the original seven palettes but is distinct from them. The new palette incorporates the properties of the original palettes but is defined by defining a new path through the PC scores.

Summary

In summary, this article shows how to create a new palette of colors that is based on statistical properties of a set of colors. You can perform a principal component analysis in RGB space to obtain a plane of PC scores. Every path through that plane corresponds to a new palette of colors. For my palette, I chose a sequence of vertices on a regular hexagon.

This method is not guaranteed to give an aesthetically pleasing palette of colors, but I think it does in this example. And if you don't like my palette, you can choose any other path through the set of PC scores to create a new palette! If you would like to experiment with this technique, you can download the SAS program that I used to generate the figures in this article.

The post A statistical palette of Christmas colors appeared first on The DO Loop.

12月 132021
 

In a previous article, I visualized seven Christmas-themed palettes of colors, as shown to the right. You can see that the palettes include many red, green, and golden colors. Clearly, the colors in the Christmas palettes are not a random sample from the space of RGB colors. Rather, they represent a specific subset of Christmas colors. I thought it would be fun to use principal component analysis to compute and visualize the linear subspace that best captures the set of Christmas colors!

Convert hexadecimal colors to RGB

Each color can be represented as an ordered triplet of unsigned integers in RGB space. The colors in SAS are 8-bit colors, which means that each coordinate in RGB space has a value between 0 and 255. The following DATA step reads the colors in the Christmas palettes and uses the HEX2. informat to convert the hexadecimal values to their equivalent RGB values. As explained in the previous article, the DATA step also creates an ID value for each color and creates a macro variable (AllColors) that contains the list of all colors.

/* read Christmas palettes. Convert hex to RGB */
data RGB;
length Name $30 color $8;
length palette $450; /* must be big enough to hold all colors */
retain ID 1 palette;
input Name 1-22 n @;
do col = 1 to n;
   input color @;
   R = inputn(substr(color, 3, 2), "HEX2."); /* get RGB colors from hex */
   G = inputn(substr(color, 5, 2), "HEX2.");
   B = inputn(substr(color, 7, 2), "HEX2.");
   palette = catx(' ', palette, color);
   output;
   ID + 1;                             /* assign a unique ID to each color */
end;
call symput('AllColors', palette);   /* concatenate colors; store in macro */
drop n col palette;
/* Palette Name      |n| color1 | color2 | color2 | color4 | ... */
datalines;
Jingle Bell           6 CX44690D CX698308 CXF3C543 CXFFEDC7 CXCA2405 CX9E1007
Holiday Red and Gold  6 CXCF1931 CXAD132D CXD9991A CXEAA61E CXF2BC13 CX216A1B
Green and Gold        6 CX03744B CX15885A CX1E9965 CXFBE646 CXFBC34D CXF69F44
Unwrapping My Gifts   5 CX2A5B53 CX5EB69D CXECEBF1 CXD34250 CX5F1326
Christmas Wrapping    6 CX237249 CX3B885C CXE5D6B5 CXE3CD8E CXDA111E CXC00214
Christmas Wedding     6 CX325C39 CX9C9549 CXDBAA46 CXFFE9D9 CXFF4A4A CXDB2121
Real Christmas Tree   6 CX779645 CX497542 CX274530 CX6E3C3B CXBF403B CXEDB361
;

The result of this DATA step is a data set that contains 41 rows. The R, G, and B variables contain the red, green, and blue components, respectively, which are in the range 0 to 255. Thus, the data contains 41 triplets of RGB values.

Let's use PROC PRINCOMP in SAS to run a principal component analysis of these 41 observations:

proc princomp data=RGB N=2 out=PCOut plots(only)=(Score Pattern);
   var R G B;
   ID color;
run;

Recall that each principal component is a linear combination of the original variables (R, G, and B). The table shows that the first principal component (PC) is the linear combination
PC1 = 0.39*R + 0.66*G + 0.64*B
The first PC is a weighted sum of the amount of color across all three variables. More weight is given to the green and blue components than to the red component. Along the axis of the first PC, black is on the extreme left since black has the RGB value (0, 0, 0). Similarly, white is on the extreme right since white has the RGB value (255, 255, 255).

The second principal component is the linear combination
PC2 = 0.91*R - 0.21*G - 0.35*B
The second PC is a contrast between the red coordinate and the green and blue coordinates. Along the axis of the second PC, colors that have a lot of green and blue (such as cyan, which is (0, 255, 255)) have extreme negative values whereas colors that are mostly red have extreme positive values.

An additional table (not shown) indicates that 89% of the variation in the data is explained by using the first two principal components. You could add a third principal component if you want to have a way to separate the green and blue colors.

The PRINCOMP procedure creates a graph that projects each RGB value onto the principal component axes. Because I put the COLOR variable on the ID statement, each marker is labeled by using its hexadecimal value:

This graph shows how the colors in the Christmas palettes are distributed in the space of the first two PCs. However, wouldn't it be cool if the markers in this plot were the colors that the markers represent? The next section adds color to the scatter plot by using PROC SGPLOT.

Adding color to the score plot

The previous call to PROC PRINCOMP includes the OUT= option, which writes the principal component scores to a SAS data set. The names of the scores are PRIN1 and PRIN2. If you use the PRIN1 and PRIN2 variables to create a scatter plot, you can re-create the score plot from PROC PRINCOMP. However, I am going to add a twist. As shown in my previous post, I will use the GROUP= option to specify that the marker colors be assigned according to the value of the ID variables, which are the integers 1, 2, ..., 41. I will use the STYLEATTRS statement and the DATACONTRASTCOLORS= option to specify that the colors that should be used for each group. The result is a scatter plot in which each marker is colored according to a specified list of colors.

title "Principal Component Scores for Christmas Colors";
proc sgplot data=PCOut noautolegend aspect=1;
   label Prin1="Component 1 (60.37%)" Prin2="Component 2 (28.62%)";
   styleattrs datacontrastcolors=(&AllColors);
   scatter x=Prin1 y=Prin2 / markerattrs=(symbol=CircleFilled size=16) group=ID;
   refline 0 / axis=x;
   refline 0 / axis=y;
run;

The graph shows the distribution of the colors, using the colors themselves as a visual cue. The graph shows that the first PC differentiates between dark colors (on the left) and light colors (on the right). The second PC differentiates between blue-green colors (on the bottom) and red colors (on the top).

Summary

I almost titled this article, "A Statistician Looks at Christmas Colors." I started with a set of Christmas-themed palettes with 41 colors that were primarily red, green, and gold. By performing a principal component analysis, you can project the RGB coordinates of these colors onto the plane that captures most of the variation in the data. If you color the markers in this projection, you obtain a scatter plot that shows green colors in one region, red colors in another region, and gold and silver colors in yet another.

The graph shows the various Christmas colors in one image. To me, the image seems more coherent and organized than the original strips of colors. I like that similar colors are grouped together, and that the three groups of colors (red, green, and blue) are visibly distinct.

The post A principal component analysis of color palettes appeared first on The DO Loop.

12月 082021
 

In data visualization, colors can represent the values of a variable in a choropleth map, a heatmap, or a scatter plot. But how do you visualize a palette of colors from the RGB or hexadecimal values of the colors? One way is to use the HEATMAPDISC subroutine in SAS/IML, which enables you to specify a color ramp as part of the syntax. If you do not have SAS/IML software, you can use the HEATMAPPARM statement in PROC SGPLOT and use a discrete attribute map to define which color is used for each cell in the heat map. If you do not yet know about discrete attribute maps, I highly recommend that you take a moment to learn about them; they are very useful!

If you do not want to use a discrete attribute map and you do not have access to SAS/IML, there is a third way to visualize color palettes: use the POLYGON statement in PROC SGPLOT. As discussed in this article, each polygon has a unique value of an ID variable. You can use the GROUP= option to associate a color with each value of the ID variable. You can use the DATACOLORS= option on the STYLEATTRS statement to specify the colors for each polygon. This article uses rectangular polygons and arranges them so that you can visualize a palette of colors. Because I am writing this article in the Christmas season, I will use a set of Christmas-themed red-and-green palettes to demonstrate the technique. The final visualization is shown to the right.

How to use the POLYGON statement to visualize a palette

The following list outlines the main steps for using the POLYGON statement in PROC SGPLOT to visualize a set of palettes:

  1. Use the DATA step to read the colors in each palette.
  2. Assign a unique ID to each color.
  3. Concatenate all the colors and store the set of colors in a macro variable. This macro variable will be used to assign a color to each polygon.
  4. Create a polygon for each color. This example uses rectangles. The i_th color for the j_th palette is represented by a rectangle that is centered at the coordinate (i,j).
  5. Optionally, use the TEXT statement in PROC SGPLOT to overlay the hexadecimal values of the colors on the graph.
  6. Optionally, use the TEXT statement to display the name of each palette.
  7. Use the POLYGON and TEXT statements to display the palettes of colors.

Read the colors in each palette

I found several Christmas-themed palettes at www.schemecolor.com. The specific palettes are cited in the comments of the following program, which implements the first three steps of the outline:

ods graphics/reset;
/* Acknowledgements:
   Jingle Bell palette:  https://www.schemecolor.com/jingle-bells.php
   Holiday Red and Gold: https://www.schemecolor.com/holiday-red-gold.php
   Green and Gold:       https://www.schemecolor.com/green-and-gold.php
   Unwrapping My Gifts:  https://www.schemecolor.com/unwrapping-my-gifts.php
   Christmas Wrapping:   https://www.schemecolor.com/christmas-wrapping.php
   Christmas Wedding:    https://www.schemecolor.com/christmas-wedding.php
   Real Christmas Tree:  https://www.schemecolor.com/real-christmas-tree.php
*/
/* 1. Use the DATA step to read the colors in each palette. */
data Palettes;
length Name $30 color $8;
length palette $450; /* must be big enough to hold all colors */
retain ID 1 row 1 palette;
input Name 1-22 n @;
do col = 1 to n;
   input color @;
   palette = catx(' ', palette, color);
   output;
   ID + 1;    /* Assign a unique ID to each color. */
end;
row + 1;
call symput('AllPalettes', palette);  /* 3. Concatenate colors; store in macro. */
drop palette;
/* Palette Name      |n| color1 | color2 | color2 | color4 | ... */
datalines;
Jingle Bell           6 CX44690D CX698308 CXF3C543 CXFFEDC7 CXCA2405 CX9E1007
Holiday Red and Gold  6 CXCF1931 CXAD132D CXD9991A CXEAA61E CXF2BC13 CX216A1B
Green and Gold        6 CX03744B CX15885A CX1E9965 CXFBE646 CXFBC34D CXF69F44
Unwrapping My Gifts   5 CX2A5B53 CX5EB69D CXECEBF1 CXD34250 CX5F1326
Christmas Wrapping    6 CX237249 CX3B885C CXE5D6B5 CXE3CD8E CXDA111E CXC00214
Christmas Wedding     6 CX325C39 CX9C9549 CXDBAA46 CXFFE9D9 CXFF4A4A CXDB2121
Real Christmas Tree   6 CX779645 CX497542 CX274530 CX6E3C3B CXBF403B CXEDB361
;
 
%put &AllPalettes;   /* check that all colors are here */

The following sections implement the visualization in three stages: an initial visualization that uses a scatter plot, an improved visualization that uses rectangular polygons, and a final visualization.

An initial visualization

After running this DATA step, the Palettes data set contains all the information you need to visualize the colors in each palette. In fact, you can use a scatter plot to visualize the data and make sure it was read correctly. The following call to PROC SGPLOT creates a graph for which each row represents a palette of colors. Recall that the STYLEATTRS statement supports two different options for specifying how colors are assigned to groups when you use the GROUP= option:

  • The DATACOLORS= option specifies the fill color for bars and polygons.
  • The DATACONTRASTCOLORS= option specifies the colors for lines and markers.

For a scatter plot, you can use the DATACONTRASTCOLORS= option to assign group colors, as follows:

%let gray   = CXF8F8F8;
title "Initial Visualization of Christmas Palettes";
proc sgplot data=palettes noautolegend noborder;
   styleattrs backcolor=&gray wallcolor=&gray datacontrastcolors=(&AllPalettes);
   scatter x=col y=row / group=ID markerattrs=(symbol=SquareFilled size=45);
   text x=col y=row text=color;
run;

This is not a professional-looking graph, but it does provide a quick-and-easy way to visualize the colors. Each row is a palette of colors. The colors are labeled by using their hexadecimal values. In the next section, the markers are replaced by rectangular polygons.

A better visualization of palettes

In the data set, the (col, row) coordinates are the center of each polygon. The following DATA step creates new variables (x, y) for the vertices of rectangles:

/* 4. Create a polygon for each color. 
      The i_th color for the j_th palette is represented by a rectangle 
      that is centered at the coordinate (i,j). */
data Poly;
set Palettes;
w = 0.5; h = 0.48;                /* width and height of rectangle */
x = col - w; y = row - h; output; /* four vertices of square */
x = col + w; y = row - h; output;
x = col + w; y = row + h; output;
x = col - w; y = row + h; output;
drop w h;
run;

With that minor change, you can use a POLYGON statement to display colored rectangles. (Be sure to use the DATACOLORS= option on the STYLEATTRS statement.) The resulting visualization looks much better:

%let gray   = CXF8F8F8;
title "Second Visualization of Christmas Palettes";
title2 "A Polygon Plot";
proc sgplot data=Poly noautolegend;
      styleattrs wallcolor=&gray datacolors=(&AllPalettes);
   polygon x=x y=y ID=ID / group=ID fill;
   text x=col y=row text=color;
run;

The final visualization

Although both previous graphs show the palettes of colors, you can improve the graphs by making minor changes:

  • Because each rectangular polygon is defined by using four rows in the data, the hexadecimal value of the colors is plotted four times. You can see in the previous graphs that the color values appear to be boldfaced. You only need to plot the value once for each value of the ID variable.
  • Add the name of the palette to the graph.
  • Do not display the axes. They are not necessary.
/* 5. Optionally, use the TEXT statement to display the name of each palette. */
data Temp;
set Poly;
by ID;
if ^first.ID then  color=" ";
run;
 
/* 6. Optionally, use the TEXT statement to overlay the hex values of the colors */
data PlotPalettes;
set Temp;
by Name notsorted;
label = Name;
if first.Name then do;
   lx = col - 0.6;    ly = row;
end;
else do;
   label=" ";  lx = .;  ly = .;
end;
run;
 
/* 7. Use the POLYGON and TEXT statements to display the palettes of colors. */
%let gray   = CXF8F8F8;
title "Visualization of Christmas Palettes";
proc sgplot data=PlotPalettes noautolegend noborder;
   styleattrs backcolor=&gray wallcolor=&gray datacolors=(&AllPalettes);
   polygon x=x y=y ID=ID / group=ID fill;
   text x=col y=row text=color;
   text x=lx y=ly text=label / position=left textattrs=(size=12);
   xaxis display=none offsetmax=0;
   yaxis display=none thresholdmin=0 thresholdmax=0;
run;

The final graph is shown at the top of this article. It enables you to see that most of the palettes are green-to-red and include either a golden or creamy-white color. Three palettes are different from the others:

  • The "Real Christmas Tree" palette ends with gold instead of red.
  • The "Green and Gold" palette does not contain any red.
  • The "Holiday Red and Gold" palette starts with red and ends with green.

Summary

This article shows how to use a polygon plot to visualize a set of Christmas-themed color palettes. By using the STYLEATTRS statement in PROC SGPLOT, you can specify how colors are assigned to groups when you use the GROUP= option. You can use the DATACOLORS= option to specify the fill color for polygons. You can use the POLYGON statement and the GROUP=ID option to assign each color to a rectangular polygon, thus visualizing each palette as a strip of colors.

The post Visualize palettes of colors in SAS appeared first on The DO Loop.

12月 072020
 
Spruce (picea glauca) branches

Image of spruce branch (Picea glauca) by Aleksandrs Balodis, licensed under the Creative Commons Attribution-Share Alike 4.0 International license (CC-BY-SA-4.0).

"O Christmas tree, O Christmas tree, how lovely are your branches!" The idealized image of a Christmas tree is a perfectly straight conical tree with lush branches and no bare spots. Although this ideal exists only on Christmas cards, forest researchers are always trying to develop trees that approach the ideal. And they use serious science and statistics to do it!

Bert Cregg, a forest researcher at Michigan State University, is a Christmas tree researcher who has been featured in Wired magazine. Cregg and his colleagues have published many papers on best practices for growing Christmas trees. One paper that caught my eye is Gooch, Nzokou, and Cregg (2009), "Effect of Indoor Exposure on the Cold Hardiness and Physiology of Containerized Christmas Trees." In this paper, Cregg and his colleagues investigate whether you can buy a live Christmas tree, keep it in the house for the holidays, and then transplant it in your yard. The authors use a statistical technique known as the median lethal dose, or LD50, to describe how bringing a potted Christmas tree into the house can affect its hardiness to freezing temperatures.

This blog post uses Cregg's data and shows how to use SAS to estimate the LD50 statistic. If you are not interested in the details, the last section of this article summarizes the results. Spoiler: An evergreen kept indoors has a reduced ability to withstand freezing temperatures after it is transplanted. In Cregg's experiment, all the transplanted trees died within six months.

The problem and the experiment

Containerized (potted) Christmas trees are popular among consumers who want a live Christmas tree but do not want to kill the tree by cutting it down. The idea is to bring the tree indoors during the holidays, then plant it on your property. However, there is a problem with bringing a tree into a house during the winter. Evergreens naturally go dormant in the winter, which enables their needles and buds to withstand freezing temperatures. When you bring a tree into a heated space, it "wakes up." If you later move the tree outside into freezing temperatures, the needles and buds can be damaged by the cold. This damage can kill the tree.

Cregg and his colleagues set up an experiment to understand how the length of time spent indoors affects a Christmas tree's ability to withstand freezing temperatures. Trees were brought indoors for 0, 3, 7, 14, and 20 days. Cuttings from the trees were then exposed to freezing conditions: -3, -6, -9, ..., -30 degrees Celsius. The goal is to estimate the median "lethal dose" for temperature. That is, to estimate the temperature at which half of the cuttings are damaged by the cold. In pharmacological terms, the time spent indoors is a treatment and the temperature is a "dose." The trees that were never brought indoors (0 days) are a control group.

Cregg and his colleagues studied three species of Christmas trees and studied dames to both buds and needles. For brevity, I will only look at the results for buds on the Black Hills Spruce (Picea glauca).

The data and the graphical method

In Table 1 (p. 75), the authors show data for the bud mortality and 50% lethality (LD50) for each treatment group (days indoors) as a function of decreasing temperatures. The data for the spruce are shown in the following SAS DATA step. Each cell in the table represents the percentage of 60 cuttings that showed damage after being exposed to a freezing temperature. By knowing there were 60 cuttings, you can compute how many cuttings were damaged.

/* Bud mortality and LD50. Data from Table 1 of Gooch, Nzokou, & Cregg (2009). */
data SpruceBudDamage;
Ntrials = 60;
input TimeIndoors @;
do Temp = -3 to -30 by -3;
   input BudMort @;
   NEvents = int(NTrials * BudMort / 100);  /* compute NEvents from mortality */
   output;
end;
label Temp = "Temperature (C)";
/* Bud Mortality (percent) as a function of temperature for each treatment */
/* Days    | --------  Temperature (C)  -------- */   
/* Indoors |-3 -6 -9 -12 -15 -18 -21 -24 -27 -30 */
datalines;
   0         0  0  0   0   0  6.7  0   0  20 100
   3         0  0  0   0   0  0    0  30  80 100
   7         0  0  0   0   0  0   10  40 100 100
  14         0  0  0   0   0  0    0  80 100 100
  20         0  0  0   0  20 40  100 100 100 100
;

The paper says that the LD50 "was determined graphically using a pairwise plot of the exposure temperature and the percentage of bud mortality ... for each species." I don't know what that means. It sounds like they did not estimate the LD50 statistically but instead graphed the bud mortality versus the temperature and used linear interpolation (or a smoother) to estimate the temperature at which the mortality is 50%. Here is a graph of the data connected by lines:

title "Bud Mortality in Black Hills Spruce";
proc sgplot data=SpruceBudDamage;
   series x=Temp y=BudMort / markers group=TimeIndoors lineattrs=(thickness=2);
   refline 50 / axis=y lineattrs=(color=red);
   xaxis grid; yaxis grid;
run;

The horizontal red line in the graph is the line of 50% mortality. For each curve, a crude estimate of LD50 is the temperature at which the curve crosses that line. The graphical estimates and the estimates in the paper are shown in the following table. The estimates in the authors' paper are greater than the estimates from my graph, but I do not know why. If you use something other than linear interpolation (for example, a loess smoother), you will get different curves and different estimates.

Although these graphical estimates differ slightly from the published results, the numbers tell the same story. On average, a spruce Christmas tree that is not brought indoors is hardy to about -28C. If you bring a tree indoors for 3 days, it is hardy only to about -25C. The longer a tree is indoors, the more it loses hardiness. For trees that are indoors for 20 days, the median lethal temperature is -18C, or about 10 degrees warmer than for the control group.

Better estimates of LD50

The graphical estimates are crude. They are based on linear interpolation between two consecutive data points: one for which the mortality is below 50% and the next for which the mortality is above 50%. The estimates ignore all other data. Furthermore, the estimates assume that the mortality is linear between those two points, which is not usually the case. The mortality curve is typically a sigmoid (or S-shaped) curve.

Fortunately, we can use statistics to address these concerns. The usual way to estimate LD50 in SAS is to use PROC PROBIT. For these data, we will perform a separate analysis for each value of the TimeIndoors variable. The INVERSECL option on the MODEL statement estimates the Temperature (and confidence limits) for a range of probability values. You can use the ODS OUTPUT statement to write the statistics to a SAS data set so that you can use PROC SGPLOT to overlay all five curves on a single graph, as follows:

proc probit data=SpruceBudDamage plots=(predpplot);
   by TimeIndoors;
   model NEvents / NTrials = Temp / InverseCL;
   ods exclude ProbitAnalysis;
   ods output ProbitAnalysis=ProbitOut;
run;
 
proc sgplot data=ProbitOut;
   band y=Probability lower=LowerCL upper=UpperCL / group=TimeIndoors transparency=0.5;
   series y=Probability x=Variable / group=TimeIndoors;
   refline 0.50 / axis=y lineattrs=(color=brown);
   xaxis grid values=(-30 to -3 by 3); yaxis grid;
run;

For each treatment group (time spent indoors), the graph shows probability curves for bud mortality as a function of the outdoor temperature. The median lethal temperature is where these curves and inverse confidence intervals intersect the line of 0.5 probability. (They are inverse confidence limits because they are for the value of the X value that produces a given Y value.) You can use PROC PRINT to display the estimates for LD50 for each treatment group:

The estimates for LD50 use all the data to model the bud mortality. This method also produces 95% (inverse) confidence limits for the LD50 parameter. For one of the treatment groups (TimeIndoors=14), the confidence limits could not be produced. The documentation for PROC PROBIT discusses why this can happen.

If you want, you can use this table to estimate the differences between the LD50 values.

Summary

In summary, growing beautiful Christmas trees requires a lot of science and statistics. This article analyzes data from an experiment in which potted Christmas trees were brought indoors and later returned outdoors where they can experience freezing temperatures. Trees that are brought indoors lose some of their hardiness and can be damaged by freezing temperatures. This article shows how to use PROC PROBIT in SAS to compute the median lethal temperature (LD50), which is the temperature at which half of the trees would be damaged. After 20 days indoors, a spruce tree will lose about 10 degrees (C) of resistance to freezing temperatures.

If you are thinking about getting a containerized (live) Christmas tree, Cregg (2020) wrote an article about how to take care of it and how to prepare it for transplanting after Christmas. He suggests waiting until spring. In the 2009 experiment, 100% of the trees that had been brought indoors for 10 or more days died within six months of transplanting, as compared to 0% of the control group. This happened even though the outdoor temperature never approached the LD50 level. This experiment was done in Michigan, so the results might not apply to trees in warmer regions.

The post Can you transplant an indoor Christmas tree? appeared first on The DO Loop.

12月 162019
 
Rockin' around the Christmas tree
At the Christmas party hop.
     – Brenda Lee
Christmas tree image with added noise

Last Christmas, I saw a fun blog post that used optimization methods to de-noise an image of a Christmas tree. Although there are specialized algorithms that remove random noise from an image, I am not going to use a specialized de-noising algorithm. Instead, I will use the SVD algorithm, which is a general-purpose matrix algorithm that has many applications. One application is to construct a low-rank approximation of a matrix. This article applies the SVD algorithm to a noisy image of a Christmas tree. The result is recognizable as a Christmas tree, but much of the noise has been eliminated.

A noisy Christmas tree image

The image to the right is a heat map of a binary (0/1) matrix that has 101 rows and 100 columns. First, I put the value 1 in certain cells so that the heat map resembles a Christmas tree. About 41% of the cells have the value 1. We will call these cells "the tree".

To add random noise to the image, I randomly switched 10% of the cells. The result is an image that is recognizable as a Christmas tree but is very noisy.

Apply the SVD to a matrix

As mentioned earlier, I will attempt to de-noise the matrix (image) by using the general-purpose SVD algorithm, which is available in SAS/IML software. If M is the 101 x 100 binary matrix, then the following SAS/IML statements factor the matrix by using a singular-value decomposition. A series plot displays the first 20 singular values (ordered by magnitude) of the noisy Christmas tree matrix:

call svd(U, D, V, M);                             /* A = U*diag(D)*V` */
call series(1:20, D) grid={x y} xvalues=1:nrow(D) label={"Component" "Singular Value"};

The graph (which is similar to a scree plot in principal component analysis) indicates that the first four singular values are somewhat larger than the remainder. The following statements approximate M by using rank-4 matrix. The rank-4 matrix is not a binary matrix, but you can visualize the rank-4 approximation matrix by using a continuous heat map. For convenience, the matrix elements are shifted and scaled into the interval [0,1].

keep = 4;        /* how many components to keep? */
idx = 1:keep;
Ak = U[ ,idx] * diag(D[idx]) * V[ ,idx]`;
Ak = (Ak - min(Ak)) / range(Ak); /* for plotting, standardize into [0,1] */
s = "Rank " + strip(char(keep)) + " Approximation of Noisy Christmas Tree Image";
call heatmapcont(Ak) colorramp=ramp displayoutlines=0 showlegend=0 title=s range={0, 1};

Apply a threshold cutoff to classify pixels

The continuous heat map shows a dark image of the Christmas tree surrounded by a light-green "fog". The dark-green pixels represent cells that have values near 1. The light-green pixels have smaller values. You can use a histogram to show the distribution of values in the rank-4 matrix:

ods graphics / width=400px height=300px;
call histogram(Ak);

You can think of the cell values as being a "score" that predicts whether each cell belongs to the Christmas tree (green) or not (white). The histogram indicates that the values in the matrix appear to be a mixture of two distributions. The high values in the histogram correspond to dark-green pixels (cells); the low values correspond to light-green pixels. To remove the light-green "fog" in the rank-4 image, you can use a "threshold value" to convert the continuous values to binary (0/1) values. Essentially, this operation predicts which cells are part of the tree and which are not.

For every choice of the threshold parameter, there will be correct and incorrect predictions:

  • A "true positive" is a cell that belongs to the tree and is colored green.
  • A "false positive" is a cell that does not belong to the tree but is colored green.
  • A "true negative" is a cell that does not belong to the tree and is colored white.
  • A "false negative" is a cell that belongs to the tree but is colored white.

By looking at the histogram, you might guess that a threshold value of 0.5 will do a good job of separating the low and high values. The following statements use 0.5 to convert the rank-4 matrix into a binary matrix, which you can think of as the predicted values of the de-noising process:

t = 0.5;  /* threshold parameter */
s = "Discrete Filter: Threshold = " + strip(char(t,4,2)) ;
call HeatmapDisc(Ak > t) colorramp=ramp displayoutlines=0 showlegend=0 title=s;

Considering that 10% of the original image was corrupted, the heat map of the reconstructed matrix is impressive. It looks like a Christmas tree, but has much less noise. The reconstructed matrix agrees with the original (pre-noise) matrix for 98% of the cells. In addition:

  • There are only a few false positives (green dots) that are far from the tree.
  • There are some false negatives in the center of the tree, but many fewer than were in the original noisy image.
  • The locations where the image is the most ragged is along the edge and trunk of the Christmas tree. In that region, the de-noising could not tell the difference between tree and non-tree.

The effect of the threshold parameter

You might wonder what the reconstruction would look like for different choices of the cutoff threshold. The following statements create two other heat maps, one for the threshold value 0.4 and the other for the threshold value 0.6. As you might have guessed, smaller threshold values result in more false positives (green pixels away from the tree), whereas higher threshold values result in more false negatives (white pixels inside the tree).

do t = 0.4 to 0.6 by 0.2;
   s = "Discrete Filter: Threshold = " + strip(char(t,4,2)) ;
   call HeatmapDisc(Ak > t) colorramp=ramp displayoutlines=0 showlegend=0 title=s;
end;

Summary

Although I have presented this experiment in terms of an image of a Christmas tree, it is really a result about matrices and low-rank approximations. I started with a binary 0/1 matrix, and I added noise to 10% of the cells. The SVD factorization enables you to approximate the matrix by using a rank-4 approximation. By using a cutoff threshold, you can approximate the original matrix before the random noise was added. Although the SVD algorithm is a general-purpose algorithm that was not designed for de-noising images, it does a good job eliminating the noise and estimating the original matrix.

If you are interested in exploring these ideas for yourself, you can download the complete SAS program that creates the graphs in this article.

The post Math-ing around the Christmas tree: Can the SVD de-noise an image? appeared first on The DO Loop.

12月 102018
 
The best way to spread Christmas cheer
is singing loud for all to hear!
-Buddy in Elf

In the Christmas movie Elf (2003), Jovie (played by Zooey Deschanel) must "spread Christmas cheer" to help Santa. She chooses to sing "Santa Claus is coming to town," and soon all of New York City is singing along.

The best sing-along songs are short and have lyrics that repeat. Jovie's choice, "Santa Claus is coming to town," satisfies both criteria. The musical structure of the song is simple:

  • Verse 1: You better watch out / You better not cry / Better not pout / I'm telling you why
  • Tag line: Santa Claus is coming to town
  • Verse 2: He's making a list / And checking it twice; / Gonna find out / Who's naughty and nice
  • Tag line repeats
  • Bridge: He sees you when you're sleeping / He knows when you're awake / He knows if you've been bad or good / So be good for goodness sake! / O!
  • Verse 1 repeats
  • Partial tags and final tag: Santa Claus is coming / Santa Claus is coming / Santa Claus is coming to town

There is a fun way to visualize repetition in song lyrics. For a song that has N words, you can define the repetition matrix to be the N x N matrix where the (i,j)th cell has the value 1 if the i_th word is the same as the j_th word. Otherwise, the (i,j)th cell equals 0. You can visualize the matrix by using a two-color heat map. Colin Morris has a web site devoted to these visualizations.

The following image visualizes the lyrics of "Santa Claus is coming to town." I have added some vertical and horizontal lines to divide the lyrics into seven sections: the verses (V1 and V2), the tag line (S), and the bridge (B).

The image shows the structure of the repetition in the song lyrics:

  • The first verse contains the repetition of the words 'you', 'better', and 'not'.
  • The second verse repeats only the word 'out' from Verse 1.
  • The bridge repeats the word 'you', which appeared three times in Verse 1. It also repeats several words ('when', 'knows', 'good', ...) within the bridge.
  • The tag line "Santa Claus is coming [to town]" is repeated a total of five times.

Now that you understand what a repetition matrix looks like and how to interpret it, let's visualize a few other classic Christmas songs that contain repetitive lyrics! To help "spread Christmas cheer," I'll use shades of red and green to visualize the lyrics, rather than the boring white and black colors.

The Twelve Days of Christmas

If you make a list of Christmas songs that have repetition, chances are "The Twelve Days of Christmas" will be at the top of the list. The song is formulaic: each new verse adds a few new words before repeating the words from the previous verse. As a result, the repetition matrix is almost boring in its regularity. Here is the visualization of the classic song (click to enlarge):

Little Drummer Boy

Another highly repetitive Christmas song is "The Little Drummer Boy," which features an onomatopoeic phrase (Pa rum pum pum pum) that alternates with the other lyrics. A visualization of the classic song is shown below:

Silver Bells

In addition to repeating the title, "Silver Bells" repeats several phrases. Most notably, the phrase "Soon it will be Christmas Day" is repeated multiple times at the end of the song. Because only certain phrases are repeated, the visualization has a pleasing structure that complements the song's lyrical qualities:

Silent Night

To contrast the hustle, bustle, and commercialism of Christmas, I enjoy hearing songs that are musically simple. One of my favorites is "Silent Night." Each verse is distinct, yet each begins with "Silent night, holy night!" and ends by repeating a phrase. The resulting visualization is devoid of clutter. It is visually empty and matches the lyrical imagery, "all is calm, all is bright."

Your turn!

You can download the SAS program that creates these images. The program also computes visualizations of some contemporary songs such as "Last Christmas" by Wham!, "Someday at Christmas" (Stevie Wonder version), "Rockin' Around the Christmas Tree" (Brenda Lee version), and "Happy XMas (War Is Over)" by John Lennon and Yoko Ono. If you have access to SAS, you can even add your own favorite lyrics to the program! If you don't have access to SAS, Colin Morris's website enables you to paste in the lyrics and see the visualization.

In a little-known "deleted scene" from Elf, Buddy says that the second-best way to spread Christmas cheer is posting images for all to share! So post a comment and share your favorite visualization of a Christmas song!

Happy holidays to all my readers. I am grateful for you. Merry Christmas to all, and to all a good night!

The post Visualize Christmas songs appeared first on The DO Loop.

11月 222018
 

This week I noticed that they've started building the lot where they sell Christmas trees near SAS (at the intersection of Maynard & Reedy Creek Rd). They put up a nice rustic wooden fence, and lights, and maybe even a fire pit to keep their workers warm. They sell some [...]

The post Finding a cut-your-own Christmas tree in North Carolina appeared first on SAS Learning Post.

11月 222018
 

This week I noticed that they've started building the lot where they sell Christmas trees near SAS (at the intersection of Maynard & Reedy Creek Rd). They put up a nice rustic wooden fence, and lights, and maybe even a fire pit to keep their workers warm. They sell some [...]

The post Finding a cut-your-own Christmas tree in North Carolina appeared first on SAS Learning Post.

12月 162017
 

With the Christmas holiday approaching, I got to wondering what they call Santa in other countries. Of course, some countries don't celebrate Christmas - but most countries at least have some sort of "winter holiday," and most also have some tradition of gift-giving. So, I guess the better question might [...]

The post What do they call Santa in other countries? appeared first on SAS Learning Post.

12月 172016
 

"Two weeks to go," Santa said to himself, with millions of toys stacked up on the shelves. Each year worry hit at the same time – "How do I get the right toy to the right child without losing my mind?" Though Old St. Nick didn't have a computer science degree, deep down […]

The post How Santa uses data quality to wrap up Christmas appeared first on The Data Roundtable.