time series

January 6, 2021
 

For ordinary least squares (OLS) regression, you can use a basic bootstrap of the residuals (called residual resampling) to perform a bootstrap analysis of the parameter estimates. This is possible because an assumption of OLS regression is that the residuals are independent. Therefore, you can reshuffle the residuals to get each bootstrap sample.
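The following statements are not part of the original analysis; they are a minimal SAS/IML sketch (on hypothetical simulated data) of what residual resampling looks like for an OLS fit, so that you can contrast it with the block bootstrap that follows:

proc iml;
call randseed(1);
/* hypothetical data for illustration: y is linear in x plus independent noise */
x = T(1:20);
y = 3 + 2*x + randfun(20, "Normal");
X1 = j(20, 1, 1) || x;              /* design matrix with an intercept column */
b  = solve(X1`*X1, X1`*y);          /* OLS parameter estimates */
pred  = X1*b;                       /* predicted values */
resid = y - pred;                   /* residuals, assumed independent */
/* one bootstrap resample: reshuffle the residuals (sample with replacement) */
idx   = sample(1:20, 20);
yBoot = pred + resid[idx, ];
bBoot = solve(X1`*X1, X1`*yBoot);   /* parameter estimates for the resample */
print b bBoot;
QUIT;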

For a time series, the residuals are not independent. Rather, if you fit a model to the data, the residual at time t+i is often close to the residual at time t for small values of i. This is known as autocorrelation in the error component. Accordingly, if you want to bootstrap the residuals of a time series, it is not correct to randomly shuffle the residuals, because shuffling would destroy the autocorrelation. Instead, you need to randomly choose blocks of residuals (for example, the residuals at times t, t+1, ..., t+L-1) and use those blocks to create bootstrap resamples. You repeatedly choose random blocks until you have enough residuals to create a bootstrap resample.

There are several ways to choose blocks:

  • The simplest way is to choose from non-overlapping blocks of a fixed length, L. This is called the simple block bootstrap and is described in this article.
  • Only slightly more complicated is to allow the blocks of length L to overlap. This is called the moving block bootstrap and is described in a separate article.
  • A more complicated scheme (but which has superior statistical properties) is to allow the length of the blocks to vary. This is called the stationary block bootstrap and is described in a third article.

Create the residuals

There are many ways to fit a model to a time series and to obtain the model residuals. Trovero and Leonard (2018) discuss several modern methods to fit trends, cycles, and seasonality by using SAS 9.4 or SAS Viya. To get the residuals, you will want to fit an additive model. In this article, I will use the Sashelp.Air data and will fit a simple additive model (trend plus noise) by using the AUTOREG procedure in SAS/ETS software.

The Sashelp.Air data set has 144 months of data. The following SAS DATA step drops the first year of data, which leaves 11 years (132 months) of data. I am doing this because I am going to use blocks of size L=12, and I think the example will be clearer if there are 11 blocks of size 12 (rather than 12 blocks).

data Air;
set Sashelp.Air;
if Date >= '01JAN1950'd;   /* exclude first year of data */
Time = _N_;                /* the observation number */
run;
 
title "Original Series: Air Travel";
proc sgplot data=Air;
   series x=Time y=Air;
   xaxis grid; yaxis grid;
run;

The graph suggests that the time series has a linear trend. The following call to PROC AUTOREG fits a linear model to the data. The predicted mean and residuals are output to the OutReg data set as the PRED and RESID variables, respectively. The call to PROC SGPLOT overlays a graph of the trend and a graph of the residuals.

/* Similar to Getting Started example in PROC AUTOREG */
proc autoreg data=Air plots=none outest=RegEst;
   AR12: model Air = Time / nlag=12;
   output out=OutReg pm=Pred rm=Resid;  /* mean prediction and residuals */
   ods select FinalModel.ParameterEstimates ARParameterEstimates;
run;
 
title "Mean Prediction and Residuals from AR Model";
proc sgplot data=OutReg;
   series x=Time y=Pred;
   series x=Time y=Resid;
   refline 0 / axis=y;
   xaxis values=(24 to 144 by 12) grid valueshint;
run;

The parameter estimates are shown for the linear model. On average, airlines carried an additional 2.8 thousand passengers per month during this time period. The graph shows the decomposition of the series into a linear trend and residuals. I added vertical lines to indicate the blocks of residuals that are used in the next section. The first block contains the residuals for times 13-24. The second block contains the residuals for times 25-36, and so forth until the 11th block, which contains the residuals for times 133-144.

Set up the simple block bootstrap

For the simple bootstrap, the length of the blocks (L) must evenly divide the length of the series (n), which means that k = n / L is an integer. Because I dropped the first year of observations from Sashelp.Air, there are n=132 observations. I will choose the block size to be L=12, which means that there are k=11 non-overlapping blocks.

Each bootstrap resample is formed by randomly choosing k blocks (with replacement) and adding those residuals to the predicted values. Think about putting the n predicted values and residuals into a matrix in row-wise order. The first L observations are in the first row, the next L are in the second row, and so forth. Thus, the matrix has k rows and L columns. The original series is of the form Predicted + Residuals, where the plus sign represents matrix addition. For the simple block bootstrap, each bootstrap resample is obtained by resampling the rows of the residual matrix and adding them to the rows of the predicted matrix, which gives a new series of the form Predicted + (Random Residuals). This process is shown schematically in the following figure.

You can use the SAS/IML language to implement the simple block bootstrap. The following call to PROC IML reads in the original predicted and residual values and reshapes the vectors into k x L matrices (P and R, respectively). The SAMPLE function generates a sample (with replacement) of the vector 1:k, which is used to randomly select rows of the R matrix. To make sure that the process is working as expected, you can create one bootstrap resample and graph it. It should resemble the original series:

/* SIMPLE BLOCK BOOTSTRAP */
%let L = 12;
proc iml;
call randseed(12345);
/* the original series is Y = Pred + Resid */
use OutReg;
   read all var {'Time' 'Pred' 'Resid'};
close;
 
/* For the Simple Block Bootstrap, the length of the series (n) 
   must be divisible by the block length (L). */
n = nrow(Pred);          /* length of series */
L = &L;                  /* length of each block */
k = n / L;               /* number of non-overlapping blocks */
if k ^= int(k) then 
   ABORT "The series length is not divisible by the block length";
 
/* Trick: reshape data into k x L matrix. Each row is block of length L */
P = shape(Pred, k, L);
R = shape(Resid, k, L); /* non-overlapping residuals (also k x L) */
 
/* Example: Generate one bootstrap resample by randomly selecting from the residual blocks */
idx = sample(1:nrow(R), k);     /* sample (w/ replacement) of size k from the set 1:k */
YBoot = P + R[idx,];
title "One Bootstrap Resample";
title2 "Simple Block Bootstrap";
refs = "refline " + char(do(12,nrow(Pred),12)) + " / axis=x;";
call series(Time, YBoot) other=refs;

The graph shows one bootstrap resample. The residuals from arbitrary blocks are concatenated until there are n residuals. These are added to the predicted values to create a "new" series, which is a bootstrap resample. You can generate a large number of bootstrap resamples and use them to perform inferences for time series statistics.

Implement the simple block bootstrap in SAS

You can repeat the process in a loop to generate more resamples. The following statements generate B=1000 bootstrap resamples. These are written to a SAS data set (BootOut). The program uses a technique in which the results of each computation are immediately written to a SAS data set, which is very efficient. The program writes the Time variable, the resampled series (YBoot), and an ID variable that identifies each bootstrap sample.

/* The simple block bootstrap repeats this process B times
   and usually writes the resamples to a SAS data set. */
B = 1000;
J = nrow(R);              /* J=k for non-overlapping blocks, but prepare for moving blocks */
SampleID = j(n,1,.);
create BootOut var {'SampleID' 'Time' 'YBoot'};  /* open data set outside of loop */
do i = 1 to B;
   SampleId[,] = i;       /* fill array: https://blogs.sas.com/content/iml/2013/02/18/empty-subscript.html */
   idx = sample(1:J, k);  /* sample of size k from the set 1:k */
   YBoot = P + R[idx,];
   append;                /* append each bootstrap sample */
end;
close BootOut;
QUIT;

The BootOut data set contains B=1000 bootstrap samples. You can efficiently analyze the samples by using a BY statement. For example, suppose that you want to investigate how the parameter estimates for the trend line vary among the bootstrap samples. You can run PROC AUTOREG on each bootstrap sample by using BY-group processing. Be sure to suppress ODS output during the BY-group analysis, and write the desired statistics to an output data set (BootEst), as follows:

/* Analyze the bootstrap samples by using a BY statement. See
   https://blogs.sas.com/content/iml/2012/07/18/simulation-in-sas-the-slow-way-or-the-by-way.html
*/
proc autoreg data=BootOut plots=none outest=BootEst noprint;
   by SampleID;
   AR12: model YBoot = Time / nlag=12;
run;
 
/* OPTIONAL: Use PROC MEANS or PROC UNIVARIATE to estimate standard errors and CIs */
proc means data=BootEst mean stddev P5 P95;
   var Intercept Time _A:;
run;
 
title "Distribution of Parameter Estimates";
proc sgplot data=BootEst;
   scatter x=Intercept y=Time;
   xaxis grid; yaxis grid; 
   refline 77.5402 / axis=x;
   refline 2.7956  / axis=y;
run;

The scatter plot shows the bootstrap distribution of the parameter estimates of the linear trend. The reference lines indicate the parameter estimates for the original data. You can use the bootstrap distribution for inferential statistics such as estimation of standard errors, confidence intervals, the covariance of estimates, and more.
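For example, one way (my addition, not shown in the original post) to estimate the covariance of the parameter estimates is to run PROC CORR with the COV option on the bootstrap estimates:

proc corr data=BootEst cov nosimple noprob;
   var Intercept Time;
run;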

You can perform a similar bootstrap analysis for any other statistic that is generated by any time series analysis. The important thing is that the block bootstrap is performed on some sort of residual or "noise" component, so be sure to remove the trend, seasonality, cycles, and so forth and then bootstrap the remainder.

Summary

This article shows how to perform a simple block bootstrap on a time series in SAS. First, you need to decompose the series into additive components: Y = Predicted + Residuals. You then choose a block length (L), which (for the simple block bootstrap) must divide the total length of the series (n). Each bootstrap resample is generated by randomly choosing blocks of residuals and adding them to the predicted values. This article uses the SAS/IML language to perform the simple block bootstrap in SAS.

In practice, the simple block bootstrap is rarely used. However, it illustrates the basic ideas for bootstrapping a time series, and it provides a foundation for more sophisticated bootstrap methods.

The post The simple block bootstrap for time series in SAS appeared first on The DO Loop.

March 2, 2020
 

A colleague recently posted an article about how to use SAS Visual Analytics to create a circular graph that displays a year's worth of temperature data. Specifically, the graph shows the air temperature for each day in a year relative to some baseline temperature, such as 65F (18C). Days warmer than baseline are displayed in one color (red for warm) whereas days colder than the baseline are displayed in another color (blue for cold). The graph was very pretty. A reader posted a comment asking whether a similar graph could be created by using other graphical tools such as GTL or even PROC SGPLOT. The answer is yes, but I am going to propose a different graph that I think is more flexible and easier to read.

Let's generalize the problem. Suppose you have a time series and you want to compare the values to a baseline (or reference) value. One way to do this is to visualize the data as deviations from the baseline. Data values that are close to the baseline will be small and almost unnoticeable. The eye will be drawn to values that indicate large deviations from the baseline. A "deviation plot" like this can be used for many purposes. Some applications include monitoring blood glucose relative to a target value, showing expenditures relative to a fixed income amount, and, yes, displaying the temperature relative to some comfortable reference value. Deviation plots sometimes accompany a hypothesis test for a one-way frequency distribution.

Linear displays versus circular displays

My colleague's display shows one year's worth of temperatures by plotting the day of the year along a circle. While this makes for an eye-catching display, there are a few shortcomings to this approach:

  • It is difficult to read the data values. It is also difficult to compare values that are on opposite sides of a circle. For example, how does March data compare with October data?
  • Although a circle can show data for one year, it is less effective for showing 8 or 14 months of data.
  • Even for one year's worth of data, it has a problem: It places December 31 next to January 1. In the temperature graph, the series began on 01JAN2018. However, the graph places 31DEC2018 next to 01JAN2018 even though those values are a year apart.

As mentioned earlier, you can use SAS/GRAPH or the statistical graphics (SG) procedure in SAS to display the data in polar coordinates. Sanjay Matange's article shows how to create a polar plot. For some of my thought about circular versus rectangular displays, see "Smoothers for periodic data."

A deviation-from-baseline plot

The graph to the right (click to enlarge) shows an example of a deviation plot (or deviation-from-baseline plot). It is similar to a waterfall chart, but in many waterfall charts the values are shown as percentages, whereas for the deviation plot we will show the observed values. You can see that the values are plotted for each day. The high values are plotted in one color (red) whereas low values are plotted in a different color (blue). A reference line (in this case, at 100) is displayed.

To create a deviation plot, you need to perform these three steps:

  1. Use the SAS DATA step to encode the data as 'High' or 'Low' by using the reference value. Compute the deviations from the reference value.
  2. Create a discrete attribute map that maps values to colors. This step is optional. Alternatively, SAS will assign colors based on the current ODS style.
  3. Use a HIGHLOW plot to graph the deviations from the reference value.

Let's implement these steps on a time series of daily blood glucose values for four months. An elderly male takes oral medications to control his blood glucose level. Each morning he measures his fasting blood glucose level and records it. The doctor has advised him to try to keep the blood glucose level below 100 mg/dL, so the reference value is 100. The following DATA step defines the dates and glucose levels for a four-month period (September through December 2019).

data Series;
informat Date date.;
format Date Date.;
input Date y @@;
label y = "Blood Glucose (mg/dL)";
datalines;
01SEP19 100 02SEP19  96 03SEP19  86 04SEP19  93 05SEP19 105 06SEP19 106 07SEP19 123 
08SEP19 121 09SEP19 115 10SEP19 108 11SEP19  94 12SEP19  96 13SEP19  95 14SEP19 120
15SEP19 112 16SEP19 104 17SEP19  97 18SEP19 101 19SEP19 108 20SEP19 108 21SEP19 117 
22SEP19 103 23SEP19 109 24SEP19  97 25SEP19  93 26SEP19 100 27SEP19  98 28SEP19 122 
29SEP19 116 30SEP19  99 01OCT19 102 02OCT19  99 03OCT19  95 04OCT19  99 05OCT19 116 
06OCT19 109 07OCT19 106 08OCT19  94 09OCT19 104 10OCT19 112 11OCT19 119 12OCT19 111 
13OCT19 104 14OCT19 101 15OCT19  99 16OCT19  92 17OCT19 101 18OCT19 115 19OCT19 109 
20OCT19  98 21OCT19  91 22OCT19  92 23OCT19 100 24OCT19 109 25OCT19 102 26OCT19 117 
27OCT19 106 28OCT19  98 29OCT19  98 30OCT19  95 31OCT19  97 01NOV19 129 02NOV19 120 
03NOV19 117 04NOV19   . 05NOV19 101 06NOV19 105 07NOV19 105 08NOV19 106 09NOV19 118 
10NOV19 109 11NOV19 102 12NOV19  98 13NOV19  97 14NOV19 .   15NOV19  92 16NOV19 114 
17NOV19 107 18NOV19  98 19NOV19  91 20NOV19  97 21NOV19 109 22NOV19  98 23NOV19 95 
24NOV19  95 25NOV19  94 26NOV19   . 27NOV19  98 28NOV19 115 29NOV19 123 30NOV19 114 
01DEC19 104 02DEC19  96 03DEC19  97 04DEC19 100 05DEC19  94 06DEC19  93 07DEC19 105 
08DEC19   . 09DEC19  88 10DEC19  84 11DEC19 101 12DEC19 122 13DEC19 114 14DEC19 108 
15DEC19 103 16DEC19  88 17DEC19  74 18DEC19  92 19DEC19 110 20DEC19 118 21DEC19 106 
22DEC19 100 23DEC19 106 24DEC19 107 25DEC19 116 26DEC19 113 27DEC19 113 28DEC19 117 
29DEC19 101 30DEC19  96 31DEC19 101  
;

Encode the data

The first step is to compute the deviation of each observed value from the reference value. If an observed value is above the reference value, mark it as 'High', otherwise mark it as 'Low'. We will plot a vertical bar that goes from the reference level to the observed value. Because we will use a HIGHLOW statement to display the graph, the DATA step computes two new variables, High and Low.

/* 1. Compute the deviation and encode the data as 'High' or 'Low' by using the reference value */
%let RefValue = 100;
 
data Center;
set Series;
if (y > &RefValue) then Group="High";
else Group="Low";
Low  = min(y, &RefValue);    /* lower end of highlow bar */
High = max(y, &RefValue);    /* upper end of highlow bar */
run;

Map high and low values to colors

If you want SAS to assign colors to the two groups, you can skip this step. However, in many cases you might want to choose which color is plotted for the high and low categories. You can map levels of a group to colors by using a discrete attribute map ("DATTR map", for short) in PROC SGPLOT. Because we are going to use a HIGHLOW statement to graph the data, we need to define a map that has the FillColor and LineColor for the vertical bars. The following DATA step maps the 'High' category to red and the 'Low' category to blue:

/* 2. Create a discrete attribute map that maps values to colors */
data DAttrs;
length c FillColor LineColor $16.;
ID = "HighLow";
Value = "High"; c="DarkRed";  FillColor=c; LineColor=c; output;
Value = "Low";  c="DarkBlue"; FillColor=c; LineColor=c; output;
run;

Create a high-low plot

The final step is to create a high-low plot that shows the deviations from the reference value. You can use the DATTRMAP= option to tell PROC SGPLOT how to assign colors for the group values. Because a data set can contain multiple maps, the ATTRID= option specifies which mapping to use.

/* 3. Use a HIGHLOW plot to graph the deviations from the reference value */
title "Deviations from Reference Value (&RefValue)";
title2 "Morning Fasting Blood Glucose";
ods graphics / width=600px height=400px;
proc sgplot data=Center DATTRMAP=DAttrs noautolegend;
   highlow x=Date low=Low high=High / group=Group ATTRID=HighLow;
   refline &RefValue / axis=y;
   yaxis grid label="Blood Glucose Level";
run;

The graph is shown at the top of this section. It is clear that on most days the patient has high blood sugar. With additional investigation, you can discover that the highest levels are associated with weekends and holidays.

Note that these data would not be appropriate to plot on a circular graph because the data are not for a full year. Furthermore, on this graph it is easy to see specific values and days and to compare days in September with days in December.

A deviation plot for daily average temperatures

My colleague's graph displayed daily average temperatures. The following deviation plot shows average temperatures and a reference value of 65F. The graph shows the daily average temperature in Raleigh, NC, for 2018:

In this graph, it is easy to find the approximate temperature for any range of dates (such as "mid-October") and to compare the temperature for different time periods, such as March versus October. I think the rectangular deviation plot makes an effective visualization of how these data compare to a baseline value.
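If you want to build a similar graph yourself, the same three steps apply. The following sketch is my addition and assumes a hypothetical data set named Temps that contains Date and Temperature variables (the data are not shown in this article); it reuses the DAttrs attribute map that was defined earlier:

/* sketch only: assumes a data set Temps with variables Date and Temperature */
%let RefValue = 65;
 
data TempCenter;
set Temps;
if (Temperature > &RefValue) then Group="High";
else Group="Low";
Low  = min(Temperature, &RefValue);    /* lower end of highlow bar */
High = max(Temperature, &RefValue);    /* upper end of highlow bar */
run;
 
title "Deviations from Reference Value (&RefValue)";
title2 "Daily Average Temperature";
proc sgplot data=TempCenter DATTRMAP=DAttrs noautolegend;
   highlow x=Date low=Low high=High / group=Group ATTRID=HighLow;
   refline &RefValue / axis=y;
   yaxis grid label="Temperature (F)";
run;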

You can download the SAS program that creates the graphs in this article.

The post Create a deviation plot to visualize values relative to a baseline appeared first on The DO Loop.

September 16, 2019
 

A moving average is a statistical technique that is used to smooth a time series. My colleague, Cindy Wang, wrote an article about the Hull moving average (HMA), which is a time series smoother that is sometimes used as a technical indicator by stock market traders. Cindy showed how to compute the HMA in SAS Visual Analytics by using a GUI "formula builder" to compute new columns in a data table. This article shows how to implement the Hull moving average in the SAS/IML language. The SAS/IML language is an effective way to implement new (or existing) smoothers for time series because it is easy to use lags and rolling windows to extract values from a time series.

I have previously written about how to compute common moving averages in SAS/ETS (PROC EXPAND) and the DATA step. I have also shown how to compute a weighted moving average (WMA) in the SAS/IML language, which includes simple moving averages and exponential moving averages. This article shows how to implement the Hull moving average formula in SAS by leveraging the WMA function in the previous article.

Every choice of weights in a weighted moving average leads to a different smoother. This article uses weights that are linear in time. Specifically, if Y is a time series, the weighted moving average at time index t is obtained by the formula
WMA(t, n) = (1*Y[t-n+1] + 2*Y[t-n+2] + 3*Y[t-n+3] + ... + n*Y[t]) / (1 + 2 + ... + n)
In general, the divisor in the formula is the sum of whatever weights are used. For example, if n = 3 and the three most recent values of Y are 2, 4, and 8, then the weighted moving average at time t is (1*2 + 2*4 + 3*8) / (1+2+3) = 34/6 ≈ 5.67.

Although this article focuses on the Hull moving average, the techniques are broadly applicable to other moving average computations.

Hull moving average

Cindy's article contains formulas that show how to compute the Hull moving average (HMA). The HMA is the moving average of a linear combination of two other weighted moving averages, one on a shorter time scale and the other on a longer time scale. Given a time series, Y, choose a lag time, n, which is sometimes called the window length. You can compute the Hull moving average by computing four quantities:

  1. Compute Xshort as the weighted moving average of Y by using a short window of length n/2.
  2. Compute Xlong as the weighted moving average of Y by using the longer window of length n.
  3. Define the series X = 2*Xshort - Xlong.
  4. The Hull moving average of Y is the weighted moving average of X by using a window of length sqrt(n).

The following SAS/IML program implements the Hull moving average. The WMA function is explained in my previous blog post. The HMA function computes the Hull moving average by calling the WMA function three times, twice on Y and once on X. It requires only four statements, one for each computed quantity:

proc iml;
/* Weighted moving average of k data values.
   First k-1 values are assigned the weighted mean of all preceding values.
   Inputs:     y     column vector of length N >= k
               wt    column vector of weights. wt[k] is the weight for most recent 
                      data; wt[1] is for k time units ago.  The function 
                     internally standardizes the weights so that sum(wt)=1.
   Example calls: 
   WMA  = WMA(y, 1:5);          is weighted moving average of most recent 5 points
   SMA  = WMA(y, {1 1 1 1 1});  is simple moving average of most recent 5 points
   See https://blogs.sas.com/content/iml/2016/02/03/rolling-statistics-sasiml.html
*/
start WMA(y, wt);
   w = colvec(wt) / sum(wt);         /* standardize weights so that sum(w)=1 */
   k = nrow(w);                      /* length of lag */
   MA = j(nrow(y), 1, .);
   /* handle first k values separately */
   do i = 1 to k-1;
      wIdx = k-i+1:k;                /* index for previous i weights */
      MA[i] = sum(wt[wIdx]#y[1:i]) / sum(wt[wIdx]);  /* weighted average */
   end;
   /* main computation: average of current and previous k-1 data values */
   do i = k to nrow(y);
      idx = (i-k+1):i;               /* rolling window of k data points */
      MA[i] = sum( w#y[idx] );       /* weighted sum of k data values */
   end;
   return ( MA );
finish;
 
/* Hull moving average HMA(y, nLag) at each time point. All computations
   use linear weights 1:k for various choices of k. */
start HMA(y, nLag);
   Xshort = WMA(y, 1:round(nLag/2)); /* short smoother (nLag/2) */
   Xlong = WMA(y, 1:nLag);           /* longer smoother (nLag) */
   X = 2*Xshort - Xlong;             /* combination of smoothers */
   HMA = WMA(X, 1:round(sqrt(nLag)));/* Hull moving average (length sqrt(nLag)) */
   return HMA;
finish;
 
/* test on simple time series */
y = T({0 1 0 1 3 2 3 3});
nLag = 4;
WMA = WMA(y, 1:nLag);  
HullMA = HMA( y, nLag );  
print y WMA HullMA;
Hull moving average compared with weighted moving average

The table shows the weighted moving average and the Hull moving average applied to a simple time series with eight observations. The smoothed values are shown. As a general rule, the Hull moving average tends to be smoother than a raw weighted moving average. For a given lag parameter, it responds more quickly to changes in Y.

However, it is important to realize that the HMA is not a smoother of Y. Rather, it smooths a linear combination of other smoothers. Consequently, the value of the HMA at time t can be outside of the range of the series {Y1, Y2, ..., Yt}. This is seen in the last observation in the table. The HMA has the value 3.08 even though the range of Y is [0, 3]. This is shown in the following graph, which plots the series, the weighted moving average, and the Hull moving average.

Graph of Hull moving average and weighted moving average applied to an example time series

The Hull moving average applied to stock data

This section reproduces the graphs in Cindy's blog post by applying the Hull moving average to the monthly closing price of IBM stock. The following statements compute three different moving averages (Hull, simple, and weighted) and use PROC SGPLOT to overlay the simple and Hull moving averages on a scatter plot that shows the closing price and a high-low plot that shows the range of stock values for each month.

use Sashelp.Stocks where(stock='IBM');       /* read stock data for IBM */
read all var {'Close'} into y; close;
 
nLag = 5;
SMA = WMA(y, j(1,nLag,1));                   /* simple MA */
WMA = WMA(y, 1:nLag);                        /* weighted MA */
HMA = HMA(y, nLag);                          /* Hull MA */
create out var {SMA WMA HMA}; append; close; /* write moving averages to SAS data set */
QUIT;
 
data All;                                    /* merge data and moving averages */
   merge Sashelp.Stocks(where=(stock='IBM')) out;
run;
 
ods graphics / width=800px height=300px;
title "Comparison of Simple and Hull Moving Average";
proc sgplot data=All noautolegend;
   highlow x=Date high=High low=Low / lineattrs=(color=LightGray);
   scatter x=Date y=Close / markerattrs=(color=LightGray);
   series x=Date y=SMA / curvelabel curvelabelattrs=(color=DarkGreen) lineattrs=(color=DarkGreen);
   series x=Date y=HMA / curvelabel curvelabelattrs=(color=DarkOrange) lineattrs=(color=DarkOrange);
   refline '01JUN1992'd / axis=x label='June 1992' labelloc=inside;
   xaxis grid display=(nolabel); 
   yaxis grid max=200 label="IBM Closing Price";
run;
Hull moving average and simple moving average applied to IBM stock price (1986-2006)

It is well known that the simple moving average (green line) lags the data. As Cindy points out, the Hull moving average (orange line) responds more quickly to changes in the data. She compares the two smoothers in the months after June 1992 to emphasize that the HMA is closer to the data than the simple moving average.

In summary, this article shows how to implement a custom time series smoother in SAS. The example uses the Hull moving average and reproduces the computations and graphs shown in Cindy's article. The techniques apply to any custom time series smoother and time series computation. The SAS/IML language makes it easy to compute rolling-window computations because you can easily access segments of a time series.

The post The Hull moving average: Implement a custom time series smoother in SAS appeared first on The DO Loop.

May 4, 2018
 

It looks like we've finally recovered from the Great Recession, and there are even claims of record-low unemployment in several U.S. states. Of course claims like that make my data-radar go off, and I wanted to see the numbers for myself. And it's a great excuse for me to create [...]

The post Lowest unemployment since 1976? appeared first on SAS Learning Post.

July 24, 2017
 

For a time series { y[1], y[2], ..., y[N] }, the difference operator computes the difference between two observations. The kth-order difference is the series { y[k+1] - y[1], ..., y[N] - y[N-k] }. In SAS, the DIF function in the SAS/IML language takes a column vector of values and returns a vector of differences.

For example, the following SAS/IML statements define a column vector that has five observations and calls the DIF function to compute the first-order differences between adjacent observations. By convention, the DIF function returns a vector that is the same size as the input vector and inserts a missing value in the first element.

proc iml;
x = {0, 0.1, 0.3, 0.7, 1};
dif = dif(x);            /* by default DIF(x, 1) ==> first-order differences */
print x dif;
First-order difference of time series

The difference operator is a linear operator that can be represented by a matrix. The first nonmissing value of the difference is x[2]-x[1], followed by x[3]-x[2], and so forth. Thus the linear operator can be represented by the matrix that has -1 on the main diagonal and +1 on the super-diagonal (above the diagonal). An efficient way to construct the difference operator is to start with the zero matrix and insert ±1 on the diagonal and super-diagonal elements. You can use the DO function to construct the indices for the diagonal and super-diagonal elements in a matrix:

start DifOp(dim);
   D = j(dim-1, dim, 0);        /* allocate zero matrix */
   n = nrow(D); m = ncol(D);
   diagIdx = do(1,n*m, m+1);    /* index diagonal elements */
   superIdx  = do(2,n*m, m+1);  /* index superdiagonal elements */
   *subIdx  = do(m+1,n*m, m+1); /* index subdiagonal elements (optional) */
   D[diagIdx] = -1;             /* assign -1 to diagonal elements */
   D[superIdx] = 1;             /* assign +1 to super-diagonal elements */
   return D;
finish;
 
B = DifOp(nrow(x));
d = B*x;
print B, d[L="Difference"];
Difference operator and first-order difference of a time series

You can see that the DifOp function constructs an (n-1) x n matrix, which is the correct dimension for transforming an n-dimensional vector into an (n-1)-dimensional vector. Notice that the matrix multiplication omits the element that previously held a missing value.
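Because the operator is a matrix, you can also compose it with itself. The following statements are my addition (they are not in the original post): they apply the difference operator twice, which produces the twice-differenced series that you would get by calling the DIF function on the first-order differences. (This iterated difference is not the same as the lag-k difference that was defined at the beginning of this article.)

B1 = DifOp(nrow(x));         /* (n-1) x n difference operator */
B2 = DifOp(nrow(x)-1);       /* (n-2) x (n-1) difference operator */
d2 = B2*(B1*x);              /* apply the difference operator twice */
print d2[L="Twice-Differenced Series"];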

You probably would not use a matrix multiplication in place of the DIF function if you needed the first-order difference for a single time series. However, the matrix formulation makes it possible to use one matrix multiplication to find the difference for many time series.

The following matrix contains three time-series, one in each column. The B matrix computes the first-order difference for all columns by using a single matrix-matrix multiplication. The same SAS/IML code is valid whether the X matrix has three columns or three million columns.

/* The matrix can operate on a matrix where each column is a time series */
x = {0    0    0,
     0.1  0.2  0.3,
     0.3  0.8  0.5,
     0.7  0.9  0.8,
     1    1    1   };
B = DifOp(nrow(x));
d = B*x;                        /* apply the difference operator */
print d[L="Difference of Columns"];
First-order differences for multiple time series

Other operators in time series analysis can also be represented by matrices. For example, the first-order lag operator is represented by a matrix that has +1 on the super-diagonal. Moving average operators also have matrix representations.

The matrix formulation is efficient for short time series but is not efficient for a time series that contains thousands of elements. If the time series contains n elements, then the dense-matrix representation of the difference operator contains about n^2 elements, which consumes a lot of RAM when n is large. However, as we have seen, the matrix representation of an operator is advantageous when you want to operate on a large number of short time series, as might arise in a simulation.

The post Difference operators as matrices appeared first on The DO Loop.

July 13, 2016
 

Last week I showed how to represent a Markov transition matrix in the SAS/IML matrix language. I also showed how to use matrix multiplication to iterate a state vector, thereby producing a discrete-time forecast of the state of the Markov chain system. This article shows that the expected behavior of a Markov chain can often be determined just by performing linear algebraic operations on the transition matrix.


Absorbing Markov chains

Whereas the system in my previous article had four states, this article uses an example that has five states. The ith row of the following transition matrix gives the probabilities of transitioning from State i to any other state:

proc iml;
/* i_th row contains transition probabilities from State i */
P = { 0    0.5  0    0.5  0,
      0.5  0    0.5  0    0,
      0    0.5  0    0    0.5,
      0    0    0    1    0,
      0    0    0    0    1 };

For example, the last row of the matrix indicates that if the system is in State 5, the probability is 1 that it stays in State 5. This is the definition of an absorbing state. An absorbing state is common for many Markov chains in the life sciences. For example, if you are modeling how a population of cancer patients might respond to a treatment, possible states include remission, progression, or death. Death is an absorbing state because dead patients have probability 1 that they remain dead. The non-absorbing states are called transient. The current example has three transient states (1, 2, and 3) and two absorbing states (4 and 5).

If a Markov chain has an absorbing state and every initial state has a nonzero probability of transitioning to an absorbing state, then the chain is called an absorbing Markov chain. The Markov chain determined by the P matrix is absorbing. For an absorbing Markov chain, you can discover many statistical properties of the system just by using linear algebra. The formulas and examples used in this article are taken from the online textbook by Grimstead and Snell.

The first step for analyzing an absorbing chain is to permute the rows and columns of the transition matrix so that all of the transient states are listed first and the absorbing states are listed last. (The P matrix is already in this form.) If there are k absorbing states, the transition matrix in block form looks like the following:

Block form of an absorbing Markov transition matrix

The bottom right block of the transition matrix is a k x k identity matrix and represents the k absorbing states. The top left block (call it Q) contains the probabilities of transitioning between transient states. The upper right block (call it R) contains the probabilities of transitioning from a transient state to an absorbing state. The lower left block contains zeros because there is no chance of transitioning from an absorbing state to any other state.

The following SAS/IML statements show how to extract the Q and R matrices from the P matrix:

k = sum(vecdiag(P)=1);      /* number of absorbing states */
nt = ncol(P) - k;           /* number of transient states */
 
Q = P[1:nt, 1:nt];          /* prob of transitions between transient states */ 
R = P[1:nt, nt+1:ncol(P)];  /* prob of transitions to absorbing state */

Expected behavior of absorbing Markov chains

By definition, all initial states for an absorbing system will eventually end up in one of the absorbing states. The following questions are of interest. If the system starts in the transient state i, then:

  1. What is the expected number of steps the system spends in transient state j?
  2. What is the expected number of steps before entering an absorbing state?
  3. What is the probability that the system will be absorbed into the jth absorbing state?

The answers to these questions are obtained by defining the so-called fundamental matrix, which is N = (I - Q)^(-1). The fundamental matrix answers the first question because the (i,j)th entry of N is the expected number of steps that the system spends in transient state j if it starts in transient state i:

transState = char(1:nt);        /* labels of transient states */
absState = char(nt+1:ncol(P));  /* labels of absorbing states */
 
/* Fundamental matrix gives the expected time that the system is 
   in transient state j if it starts in transient state i */
N = inv(I(nt) - Q);  
print N[L="Expected Time in State j" r=transState c=transState];
Expected time in State j for an absorbing Markov chain

The first row indicates that if the system starts in State 1, then on average it will spend 1.5 units of time in State 1 (including the initial time period), 1 unit of time in State 2, and 0.5 units of time in State 3 before transitioning to an absorbing state. Similarly, if the system starts in State 2, you can expect 1 unit, 2 units, and 1 unit of time in States 1, 2, and 3, respectively, before transitioning to an absorbing state.

Obviously, if you sum up the values for each row, you get the expected number of steps until the system transitions to an absorbing state:

/* Expected time before entering an absorbing state if the system
   starts in the transient state i */
t = N[,+];
print t[L="Expected Time until Absorption" r=transState];
Expected time until absorption for an absorbing Markov chain

The remaining question is, to me, the most interesting. If the system starts in a transient state, you know that it will eventually transition into one of the k absorbing states. But which one? What is the probability that the system ends in the jth absorbing state if it starts in the ith transient state? The answer is obtained by the matrix multiplication N*R because the matrix N is the expected number of steps in each transient state and R contains the probability of transitioning from a transient state to an absorbing state. For our example, the computation is as follows:

/* The probability that an absorbing chain will be absorbed
   into j_th absorbing state if it starts in i_th transient state */
B = N*R;
print B[L="Absorption Probabilities" r=transState c=absState];
Absorption probabilities for an absorbing Markov chain with two absorbing states

The first row of the matrix indicates that if the system starts in State 1, it will end up in State 4 three quarters of the time and in State 5 one quarter of the time. The second row indicates that if the system starts in State 2, there is a 50-50 chance that it will end up in State 4 or State 5.

Because this Markov chain is a stochastic process, you cannot say with certainty whether the system will eventually be absorbed into State 4 or State 5. However, starting the system in State 1 means that there is a higher probability that the final state will be State 4. Similarly, starting in State 3 means a higher probability that the final state will be in State 5.
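If you want to check these probabilities empirically, you can simulate the chain. The following statements are my addition (they are not in the original post and assume that the PROC IML session above is still active, so that P and nt are defined). They simulate many games that start in State 1 and tabulate the absorbing state for each game; the simulated proportions should be close to the first row of the B matrix:

/* simulation sketch: estimate absorption probabilities from State 1 */
call randseed(123);
NSim = 10000;                       /* number of simulated realizations */
endState = j(NSim, 1, .);
u = j(1, 1, .);                     /* holds the randomly chosen next state */
do i = 1 to NSim;
   state = 1;                       /* start in State 1 */
   do while (state <= nt);          /* repeat until an absorbing state is reached */
      call randgen(u, "Table", P[state, ]);  /* next state, drawn from row of P */
      state = u;
   end;
   endState[i] = state;
end;
call tabulate(levels, freq, endState);
print (freq/NSim)[L="Simulated Absorption Probabilities" c=(char(levels))];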

There are many other statistics that you can compute for absorbing Markov chains. Refer to the references for additional computations.

References

Grimstead and Snell, Introduction to Probability (online textbook). Markov chains are discussed in Chapter 11.


The post Absorbing Markov chains in SAS appeared first on The DO Loop.

July 7, 2016
 

Many computations in elementary probability assume that the probability of an event is independent of previous trials. For example, if you toss a coin twice, the probability of observing "heads" on the second toss does not depend on the result of the first toss.

However, there are situations in which the current state of the system determines the probability of the next state. A stochastic process in which the probabilities depend on the current state is called a Markov chain.

A Markov transition matrix models the way that the system transitions between states. A transition matrix is a square matrix in which the (i,j)th element is the probability of transitioning from state i into state j. The sum of each row is 1. For reference, Markov chains and transition matrices are discussed in Chapter 11 of Grimstead and Snell's Introduction to Probability.



A transition matrix of probabilities

A Wikipedia article on Markov chains uses a sequence of coin flips to illustrate transitioning between states. This blog post uses the same example, which is described below.

Imagine a game in which you toss a fair coin until the sequence heads-tails-heads (HTH) appears. The process has the following four states:

  • State 1: No elements of the sequence are in order. If the next toss is tails (T), the system stays at State 1. If the next toss is heads (H), the system transitions to State 2.
  • State 2: H. The first element of the sequence is in order. If the next toss is H, the system stays at State 2. If the next toss is T, the system transitions to State 3.
  • State 3: HT. The first two elements of the sequence are in order. If the next toss is H, transition to State 4. If the next toss is T, transition to State 1.
  • State 4: HTH. The game is over. Stay in this state.

The transition matrix is given by the following SAS/IML matrix. The first row contains the transition probabilities for State 1, the second row contains the probabilities for State 2, and so on.

proc iml;
/* Transition matrix. Columns are next state; rows are current state */
/*     Null  H   HT  HTH */
P =    {0.5  0.5 0   0,   /* Null */
        0    0.5 0.5 0,   /* H    */
        0.5  0   0   0.5, /* HT   */
        0    0   0   1};  /* HTH  */
states = "0":"3";
print P[r=states c=states L="Transition Matrix"];
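Because each row of a transition matrix must sum to 1, a quick sanity check (my addition, not part of the original post) is to compute the row sums:

/* each row of a transition matrix must sum to 1 */
rowSums = P[ ,+];                   /* subscript reduction operator: row sums */
if any(rowSums ^= 1) then print "WARNING: a row does not sum to 1";
else print "Each row sums to 1";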

Analyzing sequences of coin tosses can be interesting and sometimes counterintuitive. Let's describe the expected behavior of this system.

The state vector

You can use a four-element row vector to represent the probabilities that the system is in each state. If the system is in the ith state, put a 1 in the ith element and zero elsewhere. Thus the state vector {1 0 0 0} indicates that the system is in State 1.

You can also use the state vector to represent probabilities. If the system has a 50% chance of being in State 1 and a 50% chance of being in State 2, the state of the system is represented by the state vector {0.5 0.5 0 0}.

The time-evolution of the system can be studied by multiplying the state vector and the transition matrix. If s is the current state of the system, then s*P gives the vector of probabilities for the next state of the system. Similarly, (s*P)*P = s*P^2 describes the probabilities of the system being in each state after two tosses.

For the HTH-sequence game, suppose that you start a new game. The game starts in State 1. The following computation describes the evolution of the state vector:

s0 = {1 0 0 0};    /* a new game is in State 1 */
s1 = s0 * P;       /* probability distribution of states after 1 toss */
s2 = s1 * P;       /* probability distribution of states after 2 tosses */
print (s1//s2)[L="Prob of State" c=("1":"4") r={"1 toss" "2 tosses"}];
First two states in a Markov chain

The first row of the output gives the state probabilities for the system after one coin toss. The system will be in State 1 with probability 0.5 (if you toss T) and will be in State 2 with probability 0.5 (if you toss H). The second row is more interesting. The computation uses the probabilities from the first toss to compute the probabilities for the second toss. After two tosses, the probability is 0.25 for being in State 1 (TT), the probability is 0.5 for being in State 2 (TH or HH), and the probability is 0.25 for being in State 3 (HT).

Modeling a sequence of coin tosses

In general, you can repeatedly multiply the state vector by the transition matrix to find the state probabilities after k time periods. For efficiency, you should avoid concatenating results inside a loop. Instead, allocate a matrix and use the ith row to hold the ith state vector, as follows:

/* Iterate to see how the probability distribution evolves */
numTosses = 10;
s0 = {1 0 0 0};                     /* initial state */
s = j(numTosses+1, ncol(P), .);     /* allocate room for all tosses */
s[1,] = s0;                         /* store initial state */
do i = 2 to nrow(s);
   s[i,] = s[i-1,] * P;             /* i_th row = state after (i-1) iterations */
end;
iteration = 0:numTosses;            /* iteration numbers */
print s[L="Prob of State" c=("1":"4") r=(char(iteration))];
First 10 states in a Markov chain with one absorbing state

The output shows the state probabilities for a sequence of 10 coin tosses. Recall that the last row of the transition matrix ensures that the sequence stays in State 4 after it reaches State 4. Therefore the probability of the system being in State 4 is nondecreasing. The output shows that there is a 65.6% chance that the sequence HTH will appear in 10 tosses or less.

You can visualize the evolution of the probability distributions by making a series plot for each column of this output. You can download the SAS program that creates the plot and contains all of the computations in this article. The plot is shown below:

Predicted states for a Markov chain by iterating a transition matrix

The plot shows the probability distributions after each toss when the system starts in State 1. After three time periods the system can be in any state. Over the long term, the system has a high probability of being in State 4 and a negligible chance of being in the other three states.

Modeling transitions in a population

You can also apply the transition matrix to a population of games. For example, suppose that many students are playing the game. At a certain instant, 50% of the games are in State 1, 30% are in State 2, and 20% are in State 3. You can represent that population of initial states by using the initial vector

s0 = {0.5 0.3 0.2 0};

The following graph gives the probability of the states for the population for the next 30 coin tosses:

Predicted states for a Markov chain by iterating a transition matrix

The initial portion of the graph looks different from the previous graph because the population starts in a different initial distribution of states. However, the long-term behavior of this system is the same: all games eventually end in State 4. For this initial population, the graph shows that you should expect 80% of the games to be in State 4 by the 13th toss.
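The series plots in this article were created by the downloadable program. One possible way to create a similar plot (a sketch that I added; the data set name MC and the variable names are mine, not from the original program) is to write the matrix of state probabilities to a data set and overlay one SERIES statement per state:

/* write the s matrix (one row per iteration) to a data set; hypothetical names */
create MC from s[colname={"State1" "State2" "State3" "State4"}];
append from s;
close MC;
QUIT;
 
data MC;
   set MC;
   Toss = _N_ - 1;          /* iteration number: 0, 1, ..., numTosses */
run;
 
title "Evolution of the State Probabilities";
proc sgplot data=MC;
   series x=Toss y=State1;
   series x=Toss y=State2;
   series x=Toss y=State3;
   series x=Toss y=State4;
   xaxis grid label="Number of Tosses";
   yaxis grid label="Probability of State";
run;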

A real example: Predicting caseloads for social workers and parole officers

An interesting application of using Markov chains was presented by Gongwei Chen at SAS Global Forum 2016. Chen built a discrete-time Markov chain model to forecast workloads at the Washington State Department of Corrections. Social workers and parole officers supervise convicted offenders who have been released from prison or who were sentenced to supervision by the court system. Individuals who exhibit good behavior can transition from a highly supervised situation into less supervision. Other individuals might commit a new offense that requires an increase in supervision. Chen used historical data to estimate the transition matrix between the different supervisory states. His model helps the State of Washington forecast the resources needed to supervise offenders.

Summary

In summary, it is easy to represent a transition matrix and a state vector in SAS/IML. You can iterate the initial distribution of states to forecast the state of the system after an arbitrary number of time periods. This is done by using matrix multiplication.

A Markov chain model involves iterating a linear dynamical system. The qualitative asymptotic behavior of such systems can be described by using the tools of linear algebra. In a future article, I will describe how you can compute statistical properties of Markov chain models from the transition matrix.


The post Markov transition matrices in SAS/IML appeared first on The DO Loop.

March 7, 2016
 

We recently had a flooding event at Jordan Lake where the water rose almost 20 feet above normal. This blog details that flooding event in both photos and graphs. If you're intrigued by weather, boats, or lakes then this blog's for you! In NC's Research Triangle Park area, there are basically two […]

The post Tracking our local lake rise 20-ft above normal appeared first on SAS Learning Post.

November 13, 2015
 

With a major election coming next year, I was wondering if there have been any shifts & changes in the voters in my state.  This seems like an interesting opportunity for some data analysis, eh!?! To get you into the spirit of elections, here's an "I Voted" sticker from my friend […]

The post How to build a customized voter registration data viewer appeared first on SAS Learning Post.