2月 102021
 

One of the first things I learned in SAS was how to use PROC PRINT to display parts of a data set. I usually do not need to see all the data, so my favorite way to use PROC PRINT is to use the OBS= data set option to display the first few rows. For example, I often display the first five rows of a SAS data set as follows:

proc print data=Sashelp.Class(obs=5);
    * VAR Weight Height Age;  /* optional: the VAR statement specifies variables */
run;

By using the OBS= data set option, you can display only a few observations. This enables you to take a quick peek at the values of your data. As shown in the comment, you can optionally use the VAR statement to display only certain columns. (Use the FIRSTOBS= option if you want more control over the rows that are printed.)

Display the rows of a data table in SAS/IML

In a SAS/IML program, data are either stored in a table or in a matrix. If the data are in a table, you can use the TABLEPRINT subroutine to display the data. The NUMOBS= option enables you to display only a few rows:

proc iml;
TblClass = TableCreateFromDataset("sashelp", "class");
run TablePrint(TblClass) numobs=5;

The TABLEPRINT subroutine supports many options for printing, including the VAR= option for specifying only certain columns.

Display the rows of a matrix in SAS/IML

How can you display only a portion of a SAS/IML matrix? I often write statements like this to print only the first few rows:

/* read numerical data into the X matrix */
use Sashelp.Class; read all var _NUM_ into X[c=varNames]; close;
print (X[1:5,]);    /* print only rows 1 through 5 */

This works, but there are a few things that I don't like. My primary complaint is that X[1:5,] is a temporary matrix and therefore has no name. The rows are printed, but there is no header that tells me where the data came from. My second complaint is that the output does not indicate which rows are being displayed. Consequently, sometimes I include information in the label and add row and column headers:

print (X[1:5,])[rowname=(1:5) colname=varNames label="Top of X"];

The output now includes the information that I want, but that is a LOT of typing, especially if I want to display similar information for other matrices. Even if I use the abbreviated version of the PRINT options (R=, C=, and L=), it is cumbersome to type.

By the way, this PRINT statement demonstrates a new feature of SAS/IML 15.1 (which was released with SAS 9.4M6), which is that the ROWNAME= and COLNAME= options on the PRINT statement support numerical vectors. If you have an earlier version of SAS/IML, you can use rowname=(char(1:5)).

HEAD: A module to print the top rows of a matrix

There's a saying among computer programmers: if you find yourself writing the same statements again and again, create a function to do it. So, let's write a SAS/IML module to print the top rows of a matrix. Because there is a UNIX command called 'head' that displays the top lines of a file, I will use the same name.

Many years ago, I blogged about how to write a HEAD subroutine, although my emphasis was on how to use default arguments in SAS/IML functions. The following routine is a richer version of the previous function:

/* Print the first n rows of a matrix. Optionally, display names of columns */
start Head(x, n=5, colname=);
   m = min(n, nrow(x));         /* make sure n isn't too big */
   idx = 1:m;                   /* the rows to print */
   name = parentname("x");      /* name of symbol in calling environment */
   if name=" " then name = "Temp";  /* the parent name of a temporary variable is " "*/
   labl = "head(" + name + ") rows=" + strip(char(m));  /* construct the label */
   if isSkipped(colname) then  /* print the top rows */
      print (x[idx,])[r=idx label=labl];
   else 
      print (x[idx,])[r=idx c=colname label=labl];
finish;
 
run Head(X) colname=varNames;  /* example: call the HEAD module */

The HEAD subroutine uses three features of user-defined modules that you might not know about:

The result is a short way to display the top few rows of a matrix.

TAIL: A module to print the bottom rows of a matrix

Although I usually want to print the top row of a matrix, it is easy to modify the HEAD module to display the last n rows of a matrix. The following module, called TAIL, is almost identical to the HEAD module.

/* Print the last n rows of a matrix. Optionally, display names of columns */
start Tail(x, n=5, colname=);
   m = min(n, nrow(x));         /* make sure n isn't too big */
   idx = (nrow(x)-m+1):nrow(x); /* the rows to print */
   name = parentname("x");      /* name of symbol in calling environment */
   if name=" " then name = "Temp";  /* the parent name of a temporary variable is " "*/
   labl = "tail(" + name + ") rows=" + strip(char(m));  /* construct the label */
   if isSkipped(colname) then  /* print the bottom rows */
      print (x[idx,])[r=idx label=labl];
   else 
      print (x[idx,])[r=idx c=colname label=labl];
finish;
 
run Tail(X) colname=varNames;

Summary

Most SAS programmers know how to use the OBS= option in PROC PRINT to display only a few rows of a SAS data set. When writing and debugging programs in the SAS/IML matrix language, you might want to print a few rows of a matrix. This article presents the HEAD module, which displays the top rows of a matrix. For completeness, the article also defines the TAIL module, which displays the bottom rows of a matrix. If you find these modules useful, you can incorporate them into your SAS/IML programs.

For more tips and techniques related to SAS/IML modules, see the article "Everything you wanted to know about writing SAS/IML modules."

The post Print the top rows of your SAS data appeared first on The DO Loop.

2月 082021
 

Look at the following matrices. Do you notice anything that these matrices have in common?

If you noticed that the rows of each matrix are arithmetic progressions, good for you. For each row, there is a constant difference (also called the "increment") between adjacent elements. For these examples:

  • In the first matrix, each row is an arithmetic progression with a difference of 1. That is, the value of the (j+1)th column is one more than the value of the j_th column.
  • In the second matrix, each row is an arithmetic progression with a difference of 2. That is, the value of the (j+1)th column is two more than the value of the j_th column.
  • In the third matrix, each row is an arithmetic progression, but the increments for the rows are not identical. The increment for the first row is -3. The increment for the second row is +1. The increments for the remaining rows are -2 and +3, respectively.

However, there is another characteristic that these matrices have in common: they are all singular matrices. It turns out this is not a coincidence. An n x n matrix (n>2) is singular if each row is an arithmetic progression.

Why are these matrices singular?

Recall that a matrix will be singular if one of the columns can be written as a linear combination of other columns. For these three matrices, you can use mental arithmetic to verify that the third column is a linear combination of the first two columns. Specifically, you can verify that the third column can be written as
X3 = 2 X2 – X1
where X1, X2, X3, ... are the columns of a matrix. Thus, these matrices are singular.

It turns out that this relationship holds in general for any matrix whose rows are arithmetic progressions. Suppose the elements of the i_th row satisfy the progression with increment di so that the elements of the i_th row satisfy X[i, j+1] = X[i, j] + d[i]  for j=1,2,...,n. Let d be the vector of differences. Then
X2 = X1 + d
which you can solve for d to obtain
d = X2 – X1.
By substitution,
X3 = X2 + d = X2 + (X2 – X1), or
X3 = 2*X2 – X1.
Therefore, the third column is always a linear combination of the first two columns, and the coefficients are independent of the differences in the arithmetic progressions of the rows. It doesn't matter what increments are used in the arithmetic progressions of the rows.

In fact, even more is true. You can use similar arguments to show that the j_th column can be written as a linear combination of the first two columns for any j > 2. Thus, regardless of the matrix size, matrices whose rows are arithmetic progressions are rank-2 matrices: they have only two linearly independent columns.

Why should you care about this?

I decided to write about this topic for three reasons:

  1. It's mathematically interesting, and I never noticed it before.
  2. When I write a program, I often need to quickly create an example matrix to test the program. I often use a matrix such as {1 2 3, 4 5 6, 7 8 9} or SHAPE(1:25,5,5). This article shows that both matrices are singular, which is important to know if the program requires a nonsingular matrix.
  3. Lastly, I wanted to emphasize the difference between a "random matrix" and an "arbitrary matrix." In a previous article, I discussed the probability that a random matrix is singular and concluded that a random matrix is almost always nonsingular. Here, "random matrix" means that the matrix elements are drawn randomly from a continuous probability distribution such as the uniform or normal distribution.

    However, the other day I received a comment on that blog post. Essentially, the comment said, "I wrote down this matrix at random, but it is singular. How do you explain that?" The matrix was similar to the matrices at the top of this article. The answer is that the elements of the matrix were not generated randomly. The rows of the matrix form an arithmetic progression (very NON-random!), which, as we have seen, implies that the matrix is singular.

The post A matrix is singular if its rows are arithmetic sequences appeared first on The DO Loop.

2月 042021
 

Some organizations need advanced analytics that are customized, configured and managed off-site. That’s where the SAS Cloud comes in. Ever wondered what it takes to get a managed application services (MAS) project implemented and supported on an ongoing basis? That’s where Jenny Welsh comes in. She’s the Senior Director of Project Management, Technical Account Management [...]

You’re in safe hands with SAS Cloud was published on SAS Voices by Lindsay Marshall

2月 032021
 

In a previous article, I showed how to generate random points uniformly inside a d-dimensional sphere. In that article, I stated the following fact:

If Y is drawn from the uncorrelated multivariate normal distribution, then S = Y / ||Y|| has the uniform distribution on the unit sphere.

I was reminded of this fact recently when I wrote about how to simulate a random walk in high dimensions. Each step of a random walk needs to generate a random direction and a random step length. For a 2-D random walk, it is easy to generate a random direction as an angle, θ, chosen from the uniform distribution on the interval [0,2π). However, for higher-dimensional random walks, I do not suggest using random angles to determine directions. Although it is mathematically possible to use spherical coordinates in d-dimensions, computations in a spherical coordinate system are extremely messy.

A better alternative is to use random unit vectors to determine a direction for each step in a random walk. (In fact, a unit vector is sometimes called a direction vector.) Since a unit vector is a vector from the origin to a point on the unit sphere, this article shows how to generate random points uniformly on the unit sphere. I show the computation twice: once by using the SAS/IML matrix language and again by using the SAS DATA step. In order to visualize the data, I show how to create a contour plot of ungridded data in SAS.

Random points on a circle

For simplicity, let's see how to generate random points on the unit circle in 2-D. The following SAS/IML program simulates N=1000 points from d=2 independent normal distributions. The notation X[ ,##] is a subscript reduction operator that returns a row vector that contains the sum of the squared elements of each row of X. Thus the expression sqrt(X[,##]) gives the Euclidean distance of each row to the origin. If you divide the coordinates by the distance, you obtain a point on the unit circle:

%let N = 1000;         /* number of steps in random walk */
%let d = 2;            /* S^{d-1} sphere embedded in R^d */
%let r = 1;            /* radius of sphere */
 
proc iml;
call randseed(12345);
r = &r; d = &d;
x = randfun(&N // d, "Normal");   /* N points from MVN(0, I(d)) */
norm = sqrt( x[,##] );            /* Euclidean distance to origin */
x = r * x / norm;                 /* scale point so that distance to origin is r */
title "&n Random Points on Circle";
call scatter(x[,1], x[,2]) grid={x y}
             procopt="aspect=1" option="transparency=0.8";
QUIT;

As promised, the resulting points are on the circle of radius r=1. Recall that uniformly at random does not mean evenly spaced. Point patterns that are generated by a uniform process will typically have gaps and clusters.

Random points on a sphere

You can also use the SAS DATA step to generate the random points on a sphere. You can generate each coordinate independently from a normal distribution and use the EUCLID function in Base SAS to compute the Euclidean distance from a point to the origin. You can then scale the coordinates so that the point is on the sphere of radius r. The following DATA step generates d-dimensional points for any d:

%let N = 2000;         /* number of steps in random walk */
%let d = 3;            /* S^{d-1} sphere embedded in R^d */
%let r = 1;            /* radius of sphere */
data RandSphere;
array x[&d];
call streaminit(12345);
do i = 1 to &N;
   do j = 1 to &d;
      x[j] = rand("Normal");      /* random point from MVN(0, I(d)) */
   end;
   norm = Euclid( of x[*] );      /* Euclidean distance to origin */
   do j = 1 to &d;
      x[j] = &r * x[j] / norm;    /* scale point so that distance to origin is r */
   end;
   output;
end;
drop j norm;
run;

How can we visualize the points to assure ourselves that the program is running correctly? One way is to plot the first two coordinates and use colors to represent the value of the points in the third coordinate direction. For example, the following call to PROC SGPLOT creates a scatter plot of (x1,x2) and uses the COLORRESPONSE=x3 option to color the markers. The default color ramp is the ThreeAltColor ramp, which (for my ODS style) is a blue-black-red color ramp. That means that points near the "south pole" (x3 = -1) will be colored blue, points near the equator will be colored black, and points near the "north pole" (x3 = 1) will be colored red. I have also used the TRANSPARENCY= option so that the density of the points is shown.

title "&n Random Points on a Sphere";
proc sgplot data=RandSphere aspect=1;
   scatter x=x1 y=x2 / colorresponse=x3 transparency=0.5 markerattrs=(symbol=CircleFilled);
   xaxis grid; yaxis grid;
run;

Switching to (x,y,z) instead of (x1,x2,x3) notation, the graph shows that observations near the (x,y)-plane (for which x2 + y2 &asymp 1) have a dark color because the z-value is close 1 zero. In contrast, observations near for which x2 + y2 &asymp 0 have either a pure blue or pure red color since those observations are near one of the poles.

A contour plot of ungridded data

Another way to visualize the data would be to construct a contour plot of the points in the upper hemisphere of the sphere. The exact contours of the upper hemisphere are concentric circles (lines of constant latitude) centered at (x,y)=(0,0). For these simulated data, a contour plot should display contours that are approximately concentric circles.

In a previous article, I showed how to use PROC TEMPLATE and PROC SGRENDER in SAS to create a contour plot. However, in that article the data were aligned on an evenly spaced grid in the (x,y) plane. The data in this article have random positions. Consequently, on the CONTOURPLOTPARM statement in PROC TEMPLATE, you must use the GRIDDED=FALSE option so that the plot interpolates the data onto a rectangular grid. The following Graph Template Language (GTL) defines a contour plot for irregularly spaced data and creates the contour plot for the random point on the upper hemisphere:

/* Set GRIDDED=FALSE if the points are not arranged on a regular grid */
proc template;
define statgraph ContourPlotParmNoGrid;
dynamic _X _Y _Z _TITLE;
begingraph;
   entrytitle _TITLE;
   layout overlay;
      contourplotparm x=_X y=_Y z=_Z / GRIDDED=FALSE /* for irregular data */
        contourtype=fill nhint=12  colormodel=twocolorramp name="Contour";
      continuouslegend "Contour" / title=_Z;
   endlayout;
endgraph;
end;
run;
 
ods graphics / width=480px height=480px;
proc sgrender data=RandSphere template=ContourPlotParmNoGrid;
where x3 >= 0;
dynamic _TITLE="Contour Plot of &n Random Points on Upper Hemisphere"
        _X="x1" _Y="x2" _Z="x3";
run;

The data are first interpolated onto an evenly spaced grid and then the contours are estimated. The contours are approximately concentric circles. From the contour plot, you can conclude that the data form a dome or hemisphere above the (x,y) plane.

Summary

This article shows how to use SAS to generate random points on the sphere in any dimension. You can use this technique to generate random directions for a random walk. In addition, the article shows how to use the GRIDDED=FALSE option on the CONTOURPLOTPARM statement in GTL to create a contour plot from irregularly spaced ("ungridded") data in SAS.

The post Generate random points on a sphere appeared first on The DO Loop.

2月 012021
 

I previously blogged about the impact drafting offensive linemen has on winning and making the playoffs. Since I wrote that, we have new data points to add to the analysis and an exciting finale to this NFL season. In this blog post, I’ll explore a few more arguments both in [...]

Should teams draft more offensive linemen? was published on SAS Voices by Frank Silva

2月 012021
 

Do you want to spend less time on the tedious task of preparing your data? I want to tell you about a magical and revolutionary SAS macro called %TK_codebook. Not only does this macro create an amazing codebook showcasing your data, it also automatically performs quality control checks on each variable. You will easily uncover potential problems lurking in your data including variables that have:

  • Incomplete formats
  • Out of range values
  • No variation in response values
  • Variables missing an assigned user-defined format
  • Variables that are missing labels

All you need is a SAS data set with labels and formats assigned to each variable and the accompanying format catalogue. Not only will this macro change the way you clean and prepare your data, but it also gives you an effortless way to evaluate the quality of data you obtain from others before you start your analysis. Look how easy it is to create a codebook if you have a SAS data set with labels and formats:

title height=12pt 'Master Codebook for Study A Preliminary Data';
title2 height=10pt 'Simulated Data for Participants in a Health Study';
title3 height=10pt 'Data simulated to include anomalies illustrating the power of %TK_codebook';
 
libname library "/Data_Detective/Formats/Blog_1_Codebooks";
 
%TK_codebook(lib=work,
	file1=STUDYA_PRELIM,
	fmtlib=LIBRARY,
	cb_type=RTF,
	cb_file=/Data_Detective/Book/Blog/SAS_Programs/My_Codebook.rtf,
	var_order=INTERNAL,
	organization = One record per CASEID,
	include_warn=YES;
run;

Six steps create your codebook

After creating titles for your codebook, this simple program provides %TK_codebook with the following instructions:

  1. Create a codebook for SAS data set STUDYA_PRELIM located in the temporary Work library automatically defined by SAS
  2. Find the formats assigned to the STUDYA_PRELIM in a format catalogue located in the folder assigned to the libref LIBRARY
  3. Write your codebook in a file named /Data_Detective/Book/Blog/SAS_Programs/My_Codebook.rtf
  4. List variables in the codebook by their INTERNAL order (order stored in the data set)
  5. Add “One record per CASEID” indicating which variable(s) uniquely identify each observation to codebook header
  6. Include reports identifying potential problems in the data

Just these few lines of code will create the unbelievably useful codebook shown below.

The data set used has many problems that can interfere with analysis. %TK_codebook creates reports showing a concise summary of only those problem variables needing close examination. These reports save you an incredible amount of time.

Using assigned formats, %TK_codebook identifies unexpected values occurring in each variable and provides a summary in the first two reports.

Values occurring outside those defined by the assigned format indicate two possible problems:

  1. A value was omitted from the format definition (Report 1 – Incomplete formats)
  2. The variable has unexpected values needing mitigation before the data is analyzed (Report 2 – Out of Range Value)

The next report lists numeric variables that have no variation in their values.

These variables need examining to uncover problems with preparing the data set.

The next two reports warn you about variables missing an assigned user-defined format. These variables will be excluded from screening for out-of-range values and incomplete format definitions.

The last report informs you about variables that are missing a label or have a label that matches the variable name.

It is easy to use %TK_codebook to resolve problems in your data and create an awesome codebook. Instead of spending your time preparing your data, you will be using your data to change the world!

Create your codebook

Download %TK_codebook from my author page, then learn to use it from my new book, The Data Detective’s Toolkit: Cutting-Edge Techniques and SAS Macros to Clean, Prepare, and Manage Data.

THE DATA DETECTIVE'S TOOLKIT | BUY IT NOW

Creating codebooks with SAS macros was published on SAS Users.

2月 012021
 

Imagine an animal that is searching for food in a vast environment where food is scarce. If no prey is nearby, the animal's senses (such as smell and sight) are useless. In that case, a reasonable search strategy is a random walk. The animal can choose a random direction, walk/swim/fly in that direction for a while, and hope that it encounters food. If not, it can choose another random direction and try again.

According to a story in Quanta magazine, an optimal search strategy is to use a variant of a random walk that is known as a Lévy flight. This article describes a Lévy flight, simulates it in SAS, and compares it to Gaussian random walk.

Random walks

For the purpose of this article, a 2-D random walk is defined as follows. From the current position,

  • Choose a random direction by choosing an angle, θ, uniformly in [0, 2π).
  • Choose a random distance, r.
  • Move that distance in that direction.

The amount of territory that gets explored during a random walk depends on the distribution of the distances. For example, if you always choose a unit distance, you obtain the "drunkard's walk." If you choose distances from a normal distribution, you obtain a Gaussian walk. In both scenarios, the animal primarily searches near where it has been previously. These random walks that always use short distances are good for finding food in a food-rich environment.

In a food-scarce environment, it is a waste of time and energy to always move a short distance from the previous position. It turns out that a better strategy is to occasionally move a long distance. That puts the animal in a new location, which it can explore by moving small distances again. According to the Quanta article, this behavior has been observed "to greater or lesser extents" in many animals that hunt at sea, including albatross, sharks, turtles, penguins, and tuna (Sims, et al., Nature, 2008).

Levy walks and the Levy distribution

A popular model for this behavior is the Lévy walk, which is often called a Lévy flight because it was first applied to the flight path of an albatross over open water. In a Lévy flight, the distance is a random draw from a heavy-tailed distribution. In this article, I will simulate a Lévy flight by drawing distances from a Lévy distribution, which is a special case of the inverse gamma distribution.

The Lévy distribution Levy(μ, c) has a location parameter (μ) and a shape parameter (c). It is a special case of the inverse gamma distribution, which is represented by IGamma(α, β). If X ~ Levy(μ, c) is a random variable, then X – μ ~ IGamma(1/2, β/2). In this article, μ=0, so a random draw from the Levy(0,c) distribution is a random draw from IGamma(1/2, c/2) distribution. Thus, you can use the SAS program in the previous article to generate a random sample from the Lévy distribution.

The inverse gamma distribution with α=1/2 has very heavy tails. This means that a random variate can be arbitrarily large. When modeling a physical or biological system, there are often practical bounds on the maximum value of a quantity. For example, if you are modeling the flight of an albatross, you know that there is a maximum distance that an albatross can fly in a given time. Thus, researchers often use a truncated Lévy distribution to model the distances that an animal travels in one unit of time. For example, an albatross can fly up to 800 km in one day (!!), so a model of albatross distances should truncate the distances at 800.

Simulate a Gaussian and Levy Walk

You can use the SAS DATA step to simulate random walks. The program in this section generates a random direction in [0, 2π), then generates a random distance. For the Gaussian walk, the distance is the absolute value of a random N(0, σ) variate. For the Lévy walk, the distances are sampled from the Levy(0, 0.3*σ) distribution. The constant 0.3 is chosen so that the two distance distributions have approximately the same median.

For convenience, you can define a macro that makes it easy to generate random variates from the Lévy distribution. The following program uses the %RandIGamma macro (explained in the previous article), which generates a random variate from the inverse gamma distribution.

The following program simulates four individuals (animals) who start at the corners of a unit square with coordinates (0,0), (1,0), (1,1), and (0,1). At each step, the animals choose a random direction and move a random distance in that direction. In one scenario, the distances generated as the absolute value of a normal distribution. In another scenario, the distances are Levy distributed.

/* useful macros: https://blogs.sas.com/content/iml/2021/01/27/inverse-gamma-distribution-sas.html */
%macro RandIGamma(alpha, beta);          /* IGamma(alpha,beta) */
   (1 / rand('Gamma', &alpha, 1/(&beta)))
%mend;
%macro RandLevy(mu, scale);              /* Levy(mu,c) */
   (&mu + %RandIGamma(0.5, (&scale)/2));
%mend;
%macro RandTruncLevy(mu, scale, max);    /* Truncated Levy */
   min((&mu + %RandIGamma(0.5, (&scale)/2)), &max);
%mend;
 
%let N = 1000;         /* number of steps in random walk */
%let sigma = 0.1;      /* scale parameter for N(0,sigma) */
%let c = (0.3*&sigma); /* makes medians similar */
 
data Walks;
array x0[4] _temporary_ (0 1 1 0);  /* initial positions */
array y0[4] _temporary_ (0 0 1 1);
call streaminit(12345);
twopi = 2*constant('pi');
c = &c;                    /* Levy parameter */
do subject = 1 to dim(x0);
   t=0;                    /* initial position for subject */
   x = x0[subject];   y = y0[subject]; 
   r=.; theta=.;           /* r = random distance; theta=random angle */
   Walk = "Gaussian";  output;
   Walk = "Levy";      output;
   xN = x; yN = y;         /* (xN,yN) are coords for Gaussian walk */
   xL = x; yL = y;         /* (xL,yL) are coords for Levy walk */
   do t = 1 to &N;         /* perform steps of a random walk */
      theta = rand("Uniform", 0, twopi);  /* random angle */
      ux = cos(theta); uy = sin(theta);   /* u=(ux,uy) is random unit vector */
      /* take a Gaussian step; remember location */
      Walk = "Gaussian";
      r = abs( rand("Normal", 0, &sigma) ); /* ~ abs(N(0,sigma)) */
      xN = xN + r*ux;   yN = yN + r*uy;
      /* output the Gaussian walk step */
      x = xN; y = yN; output;
 
      /* take a (truncated) Levy step; remember location */
      Walk = "Levy";
      r = %RandTruncLevy(0, c, 2);  /* Levy(0,c) truncated into [0,2] */
      xL = xL + r*ux;   yL = yL + r*uy;
      /* output the Levy walk step */
      x = xL; y = yL; output;
   end;
end;
drop twopi c xN yN xL yL ux uy;
run;

Let's compare the step sizes for the two types of random walks. The following call to PROC SGPANEL creates histograms of the step lengths for each type of random walk:

title "Distribution of Step Lengths";
proc sgpanel data=Walks;
   label r = "Step Length";
   panelby Walk;
   histogram r / binwidth=0.1 binstart=0.05;
run;

The graph shows that all step lengths for the Gaussian random walk are less than 0.4, which represents four standard deviations for the normal distribution. In contrast, the distribution of distances for the Lévy walk has a long tail. About 15% of the distances are greater than 0.75. About 9% of the step lengths are the maximum possible value, which is 2.

How does that difference affect the amount of territory covered by an animal that is foraging for food? The following statements show the paths of the four animals for each type of random walk:

title "Position of Subjects";
title2 "50 Steps of a Random Walk";
proc sgpanel data=Walks aspect=1;
   where t <= 50;
   panelby Walk / onepanel;
   series x=x y=y / group=Subject;
   colaxis grid; rowaxis grid;
run;

The differences in the paths are evident. For the Gaussian random walk, the simulated animals never leave a small neighborhood around their starting positions. After 50 steps, none of the animals who are taking a Gaussian walk have moved more than one unit away from their starting position. In contrast, the animals who take Lévy walk have explored a much greater region. You can see that the typical motion is several small steps followed by a larger step that brings the animal into a new region to explore.

Generalizing to higher dimensions

It is easy to extend the random walk to higher dimensions. The trick is to represent the "random direction" by a unit vector (u) instead of by an angle (θ). In a previous article about random point within a d-dimensional ball, I remark that "If Y is drawn from the uncorrelated multivariate normal distribution, then u = Y / ||Y|| has the uniform distribution on the unit d-sphere." You can use this trick to generate random unit vectors, then step a distance r in that direction, where r is from a specified distribution of distances.

Summary

An article in Quanta magazine describes how some animals use a swim/flight path that can be modeled by a (truncated) Lévy random walk. This type of movement is a great way to search for food in a vast food-scarce environment. This article defines the Lévy distribution as a sub-family of the inverse gamma distribution. It shows how to generate random variates from the Lévy distribution in SAS. Finally, it simulates a Gaussian random walk and a Lévy walk and visually compares the area explored by each.

The post Gaussian random walks and Levy flights appeared first on The DO Loop.