Data Structures

4月 032017
 

One of the advantages of the new mixed-type tables in SAS/IML 14.2 is the greatly enhanced printing functionality. You can control which rows and columns are printed, specify formats for individual columns, and even use templates to completely customize how tables are printed. Printing a table is accomplished by using the TableCreateFromDataSet function. Finally, the TablePrint subroutine prints a customize portion of the table:

data class;
set sashelp.class;
Birthday = '03APR2017'd - age*365 - floor(365*uniform(1)); /* create birthday */
format Birthday DATE9.;
run;
 
proc iml;
tClass = TableCreateFromDataSet("Class");    /* read data into table */
run TablePrint(tClass) firstobs=3 numobs=5 
                       var={"Age" "Birthday"} 
                       ID="Name"
                       label="Subset of Class";
Basic printing of SAS/IML Tables

Notice that the table contains both numeric and character columns. Furthermore, the numeric columns have different formats. The TablePrint subroutine has some distinct advantages over the traditional PRINT statement in SAS/IML:

  • The TablePrint subroutine supports an easy way to display a range of observations. When you use the PRINT statement for multiple vectors, you have to use row subscripts in each vector, such as PRINT (X[3:8,]) (Y[3:8,]);
  • The TablePrint subroutine supports printing any columns in any order. When you use the PRINT statement on a matrix, you have to use column subscripts to change the order of the matrix columns: PRINT (X[, {2 3 1}]);
  • The PRINT statement supports the ROWNAME= option (for specifying row headers), the COLNAME= option (for specifying column headers), and the LABEL= option. Those options are easy to work with when you print a single matrix. However, you can't store mixed-type data in a matrix and those options are less convenient when you print a set of vectors.

Advanced printing of SAS/IML tables

Trafic lighting: Color cells in SAS/IML tables by cell contents

The SAS/IML documentation has several sections of documentation devoted to

For statistical programmers, the ability to use ODS templates means that output from PROC IML can look the same as output from some other SAS procedue. For example, suppose that you have a table that contains parameter estimates for a linear regression. The following example prints that table by using the same ODS template that PROC REG uses, which is the Stat.Reg.ParameterEstimates template:

proc iml;
vars = {"Intercept", X, Z};
stats = {32.19   5.08   21.42   42.97,
          0.138  0.0348  0.0644  0.2117, 
          1.227  0.5302  0.1027  2.3506 }; 
tbl = TableCreate("Variable", vars);
call TableAddVar(tbl, {"Estimate" "StdErr" "LowerCL" "UpperCL"},  stats);
 
Confidence=95;
call TablePrint(tbl) template="Stat.Reg.ParameterEstimates"
                     dynamic={Confidence};
Print SAS/IML tables by using existing ODS templates

This example works because the column names in the SAS/IML table match the names that are expected by the Stat.REG.ParameterEstimates template. The DYNAMIC= option specifies a dynamic variable (Confidence) that the template requires. See the documentation for further details.

Summary

In summary, the TablePrint subroutine in SAS/IML gives programmers control over many options for printing tables of data and statistics. For complex layouts, you can use an existing ODS template or create your own template to customize virtually every aspect of your tabular displays.

The post Print tables in SAS/IML appeared first on The DO Loop.

3月 292017
 

Lists are collections of objects. SAS/IML 14.2 supports lists as a way to store matrices, data tables, and other lists in a single object that you can pass to functions. SAS/IML lists automatically grow if you add new items to them and shrink if you remove items. You can also modify existing items. You can use indices to refer to items of a list, or you can assign names to the items to create a named list.

Create a list

You can create a list by using the ListSetItem subroutine to assign a value to each item:

proc iml;
L = ListCreate(2);                  /* L is two-item list */
call ListSetItem(L, 1, 1:3);        /* 1st item is 1x3 vector */
call ListSetItem(L, 2, {4 3, 2 1}); /* 2nd item is 2x2 matrix */

The arguments for the ListSetItem subroutine are list, index, and value, which is the same order that you use when you assign a value to a matrix element, such as A[2] = 5.

If you do not know how many items the list will contain, or if you later want to add additional items, you can use

X = {3 1 4, 2 5 3};                 /* create a 2x3 matrix */
call ListAddItem(L, X);             /* add 3rd item; list grows */

Notice that the syntax for the ListAddItem subroutine only requires the list and the value because the new item is always added to the end of the list. At this point, the list contains three items, as shown below:

Items in a SAS/IML list can be different shapes and sizes

Insert, modify, and delete list items

A convenient feature of SAS/IML lists is that they automatically resize. You can insert new items into any position in the list. You can also replace an existing item or delete an item. The following statements demonstrate modifying an existing list.

call ListInsertItem(L, 2, -1:1);    /* insert new item before item 2; list grows */
call ListSetItem(L, 1, {A B, C D}); /* replace 1st item with 2x2 character matrix */
call ListDeleteItem(L, 3);          /* delete 3rd item; list shrinks */

The arguments for k, existing higher-indexed items are also renumbered. (Conceptually, items "move to the left.") After the previous sequence of operations, the list contains the following items:

You can insert, modify, and delete items in a SAS/IML list

Create named lists

In the previous section, the list acted like a dynamic array: it stored items that you could access by using indices such as 1, 2, and 3. For some applications, it makes more sense to name the list items and refer to the items by their names rather than their positions. This is similar to "structs" in some languages, where you can refer to members of a struct by name.

For example, suppose a teacher wants to store information about students in her classes. For each student, she might want to store the student's name, class, and test scores. The following statements create a SAS/IML list that has three items named "Name", "Class", and "Scores". The items are then assigned values:

Student = ListCreate({"Name" "Class" "Scores"});  /* create named list with 3 items */
call ListSetItem(Student, "Name", "Ron");         /* set "name" value for student */
call ListSetItem(Student, "Class", "Statistics"); /* set "class" value  for student */
Tests = {100 97 94 100 100};                      /* test scores */
call ListSetItem(Student, "Scores", Tests);       /* set "scores" value for student */

Although you can still use positional indices to access list items ("Ron" is the first item, "Statistics" is the second item, ...), you can also use the name of items. For example, to extract the test scores into a SAS/IML matrix or vector, you can use

s = ListGetItem(Student, "Scores");               /* get test scores from list */

Lists are for storing items, not for computing. You cannot add, subtract, or multiply lists. However, the example shows that you can extract an item into a matrix and subsequently perform algebraic operations on the matrix.

Lists of lists

Lists can contain sublists. In this way you can represent nested or hierarchical data. For example, the teacher might want to store information about several students. If information about each student is stored in a list, then she can use a list of lists to store information about multiple students.

The following statements create a list called All. The first student ("Ron") is added to the list, which means that the item is a copy of the Student list. Then information about a second student ("Karl") is stored in the Student list and copied into the All list as a second item. This process could continue until all students are stored in the list.

All = ListCreate();                  /* empty list */
call ListAddItem(All, Student);      /* add "Ron" to list */
 
call ListSetItem(Student, "Name", "Karl"); 
call ListSetItem(Student, "Class", "Calculus");
call ListSetItem(Student, "Scores", {90 92 84 70 80});  
call ListAddItem(All, Student);      /* add "Karl" to list */

At this point, the All list contains two items. Each item is a list that contains three items.

Summary

SAS/IML 14.2 supports lists, which are containers that store other objects. Lists dynamically grow or shrink as necessary. You can index items by their position in the list (using indices) or you can create a named list and reference items by their names. You can use built-in functions to extract items a list or add new items to a list. You can insert, delete, and modify list items. You can represent hierarchical data by using lists of lists. For more information about SAS/IML lists, see the chapter in the SAS/IML documentation. The documentation shows how lists can be used to emulate other data structures, such as stacks, queues, and trees.

The post Lists: Nonmatrix data structures in SAS/IML appeared first on The DO Loop.

3月 222017
 

Prior to SAS/IML 14.2, every variable in the Interactive Matrix Language (IML) represented a matrix. That changed when SAS/IML 14.2 introduced two new data structures: data tables and lists. This article gives an overview of data tables. I will blog about lists in a separate article.

A matrix is a rectangular array that contains all numerical or all character values. Numerical matrices are incredibly useful for computations because linear algebra provides a powerful set of tools for implementing analytical algorithms. However, a matrix is somewhat limiting as a data structure. Matrices are two-dimensional, rectangular, and cannot contain mixed-type data (numeric AND character). Consequently, you can't use one single matrix to pass numeric and character data to a function.

Data tables in SAS/IML are in-memory versions of a data set. They contain columns that can be numeric or character, as well as column attributes such as names, formats, and labels. The data table is associated with a single symbol and can be passed to modules or returned from a module. the TableCreateFromDataSet function, as shown:

proc iml;
tClass = TableCreateFromDataSet("Sashelp", "Class"); /* SAS/IML 14.2 */

The function reads the data from the Sashelp.Class data set and creates an in-memory copy. You can use the tClass symbol to access properties of the table. For example, if you want to obtain the names of the columns in the table, you can use

varNames = TableGetVarName(tClass);
print varNames;
Column names for a data table in SAS/IML

Extracting columns and adding new columns

Data tables are not matrices. You cannot add, subtract, or multiply with tables. When you want to compute something, you need to extract the data into matrices. For example, if you want to compute the body-mass index (BMI) of the students in Sashelp.Class, you can use the TableAddVar function to add the BMI as a new column in the table:

Y = TableGetVarData(tClass, {"Weight" "Height"});
wt = Y[,1]; ht = Y[,2];                /* get Height and Weight variables */
BMI = wt / ht##2 * 703;                /* BMI formula */
call TableAddVar(tClass, "BMI", BMI);  /* add new "BMI" column to table */

Passing data tables to modules

As indicated earlier, you can use data tables to pass mixed-type data into a user-defined function. For example, the following statements define a module whose argument is a data table. The module prints the mean value of the numeric columns in the table, and it prints the number of unique levels for character columns. To do so, it first extracts the numeric data into a matrix, then later extracts the character data into a matrix.

start QuickSummary(tbl);
   type = TableIsVarNumeric(tbl);      /* 0/1 vector   */
   /* for numeric columns, print mean */
   idx = loc(type=1);                  /* numeric cols */
   if ncol(idx)>0 then do;             /* there is a numeric col */
      varNames = TableGetVarName(tbl, idx);         /* get names */
      m = TableGetVarData(tbl, idx);   /* extract numeric data   */
      mean = mean(m);
      print mean[colname=varNames L="Mean of Numeric Variables"];
   end;
   /* for character columns, print number of levels */
   idx = loc(type=0);                  /* character cols */
   if ncol(idx)>0 then do;             /* there is a character col */
      varNames = TableGetVarName(tbl, idx);           /* get names */
      m = TableGetVarData(tbl, idx);   /* extract character data   */
      levels = countunique(m, "col");
      print levels[colname=varNames L="Levels of Character Variables"];
   end;
finish;
 
run QuickSummary(tClass);
Pass data tables to SAS/IML functions and modules

Summary

SAS/IML 14.2 supports data tables, which are rectangular arrays of mixed-type data. You can use built-in functions to extract columns from a table, add columns to a table, and query the table for attributes of columns. For more information about data tables, see the chapter

The post Data tables: Nonmatrix data structures in SAS/IML appeared first on The DO Loop.