sas books

2月 232023
 

As you know, almost every SAS programming problem has many very different solutions. I’m going to solve a very simple problem using two different approaches.

The problem: Compute the sum of integers from 1 to 1,000,000.

I bet most of you thought of a solution almost immediately. Let me guess that you thought of one of the solutions shown below:

options fullstimer; ❶
 
Solution 1
data _null_; ❷
   do i = 1 to 1000000;
      Sum + i; ❸
   end;
   file print; ❹
   put Sum=;
run;

The option fullstimer will give you more complete timing information.

❷ To be more efficient use data _null_.

❸ The SUM statement does several things: 1) the variable Sum is retained (not set back to missing) for each iteration of the DATA Step. 2) Sum it initialized at zero. 3) If you had an expression instead of the constant (1) missing values would be ignored.

❹ Use FILE PRINT to send the output to the output window instead of the default location, the LOG.

Solution 2

data Integers; ❶
   do i = 1 to 10000000;
      output; ❷
   end;
run;
 
title "Sum of Integers";
proc means data=Integers n sum; ❸
   var i;
run;

Create a data set called Integers.

❷ Output an observation for each iteration of the DATA Step.  Note that the OUTPUT statement is inside the DO Loop.

❸ Use PROC MEANS to compute the sum.

Although both programs work, there is a difference in CPU time. Does that mean you should always seek a DATA Step solution? Not really. It depends on several factors, such as how often the program is to be run and which method you feel most comfortable with.

Here is a partial listing of the SAS Log showing timing information:

NOTE: 1 lines were written to file PRINT.
NOTE: DATA statement used (Total process time):
      real time           1.00 seconds
      user cpu time       0.21 seconds
      system cpu time     0.32 seconds
      memory              7875.03k
      OS Memory           16876.00k
      Timestamp           02/16/2023 08:23:18 AM
      Step Count           1  Switch Count  0

NOTE: The data set WORK.INTEGERS has 10000000 observations and 1
      variables.
NOTE: DATA statement used (Total process time):
      real time           0.21 seconds
      user cpu time       0.20 seconds
      system cpu time     0.00 seconds
      memory              410.03k
      OS Memory           17392.00k
      Timestamp           02/16/2023 08:23:18 AM
      Step Count           2  Switch Count  0
NOTE: There were 10000000 observations read from the data set
      WORK.INTEGERS.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.25 seconds
      user cpu time       1.07 seconds
      system cpu time     0.03 seconds
      memory              8471.84k
      OS Memory           25116.00k
      Timestamp           02/16/2023 08:23:19 AM
      Step Count           3  Switch Count  0

Do you care about CPU time? Unless this is a production program, I think you should program in a way that is most comfortable (unless you are a compulsive programmer and want the “best” program). By the way, if you remove the FILE PRINT statement from solution 1, the System CPU time is 0.0.  I guess there is some overhead to sending the results to your output device.

I’m interested in what your first instinct was when you read the problem. One of these two, or something else. Please post your comments below.

LEARN MORE | Ron Cody's books on Amazon

DATA Step or PROC? It depends... was published on SAS Users.

12月 142022
 

If you have ever needed to score a multiple-choice test, this blog is for you. Even if you are not planning to score a test, the techniques used in this example are useful for many other programming tasks. The example I am going to present assumes the answer key and student answers are in a single file. A sample file for scoring is shown below:

Sample_Test.txt

Student IDs are in columns 1–9. (The answer key uses all zeros, but the program would work equally well if you left these columns blank.) The student answers are in columns 11–20. The "trick" is to read the answer key and place the correct answers in the variables Key1–Key10.  You can do this by writing a conditional INPUT statement that uses the internal SAS variable _N_ to let you know when you are reading the first observation. Let's take a look at the program and annotate it.

data Score;
   infile 'c:\books\Test Soring\Sample_Test.txt' pad; ❶
   array Ans[10] $ 1 Ans1-Ans10;   ***student answers;array Key[10] $ 1 Key1-Key10;   ***answer key; 
   array Score[10] Score1-Score10; ***score array 
                                      1=right,0=wrong;
   retain Key1-Key10; ❸
 
   if _n_ = 1 then input @11 (Key1-Key10)($1.); ❹
   input @1  ID $9.
         @11 (Ans1-Ans10)($1.);
 
   do Item=1 to 10;
      Score[Item] = Key[Item] eq Ans[Item]; ❺
   end;
 
   Raw=sum(of Score1-Score10); ❻
   Percent=100*Raw / 10;
   keep Ans1-Ans10 ID Raw Percent;
   label ID      = 'Student ID'
         Raw     = 'Raw Score'
         Percent = 'Percent Score';
run;
 
proc sort data=Score; ❼
   by ID;
run;
 
title "Student Scores";
proc print data=Score label;	 ❽
   id ID;
   var Raw Percent;
run;

❶ The data file is located in the C:\Books folder. The PAD option is useful if there are lines of data that contain fewer than 10 answers—it is an instruction to PAD each line with blanks up to the defined record length.

❷ You set up three arrays: Ans holds the student answers, Key holds the answer key, and Score is a variable that indicates if the answer is correct or not (1=correct, 0=incorrect).

❸ Because you are reading the Key values from raw data, you need to RETAIN these values so that SAS does not set them to a missing value on subsequent iterations of the DATA step.

❹ When _N_ is equal to 1, you read the variables Key1–Key10.

❺ This is an interesting statement. The logical expression Key[Item] eq Ans[Item] will be 1 if the student has the correct answer and 0 otherwise. This value is then assigned to the variable Score[Item].

❻ The raw score (the number of items answered correctly) is the sum of the variables Score1–Score10.

❼ For a more logical report, you sort the observations in ID order.

❽ You use PROC PRINT (or PROC REPORT if you prefer) to list the data. The PROC PRINT option LABEL causes the procedure to use variable labels (if they have labels—and they do) instead of the default use of variable names as column headings. By the way, the default behavior for PROC REPORT is to use variable labels.

Below is the output from this program.

If you would like to know more about test scoring, including how to score tests with multiple versions or how to perform item analysis, please see my book Test Scoring and Item Analysis Using SAS (used copies are available on Amazon). As usual, comments or corrections are welcome. You may email me directly at Ron.Cody@gmail.com.

Using SAS to score a test was published on SAS Users.

11月 012022
 

SAS SQL handles missing values differently than the ANSI standard for SQL. PROC SQL follows the SAS convention for handling missing values: numerical missing values are always interpreted as less or smaller than all nonmissing values. My first blog showed that missing values can be troublemakers in non-grouped descriptive statistics. This blog now focuses on how to handle tables if they contain missing values.

In a single table

Missing values can lead to unexpected results while working with only one table, for example, in WHERE conditions, when sorting data using ORDER BY, and when grouping data using GROUP BY. (See the following examples.)

WHERE

If a column contains missing values, a WHERE condition can lead to undesirable results under certain circumstances. If you want to query all S0666 values smaller than 1.5 in the example code below, then a simple WHERE would also include missing values.

To keep S0666 values less than 1.5, but without missing values, you can extend the WHERE condition by adding IS NOT MISSING

Example: WHERE

proc sql ;
  select * 
  from SASHELP.SYR1001
  where S0666 < 1.5 
            and S0666 is not missing ;
quit ;

ORDER BY

When sorting data in ascending order, PROC SQL places missing values before data of type numeric or string. If that ORDER BY does not contain any other column to sort by, the values in other columns are only sorted if you expressly do so. If your data contain missing values, you may have to add further sorting variables to ORDER BY after the column containing missing values, for example, T and S0502 in the example below.

Example: ORDER BY

proc sql ;
  select * 
  from SASHELP.SYR1001
  order by S0666, T, S0502 ;
quit ;

GROUP BY

Missing values have subtle effects that may lead to fake statistics. The following example demonstrates the effect that missing values have on a seemingly simple calculation of means. To demonstrate, I replaced the missing values in column S0666 with 0s and saved the new column as S0666_0. The means of both S0666 and S0666_0 are calculated based on the sum of all available values divided by the number of nonmissing values data rows.

Example: GROUP BY

proc sql ;
       select T, S0666, 
	       avg(S0666) as S0666_MEAN,  
                case
		    when S0666 is missing then 0
                         else S0666 
                  end as S0666_0, 
          avg(calculated S0666_0) as S0666_0_MEAN 
	from SASHELP.SYR1001 ;
quit ;

Output

                   S0666_              S0666_0_
 T      S0666       mean     S0666_0       mean       

 52    1.52000    1.75444    1.52000    0.90229
 53    1.84000    1.75444    1.84000    0.90229
 54    1.96000    1.75444    1.96000    0.90229
 55     .         1.75444    0.00000    0.90229
 56     .         1.75444    0.00000    0.90229
 57     .         1.75444    0.00000    0.90229
 58     .          ...

Because the denominator for S0666_0_mean is larger, the resulting mean is (not surprisingly) lower in the output. However, S0666_mean also returns means for rows that did not contribute values to its calculation (only for the denominator, if you will).

When joining tables

If you are about to join tables, you need to check whether the keys of the tables to be joined contain missing values. And if they contain missing values, you need to check if they are of the same type. SAS defines missing values in different ways. Unfortunately, a numerical column may contain different types of missing values at the same time. Missing numerical values and missing strings are handled completely differently.

Even if your data contain only one type of numerical or string missing value, you may be in for a big surprise. In the following example, the data sets ONE and TWO are identical except for the names of the columns in TWO. Each second and third row have missing values in the key; the following example highlights its effect.

 

Data ONE Data TWO
data ONE ;
   input ID A B C ;
datalines;
01 1   2  3
.  1   2  3
.  99 99 99
05 1   2  3
;
run ;
data TWO ;
   input ID D E F;
datalines;
01 1   2  3
.  1   2  3
.  99 99 99
05 1   2  3
;
run ;

 

Example: Join

proc sql ; 
create table FJ as select
a.ID, A,B,C,D,E,F
from ONE a full join TWO b
on a.ID=b.ID
order by ID ;
quit ;

 

Output

ID     A     B     C     D     E     F
 .     1     2     3    99    99    99
 .    99    99    99     1     2     3
 .    99    99    99    99    99    99
 .     1     2     3     1     2     3
 1     1     2     3     1     2     3
 5     1     2     3     1     2     3

The missing values in the key of ONE and TWO generate a Cartesian product. Each missing value of ONE is combined with each missing value of TWO (the same effect applies to non-missing values as well). With 2 times 2 missing values, two missing values are enough to multiply the associated data rows. Please note also the limited effect of ORDER BY in the presence of missing values. Beware if you may have more than one missing value in keys and are about to join one or more tables.

Have you had unexpected results from missing values in your data when using PROC SQL? You can learn more about effects caused by missing values in Chapter 2 of my book Advanced SQL with SAS.

Unexpected results from missing values with PROC SQL was published on SAS Users.

10月 032022
 

We publish a lot of books by SAS experts at SAS Press, but how does someone become an expert in the first place? Becoming certified is one step, but who develops the certifications in the first place? Those are the true experts. They have to have a deep understanding of SAS, an ability to explain difficult concepts, and a passion for sharing their knowledge. Our SAS Press Author of the Month for October is Jane Eslinger, also known around the office as the PROC REPORT expert.

Jane has written two books on PROC REPORT that are the go-to reference for anyone who wants to understand all of the capabilities of the REPORT procedure. I asked her a few questions about how she became the PROC REPORT expert and what advice she has for SAS novices.

Q: What made you decide to write a book about PROC REPORT?

At the time I was in SAS Technical Support and answering customer questions on PROC REPORT. I tell people that PROC REPORT has a large learning curve. I knew PROC REPORT but discovered that information about it was spread across a wide variety of places – SAS notes, conference papers, and SAS Communities posts. I decided that I wanted to put all of the information in one place, so I wrote the first book.

Q: Have you noticed any new trends in reporting in the last few years?

Good question. I would say that programmers are more interested in making drill-down or cascading reports. People want to click from one report to the next, more refined report.

Q: What do you think is the most misunderstood feature of PROC REPORT?

Two things, actually. First, is the compute block. Programmers know that they can assign or change values within the compute block. They also know it can be used to affect styling. But it takes practice to get all of it right. And lots of programmers use compute blocks for the bare essentials but it can do so much more.

Second, is the COLUMN statement. Programmers want to control what is in the header section of the report but do not understand how do use spanning headers and null headers to get what they want.

Q: How did you first learn SAS?

I had to use it in my regression statistics course in college, as well as take a basic programming class for my degree.

Q: As an instructor, how do you encourage students who get frustrated when learning SAS?

I tell them that they are learning a new language or a new skill. You aren’t going to get it right the first time! I also tell them that programmers with 20 years of experience still have to look up syntax.

Q: What is your best advice for someone who wants to learn SAS?

Practice. But really the best way to practice is to have a project (school or at work) that requires you to use SAS. Learning SAS is much easier when you can connect programming concepts to your own project instead of arbitrary data.

Q: Just for fun, what is the last non-technical book you read that you enjoyed?

Wish You Were Here by Jodi Picoult.


Thanks for talking to me, Jane! Visit Jane’s author page to find an exclusive discount code that you can use to purchase her e-books in the month of October.

You can also find her at SESUG 2022 where she will be giving a hands-on training session on how to create PROC REPORT code for multiple report types. And check out Jane's "Ask the Expert" webinars anytime to learn more about PROC REPORT:

 

Meet our SAS Press Author of the Month -- Jane Eslinger was published on SAS Users.

8月 012022
 

Recently, the SAS Press team moved to a new building on the SAS campus. And when the SAS Press team moves, we bring a lot of books with us! Packing and organizing all of our books gave us a chance to appreciate all of our authors’ hard work during the more than 30 years that SAS Press has existed.

One author has an outsized presence on the SAS Press bookshelves – Ron Cody. He has written over a dozen books that include some of the most popular titles for new SAS users. He taught statistics at the Rutgers Robert Wood Johnson Medical School for many years and is a frequent presenter at SAS conferences.

Few people know more about SAS than Ron, which made him the perfect person to be our first SAS Press Author of the Month. During August, you can get special deals on all of Ron's books at RedShelf. Find out more information on his author page. Ron is also hosting a free webinar on data cleaning tips and tricks on August 11th.

We recently asked Ron to share a little bit about his journey as an author and teacher. As you might imagine, Ron has a lot of SAS knowledge and advice to share.

Ron Cody's books on a bookshelf.

The Cody section of the SAS Press bookshelf.

Q: When did you decide to write your first book?

I decided to write my first SAS book (Applied Statistics and the SAS Programming Language) in 1985. It was published by Elsevier Publishing, which was later bought by Prentice Hall. At the time I was writing the book, there were no other books out there—just SAS manuals. This book is still in print (in a fifth edition).

Q: What made you decide to keep writing more books about SAS?

Once I realized I could write well enough to get published, I got the "writer's bug." Although writing a book is hard work, the reward is substantial. My motivation is more about teaching people about SAS and statistics rather than the monetary reward. My goal in writing any book is to make enough money to take my wife to dinner!

Q: Is one of your books your favorite? Or do you love them all equally?

I do have some favorites. I would put Learning SAS by Example as one. It contains a section for beginners as well as later chapters that are useful to intermediate, or even advanced SAS programmers. I particularly like my latest book, A Gentle Introduction to Statistics Using SAS Studio in the Cloud. I think I made statistical concepts accessible to non-mathematically minded people. There are only a few equations in the entire book.

Q: As a teacher, how did you encourage students who were having a hard time understanding statistics?

I try really hard to convince them that the statistical concepts really make sense, and it is sometimes the terminology that gets in the way.

Q: What do you think students struggle with the most when they are learning SAS?

I believe the overwhelming richness of SAS can intimidate a beginning programmer. That's why I start from simple examples and explain step-by-step how each program works.

Q: What is your best advice for someone who wants to learn SAS?

Buy all of my books, of course! Just kidding. There are many YouTube videos and other online resources that are useful. I think two of the best books would be The Little SAS Book, and either Learning SAS by Example (for the more serious student) or Getting Started with SAS Programming Using SAS Studio in the Cloud. The latter book is more suited to someone using SAS OnDemand for Academics.

Q: You recently published a memoir about your time as an EMT. How did it feel to reflect back on that time of your life? Any more memoirs in your future?

I thoroughly enjoyed writing a book about my years as a volunteer EMT (10-8 Awaiting Crew: Memories of a Volunteer EMT). I was fortunate that I had kept a journal and recorded details of some of the more interesting and exciting calls. As of now, I do not have another memoir in my future, but I am working on a nonfiction novel. I'm not sure how successful it will be, but I'm going to give it a try. I strive to write almost every day. I tell other beginning authors, that if you spend a few hours writing every day, it becomes much easier. I call it "getting in the groove."


Ron Cody sitting in a chair with a cat in his lap. Ron is wearing a shirt that says "using Statistics to prove a point is just MEAN."

Ron and Dudley. Credit: Jan Cody

Thanks for talking to us, Ron! We hope you will join us in celebrating Ron's accomplishments as our Author of the Month. Check out his author page to see special deals on all of Ron’s books during August and to discover more resources.

 

Meet our SAS Press Author of the Month – Ron Cody was published on SAS Users.

7月 122022
 

SQL (Structured Query Language) is the most widely used programming language for relational databases worldwide. No other programming language produces more hits for a web search than SQL and interest is growing rapidly. In June 2010, Google showed 135 million hits versus 586 million hits in June 2020.

SQL is the programming language to handle big data. Within a very short time after its invention, SQL developed into an industry quasi-standard. The reasons behind this rapid development include the fast spread of databases, proven performance, popularity among users, and the growing relevance of analysts and programmers. Most analyst positions expect at least knowledge of basic SQL capabilities.

For analysts and programmers, SQL is attractive because once acquired, SQL skills can be applied in basically all environments from finance to healthcare, and in all systems from open source to commercial. Due to its relative independence from manufacturers (an ANSI standard), what you learn to do in SQL in Oracle can be applied to SQL in SAS.

PROC SQL in SAS is powerful and versatile. It offers myriad possibilities for working with:

  • descriptive statistics
  • advanced macro programming
  • SAS datasets, views, or complex queries
  • special application areas like integrity constraints or performance tuning
  • thousands of SAS functions and SAS function calls

If you start learning PROC SQL, you will also acquire the basics of PROC FEDSQL, PROC DS2, and PROC CAS. And that will offer you a handy toolbox for SAS platforms and applications like SAS 9.4, SAS Viya, SAS Integration Studio, SAS Studio, Enterprise Guide, and many more.

Is that tempting enough to try your hand at PROC SQL? No, you want to see what you get? I will show you examples of four programs that do the same thing using PROC SQL, PROC FEDSQL, PROC DS2, and PROC CAS. I’ll keep it simple just to prove the point. But have no fear, SQL can accomplish very advanced tasks. I’ve been involved in rewriting complex SQL programs that were thousands of lines long.

Let’s use PROC SQL as a springboard. From there, choose where you want to go.

Example 1: PROC SQL

proc sql;
   select REGION, SUBSIDIARY, SALES
   from work.shoes
      where SALES > 750000 ; 
quit;

With PROC FEDSQL, you can start working in the cloud environment of SAS Viya. Please note that PROC FEDSQL is not always 1:1 to PROC SQL as it may appear from this example.

Example 2: PROC FEDSQL

proc fedsql;
   select REGION, SUBSIDIARY, SALES
   from work.shoes
      where SALES > 750000 ; 
quit;

DS2 allows you to speed up processing by using its built-in multi-threading capabilities.

Example 3: PROC DS2

proc DS2 ; 
data LEFT_RIGHT4 (overwrite=yes) ; 
method run(); 
set {select LEFT.ID, 
             LEFT.A, RIGHT.F 
     from work.LEFT, work.RIGHT 
       where LEFT.ID = RIGHT.ID} ; 
output ; 
end ; 
enddata ;
run ;
quit;

PROC CAS enables you to take advantage of SAS Cloud Analytic Services (CAS).

Example 4: PROC CAS

proc CAS; 
session my_CAS_session ; 
  fedsql.execdirect 
  query=
   'select * 
    from 
    CASUSER.CAS_CLASS' ;
Quit ;

Notice that the SQL language elements like the SELECT statement are the same in each example. Once you have learned the basic syntax, you can use it in PROC FEDSQL, PROC DS2, and PROC CAS. And I am pretty sure, there are some more to come.

Why learn SQL? Because it’s a sustainable investment in your future. If you want to learn more about PROC SQL techniques, check out my book Advanced SQL with SAS®.

Why learn SQL? was published on SAS Users.

7月 122022
 

SQL (Structured Query Language) is the most widely used programming language for relational databases worldwide. No other programming language produces more hits for a web search than SQL and interest is growing rapidly. In June 2010, Google showed 135 million hits versus 586 million hits in June 2020.

SQL is the programming language to handle big data. Within a very short time after its invention, SQL developed into an industry quasi-standard. The reasons behind this rapid development include the fast spread of databases, proven performance, popularity among users, and the growing relevance of analysts and programmers. Most analyst positions expect at least knowledge of basic SQL capabilities.

For analysts and programmers, SQL is attractive because once acquired, SQL skills can be applied in basically all environments from finance to healthcare, and in all systems from open source to commercial. Due to its relative independence from manufacturers (an ANSI standard), what you learn to do in SQL in Oracle can be applied to SQL in SAS.

PROC SQL in SAS is powerful and versatile. It offers myriad possibilities for working with:

  • descriptive statistics
  • advanced macro programming
  • SAS datasets, views, or complex queries
  • special application areas like integrity constraints or performance tuning
  • thousands of SAS functions and SAS function calls

If you start learning PROC SQL, you will also acquire the basics of PROC FEDSQL, PROC DS2, and PROC CAS. And that will offer you a handy toolbox for SAS platforms and applications like SAS 9.4, SAS Viya, SAS Integration Studio, SAS Studio, Enterprise Guide, and many more.

Is that tempting enough to try your hand at PROC SQL? No, you want to see what you get? I will show you examples of four programs that do the same thing using PROC SQL, PROC FEDSQL, PROC DS2, and PROC CAS. I’ll keep it simple just to prove the point. But have no fear, SQL can accomplish very advanced tasks. I’ve been involved in rewriting complex SQL programs that were thousands of lines long.

Let’s use PROC SQL as a springboard. From there, choose where you want to go.

Example 1: PROC SQL

proc sql;
   select REGION, SUBSIDIARY, SALES
   from work.shoes
      where SALES > 750000 ; 
quit;

With PROC FEDSQL, you can start working in the cloud environment of SAS Viya. Please note that PROC FEDSQL is not always 1:1 to PROC SQL as it may appear from this example.

Example 2: PROC FEDSQL

proc fedsql;
   select REGION, SUBSIDIARY, SALES
   from work.shoes
      where SALES > 750000 ; 
quit;

DS2 allows you to speed up processing by using its built-in multi-threading capabilities.

Example 3: PROC DS2

proc DS2 ; 
data LEFT_RIGHT4 (overwrite=yes) ; 
method run(); 
set {select LEFT.ID, 
             LEFT.A, RIGHT.F 
     from work.LEFT, work.RIGHT 
       where LEFT.ID = RIGHT.ID} ; 
output ; 
end ; 
enddata ;
run ;
quit;

PROC CAS enables you to take advantage of SAS Cloud Analytic Services (CAS).

Example 4: PROC CAS

proc CAS; 
session my_CAS_session ; 
  fedsql.execdirect 
  query=
   'select * 
    from 
    CASUSER.CAS_CLASS' ;
Quit ;

Notice that the SQL language elements like the SELECT statement are the same in each example. Once you have learned the basic syntax, you can use it in PROC FEDSQL, PROC DS2, and PROC CAS. And I am pretty sure, there are some more to come.

Why learn SQL? Because it’s a sustainable investment in your future. If you want to learn more about PROC SQL techniques, check out my book Advanced SQL with SAS®.

Why learn SQL? was published on SAS Users.

4月 052022
 

Have you ever heard something referred to as the bee’s knees? Most likely the person uttering that expression meant that it was truly amazing and extraordinary. Maybe you stopped and pondered the origin of the phrase. Well wonder no more! In the 1920s in the United States, people were obsessed with rhyming and anthropomorphism (giving human traits or attributes to animals). This era saw the introduction of multiple animal-centric idioms: the ant’s pants, the cat’s pajamas, and (you guessed it) the bee’s knees. Most likely this idiom refers to the pollen basket on the lower section of a worker bee’s legs (where the tibia and the tarsi connect). When bees are out foraging, they carry all the pollen they find in the pollen baskets on their legs. If you ever see a bee with lots of pollen on her legs, you know she’s been working hard… one might even say she’s an amazing and extraordinary worker!

SAS Visual Analytics has a lot of features that you might call the bee’s knees, but one of the most amazing and extraordinary features is the AggregateTable operator. This operator, introduced in SAS Visual Analytics 8.3, enables you to perform aggregations on data crossings that are independent of (or changed from) the data in your objects. This means you can use this operator to compare aggregations for different groups of data in one single object.

To illustrate the AggregateTable operator in action (and to keep with the theme), let’s consider an example.

I’m a hobby beekeeper in Texas. This means that I own and maintain a few hives in my backyard from which I collect honey and wax to sell to family and friends. I’m interested in learning about honey production in the United States for different years. I’m pretty sure I’m not the biggest honey producer in my state (or even my county), but I want to look at different crossings of total production (by state and year, by county and year, and by state).

The United States Department of Agriculture’s National Agricultural Statistics Service has Census data on honey production (measured in pounds) for all counties in the United States for 2002, 2007, 2012, and 2017.

Type of Aggregation: Add

To start, I would like to calculate the total production for each state (Maximum by State) and display the total production for the state that produced the most honey in that year. For example, in 2002 California produced the most honey of any state (15,585,925 pounds) and in 2017 North Dakota produced the most honey of any state (24,296,437 pounds).

Because the table contains details by county and I’m interested in the total production by state, I will either need to create an aggregated data source that contains details by state, or I will need to use the AggregateTable operator. Since this post is about the wonders of the AggregateTable operator, let’s focus on that.

The AggregateTable operator requires five parameters:

Parameter Description
Aggregation- Aggregated The aggregation applied to the aggregated item when it is used in an object that displays fewer group-by crossings than the table in the expression.
Aggregation- Measure The aggregation applied to the measure in the inner table context.
Type of aggregation The type of aggregation that is performed. Values are Fixed, Add, or Remove.
Categories The list of categories used to alter the data crossing for the aggregation.
Measure The measure that is aggregated. A Table operator can be added as the measure to perform a nested aggregation.

It also has a nested operator, Table, that creates an intermediate table defined by the Type of aggregation, Categories, Measure, and Aggregation- Measure parameters.

For this example, I want to use a Key value object to see the total production values for the state that produced the most honey in each year. The object will contain one group-by crossing (or category): Year. The calculation, however, will need to use two group-by crossings to determine the highest producing state for each year: Year and State. Therefore, the Aggregation-Measure is _Sum_ (to calculate the total production by state), the Type of aggregation is Add (because I want to add a crossing for State to the calculation), Categories is set to State, and Measure is Production (lb).

The intermediate table will contain one row for each state and year and contain total production values.

Then, for each year, I want the highest production value (for 2002, 15,585,925 pounds). Therefore, the Aggregation- Aggregated parameter should be _Max_ to grab the maximum values for each year from the intermediate table.

Then, I can display the Maximum by State with Year in a Key value object.

Note: Beginning in SAS Visual Analytics 2021.1.1 (May 2021), a new role is available for the Key value object, Lattice category. This enables you to display a key value object for each distinct value of a category data item (in this example, Year).

Now that I have a data item that contains the production amount for the highest producing state for each year, I can create some more complex calculations, like the percentage of total production for each year by the state that had the highest production. This will enable me to see if the highest producing state is doing all the heavy lifting or if all the states are producing a near-equal amount.

Type of Aggregation: Remove

The Type of aggregation parameter also enables you to remove categories from the calculation. For this example, suppose I want to compare the production in each county to the production from the highest producing county in that state (Maximum by County). I want to use a list table to compare these values.

The object will contain three group-by crossings: Year, State, and County. The calculation, however, will only use two group-by crossings to determine the highest producing county in each state for each year: Year and State. Therefore, the Aggregation-Measure is _Max_ (to calculate the maximum production in each state), the Type of aggregation is Remove (because I want to remove a crossing for County from the calculation), Categories is set to County, and Measure is Production (lb).

The intermediate table will contain one row for each state and year and contain the production values for the county in that state with the highest production. Notice that for this table, the aggregation for Production was set to Maximum to show the maximum production for each state.

Because the number of groupings in the object (County, Year, and State) is not fewer than the number of groupings in the calculation (Year and State), the Aggregation-Aggregated parameter is not applied and can be set to any value.

Then, I can display Maximum by County in a list table with total production by county to easily compare each county’s production with the production of the county in that state that had the highest production.

Now, I can calculate each county’s difference from the county with the highest production in that state for each year.

Type of Aggregation: Fixed

The Type of aggregation parameter also enables you to set a fixed category for the calculation. For this example, suppose I want to compare the production in each county to the total production across all years by state (Total by State). I want to use a list table to compare these values.

The object will contain three group-by crossings: Year, State, and County. The calculation, however, will only use one group-by crossing to determine the total production by state across all years: State. Therefore, the Aggregation-Measure is _Sum_ (to calculate the total production by state across all years), the Type of aggregation is Fixed (because I want to fix the crossing to State for the calculation), Categories is set to State and Measure is Production (lb).

The intermediate table will contain one row for each state and total production values across all years.

Because the number of groupings in the object (County, Year, and State) is not fewer than the number of groupings in the calculation (State), the Aggregation-Aggregated parameter is not applied and can be set to any value.

Then, I can display Total by State in a list table with total production by county to easily compare each county’s production with total production in the state across all years.

I can even compare total state values for each year with total production in that state across all years.

Then, I can calculate the share of total production produced each year.

For more information about how to create advanced data items and filters for your SAS Visual Analytics reports, including examples that use different types of operators, check out my book Interactive Reports in SAS Visual Analytics: Advanced Features and Customization.

Bees are important contributors in pollinating most of the food that you eat and are in desperate need of your help! There are many ways you can help the honeybee population thrive:

  • Become a beekeeper
  • Plant a garden for all pollinators (including bumblebees, butterflies, bats, and moths)
  • Reduce or eliminate pesticide use
  • Support your local beekeepers by buying local honey
  • Contact a beekeeping group if you see a swarm
  • Volunteer as a citizen data scientist by taking pictures and posting information on the iNaturalist app
  • Create a bee bath for those hot summer days
  • Build homes for native bees

 

The AggregateTable operator… It’s the bee’s knees! was published on SAS Users.

2月 172022
 

This blog serves two purposes: the main purpose is to show you some useful SAS coding techniques, and the second is to show you an interesting method of creating a Beale cipher.

TJ Beale is famous in Virginia for leaving behind three ciphers, supposedly describing the location of hidden gold and treasures. (Most cryptologists and historians believe the whole set of ciphers and treasure was a hoax.) In one of the ciphers, he used a method based on the Declaration of Independence. His coding method was as follows:

  • Get a copy of the Declaration of Independence and number each word.
  • Take the first letter of each word and form a list.
  • Associate each number with that letter.

For example, consider this text:

“Four score and seven years ago, our fathers brought forth upon this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal. “

To create a Beale cipher, you would proceed as follows:

Four(1) score(2) and(3) seven(4) years(5) ago(6), our(7) fathers(8) brought(9) forth(10) upon(11) this(12) continent(13) a(14) new(15) nation(16), conceived(17) in(18) liberty(19) and(20) dedicated(21) to(22) the(23) proposition(24) that(25) all(26) men(27) are(28) created(29) equal(30).

Next, you would make a table like this:

Letter Numbers
F 1,8,10 (all the numbers of words that begin with 'F')
S 2,4
A 3,6,14,20,26,28
Y 5
…and so on

 

You would then want to put the list in alphabetical order like this:

A 3,6,14,20,26,28
B 9
C 13,17,29
D 21
E 30
F 1,8,10
…and so on

 

To create your cipher, select any number at random from the list of numbers, corresponding to the letter that you want to encode. The advantage of this over a simple substitution cipher is that you cannot use frequency analysis to guess what letter a particular number represents.

This blog explains how to create a Beale cipher; my next blog will explain how to decipher a Beale cipher.

You need to start out with a book or document that is accessible to the sender and recipient of the cipher. To offer some additional security, you could decide to start from a specific page in a book. For a simple demonstration of how to create a Beale cipher, I have entered part of the Declaration of Independence in a text file called Declare.txt.

A funny aside: I was teaching my Functions course in the UK, in a small town north of London on the Thames. One of the programs demonstrating several SAS character functions was the program I'm using here to demonstrate how to create a Beale cipher. I had completely forgotten that the document was the Declaration of Independence. Whoops! I asked the class, "I hope you're not still angry with us about that." Apparently not, and we all had a good laugh.

Back to the problem. I will break down the program into small steps and provide a partial listing of data sets along the way, so that you can see exactly how the program works. The first step is read the text file, extract the first letter from each word, change the letter to uppercase, and associate each letter with the count of words in the text.

Here is the first part of the program.

data Beale;
   length Letter $ 1;  
   infile 'c:\Books\Blogs\Declare.txt'; 
   input Letter : $upcase1. @@; ❶
   N + 1; ❷	
   output;
run;
 
title "First Step in the Beale Cipher (first 10 observations)";
proc print data=Beale(obs=10) noobs;
run;

❶ By using the $UPCASE1. informat, you are selecting the first letter of each word and converting it to uppercase. If you are unfamiliar with the $UPCASEn. informat, it is similar to the $n. informat with the additional task of converting the character(s) to uppercase.

❷ You use a SUM statement to associate each letter with the word count.

Here is the listing from this first step:

Next, you need to sort the data set by Letter so that all the words that start with As, Bs, and so forth are placed together.

proc sort data=Beale;
   by Letter;
run;
 
title "The list in sorted order (partial listing)";
proc print data=Beale(obs=10) noobs;
run;

Below is a partial listing of the sorted file:

Any of the numbers 24, 25, 27, and so forth can be used to code an 'A'.

The final step is to list all the letters from A to Z (Z is pronounced Zed in the UK and Canada) in a line, followed by all the possible numbers associated with each letter.

data Next;
   length List $ 40; ❸
   retain List; ❹
   set Beale;
   by Letter; ❺
   if first.Letter then List = ' '; ❻
   List = catx(',',List,N); ❼ 
   if last.Letter then output; ❽ 
run;
 
title "List of Beale Substitutions";
proc print data=next(obs=5) noobs;
   var Letter List;
run;

❸ The variable List will hold all the possible numbers that can be used to code any of the letters. In a real program, this list might be longer.

❹ You need to RETAIN this variable; otherwise, it would be set back to a missing value for each iteration of the DATA step.

❺ Following the SET statement with a BY statement creates the two temporary variables, First.Letter and Last.Letter. First.Letter is true when you are reading the first observation for each letter—Last.Letter is true when you are reading the last observation for a letter.

❻ For the first A, B, C, and so on, initialize the variable List to a missing value.

❼ Use the CATX function to concatenate all the numbers, separated by commas.

❽ When you are done reading the last A, B, C, and so on, output the string.

Below are a few lines generated by this program:

For more information about the CATX function and other SAS functions, please take a look at my book, SAS Functions by Example, Second Edition.

Fun with Ciphers (Part 1) was published on SAS Users.

2月 012022
 

When we moved out to the country with our two dogs, our oldest dog Todd suddenly decided he liked to howl…. And he would do so every time we left the house. Maybe it was the country air? Maybe it was a time-lapse gene? Maybe he just wanted to learn something new?

If you’ve been using SAS Visual Analytics for a while, it’s possible you might have never created reports that linked to external URLs. SAS Visual Analytics can do so much on its own, perhaps you never thought about extending its functionality outside the product itself! Well, it’s time to learn a new trick.

To illustrate how this can be done (and to keep with the theme), let’s consider an example.

I’m interested in adding a new member to my family (a dog!), and I know I would like to adopt an animal in need. I’m not sure, however, which breed will suit my lifestyle. I need a dog that’s playful and sweet, but one that also likes to sleep late.

I have a report in SAS Visual Analytics that shows details about animals surrendered at an Austin animal shelter. I’d like to select various characteristics (like animal type, sex, whether the dog is spayed or neutered, and condition) and see what breeds they have available. Then, I’d like to see additional details about each breed at the American Kennel Club website (www.akc.org). On this website, you can find information about various breeds of dogs (and cats!), including average sizes, life expectancy, personality, and many other traits.

I’ll add an interactive link to the report, so when a user selects a specific breed, the page for that breed appears. The interactive link will use parameters to pass a selected value from the report to the web page.

To create interactive links, I like to follow four simple steps:

  1. Research the structure of the URL
  2. Use a hardcoded value to test the link
  3. Parameterize the link
  4. Test the parameterized value

Step 1: Research

Before adding interactive links to a report, you need to understand how the target web page structures the URL. I typically do this by accessing the target web page and searching for a specific subject. For some websites, you might need to view the Developer Guide for the website to fully understand the structure.

Typically, URLs are constructed in one of three ways:

  • Path: In these URLs, the subject is added at the end of the URL. For example, to view a country page on Wikipedia, you use the following URL: https://en.wikipedia.org/wiki/country where country is the full name of the country of interest.
  • Query: In these URLs, the subject is assigned to the value of a URL parameter using a sequence of attribute-value pairs: ?parameter1=value1&parameter2=value2. Multiple parameters can be assigned by separating the attribute-value pairs with an ampersand (&). For example, to search Etsy for a specific type of item, you use the following URL: https://www.etsy.com/search?q=item where item is the specific search string.
  • File: In these URLs, the subject is a part of a file name at the end of the URL. For example, to view a country profile on CIA Factbook, you use the following URL: https://www.cia.gov/library/publications/the-world-factbook/geos/country-code.html where country-code is the 2-letter abbreviation of the country of interest.

To start, I’ll select one of the breeds in the list: Australian Cattle Dog (my current dog’s breed). The American Kennel Club website has a drop-down selector at the top of the page where you can select a breed.

Australian cattle dogs are alert, curious, and pleasant. Tell me about it! He won’t let a leaf fly by outside without raising the alarm.

The URL is constructed with the breed as part of the path: https://www.akc.org/dog-breeds/australian-cattle-dog/. Notice that for breeds with multiple words (like Australian Cattle Dog), the link uses hyphens (-) instead of spaces.

Step 2: Hardcode

Now that you understand the structure of the URL, you can test the link using various hardcoded values. For example, to view details about dachshunds, go to https://www.akc.org/dog-breeds/dachshund/. Dachshunds are friendly, curious, and spunky. They must be! Why else would they be chosen to star in dog races at Oktoberfest celebrations around the world?

Step 3: Parameterize

After you have tested the URL using hardcoded values, you need to replace the hardcoded value with parameters. These are values that will be passed from your report to the external URL to make the links interactive. For the report, I’ll add the link to the word cloud and replace the hardcoded breed with the breed I select in the report.

Because the URL replaces spaces with hyphens, I have created a calculated item in SAS Visual Analytics that has breeds with multiple words separated by slashes instead of spaces, Breed (ForLink).

Because I want to pass Breed (ForLink) from the word cloud to the URL, I need to add it to one of the roles for the word cloud. I don’t want the breeds to appear with hyphens in the word cloud, so I’ll add the calculated item to the Hidden role.

Tip: Data items assigned to the Hidden role are available for color-mapped display rules, external links, and mapping data sources and should only be assigned if it will not increase the number of rows in the query. In this example, the word cloud shows details about breeds. Adding Breed (ForLink) to the Hidden role makes the value available for the external link and does not increase the number of rows in the query.

Then, to add the link:

  1. With the word cloud selected, click Actions in the right pane and expand URL Links.
  2. Click New URL Link.
  3. Specify a descriptive name for the link.
  4. For the URL, enter the URL up to, but not including, the breed (https://www.akc.org/dog-breeds/); this value will be passed from the selected breed in the word cloud.
  5. Next to Parameters, click the Add icon.
  6. For the Source field, select Breed (ForLink) and leave the Target value blank. Adding nothing to the Target field indicates that the value of Breed (ForLink) should be appended to the end of the URL.

When a viewer selects a breed in the word cloud, the breed value will be appended to the end of the URL and details for that breed will be displayed.

Step 4: Test

After the interactive link has been created, you need to ensure that the link works by testing it in the report.

I’ll select both Cat and Dog as the type of animal, Male for sex, Yes for spayed or neutered, and Aged for condition. There are 143 animals in the Austin animal shelter that meet these criteria. I’m thinking a Labrador retriever might be good for my family, so I’ll double-click Labrador Retriever in the word cloud to see the traits and characteristics for that breed. It looks like Labrador retrievers are friendly, active, and outgoing, and they are also highly adaptable (meaning I can train them to sleep late). It sounds like a perfect fit!

For more information about how to add interactive links to your SAS Visual Analytics reports, including examples on creating links with different URL structures, check out my book Interactive Reports in SAS Visual Analytics: Advanced Features and Customization.

You can’t teach an old dog new tricks… or can you? was published on SAS Users.