Data cardinality is an important topic for for data analysis. In mathematical terms, cardinality is simply the number of elements in a set. But in data applications, cardinality signifies the number of unique values for a given field of data. Related terms include number of levels (thank you, PROC FREQ and NLEVELS), percent unique, as well as discrete data -- data with a finite number of different values. (That's not to be confused with discreet data, which is when you're careful not to share too much personal information. For example, we can count the Kardashians in discrete increments. But that doesn't mean that they behave discreetly.)

#### But first, this

Since I've been working with cardinality recently, I've developed this persistent earworm that I just can't shake. I decided to write it down and present it to you. (You're welcome.)

(I've got) Cardinality (Apologies to Lloyd Price)

Oh-oh-over and over
I'll iterate through each row
Over and over, tell me why it's so
Over and over, each value that's unique
But oh-oh-over and over
I wish I ran my PROC FREQ

'Cause I've got - (cardinality)
High - (cardinality) N - (cardinality)
Discrete - (cardinality) HUGE - (cardinality)
Unique - (cardinality)
'Cause I got a big data set
So over ... and over
There's a high percent unique
Well, well, well over and over
I wish my data were discrete

#### Cardinal importance of cardinality

SAS Visual Analytics -- which I've been using for recent projects -- makes a big deal out of cardinality. The displayed list of variables shows the cardinality values front-and-center (see the inset screenshot image). Knowing how many different values a column can take on should influence the type of visualization you design. For example, it might make sense to group a set of charts by GENDER, which typically has a cardinality of 2 (or at most 3, if "unknown" is permitted). However, it would make less sense to group your charts by CUSTOMER_ID when that field has a cardinality in the thousands.

Many SAS users already know how to calculate cardinality. sasCommunity.org hosts a useful reference on "Cardinality Ratio" (more-or-less curated by Ron Fehd), complete with sample programs, SAS macros, and explanations.

I have a variation of my own, developed mostly by my colleague Hans-Joachim Edert. It produces a report like this:

What can you learn from this report? In this example, we can see that Origin (where the model of car comes from) has just 3 possible values, making it less than 1% unique across the 428 records. That might make it a good grouping variable. Model, on the other hand, has a unique value for almost every record. The fact that it's so close to 100% unique might indicate a data quality problem. (Do we expect all Model names to be unique?) And the numeric variable Cylinders has only 8 different values, which might indicate that it's more of a category than a measure. (Would you want to SUM or AVERAGE the Cylinders across these records?)

Here's a link to the program code if you want to try it yourself. Here's a sample call to the macro in the program:

`%getcardinality(SASHELP.CARS, WORK.CARDS, 1);`

#### Coming up...

In a future article I'll present a SAS Enterprise Guide custom task that makes the "check cardinality" process easy -- no programming. (Here's a sneak peak.) But you do not have to wait to learn more about your data: you can already glean much of this (and more) by using the Characterize Data task.

Data cardinality is an important topic for for data analysis. In mathematical terms, cardinality is simply the number of elements in a set. But in data applications, cardinality signifies the number of unique values for a given field of data. Related terms include number of levels (thank you, PROC FREQ and NLEVELS), percent unique, as well as discrete data -- data with a finite number of different values. (That's not to be confused with discreet data, which is when you're careful not to share too much personal information. For example, we can count the Kardashians in discrete increments. But that doesn't mean that they behave discreetly.)

#### But first, this

Since I've been working with cardinality recently, I've developed this persistent earworm that I just can't shake. I decided to write it down and present it to you. (You're welcome.)

(I've got) Cardinality (Apologies to Lloyd Price)

Oh-oh-over and over
I'll iterate through each row
Over and over, tell me why it's so
Over and over, each value that's unique
But oh-oh-over and over
I wish I ran my PROC FREQ

'Cause I've got - (cardinality)
High - (cardinality) N - (cardinality)
Discrete - (cardinality) HUGE - (cardinality)
Unique - (cardinality)
'Cause I got a big data set
So over ... and over
There's a high percent unique
Well, well, well over and over
I wish my data were discrete

#### Cardinal importance of cardinality

SAS Visual Analytics -- which I've been using for recent projects -- makes a big deal out of cardinality. The displayed list of variables shows the cardinality values front-and-center (see the inset screenshot image). Knowing how many different values a column can take on should influence the type of visualization you design. For example, it might make sense to group a set of charts by GENDER, which typically has a cardinality of 2 (or at most 3, if "unknown" is permitted). However, it would make less sense to group your charts by CUSTOMER_ID when that field has a cardinality in the thousands.

Many SAS users already know how to calculate cardinality. sasCommunity.org hosts a useful reference on "Cardinality Ratio" (more-or-less curated by Ron Fehd), complete with sample programs, SAS macros, and explanations.

I have a variation of my own, developed mostly by my colleague Hans-Joachim Edert. It produces a report like this:

What can you learn from this report? In this example, we can see that Origin (where the model of car comes from) has just 3 possible values, making it less than 1% unique across the 428 records. That might make it a good grouping variable. Model, on the other hand, has a unique value for almost every record. The fact that it's so close to 100% unique might indicate a data quality problem. (Do we expect all Model names to be unique?) And the numeric variable Cylinders has only 8 different values, which might indicate that it's more of a category than a measure. (Would you want to SUM or AVERAGE the Cylinders across these records?)

Here's a link to the program code if you want to try it yourself. Here's a sample call to the macro in the program:

`%getcardinality(SASHELP.CARS, WORK.CARDS, 1);`

#### Coming up...

In a future article I'll present a SAS Enterprise Guide custom task that makes the "check cardinality" process easy -- no programming. (Here's a sneak peak.) But you do not have to wait to learn more about your data: you can already glean much of this (and more) by using the Characterize Data task.

Data cardinality is an important topic for for data analysis. In mathematical terms, cardinality is simply the number of elements in a set. But in data applications, cardinality signifies the number of unique values for a given field of data. Related terms include number of levels (thank you, PROC FREQ and NLEVELS), percent unique, as well as discrete data -- data with a finite number of different values. (That's not to be confused with discreet data, which is when you're careful not to share too much personal information. For example, we can count the Kardashians in discrete increments. But that doesn't mean that they behave discreetly.)

#### But first, this

Since I've been working with cardinality recently, I've developed this persistent earworm that I just can't shake. I decided to write it down and present it to you. (You're welcome.)

(I've got) Cardinality (Apologies to Lloyd Price)

Oh-oh-over and over
I'll iterate through each row
Over and over, tell me why it's so
Over and over, each value that's unique
But oh-oh-over and over
I wish I ran my PROC FREQ

'Cause I've got - (cardinality)
High - (cardinality) N - (cardinality)
Discrete - (cardinality) HUGE - (cardinality)
Unique - (cardinality)
'Cause I got a big data set
So over ... and over
There's a high percent unique
Well, well, well over and over
I wish my data were discrete

#### Cardinal importance of cardinality

SAS Visual Analytics -- which I've been using for recent projects -- makes a big deal out of cardinality. The displayed list of variables shows the cardinality values front-and-center (see the inset screenshot image). Knowing how many different values a column can take on should influence the type of visualization you design. For example, it might make sense to group a set of charts by GENDER, which typically has a cardinality of 2 (or at most 3, if "unknown" is permitted). However, it would make less sense to group your charts by CUSTOMER_ID when that field has a cardinality in the thousands.

Many SAS users already know how to calculate cardinality. sasCommunity.org hosts a useful reference on "Cardinality Ratio" (more-or-less curated by Ron Fehd), complete with sample programs, SAS macros, and explanations.

I have a variation of my own, developed mostly by my colleague Hans-Joachim Edert. It produces a report like this:

What can you learn from this report? In this example, we can see that Origin (where the model of car comes from) has just 3 possible values, making it less than 1% unique across the 428 records. That might make it a good grouping variable. Model, on the other hand, has a unique value for almost every record. The fact that it's so close to 100% unique might indicate a data quality problem. (Do we expect all Model names to be unique?) And the numeric variable Cylinders has only 8 different values, which might indicate that it's more of a category than a measure. (Would you want to SUM or AVERAGE the Cylinders across these records?)

Here's a link to the program code if you want to try it yourself. Here's a sample call to the macro in the program:

`%getcardinality(SASHELP.CARS, WORK.CARDS, 1);`

#### Coming up...

In a future article I'll present a SAS Enterprise Guide custom task that makes the "check cardinality" process easy -- no programming. (Here's a sneak peak.) But you do not have to wait to learn more about your data: you can already glean much of this (and more) by using the Characterize Data task.

Has this ever happened to you? You have a SAS program with statements that you wrote, then you make use of a macro function or %include file supplied by a helpful colleague, and when your SAS code resumes, you find that all of your SAS titles have been changed out from under you!

To be a "good citizen" within a macro function, you should return the SAS environment to the way you found it. You'll occasionally see constructs like this to store and restore SAS option values:

```%let _currValidName = %sysfunc(getoption(validvarname)); %put Overriding current VALIDVARNAME setting of &_currValidName; options validvarname=v7;   /* some statements that rely on VALIDVARNAME */   %put Restoring previous VALIDVARNAME setting to &_currValidName; options validvarname=&_currValidName.;```

(Even more is possible with PROC OPTSAVE; see Denise Poll's paper on this topic.)

But what about the titles and footnotes? Is there a way to squirrel those values away before usurping them, and then put them back the way that you found them?

Yes, there is. Even though TITLE and FOOTNOTE are global SAS statements and not system options per se, you can programmatically query the current settings from one of the SAS dictionary tables: SASHELP.VTITLE. (Despite the "VTITLE" name, this table also contains FOOTNOTE settings.)

You can use these macros (or the code within them) to save and restore the current settings for TITLE and FOOTNOTE statements:

```/* Define macro to save titles */ %macro saveTitles; data _savedTitles; set sashelp.vtitle; run; %mend;   /* Define macro to restore previously saved titles */ %macro restoreTitles; proc sql noprint; /* Using a SAS 9.3 feature that allows open-ended macro range */ select text into :SavedTitles1- from _savedTitles where type="T"; %let SavedTitlesCount = &sqlobs.;   /* and footnotes */ select text into :SavedFootnotes1- from _savedTitles where type="F"; %let SavedFootnotesCount = &sqlobs.;   /* remove data set that stored our titles*/ drop table _savedTitles; quit;   /* emit statements to reinstate the titles */ TITLE; /* clear interloping titles */ %do i = 1 %to &SavedTitlesCount.; TITLE&i. "&&SavedTitles&i."; %end;   FOOTNOTE; /* clear interloping footnotes */ %do i = 1 %to &SavedFootnotesCount.; FOOTNOTE&i. "&&SavedFootnotes&i."; %end; %mend;```

Sample use:

```title "This is my tremendous title"; title2 "and this is a subtitle"; footnote "Created by ME"; proc means data=sashelp.class; var weight; run; %saveTitles;   %someMacroThatChangesTitles();   /* restore the old titles */ %restoreTitles; proc means data=sashelp.class; var height; run;```

There is at least one idiosyncrasy of this approach: if your title or footnote includes a macro variable or expression, that macro will be resolved when the title is stored. So when you restore to the original value with %restoreTitles(), the value will be "stuck" as it was when you used %saveTitles(). If you dislike that limitation, then perhaps some hotshot macro programmer will add a solution here in the comments.

A couple of years ago I shared a method for copying any file within a SAS program. It was a simple approach, copying the file byte-by-byte from one fileref (SAS file reference) to another.

My colleague Bruno Müller, a SAS trainer in Switzerland, has since provided a much more robust method. Bruno's method has several advantages:

• It's coded as a SAS macro, so it is simple to reuse -- similar to a function.
• It copies the file content in chunks rather than byte-by-byte, so it's more efficient.
• It provides good error checks and reports any errors and useful diagnostics to the SAS log.
• It's an excellent example of a well-documented SAS program!

Bruno tells me that "copying files" within a SAS program -- especially from nontraditional file systems such as Web sites -- is a common need among his SAS students. I asked Bruno for his permission to share his solution here, and he agreed.

To use the macro, you simply define two filerefs: _bcin (source) and _bcout (target), then call the %binaryFileCopy() macro. Here is an example use that copies a file from my Dropbox account:

```filename _bcin TEMP; filename _bcout "C:\temp\streaming.sas7bdat"; proc http method="get" url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" out=_bcin ; run;   %binaryFileCopy() %put NOTE: _bcrc=&_bcrc;   filename _bcin clear; filename _bcout clear;```

The following is partial log output from the program:

```NOTE: BINARYFILECOPY start  17SEP2013:20:50:33
NOTE: BINARYFILECOPY infile=_bcin C:\SASTempFiles\_TD5888\#LN00066
NOTE: BINARYFILECOPY outfile=_bcout C:\temp\streaming.sas7bdat

NOTE: BINARYFILECOPY processed 525312 bytes
NOTE: DATA statement used (Total process time):
real time           0.20 seconds
cpu time            0.07 seconds

NOTE: BINARYFILECOPY end  17SEP2013:20:50:34
NOTE: BINARYFILECOPY processtime 00:00:00.344
```

You can download the program -- which should work with SAS 9.2 and later -- from here: binaryfilecopy.sas

#### Update: using FCOPY in SAS 9.4

Updated: 18Sep2013
Within hours of my posting here, Vince DelGobbo reminded me about the new FCOPY function SAS 9.4. With two filerefs assigned to binary-formatted files, you can use FCOPY to copy the content from one to the other. When I first tried it with my examples, I had problems because of the way FCOPY treats logical record lengths. However, Jason Secosky (the developer for FCOPY and tons of other SAS functions) told me that if I use RECFM=N on each FILENAME statement, the LRECL would not be a problem. And of course, he was correct.

Here's my example revisited:

```filename _bcin TEMP recfm=n /* RECFM=N needed for a binary copy */; filename _bcout "C:\temp\streaming.sas7bdat" recfm=n;   proc http method="get" url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" out=_bcin ; run;   data _null_; length msg \$ 384; rc=fcopy('_bcin', '_bcout'); if rc=0 then put 'Copied _bcin to _bcout.'; else do; msg=sysmsg(); put rc= msg=; end; run;   filename _bcin clear; filename _bcout clear;```

Last week I presented two talks at the University of Wisconsin at Milwaukee, which has established a new Graduate Certificate in Applied Data Analysis Using SAS. While in Milwaukee, I ran into an old friend: the ODS LISTING destination.

One of my presentations was a hands-on workshop titled Getting Started with the SAS/IML Language. In the UW-M computer lab, the students used SAS/IML Studio to run the exercises. I noticed that the student output was produced in the ODS LISTING destination, whereas my laptop was generating output for the HTML destination. That is, in the following screen capture, I was generating the output on the right side, whereas the student output looked like the left side (click to enlarge the image):

As I wandered through the lab, watching the students complete the exercises, I realized that I have grown accustomed to the HTML destination. HTML became the default ODS destination for the SAS Windowing environment in SAS 9.3. SAS/IML Studio made HTML the default destination in SAS 9.3m2, which shipped in August 2012. Thus I have been seeing HTML output exclusively for about a year.

I now prefer the HTML output, but when SAS 9.3 changed the default destination from LISTING to HTML, I had mixed feelings. The LISTING destination was an old friend, and I didn't want to see it move away. We had had good times together through the years.

However, I embraced the change. I did not override the new default when I installed SAS 9.3, and I gritted through the first few weeks of working with the HTML output. I discovered several benefits to the HTML destination, including the fact that that HTML output is "infinitely wide," and is therefore valuable when working with large matrices or wide tables. No more worrying about matrices wrapping when the output is wider than the LINESIZE option!

As I looked at the student output in the computer lab, I realized that I have made a new friend: the HTML destination. I like having it around when I work. I enjoy its beautiful tables and its integrated and interlaced ODS graphics.

When I encountered my old friend, the LISTING destination, in Milwaukee, I got the same feeling that I get when I play a classic video game like Pong, Space Invaders, or Asteroids: I briefly enjoy the nostalgic experience, but I realize that newer technology makes for a more enjoyable overall experience.

What is your default ODS destination in SAS? Are you still clinging to the LISTING destination? Have you converted to using HTML output? Why or why not? Share your story in the comments.

tags: SAS Programming

Recently I wrote about how to determine the age of your SAS release. Experienced SAS programmers know that you can programatically determine information about your SAS release by using certain automatic macro variables that SAS provides:

• SYSVER: contains the major and minor version of the SAS release
• SYSVLONG: contains the information in SYSVER, and information about the maintenance release
• SYSVLONG4: contains the information in SYSVLONG, and the year of release

For example, the following DATA step displays information about the SAS release. The results shown are for the second maintenance release of SAS 9.3.

```data _NULL_; %put SYSVER = &SYSVER; %put SYSVLONG = &SYSVLONG; %put SYSVLONG4 = &SYSVLONG4; run;```
```SYSVER = 9.3 SYSVLONG = 9.03.01M2D082312 SYSVLONG4 = 9.03.01M2D08232012```

These macro variables are usually used in macro code to conditionally include code (see the %INCLUDE statement) or to control the flow of execution through a macro, such as in the following example:

```%if %sysevalf(&sysver < 9) %then %do; %put SAS 9.0 or later is required. Terminating.; %goto exit; %end;```

Recently I wrote a SAS/IML function that decomposes the SYSVLONG macro into its components. You can write similar code for the SAS DATA step. The following program uses the FIND function and the SUBSTR function to parse and extract relevant information about a SAS release. If you ever have the need to extract details from the SYSVLONG macro variable, you might find this function helpful.

```proc iml; /* Helper function that returns information about the current SAS system release. This function decomposes the SYSVLONG system macro variable and returns four numbers that are associated with the version. */ start GetSASVersion( major, minor, iteration, maint ); sysvlong = symget("SYSVLONG"); /* system macro variable */ pos1 = find(sysvlong, "."); major = substr(sysvlong, 1, pos1-1); /* major version */ major = num(major); /* convert to numeric */   pos2 = find(sysvlong, ".", 'i', pos1+1); minor = substr(sysvlong, pos1+1, pos2-pos1-1);/* minor version */ minor = num(minor);   pos3 = find(sysvlong, "M", 'i', pos2+1); iteration = substr(sysvlong, pos2+1, pos3-pos2-1);/* iteration version */ iteration = num(iteration);   pos4 = notdigit(sysvlong, pos3+1); maint = substr(sysvlong, pos3+1, pos4-pos3-1); /* maintenance level */ maint = num(maint); finish;   /* test it by running code on SAS 9.3m2 (SAS/IML 12.1) */ run GetSASVersion( major, minor, iteration, maint ); v = major || minor || iteration || maint; print v[colname={"major" "minor" "iteration" "maint"} label="Results for SAS 9.3m2"];   b = ( major<9 ) | ( major=9 & minor<3 ) | ( major=9 & minor=3 & iteration<1 ) | ( major=9 & minor=3 & iteration=1 & maint<=2 ); if b then print "SAS 9.3m2 or earlier"; else print "After SAS 9.3m2";```

Even the best programmers make mistakes. For most errors, SAS software displays the nature and location of the error, returns control to the programmer, and awaits further instructions. However, there are a handful of insidious errors that cause SAS to think that a statement or program is not finished. For these errors, SAS doesn't display the error because it is waiting for the programmer to finish submitting the rest of the statement. Meanwhile, the programmer (who is unaware that an error has occurred) is waiting for SAS to respond. From the programmer's point of view, SAS is frozen. It has gone off into La-La Land, or maybe the Twilight Zone.

Fortunately, there is a simple "magic command" that fixes them all of these common errors. The common errors that render SAS unresponsive are as follows:

• The forgotten semicolon: If the last statement in a program does not contain a terminating semicolon, SAS thinks that the program is not finished. It waits to receive the rest of the statement. Without a terminating semicolon, SAS will wait, and wait, and wait....
`y = 1 /* No semicolon, so statement not complete */`
• The forgotten closing single quote: If your program starts a string but forgets to end it, SAS thinks you are in the process of defining a string. You can submit statements such as QUIT and ENDSAS, but SAS thinks these statements are just part of the string and does not execute them.
```c = 'My string; /* No closing quote. Future stmts are part of string */ run; * Hey! SAS is frozen! ; endsas; * Argh! Nothing works! ;```
As shown above, you can detect this error visually if you are using a program editor in which syntax is color-coded. For example, in the SAS enhanced editor, all characters after the equal sign are colored purple, which indicates that SAS thinks they are all part of a string. Also, after the character string exceeds 256 characters, SAS writes a helpful warning to the SAS Log:
```WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation marks.```
• The forgotten closing double quote: Same issue as for the forgotten single quote.
• The forgotten closing comment: You started a comment, but haven't closed it with */. No matter what text you submit, SAS thinks it is part of the comment.
```c = 'My string'; /* Program is complete run; * Hey! SAS is frozen! ; endsas; * Argh! Nothing works! ;```
Again, if you use a color-coded program editor, you ought to be able to detect this error visually. In the SAS enhanced editor, you will notice that your statements are green.

There is a "magic command" that you can submit that will recover from all four errors:

`;*';*";*/;`

If you have used SAS Enterprise Guide, you've probably seen this special statement (also called the "magic string" or the "quote killer") appended to the end of submitted programs. It is used by many client applications to ensure that the SAS server terminates and produces results such as ODS tables and graphics. I don't know who originally invented the magic command, but let's look at what it does:

• If the submitted program is already properly terminated (none of the errors are present), the command issues a null statement (the first character) and a comment (the remaining characters).
• If the submitted program forgot a concluding semicolon, the command terminates the previous statement (the first character) and issues a comment (the remaining characters).
• If the submitted program forgot to close a single-quote string, the command terminates the string (the third character) and issues a comment (the remaining characters).
• If the submitted program forgot to close a double-quote string, the command terminates the string (the sixth character) and issues a comment (the remaining characters).
• If the submitted program is missing a closing comment symbol, the command closes the comment (the eighth and ninth characters) and issues a null statement (the last character).

In all cases, the magic command causes SAS to escape from La-La Land and returns control to the programmer.

A forgotten RUN or QUIT statement is another error that can cause SAS to be unresponsive. For most procedures, SAS parses the statements in a program, but does not execute them until it encounters a RUN or QUIT statement. (Exceptions include some interactive procedures such as the IML and SQL procedures.) This kind of programming error is obviously fixed by submitting a QUIT or RUN statement. (Some programmers use the RUN CANCEL statement to abort a submitted DATA step.) Consequently, some programmers might want to modify the magic string as follows:

`;*';*";*/;quit;`

Again, this version of the magic command is used by many SAS client applications, including EG. It looks mysterious the first time you see it, but after you dissect it, it makes perfect sense. If you have ever asked "what is the purpose of the statement at the end of SAS Enterprise Guide programs," now you know!

Do you have a debugging tip that you use to overcome an insidious error? What do you do to regain control when your SAS program contains an error that locks-up your computer? Leave a comment.

Occasionally, people ask me what is the best thing about writing a book. Is it the notoriety you get from being a SAS Press author? Fame is always pleasant. Is it the money you make from the advance and the royalties?  Money is always useful. Is it displaying technical expertise [...]

In SAS 9.4, the SAS programming language continues add new features by the truckload. I've already discussed PROC DELETE (which is actually an old feature, but like an 80s hit song it's now back with a better version).

In this SAS Tech Talk video from SAS Global Forum 2013, I talked with Rick Langston about the advancements in the SAS programming language. Rick has been with SAS for...well, a long time. He's considered to be the steward of the SAS programming language. In this session, Rick discusses the process that we use to add new syntax to the language and to ensure its integrity.

Rick also talks about three specific new features in 9.4, all of which were added because customers asked for them. (It's difficult to read the Rick's syntax examples in the video, so I've included reference links below so that you can learn more.)

#### FILENAME ZIP access method

This brings the ability to read and write compressed ZIP files directly into the SAS language. For more information, see the FILENAME ZIP documentation. If you don't have SAS 9.4, you can still create ZIP files using ODS PACKAGE.

#### DOSUBL function

Rick calls this "submitting SAS code on the side", as it allows you to run a SAS step or statement from "inside" a currently running step. You can learn more from the DOSUBL function reference, or from this SAS Global Forum paper. I've also written a post with a specific example in SAS Enterprise Guide.

#### LOCKDOWN system option and statement

This one will excite SAS administrators. You can set the LOCKDOWN system option in a batch SAS session or SAS Workspace server to limit some of the "dangerous" functions of SAS and, more importantly, limit the file areas in which the SAS session will operate. We don't currently have a documentation link for this, so I'll dive in a bit further in a future blog post.

That's just a small taste of what's new. Be sure to check out the complete What's New in SAS 9.4 document for even more goodies.