debugging

8月 192017
 

SAS programmers have high expectations for their coding environment, and why shouldn't they? Companies have a huge investment in their SAS code base, and it's important to have tools that help you understand that code and track changes over time. Few things are more satisfying as a SAS program that works as designed and delivers perfect results. (Oh, hyperbole you say? I don't think so.) But when your program isn't working the way it should, there are two features that can help you get back on track: a code debugger, and program revision history. Both of these capabilities are built into SAS Enterprise Guide. Program history was added in v7.1, and the debugger was added in v7.13.

I've written about the DATA step debugger before -- both as a teaching tool and as a productivity tool. In this article, I'm sharing a demo of the debugger's features, led by SAS developer Joe Flynn. Before joining the SAS Enterprise Guide development team, Joe worked in SAS Technical Support. He's very familiar with "bugs," and reported his share of them to SAS R&D. Now -- like every programmer -- Joe makes the bugs. But of course, he fixes most of them before they ever see the light of day. How does he do that? Debugging.

This video is only about 8 minutes long, but it's packed with good information. In the debugger demo, you'll learn how you can use standard debugging methods, such as breakpoints, step over and step through, watch variables, jump to, evaluate expression, and more. There is no better way to understand exactly what is causing your DATA step to misbehave.

Joe's debugger

In the program history demo (the second part of the video), you'll learn how team members can collaborate using standard source management tools (such as Git). If you establish a good practice of storing code in a central place with solid source management techniques, SAS Enterprise Guide can help you see who changed what, and when. SAS Enterprise Guide also offers a built-in code version comparison tool, which enhances your ability to find the breaking changes. You can also use the code comparison technique on its own, outside of the program history feature.

program history

Take a few minutes to watch the video, and then try out the features yourself. You don't need a Git installation to play with program history at the project level, though it helps when you want to extend that feature to support team collaboration.

See also

The post Code debugging and program history in SAS Enterprise Guide appeared first on The SAS Dummy.

1月 312013
 

I've got a new trick that you can use to track progress in a long-running SAS program, while using SAS Enterprise Guide.

I've previously written about the SYSECHO statement and how you can use it to update the Task Status window with custom messages. SYSECHO is a "global" statement in the SAS language, which works great within the boundaries of a PROC step to annotate your Task Status messages. For certain interactive PROCs (like SQL), you can pepper in SYSECHO statements to give more frequent and detailed updates.
here's what it looks like
But the DATA step, which is often used for long-running operations, cannot use multiple SYSECHO statements. When SAS compiles the DATA step it processes all global statements up front, so that only one use of the SYSECHO statement takes effect. This means that if you want to use multiple SYSECHO calls to report which observation or values are being processed as the DATA step runs, you can't.

That is, you couldn't, until now. With SAS 9.3 Maintenance 2 you can use a new function, named DOSUBL, to process any SAS statements (including global statements) "just in time" as your DATA step runs. DOSUBL accepts one or more SAS statements and executes them immediately, inline with the currently running DATA step. The statements must represent a "complete" segment of a SAS program: no partial PROCs or partial DATA steps. But it can execute any PROC step (bounded by RUN or QUIT) or a DATA step (bounded by RUN) or -- key for our example -- a SAS global statement.

NOTE: DOSUBL is an experimental function in SAS 9.3, and you really need the M2 revision or later for it to work well. It's on target for a production release with SAS 9.4. It's also the topic of a paper by Rick Langston at SAS Global Forum 2013 (here's a preview from Jason's paper last year).

If you have SAS 9.3M2, try running this from your SAS Enterprise Guide session. Note how I put a condition around the DOSUBL call; I don't want to process the statement for every record, as that might get too chatty in the status window. Instead, I call it once for every 5 records.

/* Tracking where you are in the DATA step iterations */
data _null_;
 set sashelp.class;
 if mod(_n_,5)=0 then
  rc = dosubl(cats('SYSECHO "OBS N=',_n_,'";'));
 s = sleep(1); /* contrived delay - 1 second on Windows */
run;


Here's another example that reports on progress, but instead of triggering the update based on record count, I've triggered it based on the current BY value in process:

proc sort data=sashelp.cars out=sortedcars;
by make;
run;
 
/* Tracking which "category" value is being processed */
data _null_;
 set sortedcars;
 by make;
 if first.make then do;
  rc = dosubl(cats('SYSECHO "OBS N=',_n_,' MAKE=',make,'";'));
 end;
run;

With a little imagination, you can devise some sophisticated SYSECHO statements that report on other aspects of the DATA step processing: the timings, the program data vector, and so on. It might become a useful debugging technique (in lieu of the DATA step debugger, which can't run in a client application like SAS Enterprise Guide).

Other references:

Tracking progress in SAS programs with SAS Enterprise Guide
Executing a PROC from a DATA step

tags: debugging, SAS Enterprise Guide, SYSECHO
5月 192012
 
This week's tip comes from SAS powerhouse Art Carpenter and his book Carpenter's Complete Guide to the SAS REPORT Procedure. SAS user Kim LeBouton called this book "the single best resource for PROC REPORT." And in his review, SAS user Charles Patridge said "This is a must-have book if you are a SAS [...]
11月 232011
 

The SAS macro variable "inspector" is a custom task that plugs into SAS Enterprise Guide 4.3. You can use it to view the current values for all SAS macro variables that are defined within your SAS session. You can also evaluate "immediate" macro expressions in a convenient quick view window. If you develop or run SAS macro programs, this task can be a valuable debugging and learning tool.

UPDATE 28Nov2011: I've received several comments about this task, both here on the blog and in e-mail. Based on some of your very good suggestions, I've updated the task with additional features. If you downloaded the task before 28Nov2011, you might want to refresh your copy.

UPDATE 16Dec2011: The download package for this task now also includes a SAS Options viewer task, described on this separate blog post.

I've been working with SAS macros quite a bit lately, and I decided that a task like this could come in handy as a sort of "watch" window for macro values. I built the task using the custom task APIs and Microsoft .NET. I hope that you find the task useful. If you try it out and have suggestions for how to make it better, please share by adding a comment to the blog.

The custom task and an accompanying README.pdf file (containing description and detailed installation instructions) are available for download in this ZIP file. Installation is simple: copy the DLL to a designated folder on your PC, and SAS Enterprise Guide will detect the task automatically.

This add-in offers the following main features:

Always-visible window: Once you open the task from the Tools menu, you can leave it open for your entire SAS Enterprise Guide session. The window uses a "modeless" display, so you can still interact with other SAS Enterprise Guide features while the window is visible. This makes it easy to switch between SAS programs and other SAS Enterprise Guide windows and the macro variable viewer to see results.

Select active SAS server: If your SAS environment contains multiple SAS workspace connections, you can switch among the different servers to see macro values on multiple systems.

One-click refresh: Refresh the list of macro variables by clicking on the Refresh button in the toolbar.

View by scope or as a straight list: View the macro variables in their scope categories (for example, Global and Automatic) or as a straight list, sorted by variable name or current value. Click on the column headers to sort the list.

Filter results (NEW!): Type a string of characters in the "Filter results" field, and the list of macro variable results will be instantly filtered to those that contain the sequence that you type. The filtered results will match on variable names as well as values, and the search is case-insensitive. This is a convenient way to narrow the list to just a few variables that you're interested in. To clear the filter, click on the X button next to the "Filter results" field, or "blank out" the text field. (Note: this is in the 28Nov2011 update!)

Set window transparency: You can make the window appear "see-through" so that it doesn't completely obscure your other windows as you work with it.

Copy macro variables as %LET statements: Select one or more macro variables within the window, right-click and select Copy assignments. This generates a series of %LET statements -- one for each macro variable/value pair -- which you can then paste into a SAS program.

Macro expression "quick view": Have you ever wanted to test out a macro expression before using it in a longer program? This window allows you to get immediate feedback on a macro expression, whether a simple macro reference or a more complex expression with nested functions. If the expression generates a SAS warning or error, the feedback window shows that as well. Note: the expression can be any macro expression that is valid for the right-side of a macro variable assignment (%let statement).

More macro productivity in SAS Enterprise Guide is just a few clicks away! Download the task today and let me know what you think!

tags: .net, debugging, macro programming, SAS custom tasks, SAS Enterprise Guide, SAS programming
5月 032011
 
R objects that reside in other R objects can require a lot of typing to access. For example, to refer to a variable x in a dataframe df, one could type df$x. This is no problem when the dataframe and variable names are short, but can become burdensome when longer names or repeated references are required, or objects in complicated structures must be accessed.

The attach() function in R can be used to make objects within dataframes accessible in R with fewer keystrokes. As an example:

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
names(ds)
attach(ds)
mean(cesd)
[1] 32.84768

The search() function can be used to list attached objects and packages. Let's see what is there, then detach() the dataset to clean up after ourselves.

search()
> search()
[1] ".GlobalEnv" "ds" "tools:RGUI" "package:stats"
[5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
[9] "package:methods" "Autoloads" "package:base"
detach(ds)

As noted in section B.4.5, users are cautioned that if there is already a variable
called cesd in the local workspace, issuing attach(ds), may not mean that cesd references ds$cesd. Name conflicts of this type are a common problem with attach() and care should be taken to avoid them.

The help page for attach() notes that attach can lead to confusion. The Google R Style Manual provides clear advice on this point, providing the following advice about attach():
The possibilities for creating errors when using attach are numerous. Avoid it.


After being burned by this one too many times, we concur.

So what options exist for those who decide to go cold turkey?

  1. Reference variables directly (e.g. lm(ds$x ~ ds$y))

  2. Specify the dataframe for commands which support this (e.g. lm(y ~ x, data=ds))

  3. Use the with() function, which returns the value of whatever expression is evaluated (e.g. with(ds,lm(y ~x)))

  4. (Also note the within() function, which is similar to with(), but returns a modified object.)


Some examples may be helpful.

> # fit a linear model
> lm1 = lm(cesd ~ pcs, data=ds)

> mean(ds$cesd[ds$female==1]) # these next three are equivalent
[1] 36.88785
> with(ds, mean(cesd[female==1]))
[1] 36.88785
> with(subset(ds, female==1), mean(cesd))
[1] 36.88785

In short, there's never an actual need to use attach(), using it can lead to confusion or errors, and alternatives exists that avoid the problems. We recommend against it.

In SAS, all procedures use the most recent data set or must reference a data set explicitly. Very roughly speaking, using attach() in R is like relying on the implicit use of the most recent data set. Our recommendation against attach() thus mirrors our use of the data= option throughout our books.
5月 032011
 
R objects that reside in other R objects can require a lot of typing to access. For example, to refer to a variable x in a dataframe df, one could type df$x. This is no problem when the dataframe and variable names are short, but can become burdensome when longer names or repeated references are required, or objects in complicated structures must be accessed.

The attach() function in R can be used to make objects within dataframes accessible in R with fewer keystrokes. As an example:

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
names(ds)
attach(ds)
mean(cesd)
[1] 32.84768

The search() function can be used to list attached objects and packages. Let's see what is there, then detach() the dataset to clean up after ourselves.

search()
> search()
[1] ".GlobalEnv" "ds" "tools:RGUI" "package:stats"
[5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
[9] "package:methods" "Autoloads" "package:base"
detach(ds)

As noted in section B.4.5, users are cautioned that if there is already a variable
called cesd in the local workspace, issuing attach(ds), may not mean that cesd references ds$cesd. Name conflicts of this type are a common problem with attach() and care should be taken to avoid them.

The help page for attach() notes that attach can lead to confusion. The Google R Style Manual provides clear advice on this point, providing the following advice about attach():
The possibilities for creating errors when using attach are numerous. Avoid it.


After being burned by this one too many times, we concur.

So what options exist for those who decide to go cold turkey?

  1. Reference variables directly (e.g. lm(ds$x ~ ds$y))

  2. Specify the dataframe for commands which support this (e.g. lm(y ~ x, data=ds))

  3. Use the with() function, which returns the value of whatever expression is evaluated (e.g. with(ds,lm(y ~x)))

  4. (Also note the within() function, which is similar to with(), but returns a modified object.)


Some examples may be helpful.

> # fit a linear model
> lm1 = lm(cesd ~ pcs, data=ds)

> mean(ds$cesd[ds$female==1]) # these next three are equivalent
[1] 36.88785
> with(ds, mean(cesd[female==1]))
[1] 36.88785
> with(subset(ds, female==1), mean(cesd))
[1] 36.88785

In short, there's never an actual need to use attach(), using it can lead to confusion or errors, and alternatives exists that avoid the problems. We recommend against it.

In SAS, all procedures use the most recent data set or must reference a data set explicitly. Very roughly speaking, using attach() in R is like relying on the implicit use of the most recent data set. Our recommendation against attach() thus mirrors our use of the data= option throughout our books.
12月 232010
 
As we get close to the end of the year, it's time to look back over the past year and think of resolutions for 2011 and beyond. One that's often on my mind relates to ways to structure my code to make it clearer to others (as well as to myself when I look back upon it months later).

Style guides are common in many programming languages, and are often purported to increase the readability and legibility of code, as well as minimize errors. The Wikipedia page on this topic describes the importance of indentation, spacing, alignment, and other formatting conventions.

Many stylistic conventions are appropriate for statistical code written in SAS and R, and can help to make code clearer and easier to comprehend. Consider the difference between:

ds=read.csv("http://www.math.smith.edu/r/data/help.csv");attach(ds)
fOo=ks.test(age[female==1],age[female==0],data=ds)
plotdens=function(x,y,mytitle, mylab){densx = density(x)
densy = density(y);plot(densx,main=mytitle,lwd=3,xlab=mylab,
bty="l");lines(densy,lty=2,col=2,lwd=3);xvals=c(densx$x,
rev(densy$x));yvals=c(densx$y,rev(densy$y));polygon(xvals,
yvals,col="gray")};mytitle=paste("Test of ages: D=",round(fOo$statistic,3),
" p=",round(fOo$p.value,2),sep="");plotdens(age[female==1],
age[female==0],mytitle=mytitle,mylab="age (in years)")
legend(50,.05,legend=c("Women","Men"),col=1:2,lty=1:2,lwd=2)

and

# code example from the Using R for Data Management, Statistical
# Analysis and Graphics book
# Nicholas Horton, Smith College December 21, 2010
#
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
attach(ds)

# fit KS test and save object containing p-value
ksres = ks.test(age[female==1], age[female==0], data=ds)

# define function to plot two densities on the same graph
plotdens = function(x, y, mytitle, mylab) {
densx = density(x)
densy = density(y)
plot(densx, main=mytitle, lwd=3, xlab=mylab, bty="l")
lines(densy, lty=2, col=2, lwd=3)
xvals = c(densx$x, rev(densy$x))
yvals = c(densx$y, rev(densy$y))
polygon(xvals, yvals, col="gray")
}

# craft specialized title containing statistic and p-value
mytitle = paste("Test of ages: D=",
round(ksres$statistic,3),
" p=", round(ksres$p.value, 2),
sep="")

plotdens(age[female==1], age[female==0],
mytitle=mytitle, mylab="age (in years)")

legend(50, .05, legend=c("Women", "Men"), col=1:2, lty=1:2,
lwd=2)

While the first example has the advantage of using considerably fewer lines, it suffers dramatically from readability. The use of appropriate indentation, white space, spacing and comments help the analyst when debugging as well as fostering easier reuse in the future. In settings where code review is undertaken, sharing a set of common standards is eminently sensible.

SAS

A specific but somewhat cursory style manual for SAS can be found at the SAS community Style guide for writing and polishing programs. I like the start of this guide, though it is incomplete at present. Other useful words of wisdom can be found here and here.

R

Google's R Style Guide is chock full of tips and guidelines to make R code easier to read, share and verify. Another source of ideas is Henrik Bengtsson's draft R coding conventions. While one can quibble about some of the specific suggestions, overall, the effect of adherence to such a style guide is code that is easier to understand and less likely to hide errors.

Some coders are fundamentalists in insisting on "the correct" style. In general, however, it is more important to develop a sensible, interpretable, and coherent style of your own than to adhere to styles that you find awkward, whatever their provenance. The links above provide some common sense tips that can help improve productivity and make you a better analyst.