June 14, 2010

As we continue with our series on survival analysis, we demonstrate how to plot estimated (smoothed) hazard functions.


We use the routines available in the muhaz package. Background information on the methods can be found in K.R. Hess, D.M. Serachitopol, and B.W. Brown, "Hazard Function Estimators: A Simulation Study," Statistics in Medicine, 1999; 18(22):3075-3088.

library(muhaz) # provides muhaz() for kernel-smoothed hazard estimates
ds = read.csv("http://www.math.smith.edu/sasr/datasets/help.csv")
smallds = data.frame(dayslink=ds$dayslink,
linkstatus=ds$linkstatus, treat=ds$treat)

# drop subjects with missing data
smallds = na.omit(smallds)

treatds = smallds[smallds$treat==1,]
controlds = smallds[smallds$treat==0,]
rm(ds, smallds) # clean up

haztreat = with(treatds, muhaz(dayslink, linkstatus))
hazcontrol = with(controlds, muhaz(dayslink, linkstatus))

plot(haztreat, lwd=2, xlab="Follow-up time (days)")
lines(hazcontrol, lty=2, lwd=2)
legend(200, 0.005, legend=c("Treatment", "Control"),
lty=1:2, lwd=2)

The treatment group has dramatically higher hazard, but this drops appreciably after 6 months. The control group hazard is low, and decreases in a roughly linear fashion.
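To make the idea of a smoothed hazard more concrete, here is a minimal base-R sketch that spreads each Nelson-Aalen increment (events divided by number at risk) over neighboring times with an Epanechnikov kernel. This is not the algorithm muhaz uses: it has a fixed bandwidth and no boundary correction, so estimates near the edges will be biased toward zero. The toy data, the helper names `epan` and `smooth_hazard`, and the bandwidth of 25 days are all assumptions made for illustration.

```r
# Toy follow-up data (invented for illustration, not from the HELP study)
time  <- c(5, 8, 12, 20, 20, 33, 41, 60, 75, 90)
event <- c(1, 1,  0,  1,  1,  0,  1,  1,  0,  1)

# Nelson-Aalen increments: events / number at risk at each distinct event time
event_times <- sort(unique(time[event == 1]))
increments <- sapply(event_times, function(t) {
  sum(time == t & event == 1) / sum(time >= t)
})

# Epanechnikov kernel, supported on [-1, 1] (hypothetical helper)
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Smoothed hazard at each grid point: kernel-weighted sum of the increments
# (hypothetical helper; fixed bandwidth, no boundary correction)
smooth_hazard <- function(grid, bandwidth) {
  sapply(grid, function(t) {
    sum(epan((t - event_times) / bandwidth) * increments) / bandwidth
  })
}

grid <- seq(0, 90, by = 1)
haz <- smooth_hazard(grid, bandwidth = 25)
```

Calling `plot(grid, haz, type = "l")` would draw the resulting curve. A wider bandwidth gives a smoother but flatter curve, while a narrower one follows individual events more closely.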


Paul Allison includes macros to display estimates from parametric and semiparametric models in Survival Analysis Using SAS (2nd edition). We'll use the smooth macro, which is built to accept output from proc lifetest.

proc import file="c:\book\help.csv"
out=help dbms=dlm;
delimiter=','; /* dbms=dlm defaults to a blank delimiter */
getnames=yes;
run;

data h2;
set help;
if nmiss(dayslink, linkstatus, treat) eq 0;

proc lifetest data=h2 outsurv=allison;
time dayslink*linkstatus(0);
strata treat;

%include "c:/ken/sasmacros/smooth.sas";
%smooth(data=allison, time=dayslink, width=25);

The proc lifetest results (not shown) indicate that group 1 is the control and group 2 is the intervention. The macro uses a simpler smoothing method than that found in R, so that the curve is bumpier and estimates near the edges are not shown.
June 10, 2010

In my last post, I promised a series of tips on where to stay, what to eat, and what to do if you come to Cary for SAS training. In this segment, I'll cover the logistics of arrival and lodging. For those of you awaiting dining and entertainment information, I'm afraid you'll just have to sit in your hotel room and starve until I can get around to writing the next couple of postings.

First Things First

Before you set out for Cary, check out the Cary Training Center web page. We also have information on our other SAS training centers if you’re going somewhere other than Cary.

A Geographical Introduction

Cary is located next to Raleigh, the capital of North Carolina.  Raleigh, Durham, and Chapel Hill comprise the Research Triangle area, or "The Triangle".  Interestingly, Cary has more than twice as many people (134,000 as of 2009) as Chapel Hill (55,000), but nobody has demanded to rename the area to "The Research Quadrangle" or to drop Chapel Hill and add Cary. The area is home to 3 major universities (Duke in Durham, North Carolina State in Raleigh, and the University of North Carolina in Chapel Hill). All residents of the Triangle are required by law to follow college basketball, and all visitors to the Triangle must declare a preference for a local team on arrival. You can do so at the baggage claim area at the airport.

Will I Need a Car?

Not necessarily. If you are flying into the area, you will arrive at Raleigh-Durham International (really!) Airport (RDU), which is about 5 miles from the SAS campus. Taxi fare between RDU and SAS runs about $20 each way. Many of the recommended hotels have shuttles to and from the airport and to and from SAS. There are restaurants within walking distance of several of the hotels and some have their own restaurant.

On the other hand, if you want to do anything other than go to class, eat, and sit in your hotel, you will need a car or a taxi, as there is little mass transportation to speak of. Local governments have been planning light rail since the last century, and they will still be planning it well into the 22nd century. If you're driving, directions to SAS can be found on the Cary Training Center web page.
Continue reading "So, You're Going to Cary--Where to Stay?"
June 10, 2010
I don’t know what you did for the Memorial Day weekend, but I went to Boston to staff a booth at the Association for Psychological Science conference. I went to talk to university professors about the SAS Global Academic Program and SAS OnDemand for Academics software, which will soon be available at no cost to students (it is already available at no cost to professors for teaching purposes). Two big highlights for me on this trip:

First, I saw a game at Fenway, something I haven’t done since I was 10. The view was a little obstructed by Pesky’s Pole, and the Sox lost 12-5. Still, I was in heaven.

Second, at the conference, a psychology professor walked right up to our booth and said, “I like what SAS can do, but I don’t find the programming to be very intuitive.” I asked her if she had seen Enterprise Guide (available through OnDemand for Academics), and she hadn’t. I gave her a brief demo of the menu-driven interface, all point and click, no programming required. She was impressed, but raised a serious concern. “My husband’s a ‘math guy’ and really likes SAS programming. He won’t like this.” So I showed her that you could access the code that your pointing and clicking created, modify the code, save the code, email the code, and even write code from scratch. She stared at the screen for a moment, digested the implications, and said, “I think this could save my marriage.”

Who knew booth duty could turn into marriage counseling? Think Enterprise Guide might help you too? Check out this brief demo.
June 10, 2010
Like many people, I prioritize tasks based on expected results, specifically by evaluating the impact of each likely outcome on the future of SAS’ marketing efforts. Ensuring deliberate actions have purpose and meaning, today and tomorrow, is important, because a forward-thinking mindset keeps one moving in the right direction.

There’s a lyric from a Blues Traveler song that goes “It won’t mean a thing in 100 years.” I love the song, but disagree with the premise. Choosing to make all 480 workday-minutes count towards future success is good advice, and unlikely to show up on a Phrases Your Boss Doesn't Want to Hear list anytime soon.

Much has changed in the Database Marketing world in the nine years I’ve been with SAS. 2001 was all about using SAS Data Quality solutions to reduce the time it took to pull and standardize lists for direct mail campaigns. Over the years our internal usage of SAS Software evolved, and today we also use SAS Marketing Automation for campaign management, to segment and target our email, direct mail, and contact center lists.

While there are many accomplishments to be proud of, I’m most proud of my team’s ability to remain focused on tomorrow; on the next impact they can deliver. One highlight here is a new project focused on lead nurturing and optimization. At a basic level, we’re working on two main objectives:

  1. Combine the power of SAS Enterprise Miner to create predictive and descriptive models, with SAS Marketing Optimization to maximize the performance of our database when marketing campaigns compete for the same data.
  2. Develop and deliver a B2B Lead Nurturing program we can be proud of.

Over the next few months I’ll update you on this project, providing details on what we’re doing and how we’re doing it. And I’ll be happy to entertain questions along the way.
June 8, 2010
The July 2-for-1 SAS training discount is a limited-time offer (in the U.S.) allowing customers to bring a colleague with them to a training class for free, giving companies the chance to train more than one person at a time.

For research analysts Chris Cable and Yelena McElwain from Nathan Associates, Inc., SAS Education’s July 2-for-1 training offer was the perfect opportunity to get the training they needed – twice. Read on to learn how Chris and Yelena benefited from taking SAS training together.

SAS Education: You took advantage of the July 2-for-1 offer in 2008 and in 2009. Why did you decide to come to training together? Are you working together on a project?

Cable and McElwain: We are both research analysts in our litigation consulting practice and use SAS extensively in this role. We prefer attending training together because it is helpful to discuss the topics during and after the training.

SAS Education: How are you using what you’ve learned in your classes?

Cable and McElwain: Taking the classes [SAS Programming 2: Data Manipulation Techniques and SAS Macro Language 1: Essentials] together definitely helped deepen our understanding of the material. We use the macro skills from the courses quite often, not only in our own work, but also in helping our co-workers.

SAS Education: Did you take these training courses because there was a 2-for-1 discount?

Cable and McElwain: We were not waiting for a 2-for-1 offer; we were interested in improving our SAS macro skills and the discount allowed us to take the courses we needed at the same time. We will most likely take advantage of the offer again in 2010.
June 7, 2010
A few weeks ago I posted 'Chris Brogan Talks About the Value and Measurement of Social Media', the fifth (of six) Chris Brogan videos taped when Chris was at our headquarters in Cary, NC.

In this video, the sixth and final segment in the series, SAS' Deb Orton interviews Chris Brogan of New Marketing Labs. Chris discusses how being a "Trust Agent" applies to corporate marketers. His advice? Don’t abuse your prospects! Find ways to show you care outside of the sale.

Enjoy the video. And stay tuned. Soon we'll continue this 'Nuts and Bolts' series with new interviews from SAS marketing practitioners.

June 7, 2010

In our previous entry, we described how to calculate the Nelson-Aalen estimate of cumulative hazard.

In this entry, we display the estimates for the time to linkage to primary care for both the treatment and control groups in the HELP study.


We use the previously defined function, after removing missing values and sorting by the time to event or censoring.

ds = read.csv("http://www.math.smith.edu/sasr/datasets/help.csv")
smallds = data.frame(dayslink=ds$dayslink,
linkstatus=ds$linkstatus, treat=ds$treat)

# drop subjects with missing data
smallds = na.omit(smallds)

# order by dayslink
smallds = smallds[order(smallds$dayslink),]

rm(ds) # clean up
library(survival) # provides Surv(), coxph() and survfit()

calcna = function(time, event) {
  na.fit = survfit(coxph(Surv(time,event)~1), type="aalen")
  jumps = c(0, na.fit$time, max(time))
  # need to be careful at the beginning and end
  surv = c(1, na.fit$surv, na.fit$surv[length(na.fit$surv)])

  # apply appropriate transformation
  neglogsurv = -log(surv)

  # create placeholder of correct length
  naest = numeric(length(time))
  for (i in 2:length(jumps)) {
    naest[which(time>=jumps[i-1] & time<=jumps[i])] =
      neglogsurv[i-1] # snag the appropriate value
  }
  return(naest)
}

# make the columns of smallds available by name
attach(smallds)

nacontrol = calcna(dayslink[treat==0], linkstatus[treat==0])
natreat = calcna(dayslink[treat==1], linkstatus[treat==1])

plot(dayslink[treat==1], natreat, type="s", col="blue",
  ylab="Nelson-Aalen estimate",
  xlab="Number of days", lwd=2)
lines(dayslink[treat==0], nacontrol, lty=2, lwd=2,
  col="red", type="s")
legend(250, 0.55, legend=c("Intervention", "Control"),
  lty=1:2, lwd=2, col=c("blue", "red"))

detach(smallds)

The time to linkage was much shorter for the treatment group than for the control group.
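As a cross-check on what the plotted quantity means, the Nelson-Aalen estimate can also be computed directly as a running sum of events divided by the number still at risk at each event time. The following is a minimal base-R sketch on invented toy data; the function name `nelson_aalen` is hypothetical and is not part of the survival package.

```r
# Toy data (invented for illustration): follow-up time and event indicator
time  <- c(1, 2, 2, 3, 5)
event <- c(1, 1, 0, 1, 0)

# Hand-rolled Nelson-Aalen: cumulative sum of d_j / n_j over event times
nelson_aalen <- function(time, event) {
  event_times <- sort(unique(time[event == 1]))
  d <- sapply(event_times, function(t) sum(time == t & event == 1)) # events
  n <- sapply(event_times, function(t) sum(time >= t))              # at risk
  data.frame(time = event_times, n.risk = n, n.event = d,
             cumhaz = cumsum(d / n))
}

na_by_hand <- nelson_aalen(time, event)
print(na_by_hand)
```

Here the cumulative hazard steps up by 1/5, then 1/4, then 1/2, because censored subjects leave the risk set without contributing an event. A steeper curve therefore indicates that events (here, linkage to primary care) are accumulating faster.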


In SAS we'll use proc lifetest with the strata statement, as in section 5.6.3, removing subjects with missing time, censoring, or treatment indicators using the nmiss function (section 1.4.14) and subsetting if statement (section 1.5.1). We suppress all printed output and save the desired data set using ODS commands.

proc import file="c:\book\help.csv"
out=help dbms=dlm;
delimiter=','; /* dbms=dlm defaults to a blank delimiter */
getnames=yes;
run;

data h2;
set help;
if nmiss(dayslink, linkstatus,treat) eq 0;

ods select none;
ods output productlimitestimates=naout1;
proc lifetest data=h2 nelson;
time dayslink*linkstatus(0);
strata treat;
run; /* end the step before output is switched back on */
ods select all;

Then we just have to plot the data. We make it a little more self-explanatory by defining a format for the treatment group. Separate curves are requested with the y*x=z syntax, which also produces a legend by default. We use the symbol statement with the stepj interpolation (section 5.1.19) to ask for a step function plot of each curve. The axis statement is used to rotate the title of the y-axis. The label for the y-axis is inherited from proc lifetest.

proc format;
value treat 0="Control" 1="Intervention";

axis1 label = (angle = 90) minor = none order = 0 to 1.1 by .1;
symbol1 i = stepj l = 3 w = 5 c = red;
symbol2 i = stepj l = 1 w = 5 c = blue;
proc gplot data = naout1;
plot cumhaz * dayslink = treat / vaxis = axis1;
label treat = "Treatment group";
format treat treat. cumhaz dayslink 4.1;

Corpus Callosum .. Where Right and Left Brain Meet

Barry DeVille, linguistics, text, text analytics
June 4, 2010
The Corpus Callosum is a huge switching station in the middle of our brains that connects the right and left hemispheres. Without it we would not be able to reason about what we are looking at (reasoning is a left brain function while vision is in the right brain).

Similarly, in Text Analytics, the "Corpus" is the "huge switching station" that tells us the meaning of words and how to associate different forms of words to the items of interest that we are trying to extract from text.

The Wall Street Journal’s “Numbers Guy” -- Carl Bialik -- quoted Mike Calcagno, general manager of the Microsoft group that manages Word. Calcagno says "Text corpora is the lifeblood of most of our development and testing processes."

"Microsoft has licensed over one trillion words of English text in each of the past two years, and bolsters its collection with emails exchanged on its Hotmail program, with identifying details removed", according to a Microsoft spokeswoman.

SAS's own Enterprise Content Categorization maintains huge corpora in various languages. As Bialik notes in the Wall Street Journal article ("Making Every Word Count" , Sept. 12, 2008): "Without enough spoken-language data, subtleties may not emerge."

"The word 'rife' only occurs in negative contexts," says Anne O'Keeffe, a linguist at Mary Immaculate College, the University of Limerick, Ireland. "We are never rife with money," despite that affliction's appeal.
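Observations like this come from looking at a word in its surrounding context across many texts. As a rough illustration, the base-R sketch below lists the words on either side of each occurrence of a keyword. The three sentences are invented for the example and are not drawn from any real corpus, and the function name `kwic` is hypothetical.

```r
# Tiny invented corpus, for illustration only
corpus <- c(
  "The report was rife with errors and omissions",
  "Rumors were rife in the office after the merger",
  "The garden was full of flowers"
)

# Return the words around each occurrence of `keyword` (case-insensitive)
kwic <- function(texts, keyword, window = 2) {
  hits <- list()
  for (text in texts) {
    # split into lower-case words, dropping punctuation and empty strings
    words <- strsplit(tolower(text), "[^a-z']+")[[1]]
    words <- words[words != ""]
    for (pos in which(words == tolower(keyword))) {
      left  <- tail(words[seq_len(pos - 1)], window)
      right <- head(words[-seq_len(pos)], window)
      hits[[length(hits) + 1]] <-
        paste(c(left, toupper(keyword), right), collapse = " ")
    }
  }
  unlist(hits)
}

result <- kwic(corpus, "rife")
print(result)
```

With only a handful of sentences, a listing like this says little; the point of a large corpus is that such patterns only become reliable when there are many occurrences to inspect.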

In spite of their utility, publicly available corpora are hard to come by and even harder to update.
The largest public collection may be the British National Corpus, which was assembled in the early 1990s. The BNC included the recorded conversations of 200 Britons. The intended American counterpart to the BNC --the American National Corpus -- is a collection of text that includes the 9/11 Commission Report and Berlitz travel guides. With only 22 million words, the ANC is small when compared to the BNC.

Corpora and associated taxonomies are extremely valuable components of a robust text mining/text analytics solution. We are fortunate to have these assets available to us in support of our text mining/analytics tasks.
June 3, 2010

I think this blog would be so much more literate, if I started it with a classic opening line, like “Call me Ishmael” or “riverrun, past Eve and Adams”. Those are the opening lines of Melville’s Moby-Dick and Joyce’s Finnegans Wake, respectively. In Finnegans Wake, Joyce meant for the book to be circular or cyclic in structure, so the sentence fragment at the beginning of the book is the end to the sentence fragment that ends the book: “A way a lone a last a loved a long the”.

I know. It’s complicated. Sort of like having someone explain hash objects or macro resolution with multiple ampersands to you the first time. Just when you think you’ve got it, a neuron in your brain gets distracted by some other electrical impulse and what you thought you understood becomes hazy and confusing. Or you just downright don’t get it the first time.

And now you’re thinking, Cynthia must have been an English major. Yep. Got my P.O.E.M. shirt ( http://www.prettygoodgoods.org/product/show/31755 ). And then, that thought was probably followed by the question, “What does any of that literature stuff have to do with SAS?”

OK, so that question ends this prolog (or hook). Here comes the blog. Now I’m going to tell you what that literature stuff has to do with SAS. But those of you who know me also know that I can explain things in a LOT of detail. So it’s going to take more than one blog post to get through the whole explanation.

But I’ll start with my ending point, (borrowing a technique from Joyce) to give you a hint of where I’m going. The first piece of practical information or practical advice that I want to write about is that learning how to rely on yourself and your own skills will help you become a better SAS user.

Continue reading "Literature and SAS: Or, How studying Melville has made me a better SAS user"