January 6, 2011

A student in my multivariate class last month asked a question about prior probability specifications in discriminant function analysis:

*What if I don't know what the probabilities are in my population? Is it best to just use the default in PROC DISCRIM?*

First, a quick refresher of priors in discriminant analysis. Consider a problem of classifying 150 cases (let's say, irises) into three categories (let's say, variety). I have four different measurements taken from each of the flowers.

If I walk through the bog and pick another flower and measure its 4 characteristics, how well can I expect to perform in classifying it as the right variety? One way to derive a classification algorithm is to use linear discriminant analysis.

A linear discriminant function to predict group membership is based on the squared Mahalanobis distance from each observation to the centroid of the group, plus a function of the prior probability of membership in that group.

This generalized squared distance is then converted into a score of similarity to each group, and the case is classified into the group it is most similar to.
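For readers who like to see the pieces written out, here is one common way to express them. This is a sketch that assumes a pooled within-group covariance matrix (the linear case); the symbols are my own notation, not something from the student's question: $\bar{\mathbf{x}}_t$ is the centroid of group $t$, $\mathbf{S}_p$ is the pooled covariance matrix, and $q_t$ is the prior probability for group $t$.

$$
D_t^2(\mathbf{x}) = (\mathbf{x} - \bar{\mathbf{x}}_t)^\top \mathbf{S}_p^{-1} (\mathbf{x} - \bar{\mathbf{x}}_t) - 2 \ln q_t
$$

$$
p(t \mid \mathbf{x}) = \frac{\exp\left(-\tfrac{1}{2} D_t^2(\mathbf{x})\right)}{\sum_u \exp\left(-\tfrac{1}{2} D_u^2(\mathbf{x})\right)}
$$

The first term is the squared Mahalanobis distance, and the $-2 \ln q_t$ term is the "function of the prior." When every $q_t$ is the same, that term shifts all groups by the same amount and so has no effect on which group wins.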

The *prior probability* is the probability of an observation coming from a particular group in a simple random sample with replacement.

If the prior probabilities are the same for all three of the groups (also known as *equal priors*), then the function is only based on the squared Mahalanobis distance.

If the prior for group A is larger than for groups B and C, then the function makes it more likely that an observation will be classified as group A, all else being equal.
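To make the effect of an unequal prior concrete, here is a minimal Python sketch. It is not what PROC DISCRIM runs internally; it only mimics the distance-plus-prior idea, assuming two measurements, a pooled covariance equal to the identity matrix, and made-up centroids. The function names and all of the numbers are hypothetical.

```python
import math

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance (x - mean)' * cov_inv * (x - mean) for 2-D points."""
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    return (d0 * (cov_inv[0][0] * d0 + cov_inv[0][1] * d1)
            + d1 * (cov_inv[1][0] * d0 + cov_inv[1][1] * d1))

def posteriors(x, means, cov_inv, priors):
    """Posterior probability of each group from generalized squared distances."""
    # Generalized squared distance: Mahalanobis part minus 2 * ln(prior).
    d2 = {g: mahalanobis_sq(x, means[g], cov_inv) - 2.0 * math.log(priors[g])
          for g in means}
    weights = {g: math.exp(-0.5 * d) for g, d in d2.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Hypothetical centroids and an identity pooled covariance (made-up numbers).
means = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (0.0, 2.0)}
cov_inv = ((1.0, 0.0), (0.0, 1.0))
x = (1.1, 0.0)  # slightly closer to B's centroid than to A's

equal = posteriors(x, means, cov_inv, {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3})
tilted = posteriors(x, means, cov_inv, {"A": 0.5, "B": 0.25, "C": 0.25})

print(max(equal, key=equal.get))    # group chosen under equal priors
print(max(tilted, key=tilted.get))  # group chosen when A has the larger prior
```

With these made-up numbers the observation goes to B under equal priors and to A once A's prior is raised to 0.5. The measurements did not change; only the prior did.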

The default in PROC DISCRIM is equal priors. This default makes sense in the context of developing computational software: the function with equal priors is the simplest, and therefore the most computationally efficient.

**PRIORS equal;**

Alternatives are proportional priors (using priors that are the proportion of observations from each group in the same input data set) and user-specified priors (just what it sounds like: specify them yourself).

**PRIORS proportional;**

**PRIORS 'A' = .5 'B' = .25 'C' = .25;**
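Proportional priors are nothing more than each group's share of the input data. As a rough illustration (in Python rather than SAS, with a hypothetical helper name and made-up labels), the arithmetic looks like this:

```python
from collections import Counter

def proportional_priors(labels):
    """Return each group's share of the observations, used as its prior."""
    counts = Counter(labels)
    n = len(labels)
    return {group: count / n for group, count in counts.items()}

# Hypothetical training labels: 5 from A, 3 from B, 2 from C.
labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
priors = proportional_priors(labels)
print(priors)
```

One consequence worth noticing: when the input data set has the same number of cases in every group, as the classic iris data does with 50 flowers per variety, proportional priors and equal priors give the same answer.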

Of course this kind of problem is far more interesting when you consider something like people making choices, such as kids choosing an action figure of Tinkerbell, Rosetta, or Vidia. Those certainly don't have equal priors, and if your daughter's anything like mine, she doesn't want to be classified into the wrong group.

So back to the original question:

*What if I don't know what the probabilities are in my population? Is it best to just use the default in PROC DISCRIM?*

In this case, using the default would probably not be a great idea, as it would assign the dolls with equal probability, all else being equal.

So if not the default, then what should you use? This depends on what you're going to be scoring. Your priors should reflect the probabilities in the population that you will be scoring in the future. Some strategies for getting a decent estimate:

1. Go to historical data to see what the probabilities have been in the past.

2. If your input data set is a simple random sample, use proportional priors.

3. Take a simple random sample from the population and count up the number from each group. This can determine the priors.

4. Combine the probabilities you think are correct with the cost of different types of misclassification.

For example, suppose that, among 4-year-olds, the probabilities of wanting the Tinkerbell, Rosetta, and Vidia action figures are really 0.50, 0.35, and 0.15, respectively. After all, not many kids want to be the villain.

**PRIORS 'Tink' = .5 'Rosetta' = .35 'Vidia' = .15;**

What is the cost of giving a girl the Rosetta doll when she wanted Tinkerbell? What's the cost of giving a girl Vidia when she wanted Rosetta? And so on. A table is shown below (based on a very small sample of three interviews of 4-year-old girls):

Clearly the cost of an error is not the same for all errors. It is far worse to assign Vidia to a girl who doesn't want Vidia than for any other error to occur. Also notice the small detail that Vidia fans would prefer to get Rosetta over Tinkerbell. For birthday party favors, I'd massage those priors to err on the side of giving out Rosetta over Vidia.

**PRIORS 'Tink' = .5 'Rosetta' = .4 'Vidia' = .1;**

Of course, depending on your tolerance for crying, you might just give everyone Rosetta and be done with it. But then, really, isn't variety the spice of life?
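If you would rather put numbers on the crying, one way to combine probabilities with costs is to compute the expected cost of each possible assignment and pick the cheapest. This is a different route from nudging the priors by hand, and the sketch below is only an illustration in Python. The cost values are hypothetical, invented to match the pattern described above (Vidia for a non-Vidia fan is the worst error, and Vidia fans mind Rosetta less than Tinkerbell); they are not the interview results.

```python
def expected_costs(probs, cost):
    """Expected cost of handing out each doll, given P(wanted) and cost[wanted][given]."""
    dolls = list(cost)
    return {given: sum(probs[wanted] * cost[wanted][given] for wanted in dolls)
            for given in dolls}

# Probabilities of wanting each doll, taken from the example above.
probs = {"Tink": 0.50, "Rosetta": 0.35, "Vidia": 0.15}

# Hypothetical misclassification costs: cost[wanted][given], 0 for a correct match.
cost = {
    "Tink":    {"Tink": 0, "Rosetta": 2, "Vidia": 10},
    "Rosetta": {"Tink": 2, "Rosetta": 0, "Vidia": 10},
    "Vidia":   {"Tink": 4, "Rosetta": 1, "Vidia": 0},
}

expected = expected_costs(probs, cost)
best = min(expected, key=expected.get)
print(expected)
print(best)  # doll with the lowest expected cost when nothing else is known
```

With these invented costs and no other information about the child, Rosetta has the lowest expected cost, which is the "give everyone Rosetta" strategy in numeric form. In a real scoring run you would plug in each child's posterior probabilities from the discriminant function in place of the raw priors.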

I hope this has helped at least one or two of you out there who were having trouble with priors. The same concepts apply in logistic regression with offset variables, by the way. But that's a party favor for another day.