## Statistics 2MA3 - Exercise #3 - Hints

### Updated 2001-02-17

## Plotting ROC Curves

Think of how we plotted the curve by hand: we sorted all the
observations from both groups into one vector. Then, for each
observation in turn, from smallest to largest, we found the
proportion of the disease sample, and the proportion of the control
sample, less than or equal to that observation, and we plotted these
two proportions against each other.

    > plot.roc
    function (sd, sc)
    {
        sall <- sort(c(sd, sc))
        sens <- 0
        specc <- 0
        for (i in 1:length(sall)) {
            sens <- c(sens, mean(sd <= sall[i]))
            specc <- c(specc, mean(sc <= sall[i]))
        }
        plot(specc, sens, xlim = c(0, 1), ylim = c(0, 1), type = "l",
            xlab = "1-specificity", ylab = "sensitivity")
        abline(0, 1)
        invisible()
    }

Note that mean(sd <= sall[i]) computes the proportion of observations
in sd that satisfy the condition, which is just the sensitivity at
cut-off sall[i]. Note the use of sens <- c(sens, ...) to accumulate
successive values of sensitivity in the vector sens. The final
invisible() call makes the function return NULL invisibly, so its only
visible effect is the plot and nothing is printed at the console.
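
The two idioms mentioned above can be tried on their own before reading the full function. The following is a minimal sketch using a made-up disease sample (the data values and the name cutoff are assumptions for illustration only, not part of the exercise):

```r
# Hypothetical toy data, chosen only for illustration
sd <- c(1.2, 2.5, 3.1, 4.0)   # "disease" sample
cutoff <- 3.1

sd <= cutoff         # logical vector: TRUE where the condition holds
mean(sd <= cutoff)   # TRUE counts as 1, FALSE as 0, so this is a proportion

# Accumulate the sensitivity at each observed value, starting from 0
sens <- 0
for (a in sort(sd)) {
    sens <- c(sens, mean(sd <= a))
}
sens
```

Because each pass through the loop appends one value, sens ends up one element longer than the data, with the leading 0 giving the curve its starting point at the origin.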

We can improve on this code. The following version of plot.roc
works when there are missing values (NA) in the data and returns
the area under the ROC curve, computed by the trapezoidal rule.

    > plot.roc
    function (sd, sc)
    {
        # sort() drops NA values by default
        sall <- sort(c(sd, sc))
        sens <- 0
        specc <- 0
        for (i in 1:length(sall)) {
            # na.rm = TRUE ignores missing values within each sample
            sens <- c(sens, mean(sd <= sall[i], na.rm = TRUE))
            specc <- c(specc, mean(sc <= sall[i], na.rm = TRUE))
        }
        plot(specc, sens, xlim = c(0, 1), ylim = c(0, 1), type = "l",
            xlab = "1-specificity", ylab = "sensitivity")
        abline(0, 1)
        # trapezoidal rule: average height times width of each step
        npoints <- length(sens)
        sum(0.5 * (sens[-1] + sens[-npoints]) * (specc[-1] - specc[-npoints]))
    }
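
One way to convince yourself that the trapezoidal sum is doing the right thing is to compare it with a direct count over all disease/control pairs. The sketch below assumes a helper called roc.area (a hypothetical name, not part of the original hints) that repeats the area calculation without plotting, and uses made-up data containing one missing value and one tie between the groups:

```r
# roc.area is a hypothetical helper: same area calculation, no plot
roc.area <- function(sd, sc) {
    sall <- sort(c(sd, sc))   # sort() drops NA by default
    sens <- c(0, sapply(sall, function(a) mean(sd <= a, na.rm = TRUE)))
    specc <- c(0, sapply(sall, function(a) mean(sc <= a, na.rm = TRUE)))
    npoints <- length(sens)
    sum(0.5 * (sens[-1] + sens[-npoints]) * (specc[-1] - specc[-npoints]))
}

# Made-up data, for illustration only
sd <- c(1, 2, NA)   # disease sample with a missing value
sc <- c(2, 3)       # control sample; the value 2 is tied across groups
area <- roc.area(sd, sc)

# Direct check: proportion of (disease, control) pairs with the disease
# value smaller, counting ties between the groups as one half
d <- sd[!is.na(sd)]
ctl <- sc[!is.na(sc)]
pairwise <- mean(outer(d, ctl, "<")) + 0.5 * mean(outer(d, ctl, "=="))

area
pairwise
```

The two numbers should agree, which gives a useful interpretation of the area: it estimates the probability that a randomly chosen disease observation falls below a randomly chosen control observation.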

## Choosing a Cut-off Point

Note that within each group, the misclassification probability is
given by a normal tail area, either above (for the disease group) or
below (for the control group) the cut-off a. To find the total
probability of misclassification, multiply each conditional rate by
the probability that a subject belongs to the corresponding group
(disease or control), and add. To find the value of a that minimizes
this expression, differentiate with respect to a, set the derivative to
zero and solve for a. Remember that Φ() is
just the integral of the normal probability density function, so its
derivative is the normal density itself.
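
A numerical check can help confirm an answer obtained by calculus. The following sketch writes the total misclassification probability as an R function and minimizes it with optimize(). All parameter values and names (p.d, mu.d, mu.c, sigma, misclass) are hypothetical, chosen only so that the answer is easy to anticipate; substitute the values from the exercise.

```r
# All parameter values below are hypothetical, for illustration only
p.d   <- 0.5   # probability that a subject is in the disease group
mu.d  <- 0     # disease-group mean (disease values tend to be lower)
mu.c  <- 2     # control-group mean
sigma <- 1     # common standard deviation

# Total misclassification probability at cut-off a:
# disease subjects above a, plus control subjects below a
misclass <- function(a) {
    p.d * (1 - pnorm(a, mu.d, sigma)) + (1 - p.d) * pnorm(a, mu.c, sigma)
}

best <- optimize(misclass, interval = c(mu.d, mu.c))
best$minimum     # numerically best cut-off
best$objective   # misclassification probability at that cut-off

# Derivative of misclass at the minimizer; should be close to zero
deriv.at.best <- -p.d * dnorm(best$minimum, mu.d, sigma) +
    (1 - p.d) * dnorm(best$minimum, mu.c, sigma)
deriv.at.best
```

With equal group probabilities and equal standard deviations, as assumed here, the minimum should fall near the midpoint of the two means. With unequal standard deviations the derivative can vanish at more than one point, so it is worth plotting misclass over a range of a before trusting a single root.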

Last modified 2001-02-17 15:39