Tonight I added a couple of features to the scatterplots (scatter charts / scattergrams) that I introduced in my last post. You can now plot two new variables: acceptance status and prediction.
Let’s say you want to know how accurate our prediction engine is within one particular range of predictions. For example, you want to see what really happens when we claim someone has an 80-100% chance of acceptance. Set the axes to Prediction, add a small amount of jitter, and you can get a sense of how many miscategorizations we’ve made in that region. Take a look at Boston College to see an example.
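If you would rather check a range numerically than by eye, the same idea can be sketched in a few lines of Python. This is only an illustration: the sample rows and the function name `acceptance_rate_in_band` are made up for this example and are not the site's code or real applicant data. It counts how many applicants inside a prediction band were actually accepted.

```python
from typing import List, Optional, Tuple

# Hypothetical (prediction, accepted) pairs; illustrative only, not real data.
SAMPLE: List[Tuple[float, bool]] = [
    (0.95, True),
    (0.88, True),
    (0.82, False),  # a miscategorization: high prediction, but rejected
    (0.91, True),
    (0.40, False),
    (0.30, True),
]


def acceptance_rate_in_band(
    rows: List[Tuple[float, bool]], low: float, high: float
) -> Optional[float]:
    """Return the fraction of applicants with low <= prediction <= high
    who were accepted, or None if nobody falls in that band."""
    in_band = [accepted for prediction, accepted in rows if low <= prediction <= high]
    if not in_band:
        return None
    return sum(in_band) / len(in_band)


# Three of the four applicants predicted at 80-100% were accepted.
rate = acceptance_rate_in_band(SAMPLE, 0.80, 1.00)
print(rate)
```

If the predictions are well calibrated, the rate for the 80-100% band should itself land somewhere between 0.80 and 1.00.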
Clearly we do a pretty good job at Boston College. For one, you can instantly see that the blue (accepted) applicants cluster on the right side with high predictions, while the red (rejected) applicants cluster on the left. Additionally, look at the mean (average) lines. The mean prediction for accepted members was 83.3%, while that for rejected members was 35.6%, a gap of 47.7 percentage points.
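For anyone curious what the mean lines represent, here is a rough Python sketch of the calculation: split the applicants by outcome and average the predictions in each group. The rows below are invented for illustration and `mean_prediction_by_outcome` is a hypothetical helper, not the code behind the charts.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical (prediction, accepted) pairs; illustrative only, not real data.
ROWS: List[Tuple[float, bool]] = [
    (0.90, True),
    (0.70, True),
    (0.20, False),
    (0.40, False),
]


def mean_prediction_by_outcome(
    rows: List[Tuple[float, bool]],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (mean prediction of accepted, mean prediction of rejected).
    A group with no applicants yields None instead of raising an error."""
    accepted = [prediction for prediction, was_accepted in rows if was_accepted]
    rejected = [prediction for prediction, was_accepted in rows if not was_accepted]
    return (
        mean(accepted) if accepted else None,
        mean(rejected) if rejected else None,
    )


accepted_mean, rejected_mean = mean_prediction_by_outcome(ROWS)
gap = accepted_mean - rejected_mean
print(accepted_mean, rejected_mean, gap)
```

The wider the gap between the two means, the better the predictions separate the two groups on average, though it says nothing about any one range of predictions.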
Another way to visualize the data is to set one axis to Accepted and the other to Prediction. This gives you perfect separation between the blue (accepted) and red (rejected) points along the Accepted axis, which can make it easier to see how well our predictions separate the accepted from the rejected at any particular range of chances.
These updates were requested by Christian Romero; thanks, Christian.
