
data-driven
college admission predictions






James « MyChances.net

About: James

Website
http://www.mychances.net/

Posts by James:

    Scatterplots: Now showing predictions, acceptance status


    by James October 18th, 2009

    Tonight I added a couple of features to the scatterplots (scatter charts / scattergrams) that I introduced in my last post. The two new variables you can now plot are acceptance status and prediction.

    Let’s say you want to know how accurate our prediction engine is within one particular range of predictions. For example, you might want to see what really happens when we say someone has an 80-100% chance of admission. Set the axes to Prediction, add a small amount of jitter, and you can get a sense of how many miscategorizations we’ve made in that region. Take a look at Boston College to see an example.

    You can now graph our predicted probability of acceptance on scatterplots.


    Clearly we do a pretty good job at Boston College. For one, you can instantly see that the blue (accepted) applicants cluster on the right side with high predictions, while the red (rejected) cluster on the left. Additionally, look at the mean (average) lines. The mean prediction for accepted members was 83.3%, while that for rejected members was 35.6% – a very large difference.

    Another way to visualize the data is to set one axis to Accepted and the other to Prediction. This gives perfect separation between the blue (accepted) and red (rejected) points, and may make it easier to see how well our predictions separate the accepted from the rejected at any particular range of chances.
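    The two summaries discussed above, the mean prediction for each outcome and the observed acceptance rate within a band of predictions, are simple to compute. Here is a minimal sketch, assuming a hypothetical record format (a predicted chance between 0 and 1 plus a boolean outcome); the sample numbers are invented and are not Boston College data, and the function names are illustrative only.

```php
<?php
// Hypothetical applicant records: our predicted chance (0-1) and the actual outcome.
// These numbers are invented for illustration; they are not Boston College data.
$applicants = [
    ['prediction' => 0.92, 'accepted' => true],
    ['prediction' => 0.85, 'accepted' => true],
    ['prediction' => 0.81, 'accepted' => false],
    ['prediction' => 0.40, 'accepted' => false],
    ['prediction' => 0.30, 'accepted' => true],
    ['prediction' => 0.20, 'accepted' => false],
];

// Mean prediction for one group (accepted or rejected): the "mean lines" on the plot.
function meanPrediction(array $applicants, bool $accepted): float
{
    $values = [];
    foreach ($applicants as $a) {
        if ($a['accepted'] === $accepted) {
            $values[] = $a['prediction'];
        }
    }
    return count($values) > 0 ? array_sum($values) / count($values) : NAN;
}

// Observed acceptance rate among applicants whose prediction falls in [$lo, $hi].
function acceptanceRateInBand(array $applicants, float $lo, float $hi): float
{
    $n = 0;
    $admitted = 0;
    foreach ($applicants as $a) {
        if ($a['prediction'] >= $lo && $a['prediction'] <= $hi) {
            $n++;
            if ($a['accepted']) {
                $admitted++;
            }
        }
    }
    return $n > 0 ? $admitted / $n : NAN;
}

printf("mean prediction, accepted: %.3f\n", meanPrediction($applicants, true));
printf("mean prediction, rejected: %.3f\n", meanPrediction($applicants, false));
printf("acceptance rate, 80-100%% band: %.3f\n", acceptanceRateInBand($applicants, 0.8, 1.0));
```

    A large gap between the two means, together with a band acceptance rate close to the band's predictions, is what "doing a good job" looks like numerically.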

    These updates were requested by Christian Romero; thanks, Christian.


    New college admissions tool: Interactive flash scatterplots


    by James October 4th, 2009

    We have rolled out our interactive flash scatterplots (also known as scattergrams), available on every college page under the ‘My Analysis’ tab.

    These graphs display the accepted and rejected applicants scattered across a 2D canvas according to the variables that you choose. For example, you might look at Unweighted GPA & SAT, or Instate & Average AP Score. To get started with this new tool, see Cornell’s scatterplots.

    For any given SAT score, valedictorians appear more likely to get into Cornell than non-valedictorians.


    Because there are many, many overlaps, you can set a level of jitter, so each point floats near its true value. For example, if you look at Unweighted GPA and Valedictorian Status, everyone will clump on top of one another. (You either are a valedictorian, or you aren’t, so there are only 2 slots that you might possibly fit into – hence lots of clumping.) If you set a 20% jitter to Valedictorian Status, things will spread out nicely, so you can see what is really going on.
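    The exact jitter formula isn't described here, so the following is only a sketch under one assumption: that an N% jitter adds a uniform random offset spanning N% of the axis range, centered on the true value. The function name and parameters are hypothetical.

```php
<?php
// Jitter sketch: nudge each point by a random offset so that identical values
// (e.g. Valedictorian = 0 or 1) stop stacking on top of one another.
// Assumption: "20% jitter" is read as a uniform offset spanning 20% of the
// axis range, centered on the true value. This is not the tool's documented formula.
function jitter(float $value, float $fraction, float $axisMin, float $axisMax): float
{
    $width = $fraction * ($axisMax - $axisMin); // total spread, e.g. 0.2 * range
    $u = mt_rand() / mt_getrandmax();           // uniform in [0, 1]
    return $value + ($u - 0.5) * $width;        // offset in [-width/2, +width/2]
}

mt_srand(42); // fixed seed so the example is reproducible
$valedictorian = [1, 0, 1, 1, 0];
$jittered = array_map(fn($v) => jitter($v, 0.2, 0, 1), $valedictorian);
print_r($jittered);
```

    Each point stays within 0.1 of its true 0 or 1, so the two clumps spread out without ever overlapping.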

    With your feedback and criticism (please post it here or in the forums), we’ll work on improving the tool. Enjoy!


    College Rankings #2: Pitfalls of Various Preference-Based Ranking Methods


    by James August 30th, 2009

    In my previous post, I introduced the new college rankings system that we have implemented. In short, the system ranks colleges based on where their admitted students decide to attend. In this post, I will discuss some of the approaches that might be considered in creating a preference-based ranking. In the next post, I will discuss the preference ranking system that we have implemented.

    2009 MyChances.net College Rankings


    Yield isn’t enough

    The goal of a preference-based ranking system is to capture people’s true preferences and represent them faithfully. To discover people’s preferences, a reasonable place to start might be a college’s yield. Yield is calculated as follows:

    yield = (# of students attending) / (# of students accepted)

    So how can we use yield to compare two schools? Suppose we match the University of Georgia (#70 on our list) against Pomona (#50 on our list). In this matchup, Georgia’s 55% yield actually beats out Pomona’s 39% yield. More of Georgia’s admitted students end up attending—so Georgia appears to be preferable to Pomona. But there is a problem here: we have no direct evidence that students, given the opportunity to attend either school, would choose Georgia. We simply don’t know what the students who were admitted to both schools would do.

    In the abstract, there is another problem with this approach. Imagine that 100 students apply to both Georgia and Pomona. Suppose Pomona accepts 50 of them but Georgia accepts all 100. Now, suppose the 50 rejected from Pomona all decide to go to Georgia, giving it a 50% yield (50 of 100). Suppose, also, that 40 of Pomona’s accepted students also get into Harvard, Yale, or Princeton and all go off to those schools, while the remaining 10 enroll at Pomona. This leaves Pomona with a 20% yield (10 of 50). Going by yield, Georgia appears to be the preferred college by far, but in reality, every student admitted to both Pomona and Georgia who attended one of the two chose Pomona. Yield, in this situation, gives us exactly the wrong answer about which school is preferred over the other!
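    To make the arithmetic concrete, here is a minimal sketch that rebuilds the hypothetical 100-student scenario and computes both yield (using the formula above) and the head-to-head result among students admitted to both schools. The record format and function names are invented for illustration, and "HYP" is a placeholder standing in for Harvard, Yale, or Princeton.

```php
<?php
// Rebuild the hypothetical 100-student example.
// Each record lists where the student was admitted and where they enrolled.
$students = [];
for ($i = 0; $i < 50; $i++) { // rejected by Pomona, enroll at Georgia
    $students[] = ['admitted' => ['Georgia'], 'attended' => 'Georgia'];
}
for ($i = 0; $i < 40; $i++) { // admitted to both, but leave for HYP
    $students[] = ['admitted' => ['Georgia', 'Pomona', 'HYP'], 'attended' => 'HYP'];
}
for ($i = 0; $i < 10; $i++) { // admitted to both, enroll at Pomona
    $students[] = ['admitted' => ['Georgia', 'Pomona'], 'attended' => 'Pomona'];
}

// yield = (# of students attending) / (# of students accepted)
function yieldRate(array $students, string $college): float
{
    $accepted = 0;
    $attending = 0;
    foreach ($students as $s) {
        if (in_array($college, $s['admitted'], true)) {
            $accepted++;
            if ($s['attended'] === $college) {
                $attending++;
            }
        }
    }
    return $accepted > 0 ? $attending / $accepted : NAN;
}

// Head-to-head: among students admitted to both who enrolled at one of them,
// count how many chose each school.
function headToHead(array $students, string $a, string $b): array
{
    $wins = [$a => 0, $b => 0];
    foreach ($students as $s) {
        if (in_array($a, $s['admitted'], true) && in_array($b, $s['admitted'], true)
            && isset($wins[$s['attended']])) {
            $wins[$s['attended']]++;
        }
    }
    return $wins;
}

printf("Georgia yield: %.0f%%\n", 100 * yieldRate($students, 'Georgia')); // 50%
printf("Pomona yield:  %.0f%%\n", 100 * yieldRate($students, 'Pomona'));  // 20%
print_r(headToHead($students, 'Georgia', 'Pomona'));                     // Georgia 0, Pomona 10
```

    Yield ranks Georgia ahead, while the head-to-head count, which looks only at students who actually had the choice, ranks Pomona ahead 10 to 0.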

    Each student matters

    What can we learn from the failure of yield as a measure of preference? Summary statistics simply don’t tell us enough. We need to drill down to the level of individual students. Only then can we build up a picture of their collective preferences. How can we do this? One approach might simply be to ask them what their preferences are. For example, we could survey a bunch of students applying to college, and ask them to order all of the schools they are considering, from most to least favorite.

    This is better; if those 100 people in our previous example honestly represented their preferences, we would probably see that Pomona was preferred over Georgia. This is the intuitively correct result given our (fake) example. But even this approach isn’t perfect.

    Talk is cheap; opinions, cheaper

    One problem is that there is no cost associated with ranking a school #1 on your own personal list. Until you actually have to decide which college you are going to attend—and pay tuition to—for the next 4 years, your opinions have no teeth. Let’s say you rank UNC as your #1 school and Duke as your #8 (out of 8), because your Tar Heel family hates those Blue Devils. You apply, and get into both schools. Did I mention that you got a merit scholarship to Duke? All of a sudden, you find yourself attending your supposedly bottom-ranked school. You didn’t lie when you gave us your rankings, but you probably exaggerated how much you preferred UNC over Duke. Furthermore, you didn’t have all of the information that you used to make your decision—such as your merit scholarship—when you reported that Duke was your #8 school.

    In general, asking people for their preferences leads to these additional problems:

    • They may give feedback about colleges where their feedback is of questionable value. If someone with a 1.5 GPA says that they rank State U over Harvard, should that hurt Harvard, even though this person almost certainly would never be given the opportunity to attend?
    • They almost certainly give feedback that is based on imperfect information. At the moment when people decide which one of several admitting schools to attend, they have acquired as much information as they think they need to make this huge decision. Beforehand, and in particular before they have applied to and been admitted to colleges, their stated preferences may be much more labile.

    Understanding these flaws helps flesh out a framework for a powerful-yet-simple preference-based college rankings system: one where students simply report where they were admitted and where they decided to attend. In my next post, I’ll get into some of the details of how to take this information and construct a ranked preference list. I’ll even demonstrate how this approach addresses a common criticism of the currently popular college rankings: that there is no way to truly distinguish between schools closely ranked (e.g., #3 vis-a-vis #5).
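    Reports of this kind are easy to turn into head-to-head comparisons. As a minimal sketch, assuming a hypothetical report format (a list of admitting colleges plus the one the student chose), each report can be split into pairwise "matches" in which the chosen college beats every other college that admitted the student. The function name and the placeholder "College C" are illustrative only; this is not the ranking procedure itself.

```php
<?php
// Hypothetical report format: where one student was admitted and where they enrolled.
// Every other admitting college "loses" a match to the one the student chose.
function matchesFromReport(array $admitted, string $attended): array
{
    if (!in_array($attended, $admitted, true)) {
        throw new InvalidArgumentException("attended school must be among the admits");
    }
    $matches = [];
    foreach ($admitted as $college) {
        if ($college !== $attended) {
            $matches[] = ['winner' => $attended, 'loser' => $college];
        }
    }
    return $matches;
}

// The UNC/Duke student from the example above, plus a placeholder third admit.
$matches = matchesFromReport(['UNC', 'Duke', 'College C'], 'Duke');
foreach ($matches as $m) {
    echo "{$m['winner']} beats {$m['loser']}\n";
}
// Duke beats UNC
// Duke beats College C
```

    Because each match records an actual enrollment decision, the merit scholarship, the family loyalties, and everything else the student weighed are already baked in.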



    New College Rankings


    by James July 10th, 2009

    Presenting: our new college rankings.

    The college admissions landscape is littered with college rankings. In 1983, US News first ranked American universities. Since then, rankings have been a fixture of the college world: they are produced by various businesses (US News, Princeton Review, Forbes, Atlantic Monthly), and heeded by students and colleges alike. To gain advantage, some universities have been alleged to manipulate their own rankings. And, while some of the factors used in the rankings are justifiable (alumni giving rate), some seem to be arbitrary (peer assessment surveys asking other colleges about your college’s ‘faculty dedication to teaching’). Each year, the methodology changes slightly, producing a slightly different list. In the end, the factors that are used to come up with the rankings seem arbitrary; the occasional change in the weighting of each factor, capricious. There is a need for a new approach.

    Criteria for a ‘good’ college ranking system

    1. The system should be difficult to game; any ‘gaming’ of the system should actually benefit students. In contrast, consider the allegations that some schools tried to manipulate the US News rankings by encouraging more students to apply in order to decrease their acceptance rate.
    2. The factors measured should be relevant to students. In contrast, what Cornell’s dean thinks about the faculty dedication at the University of Texas may be irrelevant.
    3. The overall procedure for generating rankings should be stable from year to year. In other words, any change in the rankings between 2008 and 2009 should be explained by a substantive change in the underlying factors, not by an arbitrary change in how those factors are weighted.

    The MyChances College Rankings

    We have implemented the MyChances College Rankings based on revealed student preference. In this system, the college admissions process is treated like a chess tournament. The colleges play matches (which occur when 2 colleges admit the same student). In each match, there is a winner (the college that the student ends up attending) and a loser. The winner gains points; the loser forfeits them. When a high-ranked school beats a low-ranked school, the high-ranked school gains few points, and the low-ranked school loses few points. If a low-ranked school beats a high-ranked opponent, it gains more points than if it had beaten an equally matched opponent. After playing many games, the colleges that students prefer rise naturally to the top of the rankings.
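    The point-exchange rule described above is the classic chess Elo system. Here is a minimal sketch of a single match, assuming the textbook chess parameters (K = 32 and a 400-point scale); these values and the function names are illustrative assumptions, not the parameters behind our published rankings, which a later post will cover.

```php
<?php
// Chess-style Elo update for one "match" between two colleges that admitted
// the same student. Assumption: K = 32 and a 400-point scale (textbook chess
// defaults), not the site's actual settings.
const ELO_K = 32;

// Expected score of A against B: a probability-like value in (0, 1).
function expectedScore(float $ratingA, float $ratingB): float
{
    return 1 / (1 + 10 ** (($ratingB - $ratingA) / 400));
}

// Returns the new [winner, loser] ratings after a student picks the winner.
function eloUpdate(float $winner, float $loser, float $k = ELO_K): array
{
    $gain = $k * (1 - expectedScore($winner, $loser)); // winner's actual score is 1
    return [$winner + $gain, $loser - $gain];          // points move from loser to winner
}

// Evenly matched: the winner gains K/2 = 16 points.
[$w, $l] = eloUpdate(1500, 1500);
printf("even match: winner %+.1f, loser %+.1f\n", $w - 1500, $l - 1500);

// Favorite (1700) beats underdog (1300): a small transfer.
[$w, $l] = eloUpdate(1700, 1300);
printf("favorite:   winner %+.1f\n", $w - 1700);

// Underdog (1300) beats favorite (1700): a large transfer.
[$w, $l] = eloUpdate(1300, 1700);
printf("upset:      winner %+.1f\n", $w - 1300);
```

    The favorite's small gain is the same effect discussed under point #1 (gaming the system): wins over much weaker opponents are worth few points.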

    Does the method of revealed student preference meet the 3 criteria outlined above? I believe it does.

    Consider point #1 (gaming the system). Imagine that MIT wanted to beat out Harvard by trying hard to avoid admitting any students that they thought would be admitted to Harvard. They would succeed in a model based on acceptance rate and yield (since their yield would likely increase), but their actual student body would be less qualified. In the revealed preference model, however, they would be less successful. They would no longer compete head-to-head with Harvard, so they would ‘win’ more often. But they would be winning against weaker ‘opponents’, earning fewer points for each victory.

    For point #2 (relevance), the idea of revealed preference is that it aggregates the sum total of what matters to students, whatever those factors might be. It is likely that students behave rationally (by attending the school that they find most desirable). So long as other students share similar values, revealed preference rankings will work well in explaining, and even guiding, their decisions.

    For point #3 (stability), the tournament style system is simple and straightforward. It is responsive to changes in student preference over time. It does not rely on aggregations of various statistical factors, or college faculty survey results; nor does it depend upon arbitrary weighting of those factors.

    The details of the procedure that we use to generate the rankings, and our use of chess-style Elo points, will be explained in a later post. For an academic treatment of a similar college ranking system, I recommend the working paper, “A Revealed Preference Ranking of U.S. Colleges and Universities,” 2005, by Christopher Avery, Mark Glickman, Caroline Hoxby, and Andrew Metrick (free link).


    Secret preferences revealed: which colleges do students actually choose?


    by James May 12th, 2009

    Today we’re letting everyone in on a sneak preview of our latest tool: the college cross-admit preference tool. We think it’s a simple but powerful way to see which colleges are most favored by admitted students.

    Using it is simple: type in the names of two colleges that you want to compare (perhaps Florida and Florida State?). You’ll then see what fraction of site members prefer each school. Preference is determined by the relative fraction of members admitted to both schools who end up attending one or the other. For example, if 75% of the members admitted to both College A and College B who attend one of the two go to College B, we say they prefer College B over College A. When the results are statistically significant at the 95% level, you’ll see them lit up in bright colors.
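    The tool does not spell out which significance test it uses. As a sketch, assuming a two-sided normal-approximation test against "no preference" (a 50/50 split) at the 95% level, the calculation might look like this; the function name and inputs are hypothetical.

```php
<?php
// Cross-admit preference between two colleges, A and B.
// $toA / $toB: members admitted to both who enrolled at A / at B.
// Assumption: a two-sided normal-approximation test of "no preference"
// (p = 0.5); the tool's actual test is not documented.
function crossAdmitPreference(int $toA, int $toB): array
{
    $n = $toA + $toB;
    if ($n === 0) {
        return ['shareB' => NAN, 'z' => 0.0, 'significant' => false];
    }
    $shareB = $toB / $n;
    $z = ($shareB - 0.5) / sqrt(0.25 / $n); // standard error under p = 0.5
    return [
        'shareB' => $shareB,
        'z' => $z,
        'significant' => abs($z) > 1.96,    // 95% level, two-sided
    ];
}

foreach ([[25, 75], [4, 6]] as [$toA, $toB]) {
    $r = crossAdmitPreference($toA, $toB);
    printf("%d vs %d: %.0f%% choose B, z = %.2f, %s\n", $toA, $toB,
        100 * $r['shareB'], $r['z'], $r['significant'] ? 'significant' : 'not significant');
}
```

    A 75/25 split among 100 cross-admits clears the bar easily, while a 60/40 split among only 10 does not. For counts that small the normal approximation is rough, and an exact binomial test would be the more careful choice.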

    For the hardcore college admissions followers out there, this will remind you of this graphic from a 2006 NY Times article. One difference is that our list isn’t limited to 17 schools; as the data continues to become available, we’ll display this information for all 1700 schools that we track.

    Requests? Feedback? Suggestions? Let us know.


    Forking Daemons in PHP


    by James April 30th, 2009

    Note: this is from http://bipinb.com/making-php-program-as-daemon.htm. It has been intermittently offline, so I’m archiving it here for future reference.

    
    <?php
    include_once('createdb.php'); // defines the DB class used below
    // check for pending signals after every statement (PHP >= 7.1 can use pcntl_async_signals(true) instead)
    declare(ticks=1);
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("could not fork");
    } else if ($pid) {
        exit(); // we are the parent
    } else {
        // we are the child
    }
    // detach from the controlling terminal
    if (posix_setsid() == -1) {
        die("could not detach from terminal");
    }
    // record our PID so the daemon can be signalled later (writing to /var/run usually requires root)
    $posid = posix_getpid();
    $fp = fopen("/var/run/process.pid", "w");
    if ($fp === false) {
        die("could not open pid file");
    }
    fwrite($fp, $posid);
    fclose($fp);
    // set up signal handlers
    pcntl_signal(SIGTERM, "sig_handler");
    pcntl_signal(SIGHUP, "sig_handler");
    // loop forever performing tasks
    $dbobject = new DB();
    $dbobject->getCon();
    while (1) {
        // do something interesting here; here I have called a method defined in another file, "createdb.php"
        $dbobject->CopyCallFiles();
    }
    function sig_handler($signo)
    {
        switch ($signo) {
            case SIGTERM:
                // handle shutdown tasks
                exit;
            case SIGHUP:
                // handle restart tasks
                break;
            default:
                // handle all other signals
        }
    }
    ?>

    HostGator causes unannounced downtime


    by James March 17th, 2009

    We’ve been hosted at HostGator for a couple of years now, and had a good experience with them until last night.

    Last night, HostGator made as-yet-undisclosed, and unannounced, security changes to their servers. During this period, they put up ‘Under Maintenance’ signs across all hosted sites. For us, this lasted from about 3:00am-4:00am. Problematically, these signs were not ‘nocached’. Therefore, some visitors are still seeing these pages instead of the current content.

    Far more troubling is what occurred to the databases during that time. We started getting database errors around 1:30, and one table even crashed. At that point, we ran a repair command, which was successful. So far, so good. Then, from 3-4am, the ‘Under Maintenance’ signs were put up. Also not really a problem, since no database modifications could be made during that time.

    When those ‘Under Maintenance’ signs were turned off, the site was functional again and I assumed we were good to go. We allowed members to continue signing up and making changes to their profiles. We made forum posts. I even did a fair amount of college-name hygiene, replacing less-common college names with their more common nicknames (Virginia Polytechnic Institute and State University ==> Virginia Tech).

    This is where I become profoundly disappointed in HostGator: I awoke around noon (hey, I’m on spring break) to find that none of my changes had stuck. In fact, a whole bunch of forum posts that were made after the site came back online were deleted. Most problematically, college profile updates and new member accounts created in the few hours before and after the update were also deleted. From my perspective, there is no good excuse for this. Since HostGator was aggressive enough to replace our site’s content with “Under Maintenance” signs, all maintenance should have been completed while those signs were still up. The site became accessible, but then had its database rolled back to a version from approximately 5 hours prior. Whatever triggered them to do this, I do not know. What I do know is that they exhibited poor business practices last night.

    For all of you who modified your profile last night from about 11pm Eastern to 6am (and there were surprisingly many of you), we apologize that your changes were lost.


    On the Bank Bailout: AIG Bonuses


    by James March 16th, 2009

    A choice comment from a New York Times reader:

    If the government owns 80% of AIG, simply have the owners order management not to pay these bonuses.

    Anyone who feels damaged by not receiving a bonus would be free to sue and have the issue decided by a jury selected from the general population.


    White House now streaming State of the Union live over the Internet


    by James February 24th, 2009

    Up until tonight, if you wanted to watch a State of the Union (typically just called a ‘Presidential Address’ when given during the President’s first year in office), you had to do it via television. If you got your TV over the air, this meant selecting among the NBC/ABC/CBS affiliates in your area, with their respective commentators. If you got your TV over cable, this meant selecting among CNN/MSNBC/Fox News, each of which adds tons of spin (CNN mostly spins about its own importance, while MSNBC and Fox News color things with a biased, politicized brush).

    Tonight, for the first time, you can watch the President’s address directly from http://www.whitehouse.gov/. Now, the only spin will be that coming from the President’s mouth. Love him or hate him, won’t it be nice to praise (or spit fire at) him directly, without the filter of some overexcited, hyperanalytic commentator?


    Harvard’s Steven Pinker Off-key on Oath of Office


    by James January 22nd, 2009

    Essays should have consistent theses with relevant anecdotes, right?

    Steven Pinker wrote an article in the New York Times yesterday about Chief Justice Roberts’s flub of Obama’s Oath of Office. Essentially, Chief Justice Roberts misspoke, saying “solemnly swear that I will execute the office of president to the United States faithfully” instead of “solemnly swear that I will faithfully execute the office of president of the United States” as specified in the Constitution.

    Pinker, a noted Harvard professor and the chair of the American Heritage Dictionary usage panel, used this to lead into a discussion of split infinitives. His points are generally correct (or, rather, I agree with him). Distilled, his argument is that splitting infinitives has never been a grammatical problem in English, despite what some law journals may claim. While infinitives cannot be split in Latin (literally, you cannot split them, since they consist not of two words but of one), splitting them in English is easy, and allowed.

    Both of these topics are interesting. However, what is odd is that he connects the two based on Roberts’s transposition of faithfully; Pinker infers that this was because Roberts wanted to move the “adverb ‘faithfully’ away from the verb.” That’s all well and good, but the Constitution’s version doesn’t split any infinitives! The adverb ‘faithfully’ in the predicate “… will faithfully execute” does not split an infinitive, but instead separates the auxiliary verb ‘will’ from the main verb ‘execute’. So he wanted to avoid a split auxiliary verb? OK, but if the split auxiliary verb is contentious, why not discuss that, instead of the much better-known disagreements over split infinitives?

    In other words, Pinker wrote an interesting review of split infinitives that was only partly relevant to the matter at hand: a split auxiliary verb, which was hardly contentious.
