- Prof. Kevin Leyton-Brown, Department of Computer Science, University of British Columbia
- 16 March 2017
Niels Bohrweg 1
2333 CA Leiden
Peer Grading: From Theory to Practice
This talk describes a research effort on the peer grading problem, running the gamut from theory to practice. First, I'll describe a theoretical investigation of different approaches to incentivizing students to invest the effort needed to perform thoughtful and honest peer evaluations.
One promising approach is to draw on the large literature on so-called "peer prediction". The mechanisms proposed in this literature can incentivize truth-telling in equilibrium, but they also give rise to equilibria in which agents do not pay the costs required to evaluate accurately, and hence fail to elicit useful information. We show that this problem is unavoidable whenever agents are able to coordinate using low-cost signals about the items being evaluated (e.g., superficially glancing at the text to be graded, focusing on grammar and formatting). We then consider ways of circumventing this problem by comparing agents' reports to ground truth, which is available in practice when there exist trusted evaluators (such as teaching assistants in the peer grading scenario) who can perform a limited number of unbiased (but noisy) evaluations. Of course, when such ground truth is available, a simpler approach is also possible: with some probability, reward each agent based on its agreement with ground truth, and otherwise reward the agent unconditionally. Surprisingly, we show that this simpler mechanism achieves stronger incentive guarantees, given less access to ground truth, than virtually all previously proposed peer-prediction mechanisms.
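A minimal Python sketch of the simpler spot-checking mechanism described above follows. It assumes grades are numbers, treats "agreement" as the report falling within a fixed `tolerance` of the trusted grade, and pays a fixed `max_reward`; these choices, and the names `spot_check_reward`, `check_prob`, and `ground_truth`, are illustrative assumptions rather than the exact mechanism analyzed in the talk.

```python
import random
from typing import Callable, Optional


def spot_check_reward(
    report: float,
    ground_truth: Callable[[], float],
    check_prob: float,
    max_reward: float = 1.0,
    tolerance: float = 0.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Reward one peer evaluation under a hypothetical spot-checking rule.

    With probability check_prob, a trusted evaluator (e.g. a TA) grades the
    same item and the agent is paid only if its report agrees with that
    grade (within tolerance). Otherwise the agent is paid unconditionally.
    """
    if rng is None:
        rng = random.Random()
    if rng.random() < check_prob:
        # Spot check: ask the trusted evaluator for a (noisy, unbiased) grade.
        # ground_truth is only called here, so TA effort is spent only on
        # spot-checked items.
        truth = ground_truth()
        return max_reward if abs(report - truth) <= tolerance else 0.0
    # No spot check: pay the full reward regardless of the report.
    return max_reward
```

Under this sketch the agent's expected payment is (1 − p)·R + p·R·Pr(agree), where p is `check_prob` and R is `max_reward`. Only the second term depends on the report, so the expected payment increases with the chance of matching the trusted grade. Passing `ground_truth` as a callable means the trusted evaluator is consulted only on spot-checked items, mirroring the limited access to ground truth.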
The second part of the talk will describe our experiences using this simpler mechanism in practice. No existing peer-review software made it possible to leverage TA grades, so we built our own, which we called "Mechanical TA" and published as open-source software. In this system, human TAs both evaluate the peer reviews of students who have not yet demonstrated reviewing proficiency and spot-check the reviews of students who have. Mechanical TA also features "calibration" reviews, which allow students to quickly gain experience with the peer-review process.
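The TA workflow described above can be sketched as a routing rule. In this hypothetical Python sketch, each student carries a `proficient` flag (how proficiency is established is not modeled here), reviews by students who are not yet proficient always go to a TA, and reviews by proficient students are spot-checked with probability `spot_check_prob`. The names `Student`, `needs_ta_review`, and `route_reviews` are illustrative assumptions, not Mechanical TA's actual code.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Student:
    name: str
    # Hypothetical flag: has this student demonstrated reviewing proficiency?
    proficient: bool = False


def needs_ta_review(student: Student, spot_check_prob: float, rng: random.Random) -> bool:
    """Decide whether a TA should evaluate this student's peer review."""
    if not student.proficient:
        # Students still learning to review: every review is evaluated by a TA.
        return True
    # Proficient reviewers: only a random fraction of reviews is spot-checked.
    return rng.random() < spot_check_prob


def route_reviews(
    students: List[Student], spot_check_prob: float, seed: Optional[int] = None
) -> List[str]:
    """Return the names of students whose reviews go to a TA this round."""
    rng = random.Random(seed)
    return [s.name for s in students if needs_ta_review(s, spot_check_prob, rng)]
```

For example, with `spot_check_prob=0.2`, a TA would see every review from students still learning to review but only about one in five reviews from proficient students, which is how spot-checking limits the number of TA evaluations needed.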
We used Mechanical TA for three years to run weekly essay assignments in a class of about 70 students, a course design that would have been impossible if every assignment had had to be graded by a TA. We present evidence that Mechanical TA helped to support student learning, which leads us to believe that it may also be useful to others.