Archive for 20 February 2008

Since Papineni et al. (2002) introduced the BLEU metric for machine translation evaluation, string-matching metrics have dominated the field. These metrics work well enough, but there are cases where they break down, and a growing body of research is revealing their biases. BLEU also does not correlate especially well with human judgments, so the quality [...]