Evaluation Problems from a Developer’s Point of View
Kirsten Falkedal
 

The talk will focus on the problems of evaluating the quality of translations produced by current-day general-purpose commercial MT systems. The perspective on these problems will be that of a developer/supplier of a specific type of system, but parallels and contrasts with the perspectives of other types of evaluators will be drawn systematically.

The questions underlying the discussion are at least as old as MT itself: which aspects of MT quality are relevant for whom to assess, are there any aspects that would be relevant for everybody to assess, how can they be assessed, and how should the results be expressed?

At a general level, these questions will be examined through a crude and tentative juxtaposition, with respect to evaluation methodology, of providers and users of MT systems.

At a more specific level, using commonly recommended criteria such as intelligibility, accuracy, post-editability and improvability as examples, the talk will discuss which factors must be considered, eliminated or neutralized when selecting evaluation material, testers and evaluation procedure.

In conclusion, it will be argued that as long as machine translations look the way they currently do, detailed evaluation of translation quality using traditional linguistic notions is an interesting exercise but most probably a wasted effort. Future work on evaluation methodologies should rather concentrate on finding efficient ways of evaluating real pragmatic usefulness and improvability.