Evaluations of IE systems have been going on since the first Message
Understanding Conference was conducted by Beth Sundheim for the Navy
in 1987 (Chinchor 2000). Beginning in 1990, the MUC-3 conference was sponsored
by DARPA. A description of
the MUC-3 evaluation is available in
Chinchor et al. (1993). More recently,
the 1998 Hub-4 Broadcast News and Hub-5 Large Vocabulary Conversational Speech
Recognition evaluations of Speech Transcription have included subtasks
involving IE
(http://www.nist.gov/speech/tests/index.htm).
In the Hub-4 Broadcast News
evaluation, the IE task used was called Named Entity (NE). The purpose of
the NE task "is to identify named expressions in broadcast news transcriptions"
(Robinson et al. 1999). These named
expressions included most proper names (such
as names of persons, locations, and organizations) as well as certain numerical
expressions (including monetary amounts, time expressions, and percents)
(Chinchor et al. 1998).
The transcriptions were made
for the most part by automatic speech recognition (ASR) systems, and often
included various types of errors.
The inclusion of ASR errors
made the task somewhat more difficult than it had been during the MUC series of
evaluations, since those evaluations used text as the source material. However,
systems in the Broadcast News evaluation performed remarkably well. The best system
in the 1999 Broadcast News evaluation received an F-score of 91, while the
best system in the MUC-7 evaluation received an F-score of 93 (Robinson et al.
1999).
Robinson et al. (1999) also
includes brief examples of "noisy speech transcriptions" followed by the same
noisy transcriptions with Named Entity tags included.
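Schematically, NE annotation of this kind wraps spans of the transcript in SGML-style tags such as ENAMEX (names, with a TYPE attribute), TIMEX (time expressions), and NUMEX (numerical expressions). A minimal sketch, using an invented sentence rather than actual evaluation data:

```python
import re

# Invented example transcript (lowercase, as ASR output typically was);
# not taken from the actual Hub-4 evaluation data.
plain = "president clinton met executives of i b m in new york on tuesday"

# MUC/Hub-4 style SGML annotation: ENAMEX for names, TIMEX for time
# expressions (NUMEX, not shown, covers monetary amounts and percents).
annotated = (
    'president <ENAMEX TYPE="PERSON">clinton</ENAMEX> met executives of '
    '<ENAMEX TYPE="ORGANIZATION">i b m</ENAMEX> in '
    '<ENAMEX TYPE="LOCATION">new york</ENAMEX> on '
    '<TIMEX TYPE="DATE">tuesday</TIMEX>'
)

# Stripping the tags recovers the underlying transcript unchanged,
# which is what lets system output be scored against the reference.
stripped = re.sub(r'</?(?:ENAMEX|TIMEX|NUMEX)[^>]*>', '', annotated)
assert stripped == plain
```

Note that the annotation is purely additive: removing the tags yields the original transcription, including any ASR errors it contains.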
In 1999, at the Broadcast News
evaluation conference, a new task was proposed to replace the
difficult Scenario Template (ST) task, which was last performed in the MUC-7 evaluation.
The best score achieved on the ST task was 51% accuracy (Chinchor 2000). In the
proposed replacement task, dubbed the "Event99" task, the annotators were
able to achieve over 80% accuracy, indicating that the task was likely defined
clearly enough to be usable in a formal evaluation
(Hirschman et al. 1999).
Unfortunately, the Broadcast News evaluation moved in a different direction
for 2000, and the Event99 task has not yet been formally evaluated.
This page last modified November 13, 2006 by Erica Brown.
http://www.oocities.org/ejb_wd/IE-intro.html
© 2000-2006, Erica Jean Lindsey Brown, All rights reserved