Corpus Studies in Word Prediction

By Trnka, Keith; McCoy, Kathleen F.; ASSETS 2007 - The Ninth International ACM SIGACCESS Conference on Computers and Accessibility, pp. 195-202
Publication Date: October 15-17, 2007

Outlines the development of a word-prediction system intended to increase the communication rate of people with disabilities who use Augmentative and Alternative Communication (AAC) devices. The system is built on a language model trained on a large corpus of data; the model predicts the next word of input from what the user has already typed. The system is evaluated by calculating theoretical keystroke savings and correcting poor predictions. Training and testing were done on a wide variety of corpora, including conversational speech transcriptions, emails from AAC users, and articles from the online magazine Slate. Three tests were used to investigate the effects of training data for each corpus: in-domain, training and testing on the same corpus; out-of-domain, training on the training sets of all corpora except the one used for testing; and mixed-domain, training on the training sets of all corpora and evaluating on the testing set of each corpus. Topic modeling, which looks for patterns of words that tend to occur together and automatically groups them into topics, was implemented on one corpus. The study found that training on a combination of in-domain and out-of-domain data is often more beneficial than training on either set alone, and that topic modeling is portable even when applied to very different text.
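The evaluation setup summarized above can be illustrated with a minimal sketch. It is not the authors' system: the bigram model with unigram fallback, the five-word prediction list, the simulated user who always selects the target word as soon as it is listed, and all function names are assumptions made for illustration. Keystroke savings is computed here under the common definition, the percentage of keystrokes avoided relative to typing every letter plus a trailing space.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams from tokenized sentences (assumed model)."""
    unigrams, bigrams = Counter(), {}
    for words in sentences:
        prev = "<s>"  # sentence-start marker
        for w in words:
            unigrams[w] += 1
            bigrams.setdefault(prev, Counter())[w] += 1
            prev = w
    return unigrams, bigrams

def predict(model, prev, prefix, n=5):
    """Up to n words matching the typed prefix: bigram matches, then unigrams."""
    unigrams, bigrams = model
    ranked = [w for w, _ in bigrams.get(prev, Counter()).most_common()
              if w.startswith(prefix)]
    for w, _ in unigrams.most_common():
        if w.startswith(prefix) and w not in ranked:
            ranked.append(w)
    return ranked[:n]

def keystroke_savings(model, words, n=5):
    """Percentage of keystrokes saved by an idealized user on one word sequence."""
    baseline = sum(len(w) + 1 for w in words)  # every letter plus a space
    keys, prev = 0, "<s>"
    for w in words:
        for typed in range(len(w) + 1):
            # One keystroke: select the listed word, type a letter, or type the space.
            keys += 1
            if w in predict(model, prev, w[:typed], n):
                break
        prev = w
    return 100.0 * (baseline - keys) / baseline

def training_sets(corpora, target):
    """Build the three training conditions for one target corpus.

    corpora maps a corpus name to a (train_sentences, test_sentences) pair.
    """
    in_domain = list(corpora[target][0])
    out_of_domain = [s for name, (train, _) in corpora.items()
                     if name != target for s in train]
    return {"in-domain": in_domain,
            "out-of-domain": out_of_domain,
            "mixed-domain": in_domain + out_of_domain}
```

Under these assumptions, each of the three conditions for a corpus would be scored by training a model on the corresponding entry of `training_sets(...)` and calling `keystroke_savings` on that corpus's test sentences.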
Published by: Association for Computing Machinery (Website: http://www.acm.org)

SIGACCESS (ACM Special Interest Group on Accessible Computing) (Website: http://www.sigaccess.org)
Link to text: http://www.cis.udel.edu/~mccoy/sig-nlp-fall07/Trnka-corpus_study.pdf
