Lexicon Development for Speech and Language Processing

Book by Frank Van Eynde

This book offers a state-of-the-art survey of methods and techniques for structuring, acquiring and maintaining lexical resources for speech and language processing. The first chapter provides a broad survey of the field of computational lexicography, introducing most of the issues, terms and topics which are addressed in more detail in the rest of the book. The next two chapters focus on the structure and content of man-made lexicons, concentrating respectively on (morpho-)syntactic and (morpho-)phonological information. Both chapters adopt a declarative constraint-based methodology and pay ample attention to the various ways in which lexical generalizations can be formalized and exploited to enhance the consistency and reduce the redundancy of lexicons. A complementary perspective is offered in the next two chapters, which present techniques for automatically deriving lexical resources from text corpora. These chapters adopt an inductive data-oriented methodology and also cover methods for tokenization, lemmatization and shallow parsing. The next three chapters focus on speech applications, more specifically on the organization of speech databases and on the use of lexicons in speech synthesis and speech recognition. The last chapter takes a psycholinguistic perspective and addresses the relation between storage and computation in the mental lexicon. The relevance of these topics for speech and language processing is clear: NLP systems need large lexicons to achieve reasonable coverage, and constructing and maintaining large lexical resources is a complex and costly task, so it is crucial for those who design or build such systems to be aware of the latest developments in this fast-moving field. The intended audience for this book includes advanced students and professional scientists working in the areas of computational linguistics and language and speech technology.