IFLE - International and Foreign Language Education
The Southeast Asian Languages Library
Program: Technological Innovation and Cooperation for Foreign Information Access
Award Number: P337A050018
Grant Period: 10/01/2005 - 09/30/2010
World Area: Southeast Asia
2005:  $108,000
2006:  $108,000
2007:  $108,000
2008:  $108,000
Total:  $432,000
Institution: University of Wisconsin-Madison
Project Director: Robert Bickner
Research and Sponsored Programs, 4th Floor, 750 University Avenue
Madison, Wisconsin 53706
Tel: 608-263-1755
Fax: 608-263-3735
Email: rbickner@wisc.edu
Southeast Asia is a global crossroads of singular geopolitical importance: an astonishing 30% of the world's trade goods pass through the Straits of Malacca at its center. Yet the United States has little capacity for information access or language reference in this important and often tumultuous region. The mainland countries are the least represented: data and dictionaries in the Indic-derived scripts of Burma, Laos, Thailand, and Cambodia and of the minority Mon, Karen, and Shan states, as well as in Vietnam's Roman-derived, Chinese-influenced Quốc Ngữ script, are largely unavailable in the United States, and there has been little software development beyond basic tools for text input and output.

The Southeast Asian Languages Library - SEAlang Library, for short - is a technically innovative plan to build core lexical resources for all Southeast Asian languages, starting with the difficult scripts used by the five mainland countries.

Broad support for the SEAlang Library reflects its importance to the Southeast Asian Studies community. In preparing this proposal, the University of Wisconsin-Madison Center for Southeast Asian Studies (CSEAS, host of the Southeast Asian Studies Summer Institute, SEASSI) and co-sponsor the Center for Research in Computational Linguistics (CRCL Inc., a US 501(c)(3) nonprofit) have made concrete plans for cooperation with the Center for Khmer Studies (CKS, Siem Reap), the Ecole française d'Extrême-Orient (EFEO), the Committee on Research Materials on Southeast Asia (CORMOSEA), the Coalition of Teachers of Southeast Asian Languages (COTSEAL), and NGO-based open source software projects in Burma, Laos, Thailand, Cambodia, and Vietnam.

The SEAlang Library will provide:

Dictionaries: we will prepare XML-metatagged digital bilingual dictionaries, based on the best available print reference works (often difficult to obtain from U.S. libraries), for the national languages Burmese, Lao, Thai, Khmer, and Vietnamese, and for the major ethnic minority languages Mon, Karen, and Shan. These will be supplemented by historical dictionaries where orthography has changed significantly, and extended, where possible, by lexicons of newly coined words. All SEAlang dictionaries will be accessible via approximate search software that finds entries by national orthography, transliteration, or phonetic transcription, and that can be used both interactively and as a program-accessible Web resource.
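As a rough illustration of what approximate search over more than one key could look like, the Python sketch below matches a query against both the native-script headword and a romanized key using Levenshtein edit distance. The sample entries, their RTGS-style romanizations, and the names `Entry`, `edit_distance`, and `approximate_lookup` are hypothetical and are not taken from SEAlang; a production system would more likely normalize or transliterate the query first and would add phonetic keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str      # native-script form (Thai)
    romanization: str  # hypothetical RTGS-style key
    gloss: str         # English gloss


# Toy entries for illustration only; not SEAlang data.
ENTRIES = [
    Entry("น้ำ", "nam", "water"),
    Entry("ไฟ", "fai", "fire"),
    Entry("บ้าน", "ban", "house"),
    Entry("แมว", "maeo", "cat"),
]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def approximate_lookup(query: str, max_distance: int = 1) -> list[Entry]:
    """Return entries whose headword or romanization lies within
    max_distance edits of the query, closest first."""
    hits = []
    for entry in ENTRIES:
        d = min(edit_distance(query, entry.headword),
                edit_distance(query, entry.romanization))
        if d <= max_distance:
            hits.append((d, entry))
    hits.sort(key=lambda hit: hit[0])
    return [entry for _, entry in hits]


if __name__ == "__main__":
    for q in ("maew", "บาน"):
        print(q, [e.gloss for e in approximate_lookup(q)])
```

For example, the romanized query `maew` finds แมว (keyed as `maeo`), and the Thai query บาน finds บ้าน even though it omits the tone mark, because dropping one combining code point costs a single edit.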

Text Corpora: we will build monolingual and aligned bitext corpora. Used to study collocation and usage and to support data-driven language learning, these corpora are necessary precursors to more advanced translation tools and to monolingual and cross-language information retrieval. We will provide substantial monolingual corpora (up to tens of millions of words) for each majority language, along with the largest feasible aligned two-language corpora (hundreds of thousands of words for Thai and Vietnamese, fewer for the others), drawn from both online resources and on-the-ground publishing contacts.
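One standard way a corpus supports the study of collocation is by scoring word pairs with pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below computes PMI for adjacent tokens in an already tokenized corpus; the toy Vietnamese text, the `min_count` cutoff, and the function name `bigram_pmi` are illustrative assumptions, not details of the SEAlang corpora.

```python
import math
from collections import Counter


def bigram_pmi(tokens: list[str], min_count: int = 2) -> dict[tuple[str, str], float]:
    """Score adjacent token pairs by pointwise mutual information (in bits).

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with P estimated from relative
    frequencies; pairs seen fewer than min_count times are skipped because
    PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(n_tokens - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[x] / n_tokens
        p_y = unigrams[y] / n_tokens
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy, pre-tokenized Vietnamese text (one token per syllable); illustrative only.
TOY_CORPUS = "sinh viên học tiếng Việt . sinh viên đọc sách . tôi học tiếng Anh ."

if __name__ == "__main__":
    for pair, score in sorted(bigram_pmi(TOY_CORPUS.split()).items()):
        print(pair, round(score, 2))
```

In this toy corpus only the syllable pairs sinh viên ("student") and học tiếng ("study [a] language") occur twice, and both score log2(128/15), about 3.09 bits. Because written Vietnamese separates syllables rather than words with spaces, high-PMI syllable pairs are also a common cue for multi-syllable words.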

Software: we will build information access tools for Southeast Asian scripts, including tools for segmentation and transliteration, conversion between font encodings, text harvesting and indexing, and statistical analysis. User applications, including the SEA-Search query builder, the SEA-Cat Library of Congress Romanization / cataloging utility, the SEA-Read reader's helper, and the SEA-See text-as-image utility for scripts (like Khmer) that are difficult to render in Unicode, will be linked to dictionaries, text corpora, and transliteration engines to help fulfill the promise of regional information access.
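Thai, Lao, Khmer, and Burmese are generally written without spaces between words, so segmentation is usually the first step before indexing or dictionary lookup. A minimal baseline, sketched below in Python, is greedy longest matching against a word list: at each position, take the longest lexicon word that fits, and fall back to a single character otherwise. The five-word lexicon and the function name `longest_match_segment` are hypothetical; the proposal does not say which segmentation method SEAlang uses.

```python
def longest_match_segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-matching word segmentation.

    At each position, take the longest lexicon word that matches; if none
    matches, emit a single code point as an unknown token and move on.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try candidates from longest to shortest.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                break
        else:
            length = 1  # no lexicon word starts here
        tokens.append(text[i:i + length])
        i += length
    return tokens


# Hypothetical five-word Thai lexicon: I, eat, rice, go, house/home.
LEXICON = {"ฉัน", "กิน", "ข้าว", "ไป", "บ้าน"}

if __name__ == "__main__":
    print(longest_match_segment("ฉันกินข้าว", LEXICON))  # "I eat rice"
    print(longest_match_segment("ฉันไปบ้าน", LEXICON))   # "I go home"
```

Greedy matching is fast but can choose the wrong split when a string admits more than one reading; maximal matching (preferring the split with the fewest words) or statistical models trained on segmented corpora are common refinements.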

The Southeast Asian Languages Library is a long-awaited addition to the national digital infrastructure being built with the support of a variety of U.S. Department of Education Title VI programs. It will enable: pedagogy, through new teaching, learning, and translation tools for less commonly taught languages; scholarly inquiry in linguistics, history, lexicography/etymology, and Southeast Asian area studies; scientific research in computational linguistics and cross-language information retrieval; and language reference, now all but unavailable to the 1.8 million Americans of mainland Southeast Asian heritage who can typically speak, but not read or consult reference materials in, their heritage languages.
Languages:
Karen, S'Gaw
Khmer (Cambodian)
Disciplines:
Computer/information science
Ethnic studies
Foreign languages and literature
Global/international relations and studies
Information management
Library science
Subjects:
Area Studies
Assessment and Testing
Distance Learning
Foreign Language Programs (Domestic)
Foreign Language Programs (Overseas)
Less Commonly Taught Languages (LCTL)
Overseas Opportunities
Self-Instructional Language Programs
Undergraduate Education