
UR Under-Resourced NLP Lab
Headed by Professor C.M. Downey, the UR Under-Resourced Natural Language Processing lab (UR2NLP) develops methods to improve the efficacy of NLP tools for low-resource languages (those lacking the abundant data needed to train modern machine learning models). The most common approach to building machine learning systems is to train huge neural networks on high-resource languages like English and Chinese, for which vast amounts of textual data are available. Such techniques are inapplicable to the majority of the world's languages, which lack the requisite large text datasets. This methodological gap limits the potentially vital role machine learning could play for these languages in tools such as assisted completion and keyboard auto-correct features, automatic speech recognition, and machine translation services. The lab's work on developing such tools helps ensure that these languages can thrive in the digital era.