C.M. Downey

Assistant Professor of Linguistics & Data Science

PhD, University of Washington, 2024

Office Location
507 Lattimore Hall

Office Hours: By appointment

Biography

Before joining the UR Department of Linguistics in 2024, I earned my PhD from the University of Washington in the Computational Linguistics track. My dissertation, "Adapting Pre-Trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing," was advised by Gina-Anne Levow and Shane Steinert-Threlkeld.

Research Overview

My research develops methods to improve the efficacy of Natural Language Processing (NLP) tools for under-resourced languages (those lacking the abundant data needed to train modern machine learning models). The most common approach to building such systems is to train huge neural networks on high-resource languages like English and Chinese, for which vast amounts of textual data (often hundreds of gigabytes) are available. These techniques are inapplicable to the majority of the world's languages, which lack text datasets of the requisite size. This methodological gap undermines the potentially vital role these systems can play in creating tools such as assisted completion and keyboard auto-correct features, automatic speech recognition, and machine translation services. Developing such tools helps ensure that minority and endangered languages can thrive in the digital era. To address this gap, I specialize in machine learning techniques that are applicable to under-resourced languages, with a strong emphasis on:

  • unsupervised/self-supervised learning, enabling training with raw text or much smaller amounts of specialized data than supervised paradigms
  • multilingual modeling, allowing language data to be pooled by training on more than one language at once
  • transfer learning, leveraging existing models trained in higher-resource languages for use with new, low-resource ones

My contributions to this agenda include projects on unsupervised morpheme segmentation, linguistically informed multilingual modeling, cross-lingual model transfer, and unsupervised machine translation.

Research Interests

  • Computational Linguistics
  • Natural Language Processing (NLP)
  • Low-resource NLP
  • Language Documentation/Revitalization
  • Morphology
  • North American Languages