My formal training is in computer science, but the interests of my research group are broad, and I can advise on many problems in multilingual NLP, from theoretical to empirical. I prefer to do basic curiosity-driven research in areas where the general principles are not yet fully understood, in order to improve the foundations of the field. The best work in this area is interdisciplinary, and to do interdisciplinary research, you first need a discipline. So I look for collaborators who are skilled in mathematics, statistics, computer science, or linguistics. You needn't be an expert in all of these fields—few people are—but you must be willing to engage with all of them, and with a diverse group of people whose expertise differs from yours. If you aren't, my research group is not a good fit for you.

Prospective PhD students

In general, I expect to take one or two students a year, and I'm happy to discuss topics. Right now I'm especially interested in these areas, which are all quite open-ended:
Multilingual semantics and graph formalisms
Natural language processing has been enormously successful, but NLP systems still often fail to preserve the semantics of sentences: the "who did what to whom" relationships that they express. As a result, they fail to correctly understand, translate, extract, or generate meaning in many languages. To preserve semantics, they must model semantics, and in support of this goal, efforts are underway to annotate large numbers of sentences with explicit semantic representations in the form of directed acyclic graphs (DAGs). This data will fuel the development of large-scale statistical models of language, models that must deal with structure: their inputs and outputs are strings, trees, or graphs. For structured prediction on strings and trees, most models can be understood as weighted grammars, automata, and transducers: extensions of classical formalisms with real-valued weights representing probabilities. To extend these models to graph-based representations, we require grammars, automata, and transducers for graphs. Several expressive graph formalisms have been studied in both computational linguistics and formal language theory, but in general they are far less developed than formalisms on strings and trees. The goal of projects in this area is to develop the mathematics of these formalisms and apply them to problems in natural language understanding and generation in many languages.
Low-resource language and speech processing
The most effective language and speech processing systems are based on statistical models learned from many annotated examples, a classic application of machine learning on input/output pairs. But for many languages and domains we have little data. Even in cases where we do have data, it is government or news text. For the vast majority of languages and domains, there is hardly anything. However, in many cases, there is side information that we can exploit: dictionaries or other knowledge sources, or multimodal data such as text paired with images, speech, or timestamps. How can we exploit such heterogeneous information in statistical language processing? How can we exploit linguistic knowledge in our statistical models? If we have few or no output examples, can we treat the development of NLP systems as a decipherment problem? The goal of projects in this area is to develop statistical models and inference techniques that exploit such data, and apply them to real problems.

If you're interested in natural language, but you're not excited about these topics, you may want to contact other prospective supervisors in ILCC after reading about potential dissertation topics that they have proposed. If your primary interest is machine translation, I recommend that you contact Kenneth Heafield. If you aren't genuinely interested in natural language, my research group is not a good fit for you, and you should contact supervisors elsewhere in the School of Informatics.

If this all sounds interesting to you, please apply for a PhD. But don't apply by emailing me: faculty in Edinburgh do not admit students directly. To be considered, you must apply for postgraduate study in the School of Informatics. I can advise students in two different programs:

  1. PhD students in the Institute for Language, Cognition, and Computation. This is a three-year research-only program.
  2. MSc + PhD students in the Center for Doctoral Training in Data Science. This is a one-year MSc including coursework and research, followed by a three-year research-only PhD.

If you already have an MSc from a strong program, you may be a good candidate for the ILCC PhD. Consider the MSc + PhD program if you do not have an MSc, or if there is a technical area that you would like to learn in more depth before beginning full-time research. You can apply to multiple programs concurrently, and you can find more information about the application process here.

I will only review applications during the regular winter admissions cycle. For a September start, you should apply by the second week of December in the preceding year. That is not a hard deadline, but after early December, chances of admission diminish rapidly as funding is allocated to those who applied earlier. I'm happy to advise strong students on a research statement covering any topic of mutual interest, but I cannot advise you on your chances of admission, since that depends on many variables that are outside my control and yours.

Current students in Edinburgh

If you're interested in working with me and you are a current student in one of the CDTs; an MSc, MInf, or undergraduate honours student; or a currently enrolled visiting student, then please get in touch!

Prospective interns and visitors

I do not currently have any openings for internships. If I have openings in the future, they will be listed here. If there's no listing, please do not send me an unsolicited application.

I am unlikely to host visiting scholars whom I do not know personally. I've had some really excellent technical correspondence with people I've never met before on research topics of mutual interest, and in those cases I have made exceptions. But it is very unlikely that I can accommodate a more generic visiting request, especially if you require funding or a visa. If you are a visitor who is already at the University, I am happy to talk about research.

Prospective postdocs

I do not currently have any openings for postdocs.