My formal training is in computer science, but my research group's interests
extend well beyond it,
and I can advise on many problems in multilingual NLP, from theoretical
to empirical. I prefer to do basic curiosity-driven research in areas where the general
principles are not yet fully understood, in order to improve the foundations
of the field. The best work in this area is interdisciplinary, and to do
interdisciplinary research, you first need a discipline. So I look for collaborators
who are exceptionally skilled in at least one of these areas: mathematics, statistics,
computer science, or linguistics. You needn't be an expert in all of them (few
people are), but you must be willing to engage with all of them,
and with a diverse group of other people whose expertise differs from yours.
If you aren't, my research group is not a good fit for you.
Prospective PhD students
In general, I expect to take one or two students a year, and I'm happy to
discuss topics. Right now I'm especially interested in these areas, which
are all quite open-ended:
- Multilingual semantics and graph formalisms
Natural language processing has been enormously successful, but NLP
systems still often fail to preserve the semantics of sentences: the
"who did what to whom" relationships that they express. As a result, they
fail to correctly understand, translate, extract, or generate meaning
in many languages. To preserve semantics,
they must model semantics, and in support of this goal, efforts are
underway to annotate large numbers of sentences with explicit semantic
representations in the form of directed acyclic graphs (DAGs). This data
will fuel the development of large-scale statistical models of language,
models that must handle structure, since their inputs and outputs are
strings, trees, or graphs.
For structured prediction on strings and trees, most models can be
understood as weighted grammars, automata, and transducers: extensions of
classical formalisms with real-valued weights representing probabilities.
To extend these models to graph-based representations, we require grammars,
automata, and transducers for graphs. Several expressive graph formalisms
have been studied in both computational linguistics and formal language
theory, but in general they are much less well-developed than
formalisms on strings and trees. The goal of projects in this area is to
develop the mathematics of these formalisms and apply them to problems in
natural language understanding and generation in many languages.
- Low-resource language and speech processing
The most effective language and speech processing systems are based on
statistical models learned from many annotated examples, a classic
application of machine learning on input/output pairs. But for many
languages and domains we have little data. Even where we do have
data, it is often limited to government or news text. For the vast majority of languages and
domains, there is hardly anything. However, in many cases, there is side
information that we can exploit: dictionaries or other knowledge sources,
or multimodal data such as text paired with images, speech, or timestamps.
How can we exploit such heterogeneous information in statistical language
processing? How can we exploit linguistic knowledge in our statistical
models? If we have few or no output examples, can we treat the
development of NLP systems as a decipherment problem? The goal of projects
in this area is to develop statistical models and inference techniques that
exploit such data, and apply them to real problems.
If you're interested in natural language but you're not excited about
these topics, you may want to contact
other prospective supervisors
in ILCC after reading about potential dissertation topics that
they have proposed.
If your primary interest is machine translation, I recommend that
you contact Kenneth Heafield.
If you aren't genuinely interested in natural language, my research group is not a
good fit for you, and you should contact
supervisors elsewhere in the School.
If this all sounds interesting to you,
please apply for a PhD. But don't apply by emailing me:
faculty in Edinburgh do not admit students
directly. To be considered, you must apply for postgraduate study in the
School of Informatics. I can advise students in three different programs:
- PhD students in the Institute for Language, Cognition, and Computation. This is a three-year research-only program.
- MSc + PhD students in the Center for Doctoral Training in Data Science. This is a one-year MSc including coursework and research, followed by a three-year research-only PhD.
- MSc + PhD students in the Center for Doctoral Training in Pervasive Parallelism. This is a one-year MSc including coursework and research, followed by a three-year research-only PhD.
If you already have an MSc from a strong program, you may be a good candidate
for the ILCC PhD. Consider one of the latter two programs if you do not have
an MSc, or if there is a
technical area that you would like to learn in more depth before beginning
full-time research. You can apply to multiple programs concurrently, and you
can find more information about the application process on the School of Informatics website.
I will only review applications during the regular winter admissions cycle.
For a September start, you should apply by the second week of December in
the preceding year. That is not a
hard deadline, but after early December, chances of admission diminish rapidly as
funding is allocated to those who applied earlier. I'm happy to advise strong
students on a research statement covering any topic of mutual interest, but
I cannot advise you on your chances of admission, since that depends on
many variables that are outside my control and yours.
Current students in Edinburgh
If you're interested in working with me and you are a current student in
one of the CDTs; an MSc, MInf, or undergraduate honours student; or a
currently enrolled visiting student, then please get in touch!
Prospective interns and visitors
I do not currently have any openings for internships.
I may have internship openings in the future. If I do, they will be
listed here. If there's no listing, please do not email me about internships.
I am unlikely to host visiting scholars who I do not know personally.
I've had some really excellent technical correspondence with people I've
never met before on research topics of mutual interest, and in those cases I
have made exceptions. But it is very unlikely that I can accommodate a more
generic visiting request, especially if you require funding or a visa.
If you are a visitor who is already at the University, I am
happy to talk about research.
I do not currently have any openings for postdocs.