Monday 07 February 2022

Possible Thesis Topics

Theoretical Linguistics

I am interested in supervising theses on topics in theoretical linguistics, especially those concerned with the syntax/semantics interface.

Requirements: students interested in these topics should have taken several linguistics courses, including my Advanced Methods course, or have otherwise acquired expertise in both generative syntax and formal semantics.

  • Nominalization: the differences between the various ways of deriving nouns from verbs (e.g. "destruction" vs. "destroying" vs. Italian "Il distruggere" vs. "distruggere (è uno spasso)"). Connections to: Psycholinguistics, Computational Analyses.
  • Kinds: how to derive the different ways in which languages refer to "kinds of things" and express generalizations about them (e.g. "Tigers eat meat", "The tiger eats meat", "A tiger eats meat"); how does anaphoric reference to kinds work?
  • Definiteness and demonstrativity: How do languages which do not have definite articles express definiteness? How are demonstratives used across languages? How definite are possessives?
  • Coordination: Is there a common meaning for the word "and" in "He was tall and fat / He left and she returned / The guests were in the kitchen and on the balcony"? This article gives a proposal, but later research showed that the system does not work with property coordination in downward monotonic environments. Can this problem be fixed?
  • Italian correlatives: Developing a parsimonious theory of the range of meanings found in the constructions using tanto ("Gianni dorme tanto", "Tanto, Gianni dorme", "Gianni dorme tanto da preoccuparmi", "Gianni dorme tanto per essere un vecchio", etc.) and quanto.

Linguistic Methodologies

These are topics that explore novel methods to address classic linguistic issues. Students willing to pursue these issues should have at least some background in linguistics (syntax or semantics) and a good background in computational linguistics (including, in some cases, artificial neural networks (ANNs) and distributional semantics).

  • Alternative ways to collect linguistic data via GWAPs (Games with a purpose). This includes work on the forthcoming Actors Challenge (AC) web site, where players challenge each other on their acting skills, and in so doing generate and validate data on the link between intonation and meaning. Possible theses could involve modifying AC to allow contexts in audio formats, or to ask the players to choose words to describe an intonation profile.
  • Using ANNs to simulate human linguistic judgments (part of the TREIL linguistic project). Can we improve current network architectures to make them more effective at this task? What is the best way to measure “ungrammaticality” in a network? Can ANNs recognize semantically deviant sentences (e.g. “All/*Some people came except Bill”, “I like trees more than oaks”; see this SemEval Task)?
  • Backward language models: A part of the TREIL project involved training forward and backward language models to study how they differ in a head-first language like English. An interesting thesis project would be to redo the experiment, retraining the same architecture on a head-final language (e.g. Turkish, Japanese), and compare the results.
  • Feeding ANNs a range of languages/language structures to converge on language universals (structures which are shared by all languages). What kind of tasks could best be used to study the “competence” developed by this "polyglot" network? Can we feed it artificial sentences with features that no language has and hope that they get detected?
  • Extracting new data from Wikipedia. Wikipedia keeps a trace of the edit history of each article. Is it possible to use this information to build a dataset for training systems to guess where a Wikipedia text is going to be edited next?
  • Structure folding: Current theoretical syntax often envisions universal complex sequences of functional projections, parts of which move to generate the output we see in actual languages in different ways depending on the feature structure present in each language. The task would be to create a model which explores the feature space by artificially creating all the ways in which a functional sequence can be folded, then comparing the output to known language structures.
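
As a rough illustration of the forward/backward comparison in the backward language models item above, here is a minimal sketch. It assumes a toy bigram model with add-one smoothing, not the neural architecture used in TREIL (which is not described on this page). The corpus, the function names and the smoothing choice are all illustrative assumptions.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over sentences padded with boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    vocab = {w for s in sentences for w in s} | {"<s>", "</s>"}
    return contexts, bigrams, vocab


def bigram_prob(model, prev, cur):
    """Add-one smoothed estimate of P(cur | prev)."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))


def perplexity(model, sentences):
    """Per-token perplexity of the model on the given sentences."""
    log_prob, n = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            log_prob += math.log(bigram_prob(model, prev, cur))
            n += 1
    return math.exp(-log_prob / n)


def reverse_corpus(sentences):
    """A 'backward' model is simply trained on reversed token sequences."""
    return [list(reversed(s)) for s in sentences]


# Hypothetical toy data; a real experiment would use a large corpus.
corpus = [s.split() for s in [
    "the dog chased the cat",
    "the cat saw the dog",
    "a dog saw a cat",
]]
held_out = [s.split() for s in ["the dog saw a cat"]]

forward_model = train_bigram(corpus)
backward_model = train_bigram(reverse_corpus(corpus))

forward_ppl = perplexity(forward_model, held_out)
backward_ppl = perplexity(backward_model, reverse_corpus(held_out))
print(f"forward perplexity:  {forward_ppl:.2f}")
print(f"backward perplexity: {backward_ppl:.2f}")
```

The same held-out sentences are scored in both directions, so any gap between the two perplexities reflects the direction of prediction rather than the data. Rerunning such a comparison on a head-final language would only require swapping the corpus.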


While I am not a psycholinguist, I am willing to cosupervise students interested in specific topics in sentence comprehension. In particular, I am interested in:

  • Follow-ups to the THINK experiment (extracting ERP correlates of the process of mentally repeating or translating sentences), which was carried out by a CIMEC master's student.
  • Using the semantic presupposition ill-formedness dataset developed for the PreTENS SemEval Task to look at the N400 in speakers exposed to this understudied type of ill-formedness.

Linguistic education

Education for the general public and for younger students is part of a university's "third mission". I am interested in developing new ways to teach language structures (which might or might not be a part of actually teaching "languages"). A couple of topics:

  • Testing and expanding Puzz-Ling, the physical language puzzle game. Puzz-Ling is a puzzle game to teach basic sentence structures across English, Italian and German. Work on this topic could mean (in order of growing complexity):
    • Designing and optimizing the game rules.
    • Testing Puzz-Ling and checking which aspects are not yet covered or when the game overgenerates.
    • Expanding it to other languages (e.g. Spanish, French).
    • Designing alternative implementations which solve current problems (e.g. selection to heads, rather than categories; island sensitivity).
    • Designing a “mixed” digital version (ask me).
  • Designing a physical model of syntax for case-rich, free word-order languages, like Latin and Greek. This could be based on a dependency grammar, rather than on a constituent-based grammar like Puzz-Ling.
  • Better ways to popularize distributional meanings. Distributional semantics can give us quantitative, vector-based similarity measures for words, collocations and constructions (see above), but it remains difficult to convey this information to the general public, beyond saying "A is more similar to B than to C". This would most likely be a cosupervision with someone knowledgeable about graphics and interfaces.
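
To make concrete what sits behind a statement like "A is more similar to B than to C", here is a minimal sketch of cosine similarity over word vectors. The vectors are invented toy co-occurrence counts, not data from any real corpus, and the names `cosine` and `rank_neighbours` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical counts of how often each word occurs near the
# context words "purr", "bark" and "engine" (invented numbers).
vectors = {
    "cat": [8.0, 1.0, 0.0],
    "dog": [1.0, 9.0, 0.0],
    "kitten": [6.0, 0.0, 0.0],
    "car": [0.0, 0.0, 7.0],
}


def rank_neighbours(word, vectors):
    """Return the other words sorted from most to least similar to `word`."""
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for other, score in rank_neighbours("cat", vectors):
    print(f"cat ~ {other}: {score:.2f}")
```

A ranked list of numbers like this is exactly the kind of output that is hard to communicate to a general audience, which is where work on graphics and interfaces would come in.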