Computational Natural Language Processing and the Neuro-Cognition of Language Group
This is a multi-disciplinary group comprising researchers from:
Centre for Speech, Language and the Brain
Research Centre for English and Applied Linguistics
MRC Cognition and Brain Sciences Unit
University of Cambridge
Scientific understanding of the human language system is one of the main challenges facing cognitive science, and is of interest to fields as diverse as linguistics, psychology, anthropology, philosophy, biology and computer science. An adequate theory of this complex system would need to integrate scientific knowledge from several fields, many of which are still developing rapidly. One such field is cognitive neuroscience, which investigates language function in the human brain with the aim of developing neurobiologically and cognitively plausible accounts of the human capacity for dynamic comprehension and production of language.
A key input to this research is linguistic information about the core properties of language. This information is typically obtained from conventional resources (dictionaries and grammars), which provide useful generalisations about language but neither include statistical information about language use nor capture the considerable variation that linguistic items undergo across time and age, data type, and genre. Such information would be an invaluable resource for neuro-cognitive experiments, increasing the plausibility of neurobiological models of language, but it can only be obtained by analysing linguistic patterns and their frequencies in the specific human language data of experimental interest (e.g. patient data or a spoken language corpus). Manual analysis of such data is prohibitively expensive; automated language analysis using computational Natural Language Processing (NLP) is now a viable alternative.
Recent decades have seen a massive expansion in the application of statistical and machine learning methods to NLP. This work has made large-scale processing of human language data possible and has yielded impressive results in speech and language processing tasks such as speech recognition, morphological analysis, parsing, and semantic interpretation. Although the same methods could be used to provide realistic, data-driven linguistic input to neuro-cognitive studies of language, there have been no systematic attempts to do this. The basic NLP technology is available, but it is inaccessible to researchers without considerable computing skills and requires further development for optimal integration with neuro-cognitive research.
In this new interdisciplinary project we will integrate research in cognitive neuroscience, experimental psycholinguistics and NLP in order to provide the infrastructure for more realistic models of language structure, which can serve as input to theoretically driven empirical studies of language in the mind and brain. We will conduct a series of neuro-cognitive experiments focusing on the processing of the core components of language at the levels of morphology, syntax and semantics, using linguistic input automatically extracted from relevant human language data. NLP techniques will be improved and extended as required to deal with a wider range of constructions, domains and text types. We will then design an easy-to-use tool that enables effective search, extraction and summarisation of the linguistic information in the annotated data, and its optimal integration with neuro-cognitive experiments.
We expect this project (i) to improve the quality of neuro-cognitive experiments by rooting them in a much more realistic linguistic analysis, (ii) to advance research in NLP by extending existing techniques to enable richer and deeper analysis, and (iii) to provide an important case study for the integration of NLP into critical experimental research in the cognitive sciences. The long-term goal of this investigation is improved scientific understanding of human language processing, which can benefit several disciplines and place researchers in a better position to develop more useful language models and NLP technology, as well as future treatments and rehabilitation for various language disorders.