GF chat bot
From Digital Grammars Wiki
This is a project idea. Contact Björn Bringert for more information.
Overall objective
The goal of this project is to create a system that can learn transformations between natural-language texts. Such a system could be used, for example, for:
- Chat bots, where the system learns a transformation from a question or statement to an answer.
- Simple machine translation, where the system learns to map a sentence in the source language to a sentence in the target language.
Implementation idea
The idea is to use Grammatical Framework grammars to translate between the texts and abstract syntax terms, and then use a machine learning algorithm to learn transformations on abstract syntax trees.
  Text in language A
          |
          |  [Parsing with GF grammar for language A]
          V
  Abstract syntax tree for language A
          |
          |  [Tree transformation, based on machine learning]
          V
  Abstract syntax tree for language B
          |
          |  [Linearization (generation) with GF grammar for language B]
          V
  Text in language B
In the machine translation example, language A is the source language and language B is the target language. In the chat bot example, languages A and B are normally the same language.
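As a rough illustration of the three stages, here is a minimal Python sketch. Everything in it is an assumption made for the example: `ToyGrammar` is a hypothetical stand-in for a GF grammar (a lookup table for parsing and format strings for linearization), not the GF API, and `toy_transform` is a hand-written placeholder for the learned tree transformation. Trees are written as nested tuples `(function, child, ...)`.

```python
class ToyGrammar:
    """Hypothetical stand-in for a GF grammar (not the GF API)."""

    def __init__(self, lexicon, templates):
        self.lexicon = lexicon      # normalized text -> abstract syntax tree
        self.templates = templates  # function name -> format string

    def parse(self, text):
        # Return the tree for a known sentence, or None if parsing fails.
        return self.lexicon.get(text.strip().lower())

    def linearize(self, tree):
        # Linearize the children first, then fill them into the template.
        fun, *args = tree
        return self.templates[fun].format(*(self.linearize(a) for a in args))


def respond(text, grammar_a, transform, grammar_b):
    # Text in A -> tree for A -> tree for B -> text in B.
    tree_a = grammar_a.parse(text)
    if tree_a is None:
        return None
    return grammar_b.linearize(transform(tree_a))


# Chat bot case: languages A and B are the same.
english = ToyGrammar(
    lexicon={"what is your name": ("AskName",)},
    templates={
        "AskName": "what is your name",
        "TellName": "my name is {0}",
        "Bot": "GF bot",
    },
)


def toy_transform(tree):
    # Hand-written placeholder for the learned tree transformation.
    if tree == ("AskName",):
        return ("TellName", ("Bot",))
    return tree


reply = respond("What is your name", english, toy_transform, english)
print(reply)  # my name is GF bot
```

For the machine translation case, `grammar_a` and `grammar_b` would be two different grammars, and the transformation would map trees of one to trees of the other.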
Rough work outline
- Find or develop, and then implement, a machine learning algorithm for tree transformations.
- Collect a corpus from some domain, for example Internet Relay Chat logs in the chat bot case.
- Write GF grammar(s) for the corpus data.
- Write a program that puts it all together.
- Train it on the data.
- Evaluate the system, for example with a simple Turing test.
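The choice of learning algorithm is left open in the outline. One very simple baseline, shown here only as a sketch and not as the intended method, is to memorize rewrite rules from example tree pairs and generalize them by replacing copied arguments with variables. The tuple tree representation, the function names, and the example trees are all assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def abstract(src, tgt):
    """Turn one (source, target) tree pair into a rewrite rule.

    The rule key is the source root function and its arity. The template is
    the target tree in which every subtree equal to the i-th source argument
    is replaced by the integer i, acting as a variable.
    """
    fun, *args = src

    def generalize(t):
        for i, a in enumerate(args):
            if t == a:
                return i
        f, *kids = t
        return (f, *map(generalize, kids))

    return (fun, len(args)), generalize(tgt)


def train(pairs):
    # Count the templates seen for each key and keep the most frequent one.
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        key, template = abstract(src, tgt)
        counts[key][template] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def instantiate(template, args):
    # Replace each variable i by the i-th argument of the input tree.
    if isinstance(template, int):
        return args[template]
    f, *kids = template
    return (f, *(instantiate(k, args) for k in kids))


def transform(rules, tree):
    # Apply the learned rule for the root of the tree, or return None.
    fun, *args = tree
    template = rules.get((fun, len(args)))
    if template is None:
        return None
    return instantiate(template, args)


# Hypothetical training corpus of (question tree, answer tree) pairs.
corpus = [
    (("DoYouLike", ("Pizza",)), ("ILike", ("Pizza",))),
    (("DoYouLike", ("Tea",)), ("ILike", ("Tea",))),
]
rules = train(corpus)
print(rules)                                         # {('DoYouLike', 1): ('ILike', 0)}
print(transform(rules, ("DoYouLike", ("Jazz",))))    # ('ILike', ('Jazz',))
```

Because the rule is stored over abstract syntax trees rather than strings, it applies to an argument ("Jazz") that never occurred in the corpus, which is the main motivation for learning on trees instead of raw text.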
Possible extensions
- Learn the grammar from the corpus, to avoid having to write the grammar by hand.
- Make the system more robust to unrecognized input, for example by letting the learning algorithm look at the parse chart instead of a single tree, so that it can handle incomplete parses.
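One possible reading of the robustness extension, sketched under loud assumptions: suppose the parser can return the partial analyses it found, each as a hypothetical `(start, end, tree)` triple over the token span `[start, end)`. The system could then fall back to the widest fragment that the transformation step knows how to handle. The fragment format and the `can_handle` predicate are invented for this example and do not describe GF's actual chart structure.

```python
def best_fragment(fragments, can_handle):
    """Pick the tree covering the most tokens among the usable fragments.

    fragments:  list of (start, end, tree) triples for token spans [start, end)
    can_handle: predicate telling whether the transformation step has a rule
                for the given tree
    """
    usable = [f for f in fragments if can_handle(f[2])]
    if not usable:
        return None
    start, end, tree = max(usable, key=lambda f: f[1] - f[0])
    return tree


# Tokens: "uh so do you like tea lol" (no complete parse for the whole line).
fragments = [
    (5, 6, ("Tea",)),
    (2, 6, ("DoYouLike", ("Tea",))),
]


def can_handle(tree):
    # Hypothetical check: only question trees have learned rules.
    return tree[0] == "DoYouLike"


chosen = best_fragment(fragments, can_handle)
print(chosen)  # ('DoYouLike', ('Tea',))
```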
Required knowledge
This project would be suitable for two students, or one ambitious student with broad interests.
It would be useful for the students to be familiar with:
- Natural language grammars.
- Machine learning.
- Programming in Haskell, Java or possibly Prolog.

