Latent search engines and question-answering (QA) engines fundamentally depend on our intuitive notion of semantics and semantic distance. However, such a semantic distance is likely undefinable, certainly uncomputable, and often blindly approximated. Can we develop a theoretical framework for this area?
This tutorial will describe a theory that uses the well-defined information distance to approximate the elusive semantic distance. We prove mathematically that this approximation is "better than" any computable approximation of the intuitive concept of semantic distance. Although information distance is itself not computable, it does allow a natural approximation by compression, especially with the availability of big data. We will then describe a natural language encoding system that implements the theory, followed by experiments on a QA system.
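To make the idea of approximation by compression concrete, here is a minimal sketch of the normalized compression distance (NCD), a standard compression-based stand-in for normalized information distance. It replaces the uncomputable Kolmogorov complexity with the output length of a real compressor. The choice of zlib and the names `compressed_size` and `ncd` are assumptions made for illustration; this is not the natural language encoding system described in the tutorial.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a computable proxy for complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between x and y.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Lower values suggest more shared information.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Illustrative inputs only; real use would compare much larger texts.
    question = b"What is the capital city of Canada? " * 20
    related = b"Ottawa is the capital city of Canada. " * 20
    unrelated = bytes(range(256)) * 4
    print("related:  ", ncd(question, related))
    print("unrelated:", ncd(question, unrelated))
```

A general-purpose compressor such as zlib captures only surface regularities like repeated substrings, so the values it gives are a crude approximation; better compressors and larger corpora generally bring the estimate closer to the underlying information distance.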
Ming Li is a Canada Research Chair in Bioinformatics and a University Professor at the University of Waterloo. He is a fellow of the Royal Society of Canada, ACM, and IEEE. He is a recipient of the E.W.R. Steacie Fellowship Award in 1996, the 2001 Killam Fellowship, and the 2010 Killam Prize. Together with Paul Vitanyi, he co-authored the book "An Introduction to Kolmogorov Complexity and Its Applications".