
924 - Abstract

 

2001

 

Publishing Centre


 

Łukasz Dębowski

Quantitative Considerations on Finding the Shortest Descriptions for Meaningful Symbolic Sequences

924

Abstract

The notes provide elements of a new quantitative theory of unsupervised learning from pragmatic language communication. It is argued that a suitable, paradox-free framework for quantitative inference should be based on minimum description length (MDL), interpreted as a simplified form of algorithmic complexity, rather than on classical frequentist probability. Furthermore, it is argued that the recently observed non-extensivity of entropy in meaningful symbolic sequences can arise if and only if the unsupervised acquisition of MDL theories for these sequences produces infinite theories, provided that this acquisition is also optimal. This result rigorously undermines the belief that a finite formal theory of natural language could be constructed by hand by any expert. On the other hand, unsupervised machine learning is identified as a feasible way, and the only right one, of implementing language competence in AI systems. From this perspective, a promising compression-based learning algorithm by de Marcken is discussed, together with its efficiency and its extensions. Important parallels with research in cognitive science and statistical physics are also pointed out. Thus, the notes may be of interest not only to computer scientists and linguists but also to other statistical and symbolic theorists.
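To make the MDL criterion concrete, the following is a minimal sketch, assuming a toy unigram lexicon model; it is not de Marcken's algorithm and not the author's formalism. It scores a candidate lexicon by a two-part code length, L(lexicon) + L(text | lexicon), where the text is segmented into lexicon entries by Viterbi search and entry probabilities are re-estimated from that segmentation (hard-EM style). The function names `viterbi_segment` and `description_length`, the uniform spelling cost used as a lexicon prior, the add-one smoothing, and the toy corpus are all hypothetical choices for illustration; the cost of transmitting the counts themselves is ignored, and there is no search that proposes or prunes lexicon entries.

```python
import math
from collections import Counter


def viterbi_segment(text, cost):
    """Split text into lexicon entries with minimal summed code length.

    cost maps each lexicon entry to its code length in bits (-log2 p).
    """
    max_len = max(len(w) for w in cost)
    best = [0.0] + [math.inf] * len(text)  # best[i]: min bits to code text[:i]
    back = [0] * (len(text) + 1)           # back[i]: start index of last entry
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - max_len), i):
            w = text[j:i]
            if w in cost and best[j] + cost[w] < best[i]:
                best[i] = best[j] + cost[w]
                back[i] = j
    tokens, i = [], len(text)
    while i > 0:  # follow back-pointers from the end of the text
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


def description_length(text, lexicon, alphabet, iterations=5):
    """Hypothetical two-part code length in bits: L(lexicon) + L(text | lexicon)."""
    # Single characters are always available, so every text stays codable.
    lexicon = set(lexicon) | set(alphabet)
    # Assumed lexicon prior: each multi-character entry is spelled out with a
    # uniform code over the alphabet plus one end-of-entry symbol.
    bits_per_char = math.log2(len(alphabet) + 1)
    lex_bits = sum((len(w) + 1) * bits_per_char for w in lexicon if len(w) > 1)
    # Start from uniform entry costs, then alternate segmentation and
    # re-estimation of add-one smoothed unigram probabilities.
    cost = {w: math.log2(len(lexicon)) for w in lexicon}
    for _ in range(max(1, iterations)):
        tokens = viterbi_segment(text, cost)
        counts = Counter(tokens)
        total = len(tokens) + len(lexicon)
        cost = {w: -math.log2((counts[w] + 1) / total) for w in lexicon}
    data_bits = sum(cost[w] for w in tokens)
    return lex_bits + data_bits, tokens


if __name__ == "__main__":
    corpus = "thecatsawthedog" * 20  # toy corpus, invented for illustration
    alphabet = sorted(set(corpus))
    chars_only, _ = description_length(corpus, [], alphabet)
    with_words, tokens = description_length(
        corpus, ["the", "cat", "saw", "dog"], alphabet)
    print(round(chars_only, 1), round(with_words, 1), tokens[:5])
```

On this toy corpus the lexicon containing the recurring chunks yields a much shorter total description than the character-only lexicon, even after paying for spelling the new entries out: in this simplified setting, "learning" a word amounts to a net reduction of the two-part code length.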


Keywords: unsupervised learning, natural language processing, communication theory, quantitative linguistics, nonextensive thermodynamics, measures of information, cognitive science, formal languages.

  webmaster@IPIPAN.Waw.PL Copyright by ICS PAS - 2003