
936 - Abstract

 

2001

 

Publishing Centre


Piotr Bański

The proposed encoding scheme for the IPI PAN corpus

936

Abstract

This report describes the encoding scheme adopted for creating a large annotated corpus of Polish, referred to here as the IPI PAN corpus. The corpus will contain at least 75 to 100 million words and will be annotated structurally and morphosyntactically following the recommendations of the Corpus Encoding Standard (CES) Guidelines. The report begins with an overview of existing approaches to corpus encoding, discusses the reasons for adopting the CES, and concludes with a description of the tasks to be carried out while constructing the IPI PAN corpus.


Keywords: annotation, Corpus Encoding Standard, language engineering, text corpus, XML.
