Statistical models of language and Zipf’s law

BOBICEV, Victoria; POPESCU, Anatol; ZIDRAŞCO, Tatiana

dc.contributor.author	BOBICEV, Victoria
dc.contributor.author	POPESCU, Anatol
dc.contributor.author	ZIDRAŞCO, Tatiana
dc.date.accessioned	2019-11-12T10:23:03Z
dc.date.available	2019-11-12T10:23:03Z
dc.date.issued	2005
dc.identifier.citation	BOBICEV, Victoria, POPESCU, Anatol, ZIDRAŞCO, Tatiana. Statistical models of language and Zipf’s law. In: Microelectronics and Computer Science: proc. of the 4th intern. conf., September 15-17, 2005. Chişinău, 2005, vol. 2, pp. 133-136. ISBN 9975-66-038-X.	en_US
dc.identifier.isbn	9975-66-038-X
dc.identifier.uri	http://repository.utm.md/handle/5014/6693
dc.description.abstract	Statistical models based on the words of a text have become very widespread in recent years. Estimating the words never encountered in a corpus is one of the subtasks of word probability estimation. Attempts to find the number of never-encountered words using Zipf’s formula give rather large values. In several experiments we observed that the number of words never encountered in the corpus is proportional to the number of words encountered only once and depends on the text vocabulary. If subsequent texts are of the same type as the corpus, the estimate of never-encountered words is fairly adequate. But if subsequent texts differ from the corpus, the number of never-encountered words can either increase or decrease considerably.	en_US
dc.language.iso	en	en_US
dc.publisher	Technical University of Moldova	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Zipf law	en_US
dc.subject	statistical language modelling	en_US
dc.subject	statistical models	en_US
dc.subject	zero frequency	en_US
dc.title	Statistical models of language and Zipf’s law	en_US
dc.type	Article	en_US
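
The abstract ties the words never encountered in a corpus to the words encountered exactly once. As a rough illustration only, the sketch below counts those once-only words and computes the standard Good-Turing estimate of the probability mass of unseen words (once-only words divided by total tokens). This is a textbook technique swapped in for illustration, not the authors' own formula, and the proportionality constant mentioned in the abstract is not reproduced here. The function name `hapax_statistics` and the simple letter-based tokenizer are assumptions made for this example.

```python
import re
from collections import Counter


def hapax_statistics(text):
    """Count tokens, vocabulary size, and words seen exactly once.

    Hypothetical helper for illustration; the tokenizer is a simplification
    that keeps runs of letters and splits on everything else.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)

    n_tokens = len(tokens)
    vocabulary = len(counts)
    # Words met only once in the corpus (hapax legomena).
    hapaxes = sum(1 for c in counts.values() if c == 1)

    # Good-Turing estimate of the total probability of unseen words.
    # Assumption: used here as a stand-in, not the paper's estimator.
    unseen_mass = hapaxes / n_tokens if n_tokens else 0.0

    return {
        "tokens": n_tokens,
        "vocabulary": vocabulary,
        "hapaxes": hapaxes,
        "unseen_mass": unseen_mass,
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat the end"
    print(hapax_statistics(sample))
```

Running the same function on corpora of different types would show how the once-only count, and with it the estimate, shifts with the vocabulary of the text.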