Publication details

Application of multinomial mixture model to text classification

Conference Paper (international conference)

Novovičová Jana, Malík Antonín

serial: Pattern Recognition and Image Analysis, p. 646-653, Eds: Perales F. J., Campilho A. J. C.

publisher: Springer, (Berlin 2003)


action: Iberian Conference on Pattern Recognition and Image Analysis. IbPRIA 2003 /1./, (Puerto de Andratx, ES, 04.06.2003-06.06.2003)

research: CEZ:AV0Z1075907

project(s): IAA2075302, GA AV ČR, KSK1019101, GA AV ČR

keywords: text classification, multinomial mixture model, Bhattacharyya distance

abstract (eng):

The mixture of multinomial distributions is proposed as a model for class-conditional distributions in the document classification task. Experimental results on the Reuters and Newsgroups data sets indicate the effectiveness of the multinomial mixture model. Furthermore, classification accuracy increases for small training data sets when the multiclass Bhattacharyya distance is used instead of average mutual information as the feature selection criterion.
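
As a rough illustration of the idea in the abstract, the sketch below scores a document's word-count vector under a mixture of multinomials for each class and picks the class with the highest posterior score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy vocabulary of three words, the class labels, and all parameter values are hypothetical, and the parameters are assumed to be already estimated and smoothed so that no word probability is zero.

```python
import math

def log_multinomial_kernel(counts, theta):
    """Log of prod_w theta_w^{n_w}. The multinomial coefficient is dropped
    because it depends only on the document, not on the component or class."""
    return sum(n * math.log(p) for n, p in zip(counts, theta) if n > 0)

def log_mixture_likelihood(counts, weights, thetas):
    """log sum_m weights[m] * prod_w thetas[m][w]^{counts[w]}, via log-sum-exp."""
    terms = [math.log(a) + log_multinomial_kernel(counts, t)
             for a, t in zip(weights, thetas)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify(counts, priors, class_models):
    """Return the class maximizing log prior + log class-conditional likelihood."""
    scores = {c: math.log(priors[c]) + log_mixture_likelihood(counts, *class_models[c])
              for c in priors}
    return max(scores, key=scores.get)

# Hypothetical toy setup: each class maps to (mixture weights, per-component
# word probabilities over a three-word vocabulary). Values are illustrative only.
class_models = {
    "grain": ([0.6, 0.4], [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]),
    "trade": ([1.0], [[0.1, 0.2, 0.7]]),
}
priors = {"grain": 0.5, "trade": 0.5}

label = classify([3, 1, 0], priors, class_models)
```

A class with a single component, such as the hypothetical "trade" class here, reduces to the ordinary multinomial naive Bayes model, which is why the mixture can be read as a generalization of it. Working in log space keeps the computation stable for long documents, where the raw likelihoods underflow.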

Cosati: 09K, 12B