- learning reliable classifiers from unreliably classified documents
- exploiting the notion of uncertainty in improving classification results
- deriving normalized phrasal representations from documents
- using phrase representations in conjunction with statistical learning
methods to increase precision in learning
- cross-lingual text categorization
KUN has extended the LCS (Linguistic Classification System), developed as a
prototype during the earlier DORO project, into an industrial-quality
system capable of classifying large streams of documents in many languages.
Publicly available documentation and publications:
- from the current project:
- C.H.A. Koster, "From keywords to keyphrases", presentation at the 'ICT
kenniscongres' in The Hague, 6/7 September 2001. ps.gz pdf
- C.H.A. Koster, M. Seutter and J.G. Beney (INSA Lyon), "Classifying Patent
Applications with Winnow", Benelearn 2001, Antwerp, December 21. ps.gz pdf
- C. Peters and C.H.A. Koster (2002), "Uncertainty-based noise reduction
and term selection in text categorization", ECIR 2002. ps.gz pdf
- C.H.A. Koster, P. Jones, M. Vogel and N. Gietema, "The Bootstrapping
Problem", presented at the SIGIR 02 Workshop on Operational Text
Categorization, Tampere, August 2002. ps.gz pdf
- C. Peters and C.H.A. Koster (2003), "Uncertainty-based Noise Reduction
and Term Selection in Text Categorisation", International Journal of
Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), Vol. 11, No. 1,
pp. 115-137. ps.gz pdf
- C.H.A. Koster and M. Seutter (2003), "Taming Wild Phrases", Proceedings
25th European Conference on IR Research (ECIR 2003), Springer LNCS 2633,
pp. 161-176. ps.gz pdf
- (in Dutch) C.H.A. Koster, "Automatische Document Klassificatie",
presentation at the DOCUMENT 2003 trade fair, Nijkerk, 17 June 2003. pdf
- Nuria Bel, Cornelis H.A. Koster and Marta Villegas (2003), "Cross-Lingual
Text Categorization", to appear in Proceedings ECDL 2003, Trondheim, August
2003. ps.gz pdf
- C.H.A. Koster, M. Seutter and J.G. Beney (INSA Lyon),
"Multi-Classification of Patent Applications with Winnow", to appear in
Proceedings PSI 2003, Novosibirsk, July 2003. ps.gz pdf
- from the preceding DORO project:
- H. Ragas (CAP Gemini) and C.H.A. Koster, "Four text classification
algorithms compared on a Dutch corpus", Proceedings SIGIR 1997. ps.gz pdf
- C.H.A. Koster, C. Derksen, D. van de Ende and J. Potjer, "Normalization
and matching in the DORO project", Proceedings BCS IR conference 1999. ps.gz pdf
- Paula Santalla del Rio (USC), "An architecture for document routing in
Spanish: two language components, pre-processor and parser" ps.gz pdf
- related publications:
- Avi Arampatzis, Jean Beney, C.H.A. Koster, Th.P. van der Weide,
"Incrementality, Decay, and Threshold Optimization for Adaptive Filtering
Systems", The Ninth Text REtrieval Conference (TREC-9), Gaithersburg,
Maryland, November 13-16, 2000. ps.gz pdf
- Christiaan Rudolfs, "E@SLAVE -- an incremental approach to automated,
content-based email classification", Master's Thesis, KU Nijmegen, August
2002. ps.gz pdf
Requests for information can be directed to
Cornelis H.A. Koster
Department of Computing Science
University of Nijmegen
6525ED Nijmegen, The Netherlands
tel: +31.24.3653411
fax: +31.24.3553450
email: kees@cs.kun.nl