Alternate document: Text Classification by Bootstrapping with Keywords, EM and Shrinkage (1999) - Andrew McCallum, Kamal Nigam
Abstract: This paper shows that the accuracy of learned text classifiers can be improved
by augmenting a small number of labeled training documents with a large pool of unlabeled
documents. This is important because in many text classification problems obtaining training
labels is expensive, while large quantities of unlabeled documents are readily available.
We introduce an algorithm for learning from labeled and unlabeled documents based on the
combination of Expectation-Maximization (EM) and a naive ...
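The combination of EM with a probabilistic classifier described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the truncated phrase refers to a multinomial naive Bayes classifier, it uses Laplace smoothing and a fixed number of iterations, and all function names (m_step, e_step, em_naive_bayes, classify) are hypothetical. Labeled documents keep their given class; unlabeled documents receive soft class memberships that are re-estimated on each pass.

```python
import math

def m_step(docs, resp, vocab, n_classes):
    """Re-estimate priors and word probabilities from soft class memberships.

    docs: list of token lists; resp[i][c] is the weight of doc i in class c.
    Laplace (add-one) smoothing is an assumption of this sketch.
    """
    prior, word_prob = [], []
    for c in range(n_classes):
        class_mass = sum(r[c] for r in resp)
        prior.append((1.0 + class_mass) / (n_classes + len(docs)))
        counts = {w: 1.0 for w in vocab}
        for doc, r in zip(docs, resp):
            for w in doc:
                counts[w] += r[c]
        total = sum(counts.values())
        word_prob.append({w: counts[w] / total for w in vocab})
    return prior, word_prob

def e_step(doc, prior, word_prob):
    """Return the posterior class distribution for one document."""
    log_post = [
        math.log(p) + sum(math.log(wp[w]) for w in doc if w in wp)
        for p, wp in zip(prior, word_prob)
    ]
    top = max(log_post)  # subtract the max for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def em_naive_bayes(labeled, unlabeled, n_classes, n_iter=10):
    """labeled: list of (tokens, class_index); unlabeled: list of token lists."""
    docs_l = [d for d, _ in labeled]
    resp_l = [[1.0 if c == y else 0.0 for c in range(n_classes)]
              for _, y in labeled]
    vocab = {w for d in docs_l + unlabeled for w in d}
    # Initialise the classifier from the labeled documents only.
    prior, word_prob = m_step(docs_l, resp_l, vocab, n_classes)
    for _ in range(n_iter):
        # E-step: probabilistically label the unlabeled documents.
        resp_u = [e_step(d, prior, word_prob) for d in unlabeled]
        # M-step: retrain on labeled plus soft-labeled documents.
        prior, word_prob = m_step(docs_l + unlabeled, resp_l + resp_u,
                                  vocab, n_classes)
    return prior, word_prob

def classify(doc, prior, word_prob):
    """Return the index of the most probable class for a token list."""
    post = e_step(doc, prior, word_prob)
    return max(range(len(post)), key=post.__getitem__)
```

In this sketch a word that never occurs in a labeled document can still become informative if it co-occurs with labeled vocabulary in the unlabeled pool, which is the intuition behind using unlabeled data.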
Context of citations to this paper:
... as a way to utilize the relatively small amount of available annotated data along with much larger collections of unannotated data [1,9]. However, it is as yet unclear whether these methods are effective other than in cases where we have relatively small amounts of...
... proposed in [29] 4 Related Work Many methods have been proposed for text classification (e.g. [22, 21, 15, 6, 11, 12, 5, 31, 25, 28, 13, 3, 9, 14, 16, 24, 10, 27, 32, 17, 23]) We describe here two typical non rule based methods and two typical rule based method. We also...
Cited by:
Combining Clustering and Co-training to Enhance Text Classification Using..
Marshalling Evidence Through Data Mining in Support of.. - Barbará, Nolan, Sood
PEBL: Positive Example Based Learning for Web Page Classification Using SVM
Similar documents (at the sentence level):
75.8% : Text Classification from Labeled and Unlabeled.. - Nigam, McCallum.. (1999)
27.9% : Using Unlabeled Data to Improve Text Classification - Nigam (2001)
8.1% : Using EM to Classify Text from Labeled and Unlabeled Documents - Nigam (1998)
Active bibliography (related documents):
0.2 : A Comparison of Event Models for Naive Bayes Text Classification - McCallum, Nigam (1998)
0.2 : On the Value of Partial Information - Ratsaby, Maiorov (1998)
0.2 : Efficient Web Spidering with Reinforcement Learning - Rennie, McCallum (1999)
Users who viewed this document also viewed:
0.2 : Improving Text Classification by Shrinkage in a.. - McCallum, Rosenfeld, .. (1998)
0.2 : An Evaluation of Statistical Approaches to Text Categorization - Yang (1997)
0.2 : A Gentle Tutorial of the EM Algorithm and its Application to.. - Bilmes (1998)
Similar documents based on text:
0.9 : Combining Labeled and Unlabeled Data for Text.. - Rayid Ghani
0.7 : Text Classification with Enhanced Semi-Supervised - Girish Keswani
0.7 : Exploiting Unlabeled Data in Ensemble Methods - Bennett, Demiriz (2002)
Related documents from co-citation:
25 : Combining labeled and unlabeled data with co-training - Blum, Mitchell - 1998
17 : Maximum Likelihood from Incomplete Data via the EM Algorithm - Dempster, Laird et al. - 1977
17 : Transductive inference for text classification using support vector machines - Joachims - 1999
BibTeX entry:
Nigam, K.; McCallum, A.; Thrun, S.; and Mitchell, T. 1999. Text classification from labeled and unlabeled documents using EM. Machine Learning. To appear. http://citeseer.ist.psu.edu/nigam99text.html
@article{nigam00text,
  author = "Kamal Nigam and Andrew K. McCallum and Sebastian Thrun and Tom M. Mitchell",
  title = "Text Classification from Labeled and Unlabeled Documents using {EM}",
  journal = "Machine Learning",
  volume = "39",
  number = "2/3",
  pages = "103--134",
  year = "2000",
  url = "citeseer.ist.psu.edu/nigam99text.html"
}
Citations (may not include all citations):
1646 : Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird et al. - 1977
603 : Machine Learning - Mitchell - 1997
211 : Text categorization with Support Vector Machines: Learning w.. - Joachims - 1998
207 : A Probabilistic Theory of Pattern Recognition - Devroye, Gyorfi et al. - 1996
181 : Relevance feedback in information retrieval - Rocchio - 1971
174 : A universal prior for integers and estimation by minimum des.. - Rissanen - 1983
139 : Relevance weighting of search terms - Robertson, Sparck-Jones - 1976
109 : Learning to extract symbolic knowledge from the World Wide W.. - Craven, DiPasquo et al. - 1998
99 : An evaluation of statistical approaches to text categorizati.. - Yang - 1998
97 : Syskill & Webert: Identifying interesting Web sites - Pazzani, Muramatsu et al. - 1996
95 : Newsweeder: Learning to filter netnews - Lang - 1995
91 : A sequential algorithm for training text classifiers: Corrig.. - Lewis - 1995
91 : A sequential algorithm for training text classifiers - Lewis, Gale - 1994
89 : Hierarchically classifying documents using very few words - Koller, Sahami - 1997
81 : Bayesian classification - Cheeseman, Stutz - 1996
75 : On the optimality of the simple Bayesian classifier under zero-.. - Domingos, Pazzani - 1997
73 : Context-sensitive learning methods for text categorization - Cohen, Singer - 1996
58 : Naive (Bayes) at forty: The independence assumption in information retriev.. - Lewis - 1998
58 : Supervised learning from incomplete data via an EM approach - Ghahramani, Jordan - 1994
57 : An evaluation of phrasal and clustered representations on a .. - Lewis - 1992
52 : Expert network: Effective and efficient learning from human .. - Yang - 1994
38 : Data Mining and Knowledge Discovery - Friedman - 1997
35 : Feature selection in statistical learning of text categoriza.. - Yang, Pedersen - 1997
31 : Mixture Models - McLachlan, Basford - 1988
30 : Combining classifiers in text categorization - Larkey, Croft - 1996
28 : Committee-based sampling for training probabilistic classifi.. - Dagan, Engelson - 1995
24 : On the exponential value of labeled samples - Castelli, Cover - 1995
21 : Employing EM in pool-based active learning for text classifi.. - McCallum, Nigam - 1998
20 : A new metric-based approach to model selection - Schuurmans - 1997
18 : The effect of unlabeled samples in reducing the small sample.. - Shahshahani, Landgrebe - 1994
18 : A mixture of experts classifier with learning based on both .. - Miller, Uyar - 1997
17 : Improving text classification by shrinkage in a hierarchy of .. - McCallum, Rosenfeld et al. - 1998
16 : Improving the mean field approximation via the use of mixtur.. - Jaakkola, Jordan - 1998
16 : The relative value of labeled and unlabeled samples in patte.. - Castelli, Cover - 1996
14 : Document classification using a finite mixture model - Li, Yamanishi - 1997
11 : Learning from a mixture of labeled and unlabeled examples wi.. - Ratsaby, Venkatesh - 1995
10 : Preventing "overfitting" of cross-validation data - Ng - 1997
9 : Intelligent agents for web-based tasks: An advice-taking app.. - Shavlik, Eliassi-Rad - 1998
6 : Optimal rate of convergence for finite mixture models - Chen - 1995
4 : Elements of Information Theory - Cover, Thomas - 1991
3 : Threading electronic mail: A preliminary study - Lewis, Knowles - 1997
2 : A Bayesian approach to filtering junk e-mail - Sahami, Dumais et al. - 1998
2 : A comparison of two learning algorithms for text categorizat.. - Lewis, Ringuette - 1994
2 : A probabilistic analysis of the Rocchio algorithm with TFIDF.. - Joachims - 1997
1 : Developments in automatic text retrieval - Salton - 1991
1 : Best-first model merging for hidden Markov model induction - Stolcke, Omohundro - 1994
1 : Approximate statistical tests for comparing supervised class.. - Dietterich - 1998
1 : Theory of point estimation - Lehmann - 1983
Documents on the same site (http://www.cs.cmu.edu/~knigam/):
Learning to Extract Symbolic Knowledge from the World.. - Craven, DiPasquo.. (1998)
Pool-Based Active Learning for Text Classification - Nigam, McCallum (1998)