Text Classification from Labeled and Unlabeled Documents using EM - Nigam, McCallum, Thrun, Mitchell (ResearchIndex)

Alternate document: Text Classification by Bootstrapping with Keywords, EM and Shrinkage (1999) - Andrew McCallum, Kamal Nigam

Text Classification from Labeled and Unlabeled Documents using EM (1999) (52 citations)
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom Mitchell. Machine Learning

View or download:
cmu.edu/~knigam/pap...emcatmlj99.ps.gz
wustl.edu/~zy/./paper/nigam99text.ps

From: cmu.edu/~knigam/

Abstract: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive ...
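The abstract names the ingredients (EM plus a naive Bayes classifier) but is cut off before the details. As a rough illustration only, the sketch below assumes a multinomial naive Bayes model over word tokens with add-one (Laplace) smoothing. It trains an initial classifier on the labeled documents, then alternates an E-step (soft-label the unlabeled documents with class posteriors) and an M-step (refit on the labeled documents plus the soft-labeled ones) for a fixed number of iterations instead of testing for convergence. The function names `train_nb`, `posterior`, `em_nb` and `classify` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_nb(docs, resp, classes, vocab, alpha=1.0):
    """M-step: fit multinomial naive Bayes from (possibly fractional)
    class memberships resp[i][c], with add-alpha (Laplace) smoothing."""
    n = len(docs)
    priors, word_probs = {}, {}
    for c in classes:
        mass = sum(r[c] for r in resp)  # expected number of docs in class c
        priors[c] = (mass + alpha) / (n + alpha * len(classes))
        counts = Counter()
        for doc, r in zip(docs, resp):
            for w, k in Counter(doc).items():
                counts[w] += r[c] * k  # word counts weighted by membership
        total = sum(counts.values())
        word_probs[c] = {
            w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab
        }
    return priors, word_probs


def posterior(doc, priors, word_probs, classes):
    """E-step for one document: P(c | doc), computed in log space.
    Words outside the training vocabulary are ignored."""
    logs = {}
    for c in classes:
        s = math.log(priors[c])
        for w in doc:
            if w in word_probs[c]:
                s += math.log(word_probs[c][w])
        logs[c] = s
    m = max(logs.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in logs.values())
    return {c: math.exp(v - m) / z for c, v in logs.items()}


def em_nb(labeled, unlabeled, iterations=10):
    """labeled: list of (tokens, label) pairs; unlabeled: list of token lists."""
    classes = sorted({y for _, y in labeled})
    vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}
    docs_l = [d for d, _ in labeled]
    # Labeled documents keep fixed, hard (0/1) class memberships.
    resp_l = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    # Initial classifier from the labeled documents only.
    priors, wp = train_nb(docs_l, resp_l, classes, vocab)
    for _ in range(iterations):
        # E-step: probabilistically label the unlabeled documents.
        resp_u = [posterior(d, priors, wp, classes) for d in unlabeled]
        # M-step: refit on labeled plus soft-labeled unlabeled documents.
        priors, wp = train_nb(docs_l + unlabeled, resp_l + resp_u, classes, vocab)
    return priors, wp, classes


def classify(doc, model):
    """Return the most probable class for a token list."""
    priors, wp, classes = model
    post = posterior(doc, priors, wp, classes)
    return max(post, key=post.get)
```

For example, with `labeled = [("goal match team".split(), "sports"), ("election vote party".split(), "politics")]` and the unlabeled pool `["team coach goal", "vote senate party", "coach match", "senate election"]` (each tokenized with `split()`), `classify(["coach"], em_nb(labeled, unlabeled))` returns `"sports"`. The word "coach" never appears in the labeled data, so that association comes entirely from its co-occurrence with sports words in the unlabeled pool.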

Context of citations to this paper:

... as a way to utilize the relatively small amount of available annotated data along with much larger collections of unannotated data [1,9]. However, it is as yet unclear whether these methods are effective other than in cases where we have relatively small amounts of...

... proposed in [29]. (Section 4: Related Work) Many methods have been proposed for text classification (e.g. [22, 21, 15, 6, 11, 12, 5, 31, 25, 28, 13, 3, 9, 14, 16, 24, 10, 27, 32, 17, 23]). We describe here two typical non-rule-based methods and two typical rule-based methods. We also...

Cited by:
Combining Clustering and Co-training to Enhance Text Classification Using..
Marshalling Evidence Through Data Mining in Support of.. - Barbará, Nolan, Sood
PEBL: Positive Example Based Learning for Web Page Classification Using SVM

Similar documents (at the sentence level):
75.8%: Text Classification from Labeled and Unlabeled.. - Nigam, McCallum.. (1999)
27.9%: Using Unlabeled Data to Improve Text Classification - Nigam (2001)
8.1%: Using EM to Classify Text from Labeled and Unlabeled Documents - Nigam (1998)

Active bibliography (related documents):
0.2: A Comparison of Event Models for Naive Bayes Text Classification - McCallum, Nigam (1998)
0.2: On the Value of Partial Information - Ratsaby, Maiorov (1998)
0.2: Efficient Web Spidering with Reinforcement Learning - Rennie, McCallum (1999)

Users who viewed this document also viewed:
0.2: Improving Text Classification by Shrinkage in a.. - McCallum, Rosenfeld, .. (1998)
0.2: An Evaluation of Statistical Approaches to Text Categorization - Yang (1997)
0.2: A Gentle Tutorial of the EM Algorithm and its Application to.. - Bilmes (1998)

Similar documents based on text:
0.9: Combining Labeled and Unlabeled Data for Text.. - Rayid Ghani Center
0.7: Text Classification with Enhanced Semi-Supervised - Girish Keswani And
0.7: Exploiting Unlabeled Data in Ensemble Methods - Bennett, Demiriz (2002)

Related documents from co-citation:
25: Combining labeled and unlabeled data with co-training - Blum, Mitchell - 1998
17: Maximum Likelihood from Incomplete Data via the EM Algorithm - Dempster, Laird et al. - 1977
17: Transductive inference for text classification using support vector machines - Joachims - 1999

BibTeX entry:

Nigam, K.; McCallum, A.; Thrun, S.; and Mitchell, T. 1999. Text classification from labeled and unlabeled documents using EM. Machine Learning. To appear. http://citeseer.ist.psu.edu/nigam99text.html

@article{ nigam00text,
    author = "Kamal Nigam and Andrew K. McCallum and Sebastian Thrun and Tom M. Mitchell",
    title = "Text Classification from Labeled and Unlabeled Documents using {EM}",
    journal = "Machine Learning",
    volume = "39",
    number = "2/3",
    pages = "103--134",
    year = "2000",
    url = "citeseer.ist.psu.edu/nigam99text.html" }

Citations (may not include all citations):
1646 Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird et al. - 1977
603 Machine Learning - Mitchell - 1997
211 Text categorization with Support Vector Machines: Learning w.. - Joachims - 1998
207 A Probabilistic Theory of Pattern Recognition - Devroye, Gyorfi et al. - 1996
181 Relevance feedback in information retrieval - Rocchio - 1971
174 A universal prior for integers and estimation by minimum des.. - Rissanen - 1983
139 Relevance weighting of search terms - Robertson, Sparck-Jones - 1976
109 Learning to extract symbolic knowledge from the World Wide W.. - Craven, DiPasquo et al. - 1998
99 An evaluation of statistical approaches to text categorizati.. - Yang - 1998
97 Webert: Identifying interesting Web sites - Pazzani, Muramatsu et al. - 1996
95 Newsweeder: Learning to filter netnews - Lang - 1995
91 A sequential algorithm for training text classifiers: Corrig.. - Lewis - 1995
91 A sequential algorithm for training text classifiers - Lewis, Gale - 1994
89 Hierarchically classifying documents using very few words - Koller, Sahami - 1997
81 Bayesian classification - Cheeseman, Stutz - 1996
75 the optimality of the simple Bayesian classifier under zero-.. - Domingos, Pazzani - 1997
73 Context-sensitive learning methods for text categorization - Cohen, Singer - 1996
58 at forty: The independence assumption in information retriev.. - Lewis - 1998
58 Supervised learning from incomplete data via an EM approach - Ghahramani, Jordan - 1994
57 An evaluation of phrasal and clustered representations on a .. - Lewis - 1992
52 Expert network: Effective and efficient learning from human .. - Yang - 1994
38 Data Mining and Knowledge Discovery - Friedman - 1997
35 Feature selection in statistical learning of text categoriza.. - Yang, Pederson - 1997
31 Mixture Models - McLachlan, Basford - 1988
30 Combining classifiers in text categorization - Larkey, Croft - 1996
28 Committee-based sampling for training probabilistic classifi.. - Dagan, Engelson - 1995
24 the exponential value of labeled samples - Castelli, Cover - 1995
21 Employing EM in pool-based active learning for text classifi.. - McCallum, Nigam - 1998
20 A new metric-based approach to model selection - Schuurmans - 1997
18 The effect of unlabeled samples in reducing the small sample.. - Shahshahani, Landgrebe - 1994
18 A mixture of experts classifier with learning based on both .. - Miller, Uyar - 1997
17 Improving text classification by shrinkage in a hierarchy of .. - McCallum, Rosenfeld et al. - 1998
16 Improving the mean field approximation via the use of mixtur.. - Jaakkola, Jordan - 1998
16 The relative value of labeled and unlabeled samples in patte.. - Castelli, Cover - 1996
14 Document classification using a finite mixture model - Li, Yamanishi - 1997
11 Learning from a mixture of labeled and unlabeled examples wi.. - Ratsaby, Venkatesh - 1995
10 overfitting - Ng - 1997
9 Intelligent agents for web-based tasks: An advice-taking app.. - Shavlik, Eliassi-Rad - 1998
6 Optimal rate of convergence for finite mixture models - Chen - 1995
4 Elements of Information Theory - SIGIR, Proceedings et al. - 1991
3 Threading electronic mail: A preliminary study - SIGIR, Proceedings et al. - 1997
2 A Bayesian approach to filtering junk e-mail - Sahami, Dumais et al. - 1998
2 A comparison of two learning algorithms for text categorizat.. - Processing, -- et al. - 1994
2 A probabilistic analysis of the Rocchio algorithm with TFIDF.. - Publishers - 1997
1 Developments in automatic text retrieval - AAAI, http et al. - 1991
1 Best-first model merging for hidden Markov model induction - AAAI, http et al. - 1994
1 Approximate statistical tests for comparing supervised class.. - Verlag, Dietterich - 1998
1 Theory of point estimation - FROM, UNLABELED et al. - 1983

Documents on the same site (http://www.cs.cmu.edu/~knigam/):
Learning to Extract Symbolic Knowledge from the World.. - Craven, DiPasquo.. (1998)
Pool-Based Active Learning for Text Classification - Nigam, McCallum (1998)

Online articles have much greater impact More about CiteSeer.PSU Add search form to your site Submit documents