Alternate document: Text Classification by Bootstrapping with Keywords, EM and Shrinkage (1999) - Andrew McCallum, Kamal Nigam

Text Classification from Labeled and Unlabeled Documents using EM (1999)  (52 citations)
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom Mitchell
Machine Learning

View or download:
cmu.edu/~knigam/pap...emcatmlj99.ps.gz
wustl.edu/~zy/./paper/nigam99text.ps

From:  cmu.edu/~knigam/

Abstract: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive ...
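The abstract above is cut off, so the following is only a minimal sketch of the general idea it describes, assuming the classifier is a multinomial naive Bayes model over whitespace-separated word counts. It is not the authors' implementation: the function names (`train_nb`, `posterior`, `em_nb`, `classify`), the Laplace smoothing, the fixed iteration count, and the toy documents are all hypothetical choices made for illustration. Labeled documents keep their hard labels; the E-step assigns unlabeled documents soft (probabilistic) labels under the current model, and the M-step re-estimates the model from both.

```python
import math
from collections import Counter


def train_nb(docs, resp, classes, vocab, alpha=1.0):
    """M-step (assumed form): multinomial naive Bayes estimates from soft class weights.

    docs -- list of Counter objects (word -> count)
    resp -- list of dicts (class -> weight), one per document, each summing to 1
    """
    n = len(docs)
    prior, word_prob = {}, {}
    for c in classes:
        weight = sum(r[c] for r in resp)
        # Laplace-smoothed class prior.
        prior[c] = (1.0 + weight) / (len(classes) + n)
        counts = Counter()
        for doc, r in zip(docs, resp):
            if r[c] > 0.0:
                for word, k in doc.items():
                    counts[word] += r[c] * k
        # Laplace-smoothed word probabilities over the full vocabulary.
        denom = alpha * len(vocab) + sum(counts.values())
        word_prob[c] = {w: (alpha + counts[w]) / denom for w in vocab}
    return prior, word_prob


def posterior(doc, prior, word_prob):
    """E-step for one document: P(class | doc), computed in log space."""
    scores = {}
    for c in prior:
        s = math.log(prior[c])
        for w, k in doc.items():
            if w in word_prob[c]:  # ignore words never seen during training
                s += k * math.log(word_prob[c][w])
        scores[c] = s
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}


def em_nb(labeled, labels, unlabeled, iterations=10):
    """Hypothetical semi-supervised EM loop: fixed labels plus soft labels."""
    classes = sorted(set(labels))
    lab_docs = [Counter(t.split()) for t in labeled]
    unl_docs = [Counter(t.split()) for t in unlabeled]
    vocab = {w for d in lab_docs + unl_docs for w in d}
    hard = [{c: float(c == y) for c in classes} for y in labels]
    # Initialize the model from the labeled documents alone.
    prior, word_prob = train_nb(lab_docs, hard, classes, vocab)
    for _ in range(iterations):
        soft = [posterior(d, prior, word_prob) for d in unl_docs]      # E-step
        prior, word_prob = train_nb(lab_docs + unl_docs, hard + soft,  # M-step
                                    classes, vocab)
    return prior, word_prob


def classify(text, prior, word_prob):
    """Return the most probable class for a whitespace-tokenized text."""
    post = posterior(Counter(text.split()), prior, word_prob)
    return max(post, key=post.get)


prior, word_prob = em_nb(
    ["goal match team", "election vote party"], ["sports", "politics"],
    ["team match score", "vote party candidate",
     "score goal team", "candidate election poll"],
)
print(classify("poll candidate", prior, word_prob))  # prints: politics
```

In this toy run, "poll" and "candidate" never appear in the two labeled documents; they become associated with the politics class only because EM soft-labels the unlabeled documents that contain them. This sketch gives unlabeled documents the same weight as labeled ones; down-weighting them would be a natural variation but is not shown here.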

Context of citations to this paper:

... as a way to utilize the relatively small amount of available annotated data along with much larger collections of unannotated data [1,9]. However, it is as yet unclear whether these methods are effective other than in cases where we have relatively small amounts of...

... proposed in [29] 4 Related Work Many methods have been proposed for text classification (e.g. [22, 21, 15, 6, 11, 12, 5, 31, 25, 28, 13, 3, 9, 14, 16, 24, 10, 27, 32, 17, 23]) We describe here two typical non rule based methods and two typical rule based method. We also...

Cited by:
Combining Clustering and Co-training to Enhance Text - Classification Using..
Marshalling Evidence Through Data Mining in Support of.. - Barbará, Nolan, Sood
PEBL: Positive Example Based Learning for Web Page - Classification Using Svm

Similar documents (at the sentence level):
75.8%:   Text Classification from Labeled and Unlabeled.. - Nigam, McCallum.. (1999)
27.9%:   Using Unlabeled Data to Improve Text Classification - Nigam (2001)
8.1%:   Using EM to Classify Text from Labeled and Unlabeled Documents - Nigam (1998)

Active bibliography (related documents):
0.2:   A Comparison of Event Models for Naive Bayes Text Classification - McCallum, Nigam (1998)
0.2:   On the Value of Partial Information - Ratsaby, Maiorov (1998)
0.2:   Efficient Web Spidering with Reinforcement Learning - Rennie, McCallum (1999)

Users who viewed this document also viewed:
0.2:   Improving Text Classification by Shrinkage in a.. - McCallum, Rosenfeld, .. (1998)
0.2:   An Evaluation of Statistical Approaches to Text Categorization - Yang (1997)
0.2:   A Gentle Tutorial of the EM Algorithm and its Application to.. - Bilmes (1998)

Similar documents based on text:
0.9:   Combining Labeled and Unlabeled Data for Text.. - Rayid Ghani Center
0.7:   Text Classification with Enhanced Semi-Supervised - Girish Keswani And
0.7:   Exploiting Unlabeled Data in Ensemble Methods - Bennett, Demiriz (2002)

Related documents from co-citation:
25:   Combining labeled and unlabeled data with co-training - Blum, Mitchell - 1998
17:   Maximum Likelihood from Incomplete Data via the EM Algorithm (context) - Dempster, Laird et al. - 1977
17:   Transductive inference for text classification using support vector machines - Joachims - 1999

BibTeX entry:

Nigam, K.; McCallum, A.; Thrun, S.; and Mitchell, T. 1999. Text classification from labeled and unlabeled documents using EM. Machine Learning. To appear. http://citeseer.ist.psu.edu/nigam99text.html

@article{ nigam00text,
    author = "Kamal Nigam and Andrew K. McCallum and Sebastian Thrun and Tom M. Mitchell",
    title = "Text Classification from Labeled and Unlabeled Documents using {EM}",
    journal = "Machine Learning",
    volume = "39",
    number = "2/3",
    pages = "103--134",
    year = "2000",
    url = "citeseer.ist.psu.edu/nigam99text.html" }
Citations (may not include all citations):
1646   Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird et al. - 1977
603   Machine Learning - Mitchell - 1997
211   Text categorization with Support Vector Machines: Learning w.. - Joachims - 1998
207   A Probabilistic Theory of Pattern Recognition - Devroye, Gyorfi et al. - 1996
181   Relevance feedback in information retrieval - Rocchio - 1971
174   A universal prior for integers and estimation by minimum des.. - Rissanen - 1983
139   Relevance weighting of search terms - Robertson, Sparck-Jones - 1976
109   Learning to extract symbolic knowledge from the World Wide W.. - Craven, DiPasquo et al. - 1998
99   An evaluation of statistical approaches to text categorizati.. - Yang - 1998
97   Webert: Identifying interesting Web sites - Pazzani, Muramatsu et al. - 1996
95   Newsweeder: Learning to filter netnews - Lang - 1995
91   A sequential algorithm for training text classifiers: Corrig.. - Lewis - 1995
91   A sequential algorithm for training text classifiers - Lewis, Gale - 1994
89   Hierarchically classifying documents using very few words - Koller, Sahami - 1997
81   Bayesian classification - Cheeseman, Stutz - 1996
75   the optimality of the simple Bayesian classifier under zero-.. - Domingos, Pazzani - 1997
73   Context-sensitive learning methods for text categorization - Cohen, Singer - 1996
58   at forty: The independence assumption in information retriev.. - Lewis - 1998
58   Supervised learning from incomplete data via an EM approach - Ghahramani, Jordan - 1994
57   An evaluation of phrasal and clustered representations on a .. - Lewis - 1992
52   Expert network: Effective and efficient learning from human .. - Yang - 1994
38   Data Mining and Knowledge Discovery - Friedman - 1997
35   Feature selection in statistical learning of text categoriza.. - Yang, Pedersen - 1997
31   Mixture Models - McLachlan, Basford - 1988
30   Combining classifiers in text categorization - Larkey, Croft - 1996
28   Committee-based sampling for training probabilistic classifi.. - Dagan, Engelson - 1995
24   the exponential value of labeled samples - Castelli, Cover - 1995
21   Employing EM in pool-based active learning for text classifi.. - McCallum, Nigam - 1998
20   A new metric-based approach to model selection - Schuurmans - 1997
18   The effect of unlabeled samples in reducing the small sample.. - Shahshahani, Landgrebe - 1994
18   A mixture of experts classifier with learning based on both .. - Miller, Uyar - 1997
17   Improving text classification by shrinkage in a hierarchy of .. - McCallum, Rosenfeld et al. - 1998
16   Improving the mean field approximation via the use of mixtur.. - Jaakkola, Jordan - 1998
16   The relative value of labeled and unlabeled samples in patte.. - Castelli, Cover - 1996
14   Document classification using a finite mixture model - Li, Yamanishi - 1997
11   Learning from a mixture of labeled and unlabeled examples wi.. - Ratsaby, Venkatesh - 1995
10   overfitting - Ng - 1997
9   Intelligent agents for web-based tasks: An advice-taking app.. - Shavlik, Eliassi-Rad - 1998
6   Optimal rate of convergence for finite mixture models - Chen - 1995
4   Elements of Information Theory - Cover, Thomas - 1991
3   Threading electronic mail: A preliminary study - Lewis, Knowles - 1997
2   A Bayesian approach to filtering junk e-mail - Sahami, Dumais et al. - 1998
2   A comparison of two learning algorithms for text categorizat.. - Lewis, Ringuette - 1994
2   A probabilistic analysis of the Rocchio algorithm with TFIDF.. - Joachims - 1997
1   Developments in automatic text retrieval - Salton - 1991
1   Best-first model merging for hidden Markov model induction - Stolcke, Omohundro - 1994
1   Approximate statistical tests for comparing supervised class.. - Dietterich - 1998
1   Theory of point estimation - Lehmann - 1983

Documents on the same site (http://www.cs.cmu.edu/~knigam/):
Learning to Extract Symbolic Knowledge from the World.. - Craven, DiPasquo.. (1998)
Pool-Based Active Learning for Text Classification - Nigam, McCallum (1998)

CiteSeer.PSU - Copyright NEC and IST