Determine the topic/category (for example: Sports, Politics, etc.) of a document. The categories may be predefined by the user (a.k.a. classification) or automatically detected (a.k.a. clustering).
These are more formal definitions:
- Text Classification: the task of assigning a document to predefined categories based on its contents, using a model trained on labeled examples (supervised learning). Spam filtering and sentiment analysis are typical examples.
- Text Clustering: unsupervised document classification, where documents are grouped by similarity alone, entirely without reference to external information such as labeled examples or predefined categories.
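
The contrast between the two definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: documents are represented as plain word counts, similarity is cosine similarity, classification uses a nearest-centroid rule, and clustering uses a small k-means-style loop. The function names (`train_classifier`, `classify`, `cluster`) and the sample sentences are hypothetical and not taken from any library.

```python
from collections import Counter
import math

def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count the tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Sum word counts; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

# Classification (supervised): the categories come from labeled examples.
def train_classifier(labeled_docs):
    by_label = {}
    for text, label in labeled_docs:
        by_label.setdefault(label, []).append(bag_of_words(text))
    return {label: centroid(vs) for label, vs in by_label.items()}

def classify(model, text):
    vec = bag_of_words(text)
    return max(model, key=lambda label: cosine(vec, model[label]))

# Clustering (unsupervised): no labels, only similarity between documents.
def cluster(docs, k, iterations=10):
    vectors = [bag_of_words(d) for d in docs]
    # Assumed seeding: start from the first document, then repeatedly add
    # the document that is least similar to the seeds chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, s) for s in centroids))
        )
    assignment = []
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        assignment = [
            max(range(k), key=lambda i: cosine(v, centroids[i])) for v in vectors
        ]
        # Recompute each centroid from its members (keep it if it is empty).
        centroids = [
            centroid([v for v, a in zip(vectors, assignment) if a == i])
            or centroids[i]
            for i in range(k)
        ]
    return assignment

labeled = [
    ("the team won the football match", "Sports"),
    ("the striker scored a late goal", "Sports"),
    ("parliament passed the new tax law", "Politics"),
    ("the senator won the election vote", "Politics"),
]
model = train_classifier(labeled)
print(classify(model, "a goal in the football match"))

unlabeled = [
    "football match goal",
    "tax law vote",
    "football goal striker",
    "election vote law",
]
print(cluster(unlabeled, k=2))
```

Note that the classifier returns a category name supplied by the user, while the clustering function returns only anonymous group indices: naming the discovered groups is left to whoever inspects them.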
Some practical applications of text categorization include spam filtering and sentiment analysis.
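
As a rough illustration of the spam-filtering application, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. Naive Bayes is one common choice for this task, not the only one, and the text above does not prescribe it. The training messages, the `spam`/`ham` labels, and the function names are made up for the example.

```python
from collections import Counter
import math

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def train_naive_bayes(labeled_docs):
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary

def predict(model, text):
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Log prior: fraction of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Log likelihood with add-one smoothing.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
spam_model = train_naive_bayes(training)
print(predict(spam_model, "claim your free money"))
print(predict(spam_model, "team meeting on monday"))
```

The same code could serve as a toy sentiment classifier by swapping the labels for, say, `positive` and `negative`; a realistic system would need far more training data and better tokenization.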