Text Categorization

Text categorization determines the topic or category (for example, Sports or Politics) of a document. The categories may be predefined by the user (known as classification) or discovered automatically from the documents themselves (known as clustering).

These are more formal definitions:

  • Text Classification: the task of assigning a document to one or more predefined categories based on its contents, using a model trained on labeled examples (supervised learning). Spam filtering and sentiment analysis are typical examples.
  • Text Clustering: unsupervised grouping of documents by the similarity of their contents, done entirely without reference to labeled examples or other external information.
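
To make the distinction concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a reference implementation: the whitespace tokenizer, the choice of a naive Bayes classifier with add-one smoothing, the greedy word-overlap clustering, the 0.2 similarity threshold, and all function names and sample sentences are hypothetical choices made for this example. The classifier needs labeled documents; the clustering function sees only raw text.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Classification, training step: count words per predefined category."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of documents
    vocabulary = set()
    for text, category in labeled_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the predefined category with the highest log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out the score.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def jaccard(a, b):
    """Word-set overlap: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.2):
    """Clustering: group unlabeled documents by word overlap alone."""
    clusters = []  # each entry: (set of words seen so far, list of doc indices)
    for index, text in enumerate(docs):
        words = set(tokenize(text))
        for cluster_words, members in clusters:
            if jaccard(words, cluster_words) >= threshold:
                cluster_words.update(words)
                members.append(index)
                break
        else:
            # No existing group is similar enough, so start a new one.
            clusters.append((words, [index]))
    return [members for _, members in clusters]


# Hypothetical sample data, invented for this illustration.
labeled = [
    ("the team won the match", "Sports"),
    ("the striker scored a late goal", "Sports"),
    ("parliament passed the budget vote", "Politics"),
    ("the senator won the election", "Politics"),
]
model = train_naive_bayes(labeled)
print(classify("a late goal decided the match", model))  # Sports

unlabeled = [
    "goal match team",
    "parliament vote budget",
    "team goal striker",
    "budget vote senator",
]
print(cluster(unlabeled))  # [[0, 2], [1, 3]]
```

Note that the classifier returns a category name supplied in the training data, whereas the clustering function returns only groups of document indices; giving those groups meaningful names is left to whoever inspects them.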

Some practical applications of text categorization include spam filtering and sentiment analysis.