Week #17 – Corpus

April 12, 2016 | Word of the Week

A corpus is a body of documents to be used in a text mining task. Some corpora are standard public collections of documents that are commonly used to benchmark and tune new text mining algorithms. More typically, a corpus is a body of documents assembled for one specific text mining task, e.g. a set of maintenance tickets or a group of discovery documents in a legal case, for which a classification model is needed.
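To make the definition concrete, here is a minimal Python sketch of a task-specific corpus. The three maintenance tickets, their labels, and the names `corpus`, `tokenize`, `vocabulary`, and `doc_freq` are all invented for illustration; a real corpus would be far larger and would usually get more careful tokenization.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a (text, label) pair.
corpus = [
    ("Pump leaking oil near the intake valve", "mechanical"),
    ("Login page times out after password reset", "software"),
    ("Conveyor belt motor overheating", "mechanical"),
]


def tokenize(text):
    """Lowercase a document and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


# Vocabulary: every distinct token that appears anywhere in the corpus.
vocabulary = sorted({token for text, _ in corpus for token in tokenize(text)})

# Document frequency: the number of documents in which each token appears.
doc_freq = Counter(
    token for text, _ in corpus for token in set(tokenize(text))
)

print(len(corpus), "documents,", len(vocabulary), "distinct tokens")
```

Corpus-level summaries such as the vocabulary and document frequencies are typically the starting point for building the features a classification model would be trained on.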