UPV - EHU

PIMIENTO

Pimiento stands for Platform Independent Text Mining Engine Tool. It is a framework for Text Mining.

Broadly speaking, Text Mining (sometimes also called Intelligent Text Analysis, Text Data Mining, or Knowledge Discovery in Text) is the discovery of previously unknown information from existing resources. It is related to Data Mining, which aims to extract useful patterns from structured text or data, usually stored in large database repositories. Text Mining, by contrast, searches for patterns in unstructured natural-language texts such as books, articles, e-mail messages, and web pages. It is a multidisciplinary field that includes tasks such as Information Retrieval, Text Analysis, Clustering, Categorisation, and Summarisation.
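To make the contrast with Data Mining concrete, the following minimal Python sketch turns unstructured text into structured term counts, which is a typical first step before tasks such as Clustering or Categorisation. It is only an illustration and does not use or reflect Pimiento's own API; the function names, the stop-word list, and the sample documents are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}


def term_frequencies(text):
    """Lower-case the text, split it into alphabetic tokens, and count
    every token that is not a stop word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def top_terms(documents, n=3):
    """Add up the term counts of all documents and return the n most
    frequent terms as (term, count) pairs."""
    totals = Counter()
    for doc in documents:
        totals.update(term_frequencies(doc))
    return totals.most_common(n)


# Hypothetical sample collection.
documents = [
    "Text mining finds patterns in text.",
    "Data mining finds patterns in structured data.",
]

if __name__ == "__main__":
    for term, count in top_terms(documents, n=5):
        print(term, count)
```

Once documents are reduced to counts like these, they can be handled with the same kind of pattern-extraction techniques that Data Mining applies to structured records.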

Text Mining is most useful in environments that handle large collections of text documents. A well-known premise is that the value obtained by mining text documents is directly proportional to the value of those documents: the more important the knowledge contained in the document collection, the more value will be derived.

FEATURES

These are the main features of Pimiento:

DOCUMENTATION

The paper "Mining Text with Pimiento", published in IEEE Internet Computing, describes Pimiento in some detail.

DOWNLOAD

Although Pimiento is fairly stable with regard to bugs and code quality, it is not directly downloadable and is not released under any particular licence. However, you can obtain Pimiento on request if you belong to a non-profit research or academic institution. I am mostly interested in establishing collaborations with other researchers or developers interested in text-mining infrastructure. Please note that I usually ignore email sent from free providers such as Hotmail, Yahoo, and Gmail.

Copyright © Juan José García Adeva