Channel: KNIME RSS

Text mine the first N words of the text, discard the rest.

Hi,

I would like to know whether it is possible to keep and process only the first N words of each PDF in a set that I have parsed. Each text contains many different dates, and I want to identify the date on which it was written. That date usually appears near the beginning of the document, after the headings and titles, so within the first 100 or so words. Would this be possible?

Thanks, 

Vigile
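
To make the idea concrete, here is a rough sketch in plain Python of what "keep the first N words, then look for a date" could look like. It is only an illustration under assumptions: the function names `first_n_words` and `first_date_in_head` are made up for this example, "word" is taken to mean a whitespace-separated token, and the date pattern only covers numeric dates such as 12/03/2014 or 12.03.2014. How this maps onto KNIME nodes is not covered here.

```python
import re


def first_n_words(text: str, n: int = 100) -> str:
    """Return the first n whitespace-separated tokens, joined by single spaces."""
    return " ".join(text.split()[:n])


# Assumed date format for illustration only: day/month/year with a
# four-digit year, separated by '/', '.' or '-'. Real documents may differ.
DATE_PATTERN = re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{4}\b")


def first_date_in_head(text: str, n: int = 100):
    """Return the first date found within the first n words, or None."""
    match = DATE_PATTERN.search(first_n_words(text, n))
    return match.group(0) if match else None
```

Dates that appear after the first N words are ignored, which is the point of truncating: a later date in the body cannot be mistaken for the date of writing.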

