Hi,
I would like to know whether it is possible to keep and process only the first N words of each PDF in a set that I have parsed. The reason is that each text contains many different dates, and I want to identify the date on which the text was written. That date usually appears near the beginning of the document, after the headings and titles, so it should fall within the first 100 or so words. Would this be possible?
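To make the question concrete, here is a rough sketch of what I have in mind. It assumes the parsed text of each PDF is already available as a plain Python string; the function name `first_n_words`, the `parsed_docs` dictionary, and the sample text are made up for illustration and are not from any particular PDF library.

```python
def first_n_words(text: str, n: int = 100) -> str:
    """Return the first n whitespace-separated words of text."""
    # split() with no arguments splits on any run of whitespace,
    # so line breaks left over from PDF extraction are handled too.
    return " ".join(text.split()[:n])


# Hypothetical parsed documents: file name -> extracted text.
parsed_docs = {
    "report_a.pdf": "ANNUAL REPORT\nPrepared on 12 March 2015 by the committee",
}

# Keep only the opening words of every document (n=5 here to keep the demo short).
truncated = {name: first_n_words(text, n=5) for name, text in parsed_docs.items()}
```

The date search would then only need to run over the values in `truncated` rather than over the full texts.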
Thanks,
Vigile