Cross Comparison of documents

Posted by CRC
Feb 28, 2020 at 02:01 PM

 

This is a very interesting question. I think it really leads you to text mining or text analysis software, because you are probably not looking for character-by-character identical text, but for text expressing particular concepts or ideas. You can find a number of these tools with a simple Google search for "text mining".

As an example, I once had a corpus of documents representing a particular company's proposals for past work, and I wanted to find which one of them best matched the requirements in a request for proposal. I experimented with this tool: https://gate.ac.uk/ . It turned into a very interesting and absorbing project, and while the result was never used, it showed some real promise.
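
To give a rough feel for the kind of matching involved, here is a minimal sketch in plain Python that ranks past proposals against a request for proposal using TF-IDF weighting and cosine similarity. This is not how GATE works and not what I built back then; it is just one common baseline technique, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_proposals`) and the sample texts are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: weight} TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed inverse document frequency, so rarer terms weigh more.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_proposals(rfp, proposals):
    """Return (name, score) pairs, best match to the RFP first.

    `proposals` is a hypothetical {name: text} mapping of past proposals.
    """
    names = list(proposals)
    vectors = tfidf_vectors([rfp] + [proposals[name] for name in names])
    query = vectors[0]
    scores = [(name, cosine(query, vec)) for name, vec in zip(names, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    past = {
        "bridge": "design and construction of a steel bridge over the river",
        "payroll": "development of payroll software and database migration",
    }
    request = "request for proposal: inspect and repair a river bridge"
    for name, score in rank_proposals(request, past):
        print(f"{name}: {score:.3f}")
```

Word overlap like this only catches shared vocabulary. Matching on concepts or ideas, which is the harder part of the original question, is where the dedicated text mining tools come in.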

I will say that going down this road could be long and winding. If nothing else, you will find it incredibly engrossing and perhaps rewarding.