



How much data do you store?


Posted by Lothar Scholz
Feb 13, 2019 at 08:28 PM


Just to get an impression of how much data an info manager/personal knowledge application should be able to handle:
How many web pages do you store?
How many PDF files, and how many of your own notes?

Do you use automatic download/web scraping tools? I think that's the only way to get a lot of data. I have friends who have filled their 30TB NAS with important videos from YouTube (because you never know when the government takes them away :-)  ). But even all the topics from this forum together come to just 6MB.

What are the limits? The theoretical ones, but more importantly the real-world limits: at what point do popular solutions like DevonThink, The Brain, Evernote, OneNote, MyInfo, RightNode, OmniOutliner etc. become too slow to use?

Has any reviewer ever measured this?
Or are CRIMPers changing their tools so often that they can't fill them with lots of data before their brains signal that it's time to try something new and better?



Posted by Paul Korm
Feb 13, 2019 at 10:30 PM


Interesting question. With respect to DEVONthink, my most frequently used DEVONthink database contains 9 GB of data, split about 50/50 between PDFs and other types (lots of other types).

But for DEVONthink as an info manager, the limit is not so much the size of the database as the number of words it contains. DEVONthink is essentially a massive concordance, and the concordance is at the heart of the program's search and classify features. In my database's case, there are 82 million words in those 9 GB of documents. That's about 80% of the loosely recommended maximum of around 100 million words. (The recommendation is flabby; it changes frequently.)
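To illustrate why word count rather than gigabytes is the relevant measure, here is a minimal sketch of a toy concordance (an inverted index mapping each word to the documents and positions where it occurs). This is NOT DEVONthink's actual implementation; the class and method names are hypothetical, and the 100 million default is just the loose recommendation quoted above. In this toy, the index grows with every word occurrence, while bytes that contribute no words (images, layout) add nothing to it.

```python
import re
from collections import defaultdict

# Hypothetical, simplified tokenizer: lowercase letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())


class Concordance:
    """Toy concordance: word -> document id -> list of word positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))
        self.total_words = 0  # running count of indexed word occurrences

    def add_document(self, doc_id, text):
        # Every word occurrence adds one position entry, so the index
        # grows with the number of words, not with file size in bytes.
        for pos, word in enumerate(tokenize(text)):
            self.postings[word][doc_id].append(pos)
            self.total_words += 1

    def search(self, *words):
        """Return the ids of documents that contain all the given words."""
        doc_sets = [set(self.postings.get(w.lower(), {})) for w in words]
        return set.intersection(*doc_sets) if doc_sets else set()

    def utilization(self, recommended_max=100_000_000):
        """Fraction of an assumed recommended word limit already used."""
        return self.total_words / recommended_max


if __name__ == "__main__":
    c = Concordance()
    c.add_document("a.pdf", "Outliners store notes")
    c.add_document("b.pdf", "Notes about PDFs and notes")
    print(c.search("notes", "pdfs"))   # {'b.pdf'}
    print(c.total_words)               # 8
    # The figures from the post: 82 million words against ~100 million.
    print(f"{82_000_000 / 100_000_000:.0%}")  # 82%
```

Storing positions (not just document ids) is what makes phrase and proximity searches possible, but it is also why the index scales with the total number of words in the collection.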


Posted by MadaboutDana
Feb 14, 2019 at 09:32 AM


This is a very interesting question.

I am more and more wary of putting vast amounts of documentary data into any single database. That’s why I use Curiota for general information gathering.

My Curiota collection is about 7GB and steadily growing – but all files are stored separately, and I believe Curiota uses Spotlight as its search engine - it’s very fast and efficient, so I’m not complaining.

I have about 3GB in Notebooks - which, again, stores multiple files rather than creating databases - and I believe also uses Spotlight for searching. This works well, but isn’t very refined (no highlighted hits in the general search; you have to do separate searches in each document to isolate specific search terms).

My largest DEVONthink database is about 5GB, and while the search function is excellent, moving to the first “hit” in a large PDF can take a little time. Once DEVONthink has sorted itself out, it moves from hit to hit within a document extremely fast. But the initial loading takes a few seconds. However, nothing else has DEVONthink’s precision search facility…

... apart from FoxTrot, which is excellent, and in itself a very good argument for preserving data in separate files. I use the Pro version, which has an excellent Preview tool and moves from hit to hit like greased lightning. I can thoroughly recommend FoxTrot, even though they’re not doing a great job of marketing themselves. FoxTrot also has an iOS companion, rather ingeniously using just the text indexes created by the desktop version rather than syncing the entire mass of files. I can’t say I use it, but it’s a neat solution to a tricky problem.

The advantage of storing multiple separate files rather than relying on databases is the much greater ease of sharing across networks. Huge databases are notoriously difficult to transfer/sync, and awkward to back up. This isn’t such an issue when your corpus consists of lots of individual files, and the search index is held separately (either in Spotlight or in some proprietary format, doesn’t really matter).



Posted by Hugh
Feb 14, 2019 at 12:51 PM


I have about 11 GB in 32 DEVONthink databases (all of which used to be contained in three databases, but I split them a couple of years ago to make it easier to sync the data with DEVONthink To Go). I also have quite a few GB (probably more than 50) in the macOS file system, which I am currently sorting through with a view to moving more of it into DEVONthink over time. Most of it is PDFs.


Posted by Hugh
Feb 14, 2019 at 12:52 PM


Oh, and I endorse Bill’s opinion of FoxTrot.

