RFC - New Software Project: Infosqueezer

Posted by Lothar Scholz
Sep 11, 2019 at 09:12 PM

> Sounds interesting. And you have a track record of creating and
>maintaining a stable app :-)

No, I do not. But I think we have all learned never to buy software on promised features and timelines, only on what is available right now.
Products come and go, whether from a single developer or a big corporation. This is not a Kickstarter project.
I am not asking for your money in advance; I am asking for your thoughts.

I’m an experienced programmer. I have been programming since I was 14 and hit my 50th birthday a few months ago.
So I think I’m at least more qualified than the guy from Polywick Storyserver.

Oh yeah, my German computer science master’s thesis was a search engine for Usenet news. It was used by the once popular German search
engine called “Fireball” in the early days of the internet in 1998. And my interest in information processing has never stopped since.

>First question(s): You talk about a database. Will the data be stored in
>a proprietary file format? What about the PDF files and HTML data that
>can be added? Where will they be stored? And what about images, audio,
>video, equations…?

There is no “database”. I like the NoSQL “movement” because it has shown the world that SQL and relational databases are not the only way to do things.

I have developed a preprocessed format to store the markup text and index the data field / hashtag parts. This is good enough. The markdown of cards and outlines will be kept completely in memory (mmapped, so the system can swap it out under memory pressure) without special indexes. The data size is hardly a problem. It may be a few hundred megabytes, but even a few gigabytes will be OK. Just remember that all threads and messages on this board together are less than 20 MB in size. So people often overestimate this a lot.
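
To make the idea concrete, here is a minimal sketch of memory-mapping a markdown file and indexing only its hashtags. It is not the actual Infosqueezer format: the file name `cards.md`, the hashtag pattern and the function name `build_hashtag_index` are assumptions for illustration only.

```python
import mmap
import os
import re
import tempfile

# Assumption: a hashtag is '#' directly followed by ASCII word characters.
HASHTAG = re.compile(rb"#(\w+)")


def build_hashtag_index(path):
    """Map each hashtag to the byte offsets where it occurs in the file."""
    index = {}
    with open(path, "rb") as f:
        # Read-only mapping: the OS can drop these pages under memory
        # pressure and reload them from the file on demand.
        # Note: mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in HASHTAG.finditer(mm):
                tag = match.group(1).decode("utf-8")
                index.setdefault(tag, []).append(match.start())
    return index


def demo():
    """Write a tiny hypothetical card file and index it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cards.md")
        with open(path, "wb") as f:
            f.write(b"# Card one\nNotes on #search and #index.\n\n"
                    b"# Card two\nMore on #search.\n")
        return build_hashtag_index(path)
```

The text itself stays unindexed; only the small hashtag table is built, and everything else is a plain scan over the mapped bytes.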

By the way, exactly this question is why I asked here in February this year: https://www.outlinersoftware.com/topics/viewt/8580

The data itself is written generationally, so only the modified delta is stored, which reduces write operations on the SSD.
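
One way to picture generational writing is an append-only log where each generation holds only what changed. This is a rough sketch under my own assumptions (JSON lines, the names `write_generation` and `load_state`), not the real on-disk format.

```python
import json
import os
import tempfile


def write_generation(log_path, previous, current):
    """Append one generation holding only changed, new or removed cards."""
    delta = {
        "set": {k: v for k, v in current.items() if previous.get(k) != v},
        "del": [k for k in previous if k not in current],
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(delta) + "\n")
    return delta


def load_state(log_path):
    """Rebuild the latest state by replaying every generation in order."""
    state = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            delta = json.loads(line)
            state.update(delta["set"])
            for key in delta["del"]:
                state.pop(key, None)
    return state


def demo():
    """Write two generations; the second stores only the edited and new card."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "cards.log")
        gen1 = {"a": "first card", "b": "second card"}
        gen2 = {"a": "first card", "b": "second card, edited", "c": "third card"}
        write_generation(log_path, {}, gen1)
        delta = write_generation(log_path, gen1, gen2)
        return delta, load_state(log_path)
```

The unchanged card "a" is never rewritten in the second generation, which is the point: write volume follows the size of the edit, not the size of the data.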

Because the program will run purely in single-user mode on your own local database on your SSD, there is no need for database optimizations. We now have disks with transfer rates of 3 GB/sec and CPUs with 40 GB/sec of memory throughput and 8 or more cores in mainstream desktops and even phones. It’s time to use them.
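
A back-of-the-envelope calculation shows why a plain scan is plausible at these sizes. The sketch below assumes the best case, where the full 3 GB/sec disk rate is actually reached; the helper name `scan_seconds` is made up for illustration.

```python
MB = 10**6
GB = 10**9


def scan_seconds(size_bytes, bytes_per_second):
    """Best-case time to stream size_bytes at a given throughput."""
    return size_bytes / bytes_per_second


def demo():
    """Scan times at an assumed 3 GB/sec for three data sizes."""
    disk_rate = 3 * GB
    return {
        "20 MB (this board)": scan_seconds(20 * MB, disk_rate),
        "300 MB": scan_seconds(300 * MB, disk_rate),
        "3 GB": scan_seconds(3 * GB, disk_rate),
    }
```

Under that assumption, 300 MB streams from disk in about a tenth of a second and 3 GB in about one second, and data that is already in memory is faster again.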
The program will not be cloud based, but I want to implement a peer-to-peer synchronization feature or an on-premise synchronization server.

PDF and HTML will be stored externally and so will any full text index. HTML snapshots are stored in a proprietary format to eliminate duplicate items.
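
Eliminating duplicate items across HTML snapshots can be sketched as a content-addressed store, where identical resources (a shared stylesheet, a logo) are kept once under their hash. This is only an illustration of the general technique; the class `SnapshotStore` and its in-memory layout are assumptions, not the proprietary format.

```python
import hashlib


class SnapshotStore:
    """Keeps each distinct resource once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}      # hex digest -> resource bytes
        self.snapshots = {}  # snapshot name -> {resource path: hex digest}

    def add_snapshot(self, name, resources):
        """Store a snapshot given as {resource path: bytes}."""
        manifest = {}
        for path, data in resources.items():
            digest = hashlib.sha256(data).hexdigest()
            # setdefault stores the bytes only if this digest is new.
            self.blobs.setdefault(digest, data)
            manifest[path] = digest
        self.snapshots[name] = manifest

    def read(self, name, path):
        """Return the bytes of one resource in one snapshot."""
        return self.blobs[self.snapshots[name][path]]


def demo():
    """Two snapshots share a stylesheet, so 4 resources need only 3 blobs."""
    store = SnapshotStore()
    css = b"body { margin: 0; }"
    store.add_snapshot("page1", {"index.html": b"<h1>One</h1>", "style.css": css})
    store.add_snapshot("page2", {"index.html": b"<h1>Two</h1>", "style.css": css})
    return store
```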

I know very well that some people here love to have their data in the file system as normal markdown so that it can be accessed via Spotlight etc. Therefore I thought about storing a duplicate of the data in the filesystem, or about the very overengineered but fun idea of implementing a custom user file system that gets mounted via FUSE and could provide very interesting access patterns to the stored data, just in case anyone wants to run a script on the files or import them elsewhere. Anyone old enough to remember the MH mail client? That was nerd fun. But there is no FUSE on Windows, so I doubt it will happen.
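
The simpler of the two ideas, a duplicate of the data as plain markdown files, could look roughly like this. It is a sketch only: the card structure (title mapped to markdown text) and the names `safe_filename` and `export_cards` are assumptions, and title collisions or empty titles are not handled.

```python
import os
import re
import tempfile


def safe_filename(title):
    """Turn a card title into a file name that works on common file systems."""
    # Collapse every run of characters other than word characters and '-'.
    return re.sub(r"[^\w\-]+", "_", title).strip("_") + ".md"


def export_cards(cards, directory):
    """Write each card as its own .md file and return the paths written."""
    os.makedirs(directory, exist_ok=True)
    written = []
    for title, markdown in cards.items():
        path = os.path.join(directory, safe_filename(title))
        with open(path, "w", encoding="utf-8") as f:
            f.write(markdown)
        written.append(path)
    return written


def demo():
    """Export two hypothetical cards and read one back."""
    with tempfile.TemporaryDirectory() as tmp:
        cards = {
            "Meeting notes: 2019/09": "# Meeting notes\n#project kickoff\n",
            "Reading list": "# Reading list\n- item\n",
        }
        paths = export_cards(cards, tmp)
        names = sorted(os.path.basename(p) for p in paths)
        with open(os.path.join(tmp, "Reading_list.md"), encoding="utf-8") as f:
            content = f.read()
        return names, content
```

Files written this way are ordinary markdown, so Spotlight, scripts and other tools can read them without knowing anything about the program.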

Video and audio ... they will be implemented as simple file links; nothing else is on the agenda at the moment.

For equations, I looked at how ConnectedText handles LaTeX. It is open source and I think I could integrate that. It’s not on my agenda at the moment either, but I would say it has a much higher probability of getting onto my agenda than many other features. Tables and equations will be added in the second round of the markdown editor. But this is 2+ years in the future.