Slightly Off-Topic: Organizing offline html files

Posted by Mirce
Oct 23, 2020 at 01:28 PM


Although not strictly related to the main topics of this forum, I would like the input of fellow forum members on how to organize offline html pages.

Since the excellent Firefox add-on Scrapbook X is dead (you can still run it on Waterfox or similar browsers, but its days are numbered) and its successor WebScrapbook requires a PhD to make it work like the old Scrapbook, I have been searching for alternatives to save and organize interesting web articles that I find.

Regarding the saving part, I have settled on SingleFile, a Chrome and Firefox add-on which captures the whole page in one html file (including all elements: pictures, styles, etc.). It is much cleaner and more practical than the normal browser “Save as” / HTML file approach, or printing to PDF.

However, now that I have collected several dozen of those web pages, I am looking for a way to organize them. By organizing, I mean having a sort of TOC and ideally a search option (full-text search across the articles), maybe even some way of tagging the articles by topic.

I have tried HelpNDoc (a help authoring application which exports web sites that you can use offline, with a search option). However, due to the “compression” (URI, base64, whatever) of the SingleFile files, the imported files contain only the text; the pictures and the overall styles are missing.
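The “compression” mentioned above is most likely inline embedding: SingleFile stores images inside the page as base64 data URIs instead of separate files, and some importers ignore those. The following is only a minimal sketch, assuming the images appear as src="data:image/...;base64,..." attributes. The function name extract_images and the output file naming are hypothetical, not part of SingleFile or HelpNDoc, and images embedded in other ways (for example inside CSS) are not covered.

```python
import base64
import re
from pathlib import Path

# Matches src="data:image/<type>;base64,<payload>" with single or double quotes.
DATA_URI = re.compile(
    r"""src=(["'])data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=\s]+?)\1"""
)


def extract_images(html_path, out_dir):
    """Write each embedded base64 image to out_dir and return the new paths."""
    html_path = Path(html_path)
    html = html_path.read_text(encoding="utf-8", errors="replace")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for number, match in enumerate(DATA_URI.finditer(html), start=1):
        # Use the MIME subtype as file extension, e.g. "svg+xml" -> "svg".
        extension = match.group(2).split("+")[0]
        target = out / f"{html_path.stem}_{number}.{extension}"
        target.write_bytes(base64.b64decode(match.group(3)))
        written.append(target)
    return written
```

Calling extract_images("article.html", "article_images") would write the embedded pictures next to the page as ordinary files. Whether HelpNDoc then picks them up is untested.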

Does anybody have any ideas on how to organize those files?


Posted by Franz Grieser
Oct 23, 2020 at 02:13 PM


Thanks for the hint to SingleFile. Great add-in.

You can combine it with NotebooksApp for Windows and Mac. However, NotebooksApp does not display the images included in the SingleFile html file. But I can select the “Show in Explorer” command in NotebooksApp and then double-click the html file to open it in Firefox/Chrome.


Posted by Kinook
Oct 23, 2020 at 02:47 PM


You could import the files (store them, link them, or use folder synchronization) into Ultra Recall: https://kinook.com/UltraRecall/

On my computer, the SingleFile extension saves the files to C:\Users\user\Downloads

In UR, I created a .urd file at C:\Users\user and added a folder item with a URL of Downloads, then used Item | Synchronize to import the files into UR, where the files can be viewed, searched, tagged, etc.


Posted by Mirce
Oct 23, 2020 at 02:58 PM


You’re welcome, glad that it is useful for other people.

Thank you for the tip regarding NotebooksApp. I suppose it is the version by Alfons? It is a pity that it also doesn’t show the saved page as it was (i.e. the pictures are missing).

Some more info (maybe useful to other people): I also tried to manage the html files with an app called Snap2Html.
This is basically a program to index the contents of external HDDs and produce a nice html file with a search option (in just one html file!). It also allows you to link to the files directly.
So, what I was doing was the following: I sorted the web-page html files into thematic folders, ran Snap2Html on the top folder and chose the option to link to the files. The html file exported by Snap2Html became an entry point or index of my collection, where even the file names could be searched. The linked web articles open in the same browser window in all their glory, with pics and styles.
However, this approach offers no full-text search (on the contents of the individual web articles) and no way of tagging.
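The missing full-text search could be approximated with a small script that reads the saved pages directly and leaves them untouched. This is only a minimal sketch, assuming the pages are .html files somewhere under one top folder. The names TextExtractor, page_text and search are hypothetical and have nothing to do with Snap2Html or SingleFile. It rereads every file on each query, so it is only meant for collections of modest size.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def page_text(path):
    """Return the visible text of one saved page with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(Path(path).read_text(encoding="utf-8", errors="replace"))
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def search(top_folder, query):
    """Return paths of saved pages whose text contains every word of query."""
    words = query.lower().split()
    hits = []
    for path in sorted(Path(top_folder).rglob("*.html")):
        text = page_text(path).lower()
        if all(word in text for word in words):
            hits.append(path)
    return hits
```

For example, search("C:/Articles", "web archive") would list every saved page under that folder containing both words. The thematic folder names can stand in for simple tags, since each hit is returned with its full path.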

I will keep on looking, and would be grateful if you all could point me to some other solutions or workarounds.


Posted by Mirce
Oct 23, 2020 at 05:13 PM


Thank you for your suggestion. I see that you also use SingleFile, so I suppose the saved html files are shown correctly in the UltraRecall viewer (with pictures, styles, etc.)?
UR could be a solution, but as I see it, if I only link to the files from the urd database, I won’t have full-text search, but I can tag and manage the collection while the articles remain as they are (so I am not locked in).
If I import them into the database (urd file), I will have full-text search, but the files will be locked in. Given that the html files saved with SingleFile are rather big, what about the scalability of the database format? How big can it get without slowing down UltraRecall?

After the Firefox/Scrapbook X experience I am very hesitant to rely on a locked-in solution.

Given the ephemeral nature of web sites, I am really surprised that there are so few possibilities to archive and manage interesting web articles, at least for the average user (I am aware of the WARC format and various other approaches, but they lack the simplicity and straightforwardness of Scrapbook or SingleFile).
How do the other members of this forum keep interesting stuff found on the web?

