The joys of web archiving

Posted by MadaboutDana
Aug 14, 2014 at 12:14 PM


I’m an obsessive collector of web articles and pages, and am always on the lookout for tools that enable me to collect/extract/analyse/search through/archive/organise web pages of all kinds.

On the PC, my favourite apps were Notebooks and OneNote – I have extensive web page collections in both.

On the Mac, however, there is a positive embarras de choix (an embarrassment of choice), not least because so many apps enable ‘Print to PDF’ services in the standard system print dialog box.

I started with Growly Notes but, as mentioned elsewhere, found the search function too slow. I also found it surprisingly difficult to move pages from the scrapbook (where they are ‘printed’ by default) into other notebooks. A shame, as I love the app otherwise.

So I moved on. I discovered Yojimbo, which is remarkably useful and easy to use, and have already saved several hundred pages there. I also experimented with Together. Both of them have iOS apps, and this cross-platform support was a key factor in choosing them.

But their iOS apps are not really very good. Yojimbo’s iOS search function is rudimentary, and Together too often fails to synchronise at all. I know the developer of Growly Notes is working on an iOS app, but it won’t appear for a while yet.

What about Curio? Or Scrivener? Both support web archives (Scrivener has a wonderful ‘Add web page’ function, Curio has its ‘Sleuth’ function).

But Curio is unlikely to appear as an iOS app in the near future, or even in the medium term. And Scrivener’s lovely developer has just posted a lengthy blog article detailing the trials and tribulations of producing an iOS app that does full justice to the desktop version (worth a read: http://www.literatureandlatte.com/blog/?p=405), and explaining that it’s unlikely to be available until 2015 (although it sounds as if it’ll be brilliant, especially on iPad).

So. A bit frustrating. And then I discovered Stache.

Now, I have Stache on my iPad, but I haven’t yet experimented with the synchronisation function – I shall be doing that in the near future. But the Mac app is so splendid I thought I would write about it anyway.

Stache is similar to Yojimbo, except that it’s optimised for web archiving. It has nice little extensions for almost any browser (I have the ones for Safari and Opera, my preferred web browser for research purposes, and they both work extremely well, even when Stache itself isn’t open). It saves web pages as screenshots but also as web archives, which is amazing. And its search function is extremely quick: it doesn’t highlight search terms as such, but it zeroes in on the matching web pages instantaneously. It also has a very good tagging facility, although I’m still rather baffled by the ‘collections’ concept (collections appear to be like folders, but don’t seem to work very well).

So it’s got flaws – so what? It’s unbelievably quick and convenient, and you can export web pages either as web archives or as screenshots, so your data isn’t trapped in Stache.

It also appears to be in ongoing development, which is encouraging.

For researchers, academics, writers and anybody who has to build up knowledge bases at speed, I can thoroughly recommend it. I shall continue to explore it (especially the cross-platform aspect) and update my findings as and when.



Posted by Hugh
Aug 14, 2014 at 02:00 PM


What follows may or may not be of use to relative Mac newbies - but here goes.

Seven years of experience on the Mac has taught me to seek out individual applications for individual tasks, rather than the sort of “Jack-of-all-trades” software that I focused on during the umpteen years I used a PC. Of course this approach may not be straightforward. It may not be immediately obvious precisely what tasks need to be done. And because it is in the interests of those who market programmes to suggest that they are very capable of carrying out one hundred-and-one different jobs, it may also not be immediately obvious which tasks particular programmes are best at doing.

For example, when I bought my Mac I also bought a well-known programme that looked as if it could perform several of the roles I wanted covered. This was a mistake. It could not. It was a false economy. It was not even very good at the one thing it had originally been designed for. So my view evolved that, at least on the Mac, one should buy programmes that are more narrowly targeted. On the Mac this policy of using specialist programmes for specialist tasks is helped by several characteristics of Mac applications and the OS X operating system: for example, the “Services” menu in every application, the ease of automating tasks with programmes such as Automator, the “Print to PDF” function every Mac programme has, and the similarities between the user interfaces of all, or almost all, Mac applications. In the long run, I don’t believe this policy of seeking out specialist programmes, each for one or two major types of task, has proved more costly for me financially, because specialist programmes do particular tasks better and faster than the Jack-of-all-trades generalists.

So I’ve used a variety of applications for short-form writing, most recently Ulysses and Bean, although Scrivener can do short-form too (I still keep all my business letters in it). And although Scrivener can fulfil the function of web-page collection and storage, there are several much better ways of doing this (see below), at least when the volume of data to be held increases. Scrivener’s raison d’être is long-form writing: that is what it has the tools for, and that is what I now chiefly use it for.

For storage and file/document management there is, as you’ve found, Yojimbo. As far as I’m concerned, Yojimbo’s main weakness as a file/document manager is the absence of nested folder hierarchies. But there are several other file/document storage managers of the same type that do use nested folders, including - as I think you mention - Together, EagleFiler, iDocument and of course DevonThink (in its Pro Office version, the most sophisticated - and expensive - of the group). Some of these encourage their use for writing too, but other than for off-the-cuff jobs I wouldn’t recommend them, certainly not for long days in the writing saddle.

And then there’s the Finder. On the Mac, the Finder itself is now more of a rival to those file/document managers than it used to be. Now that the Finder can handle tags under Mavericks, it has become sensible not to put files or web archives in separate “cages”, as the programmes I’ve listed above usually do, but simply to keep them in the operating system’s own file system. (If you want to enhance the Finder’s capabilities further, TotalFinder, XtraFinder, Path Finder, Leap or Yep can be very useful add-ons.) Keeping files and folders in the Finder also makes it slightly more straightforward to export files or folders to the Cloud and/or tablets. (It’s notable that David Sparks, in his ebook on going paperless, which is worth reading for all sorts of reasons even if you’re not going paperless, also recommends “staying in” the Finder for file and document storage.) Also, as you learn more about the Mac, you’ll hear more about Hazel, a wonderful little application that can do a lot of your filing for you and save you considerable time and effort. But Hazel won’t work (yet) inside the folder systems of document/file managers like Together and DevonThink. So far, it only works within the Finder, which is yet another reason to stay there.

Finally - about Curio. Curio is more or less a one-off, its closest rival perhaps being Growly Notes - but even so GN is not very close at all. (I’d hoped that OneNote for the Mac would be competition for Curio, but was sadly disappointed when OneNote was launched.) Curio can do storage, up to a point, but it is not about storage. Curio can do writing in a very limited way, but it is not about writing. Curio can do mind maps, but it is not really about mind maps. As I see it, Curio is for visualising ideas in numerous ways: in my case before the writing stage, indeed largely before the outlining stage. It’s a terrific tool, but if you don’t have that sort of need, Curio’s (relatively high) cost may not be worth paying for you.


Posted by Hugh
Aug 14, 2014 at 02:08 PM


I should, of course, have added: thanks, Bill, for the recommendation of Stache. But can it work offline?


Posted by MadaboutDana
Aug 14, 2014 at 02:16 PM


Excellent, Hugh – thanks. You mention EagleFiler, which is an interesting but horribly flawed app with a totally useless search function (it finds hardly anything).

But your contention that one should stick with Finder, or store information at the folder/file level, is interesting. This is, of course, what Notebooks does. And in fact, despite my truly obsessive CRIMPing tendencies, this is also what I do myself: since discovering the astonishing FoxTrot Pro, I keep all of my key reference documents in simple Finder folders organised by client, creating a full-text index for each one as required.

FoxTrot Pro is far more powerful than the PC equivalent, Copernic Desktop (or the other PC competitors like X1). I have developed a huge respect for it. Now if it also allowed you to make spontaneous notes to accompany files, it would be, no doubt, the perfect solution (others would probably want to add tagging etc.). Of course you can do that anyway, using other software – there are so many lovely notetakers for the Mac, such as Ulysses, Moccanote, Metanota Pro, Write, etc., many of which can be instructed to keep notes in specific (Finder) folders.

Which means our working methods are not unalike. I use Curio/Scrivener (depending on my whim at any given moment) to create what you might call “projects”, often including source documents as well as draft translations, notes, reference docs such as PDFs, web pages and the like. But when I put together my resource archives, using final versions of the source and target documents, I always end up storing them in Finder, and not in Curio or Scrivener or any other all-in-one solution. In Finder, they can be accessed by anything – so if I need specific resources for a specific project, I can always pull them into Curio/Scrivener/Other Authoring Tool of your Choice, and if necessary, even leave them there.

It’s the sheer flexibility I love! And it’s tools like FoxTrot (or that elegant, super-cheap alternative EasyFind, ironically provided for free by the makers of DevonThink) – not to mention the improved, upcoming version of Spotlight – that help maintain this flexibility.



Posted by MadaboutDana
Aug 14, 2014 at 02:16 PM


Hm. That’s a good question about Stache. Not sure! I’ll check it out…


