speech input revisited
Posted by jimspoon
Jan 23, 2013 at 07:22 PM
Just wanted to inquire about everybody’s experiences (if any) in using voice memos as part of their “information processing”.
I don’t think there is any quicker way to capture information than to grab my voice recorder and quickly dictate a note. The problem is how to get that speech transcribed into text and then incorporated into my information database.
I’ve experimented with using my Android phone and tablet for the purpose. Of course there are many ways to capture voice with different Android apps. And speech recognition is built into the keyboard. If you tap the mic key on the stock Android keyboard, it uploads your speech on the fly to the Google servers, which transcribe the speech; and after a short delay the resulting text is sent back to the phone and put into the app you’re working with. The accuracy is quite good. I also use the Swype keyboard which has the Nuance Dragon Go speech recognition built into it - it works a bit differently from the Android keyboard speech recognition. Hard to say which produces better results.
Apps can add functionality to what is built into the keyboard. For example, Evernote for Android gives you two different ways to input speech. You can make an audio note - in this case no speech recognition is performed, but the recording shows up in the note as an attached AMR file (very small). Or you can use the “Dictate” function. In this case Evernote uploads the voice recording to the Google servers - the text is put back into the Evernote note, and the sound file is also saved as an attachment to the note. This is very valuable, because it enables you to listen to the recording if the transcription wasn’t entirely accurate.
Problems - the transcription accuracy isn’t always the greatest, and there are too many steps and screen taps involved for the user. You have to find the icon to start the app. Granted, Evernote lets you put a widget on the screen that immediately starts a “dictation” (with speech recognition) note. Then you have to wait for the servers to send back the text, and verify that it’s accurate. Right now Evernote has a nasty bug where speech is clipped out of the sound file that is inserted into the dictation note. So if you play back the sound in order to correct an inaccurate transcription, some of the recorded sound is simply missing - these words are lost.
Sometimes speed is of the essence, and you don’t have time to go through all that. In many cases it’s better to make a quick recording on a dedicated voice recorder. Right now I might make many recordings a day into my Sony voice recorder, then I connect it later in the day to my computer via USB. I copy the new files to a voice memos folder, and I use Bulk Rename Utility to rename them with a yyyy.mm.dd.hh.mm.ss + original name format. Then I browse through them in xplorer2, listening to the recordings in xplorer2’s quick viewer pane. I type notes into Ecco. Unfortunately this process is too time-consuming as well, and I am looking for a way to get it done quicker.
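For anyone who would rather script the renaming step than use a GUI tool, here is a minimal Python sketch of the same idea. It makes some assumptions that are not in the post: that each file's modification time reflects when the memo was recorded (this depends on the recorder and on how the files were copied), and that a space separates the timestamp from the original name. The function names are made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def timestamped_name(path: Path) -> str:
    """Build 'yyyy.mm.dd.hh.mm.ss original-name' from the file's modification time."""
    # Assumption: the modification time is the recording time, in local time.
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    return f"{modified:%Y.%m.%d.%H.%M.%S} {path.name}"


def rename_memos(folder: Path) -> list[Path]:
    """Rename every regular file in `folder` and return the new paths."""
    renamed = []
    # sorted() takes a snapshot of the directory first, so renaming while looping is safe.
    for path in sorted(folder.iterdir()):
        if path.is_file():
            target = path.with_name(timestamped_name(path))
            path.rename(target)
            renamed.append(target)
    return renamed
```

Running it twice on the same folder would add a second prefix, so it is best pointed at a folder that holds only freshly copied recordings.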
I also use Dragon NaturallySpeaking. So far it is not very accurate in transcribing the voice memo files that I’ve copied over to my PC. Also, the Premium version of DNS does not do batch transcription of voice memo files. I’d have to get the very expensive Professional version for that - Nuance never misses an opportunity to gouge you. It is possible to drag and drop one voice file after another into the “Dragon Pad” - in this way you can get a transcription of many different recordings into a single file. And the Dragon Pad has a very nice feature - you can highlight a selection of the transcribed text, right click, and click “Play this Back”, and DNS will play back the corresponding audio, no matter which audio file it came from.
I wish DNS would find some way of capturing the filesystem time stamp from the voice memo file, and incorporate that into the transcription. Right now, if I wanted to have DNS include a timestamp on the file, I’d have to look at my watch and dictate it into each recording. This is inconvenient and I will often forget to do that.
So anyway, that’s how I’m doing it right now. If anybody has come up with a better system or app to capture, transcribe, and incorporate voice input into your knowledge base, I’d love to hear about your experiences.
jim
Posted by Vincek
Jan 23, 2013 at 08:35 PM
Over the years I experimented with at least 3 earlier versions of Dragon NaturallySpeaking.
V12 was my 4th try—I figured eventually it would be good enough. And it is for my uses. Very few transcription errors.
A few hints:
* Get a good microphone. There are ratings on the company website.
* If you record, do so at the highest bit rate possible—don’t try to scrimp on space.
* Train the software. Earlier versions required a LOT of training for little improvement. V12 doesn’t require a lot but there’s a lot of bang for the buck.
* Keep it simple. There must be hundreds of commands in the software. I find that simply using “new paragraph” and “new line” is 90% sufficient for a good first draft. I then edit with a keyboard.
Posted by Gary Carson
Jan 27, 2013 at 03:46 PM
I’ve been using dictation for years and I’m trying to get to the point where I do all of my work or most of it anyway by voice. I’m dictating this right now, for example, using Dragon NaturallySpeaking version 10 and a Samson Airline 77 wireless headset microphone.
Anyway, dictation has its advantages and disadvantages. Its biggest advantage, like you said, is that it is the fastest way to capture information available right now. I carry a voice recorder around with me everywhere I go these days and I use it to dictate memos, task lists, contact information, brainstorming sessions, rough drafts for both fiction and nonfiction, outlines, the list goes on and on. I dictate while I’m driving around as well, using a headset microphone. Dictating with a voice recorder is so mobile that you can do it just about anywhere.
The disadvantage of using dictation is that audio files are linear. You can’t skip around in an audio file as easily as you can visually scan through a document. Also, as far as I know, there is no way to search an audio file for a specific spoken word or phrase. You can insert index marks into audio files to mark your position, but if you don’t know what you’re likely to need to look up later on, you probably won’t think to index that particular information. The best way to search an audio file is to transcribe it and then search the document.
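Once the recordings have been transcribed, the "search the document" step is easy to automate. The sketch below is only an illustration, and it assumes something not stated in the post: that each transcript has been saved as a plain-text `.txt` file in a single folder. The function name `search_transcripts` is hypothetical.

```python
from pathlib import Path


def search_transcripts(folder: Path, phrase: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line text) for every line containing `phrase`."""
    needle = phrase.casefold()  # case-insensitive comparison
    hits = []
    for path in sorted(folder.glob("*.txt")):
        # errors="replace" keeps the search going if a file has odd encoding.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                hits.append((path.name, number, line.strip()))
    return hits
```

This only finds exact phrases, so a word the recognizer got wrong in the transcript will not be found.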
I’m still on Dragon NaturallySpeaking version 10 (premium), but apparently there is a way to do batch processing of files with the version 12 premium edition IF you’re using one of the Olympus professional grade recorders like the new DS 7000. If you’re really serious about getting into dictation, you should get one of these professional recorders. They’re expensive, but well worth it, and they’re the only recorders that offer a full range of editing functions like insert and append.
Using Dragon NaturallySpeaking, you should be able to get around 98% accuracy with your voice recorder transcripts (after training a dedicated recorder profile.) You’ll need to get at least the premium version of Dragon to transcribe dictation made with a recorder. If your accuracy is much lower than 98%, then there’s something wrong either with your hardware or your software or your dictation technique. 98% is about as good as you can expect, though it is possible to get close to 100% sometimes—anything higher than 98% is mostly a function of how clearly you can dictate.
As for getting dictated information into a personal information manager of some kind, really the only way to do this is to transcribe the dictation first and then simply move it into the PIM. I found, however, that I never really need to transcribe at least 50% of the dictation that I do every day. I just leave it on the recorder and play it back. For example, I record daily task lists, adding things that I need to do or want to remember as I go through the day, then I’ll play the recording back the next morning while I’m shaving and getting washed up. It works pretty well.
Dictation requires a certain amount of administration. There’s no way to avoid it. One thing I’ve found is that I can manually transcribe a lot of the dictation I do. For example, I try to keep my daily task lists down to a reasonable length. Once they start getting over five or six minutes long, I’ll play them back and transcribe them manually using either a pen and notebook or my laptop. Then I’ll zap the original list and record it over again, getting rid of all the tasks that I’ve completed. Also, I’ve learned to use a kind of verbal shorthand when I’m dictating stuff like this. The lists are really just memory cues. I try to keep them as short and to the point as possible.
Man, I have been going on and on here (something that’s easy to do with dictation). Anyway, I would really recommend using dictation as much as you can. It’s minimalistic and fast and mobile. A real timesaver.
Posted by jimspoon
Jan 28, 2013 at 04:36 AM
Hi Vince and Gary, glad to read about your experiences. I’ve been looking around and experimenting a bit. I browsed around on the knowbrainer forum and read one of Gary’s posts there.
I did see a reply from the knowbrainer people that you can do batch recognition of multiple recordings, even with DNS Premium rather than DNS Pro, if you are using the Olympus DS-7000 ($500) or DS-3500 ($400). A bit steep for me! DNS Professional is very expensive too - though DNS 11 Pro is available on Ebay for $200 - http://tmpl.at/XL1kTg .
I’ve been using MergeMP3 - http://download.cnet.com/Merge-MP3/3000-2169_4-10410936.html - to combine all my recordings into one, and then dragging and dropping the merged file on to DragonPad. This eliminates the need to drag files one at a time to DragonPad. Works well so far. And I can divide up the transcribed recordings by saying “new line” or “new paragraph” at the start of each recording.
So far I’ve just done the initial voice recorder profile I made by dictating a suggested text, and haven’t done any further training yet. The accuracy isn’t great right now, but I think certainly good enough for my limited purposes.
Unfortunately I’ll have to look at my watch and dictate my timestamps into each recording, instead of being able to use the file timestamp, but I can live with that.
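One possible workaround, sketched here in Python, is to add the timestamps after transcription rather than dictating them. This is not a DNS feature. It assumes that the transcript has been saved as plain text with one blank-line-separated paragraph per recording (for example by saying “new paragraph” at the start of each one), and that the recording times were collected in the same order the files were merged. The function name `stamp_paragraphs` is hypothetical.

```python
import re
from datetime import datetime


def stamp_paragraphs(transcript: str, recorded_at: list[datetime]) -> str:
    """Prefix each transcript paragraph with the time of the matching recording."""
    # Split on blank lines and drop empty pieces.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", transcript) if p.strip()]
    if len(paragraphs) != len(recorded_at):
        # A mismatch usually means a "new paragraph" command was missed or misheard.
        raise ValueError(
            f"{len(paragraphs)} paragraphs but {len(recorded_at)} timestamps"
        )
    stamped = [
        f"[{when:%Y-%m-%d %H:%M}] {text}"
        for when, text in zip(recorded_at, paragraphs)
    ]
    return "\n\n".join(stamped)
```

The function refuses to guess when the counts differ, since a silently shifted timestamp would be worse than none.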
Posted by Gary Carson
Jan 28, 2013 at 03:56 PM
I usually start each recording by dictating the date and time and giving a brief description of what the recording is about. Including the date and time is just a habit, though; it’s not really necessary. Most of the mid-range consumer-grade recorders and all of the pro-grade recorders come with software for downloading and managing audio files. This software will keep track of the timestamps for you. The Olympus dictation module, for instance, can be set up to display the time the recording was completed, the time it was downloaded, and a lot of other parameters like the file worktype, which is a key used to categorize the file.