Program with QDA (Qualitative Data Analysis) features? Coding/tagging blocks of text?
Posted by Fredy
Aug 7, 2012 at 04:18 PM
Editors are much undervalued. Most people don’t realize that one of the beauties of editors is that you have data you can work on in many different such tools, i.e. with editor 1 you do this, with editor 2 you do that - and you can automate / script almost everything, both within those editors and in the orchestration of such a workflow.
To whom it may concern, some hints:
- don’t be afraid of subroutines that have to check for things or process 10k lines, one after the other; a good editor will take only a few seconds for millions of such computations…
- this includes checking checkboxes and the like, i.e. you do (complete or partial) computations on a five-digit number of “lines” = records, all writing intermediate results into up-front “fields” of these lines, e.g. IF in line x … and … and … (this could be 50 or more checks, comparisons, whatever) THEN fill the “field” $e at the beginning of that line with “1”, ELSE fill it with (or leave it at) “0”, so you end up with a line starting with $e1;other field;other field, etc., or with $e0, and so on (see the first sketch after this list)
- the same then works for sorting / filtering, i.e. even if your “fields” sit within the “text” of these lines = records, you can SORT by these criteria: just check, and if the check succeeds, fill out a special “field” in front of your line; as said, the subroutine doing that special filling out (or a reset to zero there) would take perhaps 2 seconds for 10k lines, or even less on a modern PC
- you can do this with SEVERAL such “check fields” (or even with “de-doubled” = duplicated data “fields”: whenever a data “field” within a line becomes relevant for sorting, just duplicate it in front of the line), and you know you can sort by several criteria a, b, c: sort by c, then by b, then by a, and you get a body sorted by a, then b, then c - this works because each sort pass is stable, i.e. records that tie on the current criterion keep the order the previous pass gave them; combine this with filtering and you get only the relevant records (to check visually, to export, to print…), and in triply sorted order - remember, for tenfold sorting you can quickly write a routine to do it, then just fill in the criteria each time (second sketch below)
- such de-doubling of data is one way to avoid having to cope with leading zeros, for example: if you have 1- to 3-digit numbers, you need subroutines putting zeros in front of the 1- and 2-digit ones if you want to sort by “fields” containing such numbers, or if you want to “block sort”, i.e. sort by field contents AFTER such number “fields”; you avoid this by duplicating such “fields” containing “regular” numbers (= without leading zeros) into special “fields” where you put the leading zeros where needed, leaving the original fields intact, OR you could use duplicated number fields without leading zeros but with the original numbers PLUS, say, 1,000: if your numbers have 1 to 3 digits (1 to 999), add 1,000 within the special field, and you invariably get 4-digit numbers; the same goes for “original” data, on the condition that you never forget which range the “real” original data was in, i.e. subtraction is needed before publication… (third sketch below)
- I spoke of statistical analysis, when most editors have only rather basic number crunching. No problem, just remember: do everything as simply as possible WITHIN the records, i.e. put intermediate results there, and then “run” Excel or special software on these (but don’t try to do text processing within Excel, and XL Notes is a dead end - willing to put 10k text paragraphs into 10k of XL Notes’ Word files within Excel cells? See what I mean when I so often call things “amateurish”?)
- It goes without saying that lots of text analysis can be half-automated, i.e. if x near y and perhaps even z in the line, then…; and then you have that filtering in order to check visually (and undo any unwanted “yes” field settings made by such rules): it’s far better to have 200 such “fields” set automatically (or their content switched from 0 to 1 or whatever), and then have to check (in a filter table on the screen) and switch off 10 such “fields” / settings manually, than to code 190 of such records manually, out of perhaps 2,000, in a pro tool like CT or whatever
- Even your Excel results (or those of whatever your statistics tool is) can then be re-introduced into your editor file, be it in front (append), or by replacing the values of the corresponding “fields” there (let’s say the first dozen or so); here again, a combination of “block processing” and the Excel export format can do wonders (the fourth sketch below shows such a round trip).
- For 80 or 180 codes, an editor is brilliant, especially since whenever you see that you cannot keep your code $e but need codes $ea, $et, etc., there is “global replace”; and, as said - and that’s the thing non-editor users must first become aware of - 99 % of all processing is done WITHIN single lines, but for 10k lines it is done 10k times in a row: that’s what editors are for when doing data processing.
- Of course, for reading bits you toggle to “word wrap” (and buy a large screen anyway); the de-doubling of “fields”, btw, is a good way to “read” things = check visually for things on the screen, even for “field” contents whose original “fields” are further away “down the length of the line”; you could even have some 3 or 4 “read fields” near the front, into which at any given time you copy the contents of whatever fields you currently want to check.
- In my first post on this I meant: keep your original data preserved in its original form, i.e. if you have, let’s say, 50 conversations with 50 Mali women (or with whomever), create special header lines ##mw01 or whatever, and then one line for every paragraph of that conversation, in the form #mw01£001$code1$code2… up to #mw01£999$code1… for (in this example) up to 999 lines per conversation, etc., i.e. keep a “natural sort” of your original material you can revert to at any time - and of course, such an editor can easily number those lines / paragraphs of each conversation for you (fifth sketch below). So the very first such “fields” would be invariable, but as said, if you make sure these start “fields” are of equal length, character-wise, you can easily sort on the following fields
- Must I really add that you could use another special character whenever you must / want to combine several paragraphs into one line, one which allows you to dissect them automatically afterwards? The same goes for text formatting (within the original, to be preserved, or for better presentation afterwards) - people doing such stuff with CT don’t need the slightest hint from me on how to do this; within an editor it’s just the same.
- Etc., etc., there are lots more possible hints. As soon as you have grasped the strength of editors, much commercial software will become not only obsolete for you, but you will discard it as totally unacceptable: an editor will do whatever you want it to do for you; commercial software, most of the time, is “dumb crap” in comparison, or costs some Benjamins I’d be willing to spend… but then I’d fear that I’d quickly hit that wall beyond which the developers didn’t do their work the way I want their product to be - whilst with an editor, I add a line or two to the script, and I’m done.
- Re KEdit: For a start with filtering, this program is pure gold; it’s just that for more elaborate tasks I prefer better stuff.
- Re askSam: The beauty of that prog was that it was a “real application, ready for use”, and it had lots of editor-like features coming with such a “general public” program. Whilst an editor is an editor: beauty on the screen there is not.
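First sketch, for anyone who wants to see what such line-wise flagging looks like outside any particular editor’s macro language - a minimal Python sketch; the semicolon field layout, the file names and the example rule are assumptions for illustration, not anybody’s actual scheme:

    # Minimal sketch: set a leading flag "field" $e1 / $e0 on every record,
    # depending on arbitrary checks within that record. The semicolon layout
    # and the example rule are hypothetical; adapt them to your own scheme.
    def flag_records(lines):
        out = []
        for line in lines:
            fields = line.rstrip("\n").split(";")
            # Example rule: any number of checks, combined as you like.
            hit = "water" in line.lower() and "well" in line.lower()
            flag = "$e1" if hit else "$e0"
            if fields and fields[0].startswith("$e"):
                fields[0] = flag        # overwrite an existing flag field
            else:
                fields.insert(0, flag)  # or prepend a fresh one
            out.append(";".join(fields))
        return out

    with open("records.txt", encoding="utf-8") as f:
        records = flag_records(f)
    with open("records_flagged.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(records) + "\n")

Even with 50 checks per record instead of two, 10k lines go through this in well under a second - which is the whole point.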
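Second sketch, the multi-criteria sort plus filtering; the field positions are again made up. It relies on the sort being stable, which Python’s is:

    # Sketch: sort the flagged records by criteria a, b, c (field positions
    # are assumed). Each sort pass is stable, so sorting by the LAST
    # criterion first and the FIRST criterion last leaves the body ordered
    # by a, then b, then c.
    records = open("records_flagged.txt", encoding="utf-8").read().splitlines()

    def field(line, i):
        return line.split(";")[i]

    records.sort(key=lambda ln: field(ln, 3))  # criterion c
    records.sort(key=lambda ln: field(ln, 2))  # criterion b
    records.sort(key=lambda ln: field(ln, 1))  # criterion a

    # Filtering on the flag field from the first sketch:
    relevant = [ln for ln in records if ln.startswith("$e1")]

(A single pass with a tuple key, key=lambda ln: (field(ln, 1), field(ln, 2), field(ln, 3)), does the same; the chained form mirrors how you would do it by repeated sorts within an editor.)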
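Third sketch, the de-doubled, zero-padded (or +1,000-shifted) sort field from the leading-zeros hint; that the number sits at field position 2 and runs from 1 to 999 is, of course, an assumption:

    # Sketch: duplicate a numeric field (assumed at position 2, values
    # 1 to 999) in front of the record, either zero-padded or shifted by
    # +1000, so that a plain textual sort orders it correctly. The original
    # field stays intact.
    records = open("records_flagged.txt", encoding="utf-8").read().splitlines()

    def add_sort_key(line, pos=2, shift=False):
        n = int(line.split(";")[pos])
        key = str(n + 1000) if shift else str(n).zfill(4)  # always 4 digits
        return key + ";" + line

    keyed = [add_sort_key(ln) for ln in records]
    keyed.sort()  # a plain text sort now equals a numeric sort
    # Remember: strip the key field again (and, for the shifted variant,
    # subtract the 1,000) before publication.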
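Fourth sketch, the Excel round trip, with CSV as the interchange format; field positions and file names are, once more, assumptions:

    import csv

    records = open("records_flagged.txt", encoding="utf-8").read().splitlines()

    # Export: pull a few up-front "fields" of each record into a CSV that
    # Excel or any statistics package can chew on; column 0 is the record
    # index, so the results can be matched back later.
    with open("export.csv", "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        for i, ln in enumerate(records):
            w.writerow([i] + ln.split(";")[:3])

    # Re-import: replace the first fields of each record with the computed
    # results (a hypothetical results.csv produced from export.csv),
    # matched by that record index.
    with open("results.csv", newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            i, new = int(row[0]), row[1:]
            fields = records[i].split(";")
            records[i] = ";".join(new + fields[len(new):])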
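Fifth sketch, the numbering of original material from the conversations hint; the ##mw01 header convention and the £ separator are taken from the hint itself, while the raw file with blank-line-separated paragraphs is an assumption:

    # Sketch: turn the raw paragraphs of conversation mw01 into numbered
    # records "#mw01£001...", preceded by a "##mw01" header line. The prefix
    # has fixed width, so sorting on the following fields stays possible.
    def number_conversation(conv_id, paragraphs):
        lines = ["##" + conv_id]
        for n, para in enumerate(paragraphs, start=1):
            lines.append("#%s£%03d%s" % (conv_id, n, para.strip()))
        return lines

    with open("mw01_raw.txt", encoding="utf-8") as f:
        paras = [p for p in f.read().split("\n\n") if p.strip()]
    print("\n".join(number_conversation("mw01", paras)))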
But, Carrot, don’t you think that for purely aesthetic reasons here, you, like Daly and some others, should avoid endless “citations” when they ain’t of any use?