Any digitized (machine-readable) text can easily be converted
for TextQuest. Texts from word processors such as WORD, WordPerfect,
and others can also be converted quickly and easily. Six input formats
are available (2 with and 4 without control sequences).
In addition, up to 50 external variables of the text can be defined.
These can be numbers or strings of up to 10 characters each.
|
Word lists, word sequences, and word permutations |
A word list sorted alphabetically gives an overview of all strings occurring in
the text and their frequencies. Sort order tables can be used so that umlauts
and characters with diacritics or accents are sorted properly, and case folding
can be enabled or disabled. Strings can be excluded by their frequency
(using absolute or relative values) and by their length (in characters). A list
of exclusion words (STOP-words) can also be used; these strings are
excluded from further processing. The statistics provided include the TTR
(type-token ratio), and the dynamics of the TTR can be calculated for the whole
text or a sample of it.
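The word list and the TTR described above can be illustrated with a minimal Python sketch. It is not TextQuest's implementation: the function names `word_list` and `type_token_ratio`, the simple `\w+` tokenization, and the choice to compute the TTR after the exclusions are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_list(text, stop_words=(), fold_case=True, min_freq=1, min_length=1):
    """Count the strings of a text after optional case folding and exclusions."""
    tokens = re.findall(r"\w+", text)
    stop = set(stop_words)
    if fold_case:
        tokens = [t.lower() for t in tokens]
        stop = {w.lower() for w in stop}
    # Exclude STOP-words and strings that are too short.
    counts = Counter(t for t in tokens if t not in stop and len(t) >= min_length)
    # Exclude strings that occur too rarely (absolute frequency).
    return Counter({w: n for w, n in counts.items() if n >= min_freq})

def type_token_ratio(counts):
    """TTR = number of different strings (types) / number of strings (tokens)."""
    total = sum(counts.values())
    return len(counts) / total if total else 0.0

sample = "The cat saw the dog and the dog saw the cat"
counts = word_list(sample, stop_words=["the", "and"])
for word in sorted(counts):  # alphabetical word list with frequencies
    print(word, counts[word])
print("TTR:", type_token_ratio(counts))
```

Plain `sorted` orders by code point; a sort order table for umlauts or accented characters would need a custom sort key, which this sketch leaves out.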
|
Manipulation of word lists, word sequences, and word permutations |
In a content analysis, the word list is used to find strings that can serve
for building categories. The word list contains only single words,
not combinations of words. TextQuest can also generate word sequences and word
permutations, which makes it possible to define search patterns that consist
of more than one word or part of a word. Word sequences are sequences
of at least 2 words with a variable number of words. Word permutations
are two-word sequences in which each word of a text unit is combined with each word that follows it.
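The difference between the two constructions can be sketched in a few lines of Python. This is an illustration under assumptions, not TextQuest's code: the function names and the `min_len`/`max_len` parameters are hypothetical, and a text unit is assumed to be already split into words.

```python
from itertools import combinations

def word_sequences(words, min_len=2, max_len=3):
    """All runs of adjacent words that are min_len to max_len words long."""
    return [
        tuple(words[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(words) - n + 1)
    ]

def word_permutations(words):
    """Pair each word of a text unit with every word that follows it."""
    return list(combinations(words, 2))

unit = ["peace", "through", "strength"]
print(word_sequences(unit))
print(word_permutations(unit))
```

Word sequences only join neighbouring words, whereas word permutations also pair words that are not adjacent, such as `("peace", "strength")` here.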
|
Comparison of word lists, word sequences, and word permutations |
Word lists, word sequences, and word permutations can also be compared.
The following statistics are calculated: TTR, inclusive and
exclusive tokens, and a grouping of the tokens into words, numbers, and other strings.
If the context given by word sequences does not suffice, one can use KWIC
(Key-Word-In-Context) lists with a variable number of characters. Identifiers
can be suppressed to gain more context in each line. A line can be as long as you wish,
and if you route the KWIC results to a file, your text processor can format
this file so that a KWOC (Key-Word-Out-of-Context) list consisting of
multiple lines is built.
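A KWIC list with a variable number of context characters can be sketched as follows. The function name `kwic`, the `width` parameter, and the bracket layout are assumptions for illustration; TextQuest's actual output format is not described here.

```python
import re

def kwic(text, keyword, width=20):
    """Return one line per hit, with `width` characters of context on each side."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

text = "We want peace. Peace needs work."
for line in kwic(text, "peace", width=8):
    print(line)
```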
An index (or cross-reference list) can also be generated that shows the
identifiers for each occurrence of each string; interactive selection of entries
is possible.
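As a rough sketch of such an index, the following Python maps each string to the identifiers of the text units in which it occurs. The identifier values (`"A1"`, `"A2"`) and the whitespace tokenization are hypothetical; they only stand in for whatever identifiers a project defines.

```python
from collections import defaultdict

def build_index(units):
    """Map each string to the identifiers of the text units it occurs in."""
    index = defaultdict(list)
    for identifier, text in units:
        for token in text.lower().split():
            index[token].append(identifier)  # one entry per occurrence
    return dict(index)

units = [("A1", "peace and trade"), ("A2", "trade and security")]
index = build_index(units)
for string in sorted(index):
    print(string, index[string])
```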
TextQuest was originally developed for computer-aided content analysis. Search
patterns for a category system can be words, parts of words, word sequences, and
sequences of (parts of) words (so-called word root chains). These consist of
up to 6 strings that must occur within the same text unit; one can specify their
sequence and their distance. Wild cards may also be used. Ambiguous and
negated search patterns can be coded interactively on the screen. The coding
process can be controlled with several log files:
- file of text units containing potentially ambiguous search patterns
- file of uncoded text units
- file of text units with negations
- complete control of the coding process
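The idea of a word root chain can be sketched in Python as a check on one text unit. This is a simplified illustration under assumptions, not TextQuest's matching logic: roots are matched as substrings of words, distance is counted in words between consecutive roots, wild cards are not handled, and the name `matches_chain` and its parameters are hypothetical.

```python
def matches_chain(unit_words, roots, ordered=True, max_distance=None):
    """Check whether all roots (1 to 6 word parts) occur in one text unit.

    With ordered=True the roots must appear in the given sequence, and
    max_distance (if set) limits the gap in words between consecutive roots.
    With ordered=False only the presence of every root is checked.
    """
    if not 1 <= len(roots) <= 6:
        raise ValueError("a chain holds between 1 and 6 strings")
    words = [w.lower() for w in unit_words]
    # For each root, the positions of the words that contain it.
    positions = [[i for i, w in enumerate(words) if r.lower() in w] for r in roots]
    if any(not p for p in positions):
        return False
    if not ordered:
        return True

    def search(k, prev):
        """Try to place root k after position prev, backtracking if needed."""
        if k == len(roots):
            return True
        for pos in positions[k]:
            if prev is not None:
                if pos <= prev:
                    continue
                if max_distance is not None and pos - prev > max_distance:
                    continue
            if search(k + 1, pos):
                return True
        return False

    return search(0, None)

unit = "the government cut taxes for working families".split()
print(matches_chain(unit, ["govern", "tax"]))
print(matches_chain(unit, ["tax", "govern"]))
```

Backtracking is used instead of taking the first match of each root, because an early match can block a later root that a different match would still allow.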
TextQuest works with category labels, which force you to document your categories
and make usage more comfortable: the labels are used in interactive
coding, as variable labels in the setups for statistical data analysis
software (e.g. SAS or SPSS), and in the log files.
Negated search entries are detected if indicators for negation are found
before or after the search patterns. The negation indicators are stored in
files and can be altered (e.g. to adapt them to other languages). Multiple
negations are also recognised.
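A minimal sketch of this kind of negation detection is shown below. Everything specific in it is an assumption: the indicator list, the window of 2 words before and after the hit, and the rule that an even number of indicators cancels out are illustrative choices, not TextQuest's documented behaviour.

```python
# Hypothetical indicator list; TextQuest keeps its indicators in editable files.
NEGATION_INDICATORS = {"not", "no", "never"}

def count_negations(words, hit_index, window=2, indicators=NEGATION_INDICATORS):
    """Count negation indicators within `window` words before or after the hit."""
    before = words[max(0, hit_index - window):hit_index]
    after = words[hit_index + 1:hit_index + 1 + window]
    return sum(1 for w in before + after if w.lower() in indicators)

def is_negated(words, hit_index, window=2):
    """Assumption: an odd number of indicators negates, an even number cancels."""
    return count_negations(words, hit_index, window) % 2 == 1

single = "we will not support war".split()
double = "we will never not support war".split()
print(is_negated(single, single.index("support")))
print(is_negated(double, double.index("support")))
```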
The difference between automatic and interactive coding is measured with the
ICRC (interactive coding reliability coefficient). TextQuest also generates
setups for SAS, SPSS, and ConClus (cluster analysis),
which eases the statistical analysis.
The word list of uncoded tokens contains all strings that were not used for
coding. These can be used for extending an existing category system, which is
especially useful when coding open-ended questions.
One approach to a first test of readability is readability analysis with formulas. These are based
on syntactic criteria such as sentence length, word length, etc. TextQuest offers 39 different formulas
for different text genres (manuals, normal text, news, children's books) and languages (English,
French, Spanish, German, Dutch, Danish, and Swedish). Statistics such as frequencies and means for the number of words, sentences, and syllables are also provided; these are new compared to the old INTEXT module.
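As an example of such a formula, the sketch below computes the well-known Flesch Reading Ease score for English text, which is based on sentence length and syllables per word. Whether and how TextQuest implements this particular formula is an assumption; the syllable counter is a crude vowel-group heuristic chosen only to keep the example short.

```python
import re

def count_syllables(word):
    """Crude English heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

easy = "The cat sat. The dog ran."
hard = "Institutional considerations complicate international negotiations."
print(flesch_reading_ease(easy))
print(flesch_reading_ease(hard))
```

Short sentences with short words score high, while long words with many syllables pull the score down, which is why the second text scores far lower than the first.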
TextQuest has a context-sensitive help system, part of which
serves as a tutorial. Parts of the manual are integrated into the help
system. It is available in English and German.
TextQuest licenses are available for different kinds of users.
The test version differs from the full version,
e.g. the system file is limited to a maximum of 100 text units,
and printing is disabled.
Installation is easy: just start the downloaded file and follow its instructions.
Then click on the TextQuest icon to start TextQuest. Go to the project menu and select one of the sample
files (sport or contakt). Then go to the file menu and generate a system file, which is the basis for all later
analyses. You will find more information in the readme file. Please report any problems you may have.
There are versions for students, universities, and also multiple-user versions
(e.g. for a whole company). Manuals can be obtained for a small charge, which
will be fully credited if you order a full version.
Your suggestions are always welcome. Don't hesitate to
mail me and tell me what does not work and
what could be improved.
|
What does TextQuest look like? |
Have a look at the following screenshots (added July 5, 2001); some of
them are quite large:
files menu
generate system file
fixed format (e.g. reading from data bases)
vocabulary menu
word list
word sequences list
word permutations list
cross references
vocabulary growth
select project name
analysis menu
perform content analysis
interactive coding in a content analysis
test category system
For a demonstration of TextQuest I compared 3 speeches on foreign policy by
the US presidential candidates (Bush, Gore, and McCain). Several
screenshots show how the windows of TextQuest can be arranged to compare word lists:
1. word list comparison
2. word list comparison
3. word list comparison
4. word list comparison
5. word list comparison
6. word list comparison
7. word list comparison
8. word list comparison
9. word list comparison
|
What operating systems are supported? |
TextQuest runs under Windows and can also run under other operating systems,
but no longer under MS-DOS and Win3.x. Versions for the Macintosh and Linux
are planned, and versions for other operating systems can be made available:
AIX, Solaris, HP-UX, OS/2. If your operating system is not supported, please
contact me; a port may be possible if a C/C++ compiler is available on your system.
© Social Science Consulting, 1999-2006