TextQuest - Text Analysis Software

Last update: 1 June 2006


Overview


Input formats


Any digitized (machine-readable) text can easily be converted for TextQuest, and texts from word processors such as Word, WordPerfect, and others can be converted quickly as well. Six input formats are available (2 with and 4 without control sequences). In addition, up to 50 external variables can be defined for a text. These can be numbers or strings of up to 10 characters each.


Word lists, word sequences, and word permutations


An alphabetically sorted word list gives an overview of all strings occurring in the text and their frequencies. Sort order tables can be used so that umlauts and characters with diacritics or accents are sorted properly, and case folding can be enabled or disabled. Strings can be excluded by frequency (using absolute or relative values) and by length (in characters). A list of exclusion words (stop words) can also be used; these strings are excluded from further processing. The statistics provided include the type-token ratio (TTR), and the dynamics of the TTR can be calculated for the whole text or a sample of it.
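The idea of a frequency word list with stop words, case folding, and a TTR can be illustrated with a short Python sketch. This is not TextQuest's implementation: the function name, its parameters, and the tokenizing rule are assumptions made for illustration, and the sort uses plain code point order rather than a sort order table.

```python
import re
from collections import Counter

def word_list(text, stop_words=(), fold_case=True, min_freq=1, min_len=1):
    """Return (alphabetically sorted [(string, frequency)], TTR).

    Hypothetical helper: tokens are runs of word characters, which is an
    assumption and not necessarily how TextQuest splits text into strings.
    """
    stop = set(stop_words)
    if fold_case:
        text = text.lower()
        stop = {w.lower() for w in stop}
    tokens = [t for t in re.findall(r"\w+", text)
              if t not in stop and len(t) >= min_len]
    counts = Counter(tokens)
    # TTR = number of distinct strings (types) / number of running strings (tokens)
    ttr = len(counts) / len(tokens) if tokens else 0.0
    # The frequency threshold only filters the listing, not the TTR.
    entries = sorted((w, n) for w, n in counts.items() if n >= min_freq)
    return entries, ttr

entries, ttr = word_list("The cat saw the dog. The dog ran.", stop_words=["the"])
for word, freq in entries:
    print(f"{word:<10}{freq}")
print(f"TTR: {ttr:.2f}")
```

In this sketch the stop words are removed before the TTR is computed, while the frequency threshold is applied afterwards. Whether TextQuest orders these steps the same way is not stated here.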


Manipulation of word lists, word sequences, and word permutations


In a content analysis, the word list is used to find strings that are suitable for building categories. The word list contains only single words, not combinations of words. TextQuest can also generate word sequences and word permutations, which make it possible to define search patterns that consist of more than one word or part of a word. Word sequences are sequences of at least 2 words, with a variable number of words. Word permutations are two-word sequences in which each word of a text unit is combined with each word that follows it.
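The difference between word sequences and word permutations can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that a word sequence means adjacent words; it only illustrates the definitions given above.

```python
from itertools import combinations

def word_sequences(words, n):
    """All runs of n adjacent words in one text unit (assumed meaning)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def word_permutations(words):
    """Each word of the unit paired with each word that follows it."""
    return list(combinations(words, 2))

unit = "peace needs strong allies".split()
print(word_sequences(unit, 2))
print(word_permutations(unit))
```

A unit of 4 words yields 3 two-word sequences but 6 permutations, so permutation lists grow much faster than sequence lists as text units get longer.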


Comparison of word lists, word sequences, and word permutations


Word lists, word sequences, and word permutations can also be compared. The following statistics are calculated: TTR, inclusive and exclusive tokens, and a grouping of the tokens into words, numbers, and other strings.

If the context provided by word sequences is not sufficient, KWIC (Key-Word-In-Context) lists with a variable number of characters can be used. Identifiers can be suppressed to gain more context per line. A line can be as long as you wish, and if you route the KWIC results to a file, your text processor can format this file so that a KWOC (Key-Word-Out-of-Context) list consisting of multiple lines is built.
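A KWIC list with a variable number of context characters can be sketched as follows. The function name, the bracket layout, and the case-insensitive whole-word matching are assumptions for illustration, not TextQuest's output format.

```python
import re

def kwic(text, keyword, width=20):
    """Return one line per hit with up to `width` characters of context per side."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both sides so the keywords line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

sample = "We want peace. Peace needs allies."
for line in kwic(sample, "peace", width=8):
    print(line)
```

Increasing `width` gives more context per line, which mirrors the idea of a variable number of characters described above.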

An index (or cross-reference list) can also be generated that shows the identifiers for each occurrence of each string. Entries can be selected interactively.
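A cross-reference list of this kind is essentially a mapping from each string to the identifiers of the text units in which it occurs. The following Python sketch assumes that text units are given as a dictionary from identifier to text; the identifiers and the function name are made up for the example.

```python
import re
from collections import defaultdict

def cross_reference(units):
    """Map each string to one identifier per occurrence, strings sorted A-Z."""
    index = defaultdict(list)
    for ident, text in units.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].append(ident)
    return dict(sorted(index.items()))

units = {"A1": "peace and trade", "A2": "trade and trade"}
for string, idents in cross_reference(units).items():
    print(f"{string:<8}{' '.join(idents)}")
```

Because one identifier is stored per occurrence, a string that appears twice in a unit lists that unit's identifier twice.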


Content analysis


TextQuest was originally developed for computer-aided content analysis. Search patterns for a category system can be words, parts of words, word sequences, and sequences of (parts of) words (so-called word root chains). Word root chains are up to 6 strings that must occur within the same text unit; their sequence and their distance can be specified. Wild cards may also be used. Ambiguous and negated search patterns can be coded interactively on the screen, and the coding process can be controlled with several log files.

TextQuest works with category labels, which force you to document your categories and make usage more comfortable: the labels are used in interactive coding, as variable labels in the setups for statistical data analysis software (e.g. SAS or SPSS), and in the log files.
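How a word root chain with a category label might be matched against a text unit can be sketched in Python. Everything in this sketch is an assumption: a root is treated as a word prefix, distance is counted as the difference in word positions between consecutive roots, and the category label and function names are invented for the example.

```python
def matches_chain(words, roots, ordered=True, max_gap=None):
    """True if every root starts some word of the unit.

    ordered: roots must appear in the given order.
    max_gap: largest allowed position difference between consecutive roots.
    """
    if not 1 <= len(roots) <= 6:
        raise ValueError("a chain consists of 1 to 6 strings")
    words = [w.lower() for w in words]
    roots = [r.lower() for r in roots]

    def search(root_idx, prev_pos):
        if root_idx == len(roots):
            return True
        for pos, word in enumerate(words):
            if not word.startswith(roots[root_idx]):
                continue
            if prev_pos is not None:
                if pos == prev_pos or (ordered and pos < prev_pos):
                    continue
                if max_gap is not None and abs(pos - prev_pos) > max_gap:
                    continue
            if search(root_idx + 1, pos):
                return True
        return False

    return search(0, None)

# Hypothetical category system: label -> list of word root chains
CATEGORIES = {"fiscal policy": [["govern", "tax"]]}

def code_unit(words):
    """Return the labels of all categories with at least one matching chain."""
    return [label for label, chains in CATEGORIES.items()
            if any(matches_chain(words, chain) for chain in chains)]

unit = "the government will not cut taxes this year".split()
print(code_unit(unit))
```

The backtracking search tries every candidate position for each root, so a chain is not rejected just because the first matching word happens to violate the order or distance constraint.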

Negated search entries are detected if indicators for negations are found before or after the search patterns. The negation indicators are stored in files and can be altered (e.g. to adapt them to other languages). Multiple negations are also recognised.
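One simple way to detect negation around a hit is to count negation indicators in a window of words before and after it. The Python sketch below rests on several assumptions: the indicator set, the window size, and the rule that an even number of indicators cancels out are illustrative choices, not TextQuest's documented behaviour.

```python
# Hypothetical indicator list; TextQuest reads its indicators from files.
NEGATIONS = {"not", "no", "never"}

def is_negated(words, hit_pos, window=2, indicators=NEGATIONS):
    """True if an odd number of indicators lies within `window` words of the hit."""
    start = max(0, hit_pos - window)
    nearby = words[start:hit_pos] + words[hit_pos + 1:hit_pos + 1 + window]
    count = sum(1 for w in nearby if w.lower() in indicators)
    # Assumption: a double negation cancels out, a triple negation negates again.
    return count % 2 == 1

print(is_negated("we will not cut taxes".split(), 3))
print(is_negated("we will never not cut taxes".split(), 4))
```

Swapping in another indicator set is enough to adapt the sketch to a different language, which corresponds to editing the indicator files described above.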

The difference between automatic and interactive coding is measured with the ICRC (interactive coding reliability coefficient). The generation of setups for SAS, SPSS, and ConClus (cluster analysis) facilitates the statistical analysis. The word list of uncoded tokens contains all strings that were not used for coding. These can be used to extend an existing category system, which is especially useful when coding open-ended questions.


Readability analysis


A first test of readability can be made with readability formulas. These are based on syntactic criteria such as sentence length and word length. TextQuest offers 39 different formulas for different text genres (manuals, normal text, news, children's books) and languages (English, French, Spanish, German, Dutch, Danish, and Swedish). Statistics such as frequencies and means for the number of words, sentences, and syllables are also provided; these are new compared to the old INTEXT module.
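As an illustration of a formula based on sentence length and word length, the following Python sketch computes the well-known Flesch Reading Ease score for English text. Whether this formula is among the 39 offered by TextQuest is not stated here, and the syllable counter is a crude vowel-group heuristic chosen for brevity, so the scores are only approximate.

```python
import re

def count_syllables(word):
    """Crude English heuristic: number of vowel groups, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))

print(round(flesch_reading_ease("The cat sat. The dog ran."), 2))
```

Higher scores indicate easier text. Short sentences of one-syllable words, as in the sample, score above 100.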


Help system


TextQuest has a context-sensitive help system, part of which serves as a tutorial. Parts of the manual are integrated in the help system. It is available in English and German.


Versions and licenses


TextQuest licenses are available for different kinds of users. The test version differs from the full version, e.g. the system file is limited to a maximum of 100 text units and printing is disabled. Installation is easy: start the downloaded file and follow its instructions. Then click on the TextQuest icon to start TextQuest. Go to the project menu and select one of the sample files (sport or contakt). Then go to the file menu and generate a system file, which is the basis for all later analyses. You will find more information in the readme file. Please report any problems you may have.

There are versions for students and universities, as well as multiple-user versions (e.g. for a whole company). Manuals can be obtained for a small charge, which will be fully credited if you order a full version.

Your suggestions are always welcome. Don't hesitate to mail me and tell me what does not work and what could be improved.


What does TextQuest look like?


Have a look at the following screenshots (added July 5, 2001); some of them are quite large:

  • files menu
  • generate system file
  • fixed format (e.g. reading from data bases)
  • vocabulary menu
  • word list
  • word sequences list
  • word permutations list
  • cross references
  • vocabulary growth
  • select project name
  • analysis menu
  • perform content analysis
  • interactive coding in a content analysis
  • test category system
  • For a demonstration of TextQuest I compared 3 speeches on foreign policy of the US presidential candidates (Bush, Gore, and McCain). There are several screenshots that show how the windows of TextQuest can be arranged to compare word lists:

    1. word list comparison
    2. word list comparison
    3. word list comparison
    4. word list comparison
    5. word list comparison
    6. word list comparison
    7. word list comparison
    8. word list comparison
    9. word list comparison


What operating systems are supported?


TextQuest runs under Windows and also under other operating systems, but no longer under MS-DOS and Win3.x. Versions for the Macintosh and Linux are planned, and versions for other operating systems can be made available: AIX, Solaris, HP-UX, OS/2. If your operating system is not supported, please contact me; a port may be possible if a C/C++ compiler is available on your system.


© Social Science Consulting, 1999-2006