Text file INDEX generator (c) T.Jennings 7/21/81
You can do anything you want with this program except
sell it. Give it to anyone who wants it. Address bugs,
suggestions, etc. to:
Tom Jennings
221 W. Springfield St.
Boston MA 02118
Leave me a message at NECS CBBS.
INDEX is a utility for use with WordStar, and generates
an alphabetically sorted index for a file. Words or phrases
to be put in the index are marked with control characters
not used elsewhere within WordStar (at least as of version
1.01). If a file is later edited, invoking INDEX again will
remove the old index, produce a new one, and add it to the
end of the file.
INDEX can also be used with any non-WordStar text editor
that can insert control characters into the text. No other
assumptions are made about the contents of the file, except
that the file is terminated by a control-Z character
(correct way) or end of file.
INDEX scans the text file for certain WordStar "dot
commands", such as page breaks, etc., in order to maintain
proper page numbers. If no page "dot" commands are found, as
with other editors, pages are counted internally.
There are two different kinds of index entries: WORDS
and PHRASES. WORDS are what are normally thought of as
words: groups of characters separated by spaces, commas,
carriage returns (called CR from now on) or linefeeds (LF).
PHRASES are groups of words, including the spaces that
separate the words.
Since words are easy to find, only a single marker is
necessary to identify them. This marker is a control-K
character, ^K. Phrases must have both ends marked; a
control-P character, ^P, is used. Below are some examples:
The sixth word in this ^Ksentence will be put in the index.
^PThis entire phrase will be there^P, also.
.cp8
Since this is page 2 of the manual, the index for these
should look like:
Sentence...................................... 2
This entire phrase will be there.............. 2
These two examples are actually in the index at the end
of this manual.
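As a rough illustration of the marking scheme, here is a
minimal Python sketch. It is not the INDEX source; the name
find_entries and the exact set of word terminators are
assumptions made for the example. ^K is ASCII 0B hex and ^P
is ASCII 10 hex.

```python
CTRL_K = "\x0b"                   # ^K, marks the start of a single word
CTRL_P = "\x10"                   # ^P, marks both ends of a phrase
DELIMS = " \r\n\t,;:!"            # assumed word terminators
STOP = DELIMS + CTRL_K + CTRL_P   # a marker also ends a word (assumption)


def find_entries(text):
    """Return the marked words and phrases in the order they appear."""
    entries = []
    i = 0
    while i < len(text):
        if text[i] == CTRL_K:
            # Skip separators, so that the next word to come along is used.
            start = i + 1
            while start < len(text) and text[start] in DELIMS:
                start += 1
            end = start
            while end < len(text) and text[end] not in STOP:
                end += 1
            if end > start:
                entries.append(text[start:end])
            i = end
        elif text[i] == CTRL_P:
            # A phrase runs to the closing ^P; an unclosed phrase is dropped.
            close = text.find(CTRL_P, i + 1)
            if close == -1:
                break
            entries.append(text[i + 1:close])
            i = close + 1
        else:
            i += 1
    return entries
```

Fed the two example lines above, this returns the word
"sentence" and the phrase "This entire phrase will be there".
Cleaning up and capitalizing the entries is a separate step.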
WordStar dot commands
INDEX is optimized for use with WordStar. By default,
it scans the file for "dot commands"; notably .pa and
"..index". .PA is used to count pages, and must be the first
word on the line to be counted as a dot command.
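The page counting can be pictured with a small Python
sketch. This is only an illustration under stated
assumptions: the name page_numbers is made up, pages are
assumed to start at 1, and .pa is matched without regard to
case as the first word on a line.

```python
def page_numbers(lines, first_page=1):
    """Yield (page, line) pairs, starting a new page at each .pa line."""
    page = first_page
    for line in lines:
        words = line.split()
        if words and words[0].lower() == ".pa":
            page += 1      # text after the page break is on the next page
            continue       # the dot command itself is not text
        yield page, line
```

A ".pa" that appears later in a line is left alone, since it
is not the first word on that line.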
The "..index" is created and used by INDEX. As defined
in the WordStar manual, any line beginning with two dots
(..) will be ignored when printed. INDEX uses this to mark
the beginning of the index. When INDEX is run, if it finds
the "..index" line, it will remove all text following that
line. This allows creating an index for an updated file that
already has an index. If no "..index" line is found, one is
added.
CAUTION: NEVER use a ".." WordStar comment line that
starts with the word index for your own notes. INDEX will
take it for the marker described above, and all text
following this line will be deleted from the file. A single
space after the .. will suffice to avoid this, or use .IG
instead.
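The effect of the "..index" line can be sketched in Python
as below. The name strip_old_index is hypothetical, and
matching the marker without regard to case is an assumption;
the manual does not say how INDEX compares it.

```python
INDEX_MARK = "..index"


def strip_old_index(lines):
    """Return the lines that come before the first "..index" line.

    If there is no such line, the whole file is returned unchanged.
    The caller is expected to write a fresh "..index" line and the
    new index after the returned text.
    """
    for n, line in enumerate(lines):
        if line.lower().startswith(INDEX_MARK):
            return list(lines[:n])
    return list(lines)
```

In this sketch a line such as ".. index of names" is safe,
because the space after the two dots keeps it from matching.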
Sorting
As stated before, the index generated is sorted
alphabetically. The entire phrase or word is used in
sorting, except that case is ignored.
If identical entries are found, they are listed on a
single line, followed by all the page numbers they were
found on. Unfortunately, a page number is listed again each
time the entry recurs on the same page. For clarity, some
examples of how things work follow.
The following two phrases are equivalent, as case is
ignored, and will be listed on one line. The first occurrence
will be the entry on the left side of the page.
This is the first phrase
THIS IS THE FIRST PHrAsE
Since length counts, these next are all in proper order.
This
This is
This is what
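The sorting and merging rules can be shown with a short
Python sketch. It is not how INDEX does it (INDEX uses its
own sort routine); Python's built-in sort is swapped in
here. The names build_index and format_line are made up, and
the line width of 40 is a guess. Folding to upper case for
the comparison is also an assumption, chosen because the
index at the end of this manual lists the ^K and ^P entries
after the letters.

```python
def build_index(entries):
    """Merge and sort (text, page) pairs into (text, [pages]) pairs."""
    merged = {}
    for text, page in entries:
        key = text.upper()               # case is ignored when comparing
        if key not in merged:
            merged[key] = (text, [])     # first occurrence supplies the spelling
        merged[key][1].append(page)      # repeated page numbers are kept
    return [merged[key] for key in sorted(merged)]


def format_line(text, pages, width=40):
    """Pad the entry with dots, then list its page numbers."""
    return text.ljust(width, ".") + " " + ", ".join(str(p) for p in pages)
```

Because whole strings are compared, "This" sorts before
"This is", which sorts before "This is what".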
Side effects and cautions
This is a list of implementation peculiarities, etc.
-In general, any group of one or more white-space characters
(see below) is converted into a single space character.
Phrases with embedded spaces will have all extra spaces
(more than one) removed. A phrase may start and end on
different lines (or even pages) and will work properly.
Leading spaces will be removed from the index entry.
-The following characters are converted to and treated as a
single ASCII space character. These also mark the end of a
word:
CR LF tab comma (,) semicolon (;)
colon (:) suprise-mark (!)
-BUG NOTICE: Periods are removed from the character stream.
This was a cheap way out, since a period is a sentence
terminator. The only time this is a problem is when putting
things in the index such as filenames (e.g., FILENAME.TYP).
If someone complains, it will probably get fixed.
-Words and phrases will have any leading spaces removed. The
first character of any word or phrase will be converted to
upper case. Note that if a phrase consists of a single
blank, it will NOT be removed from the index. This does not
count for words, of course, as the next word that comes
along will be indexed.
-Because of wonderful CP/M, and the fact that some of its
utilities use end-of-file instead of a control-Z character
to terminate text, INDEX cannot detect the following read
errors: unwritten random record, zero length.
-INDEX sorts in ASCII order. Digits, quotes, parentheses,
etc., come before letters.
-The sort routine used is horrible. It uses a bubble sort,
with extra unnecessary exchanges. Didn't require much
thought, though.
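The clean-up rules in the list above can be summed up in a
short Python sketch. This is an illustration only, not the
INDEX source: the name normalize_entry is made up, and the
order in which the steps are applied is an assumption.

```python
import re


def normalize_entry(raw):
    """Clean one marked word or phrase roughly as the list above describes."""
    # Periods are dropped from the character stream (see the BUG NOTICE).
    text = raw.replace(".", "")
    # Any run of space, CR, LF, tab, comma, semicolon, colon or "!"
    # becomes a single space.
    text = re.sub(r"[ \r\n\t,;:!]+", " ", text)
    # Leading spaces go, but a phrase that is a single blank is kept.
    if text != " ":
        text = text.lstrip(" ")
    # The first character is converted to upper case.
    return text[:1].upper() + text[1:]
```

For example, "  embedded   spaces" comes out as "Embedded
spaces", and FILENAME.TYP comes out as FILENAMETYP, which is
the filename problem mentioned in the bug notice.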
Colon................................... 4
Comma................................... 4
Control-Z............................... 4
CP/M.................................... 4
CR...................................... 4
Embedded spaces......................... 4
End-of-file............................. 4
Examples................................ 2
Filenames............................... 4
INDEX................................... 1
Leading spaces.......................... 4, 4
LF...................................... 4
Non-WordStar text editor................ 1
Periods................................. 4
PHRASES................................. 2
Semicolon............................... 4
Sentence................................ 2
Side effects and cautions............... 4
Suprise-mark............................ 4
Tab..................................... 4
This entire phrase will be there........ 2
White-space characters.................. 4
WORDS................................... 2
WordStar................................ 1
WordStar "dot commands"................. 1
WordStar dot commands................... 2
^K...................................... 2
^P...................................... 2