Method and apparatus for text analysis
First Claim
1. An electronic system for the analysis of digitally encoded text, such text including sequential strings of characters separated by spaces, such system comprising buffer means for receiving and storing coded text, means for examining each successive character sequence of the text to determine word and punctuation portions thereof and identify each character sequence having a terminal period, abbreviation table means for storing a table of common abbreviations with an indication if an abbreviation is of a non-terminating type, look-up means for checking a character sequence having a terminal period against the abbreviation table and determining its type, and sentence boundary determining means in communication with the look-up means and operative on successive character sequences for determining whether a sentence boundary occurs at an occurrence of a terminal period and wherein a discrimination means identifies and diverts from the sentence boundary determining means said non-terminating abbreviations.
Abstract
An electronic text analyzer operates on an ordered block of digitally coded text by analyzing sequential strings thereof to determine paragraph and sentence boundaries. Each string is broken down into component words. Possible abbreviations are identified and checked against a table of common abbreviations to identify abbreviations which cannot end a sentence. End punctuation and the following string are analyzed to identify the terminal word of a sentence. When sentence boundaries have been determined, the text may be further processed by a grammar checker, a readability analyzer, or other higher-level text processing system.
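The abbreviation check described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the table entries, the name `split_sentences`, and the rule that a lowercase following string means "no boundary" are all assumptions made for illustration.

```python
# Hypothetical abbreviation table. True marks a non-terminating abbreviation,
# i.e. one assumed never to end a sentence. Entries are illustrative only.
ABBREVIATIONS = {
    "mr.": True,
    "dr.": True,
    "e.g.": True,
    "etc.": False,
    "inc.": False,
}


def split_sentences(text):
    """Split text into sentences using an abbreviation table and look-ahead."""
    tokens = text.split()
    sentences = []
    current = []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith((".", "!", "?")):
            continue
        if token.endswith("."):
            key = token.lower()
            # Non-terminating abbreviations are diverted: never a boundary.
            if ABBREVIATIONS.get(key) is True:
                continue
            # For other abbreviations, inspect the following string.
            # Assumption: a lowercase start means the sentence continues.
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if key in ABBREVIATIONS and nxt is not None and not nxt[0].isupper():
                continue
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


result = split_sentences(
    "Dr. Smith arrived. He met Mr. Jones at Acme Inc. yesterday. They left."
)
```

Here `result` holds three sentences: the periods after "Dr." and "Mr." are skipped by the table lookup, and the period after "Inc." is skipped because the next string starts in lowercase.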
A preferred embodiment includes a readability analyzer having a syllable counter for determining the number of syllables in each word. The system includes a modified common-word table having an empirical syllable-count field. A checker first determines if a word is in the table and, if so, returns its syllable count. An exception table identifies words not conforming to a syllable counting algorithm. Each word not in the common-word or exception tables is modified, and the modified word is processed to derive its syllable count. In a preferred embodiment, tallies are kept of words per sentence, syllable count, sentences per paragraph, and similar data, and readability scores based on the tallies are displayed.
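The three-stage lookup in this paragraph (common-word table, exception table, then an algorithm applied to a modified word) might look like the following Python sketch. The abstract does not say how the word is modified or how syllables are then counted, so dropping a final silent "e" and counting vowel groups are assumptions, as are the table contents and the name `count_syllables`.

```python
import re

# Hypothetical tables; entries are illustrative, not taken from the patent.
COMMON_WORDS = {"the": 1, "every": 2, "people": 2, "business": 2}
EXCEPTIONS = {"rhythm": 2, "recipe": 3}


def count_syllables(word):
    """Return a syllable count via table lookup, falling back to a heuristic."""
    w = word.lower().strip(".,;:!?\"'()")
    # Stage 1: common-word table with an empirical syllable count.
    if w in COMMON_WORDS:
        return COMMON_WORDS[w]
    # Stage 2: words known not to conform to the counting algorithm.
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    # Stage 3 (assumed): modify the word by dropping a final silent "e",
    # then count groups of consecutive vowels.
    modified = w
    if len(modified) > 2 and modified.endswith("e") and not modified.endswith("le"):
        modified = modified[:-1]
    groups = re.findall(r"[aeiouy]+", modified)
    return max(1, len(groups))


counts = {w: count_syllables(w) for w in ["the", "recipe", "table", "sentence", "text"]}
```

Keeping the tables ahead of the heuristic means frequent words are answered by a dictionary lookup, and the heuristic only has to be right for the remaining regular words.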
42 Claims
1. An electronic system for the analysis of digitally encoded text, such text including sequential strings of characters separated by spaces, such system comprising
buffer means for receiving and storing coded text, means for examining each successive character sequence of the text to determine word and punctuation portions thereof and identify each character sequence having a terminal period, abbreviation table means for storing a table of common abbreviations with an indication if an abbreviation is of a non-terminating type, look-up means for checking a character sequence having a terminal period against the abbreviation table and determining its type, and sentence boundary determining means in communication with the look-up means and operative on successive character sequences for determining whether a sentence boundary occurs at an occurrence of a terminal period and wherein a discrimination means identifies and diverts from the sentence boundary determining means said non-terminating abbreviations.
14. A system for the analysis of digitally encoded text including sequential strings of characters separated by white space, such system comprising
means for receiving and storing digitally encoded text, word table means for storing a table of common words together with a syllable count indication for a stored word, word identifier means for examining successive strings of coded text and identifying each word therein, and syllable counter means for determining the syllable count of each identified word, wherein the syllable counter means includes means for checking an identified word against the word table and returning, for a word stored therein, the syllable count indication.
22. A method for analyzing digitally encoded text, such text including sequential strings of characters separated by spaces, the method comprising the steps of:
receiving and storing coded text, storing a table of abbreviations together with an indication of non-terminating abbreviations, examining in sequence character strings of the text to recognize a string including a period, checking a recognized character string against the abbreviation table to identify non-terminating abbreviations, operating on successive character strings not so identified to determine a sentence boundary, identifying successive white space codes in the text, and comparing the identified codes to a nominal document spacing to determine paragraph boundaries of the text.
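The final two steps of claim 22 compare runs of white-space codes against a nominal document spacing. A rough Python sketch follows. It assumes that the nominal spacing is the most frequent number of line breaks between lines, and that any larger run marks a paragraph boundary. The claim fixes neither rule, and `paragraph_boundaries` is a hypothetical name.

```python
import re
from collections import Counter


def paragraph_boundaries(text):
    """Return offsets of white-space runs that exceed the nominal line spacing."""
    # Collect every white-space run containing a line break, with its break count.
    runs = [
        (m.start(), m.group().count("\n"))
        for m in re.finditer(r"\s+", text)
        if "\n" in m.group()
    ]
    if not runs:
        return []
    # Assumption: the most frequent break count is the nominal document spacing.
    nominal = Counter(n for _, n in runs).most_common(1)[0][0]
    # A run with more breaks than nominal is treated as a paragraph boundary.
    return [pos for pos, n in runs if n > nominal]


sample = "line one\nline two\nline three\n\nnext para\nmore\nend"
boundaries = paragraph_boundaries(sample)
```

For `sample`, single line breaks are nominal, so only the double break after "line three" (offset 28) is reported.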
34. A method for analyzing digitally encoded text including sequential strings of characters separated by white space, the method comprising the steps of
receiving and storing digitally encoded text, storing a table of words together with a syllable count indication for a stored word, examining successive strings of coded text and identifying each word therein, and checking an identified word against the table to determine its syllable count indication thereby determining the syllable count of the identified word.
42. An electronic system for the analysis of digitally encoded text, such encoded text including white-space code and character code, such system comprising
sentence splitting means operating on successive strings of encoded text for determining sentence boundaries of the text, paragraph identifier means operative solely on white-space code of the text for determining paragraph boundaries, text fragment identifier means for identifying text fragments by comparing the sentence boundaries and the paragraph boundaries, and identifying as a text fragment text having a paragraph boundary preceding a sentence boundary.
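The fragment test in claim 42 (a paragraph boundary arriving before a sentence boundary) can be sketched in Python once both kinds of boundary are available as character offsets. The offset convention and the name `find_fragments` are assumptions for illustration. Sentence offsets point just past the terminal punctuation, and paragraph offsets point at the start of the paragraph-breaking white space.

```python
def find_fragments(sentence_ends, paragraph_ends):
    """Return (start, end) spans closed by a paragraph boundary, not a sentence boundary."""
    sentence_set = set(sentence_ends)
    events = sorted(sentence_set | set(paragraph_ends))
    fragments = []
    start = 0
    for pos in events:
        if pos in sentence_set:
            # A sentence boundary closes the running text normally.
            start = pos
        else:
            # A paragraph boundary arrived first: the open text is a fragment.
            if pos > start:
                fragments.append((start, pos))
            start = pos
    return fragments


text = "Heading\n\nA sentence.\n\nAnother one."
# Offsets assumed to come from earlier sentence and paragraph analysis.
fragments = find_fragments(sentence_ends=[20, 34], paragraph_ends=[7, 20])
fragment_texts = [text[a:b] for a, b in fragments]
```

Here the heading is reported as a fragment because its paragraph break at offset 7 comes before any sentence boundary, while the paragraph break at offset 20 coincides with a sentence end and is ignored.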
Specification