SYSTEMS AND METHODS FOR ASYMMETRICAL FORMATTING OF WORD SPACES ACCORDING TO THE UNCERTAINTY BETWEEN WORDS
Abstract
Asymmetrical formatting of word spaces according to the uncertainty between words includes an initial filtering process and a subsequent text formatting process. An equivocation filter generates a mapping of keys and values (output) from a corpus or word sequence frequency data (input). The text formatting process asymmetrically adjusts the width of spaces adjacent to keys using the values. The filtering process, which generates the mapping of keys and values, can be performed once to analyze a corpus; once generated, the key-value mapping can be used multiple times by the subsequent text formatting process.
22 Claims
1. A method for determining an uncertainty across a word space in text, comprising the steps of:
a) providing text input;
b) providing a database of function words;
d) examining a plurality of words of the text input;
e) identifying each of the plurality of words as one of the function words in the database or as a content word if the word being identified is not in the database;
f) generating n-gram frequency counts for each unique pseudo-syntactic hybrid, wherein each of the unique pseudo-syntactic hybrids is an n-gram composed of at least one of the following: a lexical identity, a lexeme, a lexical category, and an open-class word;
h) repeating steps d-f for a next plurality of words until end text input is reached; and
g) using the n-gram frequency counts to compute the uncertainty for each of the unique pseudo-syntactic hybrids;
wherein the lexical identity is the word, the lexeme is the set of forms a word can take, the lexical category is a part of speech of the word, and the open-class word is a content word that lacks syntactic information.
Dependent claims: 2, 3
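The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function-word set, the `<CONTENT>` placeholder, the restriction to bigrams, and the choice of Shannon entropy (in bits) as the "uncertainty" are all assumptions made for illustration, since the claim does not fix a formula. All names (`FUNCTION_WORDS`, `to_hybrid`, `bigram_counts`, `uncertainty_after`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, tiny stand-in for the claimed "database of function words".
FUNCTION_WORDS = {"the", "a", "of", "in", "on", "and", "to", "is"}
OPEN_CLASS = "<CONTENT>"  # assumed placeholder for an open-class (content) word


def to_hybrid(word):
    """Keep a function word's lexical identity; collapse content words (step e)."""
    w = word.lower()
    return w if w in FUNCTION_WORDS else OPEN_CLASS


def bigram_counts(text):
    """Count bigrams of hybrid tokens over the whole input (steps d-f and h)."""
    tokens = [to_hybrid(w) for w in text.split()]
    return Counter(zip(tokens, tokens[1:]))


def uncertainty_after(counts):
    """Entropy of the token that follows each key (one possible reading of step g)."""
    followers = defaultdict(Counter)
    for (left, right), n in counts.items():
        followers[left][right] += n
    result = {}
    for left, dist in followers.items():
        total = sum(dist.values())
        result[left] = -sum((n / total) * math.log2(n / total) for n in dist.values())
    return result


counts = bigram_counts("The cat sat on the mat and the dog is in the house")
uncertainty = uncertainty_after(counts)
```

In this toy input, "the" is always followed by a content word, so its entropy is zero, while a content word is followed by four different tokens with equal frequency, giving two bits. A real corpus and a fuller function-word database would be needed for meaningful values.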
4. A system for determining an uncertainty across a word space in text, comprising:
a database of function words;
a counter for generating frequency counts for each unique pseudo-syntactic hybrids, wherein pseudo-syntactic hybrids are composed of at least one of the following: a lexical identity, a lexeme, a lexical category, and an open-class word; and
a filter for computing lexical uncertainties across the word spaces of pseudo-syntactic hybrids using the generated frequency counts;
wherein the lexical identity is the word, the lexeme is the set of forms a word can take, the lexical category is a part of speech of the word, and the open-class status of the lexical item is a content word that lacks syntactic information.
Dependent claims: 5
6. A method for formatting text, comprising:
providing text input;
providing a mapping input of keys and values, the keys each indicating at least one of the unique pseudo-syntactic hybrids, and the values indicating the uncertainties across word spaces adjacent to the keys; and
examining the text input to look for the keys in the mapping input and formatting widths of the adjacent spaces of the text input based on the outcome of the examining, wherein the formatting of the widths of the adjacent spaces of the text input is determined by the values.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
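The formatting method of claim 6 might be sketched as follows. This is a simplified illustration under stated assumptions: keys are taken to be single lowercase words, only the space after a key is adjusted, and the width grows linearly with the value. The constants `BASE_WIDTH_EM` and `SCALE_EM` and the function `space_widths` are hypothetical; the claim itself does not specify units or a width formula.

```python
BASE_WIDTH_EM = 0.25  # assumed nominal word-space width, in em units
SCALE_EM = 0.05       # assumed extra width per unit of uncertainty


def space_widths(text, mapping):
    """Return (word, width of the space after it) pairs; None after the last word."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        if i == len(words) - 1:
            out.append((word, None))  # no space follows the final word
            continue
        value = mapping.get(word.lower())  # look for the key in the mapping input
        if value is None:
            width = BASE_WIDTH_EM  # key not found: leave the space at nominal width
        else:
            width = BASE_WIDTH_EM + SCALE_EM * value
        out.append((word, width))
    return out


formatted = space_widths("the cat sat", {"the": 2.0})
```

A renderer could then emit each word followed by a space of the computed width. Adjusting the spaces before and after a key by different amounts, as the title's "asymmetrical" suggests, would need a second value per key, which this sketch omits.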
21. A computer program product for formatting text, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first portion configured to provide a text;
a second portion configured to provide a mapping input of keys and values, each of the keys indicating at least one pseudo-syntactic hybrid, and each of the values indicating the uncertainty across a word space adjacent to the key; and
a third executable portion configured to examine the text input to look for the keys in the mapping input and formatting widths of between-word spaces of the text input that is based on an outcome of the examination, wherein the formatting of the widths of the between-word spaces is determined by the value.
22. In a computer system, having a display, and a method of displaying text, comprising the steps of:
a) creating a list of all instances of a word wherein the character preceding the word includes at least one of: a space, a beginning of the word, a beginning of a line, a beginning of a paragraph, a beginning of a document, a tab, an indent, or a punctuation character;
b) for each of the words in the list from step a, looking up the word (n) and a subsequent word (n+1) that immediately follows the word (n) in an adjustment score library, wherein the word and the subsequent word that follows are separated by a space character; and
c) if found in the adjustment library, then adjusting the width of the space character using an adjustment score found for a word bigram of the word and the subsequent word in the adjustment library;
d) setting n to n+1; and
e) repeating steps b-d for all items in the list created in step a.
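The bigram lookup loop of claim 22 could look roughly like the sketch below. It assumes the adjustment score library is a dictionary keyed by lowercase word pairs and that a score acts as a multiplier on a default space width; neither detail is stated in the claim. Punctuation handling from step a is left out, and the name `bigram_space_widths` is hypothetical.

```python
def bigram_space_widths(text, adjustment_library, default_width=1.0):
    """Return one width per between-word space, scaled where the bigram is found."""
    words = text.split()  # step a, simplified to whitespace-delimited words
    widths = []
    for n in range(len(words) - 1):  # steps d-e: advance n over the list
        bigram = (words[n].lower(), words[n + 1].lower())
        score = adjustment_library.get(bigram)  # step b: look up the word bigram
        if score is None:
            widths.append(default_width)  # not found: keep the default width
        else:
            widths.append(default_width * score)  # step c: adjust the space
    return widths


widths = bigram_space_widths("of the people", {("of", "the"): 0.8})
```

Here the space between "of" and "the" is narrowed to 0.8 of the default, and the space between "the" and "people" is left unchanged because that bigram is not in the library.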
Specification