SYSTEMS AND METHODS FOR ASYMMETRICAL FORMATTING OF WORD SPACES ACCORDING TO THE UNCERTAINTY BETWEEN WORDS

US 20180039617A1
Filed: 03/08/2016
Published: 02/08/2018
Est. Priority Date: 03/10/2015
Status: Active Grant

Abstract

Asymmetrical formatting of word spaces according to the uncertainty between words includes an initial filtering process and a subsequent text formatting process. An equivocation filter generates a mapping of keys and values (output) from a corpus or word sequence frequency data (input). The text formatting process asymmetrically adjusts the width of spaces adjacent to keys using the values. The filtering process, which generates the mapping of keys and values, can be performed once to analyze a corpus; once generated, the key-value mapping can be used multiple times by a subsequent text formatting process.

22 Claims

1. A method for determining an uncertainty across a word space in text, comprising the steps of:
- a) providing text input;
  
  b) providing a database of function words;
  
  d) examining a plurality of words of the text input;
  
  e) identifying each of the plurality of words as one of the function words in the database or as a content word if the word being identified is not in the database;
  
  f) generating n-gram frequency counts for each unique pseudo-syntactic hybrid, wherein each of the unique pseudo-syntactic hybrids is an n-gram composed of at least one of the following: a lexical identity, a lexeme, a lexical category, and an open-class word;
  
  h) repeating steps d-f for a next plurality of words until the end of the text input is reached; and
  
  g) using the n-gram frequency counts to compute the uncertainty for each of the unique pseudo-syntactic hybrids;
  
  wherein the lexical identity is the word, the lexeme is the set of forms a word can take, the lexical category is a part of speech of the word, and the open-class word is a content word that lacks syntactic information.
- - 2. The method of claim 1, wherein the text input is a document containing text.
  - 3. The method of claim 1, wherein the n-gram frequency counts are generated from a corpus.
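
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function-word set, the `<CONTENT>` placeholder, the restriction to bigrams (n = 2), and the choice of per-key entropy of the following token as the "uncertainty" are all assumptions made for illustration (claim 20 names conditional entropy as one possible measure).

```python
import math
from collections import Counter, defaultdict

# Hypothetical, tiny function-word database (step b); a real one would be larger.
FUNCTION_WORDS = {"the", "a", "of", "in", "on", "and", "to", "is"}
CONTENT = "<CONTENT>"  # assumed placeholder standing in for any open-class word


def to_hybrid_tokens(words):
    """Steps d-e: keep function words as-is, collapse every other word to CONTENT."""
    return [w.lower() if w.lower() in FUNCTION_WORDS else CONTENT for w in words]


def bigram_counts(tokens):
    """Step f with n = 2: count each adjacent pair of hybrid tokens."""
    return Counter(zip(tokens, tokens[1:]))


def uncertainty_after(counts):
    """Step g: entropy (in bits) of the next token, given each left-hand token."""
    by_left = defaultdict(Counter)
    for (left, right), n in counts.items():
        by_left[left][right] += n
    result = {}
    for left, followers in by_left.items():
        total = sum(followers.values())
        result[left] = -sum(
            (n / total) * math.log2(n / total) for n in followers.values()
        )
    return result


text = "the cat sat on the mat and the dog sat in a box"
tokens = to_hybrid_tokens(text.split())
scores = uncertainty_after(bigram_counts(tokens))
```

In this toy input, `the` is always followed by a content word, so its score is zero, while the content-word placeholder is followed by several different tokens and scores higher. The resulting dictionary has the same shape as the key-value mapping described in the abstract.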

4. A system for determining an uncertainty across a word space in text, comprising:
- a database of function words;
  
  a counter for generating frequency counts for each unique pseudo-syntactic hybrid, wherein pseudo-syntactic hybrids are composed of at least one of the following: a lexical identity, a lexeme, a lexical category, and an open-class word; and
  
  a filter for computing lexical uncertainties across the word spaces of pseudo-syntactic hybrids using the generated frequency counts;
  
  wherein the lexical identity is the word, the lexeme is the set of forms a word can take, the lexical category is a part of speech of the word, and the open-class status of the lexical item is a content word that lacks syntactic information.
- - 5. The system of claim 4, wherein computing the uncertainty results in providing an input map of keys and values, each of the keys indicating at least one pseudo-syntactic hybrid, and the values indicating the uncertainties across the word spaces adjacent to the keys.

6. A method for formatting text, comprising:
- providing text input;
  
  providing a mapping input of keys and values, the keys each indicating at least one of the unique pseudo-syntactic hybrids, and the values indicating the uncertainties across word spaces adjacent to the keys; and
  
  examining the text input to look for the keys in the mapping input and formatting widths of the adjacent spaces of the text input based on the outcome of the examining, wherein the formatting of the widths of the adjacent spaces of the text input is determined by the values.
- - 7. The method of claim 6, wherein the widths of the adjacent spaces are adjusted by changing at least one of the following character attributes of a space character, a preceding character or a following character:
    - a letter spacing, a horizontal scaling, kerning, a horizontal offset, padding, a left-margin, or a right-margin.
  - 8. The method of claim 6, wherein the widths of the adjacent spaces are adjusted by inserting an HTML tag within an HTML document.
  - 9. The method of claim 6, wherein the widths of the adjacent spaces are adjusted by inserting an XML tag within an XML document.
  - 10. The method of claim 6, wherein the widths of the adjacent spaces are adjusted by inserting an XHTML tag within an XHTML document.
  - 11. The method of claim 6, wherein one of the values from the mapping input indicates an absolute space size.
  - 12. The method of claim 6, wherein the widths of the adjacent spaces are adjusted by replacing a space character with at least one unicode private use area space character with a specified width that matches the value from the mapping input.
  - 13. The method of claim 6, wherein one of the values from the mapping input indicates a relative space size, which is converted to an absolute space size to be applied as the widths of the adjacent spaces.
  - 14. The method of claim 13, wherein the distribution of the relative space sizes across the keys and the values of the mapping input is maintained, but an absolute space size is adjusted dynamically.
  - 15. The method of claim 14, wherein an HTML tag is used to dynamically adjust the absolute space size.
  - 16. The method of claim 14, wherein an HTML tag refers to a CSS stylesheet that provides the adjustment of the absolute space size that is applied by the HTML tag.
  - 17. The method of claim 6, wherein the formatting of the widths of the adjacent spaces is adjusted by inserting one or more pixels or sub-pixels before or after space characters.
  - 18. The method of claim 6, wherein the formatting of the width of the adjacent spaces is adjusted by a web browser or a web-browser plug-in which renders a web document.
  - 19. The method of claim 6, wherein the keys indicate a list of one or more items which are composed of at least one of the following:
    - a lexical identity, a lexical category, an open-class status of a lexical item and a closed-class status of the lexical item wherein the lexical identity is the word, the lexeme is the set of forms a word can take, the lexical category is a part of speech of the word, the open-class status of the lexical item is a content word that lacks syntactic information and the closed-class status of the lexical item is a class of words that does not accept new items.
  - 20. The method of claim 19, wherein the uncertainties across the word spaces are determined by a measure of conditional entropy.
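
As a rough illustration of claim 6 together with claims 7, 8 and 13, the Python sketch below looks up each word in a key-value mapping and wraps the following space in an HTML `span` whose `letter-spacing` sets the extra width. Everything specific here is an assumption: the keys are plain words rather than full pseudo-syntactic hybrids, the mapping values are invented, the linear rescaling to `em` units and its bounds are arbitrary, and the direction (higher uncertainty gives a wider space) is chosen only for the example.

```python
import html


def relative_to_em(value, lo, hi, min_em=-0.05, max_em=0.15):
    """Linearly rescale a relative score in [lo, hi] to an absolute em offset."""
    if hi == lo:
        return 0.0
    return min_em + (value - lo) / (hi - lo) * (max_em - min_em)


def format_spaces(text, mapping, **scale):
    """Wrap each space that follows a mapped key in a span carrying extra width."""
    lo, hi = min(mapping.values()), max(mapping.values())
    words = text.split(" ")
    out = []
    for i, word in enumerate(words):
        out.append(html.escape(word))
        if i == len(words) - 1:
            break  # no space after the last word
        key = word.lower()
        if key in mapping:
            em = relative_to_em(mapping[key], lo, hi, **scale)
            out.append(f'<span style="letter-spacing:{em:.3f}em"> </span>')
        else:
            out.append(" ")  # unmapped words keep a normal space
    return "".join(out)


mapping = {"the": 0.0, "sat": 2.0}  # hypothetical key -> uncertainty values
marked_up = format_spaces("the cat sat on the mat", mapping)
```

Passing different `min_em` and `max_em` bounds changes the absolute sizes while keeping the relative distribution of the mapping intact, which is the kind of adjustment claim 14 describes.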

21. A computer program product for formatting text, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
- a first portion configured to provide a text;
  
  a second portion configured to provide a mapping input of keys and values, each of the keys indicating at least one pseudo-syntactic hybrid, and each of the values indicating the uncertainty across a word space adjacent to the key; and
  
  a third executable portion configured to examine the text input to look for the keys in the mapping input and to format widths of between-word spaces of the text input based on an outcome of the examination, wherein the formatting of the widths of the between-word spaces is determined by the value.

22. In a computer system having a display, a method of displaying text, comprising the steps of:
- a) creating a list of all instances of a word wherein the character preceding the word includes at least one of: a space, a beginning of the word, a beginning of a line, a beginning of a paragraph, a beginning of a document, a tab, an indent, or a punctuation character;
  
  b) for each of the words in the list from step a, looking up the word (n) and a subsequent word (n+1) that immediately follows the word (n) in an adjustment score library, wherein the word and the subsequent word that follows are separated by a space character; and
  
  c) if found in the adjustment library, then adjusting the width of the space character using an adjustment score found for a word bigram of the word and the subsequent word in the adjustment library;
  
  d) setting n to n+1; and
  
  e) repeating steps b-d for all items in the list created in step a.
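
The loop in claim 22 can be sketched in Python as a pass over consecutive word pairs. This is a hedged illustration only: the `ADJUSTMENTS` library and its scores are invented, the regular expression `\b\w+` is a simplification of the claim's list of word-preceding characters, and the function only reports which spaces would be adjusted rather than rendering anything.

```python
import re

# Hypothetical adjustment score library: word bigram -> width multiplier.
ADJUSTMENTS = {("of", "the"): 0.8, ("sat", "on"): 1.2}


def space_adjustments(text, library):
    """Return (offset_of_space, score) for each single space inside a known bigram."""
    # Step a: every word with its start and end offsets in the text.
    words = [
        (m.group().lower(), m.start(), m.end())
        for m in re.finditer(r"\b\w+", text)
    ]
    found = []
    # Steps b-e: walk consecutive pairs (word n, word n+1).
    for (w1, _, end1), (w2, start2, _) in zip(words, words[1:]):
        separated_by_space = text[end1:start2] == " "
        if separated_by_space and (w1, w2) in library:
            found.append((end1, library[(w1, w2)]))
    return found


result = space_adjustments("The cat sat on top of the mat.", ADJUSTMENTS)
```

Here `result` lists the character offsets of the spaces inside "sat on" and "of the" together with their scores; a renderer could then widen or narrow exactly those spaces.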

Current Assignee
Asymmetrica Labs Inc.
Original Assignee
Asymmetrica Labs Inc.
Inventors
Nicholas, Christopher D.; Brownfield, Kenneth R.

Granted Patent

US 10,599,748 B2
CPC Class Codes

G06F 40/103   Formatting, i.e. changing o...

G06F 40/106   Display of layout of docume...

G06F 40/114   Pagination

G06F 40/117   Tagging; Marking up details...

G06F 40/143   Markup, e.g. Standard Gener...

G06F 40/163   Handling of whitespace

G06F 40/205   Parsing

G06F 40/211   Syntactic parsing, e.g. bas...

G06F 40/284   Lexical analysis, e.g. toke...
