Methods for creating a phrase thesaurus
First Claim
1. A method for producing a phrase thesaurus, comprising the steps of:
providing access to a thesaurus building computer coupled with a machine-readable phrase thesaurus database;
identifying, by thesaurus building computer, a plurality of valid phrases that occur within a text corpus;
determining, by thesaurus building computer, the degree of similarity between the valid phrases, wherein one or more contexts are created for each valid phrase, wherein each context for a valid phrase comprises a word or phrase that appears adjacent to the valid phrase in the text corpus, and wherein the degree of overlap of the contexts of each valid phrase with the contexts of each other valid phrase is determined; and
grouping, in the phrase thesaurus database, the valid phrases into classes of equivalent valid phrases based upon the determined degree of similarity between valid phrases.
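The determining and grouping steps of claim 1 can be illustrated with a short sketch. It is not the patented implementation. The claim speaks only of a "degree of overlap" between contexts, so the Jaccard coefficient, the similarity threshold, the whitespace tokenization, the single-link grouping, and all function names below are assumptions chosen for illustration. The list of valid phrases is supplied by hand, so the identifying step is not modeled.

```python
from collections import defaultdict
from itertools import combinations


def build_contexts(sentences, phrases):
    """For each phrase, collect the words that appear directly to its left or right."""
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: naive whitespace tokenization
        for phrase in phrases:
            words = phrase.lower().split()
            n = len(words)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == words:
                    if i > 0:
                        contexts[phrase].add(("L", tokens[i - 1]))
                    if i + n < len(tokens):
                        contexts[phrase].add(("R", tokens[i + n]))
    return contexts


def jaccard(a, b):
    """Overlap of two context sets (assumed measure; the claim names none)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_phrases(phrases, contexts, threshold=0.5):
    """Merge phrases whose context overlap meets the threshold (single-link, union-find)."""
    parent = {p: p for p in phrases}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(phrases, 2):
        if jaccard(contexts.get(a, set()), contexts.get(b, set())) >= threshold:
            parent[find(a)] = find(b)

    classes = defaultdict(list)
    for p in phrases:
        classes[find(p)].append(p)
    return sorted(sorted(members) for members in classes.values())


if __name__ == "__main__":
    corpus = [
        "please tell me the time now",
        "please let me know the time now",
        "please tell me the weather now",
        "i want to book a flight",
    ]
    valid_phrases = ["tell me", "let me know", "book a flight"]
    ctx = build_contexts(corpus, valid_phrases)
    print(group_phrases(valid_phrases, ctx))
```

In this toy corpus, "tell me" and "let me know" share the same left and right neighbors and fall into one class, while "book a flight" shares no context with either and stays in a class of its own.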
Abstract
The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks.
47 Claims
1. A method for producing a phrase thesaurus, comprising the steps of:
providing access to a thesaurus building computer coupled with a machine-readable phrase thesaurus database;
identifying, by thesaurus building computer, a plurality of valid phrases that occur within a text corpus;
determining, by thesaurus building computer, the degree of similarity between the valid phrases, wherein one or more contexts are created for each valid phrase, wherein each context for a valid phrase comprises a word or phrase that appears adjacent to the valid phrase in the text corpus, and wherein the degree of overlap of the contexts of each valid phrase with the contexts of each other valid phrase is determined; and
grouping, in the phrase thesaurus database, the valid phrases into classes of equivalent valid phrases based upon the determined degree of similarity between valid phrases.
Dependent claims: 2-46.
47. A non-transitory computer readable storage medium encoded with one or more computer programs for enabling production of a phrase thesaurus, comprising:
instructions for identifying a plurality of valid phrases that occur within a text corpus;
instructions for determining the degree of similarity between the valid phrases, further comprising instructions for determining the degree of semantic similarity, wherein one or more contexts are created for each valid phrase, wherein each context for a valid phrase comprises a word or phrase that appears adjacent to the valid phrase in the text corpus, and wherein the degree of overlap of the contexts of each valid phrase with the contexts of each other valid phrase is determined; and
instructions for grouping the valid phrases into classes of equivalent valid phrases based upon the determined degree of similarity between valid phrases.
Specification