Systems and Methods for Determining Lexical Associations Among Words in a Corpus
First Claim
1. A computer-implemented method of identifying one or more target words of a corpus that have a lexical relationship to a plurality of provided cue words, the method comprising:
- receiving a plurality of cue words;
- analyzing the cue words and statistical lexical information derived from a corpus of documents with a processing system to determine candidate words that have a lexical association with the cue words, the statistical information including numerical values indicative of probabilities of word pairs appearing together as adjacent words in a well-formed text or appearing together within a paragraph of a well-formed text;
- for each candidate word, determining, using the processing system, a statistical association score between the candidate word and each of the cue words using numerical values included in the statistical information, and generating, using the processing system, an aggregate score for each of the candidate words based on the statistical association scores; and
- selecting one or more of the candidate words to be the one or more target words based on the aggregate scores of the candidate words.
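The claim relies on "numerical values indicative of probabilities" of word pairs appearing adjacently or within the same paragraph. As a rough illustration only, the sketch below shows one way such values could be derived from a tokenized corpus. The function names, the toy corpus, and the choice of simple relative frequencies are assumptions made for this example and are not taken from the patent.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(paragraphs):
    """Count words, adjacent word pairs, and same-paragraph word pairs.

    `paragraphs` is a list of token lists; tokenization is assumed to
    have been done upstream (hypothetical input format).
    """
    word_counts = Counter()
    adjacent_counts = Counter()   # ordered pairs (w1, w2), w2 directly after w1
    paragraph_counts = Counter()  # unordered pairs seen in the same paragraph

    for tokens in paragraphs:
        word_counts.update(tokens)
        adjacent_counts.update(zip(tokens, tokens[1:]))
        # Count each unordered pair at most once per paragraph.
        for pair in combinations(sorted(set(tokens)), 2):
            paragraph_counts[pair] += 1
    return word_counts, adjacent_counts, paragraph_counts


def adjacent_probability(adjacent_counts, w1, w2):
    """Relative frequency of the bigram (w1, w2) among all observed bigrams."""
    total = sum(adjacent_counts.values())
    return adjacent_counts[(w1, w2)] / total if total else 0.0


def paragraph_probability(paragraph_counts, n_paragraphs, w1, w2):
    """Fraction of paragraphs that contain both w1 and w2."""
    pair = tuple(sorted((w1, w2)))
    return paragraph_counts[pair] / n_paragraphs if n_paragraphs else 0.0


# Toy corpus (invented for illustration): three short "paragraphs".
paragraphs = [
    ["the", "dog", "barked", "at", "the", "cat"],
    ["the", "cat", "ignored", "the", "dog"],
    ["a", "bird", "sang"],
]
words, adjacent, para = cooccurrence_stats(paragraphs)
p_adjacent = adjacent_probability(adjacent, "the", "dog")
p_paragraph = paragraph_probability(para, len(paragraphs), "dog", "cat")
```

Keeping the adjacency counts ordered and the paragraph counts unordered is a design choice of this sketch; the claim language does not say which convention is used.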
Abstract
Systems and methods are provided for identifying one or more target words of a corpus that have a lexical relationship to a plurality of provided cue words. The cue words and statistical lexical information derived from a corpus of documents are analyzed to determine candidate words that have a lexical association with the cue words. The statistical information includes numerical values indicative of probabilities of word pairs appearing together as adjacent words in a well-formed text or appearing together within a paragraph of a well-formed text. For each candidate word, a statistical association score between the candidate word and each of the cue words is determined. An aggregate score for each of the candidate words is determined based on the statistical association scores. One or more of the candidate words are selected to be the one or more target words based on the aggregate scores.
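The steps summarized in the abstract (candidate determination, per-cue association scores, aggregation, selection) can be sketched as follows. This is a minimal illustration under stated assumptions: pointwise mutual information (PMI) is used as the association score and a plain sum as the aggregate, neither of which is specified in the text above. The function names, probability tables, and toy values are hypothetical.

```python
import math


def pmi(p_pair, p_a, p_b):
    """Pointwise mutual information; 0.0 when any probability is zero."""
    if p_pair <= 0 or p_a <= 0 or p_b <= 0:
        return 0.0
    return math.log2(p_pair / (p_a * p_b))


def rank_targets(cue_words, word_prob, pair_prob, top_k=1):
    """Return the top_k candidate words and all aggregate scores.

    `pair_prob` maps alphabetically sorted word pairs to co-occurrence
    probabilities; `word_prob` maps words to unigram probabilities.
    """
    cues = set(cue_words)

    # Candidates: any non-cue word that co-occurs with at least one cue word.
    candidates = set()
    for a, b in pair_prob:
        if a in cues and b not in cues:
            candidates.add(b)
        if b in cues and a not in cues:
            candidates.add(a)

    aggregate = {}
    for cand in candidates:
        # One association score per cue word (PMI is an assumed choice).
        scores = [
            pmi(
                pair_prob.get(tuple(sorted((cand, cue))), 0.0),
                word_prob.get(cand, 0.0),
                word_prob.get(cue, 0.0),
            )
            for cue in cue_words
        ]
        # Aggregate by summation (also an assumed choice).
        aggregate[cand] = sum(scores)

    # Highest aggregate first; ties broken alphabetically for determinism.
    ranked = sorted(aggregate, key=lambda w: (-aggregate[w], w))
    return ranked[:top_k], aggregate


# Invented probabilities for illustration only.
word_prob = {"bark": 0.02, "leash": 0.01, "dog": 0.02, "tree": 0.03}
pair_prob = {
    ("bark", "dog"): 0.004,
    ("dog", "leash"): 0.002,
    ("bark", "tree"): 0.003,
}
targets, scores = rank_targets(["bark", "leash"], word_prob, pair_prob)
```

In this toy data "dog" is associated with both cue words while "tree" is associated with only one, so "dog" receives the higher aggregate score. Other aggregation rules (a product, a mean, or a minimum over cue words) would fit the same structure.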
196 Citations
24 Claims
1. A computer-implemented method of identifying one or more target words of a corpus that have a lexical relationship to a plurality of provided cue words, the method comprising:
- receiving a plurality of cue words;
- analyzing the cue words and statistical lexical information derived from a corpus of documents with a processing system to determine candidate words that have a lexical association with the cue words, the statistical information including numerical values indicative of probabilities of word pairs appearing together as adjacent words in a well-formed text or appearing together within a paragraph of a well-formed text;
- for each candidate word, determining, using the processing system, a statistical association score between the candidate word and each of the cue words using numerical values included in the statistical information, and generating, using the processing system, an aggregate score for each of the candidate words based on the statistical association scores; and
- selecting one or more of the candidate words to be the one or more target words based on the aggregate scores of the candidate words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A system for identifying one or more target words of a corpus that have a lexical relationship to a plurality of provided cue words, the system comprising:
- a processing system; and
- computer-readable memory in communication with the processing system encoded with instructions for commanding the processing system to execute steps comprising:
  - receiving a plurality of cue words;
  - analyzing the cue words and statistical lexical information derived from a corpus of documents to determine candidate words that have a lexical association with the cue words, the statistical information including numerical values indicative of probabilities of word pairs appearing together as adjacent words in a well-formed text or appearing together within a paragraph of a well-formed text;
  - for each candidate word, determining a statistical association score between the candidate word and each of the cue words using numerical values included in the statistical information, and generating an aggregate score for each of the candidate words based on the statistical association scores; and
  - selecting one or more of the candidate words to be the one or more target words based on the aggregate scores of the candidate words.
Dependent claims: 17, 18
19. A non-transitory computer-readable storage medium for identifying one or more target words of a corpus that have a lexical relationship to a plurality of provided cue words, the computer-readable storage medium comprising computer executable instructions which, when executed, cause a processing system to execute steps comprising:
- receiving a plurality of cue words;
- analyzing the cue words and statistical lexical information derived from a corpus of documents to determine candidate words that have a lexical association with the cue words, the statistical information including numerical values indicative of probabilities of word pairs appearing together as adjacent words in a well-formed text or appearing together within a paragraph of a well-formed text;
- for each candidate word, determining a statistical association score between the candidate word and each of the cue words using numerical values included in the statistical information, and generating an aggregate score for each of the candidate words based on the statistical association scores; and
- selecting one or more of the candidate words to be the one or more target words based on the aggregate scores of the candidate words.
Dependent claims: 20, 21
22. A computer-implemented method of identifying one or more target n-grams of a corpus that have a lexical relationship to a plurality of provided cue words, the method comprising:
- receiving a plurality of cue words;
- analyzing the cue words and statistical lexical information derived from a corpus of documents with a processing system to determine candidate n-grams that have a lexical association with the cue words, the statistical information including numerical values indicative of probabilities of multiple words appearing together as adjacent words in text of the corpus or appearing together within a paragraph of text in the corpus;
- for each candidate n-gram, determining, using the processing system, a statistical association score between the candidate n-gram and each of the cue words using numerical values of the dataset, and generating, using the processing system, an aggregate score for each of the candidate n-grams based on the statistical association scores; and
- selecting one or more of the candidate n-grams to be the one or more target n-grams based on the aggregate scores of the candidate n-grams.
Dependent claims: 23, 24
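Claim 22 extends the same procedure from single words to n-grams. The sketch below illustrates how candidate n-grams might be collected and ranked against cue words. It is deliberately crude: raw paragraph co-occurrence counts stand in for the statistical association score, and their sum stands in for the aggregate score. Both choices, along with the function names and the toy corpus, are assumptions for illustration rather than the claimed method.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences in `tokens`, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_cue_counts(paragraphs, cue_words, n=2):
    """Count, per (n-gram, cue) pair, the paragraphs that contain both."""
    cues = set(cue_words)
    counts = Counter()
    for tokens in paragraphs:
        present_cues = cues & set(tokens)
        for gram in set(ngrams(tokens, n)):
            if cues & set(gram):
                continue  # skip n-grams that contain a cue word themselves
            for cue in present_cues:
                counts[(gram, cue)] += 1
    return counts


def select_ngrams(counts, top_k=1):
    """Sum per-cue counts into an aggregate and return the top_k n-grams."""
    totals = Counter()
    for (gram, _cue), count in counts.items():
        totals[gram] += count
    # Highest aggregate first; ties broken by tuple order for determinism.
    ranked = sorted(totals, key=lambda g: (-totals[g], g))
    return ranked[:top_k]


# Toy corpus (invented for illustration).
paragraphs = [
    ["hot", "dog", "with", "mustard", "and", "ketchup"],
    ["she", "ate", "a", "hot", "dog", "and", "ketchup"],
    ["mustard", "stains", "are", "stubborn"],
]
cue_words = ["mustard", "ketchup"]
counts = ngram_cue_counts(paragraphs, cue_words, n=2)
best = select_ngrams(counts, top_k=1)
```

A probability-based score, such as the relative frequencies or PMI used for single words, could replace the raw counts without changing the overall structure.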
Specification