Contextual weighting of words in a word grouping
Abstract
Methods and apparatus related to contextual weighting of words. Methods are provided for determining co-occurrence relationships between words in a corpus of word groupings and for contextually weighting words in a word grouping as a function of which other words are present in the word grouping.
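The comparison of a pair's co-occurrence probability with its incidental occurrence probability can be illustrated with a small sketch. The claims do not fix a formula, so everything specific here is an assumption made for illustration: the incidental probability is taken to be the product of the two words' marginal probabilities, the comparison is a log ratio (pointwise mutual information), and the function name `cooccurrence_consistency` and the `(words, weight)` input shape are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_consistency(groupings):
    """Return a sparse pair -> consistency mapping.

    groupings: list of (set_of_words, weight) tuples. The weight is the
    contribution of that word grouping to the weighted count (assumed input).
    """
    if not groupings:
        return {}
    total = sum(weight for _, weight in groupings)
    word_mass = defaultdict(float)  # weighted count of groupings containing a word
    pair_mass = defaultdict(float)  # weighted count of groupings containing a pair
    for words, weight in groupings:
        for word in words:
            word_mass[word] += weight
        for a, b in combinations(sorted(words), 2):
            pair_mass[(a, b)] += weight

    consistency = {}
    for (a, b), mass in pair_mass.items():
        p_pair = mass / total
        # Assumption: incidental probability = product of the marginals,
        # which makes it specific to each pair.
        p_incidental = (word_mass[a] / total) * (word_mass[b] / total)
        # Assumption: the comparison is a log ratio (PMI).
        consistency[(a, b)] = math.log(p_pair / p_incidental)
    return consistency


corpus = [
    ({"new", "york"}, 1.0),
    ({"new", "york"}, 1.0),
    ({"new", "car"}, 1.0),
    ({"car", "wash"}, 1.0),
]
matrix = cooccurrence_consistency(corpus)
```

Pairs are stored under alphabetically sorted keys, so the mapping acts as the upper triangle of a symmetric matrix. A positive value means the pair co-occurs more often than chance would suggest under these assumptions; a negative value means less often.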
11 Claims
1. A computer implemented method of determining a co-occurrence relationship between words in a corpus of word groupings, comprising:
identifying a plurality of word pairs from a vocabulary of words;
determining, utilizing one or more processors, a co-occurrence probability for each of the word pairs in a corpus having a plurality of word groupings, each of the co-occurrence probability is based on the probability of co-occurrence of a single of the word pairs in a single of the word groupings;
wherein determining the co-occurrence probability for a word pair of the word pairs comprises determining a weighted count of the word groupings in which the word pair is present, wherein a word grouping of the word groupings in which the word pair is present is from a document, and wherein the weight of the contribution of the word grouping to the weighted count is based on at least two of frequency of occurrence, field weighting, and decorations of both words of the word pair in the document;
determining, utilizing one or more processors, a co-occurrence consistency for each of the word pairs by comparing the co-occurrence probability for each of the word pairs to an incidental occurrence probability for each of the word pairs, the incidental occurrence probability for each of the word pairs being specific to a respective of the word pairs;
creating a co-occurrence consistency matrix with the co-occurrence consistency for each of the word pairs;
receiving, by a search engine, a query submitted to the search engine by a user, the query including a word grouping having a plurality of word grouping words;
identifying, by the search engine and based on the co-occurrence consistency matrix, the co-occurrence consistency for each of a plurality of the word pairs in the word grouping words;
performing, by the search engine, a link analysis on the word grouping words utilizing the identified co-occurrence consistencies for the plurality of the word pairs in the word grouping words as weighting factors in the link analysis;
assigning, by the search engine, a contextual weight to each of a plurality of the word grouping words based on the link analysis; and
providing, by the search engine and based on the assigned contextual weights, results to the query submitted to the search engine by the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
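The link analysis step recited in claim 1 can be pictured as a walk over a small graph whose nodes are the query words and whose edge weights are the pairwise co-occurrence consistencies. The sketch below uses a PageRank-style power iteration as a stand-in, since the claim does not name a particular link analysis algorithm. The function name `contextual_weights`, the damping value, the iteration count, and the clipping of negative consistencies to zero are all assumptions for illustration.

```python
def contextual_weights(words, consistency, damping=0.85, iterations=50):
    """Assign each distinct query word a weight via PageRank-style iteration.

    words: list of distinct query words.
    consistency: dict mapping (word_a, word_b) -> co-occurrence consistency.
    """
    if not words:
        return {}
    n = len(words)

    def edge(a, b):
        # The matrix is treated as symmetric; negative values are clipped
        # to zero so they cannot act as links (an assumption).
        key = (a, b) if (a, b) in consistency else (b, a)
        return max(consistency.get(key, 0.0), 0.0)

    out_total = {a: sum(edge(a, b) for b in words if b != a) for a in words}
    weights = {a: 1.0 / n for a in words}
    for _ in range(iterations):
        updated = {}
        for a in words:
            incoming = sum(
                weights[b] * edge(b, a) / out_total[b]
                for b in words
                if b != a and out_total[b] > 0
            )
            updated[a] = (1.0 - damping) / n + damping * incoming
        norm = sum(updated.values())
        weights = {a: value / norm for a, value in updated.items()}
    return weights


query = ["cheap", "new", "york", "hotel"]
pair_consistency = {
    ("new", "york"): 2.0,
    ("york", "hotel"): 1.0,
    ("new", "hotel"): 0.5,
    ("cheap", "hotel"): 0.2,
}
weights = contextual_weights(query, pair_consistency)
```

In this toy query, words that are strongly tied to the other query words ("new", "york") end up with more weight than a loosely tied word ("cheap"). A search engine could then use such weights when scoring results, which is the role the claim gives to the contextual weights.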
10. A system including memory and one or more processors operable to execute instructions stored in the memory, comprising instructions to:
identify a plurality of word pairs from a vocabulary of words;
determine a co-occurrence probability for each of the word pairs in a corpus having a plurality of word groupings, each of the co-occurrence probability is based on the probability of co-occurrence of a single of the word pairs in a single of the word groupings;
determine a co-occurrence consistency for each of the word pairs by comparing the co-occurrence probability for each of the word pairs to an incidental occurrence probability for each of the word pairs, the incidental occurrence probability for each of the word pairs being specific to a respective of the word pairs;
wherein the instructions to determine the co-occurrence probability for a word pair of the word pairs comprise instructions to determine a weighted count of the word groupings in which the word pair is present, wherein a word grouping of the word groupings in which the word pair is present is from a document, and wherein the weight of the contribution of the word grouping to the weighted count is based on at least two of frequency of occurrence, field weighting, and decorations of both words of the word pair in the document;
create a co-occurrence consistency matrix with the co-occurrence consistency for each of the word pairs;
receive, by a search engine of the system, a query submitted to the search engine by a user, the query including a word grouping having a plurality of word grouping words;
identify, by the search engine and based on the co-occurrence consistency matrix, the co-occurrence consistency for each of a plurality of the word pairs in the word grouping words;
perform, by the search engine, a link analysis on the word grouping words utilizing the identified co-occurrence consistencies for the plurality of the word pairs in the word grouping words as weighting factors in the link analysis;
assign, by the search engine, a contextual weight to each of a plurality of the word grouping words based on the link analysis; and
provide, by the search engine and based on the assigned contextual weights, results to the query submitted to the search engine by the user.
Dependent claims: 11
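Both independent claims weight a word grouping's contribution to the weighted count using at least two of frequency of occurrence, field weighting, and decorations of both words in the source document. The sketch below shows one way such a weight might be computed. The claims do not specify the combination, so the field names, the numeric constants, the log damping of frequency, and the geometric mean over the two words are all hypothetical choices, as are the names `WordStats`, `word_signal`, and `grouping_weight`.

```python
import math
from dataclasses import dataclass

# Hypothetical constants; the claims do not specify any values.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}
DECORATION_BONUS = {"bold": 0.5, "italic": 0.25}


@dataclass(frozen=True)
class WordStats:
    """Per-document statistics for one word (illustrative structure)."""
    frequency: int
    fields: frozenset
    decorations: frozenset


def word_signal(stats):
    """Combine frequency, field weighting, and decorations for one word."""
    # Use the most prominent field the word appears in; default to body weight.
    field = max((FIELD_WEIGHTS.get(f, 1.0) for f in stats.fields), default=1.0)
    decoration = 1.0 + sum(DECORATION_BONUS.get(d, 0.0) for d in stats.decorations)
    # log1p dampens very high term frequencies.
    return math.log1p(stats.frequency) * field * decoration


def grouping_weight(stats_a, stats_b):
    """Weight of a grouping's contribution to the pair's weighted count.

    The geometric mean requires both words of the pair to be prominent.
    """
    return math.sqrt(word_signal(stats_a) * word_signal(stats_b))


prominent = WordStats(3, frozenset({"title"}), frozenset({"bold"}))
plain = WordStats(1, frozenset({"body"}), frozenset())
```

Under these assumptions, a pair of words that both appear in a bold title contributes more to the weighted count than a pair in which one word appears only once in the body text.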
Specification