Phrase identification in a sequence of words
Abstract
Methods and apparatus related to phrase identification. Methods are provided for determining co-occurrence consistencies for positional word pairings of a plurality of sequences of words in a corpus that may be utilized in identifying a phrase; determining a phrase coherence of a sequence of words based on the co-occurrence consistencies for positional word pairings in the sequence of words; and determining one or more phrase boundaries in a sequence of words.
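As an illustration of the first step in the abstract, the sketch below counts positional word pairings across a corpus and scores each pairing. It is not the patented method: the use of pointwise mutual information (PMI) as the consistency measure, the `max_offset` window, and the name `positional_consistency` are all assumptions made for this example, since the text above does not say how a co-occurrence consistency is computed.

```python
import math
from collections import Counter

def positional_consistency(corpus, max_offset=3):
    """Score word pairs that co-occur at a fixed relative position.

    Returns {(left, right, offset): score}, where `right` appears exactly
    `offset` words after `left`. The score is PMI, which is an assumed
    measure; the source does not name one.
    """
    pair_counts = Counter()   # (left, right, offset) -> count
    left_counts = Counter()   # (left, offset) -> count as the left word
    right_counts = Counter()  # (right, offset) -> count as the right word
    totals = Counter()        # offset -> total pairings seen at that offset

    for seq in corpus:
        for i, left in enumerate(seq):
            for d in range(1, max_offset + 1):
                if i + d >= len(seq):
                    break
                right = seq[i + d]
                pair_counts[(left, right, d)] += 1
                left_counts[(left, d)] += 1
                right_counts[(right, d)] += 1
                totals[d] += 1

    scores = {}
    for (left, right, d), n in pair_counts.items():
        p_pair = n / totals[d]
        p_left = left_counts[(left, d)] / totals[d]
        p_right = right_counts[(right, d)] / totals[d]
        scores[(left, right, d)] = math.log(p_pair / (p_left * p_right))
    return scores

# Hypothetical toy corpus of tokenized word sequences.
corpus = [
    ["new", "york", "city"],
    ["new", "york", "times"],
    ["old", "city", "hall"],
]
scores = positional_consistency(corpus)
```

Keying each score by the offset keeps "new" followed immediately by "york" distinct from "new" followed two words later by "york", which is what makes the pairing positional rather than a plain bag-of-words co-occurrence.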
25 Claims
1. A computer implemented method of identifying a phrase weighting of a sequence of words as a function of the position of words present in the sequence of words, comprising:
- identifying a sequence of words;
- determining, utilizing one or more processors, a centrality value for each of a plurality of identified words in the sequence of words, the centrality value for each of the identified words based on a co-occurrence consistency with other of the identified words in their respective relative positions in the sequence of words; and
- determining, utilizing one or more processors, a phrase weighting of the sequence of words based on the determined centrality value for each of the identified words, wherein the phrase weighting provides an indication of the likelihood that the sequence of words is a phrase.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A system, comprising:
- memory storing instructions; and
- one or more processors operable to execute the instructions stored in the memory;
wherein the instructions comprise instructions to:
- identify a sequence of words;
- determine a centrality value for each of a plurality of identified words in the sequence of words, the centrality value for each of the identified words based on a co-occurrence consistency with other of the identified words in their respective relative positions in the sequence of words; and
- determine a phrase weighting of the sequence of words based on the determined centrality value for each of the identified words, wherein the phrase weighting provides an indication of the likelihood that the sequence of words is a phrase.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24
25. A non-transitory computer readable storage medium storing computer instructions executable by a processor to perform a method comprising:
- identifying a sequence of words;
- determining, utilizing one or more processors, a centrality value for each of a plurality of identified words in the sequence of words, the centrality value for each of the identified words based on a co-occurrence consistency with other of the identified words in their respective relative positions in the sequence of words; and
- determining, utilizing one or more processors, a phrase weighting of the sequence of words based on the determined centrality value for each of the identified words, wherein the phrase weighting provides an indication of the likelihood that the sequence of words is a phrase.
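The two determining steps recited in the independent claims can be pictured with a small sketch. The claims say only that a centrality value is "based on" co-occurrence consistency and that the phrase weighting is "based on" the centrality values, so every concrete choice here is an assumption for illustration: centrality is taken as the mean consistency of a word with the other words at their relative positions, the phrase weighting is the mean of the centralities, unseen pairings score `default`, and the names `centrality_values`, `phrase_weighting`, and the `consistency` lookup keyed by `(left, right, offset)` are hypothetical.

```python
def centrality_values(words, consistency, default=0.0):
    """Assumed centrality: mean consistency of each word with the others.

    `consistency` maps (left_word, right_word, offset) to a score, where
    right_word appears `offset` positions after left_word.
    """
    n = len(words)
    values = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            lo, hi = min(i, j), max(i, j)
            # Look up the pairing in its actual relative position.
            total += consistency.get((words[lo], words[hi], hi - lo), default)
        values.append(total / (n - 1) if n > 1 else 0.0)
    return values

def phrase_weighting(words, consistency, default=0.0):
    """Assumed phrase weighting: mean of the per-word centrality values."""
    values = centrality_values(words, consistency, default)
    return sum(values) / len(values) if values else 0.0

# Hypothetical consistency scores for illustration only.
consistency = {
    ("new", "york", 1): 1.0,
    ("york", "city", 1): 0.5,
    ("new", "city", 2): 0.25,
}
centrality = centrality_values(["new", "york", "city"], consistency)
weight = phrase_weighting(["new", "york", "city"], consistency)
reordered = phrase_weighting(["city", "new", "york"], consistency)
```

Because the lookup key includes the offset and word order, reordering the same words changes the score, which reflects the claim language about weighting "as a function of the position of words". Other aggregations, such as a minimum or an iterative link-analysis style score, would fit the claim wording equally well.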
Specification