Structured term recognition
Abstract
A method, system and computer program product for recognizing terms in a specified corpus. In one embodiment, the method comprises providing a set of known terms t∈T, each of the known terms t belonging to a set of types Γ(t)={γ1, . . . }, wherein each of the terms is comprised of a list of words, t=w1, w2, . . . , wn, and the union of all the words for all the terms is a word set W. The method further comprises using the set of terms T and the set of types to determine a set of pattern-to-type mappings p→γ; and using the set of pattern-to-type mappings to recognize terms in the specified corpus and, for each of the recognized terms in the specified corpus, to recognize one or more of the types γ for said each recognized term.
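The learning step summarized in the abstract can be illustrated with a minimal sketch. It assumes a toy setting in which every word in W already carries a cluster label supplied by a hypothetical lookup table (`WORD_CLUSTER`); the abstract does not prescribe how clusters are formed, and the names `KNOWN_TERMS`, `word_set` and `learn_mappings`, as well as the sample terms and types, are illustrative only.

```python
from collections import defaultdict

# Hypothetical known terms T with their type sets Γ(t); illustrative data only.
KNOWN_TERMS = {
    ("lung", "cancer"): {"Disease"},
    ("breast", "cancer"): {"Disease"},
    ("lung", "biopsy"): {"Procedure"},
}

# Hypothetical word-to-cluster assignment over the word set W.
WORD_CLUSTER = {
    "lung": "C_organ",
    "breast": "C_organ",
    "cancer": "C_neoplasm",
    "biopsy": "C_procedure",
}


def word_set(terms):
    """Return W, the union of all words across the known terms."""
    return {w for term in terms for w in term}


def learn_mappings(terms, word_cluster):
    """Map each cluster sequence (pattern p) to the union of types seen for it."""
    mappings = defaultdict(set)
    for words, types in terms.items():
        pattern = tuple(word_cluster[w] for w in words)
        mappings[pattern] |= types
    return dict(mappings)


W = word_set(KNOWN_TERMS)
MAPPINGS = learn_mappings(KNOWN_TERMS, WORD_CLUSTER)
```

In this sketch a pattern p is simply the tuple of cluster labels of a term's words, and p→γ records every type observed for that tuple among the known terms.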
17 Citations
19 Claims
1. A computer implemented method of recognizing types of terms in a specified corpus, comprising:
providing a set of known terms t∈T, each of the known terms t belonging to a given set of types Γ(t)={γ1, γ2, . . . }, wherein each of the terms is comprised of a list of words, t=w1, w2, . . . , wn, and the union of all the words w for all the terms t is a word set W;
forming, by a clustering component of a computer system, a multitude of clusters of words from the words in W;
using, by a mapping determining component of the computer system, the set of known terms T and the given set of types Γ to determine a set of pattern-to-type mappings {p1→γ1, p2→γ2, . . . }, each of the pattern-to-type mappings p→γ mapping an associated sequence of the clusters of words to one or more of the given set of types {γ1, γ2, . . . };
using, by a term recognition component of the computer system, the determined set of pattern-to-type mappings to recognize corpus terms in the specified corpus, each of the corpus terms being comprised of two or more words; and
for each of the recognized corpus terms in the specified corpus, using, by a word mapping component of the computer system, a specified context in the corpus of at least a plurality of the words of said each corpus term to map said plurality of words to one of the sequences of the clusters of words, and using, by a type recognition component of the computer system, one of the determined pattern-to-type mappings to map said one of the sequences of the clusters of words to one or more of the types γ of the given set of types {γ1, γ2, . . . } to recognize said one or more of the types γ of the given set of types for said each recognized corpus term to boost performance of term recognition systems based on dictionary lookup while extending coverage of ontologies.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
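The recognition steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the mappings and cluster labels are hypothetical sample data, and the "specified context" is reduced to a single invented cue (`CONTEXT_CUE`) that guesses the cluster of an unknown word from the cluster of the word to its right. The function names `cluster_in_context` and `recognize` are likewise hypothetical.

```python
# Hypothetical learned pattern-to-type mappings (cluster sequence -> types).
MAPPINGS = {
    ("C_organ", "C_neoplasm"): {"Disease"},
    ("C_organ", "C_procedure"): {"Procedure"},
}

# Hypothetical cluster assignments for the words in W.
WORD_CLUSTER = {
    "lung": "C_organ",
    "breast": "C_organ",
    "cancer": "C_neoplasm",
    "biopsy": "C_procedure",
}

# Hypothetical context cue: a word outside W that directly precedes a word of
# the key cluster is assumed to belong to the value cluster.
CONTEXT_CUE = {"C_neoplasm": "C_organ", "C_procedure": "C_organ"}


def cluster_in_context(tokens, i):
    """Assign token i to a cluster, falling back on its right-hand neighbour."""
    word = tokens[i]
    if word in WORD_CLUSTER:
        return WORD_CLUSTER[word]
    if i + 1 < len(tokens):
        nxt = WORD_CLUSTER.get(tokens[i + 1])
        return CONTEXT_CUE.get(nxt)
    return None


def recognize(tokens, mappings, max_len=3):
    """Yield (start, end, types) for spans of two or more words matching a pattern."""
    clusters = [cluster_in_context(tokens, i) for i in range(len(tokens))]
    for start in range(len(tokens)):
        for end in range(start + 2, min(start + max_len, len(tokens)) + 1):
            pattern = tuple(clusters[start:end])
            if pattern in mappings:
                yield start, end, mappings[pattern]


tokens = "the patient had a kidney biopsy and lung cancer".split()
results = [(" ".join(tokens[s:e]), t) for s, e, t in recognize(tokens, MAPPINGS)]
```

Here "kidney" is not in the sample word set, yet "kidney biopsy" is still typed through its cluster pattern, which is the sense in which a pattern-based recognizer can extend coverage beyond plain dictionary lookup.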
12. A computer system for recognizing types of terms in a specified corpus, the computer comprising:
a memory for storing data;
one or more processing units connected to the memory to transmit data to and to receive data from the memory, and configured for:
receiving a set of known terms t∈T, each of the known terms t belonging to a given set of types Γ(t)={γ1, γ2, . . . }, wherein each of the terms is comprised of a list of words, t=w1, w2, . . . , wn, and the union of all the words w for all the terms t is a word set W;
forming, by a clustering component of the computer system, a multitude of clusters of words from the words in W;
using, by a mapping determining component of the computer system, the set of known terms T and the given set of types Γ to determine a set of pattern-to-type mappings {p1→γ1, p2→γ2, . . . }, each of the pattern-to-type mappings p→γ mapping an associated sequence of the clusters of words to one or more of the given set of types {γ1, γ2, . . . }; and
using, by a term recognition component of the computer system, the determined set of pattern-to-type mappings to recognize corpus terms in the specified corpus, each of the corpus terms being comprised of two or more words; and
for each of the recognized corpus terms in the specified corpus, using, by a word mapping component of the computer system, a specified context in the corpus of at least a plurality of the words of said each corpus term to map said plurality of words to one of the sequences of the clusters of words, and using, by a type recognition component of the computer system, one of the determined pattern-to-type mappings to map said one of the sequences of the clusters of words to one or more types γ of the given set of types {γ1, γ2, . . . } to recognize said one or more of the types γ of the given set of types for said each recognized corpus term to boost performance of term recognition systems based on dictionary lookup while extending coverage of ontologies.
Dependent claims: 13, 14, 15
16. An article of manufacture comprising:
at least one computer readable hardware medium tangibly embodying a program of instructions for recognizing types of terms in a specified corpus, said program of instructions, when executing in a computer, performing the following:
receiving at the computer a set of known terms t∈T, each of the known terms t belonging to a given set of types Γ(t)={γ1, γ2, . . . }, wherein each of the terms is comprised of a list of words, t=w1, w2, . . . , wn, and the union of all the words w for all the terms t is a word set W;
forming at a clustering component of the computer a multitude of clusters of words from the words in W;
using at a mapping determining component of the computer the set of known terms T and the given set of types Γ to determine a set of pattern-to-type mappings {p1→γ1, p2→γ2, . . . }, each of the pattern-to-type mappings p→γ mapping an associated sequence of the clusters of words to one or more of the given set of types {γ1, γ2, . . . }; and
using at a term recognition component of the computer the determined set of known pattern-to-type mappings to recognize corpus terms in the specified corpus, each of the corpus terms being comprised of two or more words; and
for each of the recognized corpus terms in the specified corpus, using at a word mapping component of the computer a specified context in the corpus of at least a plurality of the words of said each corpus term to map said plurality of words to one of the sequences of the clusters of words, and using, at a type recognition component of the computer system, one of the determined pattern-to-type mappings to map said one of the sequences of the clusters of words to one or more types γ of the given set of types {γ1, γ2, . . . } to recognize said one or more of the types γ of the given set of types for said each recognized corpus term to boost performance of term recognition systems based on dictionary lookup while extending coverage of ontologies.
Dependent claims: 17, 18, 19
Specification