Concept synonym matching engine
Abstract
A concept synonym matching engine identifies and extracts specific information referenced in a selection of text and matches the information to a set of defined concepts. The engine is able to perform this identification and matching in the presence of errors (e.g., misspellings, etc.) or variations in the description of those concepts (e.g., use of different terms to define the same idea). The engine performs these functions by tokenizing an input string of text and matching these tokens to tokens of a pattern that represents a concept to be matched. The engine matches the pattern to a sub-string of text, scores the match, and uses the score to determine whether the concept is present in the input string or to select the optimal match.
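The flow the abstract describes (tokenize, match tokens, score, decide) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the use of `difflib.SequenceMatcher` for misspelling-tolerant token comparison, the mean-based score, and both thresholds are assumptions chosen for illustration, and the sketch ignores token order and other constraints.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Divide a string into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_similarity(a, b):
    """Return a similarity in [0, 1]; identical strings score 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def score_pattern(input_tokens, pattern_tokens, min_token_score=0.8):
    """Score one pattern against the input tokens.

    Each pattern token is paired with its most similar input token. The
    score is the mean of those similarities, or 0.0 when any pattern token
    has no input token at or above min_token_score.
    """
    if not input_tokens or not pattern_tokens:
        return 0.0
    best_scores = []
    for p in pattern_tokens:
        best = max(token_similarity(p, t) for t in input_tokens)
        if best < min_token_score:
            return 0.0
        best_scores.append(best)
    return sum(best_scores) / len(best_scores)


def find_concepts(text, concept_patterns, threshold=0.85):
    """Return {concept: best score} for concepts scoring at or above threshold."""
    tokens = tokenize(text)
    found = {}
    for concept, patterns in concept_patterns.items():
        best = max((score_pattern(tokens, tokenize(p)) for p in patterns), default=0.0)
        if best >= threshold:
            found[concept] = best
    return found


# Hypothetical knowledge base: each concept lists synonym patterns.
CONCEPTS = {
    "software engineer": ["software engineer", "software developer"],
    "registered nurse": ["registered nurse"],
}

# The misspelled input still resolves to the "software engineer" concept.
print(find_concepts("Experienced sofware enginer with Java skills", CONCEPTS))
```

Listing several patterns per concept is one simple way to model variations in how a concept is described, while the fuzzy token comparison absorbs spelling errors.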
50 Claims
1. A computer program product having a computer-readable storage medium storing computer program instructions that identify a parent concept referenced in an input string of text by determining the presence of a child concept in the input string, the computer program instructions comprising instructions that when executed by a processor cause the processor to perform the steps of:
storing a plurality of concept hierarchies, each concept hierarchy comprising a plurality of concepts in a knowledge base, including parent concepts and child concepts, wherein each child concept inherits characteristics of a parent concept, wherein each concept is represented in the knowledge base by a pattern which comprises one or more pattern tokens, wherein at least some of the patterns representing concepts comprise multiple pattern tokens that are associated with each other by a set of constraints;
dividing the input string into input tokens that represent sub-strings of text within the input string;
identifying at least one token match between any of the input tokens and any of the pattern tokens representing a child concept in the concept hierarchies;
identifying at least one pattern match between sub-strings of the input string that are comprised of more than one of the matched input tokens and the pattern representing the child concept based on the token match and the set of constraints for the pattern;
scoring the at least one pattern match based on the corresponding token match to provide at least one match score; and
determining which of the child concepts are present in the input string based on the corresponding match score, wherein a plurality of child concepts, contained in a plurality of concept hierarchies, is present in the input string, the presence of the child concepts in the input string identifying that at least one parent concept is referenced in the input string of text.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
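The hierarchy element of claim 1 (child concepts inherit characteristics of a parent, and the presence of children identifies the parent as referenced) can be pictured with a small data-structure sketch. The `Concept` class, its fields, and the example concepts are hypothetical; the claim does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None
    characteristics: dict = field(default_factory=dict)

    def inherited(self):
        """Return characteristics merged from the root down; a child's own values win."""
        merged = self.parent.inherited() if self.parent else {}
        merged.update(self.characteristics)
        return merged


def referenced_parents(present_children):
    """List the parent concepts implied by the matched child concepts, without repeats."""
    parents = []
    for child in present_children:
        if child.parent is not None and child.parent.name not in parents:
            parents.append(child.parent.name)
    return parents


# Two hypothetical hierarchies: one for skills, one for occupations.
language = Concept("programming language", characteristics={"category": "skill"})
java = Concept("java", parent=language, characteristics={"paradigm": "object-oriented"})
engineer = Concept("engineer", characteristics={"category": "occupation"})
software_engineer = Concept("software engineer", parent=engineer)

# Suppose the matching steps found these child concepts in the input string.
present = [java, software_engineer]
print(referenced_parents(present))
print(java.inherited())
```

Here the two matched children come from different hierarchies, mirroring the claim language about a plurality of child concepts contained in a plurality of concept hierarchies.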
16. A computer-implemented method of identifying a parent concept referenced in an input string of text by determining the presence of a child concept in the input string, the method comprising:
storing a plurality of concept hierarchies, each concept hierarchy comprising a plurality of concepts in a knowledge base, including parent concepts and child concepts, wherein each child concept inherits characteristics of a parent concept, wherein each concept is represented in the knowledge base by a pattern which comprises one or more pattern tokens, wherein at least some of the patterns representing concepts comprise multiple pattern tokens that are associated with each other by a set of constraints;
dividing the input string into input tokens that represent sub-strings of text within the input string;
identifying at least one token match between any of the input tokens and any of the pattern tokens representing a child concept in the concept hierarchies;
identifying at least one pattern match between sub-strings of the input string that are comprised of more than one of the matched input tokens and the pattern representing the child concept based on the token match and the set of constraints for the pattern;
scoring the at least one pattern match based on the corresponding token match to provide at least one match score; and
determining which of the child concepts are present in the input string based on the corresponding match score, wherein a plurality of child concepts, contained in a plurality of concept hierarchies, is present in the input string, the presence of the child concepts in the input string identifying that at least one parent concept is referenced in the input string of text.
View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31)
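The step that builds pattern matches from token matches "and the set of constraints for the pattern" can be sketched as follows. The claims do not say what the constraints are, so the two used here (pattern tokens must appear in order, and within a maximum span of input tokens) are assumptions for illustration, as are the function names and the exact-equality token test.

```python
from itertools import product


def token_matches(input_tokens, pattern_tokens):
    """For each pattern token, list the input positions where it occurs exactly."""
    return [[i for i, t in enumerate(input_tokens) if t == p] for p in pattern_tokens]


def pattern_matches(input_tokens, pattern_tokens, in_order=True, max_span=None):
    """Return tuples of input positions (one per pattern token) that satisfy the constraints."""
    if not pattern_tokens:
        return []
    results = []
    for combo in product(*token_matches(input_tokens, pattern_tokens)):
        if len(set(combo)) != len(combo):
            continue  # one input token cannot serve two pattern tokens
        if in_order and list(combo) != sorted(combo):
            continue  # assumed ordering constraint
        if max_span is not None and max(combo) - min(combo) + 1 > max_span:
            continue  # assumed proximity constraint
        results.append(combo)
    return results


tokens = ["project", "manager", "for", "software", "project"]
print(pattern_matches(tokens, ["project", "manager"]))
print(pattern_matches(tokens, ["project", "manager"], in_order=False))
```

Enumerating every combination is exponential in the worst case; it keeps the sketch short, but a practical engine would likely prune candidates as it goes.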
32. A computer system for identifying a parent concept referenced in an input string of text by determining the presence of a child concept in the input string, the system comprising:
a computer-readable storage medium configured to store a plurality of concept hierarchies, each concept hierarchy comprising a plurality of concepts in a knowledge base, including parent concepts and child concepts, wherein each child concept inherits characteristics of a parent concept, wherein each concept is represented in the knowledge base by a pattern which comprises one or more pattern tokens, wherein at least some of the patterns representing concepts comprise multiple pattern tokens that are associated with each other by a set of constraints;
a tokenization module, stored on the computer-readable storage medium, configured to divide the input string into input tokens that represent sub-strings of text within the input string;
a token matching module, stored on the computer-readable storage medium, configured to identify at least one token match between any of the input tokens and any of the pattern tokens representing a child concept in the concept hierarchies;
a pattern matching module, stored on the computer-readable storage medium, configured to identify at least one pattern match between sub-strings of the input string that are comprised of more than one of the matched input tokens and the pattern representing the child concept based on the token match and the set of constraints for the pattern;
a pattern scoring module, stored on the computer-readable storage medium, configured to score the at least one pattern match based on the corresponding token match to provide at least one match score; and
a pattern selection module, stored on the computer-readable storage medium, configured to determine which of the child concepts are present in the input string based on the corresponding match score, wherein a plurality of child concepts, contained in a plurality of concept hierarchies, is present in the input string, the presence of the child concepts in the input string identifying that at least one parent concept is referenced in the input string of text.
View Dependent Claims (33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45)
46. A computer-implemented method of matching a parent concept to one or more sub-strings in an input string of text by determining the presence of a child concept in the input string, the method comprising:
storing a plurality of concept hierarchies, each concept hierarchy comprising a plurality of concepts in a knowledge base, including parent concepts and child concepts, where each child concept inherits characteristics of a parent concept, wherein each concept is represented in the knowledge base by a pattern formed of basic patterns, each of which comprises one or more pattern tokens, wherein at least some of the patterns representing concepts comprise multiple pattern tokens that are associated with each other by a set of constraints;
dividing the input string into input tokens that represent sub-strings of text within the input string;
identifying at least one token match between any of the input tokens and any of the pattern tokens representing a child concept in the concept hierarchies;
identifying at least one pattern match between sub-strings of the input string that are comprised of more than one of the matched input tokens and the pattern representing the child concept based on the token match and the set of constraints for the pattern;
scoring the at least one pattern match based on the corresponding token match to provide at least one match score by assigning each of the basic patterns a weight that together equals a total weight for the pattern; and
selecting the at least one pattern match with the total weight that is highest and where the at least one pattern match does not overlap any other pattern matches for the input string, the at least one pattern match indicating that the child concept is present in the input string, wherein a plurality of child concepts, contained in a plurality of concept hierarchies, is present in the input string, the presence of the child concepts in the input string identifying that at least one parent concept is referenced in the input string of text.
View Dependent Claims (47, 48, 49, 50)
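Claim 46 adds two ideas: a pattern's total weight is the sum of the weights of its basic patterns, and the selected matches are the highest-weighted ones that do not overlap. The sketch below is one possible reading, assuming a greedy highest-weight-first selection over token index ranges; the `PatternMatch` type, its fields, and the example weights are hypothetical.

```python
from typing import NamedTuple


class PatternMatch(NamedTuple):
    concept: str
    start: int            # first input-token index covered (inclusive)
    end: int              # last input-token index covered (inclusive)
    basic_weights: tuple  # weight of each matched basic pattern

    @property
    def total_weight(self):
        """Total weight of the match: the sum of its basic-pattern weights."""
        return sum(self.basic_weights)


def overlaps(a, b):
    """True when two matches share at least one input-token index."""
    return a.start <= b.end and b.start <= a.end


def select_matches(candidates):
    """Greedily keep the heaviest matches that do not overlap an already kept match."""
    chosen = []
    for m in sorted(candidates, key=lambda m: m.total_weight, reverse=True):
        if not any(overlaps(m, c) for c in chosen):
            chosen.append(m)
    return sorted(chosen, key=lambda m: m.start)


candidates = [
    PatternMatch("manager", 1, 1, (0.5,)),
    PatternMatch("project manager", 0, 1, (0.5, 0.5)),
    PatternMatch("java", 3, 3, (0.75,)),
]
for m in select_matches(candidates):
    print(m.concept, m.total_weight)
```

In this example "manager" is dropped because it overlaps the heavier "project manager" match, while "java" survives because it covers a different part of the input.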
Specification