Systems and methods for information extraction using contextual pattern discovery
Abstract
Described herein are methods, systems, apparatuses and products for automatically discovering patterns in a text corpus. An aspect provides extracting at least one context string related to at least one annotator from the at least one text corpus; analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence; determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and grouping the at least one sequence signature into at least one group.
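The four steps named in the abstract (extract context strings, analyze them into sequences, determine a signature, group by signature) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the year-matching annotator, the three-token left window, the token classes, and the stopword and punctuation rules are all hypothetical choices made here for the example, and the resulting signature is only a crude stand-in for a semantic sequence signature.

```python
import re
from collections import defaultdict

# Hypothetical annotator (assumption): marks four-digit years as the entity of interest.
YEAR_ANNOTATOR = re.compile(r"\b(?:19|20)\d{2}\b")
# Tokens are runs of word characters or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")
# Hypothetical rule set (assumption): these words are dropped from signatures.
STOPWORDS = {"the", "a", "an", "of"}

CORPUS = [
    "Acme was founded in 1998.",
    "Initech Was Founded In 2004!",
    "Hooli, founded in the 2009 boom.",
    "Pied Piper: founded - in 2014.",
]


def extract_context_strings(corpus, annotator, window=3):
    """Return the `window` tokens immediately to the left of each annotation."""
    contexts = []
    for text in corpus:
        for match in annotator.finditer(text):
            left_tokens = TOKEN.findall(text[:match.start()])
            contexts.append(" ".join(left_tokens[-window:]))
    return contexts


def to_sequence(context):
    """Map each token to a coarse class: NUM, PUNCT, or the lowercased word."""
    sequence = []
    for token in TOKEN.findall(context):
        if token.isdigit():
            sequence.append("NUM")
        elif not re.match(r"\w", token):
            sequence.append("PUNCT")
        else:
            sequence.append(token.lower())
    return sequence


def signature(sequence):
    """Apply the illustrative rules: drop stopwords and punctuation classes."""
    return tuple(s for s in sequence if s not in STOPWORDS and s != "PUNCT")


def group_by_signature(contexts):
    """Group context strings that share the same signature."""
    groups = defaultdict(list)
    for context in contexts:
        groups[signature(to_sequence(context))].append(context)
    return dict(groups)


groups = group_by_signature(extract_context_strings(CORPUS, YEAR_ANNOTATOR))
for sig, members in groups.items():
    print(sig, members)
```

With this sample corpus, "was founded in" and "Was Founded In" share one signature, while "founded in the" and "founded - in" share another, since the assumed rules discard the stopword and the punctuation mark.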
10 Claims
1. A system comprising:
at least one processor; and
a memory device operatively connected to the at least one processor;
wherein, responsive to execution of program instructions accessible to the at least one processor and configured to automatically discover at least one text-based pattern in at least one text corpus, the at least one processor is configured to:
issue a query of the text corpus to extract at least one context string comprising a sequence of text from the text corpus, the sequence of text identified using a positional relationship to at least one text annotator corresponding to a text entity of interest included in text of the at least one text corpus;
analyze the at least one context string to produce at least one sequence representing a text-based pattern of the context string;
determine at least one semantic sequence signature for the context string from the at least one sequence which identifies the context string; and
thereupon use the at least one semantic sequence signature to automatically group semantically similar context strings of the text corpus.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A method comprising:
automatically discovering at least one text-based pattern in at least one text corpus, wherein discovering at least one pattern comprises:
issuing a query of the text corpus to extract at least one context string comprising a sequence of text from the text corpus, the sequence of text identified using a positional relationship to at least one annotator corresponding to a text entity of interest included in text of the at least one text corpus;
analyzing the at least one context string to produce at least one sequence representing a text-based pattern of the context string;
determining at least one semantic sequence signature for the context string from the at least one sequence which identifies the context string; and
thereupon using the at least one semantic sequence signature to automatically group semantically similar context strings of the text corpus.
Dependent claims: 9.
10. A computer program product comprising:
a computer readable storage medium having computer readable program code configured to automatically discover at least one text-based pattern in at least one text corpus embodied therewith, the computer readable program code comprising:
computer readable program code configured to issue a query of the text corpus to extract at least one context string comprising a sequence of text from the text corpus, the sequence of text identified using a positional relationship to at least one annotator corresponding to a text entity of interest included in text of the at least one text corpus;
computer readable program code configured to analyze the at least one context string to produce at least one sequence representing a text-based pattern of the context string;
computer readable program code configured to determine at least one semantic sequence signature for the context string from the at least one sequence which identifies the context string; and
computer readable program code configured to thereupon use the at least one semantic sequence signature to automatically group semantically similar context strings of the text corpus.
Specification