SYSTEMS AND METHODS FOR INFORMATION EXTRACTION USING CONTEXTUAL PATTERN DISCOVERY
Abstract
Described herein are methods, systems, apparatuses and products for automatically discovering patterns in a text corpus. An aspect provides extracting at least one context string related to at least one annotator from the at least one text corpus; analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence; determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and grouping the at least one sequence signature into at least one group.
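The four steps named in the abstract (extract, analyze, determine a signature, group) can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say what the annotator is, how wide a context string is, or what the "applicable rules" are. The regex annotator, the three-token window, the normalization rules, and every function name below are assumptions chosen only to make the flow concrete.

```python
import re
from collections import defaultdict

# Hypothetical annotator: marks tokens that look like a phone number.
ANNOTATOR = re.compile(r"\d{3}-\d{4}")


def extract_context_strings(corpus, window=3):
    """Collect up to `window` tokens preceding each annotated token (assumed context)."""
    contexts = []
    for document in corpus:
        tokens = document.split()
        for i, token in enumerate(tokens):
            if ANNOTATOR.fullmatch(token):
                contexts.append(tuple(tokens[max(0, i - window):i]))
    return contexts


def signature(sequence):
    """Apply assumed rules to each subsequence (here, each token) of a sequence:
    strip surrounding punctuation, lowercase, and replace digit runs with <num>."""
    normalized = []
    for token in sequence:
        token = token.strip(".,:;!?").lower()
        normalized.append("<num>" if token.isdigit() else token)
    return tuple(normalized)


def discover_patterns(corpus, window=3):
    """Group the extracted sequences by their signature."""
    groups = defaultdict(list)
    for sequence in extract_context_strings(corpus, window):
        groups[signature(sequence)].append(sequence)
    return dict(groups)


if __name__ == "__main__":
    corpus = [
        "Please call me at 555-1234 today",
        "You can CALL me at: 555-9876",
    ]
    for sig, members in discover_patterns(corpus).items():
        print(sig, members)
```

In this sketch both example sentences yield context strings that differ in case and punctuation but share one signature, so they fall into the same group. That shared signature is the kind of candidate pattern the grouping step is meant to surface.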
20 Claims
1. A system comprising:
at least one processor; and
a memory device operatively connected to the at least one processor;
wherein, responsive to execution of program instructions accessible to the at least one processor and configured to automatically discover at least one pattern in at least one text corpus, the at least one processor is configured to:
extract at least one context string related to at least one annotator from the at least one text corpus;
analyze the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence;
determine at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and
group the at least one sequence signature into at least one group corresponding to at least one sequence signature.
11. A method comprising:
automatically discovering at least one pattern in at least one text corpus, wherein discovering at least one pattern comprises:
extracting at least one context string related to at least one annotator from the at least one text corpus;
analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence;
determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and
grouping the at least one sequence into at least one group corresponding to at least one sequence signature.
20. A computer program product comprising:
a computer readable storage medium having computer readable program code configured to automatically discover at least one pattern in at least one text corpus embodied therewith, the computer readable program code comprising:
computer readable program code configured to extract at least one context string related to at least one annotator from the at least one text corpus;
computer readable program code configured to analyze the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence;
computer readable program code configured to determine at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and
computer readable program code configured to group the at least one sequence into at least one group corresponding to at least one sequence signature.
Specification