Mitigation of conflicts between content matchers in automated document analysis
First Claim
1. A method for performing, by at least one processing device implementing a plurality of content matchers that each identify occurrences of respectively corresponding content types, automated document analysis of a document comprising a body of text, the method comprising:
- executing, by the at least one processing device, each content matcher of the plurality of content matchers to identify, for each content matcher, at least one match in the body of the text and assigning a match strength to each of the at least one match, where each of the at least one match is an occurrence of a content type corresponding to the content matcher of the plurality of content matchers that identified the match;
- identifying, by the at least one processing device, a conflict in content types between a first match assigned by a first content matcher of the plurality of content matchers and a second match assigned by a second content matcher of the plurality of content matchers, the first match having a first match strength and the second match having a second match strength;
- determining, by the at least one processing device, whether either of the first match strength or the second match strength is greater than the other;
- when one of the first match strength and the second match strength is greater than the other, discarding, by the at least one processing device, the match of the first and second matches corresponding to the lesser of the first and second match strengths;
- re-executing, based on the receipt of new information, the content matcher of the discarded match to re-evaluate at least the portion of the body of the text corresponding to the first and second matches, wherein the new information includes at least information of a different content type; and
- identifying, based on the re-evaluation of the portion of the body of the text, a new conflict between the first and second matches.
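The execute, detect, and discard steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `Match`, `run_matchers`, `conflicts`, `weaker_of`, and `resolve` are hypothetical names; a conflict is assumed to mean two overlapping character spans labelled with different content types; and conflicting pairs are assumed to be checked greedily in list order. The re-execution step is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Match:
    start: int          # inclusive character offset into the body of text
    end: int            # exclusive character offset
    content_type: str   # content type the matcher recognises (hypothetical labels)
    strength: float     # match strength assigned by the matcher
    matcher: str        # name of the content matcher that identified the match


# A content matcher is modelled (by assumption) as a callable from text to matches.
Matcher = Callable[[str], List[Match]]


def run_matchers(matchers: List[Matcher], text: str) -> List[Match]:
    """Execute every content matcher over the text and pool their matches."""
    found: List[Match] = []
    for matcher in matchers:
        found.extend(matcher(text))
    return found


def conflicts(a: Match, b: Match) -> bool:
    """Assumed conflict test: overlapping spans labelled with different content types."""
    overlap = a.start < b.end and b.start < a.end
    return overlap and a.content_type != b.content_type


def weaker_of(a: Match, b: Match) -> Optional[Match]:
    """Return the match with the lesser strength, or None when the strengths are equal."""
    if a.strength > b.strength:
        return b
    if b.strength > a.strength:
        return a
    return None


def resolve(matches: List[Match]) -> List[Match]:
    """Discard the weaker match of each conflicting pair whose strengths differ.

    Pairs with equal strengths are left untouched; the claim only prescribes
    a discard when one strength is greater than the other.
    """
    discarded = set()
    for i, a in enumerate(matches):
        for b in matches[i + 1:]:
            if a in discarded or b in discarded or not conflicts(a, b):
                continue
            loser = weaker_of(a, b)
            if loser is not None:
                discarded.add(loser)
    return [m for m in matches if m not in discarded]


# Usage with two stand-in matchers that return fixed, hypothetical matches.
TEXT = "Payment due 12 May 2020 to May Smith."


def date_matcher(text: str) -> List[Match]:
    return [Match(12, 23, "date", 0.9, "DateMatcher")]        # "12 May 2020"


def name_matcher(text: str) -> List[Match]:
    return [Match(15, 18, "person", 0.4, "NameMatcher"),      # "May" inside the date
            Match(27, 36, "person", 0.8, "NameMatcher")]      # "May Smith"


kept = resolve(run_matchers([date_matcher, name_matcher], TEXT))
```

Here the weak "May" person match overlaps the stronger date match and is discarded, while "May Smith" does not overlap anything and survives.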
Abstract
Each of a plurality of content matchers is executed upon a body of text in a document, identifying at least one match in the text and additionally assigning a match strength to each match. Where a conflict between a first match (having a first match strength associated therewith) and a second match (having a second match strength associated therewith) is noted, it is determined whether either of the first or the second match strength is greater than the other. If so, the match of the first and second matches corresponding to the lesser of the first and second match strengths is discarded. If the first and second match strengths are equal, then the respective matcher ranks of the first matcher and the second matcher are compared, and the match of the first and second matches corresponding to the lesser of the first and second matcher ranks is discarded.
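The abstract's two-stage tie-break (strength first, then matcher rank) can be expressed as a single decision function. The sketch below is hypothetical: the field names, the `match_to_discard` helper, and the convention that a larger number means a higher rank are assumptions, and returning `None` when both strength and rank tie is a choice the abstract does not address.

```python
from typing import Dict, NamedTuple, Optional


class Match(NamedTuple):
    content_type: str
    strength: float
    matcher: str    # name of the content matcher that identified the match


def match_to_discard(a: Match, b: Match, rank: Dict[str, int]) -> Optional[Match]:
    """Choose which of two conflicting matches to discard.

    1. Unequal strengths: discard the match with the lesser strength.
    2. Equal strengths: discard the match whose matcher has the lesser rank.
    3. Equal ranks too: return None (an assumption; the abstract is silent here).
    """
    if a.strength != b.strength:
        return a if a.strength < b.strength else b
    rank_a, rank_b = rank[a.matcher], rank[b.matcher]
    if rank_a != rank_b:
        return a if rank_a < rank_b else b
    return None


# Usage: equal strengths, so the lower-ranked NameMatcher loses the tie.
RANK = {"DateMatcher": 2, "NameMatcher": 1}
loser = match_to_discard(Match("date", 0.5, "DateMatcher"),
                         Match("person", 0.5, "NameMatcher"), RANK)
```

Rank is consulted only when strength cannot decide, which mirrors the order of the two comparisons described in the abstract.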
9 Claims
1. A method for performing, by at least one processing device implementing a plurality of content matchers that each identify occurrences of respectively corresponding content types, automated document analysis of a document comprising a body of text, the method comprising:
executing, by the at least one processing device, each content matcher of the plurality of content matchers to identify, for each content matcher, at least one match in the body of the text and assigning a match strength to each of the at least one match, where each of the at least one match is an occurrence of a content type corresponding to the content matcher of the plurality of content matchers that identified the match; identifying, by the at least one processing device, a conflict in content types between a first match assigned by a first content matcher of the plurality of content matchers and a second match assigned by a second content matcher of the plurality of content matchers, the first match having a first match strength and the second match having a second match strength; determining, by the at least one processing device, whether either of the first match strength or the second match strength is greater than the other; when one of the first match strength and the second match strength is greater than the other, discarding, by the at least one processing device, the match of the first and second matches corresponding to the lesser of the first and second match strengths; re-executing, based on the receipt of new information, the content matcher of the discarded match to re-evaluate at least the portion of the body of the text corresponding to the first and second matches, wherein the new information includes at least information of a different content type; and identifying, based on the re-evaluation of the portion of the body of the text, a new conflict between the first and second matches.
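The final two steps of claim 1, re-execution on receipt of new information and identification of a new conflict, might look like the following sketch. Everything here is an assumption for illustration: the `known` dictionary standing in for the new information, the hypothetical `name_matcher` that boosts its strength when a word appears under a `defined_party` content type, and the `reexecute` helper that re-runs only the discarded match's matcher over the span covered by both matches.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Match:
    start: int
    end: int
    content_type: str
    strength: float
    matcher: str


# New information: extra facts keyed by content type (hypothetical structure).
Known = Dict[str, Set[str]]
# A re-runnable matcher evaluates text[start:end] in light of the known facts.
SpanMatcher = Callable[[str, int, int, Known], List[Match]]


def name_matcher(text: str, start: int, end: int, known: Known) -> List[Match]:
    """Hypothetical person-name matcher restricted to text[start:end].

    A capitalised word scores 0.4 by default, but 0.95 when it appears in
    known["defined_party"], i.e. information of a different content type.
    """
    results = []
    for word in re.finditer(r"[A-Z][a-z]+", text[start:end]):
        strength = 0.95 if word.group() in known.get("defined_party", set()) else 0.4
        results.append(Match(start + word.start(), start + word.end(),
                             "person", strength, "NameMatcher"))
    return results


def reexecute(text: str, kept: Match, discarded: Match,
              matchers: Dict[str, SpanMatcher], known: Known) -> List[Tuple[Match, Match]]:
    """Re-run the discarded match's matcher over the span covered by both
    matches and return each (kept, new) pair that conflicts again."""
    lo = min(kept.start, discarded.start)
    hi = max(kept.end, discarded.end)
    rerun = matchers[discarded.matcher](text, lo, hi, known)
    return [(kept, m) for m in rerun
            if m.start < kept.end and kept.start < m.end      # spans overlap
            and m.content_type != kept.content_type]          # types differ


# Usage: "May" was discarded in favour of the date; a later step learns that
# "May" is a defined party to the document.
TEXT = "Payment due 12 May 2020."
DATE = Match(12, 23, "date", 0.9, "DateMatcher")
NAME = Match(15, 18, "person", 0.4, "NameMatcher")
MATCHERS = {"NameMatcher": name_matcher}
new_conflicts = reexecute(TEXT, DATE, NAME, MATCHERS, {"defined_party": {"May"}})
```

With the new information, the re-executed matcher reports "May" at strength 0.95, so the same span conflicts again; re-applying the strength comparison would now discard the date match instead.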
3. The method of claim 1, wherein each content matcher of the plurality of content matchers has a corresponding matcher rank, and wherein executing each content matcher of the plurality of content matchers further comprises executing each content matcher in an order determined from highest to lowest of the corresponding matcher ranks.
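Claim 3's ordering can be sketched as a sort on matcher rank before execution. `RankedMatcher` and `execute_in_rank_order` are hypothetical names, a larger integer is assumed to denote a higher rank, and matchers with equal ranks are assumed to keep their input order because Python's `sorted` is stable even with `reverse=True`.

```python
from typing import Callable, List, NamedTuple, Tuple


class RankedMatcher(NamedTuple):
    name: str
    rank: int                          # higher rank = executed earlier
    run: Callable[[str], List[str]]    # stand-in matcher returning matched strings


def execute_in_rank_order(matchers: List[RankedMatcher],
                          text: str) -> List[Tuple[str, List[str]]]:
    """Execute matchers from highest to lowest rank, recording each one's output.

    sorted() is stable, so matchers with equal ranks keep their input order;
    the claim does not say how ties are ordered, so this is an assumption.
    """
    ordered = sorted(matchers, key=lambda m: m.rank, reverse=True)
    return [(m.name, m.run(text)) for m in ordered]


# Usage with three stand-in matchers listed out of rank order.
MATCHERS = [
    RankedMatcher("NameMatcher", 1, lambda t: []),
    RankedMatcher("DateMatcher", 3, lambda t: ["12 May 2020"]),
    RankedMatcher("AmountMatcher", 2, lambda t: []),
]
order = [name for name, _ in execute_in_rank_order(MATCHERS, "Payment due 12 May 2020.")]
# order == ["DateMatcher", "AmountMatcher", "NameMatcher"]
```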
4. An apparatus comprising a plurality of content matchers that each identify occurrences of respectively corresponding content types, the apparatus also being configured to perform automated document analysis of a document comprising a body of text, the apparatus comprising:
at least one processing device; and memory, operatively connected to the at least one processing device having stored thereon executable instructions that, when executed by the at least one processing device, cause the at least one processing device to: execute each content matcher of the plurality of content matchers to identify, for each content matcher, at least one match in the body of text and assign a match strength to each of the at least one match, where each of the at least one match is an occurrence of a content type corresponding to the content matcher of the plurality of content matchers that identified the match; identify a conflict in content types between a first match assigned by a first content matcher of the plurality of content matchers and a second match assigned by a second content matcher of the plurality of content matchers, the first match having a first match strength and the second match having a second match strength; determine whether either of the first match strength or the second match strength is greater than the other; when one of the first match strength and the second match strength is greater than the other, discard the match of the first and second matches corresponding to the lesser of the first and second match strengths; re-execute, based on the receipt of new information, the content matcher of the discarded match to re-evaluate at least the portion of the body of the text corresponding to the first and second matches, wherein the new information includes at least information of a different content type; and identify, based on the re-evaluation of the portion of the body of the text, a new conflict between the first and second matches.
6. The apparatus of claim 4, wherein each content matcher of the plurality of content matchers has a corresponding matcher rank, and wherein those instructions that, when executed by the at least one processing device, cause the at least one processing device to execute each content matcher of the plurality of content matchers are further operative to execute each content matcher in an order determined from highest to lowest of the corresponding matcher ranks.
7. A non-transitory computer readable medium comprising executable instructions that, when executed by at least one processing device, cause the at least one processing device to perform automated document analysis of a document comprising a body of text in which the at least one processing device is further caused to:
execute each content matcher of a plurality of content matchers that each identify occurrences of respectively corresponding content types to identify, for each content matcher, at least one match in the body of text and assign a match strength to each of the at least one match, where each of the at least one match is an occurrence of a content type corresponding to the content matcher of the plurality of content matchers that identified the match; identify a conflict in content types between a first match assigned by a first content matcher of the plurality of content matchers and a second match assigned by a second content matcher of the plurality of content matchers, the first match having a first match strength and the second match having a second match strength; determine whether either of the first match strength or the second match strength is greater than the other; when one of the first match strength and the second match strength is greater than the other, discard the match of the first and second matches corresponding to the lesser of the first and second match strengths; re-execute, based on the receipt of new information, the content matcher of the discarded match to re-evaluate at least the portion of the body of the text corresponding to the first and second matches, wherein the new information includes at least information of a different content type; and identify, based on the re-evaluation of the portion of the body of the text, a new conflict between the first and second matches.
9. The non-transitory computer readable medium of claim 7, wherein each content matcher of the plurality of content matchers has a corresponding matcher rank, and wherein those instructions that, when executed by the at least one processing device, cause the at least one processing device to execute each content matcher of the plurality of content matchers are further operative to execute each content matcher in an order determined from highest to lowest of the corresponding matcher ranks.
Specification