IDENTIFYING CONTENT OF INTEREST
Abstract
Methods of identifying content of interest within a corpus are disclosed. The methods may comprise the step of applying a first marker set to the corpus, where the first marker set comprises at least one marker identifying a first type of text. For a first textual unit included in the corpus, the methods may comprise generating a score for the first marker set and comparing the score to a reference score. The score may indicate a number of instances of the at least one marker in the first textual unit.
37 Claims
1. A method of generating a marker set comprising markers that identify a desired type of text, the method comprising:
selecting a seed marker set comprising at least one seed marker;
generating a seed corpus from a first reference corpus, wherein the seed corpus comprises a plurality of textual units, and wherein each of the plurality of textual units included in the seed corpus comprises at least one instance of a seed marker included in the seed marker set;
generating a statistical value describing the seed marker set and the seed corpus; and
generating a revised seed marker set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
14. A system comprising a processor, wherein the processor is programmed to perform the steps of:
selecting a seed marker set comprising at least one seed marker;
generating a seed corpus from a first reference corpus, wherein the seed corpus comprises a plurality of textual units, and wherein each of the plurality of textual units included in the seed corpus comprises at least one instance of a seed marker included in the seed marker set;
generating a statistical value describing the seed marker set and the seed corpus; and
generating a revised seed marker set.
15. A computer readable medium comprising instructions that when executed by a processor, cause the processor to perform the steps of:
selecting a seed marker set comprising at least one seed marker;
generating a seed corpus from a first reference corpus, wherein the seed corpus comprises a plurality of textual units, and wherein each of the plurality of textual units included in the seed corpus comprises at least one instance of a seed marker included in the seed marker set;
generating a statistical value describing the seed marker set and the seed corpus; and
generating a revised seed marker set.
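The four steps recited in claims 1, 14 and 15 can be illustrated with a minimal Python sketch. The claims do not say which statistical value is used or how the revised set is derived from it, so everything specific here is an assumption made for illustration: the hypothetical helpers `tokenize`, `build_seed_corpus`, `lift_by_token` and `revise_marker_set`, the use of single-token markers, the choice of a document-frequency "lift" ratio as the statistic, and the `min_lift` and `min_df` thresholds.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase word tokens; not specified by the claims.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_seed_corpus(reference_corpus, seed_markers):
    # Keep only the textual units that contain at least one seed marker.
    return [u for u in reference_corpus if seed_markers & set(tokenize(u))]


def lift_by_token(seed_corpus, reference_corpus):
    # Assumed statistic: for each token, its document frequency in the seed
    # corpus and the ratio of its seed-corpus rate to its reference-corpus rate.
    seed_df = Counter(t for u in seed_corpus for t in set(tokenize(u)))
    ref_df = Counter(t for u in reference_corpus for t in set(tokenize(u)))
    n_seed, n_ref = len(seed_corpus), len(reference_corpus)
    return {
        t: (df, (df / n_seed) / (ref_df[t] / n_ref))
        for t, df in seed_df.items()
    }


def revise_marker_set(reference_corpus, seed_markers, min_lift=1.5, min_df=2):
    # Assumed revision rule: add tokens that are both frequent in the seed
    # corpus and over-represented there relative to the reference corpus.
    seed_corpus = build_seed_corpus(reference_corpus, seed_markers)
    if not seed_corpus:
        return set(seed_markers)
    revised = set(seed_markers)
    for token, (df, lift) in lift_by_token(seed_corpus, reference_corpus).items():
        if df >= min_df and lift >= min_lift:
            revised.add(token)
    return revised


reference = [
    "The trade was booked off the books.",
    "Keep this off the books please.",
    "Lunch at noon?",
    "The meeting is at noon.",
]
print(revise_marker_set(reference, {"books"}))
```

In this toy reference corpus, "off" co-occurs with the seed marker in both seed-corpus units and nowhere else, so it joins the revised set, while "the" is too common across the reference corpus to clear the lift threshold.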
16. A method of identifying content of interest within a corpus, the method comprising:
applying a first marker set to the corpus, wherein the first marker set comprises at least one marker identifying a first type of text;
for a first textual unit included in the corpus, generating a score for the first marker set, wherein the score indicates a number of instances of the at least one marker in the first textual unit; and
comparing the score to a reference score.
Dependent claims: 17, 18, 19, 20.
21. A system comprising a processor, wherein the processor is programmed to perform the steps of:
applying a first marker set to the corpus, wherein the first marker set comprises at least one marker identifying a first type of text;
for a first textual unit included in the corpus, generating a score for the first marker set, wherein the score indicates a number of instances of the at least one marker in the first textual unit; and
comparing the score to a reference score.
22. A computer readable medium comprising instructions that when executed by a processor, cause the processor to perform the steps of:
applying a first marker set to the corpus, wherein the first marker set comprises at least one marker identifying a first type of text;
for a first textual unit included in the corpus, generating a score for the first marker set, wherein the score indicates a number of instances of the at least one marker in the first textual unit; and
comparing the score to a reference score.
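The scoring steps of claims 16, 21 and 22 can be sketched as follows. The score is taken directly from the claim language (a count of marker instances in a textual unit). The tokenizer, the helper names `score_unit` and `flag_units`, the single-token markers, and the "greater than or equal to" comparison against the reference score are assumptions for illustration only; the claims require a comparison but do not fix its direction or the reference value.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def score_unit(unit, marker_set):
    # Score = number of instances of any marker from the set in the unit.
    return sum(1 for token in tokenize(unit) if token in marker_set)


def flag_units(corpus, marker_set, reference_score):
    # Assumed comparison: a unit is flagged when its score meets or exceeds
    # the reference score.
    flagged = []
    for unit in corpus:
        score = score_unit(unit, marker_set)
        if score >= reference_score:
            flagged.append((unit, score))
    return flagged


corpus = [
    "Delete the email and delete the logs.",
    "See you at lunch.",
]
markers = {"delete", "shred"}
print(flag_units(corpus, markers, reference_score=2))
```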
23. A method of identifying content of interest within a corpus, the method comprising:
identifying a textual unit in the corpus that includes an instance of an anchor marker set;
generating a plurality of scores for the textual unit, wherein each of the plurality of scores indicates a number of instances in the textual unit of one of a plurality of marker sets;
comparing the plurality of scores to a plurality of reference scores;
calculating an offset between the instance of the anchor marker set and an instance of one of the plurality of marker sets; and
determining whether the textual unit comprises content of interest considering the comparing and the offset.
Dependent claims: 24, 25, 26, 27, 28, 29.
30. A system comprising a processor, wherein the processor is programmed to perform the steps of:
identifying a textual unit in the corpus that includes an instance of an anchor marker set;
generating a plurality of scores for the textual unit, wherein each of the plurality of scores indicates a number of instances in the textual unit of one of a plurality of marker sets;
comparing the plurality of scores to a plurality of reference scores;
calculating an offset between the instance of the anchor marker set and an instance of one of the plurality of marker sets; and
determining whether the textual unit comprises content of interest considering the comparing and the offset.
31. A computer readable medium comprising instructions that when executed by a processor, cause the processor to perform the steps of:
identifying a textual unit in the corpus that includes an instance of an anchor marker set;
generating a plurality of scores for the textual unit, wherein each of the plurality of scores indicates a number of instances in the textual unit of one of a plurality of marker sets;
comparing the plurality of scores to a plurality of reference scores;
calculating an offset between the instance of the anchor marker set and an instance of one of the plurality of marker sets; and
determining whether the textual unit comprises content of interest considering the comparing and the offset.
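The anchor-and-offset steps of claims 23, 30 and 31 can be illustrated with the sketch below. The claims leave several points open, so the following are assumptions rather than the claimed method: markers are single tokens, the offset is measured in tokens, each score must meet or exceed its reference score, and the final determination requires both that all scores pass and that at least one marker instance lies within a hypothetical `max_offset` of an anchor instance. The names `positions` and `is_of_interest` are likewise hypothetical.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, marker_set):
    # Token indices at which any marker from the set occurs.
    return [i for i, token in enumerate(tokens) if token in marker_set]


def is_of_interest(unit, anchor_set, marker_sets, reference_scores, max_offset):
    tokens = tokenize(unit)
    anchor_positions = positions(tokens, anchor_set)
    if not anchor_positions:
        return False  # the unit must contain an anchor marker instance

    # One score per marker set: the number of instances in the unit.
    hits = {name: positions(tokens, ms) for name, ms in marker_sets.items()}
    scores_pass = all(
        len(hits[name]) >= reference_scores[name] for name in marker_sets
    )

    # Smallest token distance between any anchor and any marker instance.
    offsets = [
        abs(a - p) for a in anchor_positions for ps in hits.values() for p in ps
    ]
    close_enough = bool(offsets) and min(offsets) <= max_offset

    # Assumed decision rule: both conditions must hold.
    return scores_pass and close_enough


unit = "Please shred the report before the audit starts."
marker_sets = {"destroy": {"shred", "delete"}, "docs": {"report", "file"}}
reference_scores = {"destroy": 1, "docs": 1}
print(is_of_interest(unit, {"audit"}, marker_sets, reference_scores, max_offset=4))
```

Here "report" sits three tokens before "audit", so the unit passes with `max_offset=4` and fails with `max_offset=2`.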
32. A method of evaluating textual content, the method comprising:
identifying instances of a marker in a corpus;
identifying instances of a second marker set in the corpus; and
for an instance of the second marker set that occurs within a predetermined range of an instance of the marker, displaying tokens comprising the instance of the second marker set, tokens comprising the instance of the marker and an intervening token.
Dependent claims: 33, 34, 35.
36. A system comprising a processor, wherein the processor is programmed to perform the steps of:
identifying instances of a marker in a corpus;
identifying instances of a second marker set in the corpus; and
for an instance of the second marker set that occurs within a predetermined range of an instance of the marker, displaying tokens comprising the instance of the second marker set, tokens comprising the instance of the marker and an intervening token.
37. A computer readable medium comprising instructions that when executed by a processor, cause the processor to perform the steps of:
identifying instances of a marker in a corpus;
identifying instances of a second marker set in the corpus; and
for an instance of the second marker set that occurs within a predetermined range of an instance of the marker, displaying tokens comprising the instance of the second marker set, tokens comprising the instance of the marker and an intervening token.
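The display step of claims 32, 36 and 37 resembles a keyword-in-context view, and can be sketched as below. This is an illustration under stated assumptions: the marker and the members of the second marker set are single tokens, the predetermined range is a token distance (the hypothetical `max_range` parameter), and "displaying" is modeled as returning the span of tokens from one instance to the other, inclusive. The helper name `context_windows` is hypothetical.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def context_windows(text, marker, second_marker_set, max_range):
    tokens = tokenize(text)
    marker_positions = [i for i, t in enumerate(tokens) if t == marker]
    second_positions = [i for i, t in enumerate(tokens) if t in second_marker_set]

    windows = []
    for s in second_positions:
        for m in marker_positions:
            # Skip the degenerate case where both positions coincide.
            if s != m and abs(s - m) <= max_range:
                start, end = sorted((s, m))
                # Both instances plus every token between them.
                windows.append(" ".join(tokens[start:end + 1]))
    return windows


text = "The audit team will review the deleted files tomorrow."
for window in context_windows(text, "audit", {"deleted"}, max_range=5):
    print(window)
```

Because the tokenizer lowercases its input, the returned windows are lowercase; a display that preserves the original casing would need to track character offsets instead.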
Specification