Temporal document trainer and method
Abstract
An electronic document sorter is trained to classify documents based on their temporal qualities. The invention can be used in environments such as automated news aggregators, search engines and other electronic systems which compile information having temporal qualities.
63 Citations
27 Claims
1. A method of automatically training an electronic document sorter with a computing system comprising:
a) identifying a first set of electronic documents containing content associated with a set of topics using the computing system;
wherein said content describes one or more events related to one or more topics;
b) analyzing said first set of electronic documents to identify first temporal components for each of said one or more events and one or more topics;
wherein said first temporal components include content other than but not excluding date/time data specified for said first set of electronic documents, including text words and semantic equivalents of such text words;
c) assigning a first set of corresponding temporal scores to said first set of electronic documents based on analyzing said first temporal components, said first set of temporal scores representing a first set of chronological states of said one or more events for said one or more topics described in said first set of documents, wherein said first set of corresponding temporal scores are provided by automated and/or human based input scoring;
d) training an automated document classifier using said temporal components;
wherein the automated document classifier is adapted by such training to classify and assign a second temporal score to an unclassified electronic document containing content associated with the first event related to a first topic based on temporal characteristics analyzed for such unclassified electronic document, including second content other than but not excluding second date/time data including second text words and second semantic equivalents thereof representing a second set of temporal components contained in such unclassified document;
wherein said unclassified document is ordered in time relative to said first set of documents with respect to at least said first event based on said second temporal score.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
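The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented classifier: it swaps in a simple word-averaging scorer, and every name in it (SYNONYMS, TRAINING, temporal_components, train, score, order_in_time) is hypothetical. The synonym table is an assumed stand-in for "semantic equivalents", and the scores in TRAINING are made-up human-assigned values between 0.0 (earliest state of the event) and 1.0 (latest).

```python
from collections import defaultdict

# Assumed stand-in for "semantic equivalents": map variants to one cue word.
SYNONYMS = {
    "halftime": "half",
    "intermission": "half",
    "ended": "final",
    "concluded": "final",
}

# Hypothetical first set of documents about one event, with temporal scores
# supplied by a human reviewer (0.0 = earliest state, 1.0 = latest state).
TRAINING = [
    ("Kickoff is scheduled for tonight", 0.0),
    ("Teams are tied at halftime", 0.5),
    ("The game ended with a home win", 1.0),
]


def temporal_components(text):
    """Split text into lowercase words and normalize known synonyms."""
    words = []
    for raw in text.split():
        tok = raw.strip(".,;:!?\"'()").lower()
        if tok:
            words.append(SYNONYMS.get(tok, tok))
    return words


def train(scored_docs):
    """Learn, for each word, the mean temporal score of documents containing it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, doc_score in scored_docs:
        for word in set(temporal_components(text)):
            totals[word] += doc_score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def score(model, text):
    """Score an unclassified document as the mean score of its known words."""
    known = [model[w] for w in temporal_components(text) if w in model]
    return sum(known) / len(known) if known else None


def order_in_time(model, scored_docs, new_text):
    """Place the new document in the timeline of already scored documents."""
    new_score = score(model, new_text)
    if new_score is None:
        raise ValueError("no known temporal components in document")
    timeline = sorted(scored_docs + [(new_text, new_score)], key=lambda d: d[1])
    return [text for text, _ in timeline]
```

Calling train(TRAINING) and then order_in_time(model, TRAINING, "Score level at intermission") places the new document next to the halftime report even though it carries no date or time stamp, which is the behavior the claim describes. A real system would presumably use a far richer feature set and a proper learning algorithm.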
23. A method of automatically training an electronic document sorter with a computing system comprising:
a) identifying a first set of electronic documents containing content associated with a set of topics using the computing system;
wherein said content describes one or more events related to one or more topics;
b) sorting said first set of electronic documents in temporal order with respect to each of said one or more events and one or more topics to form at least one sorted topic/event document set using the computing system;
wherein said first set of electronic documents are sorted and annotated by a combination of both human and an automated text parsing;
c) analyzing said at least one sorted topic/event document set using the computing system to identify temporal components relating to the topic/event of each electronic document which correlates to a temporal rank of such electronic document with respect to the topic/event;
wherein said temporal components include temporal data including text words other than but not excluding date/time data;
d) training an automated document classifier using the computing system to automatically identify second temporal components in other electronic documents associated with a particular event/topic and which are not part of said sorted topic/event document set;
wherein said second temporal components also include semantic equivalents of said text words;
e) collecting a new set of documents and determining a reputation/trust score based on the origin of the new set of documents;
f) ranking the new set of electronic documents using said automated classifier and said reputation/trust score to determine a temporal rating of each document relative to an event or topic, which temporal rating defines a relative chronological sequence of such document for said event or topic relative to other documents in said new set; and
g) selecting at least some of the documents with a highest temporal rating and outputting them in a form suitable for review by a human reviewer.
Dependent Claims (24, 25, 26, 27)
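Steps e) through g) of claim 23 can be sketched in Python as follows. The claim does not say how the classifier output and the reputation/trust score are combined, so the weighted sum below is purely an assumption, as are the names SOURCE_TRUST, DEFAULT_TRUST, trust_score, and rank_documents, the example hosts, and the trust values. The temporal_score argument is a placeholder for whatever trained classifier is available.

```python
from urllib.parse import urlparse

# Hypothetical trust table keyed by the host a document came from.
SOURCE_TRUST = {
    "wire.example.com": 0.9,
    "blog.example.org": 0.4,
}
DEFAULT_TRUST = 0.5  # assumed fallback for unknown origins


def trust_score(url):
    """Step e): derive a reputation/trust score from the document's origin."""
    host = urlparse(url).hostname or ""
    return SOURCE_TRUST.get(host, DEFAULT_TRUST)


def rank_documents(docs, temporal_score, trust_weight=0.3, top_k=2):
    """Steps f) and g): rate each document, then keep the highest rated ones.

    docs is a list of dicts with "url" and "text" keys; temporal_score is a
    callable returning a value in [0, 1] for a document's text. The weighted
    sum is an assumed combination rule, not one stated in the claim.
    """
    rated = []
    for doc in docs:
        temporal = temporal_score(doc["text"])
        rating = (1 - trust_weight) * temporal + trust_weight * trust_score(doc["url"])
        rated.append((rating, doc))
    rated.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in rated[:top_k]]
```

With this rule, two documents that the classifier rates equally are separated by the trust assigned to their origin, and the top_k results are what would be handed to a human reviewer. Setting trust_weight to 0 reduces the ranking to the classifier output alone.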
Specification