Topic discovery, summary generation, automatic tagging, and search indexing for segments of a document
First Claim
1. A method for extracting and presenting information for document segments, comprising:
obtaining a document, wherein the document is in the form of a communication thread, wherein the document comprises multiple segments, wherein at least two of the segments are created by different persons, wherein the multiple segments are chronologically organized based on the time each segment is added to the document;
identifying the first chronological segment in the document, wherein the first chronological segment contains the content that is most recently added to the document by a specific person in the communication thread, wherein the content of the first chronological segment is a response to the content of a different segment in the document created by a different person at a different time;
dividing the document into a first part and a second part, wherein the first part comprises the first chronological segment, wherein the second part comprises the portion of the document that is not the first chronological segment;
extracting the content of the first part for analysis;
identifying one or more terms in the content of the first part, wherein a term can be a word or phrase or sentence or paragraph;
selecting the one or more terms; and
storing or displaying the selected terms to represent a topic or summary of the document or a chronologically newer content in the document.
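The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Segment` record, the `split_thread` and `select_terms` helpers, the small stop-word list, and the frequency-based term selection are all assumptions made for illustration. The sketch also does not verify that the newest segment is a response to a segment written by a different person.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Segment:
    """Hypothetical record for one segment of a communication thread."""
    author: str
    added_at: datetime
    text: str


# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "we", "can", "is", "to", "of"}


def split_thread(segments):
    """Divide a thread into the most recently added segment and the rest."""
    newest = max(segments, key=lambda s: s.added_at)
    rest = [s for s in segments if s is not newest]
    return newest, rest


def select_terms(text, k=3):
    """Pick the k most frequent non-stop-words (an assumed selection rule)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]


thread = [
    Segment("alice", datetime(2024, 1, 1, 9), "Can we move the launch date?"),
    Segment("bob", datetime(2024, 1, 2, 9),
            "The budget review blocks the launch. "
            "Budget numbers arrive Friday, budget first."),
]

first_part, second_part = split_thread(thread)
topic_terms = select_terms(first_part.text, k=1)
print(topic_terms)
```

Here only the text of the newest segment is analyzed, so the terms describe what was added last rather than what is quoted from earlier messages.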
Abstract
Systems and methods are disclosed for discovering topics in sub-segments of documents, extracting terms from a sub-segment that represent topics or summaries of the sub-segment, and displaying such terms in connection with the sub-segment or with the document. The terms can also function as automatically generated tags or labels for the segments or for the documents. Methods are also disclosed for building search indexes based on specific sub-segments of documents, such that users can search for content in a specific segment of a document. One embodiment of such a search index is for emails, blogs, and forum articles, which typically contain segmented content added at different times or by different authors in a format known as a thread. Searching in a specific segment, such as the most recently added segment, can help quickly find the most relevant information without repeating the same information from other segments in the thread.
20 Claims
1. A method for extracting and presenting information for document segments, comprising:
obtaining a document, wherein the document is in the form of a communication thread, wherein the document comprises multiple segments, wherein at least two of the segments are created by different persons, wherein the multiple segments are chronologically organized based on the time each segment is added to the document;
identifying the first chronological segment in the document, wherein the first chronological segment contains the content that is most recently added to the document by a specific person in the communication thread, wherein the content of the first chronological segment is a response to the content of a different segment in the document created by a different person at a different time;
dividing the document into a first part and a second part, wherein the first part comprises the first chronological segment, wherein the second part comprises the portion of the document that is not the first chronological segment;
extracting the content of the first part for analysis;
identifying one or more terms in the content of the first part, wherein a term can be a word or phrase or sentence or paragraph;
selecting the one or more terms; and
storing or displaying the selected terms to represent a topic or summary of the document or a chronologically newer content in the document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer system for discovering and presenting information in document segments, comprising:
a computer processor configured to receive a portion of content extracted from a document segment, and to enable the display of the portion of the content in a display area, wherein the portion of the content is displayed as a tag text or label text of the document segment, wherein the tag or label text represents a topic or a summary of the document or a chronologically newer segment in the document, wherein the document contains multiple text segments, wherein at least two of the text segments are created by different persons, wherein the multiple segments are chronologically organized based on the time each segment is added to the document, wherein the type of document includes an email or forum discussion or blog or other documents in the format of a communication thread, wherein the portion of the content is identified and extracted by:
(a) identifying the first chronological segment in the document, wherein the first chronological segment contains the content that is most recently added to the document by a specific person in the communication thread, wherein the content of the first chronological segment is a response to the content of a different segment in the document created by a different person at a different time,
(b) dividing the document into a first part and a second part, wherein the first part comprises the first chronological segment, wherein the second part comprises the portion of the document that is not the first chronological segment,
(c) extracting the content of the first part for analysis,
(d) identifying a first term in the content of the first part, wherein the first term includes a word or phrase,
(e) selecting the first term or a sentence or paragraph containing the first term, and
(f) extracting the first term or the sentence or paragraph as the portion of the content.
Dependent claims: 13, 14, 15, 16, 17, 18
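Steps (d) through (f) of claim 12 can be sketched in Python as choosing either a term or the sentence that contains it as the label text. The helper name `label_for` and the regex-based sentence splitting are assumptions for illustration only; the claim does not specify how sentences are detected.

```python
import re


def label_for(segment_text, term):
    """Return the first sentence containing `term`, or the term itself.

    Sentence boundaries are approximated by whitespace that follows
    '.', '!' or '?', which is a simplification.
    """
    sentences = re.split(r"(?<=[.!?])\s+", segment_text.strip())
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for sentence in sentences:
        if pattern.search(sentence):
            return sentence
    return term  # fall back to the bare term as the tag text


newest_text = ("Thanks for the update. The budget review blocks the launch. "
               "See you Friday.")
print(label_for(newest_text, "budget"))
```

A display layer could then show the returned string as the tag or label next to the thread.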
19. A computer-assisted method for indexing and searching information in document segments, comprising:
receiving a plurality of documents, wherein each of the documents is in the form of a communication thread, comprising multiple segments, wherein at least two of the segments are created by different persons, wherein the multiple segments are chronologically organized based on the time each segment is added to the document;
for each of the plurality of documents, identifying the first chronological segment in the document, wherein the first chronological segment contains the content that is most recently added to the document by a specific person in the communication thread, wherein the content of the first chronological segment is a response to the content of a different segment in the document created by a different person at a different time;
dividing the document into a first part and a second part, wherein the first part comprises the first chronological segment, wherein the second part comprises the portion of the document that is not the first chronological segment; and
building a search index based on the terms in the first part in one or more of the plurality of documents.
Dependent claims: 20
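The indexing method of claim 19 can be sketched as an inverted index built only from the most recently added segment of each thread. Everything in this Python sketch is an assumption for illustration: the `(order, author, text)` tuple layout, the integer ordering values standing in for timestamps, and the `build_index` and `search` helpers.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(threads):
    """Map each term in a thread's newest segment to the thread ids.

    `threads` maps a document id to a list of (order, author, text)
    tuples, where a larger `order` means the segment was added later.
    """
    index = defaultdict(set)
    for doc_id, segments in threads.items():
        _, _, newest_text = max(segments, key=lambda seg: seg[0])
        for term in set(tokenize(newest_text)):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of threads whose newest segment has every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


threads = {
    "t1": [(1, "alice", "Invoice attached"), (2, "bob", "Thanks, approved")],
    "t2": [(1, "carol", "Status?"), (2, "dave", "Invoice still pending")],
}
index = build_index(threads)
print(search(index, "invoice"))
```

In this example the query matches only `t2`: the word "Invoice" in `t1` sits in an older segment, so it is not indexed, which is the behavior the claim describes for avoiding repeated information from earlier parts of a thread.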
Specification