Extraction of key sections from texts using automatic indexing techniques

US 5,960,383 A
Filed: 02/25/1997
Issued: 09/28/1999
Est. Priority Date: 02/25/1997
Status: Expired due to Term

Abstract

A document condensation method and apparatus for producing a document synopsis are provided, in which automatic indexing techniques are used to analyze an input document to determine a list of words and phrases characteristic of the subject matter of the document. Sections of the document are compared to the list of characteristic words and phrases to determine which sections are most like the overall document in subject matter. A predetermined number of the sections determined to be most similar in content to the overall document are provided as a condensed version of the whole document.
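The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `condense`, `characteristic_terms`, and `STOP_WORDS` are hypothetical, single non-stop-words stand in for the characteristic words and phrases, and sections are assumed to be paragraphs separated by blank lines.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list stands in for real automatic indexing.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
              "is", "it", "of", "on", "or", "that", "the", "this", "to",
              "was", "were", "with"}


def characteristic_terms(text):
    """Weight each non-stop-word by how often it occurs in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def condense(document, n):
    """Return the n highest-scoring sections, in their original order."""
    # Assumption: a section is a paragraph delimited by blank lines.
    sections = [s.strip() for s in re.split(r"\n\s*\n", document) if s.strip()]
    weights = characteristic_terms(document)

    # Score a section by summing the document-level weight of each word in it.
    # A Counter returns 0 for words it has never seen (here, the stop words).
    scores = []
    for section in sections:
        words = re.findall(r"[a-z]+", section.lower())
        scores.append(sum(weights[w] for w in words))

    # Rank section indices by descending score, keep the first n,
    # then sort the kept indices so the synopsis follows document order.
    ranked = sorted(range(len(sections)), key=lambda i: scores[i], reverse=True)
    return [sections[i] for i in sorted(ranked[:n])]


doc = ("Cats chase mice. Cats sleep.\n\n"
       "The weather was fine.\n\n"
       "Mice fear cats and cats hunt mice.")
print(condense(doc, 2))
```

Because the score is an unnormalized sum, longer sections tend to rank higher in this sketch; whether and how to normalize is a design choice the abstract does not address.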

14 Claims

1. A method for automatically condensing a document containing noun-phrases, to produce a synopsis of the document comprising the steps of:
- automatically extracting from said document a list of noun-phrases appearing in said document;
  
  assigning a weight to each noun-phrase occurring in said document noun-phrase list;
  
  storing said document noun-phrase list including said corresponding weights in a memory;
  
  dividing said document by using user input into a plurality of identifiable document-sections;
  
  comparing words in each one of said plurality of identifiable document-sections with said document noun-phrase list;
  
  providing a count associated with each of said plurality of identifiable document-sections;
  
  ranking each one of said plurality of identifiable document-sections in a descending order by said count;
  
  storing said ranks in said memory;
  
  providing as output, using said ranks, a first n number of identifiable document-sections from said ranks where n is a predetermined number; and
  
  producing a synopsis of said document wherein said number of identified document-sections in the synopsis are in a sequence unchanged from how they existed in the document.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The method of claim 1 wherein said step of extracting includes the step of:
    - ranking each of said noun-phrases in accordance with a frequency with which said each of said noun-phrases occurs in said document.
  - 3. The method of claim 2 wherein said step of extracting includes the step of:
    - ranking each of said noun-phrases in accordance with a frequency with which said each of said noun-phrases occurs in the English language.
  - 4. The method of claim 1 wherein said step of dividing using user input is preceded by the step of:
    - providing, from a user, a definition of a document-section.
  - 5. The method of claim 1 wherein said step of comparing includes the step of:
    - extracting from said each one of said plurality of identifiable document-sections a list of noun-phrases appearing in said document-section;
      
      assigning a weight to each noun-phrase occurring in said document-section noun-phrase list; and
      
      storing said document-section noun phrase list, including said corresponding weights, in said memory.
  - 6. The method of claim 5 wherein said step of providing a count includes the step of:
    - summing the weights of each noun phrase in said document-section noun-phrase list which also occurs in said document noun-phrase list.
  - 7. The method of claim 1 wherein said step of providing a count includes the step of:
    - incrementing a counter each time a noun phrase from said document noun-phrase list appears in said each of said plurality of identifiable document-sections.
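Claims 2 and 3 rank noun-phrases by their frequency in the document and by their frequency in the English language, without stating how the two are combined. The following sketch shows one possible combination, dividing the in-document count by a background rate so that phrases common in the document but rare in general English rise to the top. The function names, the `ENGLISH_FREQ_PER_MILLION` table, and every number in it are invented for illustration and do not come from the patent.

```python
from collections import Counter

# Hypothetical background rates (occurrences per million words of English).
# These values are made up for the example; a real system would load them
# from a reference corpus.
ENGLISH_FREQ_PER_MILLION = {
    "time": 1500.0,
    "document": 120.0,
    "noun phrase": 0.5,
}
DEFAULT_FREQ = 1.0  # assumed rate for phrases missing from the table


def weight_phrases(phrases):
    """Map each phrase to (count in document) / (background English rate)."""
    counts = Counter(phrases)
    return {
        phrase: count / ENGLISH_FREQ_PER_MILLION.get(phrase, DEFAULT_FREQ)
        for phrase, count in counts.items()
    }


def rank_phrases(weights):
    """Order phrases by descending weight, breaking ties alphabetically."""
    return sorted(weights, key=lambda p: (-weights[p], p))


extracted = ["time", "time", "time", "document", "document", "noun phrase"]
weights = weight_phrases(extracted)
print(rank_phrases(weights))
```

Here "time" occurs most often in the document but ranks last, because its assumed background rate is so high.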

8. An apparatus for automatically condensing a document which includes noun-phrases, to produce a synopsis of the document, said apparatus comprising:
- means for automatically extracting from said document a list of noun-phrases appearing in said document;
  
  means for assigning a weight to each noun-phrase occurring in said document noun-phrase list;
  
  means for storing said document noun-phrase list including said corresponding weights in a memory;
  
  means for dividing said document by using user-input, into a plurality of identifiable document-sections;
  
  means for comparing words in each one of said plurality of identifiable document-sections with said document noun-phrase list;
  
  means for providing a count associated with each of said plurality of identifiable document-sections;
  
  means for ranking each one of said plurality of identifiable document-sections in a descending order by said count;
  
  means for storing said ranks in said memory; and
  
  means for providing as output using said ranks, a first n number of identifiable document-sections from said ranks where n is a predetermined number, to produce a document synopsis wherein said n number of identifiable document-sections are in a sequence unchanged from how they existed in the document.
- Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The apparatus of claim 8 further including:
    - means for ranking each of said noun phrase in accordance with a frequency with which said each of said noun-phrases occurs in said document.
  - 10. The apparatus of claim 9 further including:
    - means for ranking each of said noun-phrase in accordance with a frequency with which said each of said noun-phrase occurs in the English language.
  - 11. The apparatus of claim 8 further including:
    - means for providing, from a user, a definition of document-section.
  - 12. The apparatus of claim 8 further including:
    - means for extracting from said each one of said plurality of identifiable document-sections a list of noun-phrase appearing in said document-section;
      
      means for assigning a weight to each noun-phrase occurring in said document-section noun-phrase list; and
      
      means for storing said document-section noun-phrase list including said corresponding weights in said memory.
  - 13. The apparatus of claim 12 further including:
    - means for summing the weights of each noun-phrase in said document-section noun-phrase list which also occurs in said document noun-phrase list.
  - 14. The apparatus of claim 8 further including:
    - means for incrementing a counter each time a noun-phrase from said document noun-phrase list appears in said each of said plurality of identifiable document-sections.
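The claims describe two ways of producing the per-section count: summing weights (claims 6 and 13) and incrementing a counter on each appearance (claims 7 and 14). The sketch below contrasts them under stated assumptions: the function names are hypothetical, the weight-summing variant reads the claim as summing the section-list weights, and phrase matching is a simple case-insensitive whole-word search.

```python
import re


def count_by_weight_sum(section_weights, document_weights):
    """Claims 6 and 13 (one reading): sum the weights of phrases in the
    section's list that also occur in the document's list."""
    return sum(weight for phrase, weight in section_weights.items()
               if phrase in document_weights)


def count_by_occurrence(section_text, document_phrases):
    """Claims 7 and 14: add one for every appearance of a document-list
    phrase in the section text."""
    text = section_text.lower()
    counter = 0
    for phrase in document_phrases:
        # Assumption: an appearance is a case-insensitive whole-word match.
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        counter += len(re.findall(pattern, text))
    return counter


section = "The document lists each noun phrase; every noun phrase has a weight."
print(count_by_weight_sum({"noun phrase": 2.0, "weather": 1.0},
                          {"noun phrase": 5.0, "document": 3.0}))
print(count_by_occurrence(section, ["noun phrase", "document"]))
```

The counter variant needs only the document-level list, while the weight-summing variant depends on first building a weighted noun-phrase list for each section, as in claims 5 and 12.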

Current Assignee: Hewlett Packard Enterprise Development LP (Hewlett-Packard Enterprise Company)
Original Assignee: Digital Equipment Corporation (HP Inc.)
Inventors: Fleischer, Robert John
Primary Examiner(s): Isen, Forester W.
Assistant Examiner(s): EDOUARD, PATRICK NESTOR
Application Number: US08/805,780
Time in Patent Office: 945 Days
Field of Search: 704/1, 704/9, 704/10, 707/500, 707/530, 707/531
US Class Current: 704/9
CPC Class Codes:
G06F 16/345   Summarisation for human users
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/30   Semantic analysis
