Method of summarizing text by sentence extraction

US 20020174149A1
Filed: 04/26/2002
Published: 11/21/2002
Est. Priority Date: 04/27/2001
Status: Active Grant

First Claim

Patent Images

1. A method of summarizing text, where the text consists of a number of sentences, and where each sentence includes a number of terms, comprising the steps of:

(a) identifying each sentence in the text;

(b) identifying each term in each sentence;

(c) generating a matrix, where each column in the matrix represents a sentence in the text, and where each row in the matrix represents a term in the text;

(d) replacing each entry in the matrix by a product of the matrix entry and a user-definable function that decays exponentially;

(e) determining the Euclidean length of each column by squaring the entries in the corresponding column, summing the squares, and taking the square root of the sum;

(f) selecting the column with a maximum Euclidean length as a summary sentence;

(g) reducing the Euclidean lengths of the columns not selected in step (f); and

(h) returning to step(e) if another summary sentence is desired, otherwise returning the selected summary sentences as the summary of the text.

View all claims

4 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method of summarizing text. The sentences in the text are identified first. Then, the terms in each sentence are identified. A matrix is then generated, where the columns represent the sentences and the rows represent the terms. The entries in the matrix are weighted with an exponentially decaying function or a Hidden Markov Model. The Euclidean length of each column is determined. The sentence corresponding to the column having the maximum Euclidean length is selected as a summary sentence. The columns corresponding to the remaining sentences have their matrix entries reduced. If additional summary sentences are desired then return to the step of determining Euclidean length of the columns.

33 Citations

View as Search Results

8 Claims

1. A method of summarizing text, where the text consists of a number of sentences, and where each sentence includes a number of terms, comprising the steps of:
- (a) identifying each sentence in the text;
  
  (b) identifying each term in each sentence;
  
  (c) generating a matrix, where each column in the matrix represents a sentence in the text, and where each row in the matrix represents a term in the text;
  
  (d) replacing each entry in the matrix by a product of the matrix entry and a user-definable function that decays exponentially;
  
  (e) determining the Euclidean length of each column by squaring the entries in the corresponding column, summing the squares, and taking the square root of the sum;
  
  (f) selecting the column with a maximum Euclidean length as a summary sentence;
  
  (g) reducing the Euclidean lengths of the columns not selected in step (f); and
  
  (h) returning to step(e) if another summary sentence is desired, otherwise returning the selected summary sentences as the summary of the text.
- View Dependent Claims (2, 3, 4, 5)
- - 2. The method of claim 1, wherein said step of identifying each term in each sentence is comprised of the step of identifying each term in each sentence after stop-words are eliminated and any term ambiguity is resolved.
  - 3. The method of claim 1, wherein said step of generating a matrix is comprised of the step of generating a matrix where each entry in the matrix is selected from the group of matrix entries comprising one and the frequency of occurrence of a particular term in a particular sentence.
  - 4. The method of claim 1, wherein said step of replacing each entry in the matrix by a product of the matrix entry and a user-definable function that decays exponentially is comprised of the step of replacing each entry in the matrix by a product of the matrix entry and ge⁽⁻
    - 8j/n)+t, where g is a user-definable constant, where e˜
      
      2.718, where j is the column number of the column in question, where n is the number of sentences in the text, and where t is a user-definable constant that makes the decaying function'"'"'s length approximately equal to the decaying function'"'"'s height.
  - 5. The method of claim 1, wherein said step of reducing the Euclidean lengths of the columns not selected in step (f) is comprised of the step of subtracting r_jiq_i, from each entry in the columns not selected in step (f), where r_ji=a_j^Tq_i, where a_j^Tis the transpose of the column in question prior to reducing the entries therein, where q_i=a_s/∥
    - a_s∥
      
      , where a_sis the column selected in step (f), and where ∥
      
      a_s∥
      
      is the Euclidean length of column a_s.

6. A method of summarizing text, where the text consists of a number of sentences, and where each sentence includes a number of terms, comprising the steps of:
- (a) generating a user-definable number of features for each sentence in the text, where the features for each sentence are selected from the group of features consisting of a user-definable value assigned to the sentence in question based on the position of the sentence within a paragraph, a value consisting of a log(number of terms in the sentence in question +1), o2(i)=log(n+1), where n is the number of terms in the sentence, $o3 (i) = \sum_{j \in s_{i}} \log (\frac{b_{j}}{\sum_{k \in D} b_{k}}),$ where b_jis the frequency of occurrence of term j in a set of baseline documents, and where b_kis the frequency of occurrence of term k in the baseline documents, where an outer summation is over all terms j which occur in the i-th sentence s_i, and $o4 (i) = \sum_{j \in s_{i}} \log (\frac{d_{j}}{\sum_{k \in D} b_{k}}),$ where d_jis the frequency of occurrence of term j in the text D, and where b_kis the frequency of occurrence of term k in the text D;
  
  (b) normalizing features ol(i), o2 (i), o3 (i) and o4(i) each by subtracting its mean and dividing the remainder by its standard deviation;
  
  (c) selecting a Markov state space diagram having 2s+1 states, with s summary states and s+1 non-summary states;
  
  (d) generating a Hidden Markov transition matrix from data selected from the group of data consisting of marked data, blind data, and user-definable data;
  
  (e) computing a most likely set of states through the Markov state space diagram using a state space traversal method selected from the group of state space traversal methods consisting of a forward-backward recursion traversal method and a Viterbi traversal method;
  
  (f) identifying those sentences in the text that caused a traversal to a summary state in the Markov state space diagram; and
  
  (g) returning those sentences identified in step (f) as the summary of the text.

7. A method of summarizing text, where the text consists of a number of sentences, and where each sentence includes a number of terms, comprising the steps of:
- (a) identifying each sentence in the text;
  
  (b) identifying each term in each sentence;
  
  (c) generating a matrix, where each column in the matrix represents a sentence in the text, and where each row in the matrix represents a term in the text;
  
  (d) replacing each entry in the matrix by a product of the matrix entry and a Hidden Markov Model probability that the sentence corresponding to the matrix entry is a summary sentence;
  
  (e) determining the Euclidean length of each column by squaring the entries in the column in question, summing the squares, and taking the square root of the sum;
  
  (f) selecting the column with the maximum Euclidean length as a summary sentence;
  
  (g) reducing the Euclidean lengths of the columns not selected in step (f); and
  
  (h) returning to step(e) if another summary sentence is desired, otherwise returning the selected summary sentences as the summary of the text.

8. A method of summarizing a plurality of text, where each of the plurality of text consists of a number of sentences, and where each sentence includes a number of terms, comprising the steps of:
- (a) identifying, for each plurality of text, each sentence in the corresponding plurality of text;
  
  (b) identifying, for each plurality of text, each term in each sentence in the corresponding plurality of text;
  
  (c) generating a matrix, where each column in the matrix represents a sentence in the plurality of text, and where each row in the matrix represents a term in the plurality of text, (d) replacing each entry in the matrix by a product of the matrix entry and a Hidden Markov Model probability that the sentence corresponding to the matrix entry is a summary sentence;
  
  (e) determining the Euclidean length of each column by squaring the entries in the corresponding column, summing the squares, and taking the square root of the sum;
  
  (f) selecting the column with a maximum Euclidean length as a summary sentence;
  
  (g) reducing the Euclidean lengths of the columns not selected in step (f); and
  
  (h) returning to step(e) if another summary sentence is desired, otherwise returning the selected summary sentences as the summary of the text.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
National Security Agency
Original Assignee
Institute For Defense Analyses
Inventors
OʼLeary, Dianne P., Conroy, John M.

Granted Patent

US 6,990,634 B2
Time in Patent Office

Days
Field of Search
US Class Current

715/531
CPC Class Codes

G06F 16/345   Summarisation for human users

G06F 40/216   using statistical methods

G06F 40/289   Phrasal analysis, e.g. fini...

G06F 40/35   Discourse or dialogue repre...

Method of summarizing text by sentence extraction

First Claim

4 Assignments

0 Petitions

Accused Products

Abstract

33 Citations

8 Claims

Specification

Solutions

Use Cases

Quick Links

Method of summarizing text by sentence extraction

First Claim

4 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

33 Citations

8 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links