Method and system for information extraction

US 7,194,406 B2
Filed: 01/11/2005
Issued: 03/20/2007
Est. Priority Date: 06/22/2000
Status: Expired due to Term

Abstract

A method and a system for extracting information from a natural language text corpus based on a natural language query are disclosed. In the method, the natural language text corpus is analyzed with respect to surface structure of word tokens and surface syntactic roles of constituents, and the analyzed natural language text corpus is then indexed and stored. Furthermore, a natural language query is analyzed with respect to surface structure of word tokens and surface syntactic roles of constituents. From the analyzed natural language query, one or more surface variants are then created, where these surface variants are equivalent to the natural language query with respect to lexical meaning of word tokens and surface syntactic roles of constituents. The surface variants are then compared with the indexed and stored analyzed natural language text corpus, and each portion of text comprising a string of word tokens that matches any one of the surface variants or the natural language query is extracted from the indexed and stored analyzed natural language text corpus.
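
The surface-variant step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the query and corpus are already tokenized, it models "same lexical meaning" with a small hypothetical synonym table (`SYNONYMS`), and it ignores surface syntactic roles entirely. All names below are illustrative assumptions.

```python
from itertools import product

# Hypothetical synonym table standing in for "same lexical meaning".
SYNONYMS = {
    "buy": ["buy", "purchase"],
    "car": ["car", "automobile"],
}

def surface_variants(tokens):
    """Return every token sequence obtained by swapping in listed synonyms.

    The original query is always among the variants, because each
    word is listed as its own synonym (or is left unchanged).
    """
    choices = [SYNONYMS.get(t, [t]) for t in tokens]
    return [list(v) for v in product(*choices)]

def contains(sentence, variant):
    """True if `variant` occurs as a contiguous run of tokens in `sentence`."""
    n = len(variant)
    return any(sentence[i:i + n] == variant
               for i in range(len(sentence) - n + 1))

def extract(corpus, query):
    """Return the sentences that contain the query or any surface variant."""
    variants = surface_variants(query)
    return [s for s in corpus if any(contains(s, v) for v in variants)]

# Toy pre-tokenized corpus (assumed data, not from the patent).
CORPUS = [
    ["they", "purchase", "the", "automobile", "today"],
    ["he", "sold", "the", "car"],
    ["we", "buy", "the", "car"],
]

if __name__ == "__main__":
    for sentence in extract(CORPUS, ["buy", "the", "car"]):
        print(" ".join(sentence))
```

In this sketch the query "buy the car" expands to four variants, so the first and third toy sentences are extracted while the second is not.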

6 Claims

1. A method for extracting information from a natural language text corpus based on a natural language query, comprising the steps of:
- analyzing said natural language text corpus with respect to location of phrases, location of word tokens, phrase types, and lexical meaning of word tokens;
  
  indexing and storing the analyzed natural language text corpus;
  
  analyzing a natural language query with respect to phrases, phrase types, word tokens of phrases, and lexical meaning of word tokens;
  
  identifying, for at least one phrase of the analyzed natural language query, phrases of the indexed and stored analyzed natural language text corpus each having the same phrase type as the at least one phrase of the analyzed natural language query, and each comprising a word token being a lexical head and having the same lexical meaning as a word token being a lexical head of the at least one phrase of the analyzed natural language query; and
  
  extracting, from the indexed and stored analyzed natural language text corpus, portions of text comprising the identified phrases.
- - 2. The method of claim 1, wherein the natural language text corpus and natural language query are analyzed with respect to lemmas of word tokens and wherein, for at least one phrase of the analyzed natural language query, phrases of the indexed and stored analyzed natural language text corpus are identified each having the same phrase type as the at least one phrase of the analyzed natural language query, and each comprising a word token being a lexical head and having the same lemma as a word token being a lexical head of the at least one phrase of the analyzed natural language query.
  - 3. The method of claim 2, further comprising the step of:
    - analyzing said natural language text corpus with respect to location of clauses, wherein the step of identifying comprises;
      
      identifying, for each of the phrases of the analyzed natural language query, clauses of the indexed and stored analyzed natural language text corpus, each comprising phrases having the same phrase types as a respective one of the phrases of the analyzed natural language query, and each of the phrases comprising a word token being a lexical head and having the same lemma as a word token being a lexical head of the respective one of the phrases of the analyzed natural language query;
      
      and wherein the step of extracting comprises;
      
      extracting, from the indexed and stored analyzed natural language text corpus, portions of text comprising the identified clauses.
  - 4. The method of claim 2, wherein, for at least one phrase of the analyzed natural language query, phrases of the indexed and stored analyzed natural language text corpus are identified each having the same phrase type as the at least one phrase of the analyzed natural language query, each comprising a word token being a lexical head and having the same lemma as a word token being a lexical head of the at least one phrase of the analyzed natural language query, and each comprising a word token being a modifier and having the same lemma as a word token being a modifier of the at least one phrase of the analyzed natural language query.
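
Claims 1 to 4 can be read as an index lookup keyed on phrase type and lexical head. The Python sketch below is one possible reading under stated assumptions, not the patented implementation: phrases are assumed to arrive already parsed, heads and modifiers are given as lemmas, and the `Phrase` record, the toy corpus, and all function names are hypothetical. The modifier test follows one reading of claim 4, namely that a corpus phrase must share at least one modifier lemma with the query phrase.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Phrase:
    clause_id: int                       # location: clause the phrase occurs in
    phrase_type: str                     # e.g. "NP", "VP"
    head: str                            # lemma of the lexical head
    modifiers: frozenset = frozenset()   # lemmas of modifier tokens

def build_index(phrases):
    """Index analyzed phrases by (phrase type, head lemma)."""
    index = defaultdict(list)
    for p in phrases:
        index[(p.phrase_type, p.head)].append(p)
    return index

def identify(index, q, match_modifiers=False):
    """Corpus phrases with the same type and head lemma as query phrase `q`.

    With match_modifiers=True, a hit must also share at least one
    modifier lemma with `q` (one reading of claim 4).
    """
    hits = index.get((q.phrase_type, q.head), [])
    if match_modifiers:
        hits = [p for p in hits if q.modifiers & p.modifiers]
    return hits

def matching_clauses(index, query_phrases):
    """Clause ids containing a match for every query phrase (cf. claim 3)."""
    per_phrase = [{p.clause_id for p in identify(index, q)}
                  for q in query_phrases]
    return set.intersection(*per_phrase) if per_phrase else set()

# Toy analyzed corpus (assumed data):
#   clause 0: "The old dog chased a cat"
#   clause 1: "A young dog slept"
#   clause 2: "The cat chased a mouse"
CORPUS = [
    Phrase(0, "NP", "dog", frozenset({"old"})),
    Phrase(0, "VP", "chase"),
    Phrase(0, "NP", "cat"),
    Phrase(1, "NP", "dog", frozenset({"young"})),
    Phrase(1, "VP", "sleep"),
    Phrase(2, "NP", "cat"),
    Phrase(2, "VP", "chase"),
    Phrase(2, "NP", "mouse"),
]
INDEX = build_index(CORPUS)

if __name__ == "__main__":
    query = [Phrase(-1, "NP", "dog"), Phrase(-1, "VP", "chase")]
    print(sorted(matching_clauses(INDEX, query)))
```

Keying the index on the (phrase type, head lemma) pair keeps the identifying step a single dictionary lookup per query phrase; the clause-level and modifier-level conditions are then applied as filters on that result.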

5. A method for extracting information from a natural language text corpus based on a natural language query, comprising the steps of:
- analyzing said natural language text corpus with respect to location of phrases, location of word tokens, phrase types, and lexical meaning of word tokens;
  
  indexing and storing the analyzed natural language text corpus;
  
  analyzing a natural language query consisting of one phrase with respect to phrase type, word tokens of the phrase, and lexical meaning of the word tokens;
  
  identifying phrases of the indexed and stored analyzed natural language text corpus each having the same phrase type as the phrase of the analyzed natural language query, each comprising a word token being a lexical head and having the same lexical meaning as a word token being a lexical head of the phrase of the analyzed natural language query, and each comprising a word token being a modifier and having the same lexical meaning as a word token being a modifier of the lexical head of the phrase of the analyzed natural language query; and
  
  extracting, from the indexed and stored analyzed natural language text corpus, portions of text comprising the identified phrases.
- - 6. The method of claim 5, wherein the natural language text corpus and natural language query are analyzed with respect to lemmas of word tokens and wherein phrases of the indexed and stored analyzed natural language text corpus are identified each having the same phrase type as the phrase of the analyzed natural language query, each comprising a word token being a lexical head and having the same lemma as a word token being a lexical head of the phrase of the analyzed natural language query, and each comprising a word token being a modifier and having the same lemma as a word token being a modifier of the lexical head of the phrase of the analyzed natural language query.
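
Claims 5 and 6 differ in the comparison key: claim 5 compares head and modifier tokens by lexical meaning, while claim 6 compares them by lemma. The short Python sketch below contrasts the two under stated assumptions. The lookup tables `LEMMAS` and `MEANINGS`, the sense labels, and the tuple layout of a phrase are all hypothetical stand-ins for a real morphological analyzer and lexicon, and the modifier test again assumes that one shared modifier is enough.

```python
# Hypothetical lookup tables; a real system would use a morphological
# analyzer and a lexicon instead.
LEMMAS = {"cars": "car", "automobiles": "automobile"}
MEANINGS = {
    "car": "sense:auto", "automobile": "sense:auto",
    "fast": "sense:rapid", "quick": "sense:rapid",
}

def lemma(token):
    """Map a surface form to its lemma (identity if not in the table)."""
    t = token.lower()
    return LEMMAS.get(t, t)

def meaning(token):
    """Map a token to a sense label, falling back to its lemma."""
    base = lemma(token)
    return MEANINGS.get(base, base)

def phrase_matches(query, candidate, key):
    """Compare two phrases given as (phrase_type, head, modifiers).

    `key` is the normalization applied to heads and modifiers:
    `meaning` for a claim 5 style match, `lemma` for claim 6.
    """
    q_type, q_head, q_mods = query
    c_type, c_head, c_mods = candidate
    shared = {key(m) for m in q_mods} & {key(m) for m in c_mods}
    return q_type == c_type and key(q_head) == key(c_head) and bool(shared)

QUERY = ("NP", "cars", ["fast"])
SAME_LEMMA = ("NP", "car", ["fast", "red"])
SAME_MEANING = ("NP", "automobiles", ["quick"])

if __name__ == "__main__":
    for cand in (SAME_LEMMA, SAME_MEANING):
        print(cand[1],
              phrase_matches(QUERY, cand, lemma),
              phrase_matches(QUERY, cand, meaning))
```

In this toy setup the phrase headed by "car" matches under both keys, whereas the phrase headed by "automobiles" matches only when tokens are compared by meaning.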

Current Assignee: Essencient Ltd
Original Assignee: Hapax Ltd.
Inventors: Braroe, Peter A.; Ejerhed, Eva Ingegord
Primary Examiner(s): Harper, V. Paul
Application Number: US11/032,075
Publication Number: US 20050131886A1
Time in Patent Office: 798 Days
Field of Search: None
US Class Current: 704/9
CPC Class Codes:
G06F 16/3334   Selection or weighting of t...
G06F 16/3335   Syntactic pre-processing, e...
G06F 16/3344   using natural language anal...
G06F 40/20   Natural language analysis s...
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/253   Grammatical analysis; Style...
G06F 40/268   Morphological analysis
G06F 40/30   Semantic analysis
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99934   Query formulation, input pr...
Y10S 707/99935   Query augmenting and refini...
