Method and system for information extraction

US 20070168181A1
Filed: 03/16/2007
Published: 07/19/2007
Est. Priority Date: 06/22/2000
Status: Active Grant

Abstract

A method, a system and a computer program for extracting information from a natural language text corpus based on a natural language query are disclosed. The natural language text corpus is indexed and stored. A natural language query is analyzed with respect to phrases, phrase types, syntactic roles, word tokens of phrases, and lexical meaning of word tokens. One or more surface variants are created for at least one phrase of the natural language query. The one or more surface variants each have the same phrase type as the at least one phrase of the natural language query, and each comprise a word token which is a lexical head and has the same lexical meaning as a word token which is a lexical head of the at least one phrase of the natural language query. The one or more surface variants and the at least one phrase of the natural language query are compared with the indexed and stored natural language text corpus. Portions of text are extracted from the indexed and stored natural language text corpus, which portions comprise a string of word tokens that matches any one of said surface variants or said at least one phrase of the natural language query.
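The surface-variant step described above can be illustrated with a minimal sketch. It is not the patented implementation: the `LEXICON` table, the sense labels, and the `Phrase` and `surface_variants` names are all hypothetical, and the phrase analysis (phrase type, head position) is assumed to have been done already.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (an assumption for illustration only):
# word form -> (lemma, label standing in for "lexical meaning").
LEXICON = {
    "buy": ("buy", "ACQUIRE"),
    "buys": ("buy", "ACQUIRE"),
    "bought": ("buy", "ACQUIRE"),
    "purchase": ("purchase", "ACQUIRE"),
    "purchased": ("purchase", "ACQUIRE"),
    "sell": ("sell", "TRANSFER"),
}


@dataclass(frozen=True)
class Phrase:
    phrase_type: str   # e.g. "VP" or "NP", assumed to come from the query analysis
    tokens: tuple      # word tokens in surface order
    head_index: int    # position of the lexical head within tokens

    @property
    def head(self):
        return self.tokens[self.head_index]


def surface_variants(phrase):
    """Return phrases of the same type whose head has the same lexical meaning."""
    entry = LEXICON.get(phrase.head.lower())
    if entry is None:
        return []  # unknown head: no variants can be proposed
    _, sense = entry
    variants = []
    for form, (_, form_sense) in LEXICON.items():
        if form_sense == sense and form != phrase.head.lower():
            # Swap only the head token; phrase type and other tokens are kept.
            tokens = (
                phrase.tokens[: phrase.head_index]
                + (form,)
                + phrase.tokens[phrase.head_index + 1 :]
            )
            variants.append(Phrase(phrase.phrase_type, tokens, phrase.head_index))
    return variants
```

Restricting the loop to forms that share the head's lemma rather than its sense label would give the narrower, lemma-based variant described in dependent claim 2.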

15 Citations

7 Claims

1. A method for extracting information from a natural language text corpus based on a natural language query, comprising the steps of:
- indexing and storing the natural language text corpus;
  
  analyzing a natural language query with respect to phrases, phrase types, syntactic roles, word tokens of phrases, and lexical meaning of word tokens;
  
  creating one or more surface variants for at least one phrase of the natural language query, said one or more surface variants each having the same phrase type as said at least one phrase of the natural language query, and each comprising a word token being a lexical head and having the same lexical meaning as a word token being a lexical head of said at least one phrase of the natural language query;
  
  comparing said one or more surface variants and said at least one phrase of the natural language query with the indexed and stored natural language text corpus; and
  
  extracting from said indexed and stored natural language text corpus, portions of text comprising a string of word tokens that matches any one of said surface variants or said at least one phrase of the natural language query.
- Dependent claims (2, 3, 4, 5, 7):
  - 2. The method of claim 1, wherein the natural language query is further analyzed with respect to lemmas of word tokens and wherein the step of creating comprises:
    - creating one or more surface variants for at least one phrase of the natural language query, said one or more surface variants each having the same phrase type as the at least one phrase of the natural language query, and each comprising a word token being a lexical head and having the same lemma as a word token being a lexical head of the at least one phrase of the natural language query.
  - 3. The method according to claim 1, further comprising analyzing the natural language text corpus with respect to phrases, phrase types, syntactic roles, word tokens of phrases, and lexical meaning of word tokens.
  - 4. The method of claim 3, further comprising the step of analyzing said natural language text corpus with respect to location of clauses, wherein the step of extracting comprises:
    - extracting, from the indexed and stored analyzed natural language text corpus, portions of text comprising clauses which in turn comprises a string of word tokens that matches any one of said surface variants or said at least one phrase of the natural language query.
  - 5. The method of claim 1, wherein the step of creating comprises:
    - creating one or more surface variants for at least one phrase of the natural language query, said one or more surface variants each having the same phrase type as the at least one phrase of the natural language query, each comprising a word token being a lexical head and having the same lemma as a word token being a lexical head of the at least one phrase of the natural language query, and each comprising a word token being a lexical modifier and having the same lemma as a word token being a lexical modifier of the at least one phrase of the natural language query.
  - 7. A computer program comprising computer-executable instructions for performing the steps recited in claim 1.
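Claim 4 narrows the extracted portions of text to clauses. A rough sketch of that idea follows, under loud assumptions: clause boundaries are approximated by punctuation alone (the patent does not say this), the corpus is scanned linearly rather than through an index, and the helper names `split_clauses`, `tokenize`, `contains_sequence` and `extract_clauses` are hypothetical.

```python
import re


def split_clauses(text):
    """Approximate clause boundaries by punctuation (an assumption, not real parsing)."""
    parts = re.split(r"[.;,!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(text):
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def contains_sequence(tokens, pattern):
    """True if pattern occurs as a contiguous run inside tokens."""
    n = len(pattern)
    return n > 0 and any(
        tokens[i : i + n] == pattern for i in range(len(tokens) - n + 1)
    )


def extract_clauses(corpus, patterns):
    """Return clauses containing the query phrase or any of its surface variants."""
    token_patterns = [tokenize(p) for p in patterns]
    hits = []
    for clause in split_clauses(corpus):
        clause_tokens = tokenize(clause)
        if any(contains_sequence(clause_tokens, p) for p in token_patterns):
            hits.append(clause)
    return hits
```

Here `patterns` would hold the query phrase together with its surface variants, so a clause is returned when it matches any one of them.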

6. A system for extracting information from a natural language text corpus based on a natural language query, comprising:
- a text analysis unit for analyzing a natural language query with respect to phrases, phrase types, syntactic roles, word tokens of phrases, and lexical meaning of word tokens;
  
  storage means operatively connected to said text analysis unit, for storing the natural language text corpus;
  
  an indexer, operatively connected to said storage means, for indexing the natural language text corpus;
  
  an index, operatively connected to said indexer, for storing said indexed natural language text corpus;
  
  a query manager, operatively connected to said text analysis unit, comprising means for creating surface variants for at least one phrase of the natural language query, said surface variants each having the same phrase type as said at least one phrase of the natural language query, and each comprising a word token being a lexical head and having the same lexical meaning as a word token being a lexical head of said at least one phrase of the natural language query, and means for comparing said surface variants and said at least one phrase of the natural language query with the indexed natural language text corpus in said index; and
  
  a result manager operatively connected to said index, for extracting, from said indexed and stored natural language text corpus, each portion of text comprising a string of word tokens that matches any one of said surface variants or said analyzed natural language query.
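The indexer, index, query manager and result manager of claim 6 could be approximated with a positional inverted index, as in the sketch below. This is one possible data structure, chosen for illustration; the claim does not specify it. Whitespace tokenization and the function names `build_index`, `find_phrase` and `extract` are assumptions.

```python
from collections import defaultdict


def build_index(documents):
    """Indexer: map each lower-cased token to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for pos, token in enumerate(text.lower().split()):
            index[token].add((doc_id, pos))
    return index


def find_phrase(index, phrase):
    """Query manager: sorted (doc_id, start) pairs where the phrase tokens occur contiguously."""
    tokens = phrase.lower().split()
    if not tokens:
        return []
    candidates = set(index.get(tokens[0], set()))
    for offset, token in enumerate(tokens[1:], start=1):
        postings = index.get(token, set())
        # Keep only start positions whose following tokens line up.
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in postings}
    return sorted(candidates)


def extract(documents, index, phrases):
    """Result manager: documents matched by the query phrase or any surface variant."""
    doc_ids = sorted({d for ph in phrases for d, _ in find_phrase(index, ph)})
    return [documents[d] for d in doc_ids]
```

Storing positions alongside document identifiers is what lets the lookup check for a contiguous string of word tokens instead of a bag of words.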

Current Assignee: Essencient Ltd
Original Assignee: Hapax Ltd.
Inventors: Ejerhed, Eva; Braroe, Peter

Granted Patent: US 7,657,425 B2
US Class Current: 704/9
CPC Class Codes:
- G06F 16/3334   Selection or weighting of t...
- G06F 16/3335   Syntactic pre-processing, e...
- G06F 16/3344   using natural language anal...
- G06F 40/20   Natural language analysis s...
- G06F 40/211   Syntactic parsing, e.g. bas...
- G06F 40/253   Grammatical analysis; Style...
- G06F 40/268   Morphological analysis
- G06F 40/30   Semantic analysis
- Y10S 707/99933   Query processing, i.e. sear...
- Y10S 707/99934   Query formulation, input pr...
- Y10S 707/99935   Query augmenting and refini...
