Applying a structured language model to information extraction

US 7,805,302 B2
Filed: 05/20/2002
Issued: 09/28/2010
Est. Priority Date: 05/20/2002
Status: Active Grant

Abstract

One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
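The relabeling step in the abstract, where syntactic labels are replaced with joint syntactic and semantic labels, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Node` class, the `add_semantic_labels` helper, the span-keyed annotation format, the hyphenated `NP-Attendee` label style, and the example sentence are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str   # syntactic label, e.g. "NP"
    start: int   # index of the first word covered
    end: int     # index one past the last word covered
    children: List["Node"] = field(default_factory=list)


def add_semantic_labels(node: Node, spans: Dict[Tuple[int, int], str]) -> Node:
    """Copy the tree, giving annotated constituents a joint label."""
    semantic = spans.get((node.start, node.end))
    # Assumed label format: "<syntactic>-<semantic>"
    label = f"{node.label}-{semantic}" if semantic else node.label
    children = [add_semantic_labels(child, spans) for child in node.children]
    return Node(label, node.start, node.end, children)


def labels(node: Node) -> List[str]:
    """List labels in pre-order, for inspection."""
    result = [node.label]
    for child in node.children:
        result.extend(labels(child))
    return result


# Hypothetical parse of the words: meet(0) bob(1) at(2) noon(3)
tree = Node("S", 0, 4, [
    Node("VB", 0, 1),
    Node("NP", 1, 2),
    Node("PP", 2, 4, [Node("IN", 2, 3), Node("NP", 3, 4)]),
])
# Hypothetical semantic annotation: word span -> frame or slot label
annotation = {(0, 4): "ScheduleMeeting", (1, 2): "Attendee", (3, 4): "Time"}

joint = add_semantic_labels(tree, annotation)
print(labels(joint))
```

Constituents whose span carries no semantic annotation keep their purely syntactic label, so the relabeled tree still contains the original syntactic structure.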

6 Claims

1. A method of extracting information from a natural language input using a computer having a processor and memory, comprising:
- accessing a semantic schema with a structured language model, the semantic schema having a template with a structure of frames that corresponds to one or more pieces of information to be extracted for an application program;
  
  generating, with the processor, a candidate parse by parsing the natural language input with the structured language model, wherein, during generation, the structured language model generates hypothesis parses of a portion of the natural language input by applying the template and accepts only those hypothesis parses, as possible candidate parses, if the hypothesis parses completely match the structure of frames from the template, and discards all hypothesis parses, during construction of the hypothesis parses, that do not completely match the structure of frames from the templates, each accepted candidate parse including syntactic head words, and semantic labels, and using the head words and semantic labels in each accepted candidate parse to predict a next word in the natural language input, to obtain an overall parse for the natural language input, the overall parse having a semantic frame label and one or more constituents of the natural language input each having a semantic slot label, the overall parse being constrained based on the semantic schema accessed; and
  
  identifying, with the processor, an information extraction frame corresponding to the natural language input based on the frame label and filling in slots in the frame with the one or more constituents labeled by the slot labels.
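
The schema constraint and slot filling recited in claim 1 can be sketched roughly as follows. This is a simplified illustration, not the patented method: it omits probabilities, head words, and next-word prediction, and it treats "matching the template" as "the frame is known and every slot label belongs to that frame", which is an assumed reading. The names `Hypothesis`, `SCHEMA`, `consistent`, `extend`, and `fill_frame`, as well as the example frame and slot labels, are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Hypothesis(NamedTuple):
    frame: str                            # semantic frame label
    slots: Tuple[Tuple[str, str], ...]    # (slot label, constituent text)


# Hypothetical schema: each frame lists the slot labels it allows.
SCHEMA: Dict[str, Set[str]] = {"ScheduleMeeting": {"Attendee", "Time"}}


def consistent(hyp: Hypothesis, schema: Dict[str, Set[str]]) -> bool:
    """True if the frame is in the schema and every slot belongs to it."""
    allowed = schema.get(hyp.frame)
    return allowed is not None and all(slot in allowed for slot, _ in hyp.slots)


def extend(hyps: List[Hypothesis], slot: str, text: str,
           schema: Dict[str, Set[str]]) -> List[Hypothesis]:
    """Add one labeled constituent to each hypothesis; drop violations at once."""
    extended = []
    for hyp in hyps:
        candidate = Hypothesis(hyp.frame, hyp.slots + ((slot, text),))
        if consistent(candidate, schema):   # discard during construction
            extended.append(candidate)
    return extended


def fill_frame(hyp: Hypothesis) -> dict:
    """Build the extraction frame from the frame label and labeled constituents."""
    return {"frame": hyp.frame, "slots": dict(hyp.slots)}


start = [Hypothesis("ScheduleMeeting", ()), Hypothesis("CancelMeeting", ())]
hyps = [h for h in start if consistent(h, SCHEMA)]   # unknown frame is dropped
hyps = extend(hyps, "Attendee", "bob", SCHEMA)
hyps = extend(hyps, "Time", "noon", SCHEMA)
frame = fill_frame(hyps[0])
rejected = extend(hyps, "Location", "room 5", SCHEMA)  # slot not in template
print(frame, rejected)
```

Checking each hypothesis as it is extended, rather than filtering finished parses, mirrors the claim's requirement that non-matching hypotheses are discarded during construction.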

2. The method of claim 1 wherein identifying comprises:
- generating a probability that generated overall parses occur given a word sequence;
  
  selecting an overall parse generated during parsing that has a highest probability of occurring; and
  
  retaining only the semantic information in the overall parse having the highest probability.

3. The method of claim 1 wherein identifying comprises:
- generating a probability that generated overall parses occur given a word sequence;
  
  summing the probability over all parses having a common semantic parse; and
  
  selecting the semantic parse based on the summed probability.
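
The difference between the two selection strategies in claims 2 and 3 can be seen in a small sketch. The data layout (a tuple of full-parse identifier, semantic parse, and probability), the function names, and the numbers are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (full parse id, semantic parse, probability of the parse given the words)
Parse = Tuple[str, str, float]


def best_single_parse(parses: List[Parse]) -> str:
    """Claim 2 style: take the most probable parse, keep only its semantics."""
    _, semantic, _ = max(parses, key=lambda parse: parse[2])
    return semantic


def best_summed_semantics(parses: List[Parse]) -> str:
    """Claim 3 style: sum probability over parses sharing a semantic parse."""
    totals: Dict[str, float] = defaultdict(float)
    for _, semantic, prob in parses:
        totals[semantic] += prob
    return max(totals, key=lambda semantic: totals[semantic])


# Hypothetical parses: two different full parses share semantic parse "B".
parses = [("t1", "A", 0.40), ("t2", "B", 0.35), ("t3", "B", 0.25)]
print(best_single_parse(parses), best_summed_semantics(parses))
```

With these numbers the two strategies disagree: the single best parse carries semantic parse `A`, while the probability mass summed over parses favors `B`.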

4. An information extraction system for extracting information from a natural language speech input using a computer, comprising:
- a speech recognizer, including a structured language model, receiving the natural language speech input and generating a textual representation of the natural language speech input based on language modeling by the structured language model, the structured language model accessing a semantic schema having a template with a structure of frames that corresponds to one or more pieces of information to be extracted for an application program and parsing the textual representation, generating a plurality of parse hypotheses, to obtain one or more candidate semantic parse trees, each of the parse hypotheses being constrained during generation, by the template in the semantic schema so that all parse hypotheses that do not match the complete structure of frames in the template are discarded during their construction so that each candidate semantic parse tree obtained by the structured language model matches the structure of frames and all parse hypotheses which do not match the complete structure of frames in the template are rejected, wherein each candidate semantic parse tree has a structure with a semantic frame label and one or more semantic slot labels corresponding to constituents of the textual representation, the semantic frame and slot labels identifying the information to be extracted, and wherein a selected candidate parse includes head words and semantic labels that are used to predict a next word in the natural language input, to obtain an overall parse for the natural language input; and
  
  a processor, being a functional element of the computer, activated by the speech recognizer to facilitate parsing of the textual representation.

5. The system of claim 4 and further comprising:
- a ranking component ranking the candidate semantic parse trees generated by the structured language model.

6. The system of claim 5 wherein the ranking component ranks each candidate semantic parse tree by summing over all generated candidate semantic parse trees.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Mahajan, Milind; Chelba, Ciprian
Primary Examiner(s): Wozniak, James S
Application Number: US10/151,979
Publication Number: US 20030216905A1
Time in Patent Office: 3,053 Days
Field of Search: 704/1, 704/9, 704/10, 704/257, 704/255, 715/503-504, 715/507, 707/3-4
US Class Current: 704/257

CPC Class Codes

G06F 40/00   Handling natural language d...

G06F 40/205   Parsing

G06F 40/211   Syntactic parsing, e.g. bas...

G06F 40/237   Lexical tools

G06F 40/30   Semantic analysis

G06F 40/40   Processing or translation o...

G06F 40/56   Natural language generation

G10L 15/00   Speech recognition G10L17/0...

G10L 15/04   Segmentation; Word boundary...

G10L 15/05   Word boundary detection

G10L 15/18   using natural language mode...

G10L 15/1822   Parsing for meaning underst...

G10L 15/22   Procedures used during a sp...
