APPLYING A STRUCTURED LANGUAGE MODEL TO INFORMATION EXTRACTION

US 20100318348A1
Filed: 08/24/2010
Published: 12/16/2010
Est. Priority Date: 05/20/2002
Status: Active Grant



Abstract

One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
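The two constraints in the abstract (enforcing annotated constituent boundaries, then enforcing semantic labels after the syntactic labels are replaced with joint labels) can be sketched as simple span checks. This is a minimal Python sketch, not the patented implementation. It assumes three things: constituents are half-open word spans `[start, end)`; "matching the boundaries" means a candidate constituent may nest inside or contain an annotated span but may not straddle its edge; and joint labels are written as `SYN_SEM`, with neither part containing an underscore. The names `Constituent`, `crosses`, `respects_boundaries`, `enrich`, and `respects_labels`, the labels `Attendee` and `Date`, and the example sentence are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Constituent(NamedTuple):
    """An annotated semantic constituent over words [start, end) (half-open)."""
    start: int
    end: int
    label: str  # semantic label, e.g. the hypothetical "Attendee"


def crosses(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the spans overlap but neither contains the other."""
    return a_start < b_start < a_end < b_end or b_start < a_start < b_end < a_end


def respects_boundaries(start: int, end: int, annotated: List[Constituent]) -> bool:
    """Boundary constraint: a candidate constituent may not cross any annotated span."""
    return not any(crosses(start, end, c.start, c.end) for c in annotated)


def enrich(syntactic: str, semantic: Optional[str] = None) -> str:
    """Replace a syntactic label with a joint label such as NP_Attendee (assumed format)."""
    return syntactic if semantic is None else f"{syntactic}_{semantic}"


def respects_labels(start: int, end: int, joint_label: str,
                    annotated: List[Constituent]) -> bool:
    """Label constraint: a candidate covering exactly an annotated span must
    carry that span's semantic label; other candidates are left unconstrained."""
    _, _, semantic = joint_label.partition("_")
    for c in annotated:
        if (c.start, c.end) == (start, end):
            return semantic == c.label
    return True


# Hypothetical example: "schedule a meeting with John Smith", with words 4-5
# ("John Smith") annotated as an Attendee constituent.
annotated = [Constituent(4, 6, "Attendee")]
print(respects_boundaries(3, 5, annotated))   # False: straddles the Attendee edge
print(respects_boundaries(3, 6, annotated))   # True: contains the Attendee span
print(respects_labels(4, 6, enrich("NP", "Attendee"), annotated))  # True
print(respects_labels(4, 6, enrich("NP", "Date"), annotated))      # False
```

Under this reading, the first semantically annotated training pass would prune candidates with `respects_boundaries` alone. Once the labels have been enriched, `respects_labels` would also apply. Leaving non-matching spans unconstrained is one simple choice; a stricter variant could forbid semantic labels on spans that have no annotation.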

45 Citations


8 Claims

1. A method of training an information extraction system to extract information from a natural language input, comprising:
   - generating parses with a structured language model using annotated training data that has semantic constituent labels with semantic constituent boundaries identified;
   - while generating parses, constraining parses to match the semantic constituent boundaries; and
   - while generating parses, constraining the parses to match the semantic constituent labels.

2. The method of claim 1 and further comprising:
   - prior to generating parses, initializing the structured language model with syntactically annotated training data.

3. The method of claim 2 wherein initializing comprises:
   - initializing the structured language model with syntactically annotated training data parsed from in-domain sentences.

4. The method of claim 2 wherein initializing comprises:
   - initializing the structured language model with syntactically annotated training data parsed from out-of-domain sentences.

5. The method of claim 1 wherein generating parses comprises:
   - generating syntactic parses with syntactic labels wherein the parses conform to the semantic constituent boundaries;
   - enriching the syntactic labels with semantic labels; and
   - generating semantic parses with semantic labels wherein the semantic labels conform to the semantic constituent labels in the annotated training data.

6. The method of claim 1 wherein generating parses comprises:
   - generating the parses as binary parse trees.

7. The method of claim 1 wherein generating parses comprises:
   - generating the parses in a left-to-right fashion.

8. The method of claim 1 wherein generating parses comprises:
   - generating parses in a bottom-up fashion.
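Claims 6 through 8 describe parses built as binary trees, left to right and bottom up. A shift-reduce search has exactly that shape, and the boundary constraint of claim 1 can be applied at each reduce step. The sketch below is a hypothetical Python illustration, not the structured language model itself. It enumerates every unlabeled binary tree instead of assigning probabilities to parser moves as the model would. The function names, the span representation, and the example words are assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open word span [start, end)


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def constrained_parses(words: List[str], annotated: List[Span]) -> List[str]:
    """Enumerate binary trees built left to right, bottom up, by shift/reduce,
    keeping only trees whose constituents never cross an annotated span."""
    n = len(words)
    results: List[str] = []

    def search(pos: int, stack: List[Tuple[Span, str]]) -> None:
        # Complete parse: every word consumed and a single tree on the stack.
        if pos == n and len(stack) == 1:
            results.append(stack[0][1])
            return
        # SHIFT: push the next word (left to right) as a one-word subtree.
        if pos < n:
            search(pos + 1, stack + [((pos, pos + 1), words[pos])])
        # REDUCE: join the top two subtrees into one binary node (bottom up),
        # but only if the new constituent respects the annotated boundaries.
        if len(stack) >= 2:
            (left_span, left), (right_span, right) = stack[-2], stack[-1]
            span = (left_span[0], right_span[1])
            if not any(crosses(span, a) for a in annotated):
                search(pos, stack[:-2] + [(span, f"({left} {right})")])

    search(0, [])
    return results


print(constrained_parses(["a", "b", "c"], []))        # ['(a (b c))', '((a b) c)']
print(constrained_parses(["a", "b", "c"], [(0, 2)]))  # ['((a b) c)']
```

With the annotated span `(0, 2)`, only `((a b) c)` survives. The alternative `(a (b c))` would need the constituent `(1, 3)`, which crosses `(0, 2)`. Each distinct binary tree corresponds to exactly one shift/reduce sequence, so the search never produces duplicates. A probabilistic parser would instead score these moves and keep only the best-scoring partial parses rather than enumerating all of them.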


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Chelba, Ciprian; Mahajan, Milind

Granted Patent: US 8,706,491 B2
US Class Current: 704/9
CPC Class Codes

G06F 40/00   Handling natural language d...

G06F 40/205   Parsing

G06F 40/211   Syntactic parsing, e.g. bas...

G06F 40/237   Lexical tools

G06F 40/30   Semantic analysis

G06F 40/40   Processing or translation o...

G06F 40/56   Natural language generation

G10L 15/00   Speech recognition G10L17/0...

G10L 15/04   Segmentation; Word boundary...

G10L 15/05   Word boundary detection

G10L 15/18   using natural language mode...

G10L 15/1822   Parsing for meaning underst...

G10L 15/22   Procedures used during a sp...
