Method and apparatus for robust efficient parsing

US 7,024,351 B2
Filed: 08/21/2001
Issued: 04/04/2006
Est. Priority Date: 08/21/2001
Status: Expired due to Fees



Abstract

The present invention provides a method for improving the efficiency of parsing text. Aspects of the invention include representing parse tokens as integers where a portion of the integer indicates the location in which a definition for the token can be found. In a further aspect, an integer representing a token points to an array of tokens that can be activated by the token. In another aspect, a list of pointers to partial parses is created before attempting to parse a next word in the text string. The list of pointers includes pointers to partial parses that are expecting particular semantic tokens. A fourth aspect of the invention utilizes a data structure to list the semantic tokens that have been fully parsed for each span in the input text segment. When a token is fully parsed, the list is accessed to determine if the new token should be discarded.
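The first aspect in the abstract, an integer token in which a portion of the integer indicates where the token's definition can be found, can be illustrated with a short sketch. The abstract does not give bit widths or table layouts, so the 2-bit/30-bit split, the two tables, and the names `encode`, `decode`, and `definition` below are all assumptions made for illustration only.

```python
# Hypothetical bit layout: the abstract does not specify field widths.
KIND_BITS = 2                      # high bits: which definition table to use
INDEX_BITS = 30                    # low bits: position inside that table
INDEX_MASK = (1 << INDEX_BITS) - 1

# Hypothetical definition tables, one per token kind.
NONTERMINALS = ["<Meeting>", "<Time>"]
WORDS = ["schedule", "meeting", "at", "noon"]
TABLES = [NONTERMINALS, WORDS]


def encode(kind: int, index: int) -> int:
    """Pack a table selector and a table index into one integer token."""
    if not 0 <= kind < (1 << KIND_BITS):
        raise ValueError("kind does not fit in KIND_BITS")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index does not fit in INDEX_BITS")
    return (kind << INDEX_BITS) | index


def decode(token: int) -> tuple:
    """Split an integer token back into (kind, index)."""
    return token >> INDEX_BITS, token & INDEX_MASK


def definition(token: int) -> str:
    """Find the token's definition using only the bits of the integer."""
    kind, index = decode(token)
    return TABLES[kind][index]
```

With this layout, looking up a definition needs only a shift, a mask, and two array accesses, with no string comparison or hash lookup.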

5 Claims

1. A method of parsing text to form a representation of the text, the representation having structures that span sub-strings of words in the text, each structure having a token at its root, the method comprising:

identifying a first structure that spans a first sub-string of words in the text and has a first token as its root, the first sub-string having a starting position and an ending position;

indexing the first structure by the first token and the starting position and ending position of the first sub-string;

identifying a second structure that spans the first sub-string of words and has the first token as its root;

using the first token and the starting position and ending position of the first sub-string to locate the first structure; and

removing one of the first structure and second structure from further consideration in the formation of the representation of the text.

2. The method of claim 1 wherein removing one of the first structure and second structure comprises removing the second structure.

3. The method of claim 1 wherein removing one of the first structure and second structure comprises removing the first structure.

4. The method of claim 3 wherein removing the first structure comprises removing the first structure so that it is no longer indexed by the first token and the starting position and ending position of the first sub-string and indexing the second structure by the first token and the starting position and ending position of the first sub-string.

5. The method of claim 1 wherein removing one of the first structure and the second structure comprises comparing the first structure to the second structure to determine which structure is better for the representation of the text.
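The steps recited in claims 1 through 5 can be sketched as a table keyed by (root token, start position, end position). This is a minimal illustration, not the patented implementation: the `Structure` record, its `score` field, the `SpanIndex` class, and the default "higher score is better" comparison are all hypothetical, since the claims do not say how two structures are compared.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Structure:
    root: str      # token at the root of the structure
    start: int     # starting position of the spanned sub-string
    end: int       # ending position of the spanned sub-string
    score: float   # hypothetical quality measure, not taken from the claims


Key = Tuple[str, int, int]


class SpanIndex:
    """Keeps at most one structure per (root token, start, end) key."""

    def __init__(self, better: Optional[Callable[[Structure, Structure], bool]] = None):
        self._by_key: Dict[Key, Structure] = {}
        # Assumed comparison for claim 5: the higher score wins.
        self._better = better or (lambda new, old: new.score > old.score)

    def lookup(self, root: str, start: int, end: int) -> Optional[Structure]:
        """Locate a stored structure by token and span (the 'using' step)."""
        return self._by_key.get((root, start, end))

    def add(self, s: Structure) -> Structure:
        """Index s, or resolve a clash with a structure already stored."""
        key = (s.root, s.start, s.end)
        existing = self._by_key.get(key)
        if existing is None:
            self._by_key[key] = s      # first structure: index it
            return s
        if self._better(s, existing):
            self._by_key[key] = s      # claims 3 and 4: drop first, re-index second
            return s
        return existing                # claim 2: drop the second structure

    def __len__(self) -> int:
        return len(self._by_key)
```

Because the key combines the root token with both span boundaries, a newly completed structure is checked against its only possible competitor in a single dictionary lookup rather than by scanning every structure built so far.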

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Wang, YeYi
Primary Examiner(s): Knepper, David D.
Assistant Examiner(s): Han, Qi
Application Number: US09/934,223
Publication Number: US 20030115039A1
Time in Patent Office: 1,687 Days
Field of Search: 704/9, 704/2, 704/257
US Class Current: 704/9
CPC Class Codes: G06F 40/211 Syntactic parsing, e.g. bas...
