Method and apparatus for automated search and retrieval process
Abstract
An apparatus and method for the identification of noun phrases in a stream of natural language text receives an input stream of text, identifies tokens within the stream of text, and processes the tokens to identify noun phrases. The system processes the tokens by annotating the tokens with tags identifying characteristics of the tokens and by contextually analyzing each token and its associated characteristics. During processing, the system can also disambiguate individual token characteristics and identify agreement between tokens.
39 Claims
1. A data processing method for identifying noun phrases in a stream of words, the method comprising the steps of:
- extracting a sequence of tokens from the stream,
- storing the sequence of tokens in a first memory element,
- determining the most probable part-of-speech tag and grammatical features for each token, and
- identifying parts of a noun phrase by inspecting the part-of-speech tags and the grammatical features of a window of extracted tokens, the window of extracted tokens, including a selected candidate token and a first token preceding the selected candidate token and a second token following the selected candidate token.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
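The three-token window of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the tag names (`DET`, `ADJ`, `NOUN`, `PRON`, `VERB`), the `participle` feature, and the decision rules in `is_np_part` are all assumptions chosen only to show how a candidate token might be judged from its own tag and features together with those of its preceding and following tokens.

```python
from typing import NamedTuple, Optional


class Tagged(NamedTuple):
    """A token with its most probable tag and grammatical features (hypothetical layout)."""
    word: str
    tag: str
    features: frozenset = frozenset()


def is_modifier(tok: Tagged) -> bool:
    # Assumption: adjectives and participles can modify a noun.
    return tok.tag == "ADJ" or (tok.tag == "VERB" and "participle" in tok.features)


def is_np_part(prev: Optional[Tagged], cand: Tagged, nxt: Optional[Tagged]) -> bool:
    """Decide whether the candidate belongs to a noun phrase, using the window."""
    if cand.tag in ("NOUN", "PRON"):
        return True
    # The following token must be a noun or another modifier.
    leads_to_head = nxt is not None and (nxt.tag == "NOUN" or is_modifier(nxt))
    if cand.tag in ("DET", "ADJ"):
        return leads_to_head
    if is_modifier(cand):
        # A participle also needs a determiner or adjective before it.
        return leads_to_head and prev is not None and prev.tag in ("DET", "ADJ")
    return False


def np_parts(tokens: list) -> list:
    """Slide the window over the stored token sequence; one flag per token."""
    flags = []
    for i, cand in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        flags.append(is_np_part(prev, cand, nxt))
    return flags


PARTICIPLE = frozenset({"participle"})
sentence = [
    Tagged("the", "DET"),
    Tagged("running", "VERB", PARTICIPLE),
    Tagged("water", "NOUN"),
    Tagged("flows", "VERB"),
    Tagged("quickly", "ADV"),
]
print(np_parts(sentence))  # [True, True, True, False, False]
```

In this sketch the preceding token matters for "running": after a determiner it is treated as part of the noun phrase, whereas in "she is running home" the preceding verb causes the same word to be rejected.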
11. A data processing method for identifying noun phrases in a stream of words, the method comprising the steps of:
- extracting a sequence of tokens from the stream,
- storing the sequence of tokens in a first memory element,
- determining the most probable part-of-speech tag and grammatical features for each token,
- identifying parts of a noun phrase by inspecting the part-of-speech tags of successive tokens, and
- iteratively checking agreement between a first identified part of the noun phrase and a second identified part of the noun phrase immediately following the first identified part in the stream of text.
Dependent claims: 12, 13.
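The iterative agreement check of claim 11 can be sketched as a pass over adjacent pairs of identified parts. The feature names (`gender`, `number`), the rule that two parts agree when every feature they both carry has the same value, and the choice to cut the phrase at the first disagreement are assumptions made for illustration; the claim itself does not say what happens when agreement fails.

```python
def agree(first: dict, second: dict) -> bool:
    """Two parts agree if every feature they both specify has the same value."""
    shared = first.keys() & second.keys()
    return all(first[name] == second[name] for name in shared)


def agreeing_prefix(parts: list) -> list:
    """Walk the parts pairwise and keep them up to the first disagreement.

    Each part is a (word, features) pair; each check compares a part with
    the part immediately following it.
    """
    if not parts:
        return []
    kept = [parts[0]]
    for first, second in zip(parts, parts[1:]):
        if not agree(first[1], second[1]):
            break
        kept.append(second)
    return kept


# Spanish example, where determiners, nouns, and adjectives are all marked.
good = [
    ("las", {"gender": "f", "number": "pl"}),
    ("casas", {"gender": "f", "number": "pl"}),
    ("blancas", {"gender": "f", "number": "pl"}),
]
bad = good[:2] + [("blanco", {"gender": "m", "number": "sg"})]

print([w for w, _ in agreeing_prefix(good)])  # ['las', 'casas', 'blancas']
print([w for w, _ in agreeing_prefix(bad)])   # ['las', 'casas']
```

One limitation of comparing only adjacent parts: a token that carries no features agrees with both neighbours, so a mismatch on either side of it goes unnoticed in this sketch.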
14. A data processing method for identifying noun phrases in a stream of words, the method comprising the steps of:
- extracting a sequence of tokens from the stream,
- storing the sequence of tokens in a first memory element,
- determining at least one part-of-speech tag for each token,
- disambiguating the at least one part-of-speech tag of an ambiguous token by inspecting the part-of-speech tags of a window of sequential tokens containing the ambiguous token, and
- identifying parts of a noun phrase by inspecting the part-of-speech tags of successive tokens.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 36, 37.
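The disambiguation step of claim 14 can be sketched as follows. Each token starts with a list of candidate tags, and an ambiguous token is resolved by looking at its neighbours. The three context rules and the fallback to the first listed tag are hypothetical; the claim only requires that a window of sequential tokens containing the ambiguous token be inspected.

```python
def disambiguate(candidates: list) -> list:
    """Resolve each token's candidate tag list to a single tag.

    `candidates` holds one list of possible tags per token, in stream order.
    """
    resolved = []
    for i, tags in enumerate(candidates):
        if len(tags) == 1:
            resolved.append(tags[0])
            continue
        # Window: the already resolved previous tag and the next token's candidates.
        prev_tag = resolved[i - 1] if i > 0 else None
        next_tags = candidates[i + 1] if i + 1 < len(candidates) else []
        choice = tags[0]  # fallback (assumption): first tag is the most likely one
        if prev_tag in ("DET", "ADJ") and "NOUN" in tags:
            choice = "NOUN"  # "the watch"
        elif prev_tag in ("NOUN", "PRON") and "VERB" in tags:
            choice = "VERB"  # "dogs watch"
        elif next_tags == ["DET"] and "VERB" in tags:
            choice = "VERB"  # "watch the ..."
        resolved.append(choice)
    return resolved


# "the dogs watch the watch"
sentence = [["DET"], ["NOUN"], ["NOUN", "VERB"], ["DET"], ["NOUN", "VERB"]]
print(disambiguate(sentence))  # ['DET', 'NOUN', 'VERB', 'DET', 'NOUN']
```

The same word "watch" receives two different tags here because its neighbours differ, which is the point of a window-based approach.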
23. An apparatus for identifying noun phrases contained in a stream of words, the apparatus comprising:
- tokenizing means for extracting a sequence of digital signals representative of a sequence of tokens contained in the stream,
- first addressable memory means containing a list of lexical expressions with each lexical expression being associated with a part-of-speech tag and grammatical features,
- data processing means coupled with the tokenizing means and with the first addressable memory means, the data processing means, including;
- means for determining a part-of-speech tag and grammatical features for each token by identifying in the first addressable memory means at least one lexical expression representative of each token, and
- means for identifying parts of a noun phrase by inspecting the part-of-speech tags of a first window of tokens, and
- means for generating an output signal representative of the tokens forming the identified noun phrase.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 38, 39.
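The elements of claim 23 can be mapped loosely onto a small software sketch: a tokenizer, a lexicon standing in for the first addressable memory, a lookup step, and an output step that emits the tokens of each identified noun phrase. Everything here is an assumption for illustration, including the toy `LEXICON` contents, the tag names, and the grouping rule (a run of determiners, adjectives, and nouns that ends in a noun).

```python
import re

# Stand-in for the "first addressable memory": lexical expression -> (tag, features).
LEXICON = {
    "the": ("DET", {}),
    "quick": ("ADJ", {}),
    "lazy": ("ADJ", {}),
    "fox": ("NOUN", {"number": "sg"}),
    "dogs": ("NOUN", {"number": "pl"}),
    "jumps": ("VERB", {"number": "sg"}),
    "over": ("ADP", {}),
}


def tokenize(stream: str) -> list:
    """Tokenizing step: lower-cased alphabetic words only (a simplification)."""
    return re.findall(r"[a-z]+", stream.lower())


def lookup(token: str) -> tuple:
    """Find the lexical expression for a token; unknown words get tag 'UNK'."""
    return LEXICON.get(token, ("UNK", {}))


def noun_phrases(stream: str) -> list:
    """Output step: return each identified noun phrase as a string."""
    phrases, run = [], []

    def flush():
        # Drop trailing determiners/adjectives so the phrase ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(word for word, _ in run))
        run.clear()

    for token in tokenize(stream):
        tag, _features = lookup(token)
        if tag in ("DET", "ADJ", "NOUN"):
            # A new determiner after a noun starts a new phrase.
            if tag == "DET" and any(t == "NOUN" for _, t in run):
                flush()
            run.append((token, tag))
        else:
            flush()
    flush()
    return phrases


print(noun_phrases("The quick fox jumps over the lazy dogs."))
# ['the quick fox', 'the lazy dogs']
```

A dictionary keyed by the surface form keeps the lookup step trivial; a real lexicon would also need to handle inflected forms and words with several candidate tags.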
Specification