Method for natural language data processing using morphological and part-of-speech information
Abstract
An enhancement and retrieval method for natural language data using a computer is disclosed. The method includes executing linguistic analysis upon a text corpus file to derive morphological and part-of-speech information, as well as lexical variants, corresponding to respective corpus words. The derived linguistic information is then used to construct an enhanced text corpus file. A query text file is linguistically analyzed to construct a plurality of trigger token morphemes, which are then used to construct a search mask stream that is correlated with the enhanced text corpus file. A match between the search mask stream and the enhanced text corpus file allows a user to retrieve selected portions of the enhanced text corpus file.
26 Claims
1. A method for constructing an enhanced text corpus file using a computer and comprising the steps of:
providing a text corpus file to said computer, said text corpus file comprising respective electrical signals representative of a predetermined natural language data;
processing said electrical signals to parse said text corpus file into a plurality of sentences each constituted of a respective stream of corpus words;
executing linguistic analysis upon each said stream of corpus words to derive respective part-of-speech information and morphological roots corresponding to respective ones of said corpus words; and
generating an enhanced text corpus file using said derived morphological roots and said derived part-of-speech information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
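The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the toy lexicon `TOY_LEXICON`, the function names, the `UNK` fallback tag, and the `word/root/POS` token layout of the enhanced file are all assumptions made for illustration, since the claim does not specify how the analysis is carried out or how its results are stored.

```python
import re

# Hypothetical toy lexicon: surface form -> (morphological root, part-of-speech tag).
# A real system would use a full morphological analyzer and tagger.
TOY_LEXICON = {
    "the": ("the", "DET"),
    "dogs": ("dog", "NOUN"),
    "barked": ("bark", "VERB"),
    "cat": ("cat", "NOUN"),
    "ran": ("run", "VERB"),
}


def split_sentences(text):
    """Parse raw corpus text into sentences, each a list of lowercase words."""
    sentences = []
    for chunk in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z]+", chunk.lower())
        if words:
            sentences.append(words)
    return sentences


def analyze(word):
    """Return (root, pos); unknown words map to themselves with the tag 'UNK'."""
    return TOY_LEXICON.get(word, (word, "UNK"))


def build_enhanced_corpus(text):
    """Return one enhanced line per sentence, each token written as word/root/POS."""
    lines = []
    for words in split_sentences(text):
        tokens = []
        for word in words:
            root, pos = analyze(word)
            tokens.append(f"{word}/{root}/{pos}")
        lines.append(" ".join(tokens))
    return lines
```

Calling `build_enhanced_corpus("The dogs barked. The cat ran!")` yields two lines, one per sentence, in which every corpus word carries its root and tag alongside the surface form.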
10. A method for constructing trigger token morphemes using a computer and comprising the steps of:
providing a query text file to said computer, said query text file comprising respective electrical signals representative of predetermined inquiry data;
processing the electrical signals representative of said predetermined inquiry data to parse said query text file into respective query items each constituted of a respective stream of query words;
executing morphological analysis upon each said stream of query words to derive respective morphological roots corresponding to respective ones of said query words;
executing semantic analysis upon each said stream of query words to generate respective lexical variants corresponding to respective ones of said query words; and
generating a plurality of trigger token morphemes corresponding to respective ones of said query items, said plurality of trigger token morphemes using said derived morphological roots and said derived lexical variants corresponding to respective ones of said query words.
Dependent claims: 11, 12
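A minimal sketch of the query-side steps in claim 10 follows. The tables `TOY_ROOTS` and `TOY_VARIANTS`, the one-query-item-per-line convention, and the choice to represent each trigger token morpheme as a set holding a root plus its lexical variants are assumptions for illustration only; the claim itself does not fix any of these details.

```python
# Hypothetical toy tables standing in for morphological and semantic analysis.
TOY_ROOTS = {"buying": "buy", "bought": "buy", "cars": "car"}
TOY_VARIANTS = {"buy": {"purchase", "acquire"}, "car": {"automobile"}}


def parse_query(query_text):
    """Split query text into query items (one per non-empty line) of lowercase words."""
    return [line.lower().split() for line in query_text.splitlines() if line.strip()]


def trigger_token_morphemes(query_text):
    """Return, per query item, a list of sets: each set is a root plus its variants."""
    items = []
    for words in parse_query(query_text):
        tokens = []
        for word in words:
            root = TOY_ROOTS.get(word, word)         # morphological analysis
            variants = TOY_VARIANTS.get(root, set())  # semantic analysis
            tokens.append({root} | variants)
        items.append(tokens)
    return items
```

For the query text `"buying cars"`, this produces a single query item whose two trigger tokens are the set containing `buy`, `purchase`, and `acquire` and the set containing `car` and `automobile`.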
13. A method for retrieving selected portions from an enhanced text corpus file using a computer and comprising the steps of:
generating a search mask stream based upon a plurality of predetermined trigger token morphemes;
scanning said enhanced text corpus file; and
correlating said search mask stream with respect to said enhanced text corpus file for retrieving a selected portion of said enhanced text corpus file based upon a match between said search mask stream and said enhanced text corpus file.
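One possible reading of the retrieval steps in claim 13 is sketched below. The claim does not define the structure of the search mask stream or the correlation rule, so everything here is an assumption: the mask is modeled as an ordered list of sets of acceptable roots, each enhanced line is assumed to hold `word/root/POS` tokens, and a "match" is taken to mean that the mask positions are satisfied in order (not necessarily adjacently) by the roots of a sentence.

```python
def roots_of(enhanced_line):
    """Extract the root field from each word/root/POS token of an enhanced line."""
    return [token.split("/")[1] for token in enhanced_line.split()]


def build_search_mask(trigger_tokens):
    """Turn trigger token morphemes into an ordered mask of acceptable-root sets."""
    return [frozenset(token) for token in trigger_tokens]


def correlates(mask, roots):
    """True if every mask position is satisfied, in order, by some root in the sentence."""
    position = 0
    for root in roots:
        if position < len(mask) and root in mask[position]:
            position += 1
    return len(mask) > 0 and position == len(mask)


def retrieve(enhanced_lines, trigger_tokens):
    """Scan the enhanced corpus and return the lines that match the search mask."""
    mask = build_search_mask(trigger_tokens)
    return [line for line in enhanced_lines if correlates(mask, roots_of(line))]
```

Because the comparison runs on roots and variants, a query built from "buying cars" can retrieve a sentence that says "purchased an automobile", which a plain string search on the surface words would miss.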
14. An enhancement and retrieval method for natural language data using a computer and comprising the steps of:
providing a text corpus file to said computer, said text corpus file comprising respective electrical signals representative of said natural language data;
processing said electrical signals to parse said text corpus file into a plurality of sentences each constituted of a respective stream of corpus words;
executing linguistic analysis upon each said stream of corpus words to derive respective part-of-speech information and morphological roots corresponding to respective ones of said corpus words; and
generating an enhanced text corpus file using said derived morphological roots and said derived part-of-speech information.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
Specification