Creating a document index from a flex- and Yacc-generated named entity recognizer
Abstract
Methods of constructing a document index including named entity information generated by at least one tool associated with parsing computer programs are presented. The methods include using a lexical analyzer generator, e.g. Flex, and/or a parser generator, e.g. Yacc, to generate named entity recognizers. The named entity recognizers are used to identify named entities in documents, in particular, very large document sets such as web pages available on the Internet. The identified named entities are stored as named entity annotations in the document index. Also, methods of performing searches using the document index are presented. The searches are performed based on queries that can be received on an application programming interface (API). Relevant documents are obtained using the named entity annotations, which can be returned across the API. Also presented are associated computer readable media.
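As a rough illustration of the recognizer idea in the abstract, the sketch below imitates a Flex-style rule table in Python, pairing each regular expression with a named entity class. It is not generated by Flex or Yacc: Python's `re` module stands in for the generated scanner, and the rule names (`DATE`, `MONEY`, `EMAIL`), the patterns, and the `recognize` function are all assumptions made for this example. One behavioral difference is worth noting: Flex picks the longest match among competing rules, whereas Python alternation takes the first alternative that matches.

```python
import re

# Hypothetical rule table in the spirit of a Flex specification:
# each rule pairs an entity class name with a regular expression.
RULES = [
    ("DATE", r"\d{4}-\d{2}-\d{2}"),
    ("MONEY", r"\$\d+(?:\.\d{2})?"),
    ("EMAIL", r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
]

# Combine the rules into one pattern with a named group per rule;
# the name of the group that matched gives the entity class.
SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def recognize(text):
    """Return (entity_class, surface_text, start_offset) for each entity found."""
    return [(m.lastgroup, m.group(), m.start()) for m in SCANNER.finditer(text)]
```

Calling `recognize` on a sentence containing a date, a dollar amount, and an e-mail address yields one tuple per entity, in document order. These tuples are the kind of named entity annotation that an index could then store.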
30 Claims
1. A method of generating a web/document index comprising the steps of:
using a named entity recognizer generated from a tool used to parse computer programs to identify named entities in web pages/documents; and
constructing a web/document index of web pages/documents based in part on the named entities identified by the tool.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer readable medium having stored thereon computer readable instructions which, when read by the computer, cause the computer to generate a document index by performing steps of:
receiving text documents;
identifying named entities in the text documents using a tool used to parse computer programs;
generating named entity annotations corresponding with the identified named entities; and
storing the generated named entity annotations in a database.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
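The four steps of claim 9 (receive documents, identify entities, generate annotations, store them) can be sketched roughly as follows. This is only an illustration under stated assumptions: a single date regex stands in for a recognizer generated by a parsing tool, SQLite is an arbitrary choice of database, and the table layout and the `build_annotation_db` name are hypothetical.

```python
import re
import sqlite3

# Hypothetical stand-in recognizer: one regex for ISO-style dates.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_annotation_db(documents):
    """Annotate documents (dict of doc_id -> text) and store the results.

    Returns an in-memory SQLite connection holding one row per annotation.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE annotations ("
        "doc_id TEXT, entity_class TEXT, surface TEXT, "
        "start_offset INTEGER, end_offset INTEGER)"
    )
    for doc_id, text in documents.items():
        # Each match becomes one annotation row with its character span.
        rows = [
            (doc_id, "DATE", m.group(), m.start(), m.end())
            for m in DATE.finditer(text)
        ]
        db.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
    return db
```

Storing character offsets alongside the entity class lets a later search step highlight or extract the matched text without rescanning the document.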
20. A method of performing document searches comprising the steps of:
constructing a document index with named entity annotations generated at least in part from a tool used for parsing computer programs;
receiving a query comprising at least one named entity class;
searching the document index for the at least one named entity class; and
obtaining relevant documents.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
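The search flow of claim 20 can be pictured as a lookup in an inverted index keyed by named entity class. The sketch below assumes annotations have already been produced, and it treats a multi-class query as a conjunction. The function names, the class labels (`PERSON`, `DATE`, `LOCATION`), and the ranking-free result list are assumptions made for illustration, not details taken from the claims.

```python
from collections import defaultdict


def build_index(annotated_docs):
    """Map each entity class to the set of document ids that contain it.

    annotated_docs: dict of doc_id -> list of (entity_class, surface) pairs.
    """
    index = defaultdict(set)
    for doc_id, annotations in annotated_docs.items():
        for entity_class, _surface in annotations:
            index[entity_class].add(doc_id)
    return index


def search(index, entity_classes):
    """Return sorted ids of documents containing every requested entity class."""
    postings = [index.get(c, set()) for c in entity_classes]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

A query for a single class returns every document annotated with that class. Adding more classes narrows the result to documents that contain all of them.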
Specification