Named entity recognition using compiler methods
Abstract
Methods of identifying named entities in natural language text using machine or computer compiler tools are provided. A lexical analyzer generator such as Flex or Lex or an equivalent tool can be used to generate a recognizer for named entities, such as digits, date expressions, and email or web addresses. Alternatively, a parser generator, such as Yacc or Bison or an equivalent tool can be used to generate a recognizer for other named entities, such as person and company names. Further, a lexical analyzer generated by Flex, Lex, or its equivalent is used in combination with a parser generated by Yacc, Bison, or its equivalent to identify named entities. Multiple lexical analyzers and/or parsers identify one or more classes of named entities, such as email addresses or person names. In many embodiments, recognized named entities can be used to construct at least one index of web pages or documents including named entities that can be accessed by a natural language processing application.
30 Claims
1. A method of identifying named entities in natural language text comprising the steps of:
- receiving natural language text;
- specifying regular expression rules corresponding to patterns of named entities in the natural language text;
- applying the regular expression rules to the natural language text using a lexical analyzer generated by a lexical analyzer generator to identify named entities in the natural language text.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 28, 29.

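The steps of claim 1 can be illustrated with a minimal sketch. It is written in Python rather than as a Flex or Lex specification, so the `re` module stands in for a generated lexical analyzer. The rule table, the entity labels, and the function name `recognize` are assumptions made for illustration and do not come from the patent. The scanner keeps the longest match at each position and breaks ties by rule order, which is the convention Lex and Flex follow.

```python
import re

# Hypothetical rule table: (entity label, regular expression).
# The patterns are deliberately simplified and are not taken from the patent.
RULES = [
    ("EMAIL", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"),
    ("URL", r"https?://[^\s<>\"]+"),
    ("DATE", r"\d{1,2}/\d{1,2}/\d{4}"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
]
COMPILED = [(label, re.compile(pattern)) for label, pattern in RULES]


def recognize(text):
    """Scan left to right, keeping the longest matching rule at each position.

    Ties are broken in favor of the rule listed first.
    Returns a list of (label, matched string, start offset, end offset).
    """
    entities = []
    pos = 0
    while pos < len(text):
        best = None  # (label, match object)
        for label, regex in COMPILED:
            m = regex.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (label, m)
        if best is None:
            pos += 1  # no rule applies here; skip one character
        else:
            label, m = best
            entities.append((label, m.group(), m.start(), m.end()))
            pos = m.end()
    return entities


if __name__ == "__main__":
    sample = "Mail jane.doe@example.com by 12/31/2005 or see http://example.org/a for 42 items."
    for entity in recognize(sample):
        print(entity)
```

A generated lexical analyzer would compile all rules into a single automaton instead of trying each pattern in turn; the loop above trades that efficiency for brevity.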
9. A method of recognizing named entities in natural language text comprising the steps of:
- receiving natural language text;
- specifying possible named entities using grammar rules; and
- processing the natural language text using a parser generated by a parser generator to identify the possible named entities based on the set of grammar rules.
Dependent claims: 10, 11, 12, 13.

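As a rough illustration of claim 9, the sketch below applies a small set of grammar rules to a token stream. It uses a hand-written backtracking top-down recognizer in Python, not the LALR parser that Yacc or Bison would generate. The grammar, the token types, the title and suffix lists, and the function names are all hypothetical.

```python
import re

# Hypothetical grammar, not taken from the patent. Lowercase names are
# nonterminals; uppercase names are token types produced by token_type().
GRAMMAR = {
    "person": [["TITLE", "name"]],
    "name": [["CAP", "INITIAL", "CAP"], ["CAP", "CAP"], ["CAP"]],
    "company": [["capseq", "SUFFIX"]],
    "capseq": [["CAP", "capseq"], ["CAP"]],
}
START_SYMBOLS = ["person", "company"]
TITLES = {"Mr.", "Ms.", "Mrs.", "Dr.", "Prof."}
SUFFIXES = {"Inc.", "Corp.", "Ltd.", "LLC"}


def token_type(word):
    """Assign a coarse token type to one whitespace-delimited word."""
    if word in TITLES:
        return "TITLE"
    if word in SUFFIXES:
        return "SUFFIX"
    if re.fullmatch(r"[A-Z]\.", word):
        return "INITIAL"
    if re.fullmatch(r"[A-Z][a-z]+", word):
        return "CAP"
    return "OTHER"


def ends(symbol, types, i):
    """Return every position at which a derivation of `symbol` starting at i can end."""
    if symbol not in GRAMMAR:  # terminal: must equal the token type at i
        return {i + 1} if i < len(types) and types[i] == symbol else set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        positions = {i}
        for part in rhs:
            positions = {e for p in positions for e in ends(part, types, p)}
        result |= positions
    return result


def parse_entities(text):
    """Return (start symbol, surface string) for the longest parse at each position."""
    words = text.split()
    types = [token_type(w) for w in words]
    found, i = [], 0
    while i < len(words):
        best = None  # (symbol, end position)
        for symbol in START_SYMBOLS:
            for end in ends(symbol, types, i):
                if best is None or end > best[1]:
                    best = (symbol, end)
        if best is None:
            i += 1
        else:
            found.append((best[0], " ".join(words[i:best[1]])))
            i = best[1]
    return found


if __name__ == "__main__":
    print(parse_entities("Dr. Jane Q. Smith joined Acme Widgets Inc. in May"))
```

The tokenizer here splits on whitespace only, so a word with trailing sentence punctuation (for example `Smith.`) falls into the `OTHER` type; a real front end would normalize punctuation first.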
14. A method of recognizing named entities from natural language input text comprising the steps of:
- receiving natural language text;
- accessing at least one named entity lexicon using a parser generated by a parser generator designed to parse computer programs; and
- identifying named entities based on look up in at least one named entity lexicon.

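The lookup step of claim 14 might look something like the following Python sketch. It shows only the lexicon lookup, not the generated parser that the claim uses to access the lexicon. The lexicon entries, the labels, and the function name `lookup_entities` are invented for illustration. Keys are lowercased, so matching ignores case, and the longest entry wins when several entries start at the same word.

```python
# Hypothetical named entity lexicon; the entries are illustrative only.
# Keys are tuples of lowercased words so that multiword names can be stored.
LEXICON = {
    ("new", "york"): "LOCATION",
    ("new", "york", "times"): "ORGANIZATION",
    ("seattle",): "LOCATION",
}
MAX_LEN = max(len(key) for key in LEXICON)
PUNCTUATION = ".,;:!?"


def lookup_entities(text):
    """Return (label, surface string) pairs found by longest-match lexicon lookup."""
    words = text.split()
    keys = [w.strip(PUNCTUATION).lower() for w in words]
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            label = LEXICON.get(tuple(keys[i:i + n]))
            if label:
                surface = " ".join(words[i:i + n]).strip(PUNCTUATION)
                found.append((label, surface))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this word
    return found


if __name__ == "__main__":
    print(lookup_entities("She left Seattle for the New York Times, not New York."))
```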
15. A method of identifying named entities in natural language text comprising the steps of:
- receiving natural language text;
- applying regular expression rules to the natural language text to generate annotations corresponding to named entities or named entity constituent strings; and
- applying grammar rules to the annotations to identify named entities in the natural language text.
Dependent claim: 16.

17. A computer readable medium including computer executable instructions performing the steps of:
- receiving text in a natural language;
- generating annotations using at least one lexical analyzer applying a set of regular expression rules, each annotation corresponding to a named entity or constituent character string of a named entity; and
- generating annotations using at least one parser applying a set of grammar rules, each annotation corresponding to a named entity.
Dependent claims: 18, 19, 20, 21, 22, 23, 24.

25. A method of generating a web/document index comprising the steps of:
- using a named entity recognizer generated from a tool used to parse computer programs to identify named entities in web pages/documents; and
- constructing a web/document index of web pages/documents based in part on the named entities identified within the web pages/documents.

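One way to picture the index construction of claim 25 is an inverted index keyed by named entity. The Python sketch below is an assumption about what such an index could look like, not the structure the patent describes. The recognizer is passed in as a parameter; `email_recognizer` is a hypothetical stand-in built on a single simplified regular expression rather than a recognizer generated by a compiler tool, and the document ids are made up.

```python
import re
from collections import defaultdict

# Simplified, hypothetical pattern used only to give the index something to find.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def email_recognizer(text):
    """Stand-in recognizer: returns (label, surface string) pairs."""
    return [("EMAIL", m.group()) for m in EMAIL.finditer(text)]


def build_index(documents, recognizer):
    """Map each (label, lowercased entity) to the sorted ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for label, surface in recognizer(text):
            postings[(label, surface.lower())].add(doc_id)
    return {key: sorted(ids) for key, ids in postings.items()}


if __name__ == "__main__":
    docs = {
        "a.html": "Contact bob@example.com today.",
        "b.html": "Write to Bob@Example.com or amy@example.org.",
    }
    for key, doc_ids in build_index(docs, email_recognizer).items():
        print(key, doc_ids)
```

Keeping the recognizer as a parameter lets any function that returns (label, string) pairs feed the same index, which fits the claim's "based in part on the named entities" wording without fixing how the entities are found.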
26. A computer readable medium having stored thereon computer readable instructions which, when read by the computer cause the computer to perform steps of:
- receiving a natural language input through an application programming interface (API);
- providing the natural language input to one or more natural language processing (NLP) components, including a named entity recognizer to perform named entity analysis operations on the natural language input using a compiler tool designed to parse computer programs, the named entity analysis operations selected from a plurality of different possible NLP analysis operations selectable through the API; and
- returning analysis results from the named entity operations through the API.

27. A computer readable medium including computer executable instructions performing the steps of:
- receiving natural language text;
- processing the natural language text using a lexical analyzer and a parser to generate named entity annotated text, wherein the lexical analyzer and the parser are generated from tools used to parse computer programs; and
- processing the named entity annotated text using a full parser to generate fully parsed text.
Dependent claim: 30.

Specification