Sector content mining system using a modular knowledge base
Abstract
A content mining system and process utilizes a combination of term recognition and rules-based activity-event classification, performed using a modular database that defines one or more vertical markets or information sectors, to identify sector relevant evidence. The primary elements of the identified evidence are scored in a manner that rates the relevance of a content item with respect to a set of identified nominative entities, a set of activity-based event categories, further associated as sets of entity-event pairs. A database constructed of the scored information provides a relevancy indexed repository of the original unstructured content items.
109 Citations
11 Claims
1. A sequential textual analysis system operative to identify in a document a set of named entities and correspondingly associated events, said sequential textual analysis process comprising:
a) a named entity extraction component operative to identify names in a document, said named entity extraction component being further operative to associate each identified name with a name class identifier of a set of name class identifiers;
b) a text classification component operative to analyze said document to identify event identifiers, representative of selected content of said document, having predetermined associations with said set of name class identifiers, said text classification component producing a set of entity-event pairs;
c) a logic component operative to resolve ambiguous name class identifiers relative to said set of entity-event pairs, said logic component including a knowledge base of known names and names variants, said logic component producing a resolved set of entity-event pairs; and
d) a scoring component operative to derive a numeric score for each entity-event pair in said resolved set of entity-event pairs.
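The four components recited in claim 1 can be pictured as a short sequential pipeline. The following is only a minimal sketch under stated assumptions, not the patented implementation: the knowledge base contents, the event rules, the capitalised-word name pattern, the sentence-level pairing, and the count-based score are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical knowledge base: lower-cased name variant -> (canonical name, name class).
KNOWLEDGE_BASE = {
    "acme": ("Acme Corp", "COMPANY"),
    "acme corp": ("Acme Corp", "COMPANY"),
}

# Hypothetical event rules: event identifier -> (trigger words, permitted name classes).
EVENT_RULES = {
    "ACQUISITION": ({"acquired", "acquires"}, {"COMPANY"}),
    "APPOINTMENT": ({"appointed"}, {"PERSON"}),
}

# Assumed name pattern: runs of capitalised words.
NAME_RE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def extract_names(sentence):
    """(a) Identify names and attach a provisional name class identifier."""
    return [
        (name, "COMPANY" if name.endswith(("Corp", "Inc")) else "AMBIGUOUS")
        for name in NAME_RE.findall(sentence)
    ]


def classify_events(sentence, names):
    """(b) Pair each triggered event with the class-compatible names."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return [
        (name, cls, event)
        for event, (triggers, classes) in EVENT_RULES.items()
        if words & triggers
        for name, cls in names
        if cls in classes or cls == "AMBIGUOUS"
    ]


def resolve(pairs):
    """(c) Resolve ambiguous classes via the knowledge base, else via the event."""
    resolved = []
    for name, cls, event in pairs:
        canonical, cls = KNOWLEDGE_BASE.get(name.lower(), (name, cls))
        allowed = EVENT_RULES[event][1]
        if cls == "AMBIGUOUS" and len(allowed) == 1:
            cls = next(iter(allowed))
        if cls in allowed:
            resolved.append((canonical, cls, event))
    return resolved


def score(resolved):
    """(d) Numeric score: number of co-occurrences supporting each pair."""
    return Counter(resolved)


def analyze(document):
    """Run components (a) to (d) in sequence, one sentence at a time."""
    resolved = []
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        names = extract_names(sentence)
        resolved.extend(resolve(classify_events(sentence, names)))
    return score(resolved)


scores = analyze(
    "Acme acquired Widget Works. Jane Doe was appointed chief executive. "
    "Acme Corp acquires rivals often."
)
```

In this sketch the variants "Acme" and "Acme Corp" collapse to one canonical entity through the knowledge base, while names absent from it inherit a class only when the paired event permits exactly one class.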
2. A method of analyzing natural language text to identify events or actions associated with specific named entities.
3. A method of determining relevance of a textual content item to entity-event pairs based on scoring the textual evidence for entities and events found in this analysis.
4. A method of automatic content mining to produce vertical market defined sector knowledge data, said method comprising the steps of:
a) receiving unstructured content documents from a plurality of sources;
b) first processing said unstructured content documents to perform term recognition to produce knowledge records including identifications of the nominative terms, predetermined characteristic of a predetermined vertical market sector, that occur in said unstructured content documents;
c) second processing said unstructured content documents and said knowledge records to perform event classification that identifies activity events correlated to said identifications of said nominative terms, wherein said event classification is operative from a predetermined rule set characteristic of said predetermined vertical market sector, wherein the results of said second processing step is stored in said knowledge records; and
d) third processing said knowledge records to score the correlated occurrences of said nominative terms and said activity events with respect to predetermined documents of said unstructured content documents, wherein the results of said third processing step is stored in a database index accessible for the reporting of market defined sector knowledge data.
Dependent claims: 5, 6, 7, 8, 9, 10
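Step d) of claim 4, scoring correlated term and event occurrences per document and storing the result in a database index, might look roughly like the sketch below. It assumes knowledge records have already been reduced to (term, event) pairs per document; the table layout, the `build_index` and `report` helper names, and the relative-frequency score are hypothetical and are not taken from the specification.

```python
import sqlite3
from collections import Counter


def build_index(knowledge_records):
    """Score term-event co-occurrences per document and store them in an indexed table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE evidence (doc_id TEXT, term TEXT, event TEXT, score REAL)")
    db.execute("CREATE INDEX by_pair ON evidence (term, event)")
    for doc_id, pairs in knowledge_records.items():
        counts = Counter(pairs)
        total = sum(counts.values())
        # Assumed score: share of the document's evidence taken up by this pair.
        db.executemany(
            "INSERT INTO evidence VALUES (?, ?, ?, ?)",
            [(doc_id, term, event, n / total) for (term, event), n in counts.items()],
        )
    return db


def report(db, term, event):
    """Return documents ranked by relevance to one term-event pair."""
    rows = db.execute(
        "SELECT doc_id, score FROM evidence "
        "WHERE term = ? AND event = ? ORDER BY score DESC",
        (term, event),
    )
    return rows.fetchall()


# Hypothetical knowledge records for two documents.
records = {
    "doc-1": [("Acme Corp", "ACQUISITION"), ("Acme Corp", "ACQUISITION"),
              ("Jane Doe", "APPOINTMENT")],
    "doc-2": [("Acme Corp", "ACQUISITION"), ("Jane Doe", "APPOINTMENT")],
}
db = build_index(records)
ranking = report(db, "Acme Corp", "ACQUISITION")
```

Normalising by each document's total evidence is one simple way to make scores comparable across documents of different lengths; other weightings would fit the claim language equally well.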
11. A knowledge mining system configurable to exclusively address a defined vertical market, said knowledge mining system comprising:
a) a distributable knowledge base including an authority file and an event category rule set, wherein said authority file includes predetermined direct and indirect identifications of nominative entities specific to a predefined vertical market and wherein said event category rule set provides query rules configured to identify predetermined activity-based events specifically related to said nominative entities;
b) a term recognition module, coupled to said distributable knowledge base, operable to produce respective evidence records identifying the occurrence and locations of nominative terms within predetermined unstructured content documents, for each of a sequence of documents provided from a document collection;
c) an event classification module, coupled to said distributable knowledge base, operable to modify respective evidence records identifying the occurrence and location of activity-based events within said predetermined unstructured content documents, for each of said sequence of documents;
d) an event resolution module, coupled to said distributable knowledge base, operable to modify respective evidence records to identify and resolve correlations of activity-based events with respect to nominative terms within said predetermined unstructured content documents, for each of said sequence of documents;
e) a scoring module operable over respective said evidence records to define relative occurrence significance scores based on the resolved correlations of nominative terms and activity-based events within said predetermined unstructured content documents, for each of said sequence of documents; and
f) a database providing for the storage of representations of said predetermined unstructured content documents and an index representative of said evidence records.
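The authority file and evidence records of claim 11 lend themselves to a small data-structure illustration. The sketch below is an assumption-laden reading of elements a) and b) only: the `AUTHORITY_FILE` layout, the `EvidenceRecord` fields, and the longest-phrase-first matching rule are hypothetical, chosen to show how direct and indirect identifications could be recorded with their locations.

```python
import re
from dataclasses import dataclass, field

# Hypothetical authority file: canonical entity -> direct and indirect identifications.
AUTHORITY_FILE = {
    "Acme Corp": {
        "direct": ["Acme Corp", "Acme"],
        "indirect": ["the widget maker"],
    },
}


@dataclass
class EvidenceRecord:
    doc_id: str
    terms: list = field(default_factory=list)   # (entity, kind, start offset)
    events: list = field(default_factory=list)  # left for later modules to fill in


def recognize_terms(doc_id, text):
    """Record where each authority-file identification occurs in the text."""
    phrases = [
        (phrase, kind, entity)
        for entity, forms in AUTHORITY_FILE.items()
        for kind, group in forms.items()
        for phrase in group
    ]
    # Try longer phrases first so "Acme Corp" wins over "Acme" at the same offset.
    phrases.sort(key=lambda item: len(item[0]), reverse=True)

    record = EvidenceRecord(doc_id)
    seen_starts = set()
    for phrase, kind, entity in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            if match.start() not in seen_starts:
                seen_starts.add(match.start())
                record.terms.append((entity, kind, match.start()))
    record.terms.sort(key=lambda term: term[2])
    return record


text = (
    "Acme Corp reported earnings. Analysts say the widget maker will expand; "
    "Acme declined to comment."
)
record = recognize_terms("doc-1", text)
```

The later modules of the claim (event classification, event resolution, scoring) would then modify the same record, which is why `events` is left empty here.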
Specification