Sector content mining system using a modular knowledge base

US 20050131935A1
Filed: 11/18/2004
Published: 06/16/2005
Est. Priority Date: 11/18/2003
Status: Abandoned Application

Abstract

A content mining system and process utilizes a combination of term recognition and rules-based activity-event classification, performed using a modular database that defines one or more vertical markets or information sectors, to identify sector relevant evidence. The primary elements of the identified evidence are scored in a manner that rates the relevance of a content item with respect to a set of identified nominative entities, a set of activity-based event categories, further associated as sets of entity-event pairs. A database constructed of the scored information provides a relevancy indexed repository of the original unstructured content items.

109 Citations

11 Claims

1. A sequential textual analysis system operative to identify in a document a set of named entities and correspondingly associated events, said sequential textual analysis system comprising:
- a) a named entity extraction component operative to identify names in a document, said named entity extraction component being further operative to associate each identified name with a name class identifier of a set of name class identifiers;
- b) a text classification component operative to analyze said document to identify event identifiers, representative of selected content of said document, having predetermined associations with said set of name class identifiers, said text classification component producing a set of entity-event pairs;
- c) a logic component operative to resolve ambiguous name class identifiers relative to said set of entity-event pairs, said logic component including a knowledge base of known names and name variants, said logic component producing a resolved set of entity-event pairs; and
- d) a scoring component operative to derive a numeric score for each entity-event pair in said resolved set of entity-event pairs.
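
The four components of claim 1 can be read as a sequential pipeline. The following is a minimal sketch of that reading, not the applicants' implementation: the lexicon, event rules, knowledge base entries, function names, and the share-of-mentions score are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

# Hypothetical lexicon for step (a): surface form -> candidate name classes.
LEXICON = {"Acme": {"COMPANY", "PERSON"}, "Globex": {"COMPANY"}}

# Hypothetical rules for step (b): event identifier -> (trigger word, allowed name classes).
EVENT_RULES = {"ACQUISITION": ("acquire", {"COMPANY"}), "HIRING": ("hire", {"PERSON"})}

# Hypothetical knowledge base for step (c): name variant -> (canonical name, name class).
KNOWLEDGE_BASE = {"Acme": ("Acme Corp", "COMPANY"), "Globex": ("Globex Inc", "COMPANY")}


def extract_names(text):
    """(a) Find lexicon names and attach their candidate name classes."""
    tokens = text.replace(".", " ").replace(",", " ").split()
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]


def classify_events(text, names):
    """(b) Pair each name with every event whose trigger occurs and whose classes overlap."""
    lowered = text.lower()
    pairs = []
    for event, (trigger, classes) in EVENT_RULES.items():
        if trigger in lowered:
            pairs += [(name, cands, event) for name, cands in names if cands & classes]
    return pairs


def resolve(pairs):
    """(c) Settle each name's class via the knowledge base; drop pairs that no longer fit."""
    resolved = []
    for name, _cands, event in pairs:
        canonical, name_class = KNOWLEDGE_BASE[name]
        if name_class in EVENT_RULES[event][1]:
            resolved.append((canonical, event))
    return resolved


def score(resolved):
    """(d) Toy numeric score: each pair's share of all resolved pair mentions."""
    counts = Counter(resolved)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def analyze(text):
    """Run steps (a) through (d) in sequence on one document."""
    names = extract_names(text)
    return score(resolve(classify_events(text, names)))
```

In this sketch the ambiguous name "Acme" initially pairs with both a company event and a person event; the knowledge base lookup in step (c) is what discards the pairing that does not match its resolved class.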

2. A method of analyzing natural language text to identify events or actions associated with specific named entities.

3. A method of determining relevance of a textual content item to entity-event pairs based on scoring the textual evidence for entities and events found in this analysis.

4. A method of automatic content mining to produce vertical market defined sector knowledge data, said method comprising the steps of:
- a) receiving unstructured content documents from a plurality of sources;
- b) first processing said unstructured content documents to perform term recognition to produce knowledge records including identifications of the nominative terms, predetermined characteristic of a predetermined vertical market sector, that occur in said unstructured content documents;
- c) second processing said unstructured content documents and said knowledge records to perform event classification that identifies activity events correlated to said identifications of said nominative terms, wherein said event classification is operative from a predetermined rule set characteristic of said predetermined vertical market sector, wherein the results of said second processing step are stored in said knowledge records; and
- d) third processing said knowledge records to score the correlated occurrences of said nominative terms and said activity events with respect to predetermined documents of said unstructured content documents, wherein the results of said third processing step are stored in a database index accessible for the reporting of market defined sector knowledge data.

5. The method of claim 4 further comprising the step of providing, to said first processing step, access to an authority database of predetermined nominative terms, predetermined characteristic of said predetermined vertical market sector.

6. The method of claim 5 further comprising the step of providing, to said second processing step, access to an event rules database storing said predetermined rule set characteristic of said predetermined vertical market sector.

7. The method of claim 6 wherein said authority database and said event rules database comprise modules of a distributed database.

8. The method of claim 7 wherein said authority database and said event rules database consist of modular subsets of a master database, wherein said master database stores identifications of nominative terms and event classification rule sets that are comprehensive to a document collection represented by said unstructured content documents.

9. The method of claim 8 wherein said receiving, first, second, and third processing steps run autonomously and wherein said method further comprises the step of continuously filtering modifications to said database index to selectively identify reportable market defined sector knowledge data.

10. The method of claim 9 wherein said step of continuously filtering provides for the filtering of modifications to said database index against personal filter profiles, wherein market defined sector knowledge data is selectively reportable on a per-user basis.
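
The per-user filtering recited in claims 9 and 10 can be pictured as matching each modification to the database index against a set of stored profiles. The sketch below is one possible illustration under stated assumptions: the `IndexUpdate` and `FilterProfile` record shapes, their field names, and the minimum-score threshold are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexUpdate:
    """One hypothetical modification to the database index."""
    doc_id: str
    entity: str
    event: str
    score: float


@dataclass
class FilterProfile:
    """A hypothetical personal filter profile; an empty set means 'match anything'."""
    user: str
    entities: set = field(default_factory=set)
    events: set = field(default_factory=set)
    min_score: float = 0.0

    def matches(self, update):
        # An update is reportable to this user only if all three tests pass.
        return ((not self.entities or update.entity in self.entities)
                and (not self.events or update.event in self.events)
                and update.score >= self.min_score)


def route_updates(updates, profiles):
    """Yield (user, update) for every index modification that a profile selects."""
    for update in updates:
        for profile in profiles:
            if profile.matches(update):
                yield profile.user, update
```

Because `route_updates` is a generator, it can consume an open-ended stream of index modifications, which is one way to approximate the "continuously filtering" language of claim 9.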

11. A knowledge mining system configurable to exclusively address a defined vertical market, said knowledge mining system comprising:
- a) a distributable knowledge base including an authority file and an event category rule set, wherein said authority file includes predetermined direct and indirect identifications of nominative entities specific to a predefined vertical market and wherein said event category rule set provides query rules configured to identify predetermined activity-based events specifically related to said nominative entities;
- b) a term recognition module, coupled to said distributable knowledge base, operable to produce respective evidence records identifying the occurrence and locations of nominative terms within predetermined unstructured content documents, for each of a sequence of documents provided from a document collection;
- c) an event classification module, coupled to said distributable knowledge base, operable to modify respective evidence records identifying the occurrence and location of activity-based events within said predetermined unstructured content documents, for each of said sequence of documents;
- d) an event resolution module, coupled to said distributable knowledge base, operable to modify respective evidence records to identify and resolve correlations of activity-based events with respect to nominative terms within said predetermined unstructured content documents, for each of said sequence of documents;
- e) a scoring module operable over respective said evidence records to define relative occurrence significance scores based on the resolved correlations of nominative terms and activity-based events within said predetermined unstructured content documents, for each of said sequence of documents; and
- f) a database providing for the storage of representations of said predetermined unstructured content documents and an index representative of said evidence records.
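
Elements (b) through (d) of claim 11 describe successive modules that create and then modify a shared evidence record holding occurrence locations. The sketch below illustrates that idea only; scoring and storage are omitted. The `EvidenceRecord` layout, the single-trigger rule format, and the character-distance window used to correlate events with terms are assumptions made for illustration, not details taken from the application.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EvidenceRecord:
    """Hypothetical per-document evidence record, modified by each module in turn."""
    doc_id: str
    terms: dict = field(default_factory=dict)   # nominative term -> character offsets
    events: dict = field(default_factory=dict)  # event category -> character offsets
    links: list = field(default_factory=list)   # resolved (term, event category) pairs


def find_offsets(text, phrase):
    """Return the start offset of every case-insensitive occurrence of phrase."""
    return [m.start() for m in re.finditer(re.escape(phrase), text, re.IGNORECASE)]


def recognize_terms(record, text, authority_file):
    """(b) Record where each authority-file term occurs in the document."""
    for term in authority_file:
        offsets = find_offsets(text, term)
        if offsets:
            record.terms[term] = offsets


def classify_events(record, text, rule_set):
    """(c) Record where each event category's trigger phrase occurs."""
    for category, trigger in rule_set.items():
        offsets = find_offsets(text, trigger)
        if offsets:
            record.events[category] = offsets


def resolve_events(record, window=40):
    """(d) Link a term to an event when two of their occurrences lie within `window` characters."""
    for term, term_offsets in record.terms.items():
        for category, event_offsets in record.events.items():
            if any(abs(t - e) <= window for t in term_offsets for e in event_offsets):
                record.links.append((term, category))
```

Keeping offsets in the record, rather than bare counts, is what lets a later module reason about proximity without rescanning the document text.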

Current Assignee
Green Ridge Systems, Inc.
Original Assignee
Green Ridge Systems, Inc.
Inventors
Ketsdever, David T.; Hernandez, Harold; O'Leary, Paul J.; Harris, C. Lee

Application Number

US10/992,240
Publication Number

US 20050131935A1
Field of Search
US Class Current

1/1
CPC Class Codes

G06F 16/313 Selection or weighting of t...

G06F 40/295 Named entity recognition
