Systems and methods for automatically categorizing unstructured text

US 20060161423A1
Filed: 11/23/2005
Published: 07/20/2006
Est. Priority Date: 11/24/2004
Status: Active Grant

10 Assignments

0 Petitions

Abstract

Systems, methods and software products analyze messages of a message stream based upon human-generated concept recognizers. A sample set of messages, representative of messages from the message stream, is analyzed to determine interesting or useful categories. Text categorization engines are then trained using the sample set, and text classifiers are published. These text classifiers are then used to categorize further text messages from the message stream.

9 Claims

1. A method for processing unstructured text, comprising:
   - receiving a message stream, the message stream including a plurality of unstructured text messages originating from at least one homogeneous source;
   - capturing at least a subset of the text messages as an exploration set;
   - displaying the text messages of the exploration set to an analyst for review;
   - receiving at least one text category from the analyst for each displayed text message;
   - associating each text category with at least one text message within the exploration set, the associated categories and messages providing a classification model;
   - initiating an automated training process to categorize text messages based on the classification model.
2. The method of claim 1, wherein the association of a category to a text message is indicated with a visual icon.

3. The method of claim 1, wherein the analyst is presented with a plurality of lexical items, each lexical item common within the text messages of the exploration set, the analyst permitted to cross index text messages containing the lexical item, and associate at least one text category with at least one cross indexed text message, the text category identifying the lexical item.
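The exploration and labeling steps of claims 1 and 3 can be pictured with a small Python sketch. It is illustrative only: random sampling for the exploration set, regex tokenization, the document-frequency threshold `min_docs` for what counts as a "common" lexical item, and every function and class name below are assumptions rather than details taken from the patent. The automated training step of claim 1 is not shown.

```python
import random
import re
from collections import Counter, defaultdict

# Hypothetical helpers: the claims do not specify tokenization or sampling.
def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

def capture_exploration_set(stream, size, seed=0):
    """Capture a random subset of the message stream as the exploration set."""
    messages = list(stream)
    rng = random.Random(seed)
    return rng.sample(messages, min(size, len(messages)))

def common_lexical_items(messages, min_docs=2):
    """Map each lexical item to the number of messages containing it (if >= min_docs)."""
    doc_freq = Counter()
    for message in messages:
        doc_freq.update(set(tokenize(message)))
    return {term: n for term, n in doc_freq.items() if n >= min_docs}

def cross_index(messages, term):
    """Indices of the messages that contain the lexical item `term`."""
    return [i for i, message in enumerate(messages) if term in tokenize(message)]

class ClassificationModel:
    """Category -> message-index associations recorded from analyst review."""

    def __init__(self):
        self.by_category = defaultdict(set)

    def associate(self, category, message_index):
        self.by_category[category].add(message_index)

    def categories_for(self, message_index):
        return sorted(c for c, idxs in self.by_category.items() if message_index in idxs)

# Usage: the analyst cross-indexes "bill" and labels every hit as "billing".
exploration = ["my bill is wrong", "bill charged twice",
               "great service today", "service was slow"]
items = common_lexical_items(exploration)  # "bill" and "service" each occur in 2 messages
model = ClassificationModel()
for index in cross_index(exploration, "bill"):
    model.associate("billing", index)
print(model.categories_for(1))  # ['billing']
```

Keeping the model as plain category-to-message associations mirrors the claim's wording that "the associated categories and messages" form the classification model; any real training engine would consume these pairs as labeled examples.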

4. A method for processing unstructured text with selected concept identifiers, comprising:
   - receiving a message stream, the message stream including a plurality of unstructured text messages originating from at least one homogeneous source;
   - establishing from the message stream a distinct training set having at least one text message as a training document;
   - establishing from the message stream a distinct audit set having at least one text message as an audit document;
   - receiving from an analyst at least one text category, the text category common in the training set, and at least one target category;
   - reviewing each training document to determine a pseudo instance of the target category and, in response to a positive determination assigning a point value to the training document;
   - reviewing each audit document to determine a pseudo instance of the target category and, in response to a positive determination assigning a point value to the audit document;
   - concurrently displaying to the analyst a training document and an audit document;
   - requesting the auditor to indicate the presence or absence of at least one target category in each displayed text document;
   - in response to an indicated presence, recording a positive association of the text message to the target category;
   - repeating the concurrent display of a training document and an audit document until all members of both sets have been displayed;
   - comparing the recorded positive associations with assigned point values, a positive correlation permitting the pseudo instance text to be added to a text classifier;
   - utilizing the text classifier in an automated process to evaluate the message stream.
5. The method of claim 4, wherein for each target category a plurality of pseudo items is provided, each pseudo item receiving a different point value.

6. The method of claim 5, wherein the higher the point value the greater the probability of proper identification.

7. The method of claim 5, wherein for a plurality of target categories, the point values are aggregate.

8. The method of claim 5, wherein the method is stored on a computer-readable medium as a computer program, which when executed by a computer will perform the method of processing unstructured text.
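Claims 4 through 7 describe weighting pseudo items by point value and checking them against the analyst's judgments. The Python sketch below is one hedged way to picture that flow. Case-insensitive substring matching, using precision over an item's hits as a stand-in for the claim's "positive correlation", the 0.8 threshold, and all names are assumptions, not the patent's own method.

```python
from dataclasses import dataclass

# Hypothetical data model: the claims name "pseudo items" and "point values"
# but do not define how they are stored or matched.
@dataclass(frozen=True)
class PseudoItem:
    phrase: str   # text whose presence suggests the target category
    points: int   # per claim 6, a higher value signals a more reliable cue

def fires(item, doc):
    """Case-insensitive substring match (an assumed, deliberately naive rule)."""
    return item.phrase.lower() in doc.lower()

def score_document(doc, items):
    """Sum the point values of every pseudo item found in the document."""
    return sum(item.points for item in items if fires(item, doc))

def aggregate_score(doc, items_by_category):
    """One reading of claim 7: total points across several target categories."""
    return sum(score_document(doc, items) for items in items_by_category.values())

def validate_items(docs, analyst_marks, items, min_precision=0.8):
    """Keep items whose hits mostly agree with the analyst's recorded marks.

    analyst_marks[i] is True when the analyst indicated the target category
    in docs[i]. Precision over an item's hits stands in for the claim's
    "positive correlation"; the 0.8 threshold is an arbitrary assumption.
    """
    kept = []
    for item in items:
        hits = [i for i, doc in enumerate(docs) if fires(item, doc)]
        if hits and sum(analyst_marks[i] for i in hits) / len(hits) >= min_precision:
            kept.append(item)
    return kept

# Usage: pooled training and audit documents with the analyst's marks.
billing = [PseudoItem("charged twice", 3), PseudoItem("bill", 1)]
docs = ["I was charged twice this month", "my bill looks fine",
        "bill charged twice again", "love the app"]
marks = [True, False, True, False]
print([score_document(d, billing) for d in docs])  # [3, 1, 4, 0]
print(validate_items(docs, marks, billing))  # keeps only the "charged twice" item
```

Here "bill" is dropped because the analyst confirmed the category in only one of the two documents it matched, while "charged twice" agreed with every mark. Naive substring matching would also let "bill" fire inside words such as "billing", which a production matcher would need to handle.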

9. A method for processing unstructured text with selected concept identifiers, comprising:
   - receiving a message stream, the message stream including a plurality of unstructured text messages originating from at least one homogeneous source;
   - establishing from the message stream a distinct training set having at least one text message as a training document;
   - establishing from the message stream a distinct audit set having at least one text message as an audit document;
   - receiving from an analyst at least one text category, the text category common in the training set, and at least one target category;
   - concurrently displaying to the analyst a training document and an audit document;
   - requesting the auditor to indicate the presence or absence of at least one pseudo target category in each displayed text document;
   - in response to an indicated presence, recording a positive association of the text message and the pseudo target category;
   - repeating the concurrent display of a training document and an audit document until all members of both sets have been displayed;
   - reviewing the recorded positive association text documents and associated pseudo terms to establish a text classifier;
   - utilizing the text classifier in an automated process to evaluate the message stream, the text messages of the text stream evaluated by the text classifier to identify pseudo categories.
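For claim 9, where a text classifier is established from the analyst's recorded positive associations and then run over the message stream, a minimal Python sketch might look like the following. The `(document, pseudo_category, pseudo_term)` record layout, the hypothetical `min_support` threshold, and substring matching are all assumptions made for illustration.

```python
from collections import defaultdict

def build_classifier(records, min_support=2):
    """Build {category: {terms}} from analyst-recorded positive associations.

    Each record is (document, pseudo_category, pseudo_term). A term is kept
    when it actually occurs in at least `min_support` distinct confirmed
    documents; the threshold is a hypothetical parameter, not from the claim.
    """
    support = defaultdict(set)
    for doc, category, term in records:
        if term.lower() in doc.lower():
            support[(category, term.lower())].add(doc)
    classifier = defaultdict(set)
    for (category, term), docs in support.items():
        if len(docs) >= min_support:
            classifier[category].add(term)
    return dict(classifier)

def evaluate_stream(stream, classifier):
    """Yield (message, sorted pseudo categories whose terms occur in it)."""
    for message in stream:
        text = message.lower()
        found = sorted(category for category, terms in classifier.items()
                       if any(term in text for term in terms))
        yield message, found

# Usage: two confirmed "cancel" documents and one "competitor" document.
records = [("Cancel my account now", "churn", "cancel"),
           ("I want to cancel service", "churn", "cancel"),
           ("Switching to a competitor", "churn", "competitor")]
classifier = build_classifier(records)  # {'churn': {'cancel'}}
for message, categories in evaluate_stream(["please cancel it", "thanks!"], classifier):
    print(message, categories)
```

Requiring support from more than one confirmed document is a simple guard against a single analyst judgment promoting a term into the classifier; the claim itself leaves that review step open.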

Current Assignee
Verint Americas Incorporated (Verint Systems Incorporated)
Original Assignee
Overtone, Inc. (Verint Systems Incorporated)
Inventors
Rhoads, Katrina A.; Scott, Eric D.

Granted Patent

US 7,853,544 B2
US Class Current

704/10
CPC Class Codes

G06F 16/353 into predefined classes

G06F 40/30 Semantic analysis
