Method and apparatus for automatically discovering features in free form heterogeneous data

  • US 8,108,413 B2
  • Filed: 02/15/2007
  • Issued: 01/31/2012
  • Est. Priority Date: 02/15/2007
  • Status: Active Grant
First Claim

1. A method, performed using a data processing system, of automatically discovering one or more features in free form heterogeneous data, the method comprising the steps of:

    obtaining free form heterogeneous data, wherein the data comprises one or more data items, and wherein at least a portion of the data is textual data representing one or more inquiries received by a call center;

    applying a label to each data item;

    using the labeled data to build a language model, wherein a word distribution associated with each label is derived from the model, and wherein the language model comprises a probability of a word occurring in a cluster of words, the probability comprising a frequency of the word within the cluster of words divided by a total number of words within the cluster of words; and

    automatically discovering one or more features in the data using the word distribution associated with each label, wherein discovering one or more features in the data facilitates one or more operations that use at least a portion of the labeled data, and wherein the one or more features include text related features that are used for recognizing an information type of a particular unit of text;

    wherein the data processing system comprises a memory and a processor coupled to the memory; and

    wherein the obtaining step, the applying step, the labeled data using step, and the word distribution using step are performed, at least in part, on the data processing system.
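
The probability recited in the language-model step (the frequency of a word within a cluster divided by the total number of words in that cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function names `build_language_model` and `top_words`, the lowercase whitespace tokenization, the treatment of all items sharing a label as one cluster of words, and the sample inquiries. Ranking words by probability is only one plausible way to surface candidate text features; the claim does not specify how the features are discovered.

```python
from collections import Counter, defaultdict


def build_language_model(labeled_items):
    """Return {label: {word: probability}} from (label, text) pairs.

    Assumption: all items that share a label form one cluster of words,
    and words are obtained by lowercasing and splitting on whitespace.
    """
    counts = defaultdict(Counter)
    for label, text in labeled_items:
        counts[label].update(text.lower().split())

    model = {}
    for label, counter in counts.items():
        total = sum(counter.values())  # total number of words in the cluster
        # P(word | cluster) = frequency of the word / total words in the cluster
        model[label] = {word: n / total for word, n in counter.items()}
    return model


def top_words(model, label, k=3):
    """Hypothetical helper: the k most probable words for a label."""
    dist = model[label]
    # Sort by descending probability, breaking ties alphabetically.
    return sorted(dist, key=lambda w: (-dist[w], w))[:k]


# Made-up call-center inquiries, each already carrying a label.
items = [
    ("billing", "refund my bill"),
    ("billing", "bill is wrong"),
    ("network", "network is down"),
]
model = build_language_model(items)
```

Under these assumptions, `model["billing"]["bill"]` is 2/6, because "bill" occurs twice among the six words labeled "billing", and each label's probabilities sum to 1.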
