Preprocessing of text

  • US 8,620,836 B2
  • Filed: 01/10/2011
  • Issued: 12/31/2013
  • Est. Priority Date: 01/10/2011
  • Status: Active Grant
First Claim

1. A method comprising:

  • receiving, by a device, a document;

    determining, by the device, a plurality of topics associated with the document, each of the plurality of topics being associated with text;

    determining, by the device, one or more desired topics of the plurality of topics;

    filtering, by the device, a first portion of text from the document without filtering a second portion of text from the document, the second portion of text being associated with the one or more desired topics, the first portion of text not being associated with the one or more desired topics, the first portion of text being removed from the document, and the second portion of text being different than the first portion of text;

    splitting, by the device, the second portion of text into a plurality of segments;

    clustering, by the device, each of the plurality of segments into one or more clusters of a plurality of clusters, each cluster, of the plurality of clusters, including at least one of the plurality of segments, and each cluster, of the plurality of clusters, being associated with the one or more desired topics;

    identifying, by the device, at least one segment, of the plurality of segments, having low relevance to a cluster, of the plurality of clusters, that includes the at least one segment; and

    removing, by the device, the at least one segment from the cluster.
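
The steps recited in claim 1 can be pictured with a small sketch. This is an illustration only and is not taken from the patent: the keyword table `TOPIC_KEYWORDS`, the helper names, the paragraph-level topic assignment, the sentence-level splitting, and the `min_relevance` threshold are all assumptions chosen to keep the example short. Topic determination is approximated by keyword counting, and the relevance of a segment to its cluster is approximated by the share of its words that are keywords of the cluster's topic.

```python
import re

# Hypothetical topic vocabulary (an assumption for this sketch);
# a real system would more likely derive topics from data.
TOPIC_KEYWORDS = {
    "battery": {"battery", "charge", "charging", "power"},
    "screen": {"screen", "display", "brightness"},
    "shipping": {"shipping", "delivery", "courier"},
}

def tokens(text):
    """Lowercase alphabetic tokens of a piece of text."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_hits(text, topic):
    """Number of tokens in text that are keywords of the given topic."""
    keywords = TOPIC_KEYWORDS[topic]
    return sum(1 for t in tokens(text) if t in keywords)

def best_topic(text, topics):
    """Topic with the most keyword hits; ties go to the earliest topic."""
    return max(topics, key=lambda topic: keyword_hits(text, topic))

def preprocess(document, desired_topics, min_relevance=0.15):
    # Receive the document and determine a topic for each paragraph.
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    all_topics = list(TOPIC_KEYWORDS)

    # Filter: drop text whose topic is not desired, keep the rest.
    kept = [p for p in paragraphs if best_topic(p, all_topics) in desired_topics]

    # Split the kept text into segments (here: sentences).
    segments = [
        s.strip()
        for p in kept
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]

    # Cluster: each segment joins the cluster of its best desired topic.
    clusters = {}
    for seg in segments:
        clusters.setdefault(best_topic(seg, desired_topics), []).append(seg)

    # Identify low-relevance segments and remove them from their cluster.
    for topic, members in clusters.items():
        clusters[topic] = [
            s for s in members
            if keyword_hits(s, topic) / max(len(tokens(s)), 1) >= min_relevance
        ]
    return clusters

DOC = (
    "The battery lasts two days. Charging is fast. My cat likes the box.\n\n"
    "The screen is sharp. Display brightness is great.\n\n"
    "Shipping was slow. The courier lost the first delivery."
)

result = preprocess(DOC, ["battery", "screen"])
for topic, members in result.items():
    print(topic, members)
```

In this toy run the shipping paragraph is filtered out before splitting, and the sentence about the cat is first clustered and then removed because none of its words relate to its cluster's topic.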
