
Automatically Creating Training Data For Language Identifiers

  • US 20150006148A1
  • Filed: 07/17/2013
  • Published: 01/01/2015
  • Est. Priority Date: 06/27/2013
  • Status: Abandoned Application
First Claim

1. A method, comprising:

    accessing a target corpus of electronic communications associated with an electronic communication service;

    identifying a member of the target corpus that includes an attribute from which a predicted classification of the member can be made, the attribute being separate from a message portion of the member;

    accessing the predicted classification of the member, where the predicted classification is a function of the attribute and where the predicted classification is made without reference to a base classifier;

    accessing an actual classification of the member, where the actual classification is made by the base classifier, the base classifier being configured to classify communications associated with the electronic communication service; and

    upon determining that the predicted classification matches the actual classification,

    adding a labeled member to a target training corpus stored in a data store, the labeled member comprising the member and data representing the actual classification.
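The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `locale` attribute, the `predict_from_attribute` rule, and the keyword-based `toy_base_classifier` are hypothetical stand-ins, and an in-memory list plays the role of the data store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Message:
    """A member of the target corpus."""
    text: str  # the message portion
    # Attributes separate from the message portion (hypothetical "locale" key).
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class LabeledMember:
    """The member plus data representing the actual classification."""
    member: Message
    label: str


def predict_from_attribute(msg: Message) -> Optional[str]:
    """Hypothetical predictor: derives a language code from a locale attribute
    (e.g. "es-MX" -> "es") without consulting the base classifier."""
    locale = msg.attributes.get("locale")
    if locale is None:
        return None  # no attribute from which a prediction can be made
    return locale.split("-")[0].lower()


def toy_base_classifier(text: str) -> str:
    """Hypothetical stand-in for the base language classifier (keyword lookup)."""
    words = set(text.lower().split())
    if words & {"el", "la", "que", "hola"}:
        return "es"
    if words & {"le", "les", "bonjour", "est"}:
        return "fr"
    return "en"


def build_training_corpus(
    target_corpus: Iterable[Message],
    base_classifier: Callable[[str], str],
    predictor: Callable[[Message], Optional[str]],
    training_corpus: List[LabeledMember],
) -> List[LabeledMember]:
    for member in target_corpus:
        predicted = predictor(member)  # attribute-based prediction
        if predicted is None:
            continue  # skip members lacking a usable attribute
        actual = base_classifier(member.text)  # base classifier's label
        if predicted == actual:
            # Only members on which both signals agree are added.
            training_corpus.append(LabeledMember(member, actual))
    return training_corpus


demo_corpus = [
    Message("hola que tal", {"locale": "es-MX"}),  # es == es: added
    Message("hello there", {"locale": "fr-FR"}),   # fr != en: skipped
    Message("bonjour mon ami"),                    # no attribute: skipped
    Message("good morning", {"locale": "en-US"}),  # en == en: added
]
demo_result = build_training_corpus(
    demo_corpus, toy_base_classifier, predict_from_attribute, []
)
print([(m.member.text, m.label) for m in demo_result])
# prints [('hola que tal', 'es'), ('good morning', 'en')]
```

In this sketch a member is labeled only when two independent signals, one from metadata and one from the base classifier, agree; mismatches and members without a usable attribute are simply left out of the training corpus.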

  • 3 Assignments