Adaptive parser-centric text normalization

  • US 9,471,561 B2
  • Filed: 12/26/2013
  • Issued: 10/18/2016
  • Est. Priority Date: 12/26/2013
  • Status: Expired due to Fees
First Claim

1. A method comprising:

    receiving at a computing node an input sequence comprising a plurality of tokens;

    applying by a processor of the computing node a plurality of domain-specific generators to the input sequence to generate a set of candidate replacements of the tokens of the input sequence;

    creating in a memory of the computing node a directed graph comprising a plurality of nodes and a plurality of edges, each node having an associated candidate replacement of the set of candidate replacements, and each edge connecting a first node to a second node, the second node being associated with a consistent follower of the candidate replacement associated with the first node, and creating the plurality of edges comprising determining syntactic consistency between each pair of the set of candidate replacements;

    determining by the processor a plurality of paths in the directed graph, each of the plurality of paths comprising at least one of the plurality of edges;

    determining by the processor a score for each of the paths;

    selecting by the processor a path of the plurality of paths having the highest score;

    applying by the processor each candidate replacement of the selected path to the input sequence to generate a normalized output sequence; and

    evaluating a correctness of the normalized output sequence by parsing the normalized output sequence to obtain a parse result and comparing the parse result with a gold standard that is obtained by parsing a manually normalized sequence.
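
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the two generators, the weights, the adjacency test standing in for syntactic consistency, the additive path score, and the toy bigram "parser" are all hypothetical choices made only so that the example is self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    start: int     # index of the first input token covered
    end: int       # index one past the last input token covered
    text: str      # proposed replacement text
    weight: float  # hypothetical generator confidence


Generator = Callable[[List[str]], List[Candidate]]


def slang_generator(tokens: List[str]) -> List[Candidate]:
    # Hypothetical domain-specific generator: expands chat abbreviations.
    table = {"u": "you", "r": "are", "gr8": "great"}
    return [Candidate(i, i + 1, table[t], 1.0)
            for i, t in enumerate(tokens) if t in table]


def identity_generator(tokens: List[str]) -> List[Candidate]:
    # Hypothetical generator that proposes leaving each token unchanged.
    return [Candidate(i, i + 1, t, 0.5) for i, t in enumerate(tokens)]


def is_consistent_follower(a: Candidate, b: Candidate) -> bool:
    # Assumption: "consistent" here only means b starts where a ends.
    return a.end == b.start


def build_graph(cands: List[Candidate]) -> Dict[Candidate, List[Candidate]]:
    # One node per candidate; an edge a -> b when b consistently follows a.
    return {a: [b for b in cands if is_consistent_follower(a, b)] for a in cands}


def enumerate_paths(graph: Dict[Candidate, List[Candidate]],
                    n_tokens: int) -> List[List[Candidate]]:
    # Collect every path that covers the input from token 0 to the last token.
    paths: List[List[Candidate]] = []

    def walk(path: List[Candidate]) -> None:
        if path[-1].end == n_tokens:
            paths.append(path)
            return
        for nxt in graph[path[-1]]:
            walk(path + [nxt])

    for cand in graph:
        if cand.start == 0:
            walk([cand])
    return paths


def score_path(path: List[Candidate]) -> float:
    # Assumption: a path's score is the sum of its candidates' weights.
    return sum(c.weight for c in path)


def normalize(tokens: List[str], generators: List[Generator]) -> str:
    cands = [c for g in generators for c in g(tokens)]
    graph = build_graph(cands)
    # Keep only paths that contain at least one edge (two or more nodes).
    paths = [p for p in enumerate_paths(graph, len(tokens)) if len(p) >= 2]
    best = max(paths, key=score_path)
    return " ".join(c.text for c in best)


def toy_parse(text: str) -> List[Tuple[str, str]]:
    # Stand-in for a real parser: adjacent word pairs as a pseudo parse result.
    words = text.split()
    return list(zip(words, words[1:]))


def is_correct(normalized: str, manually_normalized: str) -> bool:
    # Compare the parse of the output with the parse of the manual normalization.
    return toy_parse(normalized) == toy_parse(manually_normalized)


if __name__ == "__main__":
    out = normalize("u r gr8".split(), [slang_generator, identity_generator])
    print(out, is_correct(out, "you are great"))
```

In this sketch the evaluation compares parse results rather than surface strings, which mirrors the claim's final step: a normalization is judged by whether a parser treats it the same way as the manually normalized sequence. A real system would substitute an actual syntactic parser and a graded comparison for the exact-match test used here.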
