Correction of misspellings in QA system

  • US 10,803,242 B2
  • Filed: 10/26/2018
  • Issued: 10/13/2020
  • Est. Priority Date: 10/26/2018
  • Status: Active Grant
First Claim

1. A computer implemented method for identifying and correcting a misspelling in a question answering (QA) system, wherein the QA system is coupled to a document corpus, and the document corpus includes a plurality of documents related to a particular domain, the method comprising:

  • receiving, by a processor coupled to one or more user devices, an input question and a plurality of passages, wherein the plurality of passages are extracted from the document corpus by the QA system;

    providing, by the processor, at least one alternate form for each token extracted from the input question and the plurality of passages;

    the step of providing at least one alternate form for each token further comprising;

    providing, by the processor, a first substitution confusion matrix for the input question through a speech to text process, wherein a first group of characters are substituted with a first single character in the first substitution confusion matrix, wherein in the first substitution confusion matrix, a first modified Levenshtein distance value between the character “e” and the character “o” is 0, a first modified Levenshtein distance value between the character “i” and the character “t” is 0, a first modified Levenshtein distance value between the character “a” and the character “o” is 0, and a first modified Levenshtein distance value between the character “rn” and the character “m” is 0;

    providing, by the processor, the first modified Levenshtein distance value for each alternate form of each token extracted from the input question;

    providing, by the processor, a second substitution confusion matrix for the plurality of passages through an optical character recognition process, wherein a second group of characters are substituted with a second single character in the second substitution confusion matrix, wherein in the second substitution confusion matrix, a second modified Levenshtein distance value between the character “sh” and the character “s” is 0, a second modified Levenshtein distance value between the character “ai” and the character “e” is 0, a second modified Levenshtein distance value between the character “ed” and the character “te” is 0, a second modified Levenshtein distance value between the character “s” and the character “th” is 0, and a second modified Levenshtein distance value between the character “t” and the character “k” is 0;

    providing, by the processor, the second modified Levenshtein distance value for each alternate form of each token extracted from the plurality of passages;

    identifying, by the processor, at least one misspelled token from the input question and the plurality of passages; and

    scoring, by the processor, at least one alternate form of each identified misspelled token.
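
The modified Levenshtein distance recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `modified_levenshtein`, `rank_alternates`, `STT_ZERO_COST`, and `OCR_ZERO_COST` are hypothetical. The sketch assumes that the listed character pairs are symmetric, that every other insertion, deletion, or substitution costs 1, and that ranking alternate forms by ascending distance is an acceptable stand-in for the scoring step, whose details the claim does not specify.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]

# Zero-cost pairs taken from the claim text (hypothetical constant names).
# First matrix: input question, speech to text process.
STT_ZERO_COST: List[Pair] = [("e", "o"), ("i", "t"), ("a", "o"), ("rn", "m")]
# Second matrix: passages, optical character recognition process.
OCR_ZERO_COST: List[Pair] = [
    ("sh", "s"), ("ai", "e"), ("ed", "te"), ("s", "th"), ("t", "k"),
]


def modified_levenshtein(source: str, target: str, zero_cost: Iterable[Pair]) -> int:
    """Edit distance in which the given character groups swap at cost 0."""
    # Assumption: each pair applies in both directions.
    pairs = set()
    for x, y in zero_cost:
        pairs.add((x, y))
        pairs.add((y, x))

    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of the prefix
    for j in range(cols):
        dist[0][j] = j  # insert every character of the prefix

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            best = min(
                dist[i - 1][j] + 1,                      # deletion
                dist[i][j - 1] + 1,                      # insertion
                dist[i - 1][j - 1] + (0 if same else 1)  # match or substitution
            )
            # Zero-cost replacement of one character group by another,
            # including multi-character groups such as "rn" and "m".
            for x, y in pairs:
                if source[:i].endswith(x) and target[:j].endswith(y):
                    best = min(best, dist[i - len(x)][j - len(y)])
            dist[i][j] = best
    return dist[-1][-1]


def rank_alternates(token: str, alternates: Iterable[str],
                    zero_cost: Iterable[Pair]) -> List[Tuple[str, int]]:
    """Order alternate forms by distance to the token (ties broken alphabetically)."""
    pairs = list(zero_cost)
    scored = [(alt, modified_levenshtein(token, alt, pairs)) for alt in alternates]
    return sorted(scored, key=lambda item: (item[1], item[0]))
```

Under these assumptions, `modified_levenshtein("modern", "modem", STT_ZERO_COST)` is 0 because “rn” and “m” are treated as interchangeable, whereas the same call with an empty pair list gives the ordinary Levenshtein distance of 2.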
