Natural language processing of disfluent sentences

  • US 7,930,168 B2
  • Filed: 10/04/2005
  • Issued: 04/19/2011
  • Est. Priority Date: 10/04/2005
  • Status: Active Grant
First Claim

1. A computer-implemented method for processing spoken language comprising:

    converting spoken words into a text word sequence in a processor-based natural language processing system executing program code;

    tagging words in the text word sequence with part-of-speech (POS) tags through a part-of-speech tagger component of the system; and

    tagging edited words in the text word sequence using a disfluence identifier component of the system that operates with a feature set created with techniques comprising:

    matching only the highest level POS tags in a multi-level hierarchy of such tags, wherein the highest level of the hierarchy comprises categories of tags including a noun category, a verb category, an adjective category, and an adverb category;

    processing a resulting sequence of word-POS-tag pairs to mark each word in a text sequence with an edited-word-tag;

    removing sequence-related errors in edited-word-tag information before parsing the text word sequence;

    parsing the text word sequence into machine instructions with the aid of POS-tag and edited-word-tag information; and

    allowing single mismatches in POS-tag sequences of rough copy, wherein rough copy in a string of POS-tagged words produces candidates for any potential pairs of reparanda and repairs by applying an algorithm to the string of POS-tagged words.
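
Illustrative sketch (not part of the claim): the two feature-related steps above, matching only top-level POS categories and allowing a single mismatch in rough copy, can be pictured roughly as follows. The claim does not specify an algorithm, so everything here is an assumption made for illustration: the Penn Treebank-style tag names, the `TOP_LEVEL` mapping, the function names, the maximum span length, the requirement that the candidate repair immediately follows the candidate reparandum (no intervening filler words), and the choice to permit a mismatch only for spans of two or more tags.

```python
from typing import List, Tuple

# Hypothetical two-level hierarchy: fine-grained tags collapse into the four
# top-level categories named in the claim. Tags not listed are left unchanged.
TOP_LEVEL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
}


def top_level(tag: str) -> str:
    """Map a fine-grained tag to its top-level category (assumed mapping)."""
    return TOP_LEVEL.get(tag, tag)


def count_mismatches(a: List[str], b: List[str]) -> int:
    """Count positions where two equal-length tag sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def rough_copy_candidates(
    tagged: List[Tuple[str, str]],
    max_len: int = 4,
    max_mismatch: int = 1,
) -> List[Tuple[int, int, int]]:
    """Return (reparandum_start, repair_start, repair_end) index triples.

    A span is a candidate when the top-level tags of the reparandum and of
    the equally long span that follows it differ in at most `max_mismatch`
    positions. Single-tag spans must match exactly (an assumption, to avoid
    flagging every adjacent word pair).
    """
    tags = [top_level(tag) for _, tag in tagged]
    n = len(tags)
    candidates = []
    for start in range(n):
        for length in range(1, max_len + 1):
            repair_start = start + length
            repair_end = repair_start + length
            if repair_end > n:
                break
            allowed = max_mismatch if length >= 2 else 0
            reparandum = tags[start:repair_start]
            repair = tags[repair_start:repair_end]
            if count_mismatches(reparandum, repair) <= allowed:
                candidates.append((start, repair_start, repair_end))
    return candidates


# Example: "a flight" (DT NN) followed by "a ticket" (DT NN).
utterance = [
    ("I", "PRP"), ("want", "VBP"), ("a", "DT"), ("flight", "NN"),
    ("a", "DT"), ("ticket", "NN"), ("to", "TO"), ("Boston", "NNP"),
]
candidates = rough_copy_candidates(utterance)
```

In this sketch the output is only a list of candidate reparandum/repair pairs. A downstream component, such as the disfluence identifier described in the claim, would still have to decide which candidates are true edits. Collapsing NN and NNS into one category lets "the flight the flights" match exactly even though the fine-grained tags differ.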
