PHONETIC ALIGNMENT FOR USER-AGENT DIALOGUE RECOGNITION

US 20150058006A1
Filed: 08/23/2013
Published: 02/26/2015
Est. Priority Date: 08/23/2013
Status: Abandoned Application

Abstract

A method for speech to text transcription uses a knowledge base containing solution descriptions, each describing, in words, a solution to a respective problem. An audio recording of a dialogue between an agent and a user in which the agent had access to the knowledge base is received. A sequence of phonemes based on the agent's part of the audio recording is identified and from this, a preliminary transcription is made which includes a sequence of words recognized as corresponding to phonemes in the identified sequence of phonemes together with any unrecognized phonemes from the phoneme sequence that are not recognized as corresponding to one of the recognized words. The preliminary transcription is revised by replacing one or more of the unrecognized phonemes with a word or words from a solution description that includes words which match adjacent words of the sequence of recognized words.
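
The revision step summarized above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the application: the token representation (a string for a recognized word, a tuple for a run of unrecognized phonemes), the toy lexicon `LEXICON`, the function names, the 0.6 threshold, and the use of `difflib.SequenceMatcher` as a stand-in phoneme similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical toy pronunciation lexicon (illustrative only).
LEXICON = {
    "open": ["OW", "P", "AH", "N"],
    "the": ["DH", "AH"],
    "toner": ["T", "OW", "N", "ER"],
    "cover": ["K", "AH", "V", "ER"],
}


def phonemes_for(words):
    """Concatenate lexicon phonemes for a list of words (unknown words add nothing)."""
    return [p for w in words for p in LEXICON.get(w, [])]


def similarity(a, b):
    """Stand-in similarity between two phoneme sequences, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def find_gap_words(kb_words, left, right, phonemes, threshold):
    """Find words lying between `left` and `right` in the description whose
    phonemes are similar enough to the unrecognized phonemes."""
    for s, word in enumerate(kb_words):
        if word != left:
            continue
        for e in range(s + 2, len(kb_words)):  # gap of at least one word
            if kb_words[e] != right:
                continue
            gap = kb_words[s + 1:e]
            if similarity(list(phonemes), phonemes_for(gap)) >= threshold:
                return gap
    return None


def revise(tokens, description, threshold=0.6):
    """Replace unrecognized phoneme runs that sit between two recognized words."""
    kb_words = description.lower().split()
    revised = []
    for i, tok in enumerate(tokens):
        if isinstance(tok, str):  # recognized word: keep as is
            revised.append(tok)
            continue
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        gap = find_gap_words(kb_words, left, right, tok, threshold)
        revised.extend(gap if gap else [tok])  # leave the run if nothing matches
    return revised


tokens = ["open", "the", ("T", "OW", "N", "AH"), "cover"]
print(revise(tokens, "Open the toner cover"))  # ['open', 'the', 'toner', 'cover']
```

In this sketch a phoneme run that has no sufficiently similar counterpart in the description is left in place rather than guessed at.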

20 Claims

1. A method for speech to text transcription comprising:
- providing access to a knowledge base containing solution descriptions, each solution description including a textual description of a solution to a respective problem;
  
  generating a preliminary transcription of at least an agent's part of an audio recording of a dialogue between the agent and a user in which the agent had access to the knowledge base, the generating comprising:
  
  identifying a sequence of phonemes based on the agent's part of the audio recording, and
  
  based on the identified sequence of phonemes, generating the preliminary transcription, the preliminary transcription including a sequence of words recognized as corresponding to phonemes in the sequence of phonemes and unrecognized phonemes from the phoneme sequence that are not recognized as corresponding to one of the recognized words; and
  
  revising the preliminary transcription, the revising comprising replacement of unrecognized phonemes with at least one word from a solution description, the solution description including words which match words of the sequence of recognized words,
  
  wherein at least one of the generating of the preliminary transcription and the revising of the preliminary transcription is performed with a processor.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
  - 2. The method of claim 1, wherein revising the preliminary transcription comprises:
    - comparing recognized words in the preliminary transcription with words in solution descriptions in the knowledge base to identify candidate solution descriptions which each include a sequence of text which includes words which are determined to match at least some of the identified words in the preliminary transcription, and
      
      using a phoneme sequence corresponding to a sequence of text in one of the candidate solution descriptions, replacing at least one of the unrecognized phonemes in the preliminary transcription with at least one word of the sequence of text in the candidate solution description which is aligned with the at least one unrecognized phoneme to generate a revised transcription.
  - 3. The method of claim 2, wherein the comparing of recognized words in the preliminary transcription with words in the solution descriptions in the knowledge base to identify candidate solution descriptions comprises, for a pair of identified words in the preliminary transcription that are spaced by at least one unrecognized phoneme, determining whether a matching pair of words in a solution description is spaced by a gap of at least one word and comparing the at least one unrecognized phoneme with at least one phoneme corresponding to the at least one word in the gap to determine if there is a match.
  - 4. The method of claim 3, wherein the gap between the matching pair of words in the solution description is permitted to be no more than a threshold size.
  - 5. The method of claim 2, wherein the method includes determining whether there is one of the solution descriptions in the knowledge base which includes the matching pair of words for each of a plurality of pairs of identified words in the preliminary transcription that are spaced by at least one unrecognized phoneme and where the at least one unrecognized phoneme for each pair has at least a threshold similarity with a phoneme sequence corresponding to aligned words in the solution description.
  - 6. The method of claim 2, wherein the comparing of recognized words in the preliminary transcription with words in solution descriptions in the knowledge base to identify candidate solution descriptions comprises, for each of first and second sequential pairs of recognized words in the preliminary transcription:
    - generating a first sequence of phonemes for the words that space two words of an identified solution description that match the sequential pair of recognized words;
      
      computing a matching likelihood between the first sequence of phonemes and a second sequence of phonemes that temporally spaces the pair of matching words of the preliminary transcription;
      
      determining if the matching likelihood meets a predetermined threshold;
      
      where the threshold is met, storing the words appearing in the solution description between two matching words, and an identifier of the solution description; and
      
      comparing the identifiers of the solution descriptions stored for the first and second sequential pairs of recognized words.
  - 7. The method of claim 1, wherein the method further includes, prior to revising the preliminary transcription, associating text sequences of the solution descriptions in the knowledge base with respective sequences of phonemes.
  - 8. The method of claim 1, wherein the method further comprises:
    - generating a preliminary transcription of a user's part of the audio recording of the dialogue between the agent and the user comprising:
      
      identifying a sequence of phonemes based on the user's part of the audio recording, and
      
      based on the identified sequence of phonemes, generating the preliminary transcription of the user's part, the preliminary transcription including a sequence of words recognized as corresponding to phonemes in the sequence of phonemes and unrecognized phonemes from the phoneme sequence that are not recognized as corresponding to one of the recognized words;
      
      revising the preliminary transcription of the user's part, comprising:
      
      retrieving an identifier of the solution description used in replacing the at least one of the unrecognized phonemes in the preliminary transcription of the agent's part;
      
      retrieving phoneme sequences for a cluster of words associated in memory with the solution identifier; and
      
      comparing the unrecognized phonemes in the preliminary transcription of the user's part with the phoneme sequence for each of words in the cluster of words to identify at least one matching word from the cluster of words; and
      
      replacing at least one of the unrecognized phonemes in the preliminary transcription of the user's part with at least one matching word from the cluster of words.
  - 9. The method of claim 8, wherein the cluster of words is derived from text communications from users which have been associated with the solution description identifier.
  - 10. The method of claim 8, further comprising, for each of a plurality of the solution descriptions in the knowledge base:
    - processing text communications from a plurality of users to identify a cluster of words frequently used in a cluster of the text communications that has been associated with the solution description; and
      
      associating each of the frequently used words with a respective sequence of phonemes.
  - 11. The method of claim 1, further comprising outputting at least one of the revised transcription and information based thereon.
  - 12. The method of claim 1, wherein each solution to a respective problem relates to a solution to a problem with a device.
  - 13. The method of claim 1, further comprising automatically identifying a first part of the audio recording of the dialogue between the agent and the user as the agent's part and a second part of the audio recording of the dialogue between the agent and the user as the user's part.
  - 14. The method of claim 13, wherein the agent's part and the user's part are processed differently.
  - 15. The method of claim 1, wherein the phonemes are drawn from a finite alphabet.
  - 16. A computer program product comprising a non-transitory recording medium storing instructions, which when executed on a computer causes the computer to perform the method of claim 1.
  - 17. A system comprising memory which stores instructions for performing the method of claim 1 and a processor in communication with the memory for executing the instructions.
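
The candidate selection recited in claims 3 to 6 can be sketched as follows. This is one possible reading, not the applicant's implementation: the knowledge base `KB`, the lexicon `LEXICON`, the function names, the `max_gap` and `threshold` defaults, and the use of `difflib.SequenceMatcher` as the "matching likelihood" are all hypothetical choices made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical knowledge base: solution identifier -> textual description.
KB = {
    "S1": "open the toner cover and remove the cartridge carefully",
    "S2": "open the paper tray and remove the jammed sheet carefully",
}

# Hypothetical toy pronunciation lexicon for the words that can fill a gap.
LEXICON = {
    "toner": ["T", "OW", "N", "ER"],
    "cartridge": ["K", "AA", "R", "T", "R", "IH", "JH"],
    "paper": ["P", "EY", "P", "ER"],
    "tray": ["T", "R", "EY"],
    "jammed": ["JH", "AE", "M", "D"],
    "sheet": ["SH", "IY", "T"],
}


def phonemes_for(words):
    """First phoneme sequence: phonemes of the words spacing the matched pair."""
    return [p for w in words for p in LEXICON.get(w, [])]


def likelihood(a, b):
    """Stand-in matching likelihood between two phoneme sequences."""
    return SequenceMatcher(None, a, b).ratio()


def candidates_for_pair(left, gap_phonemes, right, max_gap=2, threshold=0.6):
    """Return (solution_id, gap_words) for every description in which `left`
    and `right` are spaced by 1..max_gap words that match the phonemes."""
    found = []
    for sol_id, text in KB.items():
        words = text.split()
        for s, word in enumerate(words):
            if word != left:
                continue
            # e - s - 1 is the gap size in words; it is capped at max_gap.
            for e in range(s + 2, min(s + 2 + max_gap, len(words))):
                if words[e] != right:
                    continue
                gap_words = words[s + 1:e]
                if likelihood(gap_phonemes, phonemes_for(gap_words)) >= threshold:
                    found.append((sol_id, gap_words))
    return found


def consistent_solution(pairs, **kwargs):
    """Compare the identifiers stored for each pair and keep a solution
    that was stored for all of them, together with its gap fillers."""
    per_pair = [candidates_for_pair(l, ph, r, **kwargs) for l, ph, r in pairs]
    votes = Counter(sid for cands in per_pair for sid in {c[0] for c in cands})
    for sol_id, count in votes.most_common():
        if count == len(pairs):
            fills = [next(w for sid, w in cands if sid == sol_id)
                     for cands in per_pair]
            return sol_id, fills
    return None, []


pairs = [
    ("the", ["T", "OW", "N", "AH"], "cover"),
    ("the", ["K", "AA", "R", "T", "IH", "JH"], "carefully"),
]
print(consistent_solution(pairs))  # ('S1', [['toner'], ['cartridge']])
```

Requiring the same identifier for every pair is a deliberately strict choice in this sketch; a looser variant could accept the identifier stored for the most pairs.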

18. A system for speech to text transcription comprising:
- a speech to text decoder for generating a preliminary transcription of at least an agent's part of an audio recording of a dialogue between the agent and a user, the agent having access to an associated knowledge base of solution descriptions, each solution description including a textual description of a solution to a respective problem, the decoder configured for:
  
  identifying a sequence of phonemes based on the agent's part of the audio recording, and
  
  based on the identified sequence of phonemes, generating the preliminary transcription, the preliminary text transcription including a sequence of words recognized as corresponding to phonemes in the sequence of phonemes and unrecognized phonemes from the phoneme sequence that are not recognized as corresponding to one of the recognized words;
  
  a revision component for revising the preliminary transcription, the revision component configured for:
  
  comparing recognized words in the preliminary transcription with words in solution descriptions in the knowledge base to identify candidate solution descriptions which each include a sequence of text which includes words which are determined to match at least some of the identified words in the preliminary transcription, and
  
  using a phoneme sequence corresponding to a sequence of text in one of the candidate solution descriptions, replacing unrecognized phonemes in the preliminary transcription with at least one word of the sequence of text in the candidate solution description to generate a revised transcription; and
  
  a processor which implements at least one of the generating of the preliminary transcription and the revising of the preliminary transcription.
- Dependent Claims (19)
- - 19. The system of claim 18, further comprising the knowledge base of solution descriptions, each solution description being associated in memory with a phoneme sequence corresponding to text of the solution description.

20. A method for providing a system for speech to text transcription comprising:
- with a processor, for each of a set of solution descriptions in a knowledge base which includes a textual description of a solution to a respective problem with a device, associating the solution description with a sequence of phonemes corresponding to at least a part of the textual description;
  
  providing access to a speech to text converter which is configured for generating a preliminary transcription of at least an agent's part of an audio recording of a dialogue between the agent and a user in which the agent has access to the knowledge base, the generating comprising:
  
  identifying a sequence of phonemes based on the agent's part of the audio recording, and
  
  based on the identified sequence of phonemes, generating the preliminary transcription, the preliminary transcription including a sequence of words recognized as corresponding to phonemes in the sequence of phonemes and any unrecognized phonemes from the phoneme sequence that are not recognized as corresponding to one of the recognized words; and
  
  providing instructions for revising the preliminary transcription when there are unrecognized phonemes from the phoneme sequence, the instructions providing for replacement of unrecognized phonemes with text from a solution description which includes words from the sequence of recognized words.
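
The first step of claim 20 (also reflected in claims 7 and 19), associating each solution description with a phoneme sequence ahead of time, might look like the sketch below. The lexicon `LEXICON`, the function name `index_knowledge_base`, and the choice to store `None` for out-of-lexicon words are assumptions for illustration; a real system would presumably rely on a full pronunciation dictionary or a grapheme-to-phoneme model.

```python
import re

# Hypothetical toy pronunciation lexicon (illustrative only).
LEXICON = {
    "open": ["OW", "P", "AH", "N"],
    "the": ["DH", "AH"],
    "toner": ["T", "OW", "N", "ER"],
    "cover": ["K", "AH", "V", "ER"],
}


def index_knowledge_base(kb):
    """Map each solution identifier to a list of (word, phonemes) pairs.

    Words missing from the lexicon are kept with phonemes set to None so
    that later matching code can decide how to treat them.
    """
    index = {}
    for sol_id, text in kb.items():
        words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer
        index[sol_id] = [(w, LEXICON.get(w)) for w in words]
    return index


kb = {"S1": "Open the toner cover.", "S2": "Replace the toner."}
index = index_knowledge_base(kb)
print(index["S1"][2])  # ('toner', ['T', 'OW', 'N', 'ER'])
```

Building the index once means the revision step only compares phoneme sequences at transcription time and does not need to re-derive pronunciations for each recording.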

Current Assignee: Conduent Business Services, LLC (Conduent, Inc.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Proux, Denys
Application Number: US13/974,515
Publication Number: US 20150058006A1

Field of Search
US Class Current: 704/235
CPC Class Codes:
G10L 15/26 Speech to text systems G10L...
G10L 2015/025 Phonemes, fenemes or fenone...
