DIALOGUE SPEECH RECOGNITION SYSTEM, DIALOGUE SPEECH RECOGNITION METHOD, AND RECORDING MEDIUM FOR STORING DIALOGUE SPEECH RECOGNITION PROGRAM

US 20110131042A1
Filed: 05/12/2009
Published: 06/02/2011
Est. Priority Date: 07/28/2008
Status: Active Grant

Abstract

Disclosed is a dialogue speech recognition system that can expand the scope of applications by employing a universal dialogue structure as the condition for speech recognition of dialogue speech between persons. An acoustic likelihood computation means (701) provides the likelihood that an input speech signal is produced from a given phoneme sequence. A linguistic likelihood computation means (702) provides the likelihood that a given word sequence occurs. A maximum likelihood candidate search means (703) uses the likelihoods provided by the acoustic likelihood computation means and the linguistic likelihood computation means to find the word sequence most likely to have produced the speech signal. Further, the linguistic likelihood computation means (702) provides different linguistic likelihoods depending on whether or not the speaker who generated the speech signal input to the speech recognition means has the turn to speak.
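The scoring idea in the abstract can be illustrated with a minimal Python sketch; it is not NEC's implementation. It assumes log-domain scores, a hypothetical `lm_weight` scaling factor, and toy probabilities in which a listener without the turn is more likely to produce a short acknowledgement such as "yes". A real decoder would search a lattice or beam rather than a short list of whole candidates.

```python
import math
from typing import Dict, List, Optional, Tuple

Words = Tuple[str, ...]

# Toy, hypothetical scores for illustration only (not taken from the patent).
CANDIDATES: List[Words] = [("yes",), ("yet",)]
ACOUSTIC: Dict[Words, float] = {("yes",): -10.0, ("yet",): -9.8}  # log P(signal | phonemes of W)
LM_TURN: Dict[Words, float] = {("yes",): 0.1, ("yet",): 0.3}      # P(W) when the speaker has the turn
LM_NO_TURN: Dict[Words, float] = {("yes",): 0.6, ("yet",): 0.01}  # P(W) when the speaker does not


def linguistic_loglik(words: Words, has_turn: bool) -> float:
    """Turn-dependent linguistic log-likelihood: one model per turn status."""
    table = LM_TURN if has_turn else LM_NO_TURN
    return math.log(table[words])


def search(has_turn: bool, lm_weight: float = 1.0) -> Optional[Words]:
    """Return the candidate maximizing acoustic + weighted linguistic log-likelihood."""
    best: Optional[Words] = None
    best_score = -math.inf
    for words in CANDIDATES:
        score = ACOUSTIC[words] + lm_weight * linguistic_loglik(words, has_turn)
        if score > best_score:
            best, best_score = words, score
    return best


print(search(has_turn=True))   # ('yet',)
print(search(has_turn=False))  # ('yes',)
```

The acoustic scores slightly favor "yet" in both cases; only the turn-dependent linguistic likelihood flips the result for the speaker without the turn.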

13 Claims

1. A dialogue speech recognition system comprising:
- a speech recognition unit that receives a speech signal of each speaker in a dialog among a plurality of speakers and turn information indicating whether a speaker having generated the speech signal has the turn to speak or indicating a probability that the speaker has the turn to speak, and performs speech recognition for the speech signal, wherein the speech recognition unit at least includes:
  
  an acoustic likelihood computation unit that provides a likelihood of occurrence of an input speech signal from a given phoneme sequence;
  
  a linguistic likelihood computation unit that provides a likelihood of occurrence of a given word sequence; and
  
  a maximum likelihood candidate search unit that provides a word sequence with a maximum likelihood of occurrence from a speech signal by using the likelihoods provided by the acoustic likelihood computation unit and the linguistic likelihood computation unit, and the linguistic likelihood computation unit provides different linguistic likelihoods when a speaker having generated a speech signal input to the speech recognition unit has the turn to speak and when not.

2. The dialogue speech recognition system according to claim 1, wherein the linguistic likelihood computation unit includes:
- a first linguistic likelihood identification unit that identifies a likelihood from a first linguistic model indicating a linguistic likelihood when a speaker having generated a speech signal has the turn to speak; and
  
  a second linguistic likelihood identification unit that identifies a likelihood from a second linguistic model indicating a linguistic likelihood when a speaker having generated a speech signal does not have the turn to speak, and the maximum likelihood candidate search unit acquires a candidate for a speech recognition result by using at least one of a linguistic likelihood identified by the first linguistic likelihood identification unit and a linguistic likelihood identified by the second linguistic likelihood identification unit according to the turn information.

3. The dialogue speech recognition system according to claim 2, wherein the maximum likelihood candidate search unit corrects and then merges the linguistic likelihood identified by the first linguistic likelihood identification unit and the linguistic likelihood identified by the second linguistic likelihood identification unit according to the turn information, and acquires a candidate for a speech recognition result by using the merged maximum likelihood.

4. The dialogue speech recognition system according to claim 2, wherein the maximum likelihood candidate search unit linearly combines the linguistic likelihood identified by the first linguistic likelihood identification unit and the linguistic likelihood identified by the second linguistic likelihood identification unit according to the turn information, and acquires a candidate for a speech recognition result from the speech signal by using the linearly combined maximum likelihood.

5. The dialogue speech recognition system according to claim 2, wherein, when performing speech recognition of a speech signal of a speaker not having the turn to speak, the maximum likelihood candidate search unit corrects, among the linguistic likelihoods identified by the second linguistic likelihood identification unit, the linguistic likelihood of a character sequence corresponding to the speech recognition result for the speech of the speaker most recently determined to have the turn to speak.

6. The dialogue speech recognition system according to claim 2, wherein the first linguistic model and the second linguistic model define a linguistic likelihood of a word, a set of words, or a chain of words or sets of words corresponding to a phoneme sequence.

7. The dialogue speech recognition system according to claim 1, further comprising:
- a turn information generation unit that generates turn information based on the start time and end time of the speech signal of each speaker.

8. The dialogue speech recognition system according to claim 7, wherein the turn information generation unit generates turn information indicating that a certain speaker has the turn to speak during the period from the time when the speech signal of that speaker becomes sounded, from a state where the speech signals of all speakers are soundless, to the time when the speech signal of that speaker becomes soundless, and, provided that a speech signal of another speaker has become sounded at the time when the speech signal of the speaker set to have the turn to speak becomes soundless, generates turn information indicating that said another speaker has the turn to speak during the period from that time to the time when the speech signal of said another speaker becomes soundless.

9. The dialogue speech recognition system according to claim 7, wherein the turn information generation unit generates turn information indicating that a certain speaker has the turn to speak during the period from the time when the speech signal of that speaker becomes sounded, from a state where the speech signals of all speakers are soundless, to the time when the speech signal of that speaker becomes soundless, and, provided that a speech signal of another speaker has become sounded at the time when the speech signal of the speaker set to have the turn to speak becomes soundless, generates turn information indicating that said another speaker has the turn to speak during the period from the time when the speech signal of the speaker becomes sounded to the time when the speech signal of said another speaker becomes soundless.
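Claims 7 to 9 derive turn information from the start and end times of each speaker's speech. The Python sketch below is one possible reading of claims 8 and 9, not the patented implementation. `generate_turns` and its `backdate` flag are hypothetical names. Voiced intervals are assumed to be given per speaker as non-overlapping `(start, end)` pairs. Ties between speakers who are sounding at the same moment are broken by earliest start time, then by speaker name, which the claims do not specify. With `backdate=False`, the next speaker's turn starts when the previous turn ends (claim 8). With `backdate=True`, it starts when that speaker began speaking (claim 9, reading "the speaker" there as said another speaker).

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]   # (start, end) of one voiced stretch
Turn = Tuple[str, float, float]  # (speaker, turn start, turn end)


def generate_turns(voiced: Dict[str, List[Interval]],
                   backdate: bool = False) -> List[Turn]:
    """Derive turn intervals from per-speaker voiced intervals (hypothetical helper).

    backdate=False: a speaker already talking when the turn holder falls silent
    gets the turn from that moment (claim 8 reading).
    backdate=True: that speaker's turn starts when their own speech began
    (claim 9 reading).
    """
    # Zero-length segments are ignored so that time always moves forward.
    segs = [(s, e, spk) for spk, ivs in voiced.items() for s, e in ivs if e > s]
    turns: List[Turn] = []
    t = float("-inf")  # moment the previous turn ended
    while True:
        # Speech that is already sounding when the previous turn holder stops.
        sounding = [seg for seg in segs if seg[0] <= t < seg[1]]
        if sounding:
            start, end, spk = min(sounding, key=lambda seg: (seg[0], seg[2]))
            turns.append((spk, start if backdate else t, end))
        else:
            # Everyone is silent: the next speaker to start takes the turn.
            upcoming = [seg for seg in segs if seg[0] >= t]
            if not upcoming:
                break
            start, end, spk = min(upcoming, key=lambda seg: (seg[0], seg[2]))
            turns.append((spk, start, end))
        t = end
    return turns


voiced = {"A": [(0, 5), (10, 12)], "B": [(4, 8)], "C": [(6, 7)]}
print(generate_turns(voiced))                 # [('A', 0, 5), ('B', 5, 8), ('A', 10, 12)]
print(generate_turns(voiced, backdate=True))  # [('A', 0, 5), ('B', 4, 8), ('A', 10, 12)]
```

In this example, C's short interjection falls entirely inside B's turn, so C never receives the turn. With `backdate=True`, the turns of A and B overlap between times 4 and 5.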

10. A dialogue speech recognition method comprising:
- upon receiving a speech signal of each speaker in a dialog among a plurality of speakers and turn information indicating whether a speaker having generated the speech signal has the turn to speak or indicating a probability that the speaker has the turn to speak, performing speech recognition for the speech signal;
  
  at the time of the speech recognition, performing acoustic likelihood computation that provides a likelihood of occurrence of an input speech signal from a given phoneme sequence;
  
  performing linguistic likelihood computation that provides a likelihood of occurrence of a given word sequence;
  
  performing maximum likelihood candidate search that provides a word sequence with a maximum likelihood of occurrence from a speech signal by using the likelihoods provided by the acoustic likelihood computation and the linguistic likelihood computation; and
  
  at time of the linguistic likelihood computation, providing different linguistic likelihoods when a speaker having generated an input speech signal has the turn to speak and when not.

11. The dialogue speech recognition method according to claim 10, further comprising:
- at the time of the linguistic likelihood computation, performing first linguistic likelihood identification that identifies a likelihood from a first linguistic model indicating a linguistic likelihood when a speaker having generated a speech signal has the turn to speak;
  
  performing second linguistic likelihood identification that identifies a likelihood from a second linguistic model indicating a linguistic likelihood when a speaker having generated a speech signal does not have the turn to speak; and
  
  at the time of the maximum likelihood candidate search, acquiring a candidate for a speech recognition result by using at least one of a linguistic likelihood identified by the first linguistic likelihood identification and a linguistic likelihood identified by the second linguistic likelihood identification according to the turn information.
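Claim 4 linearly combines the two linguistic likelihoods according to the turn information, and claim 11 recites the method counterpart of the two-model arrangement. The following is a minimal Python sketch of one way to do this. It assumes that the turn information is a probability `p_turn` and that the combination is a linear interpolation of probabilities; the claims fix neither choice. The function name `interpolated_loglik` and the toy values are hypothetical.

```python
import math
from typing import Dict, Tuple

Words = Tuple[str, ...]


def interpolated_loglik(words: Words, p_turn: float,
                        lm_turn: Dict[Words, float],
                        lm_no_turn: Dict[Words, float]) -> float:
    """Linearly combine the two linguistic models, weighted by the turn probability.

    Hypothetical helper: p_turn = 1.0 uses only the first (turn-holder) model,
    p_turn = 0.0 only the second (non-turn) model; values in between mix them.
    """
    if not 0.0 <= p_turn <= 1.0:
        raise ValueError("p_turn must lie in [0, 1]")
    p = p_turn * lm_turn.get(words, 0.0) + (1.0 - p_turn) * lm_no_turn.get(words, 0.0)
    return math.log(p) if p > 0.0 else -math.inf


# Toy, hypothetical model entries.
lm_turn = {("yes",): 0.1}
lm_no_turn = {("yes",): 0.6}
print(interpolated_loglik(("yes",), 0.5, lm_turn, lm_no_turn))  # equals math.log(0.35) up to rounding
```

When the turn information is binary (0 or 1), the same function reduces to selecting one of the two models, which matches the "at least one of" wording of claims 2, 11 and 13.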

12. A storage medium for storing a dialogue speech recognition program that causes a computer to execute speech recognition processing that, upon receiving a speech signal of each speaker in a dialog among a plurality of speakers and turn information indicating whether a speaker having generated the speech signal has the turn to speak or indicating a probability that the speaker has the turn to speak, performs speech recognition for the speech signal, wherein the speech recognition processing at least includes:
- acoustic likelihood computation processing that provides a likelihood of occurrence of an input speech signal from a given phoneme sequence;
  
  linguistic likelihood computation processing that provides a likelihood of occurrence of a given word sequence; and
  
  maximum likelihood candidate search processing that provides a word sequence with a maximum likelihood of occurrence from a speech signal by using the likelihoods provided by the acoustic likelihood computation processing and the linguistic likelihood computation processing, and the linguistic likelihood computation processing provides different linguistic likelihoods when a speaker having generated the speech signal input to the speech recognition unit has the turn to speak and when not.

13. The storage medium for storing the dialogue speech recognition program according to claim 12, wherein the program causes a computer to execute, in the linguistic likelihood computation processing, first linguistic likelihood identification processing that identifies a likelihood from a first linguistic model indicating a linguistic likelihood when a speaker having generated the speech signal has the turn to speak, and second linguistic likelihood identification processing that identifies a likelihood from a second linguistic model indicating a linguistic likelihood when a speaker having generated the speech signal does not have the turn to speak, and, in the maximum likelihood candidate search processing, to acquire a candidate for a speech recognition result by using at least one of a linguistic likelihood identified by the first linguistic likelihood identification processing and a linguistic likelihood identified by the second linguistic likelihood identification processing according to the turn information.
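Dependent claim 5 adds a correction on the listener's side. When recognizing a speaker who does not have the turn, the second linguistic model's likelihoods are corrected for character sequences found in the most recent turn holder's recognition result. One plausible motivation is that a listener may echo a word just said. The Python sketch below makes several illustrative assumptions, none of which is the claimed implementation: a unigram second model, a multiplicative `boost` factor followed by renormalization, and word-level rather than character-sequence matching. The function name `corrected_no_turn_unigram` and the toy data are also hypothetical.

```python
from typing import Dict, Iterable


def corrected_no_turn_unigram(lm_no_turn: Dict[str, float],
                              recent_turn_words: Iterable[str],
                              boost: float = 3.0) -> Dict[str, float]:
    """Boost, then renormalize, the non-turn model for words the last turn holder said.

    Hypothetical helper: `boost` is an assumed tuning factor (> 0), and matching
    is done per word rather than per character sequence.
    """
    recent = set(recent_turn_words)
    scaled = {w: p * (boost if w in recent else 1.0) for w, p in lm_no_turn.items()}
    total = sum(scaled.values())
    return {w: p / total for w, p in scaled.items()}


# Toy data: the most recent turn holder was recognized as "i went to kyoto".
LM_NO_TURN = {"yes": 0.5, "kyoto": 0.25, "tokyo": 0.25}
corrected = corrected_no_turn_unigram(LM_NO_TURN, "i went to kyoto".split())
print(corrected)  # "kyoto" now has probability 0.5, "tokyo" about 0.167
```

Before the correction, "kyoto" and "tokyo" are equally likely for the listener. Afterwards, the word the turn holder just said is preferred, while the model still sums to one.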

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventor: Nagatomo, Kentaro

Granted Patent: US 8,818,801 B2
US Class Current: 704/240
CPC Class Codes: G10L 15/18 using natural language mode...
