Method for segmenting utterances by using partner's response

US 8,793,132 B2
Filed: 12/26/2007
Issued: 07/29/2014
Est. Priority Date: 12/26/2006
Status: Active Grant

Abstract

An apparatus, method and program for dividing a conversational dialog into utterances. The apparatus includes: a computer processor; a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in the channel carrying the main speech of a conversational dialog input on at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not carrying the main speech; a boundary-candidate extracting section which extracts boundary candidates in the main speech by extracting pauses that lie within a predetermined range before and after a base point, the base point being the acknowledgement location; and a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates, after dividing the segmented speech into optimal utterances with reference to the word database and grammar database.
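
The boundary-candidate extraction described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Interval` type, the `extract_boundary_candidates` function, the `window` parameter (standing in for the "predetermined range") and all timestamps are hypothetical, and pauses and acknowledgements are assumed to have been detected already on their separate channels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time span in seconds on one audio channel (hypothetical type)."""
    start: float
    end: float


def extract_boundary_candidates(pauses, acknowledgements, window=1.0):
    """Return pauses in the main speech that lie near an acknowledgement.

    pauses: pause intervals detected on the main speaker's channel.
    acknowledgements: times of acknowledgements on the other channel.
    window: assumed "predetermined range" in seconds, before and after.
    """
    candidates = []
    for pause in pauses:
        for ack in acknowledgements:
            # Keep the pause if it overlaps [ack - window, ack + window].
            if pause.start <= ack + window and pause.end >= ack - window:
                candidates.append(pause)
                break  # one nearby acknowledgement is enough
    return candidates


# Made-up timestamps for illustration only.
pauses = [Interval(1.2, 1.5), Interval(3.0, 3.1), Interval(6.4, 6.9)]
acks = [1.6, 7.5]
print(extract_boundary_candidates(pauses, acks, window=0.8))
```

With these values the pause at 3.0 s is dropped because no acknowledgement falls within 0.8 s of it, while the other two pauses are kept as boundary candidates.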

13 Claims

1. An apparatus for dividing a main speech of a first speaker in a conversational dialog comprising the first speaker and a second speaker into at least one utterance, the apparatus comprising:
- a computer processor configured to execute:
  
  a pause detecting section for detecting pauses in the main speech of the first speaker received from a first channel among at least two channels;
  
  an acknowledgement detecting section for detecting acknowledgements in a speech of the second speaker received from a second channel of the at least two channels, wherein the second channel is separate from the first channel;
  
  a boundary-candidate extracting section for extracting boundary candidates in the main speech of the first speaker received from the first channel based, at least in part, on identifying pauses detected by the pause detecting section that are located within a predetermined range before and/or after respective locations of the acknowledgements detected by the acknowledgement detecting section in the speech of the second speaker received from the second channel; and
  
  a recognizing unit for outputting a word string associated with at least one utterance formed by segmenting the main speech of the first speaker received from the first channel according to at least one of the extracted boundary candidates.
- Dependent claims (2, 3):
  - 2. The apparatus according to claim 1, wherein the recognizing unit is capable of accessing a word database for storing spellings and pronunciations of words and a grammar database for storing syntactic rules on words, and wherein the grammar database includes at least one of a fixed-phrase grammar, an acknowledgement grammar, and a recognition grammar.
  - 3. The apparatus according to claim 2, wherein the processor is further configured to execute a recognition-target segment determination unit for determining a recognition target segment to be divided into one or more utterances by referring to the fixed-phrase grammar, wherein:
    - the fixed-phrase grammar includes fixed phrases for starting and ending a confirmation; and
      
      the word database includes spellings and pronunciations of the fixed phrases for starting and ending a confirmation.

4. A method for dividing a main speech of a first speaker in a conversational dialog comprising the first speaker and a second speaker into at least one utterance, the method comprising the steps of:
- detecting pauses in the main speech of the first speaker received from a first channel of a plurality of channels;
  
  detecting acknowledgements in a speech of the second speaker received from a second channel of the plurality of channels, wherein the second channel is separate from the first channel;
  
  extracting boundary candidates from the main speech of the first speaker received from the first channel at least in part by identifying detected pauses that are located within a predetermined range before and after respective locations of the detected acknowledgements detected in the speech of the second speaker received from the second channel; and
  
  outputting a word string associated with at least one utterance formed by segmenting the main speech of the first speaker received from the first channel according to at least one of the extracted boundary candidates.
- Dependent claims (5, 6, 7, 8):
  - 5. The method according to claim 4, wherein, in the step of outputting the word string, likelihoods of speech segments divided by the boundary candidates are calculated in reference to a word database in which spellings and pronunciations of words are described and a grammar database in which syntactic rules on words are described, and wherein a word string of a speech segment having a highest likelihood is outputted after dividing the speech segment into at least one utterance.
  - 6. The method according to claim 4, wherein the grammar database includes at least one of a fixed-phrase grammar, an acknowledgement grammar, and a recognition grammar.
  - 7. The method according to claim 6, wherein:
    - the fixed-phrase grammar includes fixed phrases for starting and ending a confirmation;
      
      the word database includes spellings and pronunciations of the fixed phrases for starting and ending a confirmation.
  - 8. The method according to claim 6, the method further comprising determining a recognition target segment to be divided into utterances by referring to the fixed-phrase grammar.

9. A computer-readable storage device storing computer-executable instructions that, when executed by at least one processor, perform a method for dividing a main speech of a first speaker in a conversational dialog comprising the first speaker and a second speaker into at least one utterance, the method comprising:
- detecting pauses in the main speech of the first speaker received from a first channel of a plurality of channels;
  
  detecting acknowledgements in a speech of the second speaker received from a second channel of the plurality of channels, wherein the second channel is separate from the first channel;
  
  extracting boundary candidates from the main speech of the first speaker received from the first channel at least in part by identifying detected pauses that are located within a predetermined range before and after respective locations of the detected acknowledgements detected in the speech of the second speaker received from the second channel; and
  
  outputting a word string associated with at least one utterance formed by segmenting the main speech of the first speaker received from the first channel according to at least one of the extracted boundary candidates.
- Dependent claims (10, 11, 12, 13):
  - 10. The computer-readable storage device according to claim 9, wherein outputting the word string comprises calculating likelihoods of speech segments divided by the boundary candidates in reference to a word database in which spellings and pronunciations of words are described and a grammar database in which syntactic rules on words are described, and wherein a word string of a speech segment having a highest likelihood is outputted after dividing the speech segment into at least one utterance.
  - 11. The computer-readable storage device according to claim 9, wherein the grammar database includes at least one of a fixed-phrase grammar, an acknowledgement grammar, and a recognition grammar.
  - 12. The computer-readable storage device according to claim 11, wherein:
    - the fixed-phrase grammar includes fixed phrases for starting and ending a confirmation; and
      
      the word database includes spellings and pronunciations of the fixed phrases for starting and ending a confirmation.
  - 13. The computer-readable storage device according to claim 11, the method further comprising determining a recognition target segment to be divided into utterances by referring to the fixed-phrase grammar.
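
The likelihood-based selection recited in claims 5 and 10 could look roughly like the sketch below. It is an illustration under stated assumptions, not the claimed implementation: `best_segment`, the `recognize` callback and the toy scores are all hypothetical, and a real recognizer would derive its likelihoods from the word database and grammar database rather than from a lookup table.

```python
def best_segment(start, candidates, recognize):
    """Pick the candidate end point whose segment has the highest likelihood.

    recognize(start, end) is a hypothetical callback returning
    (word_string, log_likelihood) for the audio between start and end.
    Returns (end, word_string, log_likelihood), or None if there are
    no candidates.
    """
    best = None
    for end in candidates:
        words, score = recognize(start, end)
        if best is None or score > best[2]:
            best = (end, words, score)
    return best


# Toy stand-in for a recognizer; the scores are made up for illustration.
toy_scores = {
    2.0: ("one two", -4.0),
    3.5: ("one two three four", -2.5),
    5.0: ("one two three four uh", -6.0),
}


def toy_recognize(start, end):
    # Ignores start and looks up a canned result by end time.
    return toy_scores[end]


end, words, score = best_segment(0.0, [2.0, 3.5, 5.0], toy_recognize)
print(end, words, score)
```

Passing the recognizer as a callback keeps the selection logic independent of how likelihoods are actually computed.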

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Itoh, Nobuyasu; Kurata, Gakuto
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Serrou, Abdelali
Application Number: US11/964,051
Publication Number: US 20080154594A1
Time in Patent Office: 2,407 Days
Field of Search: 704/253, 704/248, 704/E19.005, 704/E19.003, 704/251
US Class Current: 704/253
CPC Class Codes:
G10L 15/04 Segmentation; Word boundary...
G10L 15/19 Grammatical context, e.g. d...
