VOICE RECOGNITION APPARATUS, VOICE RECOGNITION METHOD AND PROGRAM

US 20160005394A1
Filed: 12/20/2013
Published: 01/07/2016
Est. Priority Date: 02/14/2013
Status: Active Grant

Abstract

Provided are an apparatus and a method for rapidly extracting a target sound from a sound signal in which a variety of sounds generated by a plurality of sound sources are mixed. A voice recognition apparatus includes a tracking unit for detecting a sound source direction and a voice segment to execute a sound source extraction process, and a voice recognition unit for inputting a sound source extraction result to execute a voice recognition process. In the tracking unit, a segment being created management unit, which creates and manages a voice segment per unit of sound source, sequentially detects a sound source direction, sequentially updates a voice segment estimated by connecting detection results in the time direction, creates an extraction filter for sound source extraction after a predetermined time has elapsed, and sequentially creates a sound source extraction result by sequentially applying the extraction filter to an input voice signal. The voice recognition unit sequentially executes the voice recognition process on the partial sound source extraction result to output a voice recognition result.
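The sequential behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `SegmentTracker`, the parameter `filter_delay`, and the placeholder `estimate_filter` (a plain channel average) are assumptions made only for illustration. The sketch shows one point from the text: no output is produced until a predetermined number of frames has been observed, after which the filter is applied to the backlog and then to each new frame as it arrives.

```python
from dataclasses import dataclass, field
from typing import List, Optional

Frame = List[float]  # one observation frame, one value per microphone (assumed layout)


def estimate_filter(frames: List[Frame]) -> List[float]:
    """Placeholder filter estimate (assumption): equal weight for every channel."""
    n_channels = len(frames[0])
    return [1.0 / n_channels] * n_channels


@dataclass
class SegmentTracker:
    """Hypothetical stand-in for one 'segment being created management unit'."""
    filter_delay: int                          # frames to wait before creating the filter
    frames: List[Frame] = field(default_factory=list)
    weights: Optional[List[float]] = None      # extraction filter, None until created
    emitted: int = 0                           # number of frames already extracted

    def push(self, frame: Frame) -> List[float]:
        """Store one frame and return the newly available partial extraction result."""
        self.frames.append(frame)
        if self.weights is None:
            if len(self.frames) < self.filter_delay:
                return []                      # still accumulating observations
            self.weights = estimate_filter(self.frames)
        # Apply the filter to every frame not yet emitted (backlog first, then live frames).
        pending = self.frames[self.emitted:]
        self.emitted = len(self.frames)
        return [sum(w * x for w, x in zip(self.weights, f)) for f in pending]
```

Each non-empty list returned by `push` would be handed to the recognizer immediately, which is what allows recognition to start before the end of the voice segment is known.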

20 Claims

1. A voice recognition apparatus, comprising:
- a tracking unit for detecting a sound source direction and a voice segment to execute a sound source extraction process; and
- a voice recognition unit for inputting a sound source extraction result from the tracking unit to execute a voice recognition process,
- the tracking unit creating a segment being created management unit that creates and manages a voice segment per unit of sound source,
- each segment being created management unit created
  - sequentially detecting a sound source direction to execute a voice segment creation process that sequentially updates a voice segment estimated by connecting a detection result to a time direction,
  - creating an extraction filter for a sound source extraction after a predetermined time is elapsed from a voice segment beginning, and
  - sequentially applying the extraction filter created to an input voice signal to sequentially create a partial sound source extraction result of a voice segment,
- the tracking unit sequentially outputting the partial sound source extraction result created by the segment being created management unit to the voice recognition unit,
- the voice recognition unit sequentially executing the voice recognition process to the partial sound source extraction result inputted from the tracking unit to output a voice recognition result.
  - 2. The voice recognition apparatus according to claim 1, wherein the tracking unit executes a voice segment creation process to connect collectively a plurality of sound source direction information detected in accordance with a plurality of different methods to a time direction in each segment being created management unit.
  - 3. The voice recognition apparatus according to claim 1, wherein the tracking unit immediately executes beginning or end determination process if it detects that a user's sign detected from an input image from an image input unit represents beginning or end of a voice segment.
  - 4. The voice recognition apparatus according to claim 1, wherein the segment being created management unit of the tracking unit creates an extraction filter for preferentially extracting a voice of a specific sound source from an observation signal by utilizing an observation signal inputted from a time before beginning of a voice segment to a time when a filter is created.
  - 5. The voice recognition apparatus according to claim 1, wherein the segment being created management unit of the tracking unit applies an extraction filter for preferentially extracting a voice of a specific sound source from an observation signal, estimates a whole dead corner space filter that attenuates a voice of all sound sources included in the observation signal used in the estimation of the extraction filter, and subtracts a result of applying the whole dead corner space filter from a result of applying the extraction filter to remove a disturbing sound not included in the observation signal and to create a sound source extraction result.
  - 6. The voice recognition apparatus according to claim 1, wherein the segment being created management unit of the tracking unit changes a mask that decreases a transmittance of the observation signal for each frequency and each time as a proportion of a sound other than a target sound is higher than a target sound in the observation signal corresponding to the segment being created, executes time frequency masking process that sequentially applies the mask to the observation signal, and extracts a sound source of the target sound.
  - 7. The voice recognition apparatus according to claim 1, further comprising:
    - an extraction result buffering unit for temporarily storing the sound source extraction result generated by the tracking unit; and
    - a ranking unit for determining a priority to output a plurality of the sound source extraction results corresponding to the respective sound sources stored in the extraction result buffering unit,
    - the ranking unit setting a priority of the sound source extraction result corresponding to the voice segment having the beginning or the end determined based on a user's explicit sign.
  - 8. The voice recognition apparatus according to claim 7, wherein
    - the tracking unit sets a “registered attribute” in order to identify a voice segment set based on a speaker's explicit sign provided based on an image analysis, and
    - the ranking unit executes a process that sets a priority of the voice segment to which the registered attribute is set to high.
  - 9. The voice recognition apparatus according to claim 8, wherein the ranking unit determines a priority to output to the voice recognition unit by applying the following scales:
    - (Scale 1) the voice segment having the attribute of “registered” has a priority, if there are a plurality of the voice segments having the attribute of “registered”, the voice segment having the earliest beginning has a priority;
    - (Scale 2) as to the voice segment not having the attribute of “registered”, the voice segment having the end already determined has a priority, if there are a plurality of the voice segments having the ends already determined, the voice segment having the earliest end has a priority;
    - (Scale 3) the voice segment having the end not determined, the voice segment having the earliest beginning has a priority.
  - 10. The voice recognition apparatus according to claim 7, wherein the voice recognition unit has a plurality of decoders for executing a voice recognition process, requests an output of a sound source extraction result generated by the tracking unit in accordance with availability of the decoders, inputs a sound source extraction result in accordance with the priority, and preferentially executes a voice recognition on a sound source extraction result having a high priority.
  - 11. The voice recognition apparatus according to claim 1, wherein the tracking unit creates a feature amount adapted to a form used in a voice recognition of the voice recognition unit in each segment being created management unit, and outputs the feature amount created to the voice recognition unit.
  - 12. The voice recognition apparatus according to claim 11, wherein the feature amount is a Mel-Frequency Cepstral Coefficient.
  - 13. The voice recognition apparatus according to claim 1, further comprising:
    - a sound input unit including a microphone array;
    - an image input unit having a camera;
    - a sound source direction estimation unit for estimating a sound source direction based on an inputted sound from the sound input unit; and
    - an image process unit for analyzing a sound source direction based on an analysis of an inputted image from the image input unit,
    - the tracking unit creating one integrated sound source direction information by applying sound source direction information created by the sound source direction estimation unit and sound source direction information created by the image process unit.
  - 14. The voice recognition apparatus according to claim 13, wherein the image process unit includes
    - a lip image process unit for detecting a movement of a speaker's lip area based on an analysis of an input image from the image input unit; and
    - a hand image process unit for detecting a movement of a speaker's hand area.
  - 15. The voice recognition apparatus according to claim 13, wherein the tracking unit
    - sets a “registered attribute” in order to identify a voice segment set based on a speaker's explicit sign inputted from the image process unit, and
    - performs a merge process between a voice segment having a registered attribute and a voice segment not having a registered attribute for integrating other voice segment into the voice segment having a registered attribute.
  - 16. The voice recognition apparatus according to claim 15, wherein the tracking unit, in the voice segment having a registered attribute, if sound source direction information is not inputted, direction information is automatically generated to execute a voice segment extension process.
  - 17. The voice recognition apparatus according to claim 1, wherein the voice recognition unit is configured to
    - include a plurality of recognition tasks each being a pair of a dictionary having a vocabulary to be recognized and a language model, and
    - execute a meaning estimation process for searching a task most adaptable to a user's speech among a plurality of different tasks.
  - 18. The voice recognition apparatus according to claim 1, further comprising:
    - a configuration that a pointer is moved on a display unit by synchronizing with a speaker's hand movement provided based on an analysis of a captured image of the speaker, and beginning or end of a speech segment is determined depending on a movement of the pointer.
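The three scales recited in claim 9 can be read as a sort key. The following Python sketch is one possible reading, not the patented implementation: the `Segment` type and its field names are hypothetical, and the sketch assumes that Scale 1 outranks Scale 2 and Scale 2 outranks Scale 3, which the claim's ordering suggests but does not state outright.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical record for one buffered voice segment."""
    name: str
    begin: float                    # segment beginning time
    end: Optional[float] = None     # None while the end is not yet determined
    registered: bool = False        # True if set by the speaker's explicit sign


def priority_key(seg: Segment) -> Tuple[int, float]:
    """Smaller tuples are output first."""
    if seg.registered:
        return (0, seg.begin)       # Scale 1: registered, earliest beginning first
    if seg.end is not None:
        return (1, seg.end)         # Scale 2: end determined, earliest end first
    return (2, seg.begin)           # Scale 3: end undetermined, earliest beginning first


def rank(segments: List[Segment]) -> List[Segment]:
    """Return segments in the order they would be passed to the recognizer."""
    return sorted(segments, key=priority_key)
```

A tuple key keeps the three scales in one comparison, so adding a further tie-breaker would only require extending the tuple.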

19. A voice recognition method executed by a voice recognition apparatus, the voice recognition apparatus, comprising:
- a tracking unit for detecting a sound source direction and a voice segment to execute a sound source extraction process; and
- a voice recognition unit for inputting a sound source extraction result from the tracking unit to execute a voice recognition process,
- the tracking unit creating a segment being created management unit that creates and manages a voice segment per unit of sound source,
- each segment being created management unit created
  - sequentially detecting a sound source direction to execute a voice segment creation process that sequentially updates a voice segment by connecting a detection result to a time direction,
  - creating an extraction filter for a sound source extraction after a predetermined time is elapsed from a voice segment beginning, and
  - sequentially applying the extraction filter created to an input voice signal to sequentially create a partial sound source extraction result of a voice segment,
- the tracking unit sequentially outputting the partial sound source extraction result created by the segment being created management unit to the voice recognition unit,
- the voice recognition unit sequentially executing the voice recognition process to the partial sound source extraction result inputted from the tracking unit to output a voice recognition result.

20. A program for executing a voice recognition method executed by a voice recognition apparatus, the voice recognition apparatus, comprising:
- a tracking unit for detecting a sound source direction and a voice segment to execute a sound source extraction process; and
- a voice recognition unit for inputting a sound source extraction result from the tracking unit to execute a voice recognition process,
- the program allows
  - the tracking unit to create a segment being created management unit that creates and manages a voice segment per unit of sound source,
  - each segment being created management unit created
    - to sequentially detect a sound source direction to execute a voice segment creation process that sequentially updates a voice segment by connecting a detection result to a time direction,
    - to create an extraction filter for a sound source extraction after a predetermined time is elapsed from a voice segment beginning, and
    - to sequentially apply the extraction filter created to an input voice signal to sequentially create a partial sound source extraction result of a voice segment,
  - the tracking unit to sequentially output the partial sound source extraction result created by the segment being created management unit to the voice recognition unit,
  - the voice recognition unit to sequentially execute the voice recognition process to the partial sound source extraction result inputted from the tracking unit to output a voice recognition result.
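The time frequency masking recited in claim 6 can be sketched in a few lines of Python. The claim only requires that the mask lower the transmittance where sounds other than the target dominate; it does not give a formula. The ratio mask below is a common choice swapped in for illustration, and the function names, the `eps` guard, and the assumption that per-bin power estimates for the target and for other sounds are already available are all hypothetical.

```python
from typing import List

Grid = List[List[float]]  # rows = time frames, columns = frequency bins (assumed layout)


def soft_mask(target_power: Grid, other_power: Grid, eps: float = 1e-12) -> Grid:
    """Ratio mask in [0, 1]: falls toward 0 as non-target power grows in a bin."""
    return [
        [t / (t + o + eps) for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_power, other_power)
    ]


def apply_mask(mask: Grid, observation: Grid) -> Grid:
    """Scale each time-frequency bin of the observation by its mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, observation)
    ]
```

In a sequential setting, `soft_mask` and `apply_mask` would be called one frame (one row) at a time as new observation frames arrive, so the mask can change for each frequency and each time.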

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
HIROE, Atsuo

Granted Patent

US 10,475,440 B2
US Class Current

1/1
CPC Class Codes

G10L 15/04   Segmentation; Word boundary...

G10L 15/28   Constructional details of s...

G10L 21/0272   Voice signal separating
