Speech recognition candidate selection based on non-acoustic input

US 9,626,001 B2
Filed: 11/13/2014
Issued: 04/18/2017
Est. Priority Date: 11/13/2014
Status: Active Grant

Abstract

A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
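The selection step can be illustrated with a minimal sketch. It assumes the scene has already been segmented and that each region carries a set of computed properties as plain strings (for example a textual label read by optical character recognition). The `Region` class, the `select_candidate` function, and the word-overlap score are hypothetical choices made for illustration; the patent does not prescribe a particular scoring rule.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One segmented region of the observed scene (hypothetical structure)."""
    # Computed properties as strings, e.g. an OCR label, a color, a shape.
    properties: set = field(default_factory=set)


def select_candidate(candidates, regions):
    """Pick the candidate whose words overlap most with the scene properties.

    Ties keep the earlier candidate, so the recognizer's own ordering acts
    as the fallback when the scene offers no distinguishing evidence.
    """
    scene_terms = {p.lower() for r in regions for p in r.properties}

    def score(candidate):
        # Count candidate words that also appear as a region property.
        return sum(1 for word in candidate.lower().split() if word in scene_terms)

    return max(candidates, key=score)  # max() returns the first maximal item


# Two acoustically similar hypotheses for the same utterance.
candidates = ["pick up the flower box", "pick up the flour box"]
# "flour" stands in for a textual label determined by OCR on one region.
regions = [Region({"flour", "white", "box"}), Region({"red", "cup"})]
selected = select_candidate(candidates, regions)
print(selected)
```

Here the second hypothesis matches two scene terms ("flour" and "box") against one for the first, so it is selected even though the two sound alike.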

20 Claims

1. An apparatus, comprising:
- a memory; and
  
  a processor operatively coupled to the memory and configured to:
  
  receive a speech input;
  
  generate at least two speech recognition candidates from the speech input;
  
  observe a scene related to the speech input using one or more non-acoustic sensors;
  
  segment the observed scene into one or more regions;
  
  compute one or more properties for the one or more regions, wherein the computation of the one or more properties comprises a determination of a textual label using optical character recognition; and
  
  select one of the speech recognition candidates based on the one or more computed properties of the one or more regions.
- Dependent claims (2, 3, 4, 5, 6, 7, 15, 16, 17):
  - 2. The apparatus of claim 1, wherein each speech recognition candidate comprises one of a word and a phrase.
  - 3. The apparatus of claim 1, wherein the speech recognition candidate is selected based on a comparison of the one or more computed properties of the one or more regions with the two or more speech recognition candidates generated based on the speech input.
  - 4. The apparatus of claim 3, wherein the processor is further configured to interpret the selected speech recognition candidate as a portion of an action command.
  - 5. The apparatus of claim 1, wherein the one or more properties further comprise a color and a shape.
  - 6. The apparatus of claim 1, wherein the computation of the one or more relational properties of the given region comprises a determination of one or more spatial relations between the given region and the remaining ones of the one or more regions.
  - 7. The apparatus of claim 1, wherein the computation of the one or more properties of the one or more regions comprises a detection of one or more gestures.
  - 15. The apparatus of claim 3, wherein the processor is further configured to transcribe the selected speech recognition candidate as text for display on a device.
  - 16. The apparatus of claim 1, wherein the one or more non-acoustic sensors comprise at least one camera.
  - 17. The apparatus of claim 1, wherein each of the one or more segmented regions corresponds to an object or a surface in the observed scene.
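The spatial relations mentioned in claim 6 can be sketched as follows. This is an illustrative example only: it assumes each region is summarized by an axis-aligned bounding box in image coordinates, and the `Box` class, the `spatial_relations` function, and the four relation names are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a region, in image pixels (hypothetical)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def spatial_relations(given, others):
    """Return (relation, other_label) pairs for the given region.

    Image coordinates are assumed: x grows rightward, y grows downward.
    Overlapping extents on an axis yield no relation for that axis.
    """
    relations = set()
    for other in others:
        if given.x_max <= other.x_min:
            relations.add(("left of", other.label))
        elif given.x_min >= other.x_max:
            relations.add(("right of", other.label))
        if given.y_max <= other.y_min:
            relations.add(("above", other.label))
        elif given.y_min >= other.y_max:
            relations.add(("below", other.label))
    return relations


cup = Box("cup", 10, 40, 30, 60)
box = Box("box", 50, 35, 90, 70)
shelf = Box("shelf", 0, 80, 100, 90)
cup_relations = spatial_relations(cup, [box, shelf])
print(sorted(cup_relations))
```

Relations such as ("left of", "box") could then be matched against a phrase like "the cup left of the box" when comparing candidates with the scene.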

8. An article of manufacture comprising a computer readable storage medium for storing computer readable program code which, when executed, causes a computer to:
- receive a speech input;
  
  generate at least two speech recognition candidates from the speech input;
  
  observe a scene related to the speech input using one or more non-acoustic sensors;
  
  segment the observed scene into one or more regions;
  
  compute one or more properties for the one or more regions, wherein the computation of the one or more properties comprises program code to determine a textual label using optical character recognition; and
  
  select one of the speech recognition candidates based on the one or more computed properties of the one or more regions.
- Dependent claims (9, 10, 11, 12, 13, 14, 18, 19, 20):
  - 9. The article of claim 8, wherein each speech recognition candidate comprises one of a word and a phrase.
  - 10. The article of claim 8, wherein the speech recognition candidate is selected based on a comparison of the one or more computed properties of the one or more regions with the two or more speech recognition candidates generated based on the speech input.
  - 11. The article of claim 10, further comprising program code to interpret the selected speech recognition candidate as a portion of an action command.
  - 12. The article of claim 8, wherein the one or more properties further comprise a color and a shape.
  - 13. The article of claim 8, wherein the computation of the one or more properties of the one or more regions comprises program code to determine one or more spatial relations between the one or more regions.
  - 14. The article of claim 8, wherein the computation of the one or more properties of the one or more regions comprises program code to detect one or more gestures, and further comprising program code to associate the one or more detected gestures with at least a portion of the one or more regions.
  - 18. The article of claim 10, further comprising program code to transcribe the selected speech recognition candidate as text for display on a device.
  - 19. The article of claim 8, wherein the one or more non-acoustic sensors comprise at least one camera.
  - 20. The article of claim 8, wherein each of the one or more segmented regions corresponds to an object or a surface in the observed scene.
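Claim 14 recites associating a detected gesture with at least a portion of the regions. One possible reading, sketched below, treats a pointing gesture as a single image point and associates it with the region whose bounding box contains that point. The `Region` class, the `associate_gesture` function, and the smallest-box tie-break are assumptions made for illustration; the patent does not specify this procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Bounding box of a segmented region (hypothetical representation)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

    def area(self):
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def associate_gesture(point, regions) -> Optional[Region]:
    """Return the region a pointing gesture lands in, or None if it hits none.

    When boxes overlap, the smallest containing box wins, on the assumption
    that it is the most specific target.
    """
    x, y = point
    hits = [r for r in regions if r.contains(x, y)]
    if not hits:
        return None
    return min(hits, key=Region.area)


regions = [
    Region("table", 0, 50, 200, 150),
    Region("flour box", 40, 60, 80, 100),
]
target = associate_gesture((55, 70), regions)
print(target.label if target else None)
```

The pointed-at region's properties could then be weighted more heavily when choosing among speech recognition candidates.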

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Connell, II, Jonathan H.; Marcheret, Etienne
Primary Examiner(s): Neway, Samuel G
Application Number: US14/540,527
Publication Number: US 20160140955A1
Time in Patent Office: 887 Days

CPC Class Codes

G06F 3/017   Gesture based interaction, ...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/19   Grammatical context, e.g. d...

G10L 15/22   Procedures used during a sp...

G10L 15/24   Speech recognition using no...

G10L 15/25   using position of the lips,...

G10L 15/26   Speech to text systems G10L...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/226   using non-speech characteri...

G10L 2021/065   Aids for the handicapped in...

G10L 21/10   Transforming into visible i...
