Speech recognition candidate selection based on non-acoustic input

US 9,805,720 B2
Filed: 01/13/2017
Issued: 10/31/2017
Est. Priority Date: 11/13/2014
Status: Expired due to Fees

Abstract

A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
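The selection step can be illustrated with a minimal sketch. It assumes the segmentation and property computation have already produced, for each region, a color, a shape and a textual label, and it scores each speech recognition candidate by how many of its words match those properties. The names `Region`, `region_vocabulary` and `select_candidate`, and the word-overlap scoring rule, are hypothetical choices for illustration; the patent text here does not specify how the comparison is performed.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical container for the characteristics of one scene region."""
    color: str
    shape: str
    label: str  # textual label, e.g. an object name or text read from the object


def region_vocabulary(region):
    """Collect the lowercase words that describe a region."""
    words = {region.color, region.shape}
    words.update(region.label.split())
    return {w.lower() for w in words if w}


def select_candidate(candidates, regions):
    """Return the candidate sharing the most words with the scene description.

    Assumption: simple word overlap stands in for whatever comparison a real
    system would use. Ties go to the earliest candidate in the list.
    """
    vocab = set()
    for region in regions:
        vocab |= region_vocabulary(region)

    def score(candidate):
        return sum(1 for word in candidate.lower().split() if word in vocab)

    return max(candidates, key=score)


if __name__ == "__main__":
    scene = [Region(color="blue", shape="cube", label="block")]
    hypotheses = ["pick up the glue block", "pick up the blue block"]
    print(select_candidate(hypotheses, scene))
```

Here the acoustically similar words "glue" and "blue" are disambiguated because only "blue" matches a computed property of a region in the observed scene.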

20 Claims

1. A method, comprising:
- receiving a speech input;
- generating at least two speech recognition candidates from the speech input;
- observing a scene related to the speech input using one or more non-acoustic sensors;
- segmenting the observed scene into a plurality of regions, wherein each of the regions corresponds to an object or a surface in the observed scene;
- computing properties for at least a given region of the plurality of regions, wherein computing the properties for the given region comprises computing one or more characteristics for the given region and computing one or more relationships between the given region and remaining ones of the plurality of regions, and wherein the one or more characteristics of the given region comprise a color, a shape and a textual label; and
- selecting one of the speech recognition candidates based at least in part on the computed properties of the given region.

2. The method of claim 1, wherein each speech recognition candidate comprises one of a word and a phrase.

3. The method of claim 1, wherein the speech recognition candidate is selected based on a comparison of the computed properties of the given region with the two or more speech recognition candidates generated based on the speech input.

4. The method of claim 3, further comprising transcribing the selected speech recognition candidate as text for display on a device.

5. The method of claim 3, further comprising interpreting the selected speech recognition candidate as a portion of an action command.

6. The method of claim 1, wherein the one or more non-acoustic sensors comprise at least one camera.

7. The method of claim 1, wherein the one or more relationships between the given region and remaining ones of the plurality of regions comprise one or more spatial relations between the given region and the remaining ones of the plurality of regions.

8. The method of claim 1, wherein computing the properties for the given region further comprises detecting one or more gestures.

9. The method of claim 8, further comprising associating the one or more detected gestures with the given region.
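The spatial relations of claim 7 and the gesture association of claims 8 and 9 can be sketched as follows. This is an illustrative sketch only: it assumes each region is summarized by a centroid in image coordinates (x increasing to the right), that a detected pointing gesture is reduced to a single image point, and that a gesture is associated with the nearest region. The names `SceneRegion`, `spatial_relations` and `associate_gesture` are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneRegion:
    """Hypothetical region summary: a name plus a centroid in image coordinates."""
    name: str
    cx: float
    cy: float


def spatial_relations(given, others):
    """List left/right relations between the given region and the remaining ones."""
    relations = []
    for other in others:
        if given.cx < other.cx:
            relations.append(("left_of", other.name))
        elif given.cx > other.cx:
            relations.append(("right_of", other.name))
    return relations


def associate_gesture(point, regions):
    """Associate a pointing location with the region whose centroid is nearest."""
    px, py = point
    return min(regions, key=lambda r: math.hypot(r.cx - px, r.cy - py))


if __name__ == "__main__":
    cup = SceneRegion("cup", 10.0, 50.0)
    book = SceneRegion("book", 80.0, 50.0)
    print(spatial_relations(cup, [book]))
    print(associate_gesture((75.0, 55.0), [cup, book]).name)
```

Relations such as "left of" give the selection step extra words to compare against candidates like "the cup left of the book", and the region a user points at can be weighted more heavily than the others.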

10. An apparatus, comprising:
- a memory; and
- a processor operatively coupled to the memory and configured to:
  - receive a speech input;
  - generate at least two speech recognition candidates from the speech input;
  - observe a scene related to the speech input using one or more non-acoustic sensors;
  - segment the observed scene into a plurality of regions, wherein each of the regions corresponds to an object or a surface in the observed scene;
  - compute properties for at least a given region of the plurality of regions, wherein the computation of the properties for the given region comprises a computation of one or more characteristics for the given region and a computation of one or more relationships between the given region and remaining ones of the plurality of regions, and wherein the one or more characteristics of the given region comprise a color, a shape and a textual label; and
  - select one of the speech recognition candidates based at least in part on the computed properties of the given region.

11. The apparatus of claim 10, wherein each speech recognition candidate comprises one of a word and a phrase.

12. The apparatus of claim 10, wherein the speech recognition candidate is selected based on a comparison of the computed properties of the given region with the two or more speech recognition candidates generated based on the speech input.

13. The apparatus of claim 12, further comprising interpreting the selected speech recognition candidate as a portion of an action command.

14. The apparatus of claim 10, wherein the one or more relationships between the given region and remaining ones of the plurality of regions comprise one or more spatial relations between the given region and the remaining ones of the plurality of regions.

15. The apparatus of claim 10, wherein the computation of the properties of the given region further comprises a detection of one or more gestures, and wherein the processor is further configured to associate the one or more detected gestures with the given region.

16. An article of manufacture comprising a computer readable storage medium for storing computer readable program code which, when executed, causes a computer to:
- receive a speech input;
- generate at least two speech recognition candidates from the speech input;
- observe a scene related to the speech input using one or more non-acoustic sensors;
- segment the observed scene into a plurality of regions, wherein each of the regions corresponds to an object or a surface in the observed scene;
- compute properties for at least a given region of the plurality of regions, wherein the computation of the properties for the given region comprises a computation of one or more characteristics for the given region and a computation of one or more relationships between the given region and remaining ones of the plurality of regions, and wherein the one or more characteristics of the given region comprise a color, a shape and a textual label; and
- select one of the speech recognition candidates based at least in part on the computed properties of the given region.

17. The article of manufacture of claim 16, wherein each speech recognition candidate comprises one of a word and a phrase.

18. The article of manufacture of claim 16, wherein the one or more relationships between the given region and the remaining ones of the plurality of regions comprise one or more spatial relations between the given region and the remaining ones of the plurality of regions.

19. The article of manufacture of claim 16, wherein the speech recognition candidate is selected based on a comparison of the computed properties of the given region with the two or more speech recognition candidates generated based on the speech input.

20. The article of manufacture of claim 16, wherein the computation of the properties of the given region further comprises a detection of one or more gestures, and wherein the processor is further configured to associate the one or more detected gestures with the given region.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Connell, II, Jonathan H.; Marcheret, Etienne
Primary Examiner(s): Neway, Samuel G
Application Number: US15/405,416
Publication Number: US 20170133016A1
Time in Patent Office: 291 Days

CPC Class Codes
- G06F 3/017   Gesture based interaction, ...
- G10L 15/142   Hidden Markov Models [HMMs]
- G10L 15/19   Grammatical context, e.g. d...
- G10L 15/22   Procedures used during a sp...
- G10L 15/24   Speech recognition using no...
- G10L 15/25   using position of the lips,...
- G10L 15/26   Speech to text systems G10L...
- G10L 2015/223   Execution procedure of a sp...
- G10L 2015/226   using non-speech characteri...
- G10L 2021/065   Aids for the handicapped in...
- G10L 21/10   Transforming into visible i...
