Multi-level speech recognition

US 9,305,554 B2
Filed: 07/16/2014
Issued: 04/05/2016
Est. Priority Date: 07/17/2013
Status: Active Grant

Abstract

A method and device for recognizing an utterance. The method includes transmitting context data associated with a first device to a second device. A first speech recognition model is received from the second device. The first speech recognition model is a subset of a second speech recognition model present at the second device. The first speech recognition model is based on the context data. It is determined whether the utterance can be recognized at the first device based on the first speech recognition model. If the utterance cannot be recognized at the first device, then at least a portion of the utterance is sent to the second device. If the utterance can be recognized at the first device, then an action associated with the recognized utterance is performed.
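
The flow in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: plain text strings stand in for audio, a dictionary stands in for a speech recognition model, the subset is chosen by location only, and the second device is assumed to attempt recognition against its full model when it receives a forwarded utterance.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Context:
    """Context data for the first device (all fields are illustrative)."""
    location: str
    time_of_day: str
    activity: str


# Hypothetical full model held by the second device, partitioned by location.
FULL_MODEL: Dict[str, Dict[str, str]] = {
    "kitchen": {"start timer": "timer.start", "preheat oven": "oven.preheat"},
    "living_room": {"volume up": "tv.volume_up", "next channel": "tv.channel_next"},
}


def select_subset(context: Context) -> Dict[str, str]:
    """Second device: return the slice of the full model matching the context."""
    return dict(FULL_MODEL.get(context.location, {}))


def recognize_remotely(utterance: str) -> Optional[str]:
    """Second device: search the whole model (an assumption of this sketch)."""
    for commands in FULL_MODEL.values():
        if utterance in commands:
            return commands[utterance]
    return None


def handle_utterance(utterance: str, local_model: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """First device: act locally if recognized, otherwise forward the utterance."""
    action = local_model.get(utterance)
    if action is not None:
        return ("local", action)
    return ("remote", recognize_remotely(utterance))
```

With a kitchen context, `handle_utterance("start timer", model)` resolves locally, while `handle_utterance("volume up", model)` falls through to the second device because that phrase is outside the received subset.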

31 Claims

1. A method comprising:
- obtaining, by a first electronic device, context data comprising at least one of location, time and activity, wherein the context data is associated with the first electronic device;
  
  transmitting, by the first electronic device, the context data to a second electronic device;
  
  receiving, by the first electronic device, a first speech recognition model, the first speech recognition model being a subset of a second speech recognition model present at the second electronic device, wherein the first speech recognition model is selected based on the context data;
  
  determining, by the first electronic device, whether an utterance can be recognized by a speech recognition process, wherein the speech recognition process is performed by the first electronic device and uses the first speech recognition model;
  
  in response to determining that the utterance cannot be recognized by the speech recognition process, sending, by the first electronic device, at least a portion of the utterance to the second electronic device; and
  
  in response to determining that the utterance can be recognized by the speech recognition process, causing, by the first electronic device, performance of an action associated with the utterance.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
  - 2. The method of claim 1, wherein the second electronic device comprises a higher level of processing capability than the first electronic device.
  - 3. The method of claim 1, wherein:
    - the first speech recognition model includes at least one of: a first language model and a first acoustic model; and
      
      the second speech recognition model includes at least one of: a second language model and a second acoustic model.
  - 4. The method of claim 1 wherein the action associated with the utterance is performed locally at the first electronic device.
  - 5. The method of claim 1 further comprising:
    - upon a determination that the first electronic device is near a third electronic device, receiving, by the first electronic device, a third speech recognition model from the third electronic device; and
      
      transmitting, by the first electronic device, speech recognition information to the third electronic device, wherein the speech recognition information is generated by recognizing an utterance that is received by the third electronic device.
  - 6. The method of claim 1, wherein:
    - the determining whether the utterance can be recognized by the speech recognition process comprises determining whether a threshold value is exceeded; and
      
      the threshold value is based on at least one of: an estimated word error rate, a length of an utterance, presence of keywords in an utterance, availability of a network connection, prior history of processing an utterance, and a processing capability of the first electronic device.
  - 7. The method of claim 1, further comprising:
    - receiving, by the first electronic device, speech recognition information that is based on selective pre-processing of speech recognition, wherein the selective pre-processing is based on the context data.
  - 8. The method of claim 4, further comprising determining, by the first electronic device, a particular electronic device that is in a vicinity of the first electronic device and that is capable of executing the action.
  - 9. The method of claim 1, wherein:
    - the speech recognition process is configured to occur within a hierarchy of electronic devices and includes a guaranteed order of execution of actions; and
      
      the guaranteed order of execution of actions is based on using a gateway configured to determine order of action execution.
  - 10. The method of claim 1, wherein:
    - the speech recognition process includes speech decoding;
      
      the speech decoding is selectively passed between the first electronic device and one or more electronic devices that have a higher processing level than the first electronic device; and
      
      the speech decoding is processed sequentially or hierarchically.
  - 11. The method of claim 10, wherein the speech decoding continues until a particular processing level is matched.
  - 12. The method of claim 1, further comprising:
    - determining, using access patterns, at least one automatic speech recognition (ASR) implementation, wherein the action is executed based on multiple language models and multiple acoustic models, and wherein multiple actionable commands are published by and are subscribed to by one or more networked devices.
  - 13. The method of claim 1, wherein the first electronic device is a mobile electronic device, a smart appliance device, smart television device, or a smart home system.
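
The threshold test in claim 6 could be sketched as follows. This is only one possible reading: the claim does not say what quantity is compared against the threshold, so the sketch assumes a local confidence score between 0 and 1. The `Factors` type, the weights, and the 8-word cutoff are all invented for illustration, and only four of the six listed factors are modeled.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    """Inputs to the threshold (a hypothetical subset of the factors in claim 6)."""
    estimated_wer: float      # estimated word error rate of the local model, 0.0 to 1.0
    utterance_words: int      # length of the utterance in words
    has_keyword: bool         # whether a known keyword was spotted
    network_available: bool   # whether the second device is reachable


def local_threshold(f: Factors) -> float:
    """Return the confidence a local result must exceed (weights are arbitrary)."""
    threshold = 0.5
    threshold += 0.3 * f.estimated_wer   # a weaker local model must be more confident
    if f.utterance_words > 8:
        threshold += 0.1                 # long utterances are treated as harder
    if f.has_keyword:
        threshold -= 0.2                 # keywords favor local handling
    if not f.network_available:
        threshold -= 0.3                 # with no fallback, accept more locally
    return min(max(threshold, 0.0), 1.0)


def can_recognize_locally(confidence: float, f: Factors) -> bool:
    """True when the local confidence exceeds the context-dependent threshold."""
    return confidence > local_threshold(f)
```

Under these assumed weights, a short keyword command clears the bar at a modest confidence, while a long utterance decoded by a weak local model is forwarded unless the network is unavailable.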

14. A first electronic device comprising:
- a processor device configured to obtain context data comprising at least one of location, time and activity, wherein the context data is associated with the first electronic device;
  
  a transmitter configured to transmit the context data to a second electronic device;
  
  a microphone configured to capture an utterance; and
  
  a speech processor configured to:
  
  receive a first speech recognition model, wherein the first speech recognition model is a subset of a second speech recognition model present at the second electronic device, and the first speech recognition model is selected based on the context data;
  
  determine whether the utterance can be recognized by a speech recognition process, wherein the speech recognition process is performed by the speech processor and uses the first speech recognition model;
  
  send at least a portion of the utterance to the second electronic device in response to a determination that the utterance cannot be recognized by the speech recognition process; and
  
  upon a determination that the utterance can be recognized by the speech recognition process, causing by the first electronic device, performance of an action associated with the utterance.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23)
  - 15. The first electronic device of claim 14, wherein the second electronic device comprises a higher level of processing capability than the first electronic device.
  - 16. The first electronic device of claim 14, wherein:
    - the first speech recognition model includes at least one of: a first language model and a first acoustic model; and
      
      the second speech recognition model includes at least one of: a second language model and a second acoustic model.
  - 17. The first electronic device of claim 14, wherein:
    - the speech processor is configured to:
      
      receive a third speech recognition model from a third electronic device; and
      
      when the first electronic device is near the third electronic device, process, using the third speech recognition model, an utterance received by the third electronic device;
      
      the transmitter is configured to transmit speech recognition information to the third electronic device; and
      
      the speech recognition information is generated by recognizing the utterance received by the third electronic device.
  - 18. The first electronic device of claim 14, wherein:
    - the speech processor is configured to determine whether a threshold value is exceeded; and
      
      the threshold value is based on at least one of: an estimated word error rate, a length of an utterance, presence of keywords in an utterance, availability of a network connection, prior history of processing an utterance, and a processing capability of the first electronic device.
  - 19. The first electronic device of claim 16, wherein:
    - the speech processor is configured to receive speech recognition information that is based on selective pre-processing speech recognition; and
      
      the selective pre-processing is based on the context data.
  - 20. The first electronic device of claim 16, wherein:
    - the action comprises commands;
      
      the commands are performable based on device capability;
      
      the speech recognition process includes speech decoding;
      
      the speech decoding is selectively passed between the first electronic device and one or more electronic devices that have a higher processing level than the first electronic device; and
      
      the speech decoding is processed sequentially or hierarchically.
  - 21. The first electronic device of claim 19, wherein:
    - the first electronic device is configured to receive at least one automatic speech recognition (ASR) implementation; and
      
      the at least one ASR implementation is selected based on access patterns.
  - 22. The first electronic device of claim 14, wherein:
    - the action is executed based on multiple language models and multiple acoustic models; and
      
      multiple actionable commands are published by and are subscribed to by one or more devices networked with the first electronic device.
  - 23. The first electronic device of claim 14, wherein the first electronic device is a mobile electronic device, a smart appliance, smart television device, or a smart home system.
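
Claim 22 recites actionable commands that are published by and subscribed to by networked devices. A minimal in-memory sketch of that pattern follows. `CommandBus` and the command names are hypothetical; the claim does not specify a transport or API, and a real deployment would use some network message broker rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class CommandBus:
    """Toy publish/subscribe broker for actionable commands (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], str]]] = defaultdict(list)

    def subscribe(self, command: str, handler: Callable[[str], str]) -> None:
        """A networked device registers interest in a command it can execute."""
        self._subscribers[command].append(handler)

    def publish(self, command: str, argument: str = "") -> List[str]:
        """A recognizing device announces a command; every subscriber handles it."""
        # .get avoids creating empty entries for commands nobody subscribed to.
        return [handler(argument) for handler in self._subscribers.get(command, [])]
```

In this sketch a lamp and a television could both subscribe to a `"power.off"` command, and the device that recognizes the utterance publishes it once without knowing which devices will act.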

24. A non-transitory processor-readable medium that includes a program that, when executed by a processor, performs a method comprising:
- obtaining, by a first electronic device, context data comprising at least one of location, time and activity, wherein the context data is associated with the first electronic device;
  
  transmitting, by the first electronic device, the context data to a second electronic device;
  
  receiving, by the first electronic device, a first speech recognition model, wherein the first speech recognition model is a subset of a second speech recognition model present at the second electronic device, wherein the first speech recognition model is selected based on the context data;
  
  determining, by the first electronic device, whether an utterance can be recognized by a speech recognition process, wherein the speech recognition process is performed by the first electronic device and uses the first speech recognition model;
  
  in response to determining that the utterance cannot be recognized by the speech recognition process, sending, by the first electronic device, at least a portion of the utterance to the second electronic device; and
  
  in response to determining that the utterance can be recognized by the speech recognition process, causing the first electronic device to perform an action associated with the utterance.
- Dependent claims (25, 26, 27, 28, 29, 30)
  - 25. The non-transitory processor-readable medium of claim 24, wherein the second electronic device comprises a higher level of processing capability than the first electronic device.
  - 26. The non-transitory processor-readable medium of claim 24, wherein:
    - the first speech recognition model includes at least one of: a first language model and a first acoustic model; and
      
      the second speech recognition model includes at least one of: a second language model and a second acoustic model.
  - 27. The non-transitory processor-readable medium of claim 24, the method further comprising:
    - upon a determination that the first electronic device is near a third electronic device, receiving, by the first electronic device, a third language model from the third electronic device;
      
      transmitting, by the first electronic device, speech recognition information to the third electronic device, wherein the speech recognition information is generated by recognizing an utterance that is received by the third electronic device;
      
      wherein:
      
      the determining whether the utterance can be recognized by the speech recognition process comprises determining whether a threshold value is exceeded; and
      
      the threshold value is based on at least one of: an estimated word error rate, a length of an utterance, presence of keywords in an utterance, availability of a network connection, prior history of recognizing an utterance, and a processing capability of the first electronic device.
  - 28. The non-transitory processor-readable medium of claim 24, the method further comprising:
    - receiving, by the first electronic device, speech recognition information that is based on selective pre-processing of speech recognition, wherein the selective pre-processing is based on the context data; and
      
      determining, by the first electronic device, a particular electronic device that is in a vicinity of the first electronic device and that is capable of executing the action.
  - 29. The non-transitory processor-readable medium of claim 24, wherein:
    - the speech recognition process includes speech decoding;
      
      speech decoding is selectively passed between the first electronic device and one or more electronic devices that have a higher processing level than the first electronic device;
      
      the speech decoding is processed sequentially or hierarchically;
      
      the speech decoding continues until a particular processing level is matched;
      
      the action is executed based on multiple language models and multiple acoustic models; and
      
      multiple actionable commands are published by and are subscribed to by one or more networked devices.
  - 30. The non-transitory processor-readable medium of claim 24, wherein the first electronic device is a mobile electronic device, a smart appliance device, smart television device, or a smart home system.
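
The sequential or hierarchical decoding in claim 29 might look like the sketch below. It assumes one interpretation: each device tries to decode, passes the utterance to the next higher processing level on failure, and stops once a chosen processing level has been reached. `Level`, `decode_hierarchically`, and `stop_level` are invented names, and dictionaries again stand in for real models.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Level:
    """One device in the hierarchy (fields are illustrative)."""
    name: str
    processing_level: int
    vocabulary: Dict[str, str]


def decode_hierarchically(
    utterance: str, levels: List[Level], stop_level: int
) -> Tuple[Optional[str], List[str]]:
    """Try each device in ascending processing level until one decodes the
    utterance or the device at stop_level has been tried."""
    visited: List[str] = []
    for level in sorted(levels, key=lambda lv: lv.processing_level):
        visited.append(level.name)
        action = level.vocabulary.get(utterance)
        if action is not None:
            return action, visited
        if level.processing_level >= stop_level:
            break  # the designated processing level is matched; stop escalating
    return None, visited


HIERARCHY = [
    Level("cloud", 3, {"book a flight": "travel.book"}),
    Level("watch", 1, {"stop": "media.stop"}),
    Level("phone", 2, {"call mom": "phone.call"}),
]
```

With `HIERARCHY`, the phrase "call mom" is resolved at the phone after the watch fails, and "book a flight" reaches the cloud only when `stop_level` permits escalation that far.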

31. A method comprising:
- obtaining, by a first electronic device, context data comprising at least one of location, time and activity, wherein the context data is associated with the first electronic device;
  
  transmitting, by the first electronic device, the context data to a second electronic device;
  
  receiving, by the first electronic device, a first speech recognition model, wherein the first speech recognition model is different than a second speech recognition model present at the second electronic device, wherein the first speech recognition model is selected based on the context data;
  
  determining, by the first electronic device, whether an utterance can be recognized by a speech recognition process, wherein the speech recognition process is performed by the first electronic device and uses the first speech recognition model;
  
  in response to determining that the utterance cannot be recognized by the speech recognition process, sending, by the first electronic device, at least a portion of the utterance to the second electronic device; and
  
  in response to determining that the utterance can be recognized by the speech recognition process, causing the first electronic device to perform an action associated with the utterance.

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Jagatheesan, Arun; Lee, Juhan; Ahnn, Jong Hoon
Primary Examiner(s): YEN, ERIC L

Application Number: US14/333,463
Publication Number: US 20150025890A1
Time in Patent Office: 629 Days
Field of Search
US Class Current: 1/1
CPC Class Codes:
- G10L 15/183   using context dependencies,...
- G10L 15/30   Distributed recognition, e....
- G10L 15/32   Multiple recognisers used i...
- G10L 2015/223   Execution procedure of a sp...
- G10L 2015/228   of application context
