Information processing device and method for determining whether a state of collected sound data is suitable for speech recognition
First Claim
1. An information processing device, comprising:
- circuitry configured to:
acquire an image of a user;
control a display device to display an object on a display screen;
determine an arrival direction of user voice with respect to a microphone based on analysis of the image of the user, wherein the microphone is configured to collect sound data;
control a movement of the object on the display screen based on the arrival direction;
acquire the collected sound data from the arrival direction based on a direction of the movement of the object on the display screen;
determine utterance of an expression based on the collected sound data, wherein the expression indicates one of a beginning of a sentence included in the collected sound data or an end of the sentence included in the collected sound data;
determine a state of the collected sound data based on the determination of utterance of the expression, wherein the state is one of a first state that indicates that the collected sound data is suitable for speech recognition or a second state that indicates that the collected sound data is unsuitable for the speech recognition;
control an output device to output the state of the collected sound data; and
control at least one parameter of the object based on the state of the collected sound data.
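As an illustrative sketch only, the state-determination elements of claim 1 (determining utterance of a sentence-beginning or sentence-ending expression, then classifying the collected sound data as suitable or unsuitable for speech recognition) could be modeled as follows. Every name here, including the expression lists, `SoundState`, and `determine_state`, is hypothetical and not drawn from the patent:

```python
from enum import Enum

class SoundState(Enum):
    SUITABLE = "suitable for speech recognition"      # the claimed "first state"
    UNSUITABLE = "unsuitable for speech recognition"  # the claimed "second state"

# Hypothetical expressions marking the beginning or end of a sentence;
# the patent does not enumerate specific expressions.
BEGIN_EXPRESSIONS = ("well", "so", "okay")
END_EXPRESSIONS = ("thanks", "that is all")

def determine_state(collected_text: str) -> SoundState:
    """Classify collected sound data (here, its transcript) as the first
    state when it contains an expression indicating the beginning or the
    end of a sentence, and as the second state otherwise."""
    text = collected_text.lower().strip()
    if text.startswith(BEGIN_EXPRESSIONS) or text.endswith(END_EXPRESSIONS):
        return SoundState.SUITABLE
    return SoundState.UNSUITABLE
```

In this sketch, a sentence-boundary expression is taken as evidence that the user produced a complete utterance, which is why its presence maps to the "suitable" state.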
Abstract
Provided is an information processing device including: a collected sound data acquisition portion that acquires collected sound data; and an output controller that causes an output portion to output at least whether or not a state of the collected sound data is suitable for speech recognition.
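The two portions named in the abstract, a collected sound data acquisition portion and an output controller, might be sketched as below. The class and parameter names (`OutputController`, `output_portion`, `report`) are hypothetical illustrations, not the patent's implementation:

```python
class OutputController:
    """Hypothetical output controller: causes an output portion to
    output at least whether or not the state of the collected sound
    data is suitable for speech recognition."""

    def __init__(self, output_portion):
        # output_portion: any callable that presents a message to the user
        self.output_portion = output_portion

    def report(self, suitable: bool) -> str:
        message = ("collected sound is suitable for speech recognition"
                   if suitable else
                   "collected sound is unsuitable for speech recognition")
        self.output_portion(message)
        return message

# Usage: collect output messages in a list standing in for a display or speaker.
lines = []
controller = OutputController(lines.append)
state_message = controller.report(True)
```

Decoupling the controller from the concrete output portion (display, speaker, LED) mirrors the abstract's separation between deciding what to output and the portion that outputs it.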
21 Claims
1. An information processing device, as set forth in full under “First Claim” above.
Dependent claims 2–19 depend from claim 1.
20. A method, comprising:
acquiring an image of a user;
controlling a display device to display an object on a display screen;
determining an arrival direction of user voice with respect to a microphone, based on analysis of the image of the user, wherein the microphone is configured to collect sound data;
controlling a movement of the object on the display screen based on the arrival direction;
acquiring the collected sound data from the arrival direction based on a direction of the movement of the object on the display screen;
determining utterance of an expression based on the collected sound data, wherein the expression indicates one of a beginning of a sentence included in the collected sound data or an end of the sentence included in the collected sound data;
determining a state of the collected sound data based on the determination of utterance of the expression, wherein the state is one of a first state that indicates that the collected sound data is suitable for speech recognition or a second state that indicates that the collected sound data is unsuitable for the speech recognition;
controlling an output device to output the state of the collected sound data; and
controlling at least one parameter of the object based on the collected sound data.
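The claimed step of controlling a movement of the object based on the arrival direction could be sketched as below. The mapping from direction to screen position, the -90° to +90° range, and the easing gain are all assumptions for illustration, not details recited in the claims:

```python
def move_object(position: float, arrival_direction_deg: float,
                screen_width: int = 1920, gain: float = 0.1) -> float:
    """Ease the on-screen object toward a target x position derived
    from the arrival direction of the user voice (assumed to lie in
    -90..+90 degrees relative to the microphone's forward axis)."""
    # Map -90 deg -> left edge (0), +90 deg -> right edge (screen_width).
    target = (arrival_direction_deg + 90.0) / 180.0 * screen_width
    # Move a fraction of the remaining distance each update, so the
    # object's direction of movement tracks the voice arrival direction.
    return position + gain * (target - position)
```

Because the sound data is then acquired "from the arrival direction based on a direction of the movement of the object," the object's motion in a sketch like this doubles as user-visible feedback on where the device is listening.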
21. A non-transitory computer-readable medium having stored thereon, computer-executable instructions, which when executed by a computer, cause the computer to execute operations, the operations comprising:
acquiring an image of a user;
controlling a display device to display an object on a display screen;
determining an arrival direction of user voice with respect to a microphone, based on analysis of the image of the user, wherein the microphone is configured to collect sound data;
controlling a movement of the object on the display screen based on the arrival direction;
acquiring the collected sound data from the arrival direction based on a direction of the movement of the object on the display screen;
determining utterance of an expression based on the collected sound data, wherein the expression indicates one of a beginning of a sentence included in the collected sound data or an end of the sentence included in the collected sound data;
determining a state of the collected sound data based on the determination of utterance of the expression, wherein the state is one of a first state that indicates that the collected sound data is suitable for speech recognition or a second state that indicates that the collected sound data is unsuitable for the speech recognition;
controlling an output device to output the state of the collected sound data; and
controlling at least one parameter of the object based on the collected sound data.
Specification