Method and system for endpoint automatic detection of audio record

US 9,330,667 B2
Filed: 10/29/2010
Issued: 05/03/2016
Est. Priority Date: 10/29/2010
Status: Active Grant

Abstract

A method and system for automatic endpoint detection of an audio record are provided. The method comprises the following steps: acquiring an audio record text and determining the text endpoint acoustic model for the audio record text; acquiring each frame of the audio record data in turn, starting from the audio record start frame; determining the characteristics acoustic model of the decoding optimal path for the acquired current frame of the audio record data; and comparing that characteristics acoustic model with the endpoint acoustic model to determine whether they are the same. If they are the same, the mute duration threshold is updated to a second time threshold, wherein the second time threshold is less than the first time threshold. Because a shorter silence is enough to end the recording once the speaker has reached the end of the text, this method can improve the efficiency of recognizing the audio record endpoint.
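
The steps in the abstract can be sketched as a single pass over the frames. This is a minimal illustration, not the patented implementation: the `Frame` type, the function name, the frame-count thresholds, and the choice to update the threshold only on non-mute frames are all assumptions made for the example. The decoder that would supply `best_path_last_model` is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Frame:
    """Hypothetical per-frame record; a real system would derive both fields."""
    is_mute: bool
    # Last acoustic model on the decoder's optimal path for this frame
    # (assumed to be supplied by a decoder that is not shown here).
    best_path_last_model: Optional[str] = None


def detect_endpoint(
    frames: Iterable[Frame],
    endpoint_model: str,
    first_threshold: int = 5,   # mute duration threshold, in frames (assumed unit)
    second_threshold: int = 2,  # must be smaller than first_threshold
) -> Optional[int]:
    """Return the index of the frame at which recording would end, or None."""
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be smaller than first_threshold")

    threshold = first_threshold  # preset to the first time threshold
    mute_frames = 0
    for index, frame in enumerate(frames):
        if frame.is_mute:
            mute_frames += 1
            # End once the current mute duration exceeds the current threshold.
            if mute_frames > threshold:
                return index
            continue
        # Non-mute frame: reset the mute run and compare acoustic models.
        mute_frames = 0
        if frame.best_path_last_model == endpoint_model:
            threshold = second_threshold  # speaker reached the text endpoint
        else:
            threshold = first_threshold   # keep the longer threshold
    return None


speech_to_end = [Frame(False, "h"), Frame(False, "END")] + [Frame(True)] * 3
speech_midway = [Frame(False, "h")] + [Frame(True)] * 3
ended_at = detect_endpoint(speech_to_end, "END")
still_open = detect_endpoint(speech_midway, "END")
```

With these assumed values, three mute frames are enough to end the recording after the endpoint model has been matched, while the same three mute frames in the middle of the text leave the recording open.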

22 Citations

14 Claims

1. A method for detecting an endpoint of an audio record, comprising presetting a mute duration threshold as a first time threshold;
- wherein the method further comprises:
  
  obtaining an audio record text;
  
  determining an acoustic model for a text endpoint of the audio record text; and
  
  obtaining each frame of audio record data in turn starting from an audio record start frame of the audio record data;
  
  determining a characteristics acoustic model of a decoding optimal path for an obtained current frame of the audio record data; and
  
  determining that the characteristics acoustic model of the decoding optimal path for the current frame of the audio record data is the same as the acoustic model for the endpoint;
  
  updating the mute duration threshold to a second time threshold, wherein the second time threshold is smaller than the first time threshold.
- - 2. The method according to claim 1, wherein the determining an acoustic model for a text endpoint comprises:
    - generating a decoding network corresponding to the text according to the audio record text, and determining a last acoustic model of the decoding network as the acoustic model for the text endpoint.
  - 3. The method according to claim 2, wherein the determining a characteristics acoustic model of a decoding optimal path for a current frame of the audio record data comprises:
    - extracting an MFCC characteristic corresponding to a preset acoustic model from the current frame of the audio record data to obtain the decoding optimal path for the current frame of the audio record data; and
      
      determining a last acoustic model of the decoding optimal path for the current frame of the audio record data as the characteristics acoustic model of the decoding optimal path.
  - 4. The method according to claim 1, further comprising:
    - retaining the mute duration threshold as the first time threshold if it is determined that the characteristics acoustic model of the decoding optimal path for the current frame of the audio record data is different from the acoustic model for the endpoint.
  - 5. The method according to claim 1, wherein after one frame of audio record data is obtained, the method further comprises:
    - ending the audio record if it is determined that the obtained current frame of the audio record data is mute data and a current mute duration is larger than a current mute duration threshold.
  - 6. The method according to claim 1, wherein before the obtaining each frame of audio record data, the method further comprises:
    - receiving the audio record data and determining the audio record start frame of the audio record data.
  - 7. The method according to claim 6, wherein the determining the audio record start frame of the audio record data comprises:
    - determining in turn whether each frame of the audio record data is the mute data or non-mute data, and using a first frame of the non-mute data as the audio record start frame.
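
Claims 2 and 7 describe two preparatory steps: taking the last acoustic model of the decoding network as the text endpoint model, and using the first non-mute frame as the audio record start frame. The sketch below illustrates both under stated assumptions: the decoding network is simplified to a linear sequence of model names built from a hypothetical `lexicon`, and the mute decision uses a simple mean-energy floor, which the claims do not specify.

```python
from typing import Dict, List, Optional, Sequence


def text_endpoint_model(text: str, lexicon: Dict[str, List[str]]) -> str:
    """Return the last acoustic model of a linear decoding network.

    The network here is just the concatenated model names for each word;
    a real decoding network would be a graph (assumption for illustration).
    """
    network = [model for word in text.lower().split() for model in lexicon[word]]
    if not network:
        raise ValueError("audio record text produced an empty decoding network")
    return network[-1]


def is_mute(frame: Sequence[float], energy_floor: float = 1e-3) -> bool:
    """Treat a frame as mute when its mean energy is below an assumed floor."""
    energy = sum(sample * sample for sample in frame) / max(len(frame), 1)
    return energy < energy_floor


def find_start_frame(frames: Sequence[Sequence[float]]) -> Optional[int]:
    """Check frames in turn and return the index of the first non-mute frame."""
    for index, frame in enumerate(frames):
        if not is_mute(frame):
            return index
    return None


lexicon = {"hello": ["hh", "ah", "l", "ow"], "world": ["w", "er", "l", "d"]}
endpoint = text_endpoint_model("Hello world", lexicon)
start = find_start_frame([[0.0, 0.0], [0.0001, 0.0], [0.5, -0.4]])
```

The energy test stands in for whatever mute detector a real system uses; only the "first non-mute frame" rule comes from the claim.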

8. A system for detecting an endpoint of an audio record, wherein a mute duration threshold is preset as a first time threshold;
- and the system further comprises:
  
  a first determining unit adapted to obtain an audio record text and determine an acoustic model for a text endpoint of the audio record text;
  
  a first obtaining unit adapted to obtain each frame of audio record data in turn starting from an audio record start frame of the audio record data;
  
  a second determining unit adapted to determine a characteristics acoustic model of a decoding optimal path for an obtained current frame of the audio record data; and
  
  a threshold determining unit adapted to update the mute duration threshold to the second time threshold if it is determined that the characteristics acoustic model of the decoding optimal path for the current frame of the audio record data is the same as an acoustic model for the endpoint, wherein the second time threshold is smaller than the first time threshold.
- - 9. The system according to claim 8, wherein the first determining unit comprises:
    - an obtaining subunit adapted to obtain the audio record text;
      
      a network establishing subunit adapted to establish a decoding network corresponding to the text according to the audio record text; and
      
      a first characteristic determining subunit adapted to determine a last acoustic model of the decoding network as the acoustic model for the text endpoint.
  - 10. The system according to claim 9, wherein the second determining unit comprises:
    - an extracting subunit adapted to extract an MFCC characteristic corresponding to a preset acoustic model from the current frame of the audio record data to obtain the decoding optimal path for the current frame of the audio record data; and
      
      a second characteristic determining subunit adapted to determine a last acoustic model of the decoding optimal path for the current frame of the audio record data as the characteristics acoustic model of the decoding optimal path.
  - 11. The system according to claim 8, wherein the threshold determining unit is further adapted to retain the mute duration threshold as the first time threshold if it is determined that the characteristics acoustic model of the decoding optimal path for the current frame of the audio record data is different from the acoustic model for the endpoint.
  - 12. The system according to claim 8, further comprising:
    - an audio record control unit adapted to end the audio record if it is determined that the obtained current frame of the audio record data is mute data and a current mute duration is larger than a current mute duration threshold.
  - 13. The system according to claim 8, further comprising:
    - a receiving unit adapted to receive the audio record data and determine the audio record start frame of the audio record data.
  - 14. The system according to claim 13, wherein the receiving unit comprises:
    - a receiving subunit adapted to receive the audio record data; and
      
      a start frame determining subunit adapted to determine in turn whether each frame of the audio record data is the mute data or non-mute data, and use a first frame of the non-mute data as the audio record start frame.
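
The unit structure of claims 8, 11 and 12 lends itself to a streaming arrangement in which frames are pushed one at a time. The following sketch is one possible reading, not the claimed system: the class and method names are hypothetical, thresholds are counted in frames, and the decoding units that would produce `best_path_last_model` are left out.

```python
from typing import Optional


class ThresholdDeterminingUnit:
    """Holds the current mute duration threshold (in frames, an assumed unit)."""

    def __init__(self, endpoint_model: str, first: int, second: int) -> None:
        if second >= first:
            raise ValueError("second threshold must be smaller than the first")
        self.endpoint_model = endpoint_model
        self.first = first
        self.second = second
        self.current = first  # preset as the first time threshold

    def update(self, best_path_last_model: Optional[str]) -> int:
        # Shorten the threshold on a match, otherwise retain the first one.
        matched = best_path_last_model == self.endpoint_model
        self.current = self.second if matched else self.first
        return self.current


class AudioRecordControlUnit:
    """Ends the record when the mute run exceeds the current threshold."""

    def __init__(self, threshold_unit: ThresholdDeterminingUnit) -> None:
        self.threshold_unit = threshold_unit
        self.mute_frames = 0

    def push(self, is_mute: bool, best_path_last_model: Optional[str] = None) -> bool:
        """Feed one frame; return True when the audio record should end."""
        if is_mute:
            self.mute_frames += 1
            return self.mute_frames > self.threshold_unit.current
        self.mute_frames = 0
        self.threshold_unit.update(best_path_last_model)
        return False


unit = ThresholdDeterminingUnit("END", first=5, second=2)
control = AudioRecordControlUnit(unit)
decisions = [
    control.push(False, "h"),
    control.push(False, "END"),
    control.push(True),
    control.push(True),
    control.push(True),
]
```

Keeping the threshold logic in its own object mirrors the separation between the threshold determining unit and the audio record control unit in the claims, and lets either be replaced independently.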


Current Assignee: Iflytek Co., Ltd.
Original Assignee: Iflytek Co., Ltd.
Inventors: Wei, Si; Hu, Guoping; Hu, Yu; Liu, Qingfeng
Primary Examiner(s): Adesanya, Olujimi
Application Number: US13/878,818
Publication Number: US 20130197911A1
Time in Patent Office: 2,013 Days
Field of Search: 704/210, 704/215, 704/248, 704/253
US Class Current: 1/1
CPC Class Codes:
G10L 15/083   Recognition networks G10L15...
G10L 15/26   Speech to text systems G10L...
G10L 25/87   Detection of discrete point...
