Method for updating voiceprint feature model and terminal

US 9,685,161 B2
Filed: 12/30/2014
Issued: 06/20/2017
Est. Priority Date: 07/09/2012
Status: Active Grant

First Claim

1. A method for updating a voiceprint feature model, comprising:

obtaining an original audio stream comprising at least one speaker;

obtaining a respective audio stream of each speaker of the at least one speaker in the original audio stream according to a preset speaker segmentation and clustering algorithm;

separately matching the respective audio stream of each speaker of the at least one speaker with an original voiceprint feature model to obtain a successfully matched audio stream;

using the successfully matched audio stream as an additional audio stream training sample for generating the original voiceprint feature model; and

updating the original voiceprint feature model to improve a voice recognition capability of a computing device that uses the original voiceprint feature model to identify the at least one speaker.
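
The five steps of claim 1 can be read as a small control flow. The following is a minimal sketch only, not the patented implementation: the claim does not name a segmentation algorithm, a scoring function, or a training procedure, so `diarize`, `score`, and `train` are hypothetical callables supplied by the caller, and the "highest degree above a threshold" rule is borrowed from dependent claims 4, 5, 7, and 8 rather than from claim 1 itself.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Frame = Sequence[float]   # one feature vector (hypothetical representation)
Stream = List[Frame]      # an audio stream as a list of feature frames


def update_voiceprint_model(
    original_stream: Stream,
    model: object,
    training_samples: List[Stream],
    diarize: Callable[[Stream], Dict[str, Stream]],
    score: Callable[[Stream, object], float],
    train: Callable[[List[Stream]], object],
    threshold: float,
) -> Tuple[object, Optional[str]]:
    """Run one update pass; return (model, matched speaker label or None)."""
    # Step 2: split the mixed stream into one stream per speaker.
    per_speaker = diarize(original_stream)
    if not per_speaker:
        return model, None

    # Step 3: match each speaker stream separately against the current model.
    degrees = {label: score(s, model) for label, s in per_speaker.items()}
    best = max(degrees, key=degrees.get)
    if degrees[best] <= threshold:
        return model, None  # no stream matched well enough; keep the model

    # Steps 4 and 5: add the matched stream to the training samples, retrain.
    training_samples.append(per_speaker[best])
    return train(training_samples), best
```

Passing the three algorithms in as parameters keeps the sketch neutral about which "preset" algorithms a real terminal would use; only the ordering of the steps is taken from the claim.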

Abstract

A method for updating a voiceprint feature model and a terminal are provided that are applicable to the field of voice recognition technologies. The method includes: obtaining an original audio stream including at least one speaker; obtaining a respective audio stream of each speaker of the at least one speaker in the original audio stream according to a preset speaker segmentation and clustering algorithm; separately matching the respective audio stream of each speaker of the at least one speaker with an original voiceprint feature model, to obtain a successfully matched audio stream; and using the successfully matched audio stream as an additional audio stream training sample for generating the original voiceprint feature model, and updating the original voiceprint feature model.

24 Citations

20 Claims

1. A method for updating a voiceprint feature model, comprising:
- obtaining an original audio stream comprising at least one speaker;
  
  obtaining a respective audio stream of each speaker of the at least one speaker in the original audio stream according to a preset speaker segmentation and clustering algorithm;
  
  separately matching the respective audio stream of each speaker of the at least one speaker with an original voiceprint feature model to obtain a successfully matched audio stream;
  
  using the successfully matched audio stream as an additional audio stream training sample for generating the original voiceprint feature model; and
  
  updating the original voiceprint feature model to improve a voice recognition capability of a computing device that uses the original voiceprint feature model to identify the at least one speaker.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method according to claim 1, wherein before obtaining the original audio stream comprising the at least one speaker, the method further comprises establishing the original voiceprint feature model according to a preset audio stream training sample.
  - 3. The method according to claim 2, wherein obtaining the respective audio stream of each speaker of the at least one speaker in the original audio stream according to the preset speaker segmentation and clustering algorithm comprises:
    - segmenting the original audio stream into a plurality of audio clips according to a preset speaker segmentation algorithm, wherein each audio clip of the plurality of audio clips comprises only audio information of a same speaker of the at least one speaker; and
      
      clustering, according to a preset speaker clustering algorithm, the audio clips that comprise only the same speaker of the at least one speaker, to generate an audio stream that comprises only the audio information of the same speaker of the at least one speaker.
  - 4. The method according to claim 3, wherein separately matching the respective audio stream of each speaker of the at least one speaker with the original voiceprint feature model to obtain the successfully matched audio stream comprises:
    - obtaining a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      selecting an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 5. The method according to claim 2, wherein separately matching the respective audio stream of each speaker of the at least one speaker with the original voiceprint feature model to obtain the successfully matched audio stream comprises:
    - obtaining a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      selecting an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 6. The method according to claim 1, wherein obtaining the respective audio stream of each speaker of the at least one speaker in the original audio stream according to the preset speaker segmentation and clustering algorithm comprises:
    - segmenting the original audio stream into a plurality of audio clips according to a preset speaker segmentation algorithm, wherein each audio clip of the plurality of audio clips comprises only audio information of a same speaker of the at least one speaker; and
      
      clustering, according to a preset speaker clustering algorithm, the audio clips that comprise only the same speaker of the at least one speaker to generate an audio stream that comprises only the audio information of the same speaker of the at least one speaker.
  - 7. The method according to claim 6, wherein separately matching the respective audio stream of each speaker of the at least one speaker with the original voiceprint feature model to obtain the successfully matched audio stream comprises:
    - obtaining a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      selecting an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 8. The method according to claim 1, wherein separately matching the respective audio stream of each speaker of the at least one speaker with the original voiceprint feature model to obtain a successfully matched audio stream comprises:
    - obtaining a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      selecting an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 9. The method according to claim 1, wherein using the successfully matched audio stream as the additional audio stream training sample for generating the original voiceprint feature model and updating the original voiceprint feature model comprises:
    - generating a corrected voiceprint feature model according to the successfully matched audio stream and the preset audio stream training sample, wherein the preset audio stream training sample is an audio stream for generating the original voiceprint feature model; and
      
      updating the original voiceprint feature model to the corrected voiceprint feature model.
  - 10. The method according to claim 1, further comprising unlocking a screen of a mobile phone based upon matching the original voiceprint feature model.
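
Claims 3 and 6 describe a two-stage split: first cut the original stream into single-speaker clips, then cluster clips from the same speaker into one stream each. The claims do not say which "preset" algorithms are used, so the sketch below substitutes deliberately simple stand-ins: a distance test against the running clip centroid for segmentation, and greedy centroid merging for clustering. The function names, the frame representation (lists of floats), and both thresholds are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # hypothetical per-frame feature vector


def _centroid(frames: List[Frame]) -> List[float]:
    """Per-dimension mean of a non-empty list of frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def segment(stream: List[Frame], change_threshold: float) -> List[List[Frame]]:
    """Start a new clip whenever a frame is far from the current clip centroid."""
    clips: List[List[Frame]] = []
    for frame in stream:
        if clips and math.dist(frame, _centroid(clips[-1])) <= change_threshold:
            clips[-1].append(frame)   # same speaker keeps talking
        else:
            clips.append([frame])     # assumed speaker change: open a new clip
    return clips


def cluster(clips: List[List[Frame]], merge_threshold: float) -> List[List[Frame]]:
    """Greedily merge clips with nearby centroids into one stream per speaker."""
    streams: List[List[Frame]] = []
    for clip in clips:
        c = _centroid(clip)
        for stream in streams:
            if math.dist(c, _centroid(stream)) <= merge_threshold:
                stream.extend(clip)   # same speaker as an earlier clip
                break
        else:
            streams.append(list(clip))  # first clip seen for this speaker
    return streams
```

For a stream in which two speakers alternate, `segment` yields one clip per turn and `cluster` folds those turns back into two per-speaker streams, which is the input that the matching step of claims 4, 5, 7, and 8 expects.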

11. A terminal, comprising:
- a non-transitory computer readable medium having instructions stored thereon; and
  
  a computer processor coupled to the non-transitory computer readable medium and configured to execute the instructions to:
  
  obtain an original audio stream comprising at least one speaker;
  
  obtain a respective audio stream of each speaker of the at least one speaker in the original audio stream according to a preset speaker segmentation and clustering algorithm;
  
  separately match the respective audio stream of each speaker of the at least one speaker with an original voiceprint feature model, to obtain a successfully matched audio stream;
  
  use the successfully matched audio stream as an additional audio stream training sample for generating the original voiceprint feature model; and
  
  update the original voiceprint feature model to improve a voice recognition capability of a computing device that uses the original voiceprint feature model to identify the at least one speaker.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
- - 12. The terminal according to claim 11, wherein the computer processor is further configured to execute the instructions to:
    - obtain a preset audio stream training sample; and
      
      establish the original voiceprint feature model according to the preset audio stream training sample.
  - 13. The terminal according to claim 12, wherein the computer processor is further configured to execute the instructions to:
    - segment the original audio stream into a plurality of audio clips according to a preset speaker segmentation algorithm, wherein each audio clip of the plurality of audio clips comprises only audio information of a same speaker of the at least one speaker; and
      
      cluster, according to a preset speaker clustering algorithm, the audio clips that comprise only the same speaker of the at least one speaker, to generate an audio stream that comprises only the audio information of the same speaker of the at least one speaker.
  - 14. The terminal according to claim 13, wherein the computer processor is further configured to execute the instructions to:
    - obtain a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      select an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 15. The terminal according to claim 11, wherein the computer processor is further configured to execute the instructions to:
    - segment the original audio stream into a plurality of audio clips according to a preset speaker segmentation algorithm, wherein each audio clip of the plurality of audio clips comprises only audio information of a same speaker of the at least one speaker; and
      
      cluster, according to a preset speaker clustering algorithm, the audio clips that comprise only the same speaker of the at least one speaker to generate an audio stream that comprises only the audio information of the same speaker of the at least one speaker.
  - 16. The terminal according to claim 15, wherein the computer processor is further configured to execute the instructions to:
    - obtain a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      select an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 17. The terminal according to claim 11, wherein the computer processor is further configured to execute the instructions to:
    - obtain a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      select an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 18. The terminal according to claim 12, wherein the computer processor is further configured to execute the instructions to:
    - obtain a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      select an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 19. The terminal according to claim 11, wherein the computer processor is further configured to execute the instructions to:
    - generate a corrected voiceprint feature model according to the successfully matched audio stream and the preset audio stream training sample; and
      
      update the original voiceprint feature model to the corrected voiceprint feature model.
  - 20. The terminal according to claim 11, wherein the computer processor is further configured to execute the instructions to unlock a screen of a mobile phone based upon matching the original voiceprint feature model.
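
Claims 14 and 16 to 19 (and their method counterparts 4, 5, and 7 to 9) turn on three operations: computing a matching degree per speaker stream, selecting the stream whose degree is both the highest and above a preset threshold, and regenerating a corrected model from the preset training sample plus the matched stream. The sketch below illustrates them under a strong simplifying assumption: the voiceprint feature model is a single diagonal Gaussian over feature frames and the matching degree is the average per-frame log-likelihood. The claims do not specify the model type or the scoring function, so `VoiceprintModel`, `train_model`, `matching_degree`, `select_match`, and `corrected_model` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]  # hypothetical per-frame feature vector


@dataclass
class VoiceprintModel:
    mean: List[float]
    var: List[float]


def train_model(frames: List[Frame], var_floor: float = 1e-3) -> VoiceprintModel:
    """Fit a diagonal Gaussian to the frames (stand-in for real training)."""
    n = len(frames)
    cols = list(zip(*frames))
    mean = [sum(col) / n for col in cols]
    var = [max(sum((x - m) ** 2 for x in col) / n, var_floor)
           for col, m in zip(cols, mean)]
    return VoiceprintModel(mean, var)


def matching_degree(stream: List[Frame], model: VoiceprintModel) -> float:
    """Average per-frame log-likelihood of the stream under the model."""
    total = 0.0
    for frame in stream:
        for x, m, v in zip(frame, model.mean, model.var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(stream)


def select_match(streams: Dict[str, List[Frame]], model: VoiceprintModel,
                 threshold: float) -> Optional[str]:
    """Return the label whose degree is highest and above the threshold."""
    if not streams:
        return None
    degrees = {label: matching_degree(s, model) for label, s in streams.items()}
    best = max(degrees, key=degrees.get)
    return best if degrees[best] > threshold else None


def corrected_model(preset_sample: List[Frame],
                    matched_stream: List[Frame]) -> VoiceprintModel:
    """Retrain on the preset training sample plus the matched stream."""
    return train_model(list(preset_sample) + list(matched_stream))
```

Requiring the degree to be both the maximum and above the threshold means that a conversation in which the enrolled speaker never talks leaves the model untouched, since `select_match` returns `None` and no retraining is triggered.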

Current Assignee
Huawei Device Company Limited (Huawei Investment & Holding Co., Ltd.)
Original Assignee
Huawei Device Company Limited (Huawei Investment & Holding Co., Ltd.)
Inventors
Lu, Ting
Primary Examiner(s)
Chawan, Vijay B

Application Number

US14/585,486
Publication Number

US 20150112680A1
Time in Patent Office

903 Days
Field of Search

704/246, 704/273, 704/245, 704/250, 704/244, 379/88.02, 379/88.01, 379/88.04
US Class Current
CPC Class Codes

G10L 15/063   Training

G10L 15/07   to the speaker

G10L 15/142   Hidden Markov Models [HMMs]

G10L 17/02   Preprocessing operations, e...

G10L 17/04   Training, enrolment or mode...

G10L 2015/0631   Creating reference template...
