Method for Updating Voiceprint Feature Model and Terminal

US 20150112680A1
Filed: 12/30/2014
Published: 04/23/2015
Est. Priority Date: 07/09/2012
Status: Active Grant

3 Assignments

0 Petitions

Abstract

A method for updating a voiceprint feature model and a terminal are provided that are applicable to the field of voice recognition technologies. The method includes: obtaining an original audio stream including at least one speaker; obtaining a respective audio stream of each speaker of the at least one speaker in the original audio stream according to a preset speaker segmentation and clustering algorithm; separately matching the respective audio stream of each speaker of the at least one speaker with an original voiceprint feature model, to obtain a successfully matched audio stream; and using the successfully matched audio stream as an additional audio stream training sample for generating the original voiceprint feature model, and updating the original voiceprint feature model.
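The control flow described in the abstract can be outlined as a small function. This is a minimal sketch, not the patented implementation: the names `update_voiceprint_model`, `segment_and_cluster`, `match`, and `train` are hypothetical, the three processing steps are passed in as callables because the text above does not specify them, and leaving the model unchanged when no stream matches is an assumption.

```python
from typing import Callable, Dict, List, Optional, Sequence

AudioStream = List[float]  # hypothetical stand-in for raw audio samples


def update_voiceprint_model(
    original_stream: AudioStream,
    model: object,
    training_samples: Sequence[AudioStream],
    segment_and_cluster: Callable[[AudioStream], Dict[str, AudioStream]],
    match: Callable[[Dict[str, AudioStream], object], Optional[AudioStream]],
    train: Callable[[Sequence[AudioStream]], object],
) -> object:
    """Return an updated model, or the given model if no stream matched."""
    # Obtain one audio stream per speaker from the original stream.
    per_speaker = segment_and_cluster(original_stream)
    # Match each speaker's stream against the original model.
    matched = match(per_speaker, model)
    if matched is None:
        # Assumption: with no successful match, the model is left as it is.
        return model
    # Use the matched stream as an additional training sample and retrain.
    return train([*training_samples, matched])
```

Passing the steps in as parameters keeps the sketch neutral about which segmentation, clustering, matching, and training algorithms are actually used.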

34 Citations

18 Claims

1. A method for updating a voiceprint feature model, comprising:
- obtaining an original audio stream comprising at least one speaker;
  
  obtaining a respective audio stream of each speaker of the at least one speaker in the original audio stream according to a preset speaker segmentation and clustering algorithm;
  
  separately matching the respective audio stream of each speaker of the at least one speaker with an original voiceprint feature model to obtain a successfully matched audio stream;
  
  using the successfully matched audio stream as an additional audio stream training sample for generating the original voiceprint feature model; and
  
  updating the original voiceprint feature model.
  - 2. The method according to claim 1, wherein before obtaining the original audio stream comprising the at least one speaker, the method further comprises establishing the original voiceprint feature model according to a preset audio stream training sample.
  - 3. The method according to claim 1, wherein obtaining the respective audio stream of each speaker of the at least one speaker in the original audio stream according to the preset speaker segmentation and clustering algorithm comprises:
    - segmenting the original audio stream into a plurality of audio clips according to a preset speaker segmentation algorithm, wherein each audio clip of the plurality of audio clips comprises only audio information of a same speaker of the at least one speaker; and
      
      clustering, according to a preset speaker clustering algorithm, the audio clips that comprise only the same speaker of the at least one speaker to generate an audio stream that comprises only the audio information of the same speaker of the at least one speaker.
  - 4. The method according to claim 2, wherein obtaining the respective audio stream of each speaker of the at least one speaker in the original audio stream according to the preset speaker segmentation and clustering algorithm comprises:
    - segmenting the original audio stream into a plurality of audio clips according to a preset speaker segmentation algorithm, wherein each audio clip of the plurality of audio clips comprises only audio information of a same speaker of the at least one speaker; and
      
      clustering, according to a preset speaker clustering algorithm, the audio clips that comprise only the same speaker of the at least one speaker, to generate an audio stream that comprises only the audio information of the same speaker of the at least one speaker.
  - 5. The method according to claim 1, wherein separately matching the respective audio stream of each speaker of the at least one speaker with the original voiceprint feature model to obtain a successfully matched audio stream comprises:
    - obtaining a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      selecting an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 6. The method according to claim 2, wherein separately matching the respective audio stream of each speaker of the at least one speaker with the original voiceprint feature model to obtain the successfully matched audio stream comprises:
    - obtaining a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      selecting an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 7. The method according to claim 3, wherein separately matching the respective audio stream of each speaker of the at least one speaker with the original voiceprint feature model to obtain the successfully matched audio stream comprises:
    - obtaining a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      selecting an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 8. The method according to claim 4, wherein separately matching the respective audio stream of each speaker of the at least one speaker with the original voiceprint feature model to obtain the successfully matched audio stream comprises:
    - obtaining a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model; and
      
      selecting an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 9. The method according to claim 1, wherein using the successfully matched audio stream as the additional audio stream training sample for generating the original voiceprint feature model and updating the original voiceprint feature model comprises:
    - generating a corrected voiceprint feature model according to the successfully matched audio stream and the preset audio stream training sample, wherein the preset audio stream training sample is an audio stream for generating the original voiceprint feature model; and
      
      updating the original voiceprint feature model to the corrected voiceprint feature model.
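The selection rule of claims 5 to 8 and the model correction of claim 9 can be illustrated with toy data. The following is a sketch under stated assumptions: each speaker's audio stream is assumed to have already been reduced to a short feature vector, the "model" is simply the mean of the training vectors, and cosine similarity serves as the matching degree. None of these choices comes from the claims, which leave the feature type, model type, and matching measure open; the function names and the threshold value are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]  # hypothetical per-stream feature vector


def train_model(samples: Sequence[Vector]) -> Vector:
    """Toy voiceprint model: element-wise mean of the training vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def matching_degree(features: Vector, model: Vector) -> float:
    """Cosine similarity, used here only as a stand-in matching degree."""
    dot = sum(a * b for a, b in zip(features, model))
    norm = math.sqrt(sum(a * a for a in features)) * math.sqrt(sum(b * b for b in model))
    return dot / norm if norm else 0.0


def select_match(per_speaker: Dict[str, Vector], model: Vector, threshold: float) -> Optional[str]:
    """Pick the speaker whose degree is the highest and exceeds the threshold."""
    if not per_speaker:
        return None
    best = max(per_speaker, key=lambda name: matching_degree(per_speaker[name], model))
    if matching_degree(per_speaker[best], model) > threshold:
        return best
    return None


# Preset training samples and the original model built from them (claim 2).
preset_samples = [[1.0, 0.0], [0.8, 0.2]]
original_model = train_model(preset_samples)

# Per-speaker streams, as they might come out of segmentation and clustering.
streams = {"speaker_a": [0.9, 0.2], "speaker_b": [0.1, 1.0]}
matched = select_match(streams, original_model, threshold=0.9)

if matched is not None:
    # Claim 9: build a corrected model from the preset samples plus the
    # matched stream, then replace the original model with it.
    corrected_model = train_model(preset_samples + [streams[matched]])
    original_model = corrected_model
```

Requiring the best candidate to also exceed the threshold means a recording in which the enrolled speaker is absent contributes nothing to the model, even though some stream always has the highest degree.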

10. A terminal, comprising:
- an original audio stream obtaining unit;
  
  a segmentation and clustering unit;
  
  a matching unit; and
  
  a model updating unit,
  
  wherein the original audio stream obtaining unit is configured to obtain an original audio stream comprising at least one speaker, and send the original audio stream to the segmentation and clustering unit,
  
  wherein the segmentation and clustering unit is configured to receive the original audio stream sent by the original audio stream obtaining unit, obtain a respective audio stream of each speaker of the at least one speaker in the original audio stream according to a preset speaker segmentation and clustering algorithm, and send the respective audio stream of each speaker of the at least one speaker to the matching unit,
  
  wherein the matching unit is configured to receive the respective audio stream of each speaker of the at least one speaker sent by the segmentation and clustering unit, separately match the respective audio stream of each speaker of the at least one speaker with an original voiceprint feature model, to obtain a successfully matched audio stream, and send the successfully matched audio stream to the model updating unit, and
  
  wherein the model updating unit is configured to receive the successfully matched audio stream sent by the matching unit, use the successfully matched audio stream as an additional audio stream training sample for generating the original voiceprint feature model, and update the original voiceprint feature model.
  - 11. The terminal according to claim 10, wherein the terminal further comprises:
    - a sample obtaining unit; and
      
      an original model establishing unit,
      
      wherein the sample obtaining unit is configured to obtain a preset audio stream training sample, and send the preset audio stream training sample to the original model establishing unit, and
      
      wherein the original model establishing unit is configured to receive the preset audio stream training sample sent by the sample obtaining unit, and establish the original voiceprint feature model according to the preset audio stream training sample.
  - 12. The terminal according to claim 10, wherein the segmentation and clustering unit comprises:
    - a segmentation unit; and
      
      a clustering unit,
      
      wherein the segmentation unit is configured to segment the original audio stream into a plurality of audio clips according to a preset speaker segmentation algorithm, wherein each audio clip of the plurality of audio clips comprises only audio information of a same speaker of the at least one speaker, and send the audio clips that comprise only the same speaker of the at least one speaker to the clustering unit, and
      
      wherein the clustering unit is configured to receive the audio clips, sent by the segmentation unit, that comprise only the same speaker of the at least one speaker, and cluster, according to a preset speaker clustering algorithm, the audio clips that comprise only the same speaker of the at least one speaker to generate an audio stream that comprises only the audio information of the same speaker of the at least one speaker.
  - 13. The terminal according to claim 11, wherein the segmentation and clustering unit comprises:
    - a segmentation unit; and
      
      a clustering unit,
      
      wherein the segmentation unit is configured to segment the original audio stream into a plurality of audio clips according to a preset speaker segmentation algorithm, wherein each audio clip of the plurality of audio clips comprises only audio information of a same speaker of the at least one speaker, and send the audio clips that comprise only the same speaker of the at least one speaker to the clustering unit, and
      
      wherein the clustering unit is configured to receive the audio clips, sent by the segmentation unit, that comprise only the same speaker of the at least one speaker, and cluster, according to a preset speaker clustering algorithm, the audio clips that comprise only the same speaker of the at least one speaker, to generate an audio stream that comprises only the audio information of the same speaker of the at least one speaker.
  - 14. The terminal according to claim 10, wherein the matching unit comprises:
    - a matching degree obtaining unit; and
      
      a matched audio stream obtaining unit,
      
      wherein the matching degree obtaining unit is configured to obtain a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model, and send the matching degree to the matched audio stream obtaining unit, and
      
      wherein the matched audio stream obtaining unit is configured to receive the matching degree, sent by the matching degree obtaining unit, between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model, and select an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 15. The terminal according to claim 11, wherein the matching unit comprises:
    - a matching degree obtaining unit; and
      
      a matched audio stream obtaining unit,
      
      wherein the matching degree obtaining unit is configured to obtain a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model, and send the matching degree to the matched audio stream obtaining unit, and
      
      wherein the matched audio stream obtaining unit is configured to receive the matching degree, sent by the matching degree obtaining unit, between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model, and select an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 16. The terminal according to claim 12, wherein the matching unit comprises:
    - a matching degree obtaining unit; and
      
      a matched audio stream obtaining unit,
      
      wherein the matching degree obtaining unit is configured to obtain a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model, and send the matching degree to the matched audio stream obtaining unit, and
      
      wherein the matched audio stream obtaining unit is configured to receive the matching degree, sent by the matching degree obtaining unit, between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model, and select an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 17. The terminal according to claim 13, wherein the matching unit comprises:
    - a matching degree obtaining unit; and
      
      a matched audio stream obtaining unit,
      
      wherein the matching degree obtaining unit is configured to obtain a matching degree between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model according to the audio stream of each speaker of the at least one speaker and the original voiceprint feature model, and send the matching degree to the matched audio stream obtaining unit, and
      
      wherein the matched audio stream obtaining unit is configured to receive the matching degree, sent by the matching degree obtaining unit, between the audio stream of each speaker of the at least one speaker and the original voiceprint feature model, and select an audio stream corresponding to a matching degree that is the highest and is greater than a preset matching threshold as the successfully matched audio stream.
  - 18. The terminal according to claim 10, wherein the model updating unit comprises:
    - a corrected model obtaining unit; and
      
      a model updating subunit,
      
      wherein the corrected model obtaining unit is configured to generate a corrected voiceprint feature model according to the successfully matched audio stream and the preset audio stream training sample, and send the corrected voiceprint feature model to the model updating subunit, and
      
      wherein the model updating subunit is configured to receive the corrected voiceprint feature model sent by the corrected model obtaining unit, and update the original voiceprint feature model to the corrected voiceprint feature model.
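
The unit structure of claim 10, in which each unit receives data from the previous unit and sends its result to the next, might be modeled as a chain of small objects. This is an illustrative sketch only: the class names mirror the claimed units, but the `receive` and `obtain` methods, the injected `split`, `degree`, and `train` callables, and the decision to hold the current model inside the model updating unit are all assumptions not stated in the claims.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Stream = List[float]  # hypothetical stand-in for an audio stream


@dataclass
class ModelUpdatingUnit:
    train: Callable[[List[Stream]], object]
    training_samples: List[Stream]
    model: object = None

    def receive(self, matched: Stream) -> None:
        # Add the matched stream as a training sample and rebuild the model.
        self.training_samples.append(matched)
        self.model = self.train(self.training_samples)


@dataclass
class MatchingUnit:
    degree: Callable[[Stream, object], float]
    threshold: float
    updater: ModelUpdatingUnit

    def receive(self, per_speaker: Dict[str, Stream]) -> None:
        scored = [(self.degree(s, self.updater.model), s) for s in per_speaker.values()]
        if not scored:
            return
        best_degree, best_stream = max(scored, key=lambda pair: pair[0])
        # Forward only the best stream, and only if it clears the threshold.
        if best_degree > self.threshold:
            self.updater.receive(best_stream)


@dataclass
class SegmentationAndClusteringUnit:
    split: Callable[[Stream], Dict[str, Stream]]
    matcher: MatchingUnit

    def receive(self, original: Stream) -> None:
        # Send the per-speaker streams on to the matching unit.
        self.matcher.receive(self.split(original))


@dataclass
class OriginalAudioStreamObtainingUnit:
    downstream: SegmentationAndClusteringUnit

    def obtain(self, original: Stream) -> None:
        self.downstream.receive(original)
```

Each unit holds a reference to the unit it sends to, so the chain is assembled from the model updating unit backwards to the original audio stream obtaining unit.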

Current Assignee
Huawei Device Company Limited (Huawei Investment & Holding Co., Ltd.)
Original Assignee
Huawei Device Company Limited (Huawei Investment & Holding Co., Ltd.)
Inventors
Lu, Ting

Granted Patent

US 9,685,161 B2
US Class Current

704/244
CPC Class Codes

G10L 15/063   Training

G10L 15/07   to the speaker

G10L 15/142   Hidden Markov Models [HMMs]

G10L 17/02   Preprocessing operations, e...

G10L 17/04   Training, enrolment or mode...

G10L 2015/0631   Creating reference template...
