Speech segment clustering and ranking

US 20060129401A1
Filed: 12/15/2004
Published: 06/15/2006
Est. Priority Date: 12/15/2004
Status: Active Grant

Abstract

A system, method, and apparatus for identifying problematic speech segments is provided. The system includes a clustering module for generating a first cluster of one or more consecutive speech segments if the consecutive speech segments satisfy a predetermined filtering test, and for generating a second cluster comprising at least one different consecutive speech segment selected from the ordered sequence if the at least one different consecutive speech segment satisfies the predetermined filtering test. The system also includes a combining module for combining the first and second clusters as well as the at least one intervening consecutive speech segment to form an aggregated cluster if the aggregated cluster satisfies a predetermined combining criterion. The system can further include a ranking module for ranking aggregated clusters, the ranking reflecting a relative severity of misalignments among problematic speech segments. Once identified, more severely misaligned speech segments can be analyzed more effectively and efficiently.

21 Claims

1. A method of identifying potentially misaligned speech segments from an ordered sequence of speech segments, the method comprising:
- generating a first cluster comprising at least one speech segment selected from the ordered sequence if the at least one speech segment satisfies a predetermined filtering test;
  
  generating a second cluster comprising at least one different speech segment selected from the ordered sequence if the at least one different speech segment satisfies the predetermined filtering test and if there is at least one intervening speech segment occupying a sequential position between the at least one speech segment and the at least one different speech segment, the intervening speech segment failing to satisfy the predetermined filtering test; and
  
  combining the first and second clusters and the at least one intervening speech segment to generate an aggregated cluster if the aggregated cluster satisfies a predetermined combining criterion, the aggregated cluster replacing the first and second clusters.
  - 2. The method of claim 1, wherein the predetermined combining criterion reflects a likelihood that the at least one intervening speech segment is a misaligned speech segment.
  - 3. The method of claim 1, wherein the predetermined combining criterion is based upon at least one of a breaking test condition and a sizing test condition.
  - 4. The method of claim 1, wherein each speech segment belonging to the ordered sequence has a corresponding confidence index indicating a likelihood that the speech segment to which the confidence index corresponds is a misaligned speech segment, and wherein the filtering test is based upon a comparison of each confidence index with a predetermined confidence threshold.
  - 5. The method of claim 1, further comprising generating at least one additional aggregated cluster according to the same steps if the additional aggregated cluster satisfies the predetermined combining criterion, the aggregated cluster and the additional aggregated cluster being distinct from one another.
  - 6. The method of claim 5, further comprising:
    - ranking each cluster relative to one another if at least two clusters is generated;
      
      ranking each aggregate cluster relative to one another if at least two aggregate clusters is generated; and
      
      ranking each cluster and each aggregate cluster relative to each other if at least one cluster and at least one aggregate cluster is generated.
  - 7. The method of claim 6, wherein the ranking reflects a relative severity of speech misalignments.
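
The clustering and combining steps of claims 1 to 5 can be illustrated with a minimal Python sketch. It is not taken from the patent. The names `Cluster`, `initial_clusters`, and `combine` are hypothetical, the filtering test is assumed to pass a segment when its confidence index is at or above the threshold, and the combining criterion is assumed to be a limit on the number of intervening segments (standing in for a breaking test) together with a limit on the aggregated cluster length (standing in for a sizing test). The claims do not define either test, so both limits are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    start: int                # index of the first segment in the ordered sequence
    end: int                  # index of the last segment (inclusive)
    aggregated: bool = False  # True once two clusters have been combined


def initial_clusters(confidence: List[float], threshold: float) -> List[Cluster]:
    """Group runs of consecutive segments that pass the filtering test."""
    clusters: List[Cluster] = []
    for i, value in enumerate(confidence):
        if value < threshold:      # assumed filtering test: index >= threshold
            continue
        if clusters and clusters[-1].end == i - 1:
            clusters[-1].end = i   # extend the current run
        else:
            clusters.append(Cluster(i, i))
    return clusters


def combine(clusters: List[Cluster], max_gap: int, max_size: int) -> List[Cluster]:
    """Merge neighbouring clusters, absorbing the failing segments between them."""
    merged: List[Cluster] = []
    for cl in clusters:
        if merged:
            prev = merged[-1]
            gap = cl.start - prev.end - 1    # intervening segments that failed
            size = cl.end - prev.start + 1   # length of the candidate aggregate
            # Assumed combining criterion: small gap and bounded total size.
            if gap <= max_gap and size <= max_size:
                # The aggregated cluster replaces the two it was built from.
                merged[-1] = Cluster(prev.start, cl.end, aggregated=True)
                continue
        merged.append(Cluster(cl.start, cl.end, cl.aggregated))
    return merged


confidence = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.1, 0.1, 0.95]
clusters = initial_clusters(confidence, threshold=0.6)
aggregated = combine(clusters, max_gap=1, max_size=5)
```

With these illustrative values, segments 1 and 2 form the first cluster and segment 4 the second. Segment 3 fails the filtering test but lies between them, so the three spans are combined into one aggregated cluster covering segments 1 to 4, while segment 8 remains a separate cluster because three failing segments separate it from the aggregate.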

8. A system for identifying potentially misaligned speech segments from an ordered sequence of speech segments, the system comprising:
- a clustering module for generating a first cluster comprising at least one speech segment selected from the ordered sequence if the at least one speech segment satisfies a predetermined filtering test, and generating a second cluster comprising at least one different speech segment selected from the ordered sequence if the at least one different speech segment satisfies the predetermined filtering test and if there is at least one intervening speech segment occupying a sequential position between the at least one speech segment and the at least one different speech segment, the intervening speech segment failing to satisfy the predetermined filtering test; and
  
  a combining module for combining the first and second clusters and the at least one intervening consecutive speech segment to form an aggregated cluster if the aggregated cluster satisfies a predetermined combining criterion.
  - 9. The system of claim 8, wherein the predetermined combining criterion reflects a likelihood that the at least one intervening speech segment is a misaligned speech segment.
  - 10. The system of claim 8, wherein the predetermined combining criterion is based upon at least one of a breaking test condition and a sizing test condition.
  - 11. The system of claim 8, wherein each speech segment belonging to the ordered sequence has a corresponding confidence index indicating a likelihood that the speech segment to which the confidence index corresponds is a misaligned speech segment, and wherein the filtering test is based upon a comparison of each confidence index with a predetermined confidence threshold.
  - 12. The system of claim 8, further comprising generating at least one additional aggregated cluster according to the same steps if the additional aggregated cluster satisfies the predetermined combining criterion, the aggregated cluster and the additional aggregated cluster being distinct from one another.
  - 13. The system of claim 12, further comprising a ranking module for:
    - ranking each cluster relative to one another if at least two clusters is generated;
      
      ranking each aggregate cluster relative to one another if at least two aggregate clusters is generated; and
      
      ranking each cluster and each aggregate cluster relative to each other if at least one cluster and at least one aggregate cluster is generated.
  - 14. The system of claim 13, wherein the ranking reflects a relative severity of speech misalignments.
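
The ranking module of claims 13 and 14 can be sketched in the same spirit. The claims say only that the ranking reflects a relative severity of misalignments; they do not say how severity is measured. The sketch below assumes, purely for illustration, that severity is the sum of the confidence indices of the segments a cluster spans. The names `Span`, `severity`, and `rank_clusters` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) segment indices, inclusive


def severity(span: Span, confidence: List[float]) -> float:
    """Assumed severity score: total confidence index over the span."""
    start, end = span
    return sum(confidence[start:end + 1])


def rank_clusters(spans: List[Span], confidence: List[float]) -> List[Span]:
    """Order plain and aggregated clusters together, most severe first."""
    return sorted(spans, key=lambda s: severity(s, confidence), reverse=True)


confidence = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.1, 0.1, 0.95]
ranked = rank_clusters([(8, 8), (1, 4)], confidence)
```

Because plain clusters and aggregated clusters are represented the same way here, a single sort covers all three ranking cases listed in claim 13. A different severity measure, such as the mean confidence index, could be substituted by changing `severity` alone.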

15. A computer-readable storage medium for use in identifying potentially misaligned speech segments from an ordered sequence of speech segments, the computer-readable storage medium comprising computer instructions for:
- generating a first cluster comprising at least one speech segment selected from the ordered sequence if the at least one speech segment satisfies a predetermined filtering test;
  
  generating a second cluster comprising at least one different speech segment selected from the ordered sequence if the at least one different speech segment satisfies the predetermined filtering test and if there is at least one intervening speech segment occupying a sequential position between the at least one speech segment and the at least one different speech segment, the intervening speech segment failing to satisfy the predetermined filtering test; and
  
  combining the first and second clusters and the at least one intervening speech segment to generate an aggregated cluster if the aggregated cluster satisfies a predetermined combining criterion, the aggregated cluster replacing the first and second clusters.
  - 16. The computer-readable storage medium of claim 15, wherein the predetermined combining criterion reflects a likelihood that the at least one intervening speech segment is a misaligned speech segment.
  - 17. The computer-readable storage medium of claim 15, wherein the predetermined combining criterion is based upon at least one of a breaking test condition and a sizing test condition.
  - 18. The computer-readable storage medium of claim 15, wherein each speech segment belonging to the ordered sequence has a corresponding confidence index indicating a likelihood that the speech segment to which the confidence index corresponds is a misaligned speech segment, and wherein the filtering test is based upon a comparison of each confidence index with a predetermined confidence threshold.
  - 19. The computer-readable storage medium of claim 15, wherein the instructions contained therein further cause generation of at least one additional aggregated cluster if the additional aggregated cluster satisfies the predetermined combining criterion, the aggregated cluster and the additional aggregated cluster being distinct from one another.
  - 20. The computer-readable storage medium of claim 19, further comprising computer instructions for:
    - ranking each cluster relative to one another if at least two clusters is generated;
      
      ranking each aggregate cluster relative to one another if at least two aggregate clusters is generated; and
      
      ranking each cluster and each aggregate cluster relative to each other if at least one cluster and at least one aggregate cluster is generated.
  - 21. The computer-readable storage medium of claim 20, wherein the ranking reflects a relative severity of speech misalignments.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: International Business Machines Corporation
Inventors: Zeng, Jie Z.; Smith, Maria E.
Granted Patent: US 7,475,016 B2
US Class Current: 704/260
CPC Class Codes: G10L 13/06 Elementary speech units use...
