Dynamic match lattice spotting for indexing speech content
Abstract
A system for indexing and searching speech content includes two distinct stages: a speech indexing stage (100) and a speech retrieval stage (200). In the indexing stage, a phone lattice (103) is generated by passing speech content (101) through a speech recogniser (102). The resulting phone lattice is then processed to produce a set of observed sequences Q=(Θ,i), where Θ is the set of observed phone sequences for each node i in the phone lattice. During the retrieval stage (200), a user first inputs a target word (205) into the system, which is then reduced to a target phone sequence P=(p1, p2, . . . , pN) (207). The system then compares the target sequence P with the set of observed sequences Q (208), suitably by scoring each observed sequence against the target sequence using a Minimum Edit Distance (MED) calculation, to produce a set of matching sequences R (209).
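The abstract does not say how the target word is reduced to the phone sequence P. One common approach is a pronunciation-dictionary lookup, sketched below purely as an illustration. The lexicon contents, the phone symbols, and the function name `target_phone_sequence` are all hypothetical and are not taken from the patent.

```python
# Hypothetical toy lexicon: word -> phone sequence. The phone symbols are
# illustrative only; a real system would use its recogniser's phone set.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "cab": ("k", "ae", "b"),
}


def target_phone_sequence(word, lexicon=LEXICON):
    """Return the target phone sequence P = (p1, ..., pN) for a word.

    Assumes a simple dictionary lookup; words missing from the lexicon
    raise KeyError rather than being guessed at.
    """
    key = word.strip().lower()
    if key not in lexicon:
        raise KeyError(f"no pronunciation available for {word!r}")
    return lexicon[key]
```

A call such as `target_phone_sequence("Cat")` yields the tuple of phones that the retrieval stage would then compare against the stored observed sequences.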
56 Citations
32 Claims
1. A computer implemented method of indexing speech content, the method comprising the steps of:
- generating a phone lattice from said speech content;
- processing the phone lattice to generate a set of observed sequences Q=(Θ,i), wherein Θ are the observed sequences for each node i in said phone lattice; and
- storing said set of observed sequences Q=(Θ,i) for each node.
Dependent claims: 2, 3, 4, 5, 6, 7, 32
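The indexing steps of claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that the claim does not state: the lattice is an acyclic graph whose nodes carry phone labels, and the observed sequences for node i are the phone sequences of at most `max_len` phones that end at i. The names `sequences_ending_at` and `build_index` and the data layout are hypothetical.

```python
def sequences_ending_at(node, length, phones, preds):
    """Phone sequences of up to `length` phones that end at `node`.

    phones: {node_id: phone_label}
    preds:  {node_id: [predecessor node_ids]} (lattice assumed acyclic)
    """
    phone = phones[node]
    parents = preds.get(node, [])
    # Stop when the requested length is reached or the lattice start is hit.
    if length == 1 or not parents:
        return {(phone,)}
    result = set()
    for parent in parents:
        for prefix in sequences_ending_at(parent, length - 1, phones, preds):
            result.add(prefix + (phone,))
    return result


def build_index(phones, edges, max_len=3):
    """Return {node i: set of observed sequences}, i.e. the pairs (Θ, i)."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)
    return {
        node: sequences_ending_at(node, max_len, phones, preds)
        for node in phones
    }


# Toy lattice with two competing hypotheses for the middle phone.
phones = {0: "k", 1: "ae", 2: "eh", 3: "t"}
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
index = build_index(phones, edges, max_len=3)
# index[3] holds both ("k", "ae", "t") and ("k", "eh", "t")
```

Storing the result per node, for example in a dictionary or a database table keyed by node, corresponds to the final storing step. A production index would likely also keep timing and acoustic scores, which this sketch omits.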
8. A method for searching indexed speech content wherein said indexed speech content is stored in the form of a phone lattice, the method comprising the steps of:
- obtaining a target sequence P=(p1, p2, p3, . . . pN);
- comparing the target sequence P with a set of observed sequences Q=(Θ,i) generated for each node i in said phone lattice, wherein the comparison between the target sequence and observed sequences includes scoring each observed sequence against the target sequence using a Minimum Edit Distance (MED) calculation; and
- outputting a set of sequences R from said set of observed sequences that match said target sequence.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
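The MED scoring step of claim 8 can be sketched as a standard edit-distance computation. This is a minimal illustration under stated assumptions: insertions, deletions, and substitutions all cost 1, whereas the claim does not fix the costs, and a match is taken to be any observed sequence whose distance is at most a threshold. The names `med` and `search` and the `max_distance` parameter are hypothetical.

```python
def med(target, observed):
    """Minimum edit distance between two phone sequences (unit costs assumed)."""
    # prev[j] is the distance between the target prefix processed so far
    # and the first j phones of the observed sequence.
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, o in enumerate(observed, start=1):
            substitution = prev[j - 1] + (0 if t == o else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def search(target, index, max_distance=1):
    """Return the matching set R as sorted (distance, node, sequence) tuples.

    index: {node i: set of observed phone sequences}
    """
    matches = []
    for node, sequences in index.items():
        for seq in sequences:
            distance = med(target, seq)
            if distance <= max_distance:
                matches.append((distance, node, seq))
    return sorted(matches)
```

For example, with an index in which node 3 holds `("k", "ae", "t")` and `("k", "eh", "t")`, searching for `("k", "ae", "t")` with `max_distance=1` returns both sequences, the exact match first. Raising the threshold trades more misses recovered for more false alarms.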
19. A system for indexing and searching speech content, the system comprising:
- a speech recognition engine for generating a phone lattice from said speech content;
- a first database for storing said phone lattice generated by said speech recognition engine;
- an input device for obtaining a target sequence P=(p1, p2, p3, . . . pN);
- at least one processor coupled to said input device and said first database, which processor is configured to;
  - process said phone lattice to generate a set of observed sequences Q=(Θ,i), wherein Θ are the observed sequences for each node i in said phone lattice;
  - store said observed sequences Q=(Θ,i) in a second database;
  - compare said target sequence P with the set of observed sequences Q=(Θ,i) wherein the comparison between the target sequence and observed sequences includes scoring each observed sequence against the target sequence using a Minimum Edit Distance (MED) calculation; and
  - output a set of sequences R from said set of observed sequences Q=(Θ,i) that match said target sequence.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
Specification