System and method for robust access and entry to large structured data using voice form-filling
Abstract
A method, apparatus and machine-readable medium are provided. A phonotactic grammar is utilized to perform speech recognition on received speech and to generate a phoneme lattice. A document shortlist is generated based on using the phoneme lattice to query an index. A grammar is generated from the document shortlist. Data for each of at least one input field is identified based on the received speech and the generated grammar.
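The index query described in the abstract can be illustrated with a minimal sketch. It assumes the index is keyed by phone bigrams and that each query factor carries a normalized cost (0 for the best path). The entry names, the pronunciations, the `1 / (1 + cost)` weighting, and the function names are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def ngrams(phones, n=2):
    """Return the phone n-grams (the assumed 'factors of interest') of a sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def build_index(entries, n=2):
    """Map each phone n-gram to the set of database entries containing it."""
    index = defaultdict(set)
    for entry_id, phones in entries.items():
        for gram in ngrams(phones, n):
            index[gram].add(entry_id)
    return index


def shortlist(index, query, top_k=2):
    """Rank entries against a cost-normalized query.

    query maps factor -> cost, where 0.0 is the best-path cost.
    The 1 / (1 + cost) weighting is an illustrative assumption.
    """
    scores = defaultdict(float)
    for gram, cost in query.items():
        for entry_id in index.get(gram, ()):
            scores[entry_id] += 1.0 / (1.0 + cost)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return ranked[:top_k]


# Toy database with made-up pronunciations.
entries = {
    "smith": ["s", "m", "ih", "th"],
    "smythe": ["s", "m", "ay", "dh"],
    "jones": ["jh", "ow", "n", "z"],
}
index = build_index(entries)
query = {("s", "m"): 0.0, ("m", "ih"): 0.0, ("ih", "th"): 0.5}
candidates = shortlist(index, query)
```

Only entries that share at least one factor with the query receive a score, so the shortlist stays small even when the database is large.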
11 Claims
1. A method comprising:
receiving a speech;
recognizing the speech using a phonotactic grammar to generate a phone lattice;
removing silence and filler words from the phone lattice, to yield a revised phone lattice;
normalizing, via a processor, costs in the revised phone lattice such that a cost of a best path is set to zero;
generating a cost-normalized query using factors of interest, wherein an index of words is indexed by the factors of interest;
generating, via the processor and by performing a first pass of entries in a database, a shortlist of recognized speech possibilities using the revised phone lattice, the index of words, and indices contained in the cost-normalized query;
performing a second pass on the shortlist of recognized speech possibilities using a grammar generated from the entries in the database to obtain a final result; and
providing a response to the speech based on the final result.
Dependent claims: 2, 3, 4.
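The filler-removal and cost-normalization steps recited above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the lattice is an acyclic list of `(source, destination, label, cost)` arcs with states numbered in topological order, the start state has no incoming arcs, fillers are mapped to an epsilon label rather than deleted outright, and the filler label set is hypothetical.

```python
EPS = "<eps>"                   # epsilon label: consumes no phone
FILLERS = {"sil", "uh", "um"}   # hypothetical silence/filler labels


def remove_fillers(arcs):
    """Relabel silence and filler arcs as epsilon, keeping their costs."""
    return [(s, d, EPS if lab in FILLERS else lab, c) for s, d, lab, c in arcs]


def best_cost(arcs, start, final):
    """Cheapest start-to-final path cost in a topologically numbered DAG."""
    dist = {start: 0.0}
    # Visiting arcs by increasing source state guarantees that every arc
    # into a state has been relaxed before arcs leaving it are used.
    for s, d, _, c in sorted(arcs, key=lambda a: a[0]):
        if s in dist:
            dist[d] = min(dist.get(d, float("inf")), dist[s] + c)
    return dist[final]


def normalize(arcs, start, final):
    """Shift costs so that the best path through the lattice costs zero.

    Every complete path uses exactly one arc leaving the start state, so
    subtracting the best cost there lowers every path by the same amount.
    """
    best = best_cost(arcs, start, final)
    return [(s, d, lab, c - best if s == start else c) for s, d, lab, c in arcs]


# Toy lattice: "sil" (s|z) m "um"
lattice = [
    (0, 1, "sil", 0.5),
    (1, 2, "s", 1.0),
    (1, 2, "z", 1.7),
    (2, 3, "m", 0.4),
    (3, 4, "um", 0.3),
]
revised = remove_fillers(lattice)
normalized = normalize(revised, start=0, final=4)
```

After normalization the remaining path costs are relative to the best path: in the toy lattice, the alternative through `z` ends up about 0.7 above zero.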
5. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
receiving a speech;
recognizing the speech using a phonotactic grammar to generate a phone lattice;
removing silence and filler words from the phone lattice, to yield a revised phone lattice;
normalizing costs in the revised phone lattice such that a cost of a best path is set to zero;
generating a cost-normalized query using factors of interest, wherein an index of words is indexed by the factors of interest;
generating, by performing a first pass of entries in a database, a shortlist of recognized speech possibilities using the revised phone lattice, the index of words, and indices contained in the cost-normalized query;
performing a second pass on the shortlist of recognized speech possibilities using a grammar generated from the entries in the database to obtain a final result; and
providing a response to the speech based on the final result.
Dependent claims: 6, 7, 8.
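The second pass over the shortlist can be illustrated with a deliberately simplified stand-in. A real system would presumably re-recognize the audio against a grammar built from the database entries. The sketch below instead restricts the candidates to the shortlisted entries and picks the one whose pronunciation is closest, by Levenshtein distance, to a single best phone sequence. The pronunciations, the use of edit distance, and the function names are assumptions made for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def second_pass(best_phones, shortlisted, pronunciations):
    """Choose the shortlisted entry closest to the recognized phones.

    The 'grammar' here is just the set of pronunciations allowed for the
    shortlisted entries, a stand-in for a grammar-constrained recognizer.
    """
    grammar = {entry: pronunciations[entry] for entry in shortlisted}
    return min(grammar,
               key=lambda e: (edit_distance(best_phones, grammar[e]), e))


# Made-up pronunciations for a toy database.
pronunciations = {
    "smith": ["s", "m", "ih", "th"],
    "smythe": ["s", "m", "ay", "dh"],
    "jones": ["jh", "ow", "n", "z"],
}
result = second_pass(["s", "m", "ih", "t"], ["smith", "smythe"], pronunciations)
```

Because the second pass only considers shortlisted entries, its cost depends on the shortlist size rather than on the size of the full database.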
9. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
receiving a speech;
recognizing the speech using a phonotactic grammar to generate a phone lattice;
removing silence and filler words from the phone lattice, to yield a revised phone lattice;
normalizing costs in the revised phone lattice such that a cost of a best path is set to zero;
generating a cost-normalized query using factors of interest, wherein an index of words is indexed by the factors of interest;
generating, by performing a first pass of entries in a database, a shortlist of recognized speech possibilities using the revised phone lattice, the index of words, and indices contained in the cost-normalized query;
performing a second pass on the shortlist of recognized speech possibilities using a grammar generated from the entries in the database to obtain a final result; and
providing a response to the speech based on the final result.
Dependent claims: 10, 11.
Specification