Method and Apparatus for Generation and Augmentation of Search Terms from External and Internal Sources
Abstract
A method and apparatus to identify names, personalities, titles, and topics that are present in a repository, and place them into a grammar, and to identify names, personalities, titles, and topics that are not present in the repository, and place them into a grammar, uses information from external data sources, notably the text used in non-speech, text-based searches, to expand the search terms entered into the ASR grammars. The expansion takes place in two forms: (1) finding plausible linguistic variants of existing search terms that are already comprehended in the repository, but that are present under slightly different names; and (2) expanding the existing search term list with items that should be there by virtue of their currency in popular culture, but which for whatever reason have not yet been reflected with content items in the repository.
16 Claims
1. A method for identifying names, personalities, titles, and topics that may or may not be present in a repository and for placing them into a grammar, comprising the steps of:
using information from external data sources comprising non-speech, text-based searches to expand search terms entered into one or more automatic speech recognition (ASR) grammars;
said search terms expansion comprising any of:
finding plausible linguistic variants of existing search terms that are already comprehended in the repository, but that are under slightly different names; and
expanding an existing search term list with items that should be in said list by virtue of their currency in popular culture, but which for whatever reason have not yet been reflected with content items in the repository.
2. A method for identifying names, personalities, titles, and topics that may or may not be present in a repository and for placing them into a grammar for use in an automatic speech recognition (ASR) system, comprising the steps of:
extracting search term candidates from external sources;
extracting verified search terms from internal sources;
matching candidate search terms against verified search terms by edit distance techniques to obtain plausible linguistic variants of verified search terms;
using said linguistic variants to generate augmented verified search terms; and
including candidate search terms which do not point to actual content elements, but which the ASR system should nevertheless recognize, as null search terms in a final grammar by virtue of any of their high incidence count, repeated appearance in history as either a candidate or verified search term, or other criterion.
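The matching step of claim 2 can be illustrated with a minimal sketch. It assumes plain Levenshtein distance as the "edit distance technique" and a simple count threshold as the "high incidence count" criterion; the function names, `max_dist`, and `min_count` are hypothetical and do not come from the claims.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def augment_and_find_null_terms(candidates, verified, max_dist=2, min_count=3):
    """Split candidate terms into variants of verified terms and null terms.

    max_dist and min_count are illustrative thresholds (assumptions).
    """
    counts = Counter(c.lower() for c in candidates)
    verified_lc = {v.lower(): v for v in verified}
    augmented = {v: set() for v in verified}
    null_terms = set()
    for cand, n in counts.items():
        if cand in verified_lc:
            continue  # already a verified term, nothing to add
        # Closest verified term, or None when there are no verified terms.
        best = min(verified_lc, key=lambda v: edit_distance(cand, v), default=None)
        if best is not None and edit_distance(cand, best) <= max_dist:
            augmented[verified_lc[best]].add(cand)  # plausible linguistic variant
        elif n >= min_count:
            null_terms.add(cand)  # frequent, but no matching content
    return augmented, null_terms


verified = ["Britney Spears", "The Sopranos"]
candidates = ["britny spears", "brittney spears", "rare thing"] + ["new hit show"] * 3
augmented, null_terms = augment_and_find_null_terms(candidates, verified)
```

Here the two misspellings attach to "Britney Spears" as variants, the thrice-seen unmatched query becomes a null search term, and the single unmatched query is dropped.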
6. An apparatus for identifying names, personalities, titles, and topics that may or may not be present in a repository and for placing them into a grammar for use in an automatic speech recognition (ASR) system, comprising:
a repository comprising external sources and internal sources;
a module for combining data from all external sources and for generating an output comprising candidate search terms;
a module for processing said candidate search terms by performing any of incidence counting, low pass filtering, and other functions;
a module arranged to receive as inputs the output of said module for processing candidate search terms and historical information comprising any of a history of candidate search terms, a history of final search terms, and verified search terms, and for providing a processed external source output;
a module for receiving the processed external source output, and for identifying and outputting null search terms;
a module for combining data from all internal sources and for generating an output comprising said verified search terms;
an approximate text matching module arranged to receive as inputs said verified search terms and said candidate search terms, and for providing an output;
a module for receiving the output of said approximate text matching module and for providing an output comprising augmented verified search terms;
a module for combining said null search terms with said augmented verified search terms and for outputting final search terms for grammar generation.
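The external-source branch of this apparatus (incidence counting, low pass filtering, and null search term identification) might be sketched as follows. The claim does not define the filter, so this sketch assumes an exponential moving average over per-period query counts; `alpha`, `threshold`, and all function names are hypothetical.

```python
from collections import Counter


def incidence_counts(queries):
    """Count how often each normalized query string occurs in one period."""
    return Counter(q.strip().lower() for q in queries if q.strip())


def low_pass(previous, current, alpha=0.5):
    """Exponential moving average of counts (an assumed form of low pass filter).

    smoothed = alpha * current + (1 - alpha) * previous, per term.
    """
    terms = set(previous) | set(current)
    return {t: alpha * current.get(t, 0) + (1 - alpha) * previous.get(t, 0.0)
            for t in terms}


def null_search_terms(smoothed, verified, threshold=2.0):
    """Terms whose smoothed count is high but that match no verified term."""
    verified_lc = {v.lower() for v in verified}
    return {t for t, s in smoothed.items()
            if s >= threshold and t not in verified_lc}


periods = [
    ["steady show"] * 4 + ["one day spike"] * 6 + ["the sopranos"] * 4,
    ["steady show"] * 4 + ["the sopranos"] * 4,
    ["steady show"] * 4 + ["the sopranos"] * 4,
]
smoothed = {}
for queries in periods:
    smoothed = low_pass(smoothed, incidence_counts(queries))

nulls = null_search_terms(smoothed, verified=["The Sopranos"])
```

With these inputs the one-period spike decays below the threshold, the steadily searched term survives as a null search term, and the verified title is excluded because it already points to content.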
9. An apparatus for identifying names, personalities, titles, and topics that may or may not be present in a repository and for placing them into a grammar, comprising:
a plurality of external data sources, comprising non-speech, text-based searches;
means for expanding search terms entered into one or more automatic speech recognition (ASR) grammars by using information from said external data sources, said means for expanding search terms comprising any of:
means for finding plausible linguistic variants of existing search terms that are already comprehended in the repository, but that are under slightly different names; and
means for expanding an existing search term list with items that should be in said list by virtue of their currency in popular culture, but which for whatever reason have not yet been reflected with content items in the repository.
14. A method for identifying names, personalities, titles, and topics that may or may not be present in a repository and for placing them into a grammar for use in an automatic speech recognition (ASR) system, comprising the steps of:
providing a repository comprising said external sources and internal sources;
combining data from all external sources and generating candidate search terms;
processing said candidate search terms by performing any of incidence counting, low pass filtering, and other functions;
receiving said processed candidate search terms and historical information comprising any of a history of candidate search terms, a history of final search terms, and verified search terms, and providing a processed external source output;
receiving the processed external source output, and identifying and outputting null search terms;
combining data from all internal sources and generating said verified search terms;
receiving said verified search terms and said candidate search terms and providing an approximate text matching output;
receiving said approximate text matching output and outputting augmented verified search terms;
combining said null search terms with said augmented verified search terms and outputting final search terms for grammar generation.
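The final combining step can be sketched as a mapping from each recognizable phrase to its content target. This is an illustration under assumptions: the claims do not specify a data structure, so the dictionary layout, the use of `None` to mark a null search term, and the name `final_search_terms` are all hypothetical. Compiling the result into an actual ASR grammar is left out.

```python
def final_search_terms(augmented, null_terms):
    """Merge augmented verified terms and null terms into one lookup table.

    augmented:  {canonical verified term: set of linguistic variants}
    null_terms: phrases to recognize even though no content exists for them
    Returns {lowercased phrase: canonical term, or None for a null term}.
    """
    final = {}
    for canonical, variants in augmented.items():
        final[canonical.lower()] = canonical
        for variant in variants:
            # Keep the first mapping if two verified terms share a variant.
            final.setdefault(variant.lower(), canonical)
    for term in null_terms:
        # A verified mapping always wins over a null entry.
        final.setdefault(term.lower(), None)
    return final


table = final_search_terms(
    {"Britney Spears": {"britny spears"}, "The Sopranos": set()},
    {"new hit show", "the sopranos"},
)
```

A recognizer built on such a table could answer a null term with a "no content yet" response instead of misrecognizing it as the nearest verified title.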
Specification