Method to expand inputs for word or document searching
4 Assignments
Abstract
An electronic document searching system or word searching system, when given an input, expands the input as a function of acoustic similarity and/or word sequence occurrence frequency. The results of the system are alternative input words or phrases, which are output from the system for further processing.
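
The abstract names word sequence occurrence frequency as one basis for expansion but does not say how frequencies are obtained. A minimal sketch, assuming the frequency is simply the number of times a candidate's exact word sequence occurs in a tokenized corpus (the corpus and the function names are hypothetical):

```python
from typing import List, Sequence


def sequence_frequency(phrase: str, tokens: Sequence[str]) -> int:
    """Count contiguous occurrences of the phrase's word sequence in the corpus tokens."""
    words = phrase.lower().split()
    if not words:
        return 0
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if list(tokens[i:i + n]) == words)


def rank_by_frequency(candidates: List[str], tokens: Sequence[str]) -> List[str]:
    """Order alternative phrases so that more frequent word sequences come first.

    Python's sort is stable, so candidates with equal counts keep their input order.
    """
    return sorted(candidates, key=lambda c: sequence_frequency(c, tokens), reverse=True)


# Hypothetical toy corpus; a real system would draw counts from a large text collection.
corpus = "how to recognize speech . it is hard to recognize speech . they wreck a nice beach".split()
print(rank_by_frequency(["wreck an ice beach", "wreck a nice beach", "recognize speech"], corpus))
```

Here "recognize speech" (2 occurrences) is ranked ahead of "wreck a nice beach" (1) and "wreck an ice beach" (0).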
106 Citations
32 Claims
1. A method for electronic document or word searching, comprising the steps of:
(a) given an input, expanding the input as a function of at least one of (i) acoustic similarity and/or (ii) frequency of word sequence occurrence, said expanding resulting in alternative input words or phrases, wherein the step of expanding includes:
translating words in the given input to one or more phoneme strings;
determining word boundaries in each of the phoneme strings to produce respective phoneme subsequences; and
for each produced phoneme subsequence, generating at least one confusable word phrase having a pronunciation which is acoustically similar to the phoneme subsequence, said confusable word phrase forming an alternative input word or phrase, wherein the step of generating at least one confusable word phrase includes:
comparing each phoneme subsequence to word pronunciations from a dictionary, said comparing resulting in a list of words from the dictionary; and
ordering the list of words from the dictionary, the most likely word being at the top of the list, said ordering of the list including scoring the word pronunciation of each word in the dictionary against the phoneme subsequence using a distance metric based on the distance between a hypothesized pronunciation in the phoneme subsequence and the pronunciation from the dictionary; and
(b) returning the alternative input words or phrases for further processing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16.
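
The steps of claim 1 can be traced end to end on a toy example. A minimal sketch under stated assumptions: the pronunciation dictionary, its ARPAbet-style symbols, and every function name are hypothetical; word boundaries are found by exhaustively enumerating all segmentations, which is only practical for short inputs; and plain phoneme-level Levenshtein distance stands in for the claim's unspecified distance metric. For each segmentation, the top-ranked word of each subsequence is joined into an alternative phrase, kept only if its total distance is at most `max_cost`.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Phonemes = Tuple[str, ...]

# Hypothetical toy pronunciation dictionary (ARPAbet-style symbols, stress omitted).
LEXICON: Dict[str, Phonemes] = {
    "recognize": ("R", "EH", "K", "AH", "G", "N", "AY", "Z"),
    "speech": ("S", "P", "IY", "CH"),
    "wreck": ("R", "EH", "K"),
    "a": ("AH",),
    "an": ("AE", "N"),
    "nice": ("N", "AY", "S"),
    "ice": ("AY", "S"),
    "beach": ("B", "IY", "CH"),
}


def to_phonemes(text: str, lexicon: Dict[str, Phonemes]) -> Phonemes:
    """Translate input words into one concatenated phoneme string with no word breaks."""
    return tuple(p for word in text.lower().split() for p in lexicon[word])


def segmentations(phonemes: Phonemes) -> Iterator[List[Phonemes]]:
    """Enumerate every placement of word boundaries (2**(n-1) ways for n phonemes)."""
    n = len(phonemes)
    for r in range(n):
        for cuts in combinations(range(1, n), r):
            bounds = [0, *cuts, n]
            yield [phonemes[a:b] for a, b in zip(bounds, bounds[1:])]


def distance(hyp: Phonemes, ref: Phonemes) -> int:
    """Levenshtein distance over phonemes (one possible distance metric)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, q in enumerate(ref, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != q)))
        prev = cur
    return prev[-1]


def ranked_words(sub: Phonemes, lexicon: Dict[str, Phonemes]) -> List[Tuple[str, int]]:
    """Compare a subsequence with every dictionary pronunciation; closest word first."""
    scored = sorted((distance(sub, pron), word) for word, pron in lexicon.items())
    return [(word, d) for d, word in scored]


def expand(text: str, lexicon: Dict[str, Phonemes], max_cost: int = 2) -> List[Tuple[str, int]]:
    """Build alternative phrases from the top word of each subsequence, cheapest first."""
    best: Dict[str, int] = {}
    for seg in segmentations(to_phonemes(text, lexicon)):
        picks = [ranked_words(sub, lexicon)[0] for sub in seg]
        phrase = " ".join(word for word, _ in picks)
        cost = sum(d for _, d in picks)
        if cost <= max_cost and cost < best.get(phrase, max_cost + 1):
            best[phrase] = cost
    return sorted(best.items(), key=lambda item: (item[1], item[0]))
```

With this lexicon, `expand("recognize speech", LEXICON)` returns `("recognize speech", 0)` first and also contains `("wreck a nice speech", 2)`, an acoustically confusable alternative.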
10. A method as claimed in claim 10, wherein the step of determining word boundaries in each of the phoneme strings to produce respective phoneme subsequences uses a syllable-based word boundary.
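
Claim 10 limits the word-boundary step to a syllable-based word boundary without saying how syllables are found. A minimal sketch, assuming a deliberately crude syllabifier that is not taken from the patent: every vowel phoneme is a syllable nucleus, and a boundary candidate is placed before the last consonant between two nuclei (or directly between adjacent vowels). Word boundaries are then chosen only from those k candidates, so there are 2**k segmentations instead of 2**(n-1) for n phonemes. The vowel set and function names are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

# Hypothetical vowel inventory (ARPAbet-style); any other phoneme is treated as a consonant.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def syllable_boundaries(phonemes: Sequence[str]) -> List[int]:
    """Return indices before which a syllable (and hence word) boundary may fall."""
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    cuts = []
    for left, right in zip(nuclei, nuclei[1:]):
        # Crude onset rule (assumption): the last consonant before a nucleus starts its syllable.
        cuts.append(right - 1 if right - left > 1 else right)
    return cuts


def syllable_segmentations(phonemes: Sequence[str]) -> Iterator[List[Tuple[str, ...]]]:
    """Enumerate segmentations whose word boundaries all fall on syllable boundaries."""
    cuts = syllable_boundaries(phonemes)
    for r in range(len(cuts) + 1):
        for chosen in combinations(cuts, r):
            bounds = [0, *chosen, len(phonemes)]
            yield [tuple(phonemes[a:b]) for a, b in zip(bounds, bounds[1:])]


phon = ("R", "EH", "K", "AH", "G", "N", "AY", "Z", "S", "P", "IY", "CH")
print(syllable_boundaries(phon))               # candidate cuts before indices 2, 5 and 9
print(len(list(syllable_segmentations(phon))))  # 8 segmentations
```

For this 12-phoneme string the search drops from 2,048 unrestricted segmentations to 8.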
17. A computer system for electronic document or word searching, comprising:
an expansion module for expanding an input as a function of at least one of (i) acoustic similarity and/or (ii) frequency of word sequence occurrence, said expanding resulting in alternative input words or phrases and returning the alternative input words or phrases for further processing, wherein the expansion module comprises:
a translation module for translating words in the given input to one or more phoneme strings;
a determination module for determining word boundaries in each of the phoneme strings to produce respective phoneme subsequences; and
a production module for generating, for each produced phoneme subsequence, at least one confusable word phrase having a pronunciation which is acoustically similar to the phoneme subsequence, said confusable word phrase forming an alternative input word or phrase, wherein the production module comprises:
a comparison module for comparing each phoneme subsequence to word pronunciations from a dictionary, said comparing resulting in a list of words from the dictionary; and
an ordering module for ordering the list of words from the dictionary, the most likely word being at the top of the list, the ordering module scoring the word pronunciation of each word in the dictionary against the phoneme subsequence using a distance metric based on the distance between a hypothesized pronunciation in the phoneme subsequence and the pronunciation from the dictionary.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31.
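
In claim 17 the ordering module scores each dictionary pronunciation against the hypothesized pronunciation with a distance metric that the claim leaves open. One common choice is a weighted edit distance. The sketch below assumes, without support from the patent, hypothetical broad phoneme classes and a substitution cost of 0.5 within a class (1.0 across classes, 1.0 per insertion or deletion) as a rough proxy for acoustic similarity; the ordering module then sorts dictionary words by ascending distance so the most likely word is first.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical broad phoneme classes (ARPAbet-style symbols); an assumption for illustration.
PHONEME_CLASS: Dict[str, str] = {
    "P": "stop", "B": "stop", "T": "stop", "D": "stop", "K": "stop", "G": "stop",
    "S": "fricative", "Z": "fricative", "F": "fricative", "V": "fricative",
    "CH": "affricate", "JH": "affricate",
    "M": "nasal", "N": "nasal", "NG": "nasal",
    "R": "liquid", "L": "liquid",
    "IY": "vowel", "IH": "vowel", "EH": "vowel", "AE": "vowel",
    "AH": "vowel", "AA": "vowel", "AY": "vowel",
}


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: free if identical, cheaper within one broad class."""
    if a == b:
        return 0.0
    cls = PHONEME_CLASS.get(a)
    if cls is not None and cls == PHONEME_CLASS.get(b):
        return 0.5
    return 1.0


def weighted_distance(hyp: Sequence[str], ref: Sequence[str], indel: float = 1.0) -> float:
    """Weighted edit distance between a hypothesized and a dictionary pronunciation."""
    prev = [j * indel for j in range(len(ref) + 1)]
    for i in range(1, len(hyp) + 1):
        cur = [i * indel]
        for j in range(1, len(ref) + 1):
            cur.append(min(prev[j] + indel,          # delete hyp[i-1]
                           cur[j - 1] + indel,       # insert ref[j-1]
                           prev[j - 1] + sub_cost(hyp[i - 1], ref[j - 1])))
        prev = cur
    return prev[-1]


def order_words(hyp: Sequence[str], lexicon: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Ordering module sketch: dictionary words sorted by distance, most likely first."""
    scored = sorted((weighted_distance(hyp, pron), word) for word, pron in lexicon.items())
    return [word for _, word in scored]


lexicon = {"speech": ("S", "P", "IY", "CH"), "peach": ("P", "IY", "CH"), "beach": ("B", "IY", "CH")}
print(order_words(("B", "IY", "CH"), lexicon))
```

For the hypothesis B IY CH, "peach" scores 0.5 (B and P are both stops) and "speech" 1.5, so the order is beach, peach, speech.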
32. A computer system comprising:
(a) means for expanding a given input and resulting in alternative input words or phrases, wherein the means for expanding includes:
means for translating words in the given input to one or more phoneme strings;
means for determining word boundaries in each of the phoneme strings to produce respective phoneme subsequences; and
for each produced phoneme subsequence, means for generating at least one confusable word phrase having a pronunciation which is acoustically similar to the phoneme subsequence, said confusable word phrase forming an alternative input word or phrase, said means for generating at least one confusable word phrase:
comparing each phoneme subsequence to word pronunciations from a dictionary, said comparing resulting in a list of words from the dictionary; and
ordering the list of words from the dictionary, the most likely word being at the top of the list, said ordering of the list including scoring the word pronunciation of each word in the dictionary against the phoneme subsequence using a distance metric based on the distance between a hypothesized pronunciation in the phoneme subsequence and the pronunciation from the dictionary; and
(b) means for searching an index of electronic documents for the alternative input words or phrases in response to the given input.
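
Claim 32 adds means for searching an index of electronic documents for the alternative input words or phrases. A minimal sketch, assuming a word-level inverted index over lowercased, whitespace-tokenized text and exact phrase matching (the documents and function names are hypothetical): the postings lists narrow the candidate documents for each alternative, and a scan of each candidate's tokens confirms that the words occur contiguously.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased token to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def contains_phrase(tokens: Sequence[str], phrase: Sequence[str]) -> bool:
    """True if the phrase's words appear contiguously in the token list."""
    n = len(phrase)
    return any(list(tokens[i:i + n]) == list(phrase) for i in range(len(tokens) - n + 1))


def search_expanded(alternatives: List[str], docs: Dict[str, str],
                    index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per matching document, the alternative phrases found in it."""
    hits: Dict[str, List[str]] = defaultdict(list)
    for alt in alternatives:
        words = alt.lower().split()
        postings = [index.get(w, set()) for w in words]
        # Only documents containing every word can contain the phrase.
        candidates = set.intersection(*postings) if postings else set()
        for doc_id in sorted(candidates):
            if contains_phrase(docs[doc_id].lower().split(), words):
                hits[doc_id].append(alt)
    return dict(hits)


docs = {"d1": "How to wreck a nice beach", "d2": "Systems that recognize speech",
        "d3": "A nice day at the beach"}
print(search_expanded(["recognize speech", "wreck a nice beach"], docs, build_index(docs)))
```

A document matches if it contains any alternative, and each hit records which alternative matched, which a caller could use for ranking.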
Specification