Word matching with context sensitive character to sound correlating

US 20070150279A1
Filed: 12/27/2005
Published: 06/28/2007
Est. Priority Date: 12/27/2005
Status: Abandoned Application

Abstract

Systems, methods, media, and other embodiments associated with word matching with context sensitive character to sound correlating are described. One exemplary method embodiment includes automatically generating context sensitive character to sound correlation rules, making the rules available to a query processing logic, converting words into sets of sounds using the rules, and storing a data entry linking the word and set of sounds in a data store searchable by the query processing logic.
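
The flow in the abstract (rules, word-to-sound conversion, a searchable store of words and sounds) can be sketched roughly as follows. This is a minimal illustration, not the application's implementation: the `RULES` table, the sound symbols, and the names `to_sounds` and `SoundIndex` are all hypothetical, and the rules are hand-written here, whereas the described method generates them automatically.

```python
from collections import defaultdict

# Hypothetical context sensitive rules: (previous char, char, next char) -> sound.
# An empty string in the first or last slot matches any neighbor.
# These are hard-coded for illustration only; the abstract describes
# generating such rules automatically.
RULES = {
    ("", "p", "h"): "F",   # "p" followed by "h" sounds like F
    ("p", "h", ""): "",    # "h" after "p" is silent (covered by the rule above)
    ("", "c", "e"): "S",
    ("", "c", "i"): "S",
    ("", "c", ""): "K",
    ("", "k", ""): "K",
    ("", "f", ""): "F",
}


def to_sounds(word):
    """Convert a word into a tuple of sounds using the context rules."""
    word = word.lower()
    sounds = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        # Try the most specific context first, then fall back to wildcards.
        for key in ((prev, ch, nxt), ("", ch, nxt), (prev, ch, ""), ("", ch, "")):
            if key in RULES:
                sounds.append(RULES[key])
                break
        else:
            sounds.append(ch.upper())  # no rule: the letter stands for itself
    return tuple(s for s in sounds if s)  # drop silent letters


class SoundIndex:
    """Toy data store linking each word to its set of sounds."""

    def __init__(self):
        self._by_sounds = defaultdict(set)

    def add(self, word):
        self._by_sounds[to_sounds(word)].add(word)

    def query(self, term):
        # Convert the query term with the same rules, then match on sounds.
        return sorted(self._by_sounds.get(to_sounds(term), set()))


if __name__ == "__main__":
    index = SoundIndex()
    for name in ("Stephen", "Catherine", "Philip"):
        index.add(name)
    print(index.query("Stefen"))
    print(index.query("Katherine"))
```

Using the same conversion for stored words and for query terms is what lets differently spelled names land on the same key; exact-key lookup is the simplest possible matching and would miss near matches that a scored comparison could catch.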

25 Claims

1. A method, comprising:
- automatically generating one or more context sensitive character to sound correlation rules;
  
  providing the one or more rules to a query processing logic;
  
  converting a word into a first set of sounds using the one or more rules; and
  
  storing the word and first set of sounds in a data store searchable by the query processing logic.
  - 2. The method of claim 1, including:
    - accepting a query term to match on pronunciation;
      
      converting the query term into a second set of sounds using the one or more rules;
      
      accessing the data store; and
      
      controlling the query processing logic to select one or more words from the data store based, at least in part, on matching the second set of sounds to one or more first set of sounds.
  - 3. The method of claim 1, where automatically generating the one or more rules includes machine learning the rules using one or more culturally aware pronunciation dictionaries during training, the culturally aware pronunciation dictionaries including words having characters described in a phonetically characterized training set of characters.
  - 4. The method of claim 3, including creating a character specific training table for a character in the training set of characters, the character specific training table including one or more words in which the character is found, one or more grams for the character, and one or more sounds associated with the character, the character specific training table including one or more entries containing a related word, gram, and sound.
  - 5. The method of claim 1, the one or more rules being configured to favor recall over precision.
  - 6. The method of claim 1, including modifying an existing document classifying logic to automatically generate the one or more rules, where modifying an existing document classifying logic includes replacing a document classification definition used by the existing document classifying logic with a word classification definition, replacing a document category used by the existing document classifying logic with a sound that represents a character, and replacing one or more document tokens used by the existing document classifying logic by one or more grams for a character.
  - 7. The method of claim 2, including controlling the query processing logic to input a string of grams associated with the query term and controlling the query processing logic to provide one or more possible sounds and one or more related confidences based on a context associated with the query term.
  - 8. The method of claim 1, where automatically generating the one or more rules includes controlling a text-to-phoneme conversion logic to build grapheme-to-phoneme rules in the form of decision trees and providing as input to the text-to-phoneme conversion logic one or more pronunciation dictionaries, where the text-to-phoneme conversion logic relies on alignment where letters are matched with phonemes and a mapping is made between ordered lists of letters and phonemes.
  - 9. The method of claim 8, including producing one or more feature vectors for a letter based, at least in part, on alignment, the feature vectors being configured to provide a context for the letter.
  - 10. The method of claim 9, where the context includes a relationship to one or more of, a previous letter, and a following letter.
  - 11. The method of claim 2, including controlling the query processing logic to select one or more words from the data store based, at least in part, on matching items, where matching items includes an orthographic match and a phonetic match, the orthographic match computing an edit distance between two items being compared, the phonetic match computing a linguistic edit distance between two items being compared, the orthographic match and the phonetic match being combined into a score upon which a match can be ranked.
  - 12. The method of claim 2, including accepting one or more user inputs concerning one or more of, a maximum number of highest confidence sounds considered for a character, and a minimum confidence for a combination of character sounds.
  - 13. The method of claim 2, including computing an overall confidence for a match for a word selected from the data store from one or more confidences related to letters in the word.
  - 14. The method of claim 1, including accepting a user input to configure an index for use by the query processing logic, the user input concerning one or more of, selecting a field that includes word data to index, assigning a confidence weighting on a field, setting a confidence score for a possible field ordering, determining a phonetic sound representation of a word based on pronunciation training data, storing combinations of words and sounds, storing grams of combinations of words and sounds in inverted indexes, storing base table names, and storing meta-data.
  - 15. The method of claim 2, including accepting a user input configured to manipulate a query for use by the query processing logic, the user input concerning one or more of, setting a threshold and discount factor, selecting a maximum number of results, selecting a minimum overall confidence threshold, adjusting an orthographic similarity weighting, adjusting a phonetic similarity weighting, adjusting an orthographic similarity confidence threshold, adjusting a phonetic similarity confidence threshold, assigning one or more confidence weightings to one or more fielded query terms, and establishing a region parameter associated with a region-specific pronunciation rewrite rule.
  - 16. The method of claim 2, where the word converted into the first set of sounds using the one or more rules is a name and where the query term is a name.
  - 17. The method of claim 2, the data store being configured as a relational database.
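
Claim 11 combines an orthographic match and a phonetic match into a single score for ranking. A minimal sketch of one way to do that is shown below. It makes several assumptions not found in the claims: plain Levenshtein distance is used for both comparisons (the claims do not define the "linguistic edit distance"), distances are normalized to a 0..1 similarity, and the weights `w_orth` and `w_phon` and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or sound tuples)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (x != y),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Map an edit distance onto 0..1, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def combined_score(word_a, sounds_a, word_b, sounds_b, w_orth=0.4, w_phon=0.6):
    """Weighted sum of spelling similarity and sound similarity.

    The weights are illustrative assumptions; the claims only say that
    the two matches are combined into a score used for ranking.
    """
    orth = similarity(word_a.lower(), word_b.lower())
    phon = similarity(sounds_a, sounds_b)
    return w_orth * orth + w_phon * phon


if __name__ == "__main__":
    # Hypothetical sound sequences for the two spellings.
    steven = ("S", "T", "IY", "V", "AH", "N")
    print(combined_score("Stephen", steven, "Steven", steven))
```

Weighting the phonetic term more heavily than the orthographic term tends to favor recall for names that sound alike but are spelled differently, which is the bias claim 5 describes; the right balance would depend on the data.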

18. A computer-readable medium storing processor executable instructions operable to perform a method, the method comprising:
- automatically generating one or more recall biased context sensitive character to sound correlation rules using one or more culturally aware pronunciation dictionaries during machine learning training, the culturally aware pronunciation dictionaries including words having characters described in a phonetically characterized training set of characters, where automatically generating the one or more rules includes controlling a text-to-phoneme conversion logic to build grapheme-to-phoneme rules in the form of decision trees and includes providing as input to the text-to-phoneme conversion logic one or more pronunciation dictionaries, where the text-to-phoneme conversion logic relies on alignment where letters are matched with phonemes and a mapping is made between ordered lists of letters and phonemes;
  
  creating a character specific training table for a character in the training set of characters, the character specific training table including one or more words in which the character is found, one or more grams for the character, and one or more sounds associated with the character, the character specific training table including one or more entries containing related words, grams, and sounds;
  
  producing one or more feature vectors for a letter based, at least in part, on alignment, the feature vectors being configured to provide a context for the letter, where the context includes a relationship to one or more of, a previous letter, and a following letter;
  
  providing the one or more rules to a query processing logic;
  
  converting a word into a first set of sounds using the one or more rules;
  
  storing the word and first set of sounds in a data store searchable by the query processing logic;
  
  accepting a query term to match on pronunciation;
  
  converting the query term into a second set of sounds using the one or more rules;
  
  controlling the query processing logic to input a string of grams associated with the query term;
  
  accessing the data store;
  
  controlling the query processing logic to select one or more words from the data store based, at least in part, on matching the second set of sounds to one or more first set of sounds;
  
  controlling the query processing logic to provide one or more confidences related to the one or more words; and
  
  computing an overall confidence for a match for a word selected from the data store from confidences related to the letters in the word.
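
The character specific training table and the previous/following-letter context recited above can be illustrated with a small sketch. Note the substitution: the claim describes decision-tree grapheme-to-phoneme rules, whereas this example uses simple frequency counting with a fallback, which is easier to show in a few lines. The aligned `TRAINING` entries, the `_` marker for a silent letter, the `#` boundary marker, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical aligned pronunciation entries: one sound per letter,
# with "_" marking a silent letter.
TRAINING = [
    ("phil", ["F", "_", "IH", "L"]),
    ("pat", ["P", "AE", "T"]),
    ("pam", ["P", "AE", "M"]),
    ("sophie", ["S", "OW", "F", "_", "IY", "_"]),
]


def grams(word):
    """Return one (previous, letter, following) gram per letter; '#' marks a word edge."""
    padded = "#" + word + "#"
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def build_tables(training):
    """Build one table per character; each row is (word, gram, sound)."""
    tables = defaultdict(list)
    for word, sounds in training:
        if len(word) != len(sounds):
            raise ValueError(f"entry {word!r} is not aligned letter by letter")
        for gram, sound in zip(grams(word), sounds):
            tables[gram[1]].append((word, gram, sound))
    return tables


def predict(tables, gram):
    """Return (sound, confidence) pairs for a letter in the given context.

    Rows with the same gram are preferred; if the context was never seen,
    every row for the letter is used instead. Confidence is relative frequency.
    """
    rows = tables.get(gram[1], [])
    same_context = [sound for _, g, sound in rows if g == gram]
    pool = same_context or [sound for _, _, sound in rows]
    counts = Counter(pool)
    total = sum(counts.values())
    return [(sound, count / total) for sound, count in counts.most_common()]


if __name__ == "__main__":
    tables = build_tables(TRAINING)
    print(tables["p"])
    print(predict(tables, ("#", "p", "h")))
    print(predict(tables, ("#", "p", "a")))
```

In this toy data the letter "p" maps to F whenever "h" follows and to P whenever "a" follows, so the context in the gram is what disambiguates the sound.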

19. A system, comprising:
- one or more data stores configured to store one or more text to sound pronunciation data entries, one or more text training words, one or more text to sound conversion rules, and one or more text and sound representation data entries; and
  
  a machine learning logic configured to automatically generate one or more text to sound conversion rules from the text to sound pronunciation data entries and the text training words, to store the text to sound conversion rules, to automatically generate one or more text and sound representation data entries, and to store the one or more text and sound representation data entries.
  - 20. The system of claim 19, including a query processing logic configured to receive a textual representation of a word, to produce a sound representation of the word using one or more of the text to sound conversion rules, and to provide one or more elements of one or more text and sound representation data entries based, at least in part, on matching sounds associated with the word to sounds associated with sound representation data stored in the text and sound representation data entries.
  - 21. The system of claim 20, the query processing logic being configured to favor recall over precision.
  - 22. The system of claim 20, text to sound pronunciation data entries including an ordered list of letters and phonemes, text to sound conversion rules being alignment based grapheme to phoneme rules organized in a decision tree, text and sound representation data entries including one or more context providing feature vectors for a letter in a word;
    - and the machine learning logic being configured to create character specific training tables for characters in the text training words, character specific training tables including one or more words in which a character is found, one or more grams for a character, and one or more sounds associated with a character, a character specific training table including one or more related sets of data containing a related word, gram, and sound.
  - 23. The system of claim 22, including an index manipulation logic configured to perform one or more of, selecting a field that includes word data to index, assigning a confidence weighting on a field, setting a confidence score for a possible field ordering, determining a phonetic sound representation of a word based on pronunciation training data, storing combinations of words and sounds, storing grams of combinations of words and sounds in inverted indexes, storing base table names, and storing meta-data;
    - and a query manipulation logic configured to manipulate a query for use by the query processing logic, the manipulating including one or more of, setting a threshold and discount factor, selecting a maximum number of results, selecting a minimum overall confidence threshold, adjusting an orthographic edit distance weighting, adjusting a phonetic edit distance weighting, adjusting an orthographic edit distance confidence threshold, adjusting a phonetic edit distance confidence threshold, assigning one or more confidence weightings to one or more query terms, and establishing a region parameter associated with a region-specific pronunciation rewrite rule.
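
Several of the query parameters recited in the claims (a maximum number of highest confidence sounds per character in claim 12, a minimum confidence for a combination of sounds, and an overall confidence computed from per-letter confidences in claims 13 and 18) can be pictured with the sketch below. It rests on an assumption: the claims do not say how letter confidences are combined, so the product is used here as one plausible choice. The function name, parameter names, and sample values are hypothetical.

```python
from itertools import product
from math import prod


def candidate_pronunciations(per_letter, max_sounds=2, min_confidence=0.1):
    """Enumerate candidate sound sequences for a word.

    per_letter holds, for each letter, a list of (sound, confidence) pairs.
    Only the max_sounds most confident sounds per letter are considered,
    and combinations whose overall confidence falls below min_confidence
    are discarded. Overall confidence is the product of the per-letter
    confidences (an assumption made for this sketch).
    """
    top = [
        sorted(options, key=lambda sc: sc[1], reverse=True)[:max_sounds]
        for options in per_letter
    ]
    results = []
    for combo in product(*top):
        confidence = prod(c for _, c in combo)
        if confidence >= min_confidence:
            results.append((tuple(s for s, _ in combo), confidence))
    # Most confident candidates first.
    results.sort(key=lambda r: r[1], reverse=True)
    return results


if __name__ == "__main__":
    # Hypothetical per-letter predictions for the word "cat".
    per_letter = [
        [("K", 0.7), ("S", 0.3)],
        [("AE", 1.0)],
        [("T", 0.9), ("D", 0.1)],
    ]
    for sounds, confidence in candidate_pronunciations(per_letter):
        print(sounds, round(confidence, 2))
```

Raising `max_sounds` or lowering `min_confidence` widens the set of candidate pronunciations, trading precision for recall; the number of combinations grows multiplicatively with word length, so the threshold also acts as a practical bound on work.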

24. A system, comprising:
- means for computing a control data for selectively controlling a text to sound conversion logic;
  
  means for computing a set of sounds from a word; and
  
  means for matching a first set of sounds to a second set of sounds, the first set of sounds being computed from a first word and the second set of sounds being computed from a second word.

25. A set of application programming interfaces embodied on a computer-readable medium for execution by a computer component in conjunction with word matching with context sensitive character to sound correlating, comprising:
- a first interface for communicating a text to sound pronunciation data; and
  
  a second interface for communicating a text to sound conversion rule that is based, at least in part, on the text to sound pronunciation data.

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: Oracle International Corporation (Oracle Corporation)
Inventors: Liao, Ciya; Gandhi, Rikin
Application Number: US11/318,826
Publication Number: US 20070150279A1
US Class Current: 704/258
CPC Class Codes: G10L 13/08 Text analysis or generation...
