Compound word recognition
Abstract
Recognition of a text string is improved by analyzing the text string with respect to information about expected patterns of the parts of speech of words in the text string and by modifying the text string based on the analysis. Analyzing may include comparing the combinations of parts of speech to parts of speech associated with the words in the text string and, if at least one of the combinations of parts of speech matches parts of speech associated with the words, indicating that a compound word should be formed from the words associated with the matched parts of speech.
24 Claims
1. In a system for recognizing the speech in a language, a computer-implemented method for improving recognition of a text string, the text string comprising words associated with parts of speech, the method comprising:
analyzing the text string with respect to information about expected patterns of the parts of speech in the language, the information comprising:
rules descriptive of combinations of parts of speech in the language corresponding to compound words in the language; and
rules descriptive of unpreferred combinations of parts of speech in the language; and
modifying the text string based on the analysis.
comparing the combinations of parts of speech to parts of speech associated with the words in the text string; and
if at least one of the combinations of parts of speech matches parts of speech associated with the words, indicating that a compound word should be formed from the words associated with the matched parts of speech.
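The comparison step recited above can be illustrated with a minimal sketch. It assumes that each word arrives already tagged with a part of speech and that a rule is simply a sequence of tags; the rule set, the tag names, and the function name `find_compound_candidates` are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple

# Hypothetical rule set: each rule is a sequence of part-of-speech tags
# assumed to correspond to a compound word in the language.
COMPOUND_RULES: List[Tuple[str, ...]] = [("NOUN", "NOUN"), ("ADJ", "NOUN")]


def find_compound_candidates(
    tagged: Sequence[Tuple[str, str]],
    rules: Sequence[Tuple[str, ...]] = COMPOUND_RULES,
) -> List[Tuple[int, int]]:
    """Return (start, end) spans whose tags match one of the rules.

    A returned span indicates that a compound word should be formed
    from the words at positions start..end-1.
    """
    tags = [tag for _, tag in tagged]
    spans = []
    for rule in rules:
        n = len(rule)
        # Slide a window of the rule's length over the tag sequence.
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == tuple(rule):
                spans.append((i, i + n))
    return sorted(spans)


# Example: recognizer output "die Haus Tür ist" with assumed tags.
tagged = [("die", "DET"), ("Haus", "NOUN"), ("Tür", "NOUN"), ("ist", "VERB")]
print(find_compound_candidates(tagged))  # [(1, 3)]
```

Here the span (1, 3) marks "Haus" and "Tür" as words from which a compound should be formed.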
4. The method of claim 3, further comprising:
analyzing the text string with respect to rules descriptive of unpreferred combinations of parts of speech in the language corresponding to combinations of words which do not typically form compound words in the language; and
if at least one of the unpreferred combinations of parts of speech matches parts of speech associated with the words, indicating that a compound word should not be formed from the words associated with the matched parts of speech.
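One way to picture how unpreferred combinations could suppress a compound is sketched below. It assumes that both rule types are tag sequences and that a compound candidate is dropped whenever it overlaps a span matched by an unpreferred rule; the overlap policy, the rules, and the function names are illustrative assumptions, not the claimed method itself.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Span = Tuple[int, int]


def match_spans(tags: Sequence[str], rules: Iterable[Sequence[str]]) -> Set[Span]:
    """Return every (start, end) span whose tags equal one of the rules."""
    spans = set()
    for rule in rules:
        n = len(rule)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == tuple(rule):
                spans.add((i, i + n))
    return spans


def allowed_compound_spans(
    tags: Sequence[str],
    compound_rules: Iterable[Sequence[str]],
    unpreferred_rules: Iterable[Sequence[str]],
) -> List[Span]:
    """Keep compound candidates that no unpreferred match overlaps."""
    candidates = match_spans(tags, compound_rules)
    vetoes = match_spans(tags, unpreferred_rules)

    def overlaps(a: Span, b: Span) -> bool:
        return a[0] < b[1] and b[0] < a[1]

    return sorted(c for c in candidates if not any(overlaps(c, v) for v in vetoes))


# Hypothetical rules, chosen only to show the veto taking effect.
compound_rules = [("NOUN", "NOUN"), ("VERB", "NOUN")]
unpreferred_rules = [("PRON", "VERB", "NOUN")]
tags = ["DET", "NOUN", "NOUN", "PRON", "VERB", "NOUN"]
print(allowed_compound_spans(tags, compound_rules, unpreferred_rules))  # [(1, 3)]
```

The VERB-NOUN candidate at (4, 6) is rejected because it falls inside the unpreferred PRON-VERB-NOUN match, while the NOUN-NOUN candidate at (1, 3) survives.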
5. The method of claim 4, further comprising:
analyzing the text string with respect to agreement rules descriptive of patterns of agreement of case, number, and gender of words corresponding to combinations of words which do not typically form compound words in the language; and
if at least one of the agreement rules matches words in the text string, indicating that a compound word should not be formed from the matching words.
6. The method of claim 5, wherein the agreement rules include a rule indicating that if a noun in a subordinate clause matches the case, number, and gender of a preceding determiner, a compound word should not be formed from the noun and subsequent words in the subordinate clause.
7. The method of claim 5, wherein the agreement rules include a rule indicating that if a noun in a non-subordinate clause matches the case, number, and gender of a preceding determiner, a compound word should not be formed from words in the noun phrase containing the noun and words subsequent to the noun phrase.
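The agreement rules of claims 5 to 7 might be approximated as follows. This sketch assumes each token carries case, number, and gender features, and it checks only whether a noun agrees with the nearest preceding determiner; it deliberately ignores the distinction between subordinate and non-subordinate clauses. The `Token` class, the feature values, and `blocks_compound` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Token:
    text: str
    pos: str
    case: Optional[str] = None
    number: Optional[str] = None
    gender: Optional[str] = None


def agrees(det: Token, noun: Token) -> bool:
    """True if case, number, and gender are all known and all equal."""
    noun_features = (noun.case, noun.number, noun.gender)
    det_features = (det.case, det.number, det.gender)
    return None not in noun_features and noun_features == det_features


def blocks_compound(tokens: Sequence[Token], noun_index: int) -> bool:
    """True if the noun agrees with the nearest preceding determiner.

    Under the assumed rule, agreement suggests the noun completes its own
    noun phrase, so it should not be joined with the words that follow.
    """
    noun = tokens[noun_index]
    if noun.pos != "NOUN":
        return False
    for tok in reversed(tokens[:noun_index]):
        if tok.pos == "DET":
            return agrees(tok, noun)
    return False


# "das Kinder Buch": singular determiner, plural noun, so no agreement.
no_agreement = [
    Token("das", "DET", "nom", "sg", "neut"),
    Token("Kinder", "NOUN", "nom", "pl", "neut"),
    Token("Buch", "NOUN", "nom", "sg", "neut"),
]
# "dem Kind Bücher": determiner and noun agree in all three features.
agreement = [
    Token("dem", "DET", "dat", "sg", "neut"),
    Token("Kind", "NOUN", "dat", "sg", "neut"),
    Token("Bücher", "NOUN", "acc", "pl", "neut"),
]
print(blocks_compound(no_agreement, 1))  # False
print(blocks_compound(agreement, 1))     # True
```

In the first example the mismatch in number leaves "Kinder" free to combine with "Buch"; in the second, agreement between "dem" and "Kind" indicates that "Kind" and "Bücher" should stay separate.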
8. The method of claim 3, wherein the unpreferred combinations of parts of speech correspond to combinations of groups of parts of speech, the groups corresponding to phrases.
9. The method of claim 8, wherein groups comprise pairs.
10. The method of claim 3, further comprising:
adding the compound word to a compound word cache.
11. The method of claim 10, wherein adding the compound word to the compound word cache comprises increasing a frequency of the compound word in the compound word cache.
12. The method of claim 3, further comprising:
identifying the compound word as an incorrect compound word; and
adding the compound word to a compound word error cache.
13. The method of claim 12, wherein adding the compound word to the compound word error cache comprises increasing a frequency of the compound word in the compound word error cache.
14. The method of claim 3, further comprising:
if the compound word has been identified as an incorrect compound word, indicating that the compound word should not be formed from the words associated with the matched parts of speech.
15. The method of claim 14, wherein the compound word has been identified as an incorrect compound word in response to action of a user by adding the compound word to a compound word error cache.
16. The method of claim 3, further comprising:
indicating that the compound word should not be formed from the words associated with the matched parts of speech if the compound word has been identified as an incorrect compound word more frequently than the compound word has not been identified to be an incorrect compound word.
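The two caches of claims 10 to 13 and the frequency comparison of claim 16 can be sketched together. The sketch assumes that each cache is a simple frequency table and that the count in the compound word cache stands in for "not identified to be an incorrect compound word"; that reading, and the names `CompoundCaches`, `record_compound`, `record_error`, and `should_form`, are assumptions made for illustration.

```python
from collections import Counter


class CompoundCaches:
    """Frequency tables for accepted compounds and for rejected ones."""

    def __init__(self) -> None:
        self.compounds = Counter()  # compound word cache
        self.errors = Counter()     # compound word error cache

    def record_compound(self, word: str) -> None:
        # Adding a compound increases its frequency in the cache.
        self.compounds[word] += 1

    def record_error(self, word: str) -> None:
        # Called when the compound is identified as incorrect,
        # for example through a user correction.
        self.errors[word] += 1

    def should_form(self, word: str) -> bool:
        # Suppress the compound only if it has been marked incorrect
        # more often than it has been accepted.
        return self.errors[word] <= self.compounds[word]


caches = CompoundCaches()
caches.record_compound("Haustür")
caches.record_compound("Haustür")
caches.record_error("Haustür")
print(caches.should_form("Haustür"))  # True

caches.record_error("Haustür")
caches.record_error("Haustür")
print(caches.should_form("Haustür"))  # False
```

A compound that appears in neither cache is allowed in this sketch, since both counts default to zero.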
17. The method of claim 1, wherein modifying the text string comprises forming a compound word from words in the text string.
18. The method of claim 17, further comprising adding the compound word to a vocabulary.
19. The method of claim 17, wherein modifying the text string comprises replacing words in the text string with the compound word.
20. The method of claim 19, further comprising:
adding the modified text string to a list of candidate text strings.
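Replacing words with a compound and extending the candidate list, as in claims 19 and 20, could look like the sketch below. It assumes a text string is a list of words, joins the parts by plain concatenation with the non-initial parts lowercased, and ignores linking elements such as the German "Fugen-s"; keeping the original string in the list alongside the modified one is also an assumption. The function name `replace_with_compound` is hypothetical.

```python
from typing import List


def replace_with_compound(words: List[str], start: int, end: int) -> List[str]:
    """Return a copy of words with words[start:end] merged into one compound."""
    parts = words[start:end]
    # Keep the first part's capitalization, lowercase the remaining parts.
    compound = parts[0] + "".join(p.lower() for p in parts[1:])
    return words[:start] + [compound] + words[end:]


# List of candidate text strings, starting with the recognizer's hypothesis.
candidates = [["die", "Haus", "Tür", "ist", "offen"]]

modified = replace_with_compound(candidates[0], 1, 3)
if modified not in candidates:
    candidates.append(modified)

print(candidates[-1])  # ['die', 'Haustür', 'ist', 'offen']
```

Both the original and the modified string remain available here, so a later scoring stage could choose between them.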
21. The method of claim 17, further comprising:
adding the compound word to a compound word cache.
22. The method of claim 21, wherein adding the compound word comprises increasing the frequency count of the compound word in the compound word cache.
23. The method of claim 17, further comprising:
adding the compound word to a vocabulary.
24. The method of claim 1, wherein the language comprises German.
Specification