Analyzing inflectional morphology in a spoken language translation system
Abstract
At least one speech input is received, and at least one token is generated from the speech input. Morphemes of each token are reduced to at least one feature, and an inflection type of the token is identified. At least one dictionary is searched for entries comprising features that match the features reduced from the morphemes. At least one lexical feature structure is generated for the token by inserting at least one morphological feature associated with the inflection type into the matching entry feature. An output comprising the at least one lexical feature structure is provided.
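The processing chain in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration, not taken from the patent: the toy `DICTIONARY`, the `ENDINGS` table, and the functions `tokenize`, `analyze`, and `process` are hypothetical; tokenization is plain whitespace splitting of text assumed to be already recognized from speech; and feature structures are modeled as Python dicts.

```python
# Hypothetical toy dictionary: base form -> entry features.
DICTIONARY = {
    "walk": {"root": "walk", "cat": "verb"},
    "book": {"root": "book", "cat": "noun"},
}

# Hypothetical endings table: (ending, category the ending attaches to,
# inflection type, morphological features the ending is reduced to).
ENDINGS = [
    ("ed", "verb", "past-tense", {"tense": "past"}),
    ("s", "noun", "plural", {"number": "plural"}),
]


def tokenize(utterance):
    """Stand-in for tokenizing recognized speech: lowercase, split on whitespace."""
    return utterance.lower().split()


def analyze(token):
    """Return every lexical feature structure found for one token."""
    structures = []
    for ending, cat, inflection_type, features in ENDINGS:
        if not token.endswith(ending):
            continue
        stem = token[: -len(ending)]   # split the token into stem + ending
        entry = DICTIONARY.get(stem)   # search the dictionary for the stem
        if entry is not None and entry["cat"] == cat:
            # Insert the inflection's morphological features into a copy of the entry.
            structures.append({**entry, "inflection": inflection_type, **features})
    if token in DICTIONARY:            # the token may itself be a base form
        structures.append(dict(DICTIONARY[token]))
    return structures


def process(utterance):
    """Tokenize an utterance and pair each token with its feature structures."""
    return [(token, analyze(token)) for token in tokenize(utterance)]
```

With this toy data, `process("Walked books")` pairs `walked` with a verb entry carrying `tense: past` and `books` with a noun entry carrying `number: plural`.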
47 Claims
1. A method for analyzing inflectional morphology, comprising:
receiving at least one speech input;
generating at least one token from the at least one speech input;
reducing at least one morpheme of the at least one token to at least one feature;
identifying an inflection type of the at least one token;
searching at least one dictionary for at least one entry comprising at least one entry feature that matches the at least one feature;
generating at least one lexical feature structure for the at least one token by inserting at least one morphological feature associated with the inflection type into the at least one entry feature; and
outputting the at least one lexical feature structure.
identifying at least one possible split of the at least one token according to at least one list of endings when at least one possible split exists;
determining whether the at least one possible split is valid; and
eliminating invalid splits using at least one morphological rule.
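The split-and-filter steps above can be sketched in Python as follows. The endings list, the toy `LEXICON`, and the category restriction used as the single "morphological rule" are hypothetical choices made for illustration; the claim does not specify which endings or rules are used.

```python
# Hypothetical list of endings; every ending the token ends with is tried.
ENDINGS = ["ing", "est", "ed", "er", "es", "s"]

# Hypothetical lexicon: stem -> grammatical category.
LEXICON = {"box": "noun", "walk": "verb", "fast": "adj", "wall": "noun"}

# Hypothetical morphological rule: the categories each ending may attach to.
ENDING_CATEGORIES = {
    "ing": {"verb"},
    "ed": {"verb"},
    "es": {"noun", "verb"},
    "s": {"noun", "verb"},
    "er": {"adj"},
    "est": {"adj"},
}


def possible_splits(token):
    """Every (stem, ending) split licensed by the endings list; may be empty."""
    return [
        (token[: -len(ending)], ending)
        for ending in ENDINGS
        if token.endswith(ending) and len(token) > len(ending)
    ]


def is_valid(split):
    """A split is valid only if its stem is a known word."""
    stem, _ = split
    return stem in LEXICON


def satisfies_rule(split):
    """The ending must be able to attach to the stem's category."""
    stem, ending = split
    return LEXICON[stem] in ENDING_CATEGORIES[ending]


def analyze_splits(token):
    """Generate candidate splits, keep valid ones, then apply the rule."""
    candidates = possible_splits(token)
    valid = [split for split in candidates if is_valid(split)]
    return [split for split in valid if satisfies_rule(split)]
```

Here `analyze_splits("boxes")` keeps `("box", "es")` and drops `("boxe", "s")` because `boxe` is not a known stem, while `analyze_splits("walker")` finds the split `("walk", "er")` but eliminates it because, under this toy rule, `-er` attaches only to adjectives.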
4. The method of claim 1, further comprising:
analyzing the at least one token using at least one set of analysis rules; and
identifying a root form and a grammatical category of the at least one token.
5. The method of claim 1, wherein the at least one lexical feature structure comprises at least one feature-value pair.
6. The method of claim 1, wherein generating at least one lexical feature structure for the at least one token comprises:
generating multiple feature structures when more than one valid analysis is found for the at least one token; and
generating a feature structure for an unknown word when a valid analysis is not found for the at least one token.
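Both branches of claim 6 (several valid analyses, or none) can be shown with a short hypothetical Python sketch. The toy `LEXICON`, the suffix rules, and the `"unknown"` category label are assumptions; the claim does not say what an unknown-word feature structure contains.

```python
# Hypothetical lexicon: "book" is listed both as a noun and as a verb.
LEXICON = {"book": [{"cat": "noun"}, {"cat": "verb"}]}

# Hypothetical suffix rules: (suffix, category) -> features contributed.
SUFFIX_RULES = {
    ("s", "noun"): {"number": "plural"},
    ("s", "verb"): {"tense": "present", "person": 3, "number": "singular"},
}


def feature_structures(token):
    """Return all analyses of a token, or one unknown-word structure if none."""
    analyses = []
    for (suffix, cat), features in SUFFIX_RULES.items():
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            for entry in LEXICON.get(stem, []):
                if entry["cat"] == cat:
                    analyses.append({"root": stem, **entry, **features})
    for entry in LEXICON.get(token, []):   # uninflected reading(s)
        analyses.append({"root": token, **entry})
    if not analyses:
        # No valid analysis: fall back to a feature structure for an unknown word.
        analyses.append({"root": token, "cat": "unknown"})
    return analyses
```

With this data, `books` yields two structures (plural noun and third-person singular verb), while an unlisted word such as `blorf` yields a single unknown-word structure.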
7. The method of claim 1, wherein the at least one dictionary comprises at least one lexical entry comprising a feature structure format, wherein each lexical entry comprises information on base form and grammatical category.
8. The method of claim 7, wherein each lexical entry further comprises information on at least one of semantic contents, person, number, case, gender, category preferences, and lexical type.
9. The method of claim 1, further comprising discerning three types of lexical entries of the at least one dictionary for inflectional information encoding, wherein a first type of lexical entry does not comprise inflectional information, wherein a second type of lexical entry comprises at least one feature indicative of morphographic changes, wherein a third type of lexical entry comprises irregular inflections.
10. The method of claim 9, wherein default inflectional rules apply to the first type of lexical entry and special inflectional rules apply to the second type of lexical entry.
11. The method of claim 9, wherein irregular inflections are represented with a string-feature slot that comprises a surface form.
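The three entry types of claims 9 to 11 can be illustrated with a hypothetical Python sketch for the English past tense. The feature names `morph` and `irregular`, the two morphographic flags, and the analyze-by-generation strategy in `analyze_past` are assumptions chosen for illustration, not the patent's encoding.

```python
# Type 1: no inflectional information, so default rules apply.
# Type 2: a feature flags a morphographic change, so a special rule applies.
# Type 3: an irregular form is stored verbatim in a string-valued feature slot.
LEXICON = [
    {"root": "walk", "cat": "verb"},                                # type 1
    {"root": "stop", "cat": "verb", "morph": "double-consonant"},   # type 2
    {"root": "carry", "cat": "verb", "morph": "y-to-i"},            # type 2
    {"root": "go", "cat": "verb", "irregular": {"past": "went"}},   # type 3
]


def past_tense(entry):
    """Produce the past-tense surface form of an entry."""
    root = entry["root"]
    irregular = entry.get("irregular", {})
    if "past" in irregular:                 # type 3: surface form is listed
        return irregular["past"]
    morph = entry.get("morph")
    if morph == "double-consonant":         # type 2: stop -> stopped
        return root + root[-1] + "ed"
    if morph == "y-to-i":                   # type 2: carry -> carried
        return root[:-1] + "ied"
    return root + "ed"                      # type 1: default rule, walk -> walked


def analyze_past(token):
    """Find entries whose past-tense form equals the token."""
    return [
        {"root": e["root"], "cat": e["cat"], "tense": "past"}
        for e in LEXICON
        if past_tense(e) == token
    ]
```

Generating forms and comparing them with the token keeps the sketch short; for a large dictionary, an analyzer might instead precompute an index from surface forms to entries.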
12. The method of claim 1, wherein the inflection type is selected from the group comprising past-tense form of verbs, past participle form of verbs, present participle form of verbs, present-tense form of verbs, present-tense first-person singular form of verbs, present-tense third-person singular form of verbs, past-tense first-person singular form of verbs, past-tense third-person singular form of verbs, comparative form of adjectives, comparative form of adverbs, superlative form of adjectives, superlative form of adverbs, adverbial form of adjectives, plural form of nouns, and genitive form of nouns.
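The step in claim 1 of inserting the morphological features associated with the inflection type can be pictured as a lookup from inflection type to a small set of features, merged into the dictionary entry. The Python sketch below is hypothetical: it covers only a subset of the inflection types listed in claim 12, the feature names are invented for illustration, and the clash check is a simplified, flat stand-in for feature-structure unification.

```python
# Hypothetical features for a subset of the claimed inflection types.
INFLECTION_FEATURES = {
    "past-tense-verb": {"tense": "past"},
    "past-participle-verb": {"vform": "past-participle"},
    "present-participle-verb": {"vform": "present-participle"},
    "present-3sg-verb": {"tense": "present", "person": 3, "number": "singular"},
    "comparative-adjective": {"degree": "comparative"},
    "superlative-adjective": {"degree": "superlative"},
    "plural-noun": {"number": "plural"},
    "genitive-noun": {"case": "genitive"},
}


def insert_features(entry, inflection_type):
    """Copy the entry and add the inflection's features; None if a value clashes."""
    result = dict(entry)
    for feature, value in INFLECTION_FEATURES[inflection_type].items():
        if feature in result and result[feature] != value:
            return None  # e.g. a singular-only noun cannot take plural features
        result[feature] = value
    return result
```

For example, inserting `plural-noun` into `{"root": "book", "cat": "noun"}` adds `number: plural`, whereas the same insertion into an entry already marked `number: singular` is rejected.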
13. The method of claim 1, wherein generating at least one token from the at least one speech input comprises:
applying at least one set of tokenization rules; and
breaking a sequence of words into individual words using the tokenization rules.
14. The method of claim 13, wherein the sequence of words is broken at sequence locations selected from the group comprising a space character, an apostrophe plus a space character, an apostrophe plus a character “s”, an apostrophe plus a character sequence “re”, an apostrophe plus a character “d”, an apostrophe plus a character sequence “ve”, an apostrophe plus a character sequence “ll”, a period marking an end of sentence, a question mark, an exclamation mark, a comma between characters, a dollar sign, a percent sign, a plus sign, a minus sign, a semicolon, and a colon.
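The break points listed in claim 14 can be approximated with a few regular-expression passes in Python. This is a hypothetical sketch: the `tokenize` name and the order of the passes are assumptions, and it makes two simplifications, namely that every comma is split off and that any period followed by whitespace or the end of the input is treated as sentence-final (so an abbreviation such as "Dr." would also be split).

```python
import re


def tokenize(text):
    """Split text at the break points enumerated in claim 14 (approximation)."""
    # Clitics 's, 're, 'd, 've and 'll become separate tokens: we're -> we 're
    text = re.sub(r"'(s|re|d|ve|ll)\b", r" '\1", text)
    # An apostrophe followed by a space or the end of input: dogs' -> dogs '
    text = re.sub(r"'(?=\s|$)", " '", text)
    # Single-character separators: ? ! , $ % + - ; :
    text = re.sub(r"([?!,$%+\-;:])", r" \1 ", text)
    # A period before whitespace or end of input is treated as sentence-final.
    text = re.sub(r"\.(?=\s|$)", " .", text)
    # Finally break at runs of whitespace.
    return text.split()
```

For instance, `tokenize("It's done.")` returns `["It", "'s", "done", "."]`, and the decimal point in `$5.50` is left inside the number because it is not followed by whitespace.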
15. The method of claim 1, wherein the inflectional morphology analysis is performed by a spoken language translation system comprising at least one analog-to-digital converter, at least one digital-to-analog converter, at least one amplifier, at least one output device selected from the group comprising at least one speaker and at least one display device, and at least one input device selected from the group comprising at least one microphone, at least one keyboard, at least one cursor, and at least one touch-sensitive screen.
16. The method of claim 1, wherein the at least one lexical feature structure comprises a linguistic data structure comprising feature-value pairs for entities selected from the group comprising strings, symbols, and numbers.
17. An apparatus for analyzing inflectional morphology comprising:
at least one processor;
an input coupled to the at least one processor, the input capable of receiving at least one speech input, the at least one processor having circuitry configured to:
generate at least one token from the at least one speech input;
reduce at least one morpheme of the at least one token to at least one feature;
identify an inflection type of the at least one token;
search at least one dictionary for at least one entry comprising at least one entry feature that matches the at least one feature;
generate at least one lexical feature structure for the at least one token by inserting at least one morphological feature associated with the inflection type into the at least one entry feature;
an output coupled to the at least one processor, the output capable of providing the at least one lexical feature structure.
identify at least one possible split of the at least one token according to at least one list of endings when at least one possible split exists;
determine whether the at least one possible split is valid; and
eliminate invalid splits using at least one morphological rule.
20. The apparatus of claim 17, wherein the at least one processor has circuitry configured to:
analyze the at least one token using at least one set of analysis rules; and
identify a root form and a grammatical category of the at least one token.
21. The apparatus of claim 17, wherein the at least one lexical feature structure comprises at least one feature-value pair.
22. The apparatus of claim 17, wherein the circuitry configured to generate at least one lexical feature structure for the at least one token comprises circuitry configured to:
generate multiple feature structures when more than one valid analysis is found for the at least one token; and
generate a feature structure for an unknown word when a valid analysis is not found for the at least one token.
23. The apparatus of claim 17, wherein the processor has circuitry configured to analyze by forming the at least one dictionary, wherein the at least one dictionary comprises at least one lexical entry comprising a feature structure format, wherein each lexical entry comprises information selected from the group comprising base form and grammatical category, semantic contents, person, number, case, gender, category preferences, and lexical type.
24. The apparatus of claim 17, wherein the at least one lexical entry further comprises information on at least one of semantic contents, person, number, case, gender, category preferences, and lexical type.
25. The apparatus of claim 17, wherein the processor has circuitry configured to analyze by discerning three types of lexical entries of the at least one dictionary for inflectional information encoding, wherein a first type of lexical entry does not comprise inflectional information, wherein a second type of lexical entry comprises at least one feature indicative of morphographic changes, wherein a third type of lexical entry comprises irregular inflections.
26. The apparatus of claim 25, wherein default inflectional rules apply to the first type of lexical entry and special inflectional rules apply to the second type of lexical entry.
27. The apparatus of claim 25, wherein irregular inflections are represented with a string-feature slot that comprises a surface form.
28. The apparatus of claim 17, wherein the inflection type is selected from the group comprising past-tense form of verbs, past participle form of verbs, present participle form of verbs, present-tense form of verbs, present-tense first-person singular form of verbs, present-tense third-person singular form of verbs, past-tense first-person singular form of verbs, past-tense third-person singular form of verbs, comparative form of adjectives, comparative form of adverbs, superlative form of adjectives, superlative form of adverbs, adverbial form of adjectives, plural form of nouns, and genitive form of nouns.
29. The apparatus of claim 17, wherein generating at least one token from the at least one speech input comprises:
applying at least one set of tokenization rules; and
breaking a sequence of words into individual words using the tokenization rules.
30. The apparatus of claim 29, wherein the sequence of words is broken at sequence locations selected from the group comprising a space character, an apostrophe plus a space character, an apostrophe plus a character “s”, an apostrophe plus a character sequence “re”, an apostrophe plus a character “d”, an apostrophe plus a character sequence “ve”, an apostrophe plus a character sequence “ll”, a period marking an end of sentence, a question mark, an exclamation mark, a comma between characters, a dollar sign, a percent sign, a plus sign, a minus sign, a semicolon, and a colon.
31. The apparatus of claim 17, wherein the inflectional morphology analysis is performed by a spoken language translation system comprising at least one analog-to-digital converter, at least one digital-to-analog converter, at least one amplifier, at least one output device selected from the group comprising at least one speaker and at least one display device, and at least one input device selected from the group comprising at least one microphone, at least one keyboard, at least one cursor, and at least one touch-sensitive screen.
32. The apparatus of claim 17, wherein the at least one lexical feature structure comprises a linguistic data structure comprising feature-value pairs for entities selected from the group comprising strings, symbols, and numbers.
33. A computer readable medium containing executable instructions which, when executed in a processing system, cause the system to perform a method for analyzing inflectional morphology, the method comprising:
receiving at least one speech input;
generating at least one token from the at least one speech input;
reducing at least one morpheme of the at least one token to at least one feature;
identifying an inflection type of the at least one token;
searching at least one dictionary for at least one entry comprising at least one entry feature that matches the at least one feature;
generating at least one lexical feature structure for the at least one token by inserting at least one morphological feature associated with the inflection type into the at least one entry feature; and
outputting the at least one lexical feature structure.
identifying at least one possible split of the at least one token according to at least one list of endings;
determining whether the at least one possible split is valid; and
eliminating invalid splits using at least one morphological rule.
36. The computer readable medium of claim 33, wherein the method further comprises:
analyzing the at least one token using at least one set of analysis rules; and
identifying a root form and a grammatical category of the at least one token.
37. The computer readable medium of claim 33, wherein the at least one lexical feature structure comprises at least one feature-value pair.
38. The computer readable medium of claim 33, wherein generating at least one lexical feature structure for the at least one token comprises:
generating multiple feature structures when more than one valid analysis is found for the at least one token; and
generating a feature structure for an unknown word when a valid analysis is not found for the at least one token.
39. The computer readable medium of claim 33, wherein the method further comprises forming the at least one dictionary, wherein the at least one dictionary comprises at least one lexical entry comprising a feature structure format, wherein each lexical entry comprises information selected from the group comprising information on base form, grammatical category, semantic contents, person, number, case, gender, category preferences, and lexical type.
40. The computer readable medium of claim 33, wherein the method further comprises discerning three types of lexical entries of the at least one dictionary for inflectional information encoding, wherein a first type of lexical entry does not comprise inflectional information, wherein a second type of lexical entry comprises at least one feature indicative of morphographic changes, wherein a third type of lexical entry comprises irregular inflections.
41. The computer readable medium of claim 40, wherein default inflectional rules apply to the first type of lexical entry and special inflectional rules apply to the second type of lexical entry.
42. The computer readable medium of claim 40, wherein irregular inflections are represented with a string-feature slot that comprises a surface form.
43. The computer readable medium of claim 33, wherein the inflection type is selected from the group comprising past-tense form of verbs, past participle form of verbs, present participle form of verbs, present-tense form of verbs, present-tense first-person singular form of verbs, present-tense third-person singular form of verbs, past-tense first-person singular form of verbs, past-tense third-person singular form of verbs, comparative form of adjectives, comparative form of adverbs, superlative form of adjectives, superlative form of adverbs, adverbial form of adjectives, plural form of nouns, and genitive form of nouns.
44. The computer readable medium of claim 33, wherein generating at least one token from the at least one speech input comprises:
applying at least one set of tokenization rules; and
breaking a sequence of words into individual words using the tokenization rules.
45. The computer readable medium of claim 44, wherein the sequence of words is broken at sequence locations selected from the group comprising a space character, an apostrophe plus a space character, an apostrophe plus a character “s”, an apostrophe plus a character sequence “re”, an apostrophe plus a character “d”, an apostrophe plus a character sequence “ve”, an apostrophe plus a character sequence “ll”, a period marking an end of sentence, a question mark, an exclamation mark, a comma between characters, a dollar sign, a percent sign, a plus sign, a minus sign, a semicolon, and a colon.
46. The computer readable medium of claim 33, wherein the inflectional morphology analysis is performed by a spoken language translation system comprising at least one analog-to-digital converter, at least one digital-to-analog converter, at least one amplifier, at least one output device selected from the group comprising at least one speaker and at least one display device, and at least one input device selected from the group comprising at least one microphone, at least one keyboard, at least one cursor, and at least one touch-sensitive screen.
47. The computer readable medium of claim 33, wherein the at least one lexical feature structure comprises a linguistic data structure comprising feature-value pairs for entities selected from the group comprising strings, symbols, and numbers.
Specification