Analyzing inflectional morphology in a spoken language translation system

US 6,442,524 B1
Filed: 01/29/1999
Issued: 08/27/2002
Est. Priority Date: 01/29/1999
Status: Expired due to Fees

Abstract

At least one speech input is received and at least one token is generated from the speech input. Morphemes of the tokens are reduced to at least one feature. Furthermore, an inflection type of the token is identified. At least one dictionary is searched for entries comprising features that match the features reduced from the morphemes. At least one lexical feature structure is generated for the token by inserting at least one morphological feature associated with the inflection type into the entry feature. An output is provided comprising at least one lexical feature structure.
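The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the toy dictionary, the suffix table, the feature names, and the function `analyze` are all hypothetical, and plain suffix stripping stands in for the morpheme reduction step.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary: base form -> entry features.
DICTIONARY: Dict[str, Dict[str, str]] = {
    "walk": {"root": "walk", "cat": "verb"},
    "book": {"root": "book", "cat": "noun"},
}

# Hypothetical suffix table:
# (suffix, inflection type, required category, morphological features).
SUFFIXES: List[Tuple[str, str, str, Dict[str, str]]] = [
    ("ed", "past-tense", "verb", {"tense": "past"}),
    ("ing", "present-participle", "verb", {"form": "present-participle"}),
    ("s", "plural", "noun", {"number": "plural"}),
]


def analyze(token: str) -> Optional[Dict[str, str]]:
    """Return a lexical feature structure for one token, or None."""
    word = token.lower()

    # Uninflected form: the dictionary entry is the whole analysis.
    entry = DICTIONARY.get(word)
    if entry is not None:
        return dict(entry)

    for suffix, inflection, category, features in SUFFIXES:
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]            # reduce the token to a candidate base form
        entry = DICTIONARY.get(stem)           # search the dictionary for a matching entry
        if entry is not None and entry["cat"] == category:
            structure = dict(entry)            # copy the entry features
            structure["inflection"] = inflection
            structure.update(features)         # insert the morphological features
            return structure
    return None
```

Under these assumptions, `analyze("walked")` returns the entry for `walk` extended with `inflection` and `tense` features, while a token with no matching entry yields `None`.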

47 Claims

1. A method for analyzing inflectional morphology, comprising:
- receiving at least one speech input;
- generating at least one token from the at least one speech input;
- reducing at least one morpheme of the at least one token to at least one feature;
- identifying an inflection type of the at least one token;
- searching at least one dictionary for at least one entry comprising at least one entry feature that matches the at least one feature;
- generating at least one lexical feature structure for the at least one token by inserting at least one morphological feature associated with the inflection type into the at least one entry feature; and
- outputting the at least one lexical feature structure.
2. The method of claim 1, wherein generating at least one lexical feature structure comprises sequentially applying morphological rules to the at least one entry feature to assign morphological information to corresponding features, the morphological rules being rules selected from the group comprising rules for verbs, rules for nouns, rules for adjectives, and rules for adverbs.
3. The method of claim 1, wherein generating at least one lexical feature structure further comprises:
4. The method of claim 1, further comprising:
- analyzing the at least one token using at least one set of analysis rules; and
- identifying a root form and a grammatical category of the at least one token.
5. The method of claim 1, wherein the at least one lexical feature structure comprises at least one feature-value pair.
6. The method of claim 1, wherein generating at least one lexical feature structure for the at least one token comprises:
- generating multiple feature structures when more than one valid analysis is found for the at least one token; and
- generating a feature structure for an unknown word when a valid analysis is not found for the at least one token.
7. The method of claim 1, wherein the at least one dictionary comprises at least one lexical entry comprising a feature structure format, wherein each lexical entry comprises information on base form and grammatical category.
8. The method of claim 7, wherein each lexical entry further comprises information on at least one of semantic contents, person, number, case, gender, category preferences, and lexical type.
9. The method of claim 1, further comprising discerning three types of lexical entries of the at least one dictionary for inflectional information encoding, wherein a first type of lexical entry does not comprise inflectional information, wherein a second type of lexical entry comprises at least one feature indicative of morphographic changes, wherein a third type of lexical entry comprises irregular inflections.
10. The method of claim 9, wherein default inflectional rules apply to the first type of lexical entry and special inflectional rules apply to the second type of lexical entry.
11. The method of claim 9, wherein irregular inflections are represented with a string-feature slot that comprises a surface form.
12. The method of claim 1, wherein the inflection type is selected from the group comprising past-tense form of verbs, past participle form of verbs, present participle form of verbs, present-tense form of verbs, present-tense first-person singular form of verbs, present-tense third-person singular form of verbs, past-tense first-person singular form of verbs, past-tense third-person singular form of verbs, comparative form of adjectives, comparative form of adverbs, superlative form of adjectives, superlative form of adverbs, adverbial form of adjectives, plural form of nouns, and genitive form of nouns.
13. The method of claim 1, wherein generating at least one token from the at least one speech input comprises:
- applying at least one set of tokenization rules; and
- breaking a sequence of words into individual words using the tokenization rules.
14. The method of claim 13, wherein the sequence of words are broken at sequence locations selected from the group comprising a space character, an apostrophe plus a space character, an apostrophe plus a character “s”, an apostrophe plus a character sequence “re”, an apostrophe plus a character “d”, an apostrophe plus a character sequence “ve”, an apostrophe plus a character sequence “ll”, a period marking an end of sentence, a question mark, an exclamation mark, a comma between characters, a dollar sign, a percent sign, a plus sign, a minus sign, a semicolon, and a colon.
15. The method of claim 1, wherein the inflectional morphology analysis is performed by a spoken language translation system comprising at least one analog-to-digital converter, at least one digital-to-analog converter, at least one amplifier, at least one output device selected from the group comprising at least one speaker and at least one display device, and at least one input device selected from the group comprising at least one microphone, at least one keyboard, at least one cursor, and at least one touch-sensitive screen.
16. The method of claim 1, wherein the at least one lexical feature structure comprises a linguistic data structure comprising feature-value pairs for entities selected from the group comprising strings, symbols, and numbers.
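The break locations listed in claim 14 can be approximated with a single regular expression. The sketch below is an assumption-laden illustration rather than the patent's tokenization rules: the names `TOKEN_PATTERN` and `tokenize` are hypothetical, break characters are kept as separate tokens (the claim does not say what happens to them), and every period and comma is treated as a break without checking sentence position or surrounding characters.

```python
import re
from typing import List

# One alternative per kind of token; the first alternative that matches wins.
TOKEN_PATTERN = re.compile(
    r"""
    '(?:s|re|d|ve|ll)\b    # clitics split off at the apostrophe
  | [A-Za-z0-9]+           # words and digit runs
  | [.?!,$%+\-;:']         # break characters kept as separate tokens
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> List[str]:
    """Break a sequence of words into tokens; spaces separate and are dropped."""
    return TOKEN_PATTERN.findall(text)
```

For example, `tokenize("She'll say we're late; it's 50% off!")` splits the clitics from their hosts and isolates the semicolon, percent sign, and exclamation mark.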

17. An apparatus for analyzing inflectional morphology comprising:
- at least one processor;
- an input coupled to the at least one processor, the input capable of receiving at least one speech input, the at least one processor having circuitry configured to:
  - generate at least one token from the at least one speech input;
  - reduce at least one morpheme of the at least one token to at least one feature;
  - identify an inflection type of the at least one token;
  - search at least one dictionary for at least one entry comprising at least one entry feature that matches the at least one feature;
  - generate at least one lexical feature structure for the at least one token by inserting at least one morphological feature associated with the inflection type into the at least one entry feature; and
- an output coupled to the at least one processor, the output capable of providing the at least one lexical feature structure.
18. The apparatus of claim 17, wherein the circuitry configured to generate at least one lexical feature structure comprises circuitry configured to sequentially apply morphological rules to the at least one entry feature to assign morphological information to corresponding features, the morphological rules being rules selected from the group comprising rules for verbs, rules for nouns, rules for adjectives, and rules for adverbs.
19. The apparatus of claim 17, wherein the circuitry configured to generate at least one lexical feature structure further comprises circuitry configured to:
20. The apparatus of claim 17, wherein the at least one processor has circuitry configured to:
- analyze the at least one token using at least one set of analysis rules; and
- identify a root form and a grammatical category of the at least one token.
21. The apparatus of claim 17, wherein the at least one lexical feature structure comprises at least one feature-value pair.
22. The apparatus of claim 17, wherein the circuitry configured to generate at least one lexical feature structure for the at least one token comprises circuitry configured to:
- generate multiple feature structures when more than one valid analysis is found for the at least one token; and
- generate a feature structure for an unknown word when a valid analysis is not found for the at least one token.
23. The apparatus of claim 17, wherein the processor has circuitry configured to analyze by forming the at least one dictionary, wherein the at least one dictionary comprises at least one lexical entry comprising a feature structure format, wherein each lexical entry comprises information selected from the group comprising base form and grammatical category, semantic contents, person, number, case, gender, category preferences, and lexical type.
24. The apparatus of claim 17, wherein the at least one lexical entry further comprises information on at least one of semantic contents, person, number, case, gender, category preferences, and lexical type.
25. The apparatus of claim 17, wherein the processor has circuitry configured to analyze by discerning three types of lexical entries of the at least one dictionary for inflectional information encoding, wherein a first type of lexical entry does not comprise inflectional information, wherein a second type of lexical entry comprises at least one feature indicative of morphographic changes, wherein a third type of lexical entry comprises irregular inflections.
26. The apparatus of claim 25, wherein default inflectional rules apply to the first type of lexical entry and special inflectional rules apply to the second type of lexical entry.
27. The apparatus of claim 25, wherein irregular inflections are represented with a string-feature slot that comprises a surface form.
28. The apparatus of claim 17, wherein the inflection type is selected from the group comprising past-tense form of verbs, past participle form of verbs, present participle form of verbs, present-tense form of verbs, present-tense first-person singular form of verbs, present-tense third-person singular form of verbs, past-tense first-person singular form of verbs, past-tense third-person singular form of verbs, comparative form of adjectives, comparative form of adverbs, superlative form of adjectives, superlative form of adverbs, adverbial form of adjectives, plural form of nouns, and genitive form of nouns.
29. The apparatus of claim 17, wherein generating at least one token from the at least one speech input comprises:
- applying at least one set of tokenization rules; and
- breaking a sequence of words into individual words using the tokenization rules.
30. The apparatus of claim 29, wherein the sequence of words are broken at sequence locations selected from the group comprising a space character, an apostrophe plus a space character, an apostrophe plus a character “s”, an apostrophe plus a character sequence “re”, an apostrophe plus a character “d”, an apostrophe plus a character sequence “ve”, an apostrophe plus a character sequence “ll”, a period marking an end of sentence, a question mark, an exclamation mark, a comma between characters, a dollar sign, a percent sign, a plus sign, a minus sign, a semicolon, and a colon.
31. The apparatus of claim 17, wherein the inflectional morphology analysis is performed by a spoken language translation system comprising at least one analog-to-digital converter, at least one digital-to-analog converter, at least one amplifier, at least one output device selected from the group comprising at least one speaker and at least one display device, and at least one input device selected from the group comprising at least one microphone, at least one keyboard, at least one cursor, and at least one touch-sensitive screen.
32. The apparatus of claim 17, wherein the at least one lexical feature structure comprises a linguistic data structure comprising feature-value pairs for entities selected from the group comprising strings, symbols, and numbers.
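Claims 25 to 27 (like claims 9 to 11) distinguish three types of lexical entries: entries without inflectional information, entries with a feature indicating a morphographic change, and entries with irregular inflections stored as a surface string. A minimal sketch for English past-tense verbs follows. The lexicon, the feature names `double-final` and `past-form`, and the function `past_tense_root` are hypothetical, and the approach of generating the expected form and comparing it to the input is a simplification chosen for brevity.

```python
from typing import Dict, Optional

# Hypothetical lexicon with one entry of each type.
LEXICON: Dict[str, Dict[str, object]] = {
    "walk": {"cat": "verb"},                         # type 1: no inflectional information
    "stop": {"cat": "verb", "double-final": True},   # type 2: feature flags a spelling change
    "go": {"cat": "verb", "past-form": "went"},      # type 3: irregular surface form as a string
}


def past_tense_root(surface: str) -> Optional[str]:
    """Return the root whose past-tense form equals `surface`, or None."""
    for root, entry in LEXICON.items():
        if "past-form" in entry:
            expected = entry["past-form"]            # irregular: read the stored surface form
        elif entry.get("double-final"):
            expected = root + root[-1] + "ed"        # special rule: double the final consonant
        else:
            expected = root + "ed"                   # default rule: append "ed"
        if surface == expected:
            return root
    return None
```

Here the default rule covers `walked`, the special rule covers `stopped`, and the stored string covers `went`; an overregularized form such as `goed` matches no entry.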

33. A computer readable medium containing executable instructions which, when executed in a processing system, cause the system to perform a method for analyzing inflectional morphology, the method comprising:
- receiving at least one speech input;
- generating at least one token from the at least one speech input;
- reducing at least one morpheme of the at least one token to at least one feature;
- identifying an inflection type of the at least one token;
- searching at least one dictionary for at least one entry comprising at least one entry feature that matches the at least one feature;
- generating at least one lexical feature structure for the at least one token by inserting at least one morphological feature associated with the inflection type into the at least one entry feature; and
- outputting the at least one lexical feature structure.
34. The computer readable medium of claim 33, wherein generating at least one lexical feature structure comprises sequentially applying morphological rules to the at least one entry feature to assign morphological information to corresponding features, the morphological rules being rules selected from the group comprising rules for verbs, rules for nouns, rules for adjectives, and rules for adverbs.
35. The computer readable medium of claim 33, wherein generating at least one lexical feature structure further comprises:
36. The computer readable medium of claim 33, wherein the method further comprises:
- analyzing the at least one token using at least one set of analysis rules; and
- identifying a root form and a grammatical category of the at least one token.
37. The computer readable medium of claim 33, wherein the at least one lexical feature structure comprises at least one feature-value pair.
38. The computer readable medium of claim 33, wherein generating at least one lexical feature structure for the at least one token comprises:
- generating multiple feature structures when more than one valid analysis is found for the at least one token; and
- generating a feature structure for an unknown word when a valid analysis is not found for the at least one token.
39. The computer readable medium of claim 33, wherein the method further comprises forming the at least one dictionary, wherein the at least one dictionary comprises at least one lexical entry comprising a feature structure format, wherein each lexical entry comprises information selected from the group comprising information on base form, grammatical category, semantic contents, person, number, case, gender, category preferences, and lexical type.
40. The computer readable medium of claim 33, wherein the method further comprises discerning three types of lexical entries of the at least one dictionary for inflectional information encoding, wherein a first type of lexical entry does not comprise inflectional information, wherein a second type of lexical entry comprises at least one feature indicative of morphographic changes, wherein a third type of lexical entry comprises irregular inflections.
41. The computer readable medium of claim 40, wherein default inflectional rules apply to the first type of lexical entry and special inflectional rules apply to the second type of lexical entry.
42. The computer readable medium of claim 40, wherein irregular inflections are represented with a string-feature slot that comprises a surface form.
43. The computer readable medium of claim 33, wherein the inflection type is selected from the group comprising past-tense form of verbs, past participle form of verbs, present participle form of verbs, present-tense form of verbs, present-tense first-person singular form of verbs, present-tense third-person singular form of verbs, past-tense first-person singular form of verbs, past-tense third-person singular form of verbs, comparative form of adjectives, comparative form of adverbs, superlative form of adjectives, superlative form of adverbs, adverbial form of adjectives, plural form of nouns, and genitive form of nouns.
44. The computer readable medium of claim 33, wherein generating at least one token from the at least one speech input comprises:
- applying at least one set of tokenization rules; and
- breaking a sequence of words into individual words using the tokenization rules.
45. The computer readable medium of claim 44, wherein the sequence of words are broken at sequence locations selected from the group comprising a space character, an apostrophe plus a space character, an apostrophe plus a character “s”, an apostrophe plus a character sequence “re”, an apostrophe plus a character “d”, an apostrophe plus a character sequence “ve”, an apostrophe plus a character sequence “ll”, a period marking an end of sentence, a question mark, an exclamation mark, a comma between characters, a dollar sign, a percent sign, a plus sign, a minus sign, a semicolon, and a colon.
46. The computer readable medium of claim 33, wherein the inflectional morphology analysis is performed by a spoken language translation system comprising at least one analog-to-digital converter, at least one digital-to-analog converter, at least one amplifier, at least one output device selected from the group comprising at least one speaker and at least one display device, and at least one input device selected from the group comprising at least one microphone, at least one keyboard, at least one cursor, and at least one touch-sensitive screen.
47. The computer readable medium of claim 33, wherein the at least one lexical feature structure comprises a linguistic data structure comprising feature-value pairs for entities selected from the group comprising strings, symbols, and numbers.
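Claims 38 and 47 (like claims 6 and 16) describe returning several feature structures for an ambiguous token, a fallback structure for an unknown word, and feature-value pairs over strings, symbols, and numbers. The sketch below assumes a hypothetical precomputed table `ANALYSES` and a hypothetical `cat: unknown` convention for the fallback. Python has no separate symbol type, so symbols and strings are both represented as `str` here.

```python
from typing import Dict, List, Union

FeatureValue = Union[str, int]             # strings/symbols and numbers
FeatureStructure = Dict[str, FeatureValue]

# Hypothetical table of valid analyses per surface form.
ANALYSES: Dict[str, List[FeatureStructure]] = {
    "books": [
        {"root": "book", "cat": "noun", "number": "plural"},
        {"root": "book", "cat": "verb", "tense": "present",
         "person": 3, "number": "singular"},
    ],
}


def feature_structures(token: str) -> List[FeatureStructure]:
    """Return every valid analysis, or one unknown-word structure."""
    found = ANALYSES.get(token.lower())
    if found:
        # More than one valid analysis: return a copy of each structure.
        return [dict(structure) for structure in found]
    # No valid analysis: emit a structure marking the word as unknown.
    return [{"root": token, "cat": "unknown"}]
```

Returning a list in both cases lets a downstream parser handle ambiguous and unknown tokens through the same interface.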

Current Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Inventors: Duan, Lei; Franz, Alexander M.; Horiguchi, Keiko; Ecker, Doris M.
Primary Examiner(s): Knepper, David D.
Application Number: US09/239,642
Time in Patent Office: 1,306 Days
Field of Search: 704/277, 704/257, 704/9, 704/3
US Class Current: 704/277
CPC Class Codes: G06F 40/268 Morphological analysis; G10L 15/18 using natural language mode...
