Method for goal-oriented speech translation in hand-held devices using meaning extraction and dialogue
Abstract
A computer-implemented method and apparatus are provided for processing a spoken request from a user. A speech recognizer converts the spoken request into a digital format. A frame data structure associates semantic components of the digitized spoken request with predetermined slots. The slots are indicative of data which are used to achieve a predetermined goal. A speech understanding module which is connected to the speech recognizer and to the frame data structure determines semantic components of the spoken request. The slots are populated based upon the determined semantic components. A dialog manager which is connected to the speech understanding module may determine at least one slot which is unpopulated based upon the determined semantic components and in a preferred embodiment may provide confirmation of the populated slots. A computer-generated request is formulated in order for the user to provide data related to the unpopulated slot. The method and apparatus are well-suited (but not limited) to use in a hand-held speech translation device.
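The slot-filling loop the abstract describes can be sketched in a few lines. All names here (`GOAL_SLOTS`, `understand`, `dialog_manager`) and the keyword matching are illustrative assumptions, not details taken from the patent; a real system would put a speech recognizer and understanding module in place of the toy keyword lookup.

```python
# Illustrative sketch of the frame/slot dialogue described in the abstract.
# GOAL_SLOTS, understand(), and dialog_manager() are hypothetical names;
# the keyword matching stands in for a real speech understanding module.

GOAL_SLOTS = {"hotel_reservation": ["city", "check_in", "nights"]}

def understand(utterance):
    """Toy speech understanding: map recognized keywords to slot values."""
    semantics = {}
    words = utterance.lower().split()
    if "tokyo" in words:
        semantics["city"] = "Tokyo"
    if "tonight" in words:
        semantics["check_in"] = "today"
    return semantics

def dialog_manager(frame, goal="hotel_reservation"):
    """Return the first unpopulated slot, or None when the goal is satisfied."""
    for slot in GOAL_SLOTS[goal]:
        if slot not in frame:
            return slot
    return None

frame = {}
frame.update(understand("a hotel in Tokyo tonight"))
missing = dialog_manager(frame)              # "nights" remains unpopulated
prompt = f"Please tell me the {missing}."    # would be translated and vocalized
```

When `dialog_manager` returns a slot name, the device would translate and vocalize the clarifying prompt; when it returns `None`, all slots are populated and the goal-directed activity (e.g. a reservation) can proceed.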
175 Citations
31 Claims
1. An apparatus for performing spoken translation in processing a spoken utterance from a user, comprising:

a speech recognizer for converting said spoken utterance into a digital format;
a speech understanding module connected to said speech recognizer for determining semantic components of said spoken utterance;
a dialogue manager connected to said speech understanding module for determining a condition of insufficient semantic information existing within said spoken utterance based upon said determined semantic components; and
a speech translation module for generating a translation related to said insufficient semantic information, said generated translation being provided to said user in order for said user to utter to said speech recognizer a response related to said insufficient semantic information.

2. The apparatus of claim 1 further comprising:

a data structure for associating semantic components of said digitized spoken utterance with attributes indicative of a predetermined goal.
3. The apparatus of claim 2 further comprising:

a frame data structure for associating semantic components of said digitized spoken utterance with predetermined slots, said slots being indicative of data used to achieve a predetermined goal, said slots being populated based upon said determined semantic components by said speech understanding module.

4. The apparatus of claim 3 wherein said speech recognizer converts said response from said user into a digital format,
said speech understanding module determining semantic components of said response in order to populate said frame data structure with information related to said insufficient semantic information.

5. The apparatus of claim 4 wherein said dialogue manager determines that sufficient semantic information exists and performs at least one computer-implemented activity related to said predetermined goal.
6. The apparatus of claim 5 wherein said computer-implemented activity is selected from the group consisting of performing hotel reservations via a remote database, purchasing a piece of merchandise via a remote database, performing location directory assistance via a remote database, exchanging money via a remote database, and combinations thereof.
7. The apparatus of claim 5 wherein said spoken utterance is spoken in a first language, said speech translation module generating a second translation in a second language based upon said determined semantic components, said computer-implemented activity including vocalizing said generated second translation.

8. The apparatus of claim 3 wherein said dialogue manager determines said condition of insufficient semantic information due to at least one of said slots being unpopulated.

9. The apparatus of claim 1 wherein said dialogue manager determines said condition of insufficient semantic information due to input to said speech recognizer from said user being insufficient with respect to a semantic level.
10. The apparatus of claim 9 wherein said dialogue manager determines said condition of insufficient semantic information due to input to said speech recognizer from said user being insufficient with respect to a pragmatic level.
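The distinction drawn in claims 9 and 10 can be illustrated concretely: information may be missing at the semantic level (a slot the utterance never filled) or at the pragmatic level (a filled slot whose value makes no sense for the goal). The function names, the slot set, and the date check below are assumptions made for this sketch only.

```python
# Hypothetical illustration of the semantic vs. pragmatic insufficiency
# in claims 9-10. Slot names and the date-based check are assumptions.
import datetime

def semantic_gaps(frame, required=("city", "check_in")):
    """Slots the utterance never filled at all (semantic level)."""
    return [slot for slot in required if slot not in frame]

def pragmatic_gaps(frame):
    """Slots that are filled but inconsistent with world knowledge (pragmatic level)."""
    problems = []
    check_in = frame.get("check_in")
    if check_in is not None and check_in < datetime.date.today():
        problems.append("check_in is in the past")
    return problems
```

Either kind of gap would trigger the dialogue manager's clarification request of claim 1.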
11. The apparatus of claim 1 wherein a first spoken utterance is spoken in a first language, said speech translation module generating a translation in a second language based upon said determined semantic components.

12. The apparatus of claim 11 wherein a second spoken utterance is spoken by another user to said speech recognizer in said second language,
said speech understanding module determining second semantic components of said second spoken utterance, said dialogue manager determining a second condition of insufficient semantic information existing within said second spoken utterance based upon said second determined semantic components, said speech translation module generating a second translation in said second language related to said second insufficient semantic information, said generated second translation being provided to said other user in order for said other user to utter to said speech recognizer a response related to said second insufficient semantic information.
13. The apparatus of claim 1 further comprising:

a computer response module for communicating via a predetermined communication mode said generated translation to said user, said predetermined communication mode being selected from the group consisting of a textual display communication mode, a speech vocalization communication mode, a graphical communication mode, and combinations thereof.
14. The apparatus of claim 1 further comprising:

a remote database in communication with said dialogue manager for storing data related to a predetermined goal, said remote database providing said data to said dialogue manager.

15. The apparatus of claim 14 wherein said remote database communicates with said dialogue manager via a radio frequency communication mode.

16. The apparatus of claim 14 wherein said dialog manager formulates a first database request for said remote database to provide data related to said predetermined goal.

17. The apparatus of claim 16 wherein said dialog manager determines that said predetermined goal is substantially unattainable based upon said data from said remote database, said dialog manager determining what items in said remote database are substantially similar to said predetermined goal, said dialog manager communicating said items to said user via said speech translation module.
18. The apparatus of claim 17 wherein said spoken utterance of said user includes constraints related to said predetermined goal, said dialog manager formulating a second database request for said remote database in order to determine what items in said remote database are substantially similar to said predetermined goal, said dialog manager formulating said second database request by excluding from said second database request at least one of said constraints.
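The constraint-relaxation strategy of claims 17-18 can be sketched as a retry loop that drops one user constraint at a time and re-queries for "substantially similar" items. The hotel records, field names, and functions below are illustrative assumptions, not part of the claimed apparatus.

```python
# Sketch of the constraint relaxation in claims 17-18: when no database
# item satisfies every constraint, re-query with one constraint dropped.
# Records and field names are illustrative.

HOTELS = [
    {"name": "A", "city": "Tokyo", "stars": 3, "pool": False},
    {"name": "B", "city": "Tokyo", "stars": 5, "pool": True},
]

def query(constraints):
    """Return items matching every user constraint exactly."""
    return [h for h in HOTELS
            if all(h.get(k) == v for k, v in constraints.items())]

def relaxed_query(constraints):
    """If the full query fails, retry with each single constraint removed."""
    if (hits := query(constraints)):
        return hits
    for dropped in constraints:
        relaxed = {k: v for k, v in constraints.items() if k != dropped}
        if (hits := query(relaxed)):
            return hits   # similar items, offered back to the user
    return []
```

A request for a three-star Tokyo hotel with a pool matches nothing here, but dropping the star constraint yields hotel "B", which the dialog manager would then communicate to the user via the speech translation module.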
19. The apparatus of claim 16 wherein said dialog manager provides a summary of said data from said remote database to said user.
20. The apparatus of claim 1 further comprising:
a dialog history data file for storing a plurality of utterances of said user, said dialog manager determining information related to said insufficient semantic information via said dialog history data file.
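The dialog history data file of claims 20-21 lets the dialog manager resolve a gap from earlier turns instead of re-asking the user. A minimal sketch, assuming a per-turn frame store (the class and method names are hypothetical):

```python
# Hypothetical sketch of claim 20's dialog history file: earlier semantic
# frames are retained so a missing slot can be resolved without re-asking.

class DialogHistory:
    def __init__(self):
        self.frames = []            # one semantic frame per past utterance

    def add(self, frame):
        self.frames.append(frame)

    def lookup(self, slot):
        """Search the most recent utterances first for a missing slot."""
        for frame in reversed(self.frames):
            if slot in frame:
                return frame[slot]
        return None

history = DialogHistory()
history.add({"city": "Tokyo"})     # from an earlier turn
current = {"nights": 2}            # current turn omits the city
city = current.get("city") or history.lookup("city")
```

Only if the history also fails to supply the slot would the apparatus fall back to generating a clarification request as in claim 1.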
21. The apparatus of claim 20 wherein said dialogue manager determines that sufficient semantic information exists based at least in part upon the information determined via said dialog history data file, said dialogue manager performing at least one computer-implemented activity related to said predetermined goal.

22. The apparatus of claim 1 wherein said dialogue manager determines that sufficient semantic information exists and communicates the determined semantic information to said user for user confirmation of accuracy of said determined semantic information, said dialogue manager performing at least one computer-implemented activity related to said predetermined goal after said user has confirmed the accuracy of said determined semantic information.
23. The apparatus of claim 22 wherein said computer-implemented activity is selected from the group consisting of performing hotel reservations via a remote database, purchasing a piece of merchandise via a remote database, performing location directory assistance via a remote database, exchanging money via a remote database, and combinations thereof.
24. The apparatus of claim 22 wherein said spoken utterance is spoken in a first language, said speech translation module generating a translation in a second language based upon said determined semantic components, said computer-implemented activity including vocalizing said generated translation.
25. The apparatus of claim 1 further comprising:
a local parser connected to said speech understanding module for identifying predetermined speech fragments in said spoken utterance, said speech understanding module determining said semantic components based upon said identified speech fragments.
26. The apparatus of claim 25 wherein said local parser associates said speech fragments with predetermined tags, said tags being related to a predetermined goal.
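The local parser of claims 25-26 identifies predetermined speech fragments and attaches goal-related tags to them. The tag inventory and patterns below are assumptions for illustration; the patent does not define them.

```python
# Toy version of the local parser in claims 25-26: scan the utterance for
# predetermined fragments and attach goal-related tags. Patterns and tag
# names are illustrative assumptions.
import re

TAG_PATTERNS = [
    (r"\b(single|double|twin) room\b", "ROOM_TYPE"),
    (r"\b(\d+) nights?\b", "STAY_LENGTH"),
    (r"\bnear (the )?(\w+)\b", "LOCATION"),
]

def local_parse(utterance):
    """Return (tag, fragment) pairs for every predetermined fragment found."""
    tags = []
    for pattern, tag in TAG_PATTERNS:
        for m in re.finditer(pattern, utterance.lower()):
            tags.append((tag, m.group(0)))
    return tags
```

The global parser of claim 27 would then combine these tagged fragments, using the domain knowledge database of claim 28, into the semantic components that populate the frame.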
27. The apparatus of claim 25 further comprising:

a global parser connected to said speech understanding module for determining said semantic components of said spoken utterance.

28. The apparatus of claim 27 further comprising:

a knowledge database for encoding the semantics of a predetermined domain, said domain being indicative of a predetermined goal, said global parser utilizing said knowledge database for determining said semantic components of said spoken utterance.

29. The apparatus of claim 28 further comprising:

first and second computer-storage media for storing respectively a first and second knowledge database, said first and second knowledge database being related respectively to a first and second domain, said first computer-storage medium being detachable from said global parser so that said second computer-storage medium can be used with said global parser.

30. The apparatus of claim 29 wherein said first and second computer-storage media are flash memory cards.
31. A method for performing spoken translation in processing a spoken utterance from a user, comprising:
converting said spoken utterance into a digital format;
determining semantic components of said spoken utterance;
determining a condition of insufficient semantic information existing within said spoken utterance based upon said determined semantic components; and
generating a translation related to said insufficient semantic information, providing said generated translation to said user in order for said user to utter a response related to said insufficient semantic information.