Device for extracting information from a dialog

US 9,257,115 B2
Filed: 02/06/2013
Issued: 02/09/2016
Est. Priority Date: 03/08/2012
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Computer-implemented systems and methods for extracting information during a human-to-human mono-lingual or multi-lingual dialog between two speakers are disclosed. Information from either the recognized speech (or the translation thereof) by the second speaker and/or the recognized speech by the first speaker (or the translation thereof) is extracted. The extracted information is then entered into an electronic form stored in a data store.

37 Claims

1. A device comprising:
- at least one microphone;
  
  a screen display; and
  
  at least one programmable processor and at least one data storage unit for storing digital data, wherein the at least one programmable processor is in communication with the at least one microphone and the screen display, and wherein the at least one programmable processor is programmed to:
  
  automatically recognize speech input by a first speaker received by the at least one microphone, comprising:
  
  receiving the speech input from the first speaker;
  
  determining a recognized speech result based on the received speech input; and
  
  determining, by the computer-based speech translation system, whether there exists a recognition ambiguity in the recognized speech of the first speaker, wherein the recognition ambiguity indicates more than one possible match for the recognized speech result;
  
  translate, by the computer-based speech translation system, the recognized speech result of the first speaker in the first language into a second language;
  
  determine, by the computer-based speech translation system, whether there exists a translation ambiguity for one or more words in the translation of the recognized speech result of the first speaker in the first language into the second language, wherein the translation ambiguity indicates more than one possible translation of the one or more words;
  
  upon a determination by the computer-based speech translation system that there is either (i) a recognition ambiguity in the recognized speech result of the first speaker or (ii) a translation ambiguity in the translation of the recognized speech result of the first speaker in the first language into the second language, determining a confidence score based on the recognition or translation ambiguity; and
  
  responsive to the confidence score being below a threshold, issuing by the computer-based speech translation system a disambiguation query to the first speaker via a user-interface of the speech translation system, wherein a response to the disambiguation query resolves the recognition or translation ambiguity.
- - 2. The device of claim 1, wherein the at least one programmable processor is further programmed to:
    - automatically recognize speech by the second speaker received by the at least one microphone;
      
      extract at least information from the recognized speech by the second speaker; and
      
      enter the extracted information from the recognized speech by the second speaker into an electronic form that is stored in the at least one data storage unit of the computer system and displayed in a graphical user interface on the screen display.
  - 3. The device of claim 2, wherein:
    - the first speaker speaks a first language;
      
      the second speaker speaks a second language that is different from the first language; and
      
      the at least one programmable processor is further programmed to:
      
      automatically translate the recognized speech by the second speaker in the second language to the first language;
      
      extract at least information from the recognized speech by the second speaker by extracting at least information from the translation of the recognized speech by the second speaker translated to the first language; and
      
      enter the extracted information by entering the extracted information from the translation of the recognized speech by the second speaker translated to the first language into the electronic form stored in the at least one data storage unit.
  - 4. The device of claim 3, wherein the processor is further programmed to:
    - extract at least information from the recognized speech result by the first speaker in the first language; and
      
      enter the extracted information from the recognized speech result by the first speaker in the first language into the electronic form.
  - 5. The device of claim 3, wherein the processor is programmed to extract the information from the translation of the recognized speech by the second speaker translated to the first language by parsing the translation by a semantic grammar.
  - 6. The device of claim 5, wherein the processor is further programmed to retrieve one or more documents related to the extracted information from a remote database.
  - 7. The device of claim 3, wherein the processor is programmed to extract the information from the translation of the recognized speech by the second speaker translated to the first language by detecting one or more keywords in the translation.
  - 8. The device of claim 7, wherein the processor is further programmed to retrieve one or more documents related to the extracted information from a remote database.
  - 9. The device of claim 1, wherein the processor is further programmed to solicit feedback from at least one of the first speaker and the second speaker prior to entering the extracted information in the electronic form.
  - 10. The device of claim 1, wherein the at least one programmable processor is programmed to recognize and receive an edit to extracted information in the electronic form input via the screen display by a user of the device.
  - 11. The device of claim 1, wherein the at least one programmable processor is further programmed to:
    - output the recognized speech result of the first speaker and the recognized speech of the second speaker on a first portion of a graphical user interface that is displayed on the screen display during the dialog between the first and second speakers; and
      
      output on a second portion of the graphical user interface that is displayed on the screen display a form with information extracted from the dialog between the first and second speakers.
  - 12. The device of claim 11, wherein the at least one programmable processor is further programmed to extract information from the dialog between the first and second speakers for outputting on the second portion of the graphical user interface in the form.
  - 13. The device of claim 12, wherein:
    - the at least one programmable processor is programmed to display on the first portion of the graphical user interface the translations of the recognized speech result of the first speaker and the recognized speech of the second speaker.
  - 14. The device of claim 12, wherein the at least one programmable processor is programmed to recognize and receive an edit to extracted information input via the screen display by a user of the device.
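
The decision flow recited in claim 1 can be sketched as follows. This is a minimal illustration and not the patented implementation: the `Hypothesis` type, the margin-based confidence formula, the query wording, and the 0.2 threshold are all assumptions introduced here. The claim itself only requires that a confidence score be determined from the ambiguity and compared against a threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    text: str
    score: float  # hypothetical model score in [0, 1]; higher is better


def is_ambiguous(hyps: List[Hypothesis]) -> bool:
    """Ambiguity here means more than one possible match or translation."""
    return len(hyps) > 1


def confidence(hyps: List[Hypothesis]) -> float:
    """Assumed confidence measure: margin between the two best hypotheses."""
    ranked = sorted(hyps, key=lambda h: h.score, reverse=True)
    if len(ranked) < 2:
        return 1.0
    return ranked[0].score - ranked[1].score


def disambiguation_query(asr: List[Hypothesis], mt: List[Hypothesis],
                         threshold: float = 0.2) -> Optional[str]:
    """Return a query for the first speaker, or None if none is needed."""
    for kind, hyps in (("recognition", asr), ("translation", mt)):
        if is_ambiguous(hyps) and confidence(hyps) < threshold:
            ranked = sorted(hyps, key=lambda h: h.score, reverse=True)
            options = " / ".join(h.text for h in ranked)
            return f"Unclear {kind}. Did you mean: {options}?"
    return None


# Illustrative input: two close recognition hypotheses, one translation.
asr = [Hypothesis("I have a fever", 0.51), Hypothesis("I have a fee for", 0.45)]
mt = [Hypothesis("Tengo fiebre", 0.90)]
print(disambiguation_query(asr, mt))
```

In this sketch the recognition check runs before the translation check, so a single query is issued per utterance; the claims do not prescribe that ordering.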

15. A computer-based device comprising:
- at least one microphone;
  
  a screen display;
  
  at least one data storage unit for storing digital data;
  
  a first automatic speech recognition module for automatically recognizing speech input by a first speaker received by the at least one microphone, wherein automatically recognizing the speech input by the first speaker comprises:
  
  receiving the speech input from the first speaker;
  
  determining a recognized speech result based on the received speech input;
  
  an interactive disambiguation module for determining whether there exists a recognition ambiguity in the recognized speech result of the first speaker, wherein the recognition ambiguity indicates more than one possible match for the recognized speech result, and for determining whether there exists a translation ambiguity for one or more words in the translation of the recognized speech result of the first speaker in the first language into the second language, wherein the translation ambiguity indicates more than one possible translation of the one or more words;
  
  a first machine translation module for translating the recognized speech result of the first speaker in the first language into a second language; and
  
  wherein the interactive disambiguation module is further configured for, upon a determination that there is either (i) a recognition ambiguity in the recognized speech result of the first speaker or (ii) a translation ambiguity in the translation of the recognized speech result of the first speaker in the first language into the second language, determining a confidence score based on the recognition or translation ambiguity, and responsive to the confidence score being below a threshold, issuing by the computer-based speech translation system a disambiguation query to the first speaker via a user-interface of the speech translation system, wherein a response to the disambiguation query resolves the recognition or translation ambiguity.
- - 16. The device of claim 15, wherein:
    - the first speaker speaks a first language;
      
      a second speaker speaks a second language that is different from the first language; and
      
      the device further comprises:
      
      a second automatic speech recognition module for automatically recognizing speech by the second speaker received by the at least one microphone;
      
      a second machine translation module for automatically translating the recognized speech by the second speaker in the second language to the first language; and
      
      an information extraction module for extracting at least information from the recognized speech by the second speaker by extracting at least information from the translation of the recognized speech by the second speaker translated into the first language and for entering the extracted information by entering the extracted information from the translation of the recognized speech by the second speaker translated to the first language into the electronic form stored in the at least one data storage unit.
  - 17. The device of claim 16, wherein the information extraction module is further for:
    - extracting at least information from the recognized speech result by the first speaker in the first language; and
      
      entering the extracted information from the recognized speech result by the first speaker in the first language into the electronic form.
  - 18. The device of claim 16, wherein the information extraction module extracts the information from the translation of the recognized speech by the second speaker translated to the first language by parsing the translation by a semantic grammar.
  - 19. The device of claim 18, further comprising an information retriever module for retrieving one or more documents related to the extracted information from a remote database.
  - 20. The device of claim 16, wherein the information extraction module extracts the information from the translation of the recognized speech by the second speaker translated to the first language by detecting one or more keywords in the translation.
  - 21. The device of claim 20, further comprising an information retriever module for retrieving one or more documents related to the extracted information from a remote database.
  - 22. The device of claim 15, further comprising a multimodal interaction interface to solicit feedback from at least one of the first speaker and the second speaker prior to entering of the extracted information in the electronic form.
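
Claims 16 and 20 describe extracting information from the translated speech by detecting keywords and entering it into an electronic form. A rough sketch of that idea is shown below. The `FIELD_KEYWORDS` table, the field names, and the dictionary used as the "form" are hypothetical stand-ins chosen for illustration; the claims do not specify a keyword list or a form schema.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword table: form field -> keywords that populate it.
FIELD_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "symptom": ("fever", "cough", "headache"),
    "allergy": ("penicillin", "peanuts"),
}


def extract_to_form(translation: str,
                    form: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Add each keyword detected in the translation to its form field."""
    words = set(re.findall(r"[a-z]+", translation.lower()))
    for field, keywords in FIELD_KEYWORDS.items():
        for kw in keywords:
            # Skip keywords that are absent or already recorded.
            if kw in words and kw not in form.setdefault(field, []):
                form[field].append(kw)
    return form


# Two translated utterances from the second speaker fill one shared form.
form: Dict[str, List[str]] = {}
extract_to_form("I have had a fever and a cough since Monday", form)
extract_to_form("I am allergic to penicillin", form)
print(form)  # {'symptom': ['fever', 'cough'], 'allergy': ['penicillin']}
```

Whole-word matching is used here so that a keyword embedded in a longer word does not trigger a field; a semantic grammar, as in claim 18, would be a separate and more structured approach.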

23. A computer-implemented method comprising:
- recognizing, by a computer-based speech translation system, speech input by a first speaker in a first language, comprising:
  
  receiving the speech input from the first speaker;
  
  determining a recognized speech result based on the received speech input; and
  
  determining, by the computer-based speech translation system, whether there exists a recognition ambiguity in the recognized speech result of the first speaker, wherein the recognition ambiguity indicates more than one possible match for the recognized speech result;
  
  translating, by the computer-based speech translation system, the recognized speech result of the first speaker in the first language into a second language;
  
  determining, by the computer-based speech translation system, whether there exists a translation ambiguity for one or more words in the translation of the recognized speech result of the first speaker in the first language into the second language, wherein the translation ambiguity indicates more than one possible translation of the one or more words;
  
  upon a determination by the computer-based speech translation system that there is either (i) a recognition ambiguity in the recognized speech result of the first speaker or (ii) a translation ambiguity in the translation of the recognized speech result of the first speaker in the first language into the second language, determining a confidence score based on the recognition or translation ambiguity; and
  
  responsive to the confidence score being below a threshold, issuing by the computer-based speech translation system a disambiguation query to the first speaker via a user-interface of the speech translation system, wherein a response to the disambiguation query resolves the recognition or translation ambiguity.
- - 24. The method of claim 23, further comprising:
    - receiving, by at least one microphone of a computer-based information extraction device, the speech by the first speaker in the first language spoken to a second speaker as part of a human-to-human dialog;
      
      automatically recognizing, by the computer-based information extraction device, the speech by the second speaker in a second language;
      
      extracting, by the computer-based information extraction device, at least information from the recognized speech by the second speaker; and
      
      entering, by the computer-based information extraction device, the extracted information from the recognized speech by the second speaker into an electronic form stored in at least one data storage unit of the information extraction device.
  - 25. The method of claim 24, further comprising displaying the form on a screen display of the computer-based information extraction device.
  - 26. The method of claim 25, wherein the first speaker speaks a first language, and the second speaker speaks a second language that is different from the first language, and wherein the method further comprises:
    - automatically translating, by the computer-based information extraction device, the recognized speech result by the first speaker in the first language to the second language;
      
      automatically translating, by the computer-based information extraction device, the recognized speech by the second speaker in the second language to the first language, and wherein:
      
      extracting at least information comprises extracting by the computer-based information extraction device at least information from the translation of the recognized speech by the second speaker translated to the first language; and
      
      entering the extracted information comprises entering, by the computer-based information extraction device, the extracted information from the translation of the recognized speech by the second speaker translated to the first language into the electronic form stored in the at least one data storage unit of the information extraction device.
  - 27. The method of claim 26, further comprising:
    - extracting at least information from the recognized speech result by the first speaker in the first language; and
      
      entering the extracted information from the recognized speech result by the first speaker in the first language into the electronic form.
  - 28. The method of claim 26, wherein extracting the information from the translation of the recognized speech by the second speaker translated to the first language comprises parsing the translation by a semantic grammar.
  - 29. The method of claim 28, further comprising retrieving, by the computer-based information extraction device, one or more documents related to the extracted information from a remote database.
  - 30. The method of claim 26, wherein extracting the information from the translation of the recognized speech by the second speaker translated to the first language comprises detecting one or more keywords in the translation.
  - 31. The method of claim 30, further comprising retrieving, by the computer-based information extraction device, one or more documents related to the extracted information from a remote database.
  - 32. The method of claim 26, further comprising soliciting, by the computer-based information extraction device, feedback from at least one of the first speaker and the second speaker prior to entering the extracted information in the electronic form.
  - 33. The method of claim 26, wherein the screen display of the computer-based information extraction device comprises a touch-screen display.
  - 34. The method of claim 23, wherein the disambiguation query issued to the first speaker is different when the ambiguity is in the recognized speech result of the first speaker than when the ambiguity is in the translation of the recognized speech result of the first speaker in the first language into the second language.
  - 35. The method of claim 23, wherein the determination of whether there exists an ambiguity in the recognized speech result of the first speaker is based upon a plurality of factors, the factors comprising:
    - an acoustic confidence score in the recognized speech result of the first speaker;
      
      a context of the dialog between the first and second speakers; and
      
      a language context given by a translation of one or more utterances from the second speaker from the second language to the first language.
  - 36. The method of claim 23, wherein the determination of whether there exists an ambiguity in the translation of the recognized speech result of the first speaker in the first language into the second language is based upon a plurality of factors, the factors comprising:
    - whether there are one or more alternative output translations within a threshold scoring difference of a highest scoring output translation; and
      
      whether, if there are no alternative output translations within the threshold scoring difference of the highest scoring output translation, the score for the highest scoring output translation is below a minimum threshold.
  - 37. The method of claim 23, wherein the user-interface of the speech translation system comprises a touch-screen display.
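
The two factors listed in claim 36 for detecting a translation ambiguity can be written as a small predicate. This is only a sketch under stated assumptions: the function name, the representation of candidates as a list of numeric scores, and the default values 0.05 and 0.5 are invented for illustration and do not come from the patent.

```python
from typing import List


def has_translation_ambiguity(scores: List[float],
                              diff_threshold: float = 0.05,
                              min_score: float = 0.5) -> bool:
    """Apply the two factors of claim 36 to candidate translation scores."""
    if not scores:
        raise ValueError("at least one candidate translation is required")
    ranked = sorted(scores, reverse=True)
    best = ranked[0]
    # Factor 1: an alternative lies within the scoring difference of the best.
    if any(best - s <= diff_threshold for s in ranked[1:]):
        return True
    # Factor 2: no close alternative, but the best score itself is too low.
    return best < min_score


print(has_translation_ambiguity([0.80, 0.78]))  # True: close alternative
print(has_translation_ambiguity([0.80, 0.40]))  # False: clear, strong winner
print(has_translation_ambiguity([0.30, 0.10]))  # True: winner scores too low
```

The second test only matters when the first one fails, which mirrors the "if there are no alternative output translations within the threshold" wording of the claim.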

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Waibel, Alexander
Primary Examiner(s): Pham, Thierry L
Application Number: US13/760,535
Publication Number: US 20130238312A1
Time in Patent Office: 1,098 Days
Field of Search: 704/2, 704/7, 704/8, 704/200, 704/231, 704/251, 704/275, 704/277
US Class Current: 1/1
CPC Class Codes

G06F 3/04842   Selection of displayed obje...

G06F 3/0486   Drag-and-drop

G06F 40/117   Tagging; Marking up details...

G06F 40/174   Form filling; Merging

G06F 40/58   Use of machine translation,...

G10L 13/027   Concept to speech synthesis...

G10L 15/00   Speech recognition G10L17/0...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/1822   Parsing for meaning underst...

G10L 15/26   Speech to text systems G10L...

G10L 2015/088   Word spotting

G16H 10/20   for electronic clinical tri...

G16H 15/00   ICT specially adapted for m...

G16H 80/00   ICT specially adapted for f...
