Conference transcription based on conference data
First Claim
1. A method comprising:
receiving, with a collaboration server device hosting a current conference for a plurality of conference participants, conference data from at least one of the plurality of conference participants, the conference data comprising a shared material that is shared among the plurality of conference participants during the current conference and that includes data indicative of words or phrases discussed among the plurality of conference participants during the current conference when referencing the shared material, the shared material being other than data identifying the conference participants and other than audio generated during the current conference;
after receiving the shared material, sending, with the collaboration server device, the words or phrases of the shared material to a speech recognition engine to update a language model of the speech recognition engine with the words or phrases in order to improve an accuracy of a transcription of an output media stream of the current conference generated by the speech recognition engine upon receiving the output media stream;
receiving, with the collaboration server device, a plurality of input media streams from the plurality of conference participants generated during the current conference;
generating, with the collaboration server device, the output media stream from the plurality of input media streams;
sending, with the collaboration server device, the output media stream to the speech recognition engine for generation of the transcription of the output media stream using the updated language model;
receiving, with the collaboration server device, from one of the plurality of conference participants, mode data indicating whether the updated language model is to be used for only the current conference or also for a future conference; and
when the mode data indicates that the updated language model is to be used also for the future conference, sending, with the collaboration server device, a command to the speech recognition engine, the command indicating to the speech recognition engine to store the updated language model for the future conference.
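The claimed method can be illustrated with a minimal sketch. All class and function names below (`SpeechRecognitionEngine`, `host_conference`, the `"current_and_future"` mode value) are hypothetical stand-ins, not terms from the patent; the set-based "language model" is a deliberate simplification of a real statistical model.

```python
# Hypothetical sketch of the claim-1 flow: a collaboration server extracts
# terms from shared (non-audio) material, pushes them into a speech
# recognition engine's language model before transcription, and persists
# the updated model when participant-supplied mode data says to.
import re


class SpeechRecognitionEngine:
    def __init__(self):
        self.language_model = set()   # stand-in for a real language model
        self.stored_models = {}       # models persisted for future conferences

    def update_language_model(self, words):
        """Bias the model toward vocabulary from the shared material."""
        self.language_model.update(words)

    def transcribe(self, media_stream):
        # A real engine would decode audio; here we simply show which
        # tokens the updated model's vocabulary would recognize.
        return [w for w in media_stream if w in self.language_model]

    def store_model(self, conference_id):
        self.stored_models[conference_id] = set(self.language_model)


def host_conference(engine, conference_id, shared_material, input_streams, mode):
    # 1. Extract words/phrases from the shared (non-audio) material.
    words = set(re.findall(r"[A-Za-z]+", shared_material.lower()))
    # 2. Update the engine's language model before transcription.
    engine.update_language_model(words)
    # 3. Mix the participants' input streams into one output media stream.
    output_stream = [token for stream in input_streams for token in stream]
    # 4. Transcribe the output stream using the updated model.
    transcript = engine.transcribe(output_stream)
    # 5. Persist the updated model if mode data covers future conferences.
    if mode == "current_and_future":
        engine.store_model(conference_id)
    return transcript
```

For example, sharing a slide titled "Project Falcon roadmap" seeds the model with "project", "falcon", and "roadmap" before any audio is transcribed, which is the accuracy improvement the claim is directed to.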
1 Assignment
0 Petitions
Abstract
In one implementation, a collaboration server is a conference bridge or other network device configured to host an audio and/or video conference among a plurality of conference participants. The collaboration server sends conference data and a media stream including speech to a speech recognition engine. The conference data may include the conference roster or text extracted from documents or other files shared in the conference. The speech recognition engine updates a default language model according to the conference data and transcribes the speech in the media stream based on the updated language model. In one example, the performance of the default language model, the updated language model, or both may be tested using a confidence interval or submitted for approval by a conference participant.
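The model-testing step mentioned at the end of the abstract can be sketched as a simple comparison of the two models' outputs against a reference. The position-wise token-accuracy metric and the `margin` parameter below are illustrative assumptions standing in for whatever confidence measure an implementation actually uses.

```python
# Hypothetical sketch of comparing the default and updated language models:
# score each model's transcription against a reference and keep the updated
# model only if it wins by a margin.

def token_accuracy(hypothesis, reference):
    """Fraction of reference tokens the hypothesis matched, position-wise."""
    matches = sum(1 for h, r in zip(hypothesis, reference) if h == r)
    return matches / len(reference) if reference else 0.0


def pick_language_model(default_out, updated_out, reference, margin=0.05):
    """Select 'updated' only when it beats the default by at least `margin`."""
    default_score = token_accuracy(default_out, reference)
    updated_score = token_accuracy(updated_out, reference)
    return "updated" if updated_score >= default_score + margin else "default"
```

The margin guards against adopting an updated model on noise alone, echoing the abstract's confidence-interval test.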
20 Claims
1. A method comprising:
receiving, with a collaboration server device hosting a current conference for a plurality of conference participants, conference data from at least one of the plurality of conference participants, the conference data comprising a shared material that is shared among the plurality of conference participants during the current conference and that includes data indicative of words or phrases discussed among the plurality of conference participants during the current conference when referencing the shared material, the shared material being other than data identifying the conference participants and other than audio generated during the current conference;
after receiving the shared material, sending, with the collaboration server device, the words or phrases of the shared material to a speech recognition engine to update a language model of the speech recognition engine with the words or phrases in order to improve an accuracy of a transcription of an output media stream of the current conference generated by the speech recognition engine upon receiving the output media stream;
receiving, with the collaboration server device, a plurality of input media streams from the plurality of conference participants generated during the current conference;
generating, with the collaboration server device, the output media stream from the plurality of input media streams;
sending, with the collaboration server device, the output media stream to the speech recognition engine for generation of the transcription of the output media stream using the updated language model;
receiving, with the collaboration server device, from one of the plurality of conference participants, mode data indicating whether the updated language model is to be used for only the current conference or also for a future conference; and
when the mode data indicates that the updated language model is to be used also for the future conference, sending, with the collaboration server device, a command to the speech recognition engine, the command indicating to the speech recognition engine to store the updated language model for the future conference.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
12. A collaboration server device comprising:
a memory storing conference data associated with a current conference received from at least one of a plurality of conference participants, the conference data comprising a shared material that is shared among the plurality of conference participants during the current conference and that includes data indicative of words or phrases discussed among the plurality of conference participants during the current conference when referencing the shared material, the shared material being other than data identifying the conference participants and other than audio generated during the current conference;
a collaboration server controller configured to:
host the current conference and allow the shared material to be shared among the plurality of conference participants during the current conference;
obtain words or phrases from the shared material stored in the memory;
generate an output media stream from a plurality of input media streams received from the plurality of conference participants;
send, via a communication interface, the output media stream and the words or phrases to a speech recognition engine to update a language model of the speech recognition engine with the words or phrases in order to improve an accuracy of a transcription of the output media stream generated by the speech recognition engine;
receive mode data indicating to store the updated language model for a future conference; and
in response to the mode data indicating to store the updated language model for a future conference, index the updated language model by conference topic with one or more prior language models based on a comparison of the shared material shared during the current conference with prior shared material shared during one or more prior conferences, wherein the memory is configured to store the updated language model according to the index.
- View Dependent Claims (13, 14, 15, 16, 17, 18)
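Claim 12's indexing limitation, comparing the current conference's shared material against prior conferences' material to file the updated model under a topic, can be sketched as follows. Jaccard similarity over term sets, the `threshold` parameter, and the `"new-topic"` fallback are all illustrative assumptions; the claim does not specify a particular comparison.

```python
# Hypothetical sketch of indexing an updated language model by conference
# topic: compare the current shared material's terms with terms shared in
# prior conferences and reuse the closest topic index if similar enough.

def jaccard(a, b):
    """Set-overlap similarity between two term collections, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def index_model_by_topic(model, current_terms, prior_conferences, threshold=0.3):
    """Return the topic key under which to store the updated model.

    `prior_conferences` maps topic -> terms shared in prior conferences.
    """
    best_topic, best_score = None, 0.0
    for topic, prior_terms in prior_conferences.items():
        score = jaccard(current_terms, prior_terms)
        if score > best_score:
            best_topic, best_score = topic, score
    # Reuse an existing topic index when similar enough; otherwise a new one.
    return best_topic if best_score >= threshold else "new-topic"
```

Storing the model under the returned topic key then satisfies the "wherein the memory is configured to store the updated language model according to the index" recitation.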
19. A non-transitory computer readable storage medium comprising computer-executable instructions comprising:
instructions executable by a collaboration server device hosting a current conference to receive shared data that is shared among a plurality of conference participants during the current conference, the shared data received from at least one of the plurality of conference participants and including data indicative of words or phrases discussed among the plurality of conference participants during the current conference when referencing the shared data and being other than data identifying the conference participants and other than audio generated during the current conference;
instructions executable by the collaboration server device to extract the words or phrases from the shared data;
instructions executable by the collaboration server device to send the words or phrases to a speech recognition engine to update a default language model with the words or phrases in order to improve an accuracy of a transcription of an output media stream from the current conference;
instructions executable by the collaboration server device to receive from one of the plurality of conference participants mode data indicating whether the updated language model is to be used for only the current conference or also for a future conference; and
instructions executable by the collaboration server device to send a command to the speech recognition engine when the mode data indicates that the updated language model is to be used also for the future conference, the command indicating to the speech recognition engine to store the updated language model for the future conference.
- View Dependent Claims (20)
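The extraction step recited in claim 19 can be sketched as a small function that pulls both individual words and multi-word phrases out of the shared data's text. The capitalized-run heuristic for phrases is an illustrative assumption, not a technique specified by the patent.

```python
# Hypothetical sketch of extracting words and phrases from shared
# (non-audio) material, e.g. text from a shared slide deck, so they can
# be sent to the speech recognition engine's language model.
import re


def extract_words_and_phrases(shared_text):
    """Return individual lowercase words plus capitalized multi-word phrases."""
    words = set(re.findall(r"[A-Za-z]+", shared_text.lower()))
    # Treat runs of capitalized words ("Project Falcon") as phrases that
    # participants are likely to speak when referencing the material.
    phrases = set(re.findall(r"(?:[A-Z][a-z]+ )+[A-Z][a-z]+", shared_text))
    return words | phrases
```

Note the input is textual shared data only; per the claim, participant-identifying data and conference audio are outside the extraction's scope.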
Specification