Systems and methods for collaborative note-taking

US 7,542,971 B2
Filed: 02/02/2004
Issued: 06/02/2009
Est. Priority Date: 02/02/2004
Status: Expired due to Fees

Abstract

Techniques are provided for determining collaborative notes and automatically recognizing speech, handwriting and other types of information. Domain and optional actor/speaker information associated with the support information is determined. An initial automatic speech recognition model is determined based on the domain and/or actor information. The domain and/or actor/speaker language model is used to recognize text in the speech information associated with the support information. Presentation support information such as slides, speaker notes and the like is determined. The semantic overlap between the support information and the salient non-function words in the recognized text, together with collaborative user feedback information, is used to determine relevancy scores for the recognized text. Grammaticality, well-formedness, self-referential integrity and other features are used to determine correctness scores. Suggested collaborative notes are displayed in the user interface based on the salient non-function words. User actions in the user interface determine feedback signals. Recognition models such as automatic speech recognition and handwriting recognition models are determined based on the feedback signals and the correctness and relevance scores.
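
The relevancy score described in the abstract can be illustrated with a minimal sketch. It assumes that "semantic overlap" is approximated by plain word overlap between the recognized text and the support information; the abstract does not specify the measure. The stop list `FUNCTION_WORDS` and the names `salient_words` and `relevance_score` are hypothetical and do not come from the patent.

```python
import re

# Hypothetical, deliberately tiny stop list of function words (an assumption).
FUNCTION_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}

def salient_words(text):
    """Return the set of lowercase word tokens with function words removed."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in tokens if w not in FUNCTION_WORDS}

def relevance_score(recognized_text, support_text):
    """Fraction of salient recognized words that also occur in the support text."""
    recognized = salient_words(recognized_text)
    if not recognized:
        return 0.0
    return len(recognized & salient_words(support_text)) / len(recognized)

slide = "Hidden Markov models in speech recognition"
score = relevance_score("the hidden markov model is trained with speech data", slide)
print(score)  # 3 of the 6 salient recognized words appear on the slide
```

Exact word matching misses inflected forms ("model" versus "models" above), so a fuller treatment would likely add stemming or another similarity measure.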

244 Citations

46 Claims

1. A method for collaborative note taking based on a speech of a speaker and providing a summary to a user in an audience of the speaker, the method comprising:
- receiving a first set of information from the speech;
  
  performing speech recognition on the first set of information and determining selected portions of the speech;
  
  determining portions of context information corresponding to a domain information from a presentation information source temporally associated with the selected portions of the speech;
  
  determining at least one language model based on the selected portions of the speech and the temporally associated portions of context information from the presentation information source, wherein the at least one language model is dynamically determined;
  
  applying the language model to the first set of information to extract salient tokens from the first set of information;
  
  verifying relevance of the salient tokens based on the presentation information source to obtain verified tokens;
  
  generating the summary including the extracted salient tokens, wherein generating the summary includes assembling the verified tokens;
  
  displaying the summary to the user; and
  
  receiving collaborative user feedback information relating to the summary and adjusting the language model according to the collaborative user feedback, wherein the method is implemented by a computer.
- - 2. The method of claim 1, wherein the language model recognizes at least one of:
    - speech and handwriting information.
  - 3. The method of claim 1, wherein the language model recognizes features associated with at least one of:
    - audio information, video information, and tactile information.
  - 4. The method of claim 1, wherein the language model recognizes attributes associated with at least one of:
    - a song, a book and a video.
  - 5. The method of claim 4, wherein the attributes are at least one of:
    - a title, a genre, an author.
  - 6. The method of claim 1, wherein the language model is a composite language model.
  - 7. The method of claim 6, wherein the composite language model is comprised of at least one of an actor based model, a domain based model and a genre based model.
  - 8. The method of claim 7, wherein the actor based model is a speaker based language model.
  - 9. The method of claim 1, wherein the summary is a suggested note.
  - 10. The method of claim 1, wherein the presentation information source is based on at least one of:
    - audio, visual, tactile information.
  - 11. The method of claim 10, wherein the audio information is collaborative user generated audio.
  - 12. The method of claim 10, wherein the visual information is at least one of video information, video capture of user gestures, textual information.
  - 13. The method of claim 1, wherein the collaborative user feedback is based on user actions.
  - 14. The method of claim 13, wherein the user actions operate on at least one of:
    - suggested collaborative notes, user notes, user corrections, user pauses.
  - 15. The method of claim 13, wherein a user action is ignoring a suggested note.
  - 16. The method of claim 13, wherein the collaborative user feedback is based on at least one of:
    - numbers of user actions, identity of users taking action, weighting based on which users take action.
  - 17. The method of claim 10, wherein the tactile information is displayed using a dynamically refreshable tactile display.
  - 18. The method of claim 16, wherein the at least one composite language model is determined dynamically.
  - 19. The method of claim 1, wherein the collaborative feedback information is determined based on at least one of:
    - intensity of user response, number of users responding, type of responses, time to respond, identity of user responding.
  - 43. The method of claim 8, further comprising determining matching verbalization of acronyms in the speech with a tokenized representation of the acronym in the presentation information source.
  - 44. The method of claim 8, further comprising determining matching verbalization of acronyms in the presentation information source with a tokenized representation of the acronym in the speech.
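
As a rough illustration of the steps of claim 1, the sketch below selects the slide that is temporally associated with a stretch of speech, builds a language model dynamically by interpolating a base unigram model with a model of that slide's text, extracts salient tokens, verifies them against the slide, and assembles the verified tokens into a summary. The unigram model, the linear interpolation, the threshold, and every function name are assumptions chosen for brevity; the claim does not prescribe any of them.

```python
from collections import Counter

def slide_at(slides, t):
    """Return the text of the slide on screen at time t (seconds), or ""."""
    for start, end, text in slides:
        if start <= t < end:
            return text
    return ""

def unigram_model(text):
    """Maximum-likelihood unigram probabilities over whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def dynamic_model(base, context, weight=0.5):
    """Linearly interpolate a base model with a context (slide) model."""
    words = set(base) | set(context)
    return {w: (1 - weight) * base.get(w, 0.0) + weight * context.get(w, 0.0)
            for w in words}

def summarize(transcript, model, context, threshold=0.1):
    """Keep tokens the model rates as salient and that the slide text confirms."""
    tokens = transcript.lower().split()
    salient = [w for w in tokens if model.get(w, 0.0) >= threshold]
    verified = [w for w in salient if w in context]
    return " ".join(dict.fromkeys(verified))  # drop repeats, keep spoken order

slides = [(0, 60, "speech recognition basics"),
          (60, 120, "language model adaptation")]
base = unigram_model("we will talk about speech today")
context = unigram_model(slide_at(slides, 75))  # slide shown at t = 75 s
model = dynamic_model(base, context)
print(summarize("so the language model needs adaptation to the speaker",
                model, context))
```

Rebuilding `model` whenever the slide changes is one simple reading of "dynamically determined"; other readings are possible.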

20. A system for collaborative note taking based on a speech by a speaker and providing a summary to a user in an audience of the speaker, the system comprising:
- a memory;
  
  an input/output circuit for:
  
  receiving a set of information from the speech;
  
  retrieving portions of information from the set of information; and
  
  retrieving portions of context information from a presentation information source to obtain domain information, the portions of information from the speech being temporally associated with the portions of context information from the presentation information source;
  
  a processor that performs the operations:
  
  determines at least one language model based on the portions of information from the speech and the temporally associated portion of context information from the presentation information source, wherein the at least one language model is dynamically determined;
  
  applies the language model to the set of information to extract salient tokens from the set of information;
  
  generates the summary;
  
  transmits the summary to be displayed to the user; and
  
  adjusts the language model according to the user feedback after receiving collaborative user feedback information relating to the summary; and
  
  a relevance and correctness determination circuit for verifying relevance of the salient tokens based on support information to obtain verified tokens, wherein the summary is generated by assembling the verified tokens.
- - 21. The system of claim 20, wherein the summary is transmitted to a device of the user.
  - 22. The system of claim 21, wherein the language model recognizes at least one of speech and handwriting information.
  - 23. The system of claim 21, wherein the language model recognizes features associated with at least one of:
    - audio information, video information, and tactile information.
  - 24. The system of claim 21, wherein the language model recognizes attributes associated with at least one of:
    - a song, a book and a video.
  - 25. The system of claim 24, wherein the attributes are at least one of:
    - a title, a genre, an author.
  - 26. The system of claim 21, wherein the context information is at least one of support information and collaborative user feedback information.
  - 27. The system of claim 21, wherein the language model is a composite language model.
  - 28. The system of claim 27, wherein the composite language model is comprised of at least one of an actor based model, a domain based model and a genre based model.
  - 29. The system of claim 21, wherein the actor based model is a speaker based recognition model.
  - 30. The system of claim 21, wherein the summary is a suggested note.
  - 31. The system of claim 26, wherein the presentation information source is based on at least one of:
    - audio, visual, tactile information.
  - 32. The system of claim 31, wherein the audio information is collaborative user generated audio.
  - 33. The system of claim 31, wherein the visual information is at least one of video information, video capture of user gestures, textual information.
  - 34. The system of claim 26, wherein the collaborative user feedback is based on user actions.
  - 35. The system of claim 34, wherein the user actions operate on at least one of:
    - suggested collaborative notes, user notes, user corrections, user pauses.
  - 36. The system of claim 34, wherein a user action is ignoring a suggested note.
  - 37. The system of claim 34, wherein the collaborative user feedback is based on at least one of:
    - numbers of user actions, identity of users taking action, weighting based on which users take action.
  - 38. The system of claim 31, wherein the tactile information is displayed using a dynamically refreshable tactile display.
  - 39. The system of claim 26, wherein the collaborative feedback information is determined based on at least one of:
    - intensity of user response, number of users responding, type of responses, time to respond, identity of user responding.
  - 45. The system of claim 29, wherein the processor further determines matching verbalization of acronyms in the speech with a tokenized representation of the acronym in the presentation information source.
  - 46. The system of claim 29, wherein the processor further determines matching verbalization of acronyms in the presentation information source with a tokenized representation of the acronym in the speech.
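
The acronym matching of claims 43 to 46 might be approximated as follows. This sketch assumes that a "verbalization" is the expanded phrase spoken aloud and that the tokenized form is an all-capitals token on a slide; the claims do not define either term that narrowly. The function names are hypothetical.

```python
import re

def acronyms(text):
    """Return the all-capitals tokens of two or more letters found in text."""
    return set(re.findall(r"\b[A-Z]{2,}\b", text))

def match_verbalizations(transcript, slide_text):
    """Map each slide acronym to the first spoken phrase whose initials spell it."""
    words = re.findall(r"[A-Za-z]+", transcript)
    matches = {}
    for acro in sorted(acronyms(slide_text)):
        n = len(acro)
        # Slide a window of n words over the transcript and compare initials.
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            if "".join(w[0] for w in window).upper() == acro:
                matches[acro] = " ".join(window)
                break
    return matches

print(match_verbalizations(
    "today we train a hidden Markov model on audio", "HMM training"))
```

Swapping the two arguments' roles (acronyms taken from the transcript, expansions searched for in the slide text) gives the reverse direction recited in claims 44 and 46.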

40. Computer readable storage medium comprising computer readable program code embodied on the computer readable storage medium, the computer readable program code usable to program a computer to recognize ambiguous information comprising:
- determining portions of information from a speech of a speaker;
  
  determining portions of context information from a presentation information source temporally associated with the portions of information from the speech;
  
  determining at least one language model based on the portions of information from the speech and the temporally associated portions of context information from the presentation information source, wherein the at least one language model is dynamically determined;
  
  applying the at least one language model to the speech to extract salient tokens from the speech;
  
  verifying relevance of the salient tokens based on the presentation information source to obtain verified tokens;
  
  determining a summary of the speech based on at least one of the determined language models, wherein determining the summary includes assembling the verified tokens;
  
  displaying the summary to a user in an audience; and
  
  receiving collaborative user feedback information relating to the summary and adjusting the language model according to the collaborative user feedback.
- - 41. The computer readable storage medium of claim 40, wherein the computer readable program code is usable to program the computer to recognize ambiguous information.

42. A system for recognizing information from a speaker and providing a summary for a user in an audience of the speaker, comprising:
- means for determining portions of information from a speech of the speaker;
  
  means for determining portions of context information from a presentation information source temporally associated with the portions of information from the speech, and determining domain information of the speech from the context information;
  
  means for determining at least one language model based on the portions of information from the speech and the temporally associated portions of context information from the presentation information source, wherein the at least one language model is dynamically determined;
  
  means for determining output information based on at least one of the determined language models, said means for determining output applying the language model to the speech to extract salient tokens from the speech, verifying relevance of the salient tokens based on the presentation information source to obtain verified tokens, and generating a summary of the speech by assembling the verified tokens;
  
  means for displaying the summary to the user; and
  
  means for receiving collaborative user feedback information relating to the summary and adjusting the language model according to the collaborative user feedback.
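
Each independent claim ends by adjusting the language model according to collaborative user feedback, and claims 16 and 37 mention weighting based on which users take action. A minimal sketch of one such adjustment is given below. The encoding of feedback as +1 (note accepted) or -1 (note ignored), the multiplicative update, the `rate` parameter, and the function name are all assumptions and are not taken from the patent.

```python
def apply_feedback(model, note_tokens, feedback, user_weights, rate=0.1):
    """Return a copy of model with the note's token scores nudged by feedback.

    feedback maps a user id to +1 (accepted the note) or -1 (ignored it).
    user_weights maps a user id to a weight; unlisted users count as 1.0.
    The result is not renormalized, which keeps the sketch short.
    """
    total = sum(user_weights.get(u, 1.0) for u in feedback)
    if total == 0:
        return dict(model)  # no usable feedback, model unchanged
    # Weighted average of the votes, in the range [-1, 1].
    signal = sum(user_weights.get(u, 1.0) * v for u, v in feedback.items()) / total
    adjusted = dict(model)
    for tok in note_tokens:
        adjusted[tok] = max(0.0, adjusted.get(tok, 0.0) * (1 + rate * signal))
    return adjusted

model = {"adaptation": 0.2, "speaker": 0.1}
feedback = {"alice": 1, "bob": -1, "carol": 1}
updated = apply_feedback(model, ["adaptation"], feedback, {"alice": 2.0})
print(updated)  # "adaptation" rises slightly, "speaker" is unchanged
```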

Current Assignee
Fuji Xerox, Fuji Xerox Company Limited (Xerox Holdings Corp.)
Original Assignee
Fuji Xerox Company Limited (Xerox Holdings Corp.)
Inventors
Denoue, Laurent; Thione, Giovanni Lorenzo; Van Den Berg, Martin Henk
Primary Examiner(s)
Alam; Hosain T
Assistant Examiner(s)
Ahluwalia; Navneet K

Application Number

US10/768,675
Publication Number

US 20050171926A1
Time in Patent Office

1,947 Days
Field of Search

707/1-10, 707/100-104.1, 707/200-205, 704/250-280
US Class Current

1/1
CPC Class Codes

G06F 40/166   Editing, e.g. inserting or ...

G10L 15/22   Procedures used during a sp...

G10L 2015/228   of application context
