Speech recognition by automated context creation

US 7,243,069 B2
Filed: 07/20/2001
Issued: 07/10/2007
Est. Priority Date: 07/28/2000
Status: Expired due to Term

Abstract

A method for speech recognition can include generating a context-enhanced database from a system input. A voice-generated output can be generated from a speech signal by performing a speech recognition task to convert the speech signal into computer-processable segments. During the speech recognition task, the context-enhanced database can be accessed to improve the speech recognition rate. Accordingly, the speech signal can be interpreted with respect to words included within the context-enhanced database. Additionally, a user can edit or correct an output in order to generate the final voice-generated output which can be made available.
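
The first step in the abstract, generating a context-enhanced database from a system input, might look roughly like the sketch below. It assumes, purely for illustration, that the input is plain text, that a speech utterance can be stood in for by a string key, and that a general (context-independent) database maps such keys to words. The names `build_word_list`, `build_context_db`, and `GENERAL_DB`, and the toy pronunciation keys, are hypothetical and do not come from the patent.

```python
import re

def build_word_list(text):
    """Return the distinct lowercase words found in a non-voice input."""
    return set(re.findall(r"[a-z']+", text.lower()))

def build_context_db(word_list, general_db):
    """Keep only those general-database entries whose word is in the word list.

    general_db maps an utterance key (a stand-in for acoustic data) to text.
    """
    return {utt: word for utt, word in general_db.items() if word in word_list}

# Toy general database: utterance key -> textual segment (assumed format).
GENERAL_DB = {
    "k l eh ng k": "klenk",
    "m iy t ih ng": "meeting",
    "m iy t": "meet",
    "b ae ng k": "bank",
}

email_text = "Meeting with Klenk moved to Friday."
context_db = build_context_db(build_word_list(email_text), GENERAL_DB)
```

Here the context-enhanced database ends up holding only the entries for "klenk" and "meeting", the two words that occur both in the e-mail text and in the general database.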

38 Claims

1. In a speech recognition system, a method of speech recognition comprising:
- (a) receiving non-voice input in a computer system communicatively linked to the speech recognition system, said input having been sent to a user from a different user and comprising at least one of text contained in an e-mail sent or received by the user, information in a document attached to an e-mail sent or received by the user, information in a document viewed by the user on a display of the computer system, information in a plurality of linked documents accessible to the computer system, information in a spread sheet executing on the computer system, facsimile information received via a facsimile device connected to the computer system, call center information received via calling device connected to the computer system, and information recorded by a web browser executing on the computer system;
  
  (b) creating a word list defining a context-enhanced database based upon said input or modifying an existing context-enhanced database by adding a word list created based upon said input, wherein said created and modified context-enhanced databases are dynamically generated based upon at least one of a current activity performed by the user on the computer system and a past activity performed by the user on the computer system within a predetermined time interval, said current and past activities comprising at least one of sending or receiving an e-mail, displaying a document contained in an e-mail, displaying information contained in a spread sheet executing on the computer system, receiving facsimile information via a facsimile device connected to the computer system, receiving call center information via a calling device connected to the computer system, and receiving information recorded by a web browser executing on the computer system;
  
  (c) preparing a first textual output from the speech signal by performing a speech recognition task to convert a speech signal into said first textual output, wherein said context-enhanced database is accessed to improve the speech recognition rate, wherein said speech signal is parsed into a plurality of computer processable speech segments, wherein said first textual output comprises a plurality of text segments, each corresponding to one of the computer processable speech segments, and wherein selective ones of the text segments are generated by matching a computer processable speech segment against an entry within the context-enhanced database, said context-enhanced database including a plurality of entries, each entry comprising a speech utterance and a corresponding textual segment for the speech utterance;
  
  (d) enabling editing of said first textual output to generate a final voice-generated output; and
  
  (e) making said final voice-generated output available.
- - 2. The method of claim 1, wherein each of said computer-processable speech segments represent digitally encoded spoken words, and wherein each of the text segments is a word in text format.
  - 3. The method of claim 2, wherein during said speech recognition task, said speech signals are analyzed to determine whether matches exist within the context-enhanced database for the computer-processable speech segments before another database is searched to locate text matches for the computer-processable speech segments.
  - 4. The method of claim 2, wherein a second database is accessed to find a matching word for each of said words for which no matching word was found in said context-enhanced database, wherein the context-enhanced database is created from said input and from entries within the second database.
  - 5. The method of claim 1, wherein during said speech recognition task, said speech signals are analyzed to determine whether matches exist within the context-enhanced database for the computer-processable speech segments before another database is searched to locate text matches for the computer-processable speech segments.
  - 6. The method of claim 1, wherein at least two steps selected from the group consisting of steps (b), (c), (d), and (e), are performed concurrently.
  - 7. The method of claim 1, wherein said speech utterances and said textual segments of said context enhanced database represent words.
  - 8. The method of claim 1, wherein said speech signal is interpreted as part of said speech recognition task in light of entries included in said context-enhanced database.
  - 9. The method of claim 1, wherein the creating step further comprises the step of:
    - creating the context-enhanced database from those entries of a context-independent database having words included within the word list.
  - 10. The method of claim 1, wherein said voice-generated output is a physical output.
  - 11. The method of claim 10, wherein said voice-generated output is temporarily put into a memory.
  - 12. The method of claim 1, wherein said editing is enabled by highlighting words of said first textual output having a predetermined likelihood of misinterpretation of said speech signal.
  - 13. The method of claim 1, wherein said context-enhanced database is derived from an existing database based upon said input.
  - 14. The method of claim 1, wherein said context-enhanced database is dynamically generated specifically for the specified context, wherein the method further comprises the step of:
    - detecting an event signifying the context has changed; and
      
      responsively updating said context-enhanced database.
  - 15. The method of claim 1, further comprising the steps of:
    - automatically detecting a change in one or more active applications;
      
      responsive to the detected change, automatically deriving new input; and
      
      responsive to the new input, dynamically updating the context-dependant database based upon the new input.
  - 16. The method of claim 1, wherein one or more of a synonym lexicon and a meaning variants database is accessed when preparing said voice-generated output.
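
Claim 12 describes enabling editing by highlighting words that have a predetermined likelihood of misinterpretation. One way to picture this is the sketch below. It assumes the recognizer reports a confidence score per word and that highlighting means wrapping a word in `[[...]]` markers. The function name, the `(word, confidence)` pair format, and the 0.6 threshold are all hypothetical choices for illustration.

```python
def highlight_uncertain(segments, threshold=0.6):
    """Wrap words whose confidence is below the threshold in [[...]] markers.

    segments is a list of (word, confidence) pairs; both the pair format and
    the 0.6 default are assumptions made for this sketch.
    """
    marked = []
    for word, confidence in segments:
        marked.append(f"[[{word}]]" if confidence < threshold else word)
    return " ".join(marked)

draft = [("send", 0.95), ("the", 0.99), ("fax", 0.41), ("today", 0.88)]
print(highlight_uncertain(draft))  # prints: send the [[fax]] today
```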

17. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
- (a) receiving non-voice input in a computer system communicatively linked to the speech recognition system, said input having been sent to a user from a different user and comprising at least one of text contained in an e-mail sent or received by the user, information in a document attached to an e-mail sent or received by the user, information in a document viewed by the user on a display of the computer system, information in a plurality of linked documents accessible to the computer system, information in a spread sheet executing on the computer system, facsimile information received via a facsimile device connected to the computer system, call center information received via calling device connected to the computer system, and information recorded by a web browser executing on the computer system;
  
  (b) creating a word list defining a context-enhanced database based upon said input or modifying an existing context-enhanced database by adding a word list created based upon said input, wherein said created and modified context-enhanced databases are dynamically generated based upon at least one of a current activity performed by the user on the computer system and a past activity performed by the user on the computer system within a predetermined time interval, said current and past activities comprising at least one of sending or receiving an e-mail, displaying a document contained in an e-mail, displaying information contained in a spread sheet executing on the computer system, receiving facsimile information via a facsimile device connected to the computer system, receiving call center information via a calling device connected to the computer system, and receiving information recorded by a web browser executing on the computer system;
  
  (c) preparing a first textual output from a speech signal by performing a speech recognition task to convert said speech signal into said first textual output, wherein said context-enhanced database is accessed to improve the speech recognition rate, wherein said speech signal is parsed into a plurality of computer processable speech segments, wherein said first textual output comprises a plurality of text segments, each corresponding to one of the computer processable speech segments, and wherein selective ones of the text segments are generated by matching a computer processable speech segment against an entry within the context-enhanced database, said context-enhanced database including a plurality of entries, each entry comprising a speech utterance and a corresponding textual segment for the speech utterance;
  
  (d) enabling editing of said first textual output to generate a final voice-generated output; and
  
  (e) making said final voice-generated output available.
- - 18. The machine-readable storage of claim 17, wherein each of said computer-processable speech segments represent digitally encoded spoken words, and wherein each of the text segments is a word in text format.
  - 19. The method of claim 18, wherein during said speech recognition task, said speech signals are analyzed to determine whether matches exist within the context-enhanced database for the computer-processable speech segments before another database is searched to locate text matches for the computer-processable speech segments.
  - 20. The method of claim 18, wherein a second database is accessed to find a matching word for each of said words for which no matching word was found in said context-enhanced database, wherein the context-enhanced database is created from said input and from entries within the second database.
  - 21. The machine-readable storage of claim 17, wherein during said speech recognition task, said speech signals are analyzed to determine whether matches exist within the context-enhanced database for the computer-processable speech segments before another database is searched to locate text matches for the computer-processable speech segments.
  - 22. The machine-readable storage of claim 17, wherein at least two steps selected from the group consisting of steps (b), (c), (d), and (e), are performed concurrently.
  - 23. The machine-readable storage of claim 17, wherein speech utterances and said textual segments of said context enhanced database represent words.
  - 24. The machine-readable storage of claim 17, wherein said speech signal is interpreted as part of said speech recognition task in light of entries included in said context-enhanced database.
  - 25. The machine-readable storage of claim 17, wherein the creating step further comprises the step of:
    - creating the context-enhanced database from those entries of a context-independent database having words included within a word list.
  - 26. The machine-readable storage of claim 17, wherein said voice-generated output is a physical output.
  - 27. The method of claim 26, wherein said voice-generated output is temporarily put into a memory.
  - 28. The machine-readable storage of claim 17, wherein said editing is enabled by highlighting words of said first textual output having a predetermined likelihood of misinterpretation of said speech signal.
  - 29. The machine-readable storage of claim 17, wherein said context-enhanced database is derived from an existing database based upon said input.
  - 30. The machine-readable storage of claim 17, wherein said context-enhanced database is dynamically generated specifically for the specified context, wherein the method further comprises the step of:
    - detecting an event signifying the context has changed; and
      
      responsively updating said context-enhanced database.
  - 31. The machine-readable storage of claim 17, further comprising the steps of:
    - automatically detecting a change in one or more active applications;
      
      responsive to the detected change, automatically deriving new input; and
      
      responsive to the new input, dynamically updating the context-dependant database based upon the new input.
  - 32. The machine-readable storage of claim 17, wherein one or more of a synonym lexicon and a meaning variants database is accessed when preparing said voice-generated output.
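
Claims 30 and 31 (like claims 14 and 15) describe detecting a change in context or in the active applications, deriving new input, and updating the database in response. A minimal sketch of that loop follows. It assumes the caller can list the active applications and read the text each one shows. `ContextTracker`, `read_input`, and `poll` are hypothetical names, and rebuilding the whole word list on every change is only one possible reading of "updating".

```python
class ContextTracker:
    """Rebuild a word list whenever the set of active applications changes."""

    def __init__(self, read_input):
        # read_input(app_name) -> text visible in that application (hypothetical hook).
        self.read_input = read_input
        self.active_apps = frozenset()
        self.word_list = set()

    def poll(self, active_apps):
        """Return True and refresh the word list if the active applications changed."""
        current = frozenset(active_apps)
        if current == self.active_apps:
            return False
        self.active_apps = current
        self.word_list = set()
        for app in sorted(current):
            self.word_list.update(self.read_input(app).lower().split())
        return True

texts = {"mail": "Quarterly forecast attached", "sheet": "Revenue forecast Q3"}
tracker = ContextTracker(lambda app: texts[app])
changed_first = tracker.poll(["mail"])           # first poll sees a new application
changed_again = tracker.poll(["mail"])           # nothing changed, no rebuild
changed_third = tracker.poll(["mail", "sheet"])  # spreadsheet opened, rebuild
```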

33. In a speech recognition system, a method of speech recognition comprising the steps of:
- receiving non-voice input in a computer system communicatively linked to the speech recognition system, said input having been sent to a user from a different user and comprising at least one of text contained in an e-mail sent or received by the user, information in a document attached to an e-mail sent or received by the user, information in a document viewed by the user on a display of the computer system, information in a plurality of linked documents accessible to the computer system, information in a spread sheet executing on the computer system, facsimile information received via a facsimile device connected to the computer system, call center information received via calling device connected to the computer system, and information recorded by a web browser executing on the computer system;
  
  creating a word list defining a context-enhanced database based upon the input or modifying an existing context-enhanced database by adding a word list created based upon the input, wherein said created and modified context-enhanced databases are dynamically generated based upon at least one of a current activity performed by the user on the computer system and a past activity performed by the user on the computer system within a predetermined time interval, said current and past activities comprising at least one of sending or receiving an e-mail, displaying a document contained in an e-mail, displaying information contained in a spread sheet executing on the computer system, receiving facsimile information via a facsimile device connected to the computer system, receiving call center information via a calling device connected to the computer system, and receiving information recorded by a web browser executing on the computer system;
  
  parsing a received speech signal into a plurality of speech segments;
  
  comparing said speech segments against entries in the context-enhanced database;
  
  when matching entries are found in the comparing step, for each matching entry retrieving a textual segment from the context-enhanced database that is associated with the matching entry; and
  
  generating textual output for the speech signal that includes the retrieved textual segments.
- - 34. The method of claim 33, further comprising the steps of:
    - when matching entries are not found in the comparing step, generating a textual segment for the speech segment using a context-independent database, wherein the generated textual output includes the generated textual segments.
  - 35. The method of claim 34, wherein entries within the context-enhanced database are a subset of entries contained within the context-independent database that are derived from the context-independent database and the input.
  - 36. The method of claim 33, wherein the creating step further comprises the step of:
    - creating the context-enhanced database from those entries of a context-independent database having words included within the word list.
  - 37. The method of claim 33, further comprising the steps of:
    - automatically detecting a change in one or more active applications;
      
      responsive to the detected change, automatically deriving new input; and
      
      responsive to the new input, modifying the context-dependant database based upon the new input.
  - 38. The method of claim 37, further comprising the step of:
    - repeating the detecting step, the deriving step, and the modifying step of claim 37 to ensure the context-dependant database includes information for a current state of the active applications.
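
The comparing and retrieving steps of claim 33, together with the fallback in claim 34, can be sketched as a two-level lookup. The sketch assumes each speech segment has already been reduced to a string key and that exact key equality stands in for acoustic matching, which a real recognizer would not do. The `recognize` function, the `unknown` placeholder, and the toy databases are hypothetical.

```python
def recognize(speech_segments, context_db, general_db, unknown="<?>"):
    """Map each speech segment to text, trying the context database first."""
    output = []
    for segment in speech_segments:
        if segment in context_db:
            output.append(context_db[segment])      # match in context-enhanced database
        elif segment in general_db:
            output.append(general_db[segment])      # fallback: context-independent database
        else:
            output.append(unknown)                  # no entry in either database
    return output

general_db = {"s eh l": "sell", "dh ax": "the", "s t aa k": "stock"}
context_db = {"s eh l": "cell"}   # e.g. derived from a spreadsheet the user has open
text = recognize(["s eh l", "dh ax", "s t aa k", "zz"], context_db, general_db)
```

With these toy inputs, the homophone is resolved to "cell" because the context database is consulted first, while the remaining segments fall through to the general database or to the placeholder.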

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Dieter Jaepel; Juergen Klenk
Primary Examiner: Martin Lerner
Application Number: US09/910,657
Publication Number: US 20020013705A1
Time in Patent Office: 2,181 Days
Field of Search: 704/231, 704/235, 704/243, 704/244, 704/251, 704/252, 704/254, 704/255, 704/256, 704/257
US Class Current: 704/235
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 2015/228   of application context
