Transcription of spoken communications

US 9,787,819 B2
Filed: 09/18/2015
Issued: 10/10/2017
Est. Priority Date: 09/18/2015
Status: Active Grant

Abstract

A portion of speech is captured when spoken by a near-end user. A near-end user terminal conducts a communication session, over a network, between the near-end user and one or more far-end users, the session including a message sent to the one or more far-end users. A vetting mechanism is provided via a touchscreen user interface of the near-end user terminal, to allow the near-end user to vet an estimated transcription of the portion of speech prior to being sent to the one or more far-end users in the message. According to the vetting mechanism: (i) a first gesture performed by the near-end user through the touchscreen user interface accepts the estimated transcription to be included in a predetermined role in the sent message, while (ii) one or more second gestures performed by the near-end user through the touchscreen user interface each reject the estimated transcription to be sent in the message.
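
The accept/reject gate described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Gesture`, `Message`, and `vet_transcription`, and the idea of storing vetted text under a role key such as "body" or "subtitle", are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Gesture(Enum):
    ACCEPT = "accept"  # stands in for the "first gesture"
    REJECT = "reject"  # stands in for any of the "second gestures"


@dataclass
class Message:
    recipients: List[str]
    # Maps a hypothetical role name (e.g. "body", "subtitle") to vetted text.
    parts: Dict[str, str] = field(default_factory=dict)


def vet_transcription(message: Message, transcription: str,
                      role: str, gesture: Gesture) -> bool:
    """Attach the transcription to the message only if it was accepted.

    Returns True when the text was added in the given role, False when the
    gesture rejected it and the message was left unchanged.
    """
    if gesture is Gesture.ACCEPT:
        message.parts[role] = transcription
        return True
    return False
```

Under these assumptions, a rejected transcription never reaches the outgoing message, so nothing unvetted is sent to the far-end users.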

20 Claims

1. A user terminal comprising:
- a microphone for capturing a portion of speech spoken by a near-end user of said user terminal;
  
  a network interface for connecting to a communication network;
  
  a touchscreen user interface;
  
  a communication client application configured to:
  
  conduct a communication session, over said communication network, between the near-end user and one or more far-end users of one or more far-end terminals, said communication session including an estimated transcription of said portion of speech that is capable of being sent in a message to the one or more far-end users;
  
  obtain a plurality of alternative transcriptions for said portion of speech including an estimated probability of being correct for each transcription of the plurality of alternative transcriptions;
  
  implement a vetting mechanism to allow the near-end user to vet the estimated transcription via the touchscreen user interface prior to the estimated transcription being sent in said message, the vetting mechanism including:
  
  a first gesture received at the touchscreen user interface indicating acceptance of the estimated transcription to be included in a predetermined role in the message; and
  
  one or more second gestures received at the touchscreen user interface indicating rejection of the estimated transcription from being included in said message; and
  
  in response to receiving an indication of the one or more second gestures, select a next most probable transcription from the plurality of alternative transcriptions according to the respective estimated probability of being correct, and present the next most probable transcription with an option to accept or reject the next most probable transcription via the touchscreen user interface to be sent in said message.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20)
  - 2. The user terminal of claim 1, wherein the communication client is further configured to send no transcription of said portion of speech to the one or more far-end users responsive to receiving an indication to abandon transcription.
  - 3. The user terminal of claim 2, wherein the abandonment of the transcription comprises abandoning the sending of the message, or discarding from the sending a part of the message comprising said portion of speech.
  - 4. The user terminal of claim 1, wherein the communication client application is further configured so as in response to the estimated transcription being rejected, to:
    - capture a re-spoken version of said portion of speech from the near-end user, to obtain a new transcription of the re-spoken version, and provide the near-end user with an option via the touchscreen user interface to accept or reject the new transcription to be sent in said message.
  - 5. The user terminal of claim 1, wherein the communication client application is further configured so as in response to the next most probable transcription being rejected, to:
    - select a further next most probable alternative transcription, according to said estimated probabilities of being correct, and present the further next most probable alternative transcription with an option via the touchscreen user interface to accept or reject the further next most probable transcription to be sent in said message.
  - 6. The user terminal of claim 1, wherein the communication client application is further configured so as in response to the next most probable transcription being rejected, to:
    - present a list of at least some of the plurality of alternative transcriptions displayed in association with the respective estimated probability of being correct for each transcription with an option to select one of the plurality of alternative transcriptions from the list via the touchscreen user interface.
  - 7. The user terminal of claim 1, wherein the message comprises a video and an indication of a thumbnail image configured to provide a preview of the video at the one or more far-end terminals, wherein said predetermined role comprises inclusion in the thumbnail image.
  - 8. The user terminal of claim 1, wherein the message comprises audio and/or video, and said predetermined role comprises subtitling of the audio and/or video.
  - 9. The user terminal of claim 1, wherein the message comprises an IM message or email having a textual body, and said predetermined role is as a part or all of the body of the message.
  - 10. The user terminal of claim 1, wherein one or more of the first gesture and the one or more second gestures are each:
    - a single gesture, a one or two dimensional gesture across the touchscreen user interface, and/or a gesture in a single straight-line direction across the touchscreen user interface.
  - 11. The user terminal of claim 1, wherein the first gesture is performed in an opposite direction to one of the one or more second gestures.
  - 12. The user terminal of claim 1, wherein the communication session is a bidirectional communication session between the near-end user and the one or more far-end users.
  - 13. The user terminal of claim 1, wherein the communication client application is further configured to send an audio and/or video recording of said portion of speech to the one or more far-end users as part of said message.
  - 14. The user terminal of claim 1, wherein the communication session is with only one far-end user.
  - 15. The user terminal of claim 1, wherein the communication session is with multiple far-end users.
  - 16. The user terminal of claim 1, wherein one of the estimated transcription or one of the plurality of alternative transcriptions includes a translation into a different language from a language in which the portion of speech is spoken.
  - 20. The user terminal of claim 1, wherein the communication session comprises a video messaging conversation and/or a voice call.
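
The selection of the "next most probable transcription" in claim 1, its repetition in claim 5, and the list view of claim 6 can be illustrated with a short sketch. It assumes the alternatives arrive as text paired with a probability estimate; `Candidate`, `VettingSession`, and their methods are hypothetical names introduced here, and the claims do not prescribe this structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    text: str
    probability: float  # estimated probability of being correct


class VettingSession:
    """Walks through alternative transcriptions from most to least probable."""

    def __init__(self, candidates: List[Candidate]) -> None:
        # Most probable first; sorted() is stable, so ties keep input order.
        self._ranked = sorted(candidates, key=lambda c: c.probability,
                              reverse=True)
        self._index = 0

    def current(self) -> Optional[Candidate]:
        """The transcription currently offered, or None if all were rejected."""
        if self._index < len(self._ranked):
            return self._ranked[self._index]
        return None

    def reject(self) -> Optional[Candidate]:
        """Reject the current candidate and offer the next most probable one."""
        if self._index < len(self._ranked):
            self._index += 1
        return self.current()

    def remaining(self) -> List[Candidate]:
        """Candidates not yet rejected, e.g. for a list view as in claim 6."""
        return self._ranked[self._index:]
```

Each call to `reject()` corresponds to one rejection gesture; once every alternative has been rejected, `current()` returns `None`, at which point a client might fall back to re-speaking (claim 4) or abandoning the transcription (claim 2).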

17. A method comprising:
- capturing a portion of speech spoken by a near-end user of a near-end user terminal;
  
  operating the near-end user terminal to conduct a communication session, over a network, between the near-end user and one or more far-end users of one or more far-end terminals, the communication session including an estimated transcription for said portion of speech that is capable of being sent in a message to the one or more far-end users;
  
  obtaining a plurality of alternative transcriptions for said portion of speech including an estimated probability of being correct for each transcription of the plurality of alternative transcriptions;
  
  implementing a vetting mechanism via a touchscreen user interface of the near-end user terminal, to allow the near-end user to vet an estimated transcription of said portion of speech prior to being sent to the one or more far-end users in said message, wherein said vetting mechanism includes a first gesture received at the touchscreen user interface indicating acceptance of the estimated transcription to be included in a predetermined role in said message and one or more second gestures received at the touchscreen user interface indicating rejection of the estimated transcription from being included in said message; and
  
  responsive to receiving an indication of the one or more second gestures, selecting a next most probable transcription from the plurality of alternative transcriptions according to the respective estimated probability of being correct, and presenting the next most probable transcription with an option to accept or reject the next most probable transcription via the touchscreen user interface to be sent in said message.
- Dependent Claims (19)
  - 19. A method as recited in claim 17, wherein the first gesture comprises a swipe gesture in a first direction, and wherein the one or more second gestures comprise a swipe gesture in a second direction opposite the first direction.
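
One way to distinguish the opposite-direction swipes of claim 19 is to compare the horizontal displacement between touch-down and touch-up points. The sketch below is only an assumption-laden illustration: the claim does not say which direction means acceptance, so rightward-as-accept, the 50-unit `min_distance` threshold, and the function name `classify_swipe` are all hypothetical choices.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x, y) in screen units


def classify_swipe(start: Point, end: Point,
                   min_distance: float = 50.0) -> Optional[str]:
    """Map a horizontal swipe to "accept" or "reject".

    Assumed convention: rightward swipe accepts, leftward swipe rejects.
    Returns None for drags that are too short or mostly vertical.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    # Ignore short drags and movement that is not predominantly horizontal.
    if abs(dx) < min_distance or abs(dx) <= abs(dy):
        return None
    return "accept" if dx > 0 else "reject"
```

Returning `None` for ambiguous input leaves the current transcription on screen, which avoids sending or discarding text on an accidental touch.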

18. A computer-readable storage medium storing program code that is executable on a near-end user terminal to perform operations comprising:
- capturing a portion of speech spoken by the near-end user;
  
  operating the near-end user terminal to conduct a communication session, over a network, between the near-end user and one or more far-end users of one or more far-end terminals, the communication session including an estimated transcription of said portion of speech that is capable of being sent in a message to the one or more far-end users;
  
  obtaining a plurality of alternative transcriptions for said portion of speech including an estimated probability of being correct for each transcription of the plurality of alternative transcriptions;
  
  implementing a vetting mechanism via a touchscreen user interface of the near-end user terminal to allow the near-end user to vet the estimated transcription prior to being sent in said message, wherein said vetting mechanism includes a first gesture received at the touchscreen user interface indicating acceptance of the estimated transcription to be included in a predetermined role in said message and one or more second gestures received at the touchscreen user interface indicating rejection of the estimated transcription from being included in said message; and
  
  responsive to receiving an indication of the one or more second gestures, selecting a next most probable transcription from the plurality of alternative transcriptions according to the respective estimated probability of being correct; and
  
  presenting the next most probable transcription with an option to accept or reject the next most probable transcription via the touchscreen user interface to be sent in said message.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Abkairov, Nikolay
Primary Examiner(s): Ajibade Akonai, Olumide T
Assistant Examiner(s): Zhang, Edward
Application Number: US14/858,648
Publication Number: US 20170085696A1
Time in Patent Office: 753 Days
Field of Search: 455563
US Class Current
CPC Class Codes

G06F 3/017   Gesture based interaction, ...
G06F 3/04883   for inputting data by handw...
G06F 40/40   Processing or translation o...
G10L 15/22   Procedures used during a sp...
G10L 2015/221   Announcement of recognition...
H04M 1/72433   for voice messaging, e.g. d...
H04M 1/72439   for image or video messaging
