Web-based audio transcription tool

US 8,676,590 B1
Filed: 09/26/2012
Issued: 03/18/2014
Est. Priority Date: 09/26/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A computer-implemented technique for transcribing audio data includes generating, along a vertical axis on a display of a client device, an image representing audio content. The technique further includes receiving, from a user of the client device, a selection of a portion of the image; and generating, via an audio module of the client device, an audio output corresponding to the selected portion of the image. The technique further includes receiving, from the user, a selection indicating a position along the vertical axis on the display to enter a text portion representing the audio output, wherein the position is aligned to the selected portion of the image. The technique further includes receiving, from the user, the text portion representing the audio output; and displaying, on the display, the text portion at the position, wherein the text portion extends along a horizontal axis on the display.
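
The alignment between a position on the vertical axis and a portion of the audio can be illustrated with a minimal sketch. It assumes the image maps linearly onto the audio duration, which the abstract does not state; the function names and the pixel and second units are hypothetical.

```python
def position_to_seconds(y_px: float, image_height_px: float, duration_s: float) -> float:
    """Map a vertical pixel position to an audio offset in seconds.

    Assumption: the image covers the whole recording and time runs
    linearly from the top of the image (0 s) to the bottom (duration_s).
    """
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    # Clamp so a click slightly outside the image still maps to valid audio.
    y = min(max(y_px, 0.0), image_height_px)
    return duration_s * y / image_height_px


def selection_to_segment(top_px: float, bottom_px: float,
                         image_height_px: float, duration_s: float) -> tuple[float, float]:
    """Return the (start, end) audio offsets for a selected portion of the image."""
    lo, hi = sorted((top_px, bottom_px))
    return (position_to_seconds(lo, image_height_px, duration_s),
            position_to_seconds(hi, image_height_px, duration_s))


# A selection from pixel 100 to pixel 250 on a 1000 px image of a 60 s recording.
segment = selection_to_segment(100, 250, 1000, 60.0)
print(segment)
```

Under these assumptions the text portion would be anchored at the same vertical position as the start of the selected segment, so the text and the audio it represents stay side by side.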

18 Claims

1. A computer-implemented method, comprising:
- generating, at a server having one or more processors, an image representing audio content;
  
  providing, from the server, the image and the audio content to a plurality of client devices, the image being provided for display along a vertical axis on a display of each of the client devices;
  
  receiving, at the server, a first post from a first client device of the plurality of client devices, the first post including a first identifier indicating (i) a first position along the vertical axis of the image, and (ii) a first text portion representative of at least a portion of the audio content at the first position, the first text portion being entered by a first user of the first client device;
  
  receiving, at the server, a second post from a second client device of the plurality of client devices, the second post including a second identifier indicating (i) a second position along the vertical axis of the image, and (ii) a second text portion representative of at least a portion of the audio content at the second position, the second text portion being entered by a second user of the second client device;
  
  synchronizing, at the server, the first and second posts based on the first and second identifiers;
  
  correlating, at the server, the first and second posts to provide a single transcription of the audio content,
  
  receiving, at the server, a command to zoom in on a portion of the image from the first client device;
  
  generating, at the server, a second image in response to receiving the command, the second image representing an enlargement of the portion of the image;
  
  providing, from the server, the second image to the first client device for display along the vertical axis on the display of the first client device;
  
  receiving, at the server, a third post from the first client device, the third post including a third identifier indicating (i) a third position along the vertical axis of the second image, and (ii) a third text portion representative of at least a portion of the audio content at the third position, the third text portion being entered by the first user of the first client device; and
  
  synchronizing, at the server, the first, second and third posts based on the first, second and third identifiers,
  
  wherein correlating the first and second posts to provide the single transcription of the audio content includes correlating the first, second and third posts to provide the single transcription.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein the third position in the second image corresponds to at least one of the first position and the second position in the first image.
  - 3. The method of claim 1, wherein correlating the first, second and third posts provides the single transcription at different resolutions.
  - 4. The method of claim 1, wherein the server comprises a plurality of servers.
  - 5. The method of claim 1, wherein the portion of the audio content at the third position comprises a sub-portion of the audio content at least one of the portions of the audio content at the first and second positions.
  - 6. The method of claim 1, wherein the first post is received from the first client device subsequent to the first user selecting the first position and entering the first text portion at the first client device.
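
The synchronizing and correlating steps of claim 1 can be sketched as follows. The claims do not say how either step is carried out, so this is only one possible reading: synchronizing is taken to mean ordering posts by their position identifiers, and correlating is taken to mean joining the ordered text. The `Post` class, its field names, and the use of a 0.0 to 1.0 fraction for position are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    client_id: str   # hypothetical: which client device sent the post
    position: float  # identifier: fraction (0.0-1.0) along the image's vertical axis
    text: str        # text portion entered by the user


def synchronize(posts: list[Post]) -> list[Post]:
    """Order posts by position; sorted() is stable, so ties keep arrival order."""
    return sorted(posts, key=lambda p: p.position)


def correlate(posts: list[Post]) -> str:
    """Join the ordered, non-empty text portions into a single transcription."""
    parts = (p.text.strip() for p in synchronize(posts))
    return " ".join(part for part in parts if part)


# Posts arrive from two clients in arbitrary order.
posts = [
    Post("client-2", 0.40, "brown fox"),
    Post("client-1", 0.10, "the quick"),
    Post("client-1", 0.75, "jumps over"),
]
transcript = correlate(posts)
print(transcript)
```

A real server would also need a policy for two posts that cover the same position; this sketch simply keeps both in arrival order.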

7. A computer-implemented method, comprising:
- generating, at a server having one or more processors, a first image representing audio content;
  
  providing, from the server, the first image and the audio content to a plurality of client devices, the first image being provided for display along a vertical axis on a display of each of the client devices;
  
  receiving, at the server, a first post from a first client device of the plurality of client devices, the first post including a first identifier indicating (i) a first position along the vertical axis of the first image, and (ii) a first text portion representative of at least a portion of the audio content at the first position, the first text portion being entered by a first user of the first client device;
  
  receiving, at the server, a command to zoom in on a portion of the first image from a second client device of the plurality of client devices;
  
  generating, at the server, a second image in response to receiving the command, the second image representing an enlargement of the portion of the first image;
  
  providing, from the server, the second image to a second client device for display along the vertical axis on the display of the second client device;
  
  receiving, at the server, a second post from the second client device, the second post including a second identifier indicating (i) a second position along the vertical axis of the second image, and (ii) a second text portion representative of at least a portion of the audio content at the second position, the second text portion being entered by a second user of the second client device;
  
  synchronizing, at the server, the first and second posts based on the first and second identifiers; and
  
  correlating, at the server, the first and second posts to provide a single transcription of the audio content.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The method of claim 7, wherein correlating the first and second posts provides the single transcription at different resolutions.
  - 9. The method of claim 7, wherein the server comprises a plurality of servers.
  - 10. The method of claim 7, wherein the second position in the second image corresponds to the first position in the first image.
  - 11. The method of claim 7, wherein the first post is received from the first client device subsequent to the first user selecting the first position and entering the first text portion at the first client device.
  - 12. The method of claim 7, wherein the portion of the audio content at the second position comprises a sub-portion of the audio content at the portion of the audio content at the first position.
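
Claims 7 and 10 involve a position in the enlarged second image that corresponds to a position in the first image. A minimal sketch of that correspondence is below, assuming the zoomed view is a contiguous vertical slice of the first image and that positions are expressed as fractions of image height. `ZoomView` and its fields are hypothetical names, not terms from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoomView:
    start: float  # top of the enlarged portion, as a fraction of the first image
    end: float    # bottom of the enlarged portion, as a fraction of the first image

    def to_first_image(self, y: float) -> float:
        """Convert a position in the second (zoomed) image to the first image."""
        if not 0.0 <= y <= 1.0:
            raise ValueError("position must lie within the zoomed image")
        return self.start + y * (self.end - self.start)


# The second client zooms in on the second quarter of the first image,
# then posts text halfway down the enlarged view.
view = ZoomView(start=0.25, end=0.5)
position_in_first_image = view.to_first_image(0.5)
print(position_in_first_image)
```

Converting every post to first-image coordinates before ordering is one way posts made at different zoom levels could be placed on a common axis.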

13. A non-transitory computer-readable storage medium storing computer executable code that, when executed by a computing device having one or more processors, cause the computing device to perform operations comprising:
- generating an image representing audio content;
  
  providing the image and the audio content to a plurality of client devices, the image being provided for display along a vertical axis on a display of each of the client devices;
  
  receiving a first post from a first client device of the plurality of client devices, the first post including a first identifier indicating (i) a first position along the vertical axis of the image, and (ii) a first text portion representative of at least a portion of the audio content at the first position, the first text portion being entered by a first user of the first client device;
  
  receiving a second post from a second client device of the plurality of client devices, the second post including a second identifier indicating (i) a second position along the vertical axis of the image, and (ii) a second text portion representative of at least a portion of the audio content at the second position, the second text portion being entered by a second user of the second client device;
  
  synchronizing the first and second posts based on the first and second identifiers;
  
  correlating the first and second posts to provide a single transcription of the audio content;
  
  receiving a command to zoom in on a portion of the image from the first client device;
  
  generating a second image in response to receiving the command, the second image representing an enlargement of the portion of the image;
  
  providing the second image to the first client device for display along the vertical axis on the display of the first client device;
  
  receiving a third post from the first client device, the third post including a third identifier indicating (i) a third position along the vertical axis of the second image, and (ii) a third text portion representative of at least a portion of the audio content at the third position, the third text portion being entered by the first user of the first client device; and
  
  synchronizing the first, second and third posts based on the first, second and third identifiers,
  
  wherein correlating the first and second posts to provide the single transcription of the audio content includes correlating the first, second and third posts to provide the single transcription.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The computer-readable storage medium of claim 13, wherein the third position in the second image corresponds to at least one of the first position and the second position in the first image.
  - 15. The computer-readable storage medium of claim 13, wherein correlating the first, second and third posts provides the single transcription at different resolutions.
  - 16. The computer-readable storage medium of claim 13, wherein the server comprises a plurality of servers.
  - 17. The computer-readable storage medium of claim 13, wherein the portion of the audio content at the third position comprises a sub-portion of the audio content at least one of the portions of the audio content at the first and second positions.
  - 18. The computer-readable storage medium of claim 13, wherein the first post is received from the first client device subsequent to the first user selecting the first position and entering the first text portion at the first client device.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Sorensen, Jeffrey Scott; Nanzawa, Masayuki; Rajakumar, Ravindran
Primary Examiner(s): Borsetti, Greg
Application Number: US13/627,027
Time in Patent Office: 538 Days
Field of Search: None
US Class Current: 704/276
CPC Class Codes:
G06F 3/167 Audio in a user interface, ...
G10L 21/06 Transformation of speech in...
