Using automated content analysis for audio/video content consumption

US 7,640,272 B2
Filed: 12/07/2006
Issued: 12/29/2009
Est. Priority Date: 12/07/2006
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Audio/video (A/V) content is analyzed using speech and language analysis components. Metadata is automatically generated based upon the analysis. The metadata is used in generating user interface interaction components which allow a user to view subject matter in various segments of the A/V content and to interact with the A/V content based on the automatically generated metadata.
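
As a rough illustration of how automatically generated metadata could drive interaction with the content, the sketch below models time-aligned metadata spans and resolves a selected speaker plus a selected subject matter to a playback position. This is not the patented implementation: the `Span` type, the `find_seek_point` function, and the sample values are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Span:
    """Hypothetical time-aligned metadata record for one stretch of A/V content."""
    start: float   # seconds from the beginning of the content
    end: float     # seconds from the beginning of the content
    speaker: str   # speaker identification metadata
    topic: str     # subject matter metadata


def find_seek_point(spans: Sequence[Span], speaker: str, topic: str) -> Optional[float]:
    """Return the earliest start time where the selected speaker discusses the
    selected topic, or None when no span matches both selections."""
    starts = [s.start for s in spans if s.speaker == speaker and s.topic == topic]
    return min(starts) if starts else None


# Illustrative metadata only; the values are made up for this example.
spans = [
    Span(0.0, 42.5, "Speaker A", "introduction"),
    Span(42.5, 90.0, "Speaker B", "budget"),
    Span(90.0, 130.0, "Speaker A", "budget"),
]

seek_to = find_seek_point(spans, "Speaker A", "budget")
```

A player built along these lines would begin playback at `seek_to` when it is not `None`, and otherwise leave the playback position unchanged.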

91 Citations

8 Claims

1. An audio/video (A/V) processing system, comprising:
- an automatic content analyzer receiving A/V content and analyzing the A/V content using speech recognition and natural language processing to generate speech metadata and natural language metadata corresponding to the A/V content, the speech metadata including speaker identification metadata identifying speakers in the A/V content and a location in the A/V content that the identified speakers are speaking and the natural language metadata including subject matter metadata describing subject matter of segments of the A/V content and where in the A/V content the subject matter is mentioned, wherein the automatic content analyzer comprises an audio analyzer generating speech metadata by recognizing words in the A/V content and aligning the words with the A/V content, wherein the words comprise a transcription of the A/V content;
  
  a player displaying a plurality of different metadata displays, the metadata displays displaying information based on the speech and the natural language metadata, the metadata displays including the speaker identification metadata, the subject matter metadata and the transcription, the metadata displays indicating where in the A/V content a speaker is speaking and where a subject matter is mentioned, wherein the player generates a user interface providing a user actuable input, actuable to select a speaker in the speaker identification metadata and either a word in the transcription, or a subject matter in the subject matter metadata, and to cause the player to begin playing the A/V content at a point in the A/V content that is aligned with the selected speaker and either the selected word or the selected subject matter, the user interface including a thumbnail section, a speaker indicator section below the thumbnail section, and a legend, the legend identifying each of the speakers in the A/V content, the thumbnail section including a plurality of different thumbnail photographs, each of the plurality of different thumbnail photographs representing a predominant speaker in one of the A/V content segments, the speaker indicator section identifying all the speakers that speak during the A/V content and approximately where in the A/V content each speaker speaks; and
  
  a computer processor, being a functional component of the A/V processing system, activated by the automatic content analyzer to facilitate analyzing of the A/V content.
  - 2. The A/V processing system of claim 1 wherein the automatic content analyzer is configured to segment the A/V content based on the speech metadata and the natural language metadata, wherein the metadata displays include a display of the segments, wherein the user interface provides a user actuable input, actuable to select a segment of the A/V content for display and to cause the player to begin displaying the A/V content at the selected segment, and wherein the user interface includes a subject matter section having a topic index, the topic index identifying a topic or subject matter being discussed during segments of the A/V content.
  - 3. The A/V processing system of claim 1 wherein the automatic content analyzer comprises a chapter analyzer identifying chapter boundaries in the A/V content based on the speech metadata and the natural language metadata and calculating a confidence score for each chapter boundary, and wherein the user interface includes a factoid section that identifies factoids that are mentioned in the A/V content, the factoids representing discrete facts cited in the A/V content, the factoids being associated with designators on a topic section of the user interface.
  - 4. The A/V processing system of claim 3 wherein the user interface comprises a user actuable granularity selector and a context section, the user actuable granularity selector actuable to adjust a number of chapter boundaries displayed based on the confidence scores for each chapter boundary, the content section providing an overview of a context of the A/V content, the context being derived from the metadata generated for the A/V content.
  - 5. The A/V processing system of claim 2 wherein the automatic content analyzer comprises:
    - a natural language processor configured to generate summaries of different segments of the A/V content as the natural language metadata.
  - 6. The A/V processing system of claim 2 wherein the automatic content analyzer comprises a keyword identifier configured to identify keywords in each segment.
  - 7. The A/V processing system of claim 6 wherein the automatic content analyzer comprises a search engine configured to identify related information, related to a segment of the A/V content, based on the keywords identified in the segment, and wherein the metadata displays include at least the related information or a link to the related information.
  - 8. The A/V processing system of claim 2 wherein the user interface includes an input mechanism configured to receive community metadata corresponding to the segments in the A/V content, and wherein the metadata displays comprise a community metadata display displaying information indicative of the community metadata, the information displayed by the community metadata display changing as different segments of the A/V content are displayed.
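
Claims 3 and 4 describe chapter boundaries that carry confidence scores and a granularity selector that changes how many boundaries are displayed. One plausible reading, sketched below, is a simple confidence threshold. The claims do not state how the selector maps to the scores, so the thresholding rule, the `ChapterBoundary` type, and the `visible_boundaries` function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ChapterBoundary:
    """Hypothetical chapter boundary with its confidence score."""
    time: float        # seconds from the beginning of the content
    confidence: float  # assumed to lie in the range 0.0 to 1.0


def visible_boundaries(
    boundaries: Sequence[ChapterBoundary], min_confidence: float
) -> List[ChapterBoundary]:
    """Keep boundaries whose confidence meets the selector's threshold,
    ordered by position in the content. A lower threshold shows more chapters."""
    kept = [b for b in boundaries if b.confidence >= min_confidence]
    return sorted(kept, key=lambda b: b.time)


# Illustrative boundaries only; the scores are made up for this example.
boundaries = [
    ChapterBoundary(time=120.0, confidence=0.6),
    ChapterBoundary(time=10.0, confidence=0.9),
    ChapterBoundary(time=65.0, confidence=0.3),
]

coarse = visible_boundaries(boundaries, min_confidence=0.8)  # fewest chapters
fine = visible_boundaries(boundaries, min_confidence=0.0)    # every boundary
```

Under this assumption, moving the selector toward finer granularity corresponds to lowering `min_confidence`, which can only add boundaries to the display.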

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Acero, Alejandro; Nguyen, Patrick; Mahajan, Milind
Primary Examiner(s): Breene, John E
Assistant Examiner(s): Colan, Giovanna B

Application Number: US11/635,153
Publication Number: US 20080140385A1
Time in Patent Office: 1,118 Days
Field of Search: 725/136, 725/149, 704/275, 704/270, 704/243, 709/203, 358/1.15, 348/14.12, 707/104.1
US Class Current: 1/1
CPC Class Codes:
G06F 16/745   the internal structure of a...
G06F 16/7834   using audio features
G06F 16/7844   using original textual cont...
G10L 15/18   using natural language mode...
Y10S 707/99945   Object-oriented database st...
Y10S 707/99948   Application of database or ...
