Activity-ware for non-textual objects

US 7,716,054 B2
Filed: 06/29/2007
Issued: 05/11/2010
Est. Priority Date: 06/29/2007
Status: Active Grant

Abstract

Providing for summarization and analysis of audio content is described herein. By way of example, an oral conversation can be analyzed, such that points of interest within the oral conversation can be identified and file locations related to such points of interest can be marked. Points of interest can be inferred based on a level of energy, e.g., excitement, pitch, tone, pace, or the like, associated with one or more speakers. Alternatively, or in addition, speaker and/or reviewer activity can form the basis for identifying points of interest within the conversation. Moreover, a compilation of the identified points of interest and portions of the original oral conversation related thereto can be assembled. As described herein, audio content can be succinctly summarized with respect to inferred and/or indicated points of interest, to facilitate an efficient and pertinent review of such content.
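
As a rough illustration of inferring points of interest from an energy level, the sketch below uses frame volume (RMS amplitude) as the only auditory indication and flags frames that are unusually loud relative to the rest of the recording. The function names, the frame size, and the `mean + k * stddev` threshold are assumptions made for this example; the patent does not prescribe a particular measure or threshold.

```python
import math
from statistics import mean, pstdev

def frame_rms(samples, frame_size):
    """Split samples into fixed-size frames and return the RMS energy of each."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames if f]

def points_of_interest(samples, sample_rate, frame_size=4, k=1.5):
    """Return start times (seconds) of frames whose energy exceeds mean + k * stddev.

    The threshold rule is a hypothetical stand-in for the inference component.
    """
    energies = frame_rms(samples, frame_size)
    threshold = mean(energies) + k * pstdev(energies)
    return [i * frame_size / sample_rate
            for i, e in enumerate(energies) if e > threshold]

# Toy signal: quiet speech with one loud burst in the third frame.
quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.9, -0.9, 0.9, -0.9]
signal = quiet * 2 + loud + quiet * 5

marks = points_of_interest(signal, sample_rate=4)
print(marks)
```

Pitch, tone, or pace could be substituted for volume by swapping out `frame_rms`; the marking step would stay the same.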

27 Citations

17 Claims

1. A system that facilitates organization of audio media, the system comprising:
- a memory, wherein the memory is encoded with instructions;
  
  a processor, wherein the processor executes the instructions;
  
  the instructions being executed comprising;
  
  an inference component that determines a point of interest based at least in part upon identification of an energy level, wherein identification of the energy level occurs in an oral conversation or review of a recording of the oral conversation, wherein the energy level is based at least in part on measurable auditory indications;
  
  an annotation component that marks an audio media file at a location associated with the point of interest; and
  
  a summarization component that generates a summary of the oral conversation by compiling portions of the audio media file that are in a threshold proximity to one or more locations marked by the annotation component, wherein the summarization component automatically determines an appropriate size portion for the threshold proximity as a function of relevancy to each marked point of interest, wherein the appropriate size portion is determined by;
  
  analyzing a first portion of the audio media file within a first proximity to a point of interest by translating the first portion into text and identifying a first set of keywords representative of the first portion of the audio media file;
  
  analyzing a second portion of the audio media file within a second threshold proximity to a point of interest by translating the first portion into text and identifying a second set of keywords representative of the second portion of the audio media file;
  
  comparing relevancy of keywords within the first portion to keywords within the second portion of the audio media file and determining whether relevancy of keywords within the second portion drops below a default relevancy factor, wherein the default relevancy factor is a function of the first set of keywords representative of the first portion of the audio media file; and
  
  selecting the appropriate size portion based on whether relevancy of keywords within the second portion drops below the default relevancy factor.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The system of claim 1, comprising a context component that determines at least time and location information related to the oral conversation and associates the information with the audio media file.
  - 3. The system of claim 2, comprising a network search component that provides context related to the oral conversation, including at least in part by interfacing with one or more network data servers and compiling data pertinent to at least the time and location information, as well as data pertaining to time, location, news, weather, or political information, or combinations thereof, pertinent to keywords extracted from the conversation.
  - 4. The system of claim 2, comprising a user input component that stores a user preference file, the user preference file establishes defaults pertinent to determining the point of interest, determining the time and location information, or determining a context related to the oral conversation, or combinations thereof.
  - 5. The system of claim 1, comprising a multi media component that associates diverse content germane to the oral conversation with the audio media file.
  - 6. The system of claim 5, the annotation component marks a file at a location where the diverse content is germane to the point of interest.
  - 7. The system of claim 5, the diverse content includes photographic, video, or textual content, or combinations thereof.
  - 8. The system of claim 1, comprising a navigation component that retrieves a portion of the audio file proximate the marked location.
  - 9. The system of claim 1, comprising a parsing component that discards a portion of the audio media file that is not within a threshold proximity to a marked point of interest.
  - 10. The system of claim 1, the energy level is determined implicitly by a pitch, tone, pause rate, word rate, or volume of a speaker's or a reviewer's voice, or a number of speakers or reviewers speaking concurrently, or explicitly by a predetermined verbal, somatic, or auditory trigger, or press of a button on a device, or combinations thereof.
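
The portion-sizing steps recited in claim 1 can be sketched roughly as follows. This example assumes the audio has already been translated into text, one string per fixed-length segment, and it uses a crude stopword filter as the keyword extractor. The helper names, the stopword list, and the rule that the relevancy factor is a quarter of the first keyword set's size are all hypothetical choices made for illustration, not details taken from the patent.

```python
import math
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to", "we", "is", "in", "for", "it", "on", "that"}

def keywords(text):
    """Crude keyword extraction: lowercase words minus a small stopword set."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def default_relevancy_factor(first_keywords, share=0.25):
    """Hypothetical factor: minimum shared-keyword count, scaled to the first set."""
    return max(1, math.ceil(share * len(first_keywords)))

def choose_portion(segments, poi_index):
    """Grow the portion outward from the marked segment until relevancy drops.

    Returns the (first, last) segment indices of the selected portion, inclusive.
    """
    first = keywords(segments[poi_index])      # first set of keywords
    factor = default_relevancy_factor(first)

    def relevant(index):
        # Relevancy here is the number of keywords shared with the first portion.
        return len(first & keywords(segments[index])) >= factor

    lo = hi = poi_index
    while lo > 0 and relevant(lo - 1):
        lo -= 1
    while hi < len(segments) - 1 and relevant(hi + 1):
        hi += 1
    return lo, hi

segments = [
    "lunch order was late again",
    "the budget for the launch needs approval",
    "launch budget is over by ten percent",      # marked point of interest
    "finance must approve the launch budget overrun",
    "anyone watching the game tonight",
]
portion = choose_portion(segments, poi_index=2)
print(portion)
```

Here the portion stops growing as soon as a neighbouring segment shares fewer keywords with the marked segment than the factor requires, which is one simple reading of "drops below a default relevancy factor".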

11. A method for providing a summary of an audio content, comprising:
- storing, in a memory, instructions for performing the method of providing a summary of an audio content;
  
  executing the instructions on a processor;
  
  according to the instructions being executed;
  
  capturing at least a portion of an oral conversation in an audio file;
  
  marking the audio file at one or more locations proximate to one or more points of interest, wherein the one or more points of interest are identified via a speaker activity or inferred from a degree of emotion in one or more speakers' voices;
  
  associating portions of the audio file that are within a threshold proximity to at least one point of interest; and
  
  summarizing the oral conversation by compiling portions of the audio media file that are in a threshold proximity to one or more locations marked, wherein summarizing automatically determines an appropriate size portion for the threshold proximity as a function of relevancy to each marked point of interest.
- Dependent claims (12, 13, 14, 15)
  - 12. The method of claim 11, comprising discarding portions of the audio file that are not within a threshold proximity to at least one point of interest.
  - 13. The method of claim 11, wherein the speaker activity includes a verbal trigger, somatic trigger or press of a button on a device.
  - 14. The method of claim 11, comprising marking the audio file as a result of one or more points of interest identified as points of interest to each speaker of multiple speakers, wherein a mark associated with one speaker is distinct from a mark associated with another speaker.
  - 15. The method of claim 11, wherein the appropriate size portion is determined by:
    - analyzing a first portion of the audio file within a first proximity to a point of interest by translating the first portion into text and identifying a first set of keywords representative of the first portion of the audio file;
      
      analyzing a second portion of the audio file within a second threshold proximity to a point of interest by translating the first portion into text and identifying a second set of keywords representative of the second portion of the audio file;
      
      comparing relevancy of keywords within the first portion to keywords within the second portion of the audio file and determining whether relevancy of keywords within the second portion drops below a default relevancy factor, wherein the default relevancy factor is a function of the first set of keywords representative of the first portion of the audio media file; and
      
      selecting the appropriate size portion based on whether relevancy of keywords within the second portion drops below the default relevancy factor.
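
The compiling and discarding steps of claims 11 and 12 amount to interval arithmetic on the marked locations. The sketch below is a minimal illustration that assumes a single fixed proximity in seconds (rather than the relevancy-driven size of claim 15) and hypothetical function names: it builds a window around each mark, merges overlapping windows into the portions to keep, and reports the remaining gaps as the portions that could be discarded.

```python
def summary_intervals(marks, proximity, duration):
    """Return merged (start, end) intervals within `proximity` seconds of any mark."""
    windows = sorted((max(0.0, m - proximity), min(duration, m + proximity))
                     for m in marks)
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def discarded_intervals(kept, duration):
    """Complement of the kept intervals: audio outside every window."""
    gaps, cursor = [], 0.0
    for start, end in kept:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = end
    if cursor < duration:
        gaps.append((cursor, duration))
    return gaps

# Three marks in a 62-second recording, 5-second proximity on each side.
kept = summary_intervals([12.0, 15.0, 60.0], proximity=5.0, duration=62.0)
dropped = discarded_intervals(kept, duration=62.0)
print(kept)
print(dropped)
```

Merging matters because two nearby marks would otherwise contribute the same audio to the summary twice.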

16. A system that facilitates annotation and summarization of auditory objects, the system comprising:
- a memory, wherein the memory is encoded with instructions;
  
  a processor, wherein the processor executes the instructions;
  
  the instructions being executed comprising;
  
  means for identifying one or more points of interest within audio content based on a level of emotion of one or more speakers' voices, or based on a predetermined human activity, or combinations thereof;
  
  means for book marking an audio file at one or more locations commensurate with the one or more identified points of interest;
  
  means for correlating the audio file with diverse media related to the audio content, wherein the diverse media includes photographic media, video media, and textual media; and
  
  means for book marking one or more diverse media files containing the diverse media at locations commensurate with the one or more points of interest within the audio content.
- Dependent claims (17)
  - 17. The system of claim 16, further comprising means for compiling a summary of the points of interest across diverse media types by compiling portions of the audio file and portions of the one or more diverse media files within a threshold proximity of the bookmarked locations, into one or more related compilation files.
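
One way to picture bookmarking diverse media at locations commensurate with the audio points of interest is to place every file on the audio timeline. The sketch below assumes each media file has a known start time and length on that timeline; the `MediaFile` class, its fields, and the file names are hypothetical, and the claims do not say how media are aligned with the audio.

```python
from dataclasses import dataclass, field

@dataclass
class MediaFile:
    name: str
    kind: str          # e.g. "audio", "video", "photo", "text"
    start: float       # where this media begins on the audio timeline, in seconds (assumed known)
    length: float      # how long it spans on that timeline, in seconds
    bookmarks: list = field(default_factory=list)

def bookmark_media(points, media_files):
    """Mark each media file at the local offset matching each audio point of interest."""
    for media in media_files:
        for point in points:
            local = point - media.start
            if 0.0 <= local <= media.length:
                media.bookmarks.append(local)
    return media_files

audio = MediaFile("meeting.wav", "audio", start=0.0, length=120.0)
video = MediaFile("whiteboard.mp4", "video", start=30.0, length=60.0)
bookmark_media([12.0, 45.0], [audio, video])
print(audio.bookmarks, video.bookmarks)
```

The point at 12 seconds falls before the video begins, so only the audio file is bookmarked there, while the point at 45 seconds is bookmarked in both files at their own local offsets.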


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Dumais, Susan T.; Harris, Jensen M.; Megiddo, Eran; Wolf, Richard J.
Primary Examiner(s): Vo, Huyen X.
Application Number: US11/771,135
Publication Number: US 20090006082A1
Time in Patent Office: 1,047 Days
Field of Search: 704/270, 704/260, 704/270.1, 704/251, 704/231, 704/235, 704/253, 704/254, 704/277, 348/468
US Class Current: 704/270
CPC Class Codes:
G06F 16/634   Query by example, e.g. quer...
G06F 16/68   Retrieval characterised by ...
G06F 16/683   using metadata automaticall...
G10L 17/26   Recognition of special voic...
G10L 25/48   specially adapted for parti...
G11B 27/28   by using information signal...
G11B 27/329   on a disc [VTOC]
