System and Method for Audio Scene Understanding of Physical Object Sound Sources
First Claim
1. A method of training an audio monitoring system comprising:
receiving with a processor in the audio monitoring system first registration information for a first object in a first scene around a sound sensor in the audio monitoring system;
training with the processor a first classifier for a first predetermined action of the first object in the first scene, the first predetermined action generating sound detected by the sound sensor;
receiving with the processor second registration information for a second object in the first scene around the sound sensor;
training with the processor a second classifier for a second predetermined action of the second object in the first scene, the second predetermined action generating sound detected by the sound sensor;
receiving with the processor object relationship data corresponding to a relationship between the first object and the second object in the first scene;
generating with the processor a specific scene grammar including a first sound event formed with reference to a predetermined general scene grammar stored in a memory, the first registration information, the second registration information, and the object relationship data; and
storing with the processor the specific scene grammar in the memory in association with the first classifier and the second classifier for identification of a subsequent occurrence of the first sound event including the first predetermined action of the first object and the second predetermined action of the second object.
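The training method above can be sketched in code: register objects and their sound-producing actions, train a classifier per object/action pair, then specialize a general scene grammar using the registrations and object relationship data. This is a minimal illustrative sketch; every class and function name here is an assumption, not the patent's actual implementation.

```python
# Hypothetical sketch of the claimed training flow. All names below
# (Registration, SceneGrammar, train_classifier, ...) are illustrative
# assumptions, not the patent's implementation.
from dataclasses import dataclass, field


@dataclass
class Registration:
    object_type: str   # e.g. "door"
    actions: list      # e.g. ["open", "close"]


@dataclass
class SceneGrammar:
    events: dict = field(default_factory=dict)


def train_classifier(object_type, action, audio_clips):
    """Stand-in for per-action classifier training on recorded clips."""
    return f"classifier:{object_type}:{action}"


def generate_specific_grammar(general_grammar, registrations, relationships):
    """Specialize a general grammar using registered objects and relationships."""
    specific = SceneGrammar()
    for (first, second), relation in relationships.items():
        # Each relationship yields a compound sound event in the grammar.
        specific.events[f"{first}_{relation}_{second}"] = (first, second)
    return specific


general = SceneGrammar(events={"entry": None})
regs = [Registration("door", ["open"]), Registration("footsteps", ["walk"])]
classifiers = {(r.object_type, act): train_classifier(r.object_type, act, [])
               for r in regs for act in r.actions}
grammar = generate_specific_grammar(general, regs, {("door", "footsteps"): "then"})
print(grammar.events)   # {'door_then_footsteps': ('door', 'footsteps')}
```

The specific grammar and the trained classifiers would then be stored together, so that a later occurrence of "door opens, then footsteps" can be recognized as one compound sound event.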
Abstract
A method of operating an audio monitoring system includes generating with a sound sensor audio data corresponding to a sound event generated by an object in a scene around the sound sensor, identifying with a processor a type and action of the object in the scene that generated the sound with reference to the audio data, generating with the processor a timestamp corresponding to a time of the detection of the sound event, and updating a scene state model corresponding to sound events generated by a plurality of objects in the scene with reference to the identified type of object, action taken by the object, and the timestamp. The method further includes identifying a sound event in the scene with reference to the scene state model and a predetermined scene grammar stored in a memory, and generating with the processor an output corresponding to the sound event.
19 Citations
20 Claims
1. A method of training an audio monitoring system comprising:
receiving with a processor in the audio monitoring system first registration information for a first object in a first scene around a sound sensor in the audio monitoring system;
training with the processor a first classifier for a first predetermined action of the first object in the first scene, the first predetermined action generating sound detected by the sound sensor;
receiving with the processor second registration information for a second object in the first scene around the sound sensor;
training with the processor a second classifier for a second predetermined action of the second object in the first scene, the second predetermined action generating sound detected by the sound sensor;
receiving with the processor object relationship data corresponding to a relationship between the first object and the second object in the first scene;
generating with the processor a specific scene grammar including a first sound event formed with reference to a predetermined general scene grammar stored in a memory, the first registration information, the second registration information, and the object relationship data; and
storing with the processor the specific scene grammar in the memory in association with the first classifier and the second classifier for identification of a subsequent occurrence of the first sound event including the first predetermined action of the first object and the second predetermined action of the second object.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A method of operating an audio monitoring system comprising:
generating with a sound sensor audio data corresponding to sound produced by an action performed by an object in a first scene around the sound sensor;
identifying with a processor a type of object in the first scene that generated the sound with reference to the audio data;
identifying with the processor the action taken by the object to generate a sound event with reference to the audio data;
generating with the processor a timestamp corresponding to a time of the detection of the sound;
updating with the processor a scene state model corresponding to a plurality of sound events generated by a plurality of objects in the first scene around the sound sensor with reference to the identified type of object, action taken by the object, and the timestamp;
identifying with the processor one sound event in the plurality of sound events for the first scene with reference to the scene state model and a predetermined scene grammar stored in a memory; and
generating with the processor an output corresponding to the one sound event.
- View Dependent Claims (9, 10, 11, 12)
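The operating method of claim 8 amounts to a monitoring loop: classify the sound source and its action, timestamp the detection, update the scene state model, and match the model against the stored scene grammar. The sketch below is a minimal illustration of that loop; the classifier stand-in, the dictionary-based grammar, and the in-order subsequence match are all assumptions made for the example.

```python
# Hypothetical sketch of the claimed monitoring loop. classify() stands in
# for the trained classifiers; the grammar format is an assumption.
import time


def classify(audio_frame):
    """Stand-in for the trained classifiers; returns (object_type, action)."""
    return audio_frame["truth"]   # placeholder for a real acoustic model


def update_state(state, object_type, action, ts):
    """Append one detection, with its timestamp, to the scene state model."""
    state.append({"object": object_type, "action": action, "time": ts})


def match_grammar(state, grammar):
    """Report a sound event when a grammar sequence occurs in order."""
    observed = [(s["object"], s["action"]) for s in state]
    for event_name, sequence in grammar.items():
        it = iter(observed)
        # `step in it` consumes the iterator: an in-order subsequence check.
        if all(step in it for step in sequence):
            return event_name
    return None


grammar = {"person_enters": [("door", "open"), ("footsteps", "walk")]}
state = []
for frame in [{"truth": ("door", "open")}, {"truth": ("footsteps", "walk")}]:
    obj, act = classify(frame)
    update_state(state, obj, act, time.time())   # timestamp of the detection
    event = match_grammar(state, grammar)
    if event:
        print("detected:", event)   # -> detected: person_enters
```

Keeping the state model as a timestamped history is what lets the grammar recognize compound events spanning multiple detections, rather than reacting to each sound in isolation.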
13. An audio monitoring system comprising:
a sound sensor configured to generate audio data corresponding to sound produced by an action performed by an object in a first scene around the sound sensor;
an output device; and
a processor operatively connected to the sound sensor, the output device, and a memory, the processor being configured to:
identify a type of object in the first scene that generated the sound with reference to the audio data;
identify the action taken by the object to generate a sound event with reference to the audio data;
generate a timestamp corresponding to a time of the detection of the sound;
update a scene state model corresponding to a plurality of sound events generated by a plurality of objects in the first scene around the sound sensor with reference to the identified type of object, action taken by the object, and the timestamp;
identify one sound event in the plurality of sound events for the first scene with reference to the scene state model and a predetermined scene grammar stored in the memory; and
generate an output corresponding to the one sound event.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
Specification