Apparatus and method for retrieving a desired video/voice scenes using voice recognition
Abstract
A video retrieval data generation apparatus includes an extractor that is configured to extract a characteristic pattern from a voice signal synchronous with a video signal. The video retrieval data generation apparatus also includes an index generator that is configured to set the voice signal for a voice period as a processing target. The index generator is further configured to prepare standard voice patterns of a subword corresponding to a plurality of subwords, detect, for each subword, a characteristic pattern similar to a standard voice pattern at each of the voice periods, and generate, for each subword, an index containing time synchronization information corresponding to a position where the similar characteristic pattern is detected. The video retrieval data generation apparatus also includes a multiplexer that is configured to multiplex video signals, voice signals and indexes to output in a data stream format.
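The index generation and multiplexing described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the source does not state: each characteristic pattern is a per-frame feature vector, each subword has a single standard pattern, "similar" means a Euclidean distance at or below a threshold, and the time synchronization information is a frame offset in milliseconds. The names `IndexEntry`, `generate_index`, `index_packets`, `multiplex`, `frame_ms`, and `threshold` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class IndexEntry:
    subword: str
    time_ms: int      # time synchronization information (assumed: ms offset)
    distance: float   # distance to the standard pattern (assumed metric)


def generate_index(frames, standard_patterns, frame_ms=10, threshold=0.5):
    """Build one index list per subword for a single voice period.

    frames: feature vectors in time order (assumed representation).
    standard_patterns: dict mapping subword -> reference feature vector.
    """
    index = {subword: [] for subword in standard_patterns}
    for i, feature in enumerate(frames):
        for subword, pattern in standard_patterns.items():
            d = dist(feature, pattern)
            if d <= threshold:  # frame is "similar" to the standard pattern
                index[subword].append(IndexEntry(subword, i * frame_ms, d))
    return index


def index_packets(index):
    """Flatten the per-subword index into time-ordered (time, kind, payload)."""
    entries = [e for hits in index.values() for e in hits]
    entries.sort(key=lambda e: e.time_ms)
    return [(e.time_ms, "index", e) for e in entries]


def multiplex(video, voice, indexes):
    """Interleave three time-sorted packet lists into one stream by timestamp."""
    return list(heapq.merge(video, voice, indexes, key=lambda p: p[0]))
```

A real multiplexer would emit a container format rather than a Python list; the sketch only shows that the three kinds of packets share one time base.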
29 Claims
1. A video retrieval data generation apparatus comprising:
an extractor that is configured to extract a characteristic pattern from a voice signal synchronous with a video signal;
an index generator that is configured to set the voice signal for a voice period as a processing target, to prepare standard voice patterns of a subword, to detect, at each voice period, for each subword, a characteristic pattern similar to a standard voice pattern, and to generate, for each subword, an index containing time synchronization information corresponding to a position where the similar characteristic pattern is detected; and
a multiplexer that is configured to multiplex video signals, voice signals and indexes to output in a data stream format.
7. A video retrieval data generation method, comprising:
extracting a characteristic pattern from a voice signal synchronous with a video signal;
setting the voice signal for a voice period as a processing target, preparing standard voice patterns of a subword, detecting, at each voice period, for each subword, a characteristic pattern similar to a standard voice pattern, and generating, for each subword, an index containing time synchronization information corresponding to a position where the similar characteristic pattern is detected; and
multiplexing video signals, voice signals and indexes to output in a data stream format.
11. A video retrieval apparatus comprising:
a demultiplexer that is configured to demultiplex a data stream on which are multiplexed video signals, voice signals synchronous with the video signals, and indexes generated from the voice signals on a subword basis, into at least the indexes; and
a retrieval processor that is configured to obtain time information for an input keyword from a combination of the indexes to retrieve a desired video, wherein each of the indexes contains time synchronization information indicative of a position of a characteristic pattern in the voice signals, the characteristic pattern being similar to a standard voice pattern of a subword corresponding to each of the indexes.
19. A video retrieval method, comprising:
demultiplexing a data stream on which are multiplexed video signals, voice signals synchronous with the video signals, and indexes generated from the voice signals on a subword basis, into at least the indexes; and
obtaining time information for an input keyword from a combination of the indexes to retrieve a desired video, wherein each of the indexes contains time synchronization information indicative of a position of a characteristic pattern in the voice signals, the characteristic pattern being similar to a standard voice pattern of a subword corresponding to each of the indexes.
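Obtaining time information for a keyword "from a combination of the indexes" can be sketched as follows. This is an illustration under assumptions, not the claimed method itself: the demultiplexed index is taken to be a mapping from each subword to a sorted list of detection times in milliseconds, and a keyword is considered found when its subwords occur in order, each within a hypothetical `max_gap_ms` of the previous one. The function name `find_keyword` is also hypothetical.

```python
def find_keyword(index, subwords, max_gap_ms=300):
    """Return start times (ms) where the subword sequence occurs in order.

    index: dict mapping subword -> sorted list of detection times in ms.
    subwords: the keyword expressed as a sequence of subwords.
    max_gap_ms: assumed maximum delay between consecutive subwords.
    """
    if not subwords:
        return []
    hits = []
    for start in index.get(subwords[0], []):
        t = start
        matched = True
        for subword in subwords[1:]:
            # Candidate detections of the next subword shortly after time t.
            following = [u for u in index.get(subword, [])
                         if t < u <= t + max_gap_ms]
            if not following:
                matched = False
                break
            t = min(following)  # greedy choice: earliest following detection
        if matched:
            hits.append(start)
    return hits
```

Each returned time is a candidate position for the desired video scene. The greedy chaining is a simplification; a scored alignment over all candidate detections would be an alternative.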
22. A video recording apparatus comprising:
an extractor that is configured to extract a characteristic pattern from a voice signal synchronously input with a video signal in recording a video;
an index generator that is configured to set the voice signal for a voice period as a processing target, to prepare standard voice patterns on a subword basis, to detect, for each subword, a characteristic pattern similar to a standard voice pattern at each voice period, and to generate, for each subword, an index containing time synchronization information corresponding to a position where the similar characteristic pattern is detected;
a multiplexer that is configured to multiplex input video signals, input voice signals, and indexes, to output in a data stream format; and
a video storage medium in which a data stream output from the multiplexer is stored.
23. A video recording apparatus comprising:
an extractor that is configured to extract a characteristic pattern from a voice signal synchronously input with a video signal in recording a video;
an index generator that is configured to generate packets on a time basis while maintaining a time series of extracted characteristic patterns so as to generate indexes where each of the packets contains time information;
a multiplexer that is configured to multiplex input video signals, input voice signals and indexes to output in a data stream format; and
a video storage medium in which a data stream output from the multiplexer is stored.
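The packet-based index generator of claim 23 can be pictured with a short sketch. It assumes, beyond what the claim states, that characteristic patterns arrive as one feature vector per fixed-length frame and that a packet's time information is the offset of its first frame in milliseconds. `IndexPacket`, `packetize`, `frame_ms`, and `frames_per_packet` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class IndexPacket:
    start_ms: int                      # time information carried by the packet
    features: List[Sequence[float]]    # characteristic patterns, in time order


def packetize(features, frame_ms=10, frames_per_packet=50):
    """Split a time series of characteristic patterns into timed packets.

    The original order of the patterns is preserved, both inside each
    packet and across the packet list.
    """
    packets = []
    for i in range(0, len(features), frames_per_packet):
        chunk = list(features[i:i + frames_per_packet])
        packets.append(IndexPacket(start_ms=i * frame_ms, features=chunk))
    return packets
```

Concatenating the `features` of all packets in order reproduces the input series, which is one way to read "maintaining a time series of extracted characteristic patterns".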
26. A video reproducing apparatus comprising:
a video storage that stores a data stream on which video signals, voice signals synchronous with the video signals, and indexes generated from the voice signals on a subword basis, are multiplexed, each of the indexes containing time synchronization information indicative of a position of a characteristic pattern in the voice signals, the characteristic pattern being similar to a standard voice pattern of a subword corresponding to each of the indexes;
a read processor that is configured to read the video signals and the voice signals from the video storage while maintaining synchronization in reproducing a video, and to demultiplex the indexes from the data stream stored in the video storage when a video retrieval instruction is given;
a key word converter that is configured to convert an input key word into time-series data on a subword basis;
a key word collator that is configured to collate the time-series data of the input key word with the indexes to obtain time information of a period at which the time series data is similar to the indexes; and
a controller that is configured to instruct the read processor to read the video signals and the voice signals using a position specified by the obtained time information as a read beginning position.
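The key word conversion step of claim 26 can be illustrated with a toy sketch. It assumes a text keyword and a hypothetical inventory of subword spellings, and splits the keyword by greedy longest match. A real system would more plausibly work from pronunciations than from spelling; the function name `to_subwords` and the inventory are illustrative only.

```python
def to_subwords(keyword, inventory):
    """Split a keyword into subwords using greedy longest-match.

    inventory: set of known subword strings (hypothetical).
    Raises ValueError if some part of the keyword cannot be covered.
    """
    max_len = max((len(s) for s in inventory), default=0)
    result = []
    i = 0
    while i < len(keyword):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(keyword) - i), 0, -1):
            piece = keyword[i:i + n]
            if piece in inventory:
                result.append(piece)
                i += n
                break
        else:
            raise ValueError(f"no subword matches at position {i}")
    return result
```

The resulting subword sequence is the time-series data that would then be collated against the stored indexes to find a read beginning position.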
27. A video reproducing apparatus comprising:
a video storage that stores a data stream on which video signals, voice signals synchronous with the video signals, and indexes packetized on a time basis while maintaining a time series of characteristic patterns extracted from the voice signals, are multiplexed, where each packet contains time information;
a read processor that is configured to read the video signals and the voice signals from the video storage while maintaining synchronization in reproducing a video, and to demultiplex the indexes from the data stream stored in the video storage when a video retrieval instruction is given;
a key word converter that is configured to convert an input key word into time-series data of the characteristic patterns;
a key word collator that is configured to collate the time series data of the input key word with the indexes to obtain time information of a period at which the time series data is similar to the indexes; and
a controller that is configured to instruct the read processor to read the video signal and the voice signal using a position specified by the obtained time information as a read beginning position.
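For the variant of claim 27, where the index holds the characteristic patterns themselves, the collation step can be sketched as a sliding comparison. The similarity measure here (mean Euclidean distance over a fixed-length window, without time warping) is an assumption made for illustration; the claim does not specify one. `collate`, `frame_ms`, and `threshold` are hypothetical names.

```python
from math import dist


def collate(keyword_feats, stream_feats, frame_ms=10, threshold=0.5):
    """Find periods where the keyword's feature series resembles the index.

    keyword_feats: feature vectors derived from the input key word.
    stream_feats: feature vectors recovered from the index packets.
    Returns a list of (start_ms, mean_distance) for every window whose
    mean distance is at or below the threshold.
    """
    n = len(keyword_feats)
    if n == 0:
        return []
    hits = []
    for offset in range(len(stream_feats) - n + 1):
        window = stream_feats[offset:offset + n]
        mean_d = sum(dist(k, s) for k, s in zip(keyword_feats, window)) / n
        if mean_d <= threshold:
            hits.append((offset * frame_ms, mean_d))
    return hits
```

The first element of each hit is the time that a controller could pass to the read processor as the read beginning position. Because speaking rate varies, an elastic alignment such as dynamic time warping may be a better fit than this rigid window.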
28. A video reproducing method, comprising:
storing, in a video storage, a data stream on which are multiplexed video signals, voice signals synchronous with the video signals, and indexes generated from the voice signals on a subword basis, each of the indexes containing time synchronization information indicative of a position of a characteristic pattern that is similar to a standard voice pattern of a subword corresponding to each of the indexes;
reading the video signals and the voice signals from the video storage while maintaining synchronization in reproducing a video, and demultiplexing the indexes from the data stream stored in the video storage when a video retrieval instruction is given;
converting an input key word into time-series data on a subword basis;
collating the time-series data of the input key word with the indexes to obtain time information of a period at which the time series data is similar to the indexes; and
reading the video signal and the voice signal using a position specified by the obtained time information as a read beginning position.
29. A video reproducing method, comprising:
storing, in a video storage, a data stream on which are multiplexed video signals, voice signals synchronous with the video signals, and indexes packetized on a time basis while maintaining a time series of characteristic patterns extracted from the voice signals, where each packet contains time information;
reading the video signals and the voice signals from the video storage while maintaining synchronization in reproducing a video, and demultiplexing the indexes from the data stream stored in the video storage when a video retrieval instruction is given;
converting an input key word into time-series data of the characteristic patterns;
collating the time series data of the input key word with the indexes to obtain time information of a period at which the time series data is similar to the indexes; and
reading the video signal and the voice signal using a position specified by the obtained time information as a read beginning position.
Specification