Methods and apparatus for semantic unit based automatic indexing and searching in data archive systems
Abstract
An audio-based data indexing and retrieval system for processing audio-based data associated with a particular language, comprising: (i) memory for storing the audio-based data; (ii) a semantic unit based speech recognition system for generating a textual representation of the audio-based data, the textual representation being in the form of one or more semantic units corresponding to the audio-based data; (iii) an indexing and storage module, operatively coupled to the semantic unit based speech recognition system and the memory, for indexing the one or more semantic units and storing the one or more indexed semantic units; and (iv) a search engine, operatively coupled to the indexing and storage module and the memory, for searching the one or more indexed semantic units for a match with one or more semantic units associated with a user query, and for retrieving the stored audio based data based on the one or more indexed semantic units. The semantic unit may preferably be a syllable or morpheme. Further, the invention is particularly well suited for use with Asian and Slavic languages.
29 Claims
1. A method of processing audio-based data associated with a particular language, the method comprising the steps of:
storing the audio-based data;
generating a textual representation of the audio-based data, the textual representation being in the form of one or more semantic units corresponding to the audio-based data, wherein each of at least a portion of the one or more semantic units comprise a sub-unit of a word and not a complete word itself; and
indexing the one or more semantic units and storing the one or more indexed semantic units for use in searching the stored audio-based data in response to a user query, wherein at least one segment of the stored audio-based data is retrievable by obtaining a location indicative of where the at least one segment is stored from a direct correspondence between at least one of the indexed semantic units and the at least one segment.
Dependent claims: 2-27.
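As a rough illustration of the indexing step recited in claim 1, the following Python sketch builds an inverted index that maps each sub-word unit to the places in the stored audio where it was recognized. It is only a sketch under stated assumptions: the claim does not prescribe a data structure, and the names `Location` and `build_index`, the `(unit, start_sec, end_sec)` recognizer output format, the file name, and the sample syllables are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    """Where a recognized unit sits in the stored audio (hypothetical layout)."""
    audio_id: str     # identifier of the stored recording
    start_sec: float  # segment start within the recording
    end_sec: float    # segment end within the recording


def build_index(audio_id, recognized_units):
    """Map each semantic unit to every location where it was recognized.

    recognized_units: iterable of (unit, start_sec, end_sec) tuples, assumed
    to come from a syllable- or morpheme-level recognizer (not shown here).
    """
    index = defaultdict(list)
    for unit, start_sec, end_sec in recognized_units:
        index[unit].append(Location(audio_id, start_sec, end_sec))
    return dict(index)


# Made-up recognizer output: sub-word units with time stamps.
units = [
    ("da", 0.00, 0.18),
    ("ta", 0.18, 0.35),
    ("ar", 0.40, 0.55),
    ("chive", 0.55, 0.90),
    ("da", 2.10, 2.30),
]
index = build_index("tape_001.wav", units)
print(index["da"])  # both locations where the unit "da" was recognized
```

Looking up a unit in `index` yields the location of the matching audio segment directly, which is one way to read the "direct correspondence" between an indexed unit and a stored segment.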
28. Apparatus for processing audio-based data associated with a particular language, the apparatus comprising:
a memory; and
at least one processor coupled to the memory and operative to:
(i) store the audio-based data in the memory;
(ii) generate a textual representation of the audio-based data, the textual representation being in the form of one or more semantic units corresponding to the audio-based data, wherein each of at least a portion of the one or more semantic units comprise a sub-unit of a word and not a complete word itself; and
(iii) index the one or more semantic units and store the one or more indexed semantic units for use in searching the stored audio-based data in response to a user query, wherein at least one segment of the stored audio-based data is retrievable by obtaining a location indicative of where the at least one segment is stored from a direct correspondence between at least one of the indexed semantic units and the at least one segment.
29. An audio-based data indexing and retrieval system for processing audio-based data associated with a particular language, the system comprising:
memory for storing the audio-based data;
a semantic unit based speech recognition system for generating a textual representation of the audio-based data, the textual representation being in the form of one or more semantic units corresponding to the audio-based data, wherein each of at least a portion of the one or more semantic units comprise a sub-unit of a word and not a complete word itself;
an indexing and storage module, operatively coupled to the semantic unit based speech recognition system and the memory, for indexing the one or more semantic units and storing the one or more indexed semantic units; and
a search engine, operatively coupled to the indexing and storage module and the memory, for searching the one or more indexed semantic units for a match with one or more semantic units associated with a user query, and for retrieving the stored audio based data based on the one or more indexed semantic units, wherein at least one segment of the stored audio-based data is retrievable by obtaining a location indicative of where the at least one segment is stored from a direct correspondence between at least one of the indexed semantic units and the at least one segment.
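The search engine element of claim 29 can be pictured with a short Python sketch that matches the semantic units of a query against the indexed units and returns the time span of each matching audio segment. This is an assumption-laden illustration, not the claimed implementation: the function names `build_positional_index` and `search`, the `(unit, start_sec, end_sec)` tuple format, and the sample data are hypothetical, and the query is assumed to have already been split into units by a step not shown here.

```python
from collections import defaultdict


def build_positional_index(units):
    """Map each unit to the positions where it occurs in the spoken sequence.

    units: list of (unit, start_sec, end_sec) tuples in spoken order
    (hypothetical recognizer output).
    """
    index = defaultdict(list)
    for pos, (unit, _start, _end) in enumerate(units):
        index[unit].append(pos)
    return index


def search(query_units, units, index):
    """Return (start_sec, end_sec) for each place where the query's units
    occur consecutively in the recognized sequence."""
    if not query_units:
        return []
    hits = []
    n = len(query_units)
    # Only positions of the first query unit can start a match.
    for pos in index.get(query_units[0], []):
        window = [u for u, _s, _e in units[pos:pos + n]]
        if window == list(query_units):
            hits.append((units[pos][1], units[pos + n - 1][2]))
    return hits


# Made-up recognizer output for one stored recording.
units = [
    ("da", 0.00, 0.18),
    ("ta", 0.18, 0.35),
    ("ar", 0.40, 0.55),
    ("chive", 0.55, 0.90),
    ("da", 2.10, 2.30),
    ("ta", 2.30, 2.50),
]
index = build_positional_index(units)
results = search(["da", "ta"], units, index)
print(results)
```

Because the lookup starts from the positions of the first query unit, only candidate locations are compared rather than the whole transcript. A query whose units never occur consecutively returns an empty list.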
Specification