Indexing and searching speech with text meta-data
Abstract
An index for searching spoken documents having speech data and text meta-data is created by obtaining probabilities of occurrence of words and positional information of the words in the speech data, and combining these with at least positional information of the words in the text meta-data. A single index can be created because the speech data and the text meta-data are represented in the same way and are treated simply as different categories.
20 Claims
1. A method of indexing a spoken document comprising speech data and text meta-data, the method comprising:
generating information pertaining to recognized speech from the speech data, the information comprising probabilities of occurrence of words and positional information of the words in the recognized speech;
generating information pertaining to at least positional information of words in the text meta-data in substantially the same format as the information pertaining to recognized speech; and
building an index based on the information pertaining to recognized speech and the information pertaining to the text meta-data.
Dependent claims: 2-9.
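The three steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Posting` record, the helper names, and the choice to give every text meta-data word a probability of 1.0 are assumptions made only so that both sources share one (position, word, probability) format.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    """One index entry (hypothetical layout)."""
    doc_id: str
    category: str   # e.g. "speech" or "title"
    position: int
    prob: float


def speech_entries(hypotheses):
    """Normalize recognizer output given as (position, word, probability)."""
    return [(pos, word.lower(), prob) for pos, word, prob in hypotheses]


def text_entries(text):
    """Put text meta-data in the same format; probability 1.0 is an assumption."""
    return [(pos, word.lower(), 1.0) for pos, word in enumerate(text.split())]


def build_index(documents):
    """Build one inverted index: word -> list of Posting, across all categories."""
    index = defaultdict(list)
    for doc_id, categories in documents.items():
        for category, entries in categories.items():
            for pos, word, prob in entries:
                index[word].append(Posting(doc_id, category, pos, prob))
    return index


# Toy document: competing recognition hypotheses at position 0, plus a title.
docs = {
    "doc1": {
        "speech": speech_entries([(0, "speech", 0.9), (0, "beach", 0.1), (1, "index", 0.8)]),
        "title": text_entries("Speech indexing talk"),
    }
}
index = build_index(docs)
```

Because both categories share one entry format, a lookup such as `index["speech"]` returns speech and title postings from the same list.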
10. A computer-readable medium having computer-executable instructions for performing steps comprising:
receiving a search query;
searching an index for an entry associated with a word in the search query, the index comprising:
information pertaining to a document identifier for a spoken document having speech data and text meta-data;
a category type identifier identifying at least one of different types of speech data, and speech data relative to text meta-data; and
a position for the word, and a probability of the word appearing at the position;
using the probabilities to rank spoken documents relative to each other; and
returning search results based on the ranked spoken documents.
Dependent claims: 11-16.
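As a rough illustration of claim 10, the sketch below looks up each query word in an index whose entries carry a document identifier, a category type, a position, and a probability, and then ranks documents. The tuple layout, the sample data, and the ranking rule (summing probabilities per document) are all assumptions; the claim only requires that the probabilities be used for ranking.

```python
from collections import defaultdict

# Hypothetical index: word -> list of (doc_id, category, position, probability).
INDEX = {
    "speech": [
        ("doc1", "speech", 0, 0.9),
        ("doc2", "speech", 3, 0.4),
        ("doc2", "title", 0, 1.0),
    ],
    "index": [("doc1", "speech", 1, 0.8)],
}


def search(index, query):
    """Return (doc_id, score) pairs, best first.

    The score is a plain sum of the probabilities of all matching entries,
    which is an illustrative choice rather than the claimed formula.
    """
    scores = defaultdict(float)
    for word in query.lower().split():
        for doc_id, _category, _position, prob in index.get(word, []):
            scores[doc_id] += prob
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


results = search(INDEX, "speech index")
```

Here `doc1` accumulates 0.9 + 0.8 and `doc2` accumulates 0.4 + 1.0, so `doc1` is returned first.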
17. A method of retrieving spoken documents based on a search query, the method comprising:
receiving the search query;
searching an index containing probabilities of positions for words generated from speech data in the spoken documents, the probabilities of positions for words referenced to different categories of speech data in the spoken document;
scoring each spoken document based on a set of probabilities for a word from the index for each category; and
returning search results based on the ranked spoken documents.
Dependent claims: 18-20.
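The per-category scoring step of claim 17 could look something like the following sketch. The category weights, the `log(1 + sum of probabilities)` per-category score, and the sample index are hypothetical choices for illustration; the claim itself does not fix a formula.

```python
import math
from collections import defaultdict

# Hypothetical weights: how much each category contributes to the final score.
CATEGORY_WEIGHTS = {"speech": 1.0, "title": 2.0}


def score_documents(index, query, weights=CATEGORY_WEIGHTS):
    """Score documents per category, then combine the category scores."""
    # Sum the probabilities of each query word per (document, category).
    expected = defaultdict(float)
    for word in query.lower().split():
        for doc_id, category, _position, prob in index.get(word, []):
            expected[(doc_id, category, word)] += prob

    # Dampen each sum with log(1 + x) and apply the category weight.
    scores = defaultdict(float)
    for (doc_id, category, _word), total in expected.items():
        scores[doc_id] += weights.get(category, 1.0) * math.log1p(total)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy index: word -> list of (doc_id, category, position, probability).
toy_index = {
    "speech": [
        ("doc1", "speech", 0, 0.9),
        ("doc1", "speech", 5, 0.5),
        ("doc2", "title", 0, 1.0),
    ],
}
ranked = score_documents(toy_index, "speech")
```

With these weights a single certain title match in `doc2` outranks two uncertain speech matches in `doc1`; changing the weights changes that trade-off.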
Specification