Method and system for indexing and searching timed media information based upon relevance intervals
Abstract
A method and system for indexing, searching, and retrieving information from timed media files based upon relevance intervals. Because indexing, searching, and retrieval are based upon relevance intervals, the portion of a timed media file that is returned is selected specifically to be relevant to the given information representations, which eliminates the need for a manual determination of relevance and avoids missing relevant portions. The timed media include streaming audio, streaming video, timed HTML, animations such as vector-based graphics, slide shows, other timed media, and combinations thereof.
174 Claims
-
1. A method of indexing and searching timed media files comprising the steps of:
extracting data from timed media files, said extracted data comprising at least one information representation; and
calculating relevance intervals for each of said at least one information representation from said extracted data, said calculation of relevance intervals including a calculation of relevance interval start and end times that is dependent upon the information representation in question.
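Claim 1 does not say how the start and end times are derived. As a minimal sketch, assuming each occurrence of an information representation carries a time-code in seconds, one could pad every occurrence by a representation-dependent amount and merge overlapping windows. The `padding_for` callback and the padding values below are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]


def relevance_intervals(
    occurrences: Dict[str, List[float]],
    padding_for: Callable[[str], float],
) -> Dict[str, List[Interval]]:
    """Hypothetical sketch: pad each occurrence time-code by a
    representation-dependent amount and merge overlapping windows."""
    result: Dict[str, List[Interval]] = {}
    for rep, times in occurrences.items():
        pad = padding_for(rep)  # start/end times depend on the representation
        merged: List[Interval] = []
        for t in sorted(times):
            start, end = max(0.0, t - pad), t + pad
            if merged and start <= merged[-1][1]:
                # Overlaps the previous window, so extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        result[rep] = merged
    return result


# Usage: "budget" gets a narrow window, everything else a wide one.
intervals = relevance_intervals(
    {"budget": [10.0, 14.0, 60.0], "tax": [5.0]},
    lambda rep: 3.0 if rep == "budget" else 10.0,
)
print(intervals)  # {'budget': [(7.0, 17.0), (57.0, 63.0)], 'tax': [(0.0, 15.0)]}
```

Intervals for different representations are computed independently, so they may overlap, as claim 3 contemplates.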
-
2. A method of indexing and searching timed media files, as recited in claim 1, comprising the steps of:
extracting data from timed media files, said extracted data comprising all of multiple information representations that are of sufficient importance; and
calculating relevance intervals for each of said information representations, said calculation of relevance intervals for each information representation including a calculation of relevance interval start and end times that is dependent upon the information representation in question.
-
3. A method of indexing and searching timed media files, as recited in claim 2, wherein said calculation of relevance intervals for each of said information representations results in relevance intervals for different information representations that strictly overlap.
-
4. A method of indexing and searching timed media files, as recited in claim 1, wherein each said relevance interval is composed of one or more continuous sections of timed media.
-
5. A method of indexing and searching timed media files, as recited in claim 1, further comprising the step of creating a search index containing said relevance intervals.
-
6. A method of indexing and searching timed media files, as recited in claim 1, further comprising the step of extracting data from said timed media files.
-
7. A method of indexing and searching timed media files, as recited in claim 6, wherein said data extraction includes the extraction of said at least one information representation from the timed media file.
-
8. A method of indexing and searching timed media files, as recited in claim 6, wherein said data extraction comprises the step of performing speech recognition on said data in said timed media files.
-
9. A method of indexing and searching timed media files, as recited in claim 8, wherein said data extraction includes extracting an indication of the certainty that a word was correctly recognized by said speech recognition.
-
10. A method of indexing and searching timed media files, as recited in claim 6, wherein said data extraction includes extracting time-code data indicating a time for each occurrence of said at least one information representation.
-
11. A method of indexing and searching timed media files, as recited in claim 6, wherein said data extraction comprises the step of performing optical character recognition on said data in said timed media files.
-
12. A method of indexing and searching timed media files, as recited in claim 11, wherein said data extraction includes extracting an indication of the certainty that a word was correctly recognized by said optical character recognition.
-
13. A method of indexing and searching timed media files, as recited in claim 11, wherein said data extraction includes extracting the time-code corresponding to the time at which the optical characters are visible.
-
14. A method of indexing and searching timed media files, as recited in claim 6, wherein said data extraction includes extracting meta-data about text visible on-screen within the timed media files.
-
15. A method of indexing and searching timed media files, as recited in claim 14, wherein said meta-data includes the display position of said text visible on-screen within the timed media files.
-
16. A method of indexing and searching timed media files, as recited in claim 14, wherein said meta-data includes the orientation of said text visible on-screen within the timed media files.
-
17. A method of indexing and searching timed media files, as recited in claim 14, wherein said meta-data includes font characteristics of said text visible on-screen within the timed media files, such as font size, emphasis, spacing, and style.
-
18. A method of indexing and searching timed media files, as recited in claim 8, wherein said data extraction includes extracting an identification of a speaker of said at least one information representation.
-
19. A method of indexing and searching timed media files, as recited in claim 18, wherein said extracting an identification of a speaker includes extracting information describing said speaker such as the name, title, position, or organization of said speaker of said at least one information representation.
-
20. A method of indexing and searching timed media files, as recited in claim 1, wherein inputs include a transcript of the text spoken within a timed media file.
-
21. A method of indexing and searching timed media files, as recited in claim 20, further comprising the step of using speech recognition to synchronize the transcript with the time-code of the timed media file.
-
22. A method of indexing and searching timed media files, as recited in claim 20, further comprising the step of dividing the transcript into sentences based upon punctuation, capitalized words, and other formatting.
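The sentence division of claim 22 can be sketched with a regular expression, under the assumption that a sentence ends at `.`, `?`, or `!` followed by whitespace and a capitalized word. `split_transcript` is a hypothetical helper and ignores the other formatting cues the claim mentions.

```python
import re
from typing import List

# Sentence boundary: terminal punctuation, whitespace, then a capital letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_transcript(text: str) -> List[str]:
    """Hypothetical helper: divide transcript text into sentences using
    punctuation and capitalization only. Abbreviations such as "Dr. Smith"
    will be split incorrectly."""
    normalized = " ".join(text.split())  # collapse line breaks and extra spaces
    if not normalized:
        return []
    return _BOUNDARY.split(normalized)


print(split_transcript("Welcome back.  Today we cover taxes!\nAny questions? yes."))
# ['Welcome back.', 'Today we cover taxes!', 'Any questions? yes.']
```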
-
23. A method of indexing and searching timed media files, as recited in claim 8, further comprising the step of separating the output of said speech recognition into sentences.
-
24. A method of indexing and searching timed media files, as recited in claim 23, wherein said separation of the output of said speech recognition into sentences includes the analysis of words and phrases contained within the output.
-
25. A method of indexing and searching timed media files, as recited in claim 23, wherein said separation of the output of said speech recognition into sentences includes the analysis of grammatical information derived from the words and phrases contained within the output.
-
26. A method of indexing and searching timed media files, as recited in claim 23, wherein said separation of the output of said speech recognition into sentences includes the analysis of prosodic information contained within the output.
-
27. A method of indexing and searching timed media files, as recited in claim 1, wherein said calculation includes the processing of language contained within the timed media file.
-
28. A method of indexing and searching timed media files, as recited in claim 27, wherein said processed language includes text that is visible within the timed media file.
-
29. A method of indexing and searching timed media files, as recited in claim 27, wherein said processed language includes language that is spoken within the timed media file.
-
30. A method of indexing and searching timed media files, as recited in claim 1, comprising the further step of creating a raw data index containing data used in and produced by said calculation of at least one relevance interval.
-
31. A method of indexing and searching timed media files, as recited in claim 30, further comprising the steps of:
extracting data from said timed media files; and
storing said extracted data in said raw data index.
-
32. A method of indexing and searching timed media files, as recited in claim 30, wherein said calculation includes the processing of language contained within the timed media file, further comprising the step of storing results of said processing in said raw data index.
-
33. A method of indexing and searching timed media files, as recited in claim 28, wherein said processing of visible text includes dividing simultaneously visible text into logical elements based upon meta-data of said visible text.
-
34. A method of indexing and searching timed media files, as recited in claim 33, wherein said processing of visible text further comprises the step of determining the logical hierarchy of each set of said logical elements that are simultaneously visible based upon meta-data of said visible text.
-
35. A method of indexing and searching timed media files, as recited in claim 34, wherein said processing of visible text further comprises the step of combining said hierarchies by identifying logical elements that are very similar but not simultaneously visible.
-
36. A method of indexing and searching timed media files, as recited in claim 33, wherein
said processing of language further includes processing language that is spoken within the media file; and
said processing of language that is spoken within the media file includes calculating what language was spoken in reference to at least one particular visible logical object.
-
37. A method of indexing and searching timed media files, as recited in claim 36, wherein said calculation of what language was spoken in reference to at least one particular visible logical object includes statistical analysis of the occurrences within said spoken language of n-grams that occur within said particular visible logical object.
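One simple statistic for claim 37, assumed here for illustration, is the fraction of a visible element's word n-grams (bigrams by default) that also occur in a spoken sentence; a high score suggests the sentence refers to that element. The helper names `ngrams` and `reference_score` are hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-grams of a token list, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def reference_score(visible: str, spoken: str, n: int = 2) -> float:
    """Hypothetical statistic: fraction of the visible element's n-grams
    that also occur in the spoken sentence (case-insensitive)."""
    vis = ngrams(visible.lower().split(), n)
    if not vis:
        return 0.0  # element too short to form an n-gram
    heard = ngrams(spoken.lower().split(), n)
    return len(vis & heard) / len(vis)


print(reference_score("Quarterly revenue growth", "our quarterly revenue growth was strong"))  # 1.0
print(reference_score("Quarterly revenue growth", "growth in revenue"))  # 0.0
```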
-
38. A method of indexing and searching timed media files, as recited in claim 28, wherein a time interval is associated with at least one visible information representation based upon the time interval over which the logical object is visible.
-
39. A method of indexing and searching timed media files, as recited in claim 36, wherein a time interval is associated with at least one visible information representation based upon the time interval during which language is spoken in reference to the logical object that contains said visible information representation.
-
40. A method of indexing and searching timed media files, as recited in claim 27, wherein a part of speech is determined for at least one information representation.
-
41. A method of indexing and searching timed media files, as recited in claim 27, wherein said processing of language includes grammatical parsing of at least a portion of said language.
-
42. A method of indexing and searching timed media files, as recited in claim 27, wherein a lemmatized form is determined for at least one word in said language.
-
43. A method of indexing and searching timed media files, as recited in claim 29, wherein said processing of language that is spoken within the timed media file includes analyzing the logical structure of the language.
-
44. A method of indexing and searching timed media files, as recited in claim 43, wherein
language contained within the timed media file is segmented into sentences; and
said analysis of the logical structure of the language includes using a rule-based system to identify sentences that indicate logical structure.
-
45. A method of indexing and searching timed media files, as recited in claim 44, wherein said rule-based system identifies sentences that indicate logical structure based upon the inclusion within said sentences of combinations of words, phrases, parts of speech, and grammatical structures that indicate logical structure.
-
46. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of topic shift.
-
47. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of semantic dependence, including intersentential anaphora.
-
48. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of topical lists.
-
49. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of continuation of topic.
-
50. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying a specific logical relationship with the preceding or following language such as contrasting point, supporting reason, causal relationship, or example.
-
51. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of information concerning a previous, current, or subsequent speaker.
-
52. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of summary or conclusion content.
-
53. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of introductory content.
-
54. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of explicit mention of topics discussed within the media.
-
55. A method of indexing and searching timed media files, as recited in claim 43, wherein said analysis of the logical structure of the language includes identifying indications of sentences that do not contain substantive information.
-
56. A method of indexing and searching timed media files, as recited in claim 43, wherein said rule-based system for identifying sentences that contain indications of logical structure further comprises the step of labeling sentences according to the logical structure indications.
-
57. A method of indexing and searching timed media files, as recited in claim 27, further comprising the step of calculating a centrality number for at least one occurrence of at least one information representation contained within the media file.
-
58. A method of indexing and searching timed media files, as recited in claim 57, wherein
said processing of language contained within the timed media file includes segmenting said language into sentences; and
said calculation of a centrality number of at least one occurrence of at least one information representation is based on the position of said occurrence within the grammatical structure of the sentence that contains that occurrence.
-
59. A method of indexing and searching timed media files, as recited in claim 57, wherein said calculation of a centrality number includes the identification of phrases within sentences.
-
60. A method of indexing and searching timed media files, as recited in claim 59, further comprising the step of assigning a hierarchical structure to the phrases.
-
61. A method of indexing and searching timed media files, as recited in claim 57, wherein said calculation of a centrality number involves analysis of the verb argument structure within the sentence containing said at least one occurrence of at least one information representation.
-
62. A method of indexing and searching timed media files, as recited in claim 27, wherein said processing of language includes the filtering of language to exclude at least one word that does not carry semantic information, such as a determiner, preposition, or conjunction.
-
63. A method of indexing and searching timed media files, as recited in claim 1, further comprising the use of a quantitative model of semantic relatedness between at least one pair of information representations.
-
64. A method of indexing and searching timed media files, as recited in claim 63, wherein said quantitative model of semantic relatedness is built through the analysis of a corpus of spoken and/or written language.
-
65. A method of indexing and searching timed media files, as recited in claim 64, wherein said analysis includes calculating the frequency of co-occurrence of at least one pair of information representations.
-
66. A method of indexing and searching timed media files, as recited in claim 65, wherein said model of semantic relatedness includes a measure of the mutual information between the pair of information representations, that is, of the probability of co-occurrence of the pair of information representations relative to the probability of random co-occurrence of such a pair of information representations.
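Taking the base-2 logarithm of the ratio described in claim 66 gives the familiar pointwise mutual information. The log form is an assumption, since the claim only describes the ratio. A sketch with each probability estimated from raw counts:

```python
import math


def pointwise_mutual_information(count_ab: int, count_a: int, count_b: int, total: int) -> float:
    """log2 of P(a, b) / (P(a) * P(b)), estimating each probability from counts.

    0 means the pair co-occurs exactly as often as chance predicts;
    positive values mean more often, negative values less often."""
    if min(count_ab, count_a, count_b, total) <= 0:
        raise ValueError("all counts must be positive")
    p_ab = count_ab / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_ab / (p_a * p_b))


print(pointwise_mutual_information(25, 50, 50, 100))  # 0.0 (independent)
```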
-
67. A method of indexing and searching timed media files, as recited in claim 66, wherein frequency of co-occurrence is measured as the number of occurrences of both information representations within a certain textual or temporal distance.
-
68. A method of indexing and searching timed media files, as recited in claim 67, wherein
said analysis of a corpus includes filtering to exclude at least one word that does not carry semantic information, such as a determiner, preposition, or conjunction; and
frequency of co-occurrence is measured as the number of occurrences of both information representations within a distance of a constant number of words that remain after filtering to exclude at least one word that does not carry semantic information.
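The windowed counting of claim 68 can be sketched as follows. The stopword list, the window size, and treating pairs as unordered are illustrative assumptions, not details taken from the claim.

```python
from collections import Counter
from typing import Iterable, List

# Illustrative stopword list (determiners, prepositions, conjunctions).
STOPWORDS = frozenset({"a", "an", "the", "of", "in", "on", "to", "and", "or", "but"})


def cooccurrence_counts(tokens: Iterable[str], window: int) -> Counter:
    """Count unordered pairs of distinct content words that lie within
    `window` positions of each other after stopwords are removed."""
    content: List[str] = [t.lower() for t in tokens if t.lower() not in STOPWORDS]
    counts: Counter = Counter()
    for i, word in enumerate(content):
        for other in content[i + 1 : i + 1 + window]:
            if other != word:
                counts[tuple(sorted((word, other)))] += 1
    return counts


print(cooccurrence_counts("the tax bill and the tax rate".split(), window=1))
# Counter({('bill', 'tax'): 2, ('rate', 'tax'): 1})
```

Because distance is measured over the filtered list, "bill" and the second "tax" count as adjacent even though "and the" separates them in the original text.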
-
69. A method of indexing and searching timed media files, as recited in claim 64, wherein said analysis of a corpus includes the calculation of a centrality score for the occurrences of at least one information representation within the corpus.
-
70. A method of indexing and searching timed media files, as recited in claim 65, wherein
said analysis of a corpus includes the calculation of a centrality score for the occurrences of at least one information representation within the corpus; and
said frequency of co-occurrence is calculated such that each occurrence of each information representation being considered is weighted according to its centrality score.
-
71. A method of indexing and searching timed media files, as recited in claim 64, wherein said analysis of a corpus includes the lemmatization of occurrences of at least one information representation within the corpus.
-
72. A method of indexing and searching timed media files, as recited in claim 64, wherein said analysis of a corpus includes filtering to exclude at least one word that does not carry semantic information, such as a determiner, preposition, or conjunction.
-
73. A method of indexing and searching timed media files, as recited in claim 65, wherein, if a pair of information representations does not co-occur within the corpus of language, then the frequency of co-occurrence of the pair of information representations is considered to be the frequency that would occur in a true corpus if the information representations co-occurred randomly given their individual frequencies of occurrence.
-
74. A method of indexing and searching timed media files, as recited in claim 66, wherein, if a pair of information representations co-occurs so infrequently that the uncertainty in the calculated mutual information relative to the calculated mutual information is above an acceptable threshold, then the frequency of co-occurrence of the pair of information representations is considered to be the frequency that would occur in a true corpus if the information representations co-occurred randomly given their individual frequencies of occurrence.
-
75. A method of indexing and searching timed media files, as recited in claim 66, wherein, if either or both of a pair of information representations occurs so infrequently that the uncertainty in the calculated mutual information is above an acceptable threshold, then the frequency of co-occurrence of the pair of information representations is considered to be the frequency that would occur in a true corpus if the information representations co-occurred randomly given their individual frequencies of occurrence.
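Claims 73 to 75 replace unreliable co-occurrence counts with the count expected under random co-occurrence, `count_a * count_b / total`, which drives the mutual-information ratio to 1. The sketch below uses the Poisson relative error `1 / sqrt(count_ab)` of the pair count as a stand-in for the claimed uncertainty test; that proxy, the 0.5 threshold, and the function name are assumptions.

```python
import math


def reliable_cooccurrence(count_ab: int, count_a: int, count_b: int, total: int,
                          max_relative_error: float = 0.5) -> float:
    """Co-occurrence count to use for a pair of information representations.

    Falls back to the count expected under random co-occurrence,
    count_a * count_b / total, when the pair never co-occurs or when the
    assumed Poisson relative error 1 / sqrt(count_ab) exceeds the threshold."""
    expected_random = count_a * count_b / total
    if count_ab == 0 or 1 / math.sqrt(count_ab) > max_relative_error:
        return expected_random
    return float(count_ab)


print(reliable_cooccurrence(0, 20, 50, 1000))  # 1.0 (never co-occurs)
print(reliable_cooccurrence(3, 20, 50, 1000))  # 1.0 (too rare to trust)
print(reliable_cooccurrence(9, 20, 50, 1000))  # 9.0
```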
-
76. A method of indexing and searching timed media files, as recited in claim 27, wherein said processing of language includes the calculation of one or more major topic shifts.
-
77. A method of indexing and searching timed media files, as recited in claim 76, wherein said calculation of major topic shifts includes maximizing a lexical cohesiveness function.
-
78. A method of indexing and searching timed media files, as recited in claim 77, wherein
said calculation of major topic shifts includes chunking language contained within the media file into short segments such as sentences; and
said lexical cohesiveness function includes the pair-wise comparison of segments to determine a relatedness value between each pair of segments.
-
79. A method of indexing and searching timed media files, as recited in claim 78, wherein said pair-wise comparison of segments is calculated for two segments A and B by performing a calculation that includes quantitatively comparing one or more information representations in segment A with each of one or more information representations in segment B to obtain a numerical value of relatedness between each of the information representations being compared.
-
80. A method of indexing and searching timed media files, as recited in claim 79, wherein said numerical value of relatedness between each of the information representations being compared takes into account whether the information representations being compared are equivalent, as in the case of identical phrases or synonyms.
-
81. A method of indexing and searching timed media files, as recited in claim 79, wherein said numerical value of relatedness between each of the information representations being compared takes into account the mutual information between the information representations being compared.
-
82. A method of indexing and searching timed media files, as recited in claim 79, wherein a subset of the numerical relatedness values between information representations in segment A and information representations in segment B is chosen such that the subset does not include more than one numerical relatedness value for any one occurrence of a particular information representation.
-
83. A method of indexing and searching timed media files, as recited in claim 82, wherein said pair-wise relatedness for two segments A and B includes the sum of said subset of numerical relatedness values between information representations in segment A and information representations in segment B.
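Claims 82 and 83 require that no occurrence contribute more than one relatedness value to the sum. A greedy, highest-first matching is one assumed way to pick such a subset; the `relatedness` callback and `toy_relatedness` scores are hypothetical stand-ins for the equivalence or mutual-information measures of claims 80 and 81.

```python
from typing import Callable, List


def segment_relatedness(seg_a: List[str], seg_b: List[str],
                        relatedness: Callable[[str, str], float]) -> float:
    """Sum pairwise relatedness values between segments A and B, greedily
    taking the highest values first so that each occurrence in A and each
    occurrence in B contributes at most one value."""
    scored = sorted(
        ((relatedness(x, y), i, j)
         for i, x in enumerate(seg_a)
         for j, y in enumerate(seg_b)),
        reverse=True,
    )
    used_a, used_b = set(), set()
    total = 0.0
    for score, i, j in scored:
        if score <= 0:
            break  # remaining pairs are unrelated
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        total += score
    return total


# Hypothetical scores: identical words 1.0, one listed related pair 0.5.
def toy_relatedness(x: str, y: str) -> float:
    if x == y:
        return 1.0
    return 0.5 if {x, y} == {"tax", "budget"} else 0.0


print(segment_relatedness(["tax", "budget"], ["tax", "tax"], toy_relatedness))  # 1.5
```

Without the one-value-per-occurrence restriction, the same input would sum to 3.0, letting a single repeated word dominate the comparison.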
-
84. A method of indexing and searching timed media files, as recited in claim 78, wherein
a relatedness rank is calculated from each said relatedness value; and
the relatedness rank for two segments A and B indicates the relative relatedness as compared to the relatedness of segments that are near A and B.
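A relatedness rank in the sense of claim 84 can be computed by comparing each entry of the segment relatedness matrix with its neighbors, similar to the ranking step of Choi's C99 text segmentation algorithm. The square neighborhood of `radius` cells is an assumption; the claim does not specify what "near" means.

```python
from typing import List


def rank_matrix(sim: List[List[float]], radius: int = 1) -> List[List[float]]:
    """Replace each relatedness value by the fraction of its neighbors,
    within `radius` cells in each direction, that have a lower value."""
    n = len(sim)
    ranks = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = examined = 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    a, b = i + di, j + dj
                    if (di or dj) and 0 <= a < n and 0 <= b < n:
                        examined += 1
                        if sim[a][b] < sim[i][j]:
                            lower += 1
            ranks[i][j] = lower / examined if examined else 0.0
    return ranks


print(rank_matrix([[1.0, 0.2], [0.2, 1.0]]))
# [[0.6666666666666666, 0.0], [0.0, 0.6666666666666666]]
```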
-
85. A method of indexing and searching timed media files, as recited in claim 78, wherein said topic shifts are calculated via an iterative process including
calculating a sequence of lexical cohesiveness values, where each of said lexical cohesiveness values is calculated as if there were a topic shift at a particular potential topic shift location; and
inserting a topic boundary at the potential topic shift location with the maximum calculated lexical cohesiveness value.
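The iterative process of claim 85, combined with the inside-density cohesiveness function of claim 86, resembles the divisive clustering step of C99. A sketch, assuming `rank` is a square matrix of relatedness ranks and that the hypothetical helper `inside_density` divides the rank mass inside the diagonal blocks by their total area; the fixed `num_shifts` stopping rule is also an assumption.

```python
from typing import List


def inside_density(rank: List[List[float]], bounds: List[int]) -> float:
    """Rank mass inside the diagonal blocks divided by their total area.
    `bounds` holds sorted segment start indices and begins with 0."""
    edges = bounds + [len(rank)]
    inside = 0.0
    area = 0
    for start, end in zip(edges, edges[1:]):
        inside += sum(rank[i][j] for i in range(start, end) for j in range(start, end))
        area += (end - start) ** 2
    return inside / area


def split_topics(rank: List[List[float]], num_shifts: int) -> List[int]:
    """Insert topic boundaries one at a time, each time choosing the
    location that maximizes the inside density."""
    bounds = [0]
    for _ in range(num_shifts):
        candidates = [k for k in range(1, len(rank)) if k not in bounds]
        if not candidates:
            break
        best = max(candidates, key=lambda k: inside_density(rank, sorted(bounds + [k])))
        bounds = sorted(bounds + [best])
    return bounds[1:]


RANK = [[1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0]]
print(split_topics(RANK, 1))  # [2]
```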
-
86. A method of indexing and searching timed media files, as recited in claim 84, wherein said lexical cohesiveness function is the inside density of the relatedness rank of the subset of segments.
-
87. A method of indexing and searching timed media files, as recited in claim 76, wherein each topic shift that is identified is rated by a boundary sharpness measure that includes
a measurement of the lexical cohesiveness of nearby segments on either side of the topic shift; and
a measurement of the lexical difference between nearby segments on opposite sides of the topic shift.
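A boundary sharpness measure combining both parts of claim 87 might subtract the mean cross-boundary relatedness from the mean within-side relatedness. The window `width`, the equal weighting of the two terms, and the example matrix `SIM` are assumptions.

```python
from statistics import mean
from typing import List


def boundary_sharpness(sim: List[List[float]], k: int, width: int = 2) -> float:
    """Sharpness of a candidate topic shift just before segment k.

    Mean relatedness within the `width` segments on each side (cohesiveness)
    minus the mean relatedness across the boundary (lexical difference)."""
    if width < 2:
        raise ValueError("width must be at least 2 to measure cohesiveness")
    if k - width < 0 or k + width > len(sim):
        return 0.0  # not enough context on one side
    left = range(k - width, k)
    right = range(k, k + width)

    def cohesion(block: range) -> float:
        return mean(sim[i][j] for i in block for j in block if i < j)

    cross = mean(sim[i][j] for i in left for j in right)
    return (cohesion(left) + cohesion(right)) / 2 - cross


SIM = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.8],
       [0.1, 0.1, 0.8, 1.0]]
print(round(boundary_sharpness(SIM, 2), 2))  # 0.75
```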
-
88. A method of indexing and searching timed media files, as recited in claim 76, wherein
each topic shift that is identified is rated by a boundary sharpness measure;
said boundary sharpness measure is calculated for all potential topic shift locations; and
local maxima of the boundary sharpness measure are identified.
-
89. A method of indexing and searching timed media files, as recited in claim 88, wherein identified topic shifts that are not sufficiently near a local maximum of the boundary sharpness measure are treated as though they are not topic shifts.
-
90. A method of indexing and searching timed media files, as recited in claim 88, wherein identified topic shifts that are not at a local maximum of the boundary sharpness measure but are sufficiently near a local maximum of the boundary sharpness measure are treated as though they are located at the nearby local maximum of the boundary sharpness measure.
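Claims 88 to 90 can be sketched as finding strict local maxima of the sharpness values and then either moving each identified shift to a nearby maximum or discarding it. The `tolerance` parameter is a hypothetical stand-in for "sufficiently near".

```python
from typing import List, Optional


def local_maxima(values: List[float]) -> List[int]:
    """Indices whose value is strictly greater than both neighbors."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]


def snap_to_maximum(shift: int, maxima: List[int], tolerance: int) -> Optional[int]:
    """Move an identified shift to the nearest local maximum within
    `tolerance` positions (claim 90), or return None to drop it (claim 89)."""
    nearby = [m for m in maxima if abs(m - shift) <= tolerance]
    if not nearby:
        return None
    return min(nearby, key=lambda m: abs(m - shift))


sharpness = [0.1, 0.5, 0.2, 0.3, 0.9, 0.4]
peaks = local_maxima(sharpness)          # [1, 4]
print(snap_to_maximum(3, peaks, 1))      # 4
print(snap_to_maximum(2, peaks, 0))      # None
```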
-
91. A method of indexing and searching timed media files, as recited in claim 76, wherein
each topic shift that is identified is rated by a boundary sharpness measure; and
topic shifts are identified in an iterative process that is terminated based upon an analysis of the boundary sharpness measure of topic shifts identified by said iterative process.
-
92. A method of indexing and searching timed media files, as recited in claim 88, wherein
topic shifts are identified in an iterative process that is terminated based upon an analysis of the boundary sharpness measure of topic shifts identified by said iterative process; and
said iterative process is terminated if more than a maximum number of consecutive identified topic shifts are determined to be not sufficiently near a local maximum of the boundary sharpness measure.
-
93. A method of indexing and searching timed media files, as recited in claim 91, wherein said iterative process is not terminated before a minimum number of topic shifts have been identified that are sufficiently near a local maximum of the boundary sharpness measure.
-
94. A method of indexing and searching timed media files, as recited in claim 88, wherein
topic shifts are identified in an iterative process that is terminated based upon an analysis of the boundary sharpness measure of topic shifts identified by said iterative process; and
said iterative process is terminated if more than a maximum percentage of the identified topic shifts after any given number of iterations are determined to be not sufficiently near a local maximum of the boundary sharpness measure.
-
95. A method of indexing and searching timed media files, as recited in claim 27, wherein said processing of language includes the identification of named entities within the language.
-
96. A method of indexing and searching timed media files, as recited in claim 95, wherein said processing of language further comprises the step of classification of named entities by type.
-
97. A method of indexing and searching timed media files, as recited in claim 95, wherein said processing of language further comprises the step of identifying multiple information representations that are referring to the same named entity.
-
98. A method of indexing and searching timed media files, as recited in claim 95, wherein
a part of speech is determined for at least one information representation; and
said identification of named entities includes the use of rules for identifying named entities based upon parts of speech.
-
99. A method of indexing and searching timed media files, as recited in claim 95, wherein
said processing of language includes grammatical parsing of at least a portion of said language; and
said identification of named entities includes the use of rules for identifying named entities based upon grammatical parsing information.
-
100. A method of indexing and searching timed media files, as recited in claim 95, wherein said identification of named entities includes checking language against lists of named entities.
-
101. A method of indexing and searching timed media files, as recited in claim 95, wherein said identification of named entities includes checking language against lists of words that are sometimes a part of named entities.
-
102. A method of indexing and searching timed media files, as recited in claim 101, wherein said lists of words that are sometimes a part of named entities include an indication of the likelihood that each word is a part of a named entity.
-
103. A method of indexing and searching timed media files, as recited in claim 102, wherein said lists of words that are sometimes a part of named entities include multiple indications of the likelihood that each word is a part of a named entity, one for each of multiple registers.
-
104. A method of indexing and searching timed media files, as recited in claim 95, wherein said identification of named entities includes checking language against lists of suffixes and prefixes to named entities.
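A sketch of the affix check in claim 104, assuming whitespace-tokenized text and small illustrative prefix and suffix lists (real lists would be far larger): a run of capitalized tokens is reported as a candidate named entity when a known prefix precedes it or a known suffix follows it. The list contents and `entities_by_affix` are hypothetical.

```python
from typing import List, Tuple

# Illustrative lists only; real prefix and suffix lists would be much larger.
PREFIXES = {"mr.", "mrs.", "dr.", "senator", "president"}
SUFFIXES = {"inc.", "corp.", "ltd.", "llc"}
AFFIXES = PREFIXES | SUFFIXES


def entities_by_affix(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return [start, end) spans of capitalized token runs that are directly
    preceded by a known prefix or directly followed by a known suffix."""
    def is_name_token(tok: str) -> bool:
        return tok[:1].isupper() and tok.lower() not in AFFIXES

    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        if not is_name_token(tokens[i]):
            i += 1
            continue
        j = i
        while j < len(tokens) and is_name_token(tokens[j]):
            j += 1
        before = tokens[i - 1].lower() if i > 0 else ""
        after = tokens[j].lower() if j < len(tokens) else ""
        if before in PREFIXES or after in SUFFIXES:
            spans.append((i, j))
        i = j
    return spans


words = "Yesterday Dr. Jane Smith met Acme Corp. officials".split()
print(entities_by_affix(words))  # [(2, 4), (5, 6)]
```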
-
105. A method of indexing and searching timed media files, as recited in claim 96, wherein said classification of named entities by type includes the use of rules based upon semantic information related to the words contained in the identified named entity.
-
106. A method of indexing and searching timed media files, as recited in claim 97, wherein said identification of multiple information representations that are referring to the same named entity includes the use of rules concerning common manipulations of specific types of named entities.
-
107. A method of indexing and searching timed media files, as recited in claim 97, wherein said identification of multiple information representations that are referring to the same named entity includes the use of lists of words that are synonyms when used as part of named entities.
-
108. A method of indexing and searching timed media files, as recited in claim 97, wherein
said processing of language includes the calculation of one or more major topic shifts; and
said identification of multiple information representations that are referring to the same named entity includes the use of rules concerning topic shifts.
-
109. A method of indexing and searching timed media files, as recited in claim 97, wherein said processing of language further comprises the steps of
classification of named entities by type; and
creating a co-reference table of named entities within the media file that includes the classification by type of each occurrence of each named entity and indicates multiple occurrences that refer to the same named entity.
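The co-reference table of claim 109 can be pictured as a mapping from an entity identifier to its occurrences in time order, with each occurrence carrying its classification by type. The sketch below assumes that an earlier step has already given co-referring occurrences a shared `entity_id`. `Occurrence` and `build_coreference_table` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    time: float       # seconds from the start of the media file
    text: str         # surface form, e.g. "IBM"
    entity_type: str  # classification by type, e.g. "ORGANIZATION"
    entity_id: int    # co-referring occurrences share the same id (assumed given)


def build_coreference_table(occurrences):
    """Map each entity id to its occurrences, listed in time order."""
    table = defaultdict(list)
    for occ in sorted(occurrences, key=lambda o: o.time):
        table[occ.entity_id].append(occ)
    return dict(table)
```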
110. A method of indexing and searching timed media files, as recited in claim 27, wherein said processing of language includes the identification of anaphora within the language.
111. A method of indexing and searching timed media files, as recited in claim 110, wherein said identification of anaphora within the language includes identifying pronouns.
112. A method of indexing and searching timed media files, as recited in claim 110, wherein said identification of anaphora within the language includes identifying definite references.
113. A method of indexing and searching timed media files, as recited in claim 110, wherein said identification of anaphora within the language includes identifying indirect anaphora or implicit references.
114. A method of indexing and searching timed media files, as recited in claim 111, wherein said processing of language further comprises the step of identifying and excluding from further analysis non-referential occurrences of pronouns such as complementizers.
115. A method of indexing and searching timed media files, as recited in claim 110, wherein said processing of language further comprises the step of determining the antecedents of identified anaphora.
116. A method of indexing and searching timed media files, as recited in claim 115, wherein
said processing of language includes the calculation of one or more major topic shifts; and
said determination of the antecedents of identified anaphora includes the use of major topic shifts.
117. A method of indexing and searching timed media files, as recited in claim 115, wherein
said processing of language includes the identification of named entities within the language; and
said determination of the antecedents of identified anaphora includes the use of named entities that have been identified within the language.
118. A method of indexing and searching timed media files, as recited in claim 115, wherein said determination of the antecedents of identified anaphora includes the use of ontological information.
119. A method of indexing and searching timed media files, as recited in claim 115, wherein
a part of speech is determined for at least one information representation; and
said determination of the antecedents of identified anaphora includes the use of part of speech information.
120. A method of indexing and searching timed media files, as recited in claim 117, wherein
said processing of language further comprises the step of classification of named entities by type; and
said determination of the antecedents of identified anaphora includes the use of the classification of named entities by type.
121. A method of indexing and searching timed media files, as recited in claim 118, wherein said determination of the antecedents of identified anaphora includes the filtering of potential antecedents of personal pronouns using an ontology to determine if each potential antecedent could represent a human.
122. A method of indexing and searching timed media files, as recited in claim 115, wherein said determination of the antecedents of identified anaphora includes the filtering of potential antecedents of personal pronouns using an ontology or lexicon to determine the gender of the potential antecedents.
123. A method of indexing and searching timed media files, as recited in claim 115, wherein said determination of the antecedents of identified anaphora includes the use of an ontology or lexicon to determine whether potential antecedents take a singular or plural anaphor.
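Claims 121 to 123 filter candidate antecedents of a personal pronoun on three tests: whether the candidate can denote a human, its gender and its grammatical number. The sketch below merges the ontology and lexicon lookups into one hypothetical `LEXICON` dictionary. Its entries and the small `PRONOUNS` table are illustrative assumptions. A real system would consult a full ontology or lexicon.

```python
# Hypothetical lexicon: can the noun denote a human, its gender ("m", "f", or
# None when unknown or neutral), and its grammatical number ("sg" or "pl").
LEXICON = {
    "senator": {"human": True, "gender": None, "number": "sg"},
    "mother": {"human": True, "gender": "f", "number": "sg"},
    "committee": {"human": False, "gender": None, "number": "sg"},
    "voters": {"human": True, "gender": None, "number": "pl"},
}

# Features an antecedent must have for each pronoun; None means "no constraint".
PRONOUNS = {
    "he": {"human": True, "gender": "m", "number": "sg"},
    "she": {"human": True, "gender": "f", "number": "sg"},
    "it": {"human": False, "gender": None, "number": "sg"},
    "they": {"human": None, "gender": None, "number": "pl"},
}


def compatible(pronoun: str, noun: str) -> bool:
    need = PRONOUNS[pronoun]
    have = LEXICON.get(noun)
    if have is None:
        return True  # unknown nouns are kept rather than filtered out
    if need["human"] is not None and have["human"] != need["human"]:
        return False  # human check (claim 121)
    if need["gender"] and have["gender"] and need["gender"] != have["gender"]:
        return False  # gender check (claim 122)
    return need["number"] == have["number"]  # number check (claim 123)


def filter_antecedents(pronoun, candidates):
    """Keep only the candidates that agree with the pronoun."""
    return [c for c in candidates if compatible(pronoun, c)]
```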
124. A method of indexing and searching timed media files, as recited in claim 115, wherein
said processing of language includes grammatical parsing of at least a portion of said language;
said determination of the antecedents of identified anaphora includes the identification of one or more grammatical constraints on one or more identified anaphora; and
said determination of the antecedents of identified anaphora further comprises the step of filtering potential antecedents according to the grammatical constraints.
125. A method of indexing and searching timed media files, as recited in claim 115, wherein
said processing of language includes grammatical parsing of at least a portion of said language; and
said determination of the antecedents of identified anaphora includes the comparison of the grammatical role of an anaphor and the grammatical role of its potential antecedents.
126. A method of indexing and searching timed media files, as recited in claim 115, wherein said determination of the antecedents of identified anaphora includes
the identification of one or more semantic constraints on one or more identified anaphora; and
filtering potential antecedents according to the semantic constraints.
127. A method of indexing and searching timed media files, as recited in claim 118, wherein
said identification of anaphora within the language includes identifying definite references; and
said determination of the antecedents of identified definite references includes the use of an ontology to filter the potential antecedents of each definite reference according to whether the potential antecedent is an example of the definite reference.
128. A method of indexing and searching timed media files, as recited in claim 118, wherein
said identification of anaphora within the language includes identifying indirect references; and
said determination of the antecedents of identified indirect references includes the use of an ontology to filter the potential antecedents of each indirect reference according to whether the potential antecedent implies the existence of the indirect referent.
129. A method of indexing and searching timed media files, as recited in claim 115, wherein said determination of the antecedents of identified anaphora includes the consideration of the distance between an anaphor and its potential antecedents.
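A simple use of the distance in claim 129 is to keep only the candidates that come before the anaphor and list them nearest first. The sketch assumes positions are token offsets and ignores cataphora, where the antecedent follows the anaphor. `rank_by_distance` is a hypothetical helper. These claims do not say how distance is weighed against the other factors in claims 115 to 130.

```python
def rank_by_distance(anaphor_pos, candidates):
    """Return (word, position) candidates that precede the anaphor, nearest first.

    Positions are token offsets. Candidates at or after the anaphor are dropped,
    so cataphora is not handled in this sketch.
    """
    preceding = [(word, pos) for word, pos in candidates if pos < anaphor_pos]
    return sorted(preceding, key=lambda c: anaphor_pos - c[1])
```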
130. A method of indexing and searching timed media files, as recited in claim 115, wherein
said processing of language further comprises the step of calculating a centrality number for at least one occurrence of at least one information representation contained within the media file; and
said determination of the antecedents of identified anaphora includes the consideration of the centrality of the anaphor and the centrality of the potential antecedents.
131. A method of indexing and searching timed media files, as recited in claim 27, wherein said processing of language includes the disambiguation of at least one word or phrase.
132. A method of indexing and searching timed media files, as recited in claim 1, wherein said calculation of at least one relevance interval for a given information representation includes the consideration of the temporal distribution of occurrences of said information representation.
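Claim 132 bases relevance intervals on the temporal distribution of occurrences, but this claim gives no specific rule. One illustration is to cluster the occurrences: consecutive occurrences within a gap threshold join the same cluster, and each cluster is padded on both sides. `max_gap` and `pad` are hypothetical tuning parameters, not values from the patent.

```python
def intervals_from_occurrences(times, max_gap=30.0, pad=5.0):
    """Group occurrence times (seconds) into padded relevance intervals.

    Consecutive occurrences at most `max_gap` seconds apart share an interval;
    each interval is then widened by `pad` seconds on both sides (never below 0).
    """
    if not times:
        return []
    times = sorted(times)
    intervals = []
    start = end = times[0]
    for t in times[1:]:
        if t - end <= max_gap:
            end = t  # close enough: extend the current cluster
        else:
            intervals.append((max(0.0, start - pad), end + pad))
            start = end = t  # gap too large: start a new cluster
    intervals.append((max(0.0, start - pad), end + pad))
    return intervals
```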
133. A method of indexing and searching timed media files, as recited in claim 132, wherein said occurrences of the information representation include occurrences of synonyms and other information representations that have very similar meanings.
134. A method of indexing and searching timed media files, as recited in claim 97, wherein
said calculation of at least one relevance interval for a given information representation includes the consideration of the temporal distribution of occurrences of said information representation; and
said occurrences of the information representation include occurrences of one or more named entities that refer to the information representation.
135. A method of indexing and searching timed media files, as recited in claim 115, wherein
said calculation of at least one relevance interval for a given information representation includes the consideration of the temporal distribution of occurrences of said information representation; and
said occurrences of the information representation include occurrences of one or more anaphora that have been determined to refer to the information representation.
136. A method of indexing and searching timed media files, as recited in claim 132, wherein
said processing of language includes the determination of sentences within the language; and
said calculation of relevance intervals includes considering the initial relevance intervals for each spoken information representation to be the sentences that contain occurrences of said spoken information representation.
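Under claim 136, each initial relevance interval for a spoken term is a sentence that contains it. The sketch assumes sentence boundaries arrive as a time-ordered list of `(start, end)` pairs in seconds. A binary search over the sentence start times then finds the containing sentence for each occurrence. `sentence_intervals` is a hypothetical name.

```python
import bisect


def sentence_intervals(sentences, occurrence_times):
    """Return each sentence (start, end) that contains at least one occurrence.

    `sentences` is a time-ordered list of non-overlapping (start, end) pairs in
    seconds. Each occurrence is located by binary search on sentence starts.
    """
    starts = [s for s, _ in sentences]
    result = []
    for t in sorted(occurrence_times):
        i = bisect.bisect_right(starts, t) - 1  # last sentence starting at or before t
        if i >= 0 and t <= sentences[i][1]:
            if not result or result[-1] != sentences[i]:
                result.append(sentences[i])  # skip repeats of the same sentence
    return result
```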
137. A method of indexing and searching timed media files, as recited in claim 38, wherein
said processing of language includes the association of a time interval with at least one information representation that is contained in visible text; and
said calculation of relevance intervals includes considering the initial relevance intervals for each information representation that is contained in visible text to be the time interval associated with said information representation.
138. A method of indexing and searching timed media files, as recited in claim 1, wherein said calculation of at least one relevance interval for a given information representation includes the combining of any adjacent or overlapping relevance intervals.
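Claim 138 combines adjacent or overlapping relevance intervals. A standard sort-and-sweep merge does this in O(n log n) time. The sketch counts two intervals as adjacent only when one starts exactly where the previous one ends. That is an assumption: allowing a small gap would be an equally plausible reading.

```python
def merge_intervals(intervals):
    """Combine adjacent or overlapping (start, end) intervals.

    Two intervals are merged when the next one starts at or before the current
    one ends, so touching intervals such as (0, 5) and (5, 8) become (0, 8).
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```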
139. A method of indexing and searching timed media files, as recited in claim 49, wherein said calculation of at least one relevance interval for a given information representation includes the expansion of relevance intervals based on logical structure cues.
140. A method of indexing and searching timed media files, as recited in claim 139, wherein said expansion of relevance intervals based on logical structure cues includes expansion to include information that is required to understand the context of the said information representation.
141. A method of indexing and searching timed media files, as recited in claim 139, wherein said expansion of relevance intervals based on logical structure cues includes expansion to include information that is referring to the said information representation.
142. A method of indexing and searching timed media files, as recited in claim 63, wherein said calculation of at least one relevance interval for a given information representation includes the expansion of relevance intervals to include content that contains information representations that have a sufficiently high degree of semantic relatedness with the said information representation.
143. A method of indexing and searching timed media files, as recited in claim 76, wherein said calculation of at least one relevance interval for a given information representation includes the determination of whether to expand relevance intervals to begin or end at topic shifts.
144. A method of indexing and searching timed media files, as recited in claim 76, wherein said calculation of at least one relevance interval for a given information representation includes the determination of whether to expand relevance intervals to entire topic segments.
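Claims 143 and 144 decide whether to stretch a relevance interval out to topic-shift boundaries, but they do not state the decision rule. The sketch assumes topic shifts come as a sorted list of times in seconds. It then applies a hypothetical rule: expand to the surrounding shifts only when the interval already covers at least `min_fraction` of the span between them.

```python
import bisect


def maybe_expand_to_shifts(interval, shifts, media_end, min_fraction=0.5):
    """Widen (start, end) to the nearest surrounding topic shifts if it covers enough.

    `shifts` is a sorted list of topic-shift times in seconds. The new start is
    the last shift at or before `start` (or 0.0) and the new end is the first
    shift at or after `end` (or `media_end`). `min_fraction` is hypothetical.
    """
    start, end = interval
    i = bisect.bisect_right(shifts, start) - 1
    seg_start = shifts[i] if i >= 0 else 0.0
    j = bisect.bisect_left(shifts, end)
    seg_end = shifts[j] if j < len(shifts) else media_end
    seg_len = seg_end - seg_start
    if seg_len > 0 and (end - start) / seg_len >= min_fraction:
        return (seg_start, seg_end)
    return interval  # covers too little of the span: leave unchanged
```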
145. A method of indexing and searching timed media files, as recited in claim 1, further comprising the step of calculating a magnitude of relevance for at least one relevance interval.
146. A method of indexing and searching timed media files, as recited in claim 145, wherein said calculation of a magnitude of relevance for a relevance interval includes calculating the number of occurrences of the indexing information representation.
147. A method of indexing and searching timed media files, as recited in claim 145, wherein said calculation of a magnitude of relevance for a relevance interval includes calculating the number of anaphora that refer to the indexing information representation.
148. A method of indexing and searching timed media files, as recited in claim 145, wherein said calculation of a magnitude of relevance for a relevance interval includes calculating the length of the relevance interval.
149. A method of indexing and searching timed media files, as recited in claim 145, wherein said calculation of a magnitude of relevance for a relevance interval includes the use of a degree of semantic relatedness between the indexing information representation and other information representations occurring within the relevance interval.
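Claims 146 to 149 list factors that may feed into a magnitude of relevance:
- the number of occurrences,
- the number of anaphora,
- the interval length,
- semantic relatedness to other terms in the interval.

These claims give no formula. The weighted density below is therefore only one hypothetical way to combine the factors. The weights `w_occ`, `w_ana` and `w_rel` are made-up defaults.

```python
def relevance_magnitude(n_occurrences, n_anaphora, length_seconds,
                        relatedness_scores, w_occ=1.0, w_ana=0.5, w_rel=0.25):
    """Hypothetical magnitude of relevance for one relevance interval.

    Occurrences and resolved anaphora (weighted lower) are counted per minute
    of interval length, so a long interval is not favoured just for being long;
    the summed semantic-relatedness scores of other terms in the interval are
    then added with weight `w_rel`.
    """
    minutes = max(length_seconds / 60.0, 1e-9)  # guard against zero length
    density = (w_occ * n_occurrences + w_ana * n_anaphora) / minutes
    return density + w_rel * sum(relatedness_scores)
```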
150. A method of indexing and searching timed media files, as recited in claim 5, further comprising the steps of
calculating a magnitude of relevance for each said relevance interval; and
storing said magnitude of relevance for each said relevance interval in said search index.
151. A method of indexing and searching timed media files, as recited in claim 145, wherein said magnitudes of relevance are measures of a degree to which said relevance interval is relevant to said at least one information representation.
152. A method of indexing and searching timed media files, as recited in claim 1, further comprising the step of creating at least one virtual document comprising a collection of at least one of said relevance intervals.
153. A method of indexing and searching timed media files, as recited in claim 152, wherein said collection of at least one of said relevance intervals contains relevance intervals from multiple media files.
154. A method of indexing and searching timed media files, as recited in claim 152, further comprising the step of creating a search index containing said at least one virtual document.
155. A method of indexing and searching timed media files, as recited in claim 152, further comprising the step of calculating a magnitude of relevance for each said at least one virtual document.
156. A method of indexing and searching timed media files, as recited in claim 155, further comprising the steps of
calculating a magnitude of relevance for each said virtual document; and
storing said magnitude of relevance for each said virtual document in said search index.
157. A method of indexing and searching timed media files, as recited in claim 155, wherein said magnitudes of relevance are measures of a degree to which said virtual document is relevant to said at least one information representation.
158. A method of indexing and searching timed media files, as recited in claim 1, further comprising the step of receiving at least one query information representation.
159. A method of indexing and searching timed media files, as recited in claim 158, further comprising the step of determining if at least two query information representations have been input.
160. A method of indexing and searching timed media files, as recited in claim 159, wherein when at least two query information representations have been input, further comprises the step of determining if a user has requested a degree of accuracy higher than the default.
161. A method of indexing and searching timed media files, as recited in claim 160, wherein when said user has not requested a search with a degree of accuracy higher than the default, further comprises the steps of
creating a search index containing at least one relevance interval or virtual document; and
comparing said at least two query information representations to said search index to perform the step of creating at least one virtual document.
162. A method of indexing and searching timed media files, as recited in claim 160, wherein when said user has requested a search with a degree of accuracy higher than the default, further comprises the steps of
creating a raw data index containing data used in and produced by said calculation of at least one relevance interval; and
comparing said at least two query information representations to said raw data index to perform the step of creating at least one virtual document.
163. A method of indexing and searching timed media files, as recited in claim 159, wherein when at least two query information representations have been input, further comprises the step of calculating a merged virtual document that is relevant to all of the said at least two query information representations.
164. A method of indexing and searching timed media files, as recited in claim 152, wherein said virtual documents are displayed.
165. A method of indexing and searching timed media files, as recited in claim 164, wherein said display of said virtual documents includes playing the said collection of relevance intervals back-to-back.
166. A method of indexing and searching timed media files, as recited in claim 164, wherein the viewer can expand the display of said virtual documents to include playing additional portions of the timed media file.
167. A method of indexing and searching timed media files, as recited in claim 155, wherein a ranking of said virtual documents according to said magnitudes of relevance is displayed.
168. A method of indexing and searching timed media files, as recited in claim 1, further comprising the step of displaying at least one relevance interval to the user.
169. A method of indexing and searching timed media files, as recited in claim 145, wherein a ranking of said relevance intervals according to said magnitudes of relevance is displayed.
170. A method of indexing and searching timed media files, as recited in claim 163, wherein said calculation of a merged virtual document includes the calculation of a relevant virtual document and a highly relevant virtual document.
171. A method of indexing and searching timed media files, as recited in claim 163, wherein said calculation of a merged virtual document includes the calculation of the intersection of the virtual documents associated with each of the query information representations for a given media file.
172. A method of indexing and searching timed media files, as recited in claim 171, wherein said calculation of a merged virtual document includes the calculation of the minimal expansion of the intersection of the virtual documents associated with each of the query information representations for a given media file such that the expanded intersection includes at least one occurrence of each query information representation.
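Claim 172 expands the intersection just enough to include at least one occurrence of every query term. Read "minimal" as "shortest resulting interval" and apply it to one interval of the intersection. One exact approach then works like this:
- Try each candidate left edge: the original start or any earlier occurrence.
- For each left edge, move the right edge out to the earliest later occurrence of every term still missing.

The reading of "minimal", the `occurrences` mapping from term to occurrence times, and the function name are assumptions.

```python
def minimal_expansion(interval, occurrences):
    """Shortest (left, right) containing `interval` and one occurrence of every term.

    `occurrences` maps each query term to its occurrence times in seconds.
    Returns None when some term never occurs in the media file.
    """
    s, e = interval
    # The best left edge is either the original start or an occurrence before it.
    lefts = {s} | {t for times in occurrences.values() for t in times if t < s}
    best = None
    for left in sorted(lefts):
        right = e
        for times in occurrences.values():
            if any(left <= t <= e for t in times):
                continue  # term already present in [left, e]
            later = [t for t in times if t > e]
            if not later:
                break  # term cannot be covered with this left edge
            right = max(right, min(later))
        else:  # every term is covered by [left, right]
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
    return best
```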
173. A method of indexing and searching timed media files, as recited in claim 163, wherein said calculation of a merged virtual document includes the calculation of the union of the virtual documents associated with each of the query information representations for a given media file.
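Claims 171 and 173 merge the per-term virtual documents for one media file by intersection or by union. The sketch assumes each virtual document is a sorted list of non-overlapping `(start, end)` intervals. The union is then a sort-and-sweep, and the intersection is a two-pointer sweep. `union` and `intersection` are hypothetical helper names.

```python
def union(a, b):
    """Union of two lists of (start, end) intervals, merging overlaps and touches."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection(a, b):
    """Intersection of two sorted lists of non-overlapping (start, end) intervals."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # keep only overlaps of positive length
            out.append((start, end))
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out
```

With three or more query terms, the pairwise functions can be folded with `functools.reduce`, for example `reduce(intersection, docs)`.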
174. A method of indexing and searching timed media files, as recited in claim 163, further comprising the step of calculating a relevance magnitude for the said merged virtual document.
2. A method of indexing and searching timed media files, as recited in claim 1, comprising the steps of:
Current Assignee: Comcast Cable Communications Management LLC (Comcast Corporation)
Original Assignee: StreamSage Inc (Comcast Corporation)
Inventors: Rubinoff, Robert; Sibley, Tim V.; Aveni-Deforge, Kyle; Davis, Anthony Ruiz; Unger, Noam Carl; Morton, Michael Scott
Granted Patent
US Class Current: 1/1
CPC Class Codes:
G06F 16/245 Query processing
G06F 16/31 Indexing; Data structures t...
G06F 16/3344 using natural language anal...
G06F 16/41 Indexing; Data structures t...
G06F 16/435 Filtering based on addition...
G06F 16/447 Temporal browsing, e.g. tim...
G06F 16/7844 using original textual cont...
G06F 40/205 Parsing
G06F 40/253 Grammatical analysis; Style...
G06F 40/263 Language identification
Y10S 707/913 Multimedia
Y10S 707/99931 Database or file accessing
Y10S 707/99935 Query augmenting and refini...