Language recognition using sequence frequency
Abstract
A system is provided for comparing an input query with a number of stored annotations to identify information to be retrieved from a database. The comparison technique divides the input query into a number of fixed-size fragments and identifies how many times each of the fragments occurs within each annotation using a dynamic programming matching technique. The frequencies of occurrence of the fragments in both the query and the annotation are then compared to provide a measure of the similarity between the query and the annotation. The information to be retrieved is then determined from the similarity measures obtained for all the annotations.
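The fragment-frequency comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: fragments are matched exactly rather than with the dynamic programming technique the abstract mentions, cosine similarity stands in for the unspecified frequency comparison, and the function names (`fragments`, `frequency_similarity`) are hypothetical.

```python
from collections import Counter
from math import sqrt

def fragments(labels, n=3):
    """Return every overlapping run of n consecutive labels as a tuple."""
    return [tuple(labels[i:i + n]) for i in range(len(labels) - n + 1)]

def frequency_similarity(query, annotation, n=3):
    """Compare how often each query fragment occurs in the query and in the annotation.

    Assumption: cosine similarity over the query's distinct fragments is used
    as the comparison; the source does not prescribe this measure.
    """
    query_counts = Counter(fragments(query, n))
    annotation_counts = Counter(fragments(annotation, n))
    q_vec = [query_counts[f] for f in query_counts]
    a_vec = [annotation_counts[f] for f in query_counts]  # 0 if the fragment is absent
    dot = sum(q * a for q, a in zip(q_vec, a_vec))
    q_norm = sqrt(sum(q * q for q in q_vec))
    a_norm = sqrt(sum(a * a for a in a_vec))
    return dot / (q_norm * a_norm) if q_norm and a_norm else 0.0

# Labels here are single characters; in practice they might be phonemes.
sim = frequency_similarity(list("abcabc"), list("abcabd"))
```

To rank several annotations, one would compute this score for each annotation against the same query and retrieve the information associated with the highest-scoring ones.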
130 Claims
1. A comparison apparatus comprising:
a receiver operable to receive first and second sequences of labels;
an identifier operable to identify a plurality of different first sub-sequences of labels within said first sequence of labels;
a first determiner operable to determine and to output the number of times each of said different first sub-sequences occurs within said first sequence of labels;
a definer operable to define a plurality of second sub-sequences of labels from said second sequence of labels;
a second determiner operable to determine and to output the number of times each of said different first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
a similarity measure calculator operable to calculate a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determiner with the numbers output from said second determiner;
wherein said second determiner comprises:
a comparator operable to compare a current first sub-sequence of labels with each second sub-sequence of labels using predetermined data including confusion information which defines confusability between different labels, to provide a set of sub-sequence similarity measures; and
a counter operable to count the number of times the current first sub-sequence of labels occurs within the second sequence of labels in dependence upon the set of sub-sequence similarity measures provided by said comparator for the current first sub-sequence of labels.
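The comparator and counter of claim 1 can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration: the confusion information is modeled as a symmetric table of pairwise confusability scores, sub-sequences are compared position by position with no insertions or deletions, and an occurrence is counted whenever the similarity measure reaches a hypothetical threshold.

```python
def windows(labels, n):
    """Every overlapping run of n consecutive labels (the second sub-sequences)."""
    return [tuple(labels[i:i + n]) for i in range(len(labels) - n + 1)]

def label_similarity(a, b, confusion):
    """1.0 for identical labels, else the tabulated confusability (0.0 if absent)."""
    if a == b:
        return 1.0
    return confusion.get((a, b), confusion.get((b, a), 0.0))

def compare(first_sub, second_subs, confusion):
    """Comparator: one similarity measure per second sub-sequence."""
    measures = []
    for second_sub in second_subs:
        score = 1.0
        for a, b in zip(first_sub, second_sub):
            score *= label_similarity(a, b, confusion)
        measures.append(score)
    return measures

def count_occurrences(first_sub, second_labels, confusion, threshold=0.5):
    """Counter: occurrences are the measures that reach the (assumed) threshold."""
    second_subs = windows(second_labels, len(first_sub))
    measures = compare(first_sub, second_subs, confusion)
    return sum(1 for m in measures if m >= threshold)

# Hypothetical confusability scores between easily confused labels.
CONFUSION = {("b", "p"): 0.8, ("d", "t"): 0.7}
count = count_occurrences(("b", "a", "t"), list("patbadcat"), CONFUSION)
```

With these scores, "bat" never occurs exactly in the second sequence, yet the windows "pat" and "bad" both count as occurrences because their labels are confusable with those of "bat".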
46. A comparison apparatus comprising:
a receiver operable to receive first and second sequences of labels;
an identifier operable to identify a plurality of different first sub-sequences of labels within said first sequence of labels;
a first determiner operable to determine the number of times each of said different first sub-sequences occurs within said first sequence of labels;
a second determiner operable to determine the number of times each of said different first sub-sequences occurs within said second sequence of labels; and
a similarity measure calculator operable to calculate a similarity score measure representative of the similarity between the first and second sequences of labels using the numbers obtained from said first and second determiners;
wherein the apparatus further comprises a third determiner operable to determine the total number of sub-sequences of labels in said second sequence; and
in that said similarity score calculator comprises:
a first sub-calculator operable to calculate a measure of the probability of each of said first sub-sequences occurring in said second sequence of labels using the numbers obtained from said second determiner and the number obtained from said third determiner; and
a second sub-calculator operable to calculate said similarity score by taking products of said computed probability measures in dependence upon said numbers obtained from said first determiner.
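One possible reading of the two sub-calculators in claim 46 is sketched below. It assumes that the probability measure is the relative frequency of each first sub-sequence in the second sequence, and that "taking products ... in dependence upon" the first-sequence counts means raising each probability to the power of that count. The log domain and the `floor` value for unseen sub-sequences are added here only for numerical convenience; none of these choices is stated in the claim.

```python
from collections import Counter
from math import log

def ngrams(labels, n):
    """Every overlapping run of n consecutive labels as a tuple."""
    return [tuple(labels[i:i + n]) for i in range(len(labels) - n + 1)]

def log_similarity_score(first, second, n=2, floor=1e-6):
    """Log of the product of P(sub | second) ** count_in_first(sub).

    Assumption: P is a relative frequency, floored so that a sub-sequence
    absent from the second sequence does not produce log(0).
    """
    first_counts = Counter(ngrams(first, n))    # first determiner
    second_counts = Counter(ngrams(second, n))  # second determiner
    total_second = sum(second_counts.values())  # third determiner
    score = 0.0
    for sub, n_first in first_counts.items():
        p = second_counts[sub] / total_second if total_second else 0.0
        score += n_first * log(max(p, floor))
    return score

# "abab" contains ab twice and ba once; "abba" contains ab, bb, ba once each.
score = log_similarity_score(list("abab"), list("abba"))
```

Here each of the two distinct sub-sequences of the first sequence has probability 1/3 in the second sequence, so the score is the log of (1/3) cubed.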
58. A comparison method comprising the steps of:
receiving first and second sequences of labels;
identifying a plurality of different first sub-sequences of labels within said first sequence of labels;
a first determining step of determining and outputting the number of times each of said different first sub-sequences occurs within said first sequence of labels;
defining a plurality of second sub-sequences of labels from said second sequence of labels;
a second determining step of determining and outputting the number of times each of said different first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
computing a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determining step with the numbers output from said second determining step;
wherein said second determining step comprises the steps of:
comparing a current first sub-sequence of labels with each second sub-sequence of labels using predetermined data including confusion information which defines confusability between different labels, to provide a set of sub-sequence similarity measures; and
counting the number of times the current first sub-sequence of labels occurs within the second sequence of labels in dependence upon the set of sub-sequence similarity measures provided by the comparing step for the current first sub-sequence of labels.
105. A comparison method comprising the steps of:
receiving first and second sequences of labels;
identifying a plurality of different first sub-sequences of labels within said first sequence of labels;
a first obtaining step of obtaining the number of times each of said different first sub-sequences occurs within said first sequence of labels;
a second obtaining step of obtaining the number of times each of said different first sub-sequences occurs within said second sequence of labels; and
computing a similarity score representative of the similarity between the first and second sequences of labels using the numbers obtained from said first and second obtaining steps;
wherein the method further comprises a third obtaining step of obtaining the total number of sub-sequences of labels in said second sequence; and
in that said computing step comprises:
a first computing step of computing a measure of the probability of each of said first sub-sequences occurring in said second sequence of labels using the numbers obtained from said second obtaining step and the number obtained from said third obtaining step; and
a second computing step of computing said similarity score by taking products of said computed probability measures in dependence upon said numbers obtained from said first obtaining step.
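The two computing steps of claim 105 can be written compactly as equations. The notation is an assumption introduced here for illustration: $n_1(s_i)$ and $n_2(s_i)$ are the numbers of times the $i$-th first sub-sequence occurs in the first and second sequences, $N_2$ is the total number of sub-sequences in the second sequence, and the exponent form is one plausible reading of "taking products ... in dependence upon" the first-sequence counts.

```latex
% First computing step: probability measure of each first sub-sequence
% occurring in the second sequence (assumed to be a relative frequency).
P(s_i) = \frac{n_2(s_i)}{N_2}

% Second computing step: similarity score as a product of those measures,
% each taken n_1(s_i) times (an assumed reading of the claim language).
\mathrm{score} = \prod_{i} P(s_i)^{\,n_1(s_i)}
```

For example, if two distinct sub-sequences occur twice and once in the first sequence and each has $P(s_i) = 1/3$ in the second sequence, the score under this reading is $(1/3)^2 \cdot (1/3)^1 = 1/27$.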
117. A computer readable medium storing processor implementable process steps for carrying out a comparison method, the process steps comprising:
a step of receiving first and second sequences of labels;
a step of identifying a plurality of different first sub-sequences of labels within said first sequence of labels;
a first determining step of determining and outputting the number of times each of said different first sub-sequences occurs within said first sequence of labels;
a step of defining a plurality of second sub-sequences of labels from said second sequence of labels;
a second determining step of determining and outputting the number of times each of said different first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
a step of computing a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determining step with the numbers output from said second determining step;
wherein said second determining step comprises:
a step of comparing a current first sub-sequence of labels with each second sub-sequence of labels using predetermined data including confusion information which defines confusability between different labels, to provide a set of sub-sequence similarity measures; and
a step of counting the number of times the current first sub-sequence of labels occurs within the second sequence of labels in dependence upon the set of sub-sequence similarity measures provided by the comparing step for the current first sub-sequence of labels.
118. A computer readable medium storing processor implementable process steps for carrying out a comparison method, the process steps comprising:
a step of receiving first and second sequences of labels;
a step of identifying a plurality of different first sub-sequences of labels within said first sequence of labels;
a first obtaining step of obtaining the number of times each of said different first sub-sequences occurs within said first sequence of labels;
a second obtaining step of obtaining the number of times each of said different first sub-sequences occurs within said second sequence of labels; and
a step of computing a similarity score representative of the similarity between the first and second sequences of labels using the numbers obtained from said first and second obtaining steps;
wherein the process steps further comprise a third obtaining step of obtaining the total number of sub-sequences of labels in said second sequence; and
in that said computing step comprises:
a first computing step of computing a measure of the probability of each of said first sub-sequences occurring in said second sequence of labels using the numbers obtained from said second obtaining step and the number obtained from said third obtaining step; and
a second computing step of computing said similarity score by taking products of said computed probability measures in dependence upon said numbers obtained from said first obtaining step.
120. Processor implementable instructions for carrying out a comparison method, the process steps comprising:
a step of receiving first and second sequences of labels;
a step of identifying a plurality of different first sub-sequences of labels within said first sequence of labels;
a first determining step of determining and outputting the number of times each of said different first sub-sequences occurs within said first sequence of labels;
a step of defining a plurality of second sub-sequences of labels from said second sequence of labels;
a second determining step of determining and outputting the number of times each of said different first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
a step of computing a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determining step with the numbers output from said second determining step;
wherein said second determining step comprises:
a step of comparing a current first sub-sequence of labels with each second sub-sequence of labels using predetermined data including confusion information which defines confusability between different labels, to provide a set of sub-sequence similarity measures; and
counting the number of times the current first sub-sequence of labels occurs within the second sequence of labels in dependence upon the set of sub-sequence similarity measures provided by the comparing step for the current first sub-sequence of labels.
121. Processor implementable instructions for carrying out a comparison method, comprising:
a step of receiving first and second sequences of labels;
a step of identifying a plurality of different first sub-sequences of labels within said first sequence of labels;
a first obtaining step of obtaining the number of times each of said different first sub-sequences occurs within said first sequence of labels;
a second obtaining step of obtaining the number of times each of said different first sub-sequences occurs within said second sequence of labels; and
a step of computing a similarity score representative of the similarity between the first and second sequences of labels using the numbers obtained from said first and second obtaining steps;
wherein the process steps further comprise a third obtaining step of obtaining the total number of sub-sequences of labels in said second sequence; and
in that said computing step comprises:
a first computing step of computing a measure of the probability of each of said first sub-sequences occurring in said second sequence of labels using the numbers obtained from said second obtaining step and the number obtained from said third obtaining step; and
a second computing step of computing said similarity score by taking products of said computed probability measures in dependence upon said numbers obtained from said first obtaining step.
123. A comparison apparatus comprising:
means for receiving first and second sequences of labels;
means for identifying a plurality of different first sub-sequences of labels within said first sequence of labels;
first determining means for determining and outputting the number of times each of said different first sub-sequences occurs within said first sequence of labels;
means for defining a plurality of second sub-sequences of labels from said second sequence of labels;
second determining means for determining and outputting the number of times each of said different first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
means for computing a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determining means with the numbers output from said second determining means;
wherein said second determining means comprises:
means for comparing a current first sub-sequence of labels with each second sub-sequence of labels using predetermined data including confusion information which defines confusability between different labels, to provide a set of intermediate similarity measures; and
means for counting the number of times the current first sub-sequence of labels occurs within the second sequence of labels in dependence upon the set of intermediate similarity measures provided by said comparing means for the current first sub-sequence of labels.
126. A comparison apparatus comprising:
means for receiving first and second sequences of labels;
means for identifying a plurality of different first sub-sequences of labels within said first sequence of labels;
first obtaining means for obtaining the number of times each of said different first sub-sequences occurs within said first sequence of labels;
second obtaining means for obtaining the number of times each of said different first sub-sequences occurs within said second sequence of labels; and
means for computing a similarity score representative of the similarity between the first and second sequences of labels using the numbers obtained from said first and second obtaining means;
wherein the apparatus further comprises third obtaining means for obtaining the total number of sub-sequences of labels in said second sequence; and
in that said computing means comprises:
first computing means for computing a measure of the probability of each of said first sub-sequences occurring in said second sequence of labels using the numbers obtained from said second obtaining means and the number obtained from said third obtaining means; and
second computing means for computing said similarity score by taking products of said computed probability measures in dependence upon said numbers obtained from said first obtaining means.
127. A comparison apparatus comprising:
a receiver operable to receive first and second sequences of labels;
an identifier operable to identify a plurality of first sub-sequences of labels within said first sequence of labels;
a first determiner operable to determine and to output the number of times each of said first sub-sequences occurs within said first sequence of labels;
a definer operable to define a plurality of second sub-sequences of labels from said second sequence of labels;
a second determiner operable to determine and to output the number of times each of said first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
a similarity measure calculator operable to calculate a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determiner with the numbers output from said second determiner.
128. A comparison method comprising:
receiving first and second sequences of labels;
identifying a plurality of first sub-sequences of labels within said first sequence of labels;
a first determining step of determining and outputting the number of times each of said first sub-sequences occurs within said first sequence of labels;
defining a plurality of second sub-sequences of labels from said second sequence of labels;
a second determining step of determining and outputting the number of times each of said first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
computing a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determining step with the numbers output from said second determining step.
129. A computer readable medium storing processor implementable process steps for carrying out a comparison method, the process steps comprising:
a step of receiving first and second sequences of labels;
a step of identifying a plurality of first sub-sequences of labels within said first sequence of labels;
a first determining step of determining and outputting the number of times each of said first sub-sequences occurs within said first sequence of labels;
a step of defining a plurality of second sub-sequences of labels from said second sequence of labels;
a second determining step of determining and outputting the number of times each of said first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
a step of computing a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determining step with the numbers output from said second determining step.
130. Processor implementable instructions for carrying out a comparison method, the instructions comprising:
instructions for receiving first and second sequences of labels;
instructions for identifying a plurality of first sub-sequences of labels within said first sequence of labels;
instructions for a first determining step of determining and outputting the number of times each of said first sub-sequences occurs within said first sequence of labels;
instructions for defining a plurality of second sub-sequences of labels from said second sequence of labels;
instructions for a second determining step of determining and outputting the number of times each of said first sub-sequences occurs within said second sequence of labels by comparing each first sub-sequence of labels with each second sub-sequence of labels; and
instructions for computing a measure of the similarity between the first and second sequences of labels by comparing the numbers output from said first determining step with the numbers output from said second determining step.
Specification