Region-Matching Transducers for Text-Characterization
First Claim
1. A computer implemented method, comprising:
(A) recording in computer memory (a) computer readable instructions for performing text-characterization, (b) input data and (c) a finite state transducer that along each path (i) accepts on a first side an n-gram representing a text-characterization and (ii) outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations;
(B) processing the computer readable instructions with a computer processor;
(C) wherein the computer processor in processing the computer readable instructions;
(a) applying the finite state transducer to the input data to identify n-grams of text-characterization in the input data accepted by the finite state transducer on the first side;
for each n-gram accepted by the finite state transducer on the first side, incrementing a frequency counter associated with the one or more text-characterizations in the set of text-characterizations;
(b) assigning the input data one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith.
6 Assignments
0 Petitions
Abstract
Computer methods, apparatus, and articles of manufacture therefor are disclosed for text-characterization using a finite state transducer that along each path accepts on a first side an n-gram of text-characterization (e.g., a language or a topic) and outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations. The finite state transducer is applied to input data. For each n-gram accepted by the finite state transducer, a frequency counter associated with the n-gram of the one or more text-characterizations in the set of text-characterizations is incremented. The input data is classified as one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith.
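The flow described in the abstract (apply the transducer, count accepted n-grams per characterization, classify by the counts) can be illustrated with a minimal Python sketch. It is not the patented implementation: a dictionary-based trie stands in for a real finite state transducer, the names `build_transducer`, `apply_transducer` and `characterize` are hypothetical, and the n-gram table `TABLE` with its language labels is made up for the example. Picking the label(s) with the highest count is only one possible way of "using the frequency counters".

```python
from collections import Counter

def build_transducer(ngram_labels):
    """Build a trie in which each path spells an n-gram (input side) and the
    node ending that path stores the labels emitted on the output side."""
    root = {"next": {}, "out": ()}
    for ngram, labels in ngram_labels.items():
        node = root
        for ch in ngram:
            node = node["next"].setdefault(ch, {"next": {}, "out": ()})
        node["out"] = tuple(labels)
    return root

def apply_transducer(root, text):
    """Yield (ngram, labels) for every n-gram accepted at any text position."""
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node["next"].get(text[end])
            if node is None:
                break  # no path continues with this character
            if node["out"]:
                yield text[start:end + 1], node["out"]

def characterize(root, text):
    """Increment one counter per emitted label, then return the label(s)
    with the highest count together with all counters."""
    counts = Counter()
    for _ngram, labels in apply_transducer(root, text):
        for label in labels:
            counts[label] += 1
    if not counts:
        return [], counts
    best = max(counts.values())
    return sorted(l for l, c in counts.items() if c == best), counts

# Hypothetical n-gram table: an n-gram may identify more than one language.
TABLE = {"the": ["en"], "ing": ["en"], "sch": ["de"], "der": ["de"], "er ": ["en", "de"]}
FST = build_transducer(TABLE)
labels, counters = characterize(FST, "der schuh")
print(labels, dict(counters))
```

An n-gram shared by several characterizations (here `"er "`) increments every associated counter, which mirrors the "one or more text-characterizations" wording above.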
54 Citations
20 Claims
1. A computer implemented method, comprising:
(A) recording in computer memory (a) computer readable instructions for performing text-characterization, (b) input data and (c) a finite state transducer that along each path (i) accepts on a first side an n-gram representing a text-characterization and (ii) outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations;
(B) processing the computer readable instructions with a computer processor;
(C) wherein the computer processor in processing the computer readable instructions;
(a) applying the finite state transducer to the input data to identify n-grams of text-characterization in the input data accepted by the finite state transducer on the first side;
for each n-gram accepted by the finite state transducer on the first side, incrementing a frequency counter associated with the one or more text-characterizations in the set of text-characterizations;
(b) assigning the input data one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computing apparatus, comprising:
a memory for recording (a) computer readable instructions for performing text-characterization, (b) input data and (c) a finite state transducer that along each path (i) accepts on a first side an n-gram representing a text-characterization and (ii) outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations;
a processor for processing the computer readable instructions;
wherein the processor in processing the computer readable instructions;
(a) applying the finite state transducer to the input data to identify n-grams of text-characterization in the input data accepted by the finite state transducer on the first side;
for each n-gram accepted by the finite state transducer on the first side, incrementing a frequency counter associated with the one or more text-characterizations in the set of text-characterizations;
(b) assigning the input data one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith.
Dependent claims: 13, 14
15. A computing apparatus, comprising:
a memory for recording (a) computer readable instructions for performing text-characterization, (b) input data and (c) a finite state transducer that along each path (i) accepts on a first side an n-gram representing a text-characterization and (ii) outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations;
an FST engine (a) for applying the finite state transducer to the input data to identify n-grams of text-characterization in the input data accepted by the finite state transducer on the first side;
for each n-gram accepted by the finite state transducer on the first side, incrementing a frequency counter associated with the one or more text-characterizations in the set of text-characterizations; and
(b) assigning the input data one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith.
Dependent claims: 16, 17
18. An article of manufacture comprising computer usable media including computer readable instructions embedded therein that causes a computer to perform a method, wherein the method comprises:
recording (a) input data and (b) a finite state transducer that along each path (i) accepts on a first side an n-gram representing a text-characterization and (ii) outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations;
applying the finite state transducer to the input data to identify n-grams of text-characterization in the input data accepted by the finite state transducer on the first side;
for each n-gram accepted by the finite state transducer on the first side, incrementing a frequency counter associated with the one or more text-characterizations in the set of text-characterizations;
assigning the input data one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith.
Dependent claims: 19, 20
Specification