Systems and methods for detecting text
Abstract
The subject invention relates to facilitating text detection. The invention employs a boosted classifier and a transductive classifier to provide accurate and efficient text detection systems and/or methods. The boosted classifier is trained through features generated from a set of training connected components and labels. The boosted classifier utilizes the features to classify the training connected components, wherein inferred labels are conveyed to a transductive classifier, which generates additional properties. The initial set of features and the properties are utilized to train the transductive classifier. Upon training, the system and/or methods can be utilized to detect text in data under text detection, wherein unlabeled data is received, and connected components are extracted therefrom and utilized to generate corresponding feature vectors, which are employed to classify the connected components using the initial boosted classifier. Inferred labels are utilized to generate properties, which are utilized along with the initial feature vectors to classify each connected component using the transductive classifier.
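The abstract does not say which boosting algorithm the first stage uses. As a minimal sketch of that stage only, the following assumes discrete AdaBoost over single-feature threshold stumps, with labels +1 for text and -1 for non-text. The names `train_boosted`, `score`, and `predict` are hypothetical and do not come from the patent.

```python
import math


def train_boosted(X, y, rounds=5):
    """Discrete AdaBoost over threshold stumps (an assumed choice of booster).

    X: list of feature vectors, one per connected component.
    y: list of labels, +1 (text) or -1 (non-text).
    Returns a list of weak learners as (alpha, feature, threshold, sign).
    """
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for sign in (1, -1):
                    preds = [sign if x[f] >= thr else -sign for x in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign, preds)
        err, f, thr, sign, preds = best
        # Clamp the error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        # Raise the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * t * p)
                   for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def score(model, x):
    """Weighted vote of all stumps; the sign gives the inferred label."""
    return sum(a * (s if x[f] >= t else -s) for a, f, t, s in model)


def predict(model, x):
    """Inferred label for one feature vector: +1 (text) or -1 (non-text)."""
    return 1 if score(model, x) >= 0 else -1
```

In the flow the abstract describes, the labels returned by something like `predict` would be the "inferred labels" handed to the second, transductive stage.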
18 Claims
1. A system, embodied on a computer-readable storage medium, that when executed, facilitates detecting text in data, comprising:
an input component that receives data;
a connected components identifier that generates a set of connected components from the data, the connected components are utilized to generate a set of features for training and testing; and
a classification component that automatically detects text in the data via a transductive classifier employed in connection with a trained boosted classifier, the trained boosted classifier infers labels for the training connected components, the inferred labels are subjected to a clustering process by which the set of training features feature sets are expanded to define training properties, the transductive classifier is trained based in part upon the training properties.
Dependent claims: 2, 3, 4, 5, 6
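Claim 1 does not state how the connected components are found or which features are derived from them. The sketch below assumes a binarized image given as rows of 0/1 values, 4-connectivity, and a breadth-first flood fill. The feature vector (bounding-box width, bounding-box height, pixel count) is a hypothetical stand-in chosen only for illustration.

```python
from collections import deque


def connected_components(image):
    """Label foreground pixels (value 1) into 4-connected components.

    image: list of equal-length rows containing 0 or 1.
    Returns a list of components, each a list of (row, col) pixels,
    in row-major order of each component's first pixel.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def feature_vector(pixels):
    """Hypothetical features: bounding-box width, height, and pixel count."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return [max(xs) - min(xs) + 1, max(ys) - min(ys) + 1, len(pixels)]
```

Each component's feature vector would then be passed to the classifiers; a real system would likely use richer features than these three.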
7. A method for detecting text, comprising:
employing a processor to execute computer executable code stored on a storage medium to perform the following acts;
identifying one or more connected components associated with unlabeled data under text detection;
utilizing the connected components to extract a feature vector for each connected component;
utilizing a boosted classifier to classify each connected component represented by its respective feature vector;
employing inferred labels to bin the connected components across a plurality of bins;
computing properties for each bin; and
utilizing a transductive classifier to classify the connected components as a function of the feature vectors and corresponding computed bin properties.
Dependent claims: 8, 9, 10, 11, 12, 13, 18
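The binning and bin-property steps of claim 7 can be sketched as follows. The claim does not say which properties are computed per bin, so this example assumes two: the number of components in the bin and the fraction of them that the boosted classifier inferred to be text (+1). The bin assignment of each component is taken as an input, and `bin_properties` and `expand_features` are hypothetical names.

```python
def bin_properties(inferred_labels, bin_ids, n_bins):
    """Compute per-bin properties from inferred labels.

    inferred_labels: +1 (text) or -1 (non-text) per connected component.
    bin_ids: bin index per connected component, in range(n_bins).
    Returns one (count, text_fraction) tuple per bin (assumed properties).
    """
    counts = [0] * n_bins
    text_counts = [0] * n_bins
    for label, b in zip(inferred_labels, bin_ids):
        counts[b] += 1
        if label == 1:
            text_counts[b] += 1
    # Empty bins get a text fraction of 0.0 to avoid dividing by zero.
    return [(c, t / c if c else 0.0) for c, t in zip(counts, text_counts)]


def expand_features(features, bin_ids, props):
    """Append each component's bin properties to its own feature vector."""
    return [list(f) + list(props[b]) for f, b in zip(features, bin_ids)]
```

The expanded vectors are what the second classifier would see, so a component's final label can depend on how its neighbors in the same bin were labeled in the first pass.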
14. A system, embodied on a computer-readable storage medium, that when executed, trains a text detector, comprising:
means for identifying a set of connected components from at least one of unlabeled data or labeled data;
means for extracting a feature vector for each connected component identified;
means for training a boosted classifier with the connected components and corresponding text and non-text labels for text detection;
means for training a transductive classifier with as a function of an expanded feature set, the expanded feature set generated from at least one of labels, feature vectors and computed bin properties; and
means for employing the trained boosted classifier in connection with the trained transductive classifier to detect text within the unlabeled data.
15. A computer-readable storage medium having computer-executable instructions stored thereon to perform a method comprising:
receiving labeled training data that includes connected components, text labels and nontext labels;
identifying one or more connected components associated with the labeled training data;
utilizing a spatial relation technique to generate a feature vector for each of the one or more connected components identified in the labeled data;
training one or more boosted classifiers with the labels and the feature vectors;
employing the one or more trained boosted classifiers to infer labels for the training connected components;
generating one or more histograms that define a plurality of equally sized bins as a function of a percentage of a maximum range of feature values;
utilizing the labels inferred from the training connected components to bin the training connected components across the plurality of bins;
computing properties and additional features for each bin;
generating expanded feature vectors with the properties and additional features;
utilizing the original training data labels and expanded feature vectors to train a transductive classifier;
receiving unlabeled data; and
employing the trained boosted classifier and the trained transductive classifier to detect text within the unlabeled data.
Dependent claims: 16, 17
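One possible reading of the histogram step in claim 15 is that the bin width equals a fixed percentage of the observed range (maximum minus minimum) of a feature. The sketch below follows that reading, which is an interpretation rather than something the claim spells out. The names `equal_bins`, `bin_index`, and `histogram` and the parameter `pct` are hypothetical.

```python
import math


def equal_bins(values, pct):
    """Derive equal-width bins whose width is pct of the value range.

    pct: fraction in (0, 1], e.g. 0.25 for bins spanning 25% of the range.
    Returns (lowest value, bin width, number of bins).
    """
    if not 0 < pct <= 1:
        raise ValueError("pct must be in (0, 1]")
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return lo, 0.0, 1  # all values identical: a single bin
    width = pct * span
    # The small epsilon guards against float noise pushing an exact
    # quotient such as 4.0 up to a fifth bin.
    n_bins = math.ceil(span / width - 1e-9)
    return lo, width, n_bins


def bin_index(value, lo, width, n_bins):
    """Map a value to its bin; out-of-range values are clamped to the ends."""
    if width == 0:
        return 0
    return min(max(int((value - lo) / width), 0), n_bins - 1)


def histogram(values, pct):
    """Count how many values fall into each equal-width bin."""
    lo, width, n_bins = equal_bins(values, pct)
    counts = [0] * n_bins
    for v in values:
        counts[bin_index(v, lo, width, n_bins)] += 1
    return counts
```

When `1 / pct` is not a whole number, the last bin covers less of the observed range than the others, although its nominal width is the same.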
Specification