Classifying functions of web blocks based on linguistic features
Abstract
A classification system trains a classifier to classify blocks of web pages according to the function of each block. The classifier is trained using training web pages. To train the classifier, the classification system identifies the blocks of the training web pages, generates a feature vector for each block that includes a linguistic feature, and inputs a classification label for each block. The classification system then learns the coefficients of the classifier using any of a variety of machine learning techniques, and can use the trained classifier to classify blocks of web pages.
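The training step summarized above can be illustrated with a minimal sketch. The abstract leaves the learning technique open, so a simple perceptron is used here purely as a stand-in; the function names, the two-element feature vectors, and the +1/-1 label encoding (information versus non-information) are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    vectors: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 50,
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Learn the coefficients (weights and bias) of a linear classifier.

    Labels are +1 or -1. A perceptron is an illustrative choice only.
    """
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def classify(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Apply the learned coefficients to a block's feature vector."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Hypothetical training data: [fraction of verbs, fraction of link words].
TRAIN_VECTORS = [[0.30, 0.05], [0.25, 0.10], [0.02, 0.80], [0.05, 0.60]]
TRAIN_LABELS = [1, 1, -1, -1]  # +1 = information, -1 = non-information

WEIGHTS, BIAS = train_perceptron(TRAIN_VECTORS, TRAIN_LABELS)
```

Because the toy data is linearly separable, the perceptron settles on coefficients that classify every training block correctly; any other supervised learner could be substituted without changing the surrounding flow.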
19 Claims
1. A method performed by one or more computing devices for classifying a block of a document based on its function, the method comprising:
identifying blocks of training documents, a block of a document containing words that are displayed when the document is displayed;
for each identified block,
receiving a classification label for the identified block indicating its function; and
generating a feature vector for the identified block, the feature vector including a linguistic feature of a word of the block;
training a classifier using the feature vectors and classification labels to classify blocks of documents based on the feature vectors of the blocks;
classifying a block of a document based on its function by applying the trained classifier to a feature vector for the block; and
when a document will not fit on a display of a device, displaying blocks of the document giving preference to blocks with a certain classification.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
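The final element of claim 1, displaying blocks with preference for a certain classification when the document does not fit, could be realized in many ways. The sketch below shows one possible reading: blocks carrying the preferred label are admitted first, remaining space is filled with other blocks, and document order is restored for display. The `Block` type, the pixel-height model of "fit", and the greedy selection are all hypothetical choices, not details from the claim.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    text: str
    label: str   # classification assigned by the trained classifier
    height: int  # hypothetical rendered height in pixels


def select_blocks(
    blocks: List[Block], display_height: int, preferred: str = "information"
) -> List[Block]:
    """Return the blocks to display, favoring the preferred classification."""
    if sum(b.height for b in blocks) <= display_height:
        return list(blocks)  # the whole document fits; nothing to drop

    # Stable sort: preferred blocks first, document order within each group.
    order = sorted(range(len(blocks)), key=lambda i: blocks[i].label != preferred)

    chosen = []
    remaining = display_height
    for i in order:
        if blocks[i].height <= remaining:
            chosen.append(i)
            remaining -= blocks[i].height

    # Present the surviving blocks in their original document order.
    return [blocks[i] for i in sorted(chosen)]


PAGE = [
    Block("Home | About | Contact", "non-information", 100),
    Block("The council approved the budget.", "information", 300),
    Block("Advertisement", "non-information", 150),
]
```

With a 350-pixel display only the article block is kept; with 400 pixels the navigation block also fits after the article has been admitted.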
11. A computing device generating a classifier for classifying blocks of web pages into functional classifications, comprising:
a training data store that includes training web pages, the web pages having blocks, a block of a web page containing text that is displayed when the web page is displayed;
a block identification component that identifies blocks within a web page;
a feature generation component that generates a feature vector for a block of a web page, the feature vector including layout features and linguistic features, the layout features including size of text within a block when the block is displayed;
a labeler component that inputs a classification label for each block of each training web page;
a component that learns coefficients of a classifier using the feature vectors of the training web pages and the label classifications and stores the coefficients in a classifier coefficients store; and
a component that, when a web page will not fit on a display of a device, provides that the blocks of the web page are displayed giving preference to blocks with a certain classification as determined by applying the classifier with the learned coefficients to the blocks of the web page.
Dependent claims: 12, 13, 14, 15, 16, 17
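As a rough illustration of two components recited in claim 11, the sketch below pairs a feature generation step with a coefficient store. Only the text-size layout feature comes from the claim; the other features (word count, average word length, fraction of capitalized words), the JSON serialization, and every function name are assumptions chosen to keep the example small.

```python
import json
from typing import Dict, List


def generate_feature_vector(text: str, font_size_px: float) -> List[float]:
    """Combine a layout feature (text size) with simple word-level features."""
    words = text.split()
    count = max(len(words), 1)  # avoid dividing by zero for empty blocks
    avg_word_length = sum(len(w) for w in words) / count
    capitalized = sum(1 for w in words if w[:1].isupper()) / count
    return [float(font_size_px), float(len(words)), avg_word_length, capitalized]


def serialize_coefficients(weights: List[float], bias: float) -> str:
    """Produce the text that a classifier coefficients store might hold."""
    return json.dumps({"weights": weights, "bias": bias})


def deserialize_coefficients(stored: str) -> Dict[str, object]:
    """Read coefficients back from their stored form."""
    return json.loads(stored)


def score_block(coefficients: Dict[str, object], features: List[float]) -> float:
    """Apply stored linear coefficients to a block's feature vector."""
    weights = coefficients["weights"]
    return coefficients["bias"] + sum(w * x for w, x in zip(weights, features))
```

Keeping the coefficients in a serialized store separates training from use: the device that displays a page only needs the stored values and the feature generation step, not the training web pages.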
18. A computer-readable storage medium encoded with instructions for controlling a computing device to classify blocks of web pages based on their function, by a method comprising:
identifying blocks of training web pages, each block of a web page containing text that is displayed when the web page is displayed;
for each identified block,
receiving a classification label for the identified block, the classifications including information and non-information; and
generating a feature vector for the identified block, the feature vector including a linguistic feature and a layout feature, the linguistic feature based on parts of speech of words within the text of the block, the parts of speech of words within the text of the block identified by submitting the text of the block to a natural language processor;
training a classifier using the feature vectors and classification labels; and
classifying a block of a web page as information or non-information by applying the trained classifier to a feature vector for the block so that when the web page will not fit on the display of a device, blocks of the web page are displayed giving preference to blocks with a certain classification.
Dependent claims: 19
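Claim 18 bases the linguistic feature on parts of speech obtained from a natural language processor. The sketch below shows one way such a feature might be computed, as the fraction of a block's words falling under each tag. The claim does not name a processor, so a tiny lookup table stands in for it here; `TOY_LEXICON`, the four-tag set, and the function names are hypothetical, and a real tagger would be passed in through the `tagger` parameter.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical stand-in for a natural language processor's vocabulary.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "council": "NOUN", "budget": "NOUN", "city": "NOUN",
    "approved": "VERB", "said": "VERB", "is": "VERB",
}
POS_TAGS = ("NOUN", "VERB", "DET", "OTHER")


def toy_pos_tag(text: str) -> List[Tuple[str, str]]:
    """Tag each word by lexicon lookup; unknown words are tagged OTHER."""
    words = [w.strip(".,;:!?|").lower() for w in text.split()]
    return [(w, TOY_LEXICON.get(w, "OTHER")) for w in words if w]


def pos_feature(
    text: str, tagger: Callable[[str], List[Tuple[str, str]]] = toy_pos_tag
) -> List[float]:
    """Return the fraction of words with each tag, in POS_TAGS order."""
    tagged = tagger(text)
    counts = Counter(tag for _, tag in tagged)
    total = max(len(tagged), 1)  # avoid dividing by zero for empty blocks
    return [counts[tag] / total for tag in POS_TAGS]
```

Under this toy lexicon, a sentence-like block yields a mix of nouns, verbs, and determiners, while a navigation strip such as "Home | About | Contact" falls entirely under OTHER, which is the kind of contrast a classifier could exploit to separate information from non-information blocks.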
Specification