Method and system for selecting documents by measuring document quality
First Claim
1. A system for providing a client data according to a quality value, said system comprising:
a downloading component for obtaining at least one item of data from a source;
a classifier component for associating a quality value to each said item of data using a profile, wherein said quality value is based on low-level features of said item of data selected from the group consisting of length, vocabulary, fraction of words spelled correctly, title, author, reading grade level, average length of sentences, average length of words, usage of punctuation, usage of grammar, formatting, capitalization, source, display tags and any combinations thereof;
a training component that selects at least one of said item of data according to certain labels, said selected items of data being grouped to form training data;
a learning component that accepts said training data and automatically creates said profile; and
a presenter component for accepting a request from a client and transmitting said items of data selected according to said quality value.
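The claim lists the low-level features but does not say how they are computed. Below is a minimal Python sketch of how a classifier component might extract a few of them: length, vocabulary, average sentence and word length, punctuation usage, and capitalization. It assumes plain-text items and regex-based tokenization. The `LowLevelFeatures` and `extract_features` names are hypothetical. Features such as spelling accuracy or reading grade level are omitted because they would need a dictionary or a readability formula.

```python
import re
import string
from dataclasses import dataclass

# Hypothetical tokenizers: words are runs of letters/apostrophes,
# sentences are chunks separated by terminal punctuation.
WORD_RE = re.compile(r"[A-Za-z']+")
SENTENCE_SPLIT_RE = re.compile(r"[.!?]+")


@dataclass
class LowLevelFeatures:
    """Hypothetical subset of the low-level features named in the claim."""
    length_words: int
    vocabulary_size: int
    avg_sentence_length: float
    avg_word_length: float
    punctuation_rate: float
    capitalized_fraction: float


def extract_features(text: str) -> LowLevelFeatures:
    words = WORD_RE.findall(text)
    # Count sentences as non-empty chunks between terminal punctuation marks.
    sentences = [s for s in SENTENCE_SPLIT_RE.split(text) if s.strip()]
    n_words = len(words)
    n_chars = len(text)
    punct = sum(1 for ch in text if ch in string.punctuation)
    upper = sum(1 for ch in text if ch.isupper())
    letters = sum(1 for ch in text if ch.isalpha())
    return LowLevelFeatures(
        length_words=n_words,
        # Vocabulary counts distinct words, ignoring case.
        vocabulary_size=len({w.lower() for w in words}),
        avg_sentence_length=n_words / len(sentences) if sentences else 0.0,
        avg_word_length=sum(len(w) for w in words) / n_words if n_words else 0.0,
        punctuation_rate=punct / n_chars if n_chars else 0.0,
        capitalized_fraction=upper / letters if letters else 0.0,
    )
```

For example, `extract_features("Hello world. This is fine!")` reports 5 words in 2 sentences. The resulting values would then be fed to whatever profile the learning component builds.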
1 Assignment
Abstract
A system and method for quality-based document filtering and selection automatically make value judgments for document retrieval. Items of data, e.g. documents, are automatically associated with a value. Items of data may then be selected based on that value, so that they are not only relevant to the specific subject or topic requested but also desirable according to certain criteria, including each document's quality. One specific application of the invention is a filter for computerized bulletin boards. Many of these systems, also known as discussion groups, receive thousands of new messages per day, and readers and human editors do not have time to classify new messages by quality quickly. Messages may instead be ranked by quality automatically, performing the same function as a human editor or moderator. Values may be assigned according to qualities such as interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment, and any combinations thereof.
97 Citations
25 Claims
2. A computer implemented method of obtaining and automatically associating a quality value to an item of data comprising software code for executing the steps of:
obtaining at least one item of data from a source via a network communication;
obtaining labels for at least one of said items of data;
selecting items of data with certain labels to form training data;
creating a profile from said training data;
automatically associating a quality value to at least one of said items of data using said profile;
transmitting at least one item of data to a client according to an associated quality value assigned in said step of automatically associating;
wherein said profile specifies said associated quality value based on low-level features of said item selected from the group consisting of length, vocabulary, fraction of words spelled correctly, title, author, reading grade level, average length of sentences, average length of words, usage of punctuation, usage of grammar, formatting, capitalization, source, display tags and any combinations thereof.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11.
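Claim 2 leaves the learning algorithm open. One way to read the "labels, training data, profile, quality value" steps is the nearest-centroid sketch below. This is an assumption, not the patent's stated method:

- Items labeled `"good"` or `"bad"` form the training data.
- The profile stores the difference between the two class means, plus their midpoint.
- An item's quality value is positive when its feature vector lies closer to the `"good"` centroid than to the `"bad"` one.

Feature vectors are assumed to be precomputed. `Item`, `Profile`, `select_training_data`, `create_profile`, `associate_quality`, and the label strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Vector = List[float]


@dataclass
class Item:
    item_id: str
    features: Vector              # assumed: precomputed low-level feature vector
    label: Optional[str] = None   # hypothetical labels: "good", "bad", or None


@dataclass
class Profile:
    weights: Vector   # mean("good") - mean("bad")
    midpoint: Vector  # halfway between the two class means

    def quality(self, features: Vector) -> float:
        # Positive when the item lies closer to the "good" centroid than to the "bad" one.
        return sum(w * (x - m) for w, x, m in zip(self.weights, features, self.midpoint))


def mean(vectors: Sequence[Vector]) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def select_training_data(items: Sequence[Item], labels=("good", "bad")) -> List[Item]:
    # Keep only items carrying one of the chosen labels; unlabeled items are skipped.
    return [item for item in items if item.label in labels]


def create_profile(training: Sequence[Item]) -> Profile:
    good = [item.features for item in training if item.label == "good"]
    bad = [item.features for item in training if item.label == "bad"]
    if not good or not bad:
        raise ValueError("need at least one 'good' and one 'bad' example")
    g, b = mean(good), mean(bad)
    return Profile(
        weights=[gi - bi for gi, bi in zip(g, b)],
        midpoint=[(gi + bi) / 2 for gi, bi in zip(g, b)],
    )


def associate_quality(items: Sequence[Item], profile: Profile) -> Dict[str, float]:
    # Score every item, labeled or not, with the learned profile.
    return {item.item_id: profile.quality(item.features) for item in items}
```

A centroid profile is chosen here only because it is short and easy to verify. Any classifier that maps low-level features to a score could fill the same role.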
12. A method for providing a client with data according to a quality value, said method further comprising the computer implemented steps of:
obtaining labels for at least one item of data;
selecting items of data with certain labels to form training data;
creating a profile from said training data;
automatically associating a quality value with said at least one item of data using said profile, wherein said quality value is based on low-level features of said item of data selected from the group consisting of length, vocabulary, fraction of words spelled correctly, title, author, reading grade level, average length of sentences, average length of words, usage of punctuation, usage of grammar, formatting, capitalization, source, display tags and any combinations thereof;
accepting a request including quality value selection criteria from a client;
selecting at least one item of data according to said quality value selection criteria; and
transmitting selected items of data to said client.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20.
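The final steps of claim 12 are: accepting a request with quality selection criteria, selecting items, and transmitting them. These can be sketched as a threshold-plus-ranking filter. What the "selection criteria" contain is an assumption here: a hypothetical `QualityRequest` carries a minimum quality value and an optional result limit. The sketch also assumes quality values have already been assigned, for example by a profile like the one described in the claim.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class QualityRequest:
    """Hypothetical client request carrying quality selection criteria."""
    min_quality: float = 0.0
    max_items: Optional[int] = None


def select_items(quality_values: Dict[str, float], request: QualityRequest) -> List[str]:
    # Keep only items that meet the client's quality threshold.
    eligible = [(q, item_id) for item_id, q in quality_values.items()
                if q >= request.min_quality]
    # Highest quality first; ties broken by item id for deterministic output.
    eligible.sort(key=lambda pair: (-pair[0], pair[1]))
    ranked = [item_id for _, item_id in eligible]
    # Optionally truncate to the number of items the client asked for.
    return ranked if request.max_items is None else ranked[: request.max_items]
```

The returned ids stand in for the items that would be transmitted to the client. A real presenter component would look up and send the items themselves.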
21. A method for providing a client with data according to a quality value, said method further comprising the computer implemented steps of:
obtaining labels for at least one item of data, wherein said item of data is information contained within an electronic bulletin board, and said labels designate level of quality, such as interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment, or any combination thereof;
selecting items of data with certain labels to form training data;
creating a profile from said training data;
associating a quality value to items of data using said profile;
accepting a request including quality selection criteria from a client;
selecting at least one item of data according to said quality values and said quality selection criteria; and
transmitting selected items of data to said client.
Dependent claims: 22, 23.
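Claim 21 allows labels on several quality dimensions at once. The minimal sketch below assumes each bulletin-board message already carries one score per dimension. It shows how a client's criteria might combine bounds on some dimensions (e.g. low obscenity) with ranking on another (e.g. interestingness), standing in for a human moderator. Several choices here are illustrative, not taken from the patent: the `Criteria` type, the `moderate` function, the dimension names used as dictionary keys, and the rule that a message missing a constrained score is rejected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical per-message scores, one per labeled quality dimension,
# e.g. {"interestingness": 0.8, "humor": 0.1, "obscenity": 0.0}.
Scores = Dict[str, float]


@dataclass
class Criteria:
    """Hypothetical client criteria: inclusive (low, high) bounds per dimension."""
    bounds: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    rank_by: str = "interestingness"

    def accepts(self, scores: Scores) -> bool:
        for dim, (low, high) in self.bounds.items():
            # A message lacking a score for a constrained dimension is rejected.
            if dim not in scores or not (low <= scores[dim] <= high):
                return False
        return True


def moderate(board: Dict[str, Scores], criteria: Criteria) -> List[str]:
    accepted = [msg for msg, s in board.items() if criteria.accepts(s)]
    # Rank surviving messages on one chosen dimension, best first (ties by id).
    return sorted(accepted, key=lambda m: (-board[m].get(criteria.rank_by, 0.0), m))
```

For example, `Criteria(bounds={"obscenity": (0.0, 0.2)}, rank_by="humor")` would return only low-obscenity messages, funniest first.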
24. A system for automatically retrieving and presenting a client with items according to their qualitative nature, comprising:
at least one computer system having at least one item of data available;
at least one access device for enabling said client to communicate with said computer system;
a classifier means effective to automatically associate a quality value to items of data, wherein said quality value is indicative of the qualitative nature of said items of data and is based on low-level features of said item of data selected from the group consisting of length, vocabulary, fraction of words spelled correctly, title, author, reading grade level, average length of sentences, average length of words, usage of punctuation, usage of grammar, formatting, capitalization, source, display tags and any combinations thereof;
a means for a client to provide a request for at least one said item of data according to criteria; and
a transmitting means adapted to present at least one said item of data to said client selected according to said criteria.
Dependent claims: 25.
Specification