Automatic stop word identification and compensation
Abstract
Computer-based methods for automatically identifying and compensating for stop words contained in documents are described. The method for compensating for stop words includes: generating an abstract mathematical space based on documents included in a collection of documents, wherein each document has a representation in the abstract mathematical space; receiving a user query; generating a representation of the user query in the abstract mathematical space; computing a similarity between the representation of the user query and the representation of each document, wherein computing a similarity between the representation of the user query and the representation of a first document in the collection of documents comprises applying a weighting function to a value associated with a frequently occurring word contained in the first document, thereby automatically compensating for the frequently occurring word contained in the first document; and displaying a result based on the similarity computations.
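The steps summarized in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the patent: the "abstract mathematical space" is approximated by a plain term-count vector space, the weighting function is a hypothetical log(N / document frequency) factor that shrinks the contribution of words appearing in many documents, and all function names (build_space, represent_query, frequency_weights, weighted_similarity) are invented for illustration.

```python
import math
from collections import Counter


def build_space(docs):
    """Step (a): map each document to a term-count vector over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for tokens in tokenized:
        vec = [0.0] * len(vocab)
        for word, count in Counter(tokens).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return index, vectors


def represent_query(query, index):
    """Step (c): map the query into the same space, ignoring unseen words."""
    vec = [0.0] * len(index)
    for word in query.lower().split():
        if word in index:
            vec[index[word]] += 1.0
    return vec


def frequency_weights(vectors):
    """Hypothetical weighting function: log(N / document frequency) per word.

    A word that occurs in every document receives weight 0.
    """
    n_docs = len(vectors)
    weights = []
    for i in range(len(vectors[0])):
        doc_freq = sum(1 for vec in vectors if vec[i] > 0)
        weights.append(math.log(n_docs / doc_freq))
    return weights


def weighted_similarity(query_vec, doc_vec, weights):
    """Step (d): cosine similarity with each coordinate scaled by its weight."""
    dot = sum(w * q * d for w, q, d in zip(weights, query_vec, doc_vec))
    q_norm = math.sqrt(sum(w * q * q for w, q in zip(weights, query_vec)))
    d_norm = math.sqrt(sum(w * d * d for w, d in zip(weights, doc_vec)))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0
    return dot / (q_norm * d_norm)


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell",
]
index, vectors = build_space(docs)
query_vec = represent_query("the market", index)  # step (b): the user query
weights = frequency_weights(vectors)
scores = [weighted_similarity(query_vec, vec, weights) for vec in vectors]

# Step (e): display the documents ranked by similarity.
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```

In this toy collection "the" occurs in every document, so its weight is zero and it no longer contributes to any score; only the shared word "market" links the query to the third document.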
26 Claims
1. A computer-based method for automatically compensating for stop words contained in documents during a query of the documents, the method comprising:
(a) generating an abstract mathematical space based on documents included in a collection of documents, wherein each document has a representation in the abstract mathematical space;
(b) receiving a user query;
(c) generating a representation of the user query in the abstract mathematical space;
(d) computing a similarity between the representation of the user query and the representation of each document, wherein computing a similarity between the representation of the user query and the representation of a first document in the collection of documents comprises applying a weighting function to a value associated with a frequently occurring word contained in the first document, thereby automatically compensating for the frequently occurring word contained in the first document; and
(e) displaying a result based on the similarity computations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program product for automatically compensating for stop words contained in documents during a query of the documents, the computer program product comprising:
a computer usable medium having computer readable program code means embodied in the medium for causing an application program to execute on an operating system of a computer, the computer readable program code means comprising:
a computer readable first program code means for generating an abstract mathematical space based on documents in a collection of documents, wherein each document has a representation in the abstract mathematical space;
a computer readable second program code means for receiving a user query;
a computer readable third program code means for generating a representation of the user query in the abstract mathematical space;
a computer readable fourth program code means for computing a similarity between the representation of the user query and the representation of each document, wherein computing a similarity between the representation of the user query and the representation of a first document in the collection of documents comprises applying a weighting function to a value associated with a frequently occurring word contained in the first document, thereby automatically compensating for the frequently occurring word contained in the first document; and
a computer readable fifth program code means for displaying a result based on the similarity computations.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A computer-based method for automatically identifying stop words contained in a document collection, the method comprising:
(a) generating an abstract mathematical space based on documents included in a collection of documents, wherein each unique term contained in the documents has a multi-dimensional representation in the abstract mathematical space; and
(b) identifying stop words contained in the documents based on a magnitude of a predetermined dimension of each multi-dimensional representation in the abstract mathematical space.
Dependent claims: 22, 23
24. A computer program product for automatically identifying stop words contained in a document collection, the computer program product comprising:
a computer usable medium having computer readable program code means embodied in the medium for causing an application program to execute on an operating system of a computer, the computer readable program code means comprising:
a computer readable first program code means for generating an abstract mathematical space based on documents included in a collection of documents, wherein each unique term contained in the documents has a multi-dimensional representation in the abstract mathematical space; and
a computer readable second program code means for identifying stop words contained in the documents based on a magnitude of a predetermined dimension of each multi-dimensional representation in the abstract mathematical space.
Dependent claims: 25, 26
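The identification approach recited in claims 21 and 24 can be sketched as follows. This is only an illustration under stated assumptions: the term-document count matrix stands in for the input to the abstract mathematical space, the "predetermined dimension" is assumed to be the leading singular direction of that matrix (approximated here by power iteration so that only the standard library is needed), and the 0.5 threshold and the function names are hypothetical choices, not values from the claims.

```python
import math


def term_document_matrix(docs):
    """Build a term-by-document count matrix (rows are unique terms)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    matrix = [[float(tokens.count(word)) for tokens in tokenized] for word in vocab]
    return vocab, matrix


def first_dimension(matrix, iterations=100):
    """Approximate each term's coordinate along the leading singular direction.

    Power iteration on A * A^T; the result is normalized to unit length.
    """
    n_terms, n_docs = len(matrix), len(matrix[0])
    u = [1.0] * n_terms
    for _ in range(iterations):
        # v = A^T u (one value per document)
        v = [sum(matrix[t][d] * u[t] for t in range(n_terms)) for d in range(n_docs)]
        # u = A v (one value per term)
        u = [sum(matrix[t][d] * v[d] for d in range(n_docs)) for t in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def identify_stop_words(docs, threshold=0.5):
    """Flag terms whose magnitude in the assumed dimension meets the threshold."""
    vocab, matrix = term_document_matrix(docs)
    coords = first_dimension(matrix)
    return [word for word, c in zip(vocab, coords) if abs(c) >= threshold]


example_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell",
    "the price of the stock rose",
]
stop_words = identify_stop_words(example_docs)
print(stop_words)
```

In this small collection the word "the" dominates the leading direction, so it is the only term that clears the illustrative threshold. A real collection would need a threshold chosen from the observed distribution of magnitudes.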
Specification