Method and apparatus for categorizing and presenting documents of a distributed database
First Claim
1. A computer-implemented method for categorizing Resulting Pages into categories, comprising:
designating a first category as commercial pages and a second category as informational pages;
determining a quality score q(wi) for each Resulting Page;
determining a transactional rating τ(wi) for each Resulting Page,
deriving a propagation matrix P;
determining a commercial score κ for each Resulting Page;
filtering out all Resulting Pages that meet or exceed a commercial score threshold value;
wherein the Resulting Pages that meet or exceed the commercial page threshold value are placed in the first category and all remaining Resulting Pages are placed in the second category,
wherein the determining the transactional rating τ(wi) comprises
determining whether each Resulting Page meets select criteria,
determining how strongly each Resulting Page meets the select criteria,
determining a transactional score for each page, and
determining the transactional rating for each page from the transactional score, and
wherein determining a transactional score for each page comprises creating a vector for each Resulting Page αk(wi), wherein each vector contains a plurality of elements αkn(wi), wherein each of the plurality of elements αkn(wi) is a Boolean value that reflects how strongly each of the Resulting Pages meets each of the select criteria.
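The final filtering step of the claim can be sketched as follows. This is a minimal illustration, assuming the commercial score κ for each page has already been computed; the function name `split_by_commercial_score`, the example URLs, and the threshold of 0.5 are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple


def split_by_commercial_score(
    scores: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Return (commercial, informational) page lists.

    Pages whose score meets or exceeds the threshold go to the first
    (commercial) category; all remaining pages go to the second
    (informational) category.
    """
    commercial = [page for page, kappa in scores.items() if kappa >= threshold]
    informational = [page for page, kappa in scores.items() if kappa < threshold]
    return commercial, informational


# Hypothetical scores, for illustration only.
scores = {
    "shop.example/cart": 0.9,
    "wiki.example/history": 0.1,
    "blog.example/review": 0.5,
}
commercial, informational = split_by_commercial_score(scores, threshold=0.5)
```

Note that a page scoring exactly at the threshold lands in the commercial category, matching the "meet or exceed" wording of the claim.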
Abstract
Described herein are methods for creating categorized documents, categorizing documents in a distributed database, and categorizing Resulting Pages. Also described herein is an apparatus for searching a distributed database. The method for creating categorized documents generally comprises: initially assuming all documents are of type 1; filtering out all type 2 documents and placing them in a first category; filtering out all type 3 documents and placing them in a second category; and defining all remaining documents as type 4 documents and placing all type 4 documents in a third category. The apparatus for searching a distributed database generally comprises: at least one memory device; a computing apparatus; an indexer; a transactional score generator; a category assignor; a search server; and a user interface in communication with the search server.
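The cascade described in the abstract can be sketched as two successive filters followed by a catch-all. This is only an illustration: the predicates `is_type2` and `is_type3` are hypothetical stand-ins, since the abstract does not say how each document type is detected.

```python
from typing import Callable, Dict, Iterable, List


def cascade_categorize(
    docs: Iterable[str],
    is_type2: Callable[[str], bool],
    is_type3: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Filter type 2 documents, then type 3, and keep the rest as type 4."""
    categories: Dict[str, List[str]] = {"first": [], "second": [], "third": []}
    for doc in docs:
        if is_type2(doc):
            categories["first"].append(doc)   # type 2 -> first category
        elif is_type3(doc):
            categories["second"].append(doc)  # type 3 -> second category
        else:
            categories["third"].append(doc)   # remaining (type 4) -> third category
    return categories


# Hypothetical predicates and documents, for illustration only.
result = cascade_categorize(
    ["doc-a", "doc-b", "doc-c"],
    is_type2=lambda d: d.endswith("a"),
    is_type3=lambda d: d.endswith("b"),
)
```

The order of the filters matters: a document that satisfies both predicates is placed in the first category, because the type 2 filter runs first.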
18 Claims
1. A computer-implemented method for categorizing Resulting Pages into categories, comprising:
designating a first category as commercial pages and a second category as informational pages;
determining a quality score q(wi) for each Resulting Page;
determining a transactional rating τ(wi) for each Resulting Page,
deriving a propagation matrix P;
determining a commercial score κ for each Resulting Page;
filtering out all Resulting Pages that meet or exceed a commercial score threshold value;
wherein the Resulting Pages that meet or exceed the commercial page threshold value are placed in the first category and all remaining Resulting Pages are placed in the second category,
wherein the determining the transactional rating τ(wi) comprises
determining whether each Resulting Page meets select criteria,
determining how strongly each Resulting Page meets the select criteria,
determining a transactional score for each page, and
determining the transactional rating for each page from the transactional score, and
wherein determining a transactional score for each page comprises creating a vector for each Resulting Page αk(wi), wherein each vector contains a plurality of elements αkn(wi), wherein each of the plurality of elements αkn(wi) is a Boolean value that reflects how strongly each of the Resulting Pages meets each of the select criteria.
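The Boolean vector recited in claim 1 can be pictured as one true/false entry per criterion. The sketch below assumes three text-matching criteria; the claim does not enumerate the "select criteria", so `CRITERIA` and `boolean_vector` are hypothetical.

```python
from typing import Callable, List, Sequence

Criterion = Callable[[str], bool]

# Hypothetical criteria; the claim does not list the actual ones.
CRITERIA: List[Criterion] = [
    lambda text: "add to cart" in text.lower(),
    lambda text: "$" in text,
    lambda text: "checkout" in text.lower(),
]


def boolean_vector(page_text: str, criteria: Sequence[Criterion] = CRITERIA) -> List[int]:
    """Return one Boolean element (1 or 0) per criterion for a page."""
    return [int(criterion(page_text)) for criterion in criteria]


vec = boolean_vector("Add to cart for $19.99")
```

Here `vec` has one element per criterion, set to 1 where the page text satisfies it and 0 otherwise.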
2. A computer-implemented method for categorizing Resulting Pages into categories, comprising:
designating a first category as commercial pages and a second category as informational pages;
determining a quality score q(wi) for each Resulting Page;
determining a transactional rating τ(wi) for each Resulting Page,
deriving a propagation matrix P;
determining a commercial score κ for each Resulting Page;
filtering out all Resulting Pages that meet or exceed a commercial score threshold value;
wherein the Resulting Pages that meet or exceed the commercial page threshold value are placed in the first category and all remaining Resulting Pages are placed in the second category,
wherein the determining the transactional rating τ(wi) comprises
determining whether each Resulting Page meets select criteria,
determining how strongly each Resulting Page meets the select criteria,
determining a transactional score for each page, and
determining the transactional rating for each page from the transactional score, and
wherein determining a transactional score for each page comprises creating a vector for each Resulting Page βk(wi), wherein each vector contains a plurality of elements βkn(wi), wherein each of the plurality of elements βkn(wi) is a weighted value that reflects how strongly each of the Resulting Pages meets each of the select criteria.
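Claim 2 differs from claim 1 in using weighted rather than Boolean elements. One way to picture this, purely as an assumption, is to scale a per-criterion weight by a capped keyword count. The keywords, weights, and `cap` value below are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple

# Hypothetical (keyword, weight) pairs standing in for the select criteria.
WEIGHTED_CRITERIA: List[Tuple[str, float]] = [
    ("buy", 2.0),
    ("price", 1.0),
    ("shipping", 0.5),
]


def weighted_vector(
    page_text: str,
    criteria: Sequence[Tuple[str, float]] = WEIGHTED_CRITERIA,
    cap: int = 4,
) -> List[float]:
    """Return one weighted element per criterion.

    Strength is the keyword count, capped at `cap` and scaled to [0, 1],
    then multiplied by the criterion's weight.
    """
    words = page_text.lower().split()
    vector = []
    for keyword, weight in criteria:
        hits = min(words.count(keyword), cap)
        vector.append(weight * hits / cap)
    return vector


beta = weighted_vector("buy now buy today best price")
```

Unlike the Boolean case, a page that mentions a criterion keyword several times receives a larger element than one that mentions it once, up to the cap.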
3. A computer-implemented method for categorizing Resulting Pages into categories, comprising:
designating a first category as commercial pages and a second category as informational pages;
determining a quality score q(wi) for each Resulting Page;
determining a transactional rating τ(wi) for each Resulting Page,
deriving a propagation matrix P;
determining a commercial score κ for each Resulting Page;
filtering out all Resulting Pages that meet or exceed a commercial score threshold value;
wherein the Resulting Pages that meet or exceed the commercial page threshold value are placed in the first category and all remaining Resulting Pages are placed in the second category,
wherein the determining the transactional rating τ(wi) comprises
determining whether each Resulting Page meets select criteria,
determining how strongly each Resulting Page meets the select criteria,
determining a transactional score for each page, and
determining the transactional rating for each page from the transactional score, and
wherein determining the transactional rating τ(wi) for each page from the transactional score comprises evaluating a relationship between the transactional rating τ(wi), and a p-norm of a vector for each Resulting Page αk(wi) wherein the relationship is defined by [equation not reproduced]
Dependent claims: 4
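The p-norm named in claim 3 is a standard quantity: the p-th root of the sum of the absolute values of the elements, each raised to the power p. The sketch below computes only that norm. The claim's defining relationship between τ(wi) and the norm does not appear in this text, so no mapping from norm to rating is attempted; the function name `p_norm` is an illustrative choice.

```python
from typing import Sequence


def p_norm(vector: Sequence[float], p: float) -> float:
    """Return the p-norm (sum of |x|^p) ** (1/p) of a vector, for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1 for a valid norm")
    return sum(abs(x) ** p for x in vector) ** (1.0 / p)


# For a Boolean vector, the 1-norm simply counts the criteria that are met.
criteria_met = p_norm([1, 1, 0], p=1)
euclidean = p_norm([3.0, 4.0], p=2)
```

Larger values of p make the norm increasingly dominated by the largest element of the vector.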
5. A computer-implemented method for categorizing Resulting Pages into categories, comprising:
designating a first category as commercial pages and a second category as informational pages;
determining a quality score q(wi) for each Resulting Page;
determining a transactional rating τ(wi) for each Resulting Page,
deriving a propagation matrix P;
determining a commercial score κ for each Resulting Page;
filtering out all Resulting Pages that meet or exceed a commercial score threshold value;
wherein the Resulting Pages that meet or exceed the commercial page threshold value are placed in the first category and all remaining Resulting Pages are placed in the second category,
wherein the determining the transactional rating τ(wi) comprises
determining whether each Resulting Page meets select criteria,
determining how strongly each Resulting Page meets the select criteria,
determining a transactional score for each page, and
determining the transactional rating for each page from the transactional score, and
wherein determining the transactional rating τ(wi) for each page from the transactional score comprises evaluating a relationship between the transactional rating τ(wi) and a p-norm of a vector for each Resulting Page βk(wi) wherein the relationship is defined by [equation not reproduced]
Dependent claims: 6
7. A computer-implemented method for categorizing Resulting Pages into categories, comprising:
designating a first category as commercial pages and a second category as informational pages;
determining a quality score q(wi) for each Resulting Page;
determining a transactional rating for each Resulting Page τ(wi);
deriving a propagation matrix P;
determining a commercial score κ for each Resulting Page;
filtering out all Resulting Pages that meet or exceed a commercial score threshold value;
wherein the Resulting Pages that meet or exceed the commercial page threshold value are placed in the first category and all remaining Resulting Pages are placed in the second category,
wherein deriving a propagation matrix comprises:
creating a hyperlink connectivity matrix C containing elements Ci,j;
calculating a plurality of authority scores ai and a plurality of hub scores hi;
calculating a plurality of transition counts Ti,j and a plurality of pageviews vi for each Resulting Page; and
creating the propagation matrix P containing propagation matrix elements Pi,j.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
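Authority and hub scores over a hyperlink connectivity matrix are commonly computed with the HITS iteration, and the sketch below uses that well-known technique as a stand-in. It assumes Ci,j is 1 when page i links to page j and 0 otherwise. The claim does not state how the scores, transition counts, and pageviews are combined into P, so that step is not shown.

```python
import math
from typing import List, Tuple


def _normalize(values: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (left unchanged if all zero)."""
    length = math.sqrt(sum(v * v for v in values))
    return [v / length for v in values] if length > 0 else values


def hits_scores(C: List[List[int]], iterations: int = 50) -> Tuple[List[float], List[float]]:
    """Return (authority, hub) scores using the HITS power iteration."""
    n = len(C)
    authority = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # Authority of page i: sum of hub scores of pages linking to i.
        authority = [sum(C[j][i] * hub[j] for j in range(n)) for i in range(n)]
        # Hub score of page i: sum of authority scores of pages i links to.
        hub = [sum(C[i][j] * authority[j] for j in range(n)) for i in range(n)]
        authority = _normalize(authority)
        hub = _normalize(hub)
    return authority, hub


# Page 0 links to pages 1 and 2; page 1 links to page 2.
C = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
a, h = hits_scores(C)
```

In this small graph, page 2 receives the most inbound links and ends up with the highest authority score, while page 0 has the most outbound links and the highest hub score.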
18. A computer-implemented method for categorizing Resulting Pages into categories, comprising:
designating a first category as commercial pages and a second category as informational pages;
determining a quality score q(wi) for each Resulting Page;
determining a transactional rating for each Resulting Page τ(wi);
deriving a propagation matrix P;
determining a commercial score κ for each Resulting Page;
filtering out all Resulting Pages that meet or exceed a commercial score threshold value;
wherein the Resulting Pages that meet or exceed the commercial page threshold value are placed in the first category and all remaining Resulting Pages are placed in the second category, and
designating a third category as spam pages; and
determining a spam score σ(wi) for each Resulting Page;
wherein determining the commercial score κ for each Resulting Page is recursively determined over t iterations from a transverse of the propagation matrix P^T, propagation matrix weight η and commercial score initial value κ′(0), wherein κ′(0) is weighted by select quantities A and B and defined as; [equation not reproduced]
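The recursive determination in claim 18 can be illustrated with a simple damped propagation. The update rule used here, κ(t+1) = η·P^T·κ(t) + (1 − η)·κ′(0), is an assumption chosen for illustration; the claim's own defining equation does not appear in this text, and the weighting of κ′(0) by A and B is not modeled. The function name `propagate_scores` is hypothetical.

```python
from typing import List


def propagate_scores(
    P: List[List[float]], kappa0: List[float], eta: float, iterations: int
) -> List[float]:
    """Iterate kappa <- eta * P^T kappa + (1 - eta) * kappa0 (assumed rule)."""
    n = len(P)
    kappa = list(kappa0)
    for _ in range(iterations):
        # (P^T kappa)_i = sum over j of P[j][i] * kappa[j]
        propagated = [sum(P[j][i] * kappa[j] for j in range(n)) for i in range(n)]
        kappa = [eta * propagated[i] + (1.0 - eta) * kappa0[i] for i in range(n)]
    return kappa


# Hypothetical two-page example: page 0 propagates its score to page 1.
P = [
    [0.0, 1.0],
    [0.0, 0.0],
]
kappa = propagate_scores(P, kappa0=[1.0, 0.0], eta=0.5, iterations=2)
```

Under this assumed rule, page 1 starts with no commercial score of its own but acquires some from page 0 through the propagation matrix, which is the kind of effect a recursive definition over linked pages would be expected to produce.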
Specification