Method and apparatus for database retrieval utilizing vector optimization
Abstract
A technique for optimizing the number of terms in a profile used for information extraction. This optimization is performed by estimating the number of terms which will substantively affect the information extraction process. That is, the technique estimates the point in a term weight curve where that curve becomes flat. A term generally is important and remains part of the profile as long as its weight and the weight of the next term may be differentiated. When terms' weights are not differentiable, then they are not significant and may be cut off. Reducing the number of terms used in a profile increases the efficiency and effectiveness of the information retrieval process.
28 Claims
-
1. A method of information retrieval, comprising the steps of:
creating a profile having a number of terms;
determining a term weight for each term in said profile;
estimating a point at which a curve representing said term weights and a number of terms in a profile becomes flat;
using said estimated point to determine a term weight threshold;
modifying said profile to remove terms having a weight less than said term weight threshold; and
analyzing a data source using said modified profile to retrieve information.
-
-
2. A method of information retrieval, comprising the steps of:
creating a profile having a number of terms;
determining a term weight for each term in said profile;
organizing said terms in an order according to said term weight;
dividing said term weights into a plurality of bins wherein the number of term weights in a bin satisfies the condition that (alpha × N_i) is greater than N_{i+1}, wherein alpha is a coefficient indicative of how many of said terms are added, wherein (i) corresponds to a specific bin number; and
modifying said profile to contain only terms from one or more of said bins.
-
-
4. A method of information retrieval, comprising the steps of:
creating a profile having a number of terms;
determining a term weight for each term in said profile;
selecting a minimum weight (wmin) using said term weights;
selecting a maximum weight (wmax) using said term weights;
modifying said profile wherein each term in said profile has a weight not less than approximately (wmin + (alpha × (wmax − wmin))) wherein alpha is a percentage selected from the range of zero to one; and
analyzing a data source using said modified profile to retrieve information.
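The weight floor recited in claim 4 can be illustrated with a short Python sketch. It is not taken from the patent: the function name prune_by_range, the dictionary representation of a profile, and the sample weights are assumptions made for illustration, and the word "approximately" in the claim is read here as an exact comparison.

```python
def prune_by_range(profile, alpha):
    """Keep terms whose weight is at least wmin + alpha * (wmax - wmin).

    `profile` maps each term to its weight (a hypothetical representation).
    `alpha` is the fraction of the weight range, between zero and one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the range zero to one")
    if not profile:
        return {}
    wmin = min(profile.values())
    wmax = max(profile.values())
    threshold = wmin + alpha * (wmax - wmin)
    # Retain only the terms at or above the computed weight floor.
    return {term: w for term, w in profile.items() if w >= threshold}


# Illustrative weights only; they do not come from the patent.
profile = {"vector": 9.0, "retrieval": 7.0, "database": 4.0,
           "method": 1.5, "said": 1.0}
pruned = prune_by_range(profile, alpha=0.5)  # threshold = 1.0 + 0.5 * 8.0 = 5.0
print(sorted(pruned))  # ['retrieval', 'vector']
```

With alpha set to zero the threshold equals wmin and every term is kept; with alpha set to one only terms at wmax survive.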
-
-
6. A method of information retrieval, comprising the steps of:
creating a profile having a number of terms;
determining a term weight for each term in said profile;
selecting a maximum weight (wmax) using said term weights;
selecting a minimum weight (wmin) using said term weights;
organizing said terms in an order according to said term weight;
determining the difference in weight between adjacent terms in said order;
modifying said profile when said difference between said adjacent terms is less than approximately (beta × (wmax − wmin)) wherein beta is a percentage selected from the range of zero to one; and
analyzing a data source using said modified profile to retrieve information.
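The adjacent-difference test of claim 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name prune_by_gap and the sample weights are hypothetical, terms are ranked from highest to lowest weight, and the profile is assumed to be cut immediately after the higher-weighted term of the first pair whose difference falls below beta × (wmax − wmin). The claim itself does not specify exactly which terms are removed at that point.

```python
def prune_by_gap(profile, beta):
    """Cut the ranked profile where adjacent weights stop being distinguishable.

    `profile` maps each term to its weight (a hypothetical representation).
    `beta` is the fraction of the weight range, between zero and one.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in the range zero to one")
    # Order the terms by weight, highest first.
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return dict(ranked)
    wmax, wmin = ranked[0][1], ranked[-1][1]
    min_gap = beta * (wmax - wmin)
    for i in range(len(ranked) - 1):
        # Assumption: the curve is treated as flat from the first small gap.
        if ranked[i][1] - ranked[i + 1][1] < min_gap:
            return dict(ranked[:i + 1])
    return dict(ranked)


# Illustrative weights only; they do not come from the patent.
profile = {"vector": 9.0, "retrieval": 7.0, "database": 4.0,
           "method": 3.9, "apparatus": 3.8, "said": 1.0}
kept = prune_by_gap(profile, beta=0.1)  # minimum gap = 0.1 * 8.0 = 0.8
print(list(kept))  # ['vector', 'retrieval', 'database']
```

Here the gap between "database" (4.0) and "method" (3.9) is the first one below 0.8, so the sketch treats the weight curve as flat from that point on.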
-
-
8. A method for retrieving information, comprising:
entering a request for information in a first computer system;
transmitting said request to a second computer system over a communications network;
creating a profile having a number of terms corresponding to said request in said second computer system;
determining a term weight for each term in said profile;
estimating a point at which a curve representing said term weights and a number of terms in a profile becomes flat;
using said estimated point to determine a term weight threshold;
modifying said profile to remove terms having a weight less than said term weight threshold; and
analyzing a data source using said modified profile to retrieve information.
-
-
9. A method of information retrieval, comprising the steps of:
entering a request for information in a first computer system;
transmitting said request to a second computer system over a communications network;
creating a profile having a number of terms;
determining a term weight for each term in said profile;
organizing said terms in an order according to said term weight;
dividing said term weights into a plurality of bins wherein the number of term weights in a bin satisfies the condition that (alpha × N_i) is greater than N_{i+1}, wherein alpha is a coefficient indicative of how many of said terms are added, wherein (i) corresponds to a specific bin number; and
modifying said profile to contain only terms from one or more of said bins.
-
-
11. A method of information retrieval, comprising the steps of:
entering a request for information in a first computer system;
transmitting said request to a second computer system over a communications network;
creating a profile having a number of terms corresponding to said request in said second computer system;
determining a term weight for each term in said profile;
selecting a minimum weight (wmin) using said term weights;
selecting a maximum weight (wmax) using said term weights;
modifying said profile wherein each term in said profile has a weight not less than approximately (wmin + (alpha × (wmax − wmin))) wherein alpha is a percentage selected from the range of zero to one; and
analyzing a data source using said modified profile to retrieve information.
-
-
13. A method of information retrieval, comprising the steps of:
entering a request for information in a first computer system;
transmitting said request to a second computer system over a communications network;
creating a profile having a number of terms corresponding to said request in said second computer system;
determining a term weight for each term in said profile;
selecting a maximum weight (wmax) using said term weights;
selecting a minimum weight (wmin) using said term weights;
organizing said terms in an order according to said term weight;
determining the difference in weight between adjacent terms in said order;
modifying said profile when said difference between said adjacent terms is less than approximately (beta × (wmax − wmin)) wherein beta is a percentage selected from the range of zero to one; and
analyzing a data source using said modified profile to retrieve information.
-
-
15. A method for retrieving information, comprising:
receiving a request for information from a communications network;
creating a profile having a number of terms corresponding to said request;
determining a term weight for each term in said profile;
estimating a point at which a curve representing said term weights and a number of terms in a profile becomes flat;
using said estimated point to determine a term weight threshold;
modifying said profile to remove terms having a weight less than said term weight threshold;
analyzing a data source using said modified profile to retrieve information; and
transmitting said retrieved information over said communications network.
-
-
16. A method of information retrieval, comprising the steps of:
receiving a request for information from a communications network;
creating a profile having a number of terms;
determining a term weight for each term in said profile;
organizing said terms in an order according to said term weight;
dividing said term weights into a plurality of bins wherein the number of term weights in a bin satisfies the condition that (alpha × N_i) is greater than N_{i+1}, wherein alpha is a coefficient indicative of how many of said terms are added, wherein (i) corresponds to a specific bin number; and
modifying said profile to contain only terms from one or more of said bins.
-
-
18. A method of information retrieval, comprising the steps of:
receiving a request for information from a communications network;
creating a profile having a number of terms corresponding to said request;
determining a term weight for each term in said profile;
selecting a minimum weight (wmin) using said term weights;
selecting a maximum weight (wmax) using said term weights;
modifying said profile wherein each term in said profile has a weight not less than approximately (wmin + (alpha × (wmax − wmin))) wherein alpha is a percentage selected from the range of zero to one;
analyzing a data source using said modified profile to retrieve information; and
transmitting said retrieved information over said communications network.
-
-
20. A method of information retrieval, comprising the steps of:
receiving a request for information from a communications network;
creating a profile having a number of terms corresponding to said request;
determining a term weight for each term in said profile;
selecting a maximum weight (wmax) using said term weights;
selecting a minimum weight (wmin) using said term weights;
organizing said terms in an order according to said term weight;
determining the difference in weight between adjacent terms in said order;
modifying said profile when said difference between said adjacent terms is less than approximately (beta × (wmax − wmin)) wherein beta is a percentage selected from the range of zero to one;
analyzing a data source using said modified profile to retrieve information; and
transmitting said retrieved information over said communications network.
-
-
22. A system for retrieving information from a data source, comprising:
a central processing unit coupled to a memory unit and an input system;
said central processing unit executes instructions retrieved from said memory in response to commands entered into said input system;
said instructions cause said central processing unit to:
i) create a profile having a number of terms;
ii) determine a term weight for each term in said profile;
iii) estimate a point at which a curve representing said term weights and a number of terms in a profile becomes flat;
iv) determine a term weight threshold using said estimated point;
v) modify said profile to remove terms having a weight less than said term weight threshold; and
vi) analyze a data source using said modified profile to retrieve information.
-
-
23. A system for retrieving information from a data source, comprising:
a central processing unit coupled to a memory unit and an input system;
said central processing unit executes instructions retrieved from said memory in response to commands entered into said input system;
said instructions cause said central processing unit to:
i) create a profile having a number of terms;
ii) determine a term weight for each term in said profile;
iii) organize said terms in an order according to said term weight;
iv) divide said term weights into a plurality of bins wherein the number of term weights in a bin satisfies the condition that (alpha × N_i) is greater than N_{i+1}, wherein alpha is a coefficient indicative of how many of said terms are added, wherein (i) corresponds to a specific bin number;
v) modify said profile to contain only terms from one or more of said bins; and
vi) analyze a data source using said modified profile to retrieve information.
-
-
25. A system for retrieving information from a data source, comprising:
a central processing unit coupled to a memory unit and an input system;
said central processing unit executes instructions retrieved from said memory in response to commands entered into said input system;
said instructions cause said central processing unit to:
i) create a profile having a number of terms;
ii) determine a term weight for each term in said profile;
iii) select a minimum weight (wmin) using said term weights;
iv) select a maximum weight (wmax) using said term weights;
v) modify said profile wherein each term in said profile has a weight not less than approximately (wmin + (alpha × (wmax − wmin))) wherein alpha is a percentage selected from the range of zero to one.
-
-
27. A system for retrieving information from a data source, comprising:
a central processing unit coupled to a memory unit and an input system;
said central processing unit executes instructions retrieved from said memory in response to commands entered into said input system;
said instructions cause said central processing unit to:
i) create a profile having a number of terms;
ii) determine a term weight for each term in said profile;
iii) select a maximum weight (wmax) using said term weights;
iv) select a minimum weight (wmin) using said term weights;
v) organize said terms in an order according to said term weight;
vi) determine the difference in weight between adjacent terms in said order; and
vii) remove a term from said profile when said difference between said adjacent terms is less than approximately (beta × (wmax − wmin)) wherein beta is a percentage selected from the range of zero to one.
-
Specification