Information retrieval system and method
Abstract
A technique for evaluating the topicality of keywords assigned to references retrieved from a database, so that interesting topics may be extracted. Since the number of references containing a specific keyword increases at a certain time and then gradually decreases with the passage of time, the topicality of a keyword can be evaluated by quantifying this phenomenon. Keywords are sorted based on the value of their topicality and displayed either as a list or as a graph in which the level of topicality is displayed along the time axis.
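The quantities used in the claims below can be written compactly for a keyword $k$. The symbols are introduced here for illustration and are not the patent's own notation. The claims require only three things: a baseline estimate, a monotonically decreasing model whose slope flattens, and some distance measure. The specific choices below are therefore assumptions: the median baseline $c_k$ over a period $T$, the tolerance $\varepsilon$, the exponential model with rate $\lambda > 0$, the root-mean-square distance $D_k$, and the threshold $\theta$.

```latex
\begin{aligned}
n_k(t) &= \#\{\, d \in \mathcal{D} : k \in d,\ \operatorname{time}(d) \in [t, t+1) \,\} \\
c_k &= \operatorname*{median}_{t \in T}\, n_k(t) \\
e_k(t) &= n_k(t) - c_k \\
t_b &= \operatorname*{arg\,max}_{t}\, e_k(t) \\
t_e &= \min\{\, t > t_b : n_k(t) \le c_k + \varepsilon \,\} \\
f(\tau) &= e^{-\lambda \tau}, \qquad \tau = \frac{t - t_b}{t_e - t_b} \in [0, 1] \\
D_k &= \sqrt{\frac{1}{t_e - t_b + 1} \sum_{t = t_b}^{t_e} \left( \frac{e_k(t)}{e_k(t_b)} - f(\tau) \right)^{2}} \\
k \text{ is selected as a topic} &\iff D_k < \theta
\end{aligned}
```

Because $f'(\tau) = -\lambda e^{-\lambda \tau} < 0$ and $|f'(\tau)|$ shrinks as $\tau$ grows, this $f$ meets both conditions placed on the model function. Dividing $e_k(t)$ by its peak $e_k(t_b)$ puts the observed curve on the same scale as $f(0) = 1$.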
25 Claims
1. An information retrieval method for extracting topicality by a computer process from a database consisting of a plurality of data elements, each data element having time information and containing information that can be treated as keywords, said method comprising the steps of:
(a) determining the consistent frequency of appearance for a given keyword, said frequency being defined as an estimated number of data elements having time information within a unit of time, which data elements consistently contain said given keyword contained in said data elements over a predetermined period of said time information;
(b) along the axis of said time information, determining the time at which the value obtained by subtracting said consistent frequency of appearance from the number of data elements having time information for each unit of time, which data elements contain said given keyword, becomes maximum, as the beginning of the topicality of said given keyword;
(c) along the axis of said time information, determining the time later than the beginning of said topicality and at which the number of data elements having time information within a unit of time, which data elements contain said given keyword, becomes substantially as low as said consistent frequency of appearance, as the end of said topicality of said given keyword;
(d) previously providing a model as a function of change in the frequency of a topic, said function monotonically decreasing from the beginning to the end of a topic, said function characterized in that the absolute value of its negative gradient gradually decreases along said time axis;
(e) determining the distance between said function previously provided as a model and the graph of the change in a value obtained by subtracting said consistent frequency of appearance from the number of data elements having time information for each unit of time from said beginning to said end of said topicality; and
(f) in response to the value of said distance for said given keyword being smaller than a threshold value, selecting said given keyword as a topic.
Dependent claims: 2, 3, 4, 5.
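Steps (a) to (f) of claim 1 can be sketched for a single keyword whose per-unit counts are already known. The claim leaves several choices open, so each of the following is an assumption, labeled in the code as well:
- the baseline of step (a) is the median count over the chosen period, which a short burst barely moves;
- "substantially as low as" in step (c) means within a `tolerance` of the baseline;
- the model of step (d) is exponential decay on a normalized time axis;
- the distance of step (e) is a root-mean-square difference after scaling the excess curve by its peak.

All function names are hypothetical.

```python
import math
from statistics import median
from typing import Optional, Sequence, Tuple


def consistent_frequency(counts: Sequence[int],
                         period: Optional[Tuple[int, int]] = None) -> float:
    """Step (a): baseline count per unit of time.
    ASSUMPTION: the median over the predetermined period (whole series by default)."""
    window = counts[period[0]:period[1]] if period else counts
    return float(median(window))


def topic_span(counts: Sequence[int], baseline: float,
               tolerance: float = 0.0) -> Optional[Tuple[int, int]]:
    """Steps (b) and (c): return (t_b, t_e) as unit indices, or None."""
    excess = [c - baseline for c in counts]
    t_b = max(range(len(counts)), key=lambda t: excess[t])
    if excess[t_b] <= 0:
        return None  # nothing rises above the baseline
    for t in range(t_b + 1, len(counts)):
        # "substantially as low as" the baseline, within a tolerance (ASSUMPTION)
        if counts[t] <= baseline + tolerance:
            return t_b, t
    return None  # the burst has not ended within the data


def decay_model(tau: float, rate: float) -> float:
    """Step (d): for rate > 0, exp(-rate * tau) decreases monotonically and
    the magnitude of its slope shrinks as tau grows."""
    return math.exp(-rate * tau)


def topicality_distance(counts: Sequence[int], baseline: float,
                        span: Tuple[int, int], rate: float) -> float:
    """Step (e): RMS distance between the peak-normalized excess curve and
    the model over [t_b, t_e]. ASSUMPTION: normalization and RMS metric."""
    t_b, t_e = span
    peak = counts[t_b] - baseline
    total = 0.0
    for t in range(t_b, t_e + 1):
        observed = (counts[t] - baseline) / peak
        expected = decay_model((t - t_b) / (t_e - t_b), rate)
        total += (observed - expected) ** 2
    return math.sqrt(total / (t_e - t_b + 1))


def is_topic(counts: Sequence[int], threshold: float, rate: float = 3.0,
             tolerance: float = 0.0,
             period: Optional[Tuple[int, int]] = None
             ) -> Tuple[bool, Optional[float]]:
    """Step (f): select the keyword when its distance is below the threshold."""
    baseline = consistent_frequency(counts, period)
    span = topic_span(counts, baseline, tolerance)
    if span is None:
        return False, None
    distance = topicality_distance(counts, baseline, span, rate)
    return distance < threshold, distance
```

Consider the burst `[2, 2, 2, 10, 6, 4, 3, 2, 2, 2]`. Here the sketch finds t_b = 3 and t_e = 7 with a distance of roughly 0.03, so the keyword is selected at `threshold=0.1`. A burst that falls off in a straight line, `[2, 2, 2, 10, 8, 6, 4, 2, 2, 2]`, scores about 0.19 and is rejected. A keyword that never rises above its baseline yields no span at all.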
6. An information retrieval method for extracting topicality by a computer process from a database consisting of a plurality of data elements, each data element having time information and containing information that can be treated as keywords, said method comprising the steps of:
(a) determining the consistent frequency of appearance for a given keyword, said frequency being defined as an estimated number of data elements having time information within a unit of time, which data elements consistently contain said given keyword contained in said data elements over a predetermined period of said time information;
(b) along the axis of said time information, determining the time at which the value obtained by subtracting said consistent frequency of appearance from the number of data elements having time information for each unit of time, which data elements contain said given keyword, becomes maximum, as the beginning of the topicality of said given keyword;
(c) along the axis of said time information, determining the time later than the beginning of said topicality and at which the number of data elements having time information within a unit of time, which data elements contain said given keyword, becomes substantially as low as said consistent frequency of appearance, as the end of topicality of said given keyword;
(d) previously providing a model as a function of change in the frequency of a topic, said function monotonically decreasing from the beginning to the end of a topic, said function characterized in that the absolute value of its negative gradient gradually decreases along said time axis;
(e) determining the distance between said function previously provided as a model and the graph of the change in a value obtained by subtracting said consistent frequency of appearance from the number of data elements having time information for each unit of time from said beginning to said end of said topicality;
(f) in response to the value of said distance for said given keyword being smaller than a threshold value, selecting said given keyword as a topic; and
(g) applying said steps (a) to (f) to each of a plurality of keywords contained in the data elements of said database, and sorting in descending order those selected as a topic from said plurality of keywords, with the value of said distance.
Dependent claims: 7, 8, 9, 10.
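Step (g) of claim 6 runs the single-keyword test over every keyword and orders the ones selected. In the sketch below, `rank_topics` and its `distance_for` parameter are hypothetical names. `distance_for` is assumed to be a callback that performs steps (a) to (e) for one keyword and returns `None` when no topical span exists. The phrase "sorting in descending order ... with the value of said distance" admits more than one reading. This sketch assumes ranking by the margin `threshold - distance` in descending order, which is the same as listing the closest fits first.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rank_topics(
    keywords: Iterable[str],
    distance_for: Callable[[str], Optional[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Apply the per-keyword test to every keyword and rank the selected ones.

    `distance_for` is a HYPOTHETICAL callback running steps (a)-(e) for one
    keyword; it returns None when the keyword shows no topical span.
    """
    selected = []
    for keyword in set(keywords):
        distance = distance_for(keyword)
        # Step (f): keep only keywords whose distance is below the threshold.
        if distance is not None and distance < threshold:
            selected.append((keyword, distance))
    # ASSUMPTION: descending margin (threshold - distance), i.e. ascending
    # distance; ties are broken alphabetically so the order is deterministic.
    selected.sort(key=lambda item: (item[1], item[0]))
    return selected
```

With precomputed distances such as `{"election": 0.03, "weather": None, "merger": 0.08, "recipe": 0.4}` and a threshold of 0.1, only `election` and `merger` survive, in that order.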
11. An information retrieval system for extracting topicality by a computer process from a database consisting of a plurality of data elements, each data element having time information and containing information that can be treated as keywords, said system comprising:
(a) means for determining the consistent frequency of appearance for a given keyword, said frequency being defined as an estimated number of data elements having time information within a unit of time, which data elements consistently contain said given keyword contained in said data elements over a predetermined period of said time information;
(b) means for determining, along the axis of said time information, the time at which the value obtained by subtracting said consistent frequency of appearance from the number of data elements having time information for each unit of time, which data elements contain said given keyword, becomes maximum, as the beginning of said topicality of said given keyword;
(c) means for determining, along the axis of said time information, the time later than the beginning of said topicality and at which the number of data elements having time information within a unit of time, which data elements contain said given keyword, becomes substantially as low as said consistent frequency of appearance, as the end of said topicality of said given keyword;
(d) means for determining the distance between a function of change in a topic frequency, said function being previously provided as a model and monotonically decreasing from the beginning to the end of a topic, said function characterized in that the absolute value of its negative gradient gradually decreases along said time axis, and the graph of the change in a value obtained by subtracting said consistent frequency of appearance from the number of data elements having time information for each unit of time from said beginning to said end of said topicality; and
(e) means responsive to the value of said distance for said given keyword being smaller than a threshold value for selecting said given keyword as a topic.
Dependent claims: 12, 13, 14, 15, 16, 17.
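Each means in claim 11 mirrors a step of claim 1. The one that constrains a design choice is the model function in (d), which must decrease monotonically while the absolute value of its gradient decreases. A minimal sketch can check a candidate model numerically. It assumes the model is a Python function on a normalized time axis [0, 1]; that domain is an assumption, since the claim does not fix one. The sketch samples the model on an even grid and tests that every step goes down and that each step is smaller in magnitude than the one before. The name `satisfies_topic_model` is hypothetical.

```python
from typing import Callable


def satisfies_topic_model(f: Callable[[float], float], samples: int = 100) -> bool:
    """Grid check of the two model properties on [0, 1] (ASSUMED domain):
    f decreases at every step, and the size of each finite-difference step
    is strictly smaller than the previous one (the slope flattens)."""
    xs = [i / samples for i in range(samples + 1)]
    ys = [f(x) for x in xs]
    steps = [ys[i + 1] - ys[i] for i in range(samples)]
    decreasing = all(step < 0 for step in steps)
    flattening = all(abs(steps[i + 1]) < abs(steps[i]) for i in range(samples - 1))
    return decreasing and flattening
```

Exponential decay and `1 / (1 + x)` pass this check. A curve such as `1 - x * x` decreases, but its slope steepens, so it is rejected. A grid test of this kind can only reject a candidate or support it; it is not a proof that the condition holds everywhere.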
18. An information retrieval system for extracting topicality by a computer process from a database consisting of a plurality of data elements, each data element having time information and containing information that can be treated as keywords, said system comprising:
(a) retrieval means responsive to a retrieval demand from a user for retrieving data elements fulfilling the condition of said retrieval demand in said database;
(b) means for determining the consistent frequency of appearance for a given keyword which is consistently contained in said data elements retrieved by said retrieval means over a predetermined period of said time information, said frequency being defined as an estimated number of data elements having time information within a unit of time, which data elements consistently contain said keyword;
(c) means for determining, along the axis of said time information, the time at which the value obtained by subtracting said consistent frequency of appearance from the number of data elements having time information for each unit of time, which data elements contain said given keyword, becomes maximum, as the beginning of said topicality of said given keyword;
(d) means for determining, along the axis of said time information, the time later than the beginning of said topicality and at which the number of data elements having time information within a unit of time, which data elements contain said given keyword, becomes substantially as low as said consistent frequency of appearance, as the end of said topicality of said given keyword;
(e) means for determining the distance between a function of change in a topic frequency, said function being previously provided as a model and monotonically decreasing from the beginning to the end of a topic, said function characterized in that the absolute value of its negative gradient gradually decreases along said time axis, and the graph of the change in a value obtained by subtracting said consistent frequency of appearance from the number of data elements having time information for each unit of time from said beginning to said end of said topicality;
(f) means responsive to the value of said distance for said given keyword being smaller than a threshold value for selecting said given keyword as a topic; and
(g) means for listing said keywords selected as a topic in the set of data elements retrieved by said retrieval means and displaying them to the user.
Dependent claims: 19, 20, 21, 22, 23, 24, 25.
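Claim 18 adds a retrieval front end, so the topicality test runs only on the data elements that satisfy the user's retrieval demand. The sketch below makes two assumptions: each data element is a time stamp plus a set of keywords (the `DataElement` type is hypothetical), and the retrieval demand is a conjunctive keyword query. It retrieves the matching elements, then builds the per-unit count series for every keyword in the retrieved set. Means (b) through (f) would examine those series before the selected keywords are listed for the user.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class DataElement:
    """HYPOTHETICAL record: a time stamp plus the keywords it contains."""
    time: float
    keywords: FrozenSet[str]


def retrieve(elements: Iterable[DataElement], query: Iterable[str]) -> List[DataElement]:
    """Retrieval means, ASSUMED to be a conjunctive match: keep the
    elements that contain every query term."""
    terms = set(query)
    return [e for e in elements if terms <= e.keywords]


def keyword_series(elements: Iterable[DataElement], start: float,
                   unit: float, n_units: int) -> Dict[str, List[int]]:
    """Count, per unit of time, the retrieved elements containing each keyword."""
    series: Dict[str, List[int]] = defaultdict(lambda: [0] * n_units)
    for e in elements:
        t = int((e.time - start) // unit)
        if 0 <= t < n_units:  # ignore elements outside the time window
            for k in e.keywords:
                series[k][t] += 1
    return dict(series)
```

The query terms themselves occur in every retrieved element, so their series only tracks the size of the result set. It may make sense to skip them when listing topics.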
Specification