Information analysing apparatus
Abstract
Information analysing apparatus is described for clustering information elements in items of information into groups of related information elements. The apparatus has an expected probability calculator (11a), a model parameter updater (11b) and an end point determiner (19) for iteratively calculating expected probabilities using first, second and third model parameters representing probability distributions for the groups, for the elements and for the items, updating the model parameters in accordance with the calculated expected probabilities and count data representing the number of occurrences of elements in each item of information until a likelihood calculated by the end point determiner meets a given criterion.
The apparatus includes a user input (5) that enables a user to input prior information relating to the relationship between at least some of the groups and at least some of the elements. At least one of the expected probability calculator (11a), the model parameter updater (11b) and the likelihood calculator is arranged to use prior data derived from the user input prior information in its calculation. In one example, the expected probability calculator uses the prior data in the calculation of the expected probabilities and in another example, the count data used by the model parameter updater and the likelihood calculator is modified in accordance with the prior data.
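The iterative procedure summarised in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the three parameter sets are arrays `p_z` (groups), `p_w_z` (elements given group) and `p_d_z` (items given group), that the user's prior data is a non-negative weight matrix `prior` that multiplies the element probabilities when the expected probabilities are calculated, and that the end-point criterion is a small change in log-likelihood. The function name, argument names and the multiplicative form of the prior are all hypothetical.

```python
import numpy as np

def fit_plsa_with_prior(counts, n_groups, prior=None, tol=1e-6, max_iter=200, seed=0):
    """Illustrative EM loop; names and the form of the prior are assumptions."""
    counts = np.asarray(counts, dtype=float)      # counts[d, w]: occurrences of element w in item d
    n_items, n_elems = counts.shape
    rng = np.random.default_rng(seed)

    # Initial model parameters: first (groups), second (elements), third (items).
    p_z = np.full(n_groups, 1.0 / n_groups)
    p_w_z = rng.random((n_groups, n_elems))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_d_z = rng.random((n_groups, n_items))
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)

    # Assumption: with no user input, the prior is uniform and has no effect.
    if prior is None:
        prior = np.ones((n_groups, n_elems))
    prior = np.asarray(prior, dtype=float)

    tiny = 1e-300
    prev_loglik = -np.inf
    loglik = prev_loglik
    for _ in range(max_iter):
        # Expected probabilities: joint[k, d, w] before normalising over groups k.
        joint = p_z[:, None, None] * p_d_z[:, :, None] * (p_w_z * prior)[:, None, :]
        mix = joint.sum(axis=0)                   # shape (items, elements)
        post = joint / np.maximum(mix, tiny)      # expected probability of each group

        # Parameter update from expected probabilities weighted by count data.
        weighted = counts[None, :, :] * post
        totals = weighted.sum(axis=(1, 2))
        p_w_z = weighted.sum(axis=1) / np.maximum(totals[:, None], tiny)
        p_d_z = weighted.sum(axis=2) / np.maximum(totals[:, None], tiny)
        p_z = totals / totals.sum()

        # Log-likelihood of the count data under the (prior-weighted) mixture.
        loglik = float((counts * np.log(np.maximum(mix, tiny))).sum())
        if abs(loglik - prev_loglik) < tol:       # assumed end-point criterion
            break
        prev_loglik = loglik
    return p_z, p_w_z, p_d_z, loglik
```

A call such as `fit_plsa_with_prior(counts, 2, prior=[[5, 1, 1], [1, 5, 1]])` would, under these assumptions, bias the first element toward the first group and the second element toward the second group.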
33 Claims
-
1. Information analysing apparatus for clustering information elements in items of information into groups of related information elements, the apparatus comprising:
-
a count data provider for providing count data representing the number of occurrences of elements in each item of information;
an initial model parameter determiner for determining first model parameters representing a probability distribution for the groups, second model parameters representing for each element the probability for each group of that element being associated with that group, and third model parameters representing for each item the probability for each group of that item being associated with that group;
a user input receiver for enabling a user to input prior information relating to the relationship between at least some of the groups and at least some of the elements;
a prior data determiner for determining from prior information input by a user using the user input receiver prior probability data for at least some of the second model parameters;
an expected probability calculator for receiving the first, second and third model parameters and the prior probability data and for calculating, for each item of information and for each information element of that item, the expected probability of that item and that element being associated with each group using the first, second and third model parameters and the prior probability data determined by the prior data determiner;
a model parameter updater for updating the first, second and third model parameters in accordance with the expected probabilities calculated by the expected probability calculator and the count data stored by the count data provider;
a likelihood calculator for calculating a likelihood on the basis of the expected probabilities and the count data stored by the count data provider; and
a controller for causing the expected probability calculator, the model parameter updater and the likelihood calculator to recalculate the expected probabilities using the prior probability data and updated model parameters, to update the model parameters and to recalculate the likelihood, respectively, until the likelihood meets a given criterion.
Dependent claims: 2, 3, 4, 5, 6
-
-
7. Information analysing apparatus for clustering information elements in items of information into groups of related information elements, the apparatus comprising:
-
a count data provider for providing count data representing the number of occurrences of elements in each item of information;
an initial model parameter determiner for determining first model parameters representing a probability distribution for the groups, second model parameters representing for each element the probability for each group of that element being associated with that group, and third model parameters representing for each item the probability for each group of that item being associated with that group;
a user input receiver for enabling a user to input prior information for modifying the count data;
a prior data determiner for determining from prior information input by a user using the user input receiver prior data and for modifying the count data provided by the count data provider in accordance with the prior data to provide modified count data;
an expected probability calculator for receiving the first, second and third model parameters and for calculating, for each item of information and for each information element of that item, the expected probability of that item and that element being associated with each group using the first, second and third model parameters;
a model parameter updater for updating the first, second and third model parameters in accordance with the expected probabilities calculated by the expected probability calculator and the modified count data;
a likelihood calculator for calculating a likelihood on the basis of the expected probabilities and the modified count data; and
a controller for causing the expected probability calculator, the model parameter updater and the likelihood calculator to recalculate the expected probabilities using updated model parameters, to update the model parameters and to recalculate the likelihood, respectively, until the likelihood meets a given criterion.
-
-
8. A method of clustering information elements in items of information into groups of related information elements, the method comprising a processor carrying out the steps of:
-
providing count data representing the number of occurrences of elements in each item of information;
determining initial first model parameters representing a probability distribution for the groups, initial second model parameters representing for each element the probability for each group of that element being associated with that group, and initial third model parameters representing for each item the probability for each group of that item being associated with that group;
determining from prior information input by a user using a user input receiver prior probability data for at least some of the second model parameters;
calculating, for each item of information and for each information element of that item, the expected probability of that item and that element being associated with each group using the initial first, second and third model parameters and the determined prior probability data;
updating the first, second and third model parameters in accordance with calculated expected probabilities and the count data;
calculating a likelihood on the basis of the expected probabilities and the count data; and
causing the expected probability calculating, model parameter updating and likelihood calculating to be repeated, until the likelihood meets a given criterion.
Dependent claims: 9, 10, 11, 13, 30, 32
-
-
12. A method according to any of claims, which further comprises enabling a user to input data indicating the overall relevance of prior information input by the user using the user input receiver.
-
14. A method of clustering information elements in items of information into groups of related information elements, the method comprising a processor carrying out the steps of:
-
providing count data representing the number of occurrences of elements in each item of information;
determining initial first model parameters representing a probability distribution for the groups, initial second model parameters representing for each element the probability for each group of that element being associated with that group, and initial third model parameters representing for each item the probability for each group of that item being associated with that group;
determining prior data from prior information input by a user using a user input receiver;
modifying the count data in accordance with the prior data to provide modified count data;
calculating, for each item of information and for each information element of that item, the expected probability of that item and that element being associated with each group using the first, second and third model parameters;
updating the first, second and third model parameters in accordance with the calculated expected probabilities and the modified count data;
calculating a likelihood on the basis of the expected probabilities and the modified count data; and
causing the expected probability calculating, model parameter updating and likelihood calculating to be repeated, until the likelihood meets a given criterion.
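The count-modifying step of this method can be sketched as follows. The claims do not state how the count data is modified, so this sketch assumes one simple scheme: the user's prior information is expressed as one row of element weights per group, and those rows are appended to the count data as pseudo-items, scaled by an overall relevance factor. The function `modify_counts` and its parameters `prior_weights` and `relevance` are hypothetical names introduced only for illustration.

```python
import numpy as np

def modify_counts(counts, prior_weights, relevance=1.0):
    """Append prior-derived pseudo-items to the count data (assumed scheme)."""
    counts = np.asarray(counts, dtype=float)              # shape (items, elements)
    prior_weights = np.asarray(prior_weights, dtype=float)  # shape (groups, elements)
    if prior_weights.ndim != 2 or prior_weights.shape[1] != counts.shape[1]:
        raise ValueError("prior_weights must have one column per element")
    if relevance < 0:
        raise ValueError("relevance must be non-negative")
    # Each group contributes one pseudo-item whose counts reflect the user's prior.
    pseudo_items = relevance * prior_weights
    return np.vstack([counts, pseudo_items])
```

Under this assumption, the expected probability calculation, parameter update and likelihood calculation would then run on the returned array in place of the original count data, and a larger `relevance` would give the user's prior information more influence.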
-
-
15. Calculating apparatus for information analysing apparatus for clustering information elements in items of information into groups of related information elements, the apparatus comprising:
-
a receiver for receiving count data representing the number of occurrences of elements in each item of information modified by prior information input by a user using the user input, first model parameters representing a probability distribution for the groups, second model parameters representing for each element the probability for each group of that element being associated with that group, third model parameters representing for each item the probability for each group of that item being associated with that group;
an expected probability calculator for receiving the first, second and third model parameters and for calculating, for each item of information and for each information element of that item, the expected probability of that item and that element being associated with each group using the first, second and third model parameters;
a model parameter updater for updating the first, second and third model parameters in accordance with the expected probabilities calculated by the expected probability calculator and the modified count data;
a likelihood calculator for calculating a likelihood on the basis of the expected probabilities and the modified count data; and
a controller for causing the expected probability calculator, the model parameter updater and the likelihood calculator to recalculate the expected probabilities using updated model parameters, to update the model parameters and to recalculate the likelihood, respectively, until the likelihood meets a given criterion.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
-
-
23. Information analysing apparatus for clustering information elements in items of information into groups of related information elements, the apparatus comprising:
-
a count data provider for providing count data representing the number of occurrences of elements in each item of information;
an initial model parameter determiner for determining a plurality of parameters;
a user input receiver for enabling a user to input prior information relating to the relationship between at least some of the groups and at least some of the elements;
a prior data determiner for determining from prior information input by a user using the user input receiver prior probability data;
an expected probability calculator for receiving the first, second and third model parameters and the prior probability data and for calculating, for each item of information and for each information element of that item, the expected probability of that item and that element being associated with each group using the plurality of parameters and the prior probability data determined by the prior data determiner;
a parameter updater for updating the plurality of parameters in accordance with the expected probabilities calculated by the expected probability calculator and the count data stored by the count data provider.
Dependent claims: 24, 25
-
-
26. A method of clustering information elements in items of information into groups of related information elements, the method comprising the steps of:
-
providing count data representing the number of occurrences of elements in each item of information;
determining a plurality of parameters;
receiving from a user prior information relating to the relationship between at least some of the groups and at least some of the elements;
determining prior probability data from prior information input by a user;
calculating, for each item of information and for each information element of that item, the expected probability of that item and that element being associated with each group using the plurality of parameters and the determined prior probability data;
updating the plurality of parameters in accordance with the calculated expected probabilities and the count data.
Dependent claims: 27, 28, 31, 33
-
-
29. Information analysing apparatus for clustering information elements in items of information into groups of related information elements, the apparatus comprising:
-
count data providing means for providing count data representing the number of occurrences of elements in each item of information;
initial model parameter determining means for determining a plurality of parameters;
user input means for enabling a user to input prior information relating to the relationship between at least some of the groups and at least some of the elements;
prior data determining means for determining from prior information input by a user using the user input means prior probability data;
expected probability calculating means for receiving the first, second and third model parameters and the prior probability data and for calculating, for each item of information and for each information element of that item, the expected probability of that item and that element being associated with each group using the plurality of parameters and the prior probability data determined by the prior data determining means;
parameter updating means for updating the plurality of parameters in accordance with the expected probabilities calculated by the expected probability calculating means and the count data stored by the count data providing means.
-
Specification