Selection of initial document collection for visual interactive search
First Claim
1. A method of identifying an initial collection of k documents I1, I2, . . . , Ik from n1 candidate documents X1, X2, . . . , Xn1 in an embedding space, the initial collection of k documents I1, I2, . . . , Ik to be used for user identification of a desired document, the method comprising:
- providing, accessibly to a computer system, a database identifying (i) the n1 candidate documents X1, X2, . . . , Xn1 in the embedding space and (ii) a distance between each pair of documents of the n1 candidate documents X1, X2, . . . , Xn1 in the embedding space, the distance between each pair of candidate documents corresponding to a predetermined measure of dissimilarity between the pair of candidate documents, wherein n1>
k>
1;
identifying the k initial documents I1, I2, . . . , Ik to be identified to a user by, for each i'"'"'th one of k iterations, beginning with a first iteration (i=1), performing;
calculating a cost score for documents of the ni candidate documents X1, X2, . . . , Xni, the cost score being calculated according to an algorithm that operates in dependence on a representativeness calculation and a diversity calculation,adding, to the initial collection of k documents I1, I2, . . . , Ik, a minimum cost document, from the scored documents, having a lowest cost score, andremoving, from the ni candidate documents X1, X2, . . . , Xni, the minimum cost document and all r documents that are within a predetermined distance from the minimum cost document, where r≥
0, and ni+1 being ni—
(r +1); and
identifying toward the user the initial collection of k documents I1, I2, . . . , Ik for selection of a document.
1 Assignment
0 Petitions
Accused Products
Abstract
Roughly described, a system for user identification of a desired document. A database identifies a catalog of documents in an embedding space, in which the distance between documents corresponds to a measure of their dissimilarity. The system presents an initial collection of the documents toward the user from an initial candidate space which is part of the embedding space, then in response to iterative user input, refines the candidate space and subsequent collections of documents presented toward the user. The initial collection is determined using a weighted cost-based iterative addition to the initial collection of documents from the initial candidate space, trading off between two sub-objectives of representativeness and diversity.
54 Citations
27 Claims
-
1. A method of identifying an initial collection of k documents I1, I2, . . . , Ik from n1 candidate documents X1, X2, . . . , Xn1 in an embedding space, the initial collection of k documents I1, I2, . . . , Ik to be used for user identification of a desired document, the method comprising:
-
providing, accessibly to a computer system, a database identifying (i) the n1 candidate documents X1, X2, . . . , Xn1 in the embedding space and (ii) a distance between each pair of documents of the n1 candidate documents X1, X2, . . . , Xn1 in the embedding space, the distance between each pair of candidate documents corresponding to a predetermined measure of dissimilarity between the pair of candidate documents, wherein n1>
k>
1;identifying the k initial documents I1, I2, . . . , Ik to be identified to a user by, for each i'"'"'th one of k iterations, beginning with a first iteration (i=1), performing; calculating a cost score for documents of the ni candidate documents X1, X2, . . . , Xni, the cost score being calculated according to an algorithm that operates in dependence on a representativeness calculation and a diversity calculation, adding, to the initial collection of k documents I1, I2, . . . , Ik, a minimum cost document, from the scored documents, having a lowest cost score, and removing, from the ni candidate documents X1, X2, . . . , Xni, the minimum cost document and all r documents that are within a predetermined distance from the minimum cost document, where r≥
0, and ni+1 being ni—
(r +1); andidentifying toward the user the initial collection of k documents I1, I2, . . . , Ik for selection of a document. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A computer-readable storage medium impressed in a non-transitory manner with computer program instructions for identifying an initial collection of k documents I1I2, . . . , Ik from ni candidate documents X1, X2, . . . , Xn1 in an embedding space, the initial collection of k documents 1, I2, . . . , Ik to be used for user identification of a desired document, the computer program instructions, when executed, causing a computer to perform a method comprising:
-
providing, accessibly to a computer system, a database identifying (i) the n1 candidate documents X1, X2, . . . , Xn1 in the embedding space and (ii) a distance between each pair of documents of the n1 candidate documents X1, X2, . . . , Xn1 in the embedding space, the distance between each pair of candidate documents corresponding to a predetermined measure of dissimilarity between the pair of candidate documents, wherein n122 k>
1 ;identifying the k initial documents I1, I2, . . . , Ik to be identified to a user by, for each i'"'"'th one of k iterations, beginning with a first iteration (i−
1), performing;calculating a cost score for documents of the ni candidate documents X1, X2, . . . , Xni the cost score being calculated according to an algorithm that operates in dependence on a representativeness calculation and a diversity calculation, adding, to the initial collection of k documents I1, I2, . . . , Ik, a minimum cost document, from the scored documents, having a lowest cost score, and removing, from the ni candidate documents X1, X2, . . . , Xni, the minimum cost document and all r documents that are within a predetermined distance from the minimum cost document, where r≥
0, and ni+1 being ni−
(r +1); andidentifying toward the user the initial collection of k documents I1, I2, . . . , Ik for selection of a document.
-
-
14. A system for identifying an initial collection of k documents I1, I2, . . . , Ik from ni candidate documents X1, X2, . . . , Xn1 in an embedding space, the initial collection of k documents I1, I2, . . . , Ik to be used for user identification of a desired document, the system including:
-
a processor; a memory storing the embedding space; and a computer-readable medium coupled to the processor, computer-readable medium having stored thereon, in a non-transitory manner, a plurality of software code portions defining logic for; a first module for providing, accessibly to a computer system, a database identifying (i) the n1 candidate documents X1, X2, . . . , Xn1 in the embedding space and (ii) a distance between each pair of documents of the n1 candidate documents X1, X2, . . . , Xn1 in the embedding space, the distance between each pair of candidate documents corresponding to a predetermined measure of dissimilarity between the pair of candidate documents, wherein n1>
k>
1;a second module for identifying the k initial documents I1, I2, . . . , Ik to be identified to a user by, for each i'"'"'th one of k iterations, beginning with a first iteration (i=1), performing; calculating a cost score for documents of the ni candidate documents X1, X2, . . . , Xni the cost score being calculated according to an algorithm that operates in dependence on a representativeness calculation and a diversity calculation, adding, to the initial collection of k documents I1, I2, . . . , Ik, a minimum cost document, from the scored documents, having a lowest cost score, and removing, from the ni candidate documents X1, X2, . . . , Xni, the minimum cost document and all r documents that are within a predetermined distance from the minimum cost document, where r≥
0, and ni+1 being ni −
(r +1); anda third module for identifying toward the user the initial collection of k documents I1,I2, . . . , Ik for selection of a document. - View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
-
-
26. A method for user identification of a desired document, comprising:
-
providing, accessibly to a computer system, a database identifying (i) n0 candidate documents X1, X2, . . . , Xn0 in an embedding space and (ii) a distance between each pair of documents of the n0 candidate documents X1, X2, . . . , Xn0 in the embedding space, the distance between each pair of candidate documents corresponding to a predetermined measure of dissimilarity between the pair of candidate documents; identifying an initial (j=0) collection of k documents I1, I2, . . . , Ik, n0 >
k>
1, from the n0 candidate documents X1, X2, . . . , Xn0 by, for each i'"'"'th one of k iterations, beginning with a first iteration (i=1), performing;calculating a cost score for documents of the ni candidate documents X1, X2, . . . , Xni the cost score being calculated according to an algorithm that operates in dependence on a representativeness calculation and a diversity calculation, adding, to the initial collection of k documents I1, I2, . . . , Ik, a minimum cost document, from the scored documents, having a lowest cost score, and removing, from the ni candidate documents X1, X2, . . . , Xni, the minimum cost document and all r documents that are within a predetermined distance from the minimum cost document, where r≥
0, and ni+1 being ni−
(r +1);a computer system identifying the initial (j=0) collection of k documents toward the user; for each j'"'"'th iteration in a plurality of iterations, beginning with a first iteration (j=1); in response to user selection of a j'"'"'th selected subset of the documents from the (j−
1)'"'"'th collection of documents, and in dependence upon the j'"'"'th selected subset, a computer system identifying a j'"'"'th candidate space of nj candidate documents from the (j−
1)'"'"'th candidate space of nj−
1 candidate documents, the j'"'"'th candidate space being smaller than the (j−
1)'"'"'th candidate space,identifying a j'"'"'th collection of documents which is a subset of the j'"'"'th candidate space of nj candidate documents, and identifying toward the user the j'"'"'th collection of documents; and taking action in response to user indicating commitment to a particular collection of documents identified toward the user.
-
-
27. A system for user identification of a desired document, the system including:
-
a processor; a memory storing accessibly to a computer system, a database identifying (i) no candidate documents X1, X2, . . . , Xn0 in an embedding space and (ii) a distance between each pair of documents of the n0 candidate documents X1, X2, . . . , Xn0 in the embedding space, the distance between each pair of candidate documents corresponding to a predetermined measure of dissimilarity between the pair of candidate documents; and a computer-readable medium coupled to the processor, computer-readable medium having stored thereon, in a non-transitory manner, a plurality of software code portions defining logic for; a first module which identifies an initial (j=0) collection of k documents I1, I2, . . . , Ik, n0>
k>
1, from the n0 candidate documents X1, X2, . . . , Xn0 by, for each i'"'"'th one of k iterations, beginning with a first iteration (i=1), the first module performing;calculating a cost score for documents of the ni candidate documents X1, X2, . . . , Xni the cost score being calculated according to an algorithm that operates in dependence on a representativeness calculation and a diversity calculation, adding, to the initial collection of k documents I1, I2, . . . , Ik, a minimum cost document, from the scored documents, having a lowest cost score, and removing, from the ni candidate documents X1, X2, . . . , Xni, the minimum cost document and all r documents that are within a predetermined distance from the minimum cost document, where r≥
0, and ni+1 being ni−
(r +1);a second module which identifies the initial (j=0) collection of k documents toward the user; a third module which, for each j'"'"'th iteration in a plurality of iterations, beginning with a first iteration (j=1), performs; in response to user selection of a j'"'"'th selected subset of the documents from the (j-31
1)'"'"'th collection of documents, and in dependence upon the j'"'"'th selected subset, a computer system identifying a j'"'"'th candidate space of nj candidate documents from the (j−
1)'"'"'th candidate space of nj−
1 candidate documents, the j'"'"'th candidate space being smaller than the (j−
1)'"'"'th candidate space,identifying a j'"'"'th collection of documents which is a subset of the j'"'"'th candidate space of nj candidate documents, and identifying toward the user the j'"'"'th collection of documents; and a fourth module which takes action in response to user indicating commitment to a particular collection of documents identified toward the user.
-
Specification