Methods and systems for analyzing discrete-valued datasets
First Claim
1. A system for determining the structure of an electronic dataset, the system comprising:
- one or more processors configured to perform the steps of:
receiving a matrix with a first dimension corresponding to items, a second dimension corresponding to features, and discrete-valued elements indicating a presence, absence, or frequency of the features in the items;
generating an engineered features set and a weights set for the matrix, the engineered features set and the weights set corresponding to latent structures in the matrix, generating the engineered features set and the weights set comprising:
generating a first engineered feature and a first weights vector corresponding to a first latent structure in the matrix, generating the first engineered feature and the first weights vector comprising:
updating the first engineered feature of the engineered features set using the matrix and the first weights vector of the weights set, and
updating the first weights vector of the weights set using a mutual information of the matrix and the first engineered feature; and
generating a second engineered feature and a second weights vector corresponding to a second latent structure using a subset of the matrix associated with the first latent structure, generating the second engineered feature and the second weights vector comprising:
determining the subset of the matrix using at least one of the first engineered feature and the first weights vector;
updating the second engineered feature of the engineered features set using the subset and the second weights vector of the weights set; and
updating the second weights vector of the weights set using a mutual information of the subset and the second engineered feature;
receiving a request indicating at least one of the engineered features set;
identifying items based on the matrix and the indicated at least one of the engineered features set; and
providing a response based on the identified items.
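The weights update recited above relies on mutual information between the matrix and an engineered feature. The following is a minimal sketch of one possible reading, not the claimed implementation: it assumes binary matrix entries, a binary engineered feature, and one weight per feature column equal to the empirical mutual information (in bits) between that column and the engineered feature. The function names `mutual_information` and `update_weights` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, of two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts to limit rounding error
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


def update_weights(matrix, feature):
    """Assumed update: one weight per column, the MI of that column with the feature."""
    columns = list(zip(*matrix))
    return [mutual_information(col, feature) for col in columns]


# Rows are items, columns are features (presence = 1, absence = 0).
matrix = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
feature = [1, 1, 0, 0]  # illustrative engineered feature, one element per item
weights = update_weights(matrix, feature)
```

In this toy input the first two columns are fully determined by the engineered feature and receive a weight of 1 bit each, while the third column is statistically independent of it and receives a weight of 0.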
Abstract
Methods and systems disclosed herein may be used to determine the structure of a dataset comprising discrete-valued data corresponding to features and items. In some embodiments, a device may receive a discrete-valued matrix with a first dimension corresponding to items and a second dimension corresponding to features. The device may calculate an engineered features set and a weights set for the matrix. The device may update the engineered features set using the weights set, and update the weights set using the updated engineered features set based on the mutual information between the matrix and one of the updated engineered features set. The device may receive a request indicating at least one of the engineered features set, identify items based on the matrix and the indicated at least one of the engineered features set, and provide a response based on the identified items.
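The request and response steps described in the abstract can be illustrated with a short sketch. It is only an illustration under stated assumptions: each engineered feature is scored for an item as the weighted sum of that item's row, and an item is "identified" when its score on every requested engineered feature exceeds a threshold. The names `score_items` and `identify_items`, the threshold, and the sample weights are all hypothetical.

```python
def score_items(matrix, weights):
    """Assumed scoring: weighted sum of each item's row with one weights vector."""
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]


def identify_items(item_ids, matrix, weights_set, requested, threshold=0.0):
    """Return ids of items scoring above threshold on every requested engineered feature."""
    scores = {k: score_items(matrix, weights_set[k]) for k in requested}
    return [
        item_id
        for row, item_id in enumerate(item_ids)
        if all(scores[k][row] > threshold for k in requested)
    ]


item_ids = ["doc-a", "doc-b", "doc-c", "doc-d"]
matrix = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
# One illustrative weights vector per engineered feature.
weights_set = [
    [1.0, -1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# A request indicating engineered feature 0 only.
response = identify_items(item_ids, matrix, weights_set, requested=[0])
# A request indicating both engineered features.
narrow_response = identify_items(item_ids, matrix, weights_set, requested=[0, 1])
```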
21 Claims
-
1. A system for determining the structure of an electronic dataset, the system comprising:
one or more processors configured to perform the steps of:
receiving a matrix with a first dimension corresponding to items, a second dimension corresponding to features, and discrete-valued elements indicating a presence, absence, or frequency of the features in the items;
generating an engineered features set and a weights set for the matrix, the engineered features set and the weights set corresponding to latent structures in the matrix, generating the engineered features set and the weights set comprising:
generating a first engineered feature and a first weights vector corresponding to a first latent structure in the matrix, generating the first engineered feature and the first weights vector comprising:
updating the first engineered feature of the engineered features set using the matrix and the first weights vector of the weights set, and
updating the first weights vector of the weights set using a mutual information of the matrix and the first engineered feature; and
generating a second engineered feature and a second weights vector corresponding to a second latent structure using a subset of the matrix associated with the first latent structure, generating the second engineered feature and the second weights vector comprising:
determining the subset of the matrix using at least one of the first engineered feature and the first weights vector;
updating the second engineered feature of the engineered features set using the subset and the second weights vector of the weights set; and
updating the second weights vector of the weights set using a mutual information of the subset and the second engineered feature;
receiving a request indicating at least one of the engineered features set;
identifying items based on the matrix and the indicated at least one of the engineered features set; and
providing a response based on the identified items.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
-
11. A non-transitory computer-readable medium containing instructions that, when executed by one or more processors, cause a device to perform operations for determining the structure of an electronic dataset, the operations comprising:
receiving a discrete-valued dataset comprising rows corresponding to items and columns corresponding to features;
determining, based on the discrete-valued dataset:
a first engineered feature indicating a first latent structure in the discrete-valued dataset with elements of the first engineered feature corresponding to the items,
a first weights vector providing context for interpreting the first engineered feature with elements of the first weights vector corresponding to the features, and
wherein determining the first engineered feature comprises iteratively updating, until a condition is satisfied:
the elements of the first engineered feature based on a product of the first weights vector and rows of the discrete-valued dataset, and
the elements of the first weights vector based on a product of the first engineered feature and columns of the discrete-valued dataset;
generating a subset of the discrete-valued dataset based on at least the first engineered feature;
determining, using the subset of the discrete-valued dataset, a second engineered feature indicative of a second latent structure in the discrete-valued dataset and a second weights vector providing context for interpreting the second engineered feature; and
outputting indications of the first latent structure and the second latent structure.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
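The iterative updates recited in claim 11 can be sketched as an alternating pair of products. The sketch below is an assumption-laden illustration, not the patented procedure: it normalizes both vectors to unit length after each product (which makes it behave like power iteration), uses "largest element change below a tolerance" as the stopping condition, and forms the subset from the rows whose engineered-feature value exceeds a threshold. The names `fit_feature` and `select_subset`, the tolerance, and the threshold are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fit_feature(data, max_iter=100, tol=1e-9):
    """Alternate row and column products until the engineered feature stops changing."""
    n_items, n_features = len(data), len(data[0])
    weights = normalize([1.0] * n_features)
    feature = [0.0] * n_items
    for _ in range(max_iter):
        # Feature element i: product of the weights vector with row i.
        new_feature = normalize(
            [sum(w * x for w, x in zip(weights, row)) for row in data]
        )
        # Weight element j: product of the feature with column j.
        weights = normalize(
            [sum(f * x for f, x in zip(new_feature, col)) for col in zip(*data)]
        )
        change = max(abs(a - b) for a, b in zip(new_feature, feature))
        feature = new_feature
        if change < tol:  # assumed stopping condition
            break
    return feature, weights


def select_subset(data, feature, threshold=0.1):
    """Assumed subset rule: keep rows strongly associated with the feature."""
    return [row for row, f in zip(data, feature) if f > threshold]


# Rows are items, columns are features, entries are frequencies.
data = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
first_feature, first_weights = fit_feature(data)
subset = select_subset(data, first_feature)
second_feature, second_weights = fit_feature(subset)
```

On this block-structured input the first engineered feature concentrates on the first two items and the first weights vector on the first two columns, so the subset passed to the second fit contains only those two rows.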
-
19. A computer-implemented method for determining the structure of an electronic dataset, the method comprising:
receiving, on a computer, a first word corresponding to a term in a first term-document matrix for a first corpus of documents in a first language;
identifying a first engineered feature of the first term-document matrix corresponding to the first word;
determining a second engineered feature of a second term-document matrix corresponding to the first engineered feature, the second term-document matrix for a second corpus of documents in a second language;
identifying one or more second words in the second language corresponding to the second engineered feature; and
providing the identified one or more second words to a user or non-transitory memory.
Dependent claims: 20, 21
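The word lookup across two term-document matrices in claim 19 can be illustrated with a toy sketch. Several things here are assumptions rather than anything the claim specifies: a word is matched to the engineered feature whose weights vector gives it the largest weight, the correspondence between engineered features of the two matrices is supplied as a ready-made mapping (`feature_map`), and the second-language words are the top-weighted terms of the matched feature. All vocabularies, weights, and function names are invented for illustration.

```python
def best_feature_for_word(word, vocab, weights_set):
    """Index of the engineered feature whose weights vector ranks this word highest."""
    col = vocab.index(word)
    return max(range(len(weights_set)), key=lambda k: weights_set[k][col])


def top_words(feature_index, vocab, weights_set, n=2):
    """The n terms with the largest weight for the given engineered feature."""
    weights = weights_set[feature_index]
    ranked = sorted(range(len(vocab)), key=lambda j: weights[j], reverse=True)
    return [vocab[j] for j in ranked[:n]]


def related_words(word, vocab_a, weights_a, vocab_b, weights_b, feature_map, n=2):
    """First word -> first feature -> corresponding second feature -> second words."""
    k_a = best_feature_for_word(word, vocab_a, weights_a)
    k_b = feature_map[k_a]  # assumed to be known in advance
    return top_words(k_b, vocab_b, weights_b, n)


# Illustrative weights sets: one weights vector per engineered feature.
vocab_en = ["dog", "cat", "bank", "loan"]
weights_en = [
    [0.9, 0.8, 0.1, 0.0],  # feature 0: animals
    [0.0, 0.1, 0.9, 0.7],  # feature 1: finance
]
vocab_fr = ["banque", "chien", "emprunt", "chat"]
weights_fr = [
    [0.8, 0.0, 0.9, 0.1],  # feature 0: finance
    [0.1, 0.9, 0.0, 0.7],  # feature 1: animals
]
feature_map = {0: 1, 1: 0}  # hypothetical English-to-French feature correspondence

second_words = related_words(
    "bank", vocab_en, weights_en, vocab_fr, weights_fr, feature_map
)
```

The result is a list of related second-language terms rather than a one-to-one translation, because the lookup passes through a shared latent structure instead of a dictionary.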
Specification