OBTAINING DATA FROM ELECTRONIC DOCUMENTS
Abstract
Techniques for obtaining information from an electronic document include accessing a set of related electronic documents; identifying a product page associated with the set of related electronic documents using a page recognition model, the product page comprising a plurality of terms; filtering the plurality of terms into a first set of terms and a second set of terms, the first set of terms and the second set of terms including different terms of the plurality of terms, each term in the first set of terms identified as potentially being associated with a product name, and each term in the second set of terms identified as not being associated with a product name; and identifying each term in the first set of terms as being associated with a product name or not being associated with a product name with a name recognition model.
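The flow described in the abstract (recognize a product page, split its terms into candidate and non-candidate sets, then classify only the candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `filter_terms`, `extract_product_names`, the `STOP_TERMS` list, the whitespace tokenization, and the two model callables are all hypothetical stand-ins that the source does not specify.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-ins for the trained models; the source does not define them.
PageModel = Callable[[str], bool]   # True if a document looks like a product page
NameModel = Callable[[str], bool]   # True if a term is judged to be a product name

# Assumed stop list of terms that are never product names (illustrative only).
STOP_TERMS = {"add", "to", "cart", "price", "shipping", "reviews", "home"}


def filter_terms(terms: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split terms into two disjoint lists: (candidates, rejected).

    Assumed heuristic: a term is a candidate if it is not in the stop list
    and contains at least one alphabetic character.
    """
    candidates: List[str] = []
    rejected: List[str] = []
    for term in terms:
        is_candidate = (
            term.lower() not in STOP_TERMS
            and any(ch.isalpha() for ch in term)
        )
        (candidates if is_candidate else rejected).append(term)
    return candidates, rejected


def extract_product_names(
    documents: Iterable[str],
    page_model: PageModel,
    name_model: NameModel,
) -> List[str]:
    """Run the three stages over a set of related documents."""
    names: List[str] = []
    for doc in documents:
        # Stage 1: keep only documents recognized as product pages.
        if not page_model(doc):
            continue
        # Stage 2: filter the page's terms (here: whitespace tokens).
        candidates, _rejected = filter_terms(doc.split())
        # Stage 3: classify only the candidate terms.
        names.extend(term for term in candidates if name_model(term))
    return names
```

Filtering before classification means the name recognition model only sees terms that could plausibly be product names, which is the point of separating the second and third steps.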
29 Claims
1. A method performed with a computing system for obtaining information from a set of related electronic documents, the method comprising:
accessing the set of related electronic documents;
identifying a product page associated with the set of related electronic documents using a page recognition model, the page recognition model generated based on a first machine learning algorithm, and the product page comprising a plurality of terms;
filtering the plurality of terms into a first set of terms and a second set of terms, the first set of terms and the second set of terms including different terms of the plurality of terms, each term in the first set of terms identified as potentially being associated with a product name, and each term in the second set of terms identified as not being associated with a product name; and
identifying each term in the first set of terms as being associated with a product name or not being associated with a product name with a name recognition model, the name recognition model generated based on a second machine learning algorithm.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
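Claim 1 requires that the page recognition model and the name recognition model each be generated by a machine learning algorithm, without naming either algorithm. As a hedged illustration only, the sketch below assumes multinomial naive Bayes for the first and a perceptron for the second. Those choices, the function names, and the feature set in `term_features` are assumptions made for this example and do not come from the source.

```python
import math
from collections import Counter


def train_naive_bayes(pages, labels):
    """Assumed first algorithm: multinomial naive Bayes over page words.

    `labels` must contain both True (product page) and False examples.
    """
    counts = {True: Counter(), False: Counter()}
    priors = Counter(labels)
    for text, label in zip(pages, labels):
        counts[label].update(text.lower().split())
    vocab = set(counts[True]) | set(counts[False])

    def log_score(words, label):
        total = sum(counts[label].values()) + len(vocab)
        score = math.log(priors[label] / len(labels))
        for word in words:
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((counts[label][word] + 1) / total)
        return score

    def is_product_page(text):
        words = text.lower().split()
        return log_score(words, True) > log_score(words, False)

    return is_product_page


def term_features(term):
    """Hypothetical features: bias, leading capital, has digit, longer than 3."""
    return [
        1.0,
        float(term[:1].isupper()),
        float(any(ch.isdigit() for ch in term)),
        float(len(term) > 3),
    ]


def train_perceptron(terms, labels, epochs=10):
    """Assumed second algorithm: a perceptron over simple term features."""
    weights = [0.0] * len(term_features(""))

    def score(term):
        return sum(w * x for w, x in zip(weights, term_features(term)))

    for _ in range(epochs):
        for term, label in zip(terms, labels):
            if (score(term) > 0) != label:
                # Move the weights toward (or away from) the misclassified example.
                sign = 1.0 if label else -1.0
                features = term_features(term)
                for i, x in enumerate(features):
                    weights[i] += sign * x

    def is_product_name(term):
        return score(term) > 0

    return is_product_name
```

The two returned callables could serve as the page recognition and name recognition models in a pipeline that filters terms between them; any other pair of learning algorithms would satisfy the claim language equally well.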
12. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
accessing a set of related electronic documents;
identifying a product page associated with the set of related electronic documents using a page recognition model, the page recognition model generated based on a first machine learning algorithm, and the product page comprising a plurality of terms;
filtering the plurality of terms into a first set of terms and a second set of terms, the first set of terms and the second set of terms including different terms of the plurality of terms, each term in the first set of terms identified as potentially being associated with a product name, and each term in the second set of terms identified as not being associated with a product name; and
identifying each term in the first set of terms as being associated with a product name or not being associated with a product name with a name recognition model, the name recognition model generated based on a second machine learning algorithm.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
21. A system of one or more computers configured to perform operations comprising:
accessing a set of related electronic documents;
identifying a product page associated with the set of related electronic documents using a page recognition model, the page recognition model generated based on a first machine learning algorithm, and the product page comprising a plurality of terms;
filtering the plurality of terms into a first set of terms and a second set of terms, the first set of terms and the second set of terms including different terms of the plurality of terms, each term in the first set of terms identified as potentially being associated with a product name, and each term in the second set of terms identified as not being associated with a product name; and
identifying each term in the first set of terms as being associated with a product name or not being associated with a product name with a name recognition model, the name recognition model generated based on a second machine learning algorithm.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29
Specification