System and method for determining concepts in a content item using context
First Claim
1. A method implemented on at least one machine having at least one processor, storage, and a communication platform connected to a network for indexing one or more items of content, the method comprising:
extracting, by the at least one processor, one or more items of text from a given item of content;
tokenizing, by the at least one processor, the one or more extracted items of text into one or more concepts based on past queries submitted by one or more users;
identifying one or more related concepts associated with the one or more concepts;
obtaining, by the at least one processor, a support score for the individual one or more concepts based on whether one or more of the one or more concepts appear in the given item of content and/or whether one or more of the one or more related concepts appear in the given item of content; and
generating an index, the index comprising the given item of content associated with the one or more concepts and corresponding support scores for the individual one or more concepts;
receiving a search query;
identifying, based on the index, a set of items of content responsive to the search query, wherein individual items of content in the set are indexed with one or more concepts that are related to the search query;
obtaining, for each individual item of content in the set, a sum of support scores associated with the one or more concepts that are related to the search query; and
providing the set, wherein the items of content in the set are sorted based on the sum of support scores.
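The steps recited above can be sketched in a few lines of Python. This is only an illustrative reading of the claim, not the patented implementation. The function names, the greedy longest-match tokenizer, the `max_len` limit, and the scoring weights (1.0 for a concept that appears, plus 0.5 for each related concept that also appears) are all assumptions introduced for the example. The set `known_concepts` stands in for concepts derived from past user queries, and a concept is treated as "related to the search query" when it occurs in the query text.

```python
import re
from collections import defaultdict


def tokenize_into_concepts(text, known_concepts, max_len=3):
    """Greedy longest-match segmentation of text into known concepts.

    known_concepts is assumed to have been derived from past queries.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in known_concepts:
                found.append(phrase)
                i += n
                break
        else:
            i += 1  # no concept starts at this word
    return found


def support_scores(concepts, related):
    """Hypothetical scoring: 1.0 for appearing, +0.5 per related concept present."""
    present = set(concepts)
    scores = {}
    for concept in present:
        hits = sum(1 for r in related.get(concept, ()) if r in present)
        scores[concept] = 1.0 + 0.5 * hits
    return scores


def build_index(docs, known_concepts, related):
    """Map each concept to {doc_id: support score}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        concepts = tokenize_into_concepts(text, known_concepts)
        for concept, score in support_scores(concepts, related).items():
            index[concept][doc_id] = score
    return index


def search(query, index, known_concepts):
    """Return (doc_id, summed support score) pairs, highest sum first."""
    query_concepts = set(tokenize_into_concepts(query, known_concepts))
    totals = defaultdict(float)
    for concept in query_concepts:
        for doc_id, score in index.get(concept, {}).items():
            totals[doc_id] += score
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


known_concepts = {"new york", "hotel", "broadway", "pizza"}
related = {"new york": {"broadway", "hotel"}, "hotel": {"new york"}}
docs = {
    "a": "Cheap hotel near Broadway in New York",
    "b": "New York pizza guide",
}
index = build_index(docs, known_concepts, related)
results = search("new york hotel", index, known_concepts)
print(results)  # [('a', 3.5), ('b', 1.0)]
```

In this toy data, item "a" ranks first because both query concepts appear in it and each is reinforced by related concepts in the same item, while item "b" mentions "new york" without any supporting context.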
Abstract
The present invention is directed towards systems and methods for indexing one or more items of content. The method of the present invention comprises extracting one or more items of text from a given item of content. The one or more items of extracted text are tokenized into one or more concepts. One or more related concepts associated with the one or more concepts are identified. A support score is generated for the one or more concepts, and the item of content is indexed with the one or more concepts and the one or more associated support scores.
43 Claims
21. A system comprising a processor coupled to a memory for indexing one or more items of content, the system comprising:
a text extractor operative to extract one or more items of text from an item of content;
a concept dictionary operative to maintain concepts;
a context dictionary operative to maintain related concepts associated with the concepts maintained in the concept dictionary; and
an aboutness extractor operative to:
tokenize the one or more extracted items of text into one or more concepts maintained in the concept dictionary based on past queries submitted by one or more users;
identify one or more related concepts associated with the one or more concepts in the item of content based on the context dictionary;
obtain a support score for the individual one or more concepts based on whether one or more of the one or more concepts appear in the item of content and/or whether one or more of the one or more related concepts appear in the item of content;
generate an index, the index comprising the item of content associated with the one or more concepts and corresponding support scores for the individual one or more concepts;
receive a search query;
identify, based on the index, a set of items of content responsive to the search query, wherein individual items of content in the set are indexed with one or more concepts that are related to the search query;
obtain, for each individual item of content in the set, a sum of support scores associated with the one or more concepts that are related to the search query; and
provide the set, wherein the items of content in the set are sorted based on the sum of support scores.
37. The system of claim 21, comprising a data store.
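As a rough illustration of how the supporting components named in claim 21 might be laid out in software, the sketch below models the text extractor, the concept dictionary, and the context dictionary as small Python classes. The class and method names, the assumption that content items are HTML, and the rule that a phrase becomes a concept once it has been seen in at least `min_count` past queries are all hypothetical choices made for the example. The claims do not specify how either dictionary is populated. An aboutness extractor would consume these three components and is not shown.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects non-empty text nodes from an HTML item of content.

    Simplification: script and style contents are not filtered out.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

    def extract(self, html):
        self.reset()      # clear parser state from any earlier call
        self.parts = []
        self.feed(html)
        self.close()      # flush any buffered text
        return self.parts


class ConceptDictionary:
    """Maintains concepts; here, phrases seen often enough in past queries."""

    def __init__(self, past_queries, min_count=2):
        counts = Counter(q.strip().lower() for q in past_queries)
        self.concepts = {q for q, n in counts.items() if n >= min_count}

    def __contains__(self, phrase):
        return phrase in self.concepts


class ContextDictionary:
    """Maintains the related concepts associated with each concept."""

    def __init__(self, related):
        self.related = {c: set(rs) for c, rs in related.items()}

    def related_to(self, concept):
        return self.related.get(concept, set())


extractor = TextExtractor()
items = extractor.extract("<h1>New York hotels</h1><p>Book a hotel near Broadway.</p>")

concepts = ConceptDictionary(["new york", "New York", "hotel", "hotel", "broadway"])
context = ContextDictionary({"new york": ["broadway", "hotel"]})

print(items)                           # ['New York hotels', 'Book a hotel near Broadway.']
print("new york" in concepts)          # True
print("broadway" in concepts)          # False (seen only once)
print(context.related_to("new york"))  # set containing 'broadway' and 'hotel'
```

Keeping the two dictionaries separate mirrors the claim language, where the concept dictionary answers "is this phrase a concept?" and the context dictionary answers "which other concepts lend it support?".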
Specification