System and method for matching search requests and relevant data
Abstract
A system and methods for matching search requests with relevant data (web pages, online documents, essays, online text in general, images, video, footage, etc.). The system comprises three components that can work separately or together and can be integrated with other search engine methods to further improve the relevancy of search results. The system can find similarity between different documents and measure the distance (in similarity) between documents. The three components are: context-based understanding, which puts the documents in the context of aspects of human knowledge external to the documents; partial sentence analysis; and assignment of 100 percentage points to keyword/tag sets.
105 Citations
28 Claims
1. A computerized method of arrangement and representation of terms in context, comprising:
- arranging terms in a Human Knowledge Structure (HKS), said structure being a directed acyclic graph, wherein the graph's nodes comprise the terms and the graph's arcs consist of at least two relations: “essence” and “contain” between adjacent terms, wherein term ‘A’ is the essence of term ‘B’, or term ‘A’ contains term ‘B’.

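The structure recited in claim 1 can be illustrated with a small in-memory graph. The following is a minimal sketch, not the patented implementation: the class name `HKS`, its methods, and the example terms are hypothetical, and the cycle check is one simple way to keep the graph acyclic.

```python
from collections import defaultdict

# The two relations named in the claim; the claim allows more ("at least two").
RELATIONS = {"essence", "contain"}


class HKS:
    """Hypothetical in-memory Human Knowledge Structure: a DAG of terms."""

    def __init__(self):
        # arcs[a][b] is the set of relations labeling the arc a -> b.
        self.arcs = defaultdict(dict)

    def _reachable(self, start, target):
        """Return True if target can be reached from start along arcs."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.arcs.get(node, {}))
        return False

    def add_arc(self, a, relation, b):
        """Record that term a is the essence of, or contains, term b."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        # An arc a -> b closes a cycle exactly when a is reachable from b.
        if self._reachable(b, a):
            raise ValueError(f"arc {a} -> {b} would create a cycle")
        self.arcs[a].setdefault(b, set()).add(relation)

    def related(self, a):
        """Terms directly adjacent to a along outgoing arcs, with relations."""
        return dict(self.arcs.get(a, {}))


# Illustrative terms only; they do not come from the patent text.
hks = HKS()
hks.add_arc("vehicle", "essence", "car")   # 'vehicle' is the essence of 'car'
hks.add_arc("car", "contain", "engine")    # 'car' contains 'engine'
```

Rejecting an arc whenever its source is already reachable from its target is what keeps the structure a directed acyclic graph as terms are added.
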
2. A computerized method of putting a document in context, comprising:
- arranging terms in a Human Knowledge Structure (HKS), said structure being a directed acyclic graph, wherein the graph's nodes comprise the terms and the graph's arcs consist of at least two relations: “essence” and “contain” between adjacent terms, wherein term ‘A’ is the essence of term ‘B’, or term ‘A’ contains term ‘B’;
- identifying in the document frequent terms and marking them with associated respective initial weights on the HKS;
- creating a final colored terms set for the document by:
  - calculating scores for the marked terms and for terms related to them in the HKS, based on said relations, whereby the marked terms define a colored set of terms for the document;
  - selecting from the colored set terms having weights greater than a predefined threshold, thereby defining a final colored set of terms for the document; and
- linking the final colored set to the document.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13

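The coloring steps of claim 2 can be sketched as follows. The claim does not fix how initial weights or related-term scores are computed, so this example makes loud assumptions: the initial weight is the raw term count, each marked term passes a fixed fraction (`decay`) of its weight to its neighbors along HKS arcs in either direction, and the function name and parameters are hypothetical.

```python
from collections import Counter


def color_document(text, arcs, min_count=2, decay=0.5, threshold=1.0):
    """Return a hypothetical 'final colored set' {term: score} for a document.

    arcs maps (term_a, term_b) pairs to a relation name ("essence"/"contain").
    """
    counts = Counter(text.lower().split())
    hks_terms = {term for pair in arcs for term in pair}

    # Step 1: mark frequent terms on the HKS (assumed initial weight: count).
    marked = {t: float(c) for t, c in counts.items()
              if c >= min_count and t in hks_terms}

    # Step 2: score marked terms and the terms related to them in the HKS.
    scores = dict(marked)
    for (a, b), _relation in arcs.items():
        for src, dst in ((a, b), (b, a)):
            if src in marked:
                scores[dst] = scores.get(dst, 0.0) + decay * marked[src]

    # Step 3: keep only terms whose score exceeds the predefined threshold.
    return {t: s for t, s in scores.items() if s > threshold}


# Illustrative HKS fragment and document; not taken from the patent.
arcs = {("vehicle", "car"): "essence", ("car", "engine"): "contain"}
colored = color_document("car car car engine engine road", arcs)

# Linking the final colored set to the document, here by a hypothetical id.
colored_sets = {"doc-1": colored}
```

Note that "vehicle" never appears in the sample text but still enters the colored set through its relation to "car", which is the sense in which the document is placed in a context external to it.
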
14. A computerized method of matching between a search string and documents, comprising:
- providing a search string comprising terms that form a partial sentence;
- retrieving a set of documents comprising at least part of said partial sentence;
- counting the number of exact occurrences of the partial sentence in each of the retrieved documents;
- counting the number of occurrences of equivalent permutations of the partial sentence that do not conflict with the meaning of the partial sentence in each of the retrieved documents;
- counting the number of occurrences of close permutations of the partial sentence that do not conflict with the meaning of the partial sentence in each of the retrieved documents;
- for each retrieved document, associating scores to all said counted occurrences, based on the similarity between the occurrence and the partial sentence;
- summing up the scores to a final score for each document; and
- ranking the documents in a result list according to said final scores.
Dependent claims: 15, 16, 17

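The counting and ranking steps of claim 14 can be approximated in a few lines. This is only a sketch under simplifying assumptions that are not in the claim: an "equivalent permutation" is treated as the same words in a different order, a "close permutation" as the query words with one foreign word in between, and the per-occurrence scores (1.0, 0.6, 0.3) are made up. It performs no check that a permutation preserves the meaning of the partial sentence, which the claim requires.

```python
def score_document(query, document, weights=(1.0, 0.6, 0.3)):
    """Sum hypothetical scores for exact, equivalent and close occurrences."""
    exact_w, equiv_w, close_w = weights
    q = query.lower().split()
    d = document.lower().split()
    n = len(q)
    if n == 0:
        return 0.0
    target = sorted(q)
    score = 0.0

    # Exact occurrences and reorderings within a window of len(q) words.
    for i in range(len(d) - n + 1):
        window = d[i:i + n]
        if window == q:
            score += exact_w
        elif sorted(window) == target:
            score += equiv_w

    # Close occurrences: one non-query word sitting inside the phrase.
    for i in range(len(d) - n):
        window = d[i:i + n + 1]
        for j in range(1, n):  # interior positions only
            rest = window[:j] + window[j + 1:]
            if window[j] not in q and sorted(rest) == target:
                score += close_w
                break
    return score


def rank_documents(query, documents):
    """Retrieve documents sharing a query word, then rank by final score."""
    terms = set(query.lower().split())
    candidates = [doc for doc in documents if terms & set(doc.lower().split())]
    scored = [(score_document(query, doc), doc) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative documents; not taken from the patent.
docs = [
    "red fast sports car",
    "blue bicycle",
    "a red sports car and another red sports car",
    "sports car red",
]
ranking = rank_documents("red sports car", docs)
```
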
18. A computerized method of setting weights to terms in a set of terms used as metadata tags to describe the highlights of an object, comprising:
- merging strings and substrings among the terms to a single term in the form of {string ∥ substring};
- predefining the total sum of the term weights;
- assigning a weight to each term in the set, whereby the weights sum up to the predefined total sum;
- saving the assigned weights in a weighted terms table comprising pairs of <<term>, <weight>>; and
- linking the weighted terms table to the object.
Dependent claims: 19

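A minimal sketch of the weighting steps in claim 18 follows. The claim does not say how individual weights are chosen, so this example assumes an optional caller-supplied importance per term (equal importance otherwise) scaled to a predefined total of 100, echoing the "100 percentage points" of the abstract. The function names are hypothetical.

```python
def merge_substrings(terms):
    """Merge each term with the terms that are substrings of it."""
    ordered = sorted(set(terms), key=lambda t: (-len(t), t))  # longest first
    merged, used = [], set()
    for term in ordered:
        if term in used:
            continue
        subs = [s for s in ordered
                if s != term and s not in used and s in term]
        used.update(subs)
        used.add(term)
        if subs:
            merged.append("{" + " ∥ ".join([term] + subs) + "}")
        else:
            merged.append(term)
    return merged


def weighted_terms_table(terms, importance=None, total=100.0):
    """Return (term, weight) pairs whose weights sum to the predefined total."""
    merged = merge_substrings(terms)
    if not merged:
        return []
    importance = importance or {}
    raw = [importance.get(term, 1.0) for term in merged]
    scale = total / sum(raw)
    return [(term, weight * scale) for term, weight in zip(merged, raw)]


# Illustrative tags; linking the table to an object by a hypothetical id.
table = weighted_terms_table(["car", "sports car", "red"])
tables_by_object = {"image-42": table}
```
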
20. A computerized method of setting default weights to legacy tags or sets of keywords related to a document, whereby a weighted terms table is generated and linked to the document, comprising:
- removing duplicate tags;
- assigning weights to the remaining tags, based on parts of speech (POS), wherein each POS is assigned a predefined default weight;
- removing synonyms, whereby the remaining tag per synonym set is assigned one of the highest weight and the aggregate weights of the synonyms of the set;
- identifying terms consisting of two words or more and assigning them the accumulated weights of the words that comprise them;
- identifying strings and substrings and merging them to tags of the form (strings ∥ substrings) and assigning them the highest weight of the strings that comprise them;
- building a table representing all the tags and their weights, said table comprising pairs of <<tag>, <weight>>; and
- normalizing the weights to sum up to a predefined total.

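The steps of claim 20 can be chained into one small pipeline. The sketch below is an illustration under stated assumptions: part-of-speech tags and synonym sets are supplied by the caller as plain lookup tables rather than produced by a real tagger or thesaurus, the default weights in `POS_DEFAULT` are invented, and merged tags are written with braces as in claim 18.

```python
# Hypothetical default weight per part of speech.
POS_DEFAULT = {"noun": 3.0, "verb": 2.0, "adjective": 1.0}


def default_tag_weights(tags, pos_of, synonym_sets, total=100.0,
                        aggregate=False):
    """Return (tag, weight) pairs normalized to sum to total."""
    # Remove duplicate tags, preserving first-seen order.
    tags = list(dict.fromkeys(tags))

    # Weight each word by its POS; multi-word tags accumulate word weights.
    def word_weight(word):
        return POS_DEFAULT.get(pos_of.get(word), 1.0)

    weights = {t: sum(word_weight(w) for w in t.split()) for t in tags}

    # Keep one tag per synonym set, with the highest or the aggregate weight.
    for synonyms in synonym_sets:
        present = [t for t in weights if t in synonyms]
        if len(present) > 1:
            values = [weights[t] for t in present]
            keep = max(present, key=weights.get)
            for t in present:
                if t != keep:
                    del weights[t]
            weights[keep] = sum(values) if aggregate else max(values)

    # Merge substrings into their containing string; keep the highest weight.
    groups = []  # each entry: [head, substrings, weight]
    for t in sorted(weights, key=lambda t: (-len(t), t)):
        for group in groups:
            if t in group[0]:
                group[1].append(t)
                group[2] = max(group[2], weights[t])
                break
        else:
            groups.append([t, [], weights[t]])
    if not groups:
        return []

    # Build the table and normalize the weights to the predefined total.
    scale = total / sum(weight for _, _, weight in groups)
    table = []
    for head, subs, weight in groups:
        label = "{" + " ∥ ".join([head] + subs) + "}" if subs else head
        table.append((label, weight * scale))
    return table


# Illustrative legacy tags and lookup tables; not taken from the patent.
legacy = ["car", "automobile", "car", "sports car", "drive"]
pos_of = {"car": "noun", "automobile": "noun", "sports": "noun",
          "drive": "verb"}
table = default_tag_weights(legacy, pos_of, [{"car", "automobile"}])
```
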
21. A computerized system for arrangement and representation of terms in context, comprising:
- a server;
- at least one source of terms;
- communication means between said at least one source of terms and said server;
- a first storage device connected with said server; and
- means for storing said terms in said first storage device in a Human Knowledge Structure (HKS) being a directed acyclic graph, wherein the graph's nodes comprise the terms and the graph's arcs consist of at least two relations: “essence” and “contain” between adjacent terms, wherein term ‘A’ is the essence of term ‘B’, or term ‘A’ contains term ‘B’.

22. A computerized system for putting a document in context, comprising:
- a server;
- a first storage device connected with said server, said first storage device storing a Human Knowledge Structure (HKS), said structure being a directed acyclic graph, wherein the graph's nodes comprise the terms and the graph's arcs consist of at least two relations: “essence” and “contain” between adjacent terms, wherein term ‘A’ is the essence of term ‘B’, or term ‘A’ contains term ‘B’;
- at least one source of documents;
- communication means between said at least one source of documents and said server; and
- a second storage device connected with said server, wherein said server comprises computerized means for:
  - receiving a document from said at least one source of documents;
  - identifying in the document frequent terms and marking them with associated respective initial weights on the HKS;
  - creating a final colored terms set for the document by:
    - calculating scores for the marked terms and for terms related to them in the HKS, based on said relations, whereby the marked terms define a colored set of terms for the document;
    - selecting from the colored set terms having weights greater than a predefined threshold, thereby defining a final colored set of terms for the document; and
  - linking the final colored set to the document; and
- means for storing said linked final colored set in said second storage device.
Dependent claim 23: wherein the server comprises additional computerized means for:
- receiving a set of terms from said source of terms; and
- matching between said set of terms and a set of documents received from said at least one source of documents, said additional means comprising means for:
  - identifying terms within the set of terms;
  - calculating a final matching grade for each of said documents, based on the weights of terms in the final colored set of the document that are included in the set of terms; and
  - ranking said documents according to said final matching grades.

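The matching step recited for the server in claim 23 can be sketched as a sum over each document's final colored set. The claim only says the grade is "based on" the weights of colored terms included in the set of terms; taking the plain sum of those weights is an assumption of this example, as are the function name and the sample data.

```python
def rank_by_colored_sets(query_terms, colored_sets):
    """Rank documents by the summed weights of colored terms in the query.

    colored_sets maps a document id to its final colored set {term: weight}.
    """
    query = {term.lower() for term in query_terms}
    grades = {
        doc_id: sum(w for term, w in colored.items() if term in query)
        for doc_id, colored in colored_sets.items()
    }
    return sorted(grades.items(), key=lambda item: item[1], reverse=True)


# Illustrative colored sets; not taken from the patent.
colored_sets = {
    "doc-2": {"car": 1.0, "road": 2.0},
    "doc-1": {"car": 4.0, "engine": 3.5},
}
ranking = rank_by_colored_sets(["Car", "engine"], colored_sets)
```
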
24. A computerized system for matching between a search string and documents, comprising:
- a server;
- at least one source of search strings;
- at least one source of documents;
- communication means between said at least one source of search strings and said server; and
- communication means between said at least one source of documents and said server, said server comprising computerized means for:
  - receiving from said at least one source of search strings a search string comprising terms that form a partial sentence;
  - retrieving from said at least one source of documents a set of documents comprising at least part of said partial sentence;
  - counting the number of exact occurrences of the partial sentence in each of the retrieved documents;
  - counting the number of occurrences of equivalent permutations of the partial sentence that do not conflict with the meaning of the partial sentence in each of the retrieved documents;
  - counting the number of occurrences of close permutations of the partial sentence that do not conflict with the meaning of the partial sentence in each of the retrieved documents;
  - for each retrieved document, associating scores to all said counted occurrences, based on the similarity between the occurrence and the partial sentence;
  - summing up the scores to a final score for each document; and
  - ranking the documents in a result list according to said final scores.
Dependent claims: 25, 26

27. A computerized system for setting weights to terms in a set of terms used as metadata tags to describe the highlights of an object, comprising:
- a server;
- at least one source of terms; and
- communication means between said at least one source of terms and said server, wherein said server comprises computerized means for:
  - receiving a set of terms from said at least one source of terms;
  - merging strings and substrings among the terms to a single term in the form of {string ∥ substring};
  - predefining the total sum of the term weights;
  - assigning a weight to each term in the set, whereby the weights sum up to the predefined total sum;
  - saving the assigned weights in a weighted terms table comprising pairs of <<term>, <weight>>; and
  - linking the weighted terms table to the object.

28. A computerized system for setting default weights to legacy tags or sets of keywords related to a document, comprising:
- a server;
- at least one source of legacy tags or sets of keywords related to a document; and
- communication means between said at least one source and said server, wherein said server comprises computerized means for:
  - receiving a set of legacy tags or keywords related to a document from said at least one source;
  - removing duplicate tags;
  - assigning weights to the remaining tags, based on parts of speech (POS), wherein each POS is assigned a predefined default weight;
  - removing synonyms, whereby the remaining tag per synonym set is assigned one of the highest weight and the aggregate weights of the synonyms of the set;
  - identifying terms consisting of two words or more and assigning them the accumulated weights of the words that comprise them;
  - identifying strings and substrings and merging them to tags of the form {strings ∥ substrings} and assigning them the highest weight of the strings that comprise them;
  - building a table representing all the tags and their weights, said table comprising pairs of <<tag>, <weight>>; and
  - normalizing the weights to sum up to a predefined total.

Specification