System and method for matching search requests and relevant data

US 20090254543A1
Filed: 03/04/2009
Published: 10/08/2009
Est. Priority Date: 04/03/2008
Status: Active Grant

Abstract

A system and methods for matching search requests with relevant data (web pages, online documents, essays, online text in general, images, video, footage, etc.). The system comprises three components that can work separately or together and can be integrated with other search engine methods to further improve the relevancy of search results. The system can also find similarity between different documents and measure the distance (in similarity) between them. The three components are: context-based understanding, which puts documents in the context of aspects of human knowledge external to the documents; partial sentence analysis; and the assignment of 100 percentage points to keyword/tag sets.

105 Citations

28 Claims

1. A computerized method of arrangement and representation of terms in context, comprising:
- arranging terms in a Human Knowledge Structure (HKS), said structure being a directed acyclic graph, wherein the graph's nodes comprise the terms and the graph's arcs consist of at least two relations: “essence” and “contain” between adjacent terms, wherein term ‘A’ is the essence of term ‘B’, or term ‘A’ contains term ‘B’.
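
As a rough illustration of the structure recited in claim 1, the sketch below stores an HKS as a directed graph whose arcs carry one of the two relations and rejects any arc that would introduce a cycle. The class name `HKS`, the methods `add_relation` and `neighbors`, and the sample terms are hypothetical and do not come from the patent.

```python
from collections import defaultdict

RELATIONS = ("essence", "contain")


class HKS:
    """Hypothetical Human Knowledge Structure: a DAG of terms with labeled arcs."""

    def __init__(self):
        # arcs[a] maps each successor term b to the relation on the arc a -> b
        self.arcs = defaultdict(dict)

    def _reachable(self, start, target):
        """Return True if target can be reached from start along existing arcs."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.arcs.get(node, ()))
        return False

    def add_relation(self, a, relation, b):
        """Add arc a -> b meaning 'a is the essence of b' or 'a contains b'."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if self._reachable(b, a):
            raise ValueError(f"arc {a} -> {b} would create a cycle")
        self.arcs[a][b] = relation

    def neighbors(self, term):
        """Return (adjacent term, relation) pairs for arcs leaving term."""
        return sorted(self.arcs.get(term, {}).items())


hks = HKS()
hks.add_relation("vehicle", "essence", "car")  # assumed example terms
hks.add_relation("car", "contain", "engine")
```

Checking reachability before each insertion is one simple way to keep the graph acyclic; a production system would more likely validate in bulk.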

2. A computerized method of putting a document in context, comprising:
- arranging terms in a Human Knowledge Structure (HKS), said structure being a directed acyclic graph, wherein the graph's nodes comprise the terms and the graph's arcs consist of at least two relations: “essence” and “contain” between adjacent terms, wherein term ‘A’ is the essence of term ‘B’, or term ‘A’ contains term ‘B’;
- identifying in the document frequent terms and marking them with associated respective initial weights on the HKS;
- creating a final colored terms set for the document by:
- calculating scores for the marked terms and for terms related to them in the HKS, based on said relations, whereby the marked terms define a colored set of terms for the document;
- selecting from the colored set terms having weights greater than a predefined threshold, thereby defining a final colored set of terms for the document; and
- linking the final colored set to the document.
- Dependent claims (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 3. The method of claim 2, wherein the initial weight of a term is its frequency values in the document.
  - 4. The method of claim 2, wherein terms from the header, abstract or keywords set of the document are given higher initial weight values.
  - 5. The method of claim 2, wherein said document comprises a set of terms used as metadata tags to describe the highlights of an object and wherein the initial weights associated with said set of terms sum up to a predefined sum.
  - 6. The method of claim 5, wherein said object is selected from the group consisting of images, video clips, footages, songs, articles, papers and text documents.
  - 7. The method of claim 2, additionally comprising adding to the colored set of terms, terms from the document that do not exist in the HKS, with initial weights.
  - 8. The method of claim 2, additionally comprising matching between a set of terms and a set of documents by:
    - identifying terms within the set of terms;
    - retrieving the final colored sets of terms linked to each of said documents;
    - calculating a final matching grade for each of said documents, based on the weights of terms in the final colored set of the document that are included in the set of terms; and
    - ranking said documents according to said final matching grades.
  - 9. The method of claim 8 wherein the set of terms is a search string.
  - 10. The method of claim 8, wherein said documents comprise documents provided by at least one of external search methodologies and search engine indexing systems, as results for said set of terms.
  - 11. The method of claim 8, wherein the set of terms is a reference document and wherein the step of identifying terms comprises extracting terms from the final colored set of the reference document.
  - 12. The method of claim 2, additionally comprising matching between a reference document and a set of documents, comprising:
    - mapping the final colored set of said reference document and of each document in said set of documents to a vector in ‘n’ Euclidean space;
    - for each document in the set of documents, calculating the distance between the vector representing the document and the vector representing the reference document; and
    - ranking the documents from the set of documents by relevancy to the reference document, measured in distance units.
  - 13. The method of claim 12 wherein said step of mapping is performed using only the terms with the highest scores.
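
The steps of claims 2, 8 and 12 can be sketched roughly as follows. The claims do not state how scores spread along HKS relations, so the `SPREAD` fractions, the one-hop spreading in both directions, the whitespace tokenizer, the threshold value and all function names here are assumptions made only for illustration.

```python
import math
from collections import Counter

# Hypothetical HKS fragment: (term_a, relation, term_b) arcs.
ARCS = [
    ("vehicle", "essence", "car"),
    ("car", "contain", "engine"),
]
# Assumed share of a marked term's weight passed to an adjacent term.
SPREAD = {"essence": 0.5, "contain": 0.25}


def final_colored_set(text, threshold=1.0):
    """Mark document terms on the HKS, score neighbors, keep terms above threshold."""
    hks_terms = {t for a, _, b in ARCS for t in (a, b)}
    # Initial weight = term frequency in the document (as in claim 3).
    marked = Counter(w for w in text.lower().split() if w in hks_terms)
    scores = dict(marked)
    for a, relation, b in ARCS:
        # Spread weight across each arc in both directions (one hop only).
        for src, dst in ((a, b), (b, a)):
            if src in marked:
                scores[dst] = scores.get(dst, 0.0) + SPREAD[relation] * marked[src]
    return {t: s for t, s in scores.items() if s > threshold}


def matching_grade(query_terms, colored):
    """Sum the weights of colored terms that also appear in the query (claim 8)."""
    return sum(colored.get(t, 0.0) for t in set(query_terms))


def distance(colored_a, colored_b):
    """Euclidean distance between two colored sets viewed as vectors (claim 12)."""
    terms = set(colored_a) | set(colored_b)
    return math.sqrt(sum((colored_a.get(t, 0.0) - colored_b.get(t, 0.0)) ** 2 for t in terms))


docs = {
    "d1": "car car engine repair for every car owner",
    "d2": "engine engine engine tuning guide",
}
# The dict plays the role of "linking" each final colored set to its document.
colored = {name: final_colored_set(text) for name, text in docs.items()}
ranking = sorted(docs, key=lambda d: matching_grade(["car"], colored[d]), reverse=True)
```

With these sample documents, `d1` ranks first for the query term "car", and "vehicle" enters its colored set even though the word never appears in the text, which is the point of putting a document in context.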

14. A computerized method of matching between a search string and documents, comprising:
- providing a search string comprising terms that form a partial sentence;
- retrieving a set of documents comprising at least part of said partial sentence;
- counting the number of exact occurrences of the partial sentence in each of the retrieved documents;
- counting the number of occurrences of equivalent permutations of the partial sentence that do not conflict with the meaning of the partial sentence in each of the retrieved document;
- counting the number of occurrences of close permutations of the partial sentence that do not conflict with the meaning of the partial sentence in each of the retrieved document;
- for each retrieved document, associating scores to all said counted occurrences, based on the similarity between the occurrence and the partial sentence;
- summing up the scores to a final score for each document; and
- ranking the documents in a result list according to said final scores.
- Dependent claims (15, 16, 17):
  - 15. The method of claim 14, additionally comprising:
    - counting the number of occurrences of permutations of the partial sentence that conflict with the meaning of the partial sentence in each of the retrieved document; and
    - associating scores to all said counted occurrences of conflicting permutations in each of the retrieved document.
  - 16. The method of claim 14, wherein said retrieving comprises retrieving documents provided by at least one of external search methodologies and search engine indexing systems, as results for said search string.
  - 17. The method of claim 15, wherein if the number of non-conflicting occurrences of the partial sentence is zero and the number of conflicting occurrences is greater than zero, the document is filtered out.
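
A minimal sketch of the counting and scoring in claims 14, 15 and 17 follows. The claims do not say how a permutation is judged to conflict with the meaning of the partial sentence, so that decision is delegated to a caller-supplied `conflicts` predicate. The sketch also does not separate "equivalent" from "close" permutations: any reordering of the same words counts as one kind. The score values, including the negative score for conflicting occurrences, and all names are assumptions.

```python
def score_document(query, text, conflicts=lambda window: False,
                   exact_score=1.0, permuted_score=0.5, conflict_score=-1.0):
    """Score one document against a partial sentence; None means filtered out."""
    q = query.lower().split()
    words = text.lower().split()
    n = len(q)
    exact = permuted = conflicting = 0
    # Slide a window of the query's length over the document.
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if window == q:
            exact += 1
        elif sorted(window) == sorted(q):  # same words, different order
            if conflicts(window):
                conflicting += 1
            else:
                permuted += 1
    # Claim 17: only conflicting occurrences were found, so drop the document.
    if exact + permuted == 0 and conflicting > 0:
        return None
    return exact * exact_score + permuted * permuted_score + conflicting * conflict_score


docs = {
    "a": "man bites dog and later man bites dog again",
    "b": "dog bites man",
    "c": "bites dog man in the park",
}
# Hypothetical rule: swapping subject and object reverses the meaning.
reverses_meaning = lambda window: window == ["dog", "bites", "man"]
scores = {name: score_document("man bites dog", text, reverses_meaning)
          for name, text in docs.items()}
ranking = sorted((d for d in scores if scores[d] is not None),
                 key=scores.get, reverse=True)
```

Here document `b` is filtered out because its only occurrence conflicts with the query, while `a` outranks `c` on the strength of two exact occurrences.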

18. A computerized method of setting weights to terms in a set of terms used as metadata tags to describe the highlights of an object, comprising:
- merging strings and substrings among the terms to a single term in the form of {string ∥ substring};
- predefining the total sum of the term weights;
- assigning a weight to each term in the set, whereby the weights sum up to the predefined total sum;
- saving the assigned weights in a weighted terms table comprising pairs of <<term>, <weight>>; and
- linking the weighted terms table to the object.
- Dependent claims (19):
  - 19. The method of claim 18 wherein the predefined total sum depends on the number of keywords, where adding points to the sum is done according to the principle of diminishing marginal.
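
The steps of claim 18 might look like the sketch below, assuming a predefined total of 100 points (in line with the abstract) and caller-supplied relative weights. The function names, the sample tags and the object identifier `photo_001.jpg` are hypothetical.

```python
def merge_substrings(terms):
    """Merge each term with the terms that are substrings of it into '{string ∥ substring}'."""
    ordered = sorted(set(terms), key=len, reverse=True)
    merged, used = [], set()
    for term in ordered:
        if term in used:
            continue
        # Shorter, not yet merged terms that occur inside this term.
        subs = [t for t in ordered if t != term and t not in used and t in term]
        used.update(subs, [term])
        merged.append("{" + " ∥ ".join([term] + subs) + "}" if subs else term)
    return merged


def weighted_terms_table(terms, raw_weights=None, total=100.0):
    """Build <term, weight> pairs whose weights sum to the predefined total.

    raw_weights are relative weights, one per merged term, in the order
    returned by merge_substrings; equal weights are assumed if omitted.
    """
    merged = merge_substrings(terms)
    raw = raw_weights or [1.0] * len(merged)
    scale = total / sum(raw)
    return [(term, w * scale) for term, w in zip(merged, raw)]


table = weighted_terms_table(["new york", "york", "pizza"], raw_weights=[3, 1])
# Linking the table to the object it describes (hypothetical identifier).
tags_by_object = {"photo_001.jpg": table}
```

Scaling relative weights to a fixed total keeps tag sets of different sizes comparable, since every object distributes the same number of points.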

20. A computerized method of setting default weights to legacy tags or sets of keywords related to a document, whereby a weighted terms table is generated and linked to the document, comprising:
- removing duplicate tags;
- assigning weights to the remaining tags, based on parts of speech (POS), wherein each POS is assigned a predefined default weight;
- removing synonyms, whereby the remaining tag per synonym set is assigned one of the highest weight and the aggregate weights of the synonyms of the set;
- identifying terms consisting of two words or more and assigning them the accumulated weights of the words that comprise them;
- identifying string and substrings and merging them to tags of the form (strings ∥ substrings) and assigning them the highest weight of the strings that comprise them;
- building a table representing all the tags and their weights, said table comprising pairs of <<tag>, <weight>>; and
- normalizing the weights to sum-up to a predefined total.
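
The pipeline of claim 20 can be sketched as below. The part-of-speech lookup, the default weights per POS and the synonym sets are hypothetical stand-ins; a real system would presumably rely on a POS tagger and a thesaurus. For brevity the sketch computes single-word and multi-word weights in one pass, which differs slightly from the order in which the claim lists the steps.

```python
# Hypothetical lookup tables; a real system would use a POS tagger and a thesaurus.
POS_OF = {"red": "adj", "car": "noun", "automobile": "noun", "fast": "adj"}
POS_WEIGHT = {"noun": 3.0, "verb": 2.0, "adj": 1.0}
SYNONYM_SETS = [{"car", "automobile"}]


def word_weight(word):
    """Default weight of a single word from its part of speech (1.0 if unknown)."""
    return POS_WEIGHT.get(POS_OF.get(word), 1.0)


def default_weights(tags, total=100.0, aggregate_synonyms=False):
    """Return <tag, weight> pairs for legacy tags, normalized to the total."""
    # Remove duplicate tags, keeping first-seen order.
    tags = list(dict.fromkeys(t.lower() for t in tags))
    # POS-based weights; multi-word tags accumulate the weights of their words.
    weights = {t: sum(word_weight(w) for w in t.split()) for t in tags}
    # Keep one tag per synonym set, with the highest or the aggregate weight.
    for syn in SYNONYM_SETS:
        present = [t for t in tags if t in syn]
        if len(present) > 1:
            group = [weights.pop(t) for t in present]
            weights[present[0]] = sum(group) if aggregate_synonyms else max(group)
    # Merge a tag into a longer tag that contains it, keeping the highest weight.
    for short in sorted(weights, key=len):
        if short not in weights:  # already merged earlier in this loop
            continue
        longer = next((t for t in weights if t != short and short in t), None)
        if longer is not None:
            merged_weight = max(weights.pop(short), weights.pop(longer))
            weights["(" + longer + " ∥ " + short + ")"] = merged_weight
    # Build the table and normalize the weights to the predefined total.
    scale = total / sum(weights.values())
    return [(t, w * scale) for t, w in weights.items()]


table = default_weights(["Car", "car", "automobile", "red car", "fast"])
```

In this run "automobile" is dropped as a synonym of "car", "car" is merged into "red car", and the two remaining tags share the 100 points in a 4:1 ratio.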

21. A computerized system for arrangement and representation of terms in context, comprising:
- a server;
- at least one source of terms;
- communication means between said at least one source of terms and said server;
- a first storage device connected with said server; and
- means for storing said terms in said first storage device in a Human Knowledge Structure (HKS) being a directed acyclic graph, wherein the graph's nodes comprise the terms and the graph's arcs consist of at least two relations: “essence” and “contain” between adjacent terms, wherein term ‘A’ is the essence of term ‘B’, or term ‘A’ contains term ‘B’.

22. A computerized system for putting a document in context, comprising:
- a server;
- a first storage device connected with said server, said first storage device storing a Human Knowledge Structure (HKS), said structure being a directed acyclic graph, wherein the graph's nodes comprise the terms and the graph's arcs consist of at least two relations: “essence” and “contain” between adjacent terms, wherein term ‘A’ is the essence of term ‘B’, or term ‘A’ contains term ‘B’;
- at least one source of documents;
- communication means between said at least one source of documents and said server; and
- a second storage device connected with said server, wherein said server comprises computerized means for:
- receiving a document from said at least one source of documents;
- identifying in the document frequent terms and marking them with associated respective initial weights on the HKS;
- creating a final colored terms set for the document by:
- calculating scores for the marked terms and for terms related to them in the HKS, based on said relations, whereby the marked terms define a colored set of terms for the document;
- selecting from the colored set terms having weights greater than a predefined threshold, thereby defining a final colored set of terms for the document; and
- linking the final colored set to the document; and
- means for storing said linked final colored set in said second storage device.
- Dependent claims (23):
  - 23. The system of claim 22, additionally comprising:
    - at least one source of terms; and
    - communication means between said at least one source of terms and said server,

24. A computerized system for matching between a search string and documents, comprising:
- a server;
- at least one source of search strings;
- at least one source of documents;
- communication means between said at least one source of search strings and said server; and
- communication means between said at least one source of documents and said server, said server comprising computerized means for:
- receiving from said at least one source of search strings a search string comprising terms that form a partial sentence;
- retrieving from said at least one source of documents a set of documents comprising at least part of said partial sentence;
- counting the number of exact occurrences of the partial sentence in each of the retrieved documents;
- counting the number of occurrences of equivalent permutations of the partial sentence that do not conflict with the meaning of the partial sentence in each of the retrieved document;
- counting the number of occurrences of close permutations of the partial sentence that do not conflict with the meaning of the partial sentence in each of the retrieved document;
- for each retrieved document, associating scores to all said counted occurrences, based on the similarity between the occurrence and the partial sentence;
- summing up the scores to a final score for each document; and
- ranking the documents in a result list according to said final scores.
- Dependent claims (25, 26):
  - 25. The system of claim 24, wherein said server additionally comprises computerized means for:
    - counting the number of occurrences of permutations of the partial sentence that conflict with the meaning of the partial sentence in each of the retrieved document; and
    - associating scores to all said counted occurrences of conflicting permutations in each of the retrieved document.
  - 26. The system of claim 24, wherein said source of documents comprises documents provided by at least one of external search methodologies and search engine indexing systems, as results for said search string.

27. A computerized system for setting weights to terms in a set of terms used as metadata tags to describe the highlights of an object, comprising:
- a server;
- at least one source of terms; and
- communication means between said at least one source of terms and said server, wherein said server comprises computerized means for:
- receiving a set of terms from said at least one source of terms;
- merging strings and substrings among the terms to a single term in the form of {string ∥ substring};
- predefining the total sum of the term weights;
- assigning a weight to each term in the set, whereby the weights sum up to the predefined total sum;
- saving the assigned weights in a weighted terms table comprising pairs of <<term>, <weight>>; and
- linking the weighted terms table to the object.

28. A computerized system for setting default weights to legacy tags or sets of keywords related to a document, comprising:
- a server;
- at least one source of legacy tags or sets of keywords related to a document; and
- communication means between said at least one source and said server, wherein said server comprises computerized means for:
- receiving a set of legacy tags or keywords related to a document from said at least one source;
- removing duplicate tags;
- assigning weights to the remaining tags, based on parts of speech (POS), wherein each POS is assigned a predefined default weight;
- removing synonyms, whereby the remaining tag per synonym set is assigned one of the highest weight and the aggregate weights of the synonyms of the set;
- identifying terms consisting of two words or more and assigning them the accumulated weights of the words that comprise them;
- identifying string and substrings and merging them to tags of the form {strings ∥ substrings} and assigning them the highest weight of the strings that comprise them;
- building a table representing all the tags and their weights, said table comprising pairs of <<tag>, <weight>>; and
- normalizing the weights to sum-up to a predefined total.

Current Assignee: Ofer Ber, Ran Ber
Original Assignee: Ofer Ber, Ran Ber
Inventors: Ofer Ber, Ran Ber
Granted Patent: US 8,306,987 B2
US Class Current: 1/1
CPC Class Codes:

G06F 16/334   Query execution G06F16/335 ...

G06F 16/36   Creation of semantic tools,...

G06F 16/90335   Query processing

G06F 16/907   Retrieval characterised by ...
