SYSTEM AND METHOD FOR USING A COMBINATION OF SEMANTIC AND STATISTICAL PROCESSING OF INPUT STRINGS OR OTHER DATA CONTENT
Abstract
A system and method for using a combination of semantic and statistical processing of input strings or other data content, such as a web page or an electronic document. In accordance with an embodiment, the system enables the injection of semantics into an otherwise statistically-based environment, by recognizing that, within various topics, certain words, combinations of words, or phrases, herein referred to as keyphrases, have different weights. Some keyphrases may be relatively unique within a particular topic, or have a relatively high weighting towards that topic; whereas other keyphrases may not be unique, or may have a relatively low weighting towards that topic. In accordance with an embodiment, the system allows for characterization of both (a) “sufficient” and (b) “necessary” keyphrases. A keyphrase is considered sufficient for a particular topic when, if that keyphrase is found in the input string or data content, the input is likely to be in that topic (but could be in another topic). A keyphrase is considered necessary for a particular topic when, if that keyphrase is found in the input string or data content, the input is both very likely to be in that topic, and very unlikely to be in any other topic. This information can be used as part of the input processing.
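By way of a non-limiting illustration only, the distinction between sufficient and necessary keyphrases might be sketched in Python as follows. The sketch is not taken from the source: the function name `build_lookup`, the three threshold constants, the substring matching, and the use of per-topic document counts as a stand-in for "likely" and "very unlikely" are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds. The source only says "likely", "very likely",
# and "very unlikely"; these numbers are assumptions for illustration.
SUFFICIENT_MIN = 0.6        # share of matching documents in the topic
NECESSARY_MIN = 0.9         # "very likely to be in that topic"
NECESSARY_MAX_OTHER = 0.05  # "very unlikely to be in any other topic"


def build_lookup(docs, keyphrases):
    """Label candidate keyphrases per topic.

    docs: iterable of (topic, text) pairs.
    keyphrases: iterable of candidate phrases.
    Returns {keyphrase: {topic: "sufficient" | "necessary"}}.
    """
    # keyphrase -> topic -> number of documents containing the keyphrase
    counts = defaultdict(Counter)
    for topic, text in docs:
        lowered = text.lower()
        for kp in keyphrases:
            if kp.lower() in lowered:  # naive substring match (assumption)
                counts[kp][topic] += 1

    lookup = {}
    for kp, by_topic in counts.items():
        total = sum(by_topic.values())
        labels = {}
        for topic, n in by_topic.items():
            share = n / total
            # Largest share held by any single other topic.
            max_other = max(
                (m / total for t, m in by_topic.items() if t != topic),
                default=0.0,
            )
            if share >= NECESSARY_MIN and max_other <= NECESSARY_MAX_OTHER:
                labels[topic] = "necessary"
            elif share >= SUFFICIENT_MIN:
                labels[topic] = "sufficient"
        if labels:
            lookup[kp] = labels
    return lookup
```

Under these assumptions, a keyphrase seen only in documents of one topic is labeled necessary for that topic, while a keyphrase seen mostly, but not exclusively, in one topic is labeled sufficient. Raw document counts are sensitive to how many documents each topic contributes, so a fuller treatment would likely normalize for topic size.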
18 Claims
1. A system that uses a combination of semantic and statistical processing of input or data content, comprising:
a system that receives an input in the form of a user-entered search query, a set of text retrieved by an automated robot process, web page, electronic document, or some other form of input;
a semantically-enhanced statistical lookup data, which is created by analysis of a plurality of documents on various topics, to determine sufficient and necessary keyphrases, wherein a keyphrase is considered sufficient for a particular topic when, if that keyphrase is found in the input, the input is likely to be in that topic, and a keyphrase is considered necessary for a particular topic when, if that keyphrase is found in the input, the input is both very likely to be in that topic, and very unlikely to be in any other topic; and
a semantically-enhanced comparison logic which uses the information in the semantically-enhanced statistical lookup data to analyze the input, compare search words in the input with keyphrases, determine an appropriate topic, and generate an appropriate output.
Dependent claims: 2, 3, 4, 5, 6
7. A computer-based method of using a combination of semantic and statistical processing of input or data content, comprising the steps of:
receiving an input in the form of a user-entered search query, a set of text retrieved by an automated robot process, web page, electronic document, or some other form of input;
accessing a semantically-enhanced statistical lookup data, which is created by analysis of a plurality of documents on various topics, to determine sufficient and necessary keyphrases, wherein a keyphrase is considered sufficient for a particular topic when, if that keyphrase is found in the input, the input is likely to be in that topic, and a keyphrase is considered necessary for a particular topic when, if that keyphrase is found in the input, the input is both very likely to be in that topic, and very unlikely to be in any other topic; and
using the information in the semantically-enhanced statistical lookup data to analyze the input, compare search words in the input with keyphrases, determine an appropriate topic, and generate an appropriate output.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory computer readable medium, including instructions stored thereon which when read and executed by one or more computers cause the one or more computers to perform the steps comprising:
receiving an input in the form of a user-entered search query, a set of text retrieved by an automated robot process, web page, electronic document, or some other form of input;
accessing a semantically-enhanced statistical lookup data, which is created by analysis of a plurality of documents on various topics, to determine sufficient and necessary keyphrases, wherein a keyphrase is considered sufficient for a particular topic when, if that keyphrase is found in the input, the input is likely to be in that topic, and a keyphrase is considered necessary for a particular topic when, if that keyphrase is found in the input, the input is both very likely to be in that topic, and very unlikely to be in any other topic; and
using the information in the semantically-enhanced statistical lookup data to analyze the input, compare search words in the input with keyphrases, determine an appropriate topic, and generate an appropriate output.
Dependent claims: 14, 15, 16, 17, 18
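As a non-limiting illustration of the receiving, accessing, and using steps recited in the independent claims, the comparison of an input against keyphrases might be sketched in Python as follows. Nothing here is specified by the source: the function names `normalize` and `determine_topic`, the shape of the `lookup` dictionary, the whole-word matching, and the `WEIGHTS` values (which simply let a necessary keyphrase count for more than a sufficient one) are hypothetical choices made for the example.

```python
import re

# Hypothetical weights: a "necessary" match is treated as stronger
# evidence for a topic than a "sufficient" match.
WEIGHTS = {"necessary": 3.0, "sufficient": 1.0}


def normalize(text):
    """Lowercase the text and reduce it to space-separated word tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def determine_topic(text, lookup):
    """Return the best-scoring topic for the input text, or None.

    lookup: {keyphrase: {topic: "sufficient" | "necessary"}}
    (an assumed shape for the lookup data).
    """
    # Pad with spaces so that only whole-word phrase matches count.
    padded = f" {normalize(text)} "
    scores = {}
    for keyphrase, labels in lookup.items():
        if f" {normalize(keyphrase)} " in padded:
            for topic, label in labels.items():
                scores[topic] = scores.get(topic, 0.0) + WEIGHTS[label]
    if not scores:
        return None  # no keyphrase matched; no topic can be determined
    return max(scores, key=scores.get)


# Example lookup data, invented for this illustration.
example_lookup = {
    "jaguar": {"animals": "sufficient", "cars": "sufficient"},
    "horsepower": {"cars": "necessary"},
    "rainforest": {"animals": "necessary"},
}
```

With this example data, an ambiguous keyphrase such as "jaguar" contributes equally to both topics, and a co-occurring necessary keyphrase tips the result. Generating the "appropriate output" recited in the claims, for example routing a search query to topic-specific results, is left outside the sketch.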
Specification