Method for searching a file having a format unsupported by a search engine
Abstract
Searching a file in a format unsupported by a search engine by creating term-topic links with associated probabilities. A file is retrieved comprising a compressed HTML file or a webpage. The file is parsed to retrieve data associated with title tags and body tags. In addition, user queries are received so that the user may associate a query with the title data. Term-topic links are created by linking terms from the retrieved data and the query with a topic. Heuristics are then used to determine the probability associated with each term-topic link. Term-topic links whose term is a noun are assigned a higher probability than those whose term is a verb, verbs are assigned a higher probability than adjectives, and adjectives and adverbs are assigned the same probability. The term-topic links are trained by adjusting the assigned probabilities based on a user-defined query and an associated target topic.
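The parsing and linking steps summarized in the abstract can be illustrated with a minimal Python sketch. It assumes the file is already available as an HTML string, treats the title text as the topic, and represents each term-topic link as a (term, topic) pair. The names TitleBodyParser, tokenize, and build_term_topic_links are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser
import re


class TitleBodyParser(HTMLParser):
    """Collects the text found inside <title> and <body> elements."""

    def __init__(self):
        super().__init__()
        self._current = None  # "title" or "body" while inside one of them
        self.text = {"title": [], "body": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.text:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.text[self._current].append(data)


def tokenize(text):
    """Lower-case the text and split it into alphabetic terms (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_topic_links(html):
    """Return (topic, links), where links is a set of (term, topic) pairs."""
    parser = TitleBodyParser()
    parser.feed(html)
    parser.close()
    # The title data serves as the topic; whitespace is normalized.
    topic = " ".join(" ".join(parser.text["title"]).split())
    terms = tokenize(topic) + tokenize(" ".join(parser.text["body"]))
    return topic, {(term, topic) for term in terms}


page = (
    "<html><head><title>Print a document</title></head>"
    "<body><p>Choose a printer.</p></body></html>"
)
topic, links = build_term_topic_links(page)
```

A set is used here so that a term appearing several times in the same file yields a single link to the topic; the patent text does not say how repeated terms are handled.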
34 Claims
-
1. A method of searching a file by using terms in the file to find related topics in the file, comprising the steps of:
-
(a) receiving the file to be searched;
(b) parsing the file into selected ones of the terms related to a certain one of the topics;
(c) creating term-topic links for the selected terms, wherein each term-topic link associates one of the selected terms to the certain topic; and
(d) if a selected term matches a term in an existing term-topic link, creating a term-topic list by associating the matching term and related terms with a term-topic link, wherein each term-topic list is related to a term-topic link.
identifying synonyms for each selected term; and
creating term-topic links for each of the identified synonyms by associating each identified synonym with the certain topic.
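The synonym steps above can be sketched as follows. This is an illustration only: the THESAURUS dictionary and the function add_synonym_links are hypothetical, and the claims do not say where synonyms come from.

```python
# Hypothetical thesaurus; a real system would load one from a lexical resource.
THESAURUS = {
    "print": ["output"],
    "document": ["file", "paper"],
}


def add_synonym_links(links, thesaurus):
    """Return the links plus one (synonym, topic) link per known synonym."""
    expanded = set(links)
    for term, topic in links:
        for synonym in thesaurus.get(term, []):
            # Each synonym is associated with the same topic as its source term.
            expanded.add((synonym, topic))
    return expanded


base_links = {("print", "Print a document"), ("document", "Print a document")}
expanded_links = add_synonym_links(base_links, THESAURUS)
```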
-
-
4. The method of claim 1, wherein the file is a compressed HTML file and prior to performing the step (b), further comprising the step of decompressing the compressed HTML file.
-
5. The method of claim 1, wherein the file is a path with an associated address of a webpage, and the step of parsing the file comprises the steps of:
-
retrieving the webpage; and
parsing the webpage into selected ones of the terms related to the certain topic.
-
-
6. The method of claim 1, further comprising the step of assigning a probability to each term-topic link.
-
7. The method of claim 6, wherein the step of assigning the probability is performed by speech heuristics on the term in each term-topic link.
-
8. The method of claim 7, wherein the step of assigning the probability by speech heuristics comprises the steps of:
-
assigning nouns a higher probability than verbs;
assigning verbs a higher probability than adjectives; and
assigning adjectives and adverbs the same probability.
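The ordering recited above (nouns above verbs, verbs above adjectives, adjectives equal to adverbs) can be sketched in Python. The numeric values, the default for unknown terms, and the small part-of-speech lexicon are assumptions made for illustration; the claims fix only the relative ordering.

```python
# Assumed values; only the ordering noun > verb > adjective == adverb is claimed.
POS_PROBABILITY = {"noun": 0.6, "verb": 0.4, "adjective": 0.2, "adverb": 0.2}

# Hypothetical lexicon; a real system would use a part-of-speech tagger.
POS_LEXICON = {
    "printer": "noun",
    "print": "verb",
    "quick": "adjective",
    "quickly": "adverb",
}

DEFAULT_PROBABILITY = 0.1  # assumed fallback for terms missing from the lexicon


def assign_probabilities(links, lexicon=POS_LEXICON):
    """Map each (term, topic) link to a probability based on part of speech."""
    probabilities = {}
    for term, topic in links:
        part_of_speech = lexicon.get(term)
        probabilities[(term, topic)] = POS_PROBABILITY.get(
            part_of_speech, DEFAULT_PROBABILITY
        )
    return probabilities


topic = "Print a document"
scored = assign_probabilities(
    {("printer", topic), ("print", topic), ("quick", topic), ("quickly", topic)}
)
```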
-
-
9. The method of claim 1, prior to performing the step of creating term-topic links, further comprising the steps of:
-
identifying stop-terms, wherein stop-terms comprise a predefined set of common words; and
removing the identified stop-terms from the selected terms.
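Stop-term removal as recited above amounts to filtering the selected terms against a predefined set. A minimal sketch, in which the contents of STOP_TERMS are an assumed example and not a list taken from the patent:

```python
# Assumed example set of common words; the patent does not enumerate them here.
STOP_TERMS = frozenset({"a", "an", "the", "of", "to", "and", "in"})


def remove_stop_terms(terms, stop_terms=STOP_TERMS):
    """Return the terms that are not stop-terms, preserving their order."""
    return [term for term in terms if term.lower() not in stop_terms]


kept = remove_stop_terms(["print", "a", "document", "to", "the", "printer"])
```

Filtering before the links are created, as the claim requires, keeps common words from being linked to every topic.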
-
-
10. The method of claim 1, wherein the file is a query.
-
11. A method of searching a file, comprising the steps of:
-
receiving the file to be searched;
identifying title tags and body tags contained in the file;
identifying terms contained in data associated with the title tags and in data associated with the body tags, wherein the data associated with the title tag is called a topic of the file;
creating term-topic links by associating each identified term with the topic, wherein each identified term is related to the topics and if a term matches a term in an existing term-topic link, creating a term-topic list by associating the matching term and related terms with a term-topic link, wherein each term-topic list is related to a term-topic link.
retrieving the webpage; and
identifying title tags and body tags contained in the webpage.
-
-
14. The method of claim 11, further comprising the steps of:
-
identifying synonyms for each identified term; and
creating term-topic links for each of the identified synonyms by associating each identified synonym with the certain topic.
-
-
15. The method of claim 11, further comprising the step of assigning a probability to each term-topic link.
-
16. The method of claim 15, wherein the step of assigning the probability is performed by speech heuristics on the term in each term-topic link.
-
17. The method of claim 16, wherein the step of assigning the probability by speech heuristics comprises the steps of:
-
assigning nouns a higher probability than verbs;
assigning verbs a higher probability than adjectives; and
assigning adjectives and adverbs the same probability.
-
-
18. The method of claim 11, prior to performing the step of creating term-topic links, further comprising the steps of:
-
identifying stop-terms, wherein stop-terms comprise a predefined set of common words; and
removing the identified stop-terms from the identified terms.
-
-
19. The method of claim 11, further comprising the steps of:
-
receiving a query;
identifying terms contained in the query; and
creating term-topic links by associating each identified term of the query with the certain topic, wherein the identified terms of the query are related to the topics.
-
-
20. A computer readable medium having computer-executable instructions for performing the steps recited in claim 11.
-
21. A method of searching a help file, comprising the steps of:
-
receiving the help file containing help information to be searched;
identifying title tags and body tags contained in the help file;
identifying a certain one of a plurality of topics of the help file comprising data associated with the title tag;
receiving a first query;
identifying terms of the help file contained in the certain topic, in data associated with the body tag and in the query;
creating term-topic links by associating each identified term with the certain topic, wherein the identified terms are related to the certain topic;
associating a probability to each term-topic link;
receiving a second query and a target topic; and
training the term-topic links by adjusting the probability of term-topic links to reflect an association between the second query and the target topic.
retrieving the webpage using a browser; and
identifying title tags and body tags contained in the webpage.
-
-
24. The method of claim 21, prior to performing the step of training the term-topic links, further comprising the steps of:
-
identifying synonyms for each identified term; and
creating term-topic links for each of the identified synonyms by associating each identified synonym with the certain topic.
-
-
25. The method of claim 21, further comprising the step of assigning a probability to each term-topic link.
-
26. The method of claim 25, wherein the step of assigning the probability is performed by speech heuristics on the term in each term-topic link.
-
27. The method of claim 26, wherein the step of assigning the probability by speech heuristics comprises the steps of:
-
assigning nouns a higher probability than verbs;
assigning verbs a higher probability than adjectives; and
assigning adjectives and adverbs the same probability.
-
-
28. The method of claim 21, prior to performing the step of creating term-topic links, further comprising the steps of:
-
identifying stop-terms, wherein stop-terms comprise a predefined set of common words; and
removing the identified stop-terms from the selected terms.
-
-
29. A method of training term-topic links by refining the probabilities of term-topic links to reflect an association between a given query and a target topic, comprising the steps of:
-
receiving the query and the target topic, wherein the target topic is related to the query;
identifying a first selected ones of terms contained in the query;
retrieving topics linked to each one of the first selected terms in the term-topic links, wherein the target topic will match one of the retrieved topics;
for each retrieved topic, assigning a combined probability to each retrieved topic by combining the probabilities associated with each one of a second selected ones of terms contained in the retrieved topic;
identifying a predefined number of retrieved topics based on the sorted combined probabilities; and
in the event the target topic is not above a predefined threshold of retrieved topics, decreasing the combined probability of one of the retrieved topics having a higher combined probability than the target topic and increasing the combined probability of the target topic.
decreasing the probability associated with a term contained in one of the retrieved topics having a higher combined probability than the target topic;
retrieving the terms associated with the target topic by using the term-topic links; and
increasing the associated probabilities of each retrieved term.
-
-
31. The method of claim 29, wherein the step of decreasing the probability associated with a term contained in a randomly selected retrieved topic is performed the number of times equal to the number of terms identified in the target topic.
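The training procedure of claims 29 to 31 can be sketched under several stated assumptions. The combined probability of a topic is taken to be the sum of the link probabilities of the query terms linked to it, the "predefined threshold" is read as membership in the top_n ranked topics, and the rival topic to demote is chosen deterministically (the best-ranked one) instead of randomly. The function names, the step size, and the round limit are all hypothetical.

```python
def rank_topics(links, query_terms):
    """Return topics sorted by combined probability, plus the scores.

    Assumed combiner: the sum of the probabilities of the query terms
    linked to each topic.
    """
    scores = {}
    for (term, topic), probability in links.items():
        if term in query_terms:
            scores[topic] = scores.get(topic, 0.0) + probability
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


def train(links, query_terms, target, top_n=1, step=0.05, max_rounds=100):
    """Adjust probabilities until the target topic is among the top_n topics."""
    links = dict(links)  # work on a copy so the caller's links are unchanged
    for _ in range(max_rounds):
        ranked, _scores = rank_topics(links, query_terms)
        if not ranked or target in ranked[:top_n]:
            break
        rival = ranked[0]  # deterministic choice for this sketch
        for (term, topic), probability in list(links.items()):
            if topic == target:
                # Raise every term linked to the target topic.
                links[(term, topic)] = min(1.0, probability + step)
            elif topic == rival and term in query_terms:
                # Lower the rival's links for the terms in the query.
                links[(term, topic)] = max(0.0, probability - step)
    return links


original = {
    ("print", "Set up a printer"): 0.6,
    ("print", "Print a document"): 0.4,
    ("page", "Print a document"): 0.1,
}
query = {"print", "page"}
trained = train(original, query, target="Print a document")
ranking, _ = rank_topics(trained, query)
```

In this example the target topic starts with a lower combined probability than its rival and ranks first after training. Clamping to the range 0 to 1 and the fixed step are design choices of the sketch, not requirements stated in the claims.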
-
32. A computer readable medium having computer-executable instructions for performing the steps recited in claim 29.
-
33. A method of searching a file by using terms in the file to find related topics in the file, comprising the steps of:
-
(a) receiving the file to be searched;
(b) parsing the file into selected ones of the terms related to a certain one of the topics;
(c) creating term-topic links for the selected terms, wherein each term-topic link associates one of the selected terms to the certain topic; and
(d) assigning a probability to each of the term-topic links by performing speech heuristics on the term in each term-topic link, wherein the step of assigning the probability by speech heuristics comprises the steps of:
assigning nouns a higher probability than verbs, assigning verbs a higher probability than adjectives, and assigning adjectives and adverbs the same probability.
-
-
34. A method of searching a file, comprising the steps of:
-
receiving the file to be searched;
identifying title tags and body tags contained in the file;
identifying terms contained in data associated with the title tags and in data associated with the body tags, wherein the data associated with the title tag is called a topic of the file;
creating term-topic links by associating each identified term with the topic, wherein each identified term is related to the topics;
and assigning a probability to each of the term-topic links by performing speech heuristics on the term in each term-topic link, wherein the step of assigning the probability by speech heuristics comprises the steps of:
assigning nouns a higher probability than verbs, assigning verbs a higher probability than adjectives, and assigning adjectives and adverbs the same probability.
-
Specification