Method and device for mining query with similar requirements
 CN 103,136,210 A
 Filed: 11/23/2011
 Published: 06/05/2013
 Est. Priority Date: 11/23/2011
 Status: Active Application
Abstract
The invention provides a method and a device for mining queries with similar requirements. The method comprises the following steps: A. obtaining a seed query from a search log; B. calculating a first relevance between the seed query and each page address corresponding to the seed query, and selecting, according to the first relevance, the page addresses that meet a preset first requirement as mining addresses; and C. calculating a second relevance between each mining address and the queries corresponding to that mining address, and selecting, according to the second relevance, the queries that meet a preset second requirement as queries with similar requirements. The method yields queries that share the requirements of the seed query, which helps a search engine better meet user requirements.

8 Citations
16 Claims

1. A method for mining queries with similar requirements, characterized in that the method comprises:

A. obtaining a seed query from a search log;
B. extracting from the search log the page addresses corresponding to the seed query, calculating a first relevance between each of those page addresses and the seed query, and selecting, according to the calculated first relevance, the page addresses that satisfy a preset first requirement as mining addresses;
C. extracting from the search log the queries corresponding to the mining addresses, calculating a second relevance between each of those queries and its mining address, and selecting, according to the calculated second relevance, the queries that satisfy a preset second requirement as the queries with similar requirements.
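Steps A to C can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the log rows in `LOG`, the function names, and the use of raw click counts with minimum-click cutoffs (`first_min`, `second_min`) as the two "preset requirements" are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical click log rows: (query, clicked page address, click count).
LOG = [
    ("cheap flights", "flights.example.com", 8),
    ("cheap flights", "news.example.com", 1),
    ("airline tickets", "flights.example.com", 6),
    ("flight delays", "news.example.com", 5),
    ("book a flight", "flights.example.com", 3),
]


def build_indexes(log):
    """Index click counts by query and by page address."""
    by_query = defaultdict(dict)
    by_url = defaultdict(dict)
    for q, u, c in log:
        by_query[q][u] = by_query[q].get(u, 0) + c
        by_url[u][q] = by_url[u].get(q, 0) + c
    return by_query, by_url


def mine_similar_queries(seed, log, first_min=5, second_min=5):
    """Return (mining addresses, similar queries) for one seed query.

    Assumption: relevance is the raw click count, and each preset
    requirement is a simple minimum-click cutoff.
    """
    by_query, by_url = build_indexes(log)
    # Step B: page addresses clicked for the seed that pass the first cutoff.
    addresses = {u for u, c in by_query.get(seed, {}).items() if c >= first_min}
    # Step C: queries that led to those addresses and pass the second cutoff.
    similar = {
        q for u in addresses for q, c in by_url[u].items() if c >= second_min
    }
    return addresses, similar


addresses, similar = mine_similar_queries("cheap flights", LOG)
print(sorted(addresses), sorted(similar))
```

In this sketch the seed query itself appears in the result, because it also led to clicks on the mining address; the claim does not say whether it should be filtered out.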


2. The method according to claim 1, characterized in that the seed query is obtained from the search log in at least one of the following ways:

(1) obtaining manually labeled seed queries from the search log;
(2) taking queries in the search log that match a preset query template as seed queries;
(3) taking queries in the search log that correspond to manually labeled page addresses as seed queries;
(4) taking queries in the search log that correspond to page addresses matching a preset page address template as seed queries.
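The four seed-acquisition modes can be illustrated with a small Python sketch. The claim does not define what a "template" is; treating query and page address templates as regular expressions is an assumption made here, and the function name, parameters, and sample log are hypothetical.

```python
import re


def select_seeds(log, labeled_queries=(), query_pattern=None,
                 labeled_urls=(), url_pattern=None):
    """Collect seed queries from (query, page address) log rows.

    Assumption: templates are regular expressions; a row contributes its
    query as a seed if any one of the four modes matches.
    """
    labeled_queries = set(labeled_queries)
    labeled_urls = set(labeled_urls)
    q_re = re.compile(query_pattern) if query_pattern else None
    u_re = re.compile(url_pattern) if url_pattern else None

    seeds = set()
    for q, u in log:
        if (
            q in labeled_queries                        # mode (1)
            or (q_re is not None and q_re.search(q))    # mode (2)
            or u in labeled_urls                        # mode (3)
            or (u_re is not None and u_re.search(u))    # mode (4)
        ):
            seeds.add(q)
    return seeds


# Hypothetical log rows for illustration.
SAMPLE_LOG = [
    ("download foo player", "soft.example.com/foo"),
    ("foo player free", "soft.example.com/foo"),
    ("weather today", "weather.example.com/today"),
    ("bar editor", "soft.example.com/bar"),
]

print(sorted(select_seeds(SAMPLE_LOG, query_pattern=r"^download ")))
```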


3. The method according to claim 1, characterized in that the first relevance between a query q and a page address u corresponding to q is determined by the number of times, as recorded in the search log, that q led to u being clicked; or the second relevance between a page address u and a query q corresponding to u is determined by the number of times, as recorded in the search log, that q led to u being clicked.

4. The method according to claim 3, characterized in that the first relevance between a query q and a page address u corresponding to q, or the second relevance between a page address u and a query q corresponding to u, is calculated in one of the following ways:

(1) $\mathrm{similarity}(q,u)=\mathrm{count}(q,u)$;
(2) $\mathrm{similarity}(q,u)=\frac{\mathrm{count}(q,u)}{\sum_{u_i\in U}\mathrm{count}(q,u_i)}$;
(3) $\mathrm{similarity}(q,u)=\frac{\mathrm{count}(q,u)}{\sum_{u_i\in U}\mathrm{count}(q,u_i)\cdot\sum_{q_i\in Q}\mathrm{count}(q_i,u)}$;
(4) $\mathrm{similarity}(q,u)=\log\frac{\dfrac{\mathrm{count}(q,u)}{\sum_{q_i\in Q,\,u_i\in U}\mathrm{count}(q_i,u_i)}}{\dfrac{\sum_{u_i\in U}\mathrm{count}(q,u_i)}{\sum_{q_i\in Q,\,u_i\in U}\mathrm{count}(q_i,u_i)}\cdot\dfrac{\sum_{q_i\in Q}\mathrm{count}(q_i,u)}{\sum_{q_i\in Q,\,u_i\in U}\mathrm{count}(q_i,u_i)}}$;

where $\mathrm{similarity}(q,u)$ denotes the relevance between q and u; $\mathrm{count}(q,u)$ denotes the number of times, as recorded in the search log, that q led to u being clicked; $\sum_{u_i\in U}\mathrm{count}(q,u_i)$ denotes the total number of times, as recorded in the search log, that q led to clicks on all page addresses corresponding to q; $\sum_{q_i\in Q}\mathrm{count}(q_i,u)$ denotes the total number of times u was clicked, as recorded in the search log; and $\sum_{q_i\in Q,\,u_i\in U}\mathrm{count}(q_i,u_i)$ denotes the total number of times all page addresses were clicked, as recorded in the search log.
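The four relevance measures can be computed from click counts as in the following Python sketch. The function and key names are hypothetical, and the sample rows are invented for illustration. Measure (4) has the form of pointwise mutual information between q and u; in this sketch measures (2) to (4) are only defined for pairs whose query and page address appear in the log, and (4) additionally requires a nonzero click count.

```python
import math
from collections import Counter


def make_similarity(log):
    """Build the four measures from (query, page address, clicks) rows."""
    count = Counter()      # count(q, u)
    q_total = Counter()    # sum over u_i of count(q, u_i)
    u_total = Counter()    # sum over q_i of count(q_i, u)
    for q, u, c in log:
        count[(q, u)] += c
        q_total[q] += c
        u_total[u] += c
    total = sum(count.values())  # clicks on all page addresses

    def raw(q, u):                       # measure (1)
        return count[(q, u)]

    def by_query(q, u):                  # measure (2)
        return count[(q, u)] / q_total[q]

    def by_query_and_url(q, u):          # measure (3)
        return count[(q, u)] / (q_total[q] * u_total[u])

    def log_ratio(q, u):                 # measure (4)
        joint = count[(q, u)] / total
        return math.log(joint / ((q_total[q] / total) * (u_total[u] / total)))

    return {
        "raw": raw,
        "by_query": by_query,
        "by_query_and_url": by_query_and_url,
        "log_ratio": log_ratio,
    }


# Hypothetical rows: query "a" clicked x three times and y once, and so on.
sim = make_similarity([("a", "x", 3), ("a", "y", 1), ("b", "x", 1)])
print(sim["raw"]("a", "x"), sim["by_query"]("a", "x"))
```

With these rows, measure (2) for the pair (a, x) is 3/4, since three of the four clicks that followed query "a" went to x.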


5. The method according to claim 1, characterized in that the preset first requirement comprises, for a page address:

its first relevance to the seed query ranks within the top $N_1$, where $N_1$ is a positive integer;
or, its first relevance to the seed query exceeds a set first threshold;
or, its first relevance to the seed query exceeds the first relevance mean, where the first relevance mean is the average of the first relevance values between the seed query and each page address corresponding to the seed query.
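The three alternative selection rules can be sketched in Python as follows. The function names and the sample scores are hypothetical; the input is assumed to be a mapping from each candidate page address to its first relevance. The same three rules apply to candidate queries and their second relevance.

```python
def select_top_n(scores, n):
    """Keep the n candidates with the highest relevance."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def select_above_threshold(scores, threshold):
    """Keep candidates whose relevance exceeds a fixed threshold."""
    return {k for k, v in scores.items() if v > threshold}


def select_above_mean(scores):
    """Keep candidates whose relevance exceeds the mean over all candidates."""
    if not scores:
        return set()
    mean = sum(scores.values()) / len(scores)
    return {k for k, v in scores.items() if v > mean}


# Hypothetical first relevance values for three candidate page addresses.
scores = {"u1": 0.6, "u2": 0.3, "u3": 0.1}
print(sorted(select_top_n(scores, 2)))
print(sorted(select_above_threshold(scores, 0.25)))
print(sorted(select_above_mean(scores)))
```

The mean rule needs no tuning parameter, which may make it convenient when relevance scales differ widely between seed queries; the top-N and threshold rules give more direct control over how many candidates survive.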


6. The method according to claim 1, characterized in that the preset second requirement comprises, for a query:

its second relevance to the mining address ranks within the top $N_2$, where $N_2$ is a positive integer;
or, its second relevance to the mining address exceeds a set second threshold;
or, its second relevance to the mining address exceeds the second relevance mean, where the second relevance mean is the average of the second relevance values between the mining address and each query corresponding to the mining address.


7. The method according to claim 1, characterized in that the method further comprises:
D. judging whether a termination condition is satisfied; if so, outputting the queries with similar requirements; otherwise, taking the queries with similar requirements as the seed query and returning to step B.

8. The method according to claim 7, characterized in that the termination condition comprises:

the number of iterations performed by the method satisfies a preset iteration count requirement;
or, the execution time of the method satisfies a preset time length requirement;
or, during iteration, the number of mining addresses or the number of queries with similar requirements no longer increases;
or, the first relevance between the mining addresses and the seed query, or the second relevance between the queries with similar requirements and the mining addresses, satisfies a preset value requirement.
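The iteration of step D with three of the termination conditions (iteration count, elapsed time, and no further growth) might look like the following Python sketch. The function name, the parameters, the minimum-click cutoff `min_clicks`, and the choice to accumulate mining addresses and queries across rounds are assumptions made for illustration; the claims do not specify them.

```python
import time
from collections import defaultdict


def iterate_mining(seeds, log, max_iterations=10, max_seconds=1.0, min_clicks=2):
    """Repeat steps B and C, feeding mined queries back in as seeds."""
    by_query = defaultdict(dict)
    by_url = defaultdict(dict)
    for q, u, c in log:
        by_query[q][u] = by_query[q].get(u, 0) + c
        by_url[u][q] = by_url[u].get(q, 0) + c

    start = time.monotonic()
    current = set(seeds)
    addresses, similar = set(), set()
    iteration = 0
    for iteration in range(1, max_iterations + 1):   # iteration-count condition
        # Step B: addresses reached from the current seeds.
        new_addresses = addresses | {
            u for q in current
            for u, c in by_query.get(q, {}).items() if c >= min_clicks
        }
        # Step C: queries that led to any mining address.
        new_similar = similar | {
            q for u in new_addresses
            for q, c in by_url[u].items() if c >= min_clicks
        }
        grew = len(new_addresses) > len(addresses) or len(new_similar) > len(similar)
        addresses, similar = new_addresses, new_similar
        if not grew:                                  # no-growth condition
            break
        if time.monotonic() - start >= max_seconds:   # elapsed-time condition
            break
        current = similar                             # step D: reuse as seeds
    return addresses, similar, iteration


# Hypothetical log rows: (query, page address, clicks).
rows = [("a", "x", 3), ("b", "x", 2), ("b", "y", 4), ("c", "y", 2), ("d", "z", 5)]
addresses, similar, rounds = iterate_mining({"a"}, rows)
print(sorted(addresses), sorted(similar), rounds)
```

On these rows the first round reaches x and query b, the second round reaches y and query c, and the third round adds nothing, so the loop stops on the no-growth condition.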


9. A device for mining queries with similar requirements, characterized in that the device comprises:

a seed acquiring unit, configured to obtain a seed query from a search log;
an address mining unit, configured to extract from the search log the page addresses corresponding to the seed query, calculate a first relevance between each of those page addresses and the seed query, and select, according to the calculated first relevance, the page addresses that satisfy a preset first requirement as mining addresses;
a query mining unit, configured to extract from the search log the queries corresponding to the mining addresses, calculate a second relevance between each of those queries and its mining address, and select, according to the calculated second relevance, the queries that satisfy a preset second requirement as the queries with similar requirements.
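One way to picture the three units is as small composable objects wired together by a device object, as in the Python sketch below. All class, field, and method names are hypothetical. The sketch assumes manually labeled seeds, raw click counts as relevance, top-N selection for both requirements, and summing clicks across seeds and across mining addresses; none of these choices is fixed by the claim.

```python
from dataclasses import dataclass


def _top_n(scores, n):
    """Return the n keys with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


@dataclass
class SeedAcquiringUnit:
    labeled: frozenset  # manually labeled seed queries (assumed mode)

    def run(self, log):
        return {q for q, _, _ in log if q in self.labeled}


@dataclass
class AddressMiningUnit:
    top_n: int

    def run(self, seeds, log):
        scores = {}
        for q, u, c in log:
            if q in seeds:
                scores[u] = scores.get(u, 0) + c
        return _top_n(scores, self.top_n)


@dataclass
class QueryMiningUnit:
    top_n: int

    def run(self, addresses, log):
        scores = {}
        for q, u, c in log:
            if u in addresses:
                scores[q] = scores.get(q, 0) + c
        return _top_n(scores, self.top_n)


@dataclass
class MiningDevice:
    seed_unit: SeedAcquiringUnit
    address_unit: AddressMiningUnit
    query_unit: QueryMiningUnit

    def run(self, log):
        seeds = self.seed_unit.run(log)
        addresses = self.address_unit.run(seeds, log)
        return self.query_unit.run(addresses, log)


# Hypothetical log rows: (query, page address, clicks).
device = MiningDevice(
    SeedAcquiringUnit(frozenset({"a"})), AddressMiningUnit(1), QueryMiningUnit(2)
)
print(sorted(device.run([("a", "x", 3), ("a", "y", 1), ("b", "x", 2), ("c", "y", 5)])))
```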


10. The device according to claim 9, characterized in that the seed acquiring unit obtains the seed query from the search log in at least one of the following ways:

(1) obtaining manually labeled seed queries from the search log;
(2) taking queries in the search log that match a preset query template as seed queries;
(3) taking queries in the search log that correspond to manually labeled page addresses as seed queries;
(4) taking queries in the search log that correspond to page addresses matching a preset page address template as seed queries.


11. The device according to claim 9, characterized in that, when the address mining unit calculates the first relevance between a query q and a page address u corresponding to q, the first relevance between u and q is determined by the number of times, as recorded in the search log, that q led to u being clicked; or, when the query mining unit calculates the second relevance between a page address u and a query q corresponding to u, the second relevance between q and u is determined by the number of times, as recorded in the search log, that q led to u being clicked.

12. The device according to claim 11, characterized in that the address mining unit calculates the first relevance between a query q and a page address u corresponding to q in one of the following ways, or the query mining unit calculates the second relevance between a page address u and a query q corresponding to u in one of the following ways:

(1) $\mathrm{similarity}(q,u)=\mathrm{count}(q,u)$;
(2) $\mathrm{similarity}(q,u)=\frac{\mathrm{count}(q,u)}{\sum_{u_i\in U}\mathrm{count}(q,u_i)}$;
(3) $\mathrm{similarity}(q,u)=\frac{\mathrm{count}(q,u)}{\sum_{u_i\in U}\mathrm{count}(q,u_i)\cdot\sum_{q_i\in Q}\mathrm{count}(q_i,u)}$;
(4) $\mathrm{similarity}(q,u)=\log\frac{\dfrac{\mathrm{count}(q,u)}{\sum_{q_i\in Q,\,u_i\in U}\mathrm{count}(q_i,u_i)}}{\dfrac{\sum_{u_i\in U}\mathrm{count}(q,u_i)}{\sum_{q_i\in Q,\,u_i\in U}\mathrm{count}(q_i,u_i)}\cdot\dfrac{\sum_{q_i\in Q}\mathrm{count}(q_i,u)}{\sum_{q_i\in Q,\,u_i\in U}\mathrm{count}(q_i,u_i)}}$;

where $\mathrm{similarity}(q,u)$ denotes the relevance between q and u; $\mathrm{count}(q,u)$ denotes the number of times, as recorded in the search log, that q led to u being clicked; $\sum_{u_i\in U}\mathrm{count}(q,u_i)$ denotes the total number of times, as recorded in the search log, that q led to clicks on all page addresses corresponding to q; $\sum_{q_i\in Q}\mathrm{count}(q_i,u)$ denotes the total number of times u was clicked, as recorded in the search log; and $\sum_{q_i\in Q,\,u_i\in U}\mathrm{count}(q_i,u_i)$ denotes the total number of times all page addresses were clicked, as recorded in the search log.


13. The device according to claim 9, characterized in that the preset first requirement comprises, for a page address:

its first relevance to the seed query ranks within the top $N_1$, where $N_1$ is a positive integer;
or, its first relevance to the seed query exceeds a set first threshold;
or, its first relevance to the seed query exceeds the first relevance mean, where the first relevance mean is the average of the first relevance values between the seed query and each page address corresponding to the seed query.


14. The device according to claim 9, characterized in that the preset second requirement comprises, for a query:

its second relevance to the mining address ranks within the top $N_2$, where $N_2$ is a positive integer;
or, its second relevance to the mining address exceeds a set second threshold;
or, its second relevance to the mining address exceeds the second relevance mean, where the second relevance mean is the average of the second relevance values between the mining address and each query corresponding to the mining address.


15. The device according to claim 9, characterized in that the device further comprises:
a judgment output unit, configured to judge whether a termination condition is satisfied; if so, to output the queries with similar requirements; otherwise, to send the queries with similar requirements to the address mining unit as the seed query and trigger the address mining unit to run.

16. The device according to claim 15, characterized in that the termination condition comprises:

the number of iterations run by the device satisfies a preset iteration count requirement;
or, the running time of the device satisfies a preset time length requirement;
or, while the device runs iteratively, the number of mining addresses or the number of queries with similar requirements no longer increases;
or, the first relevance between the mining addresses and the seed query, or the second relevance between the queries with similar requirements and the mining addresses, satisfies a preset value requirement.

