DOCUMENT PROCESSING DEVICE AND DOCUMENT PROCESSING METHOD

US 20090265344A1
Filed: 04/21/2009
Published: 10/22/2009
Est. Priority Date: 04/22/2008
Status: Active Grant

Abstract

An object of the present invention is to provide a document processing device and document processing method that can provide a search result satisfactory to a user even for WWW documents that have few links among them and few accesses by users. An access pattern collection unit 101 generates an access user vector u_j of one WWW document D_j and an access user vector u_je of another WWW document D_je. A user similarity computing unit 105 computes a document similarity sim(u_j, u_je), which indicates a user similarity between the WWW document D_j and the WWW document D_je. A keyword vector smoothing unit 106 acquires a smoothed keyword weight vector w′_j by correcting a keyword weight vector w_j of the one document, using the computed document similarity sim(u_j, u_je). A rearranging unit 110 calculates an evaluation value B_SCORE for input information for searching, based on the smoothed keyword weight vector w′_j.
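
The flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: cosine similarity stands in for sim(u_j, u_je), the smoothed vector w′_j is taken as a similarity-weighted average of keyword weight vectors, and B_SCORE is taken as an inner product with a query vector. All function and variable names are hypothetical.

```python
import math

def access_user_vectors(history, users):
    """Build a 0/1 access user vector per document from (user, document) pairs."""
    index = {user: i for i, user in enumerate(users)}
    vectors = {}
    for user, doc in history:
        vectors.setdefault(doc, [0.0] * len(users))[index[user]] = 1.0
    return vectors

def cosine(a, b):
    """Cosine similarity (assumed measure); 0.0 when either vector is empty or zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def smooth_keyword_vector(doc, keyword_vectors, user_vectors):
    """Assumed smoothing rule: similarity-weighted average over all documents."""
    total = [0.0] * len(keyword_vectors[doc])
    weight_sum = 0.0
    for other, w_other in keyword_vectors.items():
        s = cosine(user_vectors.get(doc, []), user_vectors.get(other, []))
        total = [t + s * w for t, w in zip(total, w_other)]
        weight_sum += s
    if not weight_sum:
        # A document nobody accessed keeps its uncorrected vector.
        return list(keyword_vectors[doc])
    return [t / weight_sum for t in total]

def b_score(query_vector, keyword_vector):
    """Assumed evaluation value: inner product of query and keyword weights."""
    return sum(q * w for q, w in zip(query_vector, keyword_vector))

users = ["u1", "u2", "u3"]
history = [("u1", "A"), ("u2", "A"), ("u1", "B"), ("u2", "B"), ("u3", "C")]
keyword_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
user_vectors = access_user_vectors(history, users)
smoothed_a = smooth_keyword_vector("A", keyword_vectors, user_vectors)
query = [0.0, 1.0]  # the user searches for the second keyword only
print(b_score(query, keyword_vectors["A"]), b_score(query, smoothed_a))
```

In this toy data, documents A and B are accessed by the same users, so A inherits part of B's keyword weight and becomes findable for a keyword it does not itself contain. Document C, which has a different audience, contributes nothing to A.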

23 Claims

1. A document processing method, comprising:
- a collection step of collecting access history of a user;
  
  a document similarity computing step of computing a document similarity, which indicates similarity between documents, by one user pattern which indicates a plurality of users who have accessed one document and another user pattern which indicates a plurality of users who have accessed another document, according to the access history collected in the collection step;
  
  a keyword weight vector correction step of correcting a keyword weight vector of the one document using the document similarity computed in the document similarity computing step; and
  
  an evaluation value calculation step of calculating an evaluation value for input information for searching, based on the keyword weight vector corrected in the keyword weight vector correction step.
- Dependent claims (2, 3, 4, 5, 6, 7, 9):
  - 2. The document processing method according to claim 1, wherein the keyword weight vector correction step further comprises a step of correcting a keyword weight vector in the other document using the document similarity, and correcting a keyword weight vector in the one document using the corrected keyword weight vector.
  - 3. The document processing method according to claim 1, further comprising:
    - a user similarity computing step of computing user similarity, which indicates similarity between users, by one document pattern which indicates a plurality of documents accessed by one user and another document pattern which indicates a plurality of documents accessed by another user, according to the access history collected in the collection step; and
      
      a user profile correction step of correcting a user profile which indicates characteristics of the one user using the user similarity computed in the user similarity computing step, wherein the evaluation value calculation step further comprises a step of calculating the evaluation value for the input information for searching based on the user profile of the one user corrected in the user profile correction step.
  - 4. The document processing method according to claim 3, wherein the user profile correction step further comprises a step of correcting a user profile of another user using the user similarity and correcting the user profile of the one user based on the corrected user profile.
  - 5. The document processing method according to claim 1, further comprising an acquisition step of acquiring significance information which indicates a significance attached to each document, wherein the evaluation value calculation step further comprises a step of calculating an evaluation value for the input information for searching, using the significance information acquired in the acquisition step.
  - 6. The document processing method according to claim 1, wherein the evaluation value calculation step further comprises a step of calculating an evaluation value using the corrected keyword weight vector when the corrected keyword weight vector in the one document exists, and calculating an evaluation value using the keyword weight vector before correction when the corrected keyword weight vector in the one document does not exist.
  - 7. The document processing method according to claim 1 further comprising an acquisition step of acquiring a document from a search server according to an access by a user, wherein accesses accepted in the acquisition step are collected in the collection step as the access history.
  - 9. The document processing method according to claim 1, further comprising an output step of outputting the search result searched by the user according to the evaluation value calculated in the evaluation value calculation step.
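
The user-side correction of claim 3 mirrors the document-side correction of claim 1 and can be sketched as follows. The claims do not state a similarity measure or a correction rule, so this sketch assumes cosine similarity over binary document patterns (which reduces to the set expression below) and a similarity-weighted average of user profiles. The names `user_similarity`, `correct_profile`, `accessed`, and `profiles` are hypothetical.

```python
import math

def user_similarity(docs_a, docs_b):
    """Cosine similarity of two binary document patterns, written with sets."""
    if not docs_a or not docs_b:
        return 0.0
    return len(docs_a & docs_b) / math.sqrt(len(docs_a) * len(docs_b))

def correct_profile(user, profiles, accessed):
    """Assumed correction rule: similarity-weighted average of all user profiles."""
    corrected = {}
    weight_sum = 0.0
    for other, profile in profiles.items():
        s = user_similarity(accessed[user], accessed[other])
        weight_sum += s
        for keyword, weight in profile.items():
            corrected[keyword] = corrected.get(keyword, 0.0) + s * weight
    if not weight_sum:
        # No access history for this user: keep the uncorrected profile.
        return dict(profiles[user])
    return {keyword: value / weight_sum for keyword, value in corrected.items()}

# Each user's document pattern (documents accessed) and keyword-weight profile.
accessed = {"alice": {"A", "B"}, "bob": {"A", "B"}, "carol": {"C"}}
profiles = {
    "alice": {"jazz": 1.0},
    "bob": {"vinyl": 1.0},
    "carol": {"golf": 1.0},
}
corrected_alice = correct_profile("alice", profiles, accessed)
print(corrected_alice)
```

Here alice and bob read the same documents, so alice's corrected profile picks up bob's interest in "vinyl", while carol's unrelated profile receives zero weight.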

8. A document processing method, comprising:
- a collection step of collecting access history of a user;
  
  a document similarity computing step of computing a document similarity, which indicates similarity between documents, by one user pattern which indicates a plurality of users who have accessed one document and another user pattern which indicates a plurality of users who have accessed another document, according to the access history collected in the collection step;
  
  a keyword weight vector correction step of correcting a keyword weight vector of the one document using the document similarity computed in the document similarity computing step;
  
  an acquisition step of acquiring significance information which indicates a significance attached to each document;
  
  a significance correction step of distinguishing a first user pattern which indicates users who have accessed one document during a first time period, and a second user pattern which indicates users who have accessed one document during a second time period, according to the access history of users collected in the collection step, and correcting the significance of the one document based on the similarity of the first user pattern and the second user pattern and a number of accesses to the one document; and
  
  an evaluation value calculation step of calculating an evaluation value for input information for searching, based on the keyword weight vector corrected in the keyword weight vector correction step, and the significance information corrected in the significance correction step.
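
The significance correction step of claim 8 compares who accessed a document in two time periods. The claim does not say how the similarity and the access count are combined, or in which direction the correction goes. The sketch below therefore makes loud assumptions: a document whose second-period audience differs from its first-period audience is treated as reaching new users and is boosted, the boost grows logarithmically with the access count, and `alpha` is a hypothetical tuning constant. Timestamps are plain integers.

```python
import math

def users_in_period(history, doc, start, end):
    """Users who accessed `doc` at a time t with start <= t < end."""
    return {user for user, d, t in history if d == doc and start <= t < end}

def pattern_similarity(first, second):
    """Cosine similarity of two binary user patterns, written with sets."""
    if not first or not second:
        return 0.0
    return len(first & second) / math.sqrt(len(first) * len(second))

def corrected_significance(base, history, doc, start, split, end, alpha=0.5):
    """Assumed rule: base + alpha * (1 - similarity) * log(1 + accesses)."""
    first = users_in_period(history, doc, start, split)
    second = users_in_period(history, doc, split, end)
    accesses = sum(1 for _, d, t in history if d == doc and start <= t < end)
    novelty = 1.0 - pattern_similarity(first, second)
    return base + alpha * novelty * math.log1p(accesses)

# (user, document, timestamp) records; period 1 is [0, 10), period 2 is [10, 20).
history = [
    ("u1", "A", 1), ("u2", "A", 2), ("u3", "A", 11), ("u4", "A", 12),
    ("u1", "B", 1), ("u2", "B", 2), ("u1", "B", 11), ("u2", "B", 12),
]
sig_a = corrected_significance(1.0, history, "A", 0, 10, 20)
sig_b = corrected_significance(1.0, history, "B", 0, 10, 20)
print(sig_a, sig_b)
```

Both documents have four accesses, but A is visited by different users in each period while B is revisited by the same two users, so under these assumptions only A's significance is raised.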

10. A document processing method, comprising:
- a first generation step of generating a user profile based on a keyword weight vector that is to be a reference value;
  
  a second generation step of generating a new keyword weight vector based on the user profile generated in the first generation step and the keyword weight vector that is to be a reference value;
  
  a third generation step of generating the new user profile based on the new keyword weight vector generated in the second generation step;
  
  a user profile similarity generation step of computing similarity between the new user profile generated in the third generation step and the user profile generated immediately before the new user profile; and
  
  an evaluation value calculation step of calculating an evaluation value based on the similarity computed in the user profile similarity generation step, the keyword weight vector and user profile.
- Dependent claims (11):
  - 11. The document processing method according to claim 10, further comprising a judgment step of judging whether the similarity generated in the user profile similarity generation step is a predetermined value or more, wherein the evaluation value calculation step further comprises a step of calculating the evaluation value based on the keyword weight vector and user profile when the similarity computed in the user profile similarity generation step becomes a predetermined value or more.
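
Claims 10 and 11 describe an alternating update that stops once successive user profiles are similar enough. The loop below is a minimal sketch of that control flow only. The generation rules are assumptions, since the claims do not define them: a user profile is taken as the mean keyword weight vector of the documents the user accessed, a new keyword weight vector blends the reference vector with the mean profile of the document's readers (hypothetical factor `beta`), and cosine similarity with a hypothetical `threshold` serves as the judgment of claim 11.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def build_profiles(keyword_vectors, accessed):
    """Assumed rule: profile = mean keyword vector of the user's documents."""
    return {u: mean([keyword_vectors[d] for d in docs]) for u, docs in accessed.items()}

def build_keyword_vectors(reference, profiles, accessed, beta=0.5):
    """Assumed rule: blend each reference vector with its readers' mean profile."""
    new_vectors = {}
    for doc, ref in reference.items():
        readers = [profiles[u] for u, docs in accessed.items() if doc in docs]
        if not readers:
            new_vectors[doc] = list(ref)
            continue
        avg = mean(readers)
        new_vectors[doc] = [(1 - beta) * r + beta * a for r, a in zip(ref, avg)]
    return new_vectors

def iterate(reference, accessed, threshold=0.999, max_rounds=50):
    profiles = build_profiles(reference, accessed)                      # first generation step
    for rounds in range(1, max_rounds + 1):
        vectors = build_keyword_vectors(reference, profiles, accessed)  # second generation step
        new_profiles = build_profiles(vectors, accessed)                # third generation step
        similarity = min(cosine(new_profiles[u], profiles[u]) for u in accessed)
        profiles = new_profiles
        if similarity >= threshold:                                     # judgment step (claim 11)
            break
    return vectors, profiles, rounds

def evaluation_value(profile, keyword_vector):
    """Assumed evaluation value: inner product of profile and keyword weights."""
    return sum(p * w for p, w in zip(profile, keyword_vector))

reference = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
accessed = {"u1": {"A", "B"}, "u2": {"B", "C"}}  # every user has at least one document
vectors, profiles, rounds = iterate(reference, accessed)
print(rounds, evaluation_value(profiles["u1"], vectors["C"]))
```

Because each new keyword weight vector is always rebuilt from the fixed reference vector, the update in this sketch cannot drift without bound. With `beta` below 1 the profiles settle after a few rounds on data like this.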

12. A document processing device, comprising:
- access history collection means for collecting access history of a user;
  
  document similarity computing means for computing a document similarity, which indicates similarity between documents, by a user pattern which indicates a plurality of users who have accessed one document and a user pattern which indicates a plurality of users who have accessed another document, according to the access history collected by the collection means;
  
  keyword weight vector correction means for correcting a keyword weight vector of the one document, using the document similarity computed by the document similarity computing means; and
  
  evaluation value calculation means for calculating an evaluation value for input information for searching, based on the keyword weight vector corrected by the keyword weight vector correction means.
- Dependent claims (13):
  - 13. A search system, comprising:
    - a user terminal for storing access history;
      
      an information collection device for generating a keyword weight vector of a document accessed by the user terminal; and
      
      the document processing device according to claim 12, for acquiring the access history of the user terminal and the keyword weight vector generated by the information collection device.

14. A document processing program, comprising:
- a collection module for collecting access history of a user;
  
  a document similarity computing module for computing a document similarity which indicates similarity between documents, by a user pattern which indicates a plurality of users who have accessed one document and a user pattern which indicates a plurality of users who have accessed another document, according to the access history collected by the collection module;
  
  a keyword weight vector correction module for correcting a keyword weight vector of the one document, using the document similarity computed by the document similarity computing module; and
  
  an evaluation value calculation module for calculating an evaluation value for input information for searching, based on the keyword weight vector corrected by the keyword weight vector correction module.

15. A document processing device, comprising:
- primary WWW document extraction means for extracting WWW documents according to a searching word;
  
  user extraction means for extracting a user set of users who have accessed the WWW documents extracted by the primary WWW document extraction means;
  
  secondary WWW document extraction means for extracting a WWW document set of WWW documents accessed by the users extracted by the user extraction means; and
  
  significance calculation means for calculating significance of the WWW documents extracted by the primary WWW document extraction means based on a degree of accesses by users to the WWW document set extracted by the secondary WWW document extraction means.
- Dependent claims (16):
  - 16. The document processing device according to claim 15, wherein the significance calculation means calculates the significance of a WWW document based on a degree of accesses by each user of the user set extracted by the user extraction means.
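
The extraction chain of claims 15 and 16 (primary documents, then their users, then everything those users accessed) can be sketched as below. How the "degree of accesses" turns into a significance value is not specified in the claims, so this sketch assumes that each user's weight is the number of distinct secondary documents the user accessed, and that a primary document's significance is the sum of its visitors' weights. Keyword matching by set membership and all names are hypothetical.

```python
from collections import defaultdict

def significance(search_word, doc_keywords, history):
    """Score primary documents by the activity of the users who accessed them."""
    # Primary extraction: documents matching the searching word.
    primary = {d for d, keywords in doc_keywords.items() if search_word in keywords}
    # User extraction: users who accessed any primary document.
    users = {u for u, d in history if d in primary}
    # Secondary extraction: every document accessed by those users.
    secondary = {d for u, d in history if u in users}

    visits = set(history)  # repeated visits are counted once (assumption)
    activity = defaultdict(int)
    for u, d in visits:
        if u in users and d in secondary:
            activity[u] += 1

    scores = {d: 0 for d in primary}
    for u, d in visits:
        if d in primary:
            scores[d] += activity[u]
    return scores

doc_keywords = {"A": {"jazz"}, "B": {"jazz"}, "C": {"golf"}, "D": {"vinyl"}}
history = [("u1", "A"), ("u1", "C"), ("u1", "D"), ("u2", "A"), ("u3", "B")]
scores = significance("jazz", doc_keywords, history)
print(scores)
```

Document A scores higher than B in this example because one of its visitors, u1, is an active reader across the secondary set, even though neither document needs any inbound links to be ranked.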

17. A document processing device, comprising:
- primary WWW document extraction means for extracting WWW documents according to a searching method;
  
  user extraction means for extracting a user set of users who accessed the WWW documents extracted by the primary WWW document extraction means;
  
  data structure holding means for holding data for which reference relationships among the WWW documents can be managed as a directed graph;
  
  secondary WWW document extraction means for extracting other WWW documents which each WWW document extracted by the primary WWW document extraction means refers to, and other WWW documents which refer to each WWW document, based on the data stored in the data structure holding means; and
  
  significance calculation means for calculating significance of the WWW documents extracted by the primary WWW document extraction means based on a degree of accesses by the users extracted by the user extraction means to the WWW document set extracted by the secondary WWW document extraction means.

18. A document processing device, comprising:
- access history holding means for holding an access history to a WWW document by a plurality of users;
  
  data structure holding means for holding data for which reference relationships among WWW documents can be managed as a directed graph;
  
  primary WWW document extraction means for extracting WWW documents according to a searching word;
  
  user extraction means for extracting a user set of users who have accessed the WWW documents extracted by the primary WWW document extraction means from the access history holding means;
  
  secondary WWW document extraction means for extracting other WWW documents which each WWW document extracted by the primary WWW document extraction means refers to, and other WWW documents which refer to each of the WWW documents, based on the data stored in the data structure holding means, and extracting one node set by adding the user set extracted by the user extraction means and the WWW document set of the extracted WWW documents; and
  
  significance calculation means for calculating significance of the WWW documents by weighting a degree of being referred to among the WWW documents in the node set extracted by the secondary WWW document extraction means and a degree of accesses by each of the users to each of the WWW documents respectively.

19. A document processing device, comprising:
- data structure holding means for holding data for which reference relationships among WWW documents can be managed as a directed graph;
  
  primary WWW document extraction means for extracting WWW documents according to a searching word;
  
  user extraction means for extracting a user set of users who have accessed the WWW documents extracted by the extraction means from the access history holding means;
  
  secondary WWW document extraction means for extracting other WWW documents which each WWW document extracted by the primary WWW document extraction means refers to, and other WWW documents which refer to each of the WWW documents, based on the data stored in the data structure holding means;
  
  hub score calculation means for calculating a hub score indicating a degree of accesses by each user of the user set extracted by the user extraction means to each WWW document extracted by the secondary WWW document extraction means; and
  
  significance calculation means for calculating significance based on a degree of matching of a visit vector of users who have visited a WWW document, included in any of the WWW documents and the hub score calculated by the hub score calculation means.
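
Claim 19 combines the link graph with access history: secondary documents come from the directed graph, users receive a hub score from their accesses to those documents, and significance follows from how well a document's visit vector matches the hub scores. The sketch below assumes the hub score is a simple count of secondary documents visited and the "degree of matching" is the inner product of a 0/1 visit vector with the hub scores. Neither choice is stated in the claim, and all names are hypothetical.

```python
def neighbours(primary, links):
    """Documents a primary document refers to, or that refer to one."""
    result = set()
    for source, target in links:
        if source in primary:
            result.add(target)
        if target in primary:
            result.add(source)
    return result

def hub_scores(users, secondary, history):
    """Assumed hub score: number of secondary documents each user visited."""
    visits = set(history)
    return {u: sum((u, d) in visits for d in secondary) for u in users}

def significance(doc, hubs, history):
    """Assumed matching: inner product of the 0/1 visit vector and hub scores."""
    visitors = {u for u, d in history if d == doc}
    return sum(score for u, score in hubs.items() if u in visitors)

links = [("A", "X"), ("Y", "A"), ("B", "X")]  # (referring, referred) pairs
primary = {"A", "B"}
history = [
    ("u1", "A"), ("u1", "X"), ("u1", "Y"),
    ("u2", "A"), ("u2", "B"), ("u2", "X"),
    ("u3", "B"),
]
users = {u for u, d in history if d in primary}
secondary = neighbours(primary, links)
hubs = hub_scores(users, secondary, history)
print({d: significance(d, hubs, history) for d in sorted(primary)})
```

A user who also reads the pages around a primary document gets a higher hub score, so documents visited by such users rank above documents visited only by users with no activity in the linked neighbourhood.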

20. A document processing method, comprising:
- a primary WWW document extraction step of extracting WWW documents according to a searching word;
  
  a user extraction step of extracting a user set of users who have accessed the WWW documents extracted in the primary WWW document extraction step;
  
  a secondary WWW document extraction step of extracting a WWW document set of WWW documents accessed by the users extracted in the user extraction step; and
  
  a significance calculation step of calculating significance of the WWW documents extracted in the primary WWW document extraction step based on a degree of accesses by the users to the WWW document set extracted in the secondary WWW document extraction step.

21. A document processing method for a document processing device having data structure holding means for holding data for which reference relationships among WWW documents can be managed as a directed graph, the method comprising:
- a primary WWW document extraction step of extracting WWW documents according to a searching word;
  
  a user extraction step of extracting a user set of users who have accessed the WWW documents extracted in the primary WWW document extraction step;
  
  a secondary WWW document extraction step of extracting other WWW documents which each WWW document extracted in the primary WWW document extraction step refers to, and other WWW documents which refer to each WWW document, based on the data stored in the data structure holding means; and
  
  a significance calculation step of calculating significance of the WWW documents extracted in the primary WWW document extraction step based on a degree of accesses by the users extracted in the user extraction step to the WWW document set extracted in the secondary WWW document extraction step.

22. A document processing method for a document processing device having access history holding means for holding history of access to a WWW document by a plurality of users, and data structure holding means for holding data for which reference relationships among WWW documents can be managed as a directed graph, the method comprising:
- a primary WWW document extraction step of extracting WWW documents according to a searching word;
  
  a user extraction step of extracting a user set of users who have accessed the WWW documents extracted in the primary WWW document extraction step from the access history holding means;
  
  a secondary WWW document extraction step of extracting other WWW documents which each WWW document extracted in the primary WWW document extraction step refers to, and other WWW documents which refer to each of the WWW documents, based on the data stored in the data structure holding means, and extracting one node set by adding the user set extracted in the user extraction step and the WWW document set of the extracted WWW documents; and
  
  a significance calculation step of calculating significance of the WWW documents by weighting a degree of being referred to among the WWW documents in the node set extracted in the secondary WWW document extraction step and a degree of accesses by each of the users to each of the WWW documents respectively.

23. A document processing method for a document processing device having access history holding means for holding history of access to a WWW document by a plurality of users, and data structure holding means for holding data for which reference relationships among WWW documents can be managed as a directed graph, the method comprising:
- a primary WWW document extraction step of extracting WWW documents according to a searching word;
  
  a user extraction step of extracting a user set of users who have accessed the WWW documents extracted in the primary WWW document extraction step from the access history holding means;
  
  a secondary WWW document extraction step of extracting other WWW documents which each WWW document extracted in the primary WWW document extraction step refers to, and other WWW documents which refer to each of the WWW documents, based on the data stored in the data structure holding means;
  
  a hub score calculation step of calculating a hub score which indicates a degree of accesses by each user of the user set extracted in the user extraction step to each WWW document extracted in the secondary WWW document extraction step; and
  
  a significance calculation step of calculating significance based on a degree of matching of a visit vector of users who have visited a WWW document included in any of the WWW documents and the hub score calculated in the hub score calculation step.

Current Assignee
NTT Docomo Incorporated (Nippon Telegraph and Telephone Corporation)
Original Assignee
NTT Docomo Incorporated (Nippon Telegraph and Telephone Corporation)
Inventors
Akinaga, Yoshikazu; Nakayama, Takehiro; Etoh, Minoru

Granted Patent

US 8,176,033 B2
US Class Current

1/1
CPC Class Codes

G06F 16/3347 using vector based model

G06F 16/951 Indexing; Web crawling tech...
