Method and apparatus for maintaining and navigating a non-hierarchical personal spatial file system
Abstract
A self-organizing personal file system is disclosed that evaluates the “importance” of terms and phrases in a document in a personal corpus relative to their usage in a reference corpus. A personalized term weighting scheme assigns a weight to each term or phrase based on the frequency of occurrence of that term or phrase in the reference corpus. The personalized term weight for a given term or phrase can be used to store and access documents containing that term or phrase in the spatial file system, and it provides coordinates in the spatial file system for one or more documents containing it. The location of a given document in the file space may be specified by the relative frequency distribution of the stems of its significant terms or phrases compared with the occurrence of those terms or phrases in the reference corpus.
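The abstract's idea of locating a document by the relative frequency of its significant stems can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: `crude_stem` is a hypothetical suffix stripper (not Porter stemming), a stem is treated as "significant" when its ratio exceeds a hypothetical `threshold`, and stems missing from the reference corpus are skipped because the text does not say how to handle them.

```python
import re
from collections import Counter
from typing import Dict, Mapping


def crude_stem(word: str) -> str:
    """Hypothetical suffix stripper for illustration only (not Porter stemming)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_counts(text: str) -> Counter:
    """Lowercase the text, split it into alphabetic tokens, and count their stems."""
    return Counter(crude_stem(w) for w in re.findall(r"[a-z]+", text.lower()))


def file_space_location(
    doc_text: str,
    ref_counts: Mapping[str, int],
    ref_total: int,
    threshold: float = 1.0,
) -> Dict[str, float]:
    """Map each significant stem to (doc fraction) / (reference fraction).

    Assumptions: a stem is 'significant' when its ratio exceeds `threshold`,
    and stems absent from the reference corpus are skipped.
    """
    counts = stem_counts(doc_text)
    doc_total = sum(counts.values())
    location: Dict[str, float] = {}
    for stem, n in counts.items():
        ref_n = ref_counts.get(stem, 0)
        if ref_n == 0:
            continue  # undefined ratio; skipped by assumption
        ratio = (n / doc_total) / (ref_n / ref_total)
        if ratio > threshold:
            location[stem] = ratio
    return location
```

With a reference corpus where "the" and "is" are common, the document "Indexing files: the index is indexed." lands at coordinates dominated by the stems `index` and `file`, while the common words fall below the threshold and drop out.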
30 Citations
16 Claims
1. A method for storing a document in a file system, comprising the steps of:
determining a term weight for terms appearing in said document, wherein a given term weight is based on a frequency of occurrence of said corresponding term in a reference corpus; and
storing said document in said file system with an indication of said term weights, wherein said given term weight is obtained by dividing a fractional frequency of said term in said document by a fractional frequency of said term in said reference corpus, wherein said fractional frequency of said term in said document is the number of occurrences of the term in the document divided by the total number of terms in the document and wherein said fractional frequency of said term in said reference corpus is the number of occurrences of the term in the reference corpus divided by the total number of words in the reference corpus.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.

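Written as a formula, the weight recited in claim 1 can be expressed as follows, where the symbols are introduced here for illustration: n_d(t) is the number of occurrences of term t in document d, N_d is the total number of terms in d, n_R(t) is the number of occurrences of t in the reference corpus R, and N_R is the total number of words in R. The numbers in the second line are a hypothetical worked example, not data from the patent.

```latex
\begin{aligned}
w(t,d) &= \frac{n_d(t)/N_d}{n_R(t)/N_R} \\[4pt]
w &= \frac{5/1\,000}{200/10\,000\,000} = \frac{0.005}{0.00002} = 250
\end{aligned}
```

A weight above 1 means the term is used more often in the document than in the reference corpus (here 250 times more often), which is what marks it as important to the document; a weight below 1 marks a term that is commoner in general usage than in this document.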
11. A method for identifying one or more documents in a file system that are related to one or more specified words, comprising the steps of:
storing each of said documents in said file system with an indication of a term weight for terms appearing in said corresponding document, wherein a given term weight is based on a frequency of occurrence of said corresponding term in a reference corpus; and
evaluating a distance between each of said documents and said one or more specified words, wherein said distance is based on said term weights and wherein said given term weight is obtained by dividing a fractional frequency of said term in said document by a fractional frequency of said term in said reference corpus, wherein said fractional frequency of said term in said document is the number of occurrences of the term in the document divided by the total number of terms in the document and wherein said fractional frequency of said term in said reference corpus is the number of occurrences of the term in the reference corpus divided by the total number of words in the reference corpus.
Dependent claims: 12.

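Claim 11 requires only that the distance be "based on said term weights"; it does not name a metric. The sketch below assumes cosine distance between a document's term-weight vector and a 0/1 vector over the specified words, a swapped-in choice made for illustration. The helper names and the policy of skipping terms missing from the reference corpus are likewise hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def term_weights(text: str, ref_counts: Mapping[str, int], ref_total: int) -> Dict[str, float]:
    """Claim-style weight (n_d(t)/N_d) / (n_R(t)/N_R).

    Terms absent from the reference corpus are skipped (assumption).
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    doc_total = sum(counts.values())
    return {
        t: (n / doc_total) / (ref_counts[t] / ref_total)
        for t, n in counts.items()
        if ref_counts.get(t, 0) > 0
    }


def cosine_distance(weights: Mapping[str, float], query_words: Iterable[str]) -> float:
    """1 - cosine similarity between the weight vector and a 0/1 query vector (assumed metric)."""
    query = {w.lower() for w in query_words}
    dot = sum(weights.get(w, 0.0) for w in query)
    norm_d = math.sqrt(sum(v * v for v in weights.values()))
    norm_q = math.sqrt(len(query))
    if dot == 0 or norm_d == 0:
        return 1.0  # no shared terms: maximally distant
    return 1.0 - dot / (norm_d * norm_q)


def rank_documents(
    docs: Mapping[str, str],
    query_words: List[str],
    ref_counts: Mapping[str, int],
    ref_total: int,
) -> List[str]:
    """Return document names ordered from nearest to farthest from the query words."""
    dist = {
        name: cosine_distance(term_weights(text, ref_counts, ref_total), query_words)
        for name, text in docs.items()
    }
    return sorted(dist, key=dist.get)
```

Because every weight is non-negative, the distance stays between 0 and 1, so documents whose high-weight terms match the specified words sort first.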
13. A system for storing a document in a file system, comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
determine a term weight for terms appearing in said document, wherein a given term weight is based on a frequency of occurrence of said corresponding term in a reference corpus; and
store said document in said file system with an indication of said term weights, wherein said given term weight is obtained by dividing a fractional frequency of said term in said document by a fractional frequency of said term in said reference corpus, wherein said fractional frequency of said term in said document is the number of occurrences of the term in the document divided by the total number of terms in the document and wherein said fractional frequency of said term in said reference corpus is the number of occurrences of the term in the reference corpus divided by the total number of words in the reference corpus.

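The claims leave open how the "indication of said term weights" is kept alongside the stored document. One minimal sketch, assuming a hypothetical JSON sidecar file named `<name>.weights.json` next to the document, is shown below; the file layout and helper names are illustrative only, and terms missing from the reference corpus are skipped by assumption.

```python
import json
import re
from collections import Counter
from pathlib import Path
from typing import Dict, Mapping, Tuple


def term_weights(text: str, ref_counts: Mapping[str, int], ref_total: int) -> Dict[str, float]:
    """Claim-style weight (n_d(t)/N_d) / (n_R(t)/N_R); reference-absent terms skipped (assumption)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    doc_total = sum(counts.values())
    return {
        t: (n / doc_total) / (ref_counts[t] / ref_total)
        for t, n in counts.items()
        if ref_counts.get(t, 0) > 0
    }


def store_with_weights(
    directory: str,
    name: str,
    text: str,
    ref_counts: Mapping[str, int],
    ref_total: int,
) -> Tuple[Path, Path]:
    """Write the document, then write its term weights to a hypothetical JSON sidecar."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    doc_path = root / name
    doc_path.write_text(text, encoding="utf-8")
    sidecar = root / f"{name}.weights.json"
    weights = term_weights(text, ref_counts, ref_total)
    sidecar.write_text(json.dumps(weights, sort_keys=True), encoding="utf-8")
    return doc_path, sidecar


def load_weights(directory: str, name: str) -> Dict[str, float]:
    """Read back the term weights stored for a document."""
    sidecar = Path(directory) / f"{name}.weights.json"
    return json.loads(sidecar.read_text(encoding="utf-8"))
```

A sidecar keeps the document bytes untouched; an extended attribute or a database index would be equally valid ways to attach the weights, and nothing in the claims prefers one over another.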
14. A system for identifying one or more documents in a file system that are related to one or more specified words, comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
store each of said documents in said file system with an indication of a term weight for terms appearing in said corresponding document, wherein a given term weight is based on a frequency of occurrence of said corresponding term in a reference corpus; and
evaluate a distance between each of said documents and said one or more specified words, wherein said distance is based on said term weights and wherein said given term weight is obtained by dividing a fractional frequency of said term in said document by a fractional frequency of said term in said reference corpus, wherein said fractional frequency of said term in said document is the number of occurrences of the term in the document divided by the total number of terms in the document and wherein said fractional frequency of said term in said reference corpus is the number of occurrences of the term in the reference corpus divided by the total number of words in the reference corpus.

15. An article of manufacture for storing a document in a file system, comprising:
a tangible computer readable recordable storage medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to determine a term weight for terms appearing in said document, wherein a given term weight is based on a frequency of occurrence of said corresponding term in a reference corpus; and
a step to store said document in said file system with an indication of said term weights, wherein said given term weight is obtained by dividing a fractional frequency of said term in said document by a fractional frequency of said term in said reference corpus, wherein said fractional frequency of said term in said document is the number of occurrences of the term in the document divided by the total number of terms in the document and wherein said fractional frequency of said term in said reference corpus is the number of occurrences of the term in the reference corpus divided by the total number of words in the reference corpus.

16. An article of manufacture for identifying one or more documents in a file system that are related to one or more specified words, comprising:
a tangible computer readable recordable storage medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to store each of said documents in said file system with an indication of a term weight for terms appearing in said corresponding document, wherein a given term weight is based on a frequency of occurrence of said corresponding term in a reference corpus; and
a step to evaluate a distance between each of said documents and said one or more specified words, wherein said distance is based on said term weights and wherein said given term weight is obtained by dividing a fractional frequency of said term in said document by a fractional frequency of said term in said reference corpus, wherein said fractional frequency of said term in said document is the number of occurrences of the term in the document divided by the total number of terms in the document and wherein said fractional frequency of said term in said reference corpus is the number of occurrences of the term in the reference corpus divided by the total number of words in the reference corpus.

Specification