File system with access and retrieval of XML documents
Abstract
An XML-aware file system exploits attributes encoded in an XML document. The file system presents a dynamic directory structure to the user, and breaks the conventional tight linkage between sets of files and the physical directory structure, thus allowing different users to see files organized in a different fashion. The dynamic structure is based upon content, which is extracted using an inverted index according to attributes and values defined by the XML structure.
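The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the two-level `key -> value -> documents` mapping, the `tag@attr` naming for attributes, and the function names `build_index` and `list_virtual_dir` are all assumptions made for illustration, and the sketch ignores document type declarations entirely. A virtual path is read as alternating key/value components; a trailing key lists the matching values as subdirectories, and a complete path lists the matching files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Build a two-level inverted index: index[key][value] -> set of file names.

    `docs` maps a file name to its XML text (hypothetical in-memory repository).
    """
    index = defaultdict(lambda: defaultdict(set))
    for name, text in docs.items():
        root = ET.fromstring(text)
        for elem in root.iter():
            # Index non-empty element text under the element tag.
            if elem.text and elem.text.strip():
                index[elem.tag][elem.text.strip()].add(name)
            # Index attribute values under "tag@attr" (naming is an assumption).
            for attr, value in elem.attrib.items():
                index[f"{elem.tag}@{attr}"][value].add(name)
    return index


def list_virtual_dir(index, docs, path):
    """List the contents of a virtual directory such as "/author/Smith/book@year"."""
    parts = [p for p in path.split("/") if p]
    matches = set(docs)
    # Each complete key/value pair narrows the set of matching documents.
    for key, value in zip(parts[0::2], parts[1::2]):
        matches &= index.get(key, {}).get(value, set())
    if len(parts) % 2 == 1:
        # Trailing key without a value: show candidate values as subdirectories.
        key = parts[-1]
        return sorted(v for v, ids in index.get(key, {}).items() if ids & matches)
    # Complete query: show the matching files.
    return sorted(matches)
```

Because the listing is computed from the index on each request, the same files can appear under many different paths, which is the sense in which the directory structure is decoupled from physical storage.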
136 Citations
47 Claims
1. A computer implemented method of information retrieval, comprising the steps of:
retrieving structural information of memorized documents according to a document type declaration that corresponds to each of said documents;
retrieving elements, attributes and values of said elements and said attributes of said documents;
generating a multilevel inverted index from said structural information, said elements, said attributes and said values;
accepting a specification from a user having members that comprise at least one of said elements, said attributes and said values;
responsive to said specification, extracting data from said index that complies with at least one of said members;
displaying a hierarchy of virtual directory paths of files in a file system, said files corresponding to ones of said documents, wherein said directory paths each comprise a sequence of said members, and wherein contents of directories that are identified in said directory paths comprise selected ones of said documents possessing said specification;
responsive to said step of displaying a hierarchy of virtual directory paths, navigating said hierarchy; and
browsing among said documents.
extracting a document identifier from one of said postings of said values;
extracting an offset of a context from said one of said postings of said values; and
extracting an entry length of said context from said one of said postings of said values.
4. The method according to claim 1, wherein said documents are XML documents.
5. The method according to claim 1, further comprising the steps of:
noting changes in a composition of a repository of said documents; and
updating said index responsive to said changes.
6. The method according to claim 1, wherein said specification comprises a partial query and a complete query that are expressed as components of a path within said file system.
7. The method according to claim 1, wherein a portion of said specification is stated as a path name by the user.
8. A computer software product, comprising a computer-readable medium in which computer program instructions are stored, which instructions, when read by a computer, cause the computer to perform the steps of:
retrieving structural information of memorized documents according to a document type declaration that corresponds to each of said documents;
retrieving elements, attributes and values of said elements and said attributes of said documents;
generating a multilevel inverted index from said structural information, said elements, said attributes and said values;
accepting a specification from a user having members that comprise at least one of said elements, said attributes and said values;
responsive to said specification, extracting data from said index that complies with at least one of said members;
associating said data with corresponding ones of said documents;
displaying said corresponding ones of said documents as a hierarchy of virtual directory paths of a file system, wherein said directory paths each comprise a sequence of said members, and wherein contents of directories that are identified in said directory paths comprise files containing selected ones of said documents possessing said specification;
responsive to said step of displaying, navigating said hierarchy; and
browsing among said documents.
extracting a document identifier from one of said postings of said values;
extracting an offset of a context from said one of said postings of said values; and
extracting an entry length of said context from said one of said postings of said values.
11. The computer software product according to claim 8, wherein said documents are XML documents.
12. The computer software product according to claim 8, further comprising the steps of:
noting changes in a composition of a repository of said documents; and
updating said index responsive to said changes.
13. The computer software product according to claim 8, wherein said specification comprises a partial query and a complete query, that are expressed as components of a path within said file system.
14. The computer software product according to claim 8, wherein said specification is stated as a path name by the user.
15. The computer software product according to claim 8, wherein said specification is issued via a file system applications programming interface.
16. The computer software product according to claim 15, wherein said instructions define a file system engine that issues calls to an operating system.
17. A computer implemented information retrieval system for presenting a semantically dependent directory structure of XML files to a user, comprising:
a file system engine, that receives a file request via a file system application programming interface and issues file system calls to an operating system, wherein said file request specifies a file content of memorized files in a file system;
an XML parser, linked to said file system engine, that retrieves structural information of XML documents, said XML parser further retrieving at least one of elements, attributes and respective values thereof from said XML documents;
an indexer, linked to said XML parser, for constructing an inverted index of said elements and said attributes and said respective values thereof, wherein responsive to said file request, said file system engine retrieves postings of said inverted index that satisfy requirements of said file request, and returns directory paths to said file system application programming interface of said files containing said XML documents corresponding to said postings; and
a browser for displaying said directory paths and said files as a navigable hierarchical display that is constructed on-the-fly by said file system engine.
a document identifier of one of said XML documents;
an offset of a context of said one XML document; and
an entry length of said context of said one XML document.
20. The information retrieval system of claim 17, further comprising an XML analyzer for updating said inverted index, wherein said XML analyzer analyzes additions to said memorized files.
21. The information retrieval system of claim 17, wherein said XML parser retrieves said structural information from document type declarations of said XML documents.
22. The information retrieval system of claim 17, wherein said file request comprises a partial query and a complete query, that are expressed as components of a path within said file system.
23. The information retrieval system of claim 17, wherein a portion of said file request is a path name.
24. The information retrieval system of claim 17, wherein a repository of said XML documents is a networked file system.
25. A computer implemented method of information retrieval, comprising the steps of:
retrieving structural information of memorized documents according to a document type declaration that corresponds to each of said documents, wherein said documents are written in a markup language;
retrieving elements, attributes and values of said elements and said attributes of said documents;
generating a multilevel inverted index from said structural information, said elements, said attributes and said values;
accepting a specification from a user having members that comprise at least one of said elements, said attributes and said values;
responsive to said specification, extracting data from said index that complies with at least one of said members;
displaying a hierarchy of virtual directory paths in a file system, files containing corresponding ones of said documents, wherein said directory paths each comprise a sequence of said members, and wherein contents of directories that are identified in said directory paths comprise selected ones of said documents possessing said specification;
responsive to said step of displaying a hierarchy of virtual directory paths, navigating said hierarchy; and
browsing among said documents.
extracting a document identifier from one of said postings of said values;
extracting an offset of a context from said one of said postings of said values; and
extracting an entry length of said context from said one of said postings of said values.
28. The method according to claim 25, further comprising the steps of:
noting changes in a composition of a repository of said documents; and
updating said index responsive to said changes.
29. The method according to claim 25, wherein said specification comprises a partial query and a complete query, that are expressed as components of a path within said file system.
30. The method according to claim 25, wherein a portion of said specification is stated as a path name by the user.
31. A computer software product, comprising a computer-readable medium in which computer program instructions are stored, which instructions, when read by a computer, cause the computer to perform the steps of:
retrieving structural information of memorized documents according to a document type declaration that corresponds to each of said documents;
wherein said documents are written in a markup language;
retrieving elements, attributes and values of said elements and said attributes of said documents;
generating a multilevel inverted index from said structural information, said elements, said attributes and said values;
accepting a specification from a user having members that comprise at least one of said elements, said attributes and said values;
responsive to said specification, extracting data from said index that complies with at least one of said members;
associating said data with corresponding ones of said documents;
displaying a hierarchy of virtual directory paths of a file system, wherein said directory paths each comprise a sequence of said members, and wherein files in directories that are identified in said directory paths comprise selected ones of said documents possessing said specification;
responsive to said step of displaying, navigating said hierarchy; and
browsing among said documents.
extracting a document identifier from one of said postings of said values;
extracting an offset of a context from said one of said postings of said values; and
extracting an entry length of said context from said one of said postings of said values.
34. The computer software product according to claim 31, wherein said documents are XML documents.
35. The computer software product according to claim 31, further comprising the steps of:
noting changes in a composition of a repository of said documents; and
updating said index responsive to said changes.
36. The computer software product according to claim 31, wherein said specification comprises a partial query and a complete query, that are expressed as components of a path within said file system.
37. The computer software product according to claim 31, wherein said specification is stated as a path name by the user.
38. The computer software product according to claim 31, wherein said specification is issued via a file system applications programming interface.
39. The computer software product according to claim 38, wherein said instructions define a file system engine that issues calls to an operating system.
40. A computer implemented information retrieval system for presenting a semantically dependent directory structure of document files to a user, wherein documents of said document files are written in a markup language, comprising:
a file system engine, that receives a file request via a file system application programming interface and issues file system calls to an operating system, wherein said file request specifies a file content of memorized files;
a parser of said markup language, linked to said file system engine, that retrieves structural information of said documents, said parser further retrieving at least one of elements, attributes and respective values thereof from said documents;
an indexer, linked to said parser, for constructing an inverted index of said elements and said attributes and said respective values thereof, wherein responsive to said file request, said file system engine retrieves postings of said inverted index that satisfy requirements of said file request, and returns directory paths to selected ones of said document files corresponding to said postings; and
a browser for displaying said directory paths and said document files as a navigable hierarchical display that is constructed on-the-fly by said file system engine.
a document identifier of one of said documents;
an offset of a context of said one document; and
an entry length of said context of said one document.
43. The information retrieval system of claim 40, further comprising an analyzer for updating said inverted index, wherein said analyzer analyzes additions to said memorized files.
44. The information retrieval system of claim 40, wherein said parser retrieves said structural information from document type declarations of said documents.
45. The information retrieval system of claim 40, wherein said file request comprises a partial query and a complete query, that are expressed as components of a path within said file system.
46. The information retrieval system of claim 40, wherein a portion of said file request is a path name.
47. The information retrieval system of claim 40, wherein a repository of said documents is a networked file system.
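Several of the claims above recite postings that carry a document identifier, an offset of a context, and an entry length, together with updating the index when the repository changes. The following Python sketch shows one way such postings might be represented. It is an illustration under stated assumptions, not the claimed implementation: the names `Posting` and `ValueIndex` are hypothetical, offsets are counted in characters of the raw document text, and the "context" is taken to be the matched value itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: str   # document identifier
    offset: int   # character offset of the context within the document
    length: int   # entry length of the context


class ValueIndex:
    """Hypothetical value-level index: value -> list of postings."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add_document(self, doc_id, text, values):
        """Record every occurrence of each value in a newly added document."""
        for value in values:
            if not value:
                continue  # an empty value would match everywhere
            start = text.find(value)
            while start != -1:
                self.postings[value].append(Posting(doc_id, start, len(value)))
                start = text.find(value, start + len(value))

    def remove_document(self, doc_id):
        """Drop all postings of a document that left the repository."""
        for value in list(self.postings):
            kept = [p for p in self.postings[value] if p.doc_id != doc_id]
            if kept:
                self.postings[value] = kept
            else:
                del self.postings[value]

    @staticmethod
    def context(posting, texts):
        """Recover the indexed span from the stored offset and length."""
        text = texts[posting.doc_id]
        return text[posting.offset:posting.offset + posting.length]
```

Storing the offset and length alongside the document identifier lets a lookup return the matching span without re-parsing the document, and `add_document` and `remove_document` are the hooks a repository watcher could call when files are added or deleted.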
Specification