Comparing and merging structured documents syntactically and semantically

US 8,286,132 B2
Filed: 09/25/2008
Issued: 10/09/2012
Est. Priority Date: 09/25/2008
Status: Expired due to Fees

Abstract

A method of performing a three-way merge includes receiving first, second, and third versions of a structured document containing first, second, and third pluralities of elements respectively; deserializing the first, second, and third versions to generate first, second, and third tree-structured data models respectively representing the first, second, and third versions; generating an identifier for each node of each data model that is unique within the data model by applying identifier determination rules to a context describing the element corresponding to the node; comparing each identifier in the first data model with each identifier in the second data model to identify each node in the first data model not having matching identifiers with any node in the second data model and to link each pair of nodes having matching identifiers; and applying comparison rules to the contexts of each linked pair of nodes to identify differences therebetween.
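
The identifier and linking steps summarized in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the identifier rule (tag name plus the first of a hypothetical `id` or `name` attribute, prefixed by the parent's identifier) and the helper names `node_ids` and `link` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def node_ids(root, key_attrs=("id", "name")):
    """Map a path-like identifier to each element of the tree.

    Assumed identifier rule: tag name plus the first key attribute
    present, prefixed by the parent's identifier. Siblings that share
    a tag and have no key attribute would collide under this rule.
    """
    ids = {}

    def visit(elem, parent_id):
        key = next((elem.get(a) for a in key_attrs if elem.get(a) is not None), "")
        ident = f"{parent_id}/{elem.tag}[{key}]"
        ids[ident] = elem
        for child in elem:
            visit(child, ident)

    visit(root, "")
    return ids

def link(first_root, second_root):
    """Return (identifiers present in both trees, identifiers only in the first)."""
    first, second = node_ids(first_root), node_ids(second_root)
    linked = sorted(i for i in first if i in second)
    unmatched = sorted(i for i in first if i not in second)
    return linked, unmatched

# Deserialize two versions and link their nodes by identifier.
original = ET.fromstring('<ui><button id="ok"/><button id="cancel"/></ui>')
modified = ET.fromstring('<ui><button id="ok"/></ui>')
linked, unmatched = link(original, modified)
print(linked)     # ['/ui[]', '/ui[]/button[ok]']
print(unmatched)  # ['/ui[]/button[cancel]']
```

Nodes in `unmatched` exist only in the first version, so they are the candidates for deletion in a later merge; the linked pairs are the ones whose contexts would then be compared.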

34 Citations

19 Claims

1. A method of performing a syntactic and semantic three-way merge of structured software documents, the method comprising:
- receiving a first version of a document coded in a structured programming language containing a first plurality of elements, a second version of the document containing a second plurality of elements, and a third version of the document containing a third plurality of elements, wherein the first version of the document is an original version of the document, the second version of the document is an end-user modified version of the original version, and the third version of the document is a developer modified version of the original version of the document;
  
  deserializing the first, second, and third versions of the document to generate a first data model, a second data model, and a third data model respectively representing the first, second, and third versions in a first data store, each data model comprising a tree data structure that includes a corresponding node for each element of the plurality of elements contained within the version of the document represented by the data model, each node of each data model containing a context describing the element corresponding to the node;
  
  generating an identifier for each node of each data model in the first data store that is unique to the node within the data model by applying a set of identifier determination rules to the context describing the element corresponding to the node;
  
  comparing the identifier for each node in the first data model with the identifier for each node in the second data model to identify each node in the first data model not having matching identifiers with any node in the second data model in the first data store and to link each pair of nodes in the first and second data models that have matching identifiers;
  
  applying a set of comparison rules to the contexts of each linked pair of nodes in the first and second data models to identify differences between each linked pair of nodes in the first and second data models in the first data store;
  
  generating a copy of the third data model in the first data store, deleting each node in the copy of the third data model having matching identifiers with any node in the first data model not identified as having matching identifiers with any node in the second data model, and modifying each node in the copy of the third data model having matching identifiers with any linked pair of nodes in the first and second data models by applying a set of merge rules based upon the identified differences between the linked pair of nodes; and
  
  serializing the copy of the third data model to generate a fourth version of the document.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17):
- - 2. The method of claim 1, wherein the context of each node of each data model describes each attribute, each descendent element, and each attribute of each descendent element of the element corresponding to the node within the version of the document represented by the data model.
  - 3. The method of claim 2, wherein the set of comparison rules applied to the contexts of each linked pair of nodes in the first and second data models includes a first comparison rule for identifying each child element of the element corresponding to the linked node in the second data model not having matching identifiers with any child element of the element corresponding to the linked node in the first data model, and wherein the set of merge rules includes a first merge rule for inserting a copy of each node corresponding to a child element of the element corresponding to the linked node in the second data model not having matching identifiers with any node corresponding to a child element of the element corresponding to the linked node in the first data model in the copy of the third data model.
  - 4. The method of claim 2, wherein the set of comparison rules applied to the contexts of each linked pair of nodes in the first and second data models includes a first comparison rule for identifying each attribute of the element corresponding to the linked node in the second data model that does not have matching values with any corresponding attribute of the element corresponding to the linked node in the first data model, each attribute of the element corresponding to the linked node in the first data model that is not included in the element corresponding to the linked node in the second data model, and each attribute of the element corresponding to the linked node in the second data model that is not included in the element corresponding to the linked node in the first data model.
  - 5. The method of claim 4, wherein the set of merge rules includes a first merge rule for assigning the value of each attribute of the element corresponding to the linked node in the second data model that does not have matching values with any corresponding attribute of the element corresponding to the linked node in the first data model to the corresponding attribute in the context of the node in the copy of the third data model having matching identifiers with the linked pair of nodes in the first and second data models, a second merge rule for deleting each attribute in the context of the node in the copy of the third data model having matching identifiers with the linked pair of nodes in the first and second data models corresponding to any attribute of the element corresponding to the linked node in the first data model that is not included in the element corresponding to the linked node in the second data model, and a third merge rule for inserting a copy of each attribute of the element corresponding to the linked node in the second data model that is not included in the element corresponding to the linked node in the first data model in the context of the node in the copy of the third data model having matching identifiers with the linked pair of nodes in the first and second data models.
  - 6. The method of claim 3, wherein each version of the document is provided according to a respective set of semantic constraints specifying whether each element of the corresponding plurality of elements is ordered or unordered, and wherein comparing the identifier for each node in the first data model with the identifier for each node in the second data model further comprises comparing each node in the first data model with the identifier of each sibling node of each node in the second data model not having matching identifiers with the node in the first data model to identify whether the node in the first data model has matching identifiers with any node in the second data model where the set of semantic constraints for the first version of the document specifies that the element corresponding to the node in the first data model is unordered.
  - 7. The method of claim 6, wherein the set of comparison rules applied to the contexts of each linked pair of nodes in the first and second data models includes a second comparison rule for identifying any differences between a child element sequence of the element corresponding to the linked node in the first data model and a child element sequence of the element corresponding to the linked node in the second data model, and wherein the set of merge rules includes a second merge rule for reordering the nodes corresponding to child elements of the element corresponding to each node in the copy of the third data model having matching identifiers with any linked pair of nodes in the first and second data models where the set of semantic constraints for the first version of the document specifies that the element corresponding to the linked node in the first data model is ordered.
  - 8. The method of claim 3, wherein the set of comparison rules applied to the contexts of each linked pair of nodes in the first and second data models includes a second comparison rule for comparing the identifier for each node corresponding to a child element of the element corresponding to the linked node in the first data model with the identifier for each node corresponding to a child element of the element corresponding to the linked node in the second data model to identify each node corresponding to a child element of the element corresponding to the linked node in the first data model not having matching identifiers with any node corresponding to a child element of the element corresponding to the linked node in the second data model in the first data store and to link each pair of nodes corresponding to child elements of the elements corresponding to the linked nodes in the first and second data models that have matching identifiers, and a third comparison rule for applying the set of comparison rules to the contexts of each linked pair of nodes corresponding to child elements of the elements corresponding to the linked nodes.
  - 9. The method of claim 8, wherein the set of merge rules includes a second merge rule for deleting each node corresponding to a child element of the element corresponding to each node in the copy of the third data model having matching identifiers with any node in the first data model not identified as having matching identifiers with any node in the second data model, and modifying each node corresponding to a child element of the element corresponding to each node in the copy of the third data model having matching identifiers with any linked pair of nodes in the first and second data models by applying the set of merge rules based upon the identified differences between the linked pair of nodes.
  - 10. The method of claim 1, wherein the set of identifier determination rules are maintained in a first data repository, and wherein the first data repository provides a pluggable framework for the set of identifier determination rules.
  - 11. The method of claim 10, wherein the identifier generated for each node of each data model includes a name and zero or more attribute values of the element corresponding to the node, and wherein the zero or more attribute values included in the identifier are sufficient to make the identifier unique to the node within the data model.
  - 12. The method of claim 1, wherein the set of comparison rules are maintained in a first data repository, and wherein the first data repository provides a pluggable framework for the set of comparison rules.
  - 13. The method of claim 1, wherein the set of merge rules are maintained in a first data repository, and wherein the first data repository provides a pluggable framework for the set of merge rules.
  - 14. The method of claim 1, wherein the structured programming language is selected from SGML, XML, HTML, WML, XHTML, DHTML, other SGML derivatives, and user interface markup languages.
  - 15. The method of claim 14, wherein the structured programming language is XML.
  - 16. The method of claim 1, further comprising converting the first, second, and third versions of the document from the structured programming language to a second structured programming language.
  - 17. The method of claim 1, wherein the fourth version of the document is generated in a second structured programming language.
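
The attribute comparison rule of claim 4 and the three attribute merge rules of claim 5 can be illustrated with plain dictionaries. This is a sketch under stated assumptions: each node's attributes are modeled as a `dict`, and the function names `diff_attributes` and `merge_attributes` are hypothetical, not taken from the patent.

```python
def diff_attributes(original, modified):
    """Comparison rule: classify attribute differences between a linked pair."""
    # Attributes present in both versions whose values differ.
    changed = {k: v for k, v in modified.items() if k in original and original[k] != v}
    # Attributes of the original that the modified version no longer has.
    removed = [k for k in original if k not in modified]
    # Attributes that only the modified version has.
    added = {k: v for k, v in modified.items() if k not in original}
    return changed, removed, added

def merge_attributes(target, changed, removed, added):
    """Merge rules: apply the differences to a copy of the third version's attributes."""
    merged = dict(target)          # work on a copy, leave the input untouched
    merged.update(changed)         # first rule: assign changed values
    for key in removed:
        merged.pop(key, None)      # second rule: delete removed attributes
    merged.update(added)           # third rule: insert new attributes
    return merged

original = {"label": "OK", "width": "80", "tooltip": "Confirm"}
user = {"label": "Accept", "width": "80", "color": "green"}
developer = {"label": "OK", "width": "100", "tooltip": "Confirm"}

merged = merge_attributes(developer, *diff_attributes(original, user))
print(merged)  # {'label': 'Accept', 'width': '100', 'color': 'green'}
```

The developer's `width` change survives because the end user never touched that attribute, while the user's relabeling, removal of `tooltip`, and addition of `color` are carried over.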

18. A non-transitory computer-usable medium having computer readable instructions stored thereon for execution by a processor to perform a method of performing a syntactic and semantic three-way merge of structured software documents, the method comprising:
- receiving a first version of a document coded in a structured programming language containing a first plurality of elements, a second version of the document containing a second plurality of elements, and a third version of the document containing a third plurality of elements, wherein the first version of the document is an original version of the document, the second version of the document is an end-user modified version of the original version, and the third version of the document is a developer modified version of the original version of the document;
  
  deserializing the first, second, and third versions of the document to generate a first data model, a second data model, and a third data model respectively representing the first, second, and third versions in a first data store, each data model comprising a tree data structure that includes a corresponding node for each element of the plurality of elements contained within the version of the document represented by the data model, each node of each data model containing a context describing the element corresponding to the node;
  
  generating an identifier for each node of each data model in the first data store that is unique to the node within the data model by applying a set of identifier determination rules to the context describing the element corresponding to the node;
  
  comparing the identifier for each node in the first data model with the identifier for each node in the second data model to identify each node in the first data model not having matching identifiers with any node in the second data model in the first data store and to link each pair of nodes in the first and second data models that have matching identifiers; and
  
  applying a set of comparison rules to the contexts of each linked pair of nodes in the first and second data models to identify differences between each linked pair of nodes in the first and second data models in the first data store;
  
  generating a copy of the third data model in the first data store, deleting each node in the copy of the third data model having matching identifiers with any node in the first data model not identified as having matching identifiers with any node in the second data model, and modifying each node in the copy of the third data model having matching identifiers with any linked pair of nodes in the first and second data models by applying a set of merge rules based upon the identified differences between the linked pair of nodes; and
  
  serializing the copy of the third data model to generate a fourth version of the document.
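
The full sequence recited above (deserialize, identify, link, compare, modify a copy of the third model, serialize) might look like the following for small XML documents. It is only a sketch: the identifier rule (tag plus a hypothetical `id` attribute), the function names, and the sample documents are assumptions, and ordering constraints are not handled.

```python
import copy
import xml.etree.ElementTree as ET

def ident(elem):
    # Assumed identifier rule: tag name plus the "id" attribute.
    return (elem.tag, elem.get("id"))

def merge_node(orig, user, target):
    """Apply the user-vs-original differences of a linked pair to target."""
    # Attribute rules: carry over changed or added values, drop removed ones.
    for key, value in user.attrib.items():
        if orig.get(key) != value:
            target.set(key, value)
    for key in orig.attrib:
        if key not in user.attrib:
            target.attrib.pop(key, None)

    orig_kids = {ident(c): c for c in orig}
    user_kids = {ident(c): c for c in user}
    for child in list(target):
        cid = ident(child)
        if cid in orig_kids and cid not in user_kids:
            target.remove(child)          # the user deleted this element
        elif cid in orig_kids and cid in user_kids:
            merge_node(orig_kids[cid], user_kids[cid], child)
    # Insert copies of children that only the user's version contains.
    target_ids = {ident(c) for c in target}
    for cid, child in user_kids.items():
        if cid not in orig_kids and cid not in target_ids:
            target.append(copy.deepcopy(child))

def three_way_merge(original_xml, user_xml, developer_xml):
    orig = ET.fromstring(original_xml)                      # first data model
    user = ET.fromstring(user_xml)                          # second data model
    merged = copy.deepcopy(ET.fromstring(developer_xml))    # copy of the third
    if ident(orig) == ident(user) == ident(merged):
        merge_node(orig, user, merged)
    return ET.tostring(merged, encoding="unicode")          # fourth version

original = '<ui id="main"><button id="ok" label="OK"/><button id="help" label="Help"/></ui>'
user = '<ui id="main"><button id="ok" label="Accept"/></ui>'
developer = ('<ui id="main"><button id="ok" label="OK" width="100"/>'
             '<button id="help" label="Help"/><button id="new" label="New"/></ui>')
result = three_way_merge(original, user, developer)
print(result)
```

In the merged output the `ok` button keeps the developer's `width` and takes the user's `label`, the `help` button the user deleted is gone, and the developer's `new` button is retained.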

19. A data processing system comprising:
- at least one processor;
  
  a random access memory for storing data and programs for execution by the at least one processor; and
  
  computer readable instructions stored in the random access memory for execution by the at least one processor to perform a method of performing a syntactic and semantic three-way merge of structured software documents, the method comprising:
  
  receiving a first version of a document coded in a structured programming language containing a first plurality of elements, a second version of the document containing a second plurality of elements, and a third version of the document containing a third plurality of elements, wherein the first version of the document is an original version of the document, the second version of the document is an end-user modified version of the original version, and the third version of the document is a developer modified version of the original version of the document;
  
  deserializing the first, second, and third versions of the document to generate a first data model, a second data model, and a third data model respectively representing the first, second, and third versions in a first data store, each data model comprising a tree data structure that includes a corresponding node for each element of the plurality of elements contained within the version of the document represented by the data model, each node of each data model containing a context describing the element corresponding to the node;
  
  generating an identifier for each node of each data model in the first data store that is unique to the node within the data model by applying a set of identifier determination rules to the context describing the element corresponding to the node;
  
  comparing the identifier for each node in the first data model with the identifier for each node in the second data model to identify each node in the first data model not having matching identifiers with any node in the second data model in the first data store and to link each pair of nodes in the first and second data models that have matching identifiers; and
  
  applying a set of comparison rules to the contexts of each linked pair of nodes in the first and second data models to identify differences between each linked pair of nodes in the first and second data models in the first data store;
  
  generating a copy of the third data model in the first data store, deleting each node in the copy of the third data model having matching identifiers with any node in the first data model not identified as having matching identifiers with any node in the second data model, and modifying each node in the copy of the third data model having matching identifiers with any linked pair of nodes in the first and second data models by applying a set of merge rules based upon the identified differences between the linked pair of nodes; and
  
  serializing the copy of the third data model to generate a fourth version of the document.
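
Claims 10 through 13 describe rule sets held in a repository that provides a pluggable framework. One way such a repository could be structured is sketched below for identifier determination rules. The `RuleRepository` class, the rule signature, and the two sample rules are hypothetical; the claims do not specify an interface.

```python
from typing import Callable, Dict, List, Optional

# An identifier rule inspects an element's context (here: tag and attributes)
# and returns an identifier, or None when the rule does not apply.
IdRule = Callable[[str, Dict[str, str]], Optional[str]]

class RuleRepository:
    """Holds identifier rules; new rules can be plugged in at run time."""

    def __init__(self) -> None:
        self._rules: List[IdRule] = []

    def register(self, rule: IdRule) -> IdRule:
        self._rules.append(rule)
        return rule  # returning the rule lets register() serve as a decorator

    def identify(self, tag: str, attrs: Dict[str, str]) -> str:
        # The first rule that applies wins.
        for rule in self._rules:
            ident = rule(tag, attrs)
            if ident is not None:
                return ident
        return tag  # fallback: the name alone, with zero attribute values

repo = RuleRepository()

@repo.register
def by_id(tag, attrs):
    return f"{tag}#{attrs['id']}" if "id" in attrs else None

@repo.register
def by_name(tag, attrs):
    return f"{tag}@{attrs['name']}" if "name" in attrs else None

print(repo.identify("button", {"id": "ok"}))      # button#ok
print(repo.identify("field", {"name": "email"}))  # field@email
print(repo.identify("panel", {}))                 # panel
```

Comparison and merge rules could be registered in the same way, each with its own callable signature.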

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Yuan, Yu; Guminy, Scott M
Primary Examiner(s): Zhen, Li B
Assistant Examiner(s): Kia, Arshia S

Application Number: US12/238,080
Publication Number: US 20100088676A1
Time in Patent Office: 1,475 Days
Field of Search: 717/170, 717/120
US Class Current: 717/120
CPC Class Codes:
G06F 16/80 of semi-structured data, e....
G06F 40/197 Version control for softwar...
