DataComparingAlgorithms

Last edit January 31, 2007
Here is a general algorithm for comparing two similar tables (or copies of tables).

 for r = each record of t1 not in t2  // based on primary key
   display record
 end-for

for r = each record of t2 not in t1 display record end-for

for key = each record common to both

r1 = getRecord(t1, key) r2 = getRecord(t2, key)

for each attribute in r1 not in r2 display key, attribute, and value end-for

for each attribute in r2 not in r1 display key, attribute, and value end-for

for attr = each attribute in r1 if r1[attr] not-equal-to r2[attr] display key, attribute, r1[attr], r2[attr] end-if end-for

end-for


Trees

One technique for comparing trees is to print them as text, and then use text-based difference finders common in most operating systems. Use recursion to create the outline-like format, and perhaps disable white-space sensativity so that different levels won't be over-emphasized. You may want to try it with and without the full node path, as each will emphasize different kinds of changes. Here is a pre-differenced sample text file:

  node 1 foo=7 bar=9
  ..node 1.2 foo=12 bar=7
  ..node 1.3 foo=9
  node 2 bar=6
  ..node 2.1 foo=12
  ....node 2.1.1 foo=12 bar=99
  ..node 2.2 bar=786  

(Dots used to prevent TabMunging, but may not otherwise be needed in actual file.)