Fall2002 M7 - Viewing multiple maps, merging maps and testing

M7: Working with multiple maps

UPDATES:

One of the really tough problems in genealogical work is figuring out where two trees overlap. Is person p the same as person q? Maybe their names are close, the birthdates are off a little, but the death dates are off a lot. Maybe person p is really person r? What's the best fit?

You are to modify your GenealogyMap so that a user can browse through two maps at the same time. You must also provide a way to compare two networks of people. HOW the user does this is up to you. Can you only match maps that are open at the same time? Can you have one map open and then compare it to another map read from a GEDCOM file? Whatever the process is, communicate it in your user interface so that your TA can follow it. At least one GenealogyMap must be open to start, since the user has to start the comparison from your user interface.

What you need to do is:
  1. Provide the ability to look at two maps simultaneously.
  2. Give the user a way to designate Persons from the two maps as the same individual, thus beginning or continuing the process of merging the two open maps.
  3. Provide the ability to automatically compare two maps, looking for Persons from each map who may represent the same individual.
  4. Create a ranking of such matches, for all pairs above some threshold of "closeness."
  5. Provide an interface that allows the user to approve matches before you incorporate the information from two matched Person objects in a merged map.
  6. Finally (and most importantly), generate some kind of interface or visualization that illustrates the merged tree. The user has to be able to move around representations of Persons, see the relationships, and somehow discern which nodes are merged and which nodes come from which map and are NOT merged. You should also support the user unmerging nodes, and merging nodes that your algorithm didn't merge. Note that this new interface or visualization must work and remain usable while the previously open GenealogyMap also continues to work. (Hint!: What was that MVC thing again...?)
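Items 3 and 4 boil down to scoring every cross-map pair and keeping the pairs above a threshold. The Python below is only a sketch: `Person`, `rank_matches`, `toy_score` and its 0.6/0.4 weights are hypothetical stand-ins, and a real scoring function would combine the name, date and family evidence described under "What is a match?".

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    # Hypothetical, stripped-down Person used only for this sketch.
    name: str
    birth_year: Optional[int] = None


Match = Tuple[float, Person, Person]


def rank_matches(map_a: List[Person], map_b: List[Person],
                 score: Callable[[Person, Person], float],
                 threshold: float) -> List[Match]:
    """Score every (a, b) pair across the two maps and return the pairs
    whose score is at least `threshold`, best match first."""
    candidates: List[Match] = []
    for a, b in product(map_a, map_b):
        s = score(a, b)
        if s >= threshold:
            candidates.append((s, a, b))
    # Sort on the score only, so Person objects never need to be compared.
    candidates.sort(key=lambda m: m[0], reverse=True)
    return candidates


def toy_score(a: Person, b: Person) -> float:
    # Placeholder scoring: 0.6 for identical names, 0.4 for identical birth years.
    s = 0.6 if a.name.lower() == b.name.lower() else 0.0
    if a.birth_year is not None and a.birth_year == b.birth_year:
        s += 0.4
    return s


if __name__ == "__main__":
    left = [Person("John Kennedy", 1917), Person("Rose Fitzgerald", 1890)]
    right = [Person("john kennedy", 1917), Person("Joseph Kennedy", 1888)]
    for s, a, b in rank_matches(left, right, toy_score, threshold=0.5):
        print(f"{s:.2f}  {a.name} <-> {b.name}")
```

Comparing every pair is quadratic in the size of the maps. For large files, one option is to compare only people whose surnames share a Soundex code.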

What is a match?

Matching obviously starts with names, but completeness of information and spelling variations make this more complex than string matching. Spellings that are close can be considered a possible match, though with less certainty than exact matches. An initial and a first name that starts with that initial are a likely match. Names with and without middle initials can match one another. Differing first names do not eliminate the possibility of a match. (Extra Credit: Find out what Soundex codes are and use them in your matching process.)
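One way to turn these name rules into a score is sketched below in Python. `soundex` follows the standard American Soundex rules (keep the first letter, map consonants to digits, drop vowels, collapse adjacent equal digits, with h and w not separating them, and pad to four characters). `name_score`, its helpers and every weight in them are hypothetical choices; the sketch assumes names arrive as strings like "John F. Kennedy" or GEDCOM-style "John /Kennedy/", and does not handle suffixes such as "Jr.".

```python
import re
from typing import Dict, List

# American Soundex digit for each coded consonant; vowels, y, h and w get none.
_CODES: Dict[str, str] = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. Robert and Rupert -> R163."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")  # first letter's digit suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it counts
    return code.ljust(4, "0")


def _tokens(name: str) -> List[str]:
    # Periods and GEDCOM surname slashes become spaces: "J.F. /Kennedy/" -> j, f, kennedy
    return re.sub(r"[./]", " ", name).lower().split()


def _first_name_score(a: str, b: str) -> float:
    if a == b:
        return 1.0
    if (len(a) == 1 and b.startswith(a)) or (len(b) == 1 and a.startswith(b)):
        return 0.7  # an initial and a first name starting with it
    if soundex(a) == soundex(b):
        return 0.5  # close spelling
    return 0.1  # differing first names do not rule a match out


def name_score(a: str, b: str) -> float:
    """Compare first and last tokens only, so middle names and initials are
    ignored. All weights are illustrative guesses."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    if ta[-1] == tb[-1]:
        last = 1.0
    elif soundex(ta[-1]) == soundex(tb[-1]):
        last = 0.6
    else:
        last = 0.0
    return 0.5 * last + 0.5 * _first_name_score(ta[0], tb[0])
```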

Birth and death dates can give clues about matches, but different genealogy sources may give varying dates and places for these events. Exact matches of birth and death info can strengthen the probability of a match. Dates that vary by decades or centuries are an easy way to eliminate the possibility of a match.
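A sketch of date evidence in Python, assuming only years are compared. `parse_year` grabs the first 3- or 4-digit number from a GEDCOM-style date string, and `date_evidence` rewards exact years, gives partial credit within a small `tolerance`, and returns negative infinity (eliminate) once the gap reaches `cutoff`. The function names and the 2-year and 20-year defaults are illustrative assumptions, not values from the assignment.

```python
import re
from typing import Optional

ELIMINATE = float("-inf")  # adding this to any total keeps the total at -inf


def parse_year(gedcom_date: Optional[str]) -> Optional[int]:
    """Pull the first 3- or 4-digit number out of a date such as '12 MAR 1850'
    or 'ABT 1850'. Qualifiers (ABT, BEF, BET ... AND ...) are simply ignored."""
    if not gedcom_date:
        return None
    m = re.search(r"\b(\d{3,4})\b", gedcom_date)
    return int(m.group(1)) if m else None


def date_evidence(y1: Optional[int], y2: Optional[int],
                  tolerance: int = 2, cutoff: int = 20) -> float:
    """Exact years help, small differences help a little, missing data is
    neutral, and a gap of `cutoff` years or more rules the match out."""
    if y1 is None or y2 is None:
        return 0.0
    diff = abs(y1 - y2)
    if diff == 0:
        return 1.0
    if diff <= tolerance:
        return 0.5
    if diff >= cutoff:
        return ELIMINATE
    return 0.0


def life_events_evidence(birth_a: Optional[str], birth_b: Optional[str],
                         death_a: Optional[str], death_b: Optional[str]) -> float:
    # Sum birth and death evidence; one impossible date vetoes the pair.
    return (date_evidence(parse_year(birth_a), parse_year(birth_b))
            + date_evidence(parse_year(death_a), parse_year(death_b)))
```

Using negative infinity for elimination means a single impossible date vetoes the pair no matter how well the names match.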

Family information can be very valuable in evaluating matches, but in different ways. Finding parents that can't possibly be the same people is another good way to eliminate a match. Parents that do match are a strong positive clue (but not a guarantee!). Matching spouses are a strong positive indicator, while non-matching spouses typically don't help, since multiple marriages are a possibility. Similarly, matching children are a positive indicator, but a lack of matching children may only be a sign of incomplete information in one or both trees.
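The asymmetry here (some family evidence can only help, some can only hurt) can be sketched in Python as follows. `Relatives`, `family_evidence` and the weights are hypothetical; `same` stands for whatever person comparison you already have, returning True (same person), False (cannot be the same person) or None (unsure).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# same(x, y) -> True (same person), False (cannot be the same), None (unsure).
Verdict = Callable[[str, str], Optional[bool]]


@dataclass
class Relatives:
    """Hypothetical summary of one Person's immediate family, by name."""
    father: Optional[str] = None
    mother: Optional[str] = None
    spouses: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)


def family_evidence(a: Relatives, b: Relatives, same: Verdict) -> float:
    score = 0.0
    for pa, pb in ((a.father, b.father), (a.mother, b.mother)):
        if pa is None or pb is None:
            continue  # missing parent: no evidence either way
        verdict = same(pa, pb)
        if verdict is False:
            return float("-inf")  # impossible parents eliminate the match
        if verdict:
            score += 1.0  # matching parents: strong, but not a guarantee
    # A matching spouse is strong evidence; a non-matching one is ignored,
    # since either person may have married more than once.
    if any(same(x, y) for x in a.spouses for y in b.spouses):
        score += 1.0
    # Each matching child adds a little; missing ones may just mean incomplete data.
    score += 0.5 * sum(1 for x in a.children if any(same(x, y) for y in b.children))
    return score
```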

Merging (and unmerging)

Merging has implications for the Persons being merged and even more so for surrounding family structures. When two Persons are merged, the resulting representation should contain a combination of the information in the original two representations. If there are conflicts, the user should be asked to guide the merger by making choices (or perhaps by designating one Person as overriding the other in all cases).
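A minimal Python sketch of merging two Persons into a new object, assuming a hypothetical `Person` with only name, birth and death fields. A field present on one side only is copied, equal fields are kept, and genuine conflicts go to a `choose` callback, which in your UI would ask the user. `prefer_first` (also hypothetical) shows the "one Person overrides the other in all cases" option.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record; frozen so a merge cannot modify the originals."""
    name: Optional[str] = None
    birth: Optional[str] = None
    death: Optional[str] = None


# choose(field_name, value_from_p, value_from_q) -> value to keep
Chooser = Callable[[str, Any, Any], Any]


def merge_persons(p: Person, q: Person, choose: Chooser) -> Person:
    """Build a NEW Person combining p and q; the user resolves real conflicts."""
    values: Dict[str, Any] = {}
    for f in fields(Person):
        a, b = getattr(p, f.name), getattr(q, f.name)
        if a is None or a == b:
            values[f.name] = b  # only q has it, or both agree
        elif b is None:
            values[f.name] = a  # only p has it
        else:
            values[f.name] = choose(f.name, a, b)  # conflict: ask the user
    return Person(**values)


def prefer_first(field_name: str, a: Any, b: Any) -> Any:
    """Blanket policy: p overrides q in every conflict."""
    return a
```

Because `Person` is frozen and `merge_persons` builds a new instance, the two originals survive untouched, which matters again when it comes to unmerging.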

Doing this merger has significant implications for the family structures. Once a merger between two Persons is indicated, their parents and any other ancestors represented in the two files must necessarily be the same. Merging ancestors may generate a string of requests to the user to guide the choice of information about ancestors. Spouses of two Persons being merged may also be merged, but not necessarily (again, multiple marriages are a possibility). If they are merged, then a family match has been identified. Children in matched families may be merged, if identified as matching; otherwise, the lists of children in the two families should be combined.
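The upward cascade and the combining of children lists might look like the following Python sketch. `Person`, `merge_with_ancestors` and `combine_children` are hypothetical; each Person here holds only a name and parent links, children are reduced to name strings, and the optional spouse merging is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False)
class Person:
    """Hypothetical node: just a name plus links to parents."""
    name: str
    father: Optional["Person"] = None
    mother: Optional["Person"] = None


def merge_with_ancestors(p: Optional[Person], q: Optional[Person],
                         resolve: Callable[[str, str], str]) -> Optional[Person]:
    """Merging p and q means their parents must be the same people too, so the
    merge walks upward; every ancestor conflict becomes another call to
    resolve (in a real UI, another question to the user)."""
    if p is None:
        return q  # nothing to merge; reuse the other side unchanged
    if q is None:
        return p
    name = p.name if p.name == q.name else resolve(p.name, q.name)
    return Person(name,
                  merge_with_ancestors(p.father, q.father, resolve),
                  merge_with_ancestors(p.mother, q.mother, resolve))


def combine_children(kids_a: List[str], kids_b: List[str],
                     same: Callable[[str, str], bool]) -> List[str]:
    """Children of a matched family: keep all of a's, then add each of b's
    that matches none of them (matched children would be merged separately)."""
    combined = list(kids_a)
    for kb in kids_b:
        if not any(same(ka, kb) for ka in kids_a):
            combined.append(kb)
    return combined
```

When only one side knows a parent, the sketch reuses that parent object instead of copying it. That is safe only as long as nothing ever mutates the original Persons.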

Given all of the above considerations, unmerging is hard! It is best implemented as an undo feature, meaning you need to have a way to put things back the way they were before a match. (Hint: this means mergers should be accomplished by creating new objects, not modifying the ones in the original map.)
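Following the hint, one way to make unmerge an undo is to remember, for every merged node, the two originals it replaced. The `MergedMap` class below is a hypothetical Python sketch: merging swaps two original Persons for a new one and records them, and `unmerge`/`undo` swap them back. It only handles merged nodes that have not themselves been merged again; nested merges would need their sources unwound in order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(eq=False)  # identity equality: two "John Kennedy" records stay distinct
class Person:
    name: str


class MergedMap:
    """Hypothetical model behind a merged view. Original Persons are never
    edited: a merge swaps two originals for a new object, undo swaps them back."""

    def __init__(self, people_a: Iterable[Person], people_b: Iterable[Person]):
        self.nodes: List[Person] = [*people_a, *people_b]
        self.sources: Dict[Person, Tuple[Person, Person]] = {}
        self.history: List[Person] = []  # merged nodes, most recent last

    def merge(self, p: Person, q: Person, merged: Person) -> Person:
        self.nodes.remove(p)
        self.nodes.remove(q)
        self.nodes.append(merged)
        self.sources[merged] = (p, q)
        self.history.append(merged)
        return merged

    def unmerge(self, merged: Person) -> None:
        """Undo one specific merge, not necessarily the latest one."""
        p, q = self.sources.pop(merged)
        self.nodes.remove(merged)
        self.nodes.extend([p, q])
        self.history.remove(merged)

    def undo(self) -> None:
        if self.history:
            self.unmerge(self.history[-1])
```

Since the originals are never modified, the original GenealogyMap can keep displaying them while a separate merged view reads from a model like this one.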

Here is an article to read if you are interested in thinking about an advanced merging algorithm: Remerge.pdf

Scenarios and testing

There are a lot of possibilities covered in the matching and merging discussions above. You should carefully generate scenarios and test cases that cover the various situations. Start with simple ones and then add additional complexity. Of course, add new CRC cards and UML structures as you find necessary.

Here are two example GEDCOM files that you may want to use to construct your test cases:
Example.ged
Kennedy.ged
The simplest way to create a test case is to copy some subset of a GEDCOM file (perhaps a very small subset) into a new GEDCOM file and use it as the second file to be matched or merged with the first. You can create a sequence of progressively more complex tests by including more person and family records in the subset. Of course, you can delete or edit information to demonstrate inexact matching.
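Cutting a subset out of Example.ged or Kennedy.ged by hand is error-prone, so a small script can help. The Python sketch below assumes GEDCOM's line layout (a level number, an optional `@xref@`, a tag, then a value) with every record starting at level 0. `gedcom_subset` is a hypothetical helper: it keeps the header, the trailer and the records you list, and drops pointer lines that refer to records you left out. Level-2 lines nested under a dropped pointer line are not removed, so check the result before using it.

```python
from typing import Set


def _is_pointer(token: str) -> bool:
    return len(token) > 2 and token.startswith("@") and token.endswith("@")


def gedcom_subset(text: str, keep_ids: Set[str]) -> str:
    """Copy HEAD, TRLR and the level-0 records whose xref (e.g. '@I1@') is in
    keep_ids. Inside kept records other than HEAD, pointer lines such as
    '1 FAMC @F1@' that refer to records left out are dropped."""
    out = []
    keeping = in_head = False
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "0":  # a new top-level record starts here
            tag = parts[1] if len(parts) > 1 else ""
            in_head = tag == "HEAD"
            keeping = tag in ("HEAD", "TRLR") or tag in keep_ids
        elif (keeping and not in_head and len(parts) == 3
              and _is_pointer(parts[2]) and parts[2] not in keep_ids):
            continue  # would be a dangling link in the subset
        if keeping:
            out.append(line)
    return "\n".join(out) + "\n"
```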

NOTES:

Turn in your code on-line, and your design in-class.

Grading:



The percentages above add up to 110. Getting full credit for merging and matching depends on the complexity of the test cases you can handle.

Note: If the TAs can't figure out how to do these things, they don't have to give you the points. The UI must be usable.


Questions on Fall2002 M7 Milestone

