View this PageEdit this PageAttachments to this PageHistory of this PageHomeRecent ChangesSearch the SwikiHelp Guide
Hotspots: Admin Pages | Turn-in Site |
Current Links: Cases Final Project Summer 2007

The unnamed Group

The unnamed Group consists of:

Our semester's project was a genealogy tracking system. Notably it had to be able to exchange data in the GEDCOM format, search both locally and on the internet, and merge related genealogies.
Spring 2004 Project Milestones

Our code: unnamed.zip Installations instructions are in the included readme.

Overview

In this page we will cover the overall design of our system, interesting points in our GUI, parser, search engine, and merging algorithm, and some assorted tips and hints from our wonderful ;) experience with Squeak. After making a string of B's we made an A on the final milestone so we'll also try to point out our mistakes and what you can do to push your group over the edge from a B to an A.

  1. Overall Design with UML
  2. GUI Design
  3. Parsing Files
  4. Local and Internet Searching
  5. Merging
  6. Other Tips and Hints


Program Design

unnamed-uml.pdf
After the first 2 milestones, which had no real design requirements, we had a pretty good looking interface that could add and edit individuals and put them in families (see first screenshot in Graphics section.) Rather than start from scratch we decided to use the code we already had and add a master interface to it (second screenshot.) This turned out to be a very sound decision. As you can see in the UML diagram, extra features such as the searching and merging were implemented as additional classes and our design remained very clean and object oriented. It is very simple to remove and add modules this way.

Ironically, the biggest problem in our design was in M2 and because of our "bolt-on" design was carried over from the beginning. We linked people and families to a specific Dictionary, which an individual genealogy stored, by a unique ID number. This link between a person/family and a genealogy caused problems later on in searching and merging. We had to pull tricks such as using temporary genealogies and using lots of references to determine a person's genealogy using their id. In retrospect, this binding of a person and family to a genalogy is a clear violation of object independence and should not have been done in the first place.


GUI Design

Rather than use a gui toolkit we decided to use the morphic tools provided in squeak. This proved to be difficult due to the lack of documentation in Morphic however our program turned out quite nice looking. We'll go through each of our major screens and give an overview of our interface and what could be improved.
Uploaded Image: unnamed-generations.gif
This window, originally named Generations, allows one to create and edit families and people. Getting drag and drop working to move people into families was a pain but the interface works well.
Uploaded Image: unnamed-bigbrother.gif
The Big Brother window controls multiple genealogies using a multiple-document interface paradigm. All the possible operations on a genealogy as a whole are listed as button on the left side of the window. Some discovery on the user's part is required as to what buttons require to initiate them. In addition not all the buttons' purposes are clear at first. Some sort of mouseover or information box in the unused space would go a long way towards making it more user friendly, but overall it works well.
Uploaded Image: unnamed-treeview.gif
The treeview and it's button in the genealogy window were added in M4. Although we had some problems with squeak's layout managers the window is clear and displays the requisite information.
Uploaded Image: unnamed-search.gif
The search interface is pretty clear although it does suffer from a few problems. Morphic did not have any built-in radio buttons so we implemented them ourselves. Unfortunately not all users detect this at first so some other aid would be helpful to display what the search target is. There is also a layout problem in the first row of text boxes. We struggled with the squeak layout system because it is not well documented but the window still has a fairly clean look.
Uploaded Image: unnamed-merge.gif
Although merging is a complicated operation we think our interface provides a lot of power. One the right are the two genealogies which will be merged and on the left is the list of merges to be made. The user can manually add and remove mergers, edit the attributes of a person, and ask the program to automatically find candidate merges. Again the buttons are not very clear at first, and there is also some duplication of commands in being able to edit individuals both before and after merge.


Parsing Files


Blanton explains the parser we used for gedcoms files:

The program requirements called for a GEDCOM parser. Personally, I am familiar with Lex and Yacc. Unfortunately, squeak does not supply either of these. I chose to look at Smacc instead of TGen since it claimed to be better. However, I found that Smacc did not provide the functionality that I needed. At least, the poor documentation did not help. There parsers are often supposed to be context independent. However, Gedcom relies heavily upon context. Therefore, I decided that I would rather write my own parser from the ground up.

The foundation of the parser is a function that returns records as a collection of tokens. This function relies upon a token reader (lexical analyzer) to supply it with tokens for the records. The lexical analyzer grabs one character at a time from a stream to build tokens. Once a token is found, it is run through a regular expression matcher to determine whether it is a number, word, pointer, etc. The token is returned to the record builder. The record function returns records any time that it finds a new line.

There are two primary states in the parser. The first parses the header portion of a Gedcom file, while the second parses the body until a trlr line is found. Since the header state mostly discards information, I will only discuss the body parsing. In order to emulate a SAX parser, I designed a system of callbacks. For example, the body state will ask the record retriever for a new record. If the line is an INDI type, the record's data will be forwarded to an "indiCallback" method within the parser. Each callback is designed to manage any data that would be typical for its type of record.

Another major advance in the parser was the inclusion of a stack. Oddly enough, I did not find a Stack class within Squeak. Therefore, I wrote my own subclass of OrderedCollection. This allows the parser keep track of where it has been. For example, if a DATE record is found, the dateCallback looks at the next to last record on the stack. If it is a BIRT tag, then the dateCallback callback knows to set the birth date of the current person. If it is a DEAT tag, then the death date would be set.


Searching

Blanton explains local searching:

I wrote a class called SearchEngine. It has a few static class side methods that accept a query and a genealogy and return a list of results. A query is a dictionary. The keys are symbols that correspond to attributes of a Person. The values represent the desired data. For example, a key would be #givenName with the associated value be 'John'. In addition, there could be an entry #surName associated with 'Doe'.

Consider the above query with two values. Results range from 0 to 100. In order to handle multiple specifications within the query, each portion contributes to a fraction of total possible result. That is to say, a query with two attributes (e.g. #givenName and surName) would be set up such that each part would contribute up to 50 points. Along the same lines, a 3 part query would supply approximately 33 points for each part.

The search engine has a rate:vs: message. Consider the first half of the example query where #givenName->'John'. For each person in the given Genealogy, rate:vs: would be called where the first argument is 'John' and the second argument is the givenName of the current person. rate:vs: supports strings by using String's alike: method. This result is normalized to 100 and returned. Dates are supported within the rate:vs: method with a custom algorithm.

Internet Searching:

Internet searching was actually not very difficult due to the modularity of our design. www.rootsweb.com provides downloadable gedcoms so searching that site was as easy as determining their cgi commands for a search, finding the links to the gedcoms on the results page, and passing it into our already existing gedcom parser.
In addition we found a class on the cases page from The CS Majors - The Genealogy Application and were able to quickly adapt it to generate a genealogy using our classes. This provided searching on genealogy.com.


Merging

The merging of two genealogies consists of two halfs, merging of people and merging of families. The first task was to merge people together. The user has two options, manual or automerging.

Manual merging consist of selecting two indiviuals, one from each respective genealogy. Time only allowed me to write the most crude of methods to merge info about a person together, that is only copying over fields that both orignal people matched on. Originally the idea was the prompt the user if multiple choices exisited and allow the user to pick, or define it themselves. Due to time, the final version allowed the user to open on a person editor on the newly merged person. Auto merge is the second option the user has, it populates the merge field with likely matches of people from the two orignal geneologies. Basically the program interated through each person in one genealogy performing a local search through the other genealogy. Since the local search already has a ranking system indictating the strength of a match, automerge returned a list of merge canidants if their ranking met a predefined threshold.

In order to support unmerging, it was necassary to create a list of merge info. Each merge node consist of three people references, one for each original person and one for the newly created merged person. Whenever two people are selected to merge, the a new person is created, added to the new genealogy and a new merge node is created to retain the merge info. The unmerging of a person was simply to remove that person from the new genealogy and destorying the corrisponing node.

Merging families proved to be a difficult task. Since families only stored ids of people, of which these ids were are unquie to the genealogy they are bound to, a direct copy of families was not possible. This is also another reason for the list of merge objects. With this list it was possible to search for person ids from their respective genealgies and obtain the new ids of the merged persons. From there copy each family over. Merging issues where approached on a sinerio bases. First, if two families are completly disjoint, no merge conflicts. Second, if a person had different spouses from each geneology, no conflict since a different family was suppose to exisist anyway. Last would be same spouse, difference of children. If the spouse is the same and have the same set of children, then only one copy of that family should exisist in the new genealogy. If different sets of children exisist, by default merge the children list into one list, prompted the user for approval. The odd case of same person, same child, different spouse, should be handled by the user since this reflects inaccurate data in the merging genealogies.


Other Tips and Hints

The Squeak environment can be a pain to work in until you get used to it. We've collected several of our Squeak tips here:

Misc tips

Link to this Page