Cap'n Crunch and the Cereal Killers

Group: Cap'n Crunch and the Cereal Killers

Members:
Matt Balaun, gtg503e
Ray Cole, gtg999e
Jon Vonica, gte248z
Charles Whittington, gte997m

Project: Genealogical Database

Milestone Turnin Files (the good stuff):

General Tips and Tricks (don't reinvent reinventing the wheel):

Per-Milestone Notes (the long-winded explanations):
Milestone 1 - Create a set of genealogy objects and write a GEDCOM file
This was the sole individual milestone for the semester, and it was also our introduction to the Squeak environment. The initial challenge was getting used to operating within Squeak. After downloading Squeak and loading the image, the first thing one has to learn to do is open a new (morphic) project, then open a browser, then start coding in the browser. There are all sorts of fun games the browser will play with you if you're not careful. For starters, don't ever, ever, ever click cancel. Most people would assume that "cancel" means "go back to whatever the state was before trying to accept the changes". Squeak, on the other hand, interprets "cancel" as "obliterate whatever changes were in the middle of being accepted, and return to the initial state". Not good when you're in the middle of accepting code changes and you decide to go back to editing instead. Word to the wise: just click outside the dialog box, and if that doesn't make it go away, hit escape, or hold down alt and click on the dialog box, then click on the X that should appear at the top left. Get used to doing this.

In addition to having one or more browsers open for code editing, it is probably a good idea to have a workspace and a transcript window open too. A file list window will be useful as well, for reading in .st files from your groupmates. What I'd recommend doing is opening a fresh image, opening a new morphic project, opening a browser window, a workspace, a transcript, and a file list, and then saving that image as something new. This will save you a few minutes every time you start Squeak in the future.

Now that the getting-to-know-Squeak phase is over, it's time to write some code. For us, the big challenge was analyzing the problem and coming up with a set of data structure objects that would adequately model it (we came up with Genealogy, Family, and Person objects). Reusability was also a goal (we used those same objects in every subsequent milestone). At the time, an 'export' method in Genealogy seemed the logical mechanism for writing out a GEDCOM file based on the internal data representation. Future milestones continued to make use of this, even though it was somewhat hackish and probably would have been better implemented as a series of method calls from GEDCOMFile, GEDCOMRecord, GEDCOMField, and Genealogy.

The functionality detailed in the milestone requirements was all implemented with minimal difficulty. There was an added challenge of SUnit testing all the methods, but that is a largely trivial matter, far simpler than using JUnit for Java testing, and I leave the explanation of that to be written elsewhere.

There was one neat trick for getting the dates formatted properly in the export-as-GEDCOM-file functionality, which was: "(((i born) printFormat: #(1 2 3 $ 2 1 2)) asUppercase)". This nifty little snippet first appeared on the newsgroup courtesy of a TA, and was modified marginally to fit the bill perfectly. In this particular case, i is an instance of Person, receiving the born message, which returns a Date object of i's birthdate. The printFormat magic turns it into a String in the GEDCOM-approved format for a date.
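For readers who don't have a Squeak image handy, here is a rough Python sketch of the same formatting idea. It is not our code, just an illustration. The function name gedcom_date is made up, and the two-digit day is an assumption based on my reading of the trailing 2 in the Squeak format array.

```python
from datetime import date

# English month abbreviations, spelled out so the result does not depend on locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def gedcom_date(d: date) -> str:
    """Render a date as two-digit day, upper-case month abbreviation, full year."""
    # Assumption: the day is zero-padded to two digits.
    return f"{d.day:02d} {MONTHS[d.month - 1]} {d.year}"


print(gedcom_date(date(1921, 3, 5)))  # 05 MAR 1921
```

The month table is written out by hand rather than taken from strftime, since strftime output changes with the machine's locale.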

Milestone 2 - Add some error checking, and UI for editing persons and families
This was the first of the group-based milestones; every remaining milestone for the semester would also be done as a group. Two of our group members had worked together before. All four members of our group had busy schedules that were difficult to coordinate, with several of us living very far off campus. Nonetheless, we managed to get together often enough to work side-by-side in the lab for long coding sessions on the weekends, and during the week we coordinated via email, sending .st fileouts back and forth to each other, with one of us (Ray) taking the responsibility of maintaining the "master" copy of everyone's code submissions. We started out with a rule that each person SUnit'ed whatever code they wrote, which we stuck to fairly well up until the last milestone. We chose one person's M1 code to reuse for M2 and further milestones, and (at first) we had one person handling all of the documentation. Organization, I think, was intended to be one of the challenges to overcome for this milestone. Our group had no problem with it.

As for the actual coding requirements, we decided to go with a non-draconian error checking system. In other words, our data structure should be able to hold and handle whatever data the user decided to give it. Instead of disallowing data that we, the programmers, decided was invalid, we simply allow the user to call our "verify" method instead, to check to see if the data the user has entered meets our definition of valid or not. The verify method was implemented in both Person and Family, and returns a String of one or more error messages regarding any invalid data entered for either object. The details of these checks can be found in the milestone requirements.
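The "store anything, complain only when asked" idea can be sketched in a few lines of Python. This is an illustration, not our Squeak code, and the two checks shown are hypothetical stand-ins; the real checks are the ones listed in the milestone requirements.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Person:
    name: str = ""
    born: Optional[date] = None
    died: Optional[date] = None

    def verify(self) -> str:
        """Return every problem found, one per line; an empty string means no complaints."""
        problems = []
        # Hypothetical checks, chosen only to show the shape of the method.
        if not self.name.strip():
            problems.append("Name is empty.")
        if self.born and self.died and self.died < self.born:
            problems.append("Death date is before birth date.")
        return "\n".join(problems)


# Invalid data is accepted without complaint; problems surface only on request.
p = Person(name="", born=date(1950, 1, 1), died=date(1900, 1, 1))
print(p.verify())
```

Note that the constructor never rejects anything. The user decides when (and whether) to ask for a verdict.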

For the user interface portion, we decided to go the morphic route, rather than trying to reuse code from Joe the Box or Bob's UI. Morphic was just easier to figure out, better documented, more generalized and more extensible.

Most of the points we lost on this particular milestone came from insufficient SUnit tests (many of our tests were fairly anorexic, and I think a few just returned true no matter what) and improper scenarios. Our design itself was pretty good, at least in the eyes of the graders. Our scenarios were too attached to our implementation, however. They should have been more focused on the user's standpoint rather than the developer's standpoint. Keep that in mind as you write scenarios for your own projects.

Milestone 3 - Design the whole thing
No code for this milestone! Instead, a couple weekends spent sitting around a whiteboard discussing design options for every subsequent milestone. This was actually an extremely challenging milestone, because it required us to think long and hard about what we were facing with milestones 4 through 6, what classes we would need, what functionality would go where, what code could be reused, what needed to be thrown out, what subclassing/superclassing could be implemented, etc... Fortunately, all of our group members had experience using UML and had good grasps of design concepts, so after a couple days of intense design, we came up with something that we only partially threw away in future milestones ;).

For milestone 4, we knew that we would need some functionality for parsing and representing the information contained in a GEDCOM file. We already had our internal representation classes (Genealogy, Family, Person), so we went with GEDCOMFile, GEDCOMRecord, and GEDCOMField to handle the reading and deciphering of the elements of the file. These three classes interacted with Genealogy, Family, and Person to go from a GEDCOM file on a disk to an internal Genealogy, containing zero or more Families and zero or more Persons. At the time, we were planning on also making the reverse operation, the export-from-internal-representation-to-GEDCOM-file-on-disk functionality, part of the GEDCOMFile/Record/Field classes, but we decided that what wasn't broken shouldn't be fixed, and stayed with our export method in Genealogy.

In addition to GEDCOM File input, we would also have to handle a multi-generational view of an individual. This we would accomplish using a special window containing a set layout of morphs, each of which would represent a Person. The layout would center on the individual and would show two generations back and one generation down. The two generations back would be simple, statically located. The one generation down would be dynamic based on number of children (and on number of marriages). Individual person data would be displayed by a PersonDetailWindow. In order to keep everything nice and simple and uniform, we decided that all of our windows would inherit from a BaseWindow class. Several other less-major classes were designed for handling smaller tasks, all of which are detailed in our design document for M3.
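The fixed-versus-dynamic layout described above can be sketched roughly as follows. This is Python for illustration only, with no morphs involved. The Person fields and the view_slots function are hypothetical, and the sketch ignores marriages, which our real layout also had to account for.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Person:
    name: str
    father: Optional["Person"] = None
    mother: Optional["Person"] = None
    children: List["Person"] = field(default_factory=list)


def name_of(p):
    return p.name if p is not None else None


def parents_of(p):
    # Always two slots (father, mother), even when a parent is unknown.
    return [p.father, p.mother] if p is not None else [None, None]


def view_slots(center):
    """Collect who goes where: fixed ancestor slots, variable child slots."""
    parents = parents_of(center)
    grandparents = [g for p in parents for g in parents_of(p)]
    return {
        "grandparents": [name_of(g) for g in grandparents],  # always 4 slots
        "parents": [name_of(p) for p in parents],            # always 2 slots
        "center": center.name,
        "children": [c.name for c in center.children],       # grows with the data
    }


abe = Person("Abe")
bob = Person("Bob", father=abe)
cal = Person("Cal", father=bob)
cal.children.append(Person("Dee", father=cal))
print(view_slots(cal))
```

Because the ancestor slots never change in number, they can sit at fixed positions in the window; only the children row needs to be laid out at run time.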

For milestone 5, we would have to implement search functionality, both local and web-based. For this, we added a new SearchWindow class as well as a SearchResultWindow class. We also had to implement a PersonMergeWindow, since the results from a web search would have to be incorporated into the Genealogy if the user so desired it. A few new matching methods were necessary for comparing one Person to another. Additionally, we were interested in implementing the Soundex matching on names for extra credit, but we weren't at this point certain exactly where that would fit into the design. Several other less-major classes were designed for handling smaller tasks, all of which are detailed in our design document for M3.
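For anyone curious what Soundex matching involves, here is a minimal Python sketch of the standard American Soundex code. It is not our implementation (at this point we had not even decided where it would live), and the function name soundex is just a label for the example.

```python
# Letter groups and their Soundex digits; vowels, H, W and Y have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name, or '' if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so the same code may repeat after it
    return (first + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Two names are treated as a possible match when their codes are equal, which is how "Robert" and "Rupert" end up together.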

For milestone 6, we would have to implement extended (family-based) matching and entire-genealogy merging, with the ability to undo any merge. The undo-ability was easy: we just made MergedPerson, MergedFamily, and MergedGenealogy classes that subclassed Person, Family, and Genealogy, each containing pointers to the two "parent" objects that were merged. The family-based extended matching was also fairly simple, just a matter of adding matching functionality in Family and also in Genealogy. The hard part would be the visualization of all of this, which required the design of several windows: FamilyMergeWindow, GenealogyMergeWindow, and so forth. We were a bit iffy about the design for this milestone, but this was just an initial design, so we went with it for the time.
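The undo trick is easier to see in code than in prose. Below is a small Python sketch of the idea, not our Squeak classes: the field names, the "prefer the first source" rule, and the undo method are all assumptions made for the example.

```python
class Person:
    def __init__(self, name, born=None):
        self.name = name
        self.born = born


class MergedPerson(Person):
    """A Person built from two others; it keeps both so the merge can be undone."""

    def __init__(self, first, second):
        # Hypothetical merge rule: prefer the first source, fall back to the second.
        super().__init__(first.name or second.name, first.born or second.born)
        self.sources = (first, second)

    def undo(self):
        # The originals were never modified, so undoing is just handing them back.
        return self.sources


a = Person("Ann Lee")
b = Person("", born="5 MAR 1921")
m = MergedPerson(a, b)
print(m.name, m.born)      # Ann Lee 5 MAR 1921
print(m.undo() == (a, b))  # True
```

Since a MergedPerson is itself a Person, it can be merged again, and each undo peels back one level.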

Sadly, all of this design had to be done in ECoDE. In some future version, ECoDE might be quite nice. For us, it mainly just got in the way. We ended up keeping a UML diagram done in Visio for our actual use, as well as text-file notes of classes and responsibilities. ECoDE was handy for auto-generating the CRC cards at least. The group-collaboration functionality in ECoDE was broken, so we ended up just emailing .st files back and forth. Messy, fills up your quota in a hurry, but it works. I recommend doing this and just staying on top of it so the mess doesn't become a problem. As it stands, ECoDE should be considered a design challenge, not a design tool.

Milestone 4 - Read GEDCOM files and provide multi-generational views

Back to coding after a lovely spring break. Several days were spent with one person working on reading GEDCOM files and interpreting the data as Genealogies, Families, and Persons. Rather than going with a recursive descent parser or a full GEDCOM grammar, we went with a line-by-line analyzer that converted the file into records, which contained fields, which contained one or more lines of the original file, broken up into 'level', 'type', and 'value' for each line. These records and fields were then iterated through to generate Persons and Families, with any fields of unknown type being thrown into an unused fields collection inside each Person or Family. Thus, no data from the original GEDCOM file was lost, even if we weren't displaying it in our view. The export method in Genealogy required a small modification to write out these unused fields to a new GEDCOM file being created.
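The record/field/line breakdown described above can be sketched in Python like so. This is an illustration of the approach rather than our GEDCOMFile code. The separate xref slot, the KNOWN set of field types, and the function names are assumptions added for the example, and the sketch expects the input to start with a level 0 line.

```python
from typing import List, NamedTuple


class Line(NamedTuple):
    level: int
    type: str
    value: str
    xref: str = ""  # assumption: ids such as @I1@ are kept apart from the value


def parse_line(text: str) -> Line:
    """Break one line of the file into level, type and value."""
    level_text, _, rest = text.strip().partition(" ")
    xref = ""
    if rest.startswith("@"):
        # Record header lines put the id before the type, e.g. "0 @I1@ INDI".
        xref, _, rest = rest.partition(" ")
    type_, _, value = rest.partition(" ")
    return Line(int(level_text), type_, value, xref)


def group_records(text: str) -> List[List[List[Line]]]:
    """Return records; each record is a list of fields; each field is a list of lines."""
    records = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        line = parse_line(raw)
        if line.level == 0:
            records.append([[line]])      # new record; its header is the first field
        elif line.level == 1:
            records[-1].append([line])    # new field in the current record
        else:
            records[-1][-1].append(line)  # detail line belonging to the current field
    return records


KNOWN = {"NAME", "SEX", "BIRT", "DEAT", "FAMS", "FAMC"}  # hypothetical list


def split_fields(record):
    """Separate fields we understand from those we only carry along for export."""
    fields = record[1:]  # skip the header
    used = [f for f in fields if f[0].type in KNOWN]
    unused = [f for f in fields if f[0].type not in KNOWN]
    return used, unused


sample = """0 @I1@ INDI
1 NAME Ann /Lee/
1 BIRT
2 DATE 5 MAR 1921
1 OCCU Baker
0 TRLR"""
used, unused = split_fields(group_records(sample)[0])
print([f[0].type for f in used], [f[0].type for f in unused])
```

Keeping the unused fields as whole groups of lines is what makes it possible to write them back out untouched later.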

The various windows that we designed had to be created, and we pretty much stuck with our design from M3. Our multi-generational view looks great, auto-recentering on any person displayed if you click on their icon. Our BaseWindow class is extended by all our other windows. We made a PersonDetailsMorph that can be inserted into any window for displaying (or editing!) the information about an individual person. It all worked out extremely well, with minimal deviation from the initial design.

Milestone 5 - Add search functions and automatically gather genealogical information

We had hoped to reuse some code from previous semesters for our web search functionality, but the previous semesters had used Squeak 3.2, whereas our code was being implemented in 3.6. The Socket class was completely reimplemented in between those two releases, and the HTTPSocket class (which much of the code on the cases pages used) was no longer in existence in 3.6. So we pretty much started from scratch. One of our group members tackled the web search functionality. SearchWindow had to be able to differentiate between a local search and a web search. Then all of the information gained from a web search had to be incorporated into the local database, which was done with simple Person merging. PersonMergeWindow, SearchResult, and MergedPerson handled much of this needed functionality.

Milestone 6 - Matching and Merging

Initially, we had been thinking of making our genealogy merging more-or-less automatic, as in: provide the merge algorithm with a starting Person node, and it would automatically build the new genealogy by traversing the two previous graphs simultaneously. The problem with this is that the nodes are not well-defined. Persons with the same name could in theory be different persons. The only way for automatic merging to work would be to use a best-guess heuristic, which just sounded too clumsy and inelegant. Therefore, we went with the time-honored approach of "let the user handle it". Present the two genealogies to be merged, let the user drag and drop families onto each other to merge them, then add any un-merged families straight in. Allow canceling and undoing at any stage, and do not enforce anything. Let the user decide what needs to be merged and what doesn't. That way, if anything comes out wrong, it was the user's fault!
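Stripped of the drag-and-drop, the "let the user handle it" merge boils down to something like the Python sketch below. It is only an illustration: the function name, the dictionaries of families, and the tuples in the result are all made up for the example, and the user's choices arrive as a plain list of pairs rather than mouse gestures.

```python
def merge_genealogies(left, right, pairs):
    """Combine two genealogies using only the pairings the user chose.

    left, right: dicts mapping a family id to a family.
    pairs: (left_id, right_id) tuples picked by the user.
    """
    result = []
    used_left, used_right = set(), set()
    for left_id, right_id in pairs:
        # Nothing is checked here: if the user says they match, they match.
        result.append(("merged", left[left_id], right[right_id]))
        used_left.add(left_id)
        used_right.add(right_id)
    # Whatever the user did not pair up goes straight in, unchanged.
    for source, used in ((left, used_left), (right, used_right)):
        for family_id, family in source.items():
            if family_id not in used:
                result.append(("unmerged", family))
    return result


ours = {"F1": "Lee family", "F2": "Cole family"}
theirs = {"F9": "Lee family (web)", "F8": "Ward family"}
for entry in merge_genealogies(ours, theirs, [("F1", "F9")]):
    print(entry)
```

There is deliberately no matching heuristic anywhere in the function; the pairs list is the only source of truth.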

Most of the classes we used were straight from our M3 design. FamilyMergeWindow had to be implemented, and all of the MergedFoo classes had to be populated with methods. However, we did end up throwing together some last-minute hacks to get things working (like GenealogyMergeFamilyWindow and other such classes). We also ended up using some of the classes from ECoDE (namely, Connector) to represent our merged families and persons and the unmerged entities they were created from. All in all, our implementation came out amazingly well. Everything works, many minor errors and oversights from previous milestones were corrected, and everything was pretty rigorously tested (to paraphrase a group member, "I couldn't break it, which is really saying something, considering this is Squeak."). It's not the most flashy or idiot-proof implementation out there, but it works extremely well, and lets the user do pretty much whatever the user wants, with any data, valid or not, warning the user whenever things seem fishy but allowing him to continue regardless if he wants.
