
Spring2004 M5 - Add search functions and automatically gather genealogical information

This milestone involves implementing two kinds of searches: finding Persons within a Genealogy who match search criteria and finding information available on the Web to augment information stored about a Person in one of your Genealogies.


Local searching
  1. Extend the graphical user interface you built for M4 to enable a user to search for individuals who meet specified search criteria. Search criteria may include any of the kinds of information that can be entered interactively about individuals via your M2 and M4 interfaces, excluding notes. The simplest way to do this is probably to use something that looks like your data entry form as a search specification form. For now, a "match" is defined as an individual whose stored information exactly matches the search criteria.
  2. Searches of genealogy data often return multiple results, since names are far from unique. You must find a way to present multiple matches that is useful to a user of your system.
  3. Exact matching as specified in requirement 1 is actually not very useful for genealogy data. Incomplete information and spelling variations make truly useful matching more complex than simple string comparison. Spellings that are close can be considered a possible match, though with less certainty than an exact match. An initial and a first name that starts with that initial are a likely match. Names with and without middle initials can match one another. Differing first names do not rule out a match, since immigrants to the US often became known by English versions of their names (Johan -> John, Isabella -> Elizabeth). Birth and death dates can give clues about matches, but different genealogy sources may give varying dates and places for these events. Exact matches of birth and death information strengthen the likelihood of a match. Dates that differ by decades or centuries are an easy way to rule out a match, but small variations are common among records that should be considered matching. So, you must implement an inexact matching function that ranks possible matches by how "close" they are to the information specified in the search template. The design of this matching algorithm is up to you. Be creative!
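One possible shape for the ranking in requirement 3 is a weighted score per field. The sketch below is written in Python purely for illustration (your project code will be Squeak Smalltalk). Everything in it is a hypothetical assumption, not a required design: the minimal `Person` record, the field weights, the 5-year tolerance, and the 50-year cutoff that rules a candidate out. In this sketch, blank fields score a neutral 0.5, an initial matching a full name scores 0.8, and other spellings earn partial credit from `difflib.SequenceMatcher`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


@dataclass
class Person:
    # Hypothetical minimal record; your Person class will hold more fields.
    first: str = ""
    middle: str = ""
    last: str = ""
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def name_score(query: str, candidate: str) -> float:
    """Return a closeness score in [0, 1] for two name fields."""
    q = query.strip().lower().rstrip(".")
    c = candidate.strip().lower().rstrip(".")
    if not q or not c:
        return 0.5  # missing data neither confirms nor rules out a match
    if q == c:
        return 1.0
    # An initial is a likely match for any name starting with that letter.
    if (len(q) == 1 and c.startswith(q)) or (len(c) == 1 and q.startswith(c)):
        return 0.8
    # Other spellings get partial credit, scaled to stay below an initial match.
    return 0.7 * SequenceMatcher(None, q, c).ratio()


def year_score(query: Optional[int], candidate: Optional[int],
               tolerance: int = 5, cutoff: int = 50) -> Optional[float]:
    """Score two years; None means the gap rules the candidate out."""
    if query is None or candidate is None:
        return 0.5
    gap = abs(query - candidate)
    if gap >= cutoff:
        return None  # decades apart: almost certainly a different person
    if gap <= tolerance:
        return 1.0 - gap / (2 * tolerance)  # 1.0 when exact, 0.5 at the tolerance
    return 0.1


# Hypothetical weights: surnames count most, middle names least.
WEIGHTS = {"last": 3, "first": 2, "middle": 1, "birth": 1, "death": 1}


def match_score(query: Person, candidate: Person) -> float:
    """Weighted average of field scores, or 0.0 if any field rules it out."""
    scores: Dict[str, Optional[float]] = {
        "last": name_score(query.last, candidate.last),
        "first": name_score(query.first, candidate.first),
        "middle": name_score(query.middle, candidate.middle),
        "birth": year_score(query.birth_year, candidate.birth_year),
        "death": year_score(query.death_year, candidate.death_year),
    }
    if any(s is None for s in scores.values()):
        return 0.0
    total = sum(WEIGHTS[k] * s for k, s in scores.items())
    return total / sum(WEIGHTS.values())


def rank(query: Person, people: List[Person]) -> List[Tuple[float, Person]]:
    """Return (score, person) pairs, best first, dropping ruled-out candidates."""
    scored = [(match_score(query, p), p) for p in people]
    kept = [pair for pair in scored if pair[0] > 0]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

For example, `rank(Person("J", "", "Smith", 1850), people)` would list a John A. Smith born 1851 ahead of a Mary Smith born 1850, and it would drop a John Smith born 1950 entirely. Exact matching from requirement 1 can remain a separate, simpler comparison. Tuning the weights and thresholds against your own test data is where the design work lies.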
Extra credit opportunity: Find out what Soundex codes are and use them in your matching process.
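Soundex encodes a surname as its first letter followed by three digits, so names that sound alike share a code (Robert and Rupert both become R163). The following is a minimal Python sketch of the commonly described American Soundex rules. Letters with the same digit that are adjacent are coded once. The letters h and w are dropped and do not separate repeated digits, while vowels are dropped but do separate them. The result is padded with zeros to four characters. This sketch treats y like a vowel. Published variants differ on that point and on other edge cases, so check your version against a reference before relying on it.

```python
# Letter-to-digit table for American Soundex; letters absent here
# (a, e, i, o, u, y, h, w) are not coded.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a surname, e.g. 'R163'."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    # The first letter's code also suppresses an immediate repeat (Pfister -> P236).
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":
            prev = code  # vowels reset prev; h and w leave it unchanged
    return (result + "000")[:4]
```

One way to use it would be to raise a candidate's surname score when `soundex(query.last) == soundex(candidate.last)`, even though the spellings differ.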

Web searching
  1. There is a great deal of genealogy information available on the Web, much of it in freely accessible databases. In this part of your project, you must provide a user with the capability to search for genealogy information on the web. Search criteria should be specified in the same way as for local searches. It also should be possible to use the information stored in a particular Person object as search criteria.
  2. Your web search feature must be able to get data from at least two different web sites.
  3. Candidate matches must be ranked as they are for local searches and presented to the user in an appropriate way.
  4. The user must be provided with a way to indicate that data from the results of a web search should be added to the information about an existing Person in a Genealogy or that it should be used to create a new Person in a Genealogy.
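For requirement 4, one approach is to compare a web result with the existing Person field by field before changing anything. Empty fields can be filled in directly. Fields that already hold a different value can be shown to the user as conflicts to accept or reject. This Python sketch assumes a hypothetical four-field `Person`, and the function names `plan_merge` and `apply_merge` are illustrative only. Creating a new Person from a result would simply use the parsed record as-is.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, Optional, Set, Tuple


@dataclass
class Person:
    # Hypothetical fields, for illustration only.
    first: str = ""
    last: str = ""
    birth_year: Optional[int] = None
    birth_place: str = ""


def _is_empty(value: object) -> bool:
    return value is None or value == ""


def plan_merge(existing: Person, found: Person
               ) -> Tuple[Dict[str, object], Dict[str, Tuple[object, object]]]:
    """Split a web result into fields to fill in and fields that conflict."""
    fill: Dict[str, object] = {}
    conflicts: Dict[str, Tuple[object, object]] = {}
    for f in fields(Person):
        old, new = getattr(existing, f.name), getattr(found, f.name)
        if _is_empty(new) or old == new:
            continue  # the web result adds nothing for this field
        if _is_empty(old):
            fill[f.name] = new
        else:
            conflicts[f.name] = (old, new)  # let the user decide
    return fill, conflicts


def apply_merge(existing: Person, fill: Dict[str, object],
                conflicts: Dict[str, Tuple[object, object]],
                accepted: Set[str]) -> Person:
    """Return an updated copy; only conflicts the user accepted overwrite data."""
    updates = dict(fill)
    for name in accepted:
        if name in conflicts:
            updates[name] = conflicts[name][1]
    return replace(existing, **updates)
```

Returning a copy rather than mutating the stored Person makes it easy to show the user a preview and to discard it if they cancel.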

To implement these web search features, you will need to parse and interact with web pages. Parsing will be discussed in class well in advance of this assignment. In Fall 2002, students had to implement similar functionality, and their Cases (including code) are available for reuse.
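A common way to pull results out of a page is to collect the text of each table row and then map cells to fields. The Python sketch below uses the standard library's `html.parser.HTMLParser`. The results-table layout (a header row of Name, Born, Died) is a hypothetical assumption, because each real site will need its own mapping. Checking the header row gives you an early, explicit signal that a site has changed its layout.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; etc. into text
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _end_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace left over from the page markup.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self) -> None:
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


EXPECTED_HEADER = ["Name", "Born", "Died"]  # hypothetical results-page layout


def parse_results(html: str) -> List[Dict[str, str]]:
    """Map each data row to a dict; raise ValueError if the layout is unexpected."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    rows = [r for r in parser.rows if r]
    if not rows or rows[0] != EXPECTED_HEADER:
        raise ValueError("results page layout not recognized")
    return [dict(zip(EXPECTED_HEADER, r)) for r in rows[1:]]
```

This simple collector does not handle nested tables. A real site may need you to locate the results table first, for example by its position or by an attribute you inspect in `handle_starttag`.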

If you can't find any matches, you can't parse a web page as expected (i.e., the page changed since you wrote your parser), or you can't establish a network connection, YOU MUST PROVIDE A REASONABLE ERROR MESSAGE! A Squeak error is not considered reasonable!
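One way to meet this requirement is to send every site query through a single function that turns each failure into a message for the user, instead of letting an exception reach the debugger. In this Python sketch, the `fetch` and `parse` functions are passed in. This lets each site plug in its own parser and lets the function be tested without a network. The site name, the 10-second timeout, and the message wording are all assumptions. A parser signals an unrecognized page by raising `ValueError`.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Tuple


def default_fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page as text (assumed 10 s timeout)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset, errors="replace")


def search_site(site_name: str, url: str, parse: Callable[[str], List],
                fetch: Callable[[str], str] = default_fetch) -> Tuple[List, str]:
    """Return (results, message); message is "" on success, else user-readable."""
    try:
        page = fetch(url)
    except urllib.error.URLError as err:  # must precede OSError: it is a subclass
        return [], (f"Could not connect to {site_name}: {err.reason}. "
                    "Check your network connection and try again.")
    except OSError:  # e.g. a timeout while reading
        return [], f"{site_name} did not respond in time. Please try again later."
    try:
        results = parse(page)
    except ValueError:
        return [], (f"{site_name} returned a page this program does not recognize; "
                    "the site may have changed its layout.")
    if not results:
        return [], f"No matches found on {site_name}."
    return results, ""
```

The UI would then show any non-empty message in a dialog or status area. In Squeak, the analogous approach would be to wrap the network and parsing calls in an exception handler such as `[...] on: Error do: [:ex | ...]`.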


Turn in a single zip file containing all your design documents and code (.ect design files, .txt/.doc/.pdf materials, and .st, .cs, or .pr files) on the cs2340turnin coweb before class (noon) on April 8.



(NOTE: In order to help your TA evaluate your submission, include a readme file in your turnin with a summary of updates to each of the deliverables, a description of your matching algorithm, whether you match names using Soundex codes, the names of the web sites you can search and what kinds of information you extract from them.)

Note: If the TAs can't figure out how to do these things, they don't have to give you the points. The UI must be usable.

Questions on Spring2004 M5 Milestone
