






Spring2004 M5 - Add search functions and automatically gather genealogical information
This milestone involves implementing two kinds of searches: finding Persons within a Genealogy who match search criteria and finding information available on the Web to augment information stored about a Person in one of your Genealogies.
Requirements
Local searching
- Extend the graphical user interface you built for M4 to enable a user to search for individuals who meet specified search criteria. Search criteria may include any of the kinds of information that can be entered interactively about individuals via your M2 and M4 interfaces, excluding notes. The simplest way to do this is probably to use something that looks like your data entry form as a search specification form. For now, a "match" is defined as an individual whose stored information exactly matches the search criteria.
- Searches of genealogy data often return multiple results since, for example, names definitely aren't unique. You must figure out some way to present multiple matches in a way that will be useful to a user of your system.
- Exact matching, as specified in the first requirement above, is actually not very useful in working with genealogy data. Issues like incomplete information and spelling variations make really useful matching more complex than simple string matching. Spellings that are close can be considered a possible match, though with less certainty than exact matches. An initial and a first name that starts with that initial are a likely match. Names with and without middle initials can match one another. Differing first names do not eliminate the possibility of a match, since immigrants to the US often became known by English versions of their names (Johan -> John, Isabella -> Elizabeth). Birth and death dates can give clues about matches, but different genealogy sources may give varying dates and places for these events. Exact matches of birth and death info can strengthen the probability of a match. Dates that vary by decades or centuries are an easy way to eliminate the possibility of a match, but small variations are common in what should be considered matching records. So, you must implement an inexact matching function that ranks possible matches by how "close" they are to the information specified in the search template. The design of this matching algorithm is up to you. Be creative!
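To make the idea of ranking by "closeness" concrete, here is one possible sketch. It is not a required design, and it is written in Python only as compact notation (your project is in Squeak). The record layout (dictionaries with `first`, `last`, and `birth_year` keys), the weights, and the ten-year cutoff are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_score(query, candidate):
    """Return a 0..1 similarity between two given names or two surnames."""
    q = (query or "").strip().lower().rstrip(".")
    c = (candidate or "").strip().lower().rstrip(".")
    if not q or not c:
        return 0.5  # missing information neither confirms nor rules out a match
    if q == c:
        return 1.0
    if len(q) == 1 or len(c) == 1:
        # An initial is a likely (not certain) match for a name starting with it.
        return 0.8 if q[0] == c[0] else 0.0
    return SequenceMatcher(None, q, c).ratio()  # close spellings score high


def year_score(query_year, candidate_year):
    """Return 1.0 for equal years, falling to 0.0 at a gap of ten years or more."""
    if query_year is None or candidate_year is None:
        return 0.5
    gap = abs(query_year - candidate_year)
    return max(0.0, 1.0 - gap / 10)


def match_score(query, person):
    """Combine the field scores; the weights are arbitrary illustrative choices."""
    years = year_score(query.get("birth_year"), person.get("birth_year"))
    if years == 0.0:
        return 0.0  # dates decades apart eliminate the candidate outright
    return (0.4 * name_score(query.get("last"), person.get("last"))
            + 0.3 * name_score(query.get("first"), person.get("first"))
            + 0.3 * years)


def rank(query, people):
    """Return (score, person) pairs, best match first, dropping eliminated ones."""
    scored = [(match_score(query, p), p) for p in people]
    scored = [pair for pair in scored if pair[0] > 0.0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

With a query for "J Smith, born 1850", this sketch ranks "John Smith, 1850" above "John Smyth, 1852" and drops "John Smith, 1650" entirely. Note that it does nothing about name translations such as Johan -> John; that would need a lookup table or some other idea of your own.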
Extra credit opportunity: Find out what Soundex codes are and use them in your matching process.
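As a starting point for the extra credit, the sketch below follows the standard American Soundex rules: keep the first letter, encode the remaining consonants as digits, collapse adjacent letters with the same code, and pad or truncate to four characters. It is written in Python for brevity and would need to be ported to Squeak; the function name `soundex` is just an illustrative choice.

```python
# Letter groups and their Soundex digits; vowels, h, w, and y have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character Soundex code for name, or "" if it has no letters."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":
            # Vowels reset the previous code; h and w do not, so letters with
            # the same code separated only by h or w are counted once.
            previous = digit
    return (code + "000")[:4]
```

Two names can then be treated as a possible match when their codes are equal, for example `soundex("Smith") == soundex("Smyth")`. How much weight such a match deserves relative to an exact spelling match is a design decision for your ranking function.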
Web searching
- There is a great deal of genealogy information available on the Web, much of it in freely accessible databases. In this part of your project, you must provide a user with the capability to search for genealogy information on the web. Search criteria should be specified in the same way as for local searches. It also should be possible to use the information stored in a particular Person object as search criteria.
- Your web search feature must be able to get data from at least two different web sites.
- Candidate matches must be ranked as they are for local searches and presented to the user in an appropriate way.
- The user must be provided with a way to indicate that data from the results of a web search should be added to the information about an existing Person in a Genealogy or that it should be used to create a new Person in a Genealogy.
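One way to think about the last requirement is sketched below: a web record either becomes a new Person or fills in the blanks of an existing one, and any disagreements are reported back so the user can decide. This is only an illustration in Python, not a prescribed design. Representing a Genealogy as a list of dictionaries, and the names `merge_into` and `apply_web_record`, are assumptions made for the example.

```python
def merge_into(person, web_record):
    """Return (merged copy of person, conflicts) without overwriting stored data."""
    merged = dict(person)
    conflicts = []
    for field, value in web_record.items():
        if value in (None, ""):
            continue  # the web site had nothing for this field
        current = merged.get(field)
        if current in (None, ""):
            merged[field] = value  # fill in a blank
        elif current != value:
            conflicts.append((field, current, value))  # let the user decide
    return merged, conflicts


def apply_web_record(genealogy, web_record, target_index=None):
    """Add web_record as a new Person, or merge it into genealogy[target_index]."""
    if target_index is None:
        genealogy.append(dict(web_record))
        return []
    merged, conflicts = merge_into(genealogy[target_index], web_record)
    genealogy[target_index] = merged
    return conflicts
```

Returning the conflicts instead of silently choosing a winner leaves room for the user interface to ask which value to keep.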
You will obviously have to be able to parse and interact with web pages in order to implement these web search features. Parsing will be discussed in class well in advance of this assignment. In Fall 2002, students had to implement similar functionality. Their Cases (including code) are available for reuse.
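As a rough illustration of what parsing a results page can involve, the sketch below collects the text of each table row from a block of HTML using Python's standard `html.parser` module. It assumes the site returns its results as an HTML table, which will not be true of every genealogy site, and the class and function names are hypothetical. In Squeak you would use whatever HTML parsing support is covered in class or found in the Cases.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Given `<tr><td>John <b>Smith</b></td><td>1850</td></tr>`, `parse_rows` returns `[["John Smith", "1850"]]`. Turning those strings into Person fields, and noticing when the columns are not the ones you expected, is still up to your code.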
If you can't find any matches, you can't parse a web page as expected (i.e., the page changed since you wrote your parser), or you can't establish a network connection, YOU MUST PROVIDE A REASONABLE ERROR MESSAGE! A Squeak error is not considered reasonable!
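The three failure cases above can each be turned into a message the user can act on. The sketch below shows one way to structure this, in Python rather than Squeak. The `SearchError` class, the message wording, and the idea of passing in the `parse` and `fetch` functions are all assumptions made for illustration; only `urllib.request.urlopen` is a real library call.

```python
import urllib.request


class SearchError(Exception):
    """An error whose message is safe to show directly to the user."""


def fetch_page(url, timeout=10):
    """Download url and return its text, or raise SearchError on any network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError) as exc:  # URLError is a subclass of OSError
        raise SearchError(
            "Could not reach " + url + ". Check your network connection.") from exc


def web_search(url, parse, fetch=fetch_page):
    """Fetch and parse a results page, translating every failure into SearchError."""
    page = fetch(url)
    try:
        results = parse(page)
    except (ValueError, KeyError, IndexError) as exc:
        raise SearchError(
            "The page at " + url + " was not in the expected format. "
            "The site may have changed its layout.") from exc
    if not results:
        raise SearchError("No matching records were found.")
    return results


def search_message(url, parse, fetch=fetch_page):
    """Return the text the user interface should display for this search."""
    try:
        return str(len(web_search(url, parse, fetch))) + " record(s) found."
    except SearchError as err:
        return str(err)
```

Keeping the fetching and parsing functions as parameters also makes the error paths easy to exercise in unit tests without a network connection.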
Turn-in
A single zip file containing all your design documents & code (.ect design and .txt/.doc/.pdf materials and .st, .cs or .pr file) should be turned in on the cs2340turnin coweb: http://coweb.cc.gatech.edu/cs2340turnin. This file should be submitted to the coweb before class (noon) on April 8.
Grading
(NOTE: In order to help your TA evaluate your submission, include a readme file in your turnin with a summary of updates to each of the deliverables, a description of your matching algorithm, whether you match names using Soundex codes, the names of the web sites you can search and what kinds of information you extract from them.)
- 5% Updated ECoDE CRC CARDS representing a good behavior analysis: Reasonable names, understandable and clearly defined responsibilities, useful comments.
- 5% Updated ECoDE SCENARIOS (About Scenarios) described from specific examples that touch on every major function in the system from the user's point of view.
- Clear descriptions for each scenario are written for a non-computer-science audience.
- Good assignment of responsibilities implemented by CRC Cards that satisfy each scenario.
- 10% Updated UML CLASS DIAGRAM reflecting your analysis and design.
- A quality design covering all the features of the project.
- Detailed and understandable names and comments.
- All the classes are connected with responsibilities.
- 5% Updated TEST PLAN (About Test Plans) with clear linkages between scenarios and test cases and expected results. We should be able to verify that your test plan covers all of the system requirements.
- 5% Well-documented & good style source code
- Reuse of existing code
- Effective Commenting
- 15% Quality of the design and implementation
- Code matches design
- Usable interface
- The users (your TA) like it.
- 55% Working system:
- 25% Local searching
- 12% Provide a search interface and find an exact match to search criteria
- 3% Support display of multiple individuals that match the search criteria
- 10% Implement inexact matching and present multiple results ordered by "quality" of probable match (you must implement a reasonable matching algorithm to get full credit)
- 2% extra credit for including Soundex codes as part of your inexact matching algorithm
- 25% Web searching
- 2% Initiate web search from search interface
- 3% Initiate web search by using data from a Person
- 10% Ability to obtain data from two different web sites
- 10% Ability to update an existing Person or create a new Person from web data
- 5% SUnit tests which cover all non-UI classes and methods
Note: If the TAs can't figure out how to do these things, they don't have to give you the points. The UI must be usable.