Current Links: Cases Final Project Summer 2007
Milestone 5 VBM
1. The assignment was to provide a means of automatically filling in missing information about any requested person.
2. This was done by parsing a series of websites specializing in genealogical information.
Design and Approach
- An interface was provided that allowed the user to select which information he/she wanted to search for on the current person. Four sites were available.
- Using the site www.lineages.com and the requested person's given name and surname, birth and death dates were searched for. The returned page was traversed using HTMLTokenizer to build a list of tokens that would later be examined for useful information. Using the ordering of the tokens and certain consistencies in the webpage, the dates were extracted, the tags removed, and Date objects were created for all people matching the input given name and surname. A list of people and their respective birth and death dates was compiled and shown to the user in the form of a menu, where the user could select one entry to have the current person automatically updated with that information, or cancel, in which case no changes took place.
- Using the site www.whitepages.com, a search was performed for the current person's address and phone number. If multiple entries were returned, a menu was presented to allow the user to choose which one he/she wanted to use. After an entry was selected, the information was added as a record to the current person. If no entries were returned, or if there were so many entries that the webpage returned an error message, a message was displayed to the user. This section of code did not use a parser in the traditional sense; there really wasn't a need to. Instead, a key phrase of HTML was searched for in the returned page. When this piece of HTML was found, the text following it was tokenized and the appropriate information was pulled from it.
- Using the site www.ancestry.com and the first and last name of the current person, the residence of the current person before he or she died is searched for. This functionality is in the class AncestSite. The result is retrieved as an HTML file stream. The stream is then tokenized using HTMLTokenizer. There may or may not be any residence information for the current person, so it is very important to find the pattern of the data in the returned HTML file and extract the residence information if it is present. For this site, the residence information is presented as HTML text, two tokens after the HTML text token "Residence:". The HTML text is converted to a regular string. When a new address is found, it is added to a pop-up menu so that the user may choose which one he/she wants. The choice is stored as a record in the current object.
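The two extraction strategies described in the bullets above (reading a value at a fixed token offset after a label, and tokenizing only what follows a key phrase of HTML) can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses Python's standard html.parser in place of HTMLTokenizer, and the names TokenCollector, tokenize, values_after_label and tokens_after_phrase, as well as the sample HTML, are hypothetical. The real pages' markup is not shown in this write-up, so the offset of 2 only matches the made-up sample.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Flatten an HTML page into an ordered list of (kind, value) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("tag", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", "/" + tag))

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.tokens.append(("text", text))


def tokenize(html):
    """Return the token list for an HTML string (hypothetical helper)."""
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens


def values_after_label(tokens, label, offset=2):
    """Collect the text token found `offset` positions after each label token."""
    found = []
    for i, (kind, value) in enumerate(tokens):
        if kind == "text" and value == label and i + offset < len(tokens):
            next_kind, next_value = tokens[i + offset]
            if next_kind == "text":
                found.append(next_value)
    return found


def tokens_after_phrase(html, key_phrase):
    """Tokenize only the part of the page that follows a key phrase of HTML."""
    start = html.find(key_phrase)
    if start == -1:
        return []  # phrase absent: no results, or an error page
    return tokenize(html[start + len(key_phrase):])


if __name__ == "__main__":
    # Made-up markup; the real sites' pages will differ.
    residence_page = (
        "<p><b>Residence:</b> Atlanta, Georgia</p>"
        "<p><b>Residence:</b> Macon, Georgia</p>"
    )
    print(values_after_label(tokenize(residence_page), "Residence:"))

    listing_page = (
        '<html><div class="ads">x</div>'
        '<div class="listing"><span>Jane Doe</span><span>555-0100</span></div></html>'
    )
    tokens = tokens_after_phrase(listing_page, '<div class="listing">')
    print([value for kind, value in tokens if kind == "text"])
```

Both approaches depend on the page layout staying consistent: if a site changes its markup, the offset or the key phrase has to be adjusted, which is why a missing phrase is treated here as "no usable result" rather than as an error.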