Current Links: Cases Final Project Summer 2007
- This milestone was the first parsing assignment. As such, it seemed daunting at first. With careful consideration and a good design, it proved pretty straightforward.
- Having a working knowledge of CGI forms and HTML made this milestone much easier. If no one in your group has experience with CGI or with writing HTML by hand (in Notepad, not FrontPage) and you have to parse HTML, make friends with someone who has. Or ask a TA for guidance! Don't be afraid to view the HTML source; it's not that ugly.
- The inspect message was especially useful in writing this milestone. Being able to see what the HtmlParser returned made parsing the pages much easier.
- In my experience, it's best to start out writing code in the workspace. This milestone was no exception. I had several workspaces which started out as sandboxes and evolved into my parsing methods.
- As usual, collections were used extensively in this milestone.
Considering the Requirements and Developing a Design
- Before coding anything, we had to decide which databases to use. This was tricky. Some databases required a password, which wasn't going to work for us. We reviewed the requirements to make an informed decision.
- The requirements specified that we needed to support four auto completion "sites" (pairs of a piece of data and a web database). That meant we could auto complete one piece of data from four web databases, four pieces of data from one web database, or any combination in between. After selecting our databases, we decided to auto complete two kinds of data (birth date and death date) from each of two databases, which gives the required 2 × 2 = 4.
- Eventually, we settled on the Social Security Death Index and RootsWeb's WorldConnect databases for three reasons. First, they didn't require passwords. Second, they produced clean HTML tables, which would make parsing fairly easy. Lastly, they provided the data fields we needed.
- The essentials of the design involved two parts: creating a new class, SearchEngine, responsible for fetching pages from the websites and parsing them; and adding an auto complete option to the GenealogyGUI.
- Since auto completion was to work on a single Person, we decided to add the option to auto complete a date to the Person's menu, offering it only for the dates that were missing. This design decision simplified the approach, yet allowed us to satisfy all of the requirements. One of the design lessons we've learned is that simplicity is key. A simple design isn't easy; it requires significantly more thought than just using the first thing that comes to mind. But taking the extra time to develop a simple, elegant design makes the coding much easier and faster. We could tell you this a million times, but the only way you'll learn is by doing. Just remember this: Strive for simplicity.
- We decided to use Squeak's HtmlParser because it was there and it was what we needed. So often that's all that it takes to decide what to use.
- Next, we had to figure out which variables the CGI script expected, so that we could request the pages we needed. This is a simple task: view the HTML source, search for "form," and then find the input tags inside it. The name attribute of each input tag is a variable name you'll need to assign a value to. For example, a hypothetical tag like <input type="text" name="lastname"> means the script expects a variable called lastname. (This is hard to explain in words, but it makes sense when you're looking at the HTML.)
- Taking these names, you then create a dictionary of arrays: the keys are the variable names, and each value is an array holding the value(s) you're sending to the database.
- You get the contents of the HTML page with this line of code:
doc := HTTPSocket httpGetDocument: theUrl args: theArgs.
- Now that we had the HTML document we wanted, we had to parse it.
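For reference, here is a minimal workspace sketch of the argument dictionary described above. The URL and the field names 'lastname', 'firstname' and 'middlename' are hypothetical placeholders; the real names come from the name attributes of each site's input tags. The resulting dictionary is what gets passed as theArgs to HTTPSocket httpGetDocument:args:.

```smalltalk
| theUrl theArgs |
"Hypothetical URL and field names: copy the real ones from the site's <form> and <input name=...> tags."
theUrl := 'http://www.example.com/cgi-bin/search'.
theArgs := Dictionary new.
"Each key is a CGI variable name; each value is an Array of the values to send for it."
theArgs at: 'lastname' put: (Array with: 'Smith').
theArgs at: 'firstname' put: (Array with: 'Helen').
"An equivalent way to add a pair: add an Association directly."
theArgs add: (Association key: 'middlename' value: (Array with: 'Ann')).
theArgs
```

Wrapping each value in an Array keeps the shape consistent with variables that take several values.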
Tackling the parsing
- To get something parsable, we followed the above code with the following lines:
contents := doc contents.
parsed := HtmlParser parse: contents.
- Here's an inspect window of an HtmlDocument, which is what HtmlParser returns:
- Since HTML is a parser's dream, the task became figuring out which tags held the data we needed. This was a matter of manipulating collections: while each database formatted its results differently, the basic manipulations were the same.
- The specification required that we handle two error cases: being unable to connect to the website (no network connection) and being unable to parse the page. Specifically, we couldn't allow any Squeak errors to occur.
- To check for a network connection, we used the following code, which would display the message below to the user if no connection was available.
"Make sure we're connected to the internet (the name lookup gives up after 15 seconds)"
netStatus := NetNameResolver addressForName: 'www.rootsweb.com' timeout: 15.
netStatus isNil ifTrue: [
	PopUpMenu notify: 'Unable to connect to online genealogy databases.', String cr,
		'Please check your internet connection and try again.'.
	results add: 'failure'. ^ results].
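The network check covers the first error case. For the second, one general way to guarantee that no Squeak error reaches the user is to wrap the parsing step in an exception handler. This is a sketch of an alternative, not what we did (our parser simply came back with no results); the helper block parseSafely is hypothetical, and the two parse blocks stand in for real parsing code.

```smalltalk
| parseSafely good bad |
"Hypothetical helper: evaluate a parsing block, and if it signals any Error,
 answer an empty collection instead of opening a debugger."
parseSafely := [:parseBlock |
	[parseBlock value]
		on: Error
		do: [:ex | ex return: OrderedCollection new]].
"A parse that succeeds keeps its results."
good := parseSafely value: [OrderedCollection with: '1 Jan 1900'].
"A parse that fails (here, a ZeroDivide) is turned into 'no results'."
bad := parseSafely value: [1 / 0].
```

The caller then treats an empty collection exactly like a search that found nothing.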
- As it turned out, the way we wrote the code, a page that couldn't be parsed simply came back as if there were no results. We confirmed this by trying to break our parser: since we couldn't change the databases' pages, we passed in a completely different URL, google.com. Here's the code that failed to break our parser:
"Test code with goal: try to break the parser by searching Google instead of a genealogy database"
theUrl := 'http://www.google.com/search'.
theArgs := Dictionary new.
theArgs add: (Association key: 'hl' value: (Array with: 'en'));
	add: (Association key: 'q' value: (Array with: 'lauren')).
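The collection manipulation mentioned earlier can be sketched with plain Squeak collections. This assumes the table rows have already been pulled out of the HtmlDocument as Arrays of cell strings; the column layout (name, birth date, death date) and the sample rows are made up for illustration. Because only rows with the expected shape are kept, a page with no matching table, such as Google's results page, yields an empty collection rather than an error, which is consistent with the behavior of the parser test above.

```smalltalk
| rows birthColumn extract birthDates noDates |
"Hypothetical rows, each an Array of cell strings: name, birth date, death date."
rows := OrderedCollection new.
rows
	add: #('Helen Smith' '12 Mar 1901' '4 Jun 1980');
	add: #('Helen Smith' '' '9 Nov 1975');
	add: #('Header only').
birthColumn := 2.
"Keep rows that are wide enough and have a non-empty birth cell, then pull out that cell."
extract := [:someRows |
	(someRows select: [:row | row size >= birthColumn and: [(row at: birthColumn) notEmpty]])
		collect: [:row | row at: birthColumn]].
birthDates := extract value: rows.
"An unrelated page produces no matching rows, so the result is simply empty."
noDates := extract value: OrderedCollection new.
```

Each database only needs its own column index and row-extraction step; the select:/collect: pattern stays the same.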
The finished product
- The option to auto complete a birth or death date appeared in an individual person's menu whenever that person was missing that date. Here's the menu for Helen, who needs both her birth and death dates filled in:
- Here are the results offered for auto completing Helen's birth date.
- And this is the menu of selections for auto completing Helen's death date. Note that these are all valid dates: each was checked to be no earlier than the birth date and no more than 100 years after it.
- If the database didn't return any results, or if none of the results were valid (validity could only be checked when the other date was already filled in), this was displayed to the user:
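The validity check on death dates can be sketched as a select: over the candidates. This assumes the candidates have already been converted to Date objects, and it approximates 100 years as 36525 days (100 × 365.25), so it may be off by a day right at the boundary; the variable names and sample dates are illustrative.

```smalltalk
| birth maxDays candidates valid |
birth := Date today.
"Approximation of 100 years: 100 * 365.25 days."
maxDays := 36525.
"Hypothetical candidate death dates returned by a database."
candidates := OrderedCollection new.
candidates
	add: (birth subtractDays: 1);
	add: (birth addDays: 365 * 70);
	add: (birth addDays: 365 * 120).
"Valid: not before the birth date and at most maxDays after it."
valid := candidates select: [:death |
	death >= birth and: [death <= (birth addDays: maxDays)]].
```

Checking candidate birth dates against a known death date works the same way with the comparisons reversed.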
So, that's the basics of how we completed Milestone 5. The project is linked to below for your viewing pleasure.