Current Links: Cases Final Project Summer 2007

Milestone 5 Case Study - DY

Milestone 5 Specs

Getting Started

Before we really got into coding, we looked over old cases and found website parsing that had been done before (see? Cases really do come in handy!). The best approach we found is to handle each site separately: HTML is written differently from site to site, so structures differ and there is no standard layout. So our design held up, with a different parsing object for each site. The other thing that took a lot of time before we could start coding was figuring out how to use the websites we needed. Working out the URL each site uses once our query is plugged in presented more problems than we anticipated, and involved hours of tediously searching page source for the correct link.
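The "one parsing object per site" idea, together with the per-site query URL, can be sketched roughly as follows. This is only an illustration in Python, not our Squeak code: the class names, the example.com URLs, and the query parameter names are all made up for the sketch.

```python
from urllib.parse import urlencode


class SiteParser:
    """Common shape for every site parser (hypothetical base class)."""

    base_url = ""      # each subclass sets its own search URL
    query_param = "q"  # name of the query-string field the site expects

    def search_url(self, query):
        # Plug the user's query into this site's search URL.
        return self.base_url + "?" + urlencode({self.query_param: query})

    def parse(self, html):
        # Each site needs its own parsing logic; there is no shared layout.
        raise NotImplementedError


class ExampleBooksParser(SiteParser):
    # Assumed values; in practice these come from reading the site's source.
    base_url = "http://books.example.com/search"
    query_param = "keywords"


class ExampleMusicParser(SiteParser):
    base_url = "http://music.example.com/find"
    query_param = "term"


for parser in (ExampleBooksParser(), ExampleMusicParser()):
    print(parser.search_url("squeak smalltalk"))
```

Keeping the URL details inside each subclass means the hours spent digging through a site's source only have to be recorded in one place.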

Making it Work

Squeak has an HTMLParser built in; we knew Squeak could handle HTML because of its web browser. So we used HTMLParser on each site, and found OrderedCollection HELL. The main problem is the way the parser works: each tag, subtag, etc. is its own OrderedCollection, which produces nested OrderedCollections en masse, and those are extremely hard to work with. Luckily, Squeak has great collection methods such as select and collect. Once we found the entry point to the data (i.e. which OrderedCollection contained what we needed), we were able to find certain tags and thus our data. Each site is pretty consistent within itself as far as formatting is concerned, so our findings were pretty consistent too. After we found the data we wanted, all we had to do was parse it out and present it to the user for selection. By following the same design for each “site parser”, we were able to create a pretty solid structure for what we returned and presented. Basically, the hardest part was finding our data within the site; after that, adding information was easy.
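To give a feel for the nested-collection problem and the select/collect way out of it, here is a small sketch. It uses Python's standard html.parser as a stand-in for Squeak's parser, so the tree structure is only an analogy; TreeBuilder, find_all, text_of, and the sample page are all hypothetical names and data invented for the example.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds nested lists: each element is [tag, attrs_dict, children]."""

    def __init__(self):
        super().__init__()
        self.root = ["root", {}, []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, dict(attrs), []]
        self.stack[-1][2].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][2].append(data.strip())


def find_all(node, tag):
    """Depth-first search for every element with the given tag name."""
    found = []
    for child in node[2]:
        if isinstance(child, list):
            if child[0] == tag:
                found.append(child)
            found.extend(find_all(child, tag))
    return found


def text_of(node):
    """Join the text nested anywhere inside an element."""
    parts = []
    for child in node[2]:
        parts.append(child if isinstance(child, str) else text_of(child))
    return " ".join(parts)


# Made-up page: the data we want sits a few levels down, next to noise.
page = """
<html><body>
  <div class="nav"><a href="/">Home</a></div>
  <table class="results">
    <tr><td class="title">Squeak by Example</td><td class="price">$20</td></tr>
    <tr><td class="title">Smalltalk-80</td><td class="price">$35</td></tr>
  </table>
</body></html>
"""

builder = TreeBuilder()
builder.feed(page)

# "select" the cells that hold titles, then "collect" their text.
cells = find_all(builder.root, "td")
titles = [text_of(td) for td in cells if td[1].get("class") == "title"]
print(titles)
```

The pattern is the same one described above: find the entry point (here, the td cells), filter down to the tags that carry the data, and map each one to the value presented to the user.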

The Design

Uploaded Image: m5ClassDiagram.jpg

Afterthoughts

  1. Site parsing with HTMLParser takes time. Let Squeak do its thing for a little while before you start thinking that it’s broken/frozen/whatever.
  2. Learn how to use the explorer. It really helps with OrderedCollection hell.

