
Milestone 3

What we're discussing:

  • Our Squeak UI
  • Webpage Parsing

    Our Squeak User Interface

    Uploaded Image: ff5ui.gif

    Here's a screenshot of our Squeak user interface. It's not very flashy: a simple design in which the menus update dynamically depending on what has been selected. One major flaw in our approach is that it's not clear which source topics have been selected, nor is it clear what the primary news category is. Removing topics is also tedious, since you must go back and find the exact combination of source and topic in the second and third boxes and then click remove; there is no way to remove a source topic directly from the main source-topic bin. Finally, after the user clicks generate, the UI just locks up and sits there until the paper is generated. Even then, there's no real indication that it has completed, except that your mouse starts working in Squeak again. We should have added a loading bar or some other visual device to show the program's status.

    But I think once users get a handle on how it works, it can become a very efficient interface to work with.

    We chose to place all the functional buttons next to each other in the middle, because users could have had difficulty finding the different buttons if they were scattered all over the place.

  • Clean, lots of white. Not flashy or overloaded with graphics.
  • Buttons in close proximity to each other
  • Dynamically generated menu selection
  • Selected source topics not labeled
  • Selected primary news category not labeled
  • Difficulty in removing source topics
  • No indication of whether or not it is creating/generating the page

    Our approach to parsing the Web News Sources
    Several ideas were thrown around as to how we should handle the parsing of news sources. We could have tried to build a universal parsing engine that was broad and robust enough to parse any kind of web page, but that seemed too difficult to implement. We then thought it a good idea to encapsulate different parsing techniques (for each web site) into a block which would be passed around from our editor to a generic "correspondent", who wouldn't actually know about the parsing technique, just implement it. This too had its problems, because there are very specific issues that must be dealt with at each site, and passing around these blocks would get confusing.

    What we actually did
    We created a generic "Correspondent" class, which contains all the basic parsing utilities needed to get at content on a page. It could do things like return all table data cells, return all paragraph tags, return all tables. Basically, anything you would need to grab some sort of content, including actually going out and fetching a page and returning its source code, was found in our Correspondent.
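    The generic class described above might look something like this in Squeak. The class shape matches the write-up, but the selectors here (fetchPage:, contentsOfTag:in:, and friends) are our illustration, not the project's actual code, and the scanning is far more naive than real parsing would need to be.

```smalltalk
Object subclass: #Correspondent
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'NewsPaper-Parsing'.

"Fetch a page and answer its source code as a String."
Correspondent >> fetchPage: aUrlString
	^ (HTTPSocket httpGet: aUrlString) contents

"Naive scan: answer the text between each <tag ...> and </tag>.
 (Illustrative only; the real parsing utilities were more involved.)"
Correspondent >> contentsOfTag: tagName in: html
	| results start stop close |
	results := OrderedCollection new.
	close := '</' , tagName , '>'.
	start := 1.
	[start := html indexOfSubCollection: '<' , tagName startingAt: start.
	 start > 0] whileTrue:
		[start := (html indexOf: $> startingAt: start) + 1.
		 stop := html indexOfSubCollection: close startingAt: start.
		 stop = 0 ifTrue: [^ results].
		 results add: (html copyFrom: start to: stop - 1).
		 start := stop + close size].
	^ results

"Convenience selectors built on the scanner."
Correspondent >> paragraphsIn: html
	^ self contentsOfTag: 'p' in: html

Correspondent >> tableCellsIn: html
	^ self contentsOfTag: 'td' in: html
```

    Putting the generic scanner in one place is what lets every site-specific correspondent stay small.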

    Then we created specific correspondents whose only job was to parse a specific web source (e.g., CNN, BBC, etc.). They inherited the general parsing abilities of the Correspondent class, but added specific methods and variables to handle issues not common to all the web sites.
    For example:
    CNN starts all of its story content with a paragraph tag (<p>), while Slashdot simply puts it inside a <table> cell.

    These Correspondents (we named them source + Correspondent, e.g. CNNCorrespondent) know the specifics of the web source they are parsing: what location to go to, how to find the main story links, and how to grab the headline and graphics for each story. Yet they still call methods from the main Correspondent class to do most of the actual parsing.
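    A site-specific correspondent would then be a small subclass. Again, the selectors below are hypothetical stand-ins for whatever the project actually used:

```smalltalk
Correspondent subclass: #CNNCorrespondent
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'NewsPaper-Parsing'.

"Only this class knows where CNN lives."
CNNCorrespondent >> sourceUrl
	^ 'http://www.cnn.com'

"CNN wraps story text in <p> tags, so we lean on an inherited
 utility (a hypothetical selector on Correspondent) to pull it out."
CNNCorrespondent >> storyTextFrom: html
	^ self contentsOfTag: 'p' in: html
```

    A SlashdotCorrespondent would override the same selectors to look inside table cells instead, which is exactly the per-site variation described above.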

    The Correspondents can create a "Story", which is the object that holds all the story elements: the text, headline, graphics, and URL. When a Correspondent finds this information, it adds it to a Story and throws it into an ordered collection of stories, which it will kindly return to the editor for later publication.
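    The Story object and the hand-off to the editor could be sketched like this; the instance variable and selector names are our guesses, not the project's actual code:

```smalltalk
Object subclass: #Story
	instanceVariableNames: 'headline text graphics url'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'NewsPaper-Parsing'.

"A correspondent gathers everything it found into an
 OrderedCollection for the editor (selectors hypothetical)."
CNNCorrespondent >> gatherStories
	| stories |
	stories := OrderedCollection new.
	self storyLinks do: [:eachUrl |
		stories add: (self storyFromPage: (self fetchPage: eachUrl))].
	^ stories
```

    Returning a plain OrderedCollection keeps the editor ignorant of how each story was parsed; it just iterates and publishes.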

  • Very easy to implement
  • Modeled much like real world
  • Easy to fix parsing technique problems
  • Not entirely re-usable (every new source added requires a new correspondent)
  • Parsing techniques can get VERY specific sometimes.
