Milestones 6 and 7 from Spring 2001 involve mining WWW sites to find MP3 files. The Spring 2002 project may involve mining WWW sites as well. This page sketches an approach to this part of the project.
Start by stepping through the web sites in a regular web browser. Note which web pages you will want your program to use. It may help to map, on a scrap of paper, the sequence of pages traversed to submit a search and then find the relevant MP3s.
Open a workspace in Squeak, and try to duplicate the path followed using Squeak's HTTP code. For example, try inspecting the result of:
Start writing some code to search the HTML results from each step and find the requests to make in the next step. This will probably involve some kind of parsing. Don't worry about making the analysis perfect; heuristics are inherent to web mining.
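As a rough illustration of this step, the sketch below pulls the links out of a page and keeps only those that look like the next request to make. It is written in Python purely to show the logic, not as Squeak code; the names LinkCollector and candidate_links, the keyword list, and the sample page are all hypothetical, and a real site's markup will differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def candidate_links(html, base_url, keywords=("play", ".m3u")):
    """Heuristic: keep links whose URL mentions any of the keywords."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links
            if any(k in url.lower() for k in keywords)]


# Hypothetical search-result page; the real markup will differ.
sample_page = """
<html><body>
  <a href="/about.html">About</a>
  <a href="/play/12345.m3u">Play</a>
  <a href='http://ads.example.com/banner'>Ad</a>
</body></html>
"""

next_requests = candidate_links(sample_page, "http://search.example.com/results")
print(next_requests)
```

The keyword test is deliberately crude. In the spirit of the advice above, it is usually easier to start with a loose heuristic like this and tighten it after looking at what a few real result pages return.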
Finally, bundle up your bits of code into nicely factored methods and classes that can be used by the main part of your system.
Note that mp3.com only requests an email address, etc., if you have not already entered that information and obtained the relevant cookie. Thus, a Squeak image can be primed, once, by downloading an MP3 file through Scamper. After that, the image will have the cookies it needs to send with its requests, and mp3.com won't ask for the information again.
mp3.com, and probably other sites, conclude a search by giving you a file with the extension "m3u". The usual content-type of the file is "audio/x-mpegurl". Such files contain a list of URLs, one per line, pointing to the actual MP3 files; thus, to "play" an M3U file, open the file, retrieve the MP3 file named on each line, and then play the individual MP3 files.
Note that M3U files often use Unix line-ending conventions (LF), whereas Squeak uses CR. To cope with this, you can use the method #withSqueakLineEndings.
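The M3U handling described above can be sketched as follows. This is Python used only for illustration, not Squeak code: to_cr_line_endings is a hypothetical helper that imitates the effect of converting LF and CRLF endings to CR, and m3u_urls is likewise a made-up name. Skipping lines that start with "#" is an assumption that is not stated on this page (some playlists carry comment or directive lines in that form).

```python
def to_cr_line_endings(text):
    """Convert CRLF and LF line endings to a bare CR."""
    return text.replace("\r\n", "\r").replace("\n", "\r")


def m3u_urls(playlist_text):
    """Return the URLs listed in an M3U playlist, one per non-empty line."""
    urls = []
    for line in to_cr_line_endings(playlist_text).split("\r"):
        line = line.strip()
        # Skip blank lines and '#' lines (assumed to be comments or
        # directives rather than playlist entries).
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


# Hypothetical playlist mixing LF and CRLF endings.
sample = "http://music.example.com/a.mp3\nhttp://music.example.com/b.mp3\r\n\n"
print(m3u_urls(sample))
```

Each URL returned would then be fetched and played in turn. Normalizing the line endings first means the splitting code only has to deal with one convention.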
Trying it Yourself
wwwhack.zip (screenshot) contains an image and a changes file that walk you through the steps above in detail and let you try them yourself.