
2006Spring: MusExMachina: Cases: Team Productivity Practices: Requirements Monitoring


Monitoring course webpages for requirements changes

An overview of how webpage monitoring can be accomplished follows: (bolded terms are names of unix utilities)
Once a day do:

We used a web-based platform for our webpage monitoring. Although course webpages can be monitored with a command-line utility, a web platform offers richer interaction and can be used by non-technical users. Free webpage monitoring services provide this capability, an example of which is (

However, this approach had limited success at detecting real content changes. We found that many swiki pages contain ancillary metadata which changes every day, even when the content published to the webpage has not changed. Links to other swiki pages are annotated with the number of days since the linked webpage last changed, a figure which updates every day. We illustrate this in the following image:

Uploaded Image: link-with-number-of-days-changed-since.png

Consequently, we received a daily notice that every swiki webpage we monitored had changed! We needed a way to strip out this metadata so that we could monitor only the primary content of each webpage, which required greater control over how the monitoring was performed. We therefore used a webpage monitoring software project which one of our team members is currently involved in developing.

More information about the webpage monitoring software can be found here
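
The idea described above, stripping the daily-changing annotations before comparing a fresh copy of a page against the stored one, can be sketched in shell. This is a minimal sketch and not the monitoring software's actual implementation: the function names normalize and page_changed and the sample HTML are hypothetical, and the sed expression is the first one given in the next section.

```shell
#!/bin/sh
# normalize FILE: print FILE with the volatile "last edited ..." annotation
# removed (first sed expression from the section on stripping metadata).
normalize() {
    sed 's|last edited[^"]*[ "]>|"|g' "$1"
}

# page_changed NEW OLD: exit status 0 only if the two snapshots still
# differ after normalization.
page_changed() {
    new_sum=$(normalize "$1" | cksum)
    old_sum=$(normalize "$2" | cksum)
    [ "$new_sum" != "$old_sum" ]
}

# Hypothetical snapshots that differ only in the day count.
tmp=$(mktemp -d)
printf '%s\n' '<a href="42" title="last edited 3 days ago">Cases</a>' > "$tmp/old.html"
printf '%s\n' '<a href="42" title="last edited 4 days ago">Cases</a>' > "$tmp/new.html"

if page_changed "$tmp/new.html" "$tmp/old.html"; then
    echo "changed"
else
    echo "unchanged"
fi
rm -rf "$tmp"
```

Because the two sample snapshots differ only in the annotation, the script prints "unchanged"; a difference in the link text itself would still be reported.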

Stripping metadata tags with regular expressions

We present below the regular expressions (regex) we found to be sufficient for stripping metadata content from swiki webpages. They are shown in the context of the unix sed utility, a stream editor which lets us perform text substitution.

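// Each $cmd is assumed to be handed to the shell (e.g. via exec) before the next one is built.
// Strip "last edited ..." annotations about page updates, up to the closing quote and ">".
$cmd = "/bin/sed -i 's|last edited[^\"]*[ \"]>|\"|g' " . escapeshellarg($storedWebpageFile);
// Strip "last edited ..." text in the trailing "Links to this page" list.
// NOTE: the published pattern read [^]*, which sed rejects as an unterminated
// bracket expression; [^<]* (everything up to the next tag) is an assumed reconstruction.
$cmd = "/bin/sed -i 's|last edited[^<]*||g' " . escapeshellarg($storedWebpageFile);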
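
To see what the two substitutions do, the following shell sketch runs them on two sample lines. The sample markup and the function name strip_metadata are hypothetical (real swiki HTML may differ), and the second expression is written as [^<]* on the assumption that this is the intended form of the pattern.

```shell
#!/bin/sh
# Hypothetical swiki fragments: a link annotation and a "Links to this page" entry.
sample='<a href="42" title="last edited 3 days ago">Cases</a>
<li><a href="7">Team Page</a> last edited on 2 May 2006 at 3:10 pm</li>'

# strip_metadata: filter stdin through both substitutions.
#   1st: replace "last edited ..." up to a closing quote (or space) plus ">" with a quote.
#   2nd: delete "last edited ..." up to the next "<" (assumed form of the pattern).
strip_metadata() {
    sed -e 's|last edited[^"]*[ "]>|"|g' \
        -e 's|last edited[^<]*||g'
}

printf '%s\n' "$sample" | strip_metadata
```

On the first line the annotation inside the title attribute is removed along with the tag's closing ">", and on the second line the "last edited on ..." text before the closing list tag is removed. The result is no longer well-formed HTML, which is acceptable here because it is only used for comparison against the stored copy.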
