
Squeak and XML: A Tutorial by Badrul Islam

This tutorial shows you how to write an XML file and parse the file in Squeak.

An XML document has a hierarchical structure with strict rules about the placement of user-created tags, and it can contain only character data (no binary data). An XML file is composed of elements, each delimited by a start tag and an end tag. A start tag begins with <, followed by the tag name, and closes with >. An end tag begins with </, followed by the same tag name, and closes with >. The data between the start tag and the end tag is called the element content.
<personName> John McCarthy </personName>

The line above is an example of an element.
<personName> is the start tag and </personName>
is the end tag. Tag names are case sensitive.
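
The following is a minimal sketch of how this element looks once Squeak has parsed it. It assumes the XML-Parser package (which provides XMLDOMParser) is loaded in the image. The variable names are illustrative, and parsing from an in-memory ReadStream rather than a file is only a convenience for the example.

```smalltalk
| doc person |
"Parse the one-element document from a string stream"
doc := XMLDOMParser parseDocumentFrom:
	(ReadStream on: '<personName> John McCarthy </personName>').
"firstTagNamed: answers the first element with the given tag name"
person := doc firstTagNamed: #personName.
"Show the tag name, then the element content"
Transcript show: person name printString; cr.
Transcript show: person contentString; cr.
```

Whether the whitespace around the name is kept in the content may depend on the parser version, so trim it if an exact match is needed.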

Unlike HTML, which has a fixed set of tags that users cannot extend, XML has only a limited number of system-defined tags and lets users introduce any new tags they need. XML also has strict rules regarding the syntax and placement of tags.

An XML document is structured as a tree. It has a single root element, and every other element is a descendant of that root. Below is an example of an XML document:
<play>
  <scene background="barn">
    <block perspective="pan 0 400">
      <position actorName="John" xPosition="100" yPosition="300"></position>
      <position actorName="Bob" xPosition="220" yPosition="300"></position>
      <position actorName="Bill" xPosition="340" yPosition="300"></position>
      <action actionNo="1" actorName="Bill" act="sayText" arity="hello everyone i am the best character I know how to talk"></action>
      <action actionNo="2" actorName="Bill" act="say" arity="my name is Bill"></action>
      <action actionNo="3" actorName="John" act="sayText" arity="No i am the best character watch me turn sideway talk and walk"></action>
      <action actionNo="4" actorName="John" act="moonwalk" arity="100 5000"></action>
    </block>
    <block perspective="pan 1 400">
      <action actionNo="5" actorName="Bill" act="sayText" arity="we are running late for class"></action>
      <action actionNo="6" actorName="Bill" act="walk" arity="200 10000"></action>
      <action actionNo="6" actorName="Bill" act="faceRight" arity=""></action>
      <action actionNo="6" actorName="John" act="walk" arity="200 10000"></action>
      <action actionNo="6" actorName="John" act="faceRight" arity=""></action>
    </block>
  </scene>
  <scene background="school">
    <block perspective="pan 2 100">
      <position actorName="John" xPosition="100" yPosition="300"></position>
      <position actorName="Bob" xPosition="220" yPosition="300"></position>
      <position actorName="Bill" xPosition="340" yPosition="300"></position>
      <position actorName="Bill" xPosition="250" yPosition="300"></position>
      <action actionNo="1" actorName="Bob" act="walk" arity="100 1000"></action>
      <action actionNo="2" actorName="John" act="sayText" arity="where is is Joe"></action>
      <action actionNo="2" actorName="John" act="say" arity="where is is Joe"></action>
      <action actionNo="3" actorName="Bill" act="sayText" arity="i do not know"></action>
      <action actionNo="3" actorName="Bill" act="say" arity="i do not know"></action>
      <action actionNo="4" actorName="Bill" act="sayText" arity="it is beautiful outside let us goplay"></action>
    </block>
  </scene>
</play>

The root element of the XML file above is play. The listing does not show them, but the file also contains actor elements. Each actor has two child elements, actorName and bodyCostume, which are siblings of each other. An element can have multiple children but only a single parent. The single-parent rule means that child elements must be completely enclosed within their parent element. Thus, if an element's start tag is inside another element, its end tag must also be inside that element. Overlapping tags such as
<actor> <actorName> </actor> </actorName>
are not allowed.
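
As a hedged illustration, the sketch below feeds the overlapping tags to the parser and reports whether they were accepted. It assumes the XML-Parser package is loaded. The exact exception class signaled for malformed input differs between versions, so the sketch catches the general Error class rather than naming a specific one.

```smalltalk
| result |
"Overlapping tags: actorName is opened inside actor but closed outside it"
result := [XMLDOMParser parseDocumentFrom:
		(ReadStream on: '<actor><actorName></actor></actorName>').
	'parsed']
		on: Error
		do: [:ex | 'rejected: ', ex messageText printString].
"Prints 'parsed' if the parser accepted the input, otherwise the error text"
Transcript show: result; cr.
```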

In the above document
<action actionNo="1" … </action>
is an example of an element that carries attributes. An attribute is a name-value pair attached to the element's start tag. The name of an attribute is separated from its value by an equals sign and optional whitespace. The value of an attribute must be enclosed in single or double quotation marks. For example, the action element has four attributes: actionNo, actorName, act and arity. Each attribute name is separated from its value by an equals sign, and the values are enclosed in double quotation marks: "1", "Bill", "sayText", and "hello everyone i am the best character I know how to talk".
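
A minimal sketch of how these name-value pairs appear after parsing, assuming the XML-Parser package is loaded. The action element is copied from the example document and parsed from a string for convenience. The variable names are illustrative.

```smalltalk
| doc action |
doc := XMLDOMParser parseDocumentFrom: (ReadStream on:
	'<action actionNo="6" actorName="Bill" act="walk" arity="200 10000"></action>').
action := doc firstTagNamed: #action.
"attributes answers a dictionary of name -> value; every value is a String"
action attributes keysAndValuesDo: [:name :value |
	Transcript show: name printString, ' = ', value printString; cr].
```

Note that all attribute values arrive as strings, including ones that look numeric, such as actionNo.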

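For an XML document to be parsed, it must be well formed, which means that it must abide by certain rules such as: there is exactly one root element; every start tag has a matching end tag; elements are properly nested and never overlap; and every attribute value is enclosed in quotation marks.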

Squeak supports two parsers for XML: SAX and DOM. If you look at the Squeak code you will see that the DOM parser is implemented on top of the SAX driver, adding wrappers around the SAX methods. SAX is event based: as it encounters tags and attributes it informs you, and it does not keep the whole document in memory, which makes it suitable for streaming XML data. DOM, on the other hand, is a tree-oriented API that reads the whole document (for example, a saved file) and treats it as a set of nested objects with various properties.

To initiate parsing, you must open the XML file. If you provide just the filename, Squeak will look for the file in the directory where the Squeak image file is stored. You can also give the full path name, which is platform dependent: Windows expects "\" as the directory separator, while Unix variants expect "/". To open the file you write:
xmlFile := FileStream fileNamed: 'play.xml'.

Once the file has been opened, it must be parsed into a DOM tree:
document := XMLDOMParser parseDocumentFrom: xmlFile.

Now you can close the XML file:
xmlFile close.

To get the list of top-level elements that the XML document is composed of:
xmlElements := document elements.

To get the root element, which holds the rest of the hierarchy:
element := xmlElements at: 1.
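
The steps above can be combined into one sketch that closes the file even if parsing fails. It assumes play.xml exists next to the image and that the XML-Parser package is loaded. The use of readOnlyFileNamed: and ensure: is a suggestion, not part of the original code.

```smalltalk
| xmlFile document root |
"readOnlyFileNamed: signals an error if play.xml is missing instead of creating it"
xmlFile := FileStream readOnlyFileNamed: 'play.xml'.
"ensure: closes the file even if the parser signals an error"
document := [XMLDOMParser parseDocumentFrom: xmlFile]
	ensure: [xmlFile close].
"The root element is the first top-level element of the document"
root := document elements first.
Transcript show: root name printString; cr.
```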

Now that the file has been opened and loaded, and DOM has created a tree structure of your elements (the code above does all of that), I will show you how to extract data from it. The code below parses an XML file of the format shown above (play).
If you want to process every occurrence of an element, instead of traversing the tree that DOM has created, you can do something like this:
element tagsNamed: #actor
	do: [:i |
		"Read the actor's name from its actorName child"
		actorNameTag := i firstTagNamed: #actorName.
		actorName := actorNameTag contentString.
		"Read the actor's costume from its bodyCostume child"
		bodCostumeTag := i firstTagNamed: #bodyCostume.
		bodCostume := bodCostumeTag contentString.
		"Build an Actor object and add it to the list"
		actorObj := Actor new.
		actorObj name: actorName.
		actorObj costume: bodCostume asLowercase.
		propList add: actorObj].

The code above goes through the XML file and visits every actor element. An actor has a hierarchical structure (it has two children: actorName and bodyCostume). The code first locates the actorName child (actorNameTag := i firstTagNamed: #actorName). It then extracts that element's data (the characters between <actorName> and </actorName>) and places it in a variable (actorName := actorNameTag contentString). In a similar manner, the bodyCostume element of that actor is located with firstTagNamed: and its content is extracted and bound to a variable. The rest of the code creates an Actor object and sets some of its instance variables.
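
firstTagNamed: answers nil when no matching child exists, so sending contentString to its result fails for an actor that lacks a child. The sketch below is one hedged way to guard against that. The sample XML, the names in it, and the 'default' fallback are made up for illustration, and the XML-Parser package is assumed to be loaded.

```smalltalk
| doc names |
"Sample data: the second actor has no bodyCostume child"
doc := XMLDOMParser parseDocumentFrom: (ReadStream on:
	'<play><actor><actorName>Bill</actorName><bodyCostume>Farmer</bodyCostume></actor><actor><actorName>Bob</actorName></actor></play>').
names := OrderedCollection new.
doc tagsNamed: #actor do: [:actor | | nameTag costumeTag costume |
	nameTag := actor firstTagNamed: #actorName.
	costumeTag := actor firstTagNamed: #bodyCostume.
	"Fall back to a placeholder when the costume element is missing"
	costume := costumeTag isNil
		ifTrue: ['default']
		ifFalse: [costumeTag contentString asLowercase].
	"Skip actors that have no name element at all"
	nameTag isNil ifFalse: [names add: nameTag contentString -> costume]].
Transcript show: names printString; cr.
```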

Now I will show you how to use the DOM parser's tree structure to extract all the data iteratively. Scroll up and take a look at the XML data. It shows that scene is the parent of block, and block is the parent of position, actor and action. We will not parse actor elements here, since the code above already did that. Each scene is processed iteratively (element tagsNamed: #scene do:). Since block is a child of scene, blocks are processed iteratively inside each scene (scn tagsNamed: #block do:). Block in turn is the parent of the actions, so actions are processed iteratively inside each block (blk tagsNamed: #action do:). Position elements can be either inside a block or outside of it, but all positions must be inside a scene. Thus all positions inside a scene, whether inside a block or not, are processed iteratively (scn tagsNamed: #position do:).
"actionList, posList, nl and prevActionNo are assumed to be initialized earlier"
element tagsNamed: #scene
	do: [:scn |
		bkGrnd := scn attributes at: 'background'.
		scn tagsNamed: #position
			do: [:pos |
				actorOrig := pos attributes at: 'actorName'.
				xPos := pos attributes at: 'xPosition'.
				yPos := pos attributes at: 'yPosition'].
		scn tagsNamed: #block
			do: [:blk |
				focus := blk attributes at: 'perspective'.
				actionList := actionList , '$ ' , nl , '-perspective' , ' ' , focus , nl.
				actionList := actionList , posList.
				posList := String new.
				blk tagsNamed: #action
					do: [:task |
						currActionNo := task attributes at: 'actionNo'.
						"Comparison with the previous action number; its result is not used in this excerpt"
						currActionNo = prevActionNo.
						actorName := task attributes at: 'actorName'.
						act := task attributes at: 'act'.
						arity := task attributes at: 'arity']]].

The code above also shows how to extract attribute data. For example, to extract the attribute data of an action tag, you must first get the element that carries the attributes. For the action element you reach it with:
 blk tagsNamed: #action do: [:task|…]
The local variable task holds the action element. To get the value of an attribute, we use that variable:
 currActionNo := task attributes at: 'actionNo'.
This code gives us the value of the attribute actionNo. The rest of the attribute values are obtained in the same manner.
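
Attribute values are always strings, so numeric data such as the arity of a walk action has to be converted before use. The sketch below shows one possible way to do that, assuming the XML-Parser package is loaded and that its XMLElement provides attributeAt:ifAbsent:, which looks up one attribute and evaluates a fallback block when it is missing. The 'none' fallback and the variable names are illustrative.

```smalltalk
| doc |
doc := XMLDOMParser parseDocumentFrom: (ReadStream on:
	'<block perspective="pan 1 400"><action actionNo="6" actorName="Bill" act="walk" arity="200 10000"></action></block>').
doc tagsNamed: #action do: [:task | | act args |
	"Supply a fallback value when the attribute is missing"
	act := task attributeAt: 'act' ifAbsent: ['none'].
	"Split the arity string on whitespace and convert each piece to a number"
	args := (task attributeAt: 'arity' ifAbsent: ['']) substrings
		collect: [:each | each asNumber].
	Transcript show: act, ' ', args printString; cr].
```

This conversion only makes sense for actions whose arity holds numbers. For actions such as sayText the arity is plain text and should be used as is.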
