View this PageEdit this PageAttachments to this PageHistory of this PageHomeRecent ChangesSearch the SwikiHelp Guide
Hotspots: Admin Pages | Turn-in Site |
Current Links: Cases Final Project Summer 2007

Sp01 Final Exam Review: Choosing your collection objects

See Final Exam Review - Sp2001

Comments? Answers? Criticisms?

a. Identify a use where Arrays are the best choice, where Bags are the best choice, and where Dictionaries are the best choice.

Arrays - fixed size collections ordered by sequence number, best for collections where the size does not change, e.g. page tables in the OS

Bags - never used it before, never needed to, can't think of anyway that is could be better than ordered collections

Dictionaries - best for uses where it is more convenient to look up for a piece of data item using a String reference instead of a number reference, e.g. used in the PWS for storing the HTML form fields

b. For figuring out if an element is in the Collection, Sets are much, MUCH faster than Arrays. Why?

What's the meaning of Sets over here? If it means hash tables (i.e. Dictionaries), then they are much faster than arrays because they can calculate the location in which the data is stored using a hashing formula and so they can access the data almost directly, whereas for arrays, they have to traverse through the entire array from beginning to end and compare each element of the array, so the average number of comparisons they have to make is equal to half the array size.

c. When are OrderedCollections much slower than Arrays? When are OrderedCollections almost the same speed as Arrays? What causes OrderedCollections to be slower?

I don't understand, when are we expected to know how OrderedCollections work internally? I'm assuming that OrderedCollections in Squeak means Linked List in the rest of the programming world.

Linked Lists are much slower than arrays when we want to access a data item using the index of the data item within the collection. With a linked list we have to start from the first element and proceed to the data item sequentially, while for an array we can access the data item directly using that index.

Linked lists are almost the same speed as arrays when we are looking for a given data item within the collection, so for both the linked list and the array, the algorithm used is to start comparing the first element of the collection sequentially down the collection till the data item is found or till the collection is exhausted.

Linked lists are slower because in order to reach a particular node at a certain index of the linked list, we have to follow the link starting from the first link till the end of the list, and we cannot skip links. Arrays are accessed directly using the offset value calculated from the index.

You may have missed the class on optimizations – that's where Bags and Sets were discussed, as well as how OrderedCollections are like/unlike arrays. Bags and Sets are WAY better than OrderedCollections – for some tasks. You might want to learn what Sets and Bags are before the final. And yes, you are expected to know how OrderedCollections work internally, since it was in the lecture. Mark Guzdial

i'll take a stab at it, someone correct me though -heh
a)a bag is great to use when you want to keep track of how many copies of an object you have added, but don't need to overhead of searching and storing identical copies! In 1312x we had a C program that kept track of how many times certain words were used in some of Shakespeare's sonnets (made a histogram). This would have worked great with Bags. Dictionaries are a subclass of Set, so obviously its a hashing collection. sad answer: dictionaries are good to use when hash tables are good to use! Arrays are great when you know how many elements you will be needing. You want the collection to be constant size.

b) sets search beginning from the (hash mod size) location within the collection. I suppose this means the hash table stores it's objects in a manner where retrieval is based on this "hashing function"!
for an array collection searching is n-time, check every spot. The Set a little more conscientious of where it puts its elements.

c)the ordered collection is like a vector. If the orderedCollection initializes with room for zero elements and then increases in size by 1 each time an element is added, this will be very inefficient. An array has its space already allocated and adding to it will have much less overhead since no dynamic growth is occurring. An ordered collection is the same speed as the array if it's very large to start out. Adding will not cause trouble with either if they have room to fill. orderedcollections are slow when they have to dynamically resize over and over. Webb
The notes on optimization are in the Extras folder on the CD – I suggest looking at them. Webb's (c) is close, but misses a couple key points. For example, do you know by how much a Squeak OrderedCollection (or a Java Vector – it's identical code, actually!) grows when it has to grow? For (a) and (b), think about operations with data structures: Insertions, deletions, searching for an element, asking if an element exists, etc. For which of these are bags and sets a couple orders of magnitude faster than arrays? And why? (Hint: There is at least one.) Mark Guzdial


b.As Webb mentioned, Sets are much faster at figuring out whether they contain a certain element because a Set is implemented as a Hash table. This means elements can be accessed according to a hash value and retrieval (or determination of existence in this case) can occur in O(1)... (with a little overhead in determining the hash value). For an Array, on the other hand, the data structure must be traversed until the element is found which is O(n) search time.

c.OrderedCollections are much slower than arrays when adding. OrderedCollections must check their bounds first, and if it is not large enough to store the element, it must "grow" which takes time. However, OrderedCollections double their size everytime they "grow" thus, they approach the speed of an Array as they get larger (a space vs. time tradeoff). OrderedCollections are also about as fast as arrays when accessing an element (about being relative since it still takes about 3 times as long).

Doug Powers

Two additions to these great replies. First, Dictionary's can use anything as keys, not just strings. Second, there is a difference between OrderedCollection's and Arrays, even when the OrderedCollection isn't growing. Can anyone describe that difference? -Lex Spoon
Accessing an element in OrderedCollection is much slower than in Array since it has to perform bounds-checking every time

Arrays are good for iterations. If you want to iterate down something to find a specific thing than an Array is the fastest. If you want to get to an element using a value then using a bag or dictionary is fastest because of the hashing stuff mentioned above.

I believe sets are collections that only contain one copy of each object. So if you tried to add another object it wouldnt let you actually it would let you but it wouldnt actually add the object. So if you wanted to build a collection of objects without duplicates you would use a set. Anyone object? Arkady

Link to this Page