Current Links: Cases Final Project Summer 2007

Los Pimps - Matching Tutorial

Matching Algorithm:

Here is an outline of the algorithm we used to implement our local search class.

For Names:
The first name received an overall weight of 20%.
The last name received an overall weight of 30%.
Combined, the names made up 50% of the overall matching weight.
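As a small worked example of these weights, here is a Python sketch that turns a first-name and a last-name similarity into the name part of the overall score. It assumes each similarity is a percentage from 0 to 100; the function and constant names are hypothetical.

```python
FIRST_NAME_WEIGHT = 20  # first name: 20% of the overall score
LAST_NAME_WEIGHT = 30   # last name: 30% of the overall score

def weighted_name_part(first_pct: float, last_pct: float) -> float:
    """Combine two name similarities (0-100 each) into 0-50 overall points."""
    return (FIRST_NAME_WEIGHT * first_pct + LAST_NAME_WEIGHT * last_pct) / 100
```

For example, two perfect name matches give weighted_name_part(100, 100) == 50.0, and similarities of 80 and 60 give 34.0.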

We used a Soundex algorithm and made a few slight modifications to it.

Format for Soundex Algorithm:
1. Retain the first letter of the name.
2. Wherever a letter is doubled (two identical letters right next to each other), remove the duplicate, such as the second t in "letters".
3. Remove all vowels from the rest of the name.
4. Replace each remaining letter, except the very first one, with its number:
1 = 'B' 'F' 'P' 'V'
2 = 'C' 'G' 'J' 'K' 'Q' 'S' 'X' 'Z'
3 = 'D' 'T'
4 = 'L'
5 = 'M' 'N'
6 = 'R'
5. Add trailing zeros if there are fewer than three digits.
6. Drop any digits after the third; exactly three digits are needed.
7. The final form should be a letter followed by three digits (for example, Robert becomes R163).
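A minimal Python sketch of these steps follows. It is not the project's original code (the page does not include any); the function and table names are hypothetical, and it assumes that letters missing from the number table (H, W and Y, as well as the vowels) are simply dropped. It follows the steps as written, which collapse repeated letters; standard Soundex instead collapses adjacent letters that share a digit, so some codes will differ from the standard ones.

```python
# Number table from step 4; letters not listed (vowels, H, W, Y) get no digit.
SOUNDEX_DIGITS = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

def soundex(name: str) -> str:
    """Encode a name as letter-digit-digit-digit following the steps above."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    # Step 2: collapse doubled letters that sit next to each other ("TT" -> "T").
    collapsed = [letters[0]]
    for c in letters[1:]:
        if c != collapsed[-1]:
            collapsed.append(c)
    # Step 1: keep the first letter as-is.
    first, rest = collapsed[0], collapsed[1:]
    # Steps 3-4: drop vowels (and other unlisted letters), map the rest to digits.
    digits = "".join(SOUNDEX_DIGITS[c] for c in rest if c in SOUNDEX_DIGITS)
    # Steps 5-6: pad with zeros or truncate to exactly three digits.
    return first + (digits + "000")[:3]
```

For example, soundex("Robert") returns "R163" and soundex("Lee") returns "L000".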

Given two codes in this letter-digit-digit-digit form, we did the following:

1) Compared the first letters
  • The first letter accounted for 30% of the name percentage.
  • If the letters were identical, it received the full 30%.
  • If the letters were an E and an I, respectively, it received a weight of 20%.
  • If one name's first letter was a T and its next letter was the same as the first letter of the other name, it received a weight of 20%, since a few names begin with a silent T, as is the case with Tchebysheff and Chebyshev.
  • If one name began with "ph" and the other with "f", it received a weight of 20%.
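The first-letter rules can be sketched in Python as below. Because the silent-T and "ph"/"f" rules need the original spelling, this sketch assumes the comparison is done on the raw names rather than on the Soundex codes. It also assumes each rule applies in either order, and it reads the 20% weights as 20 out of the 30 points available for the first letter. All names are hypothetical.

```python
FULL_LETTER_POINTS = 30     # identical first letters
PARTIAL_LETTER_POINTS = 20  # E/I, silent T, or PH/F

def first_letter_points(name_a: str, name_b: str) -> int:
    """Score the first-letter part of a name comparison (0, 20, or 30)."""
    a, b = name_a.upper(), name_b.upper()
    if not a or not b:
        return 0
    if a[0] == b[0]:
        return FULL_LETTER_POINTS
    # Assumed symmetric: E vs I in either order.
    if {a[0], b[0]} == {"E", "I"}:
        return PARTIAL_LETTER_POINTS
    # Check both directions for a silent leading T and for PH vs F.
    for x, y in ((a, b), (b, a)):
        if x[0] == "T" and len(x) > 1 and x[1] == y[0]:
            return PARTIAL_LETTER_POINTS
        if x.startswith("PH") and y[0] == "F":
            return PARTIAL_LETTER_POINTS
    return 0
```

With this sketch, first_letter_points("Tchebysheff", "Chebyshev") gives 20 and first_letter_points("Robert", "Rupert") gives 30.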

2) Compared the digits
  • If all three digits were the same and in the same order, it received a weight of 70%, the highest for this part.
  • If all the digits were the same but in a different order, it received a weight of 60%.
  • If two digits were correct and in the same position, it received a weight of 50%.
  • If two digits were correct but in the wrong order, it received a weight of 30%.
  • If one digit was correct and in the right place, it received a weight of 20%.
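Here is one possible Python reading of the digit rules. It assumes "correct but in the wrong order" means digits the two codes share regardless of position (repeats counted via a multiset intersection), and that a single correct digit in the wrong place scores nothing, since the rules above give it no weight. The function name is hypothetical.

```python
from collections import Counter

def digit_points(code_a: str, code_b: str) -> int:
    """Score the three Soundex digits of two codes (0 to 70 points)."""
    da, db = code_a[1:4], code_b[1:4]
    # Digits that match in the same position.
    same_place = sum(x == y for x, y in zip(da, db))
    # Digits the codes share regardless of position (counting repeats).
    shared = sum((Counter(da) & Counter(db)).values())
    if same_place == 3:
        return 70
    if shared == 3:
        return 60
    if same_place == 2:
        return 50
    if shared >= 2:
        return 30
    if same_place == 1:
        return 20
    return 0
```

For example, digit_points("R163", "R631") gives 60 (same digits, different order) and digit_points("R163", "R160") gives 50.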

3) Checked the total weight for names
  • If the total percentage for a name was below 60%, it was discarded, because we felt this was not enough to be considered a match.
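The threshold check can be sketched in Python as follows, taking the first-letter points (0-30) and the digit points (0-70) from the two comparisons above. Returning None to mark a discarded name is an assumption about how "discarded" would be represented; the names are hypothetical.

```python
from typing import Optional

NAME_MATCH_THRESHOLD = 60  # percent; anything lower is not considered a match

def name_percent(letter_points: int, digit_points: int) -> Optional[int]:
    """Total a name comparison (0-100); None means the name was discarded."""
    total = letter_points + digit_points
    if total < NAME_MATCH_THRESHOLD:
        return None
    return total
```

Using whole-number points keeps the comparison with 60 exact, so a name scoring exactly 60% is kept rather than lost to floating-point rounding.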


For Other Information:
The rest of the information was used as additional search criteria: birth date, birth location, death date, death location, and sex. Each one received 20% of the weight for the other-information part, and this part as a whole accounted for 50% of the overall weight. Together with the 50% from the names, this gives a full 100% weight across all search criteria, so a perfect match is indicated by a priority of 100%.
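The arithmetic above can be written out as a small Python function. It assumes the name part has already been weighted down to 0-50 points and that each of the five criteria contributes either 0 or 20 percent of the other-information part; the names are hypothetical.

```python
from typing import Sequence

OTHER_INFO_WEIGHT = 50  # the other-information part is 50% of the overall score

def overall_priority(name_part: float, field_points: Sequence[int]) -> float:
    """Combine the name part (0-50) with the other-information criteria.

    field_points holds one entry per criterion (sex, birth date, death date,
    birth location, death location), each 0 or 20 percent of the other part.
    """
    other_percent = sum(field_points)  # 0-100
    return name_part + OTHER_INFO_WEIGHT * other_percent / 100
```

For a perfect match, overall_priority(50, [20, 20, 20, 20, 20]) returns 100.0, the 100% priority described above.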

1) Compared sex
  • If the same, a full 20% was given; otherwise 0%.

2) Compared birth date
  • If the same, a full 20% was given; otherwise 0%.

3) Compared death date
  • If the same, a full 20% was given; otherwise 0%.

4) Compared birth location
  • Used the same approach as for the names: the Soundex algorithm with the same comparisons explained above. If a match was found, a full 20% was given; otherwise 0%.

5) Compared death location
  • Used the same approach as for the names: the Soundex algorithm with the same comparisons explained above. If a match was found, a full 20% was given; otherwise 0%.
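The five comparisons can be sketched in Python as below. The record layout (a mapping with the field names shown) is hypothetical, since the page does not describe it, and locations_match is a caller-supplied stand-in for the Soundex-based location comparison. This sketch does not apply the nil rule from the note that follows.

```python
from typing import Callable, Mapping

FIELD_POINTS = 20  # each criterion is 20% of the other-information part

# Hypothetical field names; the page does not describe the record layout.
EXACT_FIELDS = ("sex", "birth_date", "death_date")
LOCATION_FIELDS = ("birth_location", "death_location")

def other_info_percent(a: Mapping[str, str], b: Mapping[str, str],
                       locations_match: Callable[[str, str], bool]) -> int:
    """Return the other-information percentage (0-100) for two records."""
    total = 0
    # Sex and dates: all or nothing on exact equality.
    for field in EXACT_FIELDS:
        if a[field] == b[field]:
            total += FIELD_POINTS
    # Locations: delegate to a Soundex-style comparison supplied by the caller.
    for field in LOCATION_FIELDS:
        if locations_match(a[field], b[field]):
            total += FIELD_POINTS
    return total
```

Passing the location comparison in as a function keeps this sketch independent of how the Soundex matching itself is implemented.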



NOTE: In all cases, a nil item was considered a perfect match for that criterion.
We decided to do this because the missing value could very well match, and we did not want to
leave out a potential search match.
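One way to layer the nil rule over any single criterion is a small Python wrapper like the one below, with Python's None playing the role of nil. It assumes a missing value on either side counts as a perfect match; the helper name and the example criterion are hypothetical.

```python
from typing import Any, Callable, Optional

def nil_is_match(compare: Callable[[Any, Any], int],
                 full_points: int) -> Callable[[Optional[Any], Optional[Any]], int]:
    """Wrap a criterion so a missing (None) value on either side scores full points."""
    def wrapped(a: Optional[Any], b: Optional[Any]) -> int:
        if a is None or b is None:
            return full_points
        return compare(a, b)
    return wrapped

# Example: the sex criterion, worth 20 points of the other-information part.
sex_points = nil_is_match(lambda a, b: 20 if a == b else 0, 20)
```

Here sex_points(None, "F") gives the full 20, while sex_points("M", "F") gives 0.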


Things to Consider:
For this type of matching, there are many inaccuracies. We tried to account for some of them by altering our algorithm from the standard Soundex algorithm we found online, so for some names we got better search results than we would have with the unmodified algorithm. However, this algorithm is far from perfect and would need further modifications, depending on what one is trying to accomplish with a search algorithm.
