skip to our work or skip to our software

Add “did you mean” function to website, mySQL « Media Modus. A creative, ideas led, website production & consultancy company. - +44 (0) 845 535 1716

« Adobe Spry Tooltip Flicker FIX

Add “did you mean” function to website, mySQL

did you mean?

did you mean?

Say what?

Like Google you might want to offer “did you mean” functionality to your site visitors, not sure where to start? or just looking for a quick fix! then you’r in the right place…

To kick off with it’s worth noting that we are going to be using mySQL Stored procedures, along with the various mySQL built in functions, such as: SOUNDEX() et al.

Lets start by building some key parts of our database, begining with the “tbl_dictionary”, we are going to need a dictionary to compare our users input against, below I show you the table structure.

1. tbl_dictionary


CREATE TABLE `tbl_dictionary` (
`fld_id` int(11) NOT NULL auto_increment,
`fld_word` varchar(255) character set latin1 default NULL,
`fld_relation` int(11) default '0',
PRIMARY KEY (`fld_id`),
FULLTEXT KEY `iword` (`fld_word`)
) ENGINE=MyISAM AUTO_INCREMENT=82610 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Dictionary UK Word List';

you can download the full dictionary (tbl_dictionary) SQL dump here (should make life a little easier)

2. LEVENSHTEIN Stored Procedure ( Function )

The Levenshtein distance is a metric for measuring the amount of difference between two sequences such as a string, in our case that’s the difference between our users input and the strings (records) in our database table.

Here is the DDL for the function:


CREATE FUNCTION 'LEVENSHTEIN'(s1 VARCHAR(255), s2 VARCHAR(255)) RETURNS int(11)
BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
DECLARE s1_char CHAR;
DECLARE cv0, cv1 VARBINARY(256);
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
IF s1 = s2 THEN
RETURN 0;
ELSEIF s1_len = 0 THEN
RETURN s2_len;
ELSEIF s2_len = 0 THEN
RETURN s1_len;
ELSE
WHILE j <= s2_len DO
SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
END WHILE;
WHILE i <= s1_len DO
SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
WHILE j <= s2_len DO
SET c = c + 1;
IF s1_char = SUBSTRING(s2, j, 1) THEN SET cost = 0; ELSE SET cost = 1; END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
IF c > c_temp THEN SET c = c_temp; END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
IF c > c_temp THEN SET c = c_temp; END IF;
SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
END WHILE;
SET cv1 = cv0, i = i + 1;
END WHILE;
END IF;
RETURN c;
END;

3.0 LEVENSHTEIN_RATIO Stored Procedure ( Function )

Now we create our next stored function, this one utilises the Levenshtein one we just created, it returns a value between 0 - 100 that indicates how close the match is.

Here’s the DDL


CREATE FUNCTION `LEVENSHTEIN_RATIO`(s1 VARCHAR(255), s2 VARCHAR(255)) RETURNS int(11)
BEGIN
DECLARE s1_len, s2_len, max_len INT;
SET s1_len = LENGTH(s1), s2_len = LENGTH(s2);
IF s1_len > s2_len THEN SET max_len = s1_len; ELSE SET max_len = s2_len; END IF;
RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100);
END;

4.0 Our final Stored Procedure DID_YOU_MEAN ( returns dataset )

This function will return a single record as defined by the LIMIT 0,1 in the SP. You could change this and it would return a list in order of closest match.

The first parameter takes a string ( word ) to compare and the second parameter takes an integer that represents the threshold to prevent completely incorrect suggestions being returned, you should play with this figure to find one that best suits your enviroment.

[code]
CREATE PROCEDURE `DID_YOU_MEAN`(IN param1 VARCHAR(255), IN param2 INTEGER(2))
BEGIN
SELECT
fld_id,
tbl_dictionary.fld_word,
LEVENSHTEIN_RATIO(param1,tbl_dictionary.fld_word) as Ratio
FROM
tbl_dictionary
WHERE
SOUNDEX(param1) = SOUNDEX(tbl_dictionary.fld_word)
HAVING
LEVENSHTEIN_RATIO(param1,tbl_dictionary.fld_word) > param2
ORDER BY Ratio DESC
LIMIT 0,1;
END
[/code]

5.0 How to use

Its straight forward from here..

Call
DID_YOU_MEAN(’Cheeze’,50) and we get as a result:
–> 11691 | cheese | 83

That’s all folks, enjoy :P

Your Comments are Welcome!

  • Share/Bookmark

Tags: , , , ,

Leave a Reply

Media Modus. A creative, ideas led, website production & consultancy company. - +44 (0) 845 535 1716 was built by mediamodus.com and powered by vonhost.co.uk
You can subscribe to either entries (RSS) and comments (RSS). xxii09150673 Advertise on this site

                   listenability, compliance, and navigatability logo