Thursday, August 25, 2011

MySQL FULLTEXT Searches

Want to search a particular string in MYSQL database? then make use of Full Text Search techniques

There are 3 types of Full Text Search techniques available:

1.Natural language Full-Text Search
2.Boolean Full-Text Search
3.Query Expansion Search

For the above searches we make use of MATCH(col1,col2..) AGAINST(expr [IN NATURAL LANGUAGE MODE/IN BOOLEAN MODE/WITH QUERY EXPANSION])

One or more columns can be provided in the MATCH function. Search string is given as argument to AGAINST function.
  • Natural language Full-Text Search(is used by default)
  • Boolean Full-Text Search
  • Query Expansion Search


Natural language Full-Text Search can be applied only on MYISAM tables. For performing the search we create an FULLTEXT index on the particular columns to be searched. Non indexed columns cannot be searched. FULLTEXT index can be created while creating a table or can be altered later to add the index. It is better to first create a table, upload the data and then add the index. This is because when large data sets must be loaded to table, FULLTEXT indexed table takes larger time.This search interprets the search string as a phrase in natural language. There are no special operators.

By default the search is not case sensitive. To make search case sensitive, we need to go for binary collation for the indexed columns.For example, a column that uses the latin1 character set can be assigned a collation of latin1_bin to make it case sensitive for full-text searches.

This returns rows basing on the relevance. Relevance is calculated basing on the number of times the search string occurs in a particular row(from higher relevance to lower relevance).

Limitations:

Words that are present in 50% or more of the rows are considered common and do not match the selection.The stopword list applies.

Syntax:

select * from table_name where MATCH(indexed columns seperated by comma) AGAINST(SEARCH STRING IN '');

Ex:
select * from test where MATCH(column1,column2) AGAINST('search string');
                                                           OR
select * from test where MATCH(column1,column2) AGAINST('search string' IN NATURAL LANGUAGE MODE);

A boolean search interprets the search string using the rules of a special query language. The string contains the words to search for. It can also contain operators that specify requirements such that a word must be present or absent in matching rows, or that it should be weighted higher or lower than usual. Common words such as “some” or “then” are stopwords and do not match if present in the search string.

This does not follow any relevance order as Natural Language Search.
Boolean search can be performed even without creating an index on the columns.But this type of search would be slow.
This does not have 50% matching criteria as of Natural Language Search.

This search makes use of following operators:
+
A leading plus indicates this word must 
A leading minus indicates that the rows returned must not contain the specified word.
[no operator] implies optional

Syntax:

select * from table_name where MATCH(indexed columns seperated by comma) AGAINST(SEARCH STRING IN '' IN BOOLEAN MODE);

Ex:
select * from test where MATCH(column1,column2) AGAINST('search string' IN BOOLEAN MODE);

TheBlind Query Expansion (or automatic relevance feedback)feature can be used to expand the results of the search. This often includes much more noise, and makes for a very fuzzy search.In most cases you would use this operation if the users query returned just a few results, you try it again withe query expansion and it will add words that are commonly found with the words in the query.

Syntax:

select * from table_name where MATCH(indexed columns seperated by comma) AGAINST(SEARCH STRING IN '' WITH QUERY EXPANSION);

Ex:
select * from test where MATCH(column1,column2) AGAINST('search string' WITH QUERY EXPANSION);


FEW IMPORTANT POINTS:

Use the following query to query the variables status related to FULL TEXT search

show variables like 'ft%';
  • Searches are not case sensitive.
  • Short words are ignored, the default minimum length is 4 characters. You  can change the min and max word length with the variables ft_min_word_len and ft_max_word_len .
  • Words called stopwords are ignored, you can specify your own stopwords, but default words include thehave,some
  • You can disable stopwords by setting the variable ft_stopword_file to an empty string.
  • Full Text searching is only supported by the MYISAM storage engine.
  • If a word is present in more than 50% of the rows it will have a weight of zero. This has advantages on large datasets, but can make testing difficult on small ones(applicable only to Natural language Search).

No comments:

Post a Comment