Want to search a particular string in MYSQL database? then make use of Full Text Search techniques
There
are 3 types of Full Text Search techniques available:
1.Natural
language Full-Text Search
2.Boolean
Full-Text Search
3.Query
Expansion Search
For
the above searches we make use of MATCH(col1,col2..) AGAINST(expr [IN
NATURAL LANGUAGE MODE/IN BOOLEAN MODE/WITH QUERY EXPANSION])
One
or more columns can be provided in the MATCH function. Search string
is given as argument to AGAINST function.
- Natural language Full-Text Search(is used by default)
- Boolean Full-Text Search
- Query Expansion Search
Natural
language Full-Text Search can be applied only on MYISAM tables. For
performing the search we create an FULLTEXT index on the particular
columns to be searched. Non indexed columns cannot be searched.
FULLTEXT index can be created while creating a table or can be
altered later to add the index. It is better to first create a table,
upload the data and then add the index. This is because when large
data sets must be loaded to table, FULLTEXT indexed table takes
larger time.This search interprets the search string as a phrase in
natural language. There are no special operators.
By
default the search is not case sensitive. To make search case
sensitive, we need to go for binary collation for the indexed
columns.For example, a column that uses the latin1 character set can
be assigned a collation of latin1_bin to make it case sensitive for
full-text searches.
This
returns rows basing on the relevance. Relevance is calculated basing
on the number of times the search string occurs in a particular
row(from higher relevance to lower relevance).
Limitations:
Words
that are present in 50% or more of the rows are considered common and
do not match the selection.The stopword list applies.
Syntax:
select
* from table_name where MATCH(indexed columns seperated by comma)
AGAINST(SEARCH STRING IN '');
Ex:
select
* from test where MATCH(column1,column2) AGAINST('search string');
OR
select
* from test where MATCH(column1,column2) AGAINST('search string' IN
NATURAL LANGUAGE MODE);
A
boolean search interprets the search string using the rules of a
special query language. The string contains the words to search for.
It can also contain operators that specify requirements such that a
word must be present or absent in matching rows, or that it should be
weighted higher or lower than usual. Common words such as “some”
or “then” are stopwords and do not match if present in the search
string.
This
does not follow any relevance order as Natural Language Search.
Boolean
search can be performed even without creating an index on the
columns.But this type of search would be slow.
This
does not have 50% matching criteria as of Natural Language Search.
This
search makes use of following operators:
+
A
leading plus indicates this word must
-
A
leading minus indicates that the rows returned must not contain the
specified word.
[no
operator] implies optional
Syntax:
select
* from table_name where MATCH(indexed columns seperated by comma)
AGAINST(SEARCH STRING IN '' IN BOOLEAN MODE);
Ex:
select
* from test where MATCH(column1,column2) AGAINST('search string' IN
BOOLEAN MODE);
TheBlind
Query Expansion (or automatic relevance feedback)feature
can be used to expand the results of the search. This often includes
much more noise, and makes for a very fuzzy search.In most cases you
would use this operation if the users query returned just a few
results, you try it again withe query expansion and it will add
words that are commonly found with the words in the query.
Syntax:
select
* from table_name where MATCH(indexed columns seperated by comma)
AGAINST(SEARCH STRING IN '' WITH QUERY EXPANSION);
Ex:
select
* from test where MATCH(column1,column2) AGAINST('search string' WITH
QUERY EXPANSION);
FEW
IMPORTANT POINTS:
Use
the following query to query the variables status related to FULL
TEXT search
show
variables like 'ft%';
- Searches are not case sensitive.
- Short words are ignored, the default minimum length is 4 characters. You can change the min and max word length with the variables ft_min_word_len and ft_max_word_len .
- Words called stopwords are ignored, you can specify your own stopwords, but default words include the, have,some
- You can disable stopwords by setting the variable ft_stopword_file to an empty string.
- Full Text searching is only supported by the MYISAM storage engine.
- If a word is present in more than 50% of the rows it will have a weight of zero. This has advantages on large datasets, but can make testing difficult on small ones(applicable only to Natural language Search).