Class Ferret::Search::FuzzyQuery
In: ext/r_search.c
Parent: Ferret::Search::Query

Summary

FuzzyQuery uses the Levenshtein distance formula to measure the similarity between two terms. For example, weak and week differ by one letter and are four characters long, so their similarity is 1 - 1/4 = 75%, or 0.75. You can use this query to match terms that are very close to the search term.
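The calculation above can be sketched in plain Ruby. This is a minimal illustration, not Ferret's implementation: the helper names levenshtein and similarity are hypothetical, and the scoring rule (1 minus the edit distance divided by the length of the shorter term) is an assumption chosen because it reproduces the weak/week example.

```ruby
# Hypothetical helper (not Ferret API): classic dynamic-programming
# Levenshtein distance, i.e. the minimum number of single-character
# insertions, deletions and substitutions that turn a into b.
def levenshtein(a, b)
  prev = (0..b.length).to_a            # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                         # distance from a[0, i] to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,            # deletion
               curr[j - 1] + 1,        # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: assumed score of 1 - distance / shorter length.
# Empty terms are treated as non-matching.
def similarity(a, b)
  shorter = [a.length, b.length].min
  return 0.0 if shorter.zero?
  1.0 - levenshtein(a, b).to_f / shorter
end

puts similarity("weak", "week")  # one substitution over four characters
```

Here "weak" and "week" are one substitution apart, so the assumed score is 1 - 1/4 = 0.75, matching the figure quoted above.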

Example

FuzzyQuery can be quite useful for finding documents that wouldn't normally be found because of typos.

  FuzzyQuery.new(:field, "google",
                 :min_similarity => 0.6,
                 :prefix_length => 2)
  # matches => "gogle", "goggle", "googol", "googel"

Methods

Public Class methods

Create a new FuzzyQuery that will match terms with a similarity of at least +:min_similarity+ to +term+. Similarity is scored using the Levenshtein edit distance formula. See en.wikipedia.org/wiki/Levenshtein_distance

If a +:prefix_length+ greater than 0 is specified, matching terms must also share their first +:prefix_length+ characters with +term+.

You can also set +:max_terms+ to cap the number of terms the query expands to, which prevents memory overflow problems. It defaults to 512.
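The three options can be pictured as filters over the candidate terms of a field. The Ruby sketch below is a hypothetical model, not Ferret's code: fuzzy_expand, levenshtein and similarity are illustrative names, the scoring rule (1 - distance / shorter length) is assumed, and keeping the best-scoring terms when +:max_terms+ is exceeded is an assumption about which subset survives.

```ruby
# Hypothetical helpers for illustration only; none of these are Ferret API.

# Dynamic-programming Levenshtein distance (insert/delete/substitute cost 1).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Assumed score: 1 - distance / length of the shorter term.
def similarity(a, b)
  shorter = [a.length, b.length].min
  shorter.zero? ? 0.0 : 1.0 - levenshtein(a, b).to_f / shorter
end

# Keep candidates that share the first prefix_length characters of term and
# score at least min_similarity; return at most max_terms, best score first
# (ties broken alphabetically).
def fuzzy_expand(term, candidates, min_similarity: 0.5, prefix_length: 0, max_terms: 512)
  prefix = term[0, prefix_length]
  candidates
    .select { |t| t.start_with?(prefix) }
    .map { |t| [t, similarity(term, t)] }
    .select { |_t, score| score >= min_similarity }
    .sort_by { |t, score| [-score, t] }
    .first(max_terms)
    .map(&:first)
end

terms = %w[gogle goggle googol googel yahoo foogle google]
p fuzzy_expand("google", terms, min_similarity: 0.6, prefix_length: 2)
```

With the settings from the "google" example, this model keeps gogle, goggle, googol and googel (plus google itself). With a prefix length of 0, "foogle" would also qualify (similarity 1 - 1/6, about 0.83), which is the kind of match the prefix requirement rules out.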

Example

  FuzzyQuery.new(:content, "levenshtein",
                 :min_similarity => 0.8,
                 :prefix_length => 5,
                 :max_terms => 1024)
field: the field to search
term: the term to search for, including its close matches
:min_similarity: Default: 0.5. The minimum Levenshtein similarity score a term needs in order to match.
:prefix_length: Default: 0. The length of the prefix that must match exactly before the Levenshtein distance is measured. This parameter is used to improve performance. With a +:prefix_length+ of 0, all terms in the index must be checked, which can be quite a performance hit. Setting the prefix length to a larger number minimizes the number of terms that need to be checked. Even 1 will cut down the work by a factor of about 26, depending on your character set and the first letter.
:max_terms: Default: 512. Limits the number of terms that can be added to the query when it is expanded as a MultiTermQuery. This is not usually a problem with FuzzyQueries unless you set +:min_similarity+ to a very low value.
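Why does +:prefix_length+ help so much? One way to picture it: in a sorted term dictionary, a prefix lets the scan start at the first matching term and stop as soon as terms stop sharing the prefix, so only that slice is ever scored. The sketch below models this with a sorted Ruby array and binary search. The helper name terms_with_prefix is hypothetical, and treating Ferret's term dictionary as a sorted array is an assumption made for illustration.

```ruby
# Hypothetical helper (not Ferret API). Assumes the field's terms are kept in
# sorted order, as in an inverted index's term dictionary.
def terms_with_prefix(sorted_terms, prefix)
  # Find-minimum binary search: index of the first term that sorts >= prefix.
  start = sorted_terms.bsearch_index { |t| t >= prefix }
  return [] if start.nil?
  # In sorted order all terms sharing the prefix are contiguous, so stop at
  # the first one that doesn't.
  sorted_terms.drop(start).take_while { |t| t.start_with?(prefix) }
end

dictionary = %w[yahoo google apple gogle googol banana goggle googel].sort
p terms_with_prefix(dictionary, "").size    # prefix length 0: every term
p terms_with_prefix(dictionary, "go").size  # prefix "go": only the "go..." slice
```

An empty prefix (+:prefix_length+ of 0) returns the whole dictionary, so every term must be scored, while even a one- or two-character prefix restricts the Levenshtein work to a contiguous slice. Array#bsearch_index requires Ruby 2.3 or later.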

Public Instance methods
