Searching peoples names with SQL Server Full Text Search

  2012-11-21


Suppose you have a Users table as such:

CREATE TABLE [dbo].[Users]
(
    Id bigint NOT NULL,
    Name nvarchar(80) NOT NULL,
    CONSTRAINT PK_Users PRIMARY KEY CLUSTERED (Id ASC)
)

The Name column can contain first name and/or last name or pretty much anything really (separated by spaces). All these are valid values:

  • John Mann
  • Jane Doe
  • Curtis James Jackson III
  • The Game
  • Jay-Z
  • Peter
  • My name is Johnson
  • Joanna

Now you are required to implement a search on the Users.Name column. The requirements are that it’s possible to search by any part of the name, so if you entered “Jackson”, “Curtis James Jackson III” should show up.

To implement such search you could simply use a query like this:

SELECT * FROM Users WHERE Name LIKE '%query%'

However the performance of such a query would be terrible for any database of significant size. Also such query would produce unwanted results, because it disregards word boundaries. So for example if you searched for “ann”, you would get “John Mann” and “Joanna”, which aren’t useful results. Another problem is that if you searched for “Doe Jane”, you wouldn’t get any results, however searching for “Jane Doe” you do.

For all these reasons, using LIKE '%query%' to search for multi-part names is not a good idea. Instead we should use SQL Server’s Full-Text Search feature, which avoids the above-described disadvantages and provides good performance.

To implement full-text search on Users.Name column, we first need to create a full-text catalog:

CREATE FULLTEXT CATALOG [FC_Content]

Then we need to create a full-text index on the Users.Name column:

CREATE FULLTEXT INDEX
    ON [Users]
    (
        Name LANGUAGE 1033
    )
    KEY INDEX [PK_Users]
    ON [FC_Content]
    WITH CHANGE_TRACKING AUTO

Now the full-text search on Users.Name is available and we can utilize with the help of CONTAINS keyword:

SELECT *
    FROM Users
    WHERE CONTAINS(Name, '"John*"')

The query would return “John Mann” and “My name is Johnson”. One minor difficulty with this approach though is that you have to make sure the query is in the correct format before sending it to the SQL Server. So for example if user searched for John, you need to convert it to "John*" (notice the double quotes). The double quotes indicate that it’s a single term for which we are searching and the asterisk indicates a wildcard.

Suppose user has searched for Curt Jack, the query you send to the SQL Server should be "Curt*" AND "Jack*", which would return “Curtis James Jackson III”. The other good thing is that order of terms doesn’t matter, so if user searched for Jack Curt, your SQL Server query would be "Jack*" AND "Curt*", which would still yield the same result.

I suggest converting user query to the SQL-Server-friendly CONTAINS query in your application code (as opposed to inside stored procedure or other SQL Script). Here is my C# code for doing so:

/// <summary>
/// Converts user-entered search query into a query that can be consumed by CONTAINS keyword of SQL Server.
/// </summary>
/// <example>If query is "John S Ju", the result will be "\"John*\" AND \"S*\" AND \"Ju*\"".</example>
/// <param name="query">Query entered by user.</param>
/// <returns>String instance.</returns>
public static string GetContainsQuery(string query)
{
    string containsQuery = string.Empty;

    var terms = query.Split(new[] { ' ' }, StringSplitOptions.None);

    if (terms.Length > 1)
    {
        for (int i = 0; i < terms.Length; i++)
        {
            string term = terms[i].Trim();

            // Add wildcard term, e.g. - "term*". The reason to add wildcard is because we want
            // to allow search by partially entered name parts (partially entered first name and/or
            // partially entered last name, etc).
            containsQuery += "\"" + term + "*\"";

            // If it's not the last term.
            if (i < terms.Length - 1)
            {
                // We want all terms inside user query to match.
                containsQuery += " AND ";
            }
        }

        containsQuery = containsQuery.Trim();
    }
    else
    {
        containsQuery = "\"" + query + "*\"";
    }

    return containsQuery;
}

This is all you need to implement the full-text search on User.Names. The result is that you have a search which will act very similar to how facebook’s search acts. Great stuff for very little effort!

One thing that I did not investigate is the use of language and whether it is useful at all, since names aren’t actual words. However that is part of optimization that can be done later if needed. For now I would stick with having language at 1033 (English).

22bugs.co © 2017. All rights reserved.