How to Use Lucene in a .NET Core Project

Search is a critical part of any application. Users expect search that works like Google: they enter free-form query text and get results ranked by relevance. If you are using a database such as MS SQL Server, you can turn on its full-text search capability and run searches that are not strictly limited to queries against a field or group of fields. This approach limits you to the text analysis done by the database server's built-in component, and you do not have much control over how search results are indexed and scored.

There is another option for your .NET Core applications. Lucene is a long-established search library that powers services like Solr, and there is a long list of search applications built on it. Lucene is a Java-based library. For quite some time, the .NET community has worked on a port of this Java library for .NET applications. The port has not kept pace with Lucene development, but the current version of Lucene.Net has most of the core capabilities of the Java-based Lucene engine. In this post, I will quickly introduce how you can use the Lucene.Net engine in your .NET Core application.

Any good search engine performs the following basic tasks:

  • Index the data in a searchable data structure and format
  • Read the indexed data
  • Search the indexed data

Lucene.Net provides the APIs to perform these tasks, along with utilities that parse your search queries into a form that can be run against the indexed data.

Getting Started

First things first. You will need to add a reference to the Lucene.Net library to your .NET project. Open the NuGet package manager for your project and search for Lucene. You will see a few Lucene-based entries in the list.

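As of this writing, the latest version is 4.8beta. Because it is a beta, make sure the prerelease option is enabled in the package manager so that the package shows up. Add a reference to this library, and NuGet will bring in the packages it depends on. If StandardAnalyzer or QueryParser does not resolve afterwards, add the Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages as well, since those classes ship in separate packages.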

Index Writer

The next step is to index your data. In Lucene's terms, you need to set up an IndexWriter. In our HomeRP® portal, we have refactored that process into a class of its own. The following code snippet shows how the Lucene IndexWriter is created.

public static SearchIndexWriter Create(Organization organization, SearchConfig config, Analyzer analyzer)
        {
            var indexFolder = Path.Combine(config.IndexRootFolder, $"lucene_index_{organization.Id}");
            var indexDirectory = CreateIndexDirectory(indexFolder);
            var writerConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
            {
                OpenMode = OpenMode.CREATE_OR_APPEND,
                WriteLockTimeout = Lock.LOCK_POLL_INTERVAL * 2
            };
            var indexWriter = new IndexWriter(indexDirectory, writerConfig);
            return new SearchIndexWriter(indexFolder, indexWriter);
        }

The index needs to be written to a folder, so create the folder where you want to store the index files. How you structure the index folders depends on your application. Our portal is a multi-tenant application, so we keep a separate index folder for each tenant. The first line of the code builds the index folder path for each client.
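The CreateIndexDirectory helper called in the snippet above is not shown in this post. A minimal sketch of what such a helper might look like follows; the class name IndexDirectoryFactory and the body are assumptions for illustration, not the portal's actual code.

```csharp
using Lucene.Net.Store;

// Requires the Lucene.Net NuGet package.
public static class IndexDirectoryFactory
{
    // Hypothetical stand-in for the CreateIndexDirectory helper used above.
    public static FSDirectory CreateIndexDirectory(string indexFolder)
    {
        // System.IO.Directory is fully qualified on purpose: Lucene.Net.Store
        // also defines a Directory type, so the short name can be ambiguous.
        // CreateDirectory does nothing if the folder already exists.
        System.IO.Directory.CreateDirectory(indexFolder);

        // FSDirectory.Open returns a file-system-backed Lucene directory.
        return FSDirectory.Open(indexFolder);
    }
}
```

The returned FSDirectory is disposable, so whichever class owns the IndexWriter should dispose the directory along with the writer.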

Add Indexed Records

After creating the IndexWriter, the next step is to start adding data to your index. In Lucene terms, these records are called documents. The following code snippet shows how data from multiple database tables and related code objects is added to the index.

public Task AddAssetRecord(Asset asset, IList<AssetCategory> categories, 
                Manufacturer? manufacturer = null, Vendor? vendor = null)
        {
            var searchDocument = CreateSearchDocument(asset, categories, manufacturer, vendor);
            _indexWriter.AddDocument(searchDocument);
            _updatedOnUtc = DateTime.UtcNow;
            // Nothing is awaited here, so return a completed task instead of marking the method async.
            return Task.CompletedTask;
        }

private static Document CreateSearchDocument(Asset asset, IList<AssetCategory> categories, 
   Manufacturer? manufacturer = null, Vendor? vendor = null)
        {
            // build content that needs to be indexed.
            var indexedContent = new StringBuilder(100);
            var document = new Document();
            // Add asset id that uniquely identifies records.
            document.AddStringField("AssetId", asset.Id.ToString(), Field.Store.YES);
            // Add asset name field
            document.AddStringField("Name", asset.Name, Field.Store.YES);
            indexedContent.Append(asset.Name);
            if (!string.IsNullOrEmpty(asset.DisplayName))
            {
                indexedContent.Append($" {asset.DisplayName} ");
            }
            // Add make and model information
            document.AddStringField("Model", 
                   string.IsNullOrEmpty(asset.MakeModel) ? "" : asset.MakeModel, Field.Store.YES);
            if (!string.IsNullOrEmpty(asset.MakeModel))
            {
                indexedContent.Append($" {asset.MakeModel} ");
            }

            if (categories.Any())
            {
                indexedContent.Append($" {string.Join('|',categories)} ");
                foreach (var category in categories)
                {
                    if (!string.IsNullOrEmpty(category.ParentCategoryName))
                    {
                        indexedContent.Append($" {category.ParentCategoryName} ");
                    }
                }
            }

            // Add description fields, but do not store in index
            if (!string.IsNullOrEmpty(asset.ShortDescription))
            {
                indexedContent.Append($" {asset.ShortDescription} ");
            }

            if (!string.IsNullOrEmpty(asset.FullDescription))
            {
                indexedContent.Append($" {asset.FullDescription} ");
            }

            if (!string.IsNullOrEmpty(asset.Notes))
            {
                indexedContent.Append($" {asset.Notes} ");
            }

            // If manufacturer is available, then index it. Do not store it in index.
            if (null != manufacturer)
            {
                indexedContent.Append($" {manufacturer.Name} ");
            }
            // If vendor is available, then index it. Do not store it in index.
            if (null != vendor)
            {
                indexedContent.Append($" {vendor.Name} ");
            }

            document.AddTextField("Content",
                indexedContent.ToString().ToLowerInvariant(), Field.Store.NO);

            document.AddInt64Field("UpdateDate",
                asset.UpdatedOnUtc.HasValue
                    ? new DateTimeOffset(asset.UpdatedOnUtc.Value).ToUnixTimeMilliseconds()
                    : new DateTimeOffset(asset.CreatedOnUtc).ToUnixTimeMilliseconds(), Field.Store.NO);
            return document;
        }

Pay attention to the CreateSearchDocument method. This is the heart and soul of your index. This method is responsible for creating the search documents that your indexer is going to use to perform searches. There are some key concepts of Lucene indexing that you will see in this code snippet.

  • Aggregate all the searchable text in one field. The Lucene analyzer will analyze and tokenize that content to build the inverted index.
  • You do not have to store all the content in the index. You control this with the Field.Store value. A value that is stored in the index can be retrieved from the documents returned in search results. Unless you have a strong reason to store large text fields in the index itself, I recommend using those fields for indexing only and not storing them.
  • Assign a unique identifier to each search document and store it in the index. That makes it easy to identify documents when you want to delete or update records.
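The last point can be sketched with the IndexWriter's UpdateDocument and DeleteDocuments methods, which locate existing documents by a term. The class AssetIndexSketch and its method names are hypothetical; only the field names AssetId and Content come from the code above.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Requires the Lucene.Net NuGet package.
public static class AssetIndexSketch
{
    // Hypothetical helper: a stored, untokenized id plus an indexed-only text field.
    public static Document BuildDocument(int assetId, string content)
    {
        return new Document
        {
            new StringField("AssetId", assetId.ToString(), Field.Store.YES),
            new TextField("Content", content, Field.Store.NO)
        };
    }

    // Replaces any document whose AssetId term matches, or adds a new one.
    public static void Upsert(IndexWriter writer, int assetId, string content)
    {
        var idTerm = new Term("AssetId", assetId.ToString());
        writer.UpdateDocument(idTerm, BuildDocument(assetId, content));
    }

    // Deletes every document whose AssetId term matches.
    public static void Delete(IndexWriter writer, int assetId)
    {
        writer.DeleteDocuments(new Term("AssetId", assetId.ToString()));
    }
}
```

This works because AssetId is a StringField, which is indexed as a single untokenized term. Keep in mind that a newly opened reader sees these changes only after the writer has been committed or disposed.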

That is all that you need to index your data. There are some advanced techniques to tokenize and index data using different types of analyzers. For this discussion, I will keep it simple by using StandardAnalyzer.
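To get a feel for what StandardAnalyzer does to the Content field before it reaches the inverted index, you can run text through the analyzer and list the resulting tokens. This is a small diagnostic sketch, not part of the portal code; the class name AnalyzerSketch is an assumption.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Requires the Lucene.Net and Lucene.Net.Analysis.Common NuGet packages.
public static class AnalyzerSketch
{
    // Returns the terms StandardAnalyzer would index for the given text.
    public static IList<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using TokenStream stream = analyzer.GetTokenStream("Content", text);

        // The term attribute exposes the text of the current token.
        var term = stream.AddAttribute<ICharTermAttribute>();

        stream.Reset();                 // must be called before the first IncrementToken
        while (stream.IncrementToken())
        {
            tokens.Add(term.ToString());
        }
        stream.End();
        return tokens;
    }
}
```

With its default settings, StandardAnalyzer lowercases tokens and drops common English stop words such as "the", so those words never become searchable terms.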

Search Index

Now that you have your index ready, you can use an IndexReader to read the index and perform searches against it. The following code snippet is from our HomeRP portal. I have refactored the code into a separate class.

public static SearchIndexSearcher Create(Organization organization, SearchConfig searchConfig)
        {
            var indexFolder = Path.Combine(searchConfig.IndexRootFolder, $"lucene_index_{organization.Id}");
            var reader = DirectoryReader.Open(FSDirectory.Open(indexFolder));
            return new SearchIndexSearcher(indexFolder, reader);
        }

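        public Task<IList<Asset>> SearchAssets(string searchPhrase)
        {
            var searcher = new IndexSearcher(_indexReader);
            // Use the same analyzer at query time that was used at index time.
            var queryParser = new QueryParser(LuceneVersion.LUCENE_48, "Content",
                      new StandardAnalyzer(LuceneVersion.LUCENE_48));
            var query = queryParser.Parse(searchPhrase);
            // Search needs a positive hit count, so guard against an empty index (MaxDoc == 0).
            var hits = searcher.Search(query, Math.Max(1, _indexReader.MaxDoc)).ScoreDocs;

            // Only stored fields (AssetId, Name) can be read back from a hit.
            IList<Asset> assets = hits.Select(hit => searcher.Doc(hit.Doc)).Select(doc => new Asset
                {Id = Convert.ToInt32(doc.Get("AssetId")), Name = doc.Get("Name")}).ToList();
            // Nothing is awaited here, so wrap the result in a completed task.
            return Task.FromResult(assets);
        }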

The code starts by creating an IndexReader from the folder where the index is stored. Since I have sharded our index into multiple folders based on the client's id, separate readers are created and cached for each folder. The SearchAssets method starts by creating an IndexSearcher object with the IndexReader for a folder. Next, it parses the search query using the QueryParser class. Notice the use of the field name Content in the parser. This is the field where all the searchable text was indexed by our CreateSearchDocument method when we created the search records. I stored the unique identifier of each record in the AssetId field of each search document. I read that field from the search results and transform each search document into an application-specific Asset object.
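One thing to watch for with free-form input is that QueryParser treats characters such as ( ) : and " as query syntax, so a phrase like "pump (" makes Parse throw a ParseException. The sketch below shows one possible way to handle that by falling back to an escaped version of the phrase. The class SearchSketch, the method SearchIds, and the maxHits default are assumptions for illustration.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Requires the Lucene.Net, Lucene.Net.Analysis.Common and Lucene.Net.QueryParser NuGet packages.
public static class SearchSketch
{
    private const LuceneVersion Version = LuceneVersion.LUCENE_48;

    // Returns the stored AssetId of each hit, best match first.
    public static IList<string> SearchIds(Directory directory, string phrase, int maxHits = 20)
    {
        var ids = new List<string>();
        if (string.IsNullOrWhiteSpace(phrase)) return ids;

        using var analyzer = new StandardAnalyzer(Version);
        var parser = new QueryParser(Version, "Content", analyzer);

        Query query;
        try
        {
            // First try the phrase as written so users can still use query syntax.
            query = parser.Parse(phrase);
        }
        catch (ParseException)
        {
            // Fall back to treating every special character as literal text.
            query = parser.Parse(QueryParserBase.Escape(phrase));
        }

        using var reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        foreach (var hit in searcher.Search(query, maxHits).ScoreDocs)
        {
            ids.Add(searcher.Doc(hit.Doc).Get("AssetId"));
        }
        return ids;
    }
}
```

Opening a reader on every call keeps the sketch short. In a real application you would reuse a cached reader per index folder, as described above.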

This is all that you need to integrate Lucene into your .NET Core project. In this post, I have shown the most basic indexing and search capabilities of Lucene. In subsequent posts, I will discuss how to handle more complex and specific use cases involving facets, range queries, custom analyzers, and so on.
