Indexing Custom Data with Umbraco Examine
If you’ve ever built an Umbraco website, there’s a good chance you’ve used Umbraco Examine. Examine is Umbraco’s wrapper around the Lucene.NET search library and gives the CMS a fast, lightweight, extensible way to search all things Umbraco. But what if we want to search more than just Umbraco content? What if our data isn’t stored in the CMS at all, but in a custom table or a different database? Look no further! This article walks you through setting up Examine to index and search database tables, so you can offer more robust and flexible searches on your website.
This article assumes you have some basic experience with Examine and Umbraco and is not meant to be an introductory tutorial into all things Examine and Lucene.NET. If you are unfamiliar with using Examine to index and search for content on your Umbraco website, first stop by the main Umbraco Examine blog post to brush up on your Examine knowledge (I even find myself consulting this particular post from time to time as it is very comprehensive!).
A couple of years ago, I built a website that needed a search that was both fast and able to handle some complex logic. The catch, however, was that the data I was searching on wouldn’t be Umbraco-based content. I didn’t want to query the database directly, because the site’s architecture at the time wouldn’t scale if many users started searching at once. I wanted a powerful searcher that could handle any search requirements we might have, out of the box… enter Umbraco Examine!
Examine Configuration
For ease of explanation, let’s assume we want to index a custom table with the following schema:
Column | Type
ExamineId | INT NOT NULL
Name | NVARCHAR(50) NOT NULL
CreatedOn | DATE NOT NULL
Color | NVARCHAR(15)
SquareFootage | INT NOT NULL
Sample table schema for Residential Construction Models
To begin, we set up our Examine IndexSet, which is entered as a section in the ExamineIndex configuration file (/Config/ExamineIndex.config). We give the IndexSet a SetName and a custom IndexPath, and then add each field we want to index to the IndexUserFields set. Here is my sample IndexSet using the table above:
<IndexSet SetName="ResidentialConstructionIndexSet" IndexPath="App_Data\ResidentialConstructionIndexSet">
    <IndexUserFields>
        <add Name="Id" />
        <add Name="ExamineId" />
        <add Name="Name" />
        <add Name="CreatedOn" />
        <add Name="Color" />
        <add Name="SquareFootage" />
    </IndexUserFields>
</IndexSet>
Next, we configure our index and search providers in the Examine settings file (/Config/ExamineSettings.config). Within the ExamineIndexProviders section, we add a new indexer to tell Examine what content we would like to index. This is where we specify what type of data Examine can expect and which class it should call to fetch that data.
<add name="ResidentialConstructionIndexer"
     type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine"
     dataService="marathon.core.Data.Services.ResidentialConstructionService, marathon.core"
     indexTypes="CustomData"
     runAsync="false"
     analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" />
One thing to note here is that the Indexer shares the same prefix as our IndexSet (ResidentialConstruction). The dataService attribute points to our data collector and tells Examine where to go when it is indexing for this IndexSet. Use the fully qualified class name (in my case it is marathon.core.Data.Services.ResidentialConstructionService) followed by the name of the assembly that contains it (usually the same as the project name in your solution), which is the typical way of referencing a class from a config file in .NET.
After that, we create a Searcher to tell Examine what rules it should follow when searching our index. This is the run-of-the-mill search provider that comes with Examine; you won’t need anything different here unless you want to use a custom analyzer or something to that effect.
<add name="ResidentialConstructionSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
Indexing the Data
Now that our Examine Index is configured to use custom data, we need to write some code that will allow Examine to connect to our Database and populate our index.
Previously, we created an index provider named ResidentialConstructionIndexer and pointed its dataService attribute at our ResidentialConstructionService class.
That class needs to implement ISimpleDataService (found in the Examine.LuceneEngine namespace), which requires a GetAllData method. Examine calls GetAllData whenever it needs to index our custom data.
This is the beginning of the ResidentialConstructionService class – note I have my own interface added to it, ICustomExamine, which enforces that a SQL Command string and Connection string are implemented on the class.
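public class ResidentialConstructionService : ISimpleDataService, ICustomExamine
{
    #region Interface Members

    // SQL used to pull the rows we want indexed; the WHERE clause limits what ends up in the index
    public string Command { get; set; } =
        @"SELECT ExamineId
                ,Name
                ,CreatedOn
                ,Color
                ,SquareFootage
          FROM ResidentialConstructionModels
          WHERE CreatedOn > '2018-01-01'";

    // Connection string for the database that holds the custom table
    public string Connection { get; set; } = DataAccess.ExternalConnection;

    // Rows collected for Examine, one SimpleDataSet per database record
    public List<SimpleDataSet> Data { get; set; } = new List<SimpleDataSet>();

    #endregion

    // more goodies
}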
This ensures that we have a select statement and a connection string so that Examine knows where to go to fetch the data and how to query for it.
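ICustomExamine isn’t shown in this article, so the following is only a minimal sketch of what such an interface could look like. The two member names come from the class above; the exact shape of the interface is an assumption, and SampleSource is a hypothetical class included purely to show the interface being implemented.

```csharp
// Sketch only: the real ICustomExamine is not shown in the article, so this shape is assumed.
public interface ICustomExamine
{
    // SQL statement that selects the rows to index
    string Command { get; set; }

    // Connection string for the database that holds those rows
    string Connection { get; set; }
}

// Hypothetical implementer, here only to demonstrate the contract
public class SampleSource : ICustomExamine
{
    public string Command { get; set; } = "SELECT ExamineId, Name FROM ResidentialConstructionModels";
    public string Connection { get; set; } = "Server=.;Database=Sample;Integrated Security=true";
}
```

Keeping the query and connection behind a small interface like this means every custom data service exposes them the same way, which is handy if you end up indexing more than one table.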
Next in this class, we would implement our GetAllData method defined on the ISimpleDataService interface:
public IEnumerable<SimpleDataSet> GetAllData(string indexType)
{
    try
    {
        // Start from an empty list so that a rebuild does not append duplicates of earlier rows
        Data.Clear();

        var count = 0;
        var records = DataAccess.GetData(Command, Connection);
        foreach (var rec in records)
        {
            // AddIndexItem returns the incremented counter, which doubles as the node ID
            count = AddIndexItem(Data, rec, count);
        }
    }
    catch (Exception ex)
    {
        Umbraco.Core.Logging.LogHelper.Error(typeof(ResidentialConstructionService), $"Error retrieving residential construction data - exception message: {ex.Message}", ex);
    }

    return Data;
}
This returns a collection of SimpleDataSet objects to Examine so that it can build the index from them.
You’ll notice that GetAllData relies on two other calls. The first is a generic SQL data-reader method I threw together, which I will provide:
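// Requires: using System.Collections.Generic; using System.Data; using System.Data.SqlClient;
public static IEnumerable<IDataRecord> GetData(string sql, string connection)
{
    using (var conn = new SqlConnection(connection))
    {
        conn.Open();
        using (var cmd = new SqlCommand(sql, conn))
        using (IDataReader dr = cmd.ExecuteReader())
        {
            // The same reader instance is yielded for every row, so callers must read it inside their loop
            while (dr.Read())
            {
                yield return dr;
            }
        }
    }
}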
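One caveat with GetData is that it yields the same reader object for every row, so it only behaves when it is consumed lazily, as the foreach in GetAllData does. Buffering the result (for example with ToList()) would leave you with a list of references to a single, already-closed reader. If you need to hold on to rows, a variation like the one below copies each row into a dictionary first. This is a minimal sketch, not part of the original code; DataRecordExtensions, ToRowSnapshot and ReadAllRows are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helpers (not from the original article) that copy rows out of a reader
public static class DataRecordExtensions
{
    // Copies the reader's current row into a dictionary keyed by column name
    public static Dictionary<string, object> ToRowSnapshot(this IDataRecord record)
    {
        var row = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);
        for (var i = 0; i < record.FieldCount; i++)
        {
            // Store null instead of DBNull so callers only need a plain null check
            row[record.GetName(i)] = record.IsDBNull(i) ? null : record.GetValue(i);
        }
        return row;
    }

    // Yields an independent snapshot per row, so the result is safe to buffer
    public static IEnumerable<Dictionary<string, object>> ReadAllRows(this IDataReader reader)
    {
        while (reader.Read())
        {
            yield return reader.ToRowSnapshot();
        }
    }
}
```

The trade-off is a little extra allocation per row, which is unlikely to matter for an index rebuild but is worth knowing about for very large tables.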
The second call, AddIndexItem, takes each database record returned by our query and turns it into a SimpleDataSet, a formatted object that Examine knows how to use:
public int AddIndexItem(List<SimpleDataSet> data, IDataRecord record, int count)
{
    count++;
    data.Add(new SimpleDataSet()
    {
        //create the node definition, ensure that it is the same type as referenced in the config
        NodeDefinition = new IndexedNode()
        {
            NodeId = count,
            Type = "CustomData"
        },
        //add the data to the row
        // this basically sets up the examine field with the value of the DB column for this particular examine record
        RowData = new Dictionary<string, string>()
        {
            { "Id", count.ToString() },
            { "ExamineId", Convert.ToString(record["ExamineId"]) },
            { "Name", Convert.ToString(record["Name"]) },
            { "CreatedOn", Convert.ToString(record["CreatedOn"]) },
            { "Color", Convert.ToString(record["Color"]) },
            { "SquareFootage", Convert.ToString(record["SquareFootage"]) },
        }
    });
    return count;
}
AddIndexItem reads each column that we want to be searchable into a row of data. The resulting collection is ultimately passed back to Examine, which parses it and creates a searchable index from it.
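Convert.ToString works fine for simply getting values into the index, but it formats dates according to the current culture and leaves numbers unpadded. Because Lucene compares terms as text, that can make date or number range queries behave unexpectedly. If you plan to filter on CreatedOn or SquareFootage, one option is to normalize values before they go into RowData. The helper below is a sketch of that idea rather than part of the original code; IndexValueFormatter, ToIndexValue and the chosen formats (yyyyMMdd, eight-digit padding) are assumptions you would adapt to your own data.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not from the original article) for producing sortable index values
public static class IndexValueFormatter
{
    public static string ToIndexValue(object value)
    {
        // Nullable columns such as Color come back as DBNull; index them as an empty string
        if (value == null || value == DBNull.Value)
        {
            return string.Empty;
        }

        // Culture-independent date format whose text order matches chronological order
        if (value is DateTime)
        {
            return ((DateTime)value).ToString("yyyyMMdd", CultureInfo.InvariantCulture);
        }

        // Zero-pad non-negative integers so that text order matches numeric order
        if (value is int)
        {
            return ((int)value).ToString("D8", CultureInfo.InvariantCulture);
        }

        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }
}
```

With this in place, a RowData entry could be written as { "CreatedOn", IndexValueFormatter.ToIndexValue(record["CreatedOn"]) } instead of calling Convert.ToString directly.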
As you may have noticed above, in the SQL statement I have opted to include a WHERE clause to filter out additional data that I don’t want indexed. This provides you with the flexibility to build out the index with whatever constraints fit your needs.
At this point, you should be able to build your solution and log into the Umbraco Backoffice to verify under the Developer section, on the Examine Management tab, that you now have a new indexer that is filled with data.
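Keep in mind that Examine only asks our service for data when the index is (re)built, so rows added to the table later will not appear until the next rebuild. You can rebuild from the Examine Management tab, or trigger it from code. The snippet below is a minimal sketch that assumes the indexer name from the config above; the ResidentialConstructionIndexRebuilder class is a hypothetical wrapper, and where you call it from (a scheduled task, an admin-only endpoint, and so on) is up to you.

```csharp
using Examine;

// Hypothetical wrapper (not from the original article) around Examine's rebuild call
public static class ResidentialConstructionIndexRebuilder
{
    public static void Rebuild()
    {
        // Look up the indexer by the name given in ExamineSettings.config
        var indexer = ExamineManager.Instance.IndexProviderCollection["ResidentialConstructionIndexer"];

        // Rebuilding causes the configured data service's GetAllData method to be called again
        indexer.RebuildIndex();
    }
}
```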
I inserted 3 rows into my new table to demonstrate the indexer in action:
If you recall from the SQL statement for the indexer, I chose to include only houses with a CreatedOn date later than 2018-01-01, so only 2 of the 3 rows seen above should be indexed.
If I check the Examine Management section of the Backoffice, this is confirmed by the index data:
Since I don’t have a search built out, I downloaded the Luke program for inspecting the index. I then created a basic Lucene search query to look for any records that have a Name of House1 which returned 1 result as expected (the top left pane is the search query and the bottom pane is the results for that query).
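For completeness, here is roughly what the same lookup could look like from C# using the searcher we configured earlier. Treat it as a sketch under a few assumptions: ResidentialConstructionSearch and FindByName are hypothetical names, and the searcher name must match the one in your ExamineSettings.config.

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;

// Hypothetical search helper (not from the original article)
public static class ResidentialConstructionSearch
{
    public static List<IDictionary<string, string>> FindByName(string name)
    {
        // Look up the searcher by the name given in ExamineSettings.config
        var searcher = ExamineManager.Instance.SearchProviderCollection["ResidentialConstructionSearcher"];

        // Build a query equivalent to the Lucene query Name:<name>
        var criteria = searcher.CreateSearchCriteria();
        var query = criteria.Field("Name", name).Compile();

        // Each result exposes the indexed fields (Id, ExamineId, Name, ...) as strings
        return searcher.Search(query).Select(result => result.Fields).ToList();
    }
}
```

Calling ResidentialConstructionSearch.FindByName("House1") would then return the indexed fields for any matching records. One thing to watch, and this is worth verifying in your own setup: the indexer above uses the WhitespaceAnalyzer, which does not lowercase terms, while a searcher without an analyzer attribute falls back to its default analyzer. If the two differ, a query for House1 may not match, so it is probably safest to configure the same analyzer on both.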
There is much more that can be done with this, but if you were looking for a way to index custom data in your Umbraco site, hopefully you have found this useful. Feel free to drop a comment and let me know how this worked out for you, or if you run into any issues!