Five Minute Quickstart with cbElasticSearch

Last week I built a small search application using ElasticSearch and ColdBox, and holy cow - it was super easy using the cbElasticSearch module. Here's how to quickly get started using cbElasticSearch for your application.

Minute One: Downloading and Running ElasticSearch

First, we're going to need an ElasticSearch server to play with.

By far the easiest option is to use Docker to download (via docker pull) and start (via docker run) an ElasticSearch image. This literally takes two commands and less than 60 seconds.

Assuming you are using Docker (and why not?), you can start up an ES image like this:

docker pull elasticsearch:7.5.1
docker run -d --name elasticsearch \
 -p 9200:9200 \
 -p 9300:9300 \
 -e "discovery.type=single-node" \
 elasticsearch:7.5.1

Note: I tried using the latest tag when pulling and starting an ES docker container, but each time I got an error that looked like this: Error response from daemon: manifest for docker.elastic.co/elasticsearch/elasticsearch:latest not found. For whatever reason, it seems the ElasticSearch Docker image does not support the latest tag, so I got around that by using the current latest version number, 7.5.1. Simple enough.

And that's all it takes to get an ES instance up and running. Sure, there's a lot more configuration you could (and probably should) do before hitting production. But for a quick demo, this is perfect. Two bash commands and we're off.
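A quick way to confirm the container is actually answering is to hit it with curl. This is a minimal sketch, assuming the node is published on localhost:9200 as in the docker run command above; the ES_URL variable and the es_url, es_banner, and es_health helper names are made up for illustration.

```shell
# Assumption: Elasticsearch is reachable on localhost:9200 (the -p 9200:9200 mapping above).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the URL for an endpoint path (an empty path gives the root banner URL).
es_url() {
  printf '%s/%s' "$ES_URL" "$1"
}

# Print the JSON banner (cluster name, version number) once the node is up.
es_banner() {
  curl -s "$(es_url '')"
}

# Print cluster health; a single-node dev cluster usually reports green or yellow.
es_health() {
  curl -s "$(es_url '_cluster/health?pretty')"
}
```

Run es_banner a few seconds after docker run. If nothing comes back, the node is probably still starting up, so give it a moment and try again.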

By the way, you're going to need the cbElasticSearch module documentation, currently hosted at https://ortus.gitbook.io/cbelasticsearch.

Minute Two: Installing cbElasticSearch

The next thing you need is the cbElasticSearch module, which provides a simple API for working with ElasticSearch.

In CommandBox, run:

install cbelasticsearch

When this completes, you should have a modules/cbelasticsearch folder in your ColdBox app.

Minute Three: Creating an ElasticSearch Index

Next you'll need an index for storing documents in. (I won't explain what an ES index is since many others have more accurate definitions.)

To create the index, I want to make sure it exists before any application code runs. This could be on a new developer's machine on first start, it could be on first startup in production, etc. The key idea is to create the index on startup if it does not exist.

Oh, and this is a five-minute demo, so there's probably a "better" way to do this. Regardless, it seemed to work for me!

First I created an interceptors/InitIndex.cfc file. This is a ColdBox "interceptor", and I pointed my app to it using the interceptors configuration in config/ColdBox.cfc:

component {
    function configure(){
        // Other ColdBox and app settings...
        // ...

        // register the interceptor which runs on app start/reinit
        // and sets up the ES index
        interceptors = [{
            class = "interceptors.InitIndex",
            properties    = {},
            name            = "InitIndex"
        }];
    }
}

Next, in the InitIndex.cfc interceptor file I used the following to check for an index and create it if it doesn't exist:

component {
    /**
     * After the config has loaded on first startup or reinit, I think.
     */
    void function afterConfigurationLoad( event, interceptData ){
        if ( !getESClient().indexExists( "myDat" ) ){
            createIndex();
        }
    }
    /**
     * Creates the myDat index
     */
    private function createIndex(){
        getIndexBuilder().new(
            "myDat",
            {
                "_doc" = {
                    "_all" = { "enabled" = false },
                    "properties" = {
                        "title" = { "type" = "text" },
                        "tags" = { "type" = "keyword" },
                        "description" = { "type" = "text" }
                    }
                }
            }
        ).save();
    }

    /**
     * Load ElasticSearch client model from the cbElasticSearch module
     */
    Client function getESClient() provider="Client@cbElasticsearch"{}

    /**
     * Load ElasticSearch index builder model from the cbElasticSearch module
     */
    IndexBuilder function getIndexBuilder() provider="IndexBuilder@cbElasticsearch"{}
}

This file hooks into the afterConfigurationLoad app lifecycle event in ColdBox to run during application startup and on every app reinit. This way I know that the ES index will be created by the time ColdBox is running my app code.

For creating your ES index, you'll need to do some research on what types of fields to use. Each field should be defined within the properties section, and most fields will probably need { "type": "text" }. I used the keyword type for a few fields because it requires an exact match when searching - think of matching an exact category name as opposed to searching the body of a blog post.
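To get a feel for the text versus keyword difference, ElasticSearch's _analyze endpoint shows the tokens an analyzer produces for a string. The sketch below is only an approximation of what the two field types do at index time: it compares the built-in standard analyzer (the default for text fields) with the built-in keyword analyzer, which leaves the input as one token. The ES_URL variable and the analyze_body and analyze helpers are hypothetical names, and the node is assumed to be on localhost:9200.

```shell
# Assumption: Elasticsearch is reachable on localhost:9200.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for the _analyze API: which analyzer to run over which text.
# The text is interpolated as-is, so keep it free of double quotes.
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Ask Elasticsearch which tokens the given analyzer produces for the given text.
analyze() {
  curl -s -H 'Content-Type: application/json' \
    "$ES_URL/_analyze?pretty" -d "$(analyze_body "$1" "$2")"
}
```

Calling analyze standard "ColdBox Modules" should list two lowercased tokens, while analyze keyword "ColdBox Modules" should list the whole string as a single token, which is why a keyword field only matches the exact value.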

The rest of this is standard ColdBox functionality - I could have used getInstance( "Client@cbElasticsearch" ), but I like the way provider functions sorta "list out" all dependencies of the component. By the way: I cannot use the standard property name="test" inject="test"; injection DSL since interceptors can be executed really early in the ColdBox app lifecycle - before WireBox has had a chance to inspect and cache the mapping DSL.

Minute Four: Loading Documents into the ElasticSearch Index

The second-to-last step in this five-minute demo is to load data into the ElasticSearch index.

I created a contentPath setting in config/ColdBox.cfc which referenced a directory full of .json files:

function configure(){
    settings = {
        contentPath: getSystemSetting("CONTENT_PATH", "resources/data/myDat")
    };
    // register the interceptor which runs on app start/reinit
    // and sets up the ES index
    interceptors = [{
        class = "interceptors.InitIndex",
        properties    = {},
        name            = "InitIndex"
    }];
}

Notice I used the getSystemSetting() function so I can override this with an environment variable if I choose using .env and commandbox-dotenv. However, for now the default value (the second argument in getSystemSetting()) works fine for me.

To load in the json files, I went back and added a little more to my interceptor:

    /**
     * After the config has loaded on first startup or reinit, I think.
     */
    void function afterConfigurationLoad( event, interceptData ){
        if ( !getESClient().indexExists( "myDat" ) ){
            createIndex();
            // load dem data!!!
            populateIndex( getDataFiles() );
        }
    }

    /**
     * Get big ole array of .json files hanging out in the content directory.
     */
    private array function getDataFiles(){
        var path = getSetting( "contentPath" );
        return directoryList(
            path = expandPath( path ),
            recurse = false,
            listInfo = "name",
            filter = "*.json",
            type = "file"
        );
    }

    /**
     * Given an array of .json file names,
     * read each into a new document in our ElasticSearch index.
     * @files {Array} array of file names without path specified. Path is assumed to be that of the `contentPath` setting.
     */
    private function populateIndex( required array files ){
        files.each( function( filename ) {
            var filepath = expandPath( getSetting( "contentPath" ) ) & "/" & filename;
            if ( fileExists( filepath ) ){
                var data = fileRead( filepath );
                if ( isJSON( data ) ){
                    // save the parsed JSON as a new document in the index
                    getDocument()
                        .new(
                            index = "myDat",
                            type = "_doc",
                            properties = deSerializeJSON( data )
                        )
                        .save();
                }
            }
        } );
    }
    Document function getDocument() provider="Document@cbElasticsearch"{}

In brief:

  1. I used the directoryList() function to get the names of all .json files in my content directory.
  2. I sent the resulting file names to a function, and for each file:
    1. Confirm it exists via fileExists()
    2. Read the file via fileRead()
    3. Confirm it's valid JSON via isJSON()
    4. Save the JSON as a new document using Document.new().save()

Again, this is demo-quality. I may go back and "clean it up"... or I may consider this good enough to send to prod. Point is, this was all much easier than expected.
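One cheap sanity check after a reinit is to compare the number of .json files on disk with the document count ElasticSearch reports. This is a rough sketch, not part of the app itself: ES_URL, INDEX, and the two helper functions are illustrative names, and INDEX is assumed to match whatever index name your interceptor actually created.

```shell
# Assumptions: Elasticsearch on localhost:9200, and INDEX matching the name used in the interceptor.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-myDat}"

# Count the .json files directly inside a directory (non-recursive, like the directoryList() call).
count_json_files() {
  find "$1" -maxdepth 1 -type f -name '*.json' | wc -l | tr -d ' '
}

# Ask Elasticsearch how many documents the index holds (see the "count" field in the response).
count_documents() {
  curl -s "$ES_URL/$INDEX/_count?pretty"
}
```

Run count_json_files resources/data/myDat and then count_documents. The two numbers should line up, bearing in mind that freshly saved documents can take a moment to become visible and that files failing the isJSON() check are skipped.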

Minute Five: Searching ElasticSearch

At last! We are now done with configuring and loading ElasticSearch data and can use ES for its intended purpose. (Put that name to good use, eh?)

To search the index, I created a Search.cfc handler which looked something like so:

component{
    
    /**
     * Search all items!
     */
    function search( event, rc, prc ) cache="true" cacheTimeout="30" {
        event.paramValue( "q", "" );

        var search = getInstance( "SearchBuilder@cbElasticSearch" )
            .new(
                index = "myDat",
                type = "_doc"
            )
            .multiMatch(
                names = [ "title", "description", "tags" ],
                value = event.getValue( "q" )
            );
        // var-scope the results so they don't leak into the handler's variables scope
        var searchResults = search.execute();

        prc.results = [];
        searchResults.getHits().each( function( item ){
            prc.results.append( item.getMemento() );
        } );

        event.setView( "Search/Results" );
    }
}

The multiMatch function in the cbelasticsearch SearchBuilder allows me to search multiple fields for a particular query. Once I have the results, it is simple enough to loop over the hits and append the "memento" (that's a struct, bro) to an array.
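For debugging, it can help to run a comparable query against ElasticSearch directly and bypass the app. The sketch below sends a plain multi_match query over the same three fields. It assumes, without having checked the JSON cbElasticSearch generates, that the multiMatch() call boils down to something similar; ES_URL, INDEX, search_body, and search_index are illustrative names only.

```shell
# Assumptions: Elasticsearch on localhost:9200, and INDEX matching the name used in the handler.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-myDat}"

# Build a multi_match query body over the three mapped fields.
# The search term is interpolated as-is, so keep it free of double quotes.
search_body() {
  printf '{"query":{"multi_match":{"query":"%s","fields":["title","description","tags"]}}}' "$1"
}

# Run the query against the index and pretty-print the raw response.
search_index() {
  curl -s -H 'Content-Type: application/json' \
    "$ES_URL/$INDEX/_search?pretty" -d "$(search_body "$1")"
}
```

Something like search_index coldbox prints the raw hits, which makes it easier to tell whether an empty results page is a data problem or a handler problem.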

PS. One awesome tip here is to use the collection argument in ColdBox's renderView() function to loop over the results and render a .cfm view file for each item.

In my app, I created the following code in views/Search/Results.cfm:

<cfoutput>
    <cfif NOT ArrayLen( prc.results )>
        <p class="alert alert-info">Sorry, couldn't find any results for that search query.</p>
    <cfelse>
        #renderView(
            view = "Main/Item",
            collection = prc.results
        )#
    </cfif>
</cfoutput>

Then in views/Main/Item.cfm we can output the item however we want using the name of the view file (that would be Item) as the variable name of the struct containing the single result:

<cfoutput>
    <article>
        <h2>#encodeForHTML( Item.title )#</h2>
        #encodeForHTML( Item.description )#
    </article>
</cfoutput>

And that's it! This all took me perhaps two hours. Using the snippets I've pasted here, you should be able to get started with ElasticSearch in five minutes. Ready... set... go!

January 13, 2020
