Elasticsearch

Elasticsearch is described as a 'full text search' and 'analytics engine'.

A glossary is a good thing to peruse first; a couple of definitions are given below to get a grip on the nomenclature.

Full text
Containing the full text itself, as distinguished from metadata such as the author, title, and date fields in a database
Analytics engine
A general purpose system; in this sense, a system that can query and work with the data contained in the full text search
Document
The data used as material for the search engine, comprising fields extracted using Lucene
Field
A key-value pair extracted from the original document during indexing
Term
A word or phrase used for search
Token
A string occurring in a field, consisting of the text value, start position, offset, and type
Inverted index
A lookup table consisting of the unique words in the documents in the search index
Index
You can think of an index as being like a MongoDB collection: a place to store related documents which allows queries

Before Elasticsearch, engineers would hammer an SQL database for searches, which was not great. Imagine all those wildcard queries. This is where search engines came in. They take a document as a string, parse it and analyse string tokens, creating a kind of lookup table (an index) based on the content, which includes numbers of occurrences, references to the documents, and so on. Elasticsearch, like Apache Solr, is built on top of Apache Lucene.

Elasticsearch is a much easier search engine to use than Apache Solr, I reckon. It also uses JSON, so data can be structured or unstructured, which makes life much easier.

Installation

The installation and configuration can be messy. I have seen it installed willy-nilly, and it is a nightmare to trace folders scattered all over the shop. Follow the instructions on their site first.

Elastic folder structure

The installation folder structure looks like this:

elasticsearch
  \bin - executables
  \config - namely elasticsearch.yml
  \data - where data is stored
  \lib - libraries used by elasticsearch
  \logs - the location can be changed in config/elasticsearch.yml
  \plugins

As above, because elasticsearch is a standalone app you can put the elasticsearch folder anywhere. Make sure you put it somewhere sensible, like /usr/local or /opt. Whatever standard you have that gives consistency.

The example folder layout suggests a folder structure for your app. Again, stick to a convention.

Running Elasticsearch

Navigate to your elasticsearch installation directory. Then in a terminal run:

./bin/elasticsearch

Or just set up a symlink in your /usr/local/bin pointing to that binary. Here is more info on starting and stopping the service. If you want to set up elasticsearch as a service, use /bin/systemctl start elasticsearch.service.

Now, in a browser, list all indices by visiting:

http://localhost:9200/_cat/indices?v

Obviously we have nothing there yet, so we will just see something like:

health status index pri rep docs.count docs.deleted store.size pri.store.size 

You can run elasticsearch specifying the cluster and node in one line:

./bin/elasticsearch --cluster.name randomname --node.name local_dev

To see the cluster health go to:

http://localhost:9200/_cat/health?v

And for the node health:

http://localhost:9200/_cat/nodes?v

Executing queries

You can use your browser to interact with Elasticsearch, or you can use curl to run queries in the format below:

curl -X<REST Verb> <Node>:<Port>/<Index>/<Type>/<ID>
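
For example, to fetch the document with ID 1 from the library index used in the Queries section below (assuming it exists):

curl -XGET localhost:9200/library/books/1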

Plugins

Elasticsearch is more like an engine: you only get excited about a bare engine when you really know what you are talking about. The rest of us need some eye candy to admire, and that is where plugins come in.

There are lots of cool plugins for Elasticsearch. We use Marvel and Kibana.

Marvel-agent is an interface to Elasticsearch's innards and state, helping to spot problems fast and providing an overview of what is going on with the system and deployments. To install it, navigate to your elasticsearch installation directory and run:

./bin/plugin install elasticsearch/marvel/latest

Kibana is an interface to the data, analytics, and a lot more. Just to make things awkward it is actually a standalone app in itself, so it is not a plugin as such, more like a companion. Follow the setup instructions first.

Once you've installed it, navigate to the directory and run it with:

cd kibana
bin/kibana

That runs a node app and it listens on port 5601:

http://localhost:5601

So remember the comment about running queries? A plugin for Kibana I use for querying is called Sense. To install that, navigate to your kibana installation directory and run:

./bin/kibana plugin --install elastic/sense

Then in Kibana (http://localhost:5601) you will see the Sense plugin; click on it and start running queries inline. So good!

Queries

Create an index:

PUT /library

Add a record:

PUT /library/books/1
{
  "title": "Kevin the Chicken"
}

Putting also updates a document:

PUT /library/books/1
{
  "title": "Kevin the Chicken",
  "description": "The adventures of Kevin"
}

To run the most basic search do:

GET /library/_search

Or to get a specific record you can reference the _id:

GET /library/books/1

Or you can add a query string:

GET /library/books/_search/?q=Kevin&analyzer=english&default_operator=AND

Or you can use an object:

GET /library/books/_search
{
  "query": {
    "simple_query_string": {
      "query": "Kevin",
      "analyzer": "english"
    }
  }
}

Check out the Elasticsearch query string documentation.

Deleting a single item:

DELETE /library/books/1

Deleting an index:

DELETE /library

Getting metadata:

GET /library/_stats

Any CRUD type activity usually results in a message like this:

{
  "acknowledged": true
}

You can add aliases for an index or indices using the _aliases endpoint:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "logs-2018-07-03", "alias" : "today" } }
    ]
}

An alias can point at several indices, and an index can carry several aliases:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "logs-2018-07-03", "alias" : "today" } },
        { "add" : { "index" : "logs-2018-07-03", "alias" : "twodays" } },
        { "add" : { "index" : "logs-2018-07-02", "alias" : "twodays" } }
    ]
}

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "logs-2018-07-03", "alias" : "today" } },
        { "add" : { "index" : "logs-2018-07-03", "alias" : "twodays" } },
        { "add" : { "index" : "logs-2018-07-02", "alias" : "twodays" } },
        { "add" : { "index" : "logs-2018-07-03", "alias" : "threedays" } },
        { "add" : { "index" : "logs-2018-07-02", "alias" : "threedays" } },
        { "add" : { "index" : "logs-2018-07-01", "alias" : "threedays" } }
    ]
}
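
Once an alias is in place you can search it just like an index:

GET /today/_search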

Ingestion and batch ingestion

Ingestion is simply adding records into elasticsearch. We've done this using PUT like this:

PUT /library/books/1
{
  "doc": "Lorem ipsum dolor sit amet...",
  "documentInfo": {
    "Author": "Mark Robson",
    "Content-Type": "doc",
    "Creation-Date": 1459948321944,
    "Keywords": "lorem, ipsum",
    "title": "Mark Robson's tremendous magnificent amazing document"
  },
  "link": "http://www.google.com"
}

If you have multiple documents to ingest, there is an endpoint for that too: POST /index/type/_bulk.

POST /library/books/_bulk
{ "index": { "_id": 1 } }
{ "doc": "Lorem ipsum dolor sit amet ...", "documentInfo": { "Author": "Mark Robson", "Content-Type": "doc", "Creation-Date": 1459948321944, "Keywords": "lorem, ipsum", "title": "Mark Robson's tremendous magnificent amazing document" }, "link": "http://www.google.com" }
{ "index": { "_id": 2 } }
{ "doc": "Lorem ipsum dolor sit amet ...", "documentInfo": { "Author": "Bob Johnson", "Content-Type": "doc", "Creation-Date": 1459948321944, "Keywords": "lorem, ipsum", "title": "Bob Johnson's tremendous magnificent amazing document" }, "link": "http://www.google.co.uk" }

(Nota bene: each action line and each document goes on its own line, and the body must end with a newline.)

Partial updates

We don't want to have to PUT the whole document if we just want to update a tiny part of it. This is where partial updates come in.

Example from elastic.co:

POST /website/blog/1/_update
{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}

Another example from Elastic's website uses scripts, which need to be enabled. Here is one of their examples:

POST /website/blog/1/_update
{
   "script" : "ctx._source.tags+=new_tag",
   "params" : {
      "new_tag" : "search"
   }
}
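
Dynamic scripting is off by default. On the 2.x-era releases this era of plugins implies, enabling it meant a line like the following in config/elasticsearch.yml; the exact setting name varies between versions, so treat this as an assumption to check against your release:

script.inline: on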

Mapping

Mapping is establishing what the elements of the document are and how best to store them for searches. By default, Elasticsearch will figure out the best data types to use for the document fields you have ingested. Check out the list of elasticsearch data types.

To view the field mappings:

GET /library/books/_mapping

To specify your own mapping:

PUT /library/books/_mapping
{
  "books": {
    "properties": {
      "title": {
        "type": "string"
      },
      "description": {
        "type": "string"
      }
    }
  }
}

As elasticsearch processes JSON we can add a nested object:

PUT /library/books/2
{
  "authoredDocument": {
    "title": "Kev Cockadoodles",
    "description": "The chronicles of Kevin the Chicken"
  }
}

Slop

Doing an exact search is all fine and well, but sometimes we want 'Kevin the Chicken' to show up when we search for 'Chicken Kevin'. This is where slop comes in handy.

GET /library/books/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "Chicken Kevin",
        "slop":  3
      }
    }
  }
}

The slop is the number of times a term is allowed to move for the query to still match the document. Because 'Chicken' and 'Kevin' have swapped places, that takes the count up to 2, and stepping over 'the' knocks it up one more to 3.

Analyzers - Americans love their Zees

Analysers break up your content into tokens (a symbol or string), splitting on whitespace (or some specific character), and then filter them with one or more filters. A filter is just that: it can change the case, stem the word, remove characters, and so on. Let's check the default analyser:

GET /library/_analyze?pretty&analyzer=default&text=Kevin%20the%20Chicken
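
The response lists each token the analyser produced. It looks roughly like this; the exact offsets, types, and positions depend on your version, so take the values as illustrative:

{
  "tokens": [
    { "token": "kevin", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0 },
    { "token": "the", "start_offset": 6, "end_offset": 9, "type": "<ALPHANUM>", "position": 1 },
    { "token": "chicken", "start_offset": 10, "end_offset": 17, "type": "<ALPHANUM>", "position": 2 }
  ]
}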

We can specify what analyser to use at the query level too:

GET /library/books/_search/?q=Kevin+Chicken&analyzer=english&default_operator=AND

Or:

GET /library/books/_search
{ "query": {
    "simple_query_string": {
      "query": "Kevin the Chicken",
      "analyzer": "english"
    }
  }
}

Custom analysers can be set at index creation time:

PUT /library
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "stemmer": {
            "type": "stemmer",
            "language": "english"
          },
          "stopwords": {
            "type": "stop",
            "stopwords": [
              "_english_"
            ]
          }
        },
        "analyzer": {
          "default": {
            "filter": [
              "lowercase",
              "stopwords",
              "stemmer"
            ],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}

To see what analysers are used on your index just curl:

curl localhost:9200/youramazingindex/_settings

Synonyms

Synonyms can be added easily in Elasticsearch, at index creation time. Nota bene: if this is added to an existing index, all documents need to be reindexed.

PUT /library
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "stemmer": {
            "type": "stemmer",
            "language": "english"
          },
          "stopwords": {
            "type": "stop",
            "stopwords": [
              "_english_"
            ]
          },
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms": [
              "chicken,bird",
              "kevin,protagonist"
            ]
          }
        },
        "analyzer": {
          "default": {
            "filter": [
              "lowercase",
              "stopwords",
              "stemmer",
              "my_synonym_filter"
            ],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}

And then doing a search on those synonyms:

GET /library/books/_search
{ "query": {
    "simple_query_string": {
      "query": "bird protagonist"
    }
  }
}

You can also load synonyms from a file, as sketched below.
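
A minimal sketch of a file-based synonym filter, assuming a synonyms file at analysis/synonyms.txt relative to your Elasticsearch config directory (the path and filename here are assumptions). It slots into the filter section of the settings above in place of the inline list:

"my_synonym_filter": {
  "type": "synonym",
  "synonyms_path": "analysis/synonyms.txt"
}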

Securing Elasticsearch with Shield

Shield is another plugin for Elasticsearch: a user management system offering authentication (including against Active Directory) and role-based authorisation. Check out the Shield getting started guide for an overview.

Before you install it, you will need the license plugin first.

cd /your/elasticsearch/directory
bin/plugin install license
bin/plugin install shield

Licenses are pretty expensive, so make sure you are ready to work on it before you install it.

Check out basic authentication.
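
With Shield in place, requests need credentials. With basic authentication that is just curl's -u flag; the username and password below are placeholders:

curl -u es_admin:password localhost:9200/_cat/indices?v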

Node integration

Check out the API reference and the elasticsearch npm module. A minimal client sketch follows.
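
As a minimal sketch using the legacy elasticsearch npm client against the library index from the examples above (the client options and callback shape follow that module's documented API, but verify against your installed version):

// A minimal sketch using the legacy 'elasticsearch' npm client.
var elasticsearch = require('elasticsearch');

// Connect to the local node started earlier
var client = new elasticsearch.Client({
  host: 'localhost:9200'
});

// Run the same simple_query_string search as in the Queries section
client.search({
  index: 'library',
  body: {
    query: {
      simple_query_string: { query: 'Kevin' }
    }
  }
}, function (error, response) {
  if (error) {
    console.error('Search failed:', error);
    return;
  }
  // response.hits.hits is the array of matching documents
  console.log(response.hits.hits);
});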

Elasticsearch Pipelines

Resources