Index Data from a Relational Database into Elasticsearch – 2

In my previous article, we showed a straight forward way to index data from the relational database into Elasticsearch. In this article we will show you:

  1. Create mappings for the index
  2. Understand the SQL syntax for creating nested objects and nested object arrays in the elasticsearch

Create mappings for the index

We would better know of the data and its structure than Elasticsearch. It’s good to start with the mapping structure created by Elasticsearch. But over time when we understand the index structure, we can create our own mapping and specify it while creating the index.

The complete index definition for the world index can be found here. It includes the index properties and the mappings structure of the data to be indexed. The syntax for the index definition is:

{
    "aliases": {},
    "mappings": {},
    "settings": {},
    "warmers": {}
}

where,

  • aliases – the aliases created for this index
  • mappings – the structure of the data in the index
  • settings – the settings for the index like the number of shards and the number of replicas for the index.
  • warmers – any index warmers. This has been removed in Elasticsearch 5.3

Let us focus on mappings. There are three nested properties in the mapping: capital, language and cities. The cities attribute is an array of objects. We differentiate its type by defining it as nested.

We will create a new index world_v2 by sending a POST request with the index definition available here as the request body.

Why do we need to declare it’s type as nested?

This is because declaring it as nested will allow us to filter the data on these nested attributes. Elasticsearch will flatten these nested objects while indexing and there by allowing us to filter on the same.

SQL syntax for creating nested object

In the SQL query we have created column alias in the query of the form: “object.property_name” and “object[property_name]“. For example “capital.name“, “cities[name]“. The “.” notation is used for nested object and the “[]” notation is used for  array of nested object.

This is the syntax supported by the elasitcsearch-jdbc importer we are using.

Indexing data into new index

After creating the index, the index structure as shown by: http://localhost:9200/world_v2 looks like:

mappings.PNG

Update world-importer-config.json index property to world_v2 or the index name you have provided. And then run world-importer.bat or world-importer.sh

In our subsequent articles, we will see writing simple queries, aggregate queries, filtering on nested objects and lot more on this indexed data.

 

3 thoughts on “Index Data from a Relational Database into Elasticsearch – 2”

Leave a Reply

Discover more from Experiences Unlimited

Subscribe now to keep reading and get access to the full archive.

Continue reading