Site icon Experiences Unlimited

Index Data from a Relational Database into Elasticsearch – 1

Elasticsearch provides powerful search capabilities with support for sharding and replication of the data. So we would want to index data available in our DB into Elasticsearch.

There are multiple ways to index data into Elasticsearch:

  1. Use Logstash to setup source as DB and sink as Elasticsearch and use a filter if required to build JSON object.
  2. Use an external library elasticsearch-jdbc which runs, in its own process, external to Elasticsearch instance. It makes use of the transport client and its bulk APIs to index data into Elasticsearch.

In this article, we will look at the approach 2 i.e using an external library running as a separate process.

We will use MySQL as the DB and use the sample database world, which comes with MySQL, with the following tables:

Elasticsearch-jdbc library supports up to Elasticsearch 2.3.4 with the compatibility matrix available on their GitHub page. First, let us index the data without pre-defining the index structure in Elasticsearch (ES). If we don’t provide the index structure, ES infers and builds one based on the data indexed.  But this will not be an ideal structure always.

Create an Empty Index

Make sure ES is running. If you are on windows, then navigate to the bin directory of ES and run elasticsearch.bat. If you are on Linux, then follow the instructions here.

To create an empty index you have to issue a HTTP POST to <es_url>:<port>/<index_name>. For example, using cURL:

curl -X POST localhost:9200/world

You can even use REST clients of your choice like Postman to invoke the ES REST APIs

Setting up the JDBC Importer

Download the elasticsearch-jdbc binary from here and extract it into a folder, let’s call it ES_IMPORTER. There are already scripts available in ES_IMPORTER/bin for working with MySQL and other DBs and we will place all the scripts related to the importing in the same folder.

Note: If you are using Oracle DB, then you need to place the JDBC driver in the ES_IMPORTER/lib folder. For other DBs like MySQL, PostgreSql JDBC drivers are already available.

Another Note: elasticsearch-jdbc requires JDK 8

We will create the following files (code available at the links) to run the importer:

The JDBC connection string, the DB username and the password can be updated in the world-importer-config.json. Also, the ES port, the ES host name, the ES index name, the ES index type name can be updated in the same file.

Running the JDBC Importer

Before running the importer, open  the URL: http://localhost:9200/world

The above shows that the index world has no structure defined

Open the URL: http://localhost:9200/world/_search

The above shows that there is no data in the index world.

Navigate to the directory: ES_IMPORTER/bin and run world-importer.bat OR

Once the importer completes, which should be almost immediately, open the URL: http://localhost:9200/world/. You will now see that the index has a structure, which is also called mappings, defined within the mappings key. And when you open the URL: http://localhost:9200/world/_search you will find the indexed data which would be around 239 entries as reported by the hits field.

Advantages of JDBC Importer

Shortcomings of JDBC Importer

In the next part of this article we will:



Exit mobile version