In my previous article, we showed a straightforward way to index data from a relational database into Elasticsearch. In this article, we will:
- Create mappings for the index
- Understand the SQL syntax for creating nested objects and arrays of nested objects in Elasticsearch
Elasticsearch provides powerful search capabilities, with support for sharding and replicating data, so it is useful to index the data held in our database into Elasticsearch.
There are multiple ways to index data into Elasticsearch:
- Use Logstash, with the database as the source and Elasticsearch as the sink, and add a filter if required to build the JSON object.
- Use the external library elasticsearch-jdbc, which runs in its own process, external to the Elasticsearch instance. It uses the transport client and its bulk APIs to index data into Elasticsearch.
In this article, we will look at the second approach, i.e., using an external library running as a separate process.
Download the Elasticsearch distribution, noting the version you want to use:
curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.3.4/elasticsearch-2.3.4.tar.gz
tar -xvf elasticsearch-2.3.4.tar.gz
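If you expect to switch versions, the download step above can be parameterised. The following is a minimal sketch: the helper name es_download_url and the ES_VERSION variable are made up for illustration, and the URL pattern is simply copied from the curl command above, so it is only known to hold for 2.3.4.

```shell
#!/bin/sh
# Hypothetical helper: build the tarball URL for a given Elasticsearch version.
# Assumption: other 2.x releases follow the same URL pattern as 2.3.4.
es_download_url() {
    version="$1"
    printf 'https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/%s/elasticsearch-%s.tar.gz\n' "$version" "$version"
}

ES_VERSION="2.3.4"
echo "Would download: $(es_download_url "$ES_VERSION")"

# To actually fetch and unpack, uncomment:
#   curl -L -O "$(es_download_url "$ES_VERSION")"
#   tar -xvf "elasticsearch-$ES_VERSION.tar.gz"
```

Keeping the version in one variable means the curl and tar commands cannot drift apart when you upgrade.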
Make sure you have Java 7 installed; verify with java -version. If you do not, check the current value of $JAVA_HOME with echo $JAVA_HOME, then download Java 7 and set $JAVA_HOME to the directory where you installed it.
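The Java check can also be scripted. This is a rough sketch, not part of the original setup: the function names java_major_version and installed_java_version are made up for illustration, and it assumes java -version prints a quoted version string such as "1.7.0_80" on stderr, which is the usual format.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a Java version string.
java_major_version() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;  # legacy scheme, e.g. 1.7.0_80 -> 7
        *)   echo "$1" | cut -d. -f1 ;;  # newer scheme, e.g. 11.0.2 -> 11
    esac
}

# Hypothetical helper: read the quoted version string from `java -version`,
# which writes to stderr rather than stdout.
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

if command -v java >/dev/null 2>&1; then
    echo "Java major version: $(java_major_version "$(installed_java_version)")"
else
    echo "java not found on PATH; install Java 7 and set JAVA_HOME"
fi
```

If the reported major version is below 7, install Java 7 and point $JAVA_HOME at it before starting Elasticsearch.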
Adding the elasticsearch user
1. Log in as the superuser (su)
3. Do not set a password, so that this user cannot be used for shell login
Run Elasticsearch as the elasticsearch user:
sudo -H -u elasticsearch bin/elasticsearch -d
The -d flag runs Elasticsearch as a daemon.
To increase the heap size, set ES_JAVA_OPTS="-Xms2g -Xmx2g". With the JMX remote options also added, the command becomes:
sudo -H -u elasticsearch ES_JAVA_OPTS="-Xms2g -Xmx2g -Dcom.sun.management.jmxremote.port=8855 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" bin/elasticsearch -d
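Because the option string is long, it can help to assemble it from its parts. The sketch below is one possible way to do that: the function name build_es_java_opts is hypothetical, and the heap size (2g) and JMX port (8855) are simply the values used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: compose the ES_JAVA_OPTS value from a heap size and a JMX port.
# -Xms and -Xmx are set to the same value, as in the command above.
build_es_java_opts() {
    heap="$1"
    jmx_port="$2"
    printf '%s' "-Xms${heap} -Xmx${heap} -Dcom.sun.management.jmxremote.port=${jmx_port} -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
}

OPTS="$(build_es_java_opts 2g 8855)"
echo "ES_JAVA_OPTS=$OPTS"

# To start Elasticsearch with these options, uncomment:
#   sudo -H -u elasticsearch ES_JAVA_OPTS="$OPTS" bin/elasticsearch -d
```

Changing the heap size or the JMX port then means editing one argument rather than the whole command line.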