
How to install and configure Elasticsearch on Linux 


Elasticsearch is a free, open-source search engine built on the Apache Lucene library. It is a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents, and it can be viewed as a NoSQL database. Since there are so many NoSQL databases, let’s look at how Elasticsearch differs from the rest. This article explains why Elasticsearch is important and what it can do for your organization; most importantly, if you follow along with the tutorial, you will be able to install Elasticsearch, test it, and run a REST API query against it using curl. If you encounter a systemd timeout issue while working on yours, please check out this tutorial – Elasticsearch: How to stop systemd service start operation from timing out

Other helpful tutorials can be found here: Elasticsearch error feature, How to install and configure Prometheus for monitoring on a Linux server, How to install Java Runtime Environment on Mac OS, How to install Node.js on Ubuntu, How to install and configure Tripwire on Ubuntu, and How to Install Apache OpenOffice on Ubuntu

Capabilities of Elasticsearch

Elasticsearch can be used to analyze a variety of data types. It offers a comprehensive search system, near real-time search, and multi-tenancy support. Elasticsearch collects unstructured data from a variety of sources, stores and indexes it using user-defined mappings (which can also be derived dynamically from the data), and makes it searchable.

Its distributed architecture allows it to search and analyze substantial amounts of data in near real time. You can start small and scale up to hundreds of machines. Running a full-featured search cluster with Elasticsearch is simple, but scaling it requires a significant amount of experience.

In addition to full-text, search-oriented use cases such as product search, document search, and email search, Elasticsearch is frequently used for storing data that needs to be sliced and diced and aggregated along numerous dimensions. Using Elasticsearch for metrics, logs, traces, and other time-series data are examples of such analytical use cases.

Elasticsearch is available on-premises and in the cloud. You have the option of running it yourself or using a hosted service such as Amazon OpenSearch Service (formerly Amazon Elasticsearch Service). See https://aws.amazon.com/opensearch-service/ for more information.

Why Elasticsearch?

  1. Full-text search engine: Elasticsearch has one of the most powerful full-text search capabilities and allows you to run and combine a variety of searches, including structured, unstructured, geo, and metric searches.
  2. Analytical engine: Elasticsearch’s analytical use case is the most popular. It is frequently used for log analytics, as well as for slicing and dicing numerical data such as application and infrastructure performance indicators. Its faceting feature allows users to aggregate data on the fly via aggregation queries.
  3. Designed for scalability: Elasticsearch was designed from the start to be scalable. Its distributed architecture allows you to grow it to a large number of machines and handle petabytes of data.
  4. Good investment: the basics of Elasticsearch are pretty simple to grasp, at least when dealing with a small dataset or deployment. Its easy RESTful APIs integrate with data ingestion tools like Logstash, which sends data to Elasticsearch as JSON documents, and with Kibana, which allows you to create reports and visualize your data.
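To make the aggregation idea concrete, here is a sketch of a terms aggregation request body; the index name (logs) and the field (status.keyword) are hypothetical:

```json
{
  "size": 0,
  "aggs": {
    "requests_per_status": {
      "terms": { "field": "status.keyword" }
    }
  }
}
```

Sent to /logs/_search, a body like this returns one bucket per distinct status value, each with a document count, instead of individual search hits.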

How Elasticsearch Works

Elasticsearch operates by obtaining and managing document-oriented and semi-structured data. Internally, the “shared nothing” design is the guiding philosophy of how Elasticsearch operates. Elasticsearch’s fundamental data structure is an inverted index handled using the APIs of Apache Lucene.

In its most basic form, an inverted index is a mapping from each unique ‘word’ (token) to the list of documents (locations) containing that word, allowing users to rapidly identify documents containing certain keywords. Index information is maintained in one or more partitions, also known as shards.
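The mapping described above can be sketched in a few lines of shell; this toy script (the documents and tokens are made up) builds an inverted index from whitespace-separated “documents”:

```shell
#!/bin/sh
# Toy inverted index: the first field of each line is the document id,
# the remaining fields are tokens. For each token, accumulate the list
# of documents that contain it, then print the token -> documents map.
printf 'doc1 the quick brown fox\ndoc2 the lazy dog\ndoc3 quick dog\n' |
awk '{ for (i = 2; i <= NF; i++) idx[$i] = idx[$i] " " $1 }
     END { for (t in idx) print t ":" idx[t] }' | sort
```

A lookup for “quick” now returns doc1 and doc3 immediately, without scanning every document. Lucene’s on-disk structures are far more sophisticated, but the principle is the same.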

Key Terminologies

Shard

A shard is a collection of documents from an index. When the volume of data stored in your cluster surpasses the boundaries of your server, Elasticsearch employs shards. As a result, you may divide your index into smaller chunks known as shards. A shard is a single Lucene index instance. Elasticsearch has two types of shards:

  1. primary shards, or active shards, that hold the data
  2. replica shards, or duplicates of the primary shards
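For illustration, both shard counts are set per index at creation time; a request body like the following (the values are arbitrary) could be sent with a PUT to a new index:

```json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```

Three primary shards with one replica each gives six shards in total, spread across the cluster’s data nodes.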

Mapping

A mapping defines the schema for an index. Extending the mapping with additional fields or sub-fields is feasible at any time; however, altering the type of an existing field is a more involved procedure that requires re-indexing the data.
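As a sketch of what a mapping looks like, the following body could be sent with a PUT to create a hypothetical products index (all names here are illustrative):

```json
{
  "mappings": {
    "properties": {
      "name":  { "type": "text" },
      "price": { "type": "float" },
      "added": { "type": "date" }
    }
  }
}
```

Fields not declared up front can still be added later, explicitly or through dynamic mapping; only changing the type of an existing field forces a re-index.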

Segments

“Segment” is a term coined at the Lucene level. Segments are the pieces that make up a shard (a Lucene index): each Lucene index consists of one or more segments. While this is a Lucene-level concern, Elasticsearch does expose knobs for managing segment sizes, and how you tune them affects Elasticsearch indexing performance.

Document

The document is Elasticsearch’s core and basic unit of information, expressed in JSON (JavaScript Object Notation) format. Documents can be indexed and stored. An index may contain one or more documents, and each document may contain one or more fields. Aside from the actually indexed fields of a document, the original document is exposed in the API as “_source”.

Node

A node is a single Elasticsearch process instance. It is a server that stores data and participates in the indexing and searching activities of the cluster. The common cluster name allows nodes in the cluster to find each other. Depending on the node configuration, multicast or unicast discovery can be utilized in a cluster. A single physical server, virtual machine, or container can support several nodes. Data nodes and master nodes are the two basic types of nodes. Nodes can be set up to store data, function as cluster master nodes, or both.

Cluster

A cluster is made up of one or more nodes (servers) that together store all the data and provide indexing and search across all nodes. Each cluster has a single active master node, which is elected by the cluster.

Installation & Configuration

Here are the requirements for installing Elasticsearch:

  1. A Linux operating system, preferably Ubuntu 20.04
  2. A user account with root privileges.

Step 1: Install the Java dependency

sudo apt install openjdk-8-jdk

(Recent Elasticsearch packages bundle their own JDK, but a system Java installation is still useful for other tooling.)

Step 2: Install from the APT repository

You may need to install the apt-transport-https package on Debian before proceeding:

sudo apt-get install apt-transport-https

Step 3: Add the repository

First, download and install the Elasticsearch public signing key referenced by signed-by (this is the command from the official Elastic documentation):

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

Then save the repository definition to /etc/apt/sources.list.d/elastic-8.x.list:

echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

Step 4: Install using apt

Install the Elasticsearch Debian package with:

sudo apt-get update && sudo apt-get install elasticsearch

Download Elasticsearch Manually

Alternatively, download the most recent version of the install package for your platform directly.

On Debian and Ubuntu this is a .deb package. Download it along with its published SHA-512 checksum, verify the checksum, and install it. Also, ensure that any paths you set in the .yml configuration file follow the right syntax.

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.3-amd64.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.3-amd64.deb.sha512
shasum -a 512 -c elasticsearch-8.1.3-amd64.deb.sha512 
sudo dpkg -i elasticsearch-8.1.3-amd64.deb

This will generate all of the folders required by Elasticsearch. The preceding steps are the minimal prerequisites for configuring Elasticsearch.

If you use apt-get to install, the output looks like this:

root@ubuntu:/home/rdgmh# sudo apt-get install elasticsearch
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  elasticsearch
0 upgraded, 1 newly installed, 0 to remove and 110 not upgraded.
Need to get 516 MB of archives.
After this operation, 1,101 MB of additional disk space will be used.
Get:1 https://artifacts.elastic.co/packages/8.x/apt stable/main amd64 elasticsearch amd64 8.1.3 [516 MB]
Fetched 25.8 MB in 8min 11s (52.7 kB/s)                                       
Selecting previously unselected package elasticsearch.
(Reading database ... 189950 files and directories currently installed.)
Preparing to unpack .../elasticsearch_8.1.3_amd64.deb ...
Creating elasticsearch group... OK
Creating elasticsearch user... OK
Unpacking elasticsearch (8.1.3) ...
Setting up elasticsearch (8.1.3) ...
--------------------------- Security autoconfiguration information ------------------------------

Authentication and authorization are enabled.
TLS for the transport and HTTP layers is enabled and configured.

The generated password for the elastic built-in superuser is : JHH=cp9*D_=z7jw0gA2q

If this node should join an existing cluster, you can reconfigure this with
'/usr/share/elasticsearch/bin/elasticsearch-reconfigure-node --enrollment-token <token-here>'
after creating an enrollment token on your existing cluster.

You can complete the following actions at any time:

Reset the password of the elastic built-in superuser with 
'/usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic'.

Generate an enrollment token for Kibana instances with 
 '/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana'.

Generate an enrollment token for Elasticsearch nodes with 
'/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s node'.

-------------------------------------------------------------------------------------------------
### NOT starting on installation, please execute the following statements to configure elasticsearch service to start automatically using systemd
 sudo systemctl daemon-reload
 sudo systemctl enable elasticsearch.service
### You can start elasticsearch service by executing
 sudo systemctl start elasticsearch.service


Start & check the status of the service

sudo systemctl start elasticsearch.service
systemctl status elasticsearch.service

Edit the elasticsearch.yml file.

Open the /etc/elasticsearch/elasticsearch.yml configuration file and modify the following settings.

sudo vim /etc/elasticsearch/elasticsearch.yml

Cluster

First and foremost, you should name your cluster and the node you are setting up. A node can only join a cluster if it shares its cluster.name with all the other nodes in that cluster, so give your cluster a descriptive name that indicates its function.


Node name

The node.name property is used to identify nodes in an Elasticsearch cluster in a human-readable way. The node.name defaults to the server’s hostname, although it may be changed.


Network

By default, Elasticsearch is only accessible via localhost (the IP address 127.0.0.1). If you wish to query it from another server or your local computer, you must set network.host to an appropriate IP address.


When you are done with the adjustments, save the file and close the editor.

Then launch the Elasticsearch service and set it to start automatically at boot.

For network.host there are three options:

  1. 0.0.0.0, which makes Elasticsearch accept connections on all network interfaces
  2. 127.0.0.1, which is localhost and the default
  3. Binding network.host to the local IP address of the server
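Putting the naming and network settings together, a minimal single-node configuration in elasticsearch.yml might look like this (the names and address are illustrative, not prescriptive):

```yaml
# /etc/elasticsearch/elasticsearch.yml (excerpt)
cluster.name: demo-cluster      # must match on every node that should join
node.name: node-1               # defaults to the hostname if unset
network.host: 127.0.0.1         # change to a reachable IP to expose the node
http.port: 9200                 # default HTTP port
```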
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

The setup process may take a few seconds and should finish without errors or output. After that, use the following command to verify the status of the service.

sudo systemctl status elasticsearch

While the Elasticsearch service is running, continue with the firewall settings before attempting a search.

Configuring firewall

For reference, the complete /etc/elasticsearch/elasticsearch.yml generated during installation looks like this:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: techdirectarchive-blog-cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1

#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#

## Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
#network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:

#cluster.initial_master_nodes: ["node-1", "node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Allow wildcard deletion of indices:
#
#action.destructive_requires_name: false

#----------------------- BEGIN SECURITY AUTO CONFIGURATION -----------------------
#
# The following settings, TLS certificates, and keys have been automatically      
# generated to configure Elasticsearch security features on 26-04-2022 22:47:37
#
# --------------------------------------------------------------------------------

# Enable security features
xpack.security.enabled: true

xpack.security.enrollment.enabled: true

# Enable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

# Enable encryption and mutual authentication between cluster nodes
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
# Create a new cluster with the current node only
# Additional nodes can still join the cluster later
cluster.initial_master_nodes: ["ubuntu"]

# Allow HTTP API connections from localhost and local networks
# Connections are encrypted and require user authentication
http.host: [_local_, _site_]

# Allow other nodes to join the cluster from localhost and local networks
# Connections are encrypted and mutually authenticated
#transport.host: [_local_, _site_]

#------------------------ END SECURITY AUTO CONFIGURATION ----------------------

The preceding example shows the elasticsearch.yml file with the settings discussed above in place. Now check the status of the firewall; if ufw is active, allow Elasticsearch’s HTTP port (9200 by default) through it:

sudo ufw status
sudo ufw allow 9200/tcp

Testing the connection

curl, or another comparable command-line HTTP tool, is the quickest way to query the Elasticsearch server. Note that if you kept the Elasticsearch 8 security auto-configuration from the installation, plain HTTP requests are rejected; in that case, connect over HTTPS as the elastic user, using the CA certificate generated by the package:

curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic 'https://localhost:9200'

With security disabled, a plain HTTP request is enough:

curl -X GET 'http://localhost:9200'
{
  "name": "elastic.example.com",
  "cluster_name" : "techdirectarchive-blog-cluster",
  "cluster_uuid" : "BTN3941lctuECJfI_fAGlq",
  "version" : {
    "number" : "7.12.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "6189837139a9c6b6s23d3200870651f10d3343f0",
    "build_date" : "2022-04-29T20:56:39.040728659Q",
    "build_snapshot" : false,
    "lucene_version" : "8.8.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "linux and elastic, rocks"
}

Querying the database

Now that we have a functional Elasticsearch node, let’s feed it some data to see how it performs.

Using the following command, create a new index with an associated message. (Mapping types were removed in Elasticsearch 8, so the document is indexed under the _doc endpoint; with security enabled, add the --cacert and -u options shown earlier.)

curl -X POST -H 'Content-Type: application/json' 'localhost:9200/example/_doc/1?pretty' -d '{ "message": "Hello world!" }'

If the request succeeds, you should see something like this:

{
  "_index" : "example",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Summary

If you followed along, then well done! You should now have a fully operational Elasticsearch node. A simple single-node configuration is a quick way to get started with Elasticsearch and allows for easy scaling later. In practice, owing to automatic rebalancing and routing, interfacing with Elasticsearch works the same regardless of the number of nodes.
