TechDirectArchive

How to Install Hadoop on Linux

Posted on 30/03/2024 (updated 09/04/2024) by Raphael Gab-Momoh

In this guide, we discuss how to install Hadoop on Linux. The Apache Hadoop software library is a framework for the distributed processing of massive data sets across clusters of computers using simple programming models. Please see how to fix MySQL Workbench could not connect to MySQL server, fix “WARNING: The provided hosts list is empty only the localhost is available and note that the implicit localhost does not match all”, and How to perform SSH key-based authentication in Linux.

Hadoop is intended to scale from a small number of servers to thousands of machines, each of which provides local computing and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so it can deliver a highly available service on top of a cluster of computers, each of which may be prone to failure.

Also, see how to Associate SSH Public key with Azure Linux VM, and how to install Java Runtime Environment on Mac OS.

Prerequisites to installing Hadoop on Linux

  • Ubuntu 18.04 or higher
  • Access to a command-line tool
  • Sudo or root privileges on local/remote machines

Step 1: Install OpenJDK on Ubuntu

The Hadoop framework's services are written in Java, so they need a compatible Java Runtime Environment (JRE) and Java Development Kit (JDK). Before beginning a new installation, use the following command to update your system's package index:

sudo apt update

Currently, Apache Hadoop 3 fully supports Java 8.x. Both the runtime environment and the development kit are included in the Ubuntu OpenJDK 8 package.

To install OpenJDK 8 in your terminal, enter the following command:

sudo apt install openjdk-8-jdk -y

The interaction between components of a Hadoop ecosystem might be impacted by the OpenJDK or Oracle Java version. Check the current Java version when the installation is finished:

java -version; javac -version

The output shows which Java version is in use.
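The version check can also be scripted. The following is a minimal sketch, assuming the version string has one of the usual forms, for example 1.8.0_402 for Java 8 or 11.0.22 for Java 11 (both strings are illustrative). The helper name java_major is hypothetical and is not part of any Java or Hadoop tooling.

```shell
# Hypothetical helper: print the major Java version for a version string.
# Legacy strings such as "1.8.0_402" mean Java 8; newer ones such as "11.0.22" mean Java 11.
java_major() {
  ver="$1"
  case "$ver" in
    1.*) ver="${ver#1.}"; echo "${ver%%.*}" ;;
    *)   echo "${ver%%[.+-]*}" ;;
  esac
}

# java -version writes to stderr, so redirect it before parsing the quoted version string.
if command -v java >/dev/null 2>&1; then
  raw=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
  echo "Detected Java major version: $(java_major "$raw")"
fi
```

If the reported major version is not 8, the wrong JDK is probably first on the PATH.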

Step 2: Create a Non-Root User in the Hadoop Environment

It is preferable to create a dedicated non-root user for the Hadoop environment. A separate user makes the cluster easier to manage and improves security.

The user must be able to create a passwordless SSH connection with localhost in order for Hadoop services to operate without interruption.

Install OpenSSH on Ubuntu

Install the OpenSSH server and client using the following command:

sudo apt install openssh-server openssh-client -y

Configure SSH by editing the daemon configuration file:

sudo nano /etc/ssh/sshd_config

Changing the port is optional. If you do change it, later SSH commands must specify the new port with the -p option.

  • Change the port number to the value of your choice. Make sure there is no “#” at the beginning of the line.
  • Exit the editor and confirm that you want to save the changes.
  • For the changes to take effect, restart the SSH service with this command:
sudo systemctl restart ssh
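To confirm that the Port line is really active (no leading “#”), the file can be inspected with a short script. This is a minimal sketch, assuming the usual "Port <number>" layout; the helper name sshd_ports is hypothetical, and Include or Match directives are not handled.

```shell
# Hypothetical helper: list the ports enabled in an sshd_config-style file.
# Lines starting with "#" are comments and are ignored; sshd falls back to port 22
# when no Port line is active.
sshd_ports() {
  awk 'tolower($1) == "port" { print $2; found = 1 }
       END { if (!found) print 22 }' "$1"
}

# Example: show the active port(s) for the system configuration, if it is readable.
if [ -r /etc/ssh/sshd_config ]; then
  sshd_ports /etc/ssh/sshd_config
fi
```

If the output is still 22 after editing, the Port line is most likely still commented out.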

Create Hadoop User

To create a new user for Hadoop, use the adduser command:

sudo adduser hadoop

In this example, the username is hadoop, but you can use any username and password you like. Switch to the newly created user and enter the password when prompted:

su - hadoop

Now, the user must be able to connect to localhost over SSH without being requested for a password.

Enable Passwordless SSH for Hadoop User

Create an SSH key pair and specify where it should be kept with the command:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

To append the public key to the authorized_keys file in the .ssh directory, use the cat command:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Restrict the permissions on the file with the chmod command:

chmod 0600 ~/.ssh/authorized_keys

Note: A password is no longer required each time the new user wants to SSH.
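With its default StrictModes setting, sshd can refuse key-based logins when the key files or their directory are writable by other users. The following is a minimal sketch that applies and then prints the commonly used modes (700 for the directory, 600 for authorized_keys). The helper name secure_ssh_dir is hypothetical, and stat -c assumes GNU coreutils, as shipped with Ubuntu.

```shell
# Hypothetical helper: apply restrictive permissions to an SSH directory.
# Pass the directory as the first argument; it defaults to ~/.ssh.
secure_ssh_dir() {
  dir="${1:-$HOME/.ssh}"
  chmod 700 "$dir" || return 1
  if [ -f "$dir/authorized_keys" ]; then
    chmod 600 "$dir/authorized_keys" || return 1
  fi
  # Print the resulting octal modes so they can be checked at a glance.
  stat -c '%a %n' "$dir"
  if [ -f "$dir/authorized_keys" ]; then
    stat -c '%a %n' "$dir/authorized_keys"
  fi
}
```

Run it as the hadoop user, for example with secure_ssh_dir ~/.ssh, after the key pair has been created.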

Use the hadoop user to SSH to localhost to ensure everything is configured properly.

ssh localhost

On the first connection, SSH asks you to confirm the host key fingerprint. After you answer yes, the hadoop user can connect to localhost without being asked for a password.

Step 3: Download and Install Hadoop on Ubuntu

Download the Hadoop binary release from the official Apache website with wget:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz

If this version is no longer listed on downloads.apache.org, older releases are typically kept on archive.apache.org.
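It is good practice to verify the download before extracting it. The following is a minimal sketch, assuming you have copied the expected SHA-512 digest from the checksum file that Apache typically publishes next to the tarball. The helper name verify_sha512 is hypothetical.

```shell
# Hypothetical helper: succeed only if FILE has the expected SHA-512 digest.
# Usage: verify_sha512 FILE EXPECTED_HEX
verify_sha512() {
  actual=$(sha512sum "$1" | awk '{ print $1 }')
  # Compare case-insensitively, since published digests may be upper- or lower-case.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  # An empty digest (for example, a missing file) never counts as a match.
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, verify_sha512 hadoop-3.3.3.tar.gz followed by the published digest returns success only when the two digests match, so it can be chained with && in front of the tar command.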

Once the download is complete, extract the files to initiate the Hadoop installation:

tar xzf hadoop-3.3.3.tar.gz

The Hadoop binary files are now in the hadoop-3.3.3 directory.

Step 4: Single Node Hadoop Deployment

Hadoop performs best when deployed in fully distributed mode on a large cluster of networked machines. However, if you are new to Hadoop and want to explore basic commands or test applications, you can set it up on a single node.

In this configuration, also known as pseudo-distributed mode, each Hadoop daemon runs as a separate Java process. You customize a Hadoop environment by editing the following configuration files:

  1. .bashrc
  2. hadoop-env.sh
  3. core-site.xml
  4. hdfs-site.xml
  5. mapred-site.xml
  6. yarn-site.xml
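Apart from .bashrc, which sits in the user's home directory, these files are found under etc/hadoop inside the extracted Hadoop directory. The following is a minimal sketch that reports any of them that are missing; the helper name missing_hadoop_configs is hypothetical.

```shell
# Hypothetical helper: print the Hadoop configuration files missing from a directory.
# Usage: missing_hadoop_configs CONF_DIR
missing_hadoop_configs() {
  conf_dir="$1"
  for f in hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    [ -f "$conf_dir/$f" ] || echo "$f"
  done
}
```

For example, missing_hadoop_configs "$HOME/hadoop-3.3.3/etc/hadoop" prints nothing when all five files are present, assuming the archive was extracted in the hadoop user's home directory.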

Configure Hadoop Environment Variables (.bashrc)

Edit the .bashrc shell configuration file of the hadoop user with a text editor of your choice (we will be using vim). The file belongs to that user, so sudo is not needed:

vim ~/.bashrc
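The following is a minimal sketch of the kind of lines that are commonly added to .bashrc for a single-node setup. It assumes Hadoop was extracted to hadoop-3.3.3 in the hadoop user's home directory; adjust the path if yours differs, and treat the selection of variables as an illustration, not a complete list.

```shell
# Assumption: Hadoop was extracted to hadoop-3.3.3 in the hadoop user's home directory.
export HADOOP_HOME="$HOME/hadoop-3.3.3"
# Hadoop reads its configuration files from this directory.
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
# Put the hadoop, hdfs and yarn commands (bin) and the start/stop scripts (sbin) on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

After saving the file, apply the changes to the current shell with source ~/.bashrc.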

I hope you found these steps on how to install Hadoop on Linux useful. Please feel free to leave a comment below.
