Installation & Setup

This guide walks you through installing Apache Hadoop in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process on a single machine. It is the fastest way to learn Hadoop before deploying to a real cluster.

Prerequisites

  • Java 8 or Java 11 (Hadoop 3.x requires at least Java 8; Java 11 is supported as a runtime from Hadoop 3.3 onward)
  • SSH with passwordless login to localhost
  • A Linux or macOS machine (WSL2 works on Windows)
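
Passwordless login to localhost usually means a key pair with an empty passphrase whose public key is listed in authorized_keys. The following is a minimal sketch, assuming OpenSSH is installed and an SSH server is running on the machine. The function name setup_localhost_key and the SSH_DIR variable are illustrative and not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: create a passphrase-less key and authorize it for localhost logins.
# SSH_DIR and setup_localhost_key are illustrative names, not Hadoop settings.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"

setup_localhost_key() {
  local dir="$1"
  local key="$dir/id_ed25519"
  mkdir -p "$dir" && chmod 700 "$dir"
  # Generate a key only if one does not exist yet (never overwrite).
  if [ ! -f "$key" ]; then
    ssh-keygen -q -t ed25519 -N '' -f "$key" || return 1
  fi
  # Authorize the public key once; the grep check avoids duplicate entries.
  touch "$dir/authorized_keys"
  if ! grep -qxF -- "$(cat "$key.pub")" "$dir/authorized_keys"; then
    cat "$key.pub" >> "$dir/authorized_keys"
  fi
  chmod 600 "$dir/authorized_keys"
}
```

To apply it to your account, call setup_localhost_key "$SSH_DIR" and then run ssh -o BatchMode=yes localhost true. BatchMode makes the login fail instead of prompting, so a zero exit status indicates that key-based login works.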

Verify Java is installed:

java -version
# Example output: openjdk version "11.0.x" (Oracle JDK prints: java version "11.0.x")

Step 1 — Download Hadoop

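Visit the Apache Hadoop Releases page and download the latest stable binary. The commands below use version 3.4.0; substitute the version you chose. Apache moves older releases to archive.apache.org, so adjust the URL if the download returns a 404.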

wget https://downloads.apache.org/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
tar -xzf hadoop-3.4.0.tar.gz
sudo mv hadoop-3.4.0 /usr/local/hadoop
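
Before extracting a release, it is worth comparing the tarball's SHA-512 digest with the checksum published next to it. The sketch below assumes the checksum file sits at the tarball URL with .sha512 appended and holds the digest on a single line. The helper names sha512_of and verify_sha512 are hypothetical, not Apache tooling.

```shell
#!/usr/bin/env bash
# Sketch: compare a tarball's SHA-512 digest with a published checksum file.
# sha512_of and verify_sha512 are illustrative helpers, not Apache tools.

sha512_of() {
  # GNU coreutils ships sha512sum; macOS ships shasum.
  if command -v sha512sum >/dev/null 2>&1; then
    sha512sum "$1" | awk '{print $1}'
  else
    shasum -a 512 "$1" | awk '{print $1}'
  fi
}

verify_sha512() {
  local tarball="$1" checksum_file="$2"
  local digest
  digest="$(sha512_of "$tarball")"
  [ -n "$digest" ] || return 1
  # Match the digest anywhere in the file, case-insensitively, so both
  # "digest  filename" and "SHA512 (filename) = digest" layouts pass.
  grep -qiF -- "$digest" "$checksum_file"
}

# Assumed checksum location: the tarball URL with ".sha512" appended.
# wget https://downloads.apache.org/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz.sha512
# verify_sha512 hadoop-3.4.0.tar.gz hadoop-3.4.0.tar.gz.sha512 && echo "checksum OK"
```

A non-zero exit status from verify_sha512 means the digest was not found in the checksum file. In that case, download the tarball again instead of installing it.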

Step 2 — Set Environment Variables

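Add the following to your ~/.bashrc or ~/.zshrc. The JAVA_HOME path shown is typical for OpenJDK 11 on Debian or Ubuntu; change it to match the location of your own JDK.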

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload your shell configuration and confirm that the hadoop command is on your PATH:

source ~/.bashrc   # or: source ~/.zshrc
hadoop version
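
The start scripts launch the daemons over SSH, and those non-interactive sessions may not read ~/.bashrc, so Hadoop's own setup instructions also define JAVA_HOME in etc/hadoop/hadoop-env.sh. The following sketch appends that line once. The helper name set_hadoop_java_home is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: persist JAVA_HOME in hadoop-env.sh so daemons started over SSH see it.
# set_hadoop_java_home is an illustrative helper name.

set_hadoop_java_home() {
  local env_file="$1" java_home="$2"
  [ -f "$env_file" ] || { echo "not found: $env_file" >&2; return 1; }
  # Append only if this exact export line is not already present.
  local line="export JAVA_HOME=$java_home"
  grep -qxF -- "$line" "$env_file" || printf '%s\n' "$line" >> "$env_file"
}

# Run only when both variables are set and the Hadoop config file exists.
if [ -n "${HADOOP_HOME:-}" ] && [ -n "${JAVA_HOME:-}" ] \
   && [ -f "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" ]; then
  set_hadoop_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" "$JAVA_HOME"
fi
```

If the daemons later fail with a message that JAVA_HOME is not set, this file is the first place to check.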

Step 3 — Configure Pseudo-Distributed Mode

Edit $HADOOP_HOME/etc/hadoop/core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
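
To confirm that Hadoop actually picks up the two values configured in this step, you can query the resolved configuration with hdfs getconf -confKey. This is a minimal sketch, and check_conf is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm Hadoop resolves the two properties configured in this step.
# check_conf is an illustrative helper; it wraps `hdfs getconf -confKey`.

check_conf() {
  local key="$1" expected="$2" actual
  actual="$(hdfs getconf -confKey "$key" 2>/dev/null)"
  if [ "$actual" = "$expected" ]; then
    echo "OK   $key=$actual"
  else
    echo "FAIL $key: expected '$expected', got '$actual'" >&2
    return 1
  fi
}

# Skip quietly on machines where the hdfs command is not on PATH yet.
if command -v hdfs >/dev/null 2>&1; then
  check_conf fs.defaultFS hdfs://localhost:9000
  check_conf dfs.replication 1
fi
```

A FAIL line usually points to a typo in the XML, or to edits made in a different directory than $HADOOP_HOME/etc/hadoop.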

Step 4 — Format the NameNode

Run this once, before the first start. Reformatting an existing NameNode erases its HDFS metadata.

hdfs namenode -format
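
Because formatting is destructive, a small guard can skip it when a formatted NameNode directory already exists. The sketch below assumes the default dfs.namenode.name.dir, which lives under hadoop.tmp.dir (by default /tmp/hadoop-<user>). NAME_DIR and format_namenode_once are illustrative names. Adjust NAME_DIR if you override either property.

```shell
#!/usr/bin/env bash
# Sketch: format the NameNode only if no formatted metadata directory exists.
# NAME_DIR assumes the default dfs.namenode.name.dir under hadoop.tmp.dir
# (/tmp/hadoop-<user>); change it if you override either property.
NAME_DIR="${NAME_DIR:-/tmp/hadoop-$(id -un)/dfs/name}"

format_namenode_once() {
  local name_dir="$1"
  # A formatted NameNode directory contains current/VERSION.
  if [ -f "$name_dir/current/VERSION" ]; then
    echo "NameNode already formatted at $name_dir; skipping"
    return 0
  fi
  hdfs namenode -format
}

# Skip quietly on machines where the hdfs command is not on PATH yet.
if command -v hdfs >/dev/null 2>&1; then
  format_namenode_once "$NAME_DIR"
fi
```

Note that the default location is under /tmp, which some systems clear on reboot. If that happens, the guard will format again because the old metadata is already gone.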

Step 5 — Start the Cluster

start-dfs.sh
start-yarn.sh

Verify with jps. The output should include these five daemons, each preceded by its process ID:

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
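
The same check can be scripted so that a missing daemon is reported by name. This is a minimal sketch that parses jps output, and missing_daemons is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report which of the five expected daemons are absent from jps output.
# missing_daemons is an illustrative helper name.

missing_daemons() {
  local running name
  # jps prints "<pid> <MainClass>" per JVM; keep only the class-name column.
  running="$(jps 2>/dev/null | awk '{print $2}')"
  for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches whole lines, so "NameNode" does not match "SecondaryNameNode".
    printf '%s\n' "$running" | grep -qx "$name" || echo "$name"
  done
}

# Skip quietly if the JDK's jps tool is not on PATH.
if command -v jps >/dev/null 2>&1; then
  missing="$(missing_daemons)"
  if [ -z "$missing" ]; then
    echo "all daemons running"
  else
    echo "not running:" $missing
  fi
fi
```

When a daemon is reported as not running, its log file under $HADOOP_HOME/logs is the usual place to look for the cause.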

Open the NameNode (HDFS) web UI at http://localhost:9870 and the YARN ResourceManager UI at http://localhost:8088.
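
A quick way to confirm that HDFS accepts reads and writes is to store a small file and print it back. The sketch below assumes the daemons are running. The function name hdfs_smoke_test, the file name smoke.txt, and the /user/<name> home directory are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: write a small file into HDFS and read it back.
# hdfs_smoke_test, smoke.txt and the /user/<name> directory are illustrative.

hdfs_smoke_test() {
  local user_dir="/user/$(id -un)"
  local src
  src="$(mktemp)" || return 1
  echo "hello hdfs" > "$src"
  # Create the home directory, upload (overwriting any old copy), read back.
  hdfs dfs -mkdir -p "$user_dir" &&
    hdfs dfs -put -f "$src" "$user_dir/smoke.txt" &&
    hdfs dfs -cat "$user_dir/smoke.txt"
  local status=$?
  rm -f "$src"
  return "$status"
}

# Skip quietly on machines where the hdfs command is not on PATH yet.
if command -v hdfs >/dev/null 2>&1; then
  hdfs_smoke_test
fi
```

If the file contents are printed back, the NameNode and DataNode are both serving requests.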

Next Steps

Continue to HDFS Deep Dive to learn how to store and manage files on your new cluster.