Installation & Setup
This guide walks you through installing Apache Hadoop in pseudo-distributed mode on a single machine — the fastest way to learn Hadoop before deploying to a real cluster.
Prerequisites
- Java 8 or Java 11 (Hadoop 3.3 and later list both as supported runtimes)
- SSH with passwordless login to localhost
- A Linux or macOS machine (WSL2 works on Windows)
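Passwordless SSH matters because start-dfs.sh and start-yarn.sh launch daemons over SSH on the hosts listed in etc/hadoop/workers, which is just localhost by default. The sketch below follows the key-generation approach in the Apache single-node guide but skips work that is already done. setup_passwordless_ssh is a hypothetical helper name. It assumes an SSH server is already running locally.

```shell
# Sketch: generate a passphrase-less RSA key (if none exists) and authorize it
# for logins to this machine. ssh_dir defaults to ~/.ssh; the parameter only
# exists so the function can be tried against a scratch directory first.
setup_passwordless_ssh() {
  local ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  # Keep an existing key rather than overwriting it.
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -t rsa -N '' -f "$ssh_dir/id_rsa" -q || return 1
  fi
  [ -f "$ssh_dir/id_rsa.pub" ] || { echo "missing $ssh_dir/id_rsa.pub" >&2; return 1; }
  touch "$ssh_dir/authorized_keys"
  # Append the public key only if it is not already listed.
  if ! grep -qxF -- "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
    # Make sure the key starts on its own line.
    if [ -s "$ssh_dir/authorized_keys" ] && [ -n "$(tail -c 1 "$ssh_dir/authorized_keys")" ]; then
      printf '\n' >> "$ssh_dir/authorized_keys"
    fi
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage, then confirm that no password prompt appears:
#   setup_passwordless_ssh
#   ssh localhost true
```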
Verify Java is installed:
java -version
# openjdk version "11.0.x" (Oracle JDK prints: java version "11.0.x")
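Java 8 reports its version in the legacy "1.8.0_x" form while Java 11 uses "11.0.x", so comparing versions by eye is easy to get wrong. A minimal sketch that extracts the major version from java -version output and checks it against the two versions listed above. java_major_version and hadoop_java_ok are hypothetical helpers, not part of Hadoop or the JDK.

```shell
# Hypothetical helpers: read the major Java version from `java -version`
# output and check it against the versions Hadoop 3.3+ supports.
java_major_version() {
  local ver
  # Pull the first quoted version string, e.g. "1.8.0_402" or "11.0.22".
  ver=$(printf '%s\n' "$1" | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;  # legacy scheme: 1.8.0_402 -> 8
    *)   echo "${ver%%[.-]*}" ;;             # modern scheme: 11.0.22 -> 11
  esac
}

hadoop_java_ok() {
  case "$1" in
    8|11) return 0 ;;
    *)    return 1 ;;
  esac
}

# Usage (java -version writes to stderr, hence 2>&1):
#   major=$(java_major_version "$(java -version 2>&1)")
#   hadoop_java_ok "$major" && echo "Java $major: OK" || echo "Java $major: use 8 or 11"
```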
Step 1 — Download Hadoop
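Visit the Apache Hadoop Releases page and download a stable binary release. This guide uses 3.4.0; if you pick a newer release, substitute its version number in the commands below. downloads.apache.org may drop older versions once they are superseded, while archive.apache.org keeps every release, so the command below fetches from the archive:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz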
tar -xzf hadoop-3.4.0.tar.gz
sudo mv hadoop-3.4.0 /usr/local/hadoop
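Apache publishes a checksum file (hadoop-3.4.0.tar.gz.sha512) in the same directory as each release tarball. Checking it after the wget and before running tar catches truncated or corrupted downloads. A minimal sketch, assuming GNU sha512sum with shasum -a 512 as a macOS fallback; verify_sha512 is a hypothetical helper. It extracts the first 128-character hex string from the checksum file rather than assuming one exact layout, and reports a mismatch if it finds none.

```shell
# Sketch: compare a downloaded file against a published SHA-512 checksum file.
# Works for both "SHA512 (name) = <hex>" and "<hex>  name" layouts.
verify_sha512() {
  local file="$1" sumfile="$2" expected actual
  expected=$(grep -oE '[0-9a-fA-F]{128}' "$sumfile" | head -n 1 | tr 'A-F' 'a-f')
  if command -v sha512sum >/dev/null 2>&1; then
    actual=$(sha512sum "$file" | cut -d ' ' -f 1)
  else
    actual=$(shasum -a 512 "$file" | cut -d ' ' -f 1)  # macOS fallback
  fi
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Usage, after downloading the .sha512 file next to the tarball:
#   verify_sha512 hadoop-3.4.0.tar.gz hadoop-3.4.0.tar.gz.sha512
```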
Step 2 — Set Environment Variables
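Add the following to your ~/.bashrc or ~/.zshrc, adjusting JAVA_HOME to your JDK's location (the path shown is the Debian/Ubuntu default for OpenJDK 11 on x86-64):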
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
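If you are unsure where your JDK lives, you can derive a JAVA_HOME candidate from the java on your PATH by resolving its symlinks (for example /usr/bin/java to /etc/alternatives/java to the real binary) and dropping the trailing bin/java. A sketch for Linux; java_home_from is a hypothetical helper. With Debian's Java 8 packages the result may be the jre subdirectory of the JDK. On macOS, /usr/libexec/java_home -v 11 prints the path directly.

```shell
# Sketch: resolve a java binary through symlinks and return its home directory.
java_home_from() {
  local java_bin
  java_bin=$(readlink -f "$1") || return 1
  # .../jdk/bin/java -> .../jdk
  dirname "$(dirname "$java_bin")"
}

# Usage:
#   export JAVA_HOME="$(java_home_from "$(command -v java)")"
```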
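Reload your shell configuration (run source ~/.zshrc instead if you edited that file) and confirm that hadoop is on your PATH: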
source ~/.bashrc
hadoop version
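The Apache single-node guide also sets JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh. Daemons started over SSH by start-dfs.sh may not read ~/.bashrc, and a common symptom is "ERROR: JAVA_HOME is not set and could not be found." A minimal sketch of an idempotent append, so re-running your setup does not pile up duplicate lines; append_once is a hypothetical helper.

```shell
# Sketch: append a line to a file only if that exact line is not already present.
append_once() {
  local file="$1" line="$2"
  touch "$file" || return 1
  grep -qxF -- "$line" "$file" && return 0
  # Start on a fresh line if the file does not end with a newline.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Usage: pin the JAVA_HOME from your shell into Hadoop's environment file.
#   append_once "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" "export JAVA_HOME=$JAVA_HOME"
```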
Step 3 — Configure Pseudo-Distributed Mode
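Edit $HADOOP_HOME/etc/hadoop/core-site.xml so that clients and daemons use the HDFS NameNode on localhost port 9000 as the default file system: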
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml to set the block replication factor to 1, since a single DataNode cannot satisfy the default of 3:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
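If you prefer to script this step, the sketch below writes the same two files with heredocs. write_pseudo_distributed_conf is a hypothetical helper; it overwrites both files (including the license comments shipped in the originals), so copy them first if they hold other settings. It refuses to run if HADOOP_HOME is unset and no directory is given.

```shell
# Sketch: write core-site.xml and hdfs-site.xml for pseudo-distributed mode.
write_pseudo_distributed_conf() {
  local conf_dir="$1"
  if [ -z "$conf_dir" ]; then
    [ -n "$HADOOP_HOME" ] || { echo "HADOOP_HOME is not set" >&2; return 1; }
    conf_dir="$HADOOP_HOME/etc/hadoop"
  fi
  [ -d "$conf_dir" ] || { echo "no such directory: $conf_dir" >&2; return 1; }

  # Default file system: the NameNode on localhost:9000.
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # One replica per block, matching the single DataNode.
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Usage:
#   write_pseudo_distributed_conf
```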
Step 4 — Format the NameNode
hdfs namenode -format
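Format only once. Reformatting wipes the NameNode's metadata, and a DataNode still holding data from the old namespace will refuse to register, reporting incompatible clusterIDs. With the configuration above, hadoop.tmp.dir keeps its default of /tmp/hadoop-${user.name}, so the NameNode metadata lives under /tmp/hadoop-<your user>/dfs/name (note that /tmp may be cleared on reboot). A sketch of a guard that skips formatting when that directory already holds a VERSION file; namenode_is_formatted and format_namenode_once are hypothetical helpers, and the default path assumes you left hadoop.tmp.dir and dfs.namenode.name.dir unset.

```shell
# Sketch: detect an existing NameNode metadata directory.
namenode_is_formatted() {
  local name_dir="${1:-/tmp/hadoop-$(whoami)/dfs/name}"
  [ -f "$name_dir/current/VERSION" ]
}

# Format only if no metadata exists yet. The argument only affects the check;
# `hdfs namenode -format` itself reads the directory from the configuration.
# -nonInteractive makes the format abort instead of prompting if the directory exists.
format_namenode_once() {
  if namenode_is_formatted "$1"; then
    echo "NameNode already formatted; skipping"
  else
    hdfs namenode -format -nonInteractive
  fi
}
```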
Step 5 — Start the Cluster
start-dfs.sh
start-yarn.sh
Verify with jps, which lists each running JVM by process ID and main class name. Besides the Jps process itself, you should see:
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
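To check the list programmatically, the sketch below reads jps output and prints any of the five daemons that are absent; no output means all of them are up. missing_hadoop_daemons is a hypothetical helper. If something is missing, the daemon's log under $HADOOP_HOME/logs usually explains why.

```shell
# Sketch: report which expected daemons are absent from `jps` output.
# Reads jps-style lines ("<pid> <ClassName>") on stdin.
missing_hadoop_daemons() {
  local running daemon
  running=$(awk '{print $2}')
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x requires a whole-line match, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
  done
}

# Usage:
#   jps | missing_hadoop_daemons
```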
Open the HDFS web UI at http://localhost:9870 and YARN at http://localhost:8088.
Next Steps
Continue to HDFS Deep Dive to learn how to store and manage files on your new cluster.