
Upgrading from Hadoop 2 to Hadoop 3: A Complete How-To

· 5 min read
Hadoop.so Editorial Team
Big Data Engineers

Hadoop 3.x introduced erasure coding, YARN Timeline Service v2, multiple NameNode support, and significant performance improvements. If you're still running Hadoop 2.x, this guide walks through a safe, rolling upgrade path — without losing data or taking extended downtime.

What Changed in Hadoop 3.x

Before upgrading, understand the key differences:

| Area | Hadoop 2.x | Hadoop 3.x |
| --- | --- | --- |
| HDFS storage | 3x replication only | 3x replication or erasure coding |
| NameNodes (HA) | 1 active + 1 standby | More than two NameNodes supported (up to 5 recommended) |
| Minimum Java | Java 7 | Java 8 |
| YARN Timeline Service | v1 | v2 (HBase-backed) |
| Shell scripts | Monolithic daemon scripts | Reworked, cleaner separation |
| Default web UI ports | 50070, 50075, 50090 | 9870, 9864, 9868 (moved out of the ephemeral range) |

The port changes alone can break existing monitoring, firewall rules, and client configs — plan for those carefully.


Pre-Upgrade Checklist

Before touching a single config file:

  1. Audit all client applications for hardcoded ports (50070, 8020, 50010, etc.)
  2. Check Java version — every node must run Java 8 or higher
  3. Review deprecated APIs — several mapred and dfs shell commands were removed
  4. Back up NameNode metadata (saveNamespace requires safemode first):
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    hdfs dfsadmin -safemode leave
    cp -r /path/to/namenode/current /backup/namenode-$(date +%Y%m%d)
  5. Snapshot your HDFS data directories on each DataNode if possible
  6. Read the release notes for your specific target version (3.3.x or 3.4.x)
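The port audit in item 1 can be partly scripted. A minimal sketch, assuming your configs live under one directory (the path and the port list are assumptions — extend the list with any custom ports your cluster uses):

```shell
#!/bin/sh
# Scan a Hadoop config directory for legacy 2.x default ports that moved
# in 3.x. Prints each offending file:line; exits 0 whether or not it finds any.
audit_legacy_ports() {
  conf_dir="$1"
  # Web UI / transfer / IPC ports whose defaults changed between 2.x and 3.x
  grep -rnE '50070|50075|50090|50010|50020' "$conf_dir" || true
}

# Example (path is a common distro default, adjust for yours):
# audit_legacy_ports /etc/hadoop/conf
```

Run it against client configs too, not just the cluster nodes — stale ports in client-side `core-site.xml` copies are a common post-upgrade surprise.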

Step 1: Upgrade HDFS Metadata

The NameNode metadata must be migrated to the new layout before the DataNodes are upgraded.

1.1 — Put NameNode in safemode and save namespace

hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace

Stay in safemode until the NameNode is stopped — leaving it here would let new writes land after the checkpoint.

1.2 — Stop all services in order

stop-yarn.sh
stop-dfs.sh

Stop MapReduce Job History Server last:

mapred --daemon stop historyserver

1.3 — Upgrade NameNode

Replace the Hadoop binaries on the NameNode host with the 3.x release, then run:

hdfs namenode -upgrade

This writes the new metadata layout while preserving the old one in a previous/ directory under each dfs.namenode.name.dir — allowing rollback until you finalize.
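Before moving on, it's worth confirming the rollback image actually exists. A small sketch — the storage directory argument is whatever dfs.namenode.name.dir points to on your cluster:

```shell
#!/bin/sh
# Succeeds only if the NameNode storage dir contains the previous/ layout
# written by `hdfs namenode -upgrade`, i.e. rollback is still possible.
# The directory disappears after `hdfs dfsadmin -finalizeUpgrade`.
check_rollback_image() {
  [ -d "$1/previous" ]
}

# Example: check_rollback_image /path/to/namenode && echo "rollback possible"
```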


Step 2: Upgrade DataNodes

With the NameNode upgraded and running, start each DataNode with the new binaries:

hdfs --daemon start datanode

DataNodes are backward-compatible during the rolling upgrade window. You can upgrade them one at a time and keep HDFS serving data throughout.
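The one-at-a-time roll can be scripted. A sketch with a dry-run toggle so it can be rehearsed without touching the cluster — the hosts file, passwordless ssh, and the assumption that each host's binaries are already switched to 3.x are all yours to supply:

```shell
#!/bin/sh
# Roll DataNodes one at a time from a hosts file (one hostname per line).
# DRY_RUN=1 (the default here) only prints the commands it would run over ssh.
DRY_RUN=${DRY_RUN:-1}

run() {
  host="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then echo "ssh $host $*"; else ssh "$host" "$@"; fi
}

roll_datanodes() {
  while read -r host; do
    run "$host" "hdfs --daemon stop datanode"
    # Binaries on $host must already be the 3.x release at this point
    run "$host" "hdfs --daemon start datanode"
  done < "$1"
}

# Example: DRY_RUN=0 roll_datanodes datanode-hosts.txt
```

In production you would also want a pause between hosts and a health check (e.g. confirming the node re-registers with the NameNode) before moving to the next one.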

Monitor upgrade progress (the 1.x-era -upgradeProgress command no longer exists; Hadoop 3.x uses -upgrade query):

hdfs dfsadmin -upgrade query

Step 3: Upgrade YARN

ResourceManager and NodeManagers can be rolled independently in Hadoop 3.x thanks to the work-preserving restart feature.

3.1 — Upgrade ResourceManager

yarn --daemon stop resourcemanager
# Replace binaries
yarn --daemon start resourcemanager

3.2 — Rolling NodeManager upgrade

# On each node, one at a time:
yarn --daemon stop nodemanager
# Replace binaries
yarn --daemon start nodemanager

Running containers are preserved across NodeManager restarts (work-preserving upgrade).


Step 4: Update Configuration Files

Hadoop 3.x uses different default ports. Update core-site.xml, hdfs-site.xml, and any clients pointing to old ports:

Old → New port mappings:

NameNode RPC:       8020  → 8020 (unchanged; 3.0.0 briefly moved it to 9820 before the change was reverted)
NameNode Web UI:    50070 → 9870
Secondary NameNode: 50090 → 9868
DataNode Web UI:    50075 → 9864
DataNode transfer:  50010 → 9866
DataNode IPC:       50020 → 9867

Whichever RPC port you use, make sure fs.defaultFS in core-site.xml and every client agree:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>
</property>
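If external tooling can't move off the old web UI port right away, you can pin the legacy port explicitly instead of relying on the new default. A sketch for hdfs-site.xml, shown here with the old 50070 as an example value:

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>namenode-host:50070</value>
</property>
```

Pinning buys time for monitoring and firewall changes, but plan to migrate to the 3.x defaults eventually so your cluster matches upstream documentation.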

Step 5: Finalize the Upgrade

Once you've validated that everything is working correctly, finalize the upgrade to reclaim the space used by the previous/ layout backup:

hdfs dfsadmin -finalizeUpgrade

Warning: After finalization, rollback is no longer possible.


Rollback Procedure (if needed)

If you encounter critical issues before finalization:

stop-dfs.sh
hdfs namenode -rollback
start-dfs.sh

This reverts the NameNode metadata to the Hadoop 2.x layout. DataNodes must also be restarted with the 2.x binaries, passing the -rollback startup option.


Common Upgrade Issues

Shell Script Changes

Hadoop 3.x reworked the shell scripts. Commands like hadoop-daemon.sh are deprecated in favor of:

# Old (2.x)
hadoop-daemon.sh start datanode
# New (3.x)
hdfs --daemon start datanode

Classpath Changes

Third-party tools (Hive, HBase, Spark) that relied on Hadoop's classpath may need updated versions compatible with Hadoop 3.x. Check each ecosystem component's compatibility matrix.

YARN Timeline Service v2

YARN Timeline Service v2 requires HBase as a backend. If you relied on Timeline Service v1, plan the HBase deployment before enabling v2:

<!-- yarn-site.xml -->
<property>
  <name>yarn.timeline-service.version</name>
  <value>2.0f</value>
</property>

Post-Upgrade Verification

# Verify HDFS health
hdfs dfsadmin -report
hdfs fsck / -summary

# Check YARN cluster
yarn node -list
yarn application -list

# Run a test job
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100

A successful Pi estimation job confirms that HDFS and YARN are both operational end-to-end.
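Eyeballing the -report output doesn't scale if you want to gate automation on it. A small parser sketch, assuming the 3.x report format with a "Dead datanodes (N):" line:

```shell
#!/bin/sh
# Reads `hdfs dfsadmin -report` output on stdin and prints the number of
# dead DataNodes (0 if the report lists none).
dead_datanodes() {
  n=$(grep -oE 'Dead datanodes \([0-9]+\)' | grep -oE '[0-9]+')
  echo "${n:-0}"
}

# Example: hdfs dfsadmin -report | dead_datanodes
```

A post-upgrade check can then be as simple as `[ "$(hdfs dfsadmin -report | dead_datanodes)" -eq 0 ]` before proceeding to finalization.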


Summary

| Phase | Action |
| --- | --- |
| Pre-upgrade | Back up metadata, check Java 8+, audit ports |
| Step 1 | Save namespace, stop services, upgrade NameNode |
| Step 2 | Roll DataNodes one at a time |
| Step 3 | Roll ResourceManager and NodeManagers |
| Step 4 | Update config files for new default ports |
| Step 5 | Finalize upgrade (reclaims rollback space) |

Upgrading Hadoop 2 to 3 is operationally straightforward when done in order. The biggest surprises tend to come from port changes and ecosystem tool compatibility — audit those before you start and the rest is mechanical.