High Availability
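With a single NameNode, HDFS has a single point of failure: if that node goes down, the whole file system is unavailable until it is restarted or replaced. Hadoop 2.x introduced NameNode High Availability (HA), which runs two NameNodes, an Active and a Standby, to eliminate this risk.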
How HA Works
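The Active NameNode handles all client operations. The Standby NameNode keeps its state synchronized via JournalNodes (a quorum-based shared edit log): the Active writes each namespace edit to a majority of the JournalNodes, and the Standby reads and applies those edits. If the Active NameNode fails, the Standby is promoted automatically. A ZKFailoverController (ZKFC) process running alongside each NameNode monitors its health and uses ZooKeeper for leader election.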
ZooKeeper Cluster (3+ nodes)
        |
        |--- Active NameNode ---|
        |                       |--- JournalNode 1 ---|
        |--- Standby NameNode --|--- JournalNode 2 ---| (edit log quorum)
                                |--- JournalNode 3 ---|
Configuring NameNode HA
In hdfs-site.xml:
<configuration>
  <!-- Logical name for the pair -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The two NameNode IDs -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC addresses -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <!-- JournalNode edit log URI -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
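The properties above are usually not sufficient for automatic failover on their own. As a hedged sketch of what is typically still needed: clients need a failover proxy provider to find whichever NameNode is currently Active, the NameNodes need a fencing method, and the failover controllers need the ZooKeeper quorum address. The SSH key path and the ZooKeeper host names (zk1, zk2, zk3) below are placeholders, not values from this guide. Additional entries inside the existing <configuration> element of hdfs-site.xml might look like this:

```xml
<!-- Lets HDFS clients locate the currently Active NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method applied to the previous Active during a failover -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<!-- Assumed key path; sshfence needs passwordless SSH between the NameNode hosts -->
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
```

And in core-site.xml, again with assumed host names:

```xml
<!-- Clients address the logical nameservice rather than a single host -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<!-- Assumed ZooKeeper ensemble, used by the failover controllers -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

Pointing fs.defaultFS at the nameservice name (mycluster) rather than at one NameNode host is what allows clients to keep working after a failover.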
Manual Failover
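To switch the Active NameNode by hand, initiate a failover from nn1 to nn2. This works whether or not automatic failover is enabled; when it is enabled, the same command performs a coordinated failover through the ZooKeeper failover controllers: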
hdfs haadmin -failover nn1 nn2
Checking HA Status
Query the current role of each NameNode. Each command prints either active or standby:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
ResourceManager HA
YARN's ResourceManager also supports HA with an Active/Standby pair. The configuration pattern is similar and uses ZooKeeper for both the state store and leader election. In yarn-site.xml:
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>mycluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
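The three properties above enable HA and name the ResourceManagers, but each ID still has to map to a host, and the ZooKeeper quorum has to be specified. A minimal sketch of the remaining yarn-site.xml entries follows. The host names are placeholders rather than values from this guide, and hadoop.zk.address is the property name in recent releases (older 2.x releases use yarn.resourcemanager.zk-address instead), so check the documentation for your version.

```xml
<!-- Assumed host names for the two ResourceManagers -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
<!-- Assumed ZooKeeper ensemble for leader election and the state store -->
<property>
  <name>hadoop.zk.address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
<!-- Persist application state in ZooKeeper so a newly Active RM can recover it -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```

With these in place, the role of each ResourceManager can be checked with yarn rmadmin -getServiceState rm1, mirroring the hdfs haadmin command used for NameNodes.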