# HDFS Federation

## The Single NameNode Bottleneck

In a standard HDFS deployment, a single NameNode stores the entire namespace in memory. As clusters grow to hundreds of petabytes, this becomes a bottleneck in three ways:
| Limit | Impact |
|---|---|
| RAM capacity | Namespace size capped by NameNode heap (~1 billion files per 200 GB RAM) |
| Single namespace | All tenants share one directory tree — no isolation |
| Throughput | All metadata operations serialized through one JVM |
HDFS Federation solves this by allowing multiple independent NameNodes, each managing its own portion of the namespace.
## How Federation Works

```
┌─────────────────────────────────────────────────────┐
│                   HDFS Federation                   │
│                                                     │
│  NameNode A       NameNode B        NameNode C      │
│  /user /tmp       /data /warehouse  /logs           │
│      │                │                 │           │
│  Block Pool A     Block Pool B      Block Pool C    │
│      │                │                 │           │
└──────┼────────────────┼─────────────────┼───────────┘
       │                │                 │
  ┌────▼────────────────▼─────────────────▼───────┐
  │             Shared DataNode Cluster           │
  │ (DataNodes report all block pools to all NNs) │
  └───────────────────────────────────────────────┘
```
Key concepts:
- Block Pool: Each NameNode owns a block pool — a set of blocks on the DataNodes. DataNodes serve all block pools simultaneously.
- Namespace Volume: A NameNode + its block pool together form a namespace volume. Failure of one does not affect others.
- ViewFs: A client-side mount table that maps paths to the correct NameNode (`hdfs://nn-a/user`, `hdfs://nn-b/data`, etc.).
## Configuration

`hdfs-site.xml` — enable federation:

```xml
<!-- Define two NameNode services -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>

<!-- NameNode for ns1 -->
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1</name>
  <value>namenode1.example.com:9870</value>
</property>

<!-- NameNode for ns2 -->
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns2</name>
  <value>namenode2.example.com:9870</value>
</property>
```
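Once `hdfs-site.xml` is deployed, you can sanity-check what a client will resolve for each nameservice. A quick inspection sketch using `hdfs getconf`, assuming the configuration above is on the client's classpath:

```shell
# List the configured nameservices (should print: ns1,ns2)
hdfs getconf -confKey dfs.nameservices

# Resolve the RPC address clients will use for each nameservice
hdfs getconf -confKey dfs.namenode.rpc-address.ns1
hdfs getconf -confKey dfs.namenode.rpc-address.ns2
```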
`core-site.xml` — ViewFs mount table:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://clusterX</value>
</property>

<!-- Mount /user and /tmp to NameNode 1 (ns1) -->
<property>
  <name>fs.viewfs.mounttable.clusterX.link./user</name>
  <value>hdfs://ns1/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.clusterX.link./tmp</name>
  <value>hdfs://ns1/tmp</value>
</property>

<!-- Mount /data to NameNode 2 (ns2) -->
<property>
  <name>fs.viewfs.mounttable.clusterX.link./data</name>
  <value>hdfs://ns2/data</value>
</property>
```
With ViewFs configured, clients use normal paths (`/user/alice`, `/data/sales`) and the client library transparently routes to the correct NameNode.
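In practice, day-to-day client commands are unchanged under the mount table above. A sketch of how paths fan out, assuming the configs are deployed to the client and both NameNodes are up:

```shell
# Routed to ns1 (namenode1) via the /user mount entry
hdfs dfs -ls /user/alice

# Routed to ns2 (namenode2) via the /data mount entry
hdfs dfs -put sales.csv /data/sales/

# The mount table can also be named explicitly in the URI
hdfs dfs -ls viewfs://clusterX/tmp
```

One caveat: operations that span two mount points, such as a rename from `/user` to `/data`, fail in ViewFs because they would cross NameNodes; clients must copy and delete instead.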
## Format and Start

Format each NameNode separately with a shared cluster ID so DataNodes can register with all of them:

```shell
# Format the first NameNode with an explicit cluster ID (run on namenode1)
hdfs namenode -format -clusterid hadoop-cluster-01

# Format the second NameNode with the same cluster ID (run on namenode2)
hdfs namenode -format -clusterid hadoop-cluster-01 -force

# Start the daemons
hdfs --daemon start namenode   # run on each NameNode host
hdfs --daemon start datanode   # run on each DataNode host
```
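After startup, it is worth confirming that the DataNodes registered with both namespace volumes. One way to check, assuming the `ns1`/`ns2` nameservices configured above (the `-fs` generic option matches the style the balancer commands below use):

```shell
# Each report should list the same DataNodes and the shared cluster ID;
# a missing DataNode in one report means it failed to register with that NN.
hdfs dfsadmin -fs hdfs://ns1 -report | head -n 20
hdfs dfsadmin -fs hdfs://ns2 -report | head -n 20
```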
## Balancer Across Pools

The HDFS Balancer works per namespace. Run it against each NameNode independently:

```shell
hdfs balancer -fs hdfs://ns1
hdfs balancer -fs hdfs://ns2
```
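Since each run balances only one block pool, a small wrapper can iterate over every configured nameservice rather than hard-coding the two above. A sketch, assuming `dfs.nameservices` is set as in the configuration section:

```shell
# Balance every block pool in turn.
# -threshold 10: stop once each DataNode's utilization is within
# 10 percentage points of the cluster-wide mean.
for ns in $(hdfs getconf -confKey dfs.nameservices | tr ',' ' '); do
  hdfs balancer -fs "hdfs://${ns}" -threshold 10
done
```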
## When to Use Federation
- Namespace exceeds ~500 million files on a single NameNode
- Multiple teams/projects need isolated namespaces with separate quotas
- You need to split metadata throughput across multiple NameNodes
- Planning for multi-tenant cluster with chargeback per namespace
For clusters under 300–400 million files, standard High Availability (active/standby NameNode pair) is sufficient and simpler to operate.