
HDFS Federation

The Single NameNode Bottleneck

In a standard HDFS deployment, a single NameNode stores the entire namespace in memory. As clusters grow to hundreds of petabytes, this becomes a bottleneck in three ways:

Limit               Impact
RAM capacity        Namespace size capped by NameNode heap (~1 billion files per 200 GB RAM)
Single namespace    All tenants share one directory tree; no isolation
Throughput          All metadata operations serialized through one JVM
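
As a quick sanity check on the first row: 200 GB of heap spread over 1 billion files is about 200 bytes per file, in line with the common guidance that each file, together with its block entries, costs the NameNode a few hundred bytes of memory.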

HDFS Federation solves this by allowing multiple independent NameNodes, each managing its own portion of the namespace.

How Federation Works

┌────────────────────────────────────────────────────────┐
│                    HDFS Federation                     │
│                                                        │
│   NameNode A          NameNode B          NameNode C   │
│   /user /tmp       /data /warehouse          /logs     │
│        │                  │                   │        │
│  Block Pool A       Block Pool B         Block Pool C  │
│        │                  │                   │        │
└────────┼──────────────────┼───────────────────┼────────┘
         │                  │                   │
┌────────▼──────────────────▼───────────────────▼────────┐
│                Shared DataNode Cluster                 │
│     (DataNodes report all block pools to all NNs)      │
└────────────────────────────────────────────────────────┘

Key concepts:

  • Block Pool: Each NameNode owns a block pool: the set of blocks it manages on the DataNodes. DataNodes serve all block pools simultaneously (see the directory listing after this list).
  • Namespace Volume: A NameNode + its block pool together form a namespace volume. Failure of one does not affect others.
  • ViewFs: A client-side mount table that maps paths to the correct NameNode (hdfs://nn-a/user, hdfs://nn-b/data, etc.).
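
On disk, each block pool appears as its own subdirectory under every DataNode storage directory. A hypothetical listing (the storage path and pool IDs here are illustrative; the BP-<id>-<NameNode IP>-<creation time> naming scheme is real):

# On any DataNode, one BP-* directory per block pool
ls /data/1/dfs/dn/current/
# BP-1073741825-10.0.0.11-1700000000000   <- NameNode A's pool
# BP-1073741826-10.0.0.12-1700000000001   <- NameNode B's pool
# VERSION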

Configuration

hdfs-site.xml — enable federation

<!-- Define two NameNode services -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>

<!-- NameNode for ns1 -->
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1</name>
  <value>namenode1.example.com:9870</value>
</property>

<!-- NameNode for ns2 -->
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns2</name>
  <value>namenode2.example.com:9870</value>
</property>
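
A quick sanity check that clients can see both nameservices is to echo the configuration back with hdfs getconf:

# Should print: ns1,ns2
hdfs getconf -confKey dfs.nameservices

# Resolve the RPC address of each NameNode
hdfs getconf -confKey dfs.namenode.rpc-address.ns1
hdfs getconf -confKey dfs.namenode.rpc-address.ns2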

core-site.xml — ViewFs mount table

<property>
  <name>fs.defaultFS</name>
  <value>viewfs://clusterX</value>
</property>

<!-- Mount /user and /tmp to NameNode 1 -->
<property>
  <name>fs.viewfs.mounttable.clusterX.link./user</name>
  <value>hdfs://ns1/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.clusterX.link./tmp</name>
  <value>hdfs://ns1/tmp</value>
</property>

<!-- Mount /data to NameNode 2 -->
<property>
  <name>fs.viewfs.mounttable.clusterX.link./data</name>
  <value>hdfs://ns2/data</value>
</property>

With ViewFs configured, clients use normal paths (/user/alice, /data/sales) and the client library transparently routes to the correct NameNode.
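
For example, with the mount table above in effect, everyday commands need no changes; the path prefix alone selects the NameNode:

# Served by ns1 (mounted at /user)
hdfs dfs -ls /user/alice

# Served by ns2 (mounted at /data)
hdfs dfs -put sales.csv /data/sales/

# Fully qualified equivalent of the first command
hdfs dfs -ls viewfs://clusterX/user/alice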

Format and Start

Format each NameNode separately, passing the same cluster ID to each so that DataNodes can register with all of them:

# On namenode1: format with an explicit cluster ID
hdfs namenode -format -clusterId hadoop-cluster-01

# On namenode2: format with the same cluster ID
# (-force skips the interactive confirmation prompt)
hdfs namenode -format -clusterId hadoop-cluster-01 -force

# Start the daemons
hdfs --daemon start namenode   # run on each NameNode host
hdfs --daemon start datanode   # run on every DataNode host
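
To confirm that the DataNodes registered with both namespaces, run a report against each NameNode (via the -fs generic option); both reports should list the same set of DataNodes:

hdfs dfsadmin -fs hdfs://ns1 -report
hdfs dfsadmin -fs hdfs://ns2 -report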

Balancer Across Pools

The HDFS Balancer operates per namespace. Run it against each NameNode independently:

hdfs balancer -fs hdfs://ns1
hdfs balancer -fs hdfs://ns2
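
In a federated cluster the balancer also accepts a -policy flag: datanode (the default) balances storage at the DataNode level, while blockpool balances each block pool as well:

# Balance each block pool, not just overall DataNode utilization
hdfs balancer -fs hdfs://ns1 -policy blockpool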

When to Use Federation

  • Namespace exceeds ~500 million files on a single NameNode
  • Multiple teams/projects need isolated namespaces with separate quotas
  • You need to split metadata throughput across multiple NameNodes
  • Planning for multi-tenant cluster with chargeback per namespace

For clusters under 300–400 million files, standard High Availability (active/standby NameNode pair) is sufficient and simpler to operate.