
Security & Kerberos

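By default, Hadoop runs in simple authentication mode, which offers no real security. The client just asserts a username (the local OS user, or whatever the HADOOP_USER_NAME environment variable says), so any user can impersonate any other, including the HDFS superuser. For production clusters, Apache Hadoop supports Kerberos for strong mutual authentication.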

Why Kerberos?

Kerberos provides:

  • Authentication — Verify the identity of users and services
  • Mutual authentication — The cluster verifies the client and the client verifies the cluster
  • Ticket-based — No passwords sent over the wire; short-lived tickets limit exposure

Key Concepts

Term                            Description
KDC (Key Distribution Center)   Central Kerberos server that authenticates principals and issues tickets
Principal                       A unique identity of the form primary/instance@REALM (e.g., hdfs/namenode.example.com@EXAMPLE.COM)
Keytab                          File holding a principal's long-term keys so a service can authenticate without a password; anyone who can read it can act as that principal
TGT (Ticket Granting Ticket)    Obtained at login; presented to the KDC to get service tickets without re-entering a password
Service Ticket                  Grants access to a specific service (e.g., HDFS, YARN)
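A principal has up to three parts: the primary (a user or service name), an optional instance (for services, usually the host's fully qualified domain name), and the realm. The hypothetical `parse_principal` helper below is a minimal sketch that splits a principal into those parts using only POSIX parameter expansion; it assumes well-formed input and is not part of any Kerberos or Hadoop tooling.

```sh
#!/bin/sh
# parse_principal PRINCIPAL
# Hypothetical helper: splits a principal of the form primary[/instance]@REALM
# and prints "primary|instance|realm". The instance is empty for user
# principals such as alice@EXAMPLE.COM. No validation is performed.
parse_principal() {
  local p name primary instance realm
  p=$1
  realm=${p##*@}        # everything after the last '@'
  name=${p%@*}          # everything before the last '@'
  primary=${name%%/*}   # everything before the first '/'
  case $name in
    */*) instance=${name#*/} ;;   # everything after the first '/'
    *)   instance= ;;             # user principal: no instance
  esac
  printf '%s|%s|%s\n' "$primary" "$instance" "$realm"
}

parse_principal hdfs/namenode.example.com@EXAMPLE.COM   # hdfs|namenode.example.com|EXAMPLE.COM
parse_principal alice@EXAMPLE.COM                       # alice||EXAMPLE.COM
```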

Enabling Kerberos in Hadoop

1. Set up a KDC

Install MIT Kerberos or use an existing Active Directory / FreeIPA server.

2. Create Principals

# On the KDC
kadmin.local
kadmin.local: addprinc -randkey hdfs/namenode.example.com@EXAMPLE.COM
kadmin.local: addprinc -randkey yarn/resourcemanager.example.com@EXAMPLE.COM
kadmin.local: addprinc -randkey HTTP/namenode.example.com@EXAMPLE.COM
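The session above creates principals for single hosts. Because the instance part of a service principal is the host's FQDN, every host that runs a daemon (each DataNode, for example) needs its own principal. kadmin.local's `-q` option runs a single query non-interactively, so the commands can be generated per host. The `print_addprinc_cmds` function below is a hypothetical sketch that only prints those commands for review; the worker hostnames are placeholders.

```sh
#!/bin/sh
# print_addprinc_cmds REALM SERVICE HOST...
# Hypothetical helper: prints one non-interactive kadmin.local command per
# host so the list can be reviewed before it is run on the KDC.
# It only prints; nothing is executed.
print_addprinc_cmds() {
  local realm service host
  realm=$1
  service=$2
  shift 2
  for host in "$@"; do
    printf 'kadmin.local -q "addprinc -randkey %s/%s@%s"\n' "$service" "$host" "$realm"
  done
}

# dn1.example.com and dn2.example.com are placeholder worker hostnames.
print_addprinc_cmds EXAMPLE.COM hdfs dn1.example.com dn2.example.com
```

Printing instead of executing keeps a human in the loop; once the output looks right it could be piped to `sh` on the KDC.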

3. Export Keytabs

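# Still inside kadmin.local. ktadd generates fresh random keys (a new key version),
# which invalidates keytabs exported earlier for the same principal; kadmin.local accepts -norandkey to avoid that.
kadmin.local: ktadd -k /etc/security/keytabs/hdfs.keytab hdfs/namenode.example.com@EXAMPLE.COM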
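A keytab works like a password file, so it should be readable only by the service account that uses it. The `check_keytab_mode` function below is a hypothetical sketch of a pre-flight check that accepts only owner-only modes (400 or 600). It assumes GNU coreutils `stat` as found on Linux; BSD and macOS `stat` take different flags. The `hdfs:hadoop` ownership in the usage comments is an assumption, not something this page specifies.

```sh
#!/bin/sh
# check_keytab_mode FILE
# Succeeds only if FILE is accessible to its owner alone (mode 400 or 600).
# Returns 1 for an insecure mode and 2 if FILE cannot be stat'ed.
# Assumes GNU coreutils stat (Linux).
check_keytab_mode() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2   # e.g. "400"
  case $mode in
    400|600) return 0 ;;
    *) echo "insecure mode $mode on $1" >&2; return 1 ;;
  esac
}

# Typical use on the NameNode host (owner and group are assumptions):
#   chown hdfs:hadoop /etc/security/keytabs/hdfs.keytab
#   chmod 400 /etc/security/keytabs/hdfs.keytab
#   klist -kt /etc/security/keytabs/hdfs.keytab   # list stored principals and key versions
#   check_keytab_mode /etc/security/keytabs/hdfs.keytab && echo "keytab ok"
```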

4. Configure core-site.xml

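<!-- Use Kerberos instead of the default "simple" authentication -->
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<!-- Enable service-level authorization (access lists in hadoop-policy.xml) -->
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>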

5. Configure hdfs-site.xml

<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/security/keytabs/hdfs.keytab</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
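In hdfs-site.xml, `_HOST` is a placeholder that Hadoop replaces at runtime with the local host's fully qualified domain name, so the same file can be deployed to every node while each daemon still uses its own principal. Hadoop's implementation also lowercases the hostname, so a principal created with an uppercase hostname will not match. The `expand_host_principal` function below is a hypothetical sketch of that substitution, handy for checking which principal a given host will expect; it is not Hadoop's own code.

```sh
#!/bin/sh
# expand_host_principal PATTERN FQDN
# Sketch of the substitution Hadoop performs on principals such as
# hdfs/_HOST@EXAMPLE.COM: when the instance is _HOST, it is replaced by
# the host's fully qualified domain name in lowercase. The realm is kept.
expand_host_principal() {
  local pattern fqdn
  pattern=$1
  fqdn=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case $pattern in
    */_HOST@*)
      # primary before "/_HOST@", lowercase FQDN, realm after "/_HOST@"
      printf '%s/%s@%s\n' "${pattern%%/_HOST@*}" "$fqdn" "${pattern#*/_HOST@}" ;;
    *)
      printf '%s\n' "$pattern" ;;   # no _HOST placeholder: unchanged
  esac
}

expand_host_principal 'hdfs/_HOST@EXAMPLE.COM' NameNode.Example.COM
# prints hdfs/namenode.example.com@EXAMPLE.COM
# On a real node the FQDN would usually come from: hostname -f
```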

User Authentication

Users authenticate with kinit before running Hadoop commands:

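# Interactive login: prompts for alice's password
kinit alice@EXAMPLE.COM

# Non-interactive login (scripts, cron jobs): read the key from a keytab instead
kinit -kt alice.keytab alice@EXAMPLE.COM

# Verify ticket
klist

# Run HDFS commands; the cached ticket is used automatically, with no password prompt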
hdfs dfs -ls /user/alice
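Batch jobs often need a ticket without knowing whether one is already cached. MIT `klist -s` prints nothing and exits non-zero when the credentials cache is missing or expired, which makes a simple guard possible. The `ensure_ticket` function below is a minimal sketch of that pattern; the `KLIST`/`KINIT` overrides are a hypothetical convenience so the logic can be tried with stub commands, not a Kerberos feature, and the keytab path in the usage comment is a placeholder.

```sh
#!/bin/sh
# ensure_ticket PRINCIPAL KEYTAB
# Runs kinit from the keytab only when there is no usable ticket cache.
ensure_ticket() {
  # Left unquoted on purpose so an override such as KLIST=true works.
  if ${KLIST:-klist} -s; then
    echo "reusing existing ticket"
  else
    ${KINIT:-kinit} -kt "$2" "$1" || return 1
    echo "obtained new ticket for $1"
  fi
}

# Example, e.g. at the top of a cron job (hypothetical keytab path):
#   ensure_ticket alice@EXAMPLE.COM /home/alice/alice.keytab && hdfs dfs -ls /user/alice
```

Note that `klist -s` succeeds for a ticket that is about to expire, so a long-running job may still need to renew or re-run kinit part way through.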

HDFS Permissions

HDFS has a POSIX-like permission model:

# Set directory ownership
hdfs dfs -chown alice:analytics /user/alice

# Set permissions (rwxr-x---)
hdfs dfs -chmod 750 /user/alice

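# Grant bob access with ACL entries (requires dfs.namenode.acls.enabled=true on the NameNode).
# Because /user/alice is 750, bob also needs execute (traverse) on it to reach shared;
# r-x on shared lets him list it if it is a directory.
hdfs dfs -setfacl -m user:bob:--x /user/alice
hdfs dfs -setfacl -m user:bob:r-x /user/alice/shared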
hdfs dfs -getfacl /user/alice/shared
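The mode given to `-chmod` is three octal digits for owner, group and other, each the sum of read (4), write (2) and execute (1); 750 therefore means rwx for alice, r-x for the analytics group and nothing for everyone else. Once named ACL entries exist, HDFS also applies an ACL mask: the permission a named user or group (or the owning group) actually receives is the entry ANDed with the mask, which `hdfs dfs -getfacl` flags as `#effective:` when the two differ. The hypothetical helpers below sketch both calculations in POSIX sh; they are illustrations, not HDFS code.

```sh
#!/bin/sh
# perm_digit DIGIT: one octal digit (0-7) -> "rwx"-style triple, no newline.
perm_digit() {
  local d="$1" r=- w=- x=-
  if [ $((d & 4)) -ne 0 ]; then r=r; fi
  if [ $((d & 2)) -ne 0 ]; then w=w; fi
  if [ $((d & 1)) -ne 0 ]; then x=x; fi
  printf '%s%s%s' "$r" "$w" "$x"
}

# octal_to_rwx MODE: three-digit octal mode -> owner/group/other string.
octal_to_rwx() {
  case $1 in
    [0-7][0-7][0-7]) ;;
    *) return 1 ;;                   # reject anything but three octal digits
  esac
  local owner group other
  owner=${1%??}                      # first digit
  group=${1#?}; group=${group%?}     # middle digit
  other=${1#??}                      # last digit
  printf '%s%s%s\n' "$(perm_digit "$owner")" "$(perm_digit "$group")" "$(perm_digit "$other")"
}

# rwx_to_digit TRIPLE: "r-x" -> 5
rwx_to_digit() {
  local s="$1" n=0
  case $s in r??) n=$((n + 4)) ;; esac
  case $s in ?w?) n=$((n + 2)) ;; esac
  case $s in ??x) n=$((n + 1)) ;; esac
  echo "$n"
}

# effective_perm ENTRY MASK: bits of an ACL entry that survive the mask.
effective_perm() {
  perm_digit $(( $(rwx_to_digit "$1") & $(rwx_to_digit "$2") ))
  echo
}

octal_to_rwx 750          # rwxr-x---
effective_perm rwx r--    # r--
```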

Apache Ranger

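For centralized, fine-grained authorization policies (including tag-based policies), audit logging, and a web-based policy editor across HDFS, YARN, Hive, HBase, and other components, consider Apache Ranger, a widely used security governance layer for the Hadoop ecosystem.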