Security & Kerberos
By default, Hadoop runs in simple authentication mode, which offers no real security: the cluster trusts whatever user name the client reports, so any user can impersonate any other. For production clusters, Apache Hadoop supports Kerberos for strong mutual authentication.
Why Kerberos?
Kerberos provides:
- Authentication — Verify the identity of users and services
- Mutual authentication — The cluster verifies the client and the client verifies the cluster
- Ticket-based — No passwords sent over the wire; short-lived tickets limit exposure
Key Concepts
| Term | Description |
|---|---|
| KDC (Key Distribution Center) | Central Kerberos server that issues tickets |
| Principal | A unique identity (e.g., hdfs/namenode.example.com@EXAMPLE.COM) |
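| Keytab | File containing a principal's long-term keys (used by services to authenticate without a password prompt); the keys are stored unencrypted, so the file must be readable only by the service account |
| TGT | Ticket Granting Ticket: issued by the KDC after the initial login and presented back to it to request service tickets |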
| Service Ticket | Grants access to a specific service (e.g., HDFS, YARN) |
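The principal format in the table, `primary/instance@REALM`, can be illustrated with a small shell sketch. `parse_principal` is a hypothetical helper written for this example, not part of Kerberos or Hadoop, and it assumes the realm is always present.

```shell
# parse_principal (hypothetical helper): split primary[/instance]@REALM
# into its parts. Assumes the input always contains an '@REALM' suffix.
parse_principal() {
  local principal="$1"
  local realm="${principal##*@}"   # text after the last '@'
  local name="${principal%@*}"     # text before the last '@'
  local primary="${name%%/*}"      # text before the first '/'
  local instance=""
  case "$name" in
    */*) instance="${name#*/}" ;;  # text after the first '/', if any
  esac
  printf 'primary=%s instance=%s realm=%s\n' "$primary" "$instance" "$realm"
}
```

Service principals such as `hdfs/namenode.example.com@EXAMPLE.COM` carry a host name as the instance, while user principals such as `alice@EXAMPLE.COM` usually have an empty instance.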
Enabling Kerberos in Hadoop
1. Set up a KDC
Install MIT Kerberos or use an existing Active Directory / FreeIPA server.
2. Create Principals
```shell
# On the KDC
kadmin.local
kadmin.local: addprinc -randkey hdfs/namenode.example.com@EXAMPLE.COM
kadmin.local: addprinc -randkey yarn/resourcemanager.example.com@EXAMPLE.COM
kadmin.local: addprinc -randkey HTTP/namenode.example.com@EXAMPLE.COM
```
3. Export Keytabs
```shell
kadmin.local: ktadd -k /etc/security/keytabs/hdfs.keytab hdfs/namenode.example.com@EXAMPLE.COM
```
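Anyone who can read a keytab can authenticate as its principal, so it is worth checking the file's permissions after export. The following is a minimal sketch; `keytab_is_private` is a hypothetical helper, and it assumes a POSIX `ls` whose long listing starts with the usual ten-character mode string.

```shell
# keytab_is_private (hypothetical helper): succeed only if the file exists
# and grants no permissions to group or other (for example mode 400 or 600).
keytab_is_private() {
  local keytab="$1"
  [ -f "$keytab" ] || return 1
  # ls -ld prints a mode string such as "-r--------"; characters 5-10
  # hold the group and other permission bits.
  local group_other
  group_other="$(ls -ld "$keytab" | cut -c5-10)"
  [ "$group_other" = "------" ]
}
```

A typical use would be `keytab_is_private /etc/security/keytabs/hdfs.keytab || echo "keytab is too permissive"`. The entries stored in a keytab can be listed with `klist -kt /etc/security/keytabs/hdfs.keytab`.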
4. Configure core-site.xml
```xml
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```
5. Configure hdfs-site.xml
```xml
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytabs/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
```
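Hadoop replaces the `_HOST` token in a principal setting with the fully qualified host name of the machine the daemon runs on, so one configuration file can be shared across hosts. The sketch below only imitates that substitution for illustration. `expand_host` is a hypothetical helper, and lowercasing the host name is an assumption made here so the result matches principals registered in lowercase.

```shell
# expand_host (hypothetical helper): replace the first _HOST token in a
# principal pattern with the given host name, lowercased (an assumption).
expand_host() {
  local pattern="$1"
  local host
  host="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  case "$pattern" in
    *_HOST*)
      # Keep the text before and after the token, insert the host between.
      printf '%s%s%s\n' "${pattern%%_HOST*}" "$host" "${pattern#*_HOST}"
      ;;
    *)
      # No token present: return the pattern unchanged.
      printf '%s\n' "$pattern"
      ;;
  esac
}
```

For example, `expand_host 'hdfs/_HOST@EXAMPLE.COM' "$(hostname -f)"` would print the principal a NameNode on the current host is expected to use, which should match one of the principals created on the KDC.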
User Authentication
Users authenticate with kinit before running Hadoop commands:
```shell
# Obtain a ticket interactively (prompts for the password)
kinit alice@EXAMPLE.COM
# Or obtain a ticket non-interactively from a keytab
kinit -kt alice.keytab alice@EXAMPLE.COM
# Verify ticket
klist
# Run HDFS commands; authentication happens automatically
hdfs dfs -ls /user/alice
```
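In scripts, it can help to fail early with a clear message when no valid ticket is present, rather than letting the Hadoop command fail with an authentication error. This is a minimal sketch: `run_with_ticket` is a hypothetical wrapper, and it relies on `klist -s`, which in MIT Kerberos prints nothing and reports through its exit status whether the credential cache holds unexpired credentials.

```shell
# run_with_ticket (hypothetical wrapper): run the given command only if
# the credential cache currently holds a valid, unexpired ticket.
run_with_ticket() {
  # klist -s is silent; a non-zero status means no usable ticket.
  if ! klist -s; then
    echo "No valid Kerberos ticket; run kinit first." >&2
    return 1
  fi
  "$@"
}
```

It would be used as `run_with_ticket hdfs dfs -ls /user/alice`. The wrapper deliberately does not call `kinit` itself, so it never blocks on a password prompt in unattended jobs.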
HDFS Permissions
HDFS has a POSIX-like permission model:
```shell
# Set directory ownership
hdfs dfs -chown alice:analytics /user/alice
# Set permissions (rwxr-x---)
hdfs dfs -chmod 750 /user/alice
# Grant bob read access with an ACL entry
# (ACL support itself is controlled by dfs.namenode.acls.enabled)
hdfs dfs -setfacl -m user:bob:r-- /user/alice/shared
# Inspect the resulting ACL
hdfs dfs -getfacl /user/alice/shared
```
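The mapping between an octal mode such as `750` and its symbolic form `rwxr-x---` follows the usual POSIX rule: each digit encodes read (4), write (2), and execute (1) for owner, group, and other in turn. The sketch below spells that out. `mode_to_symbolic` is a hypothetical helper for illustration and does not handle the sticky bit or other special bits.

```shell
# mode_to_symbolic (hypothetical helper): render an octal mode such as 750
# as its rwx string, one digit at a time (owner, group, other).
mode_to_symbolic() {
  local mode="$1" out="" digit
  while [ -n "$mode" ]; do
    digit="${mode%"${mode#?}"}"   # first remaining character
    mode="${mode#?}"              # drop it from the input
    case "$digit" in
      0) out="${out}---" ;;
      1) out="${out}--x" ;;
      2) out="${out}-w-" ;;
      3) out="${out}-wx" ;;
      4) out="${out}r--" ;;
      5) out="${out}r-x" ;;
      6) out="${out}rw-" ;;
      7) out="${out}rwx" ;;
      *) echo "invalid octal digit: $digit" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}
```

With mode `750`, the owner has full access, members of the group can read and traverse the directory, and everyone else has no access.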
Apache Ranger
For attribute-based access control, centralized auditing, and a GUI policy editor, consider Apache Ranger, a widely used security governance layer for the Hadoop ecosystem.