<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://hadoop.so/blog</id>
    <title>Hadoop Blog</title>
    <updated>2026-04-28T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://hadoop.so/blog"/>
    <subtitle>Hadoop Blog</subtitle>
    <icon>https://hadoop.so/img/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA[What's New in Apache Hadoop 3]]></title>
        <id>https://hadoop.so/blog/hadoop-3-whats-new</id>
        <link href="https://hadoop.so/blog/hadoop-3-whats-new"/>
        <updated>2026-04-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Apache Hadoop 3.x was a landmark release that brought significant improvements to performance, reliability, and scalability. Here's a quick tour of the most important changes.]]></summary>
        <content type="html"><![CDATA[<p>Apache Hadoop 3.x was a landmark release that brought significant improvements to performance, reliability, and scalability. Here's a quick tour of the most important changes.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="erasure-coding--slash-storage-costs-by-50">Erasure Coding — Slash Storage Costs by 50%<a href="https://hadoop.so/blog/hadoop-3-whats-new#erasure-coding--slash-storage-costs-by-50" class="hash-link" aria-label="Direct link to Erasure Coding — Slash Storage Costs by 50%" title="Direct link to Erasure Coding — Slash Storage Costs by 50%" translate="no">​</a></h2>
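<p>The biggest storage improvement in Hadoop 3 is <strong>Erasure Coding (EC)</strong> for HDFS. Previously, the default 3x replication carried 200% storage overhead: two extra copies of every block. With EC using a Reed-Solomon policy such as RS-6-3 (six data cells plus three parity cells), a file survives the loss of any three cells, against two lost replicas under 3x replication, with just <strong>50% overhead</strong>. That halves raw storage for cold or infrequently accessed data, at the cost of extra CPU and network traffic when reading or reconstructing blocks.</p>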
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Enable erasure coding on a directory</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs ec -setPolicy -policy RS-6-3-1024k -path /data/cold-storage</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs ec -getPolicy -path /data/cold-storage</span><br></span></code></pre></div></div>
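<p>The overhead figures above follow from simple arithmetic, sketched here in shell. The function names and the 600 TB dataset size are illustrative assumptions, not part of any Hadoop tool.</p>

```shell
# Storage overhead as a percentage of logical data size.
# overhead_pct DATA_UNITS REDUNDANT_UNITS
overhead_pct() {
  local data=$1 redundant=$2
  echo $(( redundant * 100 / data ))
}

# Raw units consumed to store LOGICAL units of data.
# raw_units LOGICAL DATA_UNITS REDUNDANT_UNITS
raw_units() {
  local logical=$1 data=$2 redundant=$3
  echo $(( logical * (data + redundant) / data ))
}

# 3x replication: 1 original + 2 extra copies. RS-6-3: 6 data + 3 parity cells.
echo "3x replication overhead: $(overhead_pct 1 2)%"
echo "RS-6-3 overhead:         $(overhead_pct 6 3)%"
# Hypothetical 600 TB of logical data.
echo "Raw TB for 600 TB, 3x:     $(raw_units 600 1 2)"
echo "Raw TB for 600 TB, RS-6-3: $(raw_units 600 6 3)"
```

<p>For the hypothetical 600 TB dataset, raw usage drops from 1800 TB to 900 TB, which is where the 50% saving in the heading comes from.</p>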
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="support-for-more-than-2-namenodes-in-ha">Support for More Than 2 NameNodes in HA<a href="https://hadoop.so/blog/hadoop-3-whats-new#support-for-more-than-2-namenodes-in-ha" class="hash-link" aria-label="Direct link to Support for More Than 2 NameNodes in HA" title="Direct link to Support for More Than 2 NameNodes in HA" translate="no">​</a></h2>
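<p>Hadoop 2 supported exactly 2 NameNodes in HA mode: one active and one standby. Hadoop 3 supports <strong>multiple standby NameNodes</strong>; the Apache documentation recommends 3 NameNodes and suggests not exceeding 5 because of coordination overhead. A cluster with two standbys still has a failover target after losing one NameNode, which makes large-scale deployments more resilient.</p>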
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="yarn-timeline-service-v2">YARN Timeline Service v2<a href="https://hadoop.so/blog/hadoop-3-whats-new#yarn-timeline-service-v2" class="hash-link" aria-label="Direct link to YARN Timeline Service v2" title="Direct link to YARN Timeline Service v2" translate="no">​</a></h2>
<p>The redesigned YARN Timeline Service v2 offers better scalability using HBase as its backend, replacing the single-writer bottleneck of v1. This makes job history and metrics retrieval much faster on large clusters.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="intra-datanode-balancer">Intra-DataNode Balancer<a href="https://hadoop.so/blog/hadoop-3-whats-new#intra-datanode-balancer" class="hash-link" aria-label="Direct link to Intra-DataNode Balancer" title="Direct link to Intra-DataNode Balancer" translate="no">​</a></h2>
<p>A new <strong>intra-DataNode disk balancer</strong> ensures that data is spread evenly across all disks on a single DataNode, preventing single-disk hotspots that could degrade performance.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Generate a plan; the command reports where it wrote the plan file</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs diskbalancer -plan datanode.example.com</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Execute the plan, passing the path to that plan file</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs diskbalancer -execute datanode.example.com.plan.json</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="java-8-minimum--dropped-support-for-older-versions">Java 8 Minimum + Dropped Support for Older Versions<a href="https://hadoop.so/blog/hadoop-3-whats-new#java-8-minimum--dropped-support-for-older-versions" class="hash-link" aria-label="Direct link to Java 8 Minimum + Dropped Support for Older Versions" title="Direct link to Java 8 Minimum + Dropped Support for Older Versions" translate="no">​</a></h2>
<p>Hadoop 3 dropped support for Java 7 and requires <strong>Java 8 or higher</strong>, allowing the codebase to take advantage of modern JVM features.</p>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="Hadoop" term="Hadoop"/>
        <category label="Release" term="Release"/>
        <category label="HDFS" term="HDFS"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[HDFS vs Amazon S3: Choosing Your Hadoop Storage]]></title>
        <id>https://hadoop.so/blog/hdfs-vs-s3-comparison</id>
        <link href="https://hadoop.so/blog/hdfs-vs-s3-comparison"/>
        <updated>2026-04-27T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[As organizations move workloads to the cloud, one of the most common questions is: should I use HDFS or Amazon S3 as my Hadoop storage layer? Both are valid choices, but they have very different performance profiles and operational characteristics.]]></summary>
        <content type="html"><![CDATA[<p>As organizations move workloads to the cloud, one of the most common questions is: <strong>should I use HDFS or Amazon S3 as my Hadoop storage layer?</strong> Both are valid choices, but they have very different performance profiles and operational characteristics.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="architecture-differences">Architecture Differences<a href="https://hadoop.so/blog/hdfs-vs-s3-comparison#architecture-differences" class="hash-link" aria-label="Direct link to Architecture Differences" title="Direct link to Architecture Differences" translate="no">​</a></h2>
<p><strong>HDFS</strong> co-locates storage and compute. DataNodes store data on local disks, and the scheduler tries to place MapReduce and Spark tasks on a node that already holds the data they read, so most reads avoid the network.</p>
<p><strong>Amazon S3</strong> separates storage from compute. Your cluster nodes have no local data; all reads and writes traverse the network. This means data locality is impossible, but it also means your compute cluster can be resized independently of your storage.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="performance">Performance<a href="https://hadoop.so/blog/hdfs-vs-s3-comparison#performance" class="hash-link" aria-label="Direct link to Performance" title="Direct link to Performance" translate="no">​</a></h2>
<table><thead><tr><th>Scenario</th><th>HDFS</th><th>S3</th></tr></thead><tbody><tr><td>Sequential large file reads</td><td>✅ Data locality</td><td>❌ Network I/O</td></tr><tr><td>Random small file access</td><td>✅ Local disk</td><td>❌ High latency per request</td></tr><tr><td>Cluster scale-up/down</td><td>❌ Data rebalancing needed</td><td>✅ Instant, no rebalancing</td></tr><tr><td>Long-term cold storage cost</td><td>❌ Always-on nodes</td><td>✅ Pay only for storage</td></tr><tr><td>Access from multiple clusters (Spark, Presto)</td><td>❌ Tied to one running cluster</td><td>✅ Any cluster can read</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="latency">Latency<a href="https://hadoop.so/blog/hdfs-vs-s3-comparison#latency" class="hash-link" aria-label="Direct link to Latency" title="Direct link to Latency" translate="no">​</a></h2>
<p>A read from a local DataNode typically starts returning data within roughly <strong>10-40ms</strong>. An S3 API call has a base latency of roughly <strong>100-200ms</strong> before any data is transferred. Large sequential reads amortize that fixed cost, but for workloads with many small files it dominates.</p>
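<p>To see why small files hurt, here is a rough back-of-the-envelope sketch in shell. The function names, the 10,000-file count, the 50 parallel readers, and the 25 ms and 150 ms midpoints are assumptions picked for illustration; the result is a lower bound from request latency alone and ignores transfer time.</p>

```shell
# Time spent on per-request latency alone when FILES objects are fetched
# one after another, in whole seconds.
# latency_floor_s FILES LATENCY_MS
latency_floor_s() {
  local files=$1 latency_ms=$2
  echo $(( files * latency_ms / 1000 ))
}

# Same floor when requests are spread evenly over PARALLEL concurrent readers.
# parallel_floor_s FILES LATENCY_MS PARALLEL
parallel_floor_s() {
  local files=$1 latency_ms=$2 parallel=$3
  echo $(( files * latency_ms / 1000 / parallel ))
}

echo "S3, 10000 files at 150 ms, serial:     $(latency_floor_s 10000 150) s"
echo "S3, 10000 files at 150 ms, 50 readers: $(parallel_floor_s 10000 150 50) s"
echo "HDFS, 10000 files at 25 ms, serial:    $(latency_floor_s 10000 25) s"
```

<p>Parallelism hides much of the gap, which is one reason S3-backed jobs tend to favor fewer, larger files and many concurrent readers.</p>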
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="cost-model">Cost Model<a href="https://hadoop.so/blog/hdfs-vs-s3-comparison#cost-model" class="hash-link" aria-label="Direct link to Cost Model" title="Direct link to Cost Model" translate="no">​</a></h2>
<p>With HDFS, you pay for the full node (CPU + memory + disk) even when the cluster is idle. With S3, you pay only for stored bytes when not running jobs, making it dramatically cheaper for <strong>intermittent</strong> or <strong>serverless</strong> architectures.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-hybrid-pattern">The Hybrid Pattern<a href="https://hadoop.so/blog/hdfs-vs-s3-comparison#the-hybrid-pattern" class="hash-link" aria-label="Direct link to The Hybrid Pattern" title="Direct link to The Hybrid Pattern" translate="no">​</a></h2>
<p>Many organizations adopt a hybrid approach:</p>
<ul>
<li class=""><strong>Hot data</strong> → HDFS (low latency, high throughput)</li>
<li class=""><strong>Cold/archival data</strong> → S3 (cheap, durable, accessible)</li>
<li class=""><strong>Hadoop on S3</strong> → Use the <code>s3a://</code> connector with Hadoop 3's S3A committers so job output is committed correctly; on Amazon EMR, EMRFS (<code>s3://</code>) fills the same role</li>
</ul>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Access S3 from Hadoop</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfs -ls s3a://my-bucket/data/</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop jar myapp.jar -input s3a://my-bucket/input/ -output s3a://my-bucket/output/</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="recommendation">Recommendation<a href="https://hadoop.so/blog/hdfs-vs-s3-comparison#recommendation" class="hash-link" aria-label="Direct link to Recommendation" title="Direct link to Recommendation" translate="no">​</a></h2>
<ul>
<li class=""><strong>On-premises cluster, performance-critical batch jobs</strong> → HDFS</li>
<li class=""><strong>Cloud-native, variable workloads, or data lake</strong> → S3 with a cloud Hadoop distribution (EMR, Dataproc, HDInsight)</li>
<li class=""><strong>Migrating to cloud</strong> → Start with S3 for new data, migrate HDFS data gradually</li>
</ul>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="HDFS" term="HDFS"/>
        <category label="Cloud" term="Cloud"/>
        <category label="Storage" term="Storage"/>
        <category label="Amazon S3" term="Amazon S3"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Apache Spark vs MapReduce: When to Use Which]]></title>
        <id>https://hadoop.so/blog/apache-spark-vs-mapreduce</id>
        <link href="https://hadoop.so/blog/apache-spark-vs-mapreduce"/>
        <updated>2026-04-26T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Apache Spark has largely replaced MapReduce for new Hadoop workloads. But MapReduce is not dead — understanding when each is appropriate will help you build more efficient data pipelines.]]></summary>
        <content type="html"><![CDATA[<p>Apache Spark has largely replaced MapReduce for new Hadoop workloads. But MapReduce is not dead — understanding when each is appropriate will help you build more efficient data pipelines.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-core-difference-memory-vs-disk">The Core Difference: Memory vs Disk<a href="https://hadoop.so/blog/apache-spark-vs-mapreduce#the-core-difference-memory-vs-disk" class="hash-link" aria-label="Direct link to The Core Difference: Memory vs Disk" title="Direct link to The Core Difference: Memory vs Disk" translate="no">​</a></h2>
<p>MapReduce writes map output to <strong>local disk</strong> and each job's final output to <strong>HDFS</strong>, so a multi-job pipeline goes through disk between every stage. Spark keeps intermediate data <strong>in memory</strong> (with spill to disk when needed). For iterative algorithms that process the same data repeatedly, this can make Spark one to two orders of magnitude faster.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="performance-benchmark-example">Performance Benchmark Example<a href="https://hadoop.so/blog/apache-spark-vs-mapreduce#performance-benchmark-example" class="hash-link" aria-label="Direct link to Performance Benchmark Example" title="Direct link to Performance Benchmark Example" translate="no">​</a></h2>
<p>For a machine learning algorithm that iterates over a dataset 10 times:</p>
<table><thead><tr><th>Approach</th><th>I/O Pattern</th><th>Relative Speed</th></tr></thead><tbody><tr><td>MapReduce</td><td>10 HDFS reads + 10 HDFS writes</td><td>1x (baseline)</td></tr><tr><td>Spark (with cache)</td><td>1 HDFS read, rest in memory</td><td>~10-100x faster</td></tr></tbody></table>
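<p>The I/O pattern in the table can be put into numbers with a small shell sketch. The function names and the 100 GB dataset are assumptions; the model also assumes each MapReduce iteration writes an output as large as its input, and that Spark can cache the whole dataset in memory.</p>

```shell
# GB moved through HDFS by a chain of MapReduce jobs, one job per iteration.
# Assumes each iteration reads the full dataset and writes an output of the same size.
# mapreduce_hdfs_gb DATASET_GB ITERATIONS
mapreduce_hdfs_gb() {
  local dataset_gb=$1 iterations=$2
  echo $(( dataset_gb * iterations * 2 ))
}

# GB read from HDFS by Spark when the dataset is cached after the first pass.
# Assumes the whole dataset fits in executor memory.
# spark_cached_hdfs_gb DATASET_GB
spark_cached_hdfs_gb() {
  local dataset_gb=$1
  echo "$dataset_gb"
}

echo "MapReduce: $(mapreduce_hdfs_gb 100 10) GB through HDFS"
echo "Spark:     $(spark_cached_hdfs_gb 100) GB through HDFS"
```

<p>This counts bytes moved, not wall-clock time; the actual speedup depends on how much of each iteration is I/O-bound.</p>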
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="when-mapreduce-wins">When MapReduce Wins<a href="https://hadoop.so/blog/apache-spark-vs-mapreduce#when-mapreduce-wins" class="hash-link" aria-label="Direct link to When MapReduce Wins" title="Direct link to When MapReduce Wins" translate="no">​</a></h2>
<p>Despite Spark's performance advantage, MapReduce still makes sense when:</p>
<ol>
<li class=""><strong>Memory is severely constrained</strong> — MapReduce handles datasets larger than cluster RAM by spilling everything to disk</li>
<li class=""><strong>Long-running, write-once batch jobs</strong> — the disk durability of MapReduce is a feature, not a bug</li>
<li class=""><strong>Legacy compatibility</strong> — existing MapReduce jobs in production don't need to be rewritten if they're working fine</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="when-spark-wins">When Spark Wins<a href="https://hadoop.so/blog/apache-spark-vs-mapreduce#when-spark-wins" class="hash-link" aria-label="Direct link to When Spark Wins" title="Direct link to When Spark Wins" translate="no">​</a></h2>
<p>Use Spark when:</p>
<ol>
<li class=""><strong>Iterative ML training</strong> — Spark MLlib and graph algorithms benefit enormously from in-memory caching</li>
<li class=""><strong>Interactive analytics</strong> — Spark's REPL (PySpark, spark-shell) supports exploratory data analysis</li>
<li class=""><strong>Streaming</strong> — Spark Structured Streaming provides unified batch/streaming APIs</li>
<li class=""><strong>SQL workloads</strong> — Spark SQL with DataFrames is faster and more expressive than Hive on MapReduce</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="running-spark-on-yarn">Running Spark on YARN<a href="https://hadoop.so/blog/apache-spark-vs-mapreduce#running-spark-on-yarn" class="hash-link" aria-label="Direct link to Running Spark on YARN" title="Direct link to Running Spark on YARN" translate="no">​</a></h2>
<p>Spark integrates natively with YARN, making it a first-class Hadoop citizen:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Submit a Spark job to YARN</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spark-submit \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --master yarn \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --deploy-mode cluster \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --num-executors 10 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --executor-memory 4g \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --executor-cores 2 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  myapp.py</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Launch PySpark shell on YARN</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">pyspark --master yarn --num-executors 5 --executor-memory 2g</span><br></span></code></pre></div></div>
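<p>Before submitting, it helps to estimate what the flags above ask YARN for. The sketch below assumes Spark's default executor memory overhead (the larger of 384 MiB or 10% of executor memory); the function names are made up for this example. It leaves out the driver and any rounding YARN applies to container sizes.</p>

```shell
# Approximate YARN memory for one executor container, in MiB.
# Assumes Spark's default overhead: the larger of 384 MiB or 10% of executor memory.
# executor_container_mib EXECUTOR_MEMORY_MIB
executor_container_mib() {
  local heap_mib=$1
  local overhead=$(( heap_mib / 10 ))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo $(( heap_mib + overhead ))
}

# Memory for N executors (driver / ApplicationMaster not included).
# total_executor_mib EXECUTORS EXECUTOR_MEMORY_MIB
total_executor_mib() {
  local executors=$1 heap_mib=$2
  echo $(( executors * $(executor_container_mib "$heap_mib") ))
}

# Cores for N executors.
# total_executor_cores EXECUTORS CORES_PER_EXECUTOR
total_executor_cores() {
  local executors=$1 cores=$2
  echo $(( executors * cores ))
}

# The spark-submit example: 10 executors, 4g (4096 MiB) each, 2 cores each.
echo "Per executor:  $(executor_container_mib 4096) MiB"
echo "All executors: $(total_executor_mib 10 4096) MiB, $(total_executor_cores 10 2) cores"
```

<p>If the total exceeds what your YARN queue can grant, the job will start with fewer executors than requested or wait for capacity.</p>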
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="recommendation">Recommendation<a href="https://hadoop.so/blog/apache-spark-vs-mapreduce#recommendation" class="hash-link" aria-label="Direct link to Recommendation" title="Direct link to Recommendation" translate="no">​</a></h2>
<p>For any <strong>new</strong> Hadoop workload, start with Spark. Only fall back to MapReduce if you have specific memory constraints or need to maintain a legacy codebase.</p>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="Apache Spark" term="Apache Spark"/>
        <category label="MapReduce" term="MapReduce"/>
        <category label="Performance" term="Performance"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Welcome to hadoop.so]]></title>
        <id>https://hadoop.so/blog/welcome-to-hadoop-so</id>
        <link href="https://hadoop.so/blog/welcome-to-hadoop-so"/>
        <updated>2026-04-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Welcome to hadoop.so — your comprehensive resource for learning and mastering Apache Hadoop and the broader big data ecosystem.]]></summary>
        <content type="html"><![CDATA[<p>Welcome to <strong>hadoop.so</strong> — your comprehensive resource for learning and mastering Apache Hadoop and the broader big data ecosystem.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-youll-find-here">What You'll Find Here<a href="https://hadoop.so/blog/welcome-to-hadoop-so#what-youll-find-here" class="hash-link" aria-label="Direct link to What You'll Find Here" title="Direct link to What You'll Find Here" translate="no">​</a></h2>
<p>This site is organized into three areas:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="documentation">Documentation<a href="https://hadoop.so/blog/welcome-to-hadoop-so#documentation" class="hash-link" aria-label="Direct link to Documentation" title="Direct link to Documentation" translate="no">​</a></h3>
<p>Step-by-step guides covering the entire Hadoop stack:</p>
<ul>
<li class=""><strong>Getting Started</strong> — Install Hadoop, understand HDFS, write your first MapReduce job, and manage resources with YARN</li>
<li class=""><strong>Advanced Topics</strong> — High availability, Kerberos security, performance tuning, and the Hadoop ecosystem</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="blog">Blog<a href="https://hadoop.so/blog/welcome-to-hadoop-so#blog" class="hash-link" aria-label="Direct link to Blog" title="Direct link to Blog" translate="no">​</a></h3>
<p>Regular articles on:</p>
<ul>
<li class="">Hadoop release highlights and migration guides</li>
<li class="">Architecture comparisons (HDFS vs S3, MapReduce vs Spark)</li>
<li class="">Real-world deployment patterns and war stories</li>
<li class="">Ecosystem tool deep dives</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="community">Community<a href="https://hadoop.so/blog/welcome-to-hadoop-so#community" class="hash-link" aria-label="Direct link to Community" title="Direct link to Community" translate="no">​</a></h3>
<ul>
<li class="">Questions? Post on <a href="https://stackoverflow.com/questions/tagged/hadoop" target="_blank" rel="noopener noreferrer" class="">Stack Overflow with the <code>hadoop</code> tag</a></li>
<li class="">Bugs and features: <a href="https://issues.apache.org/jira/projects/HADOOP" target="_blank" rel="noopener noreferrer" class="">Apache Hadoop JIRA</a></li>
<li class="">Official docs: <a href="https://hadoop.apache.org/" target="_blank" rel="noopener noreferrer" class="">hadoop.apache.org</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="a-note-on-versions">A Note on Versions<a href="https://hadoop.so/blog/welcome-to-hadoop-so#a-note-on-versions" class="hash-link" aria-label="Direct link to A Note on Versions" title="Direct link to A Note on Versions" translate="no">​</a></h2>
<p>This site targets <strong>Hadoop 3.x</strong> (the current stable series). Where behavior differs significantly from Hadoop 2.x, we'll call it out explicitly.</p>
<p>Happy learning!</p>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="Hadoop" term="Hadoop"/>
        <category label="Welcome" term="Welcome"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Upgrading from Hadoop 2 to Hadoop 3: A Complete How-To]]></title>
        <id>https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide</id>
        <link href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide"/>
        <updated>2026-04-24T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Hadoop 3.x introduced erasure coding, YARN Timeline Service v2, multiple NameNode support, and significant performance improvements. If you're still running Hadoop 2.x, this guide walks through a safe, rolling upgrade path — without losing data or taking extended downtime.]]></summary>
        <content type="html"><![CDATA[<p>Hadoop 3.x introduced erasure coding, YARN Timeline Service v2, multiple NameNode support, and significant performance improvements. If you're still running Hadoop 2.x, this guide walks through a safe, rolling upgrade path — without losing data or taking extended downtime.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-changed-in-hadoop-3x">What Changed in Hadoop 3.x<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#what-changed-in-hadoop-3x" class="hash-link" aria-label="Direct link to What Changed in Hadoop 3.x" title="Direct link to What Changed in Hadoop 3.x" translate="no">​</a></h2>
<p>Before upgrading, understand the key differences:</p>
<table><thead><tr><th>Area</th><th>Hadoop 2.x</th><th>Hadoop 3.x</th></tr></thead><tbody><tr><td>HDFS redundancy</td><td>3x replication only</td><td>3x replication (still the default) or erasure coding</td></tr><tr><td>NameNodes (HA)</td><td>1 active + 1 standby</td><td>Up to 5 NameNodes</td></tr><tr><td>Minimum Java</td><td>Java 7</td><td>Java 8</td></tr><tr><td>YARN Timeline Service</td><td>v1</td><td>v2 (HBase-backed)</td></tr><tr><td>Shell scripts</td><td>Common scripts</td><td>Reworked, cleaner separation</td></tr><tr><td>Default ports</td><td>50070 (NameNode UI), 50075 (DataNode UI), 50010 (DataNode data)</td><td>9870, 9864, 9866; NameNode RPC stays on 8020 (3.0.1 and later)</td></tr></tbody></table>
<p>The port changes alone can break existing monitoring, firewall rules, and client configs — plan for those carefully.</p>
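<p>A lookup like the following can help when rewriting firewall rules or monitoring checks. It is a sketch: the function name <code>hadoop3_port</code> is hypothetical, and the mapping covers the HDFS default ports that were renumbered in Hadoop 3. Ports that are not in the list are returned unchanged.</p>

```shell
# Map a Hadoop 2.x default HDFS port to its Hadoop 3.x default.
# Prints the new port, or the input unchanged if it is not in the list.
# hadoop3_port PORT
hadoop3_port() {
  case "$1" in
    50070) echo 9870 ;;  # NameNode HTTP UI
    50470) echo 9871 ;;  # NameNode HTTPS UI
    50090) echo 9868 ;;  # Secondary NameNode HTTP
    50091) echo 9869 ;;  # Secondary NameNode HTTPS
    50010) echo 9866 ;;  # DataNode data transfer
    50020) echo 9867 ;;  # DataNode IPC
    50075) echo 9864 ;;  # DataNode HTTP UI
    50475) echo 9865 ;;  # DataNode HTTPS UI
    *)     echo "$1" ;;  # not renumbered, e.g. 8020 (NameNode RPC)
  esac
}

for port in 50070 50010 50075 8020; do
  echo "$port -> $(hadoop3_port "$port")"
done
```

<p>Any of these defaults can be overridden in <code>hdfs-site.xml</code>, so check your own configuration before assuming a port moved.</p>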
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="pre-upgrade-checklist">Pre-Upgrade Checklist<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#pre-upgrade-checklist" class="hash-link" aria-label="Direct link to Pre-Upgrade Checklist" title="Direct link to Pre-Upgrade Checklist" translate="no">​</a></h2>
<p>Before touching a single config file:</p>
<ol>
<li class=""><strong>Audit all client applications</strong> for hardcoded ports (50070, 50075, 50010, etc.)</li>
<li class=""><strong>Check Java version</strong> — every node must run Java 8 or higher</li>
<li class=""><strong>Review deprecated APIs</strong> — several <code>mapred</code> and <code>dfs</code> shell commands were removed</li>
<li class=""><strong>Back up namenode metadata</strong>:<!-- -->
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -safemode enter   # saveNamespace only works in safe mode</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -saveNamespace</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">cp -r /path/to/namenode/current /backup/namenode-$(date +%Y%m%d)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -safemode leave</span><br></span></code></pre></div></div>
</li>
<li class=""><strong>Snapshot your HDFS data directories</strong> on each DataNode if possible</li>
<li class=""><strong>Read the release notes</strong> for your specific target version (3.3.x or 3.4.x)</li>
</ol>
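<p>The first checklist item, auditing for hardcoded ports, can be partly automated with <code>grep</code>. The sketch below is one way to do it; the function name, the demo directory, and the sample file contents are all made up for illustration.</p>

```shell
# List lines under DIR that mention a Hadoop 2.x default HDFS port.
# find_legacy_ports DIR
find_legacy_ports() {
  local dir=$1
  # -r recurse, -n show line numbers, -E extended regex, -w whole-word match
  grep -rnEw '50070|50470|50090|50091|50010|50020|50075|50475' "$dir"
}

# Demo against a throwaway directory (hypothetical file contents).
demo_dir=$(mktemp -d)
printf 'namenode.ui=http://nn1.example.com:50070\n' > "$demo_dir/monitoring.properties"
printf 'namenode.rpc=hdfs://nn1.example.com:8020\n' > "$demo_dir/client.properties"
find_legacy_ports "$demo_dir"
rm -rf "$demo_dir"
```

<p>Point it at your client configuration, monitoring definitions, and deployment scripts. A text search will not catch ports assembled at runtime, so treat the output as a starting list rather than a complete audit.</p>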
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-1-upgrade-hdfs-metadata">Step 1: Upgrade HDFS Metadata<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#step-1-upgrade-hdfs-metadata" class="hash-link" aria-label="Direct link to Step 1: Upgrade HDFS Metadata" title="Direct link to Step 1: Upgrade HDFS Metadata" translate="no">​</a></h2>
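<p>The NameNode must be upgraded to the new metadata layout before any DataNode is upgraded. Do not finalize the upgrade yet: finalizing discards the old layout and removes the option to roll back.</p>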
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="11--put-namenode-in-safemode-and-save-namespace">1.1 — Put NameNode in safemode and save namespace<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#11--put-namenode-in-safemode-and-save-namespace" class="hash-link" aria-label="Direct link to 1.1 — Put NameNode in safemode and save namespace" title="Direct link to 1.1 — Put NameNode in safemode and save namespace" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -safemode enter</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -saveNamespace</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -safemode leave</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="12--stop-all-services-in-order">1.2 — Stop all services in order<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#12--stop-all-services-in-order" class="hash-link" aria-label="Direct link to 1.2 — Stop all services in order" title="Direct link to 1.2 — Stop all services in order" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">stop-yarn.sh</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">stop-dfs.sh</span><br></span></code></pre></div></div>
<p>Stop the MapReduce Job History Server last. The cluster is still running the 2.x scripts at this point, so use <code>mr-jobhistory-daemon.sh</code> rather than the 3.x <code>mapred --daemon</code> form:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">mr-jobhistory-daemon.sh stop historyserver</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="13--upgrade-namenode">1.3 — Upgrade NameNode<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#13--upgrade-namenode" class="hash-link" aria-label="Direct link to 1.3 — Upgrade NameNode" title="Direct link to 1.3 — Upgrade NameNode" translate="no">​</a></h3>
<p>Replace the Hadoop binaries on the NameNode host with the 3.x release, then run:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">hdfs namenode -upgrade</span><br></span></code></pre></div></div>
<p>This writes a new metadata layout while preserving the old layout in a <code>previous/</code> directory — allowing rollback if needed.</p>
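<p>Once the upgrade has run, you eventually have to either keep it or abandon it. The sketch below wraps both paths in a dry-run helper so nothing runs by accident. The helper and function names are made up for this example; <code>hdfs dfsadmin -finalizeUpgrade</code> and <code>hdfs namenode -rollback</code> are the standard commands.</p>

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Keep the upgrade: discard the previous/ layout. Rollback is impossible afterwards.
finalize_upgrade() {
  run hdfs dfsadmin -finalizeUpgrade
}

# Abandon the upgrade: with HDFS stopped and the 2.x binaries restored,
# start the NameNode from the preserved previous/ layout.
rollback_upgrade() {
  run hdfs namenode -rollback
}

finalize_upgrade
```

<p>Finalize only after the cluster has run on 3.x long enough for you to trust it; until then, the <code>previous/</code> directory takes up extra disk space but keeps rollback available.</p>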
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-2-upgrade-datanodes">Step 2: Upgrade DataNodes<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#step-2-upgrade-datanodes" class="hash-link" aria-label="Direct link to Step 2: Upgrade DataNodes" title="Direct link to Step 2: Upgrade DataNodes" translate="no">​</a></h2>
<p>With the NameNode upgraded and running, start each DataNode with the new binaries:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">hdfs --daemon start datanode</span><br></span></code></pre></div></div>
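<p>Each DataNode upgrades its block storage layout the first time it starts on the new binaries and keeps the old layout for rollback. HDFS resumes serving data as DataNodes rejoin, so bring them up promptly and confirm that every one has reported in.</p>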
<p>Monitor upgrade progress:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -report          # live and dead DataNodes</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -upgrade query   # upgrade status (replaces the removed -upgradeProgress)</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-3-upgrade-yarn">Step 3: Upgrade YARN<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#step-3-upgrade-yarn" class="hash-link" aria-label="Direct link to Step 3: Upgrade YARN" title="Direct link to Step 3: Upgrade YARN" translate="no">​</a></h2>
<p>ResourceManager and NodeManagers can be rolled independently in Hadoop 3.x thanks to the work-preserving restart feature.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="31--upgrade-resourcemanager">3.1 — Upgrade ResourceManager<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#31--upgrade-resourcemanager" class="hash-link" aria-label="Direct link to 3.1 — Upgrade ResourceManager" title="Direct link to 3.1 — Upgrade ResourceManager" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">yarn --daemon stop resourcemanager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Replace binaries</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">yarn --daemon start resourcemanager</span><br></span></code></pre></div></div>
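<p>Work-preserving ResourceManager restart depends on a recoverable state store being configured before the restart. A minimal <code>yarn-site.xml</code> sketch, assuming a ZooKeeper-backed store; the ZooKeeper hosts are placeholders, not values from this guide:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- yarn-site.xml (zk1, zk2, zk3 are placeholder hosts) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.resourcemanager.recovery.enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.resourcemanager.store.class</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">hadoop.zk.address</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">zk1:2181,zk2:2181,zk3:2181</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>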
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="32--rolling-nodemanager-upgrade">3.2 — Rolling NodeManager upgrade<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#32--rolling-nodemanager-upgrade" class="hash-link" aria-label="Direct link to 3.2 — Rolling NodeManager upgrade" title="Direct link to 3.2 — Rolling NodeManager upgrade" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># On each node, one at a time:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">yarn --daemon stop nodemanager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Replace binaries</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">yarn --daemon start nodemanager</span><br></span></code></pre></div></div>
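<p>Running containers are preserved across NodeManager restarts (work-preserving upgrade) only when NodeManager recovery is enabled through <code>yarn.nodemanager.recovery.enabled</code>. Without it, containers on the node are lost when the NodeManager restarts.</p>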
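<p>Container preservation across NodeManager restarts relies on NodeManager recovery. A minimal <code>yarn-site.xml</code> sketch, assuming a local recovery directory and a fixed RPC port (both values are illustrative):</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- yarn-site.xml (directory and port are example values) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.recovery.enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.recovery.dir</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">/var/lib/hadoop-yarn/nm-recovery</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.address</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">0.0.0.0:45454</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>The fixed port matters because a NodeManager listening on an ephemeral port may come back on a different one, which breaks clients that were talking to it before the restart.</p>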
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-4-update-configuration-files">Step 4: Update Configuration Files<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#step-4-update-configuration-files" class="hash-link" aria-label="Direct link to Step 4: Update Configuration Files" title="Direct link to Step 4: Update Configuration Files" translate="no">​</a></h2>
<p>Hadoop 3.x uses different default ports. Update <code>core-site.xml</code>, <code>hdfs-site.xml</code>, and any clients pointing to old ports:</p>
<p><strong>Old → New port mappings:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NameNode RPC:       8020  → 8020 (unchanged; only 3.0.0 briefly used 9820)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">NameNode Web UI:    50070 → 9870</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Secondary NN:       50090 → 9868</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">DataNode Web UI:    50075 → 9864</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">DataNode transfer:  50010 → 9866</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">DataNode IPC:       50020 → 9867</span><br></span></code></pre></div></div>
<p>Clients reach the NameNode on whatever port <code>fs.defaultFS</code> names, so state the RPC port explicitly in <code>core-site.xml</code> rather than relying on the default. Port 9000 appears in many tutorials but is not a Hadoop default:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.defaultFS</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">hdfs://namenode-host:8020</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
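<p>If firewall rules or monitoring still expect the 2.x ports, one option is to pin the old values during the transition. A hedged <code>hdfs-site.xml</code> sketch covering two of the ports listed above; the bind addresses are illustrative, and the same pattern extends to the other daemons:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- hdfs-site.xml: keep the 2.x ports until clients are migrated --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.namenode.http-address</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">0.0.0.0:50070</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.datanode.address</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">0.0.0.0:50010</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>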
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-5-finalize-the-upgrade">Step 5: Finalize the Upgrade<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#step-5-finalize-the-upgrade" class="hash-link" aria-label="Direct link to Step 5: Finalize the Upgrade" title="Direct link to Step 5: Finalize the Upgrade" translate="no">​</a></h2>
<p>Once you've validated that everything is working correctly, finalize the upgrade to reclaim the space used by the <code>previous/</code> layout backup:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -finalizeUpgrade</span><br></span></code></pre></div></div>
<p><strong>Warning:</strong> After finalization, rollback is no longer possible.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="rollback-procedure-if-needed">Rollback Procedure (if needed)<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#rollback-procedure-if-needed" class="hash-link" aria-label="Direct link to Rollback Procedure (if needed)" title="Direct link to Rollback Procedure (if needed)" translate="no">​</a></h2>
<p>If you encounter critical issues before finalization:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">stop-dfs.sh</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Reinstall the Hadoop 2.x binaries on every node before continuing</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs namenode -rollback</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">start-dfs.sh -rollback</span><br></span></code></pre></div></div>
<p>This reverts the NameNode metadata to the Hadoop 2.x layout. Run the rollback with the 2.x binaries in place; starting HDFS with <code>-rollback</code> also returns each DataNode to its pre-upgrade storage layout.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="common-upgrade-issues">Common Upgrade Issues<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#common-upgrade-issues" class="hash-link" aria-label="Direct link to Common Upgrade Issues" title="Direct link to Common Upgrade Issues" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="shell-script-changes">Shell Script Changes<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#shell-script-changes" class="hash-link" aria-label="Direct link to Shell Script Changes" title="Direct link to Shell Script Changes" translate="no">​</a></h3>
<p>Hadoop 3.x reworked the shell scripts. Commands like <code>hadoop-daemon.sh</code> are deprecated in favor of:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Old (2.x)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop-daemon.sh start datanode</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># New (3.x)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs --daemon start datanode</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="classpath-changes">Classpath Changes<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#classpath-changes" class="hash-link" aria-label="Direct link to Classpath Changes" title="Direct link to Classpath Changes" translate="no">​</a></h3>
<p>Third-party tools (Hive, HBase, Spark) that relied on Hadoop's classpath may need updated versions compatible with Hadoop 3.x. Check each ecosystem component's compatibility matrix.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="yarn-timeline-service-v2">YARN Timeline Service v2<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#yarn-timeline-service-v2" class="hash-link" aria-label="Direct link to YARN Timeline Service v2" title="Direct link to YARN Timeline Service v2" translate="no">​</a></h3>
<p>YARN Timeline Service v2 requires HBase as a backend. If you relied on Timeline Service v1, plan the HBase deployment before enabling v2:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- yarn-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.timeline-service.version</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">2.0f</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
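<p>The version flag alone does not turn the service on. A partial <code>yarn-site.xml</code> sketch of the companion settings, assuming the HBase client configuration lives at a placeholder path; a full deployment also needs the timeline collector and reader set up as described in the Timeline Service v2 documentation:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- yarn-site.xml (the HBase config path is a placeholder) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.timeline-service.enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.system-metrics-publisher.enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.timeline-service.hbase.configuration.file</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">file:/etc/hbase/conf/hbase-site.xml</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>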
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="post-upgrade-verification">Post-Upgrade Verification<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#post-upgrade-verification" class="hash-link" aria-label="Direct link to Post-Upgrade Verification" title="Direct link to Post-Upgrade Verification" translate="no">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Verify HDFS health</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -report</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs fsck / -summary</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Check YARN cluster</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">yarn node -list</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">yarn application -list</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Run a test job</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100</span><br></span></code></pre></div></div>
<p>A successful Pi estimation job confirms that HDFS and YARN are both operational end-to-end.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary">Summary<a href="https://hadoop.so/blog/hadoop-2-to-3-upgrade-guide#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<table><thead><tr><th>Phase</th><th>Action</th></tr></thead><tbody><tr><td>Pre-upgrade</td><td>Backup metadata, check Java 8+, audit ports</td></tr><tr><td>Step 1</td><td>Save namespace, stop services, upgrade NameNode</td></tr><tr><td>Step 2</td><td>Roll DataNodes one at a time</td></tr><tr><td>Step 3</td><td>Roll ResourceManager and NodeManagers</td></tr><tr><td>Step 4</td><td>Update config files for new default ports</td></tr><tr><td>Step 5</td><td>Finalize upgrade (reclaims rollback space)</td></tr></tbody></table>
<p>Upgrading Hadoop 2 to 3 is operationally straightforward when done in order. The biggest surprises tend to come from port changes and ecosystem tool compatibility — audit those before you start and the rest is mechanical.</p>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="Hadoop" term="Hadoop"/>
        <category label="Upgrade" term="Upgrade"/>
        <category label="HDFS" term="HDFS"/>
        <category label="YARN" term="YARN"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Using Hadoop with Amazon S3: The S3A Connector Explained]]></title>
        <id>https://hadoop.so/blog/hadoop-aws-s3a-connector-guide</id>
        <link href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide"/>
        <updated>2026-04-23T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[The s3a:// filesystem connector in Hadoop lets you use Amazon S3 as a drop-in replacement for HDFS storage. It's the foundation for cost-effective data lake architectures where compute and storage are decoupled. This guide covers configuration, performance tuning, and production best practices.]]></summary>
        <content type="html"><![CDATA[<p>The <code>s3a://</code> filesystem connector in Hadoop lets you use Amazon S3 as a drop-in replacement for HDFS storage. It's the foundation for cost-effective data lake architectures where compute and storage are decoupled. This guide covers configuration, performance tuning, and production best practices.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-s3a">Why S3A?<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#why-s3a" class="hash-link" aria-label="Direct link to Why S3A?" title="Direct link to Why S3A?" translate="no">​</a></h2>
<p>Amazon S3 offers virtually unlimited capacity at a fraction of the cost of on-premises HDFS. With the S3A connector (the third and current generation, replacing the older <code>s3://</code> and <code>s3n://</code> implementations), Hadoop jobs read and write S3 objects using familiar <code>s3a://bucket/path</code> URIs — no code changes required.</p>
<p>Key advantages of S3A:</p>
<ul>
<li class=""><strong>Storage decoupling</strong> — scale compute and storage independently</li>
<li class=""><strong>Durability</strong> — S3 provides 99.999999999% (11 nines) object durability</li>
<li class=""><strong>Cost</strong> — typically 70–80% cheaper than equivalent HDFS on-premises storage per GB</li>
<li class=""><strong>Multi-region availability</strong> — replicate data across AWS regions easily</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="core-configuration">Core Configuration<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#core-configuration" class="hash-link" aria-label="Direct link to Core Configuration" title="Direct link to Core Configuration" translate="no">​</a></h2>
<p>Add the following to <code>core-site.xml</code> on all cluster nodes. Never store credentials in config files for production — use IAM roles instead.</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- core-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">configuration</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- S3A implementation class --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.impl</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">org.apache.hadoop.fs.s3a.S3AFileSystem</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- AWS credentials (use IAM roles in production instead) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.access.key</span><span class="token tag punctuation" 
style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">YOUR_ACCESS_KEY</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.secret.key</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">YOUR_SECRET_KEY</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- AWS region --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.endpoint.region</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" 
style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">us-east-1</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">configuration</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>For production on EC2, use an IAM instance role and omit access/secret keys entirely — Hadoop will retrieve credentials from the EC2 metadata service automatically.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="authentication-methods">Authentication Methods<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#authentication-methods" class="hash-link" aria-label="Direct link to Authentication Methods" title="Direct link to Authentication Methods" translate="no">​</a></h2>
<p>S3A supports multiple credential providers, tried in order:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.aws.credentials.provider</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    com.amazonaws.auth.EnvironmentVariableCredentialsProvider,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    com.amazonaws.auth.profile.ProfileCredentialsProvider</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p><strong>Provider priority (recommended for production):</strong></p>
<ol>
<li class="">IAM Instance Role (EC2) or IAM Task Role (ECS/EKS)</li>
<li class="">AWS environment variables (<code>AWS_ACCESS_KEY_ID</code>, <code>AWS_SECRET_ACCESS_KEY</code>)</li>
<li class="">AWS credentials file (<code>~/.aws/credentials</code>)</li>
<li class="">Hardcoded keys in <code>core-site.xml</code> (avoid in production)</li>
</ol>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="performance-tuning">Performance Tuning<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#performance-tuning" class="hash-link" aria-label="Direct link to Performance Tuning" title="Direct link to Performance Tuning" translate="no">​</a></h2>
<p>S3 is an object store, not a filesystem: every read, write, and listing is an HTTP request, so network latency and per-request overhead dominate performance. The settings below target those costs and can improve throughput substantially on large transfers:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="parallel-upload-multipart">Parallel Upload (Multipart)<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#parallel-upload-multipart" class="hash-link" aria-label="Direct link to Parallel Upload (Multipart)" title="Direct link to Parallel Upload (Multipart)" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Part size for multipart upload (S3 requires at least 5MB per part, except the last) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.multipart.size</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">67108864</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> 
</span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- 64MB --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Threshold above which multipart upload is used (default 128MB) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.multipart.threshold</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag 
punctuation" style="color:#393A34">&gt;</span><span class="token plain">134217728</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- 128MB --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="connection-pool-size">Connection Pool Size<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#connection-pool-size" class="hash-link" aria-label="Direct link to Connection Pool Size" title="Direct link to Connection Pool Size" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Max connections to S3 per JVM --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.connection.maximum</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">100</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Use HTTPS (TLS) for connections to S3 --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.connection.ssl.enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" 
style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
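<p>The connection pool serves open input streams as well as S3A's upload threads, so the two limits are usually sized together. A minimal sketch, assuming the pool of 100 connections above; the thread count of 64 is an illustrative value, not a measured recommendation:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- Worker threads for uploads and other queued work. --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- Keep this below fs.s3a.connection.maximum so threads do not block waiting for a connection. --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;fs.s3a.threads.max&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;64&lt;/value&gt; &lt;!-- illustrative value --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>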
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="prefetch-and-read-ahead">Prefetch and Read-Ahead<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#prefetch-and-read-ahead" class="hash-link" aria-label="Direct link to Prefetch and Read-Ahead" title="Direct link to Prefetch and Read-Ahead" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Bytes to read ahead on a seek before closing and reopening the HTTP connection --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.readahead.range</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">1048576</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> </span><span 
class="token comment" style="color:#999988;font-style:italic">&lt;!-- 1MB --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Enable predictive prefetch (Hadoop 3.3.5+) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.prefetch.enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.prefetch.block.size</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">8388608</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token 
tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- 8MB --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
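<p>Read-ahead tuning pays off most when the access pattern is known. As a sketch, assuming a workload that reads columnar files such as ORC or Parquet with many seeks, the input policy can be switched to <code>random</code>, in which case S3A issues ranged GETs and <code>fs.s3a.readahead.range</code> acts as the minimum amount fetched per request. The property name is taken from the Hadoop S3A documentation; whether it helps depends on your Hadoop version and file formats, so measure before adopting it:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- Input policy: normal, sequential, or random. --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- Assumption: seek-heavy columnar reads; use sequential for whole-file scans. --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;fs.s3a.experimental.input.fadvise&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;random&lt;/value&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>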
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="fast-upload-buffer">Fast Upload Buffer<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#fast-upload-buffer" class="hash-link" aria-label="Direct link to Fast Upload Buffer" title="Direct link to Fast Upload Buffer" translate="no">​</a></h3>
<p>S3A's fast upload mode buffers output in blocks, on disk or in memory, and uploads each block in the background while the task keeps writing, which improves job throughput:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.fast.upload</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Buffer location: disk, array, or bytebuffer --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.fast.upload.buffer</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">disk</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span 
class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
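<p>Two related settings control where disk buffers live and how many blocks a single stream may hold at once. A sketch under stated assumptions: <code>/mnt/scratch/s3a</code> is a hypothetical local path, and 4 active blocks is an illustrative value. With a 64MB part size, four active blocks means roughly 256MB buffered per open output stream, which matters most when the buffer is <code>array</code> or <code>bytebuffer</code> and the data sits in memory:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- Local directory for disk buffers (hypothetical path; pick a volume with free space) --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;fs.s3a.buffer.dir&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;/mnt/scratch/s3a&lt;/value&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- Blocks a single output stream may have queued or uploading at once --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;fs.s3a.fast.upload.active.blocks&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;4&lt;/value&gt; &lt;!-- illustrative value --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>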
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="committer-handling-the-rename-problem">Committer: Handling the Rename Problem<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#committer-handling-the-rename-problem" class="hash-link" aria-label="Direct link to Committer: Handling the Rename Problem" title="Direct link to Committer: Handling the Rename Problem" translate="no">​</a></h2>
<p>S3 has no atomic directory rename. The classic <code>FileOutputCommitter</code> moves output files from a <code>_temporary/</code> directory to the final path. On HDFS that rename is a metadata operation; on S3 it is emulated by copying every object and deleting the original, so commit time grows with output size and a failure partway through can leave partial results.</p>
<p><strong>Use the S3A Magic Committer instead:</strong></p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- mapred-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">mapreduce.outputcommitter.factory.scheme.s3a</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.committer.name</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">magic</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" 
style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.committer.magic.enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
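<p>With the Magic Committer, each task writes its output as an S3 multipart upload addressed to the file's final location but leaves the upload incomplete. Job commit then calls <code>CompleteMultipartUpload</code> for each file, so nothing is copied or renamed and the commit finishes quickly even for large outputs.</p>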
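<p>Because uncommitted output exists only as incomplete multipart uploads, a failed or killed job can leave parts behind that still incur storage charges. One hedged option is to have S3A purge old uploads when a filesystem instance starts. The 24-hour age below is an illustrative value; it should be longer than your longest-running job, otherwise in-progress work may be aborted:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- Abort incomplete multipart uploads older than the configured age --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;fs.s3a.multipart.purge&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;true&lt;/value&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;fs.s3a.multipart.purge.age&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;86400&lt;/value&gt; &lt;!-- seconds (24 hours), illustrative --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>
<p>An S3 bucket lifecycle rule that aborts incomplete multipart uploads achieves the same cleanup without involving Hadoop.</p>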
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="working-with-s3a">Working with S3A<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#working-with-s3a" class="hash-link" aria-label="Direct link to Working with S3A" title="Direct link to Working with S3A" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="basic-file-operations">Basic file operations<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#basic-file-operations" class="hash-link" aria-label="Direct link to Basic file operations" title="Direct link to Basic file operations" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># List bucket contents</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop fs -ls s3a://my-bucket/data/</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Copy local file to S3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop fs -put localfile.csv s3a://my-bucket/input/</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Run a MapReduce job with S3 input/output</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop jar hadoop-mapreduce-examples-*.jar wordcount \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  s3a://my-bucket/input/ \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  s3a://my-bucket/output/wordcount/</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Check S3 usage (note: no block count, S3 is object store)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop fs -du -h 
s3a://my-bucket/</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="per-bucket-configuration">Per-bucket configuration<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#per-bucket-configuration" class="hash-link" aria-label="Direct link to Per-bucket configuration" title="Direct link to Per-bucket configuration" translate="no">​</a></h3>
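<p>Most <code>fs.s3a.</code> options can be overridden for a single bucket by using the prefix <code>fs.s3a.bucket.BUCKET_NAME.</code> in place of <code>fs.s3a.</code>, for example to give one bucket its own region or credentials:</p>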
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.bucket.my-west-bucket.endpoint.region</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">us-west-2</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span 
class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.bucket.my-west-bucket.access.key</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">WEST_REGION_ACCESS_KEY</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="s3-compatible-object-stores">S3-Compatible Object Stores<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#s3-compatible-object-stores" class="hash-link" aria-label="Direct link to S3-Compatible Object Stores" title="Direct link to S3-Compatible Object Stores" translate="no">​</a></h2>
<p>S3A works with any S3-compatible object store by overriding the endpoint:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- MinIO example --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.endpoint</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">http://minio.internal:9000</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">fs.s3a.path.style.access</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span 
class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>This enables the same Hadoop code to run against MinIO (on-premises), Ceph RGW, Cloudflare R2, or Backblaze B2 — same config pattern, different endpoint.</p>
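<p>If only some buckets live on an S3-compatible store, the per-bucket property prefix described earlier can scope the override to those buckets instead of changing the global endpoint. A minimal sketch, assuming a hypothetical bucket named <code>minio-data</code> and reusing the MinIO endpoint from the example above:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- "minio-data" is a hypothetical bucket name; other buckets keep the global fs.s3a.* settings --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;fs.s3a.bucket.minio-data.endpoint&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;http://minio.internal:9000&lt;/value&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;fs.s3a.bucket.minio-data.path.style.access&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;true&lt;/value&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>
<p>With these two properties in place, <code>s3a://minio-data/</code> paths should resolve against MinIO while every other <code>s3a://</code> path keeps using the default AWS settings.</p>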
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="common-issues-and-fixes">Common Issues and Fixes<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#common-issues-and-fixes" class="hash-link" aria-label="Direct link to Common Issues and Fixes" title="Direct link to Common Issues and Fixes" translate="no">​</a></h2>
<table><thead><tr><th>Symptom</th><th>Cause</th><th>Fix</th></tr></thead><tbody><tr><td><code>AccessDeniedException</code></td><td>Wrong credentials or missing IAM policy</td><td>Verify that the IAM policy includes <code>s3:GetObject</code>, <code>s3:PutObject</code>, <code>s3:DeleteObject</code>, and <code>s3:ListBucket</code></td></tr><tr><td>Slow output commit</td><td>The classic FileOutputCommitter commits by renaming, which S3 performs as a copy plus a delete</td><td>Switch to the S3A Magic Committer</td></tr><tr><td><code>IOException: No AWS credentials</code></td><td>No credential provider found</td><td>Configure an IAM role or a credential provider chain</td></tr><tr><td>High S3 costs (LIST requests)</td><td>Many small files or frequent listing</td><td>Combine small files and avoid repeatedly listing large <code>s3a://</code> prefixes</td></tr><tr><td><code>FileNotFoundException</code> on read after write</td><td>S3 eventual consistency (before December 2020) or an S3-compatible store without strong consistency</td><td>Amazon S3 has been strongly consistent since December 2020; upgrade to Hadoop 3.3.1+ and, for S3-compatible stores, check the store's consistency guarantees</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary">Summary<a href="https://hadoop.so/blog/hadoop-aws-s3a-connector-guide#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>The S3A connector transforms Hadoop into a hybrid compute engine that can process data wherever it lives. Key takeaways:</p>
<ul>
<li class="">Use <strong>IAM roles</strong> for authentication — never hardcode credentials</li>
<li class="">Enable the <strong>Magic Committer</strong> to avoid slow, copy-based renames when committing job output</li>
<li class="">Tune <strong>multipart upload</strong> size and <strong>connection pool</strong> for your workload</li>
<li class="">Amazon S3 has been strongly consistent since December 2020, which eliminates the old "read-after-write" problem</li>
<li class="">Per-bucket configuration lets you span multiple regions and credential sets</li>
</ul>
<p>With S3A, running Hadoop workloads on transient EMR clusters or spot instances — reading and writing directly to S3 — becomes a practical, cost-efficient architecture.</p>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="Hadoop" term="Hadoop"/>
        <category label="AWS" term="AWS"/>
        <category label="Cloud" term="Cloud"/>
        <category label="Storage" term="Storage"/>
        <category label="HDFS" term="HDFS"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Securing Your Hadoop DataNode: Kerberos, Wire Encryption, and Best Practices]]></title>
        <id>https://hadoop.so/blog/securing-hadoop-datanode</id>
        <link href="https://hadoop.so/blog/securing-hadoop-datanode"/>
        <updated>2026-04-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[An unsecured Hadoop cluster is a ticking time bomb. Without authentication, any user on the network can read, write, or delete HDFS data. This guide covers the essential security layers for HDFS DataNodes: Kerberos authentication, data transfer encryption, block access tokens, and OS-level hardening.]]></summary>
        <content type="html"><![CDATA[<p>An unsecured Hadoop cluster is a ticking time bomb. Without authentication, any user on the network can read, write, or delete HDFS data. This guide covers the essential security layers for HDFS DataNodes: Kerberos authentication, data transfer encryption, block access tokens, and OS-level hardening.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-datanodes-are-a-security-target">Why DataNodes Are a Security Target<a href="https://hadoop.so/blog/securing-hadoop-datanode#why-datanodes-are-a-security-target" class="hash-link" aria-label="Direct link to Why DataNodes Are a Security Target" title="Direct link to Why DataNodes Are a Security Target" translate="no">​</a></h2>
<p>DataNodes are the workhorses of HDFS — they store actual data blocks and serve reads/writes to clients. In an unsecured cluster:</p>
<ul>
<li class="">Any process that can reach port 9866 (DataNode transfer port) can read or write blocks directly</li>
<li class="">There's no per-user access control on who reads which data</li>
<li class="">A rogue client can inject corrupt or malicious blocks</li>
</ul>
<p>Hadoop's security model addresses all of this through Kerberos-based mutual authentication, block access tokens, and optional wire encryption.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="layer-1-kerberos-authentication">Layer 1: Kerberos Authentication<a href="https://hadoop.so/blog/securing-hadoop-datanode#layer-1-kerberos-authentication" class="hash-link" aria-label="Direct link to Layer 1: Kerberos Authentication" title="Direct link to Layer 1: Kerberos Authentication" translate="no">​</a></h2>
<p>Kerberos is the foundation of Hadoop security. Every Hadoop service (NameNode, DataNode, ResourceManager, NodeManager) authenticates with a Kerberos principal before communicating.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="prerequisites">Prerequisites<a href="https://hadoop.so/blog/securing-hadoop-datanode#prerequisites" class="hash-link" aria-label="Direct link to Prerequisites" title="Direct link to Prerequisites" translate="no">​</a></h3>
<ul>
<li class="">A running Kerberos KDC (MIT Kerberos or Active Directory)</li>
<li class="">DNS properly configured (Kerberos is very sensitive to hostname resolution)</li>
<li class="">Synchronized clocks across all nodes (within 5 minutes; use NTP)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="create-service-principals">Create Service Principals<a href="https://hadoop.so/blog/securing-hadoop-datanode#create-service-principals" class="hash-link" aria-label="Direct link to Create Service Principals" title="Direct link to Create Service Principals" translate="no">​</a></h3>
<p>For each DataNode host, create a principal:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># On the KDC</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kadmin.local -q "addprinc -randkey hdfs/datanode1.example.com@EXAMPLE.COM"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kadmin.local -q "addprinc -randkey host/datanode1.example.com@EXAMPLE.COM"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Export keytabs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.keytab hdfs/datanode1.example.com@EXAMPLE.COM"</span><br></span></code></pre></div></div>
<p>Copy each host's keytab to that DataNode at <code>/etc/security/keytabs/hdfs.keytab</code>, with ownership <code>hdfs:hdfs</code> and mode <code>400</code> so that only the <code>hdfs</code> user can read it.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="enable-security-in-hdfs-sitexml">Enable Security in hdfs-site.xml<a href="https://hadoop.so/blog/securing-hadoop-datanode#enable-security-in-hdfs-sitexml" class="hash-link" aria-label="Direct link to Enable Security in hdfs-site.xml" title="Direct link to Enable Security in hdfs-site.xml" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- hdfs-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.block.access.token.enable</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- DataNode SASL RPC authentication --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.datanode.kerberos.principal</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">hdfs/_HOST@EXAMPLE.COM</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span 
class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.datanode.keytab.file</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">/etc/security/keytabs/hdfs.keytab</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="enable-security-in-core-sitexml">Enable Security in core-site.xml<a href="https://hadoop.so/blog/securing-hadoop-datanode#enable-security-in-core-sitexml" class="hash-link" aria-label="Direct link to Enable Security in core-site.xml" title="Direct link to Enable Security in core-site.xml" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- core-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">hadoop.security.authentication</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">kerberos</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">hadoop.security.authorization</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span 
class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">hadoop.rpc.protection</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">authentication</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- or privacy for encryption --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="layer-2-block-access-tokens">Layer 2: Block Access Tokens<a href="https://hadoop.so/blog/securing-hadoop-datanode#layer-2-block-access-tokens" class="hash-link" aria-label="Direct link to Layer 2: Block Access Tokens" title="Direct link to Layer 2: Block Access Tokens" translate="no">​</a></h2>
<p>Block access tokens prevent unauthorized direct block reads and writes, even from nodes that have network access to a DataNode. The NameNode issues a time-limited token when a client requests a block location; the DataNode validates the token before serving data.</p>
<p>Enable with:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.block.access.token.enable</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.block.access.token.lifetime</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">600</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- minutes; default 600 (10 hours) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>Without block tokens, any client that learns a block's location (host:port plus block ID) can read that block with no further authentication. With tokens enabled, a DataNode serves a block only to clients that present a valid token issued by the NameNode, so the NameNode effectively gatekeeps all data transfers.</p>
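<p>To confirm that the settings above are what the cluster actually loads, the effective values can be printed with <code>hdfs getconf</code>. This is a minimal sketch that assumes the command runs on a host whose Hadoop configuration directory contains the cluster's <code>hdfs-site.xml</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Print the effective block-token settings from the loaded configuration</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs getconf -confKey dfs.block.access.token.enable</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs getconf -confKey dfs.block.access.token.lifetime</span><br></span></code></pre></div></div>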
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="layer-3-wire-encryption">Layer 3: Wire Encryption<a href="https://hadoop.so/blog/securing-hadoop-datanode#layer-3-wire-encryption" class="hash-link" aria-label="Direct link to Layer 3: Wire Encryption" title="Direct link to Layer 3: Wire Encryption" translate="no">​</a></h2>
<p>Kerberos authenticates the parties but does not encrypt their traffic: by default, RPC calls and block data still cross the network in plaintext. Enable encryption for data in transit:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="rpc-encryption-control-plane">RPC Encryption (control plane)<a href="https://hadoop.so/blog/securing-hadoop-datanode#rpc-encryption-control-plane" class="hash-link" aria-label="Direct link to RPC Encryption (control plane)" title="Direct link to RPC Encryption (control plane)" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- core-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">hadoop.rpc.protection</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">privacy</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">  </span><span class="token comment" 
style="color:#999988;font-style:italic">&lt;!-- authentication = auth only; integrity = + checksums; privacy = + encryption --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="data-transfer-encryption-data-plane">Data Transfer Encryption (data plane)<a href="https://hadoop.so/blog/securing-hadoop-datanode#data-transfer-encryption-data-plane" class="hash-link" aria-label="Direct link to Data Transfer Encryption (data plane)" title="Direct link to Data Transfer Encryption (data plane)" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- hdfs-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.encrypt.data.transfer</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.encrypt.data.transfer.algorithm</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">3des</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- key exchange only when AES is negotiated; rc4 is faster but weaker --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.encrypt.data.transfer.cipher.suites</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">AES/CTR/NoPadding</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- hardware-accelerated AES on modern CPUs --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
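<p>AES/CTR benefits from hardware acceleration (AES-NI, available on most modern Intel/AMD CPUs), which keeps the cost of encrypted transfer small. Treat the 5–10% overhead figure as a rough guide and measure on your own hardware. When both client and server support AES, the algorithm named in <code>dfs.encrypt.data.transfer.algorithm</code> is used only for the initial key exchange.</p>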
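<p>Whether a given host can use hardware AES can be checked before rolling this out. A rough sketch for Linux on x86, where AES-NI shows up as the <code>aes</code> CPU flag; the loop then prints the wire-encryption settings the node actually loads:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Prints "aes" if the CPU advertises AES-NI (Linux, x86 only); prints nothing otherwise</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">grep -m1 -o -w aes /proc/cpuinfo</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Effective wire-encryption settings as seen by this host</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">for key in hadoop.rpc.protection dfs.encrypt.data.transfer dfs.encrypt.data.transfer.cipher.suites; do</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key")"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">done</span><br></span></code></pre></div></div>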
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="layer-4-datanode-sasl-on-privileged-ports">Layer 4: DataNode SASL on Privileged Ports<a href="https://hadoop.so/blog/securing-hadoop-datanode#layer-4-datanode-sasl-on-privileged-ports" class="hash-link" aria-label="Direct link to Layer 4: DataNode SASL on Privileged Ports" title="Direct link to Layer 4: DataNode SASL on Privileged Ports" translate="no">​</a></h2>
<p>Running the DataNode data transfer on a privileged port (below 1024) proves that the process was started as root and later dropped privileges — adding OS-level verification. This is optional but adds defense in depth.</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.datanode.address</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">0.0.0.0:1004</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.datanode.http.address</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">0.0.0.0:1006</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
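<p>After restarting the DataNode, the bound ports can be checked from the host itself. A small sketch using <code>ss</code> from iproute2, assuming the port numbers 1004 and 1006 configured above:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># List listening TCP sockets with their owning process; run as root to see process names</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ss -tlnp | grep -E ':(1004|1006) '</span><br></span></code></pre></div></div>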
<p>With SASL (Hadoop 2.6+) instead of privileged ports, the DataNode proves its identity through Kerberos without needing root-owned ports. Set <code>dfs.data.transfer.protection</code>, keep <code>dfs.datanode.address</code> on a non-privileged port, and serve the web endpoints over HTTPS:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.data.transfer.protection</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">privacy</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">dfs.http.policy</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">HTTPS_ONLY</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="layer-5-os-hardening">Layer 5: OS Hardening<a href="https://hadoop.so/blog/securing-hadoop-datanode#layer-5-os-hardening" class="hash-link" aria-label="Direct link to Layer 5: OS Hardening" title="Direct link to Layer 5: OS Hardening" translate="no">​</a></h2>
<p>Kerberos secures the Hadoop layer, but the underlying OS must also be locked down:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="file-permissions">File Permissions<a href="https://hadoop.so/blog/securing-hadoop-datanode#file-permissions" class="hash-link" aria-label="Direct link to File Permissions" title="Direct link to File Permissions" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># DataNode data directories should be owned by the hdfs user</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">chown -R hdfs:hadoop /data/hdfs/dn</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">chmod 700 /data/hdfs/dn</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Keytab files must not be world-readable</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">chmod 400 /etc/security/keytabs/hdfs.keytab</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">chown hdfs:hdfs /etc/security/keytabs/hdfs.keytab</span><br></span></code></pre></div></div>
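<p>The result can be verified without changing anything. A short sketch, assuming GNU <code>stat</code>, the MIT Kerberos <code>klist</code> tool, and the same example paths as above:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Show owner:group, octal mode, and path for the data directory and the keytab</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">stat -c '%U:%G %a %n' /data/hdfs/dn /etc/security/keytabs/hdfs.keytab</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># List the principals stored in the keytab, with timestamps (run as hdfs or root)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">klist -kt /etc/security/keytabs/hdfs.keytab</span><br></span></code></pre></div></div>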
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="network-restrictions">Network Restrictions<a href="https://hadoop.so/blog/securing-hadoop-datanode#network-restrictions" class="hash-link" aria-label="Direct link to Network Restrictions" title="Direct link to Network Restrictions" translate="no">​</a></h3>
<p>Restrict DataNode ports to cluster-internal network ranges using <code>iptables</code> or <code>firewalld</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Only allow DataNode transfer port from within the cluster subnet</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">iptables -A INPUT -p tcp --dport 9866 -s 10.0.0.0/8 -j ACCEPT</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">iptables -A INPUT -p tcp --dport 9866 -j DROP</span><br></span></code></pre></div></div>
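<p>On hosts managed by <code>firewalld</code>, a roughly equivalent restriction can be written as a rich rule. This sketch assumes the active zone does not otherwise open port 9866, and it reuses the example subnet from above:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Allow the DataNode transfer port only from the cluster subnet</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port port="9866" protocol="tcp" accept'</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">firewall-cmd --reload</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Review the active rich rules</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">firewall-cmd --list-rich-rules</span><br></span></code></pre></div></div>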
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="run-datanode-as-non-root">Run DataNode as Non-Root<a href="https://hadoop.so/blog/securing-hadoop-datanode#run-datanode-as-non-root" class="hash-link" aria-label="Direct link to Run DataNode as Non-Root" title="Direct link to Run DataNode as Non-Root" translate="no">​</a></h3>
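<p>The DataNode JVM should run as the <code>hdfs</code> system user, not root. With SASL, leave <code>HDFS_DATANODE_SECURE_USER</code> unset. With privileged ports, the daemon is started as root only to bind the ports and then drops to the user named in that variable:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># /etc/hadoop/hadoop-env.sh</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># SASL (non-privileged ports): start and run as hdfs; leave HDFS_DATANODE_SECURE_USER unset</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export HDFS_DATANODE_USER=hdfs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Privileged ports (jsvc) instead: start as root, then drop to hdfs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># export HDFS_DATANODE_USER=root</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># export HDFS_DATANODE_SECURE_USER=hdfs</span><br></span></code></pre></div></div>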
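<p>A quick way to confirm which account the running DataNode uses is to look at the process table. A sketch assuming a Linux host with the procps version of <code>ps</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Show user, PID, and command line of DataNode processes</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># The bracket in the pattern keeps grep from matching its own command line</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ps -eo user:16,pid,args | grep '[D]ataNode'</span><br></span></code></pre></div></div>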
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="verifying-security-configuration">Verifying Security Configuration<a href="https://hadoop.so/blog/securing-hadoop-datanode#verifying-security-configuration" class="hash-link" aria-label="Direct link to Verifying Security Configuration" title="Direct link to Verifying Security Configuration" translate="no">​</a></h2>
<p>After enabling security, verify that everything works:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Obtain a Kerberos ticket for the hdfs service user</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/namenode.example.com@EXAMPLE.COM</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># List HDFS root (should succeed)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfs -ls /</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Check that unauthenticated access is denied</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kdestroy</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfs -ls /  # Should fail with "No valid credentials"</span><br></span></code></pre></div></div>
<p>Then run an HDFS health check as the authenticated service user:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/namenode.example.com@EXAMPLE.COM</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs dfsadmin -report</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hdfs fsck / -summary</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="security-audit-checklist">Security Audit Checklist<a href="https://hadoop.so/blog/securing-hadoop-datanode#security-audit-checklist" class="hash-link" aria-label="Direct link to Security Audit Checklist" title="Direct link to Security Audit Checklist" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Secured</th></tr></thead><tbody><tr><td>Kerberos principals created for all service hosts</td><td>[ ]</td></tr><tr><td>Keytab files owned by service user, mode 400</td><td>[ ]</td></tr><tr><td><code>hadoop.security.authentication = kerberos</code></td><td>[ ]</td></tr><tr><td><code>dfs.block.access.token.enable = true</code></td><td>[ ]</td></tr><tr><td>Data transfer encryption enabled</td><td>[ ]</td></tr><tr><td>DataNode data dirs owned by hdfs user, mode 700</td><td>[ ]</td></tr><tr><td>Firewall restricts DataNode ports to cluster subnet</td><td>[ ]</td></tr><tr><td>HDFS audit logging enabled</td><td>[ ]</td></tr><tr><td>NTP synchronized (&lt; 5 min skew)</td><td>[ ]</td></tr><tr><td>Apache Ranger for fine-grained authorization (Apache Sentry has been retired)</td><td>[ ]</td></tr></tbody></table>
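<p>The clock-skew item matters because Kerberos rejects requests whose timestamps differ too much from the KDC's clock (5 minutes by default in MIT Kerberos). A sketch for checking synchronization on systemd-based hosts; the <code>chronyc</code> line applies only where chrony is the time daemon:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Prints "yes" when the system clock is synchronized</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">timedatectl show -p NTPSynchronized --value</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Where chrony is in use: show the current offset from the time source</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">chronyc tracking</span><br></span></code></pre></div></div>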
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary">Summary<a href="https://hadoop.so/blog/securing-hadoop-datanode#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>Securing a Hadoop DataNode involves multiple complementary layers:</p>
<ol>
<li class=""><strong>Kerberos</strong> — mutual authentication between services and clients</li>
<li class=""><strong>Block access tokens</strong> — prevent unauthorized direct block access</li>
<li class=""><strong>Wire encryption</strong> — protect data in transit (RPC + data transfer)</li>
<li class=""><strong>Privileged ports or SASL</strong> — OS-level service identity verification</li>
<li class=""><strong>OS hardening</strong> — file permissions, firewall, non-root user</li>
</ol>
<p>No single layer is sufficient on its own. A properly secured DataNode requires all of them working together. For fine-grained authorization beyond what HDFS permissions and ACLs provide, such as row-level and column-level policies on Hive tables, look at Apache Ranger as the next step.</p>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="Hadoop" term="Hadoop"/>
        <category label="Security" term="Security"/>
        <category label="HDFS" term="HDFS"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Hadoop and Java: A Version Compatibility Guide]]></title>
        <id>https://hadoop.so/blog/hadoop-java-version-compatibility-guide</id>
        <link href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide"/>
        <updated>2026-04-21T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Picking the wrong Java version for your Hadoop cluster is one of the most common causes of cryptic build failures, runtime exceptions, and upgrade blockers. This guide maps Hadoop releases to their supported Java versions, explains what changed between Java versions, and offers practical recommendations for 2025.]]></summary>
        <content type="html"><![CDATA[<p>Picking the wrong Java version for your Hadoop cluster is one of the most common causes of cryptic build failures, runtime exceptions, and upgrade blockers. This guide maps Hadoop releases to their supported Java versions, explains what changed between Java versions, and offers practical recommendations for 2025.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="quick-reference-hadoop--java-compatibility-matrix">Quick Reference: Hadoop ↔ Java Compatibility Matrix<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#quick-reference-hadoop--java-compatibility-matrix" class="hash-link" aria-label="Direct link to Quick Reference: Hadoop ↔ Java Compatibility Matrix" title="Direct link to Quick Reference: Hadoop ↔ Java Compatibility Matrix" translate="no">​</a></h2>
<table><thead><tr><th>Hadoop Version</th><th>Java 8</th><th>Java 11</th><th>Java 17</th><th>Java 21</th></tr></thead><tbody><tr><td>2.10.x</td><td>Required</td><td>Not supported</td><td>Not supported</td><td>Not supported</td></tr><tr><td>3.2.x</td><td>Supported</td><td>Not supported</td><td>Not supported</td><td>Not supported</td></tr><tr><td>3.3.x</td><td>Supported</td><td>Supported (runtime)</td><td>Limited (3.3.5+)</td><td>Not supported</td></tr><tr><td>3.4.x</td><td>Supported</td><td>Supported</td><td>Supported</td><td>Preview</td></tr><tr><td>3.5.x (planned)</td><td>Deprecated</td><td>Supported</td><td>Supported</td><td>Supported</td></tr></tbody></table>
<p><strong>Rule of thumb for 2025:</strong> Run Java 11 for stability. Java 17 with Hadoop 3.4.x is becoming viable. Java 8 is approaching EOL and should be phased out.</p>
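<p>To find which row of the matrix applies to an existing node, check both the Hadoop release and the JVM it runs on. A minimal sketch, assuming <code>hadoop</code> and <code>java</code> are on the <code>PATH</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Hadoop release installed on this node</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop version</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Java found first on PATH</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">java -version</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Java that Hadoop's scripts use when JAVA_HOME is set; aborts with a message if it is unset in this shell</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"${JAVA_HOME:?JAVA_HOME is not set}/bin/java" -version</span><br></span></code></pre></div></div>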
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="java-8-the-legacy-standard">Java 8: The Legacy Standard<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#java-8-the-legacy-standard" class="hash-link" aria-label="Direct link to Java 8: The Legacy Standard" title="Direct link to Java 8: The Legacy Standard" translate="no">​</a></h2>
<p>Hadoop 3.x requires Java 8 as the minimum. Java 8 was the de facto standard for Hadoop workloads from 2014 through about 2022.</p>
<p><strong>Current status:</strong> Oracle ended free public updates of Java 8 for commercial use in January 2019, and its Premier Support ended in March 2022. Adoptium (Temurin) and Amazon Corretto still provide free, maintained Java 8 builds, but the ecosystem is moving on.</p>
<p><strong>When to use:</strong> Only if your organization has a hard dependency on a third-party library or Hadoop ecosystem component that isn't yet compatible with Java 11+.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="install-temurin-8-ubuntudebian">Install Temurin 8 (Ubuntu/Debian)<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#install-temurin-8-ubuntudebian" class="hash-link" aria-label="Direct link to Install Temurin 8 (Ubuntu/Debian)" title="Direct link to Install Temurin 8 (Ubuntu/Debian)" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">sudo apt-get install -y wget apt-transport-https gpg</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># apt-key is deprecated; store the key in a dedicated keyring instead</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">wget -qO - https://packages.adoptium.net/artifactory/api/gpg/key/public | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/adoptium.gpg &gt; /dev/null</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">echo "deb https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | sudo tee /etc/apt/sources.list.d/adoptium.list</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sudo apt-get update</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sudo apt-get install -y temurin-8-jdk</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="java-11-the-recommended-choice-lts">Java 11: The Recommended Choice (LTS)<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#java-11-the-recommended-choice-lts" class="hash-link" aria-label="Direct link to Java 11: The Recommended Choice (LTS)" title="Direct link to Java 11: The Recommended Choice (LTS)" translate="no">​</a></h2>
<p>Java 11, released in 2018, is the most tested Java version with Hadoop 3.x as of 2025. All major ecosystem tools (Spark, Hive, HBase, Flink) support Java 11.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-changes-from-java-8">Key changes from Java 8<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#key-changes-from-java-8" class="hash-link" aria-label="Direct link to Key changes from Java 8" title="Direct link to Key changes from Java 8" translate="no">​</a></h3>
<ul>
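<li class=""><strong>Module system (Project Jigsaw, introduced in Java 9)</strong>: reflective access to JDK internals triggers warnings and may need <code>--add-opens</code> flags</li>
<li class=""><strong>G1GC is the default collector (since Java 9)</strong>: better pause times for large heaps (critical for a NameNode tracking hundreds of millions of files)</li>
<li class=""><strong>Java EE modules removed (JEP 320)</strong>: <code>javax.xml.bind</code> (JAXB), <code>javax.activation</code>, and related packages no longer ship with the JDK, so components that rely on them must bundle them as dependencies. The <code>javax.*</code> → <code>jakarta.*</code> namespace rename is a separate Jakarta EE 9 change that affects some ecosystem tools rather than Hadoop itself</li>
<li class=""><strong>JDK internals encapsulated</strong>: <code>sun.misc.Unsafe</code> is still reachable through the <code>jdk.unsupported</code> module, but code that reflects into other internal packages needs the flags below</li>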
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="required-jvm-flags-for-hadoop-on-java-11">Required JVM flags for Hadoop on Java 11+<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#required-jvm-flags-for-hadoop-on-java-11" class="hash-link" aria-label="Direct link to Required JVM flags for Hadoop on Java 11+" title="Direct link to Required JVM flags for Hadoop on Java 11+" translate="no">​</a></h3>
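<p>Hadoop's internal code (and many of its dependencies) uses reflective access into JDK internals. Java 11 permits this with a warning, while Java 16 and later block it by default, so the affected packages must be opened explicitly. Add these to <code>hadoop-env.sh</code>:</p>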
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">export HADOOP_OPTS="$HADOOP_OPTS \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --add-opens=java.base/java.lang=ALL-UNNAMED \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --add-opens=java.base/java.lang.reflect=ALL-UNNAMED \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --add-opens=java.base/java.io=ALL-UNNAMED \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --add-opens=java.base/java.net=ALL-UNNAMED \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --add-opens=java.base/java.util=ALL-UNNAMED \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --add-opens=java.base/java.util.concurrent=ALL-UNNAMED \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --add-opens=java.base/sun.net.dns=ALL-UNNAMED \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --add-opens=java.base/sun.net.util=ALL-UNNAMED"</span><br></span></code></pre></div></div>
<p>Hadoop 3.3.0 and later automatically adds most of these — check your version before manually adding them.</p>
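<p>As a rough illustration of version-aware flag handling, the sketch below parses a Java version string and emits <code>--add-opens</code> flags only for Java 9 and later. The helper names <code>java_major</code> and <code>module_flags_for</code> are hypothetical and not part of Hadoop, and the flag list is shortened to two entries for brevity.</p>

```shell
# java_major VERSION: print the major version of a Java version string.
# Hypothetical helper, not part of Hadoop.
java_major() {
  local v="$1"
  # Legacy scheme: "1.8.0_392" means Java 8, so drop the leading "1."
  case "$v" in
    1.*) v="${v#1.}" ;;
  esac
  # Keep everything before the first dot, plus, underscore or dash
  printf '%s\n' "${v%%[.+_-]*}"
}

# module_flags_for VERSION: print --add-opens flags only when the JVM
# has a module system (Java 9 or later); print nothing for Java 8.
module_flags_for() {
  local major
  major="$(java_major "$1")"
  if [ "$major" -ge 9 ]; then
    printf '%s\n' "--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED"
  fi
}
```

<p>Called with the version string reported by <code>java -version</code>, <code>module_flags_for "11.0.22"</code> prints the two flags, while <code>module_flags_for "1.8.0_392"</code> prints nothing, so the same <code>hadoop-env.sh</code> could serve mixed environments.</p>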
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="java-17-stricter-modules-better-performance">Java 17: Stricter Modules, Better Performance<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#java-17-stricter-modules-better-performance" class="hash-link" aria-label="Direct link to Java 17: Stricter Modules, Better Performance" title="Direct link to Java 17: Stricter Modules, Better Performance" translate="no">​</a></h2>
<p>Java 17 (LTS, 2021) tightened the module encapsulation introduced in Java 9. It fully enforces strong encapsulation of JDK internals, which Java 11 only warned about.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hadoop-compatibility">Hadoop compatibility<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#hadoop-compatibility" class="hash-link" aria-label="Direct link to Hadoop compatibility" title="Direct link to Hadoop compatibility" translate="no">​</a></h3>
<ul>
<li class=""><strong>Hadoop 3.3.4 and earlier:</strong> Not supported — too many internal reflection violations</li>
<li class=""><strong>Hadoop 3.3.5+:</strong> Experimental support</li>
<li class=""><strong>Hadoop 3.4.0+:</strong> Officially supported</li>
</ul>
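<p>The support levels listed above can be expressed as a small lookup, which may be handy in provisioning scripts. This is a sketch only: <code>version_ge</code> and <code>java17_support</code> are hypothetical helpers, and the comparison assumes GNU <code>sort -V</code> is available on the host.</p>

```shell
# version_ge A B: succeed when version A is at least version B.
# Assumes GNU sort with -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# java17_support HADOOP_VERSION: print the Java 17 support level,
# using the thresholds from the list above.
java17_support() {
  local hadoop_version="$1"
  if version_ge "$hadoop_version" "3.4.0"; then
    echo "supported"
  elif version_ge "$hadoop_version" "3.3.5"; then
    echo "experimental"
  else
    echo "unsupported"
  fi
}
```

<p>For example, <code>java17_support 3.3.6</code> prints <code>experimental</code> and <code>java17_support 3.2.4</code> prints <code>unsupported</code>.</p>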
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-required-fixing-for-java-17">What required fixing for Java 17<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#what-required-fixing-for-java-17" class="hash-link" aria-label="Direct link to What required fixing for Java 17" title="Direct link to What required fixing for Java 17" translate="no">​</a></h3>
<p>The main breaking changes were in <code>sun.misc.Unsafe</code> usage and <code>java.lang.reflect</code> access patterns throughout Hadoop's RPC and serialization code. The community patched these across multiple JIRAs (HADOOP-17975, HADOOP-18079, and others).</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="gc-improvements-relevant-to-hadoop">GC improvements relevant to Hadoop<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#gc-improvements-relevant-to-hadoop" class="hash-link" aria-label="Direct link to GC improvements relevant to Hadoop" title="Direct link to GC improvements relevant to Hadoop" translate="no">​</a></h3>
<p>Java 17 ships ZGC and Shenandoah as production-ready collectors (both reached production status in Java 15; ZGC was only experimental in Java 11). For NameNodes with large heaps (32–512GB), ZGC's sub-millisecond pauses can markedly improve responsiveness:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># hadoop-env.sh — NameNode with ZGC</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export HDFS_NAMENODE_OPTS="-Xms64g -Xmx64g -XX:+UseZGC $HDFS_NAMENODE_OPTS"</span><br></span></code></pre></div></div>
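<p>Where one <code>hadoop-env.sh</code> has to serve nodes on different JDKs, the collector choice could be made conditional. The sketch below is one possible approach, not a Hadoop feature: <code>namenode_gc_flag</code> and <code>namenode_opts</code> are hypothetical helpers, and the Java 17 threshold simply mirrors the recommendation above.</p>

```shell
# namenode_gc_flag MAJOR: pick a collector flag for a Java major version.
# Java 17 and later get ZGC; older JVMs stay on G1.
namenode_gc_flag() {
  local major="$1"
  if [ "$major" -ge 17 ]; then
    echo "-XX:+UseZGC"
  else
    echo "-XX:+UseG1GC"
  fi
}

# namenode_opts MAJOR HEAP_GB: build the option string with a fixed-size
# heap (Xms equal to Xmx) plus the chosen collector flag.
namenode_opts() {
  local major="$1" heap_gb="$2"
  echo "-Xms${heap_gb}g -Xmx${heap_gb}g $(namenode_gc_flag "$major")"
}
```

<p>With these definitions, <code>namenode_opts 17 64</code> yields the same options as the example above, and <code>namenode_opts 11 64</code> falls back to G1.</p>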
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="java-21-the-future-lts">Java 21: The Future LTS<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#java-21-the-future-lts" class="hash-link" aria-label="Direct link to Java 21: The Future LTS" title="Direct link to Java 21: The Future LTS" translate="no">​</a></h2>
<p>Java 21 (LTS, 2023) is the current long-term support release. It brings virtual threads (Project Loom), pattern matching for <code>switch</code>, and record patterns.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hadoop-compatibility-1">Hadoop compatibility<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#hadoop-compatibility-1" class="hash-link" aria-label="Direct link to Hadoop compatibility" title="Direct link to Hadoop compatibility" translate="no">​</a></h3>
<p>Full support for Java 21 is still in progress as of early 2025. Hadoop 3.4.x has preliminary support — test thoroughly before using in production.</p>
<p>Virtual threads are particularly interesting for Hadoop's IPC layer, which handles thousands of concurrent RPC connections. Future Hadoop releases may leverage virtual threads to reduce thread-pool overhead on NameNodes and ResourceManagers.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="setting-java_home-for-hadoop">Setting JAVA_HOME for Hadoop<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#setting-java_home-for-hadoop" class="hash-link" aria-label="Direct link to Setting JAVA_HOME for Hadoop" title="Direct link to Setting JAVA_HOME for Hadoop" translate="no">​</a></h2>
<p>Hadoop reads <code>JAVA_HOME</code> from <code>hadoop-env.sh</code>. Always set it explicitly rather than relying on the system default — different users or shell environments may have different defaults:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># /etc/hadoop/conf/hadoop-env.sh</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Or use java_home helper on macOS:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># export JAVA_HOME=$(/usr/libexec/java_home -v 11)</span><br></span></code></pre></div></div>
<p>Verify that every node in the cluster uses the same Java version and JAVA_HOME path to avoid subtle serialization incompatibilities.</p>
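<p>One way to make that check repeatable is to feed a list of host and version pairs through a small filter. The function below is a hypothetical sketch: it assumes the pairs have already been collected (over SSH, a configuration management tool, or similar) as lines of the form <code>hostname java_version</code>, and it treats the first line as the reference.</p>

```shell
# check_java_consistency: read "hostname java_version" pairs on stdin and
# print every host whose version differs from the first line.
# Exit status is 0 when all hosts agree and 1 otherwise.
check_java_consistency() {
  awk '
    NR == 1 { expected = $2 }
    $2 != expected { print $1 " runs " $2 ", expected " expected; bad = 1 }
    END { exit bad }
  '
}
```

<p>Given the lines <code>nn1 11.0.22</code>, <code>dn1 11.0.22</code>, and <code>dn2 17.0.10</code>, the function reports <code>dn2 runs 17.0.10, expected 11.0.22</code> and returns a non-zero status, which makes it easy to wire into a pre-upgrade check.</p>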
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="multi-jdk-environments-sdkman">Multi-JDK Environments: SDKMAN!<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#multi-jdk-environments-sdkman" class="hash-link" aria-label="Direct link to Multi-JDK Environments: SDKMAN!" title="Direct link to Multi-JDK Environments: SDKMAN!" translate="no">​</a></h2>
<p>If you maintain multiple Hadoop environments with different Java requirements, SDKMAN! makes switching trivial:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Install SDKMAN!</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl -s "https://get.sdkman.io" | bash</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Install multiple JDKs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sdk install java 11.0.22-tem</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sdk install java 17.0.10-tem</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sdk install java 21.0.2-tem</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Switch for a session</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sdk use java 11.0.22-tem</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Set a default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sdk default java 11.0.22-tem</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="checking-your-current-setup">Checking Your Current Setup<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#checking-your-current-setup" class="hash-link" aria-label="Direct link to Checking Your Current Setup" title="Direct link to Checking Your Current Setup" translate="no">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Check Java version</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">java -version</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Check what Hadoop thinks JAVA_HOME is (Hadoop 3.x)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop envvars | grep JAVA_HOME</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Check which JVM is actually running the NameNode</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ps aux | grep NameNode</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Find the PID, then:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ls -la /proc/&lt;PID&gt;/exe</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary-recommendations">Summary Recommendations<a href="https://hadoop.so/blog/hadoop-java-version-compatibility-guide#summary-recommendations" class="hash-link" aria-label="Direct link to Summary Recommendations" title="Direct link to Summary Recommendations" translate="no">​</a></h2>
<table><thead><tr><th>Scenario</th><th>Recommended Java</th></tr></thead><tbody><tr><td>Hadoop 2.10.x (legacy)</td><td>Java 8</td></tr><tr><td>Hadoop 3.2.x production</td><td>Java 11</td></tr><tr><td>Hadoop 3.3.x production</td><td>Java 11 (stable), Java 17 (3.3.5+ experimental)</td></tr><tr><td>Hadoop 3.4.x production</td><td>Java 11 or Java 17</td></tr><tr><td>New deployments in 2025</td><td>Java 11 (safe), Java 17 (forward-looking)</td></tr><tr><td>Future / Hadoop 3.5.x</td><td>Java 17 or Java 21</td></tr></tbody></table>
<p>The Java version you run affects not just Hadoop but every ecosystem tool layered on top: Spark, Hive, HBase, Kafka. Coordinate your Java version across the entire platform before upgrading, and always test with your actual workloads — JVM GC behavior at scale can differ significantly from simple benchmarks.</p>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="Hadoop" term="Hadoop"/>
        <category label="Java" term="Java"/>
        <category label="Release" term="Release"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[YARN Containers Deep Dive: How Resource Allocation Really Works]]></title>
        <id>https://hadoop.so/blog/yarn-containers-deep-dive</id>
        <link href="https://hadoop.so/blog/yarn-containers-deep-dive"/>
        <updated>2026-04-20T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management layer. Understanding how YARN allocates containers — the fundamental unit of computation — is essential for getting good utilization and avoiding the frustrating "application is waiting for resources" message that plagues many clusters.]]></summary>
        <content type="html"><![CDATA[<p>YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management layer. Understanding how YARN allocates containers — the fundamental unit of computation — is essential for getting good utilization and avoiding the frustrating "application is waiting for resources" message that plagues many clusters.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-is-a-yarn-container">What Is a YARN Container?<a href="https://hadoop.so/blog/yarn-containers-deep-dive#what-is-a-yarn-container" class="hash-link" aria-label="Direct link to What Is a YARN Container?" title="Direct link to What Is a YARN Container?" translate="no">​</a></h2>
<p>A YARN container is a reservation of resources (CPU cores and memory) on a NodeManager. It's the unit in which YARN launches application tasks — MapReduce map/reduce tasks, Spark executors, Tez tasks, and so on all run inside YARN containers.</p>
<p>Each container has:</p>
<ul>
<li class=""><strong>Memory</strong> (in MB): a hard limit, enforced by the NodeManager's physical and virtual memory checks or by cgroups</li>
<li class=""><strong>vCores</strong> (virtual CPU cores): a logical unit, not necessarily a physical core</li>
<li class="">A <strong>host</strong> (the NodeManager it runs on)</li>
<li class="">A <strong>priority</strong> within the application</li>
</ul>
<p>Containers are ephemeral: they're created for a task, run to completion, and are released.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-container-allocation-works">How Container Allocation Works<a href="https://hadoop.so/blog/yarn-containers-deep-dive#how-container-allocation-works" class="hash-link" aria-label="Direct link to How Container Allocation Works" title="Direct link to How Container Allocation Works" translate="no">​</a></h2>
<p>The flow from application submission to container launch:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Client</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  └──► ApplicationMaster (AM) submitted to YARN</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         └──► AM registers with ResourceManager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                └──► AM requests containers (resource requests)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                       │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                       └──► ResourceManager scheduler evaluates queue + node availability</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                              │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                              └──► ResourceManager issues container allocations</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                                     │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                                     └──► AM contacts NodeManager → container launch</span><br></span></code></pre></div></div>
<p>The ResourceManager's scheduler is the brain. It tracks available resources on each NodeManager and matches pending container requests against available capacity, obeying queue limits, node labels, and locality preferences.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="resource-request-parameters">Resource Request Parameters<a href="https://hadoop.so/blog/yarn-containers-deep-dive#resource-request-parameters" class="hash-link" aria-label="Direct link to Resource Request Parameters" title="Direct link to Resource Request Parameters" translate="no">​</a></h2>
<p>An ApplicationMaster sends resource requests with these key parameters:</p>
<table><thead><tr><th>Parameter</th><th>Description</th></tr></thead><tbody><tr><td><code>memory</code></td><td>Container memory in MB</td></tr><tr><td><code>vCores</code></td><td>Virtual CPU cores</td></tr><tr><td><code>nodes</code></td><td>Preferred host list (data locality)</td></tr><tr><td><code>racks</code></td><td>Preferred rack list (rack locality)</td></tr><tr><td><code>priority</code></td><td>Request priority (lower = higher priority)</td></tr><tr><td><code>relaxLocality</code></td><td>Whether to fall back to any node if preferred not available</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="locality-preference-order">Locality preference order<a href="https://hadoop.so/blog/yarn-containers-deep-dive#locality-preference-order" class="hash-link" aria-label="Direct link to Locality preference order" title="Direct link to Locality preference order" translate="no">​</a></h3>
<p>YARN tries to honor locality in this order:</p>
<ol>
<li class=""><strong>Node-local</strong> — container on the same node as the data block</li>
<li class=""><strong>Rack-local</strong> — container on a node in the same rack</li>
<li class=""><strong>Off-rack</strong> — any node in the cluster</li>
</ol>
<p>For MapReduce, HDFS block locality means the map task ideally runs on a node that already has the block locally — avoiding network transfer entirely.</p>
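<p>The three locality levels reduce to two comparisons, which the following sketch makes explicit. It is purely illustrative: <code>locality_level</code> is a hypothetical helper, and the real schedulers also weigh delay scheduling, queue limits, and <code>relaxLocality</code>, none of which is modelled here.</p>

```shell
# locality_level DATA_HOST DATA_RACK CONTAINER_HOST CONTAINER_RACK
# Classify where a container landed relative to the data block it reads.
locality_level() {
  local data_host="$1" data_rack="$2" host="$3" rack="$4"
  if [ "$host" = "$data_host" ]; then
    echo "node-local"    # same node as the block: no network transfer
  elif [ "$rack" = "$data_rack" ]; then
    echo "rack-local"    # same rack: traffic stays on the rack switch
  else
    echo "off-rack"      # anywhere else in the cluster
  fi
}
```

<p>For instance, <code>locality_level dn1 /rack1 dn2 /rack1</code> prints <code>rack-local</code>.</p>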
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="nodemanager-configuration">NodeManager Configuration<a href="https://hadoop.so/blog/yarn-containers-deep-dive#nodemanager-configuration" class="hash-link" aria-label="Direct link to NodeManager Configuration" title="Direct link to NodeManager Configuration" translate="no">​</a></h2>
<p>Configure how much resource each NodeManager exposes to YARN in <code>yarn-site.xml</code>:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Total memory available to YARN containers on this node --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.resource.memory-mb</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">49152</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span 
class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- 48GB; leave some for OS and system processes --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Total vCores available on this node --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.resource.cpu-vcores</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" 
style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">20</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Leave 2-4 for OS --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Minimum container memory allocation --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.minimum-allocation-mb</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" 
style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">1024</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Maximum container memory allocation --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.maximum-allocation-mb</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">16384</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Minimum container vCores --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span 
class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.minimum-allocation-vcores</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">1</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Maximum container vCores --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.maximum-allocation-vcores</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">8</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p><strong>Important:</strong> With the default Capacity Scheduler, each container request is rounded up to the nearest multiple of <code>yarn.scheduler.minimum-allocation-mb</code>. Requesting 1.5 GB on a cluster with a 1 GB minimum gets you 2 GB, so an oversized minimum wastes memory on every container. Size your minimum allocation to match typical workloads.</p>
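<p>The rounding above is how the Capacity Scheduler normalizes requests. As a hedged sketch for clusters that run the Fair Scheduler on a recent Hadoop 3 release, the rounding step is a separate setting, <code>yarn.resource-types.memory-mb.increment-allocation</code>; the 512 MB value here is only an illustrative assumption:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- yarn-site.xml (Fair Scheduler only; illustrative value) --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;yarn.resource-types.memory-mb.increment-allocation&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;512&lt;/value&gt; &lt;!-- a 1700 MB request is rounded up to 2048 MB --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>
<p>With a 512 MB step, a 1.5 GB request is already a multiple of the increment and is granted 1536 MB rather than 2 GB.</p>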
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="memory-enforcement-virtual-vs-physical">Memory Enforcement: Virtual vs Physical<a href="https://hadoop.so/blog/yarn-containers-deep-dive#memory-enforcement-virtual-vs-physical" class="hash-link" aria-label="Direct link to Memory Enforcement: Virtual vs Physical" title="Direct link to Memory Enforcement: Virtual vs Physical" translate="no">​</a></h2>
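<p>The NodeManager can enforce container memory limits with two polling-based checks (virtual and physical memory), or at the OS level through cgroups:</p>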
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="virtual-memory-check-default-on">Virtual Memory Check (default on)<a href="https://hadoop.so/blog/yarn-containers-deep-dive#virtual-memory-check-default-on" class="hash-link" aria-label="Direct link to Virtual Memory Check (default on)" title="Direct link to Virtual Memory Check (default on)" translate="no">​</a></h3>
<p>The NodeManager monitors the virtual memory size of each container's process tree. If it exceeds <code>yarn.nodemanager.vmem-pmem-ratio</code> times the container's allocated physical memory, the container is killed:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.vmem-check-enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag 
punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.vmem-pmem-ratio</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">2.1</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Default; container can use 2.1x its physical memory as virtual memory --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>This check is a common source of spurious container kills on modern Linux systems, where the JVM reserves far more virtual address space than it actually uses. If the container diagnostics report a kill for running beyond the virtual memory limit, either increase the ratio or disable the check:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.vmem-check-enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">false</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag 
punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
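<p>To keep the check but give the JVM more headroom, you can raise the ratio instead. A minimal sketch, where the value 4 is an illustrative assumption rather than a recommendation:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;yarn.nodemanager.vmem-pmem-ratio&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;4&lt;/value&gt; &lt;!-- a 2048 MB container may use up to 8192 MB of virtual memory --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>
<p>At the default ratio of 2.1, the same 2048 MB container would be limited to roughly 4300 MB of virtual memory.</p>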
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="physical-memory-check">Physical Memory Check<a href="https://hadoop.so/blog/yarn-containers-deep-dive#physical-memory-check" class="hash-link" aria-label="Direct link to Physical Memory Check" title="Direct link to Physical Memory Check" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.pmem-check-enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag 
punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>The physical memory check, which is enabled by default, kills any container whose process tree uses more resident memory (RAM) than the container was allocated. This prevents one container from starving others on the same node.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cgroups-enforcement-recommended">cgroups Enforcement (recommended)<a href="https://hadoop.so/blog/yarn-containers-deep-dive#cgroups-enforcement-recommended" class="hash-link" aria-label="Direct link to cgroups Enforcement (recommended)" title="Direct link to cgroups Enforcement (recommended)" translate="no">​</a></h3>
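<p>The most reliable enforcement uses Linux cgroups, which let the NodeManager apply limits at the OS level instead of polling process trees. The configuration below switches to the <code>LinuxContainerExecutor</code> and turns on CPU enforcement; memory enforcement through cgroups is a separate switch (<code>yarn.nodemanager.resource.memory.enabled</code>):</p>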
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.container-executor.class</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" 
style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.linux-container-executor.cgroups.hierarchy</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">/hadoop-yarn</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.linux-container-executor.cgroups.mount</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span 
class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.resource.cpu.enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
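<p>The block above turns on CPU enforcement only. As a sketch based on the NodeManager cgroups documentation, memory enforcement is enabled with its own properties; check the defaults for your Hadoop version before relying on them:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;yarn.nodemanager.resource.memory.enabled&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;true&lt;/value&gt; &lt;!-- manage container memory with cgroups --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;yarn.nodemanager.resource.memory.enforced&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;true&lt;/value&gt; &lt;!-- apply the limit to each container cgroup --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>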
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="scheduler-types">Scheduler Types<a href="https://hadoop.so/blog/yarn-containers-deep-dive#scheduler-types" class="hash-link" aria-label="Direct link to Scheduler Types" title="Direct link to Scheduler Types" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="capacity-scheduler-default">Capacity Scheduler (default)<a href="https://hadoop.so/blog/yarn-containers-deep-dive#capacity-scheduler-default" class="hash-link" aria-label="Direct link to Capacity Scheduler (default)" title="Direct link to Capacity Scheduler (default)" translate="no">​</a></h3>
<p>The Capacity Scheduler divides cluster resources into queues, each with a guaranteed minimum capacity and an optional maximum:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- capacity-scheduler.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.capacity.root.queues</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">default,production,development</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.capacity.root.production.capacity</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">60</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- 60% of cluster --&gt;</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.capacity.root.development.capacity</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">30</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span 
class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.capacity.root.default.capacity</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">10</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>A queue can borrow unused capacity from its sibling queues (elastic sharing). The owner queue gets that capacity back as the borrowed containers finish, or sooner if preemption is enabled.</p>
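<p>The optional maximum mentioned above caps how far a queue can grow when it borrows. A minimal sketch, assuming you want the <code>development</code> queue from the example to stop at half the cluster (the value 50 is illustrative):</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&lt;!-- capacity-scheduler.xml --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;property&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;name&gt;yarn.scheduler.capacity.root.development.maximum-capacity&lt;/name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  &lt;value&gt;50&lt;/value&gt; &lt;!-- guaranteed 30%, may borrow up to 50% --&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;/property&gt;</span><br></span></code></pre></div></div>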
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="fair-scheduler">Fair Scheduler<a href="https://hadoop.so/blog/yarn-containers-deep-dive#fair-scheduler" class="hash-link" aria-label="Direct link to Fair Scheduler" title="Direct link to Fair Scheduler" translate="no">​</a></h3>
<p>The Fair Scheduler distributes resources so that, over time, all running applications receive an equal share. It suits multi-tenant environments with a mix of workload sizes. Enable it in <code>yarn-site.xml</code>:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- yarn-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.resourcemanager.scheduler.class</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" 
style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
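<p>As a minimal sketch of how that sharing is usually tuned, the properties below point the scheduler at an allocation file, where per-queue weights and limits are defined, and turn on preemption so that queues below their fair share can reclaim containers. The file path is a placeholder and both values are assumptions to adapt to your cluster:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- yarn-site.xml (path and values are illustrative) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.fair.allocation.file</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">/etc/hadoop/conf/fair-scheduler.xml</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.scheduler.fair.preemption</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">true</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>Preemption is off by default; without it, a queue that is below its fair share has to wait for other applications to release containers.</p>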
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="diagnosing-container-issues">Diagnosing Container Issues<a href="https://hadoop.so/blog/yarn-containers-deep-dive#diagnosing-container-issues" class="hash-link" aria-label="Direct link to Diagnosing Container Issues" title="Direct link to Diagnosing Container Issues" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="application-stuck-waiting-for-resources">Application stuck waiting for resources<a href="https://hadoop.so/blog/yarn-containers-deep-dive#application-stuck-waiting-for-resources" class="hash-link" aria-label="Direct link to Application stuck waiting for resources" title="Direct link to Application stuck waiting for resources" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Check what resources are available</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">yarn node -list -all</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Check scheduler queue utilization</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">yarn queue -status default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Check ResourceManager logs for why containers aren't being assigned</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Look for: "Application ... is waiting for ..."</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="container-killed-oom">Container killed: OOM<a href="https://hadoop.so/blog/yarn-containers-deep-dive#container-killed-oom" class="hash-link" aria-label="Direct link to Container killed: OOM" title="Direct link to Container killed: OOM" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Check NodeManager logs on the host where the container ran</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Look for a message like: "is running beyond physical memory limits" (wording varies by Hadoop version)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Or check YARN application logs:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">yarn logs -applicationId application_XXXX_XXXX</span><br></span></code></pre></div></div>
<p>Increase the container memory allocation in your job configuration:</p>
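<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># MapReduce (the driver must use ToolRunner for -D options to apply)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># my-job.jar, MyDriver, and the paths below are placeholders</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hadoop jar my-job.jar MyDriver -Dmapreduce.map.memory.mb=4096 -Dmapreduce.reduce.memory.mb=8192 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  /input /output</span><br></span></code></pre></div></div>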
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="container-killed-virtual-memory">Container killed: virtual memory<a href="https://hadoop.so/blog/yarn-containers-deep-dive#container-killed-virtual-memory" class="hash-link" aria-label="Direct link to Container killed: virtual memory" title="Direct link to Container killed: virtual memory" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Quick fix: disable vmem check in yarn-site.xml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Better fix: understand JVM memory layout on your OS</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># The JVM reserves large amounts of virtual address space for mapped libraries</span><br></span></code></pre></div></div>
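<p>The quick fix mentioned above can be sketched as the following NodeManager settings. Disabling the check is the blunt option; raising <code>yarn.nodemanager.vmem-pmem-ratio</code> (default 2.1) keeps the check but allows more virtual memory per MB of physical memory. The ratio of 4 is an arbitrary illustration, not a recommendation:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- yarn-site.xml: pick one option (values are illustrative) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Option 1: disable the virtual memory check --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.vmem-check-enabled</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">false</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- Option 2: keep the check but allow a higher ratio --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">yarn.nodemanager.vmem-pmem-ratio</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">4</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>NodeManagers need a restart to pick up either change.</p>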
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="sizing-containers-for-common-workloads">Sizing Containers for Common Workloads<a href="https://hadoop.so/blog/yarn-containers-deep-dive#sizing-containers-for-common-workloads" class="hash-link" aria-label="Direct link to Sizing Containers for Common Workloads" title="Direct link to Sizing Containers for Common Workloads" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="mapreduce">MapReduce<a href="https://hadoop.so/blog/yarn-containers-deep-dive#mapreduce" class="hash-link" aria-label="Direct link to MapReduce" title="Direct link to MapReduce" translate="no">​</a></h3>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- mapred-site.xml --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">mapreduce.map.memory.mb</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">2048</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">mapreduce.map.java.opts</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">-Xmx1638m</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- 80% of container memory for JVM heap --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">mapreduce.reduce.memory.mb</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">4096</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">mapreduce.reduce.java.opts</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">-Xmx3276m</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
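<p>If you would rather not maintain <code>-Xmx</code> values by hand, recent Hadoop releases can derive the heap from the container size. As a sketch, assuming your version ships <code>mapreduce.job.heap.memory-mb.ratio</code> (check <code>mapred-default.xml</code> for your release) and that no <code>-Xmx</code> appears in the <code>java.opts</code> values, the 80% rule can be expressed once:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- mapred-site.xml (assumes no -Xmx in the java.opts properties) --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">mapreduce.job.heap.memory-mb.ratio</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">name</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">0.8</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">value</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">property</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>With a 2048 MB map container this yields a heap of about 1638 MB, in line with the 80% guideline.</p>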
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="spark-on-yarn">Spark on YARN<a href="https://hadoop.so/blog/yarn-containers-deep-dive#spark-on-yarn" class="hash-link" aria-label="Direct link to Spark on YARN" title="Direct link to Spark on YARN" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">spark-submit \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --master yarn \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --deploy-mode cluster \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --executor-memory 8g \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --executor-cores 4 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --num-executors 20 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --conf spark.executor.memoryOverhead=1024 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  my_spark_app.jar</span><br></span></code></pre></div></div>
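<p>The overhead setting (<code>spark.executor.memoryOverhead</code> since Spark 2.3, formerly <code>spark.yarn.executor.memoryOverhead</code>) covers memory the executor uses outside the JVM heap, such as direct buffers and native libraries. Set it to at least 384 MB or 10% of executor memory, whichever is larger. For the 8 GB executors above, 10% is about 819 MB, so 1024 MB leaves some headroom and each executor container requests roughly 9 GB from YARN.</p>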
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary">Summary<a href="https://hadoop.so/blog/yarn-containers-deep-dive#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<table><thead><tr><th>Concept</th><th>Key Point</th></tr></thead><tbody><tr><td>Container</td><td>CPU + memory reservation on a NodeManager</td></tr><tr><td>Container negotiation</td><td>ApplicationMaster requests containers from ResourceManager</td></tr><tr><td>Locality</td><td>Node-local &gt; rack-local &gt; off-rack</td></tr><tr><td>vmem check</td><td>Common source of spurious kills; consider disabling on modern Linux</td></tr><tr><td>cgroups</td><td>Best enforcement; use LinuxContainerExecutor</td></tr><tr><td>Capacity Scheduler</td><td>Guaranteed queues with elastic borrowing</td></tr><tr><td>Fair Scheduler</td><td>Equal sharing over time for multi-tenant clusters</td></tr><tr><td>JVM heap sizing</td><td>Set JVM -Xmx to ~80% of container memory</td></tr></tbody></table>
<p>Mastering YARN container allocation turns resource utilization from a guessing game into a predictable, measurable engineering discipline. Profile your actual workloads, size containers deliberately, and use queue priorities to give critical jobs the resources they need.</p>]]></content>
        <author>
            <name>Hadoop.so Editorial Team</name>
            <uri>https://hadoop.so</uri>
        </author>
        <category label="Hadoop" term="Hadoop"/>
        <category label="YARN" term="YARN"/>
        <category label="Performance" term="Performance"/>
    </entry>
</feed>