11 posts tagged with "Hadoop"

Apache Hadoop news and guides

What's New in Apache Hadoop 3

April 28, 2026 · 2 min read

Big Data Engineers

Apache Hadoop 3.x was a landmark release that brought significant improvements to performance, reliability, and scalability. Here's a quick tour of the most important changes.

Welcome to hadoop.so

April 25, 2026 · 1 min read

Hadoop.so Editorial Team

Big Data Engineers

Welcome to hadoop.so — your comprehensive resource for learning and mastering Apache Hadoop and the broader big data ecosystem.

Upgrading from Hadoop 2 to Hadoop 3: A Complete How-To

April 24, 2026 · 5 min read

Hadoop.so Editorial Team

Big Data Engineers

Hadoop 3.x introduced erasure coding, YARN Timeline Service v2, support for more than two NameNodes, and significant performance improvements. If you're still running Hadoop 2.x, this guide walks through a safe, rolling upgrade path that avoids data loss and extended downtime.

Using Hadoop with Amazon S3: The S3A Connector Explained

April 23, 2026 · 5 min read

Hadoop.so Editorial Team

Big Data Engineers

The s3a:// filesystem connector in Hadoop lets you use Amazon S3 in place of HDFS for much of your storage, although S3 is an object store and does not behave exactly like a filesystem. It's the foundation for cost-effective data lake architectures where compute and storage are decoupled. This guide covers configuration, performance tuning, and production best practices.

Securing Your Hadoop DataNode: Kerberos, Wire Encryption, and Best Practices

April 22, 2026 · 5 min read

Hadoop.so Editorial Team

Big Data Engineers

An unsecured Hadoop cluster is a ticking time bomb. Without authentication, any user on the network can read, write, or delete HDFS data. This guide covers the essential security layers for HDFS DataNodes: Kerberos authentication, data transfer encryption, block access tokens, and OS-level hardening.

Hadoop and Java: A Version Compatibility Guide

April 21, 2026 · 5 min read

Hadoop.so Editorial Team

Big Data Engineers

Picking the wrong Java version for your Hadoop cluster is one of the most common causes of cryptic build failures, runtime exceptions, and upgrade blockers. This guide maps Hadoop releases to their supported Java versions, explains what changed between Java versions, and offers practical recommendations for 2025.

YARN Containers Deep Dive: How Resource Allocation Really Works

April 20, 2026 · 6 min read

Hadoop.so Editorial Team

Big Data Engineers

YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management layer. Understanding how YARN allocates containers — the fundamental unit of computation — is essential for getting good utilization and avoiding the frustrating "application is waiting for resources" message that plagues many clusters.

YARN vs Kubernetes: Which Should Orchestrate Your Big Data Workloads?

April 19, 2026 · 6 min read

Hadoop.so Editorial Team

Big Data Engineers

Kubernetes has become the default orchestration platform for containerized applications. But should you migrate your Hadoop workloads off YARN onto Kubernetes? The answer depends heavily on your workload patterns, team expertise, and existing infrastructure. This post compares both platforms head-to-head.

Hive vs Presto vs Trino: Choosing a SQL Engine for Your Data Lake

April 18, 2026 · 6 min read

Hadoop.so Editorial Team

Big Data Engineers

Three SQL engines dominate the Hadoop data lake landscape: Apache Hive, Presto, and Trino (the fork of Presto formerly known as PrestoSQL). Each evolved to solve different problems. Picking the wrong one leads to either unbearably slow interactive queries or over-engineered infrastructure for simple batch ETL. Here's how they compare.

HBase vs Cassandra: Choosing a NoSQL Database for Big Data

April 17, 2026 · 7 min read

Hadoop.so Editorial Team

Big Data Engineers

Apache HBase and Apache Cassandra are the two most widely deployed NoSQL databases in the Hadoop ecosystem. Both handle massive datasets across distributed clusters, but they have fundamentally different architectures that make each excel in different scenarios. This post cuts through the marketing and gives you a practical comparison.