Module-1: Introduction to Big Data
  • What is Data?

  • Structured vs Semi-Structured vs Unstructured Data

  • What is Big Data?

  • 5 Vs of Big Data

  • Challenges with Traditional Databases

  • Why Big Data Emerged

  • Big Data Use Cases (Banking, E-commerce, Healthcare, etc.)

  • Big Data Architecture

Module-2: Hadoop & Its Introduction

1. Hadoop Introduction

  • What is Hadoop?

  • Why Hadoop Is Needed

  • Problems Solved by Hadoop

  • Evolution of Hadoop

  • Hadoop in the Big Data Ecosystem


2. Hadoop Architecture

  • Distributed Storage Concept

  • HDFS Architecture

    • NameNode

    • DataNode

    • Secondary NameNode

  • Block Storage & Replication Factor

  • Rack Awareness

  • YARN Architecture

    • ResourceManager

    • NodeManager

    • ApplicationMaster
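
For the Block Storage & Replication Factor topic above, a small worked calculation may help. The sketch below assumes the common defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); the 1000 MB file size is a made-up value for illustration, and a real cluster may be configured differently.

```shell
# Illustrative values only; check your cluster's configuration.
FILE_MB=1000        # hypothetical file size in MB
BLOCK_MB=128        # assumed block size (common default for dfs.blocksize)
REPLICATION=3       # assumed replication factor (common default for dfs.replication)

# Number of blocks: ceiling of FILE_MB / BLOCK_MB using integer arithmetic.
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))

# The last block only holds the leftover data, not a full 128 MB.
LAST_BLOCK_MB=$(( FILE_MB - (BLOCKS - 1) * BLOCK_MB ))

# Every block is stored REPLICATION times across DataNodes.
BLOCK_REPLICAS=$(( BLOCKS * REPLICATION ))

# Raw disk space used is the file size times the replication factor.
RAW_MB=$(( FILE_MB * REPLICATION ))

echo "Blocks:            $BLOCKS"
echo "Last block size:   $LAST_BLOCK_MB MB"
echo "Block replicas:    $BLOCK_REPLICAS"
echo "Raw storage used:  $RAW_MB MB"
```

With these assumed numbers the file splits into 8 blocks (seven full blocks plus one 104 MB block), stored as 24 block replicas that take up 3000 MB of raw disk space.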


3. Hadoop Core Components

  • HDFS (Storage Layer)

  • YARN (Resource Management)

  • MapReduce (Processing Layer)

  • Overview of Hadoop Ecosystem Tools

  • Hadoop Daemons


4. Linux Commands for Hadoop

  • Basic Linux File System

  • Navigation Commands (pwd, ls, cd)

  • File Operations (mkdir, rm, cp, mv)

  • Viewing Files (cat, head, tail)

  • vi Editor Basics
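
The navigation, file operation, and file viewing commands listed above can be practised together in one short session. This is a minimal sketch that works inside a throwaway directory; the directory and file names (data/raw, users.csv, and so on) are hypothetical and chosen only for the demo.

```shell
# Practice session in a scratch directory so no real files are touched.
set -e

WORKDIR=$(mktemp -d)          # create a temporary working directory
cd "$WORKDIR"
pwd                           # show the current directory

mkdir -p data/raw             # create nested directories
printf 'id,name\n1,alice\n2,bob\n3,carol\n' > data/raw/users.csv

ls -l data/raw                # list the directory in long format

cp data/raw/users.csv data/users_copy.csv       # copy a file
mv data/users_copy.csv data/users_backup.csv    # rename (move) the copy

cat data/raw/users.csv                          # print the whole file
HEADER=$(head -n 1 data/raw/users.csv)          # first line only
LAST_ROW=$(tail -n 1 data/raw/users.csv)        # last line only
echo "Header:   $HEADER"
echo "Last row: $LAST_ROW"

rm data/users_backup.csv      # delete a single file
cd - > /dev/null              # go back to the previous directory
rm -r "$WORKDIR"              # remove the scratch directory and its contents
```

The vi editor is interactive, so it is left out of the script; it is best tried by hand on a file such as the one created here.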


5. Hadoop Commands (HDFS Shell)

  • hdfs dfs -ls

  • hdfs dfs -mkdir

  • hdfs dfs -put

  • hdfs dfs -get

  • hdfs dfs -rm

  • hdfs dfs -cat

  • hdfs dfs -du

  • hdfs dfs -df

  • Checking Replication Factor
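
The HDFS shell commands above could be combined into a short session like the one below. This is a sketch under assumptions: it expects an hdfs client on the PATH and a running cluster you are allowed to write to, and the HDFS directory and local file names are hypothetical. If no hdfs client is found, the function prints a message and does nothing.

```shell
# Hypothetical paths for the demo; adjust them to your environment.
HDFS_DIR="/user/$(whoami)/demo"
LOCAL_FILE="sample.txt"

hdfs_demo() {
  # Skip quietly when no Hadoop client is installed.
  if ! command -v hdfs > /dev/null 2>&1; then
    echo "hdfs client not found on PATH; skipping demo"
    return 0
  fi

  printf 'hello hdfs\n' > "$LOCAL_FILE"

  hdfs dfs -mkdir -p "$HDFS_DIR"                    # create a directory in HDFS
  hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR/"       # upload the local file
  hdfs dfs -ls "$HDFS_DIR"                          # list the directory
  hdfs dfs -cat "$HDFS_DIR/$LOCAL_FILE"             # print the file contents
  hdfs dfs -get "$HDFS_DIR/$LOCAL_FILE" "copy_of_$LOCAL_FILE"   # download a copy
  hdfs dfs -du -h "$HDFS_DIR"                       # space used by the directory
  hdfs dfs -df -h                                   # capacity and free space of the file system
  hdfs dfs -stat "%r" "$HDFS_DIR/$LOCAL_FILE"       # print the file's replication factor
  hdfs dfs -rm -r "$HDFS_DIR"                       # remove the demo directory
}

hdfs_demo
DEMO_STATUS=$?
```

Besides `-stat "%r"`, the replication factor also appears in the second column of `hdfs dfs -ls` output for files.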