Big Data Masters

Data is an essential part of any organization. Every organization generates massive amounts of data, whether in real time or in batches. This is where big data plays a vital role, irrespective of domain or industry. This comprehensive course is designed to meet these needs so that you can work with very large volumes of data. You will be able to build a Big Data Engine for your organization by implementing the various big data stacks used across the industry.

Start Date: 26th June 2021
Class Timings: 10:30 AM - 12:30 PM IST Saturday and Sunday
Doubt Clearing Session: 8 PM IST Every Tuesday
4.45 (31 Reviews)
Language: English

Course Overview

What you'll learn
  • 30+ Big Data Technologies
  • Big Data Engine Creation
  • Streaming and Batch Processing of Data
  • Various SQL Databases
  • Various NoSQL Databases
  • Real-Time Implementation
  • Spark
  • Hive
  • Talend
  • Informatica
  • Hadoop Distributions
  • Deployment
  • Databricks Implementation

Requirements
  • A system with an Intel Core i3 processor or better
  • Dedication

Course Curriculum

  • Why Is Data So Important?
  • Pre-Requisite – Data Scale
  • What Is Big Data?
  • Big Bank: Big Challenge
  • Common Problems
  • 3 Vs Of Big Data
  • Defining Big Data
  • Sources Of Data Flood
  • Exploding Data Problem
  • Redefining The Challenges Of Big Data
  • Possible Solutions: Scaling Up Vs. Scaling Out
  • Challenges Of Scaling Out
  • Solution For Data Explosion-Hadoop
  • Hadoop: Introduction
  • Hadoop In Layman's Terms
  • Hadoop Ecosystem
  • Evolutionary Features Of Hadoop
  • Hadoop Timeline
  • Why Learn Big Data Technologies?
  • Who Is Using Big Data?
  • HDFS: Introduction
  • Design Of HDFS
  • Why Hadoop Cluster?
  • HDFS Blocks
  • Components Of Hadoop 3
  • NameNode And Hadoop Cluster
  • Arrangement Of Racks
  • Arrangement Of Machines And Racks
  • Local FS And HDFS
  • NameNode
  • Checkpointing
  • Replica Placement
  • Benefits Of Replica Placement And Rack Awareness
  • URI
  • URL And URN
  • HDFS Commands
  • Problems With HDFS In Hadoop 1.X
  • HDFS Federation
  • High Availability
  • Anatomy Of File Read From HDFS
  • Data Read Steps
  • Important Java Classes To Write Data To HDFS
  • Anatomy Of File Write To HDFS
  • Writing File To HDFS: Steps
  • Building Principles
  • InputSplit
  • InputSplit And Data Blocks – Difference
  • Why Is The Block Size 128 MB?
  • RecordReader
  • InputFormat
  • Default InputFormat: TextInputFormat
  • OutputFormat
  • Using A Different OutputFormat
  • Important Points
  • Partitioner
  • Using Partitioner
  • Map Only Job
  • Flow Of Operations In MapReduce
  • Serialization In MapReduce
  • Schedulers In YARN
  • FIFO Scheduler
  • Capacity Scheduler
  • Fair Scheduler
  • Differences Between Hadoop 1.X, Hadoop 2.X And Hadoop 3.X
  • Hive: Introduction
  • Hive DDL
  • Demo: Databases.Ddl
  • Demo: Tables.Ddl
  • Hive Views
  • Demo: Views.Ddl
  • Architecture
  • Primary Data Types
  • Data Load
  • Demo: ImportExport.Dml
  • Demo: HiveQueries.Dml
  • Demo: Explain.Hql
  • Table Types
  • Demo: ExternalTable.Ddl
  • Complex Data Types
  • Demo: Working With Complex Datatypes
  • Hive Variables
  • Demo: Working With Hive Variables
  • Hive Variables And Execution Customisation
  • Working With Arrays
  • Sort By And Order By
  • Distribute By And Cluster By
  • Partitioning
  • Static And Dynamic Partitioning
  • Bucketing Vs Partitioning
  • Joins And Types
  • Bucket-Map Join
  • Sort-Merge-Bucket-Map Join
  • Left Semi Join
  • Demo: Join Optimisations
  • Input Formats In Hive
  • Sequence Files In Hive
  • RC File In Hive
  • File Formats In Hive
  • ORC Files In Hive
  • Inline Index In ORC Files
  • ORC File Configurations In Hive
  • SerDe In Hive
  • Demo: CSVSerDe
  • JSONSerDe
  • RegexSerDe
  • Analytic And Windowing In Hive
  • Demo: Analytics.Hql
  • HCatalog In Hive
  • Demo: Using_HCatalog
  • Accessing Hive With JDBC
  • Demo: HiveQueries.Java
  • HiveServer2 And Beeline
  • Demo: Beeline
  • UDF In Hive
  • Demo: ToUpper.Java And Working_with_UDF
  • Optimizations In Hive
  • Demo: Optimizations
  • Challenges With Traditional RDBMS
  • Features Of NoSQL Databases
  • NoSQL Database Types
  • CAP Theorem
  • What Is HBase?
  • HBase Regions
  • HBase HMaster And ZooKeeper
  • HBase First Read
  • HBase Meta Table
  • Region Split
  • Apache HBase Architecture Benefits
  • HBase Vs. RDBMS
  • Shell Commands
  • Sqoop Architecture
  • Sqoop Features
  • Sqoop Hands On
  • Python Core
  • Introduction to Python and comparison with other programming languages
  • Installation of the Anaconda Distribution and other Python IDEs
  • Python objects, numbers & Booleans, strings
  • Container objects, mutability of objects
  • Operators: arithmetic, bitwise, comparison and assignment operators; operator precedence and associativity
  • Conditions (if-else, if-elif-else), loops (while, for)
  • break and continue statements and the range function
  • String Objects And Collections
  • String object basics
  • String methods
  • Splitting and Joining Strings
  • String format functions
  • List object basics
  • List as stack and Queues
  • List comprehensions
  • Tuples, Sets, Dictionaries And Functions
  • Tuple, set and dictionary object basics, dictionary object methods, dictionary view objects
  • Function basics, parameter passing, iterators, generator functions
  • Lambda functions
  • map, reduce and filter functions
  • OOPS Concepts And Working With Files
  • OOPS basic concepts
  • Creating classes and objects, inheritance
  • Multiple Inheritance
  • Working with files
  • Reading and writing files
  • Buffered read and write
  • Other File methods
  • Exception Handling And Database Programming
  • Using standard modules
  • Creating new modules
  • Exception handling with try/except
  • Creating tables, inserting and retrieving data
  • Updating and deleting the data
  • Installing and configuring MySQL
  • Install and Configure MySQL Client
  • DDL - CREATE DATABASE/TABLE, DROP, ALTER, etc.
  • DML - INSERT, DELETE, UPDATE, MERGE etc
  • DQL - SELECT, etc.
  • JOINS - One-To-Many, Many-To-Many
  • DISTINCT
  • ORDER BY
  • LIMIT
  • WILDCARDS
  • LOGICAL OPERATORS - LIKE, EQUAL, AND, OR etc
  • STRING Functions
  • DATE Functions
  • MATH Functions
  • COUNT, MIN and MAX
  • SUM
  • AVG
  • LAG and LEAD function Examples
  • Top N Analysis
  • ROW_NUMBER
  • RANK AND DENSE_RANK
  • CASE WHEN
  • PIVOT
  • LISTAGG
  • UNION
  • Sub-Queries
  • EXISTS
  • NOT EXISTS
  • WITH CLAUSE
  • Recursive WITH & CTE
  • Regular Expressions in SQL
  • Cassandra Introduction
  • Cassandra Installation in local system
  • DATASTAX Cassandra setup
  • Cassandra Architecture
  • Cassandra Queries
  • MongoDB Introduction
  • MongoDB Compass Setup
  • MongoDB Atlas Setup
  • MongoDB Architecture
  • MongoDB Queries
  • Introduction To Apache Spark
  • Map Reduce Limitations
  • RDDs
  • SparkContext, SQLContext And HiveContext
  • Programming With RDDs
  • Creating RDDs From Text Files
  • Transformations And Actions
  • How Does Spark Execution Work?
  • RDD APIs - Filter
  • FlatMap
  • Fold
  • Foreach
  • Glom
  • GroupBy
  • Map
  • ReduceByKey
  • Zip
  • Persist
  • Unpersist
  • Read/Write From Storage
  • RDD Examples
  • RDD APIs - Aggregate
  • Cartesian
  • Checkpoint
  • Coalesce
  • Repartition
  • Cogroup
  • CollectAsMap
  • CombineByKey
  • Count And CountApprox Functions
  • More RDD Examples
  • Schema - StructType
  • StructFields
  • DataType
  • DataFrame APIs And Examples
  • Create Temporary Tables
  • SparkSQL
  • Spark Dataset
  • Parquet Vs Avro
  • Examples And Problem Solving On Real Data Using RDD And Converting The Same To Dataframe
  • Create A Spark Project
  • SBT / Maven
  • How Do Maven Repositories Work?
  • Accumulators
  • Broadcast Variables
  • Query Execution Plan
  • Internals Of Spark
  • Databricks Introduction
  • Databricks Setup
  • Databricks Integration with cloud
  • Databricks OPS Pipeline
  • Databricks in Production
  • Introduction To Kafka
  • Kafka Architecture
  • Kafka Key Concepts/Fundamentals
  • Overview Of ZooKeeper And Its Role In A Kafka Cluster
  • Cluster, Nodes, Brokers, Topics, Consumers, Producers, Logs, Partitions
  • Concept Of Consumer Groups
  • Leader & Follower Partition
  • Installing A Single-Node Kafka Cluster Locally
  • Installing A Multi-Node Kafka Cluster Locally
  • Command-Line Producer And Consumer
  • Replication Concept For Fault Tolerance
  • How Data Is Stored In Brokers
  • Log Segments, Message Offsets, Message Index
  • ISR List / Minimum ISR
  • Committed Vs Uncommitted Messages
  • Writing A Kafka Producer In Java
  • Writing A Kafka Consumer In Java
  • Scaling Up The Kafka Cluster
  • Achieving Exactly-Once Semantics
  • Integrating Kafka With Spark Structured Streaming
  • Introduction To Airflow And Its Usage
  • What Is A Workflow?
  • Cron Job Creation Example
  • Airflow Additional Features
  • Airflow Architecture And Components
  • Airflow Installation Demo
  • DAGs - Creating A Simple HelloWorld DAG
  • Introduction To Tasks And Operators
  • Viewing The DAG In The UI - Graph View, Tree View, Logs
  • Example Showcasing BashOperator Usage
  • Setting Precedence Among Various Tasks
  • Lifecycle Of A Task - Understanding Various Stages
  • Understanding trigger_rule With An Example
  • Airflow Artifacts - More On Operators
  • Writing Our Own Custom Operators
  • Walkthrough Of The Airflow UI
  • Connections To Various Datastores & Variables
  • Working With Connections, Understanding Sensors - Demo
  • Building an end-to-end Customer 360 pipeline with Airflow: collecting data from various sources, processing it in Spark, loading the processed data into Hive, uploading it to HBase, and notifying downstream applications when the pipeline succeeds.
  • Kinds Of Processing
  • What is Real-time Processing
  • The Importance of Real-time Processing
  • Batch Processing Vs Real-Time Stream Processing
  • Spark Streaming Data
  • Spark Discretized Stream (DStream)
  • Batch & Batch Interval
  • Is Spark A Real-Time Streaming Engine?
  • Stream Processing In Spark
  • Transformed DStream
  • Understanding Producer & Consumer
  • Practical On Real-Time Processing
  • Stream Transformations
  • Stateless Transformations
  • Stateful Transformations
  • Window Operations
  • Batch Interval, Window Size And Sliding Interval
  • Practical On Stateless Transformation
  • Practical On Stateful Transformation
  • reduceByKey Vs updateStateByKey
  • Working With Sliding Windows
  • reduceByKeyAndWindow Transformation
  • reduceByWindow Transformation
  • countByWindow Transformation
  • What Is Structured Streaming?
  • Requirement Of Structured Streaming
  • Limitations Of Spark Streaming
  • Benefits Of Spark Structured Streaming
  • Practical: Word Count Example On Structured Streaming
  • Dynamically Setting The Shuffle Partitions
  • Data Stream Writer
  • Data Stream Output Modes - append, update & complete
  • Spark Streaming Graceful Shutdown
  • How Spark Streaming Code Executes Internally
  • How A Job Is Converted To Micro-Batches
  • Trigger Point For Micro-Batches
  • Types Of Triggers - Unspecified, Time Interval, One-Time, Continuous
  • Types Of Data Sources - Socket Source, Rate Source, File Source, Kafka Source
  • Limitations Of The Socket Source
  • Practical On File Data Source
  • Types Of Spark Streaming Output Data Options
  • Fault Tolerance And Exactly-Once Guarantee
  • Understanding The Checkpoint Location
  • Stateful vs Stateless Transformations
  • Managed Stateful Operations vs UnManaged Stateful Operations
  • Types of Aggregations - Continuous Aggregations vs Time Bound Aggregations
  • Window Transformations
  • updateStateByKey, reduceByKeyAndWindow, reduceByWindow, countByWindow
  • Types of windows - Tumbling Time Window, Sliding Time Window
  • Dealing With Late Coming Records Using Watermark
  • State Store Cleanup
  • Calculating The Watermark Boundary
  • Streaming Joins
  • Streaming DataFrame With Static DataFrame
  • Streaming DataFrame With Another Streaming DataFrame
  • AWS EMR (Elastic MapReduce)
  • What Is A VM (Virtual Machine)?
  • On-Premises Vs Cloud Setup
  • Major Vendors Of Hadoop Distributions
  • Why Cloud & Big Data On Cloud
  • Major Cloud Providers Of Big Data
  • What Is EMR?
  • HDFS Vs S3
  • What Is S3?
  • Important Instances In AWS
  • Kinds Of Nodes In A Cluster
  • Transient Vs Long-Running Cluster
  • Running Spark Code On EMR
  • How To Track Your Job
  • Copy Files From S3 To Local
  • Zeppelin Notebook
  • Types Of EC2 Instances
  • How To Create A VM
  • What Is A Key Pair?
  • Elastic IP
  • AWS Storage, Networking & CLI
  • Instance Store
  • S3 & EBS
  • Public IP Vs Private IP
  • Network Switches
  • Security Group
  • AWS Command Line Interface
  • Launch An EMR Cluster Using Advanced Options
  • AWS Athena
  • What is Athena?
  • When Do We Require Athena?
  • What Problem Does Athena Solve?
  • How Athena Works
  • Athena Pricing
  • Athena Practical Demonstration
  • How To Create A Normal Table Manually On CSV Data Residing In S3
  • How To Minimize Data Scanning In Athena
  • How To Create A Partitioned Table On Parquet Files
  • Inferring Schema Automatically Using AWS Glue
  • AWS Glue
  • What Is AWS Glue?
  • Introduction To Glue
  • Features Of Glue
  • AWS Glue Benefits
  • AWS Glue Terminology
  • Pointing To Specific Data Stores And Endpoints
  • Glue Data Catalog
  • Crawlers
  • Connecting To Your Data Store
  • Using Crawlers For Catalog Tables
  • Overview And Working Of Glue Jobs
  • Adding New Jobs In Glue
  • Triggering Jobs And Their Scheduling
  • AWS Redshift
  • Database Vs Data Warehouse Vs Data Lake
  • Introduction To Amazon Redshift
  • Benefits Of Amazon Redshift
  • Use Cases Of Amazon Redshift
  • Redshift Master-Slave Architecture
  • Types Of Nodes
  • Redshift Spectrum
  • Redshift Fault Tolerance
  • Redshift Sort Keys
  • Redshift Distribution Styles
  • Practical Demonstration
  • Basic statistics
  • Data sources
  • Pipelines
  • Extracting, transforming and selecting features
  • Classification and Regression
  • Clustering
  • Collaborative filtering
  • Frequent Pattern Mining
  • Model selection and tuning
  • Advanced topics
  • Introduction to ETL from Talend Studio- Integration with HDFS, Hive, Sqoop, Spark etc
  • Introduction to ETL from Informatica BDM- Integration with HDFS, Hive, Sqoop, Spark etc
  • End-to-End Big Data Pipeline Engine Project involving all major components, such as Sqoop, HDFS, Hive, HBase and Spark
  • Interview Preparation Tips
  • Sample Resume
  • 300+ Mock Interview Recordings
  • Mock Interview QA
  • Interview Questions
  • How To Handle Questions In Various Interview Rounds
  • Career Guidance
  • One to One Resume Discussion
  • Certification
4.45 out of 5.0
1 Star 9.7%
2 Star 3.2%
3 Star 3.2%
4 Star 0.0%
5 Star 83.9%
Sudhanshu Kumar

Over 7 years of experience in Big Data, Data Science and Analytics, including product architecture design and delivery. Has worked at various product- and service-based companies, and has 5+ years of experience educating people and helping them make career transitions.

Reviews

Sudeepth Pokkuluri
September 14, 2021
5.00

" The course was going really well, with good explanations. At first I thought it was boring and not useful, but I realized big data is a vast syllabus and takes some time to understand, class by class. Now I'm very comfortable, and Sourav is really good at explaining. I watched the recorded classes whenever I didn't understand something properly. Finally, patience and dedication matter. "

Sahaj Tomar
September 13, 2021
5.00

" The course is designed specifically from an industry viewpoint. Saurav covers theory in a lot of detail, followed by hands-on work. Highly recommended for anyone who wants to dive into big data. "

chidi Henry
September 12, 2021
5.00

" Firstly, I need to thank the course instructor for explaining concepts well; his approach to teaching is excellent. He also lets students contribute when needed and knows when to hold on. Such an excellent instructor and a well-organized scheme. Thanks to the iNeuron team. "
