Course Overview:
The Big Data Foundation certification is designed to give candidates a well-rounded understanding of big data. It covers the data sources that can be used to solve real business problems, along with an overview of data mining and its tools.
This is a foundational course with practical, hands-on exercises in two of the most popular big data processing technologies: Hadoop and MongoDB. In the lab exercises, candidates practice installing both technologies. The course also exposes candidates to real-life big data technologies with the aim of obtaining results from real datasets, including data from major social media platforms.
After completing the course, candidates will have a grounding in big data fundamentals and a working development environment, containing Hadoop and MongoDB, that they installed themselves. This practical knowledge can serve as a starting point for further work in big data.
Course Objectives:
- Explain Big Data, its origin, and its characteristics.
- Discuss the tools applicable to Big Data processing.
- Explain data mining.
- Discuss two popular Big Data technologies: Hadoop and MongoDB.
- Discuss Big Data projects and the main players involved.
- Identify and obtain relevant datasets when looking at a business problem.
- Install and manage Big Data processing environments based on Hadoop or MongoDB at a departmental level.
Course Prerequisites:
There are no prerequisites for this course.
Who Should Attend?
This course is best suited to IT professionals who have intermediate to advanced programming, system administration, or relational database skills and are looking to move into big data. Typical roles include:
- Software Engineers
- Application Developers
- IT Architects
- System Administrators
- Data Analysts and Scientists
Course Content:
Module 1. Big Data Fundamentals
1.1 Big Data – History, Overview, and Characteristics
● History
● Big Data Definition
● Big Data Benefits
● Big Data Characteristics – Volume, Velocity & Variety
1.2 Big Data Technologies – Overview
1.3 Big Data Success Stories
1.4 Big Data – Privacy and Ethics
● Privacy – Compliance
● Privacy – Challenges
● Privacy – Approach
● Ethics
1.5 Big Data Projects
● Who Should Be Involved?
● What Is Involved?
Module 2. Big Data Sources
2.1 Enterprise Data Sources
● Enterprise Systems
● Oracle
● SAP
● Microsoft
● Data Warehouses
● Unstructured Data – Introduction
● Unstructured Data – Metadata
2.2 Social Media Data Sources
● Introduction
● Facebook – Introduction
● Facebook – Public Feed API
● Facebook – Keyword Insights API
● Facebook – Graph API
● Twitter – Introduction
● Twitter – Streaming APIs
● Twitter – REST APIs
● Other Social Media
2.3 Public Data Sources
● Introduction
● Weather
● Economics
● Finance
● Regulatory Bodies
Module 3. Data Mining – Concepts and Tools
3.1 Data Mining – Introduction
● Introduction
● Types of Data Mining – Overview
● Types of Data Mining – Classification
● Types of Data Mining – Association
● Types of Data Mining – Clustering
3.2 Data Mining – Tools
● Introduction
● Weka
● Modules of Weka Applications
● KNIME
● KNIME – Example
● R Language
Module 4. Big Data Technologies – Hadoop
4.1 Hadoop Fundamentals
● Introduction
● Main Components of Hadoop
● Additional Components of Hadoop
4.2 Install and Configure
● Download
● How to Install and Configure
4.3 MapReduce
● Introduction
● How Does It Work?
4.4 Data Processing with Hadoop
● Introduction
● Twitter Sentiment Analysis – Overview
● Twitter Sentiment Analysis – Algorithm
● Network Log Analysis – Overview
● Network Log Analysis – Algorithm
Module 5. Big Data Technologies – MongoDB
5.1 MongoDB Fundamentals
● Introduction
● Replication
● Sharding
● Sharding and Replication
● MongoDB Ecosystem – Languages and Drivers
● MongoDB Ecosystem – Hadoop Integration
● MongoDB Ecosystem – Tools
5.2 Install and Configure
● Download
● How to Install and Configure
5.3 Document Databases
● Introduction
● Documents
● Document Design Considerations
● Fields
5.4 Data Modelling with Document Databases
● Introduction
● Twitter Sentiment Analysis
● Twitter Sentiment Analysis – Algorithm
● Network Log Analysis
● Network Log Analysis – Algorithm