Course description
Join Hadoop expert Kevin McCarty as he takes a high-level look at Hadoop, beginning with its history. Next, McCarty examines a number of key components in the Hadoop ecosystem used for storage, processing, data ingest, and transformation. He shows how Hadoop addresses challenges that large systems face, such as failover and redundant storage, and explores how an organization might incorporate Hadoop into its existing IT framework.
Prerequisites
This course assumes that students have some programming background and some familiarity with a Unix-based operating system. No specific experience with the Java programming language or Hadoop is required. As with any such course, the more experience you bring, the more you’ll get out of it. This course moves quickly through a broad range of topics.
The course does assume that you are familiar with the version of Windows that you are running. For example, the course might say simply “Open PuTTY” without explaining how to do that. You should also be able to navigate the folder hierarchy using Windows Explorer.
Learning Paths
This course is part of the following LearnNowOnline SuccessPaths™:
Hadoop
Meet the expert
Kevin McCarty is a computer professional with over 30 years of experience in the industry as a programmer, project manager, database administrator, architect, and data scientist. He is a Microsoft Certified Trainer with over 25 individual certifications in programming and database technologies and serves as the chapter leader of the Boise SQL Server Users Group. A former Army officer and Eagle Scout, he holds a doctorate in Computer Science and has a lifelong love of learning.
Course outline
Hadoop Architecture
Prerequisites (20:34)
- Introduction (00:26)
- Hadoop Sandbox (01:00)
- PuTTY (00:38)
- Linux (00:26)
- Linux BASH (00:28)
- Linux Commands (00:28)
- WinSCP (01:04)
- Notepad++ (00:31)
- Demo: Open Source Tools (03:50)
- Demo: BASH (05:37)
- Demo: Other Tools (05:42)
- Summary (00:20)
Introduction (19:44)
- Introduction (00:34)
- The Lure of Big Data (01:24)
- What Is Big Data? (03:45)
- Where Do We Get Big Data? (01:52)
- Types of Big Data (00:32)
- Managing Big Data (01:24)
- The Goal of Big Data (01:07)
- Companies Using Big Data Today (01:17)
- The Challenge of Big Data (05:19)
- How Do We Process Big Data? (01:56)
- Summary (00:29)
History (08:30)
- Introduction (00:23)
- The Motivation for Hadoop (00:53)
- Enter Hadoop (00:34)
- History of Hadoop (03:51)
- What Hadoop Provides (01:20)
- Major Users of Hadoop (00:22)
- The Future (00:39)
- Summary (00:24)
Architecture (07:07)
- Introduction (00:26)
- Hadoop's Architecture (01:17)
- HDFS - NameNode (01:09)
- HDFS - DataNode (01:16)
- Hadoop Architecture (00:28)
- Job and Task Tracker (01:01)
- Hadoop Architecture (01:03)
- Summary (00:25)
Ecosystems (16:18)
- Introduction (00:24)
- Hadoop Ecosystem (00:37)
- ZooKeeper - Motivation (00:33)
- What Is ZooKeeper (01:22)
- Data Ingest (01:36)
- Apache Flume (01:07)
- Flume Overview (00:47)
- Apache Sqoop (01:04)
- Sqoop Example (00:12)
- Pig (02:07)
- Example Pig Script (00:24)
- Apache Hive (01:12)
- Example Hive Script (00:37)
- HBase (00:34)
- HBase Is... (00:52)
- Oozie - Job Workflow (00:34)
- Mahout (01:16)
- Mahout Use Cases (00:26)
- Summary (00:25)
Data Delivery in Hadoop (08:10)
- Introduction (00:26)
- Hadoop in the Enterprise (02:14)
- Demo: Hadoop Sandbox (04:53)
- Summary (00:34)
HDFS Basics (21:33)
- Introduction (00:32)
- Motivation for HDFS (00:49)
- Hadoop Distributed Architecture (02:22)
- HDFS (00:55)
- HDFS - Nodes (00:38)
- HDFS NameNode (02:26)
- HDFS - Secondary NameNode (03:00)
- HDFS - Standby NameNode (00:26)
- HDFS - DataNode (03:02)
- Demo: HDFS (04:52)
- Demo: Other Commands (01:56)
- Summary (00:32)