Course description
Tackle Hadoop tools and services such as NiFi, YARN, and Flume, as well as the Spark shell, the interactive front end to Spark, an alternative to MapReduce. Discover why Hadoop has such a large and growing following among system administrators and data scientists. Learn how Hadoop offers something for just about everybody looking to gain and maintain a competitive advantage.
Prerequisites
You should have some programming background and some familiarity with a Unix-based operating system. No specific experience with the Java programming language or Hadoop is required. As with any such course, the more experience you bring, the more you’ll get out of it. This course moves quickly through a broad range of topics. It does assume that you are familiar with the version of Windows that you are running. For example, the course might say simply “Open PuTTY” without explaining how to do that. You should also be able to navigate the folder hierarchy using Windows Explorer.
Learning Paths
This course is part of the following LearnNowOnline SuccessPaths™:
Hadoop
Meet the expert
Kevin McCarty is a computer professional with over 30 years of experience in the industry as a programmer, project manager, database administrator, architect, and data scientist. He is a Microsoft Certified Trainer with over 25 individual certifications in programming and database technologies and serves as the chapter leader of the Boise SQL Server Users Group. A former Army officer and Eagle Scout, he holds a doctorate in Computer Science and has a lifelong love of learning.
Course outline
YARN
YARN Basics (09:27)
- Introduction (00:40)
- Scalable Computing (02:04)
- Scalable Computing - YARN (00:37)
- YARN (00:42)
- Hadoop = YARN + HDFS (00:36)
- Managing Data and Processes (01:43)
- Limitations of MapReduce v1 (01:42)
- YARN Processing (00:50)
- Summary (00:29)
YARN Services (15:17)
- Introduction (00:19)
- YARN Daemons (00:31)
- Resource Manager (01:18)
- Node Manager (01:13)
- Job History Server (00:17)
- Application Master (00:34)
- Tasks in YARN (00:23)
- YARN Architecture (01:07)
- Demo: YARN Tools (03:36)
- Demo: YARN Parameters (05:30)
- Summary (00:25)
Tez and Spark (10:48)
- Introduction (00:39)
- Hadoop Structure (01:11)
- Tez (01:41)
- What is Apache Spark? (02:41)
- Benefits of Spark vs. MapReduce (00:59)
- Spark Framework (00:46)
- Spark Languages (02:09)
- Summary (00:38)
The Spark Shell (08:05)
- Introduction (00:38)
- Demo: The Spark Shell (03:11)
- Demo: Using Scala (03:18)
- Summary (00:56)
Flume, Linux, and NiFi (19:15)
- Introduction (01:16)
- What Is Flume? (01:14)
- What Does Flume Do? (01:02)
- Flume Overview (01:08)
- Linux ETL (00:44)
- What is NiFi? (02:03)
- Working with NiFi (00:56)
- NiFi Capabilities (00:52)
- Demo: Linux ETL (04:43)
- Demo: More Linux Commands (04:27)
- Summary (00:47)
NiFi
Installing NiFi (13:53)
- Introduction (00:29)
- Demo: Install NiFi (03:50)
- Demo: install-nifi.sh (03:30)
- Demo: Run NiFi (05:31)
- Summary (00:31)
NiFi Components (10:21)
- Introduction (00:37)
- NiFi Components (01:12)
- The FlowFile (00:40)
- The GetFile Component (00:37)
- The UnpackContent Component (00:43)
- The ControlRate Component (00:37)
- The EvaluateXPath Component (00:47)
- The SplitXML Component (00:28)
- The UpdateAttribute Component (01:39)
- The AttributesToJSON Component (01:23)
- The MergeContent Component (00:22)
- The PutFile Component (00:12)
- The RouteOnAttribute Component (00:25)
- Summary (00:30)
NiFi Workflow (12:23)
- Introduction (00:33)
- Demo: NiFi Workflow (03:16)
- Demo: Processors (05:30)
- Demo: Source File (02:23)
- Summary (00:38)
Configuring NiFi Workflow (32:56)
- Introduction (00:46)
- Demo: GetFile, UnpackContent, and ControlRate (06:14)
- Demo: Evaluate XPath and Split XML (04:07)
- Demo: Update Attribute and Parse Records (04:54)
- Demo: Route on Attribute (04:30)
- Demo: MergeContent and PutFile (05:26)
- Demo: Debugging (06:04)
- Summary (00:52)