Introduction to Data Storage and Processing
- Installing the Hadoop Distributed File System (HDFS)
- Defining key design assumptions and architecture
- Configuring and setting up the file system
- Issuing commands from the console
- Reading and writing files
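
A minimal sketch of the read/write topic above, using the Java FileSystem API against a hypothetical path (the console equivalents are hdfs dfs -put and hdfs dfs -cat):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/greeting.txt");   // hypothetical path

        // Write a small file into HDFS
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS");
        }

        // Read it back
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}
```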
Setting the stage for MapReduce
- Reviewing the MapReduce approach
- Introducing the computing daemons
- Dissecting a MapReduce job
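
To make "Dissecting a MapReduce job" concrete, here is a sketch of the classic word-count job; input and output paths are supplied on the command line and all class names are illustrative:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in a line of input
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts collected for each word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar and submitted with hadoop jar, the map tasks emit (word, 1) pairs that are shuffled, sorted and summed by the reduce tasks.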
Defining Hadoop Cluster Requirements
- Planning the architecture
- Selecting appropriate hardware
- Designing a scalable cluster
Building the cluster
- Installing Hadoop daemons
- Optimizing the network architecture
Configuring a Cluster
- Preparing HDFS
- Setting basic configuration parameters
- Configuring block allocation, redundancy and replication
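
Cluster-wide defaults for these settings live in hdfs-site.xml (dfs.replication, dfs.blocksize); the sketch below, with illustrative values and paths, shows how the same settings can also be overridden per client or per file from Java:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side overrides of the hdfs-site.xml defaults (illustrative values)
        conf.set("dfs.replication", "2");
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // 128 MB blocks for files this client creates

        FileSystem fs = FileSystem.get(conf);
        Path hot = new Path("/data/hot/events.log");        // hypothetical existing file

        // Raise the replication factor of an existing file to 3
        fs.setReplication(hot, (short) 3);
        System.out.println("Replication now: " + fs.getFileStatus(hot).getReplication());
    }
}
```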
Deploying MapReduce
- Installing and setting up the MapReduce environment
- Delivering redundancy and load balancing via Rack Awareness
Maximizing HDFS Robustness
- Creating a fault-tolerant file system
- Isolating single points of failure
- Maintaining High Availability
- Triggering manual failover
- Automating failover with Zookeeper
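
Manual failover and ZKFC-based automatic failover are driven from the hdfs haadmin tool and ZooKeeper; from the client side, High Availability mostly means addressing the logical nameservice rather than a single NameNode. A sketch of that client view, with an assumed nameservice name and hosts (these properties normally live in hdfs-site.xml):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Shown inline for clarity; "mycluster", nn1/nn2 and the host names are illustrative
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2:8020");
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // Clients address the logical nameservice, not a specific NameNode,
        // so an active/standby failover is transparent to them
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println("Root exists: " + fs.exists(new Path("/")));
    }
}
```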
Leveraging NameNode Federation
- Extending HDFS resources
- Managing the namespace volumes
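
Under federation, clients usually see the combined namespace volumes through a viewfs mount table in core-site.xml; a small sketch of listing such a federated root from Java, where the cluster name and mount points are assumptions:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FederatedList {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes mount-table entries such as
        //   fs.viewfs.mounttable.cluster1.link./user -> hdfs://nn1:8020/user
        //   fs.viewfs.mounttable.cluster1.link./logs -> hdfs://nn2:8020/logs
        FileSystem fs = FileSystem.get(URI.create("viewfs://cluster1/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}
```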
Introducing YARN
- Critiquing the YARN architecture
- Identifying the new daemons
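
One way to identify the new daemons is to ask the ResourceManager which NodeManagers have registered with it; a sketch using the YarnClient API, assuming yarn-site.xml is already on the classpath:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListNodeManagers {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(conf);
        yarn.start();

        // One report per NodeManager currently registered with the ResourceManager
        List<NodeReport> nodes = yarn.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId() + "  containers=" + node.getNumContainers());
        }
        yarn.stop();
    }
}
```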
Managing Resources and Cluster Health
- Allocating resources
- Setting quotas to constrain HDFS utilization
- Prioritizing access to MapReduce using schedulers
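
Quotas themselves are normally set with hdfs dfsadmin -setQuota and -setSpaceQuota; the sketch below (directory name is illustrative) only reads the current quota and usage back through ContentSummary so utilization can be watched from code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuotaReport {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        ContentSummary summary = fs.getContentSummary(new Path("/user/projectx")); // hypothetical dir

        System.out.println("Files + dirs : " + (summary.getFileCount() + summary.getDirectoryCount()));
        System.out.println("Name quota   : " + summary.getQuota());        // -1 means no quota set
        System.out.println("Space used   : " + summary.getSpaceConsumed());
        System.out.println("Space quota  : " + summary.getSpaceQuota());   // -1 means no quota set
    }
}
```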
Maintaining HDFS
- Starting and stopping Hadoop daemons
- Monitoring HDFS status
- Adding and removing data nodes
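
For the status-monitoring topic, the headline numbers from hdfs dfsadmin -report can also be sampled programmatically; a sketch using FsStatus with an illustrative alert threshold:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class HdfsHealthCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FsStatus status = fs.getStatus();

        double usedPct = 100.0 * status.getUsed() / status.getCapacity();
        System.out.printf("Capacity : %d bytes%n", status.getCapacity());
        System.out.printf("Used     : %d bytes (%.1f%%)%n", status.getUsed(), usedPct);
        System.out.printf("Remaining: %d bytes%n", status.getRemaining());

        // Illustrative threshold; real monitoring would feed a tool such as Ganglia or Nagios
        if (usedPct > 80.0) {
            System.err.println("WARNING: HDFS is more than 80% full");
        }
    }
}
```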
Administering MapReduce
- Managing MapReduce jobs
- Tracking progress with monitoring tools
- Commissioning and decommissioning compute nodes
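
Job management is usually done with mapred job -list and -kill; the same listing can be obtained from the client API, as in this sketch using org.apache.hadoop.mapreduce.Cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.JobStatus;

public class ListJobs {
    public static void main(String[] args) throws Exception {
        Cluster cluster = new Cluster(new Configuration());

        // One status entry per job known to the cluster
        for (JobStatus status : cluster.getAllJobStatuses()) {
            System.out.println(status.getJobID()
                    + "  state=" + status.getState()
                    + "  user=" + status.getUsername());
        }
        cluster.close();
    }
}
```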
Maintaining a Cluster
- Employing the standard built-in tools
- Managing and debugging processes using JVM metrics
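
Hadoop daemons are ordinary JVMs, so heap and garbage-collection metrics can be read over JMX once a daemon exposes a JMX port; the host and port below are assumptions:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DaemonHeapCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint of a DataNode started with
        // -Dcom.sun.management.jmxremote.port=8006 (plus appropriate auth/ssl settings)
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://datanode01:8006/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);

            System.out.println("Heap used     : " + memory.getHeapMemoryUsage().getUsed());
            System.out.println("Heap max      : " + memory.getHeapMemoryUsage().getMax());
            System.out.println("Non-heap used : " + memory.getNonHeapMemoryUsage().getUsed());
        }
    }
}
```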
Performing Hadoop status checks
- Tuning with supplementary tools
- Assessing performance with Ganglia
- Benchmarking to ensure continued performance
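
The bundled benchmarks (TestDFSIO, TeraSort) are the usual yardstick; between full runs, a throwaway sketch like the following can time a single HDFS write as a quick sanity check (size and scratch path are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuickWriteCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path scratch = new Path("/tmp/benchmark-scratch.bin");   // hypothetical scratch file
        byte[] buffer = new byte[1024 * 1024];                   // 1 MB of zeros
        long totalMb = 256;                                      // illustrative size

        long start = System.nanoTime();
        try (FSDataOutputStream out = fs.create(scratch, true)) {
            for (long i = 0; i < totalMb; i++) {
                out.write(buffer);
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Wrote %d MB in %.1f s (%.1f MB/s)%n", totalMb, seconds, totalMb / seconds);

        fs.delete(scratch, false);
    }
}
```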
Extending Hadoop
- Simplifying information access
- Enabling SQL-like querying with Hive
- Installing Pig to create MapReduce jobs
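
Hive's SQL-like access can be exercised from Java through the HiveServer2 JDBC driver; the endpoint, table and query below are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryDemo {
    public static void main(String[] args) throws Exception {
        // Explicit load of the HiveServer2 driver; newer drivers also register themselves
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 JDBC endpoint; host, port and database are illustrative
        String url = "jdbc:hive2://hiveserver:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hdfs", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page")) {
            while (rs.next()) {
                System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```

The same aggregation could equally be written as a Pig Latin script that compiles into MapReduce jobs.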
Integrating additional elements of the ecosystem
- Imposing a tabular view on HDFS with HBase
- Configuring Oozie to schedule workflows
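
HBase's tabular view over HDFS is usually reached through its Java client API; a minimal put/get round trip, assuming a "metrics" table with a "cpu" column family already exists:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("metrics"))) {

            // Write one cell: row "host01", column family "cpu", qualifier "load", value "0.75"
            Put put = new Put(Bytes.toBytes("host01"));
            put.addColumn(Bytes.toBytes("cpu"), Bytes.toBytes("load"), Bytes.toBytes("0.75"));
            table.put(put);

            // Read it back
            Result result = table.get(new Get(Bytes.toBytes("host01")));
            byte[] value = result.getValue(Bytes.toBytes("cpu"), Bytes.toBytes("load"));
            System.out.println("load = " + Bytes.toString(value));
        }
    }
}
```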
Implementing Data Ingress and Egress
- Facilitating generic input/output
- Moving bulk data into and out of Hadoop
- Transmitting HDFS data over HTTP with WebHDFS
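
Because WebHDFS exposes HDFS over plain HTTP, any HTTP client can move data in or out; a sketch that streams a file with op=OPEN, where host, port, path and user are assumptions (the NameNode HTTP port differs between Hadoop 2 and 3):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsRead {
    public static void main(String[] args) throws Exception {
        // op=OPEN streams the file contents; user.name supplies the simple-auth identity.
        // 9870 is the Hadoop 3 NameNode HTTP port (50070 on Hadoop 2); adjust for your cluster.
        URL url = new URL(
                "http://namenode:9870/webhdfs/v1/user/demo/greeting.txt?op=OPEN&user.name=demo");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(true);   // the NameNode redirects OPEN to a DataNode

        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```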
Acquiring application-specific data
- Collecting multi-sourced log files with Flume
- Importing and exporting relational information with Sqoop
Planning for Backup, Recovery and Security
- Coping with inevitable hardware failures
- Securing your Hadoop cluster
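
On a secured cluster, clients authenticate with Kerberos before touching HDFS; a sketch using UserGroupInformation, where the principal and keytab path are assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");

        // Hypothetical service principal and keytab provisioned by the KDC administrator
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("backup/admin@EXAMPLE.COM",
                                                 "/etc/security/keytabs/backup.keytab");

        // Subsequent HDFS access runs as the authenticated user
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/")) ? "Authenticated and connected" : "No root found");
    }
}
```

For the backup side of this topic, hadoop distcp remains the usual tool for copying HDFS data to a second cluster.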