About Me.

I am Adinarayana, working as an Implementation/Applications DBA with advanced technologies such as RAC/PCP, OID/SSO, DMZ, Exadata and Fusion Middleware (Demantra, Application Server, SOA, BPEL and UPK). I created this blog to share useful information related to DBA and Applications DBA work. Your comments and suggestions are most welcome. Disclaimer: please note that all the views and opinions expressed on this site are my own. It is not recommended to apply the fixes/suggestions provided here directly in a production instance; please test them before implementing.

Sunday, December 14, 2014

Big Data Contents

Introduction to Data Storage and Processing

  • Installing the Hadoop Distributed File System (HDFS)
  • Defining key design assumptions and architecture
  • Configuring and setting up the file system
  • Issuing commands from the console
  • Reading and writing files
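
For example, once HDFS is running, files can be written and read back from the console with the hdfs dfs commands (the /user/hadoop paths below are just placeholders):

    hdfs dfs -mkdir -p /user/hadoop/input                      # create a directory in HDFS
    hdfs dfs -put localfile.txt /user/hadoop/input/            # write (upload) a local file
    hdfs dfs -ls /user/hadoop/input                            # list the directory
    hdfs dfs -cat /user/hadoop/input/localfile.txt             # read the file contents
    hdfs dfs -get /user/hadoop/input/localfile.txt ./copy.txt  # copy it back to local disk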


Setting the stage for MapReduce

  • Reviewing the MapReduce approach
  • Introducing the computing daemons
  • Dissecting a MapReduce job
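
As a quick illustration, the word-count job bundled with Hadoop can be submitted and its output inspected from the console (the examples jar path assumes a standard Hadoop 2.x layout and may differ per distribution):

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/hadoop/input /user/hadoop/output
    hdfs dfs -cat /user/hadoop/output/part-r-00000   # output written by the reducer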


Defining Hadoop Cluster Requirements

  • Planning the architecture
  • Selecting appropriate hardware
  • Designing a scalable cluster


Building the cluster

  • Installing Hadoop daemons
  • Optimizing the network architecture
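
A minimal sketch of bringing the daemons up by hand on a Hadoop 2.x tarball install (script locations vary with the distribution):

    hdfs namenode -format                                 # one-time: format the NameNode metadata
    $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode     # on the master node
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode     # on each worker node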


Configuring a Cluster

  • Preparing HDFS
  • Setting basic configuration parameters
  • Configuring block allocation, redundancy and replication
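
A minimal hdfs-site.xml sketch of the block and replication parameters involved (the values are illustrative, not recommendations):

    <property>
      <name>dfs.replication</name>
      <value>3</value>           <!-- copies kept of every block -->
    </property>
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>   <!-- 128 MB block size -->
    </property>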


Deploying MapReduce

  • Installing and setting up the MapReduce environment
  • Delivering redundant load balancing via Rack Awareness
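
Rack awareness is enabled by pointing Hadoop at a topology script that maps each host to a rack id, so block replicas are spread across racks. A sketch, assuming a script at /etc/hadoop/conf/rack-topology.sh and example host/rack names:

    <!-- core-site.xml -->
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/rack-topology.sh</value>
    </property>

    #!/bin/bash
    # rack-topology.sh: print one rack id per host name passed in
    for host in "$@"; do
      case $host in
        node0[1-4]*) echo "/rack1" ;;
        node0[5-8]*) echo "/rack2" ;;
        *)           echo "/default-rack" ;;
      esac
    done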


Maximizing HDFS Robustness

  • Creating a fault-tolerant file system
  • Isolating single points of failure
  • Maintaining High Availability
  • Triggering manual failover
  • Automating failover with Zookeeper
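
With an HA NameNode pair, failover can be triggered by hand with the haadmin tool, or delegated to ZooKeeper for automatic failover (the nn1/nn2 service ids and ZooKeeper hosts are assumptions):

    hdfs haadmin -getServiceState nn1        # which NameNode is active?
    hdfs haadmin -failover nn1 nn2           # manual failover from nn1 to nn2

    <!-- automatic failover via ZooKeeper (hdfs-site.xml / core-site.xml) -->
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>zk1:2181,zk2:2181,zk3:2181</value>
    </property>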


Leveraging NameNode Federation

  • Extending HDFS resources
  • Managing the namespace volumes
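
Federation runs several independent NameNodes, each owning a slice of the namespace, over the same pool of DataNodes. A minimal hdfs-site.xml sketch (the ns1/ns2 names and hosts are assumptions):

    <property>
      <name>dfs.nameservices</name>
      <value>ns1,ns2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns1</name>
      <value>namenode1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns2</name>
      <value>namenode2.example.com:8020</value>
    </property>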


Introducing YARN

  • Critiquing the YARN architecture
  • Identifying the new daemons
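
Under YARN the JobTracker/TaskTracker pair is replaced by a ResourceManager and per-node NodeManagers. A quick sketch of starting and checking them on a Hadoop 2.x install:

    $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager   # on the master
    $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager       # on each worker
    yarn node -list                                          # NodeManagers that have registered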


Managing Resources and Cluster Health

  • Allocating resources
  • Setting quotas to constrain HDFS utilization
  • Prioritizing access to MapReduce using schedulers
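
HDFS quotas, for example, are set per directory with dfsadmin (the directory and limits are illustrative):

    hdfs dfsadmin -setQuota 1000000 /user/projectA      # cap on number of files and directories
    hdfs dfsadmin -setSpaceQuota 1t /user/projectA      # cap on raw space consumed
    hdfs dfs -count -q /user/projectA                   # report usage against both quotas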


Maintaining HDFS

  • Starting and stopping Hadoop daemons
  • Monitoring HDFS status
  • Adding and removing data nodes
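
A sketch of the routine commands involved (the exclude-file path is an assumption and must match the dfs.hosts.exclude setting):

    $HADOOP_HOME/sbin/start-dfs.sh                       # start NameNode, DataNodes, etc.
    $HADOOP_HOME/sbin/stop-dfs.sh                        # stop them again
    hdfs dfsadmin -report                                # HDFS status and per-DataNode usage

    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes                          # begin decommissioning the listed node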


Administering MapReduce

  • Managing MapReduce jobs
  • Tracking progress with monitoring tools
  • Commissioning and decommissioning compute nodes
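
Running jobs can be listed, checked and, when necessary, killed from the command line (the job and application ids below are examples):

    yarn application -list                                   # applications currently running on YARN
    mapred job -list                                         # MapReduce view of the same jobs
    mapred job -status job_1418500000000_0001                # progress of one job
    yarn application -kill application_1418500000000_0001    # kill a misbehaving job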


Maintaining a Cluster

  • Employing the standard built-in tools
  • Managing and debugging processes using JVM metrics
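
The standard JDK tools are often enough for a first look at daemon health; for example (the NameNode pid is a placeholder taken from the jps output):

    jps -lm                          # list the Hadoop daemon JVMs on this node
    jstat -gcutil <namenode_pid> 5s  # sample heap occupancy and GC activity every 5 seconds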


Performing Hadoop status checks

  • Tuning with supplementary tools
  • Assessing performance with Ganglia
  • Benchmarking to ensure continued performance
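
The benchmarks shipped with Hadoop, such as TeraGen/TeraSort, give a repeatable baseline to compare against after tuning (the jar path and data volume are illustrative):

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        teragen 100000000 /benchmarks/teragen              # generate ~10 GB of synthetic rows
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        terasort /benchmarks/teragen /benchmarks/terasort  # sort it and time the run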


Extending Hadoop

  • Simplifying information access
  • Enabling SQL-like querying with Hive
  • Installing Pig to create MapReduce jobs
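
For example, a comma-delimited file already sitting in HDFS can be queried with HiveQL, or processed with a few lines of Pig Latin (table, column and path names are assumptions):

    -- Hive: impose a schema on the existing data and query it
    CREATE EXTERNAL TABLE web_logs (ip STRING, url STRING, status INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/user/hadoop/logs';
    SELECT status, COUNT(*) FROM web_logs GROUP BY status;

    -- Pig: the same aggregation as a data-flow script
    logs      = LOAD '/user/hadoop/logs' USING PigStorage(',') AS (ip:chararray, url:chararray, status:int);
    by_status = GROUP logs BY status;
    counts    = FOREACH by_status GENERATE group, COUNT(logs);
    DUMP counts;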


Integrating additional elements of the ecosystem

  • Imposing a tabular view on HDFS with HBase
  • Configuring Oozie to schedule workflows
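
For instance, HBase presents HDFS-backed data as tables of rows and column families, and Oozie runs a workflow defined in XML (the table, column family and job.properties names are examples):

    hbase shell
    > create 'weblog', 'details'                          # table with one column family
    > put 'weblog', 'row1', 'details:url', '/index.html'
    > scan 'weblog'                                       # tabular view over the stored cells

    oozie job -oozie http://oozieserver:11000/oozie -config job.properties -run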


Implementing Data Ingress and Egress

  • Facilitating generic input/output
  • Moving bulk data into and out of Hadoop
  • Transmitting HDFS data over HTTP with WebHDFS
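
WebHDFS exposes the same file operations over plain HTTP, so any REST client can read and write HDFS. A curl sketch (the host name, path and port 50070 are assumptions; newer releases default to 9870):

    curl -i "http://namenode:50070/webhdfs/v1/user/hadoop/input?op=LISTSTATUS"
    curl -i -L "http://namenode:50070/webhdfs/v1/user/hadoop/input/localfile.txt?op=OPEN"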


Acquiring application-specific data

  • Collecting multi-sourced log files with Flume
  • Importing and exporting relational information with Sqoop
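
Sqoop, for example, moves whole tables between a relational database and HDFS with a single command (connection string, credentials and table names are placeholders):

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username scott -P \
      --table orders \
      --target-dir /user/hadoop/orders          # RDBMS table -> delimited files in HDFS

    sqoop export \
      --connect jdbc:mysql://dbhost/sales \
      --username scott -P \
      --table order_summary \
      --export-dir /user/hadoop/order_summary   # processed HDFS results -> back to the RDBMS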


Planning for Backup, Recovery and Security

  • Coping with inevitable hardware failures
  • Securing your Hadoop cluster
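
DistCp is the usual building block for bulk copies between clusters when planning backups, and Kerberos authentication is the first step in securing the cluster (the host names below are assumptions):

    # copy a directory tree in parallel to a second (backup) cluster
    hadoop distcp hdfs://prod-nn:8020/user/hadoop/data hdfs://dr-nn:8020/backups/data

    <!-- core-site.xml: switch authentication from simple to Kerberos -->
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>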
