About Me.

I am Adinarayana, working as an Implementation/Applications DBA with advanced technologies such as RAC/PCP, OID/SSO, DMZ, Exadata and Fusion Middleware (Demantra, Application Server, SOA, BPEL and UPK). I created this blog to share useful information related to DBA and Applications DBA work. Your comments and suggestions are most welcome. Disclaimer: please note that all the views and opinions expressed on this site are my own. It is not recommended to apply the fixes/suggestions provided here directly in a production instance; please test them before implementing.

Sunday, December 14, 2014

Big Data Contents

Introduction to Data Storage and Processing

  • Installing the Hadoop Distributed File System (HDFS)
  • Defining key design assumptions and architecture
  • Configuring and setting up the file system
  • Issuing commands from the console
  • Reading and writing files
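
For example, once HDFS is running, files can be written and read back from the console with the hdfs dfs commands (the /user/hadoop paths below are just placeholders):

    hdfs dfs -mkdir -p /user/hadoop/input                      # create a directory in HDFS
    hdfs dfs -put localfile.txt /user/hadoop/input/            # write (upload) a local file
    hdfs dfs -ls /user/hadoop/input                            # list the directory
    hdfs dfs -cat /user/hadoop/input/localfile.txt             # read the file contents
    hdfs dfs -get /user/hadoop/input/localfile.txt ./copy.txt  # copy it back to local disk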


Setting the stage for MapReduce

  • Reviewing the MapReduce approach
  • Introducing the computing daemons
  • Dissecting a MapReduce job
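
As a quick illustration, the word-count job bundled with Hadoop can be submitted and its output inspected from the console (the examples jar path assumes a standard Hadoop 2.x layout and may differ per distribution):

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/hadoop/input /user/hadoop/output
    hdfs dfs -cat /user/hadoop/output/part-r-00000   # output written by the reducer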


Defining Hadoop Cluster Requirements

  • Planning the architecture
  • Selecting appropriate hardware
  • Designing a scalable cluster


Building the cluster

  • Installing Hadoop daemons
  • Optimizing the network architecture
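
A minimal sketch of bringing the daemons up by hand on a Hadoop 2.x tarball install (script locations vary with the distribution):

    hdfs namenode -format                                 # one-time: format the NameNode metadata
    $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode     # on the master node
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode     # on each worker node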


Configuring a Cluster

  • Preparing HDFS
  • Setting basic configuration parameters
  • Configuring block allocation, redundancy and replication
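
A minimal hdfs-site.xml sketch of the block and replication parameters involved (the values are illustrative, not recommendations):

    <property>
      <name>dfs.replication</name>
      <value>3</value>           <!-- copies kept of every block -->
    </property>
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>   <!-- 128 MB block size -->
    </property>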


Deploying MapReduce

  • Installing and setting up the MapReduce environment
  • Delivering redundant load balancing via Rack Awareness
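
Rack awareness is enabled by pointing Hadoop at a topology script that maps each host to a rack id, so block replicas are spread across racks. A sketch, assuming a script at /etc/hadoop/conf/rack-topology.sh and example host/rack names:

    <!-- core-site.xml -->
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/rack-topology.sh</value>
    </property>

    #!/bin/bash
    # rack-topology.sh: print one rack id per host name passed in
    for host in "$@"; do
      case $host in
        node0[1-4]*) echo "/rack1" ;;
        node0[5-8]*) echo "/rack2" ;;
        *)           echo "/default-rack" ;;
      esac
    done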


Maximizing HDFS Robustness

  • Creating a fault-tolerant file system
  • Isolating single points of failure
  • Maintaining High Availability
  • Triggering manual failover
  • Automating failover with Zookeeper
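
With an HA NameNode pair, failover can be triggered by hand with the haadmin tool, or delegated to ZooKeeper for automatic failover (the nn1/nn2 service ids and ZooKeeper hosts are assumptions):

    hdfs haadmin -getServiceState nn1        # which NameNode is active?
    hdfs haadmin -failover nn1 nn2           # manual failover from nn1 to nn2

    <!-- automatic failover via ZooKeeper (hdfs-site.xml / core-site.xml) -->
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>zk1:2181,zk2:2181,zk3:2181</value>
    </property>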


Leveraging NameNode Federation

  • Extending HDFS resources
  • Managing the namespace volumes
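
Federation runs several independent NameNodes, each owning a slice of the namespace, over the same pool of DataNodes. A minimal hdfs-site.xml sketch (the ns1/ns2 names and hosts are assumptions):

    <property>
      <name>dfs.nameservices</name>
      <value>ns1,ns2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns1</name>
      <value>namenode1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns2</name>
      <value>namenode2.example.com:8020</value>
    </property>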


Introducing YARN

  • Critiquing the YARN architecture
  • Identifying the new daemons
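
Under YARN the JobTracker/TaskTracker pair is replaced by a ResourceManager and per-node NodeManagers. A quick sketch of starting and checking them on a Hadoop 2.x install:

    $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager   # on the master
    $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager       # on each worker
    yarn node -list                                          # NodeManagers that have registered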


Managing Resources and Cluster Health

  • Allocating resources
  • Setting quotas to constrain HDFS utilization
  • Prioritizing access to MapReduce using schedulers
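
HDFS quotas, for example, are set per directory with dfsadmin (the directory and limits are illustrative):

    hdfs dfsadmin -setQuota 1000000 /user/projectA      # cap on number of files and directories
    hdfs dfsadmin -setSpaceQuota 1t /user/projectA      # cap on raw space consumed
    hdfs dfs -count -q /user/projectA                   # report usage against both quotas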


Maintaining HDFS

  • Starting and stopping Hadoop daemons
  • Monitoring HDFS status
  • Adding and removing data nodes
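
A sketch of the routine commands involved (the exclude-file path is an assumption and must match the dfs.hosts.exclude setting):

    $HADOOP_HOME/sbin/start-dfs.sh                       # start NameNode, DataNodes, etc.
    $HADOOP_HOME/sbin/stop-dfs.sh                        # stop them again
    hdfs dfsadmin -report                                # HDFS status and per-DataNode usage

    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes                          # begin decommissioning the listed node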


Administering MapReduce

  • Managing MapReduce jobs
  • Tracking progress with monitoring tools
  • Commissioning and decommissioning compute nodes
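
Running jobs can be listed, checked and, when necessary, killed from the command line (the job and application ids below are examples):

    yarn application -list                                   # applications currently running on YARN
    mapred job -list                                         # MapReduce view of the same jobs
    mapred job -status job_1418500000000_0001                # progress of one job
    yarn application -kill application_1418500000000_0001    # kill a misbehaving job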


Maintaining a Cluster

  • Employing the standard built-in tools
  • Managing and debugging processes using JVM metrics
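
The standard JDK tools are often enough for a first look at daemon health; for example (the NameNode pid is a placeholder taken from the jps output):

    jps -lm                          # list the Hadoop daemon JVMs on this node
    jstat -gcutil <namenode_pid> 5s  # sample heap occupancy and GC activity every 5 seconds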


Performing Hadoop status checks

  • Tuning with supplementary tools
  • Assessing performance with Ganglia
  • Benchmarking to ensure continued performance
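
The benchmarks shipped with Hadoop, such as TeraGen/TeraSort, give a repeatable baseline to compare against after tuning (the jar path and data volume are illustrative):

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        teragen 100000000 /benchmarks/teragen              # generate ~10 GB of synthetic rows
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        terasort /benchmarks/teragen /benchmarks/terasort  # sort it and time the run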


Extending Hadoop

  • Simplifying information access
  • Enabling SQL-like querying with Hive
  • Installing Pig to create MapReduce jobs
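
For example, a comma-delimited file already sitting in HDFS can be queried with HiveQL, or processed with a few lines of Pig Latin (table, column and path names are assumptions):

    -- Hive: impose a schema on the existing data and query it
    CREATE EXTERNAL TABLE web_logs (ip STRING, url STRING, status INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/user/hadoop/logs';
    SELECT status, COUNT(*) FROM web_logs GROUP BY status;

    -- Pig: the same aggregation as a data-flow script
    logs      = LOAD '/user/hadoop/logs' USING PigStorage(',') AS (ip:chararray, url:chararray, status:int);
    by_status = GROUP logs BY status;
    counts    = FOREACH by_status GENERATE group, COUNT(logs);
    DUMP counts;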


Integrating additional elements of the ecosystem

  • Imposing a tabular view on HDFS with HBase
  • Configuring Oozie to schedule workflows
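
For instance, HBase presents HDFS-backed data as tables of rows and column families, and Oozie runs a workflow defined in XML (the table, column family and job.properties names are examples):

    hbase shell
    > create 'weblog', 'details'                          # table with one column family
    > put 'weblog', 'row1', 'details:url', '/index.html'
    > scan 'weblog'                                       # tabular view over the stored cells

    oozie job -oozie http://oozieserver:11000/oozie -config job.properties -run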


Implementing Data Ingress and Egress

  • Facilitating generic input/output
  • Moving bulk data into and out of Hadoop
  • Transmitting HDFS data over HTTP with WebHDFS
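
WebHDFS exposes the same file operations over plain HTTP, so any REST client can read and write HDFS. A curl sketch (the host name, path and port 50070 are assumptions; newer releases default to 9870):

    curl -i "http://namenode:50070/webhdfs/v1/user/hadoop/input?op=LISTSTATUS"
    curl -i -L "http://namenode:50070/webhdfs/v1/user/hadoop/input/localfile.txt?op=OPEN"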


Acquiring application-specific data

  • Collecting multi-sourced log files with Flume
  • Importing and exporting relational information with Sqoop
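
Sqoop, for example, moves whole tables between a relational database and HDFS with a single command (connection string, credentials and table names are placeholders):

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username scott -P \
      --table orders \
      --target-dir /user/hadoop/orders          # RDBMS table -> delimited files in HDFS

    sqoop export \
      --connect jdbc:mysql://dbhost/sales \
      --username scott -P \
      --table order_summary \
      --export-dir /user/hadoop/order_summary   # processed HDFS results -> back to the RDBMS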


Planning for Backup, Recovery and Security

  • Coping with inevitable hardware failures
  • Securing your Hadoop cluster
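
DistCp is the usual building block for bulk copies between clusters when planning backups, and Kerberos authentication is the first step in securing the cluster (the host names below are assumptions):

    # copy a directory tree in parallel to a second (backup) cluster
    hadoop distcp hdfs://prod-nn:8020/user/hadoop/data hdfs://dr-nn:8020/backups/data

    <!-- core-site.xml: switch authentication from simple to Kerberos -->
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>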
