Hadoop Summit 2012




 Infrastructure Around Hadoop

Backups, failover, configuration and monitoring

       Terran Melconian, Edmund MacKenty


               tripadvisor.com/careers
What TripAdvisor Does


•  World's largest travel site and community
•  Trip planning and user reviews
•  >50 million unique monthly visitors, 30 countries*
•  >60 million reviews and opinions*
•  Run like a startup: 30+ teams all doing their own thing
•  Heavy use of open-source projects
•  Speed Wins!




            * source: comScore Media Metrix for TripAdvisor Sites, Worldwide, January 2012


What the Warehouse Team Does


•  Retain and aggregate historic site activity data
•  Make data available throughout the company
•  Hits, reviews, forums, contacts, locations, businesses, etc.
•  ~50 nodes in 4 clusters: Cloudera CDH3u3 (Hadoop 0.20.2)
•  Used by ~12 analytics teams, heavy use of Hive
•  Some jobs must run every day (e.g. ETL, aggregations)
•  Systems are very open, we trust our users (usually)
•  3 people, fairly new to Hadoop/Hive




Why Hadoop at TripAdvisor


•  Hadoop is how we scale analysis past the limits of one machine
  –  Some daily jobs taking nearly 24 hours, and we're still growing quickly

•  Our old RDBMS data warehouse could barely keep up with data
   ingestion, even running on expensive hardware with a SAN
  –  We obtained 20x improvement in wall clock time

•  Reprocess unaggregated historical data as definitions change
  –  Before, impossible except for a small sample
  –  Now, reprocess years of data at the finest level in a few days

•  Efficient platform for many kinds of statistics
  –  Representative example: five-hour RDBMS job went to 25 minutes




HA NameNode: DRBD, Corosync and Pacemaker


•  Namenode and JobTracker run on “master” node
•  Datanode and TaskTracker run on “slave” nodes
•  Automatic fail-over of all master-node services to a passive node
•  Provision two identical systems
•  Set up virtual Master IP address to be failed over
•  Secondary namenode on passive node, if available
•  Monitor and automatically restart failed services




DRBD/Corosync Configuration


•  DRBD: replicates namenode image, Hive metadata, Oozie job data
  –  Create two identical storage devices (we used RAID 1)
  –  Connect the master nodes with a cross-over ethernet cable
  –  Configure DRBD to use the cross-over and storage devices
  –  Use drbdadm to create the replicated device
  –  Create a filesystem on /dev/drbd0 with mkfs
  –  Run cat /proc/drbd to see the state of the device
  –  Once created, use /etc/init.d/drbd to manage it

•  Corosync: messaging between active-passive masters
  –  Configure Corosync to also use the cross-over ethernet cable
  –  Corosync will start Pacemaker for you
  –  Use /etc/init.d/corosync to manage it, and Pacemaker
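
The /proc/drbd check can be scripted so that a monitoring job flags a disconnected or out-of-date replica. The following is a minimal Python sketch, assuming the DRBD 8.x status layout (cs:, ro: and ds: fields). The sample text, the function names and the "healthy" policy are illustrative assumptions, not taken from the talk.

```python
import re

# Illustrative sample in the DRBD 8.x /proc/drbd layout (not captured from a real cluster).
SAMPLE = """\
version: 8.3.11 (api:88/proto:86-96)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1024 nr:0 dw:1024 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

# One status line per device: minor number, connection state, roles, disk states.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+): cs:(?P<cs>\S+) "
    r"ro:(?P<local_role>\w+)/(?P<peer_role>\w+) "
    r"ds:(?P<local_disk>\w+)/(?P<peer_disk>\w+)"
)


def parse_drbd_status(text):
    """Return one dict per DRBD device found in /proc/drbd-style text."""
    devices = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            devices.append(match.groupdict())
    return devices


def is_healthy(dev):
    """Assumed policy: connected, and both sides report an up-to-date disk."""
    return (dev["cs"] == "Connected"
            and dev["local_disk"] == "UpToDate"
            and dev["peer_disk"] == "UpToDate")


if __name__ == "__main__":
    for dev in parse_drbd_status(SAMPLE):
        state = "healthy" if is_healthy(dev) else "DEGRADED"
        print(dev["minor"], dev["local_role"], state)
```

On a real master the text would come from reading /proc/drbd instead of SAMPLE.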




Pacemaker Configuration


•  Define each resource you want to manage:
  –  DRBD device, master IP address, ethernet connectivity checks,
    Hadoop namenode and jobtracker, Hive thrift server, MySQL for Hive
    metadata, Oozie for workflow coordination

•  Set monitoring intervals for each resource
•  Define resource co-location dependencies
•  Define resource ordering dependencies
•  Pacemaker restarts failed services, e.g. the Hive Thrift server
•  Use crm tool to manage nodes and resources
•  Test with a manual fail-over:
  –  Migrate the namenode resource to the passive master
  –  Use crm status to watch all resources move over
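
To see what a set of ordering dependencies implies, the sketch below derives one valid start sequence from "first, then" pairs. It is only an illustration in Python using the standard-library graphlib module (Python 3.9+). The pairs reuse resource names from the Pacemaker configuration in this deck's appendix, and Pacemaker's own scheduler is not modelled here.

```python
from graphlib import TopologicalSorter

# (first, then) pairs: "first" must be started before "then".
# Names follow the appendix configuration; "Cluster" stands for the DRBD master.
ORDER_CONSTRAINTS = [
    ("Cluster", "fs_DataStore"),
    ("fs_DataStore", "MasterIP"),
    ("MasterIP", "NameNode"),
    ("NameNode", "JobTracker"),
    ("MasterIP", "Mysql"),
    ("JobTracker", "HiveThrift"),
    ("Mysql", "HiveThrift"),
    ("JobTracker", "Oozie"),
]


def start_order(constraints):
    """Return one start sequence that satisfies every (first, then) pair."""
    sorter = TopologicalSorter()
    for first, then in constraints:
        sorter.add(then, first)  # "then" depends on "first"
    return list(sorter.static_order())


if __name__ == "__main__":
    print(" -> ".join(start_order(ORDER_CONSTRAINTS)))
```

A cycle in the constraints raises graphlib.CycleError, which is a cheap way to catch contradictory ordering rules before loading them into the cluster.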

Monitoring: Ganglia and Nagios, Job Tracking


•  Visibility into cluster operations
•  Monitor hardware states and resource usage
•  Notify on specific boundary or failure conditions
•  Track MapReduce jobs and Hive tables
•  Identify immediate problems
•  Show trends over time to predict future needs




Ganglia


•  Standard monitoring of CPU, Memory, Disk usage, etc.
•  Perl script parses Hadoop metrics, sends them using gmetric(1)
•  ~50 Hadoop metrics, ~30 system metrics
•  Graphs for entire cluster and individual nodes
•  Example: Two jobs with different resource profiles
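
The metrics-to-gmetric step can be sketched as follows. This is a Python illustration rather than the Perl script from the talk. It assumes the metrics arrive as "context.record: key=value, ..." lines, as written by Hadoop's file-based metrics context. The sample line and function names are made up, and the commands are only built, not executed.

```python
import shlex

# Assumed input layout; the values are invented for illustration.
SAMPLE_LINE = "dfs.namenode: hostName=master01, FilesCreated=12, BlocksTotal=3456"


def parse_metrics_line(line):
    """Split one metrics record into {"context.record.metric": float}."""
    prefix, _, body = line.partition(": ")
    metrics = {}
    for pair in body.split(", "):
        key, _, value = pair.partition("=")
        try:
            metrics[f"{prefix}.{key}"] = float(value)
        except ValueError:
            continue  # non-numeric tags such as hostName are skipped
    return metrics


def gmetric_commands(metrics, units="count"):
    """Build one gmetric argument list per metric (nothing is executed here)."""
    return [
        ["gmetric", "--name", name, "--value", str(value),
         "--type", "float", "--units", units]
        for name, value in sorted(metrics.items())
    ]


if __name__ == "__main__":
    for cmd in gmetric_commands(parse_metrics_line(SAMPLE_LINE)):
        print(shlex.join(cmd))
```

Each argument list could be handed to subprocess.run on a node where gmetric is installed.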




Nagios


•  Our primary notification system
•  About 80 checks, of which ~25 are our own. Examples:
  –  check_hdp_connectivity: can master talk to all its slaves?
  –  check_hdp_data_nodes: are all configured slave datanodes running?
  –  check_hdp_max_mr_settings: does jobtracker have resources we expect?
  –  check_hadoop_master_logfiles: are logs being written to?
  –  check_hive_server: is it up?

•  Some warnings:
  –  Do not let Nagios run hadoop fsck (check_hdp_hdfs)
  –  LDAP failure causes email cascade
  –  High loads can cause timeouts, which cause notifications
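
A custom check in the spirit of check_hdp_data_nodes can be quite small. The Python sketch below is an assumption about how such a check might look, not the authors' plugin: the host lists are passed in directly, and the thresholds are hypothetical. The exit codes 0 to 3 are the standard Nagios plugin return codes.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def check_data_nodes(configured, live, warn_missing=1, crit_missing=3):
    """Compare configured datanodes with live ones; return (exit_code, message).

    warn_missing and crit_missing are hypothetical thresholds.
    """
    if not configured:
        return UNKNOWN, "UNKNOWN: no datanodes configured"
    missing = sorted(set(configured) - set(live))
    if len(missing) >= crit_missing:
        return CRITICAL, f"CRITICAL: {len(missing)} datanodes down: {', '.join(missing)}"
    if len(missing) >= warn_missing:
        return WARNING, f"WARNING: {len(missing)} datanodes down: {', '.join(missing)}"
    return OK, f"OK: all {len(configured)} datanodes running"


if __name__ == "__main__":
    code, message = check_data_nodes(
        ["slave01", "slave02", "slave03"], ["slave01", "slave03"]
    )
    # A real plugin would print the message and then call sys.exit(code).
    print(code, message)
```

Keeping the check free of expensive calls matters here, given the warning about timeouts under high load.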




Job Tracking


•  Perl script invoked frequently by cron
•  Parses jobtracker log entries since last run
•  Records data on each job in a PostgreSQL DB:
  –  Job ID, user, submitting IP and time, status
  –  Cluster ID, queue, Hive query
  –  start/stop times for job and first mapper and reducer
  –  Mapper and reducer counts, max memory, slots, splits

•  CGI script to do queries:
  –  Running jobs, failed jobs, MapReduce capacity usage
  –  Job resource usage by status, queue, user

•  Helps post-mortem of problems
•  Used to predict trends, future resource needs

Other cron scripts we run


•  Check_load:
  –  Dumps Java stack trace when load is too high
  –  Emails list of top processes so we can see what was wrong

•  Master nodes:
  –  Compresses Hadoop/Hive logs more than 30 days old
  –  Removes logs more than 120 days old (we keep 10+ GB)
  –  Check_hdfs: Runs hadoop fsck to see if HDFS is “healthy”
  –  Backs up the current namenode fsimage

•  Slave Nodes:
  –  Check_disks: Removes read-only disks from datanode configuration
  –  Check_load: Kills some tasks and notifies us when load is too high

•  Refresh production data to development cluster
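
The log retention rule on the master nodes (compress after 30 days, remove after 120) reduces to a small decision function. The Python sketch below only decides what to do with a file. The function name and the ".gz" convention are assumptions, and the actual compressing and deleting are left out.

```python
COMPRESS_AFTER_DAYS = 30   # from the slide
REMOVE_AFTER_DAYS = 120    # from the slide
SECONDS_PER_DAY = 86400


def plan_log_action(path, mtime, now):
    """Return "remove", "compress" or "keep" for one log file.

    mtime and now are seconds since the epoch.
    """
    age_days = (now - mtime) / SECONDS_PER_DAY
    if age_days > REMOVE_AFTER_DAYS:
        return "remove"
    if age_days > COMPRESS_AFTER_DAYS and not path.endswith(".gz"):
        return "compress"
    return "keep"


if __name__ == "__main__":
    now = 1_340_000_000  # arbitrary fixed timestamp for the demo
    for name, age in [("jobtracker.log", 5), ("hive.log.1", 45), ("old.log.gz", 200)]:
        print(name, plan_log_action(name, now - age * SECONDS_PER_DAY, now))
```

A cron wrapper would walk the log directory, call this for each file with its os.stat modification time, and act on the result.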


Configuration Management


•  Seems like extra work at first, but essential as you grow.
•  Not Hadoop-specific: manage OS packages, Nagios and Ganglia
   scripts, cron jobs, svn, SSH keys, NFS mounts, jars
  –  Consistent UID/GIDs critical with DRBD
  –  We replace some jars from the RPMs with local fixes
  –  Templatized configuration files very convenient. ERB is good.
  –  SSH keys made consistent across nodes, masters share host key

•  Use SVN as file delivery mechanism: checkout on each box
•  We chose Puppet as a tool
  –  Gets the job done
  –  Lacks flexibility in inheritance to specialize defaults per-machine
  –  Some aspects of operation are hard to debug



Backup: HDFS and Hive DDL


•  Objectives:
  –  Provide safety against total HDFS failure due to software bugs or
     machine room environmental incident
  –  Protect against user error in dropping or overwriting tables
  –  Restore data to another cluster

•  Assumptions
  –  Repeating one day of processing is acceptable when restoring

•  Components
  –  Incremental HDFS backup
  –  Hive DDL backup

•  Runs on separate backup server with storage (NexSan)
  –  Pull process driven by processes on backup server



Backup HDFS


•  Open-source Java app
•  Requires customization to your environment
•  Traverses HDFS directory tree
•  Copies out files modified after a given date
•  Doesn't copy very new directories
  –  Needed a way to avoid copying files being written at time of backup
  –  HDFS has no snapshots

•  Ignores specified directories
•  Generates restore shell scripts to set owners, perms
•  Verification tool checks file sizes and checksums
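
The selection rules above (files modified after a given date, very new directories skipped, some directories ignored) can be sketched on an in-memory listing. This Python illustration is not the Java tool itself: the Entry type, the function name and the one-hour min_dir_age threshold are assumptions, and a real implementation would obtain the listing from HDFS.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    mtime: float   # seconds since the epoch
    is_dir: bool


def select_for_backup(entries, since, now, min_dir_age=3600.0, ignored=()):
    """Return paths of files to copy in this incremental run.

    A file is selected when it was modified after `since`, is not under an
    ignored directory, and is not under a directory modified less than
    `min_dir_age` seconds ago (assumed to be still in the middle of a write).
    """
    fresh_dirs = tuple(
        e.path.rstrip("/") + "/"
        for e in entries
        if e.is_dir and now - e.mtime < min_dir_age
    )
    skip = tuple(p.rstrip("/") + "/" for p in ignored) + fresh_dirs
    selected = []
    for e in entries:
        if e.is_dir or e.path.startswith(skip):
            continue
        if e.mtime > since:
            selected.append(e.path)
    return selected


if __name__ == "__main__":
    listing = [
        Entry("/warehouse/a", 100, True),
        Entry("/warehouse/a/part-0", 150, False),
        Entry("/warehouse/b", 9990, True),
        Entry("/warehouse/b/part-0", 9995, False),
        Entry("/tmp/scratch", 500, False),
    ]
    print(select_for_backup(listing, since=100, now=10000, ignored=["/tmp"]))
```

Skipping fresh directories is a stand-in for snapshots: the skipped files are picked up by a later run once the directory has aged past the threshold.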


Backup Hive DDL


•  Open-source Java app uses Thrift server
•  Iterates over all tables and views
•  Constructs DDL statements from Hive metadata
•  Ignores specific tables
•  Generates Hive command script
  –  Recreates all tables, adds all partitions back one at a time

•  Also used to move Hive metadata to MySQL
•  Restore full cluster:
  –  Copy files back with copyFromLocal
  –  Run perm/owner scripts
  –  Reapply Hive DDL
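
The generated Hive command script has a simple shape: one CREATE statement per table, then one ALTER TABLE ... ADD PARTITION per partition. The Python sketch below shows that shape only. The input dictionaries are a hypothetical stand-in for what the tool reads through the Thrift server, and no quoting or escaping is attempted.

```python
def partition_spec(values):
    """Render {"ds": "2012-06-01"} as ds='2012-06-01' (no escaping attempted)."""
    return ", ".join(f"{key}='{value}'" for key, value in values.items())


def restore_script(tables):
    """Build a Hive script that recreates tables, then adds partitions one at a time."""
    lines = []
    for table in tables:
        lines.append(table["create_ddl"].rstrip(";") + ";")
        for part in table.get("partitions", []):
            lines.append(
                f"ALTER TABLE {table['name']} ADD PARTITION "
                f"({partition_spec(part['spec'])}) LOCATION '{part['location']}';"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical metadata for one partitioned table.
    tables = [{
        "name": "hits",
        "create_ddl": "CREATE TABLE hits (url STRING) PARTITIONED BY (ds STRING)",
        "partitions": [
            {"spec": {"ds": "2012-06-01"}, "location": "/warehouse/hits/ds=2012-06-01"},
            {"spec": {"ds": "2012-06-02"}, "location": "/warehouse/hits/ds=2012-06-02"},
        ],
    }]
    print(restore_script(tables))
```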


Other Things To Potentially Back Up


•  Back up the Namenode metadata
  –  We do this once every 4 hours
  –  This is in addition to mirroring on four physical drives

•  Our job tracking database
•  No general backups of root or local FS on machines
  –  Recreate machines with Puppet or other configuration management
    tool instead

•  Oozie job database
  –  We do NOT back this up
  –  Tightly coupled with HDFS state and restore would be problematic
  –  The recovery procedure is to rebuild and reinstall coordinators




Oozie: Why


•  Drawback: several times slower to write than cronjobs, while also
   less expressive
•  Advantage: Ability to cleanly depend on input data
  –  With cron, you would have to poll for stamps

•  Advantage: Clean and consistent metadata
  –  See what ran, what failed, what is still waiting and why
  –  Easily retry things which failed – good luck doing that with cron
  –  Output datasets are deleted on rerun so ordering is preserved




Oozie: How


•  Establish consistent local practices for completion stamps, job
   naming, owners, and source code locations
•  Enforce that all jobs must be idempotent
•  Create scripts/makefiles/build.xml to rebuild and reinstall jobs
   after changes in their dependencies
•  Bypass the Oozie GUI
  –  The CLI is a more capable tool
  –  Go straight to the Oozie backing DB and issue SQL queries

•  Rerun coordinator actions, not workflows
•  Don't ever use Derby – we experienced massive corruption




Experiences and Expectations


•  Hadoop is not mature from a reliability and stability point of view
  –  It will probably get there in a few more years

•  Cluster outages are common events, not outliers
  –  Must bounce key services to pick up basic configuration changes such
     as adding a new queue
  –  As you scale up, you will encounter new classes of problems
  –  Example: kernel deadlocks during heavy disk IO

•  You must design for failure and have a robust mechanism to
   cleanly and easily resume execution once the cluster is back up.
•  Important jobs must be isolated from developers
  –  Each cluster should contain ONE tier of jobs, grouped by SLA, release
    process, and time-to-recovery requirements



Attributes of Robust Jobs


•  Idempotent and resumable regardless of when/how terminated
•  Has an external framework for recording success/failure, timing,
   and amount of data processed
•  Knows what input data it needs and waits for it to be ready
•  Has mechanism for reprocessing if the input data is restated
•  Checked into source control
•  Testable in an expendable cluster before release
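
One common way to get idempotent, resumable behaviour is a completion stamp written last. The Python sketch below illustrates the pattern on the local filesystem as a stand-in for HDFS. The stamp name _DONE and the function names are hypothetical, and this is not a description of the authors' framework.

```python
import os
import shutil
import tempfile

STAMP = "_DONE"  # hypothetical completion-stamp file name


def run_idempotent(output_dir, produce):
    """Run produce(output_dir) unless a completed run already exists.

    Returns True if work was done, False if the stamp showed it was not needed.
    Partial output from an interrupted run is discarded before rerunning.
    """
    stamp = os.path.join(output_dir, STAMP)
    if os.path.exists(stamp):
        return False
    shutil.rmtree(output_dir, ignore_errors=True)  # drop partial output, if any
    os.makedirs(output_dir)
    produce(output_dir)
    with open(stamp, "w") as handle:  # written last, so it implies completeness
        handle.write("ok\n")
    return True


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    out = os.path.join(base, "day=2012-06-01")

    def produce(directory):
        with open(os.path.join(directory, "part-0"), "w") as handle:
            handle.write("aggregated rows\n")

    print(run_idempotent(out, produce))  # first run does the work
    print(run_idempotent(out, produce))  # second run is a no-op
    shutil.rmtree(base)
```

Downstream jobs can then wait on the stamp rather than on the directory, which is the "knows what input data it needs" attribute from the list.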




Benchmarks


•  How to evaluate hardware/network changes or map/reduce slot
   tuning?
  –  Key insight: For the same job, the same task always does the same
     work
  –  Rerun job and compare execution of the same task across machines
Machine       Tasks  Comps  Relative Perf (larger is better)
~~~~~~~~~~~~  ~~~~~  ~~~~~  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
type1_1          82     37  0.99 ====================
type1_2          91     76  0.98 ====================
type1_3          92     35  1.01 ====================
type1_4          88     85  1.06 =====================
type2_1          71     26  1.30 ==========================
type3_1          92     80  0.68 ==============
type4_1          78     42  1.19 ========================
type4_2          78     45  1.29 ==========================
type4_3          75     75  1.19 ========================

remote          546    534  0.97 ===================
local           378     69  1.05 =====================
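
The slides do not give the formula behind the Relative Perf column. The Python sketch below shows one assumed definition that follows the key insight: for every task that ran on at least two machines, divide the task's mean duration by the duration on each machine, then average those ratios per machine. The function name and the input tuples are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def relative_performance(runs):
    """runs: iterable of (task_id, machine, seconds).

    Returns {machine: (comparisons, relative_perf)}; values above 1.0 mean the
    machine finished the same tasks faster than average.
    """
    by_task = defaultdict(list)
    for task_id, machine, seconds in runs:
        by_task[task_id].append((machine, seconds))

    ratios = defaultdict(list)
    for observations in by_task.values():
        if len({machine for machine, _ in observations}) < 2:
            continue  # the task never ran elsewhere, so there is nothing to compare
        baseline = mean(seconds for _, seconds in observations)
        for machine, seconds in observations:
            ratios[machine].append(baseline / seconds)

    return {machine: (len(r), mean(r)) for machine, r in ratios.items()}


if __name__ == "__main__":
    runs = [
        ("task_1", "type1_1", 10.0), ("task_1", "type3_1", 20.0),
        ("task_2", "type1_1", 30.0), ("task_2", "type3_1", 60.0),
        ("task_3", "type1_1", 5.0),   # only one run, so it is ignored
    ]
    for machine, (comps, perf) in sorted(relative_performance(runs).items()):
        print(f"{machine:12s} {comps:5d} {perf:.2f}")
```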
Features you Should Use


•  Fair Scheduler
•  refreshNodes, refreshQueues
•  Hadoop metrics
•  Namenode audit logging (disabled by default in 0.20)
•  Exclude files to decommission slave nodes




Staffing


•  We're living proof that you can hire some engineers with good
   fundamentals but no specialized experience and throw them in
   the deep end (it's the TA way)
•  Skills to hire for:
   –  Operations and Linux experience
   –  General service troubleshooting
   –  Scripting
   –  Java
   –  SQL (even if not using Hive)

•  Managing clusters which are growing 2x - 4x per year takes 1-2
   people working full time just to run in place




Open Questions


•  Resuming of jobs on jobtracker restart
•  Reloading of configurations without a restart
•  Robust response to cluster OOM conditions
•  Disabling job submission while allowing existing jobs to finish


•  Please tell us if you have the answers!




Questions?




Appendix


This is for you to read later after downloading the presentation
Downloads




https://github.com/TAwarehouse/




DRBD Configuration
global {
  usage-count no;
  minor-count 1;
}
common {
  protocol C;
  syncer { rate 90M; }
}
resource internal {
  startup {
    wfc-timeout 600;
    degr-wfc-timeout 60;
  }
  disk {
    on-io-error detach;
  }
  net {
    # timeout        60;
    # connect-int     10;
    # ping-int      10;
    # max-buffers 2048;
    # max-epoch-size 2048;
  }
  on master01.tripadvisor.com {
    device     /dev/drbd0;
    disk      /dev/sda3;
    address 10.0.0.1:7789;
    flexible-meta-disk internal;
  }
  on master02.tripadvisor.com {
    device     /dev/drbd0;
    disk      /dev/sda3;
    address 10.0.0.2:7789;
    flexible-meta-disk internal;
  }
}




Corosync Configuration
compatibility: whitetank
totem {
    version: 2
    secauth: off
    threads: 0
    interface {
           ringnumber: 0
           bindnetaddr: 10.0.0.0
           mcastaddr: 239.0.0.11
           mcastport: 5415
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
           subsys: AMF
           debug: off
    }
}
amf {
    mode: disabled
}
aisexec {
    user: root
    group: root
}
service {
    name: pacemaker
    ver: 0
}

Pacemaker Configuration

node master01.tripadvisor.com attributes standby="off"
node master02.tripadvisor.com attributes standby="off"
property $id="cib-bootstrap-options" stonith-enabled="false" no-quorum-policy="ignore" \
              expected-quorum-votes="2" dc-version="1.0.12-unknown" cluster-infrastructure="openais" \
              last-lrm-refresh="1337718104"
rsc_defaults $id="rsc-options" resource-stickiness="100"
primitive DataStore ocf:linbit:drbd params drbd_resource="internal" \
              op start interval="0" timeout="240s" op stop interval="0" timeout="100s"
primitive fs_DataStore ocf:heartbeat:Filesystem \
              params device="/dev/drbd0" directory="/data/internal" fstype="ext3" \
              op monitor interval="60s" timeout="40s" op start interval="0" timeout="60s" \
              op stop interval="0" timeout="60s"
ms Cluster DataStore \
              meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation fs-with-drbd inf: fs_DataStore Cluster:Master
order drdb-fs inf: Cluster:promote fs_DataStore:start
primitive MasterIP ocf:heartbeat:IPaddr2 
              params ip="192.168.236.10" nic="bond0" op monitor interval="30s"
colocation ip-with-drbd inf: MasterIP Cluster:Master
order fs-ip inf: fs_DataStore MasterIP
primitive NameNode lsb:hadoop-0.20-namenode op monitor interval="30s" meta target-role="Started"
colocation namenode-with-fs inf: NameNode fs_DataStore
order ip-namenode inf: MasterIP NameNode
primitive JobTracker lsb:hadoop-0.20-jobtracker op monitor interval="30s" meta target-role="Started"
colocation jobtracker-with-fs inf: JobTracker fs_DataStore
order namenode-jobtracker inf: NameNode JobTracker




Pacemaker Configuration (cont.)
primitive SecondaryNameNode lsb:hadoop-0.20-secondarynamenode \
             op monitor interval="30s" meta target-role="Started"
colocation secondarynamenode-not-with-ip -inf: SecondaryNameNode MasterIP
order jobtracker-secnamenode inf: JobTracker SecondaryNameNode
primitive Mysql ocf:heartbeat:mysql \
             params datadir="/data/internal/mysql" socket="/data/internal/mysql/mysql.sock" \
             binary="/usr/bin/mysqld_safe" op monitor interval="30s" timeout="30s" op start \
             interval="0" timeout="120s" op stop interval="0" timeout="120s" \
             meta target-role="Started"
colocation mysql-with-fs inf: Mysql fs_DataStore
order ip-mysql inf: MasterIP Mysql
primitive HiveThrift lsb:hive-thrift \
             op monitor interval="30s" meta target-role="Started"
colocation hivethrift-with-ip inf: HiveThrift MasterIP
order jobtracker-hivethrift inf: JobTracker HiveThrift
order mysql-hivethrift inf: Mysql HiveThrift
primitive Oozie lsb:oozie \
             op monitor interval="30s" meta target-role="Started"
colocation oozie-with-fs inf: Oozie MasterIP
order jobtracker-oozie inf: JobTracker Oozie
primitive PingNodes ocf:pacemaker:ping \
             params host_list="192.168.236.1 192.168.236.2 192.168.236.5" multiplier="100" \
             op start interval="0" timeout="60s" op monitor interval="30s" timeout="60s"
clone PingClone PingNodes meta interleave="true"
location ping-with-ip MasterIP \
             rule $id="ping-with-ip-rule" pingd: defined pingd
location prefer-master01.tripadvisor.com MasterIP \
             rule $id="prefer-master01.tripadvisor.com-rule" 50: #uname eq master01.tripadvisor.com
order ip-ping inf: MasterIP PingClone


Nagios Checks

check_apt              check_breeze      check_by_ssh            check_checkup_metric
check_clamd            check_cluster     check_cronjobs          check_crontabs
check_dhcp             check_dig         check_disk              check_disk_smb
check_disk_writable    check_dns         check_dummy             check_fbrs
check_file_age         check_files_age   check_filesystems       check_flexlm
check_ftp              check_gc          check_hadoop_master_logfiles
check_hdp_connectivity check_hdp_data_nodes                      check_hdp_hdfs
check_hdp_max_mr_settings                check_hive              check_hive_nsc
check_hive_server      check_http        check_icmp              check_ide_smart
check_ifoperstatus     check_ifstatus    check_imap              check_ircd
check_jabber           check_load        check_local_mail        check_log
check_log_updated      check_mailq       check_memcached                    check_minerva
check_mrtg             check_mrtgtraf    check_mysql_repl        check_nagios
check_nntp             check_nntps       check_nrpe              check_nt
check_ntp              check_ntp_peer    check_ntp_time          check_nwstat
check_oracle           check_overcr      check_ping              check_pop
check_proc_filehandles check_procs       check_real              check_rpc
check_sensors          check_simap       check_smtp              check_spop
check_ssh              check_ssmtp       check_swap              check_swapping
check_sys_filehandles check_ta_services  check_tcp               check_time
check_udp              check_ups         check_users             check_wave
check_writeable_tmp




Example Oozie Query
SELECT
  a.todaystatus as today,
  a.yesterdaystatus as yday,
  j.status as parent,
  j.app_name,
  a.last_modified_time,
  a.nominal_time,
  a.id
FROM (
  SELECT
  t.status as todaystatus,
  y.status as yesterdaystatus,
  COALESCE(t.id, y.id) AS id,
  y.job_id,
  COALESCE(t.nominal_time, y.nominal_time) AS nominal_time,
  COALESCE(t.last_modified_time, y.last_modified_time) AS last_modified_time
  FROM (SELECT *
      FROM COORD_ACTIONS
      WHERE TIMESTAMPDIFF(DAY, last_modified_time, now()) = 0) t
  RIGHT OUTER JOIN (SELECT *
      FROM COORD_ACTIONS
      WHERE TIMESTAMPDIFF(DAY, last_modified_time, now()) = 1) y
  ON (t.job_id=y.job_id)
  WHERE COALESCE(t.status, '') NOT IN ('SUCCEEDED', 'WAITING')
      -- If they're WAITING today, then make sure yesterday ran OK.
        OR (t.status = 'WAITING' and y.status <> 'SUCCEEDED')
  UNION DISTINCT
  -- Dummy record to force the table to exist even when empty, since MySQL
  -- otherwise emits nothing if no data is returned.
  SELECT 'EMPTY', 'RECORD', '', '', '', 'THIS IS A DUMMY RECORD'
)a
LEFT OUTER JOIN COORD_JOBS j
ON a.job_id=j.id
WHERE j.status = 'RUNNING' OR j.status IS NULL
;



Sessions will resume at 4:30pm




                             Page 35

More Related Content

PPTX
Pacemaker hadoop infrastructure and soft serve experience
Vitaliy Bashun
 
PDF
Kudu - Fast Analytics on Fast Data
Ryan Bosshart
 
PPTX
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Mladen Kovacevic
 
PPTX
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
DataWorks Summit
 
PPTX
Intro to Apache Kudu (short) - Big Data Application Meetup
Mike Percy
 
PPTX
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
Yahoo Developer Network
 
PPTX
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Mike Percy
 
PDF
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
jdcryans
 
Pacemaker hadoop infrastructure and soft serve experience
Vitaliy Bashun
 
Kudu - Fast Analytics on Fast Data
Ryan Bosshart
 
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Mladen Kovacevic
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
DataWorks Summit
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Mike Percy
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
Yahoo Developer Network
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Mike Percy
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
jdcryans
 

What's hot (20)

PPTX
Apache hadoop technology : Beginners
Shweta Patnaik
 
PPTX
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Dataconomy Media
 
PPTX
High concurrency,
Low latency analytics
using Spark/Kudu
Chris George
 
PPTX
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
Yahoo Developer Network
 
PPTX
Video Analysis in Hadoop
DataWorks Summit
 
PPTX
Hadoop and HBase @eBay
DataWorks Summit
 
PDF
Hadoop and OpenStack
DataWorks Summit
 
PDF
GCP Data Engineer cheatsheet
Guang Xu
 
PPTX
Splice Machine Overview
Kunal Gupta
 
PDF
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
Rakuten Group, Inc.
 
PPTX
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
DataWorks Summit
 
PPTX
Introduction to Apache Kudu
Jeff Holoman
 
PPTX
Introducing Kudu
Jeremy Beard
 
PDF
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
Ontico
 
PPTX
Hello OpenStack, Meet Hadoop
DataWorks Summit
 
PDF
Managing PostgreSQL with Ansible
EDB
 
PPTX
A brave new world in mutable big data relational storage (Strata NYC 2017)
Todd Lipcon
 
PDF
What database
Regunath B
 
Apache hadoop technology : Beginners
Shweta Patnaik
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Dataconomy Media
 
High concurrency,
Low latency analytics
using Spark/Kudu
Chris George
 
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
Yahoo Developer Network
 
Video Analysis in Hadoop
DataWorks Summit
 
Hadoop and HBase @eBay
DataWorks Summit
 
Hadoop and OpenStack
DataWorks Summit
 
GCP Data Engineer cheatsheet
Guang Xu
 
Splice Machine Overview
Kunal Gupta
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
Rakuten Group, Inc.
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
DataWorks Summit
 
Introduction to Apache Kudu
Jeff Holoman
 
Introducing Kudu
Jeremy Beard
 
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
Ontico
 
Hello OpenStack, Meet Hadoop
DataWorks Summit
 
Managing PostgreSQL with Ansible
EDB
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
Todd Lipcon
 
What database
Regunath B
 
Ad

Similar to Infrastructure Around Hadoop (20)

PDF
Next Generation Hadoop Operations
Owen O'Malley
 
PPTX
Hadoop Backup and Disaster Recovery
Cloudera, Inc.
 
PDF
Practice and challenges from building IaaS
Shawn Zhu
 
PPTX
Introduction to hadoop and hdfs
shrey mehrotra
 
PDF
Hadoop at Nokia
Josh Devins
 
PPT
Hw09 Production Deep Dive With High Availability
Cloudera, Inc.
 
PPTX
Top 10 lessons learned from deploying hadoop in a private cloud
Rogue Wave Software
 
PPTX
HDFS tiered storage
DataWorks Summit
 
PPTX
Backup and Disaster Recovery in Hadoop
larsgeorge
 
PPTX
Big Data Analytics -Introduction education
mohammedansaralima
 
PDF
Hadoop Operations - Best practices from the field
Uwe Printz
 
PDF
Scaling Hadoop at LinkedIn
DataWorks Summit
 
PDF
Five Years of EC2 Distilled
Grig Gheorghiu
 
PPTX
Hadoop project design and a usecase
sudhakara st
 
PDF
Hadoop, Taming Elephants
Ovidiu Dimulescu
 
PDF
HDFS Design Principles
Konstantin V. Shvachko
 
PPTX
Storage and-compute-hdfs-map reduce
Chris Nauroth
 
PDF
Operate your hadoop cluster like a high eff goldmine
DataWorks Summit
 
PDF
Hadoop Distributed File System
elliando dias
 
PPTX
What it takes to run Hadoop at Scale: Yahoo! Perspectives
DataWorks Summit
 
Next Generation Hadoop Operations
Owen O'Malley
 
Hadoop Backup and Disaster Recovery
Cloudera, Inc.
 
Practice and challenges from building IaaS
Shawn Zhu
 
Introduction to hadoop and hdfs
shrey mehrotra
 
Hadoop at Nokia
Josh Devins
 
Hw09 Production Deep Dive With High Availability
Cloudera, Inc.
 
Top 10 lessons learned from deploying hadoop in a private cloud
Rogue Wave Software
 
HDFS tiered storage
DataWorks Summit
 
Backup and Disaster Recovery in Hadoop
larsgeorge
 
Big Data Analytics -Introduction education
mohammedansaralima
 
Hadoop Operations - Best practices from the field
Uwe Printz
 
Scaling Hadoop at LinkedIn
DataWorks Summit
 
Five Years of EC2 Distilled
Grig Gheorghiu
 
Hadoop project design and a usecase
sudhakara st
 
Hadoop, Taming Elephants
Ovidiu Dimulescu
 
HDFS Design Principles
Konstantin V. Shvachko
 
Storage and-compute-hdfs-map reduce
Chris Nauroth
 
Operate your hadoop cluster like a high eff goldmine
DataWorks Summit
 
Hadoop Distributed File System
elliando dias
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
DataWorks Summit
 
Ad

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
DataWorks Summit
 
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
PPTX
Managing the Dewey Decimal System
DataWorks Summit
 
PPTX
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
PPTX


Infrastructure Around Hadoop

  • 1. Hadoop Summit 2012. Infrastructure Around Hadoop: Backups, failover, configuration and monitoring. Terran Melconian, Edmund MacKenty. tripadvisor.com/careers
  • 2. What TripAdvisor Does • World's largest travel site and community • Trip planning user reviews • >50 million unique monthly visitors, 30 countries* • >60 million reviews and opinions* • Run like a startup: 30+ teams all doing their own thing • Heavy use of open-source projects • Speed Wins! * Source: comScore Media Metrix for TripAdvisor Sites, Worldwide, January 2012
  • 3. What the Warehouse Team Does • Retain and aggregate historic site activity data • Make data available throughout the company • Hits, reviews, forums, contacts, locations, businesses, etc. • ~50 nodes in 4 clusters: Cloudera CDH3u3 (Hadoop 0.20.2) • Used by ~12 analytics teams, heavy use of Hive • Some jobs must run every day (e.g. ETL, aggregations) • Systems are very open, we trust our users (usually) • 3 people, fairly new to Hadoop/Hive
  • 4. Why Hadoop at TripAdvisor • Hadoop is how we scale analysis past the limits of one machine – Some daily jobs were taking nearly 24 hours, and we're still growing quickly • Our old RDBMS data warehouse could barely keep up with data ingestion, even running on expensive hardware with a SAN – We obtained a 20x improvement in wall clock time • Reprocess unaggregated historical data as definitions change – Before, impossible except for a small sample – Now, reprocess years of data at the finest level in a few days • Efficient platform for many kinds of statistics – Representative example: five-hour RDBMS job went to 25 minutes
  • 5. HA NameNode: DRBD, Corosync and Pacemaker • Namenode and JobTracker run on “master” node • Datanode and TaskTracker run on “slave” nodes • Automatic fail-over of all master-node services to a passive node • Provision two identical systems • Set up virtual Master IP address to be failed over • Secondary namenode on passive node, if available • Monitor and automatically restart failed services
  • 6. DRBD/Corosync Configuration • DRBD: replicates namenode image, Hive metadata, Oozie job data – Create two identical storage devices (we used RAID 1) – Connect the master nodes with a cross-over ethernet cable – Configure DRBD to use the cross-over and storage devices – Use drbdadm to create the replicated device – Create a filesystem on /dev/drbd0 with mkfs – Run cat /proc/drbd to see the state of the device – Once created, use /etc/init.d/drbd to manage it • Corosync: messaging between active-passive masters – Configure Corosync to also use the cross-over ethernet cable – Corosync will start Pacemaker for you – Use /etc/init.d/corosync to manage both Corosync and Pacemaker
  • 7. Pacemaker Configuration • Define each resource you want to manage: – DRBD device, master IP address, ethernet connectivity checks, Hadoop namenode and jobtracker, Hive thrift server, MySQL for Hive metadata, Oozie for workflow coordination • Set monitoring intervals for each resource • Define resource co-location dependencies • Define resource ordering dependencies • Pacemaker restarts failed services, e.g. Hive-Thrift • Use the crm tool to manage nodes and resources • Test with a manual fail-over: – Migrate the namenode resource to the passive master – Use crm status to watch all resources move over
  • 8. Monitoring: Ganglia and Nagios, Job Tracking • Visibility into cluster operations • Monitor hardware states and resource usage • Notify on specific boundary or failure conditions • Track MapReduce jobs and Hive tables • Identify immediate problems • Show trends over time to predict future needs
  • 9. Ganglia • Standard monitoring of CPU, Memory, Disk usage, etc. • Perl script parses Hadoop metrics, sends them using gmetric(1) • ~50 Hadoop metrics, ~30 system metrics • Graphs for entire cluster and individual nodes • Example: Two jobs with different resource profiles
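The parse-and-send step on the Ganglia slide could be sketched roughly as follows. This is not the authors' Perl script. It assumes the metrics arrive as text lines of the form `context.record: key=value, key=value` (the sample line and the function names are hypothetical), and it only builds `gmetric` command lines using the standard `--name`, `--value`, `--type` and `--units` options rather than running them.

```python
import shlex


def parse_metrics_line(line):
    """Split 'context.record: k=v, k=v' into (record, {k: v}). Assumed format."""
    record, _, rest = line.partition(":")
    pairs = {}
    for item in rest.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            pairs[key] = value
    return record.strip(), pairs


def gmetric_commands(line, units="count"):
    """Build one gmetric command line per numeric metric found in the line."""
    record, pairs = parse_metrics_line(line)
    commands = []
    for key, value in sorted(pairs.items()):
        try:
            number = float(value)
        except ValueError:
            continue  # tags such as hostName are not numeric, so skip them
        args = ["gmetric", "--name", f"{record}.{key}", "--value", str(number),
                "--type", "float", "--units", units]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


# Hypothetical sample line; real metric names depend on the Hadoop version.
sample = "dfs.namenode: hostName=master01, FilesCreated=12, BlocksTotal=3400"
for command in gmetric_commands(sample):
    print(command)
```

A cron job could pass each generated line to a shell, or call `subprocess.run` with the argument list directly.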
  • 10. Nagios • Our primary notification system • About 80 checks, ~25 are our own. Examples: – check_hdp_connectivity: can master talk to all its slaves? – check_hdp_data_nodes: are all configured slave datanodes running? – check_hdp_max_mr_settings: does jobtracker have resources we expect? – check_hadoop_master_logfiles: are logs being written to? – check_hive_server: is it up? • Some warnings: – Do not let Nagios run hadoop fsck (check_hdp_hdfs) – LDAP failure causes email cascade – High loads can cause timeouts, which cause notifications
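A custom check in the spirit of check_hadoop_master_logfiles might look like the sketch below. It is an illustration only: the thresholds, function names and message wording are assumptions, and the one convention taken as given is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_logfile_age(path, warn_seconds=600, crit_seconds=1800, now=None):
    """Return (exit_code, message) based on how long ago the log was written."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as err:
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {err}"
    if age >= crit_seconds:
        return CRITICAL, f"CRITICAL - {path} last written {age:.0f}s ago"
    if age >= warn_seconds:
        return WARNING, f"WARNING - {path} last written {age:.0f}s ago"
    return OK, f"OK - {path} last written {age:.0f}s ago"


def main(argv):
    """Plugin entry point: print one status line and return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_logfile_age LOGFILE")
        return UNKNOWN
    code, message = check_logfile_age(argv[1])
    print(message)
    return code

# A real plugin would finish with: sys.exit(main(sys.argv))
```

Keeping the check this cheap matters given the warning on the slide: a check that does heavy work (such as running hadoop fsck) can time out under load and generate notifications of its own.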
  • 11. Job Tracking • Perl script invoked frequently by cron • Parses jobtracker log entries since last run • Records data on each job in PostgreSQL DB: – Job ID, user, submitting IP and time, status – Cluster ID, queue, Hive query – Start/stop times for job and first mapper and reducer – Mapper and reducer counts, max memory, slots, splits • CGI script to do queries: – Running jobs, failed jobs, MapReduce capacity usage – Job resource usage by status, queue, user • Helps post-mortem of problems • Used to predict trends, future resource needs
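The parse-and-record loop could be sketched as below. Several things here are assumptions rather than the authors' implementation: the log lines are taken to be `KEY="value"` pairs prefixed with `Job` (modeled on Hadoop 0.20 job history files), the table layout is hypothetical, and SQLite stands in for PostgreSQL so the example is self-contained.

```python
import re
import sqlite3

PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_job_line(line):
    """Return the KEY="value" fields of a 'Job ...' line, or None otherwise."""
    if not line.startswith("Job "):
        return None
    fields = dict(PAIR.findall(line))
    return fields if "JOBID" in fields else None


def record_jobs(conn, lines):
    """Merge job events into one row per job; safe to rerun on the same lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, username TEXT, submit_time INTEGER, status TEXT)")
    for line in lines:
        fields = parse_job_line(line)
        if fields is None:
            continue
        submit = fields.get("SUBMIT_TIME")
        conn.execute("INSERT OR IGNORE INTO jobs (job_id) VALUES (?)",
                     (fields["JOBID"],))
        # COALESCE keeps the stored value when this event lacks the field.
        conn.execute(
            "UPDATE jobs SET username = COALESCE(?, username), "
            "submit_time = COALESCE(?, submit_time), "
            "status = COALESCE(?, status) WHERE job_id = ?",
            (fields.get("USER"), int(submit) if submit else None,
             fields.get("JOB_STATUS"), fields["JOBID"]))
    conn.commit()


# Hypothetical sample events for a single job.
SAMPLE = [
    'Job JOBID="job_201206010000_0001" JOBNAME="daily_etl" USER="warehouse" SUBMIT_TIME="1338500000000"',
    'Job JOBID="job_201206010000_0001" FINISH_TIME="1338500600000" JOB_STATUS="SUCCESS"',
]
conn = sqlite3.connect(":memory:")
record_jobs(conn, SAMPLE)
print(conn.execute("SELECT * FROM jobs").fetchall())
```

Because each event is merged into the row for its job, rerunning the script over an overlapping window of log lines does not create duplicates.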
  • 12. Other cron scripts we run • Check_load: – Dumps Java stack trace when load is too high – Emails list of top processes so we can see what was wrong • Master nodes: – Compresses Hadoop/Hive logs more than 30 days old – Removes logs more than 120 days old (we keep 10+ GBs) – Check_hdfs: Runs hadoop fsck to see if HDFS is “healthy” – Backs up the current namenode fsimage • Slave Nodes: – Check_disks: Removes read-only disks from datanode configuration – Check_load: Kills some tasks and notifies us when load is too high • Refresh production data to development cluster
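The log housekeeping on the master nodes (compress after 30 days, remove after 120) could be sketched as follows. The function name and the choice to judge age by file modification time are assumptions; only the two age thresholds come from the slide.

```python
import gzip
import os
import shutil
import time

DAY = 86400  # seconds


def rotate_logs(log_dir, compress_after_days=30, remove_after_days=120, now=None):
    """Compress old logs and delete very old ones; return (compressed, removed)."""
    now = time.time() if now is None else now
    compressed, removed = [], []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        mtime = os.path.getmtime(path)
        age_days = (now - mtime) / DAY
        if age_days > remove_after_days:
            os.remove(path)
            removed.append(name)
        elif age_days > compress_after_days and not name.endswith(".gz"):
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Preserve the original mtime so the .gz still ages out at 120 days.
            os.utime(path + ".gz", (mtime, mtime))
            os.remove(path)
            compressed.append(name)
    return compressed, removed
```

Run daily from cron, a call such as `rotate_logs("/var/log/hadoop")` would apply both rules in one pass; that path is only an example.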
  • 13. Configuration Management • Seems like extra work at first, but essential as you grow. • Not Hadoop-specific: manage OS packages, Nagios and Ganglia scripts, cron jobs, svn, SSH keys, NFS mounts, jars – Consistent UID/GIDs critical with DRBD – We replace some jars from the RPMs with local fixes – Templatized configuration files very convenient. ERB is good. – SSH keys made consistent across nodes, masters share host key • Use SVN as file delivery mechanism: checkout on each box • We chose Puppet as a tool – Gets the job done – Lacks flexibility in inheritance to specialize defaults per-machine – Some aspects of operation are hard to debug
  • 14. Backup: HDFS and Hive DDL • Objectives: – Provide safety against total HDFS failure due to software bugs or a machine-room environmental incident – Protect against user error in dropping or overwriting tables – Restore data to another cluster • Assumptions – Repeating one day of processing is acceptable when restoring • Components – Incremental HDFS backup – Hive DDL backup • Runs on separate backup server with storage (NexSan) – Pull process driven by processes on backup server
  • 15. Backup HDFS • Open-source Java app • Requires customization to your environment • Traverses HDFS directory tree • Copies out files modified after a given date • Doesn't copy very new directories – Needed a way to avoid copying files being written at time of backup – HDFS has no snapshots • Ignores specified directories • Generates restore shell scripts to set owners, perms • Verification tool checks file sizes and checksums
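The selection logic of the incremental backup (files modified after a given date, skipping ignored and very new directories) can be illustrated with a short sketch. This is not the authors' Java app: it walks a local directory tree as a stand-in for HDFS, and the function name, the `min_dir_age` parameter and its default are assumptions.

```python
import os


def files_to_back_up(root, since, now, min_dir_age=3600, ignored=()):
    """List files under root modified after `since` (epoch seconds).

    Directories listed in `ignored`, or modified less than `min_dir_age`
    seconds ago, are pruned along with everything below them, on the
    assumption that a very new directory may still be being written.
    """
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = sorted(
            d for d in dirnames
            if os.path.join(dirpath, d) not in ignored
            and now - os.path.getmtime(os.path.join(dirpath, d)) >= min_dir_age)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                selected.append(path)
    return selected
```

Skipped directories are picked up by a later run once they are old enough, provided `since` for that run reaches back far enough to cover them.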
  • 16. Backup Hive DDL • Open-source Java app uses Thrift server • Iterates over all tables and views • Constructs DDL statements from Hive metadata • Ignores specific tables • Generates Hive command script – Recreates all tables, adds all partitions back one at a time • Used to move metadata to MySQL • Restore full cluster: – Copy files back with copyFromLocal – Run perm/owner scripts – Reapply Hive DDL
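Generating a restore script from table metadata could look roughly like this. The dictionary layout is a hypothetical stand-in for what the metastore returns over Thrift, and the sketch covers only column and partition definitions, not storage formats, locations or views.

```python
def table_ddl(table):
    """Build a CREATE TABLE statement from a metadata dictionary."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in table["columns"])
    ddl = f"CREATE TABLE {table['name']} ({cols})"
    if table.get("partition_keys"):
        keys = ", ".join(f"{name} {ctype}" for name, ctype in table["partition_keys"])
        ddl += f" PARTITIONED BY ({keys})"
    return ddl + ";"


def partition_ddl(table):
    """Build one ALTER TABLE ... ADD PARTITION statement per partition."""
    statements = []
    for spec in table.get("partitions", []):
        parts = ", ".join(f"{key}='{value}'" for key, value in spec)
        statements.append(f"ALTER TABLE {table['name']} ADD PARTITION ({parts});")
    return statements


def restore_script(tables, ignored=()):
    """Concatenate the DDL for every table that is not on the ignore list."""
    lines = []
    for table in tables:
        if table["name"] in ignored:
            continue
        lines.append(table_ddl(table))
        lines.extend(partition_ddl(table))
    return "\n".join(lines)


# Hypothetical metadata for one partitioned table.
hits = {
    "name": "hits",
    "columns": [("url", "STRING"), ("user_id", "BIGINT")],
    "partition_keys": [("ds", "STRING")],
    "partitions": [[("ds", "2012-06-01")], [("ds", "2012-06-02")]],
}
print(restore_script([hits]))
```

Emitting partitions one statement at a time mirrors the slide: the script recreates each table and then adds its partitions back individually.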
  • 17. Other Things To Potentially Back Up • Back up the Namenode metadata – We do this once every 4 hours – This is in addition to mirroring on four physical drives • Our job tracking database • No general backups of root or local FS on machines – Recreate machines with Puppet or other configuration management tool instead • Oozie job database – We do NOT back this up – Tightly coupled with HDFS state and restore would be problematic – The recovery procedure is to rebuild and reinstall coordinators
  • 18. Oozie: Why • Drawback: several times slower to write than cronjobs, while also less expressive • Advantage: Ability to cleanly depend on input data – With cron, you would have to poll for stamps • Advantage: Clean and consistent metadata – See what ran, what failed, what is still waiting and why – Easily retry things which failed (good luck doing that with cron) – Output datasets are deleted on rerun so ordering is preserved
  • 19. Oozie: How • Establish consistent local practices for completion stamps, job naming, owners, and source code locations • Enforce that all jobs must be idempotent • Create scripts/makefiles/build.xml to rebuild and reinstall jobs after changes in their dependencies • Bypass the Oozie GUI – The CLI is a more capable tool – Go straight to the Oozie backing DB and issue SQL queries • Rerun coordinator actions, not workflows • Don't ever use Derby: we experienced massive corruption
  • 20. Experiences and Expectations • Hadoop is not mature from a reliability and stability point of view – It will probably get there in a few more years • Cluster outages are common events, not outliers – Must bounce key services to pick up basic configuration changes such as adding a new queue – As you scale up, you will encounter new classes of problems – Example: kernel deadlocks during heavy disk IO • You must design for failure and have a robust mechanism to cleanly and easily resume execution once the cluster is back up. • Important jobs must be isolated from developers – Each cluster should contain ONE tier of jobs, grouped by SLA, release process, and time-to-recovery requirements
  • 21. Attributes of Robust Jobs • Idempotent and resumable regardless of when/how terminated • Has an external framework for recording success/failure, timing, and amount of data processed • Knows what input data it needs and waits for it to be ready • Has a mechanism for reprocessing if the input data is restated • Checked into source control • Testable in an expendable cluster before release
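One common way to get the "idempotent and resumable" property is to write output to a scratch location, move it into place in one step, and record a completion stamp. The sketch below shows that pattern on a local filesystem. It is an illustration under assumptions, not the authors' framework: the function name, the `_SUCCESS` stamp name and the callback signature are all hypothetical.

```python
import os
import shutil
import tempfile


def run_idempotent(output_dir, produce, stamp_name="_SUCCESS"):
    """Run produce(work_dir) at most once per output_dir.

    Returns False when a completion stamp already exists. Partial output
    from an earlier interrupted run is discarded before rerunning.
    """
    stamp = os.path.join(output_dir, stamp_name)
    if os.path.exists(stamp):
        return False  # already completed, nothing to do
    if os.path.exists(output_dir):
        shutil.rmtree(output_dir)  # no stamp means the output is not trusted
    parent = os.path.dirname(os.path.abspath(output_dir))
    os.makedirs(parent, exist_ok=True)
    work = tempfile.mkdtemp(prefix=".tmp-", dir=parent)
    try:
        produce(work)
        os.rename(work, output_dir)  # same filesystem, so the move is one step
    except BaseException:
        shutil.rmtree(work, ignore_errors=True)
        raise
    open(stamp, "w").close()  # the stamp is written last
    return True
```

A scheduler can then call the same function after any outage: completed days are skipped, and interrupted days are rebuilt from scratch.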
  • 22. Benchmarks • How to evaluate hardware/network changes or map/reduce slot tuning? – Key insight: For the same job, the same task always does the same work – Rerun job and compare execution of the same task across machines
      Machine   Tasks  Comps  Relative Perf (larger is better)
      ~~~~~~~   ~~~~~  ~~~~~  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      type1_1      82     37  0.99 ====================
      type1_2      91     76  0.98 ====================
      type1_3      92     35  1.01 ====================
      type1_4      88     85  1.06 =====================
      type2_1      71     26  1.30 ==========================
      type3_1      92     80  0.68 ==============
      type4_1      78     42  1.19 ========================
      type4_2      78     45  1.29 ==========================
      type4_3      75     75  1.19 ========================
      remote      546    534  0.97 ===================
      local       378     69  1.05 =====================
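The comparison idea can be sketched as follows. The slide does not give the formula behind the "Relative Perf" column, so this sketch assumes one: each time the same task ran on two different machines, each machine is credited with the other machine's duration divided by its own, and the scores are averaged per machine. The data layout and task IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def relative_performance(runs):
    """Map machine -> (comparisons, mean ratio); larger ratio means faster.

    `runs` is a list of dicts, one per job run, mapping task ID to a
    (machine, seconds) pair.
    """
    ratios = defaultdict(list)
    for run_a, run_b in combinations(runs, 2):
        for task in run_a.keys() & run_b.keys():
            (machine_a, secs_a), (machine_b, secs_b) = run_a[task], run_b[task]
            if machine_a == machine_b:
                continue  # same machine both times: nothing to compare
            ratios[machine_a].append(secs_b / secs_a)
            ratios[machine_b].append(secs_a / secs_b)
    return {m: (len(r), sum(r) / len(r)) for m, r in ratios.items()}


# Two hypothetical runs of the same two-task job.
runs = [
    {"m_000000": ("type1_1", 100.0), "m_000001": ("type2_1", 50.0)},
    {"m_000000": ("type2_1", 80.0), "m_000001": ("type1_1", 60.0)},
]
for machine, (comps, perf) in sorted(relative_performance(runs).items()):
    print(f"{machine:<10} {comps:>5} {perf:5.2f} {'=' * round(perf * 20)}")
```

Because each ratio compares identical work, differences in input size between tasks cancel out, which is the point of the "same task always does the same work" insight.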
  • 23. Features You Should Use • Fair Scheduler • refreshNodes, refreshQueues • Hadoop metrics • Namenode audit logging (disabled by default in 0.20) • Exclude files to decommission slave nodes
  • 24. Staffing • We're living proof that you can hire some engineers with good fundamentals but no specialized experience and throw them in the deep end (it's the TA way) • Skills to hire for: – Operations and Linux experience – General service troubleshooting – Scripting – Java – SQL (even if not using Hive) • Managing clusters which are growing 2x - 4x per year takes 1-2 people working full time just to run in place
  • 25. Open Questions • Resuming of jobs on jobtracker restart • Reloading of configurations without a restart • Robust response to cluster OOM conditions • Disabling job submission while allowing existing jobs to finish • Please tell us if you have the answers!
  • 27. Appendix: This is for you to read later after downloading the presentation
  • 29. DRBD Configuration
      global { usage-count no; minor-count 1; }
      common {
        protocol C;
        syncer { rate 90M; }
      }
      resource internal {
        startup {
          wfc-timeout 600;
          degr-wfc-timeout 60;
        }
        disk {
          on-io-error detach;
        }
        net {
          # timeout 60;
          # connect-int 10;
          # ping-int 10;
          # max-buffers 2048;
          # max-epoch-size 2048;
        }
        on master01.tripadvisor.com {
          device /dev/drbd0;
          disk /dev/sda3;
          address 10.0.0.1:7789;
          flexible-meta-disk internal;
        }
        on master02.tripadvisor.com {
          device /dev/drbd0;
          disk /dev/sda3;
          address 10.0.0.2:7789;
          flexible-meta-disk internal;
        }
      }
  • 30. Corosync Configuration
      compatibility: whitetank
      totem {
        version: 2
        secauth: off
        threads: 0
        interface {
          ringnumber: 0
          bindnetaddr: 10.0.0.0
          mcastaddr: 239.0.0.11
          mcastport: 5415
        }
      }
      logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
          subsys: AMF
          debug: off
        }
      }
      amf {
        mode: disabled
      }
      aisexec {
        user: root
        group: root
      }
      service {
        name: pacemaker
        ver: 0
      }
  • 31. Pacemaker Configuration
      node master01.tripadvisor.com attributes standby="off"
      node master02.tripadvisor.com attributes standby="off"
      property $id="cib-bootstrap-options" stonith-enabled="false" no-quorum-policy="ignore" expected-quorum-votes="2" dc-version="1.0.12-unknown" cluster-infrastructure="openais" last-lrm-refresh="1337718104"
      rsc_defaults $id="rsc-options" resource-stickiness="100"
      primitive DataStore ocf:linbit:drbd params drbd_resource="internal" op start interval="0" timeout="240s" op stop interval="0" timeout="100s"
      primitive fs_DataStore ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/data/internal" fstype="ext3" op monitor interval="60s" timeout="40s" op start interval="0" timeout="60s" op stop interval="0" timeout="60s"
      ms Cluster DataStore meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
      colocation fs-with-drbd inf: fs_DataStore Cluster:Master
      order drdb-fs inf: Cluster:promote fs_DataStore:start
      primitive MasterIP ocf:heartbeat:IPaddr2 params ip="192.168.236.10" nic="bond0" op monitor interval="30s"
      colocation ip-with-drbd inf: MasterIP Cluster:Master
      order fs-ip inf: fs_DataStore MasterIP
      primitive NameNode lsb:hadoop-0.20-namenode op monitor interval="30s" meta target-role="Started"
      colocation namenode-with-fs inf: NameNode fs_DataStore
      order ip-namenode inf: MasterIP NameNode
      primitive JobTracker lsb:hadoop-0.20-jobtracker op monitor interval="30s" meta target-role="Started"
      colocation jobtracker-with-fs inf: JobTracker fs_DataStore
      order namenode-jobtracker inf: NameNode JobTracker
  • 32. Pacemaker Configuration (cont.)
      primitive SecondaryNameNode lsb:hadoop-0.20-secondarynamenode op monitor interval="30s" meta target-role="Started"
      colocation secondarynamenode-not-with-ip -inf: SecondaryNameNode MasterIP
      order jobtracker-secnamenode inf: JobTracker SecondaryNameNode
      primitive Mysql ocf:heartbeat:mysql params datadir="/data/internal/mysql" socket="/data/internal/mysql/mysql.sock" binary="/usr/bin/mysqld_safe" op monitor interval="30s" timeout="30s" op start interval="0" timeout="120s" op stop interval="0" timeout="120s" meta target-role="Started"
      colocation mysql-with-fs inf: Mysql fs_DataStore
      order ip-mysql inf: MasterIP Mysql
      primitive HiveThrift lsb:hive-thrift op monitor interval="30s" meta target-role="Started"
      colocation hivethrift-with-ip inf: HiveThrift MasterIP
      order jobtracker-hivethrift inf: JobTracker HiveThrift
      order mysql-hivethrift inf: Mysql HiveThrift
      primitive Oozie lsb:oozie op monitor interval="30s" meta target-role="Started"
      colocation oozie-with-fs inf: Oozie MasterIP
      order jobtracker-oozie inf: JobTracker Oozie
      primitive PingNodes ocf:pacemaker:ping params host_list="192.168.236.1 192.168.236.2 192.168.236.5" multiplier="100" op start interval="0" timeout="60s" op monitor interval="30s" timeout="60s"
      clone PingClone PingNodes meta interleave="true"
      location ping-with-ip MasterIP rule $id="ping-with-ip-rule" pingd: defined pingd
      location prefer-master01.tripadvisor.com MasterIP rule $id="prefer-master01.tripadvisor.com-rule" 50: #uname eq master01.tripadvisor.com
      order ip-ping inf: MasterIP PingClone
  • 33. Nagios Checks check_apt check_breeze check_by_ssh check_checkup_metric check_clamd check_cluster check_cronjobs check_crontabs check_dhcp check_dig check_disk check_disk_smb check_disk_writable check_dns check_dummy check_fbrs check_file_age check_files_age check_filesystems check_flexlm check_ftp check_gc check_hadoop_master_logfiles check_hdp_connectivity check_hdp_data_nodes check_hdp_hdfs check_hdp_max_mr_settings check_hive check_hive_nsc check_hive_server check_http check_icmp check_ide_smart check_ifoperstatus check_ifstatus check_imap check_ircd check_jabber check_load check_local_mail check_log check_log_updated check_mailq check_memcached check_minerva check_mrtg check_mrtgtraf check_mysql_repl check_nagios check_nntp check_nntps check_nrpe check_nt check_ntp check_ntp_peer check_ntp_time check_nwstat check_oracle check_overcr check_ping check_pop check_proc_filehandles check_procs check_real check_rpc check_sensors check_simap check_smtp check_spop check_ssh check_ssmtp check_swap check_swapping check_sys_filehandles check_ta_services check_tcp check_time check_udp check_ups check_users check_wave check_writeable_tmp
  • 34. Example Oozie Query
      SELECT a.todaystatus as today, a.yesterdaystatus as yday, j.status as parent,
             j.app_name, a.last_modified_time, a.nominal_time, a.id
      FROM (
        SELECT t.status as todaystatus, y.status as yesterdaystatus,
               COALESCE(t.id, y.id) AS id, y.job_id,
               COALESCE(t.nominal_time, y.nominal_time) AS nominal_time,
               COALESCE(t.last_modified_time, y.last_modified_time) AS last_modified_time
        FROM (SELECT * FROM COORD_ACTIONS
              WHERE TIMESTAMPDIFF(DAY, last_modified_time, now()) = 0) t
        RIGHT OUTER JOIN
             (SELECT * FROM COORD_ACTIONS
              WHERE TIMESTAMPDIFF(DAY, last_modified_time, now()) = 1) y
          ON (t.job_id=y.job_id)
        WHERE COALESCE(t.status, '') NOT IN ('SUCCEEDED', 'WAITING')
           -- If they're WAITING today, then make sure yesterday ran OK.
           OR (t.status = 'WAITING' and y.status <> 'SUCCEEDED')
        UNION DISTINCT
        -- Dummy record to force the table to exist even when empty, since MySQL
        -- otherwise emits nothing if data is not returned.
        SELECT 'EMPTY', 'RECORD', '', '', '', 'THIS IS A DUMMY RECORD'
      ) a
      LEFT OUTER JOIN COORD_JOBS j ON a.job_id=j.id
      WHERE j.status = 'RUNNING' OR j.status IS NULL
      ;