Hadoop - A case study HSBC


By soeperbaby

Hadoop Application Track - Financial Industry Case Study HSBC

Introducing new technology to the enterprise in three steps:

  • Learn: Proof of Concept
  • Plan: Business Value
  • Build: Pilot Projects / Strategic Stack
[Slides 1 to 21 of the HSBC case study presentation]

What is new in Hadoop 2.3

The community as a whole has invested heavily in making the NameNode highly available and in adding Federation and Snapshots. Hadoop 2.3 has now been released; it builds on Hadoop 2.2, the General Availability (GA) release of the Hadoop 2.x series.

Significant highlights of the Hadoop 2.x series, all available in Hadoop 2.3:

  • YARN: A general-purpose resource management system for Hadoop that allows MapReduce and other data processing frameworks and services to share the resources of one cluster.
  • High Availability for HDFS: The HDFS High Availability feature provides the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to the standby NameNode if a machine crashes, or a graceful, administrator-initiated failover for planned maintenance.
  • HDFS Federation: To scale the name service horizontally, Federation uses multiple independent NameNodes/namespaces. The NameNodes are federated; that is, they are independent and do not require coordination with each other. The DataNodes are used as common storage for blocks by all the NameNodes. Each DataNode registers with all the NameNodes in the cluster, sends them periodic heartbeats and block reports, and handles commands from them. Key benefits are:
    • Namespace Scalability - Large deployments that use many small files benefit from scaling the namespace by adding more NameNodes to the cluster.
    • Performance - Adding more NameNodes to the cluster scales the throughput of file system read/write operations.
    • Isolation - With multiple NameNodes, different categories of applications and users can be isolated in different namespaces.
  • HDFS Snapshots: HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.
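As a rough illustration of what a general-purpose resource manager such as YARN does, the toy Python sketch below places container requests from applications of different frameworks onto worker nodes according to free memory and virtual cores. It is a simplified first-fit model written for illustration, not YARN's actual scheduler (YARN uses pluggable schedulers such as the Capacity and Fair Schedulers); the class names, node names, and application names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker machine with free memory (MB) and virtual cores."""
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class ContainerRequest:
    """One container asked for by an application of any framework."""
    app: str  # hypothetical names, e.g. "mapreduce-job-1" or "spark-app-7"
    memory_mb: int
    vcores: int


@dataclass
class ToyResourceManager:
    """First-fit allocator shared by all frameworks (a toy, not YARN's scheduler)."""
    nodes: list[Node]
    allocations: list[tuple[str, str]] = field(default_factory=list)

    def allocate(self, req: ContainerRequest) -> str | None:
        """Place the request on the first node with enough free resources."""
        for node in self.nodes:
            if node.free_mb >= req.memory_mb and node.free_vcores >= req.vcores:
                node.free_mb -= req.memory_mb
                node.free_vcores -= req.vcores
                self.allocations.append((req.app, node.name))
                return node.name
        return None  # no capacity right now; a real scheduler would queue the request


if __name__ == "__main__":
    rm = ToyResourceManager([Node("node1", 4096, 4), Node("node2", 2048, 2)])
    print(rm.allocate(ContainerRequest("mapreduce-job-1", 3072, 2)))  # node1
    print(rm.allocate(ContainerRequest("spark-app-7", 2048, 2)))      # node2
    print(rm.allocate(ContainerRequest("spark-app-7", 2048, 1)))      # None
```

First-fit keeps the sketch short; the point is only that a single manager arbitrates cluster resources for MapReduce and non-MapReduce applications alike.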
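The HDFS High Availability setup from the list above is driven by hdfs-site.xml properties. The sketch below generates the core properties for one HA nameservice that shares its edit log through a Quorum Journal Manager. The property names follow the HDFS High Availability (QJM) documentation, while the nameservice ID, hostnames, and port numbers (assumed to be the Hadoop 2.x defaults 8020, 50070, and 8485) are placeholders for illustration. A complete deployment also needs settings this sketch leaves out, such as a fencing method (dfs.ha.fencing.methods) and fs.defaultFS pointing at the nameservice.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def ha_properties(nameservice: str, namenodes: dict[str, str],
                  journal_hosts: list[str]) -> dict[str, str]:
    """Core hdfs-site.xml properties for one HA nameservice (QJM-based).

    namenodes maps a NameNode ID such as "nn1" to its hostname. Ports are
    assumed Hadoop 2.x defaults: 8020 (RPC), 50070 (HTTP), 8485 (JournalNode).
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
    }
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
        props[f"dfs.namenode.http-address.{nameservice}.{nn_id}"] = f"{host}:50070"
    # The Active NameNode writes its edit log to a quorum of JournalNodes;
    # the Standby reads it continuously, which keeps it a hot standby.
    journals = ";".join(f"{host}:8485" for host in journal_hosts)
    props["dfs.namenode.shared.edits.dir"] = f"qjournal://{journals}/{nameservice}"
    # Clients use this provider to locate whichever NameNode is currently active.
    props[f"dfs.client.failover.proxy.provider.{nameservice}"] = (
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    )
    return props


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration>/<property> file layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    props = ha_properties("mycluster", {"nn1": "nn-host1", "nn2": "nn-host2"},
                          ["jn-host1", "jn-host2", "jn-host3"])
    print(to_hadoop_xml(props))
```

With both NameNodes running, a graceful administrator-initiated failover for planned maintenance can then be requested with hdfs haadmin -failover nn1 nn2.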
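To make the HDFS Federation description more concrete, here is a toy in-memory model (not Hadoop code) of how NameNodes and DataNodes relate. Each NameNode manages its own block pool without talking to the others, while every DataNode registers with all NameNodes, stores blocks from every pool, and sends each NameNode a block report covering only that NameNode's pool. Heartbeats and NameNode commands are left out, and the block pool and block IDs are simplified, hypothetical values.

```python
from __future__ import annotations

from collections import defaultdict


class NameNode:
    """Owns one namespace and one block pool; knows nothing about other NameNodes."""

    def __init__(self, block_pool_id: str):
        self.block_pool_id = block_pool_id
        self.registered: set[str] = set()
        self.block_locations: dict[str, set[str]] = defaultdict(set)

    def register(self, datanode_id: str) -> None:
        self.registered.add(datanode_id)

    def receive_block_report(self, datanode_id: str, blocks: list[str]) -> None:
        for block in blocks:
            self.block_locations[block].add(datanode_id)


class DataNode:
    """Common storage: holds blocks from every block pool in the cluster."""

    def __init__(self, datanode_id: str, namenodes: list[NameNode]):
        self.datanode_id = datanode_id
        self.namenodes = namenodes
        self.blocks: dict[str, list[str]] = defaultdict(list)  # pool id -> block ids
        for nn in namenodes:  # register with every NameNode in the cluster
            nn.register(datanode_id)

    def store(self, block_pool_id: str, block_id: str) -> None:
        self.blocks[block_pool_id].append(block_id)

    def send_block_reports(self) -> None:
        # Each NameNode only hears about the blocks in its own pool.
        for nn in self.namenodes:
            nn.receive_block_report(self.datanode_id, self.blocks[nn.block_pool_id])


if __name__ == "__main__":
    nn_a, nn_b = NameNode("BP-a"), NameNode("BP-b")
    dn1, dn2 = DataNode("dn1", [nn_a, nn_b]), DataNode("dn2", [nn_a, nn_b])
    dn1.store("BP-a", "blk_1")
    dn2.store("BP-a", "blk_1")  # a replica of the same block on a second DataNode
    dn1.store("BP-b", "blk_2")
    for dn in (dn1, dn2):
        dn.send_block_reports()
    print(sorted(nn_a.block_locations["blk_1"]))  # ['dn1', 'dn2']
```

Because the NameNodes share nothing but the DataNodes, adding another NameNode (and pool) grows the namespace without any coordination between NameNodes.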
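For the snapshot use cases above (backup and protection against user errors), the day-to-day workflow runs through the hdfs command-line tool. The Python helpers below only build the argument lists for three commands: hdfs dfsadmin -allowSnapshot, hdfs dfs -createSnapshot, and an hdfs dfs -cp out of the read-only .snapshot directory to recover a file that was deleted by mistake. The directory /data/trades, the snapshot name, and the file path are hypothetical, and actually executing the commands assumes the hdfs CLI is on PATH and can reach a running cluster.

```python
from __future__ import annotations

import posixpath
import subprocess


def allow_snapshot(dir_path: str) -> list[str]:
    """Admin step: mark a directory as snapshottable."""
    return ["hdfs", "dfsadmin", "-allowSnapshot", dir_path]


def create_snapshot(dir_path: str, name: str) -> list[str]:
    """Take a read-only, point-in-time snapshot of the directory subtree."""
    return ["hdfs", "dfs", "-createSnapshot", dir_path, name]


def restore_file(dir_path: str, name: str, relative_file: str) -> list[str]:
    """Recover a file deleted by mistake by copying it back out of a snapshot.

    Snapshot contents are exposed under the read-only <dir>/.snapshot/<name> path.
    """
    src = posixpath.join(dir_path, ".snapshot", name, relative_file)
    dst = posixpath.join(dir_path, relative_file)
    return ["hdfs", "dfs", "-cp", src, dst]


def run(argv: list[str]) -> None:
    """Execute one command; needs the hdfs CLI and a reachable cluster."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without a cluster.
    for argv in (allow_snapshot("/data/trades"),
                 create_snapshot("/data/trades", "daily-backup"),
                 restore_file("/data/trades", "daily-backup", "2014/01/part-00000")):
        print(" ".join(argv))
```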