Hadoop Hong Kong and China

By  Santiago Ron

Big data is a collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. Handling big data therefore requires exceptional technologies that can efficiently process large quantities of data.


"For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration." - Jimmy Guterman


Hadoop is the de facto standard framework for big data processing. By making all of your data usable, not just what sits in databases, Hadoop lets you see relationships that were hidden before and reveals answers that have always been just out of reach.


Cases of Hadoop Applications:

  • Alibaba
      • Processes various sorts of business data dumped out of databases and joins them together.
      • The joined data are then fed into iSearch.
  • Facebook
      • Uses Hadoop to store copies of internal log and dimension data sources for reporting, analytics, and machine learning.
      • Currently runs two major clusters:
          • An 1,100-machine cluster with 8,800 cores and about 12 PB of raw storage.
          • A 300-machine cluster with 2,400 cores and about 3 PB of raw storage.
          • Each (commodity) node has 8 cores and 12 TB of storage; workloads make heavy use of both the streaming and the Java APIs.
      • Built a higher-level data warehousing framework (Hive) on top of Hadoop.
  • Yahoo!
      • More than 100,000 CPUs in over 40,000 computers running Hadoop.
      • The biggest cluster, at 4,500 nodes, is used to support research for Ad Systems and Web Search.
      • Also used for scaling tests to support development of Hadoop on larger clusters.
      • Over 60% of Hadoop jobs within Yahoo! are Pig jobs.
  • More cases ......
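The Facebook case above mentions heavy use of the streaming API alongside the Java APIs. As a minimal sketch of how a Hadoop Streaming job is structured, the script below implements a word count: Hadoop runs the mapper and reducer as separate processes, piping records through stdin and stdout, with a sort (the shuffle) between them. The file name `wordcount.py` and the command in the comment are illustrative assumptions, not details from any of the deployments above.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(records):
    """Reduce phase: records arrive sorted by key; sum the counts per word."""
    keyed = (record.split("\t") for record in records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Simulate the full pipeline locally with:
    #   cat input.txt | python wordcount.py map | sort | python wordcount.py reduce
    phase = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if phase == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

On a cluster, the same script would be submitted via the Hadoop Streaming jar, with `-mapper` and `-reducer` pointing at the map and reduce invocations; the exact jar path depends on your installation.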

Hadoop delivers several key advantages:

  • Extremely cost-effective for handling big data
  • Proven at scale, so you can use it with confidence
  • High availability
  • Random access to big data and flexible secondary indexes
  • HBase, Hadoop's database, with built-in load balancing, automatic versioning, automatic failover, and built-in scalability
  • Store anything, and no information is lost
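The "automatic versioning" and "no information is lost" points refer to HBase keeping multiple timestamped versions of each cell rather than overwriting values in place. The class below is a toy in-memory model of that versioned-cell idea, written to illustrate the semantics only; it is not the real HBase API, and all names (`VersionedTable`, `max_versions`) are invented for this sketch.

```python
import itertools
from collections import defaultdict

class VersionedTable:
    """Toy model of HBase-style versioned cells: each (row, column) keeps
    up to max_versions timestamped values, newest first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._cells = defaultdict(list)   # (row, column) -> [(ts, value), ...]
        self._clock = itertools.count(1)  # stand-in for wall-clock timestamps

    def put(self, row, column, value):
        versions = self._cells[(row, column)]
        versions.insert(0, (next(self._clock), value))
        del versions[self.max_versions:]  # drop the oldest beyond the limit

    def get(self, row, column):
        """Return the newest value, as a default HBase get would."""
        versions = self._cells.get((row, column))
        return versions[0][1] if versions else None

    def versions(self, row, column):
        """Return all retained values, newest first."""
        return [value for _, value in self._cells.get((row, column), [])]
```

In real HBase the number of retained versions is configured per column family, and reads can request older versions by timestamp; this sketch only mimics that retention behavior.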

If you have any queries, please feel free to contact us now via the contact form.
