Thursday, 17 October 2013

Hadoop Distributions

Below are the companies offering commercial implementations and/or providing support for Apache Hadoop, which is the base for all the below.


  • Cloudera offers CDH (Cloudera's Distribution including Apache Hadoop) and Cloudera Enterprise.
  • Hortonworks (formed by Yahoo and Benchmark Capital), whose focus is on making Hadoop more robust and easier to install, manage and use for enterprise users. Hortonworks provides Hortonworks Data Platform (HDP).
  • MapR Technologies offers distributed filesystem and MapReduce engine, the MapR Distribution for Apache Hadoop.
  • Oracle announced the Big Data Appliance, which integrates Cloudera's Distribution Including Apache Hadoop (CDH).
  • IBM offers InfoSphere BigInsights based on Hadoop in both a basic and enterprise edition.
  • Greenplum, A Division of EMC, offers Hadoop in Community and Enterprise editions.
  • Intel - the Intel Distribution for Apache Hadoop is the product includes the Intel Manager for Apache Hadoop for managing a cluster.
  • Amazon Web Services - Amazon offers a version of Apache Hadoop on their EC2 infrastructure, sold as Amazon Elastic MapReduce.
  • VMware - Initiate Open Source project and product to enable easily and efficiently deploy and use Hadoop on virtual infrastructure.
  • Bigtop - project for the development of packaging and tests of the Apache Hadoop ecosystem.
  • DataStax - DataStax provides a product of Hadoop which fully integrates Apache Hadoop with Apache Cassandra and Apache Solr in its DataStax Enterprise platform.
  • Cascading - A popular feature-rich API for defining and executing complex and fault tolerant data processing workflows on a Apache Hadoop cluster. 
  • Mahout - Apache project using Hadoop to build scalable machine learning algorithms like canopy clustering, k-means and many more.
  • Cloudspace - uses Apache Hadoop to scale client and internal projects on Amazon's EC2 and bare metal architectures.
  • Datameer - Datameer Analytics Solution (DAS) is a Hadoop-based solution for big data analytics that includes data source integration, storage, an analytics engine and visualization.
  • Data Mine Lab - Developing solutions based on Hadoop, Mahout, HBase and Amazon Web Services.
  • Debian - A Debian package of Apache Hadoop is available.
  • HStreaming - offers real-time stream processing and continuous advanced analytics built into Hadoop, available as free community edition, enterprise edition, and cloud service.
  • Impetus
  • Karmasphere - Distributes Karmasphere Studio for Hadoop, which allows cross-version development and management of Apache Hadoop jobs.
  • Nutch - Apache Nutch, flexible web search engine software.
  • NGDATA - Makes available Lily Open Source that builds upon Hadoop, HBase and SOLR. Distributes Lily Enterprise.
  • Pentaho – Pentaho provides a complete, end-to-end open-source BI and offers an easy-to-use, graphical ETL tool that is integrated with Apache Hadoop for managing data and coordinating Hadoop related tasks in the broader context of ETL and Business Intelligence workflow.
  • Pervasive Software - Provides Pervasive DataRush, a parallel dataflow framework which improves performance of Apache Hadoop and MapReduce jobs by exploiting fine-grained parallelism on multicore servers.
  • Platform Computing - Provides an Enterprise Class MapReduce solution for Big Data Analytics with high scalability and fault tolerance. Platform MapReduce provides unique scheduling capabilities and its architecture is based on almost two decades of distributed computing research and development.
  • Sematext International - Provides consulting services around Apache Hadoop and Apache HBase, along with large-scale search using Apache Lucene, Apache Solr, and Elastic Search.
  • Talend - Talend Platform for Big Data includes support and management tools for all the major Apache Hadoop distributions. Talend Open Studio for Big Data is an Apache License Eclipse IDE, which provides a set of graphical components for HDFS, HBase, Pig, Sqoop and Hive.
  • Think Big Analytics - Offers expert consulting services specializing in Apache Hadoop, MapReduce and related data processing architectures.
  • Tresata - Financial Industry's first software platform architected from the ground up on Hadoop. Data storage, processing, analytics and visualization all done on Hadoop.
  • WANdisco is a committed member & sponsor of the Apache Software community and has active committers on several projects including Apache Hadoop

No comments:

Post a Comment