Automated, programmatic provisioning, configuration and management of Hortonworks Data Platform (HDP) clusters

Some of the problems included:

  • Company’s product required automated, programmatic provisioning, configuration and management of Hortonworks Data Platform (HDP) clusters
  • MapReduce, YARN and Oozie jobs could be run in the HDP cluster only from an edge node or through a GUI, not programmatically
  • It was unknown whether bootstrapping hosts in an HDP cluster required the GUI or could be done programmatically

Some of the solutions applied included:

  • Researching and prototyping to understand how provisioning, configuration and management of an HDP cluster and its components could be automated
  • Developing a proof of concept for bootstrapping hosts in an HDP cluster programmatically
  • Implementing on-demand, programmatic provisioning, configuration and management of HDP clusters from Ambari blueprints and integrating it into Company’s product
  • Implementing automated generation of a graph of dependencies between HDP stack components to help plan HDP cluster topologies
  • Implementing programmatic submission of MapReduce, YARN and Oozie jobs to the HDP cluster
  • Implementing programmatic transfer of data into and out of the HDP cluster via the HDFS web service (HttpFS)
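A minimal sketch of how blueprint-based provisioning and HttpFS transfers like those above might be driven from Java, assuming Java 11+ (`java.net.http`), the Ambari REST API v1 and the WebHDFS-compatible HttpFS endpoint. The class name `HdpRestSketch`, the hosts, ports, credentials and JSON bodies are hypothetical placeholders, not the product's actual code. The class only builds requests, so they can be inspected before being sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical sketch: builds REST requests for Ambari (blueprints, clusters)
 * and HttpFS (HDFS file transfer). Base URLs and credentials are placeholders.
 */
public class HdpRestSketch {

    private final String ambariBase;  // e.g. "http://ambari.example.com:8080" (hypothetical)
    private final String httpFsBase;  // e.g. "http://httpfs.example.com:14000" (hypothetical)
    private final String authHeader;  // HTTP Basic credentials for Ambari

    public HdpRestSketch(String ambariBase, String httpFsBase, String user, String password) {
        this.ambariBase = ambariBase;
        this.httpFsBase = httpFsBase;
        this.authHeader = "Basic " + Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
    }

    /** Ambari, by default, rejects modifying requests without an X-Requested-By header. */
    private HttpRequest.Builder ambari(String path) {
        return HttpRequest.newBuilder(URI.create(ambariBase + "/api/v1" + path))
                .header("Authorization", authHeader)
                .header("X-Requested-By", "ambari");
    }

    /** Registers a blueprint: stack version plus host groups and their components. */
    public HttpRequest registerBlueprint(String name, String blueprintJson) {
        return ambari("/blueprints/" + name)
                .POST(HttpRequest.BodyPublishers.ofString(blueprintJson))
                .build();
    }

    /** Creates a cluster by mapping concrete hosts onto the blueprint's host groups. */
    public HttpRequest createCluster(String clusterName, String clusterTemplateJson) {
        return ambari("/clusters/" + clusterName)
                .POST(HttpRequest.BodyPublishers.ofString(clusterTemplateJson))
                .build();
    }

    /** Uploads bytes to HDFS through HttpFS in one request (data=true skips the redirect). */
    public HttpRequest httpFsCreate(String hdfsPath, String hdfsUser, byte[] data) {
        return HttpRequest.newBuilder(httpFsUri(hdfsPath, "CREATE", hdfsUser, "&overwrite=true&data=true"))
                .header("Content-Type", "application/octet-stream")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(data))
                .build();
    }

    /** Reads a file from HDFS through HttpFS. */
    public HttpRequest httpFsOpen(String hdfsPath, String hdfsUser) {
        return HttpRequest.newBuilder(httpFsUri(hdfsPath, "OPEN", hdfsUser, "")).GET().build();
    }

    /** hdfsPath is assumed to be absolute and free of characters that need URI escaping. */
    private URI httpFsUri(String hdfsPath, String op, String hdfsUser, String extra) {
        return URI.create(httpFsBase + "/webhdfs/v1" + hdfsPath + "?op=" + op
                + "&user.name=" + URLEncoder.encode(hdfsUser, StandardCharsets.UTF_8) + extra);
    }

    public static void main(String[] args) {
        HdpRestSketch s = new HdpRestSketch("http://ambari.example.com:8080",
                "http://httpfs.example.com:14000", "admin", "admin");
        HttpRequest put = s.httpFsCreate("/tmp/hello.txt", "hdfs",
                "hello".getBytes(StandardCharsets.UTF_8));
        // Sending requires a reachable cluster, e.g.:
        // HttpClient.newHttpClient().send(put, HttpResponse.BodyHandlers.ofString());
        System.out.println(put.method() + " " + put.uri());
    }
}
```

To run a request against a live cluster, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The blueprint has to be registered before the cluster is created, because the cluster creation template refers to the blueprint by name.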

Technology stack

  • Java
  • Spring
  • OSGi
  • vSphere
  • VMware VI (vSphere) Java API
  • Hortonworks Data Platform (HDP), including:
    • Ambari
    • HDFS
    • HttpFS
    • MapReduce
    • YARN
    • ZooKeeper
    • Spark
    • Storm
    • Pig
    • Hive
    • Kafka
    • Flume
    • Oozie
    • NiFi
    • Hue
    • Zeppelin
  • SAP HANA Vora
