Dealing with Data in the Hadoop Ecosystem

Hadoop, Sqoop, and ZooKeeper

Kathleen Ting (@kate_ting), Technical Account Manager at Cloudera, and our own Andy Oram (@praxagora) sat down to discuss how to work with structured and unstructured data, and how to keep a system that crunches that data up and running.

Key highlights include:

  • Misconfigurations account for almost half of the support issues the Cloudera team sees [Discussed at 0:22]
  • ZooKeeper, the canary in the Hadoop coal mine [Discussed at 1:10]
  • Leaky clients are a common problem that ZooKeeper detects [Discussed at 2:10]
  • Sqoop is a bulk data transfer tool [Discussed at 2:47]
  • Sqoop helps to bring together structured and unstructured data [Discussed at 3:50]
  • ZooKeeper is not for storage; it is for coordination, reliability, and availability [Discussed at 4:44]

