ZooKeeper - Overview
  • Date: 2024-09-17


ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper solves this issue with its simple architecture and API, which allows developers to focus on core application logic without worrying about the distributed nature of the application.

The ZooKeeper framework was originally built at Yahoo! for accessing their applications in an easy and robust manner. Later, Apache ZooKeeper became a standard coordination service used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data.

Before moving further, it is important that we know a thing or two about distributed applications. So, let us start the discussion with a quick overview of distributed applications.

Distributed Application

A distributed application runs on multiple systems in a network simultaneously, and those systems coordinate among themselves to complete a particular task quickly and efficiently. Complex, time-consuming tasks that would take a non-distributed application (running on a single system) hours to complete can normally be done in minutes by a distributed application, because it uses the computing capabilities of all the systems involved.

The time to complete the task can be further reduced by configuring the distributed application to run on more systems. A group of systems on which a distributed application is running is called a Cluster, and each machine running in a cluster is called a Node.

A distributed application has two parts: the Server application and the Client application. Server applications are the part that is actually distributed, and they expose a common interface so that clients can connect to any server in the cluster and get the same result. Client applications are the tools used to interact with a distributed application.

[Figure: Distributed Application]

Benefits of Distributed Applications

    Reliability − Failure of a single system, or even a few systems, does not cause the whole system to fail.

    Scalability − Performance can be increased as and when needed by adding more machines, with only minor changes to the application's configuration and no downtime.

    Transparency − Hides the complexity of the system and presents itself as a single entity / application.

Challenges of Distributed Applications

    Race condition − Two or more machines try to perform a particular task that actually needs to be done by only a single machine at any given time. For example, shared resources should only be modified by a single machine at any given time.

    Deadlock − Two or more operations wait indefinitely for each other to complete.

    Inconsistency − Partial failure of data, where an update is applied only in part.
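
To make the race condition item concrete, here is a minimal single-process sketch in Python. It uses threads in place of machines and a plain `threading.Lock` in place of a distributed lock, so it is only an analogy and not ZooKeeper code. The names `SharedResource` and `run_workers` are hypothetical and chosen for illustration.

```python
import threading


class SharedResource:
    """Hypothetical shared resource that records how many workers modify it at once."""

    def __init__(self) -> None:
        self.value = 0
        self.active = 0        # workers currently inside modify()
        self.max_active = 0    # highest number of simultaneous modifiers observed
        self._lock = threading.Lock()

    def modify(self) -> None:
        # The lock ensures only one worker modifies the resource at any given time.
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            self.value += 1
            self.active -= 1


def run_workers(workers: int = 4, updates: int = 1000) -> SharedResource:
    """Start several threads (stand-ins for machines) that all update one resource."""
    resource = SharedResource()

    def work() -> None:
        for _ in range(updates):
            resource.modify()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return resource


resource = run_workers()
print(resource.value, resource.max_active)  # prints: 4000 1
```

Because every update happens while the lock is held, no update is lost and `max_active` never exceeds 1. In a real cluster the workers live on different machines, so an in-process lock is not enough and a coordination service has to play that role.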

What is Apache ZooKeeper Meant For?

Apache ZooKeeper is a service used by a cluster (group of nodes) to coordinate among themselves and maintain shared data with robust synchronization techniques. ZooKeeper is itself a distributed application that provides services for writing distributed applications.

The common services provided by ZooKeeper are as follows −

    Naming service − Identifying the nodes in a cluster by name. It is similar to DNS, but for nodes.

    Configuration management − Latest and up-to-date configuration information of the system for a joining node.

    Cluster management − Joining / leaving of a node in a cluster and node status in real time.

    Leader election − Electing a node as leader for coordination purposes.

    Locking and synchronization service − Locking the data while modifying it. This mechanism helps with automatic failure recovery when connecting to other distributed applications like Apache HBase.

    Highly reliable data registry − Availability of data even when one or a few nodes are down.
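
The cluster management and leader election services listed above can be illustrated with a small in-memory sketch. It loosely mirrors the commonly described ZooKeeper recipe in which each member registers an ephemeral sequential node and the member holding the lowest sequence number acts as leader. The class `ToyElection` and its methods are hypothetical; this is not the ZooKeeper client API and it involves no network or real failure detection.

```python
import itertools
from typing import Dict, Optional


class ToyElection:
    """In-memory stand-in for a coordination service (illustrative only)."""

    def __init__(self) -> None:
        self._sequence = itertools.count()   # ever-increasing sequence numbers
        self._members: Dict[int, str] = {}   # sequence number -> node name

    def join(self, name: str) -> int:
        """Register a node and return the sequence number assigned to it."""
        seq = next(self._sequence)
        self._members[seq] = name
        return seq

    def leave(self, seq: int) -> None:
        """Remove a node, as would happen when it shuts down or its session ends."""
        self._members.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The member with the lowest sequence number is treated as leader."""
        if not self._members:
            return None
        return self._members[min(self._members)]


election = ToyElection()
a = election.join("node-a")
b = election.join("node-b")
c = election.join("node-c")
print(election.leader())  # prints: node-a

election.leave(a)         # the current leader leaves the cluster
print(election.leader())  # prints: node-b
```

The point of the sketch is that membership and leadership are both derived from one shared registry, so every node that reads it reaches the same answer about who the leader is.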

Distributed applications offer a lot of benefits, but they also pose a few complex and hard-to-crack challenges. The ZooKeeper framework provides a complete mechanism to overcome these challenges. Race conditions and deadlocks are handled using a fail-safe synchronization approach. Another main drawback is inconsistency of data, which ZooKeeper resolves with atomicity.

Benefits of ZooKeeper

Here are the benefits of using ZooKeeper −

    Simple distributed coordination process

    Synchronization − Mutual exclusion and cooperation between server processes. Apache HBase relies on this for configuration management.

    Ordered Messages

    Serialization − Encodes data according to specific rules, which ensures that your application runs consistently. This approach can be used in MapReduce to coordinate queues for executing running threads.

    Reliability

    Atomicity − A data transfer either succeeds or fails completely; no transaction is partial.
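
As a rough illustration of the atomicity item, the following Python sketch applies a batch of updates to an in-memory store so that either all of them take effect or none do. `ToyStore` and `apply_all` are hypothetical names, and rejecting `None` values is an arbitrary rule assumed here only to trigger a failure. It does not show how ZooKeeper implements atomicity internally.

```python
from typing import Any, Dict, Iterable, Tuple


class ToyStore:
    """In-memory key-value store with all-or-nothing batch updates (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def apply_all(self, updates: Iterable[Tuple[str, Any]]) -> None:
        """Apply every (key, value) update, or none of them if any one is invalid."""
        staged = dict(self._data)  # work on a copy so failures leave the store untouched
        for key, value in updates:
            if value is None:      # assumed validation rule for this sketch
                raise ValueError(f"invalid value for {key!r}")
            staged[key] = value
        self._data = staged        # commit in one step, only after every update passed

    def get(self, key: str) -> Any:
        return self._data.get(key)


store = ToyStore()
store.apply_all([("host", "10.0.0.1"), ("port", 2181)])

try:
    # The second update is invalid, so the first one must not be applied either.
    store.apply_all([("host", "10.0.0.2"), ("port", None)])
except ValueError:
    pass

print(store.get("host"), store.get("port"))  # prints: 10.0.0.1 2181
```

After the failed batch the store still holds the values from the first batch, which is the "no transaction is partial" property in miniature.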
