Apache Flume - configuration
  • Date: 2024-12-22




After installing Flume, we need to configure it using its configuration file, which is a Java properties file made up of key-value pairs. Configuring Flume means assigning values to the keys in this file.

In the Flume configuration file, we need to −

    Name the components of the current agent.

    Describe/Configure the source.

    Describe/Configure the sink.

    Describe/Configure the channel.

    Bind the source and the sink to the channel.

Flume can run multiple agents. Each agent is identified by a unique name, and every property that configures an agent is prefixed with that name.

Naming the Components

First of all, you need to name/list the components of the agent (its sources, sinks, and channels), as shown below.

agent_name.sources = source_name 
agent_name.sinks = sink_name 
agent_name.channels = channel_name 
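
Because every key starts with the agent name, one file can hold the component lists of more than one agent. The fragment below is a minimal sketch of that idea; the agent names `agentA` and `agentB` and the component names `src1`, `ch1`, and `sink1` are hypothetical and chosen only for illustration.

```properties
# Hypothetical agent "agentA": its keys all start with "agentA."
agentA.sources = src1
agentA.channels = ch1
agentA.sinks = sink1

# Hypothetical agent "agentB": a separate set of keys starting with "agentB."
# The component names are repeated here on the assumption that they only
# need to be unique within one agent, since the agent name prefixes each key.
agentB.sources = src1
agentB.channels = ch1
agentB.sinks = sink1
```

When an agent is started, the name passed on the command line selects which of these prefixes is read.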

Flume supports various sources, sinks, and channels. They are listed in the table given below.

Sources                       Channels                      Sinks
----------------------------  ----------------------------  ------------------
Avro Source                   Memory Channel                HDFS Sink
Thrift Source                 JDBC Channel                  Hive Sink
Exec Source                   Kafka Channel                 Logger Sink
JMS Source                    File Channel                  Avro Sink
Spooling Directory Source     Spillable Memory Channel      Thrift Sink
Twitter 1% firehose Source    Pseudo Transaction Channel    IRC Sink
Kafka Source                                                File Roll Sink
NetCat Source                                               Null Sink
Sequence Generator Source                                   HBaseSink
Syslog Sources                                              AsyncHBaseSink
Syslog TCP Source                                           MorphlineSolrSink
Multiport Syslog TCP Source                                 ElasticSearchSink
Syslog UDP Source                                           Kite Dataset Sink
HTTP Source                                                 Kafka Sink
Stress Source
Legacy Sources
Thrift Legacy Source
Custom Source
Scribe Source

You can use any of them. For example, if you are transferring Twitter data from a Twitter source through a memory channel to an HDFS sink, and the agent name is TwitterAgent, then the components are named as follows.

TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel 
TwitterAgent.sinks = HDFS 

After listing the components of the agent, you have to describe the source(s), sink(s), and channel(s) by providing values for their properties.

Describing the Source

Each source has its own list of properties. The property named “type” is common to every source, and it specifies the type of the source we are using.

Along with the property “type”, you must provide values for all the required properties of the particular source, as shown below.

agent_name.sources.source_name.type = value
agent_name.sources.source_name.property2 = value
agent_name.sources.source_name.property3 = value

For example, for the Twitter source, the following are the properties that must be given values. The credential values are left blank here and must be filled in with your own.

# "Twitter" stands for the type name of the source
TwitterAgent.sources.Twitter.type = Twitter
TwitterAgent.sources.Twitter.consumerKey =
TwitterAgent.sources.Twitter.consumerSecret =
TwitterAgent.sources.Twitter.accessToken =
TwitterAgent.sources.Twitter.accessTokenSecret =

Describing the Sink

Just like the source, each sink has its own list of properties. The property named “type” is common to every sink, and it specifies the type of the sink we are using. Along with the property “type”, you must provide values for all the required properties of the particular sink, as shown below.

agent_name.sinks.sink_name.type = value
agent_name.sinks.sink_name.property2 = value
agent_name.sinks.sink_name.property3 = value

For example, for the HDFS sink, the following are the properties that must be given values.

# "hdfs" is the type name of the HDFS sink
TwitterAgent.sinks.HDFS.type = hdfs
# replace the placeholder with the HDFS directory in which to store the data
TwitterAgent.sinks.HDFS.hdfs.path = <HDFS directory path>
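
Beyond the required properties, a sink usually exposes optional ones. The fragment below is a hedged sketch of a few optional HDFS sink properties as described in the Flume 1.x user guide; the values are illustrative assumptions rather than recommendations, and the path is deliberately left as a placeholder.

```properties
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = <HDFS directory path>

# Optional: write plain event data instead of the default SequenceFile format
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

# Optional: number of events written to a file before it is flushed to HDFS
# (illustrative value)
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000

# Optional: roll the current file after this many seconds (illustrative value)
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
```

Check the user guide for the Flume version you run before relying on any of these keys or their defaults.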

Describing the Channel

Flume provides various channels to transfer data between sources and sinks. Therefore, along with the sources and the sinks, you must describe the channel used in the agent.

To describe each channel, you need to set the required properties, as shown below.

agent_name.channels.channel_name.type = value
agent_name.channels.channel_name.property2 = value
agent_name.channels.channel_name.property3 = value

For example, for the memory channel, the only property that must be given a value is its type.

# "memory" is the type name of the memory channel
TwitterAgent.channels.MemChannel.type = memory
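
The memory channel can also be sized explicitly. The fragment below is a small sketch using the optional `capacity` and `transactionCapacity` properties from the Flume 1.x user guide; the numbers are illustrative assumptions and should be tuned to the event rate and available heap.

```properties
TwitterAgent.channels.MemChannel.type = memory

# Optional: maximum number of events held in the channel (illustrative value)
TwitterAgent.channels.MemChannel.capacity = 10000

# Optional: maximum number of events per transaction between the channel
# and a source or sink (illustrative value; it should not exceed capacity)
TwitterAgent.channels.MemChannel.transactionCapacity = 100
```

Events in a memory channel live only in the agent's memory, so they are lost if the agent process stops.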

Binding the Source and the Sink to the Channel

Since channels connect sources and sinks, both must be bound to the channel, as shown below. Note that a source uses the plural key “channels”, while a sink uses the singular key “channel”.

agent_name.sources.source_name.channels = channel_name
agent_name.sinks.sink_name.channel = channel_name
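
The difference between the two binding keys matters when an agent has more than one channel: in Flume 1.x a source may list several channels under `channels`, whereas each sink reads from exactly one channel named under the singular key `channel`. The sketch below illustrates this with hypothetical names (`agentA`, `src1`, `ch1`, `ch2`, `sink1`, `sink2`) that do not appear elsewhere in this chapter.

```properties
# Hypothetical agent with one source, two channels, and two sinks
agentA.sources = src1
agentA.channels = ch1 ch2
agentA.sinks = sink1 sink2

# The source writes to both channels (space-separated list, plural key)
agentA.sources.src1.channels = ch1 ch2

# Each sink drains exactly one channel (singular key)
agentA.sinks.sink1.channel = ch1
agentA.sinks.sink2.channel = ch2
```

The type and other required properties of each component are omitted here to keep the focus on the bindings.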

The following example shows how to bind the source and the sink to a channel. Here, we consider the Twitter source, the memory channel, and the HDFS sink. The source is bound with the plural key “channels” and the sink with the singular key “channel”.

TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
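
Putting the pieces together, the whole agent fits in one properties file. The following is a sketch of what such a file might look like, assembled from the fragments in this chapter. Two assumptions are labeled in the comments: the source type is written as the fully qualified class name that the Flume 1.x user guide gives for its experimental Twitter source, and the credentials and HDFS path are left as placeholders to be filled in.

```properties
# Sketch of a complete configuration file for the agent "TwitterAgent"

# 1. Name the components of the agent
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# 2. Describe the source
# Assumption: class name of the experimental Twitter source in Flume 1.x
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
# Placeholders: supply your own credentials
TwitterAgent.sources.Twitter.consumerKey = <consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
TwitterAgent.sources.Twitter.accessToken = <access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>

# 3. Describe the sink
TwitterAgent.sinks.HDFS.type = hdfs
# Placeholder: HDFS directory in which to store the data
TwitterAgent.sinks.HDFS.hdfs.path = <HDFS directory path>

# 4. Describe the channel
TwitterAgent.channels.MemChannel.type = memory

# 5. Bind the source and the sink to the channel
# (plural "channels" for the source, singular "channel" for the sink)
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
```

The order of the keys in the file does not matter; grouping them by step simply makes the file easier to read.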

Starting a Flume Agent

After configuration, we have to start the Flume agent. It is done as follows −

$ bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf \
    -Dflume.root.logger=DEBUG,console -n TwitterAgent

where −

    agent − Command to start the Flume agent

    --conf, -c <conf> − Use the configuration files in the <conf> directory

    -f <file> − Specifies the path of the agent's configuration file

    --name, -n <name> − Name of the agent to start (TwitterAgent in this example)

    -Dproperty=value − Sets a Java system property value
