Apache Flume - Configuration
After installing Flume, we need to configure it using a configuration file, which is a Java properties file made up of key-value pairs. We need to assign values to the keys in this file.
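Because the configuration file uses the standard Java properties format, it can be read with `java.util.Properties`. The following minimal sketch parses a few lines of configuration text and looks up one key. The class name `FlumeConfigParse` is hypothetical, and this is not Flume's own configuration loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class FlumeConfigParse {
    // Parses configuration text written in the Java properties (key = value) format.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# Lines starting with # are comments",
                "TwitterAgent.sources = Twitter",
                "TwitterAgent.channels = MemChannel",
                "TwitterAgent.sinks = HDFS");
        Properties props = parse(text);
        // The spaces around '=' belong to neither the key nor the value.
        System.out.println(props.getProperty("TwitterAgent.channels")); // prints MemChannel
    }
}
```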
In the Flume configuration file, we need to −
- Name the components of the current agent.
- Describe/Configure the source.
- Describe/Configure the sink.
- Describe/Configure the channel.
- Bind the source and the sink to the channel.
Flume can run multiple agents. Each agent is distinguished by a unique name, and that name is used to configure the agent: every property key for the agent begins with it.
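To illustrate how the agent name keeps agents apart, the sketch below collects the distinct agent names in a set of properties and keeps only the keys that belong to one agent. It assumes, as in the examples on this page, that every key starts with the agent name followed by a dot. The class, the method names, and the agent names `AgentA` and `AgentB` are hypothetical.

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

class FlumeAgentNames {
    // Collects the distinct agent names: the text before the first '.' in each key.
    static Set<String> agentNames(Properties props) {
        Set<String> names = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            if (dot > 0) {
                names.add(key.substring(0, dot));
            }
        }
        return names;
    }

    // Keeps only the keys that start with "<agent>." so one agent's settings can be examined alone.
    static Properties forAgent(Properties props, String agent) {
        Properties result = new Properties();
        String prefix = agent + ".";
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.setProperty(key, props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Two hypothetical agents described in the same configuration.
        props.setProperty("AgentA.sources", "src1");
        props.setProperty("AgentA.channels", "ch1");
        props.setProperty("AgentB.sources", "src2");
        props.setProperty("AgentB.sinks", "sink2");
        System.out.println(agentNames(props)); // prints [AgentA, AgentB]
        System.out.println(forAgent(props, "AgentB").size()); // prints 2
    }
}
```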
Naming the Components
First of all, you need to name (list) the components of the agent, that is, its sources, sinks, and channels, as shown below.
agent_name.sources = source_name
agent_name.sinks = sink_name
agent_name.channels = channel_name
Flume supports various sources, sinks, and channels. They are listed in the table below.
Sources | Channels | Sinks |
---|---|---|
Avro Source | Memory Channel | HDFS Sink |
Thrift Source | JDBC Channel | Hive Sink |
Exec Source | Kafka Channel | Logger Sink |
JMS Source | File Channel | Avro Sink |
Spooling Directory Source | Spillable Memory Channel | Thrift Sink |
Twitter 1% firehose Source | Pseudo Transaction Channel | IRC Sink |
Kafka Source | | File Roll Sink |
NetCat Source | | Null Sink |
Sequence Generator Source | | HBaseSink |
Syslog Sources (Syslog TCP Source, Multiport Syslog TCP Source, Syslog UDP Source) | | AsyncHBaseSink |
HTTP Source | | MorphlineSolrSink |
Stress Source | | ElasticSearchSink |
Legacy Sources (Thrift Legacy Source) | | Kite Dataset Sink |
Custom Source | | Kafka Sink |
Scribe Source | | |
You can use any of them. For example, if you are transferring Twitter data from a Twitter source through a memory channel to an HDFS sink, and the agent is named TwitterAgent, the components are named as follows:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
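A component list can hold more than one name, separated by whitespace (for example `agent_name.sources = source1 source2`). The hypothetical helper below, a sketch rather than Flume's own parser, splits such lists for the TwitterAgent example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class FlumeComponentLists {
    // Splits a whitespace-separated list such as "agent.sources = src1 src2" into names.
    static List<String> components(Properties props, String agent, String kind) {
        String value = props.getProperty(agent + "." + kind, "").trim();
        if (value.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(value.split("\\s+"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("TwitterAgent.sources", "Twitter");
        props.setProperty("TwitterAgent.channels", "MemChannel");
        props.setProperty("TwitterAgent.sinks", "HDFS");
        for (String kind : List.of("sources", "channels", "sinks")) {
            // prints e.g. "sources -> [Twitter]"
            System.out.println(kind + " -> " + components(props, "TwitterAgent", kind));
        }
    }
}
```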
After listing the components of the agent, you have to describe the source(s), sink(s), and channel(s) by providing values for their properties.
Describing the Source
Each source has its own list of properties. The property named “type” is common to every source and specifies the type of source being used.
Along with “type”, you need to provide values for all the required properties of the particular source to configure it, as shown below.
agent_name.sources.source_name.type = value
agent_name.sources.source_name.property2 = value
agent_name.sources.source_name.property3 = value
For example, for the Twitter source, the following are the properties that must be given values to configure it.
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey =
TwitterAgent.sources.Twitter.consumerSecret =
TwitterAgent.sources.Twitter.accessToken =
TwitterAgent.sources.Twitter.accessTokenSecret =
Describing the Sink
Just like a source, each sink has its own list of properties. The property named “type” is common to every sink and specifies the type of sink being used. Along with “type”, you need to provide values for all the required properties of the particular sink to configure it, as shown below.
agent_name.sinks.sink_name.type = value
agent_name.sinks.sink_name.property2 = value
agent_name.sinks.sink_name.property3 = value
For example, for the HDFS sink, the following are the properties that must be given values to configure it.
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = HDFS directory's path to store the data
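Since every property of a component shares the prefix `agent_name.kind.component_name.`, the properties of one source or sink can be gathered by stripping that prefix. The sketch below does this with `java.util.Properties`. The class and method names are hypothetical, the consumer key is a placeholder, and the HDFS path is a made-up value. Note that a stripped key can itself contain dots, as with the HDFS sink's `hdfs.path`.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class FlumeComponentProperties {
    // Returns one component's properties with the "agent.kind.name." prefix removed,
    // e.g. "TwitterAgent.sinks.HDFS.hdfs.path" becomes "hdfs.path".
    static Map<String, String> componentProperties(
            Properties props, String agent, String kind, String name) {
        String prefix = agent + "." + kind + "." + name + ".";
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Type class name of Flume's Twitter 1% firehose source.
        props.setProperty("TwitterAgent.sources.Twitter.type",
                "org.apache.flume.source.twitter.TwitterSource");
        props.setProperty("TwitterAgent.sources.Twitter.consumerKey", "YOUR_CONSUMER_KEY"); // placeholder
        props.setProperty("TwitterAgent.sinks.HDFS.type", "hdfs");
        props.setProperty("TwitterAgent.sinks.HDFS.hdfs.path", "/user/flume/tweets"); // hypothetical path
        // prints {consumerKey=YOUR_CONSUMER_KEY, type=org.apache.flume.source.twitter.TwitterSource}
        System.out.println(componentProperties(props, "TwitterAgent", "sources", "Twitter"));
        // prints {hdfs.path=/user/flume/tweets, type=hdfs}
        System.out.println(componentProperties(props, "TwitterAgent", "sinks", "HDFS"));
    }
}
```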
Describing the Channel
Flume provides various channels to transfer data between sources and sinks. Therefore, along with the sources and the sinks, you need to describe the channel used in the agent.
To describe each channel, you need to set the required properties, as shown below.
agent_name.channels.channel_name.type = value
agent_name.channels.channel_name.property2 = value
agent_name.channels.channel_name.property3 = value
For example, for the memory channel, the following property must be given a value to configure it.
TwitterAgent.channels.MemChannel.type = memory
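Because “type” is required for every source, sink, and channel, a simple consistency check is to confirm that each listed component has a `type` key. The following sketch, with hypothetical class and method names and not a validator that ships with Flume, reports the missing `type` keys for the TwitterAgent components, with the sink's type deliberately omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class FlumeTypeCheck {
    // Reports the "type" key of every listed source, channel, or sink that has no type.
    static List<String> missingTypes(Properties props, String agent) {
        List<String> missing = new ArrayList<>();
        for (String kind : List.of("sources", "channels", "sinks")) {
            String names = props.getProperty(agent + "." + kind, "").trim();
            if (names.isEmpty()) {
                continue;
            }
            for (String name : names.split("\\s+")) {
                String typeKey = agent + "." + kind + "." + name + ".type";
                if (props.getProperty(typeKey) == null) {
                    missing.add(typeKey);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("TwitterAgent.sources", "Twitter");
        props.setProperty("TwitterAgent.channels", "MemChannel");
        props.setProperty("TwitterAgent.sinks", "HDFS");
        props.setProperty("TwitterAgent.sources.Twitter.type",
                "org.apache.flume.source.twitter.TwitterSource");
        props.setProperty("TwitterAgent.channels.MemChannel.type", "memory");
        // The sink type is left out on purpose to show what gets reported.
        System.out.println(missingTypes(props, "TwitterAgent")); // prints [TwitterAgent.sinks.HDFS.type]
    }
}
```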
Binding the Source and the Sink to the Channel
Since channels connect sources to sinks, both the source and the sink have to be bound to a channel, as shown below. A source uses the plural key channels, because it can write to more than one channel (listed separated by spaces), while a sink uses the singular key channel, because it reads from exactly one channel.
agent_name.sources.source_name.channels = channel_name
agent_name.sinks.sink_name.channel = channel_name
The following example shows how to bind the source and the sink to a channel. Here, we consider the Twitter source, the memory channel, and the HDFS sink.
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
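The binding rules can be checked mechanically. This sketch, with hypothetical class and method names and not part of Flume itself, verifies that every source lists at least one declared channel under `channels` and that every sink names a declared channel under `channel`. The second half of `main` shows that writing the plural `channels` on a sink leaves it unbound as far as this check is concerned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

class FlumeBindingCheck {
    // Reads a whitespace-separated list of names; a missing key yields an empty set.
    static Set<String> names(Properties props, String key) {
        String value = props.getProperty(key, "").trim();
        Set<String> result = new LinkedHashSet<>();
        if (!value.isEmpty()) {
            result.addAll(Arrays.asList(value.split("\\s+")));
        }
        return result;
    }

    // Describes every source or sink that is not bound to a declared channel.
    static List<String> bindingProblems(Properties props, String agent) {
        Set<String> channels = names(props, agent + ".channels");
        List<String> problems = new ArrayList<>();
        for (String source : names(props, agent + ".sources")) {
            // A source lists its channels under the plural key "channels".
            Set<String> bound = names(props, agent + ".sources." + source + ".channels");
            if (bound.isEmpty()) {
                problems.add("source " + source + " has no channel");
            }
            for (String ch : bound) {
                if (!channels.contains(ch)) {
                    problems.add("source " + source + " uses undeclared channel " + ch);
                }
            }
        }
        for (String sink : names(props, agent + ".sinks")) {
            // A sink names exactly one channel under the singular key "channel".
            String ch = props.getProperty(agent + ".sinks." + sink + ".channel", "").trim();
            if (ch.isEmpty()) {
                problems.add("sink " + sink + " has no channel");
            } else if (!channels.contains(ch)) {
                problems.add("sink " + sink + " uses undeclared channel " + ch);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("TwitterAgent.sources", "Twitter");
        props.setProperty("TwitterAgent.channels", "MemChannel");
        props.setProperty("TwitterAgent.sinks", "HDFS");
        props.setProperty("TwitterAgent.sources.Twitter.channels", "MemChannel");
        props.setProperty("TwitterAgent.sinks.HDFS.channel", "MemChannel");
        System.out.println(bindingProblems(props, "TwitterAgent")); // prints []

        // Using the plural key on the sink leaves it unbound for this check.
        props.remove("TwitterAgent.sinks.HDFS.channel");
        props.setProperty("TwitterAgent.sinks.HDFS.channels", "MemChannel");
        System.out.println(bindingProblems(props, "TwitterAgent")); // prints [sink HDFS has no channel]
    }
}
```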
Starting a Flume Agent
After configuration, we have to start the Flume agent. It is done as follows −
$ bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
where −
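- agent − Command to start a Flume agent
- --conf, -c <conf> − Use the configuration files in the <conf> directory
- --conf-file, -f <file> − Specifies the path of the agent configuration file
- --name, -n <name> − Name of the agent to start (here, TwitterAgent); it must match the agent name used in the configuration file
- -Dproperty=value − Sets a Java system property value; here, -Dflume.root.logger=DEBUG,console sends DEBUG-level log output to the console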
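If you want to launch the agent from a Java program, for example from a test harness, the same options can be passed to `ProcessBuilder`. The sketch below only assembles the argument list from the options described above; `FlumeStartCommand` and its parameters are hypothetical, and actually starting the process requires a Flume installation at the given location.

```java
import java.util.List;

class FlumeStartCommand {
    // Builds the argument list for "flume-ng agent" using the options described above.
    static List<String> command(String flumeHome, String confDir, String confFile, String agentName) {
        return List.of(
                flumeHome + "/bin/flume-ng", "agent",
                "--conf", confDir,
                "-f", confFile,
                "-Dflume.root.logger=DEBUG,console",
                "-n", agentName);
    }

    public static void main(String[] args) {
        List<String> cmd = command(".", "./conf/", "conf/twitter.conf", "TwitterAgent");
        // prints ./bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
        System.out.println(String.join(" ", cmd));
        // To actually launch the agent (requires Flume installed under flumeHome):
        // new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Passing the arguments as a list rather than one shell string avoids quoting problems, since `ProcessBuilder` hands each element to the process as a separate argument.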