Apache Flink - API Concepts
Flink has a rich set of APIs with which developers can perform transformations on both batch and real-time data. The transformations include mapping, filtering, sorting, joining, grouping and aggregating, and Apache Flink executes them on distributed data. Let us discuss the different APIs Apache Flink offers.
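Before looking at the Flink APIs themselves, the transformation vocabulary can be illustrated with a minimal sketch in plain Java streams (not the Flink API — the class and method names here are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java illustration of the transformation vocabulary:
// filtering, mapping, grouping and aggregating.
public class TransformationSketch {

   // Drop a stop word (filtering), lower-case each record (mapping),
   // then group by the word itself and count per group (aggregating).
   static Map<String, Long> countWords(List<String> words, String stopWord) {
      return words.stream()
         .filter(w -> !w.equalsIgnoreCase(stopWord))   // filtering
         .map(String::toLowerCase)                     // mapping
         .collect(Collectors.groupingBy(               // grouping
            w -> w,
            Collectors.counting()));                   // aggregating
   }

   public static void main(String[] args) {
      List<String> words = Arrays.asList("Flink", "spark", "flink", "hadoop", "FLINK");
      // counts: flink=3, spark=1 ("hadoop" is filtered out)
      System.out.println(countWords(words, "hadoop"));
   }
}
```

Flink's DataSet and DataStream APIs expose the same conceptual operations, but evaluate them lazily on distributed data rather than on an in-memory collection.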
DataSet API
The DataSet API in Apache Flink is used to perform batch operations on data collected over a period of time. It can apply different kinds of transformations on datasets, such as filtering, mapping, aggregating, joining and grouping.
Datasets are created from sources such as local files, or by reading a file from a particular source, and the result data can be written to different sinks such as distributed files or the command-line terminal. This API is supported by the Java, Scala and Python programming languages.
Here is a WordCount program using the DataSet API −

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCountProg {
   public static void main(String[] args) throws Exception {
      final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

      // Create a DataSet from a few in-memory elements.
      DataSet<String> text = env.fromElements(
         "Hello", "My Dataset API Flink Program");

      // Split each line into words, group by the word (field 0)
      // and sum the counts (field 1).
      DataSet<Tuple2<String, Integer>> wordCounts = text
         .flatMap(new LineSplitter())
         .groupBy(0)
         .sum(1);

      wordCounts.print();
   }

   public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
      @Override
      public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
         for (String word : line.split(" ")) {
            out.collect(new Tuple2<String, Integer>(word, 1));
         }
      }
   }
}
```
DataStream API
This API is used for handling data in a continuous stream. You can perform various operations such as filtering, mapping, windowing and aggregating on the stream data. Sources for a data stream include message queues, files and socket streams, and the result data can be written to different sinks such as the command-line terminal. Both the Java and Scala programming languages support this API.
Here is a streaming WordCount program using the DataStream API, which reads a continuous stream of text from a socket and emits word counts aggregated over 5-second windows.
```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowWordCountProg {
   public static void main(String[] args) throws Exception {
      StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

      // Read lines from a socket, split them into words, key by the
      // word (field 0), window into 5-second buckets and sum counts.
      DataStream<Tuple2<String, Integer>> dataStream = env
         .socketTextStream("localhost", 9999)
         .flatMap(new Splitter())
         .keyBy(0)
         .timeWindow(Time.seconds(5))
         .sum(1);

      dataStream.print();
      env.execute("Streaming WordCount Example");
   }

   public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
      @Override
      public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) throws Exception {
         for (String word : sentence.split(" ")) {
            out.collect(new Tuple2<String, Integer>(word, 1));
         }
      }
   }
}
```
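To try the streaming example locally, the usual approach (as in Flink's own quickstart) is to feed the socket with netcat in one terminal and submit the job in another. The jar name below is a placeholder for your packaged program:

```shell
# Terminal 1: open a text source on port 9999; every line you type
# is forwarded to the socket the Flink job is reading from.
nc -lk 9999

# Terminal 2: submit the compiled job to a running Flink cluster.
# (window-wordcount.jar is a hypothetical name for your built artifact.)
./bin/flink run window-wordcount.jar
```

Words typed into the netcat terminal then appear as (word, count) tuples in the job's output, refreshed once per 5-second window.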