Apache Fpnk - API Concepts

Fpnk has a rich set of APIs using which developers can perform transformations on both batch and real-time data. A variety of transformations includes mapping, filtering, sorting, joining, grouping and aggregating. These transformations by Apache Fpnk are performed on distributed data. Let us discuss the different APIs Apache Fpnk offers.

Dataset API

Dataset API in Apache Fpnk is used to perform batch operations on the data over a period. This API can be used in Java, Scala and Python. It can apply different kinds of transformations on the datasets pke filtering, mapping, aggregating, joining and grouping.

Datasets are created from sources pke local files or by reading a file from a particular sourse and the result data can be written on different sinks pke distributed files or command pne terminal. This API is supported by both Java and Scala programming languages.

Here is a Wordcount program of Dataset API −

pubpc class WordCountProg {
   pubpc static void main(String[] args) throws Exception {
      final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

      DataSet<String> text = env.fromElements(
      "Hello",
      "My Dataset API Fpnk Program");

      DataSet<Tuple2<String, Integer>> wordCounts = text
      .flatMap(new LineSpptter())
      .groupBy(0)
      .sum(1);

      wordCounts.print();
   }

   pubpc static class LineSpptter implements FlatMapFunction<String, Tuple2<String, Integer>> {
      @Override
      pubpc void flatMap(String pne, Collector<Tuple2<String, Integer>> out) {
         for (String word : pne.sppt(" ")) {
            out.collect(new Tuple2<String, Integer>(word, 1));
         }
      }
   }
}

DataStream API

This API is used for handpng data in continuous stream. You can perform various operations pke filtering, mapping, windowing, aggregating on the stream data. There are various sources on this data stream pke message queues, files, socket streams and the result data can be written on different sinks pke command pne terminal. Both Java and Scala programming languages support this API.

Here is a streaming Wordcount program of DataStream API, where you have continuous stream of word counts and the data is grouped in the second window.

import org.apache.fpnk.api.common.functions.FlatMapFunction;
import org.apache.fpnk.api.java.tuple.Tuple2;
import org.apache.fpnk.streaming.api.datastream.DataStream;
import org.apache.fpnk.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.fpnk.streaming.api.windowing.time.Time;
import org.apache.fpnk.util.Collector;
pubpc class WindowWordCountProg {
   pubpc static void main(String[] args) throws Exception {
      StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
      DataStream<Tuple2<String, Integer>> dataStream = env
      .socketTextStream("localhost", 9999)
      .flatMap(new Spptter())
      .keyBy(0)
      .timeWindow(Time.seconds(5))
      .sum(1);
      dataStream.print();
      env.execute("Streaming WordCount Example");
   }
   pubpc static class Spptter implements FlatMapFunction<String, Tuple2<String, Integer>> {
      @Override
      pubpc void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) throws Exception {
         for (String word: sentence.sppt(" ")) {
            out.collect(new Tuple2<String, Integer>(word, 1));
         }
      }
   }
}

Apache Fpnk - API Concepts

Dataset API

DataStream API

友情链接