Big Data is growing exponentially and requires massive-scale infrastructure. At the same time, business analytics has shifted from reactive to proactive analysis: this is the era of streaming data (a.k.a. Fast Data). Apache Hadoop is very good at analyzing data at rest, but its batch-oriented MapReduce model cannot handle streaming data.
Big Data analytics needs new Big Data frameworks. Apache Spark brings in-memory processing and the RDD (Resilient Distributed Dataset) data abstraction, which allow real-time processing of streaming data; however, its micro-batch architecture incurs high latency. Apache Flink brings low latency and could address Spark's limitations; however, it is not as mature or as widely adopted as Spark.
Apache Beam promotes a portable programming model, the Beam model, that runs across Big Data frameworks (Spark, Flink, Dataflow).
This session presents an overview of the major Big Data frameworks and suggests that DBAs should embrace these frameworks and expand their skills as a necessary path to becoming Data Architects.