Apache Spark GPU support

Dirk Brys | Big Data, Machine Learning

By now you should already have a fair understanding of the overall Apache Spark architecture. In this talk we’ll dive deeper into the GPU support features that Spark 3.0.0 added in June 2020. GPU stands for Graphics Processing Unit and …

Data Lakes – Apache Spark in detail

Dirk Brys | Big Data

We already discussed the Spark ecosystem. In this blog we’ll delve a bit deeper into the main reasons why you would use Spark: in short, its distributed processing engine, which runs on Spark clusters, and its query optimization engine. Spark Clustering …

Data Lakes – The Apache Spark Ecosystem

Dirk Brys | Big Data

Once you have stored data, you need to process it. Enter the distributed processing system called Apache Spark. Spark reads and processes data on a cluster of machines and, once processed, writes it back to either a distributed file system, …