Data Lake Archives

Data Lakes – Apache Spark in detail

Dirk Brys | December 26, 2020 | Big Data

We already discussed the Spark ecosystem. In this post we’ll delve a bit deeper into the main reasons to use Spark: in short, its distributed processing engine, which runs on Spark clusters, and its query optimization engine. Spark Clustering … Read More

5 Key Success Factors for Data Intelligence

Dirk Brys | November 28, 2020 | Big Data, Machine Learning

It sometimes astonishes me what people think AI is. It’s seen as a miracle box: you give your data to a vendor and, by some magic, a model suddenly comes out that predicts whatever you want to predict or detects any … Read More

Data Lakes – The Apache Spark Ecosystem

Dirk Brys | September 12, 2020 | Big Data

Once you have stored data, you need to process it. Enter the distributed processing system called Apache Spark. Spark reads and processes data on a cluster of machines and, once processed, writes it back to either a distributed file system, … Read More

sAInce.io

Data Lakes – Apache Parquet

Data Lakes – Amazon S3

Data Lakes – Reference Architecture