Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
Updated Oct 10, 2019 - Python
This is our final project for SFU's CMPT 353 taught by Greg Baker during Summer 2023
Work with a flight dataset and use Spark SQL to analyze flight delays, airport traffic, and other key metrics
Advanced Topics in Databases course project - NTUA ECE - 2022-23
Treat Spark like pandas.
Batch processing demos with various data stores, such as local storage, S3, and HDFS on AWS
This project explores and analyses e-commerce data, primarily using the Apache Spark DataFrame API with joins, functions, and aggregations to generate summarized results.