huy-dataguy/README.md

Hi there 👋, I'm Nguyễn Quốc Huy

🎓 Passionate Data Engineering Student

I'm a third-year Data Engineering student at HCMUTE, Vietnam, passionate about building scalable data pipelines and real-time streaming systems. I focus on mastering Big Data technologies and data lakehouse and warehouse architectures, while also exploring full-stack development (MERN) to complement my data engineering skills.

🔭 I’m currently studying at HCMUTE (Ho Chi Minh City University of Technology and Education)

📫 How to reach me: quochuy.working@gmail.com

🚀 What I'm working on:

  • Designing and optimizing real-time data streaming systems, data lakehouses, and data warehouses
  • Building efficient data pipelines with Apache Spark, Kafka, and Delta Lake
  • Exploring MLOps for deploying machine learning models
  • Experimenting with full-stack MERN applications

Connect with me

huyhocdata 100012067900880


Tech Stack

Apache Spark, Apache Kafka, Delta Lake, MinIO, Apache Airflow, Apache Hadoop, Trino, Docker, Apache Superset, MERN Stack (MongoDB, Express.js, React, Node.js), Python, MySQL, C#


Activity

huy-dataguy's GitHub activity graph

Pinned

  1. NYC-Taxi-Lakehouse Public

    A real-time Big Data streaming project that simulates NYC taxi trip analytics using a modern Lakehouse architecture. Ingests high-volume Parquet data into Kafka, processes it with Spark Structured Streaming, sto…

    Python 3

  2. Spark-on-YARN Public

    This repository contains the configuration and scripts necessary to run Apache Spark on a Hadoop YARN cluster in client mode. The setup allows you to leverage the scalability of YARN for distribute…

    Dockerfile

  3. Salus-Assistant-Hackathon2025 Public

    JavaScript

  4. MERN-Stack-Book-Store Public

    A full-stack MERN e-commerce bookstore with user authentication, admin dashboard, and inventory management, built while learning from FreeCodeCamp to strengthen my full-stack skills alongside data …

    JavaScript

  5. HadoopSphere Public

    Forked from DOCUTEE/HaMu

    Containerized Hadoop cluster with Spark, Hive, Pig, HBase, and ZooKeeper for scalable Big Data processing using Docker.

    Shell 2

  6. huy-dataguy.github.io Public

    Shell