OPEN SOURCE LAMBDA ARCHITECTURE
KAFKA · HADOOP · SAMZA · DRUID
FANGJIN YANG · GIAN MERLINO · DRUID COMMITTERS
PROBLEM DEALING WITH EVENT DATA
MOTIVATION EVOLUTION OF A “REAL-TIME” STACK
ARCHITECTURE THE “RAD”-STACK
NEXT STEPS TRY IT OUT FOR YOURSELF
OVERVIEW
THE PROBLEM
2013
THE PROBLEM
‣ Arbitrary and interactive exploration of time series data
• Ad-tech, system/app metrics, network/website traffic analysis
‣ Multi-tenancy: lots of concurrent users
‣ Scalability: 10+ TB/day, ad-hoc queries on trillions of events
‣ Recency matters! Real-time analysis
FINDING A SOLUTION
‣ Load all your data into Hadoop. Query it. Done!
‣ Good job guys, let’s go home
FINDING A SOLUTION
[Diagram: Event Streams -> Hadoop -> Insight]
PROBLEMS WITH THE NAIVE SOLUTION
‣ MapReduce can handle almost every distributed computing problem
‣ MapReduce over your raw data is flexible but slow
‣ Hadoop is not optimized for query latency
‣ To optimize queries, we need a query layer
FINDING A SOLUTION
[Diagram: Event Streams -> Hadoop (pre-processing and storage) -> Query Layer -> Insight]
A FASTER QUERY LAYER
MAKE QUERIES FASTER
‣ What types of queries to optimize for?
• Revenue over time broken down by demographic
• Top publishers by clicks over the last month
• Number of unique visitors broken down by any dimension
• Not dumping the entire dataset
• Not examining individual events
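The query shapes listed above are grouped aggregations rather than row lookups. A minimal Python sketch, assuming a handful of made-up in-memory events (the field names mirror the deck's ad-tech example; `sum_by` and `top_n` are hypothetical helpers, not part of any query layer's API):

```python
from collections import defaultdict

# Hypothetical sample events; values are invented for illustration.
events = [
    {"publisher": "bieberfever.com", "gender": "Male", "click": 0, "price": 0.65},
    {"publisher": "bieberfever.com", "gender": "Male", "click": 1, "price": 0.45},
    {"publisher": "ultratrimfast.com", "gender": "Female", "click": 0, "price": 0.87},
    {"publisher": "ultratrimfast.com", "gender": "Female", "click": 1, "price": 1.53},
    {"publisher": "ultratrimfast.com", "gender": "Female", "click": 1, "price": 0.99},
]


def sum_by(rows, dimension, metric):
    """Group rows by one dimension and sum one numeric metric."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals)


def top_n(totals, n):
    """Return the n largest (key, total) pairs, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


clicks_by_publisher = sum_by(events, "publisher", "click")
revenue_by_gender = sum_by(events, "gender", "price")
top_publishers = top_n(clicks_by_publisher, 1)
```

Every result is a small aggregate; nothing returns the raw events themselves, which is the access pattern the query layer has to make fast.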
FINDING A SOLUTION
[Diagram: Event Streams -> Hadoop (pre-processing and storage) -> RDBMS -> Insight]
FINDING A SOLUTION
[Diagram: Event Streams -> Hadoop (pre-processing and storage) -> NoSQL K/V Stores -> Insight]
FINDING A SOLUTION
[Diagram: Event Streams -> Hadoop (pre-processing and storage) -> Commercial Databases -> Insight]
DRUID AS A QUERY LAYER
DRUID
‣ Druid project started in 2011, went open source in 2012
‣ Designed for low latency ingestion and ad-hoc aggregations
‣ Designed for keeping around a lot of history (years are ok)
‣ Growing Community
• ~100 contributors
• Used in production at numerous large and small organizations
2014
REALTIME INGESTION
>500K EVENTS / SECOND AVERAGE
>1M EVENTS / SECOND PEAK
10 – 100K EVENTS / SECOND / CORE
DRUID IN PRODUCTION
[Figure: Query latency percentiles (90th, 95th, 99th) per datasource (a through h); x-axis: time (Feb 03 to Feb 24), y-axis: query time (seconds)]
QUERY LATENCY (500MS AVERAGE)
90% < 1S · 95% < 5S · 99% < 10S
DRUID IN PRODUCTION
RAW DATA
timestamp             publisher          advertiser  gender  country  click  price
2011-01-01T01:01:35Z  bieberfever.com    google.com  Male    USA      0      0.65
2011-01-01T01:03:63Z  bieberfever.com    google.com  Male    USA      0      0.62
2011-01-01T01:04:51Z  bieberfever.com    google.com  Male    USA      1      0.45
...
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Female  UK       0      0.87
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       0      0.99
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       1      1.53
ROLLUP DATA
timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK       1953         17      17.31
2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK       3194         170     34.01
‣ Truncate timestamps
‣ GroupBy over string columns (dimensions)
‣ Aggregate numeric columns (metrics)
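The three rollup steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not Druid's ingestion code: the hourly granularity, the `rollup` and `truncate_to_hour` helpers, and the choice to compute revenue as the sum of `price` are all made up for the example, and the sample rows are invented.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"
DIMENSIONS = ("publisher", "advertiser", "gender", "country")


def truncate_to_hour(ts):
    """Step 1: truncate an ISO-8601 UTC timestamp to the start of its hour."""
    parsed = datetime.strptime(ts, FMT)
    return parsed.replace(minute=0, second=0).strftime(FMT)


def rollup(raw_rows):
    """Steps 2 and 3: group by (truncated time, dimensions), aggregate metrics."""
    grouped = defaultdict(lambda: {"impressions": 0, "clicks": 0, "revenue": 0.0})
    for row in raw_rows:
        key = (truncate_to_hour(row["timestamp"]),) + tuple(row[d] for d in DIMENSIONS)
        agg = grouped[key]
        agg["impressions"] += 1            # one raw row is one impression (assumption)
        agg["clicks"] += row["click"]
        agg["revenue"] += row["price"]     # revenue as summed price (assumption)
    return dict(grouped)


# Hypothetical raw rows in the shape of the RAW DATA slide.
raw = [
    {"timestamp": "2011-01-01T01:01:35Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 0, "price": 0.65},
    {"timestamp": "2011-01-01T01:04:51Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 1, "price": 0.45},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "ultratrimfast.com",
     "advertiser": "google.com", "gender": "Female", "country": "UK",
     "click": 1, "price": 1.53},
]
rolled = rollup(raw)
```

Three raw rows collapse into two rolled-up rows here; the reduction grows with the number of events that share the same hour and dimension values.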
PARTITION DATA
timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK       1953         17      17.31
2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK       3194         170     34.01
‣ Shard data by time
‣ Immutable chunks of data called “segments”
Segment 2011-01-01T02/2011-01-01T03
Segment 2011-01-01T01/2011-01-01T02
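Sharding by time amounts to mapping each row's timestamp to an interval label like the ones above. A small sketch, assuming hourly segments; `segment_interval` and `partition_by_time` are hypothetical names, and tuples merely stand in for immutability:

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%SZ"


def segment_interval(ts):
    """Return the hourly interval label, e.g. 2011-01-01T01/2011-01-01T02."""
    start = datetime.strptime(ts, FMT).replace(minute=0, second=0)
    end = start + timedelta(hours=1)
    return f"{start:%Y-%m-%dT%H}/{end:%Y-%m-%dT%H}"


def partition_by_time(rows):
    """Assign each row to the segment covering its timestamp."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[segment_interval(row["timestamp"])].append(row)
    # Freeze each chunk: once built, a segment is never modified in place.
    return {interval: tuple(chunk) for interval, chunk in chunks.items()}


rows = [
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "ultratrimfast.com"},
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "bieberfever.com"},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "ultratrimfast.com"},
]
segments = partition_by_time(rows)
```

A query over a time range then only has to touch the segments whose intervals overlap that range.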
IMMUTABLE SEGMENTS
‣ Fundamental storage unit in Druid
‣ Read consistency
‣ One thread scans one segment
‣ Multiple threads can access same underlying data
‣ Segments are sized so that computation over one completes in ms
‣ Simplifies distribution & replication
COLUMN ORIENTATION
timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
‣ Scan/load only what you need
‣ Compression!
‣ Indexes!
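To make the three bullets concrete, here is a rough Python sketch of one string column stored column-wise with dictionary encoding (a simple form of compression) and an inverted index. The slides do not describe Druid's actual storage format, so treat `build_column` and the data layout as assumptions that only illustrate the general technique.

```python
def build_column(values):
    """Dictionary-encode a string column and build an inverted index over it."""
    dictionary = sorted(set(values))                      # distinct values, stored once
    ids = {value: i for i, value in enumerate(dictionary)}
    encoded = [ids[v] for v in values]                    # small ints instead of strings
    index = {value: [] for value in dictionary}           # value -> matching row numbers
    for row_number, value in enumerate(values):
        index[value].append(row_number)
    return dictionary, encoded, index


# Two columns from the rollup table, each stored as its own list.
publisher = ["ultratrimfast.com", "bieberfever.com", "ultratrimfast.com", "bieberfever.com"]
clicks = [25, 42, 17, 170]

dictionary, encoded, index = build_column(publisher)

# "Clicks for bieberfever.com": the index yields the row numbers, and only
# the clicks column is read. No other column is scanned.
matching_rows = index["bieberfever.com"]
total_clicks = sum(clicks[r] for r in matching_rows)
```

The query reads one index entry and one metric column, which is the "scan/load only what you need" point in miniature.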
DRUID INGESTION
‣ Must have denormalized, flat data
‣ Druid cannot do stateful processing at ingestion time
‣ …like stream-stream joins
‣ …or user session reconstruction
‣ …or a bunch of other useful things!
‣ Many Druid users need an ETL pipeline
DRUID REAL-TIME INGESTION
[Diagram: Data Source, Druid Realtime Workers, Druid Historical Nodes, Druid Broker Nodes, User queries; arrows labeled "Immediate" and "Periodic"]
DRUID REAL-TIME INGESTION
[Diagram: Data Source, Druid Realtime Workers, Druid Historical Nodes, Druid Broker Nodes, User queries; arrow labeled "Periodic"]
DRUID REAL-TIME INGESTION
[Diagram: Data Source, Stream Processor, Druid Realtime Workers, Druid Historical Nodes, Druid Broker Nodes, User queries; arrows labeled "Immediate" and "Periodic"]
DRUID REAL-TIME INGESTION
[Diagram: Druid Realtime Workers, Druid Historical Nodes, Druid Broker Nodes, User queries; arrows labeled "Immediate" and "Periodic"]
STREAMING DATA PIPELINES
AN EXAMPLE: ONLINE ADS
‣ Input data: impressions, clicks, ID-to-name mappings
‣ Output: enhanced impressions
‣ Steps
‣ Join impressions with clicks -> “clicks”
‣ Look up IDs to names -> “advertiser”, “publisher”, …
‣ Geocode -> “country”, …
‣ Lots of other additions
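The lookup and geocode steps above might look like the following in Python. Everything specific here is hypothetical: the input field names (`publisher_id`, `advertiser_id`, `ip`), the lookup tables, and the prefix-based `geocode` stand-in, which is far cruder than a real geolocation database.

```python
# Hypothetical ID-to-name mappings; in the deck these arrive as input data.
PUBLISHER_NAMES = {17: "bieberfever.com"}
ADVERTISER_NAMES = {4: "google.com"}
# Made-up prefix table using a documentation-only address range.
IP_PREFIX_TO_COUNTRY = {"203.0.113.": "USA"}


def geocode(ip):
    """Map an IP address to a country name, or "unknown" if no prefix matches."""
    for prefix, country in IP_PREFIX_TO_COUNTRY.items():
        if ip.startswith(prefix):
            return country
    return "unknown"


def enhance(joined_event):
    """Return a flat, denormalized copy of the event ready for ingestion."""
    out = dict(joined_event)  # copy so the input event is left untouched
    out["publisher"] = PUBLISHER_NAMES.get(out.pop("publisher_id"), "unknown")
    out["advertiser"] = ADVERTISER_NAMES.get(out.pop("advertiser_id"), "unknown")
    out["country"] = geocode(out.pop("ip"))
    return out


event = {"key": "186bd591-9442-48f0", "is_clicked": True,
         "publisher_id": 17, "advertiser_id": 4, "ip": "203.0.113.9"}
enhanced = enhance(event)
```

The output carries names instead of IDs, so the query layer never needs a join at query time.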
PIPELINE
[Diagram: Impressions, Clicks -> ? -> Druid]
PIPELINE
Impressions
Partition 0
{key: 186bd591-9442-48f0, publisher: foo, …}
{key: 9b5e2cd2-a8ac-4232, publisher: qux, …}
…
Partition 1
{key: 1079026c-7151-4871, publisher: baz, …}
…
Clicks
Partition 0
…
Partition 1
{key: 186bd591-9442-48f0}
…
PIPELINE
[Diagram: Impressions, Clicks -> Druid]
PIPELINE
[Diagram: Impressions, Clicks -> Shuffle -> Shuffled -> Druid]
PIPELINE
Shuffled
Partition 0
{type: impression, key: 186bd591-9442-48f0, publisher: foo, …}
{type: impression, key: 1079026c-7151-4871, publisher: baz, …}
{type: click, key: 186bd591-9442-48f0}
…
Partition 1
{type: impression, key: 9b5e2cd2-a8ac-4232, publisher: qux, …}
…
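The point of the shuffle is that an impression and its click end up in the same partition even though they started in different ones. A sketch of that idea with stable hash partitioning follows; the use of CRC32 and the helper names are assumptions for illustration, not how any particular stream processor assigns partitions.

```python
import zlib

NUM_PARTITIONS = 2


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Stable hash partitioning: equal keys always map to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(impressions, clicks, num_partitions=NUM_PARTITIONS):
    """Merge both streams into one, tagged by type and repartitioned by key."""
    shuffled = [[] for _ in range(num_partitions)]
    for event in impressions:
        shuffled[partition_for(event["key"], num_partitions)].append(
            {"type": "impression", **event})
    for event in clicks:
        shuffled[partition_for(event["key"], num_partitions)].append(
            {"type": "click", **event})
    return shuffled


impressions = [
    {"key": "186bd591-9442-48f0", "publisher": "foo"},
    {"key": "9b5e2cd2-a8ac-4232", "publisher": "qux"},
    {"key": "1079026c-7151-4871", "publisher": "baz"},
]
clicks = [{"key": "186bd591-9442-48f0"}]
shuffled = shuffle(impressions, clicks)
```

Which partition a given key lands in depends on the hash, so the layout will not necessarily match the slide; what matters is that both events for `186bd591-9442-48f0` share one partition.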
PIPELINE
[Diagram: Impressions, Clicks -> Shuffle -> Shuffled -> Druid]
PIPELINE
[Diagram: Impressions, Clicks -> Shuffle -> Shuffled -> Join -> Joined -> Druid]
PIPELINE
Joined
Partition 0
{key: 186bd591-9442-48f0, is_clicked: true, publisher: foo, …}
{key: 1079026c-7151-4871, is_clicked: false, publisher: baz, …}
…
Partition 1
{key: 9b5e2cd2-a8ac-4232, is_clicked: false, publisher: qux, …}
…
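Once events are co-partitioned, the join itself is local to a partition. The sketch below is a simplification: it treats one shuffled partition as a finite buffer, whereas a streaming job would only hold events for a bounded join window. `join_partition` is a hypothetical helper.

```python
def join_partition(events):
    """Join impressions with clicks by key within one shuffled partition."""
    clicked_keys = {e["key"] for e in events if e["type"] == "click"}
    joined = []
    for e in events:
        if e["type"] == "impression":
            # Drop the routing tag and record whether a matching click was seen.
            row = {k: v for k, v in e.items() if k != "type"}
            row["is_clicked"] = e["key"] in clicked_keys
            joined.append(row)
    return joined


# Partition 0 from the "Shuffled" slide (elided fields left out).
partition0 = [
    {"type": "impression", "key": "186bd591-9442-48f0", "publisher": "foo"},
    {"type": "impression", "key": "1079026c-7151-4871", "publisher": "baz"},
    {"type": "click", "key": "186bd591-9442-48f0"},
]
joined = join_partition(partition0)
```

A click that arrives after its window has closed would be missed by a streaming version of this join, which is one source of the imprecision that motivates reprocessing.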
PIPELINE
[Diagram: Impressions, Clicks -> Shuffle -> Shuffled -> Join -> Joined -> Druid]
PIPELINE
[Diagram: Impressions, Clicks -> Shuffle -> Shuffled -> Join -> Joined -> Enhance & Output -> Druid]
ALTERNATIVE PIPELINE
[Diagram: Impressions, Clicks -> Shuffle -> Shuffled -> Join -> Joined -> Enhance -> Enhanced -> Druid]
REPROCESSING
WHY REPROCESS DATA?
‣ Bugs in processing code
‣ Imprecise streaming operations
‣ …like using short join windows
‣ Limitations of current software
‣ …Kafka 0.8.x, Samza 0.9.x can generate duplicate messages
‣ …Druid 0.7.x streaming ingestion is best-effort
LAMBDA ARCHITECTURES
‣ Hybrid batch/streaming data pipeline
‣ Batch technologies
• Hadoop MapReduce
• Spark
‣ Streaming technologies
• Samza
• Storm
• Spark Streaming
LAMBDA ARCHITECTURES
‣ Advantages?
• Works as advertised
• Works with a huge variety of open software
• Druid supports batch-replace-by-time-range through Hadoop
LAMBDA ARCHITECTURES
‣ Disadvantages?
‣ Need code to run on two very different systems
‣ Maintaining two codebases is perilous
‣ …productivity loss
‣ …code drift
‣ …difficulty training new developers
LAMBDA ARCHITECTURES
[Diagram: Data -> streaming]
LAMBDA ARCHITECTURES
[Diagram: Data -> batch]
LAMBDA ARCHITECTURES
[Diagram: Data -> streaming + batch]
KAPPA ARCHITECTURE
‣ Pure streaming
‣ Reprocess data by replaying the input stream
‣ Doesn’t require operating two systems
‣ Doesn’t overcome software limitations
‣ I don’t have much experience with this
‣ http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
OPERATIONS
NICE THINGS ABOUT KAFKA
‣ Scalable, replicated pub/sub
‣ Replayable message logs
‣ New consumers can read all old messages
‣ Existing consumers can reprocess all old messages
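The replay properties above follow from keeping messages in an append-only log and tracking a read position per consumer. A toy in-memory model of that idea is shown below; `MessageLog` and its methods are invented for this sketch and are not Kafka's client API.

```python
class MessageLog:
    """Toy append-only log; a stand-in for one partition of a replayable stream."""

    def __init__(self):
        self._messages = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, message):
        """Add a message to the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer, max_messages=10):
        """Read forward from the consumer's position; reading deletes nothing."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Rewind (or advance) a consumer so it can reprocess old messages."""
        self._offsets[consumer] = offset


log = MessageLog()
for msg in ("imp-1", "imp-2", "click-1"):
    log.append(msg)

first_read = log.poll("existing-consumer")   # reads everything so far
caught_up = log.poll("existing-consumer")    # nothing new yet
new_consumer_read = log.poll("new-consumer") # a new consumer sees all old messages
log.seek("existing-consumer", 0)
replayed = log.poll("existing-consumer")     # reprocess from the beginning
```

Because reads never remove messages, adding a consumer or rewinding an existing one needs no cooperation from the producer.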
NICE THINGS ABOUT SAMZA
‣ Multi-tenancy: one main thread per container
‣ Robustness: isolated containers limit slowness and failure
‣ Visibility
‣ Multistage jobs, lots of metrics per stage
‣ Can inspect the message queue in Kafka
‣ State is simple
‣ Logging and restoring handled for you
‣ Single-threaded programming
NICE THINGS ABOUT DRUID
‣ Fast ingestion, fast queries
‣ Seamlessly merge stream-ingested and batch-ingested data
‣ Batch loads can “replace” stream loads for the same time range
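The replace-by-time-range behavior can be pictured as swapping what serves a given interval. This sketch models only the visible effect; the slides do not show the mechanism Druid uses, and the segment map, `replace_by_time_range`, and `query_count` are hypothetical.

```python
def replace_by_time_range(segments, interval, batch_rows):
    """Return a new segment map where `interval` is served from batch output."""
    updated = dict(segments)  # existing segments are not modified in place
    updated[interval] = {"source": "batch", "rows": tuple(batch_rows)}
    return updated


def query_count(segments):
    """Count rows across all intervals, whichever load produced them."""
    return sum(len(seg["rows"]) for seg in segments.values())


interval = "2011-01-01T01/2011-01-01T02"
# Stream-ingested data containing a duplicate message (made-up rows).
stream_segments = {
    interval: {"source": "stream", "rows": ("imp-1", "imp-1", "imp-2")},
}
# A later batch job reprocesses the same hour and produces the corrected rows.
served = replace_by_time_range(stream_segments, interval, ["imp-1", "imp-2"])

before = query_count(stream_segments)
after = query_count(served)
```

Queries see stream-ingested rows first and the corrected batch rows later, without the caller changing how it queries.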
NICE THINGS ABOUT HADOOP
‣ Solid batch processing system
‣ Easy to partition and reprocess data by time range
‣ Jobs can process all data, or a pre-partitioned slice
MONITORING
‣ Kafka partition availability
‣ Kafka log cleaner
‣ Samza consumer offsets
‣ Druid ingestion process rate
‣ Druid ingestion drop rate
‣ Druid query latency
‣ System metrics: CPU, network, disk
‣ Event counts at various stages
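Comparing event counts between consecutive stages is a cheap way to spot where a pipeline loses data. A minimal sketch, assuming made-up stage names and counts (`stage_loss` is a hypothetical helper, not a metric any of these systems emits under that name):

```python
def stage_loss(counts, stages):
    """Fraction of events lost between each pair of consecutive stages."""
    report = {}
    for upstream, downstream in zip(stages, stages[1:]):
        sent = counts[upstream]
        received = counts[downstream]
        # Guard against division by zero when a stage saw no events.
        report[f"{upstream}->{downstream}"] = 0.0 if sent == 0 else (sent - received) / sent
    return report


# Hypothetical per-interval counts collected from each stage's metrics.
stages = ["joined", "enhanced", "ingested"]
counts = {"joined": 1000, "enhanced": 1000, "ingested": 980}
loss = stage_loss(counts, stages)
```

A nonzero loss at the last hop would point at dropped events during ingestion rather than at the upstream jobs.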
STREAM METRICS
DO TRY THIS AT HOME
CORNERSTONES
‣ Druid - druid.io - @druidio
‣ Samza - samza.apache.org - @samzastream
‣ Kafka - kafka.apache.org - @apachekafka
‣ Hadoop - hadoop.apache.org
GLUE
Tranquility
Camus / Secor Druid Hadoop indexer
GLUE
Camus / Secor Druid Hadoop indexer
druid-kafka-eight
TAKE AWAYS
‣ Consider Kafka for making your streams available
‣ Consider Samza for streaming data integration
‣ Consider Druid for interactive exploration of streams
‣ Metrics, metrics, metrics
‣ Have a reprocessing strategy if you’re interested in historical data
THANK YOU

Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid