Introduction to
Kafka Connect
Himani Arora
Software Consultant
Knoldus Software LLP
Topics Covered
● What is Kafka Connect?
● Sources and Sinks
● Motivation behind Kafka Connect
● Use cases of Kafka Connect
● Architecture
● Demo
What is Kafka Connect?
● Added in the 0.9 release of Apache Kafka.
● Tool for scalably and reliably streaming data
between Apache Kafka and other data systems.
● It abstracts away the common problems every
connector to Kafka needs to solve:
– schema management
– fault tolerance
– delivery semantics
– operations, monitoring etc.
Sources and Sinks
Motivation behind Kafka Connect
● Why build another framework when there are
already so many to choose from?
● Most of the existing solutions do not integrate
optimally with a stream data platform.
Benefits of Kafka Connect
● Broad copying by default
● Streaming and batch
● Scales to the application
● Focus on copying data only
● Accessible connector API
Architecture
● Three major models:
– Connector model
– Worker model
– Data model
Connector Model
● The connector model defines how third-party
developers create connector plugins which
import or export data from another system.
● The model has two key concepts:
– Connector
– Tasks
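The split between a connector and its tasks can be illustrated with a minimal sketch. This is not the Kafka Connect API: the function names `group_partitions` and `task_configs` and the `tables` configuration key are hypothetical, chosen only to show how a connector might divide its input into per-task configurations without copying any data itself.

```python
from typing import Dict, List


def group_partitions(partitions: List[str], max_tasks: int) -> List[List[str]]:
    """Split source partitions (e.g. table names) into at most max_tasks groups."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    # Never create more groups than there are partitions to hand out.
    num_groups = min(len(partitions), max_tasks)
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    for index, partition in enumerate(partitions):
        # Round-robin keeps the group sizes within one of each other.
        groups[index % num_groups].append(partition)
    return groups


def task_configs(connector_config: Dict[str, str], max_tasks: int) -> List[Dict[str, str]]:
    """Build one configuration per task; the connector itself copies nothing."""
    tables = [t.strip() for t in connector_config["tables"].split(",") if t.strip()]
    return [
        {**connector_config, "tables": ",".join(group)}
        for group in group_partitions(tables, max_tasks)
    ]


if __name__ == "__main__":
    # Hypothetical connector configuration covering five tables.
    config = {"name": "orders-db", "tables": "orders,customers,payments,refunds,items"}
    for task_config in task_configs(config, max_tasks=2):
        print(task_config)
```

Each resulting dictionary stands for the work of one task; the runtime, not the connector, decides where those tasks execute.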
Connectors, tasks and workers
Worker and Data Model
● The worker model represents the runtime in which
connectors and tasks execute.
● The worker model allows Kafka Connect to scale to
the application.
● The data model addresses the remaining
requirements, such as tight coupling with Kafka
and schema management.
● Kafka Connect tracks offsets for each connector
so that connectors can resume from their
previous position in the event of failures or
graceful restarts for maintenance.
● Workers run in one of two modes:
– Standalone
– Distributed.
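Offset tracking can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not Kafka Connect code: `OffsetStore` and `run_task` are hypothetical names, offsets are kept in memory rather than persisted, and an offset is committed after every record, whereas a real runtime would more likely commit in batches.

```python
from typing import Dict, List, Optional


class OffsetStore:
    """In-memory stand-in for the offset storage a framework would persist."""

    def __init__(self) -> None:
        self._offsets: Dict[str, int] = {}

    def load(self, partition: str) -> int:
        # A partition that was never committed starts from position 0.
        return self._offsets.get(partition, 0)

    def commit(self, partition: str, offset: int) -> None:
        self._offsets[partition] = offset


def run_task(
    source: List[str],
    partition: str,
    store: OffsetStore,
    sink: List[str],
    max_records: Optional[int] = None,
) -> int:
    """Copy records from the last committed offset; return how many were copied."""
    start = store.load(partition)
    end = len(source) if max_records is None else min(len(source), start + max_records)
    copied = 0
    for position in range(start, end):
        sink.append(source[position])
        # Record the position of the next unread record.
        store.commit(partition, position + 1)
        copied += 1
    return copied


if __name__ == "__main__":
    store, sink = OffsetStore(), []
    lines = ["a", "b", "c", "d"]
    run_task(lines, "file-1", store, sink, max_records=2)  # task stops early
    run_task(lines, "file-1", store, sink)                 # restart resumes at "c"
    print(sink)
```

Because the second run starts from the stored offset, the restart neither skips nor re-copies records in this simplified setting.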
Balancing Work
Questions
References
● https://cwiki.apache.org/confluence/pages/viewpag
● http://www.confluent.io/blog/announcing-kafka-conn
● http://docs.confluent.io/3.0.0/connect/intro.html
Thank You

Editor's Notes

  • #4: For a long time, companies did their data processing as big batch jobs: CSV files dumped out of databases, log files collected at the end of the day. But businesses operate in real time. So rather than processing data at the end of the day, why not react to it continuously as it arrives? This is where stream processing came into the picture, and this shift led to the popularity of Apache Kafka. But even with Apache Kafka, building real-time data pipelines has required some effort. This is why Kafka Connect was announced as a new feature in the 0.9 release of Kafka.
  • #5: Schema management: the ability of the data pipeline to carry schema information where it is available. In the absence of this capability, you end up having to recreate it downstream. Furthermore, if there are multiple consumers for the same data, each consumer has to recreate it. Fault tolerance: run several instances of a process and be resilient to failures. Delivery semantics: provide strong guarantees when machines fail or processes crash. Operations and monitoring: monitor the health and progress of every data integration process in a consistent manner.
  • #6: Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other data systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency.
  • #8: Sources import data into Kafka, and sinks export data from Kafka. An implementation of a source or sink is a connector. Users deploy connectors to enable data flows on Kafka. Some of the certified connectors built on the Kafka Connect framework are: Source: JDBC, Couchbase, Apache Ignite, Cassandra. Sink: HDFS, Apache Ignite, Solr.
  • #9: A stream data platform is one where streaming, event-based data is the lingua franca and Kafka is the common medium that serves as a hub for all data. For example, log and metric collection and processing frameworks such as Flume and Logstash do not integrate well with batch systems and are operationally complex for large data pipelines, where an agent runs on each server. ETL tools for data warehousing such as Gobblin and Suro target a specific use case and work with a single sink.
  • #10: Quickly define connectors that copy vast quantities of data between systems. Support copying to and from both streaming and batch-oriented systems. Scale down to a single process running one connector in a small production environment, and scale up to an organization-wide service for copying data between a wide variety of large-scale systems. Focus on reliable, scalable data copying; leave transformation, enrichment, and other modifications to other frameworks. It is easy to develop new connectors: the API and runtime model should make implementing a new connector simple.
  • #12: Connectors are the largest logical unit of work in Kafka Connect and define where data should be copied to and from. This might cover copying a whole database or collection of databases into Kafka. A connector does not perform any copying itself; instead, it schedules tasks to do it. Tasks are responsible for producing or consuming sequences of Kafka ConnectRecords in order to copy data.
  • #13: The core concept in Kafka Connect that users interact with is the connector. Partitions are balanced evenly across tasks. Each task reads from its partitions, translates the data into Kafka Connect's format, and decides the destination topic (and possibly partition) in Kafka.
  • #14: This layer decouples the logical work (connectors) from the physical execution (workers executing tasks). Workers are processes that execute connectors and tasks. Workers automatically coordinate with each other to distribute work and provide scalability and fault tolerance. The data model covers all other requirements, such as schema management and tight coupling with Kafka.
  • #15: Offsets are tracked so that connectors can resume from their previous position in the event of failures or graceful restarts for maintenance. Standalone mode is the simplest mode, where a single process is responsible for executing all connectors and tasks. Since it is a single process, it requires minimal configuration. In distributed mode, you start many worker processes using the same group.id, and they automatically coordinate to schedule execution of connectors and tasks across all available workers.
  • #16: A simple example of a cluster of three workers (processes launched via any mechanism you choose) running two connectors. The worker processes have balanced the connectors and tasks across themselves.
  • #17: If a connector adds partitions, this causes it to regenerate task configurations.
  • #18: If one of the workers fails, the remaining workers rebalance the connectors and tasks so that the work previously handled by the failed worker is moved to other workers.
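
The balancing described in notes #16 to #18 can be simulated with a small sketch. It is only an illustration: the `assign` function and the worker, connector, and task names are hypothetical, and a plain round-robin assignment stands in for the coordination the workers actually perform among themselves.

```python
from typing import Dict, List


def assign(work_units: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Spread connectors and tasks across the live workers in round-robin order."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    # Sorting makes the result independent of the order the units arrive in.
    for index, unit in enumerate(sorted(work_units)):
        assignment[workers[index % len(workers)]].append(unit)
    return assignment


if __name__ == "__main__":
    # Two connectors, each with two tasks (hypothetical names).
    units = [
        "conn-a", "conn-a/task-0", "conn-a/task-1",
        "conn-b", "conn-b/task-0", "conn-b/task-1",
    ]
    print(assign(units, ["w1", "w2", "w3"]))
    # Worker w2 fails: the same units are redistributed over the survivors.
    print(assign(units, ["w1", "w3"]))
```

Recomputing the whole assignment from the list of live workers is the simplest possible policy; it moves more work than strictly necessary on each change, which a production scheduler would likely try to avoid.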