What Is Apache Kafka?

Developed as a publish-subscribe messaging system to handle massive amounts of data at LinkedIn, Apache Kafka® is today an open-source distributed event streaming platform used by over 80% of the Fortune 100.

This beginner’s Kafka tutorial will help you learn Kafka, its benefits and use cases, and how to get started from the ground up. It includes a look at Kafka architecture, core concepts, and the connector ecosystem.

Introduction

Apache Kafka is an event streaming platform used to collect, process, store, and integrate data at scale. It has numerous use cases including distributed streaming, stream processing, data integration, and pub/sub messaging.

In order to make complete sense of what Kafka does, we'll delve into what an event streaming platform is and how it works. So before delving into Kafka architecture or its core components, let's discuss what an event is. This will help explain how Kafka stores events, how to get events in and out of the system, and how to analyze event streams.


What Is an Event?

An event is any type of action, incident, or change that's identified or recorded by software or applications. For example, a payment, a website click, or a temperature reading, along with a description of what happened.

In other words, an event is a combination of notification—the element of when-ness that can be used to trigger some other activity—and state. That state is usually fairly small, say less than a megabyte or so, and is normally represented in some structured format, say in JSON or an object serialized with Apache Avro™ or Protocol Buffers.

Kafka and Events – Key/Value Pairs

Kafka is based on the abstraction of a distributed commit log. By splitting a log into partitions, Kafka is able to scale systems out. As such, Kafka models events as key/value pairs. Internally, keys and values are just sequences of bytes, but externally in your programming language of choice, they are often structured objects represented in your language’s type system. Kafka famously calls the translation between language types and internal bytes serialization and deserialization. The serialized format is usually JSON, JSON Schema, Avro, or Protobuf.

Values are typically the serialized representation of an application domain object or some form of raw message input, like the output of a sensor.

Keys can also be complex domain objects but are often primitive types like strings or integers. The key part of a Kafka event is not necessarily a unique identifier for the event, like the primary key of a row in a relational database would be. It is more likely the identifier of some entity in the system, like a user, order, or a particular connected device.

This may not sound so significant now, but we’ll see later on that keys are crucial for how Kafka deals with things like parallelization and data locality.

Why Kafka? Benefits and Use Cases

Kafka is used by over 100,000 organizations across the world and is backed by a thriving community of professional developers who are constantly advancing the state of the art in stream processing together. Due to Kafka's high throughput, fault tolerance, resilience, and scalability, there are numerous use cases across almost every industry, from banking and fraud detection to transportation and IoT. We typically see Kafka used for purposes like those below.

Data Integration

Kafka can connect to nearly any other data source in traditional enterprise information systems, modern databases, or in the cloud. It forms an efficient point of integration with built-in data connectors, without hiding logic or routing inside brittle, centralized infrastructure.

Metrics and Monitoring

Kafka is often used for monitoring operational data. This involves aggregating statistics from distributed applications to produce centralized feeds with real-time metrics.

Log Aggregation

A modern system is typically a distributed system, and logging data must be centralized from the various components of the system to one place. Kafka often serves as a single source of truth by centralizing data across all sources, regardless of form or volume.

Stream Processing

Performing real-time computations on event streams is a core competency of Kafka. From real-time data processing to dataflow programming, Kafka ingests, stores, and processes streams of data as they are generated, at any scale.

Publish-Subscribe Messaging

As a distributed pub/sub messaging system, Kafka works well as a modernized version of the traditional message broker. Any time a process that generates events must be decoupled from the process or processes receiving the events, Kafka is a scalable and flexible way to get the job done.

Kafka Architecture – Fundamental Concepts

Kafka Topics

Events have a tendency to proliferate—just think of the events that happened to you this morning—so we’ll need a system for organizing them. Kafka’s most fundamental unit of organization is the topic, which is something like a table in a relational database. As a developer using Kafka, the topic is the abstraction you probably think the most about. You create different topics to hold different kinds of events and different topics to hold filtered and transformed versions of the same kind of event.

A topic is a log of events. Logs are easy to understand, because they are simple data structures with well-known semantics. First, they are append only: When you write a new message into a log, it always goes on the end. Second, they can only be read by seeking an arbitrary offset in the log, then by scanning sequential log entries. Third, events in the log are immutable—once something has happened, it is exceedingly difficult to make it un-happen. The simple semantics of a log make it feasible for Kafka to deliver high levels of sustained throughput in and out of topics, and also make it easier to reason about the replication of topics, which we’ll cover more later.

Logs are also fundamentally durable things. Traditional enterprise messaging systems have topics and queues, which store messages temporarily to buffer them between source and destination.

Since Kafka topics are logs, there is nothing inherently temporary about the data in them. Every topic can be configured to expire data after it has reached a certain age (or the topic overall has reached a certain size), from as short as seconds to as long as years, or even to retain messages indefinitely. The logs that underlie Kafka topics are files stored on disk. When you write an event to a topic, it is as durable as it would be if you had written it to any database you ever trusted.
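
To make this concrete, here is a minimal sketch of creating a topic with a retention policy using Kafka's Java Admin API; the topic name, partition count, replication factor, and broker address are illustrative placeholders rather than values from this tutorial.

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.NewTopic;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class CreateOrdersTopic {
        public static void main(String[] args) throws Exception {
            Properties config = new Properties();
            config.put("bootstrap.servers", "localhost:9092");   // assumed broker address

            try (Admin admin = Admin.create(config)) {
                // 6 partitions and replication factor 3 are placeholder choices.
                NewTopic orders = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of(
                        "retention.ms", "604800000",   // keep events for 7 days...
                        "retention.bytes", "-1"));     // ...with no size-based limit
                admin.createTopics(List.of(orders)).all().get();
            }
        }
    }

Setting retention.ms to -1 instead would keep events forever, which is the "retain messages indefinitely" option mentioned above.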

The simplicity of the log and the immutability of the contents in it are key to Kafka’s success as a critical component in modern data infrastructure—but they are only the beginning.

Kafka Partitioning

If a topic were constrained to live entirely on one machine, that would place a pretty radical limit on the ability of Kafka to scale. It could manage many topics across many machines—Kafka is a distributed system, after all—but no one topic could ever get too big or aspire to accommodate too many reads and writes. Fortunately, Kafka does not leave us without options here: It gives us the ability to partition topics.

Partitioning takes the single topic log and breaks it into multiple logs, each of which can live on a separate node in the Kafka cluster. This way, the work of storing messages, writing new messages, and processing existing messages can be split among many nodes in the cluster.

How Kafka Partitioning Works

Having broken a topic up into partitions, we need a way of deciding which messages to write to which partitions. Typically, if a message has no key, subsequent messages will be distributed round-robin among all the topic’s partitions. In this case, all partitions get an even share of the data, but we don’t preserve any kind of ordering of the input messages. If the message does have a key, then the destination partition will be computed from a hash of the key. This allows Kafka to guarantee that messages having the same key always land in the same partition, and therefore are always in order.

For example, if you are producing events that are all associated with the same customer, using the customer ID as the key guarantees that all of the events from a given customer will always arrive in order. This creates the possibility that a very active key will create a larger and more active partition, but this risk is small in practice and is manageable when it presents itself. It is often worth it in order to preserve the ordering of keys.
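
To illustrate, here is a simplified sketch of that partition-selection logic. It is not Kafka's actual implementation (the real default partitioner uses a murmur2 hash and, in newer versions, a "sticky" strategy for unkeyed records); it only shows that keyed messages hash to a fixed partition while unkeyed messages are spread around.

    import java.util.Arrays;
    import java.util.concurrent.ThreadLocalRandom;

    public class PartitionChooser {
        // Simplified illustration, not Kafka's real partitioner.
        static int choosePartition(byte[] keyBytes, int numPartitions) {
            if (keyBytes == null) {
                // No key: spread records across partitions (round-robin/sticky in real Kafka).
                return ThreadLocalRandom.current().nextInt(numPartitions);
            }
            // Keyed: hash the key and take it modulo the partition count,
            // so the same key always maps to the same partition.
            return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
        }
    }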

Kafka Brokers

So far we have talked about events, topics, and partitions, but as of yet, we have not been too explicit about the actual computers in the picture. From a physical infrastructure standpoint, Kafka is composed of a network of machines called brokers. In a contemporary deployment, these may not be separate physical servers but containers running on pods running on virtualized servers running on actual processors in a physical datacenter somewhere. However they are deployed, they are independent machines each running the Kafka broker process. Each broker hosts some set of partitions and handles incoming requests to write new events to those partitions or read events from them. Brokers also handle replication of partitions between each other.

Replication

It would not do if we stored each partition on only one broker. Whether brokers are bare metal servers or managed containers, they and their underlying storage are susceptible to failure, so we need to copy partition data to several other brokers to keep it safe. Those copies are called follower replicas, whereas the main partition is called the leader replica. When you produce data to the leader—in general, reading and writing are done to the leader—the leader and the followers work together to replicate those new writes to the followers.

This happens automatically, and while you can tune some settings in the producer to produce varying levels of durability guarantees, this is not usually a process you have to think about as a developer building systems on Kafka. All you really need to know as a developer is that your data is safe, and that if one node in the cluster dies, another will take over its role.

Client Applications

Now let’s get outside of the Kafka cluster itself to the applications that use Kafka: the producers and consumers. These are client applications that contain your code, putting messages into topics and reading messages from topics. Every component of the Kafka platform that is not a Kafka broker is, at bottom, either a producer or a consumer or both. Producing and consuming are how you interface with a cluster.

Kafka Producers

The API surface of the producer library is fairly lightweight: In Java, there is a class called KafkaProducer that you use to connect to the cluster. You give this class a map of configuration parameters, including the address of some brokers in the cluster, any appropriate security configuration, and other settings that determine the network behavior of the producer. There is another class called ProducerRecord that you use to hold the key-value pair you want to send to the cluster.

To a first-order approximation, this is all the API surface area there is to producing messages. Under the covers, the library is managing connection pools, network buffering, waiting for brokers to acknowledge messages, retransmitting messages when necessary, and a host of other details no application programmer need concern herself with.
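
As a rough sketch, assuming a broker at localhost:9092 and a topic named payments (both placeholders), putting those two classes together looks something like this:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.Properties;

    public class PaymentProducer {
        public static void main(String[] args) {
            Properties config = new Properties();
            config.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            config.put("key.serializer", StringSerializer.class.getName());
            config.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(config)) {
                // Key = customer ID, value = serialized event payload (both illustrative).
                ProducerRecord<String, String> record =
                    new ProducerRecord<>("payments", "customer-42", "{\"amount\": 19.99}");
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    } else {
                        System.out.printf("Wrote to %s-%d at offset %d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                    }
                });
            }
        }
    }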

Kafka Consumers

Using the consumer API is similar in principle to the producer. You use a class called KafkaConsumer to connect to the cluster (passing a configuration map to specify the address of the cluster, security, and other parameters). Then you use that connection to subscribe to one or more topics. When messages are available on those topics, they come back in a collection called ConsumerRecords, which contains individual instances of messages in the form of ConsumerRecord objects. A ConsumerRecord object represents the key/value pair of a single Kafka message.
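
A minimal consumer sketch, under the same assumptions as the producer example above (a local broker, a payments topic, and a hypothetical payment-processors group ID), might look like this:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class PaymentConsumer {
        public static void main(String[] args) {
            Properties config = new Properties();
            config.put("bootstrap.servers", "localhost:9092");    // assumed broker address
            config.put("group.id", "payment-processors");         // assumed consumer group name
            config.put("key.deserializer", StringDeserializer.class.getName());
            config.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config)) {
                consumer.subscribe(List.of("payments"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("key=%s value=%s partition=%d offset=%d%n",
                            record.key(), record.value(), record.partition(), record.offset());
                    }
                }
            }
        }
    }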

KafkaConsumer manages connection pooling and the network protocol just like KafkaProducer does, but there is a much bigger story on the read side than just the network plumbing. First of all, Kafka is different from legacy message queues in that reading a message does not destroy it; it is still there to be read by any other consumer that might be interested in it. In fact, it’s perfectly normal in Kafka for many consumers to read from one topic. This one small fact has a positively disproportionate impact on the kinds of software architectures that emerge around Kafka, which is a topic covered very well elsewhere.

Also, consumers need to be able to handle the scenario in which the rate of message consumption from a topic combined with the computational cost of processing a single message are together too high for a single instance of the application to keep up. That is, consumers need to scale. In Kafka, scaling consumer groups is more or less automatic.

Kafka Components and Ecosystem

If all you had were brokers managing partitioned, replicated topics with an ever-growing collection of producers and consumers writing and reading events, you would actually have a pretty useful system. However, the experience of the Kafka community is that certain patterns will emerge that will encourage you and your fellow developers to build the same bits of functionality over and over again around core Kafka.

You will end up building common layers of application functionality to repeat certain undifferentiated tasks. This is code that does important work but is not tied in any way to the business you’re actually in. It doesn’t contribute value directly to your customers. It’s infrastructure, and it should be provided by the community or by an infrastructure vendor.

It can be tempting to write this code yourself, but you should not. Kafka Connect, the Confluent Schema Registry, Kafka Streams, and ksqlDB are examples of this kind of infrastructure code. We’ll take a look at each of them in turn.

Kafka Connect


In the world of information storage and retrieval, some systems are not Kafka. Sometimes you would like the data in those other systems to get into Kafka topics, and sometimes you would like data in Kafka topics to get into those systems. As Apache Kafka's integration API, this is exactly what Kafka Connect does.

What Does Kafka Connect Do?

On the one hand, Kafka Connect is an ecosystem of pluggable connectors, and on the other, a client application. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault tolerant, meaning you can run not just one single Connect worker but a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Kafka Connect also abstracts much of the connector code away from the user, requiring only JSON configuration to run. For example, here’s how you’d stream data from Kafka to Elasticsearch:
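
(What follows is a minimal sketch, assuming the Confluent Elasticsearch sink connector; the connector name, topic, and Elasticsearch URL are placeholders, and the exact properties depend on the connector version.)

    {
      "name": "orders-to-elasticsearch",
      "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": "orders",
        "connection.url": "http://elasticsearch:9200",
        "key.ignore": "true",
        "schema.ignore": "true"
      }
    }

Posting a configuration like this to a Connect worker's REST API starts tasks that consume the topic and index each event into Elasticsearch, with no custom code to write or deploy.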

Benefits of Kafka Connect

One of the primary advantages of Kafka Connect is its large ecosystem of connectors. Writing the code that moves data to a cloud blob store, or writes to Elasticsearch, or inserts records into a relational database is code that is unlikely to vary from one business to the next. Likewise, reading from a relational database, Salesforce, or a legacy HDFS filesystem is the same operation no matter what sort of application does it. You can definitely write this code, but spending your time doing that doesn’t add any kind of unique value to your customers or make your business more uniquely competitive.

All of these are examples of Kafka connectors available in the Confluent Hub, a curated collection of connectors of all sorts and, most importantly, all licenses and levels of support. Some are commercially licensed and some can be used for free. Confluent Hub lets you search for source and sink connectors of all kinds and clearly shows the license of each connector. Of course, connectors need not come from the Hub and can be found on GitHub or elsewhere in the marketplace. And if after all that you still can’t find a connector that does what you need, you can write your own using a fairly simple API.

Now, it might seem straightforward to build this kind of functionality on your own: If an external source system is easy to read from, it would be easy enough to read from it and produce to a destination topic. If an external sink system is easy to write to, it would again be easy enough to consume from a topic and write to that system. But any number of complexities arise, including how to handle failover, horizontally scale, manage commonplace transformation operations on inbound or outbound data, distribute common connector code, configure and operate this through a standard interface, and more.

Connect seems deceptively simple on its surface, but it is in fact a complex distributed system and plugin ecosystem in its own right. And if that plugin ecosystem happens not to have what you need, the open-source Connect framework makes it simple to build your own connector and inherit all the scalability and fault tolerance properties Connect offers.

Schema Registry


Once applications are busily producing messages to Kafka and consuming messages from it, two things will happen. First, new consumers of existing topics will emerge. These are brand new applications—perhaps written by the team that wrote the original producer of the messages, perhaps by another team—and will need to understand the format of the messages in the topic. Second, the format of those messages will evolve as the business evolves. Order objects gain a new status field, usernames split into first and last name from full name, and so on. The schema of our domain objects is a constantly moving target, and we must have a way of agreeing on the schema of messages in any given topic.

Confluent Schema Registry exists to solve this problem.

What Is Schema Registry?

Schema Registry is a standalone server process that runs on a machine external to the Kafka brokers. Its job is to maintain a database of all of the schemas that have been written into topics in the cluster for which it is responsible. That “database” is persisted in an internal Kafka topic and cached in the Schema Registry for low-latency access. Schema Registry can be run in a redundant, high-availability configuration, so it remains up if one instance fails.

Schema Registry is also an API that allows producers and consumers to predict whether the message they are about to produce or consume is compatible with previous versions. When a producer is configured to use the Schema Registry, it calls an API at the Schema Registry REST endpoint and presents the schema of the new message. If it is the same as the last message produced, then the produce may succeed. If it is different from the last message but matches the compatibility rules defined for the topic, the produce may still succeed. But if it is different in a way that violates the compatibility rules, the produce will fail in a way that the application code can detect.

Likewise on the consume side, if a consumer reads a message that has an incompatible schema from the version the consumer code expects, Schema Registry will tell it not to consume the message. Schema Registry doesn’t fully automate the problem of schema evolution—that is a challenge in any system regardless of the tooling—but it does make a difficult problem much easier by keeping runtime failures from happening when possible.
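
On the producer side, wiring Schema Registry in is mostly configuration. As a hedged sketch, the registry URL below is an assumed local endpoint, and the Avro serializer class comes from Confluent's serializer library rather than Apache Kafka itself:

    import java.util.Properties;

    public class AvroProducerConfig {
        // Builds producer settings wired to a (hypothetical) local Schema Registry.
        public static Properties build() {
            Properties config = new Properties();
            config.put("bootstrap.servers", "localhost:9092");                  // assumed broker address
            config.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            config.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");          // Confluent Avro serializer
            config.put("schema.registry.url", "http://localhost:8081");         // assumed registry endpoint
            return config;
        }
    }

With a configuration like this in place, the serializer checks each record's schema against the registry on produce, so incompatible changes fail fast instead of landing in the topic.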

Looking at what we’ve covered so far, we’ve got a system for storing events durably, the ability to write and read those events, a data integration framework, and even a tool for managing evolving schemas. What remains is the purely computational side of stream processing.

Kafka Streams

In a growing Kafka-based application, consumers tend to grow in complexity. What might have started as a simple stateless transformation (e.g., masking out personally identifying information or changing the format of a message to conform with internal schema requirements) soon evolves into complex aggregation, enrichment, and more. If you recall the consumer code we looked at up above, there isn’t a lot of support in that API for operations like those: You’re going to have to build a lot of framework code to handle time windows, late-arriving messages, lookup tables, aggregation by key, and more. And once you’ve got that, recall that operations like aggregation and enrichment are typically stateful.

That “state” is going to be memory in your program’s heap, which means it’s a fault tolerance liability. If your stream processing application goes down, its state goes with it, unless you’ve devised a scheme to persist that state somewhere. That sort of thing is fiendishly complex to write and debug at scale and really does nothing to directly make your users’ lives better. This is why Apache Kafka provides a stream processing API. This is why we have Kafka Streams.

What Is Kafka Streams?

Kafka Streams is a Java API that gives you easy access to all of the computational primitives of stream processing: filtering, grouping, aggregating, joining, and more, keeping you from having to write framework code on top of the consumer API to do all those things. It also provides support for the potentially large amounts of state that result from stream processing computations. If you’re grouping events in a high-throughput topic by a field with many unique values then computing a rollup over that group every hour, you might need to use a lot of memory.

Indeed, for high-volume topics and complex stream processing topologies, it’s not at all difficult to imagine that you’d need to deploy a cluster of machines sharing the stream processing workload like a regular consumer group would. The Streams API solves both problems by handling all of the distributed state problems for you: It persists state to local disk and to internal topics in the Kafka cluster, and it automatically reassigns state between nodes in a stream processing cluster when adding or removing stream processing nodes to the cluster.
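
As a small, hedged sketch of the Streams API (the topic names and application ID are placeholders), here is a topology that counts events per key; the aggregation state it maintains is exactly the kind of state Streams persists locally and backs up to the cluster for you:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;
    import java.util.Properties;

    public class PaymentCountApp {
        public static void main(String[] args) {
            Properties config = new Properties();
            config.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-counts");     // assumed application ID
            config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
            config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> payments = builder.stream("payments");         // assumed input topic

            // Group by key (e.g., customer ID) and count; the resulting state lives in a
            // local store backed by a changelog topic in the Kafka cluster.
            KTable<String, Long> countsByCustomer = payments.groupByKey().count();

            countsByCustomer.toStream()
                .to("payment-counts-by-customer",                                  // assumed output topic
                    Produced.with(Serdes.String(), Serdes.Long()));

            KafkaStreams streams = new KafkaStreams(builder.build(), config);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }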

In a typical microservice, stream processing is a thing the application does in addition to other functions. For example, a shipment notification service might combine shipment events with events in a product information changelog containing customer records to produce shipment notification objects, which other services might turn into emails and text messages. But that shipment notification service might also be obligated to expose a REST API for synchronous key lookups by the mobile app or web front end when rendering views that show the status of a given shipment.

The service is reacting to events—and in this case, joining three streams together, and perhaps doing other windowed computations on the joined result—but it is also servicing HTTP requests against its REST endpoint, perhaps using the Spring Framework or Micronaut or some other Java API in common use. Because Kafka Streams is a Java library and not a set of dedicated infrastructure components that do stream processing and only stream processing, it’s trivial to stand up services that use other frameworks to accomplish other ends (like REST endpoints) and sophisticated, scalable, fault-tolerant stream processing.

Learn About Kafka with More Free Courses and Tutorials

  • Learn stream processing in Kafka with the Kafka Streams course
  • Get started with Kafka Connectors in the Kafka Connect course
  • Check out Michael Noll’s four-part series on Streams and Tables in Apache Kafka
  • Listen to the podcast about Knative 101: Kubernetes and Serverless Explained with Jacques Chester

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.


Apache Kafka: A High-Throughput Distributed Messaging System (presentation transcript, Johan Lundahl)

Agenda

  • Kafka overview
  • Main concepts and comparisons to other messaging systems
  • Features, strengths and tradeoffs
  • Message format and broker concepts
  • Partitioning, keyed messages, replication
  • Producer / consumer APIs
  • Operational considerations
  • Kafka ecosystem
  • If time permits: Kafka as a real-time processing backbone, brief intro to Storm, Kafka-Storm wordcount demo

What is Apache Kafka?

  • Distributed, high-throughput, pub-sub messaging system
  • Fast, scalable, durable
  • Main use cases: log aggregation, real-time processing, monitoring, queueing
  • Originally developed by LinkedIn
  • Implemented in Scala/Java
  • Top-level Apache project since 2012: http://kafka.apache.org/

Comparison to other messaging systems

  • Traditional: JMS, xxxMQ/AMQP
  • New generation: Kestrel, Scribe, Flume, Kafka

[Diagram: a spectrum from message queues (low throughput, low latency: RabbitMQ, ActiveMQ, Qpid, JMS, Hedwig, Kestrel) to log aggregators (high throughput, high latency: Flume, Scribe, batch jobs), with Kafka positioned between the two.]

Kafka concepts

[Diagram: producers (frontends, services) push messages to topics on the Kafka broker, and consumers (data warehouse, batch processing, monitoring, stream processing) pull from those topics.]

Distributed model

[Diagram: producers publish partitioned data to brokers with intra-cluster replication and ZooKeeper coordination; consumer groups read topics as ordered subscriptions. KAFKA-156 (producer-side persistence) is noted on the producer side.]

Performance factors

  • Broker doesn’t track consumer state
  • Everything is distributed
  • Zero-copy (sendfile) reads/writes
  • Usage of page cache backed by sequential disk allocation
  • Like a distributed commit log
  • Low overhead protocol
  • Message batching (producer and consumer)
  • Compression (end to end)
  • Configurable ack levels

From: http://queue.acm.org/detail.cfm?id=1563874

Kafka features and strengths

  • Simple model, focused on high throughput and durability
  • O(1) time persistence on disk
  • Horizontally scalable by design (brokers and consumers)
  • Push-pull => consumer burst tolerance
  • Replay messages
  • Multiple independent subscribers per topic
  • Configurable batching, compression, serialization
  • Online upgrades

Tradeoffs

  • Not optimized for millisecond latencies
  • Has not beaten CAP
  • Simple messaging system, no processing
  • ZooKeeper becomes a bottleneck when using too many topics/partitions (>>10000)
  • Not designed for very large payloads (a full HD movie, etc.)
  • Helps to know your data in advance

Message/log format

[Diagram: each message consists of a length, version, checksum, and payload.]

Log-based queue (simplified model)

[Diagram: producers append messages to topic logs on the broker, and consumers in a consumer group read them in order. The producer API is used directly by the application or through one of the contributed implementations, e.g., a log4j/logback appender. Batching, compression, and serialization happen on the producer side.]

Partitioning

[Diagram: each topic is split into partitions on the broker; every consumer in a group is assigned a subset of partitions, and a consumer with no partition assigned sits idle.]

Keyed messages

[Diagram: with #partitions = 3, the destination partition is computed as hash(key) % #partitions, so messages with the same key always land on the same broker and partition.]

Intra-cluster replication

[Diagram: with replication factor = 3, each partition has one leader and two followers forming the in-sync replica set (ISR). If a follower fails, it is dropped from the ISR and, when it comes back online, fetches data from the leader before rejoining the ISR. If the leader fails, the failure is detected via ZooKeeper and a new leader is elected from the ISR. Producers can choose between three commit (ack) modes.]

Producer API

  • Configuration parameters: ProducerType (sync/async), CompressionCodec (none/snappy/gzip), BatchSize, EnqueueSize/Time, Encoder/Serializer, Partitioner, #Retries, MaxMessageSize, ...

Consumer API(s)

  • High-level (consumer group, auto-commit)
  • Low-level (simple consumer, manual commit)

Broker protips

  • Use a reasonable number of partitions – it will affect performance
  • Use a reasonable number of topics – it will affect performance
  • Performance decreases with larger ZooKeeper ensembles
  • Tune disk flush rate settings
  • message.max.bytes – max accepted message size, should be smaller than the heap
  • socket.request.max.bytes – max fetch size, should be smaller than the heap
  • log.retention.bytes – don’t want to run out of disk space…
  • Keep ZooKeeper logs under control for the same reason as above
  • Kafka brokers have been tested on Linux and Solaris

Operating Kafka

  • ZooKeeper usage: producer load balancing, broker ISR, consumer tracking
  • Monitoring: JMX, audit trail/console in the making
  • Distribution tools: controlled shutdown tool, preferred replica leader election tool, list topic tool, create topic tool, add partition tool, reassign partitions tool, MirrorMaker

Multi-datacenter replication

Ecosystem

  • Producers: Java (in standard dist), Scala (in standard dist), Log4j (in standard dist), Logback (logback-kafka), udp-kafka-bridge, Python (kafka-python, pykafka, samsa, pykafkap, brod), Go (Sarama, kafka.go), C (librdkafka), C/C++ (libkafka), Clojure (clj-kafka, kafka-clj), Ruby (Poseidon, kafka-rb, em-kafka), PHP (kafka-php(1), kafka-php(2), log4php), Node.js (Prozess, node-kafka, franz-kafka), Erlang (erlkafka)
  • Consumers: Java (in standard dist), Scala (in standard dist), Python (kafka-python, samsa, brod), Go (Sarama, nuance, kafka.go), C/C++ (libkafka), Clojure (clj-kafka, kafka-clj), Ruby (Poseidon, kafka-rb, Kafkaesque), JRuby (Jruby::Kafka), PHP (kafka-php(1), kafka-php(2)), Node.js (Prozess, node-kafka, franz-kafka), Erlang (erlkafka, kafka-erlang)
  • Stream processing: Storm (a stream-processing framework), Samza (a YARN-based stream processing framework)
  • Hadoop integration: Camus (LinkedIn's Kafka=>HDFS pipeline, used for all data at LinkedIn), Kafka Hadoop Loader (a different take on Hadoop loading functionality from what is included in the main distribution)
  • AWS integration: automated AWS deployment, Kafka->S3 mirroring
  • Logging: klogd (a Python syslog publisher), klogd2 (a Java syslog publisher), Tail2Kafka (a simple log tailing utility), Fluentd plugin, Flume Kafka plugin, remote log viewer, LogStash integration (official)
  • Metrics: Mozilla Metrics Service (a Kafka and Protocol Buffers based metrics and logging system), Ganglia integration
  • Packaging and deployment: RPM packaging, Debian packaging (https://github.com/tomdz/kafka-deb-packaging), Puppet integration, Dropwizard packaging
  • Misc: Kafka Mirror (an alternative to the built-in mirroring tool), Ruby demo app, Apache Camel integration, Infobright integration

What’s in the future?

  • Topic and transient consumer garbage collection (KAFKA-560/KAFKA-559)
  • Producer-side persistence (KAFKA-156/KAFKA-789)
  • Exact mirroring (KAFKA-658)
  • Quotas (KAFKA-656)
  • YARN integration (KAFKA-949)
  • RESTful proxy (KAFKA-639)
  • New build system? (KAFKA-855)
  • More tooling (console, audit trail) (KAFKA-266/KAFKA-260)
  • Client API rewrite (proposal)
  • Application-level security (proposal)

Stream processing

[Diagram: Kafka as a processing pipeline backbone: producers write to Kafka topic1, a first processing stage consumes topic1 and writes to topic2, and a second stage consumes topic2, connecting System1 to System2.]

What is Storm?

  • Distributed real-time computation system with design goals: guaranteed processing, no orphaned tasks, horizontally scalable, fault tolerant, fast
  • Use cases: stream processing, DRPC, continuous computation
  • Four basic concepts: streams, spouts, bolts, topologies
  • In the Apache incubator
  • Implemented in Clojure

Streams and spouts

  • A stream is an [infinite] sequence of tuples, e.g., (timestamp, sessionid, exceptionstacktrace)
  • A spout is a source of streams; it connects to queues, logs, API calls, event data
  • Some features, like transactional topologies (which give exactly-once messaging semantics), are only possible using the Kafka TransactionalSpout consumer

Bolts

  • Filters, transformations, applying functions, aggregations
  • Access to DBs, APIs, etc.
  • Emitting new streams
  • Trident = a high-level abstraction on top of Storm

Topologies

[Diagram: spouts and bolts wired together into a processing graph over tuple streams.]

Storm cluster

[Diagram: a topology is deployed to Nimbus (compare with Hadoop’s JobTracker), which coordinates Supervisors (TaskTrackers) via ZooKeeper, optionally running on Mesos/YARN.]

Links

  • Apache Kafka: papers and presentations, main project page, small MediaWiki case study
  • Storm: introductory article, real-time discussion blog post, Kafka+Storm for realtime BigData, Trifecta blog post (Kafka+Storm+Cassandra), IBM developer article, Kafka+Storm@Twitter, BigData Quadfecta blog post



The Power of Apache Kafka: A Comprehensive Guide (presentation outline)

Introduction

  • Begin by defining distributed streaming platforms and their importance in the modern data landscape.
  • Handling massive data volumes
  • Real-time data processing
  • Fault tolerance and reliability needs

What is Apache Kafka?

  • A high-throughput, distributed, publish-subscribe messaging system.
  • Designed to handle real-time data streams from diverse sources.
  • Built for scalability and reliability to handle massive datasets.

Key Concepts

  • Topics:  Logical categories for organizing data streams.
  • Partitions:  Topics are divided into partitions for scalability and distribution.
  • Producers:  Applications that send messages to Kafka topics.
  • Consumers:  Applications that read and process messages from Kafka topics.
  • Brokers:  Kafka servers that manage data storage and replication.
  • ZooKeeper:  Coordination service for Kafka clusters (although note that newer versions lessen the reliance on ZooKeeper).

Use Cases

  • Real-time Analytics:  Processing data streams for immediate insights.
  • Microservices Communication:  Enabling reliable communication between distributed services.
  • Activity Tracking:  Monitoring website clicks, user behavior, etc.
  • Log Aggregation:  Centralizing logs from different systems.
  • Messaging:  Replacing traditional message queues with a more scalable solution.

Architecture

  • Components:  Visually illustrate producers, consumers, brokers, topics, and partitions.
  • Data Flow:  Describe the message journey to clarify interactions.
  • Replication:  Show how replication ensures high availability and fault tolerance.

Benefits of Kafka

  • Scalability:  Handles massive data volumes with ease.
  • Reliability:  Replicated data and fault tolerance ensure no data loss.
  • Performance:  Low latency high throughput for demanding applications.
  • Flexibility:  Supports a wide range of use cases and programming languages.

Getting Started

  • Installation and Configuration:  Basic setup steps.
  • Simple Producer/Consumer Example:  Demonstrate sending and receiving messages.

Additional Considerations

  • Integration with Other Systems:  Spark, Flink, Hadoop, etc.
  • Advanced Concepts:  Kafka Streams, Kafka Connect, KSQL.
  • Security:  Authentication, authorization, and encryption.

Slide 1: Title Slide

  • Title:  The Power of Apache Kafka: A Comprehensive Guide
  • Your Name/Company

Slide 2: Introduction

  • Define distributed streaming platforms and their importance.
  • Mention key challenges of handling large, real-time datasets.

Slide 3: Apache Kafka Overview

  • Brief definition of Kafka.
  • Highlight core features.

Slides 4-6: Key Concepts

  • Dedicated slides to Topics, Partitions, Producers, Consumers, and Brokers.
  • Use simple diagrams where possible.

Slide 7: Use Cases

  • 3-4 strong use case examples, with visuals if possible

Slide 8: Kafka Architecture

  • Key components diagram.
  • Arrows to show data flow.

Slides 9-10: Benefits

  • Bullet points covering the primary benefits of Kafka.

Slides 11-12: Getting Started

  • Overview of installation (download link).
  • Simple code example of a producer and consumer.

Slide 13: Conclusion

  • Summarize Kafka’s strengths
  • End with a call to action (explore further, try it out).

Important Notes:

  • Tailor to Audience:  Adjust technical depth based on audience knowledge.
  • Visuals:  Make liberal use of diagrams and illustrations.
  • Practice:  Rehearse for confidence and timing.


Apache Kafka Architecture and Its Components: The A-Z Guide


A detailed introduction to Apache Kafka Architecture, one of the most popular messaging systems for distributed applications. 

The first COVID-19 cases were reported in the United States in January 2020. By the end of the year, over 200,000 cases were reported per day, which climbed to 250,000 cases in early 2021. Responding to a pandemic on such a large scale involves technical and public health challenges. One of the challenges was keeping track of the data coming in from many data streams in multiple formats. The CELR (COVID Electronic Lab Reporting) program of the Centers for Disease Control and Prevention (CDC) was established to validate, transform, and aggregate laboratory testing data submitted by various public health departments and other partners. Kafka Streams and Kafka Connect were used to keep track of the threat of the COVID-19 virus and analyze the data for a more thorough response on local, state, and federal levels.


Kafka is an integral part of Netflix’s real-time monitoring and event-processing pipeline. The Keystone Data Pipeline of Netflix processes over 500 billion events a day. These events include error logs, data on user viewing activities, and troubleshooting events, among other valuable datasets.


At LinkedIn, Kafka is the backbone behind various products, including LinkedIn Newsfeed and LinkedIn Today. Spotify uses Kafka as part of its log delivery system.

Kafka is used by thousands of companies today, including over 60% of the Fortune 100, among them Box, Goldman Sachs, Target, Cisco, and Intuit. Apache Kafka is one of the most popular open-source distributed streaming platforms for processing large volumes of streaming data from real-time applications.

Table of Contents

  • Why Is Apache Kafka So Popular?
  • Apache Kafka Architecture - Overview of Kafka Components
  • Apache Kafka Event-Driven Workflow Orchestration
  • The Role of ZooKeeper in Apache Kafka Architecture
  • Drawbacks of Apache Kafka
  • Apache Kafka Use Cases


Why Is Apache Kafka So Popular?

So why is Kafka so popular? And what makes it such a popular choice for companies?

Scalability: The scalability of a system is determined by how well it can maintain its performance when exposed to changes in application and processing demands. Apache Kafka has a distributed architecture capable of handling incoming messages with higher volume and velocity. As a result, Kafka is highly scalable without any downtime impact.

High Throughput: Apache Kafka is able to handle thousands of messages per second. Messages coming in at a high volume or a high velocity or both will not affect the performance of Kafka.

Low Latency: Latency refers to the amount of time taken for a system to process a single event. Kafka offers a very low latency, which is as low as ten milliseconds.

Fault Tolerance: By using replication, Kafka can handle failures at nodes in a cluster without any data loss. Running processes, too, can remain undisturbed. The replication factor determines the number of replicas for a partition. For a replication factor of ‘n,’ Kafka guarantees a fault tolerance for up to n-1 servers in the Kafka cluster.

Reliability: Apache Kafka is a distributed platform with very high fault tolerance, making it a very reliable system to use.

Durability: Data written to the Kafka cluster is persisted to disk and replicated across brokers, which ensures that Kafka’s data remains durable.

Ability to handle real-time data: Kafka supports real-time data handling and is an excellent choice when data has to be processed in real time.


Apache Kafka Architecture Explained - Overview of Kafka Components

Let’s look in detail at the architecture of Apache Kafka and the relationship between the various architectural components to develop a deeper understanding of Kafka for distributed streaming.  But before delving into the components of Apache Kafka, it is crucial to grasp the concept of a Kafka cluster first.

What is a Kafka Cluster? 

A Kafka cluster is a distributed system composed of multiple Kafka brokers working together to handle the storage and processing of real-time streaming data. It provides fault tolerance, scalability, and high availability for efficient data streaming and messaging in large-scale applications.

Apache Kafka Components and Its Architectural Concepts


Topics

A stream of messages that are a part of a specific category or feed name is referred to as a Kafka topic. In Kafka, data is stored in the form of topics. Producers write their data to topics, and consumers read the data from these topics.


Brokers

A Kafka cluster comprises one or more servers that are known as brokers. In Kafka, a broker works as a container that can hold multiple topics with different partitions. A unique integer ID is used to identify brokers in the Kafka cluster. Connection with any one of the Kafka brokers in the cluster implies a connection with the whole cluster. If there is more than one broker in a cluster, the brokers need not contain the complete data associated with a particular topic.

Consumers and Consumer Groups

Consumers read data from the Kafka cluster. The data to be read by the consumers has to be pulled from the broker when the consumer is ready to receive the message. A consumer group in Kafka refers to a number of consumers that pull data from the same topic or same set of topics. 


Producers

Producers in Kafka publish messages to one or more topics. They send data to the Kafka cluster. Whenever a Kafka producer publishes a message to Kafka, the broker receives the message and appends it to a particular partition. Producers are given a choice to publish messages to a partition of their choice.

Partitions

Topics in Kafka are divided into a configurable number of parts, which are known as partitions. Partitions allow several consumers to read data from a particular topic in parallel. Within each partition, messages are stored in the order in which they arrive. The number of partitions is specified when configuring a topic, but this number can be changed later on. The partitions comprising a topic are distributed across servers in the Kafka cluster. Each server in the cluster handles the data and requests for its share of partitions. Messages are sent to the broker along with a key. The key can be used to determine which partition that particular message will go to. All messages which have the same key go to the same partition. If the key is not specified, then the partition will be decided in a round-robin fashion.

Partition Offset

Messages or records in Kafka are assigned to a partition. To specify the position of the records within the partition, each record is provided with an offset. A record can be uniquely identified within its partition using the offset value associated with it. A partition offset carries meaning only within that particular partition. Older records will have lower offset values since records are added to the ends of partitions.

Replicas

Replicas are like backups for partitions in Kafka. They are used to ensure that there is no data loss in the event of a failure or a planned shutdown. Partitions of a topic are published across multiple servers in a Kafka cluster. Copies of the partition are known as replicas.

Leader and Follower

Every partition in Kafka will have one server that plays the role of a leader for that particular partition. The leader is responsible for performing all the read and write tasks for the partition. Each partition can have zero or more followers. The duty of the follower is to replicate the data of the leader. In the event of a failure in the leader for a particular partition, one of the follower nodes can take on the role of the leader.


Apache Kafka Event-Driven Workflow Orchestration

Kafka Producers

In Kafka, the producers send data directly to the broker that plays the role of leader for a given partition. In order to help the producer send the messages directly, the nodes of the Kafka cluster answer requests for metadata on which servers are alive and the current status of the leaders of partitions of a topic so that the producer can direct its requests accordingly. The client decides which partition it publishes its messages to. This can either be done arbitrarily or by making use of a partitioning key, where all messages containing the same partition key will be sent to the same partition.

Messages in Kafka are sent in the form of batches, known as record batches. The producers accumulate messages in memory and send them in batches either after a fixed number of messages are accumulated or before a fixed latency bound period of time has elapsed.


Kafka Brokers

In Kafka, the cluster usually contains multiple nodes, known as brokers, to maintain the load balance. The brokers are stateless, and hence their cluster state is maintained by ZooKeeper. One Kafka broker is able to handle hundreds of thousands of reads and writes per second. For one particular partition, one broker serves as the leader. The leader may have one or more followers, where the data on the leader is to be replicated across the followers for that particular partition. The role of leader for partitions is distributed across brokers in the cluster.

The nodes in a cluster have to send messages called Heartbeat messages to the ZooKeeper to keep the ZooKeeper informed that they are alive. The followers have to stay caught up with the data that is in the leader. The leader keeps track of the followers that are “in sync” with it. If a follower is no longer alive or does not stay caught up with the leader, it is removed from the list of in-sync replicas (ISRs) associated with that particular leader. If the leader dies, a new leader is selected from among the followers. The election of the new leader is handled by the ZooKeeper.

Kafka Consumers

In Kafka, the consumer has to issue requests to the brokers indicating the partitions it wants to consume. The consumer is required to specify its offset in the request and receives a chunk of log beginning from the offset position from the broker. Since the consumer has control over this position, it can re-consume data if required. Records remain in the log for a configurable time period which is known as the retention period. The consumer may re-consume the data as long as the data is present in the log.

In Kafka, the consumers work on a pull-based approach. This means that data is not immediately pushed onto the consumers from the brokers. The consumers have to send requests to the brokers to indicate that they are ready to consume the data. A pull-based system ensures that the consumer does not get overwhelmed with messages and can fall behind and catch up when it can. A pull-based system can also allow aggressive batching of data sent to the consumer since the consumer will pull all available messages after its current position in the log. In this manner, batching is performed without any unnecessary latency.

End to End Batch Compression

To efficiently handle large volumes of data, Kafka performs compression of messages. Efficient compression involves compressing multiple messages together instead of compressing individual messages. Because Apache Kafka supports an efficient batching format, a batch of messages can be compressed together and sent to the server in this format. The batch of messages gets written to the broker in a compressed format and remains compressed in the log until it is extracted and decompressed by the consumer.


The Role of ZooKeeper in Apache Kafka Architecture

Apache ZooKeeper is a software developed by Apache that acts as a centralized service and is used to maintain the configuration of data to provide flexible yet robust synchronization for distributed systems. The ZooKeeper is used to manage and coordinate Kafka brokers in the cluster. It maintains a list of the brokers and manages them using this list. It is used to notify the producer and consumer about the presence of new brokers or about the failure of brokers in the Kafka cluster. Using this information, the producer and consumer can make a decision and accordingly coordinate with some other broker in the cluster. The ZooKeeper is also used to store information about the Kafka cluster and the various details regarding the consumer clients. The ZooKeeper is also responsible for choosing a leader for the partitions. In the event of a failure in the leader node, it is the duty of the ZooKeeper to coordinate the leader election and choose the next leader for a partition.

Up until Kafka 2.8.0, it was not possible to run a Kafka cluster without the ZooKeeper. However, in the 2.8.0 release, the Kafka team is rolling out an alternative method where users can run a Kafka cluster without ZooKeeper but instead using an internal implementation of the Raft consensus algorithm. The changes are outlined in KIP-500 (Kafka Improvement Proposal - 500). The goal here is to move topic metadata and configurations out of ZooKeeper and into a new internal topic, named @metadata, which is managed by an internal Raft quorum of controllers, and replicated to all brokers in the cluster.


Achieving Performance Tuning in Apache Kafka

Optimal performance involves two key measures: latency and throughput. Latency is the time taken to process a single event, so lower latency means better performance. Throughput is the number of events that can be processed in a given amount of time, so the goal is higher throughput. Many systems optimize one of these at the expense of the other; Kafka is designed to deliver both high throughput and low latency, though tuning usually involves balancing the two.

Tuning Apache Kafka for optimal performance involves:

Tuning Kafka Producer: Data that the producers publish to the brokers is stored in a batch and sent only when the batch is ready. To tune the producers, two parameters are taken into consideration -

Batch Size: The batch size should be chosen based on the volume of messages the producer sends. Producers that send messages frequently work better with larger batch sizes, so that throughput is maximized without a heavy latency penalty. For producers that send messages infrequently, a smaller batch size is preferred: a very large batch may never fill up, or may take a long time to do so, which increases latency and hurts performance.

Linger Time: The linger time adds a small delay so that more records can accumulate in a batch, allowing larger batches to be sent. A longer linger time lets more messages be sent in one batch, at the cost of higher latency. A shorter linger time sends smaller batches sooner, giving lower latency but also lower throughput.
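A hedged sketch of how these two settings can be exercised: the kafka-producer-perf-test tool that ships with Kafka accepts arbitrary producer properties, so you can compare runs with different batch.size (bytes) and linger.ms values (the topic name and numbers below are placeholders):

```bash
# Benchmark the producer with a 64 KB batch size and a 20 ms linger time.
# --throughput -1 removes the throttle so the producer sends as fast as it can.
kafka-producer-perf-test \
  --topic perf-test \
  --num-records 1000000 \
  --record-size 1000 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 batch.size=65536 linger.ms=20
```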

Tuning Kafka Brokers: Every partition has a leader associated with it and zero or more followers for the leader. While the Kafka cluster is running, due to failures in some of the brokers or due to reallocation of partitions, an imbalance may occur among the brokers in the cluster. Some brokers might be overworked compared to others. In such cases, it is important to monitor the brokers and ensure that the workload is balanced across the various brokers present in the cluster.

Tuning Kafka Consumers: While tuning consumers, keep in mind that a consumer can read from many partitions, but within a consumer group each partition is read by only one consumer. A good practice is to keep the number of consumers equal to or lower than the partition count; ideally, the partition count is an exact multiple of the number of consumers. Running more consumers than partitions leaves some consumers idle.
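To check how partitions are actually assigned across a group (and to spot idle consumers or growing lag), the consumer-groups tool that ships with Kafka can be used; the group name here is a placeholder:

```bash
# Show each partition's assigned consumer, current offset, end offset, and lag.
kafka-consumer-groups \
  --bootstrap-server localhost:9092 \
  --describe \
  --group my-consumer-group
```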


We have already seen some of the reasons that make Apache Kafka a popular tool for distributed streaming, but like every other big data tool, it has a few downsides:

Modifying messages in flight causes performance issues; Kafka is best suited for cases where messages do not need to be changed.

In Kafka, there is no support for wildcard topic selection. The topic name has to be an exact match.

Certain message paradigms such as point-to-point queues and request/reply features are not supported by Kafka.

Large messages require compression and decompression, which affects Kafka's throughput and performance.

Message Broker

Kafka serves as an excellent replacement for traditional message brokers. Compared to traditional message brokers, Apache Kafka provides better throughput and can handle a larger volume of messages. Kafka can be used as a publish-subscribe messaging service and is a good fit for large-scale message processing applications.

Tracking Website Activities

Website activity, including page views, searches, and other actions that users take, is published to a set of central topics, typically with one topic per activity type. This data can then be used for real-time processing, real-time monitoring, or loading into the Hadoop ecosystem for later processing. Website activity usually involves a very high volume of data, since many messages are generated for the page views of a single user.

Monitoring Metrics

Kafka finds applications in monitoring the metrics associated with operational data. Statistics from distributed applications are consolidated into centralized feeds to monitor their metrics.

Stream Processing

A widespread use case for Kafka is building processing pipelines, where raw data is consumed from topics, then processed or transformed and written to new topics, which are in turn consumed for another round of processing. These pipelines create channels of real-time data. From version 0.10.0.0 onwards, Kafka ships with Kafka Streams, a powerful stream processing library for building exactly this kind of pipeline.


Event Sourcing

Event sourcing is an application design in which state changes are logged as a time-ordered sequence of records. Kafka’s ability to store very large logs makes it a great choice for event sourcing applications.

Kafka can be used as an external commit-log for a distributed application. 

Kafka’s replication feature can help replicate data between multiple nodes and allow re-syncing in failed nodes for data restoration when required. In addition, Kafka can also be used as a centralized repository for log files from multiple data sources and in cases where there is distributed data consumption. In such cases, data can be collected from physical log files of servers and from numerous sources and made available in a single location.

To appreciate the powerful and versatile streaming workloads Apache Kafka can support, it helps to have hands-on experience with the architecture. Working on real-time Apache Kafka projects is an excellent way to build the big data skills and experience needed for your next big data role.


Kafka Basics on Confluent Platform ¶

Apache Kafka® is an open-source, distributed, event streaming platform capable of handling large volumes of real-time data. You use Kafka to build real-time streaming applications. Confluent is a commercial, global corporation that specializes in providing businesses with real-time access to data. Confluent was founded by the creators of Kafka, and its product line includes proprietary products based on open-source Kafka. This topic describes Kafka use cases, the relationship between Confluent and Kafka, and key differences between the Confluent products.

How Kafka relates to Confluent ¶

Confluent products are built on the open-source software framework of Kafka to provide customers with reliable ways to stream data in real time. Confluent provides the features and know-how that enhance your ability to reliably stream data. If you’re already using Kafka, that means Confluent products support any producer or consumer code you’ve already written with the Kafka Java libraries. Whether you’re already using Kafka or just getting started with streaming data, Confluent provides features not found in Kafka. This includes non-Java libraries for client development and server processes that help you stream data more efficiently in a production environment, like Confluent Schema Registry , ksqlDB , and Confluent Hub . Confluent offers Confluent Cloud , a data-streaming service, and Confluent Platform , software you download and manage yourself.

Kafka use cases ¶

Consider an application that uses Kafka topics as a backend to store and retrieve posts, likes, and comments from a popular social media site. The application incorporates producers and consumers that subscribe to those Kafka topics. When a user of the application publishes a post, likes something, or comments, the Kafka producer code in the application sends that data to the associated topic. When the user navigates to a particular page in the application, a Kafka consumer reads from the associated backend topic and the application renders data on the user’s device. For more information, see Use cases in the Apache Kafka Docs hosted by Confluent.

Confluent Platform ¶

Confluent Platform is software you download and manage yourself. Any Kafka use cases are also Confluent Platform use cases. Confluent Platform is a specialized distribution of Kafka that includes additional features and APIs . Many of the commercial Confluent Platform features are built into the brokers as a function of Confluent Server .

The fundamental capabilities, concepts , design ethos , and ways of working that you already know from using Kafka, also apply to Confluent Platform. By definition, Confluent Platform ships with all of the basic Kafka command utilities and APIs used in development, along with several additional CLIs to support Confluent specific features. To learn more about Confluent Platform, see What is Confluent Platform? .

Confluent Platform releases include the latest stable version of Apache Kafka , so when you install Confluent Platform you are also installing Kafka. To view a mapping of Confluent Platform release to Kafka versions, see Supported Versions and Interoperability for Confluent Platform .

Ready to get started?

  • Download Confluent Platform , the self managed, enterprise-grade distribution of Apache Kafka and get started using the Confluent Platform quick start .

Confluent Cloud ¶

Confluent Cloud provides Kafka as a cloud service, so that means you no longer need to install, upgrade or patch Kafka server components. You also get access to a cloud-native design , which offers Infinite Storage, elastic scaling and an uptime guarantee. If you’re coming to Confluent Cloud from open source Kafka, you can use data-streaming features only available from Confluent, including non-Java client libraries and proxies for Kafka producers and consumers, tools for monitoring and observability, an intuitive browser-based user interface, enterprise-grade security and data governance features.

Confluent Cloud includes different types of server processes for streaming data in a production environment. In addition to brokers and topics, Confluent Cloud provides implementations of Kafka Connect, Schema Registry, and ksqlDB.

  • Sign up for Confluent Cloud , the fully managed cloud-native service for Apache Kafka® and get started for free using the Cloud quick start .

What’s next ¶

  • An overview of options for How to run Confluent Platform
  • Instructions on how to set up Confluent Enterprise deployments on a single laptop or machine that models production style configurations, such as multi-broker or multi-cluster , including discussion of replication factors for topics
  • Kafka Commands Primer , a commands cheat sheet that also helps clarify how Apache Kafka® utilities might fit into a development or administrator workflow
  • Explanation of how to configure listeners, Metrics Reporter, and REST endpoints on a multi-broker setup so that all of the brokers and other components show up on Confluent Control Center. Brief introduction to using Control Center to verify topics and messages you create with Kafka commands .
  • Code Examples and Demo Apps

How to Run Confluent Platform ¶

You have several options for running Confluent Platform (and Kafka), depending on your use cases and goals.

Quick Start ¶

For developers who want to get familiar with the platform, you can start with the Quick Start for Confluent Platform . This quick start shows you how to run Confluent Platform using Docker on a single broker, single cluster development environment with topic replication factors set to 1 .

If you want both an introduction to using Confluent Platform and an understanding of how to configure your clusters, a suggested learning progression is:

  • Follow the steps for a local install as shown in the Quick Start for Confluent Platform and run a default single-broker cluster. Experiment with the features as shown in the workflow for that tutorial.
  • Return to this page and walk through the steps to configure and run a multi-broker cluster .

Multi-node production-ready deployments ¶

Operators and developers who want to set up production-ready deployments can follow the workflows for Install Confluent Platform On-Premises or Ansible Playbooks .

Single machine, multi-broker and multi-cluster configurations ¶

To bridge the gap between the developer-environment quick starts and full-scale, multi-node deployments, you can start by experimenting with multi-broker clusters and multi-cluster setups on a single machine, like your laptop.

Trying out these different setups is a great way to learn your way around the configuration files for Kafka broker and Control Center, and experiment locally with more sophisticated deployments. These setups more closely resemble real-world configurations and support data sharing and other scenarios for Confluent Platform specific features like Replicator, Self-Balancing, Cluster Linking, and multi-cluster Schema Registry.

  • For a single cluster with multiple brokers, you must configure and start a single ZooKeeper or KRaft controller, and as many brokers as you want to run in the cluster. A detailed example of how to run this with ZooKeeper is provided in the Run a multi-broker cluster section that follows.
  • For a multi-cluster deployment, you should have a dedicated controller for each cluster, and a Kafka server properties file for each broker. To learn more about multi-cluster setups, see Run multiple clusters .

Does all this run on my laptop? ¶

Yes, these examples show you how to run all clusters and brokers on a single laptop or machine.

That said, you can apply what you learn in this topic to create similar deployments on your favorite cloud provider, using multiple virtual hosts. Use these examples as stepping stones to more complex deployments and feature integrations.

KRaft and ZooKeeper ¶

As of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. To learn more about running Kafka in KRaft mode, see KRaft Overview , the KRaft steps in the Platform Quick Start , and Settings for other components .

The following tutorial on how to run a multi-broker cluster provides examples for both KRaft mode and ZooKeeper mode.

For KRaft, the examples show an isolated mode configuration for a multi-broker cluster managed by a single controller. This maps to the deprecated ZooKeeper configuration, which uses one ZooKeeper and multiple brokers in a single cluster. To learn more about KRaft, see KRaft Overview and Kraft mode under Configure Confluent Platform for production .

In addition to some other differences noted in the steps below, note that:

  • For KRaft mode, you will use $CONFLUENT_HOME/etc/kafka/kraft/broker.properties and $CONFLUENT_HOME/etc/kafka/kraft/controller.properties .
  • For ZooKeeper mode, you will use $CONFLUENT_HOME/etc/kafka/server.properties and $CONFLUENT_HOME/etc/kafka/zookeeper.properties .

Run a multi-broker cluster ¶

To run a single cluster with multiple brokers (3 brokers, for this example) you need:

  • 1 controller properties file (KRaft mode) or 1 ZooKeeper properties file (ZooKeeper mode)
  • 3 Kafka broker properties files with unique broker IDs, listener ports (to surface details for all brokers on Control Center), and log file directories.
  • Control Center properties file with the REST endpoints for controlcenter.cluster mapped to your brokers.
  • Metrics Reporter JAR file installed and enabled on the brokers. (If you start Confluent Platform as described below, from $CONFLUENT_HOME/bin/ , the Metrics Reporter is automatically installed on the broker. Otherwise, you would need to add the path to the Metrics Reporter JAR file to your CLASSPATH.)
  • Properties files for any other Confluent Platform components you want to run, with default settings to start with.


All of this is described in detail below.

Configure replication factors ¶

The broker.properties (KRaft) and server.properties (ZooKeeper) files that ship with Confluent Platform have replication factors set to 1 on several system topics to support development test environments and Quick Start for Confluent Platform scenarios. For real-world scenarios, however, a replication factor greater than 1 is preferable to support fail-over and auto-balancing capabilities on both system and user-created topics.

For the purposes of this example, set the replication factors to 2 , which is one less than the number of brokers ( 3 ). When you create your topics, make sure that they also have the needed replication factor, depending on the number of brokers.

Run these commands to update replication configurations in KRaft mode.

Run these commands to update replication configurations in ZooKeeper mode.
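The exact commands are not reproduced here; as a hedged sketch, one way to make these edits in KRaft mode is with sed against the broker properties file (GNU sed shown; for ZooKeeper mode point at etc/kafka/server.properties instead, and append any property that is not already present in the file):

```bash
cd $CONFLUENT_HOME
# Bump the replication factor for the system topics from 1 to 2.
sed -i \
  -e 's/offsets.topic.replication.factor=1/offsets.topic.replication.factor=2/' \
  -e 's/transaction.state.log.replication.factor=1/transaction.state.log.replication.factor=2/' \
  -e 's/confluent.license.topic.replication.factor=1/confluent.license.topic.replication.factor=2/' \
  -e 's/confluent.metadata.topic.replication.factor=1/confluent.metadata.topic.replication.factor=2/' \
  -e 's/confluent.balancer.topic.replication.factor=1/confluent.balancer.topic.replication.factor=2/' \
  etc/kafka/kraft/broker.properties
```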

When you complete these steps, your file should show the following configs:

  • offsets.topic.replication.factor=2
  • transaction.state.log.replication.factor=2
  • confluent.license.topic.replication.factor=2
  • confluent.metadata.topic.replication.factor=2
  • confluent.balancer.topic.replication.factor=2

Configuration snapshot preview: Basic configuration for a three-broker cluster ¶

The following table shows a summary of the configurations to specify for each of these files, as a reference to check against if needed. The steps in the next sections guide you through a quick way to set up these files, using the existing broker.properties file (KRaft) or server.properties file (ZooKeeper) as a basis for your specialized ones.

Ready to get started? Skip to Configure the servers .

In server.properties and other configuration files, commented out properties or those not listed at all, take the default values. For example, the commented out line for listeners on broker 0 has the effect of setting a single listener to PLAINTEXT://:9092 .

Configure the servers ¶

Start with the broker.properties file you updated in the previous sections with regard to replication factors and enabling Self-Balancing Clusters. You will make a few more changes to this file, then use it as the basis for the other servers.

Update the node ID, controller quorum voters and port for the first broker, and then add the REST endpoint listener configuration for this broker at the end of the file:

Copy the properties file for the first broker to use as a basis for the other two:

Update the node ID, listener, and data directories for broker-1, and then update the REST endpoint listener for this broker:

Update the node ID, listener, controller, and data directories for broker-2, and then update the REST endpoint listener for this broker:

Finally, update the controller node ID, quorum voters, and port:

Start with the server.properties file you updated in the previous sections with regard to replication factors and enabling Self-Balancing. You will make a few more changes to this file, then use it as the basis for the other servers.

Uncomment the listener, and then add the REST endpoint listener configuration at the end of the file:

Copy the properties file for the first server to use as a basis for the other two servers. This is the file you updated in the previous sections with regard to replication factors and enabling Self-Balancing.

Update the broker ID, listener, and data directories for server-1, and then update the REST endpoint listener for this broker:

Update the broker ID, listener, and data directories for server-2, and then update the REST endpoint listener for this broker:

When you have completed this step, you will have three properties files that match the configurations shown in the Configuration snapshot preview: Basic configuration for a three-broker cluster :

  • broker.properties (KRaft) or server.properties (ZooKeeper) which corresponds to node/broker 0
  • broker-1.properties (KRaft) or server-1.properties (ZooKeeper) which corresponds to node/broker 1
  • broker-2.properties (KRaft) or server-2.properties (ZooKeeper) which corresponds to node/broker 2

Run this command to list the files in KRaft mode:

Run this command to list the files in ZooKeeper mode:

Configure Control Center with REST endpoints and advertised listeners (Optional) ¶

This is an optional step, only needed if you want to use Confluent Control Center. It gives you a similar starting point as you get in the Quick Start for Confluent Platform , and an alternate way to work with and verify the topics and data you will create on the command line with kafka-topics .

You must tell Control Center about the REST endpoints for all brokers in your cluster, and the advertised listeners for the other components you may want to run. Without these configurations, the brokers and components will not show up on Control Center.

Make the following changes to $CONFLUENT_HOME/etc/confluent-control-center/control-center-dev.properties and save the file.

Open the file in an editor; for example, in vi :

Configure REST endpoints for the brokers.

In $CONFLUENT_HOME/etc/confluent-control-center/control-center-dev.properties , replace the default value for the Kafka REST endpoint URL by a copy-paste of the following lines to match your multi-broker configuration:

See Required Configurations for Control Center in Self-Balancing Configuration Options, and confluent.controlcenter.streams.cprest.url in the Control Center Configuration Reference.

Replace the configurations for Kafka Connect, ksqlDB, and Schema Registry to provide Control Center with the default advertised URLs for the component clusters. You can delete the original configs and paste the replacement values into the file.

Install the Datagen Connector (Optional) ¶

Install the Kafka Connect Datagen source connector using the confluent connect plugin install command, or by using Confluent Hub. This connector generates mock data for demonstration purposes and is not suitable for production.

To install with the confluent connect plugin install command:

Confluent Hub provides an online library of pre-packaged and ready-to-install extensions or add-ons for Confluent Platform and Kafka. To install with Confluent Hub:
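The install commands themselves were not preserved here; a hedged sketch of the two approaches (the plugin coordinates and version tag shown are the commonly used ones and may differ for your setup):

```bash
# Using the newer Confluent CLI:
confluent connect plugin install confluentinc/kafka-connect-datagen:latest

# Or using the Confluent Hub client:
confluent-hub install confluentinc/kafka-connect-datagen:latest
```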

This is an optional step, but useful, as it gives you a similar starting point as you get in the Quick Start for Confluent Platform .

Start the controller and brokers ¶

In KRaft mode, you must run the following commands from $CONFLUENT_HOME to generate a random cluster ID, and format log directories for the controller and each broker in dedicated command windows. You will then start the controller and brokers from those same dedicated windows.

The kafka-storage command is run only once per broker/controller. You cannot use the kafka-storage command to update an existing cluster. If you make a mistake in configurations at that point, you must recreate the directories from scratch, and work through the steps again.

In a new dedicated command window, change directories into $CONFLUENT_HOME to run the KRaft setup commands and start the controller.

Generate a random-uuid for the cluster using the kafka-storage tool.
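A hedged sketch of this step (the tool lives in the Confluent Platform bin directory; on Apache Kafka distributions the script is kafka-storage.sh):

```bash
cd $CONFLUENT_HOME
# Generate a cluster ID and keep it in an environment variable.
KAFKA_CLUSTER_ID=$(bin/kafka-storage random-uuid)
echo $KAFKA_CLUSTER_ID
```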

Get the value for KAFKA_CLUSTER_ID and add it to your .bash_profile, .bashrc, .zshrc, or similar so that it is available to you in new command windows for running the brokers. You will use this same cluster ID for all brokers.

Format the log directories for the controller:

Start the controller:
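A hedged sketch covering the two preceding steps, assuming the default KRaft properties files that ship under etc/kafka/kraft:

```bash
# Format the controller's log directories with the shared cluster ID...
bin/kafka-storage format -t $KAFKA_CLUSTER_ID -c etc/kafka/kraft/controller.properties

# ...then start the controller in this window.
bin/kafka-server-start etc/kafka/kraft/controller.properties
```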

broker.properties (node 0)

In a new command window dedicated to running node 0, change directories into $CONFLUENT_HOME to run the KRaft setup commands and start your first broker.

Make sure that the KAFKA_CLUSTER_ID you generated for the controller is available in this shell as an environment variable.

( Optional Example ) For example, if you added the value for KAFKA_CLUSTER_ID to your .bash_profile :

Format the log directories for this broker:

Start the broker:
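A hedged sketch for node 0, covering the format and start steps (repeat the same pattern with broker-1.properties and broker-2.properties in the windows for nodes 1 and 2):

```bash
cd $CONFLUENT_HOME
source ~/.bash_profile    # makes KAFKA_CLUSTER_ID available in this window

# Format this broker's log directories, then start the broker.
bin/kafka-storage format -t $KAFKA_CLUSTER_ID -c etc/kafka/kraft/broker.properties
bin/kafka-server-start etc/kafka/kraft/broker.properties
```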

broker-1.properties (node 1)

In a new command window dedicated to running node 1, change directories into $CONFLUENT_HOME to run the KRaft setup commands and start broker-1.

Format the log directories for broker-1:

broker-2.properties (node 2)

In a new command window dedicated to running node 2, change directories into $CONFLUENT_HOME to run the KRaft setup commands and start broker-2.

Format the log directories for this broker-2:

Start ZooKeeper in its own command window:

Start each of the brokers in separate command windows:
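For ZooKeeper mode, a hedged sketch of these start commands, assuming the properties files created earlier (run each in its own window):

```bash
# Start ZooKeeper first.
bin/zookeeper-server-start etc/kafka/zookeeper.properties

# Then start each broker in its own window.
bin/kafka-server-start etc/kafka/server.properties
bin/kafka-server-start etc/kafka/server-1.properties
bin/kafka-server-start etc/kafka/server-2.properties
```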

Start the other components ¶

Start each of these components in separate windows.

For this example, it is not necessary to start all of these. At a minimum, you will need ZooKeeper and the brokers (already started), and Kafka REST. However, it is useful to have all components running if you are just getting started with the platform, and want to explore everything. This gives you a similar starting point as you get in Quick Start for Confluent Platform , and enables you to work through the examples in that Quick Start in addition to the Kafka command examples provided here .

Start Kafka REST

(Optional) Start Kafka Connect

(Optional) Start ksqlDB

(Optional) Start Schema Registry

(Optional) Finally, start Control Center in a separate command window.
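The start commands for these components were not preserved here; a hedged sketch using the launcher scripts and default property files that ship with Confluent Platform (your paths may differ):

```bash
# Run each of these in its own command window.
bin/kafka-rest-start etc/kafka-rest/kafka-rest.properties
bin/connect-distributed etc/kafka/connect-distributed.properties
bin/ksql-server-start etc/ksqldb/ksql-server.properties
bin/schema-registry-start etc/schema-registry/schema-registry.properties
bin/control-center-start etc/confluent-control-center/control-center-dev.properties
```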

Create Kafka topics, producers, and consumers ¶

If you are ready to start working at the command line, skip to Kafka Commands Primer and try creating Kafka topics, working with producers and consumers, and so forth.

Explore Control Center (Optional) ¶

Bring up Confluent Control Center to verify the current status of your cluster, including lead broker (controller), topic data, and number of brokers. For a local deployment, Control Center is available at http://localhost:9021/ in your web browser.

The starting view of your environment in Control Center shows your cluster with 3 brokers.

Click into the cluster card.


The cluster overview is displayed.


Click either the Brokers card or Brokers on the menu to view broker metrics.

Notice the card for Active controller indicating that the lead broker is broker.id 0 , which was configured in server.properties when you specified broker.id=0 . On a multi-broker cluster, the role of the controller can change hands if the current controller is lost. To learn more, see What happens if the lead broker (controller) is removed or lost? , and topics on the “Controller” and “State Change Log” in Best Practices for Kafka Production Deployments in Confluent Platform .


From the brokers list at the bottom of the page, you can view detailed metrics and drill down on each broker.


Finally, click Topics on the left menu.

Note that only system (internal) topics are available at this point because you haven’t created any topics of your own yet. The default_ksql_processing_log will show up as a topic if you configured and started ksqlDB.

There is a lot more to Control Center but it is not the focus of this guide. If you haven’t had a chance to work all the way through a quick start (which demos tasks on Control Center), technically you could jump over to Quick Start for Confluent Platform and work through those same tasks on this cluster (starting with creating Kafka topics on Control Center), and then come back to this guide to continue with the examples in Kafka Commands Primer .

Everything should work the same for the Quick Start steps. The only difference is that here you have a multi-broker cluster with replication factors set appropriately for additional examples, and the deployment in the quick start is a single-broker cluster with replication factors set to 1 for a development-only environment.

Kafka Commands Primer ¶

After you have Confluent Platform running, an intuitive next step is to try out some basic Kafka commands to create topics and work with producers and consumers. This should reassure Kafka newbies and pros alike that the familiar Kafka tools are readily available in Confluent Platform and work the same way. They provide a means of testing and working with basic functionality, as well as configuring and monitoring deployments, and they surface a subset of the APIs available to you.

A few things to note:

  • Confluent Platform ships with Kafka commands and utilities in $CONFLUENT_HOME/bin . This bin/ directory includes both Confluent proprietary and open source Kafka utilities. A full list is provided in CLI Tools Shipped With Confluent Platform . Those in the list that begin with kafka- are the Kafka open source command utilities. A reference for Confluent proprietary commands is provided in CLI Tools for Confluent Platform .
  • With Confluent Platform installed and running on your system, you can run Kafka commands from anywhere; for example, from your $HOME ( ~/ ) directory. You do not have to run these from within $CONFLUENT_HOME .
  • Command line help is available by typing any of the commands with no arguments; for example, kafka-topics or kafka-producer-perf-test .

To help get you started, the sections below provide examples for some of the most fundamental and widely-used Kafka scripts.

Create, list and describe topics ¶

You can use kafka-topics for operations on topics (create, list, describe, alter, delete, and so forth).

In a command window, run the following commands to experiment with topics.

Create three topics, cool-topic , warm-topic , hot-topic .
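A hedged sketch of the create commands (partition and replication-factor values are examples consistent with this walkthrough; hot-topic is created with 2 partitions so it can be altered to 9 later):

```bash
kafka-topics --create --topic cool-topic --bootstrap-server localhost:9092
kafka-topics --create --topic warm-topic --bootstrap-server localhost:9092
kafka-topics --create --topic hot-topic --partitions 2 --replication-factor 2 --bootstrap-server localhost:9092
```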

List all topics.
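A minimal sketch of the list command:

```bash
kafka-topics --list --bootstrap-server localhost:9092
```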

System topics are prefaced by an underscore in the output. The topics you created are listed at the end.

Describe a topic.
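A minimal sketch, using cool-topic as the example topic:

```bash
kafka-topics --describe --topic cool-topic --bootstrap-server localhost:9092
```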

This shows partitions, replication factor, and in-sync replicas for the topic.

Your output should resemble the following:

If you run kafka-topics --describe with no specified topic, you get a detailed description of every topic on the cluster (system and user topics).

Describe another topic, using one of the other brokers in the cluster as the bootstrap server.

Here is that example output:

You can connect to any of the brokers in the cluster to run these commands because they all have the same data!

Alter a topic’s configuration.

For this example, change the partition count on hot-topic from 2 to 9 .
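A hedged sketch of the alter command:

```bash
kafka-topics --alter --topic hot-topic --partitions 9 --bootstrap-server localhost:9092
```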

Dynamic topic modification is inherently limited by the current configurations. For example, you cannot decrease the number of partitions or modify the replication factor for a topic, as that would require partition reassignment.

Rerun --describe on the same topic.

Here is that example output; verify that the partition count has been updated to 9:

Delete a topic.
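A minimal sketch, using warm-topic as the example topic to delete:

```bash
kafka-topics --delete --topic warm-topic --bootstrap-server localhost:9092
```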

Run producers and consumers to send and read messages ¶

The command utilities kafka-console-producer and kafka-console-consumer allow you to manually produce messages to and consume from a topic.

Open two new command windows, one for a producer, and the other for a consumer.

Run a producer to produce to cool-topic .
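A hedged sketch of the producer command (shown with --broker-list, which the note below refers to):

```bash
kafka-console-producer --topic cool-topic --broker-list localhost:9092
```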

Send some messages.

Type your messages at the prompt ( > ), and hit Return after each one.

Your command window will resemble the following:

You can use the --broker-list flag in place of --bootstrap-server for the producer, typically used to send data to specific brokers; shown here as an example.

In the other command window, run a consumer to read messages from cool-topic . Specify that you want to start consuming from the beginning, as shown.
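A hedged sketch of the consumer command:

```bash
kafka-console-consumer --topic cool-topic --from-beginning --bootstrap-server localhost:9092
```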

Your output will resemble the following:

When you want to stop the producer and consumer, type Ctrl-C in their respective command windows.

You may want to leave at least the producer running for now, in case you want to send more messages when we revisit topics on the Control Center.

Produce auto-generated message data to topics ¶

You can use kafka-producer-perf-test in its own command window to generate test data to topics.

For example, open a new command window and type the following command to send data to hot-topic , with the specified throughput and record size.
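The original command is not reproduced here; a hedged sketch with placeholder values for the record count, record size, and throughput cap:

```bash
kafka-producer-perf-test \
  --topic hot-topic \
  --num-records 200000 \
  --record-size 1000 \
  --throughput 200 \
  --producer-props bootstrap.servers=localhost:9092
```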

The command provides status output on messages sent, as shown:

Open a new command window to consume the messages from hot-topic as they are sent (not from the beginning).
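A minimal sketch (no --from-beginning flag, so only new messages are shown):

```bash
kafka-console-consumer --topic hot-topic --bootstrap-server localhost:9092
```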

Type Ctrl-C to stop the consumer.

You may want to leave the producer running for a moment, as you are about to revisit Topics on the Control Center.

To learn more, check out Benchmark Commands , Let’s Load test, Kafka! , and How to do Performance testing of Kafka Cluster

Revisit Control Center (Optional) ¶

Now that you have created some topics and produced message data to a topic (both manually and auto-generated), take another look at Control Center, this time to inspect the existing topics.

Open a web browser and go to http://localhost:9021/ , the default URL for Control Center on a local system.

Select the cluster, and click Topics from the menu.

Choose cool-topic , then select the Messages tab.

Select Jump to offset and type 1 , 2 , or 3 to display previous messages.

These messages do not show in the order they were sent because the consumer here is not reading --from-beginning .

Try manually typing some more messages to cool-topic with your command line producer, and watch them show up here.


Navigate to Topics > hot-topic > Messages tab.

Auto-generated messages from your kafka-producer-perf-test are shown here as they arrive.


Shutdown and cleanup tasks ¶

Run the following shutdown and cleanup tasks.

Stop the kafka-producer-perf-test with Ctrl-C in its command window.

Stop all of the other components with Ctrl-C in their respective command windows, in the reverse order in which you started them. For example, stop Control Center first, then the other components, followed by the Kafka brokers, and finally ZooKeeper.

Remove log files from /tmp . For example, if you were running in KRaft mode:
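A hedged sketch of the cleanup; the exact directories depend on the log.dirs values in your properties files, so adjust these paths to match your configuration:

```bash
# Common default data directories under /tmp for a KRaft setup.
rm -rf /tmp/kraft-controller-logs /tmp/kraft-broker-logs /tmp/kafka-logs*
```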

Run multiple clusters ¶

Another option to experiment with is a multi-cluster deployment. This is relevant for trying out features like Replicator, Cluster Linking, and multi-cluster Schema Registry, where you want to share or replicate topic data across two clusters, often modeled as the origin and the destination cluster.

These configurations can be used for data sharing across data centers and regions and are often modeled as source and destination clusters. An example configuration for cluster linking is shown in the diagram below. (A full guide to this setup is available in the Tutorial: Share Data Across Topics Using Cluster Linking for Confluent Platform .)


Multi-cluster configurations are described in context under the relevant use cases. Since these configurations will vary depending on what you want to accomplish, the best way to test out multi-cluster is to choose a use case, and follow the feature-specific tutorial. The specifics of these configurations vary depending on whether you are using KRaft in combined or isolated mode, or ZooKeeper.

  • Tutorial: Share Data Across Topics Using Cluster Linking for Confluent Platform (requires Confluent Platform 6.0.0 or newer, recommended as the best getting started example)
  • Tutorial: Replicate Data Across Kafka Clusters in Confluent Platform
  • Enabling Multi-Cluster Schema Registry

Code Examples and Demo Apps ¶

Following are links to examples of Confluent Platform distributed applications that use Kafka topics, along with producers and consumers that subscribe to those topics, in an event subscription model. The idea is to complete the picture of how Kafka and Confluent Platform can be used to accomplish a task or provide a service.

  • Kafka Streams examples
  • Demo Scene examples

Related content ¶

  • To learn how serverless infrastructure is built and apply these learnings to your own projects, see Cloud-Native Apache Kafka: Designing Cloud Systems for Speed and Scale
  • Kafka commands
  • Admin operations
  • Apache Quick Start Guides
  • Introduction to Kafka
  • Configure a multi-Node Apache Kafka environment with Docker and cloud providers
  • Confluent Blog
  • Confluent Community
  • developer.confluent.io

Blog post: Publishing with Apache Kafka at The New York Times

Best Kafka Summit Videos

The following talks, with video recordings and slides available, achieved the best ratings by the community at the Kafka Summit conferences from 2018 onwards. Thanks to all the speakers for their hard work!

Kafka Internals and Fundamentals

Applications and use cases, architecture and patterns, data pipelines, kafka operations.

  • Making Kafka Cloud Native , Jay Kreps (Confluent), KS EU 2021
  • How ksqlDB works , Michael Drogalis (Confluent), KS EU 2021
  • Understanding Kafka Produce and Fetch API Calls for High Throughput Applications , Jason Gustafson (Confluent), KS EU 2021
  • A Kafkaesque Raft Protocol , Mik Kocikowski (Cloudflare), KS EU 2021
  • Kafka ♥ Cloud (Keynote) , Jay Kreps (Confluent), KS 2020
  • Trade-offs in Distributed Systems Design: Is Kafka The Best? , Ben Stopford & Michael G. Noll (Confluent), KS 2020
  • The Flux Capacitor of Kafka Streams and ksqlDB , Matthias J. Sax (Confluent), KS 2020
  • Crossing the Streams: the New Streaming Foreign-Key Join Feature in Kafka Streams , John Roesler (Confluent), KS 2020
  • Welcome to Kafka; We're Glad You're Here , Dave Klein (Centene), KS 2020
  • Kafka's New Architecture (Keynote) , Gwen Shapira (Confluent), KS 2020
  • Getting Started with Apache Kafka – a Contributor's Journey , Israel Ekpo (Microsoft) & Matthias J. Sax (Confluent) & Nikolay Izhikov (Sberbank), KS 2020
  • Kafka Needs no (Zoo)Keeper ( abstract ), Jason Gustafson & Colin McCabe (Confluent), SFO 2019
  • Why Stop the World When you Can Change it? Design and Implementation of Incremental Cooperative Rebalancing ( abstract ), Konstantine Karantasis (Confluent), SFO 2019
  • Kafka 102: Streams and Tables All the Way Down ( abstract ), Michael G. Noll (Confluent), SFO 2019
  • What’s the time? …and why? ( abstract ), Matthias J. Sax (Confluent), SFO 2019
  • Zen and the Art of Streaming Joins: The What, When and Why ( abstract ), Nick Dearden (Confluent), NYC 2019
  • Exactly Once Semantics Revisited ( abstract ), Jason Gustafson (Confluent), NYC 2019
  • Performance Analysis and Optimizations for Kafka Streams Applications ( abstract ), Guozhang Wang (Confluent), LON 2019
  • Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Were Afraid to Ask ( abstract ), Matthias J. Sax (Confluent), LON 2019
  • Reliable Message Delivery with Apache Kafka ( abstract ), Ying Zheng & Xiaobing Li (Uber), SFO 2018
  • Hardening Kafka Replication ( abstract ), Jason Gustafson (Confluent), SFO 2018
  • Don’t Repeat Yourself: Introducing Exactly-Once Semantics in Apache Kafka ( abstract ), Matthias J. Sax (Confluent), LON 2018
  • Should You Read Kafka as a Stream or in Batch? Should You Even Care? , Ido Nadler (Nielsen) & Opher Dubrovsky (Nielsen)
  • Scaling a Core Banking Engine Using Apache Kafka , Peter Dudbridge (Thought Machine), KS APAC 2021
  • Scaling an Event-Driven Architecture with IBM and Confluent , Antony Amanse (IBM) & Anton McConville (IBM), KS EU 2021
  • Development of Dynamic Pricing for Tours Using Real-time Data Feeds , Mourad Benabdelkerim (FREE NOW), KS EU 2021
  • Building Event Streaming Applications with Pac-Man , Ricardo Ferreira (Confluent), KS 2020
  • Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning without a Data Lake , Kai Waehner (Confluent), KS 2020
  • KSQL-ops! Running ksqlDB in the Wild , Simon Aubury (ThoughtWorks), KS 2020
  • Streaming Towards Our Quantum Future , David Elbert & Tyrel McQueen (Johns Hopkins University), KS 2020
  • KafkaConsumer – Decoupling Consumption and Processing for Better Resource Utilization , Igor Buzatović (Inovativni trendovi d.o.o), KS 2020
  • Can Kafka Handle a Lyft Ride? , Andrey Falko & Can Cecen (Lyft), KS 2020
  • Flattening the Curve with Kafka , Rishi Tarar (Northrop Grumman Corp.), KS 2020
  • Risk Management in Retail with Stream Processing , Daniel Jagielski (Virtuslab/Tesco), KS 2020
  • 0-60: Tesla’s Streaming Data Platform ( abstract ), Jesse Yates (Tesla), SFO 2019
  • Eventing Things – A Netflix Original! ( abstract ), Nitin Sharma (Netflix), SFO 2019
  • Mission-Critical, Real-Time Fault-Detection for NASA’s Deep Space Network using Apache Kafka ( abstract ), Rishi Verma (NASA Jet Propulsion Laboratory), SFO 2019
  • Using Kafka to Discover Events Hidden in your Database ( abstract ), Anna McDonald (SAS Institute), SFO 2019
  • Building an Enterprise Eventing Framework ( abstract ), Bryan Zelle (Centene) & Neil Buesing (Object Partners, Inc), SFO 2019
  • ksqlDB Performance Tuning for Fun and Profit ( abstract ), Nick Dearden (Confluent), SFO 2019
  • Discovering Drugs with Kafka Streams ( abstract ), Ben Mabey & Scott Nielsen (Recursion Pharmaceutical), SFO 2019
  • Being an Apache Kafka Developer Hero in the World of Cloud ( abstract ), Ricardo Ferreira (Confluent), SFO 2019
  • Streaming Apps and Poison Pills: handle the unexpected with Kafka Streams ( abstract ), Loic Divad (Xebia France), SFO 2019
  • Scaling for India's Cricket Hungry Population ( abstract ), Bhavesh Raheja & Namit Mahuvakar (Hotstar), SFO 2019
  • Leveraging Services in Stream Processor Apps at Ticketmaster ( abstract ), Derek Cline (Ticketmaster), SFO 2019
  • Stream Processing with the Spring Framework (Like You’ve Never Seen It Before) ( abstract ), Josh Long (Pivotal), Tim Berglund (Confluent), NYC 2019
  • How To Use Apache Kafka and Druid to Tame Your Router Data ( abstract ), Rachel Pedreschi (Imply Data), NYC 2019
  • Kafka Connect and ksqlDB: Useful Tools in Migrating from a Legacy System to Kafka Streams ( abstract ), Alex Leung & Danica Fine (Bloomberg L.P.), NYC 2019
  • Building Serverless Apps with Kafka ( abstract ), Dale Lane (IBM), LON 2019
  • Introducing Events and Stream Processing into Nationwide Building Society ( abstract ), Robert Jackson & Pete Cracknell (Nationwide Building Society), LON 2019
  • The Exciting Frontier of Custom ksqlDB Functions ( abstract ), Mitch Seymour (Mailchimp), LON 2019
  • Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL and ksqlDB ( abstract ), Neil Buesing (Object Partners, Inc), LON 2019
  • ksqlDB in Practice ( abstract ), Almog Gavra (Confluent), LON 2019
  • Industry-ready NLP Service Framework Based on Kafka ( abstract ), Bernhard Waltl & Georg Bonczek (BMW Group), LON 2019
  • Data Streaming Ecosystem Management at Booking.com ( abstract ), Alex Mironov (Booking.com), SFO 2018
  • Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform from Scratch ( abstract ), Benny Lee & Christopher Arthur (Commonwealth Bank of Australia), SFO 2018
  • Life is a Stream of Events ( abstract ), Bjørn Kvernstuen & Tommy Jocumsen (Norwegian Directorate for Work and Welfare), SFO 2018
  • Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE ( abstract ), Yuto Kawamura (LINE Corporation), SFO 2018
  • Matching the Scale at Tinder with Kafka ( abstract ), Krunal Vora (Tinder), SFO 2018
  • Building an Enterprise Streaming Platform at Capital One , Chris D'Agostino (Capital One), SFO 2018
  • More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn ( abstract ), Celia Kung (LinkedIn), SFO 2018
  • From Propeller to Jet: Upgrading Your Engines Mid-Flight ( abstract ), Christopher Morris (PagerDuty), SFO 2018
  • A Visual Approach to Understanding Streaming SQL ( abstract ), Shant Hovespian (Arcadia Data), SFO 2018
  • Digital Transformation in Healthcare with Kafka—Building a Low Latency Data Pipeline ( abstract ), Dmitry Milman & Ankur Kaneria (Express Scripts), SFO 2018
  • Building Pinterest Real-Time Ads Platform Using Kafka Streams ( abstract ), Liquan Pei & Boyang Chen (Pinterest), SFO 2018
  • Taking Stateful Stream Processing to the Next Level with Kafka and Flink ( abstract ), Stephan Ewen (Ververica), LON 2018
  • Taming Billions of Metrics and Logs at Scale: Two Years with Kafka as a Central Data Hub for Monitoring at CERN ( abstract ), Luca Magnoni (CERN), LON 2018
  • ksqlDB 201: A Deep Dive into Query Processing ( abstract ), Hojjat Jafarpour (Confluent), LON 2018
  • The Evolution of Kafka at ING Bank ( abstract ), Timor Timuri & Richard Bras (ING), LON 2018
  • Mistakes - I’ve Made a Few. Blunders in Event-driven Architecture , Simon Aubury (ThoughtWorks Australia), KS APAC 2021
  • Better CQRS with ksqlDB , Anshuman Mukherjee (Airwallex), KS APAC 2021
  • Sharing Data Among Microservices: How Change Data Capture with Kafka Connect Came to Our Rescue , Ali Nazemian (Brolly) & Milad Vahood (Brolly), KS APAC 2021
  • Apache Kafka and the Data Mesh , Ben Stopford (Confluent) & Michael Noll (Confluent), KS EU 2021
  • How to Build the Data Mesh Foundation: A Principled Approach , Zhamak Dehghani (ThoughtWorks), KS EU 2021
  • Getting up to Speed with MirrorMaker 2 , Mickael Maison (IBM) & Ryanne Dolan (Twitter), KS EU 2021
  • Reacting to an Event-Driven World , Kate Stanley & Grace Jansen (IBM), KS 2020
  • A Tale of Two Data Centers: Kafka Streams Resiliency , Anna McDonald (Confluent), KS 2020
  • Kafka as your Data Lake – is it Feasible? , Guido Schmutz (Trivadis), KS 2020
  • GDPR Compliance: Transparent Handing of Personally Identifiable Information in Event-Driven Systems , Masih Derkani (SolarWinds), KS 2020
  • Synchronous Commands over Apache Kafka , Neil Buesing (Object Partners, Inc), KS 2020
  • Hybrid Kafka, Taking Real-time Analytics to the Business , Cody Irwin (Google Cloud) & Josh Treichel (Confluent) & Jeff Ferguson (Confluent), KS 2020
  • I Don't Always Test My Streams, But When I Do, I Do it in Production , Viktor Gamov (Confluent), KS 2020
  • Building Information Systems using Event Modeling , Bobby Calderwood (Evident Systems), KS 2020
  • Building Event Driven Architectures with Kafka and Cloud Events ( abstract ), Dan Rosanova (Microsoft), SFO 2019
  • Event Sourcing, Stream Processing, and Serverless , Ben Stopford (Confluent), SFO 2019
  • Shattering The Monolith(s) ( abstract ), Martin Kess (Namely), SFO 2019
  • Hard Truths About Streaming and Eventing ( abstract ), Dan Rosanova (Microsoft), NYC 2019
  • Complex Event Flows in Distributed Systems ( abstract ), Bernd Ruecker (Camunda), NYC 2019
  • Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using Apache Kafka ( abstract ), Chunky Gupta & Osman Sarood (Mist Systems), NYC 2019
  • Tracing for Kafka-Based Applications: Making Sense of Your Event-Driven Dataflows ( abstract ), Jorge Esteban Quilcate Otoya (SYSCO AS), NYC 2019
  • From a Million to a Trillion Events Per Day: Stream Processing in Ludicrous Mode ( abstract ), Shrijeet Paliwal (Tesla) NYC 2019
  • The Migration to Event-Driven Microservices ( abstract ), Adam Bellemare (Flipp), NYC 2019
  • Talking Traffic: Data in the Driver’s Seat ( abstract ), Dominique Chanet (Klarrio), LON 2019
  • Privacy Engineering for the World of Kafka ( abstract ), Alexander Cook (Privitar), LON 2019
  • Riddles of Streaming – Code Puzzlers for Fun & Profit ( abstract ), Nick Dearden (Confluent), LON 2019
  • Is Kafka a Database? , Martin Kleppmann (University of Cambridge), LON 2019
  • Event-Driven Workflow: Monitoring and Orchestrating Your Microservices Landscape with Kafka and Zeebe ( abstract ), Mike Winters & Sebastian Menski (Camunda), LON 2019
  • Handling GDPR with Apache Kafka: How to Comply Without Freaking Out? ( abstract ), David Jacot (Independent, formally Swisscom), LON 2019
  • One Key to Rule them All ( abstract ), Richard Noble & Francesco Nobilia (Babylon Health), LON 2019
  • How To Use Kafka and Druid to Tame Your Router Data ( abstract ), Rachel Pedreschi And Eric Graham (Imply Data), LON 2019
  • Apache Kafka and Event-Oriented Architecture , Jay Kreps (Confluent), SFO 2018
  • Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Your Streaming Data Platform ( abstract ), Bob Lehmann (Bayer), SFO 2018
  • Breaking Down a SQL Monolith with Change Tracking, Kafka and KStreams/ksqlDB ( abstract ), Wanny Morellato (SAP Concur), SFO 2018
  • Kafka as an Eventing System to Replatform a Monolith into Microservices ( abstract ), Madhulika Tripathi (Intuit), SFO 2018
  • Experimentation Using Event-based Systems , Martin Fowler & Toby Clemson (ThoughtWorks), LON 2018
  • On Track with Apache Kafka: Building a Streaming ETL Solution with Rail Data , Robin Moffatt (Confluent), KS APAC 2021
  • Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! , Robin Moffat (Confluent), KS 2020
  • Change Data Capture Pipelines with Debezium and Kafka Streams , Gunnar Morling (Red Hat), KS 2020
  • Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming , Yaroslav Tkachenko (Activision), KS 2020
  • How Kroger Embraced a "Schema First" Philosophy in Building Real-time Data Pipelines , Rob Hoeting, Rob Hammonds & Lauren McDonald (Kroger), SFO 2019
  • Building Stream Processing Applications with Apache Kafka Using ksqlDB Robin Moffat (Confluent), SFO 2019
  • Kafka Connect: Operational Lessons Learned from the Trenches ( abstract ), Elizabeth Bennett (Confluent), SFO 2019
  • Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka ( abstract ), Guido Schmutz (Trivadis), SFO 2019
  • Streaming Ingestion of Logging Events at Airbnb ( abstract ), Hao Wang (Airbnb), NYC 2019
  • No More Silos: Integrating Databases into Apache Kafka ( abstract ), Robin Moffatt (Confluent), NYC 2019
  • Lessons Learned Building a Connector Using Kafka Connect ( abstract ), Katherine Stanley & Andrew Schofield (IBM UK), NYC 2019
  • Building Scalable and Extendable Data Pipeline for Call of Duty Games ( abstract ), Yaroslav Tkachenko (Activision), NYC 2019
  • From Zero to Hero with Kafka Connect ( abstract ), Robin Moffatt (Confluent), LON 2019
  • Change Data Streaming Patterns For Microservices With Debezium ( abstract ), Gunnar Morling (Red Hat), LON 2019
  • Beyond Messaging: Enterprise-Scale, Multi-Cloud Intelligent Routing ( abstract ), Anand Phatak (Adobe), SFO 2018
  • You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard ( abstract ), Stephen Parente & Jeff Field (Blizzard), SFO 2018
  • So You Want to Write a Connector? ( abstract ), Randall Hauch (Confluent), SFO 2018
  • Mind the App: How to Monitor Your Kafka Streams Applications , Simon Aubury (ThoughtWorks), KS EU 2021
  • Encrypting Kafka Messages at Rest to Secure Applications , Robert Barnes (HashiCorp), KS EU 2021
  • Everything You Ever Needed to Know About Kafka on Kubernetes but Were Afraid to Ask , Jakob Scholz (Red Hat), KS EU 2021
  • Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools , Christopher Beard (Bloomberg), KS 2020
  • Overcoming the Perils of Kafka Secret Sprawl , Tejal Adsul (Confluent), KS 2020
  • Kafka Lag Monitoring For Human Beings , Elad Leev (AppsFlyer), KS 2020
  • Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka , Anna Povzner (Confluent), KS 2020
  • Kafka on Kubernetes: Keeping It Simple , Nikki Thean (Etsy), SFO 2019
  • Please Upgrade Apache Kafka. Now! Gwen Shapira (Confluent), SFO 2019
  • Tackling Kafka, with a Small Team , Jaren Glover (Robinhood), SFO 2019
  • Secure Kafka at scale in true multi-tenant environment ( abstract ), Vishnu Balusu & Ashok Kadambala (JP Morgan Chase), SFO 2019
  • Running large scale Kafka upgrades at Yelp ( abstract ), Manpreet Singh (Yelp), SFO 2019
  • Production Ready Kafka on Kubernetes ( abstract ), Devandra Tagare (Lyft), SFO 2019
  • Kafka Cluster Federation at Uber ( abstract ), Yupeng Fui & Xiaoman Dong (Uber), SFO 2019
  • Experiences Operating Apache Kafka® at Scale ( abstract ), Noa Resare (Apple), NYC 2019
  • Kafka Pluggable Authorization for Enterprise Security ( abstract ), Anna Kepler (Viasat), NYC 2019
  • Kafka on Kubernetes: Does it really have to be “The Hard Way”? ( abstract ), Viktor Gamov and Michael Ng (Confluent), NYC 2019
  • Flexible Authentication Strategies with SASL/OAUTHBEARER ( abstract ), Michael Kaminski (The New York Times) & Ron Dagostino (State Street Corp), NYC 2019
  • Bulletproof Apache Kafka with Fault Tree Analysis ( abstract ), Andrey Falko (Lyft), NYC 2019
  • The Foundations of Multi-DC Kafka ( abstract ), Jakub Korab (Confluent), LON 2019
  • Show Me Kafka Tools That Will Increase My Productivity ( abstract ), Stephane Maarek (DataCumulus)
  • Running Kafka in Kubernetes: A Practical Guide ( abstract ), Katherine Stanley (IBM UK), LON 2019
  • Running Production Kafka Clusters in Kubernetes ( abstract ), Balthazar Rouberol (Datadog), LON 2019
  • Don’t Be Scared: Multi-Tenant Cluster Support at Scale ( abstract ), Kelly Attaway & Cabe Waldrop (Pandora Media), LON 2019
  • Kafka on ZFS: Better Living Through Filesystems ( abstract ), Hugh O'Brien (Jet.com), SFO 2018
  • Kafka Security 101 and Real-World Tips ( abstract ), Stephane Maarek (DataCumulus), SFO 2018
  • End-to-End Security with Confluent Platform ( abstract ), Vahid Fereydouny (Confluent), SFO 2018
  • Robust Operations of Kafka Streams ( abstract ), Bill Bejeck (Confluent), SFO 2018
  • Deploying Kafka Streams Applications with Docker and Kubernetes ( abstract ), Gwen Shapira & Matthias J. Sax (Confluent), SFO 2018
  • URP? Excuse You! The Three Metrics You Have to Know ( abstract ), Todd Palino (LinkedIn), SFO 2018
  • Monitor Kafka Like a Pro ( abstract ), Gwen Shapira & Xavier Léauté (Confluent), LON 2018


Explore the 2024 data streaming report.

Explore the 2024 Data Streaming Report to discover the trends and tactics IT leaders are leveraging to boost ROI, AI adoption, and innovation with data streaming.

The Ultimate Guide to Understanding Event-Driven Microservices Architecture

Learn how Apache Kafka, Confluent, and event-driven microservices ensure real-time communication and event streaming for modernized deployment, testing, and continuous delivery.

The Data Streaming Revolution: The Force of Kafka + Flink Awakens

Shoe retailer NewLimits is struggling with decentralized data processing challenges and needs a manageable, cost-effective stream processing solution for an important upcoming launch. Join developer Ada and architect Jax as they learn why Apache Kafka and Apache Flink are better together.

I Heart Logs

Jay Kreps, CEO of Confluent and co-creator of Apache Kafka, shows how logs work in distributed systems, and provides practical applications of these concepts.

Disaster Recovery for Multi-Datacenter Apache Kafka Deployments

A practical guide to configuring multiple Apache Kafka clusters so that if a disaster scenario strikes, you have a plan for failover, failback, and ultimately successful recovery.

Real Time is Happening: Discover How Data Streaming Can Revolutionize Your Business

Businesses are discovering that they can create new business opportunities and make their existing operations more efficient by using real-time data at scale. Learn how real-time data streaming is revolutionizing your business.

Optimizing Your Apache Kafka® Deployment

This whitepaper discusses how to optimize your Apache Kafka deployment for various service goals, including throughput, latency, durability, and availability. It is intended for Kafka administrators and developers planning to deploy Kafka in production.

Microservices in the Apache Kafka® Ecosystem

This white paper provides a brief overview of how microservices can be built in the Apache Kafka ecosystem.

Insuring the Future Through Data

To succeed, insurance companies must unify data from all their channels that may be scattered across multiple legacy systems as well as new digital applications. Without the ability to access and combine all this data in real time, delivering a truly modern insurance experience while assessing fast-changing risks can be an uphill battle. Our eBook explains how event streaming, an emerging technology for analyzing event data in real time, can help insurers compete with their insuretech peers. You will learn how combining event streaming from Apache Kafka® and Confluent with Google Cloud can help you.

The Ongoing Disruption of Retail: A Shift to Data in Motion

Every one of your customer touch points, from an actual purchase to a marketing engagement, creates data streams and opportunities to trigger automations in real time.

How Businesses Succeed With Real-Time Data Streaming: 5 Use Cases

In this ebook, you’ll get a look at five of the common use cases when getting started with data streaming, with real-world customer examples and insights into how your organization can make the leap.

5 Steps to Event Streaming: The Pivot from Projects to a Platform

In this ebook, you’ll learn about the profound strategic potential of an event streaming platform for enterprise businesses of many kinds. The types of business challenges event streaming can address include driving better customer experiences, reducing costs, mitigating risk, and providing a single source of truth across the business. It can be a game changer.

Reference Architecture: Confluent and Snowflake

This document provides an overview of Confluent and Snowflake’s integration, a detailed tutorial for getting started with the integration, and unique considerations to keep in mind when working with these two technologies.

Apache Kafka Transaction Data Streaming for Dummies

Learn how CDC (Change Data Capture) captures database transactions for ingest into Confluent Platform to enable real-time data pipelines.

Kafka Serialization and Deserialization (SerDes) Examples

Dive into full Kafka examples, with connector configurations and Kafka Streams code, that demonstrate different data formats and SerDes combinations for building event streaming pipelines.
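As a rough, minimal sketch of the kind of SerDes wiring such examples cover (not taken from the linked examples), the snippet below configures a plain Java producer with String serializers for key and value; the broker address, topic name, key, and JSON payload are placeholders for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SerdesSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; point this at your own cluster.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Serializers turn language-level keys and values into the bytes Kafka stores;
        // swapping in Avro, JSON Schema, or Protobuf serializers changes only these two settings.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical topic, key, and JSON-formatted value.
            producer.send(new ProducerRecord<>("orders", "user-42", "{\"item\":\"book\",\"qty\":1}"));
        }
    }
}
```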

Confluent Platform Reference Architecture

Learn about the components of Confluent Enterprise, key considerations for production deployments, and guidelines for selecting hardware or deployment with different cloud providers.

Accelerate Cloud Migrations with Apache Kafka®

Learn why organizations are considering Apache Kafka to streamline cloud migrations.

Making Sense of Stream Processing

In this book, O’Reilly author Martin Kleppmann shows you how stream processing can make your data processing systems more flexible and less complex.

Confluent Platform Reference Architecture for Kubernetes

The reference architecture provides a detailed architecture for deploying Confluent Platform on Kubernetes and uses the Helm Charts for Confluent Platform as a reference to illustrate configuration and deployment practices.

Recommendations for Developers using Confluent Cloud

In this white paper, we offer recommendations and best practices for designing data architectures that will work well with Confluent Cloud.

O’Reilly | Kafka: The Definitive Guide 2nd Edition

Learn how to take full advantage of Apache Kafka®, the distributed, publish-subscribe queue for handling real-time data feeds.

Top 6 Reasons to Modernize Legacy Messaging Infrastructure

Learn the challenges of traditional messaging middleware: hindered innovation, low fault tolerance at scale, ephemeral persistence that limits data usage for analytics, and soaring technical debt and operational costs.

Forrester TEI Study: Save ~$2.58M Using Confluent Cloud vs Apache Kafka

Download this Forrester study to understand the economic benefits of Confluent Cloud.

The Data Streaming Revolution: Rise of the Kafka Heroes

Shoe retail titan NewLimits is drowning in stale, inconsistent data due to nightly batch jobs that keep failing. Read the comic to see how developer Ada and architect Jax navigate through Batchland with Iris, their guide, and enter Streamscape and the realm of event-driven architectures.

Kafka Summit Bangalore 2024

Modernize Financial Services with Data Streaming

This ebook will show you how to make your highly valuable data available at scale, everywhere it needs to be, while keeping it secure and compliant.

Migrating from Kafka services to Confluent

A complete guide to migrating from open-source (OSS) Apache Kafka to Confluent, including best practices and customer success stories from their migration journeys.

The Builder’s Guide to Streaming Data Mesh

Learn how to successfully implement a data mesh and build data products using Confluent’s data streaming platform, leveraging connectors, stream processing, and Stream Governance.

Best Practices for Multi-Region Apache Kafka® Disaster Recovery in the Cloud (Active/Passive)

In this white paper, we provide a holistic overview of an active-passive multi-region DR solution based on the capabilities of Confluent Cloud, the only fully managed, cloud-native service for Apache Kafka.

Monitoring Your Apache Kafka® Deployment End-to-End

In this white paper, you will learn how you can monitor your Apache Kafka deployments like a pro, the 7 common questions you'll need to answer, what requirements to look for in a monitoring solution and key advantages of the Confluent Control Center.

The Top 6 Reasons Kafka Projects Fail (and How to Overcome Them)

Learn about 6 common Kafka challenges that cause enterprise projects to fail, and how to overcome the disadvantages of running, managing, securing, and scaling Kafka.

Streaming Pipelines to Data Warehouses - Use Case Implementation

This whitepaper is an in-depth guide to building streaming pipelines to data warehouses. Covers source and sink connectors (with Change Data Capture capabilities), stream processing with Kafka Streams and ksqlDB, with use cases and operational considerations.

Streaming Pipelines to Databases - Use Case Implementation

This whitepaper is an in-depth guide to building streaming pipelines between different databases (RDBMS). Covers source and sink connectors (with Change Data Capture capabilities), stream processing with Kafka Streams and ksqlDB, with use cases and operational considerations.

Confluent and Amazon Security Lake

Processing large amounts of data is challenging due to cost, physical size, efficiency, and availability limitations most companies face. A scalable and highly-available back-end system such as Confluent can efficiently process your company’s ever-growing volume of data.

Good Teams Manage Kafka® - Efficient Teams Use Confluent

This white paper unpacks the true costs of open source Kafka and MSK and demonstrates the value you can realize using Confluent.

Forrester Wave™: Cloud Data Pipelines, Q4 2023

Forrester says Confluent is a “Streaming force to be reckoned with,” and has named Confluent a leader in The Forrester Wave™: Cloud Data Pipelines, Q4 2023. See why Confluent is a leader.

Forrester names Confluent a leader in streaming data platforms

Forrester says Confluent is a “Streaming force to be reckoned with,” and has named Confluent a leader in The Forrester Wave™: Streaming Data Platforms, Q4 2023. See why Confluent is a leader.

Financial Services Reimagined with Apache Kafka®

Learn how Confluent's fully managed, cloud-native Kafka powers enterprise-grade data streaming, integration, and governance for modern banking and financial services use cases.

Data Streaming Platforms… To Build or to Buy?

We’ve put together a decision tree that will help you evaluate your current data streaming setup and trajectory to assess whether a fully managed data streaming platform is a good fit for you.

IDC Tech Brief: Why Real-Time Streaming Technology Is Critical to Innovation and Gaining a Competitive Advantage

In this IDC Tech Brief, we share our research on streaming data platforms, and the advantages they’re bringing for innovation, improved operational efficiency, ROI, and more.

Data Streaming 101 Infographic

The modern world is defined by speed. Grocery delivery, rideshare apps, and payments for just about anything can happen instantly using a mobile device and its apps. Every action of every consumer creates data, and businesses must make sense of it quickly to take advantage in real time.

Ventana Report: Confluent Addresses Data Governance for Data in Motion

This Ventana Research Analyst Perspective explains why organizations have to manage and govern data streaming projects alongside data at rest.

Maximize Your Data’s Potential With Streaming Pipelines

Download the “Transform Your Data Pipelines, Transform Your Business: 3 Ways to Get Started” ebook to take a deep dive into the challenges associated with legacy data pipelines and how streaming pipelines can help you reinvent the way data flows through, and is accessed within, your organization.

Designing Event-Driven Systems

Learn how event-driven architecture and stream processing tools such as Apache Kafka can help you build business-critical systems that unlock modern, innovative use cases.

Confluent named a Leader in IDC MarketScape for Worldwide Analytic Stream Processing Software 2024

Confluent named a Leader in the IDC MarketScape for Worldwide Analytic Stream Processing Software 2024. See why Confluent is a Leader.

Confluent named a Leader in IDC MarketScape for Worldwide Event Brokering Software 2024

Confluent named a Leader in the IDC MarketScape for Worldwide Event Brokering Software 2024. See why Confluent is a Leader.

Kafka Summit London 2024

The Cloud-Native Chasm: Lessons Learned from Reinventing Apache Kafka as a Cloud-Native, Online Service

Differentiating cloud-native, cloud, and cloud services, and lessons learned building a fully managed, elastic, cloud-native Apache Kafka.

451 Research’s Report Reveals Major Apache Flink® & Confluent Announcements!

Discover the latest Apache Flink developments and major Confluent announcements from Kafka Summit 2023 in 451 Research’s Market Insight Report.

Real-time Fraud Detection - Use Case Implementation

This whitepaper covers how to implement a real-time fraud detection solution, spanning multi-channel detection, real-time data integration and processing, machine learning and AI, and real-time monitoring, reporting, and analytics.

Putting Fraud In Context

In our ebook “Putting Fraud In Context”, we explore the complexities of fraud detection, why current detection tools often fall short and how Confluent can help.

How to Build Data Streaming Pipelines to Amazon Aurora with Confluent

Learn how Confluent can simplify and accelerate your migration to Amazon Aurora.

5 Ways Data Streaming is Fueling Financial Services Transformation

Top streaming data use cases powering leading financial services organizations, like Citigroup and 10x Banking, with real-time payments, fraud detection, and better customer experiences.

How BigCommerce Upleveled Kafka Management for Digital Retail Innovation

Recognizing the need for real-time data while understanding the burden of self-managing Kafka on their own led BigCommerce to choose Confluent—allowing them to tap into data streaming without having to manage and maintain the data infrastructure.

Top Three Use Cases for Streaming Data Pipelines

Our latest eBook explores legacy data pipeline challenges and how streaming pipelines and existing tech partners can help you optimize how data flows through your company and make it more accessible throughout your organization.

How To Reduce Mainframe MIPS with Confluent

Mainframes play a fundamental role in many organizations, but can be expensive to operate. Discover how Confluent's data streaming technology can help reduce MIPS and lower costs, with real-world case studies and example architectures.

Data in Motion and Space Operations

A data mesh is useful for military space operations for numerous reasons including improving data quality, enabling data access and sharing while maintaining security and access controls, and supporting better decision-making.

Putting the National Cybersecurity Strategy in Motion

Confluent is uniquely positioned to help agencies reframe how they approach the responsibility for and the coordination of cyber defense and resilience.

USDA Data Modernization

With Confluent, USDA can deploy across on-prem and cloud environments so the different Mission Areas can continue to manage their data as they need. It creates a flexible and future-ready data infrastructure by decoupling producers and consumers, simplifying how data can be combined in new ways.

USAF and Confluent - Driving Data to DOD

Confluent Platform completes the event streaming platform and adds the flexibility, durability, and security required for complex, large-scale mission operations.

Unleashing Data to Advance the National Defense Strategy

A data mesh architecture helps address all eight guiding principles in the DoD Data Strategy, from viewing data as a strategic asset to collective stewardship, enterprise access, and designing for compliance.

Thundercat & Confluent

Data Centralization enables algorithms to work more effectively, with access to more information and working at the speed of machines to provide deeper insight in near real time.

Securing Data in Motion

With ABAC, authorization occurs at a finer granularity than the topic level, restricting access to fields within an event based on attribute types, combinations, and user roles.

Digital Citizen Engagement for Government Agencies

Data streaming can be applied to nearly any citizen service (e.g., permit applications, financial aid, pensions, medical claims, immigration processing, tax filing), becoming increasingly powerful when government agencies use the same data sources across multiple applications.

JADO and JADC2 Objectives with Data in Motion

Data mesh architectures help bridge the gap between the systems we have and the decisions we need to support.

Improving the Nation's Cybersecurity

How Confluent helps meet the Executive Order requirement for event forwarding and event log management in collecting, aggregating, routing, and sharing data.

Federal News Network: Closing the Gap Between Mission and Data

Insights on streaming data from the General Services Administration (GSA), NASA, Air Force, and the Federal Energy Regulatory Commission.

FCW Summit: Apply Data in Motion

The solution is better data-in-motion architectures that focus on harnessing the flow of data across applications, databases, Software-as-a-Service (SaaS) layers, and cloud systems.

Efficiently Modernizing Government Data Environments

Confluent enables government organizations to easily inject legacy data sources into new, modern applications and adapt to changing real-world circumstances faster than ever.

Driving Data to DoD Software Factories

As the DoD continues to invest in DevSecOps as a culture and approach to rapidly meeting the warfighter’s needs, it needs secure yet widely available access to cloud-native infrastructure.

Digital Government Needs Access to Data in Motion

Event streaming puts data in motion and creates a central nervous system for your entire organization, creating a new paradigm that supports collecting a continuous flow of data throughout an organization and processing it in real time.

Data Sharing in Public Sector

Data streaming enables organizations to put data sharing in motion. The sharing organization publishes a stream of events (including changes and deltas) as they occur and data sharing consumers can subscribe to efficiently receive them as they happen.

Data-Rich Government Services

Confluent aligns closely with the goals of the Data Strategy’s principle of Conscious Design, harnessing existing data and protecting its quality and relevance, allowing agencies to be more responsive to constituent needs with modern services.

Connected Government - Transportation Government

Confluent's data streaming platform enables government entities to transform the way they work with data to protect the public, improve infrastructure, manage transportation, and more.

Apache Kafka® Reinvented for the Cloud

Kafka management becomes risky and costly as it scales. Learn why Confluent reinvented Kafka as a cloud service for over 10X more elasticity, storage, and resiliency.

Easily Offload Operational Complexities to Make Kafka Go Farther, Faster

Download the “Kafka In the Cloud: Why It’s 10x Better With Confluent” ebook to take a deep dive into how Confluent harnessed the power of the cloud to build a data streaming platform that’s 10x better than Apache Kafka, so you can leave your Kafka management woes behind.

Confluent Public Sector

Confluent enables government agencies to utilize data as a continually updating stream of events, rather than discrete snapshots. Run your agency by building real-time applications with historical context - all based on a universal event pipeline.

Connecting the Dots: Getting Maximum Value From Data

In this report, you’ll learn how to use event streaming to process, store, analyze, and act on both historical and real-time data in one place. You’ll also explore the data access and management challenges agencies are facing and how to address them.

Government’s Data Platform for Modern Applications

The Confluent event-streaming platform enables government organizations to unlock and repurpose their existing data for countless modern applications and use cases.

10 Ways Confluent Drives Transformation in Financial Firms

This whitepaper describes some of the financial businesses that rely on Confluent and the game-changing business outcomes that can be realized by using data streaming technology.

Moving at the Speed of the Mission with Data Fabric

As the DoD presses forward with Joint All-Domain Command and Control (JADC2) programs and architectures, the Air Force is working to stand up technology centers that will allow not only for the sharing of data but also for the sharing of data in motion.

The Tech Executive’s Guide to Data Streaming Systems

Data streaming provides an accurate, real-time view of your business. Learn about the data streaming ecosystem, its benefits, and how to accelerate real-time insights and analytics in this guide.

Kora: A Cloud-Native Event Streaming Platform For Kafka

Building a cloud-native data streaming platform isn’t just hosting Kafka on the cloud. We documented our design of Kora, the Apache Kafka engine built for the cloud, and were awarded “Best Industry Paper” at Very Large Data Bases (VLDB), one of the most prestigious tech conferences.

Practical Data Mesh: Building Decentralized Data Architectures with Event Streams

Why an event-driven data mesh built on Apache Kafka provides the best way to access important business data and unify the operational and analytical planes.

Mainframe Integration - Use Case Implementation

This whitepaper outlines the most common patterns and considerations for Mainframe Integration projects.

Gartner report: Innovation Insight for Streaming Data in Motion: The Collision of Messaging, Analytics and DBMS

Many businesses are using streaming data in some form—but not necessarily effectively. As the volume and variety of data streams increases, data and analytics leaders should evaluate the design patterns, architectures, and vendors involved in data streaming technology to find relevant opportunities.

3 Best Practices for Building Cloud-Native Systems with Kafka

Taking Kafka to the cloud? Learn 3 best practices for building a cloud-native system that makes data streaming scalable, reliable, and cost-effective.

5 Signs It's Time to Move On From Legacy Architecture

Learn about 5 challenges of legacy systems and why your organization should move its data infrastructure and Apache Kafka use cases to the cloud.

10x Banking Infographic

Modern customers crave personalization. How do banks deliver on it? By leveraging real-time data—enabled by data streaming platforms—to unlock powerful customer experiences.

Data Streaming Platforms: To Build or to Buy?

Should you spend time self-managing open source technologies such as Apache Kafka® (build), or invest in a managed service (buy)? Let’s evaluate!

5 Tips to Improve Fraud Detection and Prevention With Data Streaming

Modern fraud technology calls for a modern fraud detection approach, and that requires real-time data. Industry leaders from Capital One, RBC, and more are detecting fraud using data streaming to protect customers in real time.

Modern, Omnichannel Retail Experiences with Kafka

To succeed, retailers must unify data scattered across point-of-sale, e-commerce, ERP, and other systems. Without integrating all of this data in motion—and making it available to applications in real time—it’s almost impossible to deliver a fully connected omnichannel customer experience.

Confluent Data Warehouse Modernization with AWS

Confluent can help you build data streaming pipelines that allow you to connect, process, and govern any data stream for any data warehouse.

Retailers Improve Their Topline and Bottom Line by Harnessing Data in Motion

Ventana Research finds that more than nine in ten organizations place a high priority on speeding the flow of data across their business and improving the responsiveness of their organizations. This is where Confluent comes in.

GigaOm Ease-of-Use Comparison: Managed vs. Open-Source Kafka

Read GigaOm’s ease-of-use study on self-managed Apache Kafka® and fully managed Confluent Cloud. See how Confluent accelerates and streamlines development.

Modernize Your Databases with Confluent’s Data Streaming Platform

Confluent is 10X better than Apache Kafka so you can cost-effectively build real-time applications on Google Cloud.

Modernize your Data Estate with Confluent and Microsoft Azure

Confluent is 10X better than Apache Kafka so you can cost-effectively build real-time applications on Microsoft Azure.

Harness Data in Motion Within a Hybrid and Multi-cloud Architecture

Explore new ways that your organization can thrive with a data-in-motion approach by downloading the new e-book, Harness Data in Motion Within a Hybrid and Multicloud Architecture.

Good teams manage Kafka. Efficient teams use Confluent.

An overview of Confluent’s Core Product Pillars.

Sainsbury’s Infographic

How Sainsbury’s is revolutionizing its supply chain with real time data streaming from Confluent.

Optimize your SIEM to Build Tomorrow’s Cyber Defense with Confluent

Confluent E2E Encryption Accelerator

To learn more about the E2E Encryption Accelerator and how it may be used to address your data protection requirements, download the Confluent E2E Encryption Accelerator white paper.

Running Apache Kafka® in 2022: A Cloud-Native Service

In 2022, if you want to deliver high-value projects that drive competitive advantage or business differentiation quickly, your best people can’t be stuck in the day-to-day management of Kafka, and your budget is better spent on your core business. By now you know, the answer is cloud.

Stateful Serverless Architectures with ksqlDB and AWS Lambda

Introduction to serverless, how it works, and the benefits stateful serverless architectures provide when paired with data streaming technologies.

Hybrid and Multicloud Reference Architecture

To learn more about how you can implement a real-time data platform that connects all parts of your global business, download this free Confluent hybrid and multicloud reference architecture.

Use Data Insights to Build Resiliency in the Face of Complexity

Check out IDC’s findings on why and how building resiliency matters in the face of near-constant disruption. To build resiliency, businesses should focus on one key area: their data. Find out more in IDC’s From Data at Rest to Data in Motion: A Shift to Continuous Delivery of Value.

Why you need data streaming for hybrid and multicloud data architectures

This eBook will explain how you can modernize your data architecture with a real-time, global data plane that eliminates the need for point-to-point connections and makes your data architecture simpler, faster, more resilient, and more cost effective.

IDC Market Note: Not Just Messaging: Confluent Points the Way to a More Comprehensive Approach to Kafka and Streaming Data Platforms

This IDC Market Note discusses the main takeaways from the 2022 Kafka Summit in London, hosted by Confluent.

From monoliths to microservices: building event-driven systems

The secret to modernizing monoliths and scaling microservices across your organization? An event-driven architecture.

Set Your Data in Motion with Apache Kafka using Confluent on AWS

The companies most successful in meeting the demanding expectations of today’s customers are running on top of a constant supply of real-time event streams and continuous real-time processing. If you aspire to join the ranks of those capitalizing on data in motion, this is the place to start.

Autonomous Networks and Data In Motion

Download this white paper to read how Confluent can power the infrastructure necessary to run Autonomous Networks.

Set Your Data in Motion with Confluent and Apache Kafka®

Comparing Confluent with Traditional Messaging Middleware

In this paper, we explore some of the fundamental concepts of Apache Kafka, the foundation of Confluent Platform, and compare it to traditional message-oriented middleware.

Recognizing the Full Value of Event Streaming: Beyond Messaging to Meaning

This ENTERPRISE MANAGEMENT ASSOCIATES® (EMA™) eBook will show how, with fully managed cloud-based event streaming, executives, managers, and individual contributors gain access to real-time intelligence and the enterprise will achieve unprecedented momentum and material gain.

Real-Time Analytics: Best Practices and Use Cases for Deploying Apache Kafka on AWS with Confluent

In this eBook from Confluent and AWS, discover when and how to deploy Apache Kafka on your enterprise to harness your data, respond in real-time, and make faster, more informed decisions.

From IoT to Effective Analytics— The Full Journey

From data collection at scale to data processing in the Cloud or at the Edge—IoT architectures and data can provide enormous advantages through useful business and operational insights.

Technology Leadership Enabling Digital Transformation in the Home Mortgage Industry

Confluent is pioneering a new category of data infrastructure focused on data in motion, designed to be the intelligent connective tissue enabling real-time data, from multiple sources, to constantly and securely stream across any organization.

Build an Integrated Trading Ecosystem with Data in Motion

Confluent’s platform for data in motion unifies silos and sets data in motion across an organization. Learn how this empowers developers to build the kinds of real-time applications that make their organizations more competitive and more efficient.

Confluent and Qlik® Fast-Track Business Insights with Data in Motion

Discover how to fuel Kafka-enabled analytics use cases—including real-time customer predictions, supply chain optimization, and operational reporting—with a real-time flow of data.

Measuring the Cost-Effectiveness of Confluent Platform

Confluent Platform completes Kafka with a set of enterprise-grade features and services. Confluent Platform can reduce your Kafka TCO by up to 40% and accelerate your time to value for new data in motion use cases by 6+ months. Learn how Confluent Platform drives these outcomes for our customers.

Transforming Financial Services to Meet the New Wave of Digital Adoption

For financial services companies, digital technologies can solve business problems, drastically improve traditional processes, modernize middleware and front-end infrastructure, improve operational efficiency, and most importantly, better serve customers.

Build a Foundation for Tomorrow’s Financial Services

Banks and financial institutions are looking toward a future in which most business is transacted digitally. They’re adding new, always-on digital services, using artificial intelligence (AI) to power a new class of real-time applications, and automating back-office processes.

The Ongoing Disruption of Banking: The Shift to Data in Motion

Banking customers today demand personalized service and expect real-time insight into their accounts from any device—and not just during “business hours.” Financial institutions trying to meet those expectations have intense competition from each other as well as fintech startups...

Building Stream Processing Applications with Confluent

Download this whitepaper to learn about ksqlDB, one of the most critical components of Confluent, that enables you to build complete stream processing applications with just a few simple SQL queries.

Best Practices for Apache Kafka® – 5 Tips Every Developer Should Know

In this white paper, you’ll learn about five Kafka elements that deserve closer attention, either because they significantly improve upon the behavior of their predecessors, because they are easy to overlook or to make assumptions about, or simply because they are extremely useful.

Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka®

This paper presents Apache Kafka’s core design for stream processing, which relies on its persistent log architecture as the storage and inter-processor communication layers to achieve correctness guarantees.

Confluent and Istio Service Mesh

This white paper explores the potential benefits and relevance of deploying Confluent with the Istio service mesh.

Recommendations for Deploying Apache Kafka® on Kubernetes

Learn Kubernetes terms, concepts and considerations, as well as best practices for deploying Apache Kafka on Kubernetes.

Benchmark Your Dedicated Apache Kafka® Cluster on Confluent Cloud

This white paper reports the results of benchmarks we ran on a 2-CKU multi-zone dedicated cluster and shows the ability of a CKU to deliver the stated client bandwidth on AWS, GCP, and Azure clouds.

Event Streaming at the Core of Industry 4.0

If you’re a leader in a business that could or does benefit from automation, IoT, and real-time data, don’t miss this white paper. The lifeblood of Industry 4.0 is streaming data, which is where event streaming comes in: the real-time capture, processing, and management of all your data in order to drive transformative technology initiatives.

Enabling Operational Data Flows with NoSQL - Couchbase and Confluent

This brief describes how to enable operational data flows with NoSQL and Kafka, in partnership with Couchbase and Confluent.

10 Principles for Streaming Services

This paper provides 10 principles for streaming services, a list of items to be mindful of when designing and building a microservices system.

The IDC Perspective on Confluent Platform 6.0

The IDC Perspective on Confluent Platform 6.0 is here, and in it, you can read IDC’s take on the importance of event streaming to enterprise companies today.

An IDC Spotlight on Modern Data Management: Real Time and Software-Defined

We used to talk about the world’s collective data in terms of terabytes. Now, according to IDC’s latest Global DataSphere, we talk in terms of zettabytes: 138ZB of new data will be created in 2024, and 24% of it will be real-time data. How important is real-time streaming data to enterprise organizations? If they want to respond at the speed of business, it’s crucial. In this digital economy, having a competitive advantage requires using data to support quicker decision-making, streamlined operations, and optimized customer experiences. Those things all come from data.

Deploying Confluent Enterprise on Microsoft Azure

This white paper outlines the integration of Confluent Enterprise with the Microsoft Azure Cloud Platform.

Modernize data architectures - MongoDB and Kafka

This brief describes a modern data architecture with Kafka and MongoDB.

2018 Apache Kafka® Report

The survey of the Apache Kafka community shows how and why companies are adopting streaming platforms to build event-driven architectures.

Streaming Analytics Platform - Altair and Confluent

This brief describes a comprehensive streaming analytics platform for visualizing real-time data with Altair Panopticon and Confluent Platform.

Confluent Verification Guide

This paper guides developers who want to build an integration or connector and outlines the criteria Confluent uses to verify the integration.

Top 3 Streaming Use Cases for Real Time Streams in Financial Services Architectures

Read this white paper to learn about the common use cases Confluent is seeing amongst its financial services customers.

Role-Based Access Control (RBAC) for Kafka Connect

Ensure that only authorized clients have appropriate access to system resources by using RBAC with Kafka Connect.

Confluent Cloud Datasheet

Confluent Cloud is the industry's only cloud-native, fully managed event streaming platform powered by Apache Kafka.

Partner Development Guide for Kafka Connect

Best practices for developing a connector using Kafka Connect APIs.

Real-time Data Integration into Kafka - HVR and Confluent

This brief describes a solution for data integration and replication in real time and continuously into Kafka, in partnership with HVR and Confluent.

Accelerate Insights with Real-Time Event Streaming with Qlik and Confluent

This brief describes a solution to efficiently prepare data streams for Kafka and Confluent with Qlik Data Integration for CDC Streaming.

Reference Architecture: Confluent and MongoDB

This reference architecture documents the MongoDB and Confluent integration including detailed tutorials for getting started with the integration, guidelines for deployment, and unique considerations to keep in mind when working with these two technologies.

Confluent Operations Training for Apache Kafka

In this three-day hands-on course, you will learn how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka experts.

The Ongoing Disruption of Insurance: A Shift to Real-Time Streaming Data

Most insurance companies today are somewhere along the spectrum of digital transformation, finding new ways to use data while staying within the confines of strict regulatory complexity and capital requirements. But only a few insurtech leaders and innovative startups have really tapped into real-time streaming data as the architecture behind these efforts. In this free ebook, learn about three pivotal insurance business uses for event streaming: reducing operating costs with automated digital experiences, personalizing the customer experience, and mitigating risks with real-time fraud and security analytics.

2017 Apache Kafka® Report

This survey focuses on why and how companies are using Apache Kafka and streaming data and the impact it has on their business.

Event-driven business: How to handle the flow of event data

Get key research stats on why CIOs are turning to streaming data for a competitive advantage.

Event driven enterprise architecture - DataStax and Confluent

This brief describes a modern data center that manages the velocity and variety of data with an event-driven enterprise architecture built on DataStax and Confluent.

Organizing for Enterprise Event Streaming: The New Central Nervous System of Business

In this ebook, you’ll learn about the adoption curve of event streaming and how to gain momentum and effect change within your organization. Learn how to wield event streaming to convert your enterprise to a real-time digital business, responsive to customers and able to create business outcomes in ways never before possible.

Confluent Training: Stream Processing Using Kafka Streams & KSQL

Streams and Tables: Two Sides of the Same Coin

In this paper, we introduce the Dual Streaming Model. The model presents the result of an operator as a stream of successive updates, which induces a duality of results and streams.

Confluent Developer Training: Building Kafka Solutions

In this three-day hands-on course you will learn how to build an application that can publish data to, and subscribe to data from, an Apache Kafka cluster.

Event Streaming and Graph Databases - Neo4j and Confluent

This brief describes a solution with the Neo4j graph database and Confluent Platform.

High-throughput Low-latency NoSQL solution - Scylla and Confluent

This brief describes a solution for real-time data streaming with ScyllaDB's NoSQL database paired with Confluent Platform.

Confluent Cloud Security Controls

Confluent implements layered security controls designed to protect and secure Confluent Cloud customer data, incorporating multiple logical and physical security controls that include access management, least privilege, strong authentication, logging and monitoring, vulnerability management, and bug bounty programs.

Streaming Data Analysis and Visualization - Kinetica and Confluent

This brief describes streaming data analysis and visualization accelerated by Kinetica's GPU in-memory technology, in partnership with Confluent.

Confluent Platform Datasheet

An overview of the enterprise-ready event streaming platform.

Five Stages to Streaming Platform Adoption

Use cases for streaming platforms vary, from improving the customer experience to much more. We have synthesized some common themes of streaming maturity and identified five stages of adoption.

End-to-End Streaming Analytics - Imply and Confluent

This brief describes an end-to-end streaming analytics solution in which Imply and Druid provide the data querying and visualization and Kafka provides the data streaming.

Apache Kafka® in the Automotive Industry

Having spent time with many OEMs and suppliers, as well as technology vendors in the IoT segment, Kai Waehner gives an overview of current challenges in the automotive industry and of a variety of use cases for event-driven architectures.

In this 30-minute session, hear from top Kafka experts who will show you how to easily create your own Kafka cluster and use out-of-the-box components like ksqlDB to rapidly develop event streaming applications.

Event-Driven Microservices with Spring Boot and Confluent Cloud

In this two-hour spooktacular workshop with Bruce Springstreams, learn about event-driven microservices with Spring BOOOOt and Confluent Cloud.

Mainframe Integration, Offloading and Replacement with Apache Kafka®

Replace the mainframe with new applications using modern and less costly technologies. Stand up to the dinosaur, but keep in mind that legacy migration is a journey. This session will guide you to the next step of your company’s evolution!

How to Build a Bridge to the Cloud

A company's journey to the cloud often starts with the discovery of a new use case or need for a new application. Deploying Confluent Cloud, a fully managed cloud-native streaming service based on Apache Kafka, enables organisations to revolutionise the way they build streaming applications and real-time data pipelines.

Elastically Scaling Kafka Using Confluent

Adjusting to the real-time needs of your mission-critical apps is only possible with an architecture that scales elastically. Confluent re-engineered Apache Kafka into an elastically scalable, next-gen event streaming platform that processes real-time data wherever it lives - making it accessible for any budget or use case.

Capital One Delivers Risk Insights in Real Time with Stream Processing

Capital One supports interactions with real-time streaming transactional data using Apache Kafka®. Join us for this online talk on lessons learned, best practices and technical patterns of Capital One’s deployment of Apache Kafka.

Event Streaming and the MongoDB Kafka Connector

In this online talk, we introduce Apache Kafka® and the MongoDB connector for Kafka, and demonstrate a real world stock trading use case that joins heterogeneous data sources to find the moving average of securities using Apache Kafka and MongoDB.

Unlock Data by Connecting Confluent Cloud with Azure Cosmos DB

View this webinar with Confluent and Microsoft experts to:

  • Learn how companies are unlocking their data with Kafka
  • Understand integration strategies that you can adopt
  • See a demo of the Cosmos DB connector and how it can safely deliver data and events in real-time

Stream Processing Fundamentals: A Confluent Online Talk Series

Stream processing is a data processing technology used to collect, store, and manage continuous streams of data as it’s produced or received. Also known as event streaming or complex event processing (CEP), stream processing has grown exponentially in recent years due to its powerful...

Fundamentals Workshop: ksqlDB 101

In today’s fast-paced digital world, customers want businesses to anticipate their needs in real time. To meet these heightened expectations, organizations are using Apache Kafka®, a modern, real-time data streaming platform.

Building Real-Time, Intelligent AI Copilots with Confluent Cloud, Azure OpenAI & Azure Data Services

Join our webinar to explore how Confluent's data streaming platform, Azure data services, and Azure OpenAI deliver real-time insights and help customers conduct seamless business transactions.

Best of the Best 23 - A year in motion

As the year draws to a close, we invite you to join us for a special event that reflects on the best moments of 2023 and provides a glimpse into the future of data and innovation.

Demo: Design Event-Driven Microservices for Cloud

Join this webinar to see how the legacy of traditional implementations still impacts microservice architectures today.

How Data Streaming Elevates the Omnichannel Customer Experience

Hear how Thrivent brought mainframe customer data into a real-time streaming platform so customers get a frictionless omnichannel experience. Systems integration partner, Improving, will discuss how they helped Thrivent speed up this data transformation journey.

Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs.

How to Build a Streaming Data Mesh with Confluent

Ready to turn data mess into a data mesh? Join us to learn how to use Confluent connectors, stream processing, and Stream Governance to successfully implement the four principles of data mesh. Build high-quality data products and make them easily discoverable and accessible across your organization.

How Dollar General Leverages Streaming Pipelines for Real-Time Retail Use Cases

In this fireside chat you’ll hear from Dollar General’s Head of Merchandising and Supply Chain Engineering on why and how the organization has adopted data streaming, benefits seen so far, and tactical recommendations on how other organizations can adopt similar use cases.

Ready to break up with ZooKeeper? KRaft, you had me at hello

In this webinar, we will walk you through two product demos to ensure you’re ready for ZooKeeper-less Kafka: one on how to get started with KRaft and the second on how to migrate to KRaft from an existing deployment.

Fast, Frictionless, and Secure Integrations with Confluent’s Connector Portfolio

In this demo webinar, you’ll learn about Confluent’s connector portfolio and how it can enable seamless, reliable, and secure integrations with your source and sink data systems. We’ll show you how to set up secure networking, configure popular connectors, and leverage other productivity features.

How to Optimize your SIEM Platforms with Confluent

Learn how to modernize your SIEM architecture for higher throughput, lower latency, and more cost efficiency. You’ll also be able to run the demo and explore a series of hands-on labs for yourself and dig into the technical details.

Transforming to a Modern Tech Stack with Cloud-native Microservices and Data Streaming

Check out our webinar with McAfee, where you’ll get first-hand insight into how the cybersecurity giant replatformed its architecture to reap the benefits of real-time data streaming with a fully managed cloud service.

Open Standards for Data Lineage

Accenture & Confluent present: Open Standards for Data Lineage. Explore Data Lineage's Future!

Streaming Use Case Showcase

Join us for our monthly webinar series, "Streaming Use Case Showcase," and discover how industry leaders, customers, and partners leverage Confluent's cutting-edge technology to revolutionise IT and achieve unprecedented business outcomes.

Microservices & Apache Kafka®

This is a three-part series which introduces key concepts, use cases, and best practices for finding success with event-driven microservices. Each session is recorded so if you missed a session you’ll have a chance to watch on-demand.

Full Stream Ahead: A Kafka Summit London Recap

Get ready for an exclusive opportunity to immerse yourself in the transformative world of Kafka Summit London! Whether you're a tech enthusiast, a business leader, or simply curious about the forefront of data streaming, this series is designed for you.

Simple, Serverless Stream Processing with Confluent Cloud for Apache Flink®

In this webinar, you’ll get a detailed overview of what’s new with our fully managed Flink service. You’ll also see a technical demo that incorporates all of the latest Flink enhancements on Confluent Cloud, including Actions, Stream Lineage integration, and more.

Serverless Event Driven Architecture, Made Easier with Confluent and AWS Lambda

See how Extend pairs Confluent's data streaming platform with AWS serverless services to build scalable data applications.

Simplifying Event-Driven Architectures with Confluent and AWS Lambda

During this webinar, learn how to simplify event-driven, serverless architectures with Confluent and AWS Lambda. You'll see how to scale seamlessly, integrate AWS Lambda with Confluent, and build apps faster with Confluent’s robust connector ecosystem.

Build a Secure Shared Services Data Streaming Platform

Learn about the new key features in the Confluent Cloud Q1 2023 launch - Centralized Identity Management (OAuth), Enhanced RBAC, Client Quotas and more that enable you to build a secured shared services data streaming platform.

Set your Retail Data in Motion with Confluent and Azure

Listen to this webinar to learn how to build and deploy data pipelines faster while combining and enriching data in motion with Confluent and Azure Cosmos DB.

Eliminate your Kafka Ops burden with Confluent's fully managed Kafka service

In this webinar, Dan Rosanova, Group Product Manager at Confluent, will cover:

  • Why not all fully managed services for Kafka are created equal
  • How Confluent Cloud eliminates your Kafka Ops burden
  • How Confluent Cloud unlocks time and focus for other business needs

Stream Data with Confluent Cloud into Databricks

In this webinar, we’ll walk through how to build streaming data pipelines to Databricks across on-prem and cloud environments using Confluent and our ecosystem of pre-built connectors, with ksqlDB for real-time stream processing.

Unpacking the Cloud Migration Process with Improving

Join us for this webinar where we will discuss the challenges and benefits of a cloud migration and how our partner, Improving, can help you simplify and navigate the process with confidence.

Building a Streaming Analytics Application in 1 Hour

Watch this webinar for an opportunity to hear from the thought leaders of Kafka and Apache Druid on how Confluent Cloud and Imply Polaris enable customers to leverage the power of interactive streaming platforms to accelerate real time data analytics.

How to Build a Data Mesh in Confluent Cloud with Stream Governance

Learn how to build a data mesh on Confluent Cloud by understanding, accessing, and enriching your real-time Kafka data streams using Stream Governance. Confluent’s product team will demo the latest features and enhanced capabilities, along with showing how you can get access in a few clicks.

Maximize Streaming Data Quality with Stream Governance

Learn how to ensure high data quality, discoverability, and compatibility for your real-time data streams on Confluent Cloud using Stream Governance. Confluent’s Stream Governance team will demo the latest features and enhanced capabilities, along with showing how you can upgrade in a few clicks.

Modern Practices for Agile Data Mesh Architectures

Learn the benefits of data mesh, how to best scale your data architecture, empower real-time data governance, and best practices from experts at Confluent and Microsoft.

Kafka + Disaster Recovery: Are You Ready?

In this demo-driven webinar, learn best practices for building a resilient Kafka deployment with Confluent Platform to ensure continuous operations and automatic failover in case of an outage.

The Strategic Value of Data Streaming In Financial Services

Join Kai Waehner, Global Field CTO at Confluent, to explore the latest financial services trends and learn how data streaming helps modernize legacy architectures to enable digitalization in regulated industries.

Show Me How - Bring Your Own Connectors to Confluent Cloud

In this hands-on session you’ll learn about Custom Connectors for connecting to any data system or app without needing to manage Kafka Connect infrastructure. We’ll show you how to upload your connector plugin, configure the connector, and monitor the logs and metrics pages to ensure high performance.

How cluster scaling is made 10x easier with Confluent Cloud

Your data streaming platform needs to be truly elastic to match with customer demand. It should scale up with your business’s peak traffic, and back down as demand shrinks.

Q1 '24 Confluent Launch - Serverless Flink, Cost-effective Enterprise Clusters, Revamped Connectors, and more!

In this demo webinar, you will learn about our new Apache Flink on Confluent Cloud, the industry’s only cloud-native, serverless Flink service for stream processing alongside other new innovations on Confluent Cloud.

Data Streaming in Real Life: Logistics

Explore the state of data streaming for the logistics sector, where digital logistics and real-time capabilities are a core area of investment and data consistency is crucial!

The Top 5 Use Cases and Architectures for Real-Time Data Streaming in 2022

Learn how companies will leverage event-driven data streaming, Apache Kafka, Confluent, and other tools to meet the demands of real-time markets, increased regulations, heightened customer expectations, and much more.

Streaming Data Architectures and Use Cases

Register now to attend this informative online talk and discover Kai’s top five cutting-edge use cases and architectures that are at the forefront of real-time data streaming initiatives.

Data Mesh: From Concept to Implementation

What is data mesh and why is it gaining rapid traction among data teams?

Join us on May 17 to talk with Michele Goetz, VP, Principal Analyst at Forrester and Raiffeisen Bank International for a deep dive.

DIMT Reflections - A Data in Motion Recap

Straight from the event floor to you, the DIMT Reflections - A Data in Motion Recap discussion series showcases the best of the best content from the Data in Motion tour, giving you access to conversations and content from the best the APAC region has to offer.

Data in Motion Tour 2023 ANZ Livestream

Learn how Confluent helps you manage Apache Kafka® — without its complexity.

Join Confluent for the opportunity to hear from customers, network with your peers and ecosystem partners, learn from Kafka experts, and roll up your sleeves with interactive demonstrations.

Live on Stage 2023: From Telco to TechCo? How Telcos Are Shaping the Future of Communication with Data Streaming

This panel of industry experts discusses their everyday usage of data streaming within their companies, how they got there, and what use cases they will be focusing on further down the road to real time.

The Top Five Trends for Data Streaming in 2024

Data Streaming with Apache Kafka and Apache Flink is one of the world's most relevant and talked about paradigms in technology. With the buzz around this technology growing, join Kai Waehner, Global Field CTO at Confluent, to hear his predictions for the 'Top Five Trends for Data Streaming in 2024'.

Data Streaming in Real Life: Public Sector

Explore general trends like customer-driven in-store experiences, social & locality platforms, and how data streaming helps modernize and innovate customer experiences as well as operational efficiencies.

Data Streaming in Real Life: Healthcare

Explore the latest data streaming trends and architectures, including edge, datacenter, hybrid, and multicloud solutions.

Data Streaming in Real Life: Insurance

Explore the state of data streaming in the insurance industry, which constantly needs innovation due to changing market environments and changes in customer expectations.

Data Streaming in Real Life: Gaming

Data Streaming in Real Life: Financial Services

Explore the latest financial services trends and learn how data streaming helps modernize legacy architectures to enable digitalization in regulated industries.

Data Streaming in Real Life: Telecommunication

Explore general trends and how data streaming architectures are used, including edge, data center, hybrid, and multicloud, to help modernize and innovate the industry.

Data Streaming in Real Life: Retail

Data Streaming in Real Life: Manufacturing

Explore general trends like software-defined manufacturing and how data streaming helps modernize and innovate the entire engineering and sales lifecycle.

Data Streaming in Real Life: Digital Natives

How DISH Scaled Their Data Streaming Platform and Data Mesh to Support 5G

Learn how DISH Wireless scaled their data streaming platform to power a new smart 5G network and deliver next-gen apps and valuable network data products. Hear why DISH chose Confluent Cloud and delve deeper into their 5G architecture.

Data Streaming in Real Life: Utilities & Energy

Live on Stage 2023 with Siemens & Brose: Unlocking the Potential of Industry 4.0 and IIoT

Build a Scalable GenAI Chatbot with Amazon Bedrock and Confluent Cloud

Modernize Your Database with Confluent and Azure Cosmos DB

In this demo, we will show you how to connect on-premises and multi-cloud data to Azure Cosmos DB, process that data in a stream before it reaches Azure Cosmos DB, and connect your Azure Cosmos DB data to any application.

Show Me How: Omnichannel Analytics with Streaming Pipelines

Join us to hear how Confluent enables our customers to use real-time data processing against Apache Kafka®, leverage easy-to-use yet powerful interactive interfaces for stream processing, and build integration pipelines without needing to write code.

Show Me How: Build a Real-Time Slot Machine with Confluent Cloud

In this workshop session, you will follow along with an instructor as you walk through the design, build, and implementation process with a simple, hypothetical application using Confluent Cloud.

Show Me How: Multi-Cloud Architectures

Join us to learn how to set up Confluent Cloud to provide a singular and global data plane connecting all of your systems, applications, datastores, and environments – regardless of whether systems are running on-prem, in the cloud, or both.

Monolith to Event-driven Microservices with McAfee: Crafting the Winning Kafka Strategy

McAfee, a leader in online protection, recognized the need to transition from open-source Kafka for their cloud-native modernization effort. Learn how they drove a successful migration, secured leadership buy-in to support this shift, & discovered insights for crafting an effective Kafka strategy.

Running Kafka efficiently: Scaling streaming data pipelines in Hypergrowth FinTech

In this webinar, hear directly from KOR's CTO to learn why they chose Confluent over MSK to serve as a single source of truth for four decades’ worth of trade reporting data to stay in compliance with financial regulations.

Show Me How: Build Streaming Data Pipelines from SQL Server to MongoDB

In this hands-on session with Q&A, you’ll learn how to build streaming pipelines to connect, process, govern, and share real-time data flows for cloud databases. The demo shows how an ecommerce company uses streaming pipelines for Customer 360 and personalization.

IDC CIO Summit Fireside Chat with Confluent-Data In Motion

In a world where real-time analytics, cloud, event streaming, and Kafka are hot topics, how does “Data in Motion” come into play? What are the core ideas behind it, and why is it a big deal to companies going through digital transformation?

Fundamentals for Apache Kafka®

In this three-part series, you’ll get an overview of what Kafka is, what it's used for, and the core concepts that enable it to power a highly scalable, available and resilient real-time event streaming platform.

Elevating Kafka: Driving operational excellence with Albertsons + Forrester

Join Forrester analyst Mike Gualtieri and Albertsons Senior Director of Omni-Channel Architecture Nitin Saksena to hear about the market trends driving the adoption of data streaming and how Albertsons has implemented a plethora of real-time use cases to deliver differentiated customer experiences.

Consolidating Data Silos with Confluent and Google Cloud

In this webinar, we’ll show you how to leverage Confluent Cloud and Google Cloud Platform products such as BigQuery to streamline your data in minutes, setting your data in motion.

Launch Hybrid Applications with Confluent and Google Cloud

How do you mobilize your data securely and cost effectively to power a global business in real time?

Experience serverless stream processing with Confluent Cloud for Apache Flink®

In this webinar, you'll learn about the new open preview of Confluent Cloud for Apache Flink®, a serverless Flink service for processing data in flight. Discover how to filter, join, and enrich data streams with Flink for high-performance stream processing at any scale.

How to Build Streaming Pipelines for Cloud Databases

Maygol will walk us through the new streaming data pipeline demo: a FinServ use case showcasing streaming data pipelines from on-prem Oracle Database and RabbitMQ systems to migrate data to MongoDB in the cloud.

How Vimeo Uses Streaming Pipelines to Optimize Real-Time Experiences for 260M+ Users

Learn from Vimeo about creating better, faster real-time user experiences at massive scale. From batch ETL with a one-day delay to streaming data pipelines, learn how Vimeo unlocked real-time analytics and performance monitoring to optimize video experiences for 260M+ users.

Show Me How: Unlock Real-Time Analytics with Streaming Data Pipelines and Change Data Capture (CDC)

In this hands-on session we’ll show how to enrich customer data with real-time product, order, and demographic data every time a new order is created. You’ll learn how to connect data sources, process data streams with ksqlDB, and govern streaming pipelines with Confluent.

The Top Data Streaming Use Cases for Retail

Join the Confluent team for a webinar that delves into the most impactful data streaming use cases, leaving you clear on the value these can deliver to your business and how to get started with deploying a data streaming platform.

Fundamentals Workshop: Apache Kafka® 101

Kafka is a platform used to collect, store, and process streams of data at scale, with numerous use cases. Join us in this live, interactive session, to learn more about Apache Kafka.

New with Confluent Platform: Zookeeper Removal, Data Quality Rules & More

Zookeeper removal, Confluent for Kubernetes, governance updates, cluster linking, and more! Join this demo webinar and see our product highlights from Confluent Platform 7.x.

Single Message Transformations Are Not the Transformations You’re Looking For

The session covers the common use cases for SMTs when sending data to and from a Kafka cluster, including masking sensitive information, storing lineage data, removing unnecessary columns, and more.

Top 6 Reasons Kafka Projects Fail and How to Overcome Them

Embark on a journey to Kafka success! Join our exclusive webinar, "Top 6 Reasons Kafka Projects Fail and How to Overcome Them," on March 7, 2024, at 11 am SGT.

Real-time Cyber Defence with a Streaming SIEM

Maximize the value of your SIEM platform: make data streaming the entry point for your cyber data and deploy next-generation SIEM pipelines.

Microservices & Apache Kafka®

Application architectures are shifting from monolithic enterprise systems toward flexible, scalable, event-driven approaches. Welcome to the era of microservices.

Build Real-Time Customer Experiences with SAP Datasphere and Cloud-Native Apache Kafka®

During this demo webinar, you’ll learn about the enterprise data streaming capabilities Confluent and SAP are building together. See how you can build real-time experiences with your SAP data at a lower cost with Confluent’s fully managed data streaming platform now integrated with SAP Datasphere.

Show Me How: Streaming Data Pipelines from Your IBM Mainframe

In this hands-on session, you’ll learn how to integrate your IBM mainframe with Confluent in order to unlock Z System data for use in real-time, cloud-native applications.

The Data Streaming Platform: Key to AI Initiatives

Learn how Gen AI is transforming the way businesses approach their data strategy, offering real-time, context-driven content generation.

How DriveCentric Teamed Up with Confluent to Scale for Massive Growth

Leveraging Confluent’s fully managed, cloud-native service for Apache Kafka®, DriveCentric has been able to successfully transform and grow their business within a rapidly changing market.

Data Streaming & Airlines: A Match Made in Heaven (and on the Ground)

Explore use cases, architectures, and success stories for data streaming in the aviation industry, including airlines, airports, global distribution systems (GDS), aircraft manufacturers, and more.

Streaming & The Future: Artificial Intelligence as a Data Product

GenAI has the potential to revolutionize industries by improving efficiency, reducing costs, and providing better outcomes for individuals and businesses.

Explore the Technology Powering Today's Cutting-Edge Gaming Platforms

Demo webinar: See how real world gaming tech use cases like real-time player cheating detection are powered by Confluent’s 10x Kafka service for data streaming and AWS Lambda.

Build a Modern Database Architecture with Confluent and Amazon Web Services (AWS)

Modernize your database and move to the cloud by connecting multicloud and hybrid data to Amazon Aurora in real time.

Data Estate Modernization Best Practices: Save Time, Money, and Headaches

Learn the latest cost and time-saving data estate modernization best practices with Azure and Confluent.

Five Tips for Enabling Enterprise-Wide Data Streaming

Join Lyndon Hedderly and Burhan Nazir of Confluent as they share their expertise on deploying enterprise-wide data streaming and accelerating the speed to realising measurable business value derived from a data-streaming investment.

Breaking the Monolith: Unlocking the Power of Event-Driven Microservices

This webinar explains why event-driven microservices are an important use case for Confluent. Maygol will walk through a brand-new demo focused exclusively on an event-driven microservices use case.

How Data Mesh Can Transform Your Business

Join us in our new Confluent Illustrated webinar as we present the fundamental aspects of data mesh and how to best put it into practice.

Data Mesh and Data in Motion Theory and Practice

Experience a groundbreaking discussion led by James Golan, a solutions engineering expert at Confluent. In this comprehensive session, delve into the core concepts of data in motion, where data powers digital experiences and businesses alike.

How a Data Streaming Platform Makes Your Broader Data Strategy Successful

This 35-minute webinar is an overview of the Gartner Presentation How Data Streaming Makes Your Broader Data Strategy Successful.

It provides a high-level overview of Greg’s presentation, with some added commentary on what is being discussed at the summit.

Real-Time Banking Beyond Boundaries: Unveiling the Future with Alex Bank

Join us for a captivating fireside chat with John Heaton, the visionary CTO of Alex Bank, as he reveals the transformative power of cutting-edge technology in banking.

Discover how Homepoint uses Confluent and Azure to Speed up Loan Processes

How to Unleash the Power of Data with SAP® and Kafka®

A 'how to' webinar in which Rojo outlines how to optimise the use of Apache Kafka in your SAP integration initiatives.

USPS at Current

Pritha Mehra, CIO of the United States Postal Service, spoke with Confluent co-founder Jun Rao, describing how the postal service leveraged data streaming to send free COVID-19 test kits to all Americans at the height of the pandemic.

Federal News Network: How Agencies Can Make Data an Active Asset to Drive Better Outcomes

Video with Jason Schick: The strategy emphasizes the need for enterprise-wide data standards and coordination of data use across agencies, as well as using data to inform annual budget planning.

Creating a Data Culture to Deliver on Mission

Hear our esteemed panel of experts address how to leverage information as a strategic asset.

The New Cyber: Faster, Better, Cheaper

Government agencies understand the need to augment traditional SIEM systems. And, with this knowledge comes the pressure to do so in a way that is better, faster, and cheaper than before.

Data in Motion & Apache Kafka® Use Cases for the Defense Industry

Join Kai Waehner, Field CTO at Confluent, for an online talk in which he will explore the latest data in motion & Apache Kafka® use cases for the defence industry.

Real-time Streaming for Government and the Public Sector

Data streaming is an infrastructure revolution that is fundamentally changing how public sector organisations think about data and build applications. Rather than being viewed as stored records or transient messages, data can be considered a continually updating stream of events.

Deliver Better Product Recommendations with Real-Time AI & Vector Search

Join this demo webinar to see how Confluent and Rockset power a critical architecture for efficiently developing and scaling AI applications built on real-time streaming data.

Build Streaming Pipelines with Change Data Capture for Data Warehouses

From batch to real time—learn about and see a demo on how to build streaming pipelines with CDC to stream and process data in real time, from an on-prem Oracle DB and cloud PostgreSQL to Snowflake.

Full Stream Ahead - Live from Current 2023

Join us over 3 days from September 26th to 28th for the "Full Stream Ahead - Live from Current 2023" webinar series.

This webinar will walk through the story of a bank that uses an Oracle database to store sensitive customer information and RabbitMQ as the message broker for credit card transaction events.

Australian State Government - Data Streaming Forum

Data streaming is an infrastructure revolution that is fundamentally changing how Departments and Agencies think about data and build applications.

Learn How to Combat Financial Fraud in Real Time with Confluent and AWS

In this webinar, learn how Confluent and AWS can help your company detect and combat financial fraud. Confluent's cloud-native data streaming platform gathers and analyzes transactional and event data in real time to prevent fraud, reduce losses, and protect your business from threats.

Build a Real-Time Observability Platform in the Cloud

Demo webinar: Build a real-time analytics app to query and visualize critical observability metrics including latencies, error rates, and overall service health status. See how it’s done with Confluent Cloud and Imply Polaris, a fully managed Apache Druid® service.

How to Expedite Your Microservices Development with Stream Governance

Data governance is critical, but how do you govern data streams in real-time? Learn how ACERTUS drove faster microservices development, unlocked streaming data pipelines and real-time data integration across 4K+ schemas using Confluent’s Stream Governance.

How to Build Your Native Integration with Confluent Cloud and Grow Through Data Streams

Partner webinar: Meet with Confluent’s Kafka experts to build your step-by-step plan for integrating with Confluent Cloud and accelerating customer growth on your platform through real-time data streams.

How EVO Banco Transformed Fraud Detection with Data Streaming and Machine Learning

Join us for a unique insider look into the complex world of fraud mitigation in banking. In this session, you will learn how Spain’s leading digital bank, Evo Banco, is paving the way in predictive fraud detection with data streaming and machine learning.

Why Your Fraud Detection and Security Tools Need Data Streaming (India)

With fraud growing at exponential rates and costing financial firms billions of dollars in losses, we take a deeper look into the role timely data and real-time context plays in risk analytics and fraud mitigation.

Data Streaming Report – A Conversation with Jay Kreps

Data serves as the lifeblood of modern businesses, propelling them toward unprecedented success. With data volumes growing at a breakneck speed, the ability to harness its power in real time has become the key to driving greater business insight and accelerating innovation.

Why Your Fraud Detection and Security Tools Need Data Streaming

In this webinar, we’ll be joined by guest speakers from Forrester and Scotiabank to discuss how real-time data streaming is advancing fraud detection and driving smarter decision-making across the Financial Services industry.

What's New in Confluent Platform 7.4

Confluent Platform 7.4 enables Apache Kafka® to scale to millions of partitions. It simplifies architectures and accelerates time to market with self-service tooling and codified best practices for developers, ensuring consistent and accurate data.

Retain Data Infinitely Without Operational Burdens with 10x More Scalable Kafka Storage

Join this webinar to learn how Confluent Cloud relieves these operational burdens with infinite storage for Kafka that’s 10x more scalable and high-performing.

Microservices and Apache Kafka

This three-part online talk series introduces key concepts, use cases, and best practices for getting started with microservices.

Real-time ETL Made Easy with Confluent

Real-time ETL with Apache Kafka® doesn’t have to be a challenge. Join this webinar and see how, with Confluent Cloud, out-of-the-box source and sink connectors and SQL-based stream processing come fully managed on a complete platform for data in motion.

Demo: How to Build a Context-Aware, Real-Time Fraud Detection Solution in Confluent

In this demo, you will learn how to transform your existing fraud tools with the power of real time data streaming and processing in Confluent. See how to easily connect, curate and integrate relevant data into your fraud systems to build a faster, smarter fraud detection solution.

In this webinar you will hear from industry experts on how real-time data streaming is advancing fraud detection and driving smarter decision-making across the Financial Services industry.

Connect, Process, and Share Trusted Data Faster than Ever

This demo webinar will provide everything you need to get started with the latest updates to Confluent Cloud, our cloud-native data streaming platform.

How Walmart Made Real-Time Inventory & Replenishment a Reality

This fireside chat will cover Suman’s learnings from implementing 2 critical use cases at Walmart that continue to play a critical role in customer satisfaction: real-time inventory and real-time replenishment.

Show Me How: Build Streaming Data Pipelines for Cloud Databases

In this hands-on session with Q&A, you’ll learn how to build streaming data pipelines to connect, process, and govern real-time data flows for cloud databases. The demo shows a FinServ company using streaming pipelines for real-time fraud detection.

Confluent Cloud: Data Warehousing with Amazon Redshift

In this webinar, we’ll walk through how you can start immediately migrating to Amazon Redshift across on-prem and cloud environments using Confluent, our ecosystem of pre-built connectors, and ksqlDB for real-time data processing.

Application Modernization Through Event Driven Microservices

During this webinar, Rishi Doerga, Senior Solutions Engineer at Confluent, discusses how event streaming can help modernize your applications, enabling you to become more agile, innovative, and responsive to your customer's needs.

Partner Tech Talk: Confluent meets Starburst feat. BearingPoint

Partner Tech Talks are webinars where subject matter experts from a partner talk about a specific use case or project. The goal of Tech Talks is to provide best practices and application insights, along with inspiration, and to help you stay up to date on innovations in the Confluent ecosystem.

AWS ❤️ Confluent: The Power of Real-Time

Watch this webinar and transform your data pipeline processes.

Show Me How: Build Streaming Data Pipelines for Real-Time Data Warehousing

In this hands-on session with live Q&A, you’ll learn how to build streaming data pipelines to connect, process, and govern real-time data flows for data warehouses. The demo shows an e-commerce company using streaming pipelines for customer 360.

Streaming Data Pipelines for Automatic Vendor Detection in 5 Steps

This webinar covers the operational use case and learnings from SecurityScorecard’s journey from batch to building streaming data pipelines with Confluent.

Customers expect businesses to respond to both their implicit and explicit cues instantaneously. Gone are the days when a business could run a batch process overnight to analyze customer orders, preferences, app downloads, page views, and clicks. They must now respond in real time.

Enhance your Data Infrastructure with Confluent Cloud and Microsoft Azure

In this session, we'll explore how Confluent helps companies modernize their database strategy with Confluent Cloud and modern Azure Data Services like Cosmos DB. Confluent accelerates getting data to the cloud and reduces costs by implementing a central-pipeline architecture using Apache Kafka.

Modernize your real-time data infrastructure with Confluent and Azure

Migrating, innovating, or building in the cloud requires retailers to rethink their data infrastructure. Confluent and Azure enable companies to set data in motion across any system, at any scale, in near real-time.

Show Me How: Confluent Makes Kappa Architecture a Reality

Confluent Infinite Storage allows you to store data in your Kafka cluster indefinitely, opening up new use cases and simplifying your architecture. This hands-on workshop will show you how to achieve real-time and historical processing with a single data streaming platform.

Apache Kafka®, Confluent, and the Data Mesh

Watch this webinar to find out how a data mesh can bring much-needed order to a system in both cases, resulting in a more mature, manageable, and evolvable data architecture.

How ACERTUS Migrated from a Monolith to Microservices with ksqlDB

Learn how ACERTUS leverages Confluent Cloud and ksqlDB for their streaming data pipelines, data pre-processing and transformations, data warehouse modernization, and their latest data mesh framework project.

Show Me How: Trusted Shared Services

In this hands-on session you’ll learn about building trusted shared services with Confluent—a better way to allow safe and secure data sharing. We’ll show you how to enable trusted shared services through OAuth 2.0, role-based access control, and Cloud Client Quotas.

(On-Demand) Show Me How: Confluent Makes Kappa Architecture a Reality

Streaming Time Series Data with Apache Kafka and MongoDB

Learn how Apache Kafka® on Confluent Cloud streams massive data volumes to time series collections via the MongoDB Connector for Apache Kafka®.

Build a modern database architecture with Confluent and Google

In this session, you’ll learn how to accelerate your digital transformation using real-time data.

Stream Designer - build Apache Kafka® pipelines visually

Confluent’s Stream Designer is a new visual canvas for rapidly building, testing, and deploying streaming data pipelines powered by Kafka.

10 Ways Data Streaming Transforms Financial Services

Every aspect of the financial services industry is undergoing some form of transformation. By leveraging the power of real-time data streaming, financial firms can drive personalized customer experiences, proactively mitigate cyber risk, and drive regulatory compliance.

How Vitality Group future-proofed its event-driven microservices with Confluent and AWS

Join Ryan James, Chief Data Officer of Vitality Group, to learn how Vitality Group future-proofed its event-driven microservices with Confluent and AWS.

Modernizing your data warehouse doesn’t need to be long or complicated. In this webinar, we’ll walk through how you can start migrating to Databricks immediately across on-prem and cloud environments using Confluent, our ecosystem of pre-built connectors, and ksqlDB for real-time data processing.

Demo: How to use Confluent for streaming data pipelines

This demo will showcase how to use Confluent as a streaming data pipeline between operational databases. We’ll walk through an example of how to connect data and capture change data in real-time from a legacy database such as Oracle to a modern cloud-native database like MongoDB using Confluent.

Show Me How: Building Streaming Data Pipelines for Real-time Data Warehousing

In this hands-on session you’ll learn about Streaming Data Pipelines which is a better way to build real-time pipelines. The demo shows how an e-commerce company can use a Streaming Data Pipeline for real-time data warehousing.

Confluent Cloud: Managing your environments like a pro

Listen to this webinar to learn how Confluent Cloud enables your developer community.

Confluent Cloud 101 Series

In fast-moving businesses, companies must quickly integrate Kafka into their workloads to respond to customers in real time. Confluent Cloud is a fully managed data streaming platform available everywhere you need it. Join us in these interactive sessions to learn more about Confluent Cloud.

Building an Event-Driven Architecture? Here’s 5 Ways to Get It Done Right

Tune in to discover how you can avoid common mistakes that could set back your event-driven ambitions—and how Confluent’s fully managed platform can help get you where you need to be faster and with fewer operational headaches.

Data in Motion Tour 2022 - EMEA

Listen back and view the presentations from the Data in Motion Tour 2022 - EMEA.

The Top Five Use Cases & Architectures for Data In Motion in 2023

Data Streaming and Apache Kafka® are two of the world's most relevant and talked about technologies. With the buzz continuing to grow, join this webinar to hear predictions for the 'Top Five Use Cases & Architectures for Data In Motion in 2023'.

Show Me How: Optimize Your SIEM Infrastructure with Confluent

In this hands-on workshop, we’ll show you how to augment your existing SIEM and SOAR solutions to deliver contextually rich data, automate and orchestrate threat detection, reduce false positives, and transform the way you respond to threats and cyber attacks in real time.

Data Streaming, Sharing, and Integrity with Stream Governance

Join this demo to see how Confluent’s Stream Governance suite delivers a self-service experience that helps all your teams put data streams to work.

Welcome to the Streaming Era - A current conversation with Jay Kreps

Join Jay Kreps for a deep dive into data streaming and real-time technology, share best practices and use cases, as well as explore the vision and future of data streaming. Data streaming is foundational to next-gen architecture.

Data in Motion - Apache Kafka® Use Cases for Financial Services

Demand for fast results and decision-making has driven financial institutions to adopt real-time event streaming and data processing to stay on the competitive edge. Apache Kafka® and the Confluent Platform are designed to solve the problems associated with traditional systems.

Live on Stage 2022: Telco Networks and Data Mesh becomes one

Network Analytics in a Big Data World: Why telco networks and data mesh need to become one (and how data streaming can save the internet) with Swisscom, NTT, INSA Lyon, and Imply in a panel discussion with Field CTO Kai Waehner.

Developers can focus on building new features and applications, liberated from the operational burden of managing their own Kafka clusters. Join us in these interactive sessions to learn more about Confluent Cloud.

Wix's Migration to Confluent

Join Noam Berman, Software Engineer at Wix, for an insight-packed webinar in which he discusses Wix's growing use of Apache Kafka® in recent years.

Fundamentals Workshop: Kafka Streams 101

Kafka Streams transforms your streams of data, be it a stateless transformation like masking personally identifiable information, or a complex stateful operation like aggregating across time windows, or a lookup table.
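
To make the stateless case concrete, here is a minimal, hedged sketch of a Kafka Streams application in Java (the topic names, application ID, and masking rule are illustrative assumptions, not taken from the workshop): it reads a stream, applies a stateless mapValues transformation to mask email-like tokens, and writes the result to another topic.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MaskPiiApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "mask-pii-demo");            // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");                  // hypothetical input topic
        orders.mapValues(value -> value.replaceAll("\\S+@\\S+", "***"))             // stateless masking of email-like tokens
              .to("orders-masked");                                                 // hypothetical output topic

        new KafkaStreams(builder.build(), props).start();
    }
}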

We will discuss Confluent’s applicability to SIEM and show an end-to-end demo of Confluent and Confluent Sigma, an open source project built by Confluent for processing streams of SIEM data, in action, showing how to bridge the gap between old-school SIEM solutions and a next-gen architecture.

Real-time inventory and Real-time replenishment use-cases at Walmart, from a practitioner's perspective

Tune in to hear Suman share best practices for building real-time use-cases in retail!

Bank Transformation at Scale: The event-driven architecture of Raiffeisenbank International in 13 countries

Raiffeisen Bank International is scaling an event-driven architecture across the group as part of a bank-wide transformation program. As a technology and architecture leader, RBI plays a key role in banking in CEE, which will be shared with the audience in this webinar.

This webinar will walk through the story of a bank that uses an Oracle database to store sensitive customer information and RabbitMQ as the message broker for credit card transaction events.

Confluent x AWS Cloud Forum: How SOCAR Building Real-Time Personalized Promotion & Data Pipeline with Confluent Cloud

Confluent Cloud alleviates the burden of managing Apache Kafka, Schema Registry, Connect, and ksqlDB so teams can effectively focus on modern app development and deliver immediate value with your real-time use cases.

Getting started with Apache Kafka and Real-Time Data Streaming

This two part series provides an overview of what Kafka is, what it's used for, and the core concepts that enable it to power a highly scalable, available and resilient real-time event streaming platform.

Confluent Platform Demo – Hybrid Deployment to Confluent Cloud

Many organizations have data locked away in their on-premises data center, making it impossible to take full advantage of modern cloud services. In this session, you’ll learn how to stream your on-prem data to the cloud in near real time.

Confluent Terraform provider, Independent Network Lifecycle Management, and More

This demo webinar will provide you with everything you need to get started with the latest updates to our cloud-native data streaming platform, Confluent Cloud.

Event-Driven Architectures Done Right

Kafka is now a technology developers and architects are adopting with enthusiasm. And it’s often not just a good choice, but a technology enabling meaningful improvements in complex, evolvable systems that need to respond to the world in real time. But surely it’s possible to do wrong!

The Fundamentals of a Successful Data Mesh

This webinar offers suggestions for best practices, the kinds of tools you’ll need, and how to get your organization started down a path toward a data mesh.

Confluent Developer Live Workshop

We’ve got an exciting line up of sessions designed to get you up to speed on all things Confluent Cloud! You’re sure to gain invaluable insights, no matter how many you’re able to join.

Stream Governance: Eliminate Compliance Headaches for Your Real-Time Data

Confluent’s Stream Governance suite establishes trust in the real-time data moving throughout your business and delivers an easy, self-service experience for multiple teams to discover, understand, and put these streams to work.

Data in Motion Use Cases for Financial Services

The demand for fast results and decision-making has driven financial institutions to adopt real-time event streaming and data processing in order to stay on the competitive edge.

Data in Motion Tour 2021 - EMEA

Listen back and view the presentations from the Data in Motion Tour 2021 - EMEA.

Data in Motion: Driving Business Value Using Real-Time Data and Insights

Providing a seamless digital experience is what our customers have come to expect. As we step into 2022, our businesses are being challenged to evolve even further to maintain that competitive edge.

The Top Five Use Cases & Architectures for Data In Motion in 2022

Kai Waehner, Field CTO at Confluent, will deliver his predictions on the hottest and most important data in motion use cases for 2022.

Set Your Data in Motion - CTO Roundtable

Confluent hosted a technical thought leadership session to discuss how leading organisations move to real-time architecture to support business growth and enhance customer experience.

The Strategic Importance of Data in Motion

To help organisations understand how data in motion can transform business, watch ‘The Strategic Importance of Data in Motion’, hosted by Tech UK.

Dashing Off A Dashboard with Apache Kafka

In this online talk, we will answer the question, 'How much can we do with Kafka in 30 minutes of coding?'

Modernize Your Data Warehouse with Confluent and Azure

In this webinar, see how Confluent’s data warehouse modernization solution leverages the Azure Synapse connector to help enterprises create a bridge across your Azure cloud and on-prem environments. We’ll explain how the solution works, and show you a demo!

Benefits, Architecture, and Use Cases with Apache Kafka

Apache Kafka® was built with the vision to become the central nervous system that makes real-time data available to all the applications that need to use it, with numerous use cases like stock trading and fraud detection, and real-time analytics.

Build Fully Managed Data Pipelines

Today’s data sources are fast-moving and dispersed, which can leave businesses and engineers struggling to deliver data and applications in real-time. While this can be hard, we know it doesn’t have to be - because we’ve already made it easy.

Introducing Confluent Platform 7.0 and Cluster Linking

Learn more about Confluent Platform 7.0 and how Cluster Linking enables you to leverage modern cloud-based platforms and build hybrid architectures with a secure, reliable, and cost-effective bridge.

How to Deploy a Kafka Cluster in Production onto Your Desk or Anywhere

Hivecell and Confluent deliver on the promise of bringing a piece of Confluent Cloud right to your desk, delivering managed Kafka at the edge for the first time at scale.

Data in Motion Use Cases for Retail and eCommerce

Retailers that have embraced the opportunities of the prolonged pandemic are emerging leaner and stronger than before. Hear Lawrence Stoker, Senior Solutions Engineer at Confluent, walk through the data in motion use cases that are re-inventing the retail business.

BT's Journey to Being an Event-Driven Organisation

Listen to this on-demand online talk to hear how BT’s digital strategy is helping it become an event-driven organisation.

Harnessing the Power of Real-Time Fraud Detection in the Ever Changing World

The world is changing! Organisations are now more globally integrated than ever before and new problems need to be solved. As systems scale and migrate into the cloud, those seeking to infiltrate enterprise systems are presented with new and more frequent opportunities to succeed.

Put out Apache Kafka® fires faster with Health+ by Confluent

We invite you to join Jesse Miller, our lead Product Manager for Health+, in an upcoming webinar to learn about how Health+ can optimize your deployment, give you the highest level of monitoring visibility, and provide intelligent alerts and accelerated support when you need it.

Save $2.5M on Kafka when you switch to Confluent Cloud

Forrester recently released a Total Economic Impact report that identified $2.5M+ in savings, a 257% ROI, and <6 month payback for organizations that used Confluent Cloud instead of Open Source Apache Kafka.

Data Streaming & SIEM Modernisation: Secure your enterprise in real-time & reduce your SIEM costs

This webinar explores use cases and architectures for Kafka in the cybersecurity space, also featuring a very prominent example of combining Confluent and Splunk with Intel’s Cyber Intelligence Platform (CIP).

How Sencrop is powering smart agriculture with real time analytics at the edge

In Sencrop’s case, working with IoT at the “edge” means collecting and providing accurate data from the farm fields. Find out how AWS and Confluent Cloud are powering this real-time processing of data for anomaly detection and weather prediction.

Serverless stream processing for Financial Services

In this session, we'll explore how to build serverless, event-driven architectures by using AWS Lambda with Kafka. We'll discuss how event-based compute like Lambda can be used to decrease the complexity of running, scaling, and operating stream-based architectures when building new applications.

How Storyblocks Built a New Class of Event-Driven Microservices with Confluent

We’ll discuss the challenges Storyblocks attempted to overcome with their monolithic apps and REST API architecture as the business grew rapidly, and the advantages they gained from using Confluent Event-driven architecture to power their mission-critical microservices.

Confluent, Apache Kafka made easier for digital native and beyond

In this session we will discuss how Apache Kafka has become the de facto standard for event driven architecture, its community support and the scale at which some customers are running it.

Modernize your Database with Confluent and Google CloudSQL

In this demo, we’ll show you how to modernize your database and move to the cloud by connecting multi-cloud and hybrid data to Google Cloud SQL in real time.

Modernize your Database with Confluent and Amazon Aurora

In this demo, we’ll walk through how you can start building a persistent pipeline for continuous migration from a legacy database to a modern, cloud database. You’ll see how to use Confluent and Amazon Aurora to create a bridge across your Amazon cloud and on-prem environments.

RBAC at scale, fully managed Oracle CDC Source Premium Connector, and more

This demo webinar will provide you with everything you need to get started with the latest capabilities of our cloud-native data streaming platform, Confluent Cloud.

The rise of Data in Motion: Serverless and Cloud-Native Event Streaming with AWS on Confluent Cloud

The Cloud - as we all know - offers the perfect solution to many challenges. Many organisations are already using fully-managed cloud services such as AWS S3, DynamoDB, or Redshift. This creates an opportunity to implement fully-managed Kafka with ease using Confluent Cloud on AWS.

Set your Data in Motion with Confluent on AWS, a Howdy Partner Twitch Episode

Join Joseph Morais, Staff Cloud Partner SA, and Braeden Quirante, Cloud Partner SA at Confluent as they discuss Apache Kafka and Confluent.

Migrate and Modernize Your Data Warehouse with Confluent and Databricks

Join us for this webinar to see how Confluent and Databricks enable companies to set data in motion across any system, at any scale, in near real-time.

Demo: Set data in motion with Confluent’s cloud-native service for Apache Kafka®

In this 30-minute session, top Kafka experts will show you everything needed to quickly get started with real-time data movement, ranging from on-demand cluster creation and data generation through to real-time stream processing and account management.

How to Secure Data Streams Using Confluent

This demo webinar will show you how Confluent is the world’s most trusted data streaming platform, with resilience, security, compliance, and privacy built-in by default.

Connect and Modernize Your Data with Confluent and Google Cloud

Leverage Confluent Cloud and Google Cloud Platform products such as BigQuery to modernize your data in minutes, setting your data in motion.

Simplified real-time data integrations, massive throughput elasticity, and more!

This webinar will provide you with everything you need to get started with all the latest capabilities available on our cloud-native data streaming platform, Confluent Cloud.

From Batch to Real-time: How to Get Started with Stream Processing

Interested in bringing stream processing to your organization, but unclear on how to get started? Designed to help you go from idea to proof of concept, this online talk dives into a few of the most popular stream processing use cases and workloads to help get you up and running with ksqlDB.

Innovate Faster with a Real-time Hybrid and Multicloud Data Architecture

This webinar will address the problems with current approaches and show you how you can leverage Confluent’s platform for data in motion to make your data architecture fast, cost-effective, resilient, and secure.

Build your real-time bridge to the cloud across hybrid environments with Confluent Platform 7.0

In this webinar, we'll introduce you to Confluent Platform 7.0, which offers Cluster Linking to enable you to leverage modern cloud-based platforms and build hybrid architectures with a secure, reliable, and cost-effective bridge between on-prem and cloud environments.

Set Your Data in Motion with Confluent on AWS

Learn how to break data silos and accelerate time to market for new applications by connecting valuable data from your existing systems on-prem to your AWS environment using Confluent.

Migrate and modernize your data warehouse with Amazon Web Services (AWS) and Confluent

Today, with Confluent, enterprises can stream data across hybrid and multicloud environments to Amazon Redshift, powering real-time analysis while reducing total cost of ownership and time to value.

Ensure Kafka system health and minimize business disruption with Confluent Platform 6.2 and Health+

In this webinar, we'll introduce you to Confluent Platform 6.2, which offers Health+, a new feature that includes intelligent alerting and cloud-based monitoring tools to reduce the risk of downtime, streamline troubleshooting, surface key metrics, and accelerate issue resolution.

Sketch the Highest Impact Apache Kafka PoC with Confluent

This webinar presents the decision making framework we use to coach our customers toward the most impactful and lowest cost PoC built on Kafka. The framework considers business impact, technology learning, existing resources, technical backgrounds, and cost to ensure the greatest chance of success.

Kafka Streams, a scalable stream processing client library in Apache Kafka, defines the processing logic as read-process-write cycles in which all processing state updates and result outputs are captured as log appends.
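
As a rough illustration of that read-process-write model (topic names and the application ID below are hypothetical, not drawn from any specific talk), the Java sketch below reads a stream, counts events per key, and writes the counts back to Kafka; the count’s state store is backed by a changelog topic, so both the state updates and the output records ultimately land as log appends.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class PageViewCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counts");         // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> views = builder.stream("page-views");               // read: hypothetical input topic
        views.groupByKey()
             .count()                                                               // process: state updates are appended to a changelog topic
             .toStream()
             .to("page-view-counts-by-key",                                         // write: hypothetical output topic
                 Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}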

Cluster Linking: How to Seamlessly Share Data Across Environments

Learn how Confluent Cluster Linking can seamlessly integrate and share data across these environments in real-time by leveraging your current Confluent/Apache Kafka deployments.

Digital Transformation Starts by Setting Your Data in Motion

Hear how Fortune 500 companies and leading technology providers are driving real-time innovation through the power of data in motion to deliver richer customer experiences and automate backend operations.

On Demand Demo: Intro to Event-Driven Microservices with Confluent

In this short, 20-minute session you’ll gain everything you need to get started with development of your first app based upon event-driven microservices.

Migrate and modernize your data warehouse with Google Cloud and Confluent

Today, with Confluent, enterprises can stream data across hybrid and multicloud environments to Google Cloud’s BigQuery, powering real-time analysis while reducing total cost of ownership and time to value.

Event Streaming and the Rise of Apache Kafka

In this Online Talk you will learn:

  • The rise of the streaming transformation (companies going digital, real time)
  • The resulting rise of Kafka as the technology that powers the streaming transformation
  • How streaming relates to other major technology trends like cloud and mobile

On Demand Demo: Develop a Streaming ETL pipeline from MongoDB to Snowflake with Apache Kafka®

During this session you’ll see a pipeline built with data extraction from MongoDB Atlas, real-time transformation with ksqlDB, and simple loading into Snowflake.

Extracting value from IoT data with Azure and Confluent Cloud

This webinar presents a solution using Confluent Cloud on Azure, Azure Cosmos DB and Azure Synapse Analytics which can be connected in a secure way within Azure VNET using Azure Private link configured on Kafka clusters.

Connecting Data Applications and Transactions to Kafka in Real Time

Watch this webinar to hear more about how Generali, Skechers and Conrad Electronics are using Qlik and Confluent to increase Kafka’s value.

Upgrading from Apache Kafka® to Confluent

This webinar will cover how you can protect your Kafka use cases with enterprise-grade security, reduce your Kafka operational burden and instead focus on building real-time apps that drive your business forward, and pursue hybrid and multi-cloud architectures with a data platform.

Bring the power of Kafka and data in motion to your business: How to get started

Establish event streaming as the central nervous system of your entire business, perhaps starting with a single use case and eventually architecting a system around event-driven microservices or delivering net-new capabilities like streaming ETL or a comprehensive customer 360.

Demo Series: Learn the Confluent Q3 ‘21 Release

Learn how teams around the world continue building innovative, mission-critical applications fueled by data in motion. This 4-part webinar series will provide you with bite-sized tutorials for how to get started with all the latest capabilities available on the platform.

Build Fully Managed Data Pipelines with MongoDB Atlas and Confluent

With Confluent, you can start streaming data into MongoDB Atlas in just a few easy clicks. Learn how to bring real-time capabilities to your business and applications by setting data in motion.

Discover how to Accelerate App Innovation on Azure with Confluent Managed Apache Kafka

Watch this session to learn how to streamline infrastructure, increase development velocity, unveil new use cases, and analyze data in real-time.

Confluent x Imply: Build the Last Mile to Value for Data Streaming Applications

Join Confluent and Imply at this joint webinar to explore and learn the use cases about how Apache Kafka® integrates with Imply to bring in data-in-motion and real-time analytics to life.

Building Value - Understanding the TCO and ROI of Apache Kafka & Confluent

In this presentation, Lyndon Hedderly, Team Lead of Business Value Consulting at Confluent, will cover how Confluent works with customers to measure the business value of data streaming.

Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing It Yourself

By shifting to a fully managed, cloud-native service for Kafka, you can unlock your teams to work on the projects that make the best use of your data in motion.

Confluent Streaming Event Series - Europe 2020

Listen back and view the presentations from the Confluent Streaming Event Series in Europe 2020.

Streaming: Back to Fundamentals (French)

By the end of this talk you will be able to: • Put words to the fundamental difficulties of our systems • Determine the role of streaming in your architectures • Present concrete use cases for streaming

Save time, save money, reduce risks: Connect your SAP® Systems with Confluent

The ASAPIO Connector for Confluent allows true application-based change data capture, along with full database access. This webinar will showcase a SAP- and Confluent-certified solution to enable real-time event streaming for on-prem SAP data.

Cloud-Native Kafka: Simplicity, Scale, Speed & Savings with Confluent

Learn about the benefits of leveraging a cloud-native service for Kafka, and how you can lower your total cost of ownership (TCO) by 60% with Confluent Cloud while streamlining your DevOps efforts. Priya Shivakumar, Head of Product, Confluent Cloud, will share two short demos.

Choosing Christmas Movies with Kubernetes, Spring Boot, and Kafka Streams

Hands-on workshop: Using Kubernetes, Spring Boot, Kafka Streams, and Confluent Cloud to rate Christmas movies.

New Approaches for Fraud Detection on Apache Kafka and KSQL

Modern streaming data technologies like Apache Kafka® and Confluent KSQL, the streaming SQL engine for Apache Kafka, can help companies catch and detect fraud in real time instead of after the fact.

Stand up to the Dinosaur: Mainframe Integration, Offloading and Replacement with Apache Kafka

Mainframe offloading with Apache Kafka and its ecosystem can be used to keep a more modern data store in real-time sync with the mainframe. At the same time, it persists the event data on the bus to enable microservices and delivers the data to other systems such as data warehouses and search indexes.

Real-Time Data Streaming in the Insurance Industry

Learn how Generali Switzerland set up an event-driven architecture to support their digital transformation project.

A Practical Guide to Selecting a Stream Processing Technology

In this talk, we survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka®. Part 5 in the Apache Kafka: Online Talk Series.

Large-Scale Stream Processing with Apache Kafka - 50:46

Neha Narkhede explains how Apache Kafka was designed to support capturing and processing distributed data streams by building up the basic primitives needed for a stream processing system.

Building a Streaming ETL Pipeline with Apache Kafka® and KSQL

In this talk, we'll build a streaming data pipeline using nothing but our bare hands, the Kafka Connect API and KSQL.

Bridge to Cloud: Using Apache Kafka to Migrate to GCP

In this session, we will share how companies around the world are using Confluent Cloud, a fully managed Apache Kafka® service, to migrate to GCP.

The State of Stream Processing (Gwen Shapira, Confluent) | Confluent Streaming Event - Paris 2018

In this talk Gwen Shapira will break through the clutter and look at how successful companies are adopting centralized streaming platforms, and the use-cases and methodologies that we see practiced right now.

Stream me to the Cloud (and back) with Confluent & MongoDB

In this online talk, we’ll explore how and why companies are leveraging Confluent and MongoDB to modernize their architecture and leverage the scalability of the cloud and the velocity of streaming.

Streaming Microservices: Contracts & Compatibility

Recording from QCon New York 2017: Gwen Shapira discusses patterns of schema design, schema storage, and schema evolution that help development teams build better contracts through better collaboration and deliver resilient applications faster.

Building Microservices with Apache Kafka®

In this talk, I'll describe some of the design tradeoffs when building microservices, and how Apache Kafka's powerful abstractions can help.

Aligning Data Governance Initiatives with Business Objectives

There’s a prevailing enterprise perception that compliance with data protection regulations and standards is a burden: limiting the leverage of data.

Deep Dive into Apache Kafka®

In this talk by Jun Rao, co-creator of Apache Kafka®, get a deep dive on some of the key internals that makes Apache Kafka popular, including how it delivers reliability and compaction. Part 2 in the Apache Kafka: Online Talk Series.

Deploying Confluent Platform in Production

In this talk, Gwen Shapira describes the reference architecture of Confluent Enterprise, which is the most complete platform to build enterprise-scale streaming pipelines using Apache Kafka®. Part 1 in the Best Practices for Apache Kafka in Production Series.

Get answers to: How would you use Apache Kafka® in a microservices application? How do you build services over a distributed log and leverage the fault tolerance and scalability that comes with it?

How Leading Companies Are Adopting Streaming Strategies

With the evolution of data-driven strategies, event-based business models are influential in innovative organizations.

SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®

This session shows how various sub-systems in Apache Kafka can be used to aggregate, integrate and attribute these signals into signatures of interest.

Modernizing Your Application Architecture with Microservices

Learn from field experts as they discuss how to convert the data locked in traditional databases into event streams using HVR and Apache Kafka®.

Swiss Mobiliar: How Apache Kafka helps to create Data Culture

In this webinar we want to share our experience on how the Swiss Mobiliar, the biggest Swiss household insurance enterprise, introduced Kafka and led it to enterprise-wide adoption with the help of AGOORA.com.

On Track with Apache Kafka: Building a Streaming ETL Solution with Rail Data

This talk takes an in-depth look at how Apache Kafka® can be used to provide a common platform on which to build data infrastructure driving both real-time analytics as well as event-driven applications.

Connecting Apache Kafka® to Cash

Real-time data has value. But how do you quantify that value? This talk explores why valuing Kafka is important and covers some of the problems in quantifying the value of a data infrastructure platform.

Building a Secure, Tamper-Proof & Scalable Blockchain on Top of Apache Kafka - Introduction to AiB's KafkaBlockchain

Apache Kafka is an open source event streaming platform. It is often used to complement or even replace existing middleware to integrate applications and build microservice architectures. Apache Kafka is already used in various projects in almost every bigger company today. Understood, battle-tested, highly scalable, reliable, real-time.

Blockchain is a different story. This technology is in the news a lot, especially related to cryptocurrencies like Bitcoin. But what is the added value for software architectures? Is blockchain just hype that adds complexity? Or will it be used by everybody in the future, like a web browser or mobile app today? And how is it related to an integration architecture and event streaming platform?

This session explores use cases for blockchains and discusses different alternatives such as Hyperledger, Ethereum, and a Kafka-native tamper-proof blockchain implementation. Different architectures are discussed to understand when blockchain really adds value and how it can be combined with the Apache Kafka ecosystem to integrate blockchain with the rest of the enterprise architecture and build a highly scalable and reliable event streaming infrastructure.

Speakers: Kai Waehner, Technology Evangelist, Confluent; Stephen Reed, CTO, Co-Founder, AiB

Introduction to Confluent Cloud: Apache Kafka® as a Service

Join us as we walk through an overview of this exciting new service from the experts in Kafka. Learn how to build robust, portable and lock-in free streaming applications using Confluent Cloud.

Top 10 FAQs for KSQL, Streaming SQL for Apache Kafka®

In this interactive discussion, the KSQL team will answer 10 of the toughest, most frequently asked questions about KSQL.

Introduction to KSQL: Streaming SQL for Apache Kafka®

Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka®. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka.

RBC's Data-Driven Transformation with Confluent Platform

One of the largest banks in the world—with 16 million clients globally—RBC built a real-time, scalable and event-driven data architecture for their rapidly growing number of cloud, machine learning and AI initiatives.

The State of Stream Processing

‘The State of Stream Processing’ walks through the origins of stream processing and applicable use cases, then dives into the challenges currently facing the world of stream processing as it drives the next data revolution.

Kafka’s Rebalance Protocol

This talk provides a deep dive into the details of the rebalance protocol, starting from its original design in version 0.9 up to the latest improvements and future work.

Confluent & Apache Kafka® Power Funding Circle's Lending Marketplace

Learn about the impact of Confluent and Apache Kafka® on Funding Circle’s lending marketplace, from Kafka Connect to Exactly-Once processing.

Express Scripts: Driving Digital Transformation from Mainframe to Microservices

This online talk will showcase how Apache Kafka® plays a key role within Express Scripts’ transformation from mainframe to a microservices-based ecosystem, ensuring data integrity between two worlds.

It’s not just Kafka - what else does it take to be real-time?

In this talk, we are going to observe the natural journey companies undertake to become real-time, the possibilities it opens for them, and the challenges they will face.

Jay Kreps (CEO, Confluent) on Exactly-once Semantics In Apache Kafka

Kafka has a set of new features supporting idempotence and transactional writes that support building real-time applications with exactly-once semantics. This talk provides an overview of these features.
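
As a hedged sketch of those building blocks (the topic name, keys, values, and transactional ID below are illustrative assumptions), an idempotent producer avoids duplicate writes on retry, and a transactional producer makes a group of writes visible atomically:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);                  // retries cannot create duplicates
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "payments-tx-1");         // hypothetical transactional id

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("payments", "order-42", "debit:100"));   // hypothetical topic and records
                producer.send(new ProducerRecord<>("payments", "order-42", "credit:100"));
                producer.commitTransaction();                                       // both records become visible atomically
            } catch (Exception e) {
                producer.abortTransaction();                                        // neither record becomes visible
                throw e;
            }
        }
    }
}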

From Zero to Hero with Kafka Connect

This talk discusses the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes.

Apache Kafka® Use Cases for Financial Services

Technologies open up a range of use cases for Financial Services organisations, many of which will be explored in this talk.

ATM Fraud Detection with Apache Kafka and KSQL

Detecting fraudulent activity in real time can save a business significant amounts of money, but has traditionally been an area requiring a lot of complex programming and frameworks, particularly at scale.

Apache Kafka® Architecture & Fundamentals Explained

This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Part 2 of 4 in our Fundamentals for Apache Kafka series.

Building stream processing applications with Apache Kafka using ksql (Robin Moffatt, Confluent) - Big Data LDN 2019

Robin discusses the role of Apache Kafka as the de facto standard streaming data processing platform.

Apache Kafka and Machine Learning in the Finance Industry

Join this Online Talk, to understand how and why Apache Kafka has become the de-facto standard for reliable and scalable streaming infrastructures in the finance industry.

Building Event-Driven Applications with Apache Kafka® & Confluent Platform

In this session, we will cover the easiest ways to start developing event-driven applications with Apache Kafka using Confluent Platform.

Live Coding a KSQL Application

Join us as we build a complete streaming application with KSQL. There will be plenty of hands-on action, plus a description of our thought process and design choices along the way. Part 2 in the Empowering Streams through KSQL series.

Microservices and the Future of Data

Join the discussion on the relationship between microservices and stream processing with Data-Intensive Apps author Martin Kleppmann, Confluent engineers Damian Guy and Ben Stopford, chaired by Jay Kreps, co-founder and CEO, Confluent.

Introducing the Confluent CLI

In this video, Tim Berglund explains how you can speed up development with the Confluent Command Line Interface (CLI), which allows you to quickly iterate while implementing your applications and enables you to interact with the Confluent ecosystem.

Modern Cloud-Native Streaming Platforms

This talk explores the benefits around cloud-native platforms and running Apache Kafka on Kubernetes, what kinds of workloads are best suited for this combination, and best practices.

GCP for Apache Kafka® Users: Stream Ingestion and Processing

In this session, we'll compare the two approaches to data integration and show how Dataflow allows you to join, transform, and deliver data streams among on-prem and cloud Apache Kafka clusters, Cloud Pub/Sub topics, and a variety of databases.

Streaming Data Ingest and Processing with Apache Kafka - 56:21

How to Build an Apache Kafka® Connector

This online talk dives into the new Verified Integrations Program and the integration requirements, the Connect API and sources and sinks that use Kafka Connect. Part 2 of 2 in Building Kafka Connectors - The Why and How

KSQL Introduction | Level Up your KSQL

This video offers an introduction to Kafka stream processing, with a focus on KSQL.

Rethinking Microservices with Stateful Streams

In this talk we'll examine how stateful stream processing can be used to build event-driven services, using a distributed log like Apache Kafka. In doing so, the data dichotomy is balanced with an architecture that exhibits demonstrably better scaling properties, whether the challenge is increased complexity, team size, data volume, or velocity.

Streaming Data to Apache Kafka® for Real-time Decisions

During this online talk, presenters from Confluent and Qlik will demonstrate how to accelerate data delivery to enable real-time analytics, make data more valuable with real-time data ingestion to Kafka, modernize data centers by streaming data in real-time, and demo a customer use case for advanced analytics.

Confluent Cloud: Agility for the modern data-driven enterprise (Hans Jespersen, Confluent) | Confluent Streaming Event, Paris 2018

Hans Jespersen (VP WW Systems Engineering, Confluent) opened the afternoon presentations with "Confluent Cloud: Agility for the modern data-driven enterprise" at Confluent’s streaming event in Paris.

KSQL: Streaming SQL for Apache Kafka®

Learn about the KSQL architecture and how to design and deploy interactive, continuous queries for streaming ETL and real-time analytics.

Introduction to KSQL & Kafka Streams Processing with Ticketmaster

In this all too fabulous talk, we will be addressing the wonderful and new wonders of KSQL vs. KStreams and how Ticketmaster uses KSQL and KStreams in production to reduce development friction in machine learning products.

Kafka for Marketing campaigns: power up your business decisions in near-real-time

In the world of online streaming providers, real-time events are becoming the new standard, driving innovation and a new set of use cases to react to a quickly changing market. We explain how, from simple media player heartbeats, Data Reply fueled a diverse set of near-real-time use cases and services for its customer, from blocking concurrent media streams to recognizing ended sessions and trending content.

Design and Implementation of Incremental Cooperative Rebalancing

In this technical deep dive, we’ll discuss the proposition of Incremental Cooperative Rebalancing as a way to alleviate stop-the-world and optimize rebalancing in Kafka APIs.

End-to-End Integration from the IoT Edge to Confluent Cloud

This interactive whiteboard presentation discusses use cases leveraging the Apache Kafka® open source ecosystem as an event streaming platform to process IoT data.

The Future of ETL Isn't What It Used to Be

Gwen Shapira presents core patterns of modern data engineering and explains how you can use microservices, event streams and a streaming platform like Apache Kafka to build scalable and reliable data pipelines. Part 1 of 3 in Streaming ETL - The New Data Integration series.

Streamsheets and Apache Kafka – Interactively build real-time Dashboards and Streaming Apps just using your Spreadsheet Skills

Without any coding or scripting, end-users leverage their existing spreadsheet skills to build customized streaming apps for analysis, dashboarding, condition monitoring or any kind of real-time pre-and post-processing of Kafka or KsqlDB streams and tables.

Introducing Events and Stream Processing into Nationwide Building Society

In this online talk, you will learn why, when facing Open Banking regulation and rapidly increasing transaction volumes, Nationwide decided to take load off their back-end systems through real-time streaming of data changes into Apache Kafka®.

KSQL from Confluent | Streaming SQL for Apache Kafka

Get an introduction to and demo of KSQL, Streaming SQL for Apache Kafka.

Bosch Power Tools Enables Real-time Analytics on IoT Event Streams

In this online talk, Bosch’s Ralph Debusmann outlines their architectural vision for bringing many data streams into a single platform, surrounded by databases that can power complex real-time analytics.

Application Development in the Emerging World of Stream Processing - 46:45

Michael Noll provides an introduction to stream processing, use cases, and Apache Kafka.

Running Apache Kafka® on Kubernetes

In this online talk, Joe Beda, CTO of Heptio and co-creator of Kubernetes, and Gwen Shapira, principal data architect at Confluent and Kafka PMC member, will help you navigate through the hype, address frequently asked questions and deliver critical information to help you decide if running Kafka on Kubernetes is the right approach for your organization.

Apache Kafka Architectures and Fundamentals

In this Online Talk Henrik Janzon, Solutions Engineer at Confluent, explains Apache Kafka’s internal design and architecture.

Driving Business Transformation with Real-Time Analytics

This talk will cover how to integrate real-time analytics and visualizations to drive business processes and how KSQL, streaming SQL for Kafka, can easily transform and filter streams of data in real time.

Microservices Explained

What are microservices, and how do they work in the Apache Kafka ecosystem?

Stream Processing with Apache Kafka and .NET

In this talk, Matt Howlett will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics) and show you how to use Kafka from within your C#/.NET applications.

Demystifying Stream Processing with Apache Kafka®

Learn how to map practical data problems to stream processing and write applications that process streams of data at scale using Kafka Streams. Part 4 in the Apache Kafka: Online Talk Series.

Exploring KSQL Patterns

Tim Berglund covers the patterns and techniques of using KSQL. Part 1 of the Empowering Streams through KSQL series.

Financial Event Sourcing at Enterprise Scale

Rabobank rose to the challenge of financial event sourcing at enterprise scale by defining the Business Event Bus (BEB) as the place where business events from across the organization are shared between applications.

Building an Enterprise Eventing Framework

Learn how Centene improved their ability to interact and engage with healthcare providers in real time with MongoDB and Confluent Platform.

Divide, Distribute and Conquer: Stream vs. Batch [Philly JUG]

In this talk, get a short introduction to common approaches and architectures (lambda, kappa) for stream processing and learn how to use open-source stream processing tools (Flink, Kafka Streams, Hazelcast Jet).

Building Event-Driven Services with Apache Kafka®

This practical talk will dig into how we piece services together in event driven systems, how we use a distributed log to create a central, persistent narrative and what benefits we reap from doing so. Part 2 in the Apache Kafka® for Microservices: A Confluent Online Talk Series.

Distributed Stream Processing with Apache Kafka

Presentation from Apache Kafka Meetup at Strata San Jose (3/14/17). Jay Kreps will introduce Kafka and explain why it has become the de facto standard for streaming data.

Best Practices for Streaming IoT Data with MQTT and Apache Kafka®

In this session, we will identify and demo some best practices for implementing a large scale IoT system that can stream MQTT messages to Apache Kafka.

The Digital Transformation Mindset – More Than Just Technology

What was once a ‘batch’ mindset is quickly being replaced with stream processing as the demands of the business impose real-time requirements on technology leaders.

Real-Time Decision Engines: react to your business when it speaks to you

In this talk, we are going to show some example use cases that Data Reply developed for some of its customers and how Real-Time Decision Engines had an impact on their businesses.

Disaster Recovery Plans for Apache Kafka®

In this session, we discuss disaster scenarios that can take down entire Apache Kafka® clusters and share advice on how to plan, prepare and handle these events. Part 4 in the Best Practices for Apache Kafka in Production Series.

Common Patterns of Multi-Datacenter Architectures with Apache Kafka®

In this session, we discuss the basic patterns of multi-datacenter Apache Kafka® architectures, explore some of the use cases enabled by each architecture and show how Confluent Enterprise products make these patterns easy to implement. Part 3 in the Best Practices for Apache Kafka in Production Series.

Fueling real-time use cases in online media streaming through Kafka

We explain how the microservice ecosystem around Apache Kafka was built to ensure the ability to build and deploy new streaming agents on AWS fast and with the least amount of operational effort possible, as well as some of the issues we found and worked around.

Apache Kafka for Java Developers [Pittsburgh JUG]

In this talk, we’ll review the breadth of Apache Kafka as a streaming data platform, including, its internal architecture and its approach to pub/sub messaging.

Architecting Microservices Applications with Instant Analytics

This online talk explores how Apache Druid and Apache Kafka® can turn a microservices ecosystem into a distributed real-time application with instant analytics.

Streaming Data Ingest and Processing with Kafka

Experts from Confluent and Attunity share how you can: realize the value of streaming data ingest with Apache Kafka®, turn databases into live feeds for streaming ingest and processing, accelerate data delivery to enable real-time analytics and reduce skill and training requirements for data ingest.

Benefits of Stream Processing and Apache Kafka Use Cases

This talk explains how companies are using event-driven architecture to transform their business and how Apache Kafka serves as the foundation for streaming data applications. Part 1 of 4 in our Fundamentals for Apache Kafka series.

Enabling event streaming at AO.com

Learn how AO.com are enabling real-time event-driven applications to improve customer experience using Confluent Platform.

Why Build an Apache Kafka® Connector

This online talk focuses on the key business drivers behind connecting to Kafka and introduces the new Confluent Verified Integrations Program. Part 1 of 2 in Building Kafka Connectors - The Why and How

Stream Processing and Build Streaming Data Pipelines with Apache Kafka and KSQL

In this talk, we’ll explain the architectural reasoning for Apache Kafka® and the benefits of real-time integration, and we’ll build a streaming data pipeline using nothing but our bare hands, Kafka Connect and KSQL.

Reliability Guarantees in Apache Kafka®

In this session, we go over everything that happens to a message – from producer to consumer, and pinpoint all the places where data can be lost. Build a bulletproof data pipeline with Apache Kafka. Part 2 in the Best Practices for Apache Kafka in Production Series.

Apache Kafka® and Stream Processing at Pinterest

In this talk, members of the Pinterest team offer lessons learned from their Confluent Go client migration and discuss their use cases for adopting Kafka Streams.

Connect all the Things! An intro to event streaming for the automotive industry, mobility services, and smart city concepts

The Fourth Industrial Revolution (also known as Industry 4.0) is the ongoing automation of traditional manufacturing and industrial practices, using modern smart technology. Event streaming with Apache Kafka plays a major role in processing massive volumes of data in real time in a reliable, scalable, and flexible way, integrating with various legacy and modern data sources and sinks.

What's New in Confluent Platform 5.4

Join the Confluent Product team as we provide a technical overview of Confluent Platform 5.4, which delivers groundbreaking enhancements in the areas of security, disaster recovery and scalability.

Integrating Databases into Apache Kafka®

This talk looks at one of the most common integration requirements – connecting databases to Apache Kafka.

The database is only half done

Databases represent some of the most successful software that has ever been written and their importance over the last fifty years is hard to overemphasize. Over this time, they have evolved to form a vast landscape of products that cater to different data types, volumes, velocities, and query characteristics. But the broad definition of what a database is has changed relatively little.

A Tour of Apache Kafka

Learn about typical Apache Kafka use cases and how organisations can process large quantities of data in real time using the Kafka Streams API and KSQL.

Streaming in Practice: Putting Apache Kafka® in Production

This talk focuses on how to integrate all the components of the Apache Kafka® ecosystem into an enterprise environment and what you need to consider as you move into production. Part 6 of the Apache Kafka: Online Talk Series.

How to Unlock your Mainframe Data

Large enterprises, government agencies, and many other organisations rely on mainframe computers to deliver the core systems managing some of their most valuable and sensitive data. However, the processes and cultures around a mainframe often prevent the adoption of the agile, born-on-the web practices that have become essential to developing cutting edge internal and customer-facing applications.

Integrating Apache Kafka® Into Your Environment

This session will show you how to get streams of data into and out of Kafka with Kafka Connect and REST Proxy, maintain data formats and ensure compatibility with Schema Registry and Avro, and build real-time stream processing applications with Confluent KSQL and Kafka Streams.

Leveraging Microservice Architectures & Event-Driven Systems for Global APIs

In this talk we will look at what event driven systems are; how they provide a unique contract for services to communicate and share data and how stream processing tools can be used to simplify the interaction between different services.

Bridge to Cloud (Peter Gustafsson, Confluent) - Big Data LDN 2019

This session covers architectures best practises and recommendations for organisations aiming for a more cloud-centric approach in the use of Apache Kafka.

Spring Kafka Beyond the Basics: lessons Learned on our Kafka Journey at ING bank

You know the fundamentals of Apache Kafka. You are a Spring Boot developer working with Apache Kafka and have chosen Spring Kafka to integrate with it. You implemented your first producer, consumer, and maybe some Kafka Streams applications, and it's working... Hurray! You are ready to deploy to production. What can possibly go wrong?

Apache Kafka® + Machine Learning for Supply Chain

This talk showcases different use cases in automation and Industrial IoT (IIoT) where an event streaming platform adds business value.

Architectures for Event Streaming (Nick Dearden, Confluent)

Event Streaming Paradigm: rethink data not as stored records or transient messages, but instead as a continually updating stream of events.

The Data Dichotomy: Rethinking the Way We Treat Data and Services

This talk will examine the underlying dichotomy we all face as we piece such systems together--one that is not well served today. The solution lies in blending the old with the new and Apache Kafka® plays a central role. Part 1 in the Apache Kafka for Microservices: A Confluent Online Talk Series.

Unlock the Power of Streaming Data with Kinetica and Confluent Platform

See how Kinetica enables businesses to leverage the streaming data delivered with Confluent Platform to gain actionable insights.

Streaming Data Integration with Apache Kafka®

Learn different options for integrating systems and applications with Apache Kafka® and best practices for building large-scale data pipelines using Apache Kafka. Part 3 in the Apache Kafka: Online Talk Series.

Apache Kafka® Delivers a Single Source of Truth for The New York Times

Join The New York Times' Director of Engineering Boerge Svingen to learn how the innovative news giant of America transformed the way it sources content—all through the power of a real-time streaming platform.

Unleashing Apache Kafka and TensorFlow in the Cloud

In this online talk, Technology Evangelist Kai Waehner will discuss and demo how you can leverage technologies such as TensorFlow with your Kafka deployments to build a scalable, mission-critical machine learning infrastructure for ingesting, preprocessing, training, deploying and monitoring analytic models.

Bridge to Cloud: Using Apache Kafka to Migrate to AWS

In this session, we will share how companies around the world are using Confluent Cloud, a fully managed Apache Kafka® service, to migrate to AWS.

Real-time Data Streaming from Oracle to Apache Kafka

Learn typical use cases for Apache Kafka®, how you can get real-time data streaming from Oracle databases to move transactional data to Kafka and enable continuous movement of your data to provide access to real-time analytics.

Deploying and Operating KSQL

In this session, Nick Dearden covers the planning and operation of your KSQL deployment, including under-the-hood architectural details. Part 3 out of 3 in the Empowering Streams through KSQL series.

First steps with Confluent Cloud and .NET

In this webinar, we take a hands-on approach to these questions and walk through connecting a simple application written in .NET to a Confluent Cloud based Kafka cluster. Along the way, we point out best practices for developing and deploying applications that scale easily.

Introducing Exactly Once Semantics in Apache Kafka®

Learn about the recent additions to Apache Kafka® to achieve exactly-once semantics (EoS) including support for idempotence and transactions in the Kafka clients.

How to Fail at Apache Kafka®

This online talk is based on real-world experience of Kafka deployments and explores a collection of common mistakes that are made when running Kafka in production and some best practices to avoid them.

How Intrado Transformed Conferencing with an Event-Driven Architecture

Hear from Intrado’s Thomas Squeo, CTO, and Confluent’s Chief Customer Officer, Roger Scott, to learn how Intrado future-proofed their architecture to support current and future real-time business initiatives.

Life is a Stream of Events

Learn how NAV (Norwegian Work and Welfare Department) are using Apache Kafka to distribute and act upon events. NAV currently distributes more than one-third of the national budget to citizens in Norway or abroad. They are there to assist people through all phases of life within the domains of work, family, health, retirement, and social security. Events happening throughout a person’s life determines which services NAV provides to them, how they provide them, and when they offer them.

Putting the Micro into Microservices with Stateful Stream Processing

This talk will look at how stateful stream processing is used to build truly autonomous services, with the distributed guarantees of exactly-once processing in event-driven services supported by Apache Kafka®. Part 3 in the Apache Kafka for Microservices: A Confluent Online Talk Series.

Stateful, Stateless and Serverless – Running Apache Kafka® on Kubernetes

Joe Beda, CTO of Heptio and co-creator of Kubernetes, and Gwen Shapira, principal data architect at Confluent, will help you decide if running Kafka on Kubernetes is the right approach for your organization.

Streaming Transformations - Putting the T in Streaming ETL

We’ll discuss how to leverage some of the more advanced transformation capabilities available in both KSQL and Kafka Connect. Part 3 of 3 in Streaming ETL - The New Data Integration online talk series.

Event Streaming: from Projects to Platform - Lyndon Hedderly, Confluent

Event streaming: from technology to a completely new business paradigm.

Using Apache Kafka to Optimize Real-Time Analytics in Financial Services

This online talk includes in depth practical demonstrations of how Confluent and Panopticon together support several key financial services and IoT applications, including transaction cost analysis and risk monitoring.

Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply

In this online talk, you’ll hear about ingesting your Kafka streams into Imply’s scalable analytic engine and gaining real-time insights via a modern user interface.

Building Microservice Architectures Q&A Panel

Microservices guru Sam Newman, Buoyant CTO Oliver Gould and Apache Kafka® engineer Ben Stopford are joined by Jay Kreps, co-founder and CEO, Confluent for a Q&A session where they discuss and debate all things Microservices.

HomeAway uses Confluent & Apache Kafka® to Transform Travel

HomeAway, the world’s leading online marketplace for the vacation rental industry, uses Apache Kafka® and Confluent to match travelers with 2 million+ unique places to stay in 190 countries.

Modernizing the Manufacturing Industry with Kafka and MQTT

Industry 4.0 and smart manufacturing are driving the manufacturing industry to modernize their software infrastructure. This session will look at the unique business drivers for modernizing the manufacturing industry and how MQTT and Kafka can help make it a reality.

The Top 5 Event Streaming Use Cases & Architectures in 2021

Learn how companies will leverage event streaming, Apache Kafka, and Confluent to meet the demands of a real-time market, rising regulations, customer expectations, and much more in 2021.

Introduction to Stream Processing with Apache Kafka®

Get an introduction to Apache Kafka® and how it serves as a foundation for streaming data pipelines and applications that consume/process real-time data streams. Part 1 in the Apache Kafka: Online Talk Series.

Simplified Hybrid Cloud Migration with Confluent and Google Cloud

Join Unity, Confluent and GCP to learn how to reduce risk and increase business options with a hybrid cloud strategy.

Apache Kafka® Delivers Single Source of Truth for The New York Times

Join The New York Times' Director of Engineering Boerge Svingen to learn how the innovative news giant of America transformed the way it sources content while still maintaining searchability, accuracy and accessibility—all through the power of a real-time streaming platform.

Data Streaming with Apache Kafka & MongoDB

Explore the use cases and architecture for Apache Kafka®, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.

How Apache Kafka® Works

Pick up best practices for developing applications that use Apache Kafka, beginning with a high level code overview for a basic producer and consumer.

Apache Kafka: Advice from the trenches or how to successfully fail!

Operating a complex distributed system such as Apache Kafka can be a lot of work. In this talk we will review common issues and mitigation strategies, seen from the trenches while helping teams around the globe with their Kafka infrastructure.

ETL is Dead, Long Live Streams!

Neha Narkhede talks about the experience at LinkedIn moving from batch-oriented ETL to real-time streams using Apache Kafka and how the design and implementation of Kafka was driven by this goal of acting as a real-time platform for event data.

Event Streaming Platforms

Watch Lyndon Hedderly's keynote from Big Data Analytics London 2018.

Kafka and the Service Mesh

Originally presented by Gwen Shapira at Gluecon 2018, this talk covers the similarities and differences between the communication layer provided by a service mesh and Apache Kafka and their implementations, as well as ways you can combine them together.

Kafka: The Bridge to ISS A/S One-Stop-Shop for Data

This 60-minute online talk is packed with practical insights: you will learn how Kafka fits into a data ecosystem that spans a global enterprise and supports use cases for both data ingestion and integration.

Intelligent Real-Time Decisions with VoltDB and Apache Kafka®

Join experts from VoltDB and Confluent to see why and how enterprises are using Apache Kafka as the central nervous system in combination with VoltDB.

Apache Kafka: Past, Present and Future

Confluent Co-founder Jun Rao discusses how Apache Kafka® became the predominant publish/subscribe messaging system that it is today, Kafka's most recent additions to its enterprise-level set of features and how to evolve your Kafka implementation into a complete real-time streaming data platform.

Real-Time Data Streaming with Kafka and Confluent in the Telecom Industry

Join Kai Waehner, Technology Evangelist at Confluent, for this session which explores various telecommunications use cases, including data integration, infrastructure monitoring, data distribution, data processing and business applications. Different architectures and components from the Kafka ecosystem are also discussed.

Monitoring Apache Kafka® and Streaming Applications

In this presentation, we discuss best practices of monitoring Apache Kafka®. Part 5 of the Best Practices for Apache Kafka in Production series.

Kafka and Big Data Streaming Use Cases in the Gaming Industry

Learn how Apache Kafka and Confluent help the gaming industry leverage real-time integration, event streaming, and data analytics for seamless gaming experiences at scale.

Using a combination of Confluent Cloud and Kafka tooling on your laptop for solution prototyping and development

Developing a streaming solution against a self-managed Kafka cluster can be awkward and time-consuming, largely due to security requirements and configuration red tape. It's beneficial to use Confluent Cloud in the early stages to make quick progress. Creating the cluster in Confluent Cloud is very easy and allows you to concentrate on defining your Connect sources and sinks as well as fleshing out the streaming topology on your laptop. It also shows the client how easy it is to swap out the self-managed Kafka cluster for Confluent Cloud.


Presentation on Kafka basics

inoio/slides-kafka-fundamentals


This is a presentation about Kafka Fundamentals, i.e. things that every engineer working with Kafka and expecting reliable messaging (e.g. in the context of event collaboration) should know.

It covers (Java) producers, consumers, and the cluster itself, i.e. topic partitions, replicas, etc. It does not cover, e.g., Kafka Connect, Kafka Streams, or third-party client libraries like spring-kafka or reactor-kafka.

See the rendered version here: inoio.github.io/slides-kafka-fundamentals/

How to contribute

If you find anything that could be improved, it would be awesome if you'd submit a pull request. In case images/visualizations are involved, the source for the images can be found here.


Apache Kafka - PowerPoint PPT Presentation


This presentation gives an overview of the Apache Kafka project. It covers areas like producers, consumers, topics, partitions, APIs, architecture, and usage, with links for further information and for connecting with the author. Music by "Little Planet", composed and performed by Bensound.

  • A stream processing platform
  • Open source / Apache 2.0 license
  • Written in Java and Scala
  • A publish/subscribe system for record streams
  • Scalable / fault tolerant
  • Topic based partition FIFO queues
  • Kafka runs as a cluster of servers
  • Stores records in topics
  • Topics are partitioned into queues
  • Partitions are stored across cluster
  • Consumers organised into groups
  • Stream processors transform records
  • Reusable connectors process queues
  • For instance database connectors
  • Producer API
  • Allows applications to publish to topics
  • Consumer API
  • Applications subscribe to topics / process data streams
  • Streams API
  • Applications act as stream processors, transforming streams
  • Connector API
  • Build reusable producers / consumers
  • E.g. RDBMS connectors/producers/consumers
  • Admin API
  • For topic and broker management
  • Records are published to topics
  • Topics are multi-subscriber
  • Topics contain partition queues
  • A partition queue contains a sequence of records
  • Each record has a queue offset (position)
  • Consumers use the offset to read records
  • Queue record retention is configurable
  • Producers write to partitions, e.g. Producer1 → P0
  • Producers are responsible for the record → partition mapping
  • Kafka only guarantees order within a partition
  • A Kafka cluster contains <n> servers
  • Partitions are mapped to servers
  • Consumers are members of consumer groups
  • Each consumer must maintain its partition read offset
  • A low latency messaging system
  • Records load balanced across partitions
  • As a storage system
  • Using local file system storage
  • Scales horizontally in terms of performance
  • As a stream processing system
  • Using stream API to transform data
  • Data replication provides fault tolerance
  • See Big Data Made Easy
  • Apress Jan 2015
  • See Mastering Apache Spark
  • Packt Oct 2015
  • See Complete Guide to Open Source Big Data Stack
  • Apress Jan 2018
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • Feel free to connect on LinkedIn
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration


How to stream Kafka messages to Internet-facing clients over WebSockets

Alex Diaconu

Apache Kafka is one of the most powerful asynchronous messaging technologies out there. Designed in 2010 at LinkedIn by a team that included Jay Kreps, Jun Rao, and Neha Narkhede, Kafka was open-sourced in early 2011. Nowadays, the tool is used by a plethora of companies (including tech giants such as Slack, Airbnb, and Netflix) to power their realtime data streaming pipelines.

Since Kafka is so popular, I was curious to see if you can use it to stream realtime data directly to end-users, over the Internet, and via WebSockets. After all, Kafka has a series of characteristics that seem to make it a noteworthy choice, such as:

High throughput

Low latency

High concurrency

Fault tolerance

Durability (persistence)

Existing solutions for streaming Kafka messages to Internet-facing clients

I started researching to see what the realtime development community has to say about this use case. I soon discovered that Kafka was originally designed to be used inside a secure network for machine to machine communication. This made me think that you probably need to use some sort of middleware if you want to stream data from Kafka to Internet-facing users over WebSockets.

I continued researching, hoping to find some open-source solutions that could act as the middleware. I discovered several that can theoretically be used as an intermediary between Kafka and clients that connect to a stream of data over the Internet:

transfers_websockets_service

kafka-websocket

kafka-proxy-ws

Unfortunately, all of the solutions listed above are proofs of concept and nothing more. They have a limited feature set and aren’t production-ready (especially at scale).

I then looked to see how established tech companies are solving this Kafka use case; it seems that they are indeed using some kind of middleware. For example, Trello has developed a simplified version of the WebSocket protocol that only supports subscribe and unsubscribe commands. Another example is provided by Slack. The company has built a broker called Flannel, which is essentially an application-level caching service deployed to edge points of presence.

Of course, companies like Trello or Slack can afford to invest the required resources to build such solutions. However, developing your own middleware is not always a viable option — it’s a very complex undertaking that requires a lot of resources and time. Another option — the most convenient and common one — is to use established third-party solutions.

As we’ve seen, the general consensus seems to be that Kafka is not suitable for last mile delivery over the Internet by itself; you need to use it in combination with another component: an Internet-facing realtime messaging service.

Here at Ably, many of our customers are streaming Kafka messages via our pub/sub Internet-facing realtime messaging service. To demonstrate how simple it is, here’s an example of how data is consumed from Kafka and published to Ably:
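A minimal sketch of that flow, assuming the kafkajs client and the Ably JavaScript SDK; the broker address, topic name, and channel name are illustrative placeholders:

```typescript
// Bridge sketch: consume records from Kafka and republish them to Ably.
import { Kafka } from "kafkajs";
import * as Ably from "ably";

const kafka = new Kafka({ clientId: "kafka-bridge", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "kafka-bridge" });
const ably = new Ably.Rest(process.env.ABLY_API_KEY!);

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: "transactions", fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      // Forward each Kafka record value to an Ably channel.
      const channel = ably.channels.get("transactions");
      await channel.publish("transaction", message.value?.toString());
    },
  });
}

run().catch(console.error);
```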

And here’s how clients connect to Ably to consume data:
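A minimal browser-side sketch, again assuming the Ably JavaScript SDK and the same illustrative channel name (a production app would typically use token authentication rather than embedding an API key in the browser):

```typescript
// Client sketch: subscribe to the channel the bridge publishes to.
import * as Ably from "ably";

const realtime = new Ably.Realtime({ key: "YOUR_ABLY_API_KEY" });
const channel = realtime.channels.get("transactions");

channel.subscribe("transaction", (message) => {
  // Each message carries one Kafka record value forwarded by the bridge.
  console.log("received:", message.data);
});
```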

You can decide to use any Internet-facing realtime messaging service between Kafka and client devices. However, regardless of your choice, you need to consider the entire spectrum of challenges this messaging service layer must be equipped to deal with.

Introducing the Ably Kafka Connector: we've made it easy for developers to extend Kafka to the edge reliably and safely.

Using Kafka and a messaging middleware: engineering and operational challenges

Before I get started, I must emphasize that the design pattern that is covered in this article involves using a WebSocket-based realtime messaging middleware between Kafka and your Internet-facing users.

It’s also worth mentioning that I’ve written this article based on the extensive collective knowledge that the Ably team possesses about the challenges of realtime data streaming at scale.

I’m now going to dive into the key things you need to think about, such as message fan-out, service availability, security, architecture, or managing back-pressure. The Internet-facing messaging service you choose to use between Kafka and end-users must be equipped to efficiently handle all these complexities if you are to build a scalable and reliable system. 

Message routing

One of the key things you will have to consider is how to ensure that client devices only receive relevant messages. Most of the time, it’s not scalable to have a 1:1 mapping between clients and Kafka topics, so you will have topics that are shared across multiple devices. 

For example, let’s say we have a credit card company that wants to stream a high volume of transaction information to its clients. The company uses a topic that is split (sharded) into multiple partitions to increase the total throughput of messages. In this scenario, Kafka provides ordering guarantees — transactions are ordered by partition.

However, when a client device connects via a browser to receive transaction information, it only wants and should only be allowed to receive transactions that are relevant for that user/device. But the client doesn’t know the exact partition it needs to receive information from, and Kafka doesn’t have a mechanism that can help with this.

To solve the problem, you need to use an Internet-facing realtime messaging service in the middle, between your Kafka layer and your end-users, as illustrated below. 

[Diagram: an Internet-facing realtime messaging service sitting between the Kafka layer and end-user devices]
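As a rough sketch of that routing (assuming the same kafkajs and Ably SDKs as above, with illustrative topic and channel names), the bridge can derive a per-user channel from the Kafka record key, so each client only ever subscribes to its own channel:

```typescript
// Routing sketch: republish each Kafka record to a per-user channel derived
// from the record key, rather than exposing Kafka partitions to clients.
import { Kafka } from "kafkajs";
import * as Ably from "ably";

const kafka = new Kafka({ clientId: "routing-bridge", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "routing-bridge" });
const ably = new Ably.Rest(process.env.ABLY_API_KEY!);

async function routeByUser() {
  await consumer.connect();
  await consumer.subscribe({ topic: "transactions", fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const userId = message.key?.toString(); // the record key identifies the user
      if (!userId) return;                    // skip records without a key
      const channel = ably.channels.get(`transactions:${userId}`);
      await channel.publish("transaction", message.value?.toString());
    },
  });
}

routeByUser().catch(console.error);
```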

The benefits of using this model:

Flexible routing of messages from Kafka to Internet-facing topics.

Ensures clients connecting over the Internet only subscribe to relevant topics. 

Enhanced security, because clients don’t have access to the secure network where your Kafka cluster is deployed; data is pushed from Kafka to the Internet-facing realtime messaging service; client devices interact with the latter, rather than pulling data directly from Kafka.

System security

One of the main reasons Kafka isn’t used for last-mile delivery relates to security and availability. To put it simply, you don’t want your data-processing component to be accessible directly over the Internet.

To protect the integrity of your data and the availability of your system, you need an Internet-facing realtime messaging service that can act as a security barrier between Kafka and the client devices it streams messages to. Since this messaging service is exposed to the Internet, it should sit outside the security perimeter of your network. 

You should consider pushing data to the Internet-facing realtime messaging service instead of letting the service pull it from your Kafka layer. This way, in the event that the messaging service is compromised, the data in Kafka will still be secure. An Internet-facing realtime messaging service also helps ensure that you never mistakenly allow a client device to connect to your Kafka deployment or subscribe to a topic they shouldn’t have access to.

You would expect your Internet-facing realtime messaging service to have mechanisms in place that allow it to deal with system abuse, such as denial of service (DoS) attacks — even unintentional ones, which can be just as damaging as malicious attacks. 

Let’s now look at a real-life situation of a DoS attack the team at Ably has had to deal with. Although it wasn’t malicious, it was a DoS attack nonetheless. One of our customers had an issue where a fault in the network led to tens of thousands of connections being dropped simultaneously. Due to a bug in the code, whenever there was a connection failure, the system tried to re-establish the WebSocket connection immediately, regardless of network conditions. This, in turn, led to thousands of client-side connection attempts every few seconds, which didn’t stop until the clients were able to reconnect to the Internet-facing realtime messaging service. Since Ably was the messaging service in the middle, it absorbed the spike in connections, while the underlying Kafka layer remained completely unaffected.

Discover how Ably enforces security

The Ably service mesh includes mechanisms for token-based authentication, privilege-based access, and encryption. Our globally-distributed network is equipped to scale up quickly to respond to huge traffic spikes, allowing us to efficiently mitigate DoS attacks. Additionally, we adhere to information security standards, such as SOC 2 Type 1, as well as to data protection guidelines such as GDPR.

Learn more about Ably’s security and compliance

Data transformation

Often, the data you use internally in your streaming pipeline isn’t suitable for end-users. Depending on your use case, this can lead to performance or bandwidth-related issues for your customers, because you might end up streaming additional and redundant information to them with each message. 

I’ll use an example to better demonstrate what I mean. At Ably, we enable our customers to connect to various data streams. One of these streams is called CTtransit GTFS-realtime (note that CTtransit is a Connecticut Department of Transportation bus service). It’s a free stream that uses publicly available bus data. 

Now imagine you want to connect directly to CTtransit GTFS-realtime to stream data to an app that provides live bus updates to end-users, such as vehicle position or route changes. Each time there is an update (even if it’s for only one bus), the message sent by CTtransit is a large payload covering multiple buses.

However, most of the time, an end-user is interested in receiving updates for only one of those buses, so a relevant message for them would contain just that bus's entry.

Let’s take it even further: a client may wish to only receive the new latitude and longitude values for a vehicle, and as such the payload would be trimmed down to those two fields before it’s sent to the client device.

The point of this example is to demonstrate that if you wish to create optimal and low-latency experiences for end-users, you need to have a strategy around transforming data, so you can break it down (shard it) into smaller, faster, and more relevant sub-streams that are more suitable for last mile delivery.
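As a rough illustration, the sketch below uses simplified stand-in types rather than the actual CTtransit payload, but it shows the two transformation steps described above:

```typescript
// Illustrative types only: a simplified GTFS-realtime-style feed shape.
interface VehicleUpdate {
  vehicleId: string;
  routeId: string;
  position: { latitude: number; longitude: number };
  timestamp: number;
}

interface FeedMessage {
  vehicles: VehicleUpdate[]; // the full feed covers every bus in the fleet
}

// Step 1: keep only the vehicle the subscriber cares about.
function forVehicle(feed: FeedMessage, vehicleId: string): VehicleUpdate | undefined {
  return feed.vehicles.find((v) => v.vehicleId === vehicleId);
}

// Step 2: trim the update down to the fields the client actually renders.
function toPosition(update: VehicleUpdate): { latitude: number; longitude: number } {
  return { latitude: update.position.latitude, longitude: update.position.longitude };
}
```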

On top of data transformation, and if it’s relevant for your use case, you could consider using message delta compression, a mechanism that enables you to send payloads containing only the difference (the delta) between the present message and the previous one you’ve sent. This reduction in size decreases bandwidth costs, improves latencies for end-users, and enables greater message throughput. 

Check out the Ably realtime deltas comparison demo to see just how much of a difference this makes in terms of message size. Note that the demo uses the American CTtransit data source, so if you happen to look at it at a time when there are no buses running (midnight to 6 am EST or early morning in Europe), you won’t see any data.

You’ve seen just how important it is to transform data and use delta compression from a low-latency perspective — payloads can become dozens of times smaller. Kafka offers some functionality around splitting data streams into smaller sub-streams, and it also allows you to compress messages for the purposes of more efficient storage and faster delivery. However, don’t forget that, as a whole, Kafka was not designed for last mile delivery over the Internet. You’re better off passing the operational complexities of data transformation and delta compression to an Internet-facing realtime messaging service that sits between Kafka and your clients. 

Discover Ably’s message delta compression feature

One of the most popular message delta compression standards is called JSON Patch. The only real issue is that it only works with JSON data, so you can’t use it with other data types. That’s why here at Ably, we’ve chosen a flexible standard for delta compression that is based on the open VCDIFF format and the open source xDelta algorithm. We provide delta compression as a feature because we believe that the efficiency of keeping data in sync and delivering messages with low-latency are operational burdens that shouldn’t generally be a concern to developers. 

Learn more about message delta compression

Transport protocol interoperability

The landscape of transport protocols that you can use for your streaming pipeline is quite diverse. Your system will most likely need to support several protocols: aside from your primary one, you also need to have fallback options, such as XHR streaming, XHR polling, or JSONP polling. Let’s have a quick look at some of the most popular protocols:

WebSocket. Provides full-duplex communication channels over a single TCP connection. Much lower overhead than half-duplex alternatives such as HTTP polling. Great choice for financial tickers, location-based apps, and chat solutions. Most portable realtime protocol, with widespread support.

MQTT. The go-to protocol for streaming data between devices with limited CPU power and/or battery life, and for networks with expensive/low bandwidth, unpredictable stability, or high latency. Great for IoT.

SSE. Open, lightweight, subscribe-only protocol for event-driven data streams. Ideal for subscribing to data feeds, such as live sport updates.

On top of the raw protocols listed above, you can add application-level protocols. For example, in the case of WebSockets, you can choose to use solutions like Socket.IO or SockJS. Of course, you can also build your own custom protocol, but the scenarios where you actually have to are very rare. Designing a custom protocol is a complex process that takes a lot of time and effort. In most cases, you are better off using an existing and well-established solution.

Kafka’s binary protocol over TCP isn’t suitable for communication over the Internet and isn’t supported by browsers. Additionally, Kafka doesn’t have native support for other protocols. As a consequence, you need to use an Internet-facing realtime messaging service that can take data from Kafka, transform it, and push it to subscribers via your desired protocol(s).
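To make the contrast concrete, here is a small browser-side sketch of the same kind of feed consumed over two different transports, using only the standard EventSource and WebSocket browser APIs; the endpoints are illustrative placeholders:

```typescript
// Subscribe-only updates over SSE: a good fit for live score or feed data.
const scores = new EventSource("https://example.com/streams/scores");
scores.onmessage = (event) => console.log("score update:", event.data);

// Full-duplex messaging over a raw WebSocket, e.g. for chat or trading UIs.
const socket = new WebSocket("wss://example.com/streams/chat");
socket.onopen = () => socket.send(JSON.stringify({ action: "subscribe", channel: "chat" }));
socket.onmessage = (event) => console.log("chat message:", event.data);
```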

Ably and protocol interoperability

At Ably, we embrace open standards and interoperability, and we believe that you should have the flexibility to choose the right protocol for the job at any given moment. That’s why we not only provide our own protocol built on top of WebSockets, but we also support raw WebSockets, SSE and MQTT, among other options, as well as various fallbacks.

Learn more about the protocols Ably supports

Message fan-out

Regardless of the tech stack that you are using to build your data streaming pipeline, one thing you will have to consider is how to manage message fan-out (to be more specific, publishing a message that is received by a high number of users, a one-to-many relationship). Designing for scale dictates that you should use a model where the publisher pushes data to a component that any number of users can subscribe to. The most obvious choice available is the pub/sub pattern.

When you think about high fan-out, you should consider the elasticity of your system, including both the number of client devices that can connect to it, as well as the number of topics it can sustain. This is often where issues arise. Kafka was designed chiefly for machine to machine communication inside a network, where it streams data to a low number of subscribers. As a consequence, it’s not optimized to fan-out messages to a high number of clients over the Internet.

However, with an Internet-facing realtime messaging service in the middle, the situation is entirely different. You can use the messaging service layer to offload the fanning out of messages to clients. If this layer has the capacity to deliver the fan-out messages, then it can deliver them with very low latency, and without you having to add capacity to your Kafka cluster.

Using Ably at scale

We’ve built a globally-distributed and horizontally-scalable system here at Ably that enables us to have enough capacity to stream billions of messages to millions of devices every day. Ably can successfully absorb high fluctuations in traffic and is equipped to provide a low-latency service at scale. 

Learn more about using Ably at scale

Server elasticity

You need to consider the elasticity of your Kafka layer. System-breaking complications can arise when you expose an inelastic streaming server to the Internet, which is a volatile and unpredictable source of traffic.

Your Kafka layer needs to have the capacity to deal with the volume of Internet traffic at all times. For example, if you’re developing a multiplayer game, and you have a spike in usage that is triggered by actions from tens of thousands of concurrent users, the increased load can propagate to your Kafka cluster, which needs to have the resources to handle it.

It’s true that most streaming servers are elastic, but not dynamically elastic. This is not ideal, as there is no way you can boost Kafka server capacity quickly (in minutes as opposed to hours). What you can do is plan and provision capacity ahead of time, and hope it’s enough to deal with traffic spikes. But there are no guarantees your Kafka layer won’t get overloaded.

Internet-facing realtime messaging services are often better equipped to provide dynamic elasticity. They don’t come without challenges of their own, but you can offload the elasticity problem to the messaging service, protecting your Kafka cluster when there’s a spike of Internet traffic.

Let’s look at a real-life example. A while back, Ably had the pleasure of helping Tennis Australia stream realtime score updates to fans who were browsing the Australian Open website. We had initially load tested the system for 1 million connections per minute. Once we went into production, we discovered that the connections were churning every 15 seconds or so. As a consequence, we actually had to deal with 4 million new connections per minute, an entirely different problem in terms of magnitude. If Tennis Australia hadn’t used Ably as an elastic Internet-facing realtime messaging service in the middle, their underlying server layer (Kafka or otherwise) would have been detrimentally affected. Ably absorbed the load entirely, while the amount of work Tennis Australia had to do stayed the same — they only had to publish one message whenever a rally was completed.

Another thing you’ll have to consider is how to handle connection re-establishment. When a client reconnects, the stream of data must resume where it left off. But which component is responsible for keeping track of this? Is it the Internet-facing realtime messaging service, Kafka, or the client? There’s no right or wrong answer — any of the three can be assigned that responsibility. However, you need to carefully analyze your requirements and consider that if every stream requires data to be stored, you will need to scale storage proportionally to the number of connections. 

The Ably network provides dynamic elasticity

The Ably edge messaging network provides dynamic elasticity, so our customers don’t have to be concerned about the elasticity of their server layer. 

Learn more about the Ably edge messaging network

Globally-distributed architecture

To obtain a low-latency data streaming system, the Internet-facing realtime messaging service you are using must be geographically located in the same region as your Kafka deployment. But that’s not enough. The client devices you send messages to should also be in the same region. For example, you don’t want to stream data from Australia to end-users in Australia via a system that is deployed in another part of the world.

If you want to provide low latency from source through to end-users when the sender and receivers are in different parts of the world, you need to think about a globally-distributed architecture. Edge delivery enables you to bring computational processing of messages close to clients.

Another benefit of having a globally-distributed Internet-facing realtime messaging layer is that if your Kafka server fails due to a restart or an incompatible upgrade, the realtime messaging service will keep the connections alive, so they can quickly resume once the Kafka layer is operational again. In other words, an isolated Kafka failure would have no direct impact on the clients that are subscribed to the data stream. This is one of the main advantages of distributed systems — components fail independently and don’t cause cascading failures.

On the other hand, if you don’t use an Internet-facing realtime edge messaging layer, a Kafka failure would be much harder to manage. It would cause all connections to terminate. When that happens, clients would try to reconnect immediately, which would add more load to any other existing Kafka nodes in the system. The nodes could become overloaded, which would cause cascading failures.

Let’s look at some common globally-distributed architecture models you can use. In the first model, Kafka is deployed inside a secure network and pushes data to the Internet-facing realtime messaging service. The messaging service sits on the edge of the secure network, being exposed to the Internet. 

[Diagram: Kafka deployed inside a secure network, pushing data to an Internet-facing realtime messaging service at the edge of that network]

A secondary model you can resort to involves one Internet-facing realtime messaging service and two instances of Kafka, primary and backup/fallback. Since the messaging service is decoupled, it doesn’t know (nor does it care) which of the two Kafka instances is feeding it data. This model is a failover design that adds a layer of reliability to your Kafka setup: if one of the Kafka instances fails, the second one will take its place.  

[Diagram: primary and backup Kafka instances feeding a single Internet-facing realtime messaging service]

A third model, which is quite popular with Ably customers, is an active-active approach. It requires two Kafka clusters running at the same time, independently, to share the load. Both clusters operate at 50% capacity and use the same Internet-facing realtime messaging service. This model is especially useful in scenarios where you need to stream messages to a high number of client devices. Should one of the Kafka clusters fail, the other one can pick up 50% of the load, to keep your system running.

[Diagram: two active-active Kafka clusters sharing the load behind a single Internet-facing realtime messaging service]

Managing back-pressure

When streaming data to client devices over the Internet, back-pressure is one of the key issues you will have to deal with. For example, let’s assume you are streaming 25 messages per second, but a client can only handle 10 messages per second. What do you do with the remaining 15 messages per second that the client is unable to consume?

Since Kafka was designed for machine to machine communication, it doesn’t provide you with a good mechanism to manage back-pressure over the Internet. However, if you use an Internet-facing realtime messaging service between Kafka and your clients, you may be better equipped to deal with this issue.

Even with a messaging service in the middle, you still need to decide what is more important for your streaming pipeline: low latency or data integrity? They are not mutually exclusive, but choosing one will affect the other to a certain degree.

For example, let’s assume you have a trading app that is used by brokers and traders. In our first use case, the end-users are interested in receiving currency updates as quickly as possible. In this context, low latency should be your focus, while data integrity is of lower importance.

To achieve low latency, you can use back-pressure control, which monitors the buffers building up on the sockets used to stream data to client devices. This packet-level mechanism ensures that buffers don’t grow beyond what the downstream connection can sustain. You can also bake in conflation, which essentially allows you to aggregate multiple messages into a single one. This way, you can control downstream message rates. Additionally, conflation can be successfully used in unreliable network conditions to ensure upon reconnection that the latest state is an aggregate of recent messages.
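A minimal conflation sketch, with an illustrative one-second flush interval and a send() callback standing in for the socket write:

```typescript
// Buffer updates per key and flush one aggregated message at a fixed rate
// that the downstream connection can sustain.
type Send = (payload: unknown) => void;

function createConflater(send: Send, intervalMs = 1000) {
  const latestByKey = new Map<string, unknown>();

  // Only the most recent update per key survives until the next flush, so a
  // slow client receives one aggregated message instead of every raw update.
  setInterval(() => {
    if (latestByKey.size === 0) return;
    send(Object.fromEntries(latestByKey));
    latestByKey.clear();
  }, intervalMs);

  return (key: string, update: unknown) => latestByKey.set(key, update);
}

// Usage: push 25 updates per second in, the client sees at most one
// aggregate per second out.
const publish = createConflater((payload) => console.log("flush:", payload));
publish("EUR/USD", { bid: 1.0841, ask: 1.0843 });
publish("EUR/USD", { bid: 1.0842, ask: 1.0844 }); // supersedes the previous tick
```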

If you’d rather deal with back-pressure at application level, you can rely on ACKs from clients that are subscribed to your data stream. With this approach, your system would hold off sending additional batches of messages until it has received acknowledgement codes. 
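A minimal sketch of that application-level approach; the { seq, data } / { ack } wire format is an illustrative convention, not a standard:

```typescript
// Hold the next batch until the client acknowledges the previous one.
interface Connection {
  send(payload: string): void;
  onMessage(handler: (raw: string) => void): void;
}

function sendWithAcks(conn: Connection, batches: unknown[][]) {
  let seq = 0;

  const sendNext = () => {
    if (seq >= batches.length) return;
    conn.send(JSON.stringify({ seq, data: batches[seq] }));
  };

  conn.onMessage((raw) => {
    const msg = JSON.parse(raw);
    if (msg.ack === seq) { // client confirmed the current batch...
      seq += 1;
      sendNext();          // ...so release the next one
    }
  });

  sendNext();
}
```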

Now let’s go back to our trading app. In our second use case, the end-users are interested in viewing their transaction histories. In this scenario, data integrity trumps latency, because users need to see their complete transaction records. To manage back-pressure, you can resort to ACKs, which we have already mentioned. 

To ensure integrity, you may need to consider how to deal with exactly-once delivery. For example, you may want to use idempotent publishing over persistent connections. In a nutshell, idempotent publishing means that published messages are only processed once, even if client or connectivity failures cause a publish to be reattempted. So how does it work in practice? Well, if a client device makes a request to buy shares, and the request is successful, but the client times out, the client could try the same request again. Idempotency prevents the client from getting charged twice. 
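One common way to get this behaviour is to attach a stable, client-generated ID to every publish so that retries of the same logical message can be deduplicated on the receiving side. The sketch below illustrates that idea under assumed names and a simple in-memory ID cache; it is not Ably's actual implementation.

```typescript
// Idempotent publishing sketch: each logical message carries a stable,
// client-generated ID; the receiving side remembers seen IDs and ignores
// retries of the same publish. Names and the in-memory cache are assumptions.
import { randomUUID } from "crypto";

type Publish = { id: string; payload: string };

// Client side: generate the ID once, reuse it for every retry attempt.
function buildPublish(payload: string): Publish {
  return { id: randomUUID(), payload };
}

async function publishWithRetry(
  send: (msg: Publish) => Promise<void>,
  msg: Publish,
  attempts = 3
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      await send(msg); // same ID on every attempt
      return;
    } catch {
      // timeout or connection drop: retry with the *same* message ID
    }
  }
  throw new Error("publish failed after retries");
}

// Server side: process each ID at most once.
const seen = new Set<string>();
function handle(msg: Publish): void {
  if (seen.has(msg.id)) return; // duplicate retry, ignore
  seen.add(msg.id);
  // ...apply the side effect exactly once (e.g., place the share order)
}
```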

Find out how Ably achieves idempotency

Over the years, we’ve thought a lot about data integrity and exactly-once delivery. Ably supports exactly-once semantics, idempotency, and guaranteed onward processing. 

Learn more about:

idempotency

how you can achieve exactly-once message processing with Ably

Conclusion: Should you use Kafka to stream data directly to client devices over the Internet?

Kafka is a great tool for what it was designed to do: machine-to-machine communication. You can and should use it as a component of your data streaming pipeline. But it is not meant for streaming data directly to client devices over the Internet; intermediary Internet-facing realtime messaging solutions are designed and optimized precisely to take on that responsibility.

Hopefully, this article helps you focus on the things you need to consider when building a streaming pipeline that uses Kafka and an Internet-facing realtime messaging service that supports WebSockets. Whether you plan to develop such a service yourself or use existing technology, the scalability and operational challenges you’ll face are the same.

But let’s not stop here. If you want to talk more about this topic or if you'd like to find out more about Ably and how we can help you in your Kafka and WebSockets journey, get in touch or sign up for a free account.

We’ve written a lot over the years about realtime messaging and building effective data streaming pipelines. Here are some useful links for further exploration:

Extend Kafka to end-users at the edge with Ably

Confluent Blog: Building a Dependable Real-Time Betting App with Confluent Cloud and Ably

The WebSocket Handbook: learn about the technology behind the realtime web

Ably Kafka Connector: extend Kafka to the edge reliably and safely

Building a realtime ticket booking solution with Kafka, FastAPI, and Ably

WebSockets — A Conceptual Deep-Dive

Dependable realtime banking with Kafka and Ably

Ably resources & datasheets
