node.js - Terms in Apache Kafka
I have read the Apache Kafka documentation and a couple more articles to get started with Kafka and to understand how it can be used in an application. However, I am highly confused at this point.
I am unable to understand the difference between partitions and brokers.
Kafka provides a replication factor for reliability. Is the replicated data present on the same machine?
What is the difference between the {high level, low level} + {producer, consumer} APIs?
If Kafka doesn't store consumer positions, what are the best methods of storing them? Do people use databases, or do they store the information locally in the client?
Is it a good idea to build a pub-sub system with Kafka and Node.js (to provide a REST API for the data)?
Can you guide me in the right direction? Please comment if you want me to add any other relevant information that would help provide better solutions.
Thanks in advance.
This was a chance to brush up on my Kafka knowledge, so I'm sorry if it got a bit long.
Most of the answers here are derived from the documentation linked, or from googling the relevant documentation.
Since you indicated you want to work with Node.js, I will include references to arguably the best (to my knowledge) Kafka 0.9.0 client, no-kafka, and discuss it in the last section too.
question 1
I am unable to understand the difference between partitions and brokers.
Brokers:
A broker is a server running a Kafka instance, as stated in the introduction:
Kafka is run as a cluster comprised of one or more servers, each of which is called a broker.
Partitions:
You publish and consume messages to/from a topic. A topic can be partitioned and, if you are running a cluster of more than one broker, the partitions are distributed over the brokers (Kafka servers).
Each partition is an ordered, immutable sequence of messages that is continually appended to...
This enables you to balance the load of high-throughput topics. You can consume one, many, or all partitions as you wish. Which partition a message goes to is determined by the chosen partitioning strategy (e.g. hashing the key, setting the partition explicitly while publishing, etc.).
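As a rough illustration of key-based partitioning, here is a toy sketch (using a simple string hash, not Kafka's actual murmur2 partitioner): hash the key and take it modulo the partition count, so the same key always lands on the same partition, preserving per-key ordering.

```javascript
// Toy sketch of a key-hashing partitioning strategy. NOT Kafka's real
// partitioner; it only illustrates the idea that a deterministic hash of
// the key picks the partition, so all messages with the same key stay
// in order on one partition.
function partitionForKey(key, numPartitions) {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    // Classic 31-based string hash, kept in 32-bit integer range.
    hash = ((hash << 5) - hash + key.charCodeAt(i)) | 0;
  }
  return Math.abs(hash) % numPartitions;
}
```

Messages published without a key are typically spread across partitions (round-robin or similar) instead.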
question 2
Kafka provides a replication factor for reliability. Is the replicated data present on the same machine?
If you mean "is it replicated on the same machine": no, that would be dubious at best, since it couldn't withstand a simple server crash. The replication factor determines how many brokers (servers) each partition of a topic is replicated on. --replication-factor 3 means each partition is on 3 brokers, one of them being the leader (accepting reads/writes) and the remaining 2 replicating the leader, ready to automagically accept leader status should the current leader fail. The replication factor cannot exceed the number of brokers in the cluster when creating the topic.
From the introduction:
For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log.
You could have many replicas on one machine by running multiple brokers on it (maybe on different disks or something, for whatever reason).
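To make the broker/replica relationship concrete, here is a toy round-robin replica assignment (a simplification, not Kafka's exact assignment algorithm): each partition's replicas are placed on distinct brokers, which is also why the replication factor cannot exceed the broker count.

```javascript
// Toy sketch of replica placement across brokers. NOT Kafka's real
// assignment algorithm; it only illustrates that each partition's
// replicas land on *different* brokers, with one acting as leader.
function assignReplicas(brokerIds, numPartitions, replicationFactor) {
  if (replicationFactor > brokerIds.length) {
    throw new Error('replication factor larger than number of brokers');
  }
  const assignment = {};
  for (let p = 0; p < numPartitions; p++) {
    const replicas = [];
    for (let r = 0; r < replicationFactor; r++) {
      replicas.push(brokerIds[(p + r) % brokerIds.length]);
    }
    assignment[p] = replicas; // replicas[0] plays the initial leader
  }
  return assignment;
}
```

With brokers [0, 1, 2], 3 partitions and replication factor 3, every partition ends up with a copy on every broker, each with a different leader.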
question 3
What is the difference between the {high level, low level} + {producer, consumer} APIs?
There is only one producer API (there also exists a legacy Scala client). There are three consumer APIs: the old high-level and low-level APIs, and the new unified API. You want to use the new unified API if you are running Kafka 0.9.0 or newer (which you will be, if you are just getting started). It includes new features not available in the old consumer APIs (e.g. the security features introduced in 0.9.0), and there should be no need for the old ones (unless your chosen library does not support the new API, in which case you should probably switch libraries).
no-kafka supports the SimpleConsumer API, which IIRC models the old low-level API. It can be fine for simple testing, but I recommend the GroupConsumer API, which uses the new unified API. One of its strengths (committing offsets) is discussed in relation to the next question.
question 4
If Kafka doesn't store consumer positions, what are the best methods of storing them? Do people use databases, or do they store the information locally in the client?
You could store them any way you want (on disk, etc.), but the new unified consumer API saves the consumer's offset (which messages have been delivered) automagically. Your consumer should commit the latest processed offset after processing a message (consumer.commitOffset in no-kafka's GroupConsumer), so that if you reconnect the consumer after a reboot or whatever, the newest messages will not yet have been marked as consumed.
This is one of many excellent reasons for using the new unified consumer API as well.
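The commit-after-processing semantics can be sketched without a broker. In this toy model, the in-memory `committed` map plays the role that Kafka's internal offsets topic plays for the GroupConsumer: after a restart, consumption resumes from the last committed offset, so anything not yet committed is delivered again.

```javascript
// Broker-free sketch of offset commit semantics. The `committed` map
// stands in for Kafka's replicated offsets topic; `poll` stands in for
// a consumer fetching from one partition's log (a plain array here).
const committed = new Map(); // "group:topic:partition" -> next offset to read

function poll(log, group, topic, partition, handler) {
  const key = `${group}:${topic}:${partition}`;
  let offset = committed.get(key) || 0; // resume from last committed offset
  while (offset < log.length) {
    handler(log[offset], offset);
    committed.set(key, offset + 1); // commit only AFTER processing succeeds
    offset++;
  }
}
```

If the handler crashed before the commit, the same message would be delivered again on the next poll: at-least-once delivery, which is what committing after processing buys you.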
The offsets are stored in a highly available (replicated), partitioned topic and cached by Kafka. You can also configure the options for offset saving (search for the options prefixed with offset. or offsets. behind the link).
One used to commit consumer offsets to ZooKeeper, a service Kafka relies on for distributed concerns such as configuration, but ZooKeeper doesn't scale to many writes and this has since been abstracted away behind Kafka's API. This is also how the SimpleConsumer in no-kafka saves its offsets.
question 5
Is it a good idea to build a pub-sub system with Kafka and Node.js (to provide a REST API for the data)?
There is nothing wrong with doing that. I have made demos myself with Node.js + Kafka recently, and I thoroughly enjoy it. As stated above, I recommend the no-kafka library for Kafka >0.9, but the older kafka-node (for >0.8) works well too, since 0.9 is backwards-compatible. If for no other reason, I would choose no-kafka for its support of the unified consumer API.
In addition to building the client-facing interface in Node.js, you can accomplish light stream processing with it (e.g. enriching and reformatting gathered events), maybe formatting Kafka logs for a database, for example.
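As a hypothetical example of the kind of light processing Node.js handles well, here is a small enrich-and-reformat step you might run on raw JSON events pulled from a topic before forwarding them (the field names are made up for illustration):

```javascript
// Hypothetical light stream-processing step: parse a raw JSON event,
// normalise a field, and enrich it with a processing timestamp before
// forwarding (e.g. to a database or out through a REST API).
function enrichEvent(rawJson) {
  const event = JSON.parse(rawJson);
  return {
    ...event,
    level: (event.level || 'info').toLowerCase(), // normalisation
    receivedAt: new Date().toISOString(),         // enrichment
  };
}
```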
Heavy stream processing may not be best accomplished in Node.js, since implementing resource management, fault tolerance and such would be a big undertaking, and there are stream processing frameworks (Samza, Spark, etc.) for such tasks. Yes, they are in different languages, but you will likely find a framework that suits you. You could still prototype heavy tasks in Node.js if you are familiar with developing and deploying performant, optimised Node applications.