Powered By

Companies

  • LinkedIn - Apache Kafka is used at LinkedIn for activity stream data and operational metrics. This powers various products like LinkedIn Newsfeed and LinkedIn Today, in addition to our offline analytics systems like Hadoop.
  • Twitter - Kafka is used as part of Twitter's Storm stream-processing infrastructure.
  • Netflix - Real-time monitoring and event-processing pipeline.
  • Square - We use Kafka as a bus to move all system events through our various datacenters. This includes metrics, logs, custom events, etc. On the consumer side, we output into Splunk, Graphite, and Esper-like real-time alerting (a minimal sketch of this produce/consume pattern appears after the list).
  • Spotify - Kafka is used at Spotify as part of their log delivery system.
  • Pinterest - Kafka is used at Pinterest as part of their log collection pipeline.
  • Tumblr - See this
  • Mozilla - Kafka will soon replace part of our current production system, collecting performance and usage data from end-users' browsers for projects like Telemetry, Test Pilot, etc. Downstream consumers usually persist to either HDFS or HBase.
  • Tagged - Apache Kafka drives our new pub-sub system, which delivers real-time events for users in our latest game, Deckadence. It will soon be used in a host of new use cases, including group chat and back-end stats and log collection.
  • Foursquare - Kafka powers online-to-online and online-to-offline messaging at Foursquare. We integrate with monitoring, production systems, and our offline infrastructure, including Hadoop.
  • StumbleUpon - Data collection platform for analytics.
  • Coursera - At Coursera, Kafka powers education at scale, serving as the data pipeline for real-time learning analytics and dashboards.
  • Mate1.com Inc. - Apache Kafka is used at Mate1 as our main event bus, powering our news and activity feeds and automated review systems; it will soon power real-time notifications and log distribution.
  • Boundary - Apache Kafka aggregates high-flow message streams into a unified distributed pubsub service, brokering the data for other internal systems as part of Boundary's real-time network analytics infrastructure.
  • DataSift - Apache Kafka is used at DataSift as a collector of monitoring events and to track users' consumption of data streams in real time. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • Wooga - We use Kafka to aggregate and process tracking data from all our Facebook games (which are hosted at various providers) in a central location.
  • AddThis - Apache Kafka is used at AddThis to collect events generated by our data network and broker that data to our analytics clusters and real-time web analytics platform.
  • Urban Airship - At Urban Airship we use Kafka to buffer incoming data points from mobile devices for processing by our analytics infrastructure.
  • Metamarkets - We use Kafka to collect real-time event data from clients, as well as our own internal service metrics, to feed our interactive analytics dashboards.
  • SocialTwist - We use Kafka internally as part of our reliable email queueing system.
  • Countandra - Countandra, a hierarchical distributed counting engine, uses Kafka as its primary speedy interface as well as for routing events for cascading counting.
  • FlyHajj.com - We use Kafka to collect all metrics and events generated by the users of the website.
  • uSwitch - See this blog.
  • InfoChimps - Kafka is part of the InfoChimps real-time data platform.
  • Visual Revenue - We use Kafka as a distributed queue in front of our web traffic stream processing infrastructure (Storm).
  • Ooyala - Kafka is used as the primary high-speed message queue to power Storm and our real-time analytics/event ingestion pipelines.
  • Datadog - Kafka brokers data to most systems in our metrics and events ingestion pipeline. Different modules contribute and consume data from it, for streaming CEP (homegrown), persistence (at different "temperatures" in Redis, ElasticSearch, Cassandra, S3), or batch analysis (Hadoop).
  • VisualDNA - We use Kafka (1) as infrastructure that continuously brings tracking events from various datacenters into our central Hadoop cluster for offline processing, (2) as a propagation path for data integration, and (3) as a real-time platform for future inference and recommendation engines.
  • Sematext - In SPM (performance monitoring + alerting), Kafka is used for metrics collection and feeds SPM's in-memory data aggregation (OLAP cube creation) as well as our CEP/alerts servers (see also: SPM for Kafka performance monitoring). In SA (search analytics), Kafka is used in search and click stream collection before data is aggregated and persisted. In Logsene (log analytics), Kafka is used to pass logs and other events from front-end receivers to the persistent backend.
  • Wize Commerce - At Wize Commerce (previously NexTag), Kafka is used as a distributed queue in front of Storm-based processing for search index generation. We also plan to use it for collecting user-generated data on our web tier, landing the data into various data sinks like Hadoop, HBase, etc.
  • Quixey - At Quixey, The Search Engine for Apps, Kafka is an integral part of our eventing, logging and messaging infrastructure.
  • LinkSmart - Kafka is used at LinkSmart as an event stream feeding Hadoop and custom real-time systems.
  • Simple - We use Kafka at Simple for log aggregation and to power our analytics infrastructure.
  • LucidWorks Big Data - We use Kafka for syncing LucidWorks Search (Solr) with incoming data from Hadoop and also for sending LucidWorks Search logs back to Hadoop for analysis.
  • RichRelevance - Real-time tracking event pipeline.
  • Loggly - Loggly is the world's most popular cloud-based log management service, helping DevOps and technical teams make sense of massive quantities of logs. Kafka is used as part of our log collection and processing infrastructure.
  • Cloud Physics - Kafka powers our high-flow event pipeline, which aggregates over 1.2 billion metric series from 1000+ data centers for near-real-time data center operational analytics and modeling.
  • Graylog2 - Graylog2 is a free and open source log management and data analysis system. It uses Kafka as the default transport for Graylog2 Radio. The use case is described here.
  • Ancestry.com - Kafka is used as the event-log processing pipeline for delivering better-personalized products and services to our customers.
  • Gnip - Kafka is used in Gnip's Twitter ingestion and processing pipeline.
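
Nearly every entry above describes the same basic pattern: applications publish events (metrics, logs, tracking data) to a Kafka topic, and downstream systems consume that stream for analytics, alerting, or persistence. Below is a minimal sketch of that pattern using Kafka's Java client; the broker address (localhost:9092), topic name ("events"), and consumer group ("analytics") are illustrative placeholders, not details taken from any of the deployments listed.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventPipelineSketch {
        public static void main(String[] args) {
            // Producer side: an application publishes an event to a topic.
            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                // "events" is a hypothetical topic name used for illustration.
                producer.send(new ProducerRecord<>("events", "user-123", "{\"type\":\"page_view\"}"));
            }

            // Consumer side: a downstream system (analytics, alerting, persistence)
            // reads the same stream, typically as part of a consumer group.
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "analytics"); // placeholder consumer group
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                consumer.subscribe(Collections.singletonList("events"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
            }
        }
    }

In production deployments like those above, many producers and independent consumer groups share the same cluster; the sketch only shows a single produce/consume round trip.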