
Leveraging Wikipedia Page Edits for Analytical Processing

  • Wikipedia is the largest free encyclopedia and one of the most popular websites worldwide. Analyzing user activity within this encyclopedic ecosystem presents unique opportunities for academic research and analysis. This work is therefore concerned with obtaining and processing real-time article edit streams from Wikipedia. To this end, we leverage the Wikimedia EventStreams API and propose a general-purpose event pipeline that allows for further processing of observed page edits. In the suggested pipeline, events are ingested and transported via an Apache Kafka cluster and inserted into a ClickHouse database for storage and analysis. Finally, we confirm the viability of our design by exploring several exemplary analytical use cases.
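The core step of such a pipeline is flattening each incoming edit event into a row suitable for a columnar store. The following is a minimal sketch of that step, assuming the field layout of the public Wikimedia `recentchange` stream; the Kafka producer and ClickHouse insert are only indicated in comments, and the function and field names are illustrative rather than the authors' exact implementation.

```python
import json

def parse_recentchange(raw: str) -> dict:
    """Flatten one recentchange event (JSON string from the
    Wikimedia EventStreams feed) into a row for a columnar store
    such as ClickHouse."""
    ev = json.loads(raw)
    length = ev.get("length") or {}
    return {
        "wiki": ev.get("wiki"),
        "title": ev.get("title"),
        "user": ev.get("user"),
        "type": ev.get("type"),            # e.g. "edit", "new", "log"
        "bot": int(bool(ev.get("bot"))),   # store flag as 0/1
        "timestamp": ev.get("timestamp"),
        # net size change of the edit, if length info is present
        "delta": length.get("new", 0) - length.get("old", 0),
    }

# Abridged example event in the shape the stream delivers:
sample = json.dumps({
    "wiki": "enwiki", "title": "Apache Kafka", "user": "ExampleUser",
    "type": "edit", "bot": False, "timestamp": 1636329600,
    "length": {"old": 1200, "new": 1250},
})
row = parse_recentchange(sample)
print(row["delta"])  # 50

# In the full pipeline, each such row would be produced to a Kafka
# topic by the ingesting service and batch-inserted into ClickHouse
# by a downstream consumer.
```

A flat schema like this keeps per-column compression effective in ClickHouse and makes the exemplary analytical queries (edit volume per wiki, bot share, size deltas over time) simple aggregations.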

Author:Tim Träris, Maxim Balsacq
Parent Title (English):informatikJournal
Document Type:Contribution to a Periodical
Year of Completion:2021
Release Date:2021/11/08
Tag:Analysis; Apache Kafka; ClickHouse; Event stream processing; Wikipedia
First Page:63
Last Page:68
Open-Access-Status: Open Access 
informatikJournal:informatikJournal 2022
Licence (German):Urheberrechtlich geschützt (protected by copyright)