Wikipedia is the largest free encyclopedia and one of the most popular websites worldwide. Analyzing user activity within this ecosystem presents unique opportunities for academic research. This work is therefore concerned with obtaining and processing real-time article edit streams from Wikipedia. We leverage the Wikimedia EventStreams API and propose a general-purpose event pipeline that allows further processing of observed page edits. In the suggested pipeline, events are ingested and transported via an Apache Kafka cluster and inserted into a ClickHouse database for storage and analysis. Finally, we confirm the viability of our design by exploring several exemplary analytical use cases.
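As an illustration only, the step between ingestion and storage in such a pipeline might look like the sketch below. It is not the implementation from the work summarized above. It assumes the events arrive as Server-Sent Events text lines whose `data:` field holds one JSON object, and that each object carries `type`, `timestamp` (Unix seconds), `wiki`, `title`, `user`, and `bot` fields, as in Wikimedia's recent-change events. The function names and the output row layout are hypothetical, and the Kafka and ClickHouse hops are deliberately left out so that the sketch runs with the standard library alone.

```python
import json
from datetime import datetime, timezone


def sse_data_lines(lines):
    """Yield the payload of every `data:` line of a Server-Sent Events stream.

    Simplification: each event is assumed to fit on a single `data:` line.
    """
    for line in lines:
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


def to_row(payload):
    """Turn one JSON payload into a flat row, or None if it is not an edit.

    The row keys are a hypothetical table layout, not a schema from the source.
    """
    event = json.loads(payload)
    if event.get("type") != "edit":
        return None
    return {
        "ts": datetime.fromtimestamp(event["timestamp"], tz=timezone.utc),
        "wiki": event.get("wiki", ""),
        "title": event.get("title", ""),
        "user": event.get("user", ""),
        "bot": bool(event.get("bot", False)),
    }


def rows_from_stream(lines):
    """Yield one row per page edit observed in the raw stream lines."""
    for payload in sse_data_lines(lines):
        row = to_row(payload)
        if row is not None:
            yield row


# Hand-written sample lines standing in for a live stream.
sample = [
    "event: message",
    'data: {"type": "edit", "timestamp": 0, "wiki": "enwiki", '
    '"title": "Graph database", "user": "Alice", "bot": false}',
    "",
    'data: {"type": "log", "timestamp": 5, "wiki": "enwiki"}',
]
rows = list(rows_from_stream(sample))
```

In a full pipeline, rows of this kind would be serialized and published to a Kafka topic, from which the database consumes them. Keeping the parsing step free of I/O makes it easy to test in isolation.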
Compared to relational databases, graph database systems provide a novel way of processing and analyzing highly interconnected data. Their unique properties make them an active area of academic research. This work therefore examines the state of the industry and its current challenges. We revisit the basic concepts and highlight the considerable heterogeneity of available systems, using differing path semantics as an example. Based on this insight, we explore algorithmic advancements in graph query processing with regard to path finding and worst-case optimal joins. Subsequently, we discuss issues regarding performance and support for graph analytics. Finally, we provide an overview of GQL, a joint standardization effort towards the unification of property graph databases.