Prof. Hanhua Chen
Huazhong University of Science and Technology, China
Efficient Data Parallelism in Distributed Stream Processing Systems
Recent advances in distributed stream processing systems give the community the capability to process extremely large volumes of real-time data streams. To achieve high time efficiency, distributed stream processing systems exploit various data parallelism techniques to partition the stream workloads. However, the highly skewed distribution of real-world stream data raises unique challenges for these systems. Existing stream workload partitioning schemes usually follow a "one size fits all" design, leading to notably unsatisfactory system throughput and processing latency.
In this talk, we show that the key to efficient stream partitioning is to identify the popularity of the stream data. We propose PStream, a highly time-efficient distributed stream processing system that uses a novel differentiated data parallelism scheme for stream data partitioning. PStream leverages a novel lightweight probabilistic counting scheme to identify the currently hot keys in dynamic real-time streams. The scheme is extremely efficient in computation and memory consumption, so the predictor based on it can be well integrated into the processing instances of the system. We implement PStream on top of Apache Storm and conduct comprehensive experiments using large-scale real-world traces to evaluate the system performance. Results demonstrate the high efficiency of PStream.
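To make the idea of differentiated partitioning concrete, the following is a minimal sketch and not PStream's actual design. It assumes a Count-Min sketch as a stand-in for the probabilistic counting scheme (the abstract does not specify the counter), and the names `CountMinSketch`, `DifferentiatedPartitioner`, `hot_fraction`, and `min_count` are hypothetical. Keys estimated to be hot are spread round-robin across workers, while all other keys are routed by a stable hash so that each of them stays on one worker.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counter in fixed memory (never underestimates)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One independent hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key):
        """Count one occurrence of key and return its updated estimate."""
        cells = []
        for row in range(self.depth):
            i = self._index(key, row)
            self.table[row][i] += 1
            cells.append(self.table[row][i])
        return min(cells)


class DifferentiatedPartitioner:
    """Hot keys are shuffled across workers; cold keys use key grouping."""

    def __init__(self, num_workers, hot_fraction=0.01, min_count=100):
        self.num_workers = num_workers
        self.hot_fraction = hot_fraction  # assumed threshold, share of stream
        self.min_count = min_count        # assumed warm-up guard
        self.sketch = CountMinSketch()
        self.seen = 0
        self._next_worker = 0

    def route(self, key):
        """Return the index of the worker that should process this tuple."""
        self.seen += 1
        estimate = self.sketch.add(key)
        is_hot = (
            estimate >= self.min_count
            and estimate > self.hot_fraction * self.seen
        )
        if is_hot:
            # Spread the load of a popular key over all workers.
            worker = self._next_worker
            self._next_worker = (self._next_worker + 1) % self.num_workers
            return worker
        # Stable hash keeps every tuple of a cold key on the same worker.
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.num_workers
```

In this sketch a key that turns hot is split across workers, so a downstream step would have to merge the partial per-key results; how PStream handles that is not described in the abstract.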
Hanhua Chen received his PhD degree in computer science and engineering from the Huazhong University of Science and Technology (HUST) in 2010. He is currently a professor at the School of Computer Science and Technology, HUST, China. His research interests include big data processing systems and distributed computing systems. He received the National Excellent Doctoral Dissertation Award of China in 2012. He has published more than 70 research papers in important international conferences such as SIGMOD, ICDE, WWW, RTSS, INFOCOM, ICNP, ICDCS, IPDPS, IWQoS, and ICPP.