MineLiB: A Library of Sub-Operators for Data Mining Algorithms

Summer Research Project @ Systems Group, ETH Zurich

Accelerating data mining algorithms on modern computing hardware is critical for making efficient use of the data deluge available in the cloud today. In this project we want to develop a library of algorithmic building blocks (sub-operators), optimize them, and expose them as an application programming interface (API) for data mining application developers. The library API allows different optimized implementations of these sub-operators, targeting different platforms (multicore CPUs, FPGAs, GPUs, etc.), to be integrated behind the same interface.
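
To make the idea of one API with several platform-specific implementations concrete, here is a minimal sketch in Python. It is not MineLiB's actual interface: the names `register`, `dispatch`, and `bucket_count`, and the platform strings, are hypothetical and chosen only for illustration. It assumes a sub-operator is looked up by name and target platform, with the CPU version as a fallback.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: (sub-operator name, platform) -> implementation.
_REGISTRY: Dict[Tuple[str, str], Callable] = {}


def register(op: str, platform: str):
    """Decorator that registers an implementation of `op` for `platform`."""
    def decorator(fn: Callable) -> Callable:
        _REGISTRY[(op, platform)] = fn
        return fn
    return decorator


def dispatch(op: str, platform: str = "cpu") -> Callable:
    """Return the implementation for `platform`, falling back to the CPU one."""
    impl = _REGISTRY.get((op, platform)) or _REGISTRY.get((op, "cpu"))
    if impl is None:
        raise KeyError(f"no implementation registered for {op!r}")
    return impl


@register("bucket_count", "cpu")
def bucket_count_cpu(values: List[int], num_buckets: int) -> List[int]:
    """Reference CPU version: count how many values fall into each bucket."""
    counts = [0] * num_buckets
    for v in values:
        counts[v % num_buckets] += 1
    return counts
```

An application would call `dispatch("bucket_count", "fpga")(values, 4)` and get the FPGA version if one were registered, or the CPU version otherwise, so application code would not change when a new backend is added.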


FastTree: Efficient Distributed Gradient Boosting Tree Framework

Final Year Project @ Systems Research Group, Microsoft Research Asia

A distributed machine learning framework based on gradient boosted decision trees (also known as GBDT, GBRT, GBM or MART). The current version is built on top of ChaNa, the RDMA-optimized distributed computing engine developed by the MSRA Systems Group. System-level and algorithm-level optimizations reduce memory usage and communication cost, and performance is better than that of popular comparable tools (e.g. XGBoost).
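
For readers unfamiliar with gradient boosting, the core training loop can be sketched on a single machine as follows. This is a textbook illustration under simplifying assumptions (one feature, squared loss, depth-1 trees), not FastTree's distributed implementation; the function names `fit_stump`, `fit_gbdt`, and `predict` are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(x: List[float], residuals: List[float]) -> Stump:
    """Pick the threshold on x that minimises squared error on the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv = sum(left) / len(left)  # left is never empty: t is a value of x
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, (t, lv, rv))
    return best[1]


def predict(base: float, stumps: List[Stump], lr: float, xi: float) -> float:
    """Base prediction plus the shrunken sum of all stump outputs."""
    return base + lr * sum(lv if xi <= t else rv for t, lv, rv in stumps)


def fit_gbdt(x: List[float], y: List[float], rounds: int = 50, lr: float = 0.1):
    """Each round fits a stump to the current residuals (the negative
    gradient of squared loss) and adds it to the ensemble."""
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [yi - predict(base, stumps, lr, xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return base, stumps
```

In a distributed setting the expensive part is the split search inside `fit_stump`, since finding the best threshold needs statistics from all the data; that step is where memory and communication costs typically arise.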