2024 |
GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs.
IEEE International Parallel and Distributed Processing Symposium (IPDPS). |
2024 |
cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding.
IEEE International Parallel and Distributed Processing Symposium (IPDPS). |
2023 |
PIMMiner: A High-performance PIM Architecture-aware Graph Mining Framework.
CoRR. |
2023 |
End-to-End LU Factorization of Large Matrices on GPUs.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP). |
2022 |
STMatch: Accelerating Graph Pattern Matching on GPU with Stack-Based Loop Optimizations.
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC). |
2022 |
SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). |
2022 |
Exposing and Exploiting Fine-Grained Block Structures for Fast and Accurate Sparse Training.
Advances in Neural Information Processing Systems (NeurIPS). |
2022 |
Scaling and Selecting GPU Methods for All Pairs Shortest Paths Computations.
IEEE International Parallel and Distributed Processing Symposium (IPDPS). |
2022 |
Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs.
Proceedings of the 36th ACM International Conference on Supercomputing (ICS). |
2021 |
Scaling Sparse Matrix Multiplication on CPU-GPU Nodes.
IEEE International Parallel and Distributed Processing Symposium (IPDPS). |
2021 |
Exploring PIM Architecture for High-Performance Graph Pattern Mining.
IEEE Computer Architecture Letters 20(2). |
2021 |
Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks.
CoRR. |
2020 |
Scaling out speculative execution of finite-state machines with parallel merge.
25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). |
2020 |
A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs.
25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). |
2020 |
Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning.
Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques (PACT). |
2020 |
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning.
CoRR. |
2019 |
A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction.
28th International Conference on Parallel Architectures and Compilation Techniques (PACT). |
2019 |
Enabling prefix sum parallelism pattern for recurrences with principled function reconstruction.
Proceedings of the 28th International Conference on Compiler Construction (CC). |
2018 |
Revealing parallel scans and reductions in recurrences through function reconstruction.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (PACT). |
2018 |
Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances.
Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO). |
2018 |
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication.
Advances in Neural Information Processing Systems (NeurIPS). |
2017 |
Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation.
Proceedings of the International Conference on Supercomputing (ICS). |
2017 |
Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). |
2016 |
Exploiting recent SIMD architectural advances for irregular applications.
Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO). |
2016 |
Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications.
Proceedings of the 2016 International Conference on Supercomputing (ICS). |