Chenhao Wu (Vito)

Short Bio

I am a first-year Ph.D. student in the Department of Information Engineering at the Chinese University of Hong Kong. I currently conduct research on topics in distributed edge computing systems at the CUHK AIoT Lab, under the supervision of my advisor, Prof. Guoliang Xing.

Before my Ph.D. studies, I received my bachelor's degree in Computer Science and Engineering from the Chinese University of Hong Kong, Shenzhen, where I worked closely with Prof. Shenghao Yang on developing new network protocols. During my undergraduate studies, I led a number of software projects, in collaboration with my colleagues, that turned algorithms from information theory, coding theory and cryptography into working implementations.

To learn more about me, see Misc about Vito.

Research Interests

  • High Performance Computing
  • Computer Architecture
  • Distributed Systems
  • Embedded Systems
  • Edge Computing

Skills

  • C, C++
  • x86/ARM/GPU Assembly
  • CUDA, OpenCL, OpenGL
  • Python, Perl
  • OpenMP, MPI, Pthread, Athread
  • VTune, gprof, Valgrind
  • VHDL
  • LaTeX
  • JavaScript
  • MySQL
  • Linux, Git, Perforce, GDB

Education

The Chinese University of Hong Kong
Sha Tin, N.T., Hong Kong
August 2021 - Present

Industrial Experiences

GPU Architect Intern, Streaming Multi-Processor Team
April 2020 - November 2020
  • Investigated and proposed architecture ideas based on quantitative studies of existing and projected SM architectures.
  • Developed performance and functional simulation models.
  • Developed performance and functional test plans to validate new SM architectural features.
  • Tested and debugged on simulators, RTL and real silicon.

Research Experiences

Founder and Leader
September 2019 - February 2021
Established a team of 16 undergraduate students for HPC-related research. Secured the team's first research funding from Huawei. Prepared the team to compete in top-tier student supercomputing competitions, including ASC, ISC and SC.
  • Organized weekly seminars on state-of-the-art HPC and supercomputing topics.
  • Optimized QuEST, a parallel quantum computer simulator, by reducing its cache-miss rate and introducing AVX2/AVX-512 vectorization.
  • Led the team to win three second prizes in the 2020-2021 ASC student cluster competitions.

Research Assistant, System Developer
July 2018 - August 2019
Participated in the development of a new wireless multi-hop protocol.
  • Designed and implemented the routing system of the BATS protocol.
  • Conducted throughput and latency tests to tune internal network parameters.
  • Implemented a multi-hop video streaming demo application on top of the BATS protocol.
  • Assembled 16 roadside nodes with computation and communication capabilities.

Highlighted Projects

In this project, we designed and implemented an efficient solver for multivariate quadratic (MQ) Boolean systems. We propose a new algorithm that combines Gray-code enumeration with Gaussian elimination to solve MQ problems; our method has a lower computational complexity than all previous methods. Furthermore, the solver is heavily optimized using AVX-512 intrinsics.
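
The Gray-code enumeration ingredient of this project can be illustrated with a minimal Python sketch. It is not the solver described above: it omits the Gaussian-elimination step and all AVX-512 optimizations, and the bitmask representation (`quad`, `lin`, `const`) and helper names (`make_poly`, `evaluate`, `gray_code_solve`) are hypothetical choices made only for illustration. What it shows is that consecutive Gray codes differ in a single bit k, so each quadratic polynomial over GF(2) can be updated with its first derivative, which is an affine function of x, instead of being re-evaluated from scratch.

```python
from typing import List, Tuple

# Hypothetical representation of a quadratic polynomial over GF(2) in n variables:
#   quad[k] : bitmask of j (j != k) such that x_k*x_j appears (rows kept symmetric)
#   lin     : bitmask of k such that x_k appears linearly
#   const   : constant term (0 or 1)
Poly = Tuple[List[int], int, int]


def make_poly(n: int, quad_terms: List[Tuple[int, int]],
              lin_terms: List[int], const: int) -> Poly:
    """Build a Poly from monomial lists; repeated monomials cancel over GF(2)."""
    quad = [0] * n
    lin = 0
    for i, j in quad_terms:
        if i == j:
            lin ^= 1 << i  # x_i * x_i = x_i over GF(2)
        else:
            quad[i] ^= 1 << j  # store both (i, j) and (j, i) so rows are symmetric
            quad[j] ^= 1 << i
    for i in lin_terms:
        lin ^= 1 << i
    return quad, lin, const & 1


def parity(v: int) -> int:
    """Return the XOR of all bits of v."""
    return bin(v).count("1") & 1


def evaluate(p: Poly, x: int) -> int:
    """Directly evaluate p at the assignment encoded by bitmask x."""
    quad, lin, const = p
    acc = const ^ parity(lin & x)
    for k, row in enumerate(quad):
        if (x >> k) & 1:
            # Count each pair (k, j) once by only looking at j > k.
            acc ^= parity(row & x & ~((1 << (k + 1)) - 1))
    return acc


def gray_code_solve(polys: List[Poly], n: int) -> List[int]:
    """Enumerate all 2^n assignments in Gray-code order; return the common zeros.

    Flipping bit k changes a quadratic p by D_k p(x) = parity(quad[k] & x) ^ lin_k,
    which does not depend on x_k, so it can be computed before the flip.
    """
    x = 0
    values = [evaluate(p, x) for p in polys]
    solutions = [x] if not any(values) else []
    for i in range(1, 1 << n):
        k = (i & -i).bit_length() - 1  # lowest set bit of i = bit flipped at step i
        for idx, (quad, lin, _) in enumerate(polys):
            values[idx] ^= parity(quad[k] & x) ^ ((lin >> k) & 1)
        x ^= 1 << k
        if not any(values):
            solutions.append(x)
    return solutions


if __name__ == "__main__":
    # f1 = x0*x1 + x2, f2 = x0 + x1 + 1 (bit i of x is the value of x_i)
    system = [make_poly(3, [(0, 1)], [2], 0), make_poly(3, [], [0, 1], 1)]
    print(gray_code_solve(system, 3))  # [1, 2]
```

Each step touches only one derivative per equation, which is what makes Gray-code enumeration attractive for exhaustive MQ search. An optimized implementation would more likely pack many equations into machine words and vectorize these updates; this sketch favors readability over speed.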
In this project, we present preimage attacks on 4-round Keccak-224/256 as well as on 4-round Keccak[r = 640, c = 160, l = 80] from the Keccak preimage challenges. We revisit the Crossbred algorithm for solving Boolean multivariate quadratic (MQ) systems, propose a new perspective on the case D = 2, and analyze its computational complexity in detail. The results show that the Crossbred algorithm outperforms brute force both theoretically and practically, with feasible memory costs. In our attacks, we construct Boolean MQ systems that make full use of the available variables. By solving these MQ systems, we improve the preimage attacks on Keccak-224/256 reduced to 4 rounds. Moreover, we implement the preimage attack on 4-round Keccak[r = 640, c = 160, l = 80], an instance in the Keccak preimage challenges, and find 78-bit matched near preimages.
In this project, we propose a novel algorithm to optimize the trajectories and power control of multiple UAVs. Because the problem is non-convex, we use a successive convex approximation (SCA)-based algorithm to transform it into a sequence of convex subproblems. We also use the alternating direction method of multipliers (ADMM) to compute the trajectories in parallel.