Senior Performance Solution Engineer
Job Overview
We are looking for a Senior Performance Solution Engineer to help promote Arm’s success in the data center and cloud. In this role, you will be responsible for analyzing, measuring, and optimizing the performance of key data center workloads running on ARM64 platforms. The ideal candidate will have a solid background in low-level performance profiling, operating system internals, compiler interactions, CPU core microarchitecture, and SoC architecture.

Team Overview
You will join an engineering team within Arm’s infrastructure line of business, which focuses on promoting Arm’s business in the data center and cloud industry. You will engage directly with Arm’s partners in this industry. The team aims to understand each partner’s technology stack deeply and accurately, and to help it run better on Arm’s technology. The team focuses especially on partners whose proprietary technology stacks cannot be open sourced.

Job Responsibilities
- Analyze and optimize the performance of workloads running on ARM64 platforms, spanning from C/C++ applications to dynamic language runtimes.
- Leverage operating system traces and application-level instrumentation to identify performance bottlenecks in user space, collaborating closely with partners to implement effective optimizations.
- Use hardware performance counters (both core and uncore PMUs) and hardware trace features to root-cause low-level performance issues across CPU pipelines, memory subsystems, I/O, and system interconnects. Present detailed analysis to inform software and hardware optimization strategies.
- Design and conduct performance benchmarks, profiling experiments, and diagnostic evaluations to assess and improve system behavior.
- Tune system configurations—including compiler flags, kernel parameters, scheduling policies, and runtime environments—to achieve optimal throughput, latency, or power efficiency.

Skills and Experience
- Solid understanding of workload performance analysis using profiling tools such as Linux perf or equivalent tools on other operating systems.
- Familiarity with operating system internals, including context switching, interrupt handling, task scheduling, virtual memory, and NUMA architectures.
- Strong foundational knowledge of SoC architectures, particularly CPU clusters, interconnects, and memory subsystems.
- Experience with top-down performance analysis methodology, with the ability to drill down from application-level behavior to microarchitectural bottlenecks.
- Proficient in C/C++, with the ability to navigate and understand complex codebases and interpret compiler-generated assembly.
- Prior experience in performance analysis and optimization on ARM64 platforms.
- Hands-on experience optimizing data center workloads, with a strong understanding of data center–specific performance challenges such as multi-core scalability, NUMA effects, and resource contention.