We are seeking an exceptional Senior Applied Scientist specializing in ML Systems, training, and inference optimization to join DS3. This role requires deep expertise in performance engineering, kernel development, distributed systems optimization, and AI workload optimization across heterogeneous compute platforms. You will invent and implement novel optimization techniques that directly impact the performance and cost-efficiency of ML training and inference for AWS customers worldwide.
As a Senior Applied Scientist in DS3, you will work at the lowest levels of the software stack—writing custom CUDA kernels, optimizing PTX assembly, developing high-performance operators for GPUs and AWS Neuron, designing efficient communication patterns for multi-GPU and multi-node training, and inventing new algorithmic approaches to accelerate transformer models and emerging architectures. Your work will span from single-node inference optimization to large-scale distributed training systems, influencing the design of AWS training and inference services and setting new standards for ML systems performance across the industry.
Deep Science for Systems and Services (DS3) is part of AWS Utility Computing (UC), which delivers product innovations ranging from foundational services such as Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to a steady stream of new products and features that set AWS apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Key job responsibilities
Systems-Level Scientific Innovation: Design and implement novel kernel-level optimizations for ML inference and training workloads, including custom CUDA kernels, PTX-level optimizations, and cross-platform acceleration for CUDA and AWS Neuron SDK.
Performance Engineering Leadership: Drive 2-10× performance improvements in latency, throughput, and memory efficiency for production ML inference and training systems through systematic profiling, analysis, and optimization.
Cross-Platform Optimization: Develop and port high-performance ML operators across GPUs, AWS Inferentia/Trainium, and emerging AI accelerators, ensuring optimal performance on each platform.
Product-Level Impact: Lead the design, implementation, and delivery of scientifically complex optimization solutions that directly improve customer experience and reduce AWS operational costs at scale.
Scientific Rigor: Produce technical documentation and internal research reports demonstrating the correctness, efficiency, and scalability of your optimizations. Contribute to external publications when aligned with business needs.
Technical Leadership: Influence your team's technical direction and scientific roadmap. Build consensus across engineering and science teams on optimization strategies and architectural decisions.
Mentorship & Knowledge Sharing: Actively mentor junior scientists and engineers on performance engineering best practices, kernel development, and systems-level optimization techniques.
About the team
Deep Science for Systems and Services (DS3) is a science organization within AWS Compute & ML Services focused on advancing AI/ML technologies at the systems level. Our team works at the intersection of machine learning and high-performance computing, developing optimizations for large model inference across diverse hardware platforms. We push the boundaries of what's possible in ML inference performance, working directly with CUDA, AWS Neuron, and other low-level compute abstractions to deliver industry-leading latency, throughput, and cost-performance for AWS customers deploying AI at scale.
About AWS
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Why AWS?
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
AWS values curiosity and connection. Our employee-led and company-sponsored affinity groups promote inclusion and empower our people to take pride in what makes us unique. Our inclusion events foster stronger, more collaborative teams. Our continual innovation is fueled by the bold ideas, fresh perspectives, and passionate voices our teams bring to everything we do.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of your life at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there's nothing we can't achieve in the cloud.
Basic qualifications
- PhD in Computer Science, Computer Engineering, or related technical field, OR Master's degree with 8+ additional years of relevant research/industry experience.
- 5+ years of hands-on experience in performance optimization and systems programming for AI/ML workloads.
- Expert-level proficiency in CUDA programming and GPU architecture, with demonstrated ability to write high-performance custom kernels.
- Proven track record of delivering measurable performance improvements (2× or greater) in production systems.
- Strong C/C++ programming skills with experience in performance profiling tools such as NVIDIA Nsight, Linux Perf, or similar diagnostic frameworks.
Preferred qualifications
- Experience optimizing inference and/or training for large language models (LLMs) and transformer-based architectures, including MoE models, at scale.
- Hands-on experience with the AWS Neuron SDK or other non-NVIDIA AI acceleration platforms.
- Track record of optimizing ML workloads across diverse hardware: embedded devices (ARM Cortex, DSPs, NPUs) and data center GPUs (NVIDIA Ampere/Hopper).
- Experience with low-level optimization techniques including assembly-level tuning (NVIDIA PTX, x86/ARM assembly) and cross-platform kernel porting.
- Experience leading performance optimization initiatives that resulted in significant cost savings or multi-million dollar business impact.
- Proven ability to mentor and train engineers in performance engineering and low-level optimization (5+ team members or workshop instruction).
- Entrepreneurial experience or track record of driving technical vision in startup, co-founder, or product development environments.
Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to know more about how we collect, use and transfer the personal data of our candidates.
m/w/d
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.