Ayman Mahfuz

Computer Science & Mathematics Student | Aspiring Software Engineer & Machine Learning Researcher

About Me

I'm a curious engineer who loves turning bold AI ideas into products and papers that matter. At UT Austin, I spend my days debugging code, training models, and occasionally teaching robots to score goals—always chasing problems that blend rigorous research with hands-on engineering.

  • Patent-pending ML validation at Arm: Bayesian optimization that uncovers worst-case CPU & memory stress in <1% of the search space.
  • RoboCup 2025 bronze: Multi-agent reinforcement-learning skills powering UT Austin Villa’s 7v7 robot soccer team.
  • 200M-entry media pipeline: Built data/ML stack & fine-tuned BERT models (99%+ accuracy) for UT’s Center for Media Engagement.
  • Pancreas MRI segmentation on H100s using computer vision and transformers: Engineered 3D medical-image pipeline, achieving +12% Dice at the Oden Institute.
  • Youngest speaker, UT AI Health Symposium: Presented research on multi-agent LLM clinical reasoning.

Education

The University of Texas at Austin

Bachelor of Science

Location: Austin, TX, USA

Double Major: Computer Science, Mathematics

Minor: Business

Concentration: Artificial Intelligence and Machine Learning

Relevant Coursework:

  • Science of High Performance Computing
  • Data Structures
  • Computer Architecture and Organization
  • Computer Systems and Operating Systems
  • Algorithms
  • Linear Algebra
  • Probability

Skills

Programming Languages

Python Java C JavaScript HTML/CSS Ruby C++ PHP

Frontend Development

React.js Node.js HTML/CSS

Backend Development

Flask Django Node.js

Data Science & Machine Learning

Pandas NumPy Scikit-learn

Databases

SQL PostgreSQL

Tools & Libraries

Git AWS Google Cloud Platform

Miscellaneous

ARM64 MATLAB

📄 Resume

SWE Resume

Software Engineering (Full-Stack, Systems)

ML Research Resume

Machine Learning & Research positions

Tip: Choose based on the role type you're applying for

Current Roles

Where I'm making impact today

Arm

Platform Validation Intern → ML Research Engineer (Part-Time) | Austin, TX • May 2025 – Present

At Arm, I independently proposed and built an ML-guided framework to solve a 20-year-old challenge in hardware validation: automatically finding worst-case stress tests. My system, now deployed internally and pending a patent, uses Bayesian Optimization to intelligently discover configurations that push CPUs and memory to their absolute limits, proactively identifying hardware bottlenecks for the next-generation platforms essential for hyperscale AI.

  • Invented a novel dual-surrogate Bayesian Optimization pipeline using Random Forests to navigate the vast, non-linear search space of hardware parameters, a design uniquely suited to the noisy and expensive-to-acquire data from performance counters.
  • Achieved 99.8th-percentile hardware stress—far beyond manual or random attempts—by intelligently exploring less than 1% of the configuration space. This work automated over 10,000 hours of validation testing and located the true performance limits of the hardware.
  • Drove the project autonomously from concept to production, earning executive-level (SVP) recognition and a pending patent. The framework is now a key part of Arm's validation strategy, ensuring the reliability of hardware that will power future large-scale AI systems.

University of Texas – AI Lab, Texas Robotics

Research Assistant | Austin, TX • Jan 2025 – Present

As a core member of UT's robotics team, I developed the AI-driven agent skills that helped secure third place at the international RoboCup 2025 competition. I championed a reinforcement learning approach in a 400K-line C++ codebase, building the policies for walking, dribbling, and attacking that were critical to our hybrid system's success in the 7v7 Standard Platform League.

  • Designed and trained the foundational agent skills using hierarchical reinforcement learning, creating RL-based policies for all attacker behaviors that proved more robust and effective than classical methods.
  • Accelerated the team's research cycle by increasing RL training speed by over 200% (a ~70% reduction in training time) through aggressive GPU optimization and deep C++ simulator tuning.
  • Pushed the boundaries of multi-agent RL by pioneering curriculum learning strategies and novel reward shaping (e.g., Pitch Control, xG), providing the research groundwork for the team's future goal of a fully autonomous, learning-based system.

Industry Progression

Building production systems & scaling impact

The Sunwater Institute

Data Engineer Intern | North Bethesda, MD • Jan 2025 – Present

I work on the Legis-1 platform, building high-performance data pipelines that support AI-driven policy research. The pipelines process legislative documents at scale for structured retrieval and automated analysis, and feed LLM workflows that generate news and policy insights from 500K+ legal records.

  • Building high-performance data pipelines for Legis-1, a legislative database with millions of legal documents, optimizing retrieval speed, storage efficiency, and AI-readiness for structured data.
  • Developing LLM pipelines to power AI-generated news and policy analysis, leveraging retrieval-augmented generation (RAG), embeddings, and scalable document processing across 500K+ records.

Lockheed Martin

Software Engineer Intern | Remote • Jun 2022 – Oct 2022

At Lockheed Martin, I optimized CRM workflows, introduced RPA solutions, and cleaned up their Configuration Database to boost operational efficiency.

  • Spearheaded the development and optimization of Customer Relationship Management (CRM) workflows at Lockheed Martin, achieving a centralized device data framework that enhanced enterprise operational efficiency.
  • Engineered advanced CRM solutions by integrating JavaScript for flow enhancements and implementing Robotic Process Automation (RPA), streamlining the data de-duplication process and elevating data integrity.
  • Administered and refined the Configuration Database, successfully purging redundant records and bolstering data accuracy. Synthesized and presented data-driven insights to executives, highlighting the tangible impact on operational efficiency and guiding strategic decisions.

City of Austin

Software Engineer Intern | Austin, TX • Jun 2021 – Aug 2021

I helped Austin's post-COVID recovery by improving loan processing workflows for small businesses. My work included Python scripting and data visualization to streamline operations.

AT&T

Summer Learning Academy | Austin, TX • Jun 2021 – Aug 2021

As the youngest participant, I gained exposure to AI, business strategies, and professional development while collaborating on tech-focused initiatives with industry leaders.

Research Portfolio

Deep technical projects across AI & ML

University of Texas – Center for Media Engagement

Software Engineer, Research Assistant | Austin, TX • Sep 2023 – Present

I study how people interact with news, platforms, and each other, designing systems and applying machine learning to evaluate political opinions, storytelling patterns, and societal divides. My contributions include building a 150-million-entry dataset, developing DistilBERT models for large-scale analysis, and designing the full system architecture for MTurk-integrated React games that track and analyze user behavior.

  • Engineered large-scale data pipelines to scrape, preprocess, and upload 50M+ news articles and 70M+ comments to BigQuery, using APIs, sitemaps, and Pandas, while developing dashboards with Python and SQL for real-time monitoring.
  • Led machine learning initiatives by fine-tuning multiple BERT models for key NLP tasks—including clickbait detection, story identification, entity recognition, and sentiment analysis—achieving up to 99% accuracy.
  • Designed & deployed a research platform independently with React, Flask, and Firebase, featuring 3 interactive games, MTurk integration, real-time analytics tracking 15+ metrics, and 99.99% uptime, serving 1,000+ participants.

University of Texas – Oden Institute for Computational Engineering and Sciences

Machine Learning Engineer, Research Assistant | Austin, TX • Feb 2024 – Jan 2025

At the Oden Institute, I led efforts to build a scalable deep learning pipeline for medical image segmentation, training across H100 GPUs and handling over a thousand MRI scans. To improve both model performance and engineering reliability at HPC scale, I worked hands-on with transformers, CNNs, and hybrid architectures to advance pancreas segmentation research.

  • Engineered a high-throughput, containerized pipeline for 3D pancreas MRI segmentation on TACC's H100 supercomputer using Apptainer, SLURM, CNNs, and transformers; achieved a +12% Dice gain, matching SOTA segmentation performance.
  • Benchmarked CNNs, vision transformers, and hybrid models across 1000+ scans with 5-fold cross-validation, finding hybrids (e.g., PanSegNet) excel at small-organ segmentation, consistent with medical imaging literature.
  • Diagnosed and resolved GPU memory bottlenecks, I/O lag, and mixed precision instability to enable stable, scalable training across large-volume datasets and architectures.
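
For readers unfamiliar with the metric behind the "+12% Dice" figure: the Dice coefficient measures the overlap between a predicted mask and a reference mask, from 0 (no overlap) to 1 (identical). The sketch below is illustrative only and is not the pipeline's code. The function name, the flat 0/1 mask representation, and the toy values are all assumptions made for the example.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 8-voxel example (made-up values, not data from the project).
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 0, 1, 1]
score = dice_coefficient(predicted, reference)
print(f"Dice = {score:.2f}")
```

A real 3D pipeline would typically compute this per volume on array data and average across scans; the flat-list version only shows the formula.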

University of Texas – School of Information

Research Assistant | Austin, TX • Feb 2024 – Jan 2025

I designed and led a research project studying the consistency and reasoning behaviors of multiagent large language models (LLMs) in medical diagnosis, working at the intersection of AI, healthcare, and system design. Alongside building the framework itself, I explored how factors like demographics and misleading symptoms influenced multiagent collaboration, aiming to push toward more reliable AI-driven clinical reasoning.

  • My project is on page 124 of this report.
  • Explored multiagent LLM reasoning paradigms, introducing contextual variations such as patient demographics, symptom modifications, and misleading details to rigorously test consistency in diagnostic reasoning. Developed methods to systematically analyze the influence of inter-agent communication on diagnostic reliability, uncovering patterns in how agents resolve conflicts and refine responses over iterative interactions.
  • Conducted advanced evaluations of multiagent LLM performance using machine learning-driven analyses and statistical frameworks. Applied metrics such as Cohen's Kappa, Chi-square tests, and logistic regression to assess agreement, accuracy, and bias across agents. Enhanced insights through correlation analyses, identifying key factors affecting reasoning consistency and robustness.
  • Presented my independently designed research project at the UT AI Health Conference, standing among the youngest presenters at the event. Shared findings on multiagent LLM diagnostic consistency through a technical poster, synthesizing engineering, experimentation, and statistical evaluation.
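
As background on one of the agreement metrics named above, Cohen's Kappa compares how often two raters agree against how often they would agree by chance. The sketch below is a minimal pure-Python illustration, not the project's analysis code. The function name, the agent labels, and the diagnoses are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout (assumed convention).
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses from two LLM agents on six cases (made-up data).
agent_a = ["flu", "flu", "cold", "flu", "cold", "cold"]
agent_b = ["flu", "cold", "cold", "flu", "cold", "flu"]
kappa = cohens_kappa(agent_a, agent_b)
print(f"kappa = {kappa:.3f}")
```

Here the agents agree on 4 of 6 cases while chance agreement is 0.5, so kappa comes out to about 0.333, well below the raw 67% agreement rate.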

University of Maryland – College Park

Software Engineering and Research Intern | Remote • Jun 2023 – Present

I helped build an NLP-based chatbot to engage news readers and analyzed linguistic patterns to improve the interaction. My contributions focused on Python-based analysis and visualizations for the paper published at CHI 2024.

  • Project: Towards Designing a Question-Answering Chatbot for Online News
  • Led the development of an NLP-driven chatbot to augment online news reader engagement, employing deep learning and AI techniques. Directed comprehensive studies and analyses, culminating in findings on human-chatbot interaction dynamics.
  • Executed sophisticated text analytics and data labeling using Python, encompassing Parts of Speech Tagging, LIWC, and clustering on sentence embeddings, to derive intricate linguistic patterns and insights.
  • Collaborated with a cross-disciplinary team of professors and graduate students, driving content creation and ensuring methodological precision. Contributed core analytical insights and visualizations to the research paper published at the CHI 2024 conference.
  • Authored Python scripts for in-depth data analysis, generating insightful graphs and visualizations that formed the backbone of the research findings, illustrating complex human-chatbot interaction patterns.

Projects

Inkwell: YouTube for Books

A dynamic book-sharing platform that allows users to explore and share books freely while empowering authors to earn more by bypassing traditional publishers.

Tech Stack: React, PostgreSQL, Django, AWS S3, Django Rest Framework, Vercel

Click the title to visit the live site.

InReach AI

An intelligent networking tool that helps users find professionals who align with their career interests, then automatically drafts and sends emails to them based on the user's resume, their intent, and the receiver's experience. It currently has 200+ users.

Tech Stack: Flask, React, Tailwind, OpenAI API, Vercel, Render

Click the title to visit the live site.

LeetCode Matchmaker

A web application that helps users discover LeetCode problems similar to a given one, aiding in interview preparation.

Tech Stack: React, Flask, Scikit-learn, PostgreSQL

Click the title to visit the live site.

CodeXray

CodeXray is an interactive, graph-powered codebase explorer that lets developers quickly understand large monorepos without reading thousands of lines first. By combining static analysis with GPT-4o, it visualizes architecture, highlights complexity hotspots, and answers deep code questions like “what breaks if I touch this?” right inside your IDE or browser.

Tech Stack: React, Tailwind, OpenAI API, Vercel, Render, AST parsing

Click the title to visit the live site.

System Emulator (C)

A low-level system emulator capable of simulating basic operations like instruction execution, memory management, and I/O handling.

Tech Stack: C

Pintos Operating System

Developed an operating system kernel, implementing core functionalities such as thread scheduling, synchronization, and virtual memory management.

Tech Stack: C

Hobbies & Interests

Weightlifting

I love spending time in the gym and hitting new maxes.

Soccer

I've been playing soccer since I could walk. If I'm not working, you can find me on the nearest field!

Family and Friends

I cherish spending quality time with friends and family, whether it's a casual hangout or a special gathering.

Startups

I spend a lot of time working on my startups. Beyond the coding itself, that means asking people I'm close with for advice and feedback on my current projects.