Rakesh Raj Gopala Sai Krishnan

Parramatta, Australia

Summary

Senior Data Architect at Pending AI specializing in high-performance cloud infrastructure and machine learning. Expert in Kubernetes and data architecture, with experience engineering petabyte-scale storage systems and leading a trillion-vector similarity search, backed by strong leadership and problem-solving skills.

Overview

4 years of professional experience

Work History

Senior Data Architect

Pending AI
Sydney, Australia
06.2024 - Current
  • Architected and Deployed Self-Managed, High-Performance Infrastructure: Designed and implemented a fully self-managed compute, storage, and network infrastructure optimized for processing and storing massive chemical/biological datasets, supporting cutting-edge machine learning and quantum research initiatives.
  • Engineered Petabyte-Scale, Ultra-High-Throughput Storage: Architected and deployed a high-performance Ceph storage system on 100G networking and NVMe disks, achieving 3 million IOPS and over 50 GiB/s of network throughput while managing 100+ million objects.
  • Pioneered Trillion-Vector Similarity Search: Led the implementation of the world’s first trillion-scale vector similarity batch search using Milvus, establishing a benchmark for large-scale AI/ML infrastructure and achieving one of the largest known searches in the field.
  • Architected Infrastructure for Massive Generative AI Inference: Designed and deployed a throughput-optimized inference stack for Generative AI LLMs using NVIDIA Multi-Instance GPU (MIG) partitioning, generating 80 billion tokens per day.
  • Developed Parallel Processing Pipelines: Architected multi-tenant ETL pipelines using Kubeflow and Istio, processing 1000+ concurrent jobs with built-in fault tolerance, ensuring continuous progress, eliminating single points of failure, and preventing cascading failures across the infrastructure.
  • Orchestrated High-Uptime Clusters: Administered Kubernetes and Ceph clusters, achieving 500+ days of uptime for mission-critical workloads.
  • Optimized Databases for Reliability: Managed MongoDB and Milvus with continuous uptime, ensuring peak performance, reliability, and security.
  • Built Comprehensive Monitoring and Alerting Systems: Implemented Grafana, Prometheus, and OpsGenie for real-time monitoring and alerting, ensuring proactive issue detection and resolution across all systems.
  • Implemented Load Balancing and High Availability: Utilized Nginx and Keepalived to implement load balancing and virtual IPs, ensuring high availability for all internal services deployed across multiple nodes.
  • Secured Infrastructure with HashiCorp Vault: Deployed a highly available, auto-unsealing HashiCorp Vault for robust authentication, authorization, and encryption.
  • Ensured Compliance & Business Continuity: Contributed to ISO & SOC2 compliance, designed backup strategies using MinIO, and conducted regular disaster recovery exercises.
  • Designed High-Volume Data Ingestion Pipelines: Architected and implemented pipelines capable of ingesting and processing 20+ TiB of data daily.
  • Global Cross-Functional Collaboration: Worked closely with research and business teams across the US, Europe, and Australia to align infrastructure with strategic goals.

Data Scientist

Pending AI
Sydney, Australia
06.2021 - 06.2024
  • Optimized ML Inference Performance: Deployed ML inference containers (TensorFlow Serving, NVIDIA Triton) to reduce latency, improve responsiveness, and optimize GPU resource usage for concurrent and batch requests.
  • Implemented Model Deployment Strategies: Utilized Istio service mesh to implement canary deployments, facilitating controlled, progressive model rollouts with traffic splitting, reducing deployment risks, and ensuring smooth, reliable transitions.
  • Engineered Scalable Infrastructure: Designed and maintained hybrid infrastructure across on-premise and cloud (AWS, GCP), focusing on security, low latency, and high availability.
  • Developed Data Science Pipelines with Kubernetes and Kubeflow: Built scalable pipelines for efficient model deployment and lifecycle management, supporting future infrastructure growth.
  • Deep Learning for High-Dimensional, Imbalanced Classification Problems: Engineered and deployed a suite of deep LSTM neural networks to tackle a diverse range of classification challenges, from binary to extreme multi-class problems with over 400,000 predictors. Successfully addressed significant class imbalances, resulting in substantial improvements in model accuracy and predictive performance.
  • Enhanced Model Reliability: Led model updates and monitoring, addressing drift and improving prediction accuracy to deliver high-quality, data-driven solutions.

Junior Data Scientist

Pending AI
Sydney, Australia
10.2020 - 06.2021
  • Modernised Legacy Codebase: Led migration from Python 2.7 to 3.7, improving performance and compatibility.
  • Applied Feature Engineering: Performed exploratory data analysis to uncover insights from datasets and applied feature selection algorithms to improve data representation.
  • Enhanced Model Performance: Re-architected deep learning models from Keras/Theano to Keras/TensorFlow, implementing multi-GPU and mixed precision training to reduce training time and improve efficiency.
  • Developed and Deployed Applications: Built and validated Python Flask and FastAPI endpoints for data integrity, and used Docker for consistent deployment and runtime environments.

Education

Master of Data Science

Swinburne University of Technology
Melbourne, Australia
06.2021

Bachelor of Engineering - Computer Science

Anna University
Chennai, India
08.2018

Skills

  • AWS
  • GCP
  • Bare metal
  • Kubernetes
  • EKS
  • GKE
  • Docker
  • Containerd
  • Ansible
  • Terraform
  • Python
  • JavaScript
  • YAML
  • CI/CD Pipelines
  • Argo Workflows
  • ArgoCD
  • Kubeflow
  • Vertex AI
  • Istio
  • Linkerd
  • Ceph
  • MinIO
  • HashiCorp Vault
  • Prometheus
  • Grafana
  • Linux
  • Ubuntu
  • Rocky
  • RHEL
  • SQL
  • MySQL
  • Postgres
  • NoSQL
  • MongoDB
  • Redis
  • VectorDB
  • Milvus
  • Dataframes
  • Pandas
  • Polars
  • Teleport
  • Git
  • Data architecture
  • Machine learning
  • Cloud infrastructure
  • ETL development
  • Capacity planning
  • Team leadership

Awards

  • Academic Excellence in Secondary Education: Ranked 3rd in the Higher Secondary Board Examination with a score of 97.6%, achieving top scores in Tamil (Regional Language), Science, Mathematics, and Social Studies.
  • Outstanding Academic Performance in Master of Data Science: Achieved top student ranking in both Semester 1 and Semester 2 of the Master of Data Science program at Swinburne University of Technology.

Languages

English
Proficient (C2)
Tamil
Proficient (C2)

References

References available upon request.
