
Hi there, I'm Jiaru.
Also known as Claire.
5+ years designing infrastructure for analytics, product, and AI teams: real-time streaming, cloud migration, Medallion architecture, governance, MLOps, and now productionizing LLM pipelines.
I build pipelines that move, transform, and unlock data at scale. From raw event streams to production ML systems, I care about infrastructure that's fast, trusted, and ready for whatever comes next.
Experience
Where I've been
A timeline of the roles that shaped my engineering and analytical thinking.
GoGuardian
Los Angeles, CA
Data Engineer II
Feb 2023 – Present
- Led a platform-wide migration from AWS ETL to a Databricks Lakehouse, delivering ~$400K in annual cost savings
- Designed a scalable browsing data pipeline with Spark and Python, processing 60M–800M records/day
- Built a streaming pipeline with AWS Kinesis, S3, and Spark to bring app click-event data into the Lakehouse
- Built reverse ETL integrations with the HubSpot and Salesforce APIs to enable reliable data syncs and power automated business workflows
- Improved data quality with dbt models following the Medallion architecture, reducing duplicate reporting by 40%
- Productionized 5+ ML/LLM pipelines and implemented PII governance with Unity Catalog
- Mentored junior engineers and led cross-team data platform initiatives
Databricks · Spark Streaming · Delta Lake · dbt · MLOps · Unity Catalog · Kinesis Firehose · MongoDB
Data Engineer I
Apr 2021 – Feb 2023
- Built ETL pipelines ingesting data from 30+ sources into an AWS S3 data lake and Redshift warehouse
- Set up Airflow from scratch, including custom operators, hooks, and DAGs across the full stack
- Automated infrastructure provisioning with Terraform across multiple AWS environments
- Designed customer usage reporting models, cutting CSM query time by 75–80%
AWS Batch · AWS Glue · S3 · Redshift · Airflow · Terraform
Kroll Bond Rating Agency
New York, NY
Data Engineer Intern
Oct 2020 – Jan 2021
- Built CNN models in TensorFlow for anomaly detection in financial time series, on datasets of 50+ GiB
- Developed Python packages with a full GitLab CI/CD pipeline: unit tests, static checks, and automated publishing
- Containerized workloads with Docker and deployed them across multiple Terraform-managed environments
- Presented findings to 50+ engineers; recognized by leadership for pioneering engineering solutions
Python · TensorFlow · Pandas · Docker · GitLab CI/CD · Terraform
Regatta Craft Mixers
New York, NY
Student Consultant
Jun – Jul 2020
- Researched 8 major competitors and market trends in the craft mixer space
- Cleaned a full year of Facebook social media data in Python and visualized patterns in Tableau
- Delivered a tiered marketing strategy for grocery store market entry
Python · Tableau
Emerson
Saint Louis, MO
Student Consultant
Jan – May 2020
- Assessed data utilization across Emerson's $6B+ Commercial and Residential Solutions business unit
- Standardized and prioritized marketing KPIs through interviews with Marketing and IT leaders
- Designed data gap frameworks that reduced recurring reporting work by ~20%
- Delivered a concrete implementation roadmap for the marketing team
Data Analysis · KPI Design
Education
Academic Background
Certifications
Credentials & Courses
Cloud & Data Engineering
Databricks
Databricks Certified Data Engineer Associate
Jun 2024
Amazon Web Services
AWS Certified Cloud Practitioner
Apr 2026
Udemy
The Complete Hands-On Introduction to Apache Airflow
Jun 2021
AI & Machine Learning
DAIR.AI
Advanced AI Agents
Feb 2026
DAIR.AI
Introduction to RAG
Feb 2026
DAIR.AI
Prompt Engineering For Developers
Jan 2026
Coursera
Introduction to TensorFlow for AI, ML, and Deep Learning
May 2020
Analytics & Visualization
Udemy
Hands-On Tableau Training For Data Science
May 2020
Let's Connect
Say hello.
Always happy to talk about data engineering challenges, architecture decisions, AI systems, or just life.



