Skills

Cloud Native

  • Docker
  • Kubernetes
  • Helm
  • K3s
  • Harbor
  • Grafana
  • Prometheus
  • MinIO
  • MLflow

DevOps

  • Caddy
  • Terraform
  • GitHub Actions
  • Proxmox

Data / ML

  • Apache Spark
  • Apache Parquet
  • Polars / Pandas / DuckDB
  • Postgres / BigQuery
  • Scikit-learn / Spark MLlib
  • DVC
  • H3 / QGIS
  • FastAPI / Pydantic / Pytest

Cloud

  • AWS
  • GCP
  • DigitalOcean
  • Cloudflare

Work Experience (6)

Jan 2023 - Current
Platform engineer
Data Cafe (Thailand)
https://www.datacafethailand.com/
  • Guide teams on how to better collaborate to achieve faster feedback loops and delivery

Oct 2022 - Dec 2023
Head of platform engineering
Baania
https://www.baania.com
  • Research and implement internal data platform v2, with CI/CD covering integration and unit tests, dependencies and code changes

  • Use Terraform with Infracost to assess infrastructure cost and prune unused/underutilized resources, reducing monthly cloud cost by 33% (FinOps)

  • Design and implement end-to-end cloud-native application deployment (Terraform for infra/secrets, Helm for application deployment, GitHub Actions for CI/CD, Kubernetes for container orchestration)

  • Lead a major core product refactoring across ops, engineering and data (data engineering, data science and machine learning engineering). Affected areas: data pipelines, model training pipelines, real-time inference endpoints, database performance optimization, deployment pipelines, local development workflow

  • Lead a cloud migration from AWS to GCP (AWS ECS to GCP GKE)

  • Introduce H3 spatial indexing to improve spatial query performance. Any system can obtain the performance gain, since it is not tied to an execution engine with spatial support

Jan 2022 - Sep 2022
Lead Data Engineer | Machine Learning Engineer | SRE
Baania
https://www.baania.com
  • Enable the backend team to perform auto-deployment to ECS with Terraform and GitHub Actions

  • Manage repo permissions / secrets / webhooks with Terraform

  • Research a BI solution to consolidate fragmented dashboard platforms into a single one

  • Set up Grafana for centralized metrics and logs monitoring

  • Introduce Sourcegraph to help with code search and refactoring

  • Set up GitOps for Terraform to enable collaboration between different teams, in turn reducing configuration drift

  • Architect an end-to-end machine learning project involving reproducible data / model training pipelines, hyper-parameter tuning, and CI/CD for the inference API endpoint (MLOps). A real-time model performance dashboard and tracing via request id are also implemented via Grafana

Jan 2021 - Jan 2022
Senior Data Engineer
Baania
https://www.baania.com
  • Reduce development time for ETL pipelines from a week to 1 day via workflow redesign + codebase refactoring

  • Mentor data engineers

  • Consult other teams as a platform engineer

  • Set up alerts & monitoring to automatically notify on task failures (ChatOps)

  • Create a script to automatically grant Postgres access permissions based on user groups, with an option for special permissions on a per-user basis

  • Optimize a large Spark pipeline that often fails due to OOM, using a divide-and-conquer method for unlimited scaling

  • Reduce runtime for PR code quality checks by 90% to shorten the feedback loop

  • Set up secrets management using SOPS/Terraform/AWS SSM, for improved security and secrets rotation

  • Launch Baania Engineering Blog, a platform to showcase how Baania does things behind the scenes

  • Reduce onboarding time per employee from a few days to 30 minutes via a script that sets up the necessary tools, applications and development environment

  • Optimize a nearby POI lookup in PostGIS against 4 million rows with lochash, reducing query runtime from 10 minutes to 2 seconds and lowering resource requirements

Apr 2018 - Jan 2021
Data Engineer
Baania
https://www.baania.com
  • Create and maintain data-gathering infrastructure for daily ingestion and processing, with results stored in a data lake (S3)

  • Create and optimize machine learning models to achieve near-realtime performance

  • Create and maintain ETL pipelines via a task orchestrator to increase data update frequency from once a month to daily

  • Deploy and maintain ML models via cloud services to reduce ML deployment time from a day to within minutes

  • Mentor data scientists

  • Automate infrastructure and governance using Terraform

  • Package cron services for AWS ECS and invoke them via AWS ECS Task, cutting cost from 50 USD / year to 0.1 USD / year

Jan 2015 - Dec 2018
IT Support & SysAdmin
-

Projects (10)

self-hosted
Dec 2017 - Current
https://github.com/kahnwong/self-hosted/
  • docker
  • kubernetes
  • helm
  • terraform
  • Self-hosted alternatives to popular subscription services: Netflix, Spotify, LastPass, Trello, Dropbox, NordVPN, etc

  • Managed via docker-compose, Helm and Terraform

  • Use Terraform to manage Cloudflare and Kubernetes

  • Use Caddy for reverse-proxy

docs
Jan 2020 - Current
https://docs.karnwong.me/
  • documentation
  • knowledge base
  • vitepress
  • Personal documentation website on various topics

nix
Nov 2022 - Current
https://www.karnwong.me/posts/2022/12/cross-platform-package-env-management-with-nix/
  • Nix
  • environment
  • dotfiles
  • package manager
  • A cross-platform setup script that works on both Linux and macOS

Proxmox VM Selector
Dec 2023 - Dec 2023
https://github.com/kahnwong/proxmox-vm-selector
  • proxmox
  • tui
  • A simple TUI to select which Proxmox VM to start/stop

Calculator
Mar 2024 - Mar 2024
https://calculator.karnwong.me/
  • quasar
  • static site
  • Meeting cost, GKE Autopilot and cloud cost calculator

pgconn
Aug 2023 - Aug 2023
https://github.com/kahnwong/pgconn
  • postgres
  • ssh tunneling
  • cli
  • A pgcli wrapper that connects to a PostgreSQL database specified in db.yaml. A proxy/tunnel connection is created automatically and killed when pgcli exits

Vercel - Multi Branch Deployment
Jun 2023 - Jun 2023
https://github.com/kahnwong/vercel-multi-branch-deployment
  • terraform
  • github actions
  • vercel
  • Use GitHub Actions to deploy a frontend project from different branches (dev, uat, master), each with its own preview environment

Spark on Kubernetes
Sep 2023 - Sep 2023
https://www.karnwong.me/posts/2023/09/spark-on-kubernetes/
  • spark
  • kubernetes
  • minio
  • finops
  • devex
  • Run Spark jobs on Kubernetes, with a setup that can be used both locally and in a production environment

Dataframe Frameworks Showdown
Apr 2023 - Apr 2023
https://www.karnwong.me/posts/2023/04/duckdb-vs-polars-vs-spark/
  • duckdb
  • polars
  • spark
  • dataframe
  • data engineering
  • Benchmark the performance of DuckDB, Polars and Spark. RAM usage is reported in addition to runtime

Impute Pipelines
Nov 2019 - Dec 2019
https://www.karnwong.me/posts/2020/05/impute-pipelines/
  • Machine Learning
  • data science
  • hyperparameter tuning
  • Use machine learning to fill in missing data

  • Use hyperparameter tuning to find the optimal parameters

Volunteer

Jan 2020 - Current
Moderator / Staff & Frequent Contributor
Data Engineering Discord
Jan 2021 - Current
Frequent Contributor
Data Science Discord
Jan 2021 - Current
Frequent Contributor
DevOps, SRE & Infrastructure Discord
Jan 2022 - Current
Frequent Contributor
Data Engineer Cafe

Education (1)

2015 - 2018
Bachelor
Information and Communication Technology
Rangsit University

Certificates

2023
HashiCorp Ambassador 2023
HashiCorp
2023
AWS Certified Solutions Architect – Associate
AWS
2022
Google Cloud Professional Cloud Architect
Google Cloud Platform
2023
AWS Community Builder 2023
AWS
2023
Dagster Essentials
Dagster Labs

Publications

1 Feb 2019

Languages

English

Native or bilingual proficiency

Thai

Native or bilingual proficiency

Interests

Humanities

  • Anthropology
  • Linguistics
  • Psychology
  • Sociology

Music

  • Symphonic Heavy Metal
  • Power Metal
  • Folk Metal
  • Classical Crossover
  • Operatic Pop

Technology

  • Platform Engineering
  • DevX
  • Kubernetes
  • DevOps / DataOps / MLOps
  • Scaling & Optimization