Karn Wong

Senior Data Engineer

Experience

Baania

01/2021 - Present

Senior Data Engineer

  • Reduce development time for ETL pipelines from one week to one day through workflow redesign and codebase refactoring
  • Mentor junior data engineers
  • Consult other teams as a platform engineer
  • Set up alerts and monitoring to send automatic notifications on task failures
  • Create a script to automatically grant PostgreSQL access permissions based on user groups, with an option for special permissions on a per-user basis
  • Deploy internal services via ECS, routed to the public internet via ELB and Route 53
  • Deploy an internal data catalog for the data platform
  • Perform internal infrastructure and cloud cost audits

Baania

04/2018 - 01/2021

Data Engineer

  • Create and maintain data-gathering infrastructure for daily ingestion and processing, with results stored in a data lake (S3)
  • Create and optimize machine learning models to achieve near-real-time performance
  • Create and maintain ETL pipelines via Dagster to increase data update frequency from once a month to daily
  • Deploy and maintain ML models via cloud services to reduce ML deployment time from a day to within minutes
  • Mentor junior data scientists
  • Infrastructure planning
  • Automate infrastructure and governance using Terraform
  • Package cron services for AWS ECS and invoke them as AWS ECS tasks, cutting cost from 50 USD/year to 0.1 USD/year

Open Source Projects

API, Documentation
  • Convert Pydantic definitions from multiple sources into a single Swagger document
CLI
  • Download podcasts, with options for the last n episodes or an explicit range
Docker, CalDAV
  • Convert todo.txt entries to all-day calendar events
  • Support both local files and WebDAV
  • Include Docker support with a cron-like feature
Docker, Reverse-proxy, DevOps
  • Hosted on a laptop from 2015 with 16GB RAM
  • Self-hosted alternatives to popular subscription services: Netflix, Spotify, LastPass, Trello, Dropbox, NordVPN, etc.
  • Managed via docker-compose
  • Use Caddy as the reverse proxy
google-api, dashboard, calendar, docker
  • Lightweight, requiring only Python and a web server
  • Webpage refreshes every 30 minutes
data science, gis, visualization, nlp
  • Use NLP to group region-name prefixes and suffixes
  • GIS visualization
machine learning, data science, nlp, visualization
  • Visualize lyric trends using NLP
  • Use topic modeling to find common words in each specified cluster
machine learning, data science, hyperparameter tuning
  • Use machine learning to fill in missing data
  • Use hyperparameter tuning to find the optimal parameters

Education

Rangsit University

08/2015 - 05/2018

Bachelor of Information and Communication Technology