Karn Wong, Lead Data Engineer | Machine Learning Engineer | SRE
Bangkholam, Bangkok, TH
EDUCATION
Rangsit University Aug 1, 2015 - May 1, 2018
Bachelor - Information and Communication Technology
SKILLS
Data Engineering: Data lake, Data modeling, Task orchestrator, Dagster, Spark
Data Science & Machine Learning: Pandas, scikit-learn, Matplotlib, Seaborn, Spark MLlib
Cloud: AWS, DigitalOcean
DevOps/SRE: Docker, GitHub Actions, Terraform
Other Skills: Linguistics, NLP, GIS, Web Scraping, Linux
EXPERIENCE
Baania | Lead Data Engineer | Machine Learning Engineer | SRE Jan 10, 2022 - Present
  • Enable backend team to perform auto-deployment to ECS with Terraform and GitHub Actions
  • Manage repo permissions / secrets / webhooks with Terraform
  • Research BI solution to consolidate fragmented dashboard platforms into a single one
  • Set up Grafana for centralized metrics and logs monitoring
Baania | Senior Data Engineer Jan 29, 2021 - Jan 10, 2022
  • Reduce development time for ETL pipelines from one week to one day via workflow redesign and codebase refactoring
  • Mentor data engineers
  • Consult other teams as a platform engineer
  • Set up alerts and monitoring to automatically send notifications of task failures (ChatOps)
  • Create a script to automatically grant PostgreSQL access permissions based on user groups, with an option for special permissions on a per-user basis
  • Optimize a large Spark pipeline that often failed due to OOM errors, using a divide-and-conquer method so that it scales with data volume
  • Reduce runtime for PR code quality checks by 90% to shorten the feedback loop
  • Set up secrets management using SOPS/Terraform/AWS SSM, for improved security and secrets rotation
  • Launch Baania Engineering Blog, a platform to showcase how Baania does things behind the scenes
  • Reduce onboarding time per employee by a few days via a setup script that installs the necessary tools, applications and development environment
Baania | Data Engineer Apr 2, 2018 - Jan 29, 2021
  • Create and maintain data gathering infrastructure for daily ingestion and processing to be stored in data lake (S3)
  • Create and optimize machine learning models to achieve near-realtime performance
  • Create and maintain ETL pipelines via task orchestrator to increase data update frequency from once a month to daily
  • Deploy and maintain ML models via cloud services to reduce ML deployment time from a day to within minutes
  • Mentor data scientists
  • Automate infrastructure and governance using Terraform
  • Package cron services for AWS ECS and invoke them via AWS ECS Task to cut cost from 50 USD / year to 0.1 USD / year
PUBLICATIONS
Using Classification Technique for Customer Relationship Management based on Thai Social Media Data Feb 1, 2019
ICCAE
PROJECTS
setup-new-computer-script Mar 21, 2022 - Aug 13, 2022
https://github.com/kahnwong/setup-new-computer-script
  • Ansible playbook to set up a new Mac with required software and configurations.
terraform-sops-ssm Nov 24, 2021 - Nov 30, 2021
https://github.com/kahnwong/terraform-sops-ssm
  • Create SSM secrets from SOPS-encrypted secrets.
  • Create github-ci+lambda roles and users with access to SSM.
Self-hosting Dec 1, 2017 - Present
https://github.com/kahnwong/self-hosted/
  • Self-hosted alternatives to popular subscription services: Netflix, Spotify, LastPass, Trello, Dropbox, NordVPN, etc.
  • Managed via docker-compose
  • Use Caddy as a reverse proxy
  • Use Terraform to manage DNS via Cloudflare
Impute Pipelines Nov 28, 2019 - Dec 23, 2019
https://www.karnwong.me/posts/impute-pipelines/
  • Use machine learning to fill in missing data
  • Utilize hyperparameter tuning to find the optimum parameters
LANGUAGES
English (Native speaker), Thai (Native speaker)
INTERESTS
Music [Symphonic Heavy Metal, Folk Metal], Tea [Jin Xuan oolong, Pu'er, Peppermint], Video Games [FPS, Simulation, Action], Books [History, Nordic Noir, Medicine, Linguistics]