Spark on Kubernetes

Background: For data processing tasks, there are several ways to go about it: using SQL to leverage a database engine for data transformation; dataframe-based frameworks such as pandas, Ray, Dask, and Polars; and big data processing frameworks such as Spark. Check out this article for more info on the Polars vs Spark benchmark. The problem: At larger data scale, other solutions (except Spark) can work, but only with a lot of vertical scaling, which can get very expensive....

September 12, 2023 · 4 min · Karn Wong

Data Engineering Resources

Note: if you’ve seen the list elsewhere, it was probably me. I first posted this list on Data Engineering Discord and Data Engineer Cafe. Books. Data fundamentals (good entrypoint): Fundamentals of Data Engineering (Joe Reis & Matt Housley); Seven Databases in Seven Weeks (Luc Perkins, Eric Redmond & Jim Wilson); Designing Data-Intensive Applications (Martin Kleppmann); The Data Warehouse Toolkit (Ralph Kimball & Margy Ross); Data Science for Business (Foster Provost & Tom Fawcett); Practical Statistics for Data Scientists (Peter Gedeck, Peter Bruce & Andrew Bruce). Software engineering: Python Crash Course (Eric Matthes); The Pragmatic Programmer (Andrew Hunt & David Thomas). Platform: Terraform: Up & Running (Yevgeniy Brikman). Management: Team Topologies (Matthew Skelton & Manuel Pais); Radical Candor (Kim Scott); Data Teams (Jesse Anderson); Practical DataOps (Harvinder Atwal). Resources: https://brendanthompson....

September 9, 2023 · 1 min · Karn Wong

A Networking God Tale: All I Want is to Run a Speedtest Behind a Firewall

Imagine going to your client’s site to deploy software. During the deployment process, you notice that the network speed is atrociously slow. You suspect that your client’s network bandwidth is the issue. To test this theory, you go to a speedtest website and run a test. Turns out you can’t, because it’s blocked at the firewall level. Then you try another speedtest website; oops, that one is blocked too. Then you try a few more, still no dice....

August 27, 2023 · 2 min · Karn Wong

Book Highlights - Build by Tony Fadell

Asshole assholes: They suck at work and everything else. These are the mean, jealous, insecure jerks who you’d avoid at a party, but who inevitably sit immediately next to you at the office. They cannot deliver, are deeply unproductive, so they do everything possible to deflect attention away from themselves. They will lie, craft gossip, and manipulate others to get people off their scent. The only good thing about these assholes is that they’re generally out the door pretty quickly—they can only deflect for so long before people start noticing that they bring zero value....

July 6, 2023 · 4 min · Karn Wong

Hassle-free Kubernetes monitoring with Coroot

Successfully deploying services is not the end; maintenance is coming to town! To see how their systems behave, people usually rely on SaaS products like Datadog or New Relic to do the heavy lifting, which also costs a lot of $$$. With SaaS like these, you usually have to configure your application to forward metrics/logs to your monitoring provider, which could mean a few months of engineering man-days....

June 9, 2023 · 2 min · Karn Wong