Run GitHub Actions faster with cache for pipenv and docker build

Update 2021-11-29: Recently we've been creating more PRs, and I noticed a lot of redundant steps (environment setup before triggering checks, etc.). It turns out you can cache steps in GitHub Actions, so I did some research. I got it working, and it cut Actions time by at least 60% for a large Docker image build (since only the later RUN directives change frequently). For pipenv, it shaved off 1 minute 18 seconds....
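Caching pays off when the cache is keyed on something that only changes when the dependencies change. A minimal sketch, assuming a Linux runner with GNU coreutils (`sha256sum`): the hypothetical `pipenv_cache_key` function derives a key from the hash of `Pipfile.lock`, so the key, and therefore the cache, is only invalidated when the lockfile changes.

```shell
# Hypothetical helper: derive a cache key from the contents of Pipfile.lock.
# The key stays the same until the lockfile changes, so an unchanged set of
# dependencies can be restored from cache instead of being reinstalled.
pipenv_cache_key() {
  local lockfile="${1:-Pipfile.lock}"
  if [ ! -f "$lockfile" ]; then
    echo "missing lockfile: $lockfile" >&2
    return 1
  fi
  local digest
  # sha256sum prints "<hash>  <filename>"; keep only the hash
  digest=$(sha256sum "$lockfile" | cut -d' ' -f1)
  echo "pipenv-$(uname -s)-${digest}"
}
```

In a workflow file, the same pattern is usually written with the `actions/cache` action and a key built from `hashFiles('Pipfile.lock')`; the function above only makes the keying logic visible.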

November 9, 2021 · 1 min · Karn Wong

ecs-cli snippets

ecs-cli configure profile \
  --access-key "$KEY" \
  --secret-key "$SECRET" \
  --profile-name "$PROFILE"

### launch mode: fargate
ecs-cli configure \
  --cluster "$CLUSTER" \
  --default-launch-type FARGATE \
  --config-name "$NAME" \
  --region ap-southeast-1

ecs-cli up \
  --cluster-config "$NAME" \
  --vpc "$VPCID" \
  --subnets "$SUBNETID1,$SUBNETID2"

### launch mode: ec2
ecs-cli configure \
  --cluster "$CLUSTER" \
  --region ap-southeast-1 \
  --default-launch-type EC2 \
  --config-name "$NAME"

ecs-cli up --keypair "$KEYPAIR" \
  --extra-user-data userData....
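`ecs-cli up --subnets` takes the subnet IDs as a single comma-separated value, so a space after the comma splits it into two arguments. A small sketch of a hypothetical `join_subnets` helper that builds that value from any number of IDs:

```shell
# Hypothetical helper: join subnet IDs with commas (no spaces), the form
# expected by `ecs-cli up --subnets`.
join_subnets() {
  # "$*" joins all arguments using the first character of IFS
  local IFS=,
  echo "$*"
}
```

Usage, with the same variables as the snippet above: `ecs-cli up --cluster-config "$NAME" --vpc "$VPCID" --subnets "$(join_subnets "$SUBNETID1" "$SUBNETID2")"`.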

October 8, 2021 · 1 min · Karn Wong

Self-hosting primer

Self-hosting is the practice of running and managing websites or services on your own server. Some people do this because they are concerned about their privacy, and some services are free if you host them yourself. Below are instructions for self-hosting (they also apply to hosting your own website). Requirements: a domain name and a server (your own computer at home or a VPS). Instructions: set up and secure the server (set a password, disable password login so that you can only log in via SSH key, etc....
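Disabling password login comes down to the OpenSSH `PasswordAuthentication` setting. A minimal sketch, assuming a Linux server with OpenSSH and GNU sed; the `disable_password_login` function is hypothetical and takes the config path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper (assumes GNU sed): set "PasswordAuthentication no" in an
# sshd config file, rewriting any existing (possibly commented-out) line,
# or appending the setting if it is absent.
disable_password_login() {
  local config="${1:-/etc/ssh/sshd_config}"
  local pattern='^[[:space:]]*#?[[:space:]]*PasswordAuthentication[[:space:]]'
  if grep -Eq "$pattern" "$config"; then
    # Replace every matching line with the explicit "no" setting
    sed -i -E "s/${pattern}.*/PasswordAuthentication no/" "$config"
  else
    echo 'PasswordAuthentication no' >> "$config"
  fi
}
```

After editing, confirm that key-based login works from a second session before closing the current one, then reload the SSH service (commonly `ssh` on Debian/Ubuntu, `sshd` elsewhere). Files pulled in via an `Include` directive (for example under `/etc/ssh/sshd_config.d/`) can override this value, so checking the effective setting with `sshd -T` is worthwhile.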

August 22, 2021 · 2 min · Karn Wong

Python venv management

When you create a project in Python, you should create a requirements.txt to specify dependencies, so other people can have the same environment when using your project. However, if you don't specify module versions in requirements.txt, people could end up using the wrong module version, where some APIs may be deprecated or behave differently from older versions. Another issue is that you may be working on several Python projects, each using a different Python version (e.g....
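To catch the unpinned-version problem described above, a requirements file can be scanned for entries that are not pinned with `==`. A minimal sketch; `find_unpinned` is a hypothetical helper, and it simply skips blank lines, comments, and pip option lines such as `-r other.txt`.

```shell
# Hypothetical helper: print requirement lines that are not pinned to an
# exact version with "==". Blank lines, comments and lines starting with
# "-" (pip options such as "-r other.txt") are ignored.
find_unpinned() {
  local file="${1:-requirements.txt}"
  grep -Ev -e '^[[:space:]]*#' -e '^[[:space:]]*-' -e '^[[:space:]]*$' "$file" \
    | grep -v '==' || true
}
```

Running `pip freeze > requirements.txt` inside the project's virtual environment writes `name==version` entries for the installed packages, which this check would then pass.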

July 2, 2021 · 3 min · Karn Wong

Don't write large table to postgres with pandas

We have a few tables larger than 3 GB (in Parquet, so around 10 GB uncompressed). Loading one into Postgres takes an hour. (Most of our tables are fairly small, which is why we don't use a columnar database.) I wanted to find out whether there's a faster way. The conclusion is that writing to Postgres with Spark seems to be the fastest, given that we can't use COPY: our data contains free text, which would make CSV parsing impossible....
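Free text is troublesome for delimited formats because a value can contain the delimiter itself. A minimal illustration with a hypothetical `count_fields_naive` helper that splits on commas the way a naive parser would:

```shell
# Hypothetical helper: count comma-separated fields per line with a naive
# split (no quoting rules), as a simple delimited-text reader would.
count_fields_naive() {
  awk -F, '{ print NF }'
}
```

`printf '1,plain text\n' | count_fields_naive` reports 2 fields, while `printf '2,hello, world\n' | count_fields_naive` reports 3, even though both rows are meant to have two columns.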

June 27, 2021 · 1 min · Karn Wong