Shapefile to data lake
Background: we use spark to read/write to data lake. For dealing with spatial data & analysis, we use sedona. Shapefile is converted to TSV then read by spark for further processing & archival. Recently I had to archive shapefiles in our data lake. It wasn’t rosy for the following reasons: Invalid geometries Sedona (and geopandas too) whines if it encounters invalid geometry during geometry casting. The invalid geometries could be from many reasons, one of them being unclean polygon clipping....