HACKER Q&A
📣 SCUSKU

Best tools for image processing pipeline at scale?


Hi HN,

I am a junior Python dev. I have been looking for a tool or framework that can support a large-scale image processing pipeline. Currently I have a pipeline built in pure Python, using Pillow, NumPy, and other packages to download, manipulate, and upload images.
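
To give a sense of my current setup, here is a rough, simplified sketch (the URLs, the resize step, and the local-file "upload" are placeholders, not my actual code):

    from concurrent.futures import ThreadPoolExecutor
    from io import BytesIO

    import requests
    from PIL import Image

    # Placeholder source list; the real pipeline pulls far more images.
    URLS = ["https://example.com/img1.jpg", "https://example.com/img2.jpg"]

    def process(url: str) -> bytes:
        # Download the source image.
        raw = requests.get(url, timeout=30).content
        # Manipulate it with Pillow (here just a thumbnail resize).
        img = Image.open(BytesIO(raw)).convert("RGB")
        img.thumbnail((512, 512))
        out = BytesIO()
        img.save(out, format="JPEG")
        return out.getvalue()

    def upload(name: str, data: bytes) -> None:
        # Stand-in for the real destination (object storage in practice).
        with open(name, "wb") as f:
            f.write(data)

    if __name__ == "__main__":
        # A thread pool helps with the I/O-bound download/upload steps,
        # but everything still runs on a single machine.
        with ThreadPoolExecutor(max_workers=8) as pool:
            for i, data in enumerate(pool.map(process, URLS)):
                upload(f"processed_{i}.jpg", data)

This works fine for small batches, but everything runs on a single machine.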

I have been looking into Apache Airflow, Spark, Kubeflow, and some other tools. None of them seem to really fit the bill for large-scale image processing, but I would love suggestions for tools or frameworks I should consider or reconsider.
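
For context, the workflow-as-code style of something like Airflow would mean wrapping each step in a DAG, roughly like the sketch below (the my_pipeline module, task names, and schedule are hypothetical; I have not actually built this):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical module holding the download/process/upload steps.
    from my_pipeline import download_images, process_images, upload_images

    with DAG(
        dag_id="image_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        download = PythonOperator(task_id="download", python_callable=download_images)
        process = PythonOperator(task_id="process", python_callable=process_images)
        upload = PythonOperator(task_id="upload", python_callable=upload_images)

        # Chain the steps: download -> process -> upload.
        download >> process >> upload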


  👤 blakeburch Accepted Answer ✓
I run Shipyard (www.shipyardapp.com), a workflow orchestration platform designed to handle processing and moving large data sets at scale. It sounds like we'd easily fit the bill for what you're looking for.

We're not a workflow-as-code framework like some of those you mentioned. We let you automate your existing code (upload, copy/paste, or sync with GitHub) in the cloud, with built-in monitoring and alerts. You can install any packages at runtime, run each script in its own container, and connect them all together with complex pathing. No code changes or infrastructure management required.

If you're interested, feel free to reach me at blake[at]shipyardapp.com. Would love to help you tackle your use case!