HACKER Q&A
📣 atif089

Storing Images in PostgreSQL vs. Object Storage for Large Datasets


Is it advisable to store images directly in PostgreSQL for a dataset of 100 million records, each with a 200KB image, or should I use object storage with references from the start? My primary and only use case involves creating multimodal embeddings for search and relevance purposes.


  👤 throwaway38375 Accepted Answer ✓
If you are storing 100 million images at 200KB each, that comes out at 20TB!

I would calculate the costs of something like S3 versus buying five 4TB HDDs and running a network file server.

You're going to save a ton of money hosting this yourself. I would go with two used powerful desktop PCs. One as a DB server and the other as the file server.

Store the images on the file server and store the image's path in the database server.


👤 speedgoose
You could do quick tests using bytea, toast, or large objects.

But an object store may be more convenient overall.

When I did something similar, I did store embeddings and the image UUID in a table and my images in an object store with the same UUIDs as filenames. It was simpler to upload the images and put them available through a CDN.


👤 reactor
Use something like SeaweedFS or Minio etc.