Do the ML folks have any recommendations on tools they use to manage/organize/annotate/view large image sets?
Things like:
- Visually and manually deleting bad images
- Applying mass cropping/letterboxing/resizing
- Annotating
- Tagging, etc.
I just built one last week by cutting and pasting code out of my YOShInOn RSS reader, which is based on the PAX stack: Python, ArangoDB, HTMX. Fraxinus has a bookmark manager and a focused web crawler that knows the markup of (now) a handful of sites, so it can get images, metadata, text, links, etc. Queues in Fraxinus are much simpler than in YOShInOn, and there is no A.I. or M.L. in it yet. I am planning, however, to build a rather gold-plated ‘tagging’ system in which a tag can be positive, negative or indeterminate, which would let an active learning system queue judgements on tags. I'd say that it also contains a 'personal data lake', in that crawled content goes into a repository which can be rapidly reprocessed when developing the data enrichment system.
I’ve collected 55k images since Saturday and would expect no trouble at 10x that size; I’ve built systems like this up to about 2M.
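To make the positive/negative/indeterminate tagging idea concrete, here is a minimal in-memory sketch in Python. It is not Fraxinus code: `Judgement`, `TagStore` and `review_queue` are hypothetical names, the store is a plain dict rather than ArangoDB, and the queue ordering uses uncertainty sampling (scores closest to 0.5 first), which is just one common active learning strategy picked for illustration. The `scores` argument is assumed to come from some model that does not exist yet.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Judgement(Enum):
    """Tri-state tag value: a tag is not just present or absent."""
    POSITIVE = 1
    NEGATIVE = -1
    INDETERMINATE = 0


@dataclass
class TagStore:
    """Hypothetical in-memory store keyed by (image_id, tag)."""
    judgements: dict[tuple[str, str], Judgement] = field(default_factory=dict)

    def set(self, image_id: str, tag: str, judgement: Judgement) -> None:
        self.judgements[(image_id, tag)] = judgement

    def get(self, image_id: str, tag: str) -> Judgement:
        # Anything never judged counts as indeterminate.
        return self.judgements.get((image_id, tag), Judgement.INDETERMINATE)


def review_queue(
    store: TagStore,
    scores: dict[tuple[str, str], float],
    limit: int = 10,
) -> list[tuple[str, str]]:
    """Return up to `limit` undecided (image_id, tag) pairs, most uncertain first.

    `scores` maps (image_id, tag) to an assumed model probability in [0, 1].
    Pairs already judged positive or negative are skipped.
    """
    pending = [
        (pair, p)
        for pair, p in scores.items()
        if store.get(*pair) is Judgement.INDETERMINATE
    ]
    # Uncertainty sampling: the closer to 0.5, the less sure the model is.
    pending.sort(key=lambda item: abs(item[1] - 0.5))
    return [pair for pair, _ in pending[:limit]]
```

A human would then work through the returned pairs and call `store.set(...)` with a positive or negative judgement, which removes that pair from the next queue. Keeping NEGATIVE distinct from INDETERMINATE is what lets the queue tell "someone said no" apart from "nobody has looked yet".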