HACKER Q&A
📣 haggy

Labeling new datasets as a bootstrapped startup


Hi all. I'm trying to validate and build a proof of concept (PoC) for a tool/product that centers on the complexities of local city traffic signage. The central idea is to simplify parking in cities by highlighting things on a map like "Street Cleaning Schedules", "Tow Zones with variable parking times", "Parking unavailable due to long-term construction", etc. This product will require some form of data analysis and ML. The initial dataset I was planning to use was Google Street View imagery for larger cities and their satellite towns.

My question is: as a lean, bootstrapped idea, what services are available to help me label data without breaking the bank? This is not proprietary or overly complicated data, but it may require several kinds of labels. I'm thinking of just the basics to start, such as "Has street sign (yes/no)" and "Street Cleaning sign (yes/no)" (asked only if "Has street sign" is yes), etc. Eventually I'll need to feed that labeled data into image processing pipelines that can extract what the signs actually enforce.
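To make the conditional labeling concrete, here is a minimal sketch of how the two questions could be stored and sequenced, assuming a hypothetical `SignLabel` record and hypothetical helpers `next_question` and `validate` (none of these come from any labeling service). The point is that labelers only see the street-cleaning question when the first answer is yes, and records that answer the follow-up without a sign are rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignLabel:
    """Labels for one street-level image (hypothetical schema for illustration)."""
    image_id: str
    # None means "not answered yet".
    has_street_sign: Optional[bool] = None
    # Follow-up question: only asked when has_street_sign is True.
    street_cleaning_sign: Optional[bool] = None


def next_question(label: SignLabel) -> Optional[str]:
    """Return the name of the next unanswered field, or None when the image is done."""
    if label.has_street_sign is None:
        return "has_street_sign"
    if label.has_street_sign and label.street_cleaning_sign is None:
        return "street_cleaning_sign"
    # Either no sign (follow-up skipped) or both questions answered.
    return None


def validate(label: SignLabel) -> None:
    """Reject records that answer the follow-up when no sign is present."""
    if label.has_street_sign is False and label.street_cleaning_sign is not None:
        raise ValueError(f"{label.image_id}: street_cleaning_sign set without a street sign")


if __name__ == "__main__":
    img = SignLabel("frame_0001")
    print(next_question(img))  # prints: has_street_sign
    img.has_street_sign = True
    print(next_question(img))  # prints: street_cleaning_sign
    img.street_cleaning_sign = False
    print(next_question(img))  # prints: None
```

Adding later labels (tow zones, construction notices) would mean extending the branching in `next_question`, which keeps the per-image work small for whoever does the labeling.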

I know there are various companies out there, like Amazon Mechanical Turk and others, that supply people to do this, but I'm not sure I want to sign an AWS contract before I've even validated the idea. Has anyone used these services before? If not, what are the alternatives?

All help is so much appreciated. Thanks in advance!


  👤 sixhobbits Accepted Answer ✓
There are a bunch of services that will let you do this.

At your stage, you could probably put up some notices at your local university offering an hourly rate (or beer/coffee) in return for some manual labour.

Also look at Figure Eight, CrowdAI, Eureka, etc. There are a lot of competitors in this space.

If you're looking for tools to help with this, look at https://prodi.gy/.

Amazon Mechanical Turk doesn't require a contract, I think. There are also plenty of other freelancing platforms where you can find low-skilled labour.

Feel free to contact me (details in profile) to discuss more. I am exploring this area at the moment in any case.


👤 mtmail
> The initial dataset I was planning to use was Google street views

Does the Google Street View license allow this? Your labels could count as a dataset derived from a licensed database.

https://www.mapillary.com/datasets is in a similar business. They have pre-labelled datasets but also work with developers and researchers to create more labels. There's https://www.mapillary.com/app/marketplace to submit tasks.


👤 codingdave
Almost every city and state DOT already has this data, because they create it when they place the signs. If your business revolves around having unique data, this is the wrong path. But if the business model is about what you do with the data, I'd skip all efforts to gather it yourself and just get it from your local jurisdictions.