HACKER Q&A
📣 vcidev

Startup idea feedback – user-friendly AutoDataScience


I would love to know what people think of this idea. Here is the hypothesis/pain point:

ML and, more broadly, data science are very useful, but even some of the most recent "easy" data science tools (e.g. Google AutoML Tables) have too steep a learning curve to be useful to the average consumer.

Normally, when learning a new tool, you might learn through a combination of study and trial and error. However, many people don't have much time to sit down and learn something complex this way. (They need an extra push to minimize errors and to steer them toward reasonable use cases.) The result looks something like this:

1 - get excited to try something easy and get new value out of their data

2 - get frustrated because the tool is not easy enough, or they don't know what questions are answerable with the available algorithms

3 - search the internet for guidance on what the algorithms do, get overwhelmed

4 - abandon tool

Solution:

1 - send us your data (probably a spreadsheet/CSV/Excel file)

2 - we analyze the data, and send you a list of questions that we can answer/insights that we can derive

3 - you select which of the questions you want answered

4 - we run our analyses and send you the results, including an explanation of the algorithms that were used to derive the results
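A minimal sketch of step 2, assuming the upload is a CSV with a header row. It infers which columns look numeric or categorical and turns those types into candidate questions. The function names and question templates are hypothetical; a real service would need much richer profiling (dates, missing values, cardinality, and so on).

```python
import csv
import io
from typing import Dict, List


def infer_column_types(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Label a column 'numeric' if every non-empty value parses as a float, else 'categorical'."""
    if not rows:
        return {}
    types: Dict[str, str] = {}
    for col in rows[0].keys():
        # DictReader fills short rows with None, so skip None and blank cells.
        values = [r[col].strip() for r in rows if r[col] and r[col].strip()]
        try:
            for v in values:
                float(v)
            types[col] = "numeric" if values else "categorical"
        except ValueError:
            types[col] = "categorical"
    return types


def suggest_questions(rows: List[Dict[str, str]]) -> List[str]:
    """Turn inferred column types into plain-English candidate questions (hypothetical templates)."""
    types = infer_column_types(rows)
    numeric = [c for c, t in types.items() if t == "numeric"]
    categorical = [c for c, t in types.items() if t == "categorical"]
    questions: List[str] = []
    for num in numeric:
        # Group comparisons: a numeric column summarized per category.
        for cat in categorical:
            questions.append(f"How does average {num} differ across {cat}?")
        # Prediction questions: one numeric column explained by the others.
        others = [o for o in numeric if o != num]
        if others:
            questions.append(f"Can {num} be predicted from {', '.join(others)}?")
    return questions


if __name__ == "__main__":
    sample = io.StringIO(
        "region,units,revenue\n"
        "North,10,200.5\n"
        "South,7,140.0\n"
        "North,12,250.0\n"
    )
    for q in suggest_questions(list(csv.DictReader(sample))):
        print("-", q)
```

The user would then pick from this list in step 3; the templates are deliberately simple so that each question maps to one well-understood analysis.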

The key here is that the "learning" takes place after value is delivered to the user. Even though a tool may allow you to do things with the click of a button, the hidden complexity still presents a learning curve to the user.

Footnotes:

- I'm not claiming to have a large amount of data to back this up, which is why I called this a "hypothesis". I'm offering the idea up for feedback and am interested in hearing what people say!

- This certainly does not apply to people who are used to self-directed learning and enjoy a healthy challenge.


👤 psv1
> Solution:

> 1 - send us your data (probably a spreadsheet/CSV/Excel file)

> 2 - we analyze the data, and send you a list of questions that we can answer/insights that we can derive

> 3 - you select which of the questions you want answered

> 4 - we run our analyses and send you the results, including an explanation of the algorithms that were used to derive the results

This order of points isn't quite right. The overwhelming majority of companies will already have a question that needs answering or a problem that needs solving. They will then want to know which parts of which datasets are relevant. If the existing datasets aren't enough, they consider collecting and/or purchasing more. Knowing what to collect and/or buy is another problem. Then you need to set up systems for extracting any 'insight' from what they have and continuously managing and processing data from multiple sources, and so on and so on.

No one is really sitting with a single csv file open in front of them, thinking "Hmm if only someone would tell me what I can do with this".


👤 ian0
I think it's a great idea! However, I think you may find that success with this product will depend more on people being able to use your service - rather than people wanting to use your service.

In my case I would require (1) confidence in the security of our data, (2) some way to keep using the service without it being manual (e.g. the latest month's data is reflected somewhere I can log into), and (3) where a model is being created, a way for existing systems to interact with it via an API.

PS: Personally, I love the insertion of #2 in the solution steps. Yes, I would have questions going in, but I would appreciate validation that they can be answered effectively, as well as a list of potential questions that I may have missed myself.


👤 seektable
Step (2) involves humans, doesn't it? Because for simple cases this scenario is already covered by:

- Google Sheets suggests pivot tables / charts that can be built on your worksheet data

- with our BI tool (https://www.seektable.com) anyone can upload fairly large CSV files (up to 500 MB); the engine then suggests dimensions/measures automatically and even suggests some typical reports (for now the suggestions are very simple: just a set of heuristic rules based on CSV column names). Beyond this, for CSV files users can 'ask' the data with search-like queries and get an answer in the form of a pivot table.
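A column-name heuristic of the kind described above might look roughly like this. The keyword lists, `classify_column` and `suggest_reports` are hypothetical illustrations, not SeekTable's actual rules: a column whose name contains a dimension-like token becomes a dimension, a measure-like token makes it a measure, and anything else falls back to whether its values are numeric.

```python
from typing import Dict, List

# Hypothetical keyword lists, for illustration only.
MEASURE_HINTS = ("amount", "price", "qty", "quantity", "count", "total", "revenue", "cost", "sales")
DIMENSION_HINTS = ("id", "date", "year", "month", "name", "region", "country", "category", "type")


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify_column(name: str, sample_values: List[str]) -> str:
    """Return 'dimension' or 'measure' using name tokens first, then the sample values."""
    tokens = name.lower().replace("-", "_").replace(" ", "_").split("_")
    # Exact token matches, so e.g. 'paid' does not match 'id'.
    if any(t in DIMENSION_HINTS for t in tokens):
        return "dimension"
    if any(t in MEASURE_HINTS for t in tokens):
        return "measure"
    # Fallback: all-numeric columns are measures, everything else is a dimension.
    values = [v for v in sample_values if v.strip()]
    if values and all(_is_number(v) for v in values):
        return "measure"
    return "dimension"


def suggest_reports(columns: Dict[str, List[str]], limit: int = 3) -> List[str]:
    """Pair every measure with every dimension and keep the first few as report suggestions."""
    roles = {name: classify_column(name, vals) for name, vals in columns.items()}
    dims = [c for c, r in roles.items() if r == "dimension"]
    measures = [c for c, r in roles.items() if r == "measure"]
    return [f"Sum of {m} by {d}" for m in measures for d in dims][:limit]


if __name__ == "__main__":
    cols = {
        "year": ["2019", "2020"],
        "region": ["North", "South"],
        "total_sales": ["10.5", "20"],
        "discount": ["0.1", "0.2"],
    }
    print(suggest_reports(cols))
```

Checking names before values matters here: `year` is numeric but is treated as a dimension because of its name, which is the sort of case where pure type inference would suggest a meaningless "sum of year".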


👤 eb0la
Good idea? Yes and no.

The biggest problem I see is that it sets expectations _high_ from the beginning, which is a bad idea.

Most data projects are about managing expectations.

Also, it is _very_ hard to demonstrate that your model works if the customer does not (yet) have some graphics to compare against.

Your first step should be _showing_ the data, just to have a visual baseline to compare against.
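As a numeric complement to the visual baseline suggested above (a swapped-in technique, not something the comment describes), one could report a naive baseline's error next to the model's, so the customer can see whether the model actually beats "just predict the average". The helper names are hypothetical and the figures below are made up for illustration.

```python
from statistics import mean
from typing import List, Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def naive_baseline(train: Sequence[float], horizon: int) -> List[float]:
    """Predict the training mean for every future point: the simplest thing a model must beat."""
    level = mean(train)
    return [level] * horizon


def beats_baseline(train: Sequence[float], actual: Sequence[float],
                   model_predictions: Sequence[float]) -> bool:
    """True if the model's MAE is strictly lower than the mean baseline's MAE."""
    baseline = naive_baseline(train, len(actual))
    return mean_absolute_error(actual, model_predictions) < mean_absolute_error(actual, baseline)
```

For example, with training values `[10, 12, 11, 13]` and actuals `[14, 15]`, the mean baseline predicts 11.5 for both points (MAE 3.0), while model predictions of `[13.5, 15.5]` give an MAE of 0.5, so `beats_baseline` returns `True`.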


👤 vincentinverso
Thanks so much for the feedback, everyone. It was fun and engaging hearing your thoughts. I just wanted to let future readers know that I likely won't come back to this thread any time soon, so that I can actually go and get some work done ;) Of course, if you'd like to discuss without me, feel free.

👤 temp_dr
That sounds close to what Displayr does - https://displayr.com

👤 andreshb
You should launch it and find out. It could be as simple as the wizard behind the curtain: you take their data and pass it through the existing tools out there.

👤 nudpiedo
isn't that what consulting is about?

👤 nikalras1
we should talk - hit me up on twitter @nikalras