To me this feels a lot like the claims made by startups I've encountered in past tech-hype-cycles, such as IoT and blockchain. In both of these areas there seemed to be a pervasive sense of "if we build it, they will acquire".
The question I have for HN is in two parts:
1. Why is it that a lot of startups seem to be betting the farm, so to speak, on "machine learning" as their core value?
2. Is it reasonable to be highly skeptical of startups that make these claims, seeing it as a sign they have no real vision for their product?
[EDIT]: add missing pt2 of my question
Time to set up scikit-learn and TensorFlow algorithms to make predictions: 4 hours. Time to set up Python scripts which could parse through the Excel spreadsheets and figure out which row corresponded to gross profit margin and a few other "standard" metrics: ??? Unknown, because I gave up at about 80 hours trying to figure out rules to process all the different spreadsheets and how names were determined.
I had a professor who was doing machine learning + chemistry. He was building up his own personal database for machine learning. He spent ~5 years with about 500 computers building the database so that he would be able to do the actual machine learning.
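To make the spreadsheet problem concrete, here is a minimal sketch of the kind of rule that ends up being needed: fuzzy-matching a row label against known spellings of a metric. The alias table, the 0.8 cutoff, and the function names are all assumptions for illustration, not the scripts described above, and real spreadsheets would likely defeat a rule this simple.

```python
import difflib

# Hypothetical canonical metrics and the label variants seen for each.
# These lists are illustrative assumptions, not data from the comment.
ALIASES = {
    "gross_profit_margin": ["gross profit margin", "gross margin", "gp margin", "gross margin %"],
    "revenue": ["revenue", "total revenue", "net sales", "sales"],
}


def normalize(label):
    """Lowercase the label, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch in "% " else " " for ch in label.lower())
    return " ".join(kept.split())


def match_metric(label, cutoff=0.8):
    """Return the canonical metric whose alias best matches the label, or None."""
    cleaned = normalize(label)
    best_name, best_score = None, cutoff
    for name, variants in ALIASES.items():
        for variant in variants:
            score = difflib.SequenceMatcher(None, cleaned, variant).ratio()
            if score >= best_score:
                best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    for row_label in ["Gross Profit Margin", "  Gross  margin:", "Total Revenues", "Headcount"]:
        print(repr(row_label), "->", match_metric(row_label))
```

Every new spreadsheet layout tends to need another alias or another special case, which is where the hours go.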
- take some valuable task that's never been successfully automated before
- do it manually (and expensively) for a while to acquire data
- build an automated system with some combination of regular software and ML models trained on the data
- now you can do a valuable task for free
- scale up and profit
The risk is that it's hard to guess how much data you'll need to train an accurate, automated model. Maybe it's very large, and you can't keep doing it manually long enough to get there. Maybe it's very small and lots of companies will automate the same task and you won't have any advantage.
I think there'll be some big successes with this model, and many failures. So be skeptical -- ML isn't a magic bullet. But if a team has a good handle on how they're going to automate something valuable, it can be a good bet.
As an investor, you may well face the situation down the line "We've burned through $10M doing it manually, and we're sure that with another $10M we can finish the automation." Then you have to make a hard decision. With some applications like self-driving cars, it might be $10B.
I think the answer is pretty simple: it sounds good, and it's hard to challenge. It's essentially a promise that they will invent a black box with magic inside. Since nobody can see inside the black box, it's hard to argue that there isn't actually magic in it.
The long-term problem, of course, is that there aren't that many actual magicians in the world. Most of the people who bill themselves as magicians are either people who just think they're magicians, or people who know they aren't but don't mind lying about it.
> This typically comes with assertions such as "data is the new oil" and "once we have our dataset and models the Big Tech shops will have no choice but to acquire us".
Maybe it's me, but I dislike the attitude of working to be acquired. Interestingly, this is a rift I see quite a bit when I interview more development-oriented guys versus more infrastructure-oriented guys. Tell me I'm wrong, but development-oriented guys tend to be more fast-paced and care less about long-term impacts. Infra-inclined guys tend to be slower-paced, but longer-term oriented. Build something to last and generate value for a long time.
> 2. Is it reasonable to be highly skeptical of startups that make these claims, seeing it as a sign they have no real vision for their product?
From my B2B experience over the last few years, and working towards a stable business relationship with large European enterprises, yes. My current workplace is moving into the position of becoming a cutting-edge provider in our part of the world. This is a point where machine learning and AI become interesting.
However, we didn't get here by fancy models and AI. We got here by providing good customer support, rock-solid SaaS operation, delivering the right features, strong professional services, and none of those features were AI. It's been good, reliable grunt work.
Different forms of AI are currently becoming relevant to our customers, because we have customers that handle 5k tickets per day with our systems and they have 3-4 people just classifying tickets non-stop. We have customers with 30k - 40k articles in their knowledge base, partially redundant, partially conflicting.
This is why we entered a relationship with a university researching natural language processing, among other things, and they will provide us with a big selling point in the future. They are profiting from this relationship as well, because they are getting large, real-world data sets they couldn't access otherwise, even with a good amount of pre-processing by the different product teams.
But as I maintain, nothing of that form has brought us where we are.
So, of the 4 outcomes:
(1) buy in now + worthless
(2) buy in now + 100x
(3) don't buy in + worthless
(4) don't buy in + 100x
(4) is a terrible position to be in, (2) is a good position to be in, and (1) only costs a little, and (3) is even. So when you are deciding to buy in or don't, you are deciding between good + little loss or terrible + even.
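The four outcomes above can be written out as a small payoff table. This is a sketch under stated assumptions: the stake is normalized to 1, "100x" is taken literally, and the success probability is a free parameter the comment does not give.

```python
# Payoffs in units of the stake. All numbers are illustrative assumptions.
STAKE = 1.0
UPSIDE = 100.0  # the "100x" outcome from the comment, taken literally

ACTIONS = ("buy", "skip")


def payoffs():
    """Net payoff for each (action, state) pair."""
    return {
        ("buy", "worthless"): -STAKE,               # outcome (1): lose the stake
        ("buy", "100x"): UPSIDE * STAKE - STAKE,    # outcome (2): gain minus stake
        ("skip", "worthless"): 0.0,                 # outcome (3): even
        ("skip", "100x"): 0.0,                      # outcome (4): no cash loss, only a missed gain
    }


def expected_value(action, p_success):
    """Expected net payoff of an action if the startup succeeds with probability p_success."""
    table = payoffs()
    return p_success * table[(action, "100x")] + (1 - p_success) * table[(action, "worthless")]


def regret(action, state):
    """How much worse the action is than the best action in that state."""
    table = payoffs()
    best = max(table[(a, state)] for a in ACTIONS)
    return best - table[(action, state)]


def break_even_probability():
    """Success probability at which buying has zero expected value: p * UPSIDE = 1."""
    return 1.0 / UPSIDE


if __name__ == "__main__":
    for state in ("worthless", "100x"):
        for action in ACTIONS:
            print(action, state, "regret =", regret(action, state))
    print("break-even p =", break_even_probability())
```

Under these assumptions the regret from skipping a winner is much larger than the regret from buying a dud, which is the asymmetry the comment describes. Whether buying is actually a good bet still depends on the success probability, which nobody knows.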
Google and Facebook use algorithms to try and minimize dollars spent on moderating their sites, with limited success. If they can avoid paying human beings to make judgement calls, they save billions of dollars a year.
It’s reasonable to be skeptical of startups who claim that they’ll use ML someday. Ask them how they’re using human labor today to perform the tasks that they’d like ML to do, and what their runway is for that work at their current burn rate. If they don’t deliver a viable ML labor reduction by that time, they will either collapse or worsen their service.
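The runway question is simple arithmetic. A minimal sketch, assuming a constant monthly burn and a single estimate of how long the ML work will take (both hypothetical inputs that a real startup would only know roughly):

```python
def runway_months(cash_on_hand, monthly_burn):
    """Months of operation left at a constant burn rate."""
    if monthly_burn <= 0:
        raise ValueError("monthly_burn must be positive")
    return cash_on_hand / monthly_burn


def ml_arrives_in_time(cash_on_hand, monthly_burn, months_to_viable_ml):
    """True if the estimated ML delivery date falls within the runway."""
    return months_to_viable_ml <= runway_months(cash_on_hand, monthly_burn)


if __name__ == "__main__":
    # Hypothetical figures: $10M in the bank, $500k/month spent on manual work.
    print(runway_months(10_000_000, 500_000))
    print(ml_arrives_in_time(10_000_000, 500_000, months_to_viable_ml=18))
    print(ml_arrives_in_time(10_000_000, 500_000, months_to_viable_ml=24))
```

If the honest estimate for `months_to_viable_ml` is longer than the runway, the startup is betting on another funding round rather than on the model.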
Consider this: a good deal of open datasets are stolen. People had their faces taken without consent. The data was tagged and labelled for pennies per image on some server farm. Wrapping it up in machine learning can effectively black-box what it is you're actually doing.
- data collection (frontend, APIs)
- data storage (database, backend)
- data visualization (dashboards, analytics, reports)
To bootstrap a startup, the primary people you need would be a frontend person, a backend person, and a product person. In the beginning, you would be dreaming about the possibility of using ML but would not invest in hiring a data scientist at that stage. The second stage would be to hand over the reins to operations people and let them optimize the internal and external processes and get ready for the launch (growth). These phases can take up any amount of time, from 2 to 5 years. Finally you hand the reins over to the salespeople and go back from product to services mode (especially true for enterprise software). During this stage, after around 6-8 years of bootstrapping the company, you would be hiring a data scientist and a team of data analysts. Unless your product relies heavily on the ML algorithm, it can and would always wait.
Figuring out where to first use ML is another challenge. Hiring a data scientist and asking them to tell you what to do is a futile effort.
ML is a useful technology, and you have to keep thinking about how to use it to improve your product and processes. It has to be a part of your long-term efforts.
Call me a cynic, but I think the talk of building a data moat is mostly nonsense. Of the companies that make these claims, how many of them are actually hiring expensive data engineers to build the moat? If they don't have a team working on it, then it's a ruse.
Unless the startup is founded by people who have deep experience in ML and have been using it extensively from day 0, it is unlikely that they will be able to deliver on this vision. They won't be collecting the data correctly, if they are doing it at all. If they are collecting it correctly (they aren't), they won't be able to get it into a usable form. If they get that far (they won't), they then need to build out the ML Ops to deliver their models. Now, finally, they can `from tensorflow import *`. Engaging this process post hoc takes YEARS.
It seems AGI is the perpetual energy device or philosopher's stone of our era.
Enough so that I'd consider starting a hedge fund to short all the AGI companies :)
So if you say you're going the ML route your perceived value is much higher.
Some people probably talk about it without truly understanding it (it can take years before you have enough data to build good ML models, particularly if there are seasonal trends), but I certainly wouldn't judge a company poorly for seeing ML as an important long-term moat. I would judge a young company that sees it as a short-term moat - if you can acquire valuable data quickly, so can your competitors and it's not really an economy of scale that gives you defensibility.
"AI with unique datasets is an amazing moat"
So in sum:
1) In some cases this is true (but most of the time there is no unique data).
2) If they truly have unique data, this is something to take notice of.
So you're getting into a "We need to grow really fast to get more data than our competitors so we can add ML and create more value to the product so we can grow faster and get ahead of competitors".
I.e. ML is a competitive advantage which is often hard to come by for startups.
Machine learning == statistics
Deep learning == machine learning
Because it is the current overhyped thing, and there are enough dumb investors to fund it.