I'm now doing the NYC Data Science course (https://online.nycdatascience.com) but am getting concerned I may have missed the boat. What does HN think?
- The fundamental skills you need are mathematics and software engineering. Depending on your background, it might take years of additional study.
- There is a big oversupply of people for junior- to mid-level data science jobs: more people want to get into the field than there are jobs. If you drastically switch careers, you'll take yourself out of a field where your skill set is incredibly rare and your competition is limited, and put yourself in a place where everyone else wants the same job.
- The fact that you have a PhD is going to help you. Personally, I don't think a PhD in a field other than mathematics/computer science is that relevant, but employers tend to favor applicants with PhDs, mostly because there are too many candidates for any given job and asking for a PhD is just a strong initial filter. There are also research jobs within data science for which a PhD requirement (in a relevant discipline) makes more sense, but these are a small proportion of all data science jobs.
- If you're already employed with your agriculture PhD, there must be a number of opportunities for you to apply the techniques you're currently learning without leaving the industry. That's probably the path I would suggest: it would allow you to expand your skill set without taking big risks, and you'll have more options in the future. Use the career capital you already have and explore your options instead of making a sharp turn in your career direction that might leave you disappointed.
I'm interested in this space; I do some work with agricultural data acquisition hardware and software (e.g., soil moisture, environmental conditions, sap flow and plant/fruit growth monitoring; irrigation; fertiliser application), and I'm interested in ways this data could be used in predictive models, but I'm not at the stage of being able to focus on that aspect yet (still getting the core data logging/display tech working well, though we're nearly there).
Feel free to get in touch (email in profile) if you'd like to connect and discuss.
To address your question, I think the world is still mostly at a very basic stage in its use of data analysis and statistics. Most of the talented people are employed on big salaries by a relatively small number of large companies with huge budgets for specific applications (e.g., ad targeting, risk assessment, algorithmic trading).
But outside of that, not much is happening, so I think there are big opportunities to apply data science in new fields and make the benefits more widely distributed.
Company B - has no clue what ML or AI is and is feeling the heat. It could be a multi-million-dollar company or an SMB.
You will always find both A and B, at least until ML and AI are well democratised. They are not, not even close. We are still at the early stage of the curve, but there will be rapid growth over the next 5-8 years.
You have a few options:
1. Start with SQL. It's not hard; join as an analyst and learn to code. Make sure the team or product you join deploys models.
2. Learn basic Python and some orchestration tools (Airflow, Spark or their AWS/Azure equivalents), and join as a data engineer with basic SQL skills.
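To give a feel for the "start with SQL" route, here is a minimal sketch of the kind of query an analyst writes every day (a GROUP BY aggregate), run through Python's built-in sqlite3 module so it needs no database server. The `yields` table, its columns and the numbers are all made up for illustration.

```python
import sqlite3


def average_yield_by_crop(rows):
    """Load hypothetical (crop, field, tonnes_per_ha) rows into an in-memory
    SQLite table and return [(crop, average yield)] ordered by crop name."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE yields (crop TEXT, field TEXT, tonnes_per_ha REAL)")
        # Parameterised inserts: never build SQL by string concatenation.
        conn.executemany("INSERT INTO yields VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT crop, AVG(tonnes_per_ha) FROM yields GROUP BY crop ORDER BY crop"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up sample data, purely illustrative.
    sample = [("wheat", "north", 7.5), ("wheat", "south", 8.5), ("maize", "east", 10.0)]
    print(average_yield_by_crop(sample))  # [('maize', 10.0), ('wheat', 8.0)]
```

The same SELECT/GROUP BY pattern carries over more or less unchanged to Postgres, BigQuery, Redshift and friends, which is why SQL is such a cheap first step.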
What boat do you think you're catching besides a reliable middle class job?
You haven’t even missed the boat in terms of being able to make money off raw buzzwords and zero skills.
At the most basic technical level, there's infinite work to be done optimising machine learning systems. This includes not just the fashionable problems of faster, more accurate (or even deliberately lower-precision, in floating-point terms!) deep learning, but also moving Bayesian approaches like MCMC to multiple cores and GPUs.
There's infinite work to be done on finding the right topology for a machine learning system. This applies not just to neural network layers but also to traditional stats (e.g. multiverse analysis).
There's infinite work to be done in understanding, cleaning and preparing datasets. Something as clunky as the tidyverse can't be close to the final form here. We've only just started talking about feature stores etc.
There's infinite work to be done improving notebooks, integrating better software engineering practice into the workflow, and productionising the models created in them.
All this is just platform stuff as well. It doesn’t even touch on the fact that businesses everywhere are terrible at formulating questions to be answered by stats, terrible at communicating those answers and terrible at even knowing this is a valuable endeavour in the first place.
I cannot imagine a boat harder to miss.
I find the data scientist label misleading.
Roughly 70% of the data scientists I've encountered are actually Excel analysts with little experience outside of a Windows desktop, bar Facebook on a Mac. They're unable to use basic software engineering tools such as git, VS Code and Python. Excel users and their managers are hostile to solutions that aren't Excel-like. They will fist-fight you if you restrict them from downloading and exploring data on their own computers. Few understand their compliance/legal obligations.
Another 20% are familiar with a wide range of tools, such as Matlab, R, Jupyter notebooks and various ML/AI toolsets. As developers they're unaware of the tech stack, short of "I installed Anaconda and it doesn't work", but they're happy to work in the cloud and learn new tech. They understand PII requirements and memory/CPU limits but don't always respect the latter in practice. Nonetheless, having studied classification, they produce the bulk of your analysis and are reasonably cost-efficient if you pair them with a SWE.
The final 10% have mastered containers, venvs, wheels, cloud SDKs and how to configure their software in an environment-independent way. They require help to reach production quality but are great self-starters. Given enough time and support, they're able to quickly replicate this effort and teach others. As relative superstars they're in high demand, which makes capacity planning difficult and pushes up their premium.
IMO the best data scientists are 1 in 10. Because we're desperate for quality, almost anyone can assume the title, which means the market is open to newcomers: you just need to be skilled in Excel (harder than it seems; most developers can learn a lot by watching an analyst/consultant use Excel).
To answer your question: No - you're not too late. Just by posting here I expect you'll be in the top 30% - an asset in demand.
I believe it makes no sense for you to discard the knowledge you already have. You can apply data science methods together with that knowledge, either by going to work in agriculture or by creating a new company that works in agriculture using new methods.
Do you want to spend your life doing surveillance and spying on people like everybody else? It is fashionable, but people are starting to resist and develop antibodies to it as they understand it better. A TV or phone that I bought spying on me is not acceptable.
Agriculture will grow enormously in the future with things like LEDs and other ways of supplying energy to plants, plankton or whatever. Drones controlling pests, humidity or temperature. Natural insect predators used for organic farming. Materials like cotton grown directly from cell cultures.
The methods that are used today for growing marijuana indoors will be applied for more common things when prices go down.
Nobody is better placed than you to identify the markets that will grow in the future. It is also a very good idea to know economics, marketing and selling (or to team up with someone who does).
Reasoning from first principles is extremely useful for identifying new waves that will carry you in the future with no effort. https://www.youtube.com/watch?v=NV3sBlRgzTI
All those things are hard problems. I have worked on them in the south of Spain and in Holland as an engineer and entrepreneur.
Real life is not academia: your title means nothing if you cannot apply it and deliver results, but it means a lot if you can. So you will need some time to adapt to a different mentality.
On top of that, you have something incredibly valuable to a budding data scientist: domain expertise. Being able to manipulate data is great, but to most effectively solve real-world problems you have to understand (or communicate very closely with someone who understands) the main problems in a domain. I can't count the number of times I've heard scientists frustrated by their lack of data skills, and data scientists frustrated by some arcane domain fact that stymies their model production.
Far from being a liability (or just a sunk cost) your background in agriculture will make you extremely valuable as a data scientist.
Bonus point: it is very easy to learn.
Advanced data analysis / machine learning isn't dated or old-fashioned at all, and I'd guess it will stay relevant (or become even more relevant) for at least another decade. Not all the ships have left the harbor.
What I've seen in recent years, with growing supply and maturing departments, is a need for specialization. Can you do hardcore statistics? Are you an ML practitioner? Are you a data architect? Basically, the blended DS hacker role is more commonly (and correctly, IMO) relegated to various analytical and strategy roles.
Honestly, you haven't missed the boat and you don't need a formal education, but I would highly recommend having depth in one or more areas of data science, with an example or two to back it up. Basic skills are just table stakes at this point.
1980 - Guys, I want to learn about microcontrollers, is it too late?
1990 - GUI programming
2000 - Linux, Internet, you name it
2010 - JavaScript
In 30 years' time (at the very least) there will still be data science. So if you are really up to it, it does not matter whether you should have started 5 years ago or start now. If you suck at it or really don't like it, it would make no difference either.
For those with solid math, stats and tech skills, which take years to master: it is your time to shine.
This is a bit too bitter and jaded (I'm not quite this pessimistic), but I think it needs to be said to counter a lot of the rosier advice.
Having a PhD in a non-CS field is a _massive_ negative in the eyes of potential employers. Even if you're looking at moving into a role where your domain knowledge would be immensely relevant (lots of ag + remote sensing startups these days), you will be seen as underqualified compared to someone with only a BS. You'll be seen as underqualified compared to someone with no BS or a BS in an irrelevant field. You need to be strictly better at multiple roles than anyone they could hire to be considered, and even then you'll be expected to work for 1/3 to 1/2 of what they'd pay someone with only a BS and no experience. Folks usually despise domain experts because they see the role of their company as "disrupting" all of the prior knowledge. You represent what they want to replace, and you're likely to disagree with them about key approaches.
You will be much more effective, but no one cares about how much you contribute to the company's bottom line. People only care about appearances.
The appearance companies want is a "self taught college dropout". That holds for data science and machine learning positions every bit as much as it does for developer positions, in my experience.
The upshot of this is that you likely know how to learn quickly. Pick up multiple additional skillsets.
You won't get hired because you're an expert in X field that the company needs + a data scientist. You'll get hired because you can throw together a crappy web app on short notice, or debug their crazy duct-tape-and-glue CI system, or save a lot on their AWS bill by switching some things around. You need those skills _on top of_ being a domain expert and a data scientist.
You have to be able to do more than anyone else they could possibly hire for the role to even be considered. Otherwise, you'll never overcome the fact that you have a PhD in their eyes.
Again, that's the bitter/jaded view. Take the above with a grain of salt.
PS: You might find the book Cloud Computing for Science and Engineering useful - https://cloud4scieng.org/
I see the future of data science benefiting from better software design, e.g. transformative frameworks like PyTorch and scikit-learn, which are powerful tools but hardly fully automated. We'll continue to need skilled workers who keep current with the latest software stacks.
It also benefits from what I'll call the "lotto effect", where data scientists will occasionally multiply the bottom line by 10x or more. This is of course rare, but companies will continue to chase that fantasy and hire data scientists because it's too tempting to ignore.
My only advice would be to lean heavily into the software side of things. There are too many data scientists who are novice programmers.
It's a 'new kind of job' that's going to be here for the foreseeable future.
While competition for jobs might increase, the number of jobs is likely only to grow over time.
Because I must confess that I don't see the immediate connection between an MSc and PhD in agriculture and a (presumably generic) data scientist job. You should be perfectly capable of performing data science tasks in an agriculture context, but it seems to me you are asking about something different: a "pivot".
At the risk of sounding rude (for which I apologise in advance), are you asking simply whether there's still space on the current bandwagon? If so, I must advise you against it, because employment bandwagons are awful things to get on: crowded, badly paid, poorly understood, not that useful, scarcely productive; in short, short-term and not very fulfilling.
Is it just a matter of making lots and lots of money with the skills you clearly have and that you must have worked hard to acquire? Well then, there should be much, much better placements for you, outside the lab, in the sector you studied about.
My concern is that seeking to jump on the data science bandwagon right now will only flood industry with more and more semi-skilled, half-baked professionals, who don't really understand and don't really care to understand their subject matter, similar to what has happened with software development. The world is full of bad devs who are "passionate about javascript" or something like that and who are struggling to promote their personal brand because they have no other skills than the promotion of their personal brand. Don't allow yourself to fall that low.
Edit: in the interest of full disclosure, I'm a CS graduate with an MSc in data science, studying for a PhD in AI, but I'm not looking for data scientist jobs and am not interested in them, because I find them boring, unproductive and unfulfilling. I have actually worked as a (freelance) data scientist for a while.
I've personally spoken to a few large enterprises in the agricultural sector who are just beginning to build out their data science department. It seems like the industry as a whole is just getting started in data science.
Also, there are many promising startups emerging in this space, e.g. vertical farming.
Ag, farmers, lawn owners are all errr... "ripe for disruption", as their genetic systems lag and only innovate at the paltry rate of once per season.
There are many ways to play the field (no pun intended) from predicting and capitalizing on misfortune to minimizing the same. I hope some will be most interested in using our talents to mitigate crop failures, maximize sustainable nutrition or find the optimal low risk carbon sink.
"Data Smart: Using Data Science to Transform Information into Insight" by John W. Foreman
https://www.amazon.com/Data-Smart-Science-Transform-Informat...
This is likely the quickest way to start.
Joking aside, this is a very good overview of data science and engineering for the year 2020:
Remember that the very beginning of data science is often cleaning up data in Excel, then learning to do it with Excel functions, and then learning to do the same in Python.
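As a rough illustration of that last step, here is a minimal sketch of a typical Excel clean-up pass (TRIM, PROPER, dropping blank rows, Remove Duplicates) redone in plain Python with the standard csv module. The column names `site` and `moisture` are hypothetical, and it assumes a well-formed CSV with a header row; the Excel analogies are approximate, not exact equivalents.

```python
import csv
import io


def clean_sensor_csv(csv_text):
    """Clean a small CSV export roughly the way you might in Excel.

    Assumes (hypothetically) a header row with columns "site" and "moisture".
    """
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Roughly Excel's TRIM(): strip leading/trailing whitespace and
        # collapse internal runs of whitespace; missing cells (None) become "".
        row = {key: " ".join((value or "").split()) for key, value in row.items()}
        # Skip rows where every cell is empty.
        if not any(row.values()):
            continue
        # Roughly Excel's PROPER(): consistent capitalisation of site names.
        row["site"] = row["site"].title()
        # Convert the numeric column; an empty cell becomes None.
        row["moisture"] = float(row["moisture"]) if row["moisture"] else None
        # Like Data > Remove Duplicates: keep only the first copy of each row.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    raw = "site,moisture\n north field ,0.31\nNorth Field,0.31\n,\nsouth  field,\n"
    print(clean_sensor_csv(raw))
    # [{'site': 'North Field', 'moisture': 0.31}, {'site': 'South Field', 'moisture': None}]
```

Once this feels natural, the same steps map onto pandas (`str.strip`, `drop_duplicates`, `dropna`) for larger files.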
The market seems to be flooded with students with CS skills. Actually, people from all fields — mechanical engineering, computational biology, EE, math etc — are entering data science. It is worrisome.
The actual skills of light programming (R, Python), data literacy/manipulation and some basic modeling (statistical/machine learning, Bayesian methods, time series, etc) are useful for many job roles, and I think will be considered basic skills for college graduates in the near future. This isn't new - operations research, particularly Six Sigma and quality control, have used statistics and some light programming to solve business problems for decades.
By itself, I don't see Data Science evolving the way promised by schools and boot camps. Most of the positions named "Data Scientist" (at non-tech companies) are really just senior business analysts; I work with a group of them at my company and 90% of their day to day is just extracting and analyzing various reports for other managers and directors. When an interesting (and potentially lucrative) business problem does come along, they usually outsource to a specialized analytics firm and the data scientist helps coordinate that project. (If you have a good data dictionary, a clear outcome in mind, and some basic knowledge of the field it's relatively easy to outsource the advanced work.)
st1x7 had the best advice below -- learn the basic methods and then apply them to your field. If you google "agriculture iot research papers" you'll find tons of examples of people using sensors for data collection and then analyzing the data to improve some process.
TL;DR I see Data Science melting into other roles, but the basic skills/data literacy are useful for almost everyone.
Will data science get you out of the "lab"?
Maybe you can become a farmer, and iterate on patentable new agri-tech you will inevitably develop along the way; then sell that.
Also known as Linear Regression.