HACKER Q&A
📣 thrwyyy3

How is data science done in the real world, esp outside big corporates?


Is "data science" outside large corporates just about using Excel? Do they even call it that? If they do it, how often and how exactly? The books I've read on the subject seem to oversell the activity. IDK.


  👤 blakeburch Accepted Answer ✓
It's definitely not Excel work. It's primarily using Python or R to build statistical models to do things like:

- Predict the likelihood that a customer will churn

- Cluster customers into key segments for targeted messaging

- Analyze and distribute customer support requests based on their content

- Proactively identify bot or fraud activity

- Create multi-touch attribution models for marketing touchpoints

- Analyze multivariate tests
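
To make the first item concrete, here is a minimal sketch of a churn-likelihood model. Everything in it is an assumption for illustration: the two features (`support_tickets`, `months_inactive`, both scaled to 0..1), the synthetic data, and the hand-rolled logistic regression. Real work would more likely use scikit-learn or R's `glm` on actual customer data.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, x):
    """Churn probability for one customer; weights[0] is the bias."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)

def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression weights with plain batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Synthetic data (made up): customers with more support tickets and more
# months of inactivity are more likely to be labeled as churned.
random.seed(0)
rows = [[random.random(), random.random()] for _ in range(200)]
labels = [1 if x[0] + x[1] + random.gauss(0, 0.2) > 1.0 else 0 for x in rows]

weights = fit_logistic(rows, labels)
high_risk = predict_proba(weights, [0.9, 0.9])  # many tickets, long inactive
low_risk = predict_proba(weights, [0.1, 0.1])   # few tickets, recently active
print(f"high-risk customer: {high_risk:.2f}, low-risk customer: {low_risk:.2f}")
```

The output is a probability per customer rather than a yes/no label, which is what lets a business rank customers and decide where to spend retention effort.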

I think the problem you're hinting at is that many organizations say they're doing data science, or that they need data science, without knowing exactly what the role entails. As a result, many organizations hire "Data Scientists" who end up doing the work of a Data Engineer, Analyst, and BI Developer all combined.

I think 2015-2020 saw a huge surge in interest in Data Science, whereas 2020-2025 will see a surge in interest in Data Engineering, mostly because organizations have realized they can't do anything interesting with data until resilient, clean data sets have been built for the organization.


👤 thisiswhatsup
Data Science is the intersection of programming and statistics. I think the defining characteristic of being a data scientist is being able to insert a statistical tool inside a software system. Putting data science models into production is actually a very small part of my job, but I think that is what makes me a data scientist. Most of my time is spent cleaning data, trying to understand data, and making presentations to win support from business stakeholders. These tasks are also done by data analysts, but an analyst wouldn't code up a web service to receive data and return predictions.
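
As a rough sketch of what "a web service to receive data and return predictions" can look like, here is one using only Python's standard library. The coefficients in `WEIGHTS`, the feature names, and the JSON shape are all hypothetical; in practice the weights would come from a trained model and the service would probably sit behind a proper web framework.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical coefficients; a real service would load a trained model instead.
WEIGHTS = {"bias": -2.0, "support_tickets": 0.8, "months_inactive": 0.5}

def predict(features):
    """Return a churn probability from a dict of numeric features."""
    z = WEIGHTS["bias"] + sum(
        WEIGHTS[name] * float(features.get(name, 0.0))
        for name in WEIGHTS
        if name != "bias"
    )
    return 1.0 / (1.0 + math.exp(-z))

def handle_payload(raw_body):
    """Turn a raw JSON request body into (status_code, response_dict)."""
    try:
        features = json.loads(raw_body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object of features")
        return 200, {"churn_probability": predict(features)}
    except (ValueError, TypeError, OverflowError) as exc:
        return 400, {"error": str(exc)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, score it, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(port=8000):
    """Build (but do not start) the server; call .serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server().serve_forever()` starts it locally, and a POST with a body like `{"support_tickets": 2, "months_inactive": 1}` returns a JSON object with a `churn_probability` field. Keeping the scoring logic in `handle_payload` separate from the HTTP handler makes it testable without opening a socket.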

The main software tool that a data scientist uses is either Python or R. They may use Excel, but I would never hire a data scientist who wasn't able to program in one of these languages (or maybe something similar).


👤 superbcarrot
This is so broad that it can't really be answered in an HN comment. Data science can mean a bunch of different things in different companies and even separating them into small and big companies isn't that useful.

> Is "data science" outside large corporates just about using Excel?

No, certainly not just Excel.


👤 Jugurtha
Related to a reply[0] I wrote in another thread. Most of the content out there is produced by people who have never touched client data, sadly. A lot of enthusiasting/influencing/audience building/blogging on Medium and Twitter by data virgins, essentially. Those who are in the field are sadly not writing about it. I'd love to read their content. The other type of content is from "architects" and "thought leaders" who come up with acronyms and 60,000 ft views, with workflows and methodologies that make perfect sense if you've never worked in the field. They seem to think they're the first to think of using "DevOps" in machine learning, and they give naive examples of how to do that. This gets circulated. Even those who talk about topics such as "production machine learning" have clearly only worked on Kaggle competitions, then extrapolate that into advice meant to apply to real-world data.

Then there are companies that heavily use machine learning, but they have their own internal platforms which are not accessible to the general public.

There are other companies that try to build such platforms, but they come at the problem without much real-world experience and are tackling a problem they don't themselves have. These products tend to focus on stylesheets and animations, or whiteboard an ML workflow/pipeline that makes sense to them (but they've never actually done a project, so it's similar to writing the Kama Sutra without actually having "done the deed"). They tend to bake a rigid workflow, or what they think a pipeline is in the real world, into their product. You can predict these will fail, and what sucks more than reading their post-mortems when they shut down is seeing that they learned the wrong lessons.

We're not a large corporation, just a tiny consultancy, but we help large organizations with machine learning, and we meet them wherever they are on the maturity spectrum. Some, especially those who didn't view data as core to their operations, are at the "what's this AI thing and how can it help us" stage. Others actually have internal machine learning and data science teams, but need help because their team is fully booked, or because we have expertise on a particular topic they lack, and we augment their team for that project. Yet others are pure research entities in artificial intelligence (the "AI" Skunkworks of a large organization) who talk with us to do it more effectively and solve their problems.

The problems we work on are as diverse as our clients, who span different sectors and industries (energy, transportation, retail, banking, telecommunications, public relations, etc.) and, when we do many projects with the same client, which is often, different functions within the same organization.

- [0]: https://news.ycombinator.com/item?id=25871632