What does your BI stack look like?

Question

How does your company visualize and report on what&rsquo;s happening in the business?

seveibar · Accepted Answer

Postgres -> MetabaseI believe this is the best combination of cheap/powerful for early-stage startups. My very non-technical cofounder is able to use metabase's simple GUI interface to create graphs/insights (even joining and aggregating across tables!), and for anything complex I can step in a give a helper SQL query. We have around 10M records we aggregate around for daily insights.Metabase can also be run as a pseudo-desktop/web application to save additional cash (we don't do this though).

dwl285 · Answer

If you have a data team: Stitch / Segment -> BigQuery -> Dataform -> BigQuery -> Looker
I work with many companies helping them set up their data stack, and from what I've seen this is pretty much the optimal set up. These tools all require very little maintenance, are relatively cheap (compared to the man power required to set these things up from scratch internally), and scale well as companies grow and complexity increases.
If you don't have a data team: Segment -> Amplitude/Mixpanel
If you don't have a dedicated data team, you can do a lot with tools like Amplitude and Mixpanel. Get your engineering team set up with Segment and tracking all of the important interactions within your product, and set Segment to pass those events to Amp/Mix. Self serve analytics in these tools is then pretty easy and will serve you well until you start hiring a data team.
Full disclosure: I work for Dataform (used to be a customer, loved it so much I decided to join them)

rajivayyangar · Answer

As a data scientist (startup / Yahoo) turned product manager (4 startups), I've used a variety of stacks in the past - from plain SQL, to Mode, to Mixpanel, Looker, Interana, and Hive.
Recently we started using PopSQL (https://popsql.com) and love it.
When I don't have a dedicated data team, my philosophy is:
1) Make it difficult to get wrong answers
- Don't use Google Analytics. It's too easy to generate incorrect charts, and too difficult to verify them.
- Have a limited sandbox of reports for non-SQL writers
- Keep the SQL close to the report, so it's easy to verify the underlying query.
- Push people to learn even basic SQL
2) Make it quick and easy to ask iterative questions - PopSQL is way faster than Mode. Like 20x faster.
3) For metrics that matter (e.g. KPIs), instrument them directly and even build a custom analytics dashboard if it's important. (beware dashboard clutter! https://twitter.com/andrewchen/status/1193619877489192961 )

numlocked · Answer

At Grove, we are:Airflow -> S3 -> DBT with Spark/EMR or Redshift/Spectrum -> Redshift data marts -> LookerAt least, that&rsquo;s the way we like our pipelines to work. In practice we have a couple of extractions that land directly in Redshift (we extract Zendesk data, for instance, with Stitch Data). We use Snowplow for click stream analytics. And we&rsquo;ll likely move from Redshift to Snowflake (or mayybbbeee Bigtable) in Q2 of 2020.We used to do all of our transforms via DBT within Redshift but have been offloading the heavier-duty pieces (like Snowplow event processing) to Spark jobs because they were too taxing on Redshift.We&rsquo;ve gone through 3 &ldquo;generations&rdquo; of Looker reporting &mdash; gen 1 was just wrapping LookML around our schema and forcing Looker to do the joins and generate SQL for everything we wanted to know. Second generation involved a bunch of common table expressions within Looker itself that worked, but were developed without much thought as to data mart design. Gen 3 is where we are now with super deliberate scoping and implementation of warehouses in DBT. Before any of that we rolled our own tool [0].Happy to answer any questions. We&rsquo;ve gone from ~30 to ~1000 employees in the 3 years since we started using Looker and have learned a thing or two along the way.[0] https://github.com/groveco/django-sql-explorer

vitorbaptistaa · Answer

Luigi, AWS S3, DBT, Snowflake and Re:dash (currently analyzing Metabase or Looker to allow queries without SQL)
Luigi runs our scrapers and other workflow management tasks (e.g. DB backups).
All raw data lives in S3. We make an effort to be able to recreate the whole data warehouse from the raw data, so if any cleaning/normalization process fails, we have this safety net. I'm curious to hear if others use a similar pattern, or if there are better options.
DBT handles both loading the data from S3 into Snowflake (by creating the Snowflake Stages), and transforming the data in Snowflake. This isn't how DBT is used usually, but it felt wasteful to add Stitch or another tool to load the data into Snowflake, as snowflake supports it out of the box. I also created a `setup_snowflake` macro operation that creates our users, warehouses, databases, etc., in Snowflake (a kind of "poor man's Terraform")
I don't think Snowflake requires introduction. It's an amazing tool. We used Postgres before, but Snowflake is much much better, even though our DB is pretty small (~200 GB).
Finally, we use Re:dash as a BI, but I'm checking other options that allow usage without SQL (currently Metabase and Looker).

pcarolan · Answer

Looker on top of Redshift. Events streamed in from Segment and ELT transforms managed by Airflow. Looker gives you nice visualizations and charting with RBAC and some some lightweight ETL functionality. The real advantage of Looker is their modeling layer which sits on top of your physical data and is done via a DSL called LookML. Source control is managed via a wrapper around git. The end result is that analysts can move lightning fast building their own models in SQL or do analysis via drag and drop explorations. Looker's customer support is the best I've experienced and hasn't changed since Google acquired Looker. We're likely moving off Redshift to Snowflake in the next 6 months because it is slow to pick up changes and we want to decouple storage and compute for scaling reasons. Airflow is an incredible platform but needs customization and templated workflows to make it great. Data build tool (DBT) is how we plan on managing ETL code via Airflow in the near future. We're also adding Spark, but more for data science.

thecodemonkey · Answer

We're a small, bootstrapped company with 2 people. Some data is millions of rows others are billions.
Goal for us is KISS. Keeping everything as simple as possible -- both in terms of infrastructure, ease of use and cost.
Primary we're using Metabase in front of multiple MariaDB instances. Metabase is absolutely fantastic!
We also have a couple of additional small projects:
- A small Node.js app that receives events live via fluentbit, saves them off in MariaDB and sends text message notifications via Twilio when certain thresholds are exceeded
- A small "ETL" app that polls the Stripe and QuickBooks API to mirror data in a MariaDB database so we can easily access it from Metabase

veritas3241 · Answer

Stitch/Fivetran/Custom -> Snowflake -> dbt -> Periscope
Lots more documentation here https://about.gitlab.com/handbook/business-ops/data-team/
We have many of our KPIs embedded in the handbook (look for KPI index mapping link).
This is also our primary project where you can see all of our dbt code https://gitlab.com/gitlab-data/analytics/

crustacean · Answer

Sorry to be that one avoids-the-question comment, but:
Without design sessions to figure out your data store design (look up Kimball, Immon), and then monitoring/testing to make sure everything is running smoothly, any data stack will be hell.
Badly designed data brings fatigue and rancor and unnecessary work and is a huge hard-to-quantify money suck.

grahamdietz · Answer

We use Snowplow directly into Snowflake and report on this using Mode. We are a data native SaaS firm, and we set things up like this about 5 years ago and it has served us well. Streamlined and high performance. For all other sources, we use Stitch -> Snowflake, with one semi-custom Singer.io tap also running on Stitch. All this was simple to set up and means we don't have to worry about getting the data into one place. Of course, we then have lots of work to do in Snowflake and Mode to get the data the business needs. We share the reports from Mode to various teas via Slack. Hope this helps.

aaronharnly · Answer

At Amplify, Matillion/Stitch/Fivetran/Custom -> S3 + Snowflake -> Matillion for transforms -> Looker + Tableau.
It's important to say out loud that a lot of analysis also happens within product-local reporting systems, or in "offline" Google sheets.
We are currently working on building out the same stack, terminating in a Powered By Looker instance, for customer-facing aggregate reporting.
The engineering and data science teams do great with Looker, but Tableau still covers use cases for non-engineer business people (think: the finance, customer operations, training, etc) who want to start from source data plus a few sidebar Google sheets, tinker around, and then develop a rough report without waiting for iterative cycles with the data engineering team. We're thinking hard about how to get those use cases into the warehouse + Looker faster.

nm2259 · Answer

Data infrastructure is scattered, siloed, excel sheets and google sheets stored in various places (personal g drives, company g drives, some network share somewhere, an ms sharepoint site, sometimes in development git repos or various wikis)
Reporting infrastructure is manual massaging and lots of powerpoint.
My company isn't that large, but bigger than you'd think for such a "system".

thunderbong · Answer

After reading the comments here, I decided to give Metabase[0] a shot. Awesome analysis tool. Very impressed. Works even with large SQLite databases.[0]:https://www.metabase.com/

pantene · Answer

Data warehouse:- Amazon Redshift (data sourced from many different backend DBs; e.g. PostgreSQL, MySQL, other Redshift instances etc.)- BigQuery (Some teams store the data for their projects here. For reporting, they're selectively transferred to Redshift.)Reports:- Tableau (extracts of pre-aggregated data from Redshift)- Looker (connects live to and executes its own queries on Redshift)Anything that is based on pre-aggregated (rather small, e.g. n_rows < 5mil) data is visualized on Tableau. If users want to work on a self-service tool they use Looker which can connect to (cleaned) raw data optimized for this use case.ETLs for raw data run daily on Redshift. Reports are also scheduled on Redshift (with Airflow) but the refresh interval is report-specific.

lifeslogit · Answer

We keep it very simple as well.Postgres read-replica for cheap realtime stuff, Redshift for the larger sets, Airflow to manage transfers, and Metabase to visualize and share. We also collect data from a bunch of APIs, but those each run via their own job and store in Postgres.We also try to define wide short tables in Redshift that enable users to use Metabase to ask questions.I was very happy with Metabase. Being that we can't afford Looker right now (but we would all love to) it is pretty solid.

mickeyben · Answer

Stitch/Airflow/Other -> Snowflake -> dbt -> Snowflake
Everyhting goes through S3 because Snowflake storage is on it.
dbt is amazing, we began using it a month ago and it already transformed the way our data team work. It really is a value multiplier for everyone. Data engineers are happier because they don't need to write and maintain data transformations, analysts are happier because they can maintain their own SQL pipelines & the whole company is happier because we now have a great documentation tool to explore our data.
We also are big fans of Snowflake, make operating a data warehouse a breeze.
Then, we use a mix of Redash & Tableau for reporting.
Redash for static reporting (open to the whole company) & Tableau to create more complex data tools we expose to some internal teams; Marketing, Risk, Finance ...

asati · Answer

Postgres (read replica) -> RedashSegment -> Amplitude (but using it less and less)I am surprised no one mentioned https://redash.io/ till now (a lot cheaper than looker/mode/Periscope with all the basic functionality that you might need).

buremba · Answer

We use Segment for event tracking, Postgresql for transactional data and a number of spreadsheets and third-party integrations with Stitchdata. Since our data is relatively small, we use PG as a data-warehouse and heavily use DBT for ETL. The people who are familiar with SQL just use DataGrip, for the UI we use our tool https://rakam.io.Shameless plug: It's basically similar to Looker but it plays well with the product data and integration with Segment as well.

sejtnjir · Answer

Informatica -> ADLS -> SQL Server -> PowerBI also, in the same department: NiFi -> HDFS -> Spark -> Hive -> NiFi -> ADLS -> PowerBI and: NiFi -> Azure Event Hubs -> Azure container instances -> Event Hubs -> Streaming Analytics -> PowerBII'm pretty fond of the last stack for streaming dashboards in the sensor data realm.

chrisjc · Answer

-> Kafka-connect -> Snowflake -> SQL/sf-tasks -> Snowflake -> Looker -> Alooma -> -> custom -> Using Kafka-connect, we're able to serve up near real-time (2-5 mins) insights on device generated events.We probably need to use some kind of ETL tool to replace custom SQL and sf-tasks. Unfortunately, we haven't been able to find a tool that handles this in a non-batch (even if it's micro-batching) form. Snowflake change-streams and tasks allows us to ETL in a streaming-like fashion.We're ingesting everything from raw/transformed/aggregated events, micro-service DBs (as fast as they sprout up), netsuite/salesforce, mixpanel, MySQL, MongoDB... Billions of rows of data across multiple data-source accessible to internal and external customer in a matter of seconds. It's been an incredible challenge, especially with only a team of 2-5 people.

kfk · Answer

Python, Redshift, Tableau. But if you are starting from scratch I&rsquo;d suggest to focus on the etl piece with python and send pdf reports generated with latex. Too many people get distracted by the fancy reporting stuff and don&rsquo;t do the 2 things that matter: good etl with good sql db; analytics that is tied to results the company cares about

iblaine · Answer

It's interesting seeing the various stacks being used...Here at One Medical:
[ onemedical.com, mixpanel, Google Sheets, Salesforce, etc ] -> S3 (Amazon EMR) -> [ Tableau, Aurora MySQL ]
It's a nice & clean stack for data engineering.
Airflow is used for orchestration and is heavily customized with plugins. Zeppelin notebooks are used by most analysts.
We'll probably be replacing Aurora MySQL w/an MPP database like Snowflake or Redshift. MySQL is a crutch, but easy to deploy and manage to a point.
Several python frameworks also do various ETL & data science functions, but everything generally revolves around S3 & Airflow.
Amazon EMR is a great swiss army knife for ETL. Moving between Pig, Spark & Hive is trivial. Tableau is a bit of a challenge. Tableau seems to give users too much rope to hang themselves with.
Also, we're hiring: https://www.onemedical.com/careers/

soumyadeb · Answer

In my previous company, we did TreasureData->Tableau.
TreasureData is a platform like Segment and lets you bring your customer event data as well as data from SaaS tools (like Salesforce, Zendesk) into a data warehouse hosted by TreasureData. It worked great but had the downside that all the data was in TreasureData and we were kind of locked into it. Segment kind of solves that problem because it has connectors to Redshift/Snowflake etc so you can keep ownership of your data warehouse but the warehouse sync delay (in our version) was a problem.
Also, BI was just one of the use cases. We wanted to send the data to 3rd party tools (like Facebook ads) based on some logic (some were simple rules but we had complex ML driven lead scoring models too). TreasureData was more flexible on being able to run those rules and activate the results but ideally we wanted to run them on top of our own warehouse in AWS.

thenaturalist · Answer

Sharing an awesome business intelligence tools list [0] I created ca. 2 years ago to get a better understanding of what's out there.[0]: https://github.com/thenaturalist/awesome-business-intelligen...

gavinray · Answer

It covers many more bases than business intelligence, but Forest Admin. And most of its functionality is free.
https://www.forestadmin.com/
It generates a very beautiful CRUD admin dashboard automatically via reflection.
Allows building drag-and-drop data viz dashboards, saving commonly-used custom queries as "scopes", and even building your own HTML views if you need to get really fancy (think tracking live deliveries on a map, etc).
Also has Stripe and Intercom integrations.
I really can't hype this enough. Have been using this on nearly every app I've built the past three years.
The core team also answered my emails as a never-paying customer within 2-3 days the few times I have mailed them over the years I've used it.

mjirv · Answer

Stitch -> Redshift (with DBT for data modeling) -> Looker
For a smaller company, it makes a lot of sense for us to use off-the-shelf tools like these rather than rolling our own pipelines. Every once in a while we run into issues where Stitch isn't quite as flexible as we'd like (can't set dist/sortkeys etc), but that's the tradeoff you make for its simplicity.
DBT is amazing and I can't recommend it highly enough!
Looker works for analytics, but we're starting to do more data-sciency work, and it doesn't have built-in Jupyter notebooks or anything like that. Does anyone have a solution they use and like for hosting and sharing notebooks alongside a stack like this?

mmckelvy · Answer

A lot of these answers seem to focus on app analytics (e.g. collecting clicks, page views, etc. from Segment). How are people collecting / integrating financial data (e.g. sales, subscriptions, expenses)?

mharroun · Answer

Ive done 2 data pipelines, one alot like what most people are are talking.
The other I had to build for a startup with millions of monthly uniques but only seed funding (cant do a 30+k a month data eng bill).
Went with custom event emission->kenisis->(druid & S3) and used imply (https://imply.io/). Easy real time data analytics, auto "enrichment with druid lookups from a RDBMS, and a simple ui for slice/dice investigation metrics. All in all costed lest then the cheapest looker license.

cozuya · Answer

We made our own reporting. Seems crazy to pay multiple 3rd parties to look at our own data. We're using Apache Druid on the back end. (Giant fortune 50 non tech company)

topogios · Answer

Daily ETL in form of sql scripts from prod dbs to internal data warehouse for use via Saiku, case-by-base Shiny apps and some bash email-reporting.

wilbo · Answer

Postgresql > pgadminFor less technical people > metabaseFor automated reporting and storing historic trends > KlipfolioFor near real time automated operational reporting > kloud.io

css · Answer

Formerly: Oracle Hyperion -> ETL to Azure -> PowerBICurrently: Internal Data Warehouse -> RDS -> Internal web app (Django, React)

zshrdlu · Answer

Custom ETL -> BiqQuery -> Datastudio & MetabaseWe initially considered Stitch and other -as-a-service ETL but ~500 lines of Python later we had our own thing. I also experimented with FDW: https://unwindprotect.com/postgres-fdw

mister_hn · Answer

Does Excel qualify here?

elwell · Answer

On a meta note, if you're interested in viewing & sharing stacks, that's the primary feature of the startup I'm working on: Vetd (app.vetd.com). The communities we host (often VC portfolio companies) share their stacks and leverage for discounts.- CTO (chris at vetd.com)

mattbillenstein · Answer

We use BigQuery as the data warehouse - there are airflow jobs to periodically load data into BQ - most of these are simple python scripts.Metabase for most of our simple BI metrics - Tableau for some advanced users doing more complicated stuff.

jconley · Answer

App, Web, IoT device send realtime events to a fluentd system with many events going to Segment. Web front end also loads Segment for various ad trackers. IoT device also uploads telemetry and other logs in both realtime and asynchronously and those end up in some Postgres databases.
Segment syncs our event data periodically to our data warehouse (Redshift).
We have a readonly replica of the eCommerce DB for live stats (Postgres).
And there is a time series db for system/IoT telemetry (InfluxDB).
Most of our BI queries are done in Mode. Some are done in Grafana (data in our InfluxDB and some live data). Spot check against Google Analytics or FB ad tracker...

beckingz · Answer

MariaDB > Metabase Stitch/Segment > BigQuery > Metabase Stitch/Segment > BigQuery > Google Data Studio (Curmudgeonly stakeholders refuse to log in to BI tools... but require reporting anyways...)

exabrial · Answer

MySQL -> replication -> MySQL -> Metabase & TableauWe want to switch to postgres because of features, but "The devil you know is better than the devil you don't", so we just kinda sticking with MySQL.

kmerrol · Answer

A bit different from the crowd here. After much searching, our BI stack has been fully converted to data streaming using: StreamSets DataCollector >> Kafka >> AWS S3 >> AWS Redshift >> Dremio >> Jupyter Notebooks. Great to have Jupyter take on data prep and data analysis tasks while external tables in Redshift are very fast with minimal ETL. Dremio has been great as a virtual data mart with great performance on Parquet data in S3 and desktop BI tool integration.

adrianN · Answer

I work at Snowflake where we're dogfooding our own stuff.

mike_lee_28 · Answer

We've recently changed from : Segment -> BigQuery -> Looker to Segment -> BigQuery -> Dataform -> BigQuery -> Looker.
The addition of Dataform into our stack has completely changed the game, allowing us to maintain one source of objective truth and maximise looker (minimising the joining etc. in LookML, instead pointing it straight at tables in BigQuery).
This in turn means our downstream data users are happier to use a fast and reliable looker instance!

huy · Answer

Postgres, Pipedrive, Zendesk -> BigQuery, then BI using Holistics.ioHolistics handles both ETL, transformation and self-service visualiation (Looker alternative), all in 1 tool.

sixo · Answer

Custom pipes mostly coordinated by Luigi (heavily customized) -> Redshift -> DBT -> Looker.
Some spark mixed in at the ingestion and transformation phases.
Like someone else said in this thread, we're currently battling Redshift scaling problems and are looking to offload more of the raw data to S3 using Spark to create read views.
No data catalog right now but the Looker setup is extremely well-maintained. Hoping to deploy Lyft's Amundsen to fix this more generally.

thingsilearned · Answer

Dave from Chartio here, wanting to share our new book describing the 4 stages of setting up your ideal data stack here - https://dataschool.com/data-governance/.It covers BI a bit, but mostly the stack that BI sits on top of. It's an open book so we're always looking for suggestions and experiences such as those shared here.

TraceOn37 · Answer

We built a tool to help non-technical folk transform their data into useful formats for BI (https://www.data-monkey.com). The tool currently supports JSON, CSV, Excel and text files, and comes with features to merge/filter/transform data. It's free to try and use if you're interested - we'd love to hear your feedback!

sphix0r · Answer

We built our own tool at Datastreams (https://datastreams.io) to collect data. We currently collect several thousands of events per second, mainly web data.
Events are mainly streamed to one of the following: Cloud buckets(S3, etc), HDFS, SQL-db or Cassandra.
Most clients use one of the following visualization tools: PowerBI, Qlik or Tableau.
Our clients are mid to enterprise size.
Disclaimer: I work at Datastreams

chrisacky · Answer

Is there any way that we can provide BI to our customers using an OSS tool?
Our databases store all our users data. I'm thinking of using something like pgsync to sync all database postgres to a new postgres and then having redash or metabase set up to connect.
Alternatively using locked filters on metabase and embedded questions.
All our data for our users are in postgres and they very much want BI insights. Not sure how easiest way for this...

timosch424 · Answer

Ideally: Podio , excel sheets, other program backend-> Pentaho -> MySQL/MongoDB on Google Cloud, Google Analytics-> R scripting(DBI connections, BigQuery and Hadoop managed through R) -> shiny-proxy, APIs, Hadoop for Big Data computation-> Rmarkdown Reports, shiny-dashboards, a little Tableau and a little Power BI.MySQL will be moved to Postgres for better Performance soon.

subhajeet2107 · Answer

Custom Collector(analytics) -> Clickhouse -> Custom ETL Scripts -> Clickhouse -> Re:dash We tried metabase which is awesome , but Redash is also great and easy to setup as well if your team knows sql then Redash is better We also looked at druid and after some benchmarking we settled on Clickhouse, realtime queries even without etl runs within seconds in clickhouse

rmk2 · Answer

We are a small data consultancy, so we use other/more diverse things for customers, but our internal stack is fairly simple:
Hubspot/Jira/G Suite → (Python) → PostgreSQL → (SQL) → Tableau
Since we are Tableau partners, we have a bunch of internal licences either way. We host Tableau Server, ETL, and PostgreSQL ourselves, all on Centos cloud servers.

derekmcloughlin · Answer

{SQL Server, Oracle} -> ETL to SQL Server (Hybrid graph db)-> SQL Server Analysis Services -> {Power Bi, Tableau}

Dramatize · Answer

Product Manager at Replica here.I've set up the following stack:Segment -> (postgres DB, Intercom, Heap Analytics, Full Story)postgress = Data dump for future usage. Intercom = CMS + Communication. Heap Analytics = Product analytics. Full Story = Session analytics.https://replicastudios.com/

wrs · Answer

Various -> Airflow -> BigQuery -> LookerWe have a variety of data sources, from Mixpanel to PostgreSQL to black-box systems that can only dump out CSV files. We use Airflow to manage pulling from those, doing some transforms, and writing everything into BigQuery. Looker then handles the reporting and browsing.

triiimit · Answer

ETL: Python & SQLWarehouse: PostgreSQLReporting platform: LookerEasy, agile, and cheap.

eibhinn · Answer

Oracle -> IBM CognosA nightly rebuild using ETL scripts written in sql. Not cheap or glamorous, but solid for our needs.

jeffnappi · Answer

Data Sources (Analytics, DBs etc) -> Stitch (stitchdata.com) -> Redshift -> Periscope (periscopedata.com)This setup has worked pretty well for us so far. I've learned of a few tools from this thread that might help us to better manage data sets and views - specifically DBT and Dataform.

wladow · Answer

@sixo @numlocked I'd be happy to share more about Snowflake's architecture. As others mentioned in this thread, Snowflake completely separates storage from compute -- eliminating Redshift scaling issues.
Drop me a message, would love to chat.
william.ladow@snowflake.com

enra · Answer

We use Retool.com with our Postgres & other data sources.
The benefit we can also build tooling and workflows, in addition to the metrics, tables and charts.
Early on you don't necessary don't exactly what end up needing, so malleable tool is useful.

bass_case · Answer

Are any of you offering embedded analytics in your products? Mostly applies to mid-market/enterprise SaaS platforms but interested in learning more about how you offer analytics/BI to your customers in your products.

importantbrian · Answer

We use the standard Microsoft stuff. SQL Server, SSIS, SSRS, PowerBI. There are some pain points, but for the most part, it works, and it's pretty inexpensive if you're already a Microsoft shop.

in9 · Answer

No mention for Athena? Right now we are heavily using it at our org.

sunasra · Answer

At Qubole, I have this setup for internal reportingMySQL -> Data export using Sqoop through Airflow -> S3 -> Spark -> Jupyter NotebookPS: Qubole is a data platform which makes ETL pipeline setup easy.

throwaway49409 · Answer

how does one end up doing BI | Data Engineering ? currently a frontend | full-stack JS dev. tired of that world and want to switch to something stable ?

sdpurtill · Answer

Datacoral, which is our ETL and manages all materialized views within our warehouse, replacing our need for data engineers Redshift Mode & Looker

spullara · Answer

Kinesis Firehose -> S3 -> Snowflake -> Sigma

flowerlad · Answer

MySQL and Pebble Reports [1] for reporting.[1] http://pebblereports.com/

robbiemitchell · Answer

Infra: Postgres -> S3 -> LookerBusiness: Segment -> customer.io/Zapier/Heap Analytics + LookerSupport/Success: Intercom+Slack -> frame.ai

EdwardDiego · Answer

Kafka streaming to validate and attach model data, fed into Druid by Flink, queried via a custom front end.

Tharkun · Answer

We basicallly don't have a BI stack. Which is silly, but I guess we've had other priorities.

8589934591 · Answer

Client I'm working for:* SAP -> SAP services -> tableau. * Some depts use Excel -> Python -> tableau.

kakoni · Answer

Embulk + Airflow => Postgres => Apache Superset(as UI/Tool)ELT process so more DBT in the future

user7878 · Answer

HubSpot -> AzureSQL -> PowerBIHubSpot -> AzureSQL -> TableauFully automated syncing with user friendly signup

valcker · Answer

Data warehouse: Postgres Data visualizations and reporting: Tableau, DataGrip

arkiver · Answer

We use a simple `Postgres -> CSV -> Pandas` pipeline.

tylerjaywood · Answer

ETL: Domo data connectors, Python, SQLWarehouse: PostgreSQLReporting platform: Domo

vitinho_ · Answer

does anyone uses elasticsearch + kibana?

roystonvassey · Answer

apps on pivotal to extract, transform, combine data -> to ms sql db -> powerbi to visualize

eeZah7Ux · Answer

BI?

endlessvoid94 · Answer

Postgres -> QueryClips

vitinho_ · Answer

does anyone use ElasticSearch + Kibana?

What does your BI stack look like?

How does your company visualize and report on what’s happening in the business?

After reading the comments here, I decided to give Metabase[0] a shot. Awesome analysis tool. Very impressed. Works even with large SQLite databases.[0]:https://www.metabase.com/

Postgres (read replica) -> RedashSegment -> Amplitude (but using it less and less)I am surprised no one mentioned https://redash.io/ till now (a lot cheaper than looker/mode/Periscope with all the basic functionality that you might need).

Sharing an awesome business intelligence tools list [0] I created ca. 2 years ago to get a better understanding of what's out there.[0]: https://github.com/thenaturalist/awesome-business-intelligen...

A lot of these answers seem to focus on app analytics (e.g. collecting clicks, page views, etc. from Segment). How are people collecting / integrating financial data (e.g. sales, subscriptions, expenses)?

We made our own reporting. Seems crazy to pay multiple 3rd parties to look at our own data. We're using Apache Druid on the back end. (Giant fortune 50 non tech company)

Daily ETL in form of sql scripts from prod dbs to internal data warehouse for use via Saiku, case-by-base Shiny apps and some bash email-reporting.

Postgresql > pgadminFor less technical people > metabaseFor automated reporting and storing historic trends > KlipfolioFor near real time automated operational reporting > kloud.io

Formerly: Oracle Hyperion -> ETL to Azure -> PowerBICurrently: Internal Data Warehouse -> RDS -> Internal web app (Django, React)

Custom ETL -> BiqQuery -> Datastudio & MetabaseWe initially considered Stitch and other -as-a-service ETL but ~500 lines of Python later we had our own thing. I also experimented with FDW: https://unwindprotect.com/postgres-fdw

Does Excel qualify here?

On a meta note, if you're interested in viewing & sharing stacks, that's the primary feature of the startup I'm working on: Vetd (app.vetd.com). The communities we host (often VC portfolio companies) share their stacks and leverage for discounts.- CTO (chris at vetd.com)

We use BigQuery as the data warehouse - there are airflow jobs to periodically load data into BQ - most of these are simple python scripts.Metabase for most of our simple BI metrics - Tableau for some advanced users doing more complicated stuff.

MariaDB > Metabase Stitch/Segment > BigQuery > Metabase Stitch/Segment > BigQuery > Google Data Studio (Curmudgeonly stakeholders refuse to log in to BI tools... but require reporting anyways...)

MySQL -> replication -> MySQL -> Metabase & TableauWe want to switch to postgres because of features, but "The devil you know is better than the devil you don't", so we just kinda sticking with MySQL.

I work at Snowflake where we're dogfooding our own stuff.

Postgres, Pipedrive, Zendesk -> BigQuery, then BI using Holistics.ioHolistics handles both ETL, transformation and self-service visualiation (Looker alternative), all in 1 tool.

{SQL Server, Oracle} -> ETL to SQL Server (Hybrid graph db)-> SQL Server Analysis Services -> {Power Bi, Tableau}

Product Manager at Replica here.I've set up the following stack:Segment -> (postgres DB, Intercom, Heap Analytics, Full Story)postgress = Data dump for future usage. Intercom = CMS + Communication. Heap Analytics = Product analytics. Full Story = Session analytics.https://replicastudios.com/

ETL: Python & SQLWarehouse: PostgreSQLReporting platform: LookerEasy, agile, and cheap.

Oracle -> IBM CognosA nightly rebuild using ETL scripts written in sql. Not cheap or glamorous, but solid for our needs.

Data Sources (Analytics, DBs etc) -> Stitch (stitchdata.com) -> Redshift -> Periscope (periscopedata.com)This setup has worked pretty well for us so far. I've learned of a few tools from this thread that might help us to better manage data sets and views - specifically DBT and Dataform.

@sixo @numlocked I'd be happy to share more about Snowflake's architecture. As others mentioned in this thread, Snowflake completely separates storage from compute -- eliminating Redshift scaling issues.Drop me a message, would love to chat.william.ladow@snowflake.com

We use Retool.com with our Postgres & other data sources.The benefit we can also build tooling and workflows, in addition to the metrics, tables and charts.Early on you don't necessary don't exactly what end up needing, so malleable tool is useful.

Are any of you offering embedded analytics in your products? Mostly applies to mid-market/enterprise SaaS platforms but interested in learning more about how you offer analytics/BI to your customers in your products.

We use the standard Microsoft stuff. SQL Server, SSIS, SSRS, PowerBI. There are some pain points, but for the most part, it works, and it's pretty inexpensive if you're already a Microsoft shop.

No mention for Athena? Right now we are heavily using it at our org.

At Qubole, I have this setup for internal reportingMySQL -> Data export using Sqoop through Airflow -> S3 -> Spark -> Jupyter NotebookPS: Qubole is a data platform which makes ETL pipeline setup easy.

how does one end up doing BI | Data Engineering ? currently a frontend | full-stack JS dev. tired of that world and want to switch to something stable ?

Datacoral, which is our ETL and manages all materialized views within our warehouse, replacing our need for data engineers Redshift Mode & Looker

Kinesis Firehose -> S3 -> Snowflake -> Sigma

MySQL and Pebble Reports [1] for reporting.[1] http://pebblereports.com/

Infra: Postgres -> S3 -> LookerBusiness: Segment -> customer.io/Zapier/Heap Analytics + LookerSupport/Success: Intercom+Slack -> frame.ai

Kafka streaming to validate and attach model data, fed into Druid by Flink, queried via a custom front end.

We basicallly don't have a BI stack. Which is silly, but I guess we've had other priorities.

Client I'm working for:* SAP -> SAP services -> tableau. * Some depts use Excel -> Python -> tableau.

Embulk + Airflow => Postgres => Apache Superset(as UI/Tool)ELT process so more DBT in the future

HubSpot -> AzureSQL -> PowerBIHubSpot -> AzureSQL -> TableauFully automated syncing with user friendly signup

Data warehouse: Postgres Data visualizations and reporting: Tableau, DataGrip

We use a simple `Postgres -> CSV -> Pandas` pipeline.

ETL: Domo data connectors, Python, SQLWarehouse: PostgreSQLReporting platform: Domo

does anyone uses elasticsearch + kibana?

apps on pivotal to extract, transform, combine data -> to ms sql db -> powerbi to visualize

BI?

Postgres -> QueryClips

does anyone use ElasticSearch + Kibana?

After reading the comments here, I decided to give Metabase[0] a shot. Awesome analysis tool. Very impressed. Works even with large SQLite databases.
[0]:https://www.metabase.com/

Postgres (read replica) -> Redash
Segment -> Amplitude (but using it less and less)
I am surprised no one mentioned https://redash.io/ till now (a lot cheaper than looker/mode/Periscope with all the basic functionality that you might need).

Sharing an awesome business intelligence tools list [0] I created ca. 2 years ago to get a better understanding of what's out there.
[0]: https://github.com/thenaturalist/awesome-business-intelligen...

Postgresql > pgadmin
For less technical people > metabase
For automated reporting and storing historic trends > Klipfolio
For near real time automated operational reporting > kloud.io

Formerly: Oracle Hyperion -> ETL to Azure -> PowerBI
Currently: Internal Data Warehouse -> RDS -> Internal web app (Django, React)

Custom ETL -> BiqQuery -> Datastudio & Metabase
We initially considered Stitch and other -as-a-service ETL but ~500 lines of Python later we had our own thing. I also experimented with FDW: https://unwindprotect.com/postgres-fdw

On a meta note, if you're interested in viewing & sharing stacks, that's the primary feature of the startup I'm working on: Vetd (app.vetd.com). The communities we host (often VC portfolio companies) share their stacks and leverage for discounts.
- CTO (chris at vetd.com)

We use BigQuery as the data warehouse - there are airflow jobs to periodically load data into BQ - most of these are simple python scripts.
Metabase for most of our simple BI metrics - Tableau for some advanced users doing more complicated stuff.

MySQL -> replication -> MySQL -> Metabase & Tableau
We want to switch to postgres because of features, but "The devil you know is better than the devil you don't", so we just kinda sticking with MySQL.

Postgres, Pipedrive, Zendesk -> BigQuery, then BI using Holistics.io
Holistics handles both ETL, transformation and self-service visualiation (Looker alternative), all in 1 tool.

Product Manager at Replica here.
I've set up the following stack:
Segment -> (postgres DB, Intercom, Heap Analytics, Full Story)
postgress = Data dump for future usage. Intercom = CMS + Communication. Heap Analytics = Product analytics. Full Story = Session analytics.
https://replicastudios.com/

ETL: Python & SQL
Warehouse: PostgreSQL
Reporting platform: Looker
Easy, agile, and cheap.

Oracle -> IBM Cognos
A nightly rebuild using ETL scripts written in sql. Not cheap or glamorous, but solid for our needs.

Data Sources (Analytics, DBs etc) -> Stitch (stitchdata.com) -> Redshift -> Periscope (periscopedata.com)
This setup has worked pretty well for us so far. I've learned of a few tools from this thread that might help us to better manage data sets and views - specifically DBT and Dataform.

@sixo @numlocked I'd be happy to share more about Snowflake's architecture. As others mentioned in this thread, Snowflake completely separates storage from compute -- eliminating Redshift scaling issues.
Drop me a message, would love to chat.
william.ladow@snowflake.com

We use Retool.com with our Postgres & other data sources.
The benefit we can also build tooling and workflows, in addition to the metrics, tables and charts.
Early on you don't necessary don't exactly what end up needing, so malleable tool is useful.

At Qubole, I have this setup for internal reporting
MySQL -> Data export using Sqoop through Airflow -> S3 -> Spark -> Jupyter Notebook
PS: Qubole is a data platform which makes ETL pipeline setup easy.

MySQL and Pebble Reports [1] for reporting.
[1] http://pebblereports.com/

Infra: Postgres -> S3 -> Looker
Business: Segment -> customer.io/Zapier/Heap Analytics + Looker
Support/Success: Intercom+Slack -> frame.ai

Client I'm working for:
* SAP -> SAP services -> tableau. * Some depts use Excel -> Python -> tableau.

Embulk + Airflow => Postgres => Apache Superset(as UI/Tool)
ELT process so more DBT in the future

HubSpot -> AzureSQL -> PowerBI
HubSpot -> AzureSQL -> Tableau
Fully automated syncing with user friendly signup

ETL: Domo data connectors, Python, SQL
Warehouse: PostgreSQL
Reporting platform: Domo