HACKER Q&A
📣 mnrmen

E-commerce devs: how do you manage and sync a large number of products?


We have gotten multiple requests in the past few months from clients looking to manage and import products (plus keep stock in sync) to their e-commerce platforms. The data usually comes from multiple sources (APIs or CSV files) and providers. Additionally, they want to edit the products (change categories, prices, names, etc.). The tools we've used up to this point aren't enough for a large volume of products (over 100k).

How have you approached this - are there any tools on the market that have proven to work well for you or is the only approach building a custom solution?


  👤 eastbayjake Accepted Answer ✓
The category of product you're looking for is a Product Information Management (PIM) system, a standard component in many eCommerce architectures. It's a core component of big eCommerce platforms like Salesforce and SAP, but there are also "composable commerce" vendors like Commercetools or Spryker or Fabric who are happy to sell you just that API. There are also open source alternatives, but most are geared towards mom-and-pop scale shops whose alternative would be paying for Shopify.

PIM systems are most helpful when your "factual" product catalog (dimensions, weights, country of origin, etc) needs to be enriched with inputs from your marketing and merchandising teams (categories and names, but also search-optimized product description content, managing high-quality product images, etc).

PIM systems do not typically handle pricing and inventory, which are usually abstracted into separate services (or, in large enterprises, handled by an ERP system). One of the benefits of this approach at scale is that your marketing/sales teams view pricing as a key lever, and having static pricing in a PIM limits your ability to act on digital marketing levers like promos or A/B testing a demand curve for prices. PIMs are also typically not set up to have strong governance around who can set/change prices -- this can lead to perverse incentives if, e.g., salespeople have edit permission to change prices!

Fabric is just one of the PIM vendors out there, but they do a really good job explaining the PIM category and I think it handles 95% of what you're looking for in your description: https://fabric.inc/blog/pim-software/


👤 JLuterek
Disclaimer: I work for Elastic Path. We are an ecommerce solution with a leading product management solution (Gartner recognized us for both ecommerce and PIM) and a new integration hub that can sync data across systems. For example, it can sync an entire product catalog to a dedicated search provider in just a couple of minutes. I'm not adding a link as that feels scammy; if you are interested you can find it. The rest of this post is unbiased advice.

---

Now that the disclaimer is out of the way, let me chat about what I've seen with other systems before joining Elastic Path (EP).

Products are more difficult to model between systems, but easier to sync as they are less dynamic. You can find plenty of ETL (Extract/Transform/Load) systems, ranging from inexpensive to premium and costly. There are also many low-code offerings that allow you to map fields. I don't know which ecommerce system you are using, but almost none include this functionality. Some systems may include a community-created plugin or basic CSV upload, but they will not be as robust as a dedicated system.

If you write a custom product sync, it can often be a basic console application running in a container in the cloud on a set schedule. Nothing fancy, just mapping fields and calling APIs with very good error handling and alerts.
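
To make that concrete, here is a minimal sketch of the "map fields, call the API, handle errors" loop. Everything in it is an assumption for illustration: the field names in FIELD_MAP are made up, and `push` stands in for whatever call your target platform's API actually needs.

```python
from typing import Callable, Iterable

# Hypothetical mapping from source column names to target field names.
FIELD_MAP = {"sku": "sku", "title": "name", "price_eur": "price"}


def map_product(row: dict) -> dict:
    """Rename source fields to target fields; raises KeyError if one is missing."""
    return {target: row[source] for source, target in FIELD_MAP.items()}


def sync_products(rows: Iterable[dict], push: Callable[[dict], None]) -> list:
    """Push every row to the target and collect failures instead of stopping."""
    failures = []
    for row in rows:
        try:
            push(map_product(row))
        except Exception as exc:  # one bad record should not abort the whole run
            failures.append((row.get("sku"), repr(exc)))
    return failures
```

The returned list of failures is what you would feed into your alerting at the end of each scheduled run.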

I have also seen multiple systems participate in product data management. In this scenario a dedicated PIM is set up to create the initial product data. It is then synced to the ecommerce provider, where a merchandising team may do additional categorization, pricing, or web enrichment. It is important that the data is always a one-way flow and that it's clear any data synced from the PIM can be overwritten at any time. This solution works and is typically adopted by large companies who need the advanced capabilities of a PIM. With the exception of EP, ecommerce engines only provide light product management capabilities.
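
A rough sketch of what "one-way flow" can look like in code, assuming you keep an explicit list of which fields the PIM owns. The field names and the `apply_pim_update` helper are hypothetical, not taken from any particular product.

```python
# Hypothetical split of field ownership; adjust to your own data model.
PIM_OWNED = {"name", "description", "weight_kg"}


def apply_pim_update(store_record: dict, pim_record: dict) -> dict:
    """Return the store record with every PIM-owned field overwritten by the PIM.

    Fields outside PIM_OWNED (categories, prices set by merchandising, ...)
    are left untouched, and nothing ever flows back to the PIM.
    """
    merged = dict(store_record)
    for field in PIM_OWNED:
        if field in pim_record:
            merged[field] = pim_record[field]
    return merged
```

Making the ownership list explicit is one way to keep it obvious which edits in the store will survive the next sync and which will be overwritten.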

Inventory is much more difficult to sync as it is constantly changing. You will want to be very clear which system is your source of truth. Ideally each inventory lookup can call directly to that source of truth to identify the inventory levels, but this is not always possible, and then you are left syncing information. The inventory sync will never be perfect; it will always be off by a bit. Most commerce solutions don't have strong inventory capability, so it is left to an ERP or backend system that can handle inventory balancing, allocation, and other advanced functionality. Unfortunately, all of this functionality slows down the system and makes it a poor choice for live API inventory lookups. In this scenario you would want to sync a cache of high-level inventory data into your ecommerce provider or other system.

You can attempt to use a low-code syncing tool or ETL, but they will often be too slow. So an option is to build it yourself, but in this case a single console application will also take too long. The best way I have seen is to ingest the inventory data in chunks and then call a serverless function for each entry. That function will then make the update in your target system. This will allow you to sync the data very quickly, often fast enough to take down the ecommerce software, so throttling may be necessary.
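
Here is a small local sketch of the chunk-and-fan-out idea. It uses a thread pool as a stand-in for serverless functions, so it is not the real infrastructure, just the shape of it. The `update` callable, `chunk_size`, and `max_workers` are assumptions; `max_workers` plays the role of the throttle.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def sync_inventory(entries, update, chunk_size=500, max_workers=8):
    """Apply update(entry) to every entry, one chunk at a time.

    max_workers caps how many updates are in flight at once, which is the
    throttle that protects the target system.
    """
    done = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for chunk in chunked(entries, chunk_size):
            # Iterating the results waits for the whole chunk to finish
            # and re-raises the first exception from update(), if any.
            for _ in pool.map(update, chunk):
                done += 1
    return done
```

In a real setup each `update` call would be a function invocation or a queue message, and the concurrency limit would be set on the platform side rather than in the caller.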

Because the synced numbers never match exactly, it's worth setting up inventory thresholds based on throughput. Basically, if inventory is below X, the item is either treated as out of stock (OOS) or requires a callback to the source of truth during checkout. This ensures items are not oversold.
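
As an illustration of the second option (callback during checkout), a threshold check could look roughly like this. The function name, its parameters, and the `live_lookup` callable are all hypothetical; `live_lookup` stands in for whatever query your source of truth supports.

```python
from typing import Callable


def can_sell(sku: str, wanted: int, cached_qty: int, threshold: int,
             live_lookup: Callable[[str], int]) -> bool:
    """Trust the cached quantity when stock is comfortably above the threshold.

    At or near the threshold the cache may be stale, so fall back to a live
    lookup against the source of truth (slower, but safe against overselling).
    """
    if cached_qty >= threshold and cached_qty >= wanted:
        return True
    return live_lookup(sku) >= wanted
```

For the simpler "treat as OOS" option you would return False instead of calling `live_lookup`. A sensible value for the threshold depends on how fast the item sells between syncs.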

If you provide a bit more detail about which systems you are syncing from and to, I can give more detailed information on tools or infrastructure.