The uncleaned version of C4 is ~2.3 TB (https://huggingface.co/datasets/c4), which is a huge amount for text alone. Beyond that, I have worked with a video hosting site that had hundreds of TBs of data.
It occurs to me that there are several measures of "biggest": raw size, number of rows, etc. Which did you mean?