Most of the training data comes from English datasets. Some datasets contain aligned sequences of Dutch and English text, from which the LLM learns the relationship between the token indices of Dutch phrases and those of their English counterparts. The quality of translation depends on the size of the aligned dataset and on the range of topics it covers. As a result, most of the model's knowledge is derived from English data, and Dutch is merely another representation of that same knowledge, learned during training.
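
To make the notion of aligned sequences as indices concrete, here is a minimal toy sketch. It assumes a whitespace tokenizer and two made-up sentence pairs; real LLMs use subword tokenizers and far larger corpora, and the names `build_vocab`, `encode`, and `aligned_pairs` are hypothetical, chosen only for illustration.

```python
# Toy illustration only: an aligned Dutch-English pair as the model sees it,
# i.e. as sequences of integer token indices rather than words.
# Assumption: whitespace tokenization and a tiny made-up corpus.

def build_vocab(sentences):
    """Assign each distinct lowercase token the next free integer index."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Map a sentence to the list of its token indices."""
    return [vocab[token] for token in sentence.lower().split()]


# Hypothetical aligned pairs: (Dutch, English).
aligned_pairs = [
    ("de kat slaapt", "the cat sleeps"),
    ("de hond slaapt", "the dog sleeps"),
]

# One shared vocabulary covers both languages.
vocab = build_vocab(s for pair in aligned_pairs for s in pair)

# Each pair becomes (Dutch indices, English indices).
encoded_pairs = [(encode(nl, vocab), encode(en, vocab)) for nl, en in aligned_pairs]

for nl_ids, en_ids in encoded_pairs:
    print(nl_ids, "->", en_ids)
```

In this sketch the two Dutch sequences differ in exactly one position, and so do the two English sequences. Regularities of this kind between index sequences are what a model can pick up from aligned data, and more pairs over more topics give it more of them to learn from.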