Over the past two decades, pedometrics has evolved as a research area at the intersection of soil science and machine learning (ML). Although the first approaches date back to the 1990s, a stronger focus on integrating ML methods to generate soil maps and to extract pedological knowledge has emerged only within the past 10 years. These approaches mainly comprise regression and supervised classification, feature-importance analysis, spatial data mining, validation, feature construction, uncertainty analysis, and sampling design. The main focus is on spatial modelling, i.e. generating maps of soil properties. This has become increasingly important in the context of climate change, food security, biodiversity loss, and environmental pollution, because soils, as the uppermost layer of the Earth’s surface, link the bio-, hydro-, atmo-, and lithosphere and act as filters and transformers.
In this project, we aim to develop and improve methods for digital soil mapping and for the related questions of feature construction and sampling design. The cornerstones of our efforts will be methods that allow for uncertainty quantification, that integrate domain knowledge and topographical information, and that provide a better understanding of the importance and interactions of the underlying soil-forming processes. The framework of Gaussian processes provides a well-suited basis for all of these demands. One candidate method under consideration is the construction of covariance functions informed by terrain type and topography. The idea is to encode domain knowledge about how likely it is to find similar values of the target attribute at locations separated by certain land features, such as rivers, mountains, hillslopes, or rough terrain.
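A minimal sketch of a terrain-informed covariance function, under the simplifying assumption that each location carries a discrete terrain-unit label (for instance, which side of a river it lies on). A squared-exponential kernel is multiplied by a factor that lowers the covariance between locations in different units. The function names terrain_kernel and squared_exponential, the labels, and the parameters cross and same are hypothetical and only illustrate the idea; this is not the covariance function the project will use.

```python
import numpy as np


def squared_exponential(X1, X2, lengthscale):
    # Pairwise squared Euclidean distances between the rows of X1 and X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def terrain_kernel(X1, r1, X2, r2, lengthscale=1.0, cross=0.2, same=0.8):
    """Squared-exponential kernel damped between different terrain units.

    r1, r2 hold a hypothetical integer terrain-unit label per location
    (for example, the side of a river). For cross, same >= 0 the factor
    cross + same * [r == r'] is positive semi-definite, so by the Schur
    product theorem the product is again a valid covariance function.
    """
    same_unit = (r1[:, None] == r2[None, :]).astype(float)
    return squared_exponential(X1, X2, lengthscale) * (cross + same * same_unit)


# Four locations on a transect; a hypothetical river separates x < 1.5 from x > 1.5.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
r = np.array([0, 0, 1, 1])
K = terrain_kernel(X, r, X, r)
# Locations 0-1 and 1-2 are equally far apart, but only the pair 1-2 crosses the river.
print(K[0, 1], K[1, 2])
```

Because locations 1 and 2 are as far apart as locations 0 and 1, their covariance differs only by the factor cross / (cross + same), which is how the sketch expresses "less likely to be similar across a river". In practice the label factor would have to be derived from terrain data or learned rather than fixed by hand.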
We also pursue a second approach that integrates domain knowledge in a more direct and expressive manner: specifying partial differential equations that describe how the target attribute is expected to change across the given terrain. This makes it relatively straightforward to formulate models that incorporate any kind of data available at a higher resolution than the output variable. We will apply this methodology to better understand the factors involved in the genesis and spatial distribution of soil properties.
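One way to make the PDE idea concrete is sketched below, in the spirit of the stochastic PDE (SPDE) approach to Gaussian fields; it is an illustration, not the project's formulation. On a 1D transect, the field u is defined as the solution of (kappa^2 - d/dx(a(x) du/dx)) u = W, where W is white noise and a(x) is a diffusion coefficient assumed to come from terrain data. Where a(x) is small, for example at a hypothetical river channel, neighbouring values decouple. The grid, kappa, the coefficient values, and the helper names spde_covariance and correlation are all hypothetical.

```python
import numpy as np


def spde_covariance(a_mid, kappa, h):
    """Covariance of u solving (kappa^2 - d/dx(a du/dx)) u = W on a 1D grid.

    a_mid: diffusion coefficient at the n-1 midpoints between grid nodes,
    assumed here to be derived from terrain data (hypothetical input).
    """
    n = len(a_mid) + 1
    # Forward-difference gradient operator of shape (n-1, n).
    D = (np.eye(n, k=1) - np.eye(n))[:-1] / h
    # Discrete kappa^2 - d/dx(a d/dx); the form D.T @ diag(a) @ D implies zero-flux ends.
    L = kappa**2 * np.eye(n) + D.T @ np.diag(a_mid) @ D
    # Discretized white noise has variance 1/h per node, so Cov(u) = L^-1 L^-T / h.
    L_inv = np.linalg.inv(L)
    return L_inv @ L_inv.T / h


def correlation(cov, i, j):
    # Correlation between nodes i and j from a covariance matrix.
    return cov[i, j] / np.sqrt(cov[i, i] * cov[j, j])


h = 0.01
a_mid = np.ones(100)   # 101 nodes on [0, 1] with uniform terrain
a_mid[49:52] = 1e-6    # hypothetical river channel near x = 0.5 (very low diffusivity)
cov = spde_covariance(a_mid, kappa=10.0, h=h)
corr_same = correlation(cov, 20, 27)    # distance 0.07, same side of the channel
corr_across = correlation(cov, 47, 54)  # distance 0.07, across the channel
print(corr_same, corr_across)
```

In this sketch, terrain information enters only through the coefficient a(x), so covariates available at a finer resolution than the target variable can be mapped onto the grid without changing the structure of the model. Extending it to 2D would replace the difference operator with a discrete gradient on a raster or mesh, typically using sparse matrices instead of a dense inverse.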