Rating how aesthetically pleasing an image appears is a highly complex matter that depends on a large number of visual factors. Previous work has tackled the aesthetic rating problem by ranking on a one-dimensional rating scale, e.g., by incorporating handcrafted attributes. In this paper, we propose a rather general approach that maps aesthetic pleasingness, with all its complexity, into an automatically learned “aesthetic space”, allowing for a highly fine-grained resolution. In detail, using deep learning, our method directly learns an encoding of a given image into this high-dimensional feature space that reflects visual aesthetics. In addition to the visual factors mentioned above, differences in personal judgment have a substantial impact on how likable a photograph is. Nowadays, online platforms allow users to “like” or favor particular content with a single click. To incorporate a vast diversity of people, we make use of such multi-user agreement and assemble an extensive data set of 380K images (AROD) with associated meta information, from which we derive a score rating how visually pleasing a given photo is. We validate our derived model of aesthetics in a user study. Further, without any extra data labeling or handcrafted features, we achieve state-of-the-art accuracy on the AVA benchmark data set. Finally, as our approach can predict the aesthetic quality of any arbitrary image or video, we demonstrate our results on applications for re-sorting photo collections, capturing the best shot on mobile devices, and aesthetic key-frame extraction from videos.
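The abstract mentions deriving a pleasingness score from multi-user “like” signals and associated meta information, but does not state the formula. The sketch below is one plausible instantiation, assuming the available meta data are fave and view counts per photo; the function name and the log-ratio normalization are assumptions for illustration, not the paper's actual definition.

```python
import math

def aesthetics_score(faves: int, views: int) -> float:
    """Hypothetical engagement-based aesthetics score (assumed form).

    Normalizes the number of faves by the photo's exposure (views) on a
    logarithmic scale, so widely seen images need proportionally more
    faves to score highly. Returns a value in [0, 1] for faves <= views.
    """
    if views <= 0:
        return 0.0
    return math.log(1 + faves) / math.log(1 + views)

# A photo faved by most of its viewers scores near 1; one that is
# rarely faved despite heavy exposure scores near 0.
print(aesthetics_score(900, 1000))   # high multi-user agreement
print(aesthetics_score(3, 100000))   # low multi-user agreement
```

The log ratio damps the raw popularity of a photo: it rewards agreement among viewers rather than absolute fave counts, which is one way to make scores comparable across images with very different exposure.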