Most AI systems today rely on supervised learning: you provide labelled pairs of inputs and outputs, and get back a program that can perform the analogous computation on new data. This enables an approach to software engineering that Andrej Karpathy has termed "Software 2.0": programming by example data. This is the machine learning revolution that's already here, which we need to be careful to distinguish from more futuristic visions such as Artificial General Intelligence. If "Software 2.0" is driven by example data, how is that example data created – and how can we make that process better?
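To make "programming by example data" concrete, here is a minimal sketch in plain Python. The toy sentiment examples, the word-count "model", and the function names `train` and `predict` are all assumptions made for illustration; real systems use far larger datasets and statistical models, but the workflow has the same shape: labelled pairs go in, a program for new inputs comes out.

```python
from collections import Counter

# Hypothetical labelled examples: (input text, output label) pairs.
EXAMPLES = [
    ("great film, loved it", "positive"),
    ("what a wonderful day", "positive"),
    ("terrible service, never again", "negative"),
    ("i hated the ending", "negative"),
]


def tokenize(text):
    """Lowercase the text, drop commas, and split on whitespace."""
    return text.lower().replace(",", " ").split()


def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def predict(model, text):
    """Pick the label whose training words overlap most with the input."""
    words = tokenize(text)
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)


model = train(EXAMPLES)
print(predict(model, "loved the wonderful film"))
print(predict(model, "terrible ending"))
```

Note that nothing in `predict` mentions sentiment. To change what this program does, you would edit `EXAMPLES` rather than the code, which is why the quality of the example data matters so much.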