Developing data-analysis pipelines around open source, open data, and open communities has proven highly successful. The most used machine-learning tool to date, scikit-learn, was assembled by a large community, with different contributors bringing different expertise. Can such success be carried over to health data? I will discuss my experience building first scikit-learn, then nilearn in the brain-imaging community, and finally more recent work on electronic health records. Spoiler: things are harder with electronic health records.