data sharing in the biomedical community, less attention has been paid to new kinds of biomedical data sharing, particularly the sharing of confidential patient data. In the traditional paradigm of data sharing, researchers transfer their data directly to data modelers. Here we describe an alternative model that allows the protection of confidential data through a process we term ‘model to data’ (MTD). In the MTD model, the flow of information between data generators and data modelers is reversed. This new sharing paradigm has been successfully demonstrated in crowdsourced competitions and represents a promising alternative for increasing the use of data that cannot, or will not, be more broadly shared.

Biomedical studies generate vast clinical, radiologic, cellular and molecular data sets, and enable new basic and translational science. However, there is substantial disagreement around the best ways to share these valuable assets, particularly in the context of clinical trials. Some advocate for immediate and fully open sharing, arguing that wide accessibility will facilitate creative new analyses and improved reproducibility [1]. Others suggest more closed and/or delayed data sharing, arguing that broad availability will disincentivize the substantial effort required to accurately collect and generate large data sets [2]. Still others highlight the importance of keeping patient data private and not betraying patients’ trust [3]. These points were highlighted by researchers involved in a large cardiovascular study, who remarked that the public release of their data puts projects and manuscripts “in jeopardy of being scooped” [4]. Ultimately, the question at hand is simple: what data-sharing model will most effectively incentivize funders, clinicians, scientists and patients, and catalyze new biomedical discovery?
Data sharing traditionally implies a flow of information from data generator to data modeler and is a cornerstone of scientific research; data modelers acquire direct access to data to develop and test hypotheses. Recently, alternative forms of data sharing have emerged, enabled by new technologies and propelled by a small, albeit growing, community that organizes research questions around ‘challenges’. These challenges are crowdsourced competitions that pose quantitative questions to the community and foster multidisciplinary collaboration (e.g., Kaggle, Innocentive, CASP (Critical Assessment of Protein Structure Prediction), CAGI (Critical Assessment of Genome Interpretation), the DREAM (Dialogue for Reverse Engineering Assessments and Methods) Challenges, and others [5,6]). In these challenges, the dissemination and availability of data are critical to their operation and have motivated challenge …

Figure 1 Sharing paradigms for data challenges. (a) Data to modeler (DTM). Both training and validation data sets are provided to participants for model development and generation of predictions. (b) Model to data (MTD). Participants submit ‘containerized’ models to organizers. Hidden data sets are used for unbiased model validation, as well as potential model training.
[Figure: both panels show challenge participants (public) and a private challenge cloud platform holding training data, validation data, scoring, and leaderboards & benchmarks.]

Alternative models for sharing confidential biomedical data. Justin et al., Nature Biotechnology, 2018

• From DTM (data to modeler) to MTD (model to data).
• MTD
  1. Only the training data are public; participants use them to tune their analysis method (model).
  2. The analysis method (preferably containerized) is uploaded to the cloud where the data reside.
  3. The analysis results are generated there, and users view or download them in some form.
• Many technical details remain to be worked out, but it is very important to keep thinking about new frameworks.
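
The three MTD steps above can be illustrated with a small Python sketch. It is not the workflow of any real challenge platform: the names `ChallengePlatform`, `Example`, and `fit_threshold` are hypothetical, the data are toy values, and an in-process Python callable stands in for the containerized model that a real challenge would run in an isolated environment. The one thing it aims to show is the reversed flow of information: the model travels to the data, and only a score comes back.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a "model" is a function from a feature tuple to a 0/1 label.
# In a real MTD challenge this would be a container image, not a callable.
Model = Callable[[Tuple[float, ...]], int]


@dataclass(frozen=True)
class Example:
    features: Tuple[float, ...]
    label: int


class ChallengePlatform:
    """Hypothetical organizer-side platform that keeps validation data private."""

    def __init__(self, training: List[Example], hidden_validation: List[Example]):
        self.public_training = list(training)   # step 1: released to participants
        self._hidden = list(hidden_validation)  # stays on the platform

    def evaluate(self, model: Model) -> float:
        """Steps 2 and 3: run the submitted model on hidden data, return accuracy only."""
        correct = sum(model(ex.features) == ex.label for ex in self._hidden)
        return correct / len(self._hidden)


def fit_threshold(training: List[Example]) -> Model:
    """Participant side: tune a one-feature threshold classifier on public data."""
    def n_correct(threshold: float) -> int:
        return sum((ex.features[0] >= threshold) == bool(ex.label) for ex in training)

    best = max(sorted(ex.features[0] for ex in training), key=n_correct)
    return lambda features: int(features[0] >= best)


platform = ChallengePlatform(
    training=[Example((0.1,), 0), Example((0.4,), 0),
              Example((0.6,), 1), Example((0.9,), 1)],
    hidden_validation=[Example((0.2,), 0), Example((0.7,), 1),
                       Example((0.55,), 1), Example((0.8,), 1)],
)

model = fit_threshold(platform.public_training)  # step 1: tune on public data
score = platform.evaluate(model)                 # steps 2-3: submit, get score back
print(f"validation accuracy: {score:.2f}")
```

The participant code only ever touches `public_training`, and the platform returns a single aggregate number rather than per-example predictions. A real deployment would also need the isolation, resource limits, and output auditing that this sketch leaves out.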