

Fundamentals of reproducible research and free software
MVA course
Miguel Colom
About this course
This is a course on reproducible research and free/open source software (FOSS) in the MVA master's program. It covers how to write and publish reproducible research, the legal aspects surrounding source code, articles, and data, and finally good practices for writing free software and performing reproducible research. Group discussions, debates, and individual dissertations are part of the course activities. The plan of the course covers the minimum knowledge that any master's student or PhD candidate in computational sciences should acquire to perform reliable scientific research.
Slides of the course
- Introduction. Software licenses. Patents. Economic models. Case studies.
- Towards reproducible research
- Publishing reproducible research
- A very brief overview on Intellectual Property
- Identifying Billions of Source Code Artifacts: the SWHID in Publication Workflows
- Writing and evaluating reproducible research
TPs (practical work, incremental)
Invited talks
- Jaime Arias, Software Heritage, key infrastructure for Open Science and Software Science
- Enric Meinhardt, S2P: a reproducible satellite stereo pipeline
- Marina Gardella, Image forensic tools
- Charles Truong, Change point detection in Python
Plan
- 02/10/2024. MC1. Free and open-source software. Introduction. Licensing. Patents.
- 09/10/2024. MC2. Economic model of FOSS projects. Reproducible research. Introduction to reproducible research. Presentation of the IPOL journal.
- 16/10/2024. TP1 (economic model and licensing of a free/open source software project).
- 30/10/2024. MC3. Publishing reproducible research. The editorial process. Legal aspects.
- 06/11/2024. P. Intermediate presentation (1/2). Group discussion on the presentations, questions, feedback.
- 13/11/2024. MC4. Writing a reproducible scientific article (1/2). TP2 (reproducibility).
- 20/11/2024. MC5. Writing a reproducible scientific article (2/2). Review of TP2. Start TP3 (scientific writing).
- 27/11/2024. 🎤 Invited talk by Jaime Arias on Free Software, Open Science, and the UNESCO-supported Software Heritage initiative.
- 11/12/2024. Final presentations (2/2)
Reading
- Jeffrey Brainard. Open-access journal eLife will lose its 'impact factor' over controversial publishing model. Science, 13/11/2024. DOI: 10.1126/science.zycyo78.
- Sheeba Samuel, Daniel Mietchen. Computational reproducibility of Jupyter notebooks from biomedical publications. arXiv preprint. 11 Aug. 2023.
- Anil Oza. Reproducibility trial: 246 biologists get different results from same data sets. Nature news article. 12 Oct. 2023.
- Protzko et al. High replicability of newly discovered social-behavioural findings is achievable. Nature Human Behaviour. 9 Nov. 2023.
- Veritasium. The Problem With Science Communication. YouTube video. 1 Nov. 2023.
- Reuters. Moderna sues Pfizer/BioNTech for patent infringement over COVID vaccine (2022) → European Patent Office declares Moderna mRNA patent invalid (2023).
- Florian Prinz, Thomas Schlange and Khusru Asadullah. Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery.
- C. Glenn Begley and Lee M. Ellis. Raise standards for preclinical cancer research. Nature, vol. 483, p. 531, 29 March 2012.
- Christian Fuchs and Marisol Sandoval. The Diamond Model of Open Access Publishing: Why Policy Makers, Scholars, Universities, Libraries, Labour Unions and the Publishing World Need to Take Non-Commercial, Non-Profit Open Access Serious. tripleC 13(2): 428-443, 2013.
- Charles Piller. Blots on a Field? Science, Vol 377, Issue 6604. DOI: 10.1126/science.add9993.
- Alexandru Nedelcu. Akka is moving away from Open Source, September 7, 2022.
- Tom E. Hardwicke, Robert T. Thibault, Jessica E. Kosie, Loukia Tzavella, Theiss Bendixen, Sarah A. Handcock, Vivian E. Köneke and John P. A. Ioannidis. Post-publication critique at top-ranked journals across scientific disciplines: a cross-sectional assessment of policies and practice, R. Soc. Open Sci. 9: 220139. DOI: 10.1098/rsos.220139.
- Unified Patents. Defending Open Source: An 2022 Litigation Update, Jun. 9, 2022.
- swyx. How Open Source is eating AI, Oct. 9, 2022.
- Juan Pablo Alperin. Why I think ending article-processing charges will save open access, Nature World View, 12 Oct. 2022. DOI: 10.1038/d41586-022-03201-w.
- Holly Else. Dozens of papers co-authored by Nobel laureate raise concerns, Nature News, 21 Oct. 2022. DOI: 10.1038/d41586-022-03032-9.
- eLife. eLife’s New Model: Changing the way you share your research, eLife, 20 Oct. 2022.
Interesting reading provided by the students
We might discuss these topics together within the course.
- From Pedro Machado Santos Rohde (2023). An open source developer and lawyer wants to sue GitHub because of Copilot, its autocomplete tool trained on public repos. It produces results which can clearly be traced back to the original code, but with no attribution or mention of licenses, e.g. [Twitter post]. I thought it was interesting to share, as it's very recent news and very much linked to our RR/FOSS class.
- From Solal Nathan (2023), about HuggingFace using DOIs: Introducing DOI: the Digital Object Identifier to Datasets and Models.
- From Solal Nathan (2023), The Turing Way handbook to reproducible, ethical and collaborative data science.
- From Théo Saulus (2023), the patent Attention-based sequence transduction neural networks. Another example of patenting software by claiming that it is a system made of hardware plus a computer program.
- From Solal Nathan (2024), French Court Issues Damages Award for Violation of GPL. Text of the decision, 14 February 2024, Cour d'appel de Paris, RG n° 22/18071.