Mar Barrantes-Cepas, Eva van Heese, Lucas Baudouin, Eva Koderman, Diana Bocancea
In this chapter, we’ll show you practical tools and software to help make your neuroscience research more reproducible. By using scripts instead of graphical interfaces, open-source software, and version control, you’ll not only make your own work easier to manage and build on in the future, but also ensure others can replicate your findings and easily collaborate with you.
Reproducibility is at the core of good science: it helps move the field forward by making sure discoveries can be verified and expanded upon. Increasing reproducibility in neuroscience can be approached in two ways: from the top down, where institutions reshape incentives and frameworks (i.e. changes at the meso-level, read more here), and from the bottom up, where individual researchers adopt better practices. While both approaches are needed, this chapter focuses on the bottom-up approach, giving you the tools and guidelines to improve your own work. Adopting these practices early on will improve the quality of your research and contribute to a more reliable scientific community.
Although sharing data in clinical neuroscience can be challenging due to privacy concerns and logistical barriers, this shouldn’t be an excuse not to share your code and materials. By developing your work with sharing in mind from the start, you help promote transparency and reproducibility and ensure that your code is shareable and accessible to others.
You might already know much of what will be discussed in the following section. Should that be the case, you can browse through the headers and share your knowledge with your peers! Otherwise, here is a summary of some concepts you might need to better understand coding practices.
A Graphical User Interface (GUI) is an interface that lets users work with a program through graphical elements such as icons, buttons, and menus (e.g., in SPSS or MATLAB). GUIs are user-friendly because they provide intuitive visual cues for navigation and task execution. However, they are less effective for reproducibility, as it can be challenging to track or recall the exact steps and parameters used during analysis if you don’t note them somewhere. Moreover, scripts provide more flexibility and less manual work, which means more control over your data.
To address these issues, it is advisable to use scripts for your methodology. Scripts provide a record of all actions taken and parameters used, making it easier to reproduce and share your work with others. Fortunately, many GUI-based tools also let you run the same commands directly from a terminal. For instance, if FSL is properly installed, its commands can be run from the terminal. If you want to learn more about this, you should consult the log files or documentation specific to the tool you are using.
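To make the idea of "a record of all actions taken and parameters used" concrete, here is a minimal sketch of a script that logs a command before (optionally) running it. The file names, the log file name, and the helper `run_logged` are made up for illustration; `bet` is FSL's brain extraction command and `-f` its fractional intensity threshold, with 0.5 used here only as an example value. With `dry_run=True` nothing is executed, so the sketch also works on a machine without FSL.

```python
import json
import shlex
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_logged(command, log_path, dry_run=True):
    """Record a command and its parameters, then optionally execute it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": shlex.join(command),
        "executed": not dry_run,
    }
    # Append one JSON line per command so the log doubles as an analysis record.
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")
    if not dry_run:
        # check=True raises an error if the tool exits with a non-zero status.
        subprocess.run(command, check=True)
    return entry


# Hypothetical file names; 0.5 is only an example value for the -f threshold.
bet_command = ["bet", "sub-01_T1w.nii.gz", "sub-01_T1w_brain.nii.gz", "-f", "0.5"]
entry = run_logged(bet_command, "analysis_log.jsonl", dry_run=True)
print(entry["command"])
```

Setting `dry_run=False` would actually run the command, provided the tool is installed and the input file exists. Either way, the log file keeps the exact parameters you would otherwise have to remember from clicking through a GUI.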
In programming, just as in everyday life, a wide array of languages is available for writing your scripts, more than you might imagine! Check out the [List of programming languages - Wikipedia](https://en.wikipedia.org/wiki/List_of_programming_languages). Languages commonly used in neuroscience include Bash, C++, LaTeX, Python, MATLAB, and R. The choice of language often depends on your personal preferences and the specific needs of your project. In this section, we’ll outline the main differences between these languages, discuss Open Science-related considerations, and offer tips for making the most of each.
Bash is excellent for automating command-line tasks and system administration. It enables you to execute and automate terminal commands and call various tools through scripts. C++ is used for computationally intensive projects, and many command-line tools are written in it because of its performance.
Python, MATLAB, and R are high-level languages, meaning they are easy to read and use, portable, and independent of specific hardware. Python is versatile and user-friendly, making it ideal for data analysis. R is designed specifically for statistical analysis and data visualisation, making it popular among statisticians and data scientists. MATLAB excels in numerical computation and visualisation but requires a paid licence.
LaTeX is a typesetting system that can simplify the preparation of your manuscript for publication (or of documents in general). Some journals even offer their own LaTeX templates! It is especially useful when your manuscript contains formulas, figures that are still in the making, or pieces of code, since it lets you add everything neatly without spending too much time looking for the correct character. You can write LaTeX in different editors, such as Overleaf or Visual Studio Code. Some extra tools that will enhance your experience with LaTeX are Detexify, which helps you find symbols whose names you don’t know, and a [Tables converter](https://www.tablesgenerator.com/) that turns tables into LaTeX format.
But there’s more to consider! Besides programming languages, you’ll also need to manage libraries and tools. Libraries are collections of pre-written code that extend the functionality of a programming language and simplify complex tasks. Just as programming languages have different versions, libraries can also have multiple versions due to updates and bug fixes. When multiple people work on the same code (see the Version control section below), it is important to use consistent versions of programming languages and libraries across the team. Software tools can act as GUIs to simplify data analysis. However, some tools are built on programming languages that are not open source (e.g., MATLAB), and such tools are then not fully open either, since running them requires the proprietary language.
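One way to check that everyone on the team runs the same library versions is to compare what is installed against an agreed list. The sketch below uses Python's standard `importlib.metadata` module; the function name `find_version_mismatches` and the package versions in `team_versions` are illustrative assumptions, not recommendations.

```python
from importlib import metadata


def find_version_mismatches(expected_versions):
    """Return {package: (expected, installed)} for every package that differs."""
    mismatches = {}
    for package, expected in expected_versions.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# Hypothetical team-agreed versions; replace with your project's own pins.
team_versions = {"numpy": "1.26.4", "nibabel": "5.2.1"}
for package, (expected, installed) in find_version_mismatches(team_versions).items():
    print(f"{package}: expected {expected}, found {installed or 'not installed'}")
```

If the loop prints nothing, your environment matches the agreed versions. In practice, teams often keep such a list in a shared file (for example a `requirements.txt`) that lives next to the code.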
Virtual environments and containers are tools used in software development to create isolated and controlled environments for running applications and managing dependencies.
A virtual environment in Python is an isolated environment that allows you to install and manage dependencies for a specific project without affecting the global Python installation or other projects. It helps ensure that each project can have its own dependencies and versions, avoiding conflicts between projects.
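As a small illustration, Python's standard `venv` module can create such an environment from a script. This is a sketch under a few assumptions: the folder name `.venv` is a common convention rather than a requirement, the helper `create_project_environment` is made up for this example, and a temporary folder stands in for a real project directory.

```python
import tempfile
import venv
from pathlib import Path


def create_project_environment(project_dir):
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps this sketch fast; use with_pip=True for real projects
    # so that you can install packages into the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# A temporary folder stands in for a real project directory here.
with tempfile.TemporaryDirectory() as project_dir:
    env_dir = create_project_environment(project_dir)
    # pyvenv.cfg is the file that marks a folder as a virtual environment.
    config_exists = (env_dir / "pyvenv.cfg").is_file()
    print(f"Environment created: {config_exists}")
```

The same thing can be done from the terminal with `python -m venv .venv`, after which you activate the environment and install the project's dependencies into it.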
A container is a much more comprehensive tool: it isolates not just the programming environment but the entire software environment, including the operating system, system libraries, runtime, and application code. This makes it more versatile for deploying and running consistent environments across different systems. Containers offer several advantages:
Useful open-source tools within science also include LibreOffice and Inkscape. LibreOffice is a free and open-source alternative to Microsoft Office applications like Word, PowerPoint, Excel, and Access. It offers similar functionalities for document creation, presentations, spreadsheets, and database management. For poster creation or data visualisation you can opt for Inkscape. It is a free and open-source vector graphics editor that is widely used for creating and editing scalable vector graphics (SVG) files.
To make your project as open sciency as possible, we provide a few tips:
This section offers guidance on optimising version control and annotation practices. It covers the best practices to streamline version control, how to integrate them within your team, and the ideal workflow to adopt for maximum efficiency.
When working on a script, it is important to document your code. Annotation is essential to make code understandable, discoverable, citable, and reusable. Check out Chapter 3 to obtain a better general understanding of documentation. More specific to code documentation, it is important to keep in mind the following:
To help get you started, you can check out these [script templates](https://github.com/marbarrantescepas/script-templates), guidance, and examples. This tool is also useful for formatting your code (and making it beautiful!): Black Vercel.
Version control is a method used to document and manage changes to a file or collection of files over time. It allows you and your collaborators to monitor the history of revisions, review modifications, and revert to previous versions when necessary. This is especially useful when working together on a script. The most widely used version control system is Git.
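To give a feel for what an annotated script can look like, here is a minimal sketch with a header describing the script and a documented function. The scenario, the function name `group_means`, and the numbers are invented for illustration; the docstring layout is one common style, not the only valid one.

```python
"""Compute the mean cortical thickness per participant group.

Author: <your name>
Usage: python mean_thickness.py
"""
from statistics import mean


def group_means(thickness_by_group):
    """Return the mean thickness (mm) for each group.

    Parameters
    ----------
    thickness_by_group : dict[str, list[float]]
        Group label mapped to per-participant thickness values in mm.

    Returns
    -------
    dict[str, float]
        Group label mapped to the mean thickness, rounded to 2 decimals.
    """
    return {
        group: round(mean(values), 2)
        for group, values in thickness_by_group.items()
    }


# Made-up example values, for illustration only.
example = {"patients": [2.4, 2.6, 2.5], "controls": [2.7, 2.9, 2.8]}
print(group_means(example))
```

A reader who has never seen this script can tell from the header what it is for, and from the docstring what goes in, what comes out, and in which units.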
Git is a version control system that tracks file changes and coordinates work among multiple people on a project. GitHub and GitLab are web-based platforms that host Git repositories, along with additional features like issue tracking, code reviews, and continuous integration. The main difference between them is that GitHub is more focused on open-source collaboration and has a large user community, while GitLab offers more built-in tools and is known for its flexibility in deployment options, including self-hosting. Both of them allow the creation of private and public repositories.
:tulip: If you want to learn more about pros and cons and the current status of Git(-related) tools at Amsterdam UMC, please check this link. Check with your (co)supervisors about the best option to use or if they already have an account for the group. If the account hasn’t been created yet, take the initiative and set it up yourself by following the simple instructions below.
To create a GitHub account, link it with Git on your local machine, and verify the connection, follow these steps:
git config --global user.name "YourGitHubUsername"
git config --global user.email "your_email@example.com"
The rest of these instructions are coming soon
Once you have a good feel for the GitHub lingo, version control should be easy peasy. Here are some basics on the terminology and a tutorial to help get you started:
The main branch (sometimes called the master branch) is where your code lives as the main character. All the other branches are created for the development of a specific feature (or you can think of them as side quests). After the feature development is complete, and the code is fully tested and functional, you can merge it back into the main branch. Continue this process until all the feature development is complete.
Before creating a new Git repository and linking it to GitHub or GitLab, it’s EXTREMELY important to make a .gitignore file. Without it, every file in your project, including potentially sensitive or personal data, can be staged, committed, and uploaded (for example, when you add everything at once with `git add .`). This could lead to unintentional exposure of data that cannot be shared due to privacy regulations.
To avoid this, make sure to create a .gitignore file listing all the file types and folders you want to exclude from version control. You can find more detailed guidance in the Ignoring files - GitHub Docs and browse Some common .gitignore configurations for examples. This practice ensures that only the necessary files are tracked, reducing the risk of accidentally sharing sensitive information.
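As a starting point, the sketch below writes a small .gitignore from a script. The patterns are assumptions about a typical neuroimaging project (a `data/` folder, NIfTI images, spreadsheets), so adjust them to your own layout. The helper `write_gitignore` is made up for this example, and a temporary folder stands in for your project directory.

```python
import tempfile
from pathlib import Path

# Illustrative patterns only; adjust them to your own project layout.
IGNORE_PATTERNS = [
    "# Raw and derived data that may contain personal information",
    "data/",
    "*.nii",
    "*.nii.gz",
    "*.csv",
    "# Local environments and editor or system clutter",
    ".venv/",
    "__pycache__/",
    ".DS_Store",
]


def write_gitignore(project_dir):
    """Write a starter .gitignore, refusing to overwrite an existing one."""
    gitignore_path = Path(project_dir) / ".gitignore"
    if gitignore_path.exists():
        raise FileExistsError(f"{gitignore_path} already exists")
    gitignore_path.write_text("\n".join(IGNORE_PATTERNS) + "\n", encoding="utf-8")
    return gitignore_path


# A temporary folder stands in for a real project directory here.
with tempfile.TemporaryDirectory() as project_dir:
    gitignore_path = write_gitignore(project_dir)
    written_lines = gitignore_path.read_text(encoding="utf-8").splitlines()
print("\n".join(written_lines))
```

Lines starting with `#` are comments, a trailing `/` matches a folder, and `*` matches any file name. Ignoring a pattern only prevents new files from being added: files that were committed earlier stay in the repository history, which is why the .gitignore should come first.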
The easiest way to use Git/GitHub is through a code editor such as Visual Studio Code (Introduction to Git in Visual Studio Code) or GitHub Desktop.
A README is a text file that introduces and explains a project because no one can read your mind (yet). It contains information that is commonly required to understand what the project is about. If you want to learn how to properly create a README file check: Make a README (they also provide templates!)
Coming soon
A general code workflow should include several iterations of peer review and end with the scripts being uploaded to GitHub. Peer review ensures that the code is correct and functions well, while publishing scripts online ensures they are shared with the wider scientific community and improves reproducibility. In scripts, correctness refers to the code’s ability to produce the intended results accurately according to specified requirements. In contrast, reproducibility ensures that these results can be consistently obtained by different users or in different environments when the same code is run with identical inputs. While correctness confirms that the code functions as intended, reproducibility guarantees that the outcomes can be reliably replicated, which is essential for validating research findings.
The general workflow of code review that ensures correctness and reproducibility is summarised in the figure below:
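One practical ingredient of "identical inputs give identical results" is controlling randomness. The sketch below is an illustration we add here, not part of the review workflow itself: it assumes an analysis with a random step (a simple bootstrap of the mean) and shows that fixing the seed makes two runs agree exactly. The function name, the data, and the seed value are all made up.

```python
import random


def bootstrap_mean(values, n_resamples, seed):
    """Estimate the mean by resampling with replacement, using a fixed seed."""
    rng = random.Random(seed)  # a local generator avoids hidden global state
    resample_means = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=len(values))
        resample_means.append(sum(resample) / len(resample))
    return sum(resample_means) / n_resamples


scores = [12.0, 15.5, 9.8, 14.2, 11.1]  # made-up example data
first_run = bootstrap_mean(scores, n_resamples=1000, seed=42)
second_run = bootstrap_mean(scores, n_resamples=1000, seed=42)
print(first_run == second_run)
```

Note that a fixed seed says nothing about correctness: a buggy function would reproduce its wrong answer just as faithfully, which is why review and testing are still needed.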
As you might notice, there should always be a code owner and a code reviewer who have separate tasks.
For the code owner:
For the reviewer: Please confirm that the code is understandable and well-documented. It is not your job to rewrite the code for the code owner or to test the code’s functionality. If you have to spend too much time on it, send it back to the owner with your remarks and ask them to improve it before your final revision.
Besides code review, another good coding practice is code testing. Code-based testing encompasses various methods to ensure software reliability and quality. Techniques like unit and integration testing help developers validate their code. Unit testing checks individual components of code for correctness, while integration testing ensures that combined components work together as expected.
A key element of code testing is building a solid foundation of tests that cover different scenarios and edge cases. These tests serve as a safety net, offering ongoing feedback on code functionality. By thoroughly testing, developers can catch and fix issues early, reducing the time and effort needed for debugging and maintenance.
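Here is a minimal sketch of what unit tests with edge cases can look like, using Python's built-in `unittest` module. The function under test, `z_score`, is a made-up example; the tests cover one typical scenario and two edge cases (empty input and constant input).

```python
import unittest


def z_score(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance == 0:
        raise ValueError("values must not all be identical")
    std = variance ** 0.5
    return [(v - mean) / std for v in values]


class TestZScore(unittest.TestCase):
    def test_known_values(self):
        # Typical case: 1, 2, 3 has mean 2 and a symmetric result.
        result = z_score([1.0, 2.0, 3.0])
        self.assertAlmostEqual(result[1], 0.0)
        self.assertAlmostEqual(result[0], -result[2])

    def test_empty_input(self):
        # Edge case: no data should fail loudly, not return nonsense.
        with self.assertRaises(ValueError):
            z_score([])

    def test_constant_input(self):
        # Edge case: zero variance would otherwise divide by zero.
        with self.assertRaises(ValueError):
            z_score([5.0, 5.0, 5.0])


# Run the tests directly so the sketch works when pasted into a single file.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestZScore)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project you would usually keep the tests in their own file (for example `test_zscore.py`) and run them with `python -m unittest` every time the code changes.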
This section is coming soon
This section is coming soon
Within the neuroimaging field of neuroscience, many tools are open source and free for everyone to use. Some require a free licence that can be requested.
We’ll review a selection of often applied software and tools:
Software/tool | Type of data | Purpose | Openness? | Recommended tutorials and resources |
---|---|---|---|---|
FSL | Several types of MR images | Process or view images from a variety of modalities | Fully Open | FSL Course on YouTube |
Table to be completed soon! |
This section is coming soon