Mastering the art of preclinical data analysis is key to providing robust translational outcomes that are later used in clinical research. In this article, we get into the nitty-gritty of handling preclinical data to help you approach the most common problems researchers face in this field of study. Learn how to manage the data collection process and how to analyze preclinical samples with maximum efficiency with the help of current AI-aided methods.
- Why is preclinical testing important?
- Types of preclinical studies
- What are the characteristics of preclinical data?
- The answer to effective preclinical data management
- Current methods for the analysis of preclinical image data
Why is preclinical testing important?
Before we dive into the specifics of preclinical research, it is vital to explain the basics and the significance of preclinical testing.
First, it is necessary to make the difference between preclinical and non-clinical studies clear. Preclinical research typically involves animals and refers to studies conducted before moving into clinical testing, while non-clinical is the broader term for laboratory studies that do not involve human subjects. The U.S. Food and Drug Administration (FDA) provides a more substantial definition of non-clinical testing in its Good Laboratory Practice (GLP) Guidelines (Section 58.3). According to the FDA, a non-clinical laboratory study refers to “in vivo or in vitro experiments in which test articles are studied prospectively in test systems under laboratory conditions to determine their safety. The term does not include studies utilizing human subjects or clinical studies or field trials in animals. The term does not include basic exploratory studies carried out to determine whether a test article has any potential utility or to determine physical or chemical characteristics of a test article.” (FDA, 2023)
The process of drug development generally involves non-clinical or pre-clinical and clinical studies. Preclinical studies aim to look for new cures, predict drug effects in humans, and identify potential treatments for new and already well-established diseases (Polson & Fuji, 2012).
Depending on the type of study, this initial phase of drug development can be performed in vitro, in vivo, ex vivo, or using in silico models. On top of that, this stage allows researchers to learn more about the safety and efficacy of a potential drug candidate, and gives them the possibility to assess its toxicity, safe dosing levels, and predicted treatment response (Shegokar, 2020).
After the preclinical phase is completed, researchers need to gather and review their data and findings and make the decision whether a drug candidate can be tested on humans (final target population) within a clinical trial and whether an Investigational New Drug (IND) application can be submitted to the FDA or the European Medicines Agency (EMA) for approval (FDA, 2023).
With that in mind, researchers are confronted with the need to improve the reproducibility and predictive power of preclinical models to fully ‘exploit’ the potential of preclinical testing (Coleman et al., 2016). Like any clinical trial, preclinical studies have to be properly planned to answer the given research question(s). The statistical analysis of preclinical data and the quantitative methods involved have to be planned based on the study design, as these two aspects are linked. To describe the size and importance of the observed effect, for example, it is a prerequisite to match statistical methods and study design (Aban & George, 2015).
Researchers aim to keep statistical power high and results statistically significant. Statistical significance, e.g., of a treatment effect, is established through hypothesis testing: the null hypothesis (H₀), stating that the studied treatment has no effect, needs to be rejected. For a continuous outcome measurement, H₀ may be that the difference in mean response between groups equals zero.
However, it is essential to understand that a lack of statistical significance only means the study data provide insufficient evidence to reject H₀; it does not prove that H₀ is true. In that case, the study setup needs to be re-evaluated in terms of sample size, study method, sampling method, and so on. Usually, p-values are used for hypothesis testing, with p < 0.05 conventionally taken as statistically significant; in some cases, the more conservative threshold of p < 0.01 is used.
Different statistical methods can be used to calculate results. Preclinical publications typically report descriptive statistics of the study sample, such as group size, mean, standard deviation (SD), coefficient of variation, and R-squared as a measure of precision (Palesch, 2014). To compare mean differences between treatment and control groups, the analysis often involves tests such as the t-test, ANOVA, or MANOVA.
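As a toy illustration of these descriptive statistics, the following standard-library Python snippet (the measurement values are made up, not data from any study) computes group size, mean, SD, and coefficient of variation:

```python
import statistics

# Hypothetical tumor-volume measurements (mm^3) for one treatment group;
# purely illustrative values, not real study data.
volumes = [120.0, 135.0, 128.0, 142.0, 131.0, 125.0]

n = len(volumes)                   # group size
mean = statistics.mean(volumes)    # group mean
sd = statistics.stdev(volumes)     # sample standard deviation
cv = sd / mean                     # coefficient of variation (unitless)

print(f"n={n}, mean={mean:.1f}, SD={sd:.2f}, CV={cv:.3f}")
```

Reporting the CV alongside the SD is useful because it expresses spread relative to the mean, making variability comparable across outcomes measured on different scales.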
Did you know?
t-test: This is a tool to test the difference between two means/groups, which may or may not be related to each other, indicating how likely the observed difference is to have occurred by chance. A t-test cannot be used if you have more than two groups.
ANOVA: This is a statistical method used for testing for differences in the means of three or more groups. If there is one dependent variable, then ANOVA is used.
MANOVA: This is an extension of ANOVA to compare multivariate sample means if two or more dependent variables are used.
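To make the distinction between these tests concrete, here is a minimal sketch using SciPy (assuming `scipy` is installed) on made-up group values: a t-test compares two groups, while a one-way ANOVA handles three or more:

```python
from scipy import stats

# Illustrative (invented) outcome values for three groups.
control = [10.1, 9.8, 10.3, 10.0, 9.9]
low_dose = [12.0, 11.7, 12.4, 12.1, 11.9]
high_dose = [14.0, 13.8, 14.3, 14.1, 13.9]

# Two groups: independent-samples t-test.
t_stat, p_two = stats.ttest_ind(control, low_dose)

# Three or more groups, one dependent variable: one-way ANOVA.
f_stat, p_anova = stats.f_oneway(control, low_dose, high_dose)

print(f"t-test p={p_two:.4g}, ANOVA p={p_anova:.4g}")
```

With clearly separated group means like these, both p-values fall well below the conventional 0.05 threshold; a MANOVA would additionally require two or more dependent outcome variables per subject.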
The degree of variability is a contentious topic in preclinical studies. You may have a highly statistically significant p-value, but it is also worth reporting how much of the variability is explained by your model. The average R-squared value in medical research is 0.499. The R-squared is not necessarily a measure of the quality of your research, yet some journals still require a certain threshold to be met to publish your results. However, R-squared values near both ends of the spectrum are not uncommon in biomedical research: around 10% of R-squared values reported in biomedical studies are below 0.035, while approximately 10% lie above 0.979 (Choueiry, 2023).
Variability within samples is often related to factors such as the genotype, age, and sex of the animal subjects, as well as environmental conditions and experimental procedures. Sometimes researchers even try to minimize variability by reducing the heterogeneity within the groups of animal subjects.
Did you know?
According to Usui et al. (2021), some authors advocate embracing variability in preclinical research as “the reduction of variability within studies can lead to idiosyncratic, lab-specific results that cannot be replicated.”
Types of preclinical studies
If we, for example, look at the drug development process, there are three major steps involved:
- formulation development phase/discovery
- preclinical studies, and
- clinical trials.
Typically, the transition from one stage to another is a continuum. This is especially true for the first two phases. The second phase (preclinical studies in new drug development) encompasses preliminary toxicology and safety testing, which ultimately helps to single out the most promising drug candidate.
We’d like to discuss several rules and requirements, which need to be fulfilled before initiating a clinical trial. Depending on the “product” to be developed, different types of preclinical studies may be necessary. For example, a potential drug undergoes assessment regarding:
- Pharmacodynamics (PD)
- Pharmacokinetics (PK)
- Absorption, Distribution, Metabolism, and Excretion (ADME)
- Toxicology testing/safety.
Did you know?
Pharmacodynamics (PD) assesses the effects of a drug on a body or biological environment to put the drug into a category such as potent, cytotoxic, or safe. Pharmacodynamic studies estimate the therapeutic index and therapeutic window of a drug (Shegokar, 2020).
Did you know?
Pharmacokinetics (PK) determines how the body handles a substance or drug. It refers to the activity and elimination of the drug, studied via the drug's distribution profile and plasma profile over time. Key ADME information used to identify safer dose ranges/drug regimens is usually collected (Shegokar, 2020).
This allows researchers to estimate dosages of the drug for later clinical trials in humans (the so-called bench-to-bedside transition).
Some classes of preclinical research, e.g., medical device approval, may not have to undergo these specific testing regimens and may move directly to GLP safety or biocompatibility testing to examine whether a device and its components are safe for use in living organisms (Steinmetz et al., 2009; EMA, 2023; FDA, 2023).
Did you know?
To be acceptable for submission to regulatory authorities such as the EMA or its US equivalent, the FDA, most preclinical research must adhere to a set of standards (Steinmetz et al., 2009; EMA, 2023; FDA, 2023).
Preclinical studies are performed based on in vitro, in vivo, ex vivo, and in silico models. Thus, initial data on efficacy and safety/adverse effects is obtained before testing in the final target population, i.e., humans. These tests need to be carried out in compliance with GLP and/or Good Scientific Practice (GSP) to ensure the reproducibility and reliability of the results. The EMA and FDA require those standards to determine whether a lead drug candidate is suitable for a clinical trial. However, this mainly depends on the indication and the regulatory guidelines defined by the particular authority (Shegokar, 2020). For this reason, it is vital to understand the characteristics of preclinical data, which we explain in the next section.
What are the characteristics of preclinical data?
The drug development process for a given disease from concept to market is extensive and costly in terms of effort, financial resources, and time. When sufficient data is gathered from a preclinical study, an IND application is submitted to the FDA or the EMA. Only after the IND is approved can researchers proceed to clinical trials.
Unfortunately, a high number of preclinical studies that showed a therapeutic effect do not yield similar results in human trials. This problem is mostly due to poor data quality, insufficient protocol standardization and planning, and inconsistencies in the conduct and reporting of many preclinical studies. Hence, it is vital to follow standardized protocols, such as the principles of GLP, and to understand the standard statistical techniques used in preclinical testing (Inmaculada et al., 2015).
Regarding the study design, one key aspect to consider is the sample size or sample size requirements, meaning the number of subjects in the study to retrieve robust translational datasets. Group size mainly depends on the estimated or expected variability of the outcome or effect size, which also influences the statistical method/test used.
Sample size requirements can vary widely: some statistical models require ‘hundreds of animals per group,’ while smaller, homogeneous samples may need as few as ‘10 animals per group’ (Li et al., 2014). Furthermore, preclinical testing with animals requires decisions about the animal species (optimally, two different species are used), the age and sex of the animals, as well as aspects like animal handling and ethical issues (Inmaculada et al., 2015). These choices, in turn, affect the statistical variability of animal models and the variability in survival (Altman & Krzywinski, 2014).
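As a rough illustration of how the expected effect size drives group size, the following standard-library sketch uses the common normal-approximation formula for a two-sided, two-sample comparison (a simplification of what dedicated power-analysis tools compute):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate animals per group for a two-sided two-sample comparison,
    using the standard normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
    where d is the standardized effect size (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

# A large effect (d = 0.8) needs far fewer animals per group
# than a small effect (d = 0.2) at the same alpha and power.
print(n_per_group(0.8))  # 25 per group
print(n_per_group(0.2))  # 393 per group
```

This is only a planning heuristic: it ignores the t-distribution correction for small samples and attrition, which formal power analyses account for.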
Regulatory authorities such as the FDA and EMA require researchers not only to follow their guidelines (depending on whether the study is conducted in the US or Europe) but also the GLP for preclinical laboratory studies. These regulations call for basic standards in the following categories to be met (FDA, 2023):
- study conduct,
- personnel and facilities,
- written protocols,
- operating procedures,
- study reports,
- a system of quality assurance oversight for each study,
- CDEs: common data elements.
By following these rules, preclinical studies, which are typically small, must provide in-depth data on toxicity, safety, and dosing. Based on these results, a decision can be made on whether the drug has the potential to be tested on humans within a clinical trial.
The answer to effective preclinical data management
Back in 2016, the FDA commissioner proposed the development of a database that would collect information from preclinical research before clinical trials. This database would have been similar to ClinicalTrials.gov, but would have been dedicated to preclinical work. Researchers and the scientific community objected to this proposal because of concerns about reliability, funding, legal issues, and regulatory oversight, among others. Usually, when a certain substance proves to be effective, all drug development data is sealed. If a preclinical trial fails, data is very rarely publicly available, despite its potentially high scientific value. A preclinical database, however, would support the learning and sharing of information from both unsuccessful and successful studies (Shegokar, 2020).
Preclinicaltrials.eu (PCT) is a great example of such a platform, which aims to provide a comprehensive overview of animal studies to increase transparency, reduce reporting bias, encourage data sharing (including information on the study design), facilitate reproducibility and create opportunities for collaborative research in the preclinical development pipeline (preclinicaltrials.eu, 2023).
Using a data management system in the preclinical workflow would not only greatly benefit planning, preparation, execution, and analysis, but would also facilitate faster translation to clinical trials and grant access to data from multiple sources. For example, common data elements used in image-guided preclinical data management settings include the following aspects (Persoon et al., 2019):
- Study set-up,
- Treatment schedules,
- Treatment (radiation, immunotherapy, chemotherapy, etc.) plan,
- Imaging techniques and Data (MRI, PET, SPECT, CT, dual-energy CT, multi-energy CT, cone beam CT, digital pathology, histology, and other microscopy images),
- Radiomic data,
- Genomics data,
- Data analysis files and reports.
While clinical imaging is already the norm in daily practice and clinical trials, imaging-based preclinical research (i.e., in vivo imaging performed on living animals), especially in cancer research, is still in the early stages of digitization, which calls for preclinical imaging software solutions, image acquisition systems, as well as viewing and analysis tools.
An effective data management system needs to handle the vast amounts of data generated in preclinical trials. According to Persoon et al. (2019), such a system should for this reason be capable of:
- workflow management (preparation, execution, analysis, storage management),
- handling large amounts of data from different sources and locations (data exchange between researchers, big data analysis),
- integrating preclinical systems,
- reporting results,
- performing complex queries in the database,
- supporting plug-ins for outcome prediction models,
- supporting communication (between data management platforms),
- translating preclinical findings into human clinical trials.
Imaging technologies play a vital role in the drug development process by providing insights into the effects of drug candidates on the organ-, tissue-, cell-, and molecular levels. Acquiring high-quality imaging data is one of the cornerstones of effective preclinical data generation. Modern imaging technologies provide researchers with new opportunities to collect high-resolution imaging data across a range of modalities. The imaging technology used in preclinical research ranges from medical imaging tools like micro-CT and MRI to highly sophisticated microscopy imaging on the cellular level.
To manage the vast amount of multidimensional data generated through imaging techniques, researchers need preclinical data systems that can handle large datasets with complex metadata and also offer an intuitive user interface (Persoon et al., 2019).
This is where advanced data management platforms like IKOSA come into play, providing researchers with image acquisition-, viewing-, and analysis tools that are not only user-friendly but also allow the real-time integration of imaging data into the research workflows. Additionally, image archiving and processing tools with sophisticated data structures can provide secure storage, easy retrieval, and sharing of multidimensional imaging datasets with multiple time points. This makes it easier for researchers to access and analyze imaging data over time.
Available open-source data management platforms
Open-source platforms like XNAT and Small Animal Shanoir (SAS) offer flexible and scalable solutions for preclinical imaging data management. XNAT, for example, provides tools for the secure storage, processing, and sharing of imaging data. It also allows interaction with software packages such as Matlab and Python (Schwartz et al., 2012), making it easier to integrate XNAT into existing research workflows and analysis pipelines (Zullino et al., 2022).
Similarly, SAS is an open-source extension of the Shanoir data management system designed particularly for small animal imaging studies. It provides researchers with a collection of tools and functionalities for managing their data, collaborating with other researchers, and conducting their analyses. The data management architecture supports a variety of imaging modalities and provides a secure web interface for sharing data and metadata with collaborators. SAS also includes analysis and visualization tools that allow researchers to explore and analyze their imaging data. Researchers can take advantage of built-in image processing and analysis tools, as well as integrate external analysis tools to meet the needs of specific research projects. (Kain et al., 2020)
Flexible integration opportunities between preclinical data management systems and other software packages are necessary to meet the growing needs of preclinical research. For example, API integration with software packages such as Matlab and Python can facilitate the creation of predictive models from imaging data. These models can help preclinical researchers to better understand the efficacy and safety of drug candidates, ultimately leading to more informed decisions about drug development.
Additionally, researchers can use imaging data of control and treatment groups to generate dose-response curves and investigate the effects of different dosing regimens on various organs and tissues. This information can then be used for optimizing drug dosing to reduce the risk of adverse effects (Persoon et al., 2019).
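As a simplified illustration of deriving a dose-response summary from group measurements, the sketch below (with invented dose and response values) estimates the half-maximal effective dose by linear interpolation on a log-dose scale; real analyses would typically fit a full logistic dose-response model with dedicated tools instead:

```python
import math

# Illustrative dose-response data: dose (mg/kg) vs. response (% of control).
doses = [0.1, 0.3, 1.0, 3.0, 10.0]
responses = [97.0, 88.0, 62.0, 31.0, 9.0]  # response falls as dose rises

def ed50(doses, responses, half=50.0):
    """Estimate the dose giving a half-maximal response by linear
    interpolation on a log-dose scale between the two bracketing points."""
    pairs = list(zip(doses, responses))
    for (d0, r0), (d1, r1) in zip(pairs, pairs[1:]):
        if (r0 - half) * (r1 - half) <= 0:  # the 50% level is crossed here
            frac = (r0 - half) / (r0 - r1)
            log_d = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10 ** log_d
    raise ValueError("response never crosses the half-maximal level")

print(f"estimated ED50 = {ed50(doses, responses):.2f} mg/kg")
```

Interpolating on the log-dose scale reflects the roughly sigmoidal shape of most dose-response relationships when dose is plotted logarithmically.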
While open-source software products have many advantages, such as being cost-effective, easily accessible, and customizable, they also have some limitations compared to proprietary software products.
One limitation is the level of technical support available. Proprietary software vendors often offer extensive technical support services, such as help desks, knowledge bases, and training courses, to assist users with any issues they encounter. With open-source software, support is often provided through online communities of users or volunteer contributors, which may not be as reliable or responsive as a dedicated support team.
Another limitation might be the availability of features and updates. Commercial software vendors have more resources to develop and release new features, updates, and bug fixes more quickly than open-source software projects, which mostly rely on volunteer developers with limited time and resources. Additionally, premium software platforms may have exclusive access to certain technologies or intellectual property that can provide additional functionality.
One example is the IKOSA platform, which is designed to be user-friendly, with no coding skills required for image analysis. This is especially helpful for researchers who may not have extensive technical expertise in image analysis or computer programming. IKOSA also has an online trial version, which is a great opportunity to make yourself familiar with the user interface as well as the image analysis and management capabilities.
Current methods for the analysis of preclinical image data
As researchers continue to investigate new drugs and therapies for treating different diseases, preclinical image data analysis plays a vital role in evaluating their efficacy and safety. The use of advanced imaging techniques such as confocal microscopy and super-resolution microscopy enables the visualization of cells and tissues at the subcellular level with high spatial resolution. These techniques generate vast amounts of imaging data that require specialized analysis methods to extract meaningful information. The use of AI-aided data analysis methods in microscopy imaging has become increasingly important in recent years, providing researchers with faster and more precise methods to analyze imaging data.
AI-aided data analysis for greater precision in preclinical research
AI-assisted bioimage analysis has become an invaluable aid in preclinical studies, as it allows researchers to visualize and quantify the effects of various interventions on tissues and organs. It has significantly improved the efficiency of data processing, accelerating the discovery of new insights in preclinical research. Fast processing times and the automated analysis of large imaging data amounts enable a cost-efficient operation and support increased accuracy while facilitating the reproducibility of data analysis workflows (Meijering, 2020).
Another major advantage of automated methods is that they can extract quantitative and qualitative information and solve complex segmentation tasks, including segmenting regions of interest (ROIs), organs, and tissues with a high degree of quality and reduced human bias. AI-aided pattern recognition applications can automatically detect and recognize such patterns in the data, which can then be used to make predictions or classifications.
These models are trained on a representative dataset of images and can learn patterns based on input data that correspond to different features, such as shape, color, or texture. Once trained, the model can be used to classify new images based on the patterns it has learned. This facilitates the automated detection and quantification of biological features, enabling the identification of subtle differences between datasets. This is particularly important in preclinical research, where minor differences in biological systems can have a significant impact on the efficacy of treatments.
The quality of preclinical images has a crucial impact on the accuracy of machine learning models used for image analysis. For these applications to effectively recognize patterns in images, the images must be of sufficient quality and resolution. Data preprocessing methods, such as the detection and correction of imaging artifacts, noise reduction, and image registration, can be applied to improve the quality of the data before analysis. To ensure the reliability of the analysis results, performance measurements and validation methods might be incorporated. For example, the segmentation results can be compared against ground truth labels or manual annotations to evaluate their accuracy.
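As a small example of such a validation step, the following sketch computes the Dice similarity coefficient between a hypothetical predicted segmentation mask and a ground-truth annotation (both given as flat binary masks for simplicity):

```python
def dice(pred, truth):
    """Dice similarity coefficient between a predicted segmentation mask
    and a ground-truth mask, both given as flat 0/1 lists.
    Returns 1.0 for identical masks, 0.0 for no overlap."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * intersection / total if total else 1.0

# Tiny illustrative masks (flattened 3x3 patches, not real annotations).
truth = [0, 1, 1,
         0, 1, 1,
         0, 0, 0]
pred = [0, 1, 1,
        0, 1, 0,
        0, 0, 0]

print(f"Dice = {dice(pred, truth):.3f}")  # 2*3 / (3+4) ≈ 0.857
```

Scores like this, computed against manual annotations, give a quantitative handle on how well an automated segmentation model matches expert judgment.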
Quantitative biomarkers extracted from the analysis using AI-aided methods can provide valuable insights into the underlying biological processes. For instance, by quantifying signal intensities, these methods can reveal changes in gene expression, protein levels, and other cellular activities. They can also quantify objects and morphological structures, such as dendritic spines or axonal processes, providing important information about neuronal morphology and function (read up on it in our blog post on axon regeneration). Furthermore, fluorescence quantification allows for the detection and measurement of fluorescently labeled proteins or other molecules in a sample, making it possible to study cellular signaling pathways and other processes. (Chandrasekaran et al., 2021)
With all the information above, you are now well-equipped to master the art of preclinical data analysis. Maximize your chances of success by tackling the common problems researchers face in this field of study. Platforms offering current AI-aided methods, like IKOSA, can help you manage the collection and analysis of your preclinical samples. If you have any further questions, feel free to reach out to us or leave a comment below.
Written by Elisa Opriessnig and Benjamin Obexer
Aban, I. B., & George, B. (2015). Statistical considerations for preclinical studies. Experimental neurology, 270, 82–87. https://doi.org/10.1016/j.expneurol.2015.02.024
Chandrasekaran, S. N., Ceulemans, H., Boyd, J. D., & Carpenter, A. E. (2021). Image-based profiling for drug discovery: due for a machine-learning upgrade? Nature reviews. Drug discovery, 20(2), 145–159. https://doi.org/10.1038/s41573-020-00117-w
Choueiry, G. (2023). Quantifying Health. https://quantifyinghealth.com/r-squared-study/
Coleman, C. N., Higgins, G. S., Brown, J. M., Baumann, M., Kirsch, D. G., Willers, H., Prasanna, P. G., Dewhirst, M. W., Bernhard, E. J., & Ahmed, M. M. (2016). Improving the Predictive Value of Preclinical Studies in Support of Radiotherapy Clinical Trials. Clinical cancer research: an official journal of the American Association for Cancer Research, 22(13), 3138–3147. https://doi.org/10.1158/1078-0432.CCR-16-0069.
EMA (2023). European Medicines Agency. https://www.ema.europa.eu/en
FDA/Office of the Commissioner. (2023). U.S. Food and Drug Administration. https://www.fda.gov/
Kain, M., Bodin, M., Loury, S., Chi, Y., Louis, J., Simon, M., Lamy, J., Barillot, C., & Dojat, M. (2020). Small Animal Shanoir (SAS) A Cloud-Based Solution for Managing Preclinical MR Brain Imaging Studies. Frontiers in neuroinformatics, 14, 20. https://doi.org/10.3389/fninf.2020.00020
Meijering E. (2020). A bird’s-eye view of deep learning in bioimage analysis. Computational and structural biotechnology journal, 18, 2312–2325. https://doi.org/10.1016/j.csbj.2020.08.003
Palesch Y. Y. (2014). Some common misperceptions about P values. Stroke, 45(12), e244–e246. https://doi.org/10.1161/STROKEAHA.114.006138
PCT. (2023). Preclinical Trials EU. https://preclinicaltrials.eu/about-pct
Persoon, L., Hoof, S. V., van der Kruijssen, F., Granton, P., Sanchez Rivero, A., Beunk, H., Dubois, L., Doosje, J. W., & Verhaegen, F. (2019). A novel data management platform to improve image-guided precision preclinical biological research. The British Journal of Radiology, 92(1095), 20180455. https://doi.org/10.1259/bjr.20180455
Polson, A. G., & Fuji, R. N. (2012). The successes and limitations of preclinical studies in predicting the pharmacodynamics and safety of cell-surface-targeted biological agents in patients. British Journal of Pharmacology, 166(5), 1600–1602. https://doi.org/10.1111/j.1476-5381.2012.01916.x
Schwartz, Y., Barbot, A., Thyreau, B., Frouin, V., Varoquaux, G., Siram, A., Marcus, D. S., & Poline, J. B. (2012). PyXNAT: XNAT in Python. Frontiers in neuroinformatics, 6, 12. https://doi.org/10.3389/fninf.2012.00012
Shegokar, R. (2020). Preclinical Testing – Understanding the basics first. In: Ranjita, Shegokar (Editor): Drug Delivery Aspects. Volume 4: Expectations and Realities of Multifunctional Drug Delivery Systems. Science Direct.
Steinmetz, K. L., & Spack, E. G. (2009). The basics of preclinical drug development for neurodegenerative disease indications. BMC Neurology, 9 Suppl 1(Suppl 1), S2. https://doi.org/10.1186/1471-2377-9-S1-S2
Usui, T., Macleod, M. R., McCann, S. K., Senior, A. M., & Nakagawa, S. (2021). Meta-analysis of variation suggests that embracing variability improves both replicability and generalizability in preclinical research. PLoS Biology, 19(5), e3001009.
Zullino, S., Paglialonga, A., Dastrù, W., Longo, D. L., & Aime, S. (2022). XNAT-PIC: Extending XNAT to Preclinical Imaging Centers. Journal of digital imaging, 35(4), 860–875. https://doi.org/10.1007/s10278-022-00612-z