Original Research

Development and validation of a prognostic model to predict birth weight: individual participant data meta-analysis

Abstract

Objective To predict birth weight at various potential gestational ages of delivery based on data routinely available at the first antenatal visit.

Design Individual participant data meta-analysis.

Data sources Individual participant data of four cohorts (237 228 pregnancies) from the International Prediction of Pregnancy Complications (IPPIC) network dataset.

Eligibility criteria for selecting studies Studies in the IPPIC network were identified by searching major databases for studies reporting risk factors for adverse pregnancy outcomes, such as pre-eclampsia, fetal growth restriction, and stillbirth, from database inception to August 2019. Data of four IPPIC cohorts (237 228 pregnancies) from the US (National Institute of Child Health and Human Development, 2018; 233 483 pregnancies), UK (Allen et al, 2017; 1045 pregnancies), Norway (STORK Groruddalen research programme, 2010; 823 pregnancies), and Australia (Rumbold et al, 2006; 1877 pregnancies) were included in the development of the model.

Results The IPPIC birth weight model was developed with random intercept regression models with backward elimination for variable selection. Internal-external cross validation was performed to assess the study specific and pooled performance of the model, reported as calibration slope, calibration-in-the-large, and observed versus expected average birth weight ratio. Meta-analysis showed that the apparent performance of the model had good calibration (calibration slope 0.99, 95% confidence interval (CI) 0.88 to 1.10; calibration-in-the-large 44.5 g, −18.4 to 107.3) with an observed versus expected average birth weight ratio of 1.02 (95% CI 0.97 to 1.07). The proportion of variation in birth weight explained by the model (R2) was 46.9% (range 32.7-56.1% in each cohort). On internal-external cross validation, the model showed good calibration and predictive performance when validated in three cohorts with a calibration slope of 0.90 (Allen cohort), 1.04 (STORK Groruddalen cohort), and 1.07 (Rumbold cohort), calibration-in-the-large of −22.3 g (Allen cohort), −33.4 g (Rumbold cohort), and 86.4 g (STORK Groruddalen cohort), and observed versus expected ratio of 0.99 (Rumbold cohort), 1.00 (Allen cohort), and 1.03 (STORK Groruddalen cohort); respective pooled estimates were 1.00 (95% CI 0.78 to 1.23; calibration slope), 9.7 g (−154.3 to 173.8; calibration-in-the-large), and 1.00 (0.94 to 1.07; observed v expected ratio). The model predictions were more accurate (smaller mean square error) in the lower end of predicted birth weight, which is important in informing clinical decision making.

Conclusions The IPPIC birth weight model allowed birth weight predictions for a range of possible gestational ages. The model explained about 50% of individual variation in birth weights, was well calibrated (especially in babies at high risk of fetal growth restriction and its complications), and showed promising performance in four different populations included in the individual participant data meta-analysis. Further research to examine the generalisability of performance in other countries, settings, and subgroups is required.

Trial registration PROSPERO CRD42019135045

What is already known on this topic

  • Accurate and practical methods for predicting birth weight can help identify pregnancies at greater risk of adverse perinatal outcomes and provide opportunities for early intervention to improve outcomes

  • Because of their small sample size and dichotomisation of birth weight outcome, current birth weight prediction models have a high risk of bias, further limiting their power and usefulness

  • Existing prediction models show varying levels of accuracy with limited assessment of their generalisability in different populations or settings

What this study adds

  • A prediction model for birth weight at various potential gestational ages was developed and validated, based on data readily available at the first antenatal visit

  • The model was derived from a large, ethnically diverse dataset, with cohorts from the US, UK, Norway, and Australia, with continuous birth weight data, and showed good calibration performance on internal-external cross validation

  • The model was particularly well calibrated (smallest prediction errors) in the lower end of predicted birth weight, and explained about 50% of individual variation in birth weights

How this study might affect research, practice, or policy

  • The predictive ability of the model could be useful for early identification of babies at risk of abnormal growth at the time of the antenatal booking

  • The model could help inform clinical decision making in pregnancies at high risk of fetal growth restriction and its complications

  • Use of the prediction model in practice might require evaluation in cluster randomised trials and should be evaluated in other countries, settings, and subgroups

Introduction

Identifying abnormal fetal growth patterns antenatally can help reduce perinatal mortality and morbidity.1 Birth weight and estimated fetal weight for gestational age are important indicators of the health of the mother and the baby's chance of survival and future health.2–4 Babies with a birth weight below the population threshold of the 10th centile are usually classified as small for gestational age and considered to be at greater risk of adverse outcomes because of growth restriction.5 6 In these babies, the odds of stillbirth and neonatal death are substantially higher than in normal weight fetuses at every week beyond the expected date of delivery.2 Also, the healthcare needs of babies born small for gestational age are higher on average than those of babies born at appropriate weight for their gestational age.7

Accurate and practical methods of predicting birth weight can help identify babies with an increased risk of adverse perinatal outcomes and provide opportunities for early intervention to improve their outcomes. Current methods of estimating birth weight and fetal weight rely on formulas from fetal biometry on antenatal ultrasound,8 9 an approach associated with considerable variation in precision and consistency of measurement.10 11 Global health inequity is further widened in many resource poor settings with a high burden of perinatal mortality but limited access to ultrasound machines or experienced operators, which affects effective antenatal monitoring of fetal growth.12

Developing accurate birth weight prediction models in individual studies is limited because of small sample size, selectiveness in the population used for developing the model,13 and dichotomisation of predictors or birth weight outcome.14 15 Furthermore, the predictive performances of previously published models have not been validated externally, and so none can currently be recommended for use in routine clinical practice.16 In this study, we used an individual participant data meta-analysis to overcome these limitations. We developed and validated a multivariable prediction model for birth weight at various potential gestational ages of delivery, with routinely available obstetric history and personal data collected at the first antenatal visit. Figure 1 shows the visual abstract.

Figure 1

Visual abstract

Methods

Our individual participant data meta-analysis followed existing recommendations for developing and validating a prediction model,17–20 and used a prospective protocol registered with PROSPERO. We reported our findings based on the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) cluster guideline for development and validation of a prediction model in clustered data (online supplemental appendix 1).21

Data sources and study population

Eligible studies were identified from the International Prediction of Pregnancy Complications (IPPIC) individual participant data network dataset.22 23 Access to the IPPIC dataset was provided after application to the IPPIC data access committee. The IPPIC dataset contains individual participant data from observational studies and cohorts nested within randomised studies reporting various maternal and perinatal outcomes.22 Women in the studies were recruited with sampling methods and inclusion criteria designed to capture a broad cross section of the target population. Studies in the network were identified by searching major databases for studies reporting risk factors for adverse pregnancy outcomes, such as pre-eclampsia, fetal growth restriction, and stillbirth, from database inception to August 2019. The quality of the individual participant data from each study was assessed with the participants, predictors, and outcomes domains of the prediction model risk-of-bias assessment (PROBAST) tool.24 Details of the search, identification, inclusion of studies, and individual participant data harmonisation for the IPPIC dataset are provided elsewhere.25 26

Candidate predictors

Clinically relevant candidate predictors were identified from the literature27 and prioritised by clinical experts with a two round Delphi process. From an initial list of 33 predictor variables, candidate predictors were: maternal weight at the first antenatal visit, maternal height, maternal age, parity, smoking status, ethnic group (white, black, South Asian, Hispanic, mixed, or other), history of chronic hypertension, history of diabetes, assisted conception, and any previous history of pre-eclampsia, stillbirth, and a baby born small for gestational age. We included gestational age at delivery as a predictor in our model to allow us to produce birth weight predictions for a range of assumed gestational ages at delivery.27 Maternal weight at the first antenatal visit was standardised in the IPPIC dataset to include weight before pregnancy and at the first trimester.

Outcome

Our primary aim was to develop a model to predict the birth weight of a baby at any potential gestational age of delivery. As a continuous measure, our outcome was not limited by arbitrary cut-off values and, if needed, the predicted birth weight could be converted into predicted centiles based on any fetal growth standard.28–30 Predicting birth weight at various gestational ages at delivery also provides information on the severity of any restricted growth and the expected timing of onset to allow planning for appropriate management.

Model development and validation cohorts

Data from four studies (237 228 pregnancies;31–34 online supplemental appendix 2) within the IPPIC individual participant data network dataset provided the best combination of candidate predictor variables while maximising the numbers of cohorts and participants for developing the model. All four studies included pregnant women from different ethnic groups. Data on obstetric history and personal characteristics were obtained by various methods, including self-reporting, routine data collected from medical records, or recorded by the research team with prespecified definitions.

The sample size for developing the model should ensure small optimism in predictor effect estimates, a small difference between the apparent and adjusted R2, precise estimation of the mean predicted birth weight (the model intercept), and precise estimation of the model's residual standard deviation.35 To calculate the minimum sample size required, we assumed a lower bound of 0.5 for the anticipated adjusted R2 of the model to be developed, and an intercept value of −0.935, with a standard error of 0.043 (on the log10 scale) based on a previously published birth weight prediction model.27 Hence a minimum sample of 618 women was required to consider up to 50 predictor parameters in a linear regression model. The sample size in the development and internal-external cross validation cohorts far exceeded these estimates.

Statistical analysis

Missing data

We used multiple imputation by chained equations, assuming a missing-at-random mechanism, to generate 100 imputed datasets for each of the included cohorts separately to retain heterogeneity between the study cohorts.36 Continuous variables were imputed with linear regression, binary variables with logistic regression, and categorical variables with predictive mean matching. The imputation model included all candidate predictors and outcome to help ensure that the missing-at-random assumption was reliable. Imputed outcomes were not included in the analysis. We did not impute when all values were missing or when >90% of values were missing. After imputation, we checked the consistency of imputations by comparing distributions of values and summary statistics for imputed datasets with the original unimputed data.

Non-linear relations between continuous candidate predictors and the outcome were considered with multivariable fractional polynomial models.37 38 Fractional polynomials were first identified in the complete case datasets, with each of the non-linear terms, and then included in the imputation model to allow us to consider this non-linearity when developing the model. Fractional polynomial terms for gestational age at delivery and maternal height were included in the final model to account for non-linear relations with birth weight (online supplemental appendix 3).

Model development

We developed the prediction model with multilevel linear regression, with a random effect on the intercept to account for clustering by cohort. We used backward elimination for variable selection where the same model (with the same candidate predictors) was fitted to all imputations, and pooled Wald tests (with Rubin's rules) were used for backward elimination, with P>0.157 (proxy for Akaike information criterion) for exclusion.39
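
The selection rule can be illustrated with a minimal sketch. It pools one coefficient across imputations with Rubin's rules and compares the Wald P value against 0.157, which is the probability that a χ2 variable with one degree of freedom exceeds 2 (the Akaike information criterion penalty for one parameter). The function names and example numbers are hypothetical, and a normal reference distribution is assumed in place of the t distribution usually paired with Rubin's rules; this is not the Stata code used in the analysis.

```python
import math
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool one coefficient over m imputed datasets with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total

def wald_p_value(estimate, total_variance):
    """Two-sided Wald P value (assumption: normal reference distribution)."""
    z = abs(estimate) / math.sqrt(total_variance)
    return math.erfc(z / math.sqrt(2))

# P(chi-square with 1 df > 2) equals erfc(1), about 0.157
AIC_THRESHOLD = math.erfc(1.0)

# Hypothetical estimates and variances for one predictor in five imputed datasets
betas = [12.1, 11.4, 13.0, 12.6, 11.9]
beta_variances = [16.0, 15.2, 16.8, 15.9, 16.3]

beta, total_var = rubin_pool(betas, beta_variances)
p_value = wald_p_value(beta, total_var)
drop = p_value > AIC_THRESHOLD  # True would mark the predictor for elimination
```

In a full backward elimination, this comparison would be repeated after refitting the model without the least significant predictor, until no remaining predictor exceeds the threshold.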

To adjust for overfitting in the development of the model, we calculated the heuristic shrinkage factor in each imputation and pooled for all imputations with Rubin's rules to obtain the average shrinkage factor.40 41 This average shrinkage factor was then applied to each beta coefficient in the model, and subsequently the average intercept value was re-estimated (holding fixed the shrunken beta coefficients) to ensure that the final model predictions were calibrated-in-the-large.
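
A minimal sketch of this shrinkage step, assuming the van Houwelingen and Le Cessie form of the heuristic shrinkage factor, (model χ2 − degrees of freedom)/model χ2, and a single dataset rather than pooled imputations. All function names and numbers are hypothetical. Setting the intercept to the mean of the outcome minus the shrunken linear predictor makes the mean prediction equal the mean observed outcome, which is what calibration-in-the-large requires.

```python
from statistics import mean

def heuristic_shrinkage(model_chi2, n_params):
    """Heuristic shrinkage factor from the model likelihood ratio chi-square."""
    return (model_chi2 - n_params) / model_chi2

def shrink_and_recalibrate(betas, rows, outcomes, shrinkage):
    """Shrink every coefficient, then re-estimate the intercept with them held fixed."""
    shrunken = [shrinkage * b for b in betas]
    linear_parts = [sum(b * x for b, x in zip(shrunken, row)) for row in rows]
    # Mean residual becomes the intercept, so predictions are calibrated-in-the-large
    intercept = mean([y - lp for y, lp in zip(outcomes, linear_parts)])
    return intercept, shrunken

# Hypothetical toy data: two predictors, three pregnancies, birth weight in grams
rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
outcomes = [3000.0, 3400.0, 3500.0]
betas = [200.0, 100.0]

s = heuristic_shrinkage(model_chi2=50.0, n_params=5)  # 0.9 in this toy example
intercept, shrunken = shrink_and_recalibrate(betas, rows, outcomes, s)
predictions = [intercept + sum(b * x for b, x in zip(shrunken, row)) for row in rows]
```

With a very large model χ2 relative to the number of parameters, as expected in a dataset of this size, the factor approaches 1 and the coefficients are almost unchanged.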

Internal-external cross validation

We maximised our access to individual participant data from multiple studies and used an internal-external cross validation approach.42 43 With the same approach as described above, a model predicting birth weight was developed with all but one of the four cohorts, keeping one cohort for validation. The shrunken model equation was then applied to the excluded cohort to calculate the predicted birth weight at the observed gestational age at delivery. Predictive performance was then evaluated for the model in this excluded cohort, based on values for calibration-in-the-large, calibration slope, root mean squared error, mean absolute error, and R2. This process was then repeated until each cohort had been used to assess the external validation of the model. If required, we planned to include the largest cohort in all cycles of the internal-external cross validation approach to ensure a large sample size for developing the model in each cycle. Calibration plots were also produced for each cycle of the internal-external cross validation, plotting average observed and expected values for all imputations.
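
The cycle structure and the performance measures can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the function names are hypothetical, calibration-in-the-large is taken as mean observed minus mean predicted birth weight, the calibration slope as the least squares slope of observed on predicted values, and R2 as one minus the ratio of residual to total sum of squares.

```python
import math
from statistics import mean

def iecv_cycles(cohorts, always_in_development=()):
    """Yield (development cohorts, held-out cohort) for each validation cycle."""
    for held_out in cohorts:
        if held_out in always_in_development:
            continue  # this cohort is never held out for validation
        yield [c for c in cohorts if c != held_out], held_out

def performance(observed, predicted):
    """Calibration and error measures for one validation cohort."""
    mo, mp = mean(observed), mean(predicted)
    errors = [o - p for o, p in zip(observed, predicted)]
    sxy = sum((p - mp) * (o - mo) for o, p in zip(observed, predicted))
    sxx = sum((p - mp) ** 2 for p in predicted)
    sst = sum((o - mo) ** 2 for o in observed)
    sse = sum(e ** 2 for e in errors)
    return {
        "citl": mo - mp,                   # calibration-in-the-large (g)
        "slope": sxy / sxx,                # calibration slope
        "oe_ratio": mo / mp,               # observed versus expected ratio
        "rmse": math.sqrt(sse / len(errors)),
        "mae": mean(abs(e) for e in errors),
        "r2": 1 - sse / sst,
    }

# Cohort labels from the text; the largest cohort stays in every development set
cohorts = ["Allen", "Rumbold", "STORK Groruddalen", "NICHD"]
cycles = list(iecv_cycles(cohorts, always_in_development=("NICHD",)))

# Hypothetical observed and predicted birth weights (g) in a held-out cohort
observed = [2500.0, 3000.0, 3500.0, 4000.0]
predicted = [2600.0, 3000.0, 3400.0, 3800.0]
result = performance(observed, predicted)
```

In this toy example the positive calibration-in-the-large indicates that the model underpredicts on average, and a slope above 1 indicates predictions that are too narrow in range.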

Predictive performance measures from the internal-external cross validation were summarised with a random effects meta-analysis to give a summary estimate of overall performance. The calibration slope and calibration-in-the-large were pooled on their original scales. Confidence intervals were derived with the Hartung-Knapp-Sidik-Jonkman variance correction.44 Heterogeneity in model performance for all internal-external cross validation cycles was quantified with τ2 and 95% prediction intervals.
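
A minimal sketch of such a pooling step is given below. It assumes a DerSimonian-Laird estimate of τ2 (the estimator used in the analysis is not stated here and may differ) combined with the Hartung-Knapp-Sidik-Jonkman variance and a t distribution with k−1 degrees of freedom. The function name is hypothetical, and the standard errors in the example are back-calculated from the rounded 95% confidence intervals reported for the calibration slopes, so the output is approximate.

```python
import math

# Two-sided 95% critical values of the t distribution for small degrees of freedom
T_CRIT_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}

def random_effects_hksj(estimates, std_errors):
    """Random effects pooled estimate with a Hartung-Knapp-Sidik-Jonkman 95% CI."""
    k = len(estimates)
    v = [se ** 2 for se in std_errors]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # DerSimonian-Laird between-study variance (assumed estimator)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    # HKSJ variance: weighted residual variance scaled by k - 1
    hksj_var = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w_re, estimates))
    hksj_var /= (k - 1) * sum(w_re)
    half_width = T_CRIT_975[k - 1] * math.sqrt(hksj_var)
    return pooled, pooled - half_width, pooled + half_width, tau2

# Calibration slopes from the three validation cycles, with reported 95% CIs
slopes = [0.90, 1.07, 1.04]
intervals = [(0.82, 0.97), (1.02, 1.12), (0.96, 1.12)]
# Assumption: standard error = CI width / (2 x 1.96)
std_errors = [(hi - lo) / (2 * 1.96) for lo, hi in intervals]

pooled, lower, upper, tau2 = random_effects_hksj(slopes, std_errors)
```

With only three cycles, the t distribution has two degrees of freedom, which is why the pooled interval is much wider than any of the cohort specific intervals.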

Apparent performance of the model developed with all four cohorts was calculated for each cohort individually and for all four cohorts (without accounting for clustering). Cohort specific apparent predictive performance was summarised with a random effects meta-analysis to give a pooled estimate of overall apparent model performance. We also carried out a sensitivity analysis of the predictive performance of the model by gestational age at delivery (32-36 weeks and ≥37 weeks) to assess the differential model performance in these populations. All analyses were performed with Stata 16 software.45

Patient and public involvement

Members of the public were involved in prioritising the research question, and developing, designing, and managing the research. The study is supported by the Hildas (https://www.dhlnetwork.com/news), a dedicated patient and public involvement group in women's health. The team members were involved in the interpretation and reporting of the results. Findings will be further disseminated in workshop events with key stakeholders and in a format more suitable for patients and members of the public.

Results

Study population

The model development and validation dataset included four studies (237 228 pregnancies) from four countries, one each from the UK (Allen et al, 2017),32 Norway (STORK Groruddalen research programme 2010),33 Australia (Rumbold et al, 2006),31 and the US (National Institute of Child Health and Human Development (NICHD), 2018).34 Three studies were prospective observational studies of unselected pregnant women32–34 whereas one was a randomised trial of nulliparous pregnant women at low risk of complications of pregnancy.31 The US based study34 made up >95% of the pooled sample size. Women in the combined dataset were mostly from the white ethnic group (50%, n=118 554), followed by black (22%, n=52 691) and Hispanic (17%, n=40 422) ethnic groups. Median gestational age at delivery for the four studies was 39 weeks (interquartile range 38-40), mean maternal age was 27.7 years (standard deviation (SD) 7.4), and birth weight was 3202 g (SD 643.4) (table 1). Assessment of risk of bias of the cohorts with the PROBAST tool considered that all cohorts were at low risk of bias in the domains of participant selection, predictor, and outcome reporting.

Table 1
Characteristics of women in development and internal-external cross validation cohorts: UK (Allen et al, 2017),32 Australia (Rumbold et al, 2006),31 Norway (STORK Groruddalen research programme, 2010)33 and US (National Institute of Child Health and Human Development (NICHD), 2018)34 cohorts, and pooled data

Apparent model performance

The final multivariable model included all 13 candidate predictors: assumed gestational age at delivery, maternal weight at the first antenatal visit, maternal height, maternal age, parity, smoking status, ethnic group, history of chronic hypertension, history of diabetes, assisted conception, and previous history of pre-eclampsia, stillbirth, and a baby born small for gestational age (table 2).

Table 2
Model coefficients for final birth weight model and each internal-external cross validation cycle, with study specific intercepts

The apparent performance of the model showed good calibration within each cohort, with calibration slopes of 0.88 (95% confidence interval (CI) 0.89 to 0.96) for the Allen et al, 2017 cohort, 1.04 (0.99 to 1.09) for the Rumbold et al, 2006 cohort, 1.30 (0.95 to 1.11) for the STORK Groruddalen research programme, 2010 cohort, and 0.99 (0.99 to 0.99) for the NICHD 2018 cohort. Calibration-in-the-large values were near zero in each cohort: 33.1 g (95% CI 7.1 to 59.1) for the Allen cohort, 13.4 g (−6.5 to 33.3) for the Rumbold cohort, 104.7 g (75.6 to 133.8) for the STORK Groruddalen cohort, and 31.4 g (29.7 to 33.2) for the NICHD cohort. The ratio of mean observed to mean predicted birth weight in each cohort was near 1.00 (range 1.01-1.04), with confidence intervals that overlapped one (table 3).

Table 3
Overall apparent model performance and by cohort: UK (Allen et al, 2017),32 Australia (Rumbold et al, 2006),31 Norway (STORK Groruddalen research programme, 2010),33 and US (National Institute of Child Health and Human Development (NICHD), 2018)34 cohorts, and pooled data

We used a meta-analysis to summarise performance for the four cohorts and found good calibration, on average, with a pooled calibration slope of 0.99 (95% CI 0.88 to 1.10) and a pooled calibration-in-the-large of 44.5 g (−18.4 to 107.3), corresponding to a pooled ratio of mean observed to mean predicted birth weight of 1.02 (0.97 to 1.07) (table 3). Calibration curves were close to the line of ideal calibration for all four cohorts.

The pooled estimate for proportion of variation in birth weight explained by the birth weight model (R2) was 46.9% (range 32.7-56.1% in each cohort) (table 3). Errors in predictions varied in individuals, as shown by the range of observed versus expected birth weights in the pooled data panel of figure 2, with a root mean squared error of 427.8 g for all cohorts, ranging from a low of 427.7 g (NICHD, 2018 cohort) to 438.2 g (Rumbold et al, 2006 cohort). We found no evidence of overfitting in the development data in any cycle of the internal-external cross validation, with a heuristic shrinkage estimate of ≥0.9997 for all imputed datasets and internal-external cross validation cycles. Figure 3 shows the final prediction equation developed with all four cohorts, to calculate birth weight at any desired gestational age of delivery, with examples of how to calculate birth weight with the equation.

Figure 2

Calibration plots of observed versus expected birth weights for UK (Allen et al, 2017; 1045 pregnancies),32 Norway (STORK Groruddalen research programme, 2010; 823 pregnancies),33 and Australia (Rumbold et al, 2006; 1877 pregnancies)31 cohorts, and for pooled data (237 228 pregnancies)

Figure 3

Final equation for prediction of birth weight at any potential gestational age of delivery with worked examples, superimposed over a GROW growth chart. SGA=baby born small for gestational age.

Internal-external cross validation

The internal-external cross validation analysis was done by including the largest of the four cohorts (NICHD, 2018)34 in all cycles of the internal-external cross validation, to ensure that the sample size for development of the model was always large enough to develop a reliable model, and that the validation performance calculated was representative of an external validation of the final model, which was highly influenced by this cohort. We therefore developed a model in three cohorts and applied this model within the fourth cohort, but did not include a cycle where a model was developed without the NICHD 2018 study. Estimates of calibration slope for the internal-external cross validation cycles showed minimal overfitting to the development cohort in each cycle, with estimates of 0.90 (95% CI 0.82 to 0.97) when validated in the Allen et al, 2017 cohort, 1.07 (1.02 to 1.12) when validated in the Rumbold et al, 2006 cohort, and 1.04 (0.96 to 1.12) when validated in the STORK Groruddalen research programme, 2010 cohort (table 4). The pooled calibration slope for the internal-external cross validation cycles showed excellent performance with little overfitting and negligible miscalibration (on average 1.00, 95% CI 0.78 to 1.23). Model performance by calibration-in-the-large also showed minimal miscalibration, with overestimation of birth weight, on average, by only 22.3 g and 33.4 g when validated in the Allen et al, 2017 and Rumbold et al, 2006 cohorts respectively, whereas the model underestimated birth weight by 86.4 g, on average, when validated in the STORK Groruddalen research programme, 2010 cohort (table 4). For each internal-external cross validation cycle, the ratio of mean observed to mean predicted birth weight was near perfect, ranging from 0.99 to 1.03 (table 4). The pooled calibration-in-the-large suggested an underestimation of birth weight by only 9.7 g (−154.3 to 173.8), on average, for the internal-external cross validation cycles (table 4).

Table 4
Predictive performance of developed birth weight model with average intercept in each internal-external cross validation cycle. UK (Allen et al, 2017),32 Australia (Rumbold et al, 2006),31 and Norway (STORK Groruddalen research programme, 2010)33 cohorts, and pooled estimate

Visual inspection of the calibration plots also showed good calibration, on average, for all three cycles of internal-external cross validation, with calibration curves close to the ideal line (figure 2). Errors in individual level predictions were still large for some, however, with a root mean squared error of 428.9 g, 441.3 g, and 424.6 g, for the models validated in the Allen et al, 2017, Rumbold et al, 2006, and STORK Groruddalen research programme, 2010 cohorts, respectively. Sensitivity analysis of predictive performance of the IPPIC birth weight model by gestational age at delivery did not show differential calibration performance for prediction of birth weight in term and late preterm babies (online supplemental appendix 4) or difference in individual level error distributions for different numbers of weeks of gestation (online supplemental appendix 5).

Discussion

Principal findings

Our IPPIC birth weight model, developed with data that are readily available at the antenatal booking, showed excellent prediction of birth weight, on average, and promising performance in four different populations included in the individual participant data meta-analysis. We used a robust Delphi process to prioritise candidate predictors, ensuring that we included clinically meaningful variables. We also used best practice prognostic model methods to develop and validate our birth weight prediction model. Our model predicted the birth weight of a baby at various potential gestational ages of delivery, based on maternal weight, height, age, parity, smoking status, ethnic group, history of chronic hypertension, history of diabetes, assisted conception, and previous history of pre-eclampsia, stillbirth, and babies born small for gestational age. We validated the model with an internal-external cross validation approach.

The model, when tested in cohorts in different countries, explained about 50% of individual level variability, and had good calibration performance in high and low risk populations. Prediction errors were smallest in individuals at the lower end of the range of predicted birth weights, which is important in informing clinical decisions in pregnancies at high risk of fetal growth restriction and related complications.

Strengths and limitations of this study

Our individual participant data meta-analysis simultaneously developed and validated a prediction model for birth weight. We developed our model with data from the harmonised IPPIC individual participant data, from cohorts from different countries,22 23 which provided us with a larger sample size than is achievable with just one study. This approach allowed us to develop a more comprehensive prediction model, applicable in different populations and settings included in these individual participant data. We evaluated clinically relevant predictors that are routinely available at the antenatal booking in both high and low resource settings, allowing the model to be easily applied in high income as well as in low income countries where perinatal mortality rates are highest.46 Although our prediction model showed promising performance after three cycles of internal-external cross validation in women from the UK, Norway, and Australia, multiple external validations with data specifically from low income settings are needed to fully evaluate whether the model can be transferred to these settings. These external validations will help verify the model's robustness and suitability for use in other countries and subgroups, strengthening its practical use in clinical practice.

Our model can be used to generate predictions of birth weight conditional on any clinically relevant gestational age at delivery. Integration of the model as part of routine growth charts has the potential to inform antenatal counselling and empower women to contribute towards shared decision making with clinicians about the frequency of monitoring in pregnancy and discussions on timing of birth, where concerns about the growth of the fetus exist. Further external validation of the model in different populations and settings, however, is required before implementation in clinical practice. Our prediction of birth weight was on the continuous scale, and therefore our model is not limited by arbitrary cut-off values used to define small or large for gestational age. This approach allows clinicians to calculate predicted birth centiles based on any fetal growth standard of their choice, such as GROW, INTERGROWTH 21st, and WHO.28–30

We used a systematic approach to develop and validate our birth weight prediction model, by first identifying and prioritising candidate predictors with a Delphi process and then using multiple imputation to deal with missing data for both predictors and outcome to avoid the loss of useful information.47 48 We used rigorous statistical methods to develop the prediction model and evaluate the predictive performance, with individual participant data from multiple cohorts to assess any potential heterogeneity in performance for the cohorts.

Our study had some limitations. Although mean birth weight was similar in all cohorts, the NICHD cohort had a higher standard deviation in birth weight, with greater variability than the other cohorts. This heterogeneity could be a result of variation in personal characteristics within the population, potentially affecting the generalisability of the model. Considering that the NICHD cohort was retained in all internal-external cross validation cycles, exploring the model's performance within this heterogeneous context is important. External validation of the model in other datasets that represent different regions and populations will help confirm the model's generalisability, enhancing its practical applicability.

The average calibration performance of the model was good for all cohorts but varied in individual cohorts, with some underprediction in the smaller cohorts, mainly in those with the highest birth weight. Although overall calibration of the model was good, miscalibration for individual observations was found, particularly at the higher end of the range of predicted birth weights. This miscalibration produced a wide range of observed birth weights for a particular predicted birth weight in all cycles of the internal-external cross validation. This range was much narrower in the clinically important range for the lower predicted birth weights, however, where pregnancies have a higher risk of growth restriction and require intervention. Our model explained 47% of the variability in birth weight in the dataset, ranging from 56% (NICHD, 2018 cohort) to 33% (Allen et al, 2017 cohort). These differences in R2 estimates are partly a result of chance, but could also be because of differences in predictor effects in the various populations. Future research might explore if some variables, such as maternal weight or height, interact with country location and should be modelled differently for each location to improve variance explained.

Comparison with existing evidence

Most published models predict the risk of a baby born small for gestational age rather than birth weight.16 Dichotomisation of birth weight limits the power and usefulness of a prediction model. The use of specific cut-off values can also result in both overdiagnosis and underdiagnosis of fetal growth abnormalities, depending on the criteria used.49 These models were also usually poorly reported, with only a third being internally validated (10/28, 36%), and two (7%) were externally validated and showed limited predictive performance.50 51 Calibration measures were rarely reported in these studies, with only four (14%) reporting these performance measures. A prediction formula, rule, or score that would allow independent external validation was reported in only 16 (57%) of these models. Other published birth weight prediction models were developed for use in specific populations and have not undergone external validation to determine their generalisability to new and different populations.13 So far, no individual test is satisfactorily predictive of birth weight or small for gestational age to warrant recommendation in routine clinical use.52

We reported the development and validation of our prediction model in line with current guidelines on the transparent reporting of multivariable prediction models developed or validated with clustered data.21 Our model showed good calibration performance on internal-external cross validation, with only slight underprediction of birth weight, by 9.7 g on average. Our model also requires the user to enter the assumed gestational age at delivery. Although the actual date of delivery is not known when making predictions, the option to enter various possible gestational ages for delivery allows the user to produce a plot of predictions of birth weight for various time points.
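The idea of entering several assumed gestational ages at delivery can be sketched as follows. This is only an illustration of the workflow: `toy_predict` is a placeholder with invented coefficients and is not the published IPPIC equation, and the function names, predictor names, and input values are all assumptions.

```python
def toy_predict(gestational_age_weeks, maternal_height_cm, maternal_weight_kg):
    """Placeholder prediction function with invented coefficients.

    NOT the IPPIC birth weight model; it exists only so the example runs.
    """
    return (
        -3000.0
        + 160.0 * gestational_age_weeks
        + 5.0 * (maternal_height_cm - 165.0)
        + 4.0 * (maternal_weight_kg - 65.0)
    )


def birth_weight_curve(predict, weeks, **booking_data):
    """Return (assumed gestational age, predicted birth weight in g) pairs.

    The booking data are held fixed; only the assumed gestational age varies.
    """
    return [(week, predict(week, **booking_data)) for week in weeks]


# Hypothetical booking data for one pregnancy, evaluated at 34 to 41 weeks.
curve = birth_weight_curve(
    toy_predict,
    range(34, 42),
    maternal_height_cm=165.0,
    maternal_weight_kg=65.0,
)
for week, grams in curve:
    print(f"{week} weeks: {grams:.0f} g")
```

The resulting pairs could be passed to any plotting tool to draw predicted birth weight against assumed gestational age at delivery.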

The Royal College of Obstetricians and Gynaecologists in the UK recommends, at the antenatal booking, assessing for risk factors for fetuses that are small for gestational age, to identify those who might need increased surveillance.53 The American College of Obstetricians and Gynecologists recommends screening for unspecified medical and obstetric risk factors, but does not recommend use of uterine artery Doppler or biochemical markers, citing lack of evidence on improvement of outcomes.54 The Society of Obstetricians and Gynaecologists of Canada calls for clinical risk factor based screening,55 whereas the Royal Australian and New Zealand College of Obstetricians and Gynaecologists suggests risk assessment through a combination of biomarkers, Doppler ultrasound, and major maternal clinical risk factors.56 The choice of risk factors and their combination to predict risk of small for gestational age or fetal growth restriction in any of these guidelines was not based on formal prediction modelling.

Relevance for clinical practice and research

The prediction of birth weight is an important aspect of antenatal care because it can provide valuable information to healthcare providers and expectant mothers about the growth and development of the fetus, and can support cost effective use of limited fetal monitoring resources. Accurate predictions of birth weight can also help identify infants who might have an increased risk of adverse outcomes, such as preterm birth or stillbirth, and allow for early interventions to improve outcomes. The development of accurate birth weight prediction models has been challenging, however, because individual studies often have limited sample sizes and variable definitions of the birth weight outcome and predictors, and the models developed have lacked external validation.11 14 49 57 Our individual participant data meta-analysis combined data from multiple studies to develop a mathematical model, providing a more robust estimate of the association between included predictors and birth weight. Use of multiple datasets in the IPPIC data repository allowed us to carry out extensive validation of the model for different geographical regions, health systems, settings, and in populations of women with different baseline risks.

Only clinical characteristic predictors were included in the model, making it potentially applicable to both low and high resource settings. The predictors included are easy to measure and routinely available in clinical practice. Incorporating the model into practice will be simple because no additional measures are required to calculate the birth weight for potential gestational ages of delivery. Because the model includes factors that influence fetal growth and perinatal risk, its predictive ability is particularly useful for early identification of risk of abnormal growth at the antenatal booking. Thus the model can alert healthcare providers to take appropriate actions and provide necessary care in monitoring high risk pregnancies.

Our work was in direct response to calls from the National Institute for Health and Care Excellence and the Royal College of Obstetricians and Gynaecologists for predictive tests or strategies to identify women at risk of delivering a small baby, particularly growth restricted infants with complications,53 58 and the priorities of the UK Department of Health to reduce the incidence of stillbirths and neonatal deaths. Further research is needed to evaluate the ease of implementation of our birth weight model into routine clinical practice and to determine any barriers and facilitators of its use. This research should include assessment of the acceptability of the prediction model as a screening tool for pregnant women and their families, as well as healthcare providers.

The effect of using our birth weight model in clinical practice might require evaluation in cluster randomised trials to assess whether its use improves perinatal outcomes, or evaluation in an implementation study to show that it can be integrated into routine care at a population level. These studies could evaluate the effect on perinatal mortality of using the model to inform interventions (such as close monitoring or planned delivery) compared with routine care. Although these trials might not be feasible because of the sample size required to show an effect on perinatal mortality, proxies for perinatal mortality, such as morbidity, could be used to achieve sufficient power.59

Conclusions

We have developed a simple prediction model incorporating routinely available clinical predictors to predict birth weight at various potential gestational ages at delivery. The model explained about 47% of the variability in birth weight and showed good calibration, and its use could help identify pregnancies at increased risk of adverse outcomes to allow planning of appropriate management or early intervention to improve perinatal outcomes. Further multiple external validations in different settings and populations will help confirm the generalisability of the model.

Ethics approval

Not applicable.