  1. Background to the research project
  2. Who this report is for 
  3. Data sources used in the research project
  4. Prevalence and incidence data, including data relating to age and gender
  5. Mortality data
  6. Impact on hospital services 
  7. Regional breakdown 
  8. Social deprivation 
  9. International comparison

Background to the research project 

This website presents data resulting from a three-year epidemiological research project – The respiratory health of the nation – funded by the British Lung Foundation. The project aimed to compile a comprehensive overview of the extent and impact of lung disease across the UK. This has not been done since 2006, when the British Thoracic Society published the 2nd edition of its report The burden of lung disease.

The research was carried out between 2013 and 2016. The research project team was led by Professor David Strachan, Professor of Epidemiology and Director of the Population Health Research Institute at St George’s, University of London. Professor Strachan was supported by a team of epidemiologists: Ms Ramyani Gupta and Ms Elizabeth Limb at St George’s; Professor Richard Hubbard, Dr Jack Gibson and Dr Laila Tata at the University of Nottingham; Professor Peter Burney, Professor Deborah Jarvis, Professor Paul Cullinan, Dr Anna Hansell, Dr Ioannis Bakolis and Dr Rebecca Ghosh at Imperial College London, and Professor Aziz Sheikh at the University of Edinburgh.

BLF research and policy teams extracted data summaries and key points. They did this with support and advice from the epidemiologists from the research project team who compiled the data, and from leading clinicians in each of the disease areas covered. 

Who this report is for

The aim of this report is to provide the information needed to help improve the respiratory health of the nation. These data and analyses should inform the development of strategies designed to reduce the impact of lung disease on the UK’s health. They can also be used as baseline data to measure the effectiveness of these strategies. The conclusions and recommendations section of this report outlines some of the approaches that should be included in these strategies.

The report is therefore an invaluable resource for:

  • policy-makers
  • researchers
  • health care providers and professionals
  • the commercial health care sector

It is also useful for anyone else looking for facts and information about lung disease and specific conditions. This includes media professionals and people living with lung disease.

Data sources used in the research project

The research project team used various data sources to compile an up-to-date overview of lung disease. They also looked closely at 15 major lung conditions. They compiled data relating to:

  • prevalence (the number of people who are living with or have previously been diagnosed with the disease)
  • incidence (the number of new diagnoses made each year)
  • mortality (the number of people who die from lung disease)
  • impact on hospital services
  • age and gender
  • regional variation
  • social deprivation
  • international comparison

Data relating to people of all ages were compiled. Overall, the data provide the most accurate picture available of the respiratory health of the nation. However, the data sources used impose certain limitations on the strength of the conclusions that can be drawn.

The sources used and their limitations are detailed below.

Prevalence and incidence data, including data relating to age and gender

The research project team used The Health Improvement Network (THIN) database records for 2004-13 to estimate prevalence and incidence data. These 12.6 million patient records from 591 GP surgeries represent a sample of approximately 5% of the population.

Using population estimates, the team scaled up THIN data to produce estimated total figures for the UK population. They also calculated breakdowns by gender, age group, region and levels of deprivation. The regions were based on former Strategic Health Authority areas in England, Wales, Scotland and Northern Ireland.

The research project team used a process of direct standardisation to estimate the number of people newly or ever diagnosed with each condition. First, they looked at the annual rates recorded by THIN, broken down by age group, gender and region. They then multiplied these by the total number of the UK population in each subgroup in that year (using mid-year population estimates from the Office for National Statistics). Finally, they added these together to produce overall estimates.
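
The three steps described above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic, not the research project team's own code. The subgroup labels, rates and population figures are invented for the example, and the function name is hypothetical.

```python
# Illustrative sketch only: subgroups, rates and populations are invented.
# Each key is one (age group, gender, region) subgroup.

# Step 1: annual rates observed in the sample (diagnoses per person per year).
sample_rates = {
    ("0-39", "female", "North"): 0.002,
    ("0-39", "male", "North"): 0.003,
    ("40+", "female", "North"): 0.010,
    ("40+", "male", "North"): 0.012,
}

# Mid-year population estimates for the same subgroups.
uk_population = {
    ("0-39", "female", "North"): 1_000_000,
    ("0-39", "male", "North"): 1_100_000,
    ("40+", "female", "North"): 900_000,
    ("40+", "male", "North"): 800_000,
}


def scale_up(rates, population):
    """Steps 2 and 3: multiply each subgroup rate by its population, then add up."""
    return sum(rates[group] * population[group] for group in rates)


estimated_total = scale_up(sample_rates, uk_population)
print(round(estimated_total))
```

Because each subgroup is weighted by its real population size, a sample that over- or under-represents a particular age group, gender or region does not distort the overall estimate in the way a single crude rate would.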

For some lung conditions, such as pneumonia, it is possible for a single person to be diagnosed with the disease more than once. For instance, someone might be diagnosed, recover, then be diagnosed again at a later date. For these conditions, incidence and prevalence were recorded as the number of individual persons diagnosed, rather than the number of separate episodes of the disease. This was to make the data comparable with long-term lung diseases, with which someone can only be diagnosed once.


THIN data are commonly accepted as the most accurate available when looking at prevalence and incidence of disease presenting to GPs. But there are limitations in using these data. These relate to the accuracy of the source data, and to possible errors when estimating overall figures. Although these do not in any way invalidate the data, they should be borne in mind when considering the figures.

Limitations to the accuracy of the source data:

  • The validity of the estimates depends on how accurately GPs have recorded and coded diagnoses.
  • The validity of the estimates also relies on hospitals feeding data back to GPs when patients are diagnosed in secondary care. Hospital feedback is generally reliable. However, the figures may be slightly underestimated for diseases like pneumonia, pneumothorax and pulmonary embolism that are diagnosed and managed during a single hospital stay. Hospitals may not feed back all such instances to be recorded in GP data.
  • Some of the diagnoses included here for conditions like mesothelioma, respiratory tuberculosis (TB) and sarcoidosis may relate to organs other than the lungs. This is because there is no reliable way to separate them out in the GP data. However, such errors are likely to be small.

Some incidence trends may be due to changes in the way health care professionals coded data over the period concerned. For instance, use of new codes may steadily increase as GPs get used to them, before levelling off at a certain rate of use. This may give the illusion that prevalence is increasing when it is just use of a particular code that is increasing. For this reason, data compiled before 2008 were analysed with extreme caution. Use of coding from 2008 onwards was considered more consistent, but some variation in prevalence and incidence, particularly among less common conditions, may still result from changes in coding use rather than reflecting genuine changes in disease rates.

Similarly, a number of different codes can be used to record diseases like idiopathic pulmonary fibrosis (IPF). The research project team, with advice from clinicians and the BLF’s internal research team, used codings they considered to offer the most accurate picture of true incidence and prevalence.

Limitations to the estimates of overall figures:

  • The 5% of the UK population represented by THIN data is from a sample of GP practices, rather than all practices. There is a possibility of error once these figures are extrapolated to apply to the UK as a whole.
  • The researchers either omitted data from 2013 or treated them with caution. Many practices submitted their last data set in mid-2013, and the researchers extrapolated from these part-year data to provide a full-year estimate. This introduces the possibility of error in 2013 data.

Regarding estimates of the overall number of people newly or ever diagnosed with each condition:

  • For rare diseases these estimates should be treated with caution. This is because they are based on age-, gender- or region-specific subgroup estimates of disease rates in each year. The number of diagnoses in some subgroups may be very small, so the estimates could be subject to a high level of random variation from year to year.
  • Changes in absolute number estimates over time are produced by a combination of changes in underlying rates of disease, and changes in the structure and size of the UK population. So an increase in the incidence of a condition may indicate that it is becoming more common, or that the section of the population most affected by the disease has expanded. It could also reflect a combination of the two, or other factors such as improvements in diagnosis.
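
The interplay between disease rates and population structure described in the last point can be illustrated with a simple two-part split of the change in estimated cases. This is a sketch under stated assumptions, not a method taken from the research project. The two age groups and all figures are invented, and the split depends on which factor is held fixed first.

```python
# Illustrative sketch only: two age groups with invented rates and populations.
rates_year1 = {"under 65": 0.001, "65 and over": 0.010}
rates_year2 = {"under 65": 0.001, "65 and over": 0.011}
pop_year1 = {"under 65": 50_000_000, "65 and over": 10_000_000}
pop_year2 = {"under 65": 50_000_000, "65 and over": 12_000_000}


def estimated_cases(rates, population):
    """Sum of rate x population across subgroups."""
    return sum(rates[group] * population[group] for group in rates)


cases_year1 = estimated_cases(rates_year1, pop_year1)
cases_year2 = estimated_cases(rates_year2, pop_year2)

# Cases expected in year 2 if rates had stayed at year-1 levels.
cases_if_rates_unchanged = estimated_cases(rates_year1, pop_year2)

# Part of the change explained by the population growing or ageing.
population_effect = cases_if_rates_unchanged - cases_year1
# Remaining part, explained by the change in rates.
rate_effect = cases_year2 - cases_if_rates_unchanged

print(f"total change: {cases_year2 - cases_year1:.0f}")
print(f"due to population: {population_effect:.0f}, due to rates: {rate_effect:.0f}")
```

In this invented example most of the rise comes from a larger older population rather than from a higher rate of disease, which is why absolute numbers alone cannot show whether a condition is becoming more common.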

Mortality data

The research project team obtained mortality data from the Office for National Statistics for England and Wales, the General Register Office for Scotland and the Northern Ireland Statistics and Research Agency.

Numbers of deaths were totalled over the five years from 2008 to 2012. Age-standardised mortality ratios by region were calculated for each condition over this five-year period, separately for males and females.

Using age-standardisation takes the ages of people within a population into account when comparing rates of disease. This is so that, for instance, comparisons of the number of people dying with a condition are not unduly influenced by the fact that there might be a larger number of older people in a particular population at that time.
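
One common way to calculate an age-standardised mortality ratio is indirect standardisation. The deaths observed in a region are compared with the deaths that would be expected if reference age-specific death rates applied to that region's population. This report does not spell out the exact calculation used, so the sketch below is an assumption about the general technique. All figures and names are invented.

```python
# Illustrative sketch only: all rates, populations and deaths are invented.

# Reference death rates per person per year, by age group (for example, UK-wide).
reference_rates = {"0-64": 0.0002, "65-74": 0.002, "75+": 0.01}

# One region: population by age group, and deaths actually observed.
region_population = {"0-64": 800_000, "65-74": 100_000, "75+": 50_000}
observed_deaths = 1_000


def standardised_mortality_ratio(observed, population, rates):
    """Observed deaths as a percentage of the deaths expected from reference rates."""
    expected = sum(rates[age] * population[age] for age in population)
    return 100 * observed / expected


smr = standardised_mortality_ratio(observed_deaths, region_population, reference_rates)
print(f"SMR: {smr:.1f}")
```

By the usual convention, a ratio of 100 means the region has the number of deaths expected for its age structure. Values above 100 indicate more deaths than expected, and values below 100 indicate fewer.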

Often the underlying cause of a person’s death is different from the disease that eventually killed them. For instance, someone may have been admitted to hospital and made vulnerable to infection by COPD, but eventually killed by pneumonia. In these instances, our figures comply with World Health Organisation (WHO) international mortality coding rules under which the underlying cause of death is recorded.


These statistics relate to the underlying cause of death, coded according to internationally agreed criteria. This means they more accurately reflect the numbers of people dying from lung disease than the numbers dying with lung disease. The deaths of people with lung disease may be certified as due to other causes.

Comorbid lung disease could still shorten lifespan in such cases, because it impairs the chance of surviving an acute non-respiratory illness, such as a heart attack or stroke.

Impact on hospital services

The research project team used the WHO Europe Hospital Morbidity Database (HMDB) to analyse total hospital admissions and bed days. This uses the International Classification of Diseases (ICD-10) coding. The latest reliable data available were from 2011.

Regional variations in hospital admissions were assessed using age-standardised hospital admission ratios for common lung diseases. These ratios are based on emergency admissions only, which accounted for 94% of UK hospital admissions for lung disease in 2011.


These statistics relate to the main reason for admission to hospital. As with mortality data, the true impact of comorbid lung diseases may be underestimated.

Regional breakdown

THIN data were used to calculate regional breakdowns, standardised by age, gender and deprivation.

Regional variations in emergency hospital admissions are also provided. The sources for these were Hospital Episode Statistics data for England from the Health and Social Care Information Centre (HSCIC), the NHS Wales Informatics Service, the Information Services Division (ISD) Scotland and the Department of Health, Social Services and Public Safety (DHSSPS) Northern Ireland.


Hospital admissions data were supplied by different agencies for England, Scotland, Wales and Northern Ireland (see above). In general, it was possible to harmonise the statistics for regional comparisons of age-standardised admission ratios. For some of the rarer lung diseases, the researchers aggregated small age-, gender- or year-specific counts for confidentiality reasons. In such cases, the missing counts were imputed before age-standardisation. This was done by applying the English age, gender or year distribution to the aggregated counts from the other nations.
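
The imputation step can be sketched as a proportional split. This is a minimal illustration of the idea described above, not the researchers' own code. The age bands, counts and function name are invented for the example.

```python
# Illustrative sketch only: age bands and counts are invented.

# Admissions in England by age band for one rare condition.
england_counts = {"0-14": 20, "15-64": 50, "65+": 130}

# A single aggregated count from another nation, with no age breakdown.
aggregated_count = 40


def impute_from_distribution(reference_counts, aggregated_total):
    """Split an aggregated total in the same proportions as the reference counts."""
    reference_total = sum(reference_counts.values())
    return {
        band: aggregated_total * count / reference_total
        for band, count in reference_counts.items()
    }


imputed_counts = impute_from_distribution(england_counts, aggregated_count)
print(imputed_counts)
```

The imputed counts always add back up to the aggregated total, so the overall number of admissions for the nation is unchanged. Only its assumed distribution across subgroups is borrowed from England.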

Social deprivation

The Townsend deprivation index was used to estimate the impact of social deprivation on individual lung diseases. This index is a census-based method of measuring social deprivation. It uses variables such as unemployment (applying to either parent or guardian in instances of child lung disease), and whether a household owns a car, is overcrowded and is owner-occupied.
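
The Townsend index is usually described as the sum of four standardised census variables for an area, with the unemployment and overcrowding percentages log-transformed first. The sketch below follows that common description. It is an assumption that the project used exactly this form, and the three areas, their percentages and all names in the code are invented.

```python
import math
from statistics import mean, pstdev

# Illustrative sketch only: invented percentages for three hypothetical areas.
areas = {
    "Area A": {"unemployed": 3.0, "no_car": 15.0, "not_owner_occupied": 20.0, "overcrowded": 1.0},
    "Area B": {"unemployed": 6.0, "no_car": 30.0, "not_owner_occupied": 35.0, "overcrowded": 3.0},
    "Area C": {"unemployed": 12.0, "no_car": 50.0, "not_owner_occupied": 60.0, "overcrowded": 8.0},
}

VARIABLES = ["unemployed", "no_car", "not_owner_occupied", "overcrowded"]
# Assumption: these two skewed variables are transformed as ln(x + 1).
LOG_TRANSFORMED = {"unemployed", "overcrowded"}


def townsend_scores(area_data):
    """Sum of z-scores across the four variables; higher means more deprived."""
    scores = {name: 0.0 for name in area_data}
    for variable in VARIABLES:
        values = {}
        for name, data in area_data.items():
            x = data[variable]
            values[name] = math.log(x + 1) if variable in LOG_TRANSFORMED else x
        centre = mean(values.values())
        spread = pstdev(values.values())
        for name in area_data:
            scores[name] += (values[name] - centre) / spread
    return scores


scores = townsend_scores(areas)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Scores are relative to the set of areas being compared. A score near zero indicates roughly average deprivation for that set, positive scores indicate more deprivation and negative scores indicate less.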


The socio-economic analysis was conducted by applying the Townsend index to THIN data. So the same limitations as those detailed for incidence and prevalence data apply.

International comparison

The WHO World Detailed Mortality Database provided comparisons of lung disease mortality for 99 countries. The researchers compared deaths against United Nations population estimates for the years from 2000 to 2010.
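
Comparing deaths against population estimates amounts to calculating a rate per head of population. The sketch below shows one simple way to do this for a single country when some years of death data are missing. The figures are invented, the function name is hypothetical, and pooling only the years with death data is an assumption rather than the researchers' documented method.

```python
# Illustrative sketch only: invented figures for one hypothetical country.
deaths_by_year = {2000: 1_200, 2001: 1_150, 2003: 1_100}  # 2002 not reported
population_by_year = {
    2000: 4_000_000,
    2001: 4_000_000,
    2002: 4_100_000,
    2003: 4_400_000,
}


def deaths_per_100k(deaths, population):
    """Deaths per 100,000 people, pooled over years that have death data."""
    years = [year for year in deaths if year in population]
    total_deaths = sum(deaths[year] for year in years)
    person_years = sum(population[year] for year in years)
    return 100_000 * total_deaths / person_years


rate = deaths_per_100k(deaths_by_year, population_by_year)
print(f"{rate:.1f} deaths per 100,000 per year")
```

This is a crude rate. It takes no account of differences in age structure between countries, which would need age-standardisation of the kind described in the mortality data section.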


Data for many countries were available for every year analysed. But data from some countries were not available for every year.

Data-recording techniques and health services differ between countries. This makes it difficult to know the extent to which differences in these data reflect actual differences in rates, or differences in diagnostic conventions.