Data journey within INSIGHT


The diagram below shows what happens to the routinely collected eye data that forms the basis of the INSIGHT hub.

It begins with your visit to hospital, when you have an eye scan or test. This data is stored safely by the hospital, de-identified and processed by INSIGHT as part of a large dataset, which includes similar data from thousands of patients. Datasets are anonymised by INSIGHT and made available for carefully vetted research projects that will bring clear patient benefit.

Click on the labels on the diagram to find out more:

[Diagram: the data journey, showing INSIGHT and the research body]



The explanations below include some technical terms highlighted in blue. You can click on these terms to see their definitions in the glossary at the bottom of the page.



Collection of data during a hospital appointment


During a visit to your doctor or to a hospital, it is routine for your health data to be collected. This may be done by taking your blood pressure, checking your pulse, asking about your general health, your diet, your exercise or smoking habits etc.


At a hospital eye appointment, information about how much of the eye chart you can read or the extent of your peripheral vision may be measured, and your eyes may be scanned. Information like this is stored, and could be requested from the hospital as part of a dataset by researchers. 




Opting out of sharing your data

All the data that is collected as part of your routine clinical care or treatment is normally made available for research purposes unless you decide to opt out.


The national data opt-out is managed by NHS Digital. If you decide to opt out, it means that none of your data will be used for any research that the NHS supplies data for. Your data will therefore be excluded from the INSIGHT research database.


Every three months, NHS Digital will supply INSIGHT with a list of NHS numbers of patients who have opted out of sharing their data for research purposes.  At that point, any patients who have requested to opt out will have their patient records removed from the sets of data made available for research through INSIGHT. 


If you choose to opt out after your data has been processed by INSIGHT, your record will be removed if a link to it still exists (that is, if it has been pseudonymised). Once your data has been fully anonymised there is no way of removing your data from the database as there is no way of tracing it back to you.
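The removal step described above can be sketched in a few lines of Python. This is an illustration only, not INSIGHT's actual software: the function name `remove_opted_out`, the record layout and the sample values are all assumptions made for this example. It shows why a record can only be removed while a link between the NHS number and the code still exists.

```python
def remove_opted_out(records, code_by_nhs_number, opted_out_nhs_numbers):
    """Drop pseudonymised records whose code maps back to an opted-out NHS number."""
    opted_out_codes = {
        code_by_nhs_number[n]
        for n in opted_out_nhs_numbers
        if n in code_by_nhs_number  # without a link, nothing can be traced or removed
    }
    return [r for r in records if r["code"] not in opted_out_codes]


# Hypothetical example data (not real NHS numbers or codes).
lookup = {"1111111111": "a1", "2222222222": "b2"}
records = [{"code": "a1", "acuity": 0.3}, {"code": "b2", "acuity": 0.1}]
kept = remove_opted_out(records, lookup, {"2222222222", "3333333333"})
```

In this sketch, the record with code "b2" is removed, while the opt-out for an NHS number that has no linked code has no effect on the dataset.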

Find out more about the National Data Opt-Out and set your data opt-out choice on the NHS Digital website. 




Your data and INSIGHT


De-identification of data

The organisation that collects your data is known as a Data Centre. There are currently two Data Centres that are making data available to research through INSIGHT – these are: Moorfields Eye Hospital NHS Foundation Trust and University Hospitals Birmingham NHS Foundation Trust.


Once your data has been collected at a Data Centre and added to the system, it is pseudonymised. Pseudonymisation is the process by which a person’s identifiers (such as their name and address) are replaced by a code (or pseudonym) which cannot in itself identify that person. INSIGHT’s Data Controller keeps a list of the person’s NHS number and their code and so they can identify the person if they need to (for example, if the person decides to opt out of sharing their data). These codes are not shared.


The pseudonym is generated using a specific encrypted ‘salt code’. The combined data is then hashed using the SHA-256 algorithm (part of the SHA-2 family).
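As a rough illustration of how a salted SHA-256 pseudonym can be produced, here is a minimal Python sketch. The function name `make_pseudonym`, the salt value and the sample identifier are assumptions made for this example; this page does not describe how INSIGHT stores its salt or formats its inputs.

```python
import hashlib


def make_pseudonym(identifier: str, salt: bytes) -> str:
    """Return a hex pseudonym: SHA-256 over the secret salt plus the identifier."""
    combined = salt + identifier.encode("utf-8")
    return hashlib.sha256(combined).hexdigest()


# Hypothetical values, for illustration only.
SECRET_SALT = b"example-salt-kept-by-the-data-controller"
pseudonym = make_pseudonym("1111111111", SECRET_SALT)
```

The same identifier and salt always give the same 64-character code, so records for one person stay linked, while anyone who does not hold the salt cannot recompute the code from an NHS number.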

Sets of processed pseudonymised data are listed in an INSIGHT Metadata Catalogue within the INSIGHT Research Database. A copy of the INSIGHT Metadata Catalogue (but not the data itself) is also sent to HDR UK, which makes it publicly available on the Health Data Research Innovation Gateway. The Gateway makes it easier for researchers to find out exactly what data is available and so increases the likelihood of the data being used for research. Researchers can browse the Gateway to identify which sets of data would be most useful in their research. They then make a formal request to access that data.


The metadata required for this catalogue is set centrally by HDR UK and is described according to a MoSCoW framework (Must have vs Should have vs Could have vs Won’t have). INSIGHT will provide all ‘Must have’ data, as much as possible of the ‘Should have’ data and will consider the inclusion of ‘Could have’ on a case-by-case basis according to merit.


Any data that is used for reasons other than your individual care and treatment is normally anonymised, which means that information that identifies you is removed. This is done by replacing your identifiers (such as name and address) with a random code which cannot identify you.


Once data has been collected and organised, a final check is made to find out whether the person who provided the data has made an opt-out request. If the person has decided to opt out, their data is removed from all NHS research databases including INSIGHT. All other data is anonymised before being made available to researchers.





Managing your data


Each Data Centre has a named Data Controller, who is responsible for the overall security and management of the data held by INSIGHT. The management of data involves a number of processes, including: data cleansing; data normalisation; and data quality control.


  • Data cleansing is the process of detecting and correcting (or removing) corrupt, duplicate or inaccurate records. It refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the ‘dirty’ or coarse data. Data will be cleansed and matched in each Trust’s local server.

  • Data normalisation is the systematic process of ensuring the data structure is appropriate for the purpose it is intended for. This involves organising the data so that it appears in a more consistent form to researchers. The point of data normalisation is to reduce inaccuracy and duplication, and to present the data in a way that makes it easier for researchers to understand what datasets may be available to them. 

  • Quality control of the data takes place within the UHB and MEH servers before transfer to the Google Cloud. The data is checked for completeness and consistency, and to ensure that it is appropriate, accurate and cleansed. Data will be refreshed (which involves pulling new accurate information where necessary) and cross-checked frequently.

At this stage, improvements in data quality can be fed back to the clinical systems, and this can improve the data that is used for routine care.  

Data Processing

Only when data management processes are completed will the data be pseudonymised. 

Clean data is pseudonymised using a secret code. Pseudonymised data can be used for research, audit and service evaluation by specified Data Centre staff only. A copy of this data goes to the Data Centre’s own private Google Cloud (MEH and UHB each have one). Here, it undergoes further quality assurance checks and then a copy of this pseudonymised dataset is placed on the UHB/MEH Shared Private Google Cloud, for which UHB and MEH are joint Data Controllers, with Google Cloud acting as the Data Processor. The data remains here until a request to access it is approved, at which point it is anonymised before the approved researcher can access it.


Once it has been pseudonymised, data is processed by the joint Data Controllers. This involves sorting and organising it so that it can be used by researchers and made ready to link with other approved datasets, if appropriate. 



Responsible use of data

What data is stored by INSIGHT?

As well as collecting data such as how much of the chart you can read (your visual acuity), the pressure inside your eye (your intraocular pressure), and how far away from the centre you can see when looking directly ahead (your peripheral vision), many hospital eye appointments now also involve a routine scan of your eyes. There are several types of eye scan but one of the most useful to researchers is an Optical Coherence Tomography or OCT scan.

Apart from eye data, INSIGHT will also gather medical information from your GP such as what other conditions you have, what hospital treatments you have had and what medication you take. INSIGHT may also collect more general data such as information about your diet, your smoking and drinking habits, and your level of exercise.

All this information is stored safely within INSIGHT, and every piece of data will be anonymised before any researcher is allowed to access it.

Data Security

INSIGHT is governed by the same data security policies that apply to all the Research Hubs within the UK Health Data Research Alliance. HDR UK, and therefore INSIGHT, is committed to the protection of privacy and data security as set out by the Organisation for Economic Co-operation and Development (OECD) Recommendation of the Council on Health Data Governance. INSIGHT is also committed to a carefully considered approach to the control of access to the data it holds. This approach is based on the ‘Five Safes’ which individually and collectively address the safety (or risk) around data access. 

The Five Safes:

  1. Safe projects: Does the research aim to bring public benefit?

  2. Safe people: Can the researchers be trusted to use the data in an appropriate way?

  3. Safe data: Is there a risk of identification from the data itself?

  4. Safe settings: Does the research facility control unauthorised use?

  5. Safe outputs: Is there a risk of identification from the published statistics?


1) Safe projects: Is this use of data appropriate? 

‘Safe projects’ refers to the legal, moral and ethical considerations surrounding use of the data. 

One of the essential criteria for all researchers requesting access to data will be to demonstrate the likelihood of patient benefit. Specifically, the research project will be assessed by asking: 

  • Does the research aim to bring patient benefit (‘public good’)?

  • What is the predicted size of that benefit? 

  • What is the likelihood of the project being successful and this benefit being realised? 

  • What is the risk of unintended harms including potential discrimination?

It is important to note that there may also be a ‘loss to public benefit’ through not doing the project. 


2) Safe People: Can the researchers be trusted to use it in an appropriate manner?

‘Safe people’ reviews the knowledge, skills and incentives of the users to store and use the data appropriately. 

INSIGHT will assess whether the person requesting access to its data is appropriate. To do this INSIGHT will consider:

  • Can the applicant be trusted to use the data exclusively for the purpose agreed and on the terms agreed?

  • Does the applicant understand the reasons for the restrictions of use, including restrictions on onward data transfer, linkage or manipulation?

  • Do they have the necessary skills to undertake the work described and deliver trustworthy outputs?

  • Do they have the resources to complete the project?

Evidence for answering the above questions will be supported by the INSIGHT Due Diligence Process (DDP). The INSIGHT DDP recognises the different types of organisations that may apply, for example: NHS; higher education institute (HEI); industry applicant (pharma, tech, other); charity applicant; other research organisation including overseas health institutions.

Applications from individual researchers who are not employed by or affiliated to an organisation that is sponsoring their research (and on which due diligence can be undertaken) will not be considered. 

An INSIGHT DDP form must be completed by anyone requesting access to the data held by INSIGHT. This form is part of the Phase 1 evaluation process and provides INSIGHT with the following information:

Applicant organisation(s):

  • Sector

  • Legal status

  • Start date or trading age

  • Size – such as FTE (full-time equivalent) employment, financial turnover, R&D (research and development) spend

  • Notable media exposure 

  • Previous working partnerships with any of the INSIGHT partners

  • Sponsorship of the study

Applicant individual(s)

  • Relationship to the applicant organisation

  • Relevant experience

  • Notable media exposure

All applicants will be required to undertake relevant training provided by INSIGHT and to engage constructively throughout the life of the project with access conditions which support safe behaviour.

3) Safe data: Is there a risk of identification in the data itself?

Data sensitivity: 
The data within the INSIGHT Research Database relates to an individual’s ophthalmic or systemic health. It includes images of the internal structures of the eye, such as the retina. A number of highly sensitive types of data relevant to health are excluded, for example, data on sexual health and sexual orientation. 

Risk of identification: 
The data within the INSIGHT Research Database is anonymised before being used by researchers. Information about a person is classified as a ‘Direct Identifier’ or an ‘Indirect Identifier’ according to how it might be used to identify that individual. The types of identifiers that INSIGHT holds can be summarised in the following way:

Direct identifiers

  • The INSIGHT Research Database does not contain: 

    • direct and recognisable identifiers such as name, address or image of a face

    • direct but not recognisable alphanumeric identifiers such as NHS number

  • The INSIGHT Research Database does contain:

    • images that are not recognisable but may be unique such as retinal images

Indirect identifiers

  • The INSIGHT Research Database does contain:

    • post code, age, and gender

    • diagnoses including rare diseases

INSIGHT is aware that allowing access to several different types of data can lead to an individual being identified, and INSIGHT will minimise this risk. Specific examples include:


  • Retinal or iris images: some retinal or iris images may be unique. However, identification from these images is not possible unless a copy of the same imaging type, linked to direct identifiers, were made available outside INSIGHT. Identification is not possible from the images alone, nor from combining the images with the types of data commonly in the public domain, whether institutional or personal (such as social media images).

  • Post code: the INSIGHT Research Database holds postcode data to support studies into equity of access and enable greater understanding of the health impacts of social deprivation. To reduce risk, however, access will not be provided to the post code directly, but rather INSIGHT will provide the required linked data on demand and provide it as part of the anonymised dataset, for example providing a less specific geographical unit such as the Lower layer Super Output Area (LSOA - which normally relates to about 1500 people) or the associated data of interest such as the Index of Multiple Deprivation score. This approach reduces risk whilst ensuring that the research value of this data is not compromised.

  • Age: date of birth is not provided, so as to reduce the likelihood of identification; age is provided to the nearest year.

  • Diagnoses including rare diseases: a rare diagnosis may enable identification if combined with enough additional indirect identifiers; this will be evaluated on a case-by-case basis and appropriate restrictions will be placed on accompanying data (such as the specificity of any age or geographical data provided) that might significantly increase the risk of identification.

  • Data in combination: the combination of enough data fields will at some point result in a unique profile for an individual. This creates a theoretical risk of identification, but such identification is still only possible if that same set of data is provided from some other source. Such datasets are not in the public domain, making this risk extremely low.

  • A general principle of INSIGHT is that only the minimum amount of data that is necessary for the proposed research project will be made accessible to the researcher who is applying for that access. This is called data minimisation. The applicant must justify the need for each piece of data. INSIGHT reserves the right to refuse an application or limit the data being made available if it has concerns about possible identification.
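The measures listed above (a broader geographical unit in place of the post code, age in place of date of birth, and only the fields that are needed) can be sketched as follows. This is an illustration under stated assumptions: the lookup tables, field names and sample values are placeholders invented for this example, age is calculated in completed years, and none of it is INSIGHT's actual implementation.

```python
from datetime import date

# Placeholder lookup tables; real LSOA and deprivation data come from official sources.
LSOA_BY_POSTCODE = {"AB1 2CD": "E01000001"}
IMD_SCORE_BY_LSOA = {"E01000001": 12.5}


def age_in_years(date_of_birth: date, on: date) -> int:
    """Completed years between the date of birth and the reference date."""
    had_birthday = (on.month, on.day) >= (date_of_birth.month, date_of_birth.day)
    return on.year - date_of_birth.year - (0 if had_birthday else 1)


def generalise(record: dict, on: date) -> dict:
    """Return only the less specific fields; post code and date of birth are left out."""
    lsoa = LSOA_BY_POSTCODE.get(record["postcode"])
    return {
        "age": age_in_years(record["date_of_birth"], on),
        "lsoa": lsoa,
        "imd_score": IMD_SCORE_BY_LSOA.get(lsoa),
        "diagnosis": record["diagnosis"],
    }


released = generalise(
    {"postcode": "AB1 2CD", "date_of_birth": date(1950, 6, 15), "diagnosis": "AMD"},
    on=date(2021, 6, 14),
)
```

The released record keeps the research value (area-level deprivation, age, diagnosis) but no longer carries the post code or the date of birth.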


4) Safe Settings: Does the access facility limit unauthorised use?

INSIGHT provides a safe setting using technical and physical security, education and culture, and the contracts that are put in place with researchers.

Technical and physical security

The infrastructure supporting the INSIGHT Research Database builds on longstanding expertise in safe data management and on cloud architecture and security from Google Cloud. It uses the DSP Toolkit and BS-ISO-27000 Series of Information Security Standards.

The security of the networks within University Hospitals Birmingham (UHB) and Moorfields Eye Hospital (MEH) will be controlled according to the Trust network security protocols. Any data leaving UHB and MEH will be encrypted in transit and when it arrives. Any data that is transferred to INSIGHT from other Data Centres that wish to contribute their data will be transferred under a secure File Transfer Protocol.

The INSIGHT Research Database itself will sit on a secure UHB space on a Google Cloud. Data stored on Google infrastructure is automatically encrypted at rest and spread out within the system for availability and reliability. Google data premises are heavily protected.

Once a researcher has been granted access to specific datasets, that data is anonymised and held separately in a specific place called the Safe Haven which only people with two-factor authorisation can access. These datasets will be time and date-stamped, and ‘water-marked’ so that the dataset can be linked to a particular release and applicant.

The following activities will be comprehensively audited: 

  • who has accessed the system and when 

  • when data items are created and who by 

  • when data items are edited and who by 

  • when data sets have been browsed or information (with correct permissions) has been accessed and downloaded
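To give a sense of what such an audit trail records, here is a minimal Python sketch. The class `AuditEvent`, the function `record_event` and the user and dataset names are hypothetical and chosen only for this example; they do not describe INSIGHT's actual audit system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    user: str      # who carried out the action
    action: str    # for example "login", "create", "edit", "browse" or "download"
    item: str      # what the action applied to
    at: datetime   # when it happened (UTC)


audit_log: list[AuditEvent] = []


def record_event(user: str, action: str, item: str) -> AuditEvent:
    """Append a time-stamped, read-only event to the audit log."""
    event = AuditEvent(user, action, item, datetime.now(timezone.utc))
    audit_log.append(event)
    return event


# Hypothetical activity.
record_event("researcher_01", "login", "system")
record_event("researcher_01", "download", "dataset_042")
downloads = [e for e in audit_log if e.action == "download"]
```

Each entry answers the questions in the list above: who did what, to which item, and when.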

Education and culture
INSIGHT will support applicants in understanding the rationale behind the safeguards around access to data within INSIGHT and the importance of INSIGHT’s mission.

Contractual safeguards
All applicants who are granted access to data on the INSIGHT Research Database will be contractually obliged to:

  • expressly prevent any attempts at re-identification

  • limit the use of the data to the purposes described within the contract

  • expressly prevent any attempts to extract or transfer the data to a third party or to release the data in any other way. 


5) Safe Outputs: Are the statistical results non-disclosive?

It is important that researchers publish their findings, and with enough detail to maximise the value of the study. But INSIGHT will require researchers to minimise the risk of publishing any data that could lead to a person being identified, for example, by avoiding publishing data that only applies to fewer than 6 people.
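The ‘fewer than 6 people’ rule can be pictured as a simple check applied to a table of counts before publication. The sketch below is illustrative only: the function name `suppress_small_counts`, the way withheld values are displayed and the sample figures are assumptions made for this example.

```python
THRESHOLD = 6  # counts below this are withheld, following the "fewer than 6" rule


def suppress_small_counts(counts: dict) -> dict:
    """Replace any count below the threshold with a marker such as '<6'."""
    return {
        group: (n if n >= THRESHOLD else f"<{THRESHOLD}")
        for group, n in counts.items()
    }


# Hypothetical figures, for illustration only.
table = suppress_small_counts({"AMD": 120, "rare condition": 3, "DMO": 6})
```

Here the group of 3 people is reported only as "<6", while the groups of 6 and 120 are published as they are.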



Anonymisation of data

Before researchers are allowed access to the data that INSIGHT holds about you, that data is anonymised. This means that information that identifies you is removed. This is done by replacing your identifiers (such as name and address) with a random code which cannot identify you.

All anonymised data is stored in the INSIGHT Research Database, which is reviewed and updated every three months so that: 

  1. new data can be added and 

  2. any records from patients who have ‘opted out’ since the last review can be removed.



Requests to access INSIGHT data

The data request process

The data request process is designed to work as follows:

  1. A research group will request access to a dataset in order to advance a project they would like to work on, for instance diabetic macular oedema (DMO). The application they complete will ask about technical details as well as broader questions, such as the research’s potential benefits to patients and society. 

  2. The INSIGHT Hub will assess whether the application is technically compliant (e.g. with relevant data-protection regulations) and contractually feasible. Only after technical and legal criteria are met will the application be passed on to the INSIGHT Data Trust Advisory Board (DataTAB). 

  3. The INSIGHT DataTAB will then discuss the validity and appropriateness of the proposal, using the access criteria which the DataTAB itself will have developed. The INSIGHT DataTAB will then provide a recommendation on whether or not to grant the research group access to the dataset.

The INSIGHT Data Trust Advisory Board (DataTAB)

At the heart of INSIGHT is a commitment to involving the public, patients and other stakeholders in deciding how the data made available by INSIGHT is shared and used. The INSIGHT DataTAB is an important part of this; it is a group of public, patient and other stakeholders who will make recommendations to INSIGHT on allowing access to data for research that aims to improve people’s lives.

The INSIGHT DataTAB will do this by:

  • assessing and providing recommendations on requests received from third parties who want to access the data made available by INSIGHT

  • developing and maintaining the access criteria used to assess third party requests to access the data

  • reviewing and providing feedback on INSIGHT’s other processes and procedures

The DataTAB’s recommendation can be any one of the following:

  1. ‘Recommend access granted’. The data access request meets the standard criteria and members recommend that INSIGHT grants access and use of the data.

  2. ‘Recommend access granted with further conditions’. The data access request meets the standard criteria, although members recommend that further conditions are put in place to allow access and use of the data.

  3. ‘Recommendation deferred pending receipt of additional information or clarification’. Members are unable to assess whether the data access request meets the standard criteria due to a lack of information or clarity in the information that has been provided.

  4. ‘Recommend access denied’. The data access request does not meet the standard criteria and members recommend that access and use of the data is denied.

Membership of the DataTAB

The membership of the DataTAB will seek to reflect the opinions, experiences and perspectives of:

  • the public who may be future users of the NHS

  • NHS patients with sight-impairing conditions

  • charities focused on supporting people with sight-impairing conditions and/or finding solutions to tackle them

  • clinicians and other healthcare professionals

  • NHS organisations experienced in the stewardship of health data and/or involved in setting government policy related to its sharing and use

  • researchers who use health data to discover new clinical insights into disease detection, diagnosis and referral

  • health data research institutions and/or regulators

  • organisations that advocate for the responsible use of health data

  • industry groups conducting research for the public benefit

These groups are not set in stone and may extend to others that the INSIGHT DataTAB will decide to include in the future. DataTAB members undertake their roles in an individual capacity, rather than on behalf of their employers or other organisations they are affiliated with. Members of the INSIGHT DataTAB are appointed for an initial term of 1 year, after which they can request to remain a member for a second year. 

The Open Data Institute is supporting INSIGHT to set up and run the INSIGHT DataTAB, working closely with INSIGHT’s charity partner, Action Against AMD; University Hospitals Birmingham; and the Public and Patient Involvement and Engagement team at the National Institute for Health Research (NIHR) Moorfields Biomedical Research Centre.



Research body


Suitability of researchers checked by INSIGHT

Once data is available through the INSIGHT Data Hub, a list of the researchers who have requested data access and the title of their research will be available on this website. Our archive of research case studies will build up over time.

The Health Data Research Innovation Gateway

Requests for access to data will be evaluated by INSIGHT using the ‘Five Safes’. If a request is approved, then the relevant data is anonymised and transferred to a ‘Trusted Research Environment’ or ‘Safe Haven’. Each request must provide detail of the terms under which they will use the data such as the precise purpose/s the data will be used for, how long access to the data is needed for, and how the data will be protected. 

The ‘Safe Haven’ sits at the Health Data Research Innovation Gateway, which forms the interface between INSIGHT and the approved researcher. Access to the anonymised datasets within the Safe Haven will need two-factor authorisation. Datasets will be time and date-stamped, and ‘water-marked’ so that the dataset can be linked to a particular release and applicant.

Visit the Health Data Research Innovation Gateway website.





Patient benefit assessed by INSIGHT


Along with the rest of HDR UK, INSIGHT aims to build a research database and supporting infrastructure that allows other NHS Trusts or other types of health data collection centres to join INSIGHT. Each additional Data Centre will be expected to meet all the principles of participation within the INSIGHT and wider HDR UK framework.

Find out about areas of research where INSIGHT data could help.



Data analysis, for example using artificial intelligence


For machines (computers) to be able to learn how to detect diseases from eye scans, in the way that doctors can, they need to ‘read’ a huge number of scans. The NHS has the biggest set of health data anywhere in the world and, uniquely, the data is of high enough quality for machines to learn from.

Over 25 million eye scans are routinely taken every year within the NHS. Using these scans, a machine can, for example, learn the difference between a healthy eye and the eye of someone with Age-related Macular Degeneration (AMD), which is the commonest cause of blindness in the UK. At the moment, 200 people a day develop the blinding form of AMD, but if it could be detected and diagnosed sooner by intelligent machines, treatment for it could begin sooner and many people would not lose their sight from this disease. The benefit of this to patients is clear, but preventing this and other eye diseases from progressing would also free up NHS resources that could then be used where they are needed most. Artificial Intelligence therefore also has the potential to benefit society as a whole.




Research outcomes


New treatments and diagnostic techniques

As well as providing information about eye diseases, scans of the back of the eye have been shown to reveal information about a person’s age, gender, history of heart conditions, smoking behaviour, diabetes and much more. They can even reveal the early signs of some types of dementia.

Being able to research across different diseases should lead to a deeper understanding of disease mechanisms across many areas of medicine.

Find out about areas of research where INSIGHT data could help.


Glossary of terms for INSIGHT


Anonymised data

  • This is data where the pseudonymised code (see below) has been replaced by a random number, so that it is virtually impossible to link that data back to an individual patient. In some cases (for example, with extremely rare conditions) it could be possible to re-identify an individual from their anonymised data. To avoid this, INSIGHT will ensure that data susceptible to being re-identified is removed from any datasets before they are made available for research.

Pseudonymised data

  • This is data which has had identifying details (such as name and address) replaced by a unique code. Pseudonymised data is not shared, but is instead held in a secure environment, accessible only by accredited INSIGHT staff. If necessary, an individual's data can be identified using their unique code – for example, if a patient decides to opt out of sharing their data for research purposes.


Health Data Research UK (HDR UK)

  • Set up by the government, HDR UK is an independent, non-profit organisation. Its work spans academia, healthcare, industry, charities, plus patients and the public. Its staff include some of the world’s leading experts in health data research and innovation, working together to develop and apply cutting-edge approaches to clinical, biological, genomic and other multi-dimensional health data to address the most pressing health research challenges.

  • Find out more on the HDR UK website

Health Data Research Alliance

HDR UK Innovation Gateway

  • This is a web portal that allows users to discover, and enquire about access to, UK health datasets for research and innovation. It provides detailed information about the datasets, which are held by members of the UK Health Data Research Alliance, such as a description, size of the population and the legal basis for access.

  • Find out more on the HDR UK Innovation Gateway website

Data Controller

  • A Data Controller determines the purposes for which, and the manner in which, personal data is processed. In the case of INSIGHT, the Data Controllers are the two NHS partners, Moorfields Eye Hospital NHS Foundation Trust (MEH) and University Hospitals Birmingham NHS Foundation Trust (UHB).

  • Find out more about the use of data by MEH and UHB

Data Processor

  • A Data Processor is responsible for processing personal data on behalf of a Data Controller. Data Processors have specific legal obligations and are required to maintain records of personal data and processing activities. In the case of INSIGHT, Google Cloud acts as the Data Processor. 


Google Cloud

  • Cloud computing makes computer system resources available to individuals and organisations on demand, especially data storage (cloud storage) and computing power. Like other 'clouds', Google Cloud is a set of physical assets, such as computers and hard disk drives, and virtual resources that are contained in Google's data centres around the globe. This distribution of resources provides several benefits, including redundancy in case of failure.

  • Find out more on the Google Cloud website

Google Shared Virtual Private Cloud (VPC)

  • The Shared VPC provided by Google allows INSIGHT to connect resources from multiple projects to a common Virtual Private Cloud (VPC) network, so that different parts of INSIGHT can communicate securely and efficiently. It allows INSIGHT to implement a security best practice of ‘least privilege’ for network administration, auditing, and access control – this means each part of INSIGHT can only access the information it actually needs.

  • Find out more on the Google Cloud website

Trusted Research Environment (TRE), also known as a ‘Safe Haven’

  • TREs are designed to protect the privacy of individuals whose health data they hold, while facilitating large-scale data analysis using High Performance Computing to increase understanding of disease and improve health and care.

  • Find out more on the HDR UK website

The ‘Five Safes’

  • This is a framework for helping make decisions about making effective use of data which is confidential or sensitive. The Five Safes are: Safe Projects; Safe People; Safe Settings; Safe Data; and Safe Outputs. They were devised by the UK Office for National Statistics to ensure that personal data is used safely and that the analysis of data for research does not result in the identification of individuals.

  • Find out more on the UK Data Service website