These case studies have been sourced from senior researchers across the UK to illustrate the items in the best practice checklist.

The best practice checklist was created by people with experience of mental illness and expertise in data science. It is divided into items which can be done today, and items to aim for in the future as mental health data science develops.
Its overall aim is to provide researchers with a guide to conducting mental health data science which is both high quality and inclusive of the views of those providing the mental health data.
The case studies are provided in order to assist the wider research community in applying the recommendations from the checklist to the real world. Each case study illustrates one or more of the items from the checklist, focusing on both what individual researchers can do today and what we should be aiming to introduce collectively in the future.
List of Case Studies
- Creating a research database of de-identified NHS data
- Data linkage - linking health data with other sources of data
- Exploring questions of consent
- Dealing with requests for data withdrawal in a population-wide data linkage study
- Dealing with pupil consent in a school setting
- Providing schools with research findings - supporting staff in understanding the data and protecting the identity of individual pupils
- Novel methods for de-identification of data
- The Clinical Record Interactive Search (CRIS) mental healthcare data resource
- Linking the Clinical Record Interactive Search (CRIS) mental healthcare resource with external data sources
Case study 1: Creating a research database of de-identified NHS data
Rudolf Cardinal (University of Cambridge/CPFT), Linda Jones (University of Cambridge), Jonathan Lewis (CPFT)

- Illustrates (today) checklist item: data should be accessible to a range of people who conduct research, including authorised academics and clinicians.
- Also illustrates checklist items:
  - (today) ensuring that data are accessed in safe settings using clear and efficient procedures.
  - (today) incorporating statistical disclosure control by following rules designed to prevent identification of individuals.
  - (future) providing digital controls to allow remote access from private settings, using procedures that are robust and easy to follow.

Cambridgeshire and Peterborough NHS Foundation Trust (CPFT) holds de-identified data in the CPFT Research Database. The data are de-identified by software that is developed and updated in-house, and made freely available to anyone. There is a pre-defined process for any cases of inadvertent identification.
Patients can opt out of this database if they choose to, something which is publicised by CPFT, and the consent process is integrated with the electronic records system used by NHS clinicians. Patients also have the opportunity to indicate whether they would like to be contacted about research studies that gather additional, non-routine data. Although the routine NHS data are de-identified, it is possible for clinicians to re-identify patients under their care if there is a medical reason to do so.
The database was created after patient/public consultation, with the assistance and collaboration of South London and Maudsley NHS Foundation Trust. The database and its consent processes are governed by an Oversight Committee that includes people with lived experience (who are reimbursed by CPFT’s patient and public involvement team), carers, information governance staff, academics, and clinicians – and by independent arbiters: an NHS Research Ethics Committee and the Health Research Authority.
CPFT’s Research Database Manager oversees access to the Research Database and its auditing systems. Data are made accessible to approved researchers and clinicians, but only for research approved by CPFT. Users must be vetted and trained, and projects usually involve a CPFT clinician who understands the services that record the data. Users can analyse the data via a secure virtual private network from anywhere, but the data stay within CPFT. Researchers can’t remove or publish data about individual patients. Typically they can only publish findings about groups of at least 10 people, because even some de-identified data about individuals can be re-identified (for example, by comparison with newspaper reports).
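The group-size rule described above can be illustrated with a minimal Python sketch. It assumes a threshold of 10, taken from the "at least 10 people" figure in the text, and uses hypothetical names (`MIN_GROUP_SIZE`, `suppress_small_counts`) and made-up data. It is not CPFT's actual tooling.

```python
from collections import Counter

MIN_GROUP_SIZE = 10  # assumption: mirrors the "at least 10 people" rule in the text


def suppress_small_counts(values, min_size=MIN_GROUP_SIZE):
    """Count each category and replace counts below min_size with None."""
    counts = Counter(values)
    return {
        category: (n if n >= min_size else None)
        for category, n in counts.items()
    }


# Made-up example data: category labels for 25 fictional records.
records = ["group A"] * 14 + ["group B"] * 10 + ["group C"] * 1
table = suppress_small_counts(records)
print(table)
```

Here the single record in "group C" is hidden while the two larger groups are reported. In practice, a published total can sometimes allow a hidden cell to be worked out by subtraction, so a real procedure may need further checks beyond this sketch.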
Further information:
Case study 2: Data linkage - linking health data with other sources of data
Rudolf Cardinal (University of Cambridge/CPFT), Linda Jones (University of Cambridge), Kristel Klaus (University of Cambridge), Jonathan Lewis (CPFT)

- Illustrates (today) checklist item: facilitating research linking mental health data with other sources of public data, such as education or welfare data, in order to provide new information of public benefit about mental health.
- Also illustrates checklist items:
  - (today/future) incorporating the views of people with lived experience throughout the course of each project.
  - (future) incorporating inspection processes to ensure ongoing compliance with good data practice, and responding proportionately to inappropriate data use with measures such as training or temporary or long-term suspension of access.
  - (future) incorporating oversight of data repositories in order to monitor data quality and respond to public enquiries.

Mental health research can benefit from the linking of NHS mental health data to other data sources, such as cause-of-death data. In order to facilitate such data linkage, researchers at the University of Cambridge have helped set up a Clinical Data Linkage Service (CDLS) within Cambridgeshire and Peterborough NHS Foundation Trust (CPFT).
The first step in setting up the CDLS was to obtain grant funding for a patient/public involvement (PPI) lead and for PPI work. The researchers then recruited a Research Advisory Group including patients, carers, and the public. The group members were asked for their views on data linkage, particularly in relation to linkage without explicit consent (e.g. opt-out models). Specific linkage projects involving CPFT data were also discussed. The linkage methods are often complex, and not all the organisations involved provided the same methods.
The group encouraged the researchers to pursue these data linkage projects, whilst applying both local and national opt-outs and communicating information widely. The researchers developed new materials explaining both the linkage service in general and each linkage – clarifying how and why processes are carried out. Linked data would be de-identified, with projects being governed by an Oversight Committee that includes people with lived experience. Furthermore, projects would be carried out by trained and approved researchers, and would be subject to small-group publication limits and any restrictions required by external organisations that provided data.
The protocol and materials for the CDLS were reviewed iteratively by the CPFT Research Database Oversight Committee (see Case Study 1 for further details about the CPFT Research Database itself), and were then submitted for approval. Approvals for data linkage are required from two national arbiters: the NHS Research Ethics Service (Health Research Authority) in all cases, and the Confidentiality Advisory Group for work involving identifiable information. In all cases involving identifiable information, even transiently for linkage, a “public interest” argument must be made that the research is likely to bring sufficient benefit to relevant patient groups.
Further information:
Case study 3: Exploring questions of consent
Rudolf Cardinal (University of Cambridge/CPFT), Linda Jones (University of Cambridge), Jenny Nelder (University of Cambridge)

- Illustrates (future) checklist item: exploring alternative models of consent, which may involve moving away from individualised models of consent.
- Also illustrates checklist items:
  - (future) building plans for de-identified data collected by researchers, scientists and clinical services to be made available for analysis on an open-access basis.
  - (future) active commitment to reducing the stigma associated with mental illness and its research, and to increasing public understanding of science.

Researchers at the University of Cambridge set up an online study to ask UK residents to consider how their NHS data might be used without their explicit consent. The design of the study was assisted by a panel of patients and carers: the panel contributed questions and supported the researchers in refining the overall survey before it was sent to participants.
Survey participants were asked about the use of identifiable data for clinical care and de-identified data for research. They were asked various questions, covering topics such as what they thought happened to their data now, what should happen to their data in the future, how they would like to control the use of their NHS data, and what types of non-health data they would be happy for their health data to be linked with. To explore issues concerning stigma and preference, participants were asked about physical versus mental health data, and different types of data (numbers and codes versus narrative text). In doing this, the survey incorporated a “framing” statement to see how people’s perspective on physical versus mental health data might be affected by the context in which the question was presented.
Approval was obtained via an NHS Research Ethics Committee. Support was provided by the NIHR Clinical Research Network, who were able to recruit people across the country, including from hospitals, GP surgeries, dental services, and ambulance Trusts. Ethical approval to advertise widely was acquired, with the ability to publicise the survey online becoming particularly helpful when the COVID-19 pandemic began and face-to-face recruitment stopped. The study and its analysis plan were pre-registered (https://doi.org/10.1186/ISRCTN37444142), and the anonymous data will be open access.
Further information:
Case study 4: Dealing with requests for data withdrawal in a population-wide data linkage study
Matthew Iveson (University of Edinburgh)

- Illustrates (today) checklist item: ensuring that researchers have a process in place for responding to withdrawal requests and that they provide transparency on whether, how and when participants can withdraw their data.
- Also illustrates checklist item: (future) appointing a qualified, independent arbiter to arbitrate on complex questions relating to consent and data withdrawal.

Under the General Data Protection Regulation (GDPR), people conducting anonymised health data research do not need to provide an automatic right to withdraw. Despite this, it is widely acknowledged that dealing with objections and withdrawal requests as fully as possible is an important part of building public trust in research.
As part of the ethical and data access approvals process for a recent population-wide data linkage study, it was necessary for the research team to clarify that withdrawal from the study could not be handled directly by the researchers. In fact, as the project deals with only de-identified data, the researchers, by design, have no way of discerning particular participants to remove. Instead, individuals are asked to raise their requests for withdrawal with the relevant data controller or linkage agent, such as through NHS Inform.
However, a request for withdrawal such as this will not necessarily result in an individual’s data being removed from a project’s dataset: removing a single record could result in the identification of the individual (e.g., through changes in frequency tables). For this reason, it is easier to prevent an individual’s data being used in future studies than it is to withdraw data from an existing study.
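The frequency-table risk described above can be sketched in Python. The example below uses made-up records and hypothetical field and function names (`frequency_table`, `difference`); it only illustrates the general idea that comparing tables produced before and after a removal can reveal the removed person's attributes, and it does not represent the study's actual data or procedures.

```python
from collections import Counter


def frequency_table(records, field):
    """Return counts of each value of `field` across the records."""
    return Counter(record[field] for record in records)


def difference(before, after):
    """Return the cells whose counts changed between two tables."""
    return {cell: before[cell] - after[cell]
            for cell in before if before[cell] != after[cell]}


# Made-up de-identified records; the field names are hypothetical.
original = [
    {"age_band": "40-49", "diagnosis": "A"},
    {"age_band": "40-49", "diagnosis": "B"},
    {"age_band": "50-59", "diagnosis": "A"},
    {"age_band": "50-59", "diagnosis": "A"},
]
# One person withdraws and their record (the second one) is removed.
after_withdrawal = original[:1] + original[2:]

changed_age = difference(frequency_table(original, "age_band"),
                         frequency_table(after_withdrawal, "age_band"))
changed_diag = difference(frequency_table(original, "diagnosis"),
                          frequency_table(after_withdrawal, "diagnosis"))
print(changed_age, changed_diag)
```

Anyone holding both versions of the tables can see that the person who withdrew was aged 40-49 and had diagnosis "B", even though no record was ever shown directly.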
Further information:
Case study 5: Dealing with pupil consent in a school setting
Judith Mabelis, Dawn Haughton, Judith Brown, Dorothy Currie*, Laurence Moore, Daniel Smith, Jo Inchley (University of Glasgow & University of St Andrews*)

- Illustrates (today) checklist item: ensuring that participants have as much control over consent as possible.

The Schools Health and Wellbeing Improvement Research Network (SHINE) aims to support schools in addressing their health and wellbeing needs. As part of their work with schools, they collect mental health and wellbeing data from pupils, via an online survey, administered by schools. Schools receive a report with results from their pupils, which can be used to inform health and wellbeing action planning. To carry out the survey, consent is sought from three parties: headteacher, parents/guardians, and pupils. This case study focuses on how the SHINE network seeks consent from pupils aged 9-18 years old.
Pupils are informed about the research in different ways and are provided with several ways of opting out. Two weeks before survey completion, schools are required to hand out Participant Information Sheets to pupils. These are written in simple language to ensure they can be easily understood by pupils of all ages and abilities. The information is also reiterated verbally by the classroom teacher on the day of survey completion, and written on the first page of the online survey. Consent covers key criteria: (i) the purpose of the study; (ii) what it involves; (iii) who will use the data and how; (iv) how the data are stored and for how long; (v) assurances of anonymity; and (vi) the voluntary nature of the survey.
To ensure that participation is voluntary, pupils can opt out in three different ways: (1) telling their teacher; (2) asking their parents to withdraw them from the survey; or (3) answering ‘prefer not to say’, an option available at every question. This third way is important because pupils may find it difficult to tell an adult that they do not want to take part. It allows pupils to appear to take part but choose not to provide data if they do not wish to. At the end of the survey, pupils are provided with information on sources for further advice and support.
Further information:
- SHINE website: https://shine.sphsu.gla.ac.uk/
Case study 6: Providing schools with research findings - supporting staff in understanding the data and protecting the identity of individual pupils
Judith Mabelis, Dawn Haughton, Judith Brown, Dorothy Currie*, Laurence Moore, Daniel Smith, Jo Inchley (University of Glasgow & University of St Andrews*)

- Illustrates (today) checklist item: ensuring that data users understand the underlying data collection tools as well as the socio-cultural context in which studies are designed and findings are disseminated.
- Also illustrates checklist item: (today) incorporating statistical disclosure control by following rules designed to prevent identification of individuals.

The Schools Health and Wellbeing Improvement Research Network (SHINE) aims to support schools in addressing their health and wellbeing needs. As part of their work with schools, they collect mental health and wellbeing data from pupils, via an online survey. In return, schools receive a school level report with results from their pupils, which can be used to inform health and wellbeing action planning and activities.
The network supports schools by producing reports that present data in an accessible way and providing advice on interpreting the data. The reports provide context for the results via the inclusion of background information on mental health and wellbeing, explaining the topic and highlighting factors that influence mental health and are of particular relevance to adolescents. They also provide information on questions asked in the survey, including what they measure and guidelines for interpreting results. This can include values for cut-off and classification, comparisons to national averages, and information about the limitations of the measures. The data are presented in simple graphs and tables.
In addition to the data, relevant questions are provided to promote discussion amongst the whole school community, helping to aid the interpretation of the data within their own school context. The reports also contain a directory of organisations from whom schools can seek further information and help. In this way, the network aims to provide schools with the tools to understand and interpret the data in a way that is meaningful to their individual context.
Since the school level report may involve small sample sizes, the researchers take steps to ensure that individual pupils cannot be identified. They apply statistical disclosure rules, for example, ensuring that data from a specific question are only presented if enough pupils have responded, and taking care not to present data in categories with a small number of responses. To achieve this, responses from year groups or gender groups may be combined, or answers may be collapsed across categories.
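The idea of combining small groups before reporting can be sketched as follows. This is a minimal illustration only: the threshold of 5, the function name `collapse_small_groups`, and the year-group counts are all assumptions made for the example, not SHINE's actual rules or data.

```python
MIN_RESPONSES = 5  # assumption: illustrative threshold, not SHINE's actual rule


def collapse_small_groups(counts, min_responses=MIN_RESPONSES, label="combined"):
    """Merge groups with fewer than min_responses into one combined group.

    If the combined group is itself still too small, it is suppressed (None).
    """
    kept = {g: n for g, n in counts.items() if n >= min_responses}
    small_total = sum(n for n in counts.values() if n < min_responses)
    if small_total:
        kept[label] = small_total if small_total >= min_responses else None
    return kept


# Made-up response counts per year group for one survey question.
by_year_group = {"S1": 12, "S2": 3, "S3": 4, "S4": 9}
report_rows = collapse_small_groups(by_year_group)
print(report_rows)
```

In this example the two small year groups (3 and 4 responses) are merged into one row of 7, which meets the assumed threshold. Had the merged row still been too small, the sketch would withhold it rather than report it.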
Further information:
- SHINE website: https://shine.sphsu.gla.ac.uk/
Case study 7: Novel methods for de-identification of data
Emma M. Davidson, Heather Whalley, Will Whiteley (University of Edinburgh)

- Illustrates (future) checklist item: developing methods for de-identification, including innovative ways to mask identifiable information.

Researchers at the University of Edinburgh are looking at the association between findings on people’s brain scans and their subsequent health status, with the aim of informing future disease prediction and prevention. To examine these associations effectively they need to use data (written brain scan reports from a doctor) from a very large group of people. The reports can be anonymised to non-identifiable ‘study ID’ numbers with relative ease, but ensuring the de-identification of the text in the reports is more difficult at this scale, and could not be achieved manually. To solve this problem, the researchers are working to develop and test the use of Natural Language Processing (NLP) techniques for automated de-identification of the written reports. NLP is a collection of computational methods, using the rules of linguistics, which can be used to recognise and remove or replace potentially identifiable information from text. The ability to extract de-identified text data from electronic health records is a very promising step forward for healthcare research, and this NLP tool provides a potential model to greatly assist future work in the area.
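To give a flavour of what "recognise and replace" means in practice, the sketch below masks a few kinds of identifier in free text using simple regular expressions. This is a deliberately simplified stand-in, not the Edinburgh team's NLP tool: the patterns, the placeholder tags, the function name `mask_identifiers`, and the sample report are all made up for illustration, and real clinical text needs far broader coverage than pattern matching alone can give.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),               # e.g. 03/02/2019
    (re.compile(r"\b(?:Dr|Mr|Mrs|Ms|Prof)\.? [A-Z][a-z]+\b"), "[NAME]"),  # title + surname
    (re.compile(r"\b\d{3} \d{3} \d{4}\b"), "[ID]"),                     # 3-3-4 digit number
]


def mask_identifiers(text):
    """Replace each pattern match with its placeholder tag."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


# Made-up report text.
report = "Scan on 03/02/2019 reviewed by Dr Smith. Patient ID 123 456 7890. No acute infarct."
masked = mask_identifiers(report)
print(masked)
```

The clinical finding ("No acute infarct") is left intact while the date, name and number are replaced. A name without a title, or a date written in words, would slip through these patterns, which is one reason more sophisticated language-aware methods are being developed.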
Further information:
- Clinical Natural Language Processing Research Group: https://www.ed.ac.uk/usher/clinical-natural-language-processing
Case study 8: The Clinical Record Interactive Search (CRIS) mental healthcare data resource
Robert Stewart (King’s College London)

- Illustrates (today/future) checklist item: incorporating the views of people with lived experience throughout the course of each project.
- Also illustrates checklist items:
  - (future) providing appropriate training and supervision for data users.
  - (future) developing effective measures, including secure linking systems, to protect against inappropriate identification and misuse.

Since its inception in 2008, the Clinical Record Interactive Search (CRIS) mental healthcare data resource at the South London and Maudsley NHS Foundation Trust (SLaM) and King’s College London (KCL) has successfully made mental healthcare data available to researchers within a robust governance framework. The CRIS security model was designed with service user involvement and leadership, and all projects continue to be reviewed and overseen with close patient and public involvement. The technical security model includes removal or truncation of identifier fields and masking of identifiers in text fields. In addition, all data extraction and searching is auditable against original project approvals; researchers accessing CRIS require an appropriate SLaM affiliation (honorary or substantive); and data are analysed and stored at all times within SLaM’s firewall. Provisions have nevertheless been made for remote access to data via VPN, and the secure use of identifiers (subject to necessary external statutory approvals) for data linkages. (See Case Study 9 for more information on data linkage between CRIS and external data sources.) More widely, the assembly of a dedicated multidisciplinary team to support the CRIS platform, and the necessary NHS Trust-owned office facilities for data access and analysis, have allowed CRIS to be widely used by clinicians and other stakeholders, supporting a range of audit, quality improvement and service development work, in addition to more than 200 peer-reviewed scientific publications. Since the creation of the original (SLaM) CRIS described in this case study, additional CRIS-like resources have been developed across the UK.
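The phrase "removal or truncation of identifier fields" can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the field names, the choice to keep only the year of birth and the first part of the postcode, and the function name `truncate_identifiers`. It does not describe how CRIS itself processes records.

```python
def truncate_identifiers(record):
    """Return a copy with direct identifiers removed and others coarsened.

    Assumes dates are ISO strings (YYYY-MM-DD) and postcodes contain a space.
    """
    cleaned = dict(record)
    cleaned.pop("name", None)       # removal of a direct identifier
    cleaned.pop("address", None)    # removal of a direct identifier
    if "date_of_birth" in cleaned:  # truncation: keep the year only
        cleaned["year_of_birth"] = cleaned.pop("date_of_birth")[:4]
    if "postcode" in cleaned:       # truncation: keep the first part only
        cleaned["postcode_area"] = cleaned.pop("postcode").split()[0]
    return cleaned


# Made-up record with hypothetical field names.
record = {
    "name": "Jane Example",
    "address": "1 Example Street",
    "date_of_birth": "1980-05-17",
    "postcode": "AB1 2CD",
    "diagnosis_code": "F32",
}
deidentified = truncate_identifiers(record)
print(deidentified)
```

The original record is left unchanged and a coarsened copy is returned, which keeps enough detail for group-level analysis (approximate age and area) while dropping the fields that point most directly to one person.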
Further information:
- Information about the CRIS resource: http://www.maudsleybrc.nihr.ac.uk/facilities/clinical-record-interactive-search-cris/
Case study 9: Linking the Clinical Record Interactive Search (CRIS) mental healthcare resource with external data sources
Robert Stewart (King’s College London)

- Illustrates (today) checklist item: facilitating research linking mental health data with other sources of public data, such as education or welfare data, in order to provide new information of public benefit about mental health.
- Also illustrates checklist items:
  - (today) monitoring data quality and taking account of the origin and quality of data when drawing conclusions.
  - (today/future) incorporating the views of people with lived experience throughout the course of each project.

Over its many years of operation, the Clinical Record Interactive Search (CRIS) mental healthcare data resource at the South London and Maudsley NHS Foundation Trust (SLaM) and King’s College London (KCL) has been successfully linked with a number of external data resources. (See Case Study 8 for further details on the CRIS resource itself.) These external resources include important complementary healthcare data from acute care (Hospital Episode Statistics) and primary care (Lambeth DataNet), as well as specialist services (National Cancer Registry, local maternity and neonatal care via the MRC-funded eLIXIR platform). In addition, a number of innovative linkages have been set up with education data (National Pupil Database; Department for Education), individual-level socio-economic data and living circumstances (National Census; Office for National Statistics), and employment/benefits (Department for Work and Pensions). These resources are currently supporting a range of research projects and have involved linkage challenges that are less of an issue for health data (e.g. dealing with multiple matching variables). A specific advisory group has been set up and runs successfully to provide patient and public involvement in the use of linked data for research, adding valuable insights to studies at both design and output stages.
Further information:
 
                              