Open Source Data Projects

A tenet of good software development is that it should be collaborative. There is no better way to facilitate this than by making software open source so that the community can contribute.

The Gastroenterology Data Sciences Institute believes that all software projects developed under its guidance should be open source. Most of these, often with explanatory vignettes, can be found on the Institute's GitHub page, and we encourage you to contribute to any of the projects there.

Of particular interest to the open source community is the development of synthetic datasets for endoscopy and pathology. These give the community the opportunity to see what a typical endoscopy or pathology report might look like, so that analyses can be developed against realistic examples.

Synthetic datasets

There are two versions of synthetic endoscopy data. The first is free text and resembles the complete endoscopic reports often produced by older software versions, before the development of the National Endoscopy Database. This can be found [here](https://github.com/sebastiz/FakeEndoReports).

This dataset also includes generated pathology reports that are relevant to each generated endoscopy report. Pathology reports on tissue taken at endoscopy allow a more complete examination of the performance of the endoscopic procedure.
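
As an illustration of how paired reports might be used, the following is a minimal Python sketch that links each endoscopy report to its pathology reports. The records, the field names (`hospital_number`, `procedure_date`, `specimen_date`) and the `link_reports` helper are all hypothetical and are not taken from the FakeEndoReports project; matching on patient identifier and date is an assumption made for the example.

```python
from datetime import date

# Hypothetical synthetic records; field names and values are illustrative only.
endoscopy_reports = [
    {"hospital_number": "A001", "procedure_date": date(2020, 1, 5),
     "findings": "Barrett's oesophagus, biopsies taken"},
    {"hospital_number": "B002", "procedure_date": date(2020, 2, 10),
     "findings": "Normal examination, no biopsies taken"},
]
pathology_reports = [
    {"hospital_number": "A001", "specimen_date": date(2020, 1, 5),
     "diagnosis": "Intestinal metaplasia, no dysplasia"},
]


def link_reports(endoscopies, pathologies):
    """Attach to each endoscopy the pathology reports for the same patient and date."""
    # Index pathology reports by (patient, date) so each lookup is a dictionary access.
    by_key = {}
    for path in pathologies:
        key = (path["hospital_number"], path["specimen_date"])
        by_key.setdefault(key, []).append(path)

    linked = []
    for endo in endoscopies:
        key = (endo["hospital_number"], endo["procedure_date"])
        # Endoscopies without biopsies simply receive an empty list.
        linked.append({**endo, "pathology": by_key.get(key, [])})
    return linked


linked = link_reports(endoscopy_reports, pathology_reports)
for record in linked:
    print(record["hospital_number"], len(record["pathology"]))
```

Once linked in this way, each record carries both the endoscopic findings and any histology, which is the form needed to assess the performance of the procedure.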

The second synthetic dataset contains the endoscopy report only and is based on the National Endoscopy Database (NED) data collection framework. The fields are formatted according to the data that is automatically collected for NED. This work is ongoing and the scripts for the generation of data can be found here:

Hospital Episode Statistics

Hospital Episode Statistics is an anonymised, episode-level dataset provided via NHS Digital that supports the analysis of patient episodes. It has the potential to be linked to other datasets where data linkage is appropriate. To allow experimentation with this dataset and exploration of its full potential, a further synthetic dataset is being developed here: