Looking at the takedown notices, we often see specific files being targeted rather than entire repositories. This is possibly to substantiate the claim of copyright infringement that a takedown notice requires (I am not a copyright expert). It is clear, however, that they use DMCA notices only as a last resort, for GitHub users they cannot identify and who were likely never given access in the first place. A quarter of the targeted files contain genetic or genomic data. Tabular data account for another large share and could contain phenotype or health records.
The exposure of Biobank data on GitHub is only one in a long series of governance challenges for UK Biobank. The most recent came today, with information on all half a million members listed for sale on Alibaba.