Datasets

The project has produced nine datasets, all available on Hugging Face:

  • JuDDGES/pl-court-raw: Raw data from Polish court decisions | DOI: 10.57967/hf/4700
  • JuDDGES/pl-court-instruct: Annotated and structured data from Polish court decisions | DOI: 10.57967/hf/4699
  • JuDDGES/pl-court-graph: Graph representations of Polish court judgments | DOI: 10.57967/hf/4701
  • JuDDGES/en-court-raw: Raw data from Court of Appeal judgments of England and Wales | DOI: 10.57967/hf/4700
  • JuDDGES/en-appealcourt: English Court of Appeal judgments
  • JuDDGES/en-court-instruct: Judgment data from the Court of Appeal (Criminal Division) of England and Wales | DOI: 10.57967/hf/4697
  • JuDDGES/en-appealcourt-coded-instruct_v02: Information-extraction examples derived from public Court of Appeal (Criminal Division) judgments of England and Wales
  • JuDDGES/pl-swiss-franc-loans: Large language models (LLMs) for information extraction from Polish court judgments in Swiss franc loan cases
  • JuDDGES/pl-nsa: Judgments of the Supreme Administrative Court of Poland

Project-related Publications

Dhami, M., Kajdanowicz, T., Windridge, D., & Boukacem-Zeghmouri, C. (2025). Judges on Trial: The Future of Research on Judicial Decision-Making. In BIG IDEAS in Forensic Psychology. Wiley. https://doi.org/10.5281/zenodo.15658177

Dhami, M. (2025, June 11). JOWS#4 – Mandeep Dhami – The story behind JuDDGES. JuDDGES Open Wednesday Seminar (JOWS), online. Zenodo. https://doi.org/10.5281/zenodo.15649588

Fillaud, C., & Boukacem-Zeghmouri, C. (2025). JuDDGES (Judicial Decision Data Gathering, Encoding and Sharing) DMP CHIST-ERA. Zenodo. https://doi.org/10.5281/zenodo.15441508

Recent Publications by Project Members

Withers, C. A., Rufai, A. M., Venkatesan, A., Tirunagari, S., Lobentanzer, S., Harrison, M., & Zdrazil, B. (2025). Natural language processing in drug discovery: bridging the gap between text and therapeutics with artificial intelligence. Expert Opinion on Drug Discovery, 20(6), 765–783. https://doi.org/10.1080/17460441.2025.2490835

Mirza, A., Alampara, N., Ríos-García, M., Abdelalim, M., Butler, J., Connolly, B., Dogan, T., Nezhurina, M., Şen, B., Tirunagari, S., Worrall, M., Young, A., Schwaller, P., Pieler, M., & Jablonka, K. M. (2025). ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models (No. arXiv:2505.12534). arXiv. https://doi.org/10.48550/arXiv.2505.12534

Dhami, M. K., Witt, J. K., & De Werd, P. (2025). Visualizing versus verbalizing uncertainty in intelligence analysis. Intelligence and National Security, 40(2), 302–327. https://doi.org/10.1080/02684527.2025.2468049

Önkal, D., & Dhami, M. K. (2025). Measuring the quality of scenarios generated using the simple scenarios technique. In Improving and Enhancing Scenario Planning. https://doi.org/10.4337/9781035310586.00032

Pina-Sánchez, J., Dhami, M. K., & Gosling, J. P. (2024). Which are the main characteristics determining sentence severity? An empirical exploration of shoplifting offences using spike-and-slab models. In Research Handbook on Judicial Politics. https://www.crimrxiv.com/pub/2c1pvzvd

Dhami, M. K., & Zhu, Y. (2024). Possibilities for decision science in the metaverse. Decision (American Psychological Association). https://doi.org/10.1037/dec0000235

Sawczyn, A., Binkowski, J., Janiak, D., Gabrys, B., & Kajdanowicz, T. (2025). FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs (No. arXiv:2503.17229). arXiv. https://doi.org/10.48550/arXiv.2503.17229

Binkowski, J., Janiak, D., Sawczyn, A., Gabrys, B., & Kajdanowicz, T. (2025). Hallucination Detection in LLMs Using Spectral Features of Attention Maps (No. arXiv:2502.17598). arXiv. https://doi.org/10.48550/arXiv.2502.17598

Siddiqui, A. A., Tirunagari, S., Zia, T., & Windridge, D. (2025). A latent diffusion approach to visual attribution in medical imaging. Scientific Reports, 15(1), 962. https://doi.org/10.1038/s41598-024-81646-x

Tubaishat, A., Zia, T., Windridge, D., et al. (2024). Contrastive concept-phrase pre-training for generating clinically accurate and interpretable chest X-ray reports. Neural Computing and Applications. https://doi.org/10.1007/s00521-024-10640-1

Daniel, E. C., Tirunagari, S., Batth, K., Windridge, D., & Balla, Y. (2024). Interpretable Machine Learning for Predicting Multiple Sclerosis Conversion from Clinically Isolated Syndrome. medRxiv (Health Informatics). https://doi.org/10.1101/2024.07.18.24310578

Bystroński, M., Hołysz, M., Piotrowski, G., Chawla, N. V., & Kajdanowicz, T. (2025). SMOTExT: SMOTE meets Large Language Models (No. arXiv:2505.13434). arXiv. https://doi.org/10.48550/arXiv.2505.13434