TN-Mammo: A Curated Multi-View Mammography Dataset with Consensus Radiologist Annotations for Breast Density Stratification in a Vietnamese Cohort

Hanan Alsagheer Amir EL-sseid; Fatah Mohamed Shakrum; Ebtisam Mohamed Fakroun; Mohamed EL-sseid; Abdussalam Ali Ahmed; Yasser Fathi Nassar; Abdulgader Alsharif

doi:10.65405/t18s9k62

Authors

Hanan Alsagheer Amir EL-sseid, Statistical Analysis Department, Faculty of Applied Science, Sebha University, Sabha, Libya (Author)
Fatah Mohamed Shakrum, High Institute of Medical Technology Abo salim, Tripoli, Libya (Author)
Ebtisam Mohamed Fakroun, Information Technology, The College of Industrial Technology, Misrata, Libya (Author)
Mohamed EL-sseid, Department of Software Engineering, Ankara Bilim University, Türkiye (Author)
Abdussalam Ali Ahmed, Mechanical and Industrial Engineering Department, Bani Waleed University, Libya (Author)
Yasser Fathi Nassar, Wadi Alshatti University, Brack, Libya (Author)
Abdulgader Alsharif, Department of Electric and Electronic Engineering, College of Technical Sciences Sebha, Libya (Author)

DOI:

https://doi.org/10.65405/t18s9k62

Keywords:

Mammography dataset, breast density classification, BI-RADS, multi-view imaging, radiologist consensus, Vietnamese cohort, medical image annotation, AI-ready data curation

Abstract

Digital mammography remains the primary modality for population screening, and early detection of breast cancer remains essential for successful clinical management. Recent computational methods, particularly those built on deep learning architectures, have shown promise in improving radiological assessment; however, their effectiveness is inherently limited by the representativeness and accuracy of the underlying training data. Existing public mammography datasets sometimes fall short in multi-view completeness, annotation procedures, or demographic diversity, which limits how well they generalize to diverse clinical settings. This paper presents TN-Mammo, a carefully curated multi-view mammography dataset from a Vietnamese patient cohort, to address these gaps. Each participant contributed anatomically matched images of the right and left breasts to the collection, which comprises bilateral craniocaudal (CC) and mediolateral oblique (MLO) projections for 676 individuals. Two board-certified radiologists independently assessed the breast density category, a critical indicator for cancer risk stratification, using a double-blind methodology. Consensus adjudication was used to determine the final labels. To ensure clinical compatibility, the density labels follow the four-tier BI-RADS framework (categories A through D). The paper describes the data collection process, the annotation procedure, the inter-observer agreement metrics, and the basic statistical characteristics of the dataset. By making TN-Mammo publicly available through PhysioNet, this work aims to support the development of fair, population-aware artificial intelligence systems for breast cancer screening in underrepresented regions and to enable reproducible research in density-aware computer-aided diagnosis.
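The abstract mentions inter-observer agreement metrics and consensus adjudication without giving formulas. As a rough illustration only, the sketch below computes unweighted Cohen's kappa for two readers over the four BI-RADS density categories and lists the cases where the readers disagree, which would be the ones sent to adjudication. The function names, the toy labels, and the choice of the unweighted statistic are assumptions made for this example; they are not taken from the paper or from the TN-Mammo release.

```python
from collections import Counter

# BI-RADS density categories named in the abstract.
CATEGORIES = ("A", "B", "C", "D")


def cohen_kappa(reader_1, reader_2, categories=CATEGORIES):
    """Unweighted Cohen's kappa for two readers labelling the same cases."""
    if len(reader_1) != len(reader_2) or not reader_1:
        raise ValueError("both readers must label the same non-empty case list")
    n = len(reader_1)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(reader_1, reader_2)) / n
    # Chance agreement: product of the readers' marginal frequencies per category.
    counts_1 = Counter(reader_1)
    counts_2 = Counter(reader_2)
    expected = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both readers used a single identical category for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def needs_adjudication(reader_1, reader_2):
    """Indices of cases where the two readers disagree (hypothetical helper)."""
    return [i for i, (a, b) in enumerate(zip(reader_1, reader_2)) if a != b]


# Toy labels, invented purely for illustration; not drawn from TN-Mammo.
reader_1 = ["A", "B", "B", "C", "C", "C", "D", "B"]
reader_2 = ["A", "B", "C", "C", "C", "D", "D", "B"]

kappa = cohen_kappa(reader_1, reader_2)
disputed = needs_adjudication(reader_1, reader_2)
print(f"kappa = {kappa:.3f}, cases for adjudication: {disputed}")
```

Because the BI-RADS density categories are ordered, a weighted kappa that penalizes A-versus-D disagreements more than A-versus-B ones may be the more appropriate statistic; the unweighted form is used here only to keep the sketch short.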


