[1] Nguyen Van Thinh et al. 2024. OD-VR-Cap: Image captioning based on detecting and predicting relationships between objects. Journal of Computer Science and Cybernetics. 40, 4 (Dec. 2024), 327–346. DOI:https://doi.org/10.15625/1813-9663/20929.