Keywords: video description, deep learning, convolutional neural network, coreference resolution, knowledge graph, multi-tasking, multimodality
Abstract: The topic of this PhD thesis is the automatic generation of video descriptions using natural language processing and deep learning. The goal is to overcome the limitations of existing databases in terms of encoding, multimodality, standardization, ground truth and contextualization, in order to improve the performance of video description methods. To this end, we plan to apply convolutional neural networks to videos enriched with textual and semantic data, relying in particular on the knowledge graphs of the Web of Data. This thesis work involves addressing scientific challenges such as coreference resolution, as well as multi-task and multimodal processing for performance evaluation. In addition, the project will contribute to the development of large-scale standardized databases for evaluating video description methods, which is essential for future research in this area.
Laboratory: LIFAT, BDTLN and RFAI teams, University of Tours
Location: Blois and Tours
Duration: 3 years
Funding: Fully funded 3-year PhD position / salary: 1600€
Candidate profile: Master’s degree in Computer Science, initial research experience (teaching, project, or internship), motivation for NLP and deep learning applied to images / Some knowledge of French would be appreciated.
Link to apply in ADUM: link
Deadline for submission of complete application in ADUM: 15/05/2023
Required documents: M1 and M2 transcripts, letters of recommendation
Contacts: email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
Links: https://international.univ-tours.fr/, https://lifat.univ-tours.fr/, https://www.rfai.lifat.univ-tours.fr/, https://www.info.univ-tours.fr/~soulet/, https://www.info.univ-tours.fr/~friburger/, http://mathieu.delalandre.free.fr/
Full description of the subject: