PhD Position: Video description by textual and semantic enrichment

Keywords: video description, deep learning, convolutional neural network, coreference resolution, knowledge graph, multi-tasking, multimodality

Abstract: This PhD thesis addresses the automatic generation of video descriptions using natural language processing and deep learning. The goal is to overcome the limitations of existing databases in terms of encoding, multimodality, standardization, ground truth and contextualization, in order to improve the performance of video description methods. To this end, we plan to apply convolutional neural networks to videos enriched with textual and semantic data, relying in particular on the knowledge graphs of the Web of Data. The work involves solving scientific challenges such as coreference resolution and multi-task and multimodal processing for performance evaluation. In addition, the project will contribute to the development of large-scale standardized databases for evaluating video description methods, which is essential for future research in this area.

Laboratory: LIFAT, BDTLN and RFAI teams, University of Tours  

Location: Blois and Tours 

Duration: 3 years 

Funding: Fully funded 3-year PhD position / salary: 1600€

Candidate profile: Master’s degree in Computer Science, initial research experience (teaching, project, or internship), motivation for NLP and deep learning applied to imagery / Some knowledge of French would be appreciated.

Link to apply in ADUM: link

Deadline for submission of complete application in ADUM: 15/05/2023 

Required documents: M1 and M2 transcripts, letters of recommendation

Contacts: nathalie.friburger@univ-tours.fr, donatello.conte@univ-tours.fr, arnaud.soulet@univ-tours.fr

Full description of the subject: