Available from Master 2 MOSIG and MSIAM.
Course Description
This course is given by Jean-Pierre Chevallet, Philippe Mulhem, Lorraine Goeuriot and Georges Quénot from the Multimedia Information Modeling and Retrieval (MRIM) research group of the Grenoble Informatics Laboratory (LIG).
Contact: Georges.Quenot@imag.fr
Contents / schedule:
Part I: Foundations of Information Retrieval (Jean-Pierre Chevallet and Philippe Mulhem)
Part II: Web, social networks and health (Philippe Mulhem, Lorraine Goeuriot)
Part III: Multimedia indexing and retrieval (Georges Quénot)
Reference to IR books or papers
The goal of this practical is to implement a skeleton of an IR system in Python. The fundamental structure to implement is an inverted file that contains, for each term, a list of entries, each made of a document internal Id (integer) and the frequency (integer) of this term in that document.
You must accept the following implementation constraints to reduce the size of this inverted file:
For that, use Python's basic and efficient array structure. This is the only imposed data structure for this project.
The inverted file can be loaded in memory for more efficient searching. Besides that, you have to store the document external Ids in a sequential structure that can be saved. The dictionary should also be programmed using a basic Python structure that can be saved.
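As an illustration of the structures described above, here is a minimal sketch, not a reference solution. It assumes that the "array structure" is the standard `array` module, that each posting list interleaves internal Ids and frequencies, and that tokenization is a plain lowercase whitespace split. The names `build_index` and `lookup` are hypothetical.

```python
import pickle
from array import array
from collections import Counter


def build_index(docs):
    """Build an inverted file from an iterable of (external_id, text) pairs.

    Assumed layout (not imposed by the course):
    - ext_ids: list where position = internal Id, value = external Id
    - postings: dict mapping term -> array("I") of
      [doc_id, tf, doc_id, tf, ...]
    """
    ext_ids = []
    postings = {}
    for ext_id, text in docs:
        doc_id = len(ext_ids)          # next free internal Id
        ext_ids.append(ext_id)
        # Naive tokenization: lowercase + whitespace split (assumption).
        for term, tf in Counter(text.lower().split()).items():
            postings.setdefault(term, array("I")).extend((doc_id, tf))
    return postings, ext_ids


def lookup(postings, term):
    """Return the posting list of a term as (doc_id, tf) pairs."""
    p = postings.get(term, array("I"))
    return list(zip(p[0::2], p[1::2]))


def save(path, postings, ext_ids):
    """Save the dictionary, posting arrays and external Ids in one file."""
    with open(path, "wb") as f:
        pickle.dump((postings, ext_ids), f)


def load(path):
    """Load back what save() wrote."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With documents `[("d-a", "to be or not to be"), ("d-b", "to do")]`, `lookup(postings, "to")` gives the pairs `(0, 2)` and `(1, 1)`. Using `pickle` for saving is only one possible choice; the arrays can also be written directly with `array.tofile`.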
Optionally, you can use the following libraries:
For the data set, you can use the following test collection (documents and solved queries):
The minimum matching models to implement are the Vector Space Model and one simple Language Model.
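The two matching models can be sketched as follows. This is only an illustrative sketch under assumptions not stated in the course: tf-idf weights with cosine normalization for the Vector Space Model, and query likelihood with Jelinek-Mercer smoothing (parameter `lam`) for the Language Model. All function names are hypothetical, and the small index builder is repeated so that the example is self-contained.

```python
import math
from array import array
from collections import Counter


def build_index(docs):
    """docs: list of texts; the list position is the internal Id."""
    postings, doc_len = {}, array("I")
    for doc_id, text in enumerate(docs):
        terms = text.lower().split()
        doc_len.append(len(terms))
        for term, tf in Counter(terms).items():
            postings.setdefault(term, array("I")).extend((doc_id, tf))
    return postings, doc_len


def pairs(p):
    """Iterate over (doc_id, tf) pairs of an interleaved posting array."""
    return zip(p[0::2], p[1::2])


def vsm_scores(query, postings, doc_len):
    """Cosine-style score with tf * idf weights, idf = log(N / df).

    The query norm is omitted: it is the same for every document,
    so it does not change the ranking.
    """
    n = len(doc_len)
    norms = [0.0] * n
    for p in postings.values():
        idf = math.log(n / (len(p) // 2))
        for d, tf in pairs(p):
            norms[d] += (tf * idf) ** 2
    scores = [0.0] * n
    for term, qtf in Counter(query.lower().split()).items():
        p = postings.get(term)
        if p is None:
            continue
        idf = math.log(n / (len(p) // 2))
        for d, tf in pairs(p):
            scores[d] += (qtf * idf) * (tf * idf)
    return [s / math.sqrt(norms[d]) if norms[d] > 0 else 0.0
            for d, s in enumerate(scores)]


def lm_scores(query, postings, doc_len, lam=0.5):
    """Query log-likelihood with Jelinek-Mercer smoothing:
    P(t|d) = lam * tf/|d| + (1 - lam) * cf/|C|.
    Query terms absent from the collection are skipped (assumption).
    """
    total = sum(doc_len)
    n = len(doc_len)
    scores = [0.0] * n
    for term in query.lower().split():
        p = postings.get(term)
        if p is None:
            continue
        cf = sum(p[1::2])              # collection frequency of the term
        tf_in = dict(pairs(p))
        for d in range(n):
            p_doc = tf_in.get(d, 0) / doc_len[d] if doc_len[d] else 0.0
            scores[d] += math.log(lam * p_doc + (1 - lam) * cf / total)
    return scores
```

Both functions return one score per internal Id; sorting the Ids by decreasing score gives the ranking. Note that the Language Model scores every document, including those that contain no query term, which is a direct effect of the smoothing.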
You can do your project in groups of up to 3 people, and you have to send the following elements packaged in a compressed file:
Please do not include any data, in order to keep the final file small. Produce short, minimalist code: the goal of this project is to better understand how an IR system really works.
Then send the result to jean-pierre.chevallet@univ-grenoble-alpes.fr before the official exam period (around the end of January 2022). Meanwhile, you can send technical questions to jean-pierre.chevallet@univ-grenoble-alpes.fr.
The examination will be on February 2nd, 2022, from 9:45 to 11:45 am in ENSIMAG Amphi E.
Course materials, the two papers related to the examinations, personal notes, and calculators (without network capabilities) are allowed.
You are expected to do research work on the two papers proposed below, so as to understand them and to be able to comment on them. You will have to answer questions on topics covered in the lessons. You must also take time to read complementary information in order to understand the papers. Be sure to bring a copy of the two research papers with you, as they will NOT be redistributed with the examination subject. You may annotate your copies. The bibliography and appendices, if any, are part of the papers.
The papers for the 2021/2022 exam are:
2017-2018 examination: gbx9mo23-2017-2018-exam.pdf, papers:
https://www.researchgate.net/publication/305081616_A_Simple_Enhancement_for_Ad-hoc_Information_Retrieval_via_Topic_Modelling
http://www.tyr.unlu.edu.ar/tallerIR/2014/papers/novel-tfidf.pdf
http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf
2018-2019 examination: gbx9mo23-2018-2019-exam.pdf, papers:
https://danluu.com/bitfunnel-sigir.pdf
https://arxiv.org/pdf/1604.01325
2019-2020 examination: gbx9mo23-2019-2020-exam.pdf, papers:
http://openaccess.thecvf.com/content_CVPR_2019/papers/Dong_Dual_Encoding_for_Zero-Example_Video_Retrieval_CVPR_2019_paper.pdf
https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1302
2020-2021 examination: gbx9mo23-2020-2021-exam-tmpkxtz.pdf, papers:
https://arxiv.org/pdf/1707.05612
https://people.cs.umass.edu/~elm/papers/zamani.pdf