Table of Contents

GBX9/WMM9MO75 - Natural Language Processing & Information Retrieval

Note: this is a temporary replacement site for the GBX9/WMM9MO75 NLP&IR course for as long as chamilo INP is down. Please switch back to chamilo INP as soon as it is restored.

As a reminder, this year's NLP&IR course is a single 6-ECTS course that merges last year's two independent 3-ECTS courses, NLP and IAR. We reuse here the URL of last year's IAR course.

Natural Language Processing part

Information Retrieval part

This course is given by Jean-Pierre Chevallet, Philippe Mulhem, Lorraine Goeuriot, Petra Galuscakova and Georges Quénot from the Multimedia Information Modeling and Retrieval (MRIM) research group of the Grenoble Informatics Laboratory (LIG).

Contact: Georges.Quenot@imag.fr

Contents:

Notes:
1. Not all of these courses have been updated here yet.
2. This is the logical order of the courses, not the actual order in time. In particular, “Deep learning for multimedia indexing and retrieval - part 1” was given at the beginning, as it is mostly an introduction to deep learning, which was useful for the other courses.

Part I: Foundations of Information Retrieval (Jean-Pierre Chevallet and Philippe Mulhem)

Part II: Web, social networks and health (Philippe Mulhem, Lorraine Goeuriot)

Part III: Multimedia indexing and retrieval (Georges Quénot)

Reference to IR books or papers

First session examination

The examination will be on February 1st, 2023 from 2:00 to 5:00 pm, ENSIMAG Amphitheater D.
The three papers related to the examinations, personal notes on no more than 2 double-sided A4 pages, and calculators (without network capabilities) are allowed.

You are expected to study the three papers proposed below so that you understand them and are able to comment on them. You will have to answer questions on topics that occur in the lessons. You must also take time to read complementary information in order to understand the papers. Be sure to bring with you a copy of the three research papers as they will NOT be redistributed with the examination subject. These can be annotated by you. The bibliography and appendices, if any, are part of the papers.

The papers for the 2022/2023 exam are:

  1. Mirco Ravanelli et al., SpeechBrain: A General-Purpose Speech Toolkit, https://arxiv.org/pdf/2106.04624.pdf Only the first 16 pages; appendices are excluded.
  2. Sibo Dong, Justin Goldstein and Grace Hui Yang, SEINE: SEgment-based Indexing for NEural information retrieval, 2022, https://infosense.cs.georgetown.edu/publication/129.pdf
  3. X. Li, F. Zhou, C. Xu, J. Ji and G. Yang, “SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries,” in IEEE Transactions on Multimedia, vol. 23, pp. 4351-4362, 2021. Please use the freely accessible arXiv version: https://arxiv.org/pdf/2011.12091.pdf

Here are answers to some questions asked by one student following this course about the examination:

1) In what form will we have an exam - in the form of questions on the article or in the form of questions on the whole course?
The questions will be mostly on the papers, but they can be about the course as well. You can look at the examinations from previous years to get an idea of the types of questions that were asked for the IAR part.

2) What topics will be considered in the exam if the exam is in the form of questions? In this course, we have a lot of information on a very wide range of issues - is it worth preparing for everything or focusing on something specific? For example, to focus on NLP and not pay so much attention to IR or to voice recognition.
I (Georges Quénot) can't tell for NLP versus voice recognition; maybe Didier Schwab will give information about this. You should pay attention to IR as we will balance the questions between both topics. It is probably good to focus on the parts of the course relevant to the papers, but there may be a few general questions about other parts too.

3) What materials can be used in the exam? Is it possible to use presentation slides or only handwritten notes?
What you can bring with you for the exam is specified on the temporary site gbx9mo23.imag.fr. You will need to bring your own copies of the papers, and these can be manually annotated by you. In addition, you are allowed to bring notes on no more than two double-sided A4 sheets. This is indeed not enough for printing all of the course material, so you will have to make yourself a synthesis of the most important parts.

Previous years' examinations - Information retrieval only

2017-2018 examination: gbx9mo23-2017-2018-exam.pdf, papers:
https://www.researchgate.net/publication/305081616_A_Simple_Enhancement_for_Ad-hoc_Information_Retrieval_via_Topic_Modelling,
http://www.tyr.unlu.edu.ar/tallerIR/2014/papers/novel-tfidf.pdf,
http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf

2018-2019 examination: gbx9mo23-2018-2019-exam.pdf, papers:
https://danluu.com/bitfunnel-sigir.pdf,
https://arxiv.org/pdf/1604.01325.

2019-2020 examination: gbx9mo23-2019-2020-exam.pdf, papers:
http://openaccess.thecvf.com/content_CVPR_2019/papers/Dong_Dual_Encoding_for_Zero-Example_Video_Retrieval_CVPR_2019_paper.pdf
https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1302

2020-2021 examination: gbx9mo23-2020-2021-exam.pdf, papers:
https://arxiv.org/pdf/1707.05612
https://people.cs.umass.edu/~elm/papers/zamani.pdf

2021-2022 examination: gbx9mo23-2021-2022-exam.pdf, papers:
https://www.cs.rit.edu/~rlaz/files/LTR_Formulas_SIGIR2021.pdf
http://vireo.cs.cityu.edu.hk/papers/MM2020_dual_task_video_retrieval.pdf

Second session examination

The second session examination will be handled separately for the Natural Language Processing (NLP) and Information Retrieval (IR) parts. Marks will be given independently and then averaged.

Second session examination, NLP part

To come (or will be sent directly by mail).

Second session examination, IR part

The second session IR examination will take place on April 26, 27 and 28, from 10:00am to 11:30am. It will be a 30-minute oral examination and it will be held remotely via Zoom. You will individually receive an email with the exact day and time and the Zoom link you should use. You may have to wait in the waiting room if we happen to have delays.

There is no new paper to study for this session. Questions and/or exercises will be related to the papers selected for session 1 (above); some will relate to the course, whether or not in connection with the papers.

2021-2022 page