
Discover the Lab

The Lab is the Research & Development department of Lingua Custodia. Each member holds a doctorate in Machine Learning and specialises in Natural Language Processing (NLP). The Lab has a threefold mission: to contribute actively to academic and industrial research in these fields, to keep the products and services offered by Lingua Custodia at the state of the art, and to develop new value-added applications for our clients.

The team

Doctor in Machine Learning

Melissa Ailem


Researcher

DBLP
Google Scholar

Doctor in Natural Language Processing

Raheel Qader


Head of Lab

DBLP
Google Scholar

Doctor in Machine Learning

Jingshu Liu


Engineer

DBLP
Google Scholar

Partners

Projects

The Lingua Custodia Lab works on a wide range of topics, with Natural Language Processing as the common thread.
Here are the research projects created by the Lab's team.

Prototypes

Discover the prototypes developed by the Lab team and based on Natural Language Processing applied to finance.

Extra-financial Criteria Lexical Comparator

This prototype, developed by Lingua Custodia’s Lab, analyses two documents and compares them lexically against extra-financial criteria, using the GRI Standards as its reference.
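The description above leaves the comparison method open. As a purely illustrative sketch, not the prototype's implementation, a lexical comparator could count, for each extra-financial topic, how many words of a document match that topic's keyword list, and report the counts side by side for the two documents. The `LEXICON` below is hypothetical: its topics and keywords are placeholders, not an extract of the GRI Standards.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical lexicon: illustrative topics and keywords only. The real
# prototype's lexicon and its mapping to the GRI Standards are not shown here.
LEXICON: Dict[str, List[str]] = {
    "emissions": ["emission", "emissions", "carbon", "greenhouse"],
    "water": ["water", "effluent", "effluents"],
    "diversity": ["diversity", "gender", "inclusion"],
}


def topic_counts(text: str, lexicon: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many word tokens of `text` match each topic's keywords."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {topic: sum(tokens[w] for w in words)
            for topic, words in lexicon.items()}


def compare(doc_a: str, doc_b: str,
            lexicon: Dict[str, List[str]] = LEXICON) -> Dict[str, Tuple[int, int]]:
    """Return, per topic, the keyword counts of doc_a and doc_b side by side."""
    a = topic_counts(doc_a, lexicon)
    b = topic_counts(doc_b, lexicon)
    return {topic: (a[topic], b[topic]) for topic in lexicon}


if __name__ == "__main__":
    report_a = "We cut carbon emissions and reduced water use."
    report_b = "Our board promotes gender diversity."
    for topic, (ca, cb) in compare(report_a, report_b).items():
        print(f"{topic:10s} A={ca} B={cb}")
```

Raw counts favour longer documents; a real comparison would more likely normalise by document length or weight terms, which this sketch deliberately leaves out.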


Activities

TALN+RECITAL Conference 2022

The Lab team published and presented its approach to integrating terminology constraints into neural machine translation models.

From 27 June to 1 July 2022, the French Computer Science and Systems Laboratory (“LIS”) and the Computer Science Laboratory of Avignon University (“LIA”), together with the Association for Natural Language Processing (ATALA), jointly organised the 29th conference on NLP (TALN in French) and the 24th meeting of Student Researchers in Computer Science for NLP (RECITAL in French).

Covid-19 MLIA Challenge

The Lingua Custodia Lab ranked first in 4 of the 7 language pairs in the second round of the Covid-19 MLIA Challenge, a European collaboration project supported by the European Commission, the European Language Resources Coordination (ELRC), and several other organisations and universities.

The project organises a community evaluation effort aimed at accelerating the creation of resources and tools for improved Multilingual Information Access (MLIA) using machine learning technology.

WMT 2021

This paper describes Lingua Custodia’s submission to the WMT21 shared task on machine translation using terminologies. We consider three translation directions: English to French, English to Russian, and English to Chinese. We rely on a Transformer-based architecture as a building block, and we explore a method that introduces two main changes to the standard training procedure in order to handle terminologies. The first consists of augmenting the training data so as to encourage the model to learn a copy behavior when it encounters terminology constraint terms. The second is constraint token masking, whose purpose is to ease the learning of this copy behavior and to improve model generalization. Empirical results show that our method satisfies most terminology constraints while maintaining high translation quality.
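To make the two training-time changes more concrete, here is a minimal Python sketch of one plausible reading of them, not the submission's actual code. It assumes whitespace-tokenised sentences, a terminology dictionary mapping source terms to target terms, hypothetical tags `<c>` and `</c>` that inline the target constraint right after the matching source term, and a hypothetical `<mask>` token. It also assumes that constraint token masking replaces the source-term tokens; the description above does not say exactly which tokens are masked.

```python
from typing import Dict, List

# Hypothetical special tokens; the markup used in the actual system may differ.
START, END, MASK = "<c>", "</c>", "<mask>"


def contains(seq: List[str], sub: List[str]) -> bool:
    """True if `sub` occurs as a contiguous run of tokens in `seq`."""
    n = len(sub)
    return any(seq[j:j + n] == sub for j in range(len(seq) - n + 1))


def augment(source: List[str], target: List[str],
            terms: Dict[str, str], mask: bool = True) -> List[str]:
    """Inline target-side terminology constraints into a source sentence.

    A source term is annotated only when its translation appears in the
    reference target, so during training the model sees the constraint
    being copied into the output. With mask=True the source-term tokens
    are replaced by MASK tokens (assumed masking scheme).
    """
    # Try longer (multi-word) terms first so they win over their sub-terms.
    entries = sorted(((s.split(), t.split()) for s, t in terms.items()),
                     key=lambda e: -len(e[0]))
    out: List[str] = []
    i = 0
    while i < len(source):
        for src_toks, tgt_toks in entries:
            n = len(src_toks)
            if source[i:i + n] == src_toks and contains(target, tgt_toks):
                head = [MASK] * n if mask else src_toks
                out += head + [START] + tgt_toks + [END]
                i += n
                break
        else:
            # No annotated term starts at position i: copy the token as-is.
            out.append(source[i])
            i += 1
    return out


if __name__ == "__main__":
    src = "the central bank raised interest rates".split()
    tgt = "la banque centrale a relevé les taux d'intérêt".split()
    terms = {"central bank": "banque centrale", "interest rates": "taux d'intérêt"}
    print(" ".join(augment(src, tgt, terms)))
    # → the <mask> <mask> <c> banque centrale </c> raised <mask> <mask> <c> taux d'intérêt </c>
```

At inference time, the same annotation would be applied using the terminology alone, since no reference is available; masking the source term leaves the inlined constraint as the only lexical cue, which is one way to read the claim that masking eases copy learning.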

ACL 2021 Conference

The research paper “Encouraging Neural Machine Translation to Satisfy Terminology Constraints” has been accepted for publication at the ACL 2021 conference, in the Findings of ACL 2021. It presents a new approach to encourage neural machine translation to satisfy lexical constraints. ACL is the premier conference in the field of computational linguistics and selects the most promising research papers worldwide. This recognition confirms the Lab’s position among the leaders in the NLP field, alongside prestigious industry research labs such as Google Research, Facebook AI and Amazon Science.

IJCAI-PRICAI: Learning Data Representation for Clustering

This workshop focuses on recent advances in data representation for clustering, across different approaches. The LDRC workshop is thus an opportunity to (i) present recent advances in clustering algorithms based on data representation; (ii) outline potential applications that could inspire new data representation approaches for clustering; and (iii) explore benchmark data to better evaluate and study clustering models based on data representation.

Lecture at Université Grenoble Alpes

Raheel Qader, Head of the Lab, gave a lecture at Université Grenoble Alpes to master's students majoring in Financial Engineering. The lecture covered Natural Language Processing applied to finance, and more specifically machine translation.
