The Lab is the Research & Development department of Lingua Custodia. Each member holds a PhD in Machine Learning and specialises in Natural Language Processing (NLP). The Lab has a threefold mission: to actively contribute to academic and industrial research in these fields, to keep the products and services offered by Lingua Custodia at the state of the art, and to develop new value-added applications for our clients.
Lingua Custodia Lab works on a wide range of topics, with Natural Language Processing as the common denominator.
Here are the research projects led by the Lab's team.
Neural Machine Translation (NMT) is the state of the art in Machine Translation, as it can generate high-quality translations. However, if we restricted translation to this stage, it would be less accurate for specialised financial texts. The specialised corpora used to train our engines are often modest in size compared with general-domain corpora. Our added value lies precisely in our ability to enrich our training data with financial terminology. Selective data training is one of the most investigated approaches for enlarging the training dataset without introducing too much noise. In this project, we study how to select the most useful portion of general-domain or noisy data.
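To make the idea of selecting "the best portion" of a large pool more concrete, here is a minimal sketch of one classic, well-known selection criterion: the cross-entropy difference of Moore and Lewis. It is shown for illustration only and is not necessarily the method studied by the Lab. Each candidate sentence is scored by how much better an in-domain language model predicts it than a general-domain model does, and the lowest-scoring sentences are kept. The helper names (train_unigram, cross_entropy, select), the add-one-smoothed unigram models and the toy sentences are all simplifying assumptions; real systems typically use n-gram or neural language models.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Train an add-one-smoothed unigram LM and return a token log-prob function."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        # Counter returns 0 for unseen tokens, so add-one smoothing handles them.
        return math.log((counts[tok] + 1) / (total + vocab_size))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a whitespace-tokenised sentence."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)


def select(candidates, in_domain, general, k):
    """Keep the k candidates that look most in-domain and least general-domain.

    Score = H_in(s) - H_gen(s); lower scores are preferred (Moore-Lewis criterion).
    """
    lm_in = train_unigram(in_domain)
    lm_gen = train_unigram(general)
    ranked = sorted(
        candidates,
        key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_gen, s),
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy data: a tiny "financial" corpus and a tiny general corpus.
    in_domain = ["the fund net asset value rose", "the bond yield fell"]
    general = ["the cat sat on the mat", "i like the weather today"]
    candidates = ["the fund yield rose", "the cat likes the weather"]
    print(select(candidates, in_domain, general, k=1))  # ['the fund yield rose']
```

In this setting, the in-domain model would be trained on the specialised financial corpus and the general model on the noisy pool, so the top-ranked sentences are those closest to the financial domain.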
Despite their effectiveness, Neural Machine Translation (NMT) systems have an important drawback: they do not provide an explicit link between source segments and translated segments. Without this source-target link, it is particularly difficult for NMT systems to enforce terminological constraints in domain-specific scenarios (e.g. financial translation). The objective of this project is twofold. The first part consists of building bilingual terminologies (lexicons) in the domain considered. The second part consists of extending existing NMT systems into new systems able to efficiently integrate these terminological constraints.
Neural machine translation models are very powerful, achieving state-of-the-art results for most languages. However, they do not produce hard alignments between source and target words, which makes it difficult to handle tasks such as translating markup tags (e.g. HTML, XML). A markup tag gives formatting information: for example, it is attached to a word to indicate that the word should be in italics. In this project, we will create markup-enriched parallel training data and design a network that can correctly translate markup and place it back in the correct positions in the translated text. As a result, it will be possible to automatically translate HTML, XML and many other document types, something traditional neural machine translation models could not do.
Discover the prototypes developed by the Lab team and based on Natural Language Processing applied to finance.
The research paper “Encouraging Neural Machine Translation to Satisfy Terminology Constraints” has been accepted at the ACL 2021 conference for publication in the Findings of ACL 2021. It presents a new approach to encourage neural machine translation to satisfy lexical constraints. ACL is the premier conference in the field of computational linguistics, selecting the most promising research papers worldwide. This new recognition confirms the Lab’s position as a leader in the NLP field, alongside the most prestigious companies, such as Google Research, Facebook AI and Amazon Science.
This workshop aims to explore recent advances in data representation for clustering across different approaches. The LDRC workshop is therefore an opportunity to (i) present recent advances in clustering algorithms based on data representation; (ii) outline potential applications that could inspire new data representation approaches for clustering; and (iii) explore benchmark datasets to better evaluate and study clustering models based on data representation.
Raheel Qader, Head of the Lab, gave a lecture at the Université Grenoble Alpes to master's students majoring in Financial Engineering. The lecture covered Natural Language Processing applied to finance, and more specifically Machine Translation.