The Lab is the Research & Development department of Lingua Custodia. Each member holds a PhD in Natural Language Processing (NLP) and specialises in Machine Learning. The Lab team has a triple mission: to actively contribute to academic and industrial research in these fields, to keep the products and services offered by Lingua Custodia at the state of the art, and to develop new value-added applications for our clients.
The Lab works on very different topics, with Natural Language Processing as a common denominator.
Here are the research projects carried out by the Lab's team.
Neural Machine Translation (NMT) is the state of the art in Machine Translation, as it can generate high-quality translations. However, if we restricted translation to this stage, it would be less accurate for specialised financial texts. The specialised corpora used to train our engines are often modest in size compared with general-domain corpora. Our added value lies precisely in our ability to enrich our training data with financial terminology. Selective data training is one of the most investigated approaches for enlarging the training dataset without introducing too much noise. In this project, we study how to select the most useful portion of general-domain or noisy data.
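The text does not say which selection criterion the Lab uses. As an illustration only, the sketch below implements one classic criterion from the literature, Moore and Lewis cross-entropy difference: each general-domain sentence is scored by its cross-entropy under an in-domain language model minus its cross-entropy under a general-domain model, and the lowest-scoring (most in-domain-like) sentences are kept. To stay self-contained it uses add-one-smoothed unigram models; the function names, the `keep_ratio` parameter and the toy data are hypothetical, and a production system would use much stronger language models.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a log-probability function for an add-one-smoothed unigram model."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab_size))

    return logprob


def cross_entropy(sentence, logprob):
    """Average negative log-probability per token of a sentence."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)


def select_in_domain(in_domain, general, keep_ratio):
    """Keep the fraction of `general` that looks most like `in_domain`.

    Score = H_in(s) - H_general(s); lower means more in-domain-like.
    """
    lp_in = train_unigram(in_domain)
    lp_gen = train_unigram(general)
    ranked = sorted(
        general,
        key=lambda s: cross_entropy(s, lp_in) - cross_entropy(s, lp_gen),
    )
    k = max(1, int(len(general) * keep_ratio))
    return ranked[:k]


# Toy example (hypothetical data)
financial = [
    "the fund invests in equity markets",
    "net asset value of the fund",
    "the bond yield increased",
]
web = [
    "the cat sat on the mat",
    "the fund invests in bonds",
    "I like football",
    "asset value rose",
]
print(select_in_domain(financial, web, keep_ratio=0.5))
# ['the fund invests in bonds', 'asset value rose']
```

The cross-entropy difference rather than the in-domain cross-entropy alone is used so that sentences are not favoured merely for being short or made of very frequent words.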
Despite their effectiveness, Neural Machine Translation (NMT) systems have an important drawback: they do not provide an explicit link between source segments and translated segments. Without this source-target link, it is particularly difficult for NMT systems to enforce terminological constraints in domain-specific scenarios (e.g. financial translation). The objective of this project is twofold. The first part consists of building bilingual terminologies (lexicons) in the domain under consideration. The second part consists of extending existing NMT systems into new systems able to efficiently integrate these terminological constraints.
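The text does not specify how the Lab builds its lexicons. As a minimal illustration, the sketch below extracts candidate term pairs from a sentence-aligned corpus using the Dice coefficient of sentence-level co-occurrence (a simple association measure swapped in here for clarity), then checks a translation against the resulting lexicon, which is the kind of constraint an extended NMT system would need to enforce. The helper names `dice_lexicon` and `missing_terms`, the `min_score` threshold and the toy English-French data are hypothetical; real terminology extraction would handle multi-word terms and rely on word alignments.

```python
from collections import Counter
from itertools import product


def dice_lexicon(parallel, min_score=0.5):
    """Map each source word to its best-associated target word.

    Dice(s, t) = 2 * cooc(s, t) / (freq(s) + freq(t)), where counts are
    taken over sentence pairs (each word counted once per sentence).
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in parallel:
        s_types = set(src.lower().split())
        t_types = set(tgt.lower().split())
        src_freq.update(s_types)
        tgt_freq.update(t_types)
        pair_freq.update(product(s_types, t_types))

    lexicon = {}
    # Sorted iteration makes tie-breaking deterministic (alphabetical).
    for (s, t), cooc in sorted(pair_freq.items()):
        score = 2 * cooc / (src_freq[s] + tgt_freq[t])
        if score >= min_score and score > lexicon.get(s, ("", 0.0))[1]:
            lexicon[s] = (t, score)
    return lexicon


def missing_terms(source, translation, lexicon):
    """Source words whose lexicon translation does not appear in the output."""
    out = set(translation.lower().split())
    return [w for w in source.lower().split()
            if w in lexicon and lexicon[w][0] not in out]


corpus = [
    ("bond yield", "rendement obligation"),
    ("bond price", "prix obligation"),
    ("equity price", "prix action"),
    ("equity yield", "rendement action"),
]
lex = dice_lexicon(corpus)
print({s: t for s, (t, _) in lex.items()})
print(missing_terms("bond yield", "rendement action", lex))  # ['bond']
```

Here "bond" is flagged because its lexicon entry, "obligation", is absent from the candidate translation; a constrained decoder would instead be steered so that such terms appear.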
Neural machine translation models are very powerful, achieving state-of-the-art results for most languages. However, such models do not contain hard alignments between source and target words, which makes it difficult to handle tasks such as translating markup tags (e.g. HTML, XML). A markup tag gives formatting indications: for example, it can be attached to a word to indicate that the word should be in italics. In this project, we will create markup-enriched parallel training data and design a network that can correctly translate markup and put the tags back in the correct positions in the translated text. As a result, it will be possible to automatically translate HTML, XML and many other document types, which was not possible with traditional neural machine translation models.
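The project aims at a network that learns to place tags itself. For contrast, the sketch below shows a common non-neural baseline, which is a different technique: strip the tags, translate the plain text, then project each tag span onto the target through a word alignment, wrapping the smallest target range that covers all aligned tokens. Tag spans are given as hypothetical `(tag, start, end)` token ranges with exclusive `end`, the alignment as `(source_index, target_index)` pairs, and spans are assumed not to overlap. This is an illustrative sketch, not the Lab's method.

```python
def project_tags(spans, alignment):
    """Project source tag spans onto the target via word alignments.

    spans: list of (tag, start, end) over source tokens, end exclusive.
    alignment: list of (source_index, target_index) pairs.
    Spans with no aligned target token are dropped.
    """
    projected = []
    for tag, start, end in spans:
        targets = [j for i, j in alignment if start <= i < end]
        if targets:
            # Smallest contiguous target range containing all aligned tokens.
            projected.append((tag, min(targets), max(targets) + 1))
    return projected


def render(tokens, spans):
    """Re-insert tags around target tokens (assumes non-overlapping spans)."""
    opens, closes = {}, {}
    for tag, start, end in spans:
        opens.setdefault(start, []).append(tag)
        closes.setdefault(end - 1, []).append(tag)
    out = []
    for k, tok in enumerate(tokens):
        prefix = "".join(f"<{t}>" for t in opens.get(k, []))
        suffix = "".join(f"</{t}>" for t in closes.get(k, []))
        out.append(prefix + tok + suffix)
    return " ".join(out)


# "the <b>blue</b> bond": French word order puts the adjective last.
src_spans = [("b", 1, 2)]             # <b> wraps source token 1 ("blue")
tgt_tokens = ["l'", "obligation", "bleue"]
alignment = [(0, 0), (1, 2), (2, 1)]  # the->l', blue->bleue, bond->obligation
print(render(tgt_tokens, project_tags(src_spans, alignment)))
# l' obligation <b>bleue</b>
```

The weakness of this baseline is that it depends entirely on alignment quality, and a translation model without hard alignments gives no reliable alignment to project through, which is the gap the project described above targets.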
Lingua Custodia’s team is composed of a diverse mix of profiles, highly skilled in their areas of expertise and all committed to our entrepreneurial adventure. But what we value most at Lingua Custodia are soft skills: team spirit, trustworthiness, open-mindedness, enthusiasm, and the freedom to try new ideas or practices.
Purpose
Internship supervised by the Lab […]