How to understand the core concepts of AI, LLMs and RAG!

If you find some of the different terminology used for Large Language Models (LLMs) and AI confusing, you are not alone!

This is the first in a series of articles about AI, LLMs and Retrieval Augmented Generation (RAG) in which we aim to explain, clearly and succinctly, some of the key terminology you might be hearing about. We hope you find these posts helpful!


What are foundation models?


A foundation model is an AI model trained on huge amounts of data (documents, audio, images, text, etc.). It is trained to ‘generate’ the next word as it ‘learns’ the language. It can then be specialised and fine-tuned for a wide variety of applications and tasks, at which point it is no longer a foundation model!
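To make ‘generating the next word’ concrete, here is a deliberately simplified sketch in Python. It uses bigram counts over a toy sentence rather than a neural network, so it is an illustration of the idea of next-word prediction, not of how real foundation models are built.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a tiny
# "training corpus", then predict the most frequent follower.
# Real LLMs learn these patterns with neural networks over tokens.
corpus = "the model learns the language and the model predicts the next word"

bigrams = defaultdict(Counter)
words = corpus.split()
for current, following in zip(words, words[1:]):
    bigrams[current][following] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))  # → "model" (follows "the" twice in the corpus)
```

A real model does the same kind of prediction over billions of documents, which is why it appears to ‘know’ the language.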


What are LLMs?


An LLM is an umbrella term covering both foundation models and the specialised models derived from them.

For example:

In the case of Llama, the foundation model is not usable directly but serves as the foundation for all the subsequent specialised models: Llama Instruct is a question-answering model, and Code Llama is a coding assistant.

All three models are LLMs.

What are the benefits and challenges of a foundation model?

In terms of benefits: 


Flexibility and adaptability

Foundation models are flexible and adaptable as they can be fine-tuned for a wide range of tasks, saving time and resources compared to building new models from scratch for each specific task.

Cost efficient

While foundation models are costly to train, once you have one, you can adapt it as many times as you want to new tasks.

Accessibility

Open-source foundation models improve accessibility: smaller companies with less access to computational resources can leverage these models to create innovative AI applications. (Note that many closed models are not accessible!)

(Note – with an open-source foundation model, almost anyone can use it, access the source code and customise it, which in theory improves accessibility, transparency, etc. Meta’s Llama 2 is an open-source foundation model; ChatGPT is not open source.)

As for the challenges: 

Bias

Foundation models are trained on large and diverse data sets which may contain biases, and these biases will be mirrored in the model’s outputs.

Security and privacy

The huge amounts of data needed to train a foundation model naturally raise security and privacy concerns. The data should be kept secure and handled responsibly.

Lack of transparency

Foundation models can be a ‘black box’. The issue with data has already been highlighted. In addition, it is important to understand how a foundation model generates its outputs in order to identify potential errors or bias. This is a hot topic with ongoing empirical studies.

Lingua Custodia wins the Large AI Grand Challenge Award organised by the European Commission!


The French Fintech company Lingua Custodia, a specialist in Natural Language Processing (NLP) applied to Finance since 2011, was delighted to receive an award in Brussels yesterday. This award, which was presented by EU Commissioner Thierry Breton, is designed to reward innovative start-ups and SMEs for devising ambitious strategies and making commitments to develop large-scale AI foundation models that will provide a competitive edge for Europe.

Together with 3 other technology SMEs, Lingua Custodia will share a total prize of €1 million and 8 million hours of access to two of Europe’s world-leading supercomputers, LUMI and LEONARDO. The challenge was highly competitive, attracting 94 proposals.

Lingua Custodia’s AI foundation models

Lingua Custodia’s winning proposal focused on developing a series of AI foundation models with 3 major objectives, drawing on the company’s existing skills and recognised expertise in AI:

  • Build very cost effective, fast and efficient models to run on smaller servers and democratize the technology while reducing energy consumption
  • Ensure the models can handle multilingual queries and make them available to non-English speakers
  • Tune the models for the retrieval of information (RAG) to enhance the usage of generative AI for multilingual knowledge management.
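The third objective, retrieval of information (RAG), can be sketched in a few lines of Python. This is a deliberately naive illustration: it scores documents by word overlap with the query and builds a prompt from the best match, whereas production RAG systems use embedding vectors and a vector store. The example documents and function names here are invented for illustration.

```python
import re

# Minimal sketch of the retrieval step in RAG: find the document that
# best matches the query, then prepend it as context for an LLM prompt.
documents = [
    "Lingua Custodia specialises in NLP applied to Finance.",
    "LUMI and LEONARDO are two of Europe's supercomputers.",
    "RAG grounds a model's answers in retrieved documents.",
]

def tokens(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs):
    """Return the document sharing the most words with the query."""
    q = tokens(query)
    return max(docs, key=lambda d: len(q & tokens(d)))

def build_prompt(query, docs):
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

print(build_prompt("What is RAG?", documents))
```

The point of the retrieval step is that the model’s answer is grounded in the retrieved context rather than in its training data alone, which is what makes RAG useful for multilingual knowledge management.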

Lingua Custodia’s focus on cost and energy efficient AI foundation models


Olivier Debeugny, CEO of Lingua Custodia, declared to Thierry Breton: “Lingua Custodia is an AI company that has raised a modest amount of capital since its launch. This has been a catalyst for our creativity and resourcefulness, and we therefore have the skills to optimize everything we develop. This is why we have been working on the design of multilingual, extremely cost and energy efficient models to be applied to an AI use case with a high Return on Investment.”

About Lingua Custodia

Lingua Custodia is a Fintech company and a leader in Natural Language Processing (NLP) for Finance. It was created in 2011 by finance professionals, initially to offer specialised machine translation.

Leveraging its state-of-the-art NLP expertise, the company now offers a growing range of applications: Speech-to-Text automation, linguistic data extraction from unstructured documents, etc., and achieves superior quality thanks to highly domain-focused machine learning algorithms.

Its cutting-edge technology has been regularly rewarded and recognised by both the industry and clients: Investment houses, global investment banks, private banks, financial divisions within major corporations and service providers for financial institutions.