Amazon GenAI, GPTQ, Automatic prompt optimisation and more
Welcome to the ninth edition of our newsletter The Token! In this edition we take a brief look at how Amazon has integrated generative AI into its product reviews, the integration of GPTQ quantisation into the transformers library, as well as our new blog post series on automatic prompt engineering 🔥
As ever, let us know what you think, and if you find yourself in need of help with an NLP problem, get in touch at hi@mantisnlp.com.
🖱 Amazon GenAI
Amazon experiments with generative AI to help you navigate product reviews ✨
Product reviews are a great use case for Large Language Models (LLMs), since it can be hard to create useful categories for them a priori in order to train a model to recognise them. Additionally, the most interesting themes are often new ones, for example users noticing a recent drop in quality.
Here are the individual components you would need to replicate this:
🧠 Extract themes from reviews, traditionally in an unsupervised way, now using an LLM
👍 Sentiment analysis of reviews, traditionally using a sentiment classifier; lately you can use an LLM to break reviews down into sentiments for a particular theme or aspect
📄 Summarise all reviews or the reviews for a particular theme
Additionally, if you have all this information over time, you can also start to pick up anomalies, i.e. cases where a certain theme has lately been receiving more negative or positive reviews, or where a new theme is emerging. The sketch below shows how these pieces could fit together.
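Here is a minimal sketch of the three components, assuming a generic `complete()` wrapper around whichever LLM you use; the prompts are illustrative, not Amazon's actual system.

```python
import json

def complete(prompt: str) -> str:
    """Hypothetical wrapper around your LLM of choice."""
    raise NotImplementedError

def extract_themes(reviews: list[str]) -> list[str]:
    # Theme extraction: let the LLM surface themes instead of fixing categories a priori
    prompt = (
        "List the recurring themes in these product reviews "
        "as a JSON array of short strings:\n\n" + "\n".join(reviews)
    )
    return json.loads(complete(prompt))

def theme_sentiment(review: str, theme: str) -> str:
    # Aspect-based sentiment: judge the review with respect to a single theme
    prompt = (
        f"What is the sentiment of this review regarding '{theme}'? "
        f"Answer with positive, negative or neutral.\n\nReview: {review}"
    )
    return complete(prompt).strip().lower()

def summarise(reviews: list[str], theme: str) -> str:
    # Summarisation: condense only the reviews that touch on a given theme
    prompt = f"Summarise what these reviews say about '{theme}':\n\n" + "\n".join(reviews)
    return complete(prompt)
```

Tracking the theme and sentiment counts these functions produce over time gives you the time series needed to spot the anomalies mentioned above.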
🔗 Read more here https://techcrunch.com/2023/08/14/amazon-taps-generative-ai-to-enhance-product-reviews/
🛠 HuggingFace AutoGPTQ integration
AutoGPTQ is now integrated into transformers 🔥 This means you can load models quantised with GPTQ in one line. You can also more easily quantise a model of your choice with GPTQ using optimum, or fine tune a quantised model using PEFT 🚀
This drastically reduces the memory required to load a Large Language Model, meaning you can load quite large models and even fine tune them using only a single GPU. This makes it much easier to experiment with and deploy these models. It also reduces inference costs 💸
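A minimal sketch of both directions, following the Hugging Face blog post; the checkpoints are example model ids from the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Load an already-quantised GPTQ model from the Hub in one line
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ", device_map="auto"
)

# ...or quantise a model of your choice: GPTQ needs a calibration dataset
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
quantization_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
quantized_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", device_map="auto", quantization_config=quantization_config
)
```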
🔗 Read more https://huggingface.co/blog/gptq-integration
🖱 McKinsey Lilli
McKinsey is experimenting with an internal generative AI powered assistant 🔥
McKinsey announced they are testing an internal AI assistant called Lilli, powered by a Large Language Model. The assistant appears to be a Retrieval Augmented Generation (RAG) system over their knowledge base of case studies, client interviews and white papers. Lilli seems most valuable in the initial steps of a project, where it is important to retrieve the most relevant knowledge from prior projects and identify experts to consult.
We believe internal AI assistants will proliferate inside companies to help them navigate their internal information better, in the same way Google made it easier for all of us to navigate the world wide web. We believe these AI assistants will be RAG based, plugging into internal data and customised to deliver the most relevant information for each company.
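Lilli's internals are not public, but a RAG assistant broadly boils down to retrieve-then-generate. A minimal sketch using sentence-transformers for retrieval; the documents and the `llm()` call are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

def llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM of choice."""
    raise NotImplementedError

# Internal knowledge base: case studies, interviews, white papers (placeholder data)
documents = [
    "Case study: pricing strategy for a retail client...",
    "Interview notes: lessons from a cloud migration project...",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)

def answer(question: str, top_k: int = 3) -> str:
    # 1. Retrieve: embed the question and find the most similar documents
    query_embedding = encoder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=top_k)[0]
    context = "\n".join(documents[hit["corpus_id"]] for hit in hits)
    # 2. Generate: ground the LLM's answer in the retrieved context
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```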
🔗 Read more here https://www.mckinsey.com/about-us/new-at-mckinsey-blog/meet-lilli-our-generative-ai-tool
🧪 Nougat
The typical pipeline for extracting information from document images, like PDFs, is to perform Optical Character Recognition (OCR) and then use a model, like LayoutLM, to extract the information you are interested in. This approach comes with two main drawbacks:
1. Errors propagate from the OCR step
2. Post-processing the extracted information into a structured output introduces further errors
Instead, you can train an end-to-end transformer model that receives a document image as input and outputs structured information. This gives you better control over those errors and has been shown to perform better. You can also easily create synthetic data to train such a model, and unlike LLMs, these models tend to be quite small and easy to train 🚀
A good example of the benefits of this approach is Nougat, recent work from Meta that aims to better capture mathematical expressions in academic papers. Math formulas are a good example of where the OCR-based pipeline fails ❌ because even when the different parts of a formula are captured correctly, it is not so easy to reconstruct the formula afterwards.
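Here is a sketch of end-to-end inference, assuming the facebook/nougat-base checkpoint and its transformers integration; the page image path is a placeholder.

```python
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

# End-to-end: a page image goes in, markup (formulas included) comes out
processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

image = Image.open("paper_page.png").convert("RGB")  # placeholder page image
pixel_values = processor(image, return_tensors="pt").pixel_values

outputs = model.generate(pixel_values, max_new_tokens=1024)
markup = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(processor.post_process_generation(markup, fix_markdown=True))
```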
🔗 Read more in the Donut paper that introduced the approach https://arxiv.org/pdf/2111.15664.pdf and the recent Nougat paper from Meta https://arxiv.org/pdf/2308.13418v1.pdf
✍️ Automatic prompt engineering
We wrote a series about automatic prompt engineering using a Retrieval Augmented Generation (RAG) use case. We discuss how you can use an LLM to optimise your prompt according to your custom metrics ✨
Part I defines the RAG use case and the main concepts of prompt engineering, while Part II introduces the flow for automatic prompt engineering as well as some useful metrics. Part III is a hands-on tutorial implementing the RAG part, and Part IV is a hands-on tutorial on automatic evaluation with an LLM.
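At its core, the loop looks something like the sketch below: an LLM proposes prompt variants, each variant is scored against your metric, and the best one is kept. The function names and the 1-5 grading scheme are illustrative, not the exact code from the series.

```python
def llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM of choice."""
    raise NotImplementedError

def score(prompt: str, eval_set: list[dict]) -> float:
    # LLM-as-judge evaluation: grade each answer produced with this prompt
    grades = []
    for example in eval_set:
        answer = llm(prompt.format(question=example["question"]))
        grade = llm(
            f"Grade this answer from 1 to 5.\nQuestion: {example['question']}\n"
            f"Answer: {answer}\nGrade:"
        )
        grades.append(float(grade.strip()))
    return sum(grades) / len(grades)

def optimise_prompt(seed_prompt: str, eval_set: list[dict], n_rounds: int = 5) -> str:
    # Hill-climb over prompts: keep a candidate only if it scores higher
    best_prompt, best_score = seed_prompt, score(seed_prompt, eval_set)
    for _ in range(n_rounds):
        candidate = llm(f"Improve this prompt for a RAG system:\n{best_prompt}")
        candidate_score = score(candidate, eval_set)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt
```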
We hope to release more code and concepts around this area, which is receiving a lot of attention lately. Stay tuned 🚀