Welcome to the February edition of the Token. The year has started strong for AI 🚀 OpenAI announced o3 at the end of last year and Operator at the start of this one, while the US announced the $500B Stargate project. Then, out of nowhere - for those not paying close attention, at least - DeepSeek surprised everyone with their latest R1 model 🔥
In this issue, we also cover a case study on an AI training program we ran for a US-based company 🎓 as well as our thoughts on the recent DeepSeek model 🐳
📰 Bites from AI news
OpenAI introduced Operator, an agent that can operate a browser the way a human would to complete tasks. This is still an early version, but it signposts the direction of travel for this year, which will be all about agents and assistants 🤖 https://openai.com/index/introducing-operator/
The US announced the equivalent of the Manhattan Project for AI. Project Stargate is a $500B investment in AI infrastructure to support the next generation of AI developments 💰 https://www.bbc.com/news/articles/cy4m84d2xz2o
AutoML now comes to RAG 🔥 Steps like parsing, chunking, selecting a vector database and creating evaluation data, among others, can now be optimised on your behalf. https://github.com/Marker-Inc-Korea/AutoRAG
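For a feel of what that optimisation looks like under the hood, here is a minimal, self-contained sketch - our own illustration, not AutoRAG's actual API. The toy corpus, evaluation pairs and word-overlap retriever are all made up; the point is the sweep: try retrieval settings such as chunk size and top-k against a small evaluation set and keep the configuration with the best hit rate.

```python
from itertools import product

# Toy corpus and evaluation set: each question should retrieve its matching doc.
DOCS = {
    "policies.md": "Our refund policy is simple: refunds are issued within 14 days.",
    "security.md": "You can rotate an API key from the dashboard settings page.",
}
EVAL_SET = [
    ("What is our refund policy?", "policies.md"),
    ("How do I rotate an API key?", "security.md"),
]

# The kind of search space an AutoML-for-RAG tool sweeps on your behalf.
SEARCH_SPACE = {"chunk_size": [16, 32, 64], "top_k": [1, 2]}

def chunk(text, size):
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(question, chunk_size, top_k):
    """Toy retriever: rank chunks by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = []
    for doc_id, text in DOCS.items():
        for piece in chunk(text, chunk_size):
            overlap = len(q_words & set(piece.lower().split()))
            scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

def hit_rate(chunk_size, top_k):
    """Fraction of eval questions whose relevant doc is retrieved."""
    hits = sum(doc in retrieve(q, chunk_size, top_k) for q, doc in EVAL_SET)
    return hits / len(EVAL_SET)

# Grid search: evaluate every configuration, keep the best-scoring one.
best = max(product(*SEARCH_SPACE.values()), key=lambda cfg: hit_rate(*cfg))
print("Best (chunk_size, top_k):", best, "- hit rate:", hit_rate(*best))
```

A tool like AutoRAG automates this kind of sweep across many more knobs - parsers, chunkers, vector databases and so on - and can also generate the evaluation data for you.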
Anthropic shares useful patterns for building agentic applications. It also suggests a useful definition of agents: their execution path is dynamic rather than fixed, in contrast to workflows where multiple LLMs with separate roles are chained in a predetermined order. https://www.anthropic.com/research/building-effective-agents
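To illustrate the distinction, here is a small sketch. The call_llm helper is a stand-in we made up for any provider's API: the workflow below always follows the same path, while the agent decides its next step at runtime.

```python
# Stand-in for a real LLM API call; a real implementation would call your
# provider's chat endpoint and return the model's reply.
def call_llm(prompt: str) -> str:
    return "FINISH: gathered enough information" if "next action" in prompt else "stub reply"

# Workflow: the execution path is fixed in code. Several LLM calls with
# separate roles run in the same order every time.
def summarise_workflow(document: str) -> str:
    outline = call_llm(f"Outline this document:\n{document}")
    draft = call_llm(f"Write a summary following this outline:\n{outline}")
    return call_llm(f"Polish this summary:\n{draft}")

# Agent: the model itself chooses the next action on every iteration, so the
# execution path is dynamic and not known ahead of time.
def research_agent(task: str, tools: dict, max_steps: int = 5) -> str:
    history = f"Task: {task}"
    for _ in range(max_steps):
        decision = call_llm(f"{history}\nPick the next action ({', '.join(tools)} or FINISH):")
        action, _, arg = decision.partition(":")
        if action.strip() == "FINISH":
            return arg.strip()
        history += f"\n{decision} -> {tools[action.strip()](arg.strip())}"
    return history

print(summarise_workflow("Q4 revenue grew 12%, driven by cloud demand."))
print(research_agent("Find this week's on-call engineer", {"search": lambda q: "stub result"}))
```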
DeepSeek updated their frontier model, V3, which seems to surpass GPT-4o, Claude 3.5 Sonnet and Llama 3.1 405B while also being open source 🔥 Of course, the biggest news was the release of DeepSeek R1, a model that not only seems to match o1 in reasoning ability but is also open source 😮 It costs 100x less to use 💸 and comes from China 🇨🇳 https://arxiv.org/abs/2501.12948
💼 Empowering Software Teams with AI Skills
We designed a customised 10-week AI training program for a US-based client to upskill their data-focused team, including data engineers and administrators. The program combined weekly lectures on foundational and advanced AI concepts with hands-on workshops, enabling participants to apply their learning in real-world scenarios. Covering key topics such as data preprocessing, model training, evaluation, large language models, and deployment, the course was tailored to align with the client’s business objectives. By the end of the program, participants reported a significant improvement in their AI development skills, leading the company to initiate an internal hackathon to explore AI-driven innovations.
🔗 Read the entire case study here https://mantisnlp.com/work/ai-upskilling/
💡 What happened with DeepSeek? 🐳
In summary, DeepSeek, a Chinese 🇨🇳 company, released their R1 model. It matches o1, the best currently available model from OpenAI, before Google, Anthropic or Meta reached that milestone - and only a month after o1 was released. The three main reasons for the attention this has received are:
1️⃣ the model is open source 🔥 which challenges the lead that OpenAI and other proprietary providers had
2️⃣ the model is 100x cheaper to use 💥 than o1, which seems to come from combining the latest efficiency developments in the area: multi-token prediction, 8-bit training and inference, and a mixture-of-experts architecture with more experts but fewer active parameters (the V3 base model has 671B parameters in total but activates only ~37B per token)
3️⃣ the reasoning part of the model, which is what differentiates o1 from GPT-4o, was not trained on human-annotated data, but primarily in a self-reinforcing training loop that rewards the model for getting answers right. Correctness is easy to verify automatically in reasoning problems like maths and code (see the sketch after this list). Making this process work is a breakthrough 🚀
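Here is that sketch - a toy, schematic version of the idea, not DeepSeek's actual training code. Answers are sampled, checked automatically against a known result, and only the correct traces are kept to reinforce the model, with no human labels in the loop. The arithmetic problems and the near-miss sampling "policy" are made up for illustration.

```python
import random

random.seed(0)

# Toy "policy": a stand-in for the model sampling an answer to a problem.
# A real model would generate a chain of thought plus a final answer.
def sample_answer(problem: dict) -> int:
    return problem["answer"] + random.choice([-1, 0, 0, 1])

# Verifiable reward: for maths (or code, via unit tests) correctness can be
# checked automatically - no human annotation needed.
def reward(problem: dict, answer: int) -> float:
    return 1.0 if answer == problem["answer"] else 0.0

problems = [
    {"question": "17 + 26", "answer": 43},
    {"question": "9 * 8", "answer": 72},
]

# Schematic self-reinforcing loop: sample several answers per problem, score
# them, and keep the correct traces. A real run would update the model's
# weights towards the high-reward samples (the R1 paper uses GRPO for this).
for step in range(3):
    batch = [(p, sample_answer(p)) for p in problems for _ in range(4)]
    kept = [(p, a) for p, a in batch if reward(p, a) == 1.0]
    print(f"step {step}: kept {len(kept)}/{len(batch)} correct traces for training")
```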
Put all of those things together and you have the best available model right now being open source, 100x cheaper to use, and coming from a Chinese company 🇨🇳
Interested to see how your organisation can optimise its processes by integrating AI? Get in touch for a free consultancy call with one of our AI experts https://mantisnlp.com/contact/#cta
Have you tried AutoRAG yourself? We'd love to hear about your experience with it.