Top LLM Papers for the Week from 26/02 to 03/03
Stay Updated with Recent Large Language Models Research
Large language models (LLMs) have advanced rapidly in recent years. As new generations of models are developed, researchers and engineers need to stay informed on the latest progress. This article summarizes some of the most important LLM papers published during the week of February 26 to March 3, 2024.
The papers cover various topics shaping the next generation of language models, from model optimization and scaling to reasoning, benchmarking, and enhancing performance. Keeping up with novel LLM research across these domains will help guide continued progress toward models that are more capable, robust, and aligned with human values.
Table of Contents:
LLM Progress & Benchmarking
LLM Reasoning
LLM Training, Evaluation & Inference
LLM Fine-Tuning
Transformers & Attention-Based Models
My E-book: Data Science Portfolio for Success Is Out!
I recently published my first e-book, Data Science Portfolio for Success, a practical guide to building your data science portfolio. The book covers the following topics: the importance of having a portfolio as a data scientist, and how to build a data science portfolio that will land you a job.
1. LLM Progress & Benchmarking
Beyond Language Models: Byte Models are Digital World Simulators
Orca-Math: Unlocking the Potential of SLMs in Grade School Math
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
2. LLM Reasoning
3. LLM Training, Evaluation & Inference
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Evaluating Very Long-Term Conversational Memory of LLM Agents
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
4. LLM Fine-Tuning
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
5. Transformers & Attention-Based Models
Are you looking to start a career in data science and AI but don't know how? I offer data science mentoring sessions and long-term career mentoring:
Mentoring sessions: https://lnkd.in/dXeg3KPW
Long-term mentoring: https://lnkd.in/dtdUYBrM