Top Important LLM Papers for the Week from 12/02 to 18/02
Stay Updated with Recent Large Language Models Research
Large language models (LLMs) have advanced rapidly in recent years. As new generations of models are developed, researchers and engineers need to stay informed about the latest progress. This article summarizes some of the most important LLM papers published during the third week of February 2024.
The papers cover various topics shaping the next generation of language models, from model optimization and scaling to reasoning, benchmarking, and enhancing performance. Keeping up with novel LLM research across these domains will help guide continued progress toward models that are more capable, robust, and aligned with human values.
Table of Contents:
LLM Progress & Benchmarking
LLM Reasoning
LLM Training & Inference
Transformers & Attention-Based Models
My E-book: Data Science Portfolio for Success Is Out!
I recently published my first e-book, Data Science Portfolio for Success, a practical guide to building your data science portfolio. The book covers the following topics:
The Importance of Having a Portfolio as a Data Scientist
How to Build a Data Science Portfolio That Will Land You a Job
1. LLM Progress & Benchmarking
1.1. Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
1.2. DeAL: Decoding-time Alignment for Large Language Models
1.3. Mixtures of Experts Unlock Parameter Scaling for Deep RL
1.4. Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
1.5. LiRank: Industrial Large-Scale Ranking Models at LinkedIn
1.6. AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts
1.7. BASE TTS: Lessons from Building a Billion-Parameter Text-to-Speech Model on 100K Hours of Data
1.8. OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
1.9. Lumos: Empowering Multimodal LLMs with Scene Text Recognition
1.10. A Human-Inspired Reading Agent with a Gist Memory of Very Long Contexts
1.11. Graph Mamba: Towards Learning on Graphs with State Space Models
1.12. MPIrigen: MPI Code Generation through Domain-Specific Language Models
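Several of the papers above build on mixture-of-experts (MoE) layers, where a learned router sends each token to a small subset of expert networks. As background for readers new to the idea, here is a minimal NumPy sketch of top-k expert routing; it is an illustrative toy, not the architecture from any paper listed here, and all names and sizes are made up:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    """Toy mixture-of-experts layer: a router scores every expert per
    token, and only the top-k experts are evaluated and mixed."""

    def __init__(self, d_model, n_experts, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router: maps a token vector to one logit per expert.
        self.router = rng.normal(0, 0.02, (d_model, n_experts))
        # Each expert is just a linear map in this sketch.
        self.experts = [rng.normal(0, 0.02, (d_model, d_model))
                        for _ in range(n_experts)]

    def __call__(self, x):
        # x: (tokens, d_model)
        gates = softmax(x @ self.router)              # (tokens, n_experts)
        top = np.argsort(-gates, axis=-1)[:, :self.top_k]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            g = gates[t, top[t]]
            g = g / g.sum()                           # renormalize over chosen experts
            for weight, e in zip(g, top[t]):
                out[t] += weight * (x[t] @ self.experts[e])
        return out

layer = MoELayer(d_model=16, n_experts=4)
y = layer(np.ones((3, 16)))
print(y.shape)  # → (3, 16)
```

The appeal, and the reason MoE keeps appearing in scaling work, is that total parameter count grows with the number of experts while per-token compute only grows with top_k.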
2. LLM Reasoning
2.1. InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
2.2. Premise Order Matters in Reasoning with Large Language Models
3. LLM Training & Inference
3.1. Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
3.2. Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss
3.3. Data Engineering for Scaling Language Models to 128K Context
3.4. Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
3.5. Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
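One of the entries above concerns post-training quantization, which compresses a trained model by storing weights at lower precision without any retraining. As a point of reference, here is a minimal sketch of the simplest variant, symmetric per-tensor int8 quantization; it is a toy illustration of the general idea, not the method from the paper:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: choose a scale so the
    largest |weight| maps to 127, then round and clip to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, (64, 64)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding bounds the per-weight error by half a quantization step.
err = np.abs(w - w_hat).max()
print(q.dtype, err <= 0.5 * scale)  # → int8 True
```

Storing q plus one float scale cuts memory to roughly a quarter of float32; the research challenge at hyper-scale is keeping accuracy when outlier weights and activations stretch that single scale too thin.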
4. Transformers & Attention-Based Models
4.1. Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
4.2. Transformers Can Achieve Length Generalization But Not Robustly
Are you looking to start a career in data science and AI but don't know how? I offer data science mentoring sessions and long-term career mentoring:
Mentoring sessions: https://lnkd.in/dXeg3KPW
Long-term mentoring: https://lnkd.in/dtdUYBrM