Top Important LLM Papers for the Week from 08/01 to 14/01
Stay Updated with Recent Large Language Models Research
Large language models (LLMs) have advanced rapidly in recent years. As new generations of models are developed, researchers and engineers need to stay informed on the latest progress. This article summarizes some of the most important LLM papers published during the second week of January 2024.
The papers cover various topics shaping the next generation of language models, from model optimization and scaling to reasoning, benchmarking, and enhancing performance. Keeping up with novel LLM research across these domains will help guide continued progress toward models that are more capable, robust, and aligned with human values.
Table of Contents:
LLM Progress & Benchmarking
LLM Fine Tuning
LLM Reasoning
LLM Training & Evaluation
Transformers & Attention Based Models
LLM Ethics and Trustworthiness
My E-book: Data Science Portfolio for Success Is Out!
I recently published my first e-book, Data Science Portfolio for Success, a practical guide to building your data science portfolio. The book covers the following topics:
The Importance of Having a Portfolio as a Data Scientist
How to Build a Data Science Portfolio That Will Land You a Job
1. LLM Progress & Benchmarking
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
DocGraphLM: Documental Graph Language Model for Information Extraction
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
2. LLM Fine Tuning
3. LLM Reasoning
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
The Impact of Reasoning Step Length on Large Language Models
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
4. LLM Training & Evaluation
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
5. Transformers & Attention Based Models
Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Score Distillation Sampling with Learned Manifold Corrective
6. LLM Ethics and Trustworthiness
TrustLLM: Trustworthiness in Large Language Models
Are you looking to start a career in data science and AI but do not know how? I offer data science mentoring sessions and long-term career mentoring:
Mentoring sessions: https://lnkd.in/dXeg3KPW
Long-term mentoring: https://lnkd.in/dtdUYBrM