#reinforcement learning

📈 Newsletters Towards Data Science 11 min read

Surviving High Uncertainty in Logistics with MARL

Part 2. Building scale-invariant agents that seamlessly change contexts.

#multi-agent reinforcement learning #logistics optimization #scheduling

🕐 a day ago

📈 Newsletters Towards Data Science 8 min read

Playing Connect Four with Deep Q-Learning

Solving multiplayer games with function approximation.

#reinforcement learning #deep q-learning #connect four

🕐 2 days ago

🍎 AI Labs Apple ML Research 2 min read

PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents using outcome-only rewards suffers from credit-assignment ambiguity,…

#multi-agent reasoning #tool-use #reinforcement learning

🕐 3 days ago

☁️ AI Labs AWS Machine Learning Blog 15 min read

Reinforcement fine-tuning with LLM-as-a-judge

In this post, we take a deeper look at how RLAIF, or RL with LLM-as-a-judge, works effectively with Amazon Nova models.

#reinforcement learning #llm fine-tuning #reward models

🕐 6 days ago

💹 News AI Business 2 min read

Record $1.1B Seed Funding for Reinforcement Learning Startup

The vendor’s goal is to achieve superintelligence.

#reinforcement learning #ai funding #startup

🕐 8 days ago

🚀 News TechCrunch AI 3 min read

DeepMind’s David Silver just raised $1.1B to build an AI that learns without human data

Ineffable Intelligence, a British AI lab founded only a few months ago by former DeepMind researcher David Silver, has raised $1.1 billion in funding at a valuation of $5.1 billion.

#reinforcement learning #deepmind #ai funding

🕐 9 days ago

📈 Newsletters Towards Data Science 8 min read

Introduction to Approximate Solution Methods for Reinforcement Learning

Learn about function approximation and the different choices for approximation functions.

#machine learning #artificial intelligence #programming

🕐 12 days ago

🐻 Research Berkeley AI Research 14 min read

Gradient-based Planning for World Models at Longer Horizons

GRASP is a new gradient-based planner for learned dynamics (a “world model”) that makes long-horizon planning practical by (1) lifting the trajectory into virtual states so optimization is parallel across…

#world models #gradient-based planning #reinforcement learning

🕐 16 days ago

🧐 Safety LessWrong 1 min read

You can’t imitation-learn how to continual-learn

In this post, I’m trying to put forward a narrow, pedagogical point, one that comes up mainly when I’m arguing in favor of LLMs having limitations that human learning does…

#continual learning #imitation learning #reinforcement learning

🕐 a month ago

🤗 AI Labs Hugging Face Blog 45 min read

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

#reinforcement learning #distributed training #open source

🕐 a month ago

🐻 Research Berkeley AI Research 9 min read

RL without TD learning

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference…

#reinforcement learning #off-policy learning #temporal difference learning

🕐 6 months ago

🐻 Research Berkeley AI Research 9 min read

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to…

#reinforcement learning #autonomous vehicles #traffic optimization

🕐 1 year, 1 month ago

📐 Research The Gradient 24 min read

Mamba Explained

Is Attention all you need? Mamba, a novel AI model based on State Space Models (SSMs), emerges as a formidable alternative to the widely used Transformer models, addressing their inefficiency…

#deep learning #reinforcement learning #overviews

🕐 2 years ago
