Shie Mannor

143 papers · 2003–2025 · 15 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (40) 🏃 Academic Marathon (22) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (15) 🐣 Hot Topic Early Bird 🌈 Renaissance Researcher (7) 🐝 Cross-Pollinator (13) 🏠 Conference Loyalist (41) 🌟 Keyword Trendsetter Combo (5) 🏆 Keyword Champion (3) 👑 Triple Crown 🌱 Topic Pioneer 🔬 Deep Specialist (18) 🤝 Dynamic Duo (19) 🏆 Grand Slam 🗃️ Keyword Collector (210) ❓ The Questioner (2) 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (18) ⚡ Prolific Year (10) 💎 Century Club (143)

Conferences

ICML (47) NIPS (41) COLT (13) AAAI (11) JMLR (10) ICLR (7) UAI (4) AISTATS (2) CVPR (2) ACML (1) ALT (1) CORL (1) IJCAI (1) RSS (1) WACV (1)

Top co-authors

Yonathan Efroni (19) Gal Dalal (15) Huan Xu (13) Gal Chechik (12) Constantine Caramanis (11) Guy Tennenholtz (10) Nadav Merlis (9) Assaf Hallak (8) Jeongyeol Kwon (8) Aviv Tamar (7)

Research topics

Keywords

reinforcement learning (32) online learning (25) regret bound (21) multi-armed bandit (13) markov decision process (12) policy gradient (11) robust optimization (9) regret minimization (8) stochastic optimization (8) sample complexity (6) contextual bandit (6) policy optimization (6) value function (6) model-based reinforcement learning (6) policy iteration (5) game theory (5) deep reinforcement learning (5) temporal difference learning (5) thompson sampling (5) robust markov decision process (5)

Papers

On Bits and Bandits: Quantifying the Regret-Information Trade-off ICLR 2025

Policy Gradient with Tree Expansion ICML 2025

RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression CVPR 2025

A Classification View on Meta Learning Bandits ICML 2025

Reinforcement Learning with Segment Feedback ICML 2025

Global Convergence of Policy Gradient in Average Reward MDPs ICLR 2025

Efficient Value Iteration for s-rectangular Robust Markov Decision Processes ICML 2024

Sobolev Space Regularised Pre Density Models ICML 2024

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization ICML 2024

Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel ICML 2024

Improving Token-Based World Models with Parallel Observation Prediction ICML 2024

Solving Non-rectangular Reward-Robust MDPs via Frequency Regularization AAAI 2024

Tree Search-Based Policy Optimization under Stochastic Execution Delay ICLR 2024

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation NIPS 2024

Prospective Side Information for Latent MDPs ICML 2024

Train Hard, Fight Easy: Robust Meta Reinforcement Learning NIPS 2023

Learning Hidden Markov Models When the Locations of Missing Observations are Unknown ICML 2023

Planning and Learning with Adaptive Lookahead AAAI 2023

PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient ICML 2023

Learning to Initiate and Reason in Event-Driven Cascading Processes ICML 2023

Reward-Mixing MDPs with Few Latent Contexts are Learnable ICML 2023

Representation-Driven Reinforcement Learning ICML 2023

Optimization or Architecture: How to Hack Kalman Filtering NIPS 2023

Individualized Dosing Dynamics via Neural Eigen Decomposition NIPS 2023

Policy Gradient for Rectangular Robust Markov Decision Processes NIPS 2023

DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles CORL 2022

Analysis of Stochastic Processes through Replay Buffers ICML 2022

Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning AAAI 2022

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning ICLR 2022

Online Apprenticeship Learning AAAI 2022

Reinforcement Learning for Datacenter Congestion Control AAAI 2022

Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning NIPS 2022

Tractable Optimality in Episodic Latent MABs NIPS 2022

Finite Sample Analysis Of Dynamic Regression Parameter Learning NIPS 2022

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms ICML 2022

Optimizing Tensor Network Contraction Using Reinforcement Learning ICML 2022

The Geometry of Robust Value Functions ICML 2022

Actor-Critic based Improper Reinforcement Learning ICML 2022

Efficient Risk-Averse Reinforcement Learning NIPS 2022

Reinforcement Learning with a Terminator NIPS 2022

Bandits with partially observable confounded data UAI 2021

Action redundancy in reinforcement learning UAI 2021

Robust Value Iteration for Continuous Control Tasks RSS 2021

Reinforcement Learning in Reward-Mixing MDPs NIPS 2021

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction NIPS 2021

Sim and Real: Better Together NIPS 2021

Twice regularized MDPs and the equivalence between robustness and regularization NIPS 2021

RL for Latent MDPs: Regret Guarantees and a Lower Bound NIPS 2021

Reinforcement Learning with Trajectory Feedback AAAI 2021

Lenient Regret for Multi-Armed Bandits AAAI 2021

Online Limited Memory Neural-Linear Bandits with Likelihood Matching ICML 2021

Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks ICML 2021

Over-the-Air Adversarial Flickering Attacks Against Video Recognition Networks CVPR 2021

Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning ICLR 2021

Acting in Delayed Environments with Non-Stationary Markov Policies ICLR 2021

Value Iteration in Continuous Actions, States and Time ICML 2021

Detecting Rewards Deterioration in Episodic Reinforcement Learning ICML 2021

Confidence-Budget Matching for Sequential Budgeted Learning ICML 2021

Known unknowns: Learning novel concepts using reasoning-by-elimination UAI 2021

Tight Lower Bounds for Combinatorial Multi-Armed Bandits COLT 2020

An adaptive stochastic optimization algorithm for resource allocation ALT 2020

Online Planning with Lookahead Policies NIPS 2020

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs AAAI 2020

Optimistic Policy Optimization with Bandit Feedback ICML 2020

Topic Modeling via Full Dependence Mixtures ICML 2020

Off-Policy Evaluation in Partially Observable Environments AAAI 2020

Scalable Detection of Offensive and Non-compliant Content / Logo in Product Images WACV 2020

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies NIPS 2019

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem COLT 2019

Action Robust Reinforcement Learning and Applications in Continuous Control ICML 2019

Reward Constrained Policy Optimization ICLR 2019

The Natural Language of Actions ICML 2019

Exploration Conscious Reinforcement Learning Revisited ICML 2019

A Bayesian Approach to Robust Reinforcement Learning UAI 2019

Nonlinear Distributional Gradient Temporal-Difference Learning ICML 2019

On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters AAAI 2019

How to Combine Tree-Search Methods in Reinforcement Learning AAAI 2019

Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning NIPS 2019

Distributional Policy Optimization: An Alternative Approach for Continuous Control NIPS 2019

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning NIPS 2018

Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning NIPS 2018

Beyond the One-Step Greedy Approach in Reinforcement Learning ICML 2018

A General Approach to Multi-Armed Bandits Under Risk Criteria COLT 2018

Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning COLT 2018

Multi-objective Bandits: Optimizing the Generalized Gini Index ICML 2017

Rotting Bandits NIPS 2017

End-to-End Differentiable Adversarial Imitation Learning ICML 2017

Approximate Value Iteration with Temporally Extended Actions (Extended Abstract) IJCAI 2017

Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization COLT 2017

Shallow Updates for Deep Reinforcement Learning NIPS 2017

Consistent On-Line Off-Policy Evaluation ICML 2017

Adaptive Skills Adaptive Partitions (ASAP) NIPS 2016

Heteroscedastic Sequences: Beyond Gaussianity ICML 2016

Graying the black box: Understanding DQNs ICML 2016

Hierarchical Decision Making In Electricity Grid Management ICML 2016

Learning the Variance of the Reward-To-Go JMLR 2016

Regularized Policy Iteration with Nonparametric Function Spaces JMLR 2016

Policy Gradient for Coherent Risk Measures NIPS 2015

Community Detection via Measure Space Embedding NIPS 2015

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach NIPS 2015

Dynamic Sensing: Better Classification under Acquisition Constraints ICML 2015

Off-policy Model-based Learning under Unknown Factored Dynamics ICML 2015

Thompson Sampling for Learning Parameterized Markov Decision Processes COLT 2015

Sensor Selection for Crowdsensing Dynamical Systems AISTATS 2015

Online Learning for Adversaries with Memory: Price of Past Mistakes NIPS 2015

Set-Valued Approachability and Online Learning with Partial Monitoring JMLR 2014

Robust Logistic Regression and Classification NIPS 2014

Time-Regularized Interrupting Options (TRIO) ICML 2014

"How hard is my MDP?" The distribution-norm to the rescue NIPS 2014

Concept Drift Detection Through Resampling ICML 2014

Scaling Up Robust MDPs using Function Approximation ICML 2014

Approachability in unknown games: Online learning meets multi-objective optimization COLT 2014

Latent Bandits ICML 2014

Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations ICML 2014

Thompson Sampling for Complex Online Problems ICML 2014

Reinforcement Learning in Robust Markov Decision Processes NIPS 2013

Robust Sparse Regression under Adversarial Corruption ICML 2013

Temporal Difference Methods for the Variance of the Reward To Go ICML 2013

Approachability, fast and slow COLT 2013

Online Learning for Time Series Prediction COLT 2013

Opportunistic Strategies for Generalized No-Regret Problems COLT 2013

Learning Multiple Models via Regularized Weighting NIPS 2013

Online PCA for Contaminated Data NIPS 2013

The Perturbed Variation NIPS 2012

More Is Better: Large Scale Partially-supervised Sentiment Classification ACML 2012

Statistical Optimization in High Dimensions AISTATS 2012

The Sample Complexity of Dictionary Learning JMLR 2011

Does an Efficient Calibrated Forecasting Strategy Exist? COLT 2011

The Sample Complexity of Dictionary Learning COLT 2011

Robust approachability and regret minimization in games with partial monitoring COLT 2011

From Bandits to Experts: On the Value of Side-Observations NIPS 2011

Committing Bandits NIPS 2011

Distributionally Robust Markov Decision Processes NIPS 2010

Online Classification with Specificity Constraints NIPS 2010

Robustness and Regularization of Support Vector Machines JMLR 2009

Online Learning with Sample Path Constraints JMLR 2009

Regularized Policy Iteration NIPS 2008

Robust Regression and Lasso NIPS 2008

The Robustness-Performance Tradeoff in Markov Decision Processes NIPS 2006

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems JMLR 2006

A Geometric Approach to Multi-Criterion Reinforcement Learning JMLR 2004

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem JMLR 2004

Greedy Algorithms for Classification -- Consistency, Convergence Rates, and Adaptivity JMLR 2003