Papers
Toward Minimax Off-policy Value Estimation
AISTATS 2015
Regularized Off-Policy TD-Learning
NIPS 2012
The Fixed Points of Off-Policy TD
NIPS 2011