1. This article presents two algorithms for imitation learning from policy data that has been corrupted by temporally correlated noise in expert actions.
2. The algorithms, DoubIL and ResiduIL, use modern variants of the instrumental variable regression technique to recover the underlying policy without requiring access to an interactive expert.
3. Both algorithms compare favorably to behavioral cloning on simulated control tasks.
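To make the instrumental variable idea in the points above concrete, here is a minimal sketch on a scalar linear system. It is not DoubIL or ResiduIL. It is a classical ratio-form IV estimator, included only to show why temporally correlated action noise biases behavioral cloning and how an instrument can remove that bias. The dynamics, the moving-average noise model, the use of the previous state as the instrument, and all names (`simulate`, `behavioral_cloning`, `iv_estimate`, `theta`, `a`, `b`) are assumptions made for illustration.

```python
import numpy as np


def simulate(theta=-0.5, a=1.0, b=1.0, n=200_000, seed=0):
    """Roll out a hypothetical expert act_t = theta * s_t + u_t on the
    assumed scalar dynamics s_{t+1} = a * s_t + b * act_t."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n + 1)
    # Moving-average noise: u_t and u_{t-1} share one shock, so the noise
    # is temporally correlated across adjacent steps (and only those).
    u = e[1:] + e[:-1]
    s = np.zeros(n)
    act = np.zeros(n)
    for t in range(n):
        act[t] = theta * s[t] + u[t]
        if t + 1 < n:
            s[t + 1] = a * s[t] + b * act[t]
    return s, act


def behavioral_cloning(s, act):
    """Least-squares regression of actions on states (no intercept).
    Biased here because s_t depends on u_{t-1}, which correlates with u_t."""
    return float(s @ act / (s @ s))


def iv_estimate(s, act):
    """Ratio-form IV estimate using the previous state as the instrument.
    s_{t-1} predicts s_t but, under the assumed noise model, is
    independent of u_t."""
    z, x, y = s[:-1], s[1:], act[1:]
    return float(z @ y / (z @ x))


if __name__ == "__main__":
    states, actions = simulate()
    print("true theta        :", -0.5)
    print("behavioral cloning:", behavioral_cloning(states, actions))
    print("IV (past state)   :", iv_estimate(states, actions))
```

In this toy setting the behavioral cloning estimate should settle away from the true `theta`, while the IV estimate should approach it as the number of samples grows. The previous state is a valid instrument only because the assumed noise is uncorrelated beyond one step. Noise with longer memory would need an instrument taken from further in the past.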
The article appears generally trustworthy and reliable. It describes in detail the two algorithms developed for imitation learning from policy data that has been corrupted by temporally correlated noise in expert actions, and the authors support their claims with simulations and comparisons against behavioral cloning on simulated control tasks. The article does not appear to be biased or one-sided: it presents both sides of the argument equally, makes no unsupported claims, omits no evident points of consideration, and contains no promotional elements. Possible risks are noted throughout, making it clear that further research is needed before these algorithms can be applied in real-world settings.