The widespread adoption of effective hybrid closed loop systems would benefit people living with type 1 diabetes by increasing the time spent within the target blood glucose range. Hybrid closed loop systems (also known as an 'artificial pancreas') typically utilise simple control algorithms to select the best insulin dose for maintaining blood glucose levels within a healthy range. Online reinforcement learning has been utilised as a method for further enhancing glucose control in these devices. Previous online approaches have been shown to reduce patient risk and improve time spent in the target range when compared to classical control algorithms. However, they are prone to instability in the learning process, which often results in the selection of unsafe actions. This study, published in the Journal of Biomedical Informatics, evaluates offline reinforcement learning as a means of developing effective dosing policies without the need for potentially dangerous patient interaction during training.
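Time in range is the outcome these systems are judged on, so it helps to be concrete about how it is calculated. The sketch below assumes evenly spaced continuous glucose monitor readings in mg/dL. It also assumes the commonly used consensus target of 70 to 180 mg/dL, a range the abstract above does not itself state. The function name `time_in_range` is illustrative only.

```python
from typing import Sequence


def time_in_range(glucose_mg_dl: Sequence[float],
                  low: float = 70.0,
                  high: float = 180.0) -> float:
    """Percentage of readings inside [low, high] mg/dL (bounds inclusive).

    Assumes readings are evenly spaced in time (e.g. one CGM sample every
    few minutes), so the fraction of readings approximates the fraction of time.
    The default 70-180 mg/dL band is an assumed consensus target.
    """
    if len(glucose_mg_dl) == 0:
        raise ValueError("at least one glucose reading is required")
    # Count readings that fall within the target band.
    in_range = sum(1 for g in glucose_mg_dl if low <= g <= high)
    return 100.0 * in_range / len(glucose_mg_dl)


readings = [65, 110, 150, 190, 175]
print(time_in_range(readings))  # 60.0
```

Here three of the five readings (110, 150 and 175 mg/dL) fall inside the band, giving 60% time in range.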