Posts by Collection

portfolio

Incentivizing both Grounding and Reasoning in Large Language Models with Online Reinforcement Learning

Fine-tuned LLaMA-based LLM agents with online reinforcement learning (PPO) in a text-based, multi-step environment (BabyAI-Text), and investigated the impact of encouraging “reasoning-before-action”. In this simple setting, reasoning-before-action did not improve sample efficiency but did provide interpretability advantages. We also observed an interesting “reasoning collapse” phenomenon.

Stock Portfolio Optimization with Deep Reinforcement Learning

Trained a deep Q-learning agent for stock portfolio optimization and benchmarked it against mean-variance and Sharpe-ratio baselines. The agent achieved comparable or superior returns in test scenarios.

publications

Paper Title Number 1

Published in Journal 1, 2009

This paper is about the number 1. The number 2 is left for future work.

Recommended citation: Your Name, You. (2009). "Paper Title Number 1." Journal 1. 1(1).
Download Paper | Download Slides

Paper Title Number 2

Published in Journal 1, 2010

This paper is about the number 2. The number 3 is left for future work.

Recommended citation: Your Name, You. (2010). "Paper Title Number 2." Journal 1. 1(2).
Download Paper | Download Slides

Paper Title Number 3

Published in Journal 1, 2015

This paper is about the number 3. The number 4 is left for future work.

Recommended citation: Your Name, You. (2015). "Paper Title Number 3." Journal 1. 1(3).
Download Paper | Download Slides

Paper Title Number 4

Published in GitHub Journal of Bugs, 2024

This paper is about fixing template issue #693.

Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3).
Download Paper

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.

Pavanpreet Singh Gandhi