Towards a theory of balancing exploration and exploitation in probabilistic environments

In Proceedings of the Sixth International Conference on Cognitive Modeling
Nellen, S., & Lovett, M. C.
Pittsburgh, PA

Learning to make good choices in a probabilistic environment requires that the Decision Maker resolves the tension between exploration (learning about all available options) and exploitation (consistently choosing the best option in order to maximize rewards). We present a mathematical learning model that makes selections in a repeated-choice probabilistic task based on the expected payoff associated with each option and the information gain that will result from choosing that option. This model can be used to analyze the relative impact of exploration and exploitation over time and under different conditions. It predicts the aggregated and individual learning trajectories of participants in various versions of the task sufficiently well to support our basic argument: Information gain is a valid and rational criterion underlying human decision making. Future modeling work will be addressing the exact nature of the interaction between exploration and exploitation.
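
The abstract does not give the model's equations, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes a two-outcome (reward / no reward) task, a Beta belief over each option's payoff probability, the expected reduction in belief variance as a stand-in for information gain, and a softmax choice rule. The names `expected_info_gain`, `choice_probabilities`, `simulate`, and the parameters `weight` and `temperature` are all hypothetical.

```python
import math
import random


def beta_mean(a, b):
    """Expected payoff probability under a Beta(a, b) belief."""
    return a / (a + b)


def beta_var(a, b):
    """Variance of a Beta(a, b) belief (a measure of remaining uncertainty)."""
    n = a + b
    return a * b / (n * n * (n + 1))


def expected_info_gain(a, b):
    """Assumed proxy for information gain: the expected drop in belief
    variance after observing one more outcome from this option."""
    p = beta_mean(a, b)
    expected_post_var = p * beta_var(a + 1, b) + (1 - p) * beta_var(a, b + 1)
    return beta_var(a, b) - expected_post_var


def choice_probabilities(counts, weight, temperature=0.05):
    """Softmax over (expected payoff + weight * information gain).
    `weight` sets how strongly exploration counts; both it and
    `temperature` are illustrative assumptions."""
    scores = [beta_mean(a, b) + weight * expected_info_gain(a, b)
              for a, b in counts]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(true_probs, trials=200, weight=5.0, seed=0):
    """Run a repeated-choice task and return the choices and final counts."""
    rng = random.Random(seed)
    counts = [[1, 1] for _ in true_probs]  # uniform Beta(1, 1) priors
    choices = []
    for _ in range(trials):
        probs = choice_probabilities(counts, weight)
        k = rng.choices(range(len(true_probs)), weights=probs)[0]
        rewarded = rng.random() < true_probs[k]
        counts[k][0 if rewarded else 1] += 1  # update successes or failures
        choices.append(k)
    return choices, counts


if __name__ == "__main__":
    choices, counts = simulate([0.3, 0.7])
    print("final counts per option:", counts)
```

In this sketch the information-gain term shrinks as an option accumulates observations, so early choices are driven more by exploration and later choices more by expected payoff. Setting `weight` to 0 gives a purely exploitative chooser for comparison.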
