Overview
We propose to investigate how the sentiment and activity volume of social media, financial news, and search engines affect stock volatility. We use state-of-the-art sentiment analysis tools to process a large amount of heterogeneous data gathered from Twitter, StockTwits, Reddit, Google Finance, Nexis, Google Trends, and Wikipedia. Our research questions are (i) whether sentiment and internet activity variables provide predictive power when used jointly with a large set of other economic predictors, (ii) what the nature of this potential impact is, and (iii) how this information can be used to generate improved volatility predictions. Our proposed approach is based on Bayesian regression models with spike-and-slab priors. These priors make it possible to handle high-dimensional data and to perform Bayesian model averaging, which can improve predictions relative to relying on a single model. We will investigate several model extensions that account for phenomena such as regime switches, non-linearities, and interactions among predictors. In addition, we plan to apply a Bayesian regression model that jointly uses low-frequency and high-frequency return data to generate more precise volatility estimates and, consequently, more precise predictions. In summary, the goal of the project is to contribute to the development of the next generation of volatility prediction models, which combine sentiment signals with advanced statistical methods.
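
To make the spike-and-slab idea concrete, the following is a minimal sketch and not the project's actual model. It assumes a point-mass spike at zero, a Zellner g-prior slab with g equal to the sample size, independent Bernoulli inclusion priors, and a predictor set small enough to enumerate every model exactly. The function name spike_slab_bma, its parameters, and the simulated data are all hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def spike_slab_bma(X, y, g=None, prior_incl=0.5):
    """Exact Bayesian model averaging over all 2^p predictor subsets.

    Assumptions (illustrative only): point-mass spike at zero,
    Zellner g-prior slab, Bernoulli(prior_incl) inclusion indicators.
    Returns posterior inclusion probabilities and model-averaged slopes.
    """
    n, p = X.shape
    g = float(n) if g is None else float(g)   # unit-information choice g = n
    Xc = X - X.mean(axis=0)                   # centering absorbs the intercept
    yc = y - y.mean()
    tss = yc @ yc                             # total sum of squares

    gammas, log_post, coefs = [], [], []
    for gamma in itertools.product([0, 1], repeat=p):
        idx = np.flatnonzero(gamma)           # predictors in the slab
        k = idx.size
        beta = np.zeros(p)                    # excluded coefficients stay at 0
        r2 = 0.0
        if k > 0:
            b_ols, *_ = np.linalg.lstsq(Xc[:, idx], yc, rcond=None)
            resid = yc - Xc[:, idx] @ b_ols
            r2 = 1.0 - (resid @ resid) / tss
            beta[idx] = g / (1.0 + g) * b_ols  # g-prior shrinkage of OLS
        # Log Bayes factor of this model against the intercept-only model.
        log_bf = (0.5 * (n - 1 - k) * np.log1p(g)
                  - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))
        log_prior = k * np.log(prior_incl) + (p - k) * np.log1p(-prior_incl)
        gammas.append(gamma)
        log_post.append(log_bf + log_prior)
        coefs.append(beta)

    log_post = np.array(log_post)
    w = np.exp(log_post - log_post.max())     # stabilised normalisation
    w /= w.sum()                              # posterior model probabilities
    pip = w @ np.array(gammas)                # posterior inclusion probabilities
    beta_bma = w @ np.array(coefs)            # model-averaged coefficients
    return pip, beta_bma

# Simulated stand-in data: column 0 plays the role of a sentiment score,
# column 1 an activity volume, columns 2-3 are irrelevant predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)  # proxy for log volatility

pip, beta_bma = spike_slab_bma(X, y)
print("inclusion probabilities:", np.round(pip, 3))
print("model-averaged slopes:  ", np.round(beta_bma, 3))
```

Full enumeration is only feasible for a handful of predictors. With the large predictor sets described above, the model space would have to be explored by sampling (for example with MCMC) rather than visited exhaustively.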