Abstract:
In the dynamic entertainment industry, predicting a movie's opening box office revenue is critical for filmmakers and studios. To address this challenge, we present a novel Cross-modal Transformer and Hierarchical Fusion Neural Network (CHFNN) model that predicts movie box office earnings from multimodal features extracted from movie trailers, posters, and reviews. The Cross-modal Transformer component of CHFNN captures intricate inter-modal relationships by fusing the extracted features across modalities. It uses self-attention mechanisms to dynamically weigh the importance of each modality's information, so the model learns to focus on the most relevant information from trailers, posters, and reviews and adapts to the unique characteristics of each movie. The Hierarchical Fusion Neural Network within CHFNN then refines the fused features to better capture the inherent hierarchical structure of multimodal data. By combining the cross-modal features hierarchically, the model learns both global and local interactions, which enhances its predictive capacity. We evaluate CHFNN on a comprehensive Internet Movie Dataset built from metadata for 50,186 movies released between the 1990s and 2022, including their trailers, posters, and review data. Our results show that CHFNN outperforms existing models in prediction accuracy, achieving 95.80%. CHFNN provides state-of-the-art predictive power and offers interpretability through its attention mechanisms, giving insight into the factors that contribute to a movie's box office success.
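The abstract describes attention that weighs trailer, poster, and review features against one another, followed by a fusion step combining local and global information. The sketch below shows that general idea in plain Python. It is not the authors' CHFNN implementation, which the abstract does not specify. It assumes one precomputed feature vector per modality and identity query/key/value projections, whereas a trained model would learn these projections. The two-level "local then global" pooling and the function names `cross_modal_attention` and `hierarchical_fuse` are hypothetical simplifications.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(tokens: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product self-attention across modality tokens.

    Assumption: identity Q/K/V projections (a real model learns them).
    Returns the attended tokens and the row-stochastic attention matrix.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = [softmax([dot(q, k) / scale for k in tokens]) for q in tokens]
    attended = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return attended, weights


def hierarchical_fuse(features: Dict[str, Vector]) -> Tuple[Vector, Dict[str, float]]:
    """Hypothetical two-level fusion.

    Level 1 (local): each modality token is re-expressed via cross-modal attention.
    Level 2 (global): the attended tokens are averaged into one fused vector.
    Also returns the mean attention each modality receives, a rough
    interpretability signal that sums to 1 across modalities.
    """
    names = list(features)
    tokens = [features[n] for n in names]
    attended, weights = cross_modal_attention(tokens)
    d = len(tokens[0])
    fused = [sum(t[j] for t in attended) / len(attended) for j in range(d)]
    received = {
        n: sum(row[i] for row in weights) / len(weights)
        for i, n in enumerate(names)
    }
    return fused, received


if __name__ == "__main__":
    # Toy feature vectors; real features would come from trained encoders.
    feats = {
        "trailer": [1.0, 0.0, 0.5],
        "poster": [0.2, 0.1, 0.0],
        "review": [0.9, 0.3, 0.4],
    }
    fused, importance = hierarchical_fuse(feats)
    print(fused)
    print(importance)
```

In a full model the fused vector would feed a regression head that predicts revenue. The per-modality attention totals illustrate how attention weights can support the kind of interpretability the abstract mentions.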
Source:
JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY
ISSN: 1798-2340
Year: 2024
Issue: 7
Volume: 15
Page: 822-837