Data acquisition- Film metadata and grosses were retrieved via the OMDb API, which provided consistent coverage of titles, release years, and core attributes. To fill gaps and capture box-office dynamics at the weekly level, additional records were scraped from public box-office tables using a Python/Playwright pipeline. Parsing routines converted raw HTML into structured tables with release dates, weekly grosses, and theater counts. This dual approach ensured full coverage of ~4,000 U.S. first-run films while standardizing the variables needed for stress tagging, classification, and modeling.
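A minimal sketch of the scrape-and-parse step is below; the URL, table position, and column layout are placeholders, not the actual sources used.

```python
# Illustrative Playwright scrape of a public box-office table.
# The URL and table layout here are hypothetical placeholders.
from io import StringIO

import pandas as pd
from playwright.sync_api import sync_playwright

def fetch_weekly_grosses(url: str) -> pd.DataFrame:
    """Render a box-office page and parse its weekly table."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    # Convert the first HTML table into a structured frame with
    # release-date, weekly-gross, and theater-count columns.
    df = pd.read_html(StringIO(html))[0]
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df
```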
Tone classification- Primary and secondary tones were assigned by a GPT-based classifier built with the OpenAI API, guided by the nine-tone taxonomy. The classifier did not rely on external plot or marketing fields; instead, it drew on GPT’s internal film knowledge and semantic reasoning. For well-known titles, GPT classified directly from title context. For obscure or ambiguous films, the system incorporated fallback logic.
Multiple GPT model passes acted as peer checks to reduce drift, and a human QC process spot-checked 5–10% of titles to achieve 90–95% accuracy. This workflow ensured consistency across 4,000+ first-run releases, yielding both primary and secondary tone assignments.
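A single classifier pass might look like the sketch below, assuming the current OpenAI chat-completions client; the model name, prompt wording, and taxonomy placeholder are illustrative, not the exact configuration used.

```python
# One GPT tone-classification pass; running the same title through a
# second model string and comparing answers implements the peer check.
from openai import OpenAI

client = OpenAI()
NINE_TONES = ["..."]  # placeholder for the project's nine-tone taxonomy

def classify_tone(title: str, year: int, model: str = "gpt-4o") -> str:
    """Ask GPT for primary and secondary tones from title context alone."""
    prompt = (
        f"Assign a primary and a secondary tone to the film '{title}' "
        f"({year}) using only these labels: {', '.join(NINE_TONES)}. "
        "Answer as 'primary | secondary'."
    )
    resp = client.chat.completions.create(
        model=model,  # assumed model name; swap per pass for peer checks
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output reduces drift between passes
    )
    return resp.choices[0].message.content.strip()
```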
Stress flagging- Each film was tagged with two indicators of economic strain at the time of release. First, NBER recession periods, pulled from the FRED database, provided a binary measure of contraction. Second, the Conference Board’s Consumer Sentiment Index (CSI) supplied a monthly stress score. To make interpretation comparable across decades, CSI was binned into three bands: High Stress (CSI < 80), Elevated (80–99), and Normal (≥ 100). These flags were merged into a single stress_level field, so every release could be analyzed in relation to both formal recessions and audience sentiment at the moment of opening.
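The banding and merge logic is simple enough to sketch; column names and the precedence rule (a formal recession overriding the CSI band) are assumptions for illustration.

```python
# Stress tagging: CSI bands plus the NBER recession flag.
import pandas as pd

def csi_band(csi: float) -> str:
    """Bin a monthly CSI reading into the three stress bands."""
    if csi < 80:
        return "High Stress"
    if csi < 100:      # 80-99
        return "Elevated"
    return "Normal"    # >= 100

def tag_stress(df: pd.DataFrame) -> pd.DataFrame:
    """Merge the recession flag and CSI band into one stress_level field."""
    df["csi_band"] = df["csi"].apply(csi_band)
    # Assumed precedence: an NBER recession overrides the sentiment band.
    df["stress_level"] = df["csi_band"].where(
        ~df["in_nber_recession"], "Recession"
    )
    return df
```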
Data cleaning- Initial classifier outputs and box-office data were consolidated with Power Query into a master file. Titles were deduped, non-theatrical re-releases and IMAX-only events were excluded, and fields were standardized for modeling.
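The consolidation itself was done in Power Query; a pandas equivalent of the same steps, with assumed column names and exclusion flags, would read:

```python
# Pandas rendering of the Power Query cleaning steps (illustrative).
import pandas as pd

def build_master(tones: pd.DataFrame, boxoffice: pd.DataFrame) -> pd.DataFrame:
    master = tones.merge(boxoffice, on=["title", "year"], how="left")
    master = master.drop_duplicates(subset=["title", "year"])
    # Drop non-theatrical re-releases and IMAX-only events.
    master = master[~master["is_rerelease"] & ~master["is_imax_only"]]
    # Standardize field types for modeling.
    master["release_date"] = pd.to_datetime(master["release_date"])
    return master
```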
Statistical modeling- To test how tones performed under different economic conditions, we estimated linear models of five-week real gross. Predictors included opening theater count and its interaction with stress level, primary emotional tone × stress interactions, and a control for sentiment at release. These models confirmed both the diminishing returns of scale under stress (≈ –$4.7K per theater, p = 0.02) and the significance of tone–stress effects (p < 0.05 for several interactions). Outputs included distributional summaries, supply-bias matrices, volatility measures, and coefficients, each documented in the R Appendix.
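The full estimation lives in the R Appendix; expressed in Python's statsmodels formula syntax, an equivalent specification (with assumed column names) would be:

```python
# Linear model of five-week real gross with scale x stress and
# tone x stress interactions; column names are assumptions.
import statsmodels.formula.api as smf

formula = (
    "real_gross_5wk ~ opening_theaters * stress_level"  # scale x stress
    " + primary_tone * stress_level"                    # tone x stress
    " + csi_at_release"                                 # sentiment control
)
model = smf.ols(formula, data=master).fit()
print(model.summary())  # coefficients, p-values, fit diagnostics
```

In the formula interface, `a * b` expands to both main effects and their interaction, which is what the theaters × stress and tone × stress terms require.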
Composite payoff- In addition to modeled gross uplift, a mean_rating_effect metric was constructed by converting IMDb, Rotten Tomatoes, and Metascore uplifts into dollar-equivalent terms and averaging them with the regression results. This allowed emotional payoff and financial performance to be compared in the same unit.
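A minimal sketch of that arithmetic follows; the dollar-per-point conversion factors are placeholders, not the project's calibrated values.

```python
# Composite payoff: rating uplifts in dollar terms, averaged with the
# modeled gross uplift. Conversion factors below are illustrative only.
from statistics import mean

def mean_rating_effect(imdb_up: float, rt_up: float, meta_up: float,
                       usd_per_imdb=1.0, usd_per_rt=1.0,
                       usd_per_meta=1.0) -> float:
    """Convert each rating uplift to dollars, then average the three."""
    return mean([imdb_up * usd_per_imdb,
                 rt_up * usd_per_rt,
                 meta_up * usd_per_meta])

def composite_payoff(rating_effect_usd: float,
                     modeled_uplift_usd: float) -> float:
    """Average emotional payoff and modeled financial uplift in dollars."""
    return mean([rating_effect_usd, modeled_uplift_usd])
```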
Presentation- Outputs were exported as 10 CSVs and visualized in Tableau dashboards; supporting assets were prototyped in Figma and embedded in this site via Tilda for narrative presentation.