Measuring the International Appeal of Novels
Marijn Koolen
eHumanities Group, Amsterdam, 15/01/2015
Overview
- Introduction: Beyond the book
- Approach
- Experiments
- Tool criticism
Beyond the Book
- Investigate globalisation in literature
- NL eScience project, digital humanities cluster
- Team:
- Karina van Dalen-Oskam (Huygens ING, UvA)
- Floor Buschenhenke (Huygens ING)
- Carlos Martinez-Ortiz (NL eScience Center)
- Marijn Koolen (UvA)
Research Question
- Are there textual aspects of a novel that contribute to its international appeal?
- no doubt many factors contribute: fame, trends, economy, ...
- we focus on the text
Measuring International Appeal
- Two parts:
- Establish international interest in novels
- Identify potential textual aspects
Establish International Interest
- Which novels are internationally appealing?
- those that get translated into different languages
- those that get sold in many countries
- talk to translators, publishers
Identify Textual Aspects
- Many aspects possible
- We start with named entities and topics
1. International Interest
Book Markets
- International book market
- How does the translation market work?
- Germany is gateway for Dutch novels
- English translation is the epitome
Translations Dutch Import
Translations Dutch Export
Operationalising International Interest
- Proxy for interest:
- number of languages translated into
- number of non-Dutch editions
- number of English language translations
- Translation data from WorldCat
- seems more extensive than Index Translationum (UNESCO)
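The three proxies above can be computed from a flat list of edition records. The following is a minimal sketch, not the project's actual pipeline: the record layout `(work_id, language_code)`, the function name `interest_proxies`, and the sample data are all assumptions made for illustration, and real WorldCat records would first need cleaning and deduplication.

```python
# Hypothetical edition records: (work_id, language_code). Illustrative only,
# not real WorldCat data.
editions = [
    ("novel-a", "nl"),
    ("novel-a", "de"),
    ("novel-a", "de"),
    ("novel-a", "en"),
    ("novel-b", "nl"),
]


def interest_proxies(editions, source_language="nl"):
    """Return, per work, the three proxies for international interest."""
    raw = {}
    for work_id, lang in editions:
        entry = raw.setdefault(
            work_id, {"languages": set(), "foreign": 0, "english": 0}
        )
        if lang == source_language:
            continue  # editions in the original language do not count
        entry["languages"].add(lang)
        entry["foreign"] += 1
        if lang == "en":
            entry["english"] += 1
    return {
        work_id: {
            "n_languages": len(entry["languages"]),
            "n_foreign_editions": entry["foreign"],
            "n_english_editions": entry["english"],
        }
        for work_id, entry in raw.items()
    }


proxies = interest_proxies(editions)
for work_id, values in proxies.items():
    print(work_id, values)
```

Keeping the three counts separate makes it possible to check later whether they point in the same direction or disagree.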
Other Signals of International Appeal
- Popularity and appreciation
- number of people who read the novel
- number of people who liked the novel
- GoodReads provides popularity of works and editions
- work.editions API
- number of ratings and avg. rating per edition
- Next on our list
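Once per-edition rating counts and averages are available, they could be combined into work-level signals as sketched below. This is only an assumption about how the aggregation might be done: the tuple layout `(n_ratings, avg_rating)` and the function name `aggregate_ratings` are hypothetical, and no GoodReads API call is shown.

```python
def aggregate_ratings(editions):
    """Combine per-edition (n_ratings, avg_rating) pairs into work-level values.

    Returns (total_ratings, weighted_average); the average is None when
    no edition has any ratings.
    """
    total = sum(n for n, _ in editions)
    if total == 0:
        return 0, None
    # Weight each edition's average by its number of ratings.
    weighted = sum(n * avg for n, avg in editions) / total
    return total, weighted


# Hypothetical figures for two editions of one work.
total, average = aggregate_ratings([(100, 4.0), (300, 3.0)])
print(total, average)
```

The total serves as a popularity proxy (how many people read the novel), the weighted average as an appreciation proxy (how much they liked it).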
Interviews
- Conducted interviews with publishers, editors and translators on what makes a text suitable for foreign markets
- diverse ideas: style, genre, themes, topics
- trends are important
Topical Hurdles
- Several mentioned topicality:
- rugby and ice hockey don’t sell in NL
- e.g. ice hockey references in Mordecai Richler’s work
- would need lots of explaining
- similarly, German crime thrillers set in Schwarzwald
Regional Interest
- How can we measure regional interest in topics?
- e.g. what topics do Canadians care about?
- or Dutch, German, etc.
- wild idea: look at Wikipedia
- who views or edits Wiki articles?
- Again, there are other aspects
- topicality is first choice
Relative Country Contributions
- Which countries contribute edits to a Wiki article?
- relative to their overall contributions
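One way to make "relative to their overall contributions" concrete is a ratio of shares: the country's share of edits to the article, divided by the country's share of all edits. The sketch below assumes that definition; the slides do not specify the exact normalisation, and the function name and edit counts are invented for illustration.

```python
def relative_contribution(article_edits, overall_edits, country):
    """Ratio of a country's share in one article to its overall share.

    Values above 1 suggest the country edits this article more than
    its general editing activity would predict.
    """
    article_share = article_edits.get(country, 0) / sum(article_edits.values())
    overall_share = overall_edits[country] / sum(overall_edits.values())
    return article_share / overall_share


# Hypothetical edit counts per country (not real Wikipedia data).
article_edits = {"NL": 30, "CA": 10, "US": 60}
overall_edits = {"NL": 1000, "CA": 2000, "US": 7000}

nl_ratio = relative_contribution(article_edits, overall_edits, "NL")
ca_ratio = relative_contribution(article_edits, overall_edits, "CA")
print(round(nl_ratio, 2), round(ca_ratio, 2))
```

In this toy example the Netherlands is over-represented in the article (ratio near 3) and Canada under-represented (ratio near 0.5).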
Wikipedia Country Contributions
Dutch and Canadian Contributions
Data
- Book collection:
- 492 Dutch novels, published between 1933 and 2008
- 318 translated to other languages, 27 to English
- Annotating text features
- named entity detection (from NameScape project)
- entities linked to Wikipedia articles
Other Textual Aspects
- Other textual aspects that could contribute:
- idea: use LIWC to characterise novels
- other ideas? syntactic analysis, topic modelling, ...
Entity Statistics
Linking Entities to Wikipedia
Linking Confidence
Confidence & Accuracy
- Is confidence related to accuracy?
- eye-balling suggests it is, but needs proper experimental test
- sometimes hard to judge
- stability across genres, authors?
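A proper test of the confidence-accuracy relation could start by binning manually judged links by confidence and computing accuracy per bin. The sketch below assumes confidence scores in the range 0 to 1 and a hand-labelled sample of `(confidence, is_correct)` pairs; both the scale and the sample are assumptions.

```python
def accuracy_by_confidence(judged_links, n_bins=4):
    """Group judged links into equal-width confidence bins.

    judged_links: iterable of (confidence in [0, 1], is_correct).
    Returns a list of (bin_start, bin_end, accuracy or None if empty).
    """
    counts = [[0, 0] for _ in range(n_bins)]  # [correct, total] per bin
    for confidence, is_correct in judged_links:
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        counts[index][0] += int(is_correct)
        counts[index][1] += 1
    return [
        (i / n_bins, (i + 1) / n_bins, correct / total if total else None)
        for i, (correct, total) in enumerate(counts)
    ]


# Hypothetical manual judgements.
sample = [(0.1, False), (0.2, True), (0.6, True),
          (0.7, False), (0.9, True), (0.95, True)]
bins = accuracy_by_confidence(sample, n_bins=2)
print(bins)
```

Running the same binning per genre or per author would address the stability question raised above.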
Linked Entities
- Person
- high ambiguity, referring to fictional characters
- Locations
- low ambiguity, referring to external places
- Organisations
- Misc
- low ambiguity, specific entities (e.g. Bible, WWII)
Internationality & Entity Type
Correlations
Miscellaneous Entities?
- What kinds of entities are these?
- events, objects, buildings, phenomena, languages
- cultural references
- countries (misclassified)
Source Criticism
- Number of miscellaneous entities correlates weakly with
- Total number of translated editions
- number of English & German editions
- not with number of French editions
- Two explanations:
- French market is not a signal
- WorldCat lacks info on French editions
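The slides do not say which correlation coefficient was used. As an illustration, the sketch below computes a Spearman rank correlation in plain Python, which is a common choice for skewed count data such as edition counts; the choice of coefficient and the per-novel numbers are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-novel counts.
misc_entities = [3, 10, 7, 1, 12]
translated_editions = [0, 4, 5, 1, 9]
print(round(spearman(misc_entities, translated_editions), 3))
```

Computing the coefficient separately per target language (English, German, French) is what exposes the kind of discrepancy noted for French editions.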
Wikipedia Country Contributions
- Dutch vs. non-Dutch contributions
- are topics edited more by Dutch people typically Dutch topics?
- are novels with international topics more internationally appealing?
- or are novels with typically Dutch entities more internationally appealing?
Dutch Contributions to Entities
- How robust are observed patterns against variations in accuracy?
- random errors only a problem for small data sets
- systematic errors can be fatal for any data set
- What happens to those patterns when we use different tools?
- or when we ‘improve’ tools
Comparing Observed Patterns
- Use set of manually annotated named entities
- compare patterns created by different tools
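A first step in such a comparison is scoring each tool against the manual annotations. The sketch below uses precision and recall over `(entity, type)` pairs; the tool names, the entity sets, and the exact-match criterion are assumptions for illustration only.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted entities against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical manual annotations and outputs of two hypothetical tools.
gold = {("Amsterdam", "LOC"), ("Bijbel", "MISC"), ("Anna", "PER")}
tool_a = {("Amsterdam", "LOC"), ("Anna", "PER"), ("Rijn", "LOC")}
tool_b = {("Amsterdam", "LOC"), ("Bijbel", "MISC")}

scores = {
    "tool_a": precision_recall(tool_a, gold),
    "tool_b": precision_recall(tool_b, gold),
}
for name, (precision, recall) in scores.items():
    print(name, round(precision, 2), round(recall, 2))
```

Two tools with similar scores can still make different systematic errors, so the downstream patterns (for example, correlations with edition counts) should be recomputed per tool rather than inferred from these scores alone.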
Conclusions
- Translators and publishers say cultural interest plays a role
- our analysis supports this
- Cultural references seem to be a signal of international interest
- non-Dutch readers prefer to read Dutch novels that are typically Dutch?
- ‘write about what you know’
- Signals are weak, decision based on many signals
Future Work
- More qualitative analysis of features
- Analyse other features:
- textual, e.g. emotional tone, stylistics
- appeal, e.g. popularity