Historical Network Research
How to Analyse Networks
HNR Workshop 2015
UvA, Amsterdam, 19/06/2015
Marijn Koolen
Overview
- Background
- Networks & Structural Properties
- Discussion
Constitutions & References
Analysing Networks
- How do we compare two (or more) networks?
- i.e. how can we establish (meaningful) similarities/differences?
- How can we explain differences/similarities?
- How can we understand affect of differences/similarities?
- Characters in novels
- interactions as networks
- what does network show? (interpreting)
- what counts as interaction? (modeling)
Completeness and Contingency
- Selection of data determines structure of network
- e.g. correspondence data: what fraction is available?
- how well does sample represent actual network
2. Networks & Structural Properties
Types of Analysis
- Visual vs. quantitative analysis
- visualisations show clusters
- quantitative analysis shows different aspects
- better for macro-analysis?
- connecting analyses at different levels
Network Characteristics
- A conceptual model: meaning from structure
- Degrees: number of connections of a node
- Walks: traversal of network edges
- Clusters: communities, cliques, trust and familiarity
- Components: connectedness, flow
- Centrality: influence
Degree Structure
- How many connections?
- in-degree: incoming edges
- out-degree: outgoing edges
- in == out in undirected networks
- What causes difference in degrees?
- and what are consequences?
Wikipedia Link Structure Evolution
Walks, Trails and Paths
- Indirect connections:
- Walk: alternating sequence of nodes and edges, starting and ending with nodes
- Trail: a walk where are all edges are distinct
- Path: a walk where all nodes and edges are distinct
Distance and Diameter
- Distance: number of edge traversals
- Why are Wikipedia paths so short?
Hypothesis Testing
- You can link only to web sites you know
- Sites you tend to know and link to are:
- local sites (geographical or topical)
- famous sites
- What aspects of global link structure would you expect to find?
- e.g. how do above aspects affect network structure?
Clustering & Small Worlds
- Links to local sites:
- clustering: extent to which neighbours of me are neighbours of each other
- Links to famous sites:
- connect distant parts of the network
- Consequence:
Wikipedia Linking Guideliens
- Link to articles that are relevant this article
- e.g. they are related to this article
- therefore probably to each other
- neighbours of me are neighbours of each other
- Wikipedia has high clustering
- and many links to generic articles (high-degree, central)
- Consequence: very small world
Toy Example
- 100 nodes in a circle, each connected to its immediate neighbours
- 100 edges, all nodes have degree 2
- Diameter: 50
- Avg. path length: 25
- Strategically add 6 long-distance edges :
- diameter = 25, avg. path length = 11
- Consequence: distant nodes become much closer
Actor-Actor Network
- Compare networks of actors
- Source criticism!
Hypothesis About Blockbusters (1/2)
- Assumptions:
- blockbuster films have some famous actors
- plus many less known, ‘cheaper’ actors?
- most less known actors have starred in few blockbusters, worked with few famous actors
- famous actors have starred in many blockbusters, worked with many other famous actors
- assortative mating
Hypothesis About Blockbusters (2/2)
- Famous actors tie the whole network together
- short distance between most famous actors
- therefore, short distance between non-famous actors
- expectation: actor network is a small-world
Technical vs. Social Networks
- Newman (2006):
- social networks tend to show assortative mixing
- technical networks disassortative mixing
Clusters and Bridges
- Bridge is node that brings two clusters together
- Community detection
- remove bridges, disconnect subcomponents
- densely linked clusters are more stable
- Algorithms can detect bridges and communities
- Clique: maximally connected subcomponent
Components
- Component: set of connected nodes
- Weakly connected (WCC): no edge direction
- Strongly connected (SCC): use edge direction, all nodes reach all others
Components in Wikipedia & Web
Centrality
- Centrality: importance or influence of a node
- Different definitions:
- degree
- betweenness
- closeness
- eigenvector (PageRank)
- alpha, harmonic, ...
Betweenness Centrality
- Betweenness centrality:
- number of shortest paths between any two nodes via a particular node
- hubs are typical central nodes
- geographical: traffic (least resistance)
- communication: influence in information flow
Generative Mechanisms
- What mechanisms generate a network
- preferential attachment
- geographical constraints
- accessibility
- Mechanisms can be modelled
- to generate networks to test hypotheses
- a ‘simple’ model can generate web-like hyperlink network
Random vs. Scale-Free
- Two famous models for networks
- Random network:
- two nodes have prob. p of being connected
- nodes roughly same degree (normal distribution)
- Scale-free network:
- probability of connection based on degrees (preferential attachment)
- few have high degree, many have low degree
- degree distribution follows power-law
Weighted Edges
- Orbis models travel cost (money and time) for different modes of transport across Roman world
- terrain type, edge length and travel modes represent different costs
- e.g. the walk edge between A and B costs more time than the ride edge.
- path distance is calculated as sum of weighted edges
From Global to Local
- Local network is subset of global network
- Local network of Wikipedia search results
- focus on topical network
- affects: structure, patters, meaning, predictions?
- clusters of sub-topics?
I Could Use This For...
- What could applying these notions to your data bring?
- Which structural aspects would be useful?
- Which would be useless or too difficult to derive meaning from?
Wrap Up
- Networks have many structural properties
- properties mean something
- meaning from structure
- Underlying mechanisms shape network