Endogenous Technology Spillovers in R&D Collaboration Networks¶


by Hsieh, Konig and Liu (2023, RAND forthcoming)

Zhiyuan Chen

School of Business

Renmin University of China

Roadmap¶

  • A Gentle Introduction to Structural Modeling and Estimation

  • The Paper in a Nutshell

  • Theory

  • Estimation

  • Conclusion

A Gentle Introduction to Structural Modeling and Estimation¶

  • Reduced-form Methods:

    • Focus on estimating the causal effect of a treatment ($X$) on an outcome ($Y$).

    • Do not explicitly model the underlying mechanisms.

    • e.g., OLS, IV, RDD, matching, etc.

  • Structural Methods:

    • Focus on estimating the structural parameters of a model that describes the relationship between $X$ and $Y$.

    • Explicitly model the underlying mechanisms

    • Counterfactual analysis for policy evaluation and welfare analysis

    • e.g., demand estimation, dynamic programming, quantitative trade models, etc.

Connecting Model with Data: A Comparison¶

| Features | Reduced-form Model | Structural Model |
|---|---|---|
| Data generating process | $Y_i = f(X_i) + \varepsilon_i$ | $X_i = \arg\max_{z} \{U(Y_i, z)\}$ |
| Estimation purpose | Causal effect of $X$ on $Y$ | Structural parameters of the model |
| Estimation method | OLS, IV, RDD, matching | GMM, MLE, MCMC |
| Advantages | Easy to implement, interpret, and understand | More flexible; can incorporate more information; supports counterfactual analysis |
| Disadvantages | Silent on mechanisms; vulnerable to mis-specification; misses general equilibrium effects | Usually more complicated; requires stronger assumptions; risk of overfitting |
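To make the contrast concrete, here is a stylized sketch (my own illustration, not from the paper; all names and numbers are made up): data are generated by a logit choice model, a reduced-form linear regression recovers only an average marginal effect, while maximizing the structural likelihood recovers the utility parameter itself.

In [ ]:
import numpy as np
from scipy.optimize import minimize_scalar

# Stylized DGP: agent i chooses D_i = 1 iff structural utility beta*X_i + eps_i > 0,
# with logistic eps, so Pr(D=1|X) = 1/(1+exp(-beta*X)).
rng = np.random.default_rng(0)
beta_true = 1.5
X = rng.normal(size=5_000)
D = (rng.random(5_000) < 1 / (1 + np.exp(-beta_true * X))).astype(float)

# Reduced form: a linear regression of D on X gives an average marginal effect, not beta
ols_slope = np.polyfit(X, D, 1)[0]

# Structural: maximize the logit log-likelihood to recover beta itself
def neg_loglik(b):
    p = 1 / (1 + np.exp(-b * X))
    return -np.sum(D * np.log(p) + (1 - D) * np.log(1 - p))

beta_hat = minimize_scalar(neg_loglik, bounds=(0.1, 5.0), method='bounded').x
print(f"OLS slope (average marginal effect): {ols_slope:.3f}")
print(f"Structural MLE of beta             : {beta_hat:.3f}")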

The Paper by Hsieh, Konig and Liu (2023) in a Nutshell¶

  • Research Question: How can we account for the co-evolution of R&D networks and firms' R&D expenditures in a model that can be taken to real-world data?

  • Key Contributions:

    • Develop a structural model of R&D collaboration networks that incorporates endogenous production, R&D investment and collaboration decisions.

    • A complete equilibrium characterization of the model, consistent with the observed R&D networks

      • nested R&D network structures
      • Pareto distributed output levels
      • Competitive equilibrium exhibits under-connected R&D networks
    • An estimation strategy that is computationally feasible for large-scale networks

    • Counterfactual analysis of R&D subsidies, showing that a subsidy targeting critical R&D collaborations is far more effective (welfare gains roughly 5 times larger!) than a uniform R&D subsidy policy that ignores the network structure

Theory¶

Without the dynamics, the model is an extended Cournot model with R&D collaboration networks.

  • Firm Profits: $$\pi_i(\mathbf{y}, G)= \underbrace{\eta_i y_i}_{\text{private cost reductions}}-\frac{1}{2}y_i^2-\underbrace{{\color{blue}\lambda y_i\sum_{j\neq i}y_j}}_{{\color{blue}\text{product market competition}}} +\underbrace{{\color{red}\rho y_i \sum_{j=1}^n a_{ij}y_j}}_{{\color{red}\text{spillovers due to cooperation}}} -\underbrace{{\color{green}\zeta \sum_{j=1}^n a_{ij}}}_{{\color{green}\text{cost of R\&D collaborations}}}$$

    • R\&D collaboration network $\mathbf{A} = (a_{ij})$ is a binary matrix, where $a_{ij} = 1$ if firm $i$ collaborates with firm $j$, and $0$ otherwise; $a_{ii}=0$.

    • NO knowledge spillovers without a collaboration ($a_{ij} = 0$): inconsistent with real-world patent citations and ample empirical evidence on knowledge spillovers (e.g., Liu and Ma (2021))
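As a quick numerical sanity check (a minimal sketch; the parameter values are illustrative, not the paper's estimates), the profit function can be evaluated directly from $(\mathbf{y}, \mathbf{A})$:

In [ ]:
import numpy as np

def profit(i, y, A, eta, lam, rho, zeta):
    """Profit of firm i under efforts y and R&D network A (the formula above)."""
    rivals = np.sum(y) - y[i]                 # sum_{j != i} y_j
    spill = A[i] @ y                          # sum_j a_ij y_j (a_ii = 0)
    links = A[i].sum()                        # number of collaborations of firm i
    return eta[i]*y[i] - 0.5*y[i]**2 - lam*y[i]*rivals + rho*y[i]*spill - zeta*links

# toy example: 3 firms, firm 0 collaborates with firm 1
y = np.array([1.0, 0.8, 0.6])
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
eta = np.array([1.0, 0.9, 0.8])
print(profit(0, y, A, eta, lam=0.06, rho=0.05, zeta=0.005))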

Theory (cont'd)¶

(Myopic) Best Response Dynamics: A sequence of R&D efforts $\mathbf{y}_t$ and collaboration network $G_t$ such that in $[t, t+\Delta t]$:

  • R\&D Choice (type-I extreme value shocks): $$\Pr\big((y,\mathbf{y}_{-i,t}), G_t \mid \mathbf{y}_{t-1}, G_t\big) = \chi\,\Delta t\; \frac{e^{\vartheta \pi_i(y, \mathbf{y}_{-i,t},G_t)}}{\int_{\mathcal{Y}}e^{\vartheta \pi_i(y', \mathbf{y}_{-i,t},G_t)}\,dy'}+ o(\Delta t)$$
  • Collaboration Choice (type-I extreme value shocks): $$\Pr\big(\mathbf{y}_t, G_t' \mid \mathbf{y}_t, G_t\big) = \xi\,\Delta t\; \frac{e^{\vartheta \Phi(\mathbf{y}_{t},G_t')}}{e^{\vartheta \Phi(\mathbf{y}_{t},G_t)}+e^{\vartheta \Phi(\mathbf{y}_{t},G_t')}}+ o(\Delta t)$$
  • The potential function $\Phi(\mathbf{y}, G)$ aggregates all firms' incentives: any unilateral change in a firm's R&D effort or in a single link changes $\Phi$ by exactly the deviating firm's change in profit

  • The R&D choice is static, much like the choice of intermediate inputs, with no dynamic implications for future output levels
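A minimal simulation sketch of these dynamics (my own illustration, not the paper's code): since profits are quadratic in own effort, the logit choice over $y_i$ reduces to a draw from a normal centered at the best response with variance $1/\vartheta$, and a link $ij$ is switched on with the logistic probability implied by $\vartheta(\rho y_i y_j - \zeta)$; the rates $\chi$, $\xi$ are collapsed into a 50/50 coin flip for simplicity.

In [ ]:
import numpy as np

def simulate_dynamics(eta, lam, rho, zeta, theta=50.0, T=20_000, seed=0):
    """Myopic best-response dynamics sketch: alternate effort and link updates."""
    rng = np.random.default_rng(seed)
    n = len(eta)
    y = np.array(eta, dtype=float)
    A = np.zeros((n, n), dtype=int)
    for _ in range(T):
        if rng.random() < 0.5:
            # effort update: exp(theta*pi_i) is Gaussian in y_i,
            # centered at the best response with variance 1/theta
            i = rng.integers(n)
            br = eta[i] - lam*(y.sum() - y[i]) + rho*(A[i] @ y)
            y[i] = rng.normal(br, 1/np.sqrt(theta))
        else:
            # link update: keep/create link ij with logistic probability
            i, j = rng.choice(n, size=2, replace=False)
            p = 1 / (1 + np.exp(-theta*(rho*y[i]*y[j] - zeta)))
            A[i, j] = A[j, i] = int(rng.random() < p)
    return A, y

eta = np.linspace(1.0, 0.3, 10)
A, y = simulate_dynamics(eta, lam=0.06, rho=0.05, zeta=0.005)
print("degrees:", A.sum(axis=1))
print("efforts:", np.round(y, 2))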

Theory (cont'd)¶

Stationary Distribution: $(\mathbf{y}_t, G_t)$ is an ergodic Markov chain with a unique stationary distribution $\mu^{\vartheta}(\mathbf{y}, G)$ given by the Gibbs measure: $$ \mu^{\vartheta}(\mathbf{y}, G) = \frac{e^{\vartheta \Phi(\mathbf{y}, G)}}{\sum_{G'\in \mathcal{G}^n} \int_{\mathcal{Y}^n}e^{\vartheta \Phi(\mathbf{y}', G')}\,d\mathbf{y}'}$$

  • Directly computing $\mu^{\vartheta}(\mathbf{y}, G)$ is costly, especially for large networks (the denominator)

  • Use the composite likelihood method to compute conditional distributions:

    • $\mu^{\vartheta}(\mathbf{y}|G) $ is a joint normal distribution

    • $\mu^{\vartheta}(G|\mathbf{y})$ is a product of independent pairwise linking probabilities $$\Pr(a_{ij}=1|\mathbf{y}) = \frac{e^{\vartheta (\rho y_i y_j -\zeta)}}{1+e^{\vartheta (\rho y_i y_j -\zeta)}}$$
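For instance, the conditional linking probabilities are cheap to evaluate (a small sketch of the formula above; the $\vartheta$, $\rho$, $\zeta$ values are illustrative):

In [ ]:
import numpy as np

def link_probs(y, theta, rho, zeta):
    """Matrix of Pr(a_ij = 1 | y) from the logistic formula above."""
    Yprod = np.outer(y, y)
    P = 1 / (1 + np.exp(-theta*(rho*Yprod - zeta)))
    np.fill_diagonal(P, 0.0)   # no self-links
    return P

y = np.array([1.2, 0.9, 0.5])
print(np.round(link_probs(y, theta=50.0, rho=0.05, zeta=0.005), 3))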

Theory (cont'd)¶

Equilibrium Characterization: as $\vartheta\rightarrow \infty$ (i.e., vanishing noise), $G$ is a nested split graph: $$ a_{ij} = \mathbf{1}_{\{\rho y_i y_j>\zeta\}}$$ $$y_i = \eta_i + \sum_{j\neq i}(\rho a_{ij} - \lambda)\,y_{j}$$

  • Echoes the core-periphery structure of real-world R&D networks

  • Policy Implication: Equilibrium R&D spending is too low compared to the socially optimal level, which satisfies $$y_i = \frac{\eta_i}{1-\psi}+\frac{1}{1-\psi}\sum_{j\neq i}(\rho \mathbf{1}_{\{\rho y_i y_j>\zeta\}}-\lambda)\,y_{j}$$

Reference: Thomas J. Sargent and John Stachurski, Economic Networks: Theory and Computation
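The cell below is a simple illustration (not the authors' code): it iterates between solving the linear best-response system $(I-B)\mathbf{y} = \boldsymbol{\eta}$ given the network and updating links by the threshold rule $a_{ij} = \mathbf{1}_{\{\rho y_i y_j > \zeta\}}$ given efforts, until both converge; the parameter values only loosely follow the paper's Figure 4.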

In [34]:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
def simulate_nested_split_rd(n, eta, rho, lam, zeta, 
                             max_iter=100, tol=1e-8):
    """
    Simulate the fixed‐point (A, y) for given parameters.
    Args:
      n       : number of firms
      eta     : array‐like, length‐n vector of productivities η_i
      rho     : float, benefit parameter
      lam     : float, spill‐out parameter λ
      zeta    : float, link cost ζ
      max_iter: maximum number of updates to A,y
      tol     : convergence tolerance on max |y_new - y_old| 
    Returns:
      A       : (n×n) adjacency matrix of the network
      y       : length‐n vector of R&D efforts
    """
    η = np.asarray(eta).reshape(n)
    # initialize A to no links
    A = np.zeros((n,n), dtype=int)
    # initial guess for y: just the stand-alone η
    y = η.copy() 
    for it in range(max_iter):
        # --- 1) given A, solve for y:  (I - B) y = η ---
        B = np.zeros((n,n))
        # off-diagonal entries: +rho on collaboration links, -lambda on rival pairs
        B += rho * A
        B -= lam * (np.ones((n,n)) - np.eye(n))
        # zero out diagonal
        np.fill_diagonal(B, 0.0)  
        # solve (I - B) y = η
        M = np.eye(n) - B
        y_new = np.linalg.solve(M, η)  
        # --- 2) update adjacency A by threshold rule ---
        thresh = zeta / rho
        # outer product y_i * y_j
        Yprod = np.outer(y_new, y_new)
        A_new = (Yprod > thresh).astype(int)
        np.fill_diagonal(A_new, 0)   # no self-links
        # check convergence
        if np.max(np.abs(y_new - y)) < tol and np.array_equal(A_new, A):
            break 
        y = y_new
        A = A_new
    return A, y

def plot_adjacency(A, ax=None):
    """Plot adjacency matrix as a heatmap."""
    if ax is None:
        fig, ax = plt.subplots()
    im = ax.matshow(A, cmap='Blues')
    ax.set_xlabel('j')
    ax.set_ylabel('i')
    ax.set_xticks(np.arange(-0.5, A.shape[1], 1), minor=True)
    ax.set_yticks(np.arange(-0.5, A.shape[0], 1), minor=True)
    ax.grid(which='minor', color='gray', linestyle='-', linewidth=0.5)
    ax.tick_params(which='minor', bottom=False, left=False)
    return ax

if __name__ == "__main__":
    # === parameters from Figure 4 ===
    n   = 10
    lam = 0.06
    rho = 0.05 # in paper, rho = 0.02 
    eta = [1.00, 0.71, 0.58, 0.50, 0.45,
           0.41, 0.38, 0.35, 0.33, 0.32]
    zetas = [0.005, 0.0075, 0.015]
    # Simulate for each ζ and plot
    fig, axes = plt.subplots(1, len(zetas), figsize=(12,4))
    for ax, ζ in zip(axes, zetas):
        A, y = simulate_nested_split_rd(n, eta, rho, lam, ζ)
        ax = plot_adjacency(A, ax=ax)
        ax.set_title(fr'$\zeta={ζ}$')
    
    plt.suptitle("Stepwise Adjacency Matrices of the Nested-Split Graph")
    plt.tight_layout(rect=[0,0,1,0.95])
    plt.show()
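The next cell repeats the same fixed-point iteration for the planner's problem and compares equilibrium against socially optimal R&D effort and degree; the $1/\psi$ rescaling here is a stylized stand-in for the welfare weight, not the paper's exact planner solution.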
In [96]:
# socially optimal rd effort 
def simulate_nested_split_opt_rd(n, eta, rho, lam, zeta, psi,
                             max_iter=100, tol=1e-8):
    """
    Planner's counterpart of simulate_nested_split_rd: the welfare weight
    psi rescales η and the interaction terms by 1/psi (a stylized
    implementation, not the paper's exact planner problem).
    """
    η = 1/psi * np.asarray(eta).reshape(n)
    # initialize A to no links
    A = np.zeros((n,n), dtype=int)
    # initial guess for y: just the stand-alone η
    y = η.copy()
    for it in range(max_iter):
        # --- 1) given A, solve for y:  (I - B) y = η ---
        B = np.zeros((n,n))
        # off-diagonal entries
        B += 1/psi * rho * A
        B -= 1/psi * lam * (np.ones((n,n)) - np.eye(n))
        # zero out diagonal
        np.fill_diagonal(B, 0.0)  
        # solve (I - B) y = η
        M = np.eye(n) - B
        y_new = np.linalg.solve(M, η)  
        # --- 2) update adjacency A by threshold rule ---
        thresh = zeta / rho
        # outer product y_i * y_j
        Yprod = np.outer(y_new, y_new)
        A_new = (Yprod > thresh).astype(int)
        np.fill_diagonal(A_new, 0)   # no self-links
        # check convergence
        if np.max(np.abs(y_new - y)) < tol and np.array_equal(A_new, A):
            break 
        y = y_new
        A = A_new
    return A, y
## Firm productivity and R&D effort
n   = 10
lam = 0.06
rho = 0.05 # in paper, rho = 0.02 
psi = 0.90 # weight in consumer welfare 
eta = [1.00, 0.71, 0.58, 0.50, 0.45,
       0.41, 0.38, 0.35, 0.33, 0.32]
zeta = 0.005
Aopt, yopt = simulate_nested_split_opt_rd(n, eta, rho, lam, zeta, psi)
A, y = simulate_nested_split_rd(n, eta, rho, lam, zeta)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# figure1: R&D effort vs. productivity
axes[0].scatter(eta, y, color='blue', s=80)
axes[0].scatter(eta, yopt, color='blue', s=80, marker='*')
axes[0].legend(['Equilibrium R&D effort', 'Optimal R&D effort'])
axes[0].set_xlabel(r"Firm productivity $\eta_i$")
axes[0].set_ylabel(r"R&D effort $y_i$")
axes[0].set_title(r"R&D effort vs. productivity")
axes[0].grid(True)
# figure2: Degree vs. productivity
axes[1].scatter(eta, np.sum(A, axis=1), color='red', s = 80)
axes[1].scatter(eta, np.sum(Aopt, axis=1), color='red', s = 80, marker='*')
axes[1].legend(['Equilibrium degree', 'Optimal degree'])
axes[1].set_xlabel(r"Firm productivity $\eta_i$")
axes[1].set_ylabel("Degree (number of links)")
axes[1].set_title("Degree vs. productivity")
axes[1].grid(True)
plt.tight_layout()
plt.show()

Empirical Analysis¶

  • A sample of 1,738 firms with 632 R&D collaborations

  • The empirical profit function of firm $i$: $$\pi_i(y_i, G) = \eta_i y_i - \frac{1}{2}y_i^2 + \rho \sum_{j=1}^n f_{ij}a_{ij}y_j y_i - \lambda \sum_{j=1}^n b_{ij}y_jy_i-\sum_{j=1}^n a_{ij}\zeta_{ij}$$ where $f_{ij}$ measures technological proximity and $b_{ij}$ product-market proximity between firms $i$ and $j$

  • Parameterization:

    • Productivity: $\eta_i = \mathbf{X}_i \boldsymbol{\delta} + \kappa z_i$

    • Symmetric R&D collaboration cost: $\zeta_{ij} = \mathbf{W}_{ij}\boldsymbol{\gamma} - z_i -z_j$

    • $z_i$ is the latent/unobserved firm heterogeneity

  • Estimation Challenge: Computing the denominator of the stationary distribution requires summing over all possible networks $G'\in\mathcal{G}^n$

    • Solution: the composite likelihood method

Estimation Method: Bayesian MCMC (Markov Chain Monte Carlo)¶

  • Bayesian goal: Compute expectations under an intractable posterior
    $$ p(\theta\mid y)\;\propto\;p(y\mid\theta)\,p(\theta) $$
    • $p(y\mid\theta)$ is the likelihood of data given parameters $\theta$.
    • $p(\theta)$ is the prior belief.
  • Monte Carlo methods approximate such integrals by random sampling: $$\mathbb{E}_{p(\theta|y)}[f(\theta)]\approx \frac1{S}\sum_{s=1}^S f(\theta^{(s)})$$

    • $S$ is the number of samples.
    • $\theta^{(s)}$ is a sample from $p(\theta|y)$.
  • Key idea: Construct a Markov chain $\{\theta^{(t)}\}$ whose
    stationary/equilibrium distribution is $p(\theta\mid y)$.
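A two-line reminder of what Monte Carlo buys us (purely illustrative): with exact draws from the target distribution, expectations reduce to sample means; MCMC supplies approximate draws when direct sampling is impossible.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=2.0, size=100_000)  # stand-in for posterior draws
print("E[theta]   ~", theta.mean())        # truth: 1.0
print("E[theta^2] ~", (theta**2).mean())   # truth: mu^2 + sigma^2 = 5.0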

Two Core Algorithms for MCMC¶

1. Metropolis–Hastings¶

(1). Propose $\theta'\sim q(\theta'\mid\theta^{(t)})$: (a) random walk $\theta' = \theta^{(t)}+\epsilon$, $\epsilon\sim N(0,\sigma^2)$, or (b) $\theta'$ drawn from a fixed proposal distribution

(2). Compute
$$ \alpha = \min\!\biggl(1,\; \frac{p(y\mid\theta')\,p(\theta')\,q(\theta^{(t)}\mid\theta')} {p(y\mid\theta^{(t)})\,p(\theta^{(t)})\,q(\theta'\mid\theta^{(t)})} \biggr) $$

(3). Accept $\theta^{(t+1)}=\theta'$ with probability $\alpha$; otherwise set $\theta^{(t+1)}=\theta^{(t)}$.

Why it works: This “Metropolis step” enforces detailed balance, guaranteeing the posterior is stationary for the chain.

2. Gibbs Sampling¶

  • Iterate over each component $i$:
    $$ \theta_i^{(t+1)} \sim p(\theta_i\mid \theta_{-i}^{(t)},\,y) $$
  • Requires closed-form conditionals.

Why it works: Each update leaves the joint posterior invariant; under mild conditions the chain is ergodic, so it converges to the true posterior.
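A minimal Gibbs sketch (a toy example of my own, not the paper's sampler): for a bivariate standard normal with correlation $\rho$, each full conditional is a univariate normal, so the two-step scan below leaves the joint distribution invariant.

In [ ]:
import numpy as np

def gibbs_bivariate_normal(rho, n_iters=5_000, seed=0):
    """Gibbs sampling for (x1, x2) ~ N(0, [[1, rho], [rho, 1]])."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0
    draws = np.zeros((n_iters, 2))
    sd = np.sqrt(1 - rho**2)                 # conditional standard deviation
    for t in range(n_iters):
        x1 = rng.normal(rho * x2, sd)        # x1 | x2 ~ N(rho*x2, 1-rho^2)
        x2 = rng.normal(rho * x1, sd)        # x2 | x1 ~ N(rho*x1, 1-rho^2)
        draws[t] = (x1, x2)
    return draws

draws = gibbs_bivariate_normal(rho=0.8)
print("sample correlation:", np.corrcoef(draws[1000:].T)[0, 1])  # ~0.8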

Practical Tips¶

  • Run multiple chains & check convergence

  • Discard burn-in (initial transient).

  • Monitor effective sample size & autocorrelation.

  • For a typical structural model, estimation can take days!
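As a rough illustration of the last two tips (a textbook-style heuristic; the truncation rule is an assumption, and libraries such as ArviZ provide more careful versions), effective sample size can be computed from the chain's autocorrelations:

In [ ]:
import numpy as np

def effective_sample_size(chain, max_lag=200):
    """Rough ESS: S / (1 + 2 * sum of initial positive autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.mean(x**2)
    rho_sum = 0.0
    for lag in range(1, max_lag):
        rho = np.mean(x[:-lag] * x[lag:]) / var
        if rho < 0.05:               # truncate once autocorrelation dies out
            break
        rho_sum += rho
    return len(x) / (1 + 2 * rho_sum)

# e.g., effective_sample_size(posterior_samples) on the M-H draws below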

Example: a simple M-H Markov chain¶

Below is an example of using a simple Metropolis–Hastings Markov chain Monte Carlo sampler to estimate the posterior of the mean $\mu$ of a Normal model with known variance.

  • Log-likelihood: $\log p(y \mid \mu)= -\frac{n}{2}\log(2\pi\sigma^2)\;-\;\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mu)^2$
  • Log-prior: $\log p(\mu)= -\frac{1}{2}\log(2\pi\tau^2) \;-\;\frac{\mu^2}{2\tau^2}$
  • Log-posterior (up to an additive constant): $\log p(\mu \mid y)\;=\; -\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mu)^2\;-\;\frac{\mu^2}{2\tau^2} + \text{const.}$
In [114]:
# Bayesian inference using Metropolis-Hastings with Gaussian Random Walk 
# Generate synthetic data
np.random.seed(1)
true_mu = 5.0
sigma = 2.0   # known standard deviation
n_samples = 50
data = np.random.normal(loc=true_mu, scale=sigma, size=n_samples)
tau = 10.0
# Log-posterior (up to a constant)
def log_post(mu, data, sigma, tau):
    ll = -0.5 * np.sum((data - mu)**2) / sigma**2  # log-likelihood
    lp = -0.5 * mu**2 / tau**2                    # log-prior
    return ll + lp
# Metropolis-Hastings sampler
def metropolis_hastings(log_post, data, sigma, tau, initial_mu, proposal_std, n_iters):
    mu_current = initial_mu
    logp_current = log_post(mu_current, data, sigma, tau)
    samples = np.zeros(n_iters)
    for i in range(n_iters):
        mu_proposal = np.random.normal(mu_current, proposal_std) #Gaussian proposal 
        logp_proposal = log_post(mu_proposal, data, sigma, tau)  
        # acceptance probability
        log_alpha = logp_proposal - logp_current
        if np.log(np.random.rand()) < log_alpha:
            mu_current = mu_proposal
            logp_current = logp_proposal
        samples[i] = mu_current
    return samples
# Run the sampler
n_iters = 10000
proposal_std = 0.5
initial_mu = 0.0
samples = metropolis_hastings(
    log_post, data, sigma, tau, initial_mu, proposal_std, n_iters
)
# Discard burn-in
burn_in = 2000  # drop the first 2000 samples as burn-in
posterior_samples = samples[burn_in:]
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Trace plot
axes[0].plot(samples, alpha=0.5, color='blue')
axes[0].axvline(burn_in, color='red', linestyle='--', label='Burn-in cutoff')
axes[0].set_title(r'Trace of $\mu$ samples')
axes[0].set_xlabel(r'Iteration')
axes[0].set_ylabel(r'$\mu$')
axes[0].legend()

# Posterior histogram
axes[1].hist(posterior_samples, bins=30, density=True, alpha=0.5, color='blue', edgecolor='black')
axes[1].axvline(true_mu, color='black', linestyle='-', label=r'True $\mu$')
axes[1].set_title(r'Posterior distribution of $\mu$ (post burn-in)')
axes[1].set_xlabel(r'$\mu$')
axes[1].set_ylabel('Density')
axes[1].legend()

plt.tight_layout()
plt.show()

# Print summary
print("-------------------------------------")
print(f"True value    :  {true_mu:.3f}")
print(f"Posterior mean estimate: {np.mean(posterior_samples):.3f}")
print(f"Posterior std estimate: ({np.std(posterior_samples):.3f})")
print("-------------------------------------")
-------------------------------------
True value    :  5.000
Posterior mean estimate: 4.949
Posterior std estimate: (0.284)
-------------------------------------

Extensions and Takeaways¶

This paper provides a framework with many extensions and potential applications (probably more than a typical RAND publication!):

  • NO GROWTH! The model is static: R&D lowers current costs but does not accumulate into long-run growth.
  • Can introduce non-collaborative R&D spillovers, which further weaken the incentives for R&D collaboration:

    • Firms can free-ride on other firms' R&D efforts even without collaborating with them: internal vs. external R&D spillovers
    • Re-evaluate the problem of optimal R&D allocation, R&D subsidies, etc.
    • In directed spillover networks, subsidizing the upstream and key collaboration nodes is the most effective policy
  • The consequences of the breakdown or promotion of R&D collaborations

    • Industrial policy/geopolitics promoting domestic R&D collaborations BUT not overseas collaborations
    • Global innovation networks in key industries (e.g., semiconductors, AI, etc.)
    • Promoting industry-academia-research collaborations
  • Not limited to R&D collaborations:

    • Environmental Economics: trading of carbon emission rights
    • Corporate Finance: ownership networks