Data Updater

Performs incremental updates, fetching only the records that have changed on Drupal.org since the last run.

Purpose

Efficiently maintains up-to-date datasets by fetching only new or modified records instead of re-downloading everything.

How It Works

  1. Check timestamp: Finds the latest change timestamp in BigQuery
  2. Fetch changes: Downloads only records newer than that timestamp
  3. Process data: Transforms and cleans the new records
  4. Upsert data: Updates BigQuery tables with new/changed records
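
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: a plain dict keyed by record id stands in for a BigQuery table, and the names `latest_timestamp`, `clean`, `incremental_update`, `fake_fetch`, and the `id`/`changed` fields are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def latest_timestamp(table: Dict[int, Record]) -> int:
    """Step 1: newest 'changed' value already stored (0 if the table is empty)."""
    return max((int(r["changed"]) for r in table.values()), default=0)


def clean(record: Record) -> Record:
    """Step 3: example transformation, trimming whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def incremental_update(
    table: Dict[int, Record],
    fetch_changes: Callable[[int], Iterable[Record]],
) -> int:
    """Run one update pass and return the number of rows upserted."""
    since = latest_timestamp(table)
    count = 0
    for raw in fetch_changes(since):   # Step 2: only records newer than `since`
        row = clean(raw)               # Step 3: transform and clean
        table[int(row["id"])] = row    # Step 4: insert, or overwrite by id
        count += 1
    return count


# Hypothetical remote data standing in for the Drupal.org API.
REMOTE: List[Record] = [
    {"id": 1, "title": "Views ", "changed": 100},
    {"id": 2, "title": " Token", "changed": 250},
    {"id": 3, "title": "Pathauto", "changed": 300},
]


def fake_fetch(since: int) -> List[Record]:
    """Return only the remote records changed after `since`."""
    return [r for r in REMOTE if int(r["changed"]) > since]


if __name__ == "__main__":
    table: Dict[int, Record] = {1: {"id": 1, "title": "Views", "changed": 100}}
    print(incremental_update(table, fake_fetch))
    print(sorted(table))
```

Running the update a second time against unchanged remote data upserts nothing, because no record is newer than the stored timestamp. Note that the strict "newer than" comparison in this sketch would skip a record sharing the exact timestamp of the latest stored row; whether the real updater handles that case is not stated here.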

Common Commands

# Update single resource with latest changes
make cli update project

# Update all resources (main workflow)
make data

Supported Resources

Most resources support incremental updates, except:

  • user - Requires full re-extraction
  • term - Requires full re-extraction
  • vocabulary - Requires full re-extraction
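
A scheduler script could use the list above to decide how each resource should be refreshed. The following is a small sketch only; `FULL_REEXTRACTION_ONLY` and `update_mode` are hypothetical names, not part of the tool.

```python
# Resources listed above as requiring full re-extraction.
FULL_REEXTRACTION_ONLY = frozenset({"user", "term", "vocabulary"})


def update_mode(resource: str) -> str:
    """Return 'full' for resources that cannot be updated incrementally."""
    return "full" if resource in FULL_REEXTRACTION_ONLY else "incremental"


if __name__ == "__main__":
    for name in ("project", "user"):
        print(name, update_mode(name))
```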

Benefits

  • Faster: Only downloads changed data
  • Efficient: Reduces API calls and bandwidth
  • Fresh data: Keeps datasets current without full rebuilds
  • Automated: Can be run regularly via cron/scheduler

When to Use

  • Regular maintenance: Daily/weekly automated updates
  • After initial setup: Once BigQuery tables exist, so there is a stored timestamp to compare against
  • Monitoring changes: Track new issues, releases, etc.

Use make data as your primary command for keeping data current.