
Scrape Command

The scrape command fetches artists from Spotify and/or YouTube Music, looks up MusicBrainz IDs, and exports to CSV.

Basic Usage

Fetch artists from both Spotify and YouTube Music:

artistscraper scrape

Command Options

Options:
  --config, -c PATH     Path to configuration file (default: config.json)
  --spotify-only        Fetch artists from Spotify only
  --youtube-only        Fetch artists from YouTube Music only
  --skip-musicbrainz    Skip MusicBrainz ID lookup
  --lidarr              Add artists to Lidarr after export
  --output, -o PATH     Output CSV file path (overrides config)
  --verbose, -v         Enable verbose output
  --help                Show this message and exit

Examples

Fetch from Specific Source

Spotify only:

artistscraper scrape --spotify-only

YouTube Music only:

artistscraper scrape --youtube-only

Custom Output File

artistscraper scrape --output my_artists.csv

Or short form:

artistscraper scrape -o my_artists.csv

Skip MusicBrainz Lookup

For faster scraping without ID lookups:

artistscraper scrape --skip-musicbrainz

Warning

Do not combine --skip-musicbrainz with --lidarr: Lidarr requires MusicBrainz IDs, and those are not looked up when --skip-musicbrainz is set.
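The constraint in this warning can be expressed as a small check. This is a minimal sketch and not the tool's actual code: the function name check_flag_combination and the error text are hypothetical, chosen only for illustration.

```python
def check_flag_combination(skip_musicbrainz: bool, lidarr: bool) -> None:
    """Reject the one flag combination the warning rules out."""
    if skip_musicbrainz and lidarr:
        # Illustrative message only; the real CLI may word this differently.
        raise ValueError(
            "--lidarr needs MusicBrainz IDs, so it cannot be combined "
            "with --skip-musicbrainz"
        )


# Either flag on its own is accepted.
check_flag_combination(skip_musicbrainz=True, lidarr=False)
check_flag_combination(skip_musicbrainz=False, lidarr=True)
```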

Add to Lidarr

Fetch artists, look up their IDs, and add them to Lidarr in one command:

artistscraper scrape --lidarr

Verbose Output

See detailed logging:

artistscraper scrape --verbose

Or short form:

artistscraper scrape -v

Combining Options

You can combine multiple options:

# Fetch from Spotify only, enable verbose output, and write to a custom file
artistscraper scrape --spotify-only --verbose --output spotify_artists.csv

# Fetch from YouTube Music and add to Lidarr
artistscraper scrape --youtube-only --lidarr

# Fast scrape without IDs, Spotify only
artistscraper scrape --spotify-only --skip-musicbrainz

Using a Custom Config File

Specify a different configuration file:

artistscraper scrape --config /path/to/config.json

Short form:

artistscraper scrape -c /path/to/config.json

What Happens During Scrape

Step 1: Fetch from Sources

The scraper:

  • Connects to Spotify (if enabled)
    • Fetches all liked tracks
    • Fetches all followed artists
    • Fetches all playlists (public, private, collaborative)
    • Extracts unique artist names
  • Connects to YouTube Music (if enabled)
    • Fetches all liked videos
    • Fetches all channel subscriptions
    • Fetches all playlists
    • Extracts artist names from titles and channels

A progress bar is shown while fetching.

Step 2: Deduplicate

  • Combines artists from all sources
  • Removes duplicates
  • Tracks which source(s) each artist came from
  • Counts the number of tracks by each artist
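The deduplication step can be sketched roughly as follows. This is an illustration under stated assumptions, not the scraper's implementation: the ArtistEntry class, the merge_artists function, and the choice to treat names as duplicates when they match after trimming and case folding are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class ArtistEntry:
    name: str                                        # first spelling seen
    sources: Set[str] = field(default_factory=set)   # e.g. {"Spotify"}
    track_count: int = 0                             # tracks seen for this artist


def merge_artists(records: Iterable[Tuple[str, str]]) -> List[ArtistEntry]:
    """Merge (artist_name, source) pairs, one pair per track."""
    merged: Dict[str, ArtistEntry] = {}
    for name, source in records:
        cleaned = name.strip()
        key = cleaned.casefold()  # assumption: duplicates differ only by case/spacing
        if not key:
            continue  # ignore empty names
        entry = merged.setdefault(key, ArtistEntry(name=cleaned))
        entry.sources.add(source)
        entry.track_count += 1
    return list(merged.values())


artists = merge_artists([
    ("Radiohead", "Spotify"),
    ("radiohead ", "YouTube Music"),
    ("Björk", "Spotify"),
])
for artist in artists:
    print(artist.name, sorted(artist.sources), artist.track_count)
```

Keying on a normalized name while keeping the first spelling seen lets the merged entry record every source it appeared in without losing the display name.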

Step 3: MusicBrainz Lookup

Unless --skip-musicbrainz is used:

  • Looks up the MusicBrainz ID for each artist
  • Uses fuzzy matching (90% similarity threshold)
  • Respects the 1 request/second rate limit
  • Shows progress bar
  • Logs artists without matches to skipped_artists.log
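To make the 90% threshold and the 1 request/second limit concrete, here is a hedged sketch. The documentation does not say which similarity measure the scraper uses, so this example substitutes difflib.SequenceMatcher from the Python standard library. The names best_match and RateLimiter, and the placeholder IDs, are hypothetical.

```python
import time
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

SIMILARITY_THRESHOLD = 0.90   # the 90% threshold from the list above
MIN_INTERVAL_SECONDS = 1.0    # 1 request per second


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (stand-in for the real measure)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_match(query: str, candidates: List[Tuple[str, str]],
               threshold: float = SIMILARITY_THRESHOLD) -> Optional[str]:
    """Return the ID of the closest (name, id) candidate, or None if none
    reaches the threshold (such artists would go to skipped_artists.log)."""
    best_id, best_score = None, 0.0
    for name, candidate_id in candidates:
        score = similarity(query, name)
        if score > best_score:
            best_id, best_score = candidate_id, score
    return best_id if best_score >= threshold else None


class RateLimiter:
    """Blocks so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval: float = MIN_INTERVAL_SECONDS,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Placeholder candidates; real lookups would come from MusicBrainz.
candidates = [("Radiohed", "placeholder-id-1"), ("Radiohead", "placeholder-id-2")]
print(best_match("radiohead", candidates))
print(best_match("Beatles", [("The Beatles", "placeholder-id-3")]))
```

In a real loop, wait() would be called once before each lookup request, which is why lookup time grows linearly with the number of artists.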

Step 4: Export

  • Writes matched artists to CSV with:
    • Artist Name
    • MusicBrainz ID (format: lidarr:ID)
    • Source (Spotify, YouTube Music, or both)
    • Play Count
  • Writes unmatched artists to skipped_artists.log
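As a rough picture of the export, the sketch below writes rows with the four fields listed above using Python's csv module. The exact header strings, the way a "both" source is rendered, and the all-zero placeholder ID are assumptions made for illustration; check a generated file for the real layout.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Assumed header names, based on the field list above.
FIELDNAMES = ["Artist Name", "MusicBrainz ID", "Source", "Play Count"]


def write_artists_csv(rows: Iterable[Mapping], out: TextIO) -> None:
    """Write matched artists, prefixing each ID with 'lidarr:'."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "Artist Name": row["name"],
            "MusicBrainz ID": f"lidarr:{row['mbid']}",
            # Assumption: multiple sources are joined with a comma.
            "Source": ", ".join(sorted(row["sources"])),
            "Play Count": row["count"],
        })


buffer = io.StringIO()
write_artists_csv(
    [{
        "name": "Example Artist",
        "mbid": "00000000-0000-0000-0000-000000000000",  # placeholder, not a real ID
        "sources": {"Spotify", "YouTube Music"},
        "count": 3,
    }],
    buffer,
)
print(buffer.getvalue())
```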

Step 5: Lidarr Integration (Optional)

If --lidarr is used:

  • Connects to Lidarr
  • For each artist with MusicBrainz ID:
    • Checks whether the artist already exists in Lidarr
    • Searches Lidarr's database
    • Adds new artists with monitoring enabled
  • Shows summary of added/existing/failed artists
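The control flow of this step, and how the added/existing/failed summary could be tallied, might look like the sketch below. The LidarrClient interface and its three methods are hypothetical stand-ins invented for this example. They are not Lidarr's API, and the scraper's own client may be structured differently.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Protocol


class LidarrClient(Protocol):
    """Hypothetical interface; not Lidarr's actual API."""

    def artist_exists(self, mbid: str) -> bool: ...
    def search_artist(self, mbid: str) -> Optional[dict]: ...
    def add_artist(self, artist: dict, monitored: bool) -> None: ...


def import_artists(client: LidarrClient, mbids: Iterable[str]) -> Dict[str, int]:
    """Add each artist that is not yet present and tally the outcomes."""
    summary = Counter(added=0, existing=0, failed=0)
    for mbid in mbids:
        try:
            if client.artist_exists(mbid):
                summary["existing"] += 1
                continue
            found = client.search_artist(mbid)
            if found is None:
                summary["failed"] += 1   # no search result for this ID
                continue
            client.add_artist(found, monitored=True)  # monitoring enabled
            summary["added"] += 1
        except Exception:
            # One failing artist should not abort the whole import.
            summary["failed"] += 1
    return dict(summary)
```

Catching errors per artist keeps the run going so that the final summary covers the whole list.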

Output

The scraper generates:

  1. artists.csv (or custom name): Main output file
  2. skipped_artists.log: Artists without MusicBrainz IDs

See Output Files for details.

Performance

Typical execution times:

  • Fetching (Spotify + YouTube): 30-60 seconds
  • MusicBrainz lookups: about 1 second per artist (bounded by the 1 request/second rate limit)
    • 100 artists ≈ 2 minutes
    • 500 artists ≈ 9 minutes
    • 1000 artists ≈ 17 minutes
  • Lidarr import: 1-5 seconds per artist

Total time depends on your library size.
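The figures above follow from simple arithmetic, sketched here. The helper names lookup_minutes and total_seconds_range are hypothetical, the per-step costs are the typical values quoted in this section, and rounding up to whole minutes is an assumption that happens to reproduce the listed numbers.

```python
import math
from typing import Tuple


def lookup_minutes(artist_count: int, seconds_per_lookup: float = 1.0) -> int:
    """MusicBrainz lookup time in whole minutes, rounded up."""
    return math.ceil(artist_count * seconds_per_lookup / 60)


def total_seconds_range(artist_count: int, use_lidarr: bool = False) -> Tuple[int, int]:
    """Rough (low, high) total in seconds from the typical figures above."""
    low = 30 + artist_count     # 30 s fetch + 1 s per lookup
    high = 60 + artist_count    # 60 s fetch + 1 s per lookup
    if use_lidarr:
        low += artist_count * 1   # 1 s per Lidarr import
        high += artist_count * 5  # 5 s per Lidarr import
    return low, high


for count in (100, 500, 1000):
    print(count, "artists:", lookup_minutes(count), "minutes of lookups")
```

For example, 500 artists need 500 seconds of lookups, which is 8.3 minutes and rounds up to 9.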

From Source Installation

If you installed from source with Poetry:

poetry run artistscraper scrape [options]

Next Steps