diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 00000000..9e8b1a54
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,161 @@
+# Claude Code Development Guide for ArchiveBox
+
+## Quick Start
+
+```bash
+# Set up dev environment
+uv sync --dev
+
+# Run tests as non-root user (required - ArchiveBox refuses to run as root)
+sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/ -v'
+```
+
+## Development Environment Setup
+
+### Prerequisites
+- Python 3.11+ (3.13 recommended)
+- uv package manager
+- A non-root user for running tests (e.g., `testuser`)
+
+### Install Dependencies
+```bash
+uv sync --dev
+```
+
+### Activate Virtual Environment
+```bash
+source .venv/bin/activate
+```
+
+## Running Tests
+
+### CRITICAL: Never Run as Root
+ArchiveBox has a root check that prevents running as root user. Always run tests as a non-root user:
+
+```bash
+# Run all migration tests
+sudo -u testuser bash -c 'source /path/to/.venv/bin/activate && python -m pytest archivebox/tests/test_migrations_*.py -v'
+
+# Run specific test file
+sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_08_to_09.py -v'
+
+# Run single test
+sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_fresh.py::TestFreshInstall::test_init_creates_database -xvs'
+```
+
+### Test File Structure
+```
+archivebox/tests/
+├── test_migrations_helpers.py    # Schemas, seeding functions, verification helpers
+├── test_migrations_fresh.py      # Fresh install tests
+├── test_migrations_04_to_09.py   # 0.4.x → 0.9.x migration tests
+├── test_migrations_07_to_09.py   # 0.7.x → 0.9.x migration tests
+└── test_migrations_08_to_09.py   # 0.8.x → 0.9.x migration tests
+```
+
+## Test Writing Standards
+
+### NO MOCKS - Real Tests Only
+Tests must exercise real code paths:
+- Create real SQLite databases with version-specific schemas
+- Seed with realistic test data
+- Run actual `python -m archivebox` commands via subprocess
+- Query SQLite directly to verify results
+
+### NO SKIPS
+Never use `@skip`, `skipTest`, or `pytest.mark.skip`. Every test must run.
+
+### Strict Assertions
+- `init` command must return exit code 0 (not `[0, 1]`)
+- Verify ALL data is preserved, not just "at least one"
+- Use exact counts (`==`) not loose bounds (`>=`)
+
+### Example Test Pattern
+```python
+def test_migration_preserves_snapshots(self):
+    """Migration should preserve all snapshots."""
+    result = run_archivebox(self.work_dir, ['init'], timeout=45)
+    self.assertEqual(result.returncode, 0, f"Init failed: {result.stderr}")
+
+    ok, msg = verify_snapshot_count(self.db_path, expected_count)
+    self.assertTrue(ok, msg)
+```
+
+## Migration Testing
+
+### Schema Versions
+- **0.4.x**: First Django version. Tags as comma-separated string, no ArchiveResult model
+- **0.7.x**: Tag model with M2M, ArchiveResult model, AutoField PKs
+- **0.8.x**: Crawl/Seed models, UUID PKs, status fields, depth/retry_at
+- **0.9.x**: Seed model removed, seed_id FK removed from Crawl
+
+### Testing a Migration Path
+1. Create SQLite DB with source version schema (from `test_migrations_helpers.py`)
+2. Seed with realistic test data using `seed_0_X_data()`
+3. Run `archivebox init` to trigger migrations
+4. Verify data preservation with `verify_*` functions
+5. Test CLI commands work post-migration (`status`, `list`, `add`, etc.)
+
+### Squashed Migrations
+When testing 0.8.x (dev branch), you must record ALL replaced migrations:
+```python
+# The squashed migration replaces these - all must be recorded
+('core', '0023_alter_archiveresult_options_archiveresult_abid_and_more'),
+('core', '0024_auto_20240513_1143'),
+# ... all 52 migrations from 0023-0074 ...
+('core', '0023_new_schema'),  # Also record the squashed migration itself
+```
+
+## Common Gotchas
+
+### 1. File Permissions
+New files created by root need permissions fixed for testuser:
+```bash
+chmod 644 archivebox/tests/test_*.py
+```
+
+### 2. DATA_DIR Environment Variable
+Tests use temp directories. The `run_archivebox()` helper sets `DATA_DIR` automatically.
+
+### 3. Extractors Disabled for Speed
+Tests disable all extractors via environment variables for faster execution:
+```python
+env['SAVE_TITLE'] = 'False'
+env['SAVE_FAVICON'] = 'False'
+# ... etc
+```
+
+### 4. Timeout Settings
+Use appropriate timeouts for migration tests (45s for init, 60s default).
+
+### 5. Circular FK References in Schemas
+SQLite handles circular references with `IF NOT EXISTS`. Order matters less than in other DBs.
+
+## Architecture Notes
+
+### Crawl Model (0.9.x)
+- Crawl groups multiple Snapshots from a single `add` command
+- Each `add` creates one Crawl with one or more Snapshots
+- Seed model was removed - crawls now store URLs directly
+
+### Migration Strategy
+- Squashed migrations for clean installs
+- Individual migrations recorded for upgrades from dev branch
+- `replaces` attribute in squashed migrations lists what they replace
+
+## Debugging Tips
+
+### Check Migration State
+```bash
+sqlite3 /path/to/index.sqlite3 "SELECT app, name FROM django_migrations WHERE app='core' ORDER BY id;"
+```
+
+### Check Table Schema
+```bash
+sqlite3 /path/to/index.sqlite3 "PRAGMA table_info(core_snapshot);"
+```
+
+### Verbose Test Output
+```bash
+sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_08_to_09.py -xvs 2>&1 | head -200'
+```
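
Note on the "Strict Assertions" rule: the guide calls `verify_snapshot_count(self.db_path, expected_count)` but the helper's body is not part of this diff. The following is a minimal sketch of what such a helper might look like. It assumes the helper counts rows in the `core_snapshot` table (the table named in the "Check Table Schema" tip) and returns an `(ok, msg)` tuple as in the example test pattern. The implementation is hypothetical and is not copied from `test_migrations_helpers.py`.

```python
import sqlite3
from pathlib import Path


def verify_snapshot_count(db_path: Path, expected_count: int) -> tuple[bool, str]:
    """Hypothetical sketch: compare the row count of core_snapshot to an exact expected value."""
    conn = sqlite3.connect(str(db_path))
    try:
        # Query SQLite directly, as the "NO MOCKS" section asks.
        (actual,) = conn.execute("SELECT COUNT(*) FROM core_snapshot").fetchone()
    finally:
        conn.close()

    # Exact equality, not a loose bound such as `actual >= 1`.
    if actual == expected_count:
        return True, f"Snapshot count OK: {actual}"
    return False, f"Expected {expected_count} snapshots, found {actual}"
```

Returning a message alongside the boolean lets `self.assertTrue(ok, msg)` report the actual and expected counts when a migration drops rows.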