TODO_fix_migration_path.md

# TODO: Fix Migration Path for v0.7.2/v0.8.6rc0 → v0.9.0

## Critical Issue

The migrations currently **LOSE DATA** during the v0.7.2 → v0.9.0 upgrade:
- `extractor` field data is not being copied to `plugin` field
- `output` field data is not being copied to `output_str` field
- Timestamp fields (`added`, `updated`) may not be properly transformed
- Tag UUID → INTEGER conversion may lose FK relationships

## Test Database Locations

Sample databases for testing are available at:
```
/Users/squash/Local/Code/archiveboxes/archivebox-migration-path/archivebox-v0.7.2/data/index.sqlite3
/Users/squash/Local/Code/archiveboxes/archivebox-migration-path/archivebox-v0.8.6rc0/data/index.sqlite3
```

Schema comparison reports:
```
/tmp/schema_comparison_report.md
/tmp/table_presence_matrix.md
```

## How to Test Migrations

### 1. Fresh Install Test
```bash
rm -rf /tmp/test_fresh && mkdir -p /tmp/test_fresh
DATA_DIR=/tmp/test_fresh python -m archivebox init
DATA_DIR=/tmp/test_fresh python -m archivebox status
```

### 2. v0.7.2 Migration Test
```bash
rm -rf /tmp/test_v072 && mkdir -p /tmp/test_v072
cp /Users/squash/Local/Code/archiveboxes/archivebox-migration-path/archivebox-v0.7.2/data/index.sqlite3 /tmp/test_v072/
DATA_DIR=/tmp/test_v072 python -m archivebox init
DATA_DIR=/tmp/test_v072 python -m archivebox status
```

### 3. v0.8.6rc0 Migration Test
```bash
rm -rf /tmp/test_v086 && mkdir -p /tmp/test_v086
cp /Users/squash/Local/Code/archiveboxes/archivebox-migration-path/archivebox-v0.8.6rc0/data/index.sqlite3 /tmp/test_v086/
DATA_DIR=/tmp/test_v086 python -m archivebox init
DATA_DIR=/tmp/test_v086 python -m archivebox status
```

### 4. Verify Data Integrity

After each test, compare original vs migrated data:

```bash
# Check ArchiveResult data preservation
echo "=== ORIGINAL ==="
sqlite3 /path/to/original.db "SELECT id, extractor, output, status FROM core_archiveresult LIMIT 5;"

echo "=== MIGRATED ==="
sqlite3 /tmp/test_vXXX/index.sqlite3 "SELECT id, plugin, output_str, status FROM core_archiveresult LIMIT 5;"

# Check Snapshot data preservation
echo "=== ORIGINAL SNAPSHOTS ==="
sqlite3 /path/to/original.db "SELECT id, url, title, added, updated FROM core_snapshot LIMIT 5;"

echo "=== MIGRATED SNAPSHOTS ==="
sqlite3 /tmp/test_vXXX/index.sqlite3 "SELECT id, url, title, bookmarked_at, created_at, modified_at FROM core_snapshot LIMIT 5;"

# Check Tag data preservation
echo "=== ORIGINAL TAGS ==="
sqlite3 /path/to/original.db "SELECT * FROM core_tag;"

echo "=== MIGRATED TAGS ==="
sqlite3 /tmp/test_vXXX/index.sqlite3 "SELECT * FROM core_tag;"

# Check snapshot-tag relationships
sqlite3 /tmp/test_vXXX/index.sqlite3 "SELECT COUNT(*) FROM core_snapshot_tags;"
```

**CRITICAL**: Verify:
- Row counts match
- All URLs, titles, timestamps are preserved
- All extractor values are copied to plugin field
- All output values are copied to output_str field
- All tag relationships are maintained (tag IDs should be converted from UUID to INTEGER for v0.8.6)

## Migration Philosophy

### Principle: Minimal Manual SQL

Use this approach for complex migrations:

1. **Python**: Detect existing schema version
   ```python
   def get_table_columns(table_name):
       cursor = connection.cursor()
       cursor.execute(f"PRAGMA table_info({table_name})")
       return {row[1] for row in cursor.fetchall()}

   cols = get_table_columns('core_archiveresult')
   has_extractor = 'extractor' in cols
   has_plugin = 'plugin' in cols
   ```

2. **SQL**: Modify database structure during migration
   ```sql
   CREATE TABLE core_archiveresult_new (...);
   INSERT INTO core_archiveresult_new SELECT ... FROM core_archiveresult;
   DROP TABLE core_archiveresult;
   ALTER TABLE core_archiveresult_new RENAME TO core_archiveresult;
   ```

3. **Python**: Copy data between old and new field names
   ```python
   if 'extractor' in cols and 'plugin' in cols:
       cursor.execute("UPDATE core_archiveresult SET plugin = COALESCE(extractor, '')")
   ```

4. **SQL**: Drop old columns/tables
   ```sql
   -- Django's RemoveField will handle this
   ```

5. **Django**: Register the end state so Django knows what the schema should be
   ```python
   migrations.SeparateDatabaseAndState(
       database_operations=[...],  # Your SQL/Python migrations
       state_operations=[...]       # Tell Django what the final schema looks like
   )
   ```

### Key Files

- **core/migrations/0023_upgrade_to_0_9_0.py**: Raw SQL migration that upgrades tables from v0.7.2/v0.8.6 schema
  - Should create NEW tables with OLD field names (extractor, output, added, updated)
  - Should preserve ALL data during table rebuild
  - Should NOT add new fields yet (let Django migrations handle that)

- **core/migrations/0025_alter_archiveresult_options_...py**: Django-generated migration
  - Adds new fields (plugin, output_str, bookmarked_at, created_at, etc.)
  - Should include RunPython to copy data from old fields to new fields AFTER AddField operations
  - RemoveField operations to remove old columns

- **crawls/migrations/0002_upgrade_from_0_8_6.py**: Handles crawls_crawl table upgrade
  - v0.8.6 has `seed_id` + `persona` (VARCHAR)
  - v0.9.0 has `urls` + `persona_id` (UUID FK)

## How to Make vs Apply Migrations

### Making Migrations (Creating New Migrations)

**Always run from the archivebox/ subdirectory** (NOT from a data dir):

```bash
cd archivebox/
./manage.py makemigrations
./manage.py makemigrations --check  # Verify no unreflected changes
```

This works because `archivebox/manage.py` has:
```python
os.environ.setdefault('ARCHIVEBOX_DATA_DIR', '.')
```

### Applying Migrations (Testing Migrations)

**Always run from inside a data directory** using `archivebox init`:

```bash
# WRONG - Don't do this:
cd /some/data/dir
../path/to/archivebox/manage.py migrate

# RIGHT - Do this:
DATA_DIR=/some/data/dir python -m archivebox init
```

Why? Because `archivebox init`:
- Sets up the data directory structure
- Runs migrations with proper DATA_DIR context
- Creates necessary files and folders
- Validates the installation

## Schema Version Differences

### v0.7.2 Schema (Migration 0022)
- **ArchiveResult**: `id` (INTEGER), `uuid`, `extractor`, `output`, `cmd`, `pwd`, `cmd_version`, `start_ts`, `end_ts`, `status`, `snapshot_id`
- **Snapshot**: `id`, `url`, `timestamp`, `title`, `added`, `updated`, `crawl_id`
- **Tag**: `id` (INTEGER), `name`, `slug`
- **Crawl**: Doesn't exist in v0.7.2

### v0.8.6rc0 Schema
- **ArchiveResult**: `id`, `abid` (not uuid!), `extractor`, `output`, `created_at`, `modified_at`, `retry_at`, `status`, ...
- **Snapshot**: `id`, `url`, `bookmarked_at`, `created_at`, `modified_at`, `crawl_id`, `status`, `retry_at`, ...
- **Tag**: `id` (UUID/CHAR!), `name`, `slug`, `abid`, `created_at`, `modified_at`, `created_by_id`
- **Crawl**: `id`, `seed_id`, `persona` (VARCHAR), `max_depth`, `tags_str`, `status`, `retry_at`, ...

### v0.9.0 Target Schema
- **ArchiveResult**: `id` (INTEGER), `uuid`, `plugin` (not extractor!), `output_str` (not output!), `hook_name`, `created_at`, `modified_at`, `output_files`, `output_json`, `output_size`, `output_mimetypes`, `retry_at`, ...
- **Snapshot**: `id`, `url`, `bookmarked_at` (not added!), `created_at`, `modified_at` (not updated!), `crawl_id`, `parent_snapshot_id`, `status`, `retry_at`, `current_step`, `depth`, `fs_version`, ...
- **Tag**: `id` (INTEGER!), `name`, `slug`, `created_at`, `modified_at`, `created_by_id`
- **Crawl**: `id`, `urls` (not seed_id!), `persona_id` (not persona!), `label`, `notes`, `output_dir`, ...

## Critical Gotchas and Mistakes to Avoid

### 1. ❌ DON'T Create New Fields in SQL Migration (0023)

**WRONG**:
```python
# In core/migrations/0023_upgrade_to_0_9_0.py
cursor.execute("""
    CREATE TABLE core_archiveresult_new (
        id INTEGER PRIMARY KEY,
        plugin VARCHAR(32),  # ❌ New field!
        output_str TEXT,     # ❌ New field!
        ...
    )
""")
```

**RIGHT**:
```python
# In core/migrations/0023_upgrade_to_0_9_0.py - Keep OLD field names!
cursor.execute("""
    CREATE TABLE core_archiveresult_new (
        id INTEGER PRIMARY KEY,
        extractor VARCHAR(32),  # ✓ OLD field name
        output VARCHAR(1024),   # ✓ OLD field name
        ...
    )
""")
```

**Why**: If you create new fields in SQL, Django's AddField operation in migration 0025 will overwrite them with default values, losing your data!

### 2. ❌ DON'T Copy Data in SQL Migration

**WRONG**:
```python
# In core/migrations/0023
cursor.execute("""
    INSERT INTO core_archiveresult_new (plugin, output_str, ...)
    SELECT COALESCE(extractor, ''), COALESCE(output, ''), ...
    FROM core_archiveresult
""")
```

**RIGHT**: Keep old field names in SQL, let Django AddField create new columns, then copy:
```python
# In core/migrations/0025 (AFTER AddField operations)
def copy_old_to_new(apps, schema_editor):
    cursor = connection.cursor()
    cursor.execute("UPDATE core_archiveresult SET plugin = COALESCE(extractor, '')")
    cursor.execute("UPDATE core_archiveresult SET output_str = COALESCE(output, '')")
```

### 3. ❌ DON'T Assume Empty Tables Mean Fresh Install

**WRONG**:
```python
cursor.execute("SELECT COUNT(*) FROM core_archiveresult")
if cursor.fetchone()[0] == 0:
    return  # Skip migration
```

**Why**: Fresh installs run migrations 0001-0022 which CREATE empty tables with old schema. Migration 0023 must still upgrade the schema even if tables are empty!

**RIGHT**: Detect schema version by checking column names:
```python
cols = get_table_columns('core_archiveresult')
has_extractor = 'extractor' in cols
if has_extractor:
    # Old schema - needs upgrade
```

### 4. ❌ DON'T Run Migrations from Data Directories

**WRONG**:
```bash
cd /path/to/data/dir
python manage.py makemigrations
```

**RIGHT**:
```bash
cd archivebox/  # The archivebox package directory
./manage.py makemigrations
```

### 5. ❌ DON'T Use WHERE Clauses to Skip SQL Selects

**WRONG**:
```sql
INSERT INTO new_table SELECT uuid FROM old_table
WHERE EXISTS (SELECT 1 FROM pragma_table_info('old_table') WHERE name='uuid');
```

**Why**: SQLite still evaluates the `uuid` column reference even if WHERE clause is false, causing "no such column" errors.

**RIGHT**: Use Python to detect schema, then run appropriate SQL:
```python
if 'uuid' in get_table_columns('old_table'):
    cursor.execute("INSERT INTO new_table SELECT uuid FROM old_table")
else:
    cursor.execute("INSERT INTO new_table SELECT abid as uuid FROM old_table")
```

### 6. ❌ DON'T Mix UUID and INTEGER for Tag IDs

v0.8.6rc0 has Tag.id as UUID, but v0.9.0 needs INTEGER. The conversion must:
1. Create mapping of old UUID → new INTEGER
2. Update core_tag with new IDs
3. Update core_snapshot_tags with new tag_id values

See `core/migrations/0023_upgrade_to_0_9_0.py` PART 3 for the correct approach.

### 7. ❌ DON'T Forget SeparateDatabaseAndState

When you manually change the database with SQL, you MUST tell Django what the final state is:

```python
migrations.SeparateDatabaseAndState(
    database_operations=[
        migrations.RunPython(my_sql_function),
    ],
    state_operations=[
        migrations.RemoveField('archiveresult', 'extractor'),
        migrations.RemoveField('archiveresult', 'output'),
    ],
)
```

Without `state_operations`, Django won't know the old fields are gone and `makemigrations --check` will show unreflected changes.

### 8. ✅ DO Print Debug Messages

```python
print(f'Migrating ArchiveResult from v0.7.2 schema...')
print(f'DEBUG: has_uuid={has_uuid}, has_abid={has_abid}, row_count={row_count}')
```

This helps diagnose which migration path is being taken.

### 9. ✅ DO Test All Three Scenarios

Always test:
1. Fresh install (empty database)
2. v0.7.2 upgrade (12 snapshots, 44 archiveresults, 2 tags)
3. v0.8.6rc0 upgrade (14 snapshots, 0 archiveresults, multiple tags with UUIDs)

### 10. ✅ DO Verify No Unreflected Migrations

After all changes:
```bash
cd archivebox/
./manage.py makemigrations --check
# Should output: No changes detected
```

## Current Status

As of 2025-01-01, migrations have these issues:

1. ✅ Fresh install works
2. ✅ v0.7.2 → v0.9.0 migration runs without errors
3. ✅ v0.8.6rc0 → v0.9.0 migration runs without errors
4. ❌ **DATA IS LOST**: `extractor` → `plugin` field data not copied
5. ❌ **DATA IS LOST**: `output` → `output_str` field data not copied
6. ❌ Timestamps (added/updated → bookmarked_at/created_at/modified_at) may have wrong values
7. ❌ Tag relationships may be broken after UUID → INTEGER conversion

## Files That Need Fixing

1. **core/migrations/0023_upgrade_to_0_9_0.py**
   - Line 42-58: CREATE TABLE should use OLD field names (extractor, output, added, updated)
   - Lines 64-88: INSERT SELECT should just copy data as-is, no field renaming yet
   - Remove all references to plugin, output_str, bookmarked_at, created_at - these are added by 0025

2. **core/migrations/0025_...py**
   - Add RunPython operation AFTER all AddField operations
   - This RunPython should copy: extractor→plugin, output→output_str, added→bookmarked_at/created_at, updated→modified_at
   - Fix syntax error on line 28: `{extractor" in cols}` → `{"extractor" in cols}`

3. **crawls/migrations/0002_upgrade_from_0_8_6.py**
   - Already correctly handles conditional upgrade based on schema detection
   - No changes needed if crawls table data isn't critical

## Next Steps

1. Fix core/migrations/0023 to preserve OLD field names
2. Fix core/migrations/0025 to copy data from old → new fields after AddField
3. Remove debug print statements (lines with `print(f'DEBUG:...`)
4. Test all three scenarios
5. Verify data integrity with SQL queries above
6. Run `./manage.py makemigrations --check` to ensure no unreflected changes

## Reference: Field Mappings

| Old Field (v0.7.2/v0.8.6) | New Field (v0.9.0) | Notes |
|---------------------------|-------------------|--------|
| `extractor` | `plugin` | Rename |
| `output` | `output_str` | Rename |
| `added` | `bookmarked_at` | Rename + also use for `created_at` |
| `updated` | `modified_at` | Rename |
| `abid` | `uuid` | v0.8.6 only, field rename |
| Tag.id (UUID) | Tag.id (INTEGER) | v0.8.6 only, type conversion |
| `seed_id` | `urls` | Crawl table, v0.8.6 only |
| `persona` (VARCHAR) | `persona_id` (UUID FK) | Crawl table, v0.8.6 only |

## Testing Checklist

- [ ] Fresh install creates correct schema
- [ ] Fresh install has 0 snapshots, 0 archiveresults
- [ ] v0.7.2 migration preserves all 12 snapshots
- [ ] v0.7.2 migration preserves all 44 archiveresults
- [ ] v0.7.2 migration preserves all 2 tags
- [ ] v0.7.2 migration copies `extractor` → `plugin` (check first 5 rows)
- [ ] v0.7.2 migration copies `output` → `output_str` (check first 5 rows)
- [ ] v0.7.2 migration copies `added` → `bookmarked_at` (compare timestamps)
- [ ] v0.7.2 migration copies `updated` → `modified_at` (compare timestamps)
- [ ] v0.8.6 migration preserves all 14 snapshots
- [ ] v0.8.6 migration converts Tag IDs from UUID → INTEGER
- [ ] v0.8.6 migration preserves tag relationships in core_snapshot_tags
- [ ] v0.8.6 migration converts `abid` → `uuid` field
- [ ] `./manage.py makemigrations --check` shows no changes
- [ ] All migrations run without errors
- [ ] `archivebox status` shows correct snapshot/link counts
more migrations fixes attempts 2025-12-31 17:44:55 -08:00			`# TODO: Fix Migration Path for v0.7.2/v0.8.6rc0 → v0.9.0`

			`## Critical Issue`

			`The migrations currently LOSE DATA during the v0.7.2 → v0.9.0 upgrade:`
			- `extractor` field data is not being copied to `plugin` field
			- `output` field data is not being copied to `output_str` field
			- Timestamp fields (`added`, `updated`) may not be properly transformed
			`- Tag UUID → INTEGER conversion may lose FK relationships`

			`## Test Database Locations`

			`Sample databases for testing are available at:`
			```
			`/Users/squash/Local/Code/archiveboxes/archivebox-migration-path/archivebox-v0.7.2/data/index.sqlite3`
			`/Users/squash/Local/Code/archiveboxes/archivebox-migration-path/archivebox-v0.8.6rc0/data/index.sqlite3`
			```

			`Schema comparison reports:`
			```
			`/tmp/schema_comparison_report.md`
			`/tmp/table_presence_matrix.md`
			```

			`## How to Test Migrations`

			`### 1. Fresh Install Test`
			```bash
			`rm -rf /tmp/test_fresh && mkdir -p /tmp/test_fresh`
			`DATA_DIR=/tmp/test_fresh python -m archivebox init`
			`DATA_DIR=/tmp/test_fresh python -m archivebox status`
			```

			`### 2. v0.7.2 Migration Test`
			```bash
			`rm -rf /tmp/test_v072 && mkdir -p /tmp/test_v072`
			`cp /Users/squash/Local/Code/archiveboxes/archivebox-migration-path/archivebox-v0.7.2/data/index.sqlite3 /tmp/test_v072/`
			`DATA_DIR=/tmp/test_v072 python -m archivebox init`
			`DATA_DIR=/tmp/test_v072 python -m archivebox status`
			```

			`### 3. v0.8.6rc0 Migration Test`
			```bash
			`rm -rf /tmp/test_v086 && mkdir -p /tmp/test_v086`
			`cp /Users/squash/Local/Code/archiveboxes/archivebox-migration-path/archivebox-v0.8.6rc0/data/index.sqlite3 /tmp/test_v086/`
			`DATA_DIR=/tmp/test_v086 python -m archivebox init`
			`DATA_DIR=/tmp/test_v086 python -m archivebox status`
			```

			`### 4. Verify Data Integrity`

			`After each test, compare original vs migrated data:`

			```bash
			`# Check ArchiveResult data preservation`
			`echo "=== ORIGINAL ==="`
			`sqlite3 /path/to/original.db "SELECT id, extractor, output, status FROM core_archiveresult LIMIT 5;"`

			`echo "=== MIGRATED ==="`
			`sqlite3 /tmp/test_vXXX/index.sqlite3 "SELECT id, plugin, output_str, status FROM core_archiveresult LIMIT 5;"`

			`# Check Snapshot data preservation`
			`echo "=== ORIGINAL SNAPSHOTS ==="`
			`sqlite3 /path/to/original.db "SELECT id, url, title, added, updated FROM core_snapshot LIMIT 5;"`

			`echo "=== MIGRATED SNAPSHOTS ==="`
			`sqlite3 /tmp/test_vXXX/index.sqlite3 "SELECT id, url, title, bookmarked_at, created_at, modified_at FROM core_snapshot LIMIT 5;"`

			`# Check Tag data preservation`
			`echo "=== ORIGINAL TAGS ==="`
			`sqlite3 /path/to/original.db "SELECT * FROM core_tag;"`

			`echo "=== MIGRATED TAGS ==="`
			`sqlite3 /tmp/test_vXXX/index.sqlite3 "SELECT * FROM core_tag;"`

			`# Check snapshot-tag relationships`
			`sqlite3 /tmp/test_vXXX/index.sqlite3 "SELECT COUNT(*) FROM core_snapshot_tags;"`
			```

			`CRITICAL: Verify:`
			`- Row counts match`
			`- All URLs, titles, timestamps are preserved`
			`- All extractor values are copied to plugin field`
			`- All output values are copied to output_str field`
			`- All tag relationships are maintained (tag IDs should be converted from UUID to INTEGER for v0.8.6)`

			`## Migration Philosophy`

			`### Principle: Minimal Manual SQL`

			`Use this approach for complex migrations:`

			`1. Python: Detect existing schema version`
			```python
			`def get_table_columns(table_name):`
			`cursor = connection.cursor()`
			`cursor.execute(f"PRAGMA table_info({table_name})")`
			`return {row[1] for row in cursor.fetchall()}`

			`cols = get_table_columns('core_archiveresult')`
			`has_extractor = 'extractor' in cols`
			`has_plugin = 'plugin' in cols`
			```

			`2. SQL: Modify database structure during migration`
			```sql
			`CREATE TABLE core_archiveresult_new (...);`
			`INSERT INTO core_archiveresult_new SELECT ... FROM core_archiveresult;`
			`DROP TABLE core_archiveresult;`
			`ALTER TABLE core_archiveresult_new RENAME TO core_archiveresult;`
			```

			`3. Python: Copy data between old and new field names`
			```python
			`if 'extractor' in cols and 'plugin' in cols:`
			`cursor.execute("UPDATE core_archiveresult SET plugin = COALESCE(extractor, '')")`
			```

			`4. SQL: Drop old columns/tables`
			```sql
			`-- Django's RemoveField will handle this`
			```

			`5. Django: Register the end state so Django knows what the schema should be`
			```python
			`migrations.SeparateDatabaseAndState(`
			`database_operations=[...], # Your SQL/Python migrations`
			`state_operations=[...] # Tell Django what the final schema looks like`
			`)`
			```

			`### Key Files`

			`- core/migrations/0023_upgrade_to_0_9_0.py: Raw SQL migration that upgrades tables from v0.7.2/v0.8.6 schema`
			`- Should create NEW tables with OLD field names (extractor, output, added, updated)`
			`- Should preserve ALL data during table rebuild`
			`- Should NOT add new fields yet (let Django migrations handle that)`

			`- core/migrations/0025_alter_archiveresult_options_...py: Django-generated migration`
			`- Adds new fields (plugin, output_str, bookmarked_at, created_at, etc.)`
			`- Should include RunPython to copy data from old fields to new fields AFTER AddField operations`
			`- RemoveField operations to remove old columns`

			`- crawls/migrations/0002_upgrade_from_0_8_6.py: Handles crawls_crawl table upgrade`
			- v0.8.6 has `seed_id` + `persona` (VARCHAR)
			- v0.9.0 has `urls` + `persona_id` (UUID FK)

			`## How to Make vs Apply Migrations`

			`### Making Migrations (Creating New Migrations)`

			`Always run from the archivebox/ subdirectory (NOT from a data dir):`

			```bash
			`cd archivebox/`
			`./manage.py makemigrations`
			`./manage.py makemigrations --check # Verify no unreflected changes`
			```

			This works because `archivebox/manage.py` has:
			```python
			`os.environ.setdefault('ARCHIVEBOX_DATA_DIR', '.')`
			```

			`### Applying Migrations (Testing Migrations)`

			Always run from inside a data directory using `archivebox init`:

			```bash
			`# WRONG - Don't do this:`
			`cd /some/data/dir`
			`../path/to/archivebox/manage.py migrate`

			`# RIGHT - Do this:`
			`DATA_DIR=/some/data/dir python -m archivebox init`
			```

			Why? Because `archivebox init`:
			`- Sets up the data directory structure`
			`- Runs migrations with proper DATA_DIR context`
			`- Creates necessary files and folders`
			`- Validates the installation`

			`## Schema Version Differences`

			`### v0.7.2 Schema (Migration 0022)`
			- ArchiveResult: `id` (INTEGER), `uuid`, `extractor`, `output`, `cmd`, `pwd`, `cmd_version`, `start_ts`, `end_ts`, `status`, `snapshot_id`
			- Snapshot: `id`, `url`, `timestamp`, `title`, `added`, `updated`, `crawl_id`
			- Tag: `id` (INTEGER), `name`, `slug`
			`- Crawl: Doesn't exist in v0.7.2`

			`### v0.8.6rc0 Schema`
			- ArchiveResult: `id`, `abid` (not uuid!), `extractor`, `output`, `created_at`, `modified_at`, `retry_at`, `status`, ...
			- Snapshot: `id`, `url`, `bookmarked_at`, `created_at`, `modified_at`, `crawl_id`, `status`, `retry_at`, ...
			- Tag: `id` (UUID/CHAR!), `name`, `slug`, `abid`, `created_at`, `modified_at`, `created_by_id`
			- Crawl: `id`, `seed_id`, `persona` (VARCHAR), `max_depth`, `tags_str`, `status`, `retry_at`, ...

			`### v0.9.0 Target Schema`
			- ArchiveResult: `id` (INTEGER), `uuid`, `plugin` (not extractor!), `output_str` (not output!), `hook_name`, `created_at`, `modified_at`, `output_files`, `output_json`, `output_size`, `output_mimetypes`, `retry_at`, ...
			- Snapshot: `id`, `url`, `bookmarked_at` (not added!), `created_at`, `modified_at` (not updated!), `crawl_id`, `parent_snapshot_id`, `status`, `retry_at`, `current_step`, `depth`, `fs_version`, ...
			- Tag: `id` (INTEGER!), `name`, `slug`, `created_at`, `modified_at`, `created_by_id`
			- Crawl: `id`, `urls` (not seed_id!), `persona_id` (not persona!), `label`, `notes`, `output_dir`, ...

			`## Critical Gotchas and Mistakes to Avoid`

			`### 1. ❌ DON'T Create New Fields in SQL Migration (0023)`

			`WRONG:`
			```python
			`# In core/migrations/0023_upgrade_to_0_9_0.py`
			`cursor.execute("""`
			`CREATE TABLE core_archiveresult_new (`
			`id INTEGER PRIMARY KEY,`
			`plugin VARCHAR(32), # ❌ New field!`
			`output_str TEXT, # ❌ New field!`
			`...`
			`)`
			`""")`
			```

			`RIGHT:`
			```python
			`# In core/migrations/0023_upgrade_to_0_9_0.py - Keep OLD field names!`
			`cursor.execute("""`
			`CREATE TABLE core_archiveresult_new (`
			`id INTEGER PRIMARY KEY,`
			`extractor VARCHAR(32), # ✓ OLD field name`
			`output VARCHAR(1024), # ✓ OLD field name`
			`...`
			`)`
			`""")`
			```

			`Why: If you create new fields in SQL, Django's AddField operation in migration 0025 will overwrite them with default values, losing your data!`

			`### 2. ❌ DON'T Copy Data in SQL Migration`

			`WRONG:`
			```python
			`# In core/migrations/0023`
			`cursor.execute("""`
			`INSERT INTO core_archiveresult_new (plugin, output_str, ...)`
			`SELECT COALESCE(extractor, ''), COALESCE(output, ''), ...`
			`FROM core_archiveresult`
			`""")`
			```

			`RIGHT: Keep old field names in SQL, let Django AddField create new columns, then copy:`
			```python
			`# In core/migrations/0025 (AFTER AddField operations)`
			`def copy_old_to_new(apps, schema_editor):`
			`cursor = connection.cursor()`
			`cursor.execute("UPDATE core_archiveresult SET plugin = COALESCE(extractor, '')")`
			`cursor.execute("UPDATE core_archiveresult SET output_str = COALESCE(output, '')")`
			```

			`### 3. ❌ DON'T Assume Empty Tables Mean Fresh Install`

			`WRONG:`
			```python
			`cursor.execute("SELECT COUNT(*) FROM core_archiveresult")`
			`if cursor.fetchone()[0] == 0:`
			`return # Skip migration`
			```

			`Why: Fresh installs run migrations 0001-0022 which CREATE empty tables with old schema. Migration 0023 must still upgrade the schema even if tables are empty!`

			`RIGHT: Detect schema version by checking column names:`
			```python
			`cols = get_table_columns('core_archiveresult')`
			`has_extractor = 'extractor' in cols`
			`if has_extractor:`
			`# Old schema - needs upgrade`
			```

			`### 4. ❌ DON'T Run Migrations from Data Directories`

			`WRONG:`
			```bash
			`cd /path/to/data/dir`
			`python manage.py makemigrations`
			```

			`RIGHT:`
			```bash
			`cd archivebox/ # The archivebox package directory`
			`./manage.py makemigrations`
			```

			`### 5. ❌ DON'T Use WHERE Clauses to Skip SQL Selects`

			`WRONG:`
			```sql
			`INSERT INTO new_table SELECT uuid FROM old_table`
			`WHERE EXISTS (SELECT 1 FROM pragma_table_info('old_table') WHERE name='uuid');`
			```

			Why: SQLite still evaluates the `uuid` column reference even if WHERE clause is false, causing "no such column" errors.

			`RIGHT: Use Python to detect schema, then run appropriate SQL:`
			```python
			`if 'uuid' in get_table_columns('old_table'):`
			`cursor.execute("INSERT INTO new_table SELECT uuid FROM old_table")`
			`else:`
			`cursor.execute("INSERT INTO new_table SELECT abid as uuid FROM old_table")`
			```

			`### 6. ❌ DON'T Mix UUID and INTEGER for Tag IDs`

			`v0.8.6rc0 has Tag.id as UUID, but v0.9.0 needs INTEGER. The conversion must:`
			`1. Create mapping of old UUID → new INTEGER`
			`2. Update core_tag with new IDs`
			`3. Update core_snapshot_tags with new tag_id values`

			See `core/migrations/0023_upgrade_to_0_9_0.py` PART 3 for the correct approach.

			`### 7. ❌ DON'T Forget SeparateDatabaseAndState`

			`When you manually change the database with SQL, you MUST tell Django what the final state is:`

			```python
			`migrations.SeparateDatabaseAndState(`
			`database_operations=[`
			`migrations.RunPython(my_sql_function),`
			`],`
			`state_operations=[`
			`migrations.RemoveField('archiveresult', 'extractor'),`
			`migrations.RemoveField('archiveresult', 'output'),`
			`],`
			`)`
			```

			Without `state_operations`, Django won't know the old fields are gone and `makemigrations --check` will show unreflected changes.

			`### 8. ✅ DO Print Debug Messages`

			```python
			`print(f'Migrating ArchiveResult from v0.7.2 schema...')`
			`print(f'DEBUG: has_uuid={has_uuid}, has_abid={has_abid}, row_count={row_count}')`
			```

			`This helps diagnose which migration path is being taken.`

			`### 9. ✅ DO Test All Three Scenarios`

			`Always test:`
			`1. Fresh install (empty database)`
			`2. v0.7.2 upgrade (12 snapshots, 44 archiveresults, 2 tags)`
			`3. v0.8.6rc0 upgrade (14 snapshots, 0 archiveresults, multiple tags with UUIDs)`

			`### 10. ✅ DO Verify No Unreflected Migrations`

			`After all changes:`
			```bash
			`cd archivebox/`
			`./manage.py makemigrations --check`
			`# Should output: No changes detected`
			```

			`## Current Status`

			`As of 2025-01-01, migrations have these issues:`

			`1. ✅ Fresh install works`
			`2. ✅ v0.7.2 → v0.9.0 migration runs without errors`
			`3. ✅ v0.8.6rc0 → v0.9.0 migration runs without errors`
			4. ❌ DATA IS LOST: `extractor` → `plugin` field data not copied
			5. ❌ DATA IS LOST: `output` → `output_str` field data not copied
			`6. ❌ Timestamps (added/updated → bookmarked_at/created_at/modified_at) may have wrong values`
			`7. ❌ Tag relationships may be broken after UUID → INTEGER conversion`

			`## Files That Need Fixing`

			`1. core/migrations/0023_upgrade_to_0_9_0.py`
			`- Line 42-58: CREATE TABLE should use OLD field names (extractor, output, added, updated)`
			`- Lines 64-88: INSERT SELECT should just copy data as-is, no field renaming yet`
			`- Remove all references to plugin, output_str, bookmarked_at, created_at - these are added by 0025`

			`2. core/migrations/0025_...py`
			`- Add RunPython operation AFTER all AddField operations`
			`- This RunPython should copy: extractor→plugin, output→output_str, added→bookmarked_at/created_at, updated→modified_at`
			- Fix syntax error on line 28: `{extractor" in cols}` → `{"extractor" in cols}`

			`3. crawls/migrations/0002_upgrade_from_0_8_6.py`
			`- Already correctly handles conditional upgrade based on schema detection`
			`- No changes needed if crawls table data isn't critical`

			`## Next Steps`

			`1. Fix core/migrations/0023 to preserve OLD field names`
			`2. Fix core/migrations/0025 to copy data from old → new fields after AddField`
			3. Remove debug print statements (lines with `print(f'DEBUG:...`)
			`4. Test all three scenarios`
			`5. Verify data integrity with SQL queries above`
			6. Run `./manage.py makemigrations --check` to ensure no unreflected changes

			`## Reference: Field Mappings`

			`\| Old Field (v0.7.2/v0.8.6) \| New Field (v0.9.0) \| Notes \|`
			`\|---------------------------\|-------------------\|--------\|`
			\| `extractor` \| `plugin` \| Rename \|
			\| `output` \| `output_str` \| Rename \|
			\| `added` \| `bookmarked_at` \| Rename + also use for `created_at` \|
			\| `updated` \| `modified_at` \| Rename \|
			\| `abid` \| `uuid` \| v0.8.6 only, field rename \|
			`\| Tag.id (UUID) \| Tag.id (INTEGER) \| v0.8.6 only, type conversion \|`
			\| `seed_id` \| `urls` \| Crawl table, v0.8.6 only \|
			\| `persona` (VARCHAR) \| `persona_id` (UUID FK) \| Crawl table, v0.8.6 only \|

			`## Testing Checklist`

			`- [ ] Fresh install creates correct schema`
			`- [ ] Fresh install has 0 snapshots, 0 archiveresults`
			`- [ ] v0.7.2 migration preserves all 12 snapshots`
			`- [ ] v0.7.2 migration preserves all 44 archiveresults`
			`- [ ] v0.7.2 migration preserves all 2 tags`
			- [ ] v0.7.2 migration copies `extractor` → `plugin` (check first 5 rows)
			- [ ] v0.7.2 migration copies `output` → `output_str` (check first 5 rows)
			- [ ] v0.7.2 migration copies `added` → `bookmarked_at` (compare timestamps)
			- [ ] v0.7.2 migration copies `updated` → `modified_at` (compare timestamps)
			`- [ ] v0.8.6 migration preserves all 14 snapshots`
			`- [ ] v0.8.6 migration converts Tag IDs from UUID → INTEGER`
			`- [ ] v0.8.6 migration preserves tag relationships in core_snapshot_tags`
			- [ ] v0.8.6 migration converts `abid` → `uuid` field
			- [ ] `./manage.py makemigrations --check` shows no changes
			`- [ ] All migrations run without errors`
			- [ ] `archivebox status` shows correct snapshot/link counts