Unified Activity Data Model

Replace the current HealthKit-centric schema (workouts, workout_statistics, heart_rate_samples, workout_routes, route_locations, workout_events, workout_activities, sync_state) with the SYNTHESIS schema. The current database has only test data and can be nuked (SYNTHESIS §12).

Tables to implement (Drizzle ORM, from SYNTHESIS §7)

Core:

  • athletes — bridges better-auth user to all activity data (1:N user→athletes)
  • activities — replaces workouts, adds canonical type system, self-referencing parent_id for sessions, multi-source support (source + external_id)
  • activity_statistics — replaces workout_statistics, key-value for sport-specific metrics
  • activity_streams — replaces heart_rate_samples, workout_routes, route_locations; Postgres rows hold only metadata that points to Parquet files on S3/R2/MinIO
  • activity_events — replaces workout_events
  • sync_states — replaces sync_state, now per-athlete per-source
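
To illustrate how the self-referencing parent_id and the (source, external_id) pair on activities might be used, here is a minimal TypeScript sketch. It is not the SYNTHESIS schema: only parent_id, source, external_id and the activity type slug come from this ticket. The ActivityRow shape, the helper names, the sample IDs and the "gym_session" slug are assumptions for illustration.

```typescript
// Hypothetical row shape. Only parent_id, source, external_id and the
// canonical type slug are named in the ticket; the rest is assumed.
interface ActivityRow {
  id: string;
  athleteId: string;
  parentId: string | null; // self-reference: null for top-level activities
  source: string;          // e.g. "healthkit"
  externalId: string;      // the provider's own identifier
  activityType: string;    // canonical slug
}

interface SessionNode {
  activity: ActivityRow;
  children: ActivityRow[];
}

// Key for spotting a re-import of the same provider record.
function sourceKey(row: Pick<ActivityRow, "source" | "externalId">): string {
  return `${row.source}:${row.externalId}`;
}

// Group a flat list into top-level activities with their direct children.
// Rows whose parent is not in the list are skipped in this sketch.
function groupSessions(rows: ActivityRow[]): SessionNode[] {
  const nodes = new Map<string, SessionNode>();
  for (const row of rows) {
    if (row.parentId === null) {
      nodes.set(row.id, { activity: row, children: [] });
    }
  }
  for (const row of rows) {
    if (row.parentId !== null) {
      const parent = nodes.get(row.parentId);
      if (parent !== undefined) parent.children.push(row);
    }
  }
  return Array.from(nodes.values());
}

// Sample data: one gym session with two sub-activities, plus a standalone run.
// "gym_session" is a made-up slug used only for this example.
const sampleRows: ActivityRow[] = [
  { id: "a1", athleteId: "ath1", parentId: null, source: "healthkit", externalId: "hk-100", activityType: "gym_session" },
  { id: "a2", athleteId: "ath1", parentId: "a1", source: "healthkit", externalId: "hk-101", activityType: "bouldering" },
  { id: "a3", athleteId: "ath1", parentId: "a1", source: "healthkit", externalId: "hk-102", activityType: "running" },
  { id: "a4", athleteId: "ath1", parentId: null, source: "healthkit", externalId: "hk-103", activityType: "running" },
];

const sampleSessions = groupSessions(sampleRows);
```

In a real schema the (source, external_id) pair would more likely be enforced by a unique constraint than by a string key; the helper only shows the idea.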

Reference data:

  • activity-types.json seed file in packages/seed-data/ — canonical slugs, names, categories for ~80 activity types
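
A seed file like this benefits from a small sanity check before it is loaded. The sketch below assumes each entry carries exactly the three fields this ticket mentions (slug, name, category) and that slugs are lowercase snake_case. Both the field names and the slug pattern are assumptions, not taken from SYNTHESIS.

```typescript
// Hypothetical shape of one entry in activity-types.json.
interface ActivityTypeSeed {
  slug: string;
  name: string;
  category: string;
}

// Returns a list of human-readable problems; an empty list means the
// entries passed every check in this sketch.
function validateActivityTypes(entries: ActivityTypeSeed[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  const slugPattern = /^[a-z][a-z0-9_]*$/; // assumed convention: lowercase snake_case

  for (const entry of entries) {
    if (!slugPattern.test(entry.slug)) {
      problems.push(`invalid slug: ${entry.slug}`);
    }
    if (seen.has(entry.slug)) {
      problems.push(`duplicate slug: ${entry.slug}`);
    }
    seen.add(entry.slug);
    if (entry.name.trim() === "") {
      problems.push(`missing name for: ${entry.slug}`);
    }
    if (entry.category.trim() === "") {
      problems.push(`missing category for: ${entry.slug}`);
    }
  }
  return problems;
}

// Example entries; the category values are invented for illustration.
const goodSeed: ActivityTypeSeed[] = [
  { slug: "running", name: "Running", category: "cardio" },
  { slug: "bouldering", name: "Bouldering", category: "climbing" },
];

const badSeed: ActivityTypeSeed[] = [
  { slug: "running", name: "Running", category: "cardio" },
  { slug: "running", name: "Running (again)", category: "cardio" },
  { slug: "Trail Running", name: "", category: "cardio" },
];
```

Calling validateActivityTypes(goodSeed) returns an empty list, while badSeed yields one duplicate-slug problem, one invalid-slug problem and one missing-name problem.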

Does NOT include (deferred): exercise_sets, exercise_templates, climb_logs, equipment*, workout_plans, activity_intervals, sleep_sessions, sleep_streams, health_samples_staging, health_streams, athlete_metrics, training_zones, sport_zone_priorities, source_priorities.

Acceptance criteria

  • Drizzle schema matches SYNTHESIS §7 for all core tables listed above
  • bun run db:push applies cleanly
  • oRPC contract updated for activities (not workouts)
  • Old workouts-based routes removed or redirected
  • packages/seed-data/activity-types.json exists with Tier 1 + Tier 2 types from SYNTHESIS §8

Key design decisions (from SYNTHESIS)

  • Use "activity" terminology (not "workout") — avoids collision with planned/prescribed sessions
  • String slugs for activity types (not integer enums) — 'running', 'bouldering', etc.
  • Store both canonical slug (activity_type) AND original provider type (source_activity_type) for lossless round-tripping
  • source_data JSONB — raw provider response, never modified after ingest
  • Self-referencing parent_id for session grouping (gym session = parent, sub-activities = children)
  • Unit conventions: distance=meters, duration=seconds, calories=kcal, HR=bpm, speed=m/s, coordinates=decimal degrees
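
The decisions above can be pictured as a single normalization step at ingest. The following TypeScript sketch uses a made-up provider payload (ProviderWorkout, with kilometres, minutes and km/h) and a made-up mapping table. None of these names come from SYNTHESIS, and the "other" fallback slug is an assumption. It shows the canonical slug stored next to the original provider type, the raw payload kept untouched, and values converted to meters, seconds and m/s.

```typescript
// Hypothetical provider payload; field names and units are assumptions.
interface ProviderWorkout {
  id: string;
  type: string;        // the provider's own type label
  distanceKm: number;
  durationMin: number;
  avgSpeedKmh: number;
}

interface NormalizedActivity {
  source: string;
  externalId: string;
  activityType: string;       // canonical slug
  sourceActivityType: string; // original provider type, kept verbatim
  distanceMeters: number;
  durationSeconds: number;
  avgSpeedMps: number;
  sourceData: Readonly<ProviderWorkout>; // raw payload, frozen after ingest
}

// Assumed mapping table; real mappings would come from the type catalog.
const slugByProviderType = new Map<string, string>([
  ["Running", "running"],
  ["Bouldering", "bouldering"],
]);

function normalizeWorkout(source: string, raw: ProviderWorkout): NormalizedActivity {
  return {
    source,
    externalId: raw.id,
    // Unknown provider types fall back to "other" (an assumption of this sketch).
    activityType: slugByProviderType.get(raw.type) ?? "other",
    sourceActivityType: raw.type,
    distanceMeters: raw.distanceKm * 1000,
    durationSeconds: raw.durationMin * 60,
    avgSpeedMps: raw.avgSpeedKmh / 3.6,
    // Copy and freeze so later code cannot mutate the stored raw payload.
    sourceData: Object.freeze({ ...raw }),
  };
}

const normalizedRun = normalizeWorkout("example-provider", {
  id: "x1",
  type: "Running",
  distanceKm: 5,
  durationMin: 30,
  avgSpeedKmh: 10,
});
```

Keeping sourceActivityType alongside the slug means a provider type with no canonical mapping yet is still recoverable from the stored row.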

Blocks

#4, #3, #5, #16, #20, #21, #23, #24, #25, #27

Reference

See docs/api-research/SYNTHESIS.md §2 (Users vs Athletes), §3 (Activity Types), §4 (Sessions), §5 (Promoted Fields), §7 (Schema), §8 (Activity Type Catalog), §9 (Unit Conventions).

Status: Done
Priority: Top
Platform: API
Date: 1 month ago