From 6c7854edd4a714527e226390a069464aa8084532 Mon Sep 17 00:00:00 2001
From: Daniel Bulant <danbulant@gmail.com>
Date: Sat, 20 Jun 2026 22:51:51 +0200
Subject: [PATCH] gen docs

---
 api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md | 374 ++++++++++++++++++
 api/POSSIBLE_QUESTIONS.md                     |  81 ++++
 2 files changed, 455 insertions(+)
 create mode 100644 api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md
 create mode 100644 api/POSSIBLE_QUESTIONS.md

diff --git a/api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md b/api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md
new file mode 100644
index 0000000..f9ab904
--- /dev/null
+++ b/api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md
@@ -0,0 +1,374 @@
+# Party Analysis and Question Generation Algorithm
+
+This document describes how the API analyzes a party's Spotify data and turns that analysis into quiz questions.
+
+Relevant implementation files:
+
+- `src/workflows/party-analysis.ts` — computes and stores `party.analysisData`.
+- `src/workflows/quiz.ts` — starts analysis, runs the quiz loop, and scores answers.
+- `src/party/question-generator.ts` — chooses a question type and attaches a song.
+- `src/party/audio-question-generator.ts` — builds audio metadata choice questions.
+- `src/party/social-question-generator.ts` — builds social choice questions.
+- `src/party/numeric-question-generator.ts` — builds numeric questions.
+- `src/party/question-utils.ts` — shared fairness, deduplication, option, and song selection logic.
+
+## High-level flow
+
+```mermaid
+flowchart TD
+    A[Quiz starts] --> B[Analyze party]
+    B --> C[Store party.analysisData]
+    C --> D[Initialize quiz state]
+    D --> E[Generate next question]
+    E --> F[Publish current question in party data]
+    F --> G[Wait for player answers or timeout]
+    G --> H[Score round]
+    H --> I[Review period]
+    I --> J{More questions?}
+    J -->|Yes| E
+    J -->|No| K[Show results and mark party ended]
+```
+
+When a quiz starts, `QuizWorkflow.startQuiz` first runs `partyAnalysisWorkflow.analyzeParty(partyId)`. The generated analysis is saved to the `party.analysisData` JSON column and then reused for every question in the quiz.
+
+The quiz currently asks up to `TOTAL_QUESTIONS = 5` questions. Each question has a 60-second answer window, followed by a 5-second review period.
+
+## Party analysis
+
+Party analysis converts each member's listening data into comparable track, artist, and genre scores. The result is a compact party-level summary designed for fast question generation.
+
+### 1. Minimum party size
+
+If a party has fewer than 2 members, analysis is saved as empty:
+
+- `storyClusters: []`
+- `pairwise: []`
+- `groupSummary.totalMembers`
+- `groupSummary.mostSharedGenres: []`
+- `groupSummary.mostDiverseMember: null`
+- `groupSummary.mostAlignedPair: null`
+- `memberProfiles: []`
+
+The workflow returns `analyzed: false` in that case.
+
+### 2. Per-member scoring
+
+For each party member, the analysis workflow fetches several Spotify-derived tables and accumulates scores into three maps:
+
+- tracks
+- artists
+- genres
+
+The score inputs are:
+
+| Source | Scoring |
+| --- | --- |
+| Medium-term top tracks | `MAX_POSITION - position + 1`, with `MAX_POSITION = 50` |
+| Saved tracks | track `+10`, artists `+5`, genres `+2.5` |
+| Playback history | track `+5` if played in the last 24h, `+3` if played in the last week, otherwise `+1`; artists get half of the track score, genres get a quarter |
+| Medium-term top artists | `MAX_POSITION - position + 1` |
+| Followed artists | artist `+10`, genres `+10` |
+| Saved albums | album artists `+5`, genres `+2.5` |
+
+Top track and top artist scores are position-weighted, so rank 1 contributes more than rank 50. Saved and followed items add fixed preference signals. Playback history adds recency-weighted listening signals.
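+
+The position and recency rules in the table can be sketched as below. This is a minimal illustration, not the workflow's code: the helper names are hypothetical, positions are assumed to be 1-based, and the inclusive 24h and 7-day boundaries are a guess.
+
+```ts
+// Only MAX_POSITION and the point values come from the table above;
+// the function names and boundary handling are assumptions.
+const MAX_POSITION = 50;
+const DAY_MS = 24 * 60 * 60 * 1000;
+
+// Position-weighted score: rank 1 scores 50, rank 50 scores 1 (1-based ranks assumed).
+function positionScore(position: number): number {
+  return MAX_POSITION - position + 1;
+}
+
+// Recency-weighted scores for a single playback-history entry.
+function playbackScores(playedAt: number, now: number) {
+  const age = now - playedAt;
+  const track = age <= DAY_MS ? 5 : age <= 7 * DAY_MS ? 3 : 1;
+  // Artists receive half of the track score and genres a quarter.
+  return { track, artist: track / 2, genre: track / 4 };
+}
+```
+
+Under these assumptions a track played an hour ago adds 5 to the track, 2.5 to each of its artists, and 1.25 to each of their genres.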
+
+### 3. Party-level entity maps
+
+After all member scores are computed, the workflow builds one map per entity type:
+
+- `TrackEntityScore`
+- `ArtistEntityScore`
+- `GenreEntityScore`
+
+Each entity contains:
+
+- entity id and display name
+- track artist names and album name, for tracks
+- `memberScores`, the list of members who contributed to the entity, with each member's score
+- `memberCount`, the number of party members represented by that entity
+
+These maps make it possible to tell which songs, artists, and genres are shared by multiple people and which are strongly associated with one person.
+
+### 4. Story clusters
+
+Story clusters group entities by the exact subset of party members that share them.
+
+For example, if Alice and Bob both have the same genre, that genre goes into the cluster keyed by `Alice|Bob`. If all party members share a track, that track goes into the all-members cluster.
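+
+A minimal sketch of this grouping follows. The `Entity` shape is trimmed to the fields needed here, both function names are hypothetical, and sorting member ids before joining them with `|` is an assumption made so that the same subset always produces the same key.
+
+```ts
+type MemberScore = { userId: string; score: number };
+type Entity = { id: string; memberScores: MemberScore[] };
+
+// Build a stable key from the exact set of members attached to an entity.
+function clusterKey(entity: Entity): string {
+  return entity.memberScores
+    .map((member) => member.userId)
+    .sort()
+    .join("|");
+}
+
+// Bucket entities by the member subset that shares them.
+function groupByMembers(entities: Entity[]): Map<string, Entity[]> {
+  const clusters = new Map<string, Entity[]>();
+  for (const entity of entities) {
+    const key = clusterKey(entity);
+    const bucket = clusters.get(key) ?? [];
+    bucket.push(entity);
+    clusters.set(key, bucket);
+  }
+  return clusters;
+}
+```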
+
+Clusters are sorted by:
+
+1. all-members cluster first
+2. larger `memberCount`
+3. total track score in the cluster
+
+Within each cluster, tracks, artists, and genres are sorted by total score descending.
+
+The stored analysis is compacted to:
+
+- top 8 story clusters
+- top 20 tracks, artists, and genres per cluster
+
+### 5. Pairwise similarity
+
+For every pair of party members, the workflow computes:
+
+- `sharedTracks`
+- `sharedArtists`
+- `sharedGenres`
+- `similarity`
+
+Similarity uses a weighted Jaccard-style score across tracks, artists, and genres:
+
+```text
+similarity = sum(min(scoreA, scoreB) for shared entities)
+             / sum(max(scoreA, scoreB) for all entities in either profile)
+```
+
+This rewards members who share high-scoring music preferences, not just raw overlap counts.
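+
+The formula can be sketched in TypeScript as follows. The function name and the `ScoreMap` type are hypothetical, scores are assumed to be non-negative, and how the workflow merges track, artist, and genre scores into one profile (for example with prefixed keys such as `track:...`) is an assumption.
+
+```ts
+// Maps an entity key to one member's score for that entity.
+type ScoreMap = Map<string, number>;
+
+function weightedJaccard(a: ScoreMap, b: ScoreMap): number {
+  let shared = 0; // numerator: sum of min(scoreA, scoreB)
+  let total = 0; // denominator: sum of max(scoreA, scoreB)
+  a.forEach((scoreA, id) => {
+    const scoreB = b.get(id) ?? 0;
+    shared += Math.min(scoreA, scoreB); // 0 when only A has the entity
+    total += Math.max(scoreA, scoreB);
+  });
+  b.forEach((scoreB, id) => {
+    if (!a.has(id)) total += scoreB; // entity only in B: max(0, scoreB)
+  });
+  return total === 0 ? 0 : shared / total;
+}
+```
+
+For profiles `{x: 10, y: 4}` and `{x: 6, z: 2}` the only shared entity contributes `min(10, 6) = 6`, the denominator is `10 + 4 + 2 = 16`, and the similarity is `0.375`.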
+
+The most similar pair becomes `groupSummary.mostAlignedPair`.
+
+### 6. Member profiles and genre diversity
+
+Each member profile stores:
+
+- `userId`
+- `totalScore`, based on track and artist scores
+- `genreScores`
+- `trackCount`
+- `artistCount`
+
+Genre diversity is calculated as entropy over the member's genre score distribution:
+
+```text
+entropy = -sum(p * ln(p))
+```
+
+where `p` is a genre's score divided by the sum of that member's genre scores, so the values of `p` form a probability distribution. The member with the highest entropy becomes `groupSummary.mostDiverseMember`.
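+
+A small sketch of the entropy calculation, assuming each probability is a genre's score normalized by the sum of the member's genre scores and that non-positive scores are skipped. The function name is hypothetical.
+
+```ts
+// Shannon entropy (natural log) over one member's genre scores.
+function genreEntropy(genreScores: Record<string, number>): number {
+  const scores = Object.keys(genreScores)
+    .map((genre) => genreScores[genre])
+    .filter((score) => score > 0);
+  const total = scores.reduce((sum, score) => sum + score, 0);
+  if (total === 0) return 0;
+  let entropy = 0;
+  for (const score of scores) {
+    const p = score / total;
+    entropy -= p * Math.log(p);
+  }
+  return entropy;
+}
+```
+
+A member with a single genre has entropy `0`, while a member split evenly across two genres has entropy `ln(2)`, so broader and more even listening produces a higher value.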
+
+Stored member profiles keep only the top 20 genres by score.
+
+### 7. Most shared genres
+
+The workflow aggregates genre scores across members, then sorts genres by:
+
+1. `memberCount` descending
+2. total genre score descending
+
+It keeps the top 10 genres that are shared by at least 2 members as `groupSummary.mostSharedGenres`.
+
+## Generated analysis shape
+
+The saved `party.analysisData` contains:
+
+```ts
+type PartyAnalysisResult = {
+  storyClusters: StoryCluster[];
+  pairwise: PairwiseComparison[];
+  groupSummary: {
+    totalMembers: number;
+    mostSharedGenres: GenreEntityScore[];
+    mostDiverseMember: GenreDiversity | null;
+    mostAlignedPair: PairwiseComparison | null;
+  };
+  memberProfiles: MemberProfile[];
+};
+```
+
+This JSON is intentionally denormalized and compact so the question generators can work without recomputing party analytics during each round.
+
+## Question generation
+
+Each quiz round calls `generatePartyQuestion`, passing:
+
+- database client
+- party id
+- current `QuizState`
+- saved `party.analysisData`
+- question index
+
+The generator fetches current party members, chooses a question type order, asks each question builder for a valid question, then attaches a suitable song.
+
+### 1. Question type ordering
+
+The possible question types are:
+
+- `audio-metadata`
+- `social`
+- `numeric`
+
+For each round, the generator randomizes their priority using base weights and recent-history penalties:
+
+| Type | Base weight |
+| --- | ---: |
+| `audio-metadata` | `1` |
+| `numeric` | `0.55` |
+| `social` | `0.1` |
+
+A random value up to `0.35` is added. Each occurrence of the same type in the last 3 questions subtracts `0.45`.
+
+This means audio metadata questions are preferred by default, but the generator avoids repeating the same category too often.
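+
+One way to read these rules is sketched below. The base weights, the `0.35` jitter, and the `0.45` penalty come from the text above; the function name, the injectable `random` parameter, and the choice to sort types by the resulting priority are assumptions.
+
+```ts
+type QuestionType = "audio-metadata" | "social" | "numeric";
+
+const BASE_WEIGHTS: Record<QuestionType, number> = {
+  "audio-metadata": 1,
+  numeric: 0.55,
+  social: 0.1,
+};
+
+// Returns question types from highest to lowest priority for the next round.
+function orderQuestionTypes(
+  recentTypes: QuestionType[],
+  random: () => number = Math.random,
+): QuestionType[] {
+  const lastThree = recentTypes.slice(-3);
+  return (Object.keys(BASE_WEIGHTS) as QuestionType[])
+    .map((type) => {
+      const repeats = lastThree.filter((recent) => recent === type).length;
+      // Base weight, plus jitter of up to 0.35, minus 0.45 per recent repeat.
+      const priority = BASE_WEIGHTS[type] + random() * 0.35 - repeats * 0.45;
+      return { type, priority };
+    })
+    .sort((a, b) => b.priority - a.priority)
+    .map((entry) => entry.type);
+}
+```
+
+With no history, `audio-metadata` always sorts first because its lowest possible priority (`1`) is above the highest possible `numeric` priority (`0.9`). After three consecutive audio rounds its priority falls to at most `0`, so it drops to last place.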
+
+### 2. Candidate generation
+
+Each question builder creates multiple candidates, then `pickQuestionCandidate` selects one.
+
+Candidates contain:
+
+- `key` — unique question identity
+- `subjectKey` — the entity being asked about, such as `track:...`, `artist:...`, `genre:...`, `member:...`, or `pair:...`
+- optional fairness metadata
+- the partial question object
+
+A candidate is rejected if the quiz history already contains the same normalized:
+
+- question key
+- subject key
+- question text
+
+This prevents repeated questions and repeated subjects across the quiz.
+
+### 3. Fairness weighting
+
+Tracks and artists are sorted by fairness before they are used as question subjects.
+
+Fairness is derived from an entity's `memberScores`:
+
+- `memberIds` — party members connected to the entity
+- `memberCount` — how many members are connected
+- `score` — total member score for shared entities
+
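+For single-member entities, the fairness score is instead the negated count of that member's prior appearances in the quiz history, so members who have been featured less often rank higher. This prevents the quiz from repeatedly focusing on one person when only single-member subjects are available.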
+
+Question candidate weight is:
+
+```text
+if no fairness data:
+  weight = 8
+else:
+  weight = 8 + memberCount * 20 + clamp(score, 0, 100) / 20
+```
+
+Weighted random selection is then used. Shared, high-scoring entities are therefore more likely, but not guaranteed, to be selected.
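+
+The weight formula and the selection step can be sketched as follows. The formula mirrors the pseudocode above; the `Fairness` shape is trimmed to the two fields it uses, and `pickWeighted` is a hypothetical stand-in for the selection inside `pickQuestionCandidate`, whose actual implementation is not shown here.
+
+```ts
+type Fairness = { memberCount: number; score: number };
+
+const clamp = (value: number, min: number, max: number): number =>
+  Math.min(max, Math.max(min, value));
+
+function candidateWeight(fairness?: Fairness): number {
+  if (!fairness) return 8;
+  return 8 + fairness.memberCount * 20 + clamp(fairness.score, 0, 100) / 20;
+}
+
+// Roulette-wheel selection: each item is chosen with probability weight / total.
+function pickWeighted<T>(
+  items: T[],
+  weightOf: (item: T) => number,
+  random: () => number = Math.random,
+): T | undefined {
+  const total = items.reduce((sum, item) => sum + weightOf(item), 0);
+  if (items.length === 0 || total <= 0) return undefined;
+  let roll = random() * total;
+  for (const item of items) {
+    roll -= weightOf(item);
+    if (roll < 0) return item;
+  }
+  return items[items.length - 1]; // guard against floating-point drift
+}
+```
+
+A subject shared by three members with a total score of 250 gets weight `8 + 60 + 5 = 73`, roughly nine times the weight of a candidate without fairness data. Member count dominates because the clamped score term adds at most `5`.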
+
+### 4. Audio metadata questions
+
+`buildAudioMetadataQuestion` produces choice questions about party music metadata. It uses:
+
+- most shared genres
+- fair tracks from story clusters
+- fair artists from story clusters
+- detailed track rows from the database for album, artist, release date, and duration metadata
+
+Examples of generated questions include:
+
+- `What song is currently playing?`
+- `Which genre is shared by the most party members?`
+- `Which genre is ranked #<rank> in the party's shared genres?`
+- `Which artist is ranked highest in the shared audio data?`
+- `Which artist is ranked #<rank> in the shared audio data?`
+- `Which track is ranked highest across the party?`
+- `Which track is ranked #<rank> in the party analysis?`
+- `Which artist appears on "<album>"?`
+- `Which of these tracks came out first?`
+- `Which of these tracks came out most recently?`
+- `What's the longest track by <artist>?`
+- `Who performs "<track>"?`
+- `What is the name of this track by <artist>?`
+- `"<track>" appears on which album?`
+
+Options are built from relevant candidate pools, deduplicated, shuffled, and only emitted when there are enough valid options.
+
+### 5. Social questions
+
+`buildSocialQuestion` produces choice questions about players and relationships in the party.
+
+Examples include:
+
+- `Who is leading the quiz right now?`
+- `Who looks like the most diverse listener in the party?`
+- `Who listens the most to "<track>"?`
+- `Which two players share the most musical taste?`
+
+Social questions require enough party members for the question to make sense:
+
+- leader/diverse/top-listener questions need at least 2 members
+- most-aligned-pair questions need at least 3 members
+
+The top-listener question prefers shared tracks when shared tracks exist, so the quiz does not unnecessarily focus on solo-only data.
+
+### 6. Numeric questions
+
+`buildNumericQuestion` produces numeric-answer questions. Numeric questions are scored by closeness during the quiz rather than by exact choice index.
+
+Examples include:
+
+- `What's the release year of <album-or-track>?`
+- `What year did "<track>" come out?`
+- `What year did <artist>'s first party track come out?`
+- `For how many players in the party is "<track>" a top track?`
+- `How many players in the party have "<artist>" as a favourite artist?`
+
+Release-year questions use a range around the correct year, capped at the current year and widened to a minimum span. Count questions use a range from `0` to the current party size.
+
+### 7. Question timing
+
+Every generated question is wrapped by `buildQuestionWindow`, which sets:
+
+```ts
+startTimestamp = Date.now();
+endTimestamp = startTimestamp + 60_000;
+```
+
+These timestamps are used by the quiz workflow to decide when answer collection times out.
+
+## Song selection
+
+After a question candidate is selected, `selectQuestionSong` chooses an audio track to attach to the question.
+
+Song candidates come from:
+
+1. the song already attached by the question builder
+2. the question subject, if the subject is a track or artist
+3. people mentioned by the question, when member-specific subjects can imply a relevant song
+4. fair tracks from the story clusters
+5. top party songs queried from member top tracks
+
+The selector avoids reusing songs from previous quiz rounds when possible by checking prior `song.platform_id` values.
+
+Some question types should keep or prefer their subject song:
+
+- Questions where hearing the exact song is necessary keep the subject song.
+- Questions where the song helps but is not mandatory prefer a relevant fresh song.
+- Other questions prefer fair, fresh, adjacent party songs so audio does not reveal the answer too directly.
+
+## Quiz response and scoring
+
+For each question, the quiz workflow waits until all current party members answer or the question deadline is reached. Missing answers are recorded with `selected: -1` and score 0.
+
+Choice questions are scored exactly:
+
+```text
+pointsGained = question.points if selected option index equals question.correct
+pointsGained = 0 otherwise
+```
+
+Numeric questions are scored relatively by answer distance:
+
+1. Ignore no-answer responses for ranking.
+2. Compute absolute distance from the correct numeric value.
+3. Group equal distances together.
+4. Award the closest group full points and linearly decrease points for later distance groups.
+5. If all numeric answers are equally distant, only exact answers receive points.
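+
+The steps above can be sketched as follows. The exact linear formula is not specified in this document, so the `(groups - rank) / groups` scaling with rounding is an assumption, as are the `NumericAnswer` shape (with `null` standing in for a missing answer) and the function name.
+
+```ts
+type NumericAnswer = { userId: string; value: number | null };
+
+function scoreNumericRound(
+  answers: NumericAnswer[],
+  correct: number,
+  points: number,
+): Map<string, number> {
+  const result = new Map<string, number>();
+  const ranked: { userId: string; distance: number }[] = [];
+  for (const answer of answers) {
+    result.set(answer.userId, 0); // default, also covers missing answers
+    if (answer.value !== null) {
+      ranked.push({ userId: answer.userId, distance: Math.abs(answer.value - correct) });
+    }
+  }
+  // Distinct distances in ascending order; each one is a scoring group.
+  const groups = ranked
+    .map((entry) => entry.distance)
+    .sort((a, b) => a - b)
+    .filter((distance, index, all) => all.indexOf(distance) === index);
+  for (const entry of ranked) {
+    if (groups.length === 1) {
+      // Everyone is equally distant: only exact answers score.
+      if (entry.distance === 0) result.set(entry.userId, points);
+      continue;
+    }
+    const rank = groups.indexOf(entry.distance);
+    // Assumed linear falloff: the closest group gets full points.
+    result.set(entry.userId, Math.round((points * (groups.length - rank)) / groups.length));
+  }
+  return result;
+}
+```
+
+With a correct value of `2000`, `10` points, and answers `1999`, `2001`, `2005`, and one missing answer, this sketch forms two distance groups (`1` and `5`) and awards `10`, `10`, `5`, and `0` points respectively.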
+
+After each round, scores are added to `quizState.scores`, the quiz enters `review`, then continues to the next question. After the final question, the quiz status becomes `results` and the party status is marked `ended`.
+
+## Design goals
+
+The current algorithm is optimized for:
+
+- **Shared relevance:** Prefer content that represents multiple party members.
+- **Personal variety:** Avoid repeatedly targeting the same member or subject.
+- **Freshness:** Avoid repeated question keys, subjects, text, and songs.
+- **Playable trivia:** Only emit questions with enough options, valid text, and usable metadata.
+- **Low round latency:** Do expensive aggregation once in party analysis, then use compact JSON during quiz rounds.
diff --git a/api/POSSIBLE_QUESTIONS.md b/api/POSSIBLE_QUESTIONS.md
new file mode 100644
index 0000000..55c12f9
--- /dev/null
+++ b/api/POSSIBLE_QUESTIONS.md
@@ -0,0 +1,81 @@
+# Possible Quiz Questions
+
+This document lists every question template currently generated by the API.
+
+Relevant files:
+
+- `src/party/audio-question-generator.ts`
+- `src/party/social-question-generator.ts`
+- `src/party/numeric-question-generator.ts`
+
+Notes:
+
+- Dynamic placeholders are shown with angle brackets, for example `<track>` or `<artist>`.
+- Every generated question is worth `10` points.
+- Choice questions generate shuffled answer options.
+- Numeric questions generate a numeric answer range instead of options.
+- Not every template is available in every party. A question is only emitted when the required analytics, metadata, answer options, and party size are available.
+
+## Audio metadata choice questions
+
+Audio metadata questions are generated by `buildAudioMetadataQuestion`.
+
+| Template | Correct answer | Required data / condition |
+| --- | --- | --- |
+| `What song is currently playing?` | Current/selected top song title | A resolvable top song and enough track title options |
+| `Which genre is shared by the most party members?` | The highest-ranked shared genre | Shared genre analytics and enough genre options |
+| `Which genre is ranked #<rank> in the party's shared genres?` | The genre at that exact rank | Shared genre analytics and enough genre options |
+| `Which artist is ranked highest in the shared audio data?` | The highest-ranked fair artist | Fair artist analytics and enough artist options |
+| `Which artist is ranked #<rank> in the shared audio data?` | The artist at that exact rank | Fair artist analytics and enough artist options |
+| `Which track is ranked highest across the party?` | The top fair shared track | Top fair track is shared by more than one member and enough track options exist |
+| `Which track is ranked #<rank> in the party analysis?` | The track at that exact rank | Fair track analytics and enough track options |
+| `Which artist appears on "<album>"?` | An artist from the album | A top track with album name and artist metadata, plus enough artist options |
+| `Which of these tracks came out first?` | Earliest released track among detailed top tracks | At least two detailed top tracks with album release dates |
+| `Which of these tracks came out most recently?` | Latest released track among detailed top tracks | At least two detailed top tracks with album release dates and distinct earliest/latest tracks |
+| `What's the longest track by <artist>?` | Longest detailed top track for that artist | At least two detailed top tracks by the same artist with durations |
+| `Who performs "<track>"?` | The track's artist | A top track with artist metadata and enough artist options |
+| `What is the name of this track by <artist>?` | The track title | A top track with artist metadata and enough track title options; song title is hidden |
+| `Which song is this audio clip from?` | Current/selected top song title | A top song exists, differs from the iterated top track, and enough track title options exist; song title is hidden |
+| `"<track>" appears on which album?` | The track's album | A top track with album metadata and enough album options |
+
+## Social choice questions
+
+Social questions are generated by `buildSocialQuestion`.
+
+| Template | Correct answer | Required data / condition |
+| --- | --- | --- |
+| `Who is leading the quiz right now?` | Current quiz leader | At least 2 members and a clear leader in `quizState.scores` |
+| `Who looks like the most diverse listener in the party?` | Member with highest genre entropy | At least 2 members and `groupSummary.mostDiverseMember` is available |
+| `Who listens the most to "<track>"?` | Member with the highest score for that track | At least 2 members, a fair top track, and a top listener for that track |
+| `Which two players share the most musical taste?` | Pair from `groupSummary.mostAlignedPair` | At least 3 members and a resolvable most-aligned pair |
+
+## Numeric questions
+
+Numeric questions are generated by `buildNumericQuestion`.
+
+| Template | Correct answer | Range | Required data / condition |
+| --- | --- | --- | --- |
+| `What's the release year of <album-or-track>?` | Album release year | Release-year range around correct year | A fair top track with an album release date |
+| `What year did "<track>" come out?` | Track album release year | Release-year range around correct year | A detailed fair top track with an album release date |
+| `What year did <artist>'s first party track come out?` | Release year of the earliest party track by that artist | Release-year range around correct year | At least two detailed top tracks by the same artist with release dates |
+| `For how many players in the party is "<track>" a top track?` | Count of party members whose top tracks include the track | `0` to party member count | A fair top track that resolves to a database track and appears for at least one member |
+| `How many players in the party have "<artist>" as a favourite artist?` | Count of party members whose top artists include the artist | `0` to party member count | A fair top artist that resolves to a database artist and appears for at least one member |
+
+## Template count
+
+Current total: **24 question templates**.
+
+- Audio metadata: 15
+- Social: 4
+- Numeric: 5
+
+## Selection behavior
+
+The generator does not walk this list in a fixed order. For each round:
+
+1. It prioritizes a question family: `audio-metadata`, `numeric`, or `social`.
+2. The priority is randomized, with a penalty for question families used in the last 3 rounds.
+3. Each family builds all valid candidates it can from the current party data.
+4. Candidates with repeated question keys, subject keys, or text are filtered out.
+5. A weighted random candidate is selected, with higher weight for subjects shared by more party members.
+6. A song is attached when possible, preferring fresh songs that were not used in prior rounds.