From 6c7854edd4a714527e226390a069464aa8084532 Mon Sep 17 00:00:00 2001
From: Daniel Bulant
Date: Sat, 20 Jun 2026 22:51:51 +0200
Subject: [PATCH] gen docs

---
 api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md | 374 ++++++++++++++++++
 api/POSSIBLE_QUESTIONS.md                     |  81 ++++
 2 files changed, 455 insertions(+)
 create mode 100644 api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md
 create mode 100644 api/POSSIBLE_QUESTIONS.md

diff --git a/api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md b/api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md
new file mode 100644
index 0000000..f9ab904
--- /dev/null
+++ b/api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md
@@ -0,0 +1,374 @@
+# Party Analysis and Question Generation Algorithm
+
+This document describes how the API analyzes a party's Spotify data and turns that analysis into quiz questions.
+
+Relevant implementation files:
+
+- `src/workflows/party-analysis.ts` — computes and stores `party.analysisData`.
+- `src/workflows/quiz.ts` — starts analysis, runs the quiz loop, and scores answers.
+- `src/party/question-generator.ts` — chooses a question type and attaches a song.
+- `src/party/audio-question-generator.ts` — builds audio metadata choice questions.
+- `src/party/social-question-generator.ts` — builds social choice questions.
+- `src/party/numeric-question-generator.ts` — builds numeric questions.
+- `src/party/question-utils.ts` — shared fairness, deduplication, option, and song selection logic.
+
+## High-level flow
+
+```mermaid
+flowchart TD
+    A[Quiz starts] --> B[Analyze party]
+    B --> C[Store party.analysisData]
+    C --> D[Initialize quiz state]
+    D --> E[Generate next question]
+    E --> F[Publish current question in party data]
+    F --> G[Wait for player answers or timeout]
+    G --> H[Score round]
+    H --> I[Review period]
+    I --> J{More questions?}
+    J -->|Yes| E
+    J -->|No| K[Show results and mark party ended]
+```
+
+When a quiz starts, `QuizWorkflow.startQuiz` first runs `partyAnalysisWorkflow.analyzeParty(partyId)`. The generated analysis is saved to the `party.analysisData` JSON column and then reused for every question in the quiz.
+
+The quiz currently asks up to `TOTAL_QUESTIONS = 5` questions. Each question has a 60 second answer window, followed by a 5 second review period.
+
+## Party analysis
+
+Party analysis converts each member's listening data into comparable track, artist, and genre scores. The result is a compact party-level summary designed for fast question generation.
+
+### 1. Minimum party size
+
+If a party has fewer than 2 members, analysis is saved as empty:
+
+- `storyClusters: []`
+- `pairwise: []`
+- `groupSummary.totalMembers`
+- `groupSummary.mostSharedGenres: []`
+- `groupSummary.mostDiverseMember: null`
+- `groupSummary.mostAlignedPair: null`
+- `memberProfiles: []`
+
+The workflow returns `analyzed: false` in that case.
+
+### 2. Per-member scoring
+
+For each party member, the analysis workflow fetches several Spotify-derived tables and accumulates scores into three maps:
+
+- tracks
+- artists
+- genres
+
+The score inputs are:
+
+| Source | Scoring |
+| --- | --- |
+| Medium-term top tracks | `MAX_POSITION - position + 1`, with `MAX_POSITION = 50` |
+| Saved tracks | track `+10`, artists `+5`, genres `+2.5` |
+| Playback history | track `+5` if played in last 24h, `+3` if played in last week, otherwise `+1`; artists get half, genres get quarter |
+| Medium-term top artists | `MAX_POSITION - position + 1` |
+| Followed artists | artist `+10`, genres `+10` |
+| Saved albums | album artists `+5`, genres `+2.5` |
+
+Top track and top artist scores are position-weighted, so rank 1 contributes more than rank 50. Saved and followed items add fixed preference signals. Playback history adds recency-weighted listening signals.
+
+### 3. Party-level entity maps
+
+After all member scores are fetched, the workflow builds one map per entity type:
+
+- `TrackEntityScore`
+- `ArtistEntityScore`
+- `GenreEntityScore`
+
+Each entity contains:
+
+- entity id and display name
+- track artist names and album name, for tracks
+- `memberScores`, the list of members who contributed to the entity and their score
+- `memberCount`, the number of party members represented by that entity
+
+These maps make it possible to tell which songs, artists, and genres are shared by multiple people and which are strongly associated with one person.
+
+### 4. Story clusters
+
+Story clusters group entities by the exact subset of party members that share them.
+
+For example, if Alice and Bob both have the same genre, that genre goes into the cluster keyed by `Alice|Bob`. If all party members share a track, that track goes into the all-members cluster.
+
+Clusters are sorted by:
+
+1. all-members cluster first
+2. larger `memberCount`
+3. total track score in the cluster
+
+Within each cluster, tracks, artists, and genres are sorted by total score descending.
+
+The stored analysis is compacted to:
+
+- top 8 story clusters
+- top 20 tracks, artists, and genres per cluster
+
+### 5. Pairwise similarity
+
+For every pair of party members, the workflow computes:
+
+- `sharedTracks`
+- `sharedArtists`
+- `sharedGenres`
+- `similarity`
+
+Similarity uses a weighted Jaccard-style score across tracks, artists, and genres:
+
+```text
+similarity = sum(min(scoreA, scoreB) for shared entities)
+           / sum(max(scoreA, scoreB) for all entities in either profile)
+```
+
+This rewards members who share high-scoring music preferences, not just raw overlap counts.
+
+The most similar pair becomes `groupSummary.mostAlignedPair`.
+
+### 6. Member profiles and genre diversity
+
+Each member profile stores:
+
+- `userId`
+- `totalScore`, based on track and artist scores
+- `genreScores`
+- `trackCount`
+- `artistCount`
+
+Genre diversity is calculated as entropy over the member's genre score distribution:
+
+```text
+entropy = -sum(p * ln(p))
+```
+
+where `p` is the genre score divided by the member's total score. The member with the highest entropy becomes `groupSummary.mostDiverseMember`.
+
+Stored member profiles keep only the top 20 genres by score.
+
+### 7. Most shared genres
+
+The workflow aggregates genre scores across members, sorts genres by:
+
+1. `memberCount` descending
+2. total genre score descending
+
+It keeps the top 10 genres that are shared by at least 2 members as `groupSummary.mostSharedGenres`.
+
+## Generated analysis shape
+
+The saved `party.analysisData` contains:
+
+```ts
+type PartyAnalysisResult = {
+  storyClusters: StoryCluster[];
+  pairwise: PairwiseComparison[];
+  groupSummary: {
+    totalMembers: number;
+    mostSharedGenres: GenreEntityScore[];
+    mostDiverseMember: GenreDiversity | null;
+    mostAlignedPair: PairwiseComparison | null;
+  };
+  memberProfiles: MemberProfile[];
+};
+```
+
+This JSON is intentionally denormalized and compact so the question generators can work without recomputing party analytics during each round.
+
+## Question generation
+
+Each quiz round calls `generatePartyQuestion`, passing:
+
+- database client
+- party id
+- current `QuizState`
+- saved `party.analysisData`
+- question index
+
+The generator fetches current party members, chooses a question type order, asks each question builder for a valid question, then attaches a suitable song.
+
+### 1. Question type ordering
+
+The possible question types are:
+
+- `audio-metadata`
+- `social`
+- `numeric`
+
+For each round, the generator randomizes their priority using base weights and recent-history penalties:
+
+| Type | Base weight |
+| --- | ---: |
+| `audio-metadata` | `1` |
+| `numeric` | `0.55` |
+| `social` | `0.1` |
+
+A random value up to `0.35` is added. Each occurrence of the same type in the last 3 questions subtracts `0.45`.
+
+This means audio metadata questions are preferred by default, but the generator avoids repeating the same category too often.
+
+### 2. Candidate generation
+
+Each question builder creates multiple candidates, then `pickQuestionCandidate` selects one.
+
+Candidates contain:
+
+- `key` — unique question identity
+- `subjectKey` — the entity being asked about, such as `track:...`, `artist:...`, `genre:...`, `member:...`, or `pair:...`
+- optional fairness metadata
+- the partial question object
+
+A candidate is rejected if the quiz history already contains the same normalized:
+
+- question key
+- subject key
+- question text
+
+This prevents repeated questions and repeated subjects across the quiz.
+
+### 3. Fairness weighting
+
+Tracks and artists are sorted by fairness before they are used as question subjects.
+
+Fairness is derived from an entity's `memberScores`:
+
+- `memberIds` — party members connected to the entity
+- `memberCount` — how many members are connected
+- `score` — total member score for shared entities
+
+For single-member entities, the fairness score is negative history usage for that member. This prevents the quiz from repeatedly focusing on one person when only single-member subjects are available.
+
+Question candidate weight is:
+
+```text
+if no fairness data:
+  weight = 8
+else:
+  weight = 8 + memberCount * 20 + clamp(score, 0, 100) / 20
+```
+
+Weighted random selection is then used. Shared, high-scoring entities are therefore more likely, but not guaranteed, to be selected.
+
+### 4. Audio metadata questions
+
+`buildAudioMetadataQuestion` produces choice questions about party music metadata. It uses:
+
+- most shared genres
+- fair tracks from story clusters
+- fair artists from story clusters
+- detailed track rows from the database for album, artist, release date, and duration metadata
+
+Examples of generated questions include:
+
+- `What song is currently playing?`
+- `Which genre is shared by the most party members?`
+- `Which genre is ranked # in the party's shared genres?`
+- `Which artist is ranked highest in the shared audio data?`
+- `Which artist is ranked # in the shared audio data?`
+- `Which track is ranked highest across the party?`
+- `Which track is ranked # in the party analysis?`
+- `Which artist appears on ""?`
+- `Which of these tracks came out first?`
+- `Which of these tracks came out most recently?`
+- `What's the longest track by ?`
+- `Who performs ""?`
+- `What is the name of this track by ?`
+- `"" appears on which album?`
+
+Options are built from relevant candidate pools, deduplicated, shuffled, and only emitted when there are enough valid options.
+
+### 5. Social questions
+
+`buildSocialQuestion` produces choice questions about players and relationships in the party.
+
+Examples include:
+
+- `Who is leading the quiz right now?`
+- `Who looks like the most diverse listener in the party?`
+- `Who listens the most to ""?`
+- `Which two players share the most musical taste?`
+
+Social questions require enough party members for the question to make sense:
+
+- leader/diverse/top-listener questions need at least 2 members
+- most-aligned-pair questions need at least 3 members
+
+The top-listener question prefers shared tracks when shared tracks exist, so the quiz does not unnecessarily focus on solo-only data.
+
+### 6. Numeric questions
+
+`buildNumericQuestion` produces numeric-answer questions. Numeric questions are scored by closeness during the quiz rather than by exact choice index.
+
+Examples include:
+
+- `What's the release year of ?`
+- `What year did "" come out?`
+- `What year did 's first party track come out?`
+- `For how many players in the party is "" a top track?`
+- `How many players in the party have "" as a favourite artist?`
+
+Release-year questions use a range around the correct year, capped at the current year and widened to a minimum span. Count questions use a range from `0` to the current party size.
+
+### 7. Question timing
+
+Every generated question is wrapped by `buildQuestionWindow`, which sets:
+
+```ts
+startTimestamp = Date.now();
+endTimestamp = startTimestamp + 60_000;
+```
+
+These timestamps are used by the quiz workflow to decide when answer collection times out.
+
+## Song selection
+
+After a question candidate is selected, `selectQuestionSong` chooses an audio track to attach to the question.
+
+Song candidates come from:
+
+1. the song already attached by the question builder
+2. the question subject, if the subject is a track or artist
+3. people mentioned by the question, when member-specific subjects can imply a relevant song
+4. fair tracks from the story clusters
+5. top party songs queried from member top tracks
+
+The selector avoids reusing songs from previous quiz rounds when possible by checking prior `song.platform_id` values.
+
+Some question types should keep or prefer their subject song:
+
+- Questions where hearing the exact song is necessary keep the subject song.
+- Questions where the song helps but is not mandatory prefer a relevant fresh song.
+- Other questions prefer fair, fresh, adjacent party songs so audio does not reveal the answer too directly.
+
+## Quiz response and scoring
+
+For each question, the quiz workflow waits until all current party members answer or the question deadline is reached. Missing answers are recorded with `selected: -1` and score 0.
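A minimal sketch of the wait-and-fill step just described may help. It assumes answers arrive as a map from user id to selected option index. The names `AnswerRecord`, `isRoundComplete`, and `collectAnswers` are hypothetical and are not taken from `src/workflows/quiz.ts`.

```typescript
// Hypothetical record shape; the real quiz workflow types may differ.
type AnswerRecord = { userId: string; selected: number; pointsGained: number };

// Marker the document describes for a member who did not answer in time.
const NO_ANSWER = -1;

// A round is complete once every current member has answered
// or the question deadline (endTimestamp, in ms) has been reached.
function isRoundComplete(
  memberIds: string[],
  answers: Map<string, number>,
  now: number,
  endTimestamp: number,
): boolean {
  const everyoneAnswered = memberIds.every((id) => answers.has(id));
  return everyoneAnswered || now >= endTimestamp;
}

// Build one record per member, filling in NO_ANSWER for anyone who is missing.
// Points start at 0; real scoring would run afterwards, and missing answers stay at 0.
function collectAnswers(
  memberIds: string[],
  answers: Map<string, number>,
): AnswerRecord[] {
  return memberIds.map((userId) => ({
    userId,
    selected: answers.get(userId) ?? NO_ANSWER,
    pointsGained: 0,
  }));
}
```

Keeping the completion check separate from record building means the same check can be polled repeatedly while answers arrive, with records built once at the end.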
+
+Choice questions are scored exactly:
+
+```text
+pointsGained = question.points if selected option index equals question.correct
+pointsGained = 0 otherwise
+```
+
+Numeric questions are scored relatively by answer distance:
+
+1. Ignore no-answer responses for ranking.
+2. Compute absolute distance from the correct numeric value.
+3. Group equal distances together.
+4. Award the closest group full points and linearly decrease points for later distance groups.
+5. If all numeric answers are equally distant, only exact answers receive points.
+
+After each round, scores are added to `quizState.scores`, the quiz enters `review`, then continues to the next question. After the final question, the quiz status becomes `results` and the party status is marked `ended`.
+
+## Design goals
+
+The current algorithm is optimized for:
+
+- **Shared relevance:** Prefer content that represents multiple party members.
+- **Personal variety:** Avoid repeatedly targeting the same member or subject.
+- **Freshness:** Avoid repeated question keys, subjects, text, and songs.
+- **Playable trivia:** Only emit questions with enough options, valid text, and usable metadata.
+- **Low round latency:** Do expensive aggregation once in party analysis, then use compact JSON during quiz rounds.
diff --git a/api/POSSIBLE_QUESTIONS.md b/api/POSSIBLE_QUESTIONS.md
new file mode 100644
index 0000000..55c12f9
--- /dev/null
+++ b/api/POSSIBLE_QUESTIONS.md
@@ -0,0 +1,81 @@
+# Possible Quiz Questions
+
+This document lists every question template currently generated by the API.
+
+Relevant files:
+
+- `src/party/audio-question-generator.ts`
+- `src/party/social-question-generator.ts`
+- `src/party/numeric-question-generator.ts`
+
+Notes:
+
+- Dynamic placeholders are shown with angle brackets, for example `` or ``.
+- Every generated question is worth `10` points.
+- Choice questions generate shuffled answer options.
+- Numeric questions generate a numeric answer range instead of options.
+- Not every template is available in every party. A question is only emitted when the required analytics, metadata, answer options, and party size are available.
+
+## Audio metadata choice questions
+
+Audio metadata questions are generated by `buildAudioMetadataQuestion`.
+
+| Template | Correct answer | Required data / condition |
+| --- | --- | --- |
+| `What song is currently playing?` | Current/selected top song title | A resolvable top song and enough track title options |
+| `Which genre is shared by the most party members?` | The highest-ranked shared genre | Shared genre analytics and enough genre options |
+| `Which genre is ranked # in the party's shared genres?` | The genre at that exact rank | Shared genre analytics and enough genre options |
+| `Which artist is ranked highest in the shared audio data?` | The highest-ranked fair artist | Fair artist analytics and enough artist options |
+| `Which artist is ranked # in the shared audio data?` | The artist at that exact rank | Fair artist analytics and enough artist options |
+| `Which track is ranked highest across the party?` | The top fair shared track | Top fair track is shared by more than one member and enough track options exist |
+| `Which track is ranked # in the party analysis?` | The track at that exact rank | Fair track analytics and enough track options |
+| `Which artist appears on ""?` | An artist from the album | A top track with album name and artist metadata, plus enough artist options |
+| `Which of these tracks came out first?` | Earliest released track among detailed top tracks | At least two detailed top tracks with album release dates |
+| `Which of these tracks came out most recently?` | Latest released track among detailed top tracks | At least two detailed top tracks with album release dates and distinct earliest/latest tracks |
+| `What's the longest track by ?` | Longest detailed top track for that artist | At least two detailed top tracks by the same artist with durations |
+| `Who performs ""?` | The track's artist | A top track with artist metadata and enough artist options |
+| `What is the name of this track by ?` | The track title | A top track with artist metadata and enough track title options; song title is hidden |
+| `Which song is this audio clip from?` | Current/selected top song title | A top song exists, differs from the iterated top track, and enough track title options exist; song title is hidden |
+| `"" appears on which album?` | The track's album | A top track with album metadata and enough album options |
+
+## Social choice questions
+
+Social questions are generated by `buildSocialQuestion`.
+
+| Template | Correct answer | Required data / condition |
+| --- | --- | --- |
+| `Who is leading the quiz right now?` | Current quiz leader | At least 2 members and a clear leader in `quizState.scores` |
+| `Who looks like the most diverse listener in the party?` | Member with highest genre entropy | At least 2 members and `groupSummary.mostDiverseMember` is available |
+| `Who listens the most to ""?` | Member with the highest score for that track | At least 2 members, a fair top track, and a top listener for that track |
+| `Which two players share the most musical taste?` | Pair from `groupSummary.mostAlignedPair` | At least 3 members and a resolvable most-aligned pair |
+
+## Numeric questions
+
+Numeric questions are generated by `buildNumericQuestion`.
+
+| Template | Correct answer | Range | Required data / condition |
+| --- | --- | --- | --- |
+| `What's the release year of ?` | Album release year | Release-year range around correct year | A fair top track with an album release date |
+| `What year did "" come out?` | Track album release year | Release-year range around correct year | A detailed fair top track with an album release date |
+| `What year did 's first party track come out?` | Release year of the earliest party track by that artist | Release-year range around correct year | At least two detailed top tracks by the same artist with release dates |
+| `For how many players in the party is "" a top track?` | Count of party members whose top tracks include the track | `0` to party member count | A fair top track that resolves to a database track and appears for at least one member |
+| `How many players in the party have "" as a favourite artist?` | Count of party members whose top artists include the artist | `0` to party member count | A fair top artist that resolves to a database artist and appears for at least one member |
+
+## Template count
+
+Current total: **24 question templates**.
+
+- Audio metadata: 15
+- Social: 4
+- Numeric: 5
+
+## Selection behavior
+
+The generator does not walk this list in a fixed order. For each round:
+
+1. It prioritizes a question family: `audio-metadata`, `numeric`, or `social`.
+2. The priority is randomized but penalizes question families used in the last 3 rounds.
+3. Each family builds all valid candidates it can from the current party data.
+4. Candidates with repeated question keys, subject keys, or text are filtered out.
+5. A weighted random candidate is selected, with higher weight for subjects shared by more party members.
+6. A song is attached when possible, preferring fresh songs that were not used in prior rounds.
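
Steps 1, 2, and 5 of the selection behavior can be sketched roughly as follows. The base weights, the `0.35` jitter, the `0.45` penalty, and the candidate weight formula are the values quoted in `PARTY_ANALYSIS_AND_QUESTION_GENERATION.md`. The function names (`orderQuestionTypes`, `candidateWeight`, `pickWeighted`) and the type shapes are hypothetical, and the actual `pickQuestionCandidate` implementation may differ.

```typescript
type QuestionType = "audio-metadata" | "numeric" | "social";

// Constants as documented in the algorithm write-up.
const BASE_WEIGHT: Record<QuestionType, number> = {
  "audio-metadata": 1,
  numeric: 0.55,
  social: 0.1,
};
const MAX_JITTER = 0.35; // upper bound of the random value added per type
const REPEAT_PENALTY = 0.45; // subtracted per occurrence in recent history
const HISTORY_WINDOW = 3; // only the last 3 questions are penalized

// Order families by: base weight + random jitter - penalty for recent repeats.
function orderQuestionTypes(
  history: QuestionType[],
  random: () => number = Math.random,
): QuestionType[] {
  const recent = history.slice(-HISTORY_WINDOW);
  const scored = (Object.keys(BASE_WEIGHT) as QuestionType[]).map((type) => {
    const repeats = recent.filter((t) => t === type).length;
    const priority =
      BASE_WEIGHT[type] + random() * MAX_JITTER - repeats * REPEAT_PENALTY;
    return { type, priority };
  });
  return scored.sort((a, b) => b.priority - a.priority).map((s) => s.type);
}

// Hypothetical candidate shape: fairness metadata is optional.
type Fairness = { memberCount: number; score: number };
type Candidate = { key: string; fairness?: Fairness };

// weight = 8 without fairness data,
// otherwise 8 + memberCount * 20 + clamp(score, 0, 100) / 20.
function candidateWeight(candidate: Candidate): number {
  if (!candidate.fairness) return 8;
  const clamped = Math.min(100, Math.max(0, candidate.fairness.score));
  return 8 + candidate.fairness.memberCount * 20 + clamped / 20;
}

// Weighted random pick: walk the list, subtracting weights from a random roll.
function pickWeighted(
  candidates: Candidate[],
  random: () => number = Math.random,
): Candidate | undefined {
  const total = candidates.reduce((sum, c) => sum + candidateWeight(c), 0);
  let roll = random() * total;
  for (const candidate of candidates) {
    roll -= candidateWeight(candidate);
    if (roll < 0) return candidate;
  }
  // Fallback for floating-point edge cases; undefined when the list is empty.
  return candidates[candidates.length - 1];
}
```

Passing `random` as a parameter is only a convenience for this sketch: it keeps the ordering and the pick deterministic under test, while production code could rely on the `Math.random` default.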