Daniel Bulant 2026-06-20 22:51:51 +02:00
parent dd4e776175
commit 6c7854edd4
2 changed files with 455 additions and 0 deletions


@ -0,0 +1,374 @@
# Party Analysis and Question Generation Algorithm
This document describes how the API analyzes a party's Spotify data and turns that analysis into quiz questions.
Relevant implementation files:
- `src/workflows/party-analysis.ts` — computes and stores `party.analysisData`.
- `src/workflows/quiz.ts` — starts analysis, runs the quiz loop, and scores answers.
- `src/party/question-generator.ts` — chooses a question type and attaches a song.
- `src/party/audio-question-generator.ts` — builds audio metadata choice questions.
- `src/party/social-question-generator.ts` — builds social choice questions.
- `src/party/numeric-question-generator.ts` — builds numeric questions.
- `src/party/question-utils.ts` — shared fairness, deduplication, option, and song selection logic.
## High-level flow
```mermaid
flowchart TD
A[Quiz starts] --> B[Analyze party]
B --> C[Store party.analysisData]
C --> D[Initialize quiz state]
D --> E[Generate next question]
E --> F[Publish current question in party data]
F --> G[Wait for player answers or timeout]
G --> H[Score round]
H --> I[Review period]
I --> J{More questions?}
J -->|Yes| E
J -->|No| K[Show results and mark party ended]
```
When a quiz starts, `QuizWorkflow.startQuiz` first runs `partyAnalysisWorkflow.analyzeParty(partyId)`. The generated analysis is saved to the `party.analysisData` JSON column and then reused for every question in the quiz.
The quiz currently asks up to `TOTAL_QUESTIONS = 5` questions. Each question has a 60-second answer window, followed by a 5-second review period.
## Party analysis
Party analysis converts each member's listening data into comparable track, artist, and genre scores. The result is a compact party-level summary designed for fast question generation.
### 1. Minimum party size
If a party has fewer than 2 members, analysis is saved as empty:
- `storyClusters: []`
- `pairwise: []`
- `groupSummary.totalMembers`
- `groupSummary.mostSharedGenres: []`
- `groupSummary.mostDiverseMember: null`
- `groupSummary.mostAlignedPair: null`
- `memberProfiles: []`
The workflow returns `analyzed: false` in that case.
### 2. Per-member scoring
For each party member, the analysis workflow fetches several Spotify-derived tables and accumulates scores into three maps:
- tracks
- artists
- genres
The score inputs are:
| Source | Scoring |
| --- | --- |
| Medium-term top tracks | `MAX_POSITION - position + 1`, with `MAX_POSITION = 50` |
| Saved tracks | track `+10`, artists `+5`, genres `+2.5` |
| Playback history | track `+5` if played in the last 24h, `+3` if played in the last week, otherwise `+1`; artists get half of the track score, genres get a quarter |
| Medium-term top artists | `MAX_POSITION - position + 1` |
| Followed artists | artist `+10`, genres `+10` |
| Saved albums | album artists `+5`, genres `+2.5` |
Top track and top artist scores are position-weighted, so rank 1 contributes more than rank 50. Saved and followed items add fixed preference signals. Playback history adds recency-weighted listening signals.
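The position and recency weighting from the table can be sketched as follows. This is a minimal illustration, not the code in `party-analysis.ts`: the function names `positionScore` and `playbackScores` are hypothetical, `position` is assumed to be 1-based, and "last week" is assumed to mean the last 7 days.

```ts
const MAX_POSITION = 50;
const DAY_MS = 24 * 60 * 60 * 1000;

// Position-weighted score for a top track or top artist.
// Assumes `position` is 1-based, so rank 1 yields 50 and rank 50 yields 1.
function positionScore(position: number): number {
  return MAX_POSITION - position + 1;
}

// Recency-weighted scores for one playback-history entry.
// Timestamps are milliseconds since the epoch (an assumption for this sketch).
function playbackScores(
  playedAt: number,
  now: number,
): { track: number; artist: number; genre: number } {
  const age = now - playedAt;
  const track = age <= DAY_MS ? 5 : age <= 7 * DAY_MS ? 3 : 1;
  // Artists get half of the track score, genres a quarter.
  return { track, artist: track / 2, genre: track / 4 };
}
```

For example, a track played an hour ago would add 5 to the track, 2.5 to each of its artists, and 1.25 to each genre under these assumptions.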
### 3. Party-level entity maps
After all member scores are computed, the workflow builds one map per entity type:
- `TrackEntityScore`
- `ArtistEntityScore`
- `GenreEntityScore`
Each entity contains:
- entity id and display name
- track artist names and album name, for tracks
- `memberScores`, the list of members who contributed to the entity and their score
- `memberCount`, the number of party members represented by that entity
These maps make it possible to tell which songs, artists, and genres are shared by multiple people and which are strongly associated with one person.
### 4. Story clusters
Story clusters group entities by the exact subset of party members that share them.
For example, if Alice and Bob both have the same genre, that genre goes into the cluster keyed by `Alice|Bob`. If all party members share a track, that track goes into the all-members cluster.
Clusters are sorted by:
1. all-members cluster first
2. larger `memberCount`
3. total track score in the cluster
Within each cluster, tracks, artists, and genres are sorted by total score descending.
The stored analysis is compacted to:
- top 8 story clusters
- top 20 tracks, artists, and genres per cluster
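The grouping and sorting described above can be sketched roughly as follows. Only tracks are shown; artists and genres would follow the same pattern. The type and function names are hypothetical, the key is assumed to be the sorted member ids joined with `|`, and the third sort criterion is assumed to be descending.

```ts
type MemberScore = { userId: string; score: number };
type Entity = { id: string; memberScores: MemberScore[] };
type Cluster = { key: string; memberCount: number; tracks: Entity[] };

const totalScore = (entity: Entity): number =>
  entity.memberScores.reduce((sum, m) => sum + m.score, 0);

function clusterTracks(tracks: Entity[], totalMembers: number): Cluster[] {
  const byKey = new Map<string, Cluster>();
  for (const track of tracks) {
    // The exact subset of members sharing the track identifies the cluster.
    const ids = track.memberScores.map((m) => m.userId).sort();
    const key = ids.join("|");
    let cluster = byKey.get(key);
    if (!cluster) {
      cluster = { key, memberCount: ids.length, tracks: [] };
      byKey.set(key, cluster);
    }
    cluster.tracks.push(track);
  }

  const clusters = Array.from(byKey.values());
  for (const cluster of clusters) {
    cluster.tracks.sort((a, b) => totalScore(b) - totalScore(a));
  }
  const trackTotal = (c: Cluster): number =>
    c.tracks.reduce((sum, t) => sum + totalScore(t), 0);

  clusters.sort((a, b) => {
    const aAll = a.memberCount === totalMembers ? 1 : 0;
    const bAll = b.memberCount === totalMembers ? 1 : 0;
    // All-members cluster first, then larger clusters, then higher track score.
    return bAll - aAll || b.memberCount - a.memberCount || trackTotal(b) - trackTotal(a);
  });

  // Compact to the top 8 clusters and top 20 tracks per cluster.
  return clusters.slice(0, 8).map((c) => ({ ...c, tracks: c.tracks.slice(0, 20) }));
}
```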
### 5. Pairwise similarity
For every pair of party members, the workflow computes:
- `sharedTracks`
- `sharedArtists`
- `sharedGenres`
- `similarity`
Similarity uses a weighted Jaccard-style score across tracks, artists, and genres:
```text
similarity = sum(min(scoreA, scoreB) for shared entities)
/ sum(max(scoreA, scoreB) for all entities in either profile)
```
This rewards members who share high-scoring music preferences, not just raw overlap counts.
The most similar pair becomes `groupSummary.mostAlignedPair`.
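A minimal sketch of the similarity formula above, assuming each member's profile is a single map from entity id to score. How the real workflow combines tracks, artists, and genres into one score is not stated here, so treat `weightedJaccard` and `ScoreMap` as illustrative names.

```ts
type ScoreMap = Map<string, number>;

function weightedJaccard(a: ScoreMap, b: ScoreMap): number {
  const ids = new Set<string>(Array.from(a.keys()).concat(Array.from(b.keys())));
  let minSum = 0;
  let maxSum = 0;
  ids.forEach((id) => {
    // A missing entity counts as score 0, so it adds nothing to the numerator.
    const scoreA = a.get(id) ?? 0;
    const scoreB = b.get(id) ?? 0;
    minSum += Math.min(scoreA, scoreB);
    maxSum += Math.max(scoreA, scoreB);
  });
  return maxSum === 0 ? 0 : minSum / maxSum;
}
```

With profiles `{x: 10, y: 5}` and `{x: 4, z: 6}`, the numerator is `4` and the denominator is `10 + 5 + 6 = 21`, giving a similarity of about `0.19`.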
### 6. Member profiles and genre diversity
Each member profile stores:
- `userId`
- `totalScore`, based on track and artist scores
- `genreScores`
- `trackCount`
- `artistCount`
Genre diversity is calculated as entropy over the member's genre score distribution:
```text
entropy = -sum(p * ln(p))
```
where `p` is the genre score divided by the member's total score. The member with the highest entropy becomes `groupSummary.mostDiverseMember`.
Stored member profiles keep only the top 20 genres by score.
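The entropy formula can be sketched as below. This sketch assumes `p` is normalized by the sum of the member's genre scores so that the `p` values form a distribution; `genreEntropy` is a hypothetical name.

```ts
function genreEntropy(genreScores: number[]): number {
  const total = genreScores.reduce((sum, score) => sum + score, 0);
  if (total <= 0) return 0;

  let entropy = 0;
  for (const score of genreScores) {
    if (score <= 0) continue; // p * ln(p) tends to 0 as p approaches 0
    const p = score / total;
    entropy -= p * Math.log(p);
  }
  return entropy;
}
```

A member with two equally weighted genres gets `ln(2)`, roughly `0.69`, while a member with a single genre gets `0`, so broader and more even genre spreads score higher.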
### 7. Most shared genres
The workflow aggregates genre scores across members, sorts genres by:
1. `memberCount` descending
2. total genre score descending
It keeps the top 10 genres that are shared by at least 2 members as `groupSummary.mostSharedGenres`.
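The filter and two-key sort above might look like this, assuming the per-genre aggregates are already computed. `GenreAggregate` and `mostSharedGenres` are hypothetical names for this sketch.

```ts
type GenreAggregate = { genre: string; memberCount: number; totalScore: number };

function mostSharedGenres(genres: GenreAggregate[]): GenreAggregate[] {
  return genres
    .filter((g) => g.memberCount >= 2) // shared by at least 2 members
    .sort((a, b) => b.memberCount - a.memberCount || b.totalScore - a.totalScore)
    .slice(0, 10);
}
```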
## Generated analysis shape
The saved `party.analysisData` contains:
```ts
type PartyAnalysisResult = {
  storyClusters: StoryCluster[];
  pairwise: PairwiseComparison[];
  groupSummary: {
    totalMembers: number;
    mostSharedGenres: GenreEntityScore[];
    mostDiverseMember: GenreDiversity | null;
    mostAlignedPair: PairwiseComparison | null;
  };
  memberProfiles: MemberProfile[];
};
```
This JSON is intentionally denormalized and compact so the question generators can work without recomputing party analytics during each round.
## Question generation
Each quiz round calls `generatePartyQuestion`, passing:
- database client
- party id
- current `QuizState`
- saved `party.analysisData`
- question index
The generator fetches current party members, chooses a question type order, asks each question builder for a valid question, then attaches a suitable song.
### 1. Question type ordering
The possible question types are:
- `audio-metadata`
- `social`
- `numeric`
For each round, the generator randomizes their priority using base weights and recent-history penalties:
| Type | Base weight |
| --- | ---: |
| `audio-metadata` | `1` |
| `numeric` | `0.55` |
| `social` | `0.1` |
A random value between `0` and `0.35` is added to each base weight, and each occurrence of the same type in the last 3 questions subtracts `0.45`.
This means audio metadata questions are preferred by default, but the generator avoids repeating the same category too often.
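A rough sketch of this ordering, using the weights from the table. The function name `orderQuestionTypes` and the injectable `random` parameter are assumptions made for illustration and testing; the real generator may structure this differently.

```ts
type QuestionType = "audio-metadata" | "social" | "numeric";

const BASE_WEIGHTS: Record<QuestionType, number> = {
  "audio-metadata": 1,
  numeric: 0.55,
  social: 0.1,
};

function orderQuestionTypes(
  history: QuestionType[],
  random: () => number = Math.random,
): QuestionType[] {
  const recent = history.slice(-3); // only the last 3 questions are penalized
  const priority = (type: QuestionType): number => {
    const repeats = recent.filter((t) => t === type).length;
    return BASE_WEIGHTS[type] + random() * 0.35 - repeats * 0.45;
  };
  return (Object.keys(BASE_WEIGHTS) as QuestionType[])
    .map((type) => ({ type, value: priority(type) }))
    .sort((a, b) => b.value - a.value)
    .map((entry) => entry.type);
}
```

Under these numbers, two `audio-metadata` questions among the last three drop that type's priority to between `0.1` and `0.45`, below the `0.55` to `0.9` range of `numeric`, so a different type comes first in the next round.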
### 2. Candidate generation
Each question builder creates multiple candidates, then `pickQuestionCandidate` selects one.
Candidates contain:
- `key` — unique question identity
- `subjectKey` — the entity being asked about, such as `track:...`, `artist:...`, `genre:...`, `member:...`, or `pair:...`
- optional fairness metadata
- the partial question object
A candidate is rejected if the quiz history already contains the same normalized:
- question key
- subject key
- question text
This prevents repeated questions and repeated subjects across the quiz.
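The rejection rule can be sketched as a simple history check. The normalization shown here (trim, lowercase, collapse whitespace) is an assumption, as are the `Candidate` shape and the `isRepeat` name.

```ts
type Candidate = { key: string; subjectKey: string; text: string };

// Hypothetical normalization: trim, lowercase, and collapse whitespace.
const normalize = (value: string): string =>
  value.trim().toLowerCase().replace(/\s+/g, " ");

function isRepeat(candidate: Candidate, history: Candidate[]): boolean {
  return history.some(
    (past) =>
      normalize(past.key) === normalize(candidate.key) ||
      normalize(past.subjectKey) === normalize(candidate.subjectKey) ||
      normalize(past.text) === normalize(candidate.text),
  );
}
```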
### 3. Fairness weighting
Tracks and artists are sorted by fairness before they are used as question subjects.
Fairness is derived from an entity's `memberScores`:
- `memberIds` — party members connected to the entity
- `memberCount` — how many members are connected
- `score` — total member score for shared entities
For single-member entities, the fairness score is the negative of how often that member has already been used in the quiz history, so members who have been featured more score lower. This prevents the quiz from repeatedly focusing on one person when only single-member subjects are available.
Question candidate weight is:
```text
if no fairness data:
weight = 8
else:
weight = 8 + memberCount * 20 + clamp(score, 0, 100) / 20
```
Weighted random selection is then used. Shared, high-scoring entities are therefore more likely, but not guaranteed, to be selected.
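The weight formula and the weighted pick can be sketched together. The weight arithmetic follows the formula above; the roulette-wheel selection, the type names, and the injectable `random` parameter are assumptions about how `pickQuestionCandidate` might work.

```ts
type Fairness = { memberCount: number; score: number };
type WeightedCandidate = { key: string; fairness?: Fairness };

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

function candidateWeight(candidate: WeightedCandidate): number {
  const fairness = candidate.fairness;
  if (!fairness) return 8;
  return 8 + fairness.memberCount * 20 + clamp(fairness.score, 0, 100) / 20;
}

function pickWeighted<T extends WeightedCandidate>(
  candidates: T[],
  random: () => number = Math.random,
): T | undefined {
  const total = candidates.reduce((sum, c) => sum + candidateWeight(c), 0);
  let roll = random() * total;
  for (const candidate of candidates) {
    roll -= candidateWeight(candidate);
    if (roll < 0) return candidate;
  }
  // Fallback for rounding at the upper edge; undefined when the list is empty.
  return candidates[candidates.length - 1];
}
```

A subject shared by 3 members with a score of 250 weighs `8 + 60 + 5 = 73`, compared with `8` for a candidate without fairness data, so it is about nine times as likely to be picked in a head-to-head draw.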
### 4. Audio metadata questions
`buildAudioMetadataQuestion` produces choice questions about party music metadata. It uses:
- most shared genres
- fair tracks from story clusters
- fair artists from story clusters
- detailed track rows from the database for album, artist, release date, and duration metadata
Examples of generated questions include:
- `What song is currently playing?`
- `Which genre is shared by the most party members?`
- `Which genre is ranked #<rank> in the party's shared genres?`
- `Which artist is ranked highest in the shared audio data?`
- `Which artist is ranked #<rank> in the shared audio data?`
- `Which track is ranked highest across the party?`
- `Which track is ranked #<rank> in the party analysis?`
- `Which artist appears on "<album>"?`
- `Which of these tracks came out first?`
- `Which of these tracks came out most recently?`
- `What's the longest track by <artist>?`
- `Who performs "<track>"?`
- `What is the name of this track by <artist>?`
- `"<track>" appears on which album?`
Options are built from relevant candidate pools, deduplicated, shuffled, and only emitted when there are enough valid options.
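The option-building step could look roughly like this. The option count of 4, the case-insensitive deduplication, and the `buildOptions` name are all assumptions for the sketch; the source only states that options are deduplicated, shuffled, and emitted when enough exist.

```ts
function buildOptions(
  correct: string,
  pool: string[],
  optionCount = 4, // hypothetical default
  random: () => number = Math.random,
): { options: string[]; correct: number } | null {
  const seen = new Set<string>([correct.toLowerCase()]);
  const distractors: string[] = [];
  for (const item of pool) {
    const normalized = item.toLowerCase();
    if (seen.has(normalized)) continue; // drop duplicates and the answer itself
    seen.add(normalized);
    distractors.push(item);
  }
  if (distractors.length < optionCount - 1) return null; // not enough valid options

  const options = [correct, ...distractors.slice(0, optionCount - 1)];
  // Fisher-Yates shuffle so the correct answer's position is not predictable.
  for (let i = options.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = options[i]!;
    options[i] = options[j]!;
    options[j] = tmp;
  }
  return { options, correct: options.indexOf(correct) };
}
```

Returning `null` when the pool is too small mirrors the rule that a question is only emitted when there are enough valid options.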
### 5. Social questions
`buildSocialQuestion` produces choice questions about players and relationships in the party.
Examples include:
- `Who is leading the quiz right now?`
- `Who looks like the most diverse listener in the party?`
- `Who listens the most to "<track>"?`
- `Which two players share the most musical taste?`
Social questions require enough party members for the question to make sense:
- leader/diverse/top-listener questions need at least 2 members
- most-aligned-pair questions need at least 3 members
The top-listener question prefers shared tracks when shared tracks exist, so the quiz does not unnecessarily focus on solo-only data.
### 6. Numeric questions
`buildNumericQuestion` produces numeric-answer questions. Numeric questions are scored by closeness during the quiz rather than by exact choice index.
Examples include:
- `What's the release year of <album or track>?`
- `What year did "<track>" come out?`
- `What year did <artist>'s first party track come out?`
- `For how many players in the party is "<track>" a top track?`
- `How many players in the party have "<artist>" as a favourite artist?`
Release-year questions use a range around the correct year, capped at the current year and widened to a minimum span. Count questions use a range from `0` to the current party size.
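One way to build such ranges is sketched below. The `padding` and `minSpan` defaults are hypothetical placeholders, since the actual values are not given here, and both function names are illustrative.

```ts
type NumericRange = { min: number; max: number };

function releaseYearRange(
  correctYear: number,
  currentYear: number,
  padding = 10, // hypothetical half-width around the correct year
  minSpan = 20, // hypothetical minimum width of the range
): NumericRange {
  // Never offer years in the future.
  const max = Math.min(correctYear + padding, currentYear);
  let min = correctYear - padding;
  // Capping at the current year can shrink the range, so widen it downwards.
  if (max - min < minSpan) min = max - minSpan;
  return { min, max };
}

function countRange(partySize: number): NumericRange {
  return { min: 0, max: partySize };
}
```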
### 7. Question timing
Every generated question is wrapped by `buildQuestionWindow`, which sets:
```ts
const startTimestamp = Date.now();
const endTimestamp = startTimestamp + 60_000; // 60-second answer window
```
These timestamps are used by the quiz workflow to decide when answer collection times out.
## Song selection
After a question candidate is selected, `selectQuestionSong` chooses an audio track to attach to the question.
Song candidates come from:
1. the song already attached by the question builder
2. the question subject, if the subject is a track or artist
3. people mentioned by the question, when member-specific subjects can imply a relevant song
4. fair tracks from the story clusters
5. top party songs queried from member top tracks
The selector avoids reusing songs from previous quiz rounds when possible by checking prior `song.platform_id` values.
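The freshness preference can be sketched as a first-match search over the prioritized candidate list, falling back to a reused song only when nothing fresh is left. The `Song` shape and `pickFreshSong` name are assumptions; only the `platform_id` field comes from the description above.

```ts
type Song = { platform_id: string; title: string };

// `candidates` is assumed to be ordered by the source priority listed above.
function pickFreshSong(candidates: Song[], usedPlatformIds: Set<string>): Song | undefined {
  const fresh = candidates.find((song) => !usedPlatformIds.has(song.platform_id));
  return fresh ?? candidates[0];
}
```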
Some question types should keep or prefer their subject song:
- Questions where hearing the exact song is necessary keep the subject song.
- Questions where the song helps but is not mandatory prefer a relevant fresh song.
- Other questions prefer fair, fresh, adjacent party songs so audio does not reveal the answer too directly.
## Quiz response and scoring
For each question, the quiz workflow waits until all current party members answer or the question deadline is reached. Missing answers are recorded with `selected: -1` and a score of 0.
Choice questions are scored exactly:
```text
pointsGained = question.points if selected option index equals question.correct
pointsGained = 0 otherwise
```
Numeric questions are scored relatively by answer distance:
1. Ignore no-answer responses for ranking.
2. Compute absolute distance from the correct numeric value.
3. Group equal distances together.
4. Award the closest group full points and linearly decrease points for later distance groups.
5. If all numeric answers are equally distant, only exact answers receive points.
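The numeric scoring steps can be sketched as follows. The exact linear decay is not specified above, so this sketch assumes group `i` out of `n` distance groups earns `points * (n - i) / n`, rounded; `scoreNumericRound` and `NumericAnswer` are hypothetical names.

```ts
type NumericAnswer = { userId: string; selected: number };

function scoreNumericRound(
  answers: NumericAnswer[],
  correct: number,
  points: number,
): Map<string, number> {
  const result = new Map<string, number>();
  for (const answer of answers) result.set(answer.userId, 0);

  // Steps 1-2: drop no-answer responses and compute absolute distances.
  const ranked = answers
    .filter((a) => a.selected !== -1)
    .map((a) => ({ userId: a.userId, distance: Math.abs(a.selected - correct) }));

  // Step 3: distinct distances, closest first, define the groups.
  const distances = Array.from(new Set(ranked.map((r) => r.distance))).sort((x, y) => x - y);

  // Step 5: if everyone is equally distant, only exact answers score.
  if (distances.length === 1) {
    if (distances[0] === 0) for (const r of ranked) result.set(r.userId, points);
    return result;
  }

  // Step 4: full points for the closest group, linearly fewer for later groups.
  for (const r of ranked) {
    const group = distances.indexOf(r.distance);
    const share = (distances.length - group) / distances.length;
    result.set(r.userId, Math.round(points * share));
  }
  return result;
}
```

With a correct value of 2001, 10 points, and answers 2000, 2003, and 1990, the three distance groups would earn 10, 7, and 3 points under this assumed decay, and a player who did not answer earns 0.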
After each round, scores are added to `quizState.scores`, the quiz enters `review`, then continues to the next question. After the final question, the quiz status becomes `results` and the party status is marked `ended`.
## Design goals
The current algorithm is optimized for:
- **Shared relevance:** Prefer content that represents multiple party members.
- **Personal variety:** Avoid repeatedly targeting the same member or subject.
- **Freshness:** Avoid repeated question keys, subjects, text, and songs.
- **Playable trivia:** Only emit questions with enough options, valid text, and usable metadata.
- **Low round latency:** Do expensive aggregation once in party analysis, then use compact JSON during quiz rounds.

api/POSSIBLE_QUESTIONS.md Normal file

@ -0,0 +1,81 @@
# Possible Quiz Questions
This document lists every question template currently generated by the API.
Relevant files:
- `src/party/audio-question-generator.ts`
- `src/party/social-question-generator.ts`
- `src/party/numeric-question-generator.ts`
Notes:
- Dynamic placeholders are shown with angle brackets, for example `<track>` or `<artist>`.
- Every generated question is worth `10` points.
- Choice questions generate shuffled answer options.
- Numeric questions generate a numeric answer range instead of options.
- Not every template is available in every party. A question is only emitted when the required analytics, metadata, answer options, and party size are available.
## Audio metadata choice questions
Audio metadata questions are generated by `buildAudioMetadataQuestion`.
| Template | Correct answer | Required data / condition |
| --- | --- | --- |
| `What song is currently playing?` | Current/selected top song title | A resolvable top song and enough track title options |
| `Which genre is shared by the most party members?` | The highest-ranked shared genre | Shared genre analytics and enough genre options |
| `Which genre is ranked #<rank> in the party's shared genres?` | The genre at that exact rank | Shared genre analytics and enough genre options |
| `Which artist is ranked highest in the shared audio data?` | The highest-ranked fair artist | Fair artist analytics and enough artist options |
| `Which artist is ranked #<rank> in the shared audio data?` | The artist at that exact rank | Fair artist analytics and enough artist options |
| `Which track is ranked highest across the party?` | The top fair shared track | Top fair track is shared by more than one member and enough track options exist |
| `Which track is ranked #<rank> in the party analysis?` | The track at that exact rank | Fair track analytics and enough track options |
| `Which artist appears on "<album>"?` | An artist from the album | A top track with album name and artist metadata, plus enough artist options |
| `Which of these tracks came out first?` | Earliest released track among detailed top tracks | At least two detailed top tracks with album release dates |
| `Which of these tracks came out most recently?` | Latest released track among detailed top tracks | At least two detailed top tracks with album release dates and distinct earliest/latest tracks |
| `What's the longest track by <artist>?` | Longest detailed top track for that artist | At least two detailed top tracks by the same artist with durations |
| `Who performs "<track>"?` | The track's artist | A top track with artist metadata and enough artist options |
| `What is the name of this track by <artist>?` | The track title | A top track with artist metadata and enough track title options; song title is hidden |
| `Which song is this audio clip from?` | Current/selected top song title | A top song exists, differs from the iterated top track, and enough track title options exist; song title is hidden |
| `"<track>" appears on which album?` | The track's album | A top track with album metadata and enough album options |
## Social choice questions
Social questions are generated by `buildSocialQuestion`.
| Template | Correct answer | Required data / condition |
| --- | --- | --- |
| `Who is leading the quiz right now?` | Current quiz leader | At least 2 members and a clear leader in `quizState.scores` |
| `Who looks like the most diverse listener in the party?` | Member with highest genre entropy | At least 2 members and `groupSummary.mostDiverseMember` is available |
| `Who listens the most to "<track>"?` | Member with the highest score for that track | At least 2 members, a fair top track, and a top listener for that track |
| `Which two players share the most musical taste?` | Pair from `groupSummary.mostAlignedPair` | At least 3 members and a resolvable most-aligned pair |
## Numeric questions
Numeric questions are generated by `buildNumericQuestion`.
| Template | Correct answer | Range | Required data / condition |
| --- | --- | --- | --- |
| `What's the release year of <album-or-track>?` | Album release year | Release-year range around correct year | A fair top track with an album release date |
| `What year did "<track>" come out?` | Track album release year | Release-year range around correct year | A detailed fair top track with an album release date |
| `What year did <artist>'s first party track come out?` | Release year of the earliest party track by that artist | Release-year range around correct year | At least two detailed top tracks by the same artist with release dates |
| `For how many players in the party is "<track>" a top track?` | Count of party members whose top tracks include the track | `0` to party member count | A fair top track that resolves to a database track and appears for at least one member |
| `How many players in the party have "<artist>" as a favourite artist?` | Count of party members whose top artists include the artist | `0` to party member count | A fair top artist that resolves to a database artist and appears for at least one member |
## Template count
Current total: **24 question templates**.
- Audio metadata: 15
- Social: 4
- Numeric: 5
## Selection behavior
The generator does not walk this list in a fixed order. For each round:
1. It prioritizes a question family: `audio-metadata`, `numeric`, or `social`.
2. The priority is randomized but penalizes question families used in the last 3 rounds.
3. Each family builds all valid candidates it can from the current party data.
4. Candidates with repeated question keys, subject keys, or text are filtered out.
5. A weighted random candidate is selected, with higher weight for subjects shared by more party members.
6. A song is attached when possible, preferring fresh songs that were not used in prior rounds.