# Party Analysis and Question Generation Algorithm

This document describes how the API analyzes a party's Spotify data and turns that analysis into quiz questions.

Relevant implementation files:

- `src/workflows/party-analysis.ts`: computes and stores `party.analysisData`.
- `src/workflows/quiz.ts`: starts analysis, runs the quiz loop, and scores answers.
- `src/party/question-generator.ts`: chooses a question type and attaches a song.
- `src/party/audio-question-generator.ts`: builds audio metadata choice questions.
- `src/party/social-question-generator.ts`: builds social choice questions.
- `src/party/numeric-question-generator.ts`: builds numeric questions.
- `src/party/question-utils.ts`: shared fairness, deduplication, option, and song selection logic.

## High-level flow

```mermaid
flowchart TD
    A[Quiz starts] --> B[Analyze party]
    B --> C[Store party.analysisData]
    C --> D[Initialize quiz state]
    D --> E[Generate next question]
    E --> F[Publish current question in party data]
    F --> G[Wait for player answers or timeout]
    G --> H[Score round]
    H --> I[Review period]
    I --> J{More questions?}
    J -->|Yes| E
    J -->|No| K[Show results and mark party ended]
```

When a quiz starts, `QuizWorkflow.startQuiz` first runs `partyAnalysisWorkflow.analyzeParty(partyId)`. The generated analysis is saved to the `party.analysisData` JSON column and then reused for every question in the quiz.

The quiz currently asks up to `TOTAL_QUESTIONS = 5` questions. Each question has a 60-second answer window, followed by a 5-second review period.

## Party analysis

Party analysis converts each member's listening data into comparable track, artist, and genre scores. The result is a compact party-level summary designed for fast question generation.

### 1. Minimum party size

If a party has fewer than 2 members, analysis is saved as empty:

- `storyClusters: []`
- `pairwise: []`
- `groupSummary.totalMembers`
- `groupSummary.mostSharedGenres: []`
- `groupSummary.mostDiverseMember: null`
- `groupSummary.mostAlignedPair: null`
- `memberProfiles: []`

The workflow returns `analyzed: false` in that case.

### 2. Per-member scoring

For each party member, the analysis workflow fetches several Spotify-derived tables and accumulates scores into three maps:

- tracks
- artists
- genres

The score inputs are:

| Source | Scoring |
| --- | --- |
| Medium-term top tracks | `MAX_POSITION - position + 1`, with `MAX_POSITION = 50` |
| Saved tracks | track `+10`, artists `+5`, genres `+2.5` |
| Playback history | track `+5` if played in the last 24h, `+3` if played in the last week, otherwise `+1`; artists get half of the track score, genres get a quarter |
| Medium-term top artists | `MAX_POSITION - position + 1` |
| Followed artists | artist `+10`, genres `+10` |
| Saved albums | album artists `+5`, genres `+2.5` |

Top track and top artist scores are position-weighted, so rank 1 contributes more than rank 50. Saved and followed items add fixed preference signals. Playback history adds recency-weighted listening signals.

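The scoring rules in the table can be sketched as a few small helpers. This is a minimal illustration, not the code in `src/workflows/party-analysis.ts`: the helper names (`positionScore`, `playbackTrackScore`, `addScore`) are hypothetical, and it assumes `position` is 1-based so that rank 1 scores 50 and rank 50 scores 1.

```typescript
// Hypothetical helpers; only MAX_POSITION and the score values come from the table above.
const MAX_POSITION = 50;
const DAY_MS = 24 * 60 * 60 * 1000;

// Position-weighted score for top tracks and top artists (assumes 1-based positions).
function positionScore(position: number): number {
  return MAX_POSITION - position + 1;
}

// Recency-weighted track score for a single playback-history row.
function playbackTrackScore(playedAt: number, now: number): number {
  const age = now - playedAt;
  if (age <= DAY_MS) return 5;
  if (age <= 7 * DAY_MS) return 3;
  return 1;
}

// Accumulate a score into a map keyed by entity id.
function addScore(scores: Record<string, number>, id: string, amount: number): void {
  scores[id] = (scores[id] ?? 0) + amount;
}

const tracks: Record<string, number> = {};
const artists: Record<string, number> = {};
const genres: Record<string, number> = {};

const now = Date.UTC(2024, 0, 10);
// A track played two days ago falls in the "last week" bucket.
const played = playbackTrackScore(now - 2 * DAY_MS, now);
addScore(tracks, "track-1", played);
addScore(artists, "artist-1", played / 2); // artists get half
addScore(genres, "indie", played / 4); // genres get a quarter
// The same track is also the member's #1 medium-term top track.
addScore(tracks, "track-1", positionScore(1));
```

Because every source adds into the same per-member maps, a track that appears in several tables simply ends up with a higher combined score.
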
### 3. Party-level entity maps

After all member scores are computed, the workflow builds one map per entity type:

- `TrackEntityScore`
- `ArtistEntityScore`
- `GenreEntityScore`

Each entity contains:

- entity id and display name
- track artist names and album name, for tracks
- `memberScores`, the list of members who contributed to the entity and their score
- `memberCount`, the number of party members represented by that entity

These maps make it possible to tell which songs, artists, and genres are shared by multiple people and which are strongly associated with one person.

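As a rough sketch of how per-member scores could be folded into a party-level map, the example below uses a simplified `EntityScore` type and a hypothetical `buildEntityMap` helper. The real `TrackEntityScore`, `ArtistEntityScore`, and `GenreEntityScore` types carry more fields than shown here.

```typescript
type MemberScore = { userId: string; score: number };
// Simplified stand-in for the entity score types named above.
type EntityScore = { id: string; name: string; memberScores: MemberScore[]; memberCount: number };

// perMember maps userId -> (entityId -> score); names maps entityId -> display name.
function buildEntityMap(
  perMember: Record<string, Record<string, number>>,
  names: Record<string, string>,
): Record<string, EntityScore> {
  const entities: Record<string, EntityScore> = {};
  for (const userId of Object.keys(perMember)) {
    const scores = perMember[userId] ?? {};
    for (const id of Object.keys(scores)) {
      const entity = entities[id] ?? { id, name: names[id] ?? id, memberScores: [], memberCount: 0 };
      entity.memberScores.push({ userId, score: scores[id] ?? 0 });
      entity.memberCount = entity.memberScores.length;
      entities[id] = entity;
    }
  }
  return entities;
}

const entityMap = buildEntityMap(
  { alice: { rock: 12, jazz: 4 }, bob: { rock: 7 } },
  { rock: "Rock", jazz: "Jazz" },
);
```

Here `rock` ends up with `memberCount` 2 (shared) and `jazz` with `memberCount` 1 (associated with a single member).
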
### 4. Story clusters

Story clusters group entities by the exact subset of party members that share them.

For example, if Alice and Bob both have the same genre, that genre goes into the cluster keyed by `Alice|Bob`. If all party members share a track, that track goes into the all-members cluster.

Clusters are sorted by:

1. all-members cluster first
2. larger `memberCount`
3. total track score in the cluster

Within each cluster, tracks, artists, and genres are sorted by total score descending.

The stored analysis is compacted to:

- top 8 story clusters
- top 20 tracks, artists, and genres per cluster

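The grouping step can be illustrated with a short sketch. The `clusterKey` and `groupByMemberSubset` names are hypothetical, and sorting the member ids before joining them is an assumption made here so that the key does not depend on insertion order.

```typescript
type ClusterMemberScore = { userId: string; score: number };
type ClusterEntity = { id: string; memberScores: ClusterMemberScore[] };

// Build a key from the exact member subset, e.g. "Alice|Bob".
// Sorting is an assumption so the key is order-independent.
function clusterKey(entity: ClusterEntity): string {
  return entity.memberScores.map((m) => m.userId).sort().join("|");
}

// Group entities that are shared by exactly the same set of members.
function groupByMemberSubset(entities: ClusterEntity[]): Record<string, ClusterEntity[]> {
  const clusters: Record<string, ClusterEntity[]> = {};
  for (const entity of entities) {
    const key = clusterKey(entity);
    const bucket = clusters[key] ?? [];
    bucket.push(entity);
    clusters[key] = bucket;
  }
  return clusters;
}

const clusters = groupByMemberSubset([
  { id: "genre:indie", memberScores: [{ userId: "Bob", score: 3 }, { userId: "Alice", score: 5 }] },
  { id: "genre:jazz", memberScores: [{ userId: "Alice", score: 2 }, { userId: "Bob", score: 1 }] },
  { id: "genre:metal", memberScores: [{ userId: "Carol", score: 9 }] },
]);
```

Both shared genres land in the `Alice|Bob` cluster, while the genre only Carol listens to forms its own single-member cluster.
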
### 5. Pairwise similarity

For every pair of party members, the workflow computes:

- `sharedTracks`
- `sharedArtists`
- `sharedGenres`
- `similarity`

Similarity uses a weighted Jaccard-style score across tracks, artists, and genres:

```text
similarity = sum(min(scoreA, scoreB) for shared entities)
           / sum(max(scoreA, scoreB) for all entities in either profile)
```

This rewards members who share high-scoring music preferences, not just raw overlap counts.

The most similar pair becomes `groupSummary.mostAlignedPair`.

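A minimal sketch of the formula above, applied to a single combined score map per member. How the real workflow combines or weights tracks, artists, and genres is not specified here, so `weightedJaccard` is a hypothetical helper and treating all entity types as one map is an assumption.

```typescript
type ProfileScores = Record<string, number>;

// Weighted Jaccard: sum of per-entity minimums over sum of per-entity maximums.
// An entity missing from one profile counts as score 0 there.
function weightedJaccard(a: ProfileScores, b: ProfileScores): number {
  const ids = Object.keys({ ...a, ...b });
  let shared = 0;
  let total = 0;
  for (const id of ids) {
    const scoreA = a[id] ?? 0;
    const scoreB = b[id] ?? 0;
    shared += Math.min(scoreA, scoreB);
    total += Math.max(scoreA, scoreB);
  }
  // Two empty profiles have no overlap to measure.
  return total === 0 ? 0 : shared / total;
}

const alice: ProfileScores = { rock: 10, jazz: 4 };
const bob: ProfileScores = { rock: 6, pop: 8 };
// shared = min(10, 6) = 6; total = 10 + 4 + 8 = 22
const similarity = weightedJaccard(alice, bob);
```

For these two profiles the result is `6 / 22`. Two identical profiles score 1, and profiles with nothing in common score 0.
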
### 6. Member profiles and genre diversity

Each member profile stores:

- `userId`
- `totalScore`, based on track and artist scores
- `genreScores`
- `trackCount`
- `artistCount`

Genre diversity is calculated as entropy over the member's genre score distribution:

```text
entropy = -sum(p * ln(p))
```

where `p` is the genre score divided by the member's total score. The member with the highest entropy becomes `groupSummary.mostDiverseMember`.

Stored member profiles keep only the top 20 genres by score.

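The entropy calculation can be sketched as follows. This version normalizes each genre score by the sum of the member's genre scores so that the `p` values form a probability distribution; that normalization is an assumption and may differ from the total used in the real workflow. `genreEntropy` is a hypothetical name.

```typescript
// Shannon entropy (natural log) over a member's genre score distribution.
// Assumption: p = genre score / sum of that member's genre scores.
function genreEntropy(genreScores: Record<string, number>): number {
  const values = Object.keys(genreScores)
    .map((genre) => genreScores[genre] ?? 0)
    .filter((value) => value > 0);
  const total = values.reduce((sum, value) => sum + value, 0);
  if (total === 0) return 0;
  let entropy = 0;
  for (const value of values) {
    const p = value / total;
    entropy -= p * Math.log(p);
  }
  return entropy;
}

const focused = genreEntropy({ rock: 5 }); // one genre only
const balanced = genreEntropy({ rock: 1, pop: 1 }); // two equal genres
```

A member with a single genre has entropy 0, while two equally weighted genres give `ln(2)`. The more evenly a member's score is spread across many genres, the higher the entropy.
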
### 7. Most shared genres

The workflow aggregates genre scores across members and sorts genres by:

1. `memberCount` descending
2. total genre score descending

It keeps the top 10 genres that are shared by at least 2 members as `groupSummary.mostSharedGenres`.

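The filter-and-sort step might look like the sketch below. `GenreAggregate` is a simplified stand-in for `GenreEntityScore`, and `pickMostSharedGenres` is a hypothetical helper name.

```typescript
// Simplified stand-in for GenreEntityScore.
type GenreAggregate = { name: string; memberCount: number; totalScore: number };

// Keep genres shared by at least 2 members, ordered by memberCount and then
// total score, both descending, limited to the top `limit` entries.
function pickMostSharedGenres(genres: GenreAggregate[], limit = 10): GenreAggregate[] {
  return genres
    .filter((genre) => genre.memberCount >= 2)
    .sort((a, b) => b.memberCount - a.memberCount || b.totalScore - a.totalScore)
    .slice(0, limit);
}

const sharedGenres = pickMostSharedGenres([
  { name: "pop", memberCount: 2, totalScore: 40 },
  { name: "metal", memberCount: 1, totalScore: 90 },
  { name: "rock", memberCount: 3, totalScore: 25 },
  { name: "indie", memberCount: 2, totalScore: 55 },
]);
```

In this example `metal` is dropped despite its high score because only one member listens to it, and `rock` ranks first because it reaches the most members.
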
## Generated analysis shape

The saved `party.analysisData` contains:

```ts
type PartyAnalysisResult = {
  storyClusters: StoryCluster[];
  pairwise: PairwiseComparison[];
  groupSummary: {
    totalMembers: number;
    mostSharedGenres: GenreEntityScore[];
    mostDiverseMember: GenreDiversity | null;
    mostAlignedPair: PairwiseComparison | null;
  };
  memberProfiles: MemberProfile[];
};
```

This JSON is intentionally denormalized and compact so the question generators can work without recomputing party analytics during each round.

## Question generation

Each quiz round calls `generatePartyQuestion`, passing:

- database client
- party id
- current `QuizState`
- saved `party.analysisData`
- question index

The generator fetches current party members, chooses a question type order, asks each question builder for a valid question, then attaches a suitable song.

### 1. Question type ordering

The possible question types are:

- `audio-metadata`
- `social`
- `numeric`

For each round, the generator randomizes their priority using base weights and recent-history penalties:

| Type | Base weight |
| --- | ---: |
| `audio-metadata` | `1` |
| `numeric` | `0.55` |
| `social` | `0.1` |

A random value up to `0.35` is added. Each occurrence of the same type in the last 3 questions subtracts `0.45`.

This means audio metadata questions are preferred by default, but the generator avoids repeating the same category too often.

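The priority calculation can be sketched like this. The `orderQuestionTypes` helper and its injectable `random` parameter are hypothetical (the parameter exists only to make the example deterministic), and drawing the random bonus uniformly from `[0, 0.35)` is an assumption.

```typescript
type QuestionType = "audio-metadata" | "numeric" | "social";

const BASE_WEIGHT: Record<QuestionType, number> = {
  "audio-metadata": 1,
  numeric: 0.55,
  social: 0.1,
};

// Order question types by base weight + random bonus - recent-history penalty.
function orderQuestionTypes(
  history: QuestionType[],
  random: () => number = Math.random,
): QuestionType[] {
  const recent = history.slice(-3); // only the last 3 questions are penalized
  const types: QuestionType[] = ["audio-metadata", "numeric", "social"];
  const scored = types.map((type) => {
    const repeats = recent.filter((asked) => asked === type).length;
    return { type, priority: BASE_WEIGHT[type] + random() * 0.35 - repeats * 0.45 };
  });
  return scored.sort((a, b) => b.priority - a.priority).map((entry) => entry.type);
}

// With no history and no random bonus, the base weights decide the order.
const freshOrder = orderQuestionTypes([], () => 0);
// After three audio questions in a row, audio drops to 1 - 3 * 0.45 = -0.35.
const afterAudioRun = orderQuestionTypes(
  ["audio-metadata", "audio-metadata", "audio-metadata"],
  () => 0,
);
```

The returned array is a priority order rather than a single choice, which matches the generator asking each builder in turn until one returns a valid question.
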
### 2. Candidate generation

Each question builder creates multiple candidates, then `pickQuestionCandidate` selects one.

Candidates contain:

- `key`: unique question identity
- `subjectKey`: the entity being asked about, such as `track:...`, `artist:...`, `genre:...`, `member:...`, or `pair:...`
- optional fairness metadata
- the partial question object

A candidate is rejected if the quiz history already contains the same normalized:

- question key
- subject key
- question text

This prevents repeated questions and repeated subjects across the quiz.

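The rejection rule could be sketched as below. The `AskedQuestion` shape, the `isRepeat` helper, and the normalization itself (trim, lowercase, collapse whitespace) are assumptions for illustration; the actual normalization in `src/party/question-utils.ts` may differ.

```typescript
// Hypothetical minimal shape for both candidates and history entries.
type AskedQuestion = { key: string; subjectKey: string; text: string };

// Assumed normalization: trim, lowercase, and collapse runs of whitespace.
function normalize(value: string): string {
  return value.trim().toLowerCase().replace(/\s+/g, " ");
}

// A candidate is a repeat if its key, subject, or text matches any earlier question.
function isRepeat(candidate: AskedQuestion, history: AskedQuestion[]): boolean {
  return history.some(
    (asked) =>
      normalize(asked.key) === normalize(candidate.key) ||
      normalize(asked.subjectKey) === normalize(candidate.subjectKey) ||
      normalize(asked.text) === normalize(candidate.text),
  );
}

const askedSoFar: AskedQuestion[] = [
  { key: "track-artist:abc", subjectKey: "track:abc", text: 'Who performs "Song A"?' },
];

// Different question key, but the same subject track, so it is rejected.
const sameSubject = isRepeat(
  { key: "track-album:abc", subjectKey: "track:abc", text: '"Song A" appears on which album?' },
  askedSoFar,
);
const freshCandidate = isRepeat(
  { key: "track-artist:xyz", subjectKey: "track:xyz", text: 'Who performs "Song B"?' },
  askedSoFar,
);
```
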
### 3. Fairness weighting

Tracks and artists are sorted by fairness before they are used as question subjects.

Fairness is derived from an entity's `memberScores`:

- `memberIds`: party members connected to the entity
- `memberCount`: how many members are connected
- `score`: total member score for shared entities

For single-member entities, the fairness score is the negative of that member's usage in the quiz history. This prevents the quiz from repeatedly focusing on one person when only single-member subjects are available.

Question candidate weight is:

```text
if no fairness data:
  weight = 8
else:
  weight = 8 + memberCount * 20 + clamp(score, 0, 100) / 20
```

Weighted random selection is then used. Shared, high-scoring entities are therefore more likely, but not guaranteed, to be selected.

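A minimal sketch of the weight formula together with a generic weighted random pick. The `candidateWeight` and `pickWeighted` names and the injectable `random` parameter are hypothetical; the cumulative-threshold selection shown is one common way to implement weighted random choice, not necessarily the one used by `pickQuestionCandidate`.

```typescript
type Fairness = { memberCount: number; score: number };

// Weight formula from the pseudocode above.
function candidateWeight(fairness?: Fairness): number {
  if (!fairness) return 8;
  const clamped = Math.min(Math.max(fairness.score, 0), 100);
  return 8 + fairness.memberCount * 20 + clamped / 20;
}

// Pick one item with probability proportional to its weight.
function pickWeighted<T>(
  items: T[],
  weights: number[],
  random: () => number = Math.random,
): T | undefined {
  const total = weights.reduce((sum, weight) => sum + weight, 0);
  let threshold = random() * total;
  for (let i = 0; i < items.length; i++) {
    threshold -= weights[i] ?? 0;
    if (threshold < 0) return items[i];
  }
  // Fallback for rounding edge cases; undefined when items is empty.
  return items[items.length - 1];
}

const sharedWeight = candidateWeight({ memberCount: 3, score: 250 }); // 8 + 60 + 100 / 20
const soloWeight = candidateWeight({ memberCount: 1, score: -2 }); // 8 + 20 + 0
const picked = pickWeighted(["solo", "shared"], [soloWeight, sharedWeight], () => 0.5);
```

Note that `memberCount * 20` dominates the formula: the clamped score contributes at most 5, so it mostly acts as a tie-breaker between entities shared by the same number of members.
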
### 4. Audio metadata questions

`buildAudioMetadataQuestion` produces choice questions about party music metadata. It uses:

- most shared genres
- fair tracks from story clusters
- fair artists from story clusters
- detailed track rows from the database for album, artist, release date, and duration metadata

Examples of generated questions include:

- `What song is currently playing?`
- `Which genre is shared by the most party members?`
- `Which genre is ranked #<rank> in the party's shared genres?`
- `Which artist is ranked highest in the shared audio data?`
- `Which artist is ranked #<rank> in the shared audio data?`
- `Which track is ranked highest across the party?`
- `Which track is ranked #<rank> in the party analysis?`
- `Which artist appears on "<album>"?`
- `Which of these tracks came out first?`
- `Which of these tracks came out most recently?`
- `What's the longest track by <artist>?`
- `Who performs "<track>"?`
- `What is the name of this track by <artist>?`
- `"<track>" appears on which album?`

Options are built from relevant candidate pools, deduplicated, shuffled, and only emitted when there are enough valid options.

### 5. Social questions

`buildSocialQuestion` produces choice questions about players and relationships in the party.

Examples include:

- `Who is leading the quiz right now?`
- `Who looks like the most diverse listener in the party?`
- `Who listens the most to "<track>"?`
- `Which two players share the most musical taste?`

Social questions require enough party members for the question to make sense:

- leader/diverse/top-listener questions need at least 2 members
- most-aligned-pair questions need at least 3 members

The top-listener question prefers shared tracks when shared tracks exist, so the quiz does not unnecessarily focus on solo-only data.

### 6. Numeric questions

`buildNumericQuestion` produces numeric-answer questions. Numeric questions are scored by closeness during the quiz rather than by exact choice index.

Examples include:

- `What's the release year of <album or track>?`
- `What year did "<track>" come out?`
- `What year did <artist>'s first party track come out?`
- `For how many players in the party is "<track>" a top track?`
- `How many players in the party have "<artist>" as a favourite artist?`

Release-year questions use a range around the correct year, capped at the current year and widened to a minimum span. Count questions use a range from `0` to the current party size.

### 7. Question timing

Every generated question is wrapped by `buildQuestionWindow`, which sets:

```ts
// 60-second answer window, in milliseconds since the Unix epoch.
const startTimestamp = Date.now();
const endTimestamp = startTimestamp + 60_000;
```

These timestamps are used by the quiz workflow to decide when answer collection times out.

## Song selection

After a question candidate is selected, `selectQuestionSong` chooses an audio track to attach to the question.

Song candidates come from:

1. the song already attached by the question builder
2. the question subject, if the subject is a track or artist
3. people mentioned by the question, when member-specific subjects can imply a relevant song
4. fair tracks from the story clusters
5. top party songs queried from member top tracks

The selector avoids reusing songs from previous quiz rounds when possible by checking prior `song.platform_id` values.

Some question types should keep or prefer their subject song:

- Questions where hearing the exact song is necessary keep the subject song.
- Questions where the song helps but is not mandatory prefer a relevant fresh song.
- Other questions prefer fair, fresh, adjacent party songs so audio does not reveal the answer too directly.

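The freshness check alone can be sketched as follows. This covers only the "avoid reuse when possible" part, not the keep/prefer rules per question type. The `Song` shape and `pickFreshSong` helper are hypothetical, and the candidates are assumed to already be in priority order.

```typescript
// Hypothetical minimal song shape; only platform_id is taken from the text above.
type Song = { platform_id: string; title: string };

// Return the first candidate not used in an earlier round,
// falling back to the first candidate when every song has been used.
function pickFreshSong(candidates: Song[], usedPlatformIds: string[]): Song | undefined {
  const fresh = candidates.filter((song) => usedPlatformIds.indexOf(song.platform_id) === -1);
  return fresh[0] ?? candidates[0];
}

const songCandidates: Song[] = [
  { platform_id: "sp-1", title: "Song A" },
  { platform_id: "sp-2", title: "Song B" },
];

const nextSong = pickFreshSong(songCandidates, ["sp-1"]);
const allUsed = pickFreshSong(songCandidates, ["sp-1", "sp-2"]);
```

The fallback reflects the "when possible" wording: a repeated song is still better than a question with no audio.
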
## Quiz response and scoring

For each question, the quiz workflow waits until all current party members answer or the question deadline is reached. Missing answers are recorded with `selected: -1` and score 0.

Choice questions are scored exactly:

```text
pointsGained = question.points if selected option index equals question.correct
pointsGained = 0 otherwise
```

Numeric questions are scored relatively by answer distance:

1. Ignore no-answer responses for ranking.
2. Compute absolute distance from the correct numeric value.
3. Group equal distances together.
4. Award the closest group full points and linearly decrease points for later distance groups.
5. If all numeric answers are equally distant, only exact answers receive points.

After each round, scores are added to `quizState.scores`, the quiz enters `review`, then continues to the next question. After the final question, the quiz status becomes `results` and the party status is marked `ended`.

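The numeric scoring steps can be sketched as below. Several details are assumptions made for illustration: missing answers are modelled as `null` rather than `selected: -1`, the linear step awards group `i` of `n` a share of `(n - i) / n`, and results are rounded to whole points. The real workflow in `src/workflows/quiz.ts` may use a different slope or rounding.

```typescript
type NumericAnswer = { userId: string; value: number | null }; // null = no answer (assumed)

function scoreNumericRound(
  answers: NumericAnswer[],
  correct: number,
  points: number,
): Record<string, number> {
  const result: Record<string, number> = {};
  const ranked: { userId: string; distance: number }[] = [];
  for (const answer of answers) {
    result[answer.userId] = 0; // everyone starts at 0, including no-answers
    if (answer.value !== null) {
      ranked.push({ userId: answer.userId, distance: Math.abs(answer.value - correct) });
    }
  }
  // Distinct distances, closest first; each distinct distance is one group.
  const groups = ranked
    .map((entry) => entry.distance)
    .filter((distance, index, all) => all.indexOf(distance) === index)
    .sort((a, b) => a - b);
  if (groups.length === 1) {
    // Everyone is equally distant: only exact answers score.
    for (const entry of ranked) {
      if (entry.distance === 0) result[entry.userId] = points;
    }
    return result;
  }
  for (const entry of ranked) {
    const groupIndex = groups.indexOf(entry.distance);
    // Assumed linear decrease: group i of n receives (n - i) / n of the points.
    result[entry.userId] = Math.round(points * (1 - groupIndex / groups.length));
  }
  return result;
}

const roundScores = scoreNumericRound(
  [
    { userId: "alice", value: 1990 },
    { userId: "bob", value: 1992 },
    { userId: "carol", value: 1985 },
    { userId: "dave", value: null },
  ],
  1990,
  100,
);
```

With three distance groups and 100 points, this sketch awards 100, 67, and 33 points to the three answering players and 0 to the player who did not answer.
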
## Design goals

The current algorithm is optimized for:

- **Shared relevance:** Prefer content that represents multiple party members.
- **Personal variety:** Avoid repeatedly targeting the same member or subject.
- **Freshness:** Avoid repeated question keys, subjects, text, and songs.
- **Playable trivia:** Only emit questions with enough options, valid text, and usable metadata.
- **Low round latency:** Do expensive aggregation once in party analysis, then use compact JSON during quiz rounds.