gen docs
parent
dd4e776175
commit
6c7854edd4
2 changed files with 455 additions and 0 deletions
374
api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md
Normal file
@ -0,0 +1,374 @@
# Party Analysis and Question Generation Algorithm

This document describes how the API analyzes a party's Spotify data and turns that analysis into quiz questions.

Relevant implementation files:

- `src/workflows/party-analysis.ts`: computes and stores `party.analysisData`.
- `src/workflows/quiz.ts`: starts analysis, runs the quiz loop, and scores answers.
- `src/party/question-generator.ts`: chooses a question type and attaches a song.
- `src/party/audio-question-generator.ts`: builds audio metadata choice questions.
- `src/party/social-question-generator.ts`: builds social choice questions.
- `src/party/numeric-question-generator.ts`: builds numeric questions.
- `src/party/question-utils.ts`: shared fairness, deduplication, option, and song selection logic.

## High-level flow

```mermaid
flowchart TD
    A[Quiz starts] --> B[Analyze party]
    B --> C[Store party.analysisData]
    C --> D[Initialize quiz state]
    D --> E[Generate next question]
    E --> F[Publish current question in party data]
    F --> G[Wait for player answers or timeout]
    G --> H[Score round]
    H --> I[Review period]
    I --> J{More questions?}
    J -->|Yes| E
    J -->|No| K[Show results and mark party ended]
```

When a quiz starts, `QuizWorkflow.startQuiz` first runs `partyAnalysisWorkflow.analyzeParty(partyId)`. The generated analysis is saved to the `party.analysisData` JSON column and then reused for every question in the quiz.

The quiz currently asks up to `TOTAL_QUESTIONS = 5` questions. Each question has a 60-second answer window, followed by a 5-second review period.

## Party analysis

Party analysis converts each member's listening data into comparable track, artist, and genre scores. The result is a compact party-level summary designed for fast question generation.

### 1. Minimum party size

If a party has fewer than 2 members, analysis is saved as empty:

- `storyClusters: []`
- `pairwise: []`
- `groupSummary.totalMembers`
- `groupSummary.mostSharedGenres: []`
- `groupSummary.mostDiverseMember: null`
- `groupSummary.mostAlignedPair: null`
- `memberProfiles: []`

The workflow returns `analyzed: false` in that case.

### 2. Per-member scoring

For each party member, the analysis workflow fetches several Spotify-derived tables and accumulates scores into three maps:

- tracks
- artists
- genres

The score inputs are:

| Source | Scoring |
| --- | --- |
| Medium-term top tracks | `MAX_POSITION - position + 1`, with `MAX_POSITION = 50` |
| Saved tracks | track `+10`, artists `+5`, genres `+2.5` |
| Playback history | track `+5` if played in the last 24 hours, `+3` if played in the last week, otherwise `+1`; artists get half the track score, genres get a quarter |
| Medium-term top artists | `MAX_POSITION - position + 1` |
| Followed artists | artist `+10`, genres `+10` |
| Saved albums | album artists `+5`, genres `+2.5` |

Top track and top artist scores are position-weighted, so rank 1 contributes more than rank 50. Saved and followed items add fixed preference signals. Playback history adds recency-weighted listening signals.

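The scoring table can be illustrated with a minimal sketch. The helper names (`positionScore`, `playbackScore`, `addScore`, `addTrackSignal`) and the `ScoreMaps` type are hypothetical and not taken from `party-analysis.ts`; only the constants and weights come from the table. The sketch also assumes that `position` is a 1-based rank.

```typescript
// Hypothetical helpers; only the constants and weights come from the scoring table.
const MAX_POSITION = 50;
const DAY_MS = 24 * 60 * 60 * 1000;

type ScoreMaps = {
  tracks: Map<string, number>;
  artists: Map<string, number>;
  genres: Map<string, number>;
};

/** Position-weighted score, assuming `position` is a 1-based rank. */
function positionScore(position: number): number {
  return MAX_POSITION - position + 1;
}

/** Recency-weighted track score for one playback-history entry. */
function playbackScore(playedAtMs: number, nowMs: number): number {
  const age = nowMs - playedAtMs;
  if (age <= DAY_MS) return 5;
  if (age <= 7 * DAY_MS) return 3;
  return 1;
}

/** Adds `delta` to the running score stored under `id`. */
function addScore(map: Map<string, number>, id: string, delta: number): void {
  map.set(id, (map.get(id) ?? 0) + delta);
}

/**
 * Applies one track-level signal where artists get half and genres a quarter
 * of the track score (the ratio used by the saved-track and playback rows).
 */
function addTrackSignal(
  maps: ScoreMaps,
  trackId: string,
  artistId: string,
  genre: string,
  trackScore: number,
): void {
  addScore(maps.tracks, trackId, trackScore);
  addScore(maps.artists, artistId, trackScore / 2);
  addScore(maps.genres, genre, trackScore / 4);
}
```

Under these assumptions, a saved track that was also played two hours ago accumulates `10 + 5 = 15` track points, `7.5` artist points, and `3.75` genre points.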
### 3. Party-level entity maps

After all member scores are fetched, the workflow builds one map per entity type:

- `TrackEntityScore`
- `ArtistEntityScore`
- `GenreEntityScore`

Each entity contains:

- entity id and display name
- track artist names and album name, for tracks
- `memberScores`, the list of members who contributed to the entity and their score
- `memberCount`, the number of party members represented by that entity

These maps make it possible to tell which songs, artists, and genres are shared by multiple people and which are strongly associated with one person.

### 4. Story clusters

Story clusters group entities by the exact subset of party members that share them.

For example, if Alice and Bob both have the same genre, that genre goes into the cluster keyed by `Alice|Bob`. If all party members share a track, that track goes into the all-members cluster.

Clusters are sorted by:

1. all-members cluster first
2. larger `memberCount`
3. total track score in the cluster

Within each cluster, tracks, artists, and genres are sorted by total score descending.

The stored analysis is compacted to:

- top 8 story clusters
- top 20 tracks, artists, and genres per cluster

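The grouping step can be sketched as follows. This is a simplified illustration, not the workflow's code: `EntityScore`, `Cluster`, and `clusterBySubset` are hypothetical names, a single entity list stands in for the separate track, artist, and genre lists, and the cluster key is assumed to be the sorted member ids joined with `|`.

```typescript
// Hypothetical types; the real entity shapes live in the analysis workflow.
type MemberScore = { userId: string; score: number };
type EntityScore = { id: string; memberScores: MemberScore[] };
type Cluster = { key: string; memberIds: string[]; entities: EntityScore[] };

/** Sum of all member scores for one entity. */
function totalScore(entity: EntityScore): number {
  return entity.memberScores.reduce((sum, m) => sum + m.score, 0);
}

/** Groups entities by the exact set of members that share them. */
function clusterBySubset(entities: EntityScore[]): Cluster[] {
  const clusters = new Map<string, Cluster>();
  for (const entity of entities) {
    // Sorting the ids makes the key independent of contribution order.
    const memberIds = entity.memberScores.map((m) => m.userId).sort();
    const key = memberIds.join("|");
    const cluster = clusters.get(key) ?? { key, memberIds, entities: [] };
    cluster.entities.push(entity);
    clusters.set(key, cluster);
  }

  const result = Array.from(clusters.values());
  // Within a cluster: highest total score first.
  result.forEach((c) => c.entities.sort((a, b) => totalScore(b) - totalScore(a)));

  const clusterTotal = (c: Cluster): number =>
    c.entities.reduce((sum, e) => sum + totalScore(e), 0);
  // Larger member sets first (so an all-members cluster leads), then by score.
  return result.sort(
    (a, b) => b.memberIds.length - a.memberIds.length || clusterTotal(b) - clusterTotal(a),
  );
}
```

Because the all-members subset is by definition the largest one, sorting by member count alone already places it first in this sketch.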
### 5. Pairwise similarity

For every pair of party members, the workflow computes:

- `sharedTracks`
- `sharedArtists`
- `sharedGenres`
- `similarity`

Similarity uses a weighted Jaccard-style score across tracks, artists, and genres:

```text
similarity = sum(min(scoreA, scoreB) for shared entities)
           / sum(max(scoreA, scoreB) for all entities in either profile)
```

This rewards members who share high-scoring music preferences, not just raw overlap counts.

The most similar pair becomes `groupSummary.mostAlignedPair`.

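A minimal sketch of the weighted Jaccard formula for a single entity map is shown here. `ScoreMap` and `weightedJaccard` are hypothetical names, and how the workflow combines the track, artist, and genre maps into one value is not covered by this sketch.

```typescript
// Hypothetical helper: entity id -> member score.
type ScoreMap = Map<string, number>;

/** Weighted Jaccard similarity: sum of minima over sum of maxima. */
function weightedJaccard(a: ScoreMap, b: ScoreMap): number {
  const ids = new Set<string>(Array.from(a.keys()).concat(Array.from(b.keys())));
  let shared = 0;
  let union = 0;
  ids.forEach((id) => {
    const scoreA = a.get(id) ?? 0;
    const scoreB = b.get(id) ?? 0;
    // A missing entity counts as 0, so it adds nothing to the numerator.
    shared += Math.min(scoreA, scoreB);
    union += Math.max(scoreA, scoreB);
  });
  // Two empty profiles are treated as having no similarity.
  return union === 0 ? 0 : shared / union;
}
```

For example, profiles `{x: 4, y: 2}` and `{x: 2, z: 2}` share only `x`, giving `min(4, 2) / (4 + 2 + 2) = 0.25`.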
### 6. Member profiles and genre diversity

Each member profile stores:

- `userId`
- `totalScore`, based on track and artist scores
- `genreScores`
- `trackCount`
- `artistCount`

Genre diversity is calculated as entropy over the member's genre score distribution:

```text
entropy = -sum(p * ln(p))
```

where `p` is the genre score divided by the member's total score. The member with the highest entropy becomes `groupSummary.mostDiverseMember`.

Stored member profiles keep only the top 20 genres by score.

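The entropy calculation can be sketched as below. `genreEntropy` is a hypothetical name, and the sketch assumes the normalizer is the sum of the member's genre scores, so the probabilities add up to 1; the workflow's exact normalizer may differ.

```typescript
/**
 * Shannon entropy (natural log) over a list of genre scores.
 * Assumption: scores are normalized by their own sum.
 */
function genreEntropy(genreScores: number[]): number {
  const positive = genreScores.filter((s) => s > 0);
  const total = positive.reduce((sum, s) => sum + s, 0);
  if (total === 0) return 0;
  return positive.reduce((entropy, s) => {
    const p = s / total;
    return entropy - p * Math.log(p);
  }, 0);
}
```

A member with a single genre has entropy `0`, while a member whose score is spread evenly over `n` genres reaches the maximum of `ln(n)`.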
### 7. Most shared genres

The workflow aggregates genre scores across members and sorts genres by:

1. `memberCount` descending
2. total genre score descending

It keeps the top 10 genres that are shared by at least 2 members as `groupSummary.mostSharedGenres`.

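The ranking rule can be sketched in a few lines. `GenreEntity` here is a simplified, hypothetical stand-in for `GenreEntityScore`, with the total score precomputed.

```typescript
// Simplified, hypothetical stand-in for GenreEntityScore.
type GenreEntity = { name: string; memberCount: number; totalScore: number };

/** Keeps genres shared by at least 2 members, ranked by reach and then score. */
function mostSharedGenres(genres: GenreEntity[], limit = 10): GenreEntity[] {
  return genres
    .filter((g) => g.memberCount >= 2) // filter() copies, so the input is not mutated
    .sort((a, b) => b.memberCount - a.memberCount || b.totalScore - a.totalScore)
    .slice(0, limit);
}
```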
## Generated analysis shape

The saved `party.analysisData` contains:

```ts
type PartyAnalysisResult = {
  storyClusters: StoryCluster[];
  pairwise: PairwiseComparison[];
  groupSummary: {
    totalMembers: number;
    mostSharedGenres: GenreEntityScore[];
    mostDiverseMember: GenreDiversity | null;
    mostAlignedPair: PairwiseComparison | null;
  };
  memberProfiles: MemberProfile[];
};
```

This JSON is intentionally denormalized and compact so the question generators can work without recomputing party analytics during each round.

## Question generation

Each quiz round calls `generatePartyQuestion`, passing:

- database client
- party id
- current `QuizState`
- saved `party.analysisData`
- question index

The generator fetches current party members, chooses a question type order, asks each question builder for a valid question, then attaches a suitable song.

### 1. Question type ordering

The possible question types are:

- `audio-metadata`
- `social`
- `numeric`

For each round, the generator randomizes their priority using base weights and recent-history penalties:

| Type | Base weight |
| --- | ---: |
| `audio-metadata` | `1` |
| `numeric` | `0.55` |
| `social` | `0.1` |

A random value up to `0.35` is added. Each occurrence of the same type in the last 3 questions subtracts `0.45`.

This means audio metadata questions are preferred by default, but the generator avoids repeating the same category too often.

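The priority calculation can be sketched like this. `orderQuestionTypes` and its injectable `random` parameter are hypothetical (the parameter only makes the sketch testable); the weights, jitter, and penalty are the values listed above.

```typescript
type QuestionType = "audio-metadata" | "numeric" | "social";

const BASE_WEIGHTS: Record<QuestionType, number> = {
  "audio-metadata": 1,
  numeric: 0.55,
  social: 0.1,
};
const MAX_JITTER = 0.35;
const REPEAT_PENALTY = 0.45;

/** Returns question types from highest to lowest priority for this round. */
function orderQuestionTypes(
  recentTypes: QuestionType[],
  random: () => number = Math.random,
): QuestionType[] {
  const lastThree = recentTypes.slice(-3);
  const scored = (Object.keys(BASE_WEIGHTS) as QuestionType[]).map((type) => {
    const repeats = lastThree.filter((t) => t === type).length;
    const priority = BASE_WEIGHTS[type] + random() * MAX_JITTER - repeats * REPEAT_PENALTY;
    return { type, priority };
  });
  return scored.sort((a, b) => b.priority - a.priority).map((s) => s.type);
}
```

After three consecutive `audio-metadata` rounds, its priority is at most `1 + 0.35 - 1.35 = 0`, so it drops below `numeric` and, in most draws, below `social` as well.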
### 2. Candidate generation

Each question builder creates multiple candidates, then `pickQuestionCandidate` selects one.

Candidates contain:

- `key`: unique question identity
- `subjectKey`: the entity being asked about, such as `track:...`, `artist:...`, `genre:...`, `member:...`, or `pair:...`
- optional fairness metadata
- the partial question object

A candidate is rejected if the quiz history already contains the same normalized:

- question key
- subject key
- question text

This prevents repeated questions and repeated subjects across the quiz.

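The rejection rule can be sketched as a simple predicate. The `Candidate` shape is reduced to the three compared fields, and the normalization (trim, lowercase, collapse whitespace) is an assumption; the actual rule in `question-utils.ts` may differ.

```typescript
// Reduced, hypothetical candidate shape with only the compared fields.
type Candidate = { key: string; subjectKey: string; text: string };

/** Assumed normalization: trim, lowercase, and collapse runs of whitespace. */
function normalize(value: string): string {
  return value.trim().toLowerCase().replace(/\s+/g, " ");
}

/** True when none of the candidate's key, subject, or text was asked before. */
function isFreshCandidate(candidate: Candidate, asked: Candidate[]): boolean {
  return !asked.some(
    (prev) =>
      normalize(prev.key) === normalize(candidate.key) ||
      normalize(prev.subjectKey) === normalize(candidate.subjectKey) ||
      normalize(prev.text) === normalize(candidate.text),
  );
}
```

Matching on any one of the three fields is enough to reject, which is why a new question template about an already-used track is also filtered out.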
### 3. Fairness weighting

Tracks and artists are sorted by fairness before they are used as question subjects.

Fairness is derived from an entity's `memberScores`:

- `memberIds`: party members connected to the entity
- `memberCount`: how many members are connected
- `score`: total member score for shared entities

For single-member entities, the fairness score is the negative of that member's usage in the quiz history, so members who have already been featured rank lower. This prevents the quiz from repeatedly focusing on one person when only single-member subjects are available.

Question candidate weight is:

```text
if no fairness data:
  weight = 8
else:
  weight = 8 + memberCount * 20 + clamp(score, 0, 100) / 20
```

Weighted random selection is then used. Shared, high-scoring entities are therefore more likely, but not guaranteed, to be selected.

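The weight formula and the weighted draw can be sketched together. `candidateWeight` and `pickWeighted` are hypothetical names, and the cumulative-threshold draw is one common way to implement weighted random selection, not necessarily the one used by `pickQuestionCandidate`.

```typescript
type Fairness = { memberCount: number; score: number };

/** Candidate weight as given by the formula above. */
function candidateWeight(fairness?: Fairness): number {
  if (!fairness) return 8;
  const clamped = Math.min(Math.max(fairness.score, 0), 100);
  return 8 + fairness.memberCount * 20 + clamped / 20;
}

/** Weighted random pick: each item is chosen with probability weight / total. */
function pickWeighted<T>(
  items: T[],
  weightOf: (item: T) => number,
  random: () => number = Math.random,
): T | undefined {
  const total = items.reduce((sum, item) => sum + weightOf(item), 0);
  let threshold = random() * total;
  for (const item of items) {
    threshold -= weightOf(item);
    if (threshold < 0) return item;
  }
  // Fallback for an empty list or floating-point rounding at the upper edge.
  return items[items.length - 1];
}
```

Note that `memberCount * 20` dominates the formula: the clamped score contributes at most `5`, so one extra connected member outweighs any score difference.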
### 4. Audio metadata questions

`buildAudioMetadataQuestion` produces choice questions about party music metadata. It uses:

- most shared genres
- fair tracks from story clusters
- fair artists from story clusters
- detailed track rows from the database for album, artist, release date, and duration metadata

Examples of generated questions include:

- `What song is currently playing?`
- `Which genre is shared by the most party members?`
- `Which genre is ranked #<rank> in the party's shared genres?`
- `Which artist is ranked highest in the shared audio data?`
- `Which artist is ranked #<rank> in the shared audio data?`
- `Which track is ranked highest across the party?`
- `Which track is ranked #<rank> in the party analysis?`
- `Which artist appears on "<album>"?`
- `Which of these tracks came out first?`
- `Which of these tracks came out most recently?`
- `What's the longest track by <artist>?`
- `Who performs "<track>"?`
- `What is the name of this track by <artist>?`
- `"<track>" appears on which album?`

Options are built from relevant candidate pools, deduplicated, shuffled, and only emitted when there are enough valid options.

### 5. Social questions

`buildSocialQuestion` produces choice questions about players and relationships in the party.

Examples include:

- `Who is leading the quiz right now?`
- `Who looks like the most diverse listener in the party?`
- `Who listens the most to "<track>"?`
- `Which two players share the most musical taste?`

Social questions require enough party members for the question to make sense:

- leader/diverse/top-listener questions need at least 2 members
- most-aligned-pair questions need at least 3 members

The top-listener question prefers shared tracks when shared tracks exist, so the quiz does not unnecessarily focus on solo-only data.

### 6. Numeric questions

`buildNumericQuestion` produces numeric-answer questions. Numeric questions are scored by closeness during the quiz rather than by exact choice index.

Examples include:

- `What's the release year of <album or track>?`
- `What year did "<track>" come out?`
- `What year did <artist>'s first party track come out?`
- `For how many players in the party is "<track>" a top track?`
- `How many players in the party have "<artist>" as a favourite artist?`

Release-year questions use a range around the correct year, capped at the current year and widened to a minimum span. Count questions use a range from `0` to the current party size.

### 7. Question timing

Every generated question is wrapped by `buildQuestionWindow`, which sets:

```ts
const startTimestamp = Date.now();
const endTimestamp = startTimestamp + 60_000; // 60-second answer window
```

These timestamps are used by the quiz workflow to decide when answer collection times out.

## Song selection

After a question candidate is selected, `selectQuestionSong` chooses an audio track to attach to the question.

Song candidates come from:

1. the song already attached by the question builder
2. the question subject, if the subject is a track or artist
3. people mentioned by the question, when member-specific subjects can imply a relevant song
4. fair tracks from the story clusters
5. top party songs queried from member top tracks

The selector avoids reusing songs from previous quiz rounds when possible by checking prior `song.platform_id` values.

Some question types should keep or prefer their subject song:

- Questions where hearing the exact song is necessary keep the subject song.
- Questions where the song helps but is not mandatory prefer a relevant fresh song.
- Other questions prefer fair, fresh, adjacent party songs so audio does not reveal the answer too directly.

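The freshness check can be sketched as a preference rather than a hard filter. `Song` is reduced to two fields and `pickFreshSong` is a hypothetical name; the sketch assumes the candidate list is already ordered by the source priority described above and ignores the per-question-type rules.

```typescript
// Reduced, hypothetical song shape; only platform_id matters for freshness.
type Song = { platform_id: string; title: string };

/**
 * Returns the first candidate whose platform_id was not used in an earlier
 * round, falling back to the first candidate when every song was already used.
 */
function pickFreshSong(candidates: Song[], usedIds: Set<string>): Song | undefined {
  const fresh = candidates.find((song) => !usedIds.has(song.platform_id));
  return fresh ?? candidates[0];
}
```

The fallback reflects the "when possible" wording: a repeated song is still preferred over attaching no song at all.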
## Quiz response and scoring

For each question, the quiz workflow waits until all current party members answer or the question deadline is reached. Missing answers are recorded with `selected: -1` and score 0.

Choice questions are scored exactly:

```text
pointsGained = question.points if selected option index equals question.correct
pointsGained = 0 otherwise
```

Numeric questions are scored relatively by answer distance:

1. Ignore no-answer responses for ranking.
2. Compute absolute distance from the correct numeric value.
3. Group equal distances together.
4. Award the closest group full points and linearly decrease points for later distance groups.
5. If all numeric answers are equally distant, only exact answers receive points.

After each round, scores are added to `quizState.scores`, the quiz enters `review`, then continues to the next question. After the final question, the quiz status becomes `results` and the party status is marked `ended`.

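Both scoring rules can be sketched as follows. The function names and the `Answer` shape are hypothetical, the sketch assumes the numeric answer is stored in `selected`, and the exact linear step is an assumption: group `i` of `g` distance groups receives `points * (g - i) / g`, rounded.

```typescript
// Hypothetical answer shape; `selected` is -1 when the player did not answer.
type Answer = { userId: string; selected: number };

/** Choice questions: all or nothing. */
function scoreChoice(selected: number, correct: number, points: number): number {
  return selected === correct ? points : 0;
}

/** Numeric questions: rank answers by distance and award points per group. */
function scoreNumeric(answers: Answer[], correct: number, points: number): Map<string, number> {
  const result = new Map<string, number>();
  answers.forEach((a) => result.set(a.userId, 0));

  // Step 1: no-answer responses stay at 0 and do not take part in ranking.
  const ranked = answers.filter((a) => a.selected !== -1);
  // Steps 2 and 3: distinct distances, closest first, define the groups.
  const distances = Array.from(
    new Set(ranked.map((a) => Math.abs(a.selected - correct))),
  ).sort((x, y) => x - y);

  ranked.forEach((a) => {
    const distance = Math.abs(a.selected - correct);
    const group = distances.indexOf(distance);
    // Step 5: with a single group, only exact answers score.
    // Step 4 (assumed step size): linear decrease across groups.
    const share =
      distances.length === 1
        ? distance === 0 ? 1 : 0
        : (distances.length - group) / distances.length;
    result.set(a.userId, Math.round(points * share));
  });
  return result;
}
```

With a correct year of `1992` and `10` points, answers of `1992`, `1990`, and `1995` fall into three groups and receive `10`, `7`, and `3` points under this assumed step, while a missing answer receives `0`.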
## Design goals

The current algorithm is optimized for:

- **Shared relevance:** Prefer content that represents multiple party members.
- **Personal variety:** Avoid repeatedly targeting the same member or subject.
- **Freshness:** Avoid repeated question keys, subjects, text, and songs.
- **Playable trivia:** Only emit questions with enough options, valid text, and usable metadata.
- **Low round latency:** Do expensive aggregation once in party analysis, then use compact JSON during quiz rounds.

81
api/POSSIBLE_QUESTIONS.md
Normal file
@ -0,0 +1,81 @@
# Possible Quiz Questions

This document lists every question template currently generated by the API.

Relevant files:

- `src/party/audio-question-generator.ts`
- `src/party/social-question-generator.ts`
- `src/party/numeric-question-generator.ts`

Notes:

- Dynamic placeholders are shown with angle brackets, for example `<track>` or `<artist>`.
- Every generated question is worth `10` points.
- Choice questions generate shuffled answer options.
- Numeric questions generate a numeric answer range instead of options.
- Not every template is available in every party. A question is only emitted when the required analytics, metadata, answer options, and party size are available.

## Audio metadata choice questions

Audio metadata questions are generated by `buildAudioMetadataQuestion`.

| Template | Correct answer | Required data / condition |
| --- | --- | --- |
| `What song is currently playing?` | Current/selected top song title | A resolvable top song and enough track title options |
| `Which genre is shared by the most party members?` | The highest-ranked shared genre | Shared genre analytics and enough genre options |
| `Which genre is ranked #<rank> in the party's shared genres?` | The genre at that exact rank | Shared genre analytics and enough genre options |
| `Which artist is ranked highest in the shared audio data?` | The highest-ranked fair artist | Fair artist analytics and enough artist options |
| `Which artist is ranked #<rank> in the shared audio data?` | The artist at that exact rank | Fair artist analytics and enough artist options |
| `Which track is ranked highest across the party?` | The top fair shared track | Top fair track is shared by more than one member and enough track options exist |
| `Which track is ranked #<rank> in the party analysis?` | The track at that exact rank | Fair track analytics and enough track options |
| `Which artist appears on "<album>"?` | An artist from the album | A top track with album name and artist metadata, plus enough artist options |
| `Which of these tracks came out first?` | Earliest released track among detailed top tracks | At least two detailed top tracks with album release dates |
| `Which of these tracks came out most recently?` | Latest released track among detailed top tracks | At least two detailed top tracks with album release dates and distinct earliest/latest tracks |
| `What's the longest track by <artist>?` | Longest detailed top track for that artist | At least two detailed top tracks by the same artist with durations |
| `Who performs "<track>"?` | The track's artist | A top track with artist metadata and enough artist options |
| `What is the name of this track by <artist>?` | The track title | A top track with artist metadata and enough track title options; song title is hidden |
| `Which song is this audio clip from?` | Current/selected top song title | A top song exists, differs from the iterated top track, and enough track title options exist; song title is hidden |
| `"<track>" appears on which album?` | The track's album | A top track with album metadata and enough album options |

## Social choice questions

Social questions are generated by `buildSocialQuestion`.

| Template | Correct answer | Required data / condition |
| --- | --- | --- |
| `Who is leading the quiz right now?` | Current quiz leader | At least 2 members and a clear leader in `quizState.scores` |
| `Who looks like the most diverse listener in the party?` | Member with highest genre entropy | At least 2 members and `groupSummary.mostDiverseMember` is available |
| `Who listens the most to "<track>"?` | Member with the highest score for that track | At least 2 members, a fair top track, and a top listener for that track |
| `Which two players share the most musical taste?` | Pair from `groupSummary.mostAlignedPair` | At least 3 members and a resolvable most-aligned pair |

## Numeric questions

Numeric questions are generated by `buildNumericQuestion`.

| Template | Correct answer | Range | Required data / condition |
| --- | --- | --- | --- |
| `What's the release year of <album-or-track>?` | Album release year | Release-year range around correct year | A fair top track with an album release date |
| `What year did "<track>" come out?` | Track album release year | Release-year range around correct year | A detailed fair top track with an album release date |
| `What year did <artist>'s first party track come out?` | Release year of the earliest party track by that artist | Release-year range around correct year | At least two detailed top tracks by the same artist with release dates |
| `For how many players in the party is "<track>" a top track?` | Count of party members whose top tracks include the track | `0` to party member count | A fair top track that resolves to a database track and appears for at least one member |
| `How many players in the party have "<artist>" as a favourite artist?` | Count of party members whose top artists include the artist | `0` to party member count | A fair top artist that resolves to a database artist and appears for at least one member |

## Template count

Current total: **24 question templates**.

- Audio metadata: 15
- Social: 4
- Numeric: 5

## Selection behavior

The generator does not walk this list in a fixed order. For each round:

1. It prioritizes a question family: `audio-metadata`, `numeric`, or `social`.
2. The priority is randomized but penalizes question families used in the last 3 rounds.
3. Each family builds all valid candidates it can from the current party data.
4. Candidates with repeated question keys, subject keys, or text are filtered out.
5. A weighted random candidate is selected, with higher weight for subjects shared by more party members.
6. A song is attached when possible, preferring fresh songs that were not used in prior rounds.