itpdp/api/PARTY_ANALYSIS_AND_QUESTION_GENERATION.md
Daniel Bulant 6c7854edd4
gen docs
2026-06-20 22:51:51 +02:00

Party Analysis and Question Generation Algorithm

This document describes how the API analyzes a party's Spotify data and turns that analysis into quiz questions.

Relevant implementation files:

  • src/workflows/party-analysis.ts — computes and stores party.analysisData.
  • src/workflows/quiz.ts — starts analysis, runs the quiz loop, and scores answers.
  • src/party/question-generator.ts — chooses a question type and attaches a song.
  • src/party/audio-question-generator.ts — builds audio metadata choice questions.
  • src/party/social-question-generator.ts — builds social choice questions.
  • src/party/numeric-question-generator.ts — builds numeric questions.
  • src/party/question-utils.ts — shared fairness, deduplication, option, and song selection logic.

High-level flow

flowchart TD
    A[Quiz starts] --> B[Analyze party]
    B --> C[Store party.analysisData]
    C --> D[Initialize quiz state]
    D --> E[Generate next question]
    E --> F[Publish current question in party data]
    F --> G[Wait for player answers or timeout]
    G --> H[Score round]
    H --> I[Review period]
    I --> J{More questions?}
    J -->|Yes| E
    J -->|No| K[Show results and mark party ended]

When a quiz starts, QuizWorkflow.startQuiz first runs partyAnalysisWorkflow.analyzeParty(partyId). The generated analysis is saved to the party.analysisData JSON column and then reused for every question in the quiz.

The quiz currently asks up to TOTAL_QUESTIONS = 5 questions. Each question has a 60-second answer window, followed by a 5-second review period.

Party analysis

Party analysis converts each member's listening data into comparable track, artist, and genre scores. The result is a compact party-level summary designed for fast question generation.

1. Minimum party size

If a party has fewer than 2 members, analysis is saved as empty:

  • storyClusters: []
  • pairwise: []
  • groupSummary.totalMembers
  • groupSummary.mostSharedGenres: []
  • groupSummary.mostDiverseMember: null
  • groupSummary.mostAlignedPair: null
  • memberProfiles: []

The workflow returns analyzed: false in that case.

2. Per-member scoring

For each party member, the analysis workflow fetches several Spotify-derived tables and accumulates scores into three maps:

  • tracks
  • artists
  • genres

The score inputs are:

| Source | Scoring |
| --- | --- |
| Medium-term top tracks | MAX_POSITION - position + 1, with MAX_POSITION = 50 |
| Saved tracks | track +10, artists +5, genres +2.5 |
| Playback history | track +5 if played in the last 24h, +3 if played in the last week, otherwise +1; artists get half, genres get a quarter |
| Medium-term top artists | MAX_POSITION - position + 1 |
| Followed artists | artist +10, genres +10 |
| Saved albums | album artists +5, genres +2.5 |

Top track and top artist scores are position-weighted, so rank 1 contributes more than rank 50. Saved and followed items add fixed preference signals. Playback history adds recency-weighted listening signals.
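The scoring rules above can be sketched roughly as follows. This is a minimal illustration, not the code in src/workflows/party-analysis.ts: the helper names (positionScore, playbackTrackScore, addScore) are hypothetical, positions are assumed to be 1-based, and the recency boundaries are assumed to be inclusive.

```typescript
// Hypothetical helpers; only MAX_POSITION and the point values come from the table above.
const MAX_POSITION = 50;
const DAY_MS = 24 * 60 * 60 * 1000;
const WEEK_MS = 7 * DAY_MS;

/** Position-weighted score for a top track or top artist (assumes rank 1 is the best rank). */
function positionScore(position: number): number {
  return MAX_POSITION - position + 1;
}

/** Recency-weighted track score for a single playback-history entry. */
function playbackTrackScore(playedAtMs: number, nowMs: number): number {
  const age = nowMs - playedAtMs;
  if (age <= DAY_MS) return 5; // played in the last 24h
  if (age <= WEEK_MS) return 3; // played in the last week
  return 1;
}

/** Adds `amount` to the running score stored under `id`. */
function addScore(scores: Map<string, number>, id: string, amount: number): void {
  scores.set(id, (scores.get(id) ?? 0) + amount);
}

// One member with a rank-1 top track that was also played two days ago.
const tracks = new Map<string, number>();
const artists = new Map<string, number>();
const genres = new Map<string, number>();

const now = Date.UTC(2026, 0, 8);
const played = playbackTrackScore(now - 2 * DAY_MS, now); // 3

addScore(tracks, "track:1", positionScore(1)); // 50 from the top-track rank
addScore(tracks, "track:1", played); // +3 from playback history
addScore(artists, "artist:1", played / 2); // artists get half
addScore(genres, "genre:rock", played / 4); // genres get a quarter
```

In this example the track ends up with 53 points, the artist with 1.5, and the genre with 0.75.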

3. Party-level entity maps

After all member scores are computed, the workflow builds one map per entity type:

  • TrackEntityScore
  • ArtistEntityScore
  • GenreEntityScore

Each entity contains:

  • entity id and display name
  • track artist names and album name, for tracks
  • memberScores, the list of members who contributed to the entity and their score
  • memberCount, the number of party members represented by that entity

These maps make it possible to tell which songs, artists, and genres are shared by multiple people and which are strongly associated with one person.
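A rough sketch of how per-member score maps could be merged into one party-level map. The EntityScore type here is a simplified stand-in for TrackEntityScore, ArtistEntityScore, and GenreEntityScore (it omits display names and track metadata), and buildEntityMap is a hypothetical helper name.

```typescript
type MemberScore = { userId: string; score: number };
// Simplified stand-in for the real entity types (no display name or album data).
type EntityScore = { id: string; memberScores: MemberScore[]; memberCount: number };

/** Merges per-member score maps (userId -> entityId -> score) into one entity map. */
function buildEntityMap(perMember: Map<string, Map<string, number>>): Map<string, EntityScore> {
  const entities = new Map<string, EntityScore>();
  perMember.forEach((scores, userId) => {
    scores.forEach((score, id) => {
      const entity: EntityScore = entities.get(id) ?? { id, memberScores: [], memberCount: 0 };
      entity.memberScores.push({ userId, score });
      entity.memberCount = entity.memberScores.length;
      entities.set(id, entity);
    });
  });
  return entities;
}

const trackEntities = buildEntityMap(
  new Map([
    ["alice", new Map([["track:1", 12], ["track:2", 50]])],
    ["bob", new Map([["track:1", 7]])],
  ]),
);
```

Here track:1 has a memberCount of 2 (shared by Alice and Bob), while track:2 has a memberCount of 1 and is associated only with Alice.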

4. Story clusters

Story clusters group entities by the exact subset of party members that share them.

For example, if Alice and Bob both have the same genre, that genre goes into the cluster keyed by Alice|Bob. If all party members share a track, that track goes into the all-members cluster.

Clusters are sorted by:

  1. all-members cluster first
  2. larger memberCount
  3. total track score in the cluster

Within each cluster, tracks, artists, and genres are sorted by total score descending.

The stored analysis is compacted to:

  • top 8 story clusters
  • top 20 tracks, artists, and genres per cluster
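The clustering and ordering rules can be sketched as below for tracks only. The types and the buildClusters name are hypothetical, and the sketch assumes the cluster key is the sorted member ids joined with "|" and that the cluster's total track score is computed before the per-cluster truncation.

```typescript
type MemberScore = { userId: string; score: number };
type TrackEntity = { id: string; memberScores: MemberScore[] };
type Cluster = { key: string; memberIds: string[]; tracks: TrackEntity[] };

const totalScore = (t: TrackEntity): number =>
  t.memberScores.reduce((sum, m) => sum + m.score, 0);

function buildClusters(tracks: TrackEntity[], partySize: number): Cluster[] {
  const byKey = new Map<string, Cluster>();
  for (const track of tracks) {
    // The exact subset of members sharing this track identifies its cluster.
    const memberIds = track.memberScores.map((m) => m.userId).sort();
    const key = memberIds.join("|");
    const cluster: Cluster = byKey.get(key) ?? { key, memberIds, tracks: [] };
    cluster.tracks.push(track);
    byKey.set(key, cluster);
  }

  const clusterScore = (c: Cluster): number =>
    c.tracks.reduce((sum, t) => sum + totalScore(t), 0);
  const isAll = (c: Cluster): number => (c.memberIds.length === partySize ? 1 : 0);

  return Array.from(byKey.values())
    .sort(
      (a, b) =>
        isAll(b) - isAll(a) || // all-members cluster first
        b.memberIds.length - a.memberIds.length || // then larger clusters
        clusterScore(b) - clusterScore(a), // then higher total track score
    )
    .slice(0, 8) // keep the top 8 clusters
    .map((c) => ({
      ...c,
      // Sort tracks by total score descending and keep the top 20.
      tracks: c.tracks.slice().sort((x, y) => totalScore(y) - totalScore(x)).slice(0, 20),
    }));
}

const clusters = buildClusters(
  [
    { id: "t1", memberScores: [{ userId: "alice", score: 10 }, { userId: "bob", score: 5 }] },
    { id: "t2", memberScores: [{ userId: "alice", score: 1 }, { userId: "bob", score: 1 }, { userId: "carol", score: 1 }] },
    { id: "t3", memberScores: [{ userId: "alice", score: 50 }] },
    { id: "t4", memberScores: [{ userId: "bob", score: 20 }, { userId: "alice", score: 20 }] },
  ],
  3,
);
```

With this input the clusters come out in the order alice|bob|carol, alice|bob, alice, even though the solo track t3 has the highest individual score. Inside alice|bob, t4 (total 40) is listed before t1 (total 15).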

5. Pairwise similarity

For every pair of party members, the workflow computes:

  • sharedTracks
  • sharedArtists
  • sharedGenres
  • similarity

Similarity uses a weighted Jaccard-style score across tracks, artists, and genres:

similarity = sum(min(scoreA, scoreB) for shared entities)
             / sum(max(scoreA, scoreB) for all entities in either profile)

This rewards members who share high-scoring music preferences, not just raw overlap counts.
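A minimal sketch of the formula, assuming each member's track, artist, and genre scores are combined into a single map with namespaced ids such as track:... and genre:... (the actual workflow may combine the three entity types differently). The weightedJaccard name is hypothetical.

```typescript
type Scores = Map<string, number>;

/** Weighted Jaccard: sum of per-entity minimums divided by sum of per-entity maximums. */
function weightedJaccard(a: Scores, b: Scores): number {
  const ids = Array.from(new Set(Array.from(a.keys()).concat(Array.from(b.keys()))));
  let minSum = 0;
  let maxSum = 0;
  for (const id of ids) {
    const scoreA = a.get(id) ?? 0; // an entity missing from a profile counts as 0
    const scoreB = b.get(id) ?? 0;
    minSum += Math.min(scoreA, scoreB);
    maxSum += Math.max(scoreA, scoreB);
  }
  return maxSum === 0 ? 0 : minSum / maxSum; // two empty profiles have no similarity
}

const alice: Scores = new Map([["track:1", 10], ["genre:rock", 4]]);
const bob: Scores = new Map([["track:1", 6], ["artist:9", 2]]);
const similarity = weightedJaccard(alice, bob); // (6 + 0 + 0) / (10 + 4 + 2) = 0.375
```

Only track:1 is shared, so the numerator is min(10, 6) = 6. The denominator sums the larger score for every entity in either profile: 10 + 4 + 2 = 16.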

The most similar pair becomes groupSummary.mostAlignedPair.

6. Member profiles and genre diversity

Each member profile stores:

  • userId
  • totalScore, based on track and artist scores
  • genreScores
  • trackCount
  • artistCount

Genre diversity is calculated as entropy over the member's genre score distribution:

entropy = -sum(p * ln(p))

where p is the genre score divided by the member's total score. The member with the highest entropy becomes groupSummary.mostDiverseMember.
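A small sketch of the entropy calculation. It assumes that "total score" in the formula means the sum of the member's genre scores, so the p values sum to 1; the real workflow may normalize differently. The genreEntropy name is hypothetical.

```typescript
/** Shannon entropy (natural log) over a member's genre score distribution. */
function genreEntropy(genreScores: number[]): number {
  const positive = genreScores.filter((score) => score > 0);
  const total = positive.reduce((sum, score) => sum + score, 0);
  if (total === 0) return 0; // no genre data means no diversity signal
  return positive.reduce((entropy, score) => {
    const p = score / total;
    return entropy - p * Math.log(p);
  }, 0);
}

const balanced = genreEntropy([5, 5]); // ln(2), about 0.693
const skewed = genreEntropy([9, 1]); // about 0.325
const single = genreEntropy([10]); // 0
```

A member whose listening is spread evenly across genres scores higher than one dominated by a single genre, which is why the highest-entropy member is treated as the most diverse.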

Stored member profiles keep only the top 20 genres by score.

7. Most shared genres

The workflow aggregates genre scores across members and sorts genres by:

  1. memberCount descending
  2. total genre score descending

It keeps the top 10 genres that are shared by at least 2 members as groupSummary.mostSharedGenres.
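As a sketch, the selection could look like this. GenreSummary is a hypothetical simplified shape with a precomputed totalScore; the stored GenreEntityScore carries memberScores instead.

```typescript
// Hypothetical simplified shape for illustration.
type GenreSummary = { name: string; memberCount: number; totalScore: number };

function mostSharedGenres(genres: GenreSummary[]): GenreSummary[] {
  return genres
    .filter((g) => g.memberCount >= 2) // must be shared by at least 2 members
    .sort((a, b) => b.memberCount - a.memberCount || b.totalScore - a.totalScore)
    .slice(0, 10);
}

const shared = mostSharedGenres([
  { name: "jazz", memberCount: 1, totalScore: 500 },
  { name: "rock", memberCount: 2, totalScore: 80 },
  { name: "pop", memberCount: 3, totalScore: 20 },
  { name: "indie", memberCount: 2, totalScore: 120 },
]);
```

The result is pop, indie, rock: jazz is dropped despite its high score because only one member listens to it, and member count takes priority over total score.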

Generated analysis shape

The saved party.analysisData contains:

type PartyAnalysisResult = {
  storyClusters: StoryCluster[];
  pairwise: PairwiseComparison[];
  groupSummary: {
    totalMembers: number;
    mostSharedGenres: GenreEntityScore[];
    mostDiverseMember: GenreDiversity | null;
    mostAlignedPair: PairwiseComparison | null;
  };
  memberProfiles: MemberProfile[];
};

This JSON is intentionally denormalized and compact so the question generators can work without recomputing party analytics during each round.

Question generation

Each quiz round calls generatePartyQuestion, passing:

  • database client
  • party id
  • current QuizState
  • saved party.analysisData
  • question index

The generator fetches current party members, chooses a question type order, asks each question builder for a valid question, then attaches a suitable song.

1. Question type ordering

The possible question types are:

  • audio-metadata
  • social
  • numeric

For each round, the generator randomizes their priority using base weights and recent-history penalties:

| Type | Base weight |
| --- | --- |
| audio-metadata | 1 |
| numeric | 0.55 |
| social | 0.1 |

A random value up to 0.35 is added to each base weight, and each occurrence of the same type in the last 3 questions subtracts 0.45.

This means audio metadata questions are preferred by default, but the generator avoids repeating the same category too often.
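The ordering can be sketched as follows. The function name and the injectable random parameter are hypothetical (the latter is only there to make the example deterministic); the weights, jitter, and penalty are the values listed above.

```typescript
type QuestionType = "audio-metadata" | "numeric" | "social";

const BASE_WEIGHTS: Record<QuestionType, number> = {
  "audio-metadata": 1,
  numeric: 0.55,
  social: 0.1,
};

/** Returns question types in the order the generator should try them. */
function orderQuestionTypes(
  history: QuestionType[],
  random: () => number = Math.random,
): QuestionType[] {
  const recent = history.slice(-3); // only the last 3 questions are penalized
  const types: QuestionType[] = ["audio-metadata", "numeric", "social"];
  return types
    .map((type) => {
      const repeats = recent.filter((t) => t === type).length;
      const priority = BASE_WEIGHTS[type] + random() * 0.35 - repeats * 0.45;
      return { type, priority };
    })
    .sort((a, b) => b.priority - a.priority)
    .map((entry) => entry.type);
}

const noJitter = () => 0;
const firstRound = orderQuestionTypes([], noJitter);
const afterThreeAudio = orderQuestionTypes(
  ["audio-metadata", "audio-metadata", "audio-metadata"],
  noJitter,
);
```

With the jitter fixed at 0, the first round tries audio-metadata, numeric, social. After three audio questions in a row, the audio priority drops to 1 - 3 * 0.45 = -0.35, so the order becomes numeric, social, audio-metadata. Note that a single recent audio question already lowers its priority to 0.55, level with the numeric base weight, so the random jitter decides between the two.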

2. Candidate generation

Each question builder creates multiple candidates, then pickQuestionCandidate selects one.

Candidates contain:

  • key — unique question identity
  • subjectKey — the entity being asked about, such as track:..., artist:..., genre:..., member:..., or pair:...
  • optional fairness metadata
  • the partial question object

A candidate is rejected if the quiz history already contains the same normalized:

  • question key
  • subject key
  • question text

This prevents repeated questions and repeated subjects across the quiz.
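A sketch of the rejection check. The normalization shown here (trim, lowercase, collapse whitespace) is an assumption, as are the Candidate shape and the isRepeat name; the real rules live in src/party/question-utils.ts.

```typescript
type Candidate = { key: string; subjectKey: string; text: string };

// Assumed normalization: trim, lowercase, and collapse runs of whitespace.
const normalize = (value: string): string =>
  value.trim().toLowerCase().replace(/\s+/g, " ");

/** True if any earlier question matches on key, subject, or text. */
function isRepeat(candidate: Candidate, history: Candidate[]): boolean {
  return history.some(
    (past) =>
      normalize(past.key) === normalize(candidate.key) ||
      normalize(past.subjectKey) === normalize(candidate.subjectKey) ||
      normalize(past.text) === normalize(candidate.text),
  );
}

const asked: Candidate[] = [
  { key: "track-artist:1", subjectKey: "track:1", text: 'Who performs "Song A"?' },
];

const sameSubject = isRepeat(
  { key: "track-album:1", subjectKey: "track:1", text: '"Song A" appears on which album?' },
  asked,
); // true: a different question about the same track
const fresh = isRepeat(
  { key: "track-artist:2", subjectKey: "track:2", text: 'Who performs "Song B"?' },
  asked,
); // false
```

Matching on any one of the three fields is enough to reject a candidate, so a new question type about an already-used track is still filtered out.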

3. Fairness weighting

Tracks and artists are sorted by fairness before they are used as question subjects.

Fairness is derived from an entity's memberScores:

  • memberIds — party members connected to the entity
  • memberCount — how many members are connected
  • score — total member score for shared entities

For single-member entities, the fairness score is instead the negative of that member's usage in the quiz history, so members who have already been featured more often rank lower. This prevents the quiz from repeatedly focusing on one person when only single-member subjects are available.

Question candidate weight is:

if no fairness data:
  weight = 8
else:
  weight = 8 + memberCount * 20 + clamp(score, 0, 100) / 20

Weighted random selection is then used. Shared, high-scoring entities are therefore more likely, but not guaranteed, to be selected.
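The weight formula and a standard weighted random pick can be sketched as below. The types, the pickWeighted name, and the injectable random parameter are hypothetical; pickQuestionCandidate may differ in detail.

```typescript
type Fairness = { memberCount: number; score: number };
type Weighted<T> = { item: T; fairness?: Fairness };

const clamp = (value: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, value));

function candidateWeight(fairness?: Fairness): number {
  if (!fairness) return 8;
  return 8 + fairness.memberCount * 20 + clamp(fairness.score, 0, 100) / 20;
}

/** Picks one item with probability proportional to its weight. */
function pickWeighted<T>(
  candidates: Weighted<T>[],
  random: () => number = Math.random,
): T | undefined {
  const entries = candidates.map((c) => ({ item: c.item, weight: candidateWeight(c.fairness) }));
  const total = entries.reduce((sum, e) => sum + e.weight, 0);
  let roll = random() * total;
  let last: T | undefined;
  for (const entry of entries) {
    roll -= entry.weight;
    last = entry.item;
    if (roll < 0) return entry.item;
  }
  return last; // guards against floating-point leftovers; undefined for an empty list
}

const shared3 = candidateWeight({ memberCount: 3, score: 250 }); // 8 + 60 + 5 = 73
const solo = candidateWeight({ memberCount: 1, score: -2 }); // 8 + 20 + 0 = 28
const pool: Weighted<string>[] = [
  { item: "no-fairness" },
  { item: "shared", fairness: { memberCount: 3, score: 250 } },
];
```

The clamp caps the score bonus at 5 points, so memberCount dominates the weight. A negative single-member score clamps to 0, leaving only the base and member-count terms.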

4. Audio metadata questions

buildAudioMetadataQuestion produces choice questions about party music metadata. It uses:

  • most shared genres
  • fair tracks from story clusters
  • fair artists from story clusters
  • detailed track rows from the database for album, artist, release date, and duration metadata

Examples of generated questions include:

  • What song is currently playing?
  • Which genre is shared by the most party members?
  • Which genre is ranked #<rank> in the party's shared genres?
  • Which artist is ranked highest in the shared audio data?
  • Which artist is ranked #<rank> in the shared audio data?
  • Which track is ranked highest across the party?
  • Which track is ranked #<rank> in the party analysis?
  • Which artist appears on "<album>"?
  • Which of these tracks came out first?
  • Which of these tracks came out most recently?
  • What's the longest track by <artist>?
  • Who performs "<track>"?
  • What is the name of this track by <artist>?
  • "<track>" appears on which album?

Options are built from relevant candidate pools, deduplicated, shuffled, and only emitted when there are enough valid options.

5. Social questions

buildSocialQuestion produces choice questions about players and relationships in the party.

Examples include:

  • Who is leading the quiz right now?
  • Who looks like the most diverse listener in the party?
  • Who listens the most to "<track>"?
  • Which two players share the most musical taste?

Social questions require enough party members for the question to make sense:

  • leader/diverse/top-listener questions need at least 2 members
  • most-aligned-pair questions need at least 3 members

The top-listener question prefers shared tracks when shared tracks exist, so the quiz does not unnecessarily focus on solo-only data.

6. Numeric questions

buildNumericQuestion produces numeric-answer questions. Numeric questions are scored by closeness during the quiz rather than by exact choice index.

Examples include:

  • What's the release year of <album or track>?
  • What year did "<track>" come out?
  • What year did <artist>'s first party track come out?
  • For how many players in the party is "<track>" a top track?
  • How many players in the party have "<artist>" as a favourite artist?

Release-year questions use a range around the correct year, capped at the current year and widened to a minimum span. Count questions use a range from 0 to the current party size.

7. Question timing

Every generated question is wrapped by buildQuestionWindow, which sets:

startTimestamp = Date.now();
endTimestamp = startTimestamp + 60_000;

These timestamps are used by the quiz workflow to decide when answer collection times out.

Song selection

After a question candidate is selected, selectQuestionSong chooses an audio track to attach to the question.

Song candidates come from:

  1. the song already attached by the question builder
  2. the question subject, if the subject is a track or artist
  3. people mentioned by the question, when member-specific subjects can imply a relevant song
  4. fair tracks from the story clusters
  5. top party songs queried from member top tracks

The selector avoids reusing songs from previous quiz rounds when possible by checking prior song.platform_id values.
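A minimal sketch of the freshness check, assuming the candidates are already arranged in the priority order listed above. The Song shape and the pickFreshSong name are hypothetical, and the keep-or-prefer rules for specific question types are not modeled here.

```typescript
type Song = { platform_id: string; title: string };

/** Returns the first candidate not used in an earlier round, else the first candidate. */
function pickFreshSong(candidates: Song[], usedPlatformIds: Set<string>): Song | undefined {
  const fresh = candidates.find((song) => !usedPlatformIds.has(song.platform_id));
  return fresh ?? candidates[0]; // reuse a song only when nothing fresh is left
}

const songCandidates: Song[] = [
  { platform_id: "sp:1", title: "Subject song" },
  { platform_id: "sp:2", title: "Cluster song" },
];

const picked = pickFreshSong(songCandidates, new Set(["sp:1"])); // "Cluster song"
const reused = pickFreshSong(songCandidates, new Set(["sp:1", "sp:2"])); // falls back to "Subject song"
```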

Some question types should keep or prefer their subject song:

  • Questions where hearing the exact song is necessary keep the subject song.
  • Questions where the song helps but is not mandatory prefer a relevant fresh song.
  • Other questions prefer fair, fresh, adjacent party songs so audio does not reveal the answer too directly.

Quiz response and scoring

For each question, the quiz workflow waits until all current party members answer or the question deadline is reached. Missing answers are recorded with selected: -1 and score 0.

Choice questions are scored exactly:

pointsGained = question.points if selected option index equals question.correct
pointsGained = 0 otherwise

Numeric questions are scored relatively by answer distance:

  1. Ignore no-answer responses for ranking.
  2. Compute absolute distance from the correct numeric value.
  3. Group equal distances together.
  4. Award the closest group full points and linearly decrease points for later distance groups.
  5. If all numeric answers are equally distant, only exact answers receive points.
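The steps above can be sketched as follows. The exact linear decrease is not specified here, so this sketch assumes that group i of n distance groups receives points * (n - i) / n, rounded; the null value for a missing answer and the function name are also assumptions.

```typescript
type NumericAnswer = { userId: string; value: number | null }; // null = no answer (assumed)

function scoreNumericRound(
  answers: NumericAnswer[],
  correct: number,
  points: number,
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const answer of answers) scores.set(answer.userId, 0);

  // Steps 1 and 2: drop no-answer responses, then measure absolute distance.
  const ranked = answers
    .filter((a): a is { userId: string; value: number } => a.value !== null)
    .map((a) => ({ userId: a.userId, distance: Math.abs(a.value - correct) }));

  // Step 3: one group per distinct distance, closest first.
  const groups = Array.from(new Set(ranked.map((r) => r.distance))).sort((x, y) => x - y);

  for (const r of ranked) {
    if (groups.length === 1) {
      // Step 5: everyone is equally distant, so only exact answers score.
      if (r.distance === 0) scores.set(r.userId, points);
      continue;
    }
    // Step 4: the closest group gets full points, later groups linearly less (assumed formula).
    const index = groups.indexOf(r.distance);
    scores.set(r.userId, Math.round((points * (groups.length - index)) / groups.length));
  }
  return scores;
}

const round = scoreNumericRound(
  [
    { userId: "a", value: 2000 },
    { userId: "b", value: 1998 },
    { userId: "c", value: 2005 },
    { userId: "d", value: null },
  ],
  1999,
  100,
);
const tied = scoreNumericRound(
  [
    { userId: "x", value: 2000 },
    { userId: "y", value: 1998 },
  ],
  1999,
  100,
);
```

In the first round, a and b are both 1 year off and share the closest group (100 points each), c is 6 years off (50 points), and d did not answer (0 points). In the second round both players are equally distant but neither is exact, so nobody scores.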

After each round, scores are added to quizState.scores, the quiz enters review, then continues to the next question. After the final question, the quiz status becomes results and the party status is marked ended.

Design goals

The current algorithm is optimized for:

  • Shared relevance: Prefer content that represents multiple party members.
  • Personal variety: Avoid repeatedly targeting the same member or subject.
  • Freshness: Avoid repeated question keys, subjects, text, and songs.
  • Playable trivia: Only emit questions with enough options, valid text, and usable metadata.
  • Low round latency: Do expensive aggregation once in party analysis, then use compact JSON during quiz rounds.