Accuracy scores on HLV-1K are reported at the frame level, within-event level, cross-event level, and long-term level. All scores are percentages.
| # | Model | LLM Params | Frames | Date | Frame-level (%) | Within-event-level (%) | Cross-event-level (%) | Long-term-level (%) | Overall (%) |
|---|-------|------------|--------|------|-----------------|------------------------|-----------------------|---------------------|-------------|
| 1 | LLaVA-Video | 72B | 120 | 2025-01-03 | 84.41 | 78.43 | 80.10 | 75.65 | 78.93 |
| 2 | LLaVA-OneVision | 72B | 120 | 2025-01-03 | 80.33 | 75.06 | 77.25 | 68.74 | 74.01 |
| 3 | Qwen2-VL | 72B | 120 | 2025-01-03 | 61.44 | 66.83 | 66.96 | 67.17 | 65.78 |
| 4 | Kangaroo | 8B | 120 | 2025-01-03 | 75.23 | 63.57 | 65.04 | 54.60 | 62.71 |
| 5 | Gemini 1.5 Pro | - | 120 | 2025-01-03 | 60.39 | 64.46 | 63.08 | 62.37 | 62.41 |
| 6 | LongVA | 7B | 120 | 2025-01-03 | 67.89 | 59.12 | 61.37 | 59.67 | 61.74 |
| 7 | InternVL2.5 | 8B | 120 | 2025-01-03 | 60.72 | 65.02 | 62.73 | 59.34 | 61.24 |
| 8 | GPT-4o | - | 120 | 2025-01-03 | 53.88 | 59.08 | 56.64 | 54.37 | 55.48 |
| 9 | Claude 3.5 Sonnet | - | 20 | 2025-01-03 | 26.21 | 23.98 | 27.73 | 28.89 | 27.24 |
A green date indicates a newly added or updated model; "-" in the LLM Params column indicates a closed-source model.
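Below is a minimal sketch in Python for working with these results programmatically, e.g. to re-rank models by a particular level. The data tuples simply mirror the accuracy columns of the table above; the `Entry` type and field names are illustrative, not part of the HLV-1K release. Note that the overall score is not the unweighted mean of the four per-level scores (e.g., LLaVA-Video's four levels average 79.65 vs. an overall of 78.93), presumably because the number of questions differs per level.

```python
# Illustrative only: re-rank the leaderboard entries above by any column.
from typing import NamedTuple

class Entry(NamedTuple):
    model: str
    params: str        # "-" marks closed-source models
    frame: float       # Frame-level accuracy (%)
    within: float      # Within-event-level accuracy (%)
    cross: float       # Cross-event-level accuracy (%)
    long_term: float   # Long-term-level accuracy (%)
    overall: float     # Overall accuracy (%)

ENTRIES = [
    Entry("LLaVA-Video", "72B", 84.41, 78.43, 80.10, 75.65, 78.93),
    Entry("LLaVA-OneVision", "72B", 80.33, 75.06, 77.25, 68.74, 74.01),
    Entry("Qwen2-VL", "72B", 61.44, 66.83, 66.96, 67.17, 65.78),
    Entry("Kangaroo", "8B", 75.23, 63.57, 65.04, 54.60, 62.71),
    Entry("Gemini 1.5 Pro", "-", 60.39, 64.46, 63.08, 62.37, 62.41),
    Entry("LongVA", "7B", 67.89, 59.12, 61.37, 59.67, 61.74),
    Entry("InternVL2.5", "8B", 60.72, 65.02, 62.73, 59.34, 61.24),
    Entry("GPT-4o", "-", 53.88, 59.08, 56.64, 54.37, 55.48),
    Entry("Claude 3.5 Sonnet", "-", 26.21, 23.98, 27.73, 28.89, 27.24),
]

# Rank by long-term-level accuracy instead of overall.
for rank, e in enumerate(sorted(ENTRIES, key=lambda e: e.long_term, reverse=True), 1):
    print(f"{rank}. {e.model}: long-term {e.long_term:.2f}%")
```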