Hacker News — vinext + Cloudflare Workers

Show HN: Marlin-2B: a tiny VLM to extract structured information from videos (huggingface.co)

7 points by HappyPablo 1 day ago | 2 comments

teamcubitflow 1 day ago

I'm surprised that kind of captioning came from a 2B model; glad the fine-tuning process actually shows a deliberate approach to making Qwen 3.5 into essentially a new model of its kind.

HappyPablo 1 day ago

Hey, this is Shubham. Yeah, Qwen3.5VL is awesome and its training vocab is quite strong, so with the right data curation you can probably take it into a bunch of other narrow tasks. E.g., we're trying to fine-tune it to use SAM3 in a loop for segmentation tasks in videos.
