Abstract: In visual tracking tasks, identifying repeatedly appearing targets is a significant challenge. Existing algorithms often use network structures such as CNNs and ViTs to extract the ...