transpose(1, 2) turns (bsz, seq_len, num_q_heads, head_dim) into [bsz, n_q_head, seq_len, head_dim]: the sequence and head dimensions swap places.
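The sketch below shows the shape change. It uses NumPy's `np.swapaxes(x, 1, 2)`, which swaps the same two axes as PyTorch's `x.transpose(1, 2)`. The concrete sizes and the `k` tensor are hypothetical. The last step assumes the usual reason for this layout: with heads moved to axis 1, a batched matmul over the last two axes gives one seq_len × seq_len score matrix per head.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
bsz, seq_len, num_q_heads, head_dim = 2, 5, 4, 8

# Q split into heads: (bsz, seq_len, num_q_heads, head_dim)
q = np.arange(bsz * seq_len * num_q_heads * head_dim, dtype=np.float32).reshape(
    bsz, seq_len, num_q_heads, head_dim
)

# Swap axes 1 and 2, the NumPy counterpart of PyTorch's q.transpose(1, 2)
q_t = np.swapaxes(q, 1, 2)  # (bsz, num_q_heads, seq_len, head_dim)

# Element (b, s, h, d) now sits at (b, h, s, d)
assert q_t[1, 3, 2, 7] == q[1, 2, 3, 7]

# Assumed use: K with the same layout (hypothetical data)
k_t = np.swapaxes(np.ones_like(q), 1, 2)

# matmul broadcasts over (bsz, heads) and multiplies the last two axes,
# giving per-head scores of shape (bsz, num_q_heads, seq_len, seq_len)
scores = q_t @ np.swapaxes(k_t, -1, -2)

print(q.shape)       # (2, 5, 4, 8)
print(q_t.shape)     # (2, 4, 5, 8)
print(scores.shape)  # (2, 4, 5, 5)
```

Both `np.swapaxes` and PyTorch's `transpose` return a view with permuted strides rather than copying data. That is why PyTorch code often calls `.contiguous()` before a later `view`.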