Gene regulatory networks (GRNs) are essential for understanding cell fate decisions and disease mechanisms, yet cross-species GRN inference from single-cell RNA-seq data remains challenging because of noise, sparsity, and cross-species distribution shifts. We propose GP-DHT (GenePair DualHeadTransformer), a cross-species single-cell GRN inference framework that models genes and cells in a heterogeneous graph with multi-level expression relations and learns structured regulatory representations via multi-relational graph attention. A dual-head Transformer then captures local gene-pair regulatory dependencies and global cross-cell interaction patterns. To improve robustness under sparse and cross-species settings, GP-DHT introduces gene-pair-level supervised contrastive learning. Experiments on seven BEELINE benchmark datasets show consistent gains over representative baselines, with AUROC and AUPRC improving by approximately 5 to 7 percent on most datasets. GP-DHT also recovers known regulatory modules and helps distinguish conserved from species-specific regulatory interactions.
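
As an illustration of the gene-pair-level supervised contrastive objective mentioned above, the following is a minimal sketch of a generic supervised contrastive loss applied to gene-pair embeddings. It is not the GP-DHT implementation: the function name `supcon_loss`, the binary labels (1 for a regulatory pair, 0 for a non-regulatory pair), the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over gene-pair embeddings.

    embeddings: list of equal-length float vectors, one per gene pair
                (hypothetical; in practice these would come from the model).
    labels:     list of class labels, assumed 1 = regulatory, 0 = non-regulatory.
    temperature: softmax temperature (assumed value, not taken from the paper).
    """
    # L2-normalise each embedding so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, n_anchors = 0.0, 0
    for i in range(n):
        # Positives are the other gene pairs that share the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }

        # Numerically stable log-sum-exp for the softmax denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )

        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        n_anchors += 1

    return total / n_anchors if n_anchors else 0.0


# Toy gene-pair embeddings: the first two point one way, the last two another.
pairs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(pairs, [1, 1, 0, 0])     # labels match the geometry
misaligned = supcon_loss(pairs, [1, 0, 1, 0])  # labels cut across the geometry
print(aligned < misaligned)
```

In this sketch, the loss is small when gene pairs with the same label lie close together in embedding space and large when they do not, which is the property such an objective relies on when labeled regulatory pairs are sparse.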