
Dear Editor,
Influenza viruses cause recurrent epidemics in human populations. According to the World Health Organization (WHO), seasonal influenza viruses, i.e., human influenza A(H1N1), A(H3N2) and B viruses, infect 5% to 15% of the world's population each year, causing about 3 to 5 million cases of severe illness and about 250,000 to 500,000 deaths worldwide (WHO, 2014). Vaccination is currently the most effective way to combat influenza. However, frequent mutations in the hemagglutinin (HA) protein often change the antigenic properties of the virus, which may render influenza vaccines ineffective (Carrat and Flahault, 2007; Taubenberger and Kash, 2010).
Understanding the antigenic changes of the virus helps in developing vaccines against influenza. Currently, the hemagglutination inhibition (HI) assay is the most widely used method for determining the antigenic characteristics of influenza viruses (Hirst, 1941; Ndifon et al., 2009; Ndifon, 2011). Each HI assay produces an HI table, in which each entry H^{ij} is the HI titer of strain i against antisera raised against strain j (Figure 1A). The standard measure of the antigenic difference between virus strains i and j is the reciprocal of the normalized HI titer of i relative to antisera raised against j, rNHT^{ij} = H^{jj}/H^{ij}, or, in the reverse direction, rNHT^{ji} = H^{ii}/H^{ji} (Ndifon et al., 2009). In influenza surveillance, usually only a one-way rNHT distance (either rNHT^{ij} or rNHT^{ji}) is used to determine whether a strain is an antigenic variant. Few studies have investigated the relationship between the pairwise two-way rNHT distances, i.e., rNHT^{ij} and rNHT^{ji} (Ndifon et al., 2009). Here, by collecting a large amount of HI data for the seasonal influenza viruses A(H1N1), A(H3N2) and B (Supplementary Materials), we systematically analyzed the relationship between the pairwise two-way rNHT distances.
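As a minimal sketch of the definition above, the following Python snippet stores an HI table as a nested dictionary indexed as `hi[virus][antiserum]` and computes both one-way distances for a pair. The data layout, the function names `rnht` and `two_way_rnht`, and the titers for the hypothetical strains X and Y are illustrative assumptions, not data from this study.

```python
from typing import Dict, Tuple

# Hypothetical layout: hi[virus][antiserum] = HI titer (reciprocal of the highest inhibiting dilution).
HITable = Dict[str, Dict[str, float]]


def rnht(hi: HITable, i: str, j: str) -> float:
    """rNHT^{ij} = H^{jj} / H^{ij}: homologous titer of j over the titer of virus i against antisera to j."""
    return hi[j][j] / hi[i][j]


def two_way_rnht(hi: HITable, i: str, j: str) -> Tuple[float, float]:
    """Return both directions (rNHT^{ij}, rNHT^{ji}) for the strain pair (i, j)."""
    return rnht(hi, i, j), rnht(hi, j, i)


# Made-up titers for two hypothetical strains X and Y.
hi_table: HITable = {
    "X": {"X": 640, "Y": 80},    # virus X tested against antisera to X and to Y
    "Y": {"X": 320, "Y": 1280},  # virus Y tested against antisera to X and to Y
}

print(two_way_rnht(hi_table, "X", "Y"))  # -> (16.0, 2.0)
```

With these made-up titers, rNHT^{XY} = 1280/80 = 16 whereas rNHT^{YX} = 640/320 = 2, so the two directions for the same pair can differ several-fold.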
Figure 1. Discrepancies between the pairwise two-way rNHT distances for influenza A(H1N1), A(H3N2) and B viruses. (A) The HI table and the calculation of the pairwise two-way rNHT distances. (B) Distribution of the differences between the pairwise two-way rNHT distances for influenza A(H1N1), A(H3N2) and B viruses. (C) Spearman correlation coefficients between the differences of the pairwise two-way rNHT distances and, for each pair of viruses, the fold difference in homologous titers (Homotiter), the length of the interval between isolation times (Time period), and the numbers of amino acid mutations in the HA1 protein (HA1), antigenic epitopes, receptor-binding sites (RBS) and N-glycosylation sites (Ngly), for influenza A(H1N1), A(H3N2) and B viruses. (D) Proportion of antigenic variants in different bins of rNHT distance. The dashed box marks the bin with the largest uncertainty in determining antigenic variants.
Although the pairwise two-way rNHT distances are moderately correlated for all three (sub)types (Pearson correlation coefficient (PCC) ranging from 0.61 to 0.68; Supplementary Figure S1), there were, surprisingly, large discrepancies between them (Figure 1B). For all three (sub)types, the median difference between the pairwise two-way rNHT distances is 4, while the mean difference ranges from 11 to 12.
Generally, a strain is considered antigenically drifted relative to another strain if the rNHT distance between them is greater than or equal to 4 (WHO, 2002; Ndifon et al., 2009). Among a total of 456, 1286 and 753 pairs of two-way rNHT distances for A(H1N1), A(H3N2) and B viruses, the two distances agreed on whether the strains are antigenic variants in 72%, 59% and 70% of the pairs, respectively. For the remaining pairs, one rNHT distance is greater than or equal to 4 while the other is less than 4. In some cases, the larger rNHT distance is much larger than the smaller one (Figure 1B and Supplementary Figure S1). This suggests that caution is needed when determining antigenic variants from a one-way rNHT distance alone, as is usually done in routine influenza surveillance.
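The agreement check described above can be sketched as follows, assuming the two-way distances are already available as (rNHT^{ij}, rNHT^{ji}) tuples. The helper names and the example distances are hypothetical; only the cutoff of 4 comes from the text.

```python
from typing import Iterable, Tuple

DRIFT_CUTOFF = 4.0  # rNHT >= 4 is the conventional threshold for an antigenic variant


def is_variant(distance: float, cutoff: float = DRIFT_CUTOFF) -> bool:
    """One-way call: the pair is treated as an antigenic variant if rNHT >= cutoff."""
    return distance >= cutoff


def agreement_rate(pairs: Iterable[Tuple[float, float]], cutoff: float = DRIFT_CUTOFF) -> float:
    """Fraction of (rNHT^{ij}, rNHT^{ji}) pairs whose two one-way calls agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    agreeing = sum(is_variant(a, cutoff) == is_variant(b, cutoff) for a, b in pairs)
    return agreeing / len(pairs)


# Hypothetical two-way distances (not data from this study).
example_pairs = [(16, 2), (8, 4), (1, 2), (4, 0.5), (32, 16)]
print(agreement_rate(example_pairs))  # -> 0.6
```

Here two of the five hypothetical pairs, (16, 2) and (4, 0.5), straddle the cutoff, so their one-way calls conflict and the agreement rate is 0.6.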
We next investigated the factors underlying the differences between the pairwise two-way rNHT distances. For virus strains i and j, the fold difference in homologous titers (the HI titer of a strain against antisera raised against itself) correlated most strongly with the difference between the pairwise two-way rNHT distances in all three (sub)types (Figure 1C). To some extent, the homologous titer of a strain reflects its ability to bind red blood cells, which has been reported to significantly influence virus titers in HI assays. Therefore, the difference in homologous titers between strains i and j may partly reflect differences in their abilities to bind red blood cells, which in turn contribute to the difference between the pairwise two-way rNHT distances. This interpretation is further supported by the significantly positive correlation between the number of differences at receptor-binding sites and the difference between the pairwise two-way rNHT distances (Figure 1C).
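The correlation analysis above can be illustrated with a dependency-free Spearman coefficient, computed as the Pearson correlation of tie-averaged ranks. Two details are assumptions rather than the authors' stated procedure: the fold difference in homologous titers is taken as the larger titer divided by the smaller, and the difference between two-way distances is taken as the absolute difference. All input values are hypothetical. In practice, `scipy.stats.spearmanr` computes the same coefficient together with a p-value.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the block of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fold_difference(h_ii: float, h_jj: float) -> float:
    """Assumed definition: larger homologous titer divided by the smaller one."""
    return max(h_ii, h_jj) / min(h_ii, h_jj)


# Hypothetical pairs: homologous titers (H^{ii}, H^{jj}) and |rNHT^{ij} - rNHT^{ji}|.
homologous = [(640, 640), (1280, 320), (160, 1280), (640, 320), (320, 160)]
rnht_difference = [0, 6, 14, 3, 1]
folds = [fold_difference(a, b) for a, b in homologous]
print(spearman(folds, rnht_difference))
```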
In addition, the difference between the pairwise two-way rNHT distances for virus strains i and j correlated positively with the length of the interval between their isolation times (Figure 1C). Interestingly, for all three (sub)types of influenza virus, the rNHT distance measured with antisera raised against the earlier-isolated virus (i.e., the virus providing the homologous titer) was significantly larger than that measured with antisera raised against the later-isolated virus (Supplementary Figure S2). This suggests that more antisera raised against the earlier virus are needed to neutralize the later virus than vice versa.
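To compare the two directions chronologically, each pair can be oriented by isolation date before taking the two rNHT distances. The sketch below assumes isolation dates are stored per strain and reuses the rNHT definition given earlier; the function name `oriented_rnht`, the strain names, dates and titers are all hypothetical.

```python
from datetime import date
from typing import Dict, Tuple

HITable = Dict[str, Dict[str, float]]  # hi[virus][antiserum] = HI titer


def rnht(hi: HITable, i: str, j: str) -> float:
    """rNHT^{ij} = H^{jj} / H^{ij}."""
    return hi[j][j] / hi[i][j]


def oriented_rnht(hi: HITable, isolated: Dict[str, date], a: str, b: str) -> Tuple[float, float]:
    """Return (rNHT with antisera to the earlier virus, rNHT with antisera to the later virus)."""
    if isolated[a] == isolated[b]:
        raise ValueError("strains with the same isolation date cannot be ordered")
    early, late = (a, b) if isolated[a] < isolated[b] else (b, a)
    # Later virus measured against antisera raised against the earlier virus: rNHT^{late,early}.
    with_early_antisera = rnht(hi, late, early)
    # Earlier virus measured against antisera raised against the later virus: rNHT^{early,late}.
    with_late_antisera = rnht(hi, early, late)
    return with_early_antisera, with_late_antisera


# Hypothetical strains, isolation dates and titers.
isolation_dates = {"A": date(2009, 1, 15), "B": date(2012, 3, 1)}
hi_table: HITable = {
    "A": {"A": 640, "B": 320},
    "B": {"A": 80, "B": 640},
}
print(oriented_rnht(hi_table, isolation_dates, "B", "A"))  # -> (8.0, 2.0)
```

Collecting these oriented pairs over all strain pairs and comparing the two columns, for example with a paired non-parametric test such as the Wilcoxon signed-rank test, is one way to carry out a comparison like that in Supplementary Figure S2.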
We further analyzed whether differences in the HA1 protein between virus strains contributed to the discrepancies between the pairwise two-way rNHT distances. The numbers of amino acid mutations in HA1, in antigenic epitopes, at receptor-binding sites and at N-glycosylation sites all correlated positively with the differences between the pairwise two-way rNHT distances (Figure 1C). Moreover, changes at 60, 72 and 50 amino acid positions in HA1 for influenza A(H1N1), A(H3N2) and B viruses, respectively, also correlated significantly with these differences (Supplementary Table S2). For influenza A(H1N1) and A(H3N2) viruses, most of these positions are located in antigenic epitopes. Taken together, the differences in HA1 could explain 46%, 45% and 42% of the variance in the differences between the pairwise two-way rNHT distances for influenza A(H1N1), A(H3N2) and B viruses, respectively (Supplementary Table S3).
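The "variance explained" figures correspond to the adjusted R-squared values in Supplementary Table S3. As a sketch of how such values are obtained, the snippet below fits an ordinary least-squares model with an intercept by solving the normal equations in pure Python, then reports R-squared and adjusted R-squared, 1 - (1 - R^2)(n - 1)/(n - p - 1) for n observations and p predictors. The two predictors and all values are synthetic; the study's models use many more HA1 variables (49, 57 and 21 in Table S3), for which a library such as statsmodels (`OLS(...).fit()`, exposing `rsquared` and `rsquared_adj`) would be the practical choice.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r2(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float, float]:
    """Fit y = b0 + b1*x1 + ... + bp*xp by least squares; return (coefficients, R^2, adjusted R^2)."""
    n, p = len(y), len(X[0])
    design = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = p + 1
    # Normal equations: (D^T D) beta = D^T y
    dtd = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    dty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(dtd, dty)
    fitted = [sum(bc * v for bc, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


# Synthetic example: two hypothetical predictors (e.g. mutation counts in two HA1 regions).
X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 2]]
y = [4.5, 2.8, 8.3, 6.6, 12.4, 16.9]
coefficients, r2, adj_r2 = ols_r2(X, y)
print(round(r2, 3), round(adj_r2, 3))
```

Because the adjustment penalizes each added predictor, the adjusted value is always below R-squared whenever the fit is imperfect, which is why Table S3 reports both.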
To facilitate its use in influenza surveillance, we next evaluated the confidence of one-way rNHT distances in determining antigenic variants. The rAHM distance, which integrates both rNHT distances of a pair (Archetti et al., 1950; WHO, 2002; see Supplementary Materials), has been reported to measure antigenic differences more accurately. A pair of viruses was regarded as antigenic variants when the rAHM distance between them was greater than or equal to 4. Using this definition, we determined the proportion of antigenic variants in different bins of rNHT distance (Figure 1D). When the rNHT distance is less than 4, or greater than or equal to 8, the classification based on the rNHT distance agrees with the rAHM-based classification in most cases (mostly above 80%). In contrast, in the bin 4~8 (≥4 and <8), the rNHT distance is least reliable in determining antigenic variants for all three (sub)types. We then attempted to improve the determination of antigenic variants for rNHT distances in this bin using HA protein sequences. For influenza A(H3N2) virus, the HA sequence-based computational method PREDAC-H3 (Du et al., 2012; Peng et al., 2016) achieved a higher accuracy than the rNHT distance (0.65 vs 0.54; Supplementary Table S4).
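The binning analysis above can be sketched as follows. The rAHM formula used here, the geometric mean sqrt(rNHT^{ij} × rNHT^{ji}), equivalently sqrt(H^{ii}H^{jj}/(H^{ij}H^{ji})), is the commonly used Archetti-Horsfall form and is an assumption; the exact definition used in this study is given in the Supplementary Materials. The sketch uses only the three coarse bins named in the text (Figure 1D may use finer ones), treats each direction of a pair as a separate one-way measurement, and reports per bin how often the one-way call (rNHT ≥ 4) matches the rAHM-based call (rAHM ≥ 4). The input pairs are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CUTOFF = 4.0  # variant threshold used for both rNHT and rAHM


def rahm(rnht_ij: float, rnht_ji: float) -> float:
    """Assumed Archetti-Horsfall form: geometric mean of the two one-way rNHT distances."""
    return math.sqrt(rnht_ij * rnht_ji)


def rnht_bin(d: float) -> str:
    """Coarse bins named in the text: below 4, from 4 up to 8, and 8 or above."""
    if d < 4:
        return "<4"
    if d < 8:
        return "4-8"
    return ">=8"


def agreement_by_bin(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Per bin, the fraction of one-way calls (rNHT >= 4) that match the rAHM-based call."""
    totals: Dict[str, int] = defaultdict(int)
    matches: Dict[str, int] = defaultdict(int)
    for ij, ji in pairs:
        reference = rahm(ij, ji) >= CUTOFF
        for one_way in (ij, ji):  # each direction counts as a separate one-way measurement
            b = rnht_bin(one_way)
            totals[b] += 1
            matches[b] += int((one_way >= CUTOFF) == reference)
    return {b: matches[b] / totals[b] for b in totals}


# Hypothetical two-way distances (not data from this study).
example_pairs = [(16, 2), (8, 4), (1, 2), (4, 0.5), (32, 16)]
print(agreement_by_bin(example_pairs))  # -> {'>=8': 1.0, '<4': 0.75, '4-8': 0.5}
```

Even in this toy input the middle bin is the least reliable: a one-way distance of exactly 4 can come from a pair whose rAHM distance is either above or below the cutoff.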
Given the large discrepancies between the pairwise two-way rNHT distances in HI assays, determining antigenic variants from a one-way rNHT distance may be misleading. A more reliable approach is to measure both rNHT distances for each pair of viruses. Because HI assays require substantial experimental effort, an alternative is to improve the quality of HI data with computational methods. For example, the methods developed by Ndifon can recover unmeasured HI data from measured data and minimize noise and non-antigenic variation in HI data (Ndifon, 2011). In addition, since influenza viruses are reported to evolve at the population level, population-based methods, such as antigenic maps, antigenic clusters or antigenic cartography (Smith et al., 2004; Barnett et al., 2012; Du et al., 2012; Peng et al., 2016), may improve the confidence of individual one-way rNHT distances. This is particularly useful when determining the antigenic relationships among large numbers of viruses. Finally, recently developed sequence-based computational methods (Liao et al., 2008; Deem and Pan, 2009; Du et al., 2012), such as PREDAC-H3 mentioned above (Supplementary Table S4), could also help determine the antigenic relationships between viruses when two-way rNHT distances are unavailable.

Figure S1. The pairwise two-way rNHT distances for influenza A(H1N1) (A), A(H3N2) (B) and B (C) viruses. For each pair of rNHT distances, the smaller one is plotted on the y-axis and the larger one on the x-axis. PCC and SCC refer to the Pearson and Spearman correlation coefficients, respectively, between the two-way rNHT distances. Pairs of rNHT distances in the gray rectangular region are those that disagree in determining antigenic variants at the cutoff of 4.
Figure S2. Comparison of the pairwise two-way rNHT distances according to the isolation order of the viruses used to raise the antisera, for influenza A(H1N1), A(H3N2) and B viruses. ***, p-value < 0.0001.
Part I: Influenza A(H1N1) virus
Feature | Epitope | PCC | p-value | SCC | p-value
HA1 | - | 0.29 | 3.30E-08 | 0.27 | 2.74E-07
RBS | - | 0.29 | 4.67E-08 | 0.34 | 1.83E-10
Epitope A | - | 0.25 | 3.98E-06 | 0.2 | 1.54E-04
Epitope B | - | 0.27 | 3.47E-07 | 0.29 | 3.25E-08
Epitope C | - | 0.24 | 5.92E-06 | 0.22 | 4.48E-05
Epitope D | - | 0.13 | 1.36E-02 | 0.14 | 1.02E-02
Epitope E | - | 0.22 | 3.61E-05 | 0.09 | 1.12E-01
Epitopes | - | 0.31 | 6.81E-09 | 0.27 | 4.48E-07
Non-epitope | - | 0.19 | 4.19E-04 | 0.18 | 8.17E-04
Ngly | - | 0.29 | 4.51E-08 | 0.26 | 1.86E-06
S43 | C | 0.33 | 2.74E-10 | 0.34 | 1.45E-10
S47 | E | 0.23 | 1.63E-05 | 0.15 | 6.33E-03
S54 | E | 0.23 | 1.70E-05 | 0.21 | 6.61E-05
S56 | E | 0.11 | 3.60E-02 | 0.13 | 1.93E-02
S61 | O | 0.11 | 4.13E-02 | 0.11 | 4.27E-02
S66 | E | 0.15 | 4.32E-03 | 0.11 | 4.30E-02
S71 | E | 0.26 | 1.56E-06 | 0.26 | 1.52E-06
S73 | E | 0.12 | 2.23E-02 | 0.16 | 3.69E-03
S74 | E | 0.14 | 1.21E-02 | 0.15 | 6.46E-03
S80 | E | 0.26 | 1.69E-06 | 0.23 | 1.72E-05
S82 | E | 0.13 | 1.69E-02 | 0.17 | 2.02E-03
S84 | E | 0.11 | 4.03E-02 | 0.14 | 7.91E-03
S85 | E | 0.12 | 3.11E-02 | 0.11 | 3.77E-02
S89 | D | 0.11 | 3.56E-02 | 0.07 | 1.96E-01
S121 | A | 0.14 | 8.78E-03 | 0.19 | 4.15E-04
S125 | B | 0.33 | 5.17E-10 | 0.35 | 3.28E-11
S127 | A | 0.21 | 1.11E-04 | 0.21 | 1.40E-04
S129 | A | 0.18 | 7.78E-04 | 0.14 | 7.54E-03
S130 | O | 0.24 | 6.93E-06 | 0.23 | 2.26E-05
S133 | A | 0.21 | 1.09E-04 | 0.23 | 1.68E-05
S134 | A | 0.15 | 4.33E-03 | 0.11 | 4.30E-02
S135 | O | 0.18 | 9.94E-04 | 0.17 | 1.72E-03
S137 | O | 0.13 | 1.56E-02 | 0.07 | 1.98E-01
S138 | A | 0.18 | 8.44E-04 | 0.17 | 1.73E-03
S139 | A | 0.19 | 3.97E-04 | 0.19 | 5.89E-04
S140 | A | 0.13 | 1.56E-02 | 0.07 | 1.98E-01
S141 | A | 0.14 | 1.12E-02 | 0.16 | 2.90E-03
S146 | A | 0.02 | 6.93E-01 | 0.11 | 4.19E-02
S149 | O | 0.11 | 4.13E-02 | 0.11 | 4.27E-02
S153 | B | 0.16 | 3.99E-03 | 0.16 | 3.16E-03
S157 | O | 0.12 | 3.11E-02 | 0.11 | 3.77E-02
S162 | D | 0.18 | 1.09E-03 | 0.09 | 9.12E-02
S163 | D | 0.12 | 3.43E-02 | 0.17 | 1.87E-03
S166 | D | 0.17 | 2.13E-03 | 0.13 | 1.74E-02
S168 | D | 0.19 | 3.41E-04 | 0.1 | 5.81E-02
S169 | D | 0.11 | 4.13E-02 | 0.11 | 4.27E-02
S183 | B | 0.19 | 5.07E-04 | 0.2 | 2.66E-04
S186 | B | 0.24 | 5.82E-06 | 0.23 | 2.86E-05
S189 | B | 0.15 | 6.16E-03 | 0.16 | 3.01E-03
S191 | O | 0.22 | 4.62E-05 | 0.24 | 7.17E-06
S193 | B | 0.21 | 1.38E-04 | 0.22 | 3.58E-05
S195 | B | 0.11 | 4.13E-02 | 0.11 | 4.27E-02
S205 | D | 0.14 | 8.78E-03 | 0.19 | 4.15E-04
S207 | D | 0.14 | 1.12E-02 | 0.14 | 1.06E-02
S209 | D | 0.09 | 1.18E-01 | 0.12 | 2.40E-02
S216 | O | 0.18 | 6.80E-04 | 0.23 | 1.74E-05
S222 | D | 0.16 | 2.68E-03 | 0.19 | 3.58E-04
S224 | D | 0.17 | 1.75E-03 | 0.21 | 1.35E-04
S227 | O | 0.11 | 4.13E-02 | 0.11 | 4.27E-02
S245 | O | 0.11 | 4.13E-02 | 0.11 | 4.27E-02
S252 | A | 0.17 | 2.19E-03 | 0.12 | 2.17E-02
S255 | O | 0.07 | 2.20E-01 | 0.11 | 4.98E-02
S258 | E | 0.13 | 1.49E-02 | 0.13 | 1.77E-02
S261 | O | 0.18 | 7.29E-04 | 0.15 | 5.41E-03
S267 | O | 0.1 | 7.79E-02 | 0.12 | 3.13E-02
S271 | C | 0.28 | 1.57E-07 | 0.26 | 1.41E-06
S273 | C | 0.15 | 7.00E-03 | 0.16 | 3.25E-03
S277 | C | 0.13 | 2.13E-02 | 0.17 | 1.95E-03
S295 | O | 0.11 | 3.60E-02 | 0.13 | 1.93E-02
S298 | O | 0.18 | 1.09E-03 | 0.09 | 9.12E-02
Part II: Influenza A(H3N2) virus
Feature | Epitope | PCC | p-value | SCC | p-value
HA1 | - | 0.3 | 9.10E-15 | 0.32 | 1.29E-16
RBS | - | 0.15 | 8.09E-05 | 0.13 | 5.84E-04
Epitope A | - | 0.26 | 1.37E-11 | 0.26 | 3.27E-11
Epitope B | - | 0.27 | 1.01E-12 | 0.25 | 4.37E-11
Epitope C | - | 0.22 | 7.96E-09 | 0.23 | 1.72E-09
Epitope D | - | 0.21 | 5.45E-08 | 0.26 | 1.17E-11
Epitope E | - | 0.28 | 1.76E-13 | 0.28 | 1.67E-13
Epitopes | - | 0.3 | 6.88E-15 | 0.35 | 2.98E-20
Non-epitopes | - | 0.11 | 6.57E-03 | 0.07 | 5.90E-02
Ngly | - | 0.21 | 7.65E-08 | 0.15 | 8.50E-05
S2 | O | 0.06 | 1.08E-01 | 0.08 | 4.56E-02
S3 | O | 0.18 | 3.18E-06 | 0.14 | 5.47E-04
S9 | O | 0.19 | 8.46E-07 | 0.14 | 2.09E-04
S10 | O | 0.12 | 1.75E-03 | 0.11 | 3.98E-03
S25 | O | 0.19 | 1.94E-06 | 0.14 | 2.38E-04
S31 | O | 0.14 | 2.59E-04 | 0.13 | 6.86E-04
S34 | O | 0.12 | 2.54E-03 | 0.11 | 5.67E-03
S50 | C | 0.24 | 7.15E-10 | 0.16 | 3.17E-05
S53 | C | 0.12 | 2.42E-03 | 0.12 | 2.61E-03
S54 | C | 0.12 | 1.89E-03 | 0.1 | 9.19E-03
S62 | E | 0.24 | 1.14E-09 | 0.2 | 3.47E-07
S63 | E | 0.11 | 3.81E-03 | 0.1 | 1.36E-02
S67 | E | 0.1 | 8.91E-03 | 0.11 | 4.64E-03
S75 | E | 0.16 | 3.79E-05 | 0.15 | 1.97E-04
S78 | E | 0.08 | 3.11E-02 | 0.09 | 2.74E-02
S82 | E | 0.21 | 6.88E-08 | 0.14 | 2.43E-04
S83 | E | 0.23 | 3.72E-09 | 0.19 | 1.62E-06
S112 | O | 0.08 | 4.16E-02 | 0.05 | 2.06E-01
S121 | D | 0.13 | 8.61E-04 | 0.15 | 1.40E-04
S124 | A | 0.1 | 7.99E-03 | 0.1 | 8.76E-03
S126 | A | 0.14 | 2.07E-04 | 0.13 | 6.59E-04
S129 | B | 0.12 | 1.75E-03 | 0.11 | 3.98E-03
S131 | A | 0.18 | 2.78E-06 | 0.15 | 8.85E-05
S132 | A | 0.12 | 1.75E-03 | 0.11 | 3.98E-03
S133 | A | 0.16 | 3.50E-05 | 0.14 | 2.69E-04
S135 | A | 0.25 | 1.73E-10 | 0.22 | 8.44E-09
S137 | A | 0.22 | 3.00E-08 | 0.15 | 8.83E-05
S138 | A | 0.1 | 7.91E-03 | 0.08 | 3.98E-02
S142 | A | 0.09 | 2.12E-02 | 0.05 | 1.87E-01
S143 | A | 0.09 | 1.93E-02 | 0.07 | 8.26E-02
S144 | A | 0.15 | 1.40E-04 | 0.11 | 4.65E-03
S145 | A | 0.16 | 2.92E-05 | 0.19 | 1.59E-06
S146 | A | 0.09 | 2.18E-02 | 0.07 | 5.87E-02
S148 | O | 0.12 | 2.74E-03 | 0.11 | 4.81E-03
S155 | B | 0.2 | 4.31E-07 | 0.17 | 1.02E-05
S156 | B | 0.18 | 4.48E-06 | 0.16 | 6.63E-05
S157 | B | 0.15 | 2.03E-04 | 0.11 | 4.59E-03
S158 | B | 0.35 | 0.00E+00 | 0.26 | 6.98E-12
S159 | B | 0.07 | 5.89E-02 | 0.1 | 8.76E-03
S160 | B | 0.09 | 1.74E-02 | 0.07 | 8.89E-02
S164 | B | 0.19 | 7.42E-07 | 0.17 | 1.75E-05
S173 | D | 0.09 | 1.79E-02 | 0.08 | 5.37E-02
S174 | D | 0.2 | 2.32E-07 | 0.19 | 1.68E-06
S188 | B | 0.08 | 5.49E-02 | 0.08 | 3.66E-02
S189 | B | 0.32 | 0.00E+00 | 0.27 | 2.11E-12
S190 | B | 0.06 | 1.21E-01 | 0.1 | 1.06E-02
S193 | B | 0.07 | 9.29E-02 | 0.09 | 2.71E-02
S197 | B | 0.12 | 1.49E-03 | 0.09 | 1.66E-02
S201 | D | 0.11 | 3.80E-03 | 0.15 | 1.13E-04
S202 | O | 0.13 | 8.44E-04 | 0.1 | 1.21E-02
S207 | D | 0.09 | 1.66E-02 | 0.1 | 7.59E-03
S208 | D | 0.17 | 2.00E-05 | 0.12 | 1.40E-03
S209 | D | 0.09 | 1.62E-02 | 0.09 | 2.70E-02
S213 | D | 0.15 | 9.22E-05 | 0.16 | 4.24E-05
S217 | D | 0.12 | 1.92E-03 | 0.12 | 2.06E-03
S222 | O | 0.13 | 1.23E-03 | 0.09 | 2.42E-02
S223 | O | 0.07 | 9.09E-02 | 0.08 | 4.47E-02
S229 | D | 0.09 | 2.42E-02 | 0.11 | 5.50E-03
S230 | D | 0.13 | 1.29E-03 | 0.13 | 5.91E-04
S233 | O | 0.13 | 1.27E-03 | 0.07 | 6.47E-02
S240 | D | 0.1 | 9.90E-03 | 0.11 | 5.17E-03
S244 | D | 0.08 | 4.57E-02 | 0.08 | 4.21E-02
S246 | D | 0.09 | 2.18E-02 | 0.07 | 9.35E-02
S260 | E | 0.2 | 1.46E-07 | 0.14 | 5.02E-04
S262 | E | 0.14 | 3.73E-04 | 0.12 | 1.46E-03
S271 | O | 0.05 | 2.27E-01 | 0.09 | 2.50E-02
S275 | C | 0.12 | 2.94E-03 | 0.12 | 1.54E-03
S276 | C | 0.2 | 3.60E-07 | 0.17 | 7.46E-06
S278 | C | 0.15 | 1.02E-04 | 0.15 | 1.09E-04
S299 | C | 0.09 | 2.82E-02 | 0.09 | 2.91E-02
S308 | C | 0.29 | 5.06E-14 | 0.17 | 1.73E-05
S327 | O | 0.06 | 1.05E-01 | 0.08 | 4.49E-02
Part III: Influenza B virus
Feature | Epitope | PCC | p-value | SCC | p-value
HA1 | - | 0.38 | 4.44E-16 | 0.26 | 1.14E-07
RBS | - | 0.44 | 0.00E+00 | 0.29 | 1.26E-09
Epitope A | - | 0.33 | 5.43E-12 | 0.2 | 4.46E-05
Epitope B | - | 0.4 | 0.00E+00 | 0.3 | 6.60E-10
Epitope C | - | 0.28 | 1.16E-08 | 0.24 | 8.75E-07
Epitope D | - | 0.42 | 0.00E+00 | 0.28 | 5.24E-09
Epitopes | - | 0.41 | 0.00E+00 | 0.28 | 4.81E-09
Non-epitopes | - | 0.33 | 3.93E-12 | 0.22 | 5.71E-06
Ngly | - | 0.27 | 3.52E-08 | 0.22 | 1.01E-05
S29 | O | 0.09 | 6.25E-02 | 0.11 | 2.10E-02
S38 | O | 0.42 | 0.00E+00 | 0.15 | 2.49E-03
S40 | O | 0.1 | 4.50E-02 | 0.03 | 5.92E-01
S45 | O | 0.01 | 7.95E-01 | 0.1 | 4.33E-02
S56 | O | 0.23 | 2.14E-06 | 0.16 | 1.57E-03
S58 | O | 0.12 | 1.55E-02 | 0.03 | 5.50E-01
S71 | O | 0.4 | 0.00E+00 | 0.27 | 3.04E-08
S73 | O | 0.28 | 8.53E-09 | 0.22 | 5.02E-06
S75 | O | 0.22 | 6.67E-06 | 0.16 | 1.41E-03
S76 | O | 0.27 | 1.73E-08 | 0.2 | 4.10E-05
S80 | O | 0.12 | 1.47E-02 | 0.06 | 2.36E-01
S81 | O | 0.38 | 8.88E-16 | 0.26 | 1.13E-07
S87 | O | 0.09 | 6.88E-02 | 0.14 | 5.01E-03
S116 | A | 0.21 | 2.54E-05 | 0.15 | 2.58E-03
S121 | A | 0.14 | 3.24E-03 | 0.08 | 1.17E-01
S122 | A | 0.36 | 3.24E-14 | 0.25 | 4.43E-07
S126 | A | 0.15 | 2.61E-03 | 0.07 | 1.71E-01
S136 | A | 0.41 | 0.00E+00 | 0.27 | 2.25E-08
S137 | A | 0.32 | 2.25E-11 | 0.25 | 3.02E-07
S139 | O | 0.26 | 8.70E-08 | 0.1 | 4.04E-02
S146 | B | 0.24 | 5.67E-07 | 0.17 | 3.98E-04
S147 | B | 0.37 | 3.33E-15 | 0.15 | 1.71E-03
S148 | B | 0.36 | 2.40E-14 | 0.25 | 3.05E-07
S149 | B | 0.37 | 5.77E-15 | 0.25 | 2.04E-07
S150 | B | 0.36 | 6.71E-14 | 0.3 | 3.14E-10
S154 | O | 0.35 | 2.24E-13 | 0.16 | 8.82E-04
S159 | O | 0.37 | 3.33E-15 | 0.15 | 1.71E-03
S160 | O | 0.37 | 3.33E-15 | 0.15 | 1.71E-03
S162 | C | 0.13 | 6.40E-03 | 0.1 | 4.07E-02
S163 | C | 0.04 | 4.08E-01 | 0.1 | 3.59E-02
S165 | C | 0.17 | 6.01E-04 | 0.08 | 1.07E-01
S167 | C | 0.33 | 5.31E-12 | 0.15 | 2.02E-03
S169 | O | 0.25 | 1.55E-07 | 0.22 | 5.51E-06
S170 | O | 0.13 | 7.75E-03 | 0 | 9.52E-01
S172 | O | 0.28 | 4.26E-09 | 0.21 | 1.98E-05
S179 | O | 0.19 | 1.43E-04 | 0.1 | 4.09E-02
S180 | O | 0.14 | 3.50E-03 | 0.08 | 9.02E-02
S194 | D | 0.14 | 4.78E-03 | 0.15 | 2.83E-03
S195 | D | 0.4 | 0.00E+00 | 0.27 | 2.87E-08
S199 | D | 0.39 | 0.00E+00 | 0.27 | 2.04E-08
S200 | D | 0.42 | 0.00E+00 | 0.26 | 7.43E-08
S206 | O | 0.44 | 0.00E+00 | 0.27 | 1.38E-08
S227 | O | 0.12 | 1.81E-02 | 0.11 | 2.73E-02
S230 | O | 0.23 | 3.21E-06 | 0.12 | 1.13E-02
S238 | O | 0.36 | 3.22E-14 | 0.14 | 3.35E-03
S252 | O | 0.35 | 5.48E-13 | 0.25 | 2.50E-07
S259 | O | 0.37 | 4.66E-15 | 0.26 | 1.04E-07
S264 | O | 0.27 | 1.38E-08 | 0.19 | 1.52E-04
S276 | O | 0.03 | 5.40E-01 | 0.1 | 4.55E-02
S296 | O | 0.2 | 2.71E-05 | 0.12 | 1.41E-02
Table S2.
Statistic | H1N1 | H3N2 | B
Number of variables | 49 | 57 | 21
p-value | < 2.2e-16 | < 2.2e-16 | < 2.2e-16
Multiple R-squared | 0.54 | 0.5 | 0.45
Adjusted R-squared | 0.46 | 0.45 | 0.42
Table S3. Summary of the least-squares linear models for influenza A(H1N1), A(H3N2) and B viruses that use the changes in the HA1 protein (see Table S2) as explanatory variables to explain the variability in the differences between the pairwise two-way rNHT distances.
Method | Accuracy | Sensitivity | Specificity
PREDAC-H3 | 0.65 | 0.51 | 0.81
Table S4. Performance of PREDAC-H3 in determining the antigenic relationships of viral pairs with rNHT distance in the bin 4~8 (≥4 and <8) for influenza A(H3N2) virus.
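For reference, the metrics in Table S4 follow the standard confusion-matrix definitions. A minimal sketch, assuming the rAHM-based calls (rAHM ≥ 4) serve as the reference labels and `True` denotes an antigenic variant; the function name and the example calls are hypothetical:

```python
from typing import Dict, Sequence


def classification_metrics(predicted: Sequence[bool], reference: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity of predicted variant calls (True = antigenic variant)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # variant predicted, variant in reference
    tn = sum(1 for p, r in pairs if not p and not r)  # similar predicted, similar in reference
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # TP / (TP + FN)
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # TN / (TN + FP)
    }


# Hypothetical calls: predicted by a sequence-based method vs. rAHM-based reference labels.
predicted = [True, True, False, False, True, False, False, True]
reference = [True, False, False, True, True, False, False, False]
print(classification_metrics(predicted, reference))
# -> {'accuracy': 0.625, 'sensitivity': 0.6666666666666666, 'specificity': 0.6}
```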