This substitution model was determined to be the most appropriate by ModelTest [22]. ML bootstrap support was calculated from 100 replicates.

Multilocus sequence analysis
For each locus, each distinct allele was assigned an arbitrary number using a nonredundant database program available at http://www.pubmlst.org. The combination of allele numbers for each isolate defined its sequence type (ST). Allele profiles were analyzed using eBURST v3 software [23] to determine the clonal complexes (CCs), defined as sets of related strains that share at least 5 identical alleles at the 7 loci. A complementary eBURST analysis was conducted to determine the CCs sharing at least 4 identical alleles at the 7 loci. The program LIAN 3.5 [24], available at http://www.pubmlst.org, was used to calculate the standardized index of association (sIA) to test the null hypothesis of linkage equilibrium, the mean genetic diversity (H) and the genetic diversity at each locus (h). The numbers of synonymous (dS) and non-synonymous (dN) substitutions per site were determined from codon-aligned sequences using Sequence Type Analysis and Recombinational Tests Version 2 (START2) software [25]. Other genetic analyses, including the determination of allele and allelic profile frequencies, mol% G + C content and polymorphic site numbering, were also carried out using START2 software. A distance matrix in nexus format was generated from the set of allelic profiles and then used for decomposition analyses with SplitsTree 4.0 software [26]. Recombination events were detected from the aligned ST concatenated sequences using the RDP v3.44 [27] software package with the following parameters: general (linear sequence, highest P value of 0.05, Bonferroni correction); RDP (no reference, window size of 8 polymorphic sites, 0-100% sequence identity range); GENECONV (scan triplets, G-scale of 1); Bootscan (window size of 200 bp, step size of 20 bp, 70% cutoff, F84 model, 100 bootstrap replicates, binomial P value); MaxChi (scan triplets, fraction of variable sites per window set to 0.1); Chimaera (scan triplets, fraction of variable sites per window set to 0.1); and SiScan (window of 200 bp, step size of 20 bp, use 1/2/3 variable positions, nearest outlier for the 4th sequence, 1000 P value permutations, 100 scan permutations).

Other statistics
All qualitative variables, with the exception of the sIA, were compared using a Chi-squared test or Fisher’s exact test where appropriate; a P value ≤0.05 was considered significant. All computations were performed using R software (http://www.r-project.org).

Phylotaxonomics
The population structure was inferred from multilocus phylogenetic analysis (MLPA) following reconstruction of the distance and ML trees from the concatenated sequences (alignment length of 3993 nt).
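The allele-based definitions used in the multilocus sequence analysis (STs as unique allelic profiles, CCs as groups of STs sharing at least 5 or 4 of the 7 alleles, and per-locus diversity h) can be illustrated with a minimal Python sketch. It is not the eBURST, LIAN or START2 implementation: the function names and the toy profiles are hypothetical, the grouping assumes the single-linkage rule commonly used to describe eBURST groups (each member shares the required number of alleles with at least one other member), and h is assumed to be Nei's unbiased gene diversity.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: isolate -> allele numbers at the 7 loci.
toy_profiles = {
    "iso1": (1, 1, 1, 1, 1, 1, 1),
    "iso2": (1, 1, 1, 1, 1, 1, 1),
    "iso3": (1, 1, 1, 1, 1, 2, 2),
    "iso4": (1, 1, 1, 1, 3, 2, 3),
    "iso5": (1, 1, 1, 1, 5, 5, 5),
}

def assign_sequence_types(profiles):
    """Give each distinct allelic profile an arbitrary ST number (1, 2, ...)."""
    st_by_profile = {}
    st_by_isolate = {}
    for isolate, profile in profiles.items():
        key = tuple(profile)
        if key not in st_by_profile:
            st_by_profile[key] = len(st_by_profile) + 1
        st_by_isolate[isolate] = st_by_profile[key]
    profile_by_st = {st: prof for prof, st in st_by_profile.items()}
    return st_by_isolate, profile_by_st

def shared_alleles(p, q):
    """Count loci at which two profiles carry the same allele number."""
    return sum(a == b for a, b in zip(p, q))

def clonal_complexes(profile_by_st, min_shared=5):
    """Single-linkage grouping of STs (assumed rule, see lead-in).

    Singleton STs are returned as one-member groups.
    """
    parent = {st: st for st in profile_by_st}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(profile_by_st, 2):
        if shared_alleles(profile_by_st[a], profile_by_st[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for st in profile_by_st:
        groups[find(st)].append(st)
    return sorted(sorted(members) for members in groups.values())

def gene_diversity(alleles):
    """Nei's unbiased gene diversity: h = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_sq = sum((count / n) ** 2 for count in Counter(alleles).values())
    return n / (n - 1) * (1 - sum_sq)

if __name__ == "__main__":
    st_by_isolate, profile_by_st = assign_sequence_types(toy_profiles)
    print(st_by_isolate)
    print(clonal_complexes(profile_by_st, min_shared=5))
    print(clonal_complexes(profile_by_st, min_shared=4))
    # Mean diversity H is the average of h over the 7 loci.
    h_values = [gene_diversity([p[i] for p in toy_profiles.values()])
                for i in range(7)]
    print(sum(h_values) / len(h_values))
```

Running the grouping at both thresholds on the same profiles mirrors the main and complementary analyses in the text: relaxing the criterion from 5 to 4 shared alleles can only merge groups, never split them.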