Improve client identification based on BEP 10
Review Request #130077 - Created April 10, 2017 and updated
Identify clients more accurately using the 'v' string embedded in extended handshake messages as described by BitTorrent Enhancement Proposal 10 'Extension Protocol' (BEP 10).
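For readers unfamiliar with BEP 10: the extended handshake payload is a bencoded dictionary, and 'v' is its optional top-level key holding the client name and version as a UTF-8 string. The following is a minimal sketch of pulling that value out of a raw payload. It is not the code from this patch: the names readString, skipValue and extractClientVersion are hypothetical, and plain C++17 is used instead of the project's own bencode classes so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: parses a bencoded string "<len>:<bytes>" at pos
// and advances pos past it. Returns nullopt on malformed or truncated input.
static std::optional<std::string> readString(const std::string &in, std::size_t &pos)
{
    std::size_t len = 0;
    std::size_t digits = 0;
    while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
        len = len * 10 + static_cast<std::size_t>(in[pos] - '0');
        ++pos;
        ++digits;
        if (len > in.size())
            return std::nullopt; // declared length cannot exceed the input
    }
    if (digits == 0 || pos >= in.size() || in[pos] != ':')
        return std::nullopt;
    ++pos; // consume ':'
    if (len > in.size() - pos)
        return std::nullopt; // truncated string
    std::string out = in.substr(pos, len);
    pos += len;
    return out;
}

// Hypothetical helper: skips one bencoded value (integer, string, list or dict).
static bool skipValue(const std::string &in, std::size_t &pos, int depth = 0)
{
    if (pos >= in.size() || depth > 32)
        return false;
    const char c = in[pos];
    if (c == 'i') { // integer: i<digits>e
        const std::size_t end = in.find('e', pos);
        if (end == std::string::npos)
            return false;
        pos = end + 1;
        return true;
    }
    if (c == 'l' || c == 'd') { // list or dict: skip members until 'e'
        ++pos;
        while (pos < in.size() && in[pos] != 'e') {
            if (!skipValue(in, pos, depth + 1))
                return false;
        }
        if (pos >= in.size())
            return false;
        ++pos; // consume 'e'
        return true;
    }
    return readString(in, pos).has_value();
}

// Returns the raw bytes of the top-level 'v' entry, if present and a string.
std::optional<std::string> extractClientVersion(const std::string &payload)
{
    if (payload.empty() || payload[0] != 'd')
        return std::nullopt;
    std::size_t pos = 1;
    while (pos < payload.size() && payload[pos] != 'e') {
        const std::optional<std::string> key = readString(payload, pos);
        if (!key)
            return std::nullopt;
        if (*key == "v")
            return readString(payload, pos);
        if (!skipValue(payload, pos))
            return std::nullopt;
    }
    return std::nullopt;
}
```

With an illustrative payload such as "d1:md11:ut_metadatai3ee1:pi6881e1:v6:Tixatie", extractClientVersion yields "Tixati". A handshake without a 'v' key yields no value, in which case the peer-id-based identification remains the only source.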
My tests showed that:
- most peers/clients send the string
- most peers/clients send useful/compliant string content
- clients are identified even if PeerID::identifyClient() fails to do so (e.g. 'Unknown client' -> 'Tixati')
- client identification becomes more accurate (e.g. 'Azureus 5.7.x.x' -> 'Vuze 5.7.x.x')
Here's a screenshot comparing identification methods. Format of column 'Client': v-string-based identification, peer-id-based identification, peer id.
I've collected some statistics, and it seems a few torrent clients violate this BEP and send the client name in an encoding other than UTF-8.
This results in something like:
?Torrent 1.5(µTorrent 126.96.36.199)
-XL0012--x???Awt???4(Xunlei 0.0.1.2)
It would be better to detect the replacement character (U+FFFD, UTF-8 bytes EF BF BD), which Qt seems to substitute for wrongly encoded UTF-8 sequences, and to ignore the value in that case. So we need something like:
if (!new_client.isEmpty() && !new_client.contains(QString(QChar::ReplacementCharacter)))
    stats.client = new_client + "(" + stats.client + ")";
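An alternative to looking for the replacement character after decoding would be to validate the raw bytes before decoding them. The sketch below shows that idea in plain C++ without Qt. It is a different technique from the check suggested above, and isValidUtf8 and describeClient are hypothetical names rather than existing KTorrent functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if bytes is well-formed UTF-8. Rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates and code points
// above U+10FFFF.
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 < 0x80) { // plain ASCII
            ++i;
            continue;
        }
        std::size_t extra = 0;  // number of continuation bytes expected
        unsigned long cp = 0;   // decoded code point
        unsigned long min = 0;  // smallest code point allowed for this length
        if ((b0 & 0xE0) == 0xC0) {
            extra = 1; cp = b0 & 0x1F; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            extra = 2; cp = b0 & 0x0F; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            extra = 3; cp = b0 & 0x07; min = 0x10000;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (extra > n - i - 1)
            return false; // sequence is cut off at the end of the input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}

// Combines both identifications the same way as the snippet above, but only
// trusts the 'v' string when it is non-empty and valid UTF-8.
std::string describeClient(const std::string &vString, const std::string &peerIdName)
{
    if (!vString.empty() && isValidUtf8(vString))
        return vString + "(" + peerIdName + ")";
    return peerIdName;
}
```

For example, a 'v' string that starts with the single byte 0xB5 (the Latin-1 encoding of µ) is rejected, so only the peer-id-based name is shown, while the correctly encoded bytes 0xC2 0xB5 are accepted.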