Strange String range behavior when using NSRegularExpression matches

I'm trying to parse a raw HTTP response, and I get an incorrect range when I try to convert an NSRange to a Range. Here is the relevant code from a playground:

public extension NSRange { 
    public func toStringRange(_ str: String) -> Range<String.Index>? { 
        guard str.characters.count >= length - location && location < str.characters.count else { return nil }
        let fromIdx = str.characters.index(str.startIndex, offsetBy: self.location)
        print("from: \(self.location) = \(fromIdx)")
        let toIdx = str.characters.index(fromIdx, offsetBy: self.length)
        return fromIdx..<toIdx
    } 
} 

let responseString = "HTTP/1.0 200 OK\r\nContent-Length: 193\r\nContent-Type: application/json\r\n" 
let responseRange = NSRange(location: 0, length: responseString.characters.count) 
let responseRegex = try! NSRegularExpression(pattern: "^(HTTP/1.\\d) (\\d+) (.*?\r\n)(.*)", options: [.anchorsMatchLines]) 
guard let matchResult = responseRegex.firstMatch(in: responseString, options: [], range: responseRange), 
    matchResult.numberOfRanges == 5, 
    let versionRange = matchResult.rangeAt(1).toStringRange(responseString), 
    let statusRange = matchResult.rangeAt(2).toStringRange(responseString), 
    let headersRange = matchResult.rangeAt(4).toStringRange(responseString) 
    else { fatalError() }

The output of the print statement in toStringRange() is

from: 0 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 0), _countUTF16: 1) 
from: 9 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 9), _countUTF16: 1) 
from: 17 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 18), _countUTF16: 1)

Why does the third call to toStringRange() return a string range that starts at 18 instead of 17?


2016-10-25 Mark Lilback

Your method for converting an NSRange to a Range<String.Index> does not work correctly for extended grapheme clusters and for characters outside the "basic multilingual plane" (emojis, flags, etc.).

NSRange counts UTF-16 code units (corresponding to the unichar representation in NSString). Range<String.Index> counts Swift Characters, which represent extended grapheme clusters.

In your concrete case, "\r\n" counts as two UTF-16 code units but as a single Character, and that causes the unwanted "shift".
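
The difference between the two ways of counting can be checked directly. The following is a minimal sketch, assuming Swift 4 or later (where `String` itself has a `count` property instead of `characters.count`); the variable names are illustrative only.

```swift
import Foundation

// CR LF is a single extended grapheme cluster, but two UTF-16 code units.
let crlf = "\r\n"
let crlfCharacters = crlf.count              // number of Swift Characters
let crlfUTF16 = crlf.utf16.count             // number of UTF-16 code units
let crlfNSLength = (crlf as NSString).length // NSString length is also in UTF-16 code units

// A flag emoji: two regional indicator scalars, each outside the BMP.
let flag = "\u{1F1E9}\u{1F1EA}"
let flagCharacters = flag.count
let flagUTF16 = flag.utf16.count

print(crlfCharacters, crlfUTF16, crlfNSLength) // 1 2 2
print(flagCharacters, flagUTF16)               // 1 4
```

Any offset taken from an NSRange therefore has to be applied to the string's UTF-16 view, not to its Characters.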

Here is a simplified example:

let responseString = "OK\r\nContent-Length" 

let nsRange = (responseString as NSString).range(of: "Content") 
print(nsRange.location, nsRange.length) // 4 7 

if let sRange1 = nsRange.toStringRange(responseString) { 
    print(responseString.substring(with: sRange1)) // "ontent-" 
}

Using the method

extension String { 
    func range(from nsRange: NSRange) -> Range<String.Index>? { 
        guard
            let from16 = utf16.index(utf16.startIndex, offsetBy: nsRange.location, limitedBy: utf16.endIndex),
            let to16 = utf16.index(from16, offsetBy: nsRange.length, limitedBy: utf16.endIndex),
            let from = String.Index(from16, within: self),
            let to = String.Index(to16, within: self)
            else { return nil }
        return from ..< to
    } 
}

to convert from NSRange to Range<String.Index>, you get the expected result:

if let sRange2 = responseString.range(from: nsRange) { 
    print(responseString.substring(with: sRange2)) // "Content" 
}
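
On Swift 4 and later, Foundation provides a `Range(_:in:)` initializer that performs the same UTF-16 based conversion, so a hand-written extension may no longer be needed. The sketch below applies it to the HTTP response from the question. It assumes Swift 4 or later; `capture` is a hypothetical helper introduced only for this illustration, and the search range is built from `utf16.count` rather than the Character count, since NSRange lengths are measured in UTF-16 code units.

```swift
import Foundation

// Hypothetical helper: text of capture group `index`, or nil if the
// group did not take part in the match or the range cannot be converted.
func capture(_ index: Int, of match: NSTextCheckingResult, in string: String) -> String? {
    guard let range = Range(match.range(at: index), in: string) else { return nil }
    return String(string[range])
}

let responseString = "HTTP/1.0 200 OK\r\nContent-Length: 193\r\nContent-Type: application/json\r\n"

// The full-string NSRange must use the UTF-16 length.
let responseRange = NSRange(location: 0, length: responseString.utf16.count)
let responseRegex = try! NSRegularExpression(pattern: "^(HTTP/1.\\d) (\\d+) (.*?\r\n)(.*)",
                                             options: [.anchorsMatchLines])

let match = responseRegex.firstMatch(in: responseString, options: [], range: responseRange)
let version = match.flatMap { capture(1, of: $0, in: responseString) }
let status = match.flatMap { capture(2, of: $0, in: responseString) }
let headers = match.flatMap { capture(4, of: $0, in: responseString) }

print(version ?? "nil") // HTTP/1.0
print(status ?? "nil")  // 200
print(headers ?? "nil")
```

With this conversion the fourth capture group starts at UTF-16 offset 17, directly after the first "\r\n", instead of being shifted by one Character.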


2016-10-25 22:03:27
