2011-10-10
0

I wonder why I get problems when inserting strings like 'hey hey %80' into the DB: the '%80' still produces a MongoDB UTF-8 exception:

Uncaught exception 'MongoException' with message 'non-utf8 string: hey hey �' 

What should I do? :( Is %80 not a UTF-8 char? :O
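Why the driver complains, as a quick check: after URL-decoding, %80 is the single byte 0x80. In UTF-8 that byte is only legal as a continuation byte inside a multibyte sequence (U+0080 itself is encoded %C2%80), so on its own it makes the whole string invalid. The sketch below assumes nothing beyond core PHP; it uses PCRE's u modifier, with which preg_match() returns false on malformed UTF-8.

```php
<?php
// With the /u modifier, preg_match() returns false (not 0 or 1) when the
// subject is not well-formed UTF-8, which makes it a cheap validity check.
$samples = [
    'typed text'     => 'hey hey %80',
    'decoded %80'    => rawurldecode('hey hey %80'), // ends with the lone byte 0x80
    'decoded %C2%80' => rawurldecode('%C2%80'),      // U+0080 as a full 2-byte sequence
    'decoded %C3%A9' => rawurldecode('%C3%A9'),      // "é"
];

foreach ($samples as $label => $bytes) {
    $valid = preg_match('//u', $bytes) === 1;
    echo $label, ': ', bin2hex($bytes), ' -> ', $valid ? 'valid' : 'INVALID', " UTF-8\n";
}
```

Only the "decoded %80" sample fails: the text as typed is plain ASCII, and it becomes invalid only once something decodes %80 into a raw byte.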

JS passing the string to the controller:

function new_pool_post(_url, _data, _starter) {
    $.ajax({
        type: 'POST',
        data: _data,
        dataType: 'json',
        url: _url,
        beforeSend: function () {
            // show the spinner and disable the button while the request runs
            $('.ajax-loading').show();
            $(_starter).attr('disabled', 'disabled');
        },
        error: function () {
            $('.ajax-loading').hide();
            $(_starter).removeAttr('disabled');
        },
        success: function (json) {
            $('.ajax-loading').hide();
            $(_starter).removeAttr('disabled');
            if (json) {
                // insert the rendered post at the top of the list
                $('.pool-append').prepend(json.pool_post);
            }
        }
    });
}
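If _data is handed to $.ajax() as a pre-built string, jQuery sends it as-is (it only URL-encodes data passed as an object), so a literal %80 typed by the user reaches PHP as an escape sequence and gets decoded. The sketch below mimics this on the PHP side with parse_str(), which decodes form-urlencoded data much like PHP does when filling $_POST; both request bodies are illustrative assumptions, not captured traffic.

```php
<?php
// Illustrative request bodies (assumptions, not captured traffic).
// 1) Built by string concatenation on the client, with no escaping:
$rawBody = 'pool_post=hey hey %80';
// 2) Value percent-escaped first, as encodeURIComponent() does in the browser
//    (jQuery escapes similarly when data is passed as an object):
$escapedBody = 'pool_post=' . rawurlencode('hey hey %80');

// parse_str() decodes application/x-www-form-urlencoded data much like
// PHP does when it fills $_POST.
parse_str($rawBody, $raw);
parse_str($escapedBody, $escaped);

echo $escapedBody, "\n";                           // pool_post=hey%20hey%20%2580
var_dump($raw['pool_post'] === "hey hey \x80");    // bool(true): the text was altered
var_dump($escaped['pool_post'] === 'hey hey %80'); // bool(true): the text as typed
```

Escaping the % as %25 on the client is what keeps the user's text intact.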
Controller

Receiving the data:

// the second argument TRUE runs the value through xss_clean()
$id_project = $this->input->post('id_project', true);
$id_user = $this->session->userdata('user_id');
$pool_post = $this->input->post('pool_post', true);
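One defensive option on the controller side is to refuse non-UTF-8 input before it reaches the model, so the user gets a validation error instead of a MongoException on insert. A sketch; require_utf8() is a hypothetical helper, not a CodeIgniter function, and the sample value is illustrative.

```php
<?php
// Hypothetical helper (not part of CodeIgniter): returns the value unchanged
// if it is a well-formed UTF-8 string, otherwise throws.
function require_utf8($value, string $field): string
{
    // CodeIgniter returns FALSE or NULL (depending on version) for a missing field.
    if (!is_string($value)) {
        throw new InvalidArgumentException("Missing field: $field");
    }
    // preg_match() with /u returns false for malformed UTF-8.
    if (preg_match('//u', $value) !== 1) {
        throw new InvalidArgumentException("Field $field is not valid UTF-8");
    }
    return $value;
}

// Usage, mirroring the controller above:
try {
    $pool_post = require_utf8("hey hey \x80", 'pool_post');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // Field pool_post is not valid UTF-8
}
```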
Controller

Sanitizing the data:

public function xss_clean($str, $is_image = FALSE) 
    { 
     /* 
     * Is the string an array? 
     * 
     */ 
     if (is_array($str)) 
     { 
      // each() was removed in PHP 8; foreach walks the array the same way
      foreach ($str as $key => $value)
      {
       $str[$key] = $this->xss_clean($value);
      }

      return $str; 
     } 
     /* Remove non-UTF-8 chars (note: \x80-\xFF also matches every byte of valid multibyte UTF-8 characters) */

     $str = htmlspecialchars(urlencode(preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $str)));

     /* 
     * Remove Invisible Characters 
     */ 
     $str = remove_invisible_characters($str); 

     // Validate Entities in URLs 
     $str = $this->_validate_entities($str); 

     /* 
     * URL Decode 
     * 
     * Just in case stuff like this is submitted: 
     * 
     * <a href="http://%77%77%77%2E%67%6F%6F%67%6C%65%2E%63%6F%6D">Google</a> 
     * 
     * Note: Use rawurldecode() so it does not remove plus signs 
     * 
     */ 
     $str = rawurldecode($str); 

     /* 
     * Convert character entities to ASCII 
     * 
     * This permits our tests below to work reliably. 
     * We only convert entities that are within tags since 
     * these are the ones that will pose security problems. 
     * 
     */ 

     $str = preg_replace_callback("/[a-z]+=([\'\"]).*?\\1/si", array($this, '_convert_attribute'), $str); 

     $str = preg_replace_callback("/<\w+.*?(?=>|<|$)/si", array($this, '_decode_entity'), $str); 

     /* 
     * Remove Invisible Characters Again! 
     */ 
     $str = remove_invisible_characters($str); 

     /* 
     * Convert all tabs to spaces 
     * 
     * This prevents strings like this: ja vascript 
     * NOTE: we deal with spaces between characters later. 
     * NOTE: preg_replace was found to be amazingly slow here on 
     * large blocks of data, so we use str_replace. 
     */ 

     if (strpos($str, "\t") !== FALSE) 
     { 
      $str = str_replace("\t", ' ', $str); 
     } 

     /* 
     * Capture converted string for later comparison 
     */ 
     $converted_string = $str; 

     // Remove Strings that are never allowed 
     $str = $this->_do_never_allowed($str); 

     /* 
     * Makes PHP tags safe 
     * 
     * Note: XML tags are inadvertently replaced too: 
     * 
     * <?xml 
     * 
     * But it doesn't seem to pose a problem. 
     */ 
     if ($is_image === TRUE) 
     { 
      // Images have a tendency to have the PHP short opening and 
      // closing tags every so often so we skip those and only 
      // do the long opening tags. 
      $str = preg_replace('/<\?(php)/i', "&lt;?\\1", $str); 
     } 
     else 
     { 
      $str = str_replace(array('<?', '?'.'>'), array('&lt;?', '?&gt;'), $str); 
     } 

     /* 
     * Compact any exploded words 
     * 
     * This corrects words like: j a v a s c r i p t 
     * These words are compacted back to their correct state. 
     */ 
     $words = array(
       'javascript', 'expression', 'vbscript', 'script', 
       'applet', 'alert', 'document', 'write', 'cookie', 'window' 
      ); 

     foreach ($words as $word) 
     { 
      $temp = ''; 

      for ($i = 0, $wordlen = strlen($word); $i < $wordlen; $i++) 
      { 
       $temp .= substr($word, $i, 1)."\s*"; 
      } 

      // We only want to do this when it is followed by a non-word character 
      // That way valid stuff like "dealer to" does not become "dealerto" 
      $str = preg_replace_callback('#('.substr($temp, 0, -3).')(\W)#is', array($this, '_compact_exploded_words'), $str); 
     } 

     /* 
     * Remove disallowed Javascript in links or img tags 
     * We used to do some version comparisons and use of stripos for PHP5, 
     * but it is dog slow compared to these simplified non-capturing 
     * preg_match(), especially if the pattern exists in the string 
     */ 
     do 
     { 
      $original = $str; 

      if (preg_match("/<a/i", $str)) 
      { 
       $str = preg_replace_callback("#<a\s+([^>]*?)(>|$)#si", array($this, '_js_link_removal'), $str); 
      } 

      if (preg_match("/<img/i", $str)) 
      { 
       $str = preg_replace_callback("#<img\s+([^>]*?)(\s?/?>|$)#si", array($this, '_js_img_removal'), $str); 
      } 

      if (preg_match("/script/i", $str) OR preg_match("/xss/i", $str)) 
      { 
       $str = preg_replace("#<(/*)(script|xss)(.*?)\>#si", '[removed]', $str); 
      } 
     } 
     while($original != $str); 

     unset($original); 

     // Remove evil attributes such as style, onclick and xmlns 
     $str = $this->_remove_evil_attributes($str, $is_image); 

     /* 
     * Sanitize naughty HTML elements 
     * 
     * If a tag containing any of the words in the list 
     * below is found, the tag gets converted to entities. 
     * 
     * So this: <blink> 
     * Becomes: &lt;blink&gt; 
     */ 
     $naughty = 'alert|applet|audio|basefont|base|behavior|bgsound|blink|body|embed|expression|form|frameset|frame|head|html|ilayer|iframe|input|isindex|layer|link|meta|object|plaintext|style|script|textarea|title|video|xml|xss'; 
     $str = preg_replace_callback('#<(/*\s*)('.$naughty.')([^><]*)([><]*)#is', array($this, '_sanitize_naughty_html'), $str); 

     /* 
     * Sanitize naughty scripting elements 
     * 
     * Similar to above, only instead of looking for 
     * tags it looks for PHP and JavaScript commands 
     * that are disallowed. Rather than removing the 
     * code, it simply converts the parenthesis to entities 
     * rendering the code un-executable. 
     * 
     * For example: eval('some code') 
     * Becomes:  eval&#40;'some code'&#41; 
     */ 
     $str = preg_replace('#(alert|cmd|passthru|eval|exec|expression|system|fopen|fsockopen|file|file_get_contents|readfile|unlink)(\s*)\((.*?)\)#si', "\\1\\2&#40;\\3&#41;", $str); 


     // Final clean up 
     // This adds a bit of extra precaution in case 
     // something got through the above filters 
     $str = $this->_do_never_allowed($str); 

     /* 
     * Images are Handled in a Special Way 
     * - Essentially, we want to know that after all of the character 
     * conversion is done whether any unwanted, likely XSS, code was found. 
     * If not, we return TRUE, as the image is clean. 
     * However, if the string post-conversion does not match the
     * string post-removal of XSS, then it fails, as there was unwanted XSS 
     * code found and removed/changed during processing. 
     */ 

     if ($is_image === TRUE) 
     { 
      return ($str == $converted_string) ? TRUE: FALSE; 
     } 

     log_message('debug', "XSS Filtering completed"); 
     return $str; 
    } 
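Note that the /[\x00-\x1F\x80-\xFF]/ line added at the top of xss_clean() does more than remove invalid bytes: every byte of a multibyte UTF-8 character lies in 0x80-0xFF, so accented text such as "café" loses its "é" too. A narrower alternative is sketched below: it keeps well-formed UTF-8 sequences (the byte patterns from the UTF-8 definition in RFC 3629) and drops only stray bytes. strip_invalid_utf8() is a hypothetical name; where the mbstring extension is available, mb_convert_encoding($str, 'UTF-8', 'UTF-8') is a common alternative that substitutes invalid bytes instead of dropping them.

```php
<?php
// Hypothetical helper: drop bytes that are not part of a well-formed UTF-8
// sequence and keep everything else. No /u modifier, so PCRE works byte by byte.
function strip_invalid_utf8(string $str): string
{
    $validChar = '[\x00-\x7F]'                  // ASCII
        . '|[\xC2-\xDF][\x80-\xBF]'             // 2-byte sequences
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'         // 3-byte, no overlong forms
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'         // 3-byte, no UTF-16 surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'      // 4-byte, no overlong forms
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}';     // up to U+10FFFF
    // Each match is either one valid character (captured in $1 and kept)
    // or one stray byte matched by "." (group 1 unset, so replaced by "").
    return preg_replace('/(' . $validChar . ')|./s', '$1', $str);
}

$input = "caf\u{00E9} hey hey \x80";

echo strip_invalid_utf8($input), "\n";                         // café hey hey
echo preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $input), "\n"; // caf hey hey
```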
Controller

Passes the filtered data to the models, and the models insert it into MongoDB; nothing more... :)

+1

Even if you send your queries in a URI request and don't encode it properly, '%80' evaluates to ASCII 'P'. Please post a complete snippet. –

+0

I'm using the CodeIgniter PHP framework, and passing the strings via an XHR request with the POST method – sbaaaang

+1

Using urlencode(), it's OK – sbaaaang
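If urlencode() is the fix, it should be paired with urldecode(), not rawurldecode(): urlencode() turns spaces into '+', and rawurldecode() leaves '+' alone. The modified xss_clean() calls urlencode() near the top and rawurldecode() further down, so spaces would likely come back as '+'. A quick check of both pairings, using only core PHP:

```php
<?php
$text = 'hey hey %80';

$encoded = urlencode($text);      // "hey+hey+%2580": space -> "+", "%" -> "%25"

var_dump(urldecode($encoded));    // string(11) "hey hey %80" (round-trips)
var_dump(rawurldecode($encoded)); // string(11) "hey+hey+%80" ("+" is not decoded)

// rawurlencode() uses %20 for spaces, so it pairs with rawurldecode():
var_dump(rawurldecode(rawurlencode($text))); // string(11) "hey hey %80"
```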

Answer

2

I had a related problem,

e.g.:

for UTF-8, instead of ucfirst you have to use mb_ucfirst('HELO', 'UTF-8');

And I think in your situation the problem is with substr: it should use mb_substr.

Otherwise:

Maybe convert to ISO-8859-1 with iconv at the beginning, and convert to UTF-8 when writing to the DB.
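The last step of that suggestion (making sure what is written to MongoDB is UTF-8) can be sketched as follows, under the assumption that any input which is not already valid UTF-8 is ISO-8859-1 (Latin-1). Checking first means text that is already UTF-8 is not double-encoded. to_utf8() is a hypothetical helper; iconv() needs the iconv extension, which is enabled by default in most PHP builds.

```php
<?php
// Hypothetical helper. Assumption: non-UTF-8 input is ISO-8859-1, where every
// byte maps to exactly one code point, so the conversion cannot fail.
function to_utf8(string $str): string
{
    if (preg_match('//u', $str) === 1) {
        return $str; // already valid UTF-8: leave it alone
    }
    return iconv('ISO-8859-1', 'UTF-8', $str);
}

echo bin2hex(to_utf8("caf\xE9")), "\n";      // 636166c3a9 ("café")
echo bin2hex(to_utf8("caf\u{00E9}")), "\n";  // 636166c3a9 (unchanged)
echo bin2hex(to_utf8("hey hey \x80")), "\n"; // 6865792068657920c280
```

In Latin-1, 0x80 is the C1 control character U+0080, not the euro sign; if the real source encoding is Windows-1252, that name would have to be used instead.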

+0

Uhm, I don't think I understand... :P Can you explain better? Do you have the same problem as me? – sbaaaang

+1

Replace $temp .= substr($word, $i, 1)."\s*"; with mb_substr – user956584

+0

Uhm, done, but nothing changed – sbaaaang
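A possible reason the mb_substr() swap changed nothing: that loop only iterates over the hard-coded $words list ('javascript', 'script', ...), which is pure ASCII, and for ASCII strings byte offsets and character offsets coincide. Byte-based substr() only corrupts text when it cuts through a multibyte character, as this sketch shows. It uses preg_split() with the u modifier as a character-safe split so it does not depend on the mbstring extension; mb_substr($str, $i, 1, 'UTF-8') yields the same characters where mbstring is installed.

```php
<?php
// Byte-based: substr() counts bytes, so cutting "é" (2 bytes: C3 A9) in half
// leaves a lone lead byte, which is not valid UTF-8.
$word = "caf\u{00E9}";
$firstByteOfE = substr($word, 3, 1);               // "\xC3"
var_dump(preg_match('//u', $firstByteOfE) === 1);  // bool(false)

// Character-based: split on UTF-8 character boundaries instead.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
var_dump($chars[3] === "\u{00E9}");                // bool(true)

// For an ASCII word, as in the $words list of xss_clean(), bytes and
// characters coincide, so both approaches yield the same pieces.
$ascii = 'javascript';
$byBytes = str_split($ascii);
$byChars = preg_split('//u', $ascii, -1, PREG_SPLIT_NO_EMPTY);
var_dump($byBytes === $byChars);                   // bool(true)
```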

-1

To avoid the problem, you can use

header("Content-Type: text/html; charset=UTF-8"); 

at the top of the PHP file.
I found the solution in this stackoverflow post, and it worked for me when migrating a DB from MySQL to MongoDB with Latin special characters.