Recherche d'une séquence d'octets dans un fichier binaire avec Java

J'ai une séquence d'octets que je dois rechercher dans un ensemble de fichiers binaires utilisant Java. Exemple: Je recherche la séquence d'octets DEADBEEF (en hexadécimal) dans un fichier binaire. Comment ferais-je cela en Java? Y at-il une méthode intégrée, comme String.contains() pour les fichiers binaires?Recherche d'une séquence d'octets dans un fichier binaire avec Java

Source

2009-10-02 Bassam

Non, il n'existe pas de méthode intégrée pour cela. Mais, directement copié de HERE (avec deux corrections appliquées au code d'origine):

/** 
* Knuth-Morris-Pratt Algorithm for Pattern Matching 
*/ 
class KMPMatch { 
    /** 
    * Finds the first occurrence of the pattern in the text. 
    */ 
    public int indexOf(byte[] data, byte[] pattern) { 
     int[] failure = computeFailure(pattern); 

     int j = 0; 
     if (data.length == 0) return -1; 

     for (int i = 0; i < data.length; i++) { 
      while (j > 0 && pattern[j] != data[i]) { 
       j = failure[j - 1]; 
      } 
      if (pattern[j] == data[i]) { j++; } 
      if (j == pattern.length) { 
       return i - pattern.length + 1; 
      } 
     } 
     return -1; 
    } 

    /** 
    * Computes the failure function using a boot-strapping process, 
    * where the pattern is matched against itself. 
    */ 
    private int[] computeFailure(byte[] pattern) { 
     int[] failure = new int[pattern.length]; 

     int j = 0; 
     for (int i = 1; i < pattern.length; i++) { 
      while (j > 0 && pattern[j] != pattern[i]) { 
       j = failure[j - 1]; 
      } 
      if (pattern[j] == pattern[i]) { 
       j++; 
      } 
      failure[i] = j; 
     } 

     return failure; 
    } 
}

Source

2009-10-02 05:11:13 janko

J'adore StackOverflow. Merci! – Teekin

Très peu d'optimisation: vous n'avez pas besoin de calculer la fonction de défaillance du modèle si la longueur data.length est zéro ==> vous pouvez déplacer la vérification data.length à la première ligne de la fonction. – dexametason

private int bytesIndexOf(byte[] source, byte[] search, int fromIndex) { 
    boolean find = false; 
    int i; 
    for (i = fromIndex; i < (source.length - search.length); i++) { 
     if (source[i] == search[0]) { 
      find = true; 
      for (int j = 0; j < search.length; j++) { 
       if (source[i + j] != search[j]) { 
        find = false; 
       } 
      } 
     } 
     if (find) { 
      break; 
     } 
    } 
    if (!find) { 
     return -1; 
    } 
    return i; 
}

Source

2011-01-12 04:18:36 joseluisbz

Ne fonctionnera pas sur le dernier octet de chaîne. –

Vous pouvez trouver la bibliothèque sur Github à: https://github.com/twitter/elephant-bird

package com.twitter.elephantbird.util; 

import java.io.IOException; 
import java.io.InputStream; 
import java.util.Arrays; 

/** 
* An efficient stream searching class based on the Knuth-Morris-Pratt algorithm. 
* For more on the algorithm works see: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm. 
*/ 
public class StreamSearcher { 

    protected byte[] pattern_; 
    protected int[] borders_; 

    // An upper bound on pattern length for searching. Throws exception on longer patterns 
    public static final int MAX_PATTERN_LENGTH = 1024; 

    public StreamSearcher(byte[] pattern) { 
    setPattern(pattern); 
    } 

    /** 
    * Sets a new pattern for this StreamSearcher to use. 
    * @param pattern 
    *   the pattern the StreamSearcher will look for in future calls to search(...) 
    */ 
    public void setPattern(byte[] pattern) { 
    if (pattern.length > MAX_PATTERN_LENGTH) { 
     throw new IllegalArgumentException("The maximum pattern length is " + MAX_PATTERN_LENGTH); 
    } 

    pattern_ = Arrays.copyOf(pattern, pattern.length); 
    borders_ = new int[pattern_.length + 1]; 
    preProcess(); 
    } 

    /** 
    * Searches for the next occurrence of the pattern in the stream, starting from the current stream position. Note 
    * that the position of the stream is changed. If a match is found, the stream points to the end of the match -- i.e. the 
    * byte AFTER the pattern. Else, the stream is entirely consumed. The latter is because InputStream semantics make it difficult to have 
    * another reasonable default, i.e. leave the stream unchanged. 
    * 
    * @return bytes consumed if found, -1 otherwise. 
    * @throws IOException 
    */ 
    public long search(InputStream stream) throws IOException { 
    long bytesRead = 0; 

    int b; 
    int j = 0; 

    while ((b = stream.read()) != -1) { 
     bytesRead++; 

     while (j >= 0 && (byte)b != pattern_[j]) { 
     j = borders_[j]; 
     } 
     // Move to the next character in the pattern. 
     ++j; 

     // If we've matched up to the full pattern length, we found it. Return, 
     // which will automatically save our position in the InputStream at the point immediately 
     // following the pattern match. 
     if (j == pattern_.length) { 
     return bytesRead; 
     } 
    } 

    // No dice, Note that the stream is now completely consumed. 
    return -1; 
    } 

    /** 
    * Builds up a table of longest "borders" for each prefix of the pattern to find. This table is stored internally 
    * and aids in implementation of the Knuth-Moore-Pratt string search. 
    * <p> 
    * For more information, see: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm. 
    */ 
    protected void preProcess() { 
    int i = 0; 
    int j = -1; 
    borders_[i] = j; 
    while (i < pattern_.length) { 
     while (j >= 0 && pattern_[i] != pattern_[j]) { 
     j = borders_[j]; 
     } 
     borders_[++i] = ++j; 
    } 
    } 
}

Source

2015-03-17 14:43:04

Où se trouve la limite de 1024 octets pour le modèle comme indiqué par le membre MAX_PATTERN_LENGTH inutilisé? – user1767316

Vous pouvez trouver la séquence d'octets à partir du fichier de commande giga-octets en utilisant bigdoc.

Lib et exemple ici sur Github à: https://github.com/riversun/bigdoc

package org.example; 

import java.io.File; 
import java.util.List; 

import org.riversun.bigdoc.bin.BigFileSearcher; 

public class Example { 

    public static void main(String[] args) throws Exception { 

     byte[] searchBytes = "hello world.".getBytes("UTF-8"); 

     File file = new File("/var/tmp/yourBigfile.bin"); 

     BigFileSearcher searcher = new BigFileSearcher(); 

     List<Long> findList = searcher.searchBigFile(file, searchBytes); 

     System.out.println("positions = " + findList); 
    } 
}

Si vous souhaitez rechercher sur la mémoire, vérifier. exemples ici sur Github à: https://github.com/riversun/finbin

import java.util.List; 

import org.riversun.finbin.BigBinarySearcher; 

public class Example { 

    public static void main(String[] args) throws Exception { 

     BigBinarySearcher bbs = new BigBinarySearcher(); 

     byte[] iamBigSrcBytes = "Hello world.It's a small world.".getBytes("utf-8"); 

     byte[] searchBytes = "world".getBytes("utf-8"); 

     List<Integer> indexList = bbs.searchBytes(iamBigSrcBytes, searchBytes); 

     System.out.println("indexList=" + indexList); 
    } 
}

Retourne toutes les positions concordantes dans le tableau d'octets

Il peut également résister à un large éventail d'octets :)

Source

2015-07-09 05:00:50 riversun

Recherche d'une séquence d'octets dans un fichier binaire avec Java

Répondre

Questions connexes