How to replace centered text in a PDF file with PDFBox

I am using the PDFTextReplacement example. It replaces text as expected when my text is left-aligned. But if my input PDF contains centered text, the replacement comes out left-aligned. OK, so I have to recalculate the correct starting point.

That leaves me with two questions:

How do I determine the alignment?
How do I calculate the correct starting point?

Here is my code:

public PDDocument doIt(String inputFile, Map<String, String> text) 
     throws IOException, COSVisitorException { 
    // the document 
    PDDocument doc = null; 

    doc = PDDocument.load(inputFile); 
    List pages = doc.getDocumentCatalog().getAllPages(); 
    for (int i = 0; i < pages.size(); i++) { 
     PDPage page = (PDPage) pages.get(i); 
     PDStream contents = page.getContents(); 

     PDFStreamParser parser = new PDFStreamParser(contents.getStream()); 
     parser.parse(); 
     List tokens = parser.getTokens(); 
     for (int j = 0; j < tokens.size(); j++) { 
      Object next = tokens.get(j); 

      if (next instanceof PDFOperator) { 

       PDFOperator op = (PDFOperator) next; 

       // Tj and TJ are the two operators that display 
       // strings in a PDF 

       String pstring = ""; 
       int prej = 0; 
       if (op.getOperation().equals("Tj")) { 
        // Tj takes one operator and that is the string 
        // to display so lets update that operator 
        COSString previous = (COSString) tokens.get(j - 1); 
        String string = previous.getString(); 
        // System.out.println(j + " " + string); 
        if (j == prej) { 
         pstring += string; 
        } else { 
         prej = j; 
         pstring = string; 
        } 

        previous.reset(); 
        previous.append(string.getBytes("ISO-8859-1")); 
       } else if (op.getOperation().equals("TJ")) { 
        COSArray previous = (COSArray) tokens.get(j - 1); 
        for (int k = 0; k < previous.size(); k++) { 
         Object arrElement = previous.getObject(k); 
         if (arrElement instanceof COSString) { 
          COSString cosString = (COSString) arrElement; 
          String string = cosString.getString(); 

          if (j == prej) { 
           pstring += string; 
          } else { 
           prej = j; 
           pstring = string; 
          } 

          cosString.reset(); 
          // cosString.append(string 
          // .getBytes("ISO-8859-1")); 
         } 

        } 

        COSString cosString2 = (COSString) previous 
          .getObject(0); 

        for (int t = 1; t < previous.size(); t++) 
         previous.remove(t); 

        // cosString2.setNeedToBeUpdate(true); 

        if (text.containsKey(pstring.trim())) { 

         String textValue = text.get(pstring.trim()); 
         cosString2.append(textValue.getBytes("ISO-8859-1")); 

         for (int k = 1; k < previous.size(); k++) { 
          previous.remove(k); 

         } 
        } 

       } 
      } 
     } 
     // now that the tokens are updated we will replace the 
     // page content stream. 
     PDStream updatedStream = new PDStream(doc); 
     OutputStream out = updatedStream.createOutputStream(); 
     ContentStreamWriter tokenWriter = new ContentStreamWriter(out); 
     tokenWriter.writeTokens(tokens); 
     page.setContents(updatedStream); 
    } 
    return doc; 
}

2013-10-11 markus0074

*How do I determine the alignment?* - PDF has no notion of alignment. It draws text starting at the current origin, and that is all. You can try to infer an alignment by comparing the text position of the current "line" with the page dimensions and with the text positions on the "lines" before and after it ("line" in quotes because PDF does not necessarily follow a text-line concept). But if a text looks centered, are you sure it was meant to be centered? It may just as well have been indented by some distance and only *look* centered by coincidence. – mkl
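The comparison described in this comment can be sketched as a small heuristic. This is only an illustration in plain Java, not PDFBox API: the class `AlignmentGuess`, its parameters, and the tolerance are all hypothetical, and the line position and width are assumed to have been extracted already, in the same coordinate system as the margins. As the comment warns, the result is a guess about appearance, not about the author's intent.

```java
/** Heuristic guess at how a line of text sits between two margins. */
final class AlignmentGuess {
    enum Alignment { LEFT, CENTER, RIGHT, UNKNOWN }

    /**
     * All values are in the same units (for example PDF user space points).
     * tolerance is how far a gap may deviate and still count as a match.
     */
    static Alignment guess(double lineStartX, double lineWidth,
                           double leftMargin, double rightMargin,
                           double tolerance) {
        double lineEndX = lineStartX + lineWidth;
        double leftGap = lineStartX - leftMargin;   // space before the text
        double rightGap = rightMargin - lineEndX;   // space after the text

        if (Math.abs(leftGap) <= tolerance) {
            return Alignment.LEFT;                  // starts at the left margin
        }
        if (Math.abs(rightGap) <= tolerance) {
            return Alignment.RIGHT;                 // ends at the right margin
        }
        if (Math.abs(leftGap - rightGap) <= tolerance) {
            return Alignment.CENTER;                // equal gaps on both sides
        }
        return Alignment.UNKNOWN;                   // e.g. indented text
    }
}
```

Checking the neighbouring "lines" as well, as suggested above, would make the guess more robust: a single line with equal gaps may be a coincidence, several in a row less so.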

@mkl Yes, that is exactly what I saw in the PDDocument. So I have to refine my questions: 1. How do I get the exact space used by the content (icepdf uses lineText.getBounds())? 2. How do I calculate the space needed for new strings (based on the Base 14 fonts)? – markus0074
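For the second refined question, the width of a string follows from the glyph widths of the font, the font size, and the text state parameters (ISO 32000-1, section 9.4.4, horizontal writing). Below is a minimal sketch in plain Java. The class `TextWidth` is hypothetical, and the width table in `main` holds a few sample values in the style of Helvetica metrics that should be treated as placeholders; real values have to come from the actual font metrics. In PDFBox, `PDFont.getStringWidth(String)` should give the summed glyph widths in thousandths of a text space unit, which could stand in for the table lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Computes the horizontal space a string occupies in text space units. */
final class TextWidth {
    /**
     * glyphWidths: glyph widths in thousandths of a text space unit.
     * charSpacing (Tc) and wordSpacing (Tw) are in text space units;
     * horizontalScaling is Tz / 100, so 1.0 means 100 percent.
     */
    static double width(String text, Map<Character, Integer> glyphWidths,
                        double fontSize, double charSpacing,
                        double wordSpacing, double horizontalScaling) {
        double total = 0;
        for (char c : text.toCharArray()) {
            Integer w = glyphWidths.get(c);
            if (w == null) {
                throw new IllegalArgumentException("No width for '" + c + "'");
            }
            double advance = w / 1000.0 * fontSize + charSpacing;
            if (c == ' ') {
                advance += wordSpacing;   // word spacing applies to the space character only
            }
            total += advance * horizontalScaling;
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder widths; load the real ones from the font's metrics.
        Map<Character, Integer> sample = new HashMap<>();
        sample.put('H', 722);
        sample.put('e', 556);
        sample.put('l', 222);
        sample.put('o', 556);
        System.out.println(width("Hello", sample, 12, 0, 0, 1.0));
    }
}
```

The same function can measure both the original string and its replacement, provided the font, size, and spacing in effect at that point of the content stream are known.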

Your code works at a very low level: it inspects individual instructions from the page content streams, so it does not benefit from higher-level functionality. Above all, this means that at this level you have to track changes to the current graphics state yourself. To be able to do that, you should first study the [PDF specification ISO 32000-1](http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf), especially chapters 8 (to understand how the graphics state changes) and 9 (to understand how text is drawn). – mkl
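Once the start position and the widths of the old and new strings are known, the arithmetic for keeping text centered is short. The sketch below is plain Java with hypothetical names (`Recenter`, `centeredStartX`, `tjAdjustment`); it assumes horizontal writing and that all inputs are expressed in the same text space, which is exactly the state tracking the comment above refers to. The second method relies on the rule from chapter 9 that a number inside a TJ array is subtracted from the current horizontal position in thousandths of a text space unit, scaled by font size and horizontal scaling.

```java
/** Arithmetic for keeping a replaced string centered on the old one. */
final class Recenter {
    /** Start x at which text of newWidth has the same center as the old text. */
    static double centeredStartX(double oldStartX, double oldWidth, double newWidth) {
        return oldStartX + (oldWidth - newWidth) / 2.0;
    }

    /**
     * Number to put in front of the string in a TJ array so that the string
     * starts shiftRight units further right. TJ numbers move the position
     * left when positive, hence the negative sign.
     * horizontalScaling is Tz / 100, so 1.0 means 100 percent.
     */
    static double tjAdjustment(double shiftRight, double fontSize,
                               double horizontalScaling) {
        return -shiftRight * 1000.0 / (fontSize * horizontalScaling);
    }

    public static void main(String[] args) {
        double oldStart = 100, oldWidth = 80, newWidth = 40;
        double newStart = centeredStartX(oldStart, oldWidth, newWidth);
        // Shift needed relative to where the old text started.
        System.out.println(tjAdjustment(newStart - oldStart, 10, 1.0));
    }
}
```

If other text follows on the same line without an explicit repositioning, appending the same adjustment once more after the new string brings the end position back to where the old string ended, since start shift plus new width plus the same shift equals the old width.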

You can use this function:

public void doIt(String inputFile, String outputFile, String strToFind, String message) 
      throws IOException, COSVisitorException 
     { 
      // the document 
      PDDocument doc = null; 
      try 
      { 
       doc = PDDocument.load(inputFile); 
       List pages = doc.getDocumentCatalog().getAllPages(); 
       for(int i=0; i<pages.size(); i++) 
       { 
        PDPage page = (PDPage)pages.get(i); 
        PDStream contents = page.getContents(); 
        PDFStreamParser parser = new PDFStreamParser(contents.getStream()); 
        parser.parse(); 
        List tokens = parser.getTokens(); 
        for(int j=0; j<tokens.size(); j++) 
        { 
         Object next = tokens.get(j); 
         if(next instanceof PDFOperator) 
         { 
          PDFOperator op = (PDFOperator)next; 
          //Tj and TJ are the two operators that display 
          //strings in a PDF 
          if(op.getOperation().equals("Tj")) 
          { 
           //Tj takes one operator and that is the string 
           //to display so lets update that operator 
           COSString previous = (COSString)tokens.get(j-1); 
           String string = previous.getString(); 
           string = string.replaceFirst(strToFind, message); 
           previous.reset(); 
           previous.append(string.getBytes()); 
          } 
          else if(op.getOperation().equals("TJ")) 
          { 
           COSArray previous = (COSArray)tokens.get(j-1); 
           for(int k=0; k<previous.size(); k++) 
           { 
            Object arrElement = previous.getObject(k); 
            if(arrElement instanceof COSString) 
            { 
             COSString cosString = (COSString)arrElement; 
             String string = cosString.getString(); 
             string = string.replaceFirst(strToFind, message); 
             cosString.reset(); 
             cosString.append(string.getBytes()); 
            } 
           } 
          } 
         } 
        } 
        //now that the tokens are updated we will replace the 
        //page content stream. 
        PDStream updatedStream = new PDStream(doc); 
        OutputStream out = updatedStream.createOutputStream(); 
        ContentStreamWriter tokenWriter = new ContentStreamWriter(out); 
        tokenWriter.writeTokens(tokens); 
        page.setContents(updatedStream); 
       } 
       doc.save(outputFile); 
      } 
      finally 
      { 
       if(doc != null) 
       { 
        doc.close(); 
       } 
      } 
     }

2014-02-26 16:52:07 Bourkadi

Where exactly does your code try to determine the alignment of the original line, let alone re-align the changed line? – mkl

Don't rely on alignment; just put some kind of marker such as _TEXTHERE in your PDF and replace it using the replace function. – Bourkadi
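One caveat with the marker approach as coded in the answer: `String.replaceFirst` treats its first argument as a regular expression and gives `$` and `\` a special meaning in the second. A minimal sketch of a literal variant follows; the class and method names are hypothetical, and only `java.util.regex` from the standard library is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Replaces the first occurrence of a marker, treating both strings literally. */
final class LiteralReplace {
    static String replaceFirstLiteral(String input, String marker, String replacement) {
        // Pattern.quote disables regex metacharacters in the marker;
        // Matcher.quoteReplacement disables $ and \ in the replacement.
        return input.replaceFirst(Pattern.quote(marker),
                                  Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstLiteral("Total: _TEXTHERE", "_TEXTHERE", "$5.00"));
    }
}
```

This only changes how the string is matched. It still requires the whole marker to sit inside a single string operand, which is not guaranteed when a PDF producer splits text across several Tj or TJ entries.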

But the OP explicitly wanted to keep a specific alignment; he wanted the text to be centered after the replacement. – mkl
