CAM::PDF::PageText - Extract text from PDF page tree

Author

       See CAM::PDF

perl v5.36.0                                       2022-12-08                            CAM::PDF::PageText(3pm)

Description

       This module attempts to extract sequential text from a PDF page.  This is not a robust process, because
       PDF text is laid out graphically and may appear in arbitrary order.  The module uses a few heuristics to
       guess which text goes next to which other text, but it is easily fooled by, say, subscripts,
       non-horizontal text, changes in font, form fields, etc.

       Those disclaimers aside, it is useful for a quick dump of the text in a simple PDF file.

Functions

       $pkg->render($pagetree)
       $pkg->render($pagetree, $verbose)
           Turn a page content tree into a string.  This is a class method that should be called like:

              CAM::PDF::PageText->render($pagetree);

License

       Same as CAM::PDF

Name

       CAM::PDF::PageText - Extract text from PDF page tree

Synopsis

          my $pdf = CAM::PDF->new($filename);
          my $pageone_tree = $pdf->getPageContentTree(1);
          print CAM::PDF::PageText->render($pageone_tree);
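
       The synopsis handles a single page.  As a minimal sketch, the same calls can be looped over every page
       of a document.  The helper name dump_pdf_text and the script layout are illustrative assumptions, not
       part of CAM::PDF; the sketch relies only on CAM::PDF's new(), numPages() and getPageContentTree()
       together with render() from this module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;
use CAM::PDF::PageText;

# Hypothetical helper (not part of CAM::PDF): returns the extracted text
# of every page in $filename, concatenated in page order.
sub dump_pdf_text {
    my ($filename) = @_;

    # new() returns undef on failure; the reason is left in $CAM::PDF::errstr.
    my $pdf = CAM::PDF->new($filename)
        or die "Cannot open $filename: $CAM::PDF::errstr\n";

    my $text = q{};
    for my $pagenum (1 .. $pdf->numPages()) {    # pages are numbered from 1
        my $tree = $pdf->getPageContentTree($pagenum);
        next if !defined $tree;    # defensive: skip pages without a content tree
        $text .= CAM::PDF::PageText->render($tree) . "\n";
    }
    return $text;
}

# Run only when executed as a script, e.g.: perl dump_text.pl file.pdf
if (!caller) {
    die "usage: $0 file.pdf\n" if !@ARGV;
    print dump_pdf_text($ARGV[0]);
}
```

       Because the extraction is heuristic, the output of such a loop is best treated as a rough dump rather
       than a faithful reconstruction of reading order.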

See Also