xml - LibXML - Looping through nodes until -


I'm trying to parse the XML below using Perl's XML::LibXML library.

    <?xml version="1.0" encoding="utf-8" ?>
    <taggedpdf-doc>
      <part>
        <sect>
          <h4>2.1 study purpose </h4>
          <p>this study purpose content</p>
          <p>content 1</p>
          <p>content 2</p>
          <p>content 3 </p>
          <p>content 4</p>
          <p>3. header</p>
          <p>obj content 4</p>
          <p>obj content 2</p>
        </sect>
      </part>
    </taggedpdf-doc>

For the header "study purpose", I'm trying to display all the related siblings. The expected output is:

    <h4>2.1 study purpose </h4>
    <p>this study purpose content</p>
    <p>content 1</p>
    <p>content 2</p>
    <p>content 3 </p>
    <p>content 4</p>

My Perl code is below. I can only display the first node.

Given the value of the first node, "study purpose", is there a way I can loop and print the nodes until I hit a node containing a digit followed by a '.'?

My Perl implementation:

    my $purpose_str = 'purpose and rationale|study purpose|study rationale';
    $parser = XML::LibXML->new;
    #print "parser for file $file is: $parser \n";

    $dom = $parser->parse_file($file);

    $root = $dom->getDocumentElement;
    $dom->setDocumentElement($root);

    for my $purpose_search ('/taggedpdf-doc/part/sect/h4')
    {
        $purpose_nodeset = $dom->find($purpose_search);
        foreach $purp_node ($purpose_nodeset->get_nodelist)
        {
            if ($purp_node =~ m/$purpose_str/i)
            {
                # get the corresponding child nodes
                @childnodes = $purp_node->nonBlankChildNodes();

                $first_kid = shift @childnodes;
                $second_kid = $first_kid->nextNonBlankSibling();
                #$third_kid = $second_kid->nextNonBlankSibling();

                $first_kid->string_value;
                $second_kid->string_value;
                #$third_kid->string_value;
            }

            print "study purpose is: $first_kid\n.$second_kid\n";
        }
    }

Do not look at the child nodes if you want the siblings. Use textContent if you want to match a node's text content.

    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML;

    my $file        = 'input.xml';
    my $purpose_str = 'purpose and rationale|study purpose|study rationale';
    my $dom         = XML::LibXML->load_xml(location => $file);

    for my $purpose_search ('/taggedpdf-doc/part/sect/h4') {
        my $purpose_nodeset = $dom->find($purpose_search);
        for my $purp_node ($purpose_nodeset->get_nodelist) {
            if ($purp_node->textContent =~ m/$purpose_str/i) {
                # All element siblings that follow the matching header.
                my @siblings = $purp_node->find('following-sibling::*')
                                         ->get_nodelist;

                # Cut the list at the first sibling that starts with "digits.".
                for my $i (0 .. $#siblings) {
                    if ($siblings[$i]->textContent =~ /^[0-9]+\./) {
                        splice @siblings, $i;
                        last;
                    }
                }

                print $_->textContent, "\n" for @siblings;
            }
        }
    }
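The expected output in the question shows the header and the elements with their tags, while the code above prints only the text of the siblings. As a rough sketch of a variant, the loop can instead walk forward with nextNonBlankSibling and stop at the first node whose text starts with digits followed by a '.', printing each node with toString. The helper name section_nodes and the inline sample string are assumptions made for illustration; they are not part of the original question or answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Sample document copied from the question (lowercase tag names as shown there).
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8" ?>
<taggedpdf-doc>
  <part>
    <sect>
      <h4>2.1 study purpose </h4>
      <p>this study purpose content</p>
      <p>content 1</p>
      <p>content 2</p>
      <p>content 3 </p>
      <p>content 4</p>
      <p>3. header</p>
      <p>obj content 4</p>
      <p>obj content 2</p>
    </sect>
  </part>
</taggedpdf-doc>
XML

# Hypothetical helper: returns the header node followed by its non-blank
# siblings, stopping before the first sibling whose text matches $stop_re.
sub section_nodes {
    my ($header, $stop_re) = @_;
    my @nodes = ($header);
    for (my $node = $header->nextNonBlankSibling;
         $node;
         $node = $node->nextNonBlankSibling) {
        last if $node->textContent =~ $stop_re;
        push @nodes, $node;
    }
    return @nodes;
}

my $dom        = XML::LibXML->load_xml(string => $xml);
my $purpose_re = qr/purpose and rationale|study purpose|study rationale/i;

for my $header ($dom->findnodes('/taggedpdf-doc/part/sect/h4')) {
    next unless $header->textContent =~ $purpose_re;
    # toString serializes each element with its tags, as in the expected output.
    print $_->toString, "\n" for section_nodes($header, qr/^[0-9]+\./);
}
```

With the sample document this should stop at the "3. header" paragraph, so only the h4 and the paragraphs up to "content 4" are printed. Walking siblings one at a time avoids building the full following-sibling list first, which may matter little here but keeps the "loop until" intent explicit.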
