[php] Extract tag from html

Status
Not open for further replies.

skinner

Hi, I need your help extracting some data from an HTML page.

I took this code from the HTML page:

PHP:
<div class="shareWrapper rounded">
					
<ul class="listCat">
<li><b>Categories:<br /></b>
<a href="/category/26/category 1/">Category 1</a>	
</li>
</ul>
					
 <ul class="listCat">
<li>
<b>Tags:<br /></b>
<a href="/tags/tag1/">tag 1</a><br /><a href="/tag/tag2/">tag 2</a><br /><a href="/tag/tag3/">tag3</a>
</li>
</ul>
</div>

I'm using this code, but it isn't finished yet:

PHP:
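$tags = [];     // collected <li> elements
$tagFinal = ''; // concatenated plain text of all <li> elements

foreach($videoContent->find("ul[class=listCat]") as $content)
{
    foreach($content->find('li') as $li)
    {
        $tags[] = $li;
    }
}

foreach($tags as $tag)
{
    // The node is converted to its HTML string; strip_tags() leaves only the text
    $tagFinal .= strip_tags($tag);
}

echo '<b>Tags2 : </b>' . $tagFinal . '<br>';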

What I need is to:

  1. Check if a category is set
  2. Make 2 variables, one for the category (if it exists) and one for the tags, each with its content

I need to return two separate values, one for the category and one for the tags, and insert them into a database later.

How can I extract them?

Thanks
 
4 comments
Which library are you using for parsing HTML? Is it Simple HTML DOM?
Also, please explain what you mean by checking whether the category is set.

Hi,

Yes, I'm using Simple HTML DOM. But if you have another, better method, no problem.

Sometimes this code doesn't exist:

PHP:
<li><b>Categories:<br /></b>
<a href="/category/26/category 1/">Category 1</a>    
</li>

and I need to check whether it is present on the HTML page. If it is not present, I need to extract only the tags.

For example:

Categories:
- Category 1

Tags:
- tag 1
- tag 2
- tag 3

But if the categories aren't on the page, I need to output only:

Tags:
- tag 1
- tag 2
- tag 3
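
The output format described above could be produced along these lines. This is only a minimal sketch: `renderGroups` is a hypothetical helper name, and it assumes the categories and tags have already been collected into arrays keyed by their heading. A group with no items is simply skipped, which covers the case where the categories are missing from the page.

```php
<?php

// Hypothetical helper: renders only the groups that actually contain items.
function renderGroups(array $groups): string
{
    $blocks = [];
    foreach ($groups as $heading => $items) {
        if (empty($items)) {
            continue; // e.g. the page has no categories
        }
        $lines = [$heading . ':'];
        foreach ($items as $item) {
            $lines[] = '- ' . $item;
        }
        $blocks[] = implode("\n", $lines);
    }
    // Separate the groups with one blank line
    return implode("\n\n", $blocks);
}

// Sample call with an empty category list (assumed input shape)
echo renderGroups([
    'Categories' => [],
    'Tags'       => ['tag 1', 'tag 2', 'tag 3'],
]), "\n";
```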
 
Try this:
PHP:
foreach($videoContent->find("ul[class=listCat]") as $content) 
{
    foreach($content->find('li') as $li)
    {
        $heading = $li->firstChild()->plaintext; //Gets the first child inside the tag with html removed
        
        echo "$heading<br />";
        foreach($li->find('a') as $a)
        {
            echo "&nbsp;&nbsp;&nbsp;-&nbsp;" . $a->plaintext . '<br />';
        }
        echo '<br />';        
    }
}

Test code:
PHP:
<?php

require_once('simple_html_dom.php');

$videoContent = str_get_html('<div class="shareWrapper rounded">
                    
<ul class="listCat">
<li><b>Categories:<br /></b>
<a href="/category/26/category 1/">Category 1</a>    
</li>
</ul>
                    
 <ul class="listCat">
<li>
<b>Tags:<br /></b>
<a href="/tags/tag1/">tag 1</a><br /><a href="/tag/tag2/">tag 2</a><br /><a href="/tag/tag3/">tag3</a>
</li>
</ul>
</div> ');

foreach($videoContent->find("ul[class=listCat]") as $content) 
{
    foreach($content->find('li') as $li)
    {
        $heading = $li->firstChild()->plaintext; //Gets the first child inside the tag with html removed
        
        echo "$heading<br />";
        foreach($li->find('a') as $a)
        {
            echo "&nbsp;&nbsp;&nbsp;-&nbsp;" . $a->plaintext . '<br />';
        }
        echo '<br />';        
    }
}

?>
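
The loop above only echoes the headings and links. To keep the categories and tags in two separate variables (for a later database insert), one possible approach is sketched here. Since another method was said to be acceptable, it assumes PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM; `extractListCatGroups` is a hypothetical helper name, not part of either library, and the array keys are derived from the `<b>` headings in the sample markup.

```php
<?php

// Hypothetical helper: groups the link texts inside every <ul class="listCat">
// by the heading found in the <b> element of each <li>.
function extractListCatGroups(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($doc);
    $groups = [];
    $query  = '//ul[contains(concat(" ", normalize-space(@class), " "), " listCat ")]/li';

    foreach ($xpath->query($query) as $li) {
        $headingNode = $xpath->query('./b', $li)->item(0);
        if ($headingNode === null) {
            continue; // no heading, so the group cannot be identified
        }
        // "Categories:" becomes "categories", "Tags:" becomes "tags"
        $key = strtolower(rtrim(trim($headingNode->textContent), ':'));

        foreach ($xpath->query('./a', $li) as $a) {
            $groups[$key][] = trim($a->textContent);
        }
    }

    return $groups;
}

$html = '<div class="shareWrapper rounded">
<ul class="listCat">
<li><b>Categories:<br /></b>
<a href="/category/26/category 1/">Category 1</a>
</li>
</ul>
<ul class="listCat">
<li>
<b>Tags:<br /></b>
<a href="/tags/tag1/">tag 1</a><br /><a href="/tag/tag2/">tag 2</a><br /><a href="/tag/tag3/">tag3</a>
</li>
</ul>
</div>';

$groups = extractListCatGroups($html);

// A missing group falls back to an empty array, so empty($categories)
// tells whether the category block was present on the page.
$categories = $groups['categories'] ?? [];
$tags       = $groups['tags'] ?? [];

echo 'Categories: ' . implode(', ', $categories) . "\n";
echo 'Tags: ' . implode(', ', $tags) . "\n";
```

If the category `<ul>` is absent, `$categories` is just an empty array and `$tags` is filled as usual, so both values can be handled separately before they go into the database.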
 