Status
Not open for further replies.

Porsche_maniak

Hi guys!
I'm not that good with PHP, as you may already know...
So I can't get the following working, and I hope you can help me with a script.

I want to read the contents of a .gz file (it holds a single .txt file) and search that text for any of '[url=http://rapidshare', '[url=http://megaupload', '[url=http://hotfile' and '.torrent'.

If there is a match for [url=http://rapidshare, show rapidshare.png.
If there is a match for [url=http://megaupload, show megaupload.png.

And so on...

Thanks !
 
20 comments
Code:
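$lines = gzfile('yourfile.gz'); // decompressed lines, or false on failure
$file = ($lines === false) ? '' : implode('', $lines); // each line keeps its own "\n"

$matches = array(
    array('rapidshare.png', '[url=http://rapidshare'),
    array('megaupload.png', '[url=http://megaupload')
);

$image = null; // stays null when no host matches
foreach($matches as $match) {
    if(strpos($file, $match[1]) !== false) {
        $image = $match[0];
        break; // stop at the first host found
    }
}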
Something like this maybe.

This is assuming your gz file is purely a gzipped text file, not a tar.gz for example.
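
The loop above stops at the first host it finds. If one file can link to several hosts and each should get its own icon, a variant that collects every match might look like the sketch below. The function name hostImages(), the hotfile.png and torrent.png file names, and the sample text are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns every icon whose marker string occurs in $text.
function hostImages($text) {
    // marker => icon; hotfile.png and torrent.png are assumed file names
    $markers = array(
        '[url=http://rapidshare' => 'rapidshare.png',
        '[url=http://megaupload' => 'megaupload.png',
        '[url=http://hotfile'    => 'hotfile.png',
        '.torrent'               => 'torrent.png',
    );

    $images = array();
    foreach ($markers as $needle => $image) {
        // No break here, so every matching host is collected
        if (strpos($text, $needle) !== false) {
            $images[] = $image;
        }
    }
    return $images;
}

// Made-up sample standing in for implode('', gzfile('yourfile.gz'))
$sample = "[url=http://rapidshare.com/files/1/a.rar]a[/url]\n"
        . "[url=http://hotfile.com/dl/2/b.rar]b[/url]\n";

foreach (hostImages($sample) as $image) {
    echo '<img src="images/' . $image . '" alt="" />' . "\n";
}
```

An empty array comes back when nothing matches, so the caller can decide whether to show a placeholder.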
 
Firstly you need to break it up.

  • Archive Opening (extraction into memory) | gzopen
  • Parsing the contents (Extract links from tags) | preg_match_all
  • Compile the output.

So I would start by making a base class to work with:
PHP:
class GzipDownloads
{
   protected $gzfile = null;
   protected $gzcontents = '';

   function __construct($file)
   {
       if(!file_exists($file))
       {
           trigger_error('Unable to open ' . $file,E_USER_ERROR); //Die here
       }
       $this->gzfile = gzopen($file,'rb'); //gzopen() needs a mode
       //filesize() is the compressed size, so read in chunks until EOF instead
       while(!gzeof($this->gzfile))
       {
           $this->gzcontents .= gzread($this->gzfile,8192);
       }
       gzclose($this->gzfile);
   }

   function getMeta()
   {
      //PREG_SET_ORDER gives one entry per [url=...] tag, with the URL in [1]
      preg_match_all('/\[url=([^\]\s]+)/i',$this->gzcontents,$matches,PREG_SET_ORDER);
      $meta = array();
      foreach($matches as $match)
      {
          if(preg_match('/^http:\/\//i',$match[1]))
          {
               $usegments = parse_url($match[1]);
               if(!empty($usegments['host']))
               {
                   $host = str_replace(array('.com','.net','.co.uk','.org'),'',$usegments['host']); //Remove tld
                   $meta[$host] = true;
               }
          }
       }
       return $meta;
   }
}

Before I can build the getMeta() method, which will hold the links and other statuses, I need to examine the contents of the text file to look for similarities.

-- Updated CODE AND Read below

PHP:
$gzd = new GzipDownloads('my.file.txt.gz');
$meta = $gzd->getMeta();
foreach(array_keys($meta) as $host) //the host names are the array keys
{
    if(file_exists('images/' . $host . '.png'))
    {
        //images/rapidshare.png exists so show it here!
        //$host will be the domain name without any tld so rapidshare links will be rapidshare, hotfile.com/../../../ will be hotfile
    }
}

The above is all example and untested.
 
He doesn't need regex; he just needs to check whether the text contains rs, mu or whatever else.
Also, gzfile() should be fine for any gzipped text file.

Porsche_maniak: the code I posted will read one gzipped text file and check which hosts it contains. This seems like what you are asking for.

If you wish to handle multiple files, simply read each one in a loop or use a class and have an object per file.

Regardless of how you read it in though, that is how you should match certain strings.
 
Updated my post, please check!

Edit:
I really don't know why people run from PCRE, it puzzles me. PHP/MySQL can handle hundreds or thousands of queries per second, yet people try so hard to get them down to like 3.
 
Again, I'm not trying to say your code is wrong, lite, but regex is useless here.

He needs to check whether there is a match; he doesn't need the matches themselves.
strpos() is much, much faster.

Also, since all he wants is to read the files in and check for string matches, such a complex object won't be needed.
 
Yeah, I know, dood, I'm just saying that with my code you only need to have an image in the folder and it will match for that host as well.

So if a new host came out called terahost.com, he just needs to add terahost.png to the dir and it's found.
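
The two ideas can be combined: take the host names from the icon files, but still test with strpos(). A rough sketch, assuming the icons are named <host>.png and that links always start with [url=http://<host> (no www. prefix). hostsFromIcons() and matchHosts() are made-up names, and the temporary icon directory only exists so the sketch runs on its own.

```php
<?php
// Hypothetical: build the host list from the icon file names in $iconDir.
function hostsFromIcons($iconDir) {
    $icons = glob($iconDir . '/*.png');
    if ($icons === false) {
        $icons = array();
    }
    $hosts = array();
    foreach ($icons as $icon) {
        $hosts[] = basename($icon, '.png'); // images/terahost.png -> terahost
    }
    return $hosts;
}

// Returns the hosts whose "[url=http://<host>" marker occurs in $text.
function matchHosts($text, array $hosts) {
    $found = array();
    foreach ($hosts as $host) {
        if (strpos($text, '[url=http://' . $host) !== false) {
            $found[] = $host;
        }
    }
    return $found;
}

// Demo with a throwaway icon directory
$iconDir = sys_get_temp_dir() . '/icons_' . uniqid();
mkdir($iconDir);
touch($iconDir . '/rapidshare.png');
touch($iconDir . '/terahost.png');

$text  = '[url=http://terahost.com/f/1]file[/url]';
$found = matchHosts($text, hostsFromIcons($iconDir));
echo implode(', ', $found), "\n";

// Clean up the demo directory
array_map('unlink', glob($iconDir . '/*.png'));
rmdir($iconDir);
```

Dropping a new icon into the directory is then enough to add a host, and no regex is involved.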
 
Yes but the point is, it is excessive and slower than it could be.

All you need is to read each gz in, check for matches with strpos and use the correct image. It is a very, very simple problem with an equally simple solution.
 
litewarez, thanks a lot for your effort, but I think JmZ is right...

@JmZ
How do I read each one in a loop?

@litewarez
Hmm sounds interesting...
 
Do you mean one gz contains many files, or many gzips contain one file each?

If they each have one file, you just loop through them or stick them in an object.

for example, off the top of my head:

PHP:
$files = array('some.gz', 'file.gz', 'made.gz', 'up.gz');

function hostImage($filename) {
    $file = gzfile($filename);
    $file = implode("\n",$file);

    $matches = array(
        array('rapidshare.png', '[url=http://rapidshare'),
        array('megaupload.png', '[url=http://megaupload')
    );

    $image = 'none.png';

    foreach($matches as $match) {
        if(strpos($file, $match[1]) !== false) {
            $image = $match[0];
            break;
        }
    }

    return $image;
}

foreach($files as $file) {
     $image = hostImage($file);
     // do other things here
}
It'd be nicer class based now I believe, because having the array of strings to match inside the function isn't ideal.

Also, this will return 'none.png' if nothing is found.
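
A class-based version along those lines could take the list of strings in its constructor, so it no longer lives inside the function. This is only a sketch: the HostMatcher name and its methods are invented here, and the temporary file is just there to make the example self-contained (it needs the zlib extension).

```php
<?php
// Hypothetical class: the marker list is passed in instead of being hard-coded.
class HostMatcher
{
    private $markers;
    private $default;

    public function __construct(array $markers, $default = 'none.png')
    {
        $this->markers = $markers; // marker string => image name
        $this->default = $default;
    }

    // Same check as hostImage(), applied to text that is already decompressed
    public function imageForText($text)
    {
        foreach ($this->markers as $needle => $image) {
            if (strpos($text, $needle) !== false) {
                return $image;
            }
        }
        return $this->default;
    }

    // Reads a gzipped text file and returns the matching image
    public function imageForFile($filename)
    {
        $lines = gzfile($filename);
        if ($lines === false) {
            return $this->default;
        }
        return $this->imageForText(implode('', $lines));
    }
}

$matcher = new HostMatcher(array(
    '[url=http://rapidshare' => 'rapidshare.png',
    '[url=http://megaupload' => 'megaupload.png',
));

// Round trip through a temporary gzipped file
$tmp = tempnam(sys_get_temp_dir(), 'gz');
file_put_contents($tmp, gzencode('[url=http://megaupload.com/?d=ABC]x[/url]'));
$fromFile = $matcher->imageForFile($tmp);
echo $fromFile, "\n";
unlink($tmp);
```

One matcher object can then be reused for every file in the loop.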
 
@JmZ

Yeah... There are around 1300 .gz files in a folder and the number keeps growing. Each .gz contains only one .txt file. The .gz files live under this path: $pathz = CONTENT_DIR.$y.'/'.$m;
Each page takes 10 .gz files from that path. When the user clicks through to the next page, it takes the next 10.

I hope I haven't confused you.
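
In case the paging is not already handled elsewhere, taking 10 files per page could be sketched with glob() and array_slice(). gzPage() is a made-up helper, and the demo directory with 25 empty files is only there so the example runs by itself.

```php
<?php
// Hypothetical helper: returns the .gz paths for one page (pages start at 1).
function gzPage($dir, $page, $perPage = 10) {
    $files = glob(rtrim($dir, '/') . '/*.gz');
    if ($files === false) {
        return array();
    }
    sort($files); // name-based order, so pages stay consistent between requests
    return array_slice($files, ($page - 1) * $perPage, $perPage);
}

// Demo: 25 empty files in a throwaway directory
$dir = sys_get_temp_dir() . '/gzpage_' . uniqid();
mkdir($dir);
for ($i = 1; $i <= 25; $i++) {
    touch(sprintf('%s/entry%03d.txt.gz', $dir, $i));
}

$page1 = gzPage($dir, 1);
$page3 = gzPage($dir, 3);
echo count($page1), ' ', count($page3), "\n";

// Clean up the demo directory
array_map('unlink', glob($dir . '/*.gz'));
rmdir($dir);
```

The last page simply comes back shorter when fewer than 10 files are left.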
 
Yeah I see.

Well you'd want to find all gz files in the dir first, then loop through like above.

e.g.

change $files at the top of the code, to:
PHP:
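$current_dir = getcwd();
chdir($pathz);
$files = glob('*.gz'); // bare file names, relative to $pathz
chdir($current_dir);
// prefix each name with $pathz.'/' before passing it to hostImage()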
 
It is content/10/05
I tried
Code:
$pathz = CONTENT_DIR.$y.'/'.$m.'/'.$glo;

 $glo=glob('*.gz');

 
 echo $pathz;

I did that to see whether I would get the .gz file names, but the output was content/10/05/ again.
 
PHP:
/*
    *Functions
*/
function hostImage($filename) {
    $file = gzfile($filename);
    $file = implode("\n",$file);

    $matches = array(
        array('rapidshare.png', '[url=http://rapidshare'),
        array('megaupload.png', '[url=http://megaupload')
    );

    $image = 'none.png';

    foreach($matches as $match) {
        if(strpos($file, $match[1]) !== false) {
            $image = $match[0];
            break;
        }
    }
    return $image;
}

//Directory stuff
$gdir = CONTENT_DIR.$y.'/'.$m.'/*.gz';

foreach(glob($gdir) as $file)
{
    $image = hostImage(CONTENT_DIR.$y.'/'.$m.'/' . $file);
    echo $image; // rapidshare.png or megaupload.png
}
 
Warning: gzfile(content/10/05/content/10/05/entry100501-210451.txt.gz) [function.gzfile]: failed to open stream: No such file or directory

The repeated content/10/05/ (the part I bolded) shouldn't be there..
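
The path is doubled because glob() returns each match with the directory part of the pattern already attached, and the loop then prepends the directory a second time. A small sketch of the behaviour, using a temporary directory in place of CONTENT_DIR.$y.'/'.$m (the directory and file name here are made up, and zlib is assumed to be available):

```php
<?php
// Throwaway directory standing in for CONTENT_DIR.$y.'/'.$m
$dir = sys_get_temp_dir() . '/content_' . uniqid();
mkdir($dir);
file_put_contents(
    $dir . '/entry.txt.gz',
    gzencode('[url=http://rapidshare.com/files/1]x[/url]')
);

$results = array();
foreach (glob($dir . '/*.gz') as $file) {
    // $file already starts with $dir, so it goes to gzfile() unchanged
    $lines = gzfile($file);
    $results[basename($file)] = ($lines !== false);
}

print_r($results);

// Clean up the demo directory
unlink($dir . '/entry.txt.gz');
rmdir($dir);
```

For the code in the post above, that would mean calling hostImage($file) instead of hostImage(CONTENT_DIR.$y.'/'.$m.'/' . $file).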
 
Wow, what a blob of too many coders. :))

First let me say I have no idea where your code came from, as I don't see CONTENT_DIR being defined anywhere.
But looking at what it is producing, try removing the CONTENT_DIR part from the call, whatever it was.
Seeing the complete code for these routines would be nice, since you are following two people's advice.
 