1. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    Small heads up. I'm currently working with a client who needs SharpLeech to work on some specific forums. Because of this, and because the GPL v2 license requires me to make any changes publicly available, there will be a 2.0.1 release which will at the very least add IPB 3.4.x support and improved vBulletin 4.x.x support. Do note that this doesn't mean the default plugins will be updated. I *might* build plugins for specific sites (for a fee, that is) for those who need them after I'm done with the current work.

    The release will be posted at https://github.com/Hyperz/SharpLeech/releases when it's done.
     
  2. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
  3. ausafali

    ausafali New Member Member

    Jun 6, 2012
    902
    thank you
     
  4. scylla

    scylla New Member

    Feb 8, 2011
    7
    The GitHub link is dead. Does anyone still have a compiled copy of the program, not just the source code?
     
  5. Gavo

    Gavo Super Moderator Staff Member

    Jul 9, 2009
    3,168
  6. scylla

    scylla New Member

    Feb 8, 2011
    7
    Last edited: Jan 30, 2018
  7. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    So this thing is ancient. I'm surprised you even got it to post stuff at this point. Back in the day it could scrape from most supported forum software without requiring custom scraping code (with some exceptions). However, that code is 7 years out of date by now and, looking back, the codebase is complete garbage (which is why I took the code down a long time ago, that and GitHub not being fond of piracy-related projects).

    To get it to scrape the correct topic title you'd have to provide a custom scraping implementation in the plugin XML files. If you're scraping from a phpBB 2 forum, the easiest way to do that is to copy/paste the code for the default phpBB 2 implementation and edit it, and/or look at existing plugins.
     
  8. scylla

    scylla New Member

    Feb 8, 2011
    7
    Sure, it's old, but it gets the job done. Compiling it was easier than I expected for someone who has barely used Visual Studio.

    I opened up an XML file & "saved as" to avoid overwriting it. I think the one I copied was warezbb. I'm not sure which file in the sitereaders subfolder is the default phpBB 2 implementation. I'm trying to copy over the Cheat Engine tutorial threads, since they recently deleted their tables forum & are looking for third parties to host the content.

    Here's the current cheatengine.xml that I have.

    Code:
    <?xml version="1.0" encoding="utf-8" ?>
    
    <!-- SharpLeech 2.x.x SiteReader Plugin -->
    
    <!-- Version MUST be in x.x.x.x format! -->
    <SiteReader pluginVersion="2.0.0.0" pluginAuthor="Hyperz">
        <Settings>
            <SiteName>Cheatengine</SiteName>
            <BaseUrl>http://forum.cheatengine.org</BaseUrl>
            <TopicsPerPage>45</TopicsPerPage>
          
            <!-- Supported type values are: IP.Board 3.1.4+, IP.Board 3.x.x, IP.Board 2.x.x,
                 vBulletin 4.x.x, vBulletin 3.x.x, phpBB 3.x.x, phpBB 2.x.x -->
            <Type>phpBB 2.x.x</Type>
          
            <!-- If unsure choose ISO-8859-1. Except for phpBB 3 boards, they use UTF-8 by default. -->
            <DefaultEncoding>ISO-8859-1</DefaultEncoding>
          
            <!-- Set to true if the site uses SEO urls, otherwise false. -->
            <AllowRedirects>false</AllowRedirects>
            <UseFriendlyLinks>false</UseFriendlyLinks>
        </Settings>
    
        <Sections>
            <Section title="Cheat Engine Tutorials" id="7" />
          
          
            <!-- If you have an account with VIP access you can un-comment this (:
            <Section title="VIP / Donators Only" id="24" />
            -->
        </Sections>
    
        <!-- Edit this when the site requires custom parsing -->
        <Code>
            <![CDATA[
          
            protected override void Init()
            {
                base.Init();
            }
    
            public override void LoginUser(string username, string password)
            {
                base.LoginUser(username, password);
            }
    
            public override void LogoutUser()
            {
                base.LogoutUser();
            }
    
            public override string[] GetTopicUrls(string html)
            {
                return base.GetTopicUrls(html);
            }
    
            public override SiteTopic GetTopic(string url)
            {
                return base.GetTopic(url);
            }
    
            public override SiteTopic GetTopic(int topicId)
            {
                return base.GetTopic(topicId);
            }
          
            public override HttpWebRequest GetPage(int sectionId, int page, int siteTopicsPerPage)
            {
                return base.GetPage(sectionId, page, siteTopicsPerPage);
            }
    
            public override void MakeReady(int sectionId)
            {
                base.MakeReady(sectionId);
            }
          
            ]]>
        </Code>
    </SiteReader>
     
  9. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    You'll need to edit this part:
    PHP:
            public override SiteTopic GetTopic(string url)
            {
                return base.GetTopic(url);
            }
    This is the default code for it, taken from the DefaultSiteTypes.cs file:
    PHP:
            public override SiteTopic GetTopic(string url)
            {
                if (!this.User.IsLoggedIn) return null;

                HtmlDocument doc = new HtmlDocument();
                HttpWebRequest req;
                HttpResult result;

                req = Http.Prepare(url);
                req.Method = "GET";
                req.Referer = url;

                try
                {
                    result = this.AllowRedirects ? Http.HandleRedirects(Http.Request(req), false) : Http.Request(req);
                    doc.LoadHtml(result.Data);

                    ErrorLog.LogException(result.Error);

                    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@alt='Reply with quote']");
                    string link = HttpUtility.HtmlDecode(nodes[0].ParentNode.GetAttributeValue("href", String.Empty));

                    nodes = doc.DocumentNode.SelectNodes("//*[@class='maintitle']");
                    string title = HttpUtility.HtmlDecode(nodes[0].InnerText).Trim();

                    req = Http.Prepare((link.StartsWith("http:")) ? link : this.BaseUrl + "/" + link);
                    req.Method = "GET";
                    req.Referer = url;

                    result = this.AllowRedirects ? Http.HandleRedirects(Http.Request(req), false) : Http.Request(req);
                    doc.LoadHtml(result.Data);

                    ErrorLog.LogException(result.Error);

                    string content = doc.DocumentNode.SelectNodes("//textarea[@name='message']")[0].InnerText;

                    content = HttpUtility.HtmlDecode(content.Substring(content.IndexOf(']') + 1)).Trim();
                    content = content.Substring(0, content.Length - "[/quote]".Length);

                    // Empty read topics cookie
                    var cookies = from Cookie c in Http.SessionCookies
                                  where c.Name.EndsWith("_t")
                                  select c;

                    foreach (Cookie c in cookies) c.Value = String.Empty;

                    return new SiteTopic(
                        title.Trim(),
                        content.Trim(),
                        0, 0, url
                    );
                }
                catch (Exception error)
                {
                    ErrorLog.LogException(error);
                    return null;
                }
            }
    Assuming everything else works, you'd just need to edit the XPath in:
    PHP:
    nodes = doc.DocumentNode.SelectNodes("//*[@class='maintitle']");
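    If you're unsure what an XPath will actually match, you can test it outside SharpLeech first. The sketch below is just an illustration (it isn't part of SharpLeech; it only assumes the HtmlAgilityPack package that the reader code above already uses, and the topic URL is a placeholder):
    Code:
    // Standalone XPath check: fetch a topic page and list everything the
    // selector matches, so you can see whether the first hit is the topic title.
    // Requires the HtmlAgilityPack NuGet package.
    using System;
    using HtmlAgilityPack;

    class XPathCheck
    {
        static void Main()
        {
            var web = new HtmlWeb();

            // Placeholder URL - point this at a real topic on the target forum.
            var doc = web.Load("http://forum.cheatengine.org/viewtopic.php?t=0");

            var nodes = doc.DocumentNode.SelectNodes("//*[@class='maintitle']");
            if (nodes == null)
            {
                Console.WriteLine("XPath matched nothing - adjust the selector.");
                return;
            }

            // Print every match so you can see which element comes back first.
            foreach (var node in nodes)
                Console.WriteLine(node.Name + ": " + node.InnerText.Trim());
        }
    }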
     
  10. scylla

    scylla New Member

    Feb 8, 2011
    7
    Thanks. I see what you mean now, I'll give it a shot tomorrow.
     
  11. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    Had a look at the cheatengine site. It seems they used the same CSS class for the site title. This should get the topic title instead:
    PHP:
     nodes = doc.DocumentNode.SelectNodes("//a[@class='maintitle']");
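    To see why the node test matters here, a tiny self-contained example (the markup below is simplified stand-in HTML for illustration, not the actual cheatengine.org page) shows how //*[@class='maintitle'] picks up the site title first while //a[@class='maintitle'] goes straight to the topic link:
    Code:
    // Demonstrates the difference between the two selectors on phpBB 2-style markup.
    // Requires the HtmlAgilityPack NuGet package; the HTML below is illustrative only.
    using System;
    using HtmlAgilityPack;

    class XPathNodeTestDemo
    {
        static void Main()
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(@"
                <span class=""maintitle"">Cheat Engine Forum</span>
                <a class=""maintitle"" href=""viewtopic.php?t=1"">Tutorial: Step 1</a>");

            // '*' matches any element, so the site title span comes back first.
            var any = doc.DocumentNode.SelectNodes("//*[@class='maintitle']");
            Console.WriteLine(any[0].InnerText.Trim());     // Cheat Engine Forum

            // Restricting the node test to <a> skips straight to the topic link.
            var anchors = doc.DocumentNode.SelectNodes("//a[@class='maintitle']");
            Console.WriteLine(anchors[0].InnerText.Trim()); // Tutorial: Step 1
        }
    }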
     
  12. scylla

    scylla New Member

    Feb 8, 2011
    7
    Thanks for that Hyperz, that did resolve the issue with cheatengine forums.
     
  13. scylla

    scylla New Member

    Feb 8, 2011
    7
    I was browsing through some posts about how you wanted to make SL open source so other users could contribute. Does SharpLeech have a Discord yet?
     
  14. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    Nope. The last time I worked on it was 7 years or so ago. I released the source mostly because I wasn't doing anything with it anymore. In other words, this is pretty much abandoned software. If you want to do something with it or fork it, feel free to do so. All I ask is that the credits for the original work remain. That said, I wouldn't recommend basing anything on this code. Some of it dates back to 2008, when I initially got into programming. It doesn't follow any design patterns, and a lot of it doesn't make sense given the features of C# 7 and .NET versions newer than 4.5. For example:

    • The HTTP code doesn't use modern async and relies on the deprecated HttpWebRequest/Response classes (use HttpClient and maybe something like Flurl.Http instead).
    • The plugin system should be built on something like MEF.
    • The GUI should use a proper design pattern like MVVM.
    • The IRC and media player bloat shouldn't be in there at all.
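    For what it's worth, here's a rough sketch of what the fetch-and-parse step in GetTopic could look like on the modern stack. This is my own illustration, not SharpLeech code; FetchTitleAsync is a made-up helper name and the URL is a placeholder, while HttpClient and HtmlAgilityPack are the real APIs involved:
    Code:
    // Modernized fetch: HttpClient + async/await instead of HttpWebRequest/HttpResult.
    // Requires the HtmlAgilityPack NuGet package and C# 7.1+ for async Main.
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
    using HtmlAgilityPack;

    class ModernFetchSketch
    {
        // One shared HttpClient instance instead of building a new request object per call.
        private static readonly HttpClient http = new HttpClient();

        static async Task Main()
        {
            // Placeholder URL - substitute a real topic URL.
            string title = await FetchTitleAsync("http://forum.cheatengine.org/viewtopic.php?t=0");
            Console.WriteLine(title ?? "no title found");
        }

        // Hypothetical helper: download a page and pull the topic title with the same
        // XPath discussed above, without blocking a thread on the request.
        static async Task<string> FetchTitleAsync(string url)
        {
            string html = await http.GetStringAsync(url);

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            var node = doc.DocumentNode.SelectSingleNode("//a[@class='maintitle']");
            return node?.InnerText.Trim();
        }
    }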
     
  15. scylla

    scylla New Member

    Feb 8, 2011
    7
    Unfortunately, I'm not a coder, so I wouldn't be able to add anything even if I wanted to, at least not easily lol. I barely figured out how to compile it. I'm used to dealing with modifications to vBulletin & some htaccess when I need to (linked to my vb.org profile), which is quite different from dealing with C#. Realistically, I'd have to ask another coder, or put the file & source out there and hope some other coder makes the changes I need while still keeping SL free.

    If you're up for it, I would pay to have the DefaultSiteTypes.cs file updated to provide proper leeching from:

    MyBB 1.8 & 2.0
    vBulletin 4 (still has issues when trying to leech, from what I can tell, at least from this site: http://www.psvitaiso.com/)
    vBulletin 5 (it sucks, but surprisingly some communities have adopted it despite advice not to)
    SMF (Simple Machines Forum) versions 1 & 2
    ProBoards (big free forum software)
    XenForo
    and example XML templates for each.

    I understand if you don't want to, but I figured I'd ask and explain my situation/ideas anyway.
     
    Last edited: Jan 31, 2018
  16. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    No harm in asking, but I'll have to pass on that. You can probably find a freelancer who wants to do it, but chances are it's never going to be worth the price unless they charge crazy low rates and/or only make it work with the default HTML/CSS of the forum software.

    For every individual forum software:

    • A nulled copy has to be found if it's not free software, and a local installation has to be set up.
    • A scraping and posting implementation has to be written.
    • A bunch of existing forums using that software have to be found so that a common pattern can be identified and the scraping implementation adjusted to work with most of them.
    • The code has to be tested against those existing forums AND against a few forums of every other supported forum type.

    None of that is hard or requires a lot of code, but it is a very time-consuming and annoying process if you want to do it right. And at the end of the day there will still be plenty of sites that require a custom scraping implementation because their HTML/CSS structure deviates from the default. Not to mention cross-forum incompatibilities (such as BBCode differences).
     
