1. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    Small heads up. I'm currently working with a client who needs SharpLeech to work on some specific forums. Because of this, and because the GPL v2 license requires me to make any changes publicly available, there will be a 2.0.1 release which will at the very least add IPB 3.4.x support and improved vBulletin 4.x.x support. Do note that this doesn't mean the default plugins will be updated. I *might* build plugins for specific sites (for a fee, that is) for those who need them after I'm done with the current work.

    The release will be posted at https://github.com/Hyperz/SharpLeech/releases when it's done.
     
  2. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
  3. ausafali

    ausafali New Member Member

    Jun 6, 2012
    902
    thank you
     
  4. scylla

    scylla New Member

    Feb 8, 2011
    7
    The GitHub link is dead. Does anyone still have a compiled copy of the program, not just the source code?
     
  5. Gavo

    Gavo Super Moderator Staff Member

    Jul 9, 2009
    3,168
  6. scylla

    scylla New Member

    Feb 8, 2011
    7
    Last edited: Jan 30, 2018
  7. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    So this thing is ancient. I'm surprised you even got it to post stuff at this point. Back in the day it could scrape from most supported forum software without requiring custom scraping code (with some exceptions). However, that code is 7 years out of date by now and, looking back, the codebase is complete garbage (which is why I took the code down a long time ago, that and GitHub not being fond of piracy-related projects).

    To get it to scrape the correct topic title you'd have to provide a custom scraping implementation in the plugin XML files. If you're scraping from a phpBB 2 forum, the easiest way to do that is to copy/paste the code for the default phpBB 2 implementation and edit it, and/or look at existing plugins.
     
  8. scylla

    scylla New Member

    Feb 8, 2011
    7
    Sure, it's old, but it gets the job done. Compiling it was easier than I expected for someone who has barely used Visual Studio.

    I opened up an XML file & "saved as" to avoid overwriting it. I think the one I copied was warezbb. I'm not sure which file in the sitereaders subfolder is the default phpBB 2 implementation. I'm trying to copy over the Cheat Engine tutorial threads, since they recently deleted their tables forum & are looking for third parties to host the content.

    Here's the current cheatengine.xml that I have.

    Code:
    <?xml version="1.0" encoding="utf-8" ?>
    
    <!-- SharpLeech 2.x.x SiteReader Plugin -->
    
    <!-- Version MUST be in x.x.x.x format! -->
    <SiteReader pluginVersion="2.0.0.0" pluginAuthor="Hyperz">
        <Settings>
            <SiteName>Cheatengine</SiteName>
            <BaseUrl>http://forum.cheatengine.org</BaseUrl>
            <TopicsPerPage>45</TopicsPerPage>
          
            <!-- Supported type values are: IP.Board 3.1.4+, IP.Board 3.x.x, IP.Board 2.x.x,
                 vBulletin 4.x.x, vBulletin 3.x.x, phpBB 3.x.x, phpBB 2.x.x -->
            <Type>phpBB 2.x.x</Type>
          
            <!-- If unsure choose ISO-8859-1. Except for phpBB 3 boards, they use UTF-8 by default. -->
            <DefaultEncoding>ISO-8859-1</DefaultEncoding>
          
            <!-- Set to true if the site uses SEO urls, otherwise false. -->
            <AllowRedirects>false</AllowRedirects>
            <UseFriendlyLinks>false</UseFriendlyLinks>
        </Settings>
    
        <Sections>
            <Section title="Cheat Engine Tutorials" id="7" />
          
          
            <!-- If you have an account with VIP access you can un-comment this (:
            <Section title="VIP / Donators Only" id="24" />
            -->
        </Sections>
    
        <!-- Edit this when the site requires custom parsing -->
        <Code>
            <![CDATA[
          
            protected override void Init()
            {
                base.Init();
            }
    
            public override void LoginUser(string username, string password)
            {
                base.LoginUser(username, password);
            }
    
            public override void LogoutUser()
            {
                base.LogoutUser();
            }
    
            public override string[] GetTopicUrls(string html)
            {
                return base.GetTopicUrls(html);
            }
    
            public override SiteTopic GetTopic(string url)
            {
                return base.GetTopic(url);
            }
    
            public override SiteTopic GetTopic(int topicId)
            {
                return base.GetTopic(topicId);
            }
          
            public override HttpWebRequest GetPage(int sectionId, int page, int siteTopicsPerPage)
            {
                return base.GetPage(sectionId, page, siteTopicsPerPage);
            }
    
            public override void MakeReady(int sectionId)
            {
                base.MakeReady(sectionId);
            }
          
            ]]>
        </Code>
    </SiteReader>
     
  9. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    You'll need to edit this part:
    PHP:
            public override SiteTopic GetTopic(string url)
            {
                return base.GetTopic(url);
            }
    This is the default code for it, taken from the DefaultSiteTypes.cs file:
    PHP:
            public override SiteTopic GetTopic(string url)
            {
                if (!this.User.IsLoggedIn) return null;

                HtmlDocument doc = new HtmlDocument();
                HttpWebRequest req;
                HttpResult result;

                req = Http.Prepare(url);
                req.Method = "GET";
                req.Referer = url;

                try
                {
                    result = this.AllowRedirects ? Http.HandleRedirects(Http.Request(req), false) : Http.Request(req);
                    doc.LoadHtml(result.Data);

                    ErrorLog.LogException(result.Error);

                    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@alt='Reply with quote']");
                    string link = HttpUtility.HtmlDecode(nodes[0].ParentNode.GetAttributeValue("href", String.Empty));

                    nodes = doc.DocumentNode.SelectNodes("//*[@class='maintitle']");
                    string title = HttpUtility.HtmlDecode(nodes[0].InnerText).Trim();

                    req = Http.Prepare((link.StartsWith("http:")) ? link : this.BaseUrl + "/" + link);
                    req.Method = "GET";
                    req.Referer = url;

                    result = this.AllowRedirects ? Http.HandleRedirects(Http.Request(req), false) : Http.Request(req);
                    doc.LoadHtml(result.Data);

                    ErrorLog.LogException(result.Error);

                    string content = doc.DocumentNode.SelectNodes("//textarea[@name='message']")[0].InnerText;

                    content = HttpUtility.HtmlDecode(content.Substring(content.IndexOf(']') + 1)).Trim();
                    content = content.Substring(0, content.Length - "[/quote]".Length);

                    // Empty read topics cookie
                    var cookies = from Cookie c in Http.SessionCookies
                                  where c.Name.EndsWith("_t")
                                  select c;

                    foreach (Cookie c in cookies) c.Value = String.Empty;

                    return new SiteTopic(
                        title.Trim(),
                        content.Trim(),
                        0, 0, url
                    );
                }
                catch (Exception error)
                {
                    ErrorLog.LogException(error);
                    return null;
                }
            }
    Assuming everything else works, you'd just need to edit the XPath in:
    PHP:
    nodes = doc.DocumentNode.SelectNodes("//*[@class='maintitle']");
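    If you're unsure what an XPath will actually match, you can test it outside SharpLeech first. The sketch below is just an illustration (it isn't part of SharpLeech; it only assumes the HtmlAgilityPack package that the reader code above already uses, and the topic URL is a placeholder):
    Code:
    // Standalone XPath check: fetch a topic page and list everything the
    // selector matches, so you can see whether the first hit is the topic title.
    // Requires the HtmlAgilityPack NuGet package.
    using System;
    using HtmlAgilityPack;

    class XPathCheck
    {
        static void Main()
        {
            var web = new HtmlWeb();

            // Placeholder URL - point this at a real topic on the target forum.
            var doc = web.Load("http://forum.cheatengine.org/viewtopic.php?t=0");

            var nodes = doc.DocumentNode.SelectNodes("//*[@class='maintitle']");
            if (nodes == null)
            {
                Console.WriteLine("XPath matched nothing - adjust the selector.");
                return;
            }

            // Print every match so you can see which element comes back first.
            foreach (var node in nodes)
                Console.WriteLine(node.Name + ": " + node.InnerText.Trim());
        }
    }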
     
  10. scylla

    scylla New Member

    Feb 8, 2011
    7
    Thanks. I see what you mean now, I'll give it a shot tomorrow.
     
  11. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    Had a look at the cheatengine site. It seems they used the same CSS class for the site title. This should get the topic title instead:
    PHP:
     nodes = doc.DocumentNode.SelectNodes("//a[@class='maintitle']");
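    To see why the node test matters here, a tiny self-contained example (the markup below is simplified stand-in HTML for illustration, not the actual cheatengine.org page) shows how //*[@class='maintitle'] picks up the site title first while //a[@class='maintitle'] goes straight to the topic link:
    Code:
    // Demonstrates the difference between the two selectors on phpBB 2-style markup.
    // Requires the HtmlAgilityPack NuGet package; the HTML below is illustrative only.
    using System;
    using HtmlAgilityPack;

    class XPathNodeTestDemo
    {
        static void Main()
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(@"
                <span class=""maintitle"">Cheat Engine Forum</span>
                <a class=""maintitle"" href=""viewtopic.php?t=1"">Tutorial: Step 1</a>");

            // '*' matches any element, so the site title span comes back first.
            var any = doc.DocumentNode.SelectNodes("//*[@class='maintitle']");
            Console.WriteLine(any[0].InnerText.Trim());     // Cheat Engine Forum

            // Restricting the node test to <a> skips straight to the topic link.
            var anchors = doc.DocumentNode.SelectNodes("//a[@class='maintitle']");
            Console.WriteLine(anchors[0].InnerText.Trim()); // Tutorial: Step 1
        }
    }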
     
  12. scylla

    scylla New Member

    Feb 8, 2011
    7
    Thanks for that Hyperz, that did resolve the issue with cheatengine forums.
     
  13. scylla

    scylla New Member

    Feb 8, 2011
    7
    I was browsing through some posts about how you wanted to make SL open source so other users could contribute. Does SharpLeech have a Discord yet?
     
  14. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    Nope. The last time I worked on it was 7 years or so ago. I released the source mostly because I wasn't doing anything with it anymore. In other words, this is pretty much abandoned software. If you want to do something with it or fork it, feel free to do so. All I ask is that the credits for the original work remain. That said, I wouldn't recommend basing anything on this code. Some of it dates back to 2008, when I initially got into programming. It doesn't follow any design patterns, and a lot of it doesn't make sense given the features of C# 7 and .NET versions newer than 4.5. For example:

    • The HTTP code doesn't use modern async and relies on the deprecated HttpWebRequest/Response classes (use HttpClient and maybe something like Flurl.Http instead).
    • The plugin system should be built on something like MEF.
    • The GUI should use a proper design pattern like MVVM.
    • The IRC and media player bloat shouldn't be in there at all.
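    For what it's worth, here's a rough sketch of what the fetch-and-parse step in GetTopic could look like on the modern stack. This is my own illustration, not SharpLeech code; FetchTitleAsync is a made-up helper name and the URL is a placeholder, while HttpClient and HtmlAgilityPack are the real APIs involved:
    Code:
    // Modernized fetch: HttpClient + async/await instead of HttpWebRequest/HttpResult.
    // Requires the HtmlAgilityPack NuGet package and C# 7.1+ for async Main.
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
    using HtmlAgilityPack;

    class ModernFetchSketch
    {
        // One shared HttpClient instance instead of building a new request object per call.
        private static readonly HttpClient http = new HttpClient();

        static async Task Main()
        {
            // Placeholder URL - substitute a real topic URL.
            string title = await FetchTitleAsync("http://forum.cheatengine.org/viewtopic.php?t=0");
            Console.WriteLine(title ?? "no title found");
        }

        // Hypothetical helper: download a page and pull the topic title with the same
        // XPath discussed above, without blocking a thread on the request.
        static async Task<string> FetchTitleAsync(string url)
        {
            string html = await http.GetStringAsync(url);

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            var node = doc.DocumentNode.SelectSingleNode("//a[@class='maintitle']");
            return node?.InnerText.Trim();
        }
    }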
     
  15. scylla

    scylla New Member

    Feb 8, 2011
    7
    Unfortunately, I'm not a coder, so I wouldn't be able to add anything even if I wanted to, at least not easily lol. I barely figured out how to compile it. I'm used to dealing with modifications to vBulletin & some htaccess when I need to (linked to my vb.org profile), which is quite different from dealing with C#. Realistically, I'd have to ask another coder, or put the file & source out there and hope some other coder makes the changes I need while still keeping SL free.

    If you're up for it, I would pay to have the DefaultSiteTypes.cs file updated to provide proper leeching from:

    MyBB 1.8 & 2.0
    vBulletin 4 (still has issues when trying to leech, from what I can tell, at least from this site: http://www.psvitaiso.com/)
    vBulletin 5 (it sucks, but surprisingly some communities have adopted it despite advice not to)
    SMF (Simple Machines Forum) versions 1 & 2
    ProBoards (big free forum software)
    XenForo
    and example XML templates for each.

    I understand if you don't want to, but I figured I'd ask and explain my situation/ideas anyway.
     
    Last edited: Jan 31, 2018
  16. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,195
    No harm in asking, but I'll have to pass on that. You can probably find a freelancer who wants to do it, but chances are it's never going to be worth the price unless they charge crazy low rates and/or only make it work with the default HTML/CSS of the forum software.

    For every individual forum software:

    • A nulled copy has to be found if it's not free software, and a local installation has to be set up.
    • A scraping and posting implementation has to be written.
    • A bunch of existing forums using that software have to be found so that a common pattern can be identified and the scraping implementation adjusted to work with most of them.
    • The code has to be tested against those existing forums AND against a few forums of every other supported forum type.

    None of that is hard or requires a lot of code, but it is a very time-consuming and annoying process if you want to do it right. And at the end of the day there will still be plenty of sites that require a custom scraping implementation because their HTML/CSS structure deviates from the default. Not to mention cross-forum incompatibilities (such as BBCode differences).
     
