1. AnonSharer

    AnonSharer Member

    Mar 7, 2018
    14
    What are the filehosts that allow remote upload?
    For scraping sites, is Selenium good, or are there other modules?
     
  2. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,258
    Most popular filehosts support remote upload and/or "cloning". You can dig through https://www.wjunction.com/forums/file-hosts-official-support.95/ to find ones that fit your requirements.

    As for Selenium, no. Never use Selenium for scraping unless whatever you are trying to do can't be done without a browser, and 99% of the time scraping can be done without one. The main use case for Selenium is stuff like automating tests for website frontends. For scraping you really don't want to depend on a browser, with 1,000,000+ lines of code worth of overhead and attack surface, just to extract some simple data out of text/HTML.

    As is usually the case with Python (you forgot to mention the language), there are multiple options. My suggestions would be:
    - requests (easy to use library for doing HTTP requests)
    - lxml (very fast and lightweight XML and HTML parser written in C, with XPATH support)
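    To show what that combo looks like in practice, here is a minimal sketch. The HTML snippet and the `download` class are made up for illustration; in a real script the `requests` call shown in the comment would supply the page instead of the literal string.

    ```python
    from lxml import html

    # In a real scraper you'd fetch the page with requests, e.g.:
    #   import requests
    #   page = requests.get("https://example.com/files", timeout=10).content
    # Here a literal snippet (hypothetical markup) stands in for the fetched page.
    page = b"""
    <html><body>
      <a class="download" href="/file/1.zip">File 1</a>
      <a class="download" href="/file/2.zip">File 2</a>
      <a class="other" href="/about">About</a>
    </body></html>
    """

    tree = html.fromstring(page)
    # XPath pulls the href of every "download" link, no browser involved
    links = tree.xpath('//a[@class="download"]/@href')
    print(links)  # ['/file/1.zip', '/file/2.zip']
    ```

    That is the whole pipeline: one HTTP request, one parse, one XPath query.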

    Those two are really all you need. In some rare cases you might need JavaScript support. I've personally never needed it, because oftentimes you can parse out whatever you need with substring operations or regular expressions on the relevant piece of JS. That saves a lot of overhead, and running a JS engine introduces a rather large attack surface. But should you need it anyway, there are Python bindings for Google's V8 engine.
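    The regex-on-JS trick above can be sketched like this. The inline `<script>` content and the `file_id` key are invented for the example; the point is that a plain `re.search` extracts the value without executing any JavaScript.

    ```python
    import re

    # Hypothetical page source with the data embedded in inline JS --
    # the kind of case where a regex beats spinning up a JS engine.
    page = """
    <script>
      var config = { "file_id": "abc123", "token": "9f8e7d" };
      initPlayer(config);
    </script>
    """

    # A regular expression pulls the value straight out of the script text
    match = re.search(r'"file_id":\s*"([^"]+)"', page)
    file_id = match.group(1) if match else None
    print(file_id)  # abc123
    ```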
     