• _thebrain_@sh.itjust.works · 37 points · 13 days ago

    I wonder how effective they are. When I first heard about SSH tarpits (like endlessh) I thought it was an awesome idea. But once I started looking at analyzed log data, it turns out they range from slightly effective to not effective at all. If simple logic is enough for a dumb SSH bot programmed to find vulnerable SSH servers to avoid a tarpit, I would think it is pretty trivial for an AI crawler to do the same thing. I am interested to see analyzed data on something like this after several months on the open internet.
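
    For reference, the endlessh trick is simple: the SSH protocol lets a server send arbitrary banner lines before its version string, so the tarpit just drips random banners forever and never starts a handshake. A minimal sketch of the idea (my own illustration, not endlessh's actual code; the port and delay are invented):

        # Sketch of an endlessh-style SSH tarpit. RFC 4253 allows banner
        # lines before the server's "SSH-2.0-..." identification string,
        # so we send random banners forever and never begin the handshake.
        import asyncio
        import random

        PORT = 2222   # hypothetical listen port
        DELAY = 10    # seconds between banner lines

        async def tarpit(reader, writer):
            try:
                while True:
                    # Any line not starting with "SSH-" counts as a banner.
                    writer.write(b"%x\r\n" % random.randrange(2**32))
                    await writer.drain()
                    await asyncio.sleep(DELAY)
            except (ConnectionResetError, BrokenPipeError):
                pass  # client gave up, which is the point
            finally:
                writer.close()

        async def main():
            server = await asyncio.start_server(tarpit, "0.0.0.0", PORT)
            async with server:
                await server.serve_forever()

        if __name__ == "__main__":
            asyncio.run(main())

    Which also shows why it is easy to dodge: a bot that sets a short read timeout, or that drops the connection on the first line that doesn't start with "SSH-", never gets stuck.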

    • tempest@lemmy.ca · 21 points · 13 days ago

      The reality is that, depending on the crawling architecture, someone is watching.

      As aggressive as the LLM crawlers are, they still have limits, so a competently written one will have a budget for each host/site as well as a heuristic for the quality of results. It may dig for a bit and periodically return, but if your site is not one that is known to generate high-quality data, it may only get crawled when there isn't something better in the queue.
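
      To make that concrete, the per-host budget plus quality heuristic could look roughly like this. This is just a guess at the shape of such a scheduler, not any real crawler's code; every name and threshold is invented:

          # Hypothetical per-host crawl state: a page budget plus a rolling
          # quality estimate that controls when (and how hard) to come back.
          import time

          class HostState:
              def __init__(self, host):
                  self.host = host
                  self.budget = 100      # pages to fetch this visit
                  self.quality = 0.5     # rolling 0..1 estimate of content value
                  self.next_visit = 0.0  # unix time we're allowed back

              def record_page(self, page_quality):
                  # Exponential moving average: recent pages dominate the estimate.
                  self.quality = 0.9 * self.quality + 0.1 * page_quality
                  self.budget -= 1

              def finish_visit(self):
                  # Low-quality hosts wait longer and get a smaller budget next
                  # time, so a maze of nonsense pages quickly sinks to the back
                  # of the queue.
                  self.next_visit = time.time() + 3600 / max(self.quality, 0.05)
                  self.budget = max(10, int(100 * self.quality))

      On that model a tarpit doesn't really trap the crawler; it just teaches it to deprioritize your domain.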

    • Saprophyte@lemmy.world · 7 points · edited · 12 days ago

      Super effective. I tried both of these on a couple of domains I have, and the ratio of how many hits they get to how long crawlers stay in them is insane. I use the AI robots.txt file, and if a crawler ignores it, it will spend hours scraping randomized nonsense text from unlimited internal links. I'm sure the large legit AI companies have protection, but I get a lot of traffic from Africa and Asia in particular. Not sure if that's the true source or a VPN, but I just look at the geos and tend not to dig deep.
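
      The "unlimited internal links" part is easy to sketch. This is the general technique, not any specific tool; the word list and port are made up. Seeding the RNG with the request path makes every URL deterministic, so the maze looks like a stable site while being effectively infinite:

          # Hypothetical infinite-maze tarpit: every path returns nonsense
          # text plus links to more nonsense paths, deterministically.
          import hashlib
          import random
          from http.server import BaseHTTPRequestHandler, HTTPServer

          WORDS = ["ablative", "bromide", "cusp", "dynamo", "ephemera",
                   "fulcrum", "gossamer", "hinterland", "isthmus", "jetsam"]

          def page_for(path):
              # Same path -> same seed -> same page and same outgoing links.
              rng = random.Random(hashlib.sha256(path.encode()).digest())
              text = " ".join(rng.choices(WORDS, k=200))
              links = " ".join(
                  '<a href="/%s/%s">%s</a>'
                  % (rng.choice(WORDS), rng.choice(WORDS), rng.choice(WORDS))
                  for _ in range(10)
              )
              return "<html><body><p>%s</p><p>%s</p></body></html>" % (text, links)

          class Tarpit(BaseHTTPRequestHandler):
              def do_GET(self):
                  body = page_for(self.path).encode()
                  self.send_response(200)
                  self.send_header("Content-Type", "text/html")
                  self.send_header("Content-Length", str(len(body)))
                  self.end_headers()
                  self.wfile.write(body)

          if __name__ == "__main__":
              HTTPServer(("0.0.0.0", 8080), Tarpit).serve_forever()

      Adding a few seconds of delay per response is what turns it from a maze into a proper tarpit.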