If this were just a matter of Yahoo Slurp somehow having access to email headers, or otherwise trawling links, then it would affect more than a subset of users with Yahoo accounts. And it’s happening on more than just Yahoo email accounts...
@pancakehollow I believe that Yahoo Mail actively encourages users to unsubscribe from mailing lists, in some cases adding buttons and pop-ups to promote that, e.g. if messages haven’t been read within a few weeks. Could it be that the subscribers in question dismissed a Yahoo prompt without understanding that this would unsubscribe them?
I had an exchange with Yahoo and Oath.com (I believe that is the company handling their crawler) and the response was less than satisfactory. Here is the response:
"The crawler is simply attempting to index URLs, it does not understand what the URL does to the backend. I believe the unsubscribe requests are being called by the crawler because the request URLs are discoverable by the crawler, which means they may have been called and cached someone at a certain point in time. If the robots.txt is correctly placed with the correct content, our crawler will no longer attempt to index those URLs. "
It’s practically impossible for the unsubscribe URLs to be discoverable; that would be a HUGE security risk. It is also odd that those who have been unsubscribed by their crawler are vocal spokespeople for a politically positioned issue.
I have checked with those being unsubscribed, and none of them ever clicked any unsubscribe links.
This is curious. The technician’s recommendation was to use a robots.txt file to disallow their crawler from the unsubscribe URLs, but that too is very odd. If Yahoo’s unsubscribes are performed by a crawler, and someone genuinely does want to unsubscribe, then the technician’s logic (that the crawler will honor the robots.txt file) runs counter to the recipient’s interest in unsubscribing.
I may be having a related problem. In the last two email sends, we have gotten a lot more unsubscribes than normal. We normally get 1-4 unsubscribes; in the last 2 we got 22 and 36. We have not reached out to the people who unsubscribed, but many were very surprising based on who they are. It seems like something is going on. Looking at the server log, I am not seeing requests for robots.txt from the IP addresses that are unsubscribing. Should I add a robots.txt file? If so, where do I get one and where do I put it?
When I do a whois on the IP addresses, the 5 that I tried all belong to Amazon Technologies Inc. As far as I know, none of our subscribers are from Amazon. All but 5 of the unsubscribe requests in the logs end with the user agent “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)”, which identifies Internet Explorer 8 on Windows XP.
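To check how widespread this is in your own logs, a small script can tally unsubscribe requests by IP address and user agent. This is a minimal sketch, assuming an Apache/nginx "combined" log format and phplist unsubscribe links that contain "p=unsubscribe"; both are assumptions, so adjust the regex and the match to what your server actually logs. The sample log lines are made up and use documentation IP ranges.

```python
import re
from collections import Counter

# Assumed Apache/nginx "combined" log format; adjust for your server.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)
# The user agent seen on most of the suspicious unsubscribes (IE 8 on Windows XP).
SUSPECT_UA = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"


def unsubscribe_hits(lines):
    """Count requests to unsubscribe URLs per client IP and per user agent."""
    by_ip, by_ua = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip lines that don't parse, and anything that isn't an unsubscribe link
        # (assumes phplist-style links containing "p=unsubscribe").
        if not m or "p=unsubscribe" not in m.group("path"):
            continue
        by_ip[m.group("ip")] += 1
        by_ua[m.group("ua")] += 1
    return by_ip, by_ua


# Made-up sample lines using documentation IP ranges.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2018:13:55:36 +0000] "GET /lists/?p=unsubscribe&uid=abc123 HTTP/1.1" 200 512 "-" "' + SUSPECT_UA + '"',
    '203.0.113.5 - - [10/Oct/2018:13:55:40 +0000] "GET /lists/?p=unsubscribe&uid=def456 HTTP/1.1" 200 512 "-" "' + SUSPECT_UA + '"',
    '198.51.100.7 - - [10/Oct/2018:14:02:11 +0000] "GET /lists/?p=unsubscribe&uid=ghi789 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '198.51.100.7 - - [10/Oct/2018:14:05:00 +0000] "GET /lists/?p=subscribe HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

by_ip, by_ua = unsubscribe_hits(SAMPLE)
print(by_ip.most_common())  # [('203.0.113.5', 2), ('198.51.100.7', 1)]
print(f"{by_ua[SUSPECT_UA]} of {sum(by_ua.values())} unsubscribes used the suspect user agent")
```

Feeding it your real log (e.g. the lines of your access log file) gives a quick picture of which addresses are doing the unsubscribing; the IPs can then be checked with whois as above.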
I doubt this is related, but I think these last 2 sends were the first after I upgraded to 3.3.4-RC2.
How should I proceed? Is there a way to make sure an unsubscribe is legitimate, or that it comes from a person, perhaps with a captcha?
Let me know if I can provide any more info to help.
I do think something is going on, on a scale we have not seen before. This is verging on newsworthy, and I don’t think it’s related to the way phplist works. My logs show very distinct behavior from Yahoo, oath.com and Inktomi bots. I have reached out to those who are being unsubscribed, and they were surprised. I even noticed in the logs people attempting to re-subscribe (you can see them clicking the subscribe button shortly after receiving a notice that they have been unsubscribed, although not all of them report receiving the unsubscribe message). You might want to read this discussion group posting:
I was told by Yahoo to put a robots.txt file in the root directory (above the /lists directory in most cases) disallowing all crawlers with an asterisk (or, I guess, you could single out Yahoo’s crawler).
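For anyone wondering what such a file looks like: robots.txt is a plain text file served from the root of the site (e.g. https://example.com/robots.txt), not from inside /lists. The sketch below uses Python's standard urllib.robotparser to show how a crawler that honors the file would interpret a robots.txt disallowing /lists/ for all user agents. The domain, the /lists/ path and the uid value are placeholders; "Slurp" is the user-agent token Yahoo's crawler uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, served from the site root: https://example.com/robots.txt
# To single out Yahoo instead of all crawlers, use "User-agent: Slurp".
ROBOTS_TXT = """\
User-agent: *
Disallow: /lists/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

unsubscribe_url = "https://example.com/lists/?p=unsubscribe&uid=abc123"
home_url = "https://example.com/index.html"

# A compliant crawler checks the rules before fetching a URL.
print(parser.can_fetch("Slurp", unsubscribe_url))  # False: /lists/ is disallowed
print(parser.can_fetch("Slurp", home_url))         # True: not covered by any rule
```

Note that robots.txt is purely advisory: it only affects crawlers that choose to read and obey it. A subscriber clicking the link in their own browser never consults it, so it does not stop a person from unsubscribing; whether a mail provider's own unsubscribe button is affected depends on how that provider implements it.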
This is even stranger, because they are admitting that once a robots.txt file disallows that directory, a user who really, really does want to unsubscribe won’t be unsubscribed. It would appear that Yahoo checks a robots.txt file first to decide whether it can unsubscribe someone, which is even weirder, or just totally incompetent.
Could someone from phplist development chime in and advise whether there is any way at all for a crawler to get the unique user ID by crawling phplist? I know that most emails sent out and passing through any inbound email system will have the unsubscribe URL embedded in the headers. Is it possible that is where these companies are grabbing the URL from?
All that is happening is robots are “following” links in emails. That might happen automatically, say when Yahoo receives an email for one of its members, or anti-malware software checks a received email. There is nothing special about the unsubscribe link.
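To illustrate why no crawling of phplist is needed: the unsubscribe URL typically travels inside the message itself, for example in a List-Unsubscribe header (RFC 2369), so any system that processes the email can read and follow it. A minimal sketch using Python's standard email module; the message, addresses and URL are made up, and it assumes the list software adds such a header.

```python
import re
from email import message_from_string

# Made-up message; assumes the list software adds an RFC 2369 List-Unsubscribe header.
RAW = """\
From: Newsletter <news@example.com>
To: subscriber@example.org
Subject: Our latest news
List-Unsubscribe: <https://example.com/lists/?p=unsubscribe&uid=abc123>, <mailto:unsubscribe@example.com>

Hello!
"""


def unsubscribe_targets(raw_message: str) -> list[str]:
    """Return the targets listed in the List-Unsubscribe header, if any."""
    msg = message_from_string(raw_message)
    header = msg.get("List-Unsubscribe", "")
    # Each target is enclosed in angle brackets; targets are comma-separated.
    return re.findall(r"<([^>]+)>", header)


print(unsubscribe_targets(RAW))
# ['https://example.com/lists/?p=unsubscribe&uid=abc123', 'mailto:unsubscribe@example.com']
```

The same personal URL usually appears in the message body as well, so a scanner following every link in the body reaches it just as easily.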
I guess that you have enabled one-click unsubscribe, which means the automatic link click really does unsubscribe. You can change that to require the real subscriber to confirm.
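The difference between the two settings can be sketched as a toy handler. This is not phplist's code; handle_unsubscribe, Subscribers and the one_click flag are hypothetical names used only to show the idea: with one-click on, any request to the link (including a robot's automatic GET) unsubscribes; with it off, a GET only shows a confirmation page, and the subscriber is removed when they submit the form (a POST).

```python
from dataclasses import dataclass, field


@dataclass
class Subscribers:
    """Hypothetical in-memory stand-in for the subscriber table."""
    active: set = field(default_factory=set)


def handle_unsubscribe(store: Subscribers, method: str, uid: str, one_click: bool) -> str:
    """Hypothetical handler for a request to a personal unsubscribe link."""
    if uid not in store.active:
        return "not subscribed"
    # One-click: any request unsubscribes, including a robot's automatic GET.
    # Confirm mode: a GET only shows the form; the form submission (POST) unsubscribes.
    if one_click or method == "POST":
        store.active.discard(uid)
        return "unsubscribed"
    return "confirmation page"


# A link-following robot issues a GET against two lists, one per setting.
one_click_list = Subscribers({"abc123"})
confirm_list = Subscribers({"abc123"})
print(handle_unsubscribe(one_click_list, "GET", "abc123", one_click=True))   # unsubscribed
print(handle_unsubscribe(confirm_list, "GET", "abc123", one_click=False))    # confirmation page
print("abc123" in confirm_list.active)                                       # True
# The real subscriber submits the confirmation form.
print(handle_unsubscribe(confirm_list, "POST", "abc123", one_click=False))   # unsubscribed
```

Link-scanning robots typically fetch URLs with GET and don't submit forms, which is why requiring confirmation stops most of these rogue unsubscribes. The same reasoning is behind RFC 8058, which defines one-click unsubscribe as an HTTP POST (signalled with a List-Unsubscribe-Post header) precisely so that automated GET requests don't trigger it.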
Adding a robots.txt file should reduce the problem. I have a robots file and don’t have a problem with rogue unsubscriptions.
Still, Yahoo or others shouldn’t be unsubscribing people regardless of how we set up phplist. That makes Yahoo and whoever is doing the unsubscribing malicious, at least in my book.
I would also suggest including a clearer explanation of the potential pitfalls of rogue robots unsubscribing people in the description of one-click unsubscribe, which honestly I didn’t understand until Duncan explained it so succinctly. In fact, his phrase “. . . one-click unsubscribe” is a good starting point for that. I’m guessing a lot of people across phplist installations have been unwittingly unsubscribed because of this setting.
That is helpful, but I am not sure that the default is working as I expected. Until recently, my config file had this:
// if a user should immediately be unsubscribed, when using their personal URL, instead of
// the default way, which will ask them for a reason, set this to 1
When I upgraded, I started with a fresh, short config file and added only the settings from the extended config file that I thought I needed. It looked to me like the above was the default, so I left it out. That is when my problem with unsubscribes started.
Am I correct that leaving the jumpoff option out of the config file means jumpoff is off, so the spiders won’t be able to automatically unsubscribe people? I could be wrong, but I believe my experience was the opposite: when this option was not in the config file, jumpoff defaulted to 1, allowing the spiders to unsubscribe people. Am I confused?
@phpvdn Do you feel like fixing the incorrect value in the config_extended file, which Duncan described, in a pull request? Also adding a default robots.txt file would help others avoid the same issue.
I don’t feel qualified to take this on, but I do hope that someone will fix the jumpoff default. In my opinion, the risk of unintentional unsubscriptions is so great that jumpoff should be off by default, and the config file should contain a warning about the risks of turning it on.
Also, there was one other config option that was missing when my problems with unsubscribes started. This was taken out of the config file.
// to increase security the session of a user is checked for the IP address
// this needs to be the same for every request. This may not work with
// network situations where you connect via multiple proxies, so you can
// switch off the checking by setting this to 0
Is this related? The reason I ask is that most (or perhaps all) of my suspicious unsubscribes came from an Amazon IP that does not seem to be related to the email domains. Do you have a recommendation for how this should be set? It would be great to have more detailed documentation about the pros and cons of all of these options somewhere.
Regarding the robots file, if you think it is a good idea, it would be great if there were some documentation. I haven’t added it yet. I guess I am concerned that it could block legitimate unsubscribes. Is that possible?