Why is phpList killing Amazon AWS instances?

Hi all, I’m running phpList on Amazon AWS, on Elastic Beanstalk + RDS (t2.micro). I thought at first that the hardware had failed on one server instance, but now on a second instance, sending a list via the SES API is bringing down the instance completely. The messages go out at a send rate of 2000/hr, but at some point sending kills the server that hosts the phpList code. The DB seems to be doing OK. I have domain throttling enabled; I’m going to experiment with turning that off and see if it helps at all.

So, I’ve come to determine that this problem is three-fold:

  1. Not enough server resources for what I’m asking of it.

  2. I didn’t have enough of a pause between message sends. Set at 1s, it was hammering the server; set to 2s, the server was much more responsive (resources again). DEBUNKED

  3. Lastly, via the SES API, SES does not like batch periods that are set too long, such as 3600s; I had to define the batch period as 60s in order to get a predictable send rate. DEBUNKED

SES seems to have a hard limit on how many emails are sent per batch, as opposed to per batch period. If you are over the limit, SES begins to reject the messages; you can see this live in the Log of Events. As the rejection log builds up, it brings the phpList application to a complete halt and monopolizes the server resources. The greater the number of rejections from SES, the more impacted the server becomes.

@synapse SES usually has a limit on the number of emails in a 24-hour period and a maximum send rate, so you need to ensure that the phplist batch and throttle settings match those.
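As a sketch of what matching those settings might look like (a hypothetical config.php fragment; the numbers are illustrative, not anyone’s actual SES quota), an account limited to 14 emails/second could stay under the rate cap with a one-second throttle:

```php
<?php
// Hypothetical config.php fragment. The values are examples only; check
// the SES console for your account's real per-second and daily quotas.
define('MAILQUEUE_BATCH_SIZE', 2000);    // cap each batch well below the daily quota
define('MAILQUEUE_BATCH_PERIOD', 3600);  // one-hour batch window
define('MAILQUEUE_THROTTLE', 1);         // 1s pause between messages, well under 14/s
```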

@duncanc I’m definitely not hitting the SES limits. If I hit the SES limits, you see errors in the phpList event log; that’s how I was able to troubleshoot. It really came down to CPU and server resources. Once I upped the processing power of the database and set up failover, SES stopped rejecting high-rate requests.

I really believe that this is a bug with phpList, because I watched it minute by minute: as the log events piled up, phpList would begin to degrade, taking up more server memory and cache until the application server froze. I’m on Amazon AWS and am running separate application server and database server instances. What I also found is that the number of open ports and requests on the server would climb until all available ports on the database were saturated and you would effectively DDoS yourself. I had to reboot the database servers in order to fix the issue. Reboot, not rebuild. I believe what is happening is that phpList opens more and more ports as send requests are rejected, and does not close them as they get saturated with requests.

Not sure that I understand what you mean. You seem to be saying that you were hitting SES limits, as indicated by errors in the event log, but also saying that you were not hitting limits.

SES seems to have a hard limit on how many emails are sent per batch, vs batch over time. And if you are over the limit, SES begins to reject the messages. You can see this live on the Log of Events.

What are your phplist batch and throttle settings, and what is the frequency of running the cron job?
Also which version of phplist are you using?

Not sure that I understand what you mean. You seem to be saying that you were hitting SES limits, as indicated by errors in the event log, but also saying that you were not hitting limits.

My SES send limits are 14 emails per second and 50k/day total; I can watch the SES statistics live, and I’m not going over that. What happens is that if you don’t have enough processing power (server capability, 2 CPUs vs 1), phpList appears unable to open enough ports to the database to make requests. This results in send requests not being submitted to SES, and/or too many concurrent requests to SES, which looks like flooding. I don’t know for certain, as AWS doesn’t share this information.

All I know is what I’m seeing in the consoles on the backend. I don’t know which is cause and which is effect, but as phpList opens more and more port requests to SES, more requests get denied, and as the errors build up in the log of events, phpList stalls.

It appears to me that phpList is not closing the ports that it opens, either on the database or with SES. There should be some type of timeout for port requests, or maybe I don’t have that configured in config.php.

After this happened at 65 open port connections, I increased the processing power of the MySQL server, which fixed it temporarily. It happened again, though; this time even more ports were left open by the time it completely crashed from flooding. Once I reboot the database server, the ports free up and we can resume once again. Pictures below:

I’m on 3.2.6 and I’m not running cron; instead, I have a server on the backend refreshing the processing URL, using the secret code, every 30 seconds.

define('MAX_PROCESS_MESSAGE', 99999);
define('PROCESSCAMPAIGNS_PARALLEL', 1);  // process multiple campaigns in parallel
define('MAILQUEUE_BATCH_SIZE', 43200);   // messages per batch
define('MAILQUEUE_BATCH_PERIOD', 3600);  // batch period, in seconds
define('MAILQUEUE_THROTTLE', 0);         // seconds to wait between individual messages
define('MAILQUEUE_AUTOTHROTTLE', 0);

@synapse Thanks for the extra information. From the event log entries it looks like phplist is not sending to SES. Either the attempt to create a curl handle is failing, or the actual sending is failing.
If there is nothing else in the event log regarding errors then possibly the web server or php logs might have something.

phplist has just one database connection for each execution. If it gets any database error then it exits, it doesn’t try creating a new connection. So I can’t immediately see why there will be loads of database connections.

Using the “remote secret” approach is not going to let you send as fast as possible. With that approach phplist will send only for one minute elapsed time then exit. So there will be a gap until the next refresh of the URL.
To send as fast as possible you need to use a cron job with no batch processing, batch size set to 0. Use throttle if necessary so that the sending rate is less than allowed by SES.
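As a concrete sketch of that advice (the values, the install path, and the cron flags are my assumptions for illustration, not taken from this thread; check your own install and phplist’s command-line documentation):

```php
<?php
// Hypothetical config.php fragment for cron-driven sending with no batches:
define('MAILQUEUE_BATCH_SIZE', 0);   // 0 disables batch limits entirely
define('MAILQUEUE_THROTTLE', 1);     // throttle so the send rate stays below the SES cap
// ...with queue processing driven by a cron entry along these lines
// (install path is an assumption):
// * * * * * php /var/www/lists/admin/index.php -pprocessqueue -c/var/www/lists/config/config.php
```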

There was a change to the SES processing in phplist 3.2.6 to try to increase the send rate. The code now uses the same curl handle for requests, unless a sending error occurs. But it also sets a long keep-alive time on the HTTP connection, which is probably unnecessary. I think that you can safely remove these lines from the file admin/class.phplistmailer.php, lines 763 and 764

        'Connection: keep-alive',
        'Keep-Alive: 300',

So, I’m finding that the issue of excessive database connections happens with SMTP as well. I’ve tried several different SMTP and API connections, and the problem persists. I’ve tried with both domain_throttle and mailqueue_throttle disabled, and neither has had an effect.

I’ve A/B tested this, and it is clear that phpList is the process making all of these DB connections and flooding the DB ports. If a campaign message is queued, database connections open up; if some sort of processing error is hit, the number of connections goes up and up until the server crashes. I have phpList operating on two separate application servers with different config.php files, so I can A/B test what is happening. I believe that these errors are occurring at the application/code level. Once the application hangs, stops taking and serving HTTP requests, and many database ports are open, I have to restart the application server in order to clear the backup.

Is it possible that PHPList is opening port requests, and even though they time out, not closing them?

So, I’ve found the culprit. I’m not using a cron process, but instead use:

http://domain/lists/admin/?page=processqueue&secret=xxxxxx

I have this URL loaded in a browser that is refreshed by a refresh extension in Chrome. Each time the page is refreshed, phpList opens a port on the database. If the refresh period is long enough, 300s for instance, there are no issues regardless of send method (API/SMTP); that is enough time for the port request to expire, so it seems the port-closing period is under 300s. I have been refreshing at 10s, 30s, and 60s, and each refresh produces an open port request, so they stack up, because each one is under the port-expiration period. I can literally watch the refresh happen: the queue processes, a port connection opens on the database, and 150s later the port closes.

I don’t know if someone who set up cron at a 30s interval would face the same issue.

@synapse The way phplist processes the queue for the “remote secret” is different to the way for a cron job. Each time that the url is requested, the new instance of phplist will try to stop the currently running instance. Whereas for a cron job the new instance detects a currently running instance and terminates itself. So using a cron job is going to be more effective.

If you want to continue with the remote secret approach then you need to make the url refresh longer than one minute. I think that phplist will stop processing the queue after one minute, but you can probably confirm that from the database activity.

I’ve had problems setting up crontabs, which is why I haven’t really been pursuing them much, but I will soon.

Doesn’t this issue of port connections to the DB seem like a bug to you? Another anecdote I’ve found is that the issue isn’t limited to the database. I have recently left an SMTP config running with a longer refresh rate and found that the something has made the application hang after some time. Nothing shows in the log of events, a group of DB connections has built up, and the application server is hung. The only resolution is to restart the application server, either via . Something in the code definitely hangs up, because other applications in the same root directory are not affected, only phplist when this happens.

For testing purposes, I’m running three separate concurrent server instances of phpList connected to the same database. As the DB connections build up, I can clear them by restarting the app servers one by one. And I am not sending the secret processqueue URL to all three servers, just one.

@synapse There is clearly a problem somewhere in your setup but I’m afraid I don’t really have any further insight into what it might be.

Hi Duncan, new developments here. I’m starting to see DB errors, so I logged in and fixed them. Even after fixing the errors in the DB, there were still many open connections. I restarted all the application servers (LAMP stacks) that had phpList running, with no change; I ultimately had to reboot the database servers in order to clear the connections. I believe that there may be issues in how the code handles database connections.

Are there things I can do in the database, or in managing the database, that work well with phpList?

Here’s the error log from the DB

phplist.phplist_usermessage	check	warning	Table is marked as crashed
phplist.phplist_usermessage	check	warning	1 client is using or hasn't closed the table properly
phplist.phplist_usermessage	check	error	Checksum for key:  4 doesn't match checksum for records
phplist.phplist_usermessage	check	error	Corrupt

@synapse, I would not recommend running 3 copies of the front end with one database, and sending with all three.

I have used a setup like this to import lists, but not to send. Only one system will be able to send at a time, and it looks like it is doing something bad to your database.

You’ll need to go into the database with a tool like mysqlworkbench, and fix the database.

Hi @danwaterloo, I haven’t been sending this way, but I have tested it, and it doesn’t work very well at all. It might, though, if enough time passed for each instance to close its connection after each send task, as a means of running multiple SMTP configurations with different usernames and passwords.

I’m running on Amazon AWS. By default, AWS prefers InnoDB, and all of my tables are MyISAM. I’m wondering if this is the core of my issue.

Looking back at this issue, I think the root cause was in how I was combining the secret processqueue URL with a browser refresh. I was using a program that would refresh the web page at given intervals, in case the browser crashed. Each time the page refreshed, it would open a new database session that would not close until it essentially timed out. So, lesson learned: just don’t use a browser refresh extension/widget.