
Spams lessons - part 1

John Hoskins • July 8, 2019

About 20 years ago, I became involved with one of the largest collections of spam in the world, if not the largest. The spam in question was the forwarded complaints of users of the AOL service. I was part of a team that was setting up a new internal database service, and we offered our services to the spam team so they could be better supported in their efforts.

While we had great intentions, the team did not understand how unprepared we were, as an organization, for the real scale of the spam problem. Due to an architectural decision, the place that people reported spam to could fill up. And without a clear indication of when it was full, the process and timing of emptying it meant that people could not submit spam complaints about 90% of the time.

When we moved the database over to our new server, we found that our original estimate was off by a factor of 10: the amount of new data was over 10 times larger than what we had planned for. The new setup meant the box was never full, so people could forward a spam and the system would take it. In addition, the techniques the spam team was using, such as table scans instead of indexes, were both killing the server and slowing the team down. While we wished that we could have kept this customer, our equipment, which had been bought to host lots of new applications, was going to be overrun in months.

So we worked to put them together with a mature team that could handle the volume of data that was now pouring in, but there was one requirement before they would take the data off our hands. With the help of Chris Lane, I wrote a ‘spam_indexer’ program that allowed the firehose of spam to be indexed and made available in near real time. The system supported the efforts to find the worst perpetrators and sue them.

Once we were done with our original work, we tried to pitch the idea of a proactive system to stop spam before it was delivered by looking at sender IP addresses, content, etc. When we talked to the appropriate teams about this, we were informed that there was an edict from the top, Steve Case, that our customers' privacy was not to be invaded: deliver the mail, let them complain, and we will sue. Move along, nothing you can do here. And so I moved on to other things; after all, spam was just a side project.

So what has happened since those days 20-some years ago? The spam problem has grown and morphed, and is now generally part of the background radiation that the internet lives through. What could we have done back then to make a better today?