Starting your own “black hole” email server

In this article, I’ll describe a customizable yet simple “black hole” email server which you can deploy to your personal site (as long as you have superuser/administrator access to the server your @domain site is backed by). The phrase “black hole” captures the server’s ability to asynchronously read hundreds of inbound emails per second, all without being able to deliver any outbound email.

I’m going to start off by describing exactly how email works. To many people (myself included), email feels like ad-hoc, point-to-point communication:

However, it’s better represented as a service:

Note the clear distinction here: IMAP and POP3 are used by an email client to read emails from a server, while SMTP is used to send emails to a server (this includes both client-to-server and server-to-server communication). Let’s work through a quick toy scenario. For this example, I’ll assume that you’ve already set up Microsoft Outlook on your local machine as the client, with GMail as the target email service. When you refresh your inbox, Outlook fetches the email from imap.gmail.com (using IMAP) or pop.gmail.com (using POP3), depending on the settings that you’ve selected. When you compose a reply to one of your unread emails and hit send, Outlook communicates the body of the email and the appropriate headers via SMTP to smtp.gmail.com. GMail’s SMTP servers then look up the destination email server associated with the recipient’s email address and forward the email, again via SMTP.
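To make the sending half of that flow concrete, here’s a minimal sketch of what a client does when it hands a message to smtp.gmail.com, using Python’s built-in smtplib. The addresses and password are placeholders, and GMail will refuse the login unless the account is configured to allow it:

# send_example.py -- a rough sketch of the client-to-server SMTP step
import smtplib
from email.mime.text import MIMEText

msg = MIMEText("Replying to your note.")
msg["From"] = "you@gmail.com"       # placeholder sender
msg["To"] = "friend@example.com"    # placeholder recipient
msg["Subject"] = "Re: hello"

# GMail's submission endpoint; port 587 uses STARTTLS
server = smtplib.SMTP("smtp.gmail.com", 587)
server.starttls()
server.login("you@gmail.com", "app-password-here")  # placeholder credentials
server.sendmail(msg["From"], [msg["To"]], msg.as_string())
server.quit()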

If you wanted to, you could specify a different outgoing mail server. Local SMTP servers, such as sendmail and Postfix, can be used to send emails directly from your @domain machine, as long as your ISP hasn’t blocked port 25. Python has its own SMTP server implementation via the smtpd module. A quick skim over the docs shows that it’s laughably simple to use:

# smtp_example.py

import asyncore
from smtpd import SMTPServer

class SMTPExample(SMTPServer, object):

    def __init__(self, *args, **kwargs):
        super(SMTPExample, self).__init__(*args, **kwargs)

    def process_message(self, peer, mailfrom, rcpttos, data):
        print(data)

if __name__ == "__main__":
    SMTPExample(("0.0.0.0", 25), None)
    asyncore.loop()

The process_message function is called every time the server intercepts an inbound or outbound message. In this case, the entire intercepted message is simply dumped to stdout – no other actions are taken. That’s not particularly useful on its own, so let’s improve the code so that it does something a bit more interesting:


from email import message_from_string

# ...

    def process_message(self, peer, mailfrom, rcpttos, data):
        headers = message_from_string(data)
        body = headers.get_payload(decode=True).strip()
        print(body)

# ...

Now, instead of just printing the entire message (which includes headers), the code grabs the body of the email and prints it to stdout. For more information on how exactly this works, check out the email module in the Python docs.
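As a quick standalone illustration of what message_from_string gives you (the message below is made up for the example):

# a minimal illustration of email.message_from_string, separate from the server
from email import message_from_string

raw = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: hello\r\n"
    "\r\n"
    "Just checking in.\r\n"
)

msg = message_from_string(raw)
print(msg["subject"])             # header access: "hello"
print(msg.get_payload().strip())  # body access: "Just checking in."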

Armed with a better understanding of smtpd, we can greatly expand the code. I’ve created a very simple callback-based SMTP server implementation on top of the smtpd module, which I’ve dubbed Courier:

#!/usr/bin/env python

import asyncore
from email import message_from_string
from email.iterators import typed_subpart_iterator
from email.utils import formataddr
import logging
from smtpd import SMTPServer
import sys


class Courier(SMTPServer, object):
    """
        Courier class, built on top of Python's SMTPServer.
    """

    def __init__(self, cb_fn, cb_data, *args, **kwargs):
        super(Courier, self).__init__(*args, **kwargs)
        self._cb_fn = cb_fn
        self._cb_data = cb_data

    def process_message(self, peer, mailfrom, rcpttos, data):
        """
            Process a single inbound message.
        """
        headers = message_from_string(data)
        body = ""

        # multipart
        if headers.is_multipart():
            for part in typed_subpart_iterator(headers, "text", "plain"):
                body += part.get_payload(decode=True).strip()

        # message body
        else:
            body += headers.get_payload(decode=True).strip()

        # compile the data
        mail_data = {
            "sender": headers["from"],
            "recipient": headers["to"],
            "subject": headers["subject"],
            "body": body
        }
        logging.debug(mail_data)
        
        return self._cb_fn(mail_data, self._cb_data)

if __name__ == "__main__":
    args = parser.parse_args()
    start_courier(args)

To use it, simply define a callback function which processes the message described by mail_data. This callback can be set up to perform a wide array of tasks, such as saving the email to a MySQL database or text file, understanding its contents through NLP, or just dumping its contents to stdout/stderr. Note that this implementation is unable to process email attachments – I’ll leave that as an exercise for you. Hint: you’ll probably have to make use of base64 encoding; Python just happens to have a built-in base64 module to help with that.
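To give a feel for the shape a callback takes, here’s a sketch of one that appends each intercepted message to a plain text file; the log file name is my own choice, and it ignores cb_data entirely. You’d pass it to Courier as cb_fn, exactly like the echo callback in the __main__ block above:

# example callback: append every intercepted message to a text file
def save_to_file(mail_data, cb_data):
    with open("courier_inbox.log", "a") as f:
        f.write("From: %s\n" % mail_data["sender"])
        f.write("To: %s\n" % mail_data["recipient"])
        f.write("Subject: %s\n" % mail_data["subject"])
        f.write("%s\n\n" % mail_data["body"])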

Of course, there’s one huge catch to all this: smtpd doesn’t have a built-in mail relay (also known as a mail transfer agent, or MTA), which is required to look up destination mail servers and actually deliver email. This means that “outbound” emails sent from your @domain server never reach the recipient; they simply get intercepted by Courier and processed by whatever callback you set. Creating a fully functional, customizable SMTP server in Python is a much bigger project – slimta and TwistedMail are good references for how a complete local SMTP server is implemented.
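For a sense of what that missing relay step involves, here’s a rough sketch of how an MTA resolves a recipient’s mail server and hands the message off. It leans on the third-party dnspython package for the MX lookup and skips retries, TLS, and queuing entirely, so treat it as an illustration rather than a working relay:

# rough sketch of the relay step that smtpd does not provide
import smtplib
import dns.resolver  # third-party: pip install dnspython

def relay(mailfrom, rcptto, data):
    domain = rcptto.split("@")[1]
    # find the recipient domain's mail exchangers, lowest preference first
    answers = sorted(dns.resolver.query(domain, "MX"),
                     key=lambda r: r.preference)
    mx_host = str(answers[0].exchange).rstrip(".")
    # hand the message to the recipient's server on port 25
    server = smtplib.SMTP(mx_host, 25)
    server.sendmail(mailfrom, [rcptto], data)
    server.quit()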

For simplicity, I’ve pasted all of the Courier code with some example callbacks into a Github Gist. Since my machine is only accessible through SSH, I’ve set the server to run continuously in a screen session:

screen -dR courier
sudo python courier.py -c bounce

If you take a closer look at the code, you’ll see that the “bounce” command attaches a nice blurb to the beginning of any email sent to it. Try it yourself! Send an email to courier [at] frankzliu [dt] com to receive an automated reply from my experimental email address [1]. YMMV – I’ve already disabled the CAPTCHA and allowed less secure apps on that Google account, so using GMail as an outgoing server works fine for me. Most public mail services such as Google and Yahoo put restrictions on the sender email address, which prevents me from using custom email addresses.

[1] It’s likely that it’ll get put into your spam folder, so be sure to check that!

Passive income from volatile online marketplaces

Everybody loves making money. Even more so when it’s near effortless.

Passive income
As defined by Wikipedia circa April 2015, passive income is “an income received on a regular basis, with little effort required to maintain” the income-generating method itself. What comes to the forefront of most people’s minds is probably a steady, interest- or investment-based source of income, such as an investment portfolio or property rentals. Buying and holding onto rapidly appreciating assets (such as housing in New York or Silicon Valley) is also a common way of making long-term money.

Both of these methods, however, are rather risky. What if the housing market crashes the month you finally piece together enough cash to buy that rental? Or perhaps the set of stocks you invested in goes in the wrong direction? RIP your hard-earned money…

A solution: online marketplaces
While these “standard” methods for generating passive income clearly have their shortcomings, I think there are less risky ways of doing so, simply by looking towards the interwebz. Simply put: when it comes to e-goods, highly volatile open markets are your friend. Why? Because prices update the instant they are posted online, and with an appropriate bot and/or web scraper, you can have rapid turnarounds on the dollar. I’ll use the Steam Community Marketplace as an example, with a specific focus on medium-cost Counter-Strike: Global Offensive weapon skins. For the purposes of this example, define “medium-cost” as any skin with a steady median price between $1 and $10. On this particular marketplace, many popular skins go through drastic price fluctuations in a matter of seconds. There are various reasons for these fluctuations which I won’t go into here – perhaps a future article if anybody is interested.

In situations like these, using a bot is actually not uncommon at all; many script kiddies log in to their Steam accounts and then use browser plugins to nab a desired item the instant it goes below a certain threshold (these guys are also highly likely to get banned for doing so, as it violates the Steam Community Marketplace ToS).

But once these users get the item they want, the process usually stops. What seems to be less common is applying such scripts to make money off of the marketplace itself. A better approach is to automate the price checker from a remote, logged-out server (in my case, an AWS instance), so that the Steam police can’t link it to your account. The great part about AWS is that, even if the instance gets blocked, you can always assign it a different IP and continue raking in the $$$. In fact, for all you code-savvy people, I put together some spaghetti code to do this using Python and CasperJS, which you can access on Github. Simply input the weapon, skin, and exterior you’d like to track, the desired price threshold in USD, and your phone number into the command line, and the script will automatically send you a text message the moment the item drops below the threshold price. Once that happens, simply navigate to the buy page on your computer or mobile device and nab the item. For maximum effectiveness, list it for sale again immediately to catch the price on its way back up, which it likely will go given the constant buying and selling on the marketplace. This is both simple and not a violation of the ToS (since the buying and selling itself isn’t automated), meaning that Steam cannot, for all intents and purposes, ban or suspend your account.
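My actual implementation does the scraping with CasperJS, but the core loop is simple enough to sketch in Python. Both fetch_price and send_sms below are hypothetical stand-ins for the scraper and whatever texting service you use:

import time

def watch_item(item_name, threshold_usd, phone_number):
    """Poll an item's listing price and send a text when it dips below the threshold."""
    while True:
        price = fetch_price(item_name)   # hypothetical: scrape the current lowest listing
        if price < threshold_usd:
            # hypothetical: e.g. an email-to-SMS gateway or a service like Twilio
            send_sms(phone_number, "%s is listed at $%.2f" % (item_name, price))
            break
        time.sleep(30)                   # don't hammer the marketplace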

Now, there are caveats. Steam charges both a CS:GO fee and a marketplace fee, which together amount to 15% of the item’s selling price. I’m fairly certain this was implemented partly to discourage making returns on the marketplace, but don’t let that stop you; it simply means that, when you buy low, you have to take this extra 15% sell fee into consideration. This is where the volatility of the marketplace comes into play – if a price swing of more than 15% occurs within a short period of time, you can turn the item around for more money than you paid. The speed of the internet is truly a beautiful thing.
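To put rough numbers on it: buy a skin at $1.00 and sell it at price P, and you only pocket 0.85 × P after fees, so P needs to be above roughly $1.00 / 0.85 ≈ $1.18 (about an 18% swing) just to break even – everything beyond that is profit.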

Two words of warning: 1) once you’ve made the money, turning it back into real cash requires another 10% fee through a third-party service, and 2) money made off of online marketplace transactions likely counts as taxable barter income, so don’t blame me if the IRS suddenly appears at your door.

Preliminary results
After a test run of one week on the CS:GO marketplace with a starting inventory of around $20, I’ve gotten a 20% return on the starting value of my Steam wallet. While this may not seem like much, imagine starting with $2000 and applying this method over a variety of items in the community marketplace. Making back $400 within a week doesn’t sound so bad now, huh?

As the title of this section suggests, these are only preliminary results. I’ve decided to apply this to a variety of marketplaces, and perhaps extend the method to real, material goods on sites such as eBay and Craigslist in the future. The principle remains the same: 1) buy low, 2) sell high, and 3) profit. And no, this is not a South Park episode or random internet meme – I mean profit in as literal a sense as possible.

In conclusion…
Right, so I know what’s going through your head right now. “If this is such a good idea, why the hell are you sharing your code with us?” As it turns out, you reading this is another source of passive income for me, assuming you don’t have AdBlock or some other anti-ad browser plugin. Both ad views and clicks on this site count towards income, which, in the long run, will likely provide me with a greater amount of passive income than quick turnarounds on these relatively short-lived online marketplaces. I also enjoy building my site much more than monotonously buying and selling e-items, so this provides me with a moderate level of entertainment and fulfillment as well.

Happy selling (and buying)!

Seam carving in Python

I believe that the best algorithms are usually the simplest. Seam carving, an intelligent image resizing algorithm first introduced by Avidan and Shamir in SIGGRAPH’07, is the embodiment of such a statement – no expensive image descriptors or fancy deep learning required. Rather, seam carving employs dynamic programming to find connected (or semi-connected) seams of minimal energy. By removing and adding seams at these locations, the resulting image can be resized with minimal impact to the actual content in the image. Best of all, the algorithm can be customized with only a bit of user input: regions can be forcibly removed or kept if the user specifies such.
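As a feel for how small the core of the algorithm is, here’s a sketch of the dynamic-programming step for a single vertical seam, using a simple gradient-magnitude energy. The full version in my code also handles horizontal seams, seam removal/insertion, and user-specified masks, so this is only the skeleton:

import numpy as np

def min_vertical_seam(gray):
    """Return the column index of the minimal-energy vertical seam in each row."""
    # simple energy: gradient magnitude of the grayscale image
    gy, gx = np.gradient(gray.astype(float))
    energy = np.abs(gx) + np.abs(gy)

    h, w = energy.shape
    cost = energy.copy()
    # accumulate the cheapest connected path from the top row downwards
    for i in range(1, h):
        left = np.roll(cost[i - 1], 1)
        right = np.roll(cost[i - 1], -1)
        left[0] = np.inf       # no upper-left neighbor in the first column
        right[-1] = np.inf     # no upper-right neighbor in the last column
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)

    # backtrack from the cheapest cell in the bottom row
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam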

As a part of the larger panorama project that I’m currently undertaking, I decided to implement seam carving and use it in conjunction with multi-band blending to emulate parallax tolerance. Not sure how well it will work yet, but it’s always worth a try. You can grab the code here. Enjoy!

UPDATE 1: I’ve transitioned the seam carving code into radiant, a comprehensive image editing library that I’ve been working on. The link now takes you there, instead of the original codebase.

PyPano: A better panorama stitcher

Non-parallax image stitching (i.e. stitching images which are taken from a fixed camera position) is a very well-known problem. In the past couple of weeks, I’ve been coding up a full panorama stitching pipeline in Python, which implements the well-known algorithm presented by Matthew Brown and David Lowe at ICCV’03. My code currently implements a couple of optimizations as well:

  1. ORB over SIFT. ORB features are faster to compute, easier to match, and provide a similar level of scale-invariance compared with SIFT. This is largely due to the binary nature of ORB descriptors, which makes them cheaper to store and match (see the sketch after this list).
  2. Faster bundle adjustment. Direct bundle adjustment of homography elements does use around 2x more memory and computational power than adjusting camera parameters, but it almost always converges in far fewer iterations. One-by-one addition of images into the bundle adjuster is also no longer needed, which greatly improves runtime. I’ll provide more details on this process later.
  3. Cylindrical projection. Purely for viewing purposes, since planar coordinates look quite unnatural as the field-of-view approaches 180 degrees in either direction. To the best of my knowledge, no Python-based open source stitcher is able to do this yet.
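To give a flavor of the first point, here’s roughly what the ORB detection-and-matching step looks like with OpenCV (assuming OpenCV 3+, where the factory function is cv2.ORB_create). My pipeline wraps this with its own matching and verification logic, so treat it as a sketch rather than the code in PyPano:

import cv2

def match_orb(img1, img2, n_features=2000):
    """Detect ORB features in two grayscale images and return the best matches."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # binary descriptors are compared with Hamming distance, not L2
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    return kp1, kp2, matches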

I have some TODOs lined up as well:

  1. Adaptive local feature matching. Inspired by this paper, I’d like to add in an iterative, cascade-like process which searches for local feature matches between images. This should make the stitching process much more accurate for images which do not contain globally distinctive features.

You can find it here. Feel free to use the code where you see fit, but please cite it if you do. If you find any bugs, email me or file an issue on Github. Enjoy!

UPDATE 1: I have been notified that the Github link is broken. I will fix this momentarily.
UPDATE 2: Link has been fixed.
UPDATE 3: Added some features.
UPDATE 4: Updated link.

How to replace MATLAB

Ran out of MATLAB licenses at work? Memory issues? Too damn expensive? Fear not – below, I’ve provided a set of simple steps that you can take to replace MATLAB! All you need to do is open up the terminal and type the following (and for all those Fedora users out there, use yum instead of apt):

On Linux

Step 1: sudo apt-get install python
Step 2: sudo apt-get install python-pip
Step 3: pip install numpy
Step 4: pip install matplotlib
Step 5: python
Step 6: $$ profit $$

On OS X

Step 1: ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Step 2: brew install python
Step 3: pip install numpy
Step 4: pip install matplotlib
Step 5: python
Step 6: $$ profit $$
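Once that’s done, a quick sanity check that the stack works might look something like this (the sine-wave plot is just my own stand-in example):

# quick_check.py -- verify that numpy and matplotlib are installed
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x))
plt.title("Goodbye, MATLAB")
plt.show()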

In all seriousness…

I’ve come to love Python as a programming platform, even though I only really started using it about six months ago. While it may not be the best language for deploying production systems (Java and C++ are better suited for that), it’s quick, has an amazing set of packages which greatly extend its functionality, and best of all, is free. Want to do fast matrix arithmetic and more advanced computation? Use NumPy and SciPy. Want to experiment with algorithms in machine learning and computational statistics? Use scikit-learn or Vowpal Wabbit. Want to quickly create a robust web application? Use Flask. The possibilities are endless.

Admittedly, I had strong initial resistance to Python programming. Prior to becoming a Python addict, I essentially used only C/C++ and MATLAB (and sometimes still do today for my small side projects). For low-level tasks, C was a great language to use, while MATLAB provided me with a lot of high-level functionality. Occasionally, when I needed to throw together a quick demo for non-techies, I’d use Java for its easy GUI capabilities.

The beauty of Python is that it provides me with all three. Many Python libraries are coded in C/C++ and then exposed to Python through bindings, which gives you near-C speed from within a high-level interpreted language. In fact, NumPy’s array indexing conventions are so similar to MATLAB’s (see this page for details) that I was able to smoothly make the transition from MATLAB to NumPy in something like half a day. Tkinter is also a great interface for creating and modifying windows, and, in my opinion, is easier to use than Swing.
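For anyone making the same jump, here are a few of the correspondences that made the transition painless for me; the rough MATLAB equivalents are in the comments (keeping in mind that NumPy is 0-indexed where MATLAB is 1-indexed):

import numpy as np

A = np.arange(12).reshape(3, 4)   # MATLAB: A = reshape(0:11, 4, 3)'
row = A[0, :]                     # MATLAB: A(1, :)
col = A[:, -1]                    # MATLAB: A(:, end)
block = A[0:2, 1:3]               # MATLAB: A(1:2, 2:3)
mask = A[A > 5]                   # MATLAB: A(A > 5)
B = A.dot(A.T)                    # MATLAB: A * A'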

Python does have its disadvantages as well. Its lack of static typing means that you can end up with long chains of bugs within your code. Creating a 1-element Python tuple can be confusing – I still sometimes try to instantiate one without the trailing comma. It’s also absolutely terrible for long-term code maintenance, which is why most established companies and large open source projects use either Java/Scala or C++.

All-in-all, I think its advantages are too considerable to ignore. The next time you’re thinking about doing a small side project, take a step back and consider Python.

Disable CPU throttling on MacBook Pro (15-inch, mid 2014)

(This is an extension of the original post by Rhys Oxenhams).

Lately, I’ve noticed that my MBP often slows down drastically after it’s been on for a while, or when I run some computation-intensive multi-threaded code. Today, I finally decided to explore the problem. (DISCLAIMER: I am an OS X noobie – I do most of my coding in Linux – so please take what I say with a grain of salt. Nonetheless, if it seems that you’re having a similar problem, I encourage you to try out this method. Be sure to let me know if it works!) The way I see it, you have two options:

Option 1: Use Windows

People in the tech industry often forget that Windows, apart from its myriad security issues, is actually a well-designed operating system. Yes, OS X maintains a higher level of compatibility with Linux systems, but Windows is still superior on some fronts. I use my Windows partition primarily for running large-scale image processing experiments in MATLAB (which runs horribly slowly in OS X), and for performing GPU-intensive tasks (I increase both the core and memory clock frequencies of the discrete 750M).

From my experience, booting into Windows removes the throttling issues. Of course, your computer will still shut down if any of the core components reaches its Tjunction temperature, which is typically somewhere around 100 degrees Celsius for mobile CPUs and 150 degrees or so for GPUs.

Option 2: Modify kernel drivers

Warning: modify kernel drivers at your own risk.

There is a way, however, to reduce throttling directly in OS X. After reproducing the slowdown, I checked Activity Monitor and ran top (which displays the most CPU-intensive, i.e. top, processes). I quickly noticed that the generic “kernel_task” process was responsible for a huge portion of the CPU usage on my machine. A quick Google search led me to the link above, which I promptly tried. It made sense, too – MBPs are known for throttling both their CPUs and GPUs in order to meet power and temperature constraints, and Intel’s SpeedStep technology paves the way for OS X to dynamically adjust the clock frequencies of the on-board oscillators. One can imagine why simply by looking at the hardware’s design.

One problem: the numerical portion of the model identifier for all mid-2014 MBPs is 11,3, and after looking inside the aforementioned kext (kernel driver), the corresponding file that we would need to identify and remove doesn’t actually exist. If you try removing all of the plists within the kext, nothing noticeable happens upon reboot – the driver continues to hog resources.

It turns out that you can use pmset, which allows you to manage a variety of power-related settings from the command line. Open up a terminal and type:

sudo pmset -a dps 0

This should disable dynamic clock changes and other “power optimizations” that OS X performs automatically.