A few years ago I completed Harvard’s CS50x – an online computer science course. My favourite part of the course was the self-directed final project. I’m interested in the question of how we can filter the mass of digital information that we’re confronted with every day, so that we can enjoy the best bits without spending all of our time scanning and processing. In my own life, I’d found that I had less and less time to keep up with Twitter, and I wanted a tool that would keep an eye on Twitter activity while I was at work. So for my final project I built a tool to monitor Twitter activity for a given list.

What I built was pretty basic – it was running on an old laptop in the corner of my flat, and it just displayed the results on the laptop screen after a specified number of hours had elapsed. A solid final project, but it only ever felt like a proof of concept. I wanted to build this into something I could use, and something that other people could use too.

So over the next few years I built it from the ground up as a full-blown web application that has now been launched as a real business: MySocialSummary.com.

The CS50 class was great, but once you have the foundations of understanding in place, there’s no substitute for the motivation, exploration, imagination and excitement of working on your own idea. Subject to having a reasonable grounding (and I’ve taken a number of computer science classes, not just CS50), I’d definitely recommend taking on a project that excites you, to take your knowledge to the next level.

Going through this process I’ve learnt a lot. I’ve divided this into two lists: one on specific computer science and business insights, the other on more overarching observations.

Specific computer science and business lessons

  • How to embed tweets in emails. (Or, rather, why you can’t embed tweets in emails.)
  • Working with APIs. An API is an interface that lets two pieces of software communicate. Manipulating data from the Twitter API is the foundation of My Social Summary.
  • How to load test your website. If you’ve built a website, you probably want people to visit. You commission your infrastructure with a certain number of visitors in mind. It’s worth testing how your site would behave if an unexpectedly large number of visitors arrive at once. This is called Load Testing. I used a tool called Load Impact to check out the load speed of My Social Summary when you add more users. As you can see, the homepage performs well with 50 visitors loading it at the same time.

    load test of mysocialsummary.com with 50 users

  • Working with code libraries. When you’re programming, you’ll often want to do something that other people have done before. For example, loading a web page, sending an email, or accessing the Twitter API. If you’re trying to solve a common problem, there’ll often be a library of pre-written code that will help you with this. By using a code library, rather than producing this code from scratch, you can save a lot of effort, and the code is probably of much higher quality than what you’d produce yourself. This frees you up to focus on the problems that no-one else has worked on yet. Not all code libraries are created equal – some are better documented and more fully-featured than others – so picking the right one is an important decision.
  • How to set up login via Twitter. A good example of when a code library can be useful. I’d have definitely struggled to build this authentication myself if I’d had to do so from scratch. Check out this documentation.

    A diagram of part of the Twitter authentication process

  • How to securely handle user input. If you take any form of user input on your website that ends up interacting with the web server, it’s a security risk. So you need to build your site from the ground up with security in mind. I really enjoyed learning about how you can protect yourself using parameterised database queries and other security best practices.
  • How to set up a LAMP server. That’s a common infrastructure stack, using Linux, Apache, MySQL and PHP to run a website. I’ve learnt how to manipulate the different parts of the machine (e.g. updating Apache settings) and use the different management tools.
  • The complexities of daylight saving time. When the clocks changed, I noticed that my daily summary emails were arriving an hour early. Fixing this problem for myself was trivial, but solving it for all users – and possible future users – was harder. I’ve built functionality that checks every evening against the very latest timezone rules, and supports every timezone. (Check out the exhaustive dropdown list on the page where you can sign up for a free trial account.)
  • The benefits of refactoring code. If code is a set of instructions, refactoring is rewriting those instructions to be cleaner or more efficient without changing what they do. Refactoring your code isn’t fun – it’s hard work, and involves trawling through your past self’s logic and trying to improve upon it. But it can yield massive performance improvements. In one case I was able to decrease the number of database interactions by a factor of a thousand through batch operations.

    Are you too busy to improve your process?

  • The importance of good version control and deployment infrastructure. Any halfway competent programmer will tell you that good version control is really important. At the start of this project I was evidently not a halfway competent programmer. If you use version control you can review the history of your code over time, and quickly track down errors. Your deployment infrastructure is the process of getting your code from a file on your computer to your web server so that it can start doing real work. Ideally you want this to be as frictionless as possible. In the early days, I had no version control, and my deployment procedure was copying and pasting the contents of one text file into another. I can confirm that this was not a good process.
  • How to work with a front-end framework. Specifically Twitter Bootstrap. My interest in this project has been more in concepts and engineering than in presentation. But any website needs to have a front-end of HTML and CSS. Rather than building all of this from scratch, I used the Bootstrap library to help set up the structure and presentation of the site. This made it quite easy to make the site mobile responsive.
  • How to integrate with a payment provider. Taking money from people online has a few complexities. The main one is passing data from your payment provider to your web server, so that you know who’s paid what. This entailed using Paddle’s Webhooks. Handling all the different types of event that might happen – and doing so securely – was quite a bit of work, but it was very satisfying to get this set up and working correctly, and see money come in to my account.
  • How to use cron jobs. On a Linux machine, you can instruct the computer to automatically carry out tasks at certain times. These are called ‘cron jobs’, and they can be really powerful. I make extensive use of these for My Social Summary, for the management of user accounts, and the sending out of communications to new users.
  • How to satisfy the Twitter brand guidelines – and how to update things when they change. Before coming up with a name, I carefully checked Twitter’s brand guidelines to make sure that the name complied with them. I felt pretty smug about having thought to do this. However, Twitter moved the goalposts, and the name I’d chosen was later no longer acceptable. So I had to come up with a new name for the service – and then roll out this change to all aspects of the service.
  • How to set up HTTPS security. Privacy and security are important, so I wanted to make sure that the site I was building used HTTPS encryption. Setting this up was quite easy – much of the process was done by my web hosting provider who installed the certificate. Seeing the little padlock next to the domain name for the first time was very satisfying.
  • How to set up email authentication via DKIM. I had no idea what this was. But one day I saw a little question mark when I opened up one of my summary emails, next to the sender name. Gmail said that it wasn’t sure if the sender of the email was actually who they claimed to be. Google had instructions on what to do next if you were the person sending this email, and I didn’t even need to follow all of these – pretty much all I had to do was change a setting with my hosting provider and this was resolved.

    Example of an email without DKIM

  • How to migrate hosting provider. Partway into the project, it became clear that I’d outgrown the infrastructure I’d started on and needed a more robust web server. The process of transitioning from one provider to another is something I’ve overseen before in my day job, but doing all the work myself was useful. Before I started the process, I had to choose a provider with the right infrastructure and an SLA to match my needs. (I chose TSOHost and I’m happy with them.) Then I had to commission a new database within slightly different constraints, migrate the data over, and switch the traffic over to the new site. None of this was massively difficult, but the work took about a day nonetheless.
  • How to register a business in the UK.
  • How to track business finances. Fortunately the costs of a digital business are easy to track, so a simple Google spreadsheet suffices.
  • How to complete the year-end tax process. I haven’t had to do this yet – this is likely to be the real learning curve.
  • How to comply with the EU’s new rules on VAT for digital services. The EU changed its laws for how VAT is charged on digital services. Rather than being charged at the rate of the country of the person selling the service, VAT has to be charged at the rate in the country of the person buying the product. You have to collect evidence to prove this, and you can either make VAT payments to each EU country separately, according to its own processes, or you can submit centralised returns through the UK VAT MOSS service. Both of these processes seemed tedious and onerous, so I opted instead to use a payment provider that would handle VAT on my behalf. This meant switching from Stripe or PayPal (I’d been testing out both) to Paddle. Here’s a blog post on Paddle’s site discussing the changes.
  • How to set up a Content Delivery Network. Your website lives on a web server – a computer probably sitting in a datacentre somewhere. If someone on the other side of the world wants to see one of your web pages, the request has to travel all the way to your web server and the response all the way back. It would be quicker if they were requesting the web page from somewhere closer to home. A CDN is a network of web servers around the world that does this task. A CDN can also help take some of the load off your web server – if people are getting pages from the CDN rather than your web server, the web server doesn’t have to work as hard. I use the Cloudflare CDN. It has a free tier and is easy to set up. It disrupted the site’s HTTPS setup for a few hours while I turned it on, but other than that, setting up the CDN was easy.

    Cloudflare CDN usage graph example
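As a rough illustration of the parameterised database queries mentioned in the list above, here is a minimal sketch. It uses Python’s built-in sqlite3 module rather than the site’s actual PHP and MySQL code, and the users table, its columns and the find_user helper are all hypothetical names chosen for the example.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, handle: str) -> list[tuple]:
    """Look up a user by handle using a parameterised query (hypothetical helper)."""
    # The ? placeholder passes the value separately from the SQL text,
    # so user input is treated as data and never executed as SQL.
    cursor = conn.execute(
        "SELECT id, handle FROM users WHERE handle = ?", (handle,)
    )
    return cursor.fetchall()


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT)")
conn.execute("INSERT INTO users (handle) VALUES (?)", ("alice",))
conn.commit()

normal_result = find_user(conn, "alice")
# A classic injection attempt is matched literally, so it finds nothing.
attack_result = find_user(conn, "' OR '1'='1")
```

Had the query been built by gluing the input into the SQL string, the second lookup could have returned every row in the table.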
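The daylight saving problem from the list above comes down to the gap between local time and UTC shifting twice a year. The sketch below is not the site’s actual implementation; it assumes Python’s zoneinfo module (Python 3.9+, backed by the system timezone database or the tzdata package), and the send_time_utc helper and the 8am send time are made up for illustration.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def send_time_utc(day: date, tz_name: str, local_hour: int = 8) -> datetime:
    """Return the UTC moment matching local_hour:00 on `day` in tz_name.

    Hypothetical helper: the name and the 8am default are assumptions.
    """
    local = datetime.combine(day, time(hour=local_hour), tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


# UK clocks went forward on 29 March 2015.
before = send_time_utc(date(2015, 3, 28), "Europe/London")  # 08:00 UTC
after = send_time_utc(date(2015, 3, 30), "Europe/London")   # 07:00 UTC
```

A scheduler that stores a fixed UTC hour for each user will drift by an hour when the clocks change, which is why the send time probably needs recomputing from the user’s timezone each day.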
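To give a flavour of the batch operations mentioned under refactoring in the list above, here is a small sketch. It again uses Python’s sqlite3 module as a stand-in for the real PHP and MySQL code, and the tweets table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, score INTEGER)")

# 1,000 hypothetical rows to store.
rows = [(tweet_id, tweet_id % 10) for tweet_id in range(1, 1001)]

# Unbatched version, shown for contrast: 1,000 separate calls and commits.
# for row in rows:
#     conn.execute("INSERT INTO tweets (id, score) VALUES (?, ?)", row)
#     conn.commit()

# Batched version: one executemany call inside a single transaction.
with conn:
    conn.executemany("INSERT INTO tweets (id, score) VALUES (?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

With an in-memory database the saving is small, but against a database server reached over a network each avoided call is a round trip saved. How a given driver batches the work under the hood varies.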

Broader lessons

  • Starting with a Minimum Viable Product is the right way to build a digital product. I started off by building the simplest possible tool to meet my needs, and then I successively improved it from there until I had a product that was ready to launch. Had I tried to build everything in one go, I wouldn’t have known how to do it, or what I was trying to build. And I wouldn’t have had the motivation and insights provided by being able to use the tool myself from early on in the process. It would not even have been clear whether what I was building was going to be useful to anyone until after it had been launched. The Lean Startup methodology tells us to start off by building a Minimum Viable Product – the most basic version of your product that is still valuable – and the Agile Manifesto similarly favours delivering working software early and often. I already had an intellectual appreciation of this, but it’s always useful to test theories out in practice.

    How to build a minimum viable product diagram

  • The power of self-directed learning. This project has been entirely self-directed. This has been thrilling – a chance to follow my interests and to constantly reassess what I need to do – or learn – next. You learn a lot more, and a lot more broadly, if you’re learning to achieve a personal goal.
  • The power of emergent possibilities and understanding. When I started this project, I didn’t know everything upfront. I didn’t even know what I would need to know. The Agile approach to projects – release early, and iterate based on evidence of real user behaviour – is thoroughly vindicated by this. Working on this project has strengthened my understanding of, and commitment to, the Agile and Lean principles of releasing early and improving your product from there.
  • The importance of knowing how you work – and what you need to work. Working on a side project is different to working your day job. I’ve become more sensitive to when I’m in the right headspace to do some work on My Social Summary, and when I just need to take it easy. I also have a better understanding of the conditions I need to get work done outside of my normal job. I tend to work quite well in uninterrupted stretches of at least an hour and a half – especially if I’m programming.
  • The importance of dealing with changing circumstances. The external environment changes. During this project, VAT legislation changed and Twitter’s naming rules changed. You need an approach to building a product that is not only resilient to this, but embraces it. Again, if I hadn’t been following an Agile approach, I’d have been in trouble.
  • To trust in your ability to keep learning and know more than you know now. Taking on a self-directed project builds self-confidence and self-reliance.

I’d love for you to check out what I’ve been working on. My Social Summary has a free one-month trial, with no card details needed, so see if it can help you get more from Twitter.

What’s next for me?

  • Spread the word about My Social Summary.
  • Explore how we might overcome the ‘filter bubble’ that seems to be narrowing our political and cultural discourse. This feels like an important design challenge.
  • From a computer science perspective, I’d like to build something in a different, and more modern, technology stack. Probably Node.js / Foundation (for front-end) / PostgreSQL / Heroku.