Installing Open edX on Microsoft Azure

We've been getting to know the Open edX platform here at Microsoft, and of course we want to run it on the Azure cloud platform. So I put together a couple of notes that walk through the process of getting Open edX up and running.

Provisioning a Linux Virtual Machine

Azure offers a broad variety of cloud services, but for Open edX we're going to use an IaaS delivery model, running it on a hosted Ubuntu Linux virtual machine. Sign in to the Azure portal if you haven't already done so, and create a new virtual machine.

Today, Open edX is supported on Ubuntu 12.04 LTS, so we'll go ahead and use that image from the gallery.

[Screenshot] Creating an Ubuntu 12.04 virtual machine

Security prerequisites: SSH certificates

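While you can use a password to authenticate access to the server, best security practice is to create passwordless virtual machines secured with a public / private keypair for SSH. And anyway, as the warning further down explains, the Open edX installation disables password authentication.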

The Azure site has detailed instructions on how to create an OpenSSL certificate for various client platforms; here I'll simply note the steps I used on my own Windows 8.1 machine. You'll need a copy of openssl, which I had already installed on my machine as a dependency of Git.

Firstly, you'll need to create a certificate with openssl.

C:\Scratch> "C:\Program Files (x86)\Git\bin\openssl.exe" req -x509 -nodes ^
 -days 365 -newkey rsa:2048 -keyout edx.key -out edx.pem ^
 -config "C:\Program Files (x86)\Git\ssl\openssl.cnf"

Loading 'screen' into random state - done
Generating a 2048 bit RSA private key
.+++
..........................................................+++
writing new private key to 'edx.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:WA
Locality Name (eg, city) []:Redmond
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Microsoft Corporation
Organizational Unit Name (eg, section) []:
Common Name (eg, YOUR name) []:Tim Sneath
Email Address []:tims@microsoft.com

The output .pem file (the certificate) will be used to provision the virtual machine in a minute, while the .key file (the private key) stays on the client and is what you'll use to connect to the server. Once the VM is provisioned, you'll open a terminal using ssh.
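If you're working from a Unix-like shell rather than the Windows command prompt, the same certificate can be generated without the interactive prompts, and it's worth a quick check that the key and certificate really belong together. This is only a sketch: the subject values and the throwaway working directory are placeholders of mine, and it assumes openssl is on your PATH.

```shell
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory so nothing in the current folder is overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Same request as above, but -subj supplies the Distinguished Name up front.
# The subject values are placeholders, not the ones used in this post.
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout edx.key -out edx.pem \
  -subj "/C=US/ST=WA/L=Redmond/O=Example Org/CN=edx-demo" 2>/dev/null

# Show who the certificate was issued to and when it expires.
openssl x509 -in edx.pem -noout -subject -enddate

# An RSA certificate and private key belong together if their moduli match.
cert_mod="$(openssl x509 -in edx.pem -noout -modulus)"
key_mod="$(openssl rsa -in edx.key -noout -modulus 2>/dev/null)"

if [ "$cert_mod" = "$key_mod" ]; then
  echo "edx.key matches edx.pem"
else
  echo "edx.key does NOT match edx.pem" >&2
fi
```

If the two moduli differ, the .pem you upload to Azure won't correspond to the key you try to connect with, so it's cheaper to find that out now.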

I tend to use putty on Windows: it's a portable 500KB executable that is well-featured and very serviceable for our needs. You'll want to convert the key into a format that putty understands, so let's do that too. It's a two-stage operation - first converting to an RSA key and then converting it to the putty private key file format.

C:\Scratch> "C:\Program Files (x86)\Git\bin\openssl.exe" rsa -in edx.key -out edx.rsakey
writing RSA key

Now from the separate puttygen utility (also available from http://putty.org), we can load the RSA key we've just generated and then save a private key to a local file.

Here's the puttygen app - you'll use the Load and Save private key buttons:

WARNING: you might be looking at this and thinking that a username / password combination just seems like a lot less work. But here's the thing - the Open edX production server installation process disables password authentication, so unless you reconfigure the server at the end of the process, you'll be locked out the next time you try and reconnect. Trust me: it's worth getting SSH set up for key-based authentication before you get too far down the road.

Completing the Azure virtual machine configuration

With our certificate .pem file in hand, we can now proceed to configuration of the Open edX instance.

Pick the latest release of the image to minimize the number of post-installation updates, and give the machine a name. During setup, I prefer to pick a larger machine size to speed up the installation process, since it's resource intensive; it's easy to reconfigure the machine afterwards to something more appropriate, particularly for a development environment. Note too that best practice for production sites is to rename the user account from the easily guessable azureuser default.

[Screenshot] Choosing installation settings for an Open edX stack

Most of the other settings Azure provides are sane, but make sure you pick an appropriate region for your machine. Azure will now provision the VM with the default Ubuntu image, which should take no more than a couple of minutes.
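If you'd rather connect with the OpenSSH client than putty (from a Mac or Linux machine, say), the edx.key file generated earlier can be used as it is. Here's a rough sketch of the command; the host name, port and key location are hypothetical values of mine, so substitute whatever the portal shows for your VM.

```shell
# Hypothetical values: substitute your own DNS name, user name, port and key location.
vm_host="my-edx-vm.cloudapp.net"
vm_user="azureuser"
vm_port=22
key_file="$HOME/.ssh/edx.key"

# ssh refuses private keys that other users can read, so tighten permissions first:
#   chmod 600 "$key_file"

# -i selects the private key; -p selects the public SSH port of the VM.
ssh_cmd="ssh -i $key_file -p $vm_port $vm_user@$vm_host"

# Print the command rather than running it, so you can check it before connecting.
echo "$ssh_cmd"
```

Once the printed command looks right for your machine, run it directly.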

Before we start doing anything with Open edX, let's make sure that any security updates and important patches are installed:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade

Ignore any offer Ubuntu makes to upgrade to a later version (e.g. 14.04 LTS) - Open edX is at the time of writing only supported on 12.04 LTS. A shutdown / reboot is probably worthwhile at this point just to make sure that you have a stable working base for Open edX:

sudo reboot
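Once you're back in after the reboot, it doesn't hurt to confirm that the machine is still on the release you expect. A minimal sketch, assuming lsb_release is available (the is_supported_release helper is just a name I made up for illustration):

```shell
# Succeeds only for the release this walkthrough targets.
is_supported_release() {
  [ "$1" = "12.04" ]
}

# lsb_release -rs prints just the release number.
# Fall back to "unknown" on systems that don't have lsb_release.
current_release="$(lsb_release -rs 2>/dev/null || echo unknown)"

if is_supported_release "$current_release"; then
  echo "Ubuntu $current_release: OK to continue"
else
  echo "Ubuntu $current_release: not the 12.04 release this guide assumes"
fi
```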

Installing Open edX

Once the server is back up and you've reconnected to it, it's time to get to work.

As someone who doesn't do Linux devops for a living, I've been learning some new tricks as I've been doing this. When I first started, I'd get frustrated because ssh would sometimes time out despite a solid internet connection, and whatever that session was doing would just get killed. That's fine if you're just browsing around in a shell, but not so good when you're halfway through a lengthy install operation and your machine is now in an inconsistent and unknown state.

So I've become fast friends with screen, which acts as a persistent window session manager. If the ssh session drops for any reason while screen is running, you can always reconnect to the existing screen session. You can also use screen to log commands and output, which helps when tracking down a problem. Other terminal multiplexers are available, notably tmux, but I like the simplicity of screen. So go ahead and create a screen session before doing anything else:

screen -d -RR

If your ssh session gets disconnected, simply log in again and issue the command above to be reconnected to that session. You can of course find lots more information about screen online, including this GNU Screen survival guide posted on Stack Overflow.
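If you want the logging mentioned above, or prefer to give the session a name you can reattach to explicitly, something along these lines should work. The session name and the two helper functions are my own inventions for illustration; the flags are standard screen options.

```shell
session_name="edx-install"   # hypothetical name; any label will do

# Start a named session with logging switched on.
# -S names the session; -L logs window output to screenlog.0 in the current directory.
start_session() {
  screen -L -S "$session_name"
}

# After a dropped connection: list the sessions, then detach the named one
# from wherever it was attached (-d) and reattach it here (-r).
resume_session() {
  screen -ls
  screen -d -r "$session_name"
}
```

Call start_session before kicking off the install, and resume_session after logging back in.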

The Open edX installation itself should be relatively painless, albeit lengthy. There's a simple shell script that performs the install, and you can run that by piping it to bash, like so:

wget https://raw.githubusercontent.com/edx/configuration/master/util/install/vagrant.sh -O - | bash

More information on the steps in this script can be found in the edX documentation on GitHub.
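If piping a script straight from the web into bash makes you nervous, an alternative is to download it first, read it, and then run it while keeping a copy of the output. The sketch below is my own variation rather than anything from the edX instructions; run_logged and install.log are hypothetical names.

```shell
#!/usr/bin/env bash
# Make a pipeline report the installer's failure instead of tee's success.
set -o pipefail

# Run a script under bash, mirroring everything it prints into a log file.
run_logged() {
  local script="$1" log="$2"
  bash "$script" 2>&1 | tee "$log"
}

# Intended use on the server (not executed here):
#   wget https://raw.githubusercontent.com/edx/configuration/master/util/install/vagrant.sh -O vagrant.sh
#   less vagrant.sh
#   run_logged vagrant.sh install.log
```

The log is handy if a step fails and you need to quote the error when asking for help.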

The script grabs a couple of hundred megabytes of libraries and tools, including python, git, gcc, perl, nginx, mongodb, and ansible. This is when you're going to appreciate having selected a powerful Azure instance with plenty of memory. It takes time: even on my standard tier D4 instance, with eight cores and 28GB of memory, the install took nearly an hour. A few steps ("install python base-requirements", "code sandbox | Install sandbox requirements into sandbox venv" and "gather static assets with paver") individually took nearly ten minutes to complete on my machine. And don't panic when you see a couple of 'failed' warnings, for instance this one, where the script checks for the existence of ruby before installing it:

[Screenshot] Installing edX

With luck, the installation will eventually finish and return you to a shell prompt. If you're unfortunate enough to experience a problem, check the edx-ops forum to see if others have experienced the same issue.

Here's what a successfully finished installation looks like:

[Screenshot] Completed installation

At this point, it's probably worth rebooting the server; indeed, Open edX may require it for some configurations:

sudo reboot

Finishing Up

Now that the installation is complete, you'll need to open the relevant endpoints on the virtual machine so that you can access both the content management system and the learner environment from a web browser. By default, the Open edX learner environment is on the standard HTTP port 80, and the authoring studio environment is on port 18010, so you'll need to create standalone endpoints for each. Unless you need to present a different public-facing port, it's easiest to simply pass through from the public to private ports:

[Screenshot] Configuring Azure endpoints for Open edX

And that's it! You should now be up and running.
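If you'd like to check the two endpoints from a command line before opening a browser, curl can report the HTTP status of each. Again, this is only a sketch: the host name is a hypothetical placeholder and http_status is a helper name of my own; the ports are the defaults mentioned above.

```shell
vm_host="my-edx-vm.cloudapp.net"      # hypothetical; use your VM's DNS name
lms_url="http://$vm_host/"            # learner environment, port 80
studio_url="http://$vm_host:18010/"   # authoring studio environment

# Print only the HTTP status code for a URL, giving up after ten seconds.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# From a machine that can reach the VM:
#   http_status "$lms_url"
#   http_status "$studio_url"
```

If either request hangs or is refused, the first thing to recheck is the endpoint configuration in the portal.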

Here's how the vanilla, just-installed LMS configuration should look:

[Screenshot] Default Open edX platform configuration

...and here's the Studio site:

[Screenshot] Default Open edX studio configuration

In a future post, I'll write about configuring and customizing this default instance. But in the meantime, edX has some starter commands to help you manage the stack on their docs site.

Have fun!