REST’ing on Containers…
A lot has been written and said about Web Services over the years, and Docker containers are the latest buzz. My first experience building and using web services came at Microsoft, on the Dynamics CRM 3.0 engineering team, where we built, consumed and exposed a set of entity (contacts, tasks, leads…) web services as an abstraction over traditional relational database tables. I still fondly recall some of our tongue-in-cheek conversations about the number of times a single call from the web user interface to the middle tier to the database had to go from object to XML, back to object, then back to XML and so on, before eventually becoming a SQL SELECT, UPDATE or DELETE statement for SQL Server (the eventual bottleneck for growing beyond enterprise to web scale). As for containers, back in 2008 I had an engaging discussion with Internet Explorer developers about using Windows Virtual PC technology to offer backward compatibility while freely building groundbreaking new features.
Fast-forward to the present day and our startup NLPCORE, however, and it is a completely different ball game altogether. See my earlier post (http://www.nlpcore.io/?p=36) for a brief overview of what we are about and our hardware experiments (by the way, we have since added massive computing and storage capacity of our own with a couple of blade servers, and yes, the liquid-cooled CPU/GPU monster is still happily crunching every computation thrown at it!). In this post I will focus on our implementation stack and how our early conviction about Web Services and Docker containers has paid us rich dividends after a few iterations, trials, tribulations and eventual triumphs.
Here is a brief lowdown on how we have stacked our components together. Each component is encapsulated in a virtual container that interacts with the others through well-defined endpoints. That allows us to independently revise, upgrade or even replace each component, or to maintain multiple versions of it concurrently.
Containers help us encapsulate all dependencies at their tried and tested versions, configure their settings and deploy just the required component. Our components expose and consume well-defined, versioned web services to communicate with each other as well as with third parties, provided they hold proper authentication and access tokens issued by our identity management system (it uses OAuth protocols, so we do not need to create or maintain user IDs or email addresses at our end).
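To make the idea of versioned, token-protected endpoints a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `/v1/search` and `/v2/search` paths, the handler names, and the in-memory `VALID_TOKENS` table (a real deployment would ask the identity provider to validate the token). It is not our actual API.

```python
# Hypothetical token table; in practice the identity provider validates tokens.
VALID_TOKENS = {"demo-token"}


def search_v1(query):
    # Older response shape: a flat list of strings.
    return {"version": 1, "results": [query.upper()]}


def search_v2(query):
    # Newer response shape: structured results. v1 stays available alongside it.
    return {"version": 2, "results": [{"text": query.upper(), "score": 1.0}]}


# Both versions are routed side by side, so clients can migrate at their own pace.
ROUTES = {
    ("GET", "/v1/search"): search_v1,
    ("GET", "/v2/search"): search_v2,
}


def handle_request(method, path, headers, query):
    """Return (status_code, body) for a request, checking the bearer token first."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or token not in VALID_TOKENS:
        return 401, {"error": "invalid or missing access token"}
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, {"error": "unknown endpoint"}
    return 200, handler(query)
```

The point of the sketch is that an old client calling `/v1/search` keeps working unchanged while a new one adopts `/v2/search`, and neither gets a response without a valid token.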
Putting it all together
All our code is written in Python, and we recently integrated Swagger, which helps us tremendously with API references, documentation and samples. We host our own source control platform (GitLab) so that we can maintain properly versioned source code and enable multiple dev teams to independently check out code, make changes, and merge them back in.
Our build process is also fully automated and follows a CI/CD model. As part of code development, developers write check-in test scripts that our build engine executes after successful compilation; if the tests pass, it goes on to build a complete Docker container image with the appropriate dependencies automatically pulled down and baked into the image. Thanks to Docker's incremental imaging ability, subsequent builds after a code change only require a delta image to be created. A developer can therefore build our entire platform from scratch and continue to work on any one portion, with the rest of the containers remaining unchanged and accessible for testing and verification.
Besides our development stack, Varun has gone ahead and put together our blog/documentation site at http://nlpcore.io, a self-hosted WordPress site running as (what else?) another Docker container.
Our entire customer-facing, internal and partner-facing enterprise (internet, intranet and extranet) is therefore running on a number of Docker instances that are properly isolated and connected through a gigabit switch, the internet and virtual private networks where appropriate, with each component protected by authenticated access. Besides hosting it all ourselves on a set of blade and custom-built servers, we subscribe to an off-the-shelf cloud storage service to ensure our data is safely backed up somewhere else.
Frankly, coming from an enterprise into a startup with minimal resources, I am super impressed and amazed by what our brilliant CTO Varun Mittal has assembled while managing his work and studies. The Docker infrastructure that Varun put together not only helped UW unblock their labs and earned him a well-deserved Research Assistantship, it is now also a published paper (we'll add references at our site soon)!
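The gate described above (run the check-in tests, and build the image only if they pass) can be sketched roughly as follows. This is an illustration under assumptions, not our actual build engine: the step list, the `tests` directory, and the use of pytest are hypothetical; `docker build -t <tag> .` is the standard Docker CLI invocation. The command runner is injectable so the logic can be exercised without Docker installed.

```python
import subprocess


def default_runner(cmd):
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_pipeline(image_tag, runner=default_runner):
    """Run the steps in order; stop at the first failure.

    Returns (succeeded, names_of_completed_steps).
    """
    steps = [
        # Hypothetical check-in test step (assumes tests live in ./tests).
        ("check-in tests", ["python", "-m", "pytest", "tests"]),
        # Build the image; Docker reuses cached layers that have not changed.
        ("image build", ["docker", "build", "-t", image_tag, "."]),
    ]
    completed = []
    for name, cmd in steps:
        if runner(cmd) != 0:
            return False, completed  # later steps never run
        completed.append(name)
    return True, completed
```

Because the image build comes after the tests in the list, a failing test run means no image is produced at all.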
Putting the platform to the test!
As Varun made progress on putting together our hardware infrastructure, refactoring and rewriting the majority of the existing code into discrete components hosted in Docker instances, and improving the core NLP/ML algorithms (more on this in a future post), he showed us the new results and APIs at work, very impressively. However, we needed a real-world test case, and at the same time we were really hard pressed to find him solid engineering support for revamping our own web user interface. After a few failed attempts through internship offers, and my own half-hearted attempts at taking up UX development (I haven't given up just yet! Thanks to Coursera, I have sped through quite a few Python and HTML5/CSS courses, so I will be at it again soon!), we finally decided to go all out and find a serious third-party partner who could take this on end to end. And I was fortunate to reconnect with an old friend who happened to be just the partner we needed!
I wrote an engineering requirements document that heavily leveraged our existing proof-of-concept implementation at http://nlpcore.com and our planned Web Services (exposing a clean interface across each component, as depicted in the architecture diagram above) and handed it off to our new partners to start rebuilding our interface from the ground up! They recommended and chose Vaadin, a server-side Java framework for web apps, which was something new for us to learn and play with!
After a couple of weeks of ramp-up, we are thoroughly pleased to report that our bets have paid off! Having a third party develop our own user interface at arm's length using (supposedly) well-defined web services is a true test of our decoupled architecture and one of the best ways to eat our own dog food (our cloud search platform).
The new user interface (still a work in progress) is completely data-driven and decoupled from our platform (even the notion of entity types such as genes, proteins, people… is completely abstracted; that is another future post!). Thanks to Vaadin, we can not only easily maintain and extend the server-side Java modules, but also apply different CSS styles to change the look and feel of the interface without changing any code (which is written in Java and maintained by the application server).
The magic of CI/CD helps us take regular drops (as often and as early as possible) and deploy them on another Docker instance that we surface to our pilot users at http://beta.nlpcore.com.
We are very excited and optimistic about completing our planned features across the search platform web services, as well as our life sciences search, collaboration and procurement solution, in the next couple of months. We already have a number of pilot customers identified and will be circling back to get them to try out our life sciences solution, provide feedback (a capability baked right into the solution as part of its collaboration features) and help us get deployed deeply in the biotech community.
Furthermore, we will be documenting our web services platform and writing samples for it. We envision that a wide spectrum of life sciences researchers, coders, enterprise search consumers, developers and add-on developers (proprietary data formats, data stores…) will find it super attractive and super easy to work with. We will ourselves provide working components (including a couple of search algorithms that can be plugged right into our interface) as samples to jump-start this community.
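As a rough illustration of what "data-driven" means here, the sketch below renders an entity purely from a schema that the service would supply, with no code specific to genes, proteins or people. The actual interface is written in Java with Vaadin; this Python version, along with the schema layout and the field names in it, is hypothetical and only shows the principle.

```python
def render_entity(entity, schema):
    """Build display rows for an entity using only the schema for its type."""
    fields = schema[entity["type"]]["fields"]
    return [
        f"{field['label']}: {entity['attributes'].get(field['name'], '')}"
        for field in fields
    ]


# Hypothetical schema, as it might be returned by a web service call.
schema = {
    "gene": {"fields": [{"name": "symbol", "label": "Symbol"},
                        {"name": "organism", "label": "Organism"}]},
    "person": {"fields": [{"name": "name", "label": "Name"}]},
}

gene = {"type": "gene", "attributes": {"symbol": "BRCA1", "organism": "Homo sapiens"}}
rows = render_entity(gene, schema)
```

Adding a new entity type then means adding a schema entry on the platform side, with no change to the rendering code.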