It has been quite a year: we rewrote our user interface (read more at nlpcore.io/user-documentation), revamped our core search algorithms (read more at nlpcore.io/search-engine-improvements-in-preview-release), and revised our core infrastructure for scale (read more at nlpcore.io/revamping-it-all). To round it all off, we now invite you to preview our development platform, complete with secure web service APIs and code samples, all hosted on widely used cloud services.
We have built our APIs using the Swagger (swagger.io) framework for RESTful web services and host these services on the Tyk (tyk.io) API Gateway and Management Platform. Both are trusted and widely used tools in the developer community. We have also integrated them with our identity and authentication management services, which are built on FreeIPA (freeipa.org).
Developers Portal (developers.nlpcore.com)
Registration and Secure Access to Projects
To access our platform APIs, developers register on our portal's registration page (developers.nlpcore.com/portal/register/). Once registered, they can log in to the portal and request an API key (developers.nlpcore.com/portal/apis) to get started; the key only needs to be requested once. They can then register for any publicly available projects and/or create new projects, to which they can upload their own documents or web pages for search.
To support this secure access for both developers and our web search UI, we maintain user identities in our directory service, accessed over the LDAP protocol. This service can also authenticate users through third-party login services such as Google+ and Facebook. Once authenticated, users are added to their selected projects and assigned a policy that grants them access to our APIs.
The API catalog is maintained and documented using Swagger. Using our revised infrastructure (read more at nlpcore.io/revamping-it-all), we have implemented complete versioning support, including multiple versions running concurrently: for example, our beta and alpha sites run on different API versions but share the same extracted data. The Swagger site also lets you test any of our APIs. To do so, click the Authorize button on the page (on the right, above the API list) and enter your developer key.
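For calls made outside the Swagger page, the developer key has to travel with each request as well. The following minimal sketch builds, without sending, an authenticated request using only Python's standard library. It assumes a hypothetical base URL that includes a version segment (in the spirit of the concurrent API versions described above) and assumes the gateway reads the key from the Authorization header, which is Tyk's default for auth tokens but can be configured differently. Check the Swagger page for the actual base URL, paths, and header name.

```python
import urllib.request

# Hypothetical placeholders: use the base URL shown on the Swagger page and
# the key issued at developers.nlpcore.com/portal/apis.
BASE_URL = "https://api.example.com/v1"  # hypothetical host and version segment
API_KEY = "YOUR-DEVELOPER-KEY"


def build_request(path: str, api_key: str = API_KEY,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the developer key.

    Assumption: the gateway expects the key in the Authorization header
    (Tyk's default); the real deployment may use another header name.
    """
    # Join base URL and path without doubling or dropping the slash.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", api_key)
    req.add_header("Accept", "application/json")
    return req


# Example: a request for a hypothetical "projects" route.
req = build_request("projects")
print(req.full_url)  # https://api.example.com/v1/projects
```

Passing the returned object to urllib.request.urlopen would send it. The "projects" route is an illustrative guess, not a documented path.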
Our object model consists of the following components:
Project – Projects provide a scoping boundary for both security and document collections. Search results can span multiple projects, but the user must have access to each of them.
Document – A document is a body of text (and, in future, numeric data, images, audio, video, or any other media type): a file or a web page that is identified, stored, indexed, searched, and retrieved, as a whole or in part, through the topics, entities, or references it contains (see below).
Topic – A topic (also called a category or concept) is a named collection of extracted terms, relevant to the search keywords, drawn from a collection of documents. In life sciences, for example, topics include Proteins, Genes, and Cell Lines or, more generically, bio-molecules. We plan to provide People, Organizations, Places, Events, Products, Documents, and Concepts as broad general topics across all domains.
Entity – An entity is a term extracted from a document. An entity may be grouped into one or more topics, and it is considered related to other entities through annotations or descriptions in the documents where they occur within a certain distance of one another.
Annotation Reference – An entity reference (or annotation reference) is the surrounding text that contains an entity together with its related entities. Any two entities may have multiple references within or across documents, and the count of those references signifies the relative strength of their relationship.
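To make the object model concrete, here is a minimal in-memory sketch. It assumes each annotation reference records the set of entities that co-occur in its surrounding text; all class, field, and function names are hypothetical and do not describe the actual API response schema. It shows two ideas from the list above: projects scope which documents a user can search, and the number of references shared by two entities serves as their relationship strength.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Topic:
    name: str                   # e.g. "Proteins"
    entities: frozenset         # names of entities grouped under this topic


@dataclass(frozen=True)
class AnnotationReference:
    document_id: str
    text: str                   # surrounding text for the entities
    entities: frozenset         # names of entities co-occurring in this text


@dataclass
class Document:
    doc_id: str
    project: str                # the project that scopes this document
    references: list = field(default_factory=list)


def accessible(documents, user_projects):
    """Keep only documents whose project the user has access to."""
    return [d for d in documents if d.project in user_projects]


def relationship_strength(documents):
    """Count shared references for every unordered pair of entities.

    Pairs are stored in sorted order so ("A", "B") and ("B", "A") share a key.
    """
    counts = Counter()
    for doc in documents:
        for ref in doc.references:
            for a, b in combinations(sorted(ref.entities), 2):
                counts[(a, b)] += 1
    return counts
```

Because the counts run over whatever documents are passed in, filtering with accessible first means a user's relationship strengths reflect only the projects they can see.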
Our APIs are conveniently grouped into the following namespaces:
Project – The Project APIs return the list of available projects. For security reasons, we have chosen not to expose create or delete APIs for projects; instead, developers create and delete projects manually through our developers portal, as described above.
Document – These APIs allow developers to extract various document attributes: metadata, word graphs within and across documents, and related documents.
Search – These APIs allow developers to retrieve documents, search term suggestions, topics, and the relationship graph of extracted entities (topic instances).
Entities – These APIs allow developers to retrieve the surrounding text (annotation reference) for an entity extracted by our search platform.
Feedback – These APIs allow developers to record user feedback in our search platform. Users can save search results in arbitrary collections to recall later, mark any extracted result as relevant (or not), and reassign its topic name as they see fit. We use this feedback to improve subsequent search results, both individually and collectively, over time.
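One way to mirror these namespaces in client code is a small request builder with one method per call. The following sketch is hypothetical throughout: the route prefixes, query parameter names, and feedback fields are illustrative guesses rather than the documented API, and the key is assumed to go in the Authorization header. Requests are built but not sent, so the shapes can be inspected offline before being adapted to the routes listed in the Swagger catalog.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical route prefixes, one per namespace; the real ones may differ.
NAMESPACES = {
    "project": "projects",
    "document": "documents",
    "search": "search",
    "entities": "entities",
    "feedback": "feedback",
}


class NlpCoreClient:
    """Minimal, hypothetical request builder grouped by API namespace."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def _request(self, namespace, params=None, body=None):
        url = f"{self.base_url}/{NAMESPACES[namespace]}"
        if params:
            url += "?" + urllib.parse.urlencode(params)
        # A JSON body turns the call into a POST; otherwise it is a GET.
        data = json.dumps(body).encode("utf-8") if body is not None else None
        method = "POST" if data is not None else "GET"
        req = urllib.request.Request(url, data=data, method=method)
        req.add_header("Authorization", self.api_key)  # assumed header name
        if data is not None:
            req.add_header("Content-Type", "application/json")
        return req

    def list_projects(self):
        # Project namespace: read-only; projects are managed in the portal.
        return self._request("project")

    def search(self, project, query):
        # Search namespace; "project" and "q" are hypothetical parameter names.
        return self._request("search", params={"project": project, "q": query})

    def record_feedback(self, project, entity, relevant, topic=None):
        # Feedback namespace: mark a result relevant (or not) and optionally
        # reassign its topic name. Field names are hypothetical.
        body = {"project": project, "entity": entity, "relevant": relevant}
        if topic is not None:
            body["topic"] = topic
        return self._request("feedback", body=body)
```

Keeping project creation out of the client matches the design described above, where projects are created and deleted only through the portal.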
Besides the Swagger site, where a developer can test a single API in isolation, we have put together a few working samples that demonstrate specific life sciences use cases. For these we use the Jupyter Notebook (jupyter.org) interactive computing environment. Developers can modify the Python code to their liking (either just parameter values or entire code blocks) on the website itself, run it to see the results, or copy it for their own use.
While this is still a work in progress, we have now completed most of the building blocks we envisioned (and discovered!), and we have exciting things in store for 2018! We invite you to try our life sciences web search UI and our APIs, and to give us your feedback (firstname.lastname@example.org). Have a great season and a great start to 2018!