Search Quality – Validating Protein-Protein and Host Gene-Virus Interactions using NLPCORE

As we round out our product features and improve our platform and our core text mining and entity extraction algorithms, we have also focused on validating the quality of our search results. Thanks to our collaborators at the University of Washington and the Center for Infectious Disease Research (CIDR), we chose two representative life sciences data sets: the most commonly used set of Protein-Protein interactions and an experimentally discovered set of Host Gene-Virus interactions. Using these sets, we were able to validate not only a high recall rate but also good precision, by identifying interactions that were not mentioned in the experimentally discovered set.
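
For readers less familiar with the terms, here is a minimal sketch of how recall and precision can be computed for a set of extracted interactions against a reference set. It is an illustration only, not our actual evaluation pipeline: the function names, the toy identifiers (P1, P2, ...) and the choice to treat an interaction as an unordered, case-insensitive pair are all assumptions made for this example.

```python
def normalize(pair):
    """Treat an interaction as an unordered, case-insensitive pair (assumption)."""
    a, b = pair
    return frozenset((a.strip().upper(), b.strip().upper()))


def evaluate(predicted, reference):
    """Return (recall, precision, extra) for predicted vs. reference pairs.

    extra holds predicted pairs that are absent from the reference set.
    A strict count treats them as misses, but they may also be genuine
    interactions the reference simply does not list, so they are worth
    reviewing by hand.
    """
    pred = {normalize(p) for p in predicted}
    ref = {normalize(p) for p in reference}
    hits = pred & ref
    recall = len(hits) / len(ref) if ref else 0.0
    precision = len(hits) / len(pred) if pred else 0.0
    return recall, precision, pred - ref


# Hypothetical toy data, not taken from our test sets.
reference = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
predicted = [("p2", "p1"), ("P2", "P3"), ("P3", "P4"), ("P1", "P5")]

recall, precision, extra = evaluate(predicted, reference)
print(f"recall={recall:.2f} precision={precision:.2f}")
print("not in reference:", [sorted(p) for p in extra])
```

In this toy case three of the four reference pairs are recovered, and one predicted pair falls outside the reference set. Pairs of that last kind are the ones that need manual review before deciding whether they are errors or interactions missing from the reference.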

 

Here is a link to our Application Note (DRAFT), which we intend to revise early in the new year with the latest iteration of our core algorithms. And here is the link to its Supplementary Information, with more details on our methods and the data sets used in our tests.