Search Engine Improvements in Preview Release

After months of experiments, we are finally getting ready to unveil our new search engine along with a preview release of our life sciences research portal. Here is a brief summary of the improvements we have made to our core platform.

  • The new algorithm looks for search terms exhaustively across the entire corpus (currently over 10 million NIH open access articles and abstracts), which serves as the initial search space.
  • It then quickly narrows the search space based on the strength of word occurrences, their part of speech, their significance (as a dictionary term), and their relationship to other search terms.
  • We have incorporated dictionaries to improve search relevance and made the interface generic, so that we can continually add new dictionaries and results will improve automatically. Dictionaries can be applied freely, whether they list known proteins, reagents, or information categories. The search engine identifies matching words and improves their proximity search to surface more meaningful and contextually relevant items.
  • The new algorithm discovers new concepts above and beyond Bioentities and reagents, based on their relevance to your search terms and their frequent occurrence in the search space. These concepts let you further filter search results in conjunction with Bioentities and reagents, so you can precisely identify your candidate results.
  • We have made the search execution platform more robust and built a search progress monitoring interface into all requests. Users receive progress feedback across the various stages of query execution.
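
To make the narrowing and concept discovery steps above more concrete, here is a small Python sketch. It is illustrative only and is not the engine's actual implementation: the function names (`score_document`, `narrow`, `discover_concepts`), the scoring weights, the proximity window, and the toy corpus are all assumptions made for this example. Part-of-speech weighting is left out to keep it short.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_document(text, query_terms, dictionary, window=5):
    """Score one document (hypothetical weights, chosen for illustration).

    `dictionary` is assumed to be a set of lowercase terms.
    """
    tokens = tokenize(text)
    terms = {t.lower() for t in query_terms}
    positions = [i for i, tok in enumerate(tokens) if tok in terms]
    if not positions:
        return 0.0
    # Strength of occurrence: one point per query-term hit.
    score = float(len(positions))
    # Significance: extra weight for matched terms that are dictionary entries.
    matched = {tokens[i] for i in positions}
    score += 2.0 * len(matched & dictionary)
    # Relationship to other terms: bonus when two different query terms
    # appear within `window` tokens of each other.
    for a, b in zip(positions, positions[1:]):
        if tokens[a] != tokens[b] and b - a <= window:
            score += 1.0
    return score


def narrow(corpus, query_terms, dictionary, keep=10):
    """Start from the whole corpus and keep only the best-scoring documents."""
    scored = []
    for doc_id, text in corpus.items():
        s = score_document(text, query_terms, dictionary)
        if s > 0:
            scored.append((s, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:keep]]


def discover_concepts(corpus, doc_ids, query_terms, dictionary, top_n=3):
    """Suggest frequent terms in the narrowed set that are neither query
    terms nor dictionary entries, ranked by how many documents contain them."""
    exclude = {t.lower() for t in query_terms} | dictionary
    doc_freq = Counter()
    for doc_id in doc_ids:
        candidates = {
            t for t in tokenize(corpus[doc_id])
            if len(t) > 3 and t not in exclude
        }
        doc_freq.update(candidates)
    return [term for term, _ in doc_freq.most_common(top_n)]


# Toy corpus (made up for this example).
corpus = {
    "a1": "Kinase inhibitor binds the p53 protein in tumor cells",
    "a2": "The protein folding assay used tumor samples",
    "a3": "Weather patterns in coastal regions",
}
query = ["p53", "protein"]
proteins = {"p53"}  # a tiny stand-in for a protein dictionary

hits = narrow(corpus, query, proteins, keep=2)
print(hits)                                                       # ['a1', 'a2']
print(discover_concepts(corpus, hits, query, proteins, top_n=1))  # ['tumor']
```

In this sketch, adding another dictionary only means passing in a larger set of terms; the scoring code itself does not change, which is the spirit of the generic dictionary interface described above.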