GOSTAR is the largest manually annotated structure-activity relationship (SAR) database of small molecules published in leading medicinal chemistry journals and patents. It covers compounds from both discovery and development stages across all target families. Along with SAR, key properties such as ADME and toxicity are captured. This relational database lets users navigate and analyze a large body of small-molecule data to make informed decisions in the design and discovery of novel compounds.

Content coverage

The GOSTAR database content is compiled from various sources, including:

  • MedChem Journals
  • Patents
  • FDA/EMEA/PMDA Reports
  • Clinical Trial Registries
  • Scientific Reviews
  • Company Websites
  • Books
  • Conferences
  • Public Sources

Figure 1. A quick view of content covered and sources of the content.

Patents covered in 2020

Patent coverage in the GOSTAR database is comprehensive: content was indexed from more than 2900 patents in 2020. GOSTAR avoids duplication in the database by not capturing equivalent patents, i.e. the same patent published in multiple patent offices.

Table 1: Number of patents covered in the 2020 updates, by patent office.

Preclinical candidates covered in 2020

In 2020, the GOSTAR database was enriched with 1500+ preclinical compounds for indications such as COVID-19, non-alcoholic steatohepatitis (NASH), hepatitis virus infections, HIV infections, cardiovascular diseases, and various cancers.


A few significant drug inclusions in 2020 were:

  • EPV-COV19
  • FT-8225
  • VNRX-9945
  • CARG-201
  • S-540956
  • BMS-818251
  • BRII-732
  • CR-13626
  • NAB815
  • CV730
  • GLPG-4124
  • IDG-16177

Target space covered in 2020 updates

New content was added for more than 2500 protein targets in 2020. Content for EGFR was updated from 200+ references, for the Adenosine A2A receptor from 86 references, and for KRAS from 54 references, while NOTCH made it into the top 20 with around 4.7K compounds covered from a single reference (Table 2).

Table 2: List of top 20 targets covered in 2020 updates

Distribution of SAR content

Figure 2. Assay-wise distribution of SAR content covered in 2020.

Of the 1.2 million SAR rows added to GOSTAR, functional in vitro and in vivo assays contribute 41.25% of the data, binding assays constitute 32.28%, and ADME properties make up 6.69%.

Approximately 2% of the content concerns toxicity properties of the compounds covered in 2020, and the remaining 17% or so represents other property types, including physicochemical properties.
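To put the percentages quoted above into absolute terms, the short Python sketch below converts each share into an estimated row count. It is only illustrative: the total of 1.2 million rows is a rounded figure, the toxicity share is stated only as approximately 2%, and the category labels and the helper name `estimated_rows` are assumptions made for this example, not part of GOSTAR. With these inputs, the residual "other" share works out to about 17.8%.

```python
# Approximate SAR row counts per category, derived from the quoted percentages.
TOTAL_ROWS = 1_200_000  # "1.2 million" is a rounded figure, so all counts are estimates

# Shares quoted in the text, in percent (labels are illustrative).
shares = {
    "functional (in vitro and in vivo)": 41.25,
    "binding": 32.28,
    "ADME": 6.69,
    "toxicity": 2.0,  # stated only as "approximately 2%"
}

# Whatever is left over is the "other property types" share.
shares["other (incl. physicochemical)"] = round(100 - sum(shares.values()), 2)


def estimated_rows(total, percent):
    """Return the estimated number of rows for a percentage share (hypothetical helper)."""
    return round(total * percent / 100)


for category, percent in shares.items():
    rows = estimated_rows(TOTAL_ROWS, percent)
    print(f"{category:35s} {percent:6.2f}%  ~{rows:,} rows")
```

Because the inputs are rounded, the resulting counts should be read as order-of-magnitude estimates rather than exact database statistics.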