Patents are the Cornerstone for Advancing Data-Driven KRAS Drug Discovery

Overview

Patents are the cornerstone for advancing data-driven KRAS Drug Discovery. This case study describes how a leading US biopharmaceutical company active in KRAS research used our GOSTAR™ database and custom patent curation services to overcome the challenge of data acquisition. KRAS mutations are common genetic alterations in human cancers. Although KRAS was historically a challenging target, recent advances such as sotorasib have revitalized the field. Our collaboration provided a critical dataset of novel chemical structures and associated bioactivity (SAR) data sourced directly from patents, enabling the client to accelerate their search for effective small-molecule KRAS inhibitors and make informed decisions in their R&D programs.

Our client

Our client is a leading US biopharmaceutical company actively involved in KRAS research. Their research and development (R&D) efforts center on data-driven discovery programs aimed at identifying and developing novel small-molecule KRAS inhibitors to treat cancers linked to KRAS mutations.

Client’s challenge

A key requirement of the client’s R&D was to gather comprehensive structure-activity relationship (SAR) data and novel chemical structures directly from KRAS-related patents. Patents are a rich source of diverse chemotypes and industrial innovation, and they hold significantly more unique compounds than journal articles alone.

  • The number of patent applications featuring KRAS grew exponentially between 2014 and 2024, confirming the surging interest in KRAS-modulating small molecules.
  • GOSTAR, our platform renowned for patent-derived compounds, already offers 24 times as many unique compounds as ChEMBL, underscoring the gap between publicly available and patent-derived data.
  • The visualization of chemical space diversity using ChemTreeMap confirmed that GOSTAR™’s patent-derived chemical space is far more expansive and diverse than journal-based databases.

To maintain an up-to-date internal repository and effectively drive their medicinal chemistry programs in KRAS Drug Discovery, the client required expert assistance in extracting and validating high-quality, analysis-ready SAR data from this massive patent landscape.

Figure 1: GOSTAR® vs. ChEMBL statistics for Human KRAS, showing 16,444 compounds in GOSTAR.

Client’s goals

The primary goal of the US biopharmaceutical client was to secure a robust, comprehensive, and meticulously structured dataset of novel chemical structures and SAR data related to KRAS. This dataset was intended to achieve the following:

Fuel internal discovery

Immediately integrate the data into their in-house platform to accelerate their medicinal chemistry programs focused on developing small-molecule KRAS inhibitors.

Assess uniqueness

Effectively assess the uniqueness of their own solutions against the broad scope of industrial innovation captured in the patents.

Enhance decision-making

Use the structured patent information to inform crucial R&D decisions and swiftly identify potential opportunities within KRAS Drug Discovery.

Overcome data bottleneck

Eliminate the bottleneck created by the time-consuming process of manually extracting, validating, and structuring SAR data from the rapidly growing number of KRAS-related patents.

Our approach

The client engaged us due to our two decades of experience and our position as the global leader in the manual extraction of SAR data from scientific literature. Our solution addressed the critical bottleneck of manual data curation, enabling the client’s researchers to focus on identifying novel compounds.

Expert team

We leveraged our team of over 60 PhDs to identify relevant literature, extract appropriate data, and deliver it in a standardized, analysis-ready format.

Time savings

Our patent curation services reduced the weekly time spent by the client’s researchers on data extraction and curation by 60–70% (9–14 hours saved per week).

Quality process

Our rigorous curation process includes three stages—manual curation, review, and quality control—ensuring the extracted data is clean, consistent, and exceptionally accurate.

Our solution

We successfully delivered a comprehensive dataset that met the client’s requirement for immediate integration into their internal repository for further review and analysis. This dataset was built on high-fidelity SAR data extraction from patents essential for advancing KRAS Drug Discovery.

The supplied data included crucial, analysis-ready fields:

  • Chemical structure
  • Compound’s chemical and IUPAC name
  • Targets
  • Cell lines
  • Biological source or species
  • Experimental endpoints with prefixes and units of measurement
  • Assay conditions
  • InChI
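To make the shape of such a record concrete, here is a minimal sketch of how one curated measurement could be represented once loaded into an in-house platform. It is an illustration only, not the actual GOSTAR delivery schema: the class name `SarRecord`, every field name, the use of SMILES for the chemical structure, and the unit conversion table are assumptions. The example values are placeholders (ethanol stands in for a real compound, and the activity value is invented).

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative conversion factors to nanomolar (an assumed subset of units).
TO_NANOMOLAR = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6}


@dataclass(frozen=True)
class SarRecord:
    """One curated activity measurement; all field names are hypothetical."""

    structure_smiles: str   # chemical structure, assumed here to be SMILES
    compound_name: str      # chemical or IUPAC name
    inchi: str
    target: str
    species: str            # biological source or species
    endpoint: str           # e.g. IC50
    prefix: str             # relation qualifier such as "=", "<", ">"
    value: float
    unit: str
    cell_line: Optional[str] = None
    assay_conditions: Optional[str] = None

    def value_nm(self) -> float:
        """Return the endpoint value converted to nanomolar."""
        return self.value * TO_NANOMOLAR[self.unit]


# Placeholder values only: not a real KRAS inhibitor or a real measurement.
record = SarRecord(
    structure_smiles="CCO",
    compound_name="ethanol",
    inchi="InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    target="KRAS",
    species="Homo sapiens",
    endpoint="IC50",
    prefix="<",
    value=0.5,
    unit="uM",
)
print(record.prefix, record.value_nm(), "nM")  # prints "< 500.0 nM"
```

Keeping the prefix separate from the numeric value means a qualified result such as "< 0.5 uM" can still be sorted and filtered numerically without losing the qualifier.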

With our assistance, the client was able to proceed swiftly with their ongoing research program, avoiding the bottleneck of the data collection and standardization phase. The client valued the quality of our manual curation and quality control, and the engagement led to a long-term partnership for similar data curation requests. (Explore our core Data Curation Services.)

Chemical diversity visualization (ChemTreeMap): GOSTAR® vs. ChEMBL. Red dots (GOSTAR) demonstrate an expansive chemical space for KRAS Drug Discovery.

Conclusion

As the global leader in the manual extraction of SAR data from scientific literature, we consistently replicate these outcomes with top life science companies. Our custom patent curation services streamline data collection and standardization, especially in rapidly evolving fields like KRAS Drug Discovery. By delivering high-quality, analysis-ready data, we enable pharma and biotech companies to realize significant time and cost savings, freeing their researchers to focus on the formidable task of identifying and developing novel therapeutics. (See another example of our expertise in this area: Drug target dossier: target intelligence for data-driven drug repurposing.)