
It is amply clear that digitization and the use of data science play an important role in all aspects of human life. Drug discovery and development is no exception. How far have we successfully implemented digitization and data science in drug discovery and development? Is this a reality or a far-fetched dream?

Question: With the rapid foray of data science and digital transformation technologies in a pan-industry manner, there seem to be various interpretations regarding the identity and role of these terms in the life sciences and biopharma industry. Can you share your perspective on this?

Digital transformation is the buzzword these days; everyone is talking about it. But if we look under the hood, there are two major components:

‘Digitization’ and ‘Digitalization’

  • In ‘digitization’, unstructured and scattered data is converted into a structured and machine-readable format.
  • On the other hand, in ‘digitalization’, advanced analytics such as machine learning or deep learning methods are applied on top of the data to derive value from it.

Digital transformation in the pharmaceutical industry is nothing but the adoption of digitization at an organizational level and the implementation of analytics to accelerate drug discovery and development, and so bring better medicines for the benefit of humankind.

Why is data transformation important?

  1. It helps bring efficiency in operations
  2. It enables stakeholders to engage effectively through a data-driven approach
  3. It helps uncover new trends thereby paving the way for generation of new ideas and innovation

Until recently, digital transformation seemed like a long-term vision across industries. However, recent global events and circumstances have forced enterprises to embrace it rather quickly.

Question: You alluded to the conversion of unstructured information into structured formats, data analytics driven by AI/ML tools and technologies, and the general principles of data transformation. How do all these diverse elements come together to function in tandem, where does it start and end?

Let us understand the journey from data to insights, as this sets the stage for a fundamental understanding of this topic.

  • Multiple data points culminate in ‘information’
  • Linking the information together builds ‘knowledge’
  • Identifying patterns in the knowledgebase generates ‘insights’

The end-point of this journey is ‘wisdom’ which dictates what to do and what not to do.


In this journey, everyone tends to focus on the “attractive” analytics piece, largely driven by AI.

However, it is equally if not more important to focus on the initial part, up to building knowledge, as this forms the foundation of all future analytics.

If enough emphasis is not given to the first steps, we are left with artefactual results that we may not be able to make sense of. This initial phase can be broadly termed ‘data digitization’, and it involves structuring, harmonizing and integrating data.

The second phase is ‘data analytics’ where an outcome is predicted by leveraging AI/ML tools on structured data.

Question: How are all these “data” principles applied to the vast, multi-domain life sciences industry?

There is a deluge of data in the biopharma industry. If you look at the trajectory from drug target identification all the way to initiation of a clinical study, there are a number of nodes in between.

Each node requires data from various streams such as chemistry, biology, discovery technology, DMPK, efficacy, safety, IND-enabling studies and much more. In this journey, you can appreciate that a huge volume and a wide variety of data is generated along the way. The data is heterogeneous, disparate and complex.

While it’s great to generate such rich datasets, unless we practice digitization principles, the data will become useless in no time.

There are several key aspects of digitization we must consider, such as standardization, ontologies, annotation, adherence to the FAIR data principles, and data warehouse creation.

Question: What are these aspects of digitization, and how are they specific in context and practice within the biopharma space?

‘Annotation’ and ‘contextualization’ are complex and multi-layered problems, unique to life sciences.

Let us consider a simple example of protein binding interactions. In one scenario, a chemical may bind to a protein that functions as a target receptor, while in another case, a protein may function as an ion channel that allows specific chemicals to pass through.

Hence it is clear that the relationship between a protein and chemical is context driven and not necessarily the same all the time. A human being can infer such relationships but to structure and digitize this information in a seamless automated manner is a different challenge altogether. These kinds of problems are common in life sciences, whereas in non-life science areas, data relationships are often uniform across situations.

‘Data standardization’ across experiments is another crucial element for performing any advanced analytics. The situation is further confounded today with the availability of many open access heterogeneous databases that pharma companies wish to combine with their proprietary data assets, a task that cannot be performed unless this data is standardized. 
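As a rough illustration of what standardization means in practice, here is a minimal Python sketch that brings potency values reported in different units onto a single scale (nanomolar) before two sources are combined. The record layout, field names and compound identifier are assumptions made up for this example; they do not come from any particular database.

```python
# Conversion factors from common molar units to nanomolar (nM).
TO_NANOMOLAR = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def standardize_potency(record):
    """Return a new record with cleaned names and the potency expressed in nM.

    `record` is a hypothetical dict with the keys
    "compound", "target", "value" and "unit".
    """
    unit = record["unit"].strip()
    if unit not in TO_NANOMOLAR:
        # Unknown units are rejected rather than guessed.
        raise ValueError(f"Unrecognized unit: {record['unit']!r}")
    return {
        "compound": record["compound"].strip().upper(),
        "target": record["target"].strip().upper(),
        "ic50_nm": float(record["value"]) * TO_NANOMOLAR[unit],
    }


# The same measurement as it might appear in two differently formatted sources.
proprietary = {"compound": "cmpd-1", "target": "egfr", "value": "0.5", "unit": "uM"}
public = {"compound": "CMPD-1 ", "target": "EGFR", "value": 500, "unit": "nM"}

# After standardization the two records can be compared or merged directly.
print(standardize_potency(proprietary) == standardize_potency(public))
```

Rejecting unknown units instead of guessing is a deliberate choice here: a silent wrong conversion would contaminate every downstream analysis.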

Question: Ontologies and data standards are key aspects to consider within the purview of data digitization in this industry. What is the importance of these topics?

Yes. Another important challenge in digitization is the usage of ‘ontologies’. All the important entities such as drugs, diseases, and targets have a large number of ontologies.

For example, if we focus on disease as an entity, we have many ontologies such as ICD, DO, MeSH, UMLS, etc. However, during data integration we have a hard time mapping data to any particular ontology, as there isn’t necessarily a one-to-one mapping in place.

Hence, we must find a way to address ontology-oriented issues.
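One simple way to make the mapping problem visible is to record, for every source term, whether it maps to exactly one target term, to several, or to none. The sketch below assumes a hypothetical crosswalk table; the identifiers (`SRC:…`, `TGT:…`) are placeholders and not real codes from ICD, DO, MeSH or UMLS.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (source ontology ID, target ontology ID).
CROSSWALK = [
    ("SRC:0001", "TGT:A10"),
    ("SRC:0002", "TGT:B20"),
    ("SRC:0002", "TGT:B21"),
]


def build_index(rows):
    """Group target IDs by source ID so one-to-many mappings are explicit."""
    index = defaultdict(set)
    for source_id, target_id in rows:
        index[source_id].add(target_id)
    return index


def map_term(index, source_id):
    """Classify a source term as 'exact', 'ambiguous' or 'unmapped'."""
    targets = sorted(index.get(source_id, ()))
    if len(targets) == 1:
        return {"status": "exact", "targets": targets}
    if not targets:
        return {"status": "unmapped", "targets": []}
    return {"status": "ambiguous", "targets": targets}


index = build_index(CROSSWALK)
for term in ("SRC:0001", "SRC:0002", "SRC:9999"):
    print(term, map_term(index, term))
```

Only the 'exact' terms can be integrated automatically; the 'ambiguous' and 'unmapped' ones are the cases that typically need a domain expert's decision.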

Regarding ‘data standards’, as many are aware, digitization is a reasonably well accepted practice in regulatory submission in drug discovery and development.

We use SDTM (Study Data Tabulation Model) for clinical data, and in the recent past the FDA has mandated the SEND (Standard for Exchange of Nonclinical Data) format for pre-clinical data. We are heading in the right direction, and while there is a standardized format for regulatory submission, we still have room to improve. The formats for US FDA and Japan PMDA submissions, for example, are not the same, and requirements vary from authority to authority.

If we have consistent and well digitized data at the foundation level, we do not have to reinvent the wheel to submit the data to regulatory authorities as per their formats.

Question: Considering the sheer volume and diversity of data generated in biopharma research, how does one approach data digitization? Is there a gold-standard method?

This is a pertinent question that can be addressed with a brief overview of the current practices and standards in data digitization. Each has its own merits and demerits.

  • First, there is manual data curation by subject-matter experts (SMEs); this ensures good quality but yields low volume.
  • Second, there is high-throughput automated data curation. In this case, machine learning, text mining and NLP can be used for data extraction, integration and standardization. This approach is certainly more volume-efficient than manual curation, but data quality may be compromised.
  • Finally, we have semi-automation as a middle ground. This is a more favourable and acceptable method to extract and structure data. Most of the time, we start with automation followed by manual curation, enabling us to infuse context and train systems more efficiently. This further allows us to verify or validate the data as it is transformed into a machine-readable format.
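The semi-automated approach can be sketched in a few lines of Python: an automated extractor handles the unambiguous cases, and everything else is routed to a queue for manual curation. The regular expression and the example sentences are illustrative assumptions only; real literature text would need far richer rules or a trained NLP model.

```python
import re

# Illustrative pattern for phrases such as "IC50 of 12 nM" or "IC50 = 3 nM".
IC50_PATTERN = re.compile(
    r"IC50\s*(?:of|=|:)?\s*(\d+(?:\.\d+)?)\s*(pM|nM|uM|mM)\b"
)


def extract(sentence):
    """Return a structured record; flag it for review unless exactly one value is found."""
    matches = IC50_PATTERN.findall(sentence)
    if len(matches) == 1:
        value, unit = matches[0]
        return {"text": sentence, "value": float(value), "unit": unit, "needs_review": False}
    # Zero matches or several competing matches: leave the decision to a curator.
    return {"text": sentence, "value": None, "unit": None, "needs_review": True}


def triage(sentences):
    """Split sentences into auto-accepted records and a manual review queue."""
    accepted, review_queue = [], []
    for sentence in sentences:
        record = extract(sentence)
        (review_queue if record["needs_review"] else accepted).append(record)
    return accepted, review_queue


sentences = [
    "Compound A showed an IC50 of 12 nM against the target.",
    "IC50 = 3 nM in assay one; IC50 = 9 nM in assay two.",
    "Potency was moderate.",
]
accepted, review_queue = triage(sentences)
print(len(accepted), "auto-accepted;", len(review_queue), "sent for manual curation")
```

The curated decisions from the review queue could then be fed back to refine the extraction rules, which is the "train systems more efficiently" part of the workflow described above.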

Question: Having discussed digitization as the preliminary part of the data journey, where does artificial intelligence come in?

I am sure we have all noticed that the terms AI, ML and analytics tend to be used rather loosely and often interchangeably. However, there are fundamental differences.

  • AI is any program that can sense, reason, act and adapt; it could be IoT, robotics or data analytics.
  • ML is a subset of AI wherein the algorithm is trained on data and its performance improves as it is exposed to more data.
  • DL is a further subset of ML in which multi-layered neural networks learn patterns directly from the data.

Irrespective of what we call it, these technologies are useful across the pharma value chain, right from discovery to post-market. For practical purposes, however, I will henceforth use the term AI in our discussion.

Question: AI technologies support the entire pharma value chain; what are some specific examples of its utility and the major players using AI to optimize drug discovery and development?

Sure, there are several AI applications either under development or being implemented in all aspects of the drug discovery and development paradigm including pre-clinical, clinical, manufacturing, supply chain, commercial and post-market surveillance.

As we know, AI is more widely practiced in the clinical stage and thereafter, where the data is more structured and standardized. This is further testimony to the importance of structured data and the need for digitization at the beginning of the journey.

At Excelra, we have been successful in implementing AI methods, having provided various services to our partners towards accelerating their drug discovery and development programs.

A few more examples come to mind where traditional large pharma companies and even younger biotech start-ups have embraced AI and digital transformation in their R&D efforts:

  • Novartis has digital innovation hubs across several geographies, while Boehringer Ingelheim (BI) has digital labs housed within a separate entity called BI-X that supports initiatives within the organization.
  • Companies like Lilly and Teva leverage AI for manufacturing, while others like Pfizer are focused on employing AI for optimizing patient engagement.
  • Insilico Medicine, an AI-based biotech, worked in tandem with WuXi and the University of Toronto and identified a potential drug in a record time of 46 days, which is 10-15 times faster than conventional methods.
  • Another highlight is the collaboration between the pharma company Sumitomo Dainippon and the AI company Exscientia, who recently entered the first AI-predicted drug into clinical trials, for obsessive-compulsive disorder. This was done in under a year, whereas a traditional process would have taken up to 4 years to complete.

The last two examples especially provide substantial testimony to the utility of AI in accelerating drug discovery.

Question: Although it is early days, it surely looks like the life sciences industry has been able to successfully implement AI across the broad spectrum of activities in drug discovery and development. What is the reality of the global acceptance and practice of AI in the pharma world? Finally, what are the pitfalls one must look out for, and where is this transformation heading?

At the outset, it is worth acknowledging that pharma and biotech companies are now exploring and taking advantage of AI as a mainstream tool to accelerate, optimize and improve numerous processes, functions and stages of drug design and development. However, the pharma industry is still lagging when it comes to monetizing AI.

I recall a study by McKinsey published in The Economist that compared different sectors and their gains from AI. Pharma was at the very bottom with respect to the % share of total analytics, as well as gains in absolute numbers.

This points to huge room for improvement.

A lay person might wonder why pharma is last on the list. The bottom line comes down to what we discussed earlier: we are presently challenged by data, specifically all the confounding factors we noted at the beginning of the data journey, such as heterogeneity, complexity, context-driven challenges and the lack of standard ontologies.

We have to further admit that our field has been relatively slow in transitioning from legacy systems to sophisticated technologies.


Finally, in pharma, data science on its own does not work as a standalone solution; rather, a deep understanding of the life science domain is fundamentally needed to draw meaningful insights.


Having said all this, there is a way forward to really derive synergy from bringing together data, data science and digital transformation in this industry. It is important that we develop standards for digitization, democratize data, adopt new technologies, build cross-functional teams and collaborate with external partners wherever necessary.

Only after we have tackled all these aspects can we leverage the true power of AI and ensure that treatments reach the market faster and cheaper, to impact lives and improve outcomes.
