The “AWS of Biology”: Who Actually Builds It?

The bottleneck in AI-native biology is no longer hypothesis generation. It’s experimental execution density, and that changes where infrastructure value accrues.

May 22, 2026

For most of modern biotech, biology itself was never the rate limiter. Throughput was. Although sequencing costs collapsed faster than almost any technology curve in industrial history (check out a great deal in Human Longevity Inc in memoriam of Craig Venter, who made cheap sequencing possible), wet-lab iteration remained stubbornly physical: graduate students pipetting liquids by hand, fragmented inventory systems, irreproducible protocols buried in supplementary PDFs, CRO coordination through email chains, freezers full of chaotically labeled samples, and experimental timelines measured in weeks because a single assay queue backed up an entire program.

Software scaled because computation became standardized. A developer could run the same application on different servers without changing the code. That is what made Amazon Web Services powerful: compute became interchangeable infrastructure.

Biology does not behave that way. Experimental systems are still tightly coupled to specific hardware, workflows, and lab conditions. A protocol built for robotic liquid handlers from Hamilton Company often requires recalibration on systems from Tecan because the robots move differently, dispense liquids differently, and run different software. Similarly, automated workflows developed inside Synthace may not transfer cleanly into cloud laboratory environments like Emerald Cloud Lab because the orchestration systems and instruments are different.

Even small variables like reagent batches, humidity, incubation timing, or how aggressively liquids are pipetted can change biological results. Biology still lacks the clean abstraction layers that made cloud computing possible. So when people pitch their company as the “AWS of biology,” the real opportunity is not infinitely portable experiments. It is building the infrastructure that makes biology more standardized, reproducible, and programmable than it is today.

The last decade produced an enormous increase in computational capability: foundation models, generative protein design, multimodal biological embeddings. But wet-lab throughput remains structurally constrained. The bottleneck is no longer hypothesis generation. It is experimental execution density. Most AI-native biotech companies eventually collide with the same problem: they can generate candidates faster than they can validate them. A competent computational biologist can spin up a GPU cluster, run inference on a 70B-parameter model, and tear down the infrastructure in an afternoon. A wet-lab scientist trying to run a comparable experiment will spend weeks on protocol optimization, reagent sourcing, instrument calibration, and data cleaning before generating a single interpretable result. That delta is the actual market opportunity. The question is who captures it and how.
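The gap between generation and validation can be made concrete with a toy backlog calculation. This is a minimal sketch, not a model of any real lab: the function name and both rates (1,000 designs per week, one 96-well plate of validations per week) are illustrative assumptions.

```python
def validation_backlog(generated_per_week, validated_per_week, weeks):
    """Return the count of untested candidates at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # New designs arrive, the lab clears what it can, and the rest waits.
        backlog = max(0, backlog + generated_per_week - validated_per_week)
        history.append(backlog)
    return history


# Hypothetical rates: 1,000 designs per week against 96 validations per week.
print(validation_backlog(1000, 96, 4))  # [904, 1808, 2712, 3616]
```

Under these assumed rates the queue grows linearly and never drains, which is the sense in which execution density, not hypothesis generation, sets the pace.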

The LIMS Problem Is Not What You Think

Laboratory information management systems (LIMS) have existed since the 1980s. The fact that they remain a genuine pain point in 2026 tells you something about the structural difficulty of the problem. Benchling, a software platform for managing biological R&D workflows and experimental data, is the canonical success story, and it’s worth understanding precisely why. Benchling won by starting with molecular biology workflows (sequence editing, construct design, cloning) where the data model was tractable, generating habitual daily engagement from bench scientists, and then expanding upward into electronic lab notebooks and data management once it owned the scientist’s working memory. It then went further: sample lineage, compliance, permissions, inventory, experimental coordination across regulated environments. The company embedded itself into operational workflows deep enough that displacement became organizationally expensive even when competing products were technically superior. That’s closer to ERP (enterprise resource planning) than laboratory software.

Importantly, Benchling’s strategic position improves as biology becomes more computationally orchestrated. AI-generated experimental design increases the importance of standardized execution environments because model outputs only compound in value if downstream experimental systems remain structured enough to absorb and reproduce them. Most investors focus on the model layer. The execution layer into which models pipe their outputs may end up structurally stronger, for the same reason that database infrastructure often outlasts the applications built on top of it.

The companies that failed (and there were many with genuinely good technology) largely underestimated this dynamic. They built comprehensive systems first and tried to migrate scientists away from existing tools. Scientists are conservative about workflow changes for defensible reasons: a protocol that works is worth protecting, and a software migration that breaks an assay mid-campaign has real cost. LabVantage, IDBS, and Dotmatics all have meaningful enterprise presence, but their growth profiles look nothing like Benchling’s, partly because they approached the market as IT infrastructure rather than scientist tooling. Dotmatics’ acquisition by Insightful Science and subsequent consolidation plays represent the other strategic path: horizontal aggregation of a fragmented market rather than organic platform expansion (which is not without issues; see the discussion of horizontal versus vertical integration further down). Whether that creates durable value or just a bigger but still-fragmented bundle remains an open question.

The data locked inside a pharma company’s LIMS (decades of assay results, failed compound libraries, dose-response curves) is strategically valuable in ways nobody has fully monetized yet. The vendor that owns the schema owns the key.

Hardware Is Not Commoditizing the Way People Expect

There’s a persistent assumption that lab automation hardware will commoditize like server hardware did: that liquid handlers become fungible, the software layer captures the margin, and liquid handling companies become interchangeable suppliers. This is probably wrong, and the reason is interesting. The fluid mechanics of liquid handling are genuinely complex. Aspirating 200 nanoliters of a viscous protein solution without introducing bubbles, carryover, or evaporation artifacts requires precision engineering refined over decades. Hamilton’s VENUS software is genuinely unpleasant, with an interface that looks like the early 2000s and documentation that rewards persistence rather than clarity. But nobody switches away from Hamilton systems, because the physical reliability of the instrument is hard to replicate and every protocol in the lab is written against Hamilton behavior specifically. The switching cost is embedded in the protocol library, not the instrument contract.

The more interesting development is what happens when automation infrastructure becomes tightly coupled to machine-learning-driven iteration. Once closed-loop optimization enters biology, throughput infrastructure starts behaving like computational infrastructure. Historically, Hamilton, Tecan, and Beckman Coulter were viewed primarily as instrument vendors. Increasingly, they look more like control points within biological compute systems. That’s a different strategic position, and one that their current pricing and go-to-market strategies don’t yet fully reflect.
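For readers who think in code, closed-loop optimization is roughly the following shape. This is a deliberately naive sketch: `simulated_assay` is a made-up stand-in for a real wet-lab readout (it peaks at 37 °C by construction), and the proposal strategy is plain random search around the current best, not any vendor's or any lab's actual method.

```python
import random


def simulated_assay(temperature_c):
    """Hypothetical readout that peaks at 37 C; a real assay would be noisy."""
    return 100.0 - (temperature_c - 37.0) ** 2


def closed_loop(assay, start, step, rounds, batch_size=4, seed=0):
    """Design a batch, test it, keep the best condition, repeat."""
    rng = random.Random(seed)
    best_x, best_y = start, assay(start)
    history = [(best_x, best_y)]
    for _ in range(rounds):
        # Design: propose conditions near the current best.
        candidates = [best_x + rng.uniform(-step, step) for _ in range(batch_size)]
        # Test: on real hardware this is the slow, expensive step.
        results = [(x, assay(x)) for x in candidates]
        # Learn: move only if the batch actually improved on the best so far.
        x, y = max(results, key=lambda r: r[1])
        if y > best_y:
            best_x, best_y = x, y
        history.append((best_x, best_y))
    return best_x, best_y, history


best_x, best_y, history = closed_loop(simulated_assay, start=30.0, step=2.0, rounds=20)
print(f"best condition {best_x:.2f} C, readout {best_y:.2f}")
```

The loop itself is trivial. The strategic point is that every pass through the "test" line runs on someone's instrument, which is why the instrument vendors start to look like control points.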

Opentrons identified a different segment: academic labs, startups, and applications where $150K for a Hamilton is prohibitive. The OT-2 is not a Hamilton competitor but a different class of instrument: lower throughput, less precision, more variability (like iPhone 17 vs iPhone 13). But it’s open, Python-programmable, and costs $10K. The Flex platform moved upmarket. What’s strategically interesting about Opentrons is less the hardware and more the cultural implication: it potentially onboards a generation of scientists who learn to think about automation natively, in code, from their first lab job. That’s a slow-compounding shift in what scientists expect from experimental environments, which has downstream effects on the market for everything above the hardware layer.
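To give a flavor of what "automation in code" means, here is a small sketch that plans a serial dilution as a list of steps. It is plain Python and deliberately does not use the Opentrons API; the step names, the `reservoir` label, and the tuple layout are all hypothetical.

```python
def serial_dilution_steps(wells, transfer_ul, diluent_ul):
    """Plan a serial dilution; wells[0] is assumed to already hold the stock."""
    steps = []
    # Pre-fill every downstream well with diluent.
    for well in wells[1:]:
        steps.append(("dispense_diluent", "reservoir", well, diluent_ul))
    # Carry material down the row, one well at a time.
    for src, dst in zip(wells, wells[1:]):
        steps.append(("transfer_and_mix", src, dst, transfer_ul))
    return steps


def dilution_factor(transfer_ul, diluent_ul):
    """Fold dilution per step, e.g. 100 uL into 100 uL of diluent gives 2.0."""
    return (transfer_ul + diluent_ul) / transfer_ul


for step in serial_dilution_steps(["A1", "A2", "A3", "A4"], 100, 100):
    print(step)
```

A scientist who writes the protocol this way can diff it, version it, and rerun it, which is the habit the paragraph above is pointing at.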

Formulatrix is worth examining because it represents deep vertical specialization carried to near-monopoly: crystallography and imaging workflows, where ROCK Imager systems have become the standard for protein crystallography imaging to a degree that is almost uncomfortable from a competitive standpoint. Formulatrix doesn’t care whether your drug discovery hypothesis is correct. It needs you to be running crystallography screens. And switching imaging platforms mid-project requires revalidating months of data. That’s a defensible position that doesn’t depend on being the best instrument in the abstract; it depends on being embedded in a workflow where switching is costly.

Cloud Labs: The Contract Fab Analogy Is Better Than AWS

The cloud lab concept, remote access to robotic experimentation platforms via software-defined protocols, is clearly correct in theory. The ability to submit an experimental design and receive structured data back, without managing instrument uptime, reagent inventories, or technician schedules, is obviously valuable (check how cool and efficient it is in practice by watching the autonomous lifestar2 footage from Insilico Medicine and then looking at their pipeline).

Emerald Cloud Lab has been building toward this since 2012: a proprietary specification language (Symbolic Lab Language, or SLL), a full instrument fleet, experiments defined entirely in software. Strateos went in a similar direction. Arctoris built a UK-based cloud lab focused specifically on pharmacology assays for drug discovery. The model is right. The adoption curve has been slower than expected.

The AWS analogy breaks down quickly here because biological experiments are not cleanly virtualizable. Tacit operational knowledge still matters enormously. Small procedural deviations frequently alter outputs. Biological systems remain highly context-sensitive in ways that compute workloads are not. But the failed analogy obscures the more important point: centralized experimental infrastructure may become economically dominant even without full abstraction. The better comparison is contract semiconductor fabrication during early industry consolidation. Most companies eventually stopped building full-stack fabrication internally not because they achieved a clean abstraction layer, but because centralized operators achieved superior throughput economics, process consistency, and tooling specialization that internal teams couldn’t match on any reasonable capital budget (read Chris Miller’s “Chip War” for a clear understanding of the current semiconductor situation).

Biology appears to be moving in the same direction, although more slowly due to biological variability and regulatory friction. The path to enterprise adoption for cloud labs requires either regulatory acceptance of remote execution for IND-enabling studies (slow) or a sufficient track record that internal risk committees get comfortable, which is also slow. So cloud labs have found their natural early market in academic collaborations, early-stage drug discovery, and applications where iteration speed matters more than GLP compliance. Argonaut Bio took the narrower approach: synthetic biology workflows specifically, where the customer is more likely to be a startup or internal platform team with fewer regulatory constraints on individual experiments. The narrower focus may be correct. There’s a version of the cloud lab market that looks like one dominant platform, and a version that looks like vertical SaaS — specialized platforms owning specific workflow categories. Given the difficulty of building general-purpose biology infrastructure, the vertical version seems more likely in the near term.

Synthetic Biology Tooling: Where Scientific and Commercial Timelines Diverge Most

The design-build-test-learn cycle in synthetic biology has gotten dramatically faster over the last decade. DNA synthesis costs have dropped roughly 10,000-fold since the early 2000s, sequencing costs have dropped more, and protein structure prediction is tractable in ways that felt implausible five years ago. But the infrastructure to actually run high-throughput genetic engineering campaigns is still fragmented and largely manual at most organizations.

Twist Bioscience solved the synthesis bottleneck using silicon-based DNA synthesis, achieving cost structures that make large library generation economically sensible. The strategic insight was that DNA synthesis, done on silicon at scale, could look more like semiconductor manufacturing than chemical synthesis. Twist has real competitive advantages in yield and error rate at long constructs. IDT (now part of Danaher) dominates shorter oligos with the advantage of Danaher’s operational discipline and customer relationships. The synthesis market has effectively bifurcated: commodity short oligos where IDT and a handful of others compete on price, and longer synthetic genes and complex libraries where Twist holds a meaningful position. But synthesis is only one step.

The harder problem is the rest of the DBTL stack. Who stores and tracks the constructs? Who manages transformation and selection? Who runs the assay? Who cleans and structures the data? Ginkgo Bioworks built this stack internally and has been attempting to sell access to it as a platform service: the Codebase licensing model, where customers access Ginkgo’s organisms and protocols without rebuilding the infrastructure themselves. The commercial reality has been harder than the vision suggested, and the diagnosis is worth being precise about. Ginkgo’s actual moat is the organism and pathway library plus accumulated experimental data, not the automation infrastructure per se, though they’re linked. The market initially priced the company as if organism engineering would scale with software-like marginal economics. That assumption turned out to be overly optimistic because biological systems retained too much context dependence, customer-specific variability remained high, automation gains were uneven, and biological standardization progressed more slowly than modeled.

Dismissing the underlying infrastructure thesis entirely is probably also incorrect. Large-scale organism engineering pipelines, automated assay systems, protocol libraries, strain optimization workflows, and biological process data accumulate strategic value over long time horizons. Infrastructure businesses frequently look operationally messy during early market formation because standardization hasn’t stabilized yet. The same pattern appeared repeatedly in cloud computing, semiconductor tooling, and industrial automation. Ginkgo’s current challenges are partly execution and partly market timing, not necessarily evidence that the platform model is wrong.

Data Advantages: Loops Beat Snapshots

Every pitch deck in bio-infrastructure mentions data network effects. A true data network effect requires that more users generate data that makes the platform more valuable for all users. In biology, this is harder to achieve than people admit. Experimental data from one customer’s cell line in one growth condition is not obviously useful for another customer’s cell line in a different condition. The context-dependence of biological data means raw volume is much less valuable than curated, structured, annotated datasets from controlled experiments.

The more defensible framing is not static data assets but continuously compounding experimental feedback loops. Data generated through standardized execution systems becomes disproportionately valuable because it remains computationally interoperable over time: you can train on it, compare across it, and use it to calibrate future experiments. Data collected on heterogeneous, poorly documented protocols is largely stranded. This is a meaningful distinction that changes which companies actually have durable data positions versus which ones have accumulated volume that they can’t use (which is also one of the reasons why there is no foundation model for cryo yet).

The relevant distinction is not how much biological data a company has collected. It is whether that data was generated under standardized enough conditions to remain computationally useful as models improve.
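One way to picture "stranded" versus "interoperable" data is a simple completeness check on run metadata. This is a sketch under loud assumptions: the four required fields and the `RunRecord` shape are invented for illustration, and a real standard would need far more context than this.

```python
from dataclasses import dataclass, field

# Hypothetical minimum context needed to compare one run against another.
REQUIRED_FIELDS = ("protocol_version", "cell_line", "reagent_lot", "instrument")


@dataclass
class RunRecord:
    run_id: str
    readout: float
    metadata: dict = field(default_factory=dict)


def is_interoperable(record):
    """True only if every required field is present and non-empty."""
    return all(record.metadata.get(name) for name in REQUIRED_FIELDS)


def split_usable(records):
    """Separate runs a model can train on from runs that are stranded."""
    usable = [r for r in records if is_interoperable(r)]
    stranded = [r for r in records if not is_interoperable(r)]
    return usable, stranded


runs = [
    RunRecord("run-3", 0.82, {"protocol_version": "1.2", "cell_line": "HEK293",
                              "reagent_lot": "L-0417", "instrument": "handler_a"}),
    RunRecord("run-47", 0.41, {"cell_line": "HEK293"}),  # context was never logged
]
usable, stranded = split_usable(runs)
print(len(usable), "usable,", len(stranded), "stranded")  # 1 usable, 1 stranded
```

Both runs produced a number. Only one of them produced a number that can still be compared to anything a year later.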

The Regulatory Asymmetry Nobody Talks About Enough

One structural advantage that bio-infrastructure companies have over their customers is regulatory positioning. A LIMS vendor or lab automation company selling to pharma doesn’t need FDA approval to operate. Their customers do. This creates an asymmetry where the infrastructure layer can move faster, experiment more freely, and generate revenue from multiple customers simultaneously, while each customer faces the full weight of biotech/medtech regulatory timelines. The infrastructure company doesn’t inherit the risk of any single drug program failing. It gets paid regardless of whether the science works (unless it’s their fault, obv).

This changes when infrastructure is used for GMP manufacturing or regulated assays. LabVantage and IDBS have substantial businesses in validated LIMS systems embedded in QC workflows for GMP facilities. This is a structurally different market from early R&D tooling. Validation is expensive, change control is slow, and switching costs are enormous. The average validated LIMS replacement at a major pharmaceutical company is a multi-year project costing tens of millions of dollars. Once you’re in, you’re in for a decade. That’s an extremely durable revenue stream but also slow growth and limited ability to update the product, because every update requires revalidation. The startups trying to enter this segment have to offer something sufficiently compelling that customers will absorb the validation cost; the incumbents benefit from a status quo that is physically difficult to disrupt.

The Missing Layer: Biological APIs

Software engineering has a well-developed concept of APIs: standardized interfaces that let you call a function without caring about the implementation underneath. Biology doesn’t have this, and the absence is more fundamental than it might seem. When a software team integrates with Stripe’s payment API, they get deterministic behavior: the same input produces the same output, errors are typed and documented, the interface is stable across versions. When a biology team tries to call a “run CRISPR screen” function, then even with the same protocol, same reagents, and same cell line, they get variable outputs, undocumented failure modes, and results sensitive to ambient humidity, reagent lot numbers, and which technician ran the experiment on which day.
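The contrast can be made measurable. A software API either returns the same value or it is broken; a biological "call" returns a distribution, so the closest equivalent of a contract is a tolerance on replicate variability. The sketch below uses the coefficient of variation for that; the 15% threshold and the readout values are arbitrary assumptions, not a field standard.

```python
import statistics


def coefficient_of_variation(readouts):
    """Sample standard deviation divided by the mean (needs >= 2 replicates)."""
    mean = statistics.mean(readouts)
    if mean == 0:
        raise ValueError("CV is undefined when the mean readout is zero")
    return statistics.stdev(readouts) / mean


def is_reproducible(readouts, max_cv=0.15):
    """Treat a protocol as 'API-like' only if replicates agree within max_cv."""
    return coefficient_of_variation(readouts) <= max_cv


# Hypothetical replicate readouts from nominally identical runs.
print(is_reproducible([90, 100, 110]))  # True
print(is_reproducible([50, 100, 150]))  # False
```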

The effort to build something API-like out of biology is a deep bet in the bioengineering infrastructure space. Synthace’s platform is one articulation: a high-level protocol language that abstracts over specific instruments and generates executable steps for available hardware, plus structured data capture that makes results comparable across runs. The technical challenge is that the abstraction layer has to be biologically meaningful. It’s not enough to say “mix these two things”; you have to specify temperature, timing, mixing dynamics, and dozens of other parameters that affect the result. That’s a lot of domain knowledge to encode, and the edge cases are endless. The economic challenge is that scientists are skeptical of black boxes, and the value of structured protocol specification is only visible over time, when you’re comparing run 47 to run 3 and trying to understand why the outputs diverged.
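A toy version of that abstraction layer helps show where the difficulty sits. The sketch below is not Synthace's language or anyone's real driver: `MixStep`, the two instrument names, and their volume limits are all hypothetical. It takes one instrument-agnostic step and lowers it into commands that respect each instrument's limits.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MixStep:
    source: str
    dest: str
    volume_ul: float
    temperature_c: float
    mix_cycles: int


# Hypothetical per-instrument volume limits, in microliters.
INSTRUMENT_LIMITS = {
    "handler_a": {"min_ul": 0.5, "max_ul": 1000.0},
    "handler_b": {"min_ul": 5.0, "max_ul": 300.0},
}


def compile_step(step, instrument):
    """Lower one abstract step into commands for a specific instrument."""
    if step.volume_ul <= 0:
        raise ValueError("volume must be positive")
    limits = INSTRUMENT_LIMITS[instrument]
    # Split evenly so no single transfer exceeds the instrument maximum.
    n_transfers = math.ceil(step.volume_ul / limits["max_ul"])
    chunk_ul = step.volume_ul / n_transfers
    if chunk_ul < limits["min_ul"]:
        raise ValueError(f"{instrument} cannot dispense {chunk_ul} uL")
    commands = [("set_temperature", step.temperature_c)]
    commands += [("transfer", step.source, step.dest, chunk_ul)] * n_transfers
    commands.append(("mix", step.dest, step.mix_cycles))
    return commands


step = MixStep("A1", "B1", volume_ul=500.0, temperature_c=37.0, mix_cycles=3)
print(len(compile_step(step, "handler_a")))  # 3 commands: one transfer
print(len(compile_step(step, "handler_b")))  # 4 commands: two 250 uL transfers
```

Even this toy produces different physical actions on the two instruments for the "same" step, one 500 µL transfer versus two 250 µL transfers. Whether those are biologically equivalent is exactly the kind of domain knowledge the abstraction has to encode.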

The companies that will partially solve this are likely the ones finding specific workflow categories narrow enough to make reliability achievable. Beckman Coulter’s automated cell culture systems approach biological APIs for cell expansion: specify starting conditions and target passage, and the system manages the rest. Not perfectly deterministic, but reliable enough to build on top of. As these partial APIs proliferate across workflow categories like liquid handling, cell culture, sequencing library prep, or protein purification, the stack starts to look more like software infrastructure and less like artisanal science. Once experimentation itself becomes computationally orchestrated, the strategic center of gravity shifts toward whoever controls the interfaces between models, experiments, instrumentation, and validated operational systems. That is where infrastructure becomes extremely difficult to displace.

Who Captures Durable Leverage

A large portion of current biotech investing still evaluates companies primarily through scientific novelty: differentiated targets, novel modalities, proprietary models, unique datasets. Those matter, but they don’t necessarily determine who captures durable leverage. Infrastructure companies frequently become more strategically important than discovery companies because they sit directly inside iteration loops rather than depending on them.

The history of technology infrastructure suggests that the companies building durable positions are usually not the most scientifically impressive or the most technically ambitious. They’re the ones owning a chokepoint in the workflow, accumulating switching costs faster than competitors can erode them, and building data or network assets that compound over time. In bioengineering infrastructure, a few patterns stand out.

The companies that own the biological data model historically beat the companies that only owned the hardware. Sequencers and lab instruments eventually commoditize; structured datasets, annotations, and interpretation pipelines do not. Before AI, a lab could buy a genome sequencer and still spend months extracting meaningful insight without specialized bioinformatics infrastructure. Foundation models are beginning to compress that advantage by making biological interpretation cheaper and more portable, shifting value away from proprietary analysis layers and toward ownership of large, high-quality longitudinal datasets.

Vertical depth beats horizontal ambiguity. The companies with the clearest positions, like Formulatrix in crystallography imaging or Twist in gene synthesis, are deeply specialized. Platforms trying to be everything to everyone in biology are struggling, because “everything in biology” is too heterogeneous to serve with a single product architecture.

The important shift is not that biology is becoming software. It is that computation is colliding with a physical system that refuses to fully abstract. AI can generate hypotheses, designs, and experimental plans at extraordinary scale, but biological execution still depends on fragile workflows, tacit knowledge, instrument-specific behavior, and operational discipline. That makes infrastructure more strategically important, not less. The durable companies are unlikely to be the ones with the flashiest models or the cheapest hardware. They are the ones embedding themselves inside the experimental loop deeply enough that removing them would slow down the production of biological insight itself.

Regeneration.AI
