Development Roadmap

This page outlines the key development priorities that the core Cpipe team are working towards.

Near Term Goals

  1. Moving to a fully VCF-based pipeline

    Currently, Cpipe uses VCF up to the point where annotation is performed with Annovar, but after that point it processes variants in CSV format. Keeping variants in VCF throughout is preferable because downstream tools can then continue to process the files in a standard format. We are therefore re-engineering the pipeline stages from Annovar onwards so that everything is VCF-native.

  2. Family-based sequencing

    Cpipe is currently focused on analysing singleton data. However, family-based sequencing, especially of trios, is a key clinical use case. While Cpipe can analyse trio and family data, the analysis is suboptimal because the samples within each family are not called jointly, and inheritance within the family is not modelled by the variant caller. We are therefore planning to fully support trios by allowing family structure to be supplied as input (initially as a PED file), in which case the samples within each family will be called jointly.

  3. Improved Regression Tests

    The current tests focus on SNVs and do not simulate indels. Additionally, variants are only simulated at an allele frequency of 0.5. The tests will be expanded to simulate variants at higher and lower allele frequencies, to ensure that Cpipe detects these variants and correctly calls their zygosity across a range of allele frequencies.
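
For the family-based sequencing goal, the PED file is what tells the pipeline which samples belong together. The following is a minimal sketch, assuming the standard six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype, with "0" meaning a parent is not in the file). The names parse_ped, group_families and find_trios are hypothetical and are not part of Cpipe; the joint calling step itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    father_id: str   # "0" = father not in the file
    mother_id: str   # "0" = mother not in the file
    sex: str         # "1" = male, "2" = female, anything else = unknown
    phenotype: str   # "1" = unaffected, "2" = affected, "0"/"-9" = missing


def parse_ped(text: str) -> List[PedRecord]:
    """Parse whitespace-separated PED lines, skipping blanks and '#' comments."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"PED line has fewer than 6 columns: {line!r}")
        records.append(PedRecord(*fields[:6]))
    return records


def group_families(records: List[PedRecord]) -> Dict[str, List[str]]:
    """Group sample IDs by family; each group is a candidate for joint calling."""
    families: Dict[str, List[str]] = {}
    for r in records:
        families.setdefault(r.family_id, []).append(r.individual_id)
    return families


def find_trios(records: List[PedRecord]) -> List[Tuple[str, str, str, str]]:
    """Return (family, child, father, mother) where both parents are in the same family."""
    # Individual IDs are only guaranteed unique within a family, so key on both.
    present = {(r.family_id, r.individual_id) for r in records}
    trios = []
    for r in records:
        if (r.family_id, r.father_id) in present and (r.family_id, r.mother_id) in present:
            trios.append((r.family_id, r.individual_id, r.father_id, r.mother_id))
    return trios


ped = """\
FAM1 child1 dad1 mum1 1 2
FAM1 dad1 0 0 1 1
FAM1 mum1 0 0 2 1
FAM2 single1 0 0 2 2
"""
records = parse_ped(ped)
print(group_families(records))  # {'FAM1': ['child1', 'dad1', 'mum1'], 'FAM2': ['single1']}
print(find_trios(records))      # [('FAM1', 'child1', 'dad1', 'mum1')]
```

Keying on (family, individual) rather than individual alone avoids accidentally linking samples from different families that happen to share an ID; singletons such as FAM2 simply form a one-sample group.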

Longer Term Goals

  1. Analysis of Cancer Samples

    Cpipe is currently focused on the analysis of germline variants. However, cancer is an extremely important clinical use case, and supporting it will require significant dedicated customisation of the pipeline.

  2. Web-Based / API Control of the Pipeline

    Currently, Cpipe is controlled entirely from the command line. For sophisticated users this is often the preferred mode of operation. In the longer term, however, Cpipe should have an accessible interface that supports all the operations needed to manage routine running of the pipeline without direct command-line intervention. At the same time, we believe that anybody should be able to build their own interface for controlling Cpipe, so the system will provide an API that allows third-party applications to perform all the same functions as the web interface.

  3. CNV Calling

    CNV calling from exome data requires a completely different set of analysis tools from those used to identify SNVs and indels. To cater for this, Cpipe will add support for common CNV callers and annotation steps so that clinically relevant copy number changes can be identified.