BORIS Portal

Bern Open Repository and Information System

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision.

BORIS DOI
10.7892/boris.79078
Publisher DOI
10.1093/bioinformatics/btu343
PubMed ID
24845651
Description
Summary: Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying.

The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running the analyses of sequencing datasets. Tests of SparkSeq also prove that the use of cache and HDFS block size can be tuned for the optimal performance on multiple worker nodes.
Date of Publication
2014
Publication Type
Article
Subject(s)
500 Science > 590 Animals (Zoology)
600 Technology > 630 Agriculture
000 Computer science, knowledge & systems > 040 Unassigned
500 Science > 570 Life sciences; biology
Language(s)
en
Contributor(s)
Wiewiórka, Marek S
Messina, Antonio
Pacholewska, Alicja Elzbieta
Departement klinische Veterinärmedizin, Pferdeklinik (ISME)
Departement für klinische Veterinärmedizin (DKV)
Institut für Genetik
Maffioletti, Sergio
Gawrysiak, Piotr
Okoniewski, Michał J
Additional Credits
Departement klinische Veterinärmedizin, Pferdeklinik (ISME)
Series
Bioinformatics
Publisher
Oxford University Press
ISSN
1367-4803
Access (Rights)
Open access