Extracts, for each feature, the sequence of one of the flank types described below.

get_flank_from_feature(
  feature_file,
  fasta_file,
  width = 10,
  flank_type = 1,
  write_flankbed = FALSE,
  write_outputfasta = FALSE,
  outfile = "flank_out"
)

Arguments

feature_file

Either a path or a connection to a feature file: a BED file in standard UCSC BED format with the columns 1. chromosome ID, 2. start, 3. end, 4. name, 5. score, 6. strand; or a gene feature file with the extension .gff or .gff3. Important: chromosome names must match the sequence IDs in the FASTA file.
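For readers preparing their own input, here is a minimal base-R sketch that writes a six-column BED file in the column order listed above. The feature names and coordinates are made-up example data, not taken from the package. Note that in the UCSC convention a BED start is 0-based and the end is exclusive; whether get_flank_from_feature converts these coordinates is not stated here.

```r
# Made-up example features on chromosome "chrI"; the column order follows
# the six BED columns listed above (chrom, start, end, name, score, strand).
bed <- data.frame(
  chrom  = c("chrI", "chrI"),
  start  = c(100L, 500L),
  end    = c(200L, 800L),
  name   = c("featA", "featB"),
  score  = c(0L, 0L),
  strand = c("+", "-")
)

# BED files are tab-separated plain text with no header row.
bed_path <- tempfile(fileext = ".bed")
write.table(bed, bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back to confirm the layout: six columns, one row per feature.
check <- read.table(bed_path, sep = "\t", header = FALSE)
```

The resulting bed_path is an ordinary file path, so it could be passed as feature_file.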

fasta_file

Either a path or a connection to the reference multi-FASTA file from which the sequences for the given features are retrieved. In each sequence header, only the text before the first space and/or first colon (:) is used as the sequence ID. Keep this in mind when headers contain long names.
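The header rule above can be illustrated with a short base-R sketch. The helper fasta_id is hypothetical and only mirrors the rule as described; the package's own parsing code may differ.

```r
# Hypothetical helper (not part of fastaR): reduce a FASTA header to the
# ID described above, i.e. the text before the first space or colon.
fasta_id <- function(header) {
  header <- sub("^>", "", header)  # drop the leading '>' if present
  sub("[ :].*$", "", header)       # cut at the first space or colon
}

fasta_id(">chrI some description")  # "chrI"
fasta_id(">chrII:1-230218")         # "chrII"
```

Both headers collapse to short IDs, which is why the chromosome names in feature_file need to match these truncated forms.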

width

Numeric, length of the flank region. Default: 10

flank_type

Numeric, selects the region whose sequence (of length width) is fetched:

  • 1: sequence upstream of start coordinate

  • 2: sequence downstream of start coordinate

  • 3: sequence downstream of end coordinate

  • 4: sequence upstream and downstream of start coordinate

  • 5: sequence upstream and downstream of end coordinate

  • 6: sequence upstream and downstream of feature/gene coordinates

  • 7: middle region, i.e. the sequence left after moving width bases inward from both the start and end coordinates: Start----->ATGCGGATGCGGTC<------End

Default: 1
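To make the flank_type options concrete, here is a minimal base-R sketch that maps each code to a coordinate interval for a single feature. flank_interval is a hypothetical helper, not part of fastaR, and it rests on stated assumptions: 1-based inclusive coordinates, strand ignored, type 6 read as the feature extended by width on both sides, and type 7 read as the feature trimmed by width at both ends. The package's actual boundaries may differ.

```r
# Hypothetical helper, NOT part of fastaR: maps a flank_type code to the
# interval it describes for one feature. Assumes 1-based inclusive
# coordinates, ignores strand, and clamps the left edge at position 1.
flank_interval <- function(start, end, width = 10, flank_type = 1) {
  stopifnot(flank_type %in% 1:7, width >= 1, start <= end)
  iv <- switch(flank_type,
    c(start - width, start - 1),          # 1: upstream of start
    c(start, start + width - 1),          # 2: downstream of start
    c(end + 1, end + width),              # 3: downstream of end
    c(start - width, start + width - 1),  # 4: both sides of start
    c(end - width + 1, end + width),      # 5: both sides of end
    c(start - width, end + width),        # 6: feature plus both flanks (assumed)
    c(start + width, end - width)         # 7: middle region (assumed)
  )
  iv[1] <- max(1, iv[1])
  if (iv[2] < iv[1]) stop("empty interval for this feature and width")
  data.frame(start = iv[1], end = iv[2])
}

# A feature spanning 100..200 with width 10.
flank_interval(100, 200, width = 10, flank_type = 1)  # start 90, end 99
flank_interval(100, 200, width = 10, flank_type = 7)  # start 110, end 190

# Pulling the flank sequence from a chromosome held as a single string.
chrom_seq <- paste(rep("ACGT", 60), collapse = "")  # toy 240-bp sequence
iv <- flank_interval(100, 200, width = 10, flank_type = 1)
substr(chrom_seq, iv$start, iv$end)
```

Returning a one-row data frame keeps the result close to a BED-style range, so several features could be stacked with rbind() before extraction.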

write_flankbed

Logical, whether to write the flank regions to an output BED file. Default: FALSE

write_outputfasta

Logical, whether to write the flank sequences to an output multi-FASTA file. Default: FALSE

outfile

Character string defining the output file name. Default: 'flank_out'

Value

The FASTA sequences and genomic ranges of the flank regions.

Examples

if (FALSE) {
  feature_file_in <- system.file("exdata", "Sc_ref_genes.gff", package = "fastaR")
  ref_fasta <- system.file("exdata", "Sc_ref_genome.fasta", package = "fastaR")
  fastaR::get_flank_from_feature(feature_file = feature_file_in,
                                 fasta_file = ref_fasta,
                                 flank_type = 2)
}