cd Sample_AS-DKO-CD4T_a-CD3CD28
cat AS-DKO-LCD3_AGGTTCAT_L002_R1_001.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R1_002.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R1_003.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R1_004.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R1_005.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R1_006.fastq.gz > tempR1.gz
cat AS-DKO-LCD3_AGGTTCAT_L002_R2_001.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R2_002.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R2_003.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R2_004.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R2_005.fastq.gz AS-DKO-LCD3_AGGTTCAT_L002_R2_006.fastq.gz > tempR2.gz
rm *.fastq.gz
mv tempR1.gz AS-DKO-LCD3_AGGTTCAT_L002_R1_001.fastq.gz
mv tempR2.gz AS-DKO-LCD3_AGGTTCAT_L002_R2_001.fastq.gz
cd ..
cd Sample_AS-DKO-CD4T_P+I
cat AS-DKO-CD4T_ATATAAGA_L002_R1_001.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R1_002.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R1_003.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R1_004.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R1_005.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R1_006.fastq.gz > tempR1.gz
cat AS-DKO-CD4T_ATATAAGA_L002_R2_001.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R2_002.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R2_003.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R2_004.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R2_005.fastq.gz AS-DKO-CD4T_ATATAAGA_L002_R2_006.fastq.gz > tempR2.gz
rm *.fastq.gz
mv tempR1.gz AS-DKO-CD4T_ATATAAGA_L002_R1_001.fastq.gz
mv tempR2.gz AS-DKO-CD4T_ATATAAGA_L002_R2_001.fastq.gz
cd ..
cd Sample_AS-WT-CD4T_a-CD3CD28
cat AS-WT-LCD3_AGCGCTGG_L002_R1_001.fastq.gz AS-WT-LCD3_AGCGCTGG_L002_R1_002.fastq.gz AS-WT-LCD3_AGCGCTGG_L002_R1_003.fastq.gz AS-WT-LCD3_AGCGCTGG_L002_R1_004.fastq.gz AS-WT-LCD3_AGCGCTGG_L002_R1_005.fastq.gz > tempR1.gz
cat AS-WT-LCD3_AGCGCTGG_L002_R2_001.fastq.gz AS-WT-LCD3_AGCGCTGG_L002_R2_002.fastq.gz AS-WT-LCD3_AGCGCTGG_L002_R2_003.fastq.gz AS-WT-LCD3_AGCGCTGG_L002_R2_004.fastq.gz AS-WT-LCD3_AGCGCTGG_L002_R2_005.fastq.gz > tempR2.gz
rm *.fastq.gz
mv tempR1.gz AS-WT-LCD3_AGCGCTGG_L002_R1_001.fastq.gz
mv tempR2.gz AS-WT-LCD3_AGCGCTGG_L002_R2_001.fastq.gz
cd ..
cd Sample_AS-WT-CD4T_P+I
cat AS-WT-CD4T_AGTTATAG_L002_R1_001.fastq.gz AS-WT-CD4T_AGTTATAG_L002_R1_002.fastq.gz AS-WT-CD4T_AGTTATAG_L002_R1_003.fastq.gz AS-WT-CD4T_AGTTATAG_L002_R1_004.fastq.gz AS-WT-CD4T_AGTTATAG_L002_R1_005.fastq.gz > tempR1.gz
cat AS-WT-CD4T_AGTTATAG_L002_R2_001.fastq.gz AS-WT-CD4T_AGTTATAG_L002_R2_002.fastq.gz AS-WT-CD4T_AGTTATAG_L002_R2_003.fastq.gz AS-WT-CD4T_AGTTATAG_L002_R2_004.fastq.gz AS-WT-CD4T_AGTTATAG_L002_R2_005.fastq.gz > tempR2.gz
rm *.fastq.gz
mv tempR1.gz AS-WT-CD4T_AGTTATAG_L002_R1_001.fastq.gz
mv tempR2.gz AS-WT-CD4T_AGTTATAG_L002_R2_001.fastq.gz
cd ..
Question: Binning Over Genes And Calculating The Coverage [Bedops/Bedmap]
3
I have a list of genes and a coverage track (generated using BEDOPS-based binning script also called as
What I want do it to divide the each gene into 100 bins and then calculate the mean overlap from my coverage file, so that later all the respective bins can be added for all genes to give a average protein binding/coverage in that bin.
I couldn't understand that how I can make a specific number of bins for the genes though I can regulate the binSize using
Then, I used this list as a reference, (added a fake 4th column, which is also not very intuitive) to find the mean overlap for my coverage file.
Problems :
1) This method is not very intutitive (because of R code, it takes ~4-6 mins to complete the process for a list of 30.2K genes producing 30.2K*100 lines as a bed file to be used as a reference). It there a way in bedOps suite to generate the bins on the fly.
2) The output of
Can you suggest some other way to approach this problem.
Thanks
binReads.sh), modified it to take bed file as input).What I want do it to divide the each gene into 100 bins and then calculate the mean overlap from my coverage file, so that later all the respective bins can be added for all genes to give a average protein binding/coverage in that bin.
I couldn't understand that how I can make a specific number of bins for the genes though I can regulate the binSize using
binReads.sh. I made a custom R code, for taking my list of genes and the dividing each of them into 100 bins.Then, I used this list as a reference, (added a fake 4th column, which is also not very intuitive) to find the mean overlap for my coverage file.
Problems :
1) This method is not very intutitive (because of R code, it takes ~4-6 mins to complete the process for a list of 30.2K genes producing 30.2K*100 lines as a bed file to be used as a reference). It there a way in bedOps suite to generate the bins on the fly.
2) The output of
bedmap --mean genes_bins.bed file.cov > means.tsv
is valid numeric for first 42K reference lines out of 3.2M(30.2K*100)
and for the rest it produces "NAN". To cross-validate, I took some
specific binned genes and it produces a valid output. Is this because of
the size of the files (shouldn't be).Can you suggest some other way to approach this problem.
genes_bins.bed
chr1 3214481 3219051 Xkr4-bin1
chr1 3219051 3223621 Xkr4-bin2
chr1 3223621 3228192 Xkr4-bin3
chr1 3228192 3232762 Xkr4-bin4
chr1 3232762 3237332 Xkr4-bin5
chr1 3237332 3241902 Xkr4-bin6
coverage.bed
chr1 3001500 3001520 1 1
chr1 3001520 3001540 1 1
chr1 3001540 3001560 1 1
chr1 3001560 3001580 1 1
chr1 3001580 3001600 1 1
chr1 3001600 3001620 1 1
chr1 3001620 3001640 1 1
P.S. It was started as comment under Converting BAM to bedGraph for viewing on UCSC and I thought its better to ask as a seperate question. This probably be answered by Alex Reynolds himself. The final aim to make the composite profiles of your protein of interest from the chip-seq datasets.Thanks

headof the file, I will mention that. Secondly, thats the whole point of binning, the genes have unequal length, if you want to make a Plotting Read Density/ Distribution/ Composite Profiles [Chip-Seq], then the genes should be be binned and avg binding per bin is calculated in the end to be summed up for per bin for all bins of all genes, to get an average binding over the whole length of the gene. This will effect, that sum genes have more density in a bin and some less. Can you suggest some other strategy!!