Tuesday, November 25, 2014

HLA PAPER IDEA: ADD SNP CALLING TO DETERMINE GENOTYPE

Saturday, October 25, 2014

Below, I show how to extract every 4th line (lines 4 and 8) from somefile.

  1. sed
    $ sed -n '0~4p' somefile
    line 4
    line 8
    $

    0~4 means select every 4th line, beginning at line 0 (the first~step address form is a GNU sed extension).

    Line 0 has nothing, so the first printed line is line 4.

    -n means only explicitly printed lines are included in output.
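The same selection can be sketched in Python. This is only an illustration of the "every 4th line" idea, not a replacement for the sed one-liner; the function name every_nth_line is made up for this note.

```python
from itertools import islice


def every_nth_line(lines, n=4):
    """Return an iterator over lines n, 2n, 3n, ... (1-based numbering).

    `lines` can be any iterable of lines, for example an open file.
    The name and signature are hypothetical, chosen for this sketch.
    """
    # islice counts from 0, so index n - 1 is the 1-based line n.
    return islice(lines, n - 1, None, n)


if __name__ == "__main__":
    sample = ["line %d" % i for i in range(1, 9)]
    for line in every_nth_line(sample, 4):
        print(line)
```

With a real file, pass the open file object instead of the sample list.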

Wednesday, October 15, 2014

Naming rule for external reads:

Forward_SampleName and Reverse_SampleName

Tuesday, October 7, 2014

Show all hidden files in Mac OS

  1. Open Terminal found in Finder > Applications > Utilities
  2. In Terminal, paste the following: defaults write com.apple.finder AppleShowAllFiles YES
  3. Press return
  4. Hold ‘alt’ on your keyboard, then right click on the Finder icon in the dock and click Relaunch.


To hide the files again, run: defaults write com.apple.finder AppleShowAllFiles NO

Tuesday, September 23, 2014

svn co http://epitope3.liai.org/svn/bioinformatics/jgbaum/NGS_pipeline/trunk/ ./trunk
svn up
svn merge --reintegrate https://epitope3.liai.org/svn/bioinformatics/jgbaum/NGS_pipeline/branches/alex/
svn stat
svn commit
svn up


Friday, September 19, 2014

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000880_BeSc_IKZF3_Details -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000877_140829_D00361_0097_BHA0E7ADXX_GrSe10_ChIP41 -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000880_BeSc_IKZF3_Details/analysis.config -t CHIP

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000883_BeSc_IKZF3_D10_Details -q QC_LOCATION=/Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000861_8_19_14_GrSe02_verB_BAC -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000883_BeSc_IKZF3_D10_Details/analysis.config -t CHIP

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000931_StSt_BAC_708_506_Details -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000861_8_19_14_GrSe02_verB_BAC -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping//000931_StSt_BAC_708_506_Details/analysis.config -t Mapping

Wednesday, September 17, 2014

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DESeq_Inputfile_PaMa_891.csv -t PaMa_891

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DEseq_inputfile_SOTON03_11SEP14.csv -t SOTON03_11SEP14

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DESeq_Inputfile_StSc01_11SEP14.csv -t StSc01_11SEP14

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DeSeq_Inputfile_DaWeTEMRA_SCT_11SEP14.csv -t DaWeTEMRA_SCT_11SEP14

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DESeq_Inputfile_IsEn_SCT_11SEP14.csv -t IsEn_SCT_11SEP14

Monday, September 15, 2014


/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000876_8_27_14_AnTS_RNAseq_Details/ -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000875_08_27_14_AgTs_RNAseq/ -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000876_8_27_14_AnTS_RNAseq_Details/analysis.config -t Mapping

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000887_SLE_H3K27ac_W25_Redo_Details -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000877_140829_D00361_0097_BHA0E7ADXX_GrSe10_ChIP41 -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000887_SLE_H3K27ac_W25_Redo_Details/analysis.config -t CHIP

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000769_140529_D00361_0050_AH9A9MADXX_5_29_14_JiKa_RRBS_JrWen_BC13 -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/analysis.config -t Mapping


/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DESeq_Inputfile_PaMa_891.csv -t PaMa_891

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DEseq_inputfile_SOTON03_11SEP14.csv -t SOTON03_11SEP14

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DESeq_Inputfile_StSc01_11SEP14.csv -t StSc01_11SEP14

Wednesday, August 20, 2014

/share/apps/perl/perl-5.18.1/bin/perl /share/apps/NGS_pipeline_dev/src/mapReads_Debug.pl -i 000851 -a /Bioinformatics/Users/zfu/RNAseq_Analysis_Configs/000851_D10_K27Ac_Validation_Details/analysis.config -m /Bioinformatics/Users/zfu/RNAseq_Analysis_Configs/000851_D10_K27Ac_Validation_Details/InputMetaData.csv -s /share/apps/NGS_pipeline_dev/configs/system.config

Tuesday, August 19, 2014

PMID Classifier

1. Open PuTTY on the desktop.
2. The host and port have been saved, but they are:
   a. Host: classifier.internal.iedb.org
   b. Port: 30022
   c. Connection Type: ssh
3. After entering 2a – 2c, click “Open.”
4. You will be prompted for a login (rohini) and a password (classify!123). Hit “enter” after typing each of them. You can see the characters of the username, but the password will be invisible.
5. Change directory using cd /srv/www/classifier_tool and hit “enter”.
6. Open the browser, type the address http://classifier.internal.iedb.org:8080/classifier_tool/ and hit “enter”.
   a. It will tell you it cannot find the server, but the page needs to be open for (7) to work.
7. To run the tool, type: python manage.py runserver classifier.internal.iedb.org:8080 into the PuTTY screen and hit “enter.”
8. You will need to select “Try Again” on the browser page to connect to the opened site.
9. Use the options given on the opened site to run the tool.
10. Once the classifier is run, a link will appear from which the results can be downloaded. (Emily knows all these things).
   a. Click on the link and open the zip file.
   b. Drag the files to the query folders.

PDB Classifier
The PDB query is run on Rohini’s computer.

1. The query and other files are located on Rohini’s computer under Places → Home Folder → bcell_textclassifier → iedb_files, but there is a shortcut to iedb_files on Rohini’s desktop.
   a. Open the iedb_files folder.
   b. Do not touch any of the lone text files (“ann results”).
   c. The folder called “Initial catch up run for classifier” has the 7/31/12 files, which were the set of files generated after the PDB classifier was run for the first time.
2. Open the Terminal, which is located on the taskbar.
   a. Type cd bcell_textclassifier/ [enter]
   b. Type ./runClassifier.sh [enter]
      i. Once you type [enter] you will see the script output.
      ii. When the query is finished, the Terminal script text says “see folder iedb_files for results.” The next line says “rohini@...”.
      iii. When the script is finished running you can close the Terminal.
3. When the script has finished running, go to the iedb_files folder.
   a. If there are new PMIDs, the date at the end of ann_results_pmid_2012-7-31 will be replaced with the day you ran the query, and you will also see a zip file dated with the day you ran the query. Here, “2012-7-31” is the day the query was last run and would be replaced with 2012-8-6 if there were new PMIDs on 8/6/2012.
   b. The output will be in a zip file (locate the zip file icon in the iedb_files folder called “ann_results_pmid_2012-7-31.zip” [or whatever the run date was]).
   c. Take the zip file and put it on the desktop. Open the files from the desktop and send the files to your e-mail.
   d. Delete the zip file afterwards, but make sure the files you sent to yourself are correct.
4. If new PMIDs were not found, the date at the end of the files will not be updated.


Friday, August 15, 2014

7.2 Export Wiggle Files
MEDIPS allows to export genome wide coverage profiles as wiggle files for visualization in common genome browsers.
> MEDIPS.exportWIG(Set = hESCs_MeDIP[[1]], file = "hESC.MeDIP.rep1.wig",
+ format = "rpkm", descr = "")
- Set: a MEDIPS or COUPLING SET. In case of a COUPLING SET, the format parameter must be set to pdensity because in this case a sequence pattern (e.g. CpG) density profile will be exported.
- file: the output file name
- format: can be either count or rpkm for a MEDIPS SET or pdensity for a COUPLING SET.
- descr: a track description for the wiggle file


Thursday, August 14, 2014

Ideas for HLA typing

1. Map to genomic DNA
2. Use exact matches instead of soft clips
3. Change the trimming method
4. Add a QC step
5. Homozygous parameter in exon

Todo:

1. Try another trimming method
2. Try the homozygous parameter for each exon
3. Investigate the reads that can map to both alleles

Wednesday, August 13, 2014

2014-08-13

1. commit change to branch
2. deploy to the /share/apps/NGS_pipeline_dev
3. run mapReads_Debug.pl with system.config


2014-08-13

Use "-s /share/apps/NGS_pipeline_dev/configs/system.config" when debugging pipelines


2014-08-13

Mapping Pipeline Error:

INFO: 2014/08/13 07:54:17 Mapping.pm (3112): Report location: /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/report.html
INFO: 2014/08/13 07:54:17 Mapping.pm (3137): Searching /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13 for .bw files
INFO: 2014/08/13 07:54:17 Mapping.pm (3158): Folder: /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4.bam / Bigwig file:/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4.bam/accepted_hits.sorted.bw
INFO: 2014/08/13 07:54:17 Mapping.pm (3171): Folder: /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4_unique.bam / Bigwig file:/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4_unique.bam/accepted_hits.sorted.bw
WARN: 2014/08/13 07:54:17 Pipeline.pm (1017): Creating new parameter 'ALL_TRACKS_URL' and setting its value to 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&position=chr19&hgt.customText=https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/allTracks.txt'
WARN: 2014/08/13 07:54:17 Pipeline.pm (1017): Creating new parameter 'ALL_TRACKS_UNIQUE_URL' and setting its value to 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&position=chr19&hgt.customText=https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/allTracksUnique.txt'
Use of uninitialized value $bowtie_dir in concatenation (.) or string at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Pipeline/Mapping.pm line 3323.
INFO: 2014/08/13 07:54:17 Mapping.pm (3323): premapping_dir:001_premapping_filter tophat:002_tophat bowtie:  low_complexity:003_low_complexity_filter bigwig:005_bigwig bamMetrics:004_bam_metrics HTSeq:006_HTSeq calc_rpkm:007_calc_rpkm
Use of uninitialized value $bowtie_dir in concatenation (.) or string at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Pipeline/Mapping.pm line 3328.
INFO: 2014/08/13 07:54:17 Mapping.pm (3360): Error log was found /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/mapHiSeq2500Reads.err
INFO: 2014/08/13 07:54:18 Mapping.pm (3373): Dust Filtered Reads: 582601
readline() on closed filehandle $INF at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Utils.pm line 36.
Use of uninitialized value $value in scalar chomp at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Utils.pm line 39.
Use of uninitialized value in addition (+) at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Pipeline/Mapping.pm line 3402.
INFO: 2014/08/13 07:54:18 Mapping.pm (3413): Adaptor Reads: 0
INFO: 2014/08/13 07:54:18 Mapping.pm (3416): Good Illumina Reads: 49979864
INFO: 2014/08/13 07:54:18 Mapping.pm (3424): 00002_13: JrWen_RNAseq_BC_13
INFO: 2014/08/13 07:54:18 Mapping.pm (3425): Mapped reads: 33334388
INFO: 2014/08/13 07:54:18 Mapping.pm (3426): Uniquely Mapped reads: 30441021
INFO: 2014/08/13 07:54:18 Mapping.pm (3477): Expected 51330230 Total Reads the sum of the parts is 51330230
INFO: 2014/08/13 07:54:18 Mapping.pm (3478): Expected 100% the sum of the percentages is 100.000
Can't locate object method "get_param" via package "https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4.bam/accepted_hits.sorted.bw" (perhaps you forgot to load "https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4.bam/accepted_hits.sorted.bw"?) at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Pipeline/Mapping.pm line 4034.

Monday, August 4, 2014

Issues with the new NGS pipeline

1. 0% mappability
2. allTrack files don't work
3. some wrong adaptor URLs in report.html
4. wrong URL in UCSC browser link


Friday, August 1, 2014

Results folder is here:
Y:\NGS_analyses\automated\RNA-Seq\Mapping\000775_JrWen_BC13_Details


Running folder is here, where you can find the .config file + MetaData:
Y:\Groups\core\hiseq_raw_data\140529_D00361_0050_AH9A9MADXX_5_29_14_JiKa_RRBS_JrWen_BC13

Command:
/share/apps/perl/perl-5.18.1/bin/perl /share/apps/NGS_pipeline/src/mapReads.pl -a /Bioinformatics/NGS_analyses/ad_hoc/Groups/core/hiseq_raw_data/140529_D00361_0050_AH9A9MADXX_5_29_14_JiKa_RRBS_JrWen_BC13/JrWen_BC13.config -m /Bioinformatics/NGS_analyses/ad_hoc/Groups/core/hiseq_raw_data/140529_D00361_0050_AH9A9MADXX_5_29_14_JiKa_RRBS_JrWen_BC13/InputMetaData.csv


/share/apps/perl/perl-5.18.1/bin/perl /Bioinformatics/Users/zfu/Source_Code/NGS_Pipeline/RNAseq/20140801/src/mapReads.pl -a /Bioinformatics/Users/zfu/Source_Code/NGS_Pipeline/RNAseq/20140801/input/JrWen_BC13.config -m /Bioinformatics/Users/zfu/Source_Code/NGS_Pipeline/RNAseq/20140801/input/InputMetaData.csv


"https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/QC/000802_7_22_14_GrSe03_NXT02_VerA/SCT6_TU22_ThSTAR_BC_N502_N701_well1_CTCTCTAC-CTCTCTAT_L001_R1_001_fastqc/fastqc_report.html"

"../../QC/000802_7_22_14_GrSe03_NXT02_VerA/SCT6_TU22_ThSTAR_BC_N502_N701_well1_CTCTCTAC-CTCTCTAT_L001_R1_001_fastqc/fastqc_report.html"

"https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000803_GrSe03_NextFXP02_VerA_Details/SCT6_TU22_ThSTAR_BC_N502_N701_well1/001_premapping_filter/premapping_counts.txt"

"SCT6_TU22_ThSTAR_BC_N502_N701_well1/001_premapping_filter/premapping_counts.txt"

'ANALYSIS_ARCHIVE_QC' and setting its value to '/Bioinformatics/NGS_analyses/automated/RNA-Seq/QC'

'/Bioinformatics/NGS_analyses/automated/RNA-Seq/QC' -> '../../QC'

MASTER_RESULTS_DIR_MAPPING=/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000803_GrSe03_NextFXP02_VerA_Details

'/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000803_GrSe03_NextFXP02_VerA_Details' + "/" -> ''

Wednesday, July 23, 2014

=IF(Q1>49,1,0) * IF(T1>49,1,0) * IF(R1>49,1,0) * IF(U1>49,1,0)

=IF(R1>49,1,0) * IF(U1>49,1,0)

Monday, July 21, 2014

2014-07-21

for i in Final*;do echo mv $i ${i/Final/};done | sh

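Rename multiple folders at once (removes "Final" from each folder name). Dropping "| sh" prints the mv commands without running them.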
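A rough Python sketch of the same bulk rename, in case the folder names contain spaces (which the unquoted shell loop would not handle). The function name and the dry_run switch are made up for this note; dry_run plays the role of the "echo" preview.

```python
import os


def strip_prefix_from_names(directory, prefix="Final", dry_run=True):
    """Rename every entry in `directory` whose name starts with `prefix`.

    Returns the list of (old_name, new_name) pairs. With dry_run=True
    nothing is renamed; the pairs are only reported.
    Hypothetical helper, not part of any pipeline.
    """
    renames = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith(prefix):
            continue
        new_name = name[len(prefix):]
        if not new_name:
            continue  # a name equal to the prefix would become empty, skip it
        renames.append((name, new_name))
        if not dry_run:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
    return renames
```

Call it once with dry_run=True to check the planned renames, then again with dry_run=False.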

Thursday, July 10, 2014

2014-07-10

1. use simply_locus_sample_pair to generate locus_sample pair first

2. any other analysis should go from there

python /Bioinformatics/Users/zfu/HLA_Typing/src/HLA_Typing_Parsing_Codes/simply_locus_sample_pair.py /Bioinformatics/NGS_raw_data/LIAI/MiSeq/HLA_Typing_Runs/Run52_5loci/5loci_pooling /Bioinformatics/Users/zfu/HLA_Typing/Run52_5loci_pooling

python /Bioinformatics/Users/zfu/HLA_Typing/src/HLA_Typing_Parsing_Codes/5loci_pooling.py /Bioinformatics/NGS_raw_data/LIAI/MiSeq/HLA_Typing_Runs/Run52_5loci/original /Bioinformatics/NGS_raw_data/LIAI/MiSeq/HLA_Typing_Runs/Run52_5loci/5loci_pooling_original

Difference between V8.0_HF_105 and V8.0_105

1. 4 digits set should > 2
2. full length coverage reads > 1


mkdir /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_118
sed -e 's/HF_100/HF_118/g' /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_100/runTyping.py > /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_118/runTyping.py
python /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_118/runTyping.py

source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_101/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_102/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_103/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_104/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_110/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_106/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_107/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_108/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_109/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_111/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_112/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_113/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_114/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_115/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_116/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_117/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_118/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_119/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_120/QSUB_SUBMIT.sh



Friday, June 27, 2014

2014-06-27
 
    alleleCombination_list = choseAlleleCombination(selectedAllele_list, effective_mapped_reads_dict, primer_start, primer_end)  
   
   
  printOutMappabilityPerExon(effective_mapped_reads_dict, project_folder):
 

Thursday, June 26, 2014

2014-06-26

Function: generateExonAlignmentInfo(scratch_folder, output_id, exons_boundary_dict, equipment_id, alignmentLengthCutoff)

TODO

1. add a function named paired_mapping_reads_dict

key: short read ID
value: a list with two elements: one forward read info and one reverse read info
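A minimal sketch of what such a dictionary could look like. The input record layout (read ID, strand, info) is an assumption made for this note; the real alignment records in the pipeline may look different.

```python
from collections import defaultdict


def build_paired_mapping_reads_dict(alignments):
    """Group alignment records by short read ID.

    `alignments` is an iterable of (read_id, strand, info) tuples with
    strand equal to "forward" or "reverse" (assumed layout).
    Returns {read_id: [forward_info, reverse_info]} and keeps only the
    reads for which both mates were seen.
    """
    slots = defaultdict(lambda: [None, None])
    for read_id, strand, info in alignments:
        index = 0 if strand == "forward" else 1
        slots[read_id][index] = info
    return {
        read_id: pair
        for read_id, pair in slots.items()
        if all(item is not None for item in pair)
    }
```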





Wednesday, June 18, 2014

2014-06-18

Converting SAM directly to a sorted BAM file

Like many Unix tools, SAMtools can read directly from a stream, i.e. stdin.

samtools

samtools view -bS file.sam | samtools sort - file_sorted

samtools view -bS DPB1_13\:01.report.bowtie2.sam | samtools sort - DPB1_13_01

samtools index DPB1_13_01.bam DPB1_13_01.bai


Thursday, May 22, 2014

Random Number Generation



You could use random.sample to generate the list with one call:

import random
my_randoms = random.sample(range(100), 10)

That generates 10 distinct numbers in the (inclusive) range from 0 to 99. If you want 1 to 100, you could use this:

my_randoms = random.sample(range(1, 101), 10)

Monday, May 12, 2014

2014-05-12

Error information in running classifier

jython /srv/www/classifier_tool/classifier_scripts/use_weka_to_predict_sub_category.py
Traceback (most recent call last):
  File "/srv/www/classifier_tool/classifier_scripts/use_weka_to_predict_sub_category.py", line 198, in <module>
    pred=int(classifier.classifyInstance(input_data.instance(i)))
    at weka.core.RelationalLocator.copyRelationalValues(RelationalLocator.java:88)
    at weka.filters.Filter.copyValues(Filter.java:359)
    at weka.filters.Filter.push(Filter.java:276)
    at weka.filters.unsupervised.attribute.NominalToBinary.convertInstance(NominalToBinary.java:503)
    at weka.filters.unsupervised.attribute.NominalToBinary.input(NominalToBinary.java:177)
    at weka.classifiers.functions.MultilayerPerceptron.distributionForInstance(MultilayerPerceptron.java:2102)
    at weka.classifiers.Classifier.classifyInstance(Classifier.java:81)
    at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 9 != 6

Thursday, April 24, 2014

2014-04-24

update the model file and train feature file in

10.0.3.183/srv/www/classifier_tool/classifier_files/SVM_2012/

update the weka model file in

10.0.3.183/srv/www/classifier_tool/classifier_files/Weka_model_files/

Wrong IDs ended up in the curatable and uncuratable lists because of the "\n" handling when "cat" was used to combine the files.

Friday, April 18, 2014

2014-04-18

7.0.1

Class I + DRB1

Search for alleles that have one read covering the entire exon, for each exon (2, 3, 4)

Class II

Search for alleles that have coverage on each base of exon 2 + exon 3

7.0.2

Exclude 10 bases from the exon-intron boundary and find the minimum coverage

7.0.3

Find the minimum coverage from the exon-intron boundary

Search for alleles that have one read covering the entire exon, for each exon (2, 3, 4); if fewer than 2 are found, then search for alleles that have coverage on each base of exon 2 + exon 3 + exon 4
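The 7.0.2 rule (skip 10 bases next to the boundary, then take the minimum coverage) could be sketched as below. The per-base coverage list, the half-open exon coordinates, and trimming on both ends of the exon are assumptions made for this note, not the pipeline's actual data layout.

```python
def min_exon_coverage(coverage, exon_start, exon_end, margin=10):
    """Minimum per-base coverage inside an exon, ignoring `margin` bases
    at each exon-intron boundary.

    coverage   : list of read depths, one per base, 0-based positions
    exon_start : first base of the exon (inclusive)
    exon_end   : one past the last base of the exon (exclusive)
    margin=0 gives the 7.0.3 behaviour (minimum taken from the boundary).
    """
    inner = coverage[exon_start + margin:exon_end - margin]
    if not inner:
        raise ValueError("exon is not longer than twice the margin")
    return min(inner)
```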

Wednesday, April 2, 2014

2014-04-02

find . -maxdepth 6 -type f -name "depthOfCoverage_allReads.txt" -exec cp --parents {} /Bioinformatics/Users/zfu/HLA_Typing/Run42_Coverage \;

find . -maxdepth 6 -type f -name "depthOfCoverage_allReads.txt" -exec cp --parents {} /Bioinformatics/Users/zfu/HLA_Typing/Run43_Coverage \;

Tuesday, April 1, 2014

2014-04-01

1. add a count for the number of mapped reads to each exon in the output
2. add a count for the total bases mapped in each exon to the output
3. use the count of the total bases mapped to the templates to select the best pairs rather than the total number of reads mapped
4. enforce continuous coverage within each exon by adding several constraints
5. reads that cross exon borders should only be counted in the exon in which the majority of the read lies
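Point 5 can be sketched as follows: a read goes to the exon with which it shares the most bases. Coordinates are assumed to be half-open (start inclusive, end exclusive) and the exon table in the usage example is invented; on a tie the exon listed first wins.

```python
def assign_read_to_exon(read_start, read_end, exons):
    """Return the name of the exon that overlaps the read the most.

    exons : dict mapping exon name -> (start, end), half-open coordinates
    Returns None when the read overlaps no exon at all.
    Hypothetical helper written for this note.
    """
    best_exon, best_overlap = None, 0
    for name, (start, end) in exons.items():
        overlap = min(read_end, end) - max(read_start, start)
        if overlap > best_overlap:
            best_exon, best_overlap = name, overlap
    return best_exon


# Usage with made-up, adjacent exon coordinates:
example_exons = {"exon2": (0, 270), "exon3": (270, 546)}
example_assignment = assign_read_to_exon(200, 300, example_exons)
```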

dict 1:

key: allele id
value: dict 2

dict 2:

key: exon
value: dict 3

dict 3:

key: short read id + sequence
value: list 4

list 4: alignment start, alignment length, alignment end
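A small sketch of the nested structure (dict 1 -> dict 2 -> dict 3 -> list 4) and of the two counts from points 1 and 2. Function names are made up; building the key by concatenating read ID and sequence, and treating the alignment end as inclusive (start + length - 1), are assumptions.

```python
from collections import defaultdict


def new_alignment_index():
    # allele id -> exon -> (read id + sequence) -> [start, length, end]
    return defaultdict(lambda: defaultdict(dict))


def add_alignment(index, allele_id, exon, read_id, sequence, start, length):
    key = read_id + sequence  # assumed key format
    index[allele_id][exon][key] = [start, length, start + length - 1]


def mapped_reads_per_exon(index, allele_id):
    """Number of mapped reads in each exon of one allele."""
    return {exon: len(reads) for exon, reads in index[allele_id].items()}


def mapped_bases_per_exon(index, allele_id):
    """Total aligned bases in each exon of one allele."""
    return {
        exon: sum(entry[1] for entry in reads.values())
        for exon, reads in index[allele_id].items()
    }
```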

Friday, March 28, 2014

2014-03-28

Workflow for retraining the classifier:

1. Generate random test ids
2. Generate features
3. Generate SVM models and predictions.

Thursday, March 27, 2014

2014-03-27

Think carefully about set intersections and differences.
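A quick reminder in Python of how the set operations differ. The IDs are invented for illustration.

```python
# Two made-up ID sets
table4_ids = {"101", "102", "103", "104"}
database_ids = {"103", "104", "105"}

in_both = table4_ids & database_ids            # intersection
only_in_table4 = table4_ids - database_ids     # difference, order matters
only_in_database = database_ids - table4_ids   # the opposite difference
in_exactly_one = table4_ids ^ database_ids     # symmetric difference

# The three pieces together rebuild the union, with nothing counted twice.
union = in_both | only_in_table4 | only_in_database
```

The easy mistake is to treat A - B and B - A as the same thing.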

Tuesday, March 25, 2014

2014-03-25

The index of the first non-zero element in a list (raises StopIteration if every element is zero):

next(i for i, x in enumerate(myList) if x != 0)
 
1813DR_S64_L001_R2_001.fastq.gz was damaged 
 
 

Thursday, March 20, 2014

2014-03-20

Merge all CSV files from different directories:

find /path/to/directory/ -name '*.csv' -print0 | xargs -0 -I file cat file > merged.file
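A Python sketch of the same merge. Unlike plain concatenation, it adds a newline when a file does not end with one, so the last row of one file cannot run into the first row of the next. The function name is made up for this note.

```python
import os


def merge_csv_files(root, merged_path):
    """Concatenate every *.csv file found under `root` into `merged_path`.

    Files are visited in sorted order. Returns the number of files merged.
    Hypothetical helper; header rows are copied as they are.
    """
    merged_abs = os.path.abspath(merged_path)
    count = 0
    with open(merged_path, "w") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make the traversal order predictable
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if not name.endswith(".csv"):
                    continue
                if os.path.abspath(path) == merged_abs:
                    continue  # never read the output file itself
                with open(path) as handle:
                    text = handle.read()
                out.write(text)
                if text and not text.endswith("\n"):
                    out.write("\n")
                count += 1
    return count
```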
 
samtools view -bS DQB1_030201.sam | samtools sort - DQB1_030201_sorted
samtools index DQB1_030201_sorted.bam DQB1_030201_sorted.bai

samtools view -bS DQB1_030501.sam | samtools sort - DQB1_030501_sorted
samtools index DQB1_030501_sorted.bam DQB1_030501_sorted.bai

samtools view -bS DQB1_0331.sam | samtools sort - DQB1_0331_sorted
samtools index DQB1_0331_sorted.bam DQB1_0331_sorted.bai

samtools view -bS DQB1_040101.sam | samtools sort - DQB1_040101_sorted
samtools index DQB1_040101_sorted.bam DQB1_040101_sorted.bai

samtools view -bS DQB1_040201.sam | samtools sort - DQB1_040201_sorted
samtools index DQB1_040201_sorted.bam DQB1_040201_sorted.bai

 

Wednesday, March 19, 2014

2014-03-19

INSERT INTO pubmed_temp SELECT * FROM pubmed_information_backup20140306;

mysql> INSERT INTO pubmed_temp SELECT * FROM pubmed_information_backup20140306;
Query OK, 360246 rows affected (47.85 sec)
Records: 360246  Duplicates: 0  Warnings: 0

mysql> ALTER IGNORE TABLE pubmed_temp ADD UNIQUE INDEX PUBMED_ID_INDEX (Pubmed_ID);
Query OK, 360246 rows affected (21.50 sec)
Records: 360246  Duplicates: 180123  Warnings: 0

SELECT * FROM pubmed_temp ORDER BY Num DESC LIMIT 10;

 RENAME TABLE pubmed_temp TO  pubmed_information;

mysql> select count(1) from t4_tokenized_pubmed_information_new;
+----------+
| count(1) |
+----------+
|    54472 |
+----------+
1 row in set (0.02 sec)





Tuesday, March 18, 2014

03-17-2014

new ids with abstract.

9450 rows in set (0.00 sec)

Thursday, March 13, 2014

2014-3-12

0. Change the order of sample name
1. Scale the raw data
2. Generate 7 csv files
3. Do heatmap
4. assemble the all-in-one table
5. Change row name to avoid duplicate

Wednesday, March 12, 2014

2014-03-11

x <- scale(x, center = FALSE)

hmap(x, labRow = FALSE, method = "OLO")

hmap(x, labRow = FALSE, method = "OLO", col=diverge_hcl(100), range=c(-3.5,3.5), colorkey=TRUE)


hmap(x, labRow = FALSE, method = "OLO", col = c("yellow", "blue"))

x <- read.csv(file="/Bioinformatics/Users/zfu/2014_BP_TB_Paper/rank_log2.csv", head=TRUE,sep=",")
x
attributes(x)
x <- as.matrix(x)
x
attributes(x)
x[1:10,]
?read.csv
x <- read.table(file="/Bioinformatics/Users/zfu/2014_BP_TB_Paper/rank_log2.csv", head=TRUE, sep=",", row.names=1)
x
x <- data.matrix(x)
x
attributes(x)
x[1:10,]
library(seriation)
o <- c(seriate(dist(x), method ="OLO"),seriate(dist(t(x)), method = "OLO"))
o
history
history(100)

o1 <- seriate(dist(x), method = "OLO")
o2 <- seriate(dist(t(x)), method = "OLO")
o1
desribe(o1)
attributes(o1)
attributes(o2)
o1[1]
o1[[1]]
o1[[1]][1]
o1[[1]][[1]]
attributes(o1[[1]][[1]])
head(get_order(o1))
order1 <- get_order(o1)
order2 <- get_order(o2)
order2
x
attributes(x)
clustered_data <- x[order1,order2]
clustered_data
clustered_data[1:2,]
ls()
history(100)
> pdf("aa.pdf")
> heatmap.2(clustered_data, col=my_palette, scale="none", Colv = NULL, dendrogram = "row", key=T, keysize = 1.5, density.info="none", trace="none",cexCol=0.9, labRow=NA)
> dev.off()
null device
          1
> heatmap.2(clustered_data, col=my_palette, scale="none", dendrogram = "column", key=T, keysize=1.5, density.info="none", trace="none",cexCol=0.5, labRow=NA)


cc = c(rep("blue",10),rep("brown",11),rep("cyan",11),rep("orange",4),rep("red",15))


heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "row", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12), key=T, keysize=1)


aa_disease
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/figure3_diseaseType.pdf")
heatmap.2(clustered_data, col=my_palette, scale="none", dendrogram = "column", key=T, keysize=1, density.info="none", trace="none",cexCol=0.3, labRow=NA, ColSideColors=aa_disease, margin=c(12, 12), labCol=NA)
dev.off()
aa_disease
colnames(clustered_data)
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/test.pdf")
heatmap.2(clustered_data, col=my_palette, scale="none", dendrogram = "column", key=T, keysize=1, density.info="none", trace="none",cexCol=0.3, labRow=NA, ColSideColors=aa_disease, margin=c(12, 12))
dev.off()
x1 <- read.table(file="/Bioinformatics/Users/zfu/2014_BP_TB_Paper/th1_th17_signiture.csv", head=TRUE, sep=",", row.names=1)
x1 <- as.matrix(x1)
attributes(x1)
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "row", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12))
dev.off()
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
x1
scale(x1)
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "row", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12), key=T, keysize=1)
dev.off()
history(-25)
history(25)






x1 <- read.table(file="/Bioinformatics/Users/zfu/2014_BP_TB_Paper/th1_th17_signiture_data.csv", head=TRUE, sep=",", row.names=1)
x1 <- as.matrix(x1)
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "row", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12))
dev.off()
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "none", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12))
dev.off()
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "none", density.info="none", trace="none", cexCol=0.3, labRow=NA, margin=c(12, 12))
dev.off()
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "none", density.info="none", trace="none", cexCol=0.3, labRow=NA, margin=c(12, 12), labCol=NA)






Monday, March 10, 2014

2014-03-10

360246 rows in set (0.36 sec)
PUBMED_INFORMATION;

2014-03-10

New mapping methods

HLA pipeline 5.0.3
HLA pipeline 5.0.4
HLA pipeline 5.0.5
HLA pipeline 5.0.6
HLA pipeline 5.0.7
HLA pipeline 5.0.8
HLA pipeline 5.0.9

Old mapping methods

HLA pipeline 5.0.10
HLA pipeline 5.0.11
HLA pipeline 5.0.12
HLA pipeline 5.0.13
HLA pipeline 5.0.14
HLA pipeline 5.0.15
HLA pipeline 5.0.16

delete all sam files: HLA pipeline 5.0.X
keep all sam files:  HLA pipeline 5.0.X.1


Pipeline 5.0

Class 1: 175bp alignment
Class 2: 200bp alignment

Friday, March 7, 2014

2014-03-06

| t4_tokenized_pubmed_information     |
| t4_tokenized_pubmed_information_new |

WERE EMPTY


pubmed_information has 360246 rows in set (0.34 sec)


2014-3-5


DROP TABLE pubmed_information;
CREATE TABLE pubmed_information LIKE pubmed_information_backup20140306;

Thursday, March 6, 2014

2014-03-05

********************************

Correct One

********************************

10662 rows in set (4 hours 4 min 7.70 sec)

mysql> select Table4_20140226.PubMed_ID from Table4_20140226 LEFT JOIN pubmed_information ON Table4_20140226.PubMed_ID = pubmed_information.PubMed_ID WHERE pubmed_information.PubMed_ID IS NULL;



360246 rows in set (4 hours 56 min 4.57 sec)

mysql> select PubMed_ID from Table4_20140226 INNER JOIN pubmed_information USING (PubMed_ID);

Tuesday, March 4, 2014

2014-03-04

Mysql dataset difference

SELECT *
  FROM MyTableA
 WHERE imageURL NOT IN (SELECT imageURL FROM MyTableB)

You can also use a left outer join (the first query returns the rows that exist in table a but not in b, the second vice versa):

SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL
SELECT b.id FROM b LEFT JOIN a ON b.id = a.id WHERE a.id IS NULL
 
select Table4_20140226.PubMed_ID from Table4_20140226 LEFT JOIN pubmed_information ON Table4_20140226.PubMed_ID = pubmed_information.PubMed_ID WHERE pubmed_information.PubMed_ID IS NULL; 
 
 
SELECT DISTINCT value FROM table_a
INNER JOIN table_b
USING (value);

+-------+
| value |
+-------+
| B     |
+-------+
 
SELECT DISTINCT value FROM table_a
WHERE (value) IN
(SELECT value FROM table_b);

+-------+
| value |
+-------+
| B     |
+-------+
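
A minimal sketch of the two patterns above (LEFT JOIN ... IS NULL for the difference, INNER JOIN ... USING for the intersection). It runs against an in-memory SQLite database instead of the MySQL server, and the sample values 'A', 'B', 'C' are made up for illustration; only the table names table_a and table_b come from the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (value TEXT);
    CREATE TABLE table_b (value TEXT);
    INSERT INTO table_a VALUES ('A'), ('B');
    INSERT INTO table_b VALUES ('B'), ('C');
""")

# Difference: rows in table_a that have no partner in table_b.
only_in_a = [row[0] for row in conn.execute(
    "SELECT a.value FROM table_a a "
    "LEFT JOIN table_b b ON a.value = b.value "
    "WHERE b.value IS NULL"
)]

# Intersection: values present in both tables.
in_both = [row[0] for row in conn.execute(
    "SELECT DISTINCT value FROM table_a INNER JOIN table_b USING (value)"
)]

print("only in table_a:", only_in_a)
print("in both tables:", in_both)
```

With this toy data the difference is 'A' and the intersection is 'B', which matches the result tables shown above.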
   

Friday, February 28, 2014

How to select a specific row in MySQL?

Select row 56 (LIMIT offset,count: skip 55 rows, return 1):

SELECT * FROM customer LIMIT 55,1
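
A small sketch of the same LIMIT offset,count idea, run against an in-memory SQLite table (SQLite accepts the same "LIMIT 55, 1" form). The customer columns id and name are hypothetical, and the ORDER BY is my addition: without it the row order, and therefore which row counts as "row 56", is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the original table layout is not given in the notes.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(i, f"customer_{i}") for i in range(1, 101)],
)

# Skip 55 rows, return 1 row: the 56th row in id order.
row_56 = conn.execute(
    "SELECT * FROM customer ORDER BY id LIMIT 55, 1"
).fetchone()
print(row_56)
```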

Thursday, February 27, 2014

20140227

1. What is the "new_ids_in_table4.txt" file?
2. Results of running: update_pubmed_info_with_table4.py

update_pubmed_info_with_table4.py:240: Warning: Data truncated for column 'Author' at row 1
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC5\x82otni...' for column 'Authors' at row 34775
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC5\x9Fiogl...' for column 'Authors' at row 49380
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1 or ...' for column 'Abstract' at row 170249
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC4\x87 S, ...' for column 'Authors' at row 174522
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3 ELI...' for column 'Abstract' at row 174522
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC4\x87 S.' for column 'Authors' at row 174523
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC4\x87 S, ...' for column 'Authors' at row 174525
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xBCM ea...' for column 'Abstract' at row 176282
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1 and...' for column 'Abstract' at row 177475
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB22-mi...' for column 'Title' at row 177476
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2(2)-...' for column 'Abstract' at row 177476
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xBB aut...' for column 'Abstract' at row 178076
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3 and...' for column 'Abstract' at row 178665
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2\xE2\x82\x82-...' for column 'Title' at row 178666
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2\xE2\x82\x82-...' for column 'Abstract' at row 178666
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-hai...' for column 'Abstract' at row 178668
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x80\x897BN...' for column 'Affiliations' at row 179185
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB32 do...' for column 'Abstract' at row 179185
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1GalC...' for column 'Title' at row 179737
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1GalC...' for column 'Abstract' at row 179737
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB11, \xCE...' for column 'Abstract' at row 179738
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x84\xAB) s...' for column 'Abstract' at row 179942
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1 sub...' for column 'Abstract' at row 180166
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB5RI),...' for column 'Abstract' at row 180167
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2.' for column 'Title' at row 180169
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 (IL...' for column 'Abstract' at row 180169
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1\xCE\xB2 T...' for column 'Abstract' at row 180644
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB5RI, ...' for column 'Abstract' at row 180646
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xBAB ac...' for column 'Abstract' at row 180878
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x86\x924)-...' for column 'Abstract' at row 181259
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB11 an...' for column 'Abstract' at row 181537
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB21-3G...' for column 'Abstract' at row 181538
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x80\x85nM)...' for column 'Abstract' at row 181847
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB11.' for column 'Title' at row 181848
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB11 su...' for column 'Abstract' at row 181848
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB22-mi...' for column 'Title' at row 182107
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2(2)-...' for column 'Abstract' at row 182107
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2) pl...' for column 'Abstract' at row 182108
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3-Car...' for column 'Title' at row 182109
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3-car...' for column 'Abstract' at row 182109
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x80\xB3 st...' for column 'Abstract' at row 182110
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 and...' for column 'Abstract' at row 182366
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-hai...' for column 'Abstract' at row 182749
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-Sec...' for column 'Title' at row 183362
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x88\xA5Dru...' for column 'Affiliations' at row 183362
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB240 a...' for column 'Abstract' at row 183362
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC4\x8Devi\xC4...' for column 'Authors' at row 183363
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x80\x89\xC3\x97\xE2...' for column 'Abstract' at row 183611
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 pep...' for column 'Title' at row 184011
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 (A\xCE...' for column 'Abstract' at row 184011
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-tur...' for column 'Abstract' at row 184013
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xBCM). ...' for column 'Abstract' at row 184488
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-cel...' for column 'Abstract' at row 184490
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3\xCE\xB4 T...' for column 'Title' at row 184492
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3\xCE\xB4 T...' for column 'Abstract' at row 184492
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1\xCE\xB2) ...' for column 'Abstract' at row 184494
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1-cha...' for column 'Abstract' at row 184866
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3R ef...' for column 'Abstract' at row 185759
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 dom...' for column 'Title' at row 185764
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1\xCE\xB2 T...' for column 'Abstract' at row 185764
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x88\xBC12 ...' for column 'Abstract' at row 185765
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1-hel...' for column 'Abstract' at row 185766
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1-d-m...' for column 'Abstract' at row 185767
  cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\x94H7 m...' for column 'Abstract' at row 185960
  cursor.execute(sql)
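
A quick check of what the rejected strings actually are. The byte sequences quoted in the warnings decode cleanly as UTF-8 (Polish and Greek letters, a subscript 2, an angstrom sign). One possible reading, which the log itself does not confirm, is that the column or connection character set cannot hold these characters; that would need to be verified with SHOW CREATE TABLE on the server.

```python
# Byte prefixes copied from the warnings above.
samples = [
    b"\xC5\x82otni",           # Authors, row 34775
    b"\xCE\xB1 or ",           # Abstract, row 170249
    b"\xCE\xB2\xE2\x82\x82-",  # Title, row 178666
    b"\xE2\x84\xAB) s",        # Abstract, row 179942
]

# All of them are valid UTF-8, so the input data itself is not corrupted.
decoded = [raw.decode("utf-8") for raw in samples]

for raw, text in zip(samples, decoded):
    print(raw, "->", text)
```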

Wednesday, February 26, 2014

20140224

How to log in to the IEDB server

mysql --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p

username: rdamle
password: iedb123

Use table Table4_20140226 as the updated Table 4

mysqldump -u <db_username> -h <db_host> -p db_name table_name > table_name.sql
 
mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed pubmed_information > pubmed_information.sql 
 
 
mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed t4_tokenized_pubmed_information > t4_tokenized_pubmed_information.sql
 
 
mysql> show tables;
+---------------------------------+
| Tables_in_pubmed                |
+---------------------------------+
| pubmed_information              |
| t4_tokenized_pubmed_information |
| table4_reference                |
| table4_reference_latest         |
| table_4_reference_last_updated  |
| temp_pubmed_data                |
| temp_stemmed_for_svm            |
+---------------------------------+
7 rows in set (0.00 sec)


mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed table4_reference > table4_reference.sql 

mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed table4_reference_latest > table4_reference_latest.sql 
 
mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed table_4_reference_last_updated > table_4_reference_last_updated.sql   

mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed temp_pubmed_data > temp_pubmed_data.sql   


mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed temp_stemmed_for_svm > temp_stemmed_for_svm.sql   
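
The per-table dumps above all follow one template, so they can be generated instead of typed. A minimal sketch: host, port, user, database, and table names are taken from the commands and the "show tables" output above, while the helper name dump_command is my own. It only builds the command strings and does not run mysqldump.

```python
import shlex

HOST, PORT, USER, DATABASE = "10.0.3.49", 33306, "rdamle", "pubmed"

# Table names as listed by "show tables" above.
TABLES = [
    "pubmed_information",
    "t4_tokenized_pubmed_information",
    "table4_reference",
    "table4_reference_latest",
    "table_4_reference_last_updated",
    "temp_pubmed_data",
    "temp_stemmed_for_svm",
]


def dump_command(table):
    """Build the mysqldump command line for one table (hypothetical helper)."""
    args = [
        "mysqldump", f"--host={HOST}", "--protocol=TCP", f"--port={PORT}",
        "-u", USER, "-p", DATABASE, table,
    ]
    quoted = " ".join(shlex.quote(arg) for arg in args)
    return f"{quoted} > {shlex.quote(table + '.sql')}"


commands = [dump_command(table) for table in TABLES]
for command in commands:
    print(command)
```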

Friday, February 7, 2014

2014-02-07

Need a modified sample_locus.csv

Redo the analysis from 5.0.4 to 5.0.8

Thursday, February 6, 2014

2014-02-06

1. Need to resubmit IHW pipeline 5.0.9 from JOB 6 to JOB 88

2. Issac microarray analysis: mouse4302cdf database


Thursday, January 30, 2014

2014-01-29

Pipeline 5.0.2

Count mapped reads which have > 100bp alignment.

2014-01-30

pipeline 5.0.3

--local (required) - end trimming
-M 100 (remove)
-N 0 (required) - 0 mismatches to seed
-L 20 (modify) - specifies seed length - should be set to 1/2 minimum alignment length since we're requiring 100% identity
-i S,1,0 (modify) - need to reduce the number of seeds tested - to take a seed every 5 bases from a 300mer, the setting would be S,1,0.23 (1 + 0.23 * sqrt(300) is about 5)
--mp 1000,1000 (required) - all scoring options seem OK for a minimum alignment length of 50
--np 1000
--rdg 1000,1000
--rfg 1000,1000
--score-min L,100,0
--no-mixed (remove)
--fr (required)
--no-discordant (required)
--ignore-quals (required)
count reads with alignment > 50bp
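
A worked check of the two function-style options above, as a sketch. Bowtie 2 reads "S,a,b" as f(x) = a + b * sqrt(x) and "L,a,b" as f(x) = a + b * x, where x is the read length. The match bonus of 2 is the documented --ma default in --local mode; I am assuming it is not overridden elsewhere in the pipeline.

```python
import math


def seed_interval(read_len, const=1.0, coeff=0.23):
    """-i S,<const>,<coeff>: interval between seeds, never below 1."""
    return max(1.0, const + coeff * math.sqrt(read_len))


def min_score(read_len, const=100.0, coeff=0.0):
    """--score-min L,<const>,<coeff>: minimum alignment score to report."""
    return const + coeff * read_len


MATCH_BONUS = 2  # assumed: bowtie2 --ma default in --local mode

interval = seed_interval(300)                 # about 5 for a 300mer
min_matches = min_score(300) / MATCH_BONUS    # matched bases needed for score 100

print(f"seed interval for a 300mer: {interval:.2f}")
print(f"minimum matching bases: {min_matches:.0f}")
```

With every penalty set to 1000, an alignment can only reach the minimum score of 100 through matches, so 100 / 2 = 50 matching bases lines up with the 50bp cutoff in the last line of the list.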

Thursday, January 23, 2014

2014-01-22

Change to pipeline 5.0.1

1. remove the 30bp exon-intron boundary.

DESeq

1. CR (carriage return) characters in the input csv file will ruin everything
2. GLM modeling, need to discuss with Julia
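
A minimal sketch for the CR problem in item 1: normalize the line endings of the csv before handing it to DESeq. The function names and the file paths passed to clean_csv are hypothetical; nothing here is specific to DESeq.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def clean_csv(src_path, dst_path):
    """Write a copy of src_path with normalized line endings to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_carriage_returns(data))


# Example on an in-memory byte string with mixed line endings.
print(strip_carriage_returns(b"gene,count\r\nA,1\rB,2\n"))
```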
2014-01-23

Modification in 5.0.1

countMappedReads.py

1. Remove the 30bp intron-exon boundary condition in calculating the coverage
2. Generate a max coverage file to record the maximum overall read coverage in each exon
3. Do not generate filtered reads any more.
4. Sort all individual references by maximum overall read coverage, then pick the top 200.
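
A sketch of step 4 only, assuming the maximum coverage per reference is already available as a dict. The real data structures in countMappedReads.py are not shown in these notes, so the function name, the dict layout, and the reference names below are all hypothetical.

```python
def top_references(max_coverage, n=200):
    """Return the n reference names with the highest maximum coverage."""
    ranked = sorted(max_coverage.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _coverage in ranked[:n]]


# Toy input: reference name -> maximum overall read coverage (made-up values).
example = {"ref_a": 10, "ref_b": 30, "ref_c": 20}
print(top_references(example, n=2))
```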

HLA_parts.sh

1. Do not remove the scratch directory

map_single.sh

1. Do not remove the scratch directory

select_single.sh

1. Remove the filtered reads location from the input parameters

runTyping.py

1. use /BioScratch/zfu as scratch dir
2. no filtered reads parameters
3. use one countMappedReads.py for all classes

/Bioinformatics/Users/zfu/HLA_Typing/HLA_cDNA_Database/IMGT.Release.3.12.0/Index.Test.Single

Only kept alleles with 2-digit subtype names








Tuesday, January 21, 2014

work log 01-21-2014

1. Finished recommendation letter 3
2. Generate reports for HLA Run36