Lord's Lamb: 2014

Tuesday, November 25, 2014

HLA PAPER IDEA: ADD SNP CALLING TO DETERMINE GENOTYPE

Saturday, October 25, 2014

Below is one way to extract every 4th line (here, lines 4 and 8) of somefile.

sed
```
$ sed -n '0~4p' somefile
line 4
line 8
$
```
0~4 is a GNU sed first~step address: it selects every 4th line, beginning at line 0.

Line 0 has nothing, so the first printed line is line 4.

-n means only explicitly printed lines are included in output.
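
For comparison, the same selection can be sketched in Python. This is a minimal illustration and not part of the original note; the helper name every_nth_line and the sample data are made up for the example.

```python
def every_nth_line(lines, n=4):
    """Return lines n, 2n, 3n, ... (1-based), like the sed address 0~n."""
    return [line for number, line in enumerate(lines, start=1)
            if number % n == 0]


# Hypothetical stand-in for the contents of somefile: "line 1" .. "line 9".
sample = ["line %d" % i for i in range(1, 10)]

for line in every_nth_line(sample):
    print(line)
```

To apply it to a real file, pass the open file handle instead of the sample list; each returned line then keeps its trailing newline.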

Wednesday, October 15, 2014

Naming rule for external reads:

Forward_SampleName and Reverse_SampleName

Tuesday, October 7, 2014

show all hidden files in mac os

Open Terminal found in Finder > Applications > Utilities
In Terminal, paste the following: defaults write com.apple.finder AppleShowAllFiles YES
Press return
Hold ‘alt’ on your keyboard, then right click on the Finder icon in the dock and click Relaunch.

To hide the files again, run the following and relaunch Finder the same way: defaults write com.apple.finder AppleShowAllFiles NO

Tuesday, September 23, 2014

svn co http://epitope3.liai.org/svn/bioinformatics/jgbaum/NGS_pipeline/trunk/ ./trunk
svn up
svn merge --reintegrate https://epitope3.liai.org/svn/bioinformatics/jgbaum/NGS_pipeline/branches/alex/
svn stat
svn commit
svn up

Friday, September 19, 2014

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000880_BeSc_IKZF3_Details -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000877_140829_D00361_0097_BHA0E7ADXX_GrSe10_ChIP41 -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000880_BeSc_IKZF3_Details/analysis.config -t CHIP

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000883_BeSc_IKZF3_D10_Details -q QC_LOCATION=/Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000861_8_19_14_GrSe02_verB_BAC -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000883_BeSc_IKZF3_D10_Details/analysis.config -t CHIP

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000931_StSt_BAC_708_506_Details -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000861_8_19_14_GrSe02_verB_BAC -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping//000931_StSt_BAC_708_506_Details/analysis.config -t Mapping

Wednesday, September 17, 2014

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DESeq_Inputfile_PaMa_891.csv -t PaMa_891

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DEseq_inputfile_SOTON03_11SEP14.csv -t SOTON03_11SEP14

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DESeq_Inputfile_StSc01_11SEP14.csv -t StSc01_11SEP14

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DeSeq_Inputfile_DaWeTEMRA_SCT_11SEP14.csv -t DaWeTEMRA_SCT_11SEP14

/share/apps/NGS_pipeline_dev/src/DEReads.pl -i /Bioinformatics/Groups/core/DESeq_Input/DESeq_Inputfile_IsEn_SCT_11SEP14.csv -t IsEn_SCT_11SEP14

Monday, September 15, 2014

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000876_8_27_14_AnTS_RNAseq_Details/ -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000875_08_27_14_AgTs_RNAseq/ -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000876_8_27_14_AnTS_RNAseq_Details/analysis.config -t Mapping

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000887_SLE_H3K27ac_W25_Redo_Details -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000877_140829_D00361_0097_BHA0E7ADXX_GrSe10_ChIP41 -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000887_SLE_H3K27ac_W25_Redo_Details/analysis.config -t CHIP

/share/apps/NGS_pipeline_dev/src/generateReport.pl -m /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU -q /Bioinformatics/NGS_analyses/automated/RNA-Seq/QC/000769_140529_D00361_0050_AH9A9MADXX_5_29_14_JiKa_RRBS_JrWen_BC13 -a /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/analysis.config -t Mapping

Wednesday, August 20, 2014

/share/apps/perl/perl-5.18.1/bin/perl /share/apps/NGS_pipeline_dev/src/mapReads_Debug.pl -i 000851 -a /Bioinformatics/Users/zfu/RNAseq_Analysis_Configs/000851_D10_K27Ac_Validation_Details/analysis.config -m /Bioinformatics/Users/zfu/RNAseq_Analysis_Configs/000851_D10_K27Ac_Validation_Details/InputMetaData.csv -s /share/apps/NGS_pipeline_dev/configs/system.config

Tuesday, August 19, 2014

PMID Classifier

1. Open PuTTY on the desktop

2. The host and port have been saved but they are:

a. Host: classifier.internal.iedb.org

b. Port: 30022

c. Connection Type: ssh

3. After entering 2a – 2c, click “Open.”

4. You will be prompted for a login as (rohini) and password (classify!123). Hit “enter” after typing each of them. You can see characters for the username but the password will be invisible.

5. Change directory using cd /srv/www/classifier_tool and hit “enter”

6. Open the browser and type the address as http://classifier.internal.iedb.org:8080/classifier_tool/ and hit “enter”

a. It will tell you it cannot find the server but it needs to be open for (7) to work

7. To run the tool, type: python manage.py runserver classifier.internal.iedb.org:8080 into the PuTTY screen and hit “enter.”

8. You will need to select “Try Again” on the browser page to connect to the opened site.

9. Use the options given on the opened site to run the tool.

10. Once the classifier is run, a link will appear from which the results can be downloaded. (Emily knows all these things).

a. Click on the link and open the zip file.

b. Drag the files to the query folders.

PDB Classifier

The PDB query is run on Rohini’s computer.

1. The query and other files are located on Rohini’s computer under Places → Home Folder → bcell_textclassifier → iedb_files, but there is a shortcut to iedb_files on Rohini’s desktop.

a. Open the iedb_files folder.

b. Do not touch any of the lone text files (“ann results”).

c. The folder called “Initial catch up run for classifier” has the 7/31/12 files, which were the set of files generated after the PDB classifier was run for the first time.

2. Open the Terminal, which is located on the taskbar.

a. Type cd bcell_textclassifier/ [enter]

b. Type ./runClassifier.sh [enter]

i. Once you type [enter] you will see the script output.

ii. When the query is finished, the Terminal script text says “see folder iedb_files for results.” The next line says “rohini@...”.

iii. When the script is finished running you can close the Terminal.

3. When the script has finished running, go to the iedb_files folder

a. If there are new PMIDs the date at the end of ann_results_pmid_2012-7-31 will be renamed with the day you ran the query and you will also see a zip file dated with the day you ran the query. Here, “2012-7-31” is the day the query was last run and would be replaced with 2012-8-6 if there were new PMIDs on 8/6/2012.

b. The output will be in a zip file (locate zip file icon in iedb_files folder called “ann_results_pmid_2012-7-31.zip” [or whatever the date run was]).

c. Take the zip file and put it on the desktop. Open the files from the desktop and send the files to your e-mail.

d. Delete the zip file after but make sure the files you sent to yourself are correct.

4. If new PMIDs were not found, the date at the end of the files will not be updated.

Friday, August 15, 2014

7.2 Export Wiggle Files
MEDIPS allows to export genome wide coverage profiles as wiggle files for visualization in common genome browsers.
> MEDIPS.exportWIG(Set = hESCs_MeDIP[[1]], file = "hESC.MeDIP.rep1.wig",
+ format = "rpkm", descr = "")
- Set: a MEDIPS or COUPLING SET. In case of a COUPLING SET, the format parameter must be set to pdensity because in this case a sequence pattern (e.g. CpG) density profile will be exported.
- file: the output file name
- format: can be either count or rpkm for a MEDIPS SET or pdensity for a COUPLING SET.
- descr: a track description for the wiggle file

Thursday, August 14, 2014

Ideas for HLA typing

1. Mapping to Genomic DNA
2. using exact match instead of soft clips
3. change trimming methods
4. add QC step
5. homozygous parameter in exon

Todo:

1. try another trimming method
2. try the homozygous parameter for each exon
3. investigate the reads that can map to both alleles

Wednesday, August 13, 2014

2014-08-13

1. commit change to branch
2. deploy to the /share/apps/NGS_pipeline_dev
3. run mapReads_Debug.pl with system.config

2014-08-13

Use "-s /share/apps/NGS_pipeline_dev/configs/system.config" when debugging pipelines

2014-08-13

Mapping Pipeline Error:

INFO: 2014/08/13 07:54:17 Mapping.pm (3112): Report location: /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/report.html
INFO: 2014/08/13 07:54:17 Mapping.pm (3137): Searching /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13 for .bw files
INFO: 2014/08/13 07:54:17 Mapping.pm (3158): Folder: /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4.bam / Bigwig file:/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4.bam/accepted_hits.sorted.bw
INFO: 2014/08/13 07:54:17 Mapping.pm (3171): Folder: /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4_unique.bam / Bigwig file:/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4_unique.bam/accepted_hits.sorted.bw
WARN: 2014/08/13 07:54:17 Pipeline.pm (1017): Creating new parameter 'ALL_TRACKS_URL' and setting its value to 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&position=chr19&hgt.customText=https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/allTracks.txt'
WARN: 2014/08/13 07:54:17 Pipeline.pm (1017): Creating new parameter 'ALL_TRACKS_UNIQUE_URL' and setting its value to 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&position=chr19&hgt.customText=https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/allTracksUnique.txt'
Use of uninitialized value $bowtie_dir in concatenation (.) or string at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Pipeline/Mapping.pm line 3323.
INFO: 2014/08/13 07:54:17 Mapping.pm (3323): premapping_dir:001_premapping_filter tophat:002_tophat bowtie: low_complexity:003_low_complexity_filter bigwig:005_bigwig bamMetrics:004_bam_metrics HTSeq:006_HTSeq calc_rpkm:007_calc_rpkm
Use of uninitialized value $bowtie_dir in concatenation (.) or string at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Pipeline/Mapping.pm line 3328.
INFO: 2014/08/13 07:54:17 Mapping.pm (3360): Error log was found /Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/mapHiSeq2500Reads.err
INFO: 2014/08/13 07:54:18 Mapping.pm (3373): Dust Filtered Reads: 582601
readline() on closed filehandle $INF at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Utils.pm line 36.
Use of uninitialized value $value in scalar chomp at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Utils.pm line 39.
Use of uninitialized value in addition (+) at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Pipeline/Mapping.pm line 3402.
INFO: 2014/08/13 07:54:18 Mapping.pm (3413): Adaptor Reads: 0
INFO: 2014/08/13 07:54:18 Mapping.pm (3416): Good Illumina Reads: 49979864
INFO: 2014/08/13 07:54:18 Mapping.pm (3424): 00002_13: JrWen_RNAseq_BC_13
INFO: 2014/08/13 07:54:18 Mapping.pm (3425): Mapped reads: 33334388
INFO: 2014/08/13 07:54:18 Mapping.pm (3426): Uniquely Mapped reads: 30441021
INFO: 2014/08/13 07:54:18 Mapping.pm (3477): Expected 51330230 Total Reads the sum of the parts is 51330230
INFO: 2014/08/13 07:54:18 Mapping.pm (3478): Expected 100% the sum of the percentages is 100.000
Can't locate object method "get_param" via package "https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4.bam/accepted_hits.sorted.bw" (perhaps you forgot to load "https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000854_JrWen_BC13_Details_TEST_WU/JrWen_RNAseq_BC_13/005_bigwig/accepted_hits_filtered_dust_4.bam/accepted_hits.sorted.bw"?) at /Bioinformatics/apps/NGS_pipeline_dev/src/NGSPipeline/Pipeline/Mapping.pm line 4034.

Monday, August 4, 2014

Issues with the new NGS pipeline

1. 0% mappability
2. allTrack files don't work
3. some wrong adaptor URLs in report.html
4. wrong URL in the UCSC browser link

Friday, August 1, 2014

Results folder is here:
Y:\NGS_analyses\automated\RNA-Seq\Mapping\000775_JrWen_BC13_Details

Running folder is here, where you can find the .config file + MetaData:
Y:\Groups\core\hiseq_raw_data\140529_D00361_0050_AH9A9MADXX_5_29_14_JiKa_RRBS_JrWen_BC13

Command:
/share/apps/perl/perl-5.18.1/bin/perl /share/apps/NGS_pipeline/src/mapReads.pl -a /Bioinformatics/NGS_analyses/ad_hoc/Groups/core/hiseq_raw_data/140529_D00361_0050_AH9A9MADXX_5_29_14_JiKa_RRBS_JrWen_BC13/JrWen_BC13.config -m /Bioinformatics/NGS_analyses/ad_hoc/Groups/core/hiseq_raw_data/140529_D00361_0050_AH9A9MADXX_5_29_14_JiKa_RRBS_JrWen_BC13/InputMetaData.csv

/share/apps/perl/perl-5.18.1/bin/perl /Bioinformatics/Users/zfu/Source_Code/NGS_Pipeline/RNAseq/20140801/src/mapReads.pl -a /Bioinformatics/Users/zfu/Source_Code/NGS_Pipeline/RNAseq/20140801/input/JrWen_BC13.config -m /Bioinformatics/Users/zfu/Source_Code/NGS_Pipeline/RNAseq/20140801/input/InputMetaData.csv

"https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/QC/000802_7_22_14_GrSe03_NXT02_VerA/SCT6_TU22_ThSTAR_BC_N502_N701_well1_CTCTCTAC-CTCTCTAT_L001_R1_001_fastqc/fastqc_report.html"

"../../QC/000802_7_22_14_GrSe03_NXT02_VerA/SCT6_TU22_ThSTAR_BC_N502_N701_well1_CTCTCTAC-CTCTCTAT_L001_R1_001_fastqc/fastqc_report.html"

"https://informaticsdata.liai.org/NGS_analyses/automated/RNA-Seq/Mapping/000803_GrSe03_NextFXP02_VerA_Details/SCT6_TU22_ThSTAR_BC_N502_N701_well1/001_premapping_filter/premapping_counts.txt"

"SCT6_TU22_ThSTAR_BC_N502_N701_well1/001_premapping_filter/premapping_counts.txt"

'ANALYSIS_ARCHIVE_QC' and setting its value to '/Bioinformatics/NGS_analyses/automated/RNA-Seq/QC'

'/Bioinformatics/NGS_analyses/automated/RNA-Seq/QC' -> '../../QC'

MASTER_RESULTS_DIR_MAPPING=/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000803_GrSe03_NextFXP02_VerA_Details

'/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/000803_GrSe03_NextFXP02_VerA_Details' + "/" -> ''
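
The rewrites noted above turn absolute locations into paths relative to the report folder. Below is a minimal Python sketch of that idea, assuming report.html sits directly in MASTER_RESULTS_DIR_MAPPING. The helper to_relative is hypothetical and is not the pipeline's actual code (the pipeline itself is written in Perl).

```python
import posixpath

# Values copied from the notes above.
MASTER_RESULTS_DIR_MAPPING = (
    "/Bioinformatics/NGS_analyses/automated/RNA-Seq/Mapping/"
    "000803_GrSe03_NextFXP02_VerA_Details"
)
ANALYSIS_ARCHIVE_QC = "/Bioinformatics/NGS_analyses/automated/RNA-Seq/QC"


def to_relative(target, report_dir=MASTER_RESULTS_DIR_MAPPING):
    """Express target relative to the folder that holds report.html."""
    return posixpath.relpath(target, start=report_dir)


counts_file = posixpath.join(
    MASTER_RESULTS_DIR_MAPPING,
    "SCT6_TU22_ThSTAR_BC_N502_N701_well1",
    "001_premapping_filter",
    "premapping_counts.txt",
)

print(to_relative(ANALYSIS_ARCHIVE_QC))  # path from the report folder to QC
print(to_relative(counts_file))          # path to a file below the report folder
```

Relative links of this kind keep working whether the report is opened from the file share or through the web server, as long as the folder layout is the same in both places.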

Wednesday, July 23, 2014

=IF(Q1>49,1,0) * IF(T1>49,1,0) * IF(R1>49,1,0) * IF(U1>49,1,0)

=IF(R1>49,1,0) * IF(U1>49,1,0)

Monday, July 21, 2014

2014-07-21

for i in Final*; do mv "$i" "${i/Final/}"; done

Rename multiple folders at once: remove "Final" from every name that starts with it.

Thursday, July 10, 2014

2014-07-10

1. use simply_locus_sample_pair to generate locus_sample pair first

2. any other analysis should go from there

python /Bioinformatics/Users/zfu/HLA_Typing/src/HLA_Typing_Parsing_Codes/simply_locus_sample_pair.py /Bioinformatics/NGS_raw_data/LIAI/MiSeq/HLA_Typing_Runs/Run52_5loci/5loci_pooling /Bioinformatics/Users/zfu/HLA_Typing/Run52_5loci_pooling

python /Bioinformatics/Users/zfu/HLA_Typing/src/HLA_Typing_Parsing_Codes/5loci_pooling.py /Bioinformatics/NGS_raw_data/LIAI/MiSeq/HLA_Typing_Runs/Run52_5loci/original /Bioinformatics/NGS_raw_data/LIAI/MiSeq/HLA_Typing_Runs/Run52_5loci/5loci_pooling_original

Difference between V8.0_HF_105 and V8.0_105

1. 4 digits set should > 2
2. full length coverage reads > 1

mkdir /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_118
sed -e 's/HF_100/HF_118/g' /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_100/runTyping.py > /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_118/runTyping.py
python /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_118/runTyping.py

source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_101/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_102/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_103/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_104/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_110/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_106/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_107/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_108/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_109/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_111/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_112/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_113/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_114/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_115/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_116/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_117/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_118/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_119/QSUB_SUBMIT.sh
source /Bioinformatics/Users/zfu/HLA_Typing/ARCHIVE_DONOT_DELETE/V8.0_HF_120/QSUB_SUBMIT.sh

Friday, June 27, 2014

2014-06-27

alleleCombination_list = choseAlleleCombination(selectedAllele_list, effective_mapped_reads_dict, primer_start, primer_end)

printOutMappabilityPerExon(effective_mapped_reads_dict, project_folder):

Thursday, June 26, 2014

2014-06-26

Function: generateExonAlignmentInfo(scratch_folder, output_id, exons_boundary_dict, equipment_id, alignmentLengthCutoff)

TODO

1. add a function named paired_mapping_reads_dict

key: short read ID
value: a list that should have two elements, one with the forward read info and one with the reverse read info
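
A minimal sketch of what such a function could look like. The input format (tuples of read ID, strand, and alignment info) is an assumption made for illustration; the TODO above only fixes the key and the two-element value.

```python
from collections import defaultdict


def paired_mapping_reads_dict(alignments):
    """Group alignments by short read ID.

    alignments: iterable of (read_id, strand, info) tuples, where strand is
    "forward" or "reverse" (hypothetical input format).
    Returns {read_id: [forward_info, reverse_info]}; a missing mate is None.
    """
    pairs = defaultdict(lambda: [None, None])
    for read_id, strand, info in alignments:
        slot = 0 if strand == "forward" else 1
        pairs[read_id][slot] = info
    return dict(pairs)


# Made-up example records.
example = [
    ("read_1", "forward", {"start": 10, "length": 150}),
    ("read_1", "reverse", {"start": 200, "length": 150}),
    ("read_2", "forward", {"start": 5, "length": 150}),
]
print(paired_mapping_reads_dict(example))
```

Keeping a fixed two-slot list makes it easy to spot reads whose mate did not map: one of the slots stays None.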

Wednesday, June 18, 2014

2014-06-18

Converting SAM directly to a sorted BAM file

Like many Unix tools, SAMtools is able to read directly from a stream, i.e. stdin.

samtools

samtools view -bS file.sam | samtools sort - file_sorted

samtools view -bS DPB1_13\:01.report.bowtie2.sam | samtools sort - DPB1_13_01

samtools index DPB1_13_01.bam DPB1_13_01.bai

Thursday, May 22, 2014

Random Number Generation

You could use random.sample to generate the list with one call:

import random
my_randoms = random.sample(range(100), 10)

That generates 10 distinct numbers in the (inclusive) range from 0 to 99. If you want 1 to 100, you could use this:

my_randoms = random.sample(range(1, 101), 10)

Monday, May 12, 2014

2014-05-12

Error information in running classifier

jython /srv/www/classifier_tool/classifier_scripts/use_weka_to_predict_sub_category.py
Traceback (most recent call last):
File "/srv/www/classifier_tool/classifier_scripts/use_weka_to_predict_sub_category.py", line 198, in <module>
    pred=int(classifier.classifyInstance(input_data.instance(i)))
    at weka.core.RelationalLocator.copyRelationalValues(RelationalLocator.java:88)
    at weka.filters.Filter.copyValues(Filter.java:359)
    at weka.filters.Filter.push(Filter.java:276)
    at weka.filters.unsupervised.attribute.NominalToBinary.convertInstance(NominalToBinary.java:503)
    at weka.filters.unsupervised.attribute.NominalToBinary.input(NominalToBinary.java:177)
    at weka.classifiers.functions.MultilayerPerceptron.distributionForInstance(MultilayerPerceptron.java:2102)
    at weka.classifiers.Classifier.classifyInstance(Classifier.java:81)
    at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 9 != 6

Thursday, April 24, 2014

2014-04-24

update the model file and train feature file in

10.0.3.183/srv/www/classifier_tool/classifier_files/SVM_2012/

update the weka model file in

10.0.3.183/srv/www/classifier_tool/classifier_files/Weka_model_files/

Wrong IDs appeared in the curatable and uncuratable lists because of "\n" handling when "cat" was used to combine the files.

Friday, April 18, 2014

2014-04-18

7.0.1

Class I + DRB1

Search for alleles that have a read covering the entire exon, for each exon (2, 3, 4)

Class II

Search for alleles that have coverage at every base of exon 2 + exon 3

7.0.2

Exclude 10 bases from exon-intron boundary and find the minimum coverage

7.0.3

find the minimum coverage from the exon-intron boundary

Search for alleles that have a read covering the entire exon, for each exon (2, 3, 4); if fewer than 2 such alleles are found, then search for alleles that have coverage at every base of exon 2 + exon 3 + exon 4
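
The minimum-coverage rules in 7.0.2 and 7.0.3 can be sketched as below. This is only an illustration: it assumes per-base coverage is available as a list for one exon and that the 10-base exclusion applies at both ends of the exon. The function name min_coverage is hypothetical.

```python
def min_coverage(coverage, margin=0):
    """Minimum per-base coverage of one exon.

    coverage: list of read depths, one per base of the exon.
    margin: number of bases to skip at each exon-intron boundary
            (10 for rule 7.0.2, 0 for rule 7.0.3).
    """
    inner = coverage[margin:len(coverage) - margin]
    if not inner:
        raise ValueError("exon is shorter than twice the margin")
    return min(inner)


# Made-up depths: poor coverage near the boundaries, better in the middle.
exon_depths = [0] * 10 + [5, 7, 3, 9] + [1] * 10

print(min_coverage(exon_depths, margin=10))  # boundaries excluded
print(min_coverage(exon_depths))             # boundaries included
```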

Wednesday, April 2, 2014

2014-04-02

find . -maxdepth 6 -type f -name "depthOfCoverage_allReads.txt" -exec cp --parents {} /Bioinformatics/Users/zfu/HLA_Typing/Run42_Coverage \;

find . -maxdepth 6 -type f -name "depthOfCoverage_allReads.txt" -exec cp --parents {} /Bioinformatics/Users/zfu/HLA_Typing/Run43_Coverage \;

Tuesday, April 1, 2014

2014-04-01

1. add a count for the number of mapped reads to each exon in the output
2. add a count for the total bases mapped in each exon to the output
3. use the count of the total bases mapped to the templates to select the best pairs rather than the total number of reads mapped
4. enforce continuous coverage within each exon by adding several constraints
5. reads that cross exon borders should only be counted in the exon in which the majority of the read lies

dict 1:

key: allele id
value: dict 2

dict 2:

key: exon
value: dict 3

dict 3:

key: short read id + sequence
value: list 4

list 4: alignment start , alignment length, alignment end
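
A minimal Python sketch of this nested structure, combined with item 5 (a read that crosses an exon border is counted only in the exon holding most of it). The function names, the inclusive coordinate convention, and the example exon boundaries are assumptions for illustration and are not taken from the pipeline code.

```python
from collections import defaultdict

# dict 1 (allele id) -> dict 2 (exon) -> dict 3 (read id + sequence) -> list 4
alignment_dict = defaultdict(lambda: defaultdict(dict))


def assign_exon(read_start, read_end, exons_boundary_dict):
    """Return the exon with the largest overlap with the read, or None."""
    best_exon, best_overlap = None, 0
    for exon, (exon_start, exon_end) in sorted(exons_boundary_dict.items()):
        # Inclusive coordinates: overlap is <= 0 when the ranges are disjoint.
        overlap = min(read_end, exon_end) - max(read_start, exon_start) + 1
        if overlap > best_overlap:
            best_exon, best_overlap = exon, overlap
    return best_exon


def add_alignment(allele_id, read_key, start, length, exons_boundary_dict):
    """Store [start, length, end] under the exon that holds most of the read."""
    end = start + length - 1
    exon = assign_exon(start, end, exons_boundary_dict)
    if exon is not None:
        alignment_dict[allele_id][exon][read_key] = [start, length, end]
    return exon


# Hypothetical exon boundaries on an allele template (1-based, inclusive).
exons = {2: (1, 270), 3: (271, 546)}

# This read spans the exon 2 / exon 3 border; most of it lies in exon 3.
print(add_alignment("allele_X", "read_1_ACGT", 250, 100, exons))
```

Storing the read once, under a single exon, avoids double counting when the per-exon read and base totals from items 1 and 2 are summed.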

Friday, March 28, 2014

2014-03-28

Workflow for retraining the classifier:

1. Generate random test ids
2. Generate features
3. Generate SVM models and predictions.

Thursday, March 27, 2014

2014-03-27

Think carefully about set intersections and differences

Tuesday, March 25, 2014

2014-03-25

the index of the first non-zero element in a list

next(i for i, x in enumerate(myList) if x != 0)

1813DR_S64_L001_R2_001.fastq.gz was damaged

Thursday, March 20, 2014

2014-03-20

merge all files in different directories:

find /path/to/directory/ -name '*.csv' -print0 | xargs -0 -I file cat file > merged.file
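
A Python sketch of the same merge, with one extra safeguard: if a file does not end with a newline, one is added so that its last line cannot fuse with the first line of the next file. The function name merge_files is hypothetical, and the output file is assumed to live outside the searched directory or to have a different suffix.

```python
import os


def merge_files(root, suffix, out_path):
    """Concatenate every file under root whose name ends with suffix.

    Returns the number of files merged. Files are visited in sorted order
    so the result is reproducible.
    """
    merged = 0
    with open(out_path, "w") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # makes os.walk descend in a stable order
            for name in sorted(filenames):
                if not name.endswith(suffix):
                    continue
                with open(os.path.join(dirpath, name)) as handle:
                    text = handle.read()
                out.write(text)
                if text and not text.endswith("\n"):
                    out.write("\n")  # keep records on separate lines
                merged += 1
    return merged
```

Usage would look like merge_files("/path/to/directory", ".csv", "merged.file").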

samtools view -bS DQB1_030201.sam | samtools sort - DQB1_030201_sorted
samtools index DQB1_030201_sorted.bam DQB1_030201_sorted.bai

samtools view -bS DQB1_030501.sam | samtools sort - DQB1_030501_sorted
samtools index DQB1_030501_sorted.bam DQB1_030501_sorted.bai

samtools view -bS DQB1_0331.sam | samtools sort - DQB1_0331_sorted
samtools index DQB1_0331_sorted.bam DQB1_0331_sorted.bai

samtools view -bS DQB1_040101.sam | samtools sort - DQB1_040101_sorted
samtools index DQB1_040101_sorted.bam DQB1_040101_sorted.bai

samtools view -bS DQB1_040201.sam | samtools sort - DQB1_040201_sorted
samtools index DQB1_040201_sorted.bam DQB1_040201_sorted.bai

Wednesday, March 19, 2014

2014-03-19

INSERT INTO pubmed_temp SELECT * FROM pubmed_information_backup20140306;

mysql> INSERT INTO pubmed_temp SELECT * FROM pubmed_information_backup20140306;
Query OK, 360246 rows affected (47.85 sec)
Records: 360246 Duplicates: 0 Warnings: 0

mysql> ALTER IGNORE TABLE pubmed_temp ADD UNIQUE INDEX PUBMED_ID_INDEX (Pubmed_ID);
Query OK, 360246 rows affected (21.50 sec)
Records: 360246 Duplicates: 180123 Warnings: 0

SELECT * FROM pubmed_temp ORDER BY Num DESC LIMIT 10;

RENAME TABLE pubmed_temp TO pubmed_information;

mysql> select count(1) from t4_tokenized_pubmed_information_new;
+----------+
| count(1) |
+----------+
|    54472 |
+----------+
1 row in set (0.02 sec)

Tuesday, March 18, 2014

03-17-2014

new ids with abstract.

9450 rows in set (0.00 sec)

Thursday, March 13, 2014

2014-3-12

0. Change the order of sample name
1. Scale the raw data
2. Generate 7 csv files
3. Do heatmap
4. assemble the all-in-one table
5. Change row name to avoid duplicate

Wednesday, March 12, 2014

2014-03-11

x <- scale(x, center = FALSE)

hmap(x, labRow = FALSE, method = "OLO")

hmap(x, labRow = FALSE, method = "OLO", col=diverge_hcl(100), range=c(-3.5,3.5), colorkey=TRUE)

hmap(x, labRow = FALSE, method = "OLO", col = c("yellow", "blue"))

x <- read.csv(file="/Bioinformatics/Users/zfu/2014_BP_TB_Paper/rank_log2.csv", head=TRUE,sep=",")
x
attributes(x)
x <- as.matrix(x)
x
attributes(x)
x[1:10,]
?read.csv
x <- read.table(file="/Bioinformatics/Users/zfu/2014_BP_TB_Paper/rank_log2.csv", head=TRUE, sep=",", row.names=1)
x
x <- data.matrix(x)
x
attributes(x)
x[1:10,]
library(seriation)
o <- c(seriate(dist(x), method ="OLO"),seriate(dist(t(x)), method = "OLO"))
o
history
history(100)

o1 <- seriate(dist(x), method = "OLO")
o2 <- seriate(dist(t(x)), method = "OLO")
o1
desribe(o1)
attributes(o1)
attributes(o2)
o1[1]
o1[[1]]
o1[[1]][1]
o1[[1]][[1]]
attributes(o1[[1]][[1]])
head(get_order(o1))
order1 <- get_order(o1)
order2 <- get_order(o2)
order2
x
attributes(x)
clustered_data <- x[order1,order2]
clustered_data
clustered_data[1:2,]
ls()
history(100)
> pdf("aa.pdf")
> heatmap.2(clustered_data, col=my_palette, scale="none", Colv = NULL, dendrogram = "row", key=T, keysize = 1.5, density.info="none", trace="none",cexCol=0.9, labRow=NA)
> dev.off()
null device
1
> heatmap.2(clustered_data, col=my_palette, scale="none", dendrogram = "column", key=T, keysize=1.5, density.info="none", trace="none",cexCol=0.5, labRow=NA)

cc = c(rep("blue",10),rep("brown",11),rep("cyan",11),rep("orange",4),rep("red",15))

heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "row", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12), key=T, keysize=1)

aa_disease
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/figure3_diseaseType.pdf")
heatmap.2(clustered_data, col=my_palette, scale="none", dendrogram = "column", key=T, keysize=1, density.info="none", trace="none",cexCol=0.3, labRow=NA, ColSideColors=aa_disease, margin=c(12, 12), labCol=NA)
dev.off()
aa_disease
colnames(clustered_data)
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/test.pdf")
heatmap.2(clustered_data, col=my_palette, scale="none", dendrogram = "column", key=T, keysize=1, density.info="none", trace="none",cexCol=0.3, labRow=NA, ColSideColors=aa_disease, margin=c(12, 12))
dev.off()
x1 <- read.table(file="/Bioinformatics/Users/zfu/2014_BP_TB_Paper/th1_th17_signiture.csv", head=TRUE, sep=",", row.names=1)
x1 <- as.matrix(x1)
attributes(x1)
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "row", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12))
dev.off()
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
x1
scale(x1)
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "row", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12), key=T, keysize=1)
dev.off()
history(-25)
history(25)

x1 <- read.table(file="/Bioinformatics/Users/zfu/2014_BP_TB_Paper/th1_th17_signiture_data.csv", head=TRUE, sep=",", row.names=1)
x1 <- as.matrix(x1)
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "row", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12))
dev.off()
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "none", density.info="none", trace="none", cexCol=0.3, labRow=NA, labCol=NA, margin=c(12, 12))
dev.off()
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "none", density.info="none", trace="none", cexCol=0.3, labRow=NA, margin=c(12, 12))
dev.off()
pdf("/Bioinformatics/Users/zfu/2014_BP_TB_Paper/Th1_Th17_Significant.pdf")
heatmap.2(scale(x1), col=my_palette, scale="none", Colv=NULL, dendrogram = "none", density.info="none", trace="none", cexCol=0.3, labRow=NA, margin=c(12, 12), labCol=NA)

Monday, March 10, 2014

2014-03-10

360246 rows in set (0.36 sec)
PUBMED_INFORMATION;

2014-03-10

New mapping methods

HLA pipeline 5.0.3
HLA pipeline 5.0.4
HLA pipeline 5.0.5
HLA pipeline 5.0.6
HLA pipeline 5.0.7
HLA pipeline 5.0.8
HLA pipeline 5.0.9

Old mapping methods

HLA pipeline 5.0.10
HLA pipeline 5.0.11
HLA pipeline 5.0.12
HLA pipeline 5.0.13
HLA pipeline 5.0.14
HLA pipeline 5.0.15
HLA pipeline 5.0.16

delete all sam files: HLA pipeline 5.0.X
keep all sam files: HLA pipeline 5.0.X.1

Pipeline 5.0

Class 1: 175bp alignment
Class 2: 200bp alignment

Friday, March 7, 2014

2014-03-06

The tables t4_tokenized_pubmed_information and t4_tokenized_pubmed_information_new were empty.

pubmed_information has 360246 rows in set (0.34 sec)

2014-3-5

DROP TABLE pubmed_information;
CREATE TABLE pubmed_information LIKE pubmed_information_backup20140306;

Thursday, March 6, 2014

2014-03-05

********************************

Correct One

********************************

10662 rows in set (4 hours 4 min 7.70 sec)

mysql> select Table4_20140226.PubMed_ID from Table4_20140226 LEFT JOIN pubmed_information ON Table4_20140226.PubMed_ID = pubmed_information.PubMed_ID WHERE pubmed_information.PubMed_ID IS NULL;

360246 rows in set (4 hours 56 min 4.57 sec)

mysql> select PubMed_ID from Table4_20140226 INNER JOIN pubmed_information USING (PubMed_ID);

Tuesday, March 4, 2014

2014-03-04

Mysql dataset difference

SELECT *
  FROM MyTableA
 WHERE imageURL NOT IN (SELECT imageURL FROM MyTableB)

You can also use a left outer join (the first query returns the ids that exist in table a but not in b, the second vice versa):

SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL;
SELECT b.id FROM b LEFT JOIN a ON b.id = a.id WHERE a.id IS NULL;

select Table4_20140226.PubMed_ID from Table4_20140226 LEFT JOIN pubmed_information ON Table4_20140226.PubMed_ID = pubmed_information.PubMed_ID WHERE pubmed_information.PubMed_ID IS NULL;

SELECT DISTINCT value FROM table_a
INNER JOIN table_b
USING (value);

+-------+
| value |
+-------+
| B     |
+-------+

SELECT DISTINCT value FROM table_a
WHERE (value) IN
(SELECT value FROM table_b);

+-------+
| value |
+-------+
| B     |
+-------+
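
The same difference and intersection logic can be checked on small ID lists with Python sets before running long SQL queries. The contents of table_a and table_b below are made up for illustration; only the intersection result, B, mirrors the output shown above.

```python
# Hypothetical column values; duplicates are allowed, as in a real table.
table_a = ["A", "B", "B"]
table_b = ["B", "C"]

set_a, set_b = set(table_a), set(table_b)

# LEFT JOIN ... WHERE b.id IS NULL  ->  values in a that are missing from b
only_in_a = sorted(set_a - set_b)

# The mirrored query  ->  values in b that are missing from a
only_in_b = sorted(set_b - set_a)

# SELECT DISTINCT ... INNER JOIN / IN (subquery)  ->  values present in both
in_both = sorted(set_a & set_b)

print(only_in_a, only_in_b, in_both)
```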

Friday, February 28, 2014

How to select a specific row in Mysql?

Select Row 56

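SELECT * FROM customer LIMIT 55,1

LIMIT 55,1 skips the first 55 rows and returns 1 row, i.e. row 56. Without an ORDER BY clause, the row order is not guaranteed.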

Thursday, February 27, 2014

20140227

1. What is "new_ids_in_table4.txt" file?
2. Results of running: update_pubmed_info_with_table4.py

update_pubmed_info_with_table4.py:240: Warning: Data truncated for column 'Author' at row 1
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC5\x82otni...' for column 'Authors' at row 34775
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC5\x9Fiogl...' for column 'Authors' at row 49380
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1 or ...' for column 'Abstract' at row 170249
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC4\x87 S, ...' for column 'Authors' at row 174522
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3 ELI...' for column 'Abstract' at row 174522
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC4\x87 S.' for column 'Authors' at row 174523
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC4\x87 S, ...' for column 'Authors' at row 174525
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xBCM ea...' for column 'Abstract' at row 176282
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1 and...' for column 'Abstract' at row 177475
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB22-mi...' for column 'Title' at row 177476
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2(2)-...' for column 'Abstract' at row 177476
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xBB aut...' for column 'Abstract' at row 178076
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3 and...' for column 'Abstract' at row 178665
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2\xE2\x82\x82-...' for column 'Title' at row 178666
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2\xE2\x82\x82-...' for column 'Abstract' at row 178666
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-hai...' for column 'Abstract' at row 178668
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x80\x897BN...' for column 'Affiliations' at row 179185
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB32 do...' for column 'Abstract' at row 179185
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1GalC...' for column 'Title' at row 179737
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1GalC...' for column 'Abstract' at row 179737
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB11, \xCE...' for column 'Abstract' at row 179738
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x84\xAB) s...' for column 'Abstract' at row 179942
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1 sub...' for column 'Abstract' at row 180166
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB5RI),...' for column 'Abstract' at row 180167
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2.' for column 'Title' at row 180169
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 (IL...' for column 'Abstract' at row 180169
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1\xCE\xB2 T...' for column 'Abstract' at row 180644
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB5RI, ...' for column 'Abstract' at row 180646
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xBAB ac...' for column 'Abstract' at row 180878
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x86\x924)-...' for column 'Abstract' at row 181259
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB11 an...' for column 'Abstract' at row 181537
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB21-3G...' for column 'Abstract' at row 181538
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x80\x85nM)...' for column 'Abstract' at row 181847
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB11.' for column 'Title' at row 181848
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB11 su...' for column 'Abstract' at row 181848
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB22-mi...' for column 'Title' at row 182107
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2(2)-...' for column 'Abstract' at row 182107
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2) pl...' for column 'Abstract' at row 182108
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3-Car...' for column 'Title' at row 182109
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3-car...' for column 'Abstract' at row 182109
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x80\xB3 st...' for column 'Abstract' at row 182110
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 and...' for column 'Abstract' at row 182366
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-hai...' for column 'Abstract' at row 182749
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-Sec...' for column 'Title' at row 183362
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x88\xA5Dru...' for column 'Affiliations' at row 183362
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB240 a...' for column 'Abstract' at row 183362
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xC4\x8Devi\xC4...' for column 'Authors' at row 183363
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x80\x89\xC3\x97\xE2...' for column 'Abstract' at row 183611
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 pep...' for column 'Title' at row 184011
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 (A\xCE...' for column 'Abstract' at row 184011
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-tur...' for column 'Abstract' at row 184013
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xBCM). ...' for column 'Abstract' at row 184488
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2-cel...' for column 'Abstract' at row 184490
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3\xCE\xB4 T...' for column 'Title' at row 184492
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3\xCE\xB4 T...' for column 'Abstract' at row 184492
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1\xCE\xB2) ...' for column 'Abstract' at row 184494
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1-cha...' for column 'Abstract' at row 184866
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB3R ef...' for column 'Abstract' at row 185759
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB2 dom...' for column 'Title' at row 185764
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1\xCE\xB2 T...' for column 'Abstract' at row 185764
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xE2\x88\xBC12 ...' for column 'Abstract' at row 185765
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1-hel...' for column 'Abstract' at row 185766
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\xB1-d-m...' for column 'Abstract' at row 185767
cursor.execute(sql)
update_pubmed_info_with_table4.py:338: Warning: Incorrect string value: '\xCE\x94H7 m...' for column 'Abstract' at row 185960
cursor.execute(sql)
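To see what the escaped bytes in these warnings actually are, they can be decoded as UTF-8. The sketch below takes three fragments copied from the log above. The latin-1 check is only my assumption about the cause (a column or connection character set that cannot hold these characters); the log itself does not say which character set is in use.

```python
# Byte fragments copied from the warnings above, keyed by column name.
samples = {
    "Authors": b"\xC5\x82otni",
    "Abstract": b"\xCE\xB1 or ",
    "Title": b"\xCE\xB22-mi",
}

# Decode each fragment as UTF-8 to recover the original text.
decoded = {column: raw.decode("utf-8") for column, raw in samples.items()}


def fits_charset(text, charset="latin-1"):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


for column, text in decoded.items():
    # latin-1 is an assumed example of a charset too narrow for these strings.
    print(column, repr(text), "fits latin-1:", fits_charset(text))
```

The fragments decode to Polish and Greek letters (ł, α, β), so the values are valid UTF-8 that the target apparently cannot store as-is. The "Data truncated for column 'Author'" warning is a separate issue and is not covered by this check.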

Wednesday, February 26, 2014

20140224

How to log in to the IEDB server

mysql --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p

username: rdamle
password: iedb123

use table: Table4_20140226 as the updated table 4

mysqldump -u <db_username> -h <db_host> -p db_name table_name > table_name.sql

mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed pubmed_information > pubmed_information.sql

mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed t4_tokenized_pubmed_information > t4_tokenized_pubmed_information.sql

mysql> show tables;
+---------------------------------+
| Tables_in_pubmed                |
+---------------------------------+
| pubmed_information              |
| t4_tokenized_pubmed_information |
| table4_reference                |
| table4_reference_latest         |
| table_4_reference_last_updated  |
| temp_pubmed_data                |
| temp_stemmed_for_svm            |
+---------------------------------+
7 rows in set (0.00 sec)

mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed table4_reference > table4_reference.sql

mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed table4_reference_latest > table4_reference_latest.sql

mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed table_4_reference_last_updated > table_4_reference_last_updated.sql

mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed temp_pubmed_data > temp_pubmed_data.sql



mysqldump --host=10.0.3.49 --protocol=TCP --port=33306 -u rdamle -p pubmed temp_stemmed_for_svm > temp_stemmed_for_svm.sql
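The seven dump commands above differ only in the table name, so they can be generated instead of typed. A minimal sketch: the helper dump_command is my own name, not part of any tool; host, port, user and database are the ones used above. It only prints the command lines, it does not run them, and the password is still prompted for by -p.

```python
import shlex

# Tables listed by "show tables" in the pubmed database.
TABLES = [
    "pubmed_information",
    "t4_tokenized_pubmed_information",
    "table4_reference",
    "table4_reference_latest",
    "table_4_reference_last_updated",
    "temp_pubmed_data",
    "temp_stemmed_for_svm",
]


def dump_command(table, db="pubmed", host="10.0.3.49", port=33306, user="rdamle"):
    """Build the mysqldump command line for one table (hypothetical helper)."""
    args = [
        "mysqldump",
        f"--host={host}",
        "--protocol=TCP",
        f"--port={port}",
        "-u", user,
        "-p",
        db,
        table,
    ]
    quoted = " ".join(shlex.quote(arg) for arg in args)
    return f"{quoted} > {shlex.quote(table + '.sql')}"


for table in TABLES:
    print(dump_command(table))
```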

Friday, February 7, 2014

2014-02-07

Need a modified sample_locus.csv

Redo the analysis from 5.0.4 to 5.0.8

Thursday, February 6, 2014

2014-02-06

1. Need to resubmit IHW pipeline 5.0.9 from JOB 6 to JOB 88

2. Issac microarray analysis: mouse4302cdf database

Thursday, January 30, 2014

2014-01-29

Pipeline 5.0.2

Count mapped reads which have > 100bp alignment.

2014-01-30

pipeline 5.0.3

--local (required) - end trimming

-M 100 (remove)

-N 0 (required) - 0 mismatches to seed

-L 20 (modify) - specifies seed length - should be set to 1/2 minimum alignment length since we're requiring 100% identity

-i S,1,0 (modify) - need to reduce the number of seeds tested - to take a seed about every 5 bases of a 300mer (1 + 0.23 * sqrt(300) is roughly 5), the combo would be S,1,0.23

--mp 1000,1000 (required) - all scoring options seem OK for a minimum alignment length of 50

--np 1000

--rdg 1000,1000

--rfg 1000,1000

--score-min L,100,0

--no-mixed (remove)

--fr (required)

--no-discordant (required)

--ignore-quals (required)

count reads with alignment > 50bp
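The numbers in the -i and --score-min settings above can be checked by hand. This sketch assumes the options are Bowtie 2's (the log never names the aligner, but the flags match), where "S,a,b" means a + b*sqrt(read length) and "L,a,b" means a + b*read length. It also assumes a match bonus of 2 per base, which is Bowtie 2's default --ma in --local mode as far as I know.

```python
import math


def sqrt_function(const, coeff, read_length):
    """Evaluate an 'S,const,coeff' setting: const + coeff * sqrt(read_length)."""
    return const + coeff * math.sqrt(read_length)


def linear_function(const, coeff, read_length):
    """Evaluate an 'L,const,coeff' setting: const + coeff * read_length."""
    return const + coeff * read_length


READ_LENGTH = 300  # the 300mer from the note on -i
MATCH_BONUS = 2    # assumption: default --ma in --local mode

# -i S,1,0 puts a seed at every position; -i S,1,0.23 spaces them out.
interval_current = sqrt_function(1, 0, READ_LENGTH)
interval_proposed = sqrt_function(1, 0.23, READ_LENGTH)

# --score-min L,100,0 is a flat minimum score of 100, whatever the read length.
min_score = linear_function(100, 0, READ_LENGTH)
min_matched_bases = min_score / MATCH_BONUS

print(interval_current, round(interval_proposed, 2), min_score, min_matched_bases)
```

So S,1,0.23 gives an interval of just under 5 bases on a 300mer. Under the assumed match bonus, a minimum score of 100 corresponds to 50 matched bases, which lines up with the 50bp alignment threshold.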

Thursday, January 23, 2014

2014-01-22

Change to pipeline 5.0.1

1. remove the 30bp exon-intron boundary.

DESeq

1. A CR (carriage return) in the input CSV file will ruin everything
2. GLM modeling, need to discuss with Julia
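For the carriage-return problem in item 1, one option is to normalize line endings before the file goes to DESeq. A minimal sketch in Python; the function name is mine and the sample text is made up.

```python
def strip_carriage_returns(text):
    """Convert Windows (CRLF) and old Mac (CR) line endings to plain LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical CSV content with Windows line endings.
sample = "gene,count\r\nHLA-A,10\r\n"
cleaned = strip_carriage_returns(sample)

print(repr(cleaned))
```

To apply it to a real file, read and write with open(path, newline="") so Python does not translate the line endings itself.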

2014-01-23

Modification in 5.0.1

countMappedReads.py

1. Remove the 30bp intron-exon boundary condition in calculating the coverage
2. Generate a max coverage file to record the max overall read coverage in each exon
3. Do not generate filtered reads anymore.
4. Sort all individual references by max overall read coverage, then pick the top 200.
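The ranking in item 4 could look roughly like the sketch below. This is not the actual countMappedReads.py code. The input layout (reference name mapped to a list of per-exon max coverage values) and the allele names are assumptions for illustration.

```python
def top_references(exon_max_coverage, n=200):
    """Rank references by their highest per-exon coverage and keep the top n."""
    # Overall max coverage for each reference = max over its exons.
    overall = {ref: max(values) for ref, values in exon_max_coverage.items()}
    ranked = sorted(overall, key=overall.get, reverse=True)
    return ranked[:n]


# Hypothetical per-exon max coverage for three references.
coverage = {
    "A*01": [12, 40, 7],
    "A*02": [90, 15, 3],
    "B*07": [5, 5, 5],
}

print(top_references(coverage, n=2))
```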

HLA_parts.sh

1. Do not remove the scratch directory

map_single.sh

1. Do not remove the scratch directory

select_single.sh

1. Remove filtered reads location as input parameters

runTyping.py

1. use /BioScratch/zfu as scratch dir
2. no filtered reads parameters
3. use one countMappedReads.py for all classes

/Bioinformatics/Users/zfu/HLA_Typing/HLA_cDNA_Database/IMGT.Release.3.12.0/Index.Test.Single

Only kept alleles with 2-digit subtype names

Tuesday, January 21, 2014

work log 01-21-2014

1. Finished the recommendation letter 3
2. Generate reports for HLA Run36
