# TF_map

**Repository Path**: clariom/TF_map

## Basic Information

- **Project Name**: TF_map
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-10-28
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Follow me to create a web-tool by shiny

## Publications

Please cite our latest paper when using TFmapper:

- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6216026/
- http://www.ijbs.com/v14p1724.htm
- http://www.tfmapper.org/

## Contact

Jianming Zeng (PhD student at the University of Macau): jmzeng1314@163.com

## Follow me

### step1: create the database and user in MySQL

First, make sure that the MySQL client and server are installed correctly on your OS, and remember the **password** for the root user (the default user in MySQL). Then you can log in with `mysql -u root -p`.

Useful links (in case you forget the password):

- https://www.variphy.com/kb/mac-os-x-reset-mysql-root-password
- https://stackoverflow.com/questions/6474775/setting-the-mysql-root-user-password-on-os-x

```mysql
show databases;
create database tfmapperdb;
show databases;
CREATE USER tfmapperuser IDENTIFIED BY 'tfmapper_@Abc';
-- the user already has a password, so GRANT needs no IDENTIFIED BY
-- (GRANT ... IDENTIFIED BY is rejected by MySQL 8.0)
GRANT ALL PRIVILEGES ON tfmapperdb.* TO 'tfmapperuser'@'%';
FLUSH PRIVILEGES;
```

Now you only need to use the database `tfmapperdb` and the user `tfmapperuser`.
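The R scripts below hard-code these credentials. One possible alternative is to read them from environment variables and fall back to the values created above. This is only an illustrative sketch in base R: `tfmapper_db_config()` and the `TFMAPPER_DB_*` variable names are hypothetical and not part of this repository.

```r
# Hypothetical helper (not in the repository): gather the MySQL settings from
# environment variables, falling back to the values created in step1.
tfmapper_db_config <- function() {
  list(
    dbname   = Sys.getenv("TFMAPPER_DB_NAME", "tfmapperdb"),
    host     = Sys.getenv("TFMAPPER_DB_HOST", "127.0.0.1"),
    port     = as.integer(Sys.getenv("TFMAPPER_DB_PORT", "3306")),
    username = Sys.getenv("TFMAPPER_DB_USER", "tfmapperuser"),
    password = Sys.getenv("TFMAPPER_DB_PASSWORD", "tfmapper_@Abc")
  )
}

# show the settings without printing the password
str(tfmapper_db_config()[c("dbname", "host", "port", "username")])
```

With RMySQL loaded, the list can be passed to `dbConnect()` in one call, e.g. `con <- do.call(dbConnect, c(list(MySQL()), tfmapper_db_config()))`; `dbname`, `host`, `port`, `username` and `password` are the argument names of RMySQL's `dbConnect()` method.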
### step2: upload data into your database

##### gene tables

First, download the gene annotations for human and mouse from [GENCODE](https://www.gencodegenes.org/):

```shell
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz ## 38M
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M20/gencode.vM20.annotation.gtf.gz ## 25M
## decompress first: the commands below read the plain .gtf files
gunzip gencode.v29.annotation.gtf.gz gencode.vM20.annotation.gtf.gz
## keep HAVANA "gene" records (skipping any that do not match the pattern) and print:
## symbol, type, Ensembl ID without version, chr, start, end
cat gencode.v29.annotation.gtf |perl -alne '{next unless $F[1] eq "HAVANA";next unless $F[2] eq "gene";next unless /gene_id \"(.*?)\.\d+\"; gene_type \"(.*?)\"; gene_name \"(.*?)\"/;print "$3\t$2\t$1\t$F[0]\t$F[3]\t$F[4]"}' > gencode_v29_human_gene_info
cat gencode.vM20.annotation.gtf |perl -alne '{next unless $F[1] eq "HAVANA";next unless $F[2] eq "gene";next unless /gene_id \"(.*?)\.\d+\"; gene_type \"(.*?)\"; gene_name \"(.*?)\"/;print "$3\t$2\t$1\t$F[0]\t$F[3]\t$F[4]"}' > gencode_vM20_mouse_gene_info
```

Don't worry if you can't follow the Perl commands above; just check the two output files:

- [gencode_v29_human_gene_info](files/gencode_v29_human_gene_info)
- [gencode_vM20_mouse_gene_info](files/gencode_vM20_mouse_gene_info)

Then upload these files into the database with the R code below:

```r
library(RMySQL)

# connection settings created in step1
host     <- "127.0.0.1"
port     <- 3306
user     <- "tfmapperuser"
password <- 'tfmapper_@Abc'

# connect directly to tfmapperdb (same effect as running "USE tfmapperdb;")
con <- dbConnect(MySQL(), dbname = "tfmapperdb", host = host, port = port,
                 user = user, password = password)
dbGetQuery(con, 'show tables;')

options(stringsAsFactors = FALSE)
# a simple example of uploading one file into MySQL;
# quote = "" and comment.char = "" keep quotes or '#' inside fields from breaking the parse
a <- read.table('files/gencode_v29_human_gene_info', sep = '\t',
                quote = "", comment.char = "")
head(a)
colnames(a) <- c('symbol', 'type', 'ensembl', 'chr', 'start', 'end')
dbWriteTable(con, 'gencode_v29_human_gene_info', a, append = FALSE, row.names = FALSE)
dbGetQuery(con, 'show tables;')

dbDisconnect(con)
```

In the same way, upload all the other data the web tool needs into MySQL.
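For readers who prefer R to Perl, the filtering done by the GENCODE one-liners can be reproduced in base R. This is an illustrative sketch, not the script used to build TFmapper: `parse_gencode_genes()` is a hypothetical name, and, like the Perl code, it assumes GENCODE's attribute order `gene_id`, `gene_type`, `gene_name`.

```r
# Hypothetical base-R equivalent of the Perl one-liners (not the repository's code):
# keep HAVANA "gene" records and return symbol, type, Ensembl ID without version,
# chromosome, start and end, i.e. the columns of gencode_v29_human_gene_info.
parse_gencode_genes <- function(lines) {
  # cheap pre-filter on the raw text before splitting a large file
  lines <- lines[grepl("\tHAVANA\tgene\t", lines, fixed = TRUE)]
  recs <- strsplit(lines, "\t", fixed = TRUE)
  recs <- recs[lengths(recs) >= 9]                   # a GTF record has 9 columns
  field <- function(r, i) vapply(r, function(x) x[i], character(1))
  keep <- field(recs, 2) == "HAVANA" & field(recs, 3) == "gene"  # same tests as Perl
  recs <- recs[keep]
  attrs <- field(recs, 9)
  pattern <- 'gene_id "(.*?)\\.\\d+"; gene_type "(.*?)"; gene_name "(.*?)"'
  m <- regmatches(attrs, regexec(pattern, attrs, perl = TRUE))
  ok <- lengths(m) == 4                              # full match + 3 captured groups
  g <- matrix(as.character(unlist(m[ok])), ncol = 4, byrow = TRUE)
  data.frame(
    symbol  = g[, 4],
    type    = g[, 3],
    ensembl = g[, 2],
    chr     = field(recs, 1)[ok],
    start   = as.integer(field(recs, 4)[ok]),
    end     = as.integer(field(recs, 5)[ok]),
    stringsAsFactors = FALSE
  )
}
```

Applied to the download, e.g. `genes <- parse_gencode_genes(readLines(gzfile("gencode.v29.annotation.gtf.gz")))`, it should give a data frame with the same six columns as `gencode_v29_human_gene_info`, ready for `dbWriteTable()`. Reading the whole GTF into memory is heavier than the streaming Perl one-liner, which is why the sketch filters the raw lines before splitting them.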
##### cistrome_metadata

Upload the following txt files (downloaded from `cistrome`) into `cistrome_metadata`:

```
TF_human_information.txt
TF_mouse_data_information.txt
ca_human_data_information.txt
ca_mouse_data_information.txt
histone_human_data_information.txt
histone_mouse_data_information.txt
other_human_data_information.txt
other_mouse_data_information.txt
```

##### cistrome_GSM_metadata

Pay attention to the `columns` of this table:

```
sampleID GSM bs1 bs2 bs3 IP species type
```

Gather all the GSM IDs, look up their details with `GEOmetadb`, then upload them into `cistrome_GSM_metadata`. Pay attention to the columns of this table:

```
[1] "ID"        "title"     "gsm"
[4] "series_id" "gpl"       "status"
············
```

##### encode_metadata

Upload the following txt files (downloaded from `ENCODE`) into `encode_metadata`:

```
human_TF_GRCh38.conservative.bed.list.txt
human_histone_GRCh38.replicated.peaks.bed.list.txt
mouse_TF_mm10.conservative.peaks.bed.list.txt
mouse_histone_mm10.replicated.peaks.bed.list.txt
```

##### peaks tables (2X2X2X(23+21))

Lastly, upload all the `peaks annotation files` to MySQL (this is extremely time-consuming, and the data are very large): about `300` tables, split by chromosome, database, type and species.

Read my code from beginning to end: [upload_into_mysql.R](scripts/upload_into_mysql.R)

Please `send me an email` to request those files (about 100 GB). See my paper for the details of how the files were generated.

##### very important thing

Create indexes on some tables in MySQL to speed up user searches, in particular on the columns the search queries in step4 filter by (`symbol` for gene searches, `start`/`end` for position searches).

### step3: create user interface

With the help of Xiaojie Sun, we created a beautiful `ui` framework, shown below:

![](figures/home_page_inputs.png)

There are `4 pages` in our tool: **home, statistics, more, help**.
You can check the code in [UI](ui.R).

Please remember the `IDs` we create on the UI page:

- input values
  - species (human or mouse) / IP (TF or histone) / database (cistrome or ENCODE) / cellline (too many to list)
  - input_gene / genomic_feature
  - position, such as '18:28176327,28178670'
- output values
  - DT::dataTableOutput('results')
  - plotOutput('results_stat')
  - DT::dataTableOutput('stat_table')
- actionButton
  - do_gene
  - do_position / zoom_in / zoom_out

### step4: create the server side

You can check the code in [server](server.R).

#### part 1: refresh two input boxes with updateSelectizeInput

Check the code in [updateSelectizeInput](scripts/updateSelectizeInput.R).

The gene choices depend on the `species` the user chooses (we have to query all the genes from MySQL). The cell line choices depend on the database, species and IP the user chooses.

#### part 2: get the position of the chosen gene

Check the code in [positions.R](scripts/positions.R).

Get the position of the chosen gene from the GENCODE tables (`gencode_v29_human_gene_info` and `gencode_vM20_mouse_gene_info` in MySQL). Choosing a gene sets the position first; after that, zoom_in and zoom_out also change it.

#### part 3: search peaks by gene

Check the code in [search_by_gene.R](scripts/search_by_gene.R).

Once the user clicks the button for searching by gene, we return the result table (the peak information).

```r
# peaks_tab: the peaks table selected from the user's inputs
# shQuote() wraps the gene symbol in single quotes, giving an SQL string literal
paste0(" select * from ",peaks_tab," where symbol=",shQuote(gene) )
```

#### part 4: search peaks by position

Check the code in [search_by_position.R](scripts/search_by_position.R).

The code is similar to the above, but this time we search peaks by position instead of by gene.

```r
# returns the peaks lying entirely inside the window (start, end)
paste0("select * from ",peaks_tab," where start > ",start," and end < ",end)
```

#### part 5: return the peaks table

Check the code in [output_main_result_table.R](scripts/output_main_result_table.R).

This table is a little complicated.
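The position logic of parts 2 and 4 can be illustrated with a small base-R sketch: parse a position string in the UI format (`'18:28176327,28178670'`), zoom the window around its centre, and build the part 4 query from the result. Everything here is an assumption for illustration, not taken from [positions.R](scripts/positions.R): the function names, the table name `peaks_chr18`, and the zoom behaviour (scaling the window by a factor, e.g. 0.5 for zoom_in and 2 for zoom_out).

```r
# Hypothetical helpers (not the repository's positions.R): position strings,
# zooming, and the part 4 query.

# Parse "chr:start,end", e.g. "18:28176327,28178670", into a list
parse_position <- function(pos) {
  m <- regmatches(pos, regexec("^\\s*(\\w+):(\\d+),(\\d+)\\s*$", pos, perl = TRUE))[[1]]
  if (length(m) == 0) stop("position must look like '18:28176327,28178670'")
  p <- list(chr = m[2], start = as.numeric(m[3]), end = as.numeric(m[4]))
  if (p$start >= p$end) stop("start must be smaller than end")
  p
}

# Scale the window around its centre: factor < 1 zooms in, factor > 1 zooms out.
# The start is clamped at 1; the end is not checked against the chromosome length.
zoom_position <- function(p, factor) {
  centre <- (p$start + p$end) / 2
  half   <- (p$end - p$start) * factor / 2
  p$start <- max(1, round(centre - half))
  p$end   <- round(centre + half)
  if (p$end <= p$start) p$end <- p$start + 1   # keep at least a 1 bp window
  p
}

# Turn a position back into the "chr:start,end" string shown in the UI
format_position <- function(p) sprintf("%s:%.0f,%.0f", p$chr, p$start, p$end)

# Part 4 query: peaks lying strictly inside the window; the peaks tables are
# split by chromosome, so peaks_tab already fixes the chromosome
position_query <- function(peaks_tab, p) {
  sprintf("select * from %s where start > %.0f and end < %.0f",
          peaks_tab, p$start, p$end)
}

p <- parse_position("18:28176327,28178670")
format_position(zoom_position(p, 0.5))   # zoom_in
format_position(zoom_position(p, 2))     # zoom_out
position_query("peaks_chr18", p)         # hypothetical table name
```

Formatting the coordinates with `%.0f` keeps them as plain integers in both the UI string and the SQL text, whereas `paste0()` prints a round number such as 1e8 as `1e+08`.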
#### part 6: refresh links

Check the code in [output_links.R](scripts/output_links.R).

There are two download files, `downloadData_csv` and `downloadData_bed`, and one link, `uiOutput('washUlink')`.

#### part 7: summarize the peaks table

Check the code in [output_stat.R](scripts/output_stat.R).

### step 5: deploy this tool on a Linux server

Download the free Shiny Server from https://www.rstudio.com/products/shiny/download-server/, then install it and use it to host our tool. Also install all the R packages the tool requires. Then visit the tool via the server's public IP (Shiny Server listens on port 3838 by default).

### step 6: use it

See the `help` page.

## Papers citing [TFmapper](http://www.tfmapper.org/)

So far, no paper has cited our tool. What a pity!