Caleb’s Presentation - Rentrez

1) Create a search of a topic, gene, or organism

2) Use the summary function to extract relevant information

3) Create a plot to see how this topic has been reported over a particular time span

Install and load the following libraries:

require(rentrez)

## Loading required package: rentrez

## Warning: package 'rentrez' was built under R version 4.2.2

require(glue)

## Loading required package: glue

require(tidyverse)

## Loading required package: tidyverse

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.0 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

require(ggplot2)
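
The `require()` calls above only load packages that are already installed. As a minimal sketch, assuming the packages come from CRAN, the helper below (the name `missing_packages` is hypothetical, not part of any package) reports which of them still need installing. The install line is left commented out because it needs an internet connection.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# ggplot2 is installed as part of the tidyverse, so it is not listed separately.
needed <- c("rentrez", "glue", "tidyverse")
to_install <- missing_packages(needed)

# Uncomment to install anything that is missing:
# if (length(to_install) > 0) install.packages(to_install)
```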

Question #1

First look at the list of NCBI databases

entrez_dbs()

##  [1] "pubmed"          "protein"         "nuccore"         "ipg"            
##  [5] "nucleotide"      "structure"       "genome"          "annotinfo"      
##  [9] "assembly"        "bioproject"      "biosample"       "blastdbinfo"    
## [13] "books"           "cdd"             "clinvar"         "gap"            
## [17] "gapplus"         "grasp"           "dbvar"           "gene"           
## [21] "gds"             "geoprofiles"     "homologene"      "medgen"         
## [25] "mesh"            "nlmcatalog"      "omim"            "orgtrack"       
## [29] "pmc"             "popset"          "proteinclusters" "pcassay"        
## [33] "protfam"         "pccompound"      "pcsubstance"     "seqannot"       
## [37] "snp"             "sra"             "taxonomy"        "biocollections" 
## [41] "gtr"

Next, we can request a summary of a particular database; I chose the nucleotide database, “nuccore”

entrez_db_summary("nuccore")

##  DbName: nuccore
##  MenuName: Nucleotide
##  Description: Core Nucleotide db
##  DbBuild: Build221113-0445m.1
##  Count: 509183547
##  LastUpdate: 2022/11/15 09:50

Examine how many nucleotide hits there are for opsin… too many!

opsin_search<- entrez_search(db = "nuccore", term = "opsin")
opsin_search

## Entrez search result with 35130 hits (object contains 20 IDs and no web_history object)
##  Search term (as translated):  opsin[All Fields]

Let's narrow it down
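
One way to trim a nuccore result set like this is to add field tags to the search term. The sketch below assumes that nuccore accepts the `[Title]` and `[Organism]` fields (`entrez_db_searchable("nuccore")` lists the fields that are actually available). The helper names `build_nuccore_term` and `search_nuccore` are hypothetical, and the search call itself is left unrun because it needs an internet connection.

```r
# Hypothetical helper: restrict a keyword to record titles and to one organism group.
build_nuccore_term <- function(keyword, organism) {
  paste0(keyword, "[Title] AND ", organism, "[Organism]")
}

narrow_term <- build_nuccore_term("opsin", "Amphibia")
narrow_term

# Thin wrapper around the real search call; `retmax` caps how many IDs come back.
search_nuccore <- function(term, retmax = 20) {
  rentrez::entrez_search(db = "nuccore", term = term, retmax = retmax)
}

# Example (not run here):
# amphibian_opsins <- search_nuccore(narrow_term)
# amphibian_opsins$count
```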

Question #2

Create a file that returns a list of IDs for PubMed articles that contain the word opsin in the title

Add more specificity by restricting this search to amphibians (my study taxon)… salamanders alone are too narrow a search

Return the number of IDs

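# Note: opsin_search still holds the nuccore IDs from Question #1 here,
# so PubMed cannot summarize them (hence the warnings below)
entrez_summary(db = "pubmed", id = opsin_search$ids)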

## Warning: ID 2328101885 produced error 'cannot get document summary'

## Warning: ID 2328088504 produced error 'cannot get document summary'

## Warning: ID 2328071543 produced error 'cannot get document summary'

## Warning: ID 2328070562 produced error 'cannot get document summary'

## Warning: ID 2328065778 produced error 'cannot get document summary'

## Warning: ID 2328033094 produced error 'cannot get document summary'

## Warning: ID 2325510557 produced error 'cannot get document summary'

## Warning: ID 2325510556 produced error 'cannot get document summary'

## Warning: ID 2325510555 produced error 'cannot get document summary'

## Warning: ID 2325510554 produced error 'cannot get document summary'

## Warning: ID 2325510552 produced error 'cannot get document summary'

## Warning: ID 922960043 produced error 'cannot get document summary'

## Warning: ID 2327721060 produced error 'cannot get document summary'

## Warning: ID 2327720543 produced error 'cannot get document summary'

## Warning: ID 2327717981 produced error 'cannot get document summary'

## Warning: ID 2327712446 produced error 'cannot get document summary'

## Warning: ID 2327712445 produced error 'cannot get document summary'

## Warning: ID 2327707916 produced error 'cannot get document summary'

## Warning: ID 2327707414 produced error 'cannot get document summary'

## Warning: ID 2327703735 produced error 'cannot get document summary'

## List of  20 esummary records. First record:
## 
##  $`2328101885`
## esummary result with 1 items:
## [1] uid

opsin_search <- entrez_search(db = "pubmed", term = "opsin [TITLE] AND amphibians")
opsin_search

## Entrez search result with 58 hits (object contains 20 IDs and no web_history object)
##  Search term (as translated):  opsin[TITLE] AND ("amphibians"[MeSH Terms] OR "amp ...

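The search found 58 hits, but the result object holds only 20 IDs because `retmax` defaults to 20 in `entrez_search()`; the summary above likewise covered 20 records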
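
The warnings earlier in this question appear because the IDs passed to `entrez_summary()` came from the nuccore search rather than from PubMed. As a minimal sketch of one way to keep the IDs and the database in step, the hypothetical helper `pubmed_titles` below searches PubMed first and then summarizes those same IDs. Pulling out the `title` element is an illustrative choice, and the example call is left unrun because it needs an internet connection.

```r
# Hypothetical helper: search PubMed, then summarize the IDs from that same search.
pubmed_titles <- function(term, retmax = 20) {
  hits <- rentrez::entrez_search(db = "pubmed", term = term, retmax = retmax)
  # Nothing to summarize if the search came back empty.
  if (length(hits$ids) == 0) {
    return(character(0))
  }
  summaries <- rentrez::entrez_summary(db = "pubmed", id = hits$ids)
  # Pull just the article titles out of the summary records.
  rentrez::extract_from_esummary(summaries, "title")
}

# Example (not run here): raise retmax above 58 to cover every hit.
# titles <- pubmed_titles("opsin [TITLE] AND amphibians", retmax = 100)
# length(titles)
```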

Now we can implement the fetch command

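`entrez_fetch()` retrieves the complete record for each ID, and `rettype` sets the return format, such as FASTA. Note that PubMed stores article records rather than sequences, so FASTA output is really meant for sequence databases such as nuccore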

all_opsins<-entrez_fetch(db= "pubmed", id= opsin_search$ids, rettype = "fasta")
class(all_opsins)

## [1] "character"

`nchar()` gives the number of characters in the fetched text

nchar(all_opsins)

## [1] 37938

Export the data collected, if desired

write(all_opsins, file = "amphibian_opsin.fasta")
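
FASTA is a sequence format, so a sequence database such as nuccore is the natural place to request it. The sketch below assumes that is the goal. Both helper names are hypothetical, the toy FASTA string is made up for illustration, and the fetch is left unrun because it needs an internet connection. Counting header lines is often a more useful check than `nchar()` because it shows how many records came back.

```r
# Hypothetical helper: search nuccore and fetch the matching records as FASTA text.
fetch_nuccore_fasta <- function(term, retmax = 5) {
  hits <- rentrez::entrez_search(db = "nuccore", term = term, retmax = retmax)
  if (length(hits$ids) == 0) {
    return(character(0))
  }
  rentrez::entrez_fetch(db = "nuccore", id = hits$ids, rettype = "fasta")
}

# Hypothetical helper: count FASTA records by counting header lines (those starting with ">").
count_fasta_records <- function(fasta_text) {
  lines <- strsplit(fasta_text, "\n", fixed = TRUE)[[1]]
  sum(startsWith(lines, ">"))
}

# Made-up two-record example, used only to show the counting step offline.
toy_fasta <- ">seq1 toy record\nACGT\n>seq2 toy record\nTTGA\n"
count_fasta_records(toy_fasta)

# Example (not run here):
# fasta <- fetch_nuccore_fasta("opsin[Title] AND Amphibia[Organism]")
# write(fasta, file = "amphibian_opsin.fasta")
```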

Question #3

I studied the SHH gene for my master’s degree, so I wanted to compare the number of PubMed hits per year for SHH (Sonic Hedgehog) with those for my current genes of study, the opsins

year <- 1960:2022
# One query string per year; glue() is vectorized over `year`
opsin_search <- glue("opsin[TITLE] AND {year}[PDAT]")
SHH_search <- glue("sonic hedgehog[TITLE] AND {year}[PDAT]")

search_counts <- tibble(year = year,
                        opsin_search = opsin_search, 
                        SHH_search = SHH_search) %>% 
  mutate(opsin = map_dbl(opsin_search, ~entrez_search(db="pubmed",
                                                      term = .x)$count),
         SHH = map_dbl(SHH_search, ~entrez_search(db="pubmed",
                                                  term = .x)$count))
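
The per-year counting above rests on two steps: `glue()` expands one template into a query per year, and each query is sent to PubMed only for its hit count. As a minimal sketch, the hypothetical helpers below separate those steps. Setting `retmax = 0` is an assumption made here to skip downloading IDs when only `$count` is needed, and the network call is left unrun.

```r
# Hypothetical helper: build one PubMed query per year for a title term.
build_year_queries <- function(term, years) {
  as.character(glue::glue("{term}[TITLE] AND {years}[PDAT]"))
}

queries <- build_year_queries("opsin", 2020:2022)
queries

# Hypothetical helper: return only the number of hits for one query.
count_pubmed_hits <- function(query) {
  rentrez::entrez_search(db = "pubmed", term = query, retmax = 0)$count
}

# Example (not run here): one request per year, so long ranges take a while.
# counts <- vapply(queries, count_pubmed_hits, numeric(1))
```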


search_counts %>% 
  select(year, opsin, SHH) %>% 
  # Reshape to long format: one row per year and gene
  pivot_longer(-year) %>% 
  ggplot(aes(x = year,
             y = value,
             group = name,
             color = name)) +
  ylab("PubMed hits") +
  xlab("Year") +
  geom_line() +
  geom_smooth() +
  geom_point(color = "black") +
  theme_bw() +
  ggtitle("Comparison of Sonic Hedgehog vs Opsin PubMed Hits")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Caleb_pres

Emily Bierbaum

2022-11-15
