Yesterday a tweet linking to a great post came across the ether, and as soon as I read it I knew I had to write this post. Here’s the original nugget:
RT @ctitusbrown: (my) thoughts on data intensive science & workflows: http://bit.ly/tWXSnx
It is a post about why end users are not adopting workflows that could really help them in this eScience world we find ourselves in, as we keep moving forward with giant data sets and “big data” projects. And some other points about...
Goecks, J., Nekrutenko, A., Taylor, J., & The Galaxy Team. (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8). DOI: 10.1186/gb-2010-11-8-r86
In most software and database development, the changes that come along all the time seem to be tweaks and polishes on existing strategies. Every so often, though, there’s a big shift in strategy or mechanism. The JBrowse paper I read this week made me realize that such a shift is now firmly underway. Today’s tip of the week will introduce JBrowse, and here I’ll describe some of the reasons this is a game changer...
PLoS Biology reports today on WikiPathways. The paper, entitled “WikiPathways: Pathway editing for the people,” announces a new wiki for the ‘public curation’ of pathway data. The authors argue that
The exponential growth of diverse types of biological data presents the research community with an unprecedented challenge to keep the flood of biological data as accessible, ...
The ENCODE project is one of the “big data” projects that is generating genome-wide data on a variety of different aspects of genome biology. It’s been around for a while, and some people have heard about it but really haven’t begun to dive into the data yet. And they really should.
We’ve had our hands on the ENCODE data since the earliest days of the new scale-up or production phase. We’ve been doing outreach for the UCSC Genome Browser’s DCC portion of...
The ENCODE Project Consortium. (2011) A User's Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biology, 9(4). DOI: 10.1371/journal.pbio.1001046
More and more we are seeing questions about ways to access epigenomics data in the workshops we do. This often comes up in the workshop that focuses on the ENCODE data, because ENCODE is providing several epigenomics data sets that researchers are interested in. [The workshop we do is based on the materials [...]
Integrating large data sets for queries within–and across–various collections is one of the arenas that has lately been pretty active in bioinformatics. As more and more “big data” projects yield huge numbers of data points and data types, this is only becoming more necessary. I love to browse data, but there are times when a large-scale customized query is what you’ll want to make some broader discoveries.
Right now there are a number of resources and interfaces...
Lyne, R., Smith, R., Rutherford, K., Wakeling, M., Varley, A., Guillier, F., Janssens, H., Ji, W., Mclaren, P., North, P.... (2007) FlyMine: an integrated database for Drosophila and Anopheles genomics. Genome Biology, 8(7). DOI: 10.1186/gb-2007-8-7-r129
In the realm of bioinformatics resources, few are more venerable than OMIM®, Online Mendelian Inheritance in Man [well, originally not online, on index cards...]. For those who might be new to OMIM, it is a catalog of genes and their variations, and resulting phenotypes in human, with a more clinical perspective than some resources offer. As I was reviewing the history of OMIM for this post, I began to wonder if there even is any repository in genomics that’s been maintained on a compute...
Amberger, J., Bocchini, C., & Hamosh, A. (2011) A new face and new challenges for online mendelian inheritance in man (OMIM®). Human Mutation, 32(5), 564-567. DOI: 10.1002/humu.21466
Strasser, B. (2009) Collecting, Comparing, and Computing Sequences: The Making of Margaret O. Dayhoff’s Atlas of Protein Sequence and Structure, 1954–1965. Journal of the History of Biology, 43(4), 623-660. DOI: 10.1007/s10739-009-9221-0
mmmmm….another “big data” paper illustrates a point I’ve been hammering on: there is terrific data coming out of these projects–but it’s not in the publications. So I’m going to talk about this paper in this post (1), but I’ll direct you to the database where the information is really available for your perusal–and to a tutorial that explains how to access it. Off we go….
The ENCODE project is one of the “big data” consortiu...
Ernst, J., Kheradpour, P., Mikkelsen, T., Shoresh, N., Ward, L., Epstein, C., Zhang, X., Wang, L., Issner, R., Coyne, M.... (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. DOI: 10.1038/nature09906
Birney, E., Stamatoyannopoulos, J., Dutta, A., Guigó, R., Gingeras, T., Margulies, E., Weng, Z., Snyder, M., Dermitzakis, E., Stamatoyannopoulos, J.... (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447(7146), 799-816. DOI: 10.1038/nature05874
Rosenbloom, K., Dreszer, T., Pheasant, M., Barber, G., Meyer, L., Pohl, A., Raney, B., Wang, T., Hinrichs, A., Zweig, A.... (2009) ENCODE whole-genome data in the UCSC Genome Browser. Nucleic Acids Research, 38(Database). DOI: 10.1093/nar/gkp961
We all know and love dbSNP, and DGV, and 1000 Genomes, and HapMap, and OMIM, and the couple dozen other variation databases I can think of off the top of my head. But–even though there’s a lot of stuff out there–you never know what you aren’t seeing. What *isn’t* yet stored in those resources? [...]
Bale, S., Devisscher, M., Criekinge, W., Rehm, H., Decouttere, F., Nussbaum, R., Dunnen, J., & Willems, P. (2011) MutaDATABASE: a centralized and standardized DNA variation database. Nature Biotechnology, 29(2), 117-118. DOI: 10.1038/nbt.1772
Here at OpenHelix we think a lot about the differences between nominally similar software that will accomplish some given task. For example, in our workshops we are often asked about the differences between genome browsers. Although UCSC sponsors our workshops and training materials on their browser, we know they aren’t the only genome browser out [...]
Lacroix, T., Loux, V., Gendrault, A., Gibrat, J., & Chiapello, H. (2011) CompaGB: An open framework for genome browsers comparison. BMC Research Notes, 4(1), 133. DOI: 10.1186/1756-0500-4-133
Otto West: Apes don’t read philosophy.
Wanda: Yes they do, Otto. They just don’t understand it.
–A Fish Called Wanda
Some of you remember that last year we were treated to a strange case of DNA denialism that was making the rounds of the foodie community. Michael Pollan was all excited and aerated about it for some reason. Even Marion Nestle, who should know better, propagated this cherry-picked and non-peer-reviewed “study” that purported to show that DNA had no i...
Berchtold, M., Egli, R., Rhyner, J. A., Hameister, H., & Strehler, E. E. (1993) Localization of the Human Bona Fide Calmodulin Genes CALM1, CALM2, and CALM3 to Chromosomes 14q24-q31, 2p21.1-p21.3, and 19q13.2-q13.3. Genomics, 16(2), 461-465. DOI: 10.1006/geno.1993.1211
Schouten, H., & Jacobsen, E. (2007) Are Mutations in Genetically Modified Plants Dangerous? Journal of Biomedicine and Biotechnology, 1-3. DOI: 10.1155/2007/82612
Bioinformatics resources can be really complex–sometimes daunting, heavily loaded with crucial data, and offering amazing visualization of large data sets and various features of the underlying data. And other times, that’s way more than you need. Overkill. Like aiming an elephant gun at a mosquito...
Fink, J. L., & Hamilton, N. (2007) DomainDraw: a macromolecular feature drawing program. In silico biology, 7(2), 145-50. PMID: 17688439
How many of you remember the first time you saw that phage image in your Bio 100 textbook? You know–the one that had the angular head, the coiled tube, and the spiky leg-looking things? That’s been burned into my memory banks ever since. And to find out that it was just a teeny packet of [...]
Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Min Jou, W., Molemans, F., Raeymaekers, A., Van den Berghe, A.... (1976) Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature, 260(5551), 500-507. DOI: 10.1038/260500a0
Klucar, L., Stano, M., & Hajduk, M. (2009) phiSITE: database of gene regulation in bacteriophages. Nucleic Acids Research, 38(Database). DOI: 10.1093/nar/gkp911
Stano, M., & Klucar, L. (2011) phiGENOME: An integrative navigation throughout bacteriophage genomes. Genomics. DOI: 10.1016/j.ygeno.2011.07.004
In my last tip of the week I was really pleased about the opportunity to see the data from a paper set up in a custom GBrowse, but it also reminded me of the limitations of some current strategies for visualization that we are facing. In that case one of the things that I wanted [...]
Meyer, M., Munzner, T., DePace, A., & Pfister, H. (2010) MulteeSum: A Tool for Comparative Spatial and Temporal Gene Expression Data. IEEE Transactions on Visualization and Computer Graphics, 16(6), 908-917. DOI: 10.1109/TVCG.2010.137
Meyer, M., Wong, B., Styczynski, M., Munzner, T., & Pfister, H. (2010) Pathline: A Tool For Comparative Functional Genomics. Computer Graphics Forum, 29(3), 1043-1052. DOI: 10.1111/j.1467-8659.2009.01710.x
The field of synthetic biology has been simmering for quite a while. It occasionally takes a big leap, such as when Venter’s team published about their work on M. genitalium, and it took a big leap recently with the paper about modeling a lot of the cellular processes in a simple cell that I talked [...]
Wilson, M. L., Hertzberg, R., Adam, L., & Peccoud, J. (2011) A step-by-step introduction to rule-based design of synthetic genetic constructs using GenoCAD. Methods Enzymol., 173-188. DOI: 10.1016/B978-0-12-385120-8.00008-5
Cai, Y., Wilson, M. L., & Peccoud, J. (2010) GenoCAD for iGEM: a grammatical approach to the design of standard-compliant constructs. Nucleic Acids Research, 38(8), 2644. DOI: 10.1093/nar/gkq086
Tyson, J. J., & Novák, B. (2010) Functional Motifs in Biochemical Reaction Networks. Annual Review of Physical Chemistry, 61(1), 240. DOI: 10.1146/annurev.physchem.012809.103457
BioMart is widely used open-source data management software, with an interface that enables end users to generate complex and customized queries across many types and sources of biological data. It’s part of the GMOD tool kit, and many project teams that have big data have chosen the BioMart software to organize and make their data available to [...]
Kasprzyk, A. (2011) BioMart: driving a paradigm change in biological data management. Database. DOI: 10.1093/database/bar049
Zhang, J., Haider, S., Baran, J., Cros, A., Guberman, J., Hsu, J., Liang, Y., Yao, L., & Kasprzyk, A. (2011) BioMart: a data federation framework for large collaborative projects. Database. DOI: 10.1093/database/bar038
Guberman, J., Ai, J., Arnaiz, O., Baran, J., Blake, A., Baldock, R., Chelala, C., Croft, D., Cros, A., Cutts, R.... (2011) BioMart Central Portal: an open database network for the biological community. Database. DOI: 10.1093/database/bar041
Haider, S., Ballester, B., Smedley, D., Zhang, J., Rice, P., & Kasprzyk, A. (2009) BioMart Central Portal--unified access to biological data. Nucleic Acids Research, 37(Web Server). DOI: 10.1093/nar/gkp265
Subtitled: the data is not in the papers anymore. Again. And again. As the data deluge continues, and those next-gen sequencing setups and labs continue to crank out more and more data, the details cannot be captured in the papers anymore. They just can’t. Authors can summarize the key findings, and show compelling examples and [...]
There are thousands of bioinformatics databases, servers, algorithms, and apps in the bioinformatics ecosystem. Even though we are immersed in this environment ourselves, it seems that every day there’s something new, and in every workshop we do someone brings us an issue they have which requires some sort of tool that we haven’t explored yet–some [...]
Parnell, L., Lindenbaum, P., Shameer, K., Dall'Olio, G., Swan, D., Jensen, L., Cockell, S., Pedersen, B., Mangan, M., Miller, C.... (2011) BioStar: An Online Question & Answer Resource for the Bioinformatics Community. PLoS Computational Biology, 7(10). DOI: 10.1371/journal.pcbi.1002216
Dall'Olio, G., Marino, J., Schubert, M., Keys, K., Stefan, M., Gillespie, C., Poulain, P., Shameer, K., Sugar, R., Invergo, B.... (2011) Ten Simple Rules for Getting Help from Online Scientific Communities. PLoS Computational Biology, 7(9). DOI: 10.1371/journal.pcbi.1002202
Recently many of the bioinformatics tweeps I follow were excited about the tool called VarSifter. Here’s the notice that I saw: RT @yokofakun: http://www.youtube.com/watch?v=I7azpqTWFuM Jamie Teer describes VarSifter, an interactive GUI tool for handing/quering/filtering VCFs #ngs I just had a chance to watch the video, and now I can see why they were impressed! Over [...]
Dreszer, T., Karolchik, D., Zweig, A., Hinrichs, A., Raney, B., Kuhn, R., Meyer, L., Wong, M., Sloan, C., Rosenbloom, K.... (2011) The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Research. DOI: 10.1093/nar/gkr1055
This week’s video tip is different from our usual tips in several ways. First, you won’t hear me–this webinar was done by Heather Merk of Ohio State. We also usually highlight web-based tools, and this presentation on R statistical computing tools relies on the command line. And it’s longer than we usually do–but because of [...]
Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., & Taylor, J. (2010) Galaxy: A Web-Based Genome Analysis Tool for Experimentalists. Current Protocols in Molecular Biology, 19(10). DOI: 10.1002/0471142727.mb1910s89