As we saw earlier, network visualization in R is a breeze with the **visNetwork** package. The graphs are gorgeous, interactive, and fun to play with. In this article, we’ll look at how we can customize the nodes of our network to convey additional information. First, we’ll learn how to color a network by a variable. Then, we’ll leverage the power of **igraph** to find and highlight nodes with high centrality scores. Finally, we’ll use **igraph** once more to identify and color communities. To code along, click here for the R notebook.
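As a rough sketch of what "coloring a network by a variable" can look like, the snippet below maps a categorical column onto a `color` column in the node table, which **visNetwork** reads when drawing nodes. The node and edge data, the `type` variable, and the palette are all made up for illustration; they aren't taken from the article or its notebook.

```r
# Hypothetical node and edge tables (illustrative data only)
nodes <- data.frame(
  id = 1:5,
  label = paste("Node", 1:5),
  type = c("a", "b", "a", "c", "b"),
  stringsAsFactors = FALSE
)
edges <- data.frame(from = c(1, 1, 2, 3), to = c(2, 3, 4, 5))

# Assumed palette: one color per level of the (hypothetical) `type` variable
type_palette <- c(a = "#1b9e77", b = "#d95f02", c = "#7570b3")

# Look up each node's color by its type
nodes$color <- unname(type_palette[nodes$type])

# Build the widget only if visNetwork is installed
if (requireNamespace("visNetwork", quietly = TRUE)) {
  net <- visNetwork::visNetwork(nodes, edges)
}
```

Printing `net` in RStudio should display the interactive graph with nodes colored by `type`.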

When you exit RStudio, you’ll see a pop-up asking, “Save workspace image to ~/.RData?” If you’re unsure, you’ll probably select **Save**. After all, it *is* the default. Plus, saving is good. Right?

Well…it depends. The next time you load RStudio, everything in your workspace (global environment) will be exactly how you left it. If it took a really long time to load your data into R, then this is fantastic. You don’t have to go through that cumbersome process again! *Phew!*

But what if, instead, you had a bunch of useless crap you’d prefer never to see again? You might…

When it comes to network analysis, **igraph** is a beast of a package with extensive documentation. If you do network analysis in R, chances are you’ve at least heard of it. With **igraph**, you can create and compare deterministic or stochastic networks, calculate centrality measures, and find communities within a network. You can even implement graph algorithms on a network with *millions* of nodes and edges!

Even when it comes to visualization, **igraph** has an abundance of options. First, there’s the basic functionality you’d expect for plotting: changing the size, shape, labels, and layout of the network. But you can…

From gene regulatory networks to agricultural pest control, there are numerous applications of Bayesian networks in biology. If you’re a biologist, or even a computer scientist, this list will help you begin your journey towards understanding Bayesian networks and their use in biology.

A new package called *‘backbone’* recently dropped on CRAN and I’m super stoked about it. Essentially, the package simplifies a co-occurrence matrix by extracting its ‘backbone.’ The package offers various methods for extracting a backbone, one of which involves the use of the hypergeometric distribution to assess whether the co-occurrence is significantly less than or greater than expected. Since the ‘*cooccur*’ package does the same thing, I thought, “Hmmm…Which one’s faster?” And that, my dear friends, is how this battle of the CRAN packages began. Without further ado, let’s start this PACKAGE RACE!!!

For convenience, we’ll use the finches data…

In ecology, co-occurrence networks can help us identify relationships between species using repeated measurements of the species’ presence or absence. When evaluating potential relationships, we might ask: Given presence-absence data, are two species co-occurring at a frequency higher or lower than expected by chance? Somewhat surprisingly, although co-occurrence analysis has been around since the ’70s, there’s no universally agreed-upon method for measuring co-occurrence and testing its statistical significance (Veech, 2012). In this post, we’re going to examine the probabilistic model as seen in Veech’s *A probabilistic model for analysing species co‐occurrence* (2012). We’ll start by defining the model before moving…
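To make the "higher or lower than expected by chance" question concrete, here's a minimal sketch using base R's hypergeometric functions. The site counts are invented for illustration, and the inclusive tails (at most / at least the observed count) are a choice made for this sketch rather than a statement of exactly how Veech (2012) or any package defines them.

```r
# Hypothetical survey summary (made-up numbers for illustration only)
n_sites <- 20   # total sampling sites
n_sp1   <- 10   # sites where species 1 is present
n_sp2   <- 8    # sites where species 2 is present
n_both  <- 7    # sites where both species are present (observed)

# Expected number of shared sites if the species are distributed independently
expected_both <- n_sp1 * n_sp2 / n_sites

# Treat species 1's sites as "white balls" and draw species 2's sites at random
# P(shared sites <= observed): evidence of co-occurring less than expected
p_le <- phyper(n_both, n_sp1, n_sites - n_sp1, n_sp2)

# P(shared sites >= observed): evidence of co-occurring more than expected
p_ge <- phyper(n_both - 1, n_sp1, n_sites - n_sp1, n_sp2, lower.tail = FALSE)

# Probability of observing exactly n_both shared sites
p_exact <- dhyper(n_both, n_sp1, n_sites - n_sp1, n_sp2)
```

A small `p_ge` suggests the pair shows up together more often than chance alone would explain; a small `p_le` suggests the opposite.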

Co-occurrence networks are a graphical representation of how frequently variables appear together. They’re commonly used in ecology and text mining, where co-occurrence measures how frequently two species are seen together within a sampling site or how frequently two words are present in a single document, respectively. A co-occurrence network allows us to examine several pairs of co-occurring variables simultaneously. To construct a co-occurrence network, each variable is represented by a node, or point. An edge, or link, connecting two nodes represents the co-occurrence between those two variables.
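The node-and-edge description above can be sketched in a few lines of base R. The presence-absence matrix below is hypothetical (four sites, three made-up species), and counting shared sites with `crossprod()` is just one simple way to measure co-occurrence, not necessarily the approach used in the full post.

```r
# Hypothetical presence-absence data: rows are sites, columns are species
pa <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 1,
    0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("site", 1:4), c("sp_a", "sp_b", "sp_c"))
)

# Species-by-species matrix: entry [i, j] counts the sites shared by i and j
co <- crossprod(pa)
diag(co) <- 0  # a species co-occurring with itself isn't an edge

# Keep each pair once (upper triangle) and drop pairs that never co-occur
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)

# Edge list: one row per link, weighted by the number of shared sites
edges <- data.frame(
  from = rownames(co)[idx[, "row"]],
  to = colnames(co)[idx[, "col"]],
  weight = co[idx]
)
```

Each column of `pa` becomes a node, and each row of `edges` is a link whose `weight` is the number of sites the two species share.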

Here, we’ll look at how to construct co-occurrence networks in R using the…

Being mathematically gifted isn’t a strict prerequisite for being a data scientist. Sure, it helps, but being a data scientist is more than just being good at math and statistics. Being a data scientist means knowing how to solve problems and communicate the solutions effectively and concisely. It’s a collection of nuanced skills and, chances are, most data scientists need additional assistance in at least one of them. So if math is the skill you’re lacking, don’t give up hope!

Thanks to technology, we have computers that do most of the mathematical heavy lifting for us. *Phew…*

When I first started coding in R, there was *so* much new stuff to learn it felt overwhelming. Although there were a lot of people excited to offer me advice, it was hard for me to know whether something was necessary, optional, or simply a personal preference. The following keyboard shortcuts are *technically* optional if you have no intention of using R long-term. However, if you want to move beyond early-noob-status, they go straight from optional to *necessary*.

This keyboard shortcut allows you to execute code within a script. It will either execute the line of code where your cursor…

Graphs are an excellent way to gain a deeper understanding of large systems of information, as they provide a flexible and intuitive way to generate insight by visualizing the relationships within the data. In this tutorial, we’ll focus specifically on undirected graphs. Both Facebook and LinkedIn connections can be illustrated with undirected graphs because a connection between two people always goes in both directions. This follows from the reciprocal nature of these websites (friendships must be mutual, invitations must be accepted, etc.), unlike platforms such as Twitter where you can follow someone but they don’t…

Hi! I’m Brooke Bradley and I study data science in the biomedical field. Visit thatdarndata.com for more!