Ruby on Medicine: Hunting For The Gene Sequence

Posted: April 3, 2015 at 5:42 am

Previous articles in this series focused on handling very large text files. At some point, you may want to search those files for a specific pattern. Searching a large text file by hand is a non-starter, so in today's article we turn to one of the most useful tools of the developer's trade.

Regular expressions (regex) are built for this task. A regular expression is an encoded text string used to match and manipulate patterns in text. Regular expressions reached everyday use through Unix text tools in the 1970s, building on theory from the 1950s. They are extremely useful and widely considered the key to powerful text processing.

To be more precise, a regular expression is a string that contains a combination of normal characters and special metacharacters. The normal characters are present to match themselves. On the other hand, the metacharacters represent ideas such as quantity and location of characters.
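As a minimal sketch of that distinction, the snippet below compares a pattern made only of normal characters with one that adds metacharacters. The sample strings and variable names are made up for illustration and do not come from the original article.

```ruby
# Normal characters match themselves: /ant/ matches the letters a, n, t in a row.
literal = /ant/

# Metacharacters describe position and quantity:
#   \A  anchors the match at the start of the string
#   \d  stands for any digit
#   +   means "one or more of the preceding item"
anchored = /\Agene\d+/

literal_hit   = "elephant".match?(literal)               # "ant" appears inside the word
anchored_hit  = "gene42 is expressed".match?(anchored)   # starts with "gene" plus digits
anchored_miss = "the gene42 locus".match?(anchored)      # "gene42" is not at the start

puts literal_hit, anchored_hit, anchored_miss
```

The first pattern succeeds wherever the three letters occur, while the second succeeds only when the location and quantity rules expressed by the metacharacters are satisfied.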

Regex is a language in and of itself, with special syntax and instructions to implement. It can be used with programming languages, like Ruby, to accomplish different tasks, such as:

These are just a few examples of what is possible. Such tasks range in complexity from a text editor's simple search command to a full text-processing language.

The bottom line is that you, as a Ruby programmer, will be armed with a very versatile tool that can be used to perform all sorts of text processing tasks.

Today's examples focus on the two main types of task that regex performs: Search (locate text) and Replace (edit the located text).

Regex comes in handy when searching text, especially when the search is not a straightforward match. As we mentioned above, you may be interested in finding the text "ant". That much is simple. But when the location of "ant" matters, so that you want to match the word "ant" but not the "ant" inside "want", regex is perfect.
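The "ant" versus "want" case can be sketched with the word-boundary metacharacter \b. The sample sentence here is an assumption chosen for illustration.

```ruby
text = "I want an ant, not a plant."

# A plain literal pattern finds "ant" everywhere, including inside other words.
loose = text.scan(/ant/)

# \b marks a word boundary, so only the standalone word "ant" is matched.
strict = text.scan(/\bant\b/)

# =~ returns the index of the first match, or nil when there is none.
position = text =~ /\bant\b/

puts "loose matches:  #{loose.size}"
puts "strict matches: #{strict.size}"
puts "first standalone match at index #{position}"
```

The loose pattern also picks up the "ant" inside "want" and "plant", while the bounded pattern reports only the standalone word.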

Replacing is a power in itself, added on top of regex's search capability. For example, you may want to replace the URLs you have extracted (searched for) with clickable URLs, that is, URLs wrapped in an HTML anchor tag with an href attribute.
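Here is a rough sketch of that replacement using Ruby's String#gsub with a block. The URL pattern is deliberately simplified and is an assumption for illustration, not a complete URL matcher. The sample text is made up as well.

```ruby
text = "Read more at http://example.com/genes today."

# Simplified pattern (an assumption): "http://" or "https://" followed by
# any run of characters that are not whitespace, angle brackets, or quotes.
url_pattern = %r{https?://[^\s<>"]+}

# gsub passes each match to the block and substitutes the block's return value.
linked = text.gsub(url_pattern) { |url| %(<a href="#{url}">#{url}</a>) }

puts linked
```

A pattern this loose would also swallow punctuation that directly follows a URL, such as a sentence-ending period, so real input would likely need a stricter expression.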

Let's do some simple examples with regex to warm up. You can use these tables as a reference for some of the metacharacters we'll use. To test your regex, try Rubular, an online Ruby-based regular expression editor.
