Shell scripting and the Pipe

The pipe may be the most useful tool in your shell scripting toolbox. It is one of the most used, but also one of the most misunderstood, and as a result it is often overused or misused. This post should help you use pipes correctly and, hopefully, make your shell scripts faster and more efficient.

Most Unix and Linux engineers believe the pipe works like this: take the output from one command and use it as the input of another. That is such an oversimplification of what’s going on that it is close to being a false statement. Here is what that description has led people to think is happening:

First command produces output
Pipe receives output from the first command and pushes it directly to the input of the second command
Second command receives input

But this is more accurate:

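The shell asks the kernel for a pipe (system resource) before either command starts
The kernel creates a small FIFO buffer in SYSTEM memory, with a write end and a read end
The shell starts a separate process for each command, attaching the first command’s output to the write end and the second command’s input to the read end
Both commands run at the same time
First command writes a block of output, which is copied from its application memory into the FIFO buffer
Second command reads the block, which is copied from the FIFO buffer into its own application memory (repeated until all blocks are complete)
If the buffer fills up, the first command is paused until the second command catches up; if it is empty, the second command waits
When the first command exits and closes the write end, the second command sees end-of-file
Once both ends are closed, the kernel tears down the buffer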
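
A quick way to see how the two commands in a pipe relate in time is to timestamp each line as it arrives. The sketch below is only an illustration: it assumes a date command that supports +%s (GNU and BSD both do), and the two-second sleep is an arbitrary choice.

```shell
# The writer emits one line, sleeps, then emits another.
# The reader stamps each line with the second it arrived.
arrivals=$(
    { echo first; sleep 2; echo second; } |
    while read -r line; do
        echo "$(date +%s) $line"
    done
)
echo "$arrivals"
```

If the second command only received its input after the first had finished, both lines would carry the same timestamp. Instead they arrive about two seconds apart, because both sides of the pipe are running at once.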

Notice that the pipe does not put the output from the first command directly into the second. Every block is copied into a buffer in system memory and then copied out again, and each side may have to wait for the other. If you think about it, this makes perfect sense from a data protection standpoint, since one process must never reach into another’s memory, but it isn’t something you think about when typing the command. Now consider these steps with something like:

grep A | grep B | grep C

I think all of us have seen someone do this. Every one of those pipes creates and tears down a buffer in system memory. Every grep is also a separate process that has to be started and torn down. If you think this isn’t used, go and look through your /etc directory. Every Linux distribution I’ve come across does something similar within its own system startup scripts. Now compare it to the number of steps for something like this:

grep "A.*B.*C"

# this matches only when A, B and C appear in that order; look up methods for matching in any order

In this example, grep is started only once, and there is no building or tearing down of multiple commands or system memory segments. Now consider what happens when you do this inside some of the massive loops you build.
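
For the any-order case, one option (a sketch, not the only way) is a single awk process with three patterns joined by &&. The sample input below is made up for illustration, and printf is only there to feed it in; in a real script the command would read the file directly.

```shell
# Hypothetical sample input.
input='A then B then C
C then B then A
only A and B
nothing here'

# Three grep processes and two pipes: A, B and C in any order.
chained=$(printf '%s\n' "$input" | grep A | grep B | grep C)

# One grep process, but it only matches A ... B ... C in that order.
ordered=$(printf '%s\n' "$input" | grep 'A.*B.*C')

# One awk process: a line is printed only if all three patterns match,
# regardless of where each one appears on the line.
any_order=$(printf '%s\n' "$input" | awk '/A/ && /B/ && /C/')

echo "$any_order"
```

The awk version selects the same lines as the three chained greps, while the single ordered grep misses the line where C comes first.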

So what do you do? This doesn’t really change how you work on the command line, as the resource hit doesn’t affect you much there. In some of your larger scripts, however, you may want to reconsider. You can’t stop using pipes: the pipe may be the single most powerful tool in shell scripting, allowing nearly any command to work with nearly any other command. Here are some tips:

When you use a pipe, think about what you’re doing with it, how it’s working, and how many times you will be calling it.
Consider how you can condense your commands. Many commands have options which allow you to do what you need without passing the data to something else. Perhaps you need to use egrep (grep -E) instead of grep, or gawk instead of awk?
Parse your data BEFORE you pass it to a loop or list. Perhaps pass the data through sed or awk first, then push it to the loop. When it enters the loop, it should be formatted exactly as you want it.
Know how to manipulate your variables. Bash 4 added some great variable manipulation techniques.
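
The last two tips can be sketched roughly as follows. The sample records and variable names are invented for illustration, and the expansions used here are the plain POSIX ones so the snippet runs in any sh; Bash 4 adds more on top of these, such as ${name^^} for upper-casing.

```shell
# Hypothetical sample data: "user:login-shell" records, one per line.
data='alice:/bin/bash
bob:/bin/zsh
carol:/bin/sh'

# Pattern to avoid: pipes and extra processes on every iteration.
#   name=$(echo "$entry" | cut -d: -f1)
#   shell_name=$(echo "$entry" | awk -F/ '{print $NF}')

# Option 1: parse once, before the loop. A single awk call reduces every
# record to "name shell", so the loop body needs no external commands.
parsed=$(printf '%s\n' "$data" | awk -F'[:/]' '{ print $1, $NF }')
from_awk=""
while read -r name shell_name; do
    from_awk="${from_awk}${name} uses ${shell_name}
"
done <<EOF
$parsed
EOF

# Option 2: let the shell manipulate the variable itself.
from_expansion=""
while IFS= read -r entry; do
    name=${entry%%:*}         # drop everything from the first ":" onward
    shell_path=${entry#*:}    # drop everything up to the first ":"
    from_expansion="${from_expansion}${name} uses ${shell_path##*/}
"
done <<EOF
$data
EOF

printf '%s' "$from_expansion"
```

Both loops build the same three lines. Option 1 pays for one awk process in total, and Option 2 starts no external process at all inside the loop.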

So what happens when you’re out of ideas and you HAVE to use a pipe in that giant loop? Don’t worry, shell scripting can help with that. It’s time for the named pipe!

What if you could create the data segment once and use it over and over again until you were done, THEN tear it down? Well, you can. Enter the named pipe. Before you learn how to do this, please note that it does NOT replace the techniques I’ve mentioned above. Go this route only if you have exhausted your other efficiency techniques.

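mkfifo my_pipe                     # create the named pipe (a FIFO file in the current directory)
cat my_pipe &                      # start a reader in the background; it waits until a writer opens the pipe
echo "blah blah blah" > my_pipe    # write one line; cat prints it, sees end-of-file and exits
wait                               # let the background cat finish
rm my_pipe                         # the FIFO file stays around until you remove it
# There is a lot going on here. Too much for this short tutorial.
# Please go and look up how to use a named pipe before you try this.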

It’s difficult to see the use of a named pipe on the command line (though this example will give you a basic idea of how it works). It’s much more useful within a shell script. That, however, may be for a different post.
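
As a small taste of the in-script pattern, here is a minimal sketch: one consumer is started once, the pipe is opened once, and every pass through the loop reuses it. The scratch directory, file names and sample words are all hypothetical, and sort simply stands in for whatever command you would otherwise start on every iteration.

```shell
# Hypothetical scratch directory and FIFO name.
workdir=$(mktemp -d)
fifo="$workdir/my_pipe"
mkfifo "$fifo"

# Start ONE consumer in the background; it reads until the writer closes.
sort < "$fifo" > "$workdir/sorted.txt" &

# Open the FIFO for writing once, on file descriptor 3.
exec 3> "$fifo"

# Every iteration reuses the same open pipe; no new process per item.
for word in pear apple fig; do
    echo "$word" >&3
done

# Closing the descriptor delivers end-of-file to sort, which then
# writes its result and exits.
exec 3>&-
wait

sorted=$(cat "$workdir/sorted.txt")
echo "$sorted"

# Tear everything down only once the whole job is done.
rm -r "$workdir"
```

The pipe and the consumer are created once and torn down once, no matter how many times the loop runs.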
