1 Introduction to Linux and Servers
Author: Lizel Potgieter, adapted by Amrei Binzer-Panchal
Linux is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds (https://en.wikipedia.org/wiki/Linux). Most servers run on a Linux-based operating system.
If you have little or no experience with the command line, please take some time to follow the Software Carpentry course on the Unix shell.
If you have some experience with the command line, you can look through the commands below (Section 2 onwards) to refresh your knowledge.
Either way, to make sure that your proficiency is adequate, go to the last part of this page and take the Linux Exercise Quiz (Section 10).
Finally, other resources offer many more tips and tricks for all of your bioinformatics needs. For the full cheat sheets and other commands, please see:
2 Basic Structure of Commands
cmd refers to a command.
Input of cmd from file
cmd < file
Output of cmd2 as file input to cmd1
cmd1 <(cmd2)
Standard output (stdout) of cmd to file
cmd > file
Append stdout to file
cmd >> file
Pipe stdout of cmd1 into the standard input (stdin) of cmd2
cmd1 | cmd2
Run cmd1 then cmd2
cmd1 ; cmd2
Run cmd2 if cmd1 is successful
cmd1 && cmd2
Run cmd2 if cmd1 is not successful
cmd1 || cmd2
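As a minimal sketch, the operators above can be tried out in a throwaway directory created with mktemp -d. The file and directory names (fruits.txt, sorted.txt, results) are hypothetical, and the process substitution line assumes bash.

```shell
# Work in a scratch directory so nothing in your project is touched
cd "$(mktemp -d)" || exit 1

printf 'banana\napple\ncherry\n' > fruits.txt    # stdout to file (overwrites)
printf 'date\n' >> fruits.txt                    # append stdout to file

sort < fruits.txt > sorted.txt                   # input from file, output to file

# Process substitution: <(sort fruits.txt) looks like a file name to diff (bash)
diff <(sort fruits.txt) sorted.txt && echo "identical"

sort fruits.txt | head -n 2                      # pipe: prints apple and banana

echo "first" ; echo "second"                     # run one command after the other
mkdir results && echo "results/ created"         # runs only if mkdir succeeds
mkdir results || echo "results/ already exists"  # runs only if mkdir fails (mkdir also prints an error)
```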
3 Short commands
Stop current command
CTRL-c
Go to start of line
CTRL-a
Go to end of line
CTRL-e
Cut from the cursor to the start of the line
CTRL-u
Cut from the cursor to the end of the line
CTRL-k
Search history
CTRL-r
Rerun the previous command, replacing the first occurrence of abc with 123
^abc^123
4 Grep commands
Case insensitive search
grep -i
Recursive search
grep -r
Inverted search
grep -v
Show only the matched part of each line
grep -o
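A small tab-separated file is enough to see how the four grep options differ. The file and folder names below (data, genes.tsv, more.tsv) are made up for illustration.

```shell
# Scratch directory with a hypothetical folder tree
cd "$(mktemp -d)" || exit 1
mkdir -p data/sub
printf 'Gene1\tkinase\ngene2\tphosphatase\nGENE3\tkinase\n' > data/genes.tsv
printf 'gene4\treceptor\n' > data/sub/more.tsv

grep -i "gene1" data/genes.tsv         # case insensitive: matches Gene1
grep -r "kinase" data                  # recursive: searches every file below data/
grep -v "kinase" data/genes.tsv        # inverted: lines that do NOT contain kinase
grep -o -i "gene[0-9]" data/genes.tsv  # only the matching text: Gene1, gene2, GENE3
```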
5 File-based commands
Create file1
touch file1
Concatenate files and output
cat file1 file2
View and paginate file1
less file1
Get type of file1
file file1
Copy file1 to file2
cp file1 file2
Move (or rename) file1 to file2
mv file1 file2
Delete file1
rm file1
Show first 10 lines of file1
head file1
Show first 50 lines of file1
head -n 50 file1
Show last 10 lines of file1
tail file1
Follow file1, printing new lines as they are added (stop with CTRL-c)
tail -F file1
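The file commands can be chained into a short session like the one below, a sketch run in a scratch directory with hypothetical file names. tail -F is left out because it keeps running until you press CTRL-c.

```shell
# Scratch directory so the example does not touch your own files
cd "$(mktemp -d)" || exit 1

touch empty.txt                          # create an empty file
seq 1 100 > numbers.txt                  # 100 lines: 1, 2, ..., 100
printf 'a\nb\n' > letters.txt
cat letters.txt numbers.txt > both.txt   # concatenate: 102 lines
cp numbers.txt backup.txt                # copy
mv backup.txt numbers_copy.txt           # move, here used to rename
rm empty.txt                             # delete (there is no recycle bin)

head numbers.txt                         # lines 1 to 10
head -n 50 numbers.txt | tail -n 1       # the 50th line: 50
tail numbers.txt                         # lines 91 to 100
```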
6 Replacing patterns with other patterns with sed
Replacing a pattern and writing to a new file (use this until you are certain you know what you are doing)
sed "s/foo/bar/g" "$infile" > "$outfile"
Replacing a pattern in the same file (there is no going back)
sed -i "s/foo/bar/g" "$infile"
Replacing a pattern (here bar with foobar) only in lines that contain a given string (here foo)
sed -i "/foo/s/bar/foobar/g" "$infile"
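A sketch of the sed commands on a throwaway file; example.txt, example_fixed.txt and copy.txt are hypothetical names. The in-place edit is run on a copy so the result can be compared with the original. Note that sed -i as written is GNU sed syntax; the BSD sed on macOS expects an explicit (possibly empty) backup suffix, as in sed -i '' ....

```shell
cd "$(mktemp -d)" || exit 1
infile=example.txt          # hypothetical input file
outfile=example_fixed.txt   # hypothetical output file
printf 'foo bar\nbar bar\nfoo\n' > "$infile"

# Safe: the original stays untouched, the result goes to a new file
sed "s/foo/bar/g" "$infile" > "$outfile"   # bar bar / bar bar / bar

# In place, on a copy: only lines containing foo get bar -> foobar
cp "$infile" copy.txt
sed -i "/foo/s/bar/foobar/g" copy.txt      # foo foobar / bar bar / foo
cat copy.txt
```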
7 Some Useful Commands for Bioinformatics
Count the entries in a fasta file. You can replace the header character (>) with any pattern to count the lines in your file that contain it
grep ">" "$infile" | wc -l
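grep counts matching lines, not individual matches. For a FASTA file this makes no difference, because each header sits on its own line, but for other patterns grep -o combined with wc -l counts every occurrence. A minimal sketch with a made-up three-record FASTA file (example.fa is a hypothetical name):

```shell
cd "$(mktemp -d)" || exit 1
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nA\n' > example.fa

grep ">" example.fa | wc -l    # 3 entries
grep -c "^>" example.fa        # 3: grep can count matching lines itself
grep -c "A" example.fa         # 3: lines that contain at least one A
grep -o "A" example.fa | wc -l # 4: every single A
```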
8 File manipulation with awk
Print columns 2, 4, and 5 to new file
awk '{print $2,$4,$5}' input.txt > outfile
Print lines (rows) where the value in column 3 is larger than the value in column 5
awk '$3>$5' file.txt
Print sum of column 1
awk '{sum+=$1} END {print sum}' file.txt
Compute the mean of column 2
awk '{x+=$2}END{print x/NR}' file.txt
Remove duplicates while keeping the order of the file
awk '!visited[$0]++' file.txt
Split multi-fasta into individual fasta files
awk '/^>/{s=++d".fa"} {print > s}' multi.fa
Length of each sequence in a multi-fasta file
awk '/^>/ { if (seqlen) print seqlen; print; seqlen = 0; next } { seqlen += length($0) } END { print seqlen }' file.fa
Sort VCF with header
awk '$0~"^#" { print $0; next } { print $0 | "sort -k1,1V -k2,2n" }' my.vcf
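Most of the awk one-liners above can be checked on tiny, made-up inputs. The sketch below uses a hypothetical file.txt with numeric columns, whose last row duplicates the first, and a hypothetical multi.fa in which the first sequence is wrapped over two lines; the VCF example is not included.

```shell
cd "$(mktemp -d)" || exit 1

# Five whitespace-separated numeric columns; row 4 repeats row 1
printf '1 10 5 7 3\n2 20 1 8 4\n3 30 9 9 2\n1 10 5 7 3\n' > file.txt

awk '{print $2,$4,$5}' file.txt > columns.txt   # first line: 10 7 3
awk '$3>$5' file.txt                            # rows 1, 3 and 4
awk '{sum+=$1} END {print sum}' file.txt        # 7
awk '{x+=$2}END{print x/NR}' file.txt           # 17.5
awk '!visited[$0]++' file.txt                   # first three rows only

# Two records; s1 is 4 + 2 = 6 bases long, s2 is 3 bases long
printf '>s1\nACGT\nAC\n>s2\nGGG\n' > multi.fa
awk '/^>/{s=++d".fa"} {print > s}' multi.fa     # writes 1.fa and 2.fa
awk '/^>/ { if (seqlen) print seqlen; print; seqlen = 0; next }
     { seqlen += length($0) }
     END { print seqlen }' multi.fa             # >s1, 6, >s2, 3
```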
9 A basic for loop
Often we want to run the same code on every file in a folder that has the same extension (like .fq) or a similar string in its file name. Instead of changing the file name in the code and rerunning it manually, we use a for loop. You can type it directly into the terminal or save it in a bash script (extension .sh). The line below uses i as the loop variable, which takes the name of each file with a .fq extension in the folder in turn, and runs fastqc on each of them. The option -o ${i}_fastqc writes the results to a directory named after the original file with _fastqc appended. FastQC does not create this output directory itself, so the loop creates it first with mkdir -p.
for i in *.fq ; do mkdir -p "${i}_fastqc" ; fastqc "${i}" -o "${i}_fastqc" ; done
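FastQC may not be installed where you are practising, so the sketch below keeps the loop structure but swaps fastqc for a simple read count (a FASTQ record is four lines). The sample file names are hypothetical. It also shows ${i%.fq}, a shell parameter expansion that strips the .fq extension, which is handy when you want cleaner output names.

```shell
cd "$(mktemp -d)" || exit 1

# Two tiny, hypothetical FASTQ files with 1 and 2 reads
printf '@r1\nACGT\n+\nIIII\n' > sampleA.fq
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > sampleB.fq

for i in *.fq ; do
    mkdir -p "${i}_fastqc"                  # output directory must exist
    # Stand-in for: fastqc "${i}" -o "${i}_fastqc"
    echo $(( $(wc -l < "${i}") / 4 )) > "${i}_fastqc/read_count.txt"
    echo "${i%.fq}"                         # file name without the .fq extension
done
```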
10 Linux Exercise Quiz
Please try to complete each task without looking at the answer first.
- Make a folder in the proj folder with your name
solution:
mkdir your_name
- Navigate to your folder
solution:
cd your_name
- Create an empty file
solution:
touch randomfile
- Rename randomfile
solution:
mv randomfile randomfile2
- Delete randomfile2
solution:
rm randomfile2
- Create a directory
solution:
mkdir randomdir
- Delete the directory
solution:
rm -r randomdir
- Create a symbolic link (symlink) from the source data to your own folder. Please do not copy it to your own directories! There will be a new folder for each subsection of the workshop. This example is only for the fastq files we will use for read mapping
solution:
ln -s /1_fastqc/*fq .
- List the contents of your directory. The symlinks should be shown in a colour other than white
solution:
ls
- Load the bwa module on the server
solution:
module load bwa/0.7.4
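To check the symlink step, ls -l shows each link together with the file it points to (link -> target), which is more reliable than colours because coloured output depends on how ls is configured. The sketch below rebuilds the situation with a hypothetical shared/1_fastqc folder standing in for the real source data and a hypothetical my_name folder; it does not cover module load, which depends on the server's module system.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for the shared source data and your own folder
mkdir -p shared/1_fastqc my_name
printf '@r1\nACGT\n+\nIIII\n' > shared/1_fastqc/a.fq
printf '@r1\nGGCC\n+\nIIII\n' > shared/1_fastqc/b.fq

cd my_name || exit 1
src="$(cd .. && pwd)/shared/1_fastqc"   # absolute path to the source data
# With several source files, the last argument (.) is the target directory
ln -s "$src"/*fq .
ls -l           # each entry reads like: a.fq -> /.../shared/1_fastqc/a.fq
readlink a.fq   # prints where the link points
```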