1 Introduction to Linux and Servers
Author: Lizel Potgieter, adapted by Amrei Binzer-Panchal
Linux is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds (https://en.wikipedia.org/wiki/Linux). Most servers run on a Linux-based operating system.
If you have little or no experience with the command line, please take some time to follow the Software Carpentry course on the Unix shell.
If you have some experience with the command line, you can look at the commands below (Section 2 onwards) to refresh your knowledge.
Either way, to make sure that you are at an adequate level of proficiency, jump to the last part of this page and take the Linux Exercise Quiz (Section 10).
And last but not least, here are some other resources with many other cool tips and tricks for all of your bioinformatics needs. For the full cheat sheets and other commands, please see:
2 Basic Structure of Commands
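The operators listed in this section can be tried safely on a scratch file. The sketch below is only an illustration: the temporary directory and the file name notes.txt are made up for the example. The cmd1 <(cmd2) form is left out because it needs bash or a similar shell rather than plain sh.

```shell
# Work in a throwaway directory so nothing important is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# cmd > file : write stdout to a file (replaces any existing content)
echo "first line" > notes.txt
# cmd >> file : append stdout to the same file
echo "second line" >> notes.txt

# cmd < file : read input from a file
line_count=$(wc -l < notes.txt)

# cmd1 | cmd2 : send stdout of cmd1 to cmd2
upper=$(head -n 1 notes.txt | tr 'a-z' 'A-Z')

# cmd1 ; cmd2 : run both, one after the other
both=$(echo "one" ; echo "two")

# cmd1 && cmd2 : cmd2 runs only if cmd1 succeeds
and_result=$(grep -q "first" notes.txt && echo "found")
# cmd1 || cmd2 : cmd2 runs only if cmd1 fails
or_result=$(grep -q "third" notes.txt || echo "not found")

echo "$line_count" "$upper" "$and_result" "$or_result"
```

Running each line on its own and checking notes.txt with cat in between makes the difference between > and >> easy to see.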
cmd refers to a command.
Input of cmd from file
cmd < file
Output of cmd2 as file input to cmd1
cmd1 <(cmd2)
Standard output (stdout) of cmd to file
cmd > file
Append stdout to file
cmd >> file
stdout of cmd1 to cmd2
cmd1 | cmd2
Run cmd1 then cmd2
cmd1 ; cmd2
Run cmd2 if cmd1 is successful
cmd1 && cmd2
Run cmd2 if cmd1 is not successful
cmd1 || cmd2
3 Short commands
Stop current command
CTRL-c
Go to start of line
CTRL-a
Go to end of line
CTRL-e
Cut from start of line
CTRL-u
Cut to end of line
CTRL-k
Search history
CTRL-r
Run previous command, replacing abc with 123
^abc^123
4 Grep commands
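The grep options in this section are easiest to remember after trying them once. This is a minimal sketch on made-up data: the files genes.txt and sub/more.txt and their contents are hypothetical and exist only for this example.

```shell
# Scratch directory with two small text files to search.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sub
printf 'Gene1 kinase\ngene2 protease\nGENE3 kinase\n' > genes.txt
printf 'gene4 kinase\n' > sub/more.txt

# -i : match "gene" regardless of upper or lower case
n_any_case=$(grep -i "gene" genes.txt | wc -l)

# -v : keep only the lines that do NOT contain "kinase"
not_kinase=$(grep -v "kinase" genes.txt)

# -o : print only the matching part of each line, one match per line
n_matches=$(grep -o "kinase" genes.txt | wc -l)

# -r : search every file below the current directory
n_recursive=$(grep -r "kinase" . | wc -l)

echo "$n_any_case" "$not_kinase" "$n_matches" "$n_recursive"
```

Without -i, the pattern "gene" would match only the second line of genes.txt, which is why case-insensitive search is often the safer first try.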
Case insensitive search
grep -i
Recursive search
grep -r
Inverted search
grep -v
Show matched part of file only
grep -o
5 File-based commands
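The file commands in this section can be rehearsed in a throwaway directory before using them on real data. The file names below are invented for this sketch, and seq is used only to generate 100 lines of test data.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch empty.txt                       # create an empty file
seq 1 100 > numbers.txt               # test data: one number per line
cp numbers.txt copy.txt               # copy: both files now exist
mv copy.txt renamed.txt               # move/rename: copy.txt is gone
cat empty.txt numbers.txt > both.txt  # concatenate into a new file

default_head=$(head numbers.txt | wc -l)           # head shows 10 lines by default
third_line=$(head -n 3 numbers.txt | tail -n 1)    # last of the first 3 lines
last_line=$(tail -n 1 numbers.txt)                 # final line of the file

rm empty.txt                          # delete: there is no recycle bin

echo "$default_head" "$third_line" "$last_line"
ls
```

Note that rm and mv act immediately and silently, so it is worth checking the result with ls after each step while learning.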
Create file1
touch file1
Concatenate files and output
cat file1 file2
View and paginate file1
less file1
Get type of file1
file file1
Copy file1 to file2
cp file1 file2
Move file1 to file2
mv file1 file2
Delete file1
rm file1
Show first 10 lines of file1
head file1
Show first 50 lines of file1
head -n 50 file1
Show last 10 lines of file1
tail file1
Output last lines of file1 as it changes
tail -F file1
6 Replacing patterns with other patterns with sed
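The sed substitutions in this section can be checked on a tiny file first. In this sketch the file names and contents are hypothetical, and every command writes to a new file or to the screen, so the input is never changed. The in-place -i form is deliberately avoided here because its exact syntax differs between GNU sed (Linux) and BSD sed (macOS).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
infile=sample.txt
outfile=sample_edited.txt
printf 'foo bar\nbaz bar\nfoo foo\n' > "$infile"

# Replace every foo with bar and write the result to a new file.
sed "s/foo/bar/g" "$infile" > "$outfile"

# Only on lines that contain foo, replace bar with foobar.
only_foo_lines=$(sed "/foo/s/bar/foobar/g" "$infile")

echo "--- original ---";        cat "$infile"
echo "--- s/foo/bar/g ---";     cat "$outfile"
echo "--- /foo/s/bar/foobar/g ---"; printf '%s\n' "$only_foo_lines"
```

Comparing the three printouts side by side shows that the second command leaves the line "baz bar" alone, because that line does not contain foo.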
Replacing a pattern and writing to a new file (use this until you are certain you know what you are doing)
sed "s/foo/bar/g" $infile > $outfile
Replacing a pattern in the same file (there is no going back)
sed -i "s/foo/bar/g" $infile
Replacing a pattern in a line that contains a string (here just foo)
sed -i "/foo/s/bar/foobar/g" $infile
7 Some Useful Commands for Bioinformatics
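Counting fasta entries, as described in this section, can be tested on a toy file where the answer is known in advance. The file toy.fa and its three sequences are invented for this sketch. It also shows grep -c, which counts matching lines directly, as an alternative to piping into wc -l.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
infile=toy.fa
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nA\n' > "$infile"

# Count header lines by piping the matches into wc -l.
n_pipe=$(grep ">" "$infile" | wc -l)

# Same count without the pipe: -c prints the number of matching lines.
n_count=$(grep -c ">" "$infile")

# Anchoring with ^ counts only lines that START with ">".
n_anchored=$(grep -c "^>" "$infile")

echo "$n_pipe" "$n_count" "$n_anchored"
```

The quotes around ">" matter: without them the shell would treat > as a redirection and overwrite the file named after it.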
Count the entries in a fasta file. You can replace the header character (>) with any other pattern to count the lines in your file that contain it
grep ">" $infile | wc -l
8 File manipulation with awk
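Several of the awk one-liners in this section can be verified on a small table where the results are easy to work out by hand. The files table.txt and toy.fa and their contents are hypothetical test data for this sketch only.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Five whitespace-separated columns; row 3 is an exact duplicate of row 1.
printf '10 2 7 geneA 3\n20 4 1 geneB 5\n10 2 7 geneA 3\n30 4 9 geneC 6\n' > table.txt

cols=$(awk '{print $2,$4,$5}' table.txt)          # keep columns 2, 4 and 5
bigger=$(awk '$3>$5' table.txt)                   # rows where column 3 > column 5
total=$(awk '{sum+=$1} END {print sum}' table.txt) # sum of column 1
mean=$(awk '{x+=$2}END{print x/NR}' table.txt)     # mean of column 2
unique=$(awk '!visited[$0]++' table.txt)          # drop duplicates, keep order

# Sequence lengths in a toy multi-fasta (s1 is split over two lines).
printf '>s1\nACGT\nAC\n>s2\nGG\n' > toy.fa
lengths=$(awk '/^>/ {if (seqlen){print seqlen}; print; seqlen=0; next} { seqlen = seqlen + length($0)} END {print seqlen}' toy.fa)

printf '%s\n' "$cols" "$bigger" "$total" "$mean" "$unique" "$lengths"
```

By default awk splits columns on any run of spaces or tabs; for files with another separator, the -F option sets it.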
Print columns 2, 4, and 5 to new file
awk '{print $2,$4,$5}' input.txt > outfile
Print lines where the value in column 3 is larger than in column 5
awk '$3>$5' file.txt
Print sum of column 1
awk '{sum+=$1} END {print sum}' file.txt
Compute the mean of column 2
awk '{x+=$2}END{print x/NR}' file.txt
Remove duplicates while keeping the order of the file
awk '!visited[$0]++' file.txt
Split multi-fasta into individual fasta files
awk '/^>/{s=++d".fa"} {print > s}' multi.fa
Length of each sequence in a multi-fasta file
awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' file.fa
Sort VCF with header
cat my.vcf | awk '$0~"^#" { print $0; next } { print $0 | "sort -k1,1V -k2,2n" }'
9 A basic for loop
Often we wish to run the same code for all files that are in a folder, have the same extension (like .fq), or have a similar string in the filename. Instead of changing the name in the code and rerunning it manually, we use for loops. You can write a loop directly into the terminal, or save it in a bash file (extension .sh). The fastqc loop at the end of this section uses i as the variable that takes the name of each file with a .fq extension in the folder in turn, and runs fastqc on each of them. The -o ${i}_fastqc option sends the output for each file to a folder named after the original file name with _fastqc appended. FastQC does not create this folder itself, so it has to exist before fastqc runs (for example, created with mkdir -p).
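Before running a loop on real data, it can help to rehearse it as a dry run that only prints the commands. The sketch below does this with echo, so FastQC does not need to be installed; the files sampleA.fq, sampleB.fq and notes.txt are empty placeholders invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch sampleA.fq sampleB.fq notes.txt

# Dry run: print the command for every .fq file instead of executing it.
# notes.txt is skipped because it does not match *.fq.
for i in *.fq ; do
    echo "fastqc ${i} -o ${i}_fastqc"
done > planned_commands.txt

cat planned_commands.txt
```

The printout also shows that the output name keeps the .fq extension (sampleA.fq_fastqc). Once the printed commands look right, the echo and the quotes around the command can be removed to run them for real.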
for i in *.fq ; do mkdir -p ${i}_fastqc ; fastqc ${i} -o ${i}_fastqc ; done
10 Linux Exercise Quiz
Please try to complete each task without looking at the answer first.
- Make a folder in the proj folder with your name
solution:
mkdir your_name
- Navigate to your folder
solution:
cd your_name
- Create an empty file
solution:
touch randomfile
- Rename randomfile
solution:
mv randomfile randomfile2
- Delete the renamed file
solution:
rm randomfile2
- Create a directory
solution:
mkdir randomdir
- Delete the directory
solution:
rm -r randomdir
- Create a symbolic link (symlink) from the source data to your own folder. Please do not copy it to your own directories! There will be a new folder for each subsection of the workshop. This example is only for the fastq files we will use for read mapping
solution:
ln -s /1_fastqc/*fq .
- List the contents of your directory. The symlinks should be shown in a colour other than white
solution:
ls
- Load the bwa module on the server
solution:
module load bwa/0.7.4
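The file and folder tasks of the quiz can also be practised away from the server. This is a sketch under stated assumptions: two temporary directories stand in for the proj folder and the shared data folder, the reads_1.fq and reads_2.fq files are empty placeholders, and the module step is left out because it depends on the server.

```shell
proj=$(mktemp -d)   # stands in for the proj folder
data=$(mktemp -d)   # stands in for the shared data folder
touch "$data/reads_1.fq" "$data/reads_2.fq"

cd "$proj" || exit 1
mkdir your_name             # make a folder with your name
cd your_name || exit 1      # navigate into it
touch randomfile            # create an empty file
mv randomfile randomfile2   # rename it
rm randomfile2              # delete it
mkdir randomdir             # create a directory
rm -r randomdir             # delete the directory

# Link instead of copying; the final "." places the links in the current folder.
ln -s "$data"/*fq .

# -l shows where each link points.
ls -l
```

In the ls -l listing, each symlink is followed by an arrow and the path of the original file, which confirms that the data were linked and not copied.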