I am reposting this because people wanted more information and my earlier question was closed. Here is an example of what the output looks like, just typical tab-delimited .txt data:
asdfsdf sdfsadf sdfsdf 92 83
sdfsdf ewrwef dsruh 32 42
sjgho uhiu uhgkuh 91 21
In the above, I am trying to remove every row where the value after the third tab is below 80 or the value after the fourth tab is below 70. In other words, the 4th and 5th columns (as viewed in Excel) must be above 80 and 70 respectively. In this case, only the first row should remain.
I am trying to parse a tabular text file generated by Blastp using awk. Previously I used this somewhat ugly code, because it worked, to pick out the right columns and cull the rows whose values were below my thresholds.
#!/bin/bash
#$ -cwd
#$ -pe mpi 16
awk '$4 > 80.0' blastoutput.txt > StepOne.txt
awk '$5 > 70.0' StepOne.txt > Culled.txt
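For reference, the two awk passes can probably be collapsed into a single one. This is only a minimal sketch, assuming the file really is tab-delimited and that the thresholds are the 80 and 70 described above; `cull_blast` is a made-up helper name, not anything provided by BLAST or awk.

```shell
#!/bin/bash
# Sketch: keep only rows whose 4th column is > 80 AND whose 5th column is > 70.
# cull_blast is a hypothetical helper name used for illustration.
cull_blast() {
    # -F'\t' splits on tabs only, so fields that contain spaces stay intact.
    # Adding 0 forces a numeric comparison instead of a string comparison.
    awk -F'\t' '($4 + 0) > 80.0 && ($5 + 0) > 70.0' "$1"
}

# Usage, with the file names from the script above:
#   cull_blast blastoutput.txt > Culled.txt
```

The `+ 0` is there so that awk compares the values as numbers even if a field carries stray non-numeric characters such as a carriage return.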
When I use it on a new blast result, however, the file sizes stay at around 300,000 KB, with only a slight decrease after step one and none after step two. My best guess is that awk is treating the whole blast output file as a single line and therefore not removing anything more. I thought it might have something to do with Unix/Windows line endings not being recognized, as I saw in other answers, but I haven't changed the way I generate the blast results and it was working before, so I don't know why the tabular results would suddenly be written differently.
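The single-line and line-ending guesses can be checked directly before changing anything else. The following is a rough sketch, not a confirmed diagnosis; the helper names `count_records`, `count_cr_lines`, and `strip_cr` are hypothetical and only there for illustration.

```shell
#!/bin/bash
# Sketch for checking how awk sees the file; all helper names are hypothetical.

# Number of records (lines) awk actually finds in the file.
count_records() {
    awk 'END { print NR }' "$1"
}

# Number of lines that end in a carriage return (Windows-style CRLF endings).
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Write the file to stdout with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}

# Usage, with the file name from the script above:
#   count_records blastoutput.txt
#   count_cr_lines blastoutput.txt
#   strip_cr blastoutput.txt > blastoutput_unix.txt
```

If `count_records` prints 1 for a file that should hold many rows, the rows are probably separated by something other than a newline. If `count_cr_lines` is non-zero, the file has Windows line endings, and the filter can be run on the `strip_cr` output instead.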
I’ve also tried using some parsing options I saw in other answers like the following:
perl -lane 'print $_ if ($F[3] > 80.0)' blastp_output_8_26.txt > StepOne.txt
but the results seem to be the same.
Does anyone know what I could do to the blastp output file to make it work with my code? I am convinced something is amiss there, but all my attempts to fix it so far have been for naught.