Bug #396

Long sequences take long to analyse

Added by Fossie Ferreira almost 2 years ago. Updated 10 months ago.

Status: Closed
Start: 06/08/2010
Priority: Low
Due date:
Assigned to: Stijn Imbrechts
% Done: 100%
Category: -
Spent time: -
Target version: -

Description

REPORTER:

PLATFORM/OS:
Windows XP SP2, IE8

COMPONENT:
Viral isolates

SUMMARY:
RegaDB takes too long to align large sequences. My quick tests indicate roughly 2 minutes for a sequence of about 8000 base pairs.

REPRODUCE:
Click patient 74921073 >> click Viral Isolates >> click Add >> capture dummy data in the required fields and upload the attachment >> click OK >> Click Proteins tab (please delete the sequence if you used the live database! – it doesn’t belong to this patient)

ACTUAL RESULT:
RegaDB shows “Aligning!...” for roughly 2 minutes before the amino acid list appears

EXPECTED RESULT:
Is it technically possible for the Amino acid list to generate a little quicker?

NOTE:
Reported by Kristel, verified by Fossie. On Kristel’s PC it takes a whole lot longer to align. When we start using RegaDB for HCV data where sequences tend to be longer, it would be nice if the analysis runs as fast as technically possible.

AR01-019_consensus_2.fas - The sequence that takes 2 minutes (8.7 KB) Fossie Ferreira, 06/08/2010 08:07 am

Ref.fasta (8.9 KB) Fossie Ferreira, 12/08/2010 02:52 pm

History

Updated by Stijn Imbrechts almost 2 years ago

  • % Done changed from 0 to 70

We have been developing and testing a C++ version of the alignment procedure, which will run as a webservice on our own server. This should increase the speed considerably, both because the C++ version is simply faster and because it will run on faster hardware.

In the meantime I'll increase the amount of RAM the GHB installation is allowed to use; that could speed things up a little.

Updated by Fossie Ferreira almost 2 years ago

Thanks. I've informed Kristel that it's on the way.

Updated by Pieter Libin almost 2 years ago

good opportunity to finish this job,
I'll finalize the test script we need to use to do the final testing

Updated by Fossie Ferreira almost 2 years ago

How long still before it's done?

Updated by Stijn Imbrechts almost 2 years ago

Working on this right now, but it requires a lot of testing.
I'm confident the alignment is now consistently better than the current implementation, but there could still be problems in some rare, exceptional cases. ETA: 2-3 weeks.

Updated by Fossie Ferreira almost 2 years ago

This was fixed along with the codon deletion/insertion thing, wasn't it?

Updated by Stijn Imbrechts almost 2 years ago

No, that was a visualization error which had no impact on performance.

Updated by Fossie Ferreira over 1 year ago

New ETA?

Updated by Fossie Ferreira over 1 year ago

Will this be done in time for the Nov 5th meeting? I'm thinking it may be a good time to brief us on the pros and cons of the new development.

Updated by Stijn Imbrechts over 1 year ago

  • Status changed from New to Feedback

Forgot to mention this, but since last week the UZ installation has been using the C++ alignment,
so the meeting is a good time to get some feedback on it.

Updated by Pieter Libin over 1 year ago

Can we also run the tests that Gertjan and I prepared on the final result?

Is the C++ alignment the default alignment as it was available in sequencetool, or did you already implement a new alignment algorithm?

Updated by Stijn Imbrechts over 1 year ago

It's the default Needleman-Wunsch alignment.
libseq:regadb
sequencetool:regadb_align
regadb:test_cpp_align
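
For readers unfamiliar with the algorithm named above, here is a minimal textbook sketch of Needleman-Wunsch global alignment in Python. It is not the RegaDB or libseq code, which this thread does not show; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not the ones RegaDB uses.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table: every cell depends on its three upper-left neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths. Aligning an 8000 base pair sequence against a reference of similar length means filling roughly 64 million cells, which is why long sequences are so much more expensive than short ones.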

Updated by Fossie Ferreira over 1 year ago

user feedback from meeting: "feedback from Yoeri: works well, according to Kristel still problems with full-genomes (program gets stuck)"

Updated by Stijn Imbrechts over 1 year ago

  • % Done changed from 70 to 100

should be fixed

Updated by Fossie Ferreira over 1 year ago

To test it, I added a dummy patient with ID 99, uploaded the FASTA file "ref" (attached here) and let it run. The sequence is 8667 base pairs. The alignment was still running after 10 minutes.

Updated by Stijn Imbrechts over 1 year ago

I think this was due to a problem with the server that hosts the alignment webservice. There was no hard disk space left because of temporary files created by another webservice. I have set up a cron job that checks every 10 minutes whether disk usage is above 95% and, if so, removes these files.
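
A cleanup check of the kind described above could look roughly like the following Python sketch. The actual script is not shown in this thread, so everything here is an assumption: the function names, the choice to delete only regular files directly inside the temporary directory, and the injectable `usage_fn` parameter (added so the logic can be exercised without filling a disk).

```python
import shutil
from pathlib import Path


def usage_percent(path):
    """Return the percentage of used space on the filesystem containing path."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def purge_if_full(mount, tmp_dir, threshold=95.0, usage_fn=usage_percent):
    """Remove regular files directly under tmp_dir when usage exceeds threshold.

    mount and tmp_dir are hypothetical paths supplied by the caller.
    Returns the list of removed paths (empty if usage is at or below threshold).
    """
    if usage_fn(mount) <= threshold:
        return []
    removed = []
    for entry in Path(tmp_dir).iterdir():
        if entry.is_file():  # leave subdirectories untouched
            entry.unlink()
            removed.append(entry)
    return removed
```

To run such a check every 10 minutes, a crontab schedule of `*/10 * * * *` followed by the command that invokes the script would match the interval mentioned above; the script location is site-specific and not given in this thread.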

Updated by Fossie Ferreira over 1 year ago

I tested the same sequence again (the one from update 15). I stopped counting after 4 minutes. How long is it supposed to take?

Updated by Stijn Imbrechts 10 months ago

  • Status changed from Feedback to Resolved

Updated by Fossie Ferreira 10 months ago

  • Status changed from Resolved to Closed

A 9007 base pair sequence took less than 10 seconds to align. Fix confirmed, thanks.
