Bug #396
Long sequences take long to analyse
| Status: | Closed | Start: | 06/08/2010 |
|---|---|---|---|
| Priority: | Low | Due date: | |
| Assigned to: | Stijn Imbrechts | % Done: | 100% |
| Category: | - | Spent time: | - |
| Target version: | - | | |
Description
REPORTER:
PLATFORM/OS:
Windows XP SP2, IE8
COMPONENT:
Viral isolates
SUMMARY:
RegaDB takes a bit too long to align large sequences. My quick tests indicate roughly 2 minutes for a sequence of 8000 base pairs.
REPRODUCE:
Click patient 74921073 >> click Viral Isolates >> click Add >> capture dummy data in the required fields and upload the attachment >> click OK >> Click Proteins tab (please delete the sequence if you used the live database! – it doesn’t belong to this patient)
ACTUAL RESULT:
RegaDB shows “Aligning!...” for roughly 2 minutes before the amino acid list appears
EXPECTED RESULT:
Is it technically possible for the amino acid list to be generated a little quicker?
NOTE:
Reported by Kristel, verified by Fossie. On Kristel’s PC it takes a whole lot longer to align. When we start using RegaDB for HCV data where sequences tend to be longer, it would be nice if the analysis runs as fast as technically possible.
History
Updated by Stijn Imbrechts almost 2 years ago
- % Done changed from 0 to 70
We have been developing and testing a C++ version of the alignment procedure which will run as a webservice on our own server. This will increase the speed considerably, both because this version is just plain faster and because it will run on faster hardware.
I'll also increase the amount of RAM the GHB installation is allowed to use; that could speed things up a little.
Updated by Pieter Libin almost 2 years ago
good opportunity to finish this job,
I'll finalize the test script we need to use to do the final testing
Updated by Stijn Imbrechts almost 2 years ago
working on this right now, but this requires a lot of testing
I'm confident the alignment is now consistently better than the current implementation, but there could still be problems in some rare, exceptional cases. eta 2-3 weeks
Updated by Fossie Ferreira almost 2 years ago
This was fixed along with the codon deletion/insertion thing, wasn't it?
Updated by Stijn Imbrechts almost 2 years ago
No, that was a visualization error which had no impact on performance.
Updated by Fossie Ferreira over 1 year ago
Will this be done in time for the Nov 5th meeting? I'm thinking it may be a good time to brief us on the pros and cons of the new development.
Updated by Stijn Imbrechts over 1 year ago
- Status changed from New to Feedback
forgot to mention this, but since last week the UZ installation has been using C++ alignment
so the meeting is a good time to get some feedback about this
Updated by Pieter Libin over 1 year ago
can we also run the tests that Gertjan and I prepared on the final result?
is the C++ alignment the default alignment as it was available in sequencetool, or did you already implement a new alignment algorithm?
Updated by Stijn Imbrechts over 1 year ago
it's the default Needleman-Wunsch alignment
libseq:regadb
sequencetool:regadb_align
regadb:test_cpp_align
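
For readers unfamiliar with the algorithm named above, here is a minimal Python sketch of textbook Needleman-Wunsch global alignment. It is not the RegaDB or sequencetool code: the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration, and it uses a simple linear gap penalty.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not RegaDB's settings.
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: every cell depends on its three neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The table has (len(a)+1) x (len(b)+1) cells, so time and memory grow with the product of the two lengths. Aligning an 8000-base sequence against a reference of similar length means filling roughly 64 million cells, which is consistent with long sequences being noticeably slower than short ones.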
Updated by Fossie Ferreira over 1 year ago
user feedback from meeting: "feedback from Yoeri: works well, according to Kristel still problems with full-genomes (program gets stuck)"
Updated by Fossie Ferreira over 1 year ago
- File Ref.fasta added
To test it, I added a dummy patient with ID 99, uploaded the FASTA file "ref" (attached here) and let it run. The sequence is 8667 base pairs. The alignment was still running after 10 minutes.
Updated by Stijn Imbrechts over 1 year ago
I think this was due to a problem with the server that hosts the alignment webservice. There was no hard disk space left due to temporary files created by another webservice. I have set up a cron job which checks every 10 minutes whether disk usage is above 95% and, if so, removes these files.
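
A check like the one described could look roughly like the following Python sketch. This is not the actual script on the server: the function names are hypothetical, and it assumes the temporary files sit directly inside a single directory.

```python
import shutil
from pathlib import Path


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    total, used, _free = shutil.disk_usage(str(path))
    return used / total


def clean_if_full(tmp_dir, threshold=0.95, usage=usage_fraction):
    """Delete the regular files directly under tmp_dir when disk usage
    exceeds threshold. Returns the list of removed paths.

    The usage parameter exists so the check can be replaced in tests.
    """
    removed = []
    if usage(tmp_dir) <= threshold:
        return removed  # enough space left, nothing to do
    for entry in Path(tmp_dir).iterdir():
        if entry.is_file():  # leave subdirectories untouched
            entry.unlink()
            removed.append(entry)
    return removed
```

A small script that calls `clean_if_full` on the other webservice's temporary directory could then be scheduled with a crontab entry using the `*/10 * * * *` schedule, which runs it every 10 minutes.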
Updated by Fossie Ferreira over 1 year ago
I tested the same sequence from update 15 again. I stopped counting after 4 minutes. How long is it supposed to take?
Updated by Stijn Imbrechts 10 months ago
- Status changed from Feedback to Resolved
Updated by Fossie Ferreira 10 months ago
- Status changed from Resolved to Closed
A 9007 base pair sequence took less than 10 seconds to align. Fix confirmed, thanks.