26 changes: 26 additions & 0 deletions benchmarks/benchmarks/traj_reader.py
@@ -28,6 +28,12 @@
except ImportError:
    pass

try:
    import MDAnalysis as mda
    from MDAnalysisTests.datafiles import PDB
except ImportError:
    pass

traj_dict = {
    "XTC": [XTC, XTCReader],
    "TRR": [TRR, TRRReader],
@@ -71,3 +77,23 @@ def time_strides(self, traj_format):
        """
        for ts in self.reader_object:
            pass


class PDBReaderBench(object):
    """Benchmarks for PDB file format reading and parsing"""

    units = 'ms'
    timeout = 60.0
    params = [10, 100, 500]
    param_names = ['n_frames']

    def setup(self, n_frames):
        self.u = mda.Universe(PDB)

    def time_iterate(self, n_frames):
        for _ in range(n_frames):
Member

Maybe I'm mistaken, but won't this nested loop do something weird?

  • for each integer in the n_frames range
  • loop over the entire trajectory (all frames), since u.trajectory isn't sliced

It may be that there is no rewind after the first outer loop iteration, so the full iteration isn't redone each time, but I'm pretty sure the way this is expressed isn't quite right yet.

Even if you fix that, I'm not entirely certain this benchmarks what we want on the "reading" side.
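
To make the concern above concrete, here is a minimal sketch that counts frame reads for the same loop shape. `FakeTrajectory` and `nested_iterate` are hypothetical stand-ins written for illustration, not MDAnalysis classes, and the sketch assumes that each new iteration restarts from the first frame.

```python
class FakeTrajectory:
    """Hypothetical stand-in for ``u.trajectory``; not the MDAnalysis reader."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.frames_read = 0

    def __iter__(self):
        # Assumption: every new iteration starts again from the first frame.
        for index in range(self.n_frames):
            self.frames_read += 1
            yield index


def nested_iterate(trajectory, n_frames):
    """Same loop shape as ``time_iterate`` in the diff; returns total reads."""
    for _ in range(n_frames):
        for ts in trajectory:
            pass
    return trajectory.frames_read


# The parameter acts as a repeat count, not as a number of frames.
single = nested_iterate(FakeTrajectory(n_frames=1), 10)
multi = nested_iterate(FakeTrajectory(n_frames=5), 10)
print(single, multi)
```

Under that assumption the total number of reads is the parameter times the trajectory length, so with a single-frame file the benchmark times repeated reads of one frame rather than a trajectory of `n_frames` frames.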

Contributor Author

The standard PDB file contains only one frame. A better approach would be to use PDB_multiframe (an NMR structure): MDAnalysis treats each 3D model in that file as a separate frame, which removes the need for the outer loop entirely. @tylerjereddy, if this approach seems better, shall I go ahead?

Contributor Author

@Dreamstick9 Jun 4, 2026

I made the change, and the benchmark for the updated reader comes out at 5.13 ms.

[Screenshot, 2026-06-04 10:00:42 PM]

Contributor Author

I also added time_read to benchmark the initial parse separately. Let me know if this is the right direction to move forward with.
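
The updated code is not shown in this thread, so the following is only a sketch of what the revised benchmark might look like, assuming `PDB_multiframe` from `MDAnalysisTests.datafiles` and an unparameterized class. The method bodies are a guess at the shape described above, not the author's actual change.

```python
try:
    import MDAnalysis as mda
    from MDAnalysisTests.datafiles import PDB_multiframe
except ImportError:
    # Mirrors the guarded imports already used in traj_reader.py.
    pass


class PDBReaderBench(object):
    """Benchmarks for PDB parsing and multi-frame iteration (sketch)."""

    timeout = 60.0

    def setup(self):
        # Build the Universe once so time_iterate measures iteration only.
        self.u = mda.Universe(PDB_multiframe)

    def time_read(self):
        """Time the initial parse: constructing a Universe from the file."""
        mda.Universe(PDB_multiframe)

    def time_iterate(self):
        """Time a single pass over all frames, touching the coordinates."""
        for ts in self.u.trajectory:
            _ = ts.positions
```

Separating the two methods keeps the cost of parsing the file apart from the cost of stepping through frames, which addresses the question of what is being measured on the "reading" side.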

            for ts in self.u.trajectory:
                _ = ts.positions



1 change: 1 addition & 0 deletions package/CHANGELOG
@@ -22,6 +22,7 @@ The rules for this file:
* 2.11.0

Fixes
* Added ASV benchmark for PDB trajectory reading (PR #1234)
* `MDAnalysis.analysis.nucleicacids.WatsonCrickDist`, `MinorPairDist`,
and `MajorPairDist` now match residue names against the full resname
instead of only the first character, fixing incorrect behaviour with