Reason Screencast on Vimeo

We're trying out something new with yt. Cameron made a screencast introducing a new major feature in the latest release. It's been viewed over 30 times since announcement about a day and a half ago -- this is a success! I think we'll do this again for other major features.

Tagged yt

Improvements to Parallelism

The last few days I've spent some time looking at how parallelism in yt performs.  I conducted two different tests, both of which operated on the 512^3, 7 level "Santa Fe Light Cone" dataset RD0036.  This dataset has 5.5e8 total cells and in the neighborhood of 380,000 grids.  I ran four different tests: a 1D profile of the entire dataset, a 2D profile of the entire dataset, and projections of both "Density" (requires IO) and "Ones" (doesn't require IO).  For the purposes of these tests I have excised any time spent in creating the hierarchy and affiliated grid objects.  The results are shown just here:
 
Output

One thing I found rather astonishing about this is how poorly the 1D profile operates.  It's not incomprehensibly slow, but it operates much slower than the 2D profile.  I investigated why, and I believe that it comes down to a fundamental difference in the implementation.  The 2D profile's binning algorithm is implemented in C, whereas the binning algorithm in the 1D profile is actually implemented in Python, such that it actually scales quite poorly with the number of bins.  Since I was taking 128 bin profile, this enhanced the problem substantially.  Regardless, while a new 1D profiling implementation probably won't be written for yt-2.0 (as it's not so slow as to be prohibitively costly) it will be replaced in the future.

I was, however, quite pleased to see the parallelism scale so nicely for the profiles.  Projections had a bit of a different story, in that their strong scaling saturated at around 64 processors; I believe this is probably due simply to the projection algorithm's cost/benefit ratio.  I would point out that in my tests there was some noise in the high-end -- in one of the runs, projecting on 128 processors took a mere 6 seconds.  Then again, since you only need to project once anyway (and then spend quite a while writing a paper!) the time difference between 30 seconds and 6 seconds is pretty much a wash.

Now, before I get too enthusiastic about this, I did find a couple very disturbing problems with the parallelism in yt.  Both were within the derived quantities -- these are quantities that hang off an object, like sphere.quantities["Extrema"]("Density") and so on.  (In fact, that's the one that brought the problem to light.)  I was performing a call to "Extrema" before conducting the profiling step and I saw that it took orders of magnitude more time than the profiling itself!  But why should this be, when they both touch the disk roughly the same amount?  So I dug in a bit further.

Derived quantities are calculated in two steps.  The first one is to calculate reduced products on every grid object: this would be the min/max for a given grid, for the above example of "Extrema".  All of the intermediate results then get fed into a single "combine" operation, which calculates the final result.  This enables better parallelism as well as grid-by-grid calculation -- much easier on the memory!

Now, in parallel we usually preload all the data that is going to be accessed by a given task.  For instance, if we're calculating the extrema of density, each MPI task will read from disk all the densities in the grids it's going to encounter.  (In serial this is much less wise, and so we avoid it -- although batch-preloading could be a useful enhancement in the future.)  However, what I discovered was that in fact the entire process of preloading was disabled by default.  So instead of conducting maybe 10-30 IO operations, each processor was conducting more like 10,000.  This was why it was so slow!  I've since changed the default for parallel runs to be to indeed preload data.

The other issue was with the communication.  Derived quantities were still using a very old, primitive version of parallelism that I implemented more than two years ago.  In this primitive version, the root processor would turn-by-turn communicate with each of the other processors, and then finally broadcast a result back to all of them.  I have since changed this to be an Allgather operation, which enables collective broadcasting.  This sped things up by a factor of 4, although I am not sure that in the long run this will be wise at high processor counts.  (As a side note, in converting the derived quantities to Allgather, I also converted projections to use Allgatherv instead of Alltoallv.  This should dramatically improve communication in projections, but communication was never the weak spot in projections anyway.)

There are still many places parallelism could be improved, but with just these two changes to the derived quantity parallelism, I see speeds that are much more in line with the amount of IO and processing that occurs.
Tagged mpi parallelism yt

Spatial indexing with Quadtrees and Hilbert Curves - Nick's Blog

Spatial indexing is increasingly important as more and more data and applications are geospatially-enabled. Efficiently querying geospatial data, however, is a considerable challenge: because the data is two-dimensional (or sometimes, more), you can't use standard indexing techniques to query on position. Spatial indexes solve this through a variety of techniques. In this post, we'll cover several - quadtrees, geohashes (not to be confused with geohashing), and space-filling curves - and reveal how they're all interrelated.

Includes a handy Python function for converting to a Hilbert curve.

Tagged python yt

A GUI revisited, once more

A week before I left on my vacation, I spent a week running a conference on how to use our AMR code, Enzo.  During this conference, the idea of a GUI for yt came up again, and my old thesis advisor (the head of a project of his own that's a GUI for analyzing Enzo data!) suggested I scale back my ambitions a bit and try to just replicate the GUI he has created, one step at a time -- and his enthusiasm about this was pretty contagious.  So I spent a day doing that, and then some isolated work on planes afterward, and I ended up with a very simple Tkinter GUI that can open up multiple slices and look at multiple outputs at a time.

Fisheye_0001

I'm calling it Fisheye, and so far it's still very small -- but I've been really enjoying browsing data and using it for very, very rapid exploration.  The other GUIs I've posted about in the past have had their place (and they've all also been very small) and this actually draws on all bits and pieces that were developed for those GUIs!  I'm not sure how long I'll keep playing with this on planes and buses, but it's been fun.

The part about this particular pass at it, and it really is only a little pass at it since it's under 400 lines of code, is that it uses Tkinter for the GUI.  I've been polling the other yt devs to see where Tkinter is available and I've been pleasantly surprised that it has pretty good penetration at supercomputing centers.  So maybe there's something to it.

I know I feel a bit better about it than anything else I've tried -- because it requires no additional dependencies, and it's pretty straightforward to write.
Tagged python yt

Remote Panner

Last night, using 0MQ I set up the image panner to use a remote renderer.  This let me start up a REP/REQ system, where on the remote system (bound to localhost and ssh tunneled) I waited for new requests that describe the limits I wanted to plot, and locally I had the image panning GUI send the new requests and receive the new image.  It worked great -- very easy to set up -- except that the arrays it was passing were 2MB each, so the latency to my home network was unbearably high.  Today I'll give it a shot at work, see if it's a workable solution.
Tagged python scipy yt

Figuring Out Stereo Volume Rendering

Last week I was approached by a friend and collaborator to prepare some large volume renderings using the aforementioned software volume renderer in yt.  In the past we've successfully made very, very large image renderings using yt -- Sam's even made one at 8192^2, although at extremely high resolution like that sometimes the lack of fidelity in the underlying volume renderer shows up; sometimes even artifacts in the AMR grid boundaries, but that's less common.  Making the very large volume renderings isn't too bad -- it scales roughly with the number of pixels, but we can dispatch many frames to be rendered at once on a cluster.  

There are a couple other, more important things to consider when making the big volume renderings.  For starters, the entire structure of volume rendering in yt was not really created to generate a series of images -- only a single image.  The idea was that you would prepare a specific image, make it, and move on.  However, for this project, I want to do a zoomin, or possibly a more complicated camera path.

Additionally, one of the first things that we did with the volume rendering was silly: we applied no normalization to the output images.  That was a mistake, I see now.  Part of the reason for this was uncertainty in the correct normalization -- the bias that the user wanted to apply may not be the natural bias from the image.  But more than that, because the rendering algorithm itself was some what holistically settled upon (the original implementation, which we used for shell-style renderings, was not a "correct" implementation of alpha blending) a natural mechanism for scaling did not immediately present itself.  One likely exists, possibly dependent on the field of view, I simply do not yet know it.  This will have to be rectified, because the mechanism used for scaling a set of images will have to be different than the mechanism for scaling an image in isolation, or else frames will jump in brightness during the movie's course.

The final thing that I wanted to change was to add support for stereo rendering.  Rather than repeat any of the amazing discussion from Paul Bourke's website, I'll simply direct you there.  Everything you ever wanted to know about stereo rendering.  (When I was a first year grad student, we actually bought a copy of his site to use locally -- it was our way of showing support for him putting it online, and it also came with a bunch of source code for example applications.)  I first attempted to apply the correct method for stereo, where the view direction is parallel and the total view frustum is shifted.

This did not work.  In fact, it made me realize that all this time, the yt method for volume rendering is in fact ... not really a volume rendering method inasmuch as it is a planar ray-casting method.  Typically when doing volume rendering, there's a perspective applied to the image: the rays all emanate from a single place, creating a frustum.  But for yt, we actually set up a single plane of vectors at the back of the volume and advance that forward across the image.  This is good and bad; it's good in the sense that it's more clear precisely what is going on.  But it's bad in the sense that correct stereo is more difficult.  (Of course, on Bourke's page he has a workaround that may work for this, but I have not yet attempted it.)  Here's a rough depiction of the different between the two methods.

Renderingmechanisminyt

The upshot is that stereo doesn't seem to work unless you go with the "toe-in" method that can cause eyestrain after a long time and shows visible parallax at the edges.  I'm not sure if this is going to be a problem, but because I am not right now eager to rewrite the rendering backend, this is the way it is for the moment.

To set up the stereo rendering, I separated out the rendering mechanism from the objects to be rendered.  Previously, there was a single VolumeRendering object that you could create, raycast through, then discard.  I created a new camera object that accepted a homogenized volume and would call "traverse" on that volume, feeding a back and a front point.  The Volume is then responsible for passing off fixed-resolution grids to the camera, which accumulates an image buffer by calling the ray traversal functions.  The front and back points are essentially the only thing needed to know this order, but the camera also stores its three orientation vectors and its position that describe it in 3D space.  By separating out these two conceptual objects, we undo some of the "single, carefully constructed image" bias that was in the original volume renderer.  (And, we open ourselves up to being able to use the Camera with a hardware volume renderer, should that day ever come.)

So now we have a camera, and it makes images like this:

C_0001

It's a little dim, but that's a task for another day.  The next step is taking that perspective and turning it into a set of stereo images.  To do that, I added a new class called StereoPairCamera.  It accepts a Camera object and turns it into two camera objects, where the final interocular distance is calculated relative to the image plane width.  As I mentioned above, this only operates via toe-in stereo, so it does this in the simplest manner possible: it moves each of the Left and Right cameras by half of the interocular distance away from the original location, and then recalculates a normal vector to point back at the original center.  Now we can generate left and right images:
Unfortunately, on my laptop (which is my primary/exclusive work computer) I don't have the ability to view these pairs.  To get around that, I wrote a simple stereo pair image viewer in OpenGL and imposed upon my friends that do have a stereo viz wall to test it out -- and after some fiddling with the interocular distance, we got what appeared to be workable stereo pairs.

The full code for generating camera paths as well as stereo pairs is already in yt, but the documentation is still being written; I might also clean up the interface a bit.  Additionally, at some point in the future, the issue of toe-in stereo versus correct parallel-frustum stereo will need to be dealt with; the last thing I really want to do is force people to only use a bad method for generating stereo pairs.  Hopefully that is something that can be dealt with at a later time.

Thanks to the wealth of resources out there for making this a relatively easy task: the aforementioned Paul Bourke website on stereo pairs, the PyOpenGL and PIL teams for making the image pair viewer easy, and everyone else whose work I've built on to make things like this.

L_0001

R_0001

Tagged python scipy viz yt

GSoC folks, read this: (Re: Lightweight copies/renames)

The real challenge of GSoC is not coding, it's learning to work with a community. And the first lesson is: communicate in public

An amazing discussion not only of Google Summer of Code, but of Open Source development in general by Matt Mackall, kernel hacker and project lead of mercurial. Developers on the yt project are in general pretty good about this kind of thing, but there are some aspects of the development process (and I am a culprit here too) that could use improvement along these lines.

Tagged code opensource yt

Bootstrapping into a GUI

I always resisted making a GUI for my analysis toolkit.  There, I said it.  I've come clean.  I don't really want to make one, and I like the scripting interface just fine.  But, as my datasets get larger, and I need to explore them faster, I'm starting to see their usefulness quite a bit more -- as I talked about a bit a few weeks ago.

A couple years ago I wrote one based on wxPython -- it had a system of events and notifications and so on, but it was brittle and had a lot of moving parts.  And I never really used it, which I think might have been the most important part.  So after I wrote it, it kind of underwent bitrot.

But now that I'm using Traits, I'm starting to feel like the right way to write a GUI is in bits.  Right now, I have a VTK bit that does one thing.  I've also got an image pan-and-scan widget that does a thing.  I'm not really interested right now in setting up a big framework that handles all of these things, because one of the coolest things about Traits is that if I ever end up with a lot of Little Things that I want to put together, the actual integration would be relatively straightforward.

So I guess that's the way forward: write things that do what I want right now, this instant, use them, and if they ever get too big for their britches, maybe they could all be strung together.  So for now, I'm going to just keep using the little things to do my research, one step at a time, and maybe some day down the road it'd be nice to get them to work all together.
Tagged python scipy yt

Some more fiddling with Chaco and AMR data

In a followup to a post from Friday, I fiddled a bit more with Chaco and the AMR data plotting today.  By mostly using components that I'd already created, I was able to make it switch between fields while panning and zooming.  By using Chaco and Enable a bit more cleverly, I managed to get a 25x zoomed in plot inset in the bottom left of the zoomed-out plot, and (as should come as little surprise) the necessary code was very short.  Here's what it looks like now, displaying a 512^3, refine everywhere, 7 level dataset:

L7_chaco

Panning and scanning and zooming through a dataset like this really is the only way to fly.  I'm now using it on data for my next paper, and I think it's about three lines of code from writing scripts to plot images based on clicking when I get to a scene I'd like.  There are a whole bunch of second-order fixes and improvements to make, but they have all been bumped to and unspecified time in the future now that it's functional to the extent that I can examine my data in detail!  But, I'll definitely push all of the changes necessary to get this type of interface back up into the main repository.  Perhaps somebody else will take it and run with it.

2D Data Visualization of AMR with Matplotlib and Chaco

With yt we have used matplotlib almost exclusively for 2D plotting of adaptive mesh refinement data.  This week, I implemented a plotter inside Chaco a plotting system designed for interactive exploration of data.

But first!  An interlude about how we visualize 2D AMR data.

Our data is naturally a patch-based format.  What that means is that higher resolution cells are grouped into grid patches -- this ultimately means that the data itself is relatively convenient to look at, and also for some post-processing computation we end up being able to make a lot of guesses at which data to look at or analyze based on its location in grid patches.  Additionally, these grid patches overlap at different resolution levels: finer resolution grids are inside coarser grids.

When generating a 2D representation, we typically will take either an orthogonal slice or a "projection" through the data.  (The additional spatially-organized 2D representation we have in yt is the oblique slice, but it's something of a special case.)  Back when designing the projection and slice algorithms, we decided it was the most efficient to construct a set of "flattened" data: by projecting through the data, we end up with a set of data that represents the finest resolution data available along every sightline through the domain.  This is represented by five values: (x, y, dx, dy, z) where x and y are in the plane of the projected domain.  For slices, we get something similar, but this time it's only the finest data available at any orthogonal slice.

Projection

The alternatively mechanism would be to generate fixed-resolution arrays and fill them in; this gives and takes various things, but ultimately while we provide support for this type of function, and it's the only way to project inline with the simulation (slices can be done adaptively inline), we prefer the "adaptive" projections.  An additional advantage, one that appeals to me particularly (as my AMR runs go to 30 or so levels of refinement) is that we only have to project once with the adaptive projections and we can zoom and pan as much as we like.  More on zooming and panning later!

In matplotlib, there are two natural ways to present this data: by constructing a set of rectangular patches that this data represents and then filling them with the appropriate value, or by pre-pixelizing, using a bit of our knowledge about how AMR data works, and then displaying an image.  The advantage of the first is that we are able to provide only a single set of data to matplotlib, once, and then we can adjust the limits inside the context of matplotlib and describe the data exclusively in the context of the domain.  However, there's a lot of overhead involved -- often we end up with projections that, with just the five values alone, run into the hundreds of megs.  (For very large simulations, particularly.)  Generating a second layer of objects, one for every cell at the finest resolution in the image plane, is probably going to add too much overhead.  So instead we run the pixelizer!  (Of course, at some stage matplotlib also runs a pixelizer, to turn the multiple patches into a single output image.  But it does so with a little less knowledge than we have about the nature of the data, so the generality costs would cost us.)

The pixelizer is relatively simple: for every datapoint in the projection or slice, if it falls within the bounds of our image, deposit it into the appropriate cells with which it overlaps.  If it fully overlaps, deposit the entire thing; otherwise, only deposit a fractional amount.  This process is relatively fast, but not without cost.  The biggest cost is actually in discarding the points that do not fall within the region -- the iteration itself!  Because our data is periodic, we have to check multiple times for most datapoints whether or not they are within a given region.  This also costs us.  Overall, however, the process is straightforward and fast.  (Of course, I do realize that the entire projection process: from the initial grid patches, to summing over an axis, to the compositing and the pixelizing, is something OpenGL was designed to do!  But, we do it all in C.)

So now we have a mechanism that takes multi-resolution AMR data and turns it into an image, given only the data itself, a resolution, and a 2D bounding box.  Typically this is used by yt to generate matplotlib plots.  This week, though, I have been working a bit on a tile display wall routine -- dispatching rendering jobs to different nodes and then having them feed back some portion of the render job to a display tool that broadcasts it to multiple monitors.  (More on that process in a future blog post.)

However, one thing that was always really lacking was an intuitive mechanism for interacting with a plot: in years past, I'd hacked up a matplotlib and wx-based GUI that had some neat features -- a slider to set the width and re-centering abilities.  But it was clunky, and kind of slow, and I hated maintaining it.  The book "wxPython in Action" was quite good, and wx is definitely a very capable toolset, but I ended up using the "pubsub" mechanism in wx and I eventually got lost in the events and handlers and couldn't find my way out.

Enthought's Chaco toolkit, though, was designed for interactive use -- whipping up GUIs around structured data (using the "Traits" interface) is easy and straightforward.  A couple years ago, they added the ability to get 1D data via functions -- so the data could be recalculated on the fly, as the range needed changed.  This week and last, with the help of the head Chaco developer, I was able to extend this to work with image data, as well.

First, a brief description of the UI I was hoping to get: Chaco very easily presents a "google maps" style interface to image data, where you can click and drag to pan and scroll your mouse wheel to zoom in and out.  I wanted to have it set up such that I could do this with the AMR data -- as you zoomed in, it would re-pixelize such that it was displaying a smaller portion at higher resolution, much like with normal google maps.  Almost all of this was available in the standard Chaco distribution -- the pan and zoom tools, the displaying of image data, and the setup for inserting callbacks between different components.  Displaying a single fixed image in Chaco, with the GUI and everything, is on the order of a dozen lines, many of which are boilerplate.

By adding a FunctionImageData class (almost identical to the FunctionDataSource class, except extended to 2D and subclassing Image Data) I was able to mock up the re-generation of data: as the range of data displayed changed, the data handed back was generated.  By inserting a callback to update the range that the data mapped to (i.e., not only generating higher resolution data, but mapping it back to a smaller data region, thus ensuring it fully filled the image plot) the entire experience was complete.

The FunctionImageData requires a callback function that receives a "low" and "high" value that describe the new bounds of the visible data, but I also had to update the "index" -- to do all of this together, I wrote this helper class:

 class ImagePixelizerHelper(object):
     index = None
     def __init__(self, panner):
         self.panner = panner
  
     def __call__(self, low, high):
         b = self.panner.set_low_high(low, high)
         if self.index is not None:
             num_x_ticks = b.shape[0] + 1
             num_y_ticks = b.shape[1] + 1
             xs = mgrid[low[0]:high[0]:num_x_ticks*1j]
             ys = mgrid[low[1]:high[1]:num_y_ticks*1j]
             self.index.set_data( xs, ys )
         return b
 

(Note that even though I sung the praises of Traits earlier, I don't really use them here.  That will happen, it just hasn't yet.)  The "panner" here is something that updates a pixelized image when the bounds change.  What the check for index inside __call__ does is mock up the re-generation of the image indices, if an index is supplied.  This also transforms the initialization of the image plot into a two-strep process:

         pd = ArrayPlotData()
         plot = Plot(pd)
         self.pd = pd
         helper = ImagePixelizerHelper(self.panner)
         fid = FunctionImageData(func = helper)
         fid._data = self.panner.buffer
         self.fid = fid
         bounds = self.panner.bounds
         pd.set_data("imagedata", fid)
  
         img_plot = plot.img_plot("imagedata", colormap=jet,
                                  interpolation='nearest',
                                  xbounds=(0.0, 1.0),
                                  ybounds=(0.0, 1.0))[0]
         helper.index = img_plot.index
         self.helper = helper
  
         fid.data_range = plot.range2d
 

So here I set up the array plot data, I set up the plot based on it, I create the pixelizer helper, then I add a default value to the pixelizer.  This is a crucial step, because it's how the image plot knows what shape the data is.  (There are some other things about the image plot I'm not fully sure of just now, and this is one I now sort of understand -- the relationship between mappers, indexers and plots.)  I've then set the helper to have access to the image plot's index, and the data range is set to the plot's data range.  At this point the two know how to talk together.  Below this I add a few tools for zooming and panning, and a colorbar too

At this point, our plot works.  And it's simple and easy and, best of all, very interactive.  Every time the data range changes the pixelizer updates and passes back a new image: so we've got the best of both worlds.  We only have to store in memory a relatively small amount of data, and it goes to the highest resolution possible.  Here's an example, where I am zooming into a dataset of mine.

Amr_chaco2

(download)

It could go on for a while -- this dataset is really quite deep (this was three scrolls, but I could probably do fifteen or so) but I think this gets the point across.  I was able to zoom in just by scrolling with my mouse, and I can click and drag.  It's intuitive, it's very very fast, and I only had to write a tiny bit of code to get it to go.  This is what I want!

I think that maybe it's time to revisit having a GUI.  I'd forgotten how much fun it is to explore my data, and if I can really get one going in such a short amount of code, it might be very effective to do this rather than constantly having to tweak scripts and scp images.  And with the vast amounts of data I am now sort of drowning in, this will be crucial.  The dataset I originally decided to do this for is just immense; I was struggling at coming up with a way to see it all, to see all the tiny features and structures, because it's at a time in the universe when such things are relatively far apart, but also quite numerous on the length scales we're simulation.  Using this kind of an interface will let me discuss individual halos with the other members of the project, and we'll be able to really see and share and converse, rather than just looking at static images over email.

Another huge advantage that only just occurred to me is that the same code I used to write the backend for the tile display wall could be used for remote visualization in this same manner.  By repurposing that to feed directly back into Chaco, instead of grabbing the arrays myself, we might be able to avoid any copying of data (even the pre-projected values) onto a GUI-able host.  That could be very awesome, too.  The entire Chaco/Traits/ETS toolkit is very powerful and I think I see it being an extremely valuable part of our future plans for interactive analysis.

(edit: evidently I messed up the options for code highlighting and for including all the images in a single slideshow, and from the post-edit area I can't figure out how to fix them..  I guess I have a bit more to learn about posterous!  Sorry about that.)

Tagged python scipy viz yt