Events


Video recording and production done by Enthought.

SciPy 2014 Schedule

July 8 - 10, 2014

( 123 available presentations )
Rating: Everyone
Viewed 193 times
Recorded at:
Date Posted: October 28, 2015

The Council for Geoscience (CGS) is the so-called "Geological Survey" of South Africa. Like many similar institutions around the world, it faces financial restrictions that significantly limit which tools are available to its scientists. It was from this need to stay scientifically current while keeping software inexpensive that the investigation of Python began, ultimately resulting in the PyGMI project.

The origins of PyGMI started with two separate projects. The first was a joint project where the CGS was responsible for the creation of a software interface for cluster analysis code, developed by the University of Potsdam (Paasche et al 2009). The resulting project was done entirely in Python. Data could be imported, filtered, analyzed and displayed in graph form using Matplotlib.

The second project stemmed from the need to perform 3D modelling on geophysical data. The creation of 3D models can be extremely time-consuming. Available packages tend to follow either the modelling of individual 2.5D profiles, which are then joined up into 3D sections, or fully three-dimensional modelling using polygon-based models. The initial idea was to use the VTK library as the means to create, display and interrogate the model, while using the SciPy and NumPy libraries to perform the actual potential field calculations. It soon became apparent that editing the resulting mesh quickly became complex and time-consuming. The ability to easily create and change a model is the very basis of forward modelling, and for this reason a new approach was adopted. The newer 3D modelling package was designed to allow the user to model simply by drawing the model, in the same way one would draw views of a house using a paint program. This implies the need to have a front view as well as a top view. The model is therefore voxel based rather than polygonal. The final model can be displayed either within the PyGMI software, or exported to Google Earth for examination.
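
To make the voxel idea concrete, here is a small hedged sketch (not PyGMI's actual API; the array sizes and lithology index are made up) of how "drawing" a body in a front view and a top view can be combined into a 3D voxel model with NumPy:

import numpy as np

# The model is a 3D array of lithology indices; painting in a 2D view
# assigns an index to every voxel behind the painted pixels.
nx, ny, nz = 80, 60, 40
model = np.zeros((nx, ny, nz), dtype=np.int8)    # 0 = background rock

front_view = np.zeros((nx, nz), dtype=bool)      # what the user paints (x-z plane)
front_view[20:40, 10:25] = True                  # a rectangular body, for example

top_view = np.zeros((nx, ny), dtype=bool)        # clip with a painted top view (x-y plane)
top_view[20:40, 15:45] = True

# Extrude the front view through depth (y) and intersect with the top view.
body = front_view[:, None, :] & top_view[:, :, None]
model[body] = 1                                  # lithology index 1

print(model.sum(), "voxels assigned to unit 1")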

Ultimately these two projects formed the basis of what is now the actual PyGMI package -- which is a modular collection of various techniques, including multivariate statistical analysis and potential field modelling. The interface follows a flow diagram approach and the individual modules are independent enough to ensure that they do not interfere with code which has preceded them in previous modules.

The PyGMI software is available for free download at: https://code.google.com/p/pygmi/

Rating: Everyone
Viewed 38 times
Recorded at:
Date Posted: October 29, 2015

Introduction and Motivation

Gamma-rays are photons with energies typically thousands to millions of times greater than the energy of visible light photons. The vastly higher energies of gamma-rays mean that they interact differently with matter, necessitating new sensors and imaging methods to localize gamma-ray sources. Many sensors and imaging approaches have been developed to image gamma-rays in 2D, as in a conventional camera, with applications in astronomy, medical imaging, and nuclear security. We have developed a mobile gamma-ray imaging system that merges data from both visual and gamma-ray imaging sensors to generate a visualization of the 3D gamma-ray distribution in real-time. This creates 3D maps of the physical environment and correlates them with the objects emitting gamma-rays. We have used Python to develop a flexible software framework for acquiring data from the multiple sensors, analyzing and merging data streams, and finally visualizing the resulting 3D gamma-ray maps.

Methods

The system consists of a cart that contains a state-of-the-art gamma-ray imaging system, called a Compton Imager, coupled with an RGB-D imaging system, a Microsoft Kinect. The software package has three main tasks: gamma-ray acquisition and processing, visual data processing, and finally the merger of these two streams. The gamma-ray data processing pipeline involves many computationally intensive tasks, so a concurrent structure built with the multiprocessing module forms the basis of the gamma-ray imaging framework. Many other Pythonic tools have been used to meet our real-time goal, including numexpr, Cython, and even the Python/C API. Several GUI frontends, built with TraitsUI or PySide for example, are used to monitor and control how the acquired data is processed in real time, while a suite of real-time diagnostics is displayed with matplotlib. The visual pipeline is based on an open-source implementation of RGBDSLAM (http://wiki.ros.org/rgbdslam), which is built on the Robot Operating System (ROS) framework. Finally, these two data streams are sent to a laptop computer via pyzmq, where the final merger and imaging (by solving a statistical inversion problem constrained by the visual data) is accomplished. The results are then displayed as they are produced by the imaging algorithm using mayavi.
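
The pyzmq hand-off between the processing cart and the visualization laptop follows a standard push/pull pattern; a hedged sketch is shown below (the port number and payload layout are made up, and the two sockets would normally live on different machines rather than in one script):

import numpy as np
import zmq

ctx = zmq.Context()
push = ctx.socket(zmq.PUSH)            # cart side: ship processed event packets
push.bind("tcp://127.0.0.1:5556")
pull = ctx.socket(zmq.PULL)            # laptop side: receive and image them
pull.connect("tcp://127.0.0.1:5556")

events = {"energy_keV": np.random.gamma(2.0, 300.0, size=1024),
          "direction": np.random.randn(1024, 3)}
push.send_pyobj(events)                # pickle the dict and send it over TCP
received = pull.recv_pyobj()
print(received["energy_keV"].shape, received["direction"].shape)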

Results

Link to Video: Moving Cart 3D scene

This system has been used to demonstrate real-time volumetric gamma-ray imaging for the first time [1]. The results from a typical run are shown in the above video. The red line indicates the movement of the system through the environment, while the blue arrows represent an aspect of the gamma-ray data. The 3D point cloud provided by RGBDSLAM appears incrementally as the system traverses the environment. In the end, the location of a small gamma-ray emitting source is correctly identified by the hotspot in the image.

[1] [https://www.nss-mic.org/2013/ConferenceRecord/Details.asp?PID=N25-4]

Rating: Everyone
Viewed 17 times
Recorded at:
Date Posted: November 5, 2015

HDF5 is a hierarchical, binary database format that has become the de facto standard for scientific computing. While the specification may be used in a relatively simple way (persistence of static arrays) it also supports several high-level features that prove invaluable. These include chunking, ragged data, extensible data, parallel I/O, compression, complex selection, and in-core calculations. Moreover, HDF5 bindings exist for almost every language - including two Python libraries (PyTables and h5py). This tutorial will cover HDF5 itself through the lens of PyTables.

This tutorial will discuss tools, strategies, and hacks for really squeezing every ounce of performance out of HDF5 in new or existing projects. It will also go over fundamental limitations in the specification and provide creative and subtle strategies for getting around them. Overall, this tutorial will show how HDF5 plays nicely with all parts of an application making the code and data both faster and smaller. With such powerful features at the developer's disposal, what is not to love?!

Knowledge of Python, NumPy, C or C++, and basic HDF5 is recommended but not required.
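
As a small taste of the features covered below (tables, chunked compressed storage, and out-of-core queries with where()), here is a hedged PyTables sketch; the file name and column layout are invented for illustration:

import numpy as np
import tables

class Reading(tables.IsDescription):
    time = tables.Float64Col()
    temp = tables.Float32Col()

filters = tables.Filters(complib="blosc", complevel=5)     # compression
h5 = tables.open_file("demo.h5", mode="w")
table = h5.create_table("/", "readings", Reading,
                        filters=filters, expectedrows=1000000)
row = table.row
for t in np.linspace(0.0, 10.0, 1000):
    row["time"] = t
    row["temp"] = 20.0 + 5.0 * np.sin(t)
    row.append()
table.flush()

# Out-of-core selection: the condition is evaluated chunk by chunk.
hot = [r["time"] for r in table.where("temp > 24.0")]
print(len(hot), "rows with temp > 24")
h5.close()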

Outline
Meaning in layout (20 min)

Tips for choosing your hierarchy
Advanced datatypes (20 min)

Tables
Nested types
Tricks with malloc() and byte-counting
Exercise on above topics (20 min)

Chunking (20 min)

How it works
How to properly select your chunksize
Queries and Selections (20 min)

In-core vs Out-of-core calculations
PyTables.where()
Datasets vs Dataspaces
Exercise on above topics (20 min)

The Starving CPU Problem (1 hr)

Why you should always use compression
Compression algorithms available
Choosing the correct one
Exercise
Integration with other databases (1 hr)

Migrating to/from SQL
HDF5 in other databases (JSON example)
Other Databases in HDF5 (JSON example)
Exercise

Rating: Everyone
Viewed 20 times
Recorded at:
Date Posted: November 4, 2015

Remote sensing data is complicated, very complicated! It is not only geospatially tricky but also indirect, as the sensor measures the interaction of the media with the probing radiation, not the geophysics itself. However, the problem is made tractable by the large number of algorithms available in the scientific Python community; what is needed is a common data model for active remote sensing data that can act as a layer between highly specialized file formats and the cloud of scientific software in Python. This presentation motivates this work by asking: How big is a rainshaft? What is the natural dimensionality of rainfall patterns, and how well is this represented in fine-scale atmospheric models? Rather than being specific to the domain of meteorology, we will break down how we approach this problem in terms of what tools across numerous packages we used to read, correct, map and reduce the data to forms able to answer our science questions. This is a "how" presentation, covering signal processing using linear programming methods, mapping using KD-trees, image analysis using ndimage and, of course, graphics using Matplotlib.
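
A hedged sketch of two of those steps is shown below, with made-up coordinates and rain rates: nearest-neighbour mapping of scattered sensor samples onto a regular grid with a KD-tree, followed by labelling contiguous rain objects with ndimage to ask "how big is a rainshaft?":

import numpy as np
from scipy.spatial import cKDTree
from scipy import ndimage

gate_xy = np.random.uniform(0, 100.0, size=(5000, 2))     # sample locations (km)
gate_rain = np.random.exponential(1.0, size=5000)         # rain rate at each sample

xg, yg = np.meshgrid(np.linspace(0, 100, 200), np.linspace(0, 100, 200))
grid_xy = np.column_stack([xg.ravel(), yg.ravel()])

tree = cKDTree(gate_xy)
_, idx = tree.query(grid_xy, k=1)                          # nearest-sample mapping
rain_grid = gate_rain[idx].reshape(xg.shape)

objects, n = ndimage.label(rain_grid > 2.0)                # contiguous "rainshafts"
sizes = np.array(ndimage.sum(rain_grid > 2.0, objects, range(1, n + 1)))
print(n, "objects; the largest covers", int(sizes.max()), "grid cells")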

Rating: Everyone
Viewed 24 times
Recorded at:
Date Posted: November 4, 2015

There are a growing number of Python packages (fiona, geopandas, pysal, shapely, etc.) addressing various types of spatial data, as well as the geoprocessing of that data and its statistical analysis. This session explores ways to best collaborate between and strengthen these efforts.

Rating: Everyone
Viewed 22 times
Recorded at:
Date Posted: November 4, 2015

Rating: Everyone
Viewed 15 times
Recorded at:
Date Posted: November 4, 2015

Rating: Everyone
Viewed 25 times
Recorded at:
Date Posted: November 4, 2015

Rating: Everyone
Viewed 26 times
Recorded at:
Date Posted: November 4, 2015

This talk will cover two projects: vim-ipython (1) and ipython-vimception (2)

1. Most people think of IPython as an application - but much of it is written as a library, making it possible to integrate with other tools.

vim-ipython is a Vim plugin that was first written during the sprints at SciPy 2011 as a two-way interface between the Vim text editor and a running IPython kernel. It turns Vim into a frontend for IPython kernels, like the qtconsole and the notebook interface. It allows you to send lines or whole files for IPython to execute, and also to get back object introspection and word completions in Vim, like what you get with object? and object. in IPython. It currently has over 430 stargazers on GitHub. Because vim-ipython simply leverages much of the existing IPython machinery, it allows users to interact with non-Python kernels (such as IJulia and IHaskell) in the same manner from the convenience of their favorite text editor. More recently, vim-ipython has gained the ability to conveniently view and edit IPython notebooks (.ipynb files) without running an IPython Notebook server.

vim-ipython has a small and accessible code base (13 people have contributed patches to the project), which has frequently made it the reference example for how to implement and utilize the IPython messaging protocol that allows for the language-independent communication between frontends and kernels.

2. The IPython Notebook user interface has become highly customizable, and authoring code and content in the Notebook can be a more pleasant and productive experience if you take the time to make it yours.

IPython 2.0 brings a modal notion to the Notebook interface. There are two modes: edit mode and command mode. In command mode, many single-key keyboard shortcuts are available. For example, m changes the current cell type to Markdown, a and b insert a new cell above and below the current one, and so on. Edit mode removes these single-key shortcuts so that new code and text can be typed in, but still retains a few familiar shortcuts, such as Ctrl-Enter, Alt-Enter, and Shift-Enter for cell execution (with some nuanced differences).

Part of the motivation behind the introduction of this modal interface was that performing operations on notebook cells had become tedious and awkward, as most operations required Ctrl-m to be typed too many times. For example, inserting 3 cells involved Ctrl-m a Ctrl-m a Ctrl-m a, whereas now it's just aaa in command mode. But the other major reason for the modal refactor was to make it possible to add and remove shortcuts. For example, a user who finds it annoying that a stands for "insert above" and b for "insert below", and thinks that a for "insert after" and b for "insert before" makes more sense, will now be able to make that change for herself.

Some of the keyboard shortcuts in command mode are already vi-like (j and k to move up and down between cells) but many are not, and a few are confusingly placed. ipython-vimception aims to be a reference implementation for how to perform shortcut and user-interface customization in the notebook. In particular, along with vim-ipython's new ability to edit .ipynb files, ipython-vimception addresses the concerns of many die-hard vim aficionados. Many of them have otherwise shied away from the notebook interface as it offends their sensibilities for how text editing and document manipulation should be done. However, with the new customizable shortcut system in IPython, along with a vim emulation mode in cell text input areas, they finally have a way to stay productive without having to change their ways.

Rating: Everyone
Viewed 21 times
Recorded at:
Date Posted: November 4, 2015

Related URLs

Wilson G. Software Carpentry: lessons learned [v1; ref status: indexed, http://f1000r.es/2x7]. F1000Research 2014, 3:62 (doi: 10.12688/f1000research.3-62.v1), http://dx.doi.org/10.12688/f1000research.3-62.v1
talk slides (http://third-bit.com/scipy2014/)

Rating: Everyone
Viewed 35 times
Recorded at:
Date Posted: November 5, 2015

SymPy is a pure Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.
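
A brief, hedged taste of the kind of symbolic manipulation SymPy offers (not taken from the talk itself; the expressions are arbitrary examples):

import sympy as sp

x, n = sp.symbols("x n")
expr = (x + 1)**4

print(sp.expand(expr))                                    # x**4 + 4*x**3 + 6*x**2 + 4*x + 1
print(sp.diff(expr, x))                                   # 4*(x + 1)**3
print(sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo)))    # sqrt(pi)
print(sp.limit(sp.sin(x)/x, x, 0))                        # 1
print(sp.Sum(1/n**2, (n, 1, sp.oo)).doit())               # pi**2/6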

Rating: Everyone
Viewed 17 times
Recorded at:
Date Posted: October 29, 2015

Spatial weights matrices, $W$, play an essential role in many spatial analysis tasks including measures of spatial association, regionalization, and spatial regression. A spatial weight $w_{i,j}$ represents potential interaction between each $i,j$ pair in a set of $n$ spatial units. The weights are generally defined as either binary, $w_{i,j}=\{1,0\}$, depending on whether or not $i$ and $j$ are considered neighbors, or as a continuous value reflecting some general distance relationship between $i$ and $j$. This work focuses on the case of binary weights using a contiguity criterion, in which $i$ and $j$ are rook neighbors when sharing an edge and queen neighbors when sharing a vertex.

Populating $W$ is computationally expensive, requiring, in the naive case, $O(n^2)$ point or edge comparisons. To improve efficiency, data decomposition techniques in the form of regular grids and quad-trees, as well as spatial indexing techniques using r-trees, have been utilized to reduce the total number of local point or edge comparisons. Unfortunately, these algorithms still scale quadratically. Recent research has also shown that, even with the application of parallel processing techniques, the gridded decomposition method does not scale as $n$ increases.

This work presents the development and testing of a high performance implementation, written in pure Python, using constant-time and $O(n)$ operations, by leveraging high performance containers and a vertex comparison method. The figures below depict results of initial testing using synthetically generated lattices of triangles, squares, and hexagons, with rook contiguity in black and queen contiguity in gray. These geometries were selected to control for average neighbor cardinality and average vertex count per geometry. From these initial tests, we report a significant speedup over r-tree implementations and a more modest speedup over gridded decomposition methods. In addition to scaling linearly, while existing methods scale quadratically, this method is also more memory efficient. Ongoing work focuses on testing with randomly distributed data and U.S. Census data, applying parallelization techniques to test further performance improvements, and using fuzzy operators to account for spatial error.
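
A minimal sketch of the container-based idea (not the authors' implementation; the toy polygons are invented): hash each vertex and each edge to the polygons that touch it, then read queen and rook neighbors off the hash tables in a single pass.

from collections import defaultdict

# Polygons are lists of (x, y) vertices in ring order.
polygons = {
    0: [(0, 0), (1, 0), (1, 1), (0, 1)],
    1: [(1, 0), (2, 0), (2, 1), (1, 1)],   # shares an edge with 0 -> rook neighbor
    2: [(1, 1), (2, 1), (2, 2), (1, 2)],   # shares only a vertex with 0 -> queen neighbor
}

vertex_owners = defaultdict(set)
edge_owners = defaultdict(set)
for pid, verts in polygons.items():
    for i, v in enumerate(verts):
        vertex_owners[v].add(pid)
        edge = frozenset((v, verts[(i + 1) % len(verts)]))
        edge_owners[edge].add(pid)

queen = defaultdict(set)
for owners in vertex_owners.values():       # any shared vertex
    for pid in owners:
        queen[pid] |= owners - {pid}

rook = defaultdict(set)
for owners in edge_owners.values():         # a shared edge is required
    for pid in owners:
        rook[pid] |= owners - {pid}

print("queen:", dict(queen))   # 0 neighbors 1 and 2
print("rook:", dict(rook))     # 0 neighbors only 1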

Rating: Everyone
Viewed 26 times
Recorded at:
Date Posted: November 4, 2015

Software for applied geoscientists in the petroleum industry is usually expensive, hard to use, Windows or Linux only, and slow to evolve. Furthermore, it is almost always stridently proprietary and therefore black-box. Open source software is rare. There are few developers working outside of seismic processing and enterprise database development, and consequently there is very little in the web and mobile domain. Reconciling a commitment to open source with a desire to earn a good living is one of the great conundrums of software engineering. We have chosen a hybrid approach of open core (like OpendTect, which has proprietary add-ons) and software-as-a-service (like WordPress.org vs WordPress.com).

Open source back-end
Our open core is a Python web app for producing synthetic seismic models, in much the same way that the now-deprecated Google Image Charts API used to work: the user provides a URL, which contains all the relevant data, and a JPEG image generated by matplotlib is returned. Along with the image, we return some computed data about the model, such as the elastic properties of the rocks involved. The mode of the tool is described by "scripts", which for now reside on the server, but which we plan to allow users to provide as part of the API. Scripts have various parameters, such as the P-wave and S-wave velocities and the bulk density of the rocks in the model, and it is these parameters that make up most of the data in the API call. Other parameters include the type and frequency of wavelet to use, and the computation method for the reflectivity (for example the Zoeppritz equations, or the Aki–Richards approximation). The app has no user interface to speak of, only a web API. It is licensed under the Apache 2 license and can be found on GitHub. We are running an instance of our app on a "t1.micro" Amazon EC2 instance running Ubuntu.
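
The parameters-in, image-bytes-out pattern looks roughly like the following hedged sketch (not the actual modelr code; the parameter names and physics are placeholders, and PNG is used here although the real service returns JPEG):

import io
import numpy as np
import matplotlib
matplotlib.use("Agg")                      # headless rendering on the server
import matplotlib.pyplot as plt

def synthetic_image(params):
    """Turn a dict of query parameters into image bytes."""
    f = float(params.get("frequency", 25.0))       # wavelet frequency, Hz
    rc = float(params.get("reflectivity", 0.1))    # single reflection coefficient
    t = np.linspace(-0.1, 0.1, 501)
    ricker = (1 - 2 * (np.pi * f * t) ** 2) * np.exp(-(np.pi * f * t) ** 2)
    trace = rc * ricker                            # a spike convolved with the wavelet

    fig, ax = plt.subplots(figsize=(3, 4))
    ax.plot(trace, t)
    ax.invert_yaxis()
    ax.set_xlabel("amplitude")
    ax.set_ylabel("time (s)")

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()

image_bytes = synthetic_image({"frequency": "30", "reflectivity": "0.15"})
print(len(image_bytes), "bytes of image")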

Proprietary front-end
The commercial, proprietary front end is a Python web app that lives in the Google App Engine walled garden. This app, which uses the Twitter Bootstrap framework, is serving at modelr.io and provides a user object in which a geoscientist can save rocks and scenarios consisting of a script and all its parameters. We chose App Engine for its strong infrastructure, good track record, and the easy availability of tools like the datastore, memcache, login, and so on. We also host support channels and materials through this front end, which has a very lightweight "demo" mode, and otherwise requires a $9/month subscription to use, handled by Stripe. This necessitated serving both the front and back ends over HTTPS, something we wanted to do anyway, because of industry mistrust of the cloud.

Summary
Some of the things we picked up along the way:

We started with a strong need of our own, so had clear milestones from day 1.
We left the project alone for months, but good documentation and GitHub meant this was not a problem.
Sprinting with a professional developer at the start meant less thrashing later.
The cloud landscape is exciting, but it's easy to be distracted by all the APIs. Keeping it simple is a constant struggle.
Pushing through Zeno's paradox to get to a live, public-facing app took stamina and focus.
There's nothing like having other users to get you to up your coding game.
We hope that by telling this story of the early days of a commercial scientific web application, built by a bunch of consulting scientists in Nova Scotia, not a tech startup in San Francisco, we can speed others along the path to creating a rich ecosystem of new geoscience tools and web APIs.

Rating: Everyone
Viewed 84 times
Recorded at:
Date Posted: November 5, 2015

We will have an open discussion about current matplotlib enhancement proposals and take calls for new ones. Anyone interested in matplotlib's future development efforts is more than welcome to attend. There will be no presentation. Current MEPs exist here: https://github.com/matplotlib/matplotlib/wiki#matplotlib-enhancement-proposals-meps

Rating: Everyone
Viewed 17 times
Recorded at:
Date Posted: November 5, 2015

Using the wide range of tools and libraries available for working with geospatial data, it is now possible to transport geospatial data from a database to a web-interface in only a few lines of code. In this tutorial, we explore some of these libraries and work through examples which showcase the power of Python for geospatial data.
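A hedged, minimal version of that idea is sketched below: build (or read) a GeoDataFrame, reproject it, and hand GeoJSON to a web map. The place names, coordinates, and file name are placeholders, not material from the tutorial.

import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"name": ["site A", "site B"]},
    geometry=[Point(-97.74, 30.27), Point(-97.70, 30.30)],
    crs="EPSG:4326",
)
# gdf = gpd.read_file("stations.shp")      # or read from a file / spatial database instead

web_mercator = gdf.to_crs(epsg=3857)       # reproject for web map tiles
geojson = gdf.to_json()                    # ready for Leaflet/OpenLayers
print(geojson[:120], "...")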

Rating: Everyone
Viewed 18 times
Recorded at:
Date Posted: November 5, 2015

One of the difficulties in using Python for scientific applications is that one needs a fairly complete set of Python data processing and visualization packages to be installed, beyond the standard Python distribution. Freely available scientific Python distributions like Enthought Canopy and Anaconda address this problem. A typical approach to teaching Python is to use a dedicated computer lab, where one of these distributions is installed on a set of machines with identical computing environments for use by students. With laptop computers becoming cheap and ubiquitous, an alternative approach is to allow students to use their own computers, where they install one of the scientific Python distributions by themselves. This approach requires more set-up time, because the software often requires some minor tweaking for each software platform, but requires no dedicated hardware and has the advantage of allowing students to easily run programs after class on their own computers. This presentation discusses a third approach that involves creating a software environment for Python using “cloud computing”. There are already commercial products available that provide well-supported Python computing environments in the cloud. This presentation focuses on alternative “roll your own” solutions using open-source software that are specifically targeted for use in an interactive classroom instruction setting.

Creating a virtual computing lab usually involves instantiating a server using a cloud infrastructure provider, such as Amazon Web Services. A new server can be set-up within minutes, with a scientific Python distribution automatically installed during set-up. Students can then login to their own accounts on the server using a browser-based interface to execute Python programs and visualize graphical output. Typically, each student would use a notebook interface to work on lessons.

Different approaches can be used to create separate accounts for multiple users. The simplest would be to create different user accounts on a Linux virtual machine. If greater isolation is required, lightweight Linux containers can be created on demand for each user. Although IPython Notebook can currently be run as a public server to work with multiple notebooks simultaneously, true multi-user support is expected to be implemented further down the road. However, there are a few open-source projects, such as JiffyLab, that already support a multi-user IPython Notebook environment. Another option is to use the open-source GraphTerm server, which supports a multi-user "graphical terminal" environment with a notebook interface. The pros and cons of these different approaches to building a virtual computer lab will be discussed.

Also discussed will be additional features that could be useful in a virtual computing lab such as the capability for the instructor to chat with the students and monitor their individual progress using a “dashboard”. Allowing students to collaborate in groups, with ability to view and edit each others’ code, can help promote classroom interaction. Enhancements to the notebook interface, such as “fill in the blanks” notebooks, can facilitate more structured instruction. The implementation of some of these features in the GraphTerm server will be discussed.

LINKS:

JiffyLab source

GraphTerm source

GraphTerm talk from SciPy 2013

Rating: Everyone
Viewed 19 times
Recorded at:
Date Posted: November 4, 2015

We discuss recent advances in the Object Oriented Finite-Element project at NIST (also called OOF), a Python and C++ tool designed to bring sophisticated numerical modeling capabilities to users in the field of Materials Science.

As part of the effort to expand the solid-mechanics capabilities of the code, the solver has been extended to include the ability to handle history-dependent properties, such as occur in viscoplastic systems, and inequality constraints, which are present in conventional isotropic plasticity, as well as surface interactions.

This software provides numerous tools for constructing finite-element meshes from microstructural images, and for implementing material properties from a very broad class which includes elasticity, chemical and thermal diffusion, and electrostatics.

The code is a hybrid of Python and C++, with the high-level user interface and control code in Python, and the heavy numerical work done in C++. The software can be operated either as an interactive, GUI-driven application, as a scripted command-line tool, or as a supporting library, providing useful access to users of varying levels of expertise. At every level, the user-interface objects are intended to be familiar to the materials-science user.

The modular object-oriented design of the code, and the strategy of separating the finite-element infrastructure from the material constitutive rules proved itself in implementing the new solid-mechanics capabilities.

Development on a fully-3D version of the code has also made significant progress, overcoming several challenges associated with user-interface issues. A nontrivial, solved 3D problem will be presented.

Rating: Everyone
Viewed 16 times
Recorded at:
Date Posted: October 29, 2015

Like most climate models, the CESM (Community Earth System Model) steps through time as a particular model scenario evolves and, at set intervals, outputs the state of all the important variables into single NetCDF files for each component of the model (atmosphere, ocean, land, and sea ice). Each file contains all the variables for a component at a single time step. Because the data volume is large, it is impractical to attempt to handle all the data for a complete model run as a single aggregation. Therefore, a consensus has evolved to mandate that the data be reorganized to contain single variables over some convenient time period. Finding a solution that can take advantage of multi-core architectures to do the job efficiently has not been easy. Recently, in an effort to determine the best solution, researchers at NCAR have conducted a set of benchmark tests to find the best tool for the job. Contenders included NCO (NetCDF Operators, the current incumbent for the task); an in-house Fortran code using the parallel I/O library PIO; a serial Python script using PyNIO; a version of the PyNIO script adapted to work with mpi4py in a very simple manner; CDO; NCL; and Pagoda. Surprisingly, PyNIO parallelized with mpi4py generally outperformed the other contenders by a large margin, and will now be tested as a replacement for the existing NCO scripts. This talk will look at the simple mpi4py and PyNIO code that achieves this result, discuss the reasons why the performance gain varies from case to case, and suggest ways to improve performance in challenging cases. Along the way, PyNIO's capabilities and recent improvements will be explained. In addition, other possible contenders for this role, in particular NetCDF4-Python coupled with mpi4py in a similar fashion, will be benchmarked using the same test suite.
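
The embarrassingly parallel pattern described above looks roughly like the hedged sketch below (not the NCAR script itself; the file glob, variable name, and decomposition are placeholders):

import glob
import Nio                       # PyNIO
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

files = sorted(glob.glob("case.cam.h0.*.nc"))
my_files = files[rank::size]                 # simple round-robin decomposition

for path in my_files:
    f = Nio.open_file(path, "r")
    temp = f.variables["T"][:]               # one variable from one time-step file
    # ...here the variable would be appended to a per-variable time-series file...
    f.close()

comm.Barrier()
if rank == 0:
    print("processed", len(files), "files on", size, "ranks")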

Rating: Everyone
Viewed 18 times
Recorded at:
Date Posted: November 4, 2015

Python's scientific computing and data analysis ecosystem, built around NumPy, SciPy, Matplotlib, Pandas, and a host of other libraries, is a tremendous success. NumPy provides an array object, the array-oriented ufunc primitive, and standard practices for exposing and writing numerical libraries to Python all of which have assisted in making it a solid foundation for the community. Over time, however, it has become clear that there are some limitations of NumPy that are difficult to address via evolution from within. Notably, the way NumPy arrays are restricted to data with regularly strided memory structure on a single machine is not easy to change.

Blaze is a project being built with the goal of addressing these limitations, and becoming a foundation to grow Python's success in array-oriented computing long into the future. It consists of a small collection of libraries being built to generalize NumPy's notions of array, dtype, and ufuncs to be more extensible, and to represent data and computation that is distributed or does not fit in main memory.

Datashape is the array type system that describes the structure of data, including a specification of a grammar and set of basic types, and a library for working with them. LibDyND is an in-memory array programming library, written in C++ and exposed to Python to provide the local representation of memory supporting the datashape array types. BLZ is a chunked column-oriented persistence storage format for storing Blaze data, well-suited for out of core computations. Finally, the Blaze library ties these components together with a deferred execution graph and execution engine, which can analyze desired computations together with the location and size of input data, and carry out an execution plan in memory, out of core, or in a distributed fashion as is needed.

Rating: Everyone
Viewed 50 times
Recorded at:
Date Posted: November 5, 2015

Julia is a new, up-and-coming language that has many similarities to Python, but some differences. One of its main advantages is the speed gain obtained by automatically compiling all code (in a somewhat similar way to PyPy, Cython, numba, etc.), despite having an interactive interface very similar to that of Python.

This will be a tutorial on the basic features of Julia from scratch, given by a user (rather than a developer) of the language, emphasising those features which are similar to Python (and hence do not require much explanation) and those features which are rather different.

The idea of the tutorial is to give an idea of why there is suddenly such a buzz around Julia and why it can be useful for certain projects.

This tutorial is aimed at people who are already familiar with the basic scientific Python packages; it is not aimed at beginners in scientific programming.

Rating: Everyone
Viewed 17 times
Recorded at:
Date Posted: November 4, 2015

Violence remains a significant problem in New York City's poor neighborhoods. There were more than 9,000 gun homicides in 2008 (FBI, 2009), and the CDC (2012) reports that there were more than 71,000 non-fatal wounds in the US. One novel approach to the problem of violence is the Cure Violence Model (Ransford, Kane and Slutkin 2009; Slutkin 2012). Cure Violence treats violence as a disease passed between people in a social network. The program tries to use the same network to change how people who are prone to violence, and have been its victims, react to stress and conflict. Cure Violence is viewed as having been successful in Chicago and has shown promise in other cities (Skogin 2009, Wilson 2010, Webster 2009). All of these studies have used reported incidents of violence before and after the program to assess its efficacy. The NYC Council and the Robert Wood Johnson Foundation have committed significant resources to this approach. Both have retained the CUNY John Jay Research & Evaluation Center to evaluate the efficacy. Our research adds to the literature by being the first to attempt to measure the change in the propensity to violence of people in the community. Novel preliminary research is presented on network cliques of respondents and the demographic, educational, and victimization characteristics that constitute the greatest risk. All of the analysis was conducted with Python libraries including IPython, PySAL, NumPy, Basemap, Fiona, Shapely, Matplotlib, NetworkX, Pandas and scikit-learn.

Rating: Everyone
Viewed 41 times
Recorded at:
Date Posted: October 29, 2015

What is yt?

"Lingua-franca for astrophysical simulations"

AGORA
Some selections from the gallery
105 citations to the yt method paper
Massively parallel

Examples of large-scale calculations and visualizations performed with yt

Usage data on XSEDE visualization resources

Volumetric data analysis beyond astrophysics

Neurodome, Whole-earth seismic wave data, Weather simulation data, Nuclear engineering, Radio astronomy

??? (insert your field here!)

What's new in yt-3.0?

Rewrite of data selection, i/o, and field detection and creation

Octree and particle support (i.e., discrete points)

Unit conversions and dimensional analysis baked into the codebase

Rethinking the API, 'rebranding' the project

Advanced volume rendering

Growing the Community

The gallery

Workshops

Contributor statistics.

The future

New data styles

Unstructured meshes

Finite element analysis

Spectral codes

New domain-specific functionality (beyond astrophysics)

Browser GUIs powered by IPython

Rating: Everyone
Viewed 20 times
Recorded at:
Date Posted: November 5, 2015

Introduction
Purpose of matplotlib
Online Documentation
matplotlib.org
Mailing Lists and StackOverflow
Github Repository
Bug Reports & Feature Requests
What is this "backend" thing I keep hearing about?
Interactive versus non-interactive
Agg
Tk, Qt, GTK, MacOSX, Wx, Cairo
Plotting Functions
Graphs (plot, scatter, bar, stem, etc.)
Images (imshow, pcolor, pcolormesh, contour[f], etc.)
Lesser Knowns: (pie, acorr, hexbin, streamplot, etc.)
What goes in a Figure?
Axes
Axis
ticks (and ticklines and ticklabels) (both major & minor)
axis labels
axes title
figure suptitle
axis spines
colorbars (and the oddities thereof)
axis scale
axis gridlines
legend
Manipulating the "Look-and-Feel"
Introducing matplotlibrc
Properties
color (and edgecolor, linecolor, facecolor, etc...)
linewidth and edgewidth and markeredgewidth (and the oddity that happens in errorbar())
linestyle
fonts
zorder
visible
What are toolkits?
axes_grid1
mplot3d
basemap
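
As a concrete companion to the outline above, here is a small, hedged example that exercises several of the listed items (the Agg backend, a plotting function, axis labels, titles, gridlines, a legend, and an rcParams tweak); the data and output file name are arbitrary:

import matplotlib
matplotlib.use("Agg")                        # one of the non-interactive backends above
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams["lines.linewidth"] = 2          # matplotlibrc-style look-and-feel tweak

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()                     # a Figure containing one Axes
ax.plot(x, np.sin(x), label="sin", color="blue", linestyle="-")
ax.plot(x, np.cos(x), label="cos", color="red", linestyle="--")
ax.set_xlabel("x")                           # axis labels
ax.set_ylabel("amplitude")
ax.set_title("Axes title")                   # axes title
fig.suptitle("Figure suptitle")              # figure suptitle
ax.grid(True)                                # axis gridlines
ax.legend(loc="upper right")                 # legend
fig.savefig("anatomy.png")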

Rating: Everyone
Viewed 24 times
Recorded at:
Date Posted: November 5, 2015

The aim of this course is to introduce new users to the Bayesian approach of statistical modeling and analysis, so that they can use Python packages such as NumPy, SciPy and PyMC effectively to analyze their own data. It is designed to get users quickly up and running with Bayesian methods, incorporating just enough statistical background to allow users to understand, in general terms, what they are implementing. The tutorial will be example-driven, with illustrative case studies using real data. Selected methods will include approximation methods, importance sampling, Markov chain Monte Carlo (MCMC) methods such as Metropolis-Hastings and Slice sampling. In addition to model fitting, the tutorial will address important techniques for model checking, model comparison, and steps for preparing data and processing model output. Tutorial content will be derived from the instructor's book Bayesian Statistical Computing using Python, to be published by Springer in late 2014.

PyMC forest plot

DAG

All course content will be available as a GitHub repository, including IPython notebooks and example data.

Tutorial Outline
Overview of Bayesian statistics.
Bayesian Inference with NumPy and SciPy
Markov chain Monte Carlo (MCMC)
The Essentials of PyMC
Fitting Linear Regression Models
Hierarchical Modeling
Model Checking and Validation
Installation Instructions
The easiest way to install the Python packages required for this tutorial is via Anaconda, a scientific Python distribution offered by Continuum Analytics. Several other tutorials will be recommending a similar setup.

One of the key features of Anaconda is a command line utility called conda that can be used to manage third party packages. We have built a PyMC package for conda that can be installed from your terminal via the following command:

conda install -c https://conda.binstar.org/pymc pymc
This should install any prerequisite packages that are required to run PyMC.

One caveat is that conda does not yet have a build of PyMC for Python 3. Therefore, you would have to build it yourself via pip:

pip install git+git://github.com/pymc-devs/pymc.git@2.3
For those of you on Mac OS X that are already using the Homebrew package manager, I have prepared a script that will install the entire Python scientific stack, including PyMC 2.3. You can download the script here and run it via:

sh install_superpack_brew.sh
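
Once PyMC 2.3 is installed, a minimal model looks something like the hedged sketch below (not one of the tutorial's own notebooks; the data are simulated): estimate the bias of a coin from observed flips.

import numpy as np
import pymc as pm

flips = np.random.binomial(1, 0.7, size=100)          # simulated data

theta = pm.Beta("theta", alpha=1, beta=1)             # prior on the coin's bias
obs = pm.Bernoulli("obs", p=theta, value=flips, observed=True)

model = pm.MCMC([theta, obs])
model.sample(iter=20000, burn=5000, thin=5)           # Metropolis sampling

print(theta.stats()["mean"], theta.stats()["95% HPD interval"])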

Rating: Everyone
Viewed 24 times
Recorded at:
Date Posted: October 28, 2015

Geospatial data is frequently manipulated directly using Python tools, commonly built on top of powerful libraries such as GDAL, GEOS and NetCDF. Delivering model results to end users in many instances requires providing tools in familiar graphical environments, such as desktop GIS systems, which can permit users without programming knowledge to integrate models and results into their existing scientific workflows. This talk discusses how to construct simple wrappers around existing Python programs to enable their use by ArcGIS, a commonly used commercial GIS.

Two separate approaches will be illustrated: creating Python toolboxes, or collections of tools embeddable in workflows, and creating customized Python graphical add-ins, which can control the graphical environment provided within ArcGIS. Building contextual help, interactive widgets, and leveraging numpy for direct data integration will be discussed. While ArcGIS exposes much of its functionality via the ArcPy package, this talk instead focuses on integrating code from other environments, and doesn't presume existing ArcGIS expertise.
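
A skeleton of the first approach, a Python toolbox (.pyt) that wraps an existing routine so it appears as a tool inside ArcGIS, is sketched below. It requires ArcGIS's arcpy; the tool and parameter names and the wrapped run_model() function are placeholders, not code from the talk.

import arcpy
from mymodel import run_model        # hypothetical existing Python program


class Toolbox(object):
    def __init__(self):
        self.label = "Model Tools"
        self.alias = "modeltools"
        self.tools = [RunModel]


class RunModel(object):
    def __init__(self):
        self.label = "Run Model"
        self.description = "Wraps an existing Python model for ArcGIS users."
        self.canRunInBackground = False

    def getParameterInfo(self):
        in_raster = arcpy.Parameter(displayName="Input raster", name="in_raster",
                                    datatype="DERasterDataset",
                                    parameterType="Required", direction="Input")
        out_table = arcpy.Parameter(displayName="Output table", name="out_table",
                                    datatype="DETable",
                                    parameterType="Required", direction="Output")
        return [in_raster, out_table]

    def execute(self, parameters, messages):
        result = run_model(parameters[0].valueAsText)      # call the existing code
        messages.addMessage("Model finished; writing " + parameters[1].valueAsText)
        # ...convert `result` (e.g. a NumPy array) with arcpy.da.NumPyArrayToTable...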

Rating: Everyone
Viewed 69 times
Recorded at:
Date Posted: November 5, 2015

Python and C++ are both popular languages that each bring a lot to the table. The languages also complement one another well: Python is high-level, dynamic, and easy to use while C++ is at-the-metal, static, and (in)famously tricky. There are times when there are real advantages to combining these disparate natures, and Python’s C API provides a strong interface for doing just that. Boost.Python is a C++ library that builds upon and improves Python’s C API to give users a simpler, more intuitive, and safer means to integrate Python and C++.

In this tutorial we’ll look at how to use Boost.Python to effectively bridge the Python/C++ boundary. We’ll start by briefly looking at the fundamentals of the Python C API since that defines the “ground rules”; this includes things like reference counting, the basic object model, and so forth. We’ll then quickly look at the Boost.Python API and show how it provides the same functionality as the underlying C API, but does so in a way that doesn’t obscure the real semantics of the Python language.

After this introduction, the rest of the tutorial will involve writing code to explore various elements of Boost.Python. We’ll focus on techniques for extending Python with C++, that is, writing Python modules in C++. Boost.Python can be used for embedding (i.e. invoking Python code from C++), but that involves a different set of techniques, and in practice most scientific Python developers are more interested in developing extensions.

The syllabus for the four-hour tutorial will be like this:

Introduction: C-API and Boost.Python basics

Note that this can be reduced or eliminated if participants are already comfortable with the topics.

Hello World: Exposing a basic function

In this section we’ll get a minimal Boost.Python module working. This will not only introduce students to the infrastructure of Boost.Python, but it will also give us a chance to make sure that everyone’s build environment is working.

Exposing functions

In this section we’ll look at the details of exposing C++ functions to Python. The topics we’ll cover will include overloading (including Boost.Python’s auto-overload feature), default argument values, and a brief look at call policies.

Exposing classes

Here we’ll look at how to expose C++ classes to Python. Topics will include the basic class_ template, member functions, data members, properties, inheritance, and virtual functions.

boost::python::object

The boost::python::object class is Boost.Python’s primary interface to Python’s PyObject structure. Understanding how to work with this class is a key building-block for developing Python modules with Boost.Python. We’ll explore its API and features, including areas like attribute access, reference counting, and converting between Python and C++ objects.

Derived object types

Boost.Python provides a number of boost::python::object subclasses for important Python classes like list, dict, and tuple. In this section we’ll look at these subclasses and how to use them in Boost.Python modules.

Enums

Boost.Python provides enum_ for exposing C++ enums to Python. Python doesn’t have a notion of enums per se, but in this section we’ll explore how this template makes it straightforward to use C++ enums in Python in a simple and intuitive way.

Type conversion

In this section we’ll look at Boost.Python’s support for doing automatic type-conversion across the Python/C++ boundary. We’ll see how you can register type-converters with Boost.Python which will be invoked whenever Boost.Python needs to convert a Python object to a C++ object or vice versa.

This is a fairly ambitious set of topics, and it’s possible that we won’t be able to cover them all. The topics are roughly in most-often-used to least-often-used order, however, so students will be sure to be exposed to the most important and relevant elements of the course.

Likewise, the four-hour format of the course means that we won’t be able to go into great depth on many topics. The main goal of the course, then, is to give students enough orientation and hands-on experience with Boost.Python that they can continue to learn on their own. Inter-language integration - especially between languages as dissimilar as C++ and Python - can be quite complex, but this tutorial will give students the grounding they need to successfully apply Boost.Python to their problems.

Rating: Everyone
Viewed 16 times
Recorded at:
Date Posted: November 5, 2015

Slides (http://figshare.com/articles/If_there_s_Computa...)

Rating: Everyone
Viewed 75 times
Recorded at:
Date Posted: November 5, 2015

Python and C++ are both popular languages that each bring a lot to the table. The languages also complement one another well: Python is high-level, dynamic, and easy to use while C++ is at-the-metal, static, and (in)famously tricky. There are times when there are real advantages to combining these disparate natures, and Python’s C API provides a strong interface for doing just that. Boost.Python is a C++ library that builds upon and improves Python’s C API to give users a simpler, more intuitive, and safer means to integrate Python and C++.

In this tutorial we’ll look at how to use Boost.Python to effectively bridge the Python/C++ boundary. We’ll start by briefly looking at the fundamentals of the Python C API since that defines the “ground rules”; this includes things like reference counting, the basic object model, and so forth. We’ll then quickly look at the Boost.Python API and show how it provides the same functionality as the underlying C API, but does so in a way that doesn’t obscure the real semantics of the Python language.

After this introduction, the rest of the tutorial will involve writing code to explore various elements of Boost.Python. We’ll focus on techniques for extending Python with C++, that is, writing Python modules in C++. Boost.Python can be used for embedding (i.e. invoking Python code from C++), but that involves a different set of techniques, and in practice most scientific Python developers are more interested in developing extensions.

The syllabus for the four-hour tutorial will be like this:

Introduction: C-API and Boost.Python basics

Note that this can be reduced or eliminated if participants are already comfortable with the topics.

Hello World: Exposing a basic function

In this section we’ll get a minimal Boost.Python module working. This will not only introduce students to the infrastructure of Boost.Python, but it will also give us a chance to make sure that everyone’s build environment is working.

Exposing functions

In this section we’ll look at the details of exposing C++ functions to Python. The topics we’ll cover will include overloading (including Boost.Python’s auto-overload feature), default argument values, and a brief look at call policies.

Exposing classes

Here we’ll look at how to expose C++ classes to Python. Topics will include the basic class_ template, member functions, data members, properties, inheritance, and virtual functions.

boost::python::object

The boost::python::object class is Boost.Python’s primary interface to Python’s PyObject structure. Understanding how to work with this class is a key building-block for developing Python modules with Boost.Python. We’ll explore its API and features, including areas like attribute access, reference counting, and converting between Python and C++ objects.

Derived object types

Boost.Python provides a number of boost::python::object subclasses for important Python classes like list, dict, and tuple. In this section we’ll look at these subclasses and how to use them in Boost.Python modules.

Enums

Boost.Python provides enum_ for exposing C++ enums to Python. Python doesn’t have a notion of enums per se, but in this section we’ll explore how this template makes it straightforward to use C++ enums in Python in a simple and intuitive way.

Type conversion

In this section we’ll look at Boost.Python’s support for doing automatic type-conversion across the Python/C++ boundary. We’ll see how you can register type-converters with Boost.Python which will be invoked whenever Boost.Python needs to convert a Python object to a C++ object or vice versa.

This is a fairly ambitious set of topics, and it’s possible that we won’t be able to cover them all. The topics are roughly in most-often-used to least-often-used order, however, so students will be sure to be exposed to the most important and relevant elements of the course.

Likewise, the four-hour format of the course means that we won’t be able to go into great depth on many topics. The main goal of the course, then, is to give students enough orientation and hands-on experience with Boost.Python that they can continue to learn on their own. Inter-language integration - especially between languages as dissimilar as C++ and Python - can be quite complex, but this tutorial will give students the grounding they need to successfully apply Boost.Python to their problems.

Rating: Everyone
Viewed 21 times
Recorded at:
Date Posted: November 5, 2015

Python and C++ are both popular languages that each bring a lot to the table. The languages also complement one another well: Python is high-level, dynamic, and easy to use while C++ is at-the-metal, static, and (in)famously tricky. There are times when there are real advantages to combining these disparate natures, and Python’s C API provides a strong interface for doing just that. Boost.Python is a C++ library that builds upon and improves Python’s C API to give users a simpler, more intuitive, and safer means to integrate Python and C++.

In this tutorial we’ll look at how to use Boost.Python to effectively bridge the Python/C++ boundary. We’ll start by briefly looking at the fundamentals of the Python C API since that defines the “ground rules”; this includes things like reference counting, the basic object model, and so forth. We’ll then quickly look at the Boost.Python API and show how it provides the same functionality as the underlying C API, but does so in a way that doesn’t obscure the real semantics of the Python language.

After this introduction, the rest of the tutorial will involve writing code to explore various elements of Boost.Python. We’ll focus on techniques for extending Python with C++, that is, writing Python modules in C++. Boost.Python can be used for embedding (i.e. invoking Python code from C++), but that involves a different set of techniques, and in practice most scientific Python developers are more interested in developing extensions.

The syllabus for the four-hour tutorial will be like this:

Introduction: C-API and Boost.Python basics

Note that this can be reduced or eliminated if participants are already comfortable with the topics.

Hello World: Exposing a basic function

In this section we’ll get a minimal Boost.Python module working. This will not only introduce students to the infrastructure of Boost.Python, but it will also give us a chance to make sure that everyone’s build environment is working.

Exposing functions

In this section we’ll look at the details of exposing C++ functions to Python. The topics we’ll cover will include overloading (including Boost.Python’s auto-overload feature), default argument values, and a brief look at call policies.

Exposing classes

Here we’ll look at how to expose C++ classes to Python. Topics will include the basic class_ template, member functions, data members, properties, inheritance, and virtual functions.

boost::python::object

The boost::python::object class is Boost.Python’s primary interface to Python’s PyObject structure. Understanding how to work with this class is a key building-block for developing Python modules with Boost.Python. We’ll explore its API and features, including areas like attribute access, reference counting, and converting between Python and C++ objects.

Derived object types

Boost.Python provides a number of boost::python::object subclasses for important Python classes like list, dict, and tuple. In this section we’ll look at these subclasses and how to use them in Boost.Python modules.

Enums

Boost.Python provides enum_ for exposing C++ enums to Python. Python doesn’t have a notion of enums per se, but in this section we’ll explore how this template makes it straightforward to use C++ enums in Python in a simple and intuitive way.

Type conversion

In this section we’ll look at Boost.Python’s support for doing automatic type-conversion across the Python/C++ boundary. We’ll see how you can register type-converters with Boost.Python which will be invoked whenever Boost.Python needs to convert a Python object to a C++ object or vice versa.

This is a fairly ambitious set of topics, and it’s possible that we won’t be able to cover them all. The topics are roughly in most-often-used to least-often-used order, however, so students will be sure to be exposed to the most important and relevant elements of the course.

Likewise, the four-hour format of the course means that we won’t be able to go into great depth on many topics. The main goal of the course, then, is to give students enough orientation and hands-on experience with Boost.Python that they can continue to learn on their own. Inter-language integration - especially between languages as dissimilar as C++ and Python - can be quite complex, but this tutorial will give students the grounding they need to successfully apply Boost.Python to their problems.

Rating: Everyone
Viewed 15 times
Recorded at:
Date Posted: November 5, 2015

Python and C++ are both popular languages that each bring a lot to the table. The languages also complement one another well: Python is high-level, dynamic, and easy to use while C++ is at-the-metal, static, and (in)famously tricky. There are times when there are real advantages to combining these disparate natures, and Python’s C API provides a strong interface for doing just that. Boost.Python is a C++ library that builds upon and improves Python’s C API to give users a simpler, more intuitive, and safer means to integrate Python and C++.

In this tutorial we’ll look at how to use Boost.Python to effectively bridge the Python/C++ boundary. We’ll start by briefly looking at the fundamentals of the Python C API since that defines the “ground rules”; this includes things like reference counting, the basic object model, and so forth. We’ll then quickly look at the Boost.Python API and show how it provides the same functionality as the underlying C API, but does so in a way that doesn’t obscure the real semantics of the Python language.

After this introduction, the rest of the tutorial will involve writing code to explore various elements of Boost.Python. We’ll focus on techniques for extending Python with C++, that is, writing Python modules in C++. Boost.Python can be used for embedding (i.e. invoking Python code from C++), but that involves a different set of techniques, and in practice most scientific Python developers are more interested in developing extensions.

The syllabus for the four-hour tutorial will be like this:

Introduction: C-API and Boost.Python basics

Note that this can be reduced or eliminated if participants are already comfortable with the topics.

Hello World: Exposing a basic function

In this section we’ll get a minimal Boost.Python module working. This will not only introduce students to the infrastructure of Boost.Python, but it will also give us a chance to make sure that everyone’s build environment is working.

Exposing functions

In this section we’ll look at the details of exposing C++ functions to Python. The topics we’ll cover will include overloading (including Boost.Python’s auto-overload feature), default argument values, and a brief look at call policies.

Exposing classes

Here we’ll look at how to expose C++ classes to Python. Topics will include the basic class_ template, member functions, data members, properties, inheritance, and virtual functions.

boost::python::object

The boost::python::object class is Boost.Python’s primary interface to Python’s PyObject structure. Understanding how to work with this class is a key building-block for developing Python modules with Boost.Python. We’ll explore its API and features, including areas like attribute access, reference counting, and converting between Python and C++ objects.

Derived object types

Boost.Python provides a number of boost::python::object subclasses for important Python classes like list, dict, and tuple. In this section we’ll look at these subclasses and how to use them in Boost.Python modules.

Enums

Boost.Python provides enum_ for exposing C++ enums to Python. Python doesn’t have a notion of enums per se, but in this section we’ll explore how this template makes it straightforward to use C++ enums in Python in a simple and intuitive way.

Type conversion

In this section we’ll look at Boost.Python’s support for doing automatic type-conversion across the Python/C++ boundary. We’ll see how you can register type-converters with Boost.Python which will be invoked whenever Boost.Python needs to convert a Python object to a C++ object or vice versa.

This is a fairly ambitious set of topics, and it’s possible that we won’t be able to cover them all. The topics are roughly in most-often-used to least-often-used order, however, so students will be sure to be exposed to the most important and relevant elements of the course.

Likewise, the four-hour format of the course means that we won’t be able to go into great depth on many topics. The main goal of the course, then, is to give students enough orientation and hands-on experience with Boost.Python that they can continue to learn on their own. Inter-language integration - especially between languages as dissimilar as C++ and Python - can be quite complex, but this tutorial will give students the grounding they need to successfully apply Boost.Python to their problems.

Capture thumb
Rating: Everyone
Viewed 18 times
Recorded at:
Date Posted: November 5, 2015

Capture thumb
Rating: Everyone
Viewed 56 times
Recorded at:
Date Posted: November 5, 2015

Scientific computing is integrated into the undergraduate meteorology curriculum at Millersville University. The curriculum guidelines published by the American Meteorological Society specifically address scientific computing in undergraduate atmospheric sciences curricula, stating that students should gain “experience using a high-level structured programming language (e.g., C, C++, Python, MATLAB, IDL, or Fortran).” This is addressed at Millersville through two programming-specific courses: ESCI 282 – Fortran Programming for the Earth Sciences and ESCI 386 – Scientific Programming, Analysis and Visualization with Python. There are also additional courses in which the students are required to use scientific computing as part of their assignments. Examples of these courses are ESCI 445 – Numerical Modeling of the Atmosphere and Oceans, and ESCI 390 – Remote Sensing.

Although the university’s computer science department teaches programming courses in Java, this is not very applicable to our students’ needs for scientific programming. We therefore teach our own programming courses in-house. The required programming course taken by all students is the Fortran course, which teaches them the elements of programming. This is many students’ first exposure to programming. Most then follow on by taking the Python course. In the Python course, scientific data analysis and visualization are stressed, using the Scientific Python, Numerical Python, and Matplotlib libraries.

In the elective numerical modeling course, students are required to write programs for finite-difference solutions to various 1-D and 2-D partial differential equations relevant to modeling the fluid dynamics of the atmosphere. They may program in any language of their choosing, but the majority of students choose Python, even if they have no prior experience with it. This is because of Python’s intuitive syntax and ease of use. In the elective remote sensing course students are introduced to and use IDL/ENVI for display and analysis of remote sensing imagery.

Prior to 2012 the current Python course was instead taught as a course in IDL. The transition was made for several reasons. The primary reason was the limited market and usage of IDL compared to more pervasive languages such as MATLAB and Python. Many students would not have access to IDL once they graduate. Also, Python is gaining traction in the atmospheric and oceanic sciences, and is not proprietary like IDL and MATLAB, so students will have access to it no matter where they find employment or graduate school opportunities. The high cost of maintaining an institutional IDL license is also an issue that the university must address annually, and in times of lean budgets it becomes an attractive target for elimination. The IDL course is still on the books, but there are no immediate plans for upcoming offerings.

Capture thumb
Rating: Everyone
Viewed 37 times
Recorded at:
Date Posted: November 5, 2015

In this tutorial we will introduce attendees to SymPy. We will show basics of constructing and manipulating mathematical expressions in SymPy, the most common issues and differences from other computer algebra systems, and how to deal with them. In the last part of this tutorial we will show how to solve some practical problems with SymPy. This will include showing how to interface SymPy with popular numeric libraries like NumPy.

This knowledge should be enough for attendees to start using SymPy for solving mathematical problems and hacking SymPy's internals (though hacking core modules may require additional expertise).
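
As a flavor of the hands-on material, the sketch below (with purely illustrative expressions) shows the kind of workflow the tutorial covers: build a symbolic expression, manipulate it, and hand it to NumPy via lambdify.

```python
import numpy as np
import sympy as sp

# Build a symbolic expression, manipulate it, then evaluate it numerically with NumPy.
x = sp.symbols('x')
expr = sp.sin(x) * sp.exp(-x**2)

derivative = sp.diff(expr, x)            # symbolic differentiation
expansion = sp.series(expr, x, 0, 6)     # Taylor expansion about x = 0

f = sp.lambdify(x, derivative, 'numpy')  # interface SymPy with NumPy
values = f(np.linspace(0.0, 2.0, 5))     # fast evaluation on arrays
```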

Capture thumb
Rating: Everyone
Viewed 23 times
Recorded at:
Date Posted: November 5, 2015

SymPy is a pure Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.

Capture thumb
Rating: Everyone
Viewed 21 times
Recorded at:
Date Posted: November 5, 2015

With support from the DARPA XDATA Initiative, and contributions from community members, the Bokeh visualization library (http://bokeh.pydata.org) has grown into a large, successful open source project with heavy interest and following on GitHub (https://github.com/ContinuumIO/bokeh). The principal goals of Bokeh are to provide capability to developers and domain experts:

easily create novel and powerful visualizations
that extract insight from remote, possibly large data sets
that can be published to the web for others to explore and interact with
This talk will describe how the architecture of Bokeh enables these goals, and demonstrate how it can be leveraged by anyone using python for analysis to visualize and present their work. We will talk about current development and future plans, including a brief discussion of Joseph Cottam's exciting academic work on abstract rendering for large data sets that is going into Bokeh (https://github.com/JosephCottam/AbstractRendering).
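
A minimal sketch of the kind of workflow Bokeh enables (the data and the output file name are invented for the example, not taken from the talk):

```python
import numpy as np
from bokeh.plotting import figure, output_file, show

# Build an interactive figure and publish it as a standalone HTML page.
x = np.linspace(0, 4 * np.pi, 200)

p = figure(title='Interactive sine curve', tools='pan,wheel_zoom,box_zoom,reset')
p.line(x, np.sin(x), line_width=2)
p.circle(x[::10], np.sin(x[::10]), size=6)

output_file('sine.html')   # hypothetical file name; the page can be shared in any browser
show(p)
```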

Capture thumb
Rating: Everyone
Viewed 17 times
Recorded at:
Date Posted: October 28, 2015

Astropy continues to see significant growth in available software, developers, and users. A new major release (V0.3) was made in the past year as well as many minor releases. V0.4 is scheduled for release by the time of conference and will include support for the VO SAMP protocol. We will report on the progress made in building and enhancing the core libraries in the Astropy Project, including a new model and fitting framework, enhanced units, quantities, and table functionality, a VO cone search tool, and a new convolution subpackage. We'll review the current set of tools available, highlighting in particular the new capabilities present. We will also give an overview of current activities and development plans for the core and affiliated packages, as well as adding new resources/tutorials for learning how to use astropy.
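
As a small taste of the units, quantities, and table functionality mentioned above, here is a sketch using present-day astropy (the values are purely illustrative):

```python
from astropy import units as u
from astropy.table import QTable

# Quantities carry their units through arithmetic and conversions.
distance = 42.0 * u.km
elapsed = 30.0 * u.min
speed = (distance / elapsed).to(u.m / u.s)

# QTable columns can be unit-aware Quantities as well.
t = QTable({'name': ['a', 'b'], 'flux': [1.2, 3.4] * u.Jy})
t['flux_mJy'] = t['flux'].to(u.mJy)
```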

Capture thumb
Rating: Everyone
Viewed 20 times
Recorded at:
Date Posted: October 28, 2015

Representing data through colours is a very common approach to conveying important information to an audience. This is done throughout all fields in the scientific community and stakes a claim in the commercial and marketing realm as well. Colour maps and contour maps are the preferred way for scientists to visualise three-dimensional data in two dimensions. Research has shown that the choice of colourmap is crucial since the human brain interpolates hue poorly. We suggest some best practices scientists should consider when deciding how they should present their results. Specifically, we look at some examples of colourmaps that can easily be misinterpreted, making reference to an in-depth supportive study, and suggest alternative approaches to improve them. We conclude by listing some open source tools that aid making good colourmap choices. Kristen Thyng's talk on perception of colourmaps in matplotlib is an excellent follow-on from this.
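
To illustrate the kind of comparison the talk makes, the short matplotlib sketch below renders the same synthetic field with the rainbow 'jet' map and with a perceptually more uniform alternative ('cubehelix'); the data are invented for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x, y = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
z = np.exp(-(x**2 + y**2)) * np.cos(4 * x)   # synthetic field

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, cmap in zip(axes, ['jet', 'cubehelix']):
    im = ax.imshow(z, cmap=cmap, origin='lower', extent=[-3, 3, -3, 3])
    ax.set_title(cmap)
    fig.colorbar(im, ax=ax)
plt.show()
```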

Capture thumb
Rating: Everyone
Viewed 20 times
Recorded at:
Date Posted: November 5, 2015

Julia is a new, up-and-coming language that has many similarities to Python, but also some notable differences. One of its main advantages is the speed gained by automatically compiling all code (in a somewhat similar way to PyPy, Cython, numba, etc.), while still offering an interactive interface very similar to Python's.

This will be a tutorial on the basic features of Julia from scratch, given by a user (rather than a developer) of the language, emphasising those features which are similar to Python (and hence do not require much explanation) and those features which are rather different.

The idea of the tutorial is to give an idea of why there is suddenly such a buzz around Julia and why it can be useful for certain projects.

This tutorial is aimed at people who are already familiar with the basic scientific Python packages; it is not aimed at beginners in scientific programming.

Capture thumb
Rating: Everyone
Viewed 27 times
Recorded at:
Date Posted: November 5, 2015

SymPy is a pure Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.

Capture thumb
Rating: Everyone
Viewed 15 times
Recorded at:
Date Posted: October 28, 2015

Ginga is an open-source astronomical image viewer and toolkit written in python and hosted on Github. It uses and inter-operates with several key scientific python packages: numpy, scipy, astropy and matplotlib.

In this talk/poster we describe and illustrate recent enhancements to the package since the introductory talk at SciPy 2013, including:

modular/pluggable interfaces for world coordinate systems, image file I/O and star and image catalogs
support for rendering into matplotlib figures
support for image mosaicing
support for image overlays
customizable user-interface bindings
improved documentation
self contained Mac OS X packages
During the talk we will demonstrate the mosaicing plugin that is being used with several instruments at Subaru Telescope in Hawaii, including the new Hyper Suprime-Cam wide-field camera with 116 separate 4Kx2K CCDs.

The talk/poster may be of interest to anyone developing code in python needing to display scientific image (CCD or CMOS) data and astronomers interested in python-based quick look and analysis tools.

Capture thumb
Rating: Everyone
Viewed 31 times
Recorded at:
Date Posted: November 5, 2015

In the paper we compare object-oriented implementations of an advection algorithm written in Python, C++ and modern FORTRAN. The MPDATA advection algorithm (Multidimensional Positive-Definite Advective Transport Algorithm) used as a core of weather, ocean and climate modelling systems serves as an example.

In the context of scientific programming, employment of object-oriented programming (OOP) techniques may help to improve code readability, and hence its auditability and maintainability. OOP offers, in particular, the possibility to reproduce in the program code the mathematical "blackboard abstractions" used in the literature. We compare how the choice of a particular language influences syntax clarity, code length and the performance: CPU time and memory usage.

The Python implementation of MPDATA is based on NumPy. Its performance is compared with C++/Blitz++ and FORTRAN implementations. A notable performance gain when switching from the standard CPython to PyPy will be exemplified, and the reasons for it will be briefly explained. A discussion of other selected approaches for improving NumPy's relatively poor performance will also be presented.
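
For readers unfamiliar with the vectorised array style being compared, the sketch below shows a single first-order upwind (donor-cell) advection step in NumPy; it is only an illustration of the approach, not the MPDATA scheme itself.

```python
import numpy as np

def upwind_step(psi, courant):
    """One donor-cell step for a constant positive Courant number, periodic boundary."""
    flux = courant * psi                     # flux leaving each cell
    return psi - (flux - np.roll(flux, 1))   # np.roll supplies the upstream neighbour

psi = np.zeros(100)
psi[40:60] = 1.0                             # advect a square pulse
for _ in range(50):
    psi = upwind_step(psi, courant=0.5)
```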

This talk will describe and expand on the key findings presented in http://arxiv.org/abs/1301.1334.

Capture thumb
Rating: Everyone
Viewed 14 times
Recorded at:
Date Posted: November 5, 2015

As the line between developer and researcher becomes ever more blurred, the challenge of sharing your compute environment with students and colleagues becomes ever more complex. Large, private organizations have been grappling with this issue for a while, spawning a great deal of enthusiasm around tools like Docker, Puppet, Vagrant, and Packer. And let’s not forget notable python-based upstarts, Ansible and Salt! These tools can generate immense enthusiasm, followed by the question, “Why are we doing this?”

The problem is that researcher / developers can become overwhelmed by the complexity and variety inherent in devops tools - all the while losing sight of the real reason for using these tools: a philosophy of documenting your research compute environments in a reproducible fashion, with a focus on scripting as much as is reasonable.

At UC Berkeley, members of the D-Lab, the Statistical Compute Facility, Computer Science and Research IT have organized a project to develop the Berkeley Common Environment (BCE). I’ll provide an overview of the challenges we’ve tackled in both educational and research contexts, and the needs served by the above-mentioned devops tools. In the end, I argue that a coherent, easy-to-understand philosophy around scientific compute environments is fundamental - the tools are just a way to make your collaboration architecture a little easier for the people building these environments a few times a year. What we should focus on, though, is end-user experience and research community buy-in.

Capture thumb
Rating: Everyone
Viewed 17 times
Recorded at:
Date Posted: November 4, 2015

Background/Motivation
The Scientific Python community's contributions to greater scientific understanding have been underappreciated by academic institutions. One reason for this is that software engineering is widely misunderstood and not recognized as research work in its own right, as opposed to paper publication and patents. A better understanding of the open source software development process itself will help academic institutions recognize the contributions of open source developers.

Methods
I collect historical data from the development of Scientific Python projects and render these into formats suitable for analysis using SciPy tools. To demonstrate the potential of this work, I will show two ways of analyzing this data scientifically: as a self-excited Hawkes process exhibiting shock behavior, and as information diffusion over a social network.

Results
The purpose of this talk is twofold.

First, to introduce tools and techniques for turning data from open source software production into scientific data suitable for analysis. This talk proposes that there's an opportunity for SciPy to engage in reflexive data science, using its own data to learn more about how it functions and how to operate more efficiently.
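
One plausible way to start such an analysis (not necessarily the author's own pipeline; the pretty-format string is just one choice) is to parse repository history into a pandas DataFrame:

```python
import subprocess
import pandas as pd

# Parse `git log` into a table of commits for downstream burst/network analysis.
log = subprocess.check_output(
    ['git', 'log', '--pretty=format:%H|%an|%aI'], text=True)

commits = pd.DataFrame(
    (line.split('|') for line in log.splitlines()),
    columns=['sha', 'author', 'timestamp'])
commits['timestamp'] = pd.to_datetime(commits['timestamp'], utc=True)

# Weekly commit counts per author: raw material for studying productive bursts.
weekly = commits.set_index('timestamp').groupby('author').resample('W').size()
```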

Second, this talk will present visualizations of the data based on complex systems research and social network analysis. Building on prior work, these results will focus on the role of productive bursts in communications. Drawing on social network analysis and prior work on roles in Usenet communities and open source communities, this talk will provide historical insight into the interaction between SciPy communities.

Capture thumb
Rating: Everyone
Viewed 8 times
Recorded at:
Date Posted: October 28, 2015

The practice of representing geospatial data upon a flat surface is known as cartography, and the topological implications of projecting fundamentally 3D data onto a two-dimensional surface have challenged map-makers since time immemorial. Geospatial visualisation software is often implemented without consideration for the third dimension, and this commonly results in problems around the dateline or at the poles. For small areas these problems are often not apparent and mostly surmountable, but at a global scale, such as when visualising output from GCMs (General circulation models), the underlying representation must be addressed head-on in order to visualise the data "impact free".

Cartopy is a Python package which builds on top of Proj.4 to define coordinate reference systems for the transformation and visualisation of geospatial data. In addition to the fundamental transformations, there is also a matplotlib interface allowing easy generation of maps with the same publication quality expected of matplotlib. Cartopy employs several techniques to handle geospatial data correctly, including true spherical interpolation for raster data, and Shapely geometry interpolate-and-cut transformations for geospatial vector data.
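
A minimal sketch of the matplotlib interface (coastlines plus a single point supplied in geographic coordinates):

```python
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

# Draw a global map in the Robinson projection and plot a lon/lat point on it.
ax = plt.axes(projection=ccrs.Robinson())
ax.set_global()
ax.coastlines()
ax.plot(-0.1, 51.5, 'o', transform=ccrs.PlateCarree())  # point given in lon/lat
plt.show()
```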

This talk will outline some of the capabilities of cartopy, and continue onto its practical application within the realm of scientific presentation of geospatial data.

Capture thumb
Rating: Everyone
Viewed 9 times
Recorded at:
Date Posted: October 29, 2015

GeoPandas is a library built on top of pandas to extend its capabilities to allow spatial calculations. The two main datatypes are GeoSeries and GeoDataFrame, extending pandas Series and DataFrame, respectively. A GeoSeries contains a collection of geometric objects (such as Point, LineString, or Polygon) and implements nearly all Shapely operations. These include unary operations (e.g. centroid), binary operations (e.g. distance, either elementwise to another GeoSeries or to a single geometry), and cumulative operations (e.g. unary_union to combine all items to a single geometry).

A GeoDataFrame object contains a column of geometries (itself a GeoSeries) that has special meaning. GeoDataFrames can be easily created from spatial data in other formats, such as shapefiles. Rows in the GeoDataFrame represent features, and columns represent attributes. Pandas' grouping and aggregation methods are also supported.

GeoPandas objects can optionally be aware of coordinate reference systems (by adding a crs attribute) and transformed between map projections. Basic support for plotting is included with GeoPandas. Other features include geocoding, export to GeoJSON, and retrieving data from a PostGIS spatial database.
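
A short sketch of typical GeoPandas usage (the shapefile name is hypothetical):

```python
import geopandas as gpd

gdf = gpd.read_file('neighborhoods.shp')   # rows are features, columns are attributes
gdf = gdf.to_crs(epsg=3857)                # transform between coordinate reference systems

centroids = gdf.geometry.centroid          # elementwise unary operation
dissolved = gdf.geometry.unary_union       # cumulative operation -> single geometry
gdf['area_km2'] = gdf.geometry.area / 1e6

ax = gdf.plot(column='area_km2')           # basic plotting support
```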

This talk will describe the main features of GeoPandas and show examples of its use.

Capture thumb
Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: November 4, 2015

In 2013, the Gordon and Betty Moore and the Alfred P. Sloan foundations awarded UC Berkeley, U. Washington and NYU a collaborative $38M grant in support of a 5-year initiative to create novel environments for Data Science. This project was driven by the recognition that computing and data analysis have now become the backbone of all scientific research, and yet the teams, collaborations and individuals that make this possible typically encounter significant barriers in today's academic environments.

The SciPy community is one of the poster children of this issue: many of our members live "officially" in traditional, discipline-oriented scientific research, and yet we have committed time and effort to creating an open ecosystem of tools for research. As we all know, this is often done with little support from the standard incentive structures of science, be it publication venues, funding agencies or hiring, tenure and promotion committees.

The launch of this initiative is an important moment, as it signals the recognition of this problem by important and well-respected foundations in science. At UC Berkeley, we took this opportunity to create the new Berkeley Institute for Data Science. In this effort, the open source tools of the SciPy community will play a central role.

In this talk, I will describe the larger context in which this initiative has been created, as well as the scientific scope of our team, our goals, and the opportunities that we will try to provide with this space. We expect that this new institute, together with our partners at UW and NYU, will play an important role in support of the great work of the SciPy ecosystem.

Capture thumb
Rating: Everyone
Viewed 14 times
Recorded at:
Date Posted: November 5, 2015

In this tutorial, attendees will learn how to derive, simulate, and visualize the motion of a multibody dynamic system with Python tools. The tutorial will demonstrate an advanced symbolic and numeric pipeline for a typical multibody simulation problem. These methods and techniques play an important role in the design and understanding of robots, vehicles, spacecraft, manufacturing machines, human motion, etc. At the end, the attendees will have developed code to simulate the uncontrolled and controlled motion of a human or humanoid robot.

We will highlight the derivation of realistic models of motion with the SymPy Mechanics package. We will walk through the derivation of the equations of motion of a multibody system (i.e. the model or the plant), simulating and visualizing the free motion of the system, and finally we will add feedback controllers to control the plants that we derive.
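
As a much smaller example of the symbolic step than the humanoid model used in the tutorial, a planar pendulum's equation of motion can be derived with sympy.physics.mechanics (a sketch for orientation only, not tutorial material):

```python
import sympy as sm
import sympy.physics.mechanics as me

q = me.dynamicsymbols('q')        # generalized coordinate (pendulum angle)
qd = me.dynamicsymbols('q', 1)    # its time derivative
m, g, l = sm.symbols('m g l')

# Lagrangian L = T - V for a point mass on a massless rod of length l
T = m * (l * qd)**2 / 2
V = -m * g * l * sm.cos(q)

lm = me.LagrangesMethod(T - V, [q])
eom = lm.form_lagranges_equations()   # yields m*l**2*q'' + m*g*l*sin(q) = 0
```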

It is best if the attendees have some background with calculus-based college level physics. They should also be familiar with the SciPy Stack, in particular IPython, SymPy, NumPy, and SciPy. Our goal is that attendees will come away with the ability to model basic multibody systems, simulate and visualize the motion, and apply feedback controllers all in a Python framework.

The tutorial materials including an outline can be viewed here:

https://github.com/pydy/pydy-tutorial-pycon-2014

Capture thumb
Rating: Everyone
Viewed 12 times
Recorded at:
Date Posted: October 28, 2015

Capture thumb
Rating: Everyone
Viewed 7 times
Recorded at:
Date Posted: November 4, 2015

Numerical simulations have an incredibly broad range of applicability, from computer aided aircraft design to drug discovery. However, any prediction arising from a computer model must be rigorously tested to ensure its reliability. Verification is a process that ensures that the outputs of a computation accurately reflect the solution of the mathematical models.

This talk will first provide an introduction to the method of manufactured solutions (MMS), which is a powerful approach for verification of model problems in which a solution cannot be determined analytically. However, verifying computational science software using manufactured solutions requires the generation of the solutions with associated forcing terms and their reliable implementation in software. There are several issues that arise in generating solutions, including ensuring that they are meaningful, and the algebraic complexity of the forcing terms. After briefly discussing these issues, the talk will introduce MASA, the Manufactured Analytical Solution Abstraction library. MASA is an open-source library written in C++ (with Python interfaces) which is designed for the verification of software used for solving a large class of problems stemming from numerical methods in applied mathematics, including nonlinear equations, systems of algebraic equations, and ordinary and partial differential equations.

Example formulations in MASA include the Heat Equation, Laplace's Equation, and the Navier-Stokes Equations, but in principle MASA supports instantiating any model that can be written down mathematically. This talk will end with details on two methods to import manufactured solutions into the library, either by generating the source terms, or by using the automatic differentiation capabilities provided in MASA.
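
The symbolic half of the approach is easy to sketch in Python: pick a solution, then let a computer algebra system generate the forcing term. The example below (a Poisson problem chosen purely for illustration) uses SymPy rather than MASA's own generators:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Manufacture a solution u, then compute the forcing term f = -laplacian(u)
u = sp.sin(sp.pi * x) * sp.cos(sp.pi * y)
f = -(sp.diff(u, x, 2) + sp.diff(u, y, 2))

print(sp.simplify(f))   # 2*pi**2*sin(pi*x)*cos(pi*y)
```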

The library is available at: https://github.com/manufactured-solutions/MASA

Capture thumb
Rating: Everyone
Viewed 11 times
Recorded at:
Date Posted: October 28, 2015

Tracking the motion of many particles is an established technique [Crocker, J.C., Grier, D.G.], but many physicists, biologists, and chemical engineers still (make their undergraduates) do it by hand. Trackpy is a flexible, high-performance implementation of these algorithms in Python using the scientific stack -- including pandas, numba, the IPython notebook, and mpld3 -- which scales well to track, filter, and analyze tens of thousands of feature trajectories. It was developed collaboratively by research groups at U. Chicago, U. Penn, Johns Hopkins, and others.
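
A typical workflow looks roughly like the sketch below, using the API names in use around the time of the talk (newer trackpy releases prefer tp.link); the movie file and parameter values are hypothetical:

```python
import pims
import trackpy as tp

frames = pims.open('movie.tif')                  # PIMS treats image formats alike
features = tp.batch(frames, diameter=11)         # locate features frame by frame
tracks = tp.link_df(features, search_range=5, memory=3)   # link into trajectories
tracks = tp.filter_stubs(tracks, threshold=25)   # drop short, spurious trajectories
```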

Researchers with very different requirements for performance and precision collaborate on the same package. Some original "magic" manages high-performance components, including numba, using them if they are available and beneficial; however, the package is still fully functional without these features. Accessibility to new programmers is a high priority.

Biological data and video with significant background variation can confound standard feature identification algorithms, and manual curation is unavoidable. Here, the high-performance group operations in pandas and the cutting-edge notebook ecosystem, in particular the interactive IPython tools and mpld3, enable detailed examination and discrimination.

The infrastructure developed for this project can be applied to other work. Large video data sets can be processed frame by frame, out of core. Image sequences and video are managed through an abstract class that treats all formats alike through a handy, idiomatic interface in a companion project dubbed PIMS.

A suite of over 150 unit tests with automated continuous integration testing has ensured stability and accuracy during the collaborative process. In our experience, this is an unusual but worthwhile level of testing for a niche codebase from an academic lab.

In general, we have lessons to share from developing shared tools for researchers with separate priorities and varied levels of programming skill and interest.

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: October 28, 2015

The goal of Project Panoptes (Panoptic Astronomical Networked OPtical observatory for Transiting Exoplanets Survey, see http://projectpanoptes.org/) is to build low cost, reliable, robotic telescopes which can be used to detect transiting exoplanets. The hardware is designed to be standardized, using as many commercial off the shelf components as possible so that a Panoptes "unit" can be reproduced quickly and easily by students or amateurs. In this way, many units can be deployed at many different sites to provide continuous and redundant sky coverage. Panoptes is designed from the ground up to be a citizen science project which will involve the public in all aspects of the science, from data acquisition to data reduction.

In this presentation, we describe the current status of the Panoptes Observatory Control System (POCS, see https://github.com/panoptes/POCS), an open source, collaborative, python-based software package. POCS is designed to be as simple as possible in order to make it accessible to non-experts. As such, POCS is a state machine which transitions between a few well defined operating states. We make extensive use of existing modules (notably astropy and pyephem). The challenge we face in writing POCS is to balance our desire for simplicity and accessibility against capability.

We will also briefly describe the other software challenges of our project, specifically an algorithm designed to extract accurate photometry from DSLR images (color images obtained using a Bayer color filter array) rather than from the more traditional filtered monochrome CCD image.

Capture thumb
Rating: Everyone
Viewed 7 times
Recorded at:
Date Posted: November 4, 2015

Numerical Lagrangian tracking is a way to follow parcels of fluid as they are advected by a numerical circulation model. This is a natural method to investigate transport in a system and understand the physics on the wide range of length scales that are actually experienced by a drifter. TRACMASS is a tool for Lagrangian trajectory modeling that has been developed over the past two decades. It has been used to better understand physics and its applications to real-world problems in many areas around the world, in both atmospheric and oceanic settings. TRACMASS is written in FORTRAN, which is great for speed but not as great for ease of use. This code has been wrapped in Python to run batches of simulations and improve accessibility --- and dubbed TracPy.

In this talk, I will outline some of the interesting features of the TRACMASS algorithm and several applications, then discuss the layout of the TracPy code. The code setup and organization have been a learning process and I will also share some of my hard-earned lessons.

TracPy is continually in development and is available on GitHub.

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: November 5, 2015

IPython provides an architecture for interactive computing. The IPython Notebook is a web-based interactive computing environment for exploratory and reproducible computing. With the IPython Notebook, users create documents, called notebooks, that contain formatted text, figures, equations, programming code, and code output.

The IPython Notebook generalizes the notion of output to include images, LaTeX, video, HTML, JavaScript, PDF, etc. These output formats are displayed in the Notebook using IPython’s display architecture, embedded in notebook documents and rendered on the IPython Notebook Viewer. By taking advantage of these rich output formats users can build notebooks that include rich representations and visualizations of data and other content. In this tutorial, we will describe the display architecture, existing Python APIs and libraries that already use it (mpld3, vincent, plotly, etc.), and how users can define custom display logic for their own Python objects.

As of version 2.0, the IPython Notebook also includes interactive JavaScript widgets. These widgets provide a way for users to interact with UI controls in the browser that are tied to Python code running in the kernel. We will begin by covering the highest-level API for these widgets, “interact,” which automatically builds a user interface for exploring a Python function. Next we will describe the lower-level widget objects that are included with IPython: sliders, text boxes, buttons, etc. However, the full potential of the widget framework lies with its extensibility. Users can create their own custom widgets using Python, JavaScript, HTML and CSS. We will conclude with a detailed look at custom widget creation.
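
A small sketch of that highest-level API (the plotted function is just an example; in the IPython 2.x era the import path was IPython.html.widgets, later the separate ipywidgets package):

```python
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact   # IPython.html.widgets in IPython 2.x

def plot_wave(frequency=1.0, amplitude=1.0):
    x = np.linspace(0, 2 * np.pi, 200)
    plt.plot(x, amplitude * np.sin(frequency * x))
    plt.show()

# interact inspects the function's arguments and builds matching sliders in the notebook
interact(plot_wave, frequency=(0.5, 5.0), amplitude=(0.1, 2.0))
```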

Capture thumb
Rating: Everyone
Viewed 7 times
Recorded at:
Date Posted: November 4, 2015

Circuits, i.e., a network of interconnected components with ports, have found application in various scientific and engineering domains, ranging from applications close to the physical implementation, such as electrical circuits, photonic circuits for optical information processing, superconducting quantum circuits for quantum information applications to more abstract circuit representations of dynamical systems, biological processes or even software algorithms.

This has already led to the development of quite general domain-independent circuit modeling toolkits such as Modelica, but to date, there exist very few open source graphical general circuit editing environments that can be tightly integrated with custom, domain-specific implementation simulation or analysis backends as well as IPython.

Here we present our first attempt at creating such a tool as well as some applications from our own research on nano-photonic quantum circuit models. Our existing QNET software package allows these circuits to be modeled in a purely symbolic fashion and interfaces with various codes for numerical simulation.

We demonstrate that the extension of our package with a visual circuit editor leads to a rich integrated simulation and analysis workflow in which an engineer or researcher can receive very fast feedback when making changes to his model.

As a consequence, it is much easier to build intuition for the particular kinds of circuit models and find novel and creative solutions to an engineering task.

Finally, given the broad range of applications for circuit models and representations, we outline how our visual circuit editor can be adapted to export a circuit for interfacing with other domain specific software such as Modelica.

Capture thumb
Rating: Everyone
Viewed 2 times
Recorded at:
Date Posted: October 28, 2015

We present two new visualizations, case tree plots and checkerboard plots, for visualizing emerging zoonoses.

Zoonoses represent an estimated 58% of all human infectious diseases, and 73% of emerging infectious diseases. Recent examples of zoonotic outbreaks include H1N1, SARS and Middle East Respiratory Syndrome, which have caused thousands of deaths combined. The current toolkit for visualizing data from these emerging diseases is limited.

Case tree and checkerboard plots were developed to address that gap. The visualizations are best suited for diseases like SARS for which there are a limited number of cases, with data available on human-to-human transmission. They (a) allow for easy estimation of epidemiological parameters like the basic reproduction number, (b) indicate the frequency of introductory events, e.g. spillovers in the case of zoonoses, and (c) represent patterns of case attributes like patient sex, both by generation and over time.

Case tree plots depict the emergence and growth of clusters of disease over time. Each case is represented by a colored node. Nodes that share an epidemiological link are connected by an edge. The color of the node varies based on the node attribute; it could represent patient sex, health status (e.g. alive, dead), or any other categorical attribute. Node placement along the x-axis corresponds with the date of illness onset for the case.

A second visualization, the checkerboard plot, was developed to complement case tree plots. They can be used in conjunction with case tree plots, or in situations where representing a hypothetical network structure is inappropriate.

The plots are available in the open source package epipy, which is available on github. Detailed documentation and examples are also available. In addition to these visualizations, epipy includes functions for common epidemiology calculations like odds ratio and relative risk.

Capture thumb
Rating: Everyone
Viewed 8 times
Recorded at:
Date Posted: November 4, 2015

A geophysics paper by Spitz has a long paragraph that describes a model, an algorithm, and the results of applying the algorithm to the model. I wanted to implement and test the algorithm to ensure I fully understood the method. This is a good illustration of Python for geophysics because the implementation requires:

Fourier transforms provided by numpy.fft
Setting up linear equations using numpy.array and numpy.matrix
Solving the linear equations using scipy.linalg.solve
Applying convolutional filters using scipy.signal.lfilter
A flat-event model is created using array slicing in numpy and is bandlimited in the frequency domain. Another component of the model is created by convolving a short derivative filter with a similar flat-event model. After Fourier transform, linear equations are set up to compute a prediction filter in the FX domain. These equations are created using data slicing, conjugate transpose, and matrix multiplication (all available in numpy). scipy.linalg.solve is used to solve for the prediction error filter. A final filter is computed using the recursive filter capability in scipy.signal.lfilter. Results are displayed using matplotlib.
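
The toy sketch below exercises the same building blocks on invented data (it is not the Spitz algorithm): a two-point prediction filter is fit by least squares and applied with scipy.signal.lfilter.

```python
import numpy as np
from scipy.linalg import solve
from scipy.signal import lfilter

trace = np.sin(0.2 * np.arange(200))             # synthetic input trace
spectrum = np.fft.rfft(trace)                    # numpy.fft gives the frequency-domain view

# Normal equations A^T A x = A^T d built from lagged copies of the trace
d = trace[2:]
A = np.column_stack([trace[1:-1], trace[:-2]])
coeff = solve(np.dot(A.T, A), np.dot(A.T, d))    # scipy.linalg.solve

predicted = lfilter(coeff, [1.0], trace)         # scipy.signal.lfilter applies the filter
```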

This is quite a tour of scipy and numpy to implement an algorithm described in a single paragraph. Many operations commonly used in geophysics are illustrated in the program. The resulting program is less than 200 lines of code. I will describe the algorithm and share the prototype code.

References:

Spitz, S. (1999). Pattern recognition, spatial predictability, and subtraction of multiple events. The Leading Edge, 18(1), 55-58. doi: 10.1190/1.1438154

Capture thumb
Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: November 4, 2015

3D reflection seismic data acquired offshore of southeast Japan as part of the Nankai Trough Seismogenic Zone Experiment (NanTroSEIZE) provides a unique opportunity to study active accretionary prism processes. The 3D seismic volume revealed complex interactions between active sedimentation and tectonics within multiple slope basins above the accretionary prism. However, our ability to understand these interactions was hindered without access to expensive specialized software packages.

We implemented stratal slicing of the 3D volume and co-rendering of multiple attributes in python to better visualize our results. Stratal slicing allows volumetric attributes to be displayed in map view along an arbitrary geologic timeline (~30MB animated gif) by interpolating between interpreted geologic surfaces. This enhances the visibility of subtle changes in stratigraphic architecture through time. Co-rendering coherence on top of seismic amplitudes facilitates fault interpretation in both cross section and map view. This technique allowed us to confidently interpret faults near the limit of seismic resolution.

The scientific python ecosystem proved to be an effective platform both for making publication-quality cross sections and for rapidly implementing state-of-the-art seismic visualization techniques. We created publication quality cross sections (some annotations added in Inkscape) and interactive 2D visualizations in matplotlib. For 3D display of seismic volumes we used mayavi to easily create interactive scenes. scipy.ndimage provided most of the underlying image processing capability and allowed us to perform memory-efficient operations on >10GB arrays.

Capture thumb
Rating: Everyone
Viewed 9 times
Recorded at:
Date Posted: November 5, 2015

Starting out with scientific computing in Python can be daunting: Where do I start? What are the basic packages, and what is the use case for each of them? What are the fundamental ideas I need to understand each package and how it works?

In this tutorial, we will use examples of scientific questions and calculations which lead directly to the need for certain computational tools as a gateway to understand the basic structure of the scientific computing ecosystem. The specific packages we will touch on are numpy, matplotlib, scipy, sympy and pandas, all viewed through the wonderful lens of the IPython Notebook.

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: October 29, 2015

Computational Biology and Experimental Biology are two specialities that would deeply benefit from more interaction - computationalists need access to data, biologists in wetlabs need computational tools. The KBase Narrative is a computerized laboratory notebook that puts the power of the KBase predictive biology platform into the hands of experimentalists and students. KBase provides cluster computation, analysis and modeling pipelines, large public datasets and a "pluggable" architecture for future services. The Narrative is an interface enabling the sharing of data, approaches and workflows on KBase. It also serves as a teaching tool and publishing platform, allowing other scientists and students to observe and reproduce the processes that led to the published result.

The KBase Narrative is based on the IPython Notebook, extended in the following ways:

Notebooks are stored in a remote object store that enables versioning, provenance and sharing
Support for multiple users has been added, based on OAuth authentication against a "cloud" authentication service (Globus Online)
A framework for dynamically building form inputs for services using Python introspection and the IPython Traitlets package (a version of Traits) and displaying the output in JS visualization widgets
A Docker-based provisioning system that builds and tears down sandboxed IPython Notebook servers on demand, providing a scalable, reasonably safe and easy-to-use environment for running hosted IPython notebooks with much smaller overhead than VMs
A heavily modified user interface that has been designed to support computational biology workflows
The current KBase Narrative was developed over the span of roughly 6 months by a small team of developers and user interface experts - the short time scale was possible due to the huge amount of functionality already provided by the IPython Notebook, and taking advantage of the productivity and power of the Python language.

Capture thumb
Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: November 5, 2015

Python is currently one of the most popular programming languages, and it seems that Scientific Python has truly hit its stride in recent years. With fame comes a deluge of users, but not necessarily any more developers. Scientific Python is often held up as one of the core strengths of the Python language. Why is this so? And how much does it actually help us? This talk intends to be a frank discussion on the great parts of the SciPy community and the parts that need work.

As a confederation of packages and projects, there are several issues that affect everyone. Sometimes these issues fall through the cracks and other times they are vigorously tackled head on. In either case, I posit that greater communication about these global topics is necessary to support and scale to the next wave of SciPy users and developers.

Points of discussion in this talk may include:

Packaging,
Education,
Matplotlib - aged or awesome,
Competition from other languages,
Diversity,
Employing our own,
Interfacing with the broader Python community,
The legal status of projects, and
Maintaining critical packages in the ecosystem (when devs have moved on).
Historically, the SciPy conference has not had many overview talks, talks about the community itself, what we are doing right, and what we are doing wrong.

They were often relegated to keynotes if they were present at all. This talk is a boots-on-the-ground attempt to rectify that.

Capture thumb

Developing scientific software is a continuous balance between not reinventing the wheel and getting fragile codes to interoperate with one another. Binary software distributions such as Anaconda provide a robust starting point for many scientific software packages, but this solution alone is insufficient for many scientific software developers. HashDist provides a critical component of the development workflow, enabling highly customizable, source-driven, and reproducible builds for scientific software stacks, available from both the IPython Notebook and the command line.

To address these issues, the Coastal and Hydraulics Laboratory at the US Army Engineer Research and Development Center has funded the development of HashDist in collaboration with Simula Research Laboratories and the University of Texas at Austin. HashDist is motivated by a functional approach to package build management, and features intelligent caching of sources and builds, parametrized build specifications, and the ability to interoperate with system compilers and packages. HashDist enables the easy specification of "software stacks", which allow both the novice user to install a default environment and the advanced user to configure every aspect of their build in a modular fashion. As an advanced feature, HashDist builds can be made relocatable, allowing the easy redistribution of binaries on all three major operating systems as well as cloud, and supercomputing platforms. As a final benefit, all HashDist builds are reproducible, with a build hash specifying exactly how each component of the software stack was installed.

This talk will feature an introduction to the problem of packaging Python-based scientific software, a discussion of the basic tools available to scientific Python developers, and a detailed discussion and demonstration of the HashDist package build manager.

The HashDist documentation is available from: http://hashdist.readthedocs.org/en/latest/ HashDist is currently hosted at: https://github.com/hashdist/hashdist

Capture thumb
Rating: Everyone
Viewed 14 times
Recorded at:
Date Posted: November 5, 2015

Traditional university teaching is based on the use of lectures in class and textbooks out of class. The medium (lecture or book) discourages the natural curiosity that can lead to deeper understanding by investigating the result of changing a parameter, or looking at the full results of a time-dependent simulation.

The IPython notebook provides a single medium in which mathematics, explanations, executable code, and animated or interactive visualization can be combined. Notebooks that combine all of these components can enable new modes of student-led inquiry: the student can experiment with modifications to the code and see the results, all without stepping away from the mathematical explanations themselves. When notebooks are used by students in the classroom, students can quickly share and discuss results with the instructor or other class members. The instructor can facilitate deeper learning by posing questions that students may answer through writing appropriate code, during class time.

For the past four years, I have taught a graduate numerical analysis course using SAGE worksheets and IPython notebooks. I will show examples of the notebooks I've developed and successfully used in this course. I will describe some practical aspects of my experience, such as:

Tradeoffs between using IPython and SAGE
Experiences with use of cloud computing platforms
Dealing with students' installation issues
Quickly getting students up to speed with the Python language and packages
Testing and evaluating homework in a math course that is programming-intensive

Capture thumb
Rating: Everyone
Viewed 7 times
Recorded at:
Date Posted: November 4, 2015

As the field of climate modeling continues to mature, we must anticipate the practical implications of the climatic shifts predicted by these models. In this talk, I'll show how we apply the results of climate change models to predict shifts in agricultural zones across the western US. I will outline the use of the Geospatial Data Abstraction Library (GDAL) and Scikit-Learn (sklearn) to perform supervised classification, training the model using current climatic conditions and predicting the zones as spatially-explicit raster surfaces across a range of future climate scenarios. Finally, I'll present a python module (pyimpute) which provides an API to optimize and streamline the process of spatial classification and regression problems.
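
As a rough, hypothetical illustration of the GDAL-plus-scikit-learn pattern described above (file names and variables are invented, and this is not the pyimpute API itself):

```python
import numpy as np
from osgeo import gdal
from sklearn.ensemble import RandomForestClassifier

def read_band(path):
    """Read a single-band raster into a float array."""
    return gdal.Open(path).ReadAsArray().astype('float64')

# Stack current-climate explanatory rasters and flatten them into a sample matrix.
current = np.dstack([read_band('tmin.tif'), read_band('precip.tif')])
zones = read_band('zones.tif').astype(int)       # known agro-ecological zones

X = current.reshape(-1, current.shape[-1])
y = zones.ravel()
clf = RandomForestClassifier(n_estimators=100).fit(X, y)

# Predict zones under a future climate scenario, back onto the raster grid.
future = np.dstack([read_band('tmin_2080.tif'), read_band('precip_2080.tif')])
pred = clf.predict(future.reshape(-1, future.shape[-1])).reshape(zones.shape)
```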

Outline

This talk will consist of four parts:

A brief overview of climate data and the concept of agro-ecological zones
The theory and intuition behind bioclimatic envelope modeling using supervised classification
Visualization and interpretation of our results
Detailed demonstration of the pyimpute/GDAL/sklearn workflow

Loading spatial data into numpy arrays
Random stratified sampling
Training, assessing and selecting the sklearn classifier
Prediction of zones given future climate data as explanatory variables
Quantifying and interpreting uncertainty
Writing results to spatial data formats
Discussion of performance and memory limitations
Visualizing and interacting with the results

Capture thumb
Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: November 5, 2015

In this tutorial, attendees will learn how to derive, simulate, and visualize the motion of a multibody dynamic system with Python tools. The tutorial will demonstrate an advanced symbolic and numeric pipeline for a typical multibody simulation problem. These methods and techniques play an important role in the design and understanding of robots, vehicles, spacecraft, manufacturing machines, human motion, etc. At the end, the attendees will have developed code to simulate the uncontrolled and controlled motion of a human or humanoid robot.

We will highlight the derivation of realistic models of motion with the SymPy Mechanics package. We will walk through the derivation of the equations of motion of a multibody system (i.e. the model or the plant), simulating and visualizing the free motion of the system, and finally we will add feedback controllers to control the plants that we derive.

It is best if the attendees have some background with calculus-based college level physics. They should also be familiar with the SciPy Stack, in particular IPython, SymPy, NumPy, and SciPy. Our goal is that attendees will come away with the ability to model basic multibody systems, simulate and visualize the motion, and apply feedback controllers all in a Python framework.

The tutorial materials including an outline can be viewed here:

https://github.com/pydy/pydy-tutorial-pycon-2014

Capture thumb
Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: November 5, 2015

SociaLite is a Python-integrated query language for distributed data analysis.
It makes scientific data analysis simple, yet achieves fast performance with its compiler optimizations. The performance of SociaLite is often more than three orders of magnitude faster than Hadoop programs, and close to optimized C programs. For example, the PageRank algorithm can be implemented in just two lines of SociaLite query, which run nearly as fast as optimized, parallelized C code.

SociaLite supports well-known high-level concepts to make data analysis easy for non-expert programmers. We support relational tables for storing data, and relational operations, such as join, selection, and projection, for processing the data. Moreover, SociaLite queries are fully integrated with Python, so both SociaLite and Python code can be used to implement data analysis logic. For the integration with Python, we support embedding and extending SociaLite, where embedding supports using SociaLite queries directly in Python code, and extending supports using Python functions in SociaLite queries.

The Python integration makes it easy to implement various analysis algorithms in SociaLite and Python. For example, the BLAST algorithm in bioinformatics can be implemented in just a few lines of SociaLite queries and Python code. A genome assembly algorithm -- generating a De Bruijn graph and applying an Eulerian cycle algorithm -- can also be implemented simply. In the talk, I will demonstrate these algorithms in SociaLite as well as more general algorithms such as K-means clustering and logistic regression.

The SociaLite queries are compiled to highly optimized parallel/distributed code; we apply optimizations such as pipelined evaluation and prioritization. The runtime system also speeds up the performance; for example, the customized memory allocator reduces memory allocation time and footprint. In short, SociaLite makes high-performance data analysis easy with its high-level abstractions and compiler/runtime optimizations.

Capture thumb
Rating: Everyone
Viewed 7 times
Recorded at:
Date Posted: November 5, 2015

Tools and libraries for working with geospatial data in Python are currently undergoing rapid development and expansion. Libraries such as shapely, fiona, rasterio, geopandas, and others now provide Pythonic ways of reading, writing, editing, and manipulating geographic data. In this tutorial, participants will be exposed to a number of new and legacy geospatial libraries in Python, with a focus on simple and rapid interaction with geospatial data.

We will utilize Python to interact with geographic data from a database to a web interface, all the while showcasing how Python can be used to access data from online resources, query spatially enabled databases, perform coordinate transformations and geoprocessing functions, and export geospatial data to web-enabled formats for visualizing and sharing with others. Time permitting, we will also briefly explore Python plugin development for the QGIS Desktop GIS environment.
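
A small sketch of the lower-level libraries working together, using current versions of Fiona, Shapely, and pyproj (the shapefile name is hypothetical):

```python
import fiona
import pyproj
from shapely.geometry import shape
from shapely.ops import transform

# Reproject vector features from geographic coordinates to Web Mercator.
project = pyproj.Transformer.from_crs(
    'EPSG:4326', 'EPSG:3857', always_xy=True).transform

with fiona.open('countries.shp') as src:
    geoms = [shape(rec['geometry']) for rec in src]   # Fiona records -> Shapely geometries

web_geoms = [transform(project, g) for g in geoms]    # ready for web-friendly output
```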

This tutorial should be accessible to anyone who has basic Python knowledge (though familiarity with Pandas, NumPy, matplotlib, etc. will be helpful) as well as familiarity with IPython Notebook. We will take some time at the start of the tutorial to go over installation strategies for geospatial libraries (GDAL/OGR, Proj.4, GEOS) and their Python bindings (Shapely, Fiona, GeoPandas) on Windows, Mac, and Linux. Some knowledge of geospatial concepts such as map projections and GIS data formats will also be helpful.

Outline
Introduction to geospatial data
Map projections, data formats, and looking at maps
Introduction to geospatial libraries
GDAL/OGR (Fiona); Shapely (GEOS); PostGIS; GeoPandas; and more
GeoPandas
Reading data from various sources
Data manipulation and plotting
Writing data to various sources
Getting data from the web
Pushing data to the web (for maps)
Putting it all together
Quick example: From database to web
Introduction to QGIS Desktop GIS (time permitting)
Python interface (PyQGIS)
Building a simple plugin
Plugin deployment

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: November 5, 2015

Computational tools and skills are as critical to the training of physics majors as calculus and math, yet they receive much less emphasis in the undergraduate curriculum. One-off courses that introduce programming and basic numerical problem-solving techniques with commercial software packages for topics that appear in the traditional physics curriculum are insufficient to prepare students for the computing demands of modern technical careers. Yet tight budgets and rigid degree requirements constrain the ability to expand computational course offerings for physics majors.

This talk will present an overview of a recently revamped course at Cal Poly San Luis Obispo that uses Python and associated scientific computing libraries to introduce the fundamentals of open-source tools, version control systems, programming, numerical problem solving and algorithmic thinking to undergraduate physics majors. The spirit of the course is similar to the bootcamps organized by Software Carpentry for researchers in science but is offered as a ten-week for-credit course. In addition to having a traditional in-class component, students learn the basics of Python by completing tutorials on Codecademy's Python track and practice their algorithmic thinking by tackling Project Euler problems. This approach of incorporating online training may provide a different way of thinking about the role of MOOCs in higher education. The early part of the course focuses on skill-building, while the second half is devoted to application of these skills to an independent research-level computational physics project. Examples of recent projects and their results will be presented.

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: November 5, 2015

HDF5 is a hierarchical, binary database format that has become the de facto standard for scientific computing. While the specification may be used in a relatively simple way (persistence of static arrays) it also supports several high-level features that prove invaluable. These include chunking, ragged data, extensible data, parallel I/O, compression, complex selection, and in-core calculations. Moreover, HDF5 bindings exist for almost every language - including two Python libraries (PyTables and h5py). This tutorial will cover HDF5 itself through the lens of PyTables.

This tutorial will discuss tools, strategies, and hacks for really squeezing every ounce of performance out of HDF5 in new or existing projects. It will also go over fundamental limitations in the specification and provide creative and subtle strategies for getting around them. Overall, this tutorial will show how HDF5 plays nicely with all parts of an application making the code and data both faster and smaller. With such powerful features at the developer's disposal, what is not to love?!
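
A compact sketch of several of these features with PyTables (the file, table, and column names are invented for the example): chunked, compressed storage plus an in-kernel query.

```python
import numpy as np
import tables

class Particle(tables.IsDescription):
    pid = tables.Int64Col()
    energy = tables.Float64Col()

with tables.open_file('demo.h5', mode='w') as h5:
    filters = tables.Filters(complevel=5, complib='blosc')   # transparent compression
    table = h5.create_table('/', 'particles', Particle, filters=filters)

    row = table.row
    for i in range(1000):
        row['pid'] = i
        row['energy'] = np.random.random()
        row.append()
    table.flush()

    # Out-of-core selection: the condition is evaluated chunk by chunk in the kernel
    hot = [r['pid'] for r in table.where('energy > 0.9')]
```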

Knowledge of Python, NumPy, C or C++, and basic HDF5 is recommended but not required.

Outline
Meaning in layout (20 min)

Tips for choosing your hierarchy
Advanced datatypes (20 min)

Tables
Nested types
Tricks with malloc() and byte-counting
Exercise on above topics (20 min)

Chunking (20 min)

How it works
How to properly select your chunksize
Queries and Selections (20 min)

In-core vs Out-of-core calculations
PyTables.where()
Datasets vs Dataspaces
Exercise on above topics (20 min)

The Starving CPU Problem (1 hr)

Why you should always use compression
Compression algorithms available
Choosing the correct one
Exercise
Integration with other databases (1 hr)

Migrating to/from SQL
HDF5 in other databases (JSON example)
Other Databases in HDF5 (JSON example)
Exercise

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: October 29, 2015

CodaLab is a web-based open source platform that allows researchers to share and browse code and data, and to create experiments in a truly reproducible manner. CodaLab focuses on accomplishing the following:

Serve as a data repository for data sets including large scale data sets that could only be hosted in a cloud computing environment
Serve as an algorithm repository that researchers can use in their experimentation, to teach and learn from others
Host the execution of experiments as worksheets, sometimes referred to as "executable papers", which are annotated scientific documents that combine textual process descriptions with live data sets and functioning code.
Enable the creation of benchmarks
CodaLab is a community-driven effort led by Percy Liang from Stanford University, who built the precursor of CodaLab, namely MLComp. From a development viewpoint, CodaLab supports both the Linux and Windows communities, with code in GitHub and Python as one of the main languages used to support the scientific community.

At SciPy, we invite the community to participate in CodaLab by creating experiments as executable papers and by sharing them with the rest of the community at http://codalab.org. These worksheets or "executable papers" can then be freely reproduced, appended to, and otherwise modified to improve productivity and accelerate the pace of discovery and learning among data-driven scientific professionals.

Capture thumb
Rating: Everyone
Viewed 11 times
Recorded at:
Date Posted: November 5, 2015

This tutorial is targeted to those who are or soon will be teaching numerical methods or scientific computing and are interested in using Python as the programming language for their course. The tutorial will be useful both to academics teaching university courses and those in industry who run training sessions. No prior knowledge of the IPython notebook is necessary, but participants should have some familiarity with Python, Numpy, and Matplotlib.

IPython notebooks are an excellent medium for teaching numerical methods since they can include both mathematical explanations and executable code in a single document. The tutorial will begin with an introduction to the IPython notebook, emphasizing how to overcome aspects that can be confusing to students. Next we will go over available free resources for

ensuring that students have a suitable computing environment, using either a cloud platform or a packaged distribution
distributing and collecting notebooks
converting notebooks to other formats that may be useful in a course
We will also review a number of excellent existing resources containing IPython notebooks for numerical methods courses. Using these notebooks as examples, we will discuss how to design effective notebooks for teaching, including

typesetting mathematical equations and expressions using LaTeX
Formatting, referencing, and layout using Markdown
inserting complete or partial code snippets
embedding figures and other media
embedding interactive widgets
We will briefly discuss different approaches to using IPython notebooks in a course, including their use as the basis for

homework assignments
short activities during a class session
longer laboratory sessions
Finally, participants will be asked to develop, individually or in small groups, a notebook of their own that could be used as an assignment, classroom exercise, or lecture.
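As a small, hedged illustration of the "interactive widgets" item above, the following sketch uses the IPython 2.x-era widget API (the interact function later moved to the separate ipywidgets package); the plotting function and slider range are invented for demonstration:

import numpy as np
import matplotlib.pyplot as plt
from IPython.html.widgets import interact   # later versions: from ipywidgets import interact

def plot_sine(frequency=1.0):
    # redraw a sine curve each time the slider moves
    x = np.linspace(0, 2 * np.pi, 200)
    plt.plot(x, np.sin(frequency * x))
    plt.title("sin({} x)".format(frequency))

interact(plot_sine, frequency=(0.5, 5.0, 0.5))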

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: October 29, 2015

In preparation for the TOROS project, which is designed to survey the southern-hemisphere sky in search of transients, we are developing a Python-based pipeline for image analysis and processing.

The code makes extensive use of the open source image processing library OpenCV to align the images astrometrically, and uses astronomy-specific routines such as the Astropy package to deal with FITS files. The design will involve integration with SciDB, Astrometry.net, parallelization, and other pythonic astronomical tools.

This automated optical transient discovery tool will be tested with real (CSTAR and TORITOS telescopes) and simulated data samples as input for a machine learning light-curve classification tool based on the AstroML and scikit-learn libraries.

The project is version controlled using git, and we will handle future collaboration among scientists from different countries using the open source project manager Trac. It will be available as an open source project in popular web repositories such as GitHub or Bitbucket.

Capture thumb
Rating: Everyone
Viewed 130 times
Recorded at:
Date Posted: October 28, 2015

Rasterio is a GDAL and Numpy-based Python library guided by lessons learned over a decade of using GDAL and Python to solve geospatial problems. Among these lessons: the importance of productivity, enjoyability, and serendipity.

I will discuss the motivation for writing Rasterio and explain how and why it diverges from other GIS software and embraces Python types, protocols, and idioms. I will also explain why Rasterio adheres to some GIS paradigms and bends or breaks others.

Finally, I will show examples of using Rasterio to read, manipulate, and write georeferenced raster data. Some examples will be familiar to users of older Python GIS software and will illustrate how Rasterio lets you get more done with less code and fewer bugs. I will also demonstrate fun and useful features of Rasterio not found in other geospatial libraries.
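As a hedged sketch of the read-manipulate-write workflow described above (file names are invented, and details may vary between Rasterio releases):

import numpy as np
import rasterio

with rasterio.open("input.tif") as src:
    band = src.read(1)          # first band as a NumPy array
    profile = src.profile       # georeferencing and format metadata

# a trivial manipulation that keeps the original dtype
clipped = np.clip(band, 0, 255).astype(profile["dtype"])

with rasterio.open("output.tif", "w", **profile) as dst:
    dst.write(clipped, 1)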

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: November 5, 2015

IPython provides tools for interactive exploration of code and data. IPython.parallel is the part of IPython that enables an interactive model for parallel execution, and aims to make distributing your work on a multicore computer, local clusters or cloud services such as AWS or MS Azure simple and straightforward. The tutorial will cover how to do interactive and asynchronous parallel computing with IPython, and how to get the most out of your IPython cluster. Some of IPython’s novel interactive features will be demonstrated, such as automatically parallelizing code with magics in the IPython Notebook and interactive debugging of remote execution. Examples covered will include parallel image processing, machine learning, and physical simulations, with exercises to solve along the way.

Introduction to IPython.parallel
Deploying IPython
Using DirectViews and LoadBalancedViews
The basic model for execution
Getting to know your IPython cluster:
Working with remote namespaces
AsyncResult: the API for asynchronous execution
Interacting with incomplete results. Remember, it’s about interactivity
Interactive parallel plotting
More advanced topics:
Using IPython.parallel with traditional (MPI) parallel programs
Debugging parallel code
Minimizing data movement
Task dependencies
Caveats and tuning tips for IPython.parallel
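A minimal sketch of the DirectView/LoadBalancedView model listed above, using the IPython 2.x-era API (the module later became the standalone ipyparallel package) and assuming a cluster has already been started, e.g. with "ipcluster start -n 4":

from IPython.parallel import Client

rc = Client()                      # connect to the running cluster
dview = rc[:]                      # DirectView over all engines
lview = rc.load_balanced_view()    # LoadBalancedView for task farming

dview["a"] = 10                    # push a value into every engine's namespace

def slow_square(x):
    import time
    time.sleep(0.1)
    return x * x

ar = lview.map_async(slow_square, range(32))   # returns an AsyncResult
print(ar.progress)                             # interact with incomplete results
print(ar.get())                                # block until all results arrive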

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: November 5, 2015

The tutorial will cover four hours with the following topics

Introduction (10min)
History of scientific societies and publications
Leeuwenhoek was the Man!
The Invisible College
Nullius in Verba
Replication of the early microscope experiments by Leeuwenhoek
Image Acquisition (15 min)
Hands on: Cell camera phone microscope
With drop of water
Hands on: Each pair acquires images
Data Sharing (45min)
Image gathering, storage, and sharing (15min)
GitHub (www.github.com)
Figshare (www.figshare.com)
Midas (www.midasplatform.com)
Hands on: Upload the images
Metadata Identifiers (15 min)
Citable
Machine Readable
Hands on: Create data citation and machine readable metadata
Hands on: Download data via RESTful API (15min)
Provenance and Python scripts
Hands on: Download the data via HTTP
Break (10min)
Local processing (60min)
Replication Enablement (20min)
Package versioning
Virtual Machines
Docker
Cloud services
Hands on:
Create a virtualenv
Run our tutorial package verification script
Revision Control with Git (20min)
Keeping track of changes
Unique hashes
Hands on:
Forking a repository in GitHub
Cloning a repository
Creating a branch
Making a commit
Pushing a branch
Diffing
Merging
Pushing again
Create pull request
Python scripts (20min)
Data analysis, particle counting.
Hands on:
Run scripts on new data
Generate histogram for the data
Testing (30min)
Unit testing with known data
Regression testing with known data
Hands on:
Run tests
Add coverage for another method to the unit tests
Break (10min)
Publication Tools (30min)
Article generation
RST to HTML
GitHub replication and sharing
Hands on:
Run dexy to generate the document
Reproducibility Verification (30min)
Reproducing Works
Publication of Positive and Negative results
Hands on:
Create Open Science Framework (OSF) project
Connect Figshare and Github to OSF project
Fork or link another group’s project in the OSF to run dexy on their work
Infrastructure:

Attendees will use software installed on their laptops to gather and process data, then publish and share a reproducible report.

They will access repositories in GitHub, upload data to a repository and publish materials necessary to replicate their data analysis.

We expect the wireless network to have moderate bandwidth, allowing all attendees to move data, source code, and publications between their laptops and hosting servers.
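For the revision-control hands-on listed in the outline above, the command sequence is roughly the following sketch (repository, branch, and file names are placeholders):

git clone https://github.com/YOUR-USERNAME/tutorial-repo.git   # after forking on GitHub
cd tutorial-repo
git checkout -b particle-counting       # create a branch
git add analysis.py                     # stage edited scripts
git commit -m "Count particles in new images"
git diff master particle-counting       # inspect the changes
git push origin particle-counting       # then open a pull request on GitHub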

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: November 5, 2015

Tools and libraries for working with geospatial data in Python are currently undergoing rapid development and expansion. Libraries such as shapely, fiona, rasterio, geopandas, and others now provide Pythonic ways of reading, writing, editing, and manipulating geographic data. In this tutorial, participants will be exposed to a number of new and legacy geospatial libraries in Python, with a focus on simple and rapid interaction with geospatial data.

We will utilize Python to interact with geographic data from a database to a web interface, all the while showcasing how Python can be used to access data from online resources, query spatially enabled databases, perform coordinate transformations and geoprocessing functions, and export geospatial data to web-enabled formats for visualizing and sharing with others. Time permitting, we will also briefly explore Python plugin development for the QGIS Desktop GIS environment.

This tutorial should be accessible to anyone who has basic Python knowledge (though familiarity with Pandas, NumPy, matplotlib, etc. will be helpful) as well as familiarity with IPython Notebook. We will take some time at the start of the tutorial to go over installation strategies for geospatial libraries (GDAL/OGR, Proj.4, GEOS) and their Python bindings (Shapely, Fiona, GeoPandas) on Windows, Mac, and Linux. Some knowledge of geospatial concepts such as map projections and GIS data formats will also be helpful.

Outline
Introduction to geospatial data
Map projections, data formats, and looking at maps
Introduction to geospatial libraries
GDAL/OGR (Fiona); Shapely (GEOS); PostGIS; GeoPandas; and more
GeoPandas
Reading data from various sources
Data manipulation and plotting
Writing data to various sources
Getting data from the web
Pushing data to the web (for maps)
Putting it all together
Quick example: From database to web
Introduction to QGIS Desktop GIS (time permitting)
Python interface (PyQGIS)
Building a simple plugin
Plugin deployment
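A minimal sketch of the kind of read-transform-write interaction covered above, using GeoPandas (the input file, attribute name, and EPSG code are placeholders, and the source data are assumed to be in a projected, metre-based CRS):

import geopandas as gpd

# read any OGR-supported vector format into a GeoDataFrame
gdf = gpd.read_file("neighborhoods.shp")

# a simple attribute calculation in the projected source CRS
gdf["area_m2"] = gdf.geometry.area

# reproject to WGS84 and export to a web-friendly format
gdf.to_crs(epsg=4326).to_file("neighborhoods.geojson", driver="GeoJSON")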

Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: November 5, 2015

Computational and mathematical models can yield profound insights in the study of the spread of infectious diseases, illustrating difficult concepts such as herd immunity, suggesting new avenues for empirical research, and obtaining repeatable, quantifiable evidence in situations where other study designs are difficult if not impossible.

Teaching infectious disease modeling presents a challenge, however, as it requires the development of three unrelated skill sets: the theory of infectious disease models, subject-matter expertise about the diseases themselves, and the programming skills needed to implement all but the simplest models.

Rather than forcing these skill sets to be developed in parallel, Zeke is an educational platform meant to allow students to develop them in sequence. It uses a zombie epidemic to remove the need for specific expertise regarding disease systems, relying instead on a familiar cultural reference. Students first explore the theory of modeling through a Django front-end that enables interaction with models without the need for scientific computing skills. All the models are, however, implemented as stand-alone models that can be run using SciPy or other packages, and as students grow in sophistication, the open-source nature of Zeke allows them to develop both computational and subject-specific expertise in a stepwise fashion.
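As a hedged sketch of the kind of stand-alone compartmental model Zeke wraps, here is a toy susceptible-zombie-removed system solved with SciPy; the formulation and parameter values are illustrative and not taken from Zeke itself:

import numpy as np
from scipy.integrate import odeint

def szr(y, t, beta, alpha):
    # susceptible-zombie-removed model: infection and removal terms
    S, Z, R = y
    dS = -beta * S * Z
    dZ = beta * S * Z - alpha * S * Z
    dR = alpha * S * Z
    return [dS, dZ, dR]

t = np.linspace(0, 30, 300)              # days
y0 = [500.0, 1.0, 0.0]                   # initial S, Z, R
S, Z, R = odeint(szr, y0, t, args=(0.001, 0.0005)).T
print("Zombies on day 30:", Z[-1])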

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: November 4, 2015

Python is widely used in computational biology, with many high profile bioinformatics software projects, such as Galaxy, Khmer and QIIME, being largely or entirely written in Python. We present scikit-bio, a new library based on the standard Python scientific computing stack (e.g., numpy, scipy, and matplotlib) implementing core bioinformatics data structures, algorithms, parsers, and formatters. scikit-bio is the first bioinformatics-centric scikit, and arises from over ten years of development efforts on PyCogent and QIIME, representing an effort to update the functionality provided by these extensively used tools, and to make that functionality more accessible. scikit-bio is intended to be useful both as a resource for students, who can learn topics such as heuristic-based sequence database searching or iterative progressive multiple sequence alignment from the source code and accompanying documentation, and as a powerful library for 'real-world' bioinformatics developers. To achieve these goals, scikit-bio development is centered around test-driven, peer-reviewed software development; C/Cython integration for computationally expensive algorithms; extensive API documentation and doc-testing based on the numpy docstring standards; user documentation and theoretical discussion of topics in IPython Notebooks; adherence to PEP8; and continuous integration testing. scikit-bio is available free of charge under the BSD license.

Capture thumb
Rating: Everyone
Viewed 27 times
Recorded at:
Date Posted: October 28, 2015

SimpleITK provides scientific image analysis, processing, segmentation and registration for biomedical, microscopy and other scientific fields by supporting multi-dimensional images with physical locations [1]. It is a layer built upon the Insight Segmentation and Registration Toolkit (ITK) [2].

While there are many Python packages to process 2D photographic images, scientific image analysis adds additional requirements. Images encountered in these domains often have anisotropic pixel spacing, or spatial orientations, and calculations are best performed in physical space as opposed to pixel space.

SimpleITK brings to Python a plethora of capabilities for performing image analysis. Although SimpleITK was developed by the biomedical imaging community, it is also used for generic image processing. It differs from OpenCV in offering 3D and multi-component images, and from SciPy in offering the abstraction of image classes and their associated data structures. This applies to image modalities such as CT scans, MRI, fMRI and ultrasound, and to microscopy modalities such as confocal, SEM, TEM, and traditional bright and dark field.

Among the key functionalities supported by SimpleITK are over 260 advanced image filtering and segmentation algorithms as well as access to scientific image file formats, including specialized formats such as DICOM, Nifti, NRRD, VTK and other formats that preserve 3D metadata. Example algorithms include Level Sets Segmentation including multi-phase, Label Maps, Region Growing, Statistical Classification, Advanced Thresholding, Geometrical Transformations, Deconvolution, Anti-Aliasing, Edge Detection, Mathematical Morphology on both labels and grayscale images and Fourier Analysis [4,5].
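As a brief, hedged sketch of this style of filtering and segmentation (the file name is a placeholder and the parameter values are illustrative):

import SimpleITK as sitk

image = sitk.ReadImage("scan.nrrd")                       # 3D image with physical metadata
smoothed = sitk.SmoothingRecursiveGaussian(image, 2.0)    # sigma given in physical units

otsu = sitk.OtsuThresholdImageFilter()                    # automatic threshold segmentation
otsu.SetInsideValue(0)
otsu.SetOutsideValue(1)
segmented = otsu.Execute(smoothed)

sitk.WriteImage(segmented, "scan_segmentation.nrrd")
array = sitk.GetArrayFromImage(smoothed)                  # interoperate with NumPy when needed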

SimpleITK is an open source project with an active community. It builds upon the extensive image analysis experience of the ITK community [3], which has been working in biomedical image analysis since 1999 and continues to grow year by year, aggregating state-of-the-art algorithms.

SimpleITK development is sponsored by the US National Library of Medicine.

Capture thumb
Rating: Everyone
Viewed 131 times
Recorded at:
Date Posted: October 29, 2015

Most modern GPS processing techniques have started to utilise the time interval between points as a major indicator for finding POIs in a user's trajectories. We take a step back in this process and account only for the spatial distribution of measured points in order to extract POIs. Time is used solely as a secondary indicator and for ordering purposes.

Points are first cleaned of any highly inaccurate data and stored in a PostGIS environment. Using a Python module developed for this purpose, we extract the point data and order them by time into one large trajectory. Then, for each point, we select its neighbours (both previous and following) until we reach one that is more than a specified distance of 50 m from the first point. The number of selected points is added to the original point as a new attribute. Continuing with this process, the newly calculated values form an imitation of a signal reflecting point density.
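A rough sketch of this neighbour-counting step, assuming the cleaned points have already been projected to a metric coordinate system and ordered by time:

import math

def density_signal(points, radius=50.0):
    # points: time-ordered list of (x, y) coordinates in metres
    signal = []
    for i, (x0, y0) in enumerate(points):
        count = 0
        for direction in (-1, 1):          # walk backwards and forwards in time
            j = i + direction
            while 0 <= j < len(points):
                x, y = points[j]
                if math.hypot(x - x0, y - y0) > radius:
                    break                  # stop at the first point outside the radius
                count += 1
                j += direction
        signal.append(count)
    return signal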

A shift in the strength of the signal signifies a change in the user's travel mode, which can be used to segment the trajectory into homogeneous parts. Identification of these parts is currently based on a decision tree, in which individual aspects of a segment (such as average speed, signal strength, elapsed time or centroid) are evaluated and the segment is categorised.

Current work seeks to incorporate neural networks into the processing framework. Since the signal pattern of each mode of transportation (car, bus, walk, etc.) is independent of user behaviour, a properly trained model should classify activities more accurately for a broad range of users.

Capture thumb
Rating: Everyone
Viewed 2 times
Recorded at:
Date Posted: November 5, 2015

We will look at the issues that have plagued packaging in the Python ecosystem in the past, and discuss how Conda solves these problems. We will show how to use conda to manage multiple environments. Finally, we will look at how to build your own conda packages.

What is the packaging problem? We will briefly look at the history of the problem and the various solutions to it. There are two sides to the packaging problem: the problem of installing existing packages and the problem of building packages to be installed. We will look at the history of distutils, setuptools, distribute, and pip, some of the problems they solved, and the issues that arose, particularly for the scientific Python community.

We will look at the conda package format, the design decisions that guided the format, and the implications of those decisions. A conda package is a bz2 compressed tarfile of all the files installed in a prefix, along with a metadata directory for the package. A conda package is typically installed by hard linking these files into the install prefix. Conda packages should be relocatable, so that they can be installed into any prefix. This allows conda packages to be installed into many virtual environments at once. A conda package is not Python specific.

We will look at the basic commands for installation and environment management. Conda uses a SAT solver to solve package dependency constraints, which is a simple, rigorous, and modern way to ensure that the set of installed packages is consistent.

Conda has an extensive build framework which allows anybody to build their own conda packages. We will show how to use these tools and how to upload them to Binstar, a free packaging hosting service.
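For illustration, the environment-management and build workflow covered in the tutorial looks roughly like the following commands (package and recipe names are placeholders; the binstar command reflects the 2014-era client, later renamed anaconda):

conda create -n analysis python=2.7 numpy scipy   # new environment with pinned packages
source activate analysis                          # "activate analysis" on Windows
conda install pandas                              # solve dependencies and install
conda info --envs                                 # list available environments
conda build my-recipe/                            # build a package from a conda recipe
binstar upload my-package-0.1-py27_0.tar.bz2      # share the built package on Binstar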

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: November 5, 2015

Introduction
Purpose of matplotlib
Online Documentation
matplotlib.org
Mailing Lists and StackOverflow
Github Repository
Bug Reports & Feature Requests
What is this "backend" thing I keep hearing about?
Interactive versus non-interactive
Agg
Tk, Qt, GTK, MacOSX, Wx, Cairo
Plotting Functions
Graphs (plot, scatter, bar, stem, etc.)
Images (imshow, pcolor, pcolormesh, contour[f], etc.)
Lesser Knowns: (pie, acorr, hexbin, streamplot, etc.)
What goes in a Figure?
Axes
Axis
ticks (and ticklines and ticklabels) (both major & minor)
axis labels
axes title
figure suptitle
axis spines
colorbars (and the oddities thereof)
axis scale
axis gridlines
legend
Manipulating the "Look-and-Feel"
Introducing matplotlibrc
Properties
color (and edgecolor, linecolor, facecolor, etc...)
linewidth and edgewidth and markeredgewidth (and the oddity that happens in errorbar())
linestyle
fonts
zorder
visible
What are toolkits?
axes_grid1
mplot3d
basemap
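A small sketch tying several of these pieces together (figure, Axes, labels, legend, gridlines, and an rc setting); the data are arbitrary:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams["lines.linewidth"] = 2    # the matplotlibrc route to look-and-feel

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()                      # a Figure containing one Axes
ax.plot(x, np.sin(x), label="sin(x)", color="steelblue", linestyle="--")
ax.scatter(x[::10], np.cos(x[::10]), label="cos samples", zorder=3)
ax.set_xlabel("x")                            # axis labels
ax.set_ylabel("amplitude")
ax.set_title("Axes title")
fig.suptitle("Figure suptitle")
ax.grid(True)                                 # axis gridlines
ax.legend(loc="upper right")
plt.show()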

Capture thumb
Rating: Everyone
Viewed 2 times
Recorded at:
Date Posted: November 5, 2015

Introduction
Purpose of matplotlib
Online Documentation
matplotlib.org
Mailing Lists and StackOverflow
Github Repository
Bug Reports & Feature Requests
What is this "backend" thing I keep hearing about?
Interactive versus non-interactive
Agg
Tk, Qt, GTK, MacOSX, Wx, Cairo
Plotting Functions
Graphs (plot, scatter, bar, stem, etc.)
Images (imshow, pcolor, pcolormesh, contour[f], etc.)
Lesser Knowns: (pie, acorr, hexbin, streamplot, etc.)
What goes in a Figure?
Axes
Axis
ticks (and ticklines and ticklabels) (both major & minor)
axis labels
axes title
figure suptitle
axis spines
colorbars (and the oddities thereof)
axis scale
axis gridlines
legend
Manipulating the "Look-and-Feel"
Introducing matplotlibrc
Properties
color (and edgecolor, linecolor, facecolor, etc...)
linewidth and edgewidth and markeredgewidth (and the oddity that happens in errorbar())
linestyle
fonts
zorder
visible
What are toolkits?
axes_grid1
mplot3d
basemap

Capture thumb
Rating: Everyone
Viewed 2 times
Recorded at:
Date Posted: October 28, 2015

X-ray astronomy is a rapidly expanding field, thanks to the many observations of existing observatories, such as Chandra and XMM-Newton, and the anticipation of high-resolution spectral data from upcoming missions such as Astro-H and Athena+. Understanding these observations and connecting them to astrophysical mechanisms requires not only detailed modeling of the underlying physics but reliable reproduction of the observed phenomena. I present a method of creating synthetic X-ray observations from numerical simulations, which leverages several astronomical Python libraries, including yt, AstroPy, and PyXspec. I will describe the method of generating the observations, the Python packages used, and applications of the method, including connecting observations of galaxy clusters with MHD simulations and preparing simulations for observation proposals.

Capture thumb
Rating: Everyone
Viewed 2 times
Recorded at:
Date Posted: November 5, 2015

NIWA has developed two tools dedicated respectively to the reconstruction of the climates of the past and to the rapid and flexible development of climate services connected to a widely used meteorological database.

PICT (Past Interpretation of Climate Tool) allows the user, given a climate proxy or set of proxies, to reconstruct likely anomalies associated with specific proxy epochs. The tool implements the concept of climate analogs and reconstructs paleo-climate anomalies in terms of mean atmospheric circulation and sea-surface temperatures, as well as in terms of possible changes in the probabilities of synoptic weather regimes (or 'attractors' in the climate system). The whole backend of this application has been developed exclusively in Python using the NumPy, SciPy, pandas and matplotlib scientific libraries. We present a brief overview of the underlying science before describing the choices made in designing the Python-based compute and data visualisation layer.

Clidesc is an application layer, running in the browser, built on top of CLIDE, an open-source database specialised in handling meteorological data in real time and facilitating its long-term storage. It has been developed using open standards and facilitates the rapid development of climate services (data analyses and visualisations developed to increase climate intelligence and early-warning systems). Clidesc is currently being deployed in several Pacific Island National Meteorological Services. Services can be developed using either R or Python. Development in Python is based on Anaconda and psycopg2, which provides the interface to the PostgreSQL-based CLIDE database. We present the context and rationale for using open standards, and give examples of how a user with minimal Python knowledge can use templates to rapidly implement a new service tailored to her needs.

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: October 28, 2015

Background
What is Earth Engine at a high level?
Why did the Earth Engine (EE) project start? To monitor global deforestation.
What architecture design decisions were made, and why?
Just-in-time computation model
Lazy evaluation for real-time feedback
The Earth Engine Python API
PyPI package: earthengine-api
OAuth authentication
Using IPython Notebooks for algorithm development
Special display methods for interactive maps
Philosophical goals and how they are manifested
Organize the world's (geospatial) information and make it universally accessible and useful
Facilitate open transparent science
Speed up science by reducing the effort required to test hypotheses
Enable collaborative algorithm development
Selected Results
Consumer-grade visualizations
Time-lapse global scale interactive video - blog post, interactive viewer (centered on Austin)
Science-grade Data Products
High-Resolution Global Maps of 21st-Century Forest Cover Change - Science journal publication, blog post, interactive viewer
The Future
Global-scale analysis challenges
An invitation for developers
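A minimal, hedged sketch of the client API flow after authentication (the asset ID and region are examples; the exact authentication step depends on the earthengine-api version):

import ee

ee.Initialize()                          # assumes OAuth credentials are already configured

dem = ee.Image("srtm90_v4")              # a server-side image object; nothing is downloaded yet
stats = dem.reduceRegion(
    reducer=ee.Reducer.mean(),
    geometry=ee.Geometry.Rectangle([-98.0, 30.0, -97.5, 30.5]),
    scale=90,
)
print(stats.getInfo())                   # lazy evaluation: computation happens on request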

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: November 4, 2015

Together with theory and experimentation, computational modeling and simulation has become a "third pillar" of scientific enquiry. I am developing a curriculum for a three part, graduate level course on computational methods designed to increase the exposure of graduate students and researchers in the College of Humanities and Social Sciences at the University of Edinburgh to basic techniques used in computational modeling and simulation using the Python programming language. My course requires no prior knowledge or experience with computer programming or software development and all current and future course materials will be made freely available online via GitHub.

Capture thumb
Rating: Everyone
Viewed 127 times
Recorded at:
Date Posted: October 28, 2015

Digital holographic microscopy is a fast 3D imaging technique ideally suited to studies of micron-sized objects that diffuse through random walks via Brownian motion [1]. Microspheres fit this category and are widely used in biological assays and as ideal test subjects for experiments in statistical mechanics. Microspheres suspended in water move too quickly to monitor with confocal microscopy. With digital holographic microscopy, 2D images encoding 3D volumes can be recorded at thousands of frames per second [2]. The computationally challenging part of digital holographic microscopy is extracting the 3D information during post-processing.

The open source HoloPy package, which relies heavily on SciPy and NumPy, is used to recover the 3D information via one of two techniques: reconstruction by numerical back-propagation of electromagnetic fields, or modeling forward light scattering with Mie theory. The parameter space describing the imaged volume is multidimensional. Even for simple micron-sized spheres, a hologram depends on each sphere's radius and index of refraction in addition to its 3D position. By supplementing HoloPy with a GPU-accelerated GUI using PyQt4, we enabled users to interactively adjust the system parameters and see a modeled digital hologram change in response.

Simply adding the capability of interactively manipulating holograms in a GUI led us to notice unexpected discrepancies between the two modeling techniques and failures of both, suggesting further experiments. We observed that the numerical light propagation technique only accurately characterizes the light within a cone stretching from the extent of the image back towards the object. Neither model accurately characterizes the light upstream of the object toward the light source. The GUI was a natural format to interact with the theory and gain insight because it showed us the models in an analogous format to how we see the data on the microscope. Other scientific projects may benefit from tools that allow experimentalists to interact with theory in the same way they interact with their experiments.

[1] Lee et.al., Optics Express, Vol. 15, Issue 26, pp. 18275-18282 (2007) doi: 10.1364/OE.15.018275.

[2] Kaz et.al., Nature Materials, Vol. 11, pp. 138–142 (2012) doi:10.1038/nmat3190.

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: October 28, 2015

Digital holographic microscopy is a fast and powerful tool for 3D imaging. Holography captures information about a 3D scene onto a 2D camera using interference. This means that the speed of holographic imaging is limited only by camera speed, making holography an ideal tool for studying fast processes in soft matter systems. However, making use of this encoded information requires significant computational post-processing. We have developed and released HoloPy, a Python-based tool for doing these calculations.

The traditional method for extracting information from holograms is to optically reconstruct by shining light through a hologram to obtain an image of the recorded scene. HoloPy implements the digital equivalent of this, numerical reconstruction, in the form of light propagation by convolution. This is a fast technique based on fast Fourier transforms, which effectively allows refocusing a holographic image after it is taken.

For systems where a detailed scattering model is available, Lee and coworkers showed that it is possible to make more precise measurements by fitting a scattering model to a recorded hologram [1]. We have extended this technique to clusters of spheres [2][3] and to non-spherical particles [4]. HoloPy implements all of these fitting techniques such that they can be used with a few lines of Python code. HoloPy also exposes an interface to all of its scattering models, so that light scattering from microscopic particles or clusters of particles can be computed for other purposes.

HoloPy is open source (GPLv3) and is hosted on launchpad. HoloPy uses Numpy for most of its manipulations, though it calls out to Fortran and C codes to compute light scattering. HoloPy also includes matplotlib and mayavi based tools for visualizing holograms and particles.

[1] Lee et.al., Optics Express, Vol. 15, Issue 26, pp. 18275-18282 (2007)

[2] Fung et. al., JQSRT, Vol 113, Issue 18, pp. 2482-2489 (2012)

[3] Perry et. al., Faraday Discussions, Vol 159, pp. 211-234 (2012)

[4] Wang et. al. JQSRT, (2014)

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: November 5, 2015

ViennaCL provides a BLAS-like interface to a set of OpenCL, CUDA and OpenMP compute kernels for linear algebra operations, such as dense and sparse matrix products, direct and iterative solvers, preconditioners. At the C++ API level, ViennaCL uses templates to represent a mathematical expression graph, for which it then generates an appropriate compute kernel.

Interfacing with a C++ templating API from Python, for which users' expressions are expected to be set at compile time, poses a number of problems for the dynamic creation of objects and execution of arbitrary expressions. For the Python interface, we have a scheduler which takes an expression tree object constructed in Python (using Boost.Python), and then generates and dispatches the relevant kernel, using the relevant data types for the operands. Furthermore, so that users do not regularly incur expensive copying of matrices across slow system buses, PyViennaCL implements various caching mechanisms. Work is currently in progress to support multiple, heterogeneous and distributed platforms, and custom, user-supplied expression nodes, using PyOpenCL and PyCUDA.

To make these features approachable to users familiar with NumPy and SciPy, the PyViennaCL API attempts to be as similar to the NumPy API as possible, providing recognisable classes, methods, and attributes, and transparently converting operand and result types where these things are defined.

This talk will introduce PyViennaCL, covering in more detail the computational architecture described above, as well as these Python API features, and the power of upcoming work to extend the PyViennaCL scheduler and API to custom compute operations, by integrating with PyOpenCL and PyCUDA. In the process, I will provide some comparative benchmark results, to demonstrate the utility of this new work.

Capture thumb
Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: November 5, 2015

PyMC is a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo (MCMC). Its flexibility and extensibility make it applicable to a large suite of problems. Along with core sampling functionality, PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics. PyMC seeks to make Bayesian analysis as painless as possible, so that it may be used by a range of data analysts. Its key features include:

Fits Bayesian statistical models with Markov chain Monte Carlo and other algorithms.
Includes a large suite of well-documented statistical distributions.
Uses NumPy for numerics wherever possible.
Includes a module for modeling Gaussian processes.
Sampling loops can be paused and tuned manually, or saved and restarted later.
Creates summaries including tables and plots.
Traces can be saved to the disk as plain text, Python pickles, SQLite or MySQL database, or hdf5 archives.
Several convergence diagnostics are available.
Extensible: easily incorporates custom step methods and unusual probability distributions.
MCMC loops can be embedded in larger programs, and results can be analyzed with the full power of Python.
The upcoming release of PyMC 3 features an expanded set of MCMC samplers, including Hamiltonian Monte Carlo. For this, we tap into the power of Theano to provide automatic evaluation of mathematical expressions, including gradients used by modern MCMC samplers.

The source and documentation for PyMC can be found on GitHub.
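As a short sketch in the PyMC 2.x style described here (the data are simulated and the variable names are illustrative):

import numpy as np
import pymc as pm

data = np.random.normal(5.0, 2.0, size=100)   # simulated observations

# priors
mu = pm.Normal("mu", mu=0.0, tau=1e-3)
tau = pm.Gamma("tau", alpha=0.1, beta=0.1)

# likelihood
y = pm.Normal("y", mu=mu, tau=tau, value=data, observed=True)

model = pm.MCMC([mu, tau, y])
model.sample(iter=20000, burn=5000)           # Markov chain Monte Carlo
print(mu.stats()["mean"])                     # posterior summary
pm.Matplot.plot(mu)                           # trace and histogram plots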

Capture thumb
Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: October 28, 2015

We present two new visualizations, case tree plots and checkerboard plots, for visualizing emerging zoonoses.

Zoonoses represent an estimated 58% of all human infectious diseases, and 73% of emerging infectious diseases. Recent examples of zoonotic outbreaks include H1N1, SARS and Middle East Respiratory Syndrome, which have caused thousands of deaths combined. The current toolkit for visualizing data from these emerging diseases is limited.

Case tree and checkerboard plots were developed to address that gap. The visualizations are best suited for diseases like SARS for which there are a limited number of cases, with data available on human-to-human transmission. They (a) allow for easy estimation of epidemiological parameters like the basic reproduction number, (b) indicate the frequency of introductory events, e.g. spillovers in the case of zoonoses, and (c) represent patterns of case attributes like patient sex, both by generation and over time.

Case tree plots depict the emergence and growth of clusters of disease over time. Each case is represented by a colored node. Nodes that share an epidemiological link are connected by an edge. The color of the node varies based on the node attribute; it could represent patient sex, health status (e.g. alive, dead), or any other categorical attribute. Node placement along the x-axis corresponds with the date of illness onset for the case.

A second visualization, the checkerboard plot, was developed to complement case tree plots. They can be used in conjunction with case tree plots, or in situations where representing a hypothetical network structure is inappropriate.

The plots are available in the open source package epipy, which is available on GitHub. Detailed documentation and examples are also available. In addition to these visualizations, epipy includes functions for common epidemiology calculations like odds ratio and relative risk.

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: November 5, 2015

The aim of this course is to introduce new users to the Bayesian approach of statistical modeling and analysis, so that they can use Python packages such as NumPy, SciPy and PyMC effectively to analyze their own data. It is designed to get users quickly up and running with Bayesian methods, incorporating just enough statistical background to allow users to understand, in general terms, what they are implementing. The tutorial will be example-driven, with illustrative case studies using real data. Selected methods will include approximation methods, importance sampling, Markov chain Monte Carlo (MCMC) methods such as Metropolis-Hastings and Slice sampling. In addition to model fitting, the tutorial will address important techniques for model checking, model comparison, and steps for preparing data and processing model output. Tutorial content will be derived from the instructor's book Bayesian Statistical Computing using Python, to be published by Springer in late 2014.

PyMC forest plot

DAG

All course content will be available as a GitHub repository, including IPython notebooks and example data.

Tutorial Outline
Overview of Bayesian statistics.
Bayesian Inference with NumPy and SciPy
Markov chain Monte Carlo (MCMC)
The Essentials of PyMC
Fitting Linear Regression Models
Hierarchical Modeling
Model Checking and Validation
Installation Instructions
The easiest way to install the Python packages required for this tutorial is via Anaconda, a scientific Python distribution offered by Continuum Analytics. Several other tutorials will be recommending a similar setup.

One of the key features of Anaconda is a command line utility called conda that can be used to manage third party packages. We have built a PyMC package for conda that can be installed from your terminal via the following command:

conda install -c https://conda.binstar.org/pymc pymc
This should install any prerequisite packages that are required to run PyMC.

One caveat is that conda does not yet have a build of PyMC for Python 3. Therefore, you would have to build it yourself via pip:

pip install git+git://github.com/pymc-devs/pymc.git@2.3
For those of you on Mac OS X that are already using the Homebrew package manager, I have prepared a script that will install the entire Python scientific stack, including PyMC 2.3. You can download the script here and run it via:

sh install_superpack_brew.sh

Capture thumb
Rating: Everyone
Viewed 2 times
Recorded at:
Date Posted: October 28, 2015

In this paper, we utilize real-time 'social information sources' to automatically detect important events at the urban scale. The goal is to provide city planners and others with information on what is going on, and when and where it is happening. Traditionally, this type of analysis would require a large investment in heavy-duty computing infrastructure; however, we suggest that a focus on real-time analytics in a lightweight streaming framework is the most logical step forward.

Using online Latent Semantic Analysis (LSA) from the gensim Python package, we extract 'topics' from tweets in an online training fashion. To maintain real-time relevance, the topic model is continually updated, and depending on parameterization, can 'forget' past topics. Based on a set of learned topics, a grid of spatially located tweets for each identified topic is generated using standard numpy and scipy.spatial functionality. Using an efficient streaming algorithm for approximating 2D kernel density estimation (KDE), locations with the highest density of tweets on a particular topic are located. Locations are semantically labeled using the learned topics, based on the assumption that events can be directly tied to a particularly popular topic at a particular location.
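A rough sketch of the two computational pieces just described: online LSA with gensim and a density estimate over tweet locations. The streaming KDE approximation used in the actual system is replaced here by scipy.stats.gaussian_kde, and the toy tweets and coordinates are invented:

import numpy as np
from gensim import corpora, models
from scipy.stats import gaussian_kde

tweets = [["traffic", "jam", "downtown"],
          ["concert", "tonight", "downtown"],
          ["traffic", "accident", "highway"]]

dictionary = corpora.Dictionary(tweets)
corpus = [dictionary.doc2bow(t) for t in tweets]

lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
lsi.add_documents([dictionary.doc2bow(["road", "closure", "downtown"])])   # online update

# density surface over (lon, lat) tweet coordinates for one topic
points = np.array([[-97.74, 30.27], [-97.75, 30.26], [-97.73, 30.28],
                   [-97.74, 30.25], [-97.76, 30.27]]).T
kde = gaussian_kde(points)
print(kde([[-97.74], [30.27]]))   # density at a candidate event location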

To facilitate real time visualization of results, we utilize the pico Python/Javascript library as a real-time bridge between server-side Python analysis and client-side Javascript visualization. This enables fast, responsive interactivity of computationally intensive tasks. Additionally, since pico allows streaming data from Python to Javascript, updates to the web-interface are sent and consumed as needed, such that only significant changes in an event's status, or the introduction of a new event, will cause updates to the visualizations. Finally, because all models, data structures, and outputs on the server side are pickle-able Python objects, this entire framework is small enough to be deployed on almost any server with Python installed.

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: November 5, 2015

Image analysis is central to a boggling number of scientific endeavors. Google needs it for their self-driving cars and to match satellite imagery and mapping data. Neuroscientists need it to understand the brain. NASA needs it to map asteroids and save the human race. It is, however, a relatively underdeveloped area of scientific computing. Attendees will leave this tutorial confident of their ability to extract information from their images in Python.

Attendees will need a working knowledge of numpy arrays, but no further knowledge of images or voxels or other doodads. After a brief introduction to the idea that images are just arrays and vice versa, we will introduce fundamental image analysis operations: filters, which can be used to extract features such as edges, corners, and spots in an image; morphology, inferring shape properties by modifying the image through local operations; and segmentation, the division of an image into meaningful regions.

We will then combine all these concepts and apply them to several real-world examples of scientific image analysis: given an image of a pothole, measure its size in pixels; compare the fluorescence intensity of a protein of interest in the centromeres vs. the rest of the chromosome; observe the distribution of cells invading a wound site.

Attendees will also be encouraged to bring their own image analysis problems to the session for guidance, and, if time allows, we will cover more advanced topics such as image registration and stitching.

The entire tutorial will be coordinated with the IPython notebook, with various code cells left blank for attendees to fill in as exercises.
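A condensed sketch of the filter-threshold-measure pipeline the tutorial builds up, using scikit-image (note that module naming has shifted between releases, e.g. skimage.filter vs skimage.filters):

from skimage import data, filters, measure, morphology

image = data.coins()                         # a sample grayscale image bundled with scikit-image

edges = filters.sobel(image)                 # filtering: highlight edges
threshold = filters.threshold_otsu(image)    # automatic global threshold
binary = image > threshold

cleaned = morphology.remove_small_objects(binary, min_size=64)   # morphology
labels = measure.label(cleaned)              # segmentation into connected regions

for region in measure.regionprops(labels):
    print(region.label, region.area)         # e.g. coin sizes in pixels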

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: November 4, 2015

Teaching undergraduate students a programming language like Python is interesting and challenging at the same time. You have to deal mostly with two types of students: those who already have some experience in programming (not necessarily Python), e.g. from high school, and those who have not. At Bonn University we have recently changed the structure of such a course for Bachelor of Science in Physics students. First, we include the Python tutorial in a lecture in the first term, instead of a voluntary course in the lecture-free time before the fourth semester, when the "Numerical Methods for Physicists" course takes place. Second, instead of a weekly lecture in which the topics are explained in detail plus a 2-hour exercise class, the new system provides only one introductory lecture per topic but a 3-hour exercise class per week. The tutors are therefore much more responsible for the students' success in the final report of this course. Furthermore, as a student representative and also a tutor for both courses, I have been heavily involved in this process.

I would like to initiate a larger discussion about how to teach programming to undergraduate students, especially because programming has become more and more important in science over the last decades, and because, due to e.g. the Bologna reform in Europe, it should be easier to change between universities after e.g. the Bachelor program.

Capture thumb
Rating: Everyone
Viewed 6 times
Recorded at:
Date Posted: October 29, 2015

Historically, Matlab has been the primary math software tool used in our courses on Chemical Engineering. Last year, I taught the first course in the department using Python. In this talk I will present how I did that, and why it was possible. The first step was demonstrating that Python + numpy + scipy + matplotlib can solve all the problems we used to solve with Matlab. This was documented in a project called PYCSE through a series of over one hundred blog posts and organized in a web site (1). Second, the development of Python distributions such as Enthought Canopy made it possible for students to easily install and use Python. I augmented this with some additional functionality in PYCSE (2), which adds statistical analysis, differential equation solvers, numerical differentiation functions, and a publish function to convert Python scripts to PDF files with captured output for grading. The only feature of Python missing is a robust units package; several partial solutions exist, but none solve all the needs of engineering calculations. Third, Emacs + org-mode enabled me to write the course notes with integrated Python code and output. These notes were provided to the students in PDF form, and annotated during lecture using a tablet PC. Finally, the course was administered with box.com and a custom Python module to automate assignment collection and return (3). An integrated grade widget in the PDF files, created when the students published their assignments, was used to aggregate the grades for the gradebook. I used an innovative homework schedule of one problem every 2-4 days with rapid feedback to keep students using Python frequently. We used timed quizzes and online exams to assess their learning. Overall, the course was successful. Student evaluations of the course were as good as for courses that used other software packages. Based on my experiences, I will continue to use Python and expand its role in engineering education.

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: November 5, 2015

Image analysis is central to a boggling number of scientific endeavors. Google needs it for their self-driving cars and to match satellite imagery and mapping data. Neuroscientists need it to understand the brain. NASA needs it to map asteroids and save the human race. It is, however, a relatively underdeveloped area of scientific computing. Attendees will leave this tutorial confident of their ability to extract information from their images in Python.

Attendees will need a working knowledge of numpy arrays, but no further knowledge of images or voxels or other doodads. After a brief introduction to the idea that images are just arrays and vice versa, we will introduce fundamental image analysis operations: filters, which can be used to extract features such as edges, corners, and spots in an image; morphology, inferring shape properties by modifying the image through local operations; and segmentation, the division of an image into meaningful regions.

We will then combine all these concepts and apply them to several real-world examples of scientific image analysis: given an image of a pothole, measure its size in pixels; compare the fluorescence intensity of a protein of interest in the centromeres vs. the rest of the chromosome; observe the distribution of cells invading a wound site.

Attendees will also be encouraged to bring their own image analysis problems to the session for guidance, and, if time allows, we will cover more advanced topics such as image registration and stitching.

The entire tutorial will be coordinated with the IPython notebook, with various code cells left blank for attendees to fill in as exercises.

Capture thumb

The impact of climate change will resonate through a broad range of fields including public health, infrastructure, water resources, and many others. Long-term coordinated planning, funding, and action are required for climate change adaptation and mitigation. Unfortunately, widespread use of climate data (simulated and observed) in non-climate-science communities is impeded by factors such as large data size, lack of adequate metadata, poor documentation, and lack of sufficient computational and visualization resources. Additionally, working with climate data in its native format is not ideal for all types of analyses and use cases, often requiring technical skills (and software) that are unnecessary when working with other geospatial data formats.

We present open source tools developed as part of ClimatePipes and OpenClimateGIS to address many of these challenges by creating an open source platform that provides state-of-the-art user-friendly data access, processing, analysis, and visualization for climate and other relevant geospatial datasets making the climate and other geospatial data available to non-researchers, decision-makers, and other stakeholders.

The overarching goals are:

Enable users to explore real-world questions related to environment and climate change.
Provide tools for data access, geo-processing, analysis, and visualization.
Facilitate collaboration by enabling users to share datasets, workflows, and visualization.
Some of the key technical features include

Support for multiprocessing for large datasets using Python-celery distributed task queuing system
Generic iterators allowing data to be streamed to arbitrary formats (relatively) easily (e.g. ESRI Shapefile, CSV, keyed ESRI Shapefile, CSV, NetCDF)
NumPy based array computations allowing calculations such as monthly means or heat indices optionally on temporally grouped data slices
Decorators to expose existing Python API as a RESTful API
Simple to use, lightweight Web-framework and JavaScript libraries for analyzing and visualizing geospatial datasets using D3 and WebGL.

Capture thumb
Rating: Everyone
Viewed 4 times
Recorded at:
Date Posted: November 5, 2015

This tutorial is targeted to those who are or soon will be teaching numerical methods or scientific computing and are interested in using Python as the programming language for their course. The tutorial will be useful both to academics teaching university courses and those in industry who run training sessions. No prior knowledge of the IPython notebook is necessary, but participants should have some familiarity with Python, Numpy, and Matplotlib.

IPython notebooks are an excellent medium for teaching numerical methods since they can include both mathematical explanations and executable code in a single document. The tutorial will begin with an introduction to the IPython notebook, emphasizing how to overcome aspects that can be confusing to students. Next we will go over available free resources for

ensuring that students have a suitable computing environment, using either a cloud platform or a packaged distribution
distributing and collecting notebooks
converting notebooks to other formats that may be useful in a course
We will also review a number of excellent existing resources containing IPython notebooks for numerical methods courses. Using these notebooks as examples, we will discuss how to design effective notebooks for teaching, including

typesetting mathematical equations and expressions using LaTeX
Formatting, referencing, and layout using Markdown
inserting complete or partial code snippets
embedding figures and other media
embedding interactive widgets
We will briefly discuss different approaches to using IPython notebooks in a course, including their use as the basis for

homework assignments
short activities during a class session
longer laboratory sessions
Finally, participants will be asked to develop, individually or in small groups, a notebook of their own that could be used as an assignment, classroom exercise, or lecture.

Capture thumb
Rating: Everyone
Viewed 7 times
Recorded at:
Date Posted: November 5, 2015

Image analysis is central to a boggling number of scientific endeavors. Google needs it for their self-driving cars and to match satellite imagery and mapping data. Neuroscientists need it to understand the brain. NASA needs it to map asteroids and save the human race. It is, however, a relatively underdeveloped area of scientific computing. Attendees will leave this tutorial confident of their ability to extract information from their images in Python.

Attendees will need a working knowledge of numpy arrays, but no further knowledge of images or voxels or other doodads. After a brief introduction to the idea that images are just arrays and vice versa, we will introduce fundamental image analysis operations: filters, which can be used to extract features such as edges, corners, and spots in an image; morphology, inferring shape properties by modifying the image through local operations; and segmentation, the division of an image into meaningful regions.

We will then combine all these concepts and apply them to several real-world examples of scientific image analysis: given an image of a pothole, measure its size in pixels; compare the fluorescence intensity of a protein of interest in the centromeres vs. the rest of the chromosome; observe the distribution of cells invading a wound site.

Attendees will also be encouraged to bring their own image analysis problems to the session for guidance, and, if time allows, we will cover more advanced topics such as image registration and stitching.

The entire tutorial will be coordinated with the IPython notebook, with various code cells left blank for attendees to fill in as exercises.

Capture thumb
Rating: Everyone
Viewed 8 times
Recorded at:
Date Posted: November 5, 2015

Deep learning algorithms have recently garnered much attention for their successes in solving very difficult industrial machine perception problems. However, for many practical purposes, these algorithms are unwieldy due to the rapid proliferation of "hyperparameters" in their specification -- architectural and optimization constants which ordinarily must be specified a priori by the practitioner. There is growing interest within the machine learning community, and acutely so amongst deep learning researchers, in intelligently automating the selection of hyperparameters for machine learning algorithms through the use of sequential model-based optimization techniques. Hyperopt (http://hyperopt.github.io/hyperopt/) is a software package designed for this purpose, architected as a general framework for hyperparameter optimization algorithms with support for complicated, awkward hyperparameter spaces that, e.g., involve many hyperparameters that are only meaningful in the context of certain values of other hyperparameters.
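For reference, the core Hyperopt interface is compact. A toy example with a one-dimensional search space follows; the objective is an arbitrary stand-in for an expensive training-and-validation run:

from hyperopt import fmin, tpe, hp

def objective(x):
    # stand-in for training a model and returning a validation loss
    return (x - 1.0) ** 2

space = hp.uniform("x", -5, 5)   # real spaces can be nested and conditional

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
print(best)                      # e.g. {'x': 0.998...}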

Pylearn2 (http://deeplearning.net/software/pylearn2) is a framework for machine learning developed by the LISA laboratory at Université de Montréal; it is a research and prototyping library aimed primarily at machine learning researchers, with a focus on "deep learning" algorithms. Despite being far from a stable release, it has had considerable impact and developed a very active user community outside of the laboratory that birthed it.

This talk will describe recent efforts in building a flexible, user-friendly bridge between Pylearn2 and Hyperopt for the purpose of optimizing the hyperparameters of deep learning algorithms. Briefly, it will outline the relevant problem domain and the two packages, the technical challenges we've met in adapting the two for use with one another, and our solutions to them, in particular the development of a novel common deferred evaluation/call-graph description language based on functools.partial, which we hope to make available in the near future as a standalone package.

Capture thumb
Rating: Everyone
Viewed 3 times
Recorded at:
Date Posted: November 5, 2015

Parallel and asynchronous computing in python is crippled by pickle's poor object serialization. However, a more robust serialization package would drastically improve the situation. To leverage the cores found in modern processors we need to communicate functions between different processes -- and that means callables must be serialized without pickle barfing. Similarly, parallel and distributed computing with MPI, GPUs, sockets, and across other process boundaries all need serialized functions (or other callables). So why is pickling in python so broken? Python's ability to leverage these awesome communication technologies is limited by python's own inability to be a fully serializable language. In actuality, serialization in python is quite limited, and for really no good reason.

Many raise security concerns about full object serialization; however, it can be argued that it is not pickle's responsibility to do proper authentication. In fact, one could apply a rather insecure serialization of all objects if the objects were all sent across RSA-encrypted ssh-tunnels, for example.

Dill is a serialization package that strives to serialize all of python. We have forked python's multiprocessing to use dill. Dill can also be leveraged by mpi4py, ipython, and other parallel or distributed python packages. Dill serves as the backbone for a distributed parallel computing framework that is being used to design the next generation of large-scale heterogeneous computing platforms, and has been leveraged in large-scale calculations of risk and uncertainty. Dill has been used to enable state persistence and recovery, global caching, and the coordination of distributed parallel calculations across a network of the world's largest computers.
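As a quick illustration of the difference in coverage, here is a minimal sketch (assuming dill is installed): pickle refuses to serialize a lambda, while dill handles it.

import pickle
import dill

square = lambda x: x ** 2

try:
    pickle.dumps(square)
except (pickle.PicklingError, TypeError, AttributeError) as err:
    print("pickle failed:", err)

payload = dill.dumps(square)   # serialize the callable itself
restored = dill.loads(payload)
print(restored(3))             # -> 9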

http://pythonhosted.org/dill

https://github.com/uqfoundation

http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

http://stackoverflow.com/questions/19984152/what-can-multiprocessing-and-dill-do-together?rq=1

https://groups.google.com/forum/#!topic/mpi4py/1fd4FwdgpWY

http://nbviewer.ipython.org/gist/anonymous/5241793

Capture thumb
Rating: Everyone
Viewed 16 times
Recorded at:
Date Posted: November 4, 2015

Functions from libraries such as scipy.optimize, scipy.spatial, statsmodels, and numdifftools comprise the core of the pySI.calibrate routines, which are automatically constructed depending upon the specified model inputs. As a result, the user can focus on identifying different flow systems and understanding the associated spatial processes, rather than the algorithmic divergences which emerge between different models. After calibration is completed, the estimated parameters and their diagnostic statistics can be reported in a uniform fashion. Using functions within pySI.simulate, the parameter estimates can act as inputs in order to predict new flows. More recently developed models, which do not require input parameters, are also made available, allowing comparisons amongst results from differing conceptual formulations. Finally, results may be visualized with plots and networks via matplotlib, igraph, and networkx. Overall, the pySI framework will increase the accessibility of spatial interaction modelling while also serving as a tool which can help new users understand the associated methodological intricacies.

Within this presentation, the concept of spatial interaction and a few key modelling terms will first be introduced, along with several example applications. Next, two traditional techniques for calibrating spatial interaction models, Poisson generalized linear regression and direct maximum likelihood estimation, will be contrasted. It will then be demonstrated how this new framework will allow users to execute either form of calibration using identical input variables, which are based upon a pandas DataFrame specification, without any significant mathematical or statistical training. Results from two different conceptual models will be compared to illustrate how pySI can be used to explore different methods and models of spatial interaction.
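For readers unfamiliar with the Poisson-regression route, the sketch below shows roughly how a gravity-type model might be calibrated from a pandas DataFrame with statsmodels; the column names are hypothetical and this is not pySI's actual API.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical origin-destination flow data.
flows = pd.DataFrame({
    "flows":    [120, 30, 75, 10, 60, 45],
    "o_pop":    [5000, 5000, 12000, 12000, 8000, 8000],
    "d_pop":    [12000, 8000, 5000, 8000, 5000, 12000],
    "distance": [10.0, 25.0, 10.0, 40.0, 25.0, 40.0],
})

# Poisson GLM calibration of a basic gravity model.
model = smf.glm("flows ~ np.log(o_pop) + np.log(d_pop) + np.log(distance)",
                data=flows, family=sm.families.Poisson()).fit()
print(model.params)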

Capture thumb
Rating: Everyone
Viewed 5 times
Recorded at:
Date Posted: November 5, 2015

As software projects mature and become more robust against bugs, they may also lose some of their runtime performance and memory efficiency. Airspeed velocity (asv) is a new tool to help find those performance degradations before they get out to end users. It automatically runs a benchmark suite over a range of commits in a project's repository, as well as in a matrix of configurations of Python versions and other dependencies. The results, possibly from multiple machines, are then collated and published in a web-based report.

While filling a similar role to projects such as "codespeed" and "vbench", airspeed velocity is designed to be easier to set up and deploy, since it uses only a DVCS repository as its database and the report is deployable to any static web server.

Airspeed velocity provides an easy way to write benchmarks, inspired by "nosetests" and "py.test". It is possible to benchmark runtime, memory usage, or any user-defined metric.

Other features either implemented or in the planning stages include:

tight integration with existing profiling tools, such as RunSnakeRun
parameterized benchmarks to investigate how an algorithm scales with data size
automatic search for degrading commits
The presentation will provide a demo of airspeed velocity, and discuss its early usage for benchmarking the astropy project.
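Benchmarks themselves are plain Python functions or methods discovered by name prefix (time_ for runtime, mem_ for the size of a returned object); the file below is an illustrative sketch, not part of astropy's actual suite.

# benchmarks/bench_sorting.py (illustrative)
import numpy as np

class SortSuite:
    def setup(self):
        # Runs before each benchmark in this class.
        self.data = np.random.RandomState(0).rand(100000)

    def time_quicksort(self):
        np.sort(self.data, kind="quicksort")

    def time_mergesort(self):
        np.sort(self.data, kind="mergesort")

def mem_big_list():
    # asv reports the memory footprint of the returned object.
    return list(range(100000))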

Capture thumb
Rating: Everyone
Viewed 25 times
Recorded at:
Date Posted: November 5, 2015

IPython provides tools for interactive exploration of code and data. IPython.parallel is the part of IPython that enables an interactive model for parallel execution, and aims to make distributing your work on a multicore computer, local clusters or cloud services such as AWS or MS Azure simple and straightforward. The tutorial will cover how to do interactive and asynchronous parallel computing with IPython, and how to get the most out of your IPython cluster. Some of IPython’s novel interactive features will be demonstrated, such as automatically parallelizing code with magics in the IPython Notebook and interactive debugging of remote execution. Examples covered will include parallel image processing, machine learning, and physical simulations, with exercises to solve along the way.

Introduction to IPython.parallel
Deploying IPython
Using DirectViews and LoadBalancedViews
The basic model for execution
Getting to know your IPython cluster:
Working with remote namespaces
AsyncResult: the API for asynchronous execution
Interacting with incomplete results. Remember, it’s about interactivity
Interactive parallel plotting
More advanced topics:
Using IPython.parallel with traditional (MPI) parallel programs
Debugging parallel code
Minimizing data movement
Task dependencies
Caveats and tuning tips for IPython.parallel
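As a taste of the DirectView interface covered in the outline above, here is a minimal sketch using the IPython 2.x API; it assumes a cluster has already been started, for example with "ipcluster start -n 4".

from IPython.parallel import Client

rc = Client()          # connect to the running cluster
dview = rc[:]          # a DirectView on all engines
dview.block = True     # keep this sketch synchronous

# Push to and pull from every engine's namespace.
dview["a"] = 10
dview.execute("b = a * 2")
print(dview["b"])      # one result per engine

# Split a map across the engines.
print(dview.map_sync(lambda x: x ** 2, range(16)))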

Capture thumb
Rating: Everyone
Viewed 22 times
Recorded at:
Date Posted: November 5, 2015

Outline

Overview of astropy (15 minutes) [Greenfield]
Exercise: Import astropy, demonstrate that tools are present and echo simple examples given. (15 minutes: allowing for typical start-up problems)
Units/quantities (15 minutes) [Droettboom]
Exercise: Solve problems using standard units; define new unit; use unit equivalencies; define blackbody function using quantities/units (15 minutes)
Tables (20 minutes) [Aldcroft]
Exercise: read in provided table files and apply requested table manipulations (20 minutes)
Break (15 minutes)
Accessing and updating data
FITS (30 minutes) [Bray]
Exercise: Open supplied data files; manipulate header information; manipulate data; write results; update and append to existing file (30 minutes)
ascii tables (15 minutes) [Aldcroft]
Exercise: Open supplied ascii files, modify and convert into csv files (15 minutes)
coordinates (sky/time) (15 minutes) [Robitaille]
Exercises: solve coordinate/time conversion problems; read in various string representations for coordinates/times; print alternate string representations (15 minutes)
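To give a feel for the units and tables segments of this outline, here is a minimal sketch (current astropy API; the quantities and table values are made up).

from astropy import units as u
from astropy.table import Table

# Units/quantities: arithmetic carries the units along.
speed = 3.0 * u.km / u.s
print(speed.to(u.m / u.s))     # 3000.0 m / s

# Tables: build (or read) a table and work with its columns.
t = Table({"name": ["a", "b", "c"], "flux": [1.2, 3.4, 0.7]})
print((t["flux"] * u.Jy).to(u.mJy))
# Reading from disk works similarly, e.g. t = Table.read("catalog.fits")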

Capture thumb
Rating: Everyone
Viewed 115 times
Recorded at:
Date Posted: November 5, 2015

IPython provides tools for interactive exploration of code and data. IPython.parallel is the part of IPython that enables an interactive model for parallel execution, and aims to make distributing your work on a multicore computer, local clusters or cloud services such as AWS or MS Azure simple and straightforward. The tutorial will cover how to do interactive and asynchronous parallel computing with IPython, and how to get the most out of your IPython cluster. Some of IPython’s novel interactive features will be demonstrated, such as automatically parallelizing code with magics in the IPython Notebook and interactive debugging of remote execution. Examples covered will include parallel image processing, machine learning, and physical simulations, with exercises to solve along the way.

Introduction to IPython.parallel
Deploying IPython
Using DirectViews and LoadBalancedViews
The basic model for execution
Getting to know your IPython cluster:
Working with remote namespaces
AsyncResult: the API for asynchronous execution
Interacting with incomplete results. Remember, it’s about interactivity
Interactive parallel plotting
More advanced topics:
Using IPython.parallel with traditional (MPI) parallel programs
Debugging parallel code
Minimizing data movement
Task dependencies
Caveats and tuning tips for IPython.parallel
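Complementing the DirectView example shown for the earlier session, the load-balanced interface and AsyncResult objects from this outline look roughly like the following (IPython 2.x API, cluster already running).

import time
from IPython.parallel import Client

rc = Client()
lview = rc.load_balanced_view()   # tasks go to whichever engine is free

def slow_square(x):
    import time
    time.sleep(0.5)
    return x ** 2

ar = lview.map_async(slow_square, range(8))   # returns an AsyncResult

# Interact with partial results instead of blocking on everything.
while not ar.ready():
    print("tasks completed so far:", ar.progress)
    time.sleep(0.5)
print(ar.get())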

Capture thumb
Rating: Everyone
Viewed 6 times
Recorded at:
Date Posted: November 4, 2015

Description

For the University of California campuses at Berkeley and San Francisco, the routine management of supplier price files used to be a time-consuming process. It is essential to analyze the sometimes tens of thousands of items a supplier offers to make sure the University doesn't accept larger price increases than its contract allows. A historical record of past purchases is matched against the current and proposed catalogs and then analyzed to ultimately find the percentage increase and the number of products removed. Each University's motivation is to reject a file with larger price increases than contracted, as well as a file from which several previously purchased products have been removed.

To combat the tedious and time-consuming process of manually analyzing the previous spend against the current and proposed files, a Python script was written. It relies heavily on Pandas, as well as NumPy, for computation. The code loads all three files as DataFrames and creates a common variable to compare similar products. It matches what was previously purchased to the identical products in the current and proposed catalogs. After filtering any 'bad' input that would skew the results, several values are computed and the code outputs the figures needed to determine whether a supplier's price file is acceptable. The code even documents each catalog result automatically, so the historical changes are organized and recorded in a CSV.
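A stripped-down sketch of that kind of comparison with pandas; the file and column names here are hypothetical, not the University's actual catalog layout.

import pandas as pd

spend = pd.read_csv("historical_spend.csv")      # part_number, qty_purchased
current = pd.read_csv("current_catalog.csv")     # part_number, price
proposed = pd.read_csv("proposed_catalog.csv")   # part_number, price

merged = (spend.merge(current, on="part_number", how="left")
               .merge(proposed, on="part_number", how="left",
                      suffixes=("_current", "_proposed")))

# Previously purchased items missing from the proposed catalog.
removed = merged["price_proposed"].isnull().sum()

# Spend-weighted percentage price increase on the items that remain.
kept = merged.dropna(subset=["price_current", "price_proposed"])
old_total = (kept["price_current"] * kept["qty_purchased"]).sum()
new_total = (kept["price_proposed"] * kept["qty_purchased"]).sum()
pct_increase = 100.0 * (new_total - old_total) / old_total

print("products removed:", removed)
print("weighted price increase: %.2f%%" % pct_increase)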

This code is a dramatic improvement over the manual process that was historically used. The final numbers are known in a matter of seconds as opposed to hours of Excel or Access analysis. For some suppliers, Excel is not even capable of loading the entire catalog, making any analysis nearly impossible. Python and Pandas have not only made the analysts' time more efficient but have opened the door to several new possibilities.

Although this code has greatly improved this continuous analysis, more advanced techniques could further improve the process. The department soon hopes to use a forecasted spend figure, rather than a historical snapshot, to project spend against the proposed catalog. Moving forward, with its analysts armed with Python knowledge, Strategic Sourcing hopes to extract more meaning from the daily flow of spend data through machine learning techniques.

Capture thumb
Rating: Everyone
Viewed 7 times
Recorded at:
Date Posted: November 4, 2015

In the decade between 1999 and 2008, more newly-approved, first-in-class drugs were found by phenotypic screens than by molecular target-based approaches. This is despite far more resources being invested in the latter, and highlights the rising importance of screens in biomedical research. (Swinney and Anthony, Nat Rev Drug Discov, 2011)

Despite this success, the data from phenotypic screens is vastly underutilized. A typical analysis takes millions of images, obtained at a cost of, say, $250,000, and reduces each to a single number, a quantification of the phenotype of interest. The images are then ranked by that value and the top-ranked images are flagged for further investigation. (Zanella et al, Trends Biotech, 2010)

The images, however, contain a lot more information than just a single phenotypic number. For one, usually only the mean phenotype of all the cells in the image is reported, with no information about variability, even though the distribution of cell shapes in a single image is highly informative (Yin et al, Nat Cell Biol, 2013). Additionally, cells display a variety of off-target phenotypes, independently of the target, that can provide biological insight and new research avenues.

We are developing an unsupervised clustering pipeline, tentatively named high-content-screen unsupervised sample clustering (HUSC), that leverages the scientific Python stack, particularly scipy.stats, pandas, scikit-image, and scikit-learn, to summarize images with feature vectors, cluster them, and infer the functions of genes corresponding to each cluster. The library includes functions for preprocessing images, computing an array of features designed specifically for microscopy images, and accessing a MongoDB database containing sample data. Its API allows easy extensibility by placing screen-specific functions under the screens sub-package. An example IPython notebook with a preliminary analysis can be found here.
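The general shape of such an image-to-feature-vector-to-cluster pipeline, sketched with scikit-image and scikit-learn; this is illustrative only and not the HUSC API.

import numpy as np
from skimage import filters, measure
from sklearn.cluster import KMeans

def image_features(image):
    # Summarize one image as a small, fixed-length feature vector.
    labels = measure.label(image > filters.threshold_otsu(image))
    regions = measure.regionprops(labels)
    areas = [r.area for r in regions] or [0]
    return [image.mean(), image.std(), len(regions), np.mean(areas)]

# In a real screen these would be the acquired microscopy images.
rng = np.random.RandomState(0)
images = [rng.rand(64, 64) for _ in range(20)]

X = np.array([image_features(im) for im in images])
print(KMeans(n_clusters=3, random_state=0).fit_predict(X))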

We plan to use this library to develop a flexible web interface for flexible and extensible analysis of high-content screens, and relish the opportunity to enlist the help and expertise of the SciPy crowd.

Capture thumb
Rating: Everyone
Viewed 11 times
Recorded at:
Date Posted: November 4, 2015

We introduce a high-performance, open-source application written in Python that models genomic data with a context-free grammar (CFG), a construct from formal language theory. This approach is intended to advance fundamental science by delivering a more extensive model of the genetic interaction of diseases. Current comparative models treat genomic sequences as strings, and recent advances are little more than optimizations of the "grep approach". However, a genome is a grammar: it is parsed, follows rules, and has an inherent hierarchical structure. Understanding the structure and rules of this implied grammar is essential for mapping loci to diseases when those loci are distributed across genomic regions.

To produce the CFGs, we have implemented the Sequitur algorithm to run on the AWS Elastic MapReduce platform. This application is written in Python and uses the following packages: MRjob, boto, and pandas. This is a petascale computing pipeline that is successful because it uses inherently scalable services and is able to take advantage of the 100G Internet2 connection between Amazon Web Services and the National Institutes of Health (NIH). This architecture delivers unprecedented transfer speeds and relatively low latency.
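For readers unfamiliar with MRjob, a job is expressed as mapper and reducer methods on a class; the skeleton below (counting 8-mers in sequence lines) shows the programming model only and is not the project's Sequitur implementation.

from mrjob.job import MRJob

class KmerCount(MRJob):
    def mapper(self, _, line):
        seq = line.strip().upper()
        for i in range(len(seq) - 7):
            yield seq[i:i + 8], 1

    def reducer(self, kmer, counts):
        yield kmer, sum(counts)

if __name__ == "__main__":
    # Runs locally by default; "-r emr" submits to Elastic MapReduce.
    KmerCount.run()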

We discuss the advantages of this architecture, especially for groups without comparable local resources. In reviewing the results of our computation, we not only look at methods to measure the utility of our CFG models, but also the computational advantages of this approach. Just like the fastest alignment algorithms, this complex approach still operates within linear-space. In addition, future pairwise comparisons are faster because our CFGs act as a compressed representation of the raw sequence data. Our hope is that this CFG approach is further tested as a replacement for raw sequence analysis. In addition, we hope that our bioinformatics pipeline serves as an example for the SciPy community on how to perform large computations across the many petabytes made available by NIH.

Capture thumb
Rating: Everyone
Viewed 13 times
Recorded at:
Date Posted: November 4, 2015

To assess how well ocean models are performing, the model products need to be compared with data. Finding what models and data exist has historically been challenging because this information is held and distributed by numerous providers. Accessing data has been challenging because ocean models produce gigabytes or terabytes of information, usually stored in scientific data formats like HDF or NetCDF, while ocean observations are often stored in scientific data formats or in databases.

To solve this problem, the Integrated Ocean Observing System (IOOS) has been building a distributed information system based on standard web services for discovery and access. IOOS is now embarking on a nationwide system test using Python to formulate queries, process responses, and analyze and visualize the data. An end-to-end (search-access-analyze-visualize) workflow for assessing storm-driven water levels predicted by coastal ocean models will be discussed, which uses OWSLib for OGC CSW catalog access, Iris for ocean model access, and pyoos (which wraps OWSLib) for Sensor Observation Service data access. Analysis and visualization are done with Pandas and Cartopy, and the entire end-to-end workflow is shared as an IPython Notebook with a custom environment in Wakari.
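A minimal sketch of the catalog-search step with OWSLib; the CSW endpoint and the search phrase below are placeholders, not IOOS's actual configuration.

from owslib.csw import CatalogueServiceWeb
from owslib import fes

csw = CatalogueServiceWeb("http://www.example.com/csw")  # placeholder endpoint

# Find catalog records whose text mentions sea surface height.
query = fes.PropertyIsLike(propertyname="apiso:AnyText",
                           literal="%sea_surface_height%")
csw.getrecords2(constraints=[query], maxrecords=10)

for rec_id, rec in csw.records.items():
    print(rec_id, "-", rec.title)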

Capture thumb
Rating: Everyone
Viewed 30 times
Recorded at:
Date Posted: November 5, 2015

This tutorial is targeted to those who are or soon will be teaching numerical methods or scientific computing and are interested in using Python as the programming language for their course. The tutorial will be useful both to academics teaching university courses and those in industry who run training sessions. No prior knowledge of the IPython notebook is necessary, but participants should have some familiarity with Python, Numpy, and Matplotlib.

IPython notebooks are an excellent medium for teaching numerical methods since they can include both mathematical explanations and executable code in a single document. The tutorial will begin with an introduction to the IPython notebook, emphasizing how to overcome aspects that can be confusing to students. Next we will go over available free resources for

ensuring that students have a suitable computing environment, using either a cloud platform or a packaged distribution
distributing and collecting notebooks
converting notebooks to other formats that may be useful in a course
We will also review a number of excellent existing resources containing IPython notebooks for numerical methods courses. Using these notebooks as examples, we will discuss how to design effective notebooks for teaching, including

typesetting mathematical equations and expressions using LaTeX
Formatting, referencing, and layout using Markdown
inserting complete or partial code snippets
embedding figures and other media
embedding interactive widgets
We will briefly discuss different approaches to using IPython notebooks in a course, including their use as the basis for

homework assignments
short activities during a class session
longer laboratory sessions
Finally, participants will be asked to develop, individually or in small groups, a notebook of their own that could be used as an assignment, classroom exercise, or lecture.
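As one example of the "embedding interactive widgets" item above, a slider-driven plot takes only a few lines; this uses the IPython 2.x import path (later versions move interact into the ipywidgets package).

from IPython.html.widgets import interact
import numpy as np
import matplotlib.pyplot as plt

def plot_sine(freq=1.0):
    # Let students explore how frequency changes the waveform.
    x = np.linspace(0, 2 * np.pi, 200)
    plt.plot(x, np.sin(freq * x))
    plt.show()

interact(plot_sine, freq=(0.5, 5.0))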

Capture thumb
Rating: Everyone
Viewed 11 times
Recorded at:
Date Posted: November 4, 2015

Detailed Abstract

In recent years IPython, a comprehensive environment for interactive and exploratory computing, has emerged as a must-have application in the daily scientific workflow because it provides not only an enhanced interactive Python shell (terminal- or Qt-based), but also a very popular interactive browser-based notebook with a remarkable scope.

The presentation of our research results and the teaching of our knowledge are very important steps in the scientific research workflow, and recently the IPython Notebook has begun to be used for all kinds of oral communications at conferences, courses, classes and bootcamps.

Although we can present our talks with the IPython notebook itself, or through a static Reveal.js-based slideshow powered by IPython.nbconvert (a tool we presented at SciPy 2013), there is no full-featured, executable (live) IPython presentation tool available. So we developed a new IPython-Reveal.js-powered live slideshow extension, designed specifically to be used directly from the IPython notebook and to be as executable as the notebook is (because, deep inside, it is the notebook itself, rendered with another face). It is also packed with features to address the most common tasks performed during the oral presentation and spreading of our scientific work, such as main and nested slides, fragment views, transitions, themes, and speaker notes.

To conclude, we have developed a new visualization tool for the IPython Notebook, suited to the final steps of the scientific research workflow, providing not only an enhanced experience in the oral presentation and communication of our results, but also a powerful tool at teaching time, helping us to easily transfer our concepts and spread our knowledge.

Important

You can see the extension in action in the following video link: http://www.youtube.com/watch?v=Pc-1FS0l2vg

And you can also see the source code of the extension at the following repository: http://github.com/damianavila/live_reveal

Capture thumb
Rating: Everyone
Viewed 12 times
Recorded at:
Date Posted: November 4, 2015

We present a new method for distributing and using Python that requires no dependencies beyond the Google Chrome web browser. By combining the static linking methodology of traditional supercomputer-style deployments of Python with the technology Portable Native Client (PNaCl) we have constructed a method for building, deploying, and sharing fully-sandboxed scientific python stacks that require no client-side installation: the entire IPython notebook and scientific python stack, in a website, at native speeds. We will present this technology, along with some of its potential applications, describing its shortcomings and future extensibility. We will conclude by demonstrating an IPython notebook run completely client side with no out-of-browser components, backed by Google Drive and an HTML5 File System, and able to pass numpy arrays as typed arrays into the browser without serialization as JSON.

We will begin by briefly describing the problems with deploying scientific python as a stack, particularly the dependency graph, installation time, and so on.

We'll describe the PNaCl technology and build system for scientific python, including how individuals can create their own .pexes with their own application stack

We'll describe potential applications, such as bundling safe, sandboxed executables with scripts and lessons

We will demonstrate a complete system for running the IPython notebook in a sandboxed, Google Chrome window

We'll conclude by describing methods that this system could be extended to run sandboxed python executables on any system, independent of the Chrome web browser, such as supercomputers and non-virtualized hosting providers

Capture thumb

The tutorial will cover four hours with the following topics

Introduction (10min)
History of scientific societies and publications
Leeuwenhoek was the Man !
The Invisible College
Nullius in Verba
Replication of the early microscope experiments by Leeuwenhoek
Image Acquisition (15 min)
Hands on: Cell camera phone microscope
With drop of water
Hands on: Each pair acquires images
Data Sharing (45min)
Image gathering, storage, and sharing (15min)
GitHub (www.github.com)
Figshare (www.figshare.com)
Midas (www.midasplatform.com)
Hands on: Upload the images
Metadata Identifiers (15 min)
Citable
Machine Readable
Hands on: Create data citation and machine readable metadata
Hands on: Download data via RESTful API (15min)
Provenance and Python scripts
Hands on: Download the data via HTTP
Break (10min)
Local processing (60min)
Replication Enablement (20min)
Package versioning
Virtual Machines
Docker
Cloud services
Hands on:
Create a virtualenv
Run our tutorial package verification script
Revision Control with Git (20min)
Keeping track of changes
Unique hashes
Hands on:
Forking a repository in GitHub
Cloning a repository
Creating a branch
Making a commit
Pushing a branch
Diffing
Merging
Pushing again
Create pull request
Python scripts (20min)
Data analysis, particle counting.
Hands on:
Run scripts on new data
Generate histogram for the data
Testing (30min)
Unit testing with known data
Regression testing with known data
Hands on:
Run tests
Add coverage for another method to the unit tests
Break (10min)
Publication Tools (30min)
Article generation
RST to HTML
GitHub replication and sharing
Hands on:
Run dexy to generate the document
Reproducibility Verification (30min)
Reproducing Works
Publication of Positive and Negative results
Hands on:
Create Open Science Framework (OSF) project
Connect Figshare and Github to OSF project
Fork or link another group’s project in the OSF to run dexy on their work
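To make the testing segment above concrete, a toy particle-counting function with a unit test against known data might look like the following; the names are hypothetical and this is not the tutorial's actual script.

import unittest
import numpy as np
from scipy import ndimage

def count_particles(image, threshold=0.5):
    # Count connected bright regions in a grayscale image.
    _, n_particles = ndimage.label(image > threshold)
    return n_particles

class TestCountParticles(unittest.TestCase):
    def test_two_particles(self):
        image = np.zeros((10, 10))
        image[2:4, 2:4] = 1.0    # first particle
        image[7:9, 6:9] = 1.0    # second particle
        self.assertEqual(count_particles(image), 2)

if __name__ == "__main__":
    unittest.main()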
Infrastructure:

Attendees will use software installed on their laptops to gather and process data, then publish and share a reproducible report.

They will access repositories in GitHub, upload data to a repository and publish materials necessary to replicate their data analysis.

We expect that the wireless network will have moderate bandwidth, allowing all attendees to move data, source code and publications between their laptops and hosting servers.

Capture thumb
Rating: Everyone
Viewed 10 times
Recorded at:
Date Posted: November 5, 2015

The aim of this course is to introduce new users to the Bayesian approach of statistical modeling and analysis, so that they can use Python packages such as NumPy, SciPy and PyMC effectively to analyze their own data. It is designed to get users quickly up and running with Bayesian methods, incorporating just enough statistical background to allow users to understand, in general terms, what they are implementing. The tutorial will be example-driven, with illustrative case studies using real data. Selected methods will include approximation methods, importance sampling, Markov chain Monte Carlo (MCMC) methods such as Metropolis-Hastings and Slice sampling. In addition to model fitting, the tutorial will address important techniques for model checking, model comparison, and steps for preparing data and processing model output. Tutorial content will be derived from the instructor's book Bayesian Statistical Computing using Python, to be published by Springer in late 2014.

PyMC forest plot

DAG

All course content will be available as a GitHub repository, including IPython notebooks and example data.

Tutorial Outline
Overview of Bayesian statistics.
Bayesian Inference with NumPy and SciPy
Markov chain Monte Carlo (MCMC)
The Essentials of PyMC
Fitting Linear Regression Models
Hierarchical Modeling
Model Checking and Validation
Installation Instructions
The easiest way to install the Python packages required for this tutorial is via Anaconda, a scientific Python distribution offered by Continuum Analytics. Several other tutorials will be recommending a similar setup.

One of the key features of Anaconda is a command line utility called conda that can be used to manage third party packages. We have built a PyMC package for conda that can be installed from your terminal via the following command:

conda install -c https://conda.binstar.org/pymc pymc
This should install any prerequisite packages that are required to run PyMC.

One caveat is that conda does not yet have a build of PyMC for Python 3. Therefore, you would have to build it yourself via pip:

pip install git+git://github.com/pymc-devs/pymc.git@2.3
For those of you on Mac OS X that are already using the Homebrew package manager, I have prepared a script that will install the entire Python scientific stack, including PyMC 2.3. You can download the script here and run it via:

sh install_superpack_brew.sh
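Once installed, a PyMC 2.3-style model is built from stochastic variables and sampled with MCMC. The sketch below estimates the mean of some made-up data; it is illustrative only and not taken from the tutorial materials.

import numpy as np
import pymc as pm

data = np.random.RandomState(42).normal(loc=3.0, scale=1.0, size=50)

mu = pm.Normal("mu", mu=0.0, tau=0.001)                      # vague prior
obs = pm.Normal("obs", mu=mu, tau=1.0, value=data, observed=True)

M = pm.MCMC([mu, obs])
M.sample(iter=10000, burn=2000)
print(mu.trace().mean())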

Capture thumb
Rating: Everyone
Viewed 17 times
Recorded at:
Date Posted: October 28, 2015

The choice of colormap in a scientific figure significantly affects the way the presented information is perceived by the viewer. This talk follows on Damon McDougall's talk on how to choose a colormap for an application by delving deeper into several important issues and into how well many of the available Matplotlib colormaps stand up against those concerns. For example, it is known that the human brain is better able to interpret changes in the magnitude of the luminance and saturation of colors in colormaps than changes in hue. Also, some research has shown that logarithmic changes in brightness are perceived as linear changes. Next, being able to print a color plot in black and white from a published paper is sometimes mandatory and often desirable, and this depends on the grey scale of the colormap. Finally, it is important to account for the various types of color blindness when choosing a divergent colormap, so that the plot is as accessible as possible. All of these concerns have implications for the design of colormaps, and will be examined in the context of the properties of the available Matplotlib colormaps in order to make a best choice for a given application.
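One of these properties, perceived brightness, can be checked directly from a colormap's RGB values; here is a quick sketch using the common Rec. 601 luma weights ('jet' is only an example).

import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap("jet")
x = np.linspace(0.0, 1.0, 256)
rgb = cmap(x)[:, :3]                        # drop the alpha channel
luma = np.dot(rgb, [0.299, 0.587, 0.114])   # approximate luminance

plt.plot(x, luma)
plt.xlabel("colormap position")
plt.ylabel("approximate luminance")
plt.title("Luminance profile of the 'jet' colormap")
plt.show()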

Capture thumb
Rating: Everyone
Viewed 25 times
Recorded at:
Date Posted: November 5, 2015

In scientific data mining and machine learning, a fundamental division is that of the frequentist and Bayesian approaches to statistics. Often the fodder for impassioned debate among statisticians and other practitioners, the subtle philosophical differences between the two camps can lead to surprisingly different practical approaches to the analysis of scientific data.

In this talk I will delve into both the philosophical and practical aspects of Bayesian and frequentist approaches, drawing from a series of posts from my blog.

I'll start by addressing the philosophical differences between frequentism and Bayesianism, which boil down to different definitions of probability. I'll next move briefly into the mathematical details behind the two approaches, at a level which will be informative to a general scientific audience. I'll then show some examples of the two approaches applied to some increasingly more complicated problems using standard Python packages, namely: NumPy, SciPy, Matplotlib, and emcee.

With this combination of philosophy and practical examples, the audience should walk away with a much better understanding of the differences between frequentist and Bayesian approaches to statistical analysis, and especially how the philosophy of each approach affects the practical aspects of computation in data-intensive scientific research.
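As a tiny taste of how the two approaches differ in practice, consider estimating a coin's heads probability from 16 heads in 25 flips; this is a standard toy example, not one drawn from the talk itself.

import numpy as np
from scipy import stats

heads, n = 16, 25

# Frequentist: maximum-likelihood estimate plus a normal-approximation CI.
p_hat = heads / float(n)
se = np.sqrt(p_hat * (1 - p_hat) / n)
print("MLE: %.3f, approx. 95%% CI: (%.3f, %.3f)"
      % (p_hat, p_hat - 1.96 * se, p_hat + 1.96 * se))

# Bayesian: with a flat Beta(1, 1) prior the posterior is Beta(17, 10).
posterior = stats.beta(heads + 1, n - heads + 1)
print("posterior mean: %.3f" % posterior.mean())
print("95%% credible interval: (%.3f, %.3f)" % posterior.interval(0.95))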

Capture thumb
Rating: Everyone
Viewed 119 times
Recorded at:
Date Posted: November 5, 2015

Image analysis is central to a boggling number of scientific endeavors. Google needs it for their self-driving cars and to match satellite imagery and mapping data. Neuroscientists need it to understand the brain. NASA needs it to map asteroids and save the human race. It is, however, a relatively underdeveloped area of scientific computing. Attendees will leave this tutorial confident of their ability to extract information from their images in Python.

Attendees will need a working knowledge of numpy arrays, but no further knowledge of images or voxels or other doodads. After a brief introduction to the idea that images are just arrays and vice versa, we will introduce fundamental image analysis operations: filters, which can be used to extract features such as edges, corners, and spots in an image; morphology, inferring shape properties by modifying the image through local operations; and segmentation, the division of an image into meaningful regions.
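Those three building blocks map directly onto scikit-image functions; the sketch below runs them on a synthetic image and is illustrative only, not the tutorial notebook itself.

import numpy as np
from skimage import filters, morphology, measure

# A synthetic image: two bright blobs on a noisy background.
rng = np.random.RandomState(0)
image = rng.normal(0.1, 0.05, (128, 128))
image[20:40, 20:40] += 1.0
image[80:110, 70:100] += 1.0

edges = filters.sobel(image)                          # filtering: find edges
binary = image > filters.threshold_otsu(image)        # global threshold
clean = morphology.remove_small_objects(binary, 64)   # morphology: clean up
labels = measure.label(clean)                         # segmentation: label blobs

for region in measure.regionprops(labels):
    print("blob of", region.area, "pixels at", region.centroid)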

We will then combine all these concepts and apply them to several real-world examples of scientific image analysis: given an image of a pothole, measure its size in pixels; compare the fluorescence intensity of a protein of interest in the centromeres versus the rest of the chromosome; and observe the distribution of cells invading a wound site.

Attendees will also be encouraged to bring their own image analysis problems to the session for guidance, and, if time allows, we will cover more advanced topics such as image registration and stitching.

The entire tutorial will be coordinated with the IPython notebook, with various code cells left blank for attendees to fill in as exercises.

Capture thumb
Rating: Everyone
Viewed 14 times
Recorded at:
Date Posted: November 4, 2015

Web of Trails (WOT) is an open source project that uses context-free grammars (CFG's) as the basic building block for search. Current search technology relies upon the presence of words on a page, sometimes augmented with statistical correlations among words. Even with these restrictions, maintenance of an index requires storage much greater than the input size (a polynomial function of it). CFG's have been used for decades in compilation and language tools, and more recently in data compression.

The primary advantage of this CFG approach, based upon the Sequitur algorithm, is that it indexes content in linear space, not polynomial space. The secondary advantage is that, combined with research in inference, grammars can express human concepts and connections rather than just correlations. This project uses grammar and syntactic analysis to replace lexical and word-based approaches to the problem of searching collections of digital artifacts. Benchmarking of web content indexing will be shown relative to popular alternatives such as Apache Lucene and Amazon CloudSearch.

In addition to implementing content indexing with Sequitur, this project will enable domain-specific extensions of WOT. Once complete, we will research novel techniques for generalizing the grammars inferred by Sequitur. As this fundamental research develops, it will inform later framework development and increase search precision. This is a big leap in the state of the art, as text artifacts are no longer represented as bags of words, but as bags of non-terminals in a growing and adapting grammar.

Capture thumb
Rating: Everyone
Viewed 14 times
Recorded at:
Date Posted: October 29, 2015

The FOSSEE team has been promoting the use of FOSS in educational institutions in India. It currently focuses on the following FOSS systems: Scilab, Python, Oscad, OpenFOAM, COIN-OR and OpenFormal (http://fossee.in). For each of these systems, the following three standardised forms of help are provided:

Support to conduct spoken tutorial based workshops, explained below
Creation of Textbook Companion (TBC)
Support to Lab Migration.
A TBC is a collection of code for solved examples of standard textbooks. We have completed a large number of TBCs on Scilab and made them available for online use at Scilab and offline use at Completed Books. Similarly, one may access the Python TBC at Python TBC. These TBCs are created by students and teachers from many colleges and each creator is paid an honorarium through a project funded by the Government of India.

Spoken Tutorial is a screencast of ten minute duration on a FOSS topic, created for self learning. Using Spoken Tutorials, we conduct two hour long workshops on FOSS topics through volunteers, who need not be experts. We conduct online tests and provide certificates for all who pass the tests. All of these are done completely free of cost, thanks to the financial support from the Government of India. Using this method, we have trained more than 200K students in the last two years in India (statistics).

The students love this method, see some testimonials at http://www.spoken-tutorial.org/testimonials. It is being increasingly accepted by colleges and universities, officially. We expect to conduct 5K to 10K workshops in 2014, training 200K to 500K students. As we dub the spoken part into all 22 languages of India, the FOSS topics are accessible also to students who are not fluent in English, thereby helping us reach out to many students. Spoken Tutorial, which started as a documentation project for FOSS systems has transformed into a massive training programme. Our methods are scalable and are available to the FOSS enthusiasts in the rest of the world.

Capture thumb
Rating: Everyone
Viewed 9 times
Recorded at:
Date Posted: November 5, 2015

Good solutions to hard problems require both domain and algorithmic expertise. Domain experts know what to do and computer scientists know how to do it well. Coordination between the algorithmic and the domain programmer is challenging to do well and difficult to scale. It is also arguably one of the most significant obstacles to scientific progress today.

This talk draws from experience supporting mathematical programmers in the SymPy project. SymPy is a computer algebra system, a complex problem that requires the graph manipulation algorithms of a modern compiler alongside the mathematics of several PhD theses. SymPy draws from a broad developer base with experienced and novice developers alike and so struggles to maintain a cohesive organized codebase.

We approach this development problem by separating software engineering into a collection of small functions, written by domain experts, alongside an abstract control system, written by algorithmic programmers. We facilitate this division with techniques taken from other languages and compiler technologies. Notably we motivate the use of a few general purpose libraries for multiple dispatch, pattern matching, and programmatic control.
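One such general-purpose tool is the multipledispatch package: domain experts register small, type-specific functions, and the algorithmic layer simply calls one generic name. The sketch below illustrates the technique in general, not SymPy's internal use of it.

from multipledispatch import dispatch

@dispatch(int, int)
def combine(a, b):
    return a + b

@dispatch(str, str)
def combine(a, b):
    return a + " " + b

@dispatch(list, list)
def combine(a, b):
    return a + b

print(combine(1, 2))           # 3
print(combine("sym", "py"))    # sym py
print(combine([1], [2, 3]))    # [1, 2, 3]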

Capture thumb
Rating: Everyone
Viewed 11 times
Recorded at:
Date Posted: November 5, 2015

Astronomical data (whether images on the sky, or other data) are typically stored with information about their corresponding projection (Gnomonic, Mercator, Conical, Aitoff, and many more) and coordinate system (Equatorial, Galactic, Ecliptic, and so on).

I will present WCSAxes, a new framework for plotting such astronomical data, developed as part of the Astropy project. WCSAxes consists primarily of a Matplotlib Axes sub-class that seamlessly handles the plotting of ticks, tick labels, and grid lines for arbitrary coordinate systems and projections.

As an example, the following plot was produced with WCSAxes:

The Galactic Center as seen by Chandra

(Image Credit: NASA/CXC/UMass/D. Wang et al. - http://chandra.harvard.edu/photo/2009/gcenter/)

Since it is a sub-class of the Matplotlib Axes class, all the default Matplotlib methods such as plot, scatter, imshow, contour, as well as patches, lines, collections, and so on are supported, and WCSAxes - in combination with Matplotlib's ability to accept arbitrary transformations - makes it very easy to define whether the plotting should apply to pixel coordinates, or a world coordinate system related to the data.
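In practice, plotting with WCSAxes looks very much like ordinary Matplotlib. The sketch below assumes a FITS image with WCS information and an installation where WCSAxes supplies the projection machinery; the file name and marker coordinates are placeholders.

import matplotlib.pyplot as plt
from astropy.io import fits
from astropy.wcs import WCS

hdu = fits.open("image.fits")[0]       # placeholder file
wcs = WCS(hdu.header)

ax = plt.subplot(projection=wcs)       # a WCSAxes instance
ax.imshow(hdu.data, origin="lower", cmap="gray")
ax.coords.grid(color="white", alpha=0.5)
ax.set_xlabel("Right Ascension")
ax.set_ylabel("Declination")

# Overplot a marker in world coordinates rather than pixel coordinates.
ax.plot(266.4, -29.0, "o", transform=ax.get_transform("world"))
plt.show()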

WCSAxes has been designed as a framework that can be easily used in other Python tools, and it is planned for inclusion in Glue, APLpy, and other astronomical tools. While originally written for Astronomical images, it should be easily extendable to any kind of map (such as Earth-based geospatial data) provided that the projection and coordinate system can be represented by a pixel-to-world transformation.