Picks from PyData 2012

A selection of interesting videos from PyData, a “semi-annual” event for scientists, engineers, and data analysts in the Python community.

Python and Javascript Web Visualizations – Chris Mueller from Continuum Analytics on Vimeo.

Web-based, data-intensive applications have historically been limited in the types of interactive visualizations they can present to end-users, relying either on server-side applications or plugins for rendering charts and plots. The addition of HTML5, Canvas, and SVG to most modern browsers, along with the large performance improvements in JavaScript interpreters, has made it possible to create highly interactive visualizations directly in the browser. In this talk, Chris will show how to create a fully interactive visualization system for exploring large data sets using Python and JavaScript. Chris will also introduce some of the common JavaScript libraries that streamline client-side development and help developers create well-architected user interfaces in the browser.

This talk was presented at PyData NYC 2012: http://nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: http://sv2013.pydata.org/

SciKit-Learn Tutorial – Jake VanderPlas from Continuum Analytics on Vimeo.

Machine learning is a discipline involving algorithms designed to find patterns in data and make predictions about it. It is nearly ubiquitous in our world today, used in everything from web searches to financial forecasts to studies of the nature of the Universe. This tutorial will offer an introduction to scikit-learn, a Python machine learning package, and to the central concepts of machine learning. We will introduce the basic categories of learning problems and how to implement them using scikit-learn. From this foundation, we will explore practical examples of machine learning using real-world data, from handwriting analysis to automated classification of astronomical images.

NLTK and Text Processing – Andrew Montalenti from Continuum Analytics on Vimeo.

Python's Natural Language Toolkit is one of the most widely used and actively developed natural language processing libraries in the open source community. This workshop will introduce the audience to NLTK: what problems it aims to solve, how it differs from other natural language libraries in approach, and how it can be used for large-scale text analysis tasks. Concrete examples will be taken from Parse.ly's work on news article analysis, covering areas such as entity extraction, keyword collocations, and corpus-wide analysis.
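
To give a feel for what "keyword collocations" means, here is a minimal sketch that uses only the Python standard library. It is not NLTK's API and not Parse.ly's method; the function names `tokenize` and `top_collocations`, the regex tokenizer, and the choice of pointwise mutual information (PMI) as the score are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase words only, punctuation dropped.
    return re.findall(r"[a-z']+", text.lower())


def top_collocations(text, n=3, min_count=2):
    """Return up to n adjacent word pairs ranked by an approximate PMI score."""
    tokens = tokenize(text)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable scores
        # PMI ~ log2(P(w1, w2) / (P(w1) * P(w2))), using the token count
        # as an approximation of the bigram count.
        pmi = math.log2(count * total / (unigrams[w1] * unigrams[w2]))
        scored.append(((w1, w2), pmi))

    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in scored[:n]]
```

Calling `top_collocations(article_text)` on a news article would surface word pairs that occur together more often than chance predicts. A library such as NLTK offers more robust tokenization and a wider choice of association measures than this sketch.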

Wikipedia Indexing and Analysis – Didier Deshommes from Continuum Analytics on Vimeo.

Wikipedia’s corpus makes it ideal for natural language processing (NLP) tasks. This talk will cover how to extract data from Wikipedia for your own use with Python, MongoDB and Solr; it will also cover how to use this data for familiar NLP tasks such as named entity recognition and suggesting related articles.

IPython Notebook – Brian Granger from Continuum Analytics on Vimeo.

Data-focused computing involves many stages: exploration, visualization, production-mode computing, collaboration, debugging, development, presentation and publication. The IPython Notebook is a web-based interactive computing environment that can carry the data scientist through all of these stages. The Notebook enables users to build documents that combine live, runnable code with text, LaTeX formulas, images and videos. These documents can be version-controlled and shared, and they preserve a full record of a computation, its results and accompanying material. In this talk I will introduce the Notebook, show how to configure and run it, illustrate its main features and discuss its future.

Matplotlib Tutorial – Jake VanderPlas from Continuum Analytics on Vimeo.

An important part of data-intensive scientific computing is data visualization. Matplotlib offers a full-featured data visualization package within Python, built to interface well with NumPy, SciPy, IPython, and related tools. In this tutorial we will introduce and explore the basic features of plotting with matplotlib, from simple plots such as line diagrams, scatter plots, and histograms to more sophisticated features such as three-dimensional plotting and animations.

Intro to SciPy – Hugo Shi and Travis Oliphant from Continuum Analytics on Vimeo.

PyTables – Francesc Alted from Continuum Analytics on Vimeo.

HDF5 is a de facto standard specification for binary files. However, what makes HDF5 great is the numerous libraries available for interacting with files of this type, and their extremely rich feature set. HDF5 has bindings for many languages, such as C, C++, Fortran, Java, Perl and, of course, Python.

During my tutorial I'm going to explain the basics of using HDF5 through PyTables, one of the Python bindings for HDF5, and how PyTables leverages (and enhances) HDF5 capabilities so as to cope with extremely large datasets, especially in tabular format.

I'll start by describing the basic capabilities that PyTables exposes from HDF5, like creating and accessing large multidimensional datasets, both homogeneous and heterogeneous, and how they can be annotated with user-defined metadata (attributes).

Then I'll move on to specific features of PyTables, like high-performance compressors (Blosc), automatic parametrization for optimizing performance, and very fast queries (using OPSI, a query engine that allows different size/performance ratios in the indexes). I'll finish with a glimpse of how to perform out-of-core (also called out-of-memory) computations on huge datasets in a very efficient, memory-conscious way (via the high-performance numexpr library).
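
To make the out-of-core idea concrete, here is a minimal sketch using only the Python standard library. It does not use PyTables, HDF5 or numexpr, and it is not how those libraries are implemented; the function name `chunked_mean` and the raw float64 file layout are assumptions made for illustration. The point is only that a dataset larger than memory can be reduced by streaming it in fixed-size chunks.

```python
import array


def chunked_mean(path, chunk_items=4096):
    """Mean of float64 values in a raw binary file, read one chunk at a time.

    Assumes the file holds only native-endian 8-byte floats (a hypothetical
    layout for this sketch). Only one chunk is held in memory at any moment.
    """
    item_size = array.array("d").itemsize
    total = 0.0
    count = 0
    with open(path, "rb") as fh:
        while True:
            data = fh.read(chunk_items * item_size)
            if not data:
                break  # end of file
            chunk = array.array("d")
            chunk.frombytes(data)
            total += sum(chunk)
            count += len(chunk)
    return total / count if count else float("nan")
```

Memory use depends on `chunk_items` rather than on the file size, which is the property that out-of-core tools rely on; the libraries named in the abstract add compression, indexing and optimized expression evaluation on top of that basic pattern.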

Python for Business Intelligence – Stefan Urbanek from Continuum Analytics on Vimeo.

An introduction to business intelligence, data warehousing and online analytical processing (OLAP) with Cubes. Cubes is a lightweight Python framework and OLAP server that provides business point-of-view modeling for multidimensional data analysis.

