| (H)HMM Library and Designer: This library allows probabilistic sequence models to be constructed using Hidden Markov models (HMMs) and Hierarchical Hidden Markov models (HHMMs) in the OCaml programming language. [GPL] |
| A Boosting Algorithm for Classification of Semi-Structured Text: Algorithmic classification of text 'word bags' in order to speed online processing. Article only. |
| A Brief Introduction to Graphical Models and Bayesian Networks: Good explanations, with algorithms, of some of the tools for dealing with uncertainty and complexity, including inference, explaining away, top-down and bottom-up reasoning, causality, conditional independence, and more. Article only. |
| A Neural Network Face Recognition Assignment: A full dataset for face recognition with face images directory. [AFL] |
| AAAI Machine Learning Directory: AI Topics provides basic, understandable information and helpful resources concerning artificial intelligence, with an emphasis on material available online. By the American Association for Artificial Intelligence. [FREE] |
| ABC++ of Neural Nets: The Programming of Common Models: Information and software for hands-on neural net programming. [GPL] |
| AI Horizon: Computer Science and Artificial Intelligence Programming Resources: Directory of artificial intelligence resources with some nice essays on chess and Go AI with source code links. [GPL] |
| Algorithms and Data Structures for Machine Learning: Introduction to algorithms as well as examples of 1D, 2D lists, stacks, queues, hierarchical lists, priority queues, table, trees, and much more. Many with online demonstrations. |
| ANNI: Artificial Neural Network Investing: A program to perform securities modeling using artificial neural networks and genetic algorithms for customizable prediction. [COM] |
| Applying Machine Learning to Solve an Estimation Problem in Software Inspections: Study of applying neural network processing to software inspections. Article only. |
| Artificial Intelligence for Beginners: Short overview on AI in game theory with links to game sites and references. Article only. |
| Bayes Net Toolbox for Matlab: Supports several inference algorithms and learning algorithms. Allows simulation of static and dynamic networks, including HMMs, IOHMMs, and Kalman filters. [GPL] |
| Bayes++: Open Source Bayesian Filtering Classes: A library of C++ classes for Bayesian Filtering of discrete systems. [MIT] |
| BayesBuilder: Bayesian network construction tool: This tool supports discrete gaussians and efficient noisy-OR nodes necessary in large networks. Node search, undo/redo, and automatic network layout are also supported. Written in C++ with a Java front end. [GPL] |
| BETSY: A Bayesian Essay Test Scoring System: A Windows-based program that classifies text based on trained material. Designed for automated essay scoring, BETSY can be applied to any text classification task. [GPL] |
| BNET, Belief Network Tools & VisionKit, Computer Vision Components: A developer toolkit for researchers and engineers to embed belief networks in software applications. Nice online demo. [COM] |
| Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering: A library of C code useful for writing statistical text analysis, language modeling, and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow). [LGPL] |
| C4.5 and FOIL: The home page of R. Quinlan with FTP links to FOIL (inductive logic programming) and C4.5 (learning decision trees). [LGPL] |
| Chat Room Bots: Part 1: A summary study of these programs: what they are, where they come from, and how they are developed. Article only. |
| Chat Room Bots: Part 2: The broadcasting strategy of these software entities. Article only. |
| CHILL: Constructive Heuristics Induction for Language Learning: A general approach to the problem of inducing natural language parsers. It uses an annotated corpus, and produces a parser by using ILP for inducing the rules that control the actions of a shift-reduce parser. [FREE] |
| Classification Toolbox for MATLAB: A complete set of algorithms for classification, clustering, feature selection and reduction for Matlab. [FREE] |
| Courses for Machine Learning, Neural Networks, Probabilistic Graphical Models, and Bayesian Networks: Directories of links for each area of study. Some are commercial courses but many are online and free. [COM], [FREE] |
| Crespin: A Neural Network Builder: Software package to calculate architecture and weights of binary output perceptron neural networks. Solves benchmark problems like Sonar (dim=60, datavectors=211) and XOR-8 (dim=8, datavectors=256) in seconds. [COM] |
| CWT - Cognitive Wireless Technology: Optimizing radio signal performance metrics by use of biologically inspired algorithms. Article only. |
| DARPA's new plan for machine learning: Overview of the US Defense Department's project to develop "computer software able to learn and reason in complex military planning jobs by being shown how to perform a task only once." Article only. |
| Data Access Tools from the US Census Bureau: General purpose data display and extraction tools that work with Census Bureau data. Census data available for pickup through Census Bureau employees only. |
| Decision Trees: A program for inducing Bayesian decision trees to handle large classification problems requiring a large number of classes and attributes for speech application. Current package features Bayesian splitting, Information gain splitting, Gain ratio splitting, and Pessimistic pruning. [FREE] |
| Dictionary of Machine Learning: Includes links to other dictionaries for AI, NLP, and Prolog. [FREE] |
| ELIE: An Adaptive Information Extraction System: A tool for adaptive information extraction from text. Also included are a number of other text processing tools for POS tagging, chunking, gazetteer, and stemming. [GPL] |
| EM algorithm for Mixture models (test version): Shotaro Akaho's implementation of EM algorithm for modeling Mixtures of Gaussians in Java. An extended version is available from the author. [FREE] |
| Experience-based Language Acquisition: Computational model of human language acquisition written in Java; currently acquires a protolanguage of nouns and verbs based on visual perception. [BSD] |
| FastMix: Generates Gaussian mixture models for large datasets using efficient KD-clustering algorithms. [FREE] |
| Fraud and Artificial Intelligence: New machine-learning technology may help businesses detect suspicious activity and mitigate the risk of fraudulent transactions. Article only. |
| FTP Repository Site List for Cognitive and Machine Learning: Anonymous sites from popular colleges and universities. To access other pages, replace the 6 in the URL with a number from 1 to 23. |
| Fuzzy C-Means Algorithms In Remote Sensing: Modelling data by 'clusters' to glean knowledge from satellite imagery. Article only. |
| General Hidden Markov Model Library: Hidden Markov Models software library from the Center of Applied Informatics, Cologne. Includes algorithms such as Viterbi, Baum-Welch, and Forward-Backward. [GPL] |
| Grammatical Induction: Directory of grammatical induction links including online courses and the EMILE and ABL applications. [GPL] |
| GRIDLOCK: A Scalable Approach to Unifying Computer and Communications Security: Software based on a globally specified, locally interpreted policy language for specifying network access control policy. [GPL] |
| Heuristic Evaluation: A group of articles on using heuristics in web-based user interface design. Articles only. [FREE] |
| HMM and other statistical programs: This tool implements Hidden Markov Models and their application to part-of-speech tagging. Also available: multivariate hypothesis-testing software for Gaussian data, and a ground-truth/metadata editing and visualization toolkit for OCR. [GPL] |
| HMMER: Biosequence Analysis: A tool used to build HMMs from multiple alignments and calculate E-values. [GPL] |
| HRE API: A Portable Handwriting Recognition Engine: This engine is a functionally complete interface for handwriting recognition. API was written in ANSI C and has minimal reliance on the Windows system. There is a version ported to Linux. [GPL] |
| Inductive Inference and Decision Trees Overview: Concise study of the individual algorithms involved in game theory. Article only. |
| ITI: Incremental Tree Inducer: An algorithm that incrementally constructs decision trees from labeled examples. [AFL] |
| Kardi Teknomo's Tutorials on Machine Learning: Complete explanations of various ML formulae including K Means clustering, Bootstrap sampling, Q Learning by example, generalized inverse, and several more. Some with downloadable code. [FREE] |
| KEEL: Knowledge Extraction based on Evolutionary Learning: The aim of this project is to develop a Computational Environment for integrating the design and use of knowledge extraction models from data using evolutionary algorithms. Genetic learning may also be applied to the model. [GPL] |
| Lemga: Learning Models and Generic Algorithms: A library of classes for optimizing (training) the generic models. Written in C++. [GPL] |
| LENS: Neural Network Simulator: No fewer than five learning algorithms are used by this toolset: steepest descent, momentum descent, "Doug's momentum descent", delta-bar-delta, and quick-prop. Multiple networks can use the same dataset. Written in C using tcl/Tk toolkit. [GPL] |
| libbpfl - A Bayesian Probability Filtering Library: A general purpose library for Bayesian filtering written in C++. [LGPL] |
| LingPipe: Natural Language Processor (NLP): A suite of Java libraries for the linguistic analysis of human language which can link entity mentions to database entries, uncover relations, cluster documents, and discover significant trends. [GPL] |
| LNKnet Pattern Classification: A software package that integrates more than 22 neural network, statistical, and machine learning classification, clustering, and feature selection algorithms into a modular software package. [GPL] |
| Machine Learning & Agent-Based Computing: Examination of applying machine-learning technology to control software agents in the changing Internet environment. Introduction of MLEngine -- a general-purpose AI engine with real-time learning capability. [GPL] |
| Machine Learning FAQ: This FAQ from machinelearning.net contains very good in-depth definitions, citations, and bibliographies on the business of machine learning. |
| Machine Learning for Text Autocompletion in Mozilla: Applying machine learning and artificial intelligence ideas to make Mozilla a more "intelligent" browser. Scroll down. This project is open to new programmers. [GPL] |
| Machine Learning Meets Agent-Based Modeling: When Not to go to A Bar: Paper on adaptive agents and how they have changed through the last ten years. The author discusses the integration of ABM into the El Farol Bar Problem. Online version only. [FREE] |
| Machine Learning News-Ticker: Everything (almost) in the news dealing with: machine learning, data mining, text mining, genetic algorithms, reinforcement learning. |
| Machine Learning Programs by Peter Clark: A collection of downloadable packages including: KM - The Knowledge Machine, Guiding Inductive Learning with a Qualitative Model, LPE - Lazy Partial Evaluation, and CN2 - Rule induction from examples. [GPL] |
| Machine Learning Research Software in LISP: FTP repository of Common Lisp algorithms and datasets for research. [GPL] |
| Machine Learning Software Packages: FTP software repository of machine learning programs from Carnegie Mellon University School of Computer Science. There are also links to other repositories. [GPL] |
| Machine Learning Thoughts: Winning The DARPA Grand Challenge: On-line video explaining the methods used to 'train the program'. This page also contains a list of machine learning blogs. Online version only. [FREE] |
| Machine Learning/Learning Theory Hotlist: Directory of links related to ML including people, conferences, journals, data repositories, and more. |
| MALLET: Advanced Machine Learning for Language: An integrated collection of Java code useful for statistical natural language processing, document classification, clustering, information extraction, and other machine learning applications to text. [GPL] |
| Mathematics in Handwriting Recognition: Short study on handwriting recognition method using isolated symbol classification and stroke set partitioning to do expression parsing. Link to online demonstration. Article only. |
| Maximum Entropy Modeling Toolkit: A library of tools for constructing maximum entropy (maxent) models in either Python or C++. Program features include L-BFGS and GIS parameter estimation and Gaussian prior smoothing. [GPL] |
| MEME/MAST: Motif Discovery and Search: A software package to discover motifs (highly conserved regions) in groups of related DNA or protein sequences and to search sequence databases using motifs. [COM] |
| Memetic Algorithms: Directory of links on this population-based approach for heuristic search in optimization. Includes a Call for papers and a large index on memetics on the web. |
| Meta-MEME: Motif-based Hidden Markov Modeling of Biological Sequences: Software toolkit for building and using motif-based hidden Markov models of DNA and proteins. There is an online interactive version. Source written in C. [GPL] |
| Minimax Game Trees, Part 1: A detailed explanation of one of the most important data structures ever created for Game Artificial Intelligence. Some good algorithm and information links to the left of the page. Includes link to part 2 at end of article. Article only. |
| Minimax Game Trees, Part 2: The secret behind the strategy involved in the algorithm for game theory. Includes link to part 1. Article only. |
| MIT OpenCourseWare: Medical Decision Support, Fall 2005: A complete course from MIT presenting the main concepts of decision analysis, artificial intelligence, and predictive model construction and evaluation in the specific context of medical applications. Part of the MIT OpenCourseWare free and open educational resource (OER) project located here: http://ocw.mit.edu/OcwWeb/index.htm [FREE] |
| MIX: Software for Mixture Distributions: Software for learning mixture distributions. Examples and two case studies are included. [COM] |
| ML/CBR Conference Announcements: Listing of current and future meetings for machine learning and case-based reasoning by David W. Aha. |
| MLnet: The Machine Learning Network Information Service: Directory of information relating to machine learning, including a software index, bibliography, courses, datasets, showcases, and other resource links. |
| Monkiefarm: A Genetic Algorithm Library: This toolset uses an easily extensible genetic graph bipartitioning algorithm to run over an NP hard problem. It encompasses multi-layer perceptron implementations, written in C++. [GPL] |
| NEITHER: A Propositional Theory Refinement: A system to modify an incomplete or incorrect rule base to make it consistent with a set of input training examples. Written in C++. [FREE] |
| NETLAB: Neural Network Software: A toolbox designed to provide the central tools necessary for the simulation of theoretically well founded neural network algorithms and related models for use in teaching, research and applications development. [GPL] |
| Nonparametric Classification with Polynomial MPMC Cascades: Scalable non-parametric classification with Polynomial MPMC Cascades for use in Matlab. [GPL] |
| NSP: N-gram Statistics Package: Software for counting and analyzing word n-grams in text. This package provides standard tests of association for identifying word n-grams in large corpora and allows users to implement other tests with minimal scripting knowledge. Written in Perl. [GPL] |
| OpenCyc: Open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine. Can be used as the basis of a wide variety of intelligent applications, including rapid development of an ontology in a vertical area; email prioritizing, routing, summarization, and annotating; expert systems; and game engine development. [GPL] |
| Optical Character Recognition System: Short study of the history and methods used in optical character recognition including the forward-backward, Viterbi, and Baum-Welch algorithms. PDF article only. |
| ORANGE: Interactive Machine Learning Data Mining components: A component-based framework for data input/output, preprocessing, predictive modelling, ensemble methods, and model validation. [GPL] |
| Pattern Matching Pointers: Using algorithms to address issues of searching and matching strings and more complicated patterns such as trees, regular expressions, graphs, point sets, and arrays. [GPL] |
| Pfam: Database of Protein Families and HMMs: A large collection of multiple sequence alignments and trained hidden Markov models covering many common protein domains. Alignments are included as well as models for 8296 protein families, based on the Swissprot 48.9 and SP-TrEMBL 31.9 protein sequence databases. [GPL] |
| PRAPI: Pattern Recognition Application Programmer's Interface: A library for many pattern recognition tasks. The main focus of this package is on image analysis, but it uses a general architecture and an XML-based data interchange format. Written in C++. [GPL] |
| Probabilistic Networks Library (PNL), and Open Source Machine Learning Library (OpenML) by Intel: Open source library of tools for building machine learning software based on Bayesian mathematical principles. [GPL] |
| PRODIGY: An Architecture for Planning and Learning: A system of research planning and learning utilizing explanation-based learning, partial evaluation, experimentation, graphical knowledge acquisition, automatic abstraction, mixed-initiative planning, and case-based reasoning. [FREE] |
| SAM: Sequence Alignment and Modeling: A collection of tools for creating and using HMMs for biological sequences. [AFL] |
| SenseClusters: Programs to cluster similar contexts together using unsupervised knowledge-lean methods for word sense discrimination, email categorization, and name discrimination. Written in Perl. [GNU] |
| SNoW: Sparse Network of Winnows: A learning architecture specifically tailored for learning in very high-dimensional feature spaces. The current release uses sparse variations of Winnow, Perceptron, and Naive Bayes. [AFL] |
| Software Packages for Graphical Models/Bayesian Networks: Tools for modeling graphs & Bayesian networks. Scroll down. [AFL], [COM] |
| Sorting Algorithms for Machine Learning: Various sorting algorithms including insertion, quick, merge, heap, Dutch National Flag, and radix with on-line demos. [FREE] |
| Spider: General Purpose Machine Learning Toolbox in Matlab: An object-oriented environment for machine learning in Matlab. Algorithms can be plugged together and compared using, e.g., model selection, statistical tests, and visual plots. Algorithms may be downloaded separately. [GPL] |
| STRIP: A Strip-Based Neural-Network Growth Algorithm for Learning Multiple-Valued Functions: Synthesizing multiple-valued logic functions by minimal multilayer feedforward neural networks. Full mathematical explanation. Article only. |
| SUBDUE: Graph Based Knowledge Discovery: The program discovers interesting and repetitive subgraphs in a labeled graph representation using the minimum description length principle. Includes applications to molecular biology. [FREE] |
| The AutoClass Project: Takes a database of cases described by a combination of real- and discrete-valued attributes and automatically finds the natural classes in that data. It can be seen as a Naive Bayes classifier where the class node is hidden. [FREE] |
| The Discipline of Machine Learning: A short in-depth definitive study of what modern machine learning is and how it has changed through the years. Article only. |
| The NN Learning Algorithm Benchmarking Page: A repository of benchmarking information for the NN researcher. Several benchmarking facilities are listed as well as information on testing experiments and datasets. |
| The Observable Operator Modeling Kit: Machine learning library for 'observable operator models' (OOMs) suitable for time-series and sequence data classification and prediction. OOMs are similar to, but more powerful than, HMMs. Written in C++. [BSD] |
| The PNC2 Rule Induction System: Windows software tool that induces rules from your data using the PNC2 cluster algorithm. An integrated parameter-tuning component allows easy adjustment of the algorithm's behavior to the given problem without requiring any further knowledge. [GPL] |
| The Torch Machine Learning Library: This package forms a complete gradient descent machine learning library. Modules include support vector machines for classification and regression; ensemble models such as bagging or AdaBoost; and non-parametric models such as K-nearest neighbors, Parzen regression, and Parzen density estimation. Includes speech recognition tools. Written in C++. [BSD] |
| The Work of Information Mediators: A Comparison of Librarians and Intelligent Software Agents: In this paper, the author examines the characteristics of information agency, the work of librarians and of intelligent agents as information mediators, the differences between human and software agents, the possible tasks for software agents in libraries, and speculates on the future of human and software agency. Article only. |
| Thinking Machine 4: A machine learning chess opponent. Online game version only. |
| TIMBL: Tilburg Memory Based Learner: A program implementing several memory-based learning techniques. These learners store a representation of the training set explicitly and classify new cases by extrapolation from the most similar stored cases. [AFL] |
| To boldly go where no kernel has gone before: Machine learning in lp semi-inner product spaces: Applying ML to the 'kernel trick' paradigm, thereby mapping data into a higher-dimensional space. Online version only. [FREE] |
| Tree Visualizer: Software which allows one to navigate (fly) through the data tree, zoom in on interesting nodes, click on bars to get counts, and mark interesting places in the tree. Includes datasets for automobiles, voting, produce, and medical research. Uses LEDA ([AFL] licensed only). [GPL] |
| TRON: Java based machine learning game opponent for the light cycles game in Tron. Online game version only. |
| UCI Knowledge Discovery in Databases Archive: An online repository of large data sets which encompasses a wide variety of data types, analysis tasks, and application areas at the University of California at Irvine. |
| UCI Machine Learning Repository: A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms at the University of California at Irvine. |
| Urban Operations Research by Larson/Odoni: An online book covering subjects related to urban operations research including probability, queueing theory, spatial queues, network applications, and more. Online version only. [FREE] |
| VIBES: Variational Inference for Bayesian Networks: A software package which allows variational inference to be performed automatically on a Bayesian network. [GPL] |
| WebMO: Web-based interface to computational chemistry. Has support for Gaussian 94/98/03, GAMESS, MolPro 2002, MOPAC 7/93/200x, NWChem 4.6+, QChem 2.1+, and Tinker 4.2+. Unix or Linux based. [FREE] |
| Weka 3: Data Mining Software: A collection of tools that implement decision trees and tables, rule learners, Naive Bayes, support vector machines, voted perceptrons, multi-layer perceptron. Meta schemes include bagging, stacking, and boosting. Written in Java. [GPL] |
| What If: Web-based Scientific Discovery: An algorithm engine which will calculate everything from symmetry, torsion angles, polar fraction through protein analysis and bond angles. Online version only. [FREE] |
| WinBUGS: The BUGS Project: A stand-alone program that makes practical MCMC methods available to applied statisticians. Either a point-and-click interface can be used to control the analysis or a graphical interface can be constructed. The BUGS project also includes links to GeoBUGS for spatial analysis, PKBUGS for pharmacokinetic modelling, and OpenBUGS for the latest developments. [GPL] |
| WinMine Toolkit: A set of tools for Windows 2000/NT/XP that allow you to build statistical models from data. [FREE] |
| WordNet: A Lexical Database for the English Language: An online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. [GPL] |
| YaLE: Yet another Learning Environment: The YALE toolset is an environment for machine learning through use of nested operators. Multiple experiments can be arbitrarily nested together through a graphical, XML-based user interface. [GPL] |