Sunday, 30 December 2007

DocuBurst: Visualizing Document Content using Language Structure

Christopher Collins, Sheelagh Carpendale, and Gerald Penn

Abstract
DocuBurst is the first visualization of document content which takes advantage of the human-created structure in lexical databases. We use an accepted design paradigm to generate visualizations which improve the usability and utility of WordNet as the backbone for document content visualization. A radial, space-filling layout of hyponymy (IS-A relation) is presented with interactive techniques of zoom, filter, and details-on-demand for the task of document visualization. The techniques can be generalized to multiple documents.

http://www.thestar.com/News/article/223620

http://www.cs.utoronto.ca/~ccollins/research/docuburst/index.html

Visualisation

Logo and visualisation
http://www.codinghorror.com/blog/archives/001026.html

Easily expandable menu in Logo - based on Lisp
http://www.cs.berkeley.edu/~bh/logo-sample.html

More on visualisation
http://www.codinghorror.com/blog/archives/000777.html

Compiling Flash ActionScript files
http://www.senocular.com/flash/tutorials/as3withmxmlc/

Different Processing visualisations - Java
http://benfry.com/
http://benfry.com/salaryper/
http://benfry.com/isometricblocks/
http://acg.media.mit.edu/people/fry/zipdecode/

http://prefuse.org/

the prefuse visualization toolkit

Prefuse is a set of software tools for creating rich interactive data visualizations. The original prefuse toolkit provides a visualization framework for the Java programming language. The prefuse flare toolkit provides visualization and animation tools for ActionScript and the Adobe Flash Player.

Cool set theory examples

http://indexed.blogspot.com/

The most awesome Flash I have seen - apparently with source but I don't know how to open .fla files.

http://www.levitated.net/daily/levRotationalS.html

dojo.gfx

http://www.thinkvitamin.com/features/design/create-cross-browser-vector-graphics

http://www.dojotoolkit.org/

visualizing online social networks
http://jheer.org/vizster/

http://processing.org/

Processing is an open source programming language and environment for people who want to program images, animation, and interactions. It is used by students, artists, designers, researchers, and hobbyists for learning, prototyping, and production. It is created to teach fundamentals of computer programming within a visual context and to serve as a software sketchbook and professional production tool. Processing is developed by artists and designers as an alternative to proprietary software tools in the same domain.

Processing is free to download and available for GNU/Linux, Mac OS X, and Windows. Please help to release the next version!

Processing is an open project initiated by Ben Fry and Casey Reas. It evolved from ideas explored in the Aesthetics and Computation Group at the MIT Media Lab.

http://www.amazon.com/exec/obidos/ASIN/0596514557/

http://design.yahoo.com/project.php?pid=9
http://design.yahoo.com/index.php#projects

More cool Flash

http://levitated.net/

Saturday, 29 December 2007

PostgreSQL - Counting Sundays

http://www.depesz.com/index.php/2007/12/27/how-many-1sts-of-any-month-were-sundays-since-1901-01-01/

how many 1sts of any month were sundays - since 1901-01-01?

December 27th, 2007 by depesz

nixternal wrote about boost library for c++.

with it he was able to find the answer to the title question in milliseconds (he didn't specify how many, but let's assume that it was less than 10 ms).

so i decided to check how fast can i do it in postgresql …

Pivot queries or CrossTab Queries in PostgreSQL using tablefunc contrib

http://www.postgresonline.com/journal/index.php?/archives/14-CrossTab-Queries-in-PostgreSQL-using-tablefunc-contrib.html

The generic way of doing cross tabs (sometimes called PIVOT queries) in an ANSI-SQL database such as PostgreSQL is to use CASE statements, which we have documented in the article "What is a crosstab query and how do you create one using a relational database?"

In this particular issue, we will introduce creating crosstab queries using PostgreSQL tablefunc contrib.

Boost library - counting Sundays

http://blog.nixternal.com/2007.12.26/boost-library-rocks/

Boost library rocks!

26 12 2007

I have been messing with a project through the university and we decided that we would go with the Boost library for the project, so to read up on Boost and how to use it, I decided I would attack Project Euler using Boost. One area where Boost really showed its strength was determining how many Sundays fell on the first day of the month between January 1, 1901 and December 31, 2000. In just a few lines, the answer was apparent. Here are the lines of code that answer this problem, and answer it immediately. I had used other calendar solutions in C++ in the past, but the gregorian.hpp library is fast.

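#include <iostream>
#include "boost/date_time/gregorian/gregorian.hpp"

int main(void)
{
    using namespace boost::gregorian;
    int count = 0;
    // date_period is half-open, so the last day visited is 30 Dec 2000.
    // That cannot change the count: only the 1st of a month matters.
    date_period dp(date(1901, Jan, 1), date(2000, Dec, 31));
    day_iterator iter(dp.begin());

    while (iter != dp.end())
    {
        // as_enum() is a member function and must be called; Sunday is 0.
        if (iter->day() == 1 && iter->day_of_week().as_enum() == 0)
            count++;
        ++iter;
    }

    std::cout << count << std::endl;
    return 0;
}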

The answer takes milliseconds, it is just that fast. Right now I would like to replace my gmpxx libraries for big numbers with a boost library and I think all of my Euler answers will be "boosted."

xTuple Open source ERP

http://www.xtuple.com/

Does your software help your business grow?

xTuple is a new kind of company. We're dedicated to using the power of open source software to help businesses of all sizes grow and prosper.

What does that mean?

For years, we have developed and marketed OpenMFG, a leading Enterprise Resource Planning (ERP) product for small to midsized manufacturers and distributors. More recently, we made the OpenRPT report writer - built as part of OpenMFG - available as open source software.

The newest member of the family is PostBooks, a free and open source accounting, ERP, and CRM package.

Based on the award-winning OpenMFG ERP Suite, PostBooks is available for free download right now. Like all xTuple products, it runs equally well on Windows, Linux, and Mac - and is fully internationalized (multi-currency, support for multiple tax structures, and multilingual translation packs maintained by our global community).

Tuesday, 11 December 2007

Breakage - drum simulator

http://www.blackholeprojector.com/
Breakage: the intelligent drum machine for intelligent breaks

Breakage is an intelligent drum machine designed to make it easy and fun to
play complex, live breakbeat performances. A step-sequencer pattern editor
and previewer, database, sample browser, neural network, pattern morphs,
statistics and probabilistic pattern generator give you the tools to work
with breaks on a higher level than ever before.

Sierpinski Gasket

http://local.wasp.uwa.edu.au/~pbourke/fractals/gasket/

Planet PostgreSQL, Mylyn, Maven

http://www.planetpostgresql.org/

http://www.eclipse.org/mylyn/
Mylyn is the Task-Focused UI for Eclipse that reduces information overload and makes multi-tasking easy. It does this by making tasks a first class part of Eclipse, and integrating rich and offline editing for repositories such as Bugzilla, Trac, and JIRA. Once your tasks are integrated, Mylyn monitors your work activity to identify information relevant to the task-at-hand, and uses this task context to focus the Eclipse UI on the interesting information, hide the uninteresting, and automatically find what's related. This puts the information you need to get work done at your fingertips and improves productivity by reducing searching, scrolling, and navigation. By making task context explicit Mylyn also facilitates multitasking, planning, reusing past efforts, and sharing expertise.

http://maven.apache.org/

Welcome to Maven

Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.

If you think that Maven could help your project, you can find out more information in the "About Maven" section of the navigation. This includes an in-depth description of what Maven is, a list of some of its main features, and a set of frequently asked questions about what Maven is.

http://wicket.apache.org/

Welcome to Apache Wicket

With proper mark-up/logic separation, a POJO data model, and a refreshing lack of XML, Apache Wicket makes developing web-apps simple and enjoyable again. Swap the boilerplate, complex debugging and brittle code for powerful, reusable components written with plain Java and HTML.

http://www.s9y.org/

Serendipity - a PHP Weblog/Blog software

Serendipity is a PHP-powered weblog application which gives the user an easy way to maintain an online diary, weblog or even a complete homepage. While the default package is designed for the casual blogger, Serendipity offers a flexible, expandable and easy-to-use framework with the power for professional applications.

Monday, 10 December 2007

Thursday, 6 December 2007

Easily delete duplicate files in Windows - free

http://www.easyduplicatefinder.com/
Found on http://lifehacker.com/software/lifehacker-top-10/top-10-free-windows-file-wranglers-330037.php

Friday, 30 November 2007

Universal Digital Library

http://www.ulib.org/index.html

C++ qsort, threaded qsort and shell sort

Shell sort, recursive qsort, and threaded qsort.
The threaded qsort runs out of memory for creating new threads at around 600 threads.
Making the stack size smaller with NEED_STACK might be able to fix it, but I couldn't work out how.

PTHREAD_THREADS_MAX is set to 16384 on this system so that is not the problem.

It relies on the fact that C++ vectors are stored in contiguous memory.

/*
* "The C++ Programming Language 3rd Ed. Bjarne Stroustrup Pg. 334"
* Copyright 2007 Timothy O'Hare
*/
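#include <algorithm>  // std::sort, std::swap
#include <cstdio>
#include <cstdlib>    // atoi, rand, srand
#include <ctime>      // time
#include <iostream>
#include <sstream>
#include <vector>
#include <string>
#include <pthread.h>
#include <memory>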

using namespace std;

/*****************************************************
* template sort
*/
template<class T> void sort(vector<T>&); // declaration

void f(vector<int>& vi, vector<string>& vs)
{
    time_t start = time(0);
    sort(vi);
    time_t taken = time(0) - start;
    cout << "time taken for roll your own: " << taken << " secs" << endl;
    sort(vs);
}

template<class T> void sort(vector<T>& v) // definition
    // Shell sort (Knuth, Vol. 3, pg. 84)
{
    // Signed on purpose: the loop indices below are ints and j can go negative.
    const int n = static_cast<int>(v.size());

    for (int gap=n/2; 0<gap; gap/=2)
        for (int i=gap; i<n; i++)
            for (int j=i-gap; 0<=j; j-=gap) {
                //cout << "i:" << i << ", j:" << j << ", gap: " << gap << ", n:" << n << ", v[j]: " << v[j] << endl;
                if (v[j+gap] < v[j])
                    swap(v[j], v[j+gap]);
            }
}

/*****************************************************
* Log
*/

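class Log {
    public:
        void co(string s);
        // Returns a reference to the one shared instance. Returning by value
        // would give each caller a copy with its own mutex, so co() would not
        // serialise anything; callers should bind the result to a Log&.
        static Log& Instance();
        ostream& operator << (ostream& os );
    protected:
        Log() { pthread_mutex_init(&logMutex, NULL); };
    private:
        static std::auto_ptr<Log> theSingleInstance;
        //static Log * l;
        pthread_mutex_t logMutex;
};

// Creation is not synchronised: call Instance() once before starting threads.
Log& Log::Instance()
{
    if (theSingleInstance.get() == 0)
      theSingleInstance.reset (new Log);
    return *theSingleInstance;
    /*if (!l) {
        l = new Log();
    }
    return *l;*/
}

void Log::co(string s)
{
    pthread_mutex_lock(&logMutex);
    cout << s;
    pthread_mutex_unlock(&logMutex);
}

ostream& Log::operator << (ostream& os)
{
    pthread_mutex_lock(&logMutex);
    cout << os;
    pthread_mutex_unlock(&logMutex);
    return os; // was missing: flowing off the end of a non-void function is undefined
}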

/*****************************************************
* q sort
*/

template<class T> int partition(vector<T>& v, int left, int right, int pivot)
{
    T pivotVal = v[pivot];
    swap(v[pivot], v[right]);
    int store = left;
    for (int i = left; i < right; i++) {
        if (v[i] < pivotVal) {
            swap(v[i], v[store]);
            store++;
        }
    }
    swap(v[store], v[right]);
    return store;
}

template<class T> void qsort(vector<T>& v, int left, int right)
{
   if (right > left) {
       int pivot = left;
       int newPivot = partition(v, left, right, pivot);
       //cout << "recursive newpviot: " << newPivot << ", left: " << left << ", right: " << newPivot-1 << endl;
       qsort(v, left, newPivot-1);
       //cout << "recursive newpviot: " << newPivot << ", left: " << newPivot+1 << ", right: " << right << endl;
       qsort(v, newPivot+1, right);
   }
}

/*****************************************************
* q sort threaded
*/
void prtv(string s, vector<int> &v)
{
    ostringstream st;
    Log log = Log::Instance();
    if (v.size() > 1) {
        st << s << ": size:" << v.size() << ":";
        log.co(st.str());
        st.str("");
        for (int i = 0; i <= v.size()-2; i++) {
            //cout << "v[" << i << "]:" << v[i] << ",";
            st << v[i] << ",";
           log.co(st.str ());
            st.str("");
        }
    }
    if (v.size() > 0) {
        //cout << "v[" << v.size()-1 << "]:" << v[v.size()-1] << endl;
        st << v[v.size()-1] << endl;
        log.co(st.str());
        st.str("");
    } else {
        st << s << ": size:" << v.size() << endl;
        log.co(st.str());
        st.str("");
    }
}

int partition2(vector<int> &v, int left, int right, int pivot)
{
    /*ostringstream st;
    Log log = Log::Instance();
    st << "End of partition: pivot: " << pivot << ", l: " << left << ", r: " << right << endl;
    log.co(st.str());*/

    int pivotVal = v[pivot];
    swap(v[pivot], v[right]);
    int store = left;
    for (int i = left; i < right; i++) {
        if (v[i] < pivotVal) {
            swap(v[i], v[store]);
            store++;
        }
    }
    swap(v[store], v[right]);
    //prtv("end of partition2", v);
    return store;
}

typedef struct {
    vector<int>& v;
    int left;
    int right;
} qs_t;

int threads;
// Every worker updates the counter, so guard it with a mutex.
pthread_mutex_t threadsMutex = PTHREAD_MUTEX_INITIALIZER;

void *qsort_threaded(void * q)
{
    // r1 and r2 start as copies of the caller's range and share its vector.
    qs_t r1 = *(qs_t *)q;
    qs_t r2 = *(qs_t *)q;
    vector<int> &v = r1.v;
    int left = r1.left;
    int right = r1.right;
    bool ok = true;

    if (right > left) {
        int pivot = left;
        int newPivot = partition2(v, left, right, pivot);
        pthread_t tid1, tid2;
        bool started1 = false;
        bool started2 = false;
        int rc;

        // Count the two threads this call attempts to start.
        pthread_mutex_lock(&threadsMutex);
        threads += 2;
        pthread_mutex_unlock(&threadsMutex);

        // Left part. pthread_create returns its error code rather than
        // setting errno, so report rc instead of calling perror.
        r1.left = left;
        r1.right = newPivot-1;
        rc = pthread_create(&tid1, NULL, &qsort_threaded, (void*)&r1);
        if (rc == 0)
            started1 = true;
        else
            cerr << "Failed to create thread 1: error " << rc << endl;

        // Right part.
        r2.left = newPivot + 1;
        r2.right = right;
        rc = pthread_create(&tid2, NULL, &qsort_threaded, (void*)&r2);
        if (rc == 0)
            started2 = true;
        else
            cerr << "Failed to create thread 2: error " << rc << endl;

        // Join whatever was started before returning: the workers hold
        // pointers to r1 and r2, which live in this stack frame.
        if (started1 && 0 != pthread_join(tid1, NULL))
            cerr << "pthread join tid1 failed" << endl;
        if (started2 && 0 != pthread_join(tid2, NULL))
            cerr << "pthread join tid2 failed" << endl;

        ok = started1 && started2;
    }

    return ok ? (void*)0 : (void*)-1;
}

std::auto_ptr<Log> Log::theSingleInstance;

int main(int argc, char * argv[])
{
    vector<int> vi;
    threads = 0;
    // create a vector of random ints
    int loop_cnt;
    if (argc == 1)
        loop_cnt = 600;
    else
        loop_cnt = atoi(argv[1]);
    srand(time(NULL));
    for( int i = 0; i < loop_cnt; i++ ) {
        int num = (int) rand() % loop_cnt;
        vi.push_back(num);
        //cout << num << ",";
    }
    //cout << endl;

    // copy it
    vector<int> vi_builtin(vi);
    vector<int> vi_qsort(vi);
    vector<int> vi_qsort_threaded(vi);

    // create a vector of strings
    vector<string> vs;
    vs.push_back("1");
    vs.push_back("zerbra");
    vs.push_back("towel");
    vs.push_back("shovel");
    vs.push_back("apple");

    // built in sort
    time_t start = time(0);
    sort(vi_builtin.begin(), vi_builtin.end());
    time_t taken = time(0) - start;
    cout << "time taken for builtin: " << taken << " secs" << endl;

    // qsort
    start = time(0);
    qsort(vi_qsort, 0, vi_qsort.size()-1);
    taken = time(0) - start;
    cout << "time taken for qsort: " << taken << " secs" << endl;

    // qsort threaded
    qs_t q = {vi_qsort_threaded, 0, static_cast<int>(vi_qsort_threaded.size())-1};
    start = time(0);
    qsort_threaded((void*)&q);
    taken = time(0) - start;
    cout << "time taken for qsort threaded: " << taken << " secs" << endl;

    // shell sort
    f(vi, vs);

    // check if any values are different
    for( int i = 0; i < vi.size(); i++ ) {
        if (vi[i] != vi_builtin[i] ||
            vi_qsort[i] != vi_builtin[i] ||
            vi_qsort_threaded[i] != vi_builtin[i] )
                cerr << "vi[i] " << vi[i] << " != " << vi_builtin[i]
                     << " or vi_qsort[i] " << vi_qsort[i]
                     << " or vi_qsort_threaded[i] " << vi_qsort_threaded[i] << endl;
    }
    // print them out
    /*cout << "qsort_threaded result" << endl;
    for( int i = 0; i < vi_qsort_threaded.size(); i++ ) {
        cout << vi_qsort_threaded[i] << ",";
    }
    cout << endl;*/
    cout << "thread count: " << threads << endl;
    return 0;
}

Thursday, 29 November 2007

Only the Paranoid Survive

http://www.intel.com/pressroom/kits/bios/grove/paranoid.htm
Andrew S. Grove

Sooner or later, something fundamental in your business world will change.

I'm often credited with the motto, "Only the paranoid survive." I have no
idea when I first said this, but the fact remains that, when it comes to
business, I believe in the value of paranoia. Business success contains the
seeds of its own destruction. The more successful you are, the more people
want a chunk of your business and then another chunk and then another until
there is nothing left. I believe that the prime responsibility of a manager
is to guard constantly against other people's attacks and to inculcate this
guardian attitude in the people under his or her management.

Wednesday, 28 November 2007

Using pictures as a captcha

http://research.microsoft.com/asirra/

Monday, 26 November 2007

BZFlag - Multiplayer 3D Tank Game

OpenSource OpenGL Multiplayer Multiplatform Battle Zone capture the Flag.
3D first person Tank Simulation.
http://sourceforge.net/projects/bzflag/

Wednesday, 21 November 2007

InfraRecorder - free CD/DVD burning

http://infrarecorder.sourceforge.net/

InfraRecorder is a free CD/DVD burning solution for Microsoft Windows. It
offers a wide range of powerful features; all through an easy to use
application interface and Windows Explorer integration.

InfraRecorder is released under GPL version 2.

Features
Create custom data, audio and mixed-mode projects and record them to
physical discs as well as disc images.
Supports recording to dual-layer DVDs.
Blank (erase) rewritable discs using four different methods.
Record disc images (ISO and BIN/CUE).
Fixate discs (write lead-out information to prevent further data from
being added to the disc).
Scan the SCSI/IDE bus for devices and collect information about their
capabilities.
Create disc copies, on the fly and using a temporary disc image.
Import session data from multi-session discs and add more sessions to
them.
Display disc information.
Save audio and data tracks to files (.wav, .wma, .ogg, .mp3 and
.iso).

Alfresco & Jaspersoft - content management and business intelligence

http://www.alfresco.com/

Alfresco is the Open Source Alternative for Enterprise Content Management (ECM), providing Document Management, Collaboration, Records Management, Knowledge Management, Web Content Management and Imaging.

http://www.jaspersoft.com/

World's leading commercial open source business intelligence solutions for developers and businesses.

Open-source software rated: Ten alternatives you need

http://crave.cnet.co.uk/software/0,39029471,49294100-1,00.htm

Monday, 19 November 2007

C# programming

.NET Framework FAQ

Andy McMullan
http://www.andymcm.com/dotnetfaq.htm

C# FAQ for C++ programmers

Andy McMullan
http://www.andymcm.com/csharpfaq.htm

C# Language Specification
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csspec/html/CSharpSpecStart.asp

http://www.codeplex.com/
CodePlex is Microsoft's open source project hosting web site. Start a new
project, join an existing one, or download software created by the
community.

C# Programmer's Reference
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csref/html/vcoricprogrammersreference.asp

C# Programming
http://www.hitmill.com/programming/dotNET/csharp.html#microsoft

Hash functions

http://www.linuxworld.com/cgi-bin/mailto/x_linux.cgi?pagetosend=/export/home/httpd/linuxworld/news/2007/111207-hash.html

Userfriendly comic strip

http://ars.userfriendly.org/cartoons/?id=20071117

Password MD5 cracking

passcracking.com/index.php
http://md5.rednoize.com/

SOAP financial news

http://www.xmethods.net/ve2/index.po
http://www.strikeiron.com/MTFinancialNews

This stock screening tool helps determine how a stock is likely to
react to commonly occurring news events (e.g., earnings, analyst
upgrades and downgrades, etc.) based on how it has reacted to
similar events in the past.

Friday, 16 November 2007

E8 (mathematics)

http://en.wikipedia.org/wiki/E8_(mathematics)

E₈ (mathematics)

From Wikipedia, the free encyclopedia

Graph of E₈ Gosset polytope, 4_2,1

In mathematics, E₈ is the name given to a family of closely related structures. In particular, it is the name of some exceptional simple Lie algebras as well as that of the associated simple Lie groups. It is also the name given to the corresponding root system, root lattice, and Weyl/Coxeter group, and to some finite simple Chevalley groups. It was discovered between 1888 and 1890 by Wilhelm Killing.

The designation E₈ comes from Wilhelm Killing and Élie Cartan's classification of the complex simple Lie algebras, which fall into four infinite families labeled A_n, B_n, C_n, D_n, and five exceptional cases labeled E₆, E₇, E₈, F₄, and G₂. The E₈ algebra is the largest and most complicated of these exceptional cases, and is often the last case of various theorems to be proved.

http://upload.wikimedia.org/wikipedia/commons/f/fe/E8_graph.svg

Google AJAX Feed + Firefox extensions => Piggy Bank and Solvent

http://simile.mit.edu/wiki/Piggy_Bank
Piggy Bank is a Firefox extension that turns your browser into a mashup
platform, by allowing you to extract data from different web sites and mix
them together.
Piggy Bank also allows you to store this extracted information locally for
you to search later and to exchange at need the collected information with
others.

http://simile.mit.edu/wiki/Solvent

Why do I need screen scrapers?

Piggy Bank needs web pages to embed information in a format that it can
understand. This format is called RDF (Resource Description Framework) and
its main advantage is that it makes machine processing a lot easier.
Unfortunately, at these very early stages, not many web pages embed or link
to such "purer" RDF information. Piggy Bank, however, is capable of
executing a particular screen scraper on particular pages in order to
"extract" the information it needs.

In short, screen scrapers allow you to turn a regular web page into a
regular web page plus semantic data, and thus free the data from the
page/site that contains it.

http://code.google.com/apis/ajaxfeeds/
What is the Google AJAX Feed API?

With the AJAX Feed API, you can download any public Atom or RSS feed using
only JavaScript, so you can easily mash up feeds with your content and
other APIs like the Google Maps API.

The Google AJAX Feed API takes the pain out of developing mashups in
JavaScript because you can now mash up feeds using only a few lines of
JavaScript, rather than dealing with complex server-side proxies. This makes
it easy to quickly integrate feeds on your website.

PyQt4 Python GUI programming

http://www.riverbankcomputing.co.uk/pyqt/download.php
http://zetcode.com/tutorials/pyqt4/firstprograms/
http://www.rkblog.rk.edu.pl/w/p/introduction-pyqt4/

Thursday, 15 November 2007

Yahoo developer series - movies

http://developer.yahoo.net/blogs/theater/archives/experts_at_work/

Map reduce open source style

http://developer.yahoo.com/blogs/hadoop/

Hadoop and Distributed Computing at Yahoo!

http://developer.yahoo.net/blog/archives/2007/07/yahoo-hadoop.html

Open Source Distributed Computing: Yahoo's Hadoop Support

July 25, 2007

For the last several years, every company involved in building large web-scale systems has faced some of the same fundamental challenges. While nearly everyone agrees that the "divide-and-conquer using lots of cheap hardware" approach to breaking down large problems is the only way to scale, doing so is not easy.

The underlying infrastructure has always been a challenge. You have to buy, power, install, and manage a lot of servers. Even if you use somebody else's commodity hardware, you still have to develop the software that'll do the divide-and-conquer work to keep them all busy.

It's hard work. And it needs to be commoditized, just like the hardware has been...

We too have been dealing with this at Yahoo. Analyzing petabytes of data takes a lot of CPU power and storage. And given the way our needs (and the web as a whole) have been growing, there will likely be dozens of similarly demanding applications before long.

To build the necessary software infrastructure, we could have gone off to develop our own technology, treating it as a competitive advantage, and charged ahead. But we've taken a slightly different approach. Realizing that a growing number of companies and organizations are likely to need similar capabilities, we got behind the work of Doug Cutting (creator of the open source Nutch and Lucene projects) and asked him to join Yahoo to help deploy and continue working on the [then new] open source Hadoop project.

What started here as a 20 node cluster in March of 2006 was up to nearly 200 a month later and has continued to grow as it eats terabytes and terabytes of data. It wasn't long after that our code contributions back to Hadoop really started to ramp up as well.

http://lucene.apache.org/hadoop/

Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.

Here's what makes Hadoop especially useful:

Scalable: Hadoop can reliably store and process petabytes.
Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.

Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters.

Hadoop is a Lucene sub-project that contains the distributed computing platform that was formerly a part of Nutch.

For more information about Hadoop, please see the Hadoop wiki.

Running Hadoop MapReduce on Amazon EC2 and Amazon S3

http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112

Wednesday, 14 November 2007

Linux changes to stop wearing down the hard disk

From http://lwn.net/Articles/256769/

Jonathan Corbet
Various fixes have been gathered on the Ubuntu wiki page, but the basic
idea is to change the power savings setting for the hard disk using
hdparm -B 254 /dev/hda (or /dev/sda). The 254 value sets the least
aggressive power savings mode; some users are reporting that 255 will
disable power management completely, while others say it has no effect.

The biggest change distributions can make to help alleviate this problem is
to reduce the number of writes, especially nearly useless writes, to the
disk. One of the culprits reported for Ubuntu is the acpid power management
daemon writing battery status to a logfile every 15 seconds, which seems
like a good way to ensure the battery life reduces more quickly than it
should. Some logging could be deferred or disabled when running from
battery.

Using the relatime option when mounting filesystems is another, fairly
simple, change that could be made to significantly reduce disk writes that
are likely to be pointless. Fedora 8 enables that option by default for all
systems, battery powered or not, for the disk performance increase that it
gives. People running older kernels, before 2.6.20 added the relatime
option, may want to consider disabling atime updates altogether using the
noatime mount option.

Miro - watch free internet video channels and play any video file

http://tvrss.net/
http://www.getmiro.com/ - some people have had problems with it
Azureus with the RSS Feed Scanner plugin to automate tv downloads using
tvrss.net feeds
http://www.deluge-torrent.org/News:Portal
Deluge is a lightweight, Free Software, cross-platform BitTorrent client.
Full Encryption
Trackerless Support
WebUI
Plugin System
Much more..

A Practical Introduction to GNU Privacy Guard in Windows

http://www.glump.net/dokuwiki/gpg/gpg_intro
By Brendan Kidwell

Exetel SMS API

https://www.exetel.com.au/include/Exetel_SMS_API_documentation.pdf

Money Bookers Automatic Payments

http://www.moneybookers.com/app/help.pl?s=m_shoppingcart

Tuesday, 13 November 2007

Find current version on Nokia phone

Find your current software version by entering *#0000# on your phone's
keypad.

Google Gphone - Android - An Open Handset Alliance Project

http://code.google.com/android/
The Open Handset Alliance, a group of more than 30 technology and mobile companies, is developing Android: the first complete, open, and free mobile platform. To help developers get started developing new applications, we're offering an early look at the Android Software Development Kit.

http://code.google.com/android/adc.html
Cool apps that surprise and delight mobile users, built by developers like you, will be a huge part of the Android vision. To support you in your efforts, Google has launched the Android Developer Challenge, which will provide $10 million in awards -- no strings attached -- for great mobile apps built on the Android platform.

Sports Arbitrage Betting program

http://www.safe-install.com/programs/win-risk-free-sports-arbitrage-finder.html
Haven't used this yet.

PostgreSQL Git repository

http://repo.or.cz/w/PostgreSQL.git

And public Git hosting
http://repo.or.cz/

repo.or.cz is a public Git hosting site.
You can create a project here and then publish your development by pushing
to it, or even enable push access for multiple developers.
Alternately, you can just set up a mirror of any project published elsewhere
and we will provide pull and gitweb access for the project.
(read more, incl. terms&conditions)

This service is BETA.
The service is maintained by Petr Baudis,
please contact him with any requests, proposals or issues.

How to grab a project?

git clone mirror_URL

See the crash courses at git.or.cz
for more detailed introduction. You can find out the mirror_URL
for each project at the project's summary page.

Store encrypted passwords in PostgreSQL

From http://www.depesz.com/index.php/2007/11/05/encrypted-passwords-in-database/

encrypted passwords in database

November 5th, 2007 by depesz

in most applications you have some variant of this table:

CREATE TABLE users (
    id serial PRIMARY KEY,
    username TEXT NOT NULL,
    passwd TEXT
);

and, usually, passwd stores the user's password in clear text.

this is usually not a problem, but in case you'd like to add password encryption in the database, there are some ways to do it - and i'll show you which way i like most.

first solution is a no-brainer. make the app crypt the password and do whatever is necessary.

now, this looks like a fine solution until you'll have more than 1 application that will be checking/setting passwords. and - usually - you will.

after all - even if you do not plan to put another website on the same database, odds are one day you'll want to change user password from psql. and what then?

so, it is better to leave the encryption job to postgres itself.

to make it so, we'll do some "magic".

first, let's make our users table in a way that it will automatically convert entered password to encrypted.

to do it - we will need the pgcrypto module from the contrib directory. if you don't know what i'm talking about - that's really bad, as contrib modules are extremely useful.

if you're using pre-packaged postgresql, there should be a package named postgresql-contrib-your-version or similar. just install it.

then, find pgcrypto.sql file. usually you can find it in places like /usr/share/postgresql/contrib/pgcrypto.sql, /usr/local/share/postgresql/contrib/pgcrypto.sql, /usr/local/pgsql/share/postgresql/contrib/pgcrypto.sql or similar.

when you have the file, just connect to your database of choice (using superuser account) and issue (from psql):

\i /home/pgdba/work/share/postgresql/contrib/pgcrypto.sql

which will load the pgcrypto module to your database.

now, for some more interesting fun.

for our users table, we'll add a simple trigger:

CREATE OR REPLACE FUNCTION trg_crypt_users_pass() RETURNS TRIGGER AS
$BODY$
DECLARE
BEGIN
    IF substr(NEW.passwd, 1, 3) <> '$1$' THEN
        NEW.passwd := crypt( NEW.passwd, gen_salt('md5') );
    END IF;
    RETURN NEW;
END;
$BODY$
LANGUAGE 'plpgsql';

CREATE TRIGGER trg_crypt_users_pass BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE trg_crypt_users_pass();

you might wonder why there is this if-with-substr.

it's simple - we want to encrypt only passwords that do not start with '$1$'. reason? a crypted password will start with '$1$', and if we didn't put the "if" there, the first update to the users table (even if it didn't touch the passwd field) would scramble the password, thus rendering the account unusable.
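
The same guard can be sketched outside the database. This is a hedged illustration in Python, not the pgcrypto algorithm: the `$demo$` prefix, the salted SHA-256 digest, and the names `hash_password` and `before_write` are all assumptions made up for this example (a plain salted SHA-256 is also not a recommended password hash). It only shows why the prefix check keeps a stored hash from being hashed a second time.

```python
import hashlib
import os

PREFIX = "$demo$"  # hypothetical marker, standing in for crypt()'s '$1$'


def hash_password(plain, salt=None):
    """Return PREFIX + salt + '$' + digest for a plain-text password."""
    salt = salt or os.urandom(8).hex()
    digest = hashlib.sha256((salt + plain).encode("utf-8")).hexdigest()
    return PREFIX + salt + "$" + digest


def before_write(passwd):
    """Mimic the trigger: hash only values that are not hashed already."""
    if passwd is None or passwd.startswith(PREFIX):
        return passwd
    return hash_password(passwd)


stored = before_write("depesz")
# A later update that does not touch the password passes the stored value
# back through the guard, which returns it unchanged.
print(before_write(stored) == stored)  # True
```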

now, let's test if it works:

INSERT INTO users (username, passwd) VALUES ('depesz', 'depesz');
INSERT INTO users (username, passwd) VALUES ('NULL-user', NULL);
INSERT INTO users (username, passwd) VALUES ('test', ' ');
INSERT INTO users (username, passwd) VALUES ('foo', '$1$');

and what is in the table?

ok, works as expected. the case with passwd = '$1$' is dubious, and we could "fix" the issue by adding a length-check to the trigger, but it doesn't really bother me, so i'll leave it as it is - after all, to make a full check i would have to use a regexp, which is not really nice.

so, now our table has encrypted passwords. and i can easily search for users:

# select * from users where username = 'depesz' and crypt('depesz', passwd) = passwd;
 id | username |               passwd
----+----------+------------------------------------
  1 | depesz   | $1$Im51jH1k$/9AOm/t.4BixxF7YzZ5hx0
(1 row)

bad password check:

# select * from users where username = 'depesz' and crypt('bad-password', passwd) = passwd;
 id | username | passwd
----+----------+--------
(0 rows)

now. it's not really "easily". i could definitely do better than that.

so, let's introduce another datatype: "password":

CREATE DOMAIN password as TEXT;

now, let's convert data:

alter table users alter column passwd type password;

ok, but having another datatype doesn't give me anything good. yet.

i'd like to be able to do things like:

select * from users where username = 'depesz' and passwd = 'depesz';

without all this "crypt()" mess. so, let's write some small, custom operators.

because passwords can only "match" or "not match" we will need only 2 operators: "=" and "<>". so, there goes the code:

CREATE FUNCTION password_leq(password, TEXT) RETURNS bool as $BODY$
    SELECT crypt($2, $1) = $1::text;
$BODY$ language sql immutable;
CREATE OPERATOR = (
    leftarg = password,
    rightarg = text,
    negator = <>,
    procedure = password_leq
);

CREATE FUNCTION password_lne(password, TEXT) RETURNS bool as $BODY$
    SELECT crypt($2, $1) <> $1::text;
$BODY$ language sql immutable;
CREATE OPERATOR <> (
    leftarg = password,
    rightarg = text,
    negator = =,
    procedure = password_lne
);

CREATE FUNCTION password_req(TEXT, password) RETURNS bool as $BODY$
    SELECT crypt($1, $2) = $2::text;
$BODY$ language sql immutable;
CREATE OPERATOR = (
    leftarg = text,
    rightarg = password,
    negator = <>,
    procedure = password_req
);

CREATE FUNCTION password_rne(TEXT, password) RETURNS bool as $BODY$
    SELECT crypt($1, $2) <> $2::text;
$BODY$ language sql immutable;
CREATE OPERATOR <> (
    leftarg = text,
    rightarg = password,
    negator = =,
    procedure = password_rne
);

now, thanks to this we can:

# select * from users where passwd = 'depesz'::text;
 id | username |               passwd
----+----------+------------------------------------
  1 | depesz   | $1$Im51jH1k$/9AOm/t.4BixxF7YzZ5hx0
(1 row)

but, unfortunately, this will fail:

# select * from users where passwd = 'depesz';
 id | username | passwd
----+----------+--------
(0 rows)

reason is very simple - postgresql, when running this query, will implicitly cast 'depesz' to 'password', so the "=" operator will be called for (password = password) and not for (password = text)!

to make it work we'll need 2 more operators:

CREATE FUNCTION password_beq(left password, right password) RETURNS bool as $BODY$
DECLARE
    left_crypted bool;
    right_crypted bool;
BEGIN
    left_crypted := ( substr(left, 1, 3) = '$1$' );
    right_crypted := ( substr(right, 1, 3) = '$1$' );
    IF (left_crypted) AND (NOT right_crypted) THEN
        RETURN crypt(right, left)::TEXT = left::TEXT;
    END IF;
    IF (NOT left_crypted) AND (right_crypted) THEN
        RETURN crypt(left, right)::TEXT = right::TEXT;
    END IF;
    RETURN left::TEXT = right::TEXT;
END;
$BODY$ language plpgsql immutable;
CREATE OPERATOR = (
    leftarg = password,
    rightarg = password,
    negator = <>,
    procedure = password_beq
);

CREATE FUNCTION password_bne(password, password) RETURNS bool as $BODY$
    SELECT NOT password_beq($1, $2);
$BODY$ language sql immutable;
CREATE OPERATOR <> (
    leftarg = password,
    rightarg = password,
    negator = =,
    procedure = password_bne
);

now, the password_beq function is quite complex. what does it do? it tries to guess which side of the comparison is encrypted, and which is not.

when only one side of the comparison has '$1$' at the beginning, it crypts the other argument, and then compares. if both or neither of the arguments have '$1$' - it just compares them as simple strings.
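
The guessing logic can be followed step by step in a small Python sketch. It is an illustration under stated assumptions and not pgcrypto: the `$demo$` prefix, the salted SHA-256 digest, and the helper names `hash_password`, `rehash_like` and `password_eq` are invented here. `rehash_like` plays the role that crypt(text, hash) plays in the SQL, reusing the salt embedded in the stored hash.

```python
import hashlib

PREFIX = "$demo$"  # hypothetical marker, standing in for '$1$'


def hash_password(plain, salt):
    """Return PREFIX + salt + '$' + digest (illustrative format only)."""
    digest = hashlib.sha256((salt + plain).encode("utf-8")).hexdigest()
    return PREFIX + salt + "$" + digest


def rehash_like(plain, hashed):
    """Hash `plain` with the salt embedded in `hashed`."""
    salt = hashed[len(PREFIX):].split("$", 1)[0]
    return hash_password(plain, salt)


def password_eq(left, right):
    """Compare two values, hashing whichever side looks like plain text."""
    left_hashed = left.startswith(PREFIX)
    right_hashed = right.startswith(PREFIX)
    if left_hashed and not right_hashed:
        return rehash_like(right, left) == left
    if right_hashed and not left_hashed:
        return rehash_like(left, right) == right
    # Both hashed, or both plain: fall back to a plain string comparison
    return left == right


stored = hash_password("depesz", "Im51jH1k")
print(password_eq(stored, "depesz"))        # True
print(password_eq("depesz", stored))        # True
print(password_eq(stored, "bad-password"))  # False
print(password_eq(stored, stored))          # True
```

The last call takes the fallback branch: two hashed values are compared as ordinary strings.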

now, i can:

# select * from users where passwd = 'depesz';
 id | username |               passwd
----+----------+------------------------------------
  1 | depesz   | $1$Im51jH1k$/9AOm/t.4BixxF7YzZ5hx0
(1 row)

so, without modifying client code, i changed how passwords are stored so that they are crypted. which is good, at the very least for me.

this solution has one slight "issue" which can be perceived both as a drawback, or as a bonus benefit:

# select * from users where passwd = '$1$Im51jH1k$/9AOm/t.4BixxF7YzZ5hx0';
 id | username |               passwd
----+----------+------------------------------------
  1 | depesz   | $1$Im51jH1k$/9AOm/t.4BixxF7YzZ5hx0
(1 row)

that is - instead of using standard password i can also authenticate using its hash. whether it's good i leave for you to decide - for me it's definitely a benefit.

Monday, 12 November 2007

Visual studio build script

See at http://www.codinghorror.com/blog/archives/000988.html

Well, just to remind you, a Visual Studio project file is an MSBuild script file, so each project can be built using the "MSBuild MySuperProject.csproj" command. So most people have a build script already, but they never use it outside Visual Studio, which is where you are completely on point.

Oleg Tkachenko on November 1, 2007 03:30 AM

Friday, 9 November 2007

CD DVD burning and other cool applications

http://www.imgburn.com/ Img Burn
http://www.getpaint.net/ Paint.net
http://bluemars.org/clipx/
http://www.scootersoftware.com/
http://www.regexbuddy.com/
http://timesnapper.com/

Your Life: The Movie

TimeSnapper lets you play back your week just like a movie. You can play it at any speed you like, and jump in at any time you like.

When it's time to fill out that dreaded timesheet, TimeSnapper is a savior. No need to tear your hair out trying to remember where all the time went.

http://taskix.robustit.com/
http://www.launchy.net/
http://www.inkscape.org/

Wednesday, 7 November 2007

Javascript Plotting

http://chartr.rubyforge.org/

Chartr is a bit of Ruby glue to interface with the Plotr Javascript library, available here: solutoire.com/plotr/

Chartr is written by David N. Welton for DedaSys LLC.

http://solutoire.com/plotr/

Plotr

Some time ago I was looking for a charting framework for Prototype and I couldn't find it, just because there's none. So that's where it all started. I came across PlotKit, a well written piece of javascript that enables developers to use Canvas or SVG elements for rendering bar, line and pie charts. The only thing was that PlotKit needed the Mochikit library to work. So I took some parts of PlotKit and wrote some parts myself. The result is a lightweight charting framework (12kb!) named Plotr. It's released under the BSD license.

Tango icon library

http://tango.freedesktop.org/Tango_Icon_Library

Drop IO

Found on http://del.icio.us/popular/
http://drop.io/

Each drop is:

A private place for storing and sharing photos, video, audio, notes, docs, etc. Each is accessible only to those whom you tell exactly where to look.

No signup and no 'account'. We don't even ask for your email. Create as many drops as you want, and access each at: drop.io/thedropname.

Google mapplets

http://www.google.com/apis/maps/documentation/mapplets/

Add any page to iGoogle

http://www.bolinfest.com/changeblog/2007/05/03/your-page-here-an-igoogle-gadget/

Tuesday, 6 November 2007

2007 Restaurant Winners SMH

http://www.smh.com.au/news/good-living/2007-winners/2006/09/04/1157222070830.html

Simon Thomsen and Catherine Keenan
September 4, 2006

Restaurant of the year Becasse, city

Chef of the year Katrina Kanetani, pastry chef, Pier, Rose Bay

Best new restaurant Bentley Restaurant & Bar, Surry Hills

Best regional restaurant Fins, Byron Bay

The Sydney Morning Herald Award for Professional Excellence Tony Bilson of Bilson's, for his commitment to fine dining, the marriage of food and wine, his influence on the next generation of chefs and for 35 years of great food.

The Sydney Morning Herald Silver Service Award Toni Urquhart of No. 2 Oak Street, Bellingen, for enthusiastic, charming, genuine, warm-hearted country hospitality and being a great sommelier to boot.

The Good Food Guide Sommelier Award Nick Hildebrandt of Bentley Restaurant & Bar, for showing us a bigger world of wine that's accessible, fascinating and, best of all, fun.

The Josephine Pignolet Young Chef of the Year Award Philip Wood from Tetsuya's. The young chef receives a return international flight courtesy of Qantas, along with the chance to work in leading European restaurants, a substantial cash prize from food suppliers and leading Sydney chefs, and a set of Furitechnics knives.

Editors' picks

Favourite bistro Bistrode, Surry Hills

Favourite Mediterranean VINI, Surry Hills

Favourite Asian Billy Kwong, Surry Hills

Favourite pizza La Disfida, Haberfield

Favourite yum cha Marigold Citymark, Haymarket

Favourite bar Bambini Wine Room, city

Favourite cafe Brasserie Bread, Banksmeadow

City

Three hats

Bilson's, Claude's, est., Guillaume at Bennelong, Marque, Pier, Quay, Tetsuya's

Two hats

Aria, Becasse, Bentley Restaurant & Bar, Bistro Moncur, Buon Ricordo, Iceberg's Dining Room & Bar, Lucio's, Omega, Pello, Pier Tasting Room, Pilu at Freshwater, Rockpool, Sean's Panaroma, Yoshii

One hat

Alchemy 731, Assiette, Astral, The Bathers' Pavilion Restaurant, Billy Kwong, Bird Cow Fish, Bistro CBD, Bistro Moore, Bistrode, The Boathouse on Blackwattle Bay, buzo, Catalina Rose Bay, Coast, Fish Face, Flying Fish, Forbes & Burton, Forty One, Galileo, Grand National, Il Piave, Jonah's, La Sala, Longrain, Lo Studio, Lotus, Mezes at Omega, Milsons, Otto, Restaurant Atelier, Restaurant Balzac, Restaurant Sojourn, Sailors Thai Restaurant, Three Weeds, The Wharf, Ying's

Regional

Two hats

Collits' Inn (Hartley Vale), Fins (Byron Bay), Solitary (Leura Falls)

One hat

Ashcrofts (Blackheath), Bannister's (Mollymook), Boomerang (Byron Bay), Caveau (Wollongong), Courgette (Canberra City), Darley's (Katoomba), dish (Byron Bay), Eschalot (Bowral), The Journeyman (Berrima), Lolli Redini (Orange), Lochiel House (Kurrajong Heights), Neila (Cowra), No. 2 Oak Street (Bellingen), Ottoman Cuisine (Barton), Restaurant II (Newcastle), The River (Moruya), Sage (Braddon), Tonic (Millthorpe), Vulcans (Blackheath), Zest (Nelson Bay)

Cool CSS for a webpage

http://www.cs.utexas.edu/~nate/
http://www.cs.utexas.edu/~nate/styles.css

Regular expressions with C on Linux

Have to try this out and see how well it works...
man regex

REGCOMP(3)
Linux Programmers Manual
REGCOMP(3)

NAME
regcomp, regexec, regerror, regfree - POSIX regex functions

SYNOPSIS
#include <sys/types.h>
#include <regex.h>

int regcomp(regex_t *preg, const char *regex, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch,
regmatch_t pmatch[], int eflags);
size_t regerror(int errcode, const regex_t *preg, char *errbuf,
size_t errbuf_size);
void regfree(regex_t *preg);

POSIX REGEX COMPILING
regcomp is used to compile a regular expression into a form that is
suitable for subsequent regexec searches.

regcomp is supplied with preg, a pointer to a pattern buffer storage
area; regex, a pointer to the null-terminated string and cflags, flags used
to determine the type of compilation.

All regular expression searching must be done via a compiled pattern
buffer, thus regexec must always be supplied with the address of a regcomp
initialized pattern buffer.

cflags may be the bitwise-or of one or more of the following:

REG_EXTENDED
Use POSIX Extended Regular Expression syntax when
interpreting regex. If not set, POSIX Basic Regular Expression syntax is
used.

REG_ICASE
Do not differentiate case. Subsequent regexec searches using
this pattern buffer will be case insensitive.

REG_NOSUB
Support for substring addressing of matches is not required.
The nmatch and pmatch parameters to regexec are ignored if the pattern
buffer supplied was compiled with this flag set.

REG_NEWLINE
Match-any-character operators don't match a newline.

A non-matching list ([^...]) not containing a newline does
not match a newline.

Match-beginning-of-line operator (^) matches the empty string
immediately after a newline, regardless of whether eflags, the execution
flags of regexec, contains REG_NOTBOL.

Match-end-of-line operator ($) matches the empty string
immediately before a newline, regardless of whether eflags contains
REG_NOTEOL.
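
For a quick feel of what these flags do before trying the C API, Python's `re` module has rough counterparts. This is only an approximate mapping and an assumption of this note: `re` is not a POSIX regex implementation and its syntax differs. `re.IGNORECASE` behaves much like REG_ICASE, and `re.MULTILINE` (without `re.DOTALL`) covers the main effects listed for REG_NEWLINE.

```python
import re

text = "Hello\nworld"

# Rough analogue of REG_ICASE: case is ignored
print(bool(re.search(r"hello", text, re.IGNORECASE)))  # True

# Rough analogue of REG_NEWLINE: ^ and $ also match at line boundaries
print(bool(re.search(r"^world$", text, re.MULTILINE)))  # True
print(bool(re.search(r"^world$", text)))                # False

# '.' does not match a newline unless re.DOTALL is given
print(bool(re.search(r"Hello.world", text)))             # False
print(bool(re.search(r"Hello.world", text, re.DOTALL)))  # True
```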

c++ count_if example

Hopefully the attached file shows up...

C++ algorithm examples

From http://cppreference.com/
#include <iostream>
#include <algorithm>
#include <functional>  // std::bind2nd, std::equal_to (bind2nd was removed in C++17)
#include <numeric>
#include <vector>

/* count_if */
void count_if() {
    int nums[] = { 0, 1, 2, 3, 4, 5, 9, 3, 13 };
    int start = 0;
    int end = 9;

    int target_value = 3;
    int num_items = std::count_if( nums+start,
                        nums+end,
                        std::bind2nd(std::equal_to<int>(), target_value) );

    std::cout << " nums[] contains " << num_items << " items matching " << target_value << std::endl;
}

/* accumulate */
void accumulate() {
    int nums[] = { 0, 1, 2, 3, 4, 5, 9, 3, 13 };
    int start = 0;
    int end = 9;

    int initial_value = 3;  // starting value that the elements are added to
    int total = std::accumulate( nums+start,
                        nums+end,
                        initial_value);

    std::cout << " accumulate result: " << total << std::endl;
}

/* adjacent_difference */
void adjacent_difference() {
    int nums[] = { 0, 1, 2, 3, 4, 5, 9, 3, 13, 0 };
    int start = 3;
    int end = 4;
    int result = 0;

    // the range holds a single element, so only nums[3] is copied to result
    std::adjacent_difference( nums+start,
                        nums+end,
                        &result);
    std::cout << " adjacent_difference: result: " << result << std::endl;
}

/* adjacent find */
void adjacent_find() {
    std::vector<int> v1;
    for( int i = 0; i < 10; i++ ) {
        v1.push_back(i);
        // add a duplicate 7 into v1
        if( i == 7 ) {
            v1.push_back (i);
        }
    }

    std::vector<int>::iterator result;
    result = std::adjacent_find( v1.begin(), v1.end() );

    if( result == v1.end() ) {
        std::cout << " Did not find adjacent elements in v1" << std::endl;
    } else {
        std::cout << " Found matching adjacent elements starting at " << *result << std::endl;
    }
}

/* binary_search
   Note: list has to be in order beforehand for it to work
*/
void binary_search()
{
    int nums[] = { -242, -1, 0, 5, 8, 9, 11 };
    int start = 0;
    int end = 7;

    for( int i = 0; i < 10; i++ ) {
        if( std::binary_search( nums+start, nums+end, i ) ) {
           std::cout << " nums[] contains " << i << std::endl;
        } else {
           std::cout << " nums[] DOES NOT contain " << i << std::endl;
        }
    }
}

/* copy */
void copy()
{
    std::vector<int> from_vector;
    for( int i = 0; i < 10; i++ ) {
        from_vector.push_back( i );
    }

    std::vector<int> to_vector(10);

    std::copy( from_vector.begin(), from_vector.end(), to_vector.begin() );

    std::cout << "to_vector contains: ";
    for( unsigned int i = 0; i < to_vector.size(); i++ ) {
        std::cout << to_vector[i] << " ";
    }
    std::cout << std::endl;
}

/* copy_backward */
void copy_backward()
{
    std::vector<int> from_vector;
    for( int i = 0; i < 10; i++ ) {
        from_vector.push_back( i );
    }

    std::vector<int> to_vector(15);

    std::copy_backward( from_vector.begin(), from_vector.end(), to_vector.end() );
    std::cout << "to_vector contains: ";
    for( unsigned int i = 0; i < to_vector.size(); i++ ) {
        std::cout << to_vector[i] << " ";
    }
    std::cout << std::endl;
}

int main ()
{
    count_if();
    accumulate();
    adjacent_difference();
    adjacent_find();
    binary_search();
    copy();
    copy_backward();

    return 0;
}

The Curse of Xanadu

http://www.wired.com/wired/archive/3.06/xanadu_pr.html

By Gary Wolf

It was the most radical computer dream of the hacker era. Ted Nelson's Xanadu project was supposed to be the universal, democratic hypertext library that would help human life evolve into an entirely new form. Instead, it sucked Nelson and his intrepid band of true believers into what became the longest-running vaporware project in the history of computing - a 30-year saga of rabid prototyping and heart-slashing despair. The amazing epic tragedy.

Ten Tips for a (Slightly) Less Awful Resume

http://steve-yegge.blogspot.com/2007/09/ten-tips-for-slightly-less-awful-resume.html

Today's scientific question is: why are the resumes of programmers so uniformly awful? And how do we fix them? The resumes, that is.

If you've spent more than approximately seventeen kiloseconds as an industry programmer, you've had to review bad tech resumes. It's just part of the job. Programmer resumes ultimately have to be gauged by programmers — it takes one to know one. So it winds up being a kind of karmic revenge on you for bad resumes that you've written. C'mon, you know you've done it. You even knew it was bad when you were writing it. Admit it! You listed HTML under programming languages, didn't you? Argh!

So why are tech resumes so bad? You know what I mean. You see the craziest stuff on resumes. Like the candidate who proudly lists every Windows API call she's ever used. Or the candidate who lists every course he took starting from junior high school. Or the one who lists college extension courses he took while doing time for armed robbery.

Execution in the Kingdom of Nouns

http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html
Hello, world! Today we're going to hear the story of Evil King Java and his quest for worldwide verb stamp-outage.

Caution: This story does not have a happy ending. It is neither a story for the faint of heart nor for the critical of mouth. If you're easily offended, or prone to being a disagreeable knave in blog comments, please stop reading now.

Comparison of different SQL implementations

http://troels.arvin.dk/db/rdbms/

The goal of this page — which is a work in progress — is to gather information relevant for people who are porting SQL from one product to another and/or are interested in possibilities and limits of 'cross-product' SQL.

The following tables compare how different DBMS products handle various SQL (and related) features. If possible, the tables also state how the implementations should do things, according to the SQL standard.

Standard / PostgreSQL / DB2 / MS SQL Server / MySQL / Oracle

Friday, 2 November 2007

Anyterm - A Terminal Anywhere

http://anyterm.org/

Introduction

Have you ever wanted SSH or telnet access to your system from an "internet
desert" - from behind a strict firewall, from an internet cafe, or even
from a mobile phone? Anyterm is a combination of a web page and a web
server module that provides this access - see the demos.

Anyterm can use almost any web browser and even works through firewalls.
There is experimental support for mobile phones using WAP. If you join
my.anyterm.org you can access your systems straight away via our server
with no software to install anywhere. Alternatively, you can run the
Anyterm software on your own system - see the deployment examples.

We can also help you to integrate Anyterm-type functionality into your own
applications, for example to web-enable a legacy system, or an embedded
system. Contact us for details.

How It Works

Anyterm consists of some Javascript on a web page, an XmlHttpRequest
channel on standard ports back to the server, and an Apache module. The
module uses a pseudo-terminal to communicate with a shell or other
application, and includes terminal emulation. Key presses are picked up by
the Javascript which sends them to the Apache module; changes to the
emulated screen are sent from the module to the Javascript which updates
its display. Performance is quite reasonable and SSL can be used to secure
the connection.

my.anyterm.org

my.anyterm.org is designed for systems administrators and others who want
the benefit of access from anywhere using Anyterm, but who don't want to
risk installing the Anyterm software on their own servers. For a small
charge you can use our Anyterm installation to connect to your own systems.

Wednesday, 31 October 2007

haacked blog

http://haacked.com/

Tuesday, 30 October 2007

Null bytes to fool virus detection

http://blog.didierstevens.com/2007/10/23/a000n0000-0000o000l00d00-0i000e000-00t0r0000i0000c000k/

When I found a malicious script riddled with 0x00 bytes, SANS handler Bojan
Zdrnja explained to me that this was an old trick. When rendering an HTML
page, Internet Explorer will ignore all zero-bytes (bytes with value zero,
0x00). Malware authors use this to obscure their scripts. But this old
trick still packs a punch.
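
A short Python sketch of why the trick works against a naive byte-pattern scan. The payload and the `<script` signature are made up for illustration; real scanners and real malware are more involved. A scanner that looks for the literal bytes misses them, while a renderer that drops the zero bytes sees the script.

```python
signature = b"<script"

# Hypothetical payload: a script tag with 0x00 bytes scattered through it
obfuscated = b"<\x00s\x00\x00c\x00r\x00i\x00\x00\x00p\x00t>alert(1)</script>"

# A literal byte search does not find the signature
print(signature in obfuscated)  # False

# Dropping the zero bytes, as the browser is described as doing, reveals it
cleaned = obfuscated.replace(b"\x00", b"")
print(signature in cleaned)  # True
print(cleaned)  # b'<script>alert(1)</script>'
```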

Virus Total

http://www.virustotal.com/
Virustotal is a service that analyzes suspicious files and facilitates the
quick detection of viruses, worms, trojans, and all kinds of malware
detected by antivirus engines.

Hash functions

http://burtleburtle.net/bob/hash/doobs.html

Code and analysis of different hash functions.

Hashes looked at: Additive, Rotating, One-at-a-Time, Bernstein, FNV,
Pearson, CRC, Generalized, Universal, Zobrist, Paul Hsieh's, My Hash,
lookup3.c, MD4
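
As a taste of what the page covers, here is a sketch of the One-at-a-Time hash from the list above, written in Python with explicit 32-bit masking (the published version is in C, where unsigned overflow wraps automatically). The function name and the masking style are choices made for this example.

```python
MASK32 = 0xFFFFFFFF


def one_at_a_time(key):
    """One-at-a-Time hash of a bytes object, as a 32-bit unsigned integer."""
    h = 0
    for byte in key:
        h = (h + byte) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    # Final mixing steps, applied once after all bytes are consumed
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    h = (h + (h << 15)) & MASK32
    return h


print(hex(one_at_a_time(b"a")))  # 0xca2e9442
```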

Thursday, 25 October 2007

GIMP 2.4 preview

http://www.redhatmagazine.com/2007/10/23/gimp-24-preview/

by Nicu Buculei

Fedora 8 test releases have a surprise for all users interested in
graphics: a release candidate for the new GIMP 2.4, meaning the final
version will get the stable GIMP 2.41. This is exciting news, as the
previous major release, GIMP 2.2, is several years old, and a lot of new
features were added in the meantime.

In this article, we'll take a look at some of the most visible new
features, but beyond them, there are tons of less visible things: speedups,
a decrease in memory consumption, better importing and exporting, a better
print plugin, better EXIF support, changed scripting language for plugins,
zoomable preview for plugins, many bug fixes, and more.

If you'd like to see a more practical application of these tools, take a
look at my article on improving portraits with GIMP.

Fail2ban

http://www.fail2ban.org/wiki/index.php/Main_Page
Fail2ban scans log files like /var/log/pwdfail or /var/log/apache/error_log
and bans IPs that make too many password failures. It updates firewall
rules to reject the IP address.
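
The core idea (count failures per IP, flag those over a limit) can be sketched in a few lines of Python. This is not Fail2ban's code: the log format, the regular expression, the function name `ips_to_ban` and the threshold are assumptions for illustration, and the sketch leaves out any time window and the firewall step.

```python
import re
from collections import Counter

# Hypothetical log format; real log lines differ from service to service
FAILURE = re.compile(r"Failed password for \S+ from (\d+\.\d+\.\d+\.\d+)")


def ips_to_ban(log_lines, max_failures=3):
    """Return, sorted, the IPs with at least max_failures failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILURE.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(ip for ip, count in failures.items() if count >= max_failures)


log = [
    "Oct 25 10:00:01 sshd: Failed password for root from 203.0.113.5",
    "Oct 25 10:00:03 sshd: Failed password for root from 203.0.113.5",
    "Oct 25 10:00:04 sshd: Accepted password for alice from 198.51.100.7",
    "Oct 25 10:00:06 sshd: Failed password for admin from 203.0.113.5",
    "Oct 25 10:00:09 sshd: Failed password for bob from 198.51.100.7",
]
print(ips_to_ban(log))  # ['203.0.113.5']
```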

Ohloh

Open source metrics and connecting people
http://www.ohloh.net/learn
Interesting site. Awesome design and pictures.

shipping container stacking strategy

(See attached file: shipping_issues_08.jpg)

Tuesday, 23 October 2007

python genetic programming

http://www.freenet.org.nz/python/pygene/doc/

Package pygene

pygene is a library for genetic algorithms in python

It aims to be very simple to use, and suitable for people new to genetic
algorithms.
Submodules
gamete: Implements gametes, which are the result of splitting an
organism's genome in two, and are used in the organism's sexual
reproduction
gene: Implements a collection of gene classes
organism: Implements classes for entire organisms
population: pygene/population.py - Represents a population of
organisms
prog: Implements genetic programming organisms
xmlio: xmlio.py

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/199121

<code>
#
# genetic.py
#

import random

MAXIMIZE, MINIMIZE = 11, 22

class Individual(object):
    alleles = (0,1)
    length = 30
    seperator = ''
    optimization = MINIMIZE

    def __init__(self, chromosome=None):
        self.chromosome = chromosome or self._makechromosome()
        self.score = None # set during evaluation

    def _makechromosome(self):
        "makes a chromosome from randomly selected alleles."
        return [random.choice(self.alleles) for gene in range(self.length)]

    def evaluate(self, optimum=None):
        "this method MUST be overridden to evaluate individual fitness score."
        pass

    def crossover(self, other):
        "override this method to use your preferred crossover method."
        return self._twopoint(other)

    def mutate(self, gene):
        "override this method to use your preferred mutation method."
        self._pick(gene)

    # sample mutation method
    def _pick(self, gene):
        "chooses a random allele to replace this gene's allele."
        self.chromosome[gene] = random.choice(self.alleles)

    # sample crossover method
    def _twopoint(self, other):
        "creates offspring via two-point crossover between mates."
        left, right = self._pickpivots()
        def mate(p0, p1):
            chromosome = p0.chromosome[:]
            chromosome[left:right] = p1.chromosome[left:right]
            child = p0.__class__(chromosome)
            child._repair(p0, p1)
            return child
        return mate(self, other), mate(other, self)

    # some crossover helpers ...
    def _repair(self, parent1, parent2):
        "override this method, if necessary, to fix duplicated genes."
        pass

    def _pickpivots(self):
        left = random.randrange(1, self.length-2)
        right = random.randrange(left, self.length-1)
        return left, right

    #
    # other methods
    #

    def __repr__(self):
        "returns string representation of self"
        return '<%s chromosome="%s" score=%s>' % \
               (self.__class__.__name__,
                self.seperator.join(map(str,self.chromosome)), self.score)

    def __cmp__(self, other):
        if self.optimization == MINIMIZE:
            return cmp(self.score, other.score)
        else: # MAXIMIZE
            return cmp(other.score, self.score)

    def copy(self):
        twin = self.__class__(self.chromosome[:])
        twin.score = self.score
        return twin

class Environment(object):
    def __init__(self, kind, population=None, size=100, maxgenerations=100,
                 crossover_rate=0.90, mutation_rate=0.01, optimum=None):
        self.kind = kind
        self.size = size
        self.optimum = optimum
        self.population = population or self._makepopulation()
        for individual in self.population:
            individual.evaluate(self.optimum)
        self.crossover_rate = crossover_rate
        self.mutation_rate = mutation_rate
        self.maxgenerations = maxgenerations
        self.generation = 0
        self.report()

    def _makepopulation(self):
        return [self.kind() for individual in range(self.size)]

    def run(self):
        while not self._goal():
            self.step()

    def _goal(self):
        return self.generation > self.maxgenerations or \
               self.best.score == self.optimum

    def step(self):
        self.population.sort()
        self._crossover()
        self.generation += 1
        self.report()

    def _crossover(self):
        next_population = [self.best.copy()]
        while len(next_population) < self.size:
            mate1 = self._select()
            if random.random() < self.crossover_rate:
                mate2 = self._select()
                offspring = mate1.crossover(mate2)
            else:
                offspring = [mate1.copy()]
            for individual in offspring:
                self._mutate(individual)
                individual.evaluate(self.optimum)
                next_population.append(individual)
        self.population = next_population[:self.size]

    def _select(self):
        "override this to use your preferred selection method"
        return self._tournament()

    def _mutate(self, individual):
        for gene in range(individual.length):
            if random.random() < self.mutation_rate:
                individual.mutate(gene)

    #
    # sample selection method
    #
    def _tournament(self, size=8, choosebest=0.90):
        competitors = [random.choice(self.population) for i in range(size)]
        competitors.sort()
        if random.random() < choosebest:
            return competitors[0]
        else:
            return random.choice(competitors[1:])

    def best():
        doc = "individual with best fitness score in population."
        def fget(self):
            return self.population[0]
        return locals()
    best = property(**best())

    def report(self):
        print "="*70
        print "generation: ", self.generation
        print "best: ", self.best

---------------------------------------------------------------------
#
# onemax.py - usage example
#
# the fittest individual will have a chromosome consisting of 30 '1's
#

import genetic

class OneMax(genetic.Individual):
    optimization = genetic.MAXIMIZE
    def evaluate(self, optimum=None):
        self.score = sum(self.chromosome)
    def mutate(self, gene):
        self.chromosome[gene] = not self.chromosome[gene] # bit flip

if __name__ == "__main__":
    env = genetic.Environment(OneMax, maxgenerations=1000, optimum=30)
    env.run()
</code>
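
The recipe is Python 2 code: it uses print statements, cmp() and __cmp__, none of which exist in Python 3, where list.sort() relies on __lt__ instead. One possible way to express the same ordering in Python 3 is sketched below. The classes `Scored` and `Maximized` and the `_key` helper are hypothetical stand-ins covering only the comparison logic, not a port of the whole recipe.

```python
from functools import total_ordering

MAXIMIZE, MINIMIZE = 11, 22


@total_ordering
class Scored(object):
    """Hypothetical stand-in for the recipe's Individual (ordering only)."""
    optimization = MINIMIZE

    def __init__(self, score):
        self.score = score

    def _key(self):
        # Negate when maximizing so that a smaller key always means fitter
        return self.score if self.optimization == MINIMIZE else -self.score

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()


class Maximized(Scored):
    optimization = MAXIMIZE


population = [Maximized(s) for s in (3, 30, 12)]
population.sort()  # fittest first, as the recipe's population[0] expects
print([ind.score for ind in population])  # [30, 12, 3]
```

The negation trick assumes numeric scores, which holds for the OneMax example.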