Notebook / Archives / "computer science"

"computer science" entries.

August 19, 2005

So the future might be nice after all

  • "2016 - 2020: electronic life forms will be granted rights protecting them."
  • "2031-2035: computers will have become more intelligent than humans."
  • "2051+: your thoughts, feelings and memories will be transferable to a computer."

Looking at the complete original timeline, I am struck once again by the very science-fiction orientation of the innovations presented in this study (I should take a look at the original document). Computing is mostly represented by artificial intelligence and robotics, with a bit of ambient intelligence, a dose of natural language processing and a pinch of social computing. In short: I do not see any mention of the direct products of information science. Yet none of this will happen without tools that support the emerging complexity…

I wonder whether Cédric will jump out of his chair…

(French quotations from InternetActu.net: La chronologie du futur)

Posted by Jean-Philippe on August 19, 2005 5 Comments, 167 TrackBacks

July 22, 2005

Hard times for innovation

Alan Kay has just been let go by Hewlett-Packard, as the "Advanced Software Research" team he belonged to has been disbanded (cf. SiliconValley.com's Good Morning Silicon Valley).

Dr. Kay (US Wikipedia, Wikipédia FR) was, after all, one of the eminent members of Xerox PARC – the Palo Alto Research Center (US Wikipedia, Wikipédia FR); he developed or took part in the development of an incredible number of technologies, such as graphical user interfaces and object-oriented programming, and he is the creator of the Smalltalk language (cf. A Conversation with Alan Kay – Big talk with the creator of Smalltalk — and much more. in ACM Queue magazine). In addition, Alan Kay received the 2003 Turing Award, the computer-science equivalent of the Nobel Prize (US Wikipedia, Wikipédia FR).

Learning and practicing Smalltalk in 2000/2001 during my DEA (postgraduate research degree) was one of the most instructive experiences I have had, a kind of revelation. It was as if I had not fully understood computing before, as if something had been missing (and I am still missing a lot). You may well have done C++, Java or Ada, but I do not think you can really have understood object-oriented programming until you have practiced Smalltalk.

The concepts embodied in Smalltalk, as well as some of the basic ideas of Kay's more recent projects (such as Croquet), are at the root of many developments in my thesis.

This is all the more gloomy news for me because the Hewlett-Packard research laboratories were on my list of targets for after the thesis…

(Via Slashdot: HP Fires Father of OOP)

Posted by Jean-Philippe on July 22, 2005 13 Comments, 245 TrackBacks

July 19, 2005

PanIQ!

In a comment on Pyrasun 2.0 - The Spille Blog: Action Hippo Rangers!:

Ha! As the population of the world increases, the sum of its IQ remains a constant.

A great article on what now seems to be called ping-pong development, or "how to surf on technologies".

Posted by Jean-Philippe on July 19, 2005 1 Comments, 377 TrackBacks

May 25, 2005

Web proxy systems, mainly in Java

After my previous entry entitled Structured graphics, diagramming, graphs and networks in Java one month ago, here comes this new entry about web proxies, starting my research from the Manageability article Open Source Personal Proxy Servers Written In Java.

Generic proxy systems

Transcoding architectures

  • IBM Research Web Intermediaries (WBI):

    Aiming to produce a more powerful and flexible web, we have developed the concept of intermediaries. Intermediaries are computational entities that can be positioned anywhere along an information stream and are programmed to tailor, customize, personalize, or otherwise enhance data as they flow along the stream.
    A caching web proxy is a simple example of an HTTP intermediary. Intermediary-based programming is particularly useful for adding functionality to a system when the data producer (e.g., server or database) or the data consumer (e.g., browser) cannot be modified.
    Web Intermediaries (WBI, pronounced "webby") is an architecture and framework for creating intermediary applications on the web. WBI is a programmable web proxy and web server. We are now making available the WBI Development Kit for building web intermediary applications within the WBI framework, using Java APIs. Many types of applications can be built with WBI; you can also download some plugins.
    One key intermediary application is the transformation of information from one form to another, a process called transcoding. In fact, the WBI Development Kit now provides the same plugin APIs as IBM WebSphere Transcoding Publisher. Applications developed with WBI version 4.5 can be used with the Transcoding Publisher product (with a few exceptions), as WBI constitutes the backbone on which transcoding operations run.
    Other examples of intermediary applications include:
    * personalizing the web
    * password & privacy management
    * awareness and interactivity with other web users
    * injecting knowledge from "advisors" into a user's web browsing
    * filtering the web for kids
    WBI has an interesting and entertaining history.

    [quite old: the last update of the WBI Development Kit's tech page was made on "March 25, 2004", but the downloadable files are older (June 2000) – alphaWorks License]

    There are two related webpages on IBM alphaWorks: the WBI Development Kit for Java webpage and the Transcoding Proxy webpage.

    Many interesting papers can be found in the Publications section, coming from the Almaden Research Center of IBM Research.

    WebFountain is another project from IBM Research (UIMA: The Unstructured Information Management Architecture Project) dealing with the problem of searching data that is not fully structured: WebFountain is a set of research technologies that collect, store and analyze massive amounts of unstructured and semi-structured text. It is built on an open, extensible platform that enables the discovery of trends, patterns and relationships from data. Again, papers can be found in the Publications section. IBM alphaWorks also has a Semantics research topic area.

  • AT&T Mobile Network:

    AT&T Mobile Network (AMN – formerly known as iMobile) is a project that addresses the research issues in building mobile service platforms. AMN currently consists of three editions: Standard Edition (SE), Enterprise Edition (EE), and Micro Edition (ME).
    AMN SE was built by extending iProxy, a programmable proxy. The proxy maintains user and device profiles, accesses and processes internet resources on behalf of the user, keeps track of user interactions, and performs content transformations according to the device and user profiles. The user accesses internet services through a variety of wireless devices and protocols (cell phones with SMS, WAP phones, PDA's, AOL Instant Messenger, Telnet, Email, Http, etc.)
    The main research issues in AMN include
    * Authentication: How does the proxy and associated services authenticate the user?
    * Profile Management: How does the proxy maintain the user and device profiles? How do the profiles affect the services?
    * Transcoding Service: How does the proxy map various formats (HTML, XML, WML, Text, GIF, JPEG, etc.) from one form to the other?
    * Personalized Services: How can new services be created by taking advantage of the user access logs and location/mobility information?
    * Deployment: How should the proxy be deployed? On the server side, on the network, on the client side, or should we use a mixed approach?

    [no download]

    Developed by AT&T Labs – Research.

  • The transcoders project on java.net:

    [not active – Apache Software License]

    There are somewhat interesting references in the description of the project.

  • Pluxy:

    Pluxy is a modular Web proxy which can receive a dynamic set of services. Pluxy provides the infrastructure to download services, to execute them and to make them collaborate. Pluxy comes with a set of basic services like collaborative HTTP request processing, GUI management and distributed services. Three Pluxy applications are introduced: a collaborative filtering service for the Web, an extended caching system and a tool to know about document changes.

    [dead: last modifications of the webpages were made from March to April, 1999]

    Two papers written by Olivier Dedieu from INRIA's SOR Action are available, with the same title: Pluxy : un proxy Web dynamiquement extensible (INRIA Research Report RR-3417, May 1998) and Pluxy : un proxy Web dynamiquement extensible (1998 NoTeRe colloquium, 20-23 October 1998).

  • eRACE Project:

    The extensible Retrieval Annotation Caching Engine (eRACE) is a middleware system designed to support the development and provision of intermediary services on Internet. eRACE is a modular, programmable and distributed proxy infrastructure that collects information from heterogeneous Internet sources and protocols according to end-user requests and eRACE profiles registered within the infrastructure.
    Collected information is stored in a software cache for further processing, personalized dissemination to subscribed users, and wide-area dissemination on the wireline or wireless Internet. eRACE supports personalization by enabling the registration, maintenance and management of personal profiles that represent the interests of individual users. Furthermore, the structure of eRACE allows the customization of its service provision according to information-access modes (pull or push), client-proxy communication (wireline or wireless; email, HTTP, WAP), and client-device capabilities (PC, PDA, mobile phone, thin clients). Finally, eRACE supports the ubiquitous provision of services by decoupling information retrieval, storage and filtering from content publishing and distribution.
    eRACE can easily incorporate mechanisms for providing subscribed users with differentiated service-levels at the middleware level. This is achieved by the translation of user requests and eRACE profiles into "eRACE requests" tagged with QoS information. These requests are scheduled for execution by an eRACE scheduler, which can make scheduling decisions based on the QoS tags.
    Performance scalability is an important consideration for eRACE given the expanding numbers of WWW users, the huge increase of information sources available on the Web, and the need to provide robust service. To this end, the performance-critical components of eRACE are designed to support multithreading and distributed operation, so that they can be easily deployed on a cluster of workstations.
    The eRACE system consists of protocol-specific proxies, like WebRACE, mailRACE, newsRACE and dbRACE, that gather information from the World-Wide Web, POP3 email-accounts, USENET NNTP-news, and Web-database queries, respectively. At the core of eRACE lies a user-driven, high-performance and distributed crawler, filtering processor and object cache, written entirely in Java. Moreover, the system employs Java-based mobile agents to enhance the distribution of loads and its adaptability to changing network-traffic conditions.

    [old: no paper since 2002]

    The Papers section contains many publications, from Marios Dikaiakos (Department of Computer Science, University of Cyprus) and Demetris Zeinalipour (now Department of Computer Science & Engineering, University of California, Riverside) among others. For example: Intermediary Infrastructures for the WWW.

  • Platform for Information Applications (PIA):

    The Platform for Information Applications (PIA) is an open source framework for rapidly developing flexible, dynamic, and easy to maintain information browser-based applications. Such applications are created without programming and can be maintained by users and office administrators.
    This framework has been used to build a broad range of applications, including a "workflow" web server that handles all of the purchase authorizations, time cards, and other (ex-)paperwork at Ricoh Innovations, Inc.
    The PIA does this by separating an application into a core processing engine (a shared software engine, akin to a web server) and a task-specific collection of "active" XML pages, which specify not only the content but also the behavior of the application (XML is the W3C standard, for eXtensible Markup Language). So one document, by itself, can include other documents (or pieces of them), iterate over lists, make decisions, calculate, search/substitute text, and in general do almost anything a traditional "CGI script" or document processing program would do.
    Application developers can extend the basic set of HTML and PIA elements ("tags") by defining new ones in terms of the existing ones. As a result, a PIA application can be customized simply by editing a single, familiar-looking XML page... in contrast to conventional Web applications, where even a simple change (like adding an input field) might require finding and fixing Perl CGI scripts or recompiling Java classes in several directories.

    [very old: “2.1.6 built Tue Apr 3 11:57:21 PDT 2001”]

  • The concept of “transcoding services” is implemented on the server side in Apache Cocoon:

    Apache Cocoon is a web development framework built around the concepts of separation of concerns and component-based web development.
    Cocoon implements these concepts around the notion of 'component pipelines', each component on the pipeline specializing on a particular operation. This makes it possible to use a Lego(tm)-like approach in building web solutions, hooking together components into pipelines without any required programming.
    Cocoon is "web glue for your web application development needs". It is a glue that keeps concerns separate and allows parallel evolution of all aspects of a web application, improving development pace and reducing the chance of conflicts.

    [active – Apache Software License]

Back in 1995, the paper that founded a large part of the concept: Application-Specific Proxy Servers as HTTP Stream Transducers.
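
To make the "HTTP stream transducer" idea concrete, here is a minimal sketch of such a personal proxy in Java. It is deliberately hypothetical: the class name and structure are mine and do not correspond to any of the projects listed above; it handles GET only, speaks HTTP/1.0 to the origin server, and skips HTTPS, keep-alive, caching and serious error handling.

    import java.io.*;
    import java.net.*;

    // Minimal "HTTP stream transducer": a personal proxy that relays GET requests
    // and stamps each response with an extra header before handing it to the browser.
    // Hypothetical sketch only: single-threaded, GET only, no HTTPS, no caching.
    public class TinyTransducerProxy {

        public static void main(String[] args) throws IOException {
            try (ServerSocket listener = new ServerSocket(8080)) {
                while (true) {
                    try (Socket client = listener.accept()) {
                        relay(client);
                    } catch (IOException e) {
                        System.err.println("request failed: " + e);
                    }
                }
            }
        }

        private static void relay(Socket client) throws IOException {
            BufferedReader fromBrowser = new BufferedReader(
                    new InputStreamReader(client.getInputStream(), "ISO-8859-1"));
            String requestLine = fromBrowser.readLine();   // e.g. "GET http://example.org/ HTTP/1.1"
            if (requestLine == null || !requestLine.startsWith("GET ")) return;

            URL target = new URL(requestLine.split(" ")[1]);
            int port = (target.getPort() == -1) ? 80 : target.getPort();
            String path = target.getFile().isEmpty() ? "/" : target.getFile();

            try (Socket origin = new Socket(target.getHost(), port)) {
                Writer toOrigin = new OutputStreamWriter(origin.getOutputStream(), "ISO-8859-1");
                // Re-issue a simplified HTTP/1.0 request so the origin closes the connection at the end.
                toOrigin.write("GET " + path + " HTTP/1.0\r\n"
                        + "Host: " + target.getHost() + "\r\n"
                        + "Connection: close\r\n\r\n");
                toOrigin.flush();

                // Read the complete response, then apply the "transduction" step.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                InputStream fromOrigin = origin.getInputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = fromOrigin.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                // Latin-1 maps bytes to chars one-to-one, so the relay itself is lossless.
                String response = buffer.toString("ISO-8859-1");

                // The transformation: tag the response so the browser can tell it went
                // through an intermediary. Real systems filter ads, transcode images, etc.
                int endOfStatusLine = response.indexOf("\r\n");
                if (endOfStatusLine > 0) {
                    response = response.substring(0, endOfStatusLine)
                            + "\r\nX-Intermediary: tiny-transducer"
                            + response.substring(endOfStatusLine);
                }

                Writer toBrowser = new OutputStreamWriter(client.getOutputStream(), "ISO-8859-1");
                toBrowser.write(response);
                toBrowser.flush();
            }
        }
    }

Pointing a browser's proxy setting at localhost:8080 is enough to see the extra header appear; everything the projects above add (plugin chains, caching, transcoding) grows out of that single transformation step.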

This kind of service is being standardized by the Open Pluggable Edge Services (OPES) Working Group of the IETF (The Internet Engineering Task Force):

The Internet facilitates the development of networked services at the application level that both offload origin servers and improve the user experience. Web proxies, for example, are commonly deployed to provide services such as Web caching, virus scanning, and request filtering. Lack of standardized mechanisms to trace and to control such intermediaries causes problems with respect to failure detection, data integrity, privacy, and security.
The OPES Working Group has previously developed an architectural framework to authorize, invoke, and trace such application-level services for HTTP. The framework follows a one-party consent model, which requires that each service be authorized explicitly by at least one of the application-layer endpoints. It further requires that OPES services are reversible by mutual agreement of the application endpoints.

Web filtering

  • Muffin – World Wide Web Filtering System:

    * Written entirely in Java. Requires JDK 1.1
    Runs on Unix, Windows 95/NT, and Macintosh.
    * Freely available under the GNU General Public License.
    * Support for HTTP/0.9, HTTP/1.0, HTTP/1.1, and SSL (https).
    * Graphical user interface and command-line interface.
    * Remote admin interface using HTML forms.
    * Includes several filters which can remove cookies, kill GIF animations, remove advertisements,
    add/remove/modify arbitrary HTML tags (like blink), remove Java applets and Javascript, spoof the user-agent, rewrite URLs, and much more.
    * View all HTTP headers to aid in CGI development and debugging.
    * Users can write their own filters in Java using the provided filter interfaces.

    [very old: last file release on SourceForge.net on “April 4, 2000” – GPL]

    Old but simple. The thesis slides describing Muffin can be read on the website. (A minimal sketch of the filter-chain pattern shared by Muffin and the other filtering proxies follows this list.)

  • PAW (Pro-Active Webfilter)
    PAW (pro-active webfilter) is an Open-Source filtering HTTP proxy based on the Brazil Framework provided as an Open-Source project by SUN. Because the Brazil Framework and PAW are written in Java the software is highly portable.
    PAW allows for easy plugin of Handlers (filter outgoing requests) and Filters (filter incoming data - the HTML response) and a GUI for end users. All the configuration files are in XML format and thus easy to modify (even without the GUI).
    Its aim is to provide an easy-to-use interface for end users and to be easily extendable by developers. PAW consists of the following components:
    * PAW Server which implements the filtering HTTP Proxy.
    * PAW GUI for easy PAW Server administration.

    [old: last file release on SourceForge.net on "January 17, 2003" – Apache Software License – uses the Sun Brazil Web Application framework]

  • Privoxy
    Privoxy is a web proxy with advanced filtering capabilities for protecting privacy, modifying web page content, managing cookies, controlling access, and removing ads, banners, pop-ups and other obnoxious Internet junk. Privoxy has a very flexible configuration and can be customized to suit individual needs and tastes. Privoxy has application for both stand-alone systems and multi-user networks.
    Privoxy is based on Internet Junkbuster.

    [still active but last file release on SourceForge.net on “January 30, 2004” – GPL – coded in C]

  • The Proxomitron (also there, more information on Proxomitron.info)
    For those who have not yet been introduced, meet the Proxomitron: a free, highly flexible, user-configurable, small but very powerful, local HTTP web-filtering proxy.

    [old and dead: “There were two separate releases of Proxomitron 4.5, one in May of 2003 and one in June.” – for Windows]

    Jon Udell wrote an article entitled SSL Proxying – Opening a window onto secure client/server conversations, inspired by the SSL support in Proxomitron. In it he showed how to code a very simple web proxy with Perl (libwww-perl is a powerful library which can be used to develop web applications with Perl).

  • Amit's Web Proxy Project [coded in Python]: Proxy 2 [dead: “1997”], Proxy 3 [dead: “1998”], Proxy 4 [dead: “2000”], and Proxy 5 [dead: “[2005-04-12] A lot of the HTML-modifying tricks I wanted to implement are easier to implement in GreaseMonkey, so I haven't had much motivation to work on a proxy to do these things. See a list of GreaseMonkey plugins.”].

    Amit J. Patel worked on this subject while doing his thesis (more here). On his webpage he links to A list of open-source HTTP proxies written in python, a very complete list of Web proxies in Python.

  • FilterProxy:

    FilterProxy is a generic http proxy with the capability to modify proxied content on the fly. It has a modular system of filters which can modify web pages. The modular system means that many filters can be applied in succession to a web page, and configuration is easy and flexible. FilterProxy can proxy any data served by the HTTP protocol (i.e. anything off the web), and filter any recognizable mime-type. All configuration is done via web-based forms, or editing a configuration file. It was created to fix some of the annoyances of poor web design by rewriting it. It also can improve the web for you, in both speed (Compress) and quality (Rewrite/XSLT). After ads (and their graphics) are stripped out, and html is compressed, surfing over a modem is much faster. Compare to Muffin (a similar project in java), and WebCleaner (a similar project in python) in purpose and functionality. FilterProxy is written in perl, and is quite fast.

    [old: last file release on SourceForge.net on “January 12, 2002” – GPL – coded in Perl]

  • The V6 Web Engine:

    V6 is to the Web what pipes are in Unix systems: a compositional device to combine document processing. To be easily integrated in the Web architecture, V6 is available as a personal proxy. Relying on a common skeleton architecture and Web related libraries, V6 can be easily configured to support various sets of filters while remaining portable and browser independent. The filters may act on the requests emitted by the browser (or other web client) or on the document returned by a server, or both.
    In the current release, the available filters include
    * flexible caching
    * request redirection
    * HTML filtering (based on NoShit)
    * global history
    * on-the-fly full text indexing
    V6 can be used to support many other navigation aids and Web-related tools in a uniform, browser independent way. In addition, V6 can also be used as a traditional http server: this is particularly useful to serve private files without needing access to the site-wide http server, or to interface to local, private applications (mail, ...) through the CGI interface.

    [archeology: last paper from 1996 and “V6 was written mostly in 1995/1996. Development and maintenance stopped in 1997” (there) – Copyright INRIA – coded in Objective Caml]

    Useful for very old references: the position paper for example.

  • A new way of filtering web content is through greasemonkey:

    Greasemonkey is a Firefox extension which lets you add bits of DHTML ("user scripts") to any web page to change its behavior. In much the same way that user CSS lets you take control of a web page's style, user scripts let you easily control any aspect of a web page's design or interaction.

    Downside: Firefox needed.
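
Muffin, PAW, FilterProxy and friends all converge on the same plugin pattern: the proxy hands each HTML response to an ordered chain of filters, each of which may rewrite the page. The sketch below is a hypothetical reduction of that pattern; the interface and class names are mine, not the actual API of Muffin or of any other project listed here.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical reduction of the filter-chain pattern shared by the filtering
    // proxies above. Names are mine, not any project's real API.
    interface ContentFilter {
        String filter(String url, String html);
    }

    // Example filter: crude ad removal by dropping <img> tags whose source matches a pattern.
    class AdImageFilter implements ContentFilter {
        public String filter(String url, String html) {
            return html.replaceAll("(?is)<img[^>]*(banner|doubleclick)[^>]*>", "");
        }
    }

    // Example filter: neutralise the dreaded <blink> tag.
    class BlinkFilter implements ContentFilter {
        public String filter(String url, String html) {
            return html.replaceAll("(?i)</?blink>", "");
        }
    }

    // The proxy core only knows about the chain, not about individual filters, so new
    // behaviour is added by registering plugins rather than by touching the core.
    class FilterChain {
        private final List<ContentFilter> filters = new ArrayList<ContentFilter>();

        void register(ContentFilter f) { filters.add(f); }

        String apply(String url, String html) {
            for (ContentFilter f : filters) {
                html = f.filter(url, html);
            }
            return html;
        }
    }

    public class FilterChainDemo {
        public static void main(String[] args) {
            FilterChain chain = new FilterChain();
            chain.register(new AdImageFilter());
            chain.register(new BlinkFilter());
            String page = "<html><img src='http://ads.example/banner.gif'>"
                    + "<blink>Hello</blink></html>";
            System.out.println(chain.apply("http://example.org/", page));
            // prints: <html>Hello</html>
        }
    }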

Web 1.0 experience augmentation

  • The MeStream Proxy:

    ThemeStream is an online "personal interest" site. It works on a self-publishing model; authors may post articles freely in a wide variety of categories. Unfortunately, its reader-based rating system is not particularly reliable, nor is it customizable. The MeStream Proxy allows users to customize how they view ThemeStream and rate ThemeStream content.

    [very old: last file release on SourceForge.net on "July 31, 2000" – GPL – MeStream was developed using the WBI development kit]

    Note: ThemeStream is dead.

Identity management (à la RoboForm)

  • Super Proxy System (SPS):

    Super Proxy System is the combination of a proxyserver and a mailserver.
    In addition to relaying the request and response between the user client and remote server, the proxy server also provides some special functions. For example, it helps fill in forms appearing on webpages. This frees the user from inputting the data every time when browsing websites such as the New York Times (www.nytimes.com). All kinds of filters can also be included if the user wants, so that annoyances such as cookies, pop-up windows and javascript can be removed, which will protect your privacy when you surf the internet.
    A special mailserver is built together with the proxy server, which is necessary in cases where a confirmation email must be answered when registering an account through a form.
    Super Proxy System makes your web surfing easy and secure.
    Super Proxy System can be run in a local area network or individually.

    [quite old: “Last Updated Jan. 25, 2003”]

    Two members of this project (from New York University) have interesting lists of publications: David Mazières (papers) and Helen Nissenbaum (papers).

Web accelerators

  • Squid Web Proxy Cache:

    Squid is...
    * a full-featured Web proxy cache
    * designed to run on Unix systems
    * free, open-source software
    * the result of many contributions by unpaid (and paid) volunteers
    Squid supports...
    * proxying and caching of HTTP, FTP, and other URLs
    * proxying for SSL
    * cache hierarchies
    * ICP, HTCP, CARP, Cache Digests
    * transparent caching
    * WCCP (Squid v2.3 and above)
    * extensive access controls
    * HTTP server acceleration
    * SNMP
    * caching of DNS lookups

    [active ;-) – GPL – coded in C]

    The reference in the UNIX world (not written in Java). There is an interesting Related Software page on the Squid website. (A bare-bones sketch of the caching idea behind these accelerators, in Java, follows this list.)

  • RabbIT proxy for a faster web:

    RabbIT is a web proxy that speeds up web surfing over slow links by doing:
    * Compress text pages to gzip streams. This reduces size by up to 75%
    * Compress images to 10% jpeg. This reduces size by up to 95%
    * Remove advertising
    * Remove background images
    * Cache filtered pages and images
    * Uses keepalive if possible
    * Easy and powerful configuration
    * Multi threaded solution written in java
    * Modular and easily extended
    * Complete HTTP/1.1 compliance
    RabbIT is a proxy for HTTP, it is HTTP/1.1 compliant (testing being done with Co-Advisors test, http://coad.measurement-factory.com/) and should hopefully support the latest HTTP/x.x in the future. Its main goal is to speed up surfing over slow links by removing unnecessary parts (like background images) while still showing the page mostly like it is. For example, we try not to ruin the page layout completely when we remove unwanted advertising banners. The page may sometimes even look better after filtering as you get rid of pointless animated gif images.
    Since filtering the pages is a "heavy" process, RabbIT caches the pages it filters but still tries to respect cache control headers and the old style "pragma: no-cache". RabbIT also accepts requests for nonfiltered pages by prepending "noproxy" to the address (like http://noproxy.www.altavista.com/). Optionally, a link to the unfiltered page can be inserted at the top of each page automatically.
    RabbIT is developed and tested under Solaris and Linux. Since the whole package is written in java, the basic proxy should run on any platform that supports java. Image processing is done by an external program and the recommended program is convert (found in ImageMagick). RabbIT can of course be run without image processing enabled, but then you lose a lot of the time savings it gives.
    RabbIT works best if it is run on a computer with a fast link (typically your ISP). Since every large image is compressed before it is sent from the ISP to you, surfing becomes much faster at the price of some decrease in image quality. If some parts of the page are already cached by the proxy, the speedup will often be quite amazing. For 1275 random images only 22% (2974108 bytes out of a total of 13402112) were sent to the client. That is 17 minutes instead of 75 using 28.8 modem.
    RabbIT works by modifying the pages you visit so that your browser never sees the advertising images, it only sees one fixed image tag (that image is cached in the browser the first time it is downloaded, so sequential requests for it is made from the browsers cache, giving a nice speedup). For images RabbIT fetches the image and run it through a processor giving a low quality jpeg instead of the animated gif-image. This image is very much smaller and download of it should be quick even over a slow link (modem).

    [active: last file release on SourceForge.net on “January 11, 2005” – BSD License]

  • WWWOFFLE (World Wide Web Offline Explorer)
    The wwwoffled program is a simple proxy server with special features for use with dial-up internet links. This means that it is possible to browse web pages and read them without having to remain connected.

    [old: “Version 2.8 of WWWOFFLE released on Mon Oct 6 2003” – GPL – coded in C]
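
The core trick of these accelerators is the response cache keyed by URL. As a bare-bones sketch (my own names and structure, not Squid's or WWWOFFLE's code), the logic looks like this; real caches also honour Cache-Control and Expires headers, revalidate with If-Modified-Since, evict entries, and cooperate in hierarchies.

    import java.util.HashMap;
    import java.util.Map;

    // Bare-bones sketch of the caching logic behind web accelerators: store a response
    // body under its URL and serve it again while it is still considered fresh.
    // Hypothetical names and structure, not taken from Squid or WWWOFFLE.
    public class TinyResponseCache {

        private static class Entry {
            final byte[] body;
            final long storedAtMillis;
            Entry(byte[] body, long storedAtMillis) {
                this.body = body;
                this.storedAtMillis = storedAtMillis;
            }
        }

        private final Map<String, Entry> entries = new HashMap<String, Entry>();
        private final long maxAgeMillis;

        public TinyResponseCache(long maxAgeMillis) {
            this.maxAgeMillis = maxAgeMillis;
        }

        /** Returns the cached body for this URL, or null on a miss or a stale entry. */
        public synchronized byte[] lookup(String url) {
            Entry e = entries.get(url);
            if (e == null) return null;
            if (System.currentTimeMillis() - e.storedAtMillis > maxAgeMillis) {
                entries.remove(url);   // stale: treat as a miss, the caller will refetch
                return null;
            }
            return e.body;
        }

        /** Stores a freshly fetched response body under its URL. */
        public synchronized void store(String url, byte[] body) {
            entries.put(url, new Entry(body, System.currentTimeMillis()));
        }
    }

A proxy would call lookup() before contacting the origin server and store() after a successful fetch; WWWOFFLE's offline mode is essentially lookup() with the network switched off.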

HTTP debugger and HTTP/HTML awareness

  • WebScarab (from OWASP – The Open Web Application Security Project):

    WebScarab is a framework for analysing applications that communicate using the HTTP and HTTPS protocols. It is written in Java, and is thus portable to many platforms. In its simplest form, WebScarab records the conversations (requests and responses) that it observes, and allows the operator to review them in various ways.
    WebScarab is designed to be a tool for anyone who needs to expose the workings of an HTTP(S) based application, whether to allow the developer to debug otherwise difficult problems, or to allow a security specialist to identify vulnerabilities in the way that the application has been designed or implemented.
    A framework without any functions is worthless, of course, and so WebScarab provides a number of plugins, mainly aimed at the security functionality for the moment. Those plugins include:
    * Fragments - extracts Scripts and HTML comments from HTML pages as they are seen via the proxy, or other plugins
    * Proxy - observes traffic between the browser and the web server. The WebScarab proxy is able to observe both HTTP and encrypted HTTPS traffic, by negotiating an SSL connection between WebScarab and the browser instead of simply connecting the browser to the server and allowing an encrypted stream to pass through it. Various proxy plugins have also been developed to allow the operator to control the requests and responses that pass through the proxy.
    o Manual intercept - allows the user to modify HTTP and HTTPS requests and responses on the fly, before they reach the server or browser.
    o Beanshell - allows for the execution of arbitrarily complex operations on requests and responses. Anything that can be expressed in Java can be executed.
    o Reveal hidden fields - sometimes it is easier to modify a hidden field in the page itself, rather than intercepting the request after it has been sent. This plugin simply changes all hidden fields found in HTML pages to text fields, making them visible, and editable.
    o Bandwidth simulator - allows the user to emulate a slower network, in order to observe how their website would perform when accessed over, say, a modem.
    * Spider - identifies new URLs on the target site, and fetches them on command.
    * Manual request - Allows editing and replay of previous requests, or creation of entirely new requests.
    * SessionID analysis - collects and analyses a number of cookies (and eventually URL-based parameters too) to visually determine the degree of randomness and unpredictability.
    * Scripted - operators can use BeanShell to write a script to create requests and fetch them from the server. The script can then perform some analysis on the responses, with all the power of the WebScarab Request and Response object model to simplify things.
    Future development will probably include:
    * Parameter fuzzer - performs automated substitution of parameter values that are likely to expose incomplete parameter validation, leading to vulnerabilities like Cross Site Scripting (XSS) and SQL Injection.
    * WAS-XML Static Tests - leveraging the OASIS WAS-XML format to provide a mechanism for checking known vulnerabilities.
    As a framework, WebScarab is extensible. Each feature above is implemented as a plugin, and can be removed or replaced. New features can be easily implemented as well. The sky is the limit! If you have a great idea for a plugin, please let us know about it on the list.
    There is no shiny red button on WebScarab, it is a tool primarily designed to be used by people who can write code themselves, or at least have a pretty good understanding of the HTTP protocol. If that sounds like you, welcome! Download WebScarab, sign up on the subscription page, and enjoy!

    [active: last release on “20050222”]

  • Charles Web Debugging Proxy:

    Charles is an HTTP proxy / HTTP monitor / Reverse Proxy that enables a developer to view all of the HTTP traffic between their machine and the Internet. This includes requests, responses and the HTTP headers (which contain the cookies and caching information).
    Charles can act as a man-in-the-middle for HTTP/SSL communication, enabling you to debug the content of your HTTPS sessions.
    Charles simulates modem speeds by effectively throttling your bandwidth and introducing latency, so that you can experience an entire website as a modem user might (bandwidth simulator).
    Charles is especially useful for Macromedia Flash developers as you can view the contents of LoadVariables, LoadMovie and XML loads.

    [seems still active: last update on freshmeat.net on “25-Dec-2004”]

  • Surfboard:

    Surfboard is a filtering HTTP 1.1 proxy. It features dynamic filter management through an interactive HTML console, IP tunneling, WindowMaker applets, and a suite of filters. See the Features page for details.
    Who should use this? Surfboard is a "personal proxy", intended to be used by individuals rather than organizations. Its purpose is not to censor or monitor surfing activity, nor is it intended to implement caching within the proxy. Filters could be written to do these things, but it's not something I'm personally interested in doing, and it's already available in other proxies. My goal with surfboard is to make a proxy that covers new ground and lets you "surf in style" by adding visual feedback, interaction, and network load balancing to make websurfing more enjoyable.
    Why another filtering proxy? A long time ago, I wanted a simple way to examine HTTP headers for a project I was working on. All the existing proxies I found were overkill for what I wanted, and were nontrivial to configure. So instead, I spent a lunch break writing a very simple proxy in Java that did everything I needed. Later I modified it to remove certain types of banner ads, but I was unhappy with the code -- it was ugly and difficult to maintain. I imagined that someday I would re-write it and "do it right", making everything dynamic with a browser-enabled console, some WindowMaker applets to visualize HTTP activity and to toggle filters on/off on the fly, etc. The typical second-system effect, in other words :-)

    [old: last file release on SourceForge.net on “January 12, 2002” – GPL – mainly in Java, but frontend parts coded in C]

  • the Axis TCP Monitor (tcpmon):

    A lightweight Java TCP proxy (from the Axis project).
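
A tcpmon-style monitor boils down to a few lines: accept on a local port, connect to the real server, copy bytes in both directions and echo what goes by. This is a hedged sketch with my own structure (and a hypothetical localhost:8080 target), not the Axis tcpmon source.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Sketch of a tcpmon-like TCP monitor: a transparent relay that logs traffic.
    // Structure and names are mine, not the Axis tcpmon source.
    public class TinyTcpMonitor {

        public static void main(String[] args) throws IOException {
            int listenPort = 8888;            // where the client is pointed
            String targetHost = "localhost";  // hypothetical real server
            int targetPort = 8080;

            try (ServerSocket listener = new ServerSocket(listenPort)) {
                while (true) {
                    Socket client = listener.accept();
                    Socket server = new Socket(targetHost, targetPort);
                    pump("C->S", client.getInputStream(), server.getOutputStream());
                    pump("S->C", server.getInputStream(), client.getOutputStream());
                    // (the sketch never closes the sockets explicitly; a real tool would)
                }
            }
        }

        // One thread per direction: read from one side, log, forward to the other side.
        private static void pump(final String label, final InputStream in, final OutputStream out) {
            new Thread(new Runnable() {
                public void run() {
                    byte[] buf = new byte[4096];
                    try {
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            System.out.println("[" + label + "] " + n + " bytes");
                            System.out.write(buf, 0, n);   // crude wire dump to stdout
                            System.out.println();
                            out.write(buf, 0, n);
                            out.flush();
                        }
                    } catch (IOException ignored) {
                        // one side closed the connection; the opposite thread will stop too
                    }
                }
            }, label).start();
        }
    }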

Personal assistant

  • Webmate:

    WebMate is part of the Intelligent Software Agents project headed by Katia Sycara. It is a personal agent for World-Wide Web browsing and searching developed by Liren Chen. It accompanies you when you travel on the internet and provides you what you want.
    Features
    * Searching enhancement, including parallel search (it can send search request to the current popular search engines and get results from them, reorder them according to how much overlapping among the different search engines), searching keywords refinement using our relevant keywords extraction technology, relevant feedback, etc.
    * Browsing assistant, including learning your current interesting, recommending you new URLs according to your profile and selected resources, giving some URL a short name or alias, monitoring bookmarks of Netscape or IE, getting more like the current browsing page, sending the current browsing page to your friends, prefetching the following hyperlinks at the current browsing page, etc.
    * Offline browsing, including downloading the following pages from the current page for offline browsing, getting the references of some pages and printing them out.
    * Filtering HTTP headers, including recording the http headers and all the transactions between your browser and WWW servers, filtering cookies to protect your privacy, blocking animated gif files to speed up your browsing, etc.
    * Checking the HTML page to find the errors in it, checking embedded links in to find the dead links for your learning to write HTML pages or maintain your webmate site, etc.
    * Dynamically setting up all kinds of resources, including search engines, dictionaries available in the WWW, online translation systems available in the WWW, etc.
    * Programmed in Java, independent of the operating system, running multi-threaded.

    [dead: downloadable file from March 2000]

    There is a paper about Webmate here (other – older – papers there). The developer, Liren Chen, wrote other interesting personal agents. He/she (?) works in The Intelligent Software Agents Lab of The Robotics Institute, School of Computer Science, Carnegie Mellon University, headed by Katia Sycara (a lot of publications).

Knowledge augmentation and retrieval

Knowledge augmentation

  • Scone – “A Java Framework to Build Web Navigation Tools”:

    Scone is a Java Framework published under the GNU GPL, which was designed to allow the quick development and evaluation of new Web enhancements for research and educational purposes. Scone is focussed on tools which help to improve the navigation and orientation on the Web.
    Scone has a modular architecture and offers several components, which can be used, enhanced and programmed using a plugin concept. Scone plugins can augment Web browsers or servers in many ways. They can:
    * generate completely new views of Web documents,
    * show extra navigation tools inside an extra window next to the browser,
    * offer workgroup tools to support collaborative navigation,
    * enrich web pages with new navigational elements,
    * help to evaluate such prototypes in controlled experiments etc.

    [latest version: "Version 1.1.34 from 13. Nov 2004" – Scone uses "IBM's WBI (Web Based Intermediary) as Proxy"]

    On the Related Projects page, many interesting tools are mentioned; among them: HTMLStreamTokenizer ("HtmlStreamTokenizer is an HTML parser written in Java. The parser classifies the HTML stream into three broad token types: tags, comments, and text."), HTTPClient ("This package provides a complete http client library. It currently implements most of the relevant parts of the HTTP/1.0 and HTTP/1.1 protocols, including the request methods HEAD, GET, POST and PUT, and automatic handling of authorization, redirection requests, and cookies. Furthermore the included Codecs class contains coders and decoders for the base64, quoted-printable, URL-encoding, chunked and the multipart/form-data encodings." – there is other interesting stuff on the webpage) and WebSPHINX: A Personal, Customizable Web Crawler:

    WebSPHINX (Website-Specific Processors for HTML INformation eXtraction) is a Java class library and interactive development environment for web crawlers. A web crawler (also called a robot or spider) is a program that browses and processes Web pages automatically.

    On the WebSPHINX webpage, one can find a list of other web crawlers and some references.

    Scone has an impressive architecture, "developed as a research project at the Distributed Systems and Information Systems Group (VSIS) [from the Department of Informatics] of the University of Hamburg". In the Documentation section, there are many papers and theses (the list of people in the project is on the main page). Many prototypes were also developed, BrowsingIcons being one of the most impressive: "BrowsingIcons is a tool to support revisitation of Web pages. To do this, it dynamically draws dynamic graphs of the paths of users as they surf the Web. Compared to using a plain browser, people can revisit web pages faster when they use these visualizations. A study showed that they also enjoy the visualizations more than Netscape alone."

  • AgentFrank:

    The goal of Agent Frank is to be a personal intelligent intermediary and companion to internet infovores during their daily hunter/gatherer excursions. Whew. Okay, so what does that mean? Well, let's take it one buzzword at a time:
    Personal - While employing many traditionally server-side technologies, Agent Frank is intended to reside near the user, on the desktop or the laptop.
    Intelligent - Agent Frank wants to learn about the user, observe preferences and habits, and become capable of automating many of the tedious tasks infovores face. Eventually, this will come to involve various forms of machine learning and analysis, & etc.
    Intermediary - Amongst Agent Frank's facilities are network proxies that can be placed between local clients and remote servers. Using these, Agent Frank can tap into the user's online activities in order to monitor, archive, analyze, and alter information as it flows. For example, using a web proxy, Agent Frank can log sites visited, analyze content, filter out ads or harmful scripting.
    Companion - Agent Frank's ultimate purpose is to accompany an infovore and assist along the way.
    Agent Frank is, at least initially, a laboratory for hacker/infovores to implement and play with technologies and techniques to fulfill the above goals. At its core, Agent Frank is a patchwork of technologies stitched together into a single environment intended to enable this experimentation. At the edges, Agent Frank is open to plugins and scripting to facilitate quick development and playing with ideas.
    Agent Frank wants to be slick & clean one day, but not today. Instead, it is a large and lumbering creature with all the bolts, sockets, and stitches still showing. This is a feature, not a bug.

    [old: the last release was on “20030215” – GPL]

    Very impressive job done by Leslie Michael Orchard! This platform uses many open source tools:

    * Implemented in Java, with an intent to stick to 100% pure Java.
    * Makes use of Jetty for an embedded browser-based user interface
    * Employs BeanShell to provide a shell prompt interface and scripting facilities
    * RDF metadata is employed via the Jena toolkit
    * Web proxy services are provided via the Muffin web proxy
    * Text indexing and searching enabled by Jakarta Lucene.
    * Exploring use of HSQL and/or Jisp for data storage.

  • The Arakne Environment:

    an open collaborative hypermedia system for the Web
    [old]

    There are a few interesting papers on the project page, written by the creator Niels Olof Bouvin from the Department of Computer Science – DAIMI of the Faculty of Science, University of Aarhus.

Web 1.0 annotation

  • mprox - a second layer of consciousness:

    mprox is not a 'product' - we dont give a shit about business!
    mprox is not 'art' - we dont waste time being at the right parties talking shit about our work!
    mprox is simply an experiment.
    it is an experiment about how the web could be used for not only passive viewing of information, but active communication on top of (and below) this information.
    it will also be an experiment how people will develop ways to deal with these possibilities, since there is no censorship, control or administration involved.
    by using mprox, a second layer of consciousness is created on every web page you visit, that can be used to communicate, post messages, manipulate the content of the page or transform the web page into an art object. possibilities are unlimited and uncontrollable due to an easily expandable "plugin"-system.

    [very old: “v0.3, 2000/03/22”]

Knowledge retrieval

  • YaCy – p2p-based distributed Web Search Engine:

    The YACY project is a new approach to build a p2p-based Web indexing network.
    * Crawl your own pages or start distributed crawling
    * Search your own or the global index
    * Built-in caching http proxy, but usage of the proxy is not a requisite
    * Indexing benefits from the proxy cache; private information is not stored or indexed
    * Filter unwanted content like ad- or spyware; share your web-blacklist with other peers
    * Extension to DNS: use your peer name as domain name!
    * Easy to install! No additional database required!
    * No central server!
    * GPL'ed, freeware

    [active: “The latest YaCy-release is 0.37” (on “20050502”) – GPL]

    Very clear architecture, explained on the technology webpage.

This is the first entry on this subject. More to come.

Posted by Jean-Philippe on May 25, 2005 9 Comments, 1650 TrackBacks

January 02, 2005

Happy New Year with a bit of Computer History

Happy New Year!

To start this promising year, I would like to mention the availability of the paper Computer Programming as an Art by Donald E. Knuth, author of The Art of Computer Programming, written for the 1974 A.M. Turing Award Lecture of the ACM:

“For his major contributions to the analysis of algorithms and the design of programming languages, and in particular for his contributions to the “art of computer programming” through his well-known books in a continuous series by this title.”
(Via langreiter.com, 2004-12-18-650).

Also have a look at Modern Home Computer from Mitch Kapor.

Posted by Jean-Philippe on January 02, 2005 1 Comments, 1285 TrackBacks

December 26, 2004

Couple of extensions

Marie:
You are .rpm  You have a nice package.  You can be useful, but your many variations sometimes make you tough to find.  You aren't apt to get jealous.
Which File Extension are You?
Me:
You are .cgi Your life seems a bit too scripted, and sometimes you are exploited.  Still a  workhorse though.
Which File Extension are You?

Posted by Jean-Philippe on December 26, 2004 1 Comments, 176 TrackBacks

March 06, 2004

Wandering in IoC lands

Someone seems to have listened to me: in Draft: Introduction to (IoC) Container Internals (LSD::RELOAD), Leo Simons introduces his new paper, an Introduction to (IoC) Container Internals. I have just read the beginning and must confess I understood nothing... Same result with PicoContainer Inversion of Control and IoC Types (there are many things to read there). After this sad news for my intellect, I took a look at the Apache corner: HiveMind Inversion of Control and Avalon IOC Patterns. Same noisy result. The IoC Introduction on Javangelist was just a bit clearer (with what seems to be a good pros-and-cons part)...
Well, either I am dumb or these guys are not good teachers. My evil plan: read Inversion of Control Containers and the Dependency Injection pattern by Martin Fowler and The Dependency Inversion Principle from Object Mentor, Inc.
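
In the meantime, here is the smallest Dependency Injection example I can come up with, loosely following the MovieLister/MovieFinder example in Fowler's article. The names are adapted from that article; none of this is the API of PicoContainer, HiveMind or Avalon, which only automate the final wiring step.

    import java.util.Arrays;
    import java.util.List;

    // Smallest Dependency Injection example I can think of, loosely following the
    // MovieLister/MovieFinder example from Fowler's article. Not the API of
    // PicoContainer, HiveMind or Avalon.
    interface MovieFinder {
        List<String> findAll();
    }

    class ColonDelimitedMovieFinder implements MovieFinder {
        private final String filename;
        ColonDelimitedMovieFinder(String filename) { this.filename = filename; }
        public List<String> findAll() {
            // a real implementation would parse the file; stubbed out for the sketch
            return Arrays.asList("movies read from " + filename);
        }
    }

    // Without IoC, MovieLister would call: new ColonDelimitedMovieFinder("movies.txt")
    // and be welded to that one implementation. With constructor injection, the choice
    // is pushed out to whoever assembles the application.
    class MovieLister {
        private final MovieFinder finder;
        MovieLister(MovieFinder finder) { this.finder = finder; }   // the dependency is injected
        void listAll() {
            for (String movie : finder.findAll()) {
                System.out.println(movie);
            }
        }
    }

    public class Assembly {
        public static void main(String[] args) {
            // An IoC container automates exactly this wiring step, usually driven by
            // configuration instead of a hand-written main().
            MovieFinder finder = new ColonDelimitedMovieFinder("movies.txt");
            MovieLister lister = new MovieLister(finder);
            lister.listAll();
        }
    }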

Posted by Jean-Philippe on March 06, 2004 13 Comments, 475 TrackBacks

March 04, 2004

Why I am hopeless at computer science

It's true: I often believed what the teachers told us, I was a well-behaved student, and so on. "Here is what you need to know about Java" was never going to be enough to get by in this cruel world. In the end I am useless, and if someone can point me to a good book on the IoC (Inversion of Control) pattern, I'll take it!
Next trimester I'll be following the "Introduction to programming with Java" class that is mandatory for all physics students. Oh, joy.

"In this course, students are introduced to a modern, visual development and programming environment (they dare call JBuilder modern!) based on the programming language Java. They will also learn to think algorithmically (no, its not a word in Dutch either) by analysis of non-trivial problems (from what I've heard, add some missing pieces into a GUI cd database). Subjects include elementary object-oriented concepts such as classes, instances, methods and attributes (they mean fields here I presume)."

All lectures and assignments are mandatory. Boy, am I looking forward to biting my tongue as a CompSci student reviews my code...

(On LSD::RELOAD: Mandatory java 101...oh, c'mon!, and the follow-up here: Mandatory java 101...not really)

Posted by Jean-Philippe on March 04, 2004 9 Comments, 0 TrackBacks

June 01, 2003

A bit of history

Two links to learn:


Posted by Jean-Philippe on June 01, 2003 24 Comments, 167 TrackBacks

May 31, 2003

Inside the software development process

First, some simple principles driving software development:


  • “POGE (Principle of Good Enough)”: free range simplicities shall drive your developments (on "Doc Searls Weblog": The SIMPLE case for XMPP, POGE, cont'd);

  • "PONGE (Principle of Never Good Enough)", POGE's marketing-driven opposite: always find new functionalities – useful or not – to implement (also on "Doc Searls Weblog": POGE vs. Microsoft?);

  • the KISS Principle (“Keep It Simple, Stupid”), a corollary to POGE: “a maxim often invoked when discussing design to fend off creeping featurism and control development complexity”;

  • the worse-is-better philosophy (via "::Manageability::": Words To Live By "Worse is Better"):

    Simplicity – the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.

    Correctness – the design must be correct in all observable aspects. It is slightly better to be simple than correct.

    Consistency – the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.

    Completeness – the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.




Next, a more biological way of viewing the software development process: The Internet and GPL as a text-file ecosystem (on "BitWorking"). Important notions here:

  • "software development process can be viewed as the production and processing of text files" (source files): year after year, these files have been building an ecosystem, the Open Source Software (OSS) ecosystem (as opposed to the Closed Source Software – CSS – ecosystem, which is composed of binary files and therefore less granular);

  • "liberty" vs. "freedom": liberty is the right to do something (here, the permission to cross the Internet's checkpoints – the network being viewed as an information highway: you may go beyond the boundaries) and freedom is the physical ability to do something (here, to drive freely on the Internet: you do not have to pay at the checkpoints);

  • open source software has become autocatalytic in the recent past: "the whole system could be done using only OSS text files and at that point the system became collectively autocatalytic" (the author, Joe Gregorio, is exploiting ideas from complexity theory here, more precisely from Stuart A. Kauffman).


Finally, a more human way of experiencing software engineering (via "Russell Beattie Notebook": 'Creating by exertion of the imagination...'), with some words from The Mythical Man-Month: Essays on Software Engineering by Frederick P. Brooks, Jr:

"First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design.

"Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful.

"Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning.

"Fourth is the joy of always learning, which springs from the non-repeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.

"Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures."


Posted by Jean-Philippe on May 31, 2003 23 Comments, 272 TrackBacks

May 30, 2003

What branch of the tree is safer?

An essay (very Lisp-oriented) about the past and the future of computer languages on Paul Graham's (very interesting) website: "The Hundred-Year Language". Excerpts:

The evolution of languages differs from the evolution of species because branches can converge. (...) In theory this is possible for species too, but it's so unlikely that it has probably never happened.

Convergence is more likely for languages partly because the space of possibilities is smaller, and partly because mutations are not random. Language designers deliberately incorporate ideas from other languages.

Saying less about implementation should also make programs more flexible. Specifications change while a program is being written, and this is not only inevitable, but desirable.

The word "essay" comes from the French verb "essayer", which means "to try". An essay, in the original sense, is something you write to try to figure something out. This happens in software too. I think some of the best programs were essays, in the sense that the authors didn't know when they started exactly what they were trying to write.

Inefficient software isn't gross. What's gross is a language that makes programmers do needless work. Wasting programmer time is the true inefficiency, not wasting machine time. This will become ever more clear as computers get faster.

A good essay in fact, but I disagree with its bottom line: "most of the extra computer power we're given will go to waste". This sentence should not be in the future tense but in the present.
Never mind. To stay on, or jump to, the right branch of the computer languages' tree, think further than the ideas found in this essay. Still, following Paul Graham's words, we could conclude that scripting languages will grow more than ever in the future, which is a good thing... So: Perl, Python, Ruby... and what next?

(Via Sam Ruby's www.intertwingly.net - “Parallelism Done Right”)

Posted by Jean-Philippe on May 30, 2003 29 Comments, 1050 TrackBacks

The butterfly is not all ;)

In the "Newsfactor" special report “Which Is Buggier - Windows or Linux?” we can clearly see that a pragmatic approach to detect bugs-prone operating systems is necessary. Then we can consider many of the “Microsoft Windows vs. Linux” arguments dealing with bugs without scientific foundations.

(Via "OSNews.com" - “Which Is Buggier - Windows or Linux?”)

Posted by Jean-Philippe on May 30, 2003 21 Comments, 276 TrackBacks

Science Fiction drives Computing Science

In the story “UNIX's True Competition: Linux?” on "OSNews.com", we can read this romantic conclusion:


The reason I wrote this editorial was because of my own "romantism" towards operating systems. One part of me believes that having a single OS running on all devices conceivable is great means for interoperability (think "Star Trek" and how they connect alien devices to their own with ease), but the other, "osnews" part of me, loves to see more and more operating systems and architectures on the plate. Seeing old traditional Unices fading away with time, or at least losing their glorious role every day to Linux or Windows, truly saddens me.

Yes, we all want to implement SF technologies in our software... Creativity drives development!

Posted by Jean-Philippe on May 30, 2003 15 Comments, 239 TrackBacks

April 28, 2003

I do not want to work in a fast-food restaurant!

Via "OSNews" - “Enterprise and Server Software to Become Commodity”:
In "NewsForge: The Online Newspaper of Record for Linux and Open Source" - “There may never be another software billionaire”:

I think that the "restaurant business" is a good analogy for the "software business". When you are creative and come up with new ideas, really new ideas, you have a fighting chance at success. Nevertheless, I know that there will always be opportunities in areas close to software engineering. That is why, after graduating from a computer science engineering school, I chose to move on into the deep world of cognitive science: being a broader field, it might offer billionaire-making jobs in the next century.

Posted by Jean-Philippe on April 28, 2003 17 Comments, 255 TrackBacks

April 05, 2003

Unix versus Microsoft philosophies

"If Unix development philosophy is small pieces loosely joined, Microsoft's philosophy is big chunks tightly coupled." (Jason Kottke)

This sentence must be read as an echo of Small Pieces Loosely Joined { a unified theory of the web } by David Weinberger.
It is a clever remark; however, I think Microsoft's philosophy is better described as a Russian nesting dolls philosophy...

(From "kottke.org": “Larry Ellison in a dream world”)

Posted by Jean-Philippe on April 05, 2003 18 Comments, 313 TrackBacks

March 31, 2003

Why I love BASIC

Yes, I agree with Dr. Heinz M. Kabutz:
- “BASIC is for beginners.”
- “BASIC allows global variables.”
- “BASIC supports the GOTO statement.”
- “BASIC is widely understood and supported.”
I learned programming with Microsoft GWBASIC and QBASIC (and QuickBASIC a few months later). It was like a game and I enjoyed creating applications. But the fact that I learned to code with BASIC does not make me an awful programmer (I hope...), nor someone who puts GOTO everywhere, in C for example. On the contrary, I know when to use GOTO, as explained in the "Kernighan and Ritchie" (we can say GOTO is in my toolbox). I really do not like it when a teacher invokes Dijkstra's well-known paper to declare that GOTO is evil!! More often than not, those teachers are themselves poor programmers... Nothing is only black or only white; GOTO is grey too.
So now, we can play with BASIC in Java.

Via Erik's Weblog (Monday, March 31, 2003 [@597])

Posted by Jean-Philippe on March 31, 2003 21 Comments, 1067 TrackBacks

March 29, 2003

Are "computing thinkers" also "thinkers of social evolution"?

"The species of computing thinkers, to which Jean-Yves Fréhaut belonged, is less rare than one might think. In every medium-sized company one can be found, rarely two. Moreover, most people vaguely accept that every relationship, in particular every human relationship, comes down to an exchange of information (provided, of course, that the concept of information is taken to include messages of a non-neutral character, that is, gratifying or penalizing ones). Under these conditions, a computing thinker will soon turn into a thinker of social evolution. His discourse will often be brilliant, and therefore convincing; the affective dimension may even be integrated into it."

Michel Houellebecq, Extension du domaine de la lutte (1994)
[page 43, Éditions J'ai lu, collection Nouvelle Génération (2002)]

Posted by Jean-Philippe on March 29, 2003 16 Comments, 0 TrackBacks
