Wednesday, November 22, 2006

pecl_http response performance

I ran some small tests to get an idea of the current response performance of pecl/http.


Test victim was a >300k PDF file served by Apache 2.0.55 running the worker MPM.


http://blog.iworks.at/uploads/data.serendipityThumb.png
Full size chart, PDF with stats.


The legend should be read as follows:

php=request to a PHP script serving the PDF file by HttpResponse
php-c=just like above, cached by ETag
file=request to the PDF file served directly by Apache
file-c=just like above, cached by ETag

The results surprised me, because they clearly show that the performance drop for starting the PHP engine and doing the necessary negotiation to provide caching is not that bad. On the other hand, PHP's throughput when actually serving the file is not exciting. The current CVS version of pecl/http contains some tweaks to improve that situation, but they didn't really work out as well as I hoped.
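
The negotiation behind the cached variants can be sketched in plain PHP. This is only an illustration under my own assumptions, not what pecl/http does internally: sketch_etag_status() is a hypothetical helper, and the comma split ignores ETags that contain commas themselves.

```php
<?php
// Hypothetical helper: decide between "304 Not Modified" and "200 OK"
// by comparing the resource's ETag with the client's If-None-Match header.
function sketch_etag_status($etag, $ifNoneMatch)
{
    if ($ifNoneMatch === null || trim($ifNoneMatch) === '') {
        return 200; // client sent no validator, serve the full body
    }
    if (trim($ifNoneMatch) === '*') {
        return 304; // "*" matches any existing representation
    }
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        // weak validators ("W/" prefix) are compared by their opaque tag here
        if (strncmp($candidate, 'W/', 2) === 0) {
            $candidate = substr($candidate, 2);
        }
        if ($candidate === $etag) {
            return 304; // client copy is current, no body needed
        }
    }
    return 200;
}

// Example: an ETag derived from the content (placeholder string, not a real file)
$etag = '"' . md5('pdf bytes would go here') . '"';
printf("first request:  %d\n", sketch_etag_status($etag, null));
printf("repeat request: %d\n", sketch_etag_status($etag, $etag));
```

The cheap part is exactly this comparison; the expensive part is pushing the bytes through PHP when the answer is 200.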

Saturday, September 16, 2006

HttpRequestDataShare

There is some news about the development of pecl/http.


I recently implemented an interface to the curl-share functionality in the form of an HttpRequestDataShare class.


This is what reflection will tell you about it:


mike@honeybadger:~/build/php-5.2-debug$ cli --rc HttpRequestDataShare
Class [ <internal:http> class HttpRequestDataShare implements Countable ] {

- Constants [0] {
}

- Static properties [1] {
Property [ private static $instance ]
}

- Static methods [1] {
Method [ <internal> static public method singleton ] {

- Parameters [1] {
Parameter #0 [ <optional> $global ]
}
}
}

- Properties [4] {
Property [ <default> public $cookie ]
Property [ <default> public $dns ]
Property [ <default> public $ssl ]
Property [ <default> public $connect ]
}

- Methods [5] {
Method [ <internal, dtor> public method __destruct ] {

- Parameters [0] {
}
}

Method [ <internal, prototype Countable> public method count ] {

- Parameters [0] {
}
}

Method [ <internal> public method attach ] {

- Parameters [1] {
Parameter #0 [ <required> HttpRequest $request ]
}
}

Method [ <internal> public method detach ] {

- Parameters [1] {
Parameter #0 [ <required> HttpRequest $request ]
}
}

Method [ <internal> public method reset ] {

- Parameters [0] {
}
}
}
}

Using this class, you can save a fair amount of time on name lookups, as the following example shows:



<?php
$s = HttpRequestDataShare::singleton(true);
print_r($s);
for ($i = 0; $i < 10; ++$i) {
    $r = new HttpRequest("http://www.google.com/");
    // attach the request so it uses the shared DNS cache
    $s->attach($r);
    $r->send();
    printf("%0.6f\n", $r->getResponseInfo("namelookup_time"));
    $s->detach($r);
}
?>

Executing this script without DNS data sharing enabled gives the following results:


mike@honeybadger:~/build/php-5.2-debug$ cli -d"http.request.datashare.dns=0" ~/devel/http_rshare.php
HttpRequestDataShare Object
(
[cookie] =>
[dns] =>
[ssl] =>
[connect] =>
)
0.071296
0.048798
0.049598
0.051545
0.046258
0.052318
0.043769
0.060753
0.049168
0.048568

...and with DNS data sharing enabled:


mike@honeybadger:~/build/php-5.2-debug$ cli -d"http.request.datashare.dns=1" ~/devel/http_rshare.php
HttpRequestDataShare Object
(
[cookie] =>
[dns] => 1
[ssl] =>
[connect] =>
)
0.051945
0.000043
0.000041
0.000040
0.000039
0.000041
0.000041
0.000040
0.000040
0.000041

QED


You can either use a per-process global datashare object created with HttpRequestDataShare::singleton(true) or different instances for your HttpRequest objects. Note that DNS data sharing is used automagically for HttpRequestPool requests. Currently libcurl has implemented cookie and DNS data sharing only; trying to enable SSL session or connect sharing will raise a warning.
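
For comparison, the same libcurl share feature can be reached without pecl/http through the share functions of PHP's own curl extension. The following is a minimal sketch, assuming ext/curl with share support (PHP 5.5 or later); sketch_shared_handles() is a hypothetical helper name and nothing here is part of pecl/http.

```php
<?php
// Hypothetical helper: create easy handles that all use one shared DNS cache.
// Assumes ext/curl with the curl_share_* functions is available.
function sketch_shared_handles(array $urls)
{
    $share = curl_share_init();
    // share resolved host names between all attached handles
    curl_share_setopt($share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);

    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_SHARE, $share);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $handles[] = $ch;
    }
    // return the share handle too, it has to outlive the easy handles
    return array($share, $handles);
}
```

To repeat the measurement from above, run curl_exec() on each handle and read curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME) afterwards.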


Be sure to try it out; either directly from CVS or the next release, probably being 1.3.0RC1.

Monday, August 21, 2006

__get() and array rumors

There've been lots of rumors about overloaded array properties lately.


The following code



<?php
class funky {
    private $p = array();
    function __get($p) {
        return $this->p;
    }
}
$o = new funky;
$o->prop["key"] = 1;
?>


will yield:


Notice: Indirect modification of overloaded property funky::$p has no effect

As arrays are the only complex types that are passed by value (resources don't really count here), the solution to the described problem is simple: use an object. Either an instance of stdClass or ArrayObject will do, depending on whether you want to use array index notation.


So the following code will work as expected, because the ArrayObject instance will be passed by handle:



<?php
class smarty {
    private $p;
    function __construct() {
        $this->p = new ArrayObject;
    }
    function __get($p) {
        return $this->p;
    }
}
$o = new smarty;
$o->prop["key"] = 1;
?>

I guess most of you already knew, but anyway... ;)
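
For completeness, another commonly used way around the notice is to let __get() return by reference, so that the array itself is handed out instead of a copy. This is a different technique than the ArrayObject one described above; the class name refy and its $data property are made up for this sketch.

```php
<?php
class refy {
    // one plain array per overloaded property name
    private $data = array();

    // the & makes __get() return a reference, so writes reach $this->data
    public function &__get($name) {
        if (!array_key_exists($name, $this->data)) {
            $this->data[$name] = array();
        }
        return $this->data[$name];
    }
}

$o = new refy;
$o->prop["key"] = 1;
var_dump($o->prop);
```

The trade-off is that callers can now modify internal state through the reference, which the object based solution keeps explicit.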

Friday, August 18, 2006

Round up

It's been a long time since I wrote something here, mostly because I got distracted by some real private life recently ;) and due to paid work of course. Therefore I thought I'd round up what has happened behind the scenes in my PHP world.


PHP-6
I rewrote the output control layer for PHP-6 some months ago and I'm about to upgrade ext/zlib to see how it really works out.


PHP-5.2
I didn't contribute that much to this upcoming release. Two things I'd like to mention are a fix for the Apache2 SAPI, where each header("Content-Type: aaa/bbb") caused Apache to add output filters for the type to the outgoing filter chain, and the addition of the error_get_last() function, which is a convenient accessor to the last occurred error without fiddling around with INI(track_errors) and $php_errormsg.
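
A minimal sketch of how error_get_last() can be used; the path is a made-up one that is assumed not to exist:

```php
<?php
// Suppress the warning with @ and inspect it afterwards instead.
$handle = @fopen('/nonexistent/sketch-file.txt', 'r');
$error  = error_get_last();

if ($handle === false && $error !== null) {
    // the returned array carries the keys type, message, file and line
    printf("fopen failed (type %d): %s\n", $error['type'], $error['message']);
}
```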


pecl/http
There's official documentation now available online in the PHP manual, yay! :) It's not fully fleshed out, but gives some feeling about the provided functionality and hints on how to use this module.


php|a published an article by me about pecl/http in their August issue!


There have also been three releases since 1.0, the most recent one (1.2) today. See the changes since then outlined below.


Improvements/Additions



  • Improved response performance (HttpResponse, http_send API)

  • Added http_build_cookie() function

  • Added HttpQueryString::mod(array $params) method

  • Added ArrayAccess to interfaces implemented by HttpQueryString

  • Added HttpMessage::getHeader(string $name) method


Bug Fixes



  • Fixed http_parse_cookie() allowed_extras and flags parameters

  • Fixed configuration with shared dependencies

  • Fixed endless loop in http_build_url("..")

  • Fixed HttpResponse::capture() failure if buffered output exceeds 40k

  • Fixed HttpQueryString failures with objects as params

  • Fixed memory leaks with overloaded classes extending HTTP classes

  • Fixed build with gcc-2.95 (Thanks to Alexander Zhuravlev)

  • Fixed memory leak in inflate code (Thanks to Thomas Landro Johnsen)

Sunday, June 11, 2006

Installing pecl_http

As pecl/http 1.0 has finally been released and I noticed that it has already been packaged by several projects like PLD, Gentoo and FreeBSD, I wanted to explain what one gains or loses by using the different build/configure options for the extension.
The help text of configure for pecl/http should look similar to the following:


  --enable-http           Enable extended HTTP support
  --with-http-curl-requests[=LIBCURLDIR]
                          HTTP: with cURL request support
  --with-http-zlib-compression[=LIBZDIR]
                          HTTP: with zlib encodings support
  --with-http-magic-mime[=LIBMAGICDIR]
                          HTTP: with magic mime response content type guessing
  --with-http-shared-deps HTTP: disable to not depend on extensions like hash,
                          iconv and session (when built shared)

If you link the extension source directory into your php tree, you should be aware that these options show up at the end of the list of configure options for extensions, not--as probably expected--in alphabetical order. This is due to a recent change to use config9.m4 because the HTTP extension may depend on several other PHP extensions (hash, iconv, session).


--with-http-curl-requests
This configure option enables request functionality, uses libcurl and is highly recommended. The minimum libcurl version required is 7.12.3. Debian/stable currently ships 7.13.2 (no, this is not a typo).


--with-http-zlib-compression
I think this is the most overlooked/ignored option. Besides handling of compressed HTTP messages, it also provides superior deflate/inflate functionality with regard to stability and performance compared to the standard zlib extension. Both the http_deflate()/http_inflate() functions and the http.deflate/http.inflate stream filters are able to encode/decode all valid gzip, zlib (AKA deflate) and raw deflated data. It requires at least libz version 1.2.0.4, while Debian/stable ships 1.2.2, and is also highly recommended.
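
To make the three encodings a bit more tangible, here is a small sketch that uses the standard zlib extension as a stand-in (not pecl/http itself). It assumes PHP 5.4 or later, where zlib_decode() is available.

```php
<?php
$data = str_repeat('pecl/http ', 20);

// the same payload in the three formats mentioned above
$encoded = array(
    'gzip' => gzencode($data),   // gzip header and trailer around deflate
    'zlib' => gzcompress($data), // zlib header and checksum, AKA "deflate" in HTTP
    'raw'  => gzdeflate($data),  // raw deflate stream without any framing
);

foreach ($encoded as $format => $bytes) {
    // zlib_decode() detects the framing on its own
    $ok = zlib_decode($bytes) === $data;
    printf("%-4s %3d bytes, round trip %s\n", $format, strlen($bytes), $ok ? 'ok' : 'FAILED');
}
```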


--with-http-magic-mime
This option enables content type guessing for the HttpResponse and HttpMessage classes. It's rather a gimmick and thus not enabled by default. As there's no version information available for libmagic, I don't even know the minimum version required, but I guess anything coming from a file-4.1x versioned package should work. If you get an empty string as content type for a payload which is obviously XML text, check the magic.mime database you use for a broken first XML section. Comment out everything except the SVG detection, as other XML types and HTML are handled further down the magic file (noticed on Debian systems). If you changed your magic.mime database, don't forget to regenerate the precompiled version with the `file -C` command.


--with-http-shared-deps
This option controls whether pecl/http will depend on extensions built as dynamically loadable modules. So if, for example, ext/iconv has been compiled shared, pecl/http relies on ext/iconv being loaded by the time pecl/http itself is loaded. This option is enabled by default.


ext/hash
pecl/http uses ext/hash to generate ETag hashes (else standard PHP MD5, SHA1 or CRC32).


ext/iconv
If ext/iconv is present, the HttpQueryString class provides an xlate() method for charset transformation.


ext/session
http_redirect() can automatically append session information to the redirect URL.


ext/spl
ext/spl cannot be built shared, so pecl/http always uses it if it's enabled. HttpMessage and HttpRequestPool classes implement the interface Countable provided by ext/spl.

Friday, June 9, 2006

Wednesday, May 31, 2006

Konqueror's ViewMode Buttons

If you, after upgrading to Dapper Drake, are missing your beloved ViewMode Buttons in Konqueror, locate


/usr/share/kubuntu-default-settings/kde-profile/default/share/apps/konqueror/konq-kubuntu.rc

and add the following ToolBar node:


<ToolBar newline="false" hidden="false" name="viewModeToolBar" >
<text>ViewMode Toolbar</text>
<ActionList name="viewmode_toolbar" />
</ToolBar>

Yes, you guessed, this drove me mad ;)

Sunday, May 28, 2006

Disabled Trackbacks

I just disabled trackbacks for this blog.


They just filled my INBOX and DB.

Monday, May 22, 2006

Cookie Handling

I noticed some weirdness in how libcurl, and thus pecl/http, handles cookies.


I had to implement some changes which are only in CVS for now and which I'm going to outline here:



<?php
$r = new HttpRequest("http://www.google.at/");
$r->recordHistory = true;
// we don't care about cookies by default
// enable automatic recognition of cookies
$r->enableCookies();
$r->send();
// received cookies will be sent on the next request
$r->send();
// reset those "auto" cookies
// this needs at least libcurl >= v7.14.1
$r->resetCookies();
$r->send();
echo $r->getHistory()->toString(true);
?>

Beware that all this does not affect custom cookies set with HttpRequest::setCookies() and HttpRequest::addCookies(). Custom cookies can always be unset by calling HttpRequest::setCookies().


A final note on using the cookiestore option:



<?php
$r = new HttpRequest("http://www.google.at/");
// load and save cookies from /tmp/cookies.txt
// if 'cookiesession' is TRUE, session cookies
// won't be loaded from the cookiestore
$r->setOptions(array(
    "cookiesession" => TRUE,
    "cookiestore" => "/tmp/cookies.txt",
));
$r->send();
?>

Note that using the cookiestore automatically enables libcurl's cookie engine.

Tuesday, March 7, 2006

End of Youth

After more than a year of development and bulldozing the pecl-cvs mailing list, I'm ready to move pecl/http into 1.0-RC stage.


Version 0.25 has just been released and I'm confident that we'll see a stable 1.0 release not later than April.


Thanks for your condolences and patience in the last few months :)

Wednesday, February 15, 2006

pecl/http update

Yeah, you guessed, version 0.23 of pecl/http has been released, and it's time for a feature update ;)


Cookies


http_parse_cookie() has been reimplemented (and HttpRequest::getResponseCookie() has been moved to HttpRequest::getResponseCookies()). After revisiting the original Netscape draft and the two cookie RFCs, it was obvious that the previous implementation was pretty bogus.


Now it works as follows:



<?php
http_parse_cookie('cookie1=value; cookie2="1;2;3;4"; path=/');
/*
stdClass Object
(
[cookies] => Array
(
[cookie1] => value
[cookie2] => 1;2;3;4
)

[extras] => Array
(
)

[flags] => 0
[expires] => 0
[path] => /
[domain] =>
)
*/
?>

As you can see, a cookie line can have several name/value pairs. The standard additional fields like expires, path etc. are recognized automatically. The RFCs, though, define some other standard extra elements, and this is where the third parameter of http_parse_cookie() comes into play:



<?php
http_parse_cookie('cookie1=value; cookie2="1;2;3;4"; comment="none"; path=/',
    0, array("comment"));
/*
stdClass Object
(
[cookies] => Array
(
[cookie1] => value
[cookie2] => 1;2;3;4
)

[extras] => Array
(
[comment] => none
)

[flags] => 0
[expires] => 0
[path] => /
[domain] =>
)
*/
?>

If "comment" had not been specified as an allowed extra element, it would just have been recognized as another cookie.
If you pass HTTP_COOKIE_PARSE_RAW as the second parameter to http_parse_cookie(), no urldecoding is performed.
The flags field in the return value is a bitmask of HTTP_COOKIE_SECURE and HTTP_COOKIE_HTTPONLY.
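
The rules above (several pairs per line, quoted values, allowed extras) can be sketched in a few lines of plain PHP. This is a simplified illustration and not the actual implementation: sketch_parse_cookie() is a hypothetical function, it returns an array instead of an object, and it ignores the secure/httpOnly flags and the raw parsing mode.

```php
<?php
// Hypothetical, simplified re-creation of the parsing rules described above.
function sketch_parse_cookie($line, array $allowedExtras = array())
{
    $result = array(
        'cookies' => array(), 'extras' => array(),
        'expires' => 0, 'path' => '', 'domain' => '',
    );
    $allowedExtras = array_map('strtolower', $allowedExtras);

    // name=value pairs; a value is either "quoted" (may contain ;) or runs up to the next ;
    preg_match_all('/([^=;\s]+)\s*=\s*(?:"([^"]*)"|([^;]*))/', $line, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name  = $m[1];
        $value = isset($m[3]) ? trim($m[3]) : $m[2];
        $lower = strtolower($name);

        if ($lower === 'path' || $lower === 'domain') {
            $result[$lower] = $value;
        } elseif ($lower === 'expires') {
            $result['expires'] = (int) strtotime($value);
        } elseif (in_array($lower, $allowedExtras, true)) {
            $result['extras'][$name] = $value;
        } else {
            // anything not recognized is treated as just another cookie
            $result['cookies'][$name] = urldecode($value);
        }
    }
    return $result;
}

print_r(sketch_parse_cookie('cookie1=value; cookie2="1;2;3;4"; comment="none"; path=/', array('comment')));
```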


Messages


Some users pointed out that neither HttpMessage nor HttpRequest provided accessors to the HTTP response reason phrase, AKA status text. They've been added in the form of HttpMessage::getResponseStatus() and HttpRequest::getResponseStatus().


Some might have wondered why HttpMessages are chained in kind of a reverse order. Well, that has internal reasons, caused by how we retrieve the data from libcurl and how the message parser works. Anyway, there's now HttpMessage::reverse(), which reorders the messages in a more intuitive chronological way:



<?php
$msg = new HttpMessage(
"GET / HTTP/1.1
HTTP/1.1 302 Found
Location: /foo
GET /foo HTTP/1.1
HTTP/1.1 200 Ok"
);
foreach ($msg as $m) echo $m;
foreach ($msg->reverse() as $m) echo $m;
/*
HTTP/1.1 200 Ok
GET /foo HTTP/1.1
HTTP/1.1 302 Found
Location: /foo
GET / HTTP/1.1

GET / HTTP/1.1
HTTP/1.1 302 Found
Location: /foo
GET /foo HTTP/1.1
HTTP/1.1 200 Ok
*/
?>

Note, though, that HttpMessage::toString(true) automatically prepends parent messages, i.e. gives the latter result.
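
The chaining can be pictured as a singly linked list in which each message points to its parent, i.e. the message that came before it. The following sketch uses a made-up SketchMessage class to show the idea of reversing such a chain; it says nothing about how HttpMessage is implemented internally.

```php
<?php
// Hypothetical stand-in for a chained message: a start line plus a parent link.
class SketchMessage {
    public $line;
    public $parent;
    public function __construct($line, $parent = null) {
        $this->line = $line;
        $this->parent = $parent;
    }
}

// Build a reversed copy of the chain; the original chain is left untouched.
function sketch_reverse_chain($head) {
    $reversed = null;
    for ($cur = $head; $cur !== null; $cur = $cur->parent) {
        $reversed = new SketchMessage($cur->line, $reversed);
    }
    return $reversed;
}

// Collect the start lines while walking up the parent links.
function sketch_chain_lines($head) {
    $lines = array();
    for ($cur = $head; $cur !== null; $cur = $cur->parent) {
        $lines[] = $cur->line;
    }
    return $lines;
}

// the most recent message is the head, as with the parsed chain above
$chain = new SketchMessage('HTTP/1.1 200 Ok',
    new SketchMessage('GET /foo HTTP/1.1',
        new SketchMessage('HTTP/1.1 302 Found',
            new SketchMessage('GET / HTTP/1.1'))));

echo implode("\n", sketch_chain_lines(sketch_reverse_chain($chain))), "\n";
```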


Requests


For servers that don't urldecode cookies, a new option has been added, named "encodecookies", which omits urlencoding cookies if set to FALSE.
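
What the option changes on the wire can be sketched like this. sketch_cookie_header() is a hypothetical helper written for this illustration and is not part of pecl/http:

```php
<?php
// Hypothetical helper: build a Cookie request header with or without urlencoding.
function sketch_cookie_header(array $cookies, $encode = true)
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . ($encode ? urlencode($value) : $value);
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

$cookies = array('user' => 'mike m', 'path' => 'a/b');
echo sketch_cookie_header($cookies, true), "\n";  // values are urlencoded
echo sketch_cookie_header($cookies, false), "\n"; // values are sent as they are
```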


Similarly to the "lastmodified" request option, there's now an "etag" option working along the same lines.


HttpRequest::getHistory() now returns a real HttpMessage property, which means that this message chain is no longer immutable to changes made by the user.


If a request fails for some reason, you should now be able to get the error message through HttpRequest::getResponseInfo("error").

Friday, February 3, 2006

Some cool new features of pecl/http

PECL::HTTP version 0.22 has been released this morning, and I want to point at some features which have been added to the extension since I last blogged about it.


Incremental zlib (de)compressors were added in the form of two classes, HttpDeflateStream and HttpInflateStream. I hope the names say it all ;)


Another class, that might seem a bit odd at a quick glance, is HttpQueryString. It's a great tool to realize "paging" or sites with lots of rewrite rules AKA "pretty urls".
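
To give an idea of the "paging" use case without the extension at hand, here is a sketch with PHP's standard URL functions. sketch_page_url() is a made-up helper that only handles a path plus query string and is not meant to mirror the HttpQueryString API.

```php
<?php
// Hypothetical helper: return the given URL with its "page" parameter replaced.
function sketch_page_url($url, $page)
{
    $parts  = parse_url($url);
    $params = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }
    $params['page'] = $page; // overwrite or add, all other parameters survive
    return $parts['path'] . '?' . http_build_query($params, '', '&');
}

echo sketch_page_url('/search?q=php&page=1', 2), "\n";
```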


The class HttpMessage has finally got its iterator interface to move up the message chain in a more convenient way. Messages can now be detached and prepended from/to the message chain.


Thanks to Ilia, you can now retrieve the raw request and response messages sent and received, respectively, by an HttpRequest instance.


The function http_build_url is now the most versatile and powerful utility to handle URLs (sorry for the lack of docs). Please tell me if you don't think so ;)

Friday, January 27, 2006

imap_savebody()

If you, like me, were suffering from being unable to load big attachments through ext/imap because of PHP's memory limit, the new imap_savebody() function should be what you were looking for. It adds the ability to save any section (full mail, too) of a mail message to a file or stream.


Adding it implied a non-trivial change to ext/imap, so if you encounter any new problems, e.g. with imap_fetchbody(), speak up ASAP, please! ;)

Apache2 mod_domaintree version 1.3

I just released mod_domaintree-1.3 on freshmeat.


It'll take some time to appear, though.


The code has been cleaned up a lot and a host name to directory cache (per server/process) has been added.


Enable the cache by setting DomainTreeCache to a reasonably high number, like the number of different domains being hosted.


Drop me a mail if you like it! ;)