PHP RFC: Add pecl_http to core
- Version: 2.4
- Date: 2014-08-19
- Last-Modified: 2015-02-20
- Author: Michael Wallner, [email protected]
- Status: Declined
- First Published at: http://wiki.php.net/rfc/pecl_http
Introduction
About
What is pecl_http?
pecl_http is an extension that aims to provide a convenient and powerful set of functionality for one of PHP’s major applications.
It eases handling of HTTP URLs, headers and messages, provides means for negotiation of a client's preferred content type, language and charset, as well as a convenient way to send arbitrary data with caching and resuming capabilities.
It provides powerful request functionality with support for parallel requests and an event loop library.
Why pecl_http?
pecl_http was first created more than ten years ago. Back then I was a PEAR guy and maintained a set of HTTP-related packages. I always wondered why PHP did not support HTTP in a more sophisticated manner, and thought I had to change that.
Back then there wasn’t much more than a few PEAR libraries like HTTP_Request, HTTP_Download and HTTP_Cache. What followed was a journey through PHP 4 and 5, with APIs compatible with both OOP and procedural programming, well received by users but harshly criticized by architects and evangelists. v2 tried to settle on a more concise and modern set of OOP APIs. I still get threats from users because of this. Development was slow, and it took about three years until 2.0 stable was released in late 2013.
Every so often there were requests to bundle pecl_http with the source distribution, but until now I never felt that I wanted to do that, nor that the code was ready for such a move. That time has come, though. The only real concern I still have is the burden it creates for all core developers when such a big amount of code moves from the hands of one into the hands of a few.
Code base size
I have to admit that the amount of code that comes with pecl_http can be called big, but that is mostly because core PHP is lacking in many, if not all, areas considered important for more sophisticated processing of HTTP. The tools already in core work out of the box to a certain extent, until you have to dig deeper. pecl_http tries to give you the tools you need, instantly available, for more differentiated situations, which have become more common, not less, over the last ten years.
Miscellaneous
Usage numbers
Frankly, I don’t know. PECL stats show about 50k source package downloads per month on average, whatever that is worth. It is placed in the top 10, where good old APC still has the lead.
Test coverage
The current test suite provides a code coverage of about 90% and is subject to improvement.
Coverage report of v2: http://dev.iworks.at/ext-http/lcov/ext/http/index.html
There’s one test that currently fails for me after porting the extension to ZE3, caused by a reference mismatch and a leak in zend_assign_to_variable_ref().
Licensing
All of the affected code was licensed under 2-clause BSD and will be re-licensed under the PHP-3.01 license.
FIG and PSR-7
There are no plans to follow or adhere to the Framework Interoperability Group’s “PHP Standard Recommendation” #7.
The Guts
A fully merged tree can be inspected here (based on v2.2, so slightly out of date):
https://github.com/m6w6/php-src/tree/merge-http
An up-to-date (based on v2.3) pecl_http tree for PHP7 can be found here:
https://github.com/php/pecl-http-pecl_http/tree/phpng
Documentation
The current docs are available here:
http://devel-m6w6.rhcloud.com/mdref/http
The documentation presumably has to be converted to DocBook to be included in php.net. However, the docs for pecl_http v1 currently reside at php.net/http. As far as I know, the docs team has not come to a conclusion on how to handle that situation.
I had hoped that in the meantime the php-docs investigation into using some sort of Markdown would have been successful, because the current documentation is written in a simple style of Markdown. In any case, conversion should not be too hard. I even had a volunteer in December, but did not hear back from him.
Markdown sources of the documentation can be found here:
https://github.com/m6w6/mdref-http
Dependencies
libz AKA zlib
- Type of dep: hard build dep
- Minimum version: 1.2.0.4
- Provided functionality: encoding and decoding of message bodies with zlib/gzip/deflate encodings
- Current state: essential
libidn
- Type of dep: soft build dep
- Provided functionality: IDNA support for http\Url
- Current state: feature completive, might look into ICU as alternative
libcurl
- Type of dep: soft build dep
- Minimum version: 7.18.2
- Provided functionality: HTTP request support
- Current state: feature completive; might look into additional alternatives, but nothing has proven competitive enough yet
libevent(2)
- Type of dep: soft build dep
- Provided functionality: multi_socket API from libcurl
- Current state: nice to have - must have for more parallel requests than select() can reasonably handle; might look into additional alternatives like libuv (libev already has libevent compatibility)
pecl/propro
- Type of dep: hard build dep
- Provided functionality: byref access to object properties representing C struct members
- Current state: feature completive, suggested to be merged to ext/standard or main, please discuss
Property Proxy API
ZE2 doxygen reference can be found here:
http://php.github.io/pecl-php-propro/php__propro_8h.html
There’s been similar functionality in ZE2, but according to my findings back then it was dysfunctional, and it has obviously been removed in ZE3.
Internals thread from 2010:
http://marc.info/?t=128351193900002&r=1&w=2
When a property is requested byref (BP_VAR_RW) from an object that stores state in a member of the object's C struct, and that object implementation uses its own property handlers, it can use a property proxy to enable that kind of access to that state.
This is accomplished by returning an instance of the property proxy object instead of the property directly. The proxy has its set and get handlers overridden and does deferred and cascaded fetch/push of the original member of the container (object) to update the state.
Every extension maintaining state outside of real object properties (e.g. dom) can make use of this functionality and does not have to emit the error “Properties of XYClass can not be accessed by ref or array key/index”.
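The idea can be illustrated outside of the Zend Engine. The following is a minimal, self-contained sketch and not the propro API: message_t, property_proxy_t and all function names are hypothetical and only stand in for an object whose state lives in a C struct member. Instead of handing out a direct reference, the container hands out a proxy that reads and writes through to the member when it is accessed.

```c
/* Hypothetical container: state lives in a C struct member, not in a
 * regular property table. */
typedef struct message {
    int response_code;
} message_t;

/* Hypothetical proxy: remembers the container and how to reach the member. */
typedef struct property_proxy {
    message_t *container;
    int  (*get)(const message_t *);
    void (*set)(message_t *, int);
} property_proxy_t;

static int  code_get(const message_t *m)  { return m->response_code; }
static void code_set(message_t *m, int v) { m->response_code = v; }

/* Returned instead of a direct reference to the member. */
static property_proxy_t proxy_for_response_code(message_t *m)
{
    property_proxy_t p = { m, code_get, code_set };
    return p;
}

/* Deferred fetch: read through to the container at access time. */
static int proxy_read(const property_proxy_t *p)
{
    return p->get(p->container);
}

/* Deferred push: write the new value back into the struct member. */
static void proxy_write(property_proxy_t *p, int value)
{
    p->set(p->container, value);
}
```

Writing through the proxy updates the container itself, which is what by-reference access to a regular property would have achieved.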
Actual implementation for http\Message properties:
https://github.com/php/pecl-http-pecl_http/blob/phpng/php_http_message.c#L854-L879
static zval *php_http_message_object_read_prop(zval *object, zval *member, int type, void **cache_slot, zval *tmp)
{
    zval *return_value;
    zend_string *member_name = zval_get_string(member);
    php_http_message_object_prophandler_t *handler = php_http_message_object_get_prophandler(member_name);

    if (!handler || type == BP_VAR_R || type == BP_VAR_IS) {
        return_value = zend_get_std_object_handlers()->read_property(object, member, type, cache_slot, tmp);

        if (handler) {
            php_http_message_object_t *obj = PHP_HTTP_OBJ(NULL, object);

            PHP_HTTP_MESSAGE_OBJECT_INIT(obj);
            handler->read(obj, tmp);

            zval_ptr_dtor(return_value);
            ZVAL_COPY_VALUE(return_value, tmp);
        }
    } else {
        return_value = php_property_proxy_zval(object, member_name);
    }

    zend_string_release(member_name);

    return return_value;
}
pecl/raphf
- Type of dep: hard build dep
- Provided functionality: unified API for managing (persistent) resource handles
- Current state: feature completive, suggested to be merged to main, please discuss
Resource And Persistent Handle Factory API
ZE2 doxygen reference can be found here:
http://php.github.io/pecl-php-raphf/php__raphf_8h.html
I once said that raphf provides a set of functionality similar to zend_list, but unfortunately this is a bit misleading, so let’s look at the differences first:
zend_list manages refcounted zend_resources with opaque handles and custom destructors for persistent and non-persistent handles. Persistent handles are returned to the caller as is.
raphf does not need refcount support, because we’re working with objects instead of resources, and so should you probably, too. Instead a kind of copy constructor is optionally supported for object cloning.
Here’s the actual implementation of curl_easy and curl_multi ctor/copy/dtor handlers for raphf:
https://github.com/php/pecl-http-pecl_http/blob/phpng/php_http_client_curl.c#L99-L180
typedef struct php_http_curle_storage {
    char *url;
    char *cookiestore;
    CURLcode errorcode;
    char errorbuffer[0x100];
} php_http_curle_storage_t;

static inline php_http_curle_storage_t *php_http_curle_get_storage(CURL *ch)
{
    php_http_curle_storage_t *st = NULL;

    curl_easy_getinfo(ch, CURLINFO_PRIVATE, &st);

    if (!st) {
        st = pecalloc(1, sizeof(*st), 1);
        curl_easy_setopt(ch, CURLOPT_PRIVATE, st);
        curl_easy_setopt(ch, CURLOPT_ERRORBUFFER, st->errorbuffer);
    }

    return st;
}

static void *php_http_curle_ctor(void *opaque, void *init_arg)
{
    void *ch;

    if ((ch = curl_easy_init())) {
        php_http_curle_get_storage(ch);
        return ch;
    }
    return NULL;
}

static void *php_http_curle_copy(void *opaque, void *handle)
{
    void *ch;

    if ((ch = curl_easy_duphandle(handle))) {
        curl_easy_reset(ch);
        php_http_curle_get_storage(ch);
        return ch;
    }
    return NULL;
}

static void php_http_curle_dtor(void *opaque, void *handle)
{
    php_http_curle_storage_t *st = php_http_curle_get_storage(handle);

    curl_easy_cleanup(handle);

    if (st) {
        if (st->url) {
            pefree(st->url, 1);
        }
        if (st->cookiestore) {
            pefree(st->cookiestore, 1);
        }
        pefree(st, 1);
    }
}

static php_resource_factory_ops_t php_http_curle_resource_factory_ops = {
    php_http_curle_ctor,
    php_http_curle_copy,
    php_http_curle_dtor
};

static void *php_http_curlm_ctor(void *opaque, void *init_arg)
{
    return curl_multi_init();
}

static void php_http_curlm_dtor(void *opaque, void *handle)
{
    curl_multi_cleanup(handle);
}

static php_resource_factory_ops_t php_http_curlm_resource_factory_ops = {
    php_http_curlm_ctor,
    NULL,
    php_http_curlm_dtor
};

A resource factory can be created from these ops and be used directly, or can be transparently wrapped by the persistent handle ops to support process/thread wide persistency:
Actual implementation of creating the resource/persistent handle factory for http\Client\Request of the curl driver:
https://github.com/php/pecl-http-pecl_http/blob/phpng/php_http_client_curl.c#L1788-L1816
static php_resource_factory_t *create_rf(php_http_url_t *url)
{
    php_persistent_handle_factory_t *pf;
    php_resource_factory_t *rf = NULL;
    zend_string *id;
    char *id_str = NULL;
    size_t id_len;

    if (!url || (!url->host && !url->path)) {
        php_error_docref(NULL, E_WARNING, "Cannot request empty URL");
        return NULL;
    }

    id_len = spprintf(&id_str, 0, "%s:%d", STR_PTR(url->host), url->port ? url->port : 80);
    id = php_http_cs2zs(id_str, id_len);

    pf = php_persistent_handle_concede(NULL, PHP_HTTP_G->client.curl.driver.request_name, id, NULL, NULL);
    /* the id is only needed for the lookup and is released exactly once */
    zend_string_release(id);

    if (pf) {
        rf = php_resource_factory_init(NULL, php_persistent_handle_get_resource_factory_ops(), pf, (void (*)(void*)) php_persistent_handle_abandon);
    } else {
        rf = php_resource_factory_init(NULL, &php_http_curle_resource_factory_ops, NULL, NULL);
    }

    return rf;
}
For php_persistent_handle_concede() to succeed, a provider has to be registered at MINIT:
if (SUCCESS != php_persistent_handle_provide(PHP_HTTP_G->client.curl.driver.client_name, &php_http_curlm_resource_factory_ops, NULL, NULL)) {
    return FAILURE;
}
if (SUCCESS != php_persistent_handle_provide(PHP_HTTP_G->client.curl.driver.request_name, &php_http_curle_resource_factory_ops, NULL, NULL)) {
    return FAILURE;
}
php_persistent_handle_concede() would also take pointers to a wakeup and a retire function for persistent handles, so that e.g. network sockets or database handles can be prepared for the idle time or be checked that they are still valid when requested.
This is the last notable difference from zend_list and is not needed by curl, but here’s an example of wakeup and retire functions located in pecl/pq, a PostgreSQL Client:
https://github.com/php/pecl-database-pq/blob/master/src/php_pqconn.c#L563-L651
static void php_pqconn_wakeup(php_persistent_handle_factory_t *f, void **handle TSRMLS_DC)
{
    PGresult *res = PQexec(*handle, "");
    PHP_PQclear(res);

    if (CONNECTION_OK != PQstatus(*handle)) {
        PQreset(*handle);
    }
}

static void php_pqconn_retire(php_persistent_handle_factory_t *f, void **handle TSRMLS_DC)
{
    php_pqconn_event_data_t *evdata = PQinstanceData(*handle, php_pqconn_event);
    PGcancel *cancel;
    PGresult *res;

    /* go away */
    PQsetInstanceData(*handle, php_pqconn_event, NULL);

    /* ignore notices */
    PQsetNoticeReceiver(*handle, php_pqconn_notice_ignore, NULL);

    /* cancel async queries */
    if (PQisBusy(*handle) && (cancel = PQgetCancel(*handle))) {
        char err[256] = {0};

        PQcancel(cancel, err, sizeof(err));
        PQfreeCancel(cancel);
    }

    /* clean up async results */
    while ((res = PQgetResult(*handle))) {
        PHP_PQclear(res);
    }

    /* clean up transaction & session */
    switch (PQtransactionStatus(*handle)) {
    case PQTRANS_IDLE:
        res = PQexec(*handle, "RESET ALL");
        break;
    default:
        res = PQexec(*handle, "ROLLBACK; RESET ALL");
        break;
    }

    if (res) {
        PHP_PQclear(res);
    }

    if (evdata) {
        /* clean up notify listeners */
        zend_hash_apply_with_arguments(&evdata->obj->intern->listeners TSRMLS_CC, apply_unlisten, 1, evdata->obj);

        /* release instance data */
        efree(evdata);
    }
}
Any extension providing (networked) services should be able to take advantage of raphf unless it needs to expose resources to userland.
raphf INI setting
There’s a global INI setting (SYSTEM), persistent_handle_limit (defaults to -1, unlimited) of debatable usefulness.
Features
C API
Most of the features are directly accessible through pecl_http's C API.
Globals
Nothing in the global namespace, except the namespace Http.
Client
- Current status: essential
- Related functionality in core: HTTP stream wrapper, ext/curl
The HTTP stream wrapper is of limited functionality, but may work for most simple needs.
Better out-of-the-box support for more complicated applications, such as different authentication schemes, proxy types, encodings, SSL/TLS layers and whatnot, is desirable.
All of that would actually be available through ext/curl, but the existing libcurl binding is in a subpar maintenance state and suffers from its own quirks. Also, to my great surprise, there are only about five people enjoying the libcurl API, or what is available from it in PHP.
Currently only libcurl is implemented as a provider for http\Client, providing most of the functionality of the most current libcurl. This should be a good bet, because libcurl is mature and ubiquitously available. This does not rule out implementing our own “PHP” driver for http\Client.
http\Client supports sending parallel requests, optionally driven by an event loop library like libev{,ent}.
Encoding
- Current status: essential
- Related functionality in core: ext/zlib
Actually, ext/zlib has supported all of the same three encodings since I fixed it a few years ago, so there could be an opportunity to share a few lines of code. What it definitely lacks, though, are incremental encoders/decoders that do not depend on php_streams in the form of stream filters.
AFAIK there is no accessible implementation of chunked encoding in core.
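For readers unfamiliar with the wire format, here is a minimal sketch of chunked transfer coding in plain C. It is not taken from pecl_http; chunk_encode() and chunk_finish() are hypothetical names, and the sketch writes into a caller-provided buffer instead of a stream. Each chunk is the payload length in hexadecimal, CRLF, the payload, and CRLF; a zero-length chunk followed by an empty trailer section ends the body.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write one chunk (hex length, CRLF, payload, CRLF) into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small.
 * Callers should not pass len == 0, because a zero-length chunk
 * terminates the body.
 */
static size_t chunk_encode(const char *data, size_t len, char *out, size_t out_size)
{
    char head[32];
    int head_len = snprintf(head, sizeof(head), "%zx\r\n", len);

    if (head_len < 0 || (size_t) head_len + len + 2 > out_size) {
        return 0;
    }
    memcpy(out, head, (size_t) head_len);
    memcpy(out + head_len, data, len);
    memcpy(out + head_len + len, "\r\n", 2);

    return (size_t) head_len + len + 2;
}

/* The last chunk has length zero and is followed by an empty trailer. */
static size_t chunk_finish(char *out, size_t out_size)
{
    static const char last[] = "0\r\n\r\n";

    if (sizeof(last) - 1 > out_size) {
        return 0;
    }
    memcpy(out, last, sizeof(last) - 1);

    return sizeof(last) - 1;
}
```

An incremental encoder would call chunk_encode() once per block of data as it becomes available and chunk_finish() once at the end, which is why the total body length does not need to be known in advance.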
Env
- Current status: feature completive
- Related functionality in core: superglobals, php://input, output buffering, various functions and bits here and there
http\Env provides negotiation of content type, character set and language, a feature set that is often asked for; nothing comparable exists in core.
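To make "negotiation" concrete, here is a simplified sketch in plain C of picking the best supported value from a header such as Accept-Language. It is not pecl_http's algorithm: negotiate() and token_equals() are hypothetical names, only exact case-insensitive matches are considered, and wildcards and language-range prefix matching are left out. An entry without a q parameter counts as q=1, and q=0 means "not acceptable".

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive comparison of a length-delimited token with a C string. */
static int token_equals(const char *tok, size_t len, const char *s)
{
    size_t i;

    if (strlen(s) != len) {
        return 0;
    }
    for (i = 0; i < len; ++i) {
        if (tolower((unsigned char) tok[i]) != tolower((unsigned char) s[i])) {
            return 0;
        }
    }
    return 1;
}

/*
 * Return the index into `supported` of the entry the client rates highest
 * in `header`, or -1 if nothing acceptable matches. On equal quality the
 * entry listed first in the header wins.
 */
static int negotiate(const char *header, const char *const *supported, size_t count)
{
    int best = -1;
    double best_q = 0.0;
    const char *p = header;

    while (*p) {
        const char *comma = strchr(p, ',');
        const char *item_end = comma ? comma : p + strlen(p);
        const char *semi, *name_end, *param;
        double q = 1.0;
        size_t i;

        while (p < item_end && isspace((unsigned char) *p)) {
            ++p;
        }
        /* the name ends at the first ';' of this item, if there is one */
        semi = memchr(p, ';', (size_t) (item_end - p));
        name_end = semi ? semi : item_end;
        while (name_end > p && isspace((unsigned char) name_end[-1])) {
            --name_end;
        }

        /* scan the parameters of this item for "q=" */
        for (param = semi; param; param = memchr(param, ';', (size_t) (item_end - param))) {
            ++param;
            while (param < item_end && isspace((unsigned char) *param)) {
                ++param;
            }
            if (item_end - param >= 2 && tolower((unsigned char) param[0]) == 'q' && param[1] == '=') {
                q = strtod(param + 2, NULL);
                break;
            }
        }

        for (i = 0; i < count; ++i) {
            if (q > best_q && token_equals(p, (size_t) (name_end - p), supported[i])) {
                best = (int) i;
                best_q = q;
            }
        }

        p = comma ? comma + 1 : item_end;
    }

    return best;
}
```

With the header "de;q=0.8, en-US, fr;q=0.9" and the supported list {"fr", "de"}, the sketch selects "fr", because the client rates it higher than "de" and "en-US" is not supported.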
Most of what the environmental/server request constitutes is represented by superglobals and any request body by php://input, which is absolutely no problem, especially since php://input does not point to an allocated string in the SAPI globals (even possibly allocated twice).
http\Env\Request provides a central access point for all of that data and makes it safe to change/mock it without actually changing the original environment.
http\Env\Response provides features for sending responses beyond header(), ob_start() and readfile() with support for ranges/resuming, caching and throttling.
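Support for ranges and resuming boils down to resolving the client's Range header against the size of the entity. The following plain C sketch shows that step under simplifying assumptions: resolve_range() and byte_range_t are hypothetical names, only a single range is handled, and the unit has to be spelled "bytes" in lower case. It is an illustration, not the http\Env\Response implementation.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct byte_range {
    unsigned long first; /* offset of the first byte to send */
    unsigned long last;  /* offset of the last byte to send (inclusive) */
} byte_range_t;

/*
 * Resolve a single-range "Range" header value against an entity of `size`
 * bytes. Returns 1 and fills `out` if the range is satisfiable, else 0.
 */
static int resolve_range(const char *value, unsigned long size, byte_range_t *out)
{
    char *end;
    unsigned long a, b;

    if (strncmp(value, "bytes=", 6) != 0 || size == 0) {
        return 0;
    }
    value += 6;

    if (*value == '-') {
        /* suffix form "-N": the last N bytes of the entity */
        if (!isdigit((unsigned char) value[1])) {
            return 0;
        }
        b = strtoul(value + 1, &end, 10);
        if (*end != '\0' || b == 0) {
            return 0;
        }
        out->first = b >= size ? 0 : size - b;
        out->last = size - 1;
        return 1;
    }

    if (!isdigit((unsigned char) *value)) {
        return 0;
    }
    a = strtoul(value, &end, 10);
    if (*end != '-' || a >= size) {
        return 0;
    }

    if (end[1] == '\0') {
        /* open-ended form "N-": from offset N to the end */
        b = size - 1;
    } else {
        if (!isdigit((unsigned char) end[1])) {
            return 0;
        }
        b = strtoul(end + 1, &end, 10);
        if (*end != '\0' || b < a) {
            return 0;
        }
        if (b >= size) {
            b = size - 1; /* clamp to the end of the entity */
        }
    }
    out->first = a;
    out->last = b;

    return 1;
}
```

A client resuming a download of a 10000 byte file after 9500 bytes would send "bytes=9500-", which resolves to offsets 9500 through 9999; an unsatisfiable range would typically be answered with status 416.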
Message
- Current status: essential
- Related functionality in core: rfc1867.c
Message parser and tools. http\Message is the base class of all request and response classes. Note that a “parent message” denotes any message appearing before a message in a stream of messages.
Also message bodies with support for building and (basic) parsing of multipart bodies, utilizing a (temporary) stream for memory efficiency.
Splitting a multipart body creates a chain of http\Message objects.
Header
- Current status: essential
- Related functionality in core: non-existent
Header parser and tools.
I'm really not sure what case I should make about a header and message parser implementation in an HTTP package.
Cookie
- Current status: feature completive, maybe the odd cousin
- Related functionality in core: non-existent
Cookie and Set-Cookie headers come in a special format and are ubiquitous, thus they deserve a discrete parser.
One could argue that there is related functionality in core, namely php_default_treat_data(), but it is inaccessible from userland and cumbersome to call internally.
Params
- Current status: essential
- Related functionality in core: non-existent
Header params parser; think of a content-type or an accept header. Negotiation, cookie and query parser build on it.
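As an illustration of what parsing header params involves, here is a small plain C sketch that extracts one parameter from a value such as "text/html; charset=utf-8". It is not the http\Params implementation: header_param() is a hypothetical helper, it understands unquoted tokens and simple quoted strings without escape sequences, and it assumes that parameters before the requested one do not contain a quoted semicolon.

```c
#include <ctype.h>
#include <string.h>

/*
 * Copy the value of parameter `name` from a header value into `out`.
 * Parameter names are compared case-insensitively.
 * Returns 1 if the parameter was found and fits into `out`, else 0.
 */
static int header_param(const char *value, const char *name, char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *p = strchr(value, ';');

    while (p) {
        const char *end;
        size_t len, i;

        ++p; /* skip ';' */
        while (isspace((unsigned char) *p)) {
            ++p;
        }

        for (i = 0; i < name_len; ++i) {
            if (tolower((unsigned char) p[i]) != tolower((unsigned char) name[i])) {
                break;
            }
        }
        if (i == name_len && p[name_len] == '=') {
            p += name_len + 1;
            if (*p == '"') {
                /* quoted string: take everything up to the closing quote */
                ++p;
                end = strchr(p, '"');
                if (!end) {
                    return 0;
                }
            } else {
                /* token: ends at the next separator or at the end */
                end = p + strcspn(p, "; \t");
            }
            len = (size_t) (end - p);
            if (len + 1 > out_size) {
                return 0;
            }
            memcpy(out, p, len);
            out[len] = '\0';
            return 1;
        }

        p = strchr(p, ';');
    }

    return 0;
}
```

The same splitting into a primary value and a list of name=value parameters applies to Content-Type, Accept and cookie-like headers, which is why negotiation, cookie and query parsing can share one parser.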
QueryString
- Current status: feature completive
- Related functionality in core: parse_str() (php_default_treat_data())
Query string parser and tools. Actually builds on http\Params.
parse_str() suffers from its legacy/heritage, as it replaces, for example, dots with underscores.
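The effect of that legacy can be sketched in a few lines of plain C. mangle_key() is a hypothetical helper that only mimics the two best-known rules, dots and spaces in a key becoming underscores; it is not the code used in core, which has additional special cases.

```c
/* Mimic the legacy name mangling: '.' and ' ' in a key become '_'. */
static void mangle_key(char *key)
{
    for (; *key; ++key) {
        if (*key == '.' || *key == ' ') {
            *key = '_';
        }
    }
}
```

A query string like "user.name=mike" therefore ends up under the key "user_name", and the original spelling of the key cannot be recovered afterwards.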
Url
- Current status: essential
- Related functionality in core: parse_url()
URL parser and tools with UTF-8, locale multibyte and IDNA support (need to check if, and how much, it diverges from IRIs). See RFC 3986 and RFC 3987.
I'm not sure what recommendation parse_url() follows, if any.
Unaffected PHP Functionality
The http: stream wrapper is unaffected by pecl_http.
Vote
Three way “yes/enabled by default”, “yes/disabled by default”, “no” where 50%+1 combined “Yes” votes are needed for acceptance.
Additional simple vote on the namespace prefix (“http” or “php\http”) disregarding the case.
Discussed and changed items
Parsing multipart/form-data and application/x-www-form-urlencoded for any request method
This functionality was removed from the proposal.
Parsing application/json into $_POST
This functionality was removed from the proposal, which removed the ext/json dependency.
Translating charsets of http\QueryString
This functionality was removed from the proposal, which removed the ext/iconv dependency.
Extended hashing methods for ETags of dynamic content
This functionality was removed from the proposal, which removed the ext/hash dependency.
Splitting up into smaller RFCs
It was requested to split this RFC up into several smaller ones, but, as far as I observed, mainly in order to *not* bring an HTTP client implementation into the default distribution. I did not consider these requests further, because I think the client adds substantial value to the overall package.
Upgrade path for existing pecl_http users
A pecl_http integrated into the default distribution would be considered v3. Upcoming v2 releases could take measures to prepare any transition to the PHP7 API.
Namespace choice, or the case of the case
I consider this issue unimportant, because we do not rely on the case for internal classes, so http, HTTP and Http only have cosmetic effect.
There will be an extra vote on whether to prefix the http namespace with php. Here, I consider PHP, php and Php to be equal as well.
Changelog
- 1.1
- Added “Feature corner stones”
- 1.2
- Improvements to “Introduction” and “Proposal”
- 1.3
- Added another link to the current docs
- 1.4
- Added link to fully merged tree
- Expand voting options
- 2.0
- Complete rewrite
- 2.1
- Expanded feature section
- 2.2
- Removed optional dependencies on all three extensions (json, iconv, hash), and the one INI entry related to it
- 2.3
- Removed http\Env RINIT section
- Changed namespace from http to Http
- Fixed some wordings and list formattings
- 2.4
- Added “Discussed and changed items”