- Pixiv is an illustration communication site with over 5 million users and 3.3 billion monthly page views.
- The presenter works on infrastructure and software engineering at Pixiv, where he is responsible for image upload, thumbnail generation, data storage, caching strategies and more.
- Pixiv generates 12 or more thumbnails for each image uploaded to optimize loading and browsing on different devices. With over 30 million images, this amounts to over 30 terabytes of thumbnails.
2. Introduction to pixiv
● Illust Communication Site
○ http://www.pixiv.net
● Users
○ 5 million
● Monthly page views
○ 3.3 billion
● Network Traffic
○ Over 6Gbps
3. About me
● Tatsuhiko Kubo (handle name: bokko)
● @cubicdaiya
● Infrastructure & Software Engineer at pixiv Inc.
4. My Work at pixiv Inc.
Responsible for
● Middleware Development
● Technical Operation & Administration
● Datastore & Caching Strategy
● Image Upload & Transformation
● Illust & User Recommendation
● Notification (called Popboard at pixiv)
etc.
5. My Work in Private
■ Software
● neoagent
○ Yet another Memcached protocol proxy server
● dtl
○ diff template library written in C++
● ngx_small_light
○ Dynamic Transformation Module for Nginx
■ Writing
● Software Design 2009 Sep
○ "How Are Differences Derived? Understanding How diff Works" (original title: どのようにして差分を導き出すのか~diffの動作原理を知る~)
○ http://gihyo.jp/dev/column/01/prog/2011/diff_sd200906
And more -> http://cccis.jp
7. Scale of pixiv in terms of thumbnails
● pixiv has about 30,000,000 illusts and comics
● Each illust has 12 or more thumbnails
● 20,000 illusts and comics are uploaded every day
● Total volume is about 30TB
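As a rough sanity check on these figures, the arithmetic can be sketched as follows. The variable names are illustrative, 12 is treated as a lower bound on thumbnails per illust, and reading 30TB as decimal terabytes is an assumption.

```python
# Approximate figures from the slide; decimal units (1 TB = 10**12 bytes) are assumed.
ILLUSTS_TOTAL = 30_000_000
UPLOADS_PER_DAY = 20_000
MIN_THUMBNAILS_PER_ILLUST = 12
TOTAL_VOLUME_BYTES = 30 * 10**12

# Lower bound on thumbnails generated per day.
min_thumbnails_per_day = UPLOADS_PER_DAY * MIN_THUMBNAILS_PER_ILLUST

# Lower bound on thumbnails stored overall.
min_thumbnails_total = ILLUSTS_TOTAL * MIN_THUMBNAILS_PER_ILLUST

# Average stored volume per illust.
avg_bytes_per_illust = TOTAL_VOLUME_BYTES // ILLUSTS_TOTAL

print(min_thumbnails_per_day)  # 240000
print(min_thumbnails_total)    # 360000000
print(avg_bytes_per_illust)    # 1000000
```

So the upload pipeline has to produce at least 240,000 thumbnails per day, and each illust accounts for roughly 1 MB of stored data on average.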
9. Image Upload Detail 1
● Generate many thumbnails
○ 12 or more per image
● Save the original image and thumbnails to storage
○ not NFS
○ with an in-house WebDAV client (ImageClient)
20-25. Semi-asynchronous upload mechanism
● User Side Action
○ Views: File Selection View, Input Information View, Completed View
○ Create a lock file, then poll until the lock file is deleted
● Server Side Action
○ A prefork server preforks multiple upload workers
○ Each upload worker generates thumbnails, then deletes the lock file
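The lock-file handshake on these slides can be sketched as below. This is a minimal illustration, not pixiv's implementation: a thread and an in-memory `queue.Queue` stand in for the preforked worker processes and the job queue, and the names `submit_upload`, `upload_worker`, and `wait_until_done` are hypothetical.

```python
import os
import queue
import tempfile
import threading
import time


def lock_path(lock_dir, job_id):
    """Path of the lock file that marks a job as still in progress."""
    return os.path.join(lock_dir, f"{job_id}.lock")


def submit_upload(jobs, lock_dir, job_id):
    """User side: create the lock file, then enqueue the upload job."""
    with open(lock_path(lock_dir, job_id), "x"):
        pass  # mode "x" fails if the lock file already exists
    jobs.put(job_id)


def upload_worker(jobs, lock_dir, generate_thumbnails):
    """Server side: generate thumbnails, then delete the lock file."""
    while True:
        job_id = jobs.get()
        if job_id is None:  # sentinel: stop the worker
            break
        generate_thumbnails(job_id)
        os.remove(lock_path(lock_dir, job_id))


def wait_until_done(lock_dir, job_id, timeout=5.0, interval=0.01):
    """User side: poll until the lock file is deleted or the timeout expires."""
    deadline = time.monotonic() + timeout
    while os.path.exists(lock_path(lock_dir, job_id)):
        if time.monotonic() > deadline:
            return False
        time.sleep(interval)
    return True
```

In this sketch the user can keep filling in the information form while the worker runs, and the completed view is shown only once `wait_until_done` returns `True`. Error handling is omitted: if thumbnail generation raises, the lock file stays in place and the poll times out.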
26. Inside Image Upload
● User Side Action
○ Apache
○ PHP
■ ImageClient
● Server Side Action
○ daemontools
○ Python
■ python-prefork, python-q4m, python-worker
● python-q4m and python-worker are in-house libraries
○ Q4M
■ Used As Upload Job Queue
31. Details of generating thumbnails
● pixiv uses ImageMagick
○ ImageMagick is not fast
○ But the quality of the generated images is good
○ Quality is more important than speed for us
○ Of course, optimization is important, too
35-36. Benchmark of libjpeg and libjpeg-turbo
● Processing a JPEG image with libjpeg and libjpeg-turbo
○ (Chart comparing libjpeg and libjpeg-turbo)
● libjpeg-turbo is faster than libjpeg by 10% on x86_64
38. Disable OpenMP
● Recent ImageMagick builds have OpenMP enabled by default
○ This is very slow in a multi-process environment
● How to disable OpenMP
○ Re-compile with '--disable-openmp'
○ Set the environment variable OMP_NUM_THREADS=1
● pixiv uses the latter
○ Re-compiling and re-packaging are complicated
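A minimal sketch of the environment-variable approach, assuming thumbnails are produced by shelling out to ImageMagick's `convert` command from a Python worker. The helper names are hypothetical, and the deck does not say how pixiv actually invokes ImageMagick.

```python
import os
import subprocess


def single_threaded_env(base=None):
    """Copy an environment and pin OpenMP to one thread per process."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


def thumbnail_command(src, dst, width, height):
    """Build an ImageMagick command line that writes a thumbnail of src to dst."""
    return ["convert", src, "-thumbnail", f"{width}x{height}", dst]


def make_thumbnail(src, dst, width, height):
    """Run ImageMagick with OpenMP limited to a single thread."""
    subprocess.run(
        thumbnail_command(src, dst, width, height),
        env=single_threaded_env(),
        check=True,
    )
```

With many worker processes already running in parallel, one thread per process avoids each ImageMagick call spawning its own pool of OpenMP threads.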
45. Why are dynamic thumbnails needed?
● Static thumbnails consume disk space
○ Dynamic thumbnails do not consume disk space
● Preparing new static thumbnails takes a long time
○ A dynamic thumbnail is ready in a second!
53. Resize image with mod_small_light
● Original: GET /tank.jpg
● Resized: GET /small_light(dw=100,dh=100)/tank.jpg
54. Many Options
● q: image quality (0~100)
● of: output format (jpg, gif, png, tiff)
● e: processing engine (imlib2, imagemagick)
● cc: canvas color
● p: pattern name defined with SmallLightPatternDefine
● etc. (many more options)
56. Resize image with mod_small_light
● Original: GET /tank.jpg
● Resized: GET /small_light(p=small)/tank.jpg
or GET /small_light(dw=100,dh=100,e=imagemagick,jpeghint=y)/tank.jpg
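The URL scheme shown on these slides can be generated with a small helper like the one below. The function name `small_light_url` is hypothetical; only the `/small_light(key=value,...)/path` shape and the option names come from the slides.

```python
def small_light_url(path, **options):
    """Prefix an image path with a small_light option block.

    With no options the original path is returned unchanged.
    Options are emitted in the order they are passed.
    """
    if not options:
        return path
    params = ",".join(f"{key}={value}" for key, value in options.items())
    return f"/small_light({params}){path}"


print(small_light_url("/tank.jpg", dw=100, dh=100))
# /small_light(dw=100,dh=100)/tank.jpg
print(small_light_url("/tank.jpg", p="small"))
# /small_light(p=small)/tank.jpg
```

Because the transformation is encoded in the URL, a new thumbnail size needs only a new URL (or a new named pattern), not a regeneration of stored files.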
57. Why not the original mod_small_light?
● Some thumbnails are special
○ comic cover
○ various cropping algorithms depending on aspect ratio
● Default output format is JPEG
○ We prefer the output format to match the input format
58. Why not the original mod_small_light?
● No support for CMYK
○ pixiv must support CMYK
● Support for unusual aspect ratios was needed
○ da=l -> long-edge
○ da=s -> short-edge
○ da=p -> pixiv-edge (pixiv edition only)
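One plausible reading of the long-edge and short-edge modes is sketched below: `da=l` scales the image so that it fits inside the requested box, while `da=s` scales it so that it covers the box. This interpretation and the function name `scaled_size` are assumptions; the slides do not define the exact behavior, and `da=p` is pixiv-specific and not described, so it is left unimplemented.

```python
def scaled_size(src_w, src_h, dst_w, dst_h, da):
    """Return the scaled (width, height), preserving the aspect ratio.

    da="l": the long edge decides the scale, so the result fits inside the box.
    da="s": the short edge decides the scale, so the result covers the box.
    """
    ratio_w = dst_w / src_w
    ratio_h = dst_h / src_h
    if da == "l":
        scale = min(ratio_w, ratio_h)
    elif da == "s":
        scale = max(ratio_w, ratio_h)
    else:
        raise ValueError(f"unsupported da value: {da!r}")
    return round(src_w * scale), round(src_h * scale)


print(scaled_size(400, 200, 100, 100, "l"))  # (100, 50)
print(scaled_size(400, 200, 100, 100, "s"))  # (200, 100)
```

For extreme aspect ratios the two modes diverge sharply, which is presumably why a third, pixiv-specific mode was added.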
65. Special Thumbnail 3
● Special cropping algorithm
○ Crop the top of the image if the image is portrait
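A minimal sketch of such a rule, assuming "crop the top" means keeping the top square region of a portrait image. The center crop for landscape images and the function name `square_crop_box` are assumptions added for illustration and are not taken from the slides.

```python
def square_crop_box(width, height):
    """Return a square crop box as (left, upper, right, lower).

    Portrait images keep their top square; other images are
    center-cropped horizontally (an assumption for this sketch).
    """
    if height > width:
        # Portrait: keep the top width-by-width region.
        return (0, 0, width, width)
    # Landscape or square: keep a centered height-by-height region.
    left = (width - height) // 2
    return (left, 0, left + height, height)


print(square_crop_box(100, 300))  # (0, 0, 100, 100)
print(square_crop_box(300, 100))  # (100, 0, 200, 100)
```

The tuple follows the (left, upper, right, lower) convention used by common imaging libraries such as Pillow's `Image.crop`.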
66. Many Extended Options
● rmprof: remove image profiles
● crop_square: crop the image to a square
● cover: add manga cover
● samec: conform dw & dh & cw & ch
● extendl: extend long-edge
● etc. (twenty new options in all)
67. Summary
● Optimization of image processing is very important
○ Generating thumbnails takes a long time
○ Let's tune it and make it asynchronous
● pixiv has two types of thumbnails
○ Static Thumbnail
○ Dynamic Thumbnail
● Dynamic Thumbnail saves disk space and is flexible
○ Big image storage is expensive
○ Easy to adapt to changes in the application
■ New thumbnail types for new designs