Questions, Feedback and Suggestions #3 #146

mikf · 2019-01-01T15:46:34Z

Continuation of the old issue as a central place for any sort of question or suggestion not deserving their own separate issue. There is also https://gitter.im/gallery-dl/main if that seems more appropriate.

Links to older issues: #11, #74

rachmadaniHaryono · 2019-01-01T21:25:20Z

simple snippet to turn gallery-dl into api

from types import SimpleNamespace
from unittest.mock import patch, Mock
import os

import click
from flask.cli import FlaskGroup
from flask import (
    Flask,
    jsonify,
    request,
)

from gallery_dl import main, option
from gallery_dl.job import DataJob

def get_json():
    data = None
    parser = option.build_parser()
    args = parser.parse_args()
    args.urls = request.args.getlist('url')
    if not args.urls:
        return jsonify({'error': 'No url(s)'})
    args.list_data = True

    class CustomClass:
        data = []

        def run(self):
            dj = DataJob(*self.data_job_args, **self.data_job_kwargs)
            dj.run()
            self.data.append({
                'args': self.data_job_args,
                "kwargs": self.data_job_kwargs,
                'data': dj.data
            })

        def DataJob(self, *args, **kwargs):
            self.data_job_args = args
            self.data_job_kwargs = kwargs
            retval = SimpleNamespace()
            retval.run = self.run
            return retval

    c1 = CustomClass()
    with patch('gallery_dl.option.build_parser') as m_bp, \
            patch('gallery_dl.job.DataJob', side_effect=c1.DataJob) as m_jt:
        #  m_option.return_value.parser_args.return_value = args
        m_bp.return_value.parse_args.return_value = args
        m_jt.__name__ = 'DataJob'
        main()
        data = c1.data
    return jsonify({'data': data, 'urls': args.urls})

def create_app(script_info=None):
    """create app."""
    app = Flask(__name__)
    app.add_url_rule(
        '/api/json', 'gallery_dl_json', get_json)
    return app


@click.group(cls=FlaskGroup, create_app=create_app)
def cli():
    """This is a script for application."""
    pass


if __name__ == '__main__':
    cli()

e: this could be simple when using direct DataJob to handle the urls, but i haven't check if there is anything have to be done before initialize DataJob instance

mikf · 2019-01-03T16:22:15Z

this could be simple when using direct DataJob to handle the urls, but i haven't check if there is anything have to be done before initialize DataJob instance.

You don't need to do anything before initializing any of the Job classes:

>>> from gallery_dl import job
>>> j = job.DataJob("https://imgur.com/0gybAXR")
>>> j.run()
[ ... ]

You can initialize anything logging related if you want logging output,
or call config.load() and config.set(...) if you want to load a config file and set some custom options,
but none of that is necessary.

DonaldTsang · 2019-01-08T14:05:38Z

@rachmadaniHaryono what does that code do?

rachmadaniHaryono · 2019-01-08T16:09:14Z

simpler api (based on above suggestion)

#!/usr/bin/env python
from types import SimpleNamespace
from unittest.mock import patch, Mock
import os

import click
from flask.cli import FlaskGroup
from flask import (
    Flask,
    jsonify,
    request,
)

from gallery_dl import main, option
from gallery_dl.job import DataJob
from gallery_dl.exception import NoExtractorError


def get_json():
    data = []
    parser = option.build_parser()
    args = parser.parse_args()
    args.urls = request.args.getlist('url')
    if not args.urls:
        return jsonify({'error': 'No url(s)'})
    args.list_data = True
    for url in args.urls:
        url_res = None
        error = None
        try:
            job = DataJob(url)
            job.run()
            url_res = job.data
        except NoExtractorError as err:
            error = err
        data_item = [url, url_res, {'error': str(error) if error else None}]
        data.append(data_item)
    return jsonify({'data': data, 'urls': args.urls})


def create_app(script_info=None):
    """create app."""
    app = Flask(__name__)
    app.add_url_rule(
        '/api/json', 'gallery_dl_json', get_json)
    return app


@click.group(cls=FlaskGroup, create_app=create_app)
def cli():
    """This is a script for application."""
    pass


if __name__ == '__main__':
    cli()

rachmadaniHaryono · 2019-01-08T16:09:43Z

gug for hydrus (port 5013)

DonaldTsang · 2019-01-08T16:34:57Z

@rachmadaniHaryono instructions on using this GUG and combing it with Hydrus? Any pre-configurstions besides pip3 install gallery-dl ?

rachmadaniHaryono · 2019-01-08T21:47:33Z

put that on script (e.g. script.py)
import gug into hydrus
pip3 install flask gallery-dl (add --user if needed)
run python3 script.py --port 5013

DonaldTsang · 2019-01-09T13:49:41Z

@rachmadaniHaryono add that to the Wiki in https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts if you can, sounded like a really good solution. Also, why port 5013, is that port specifically used for something?

rachmadaniHaryono · 2019-01-09T15:15:42Z

Also, why port 5013, is that port specifically used for something

not a really technical reason. i just use it because the default port is used for my other program.

add that to the Wiki in CuddleBear92/Hydrus-Presets-and-Scripts if you can

i will consider it, because i'm not sure where to put that

another plan is fork (or create pr) for server command but i'm not sure if @mikf want pr for this

DonaldTsang · 2019-01-10T03:31:08Z

@rachmadaniHaryono https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/wiki
Also I would like @mikf to have a look at this, since this is pretty useful.
BTW, what is the speed overhead of using this over having a separate txt file like the one in Bionus/imgbrd-grabber#1492 ?

rachmadaniHaryono · 2019-01-10T09:10:49Z

BTW, what is the speed overhead of using this over having a separate txt file like the one in Bionus/imgbrd-grabber#1492 ?

this depend on hydrus vs imgbrd-grabber download speed. from my test gallery-dl give direct link, so hydrus don't have to process the link anymore.

mikf · 2019-01-10T12:47:19Z

another plan is fork (or create pr) for server command but i'm not sure if @mikf want pr for this

I've already had something similar to this in mind (implementing a (local) server infrastructure to (remotely) send commands / queries: gallery-dl --server), so I would be quite in favor of adding functionality like this.
But I'm not so happy about adding flask as a dependency, even if optional. I just generally dislike adding dependencies if they aren't absolutely necessary. I was thinking of using stuff from the http.server module in Python's standard library if possible.
Also: the script you posted here should be simplified quite a bit further. For example there is no need to build an command line option parser. I'll see if I can get something to work on my own.

A few questions from me concerning Hydrus

The whole thing is written in Python, even version 3 since the last update. Isn't there a better way of coupling it with another Python module than a HTTP server? As in "is it possible to add a native "hook" to make it call another Python function"?
Is there any documentation for the request and response data formats Hydrus sends to and expects from GUG's? I've found this, but that doesn't really explain how Hydrus interacts with other things.

rachmadaniHaryono · 2019-01-10T13:48:35Z

But I'm not so happy about adding flask as a dependency, even if optional. I just generally dislike adding dependencies if they aren't absolutely necessary. I was thinking of using stuff from the http.server module in Python's standard library if possible.

this still depend on how big will this be. will it just be an api or there will be html interface for this. although an existing framework will make it easier and the plugin for the framework will let other developer create more feature they want.

of course there is more better framework than flask as example, e.g. sanic, django but i actually doubt if using the standard will be better than those.

Also: the script you posted here should be simplified quite a bit further. For example there is no need to build an command line option parser.

that is modified version from flask cli example. flask can do that simpler but it require to set up variable environment which add another command

The whole thing is written in Python, even version 3 since the last update. Isn't there a better way of coupling it with another Python module than a HTTP server? As in "is it possible to add a native "hook" to make it call another Python function"?

hydrus dev is planned to make api for this on the next milestone. there is also other hydrus user which make unofficial api but he didn't make one for download yet. so either wait for it or use existing hydrus parser

Is there any documentation for the request and response data formats Hydrus sends to and expects from GUG's? I've found this, but that doesn't really explain how Hydrus interacts with other things.

hydrus expect either html and json and try to extract data based on the parser the user made/import. i make this one for html but it maybe changed on future version https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/blob/master/guide/create_parser_furaffinity.md .

if someone want to make one, they can try made api similar to 4chan api,copy the structure and use modified parser from existing 4chan api.

my best recommendation is to try hydrus parser directly and see what option is there. ask hydrus discord channel if anything is unclear

wankio · 2019-01-11T10:27:37Z

can gallery-dl support weibo ? i found this https://github.com/nondanee/weiboPicDownloader but it take too long to scan and dont have ability to skip downloaded files

mikf · 2019-01-13T09:26:30Z

@rachmadaniHaryono I opened a new branch for API server related stuff. The first commit there implements the same functionality as your script, but without external dependencies. Go take a look at it if you want.

And when I said your script "should be simplified ... further" I didn't mean it should use less lines of code, but less resources in term of CPU and memory. Python might not be the right language to use when caring about things like that, but there is still no need to call functions that effectively do nothing - command-line argument parsing for example.

rachmadaniHaryono · 2019-01-13T14:59:15Z

will it be only api or will there will be html interface @mikf?

e: i will comment the code on the commit

mikf · 2019-01-13T15:39:05Z

I don't think there should be an HTML interface directly inside of gallery-dl. I would prefer it to have a separate front-end (HTML or whatever) communicating with the API back-end that's baked into gallery-dl itself. It is a more general approach and would allow for any programing language and framework to more easily interact with gallery-dl, not just Python.

rachmadaniHaryono · 2019-01-13T17:09:42Z

based on 8662e72
album.title is now parsed as album tag
source url and download url are minimum 2 character (fix host:port/api/json/1 error)
description is not None or none

still on port 5013

e: related issue CuddleBear92/Hydrus-Presets-and-Scripts#69

wankio · 2019-02-01T16:49:20Z

About twitter extractor, we have limited request depend on how many tweets user had right ?
if user have over 2k+ media, 99% it can't download full media

mikf · 2019-02-02T14:17:55Z

@wankio The Twitter extractor gets the same tweets you would get by visiting a timeline in your browser and scrolling down until no more tweets get dynamically loaded. I don't know how many tweets you can access like that, but Twitter's public API has a similar restriction::

https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html

This method can only return up to 3,200 of a user's most recent Tweets. Native retweets of other statuses by the user is included in this total, regardless of whether include_rts is set to false when requesting this resource.

You could try ripme. It uses the public API instead of a "hidden", browser-only API like gallery-dl. Maybe you can get more results with that.

wankio · 2019-02-03T15:42:07Z

but if i remember, ripme rip all tweet/retweet not just user tweet

schleifen · 2019-02-07T21:56:53Z

For some reason the login with OAuth and App Garden tokens or the -u/-p commands doesn't work with flickr which makes images that require a login to view them not downloadable. But otherwise amazing tool, thank you so much!

wankio · 2019-02-24T14:45:12Z

today when i'm checking e-hentai/exhentai, it just stucked forever. maybe my ISP is the problem because i can't access e-hentai but exhentai still ok. So i think Oauth should help, using cookies instead id+password to bypass

ghost · 2019-04-10T14:26:55Z

is there a way to download files directly in a specified folder instread of subfolders?
for exemple for the picture to be downloaded in F:\Downloaded\ i tried using
gallery-dl -d "F:\Downloaded" https://imgur.com/a/xcEl2WW
but instead they get downloaded to F:\Downloaded\imgur\xcEl2WW - Inklings
is there an argument i could add to the command to fix that?

github-userx · 2024-02-01T14:58:47Z

Message ID: ***@***.***>Did gofile.io change something? Only getting errors.. (with other python scripts as well)

github-userx · 2024-02-01T15:28:30Z

Message ID: ***@***.***>GoFile.io issue solution: https:// <https://gitjub>github.com/Jules-WinnfieldX/CyberDropDownloader/pull/802

AdamSaketume26 · 2024-02-04T07:32:28Z

could someone please help me with making instagram download stories, reels, posts to their own files using command line?

AdamSaketume26 · 2024-02-04T08:18:39Z

i notice that even when gallery-dl says i have ratelimit i can still use twitter website and go to the users profile and view their tweets if i click replies or media tab. but when i change timeline strategy to media and i change twitter include to "media" it doesnt work and i am still rate limit. why is there ratelimit for gallerydl but not on website media tab?

biggestsonicfan · 2024-02-04T21:08:42Z

Can you pass the cookies that gallery-dl is currently using to a post-processor?

mikf#146 (comment)

WhyEssEff · 2024-02-06T19:35:43Z

Could we have a config option to sleep the extractor for a set amount of time upon encountering a 429 Too Many Requests error and retry with base delay before it goes into the delay interval increase routine?

I'm wondering for larger image repositories (my use case in this instance is DeviantArt, I'm downloading collections) if I just slept it for five/ten minutes or so and continued as normal afterwards if it might be faster than getting stuck in 17s delay purgatory. It's currently effectively what I'm doing when I interrupt the process when it gets too egregious and then attempt again 10 minutes later, and it seems to work as usual upon start, I just want to be able to continue where I left off.

biggestsonicfan · 2024-02-06T20:09:13Z

@WhyEssEff Is that not what sleep-request is for?
https://github.com/mikf/gallery-dl/blob/22647c2626eb8e4387407383ce9ee2508a507ec1/docs/configuration.rst#L360C1-L369C28

WhyEssEff · 2024-02-06T20:12:36Z

@biggestsonicfan I'd prefer the behavior I'm trying to get at to happen specifically on encountering an error. I'd like to assume minimum request time when possible, while telling the extractor to halt for x seconds if it throws back a 429, in order to see if it can just restart comfortably on the default delay after just not doing anything for x amount of time.

e.g., assume 0.5 second sleep-request until 429 is thrown, pause the extractor for 120 seconds, retry with default delay, then if it's still throwing 429s assume current behavior of increasing delay interval by 1s and trying again until it works

what this could look like would be something akin to the following:
[warning] API responded with 429 Too Many Requests. Pausing extractor for {configured-amount}s.

and then it could retry with default delay, upon which if it still fails it increases delay interval.

I'm wondering this because the longer delays sort of rack up runtime cumulatively and it might be more optimal for larger galleries to have this option, even if you have to set it to like 5/10 minutes to use it effectively.

throwaway242685 · 2024-02-09T19:04:38Z

hi, is there a way to download all the saved posts on my Instagram account with gallery-dl?

like, all of them at once.

Noonereallycomeon · 2024-02-15T12:31:58Z

@useless642
you can get the link from Instagram on web browser it's "https://www.instagram.com/{username}/saved/all-posts/"
there's also separate links for each collection if you want to download them separately.

mikf · 2024-02-16T00:20:22Z

@github-userx gofile will be fixed in the next release: 3433481

@biggestsonicfan Enable metadata-extractor and use the cookies property of the inserted object. cookies is a CookieJar object though.

taskhawk · 2024-02-22T01:07:45Z

How often are these QFS issues rotated? This one is getting kinda long.

throwaway242685 · 2024-02-22T05:27:01Z

hi, does gallery-dl have a parameter to update it to the master on GitHub?

yt-dlp has the following command:

yt-dlp --update-to master

does gallery-dl have something similar? if not, could this be implemented? 🙏

fireattack · 2024-02-22T06:17:32Z

You can just run python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz

BakedCookie · 2024-02-28T16:47:29Z

metadata.event has both file and skip. Is there a way to combine them? I'd like to save metadata on file download, but I'd also like to update the metadata for any skipped files. This is what my extractor looks like:

"extractor": {
        "base-directory": "./",
        
        "skip": "abort:100",
        
        "postprocessors": [
            {
                "name": "metadata",
                "mode": "json",
                "event": "skip",
                "directory": ".meta"
            }
        ],

mikf · 2024-02-28T17:05:06Z

@BakedCookie
"event": ["file", "skip"] or "event": "file,skip"

(This isn't properly documented for some reason, while other options with similar semantics like include are.)

biggestsonicfan · 2024-02-29T12:16:50Z

Can gallery-dl cache cookies grabbed from the browser for a duration if you grab cookies from a browser? I'm noticing startup takes a while per-use and if I use cookies from a file, it's instant.

mikf · 2024-02-29T12:23:10Z

@biggestsonicfan
It keeps them in memory while it is running.

To improve startup time, you could use BROWSER/.DOMAIN (e.g. firefox/.instagram) as --cookies-from-browser argument to only extract cookies for that domain instead of all; or you write them to a cookies.txt file with something like

gallery-dl --cookies-from-browser firefox --cookies-export cookies.txt --no-download http://example.org/a.jpg

and then load them from there.

#146 (comment)

JinEnMok · 2024-02-29T22:46:25Z

I noticed that per #80 there was some talk about Collections, but they still aren't implemented. They probably aren't that different from albums (see e.g. https://www.artstation.com/gallifreyan/collections/197428), so probably (?) wouldn't be that hard to implement.

Should I open an issue for this?

biggestsonicfan · 2024-03-01T01:19:30Z

To improve startup time, you could use BROWSER/.DOMAIN (e.g. firefox/.instagram) as --cookies-from-browser argument to only extract cookies for that domain instead of all; or you write them to a cookies.txt file with something like

I used to do cookies.txt on a per-site basis but it got a little tedious to manage. I already do

"cookies": ["firefox", "/firefox/profile/path", null, null, ".deviantart.com"],

if that's what you meant. I will try something like "cookies": ["firefox/.patreon", "/firefox/profile/path", null, null, ".patreon.com"], though to test.

Ah: patreon: cookies: Unsupported browser 'firefox/.patreon'

Moving on from that though, I would like to contribute a new site support for gallery-dl but other than browsing the existing code, I don't really see any templates for both an extractor and a test suite? Where would I start for a site that has embedded JSON in it's html page of the page's contents?

mikf · 2024-03-01T17:36:55Z

@JinEnMok
Will be supported in the next batch of commits I git push to GitHub.
edit: cf9e99c

@biggestsonicfan
Yeah, that's what I was referring to. There isn't much you can do then, except the cookies.txt file thing. Configurable caching behavior is not implemented yet. (I will be eventually)

I don't really see any templates for both an extractor and a test suite

There isn't any. You could take a look at merged PRs that add support for a new site.

Where would I start for a site that has embedded JSON in it's html page of the page's contents?

text.extr(…) the JSON part and util.json_loads(…) it. (example)

JinEnMok · 2024-03-01T18:29:51Z

@mikf you're the best, cheers! :)

mikf · 2024-03-01T20:00:40Z

Closing this as suggested by taskhawk (#146 (comment)).
New issue: #5262.

#146 (comment)

mikf added Questions Meta labels Jan 1, 2019

rachmadaniHaryono mentioned this issue Jan 8, 2019

Downloading tags as separate files (for Hydrus) #135

Closed

DonaldTsang mentioned this issue Jan 13, 2019

Hydrus support Xonshiz/comic-dl#205

Open

3 tasks

rachmadaniHaryono referenced this issue Jan 13, 2019

API server implementation using http.server

8662e72

DonaldTsang mentioned this issue Jan 14, 2019

Python Wrapper for WorkCrawler kanasimi/work_crawler#159

Closed

bradenhilton pushed a commit to bradenhilton/gallery-dl that referenced this issue Feb 5, 2024

[nijie] add 'count' metadata field

3cbd390

mikf#146 (comment)

mikf added a commit that referenced this issue Feb 29, 2024

[docs] add ability to specify more than one 'event'

c006f9c

#146 (comment)

mikf mentioned this issue Mar 1, 2024

Questions, Feedback, and Suggestions #4 #5262

Closed

mikf closed this as completed Mar 1, 2024

mikf unpinned this issue Mar 1, 2024

mikf added a commit that referenced this issue Mar 2, 2024

[artstation] support collections (#146)

cf9e99c

#146 (comment)

mikf mentioned this issue Dec 1, 2024

Questions, Feedback, Suggestions #5 #6582

Open

Questions, Feedback and Suggestions #3 #146

Questions, Feedback and Suggestions #3 #146

Comments

mikf commented Jan 1, 2019

rachmadaniHaryono commented Jan 1, 2019 • edited Loading

mikf commented Jan 3, 2019

DonaldTsang commented Jan 8, 2019

rachmadaniHaryono commented Jan 8, 2019

rachmadaniHaryono commented Jan 8, 2019

DonaldTsang commented Jan 8, 2019

rachmadaniHaryono commented Jan 8, 2019

DonaldTsang commented Jan 9, 2019

rachmadaniHaryono commented Jan 9, 2019 • edited Loading

DonaldTsang commented Jan 10, 2019

rachmadaniHaryono commented Jan 10, 2019

mikf commented Jan 10, 2019

rachmadaniHaryono commented Jan 10, 2019

wankio commented Jan 11, 2019

mikf commented Jan 13, 2019 • edited Loading

rachmadaniHaryono commented Jan 13, 2019 • edited Loading

mikf commented Jan 13, 2019

rachmadaniHaryono commented Jan 13, 2019 • edited Loading

wankio commented Feb 1, 2019

mikf commented Feb 2, 2019

wankio commented Feb 3, 2019

schleifen commented Feb 7, 2019

wankio commented Feb 24, 2019

ghost commented Apr 10, 2019

github-userx commented Feb 1, 2024 via email

github-userx commented Feb 1, 2024 via email

AdamSaketume26 commented Feb 4, 2024

AdamSaketume26 commented Feb 4, 2024

biggestsonicfan commented Feb 4, 2024 • edited Loading

WhyEssEff commented Feb 6, 2024 • edited Loading

biggestsonicfan commented Feb 6, 2024

WhyEssEff commented Feb 6, 2024 • edited Loading

throwaway242685 commented Feb 9, 2024 • edited Loading

Noonereallycomeon commented Feb 15, 2024 • edited Loading

mikf commented Feb 16, 2024

taskhawk commented Feb 22, 2024

throwaway242685 commented Feb 22, 2024 • edited Loading

fireattack commented Feb 22, 2024

BakedCookie commented Feb 28, 2024

mikf commented Feb 28, 2024

biggestsonicfan commented Feb 29, 2024

mikf commented Feb 29, 2024

JinEnMok commented Feb 29, 2024

biggestsonicfan commented Mar 1, 2024

mikf commented Mar 1, 2024 • edited Loading

JinEnMok commented Mar 1, 2024

mikf commented Mar 1, 2024

rachmadaniHaryono commented Jan 1, 2019 •

edited

Loading

rachmadaniHaryono commented Jan 9, 2019 •

edited

Loading

mikf commented Jan 13, 2019 •

edited

Loading

rachmadaniHaryono commented Jan 13, 2019 •

edited

Loading

rachmadaniHaryono commented Jan 13, 2019 •

edited

Loading

biggestsonicfan commented Feb 4, 2024 •

edited

Loading

WhyEssEff commented Feb 6, 2024 •

edited

Loading

WhyEssEff commented Feb 6, 2024 •

edited

Loading

throwaway242685 commented Feb 9, 2024 •

edited

Loading

Noonereallycomeon commented Feb 15, 2024 •

edited

Loading

throwaway242685 commented Feb 22, 2024 •

edited

Loading

mikf commented Mar 1, 2024 •

edited

Loading