
what i learned making large uploads reliable in flutter

March 11, 2026 12 min read

an opinionated guide to building a cross-platform upload system that handles flaky networks, background execution, and resumable uploads.

introduction

uploading a large file from a mobile app seems simple - until you try to make it reliable.

on paper, the flow is straightforward:

  • pick a file
  • send it to the backend
  • show a progress bar
  • wait for success

but in practice, things are not predictable:

  • apps get backgrounded
  • processes get killed
  • networks drop mid-upload
  • if upload state lives in memory, it disappears with the process

this is why large uploads are not just a networking problem - they are a state and recovery problem.

flutter makes this especially interesting. you get a shared cross-platform application layer, which is great for consistent product features. but the underlying platforms still behave very differently. android and ios have different constraints around background execution, file access, and long-running work, which means upload execution cannot be identical across platforms if reliability is the goal.

while working on large video uploads in a flutter app, i had to design a system that could survive real-world conditions - flaky networks, backgrounding, and partially completed uploads across restarts.

in this post, i will walk through the architecture and key design decisions that made that possible.

tldr

large file uploads are not just a networking problem - they are a state and recovery problem.

if upload state lives in memory, it breaks the moment the app is backgrounded or killed.

so the system needs to be built around:

  • a database as the source of truth (not in-memory state)
  • a resume-first design that can continue after interruptions
  • a queue-based scheduler (one file at a time, concurrency inside chunks)
  • adaptive concurrency for unstable mobile networks
  • platform-specific execution
    • android: dart-driven uploads + foreground service
    • ios: native URLSession + reconciliation

the key idea:

an upload is not a request - it is a long-running workflow.

once you treat it that way, reliability becomes a lot easier to reason about.

high level architecture

once it clicks that large uploads are not a "network problem" but a state and recovery problem, the architecture changes. an upload cannot be treated as a simple task sitting inside a widget's state that only works when everything goes right - real-world conditions are much messier.

so instead of scattering logic across ui, services and callbacks, we designed a system around a few core ideas:

  • a single orchestration layer
  • a durable source of truth (caching layer)
  • a clean separation between shared logic and platform-specific implementation

the main components:

  • flutter ui handles only user intent and the progress shown to the user
  • queue manager owns the queue and the state of every upload
  • upload service performs the upload to the backend and connects to the upload engine
  • platform upload engine executes the upload natively on ios
  • local db (drift) persists upload state so it survives restarts

resume-first design: why the database becomes the source of truth

the most important design decision in this system was not chunk size, background uploads or concurrency. it was "resume behaviour across restarts".

most upload systems don't work this way - they are designed as: create a task, track progress, and hope the app stays alive.

this is why the database owns the upload life cycle.

each upload entry inside the db contains all the information to reconstruct and resume upload from where it left off.

  • parts already uploaded
  • total parts
  • total chunks (constant size)
  • status (idle | in_progress | complete | failed)
  • retry metadata (max retries, current retry count, last retry time)

which means that even after an interruption, we can reliably resume the upload.

how do the upload chunks actually get to the backend and s3?

  1. register an upload
  2. get presigned urls for x mb chunks
  3. upload/post each chunk on the presigned s3 url
  4. tell the backend the upload is complete with the part ids
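the four steps above, sketched end to end (the `api` and `s3` clients and their method names are hypothetical stand-ins, not our real backend contract):

```python
def run_multipart_upload(api, s3, data: bytes, chunk_size: int):
    """walk the whole flow: register, presign, upload chunks, finalize."""
    upload_id = api.register_upload(size=len(data))              # step 1
    offsets = range(0, len(data), chunk_size)
    urls = api.presigned_urls(upload_id, count=len(offsets))     # step 2
    part_ids = []
    for i, offset in enumerate(offsets):
        chunk = data[offset:offset + chunk_size]
        part_ids.append(s3.put(urls[i], chunk))                  # step 3
    return api.complete_upload(upload_id, part_ids=part_ids)     # step 4
```

in the real system, each successful step 3 is also persisted to the db before the loop moves on - that persistence is the whole resume story.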

queue processing: why only upload one file at a time

once the upload lifecycle became db driven and resumable, the next logical step was to figure out how to schedule the work.

at first, it might seem tempting to upload multiple files simultaneously, but that introduces a lot of complexity and edge cases.

so intentionally, we only upload one file at a time but upload multiple chunks in parallel.

why not upload multiple files simultaneously?

large uploads are already heavy on bandwidth, memory, cpu, db writes and ui updates. multiply that across multiple files and their chunks, and the app will definitely break.

so instead of parallelizing at the file level, scheduling should be kept simple and concurrency should be pushed to the chunk level.

what does the queue manager actually do?

the queue manager sits between the ui and the upload pipeline. it is responsible for:

  • what runs next
  • what's active/pending/failed/paused
  • what happens on failure, completion or cancellation
  • watching a stream of db changes and updating state accordingly
  • prioritizing uploads based on status and metadata

concurrency

once the file is split into chunks, the next question is: how many chunks do we upload in parallel? a naive answer is that more chunks means faster uploads, but that's not always the case.

on mobile, this breaks quickly. aggressive concurrency leads to:

  • packet loss
  • retries
  • unstable latency
  • memory pressure

so the goal is not to maximize parallel requests, but to be fast on good networks and resilient and stable on bad ones.

why fixed concurrency fails?

a fixed number of concurrent chunks sounds simple, but mobile networks are not stable. on a good network it may be too low, and on a bad network it may be too high.

so on android we use adaptive concurrency, since uploads are driven directly from the dart side. we used an AIMD-style (additive increase, multiplicative decrease) approach. what does that mean?

  • start with one
  • increase concurrency by 1 if the upload completed in a certain time
  • decrease concurrency by half if the upload failed or took too long

this approach helps us to be fast on good networks and resilient on bad networks.
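sketched out (the thresholds and bounds are illustrative, not our production values):

```python
class AimdConcurrency:
    """additive increase, multiplicative decrease over chunk outcomes."""
    def __init__(self, min_c: int = 1, max_c: int = 8, slow_after_s: float = 10.0):
        self.min_c, self.max_c = min_c, max_c
        self.slow_after_s = slow_after_s
        self.value = min_c                                 # start with one

    def on_chunk_done(self, elapsed_s: float, ok: bool) -> int:
        if ok and elapsed_s <= self.slow_after_s:
            self.value = min(self.value + 1, self.max_c)   # additive increase
        else:
            self.value = max(self.value // 2, self.min_c)  # halve on failure/slow
        return self.value
```

this is the same shape as tcp congestion control: probe upward slowly, back off hard the moment the network pushes back.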

ios is a bit different, as we don't have direct control over the uploads - they go through a background URLSession, which is os managed.

background execution

this is where things stop being cross-platform. at flutter level, everything looks shared:

  • queue
  • state model
  • persistence
  • user actions

why background execution matters?

uploads take time, which means they overlap with app switching, screen lock, app suspension and more. if uploads only work while the app is foregrounded, they are not gonna cut it - which makes background execution a core requirement.

android background upload

on android, a foreground service keeps the dart engine running while the app is in the background, so the process is not killed. because of this, we don't need a separate native upload service, and the orchestration layer stays in dart.

native code is used for content uri access and reading files, whereas foreground notifications go through the local notifications package.

ios: had to go fully native here

ios works very differently. to actually enable background upload capabilities, we had to go fully native and use URLSession.

this is because even when the app is merely backgrounded, ios gives you a grace period of only about 30 seconds to finish work - after that, the app is essentially suspended for our purposes.

on ios flutter acts as the orchestration layer, native code is the actual execution engine and the os handles scheduling and persistence. what this actually means is that the uploads keep on working even when the app is completely killed.

  1. compute chunk size and offset, get presigned url + metadata, and pass that to the native engine. native reads the exact byte range, writes it to a temp file (ios uploads from files, not byte ranges), then creates a background upload task.

  2. run native execution in URLSessionConfiguration.background with sessionSendsLaunchEvents = true and waitsForConnectivity = true, so uploads continue while suspended and recover on network return.

  3. handle completion paths:

    • app alive: native sends events through a channel and flutter updates db state.

    • app dead: completed/failed parts are stored in UserDefaults and merged on restart.

  4. on launch, flutter syncs pending/completed/failed chunks and clears stale native pending state.

  5. continuously reconcile flutter state with native in-flight tasks to prevent duplicate chunk scheduling.

here's how we do it:

  • we keep track of which chunks have already been uploaded, and which ones are currently in progress.
  • if the app is restarted or killed, the native code exposes which upload tasks are still active.
  • the flutter layer picks up this in-flight state from native and continues to reflect progress in the UI and database.
  • this way, we avoid rescheduling or double-uploading any chunk - no repeats, no wasted upload cycles.
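the reconciliation itself boils down to set arithmetic. a python sketch of the logic (which in reality is split across dart and swift):

```python
def parts_to_schedule(db_uploaded: set, native_done: set,
                      native_in_flight: set, total_parts: int) -> set:
    """merge both completion sources, then exclude anything already running."""
    uploaded = db_uploaded | native_done          # flutter db + native-side merge
    all_parts = set(range(1, total_parts + 1))
    return all_parts - uploaded - native_in_flight
```

because both sides only ever add to the "done" sets, merging is safe to repeat on every launch.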

file access: why reading the file is not trivial

this part sounds boring but is critical: how you actually read the file matters more than you think. loading a 20 gb file into memory is simply not an option.

android: content uri vs file paths

on android, files don't always come as simple paths. they can be:

  • file:// paths, which are essentially local storage paths
  • content:// uris (gallery, storage access framework, etc)

content uris are not real paths, so for chunk uploads we:

  • detect uri type
  • resolve uri using a ContentUriResolver class

this is why we end up using native code here.

ios: security scoped access

ios is even stricter: if a file comes from outside your app, you don't actually own it. you need security-scoped access to the file.

which means:

  • you need explicit permission to read the file
  • that permission has to be persisted
  • and restored on every restart

why this matters for uploads?

chunk uploads depend on reading exact byte ranges, so file access has to be reliable across restarts, memory efficient, and compatible with background execution.
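the core requirement is reading one chunk's byte range without touching the rest of the file. in python terms (the real reads happen in dart and native code):

```python
def read_chunk(path: str, index: int, chunk_size: int) -> bytes:
    """seek to the chunk's offset and read at most chunk_size bytes."""
    with open(path, "rb") as f:
        f.seek(index * chunk_size)     # jump straight to the byte range
        return f.read(chunk_size)      # peak memory is one chunk, max
```

every platform-specific file story (content uris, security-scoped bookmarks) ultimately has to provide this one capability: seekable, on-demand byte access.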

subtle problem: background + file access

this point matters most on ios.

why? because:

  • uploads are handled in the background
  • the native layer, not flutter, reads the file
  • sometimes the flutter/dart process isn’t running at all

so: file access logic must work independently of flutter.

this isn’t a minor technical detail - it’s a key part of making uploads reliable.

the key insight:

reliable uploads begin with reliable file access.

if you can’t always access the right bytes on demand, the whole upload pipeline becomes fragile.

failure modes we actually hit in production

this is where things got real. these were not theoretical problems. they showed up only after real users started uploading large videos.

app killed mid upload: upload state incomplete, missing parts

duplicate chunk uploads: stale in-memory state vs actual uploaded parts

presigned urls expiring mid upload: chunks failing with 401/403

ios uploads completing without flutter: no event delivery, state mismatch

content uri permission loss (android): file read failures after restart

how we handled them

  • persisted every meaningful step in the database

  • skipped already uploaded parts using uploadedParts

  • treated ios as eventually consistent, not real-time

  • added reconciliation on every restart

  • used native bridges for file access

the key idea was simple: don’t try to prevent failure - make recovery predictable.
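"make recovery predictable" mostly means persisting after every part and skipping what is already done. a condensed sketch of that loop (`db` and `send_part` are stand-ins for the real persistence and upload calls):

```python
def upload_remaining(record: dict, db, send_part) -> None:
    """resume loop: skip finished parts, persist after each success."""
    for part in range(1, record["total_parts"] + 1):
        if part in record["uploaded_parts"]:
            continue                              # done in a previous run
        if send_part(part):
            record["uploaded_parts"].add(part)
            db.save(record)                       # durable before moving on
    done = len(record["uploaded_parts"]) == record["total_parts"]
    record["status"] = "complete" if done else "failed"
    db.save(record)
```

run this after any crash and it picks up exactly where the last persisted part left off - no special recovery path, just the normal path replayed.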

things we intentionally did not solve

not everything needs to be solved in v1, so some tradeoffs were made deliberately.

  • no parallel file uploads
  • no dynamic presign refresh mid upload (makes state transitions tricky)
  • no server side orchestration (decreases client control)
  • no chunk size auto tuning

chunking strategy: why ~30mb worked?

chunk size looks like a small decision, but it affects everything.

too small:

  • too many requests
  • more overhead
  • more db writes
  • more chances of failure

too large:

  • higher memory usage
  • expensive retries
  • slower recovery

we settled around ~30mb per chunk.

it gave us:

  • reasonable request count
  • efficient throughput
  • manageable retry cost
  • stable memory usage

there’s no perfect number, but this worked well in practice.
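the arithmetic behind the tradeoff is simple - part count scales inversely with chunk size (reading "30mb" as 30 mib here is my assumption):

```python
import math

def part_count(file_size: int, chunk_size: int = 30 * 1024 * 1024) -> int:
    """how many requests (and db writes, and retry units) an upload costs."""
    return math.ceil(file_size / chunk_size)

three_gb = 3 * 1024 ** 3
# at ~30 mb per chunk, a 3 gb video is around a hundred parts;
# at 5 mb per chunk it balloons to several hundred requests,
# while a single failed retry at 300 mb per chunk would cost 10x more to redo.
```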

why flutter worked well (and where it didn’t)

flutter was a great fit for the control plane.

it let us centralize:

  • queue management
  • upload orchestration
  • retry logic
  • progress aggregation
  • persistence

but the execution layer - background uploads, file access, os-managed sessions - could not live in dart. for those, native integration was required.

flutter gave us a shared brain, but the body still depends on the platform.

final thoughts

large uploads look like a network problem on the surface, but they are actually a state and recovery problem. once you accept that things will break in production, you can design a system that recovers from failure instead of one that hopes nothing goes wrong. an upload is not a request - it is a long-running workflow.