Archived Support Site

This support site is archived. You can view the old support discussions but you cannot post new discussions.

File upload limitations

smallredbox

22 May, 2013 05:37 AM

I have been reviewing the forum discussions and Stack Overflow posts about the arbitrary 10 MB limit that the AppHarbor service imposes on HTTP POSTs, and I have some questions and concerns you may be able to help me with.

Is there a plan to adjust the 10 MB cap? Or at least to let the application's configuration determine the maximum?

I have issues with the suggested workaround of pushing directly into an AWS data store.

Firstly, with this process the data may get uploaded without the application knowing about it, meaning you can end up with data in your AWS storage service that has no reference in the database.

Secondly, the AWS workaround requires pre-generating the file reference in the encrypted form data. This means components that upload a dynamic number of files, such as Uploadify, can't be wired up this way, because you never know how many files will be selected at any one time.

Thirdly, the responses assume that AWS is being used as the data store. If you're pushing data into SQL binary stores or Mongo data stores, direct inserts from the web are not possible, and from a security standpoint would be unacceptable anyway.

So far I've managed to work around all the nuances of deploying on AppHarbor, though most have required some work. But breaking standard .NET functionality by putting an arbitrary cap on this seems contrary to all sense. Can you explain how I would structure a process that works around this limit, still works with multiple-file uploaders, and ensures the database knows about all of this data? Because I can't see a viable solution, which is odd considering the exact code I have implemented works on every server other than the AppHarbor service.

Thanks for your help.

  1. Posted by rune (Support Staff) on 22 May, 2013 07:03 AM

    Hi,

    The limit is actually not entirely arbitrary - it's a balance between allowing for many use cases while ensuring stability on our shared load balancers. The primary reason we're enforcing this limit is to mitigate the risks associated with certain types of DoS attacks. Last time I checked our shared load balancer limits are relatively high compared to alternative solutions, but I'd certainly be interested in learning more about that.

    Please note that we also offer dedicated load balancers, and we can certainly set the limit as high as you'd like on those.

    It sounds like you've had a hard time integrating with some aspects of AppHarbor. We do of course want to make the platform as easy to use as possible and to support a wide array of use cases. Sometimes, however, limitations such as this one can actually lead to better, more stable and scalable solutions. For instance, it's very unlikely that S3 will crash or be unable to scale, compared to a solution where you serve files from a SQL Server or MongoDB instance. S3 and similar services are designed specifically for storing and serving files at scale, and they do so relatively well.

    I also wanted to address some of your concerns with regards to the approach where you upload directly to S3 or a similar service:

    1) You could relatively easily create a record in your database when you create the signed form the user uses to upload a file. This record could have a "Status" field initially set to "Uploading". After the user completes an upload, there are a couple of ways to notify your application about the successful upload: either use the automatic redirect that S3 supports after an upload (it can be an arbitrary URL, described further under "Redirection" in this article), or make an AJAX request back to your application to report the file upload status.

    2) As mentioned in this Stack Overflow post, you can actually use a ${filename} macro in your policy instead of the actual key name. This way you can reuse the same authorization for multiple files and keep the original filename. One limitation of the regular "HTTP form POST" approach is, as you point out, that you can't upload multiple files. However, this is possible using CORS, which is supported by S3 and GCS. I wrote a blog post and a client library about it recently, and you can try it right here - it allows uploads of multiple files to S3 and GCS with minimal configuration. The solution integrates with another plugin, but it should be pretty simple to adapt.

    3) Is there a good reason for storing files in SQL Server or MongoDB? Otherwise I don't really see a reason to do so. You can apply whatever policy you like on S3 and GCS. For instance, in the example application I linked to above, files are actually stored so that only the "Bucket owner" can access them by default. They are displayed via an authorized URL, which you can again integrate with your application's business logic to support your security and privacy requirements.
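The signed-form approach described in 1) and 2) can be sketched with nothing but the standard library. This is a minimal illustration of the legacy (Signature Version 2) S3 POST policy format; the bucket name, key prefix, redirect URL, and size cap are placeholders, and a real integration should follow current AWS documentation:

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

def build_signed_post_policy(secret_key, bucket, redirect_url,
                             max_bytes=250 * 1024 * 1024):
    """Build the policy document and signature for a browser POST upload.

    Pairing ["starts-with", "$key", "uploads/"] with a form key of
    "uploads/${filename}" lets one signed form be reused for several
    files while keeping each file's original name.
    """
    expiration = datetime.now(timezone.utc) + timedelta(hours=1)
    policy = {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", "uploads/"],
            {"success_action_redirect": redirect_url},
            ["content-length-range", 0, max_bytes],
        ],
    }
    # The policy is base64-encoded, then signed with HMAC-SHA1 (V2 style).
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    signature = base64.b64encode(
        hmac.new(secret_key.encode("utf-8"),
                 policy_b64.encode("utf-8"),
                 hashlib.sha1).digest()
    ).decode("ascii")
    return policy_b64, signature
```

The two returned values would go into hidden "policy" and "signature" form fields, alongside the access key id and a "key" field such as "uploads/${filename}".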

    This being said, we'll certainly take your concerns into consideration and reevaluate whether the maximum size should go up. The load balancers have been running very stably with the current configuration, and we might be able to turn the limit up in the relatively near future, at least for applications on a paid plan (such as yours). Of course you could also go for a dedicated load balancer, in which case we can make it available immediately.

    I'm curious to hear what your requirements are in terms of maximum upload size.

    Best,
    Rune

  2. Posted by smallredbox on 22 May, 2013 09:39 AM

    Hi Rune,

    Thanks for getting back to me on this. I completely understand that it's a balance between features and stability, and other companies I have deployed on have had to balance the same constraints. The fact that AppHarbor does scale so effectively using this mechanism is the primary reason we went with you, so I wouldn't want to do anything to jeopardise that.

    Responding to your responses...

    1) We already implement something similar, so it's not a leap to do this. However, we store our files in S3 based on the Id in the database. So our current process is:

        User selects multiple files
        For each file
            Make an AJAX HTTP POST to AppHarbor
            On the server
                Pull out file specifications (image dimensions, PDF page count, video length, etc.)
                Push the file from AppHarbor to S3
                Update the database with the specifications
                Return the details to the client

    So to implement what you're suggesting effectively, we'd have to move to a substantially more complex process, i.e. the following:

        User selects multiple files
        For each file
            Make an AJAX call to the AppHarbor server to generate the record and get the Id
            Make an AJAX HTTP POST to the S3 servers to push the data in
            Make an AJAX call to the AppHarbor server to indicate the file has uploaded successfully

        From an agent, monitor for new files
        For each new file
            Pull the file back to AppHarbor
            Pull out file specifications (image dimensions, PDF page count, video length, etc.)
            Update the database with the specifications

        From the client
        For each new file
            Wait for the agent to process the file
            Retrieve the file details

    As you can see, it's so much more complex, and only because the configuration is limited.
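For what it's worth, the application side of the flow above reduces to a couple of very small endpoints plus the "Status" field suggested earlier. A sketch, with a plain dict standing in for the database and all names invented:

```python
import itertools

# In-memory stand-in for the real database table; ids are sequential.
_ids = itertools.count(1)

def create_upload_record(db, filename):
    """Step 1: the client asks the app for an Id before posting to S3."""
    file_id = next(_ids)
    db[file_id] = {"filename": filename, "status": "Uploading", "specs": None}
    return file_id

def confirm_upload(db, file_id):
    """Step 3: the client (or the S3 redirect) reports a completed upload.

    The app can then queue the file for spec extraction (image
    dimensions, PDF page count, video length) instead of doing it inline.
    """
    record = db[file_id]
    record["status"] = "Uploaded"
    return record

def pending_spec_extraction(db):
    """What the background agent polls for: uploaded files without specs."""
    return [fid for fid, r in db.items()
            if r["status"] == "Uploaded" and r["specs"] is None]
```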

    And now, because we've got this issue, to do a simple deployment of our application we have to:
      a) Rewrite our code base to get around your configuration.
      b) Rent a custom load balancer, even though we're processing almost no traffic yet.
      c) Move our application back to one of our previous hosts, with reduced scalability.

    Previous apps we've deployed at crucial.com.au, where we had to manage some of the server configuration while they managed the load balancers and networking. I was just hoping to use a service that managed everything except the code I deployed. But I don't want to change all my implementations specifically to fit the requirements of the hosting company. I'll bend, but I don't want to rewrite everything, and I really don't want to bind to a specific implementation that ties us to a particular company. After all, what happens if your company goes bust and we have to move directly onto AWS, or to another host?

    3) Yes, there is good reason. Some companies don't want their data in S3. Some of the systems we've deployed are in the cloud, while other installs are done on the customer's site. On site, Mongo's GridFS is a viable solution for file storage as it's redundant and reliable. With IoC, through simple web configuration changes we can swap out data storage to meet the needs of the customer, without them seeing any change in the process.

    At Crucial, we had a 500 MB limit on HTTP POSTs. It's not uncommon to get PDF files that are almost that big, and videos aren't far off either. I could live with 200 MB as a reasonable compromise. In MVC, apparently you can also lock things down so there's a small POST limit on all URLs except for specific ones that you open up. We could even live with that, because all of our HTTP POSTs for uploading resources go through one specific URL.
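The per-URL lock-down mentioned above is done in ASP.NET with a location element in Web.config. A sketch assuming a hypothetical upload path and a 500 MB ceiling; note that httpRuntime's maxRequestLength is in kilobytes while IIS's maxAllowedContentLength is in bytes:

```xml
<configuration>
  <!-- Small default limit everywhere: 10 MB -->
  <system.web>
    <httpRuntime maxRequestLength="10240" />
  </system.web>
  <!-- Open up only the dedicated upload URL: 500 MB -->
  <location path="resources/upload">
    <system.web>
      <httpRuntime maxRequestLength="512000" />
    </system.web>
    <system.webServer>
      <security>
        <requestFiltering>
          <requestLimits maxAllowedContentLength="524288000" />
        </requestFiltering>
      </security>
    </system.webServer>
  </location>
</configuration>
```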

    Kind Regards,

    Ben Aird
    Ph: 0408 815 279
    www.smallredbox.com

  3. Posted by rune (Support Staff) on 23 May, 2013 12:56 AM

    Hi Ben,

    Thanks for your thorough explanation of your use case. This certainly helps us evaluate our offering and the file upload limit. I can also see how using S3 or similar services may not be optimal for your scenario.

    The reason we impose such a small limit is that we don't want all applications to be able to make large uploads - in particular free apps, which anyone could create and abuse. We can significantly limit the risk and the number of apps that can upload large files by only allowing larger uploads (probably up to 200-250MB) on paid plans. Since your application slidealicious is already on the Catamaran plan, it'd be covered if we make this change.

    I can't promise anything as we'll have to thoroughly test the stability when increasing the upload file size like this - even though the number of apps is small. Let me get back to you later this week (probably Friday) with an update on the progress. Would that be ok with your timeline/schedule?

    Best,
    Rune

  4. Posted by smallredbox on 23 May, 2013 05:45 AM

    Hi Rune,

    Thanks for continuing to work with me on this. I realise that it's a complicated issue, and I don't want to push you to do anything rash that would hurt your business. So take whatever time you need to work through the process, and hopefully we can work something out together.

    Also, you mentioned that a custom load balancer would be an option. Do you offer that service, and how much does it cost? It's probably out of our price range at the moment, but it's something we'll probably consider if any of our deployed services grow further.

    Kind Regards,

    Ben Aird
    Ph: 0408 815 279
    www.smallredbox.com

  5. Posted by rune (Support Staff) on 23 May, 2013 10:34 PM

    Hi Ben,

    Custom load balancers are currently marketed primarily as "IP-based SSL" and are included in our Yacht plan. They're also available on any plan at $100/month, which we hope to reduce in the not-so-distant future. Besides getting a load balancer configuration tailored to your requirements, you'd also get your own IP address, which ensures the highest SSL compatibility available.

    Best,
    Rune

  6. Posted by rune (Support Staff) on 31 May, 2013 12:03 AM

    Hi again,

    We've conducted some thorough testing and have decided to start rolling this out by configuring it on a per-application basis for approved apps. I'll be happy to include you and increase the upload size limit to 250MB as a start if you're still interested - whether on a shared or dedicated load balancer.

    We still reserve the right to decrease the limit again if necessary. In the worst case we'd ask that you move to a dedicated load balancer to keep the limit, so your use case will always be supported. We don't expect this to cause any issues, as we'll roll it out gradually and with additional monitoring.

    Best,
    Rune

  7. Posted by smallredbox on 31 May, 2013 12:06 AM

    Hi Rune,

    That sounds great. I'd love to be part of the increase. It would solve most of my uploading issues.

    Kind Regards,

    Ben Aird
    Ph: 0408 815 279
    www.smallredbox.com

  8. Posted by rune (Support Staff) on 06 Jun, 2013 05:06 AM

    Hi Ben,

    I've now increased the maximum request body size for slidealicious to 300MB. The change will take effect next time you deploy the application.

    Let me know if you experience any issues or find that you're not actually able to upload that much data. It should work, but we might have to configure a few timeouts as well. As long as there's data travelling over the wire, uploads should complete successfully.

    Best,
    Rune

  9. rune closed this discussion on 06 Jun, 2013 05:06 AM.
