Downloading a File from the Network in Java

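Downloading a file from the network is about as basic as requirements get, yet every time I need to write it I end up searching the web all over again. This time I found that this Baeldung article covers the topic very thoroughly, so here are my excerpts.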

Download a File From an URL in Java

https://www.baeldung.com/java-download-file

https://github.com/eugenp/tutorials/tree/master/core-java-modules/core-java-networking-2

https://github.com/eugenp/tutorials/blob/master/core-java-modules/core-java-networking-2/src/main/java/com/baeldung/download/FileDownload.java

Stream-based download with plain Java

Use a BufferedInputStream to buffer the bytes read via read(); the buffer lives in the application's memory.

When reading one byte at a time using the read() method, each method call implies a system call to the underlying file system. When the JVM invokes the read() system call, the program execution context switches from user mode to kernel mode and back.

// FILE_URL and FILE_NAME are constants defined elsewhere
try (BufferedInputStream in = new BufferedInputStream(new URL(FILE_URL).openStream());
     FileOutputStream fileOutputStream = new FileOutputStream(FILE_NAME)) {
    byte[] dataBuffer = new byte[1024];
    int bytesRead;
    // read() returns -1 once the stream is exhausted
    while ((bytesRead = in.read(dataBuffer, 0, dataBuffer.length)) != -1) {
        fileOutputStream.write(dataBuffer, 0, bytesRead);
    }
} catch (IOException e) {
    // handle exception
}

Simplified version: copy the stream straight into a file

// try-with-resources closes the network stream once the copy finishes
try (InputStream in = new URL(FILE_URL).openStream()) {
    Files.copy(in, Paths.get(FILE_NAME), StandardCopyOption.REPLACE_EXISTING);
}
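
As a runnable variant of the Files.copy approach, here is a minimal sketch. The class name SimpleDownload and the download helper are made up for illustration, and the demo uses a file: URL to a temporary file so it runs without network access; an http(s) URL would go through the same code path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class SimpleDownload {

    // Hypothetical helper: copies whatever the URL serves into target,
    // replacing an existing file, and returns the number of bytes written.
    static long download(URL source, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // A file: URL stands in for a remote resource in this demo.
        Path source = Files.createTempFile("source", ".txt");
        Files.write(source, "hello".getBytes(StandardCharsets.UTF_8));
        Path target = Files.createTempFile("copy", ".txt");

        long copied = download(source.toUri().toURL(), target);
        System.out.println(copied + " bytes copied to " + target);
    }
}
```

Returning the byte count from Files.copy makes it easy to log or sanity-check the download size.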

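Download with NIO

The BufferedInputStream download above passes the data through application memory before it crosses into kernel memory to be written to the file. The NIO package can avoid this: depending on the operating system, data can go from the network straight into the file without being buffered in application memory along the way. Pretty neat.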

The transferTo() and transferFrom() methods are more efficient than simply reading from a stream using a buffer. Depending on the underlying operating system, the data can be transferred directly from the filesystem cache to our file without copying any bytes into the application memory.

On Linux and UNIX systems, these methods use the zero-copy technique that reduces the number of context switches between the kernel mode and user mode.

// Both the channel and the output stream are closed automatically
try (ReadableByteChannel readableByteChannel = Channels.newChannel(new URL(FILE_URL).openStream());
     FileOutputStream fileOutputStream = new FileOutputStream(FILE_NAME)) {
    FileChannel fileChannel = fileOutputStream.getChannel();
    fileChannel.transferFrom(readableByteChannel, 0, Long.MAX_VALUE);
}
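
A single transferFrom() call is allowed to move fewer bytes than requested, so a more defensive variant calls it in a loop and advances the file position until nothing more arrives. The sketch below assumes a blocking source channel, where a return value of 0 means end of stream. The class name ChannelDownload, the transfer helper, and the 1 MiB chunk size are illustrative choices, and an in-memory stream stands in for url.openStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelDownload {

    // Upper bound per transferFrom() call; the value is arbitrary.
    private static final long CHUNK = 1 << 20;

    // Hypothetical helper: drains a blocking source channel into target
    // and returns the total number of bytes written.
    static long transfer(ReadableByteChannel source, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long transferred;
            // For a blocking source, 0 bytes transferred means end of stream.
            while ((transferred = out.transferFrom(source, position, CHUNK)) > 0) {
                position += transferred;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[3_000_000]; // stands in for a remote file
        Path target = Files.createTempFile("channel", ".bin");
        try (ReadableByteChannel source =
                 Channels.newChannel(new ByteArrayInputStream(payload))) {
            System.out.println(transfer(source, target) + " bytes written");
        }
    }
}
```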

Asynchronous client

AsyncHttpClient is a popular library for executing asynchronous HTTP requests using the Netty framework.

Notice that we’ve overridden the onBodyPartReceived() method. The default implementation accumulates the HTTP chunks received into an ArrayList. This could lead to high memory consumption, or an OutOfMemory exception when trying to download a large file.

Instead of accumulating each HttpResponseBodyPart into memory, we use a FileChannel to write the bytes to our local file directly. We’ll use the getBodyByteBuffer() method to access the body part content through a ByteBuffer.


AsyncHttpClient client = Dsl.asyncHttpClient();
FileOutputStream stream = new FileOutputStream(FILE_NAME);

client.prepareGet(FILE_URL).execute(new AsyncCompletionHandler<FileOutputStream>() {

    @Override
    public State onBodyPartReceived(HttpResponseBodyPart bodyPart) throws Exception {
        // write each chunk straight to the file instead of accumulating it in memory
        stream.getChannel().write(bodyPart.getBodyByteBuffer());
        return State.CONTINUE;
    }

    @Override
    public FileOutputStream onCompleted(Response response) throws Exception {
        return stream;
    }
});


Apache Commons IO

The Apache Commons IO dependency is also very convenient: it supports downloading a URL to a file directly.

FileUtils.copyURLToFile(
    new URL(FILE_URL),
    new File(FILE_NAME),
    CONNECT_TIMEOUT,
    READ_TIMEOUT);
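
For comparison, here is a rough JDK-only sketch of a download with connect and read timeouts. It is not Commons IO's actual implementation. The class name TimeoutDownload, the download helper, and the timeout values are made up for illustration, and a file: URL keeps the demo runnable offline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TimeoutDownload {

    // Hypothetical helper: both timeouts are given in milliseconds.
    static long download(URL source, Path target,
                         int connectTimeoutMs, int readTimeoutMs) throws IOException {
        URLConnection connection = source.openConnection();
        connection.setConnectTimeout(connectTimeoutMs); // limit on establishing the connection
        connection.setReadTimeout(readTimeoutMs);       // limit on waiting for data to arrive
        try (InputStream in = connection.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // A file: URL stands in for a remote resource in this demo.
        Path source = Files.createTempFile("source", ".txt");
        Files.write(source, "timeouts".getBytes(StandardCharsets.UTF_8));
        Path target = Files.createTempFile("copy", ".txt");

        long copied = download(source.toUri().toURL(), target, 5_000, 10_000);
        System.out.println(copied + " bytes copied");
    }
}
```

Without explicit timeouts a stalled server can block the calling thread indefinitely, which is the main reason to prefer the four-argument form.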

Advanced usage: resumable download

Resumable Download

https://github.com/eugenp/tutorials/blob/master/core-java-modules/core-java-networking-2/src/main/java/com/baeldung/download/ResumableDownload.java

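Resumable downloads: if you are building a download tool such as a COS client, you have to dig into details like this; ordinary web applications can ignore them.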

  • Get the file length with a HEAD request
URL url = new URL(FILE_URL);
HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
httpConnection.setRequestMethod("HEAD");
long remoteFileSize = httpConnection.getContentLengthLong();
  • Download only a specific byte range of the file

Here we’ve configured the URLConnection to request the file bytes in a specific range. The range will start from the last downloaded byte and will end at the byte corresponding to the size of the remote file.


long existingFileSize = outputFile.length();
if (existingFileSize < fileLength) {
    httpFileConnection.setRequestProperty(
        "Range",
        "bytes=" + existingFileSize + "-" + fileLength
    );
}

// append mode (second argument true) keeps the bytes already on disk
OutputStream os = new FileOutputStream(FILE_NAME, true);
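
The Range value itself is simple string arithmetic, so it can be isolated and checked without a network. Byte positions in the header are zero-based and inclusive, so the last byte of an N-byte file is N - 1; an end position at or past the file length, as in the snippet above, is interpreted by servers as "to the end". A minimal sketch, where the class name RangeHeader and the resumeRange helper are hypothetical:

```java
import java.util.Optional;

final class RangeHeader {

    // Hypothetical helper: builds the Range value needed to fetch the bytes
    // still missing from a partially downloaded file. Returns an empty
    // Optional when the local file is already complete.
    static Optional<String> resumeRange(long existingFileSize, long remoteFileSize) {
        if (existingFileSize < 0 || remoteFileSize < 0) {
            throw new IllegalArgumentException("sizes must not be negative");
        }
        if (existingFileSize >= remoteFileSize) {
            return Optional.empty(); // nothing left to download
        }
        // first missing byte .. last byte of the remote file (inclusive)
        return Optional.of("bytes=" + existingFileSize + "-" + (remoteFileSize - 1));
    }

    public static void main(String[] args) {
        // 100 bytes already on disk out of 1000
        System.out.println(resumeRange(100, 1000).orElse("complete")); // prints bytes=100-999
        System.out.println(resumeRange(1000, 1000).orElse("complete")); // prints complete
    }
}
```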

Another common way to use the Range header is for downloading a file in chunks by setting different byte ranges. For example, to download a 2 KB file, we can use the ranges 0 – 1023 and 1024 – 2047 (both ends of a byte range are inclusive).
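
The chunk boundaries can be computed up front. Because HTTP byte ranges are inclusive at both ends, a 2048-byte file split into 1024-byte chunks gives 0-1023 and 1024-2047. A minimal sketch, where the class name ChunkRanges and the split helper are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkRanges {

    // Hypothetical helper: returns one Range header value per chunk,
    // covering every byte of the file exactly once.
    static List<String> split(long fileSize, long chunkSize) {
        if (fileSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and chunkSize > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < fileSize; start += chunkSize) {
            // the last chunk may be shorter than chunkSize
            long end = Math.min(start + chunkSize, fileSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(split(2048, 1024)); // prints [bytes=0-1023, bytes=1024-2047]
    }
}
```

Each value can then be set as the Range request property on its own connection, and the chunks written to the matching offsets of the output file.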


package com.baeldung.download;

import java.io.*;
import java.net.*;

public class ResumableDownload {

    // Plain download: always appends everything the server sends to the file.
    public static long downloadFile(String downloadUrl, String saveAsFileName) throws IOException, URISyntaxException {
        File outputFile = new File(saveAsFileName);
        URLConnection downloadFileConnection = new URI(downloadUrl).toURL()
            .openConnection();
        return transferDataAndGetBytesDownloaded(downloadFileConnection, outputFile);
    }

    private static long transferDataAndGetBytesDownloaded(URLConnection downloadFileConnection, File outputFile) throws IOException {
        long bytesDownloaded = 0;
        // append mode (second argument true) keeps the bytes already on disk
        try (InputStream is = downloadFileConnection.getInputStream();
             OutputStream os = new FileOutputStream(outputFile, true)) {

            byte[] buffer = new byte[1024];

            int bytesCount;
            while ((bytesCount = is.read(buffer)) > 0) {
                os.write(buffer, 0, bytesCount);
                bytesDownloaded += bytesCount;
            }
        }
        return bytesDownloaded;
    }

    // Resumable download: requests only the bytes missing from the local file.
    public static long downloadFileWithResume(String downloadUrl, String saveAsFileName) throws IOException, URISyntaxException {
        File outputFile = new File(saveAsFileName);

        URLConnection downloadFileConnection = addFileResumeFunctionality(downloadUrl, outputFile);
        return transferDataAndGetBytesDownloaded(downloadFileConnection, outputFile);
    }

    private static URLConnection addFileResumeFunctionality(String downloadUrl, File outputFile) throws IOException, URISyntaxException {
        long existingFileSize = 0L;
        URLConnection downloadFileConnection = new URI(downloadUrl).toURL()
            .openConnection();

        if (outputFile.exists() && downloadFileConnection instanceof HttpURLConnection) {
            HttpURLConnection httpFileConnection = (HttpURLConnection) downloadFileConnection;

            // a separate HEAD request fetches the remote length without the body
            HttpURLConnection tmpFileConn = (HttpURLConnection) new URI(downloadUrl).toURL()
                .openConnection();
            tmpFileConn.setRequestMethod("HEAD");
            long fileLength = tmpFileConn.getContentLengthLong();
            existingFileSize = outputFile.length();

            if (existingFileSize < fileLength) {
                httpFileConnection.setRequestProperty("Range", "bytes=" + existingFileSize + "-" + fileLength);
            } else {
                throw new IOException("File Download already completed.");
            }
        }
        return downloadFileConnection;
    }

}