basic_data

This module defines the basic DataBunch object that is used inside Learner to train a model. This is the generic class, which can take any kind of fastai Dataset or DataLoader. You’ll find helper functions in the data module of every application that create this DataBunch for you directly.

`class` `DataBunch`

Bind train_dl, valid_dl and test_dl in a data object.

It also ensures all the dataloaders are on device and applies dl_tfms to them as batches are drawn (for example, normalization). path is used internally to store temporary files. collate_fn is passed to the pytorch DataLoader (replacing the one already there) to specify how to collate the samples picked for a batch. By default, it grabs the data attribute of the objects sent (see vision.image or the data block API for why this can be important).

train_dl, valid_dl and optionally test_dl will be wrapped in DeviceDataLoader.

`create`

create(train_ds:Dataset, valid_ds:Dataset, test_ds:Optional[Dataset]=None, path:PathOrStr='.', bs:int=64, val_bs:int=None, num_workers:int=8, dl_tfms:Optional[Collection[Callable]]=None, device:device=None, collate_fn:Callable='data_collate', no_check:bool=False, **dl_kwargs) → DataBunch

Create a DataBunch from train_ds, valid_ds and maybe test_ds with a batch size of bs. Passes **dl_kwargs to DataLoader().

num_workers is the number of CPUs to use; dl_tfms, device and collate_fn are passed to the init method.

Warning: You can pass a regular pytorch Dataset here, but it will require more attributes than the basic ones to work with the library. See below for more details.

Visualization

`show_batch`

show_batch(rows:int=5, ds_type:DatasetType=<DatasetType.Train: 1>, reverse:bool=False, **kwargs)

Show a batch of data in ds_type on a few rows.

Grabbing some data

`dl`

dl(ds_type:DatasetType=<DatasetType.Valid: 2>) → DeviceDataLoader

Returns an appropriate DataLoader with a dataset for validation, training, or test (ds_type).

`one_batch`

one_batch(ds_type:DatasetType=<DatasetType.Train: 1>, detach:bool=True, denorm:bool=True, cpu:bool=True) → Collection[Tensor]

Get one batch from the data loader of ds_type. Optionally detach and denorm.

`one_item`

one_item(item, detach:bool=False, denorm:bool=False, cpu:bool=False)

Get item into a batch. Optionally detach and denorm.

`sanity_check`

Check the underlying data in the training set can be properly loaded.

You can save your DataBunch object for future use with the save method.

`save`

Save the DataBunch in self.path/file. file can be file-like (file or buffer).

`load_data`

load_data(path:PathOrStr, file:PathLikeOrBinaryStream='data_save.pkl', bs:int=64, val_bs:int=None, num_workers:int=8, dl_tfms:Optional[Collection[Callable]]=None, device:device=None, collate_fn:Callable='data_collate', no_check:bool=False, **kwargs) → DataBunch

Load a saved DataBunch from path/file. file can be file-like (file or buffer).

Important: The arguments you passed when you created your first DataBunch aren’t saved, so you should pass them here if you don’t want the default.

Note: Data cannot be serialized on Windows and then loaded on Linux or vice versa because Path object doesn’t support this. We will find a workaround for that in v2.

This is to allow you to easily create a new DataBunch with a different batch size, for instance. You will also need to reapply any normalization (in vision) you might have done on your original DataBunch.

Empty DataBunch for inference

`export`

export(file:PathLikeOrBinaryStream='export.pkl')

Export the minimal state of self for inference in self.path/file. file can be file-like (file or buffer)

`load_empty`

load_empty(path, fname:str='export.pkl')

Load an empty DataBunch from the exported file in path/fname with optional tfms.

This method should be used to create a DataBunch at inference; see the corresponding tutorial.

`add_test`

add_test(items:Iterator[T_co], label:Any=None, tfms=None, tfm_y=None)

Add the items as a test set. Pass along label; otherwise the items are labelled with EmptyLabel.

Dataloader transforms

`add_tfm`

add_tfm(tfm:Callable)

Adds a transform to all dataloaders.

If you want to use your pytorch Dataset in fastai, you may need to implement more attributes/methods to get the full functionality of the library. Some functions can easily be used with your pytorch Dataset if you just add an attribute; for others, the best option is to create your own ItemList by following this tutorial. Here is a full list of what the library will expect.

First of all, you obviously need to implement the methods __len__ and __getitem__, as indicated by the pytorch docs. Then the most needed things would be:

c attribute: it’s used in most functions that directly create a Learner (such as tabular_learner and unet_learner) and represents the number of outputs of the final layer of your model (also the number of classes if applicable).
classes attribute: it’s used by ClassificationInterpretation and also in collab_learner (it is better to use CollabDataBunch.from_df than a pytorch Dataset) and represents the unique tags that appear in your data.
maybe a loss_func attribute: it is going to be used by Learner as a default loss function, so if you know your custom Dataset requires a particular loss, you can put it there.
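The requirements above can be sketched as follows. This is a dependency-free illustration only: ToyDataset is a hypothetical name, plain Python lists stand in for tensors, and in real use the class would subclass the pytorch Dataset. Only __len__, __getitem__, c, classes and loss_func come from the text.

```python
class ToyDataset:
    """Hypothetical map-style dataset exposing the extra attributes listed above."""

    def __init__(self, xs, ys, classes, loss_func=None):
        assert len(xs) == len(ys), "inputs and labels must have the same length"
        self.xs, self.ys = xs, ys
        self.classes = classes        # unique tags that appear in the data
        self.c = len(classes)         # number of outputs of the final layer
        self.loss_func = loss_func    # optional default loss for the Learner

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        return self.xs[i], self.ys[i]


# Four fake 2x2 "images" (nested lists) with a binary label.
xs = [[[0.0, 0.1], [0.2, 0.3]] for _ in range(4)]
ys = [0, 1, 1, 0]
train_ds = ToyDataset(xs, ys, classes=["negative", "positive"])
```

Here c is derived from classes, which suits classification; for a regression target you would presumably set c by hand.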

Toy example with image-like numpy arrays and binary label

For a specific application

In text, your dataset will need to have a vocab attribute that should be an instance of Vocab. It’s used by text_classifier_learner and language_model_learner when building the model.

Functions that really won’t work

To make those last functions work, you really need to use the data block API and maybe write your own custom ItemList.

DataBunch.show_batch (requires x.reconstruct, y.reconstruct and x.show_xys)
Learner.predict (requires x.set_item, y.analyze_pred, y.reconstruct and maybe x.reconstruct)
Learner.show_results (requires x.reconstruct, y.analyze_pred, y.reconstruct and x.show_xyzs)
DataBunch.set_item (requires x.set_item)
Learner.backward (uses DataBunch.set_item)
DataBunch.export (requires export)

`class` `DeviceDataLoader`

DeviceDataLoader(dl:DataLoader, device:device, tfms:List[Callable]=None, collate_fn:Callable='data_collate')

Bind a DataLoader to a torch.device.

Put the batches of dl on device after applying an optional list of tfms. collate_fn will replace the one of dl. All dataloaders of a DataBunch are of this type.
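As a rough illustration of this wrapping, and not the actual DeviceDataLoader code, the sketch below wraps any iterable of batches and post-processes each batch as it is drawn. BatchWrapper and normalize are hypothetical names, the device step is reduced to a caller-supplied function so the example runs without pytorch, and the order of the two steps is an assumption of the sketch.

```python
class BatchWrapper:
    """Hypothetical stand-in for a loader that post-processes every batch."""

    def __init__(self, dl, to_device=None, tfms=None):
        self.dl = dl
        # Identity by default; a real loader would move tensors to a device here.
        self.to_device = to_device or (lambda b: b)
        self.tfms = list(tfms or [])

    def add_tfm(self, tfm):
        self.tfms.append(tfm)

    def remove_tfm(self, tfm):
        if tfm in self.tfms:
            self.tfms.remove(tfm)

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for batch in self.dl:
            # Device step, then each transform in order (assumed ordering).
            batch = self.to_device(batch)
            for tfm in self.tfms:
                batch = tfm(batch)
            yield batch


def normalize(batch, mean=2.0, std=2.0):
    # Toy normalization applied to the inputs of an (inputs, targets) batch.
    xs, ys = batch
    return [(x - mean) / std for x in xs], ys


batches = [([0.0, 2.0], [0, 1]), ([4.0, 6.0], [1, 0])]
wrapped = BatchWrapper(batches, tfms=[normalize])
first_x, first_y = next(iter(wrapped))
```

Because the transforms run at iteration time, adding or removing one affects the next batch drawn without rebuilding the underlying loader.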

`create`

Create DeviceDataLoader from dataset with bs and shuffle: process using num_workers.

The given collate_fn will be used to put the samples together in one batch (by default it grabs their data attribute). shuffle means the dataloader will take the samples randomly if that flag is set to True, or in the right order otherwise. tfms are passed to the init method. All kwargs are passed to the pytorch DataLoader class initialization.
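To make the default collate behaviour more concrete, here is a small sketch in plain Python. It is not fastai's data_collate: Item and collate_data_attr are hypothetical names, and lists stand in for tensors. It unwraps the data attribute of each sample field and regroups the samples into one column per field.

```python
class Item:
    """Hypothetical wrapper whose raw payload lives in `.data`."""

    def __init__(self, data):
        self.data = data


def collate_data_attr(samples):
    # Unwrap `.data` when present, otherwise keep the object as is.
    unwrapped = [tuple(getattr(o, "data", o) for o in sample) for sample in samples]
    # Transpose [(x0, y0), (x1, y1), ...] into ([x0, x1, ...], [y0, y1, ...]).
    return tuple(list(column) for column in zip(*unwrapped))


samples = [(Item([1, 2]), Item(0)), (Item([3, 4]), Item(1))]
xb, yb = collate_data_attr(samples)
```

Falling back to the object itself when there is no data attribute lets the same function handle samples that are already raw values.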

Methods

`add_tfm`

add_tfm(tfm:Callable)

Add tfm to self.tfms.

`remove_tfm`

remove_tfm(tfm:Callable)

Remove tfm from self.tfms.

`new`

new(**kwargs)

Create a new copy of self with kwargs replacing current values.
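One common way to get this kind of copy with overrides, shown here only as a sketch with a hypothetical Loader class rather than the library's code, is to remember the constructor arguments and merge the overrides into them.

```python
class Loader:
    """Hypothetical loader that remembers how it was constructed."""

    def __init__(self, dataset, bs=64, shuffle=False):
        self.dataset, self.bs, self.shuffle = dataset, bs, shuffle
        self.init_kwargs = {"dataset": dataset, "bs": bs, "shuffle": shuffle}

    def new(self, **kwargs):
        # Later keys win, so kwargs override the stored values.
        return type(self)(**{**self.init_kwargs, **kwargs})


train = Loader(dataset=[1, 2, 3], bs=64, shuffle=True)
small = train.new(bs=16)
```

The original object is left untouched, so the two loaders can be used side by side.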

`proc_batch`

proc_batch(b:Tensor) → Tensor

Process batch b of TensorImage.

`DatasetType`

Enum = [Train, Valid, Test, Single, Fix]

Internal enumerator to name the training, validation and test dataset/dataloader.
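The member names and the values shown in the signatures above (Train: 1, Valid: 2) are consistent with a standard Python Enum. The sketch below rebuilds such an enum and uses it to pick a loader, in the spirit of dl; the pick_loader helper and the string placeholders are hypothetical.

```python
from enum import Enum

# Functional API: members are numbered from 1 in the order given.
DatasetType = Enum("DatasetType", "Train Valid Test Single Fix")


def pick_loader(loaders, ds_type=DatasetType.Valid):
    """Hypothetical dispatch from a DatasetType member to the matching loader."""
    try:
        return loaders[ds_type]
    except KeyError:
        raise ValueError(f"no loader registered for {ds_type}") from None


loaders = {
    DatasetType.Train: "train_dl",
    DatasetType.Valid: "valid_dl",
}
chosen = pick_loader(loaders)
```

Using enum members as keys avoids passing bare strings or integers around, and a missing split fails loudly instead of silently returning the wrong loader.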


©2021 fast.ai. All rights reserved.
Site last generated: Jan 5, 2021
