Julia - Data Frames

Julia Programming - Data Frames

A DataFrame may be defined as a table or spreadsheet that we can use to sort and explore a set of related data values. In other words, we can call it a smarter array for holding tabular data. Before we use it, we need to download and install the DataFrames and CSV packages as follows −

(@v1.5) pkg> add DataFrames
(@v1.5) pkg> add CSV

To start using the DataFrames package, type the following command −

julia> using DataFrames

Loading data into DataFrames

There are several ways to create new DataFrames (which we will discuss later in this section), but one of the quickest ways to get some data to work with is to build the Anscombe dataset from a matrix. For better understanding, let us see the example below −

anscombe = DataFrame(
         [10 10 10 8  8.04 9.14  7.46  6.58;
           8  8  8 8  6.95 8.14  6.77  5.76;
          13 13 13 8  7.58 8.74 12.74  7.71;
           9  9  9 8  8.81 8.77  7.11  8.84;
          11 11 11 8  8.33 9.26  7.81  8.47;
          14 14 14 8  9.96 8.1   8.84  7.04;
           6  6  6 8  7.24 6.13  6.08  5.25;
           4  4  4 19 4.26 3.1   5.39 12.5;
          12 12 12 8 10.84 9.13  8.15  5.56;
           7  7  7 8  4.82 7.26  6.42  7.91;
           5  5  5 8  5.68 4.74  5.73  6.89]);

julia> rename!(anscombe, [Symbol.(:N, 1:4); Symbol.(:M, 1:4)])
11×8 DataFrame
│ Row │ N1      │ N2      │ N3      │ N4      │ M1      │ M2      │ M3      │ M4      │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 10.0    │ 10.0    │ 10.0    │ 8.0     │ 8.04    │ 9.14    │ 7.46    │ 6.58    │
│ 2   │ 8.0     │ 8.0     │ 8.0     │ 8.0     │ 6.95    │ 8.14    │ 6.77    │ 5.76    │
│ 3   │ 13.0    │ 13.0    │ 13.0    │ 8.0     │ 7.58    │ 8.74    │ 12.74   │ 7.71    │
│ 4   │ 9.0     │ 9.0     │ 9.0     │ 8.0     │ 8.81    │ 8.77    │ 7.11    │ 8.84    │
│ 5   │ 11.0    │ 11.0    │ 11.0    │ 8.0     │ 8.33    │ 9.26    │ 7.81    │ 8.47    │
│ 6   │ 14.0    │ 14.0    │ 14.0    │ 8.0     │ 9.96    │ 8.1     │ 8.84    │ 7.04    │
│ 7   │ 6.0     │ 6.0     │ 6.0     │ 8.0     │ 7.24    │ 6.13    │ 6.08    │ 5.25    │
│ 8   │ 4.0     │ 4.0     │ 4.0     │ 19.0    │ 4.26    │ 3.1     │ 5.39    │ 12.5    │
│ 9   │ 12.0    │ 12.0    │ 12.0    │ 8.0     │ 10.84   │ 9.13    │ 8.15    │ 5.56    │
│ 10  │ 7.0     │ 7.0     │ 7.0     │ 8.0     │ 4.82    │ 7.26    │ 6.42    │ 7.91    │
│ 11  │ 5.0     │ 5.0     │ 5.0     │ 8.0     │ 5.68    │ 4.74    │ 5.73    │ 6.89    │

We converted a matrix to a DataFrame, assigned it to a variable named anscombe, and then renamed the columns.
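Newer DataFrames releases (1.0 and later) expect a second argument when a matrix is converted. Passing :auto asks for generated column names x1, x2, and so on. The following is a minimal sketch, assuming DataFrames 1.x is installed; the small matrix is a hypothetical stand-in for the full dataset.

```julia
using DataFrames

# Hypothetical 3×4 matrix standing in for the Anscombe values
m = [10 8 8.04 6.58;
      8 8 6.95 5.76;
     13 8 7.58 7.71]

# :auto generates the column names x1, x2, x3, x4 (DataFrames 1.x)
df = DataFrame(m, :auto)

# Rename every column at once, in the same way as the example above
rename!(df, [Symbol.(:N, 1:2); Symbol.(:M, 1:2)])

println(names(df))
```

Because the matrix mixes integers and floating-point numbers, Julia promotes every entry to Float64, which is why all columns end up with the same type.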

Collected Datasets

We can also use another dataset package named RDatasets. It contains several other famous datasets, including Anscombe’s. Before we start using it, we need to download and install it as follows −

(@v1.5) pkg> add RDatasets

To start using this package, type the following command −

jupa> using DataFramesjupa> anscombe = dataset("datasets","anscombe")11×8 DataFrame│ Row │   X1  │   X2  │   X3  │   X4  │   Y1    │    Y2   │    Y3   │   Y4    ││     │ Int64 │ Int64 │ Int64 │ Int64 │ Float64 │ Float64 │ Float64 │ Float64 │├─────┼───────┼───────┼───────┼───────┼─────────┼─────────┼─────────┼─────────┤│ 1   │  10   │   10  │    10 │  8    │   8.04  │  9.14   │    7.46 │ 6.58    ││ 2   │  8    │   8   │    8  │  8    │   6.95  │  8.14   │    6.77 │ 5.76    ││ 3   │  13   │   13  │    13 │  8    │   7.58  │  8.74   │    12.74│ 7.71    ││ 4   │  9    │   9   │    9  │  8    │   8.81  │  8.77   │    7.11 │ 8.84    ││ 5   │  11   │   11  │    11 │  8    │   8.33  │  9.26   │    7.81 │ 8.47    ││ 6   │  14   │   14  │    14 │  8    │   9.96  │  8.1    │    8.84 │ 7.04    ││ 7   │  6    │   6   │    6  │  8    │   7.24  │  6.13   │    6.08 │ 5.25    ││ 8   │  4    │   4   │    4  │  19   │   4.26  │  3.1    │    5.39 │ 12.5    ││ 9   │  12   │   12  │    12 │  8    │   10.84 │  9.13   │    8.15 │ 5.56    ││ 10  │  7    │   7   │    7  │  8    │   4.82  │  7.26   │    6.42 │ 7.91    ││ 11  │  5    │   5   │    5  │  8    │   5.68  │  4.74   │    5.73 │ 6.89    │

Empty DataFrames

We can also create a DataFrame by supplying each column as a named vector or range, much as we would when building an array.

Example

julia> empty_df = DataFrame(X = 1:10, Y = 21:30)
10×2 DataFrame
│ Row │ X     │ Y     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 21    │
│ 2   │ 2     │ 22    │
│ 3   │ 3     │ 23    │
│ 4   │ 4     │ 24    │
│ 5   │ 5     │ 25    │
│ 6   │ 6     │ 26    │
│ 7   │ 7     │ 27    │
│ 8   │ 8     │ 28    │
│ 9   │ 9     │ 29    │
│ 10  │ 10    │ 30    │

To create a completely empty DataFrame, we only need to supply the column names and define their types. Rows can then be appended with vcat() as follows −

julia> Complete_empty_df = DataFrame(Name=String[],
         W=Float64[],
         H=Float64[],
         M=Float64[],
         V=Float64[])
0×5 DataFrame

julia> Complete_empty_df = vcat(Complete_empty_df, DataFrame(Name="EmptyTestDataFrame", W=5.0, H=5.0, M=3.0, V=5.0))
1×5 DataFrame
│ Row │ Name               │ W       │ H       │ M       │ V       │
│     │ String             │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ EmptyTestDataFrame │ 5.0     │ 5.0     │ 3.0     │ 5.0     │

julia> Complete_empty_df = vcat(Complete_empty_df, DataFrame(Name="EmptyTestDataFrame2", W=6.0, H=6.0, M=5.0, V=7.0))
2×5 DataFrame
│ Row │ Name                │ W       │ H       │ M       │ V       │
│     │ String              │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ EmptyTestDataFrame  │ 5.0     │ 5.0     │ 3.0     │ 5.0     │
│ 2   │ EmptyTestDataFrame2 │ 6.0     │ 6.0     │ 5.0     │ 7.0     │

Plotting Anscombe’s Quartet

Now that the Anscombe dataset has been loaded, we can compute some statistics on it. The built-in function describe() calculates statistical properties of the columns of a dataset. You can supply the following symbols to select the properties −

mean

std

min

q25

median

q75

max

eltype

nunique

first

last

nmissing

Example

julia> describe(anscombe, :mean, :std, :min, :median, :q25)
8×6 DataFrame
│ Row │ variable │ mean    │ std     │ min  │ median  │ q25     │
│     │ Symbol   │ Float64 │ Float64 │ Real │ Float64 │ Float64 │
├─────┼──────────┼─────────┼─────────┼──────┼─────────┼─────────┤
│ 1   │ X1       │ 9.0     │ 3.31662 │ 4    │ 9.0     │ 6.5     │
│ 2   │ X2       │ 9.0     │ 3.31662 │ 4    │ 9.0     │ 6.5     │
│ 3   │ X3       │ 9.0     │ 3.31662 │ 4    │ 9.0     │ 6.5     │
│ 4   │ X4       │ 9.0     │ 3.31662 │ 8    │ 8.0     │ 8.0     │
│ 5   │ Y1       │ 7.50091 │ 2.03157 │ 4.26 │ 7.58    │ 6.315   │
│ 6   │ Y2       │ 7.50091 │ 2.03166 │ 3.1  │ 8.14    │ 6.695   │
│ 7   │ Y3       │ 7.5     │ 2.03042 │ 5.39 │ 7.11    │ 6.25    │
│ 8   │ Y4       │ 7.50091 │ 2.03058 │ 5.25 │ 7.04    │ 6.17    │

We can also compare the four XY pairs as follows −

julia> [describe(anscombe[:, xy], :mean, :std, :median, :q25) for xy in [[:X1, :Y1], [:X2, :Y2], [:X3, :Y3], [:X4, :Y4]]]
4-element Array{DataFrame,1}:
2×5 DataFrame
│ Row │ variable │ mean    │ std     │ median  │ q25     │
│     │ Symbol   │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼──────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ X1       │ 9.0     │ 3.31662 │ 9.0     │ 6.5     │
│ 2   │ Y1       │ 7.50091 │ 2.03157 │ 7.58    │ 6.315   │
2×5 DataFrame
│ Row │ variable │ mean    │ std     │ median  │ q25     │
│     │ Symbol   │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼──────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ X2       │ 9.0     │ 3.31662 │ 9.0     │ 6.5     │
│ 2   │ Y2       │ 7.50091 │ 2.03166 │ 8.14    │ 6.695   │
2×5 DataFrame
│ Row │ variable │ mean    │ std     │ median  │ q25     │
│     │ Symbol   │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼──────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ X3       │ 9.0     │ 3.31662 │ 9.0     │ 6.5     │
│ 2   │ Y3       │ 7.5     │ 2.03042 │ 7.11    │ 6.25    │
2×5 DataFrame
│ Row │ variable │ mean    │ std     │ median  │ q25     │
│     │ Symbol   │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼──────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ X4       │ 9.0     │ 3.31662 │ 8.0     │ 8.0     │
│ 2   │ Y4       │ 7.50091 │ 2.03058 │ 7.04    │ 6.17    │
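The near-identical summary statistics can also be checked without any package, using only the Statistics standard library. The sketch below copies the first and fourth sets from the table above; the helper name summarize is an illustrative assumption, not part of DataFrames.

```julia
using Statistics

# First and fourth Anscombe sets, copied from the table above
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]

# Hypothetical helper: means, standard deviations and Pearson correlation
summarize(x, y) = (mean_x = mean(x), mean_y = mean(y),
                   std_x = std(x), std_y = std(y), cor_xy = cor(x, y))

s1 = summarize(x1, y1)
s4 = summarize(x4, y4)
println(s1)
println(s4)
```

The two sets agree closely on every summary value even though their X columns look nothing alike, which is the reason the quartet is usually plotted rather than only described.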

Let us reveal the true purpose of Anscombe’s quartet, i.e., plot its four sets as follows −

julia> using StatsPlots
[ Info: Precompiling StatsPlots [f3b207a7-027a-5e70-b257-86293d7955fd]

julia> @df anscombe scatter([:X1 :X2 :X3 :X4], [:Y1 :Y2 :Y3 :Y4],
            smooth=true,
            line = :red,
            linewidth = 2,
            title= ["X$i vs Y$i" for i in (1:4) ],
            legend = false,
            layout = 4,
            xlimits = (2, 20),
            ylimits = (2, 14))

Regression and Models

In this section, we will fit a linear regression line to the dataset. For this we need the Generalized Linear Model (GLM) package, which you need to add first as follows −

(@v1.5) pkg> add GLM

Now let us create a linear regression model by specifying a formula with the @formula macro and supplying the column names as well as the name of the DataFrame. An example is given below −

julia> using GLM

julia> linearregressionmodel = fit(LinearModel, @formula(Y1 ~ X1), anscombe)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Y1 ~ 1 + X1

Coefficients:
───────────────────────────────────────────────────────────────────────
                Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept)  3.00009     1.12475   2.67    0.0257   0.455737    5.54444
X1           0.500091    0.117906  4.24    0.0022   0.23337     0.766812
───────────────────────────────────────────────────────────────────────

Let us check the summary and the coefficients of the linear regression model created above −

julia> summary(linearregressionmodel)
"StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}"

julia> coef(linearregressionmodel)
2-element Array{Float64,1}:
 3.0000909090909054
 0.5000909090909096

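Now let us produce a function for the regression line. The form of the function is y = ax + c, where a is the slope (the second coefficient) and c is the intercept (the first coefficient).

julia> f(x) = coef(linearregressionmodel)[2] * x + coef(linearregressionmodel)[1]
f (generic function with 1 method)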
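For a single predictor, the same coefficients can be reproduced by hand, which makes the y = ax + c form concrete. This is a sketch using the textbook least-squares formulas and the Statistics standard library rather than GLM; the names a, c and regression_line are illustrative assumptions.

```julia
using Statistics

# X1 and Y1 columns of the Anscombe dataset
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Ordinary least squares with one predictor:
# slope a = cov(x, y) / var(x), intercept c = mean(y) - a * mean(x)
a = cov(x, y) / var(x)
c = mean(y) - a * mean(x)

# Regression line in the form y = ax + c
regression_line(v) = a * v + c

println((a, c, regression_line(10)))
```

The slope and intercept computed this way should agree with the values returned by coef() above up to floating-point rounding.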

Once we have the function that describes the regression line, we can draw a plot as follows −

julia> p1 = plot(anscombe.X1, anscombe.Y1,
            smooth=true,
            seriestype=:scatter,
            title = "X1 vs Y1",
            linewidth=8,
            linealpha=0.5,
            label="data")

julia> plot!(f, 2, 20, label="correlation")

Working with DataFrames

As we know, nothing is perfect. This is also true of datasets, because not all datasets are consistent and tidy. To show how we can work with the different items of a DataFrame, let us create a test DataFrame −

julia> testdf = DataFrame( Number = [3, 5, 7, 8, 20 ],
                             Name = ["Lithium", "Boron", "Nitrogen", "Oxygen", "Calcium" ],
                     AtomicWeight = [6.941, 10.811, 14.0067, 15.9994, 40.078 ],
                           Symbol = ["Li", "B", "N", "O", "Ca" ],
                       Discovered = [1817, 1808, 1772, 1774, missing ])
5×5 DataFrame
│ Row │ Number │ Name     │ AtomicWeight │ Symbol │ Discovered │
│     │ Int64  │ String   │ Float64      │ String │ Int64?     │
├─────┼────────┼──────────┼──────────────┼────────┼────────────┤
│ 1   │ 3      │ Lithium  │ 6.941        │ Li     │ 1817       │
│ 2   │ 5      │ Boron    │ 10.811       │ B      │ 1808       │
│ 3   │ 7      │ Nitrogen │ 14.0067      │ N      │ 1772       │
│ 4   │ 8      │ Oxygen   │ 15.9994      │ O      │ 1774       │
│ 5   │ 20     │ Calcium  │ 40.078       │ Ca     │ missing    │

Handling missing values

There can be some missing values in datasets. We can check for them with the describe() function, whose nmissing column reports the count for each column, as follows −

julia> describe(testdf)
5×8 DataFrame
│ Row │ variable     │ mean    │ min   │ median  │ max    │ nunique │ nmissing │ eltype                │
│     │ Symbol       │ Union…  │ Any   │ Union…  │ Any    │ Union…  │ Union…   │ Type                  │
├─────┼──────────────┼─────────┼───────┼─────────┼────────┼─────────┼──────────┼───────────────────────┤
│ 1   │ Number       │ 8.6     │ 3     │ 7.0     │ 20     │         │          │ Int64                 │
│ 2   │ Name         │         │ Boron │         │ Oxygen │ 5       │          │ String                │
│ 3   │ AtomicWeight │ 17.5672 │ 6.941 │ 14.0067 │ 40.078 │         │          │ Float64               │
│ 4   │ Symbol       │         │ B     │         │ O      │ 5       │          │ String                │
│ 5   │ Discovered   │ 1792.75 │ 1772  │ 1791.0  │ 1817   │         │ 1        │ Union{Missing, Int64} │

Julia provides a special datatype called Missing to address this issue. It indicates that there is no usable value at this location. The DataFrames package uses it so that we can still work with most of our dataset while making sure that calculations are not distorted by missing values.

Looking for missing values

We can check with the ismissing() function whether the DataFrame has any missing values.

Example

julia> for row in 1:nrow(testdf)
            for col in 1:ncol(testdf)
               if ismissing(testdf[row,col])
                  println("$(names(testdf)[col]) value for $(testdf[row,:Name]) is missing!")
               end
            end
         end
Discovered value for Calcium is missing!

Repairing DataFrames

We can use the following code to change values that are not acceptable, like “n/a”, “0”, or “missing”. The code below looks in every cell for the above-mentioned unacceptable values and replaces them with missing.

Example

julia> for row in 1:size(testdf, 1) # or nrow(testdf)
         for col in 1:size(testdf, 2) # or ncol(testdf)
            println("processing row $row column $col ")
            temp = testdf[row,col]
            if ismissing(temp)
               println("skipping missing")
            elseif temp == "n/a" || temp == "0" || temp == 0
               testdf[row, col] = missing
               println("changed row $row column $col ")
            end
         end
      end
processing row 1 column 1
processing row 1 column 2
processing row 1 column 3
processing row 1 column 4
processing row 1 column 5
processing row 2 column 1
processing row 2 column 2
processing row 2 column 3
processing row 2 column 4
processing row 2 column 5
processing row 3 column 1
processing row 3 column 2
processing row 3 column 3
processing row 3 column 4
processing row 3 column 5
processing row 4 column 1
processing row 4 column 2
processing row 4 column 3
processing row 4 column 4
processing row 4 column 5
processing row 5 column 1
processing row 5 column 2
processing row 5 column 3
processing row 5 column 4
processing row 5 column 5
skipping missing
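For a single column, the same repair can be written without an explicit loop. The sketch below uses only Base Julia; the raw vector is a hypothetical example of a column read in as text, not data from the test DataFrame.

```julia
# Hypothetical raw column in which "n/a" and "0" stand for unknown years
raw = ["1817", "n/a", "1772", "0"]

# replace returns a new vector; its element type widens to allow missing
cleaned = replace(raw, "n/a" => missing, "0" => missing)

println(cleaned)
println(count(ismissing, cleaned))
```

Because replace builds a new vector whose element type is Union{Missing, String}, it also works when the original column type cannot hold missing, a case in which assigning missing to an individual cell would raise an error.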

Working with missing values

Julia provides support for representing missing values in the statistical sense, that is, for situations where no value is available for a variable in an observation, but a valid value theoretically exists.
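A short sketch of how missing behaves in plain Julia, using the values of the Discovered column as an ordinary vector. Only Base functions are used; the variable names are illustrative.

```julia
# The Discovered values from the test DataFrame, as a plain vector
discovered = [1817, 1808, 1772, 1774, missing]

# Reductions propagate missing: one unknown value makes the sum unknown
total = sum(discovered)

# findall locates the missing entries; skipmissing ignores them
missing_positions = findall(ismissing, discovered)
latest = maximum(skipmissing(discovered))

# coalesce substitutes a default for each missing value
filled = coalesce.(discovered, 0)

println((total, missing_positions, latest, filled))
```

Propagation is deliberate: a result that silently ignored the unknown value could be mistaken for a complete one, so skipping has to be requested explicitly.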

completecases()

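The completecases() function returns a Boolean vector that marks the rows containing no missing values. We can use it to select only the complete rows, for example to find the maximum value of a column that contains a missing value.

Example

julia> maximum(testdf[completecases(testdf), :].Discovered)
1817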

dropmissing()

The dropmissing() function returns a copy of the DataFrame that contains only the rows without missing values.

Example

julia> dropmissing(testdf)
4×5 DataFrame
│ Row │ Number │ Name     │ AtomicWeight │ Symbol │ Discovered │
│     │ Int64  │ String   │ Float64      │ String │ Int64      │
├─────┼────────┼──────────┼──────────────┼────────┼────────────┤
│ 1   │ 3      │ Lithium  │ 6.941        │ Li     │ 1817       │
│ 2   │ 5      │ Boron    │ 10.811       │ B      │ 1808       │
│ 3   │ 7      │ Nitrogen │ 14.0067      │ N      │ 1772       │
│ 4   │ 8      │ Oxygen   │ 15.9994      │ O      │ 1774       │

Modifying DataFrames

The DataFrames package of Julia provides various methods with which you can add, remove, and rename columns, and add or delete rows.

Adding Columns

We can use the hcat() function to add a column of integers to the DataFrame. It can be used as follows −

julia> hcat(testdf, axes(testdf, 1))
5×6 DataFrame
│ Row │ Number │ Name     │ AtomicWeight │ Symbol │ Discovered │ x1    │
│     │ Int64  │ String   │ Float64      │ String │ Int64?     │ Int64 │
├─────┼────────┼──────────┼──────────────┼────────┼────────────┼───────┤
│ 1   │ 3      │ Lithium  │ 6.941        │ Li     │ 1817       │ 1     │
│ 2   │ 5      │ Boron    │ 10.811       │ B      │ 1808       │ 2     │
│ 3   │ 7      │ Nitrogen │ 14.0067      │ N      │ 1772       │ 3     │
│ 4   │ 8      │ Oxygen   │ 15.9994      │ O      │ 1774       │ 4     │
│ 5   │ 20     │ Calcium  │ 40.078       │ Ca     │ missing    │ 5     │

Notice, however, that we haven’t changed the DataFrame or assigned the new DataFrame to a variable. To add a column to testdf itself, we can assign to it as follows −

julia> testdf[!, :MP] = [180.7, 2300, -209.86, -222.65, 839]
5-element Array{Float64,1}:
  180.7
 2300.0
 -209.86
 -222.65
  839.0

julia> testdf
5×6 DataFrame
│ Row │ Number │ Name     │ AtomicWeight │ Symbol │ Discovered │ MP      │
│     │ Int64  │ String   │ Float64      │ String │ Int64?     │ Float64 │
├─────┼────────┼──────────┼──────────────┼────────┼────────────┼─────────┤
│ 1   │ 3      │ Lithium  │ 6.941        │ Li     │ 1817       │ 180.7   │
│ 2   │ 5      │ Boron    │ 10.811       │ B      │ 1808       │ 2300.0  │
│ 3   │ 7      │ Nitrogen │ 14.0067      │ N      │ 1772       │ -209.86 │
│ 4   │ 8      │ Oxygen   │ 15.9994      │ O      │ 1774       │ -222.65 │
│ 5   │ 20     │ Calcium  │ 40.078       │ Ca     │ missing    │ 839.0   │

We have added a column having melting points of all the elements to our test DataFrame.

Removing Columns

We can use the select!() function to remove a column from the DataFrame. It modifies the DataFrame in place so that it contains only the selected columns; hence, to remove a particular column, we need to use select!() with Not. It is shown in the given example −

julia> select!(testdf, Not(:MP))
5×5 DataFrame
│ Row │ Number │ Name     │ AtomicWeight │ Symbol │ Discovered │
│     │ Int64  │ String   │ Float64      │ String │ Int64?     │
├─────┼────────┼──────────┼──────────────┼────────┼────────────┤
│ 1   │ 3      │ Lithium  │ 6.941        │ Li     │ 1817       │
│ 2   │ 5      │ Boron    │ 10.811       │ B      │ 1808       │
│ 3   │ 7      │ Nitrogen │ 14.0067      │ N      │ 1772       │
│ 4   │ 8      │ Oxygen   │ 15.9994      │ O      │ 1774       │
│ 5   │ 20     │ Calcium  │ 40.078       │ Ca     │ missing    │

We have removed the column MP from our DataFrame.

Renaming Columns

We can use the rename!() function to rename a column in the DataFrame. We will rename the AtomicWeight column to AW as follows −

julia> rename!(testdf, :AtomicWeight => :AW)
5×5 DataFrame
│ Row │ Number │ Name     │ AW      │ Symbol │ Discovered │
│     │ Int64  │ String   │ Float64 │ String │ Int64?     │
├─────┼────────┼──────────┼─────────┼────────┼────────────┤
│ 1   │ 3      │ Lithium  │ 6.941   │ Li     │ 1817       │
│ 2   │ 5      │ Boron    │ 10.811  │ B      │ 1808       │
│ 3   │ 7      │ Nitrogen │ 14.0067 │ N      │ 1772       │
│ 4   │ 8      │ Oxygen   │ 15.9994 │ O      │ 1774       │
│ 5   │ 20     │ Calcium  │ 40.078  │ Ca     │ missing    │

Adding rows

We can use the push!() function with suitable data to add rows to the DataFrame. In the example below, we add a row for the element Copper −

Example

julia> push!(testdf, [29, "Copper", 63.546, "Cu", missing])
6×5 DataFrame
│ Row │ Number │ Name     │ AW      │ Symbol │ Discovered │
│     │ Int64  │ String   │ Float64 │ String │ Int64?     │
├─────┼────────┼──────────┼─────────┼────────┼────────────┤
│ 1   │ 3      │ Lithium  │ 6.941   │ Li     │ 1817       │
│ 2   │ 5      │ Boron    │ 10.811  │ B      │ 1808       │
│ 3   │ 7      │ Nitrogen │ 14.0067 │ N      │ 1772       │
│ 4   │ 8      │ Oxygen   │ 15.9994 │ O      │ 1774       │
│ 5   │ 20     │ Calcium  │ 40.078  │ Ca     │ missing    │
│ 6   │ 29     │ Copper   │ 63.546  │ Cu     │ missing    │

Deleting rows

We can use the deleterows!() function with suitable data to delete rows from the DataFrame. In the example below, we delete three rows (4th, 5th, and 6th) from our test DataFrame −

Example

julia> deleterows!(testdf, 4:6)
3×5 DataFrame
│ Row │ Number │ Name     │ AW      │ Symbol │ Discovered │
│     │ Int64  │ String   │ Float64 │ String │ Int64?     │
├─────┼────────┼──────────┼─────────┼────────┼────────────┤
│ 1   │ 3      │ Lithium  │ 6.941   │ Li     │ 1817       │
│ 2   │ 5      │ Boron    │ 10.811  │ B      │ 1808       │
│ 3   │ 7      │ Nitrogen │ 14.0067 │ N      │ 1772       │

Finding values in DataFrame

To find values in a DataFrame, we use an elementwise (broadcast) operator that examines every row of a column. It returns an array of Boolean values indicating whether each cell meets the criterion; this array can then be used to select the matching rows.

Example

julia> testdf[:, :AW] .< 10
3-element BitArray{1}:
 1
 0
 0

julia> testdf[testdf[:, :AW] .< 10, :]
1×5 DataFrame
│ Row │ Number │ Name    │ AW      │ Symbol │ Discovered │
│     │ Int64  │ String  │ Float64 │ String │ Int64?     │
├─────┼────────┼─────────┼─────────┼────────┼────────────┤
│ 1   │ 3      │ Lithium │ 6.941   │ Li     │ 1817       │
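Row selection can also be expressed with filter(), which takes a column => predicate pair. The sketch below assumes a recent DataFrames release and Julia 1.2 or later (for the <(10) shorthand); the small df is a reduced stand-in for the test DataFrame.

```julia
using DataFrames

# Reduced stand-in for the test DataFrame
df = DataFrame(Name = ["Lithium", "Boron", "Nitrogen"],
               AW = [6.941, 10.811, 14.0067])

# filter keeps the rows for which the predicate on column AW is true
light = filter(:AW => <(10), df)

# Several conditions are combined with the broadcast operators .& and .|
middle = df[(df.AW .> 10) .& (df.AW .< 14), :]

println(light)
println(middle)
```

Both forms return a new DataFrame and leave df unchanged. The parentheses around each comparison are needed because .& binds more tightly than .> and .<.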

Sorting

To sort the values in a DataFrame, we can use the sort!() function. We need to give the columns on which we want to sort.

Example

julia> sort!(testdf, [order(:AW)])
3×5 DataFrame
│ Row │ Number │ Name     │ AW      │ Symbol │ Discovered │
│     │ Int64  │ String   │ Float64 │ String │ Int64?     │
├─────┼────────┼──────────┼─────────┼────────┼────────────┤
│ 1   │ 3      │ Lithium  │ 6.941   │ Li     │ 1817       │
│ 2   │ 5      │ Boron    │ 10.811  │ B      │ 1808       │
│ 3   │ 7      │ Nitrogen │ 14.0067 │ N      │ 1772       │

The DataFrame is sorted based on the values of column AW.
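The example above is already in ascending order, so the effect of sorting is easier to see in reverse. The following sketch assumes a recent DataFrames release and uses the non-mutating sort(), which returns a sorted copy; the small df is again a reduced stand-in for the test DataFrame.

```julia
using DataFrames

df = DataFrame(Name = ["Lithium", "Boron", "Nitrogen"],
               AW = [6.941, 10.811, 14.0067])

# sort returns a sorted copy; rev=true reverses the order
heaviest_first = sort(df, :AW, rev=true)

# order() attaches options to individual columns when sorting on several
by_name_desc = sort(df, [order(:Name, rev=true), order(:AW)])

println(heaviest_first.Name)
println(by_name_desc.Name)
```

Use sort!() instead when the original DataFrame should be reordered in place.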
