Getting Started with Python, Spark, and Databricks
Revision as of 19:22, 2 August 2019
Loading Data
Viewing Contents of a Parquet Folder
%fs ls /mnt/training/crime-data-2016
Reading Parquet
df = spark.read.parquet("/mnt/training/crime-data-2016/Crime-Data-Boston-2016.parquet")
Viewing Results
Text Based (Generic)
df.show()
Databricks Native Viewer
display(df)
Manipulating Dataframes
Select
df.select("*","firstName","last_name")
Filter
df.select("*").filter("firstName='Brian'").filter('lastName='Popp') == Combining Dataframes == === Union === <pre>homicidesBostonDF = homicidesNewYorkDF.union ( homicidesBostonDF )