Launching pyspark2 with PYSPARK_PYTHON=python3 starts the driver fine, but the first action fails: the YARN executors on the worker nodes cannot spawn a `python3` process ("No such file or directory"):

[root@hadoop-04 ~]# PYSPARK_PYTHON=python3 pyspark2
Python 3.6.4 (default, Mar 21 2018, 13:55:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera2
      /_/

Using Python version 3.6.4 (default, Mar 21 2018 13:55:56)
SparkSession available as 'spark'.
>>> lines = sc.textFile('/afis/flume/auth/2018/03/16/auth.1521129675887.log')
>>> pythonlines = lines.filter(lambda line: "python" in line)
>>> pythonlines.count()
[Stage 0:>                                                          (0 + 2) / 2]
18/03/22 13:25:22 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hadoop-02, executor 2): java.io.IOException: Cannot run program "python3": error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:65)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
        at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 14 more
......
>>> 18/03/22 13:25:23 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, hadoop-03, executor 1): TaskKilled (stage cancelled)
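The bare name `python3` has to be resolvable on every executor's PATH, which it evidently is not here. Two common ways to fix this (the interpreter path below is an assumption based on the later session; adjust to your install):

```shell
# Option A (cluster-wide): set an absolute interpreter path in
# conf/spark-env.sh on every node, so all PySpark jobs pick it up.
export PYSPARK_PYTHON=/usr/local/bin/python3   # path assumed

# Option B (per session, Spark >= 2.1): pass it as a Spark conf.
pyspark2 --conf spark.pyspark.python=/usr/local/bin/python3
```

Either way, the interpreter must actually exist at that path on every worker node, not just on the driver host.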
The same job succeeds in local mode, because the Python workers then run on the driver host, whose shell PATH does resolve `python3` (the count of 0 is a real answer: no line in this file contains "python"):

[root@hadoop-04 ~]# PYSPARK_PYTHON=python3 pyspark2 --master local
Python 3.6.4 (default, Mar 21 2018, 13:55:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera2
      /_/

Using Python version 3.6.4 (default, Mar 21 2018 13:55:56)
SparkSession available as 'spark'.
>>> lines = sc.textFile('/afis/flume/auth/2018/03/16/auth.1521129675887.log')
>>> pythonlines = lines.filter(lambda line: "python" in line)
>>> pythonlines.count()
0
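The difference between the two runs comes down to PATH: processes spawned by the YARN NodeManager on the workers get a much more minimal PATH than an interactive root shell. A small self-contained demonstration of PATH-dependent command lookup (the `/tmp/demo_bin/mytool` tool is invented for the demo):

```shell
# Create a throwaway executable outside the standard PATH.
mkdir -p /tmp/demo_bin
printf '#!/bin/sh\necho hi\n' > /tmp/demo_bin/mytool
chmod +x /tmp/demo_bin/mytool

# With a minimal PATH the bare name does not resolve
# (assuming no system-wide "mytool" exists).
PATH=/usr/bin:/bin command -v mytool || echo "mytool: not found"

# With the extra directory on PATH, the same bare name resolves.
PATH=/tmp/demo_bin:/usr/bin:/bin command -v mytool   # → /tmp/demo_bin/mytool
```

This is presumably why `PYSPARK_PYTHON=python3` works on the driver (interactive shell, full PATH) but fails inside the executors.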
Giving PYSPARK_PYTHON an absolute path makes the cluster run work as well. (The SparkUI warning only means another session still holds port 4040, so this one binds 4041.) Filtering on "python" again matches nothing, while "SessionTask" matches 719 lines:

[root@hadoop-04 ~]# PYSPARK_PYTHON=/usr/local/bin/python3 pyspark2
Python 3.6.4 (default, Mar 21 2018, 13:55:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/22 14:54:52 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera2
      /_/

Using Python version 3.6.4 (default, Mar 21 2018 13:55:56)
SparkSession available as 'spark'.
>>> lines = sc.textFile('/afis/flume/auth/2018/03/16/auth.1521129675887.log')
>>> pythonlines = lines.filter(lambda line: "python" in line)
>>> pythonlines.count()
0
>>> pythonlines = lines.filter(lambda line: "SessionTask" in line)
>>> pythonlines.count()
719
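The RDD filter itself is ordinary Python predicate logic, just applied in parallel across partitions. A minimal local sketch of the same `filter`/`count` semantics on plain lists (the sample log lines are invented for illustration):

```python
# Local equivalent of lines.filter(lambda line: "SessionTask" in line).count(),
# using a plain list instead of an RDD; sample lines are hypothetical.
log_lines = [
    "2018-03-16 10:00:01 INFO SessionTask started for user alice",
    "2018-03-16 10:00:02 INFO heartbeat ok",
    "2018-03-16 10:00:03 INFO SessionTask finished for user alice",
]

# filter() keeps lines where the predicate is true; count() is just len().
session_lines = [line for line in log_lines if "SessionTask" in line]
print(len(session_lines))  # 2
```

The only difference in the Spark version is that the lambda is shipped to, and executed by, the Python worker processes on the executors, which is exactly why those workers must have a usable Python interpreter.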