Real Estate Market Analysis of UAE
Topics Covered: Data Scraping, Data Preprocessing, Data Cleaning, Data Segmentation, GIS, Tableau, Market Analysis, Data Analysis
This was a comprehensive data analysis project that involved several stages. We began by scraping data from the web, specifically from PropertyFinder.ae. After gathering the data, we meticulously cleaned and organized it using Python. We then segmented the data, categorizing it in several ways to enhance the analysis, and incorporated GIS (Geographic Information System) data for more in-depth geographical insights. Finally, we wrapped up the project by creating interactive dashboards to visualize our findings, making the data easily accessible and comprehensible for presentation purposes.
Project Presentation Video
Data Scraping
A step-by-step description of the key tasks and accomplishments follows:
- Developed a web scraping script to extract property data from PropertyFinder.ae.
- Gathered property details such as property type, price, area, bedrooms, bathrooms, and location.
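The sketch below illustrates the general shape of such a scraper. It is a minimal example only: the URL pattern and CSS selectors are placeholders, not the real PropertyFinder.ae markup, and the production script additionally handled pagination limits, retries, and irregular records.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical URL pattern and selectors for illustration only.
BASE_URL = "https://www.propertyfinder.ae/en/search?page={page}"
HEADERS = {"User-Agent": "Mozilla/5.0"}

def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None

def scrape_page(page):
    """Fetch one search-results page and extract basic listing attributes."""
    response = requests.get(BASE_URL.format(page=page), headers=HEADERS, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [
        {
            "title": text_or_none(card, "h2"),
            "price": text_or_none(card, ".price"),
            "location": text_or_none(card, ".location"),
        }
        for card in soup.select("article")  # placeholder listing-card selector
    ]

# Collect a few pages into a DataFrame for downstream cleaning
listings = [row for page in range(1, 4) for row in scrape_page(page)]
df = pd.DataFrame(listings)
```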
Challenges
- The data scraping process presented several challenges and required extensive trial and error to overcome.
- Not all records returned the same number of fields, which put entire rows at risk of being lost.
- The primary listing page we were scraping did not contain all of the detailed information we needed.
Solutions
To address the challenges, we developed a tailored algorithm to handle the data irregularities effectively, iterating and testing repeatedly to ensure the maximum number of rows could be scraped.
Additionally, we incorporated a tab-opening and tab-closing system within the same script. This enabled us to open each property's detail page and extract the full property and agent details.
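The pattern is sketched below, assuming a Selenium-based scraper with placeholder selectors; the actual script's structure and selectors differed.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Sketch of the tab-opening/closing pattern; selectors are illustrative only.
driver = webdriver.Chrome()
driver.get("https://www.propertyfinder.ae/en/search")

detail_links = [a.get_attribute("href")
                for a in driver.find_elements(By.CSS_SELECTOR, "article a")]

details = []
for link in detail_links:
    driver.execute_script("window.open(arguments[0]);", link)  # open detail page in a new tab
    driver.switch_to.window(driver.window_handles[-1])         # focus the new tab

    details.append({
        "url": link,
        # placeholder selector for agent details on the property page
        "agent": driver.find_element(By.CSS_SELECTOR, ".agent-name").text,
    })

    driver.close()                                              # close the detail tab
    driver.switch_to.window(driver.window_handles[0])           # return to the results tab

driver.quit()
```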
Data Preprocessing
A step-by-step description of the key tasks and accomplishments follows:
- Imported the scraped data into Python and loaded it into a pandas DataFrame.
- Conducted data cleaning, which included handling missing values, removing duplicates, and formatting columns.
- Performed data transformations and created derived columns to enhance the analysis.
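A condensed version of this cleaning pass is shown below. The file and column names are assumptions based on the fields we scraped, not the exact names used in the project.

```python
import pandas as pd

# Load the scraped data (hypothetical file name)
df = pd.read_csv("propertyfinder_raw.csv")

# Handle missing values and duplicates
df = df.drop_duplicates()
df["bedrooms"] = df["bedrooms"].fillna(0).astype(int)

# Format columns: strip currency text and cast price/area to numeric
df["price"] = pd.to_numeric(
    df["price"].astype(str).str.replace(r"[^\d.]", "", regex=True),
    errors="coerce",
)
df["area_sqft"] = pd.to_numeric(df["area_sqft"], errors="coerce")

# Derived column: price per square foot
df["price_per_sqft"] = df["price"] / df["area_sqft"]
```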
Challenges
- There were only a minimal number of missing values, thanks to the algorithm we created.
- There was no clear structure to the dataset, so understanding the data was initially challenging.
Solutions
We created a blueprint for how the data should be structured and decided to convert the dataset into third normal form (3NF), so that each table would represent a single entity with its own primary key. We identified the following unique entities in the dataset:
- Properties
- Agents
- Companies
- Languages
- Amenities
1. Data Normalization Process
We organized the raw data into the five unique entities that constitute the core of our dataset. This involved identifying and separating these fundamental elements.
Further, we examined the relationships and affiliations of other data with these core entities, aiming to establish clear and structured connections.
The outcome of this process was the normalization of a single, complex table into seven distinct tables. Five of these tables represent the original entities, and the remaining two are relational tables that capture the relationships between the core elements.
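A rough sketch of this split is shown below. The column names and delimiters are assumptions used for illustration; the actual flat table had a different layout.

```python
import pandas as pd

# Flat, cleaned table (hypothetical file and column names)
raw = pd.read_csv("propertyfinder_clean.csv")

# Entity tables: one row per unique property, agent, and company
properties = raw[["property_id", "property_type", "price", "area_sqft",
                  "bedrooms", "bathrooms", "location"]].drop_duplicates()
agents = raw[["agent_id", "agent_name", "company_id"]].drop_duplicates()
companies = raw[["company_id", "company_name"]].drop_duplicates()

# Languages and amenities arrive as delimited strings; explode them into lookup tables
languages = (raw["agent_languages"].str.split(",").explode()
             .str.strip().drop_duplicates().to_frame("language"))
amenities = (raw["amenities"].str.split(",").explode()
             .str.strip().drop_duplicates().to_frame("amenity"))

# Relational (junction) tables capture the many-to-many links between core entities
agent_languages = (
    raw.assign(language=raw["agent_languages"].str.split(","))
       .explode("language")[["agent_id", "language"]]
       .drop_duplicates()
)
property_amenities = (
    raw.assign(amenity=raw["amenities"].str.split(","))
       .explode("amenity")[["property_id", "amenity"]]
       .drop_duplicates()
)
```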
2. Ensuring Data Integrity
With the dataset's structure clarified, we focused on ensuring data integrity and establishing unique primary keys for each table. We systematically examined each table and looked for columns that could serve as primary keys. If none were found, we sought candidate keys to guarantee uniqueness for each row.
To facilitate this process, we utilized Python's uuid (Universally Unique Identifier) library. This enabled us to generate unique primary keys, ensuring data consistency and referential integrity within the database.
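A minimal sketch of this step, with a hypothetical table and column name:

```python
import uuid
import pandas as pd

# Assign a UUID4 surrogate primary key to each row of a table lacking a natural key
agents = pd.DataFrame({"agent_name": ["A. Khan", "S. Ahmed"]})
agents["agent_id"] = [str(uuid.uuid4()) for _ in range(len(agents))]
```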
By having unique primary keys in place for each table, we gained the flexibility and agility required to advance to subsequent stages of our analysis.
3. GIS Data Incorporation
Initially, our dataset lacked latitude and longitude data for each property. However, we possessed detailed street addresses for each property listing. To leverage this information, we used the Google Maps Geocoding API for automated geocoding.
Through this process, we were able to automatically generate latitude and longitude data for every property listing in our dataset. This transformation opened up a new dimension of analysis that was previously unavailable to us, enabling more comprehensive geographical insights and visualization.
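The sketch below shows the general idea, assuming the `googlemaps` Python client, a valid API key, and hypothetical file and column names; error handling and rate limiting are omitted for brevity.

```python
import googlemaps
import pandas as pd

gmaps = googlemaps.Client(key="YOUR_API_KEY")

def geocode_address(address):
    """Return (latitude, longitude) for a street address, or (None, None) if not found."""
    results = gmaps.geocode(address)
    if not results:
        return None, None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

properties = pd.read_csv("properties.csv")  # hypothetical file
properties[["latitude", "longitude"]] = properties["address"].apply(
    lambda addr: pd.Series(geocode_address(addr))
)
```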
4. Data Segmentation
Following the initial data collection and cleansing phases, we moved on to data segmentation, a crucial step in organizing and categorizing our dataset. Our segmentation strategy revolved around key attributes to enhance analysis:
- Price: We categorized properties based on their price ranges, allowing for targeted insights into the affordability and value of real estate offerings.
- Area Size: Properties were segmented by their area sizes, helping us understand the distribution of property sizes across different locations.
- Location: With the newly acquired geographical data through GIS incorporation, we grouped properties based on their precise locations. This added a spatial dimension to our analysis, making it easier to identify regional trends and patterns.
- Posting Date: We also segmented properties by their posting date, facilitating a temporal analysis to spot trends and changes over time.
By applying these segmentation criteria, we gained a deeper understanding of our dataset, which facilitated more targeted analysis and visualization.
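An illustrative version of the price, size, and posting-date segmentation is shown below; the bin edges, labels, and column names are assumptions rather than the exact thresholds used in the project.

```python
import pandas as pd

properties = pd.read_csv("properties_geocoded.csv")  # hypothetical file

# Price segments (AED)
properties["price_segment"] = pd.cut(
    properties["price"],
    bins=[0, 500_000, 1_500_000, 5_000_000, float("inf")],
    labels=["Budget", "Mid-range", "Premium", "Luxury"],
)

# Area-size segments (sqft)
properties["size_segment"] = pd.cut(
    properties["area_sqft"],
    bins=[0, 800, 1_500, 3_000, float("inf")],
    labels=["Compact", "Medium", "Large", "Very large"],
)

# Temporal segmentation by posting month
properties["posting_month"] = pd.to_datetime(properties["posting_date"]).dt.to_period("M")
```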
Entity-Relationship Diagram (ERD)
Why did we do this?
- Data Structure Clarity: The ERD visually clarified the dataset structure, making it easier to work with.
- Data Normalization: It enabled data normalization, reducing redundancy and ensuring efficiency.
- Data Integrity: The ERD enforced referential integrity, ensuring data consistency and reliability.
- Database Design: It served as the basis for designing a relational database, essential for our analysis.
- Data Analysis Clarity: The ERD enhanced the clarity of data analysis by identifying key entities and their relationships, enabling better insights and decision-making.
Interesting Findings:
- Expensive properties are notably more abundant on the market; as property values decrease, so does their availability.
- Apartments are the predominant property type, followed by villas and townhouses.
- Jumeirah Village Circle, Downtown Dubai, and Dubai Marina are the best places for finding a property within the lowest price range.
Please note that the dashboards are interactive, and there are many ways to derive insights beyond the points mentioned above.